Key Features:
- Enterotype Classification: XGBoost and LASSO regression models, trained with a 10-times repeated 10-fold cross-validation procedure and tested on an independent validation dataset, for accurate classification of microbiome samples into enterotypes.
- Different Enterotype Models: By default, Enterotyper uses the 3-enterotype model derived from FKM (fuzzy k-means) clustering. It also offers a 2-enterotype FKM model as well as 2-, 3-, and 4-enterotype models derived from PAM (partitioning around medoids) clustering.
- Enterotype Dysbiosis Score (EDS): Quantifies the dysbiosis state of microbiome samples within the enterotype landscape using a novel scoring system to facilitate health-related microbial studies. The EDS is calculated only when using the default 3-enterotype FKM model. The maximum enterotype strength of a sample is inverted, z-score normalized using the center and standard deviation of the global enterotype training data as a reference, and scaled to obtain values between 0 and 1. A higher EDS indicates a more dysbiotic microbial community for the sample.
- Comprehensive Dataset: Built on a large-scale global dataset of 16,772 fecal metagenomic samples from 129 studies. Take a closer look at the global training dataset on this world map.
- Compatible with various taxonomic classifiers: Enterotyper accepts taxonomic profiles from a variety of taxonomic classifiers. See below [link to Usage/Input part] for more information.
- Data Visualization: Evaluate enterotype classifications and EDS with built-in visualization output for a quick first assessment.
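The EDS steps listed above (invert, z-score normalize against the training data, scale to 0-1) can be illustrated with a minimal Python sketch. This is not Enterotyper's implementation: the function names, the toy reference strengths, and in particular the final scaling step (min-max scaling against the reference z-scores, with clipping) are assumptions made for illustration, since the exact scaling is not specified here.

```python
from statistics import mean, stdev

def fit_reference(ref_strengths):
    """Summarize reference (training) maximum strengths after inversion.

    Hypothetical helper: returns the center, SD, and z-score range that
    eds_sketch() uses as its reference.
    """
    inverted = [-s for s in ref_strengths]  # invert: low strength -> high value
    center = mean(inverted)
    sd = stdev(inverted)
    z = [(v - center) / sd for v in inverted]
    return {"center": center, "sd": sd, "z_min": min(z), "z_max": max(z)}

def eds_sketch(max_strength, ref):
    """EDS-like score in [0, 1] for one sample's maximum enterotype strength."""
    # Invert and z-score normalize against the reference center and SD.
    z = (-max_strength - ref["center"]) / ref["sd"]
    # ASSUMPTION: min-max scale against the reference z-range, then clip.
    scaled = (z - ref["z_min"]) / (ref["z_max"] - ref["z_min"])
    return min(1.0, max(0.0, scaled))

# Toy reference strengths (made up for this example, not real training data).
toy_reference = [0.95, 0.90, 0.80, 0.70, 0.55, 0.40]
ref = fit_reference(toy_reference)

confident = eds_sketch(0.92, ref)  # strongly assigned sample -> low score
ambiguous = eds_sketch(0.45, ref)  # weakly assigned sample -> high score
```

In this sketch a sample with a weaker maximum strength always receives a higher score, which matches the intended reading that a higher EDS indicates a more dysbiotic community.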
How it works
The enterotype assignments provided by Enterotyper are independent of de novo clustering. This allows robust enterotype identification across studies, including in datasets that are too small for de novo clustering.
We constructed XGBoost regression models to predict the strength of each enterotype determined by FKM clustering. The prediction models were trained on the GTDB genus-level taxonomic profiles of the dataset using the train function of the caret package in R with a 10-times repeated 10-fold cross-validation procedure. Before model construction, minor genera with an average relative abundance of <1E-4 were excluded, and only the abundant genera above this threshold were used for training. To avoid model overfitting due to multiple samples derived from the same individual, we ensured that these samples were incorporated exclusively in either the training or the evaluation data during each cross-validation fold. The models' accuracies were evaluated by applying the prediction models to a held-out validation dataset of 347 samples from three studies that was not used for training. Separate models were constructed for each enterotype (i.e., two models for the two-enterotype clustering and three models for the three-enterotype clustering), and the highest strength obtained from these models was used for enterotype classification and for the EDS of each sample.
Additionally, binary LASSO classification models were constructed to predict enterotypes determined by PAM clustering. These models were trained following the same procedure as described above. The highest score obtained from the LASSO classification models was used for enterotype assignment.
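The genus filtering and the individual-aware cross-validation described above can be sketched as follows. The original models were built with caret in R; this pure-Python sketch only illustrates the two ideas. The function names, the data layout (dictionaries keyed by sample ID), and the toy data are hypothetical.

```python
import random
from collections import defaultdict

def filter_abundant_genera(profiles, threshold=1e-4):
    """Drop genera whose average relative abundance is below the threshold.

    profiles: dict of sample ID -> dict of genus -> relative abundance.
    """
    genera = set()
    for profile in profiles.values():
        genera.update(profile)
    n_samples = len(profiles)
    keep = {
        g for g in genera
        if sum(p.get(g, 0.0) for p in profiles.values()) / n_samples >= threshold
    }
    return {s: {g: a for g, a in p.items() if g in keep} for s, p in profiles.items()}

def repeated_grouped_folds(sample_to_individual, n_folds=10, n_repeats=10, seed=0):
    """Yield (repeat, fold, train_samples, eval_samples).

    All samples of one individual land on the same side of every split.
    """
    rng = random.Random(seed)
    by_individual = defaultdict(list)
    for sample, individual in sample_to_individual.items():
        by_individual[individual].append(sample)
    individuals = sorted(by_individual)
    for repeat in range(n_repeats):
        shuffled = individuals[:]
        rng.shuffle(shuffled)  # new random partition for each repeat
        for fold in range(n_folds):
            eval_individuals = set(shuffled[fold::n_folds])
            train, evaluation = [], []
            for ind in shuffled:
                target = evaluation if ind in eval_individuals else train
                target.extend(by_individual[ind])
            yield repeat, fold, train, evaluation

# Toy data: 20 hypothetical individuals with 2 samples each.
samples = {f"s{i}": f"person{i // 2}" for i in range(40)}
splits = list(repeated_grouped_folds(samples))

toy_profiles = {
    "a": {"Bacteroides": 0.6, "Rare": 0.00001},
    "b": {"Bacteroides": 0.4, "Rare": 0.00002},
}
filtered = filter_abundant_genera(toy_profiles)
```

Splitting by individual rather than by sample is the design choice that matters here: a random per-sample split could place two samples from the same person on both sides of a fold and inflate the estimated accuracy.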
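The final assignment step, taking the highest value across the per-enterotype models, amounts to an argmax. A minimal sketch, assuming each model's output for a sample has already been collected into a dictionary; the enterotype labels and values are hypothetical placeholders:

```python
def assign_enterotype(scores):
    """Return the label with the highest predicted value and that value.

    scores: dict of enterotype label -> strength (or score) predicted by
    the model built for that enterotype.
    """
    label = max(scores, key=scores.get)
    return label, scores[label]

# Hypothetical outputs of three per-enterotype models for one sample.
sample_scores = {"ET_A": 0.12, "ET_B": 0.81, "ET_C": 0.07}
label, strength = assign_enterotype(sample_scores)
```

The same function applies to the FKM-based strengths and to the PAM-based LASSO scores, since both assignments pick the model with the highest output.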
Enterotype concept
In an attempt to simplify the complex structure of the fecal microbiome, taxonomic profiles have been grouped into distinct, reproducible microbial community clusters (often at the genus level) called ‘enterotypes,’ which are dominated by and typically named after specific taxa. Enterotypes were introduced in 2011 by Arumugam et al. based on 33 samples, using partitioning around medoids (PAM) clustering, and have been repeatedly confirmed in continuously growing datasets (Costea et al. 2018). The latest work (Keller et al. 2024) used 16,772 metagenomes for PAM clustering and introduced fuzzy k-means (FKM) clustering-based enterotyping, which accounts for the inherently continuous nature of the microbiome by allowing overlapping clusters. Fuzzy clustering reports a classification strength for each sample and each enterotype, reflecting the consistency of the enterotype classification across multiple clustering iterations. The Enterotype Dysbiosis Score (EDS) is calculated from this classification strength, with lower strength indicating higher dysbiosis.
We recommend the following papers for getting familiar with the enterotype concept:
- Arumugam, M., J. Raes, E. Pelletier, D. Le Paslier, T. Yamada, D. R. Mende, G. R. Fernandes, et al. 2011. “Enterotypes of the Human Gut Microbiome.” Nature 473 (7346): 174–80. https://doi.org/10.1038/nature09944.
- Costea, Paul I., Falk Hildebrand, Manimozhiyan Arumugam, Fredrik Bäckhed, Martin J. Blaser, Frederic D. Bushman, Willem M. de Vos, et al. 2018. “Enterotypes in the Landscape of Gut Microbial Community Composition.” Nature Microbiology 3 (1): 8–16. https://doi.org/10.1038/s41564-017-0072-8.
- Keller, M. I., et al. 2024. “Refined Enterotyping Reveals Dysbiosis in Global Fecal Metagenomes.” bioRxiv (preprint). https://doi.org/10.1101/2024.08.13.607711.
Other work explores alternative methods for identifying subclusters in fecal microbiome composition. These methods align well with the enterotype concept:
- Holmes, Ian, Keith Harris, and Christopher Quince. 2012. “Dirichlet Multinomial Mixtures: Generative Models for Microbial Metagenomics.” PLOS ONE 7 (2): e30126. https://doi.org/10.1371/journal.pone.0030126.
- Frioux, Clémence, Rebecca Ansorge, Ezgi Özkurt, Chabname Ghassemi Nedjad, Joachim Fritscher, Christopher Quince, Sebastian M. Waszak, and Falk Hildebrand. 2023. “Enterosignatures Define Common Bacterial Guilds in the Human Gut Microbiome.” Cell Host & Microbe 31 (7): 1111–1125.e6. https://doi.org/10.1016/j.chom.2023.05.024.
- Tap, Julien, Franck Lejzerowicz, Aurélie Cotillard, Matthieu Pichaud, Daniel McDonald, Se Jin Song, Rob Knight, Patrick Veiga, and Muriel Derrien. 2023. “Global Branches and Local States of the Human Gut Microbiome Define Associations with Environmental and Intrinsic Factors.” Nature Communications 14 (1): 3310. https://doi.org/10.1038/s41467-023-38558-7.