
Reference-based Enterotype Classification Tool

Enterotypes describe clustering patterns across samples from the human intestinal microbiome that are associated with disease, medication, diet, and lifestyle. The Enterotyper is a computational tool that lets the scientific community classify human fecal metagenomes into established enterotypes. It uses an XGBoost machine learning model trained on a large global dataset of fecal metagenomes to classify each sample.

Step 1: Upload a taxonomic profile.
Step 2: Choose the enterotype model.
Step 3: Choose the number of enterotypes.

Please cite when you use Enterotyper: Keller et al.


Key Features:

  • Enterotype Classification: XGBoost and LASSO regression models, trained with a 10-times repeated 10-fold cross-validation procedure and tested on an independent validation dataset, for accurate classification of microbiome samples into enterotypes.
  • Different Enterotype Models: By default, Enterotyper uses the 3-enterotype model derived from FKM (fuzzy k-means) clustering. It also offers a 2-enterotype model as well as 2-, 3-, and 4-enterotype models derived from PAM (partitioning around medoids) clustering.
  • Enterotype Dysbiosis Score (EDS): Quantifies the dysbiosis state of microbiome samples within the enterotype landscape using a novel scoring system to facilitate health-related microbial studies. The EDS is calculated only when using the default 3-enterotype FKM model. To compute it, the maximum enterotype strength of a sample is inverted, z-score normalized using the center and standard deviation of the global enterotype training data as a reference, and scaled to values between 0 and 1. A higher EDS indicates a more dysbiotic microbial community in the sample.
  • Comprehensive Dataset: Built on a large-scale global dataset of 16,772 fecal metagenomic samples from 129 studies. Take a closer look at the global training dataset on this world map.
  • Compatible with various taxonomic classifiers: Enterotyper accepts taxonomic profiles from a variety of taxonomic classifiers. See the Input section under Usage below for more information.
  • Data Visualization: Evaluate enterotype classifications and EDS with built-in visualization output for a quick first assessment.
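
The EDS steps described above (invert, z-score normalize against the training data, scale to between 0 and 1) can be illustrated with a minimal Python sketch. This is not the Enterotyper implementation: the function name, the reference values, and in particular the min-max scaling with clipping are assumptions made here for illustration, since the exact scaling is not specified on this page.

```python
def enterotype_dysbiosis_score(strengths, ref_center, ref_sd, z_low, z_high):
    """Sketch of an EDS-like score for one sample (illustrative only).

    strengths:  predicted strength per enterotype (three values for the
                default 3-enterotype FKM model).
    ref_center: center of the maximum strength in the reference data.
    ref_sd:     standard deviation of the maximum strength in the reference.
    z_low, z_high: assumed bounds of the inverted z-scores in the reference
                data, used for min-max scaling (an assumption of this sketch).
    """
    if ref_sd <= 0 or z_high <= z_low:
        raise ValueError("ref_sd must be positive and z_high must exceed z_low")
    max_strength = max(strengths)
    # Z-score against the reference and invert the sign, so that a weak
    # enterotype signal yields a high value.
    inverted_z = -(max_strength - ref_center) / ref_sd
    # Scale to [0, 1]; values outside the assumed reference range are clipped.
    scaled = (inverted_z - z_low) / (z_high - z_low)
    return min(1.0, max(0.0, scaled))


# Hypothetical reference values, chosen only to make the example runnable.
REF = dict(ref_center=0.6, ref_sd=0.15, z_low=-2.0, z_high=2.0)

clear_sample = enterotype_dysbiosis_score([0.90, 0.05, 0.05], **REF)
mixed_sample = enterotype_dysbiosis_score([0.40, 0.30, 0.30], **REF)
print(clear_sample, mixed_sample)
```

With these made-up reference values, the sample without a dominant enterotype receives the higher score, matching the statement that a higher EDS indicates a more dysbiotic community.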

Usage

Input

Please upload a taxonomic profile generated by one of the following taxonomic profilers:

Alternatively, you can upload a GTDB taxonomic profile at genus level, with genera as row names and samples as columns. An example file is shown here.
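
As a rough illustration of the genus-by-sample layout, the Python sketch below reads such a table into memory. The tab delimiter, the leading header cell for the genus column, and the genus and sample names are assumptions for this example; check the example file for the exact format Enterotyper expects.

```python
import csv
import io


def read_genus_profile(handle):
    """Read a tab-separated genus-by-sample table into {sample: {genus: value}}.

    Assumes the first column holds genus names and the header row holds
    sample names after one leading label cell.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]  # skip the label cell above the genus column
    profile = {sample: {} for sample in samples}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        genus, values = row[0], row[1:]
        if len(values) != len(samples):
            raise ValueError(
                f"row for {genus!r} has {len(values)} values, "
                f"expected {len(samples)}"
            )
        for sample, value in zip(samples, values):
            profile[sample][genus] = float(value)
    return profile


# Hypothetical two-sample profile with relative abundances.
example = (
    "genus\tsample_A\tsample_B\n"
    "Bacteroides\t0.42\t0.10\n"
    "Prevotella\t0.03\t0.55\n"
)
profile = read_genus_profile(io.StringIO(example))
print(profile["sample_B"]["Prevotella"])
```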

Output

More information...

.. on the Enterotyper

The enterotype assignments provided by Enterotyper are independent of de novo clustering. This allows for robust identification across studies, including in datasets that are too small for de novo clustering.

We constructed XGBoost regression models to predict the strength of each enterotype determined by FKM clustering. The prediction models were trained on the GTDB genus-level taxonomic profiles of the dataset using the train function of the caret package in R with a 10-times repeated 10-fold cross-validation procedure. Before model construction, minor genera with an average relative abundance of <1E-4 were excluded, so only genera at or above this threshold were used for training. To avoid overfitting due to multiple samples derived from the same individual, we ensured that these samples were included exclusively in either the training or the evaluation data in each cross-validation fold. Model accuracy was evaluated by applying the prediction models to a held-out validation dataset of 347 samples from three studies. Separate models were constructed for each enterotype (i.e., two models for the two-enterotype clustering and three models for the three-enterotype clustering), and the highest strength obtained from these models was used for enterotype classification and the EDS of each sample.
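
Two of the preprocessing steps above, the abundance filter and the individual-aware fold assignment, can be sketched in plain Python. The models themselves were trained with caret in R; the code below is only an illustration of the logic, and the function names, the round-robin fold assignment, and the toy data are assumptions of this sketch.

```python
import random
from collections import defaultdict


def filter_abundant_genera(profile, threshold=1e-4):
    """Return genera whose mean relative abundance across samples is >= threshold.

    profile maps sample -> {genus: relative abundance}; missing genera count as 0.
    """
    samples = list(profile)
    genera = {genus for sample in samples for genus in profile[sample]}
    keep = set()
    for genus in genera:
        mean = sum(profile[s].get(genus, 0.0) for s in samples) / len(samples)
        if mean >= threshold:
            keep.add(genus)
    return keep


def grouped_folds(sample_to_individual, n_folds=10, seed=0):
    """Split samples into folds so that all samples of one individual share a fold."""
    by_individual = defaultdict(list)
    for sample, individual in sample_to_individual.items():
        by_individual[individual].append(sample)
    individuals = sorted(by_individual)
    random.Random(seed).shuffle(individuals)  # reproducible shuffle
    folds = [[] for _ in range(n_folds)]
    # Deal individuals (not samples) out to folds in round-robin order.
    for i, individual in enumerate(individuals):
        folds[i % n_folds].extend(by_individual[individual])
    return folds


# Toy data: G2 falls below the 1E-4 mean abundance threshold.
toy_profile = {
    "s1": {"G1": 0.5, "G2": 0.00001},
    "s2": {"G1": 0.3, "G2": 0.00003},
}
kept = filter_abundant_genera(toy_profile)

# Toy mapping: individuals p1 and p3 each contributed two samples.
toy_mapping = {"s1": "p1", "s2": "p1", "s3": "p2", "s4": "p3", "s5": "p3", "s6": "p4"}
folds = grouped_folds(toy_mapping, n_folds=3)
print(kept, folds)
```

Holding out one fold at a time then guarantees that no individual appears in both the training and the evaluation data of the same split.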

Binary LASSO classification models were also constructed to predict enterotypes determined by PAM clustering. These models were trained using the same procedure as described above. The highest score obtained from the LASSO classification models was used for enterotype assignment.
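
The final assignment step, taking the highest score across the per-enterotype models, amounts to an argmax. A minimal Python sketch follows; the labels ET1 to ET3 and the score values are hypothetical, and ties are resolved here by keeping the first label encountered, which is an assumption of this sketch.

```python
def assign_enterotype(scores):
    """Return the enterotype label whose model produced the highest score.

    scores maps an enterotype label to the output of its binary model.
    """
    if not scores:
        raise ValueError("at least one model score is required")
    return max(scores, key=scores.get)


# Hypothetical scores from three per-enterotype models for one sample.
sample_scores = {"ET1": 0.20, "ET2": 0.70, "ET3": 0.10}
assigned = assign_enterotype(sample_scores)
print(assigned)
```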

.. on Enterotypes

You can find more information on the enterotype models and the enterotype dysbiosis score in the paper Keller et al. (add link).

In addition, we recommend the following papers to get familiar with the enterotype concept:

and with alternative methods and concepts: