Reference-based Enterotype Classification Tool
Enterotypes describe clustering patterns across samples from the human intestinal microbiome that are associated with disease, medication, diet, and lifestyle. The Enterotyper is a computational tool that lets the scientific community classify human fecal metagenomes into established enterotypes. It uses XGBoost machine learning models trained on a large global dataset of fecal metagenomes.
Please cite when you use Enterotyper: Keller et al.
Please upload a taxonomic profile generated by one of the following taxonomic profilers:
Alternatively, you can upload a GTDB taxonomic profile at genus level with genera as row names and sample names as column names. An example file is shown here.
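As an illustration of the genus-level input layout described above (genera as rows, samples as columns), the following minimal Python sketch parses such a table. It is not part of Enterotyper. The tab-separated layout, the leading header cell for the genus column, the function name read_genus_profile, and the example genera and sample names are all assumptions made for this example.

```python
import csv
import io


def read_genus_profile(handle):
    """Parse a tab-separated genus-by-sample table.

    Assumes the first row holds sample names (after one leading label cell)
    and every following row holds a genus name followed by one abundance
    value per sample. Returns {sample: {genus: abundance}}.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]  # first header cell labels the genus column
    profile = {sample: {} for sample in samples}
    for row in reader:
        if not row:
            continue  # skip blank lines
        genus, values = row[0], row[1:]
        if len(values) != len(samples):
            raise ValueError(
                f"row for {genus!r} has {len(values)} values, "
                f"expected {len(samples)}"
            )
        for sample, value in zip(samples, values):
            profile[sample][genus] = float(value)
    return profile


# Hypothetical two-sample example; names and values are made up.
example = io.StringIO(
    "genus\tsample_A\tsample_B\n"
    "g__Bacteroides\t0.42\t0.10\n"
    "g__Prevotella\t0.05\t0.55\n"
)
profile = read_genus_profile(example)
```

A table written from R with row names may omit the leading header cell; in that case the header handling would need to be adapted.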
EnterotypeAssignment_EnterotypeModel.tsv
contains the enterotype classification. For FKM clustering, the classification strength of each enterotype is also reported for each sample. The enterotype with the highest classification strength determines the hard classification. In addition, the FKM 3-enterotype model reports the Enterotype Dysbiosis Score (EDS), which is the maximum classification strength after inversion, z-score normalisation, and scaling to the range 0 to 1.
GTDB-genus-level-taxonomy.tsv
contains the GTDB genus-level taxonomic profiles of the uploaded data, which are used as the basis for the Enterotyper.
The enterotype assignments provided by Enterotyper are independent of de novo clustering. This allows for robust identification across studies and also in data sets that are too small for de novo clustering.
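The hard classification and the Enterotype Dysbiosis Score described for the output file can be illustrated with a small Python sketch. It is not the Enterotyper implementation. It assumes that "inverted" means taking the negative of the maximum strength and that the z-score and scaling parameters are computed from the samples passed in, whereas the tool may use parameters from its reference dataset. The enterotype labels ET_1 to ET_3, the sample names, and the strength values are hypothetical.

```python
from statistics import mean, pstdev


def hard_classification(strengths):
    """Return the enterotype label with the highest classification strength."""
    return max(strengths, key=strengths.get)


def dysbiosis_scores(strengths_by_sample):
    """Sketch of an EDS-like score: invert, z-score, then scale to [0, 1]."""
    # Step 1: take the maximum strength per sample and invert it (assumed: negate).
    inverted = {s: -max(v.values()) for s, v in strengths_by_sample.items()}
    # Step 2: z-score normalise across the given samples (assumption, see lead-in).
    mu = mean(inverted.values())
    sd = pstdev(inverted.values())
    if sd == 0:
        raise ValueError("all samples have the same maximum strength")
    z = {s: (x - mu) / sd for s, x in inverted.items()}
    # Step 3: min-max scale so the scores span 0 to 1.
    lo, hi = min(z.values()), max(z.values())
    return {s: (x - lo) / (hi - lo) for s, x in z.items()}


# Hypothetical classification strengths for three samples.
strengths = {
    "s1": {"ET_1": 0.90, "ET_2": 0.05, "ET_3": 0.05},
    "s2": {"ET_1": 0.20, "ET_2": 0.70, "ET_3": 0.10},
    "s3": {"ET_1": 0.40, "ET_2": 0.35, "ET_3": 0.25},
}
assignments = {s: hard_classification(v) for s, v in strengths.items()}
eds = dysbiosis_scores(strengths)
```

Under these assumptions, a sample whose best enterotype is only weakly supported (s3) receives a high score, and a confidently classified sample (s1) receives a low one.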
We constructed XGBoost regression models to predict the strength of each enterotype determined by FKM clustering. The prediction models were trained on the GTDB genus-level taxonomic profiles of the dataset using the train function of the caret package in R with 10-fold cross-validation repeated 10 times. Before model construction, minor genera with an average relative abundance of <1E-4 were excluded, and only genera at or above this threshold were used for training. To avoid overfitting caused by multiple samples from the same individual, we ensured that all samples from one individual were placed exclusively in either the training or the evaluation data within each cross-validation fold. Model accuracy was evaluated by applying the prediction models to a held-out validation dataset of 347 samples from three studies. Separate models were constructed for each enterotype (i.e., two models for the two-enterotype clustering and three models for the three-enterotype clustering), and the highest strength obtained from these models was used for the enterotype classification and the Enterotype Dysbiosis Score (EDS) of each sample.
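Two of the preprocessing steps described above, excluding minor genera and keeping all samples of one individual in the same cross-validation fold, can be sketched in plain Python. This is only an illustration under stated assumptions and not the authors' R/caret code. The function names, the data layout ({sample: {genus: abundance}}), the round-robin fold assignment, and all example values are hypothetical.

```python
import random
from collections import defaultdict


def filter_minor_genera(profile, threshold=1e-4):
    """Drop genera whose mean relative abundance across samples is below threshold."""
    samples = list(profile)
    genera = {g for s in samples for g in profile[s]}
    kept = {
        g for g in genera
        if sum(profile[s].get(g, 0.0) for s in samples) / len(samples) >= threshold
    }
    return {s: {g: a for g, a in profile[s].items() if g in kept} for s in samples}


def grouped_folds(sample_to_individual, n_folds, seed=0):
    """Assign samples to folds so that one individual never spans two folds."""
    individuals = sorted(set(sample_to_individual.values()))
    random.Random(seed).shuffle(individuals)
    # Round-robin assignment of shuffled individuals to folds.
    fold_of = {ind: i % n_folds for i, ind in enumerate(individuals)}
    folds = defaultdict(list)
    for sample, ind in sample_to_individual.items():
        folds[fold_of[ind]].append(sample)
    return [folds[i] for i in range(n_folds)]


# Hypothetical data: g__Rare has a mean abundance of 1.5e-5 and is removed.
profile = {
    "s1": {"g__Bacteroides": 0.6, "g__Rare": 0.00001},
    "s2": {"g__Bacteroides": 0.3, "g__Rare": 0.00002},
}
filtered = filter_minor_genera(profile)

# Hypothetical sample-to-individual mapping with repeated sampling of p1 and p2.
sample_to_individual = {
    "s1": "p1", "s2": "p1", "s3": "p2", "s4": "p2", "s5": "p3", "s6": "p4",
}
folds = grouped_folds(sample_to_individual, n_folds=2)
```

Repeating the cross-validation, as in the procedure described above, would correspond to calling grouped_folds with a different seed for each repeat.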
Additionally, binary LASSO classification models were constructed to predict the enterotypes determined by PAM clustering. These models were trained following the same procedure as described above. The highest score obtained from the LASSO classification models was used for enterotype assignment.
You can find more information on the enterotype models and the Enterotype Dysbiosis Score in the paper by Keller et al.
In addition, we can recommend the following papers to get familiar with the Enterotype concept:
and with alternative methods and concepts: