Enterotyping: R tutorials


Enterotypes are a way to stratify human individuals based on their gut microbiome (Arumugam, Raes et al. 2011). They are derived from the relative abundances of different microbial groups in an individual's gut microbiome (currently at the genus level; see the Wikipedia article on "Relative species abundance" for an explanation of the concept). Unlike blood types, enterotypes are not discrete types; they are densely populated regions in a high-dimensional space of microbiome features. We hypothesize that a large fraction of human individuals fall into these densely populated areas, with sparsely populated regions in between or around them. The reason for this distribution is not yet known, and not enough data have been published to quantify the robustness of the classification or possible switches between enterotypes over time. These pages will help you detect and characterise enterotypes in old and new datasets.
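
As a minimal illustration of relative abundance, the sketch below converts a small table of genus-level counts into per-sample proportions in R. The counts and sample names are invented for illustration and are not taken from any dataset discussed here; the genus names are only example labels.

```r
# Hypothetical genus-level counts: rows are genera, columns are samples.
# All values are made up for illustration.
counts <- matrix(
  c(50, 30, 20,    # sample_A
    10, 10, 80),   # sample_B
  nrow = 3,
  dimnames = list(
    c("Bacteroides", "Prevotella", "Ruminococcus"),
    c("sample_A", "sample_B")
  )
)

# Divide each column by its total so that every sample sums to 1.
rel_abund <- sweep(counts, 2, colSums(counts), "/")

print(rel_abund)
```

Each column of rel_abund then describes one individual as a vector of proportions, which is the kind of microbiome feature space referred to above.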

This website serves as a repository of tutorials for reproducing past and current work on enterotypes. The methods and code snippets provided here can also be applied to your own datasets, for example to check whether enterotypes exist in them. We are actively developing several methods for analyzing and stratifying human gut microbiome datasets, and we will update this website as these methods become stable and usable by the public.


1. Original publication

In April 2011, the MetaHIT consortium published the discovery of enterotypes in the human gut microbiome (Arumugam, Raes et al. 2011). The data associated with the study were made publicly available, and the theory behind the computational procedures was explained in the Supplementary Information of the article. However, the supplement did not report the exact set of R commands needed to reproduce all the figures in the article. Here we present a detailed tutorial that reproduces the work in the original article using R and the datasets used in that article.

Open the tutorial