journal article Open Access May 02, 2023

The best practice for microbiome analysis using R

DOI: 10.1093/procel/pwad024
Abstract
With the gradual maturation of sequencing technology, many microbiome studies have been published, driving the emergence and advancement of related analysis tools. The R language is a widely used platform for microbiome data analysis because of its powerful functions. However, tens of thousands of R packages and numerous similar analysis tools make it challenging for many researchers to explore microbiome data, and choosing suitable, efficient, convenient, and easy-to-learn tools from among them has become a problem. We have organized 324 common R packages for microbiome analysis and classified them by application category (diversity, difference, biomarker, correlation and network, functional prediction, and others), which can help researchers quickly find relevant packages. Furthermore, we systematically sorted the integrated R packages (phyloseq, microbiome, MicrobiomeAnalystR, Animalcules, microeco, and amplicon) for microbiome analysis and summarized their advantages and limitations, which will help researchers choose appropriate tools. Finally, we thoroughly reviewed the R packages for microbiome analysis, summarized most of the common analysis content in microbiome research, and formed the most suitable pipeline for microbiome analysis. This paper is accompanied by hundreds of examples comprising 10,000 lines of code on GitHub, which can help beginners learn and help analysts compare and test different tools. This paper systematically sorts the application of R in microbiome research, providing an important theoretical basis and practical reference for the development of better microbiome tools in the future. All the code is available on GitHub at github.com/taowenmicro/EasyMicrobiomeR.

Metrics
Citations: 119
References: 71
Details
Published: May 02, 2023
Vol/Issue: 14(10)
Pages: 713-725
Funding
Natural Science Foundation of China Award: 42277297
Jiangsu Funding Program for Excellent Postdoctoral Talent Award: 2022ZB325
China Academy of Chinese Medical Sciences Award: C12021A04115
Fundamental Research Funds
Agricultural Science and Technology Innovation Program Award: CAAS-ZDRW202308
Central Public Welfare Research Institutes Award: ZZ13-YQ-095
Scientific and Technology Innovation Project
Cite This Article
Tao Wen, Guoqing Niu, Qirong Shen, et al. (2023). The best practice for microbiome analysis using R. Protein & Cell, 14(10), 713-725. https://doi.org/10.1093/procel/pwad024