Welcome to SqueezeMeta’s documentation!

SqueezeMeta is a fully automatic pipeline for metagenomics and metatranscriptomics that covers all steps of the analysis. It supports multiple metagenomes, allowing the co-assembly of related metagenomes and the retrieval of individual metagenome-assembled genomes (MAGs) through binning. Its main features are:

  1. Several assembly and co-assembly algorithms and strategies for short and long reads

  2. Several binning algorithms for the recovery of metagenome-assembled genomes (MAGs)

  3. Taxonomic annotation, functional annotation and quantification of genes, contigs, and bins

  4. Support for the annotation and quantification of pre-existing assemblies or collections of genomes

  5. Support for de novo metatranscriptome assembly and hybrid metagenomics/metatranscriptomics projects

  6. Support for the annotation of unassembled shotgun metagenomic reads

  7. An R package to easily explore your results, including bindings for microeco and phyloseq

Note

Check out the Use cases section for more information.

SqueezeMeta uses a combination of custom scripts and external software packages for the different steps of the analysis:

  1. Assembly

  2. RNA prediction and classification

  3. ORF (CDS) prediction

  4. Homology searching against taxonomic and functional databases

  5. HMMER searches against the Pfam database

  6. Taxonomic assignment of genes

  7. Functional assignment of genes (OPTIONAL)

  8. Blastx on parts of the contigs with no gene prediction or no hits

  9. Taxonomic assignment of contigs, and check for taxonomic disparities

  10. Coverage and abundance estimation for genes and contigs

  11. Estimation of taxa abundances

  12. Estimation of function abundances

  13. Merging of previous results to obtain the ORF table

  14. Binning with different methods

  15. Binning integration with DAS Tool

  16. Taxonomic assignment of bins, and check for taxonomic disparities

  17. Checking of bins with CheckM2 (and, optionally, their classification with GTDB-Tk)

  18. Merging of previous results to obtain the bin table

  19. Merging of previous results to obtain the contig table

  20. Prediction of KEGG and MetaCyc pathways for each bin

  21. Final statistics for the run

  22. Generation of tables with aggregated taxonomic and functional profiles
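Steps 10, 12 and 22 turn read mappings into abundance profiles. The sketch below is an illustration only. It computes TPM (transcripts per million, a standard length- and depth-normalized abundance measure) for a few ORFs, then sums those values by functional annotation to produce an aggregated profile. The record layout, ORF names and function labels are hypothetical. This is not SqueezeMeta's own code or output format, and the pipeline may compute and report abundances differently.

```python
from collections import defaultdict

# Hypothetical per-ORF records: (orf_id, length_bp, mapped_reads, function).
# Names and values are illustrative only, not SqueezeMeta's file format.
orfs = [
    ("orf1", 900, 300, "funcA"),
    ("orf2", 1500, 150, "funcB"),
    ("orf3", 600, 200, "funcA"),
]


def tpm(records):
    """Return {orf_id: TPM} using the standard transcripts-per-million definition."""
    # Reads per kilobase: normalize each ORF's read count by its length.
    rpk = {orf_id: reads / (length / 1000) for orf_id, length, reads, _ in records}
    # Scale so that the TPM values of all ORFs sum to one million.
    scale = sum(rpk.values()) / 1e6
    return {orf_id: value / scale for orf_id, value in rpk.items()}


def function_profile(records):
    """Sum per-ORF TPM values across ORFs sharing the same functional annotation."""
    per_orf = tpm(records)
    profile = defaultdict(float)
    for orf_id, _, _, function in records:
        profile[function] += per_orf[orf_id]
    return dict(profile)


print(function_profile(orfs))
```

Normalizing by length before scaling keeps long ORFs from dominating the profile simply because they attract more reads. Summing per function mirrors the general idea of an aggregated functional table.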

Detailed information about the different steps of the pipeline can be found in the Scripts, output files and file format section.

Contents