Welcome to SqueezeMeta’s documentation!
SqueezeMeta is a fully automatic pipeline for metagenomics/metatranscriptomics, covering all steps of the analysis. It includes multi-metagenome support, allowing the co-assembly of related metagenomes and the retrieval of individual metagenome-assembled genomes (MAGs) via binning procedures. Its main features are:
Several assembly and co-assembly algorithms and strategies for short and long reads
Several binning algorithms for the recovery of metagenome-assembled genomes (MAGs)
Taxonomic annotation, functional annotation and quantification of genes, contigs, and bins
Support for the annotation and quantification of pre-existing assemblies or collections of genomes
Support for de novo metatranscriptome assembly and hybrid metagenomics/metatranscriptomics projects
Support for the annotation of unassembled shotgun metagenomic reads
An R package to easily explore your results, including bindings for microeco and phyloseq
Note
Check out the Use cases section for more information.
SqueezeMeta uses a combination of custom scripts and external software packages for the different steps of the analysis:
Assembly
RNA prediction and classification
ORF (CDS) prediction
Homology searching against taxonomic and functional databases
HMMER search against the Pfam database
Taxonomic assignment of genes
Functional assignment of genes (OPTIONAL)
Blastx on parts of the contigs with no gene prediction or no hits
Taxonomic assignment of contigs, and check for taxonomic disparities
Coverage and abundance estimation for genes and contigs
Estimation of taxa abundances
Estimation of function abundances
Merging of previous results to obtain the ORF table
Binning with different methods
Binning integration with DAS Tool
Taxonomic assignment of bins, and check for taxonomic disparities
Checking of bins with CheckM2 (and, optionally, classification with GTDB-Tk)
Merging of previous results to obtain the bin table
Merging of previous results to obtain the contig table
Prediction of KEGG and MetaCyc pathways for each bin
Final statistics for the run
Generation of tables with aggregated taxonomic and functional profiles
Detailed information about the different steps of the pipeline can be found in the Scripts, output files and file format section.
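The ordered list of steps above can be treated as a simple numbered sequence, which is handy when reasoning about which steps remain after a given point. The following is a minimal illustrative sketch, not part of SqueezeMeta itself: the `Step` class, the `PIPELINE` list and the `steps_from` helper are hypothetical names, and the numbering is assumed to follow the order of the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One pipeline step (hypothetical structure, for illustration only)."""
    number: int
    name: str
    optional: bool = False


# Step names taken from the list above, in the same order.
STEP_NAMES = [
    "Assembly",
    "RNA prediction and classification",
    "ORF (CDS) prediction",
    "Homology searching against taxonomic and functional databases",
    "HMMER search against the Pfam database",
    "Taxonomic assignment of genes",
    "Functional assignment of genes",
    "Blastx on parts of the contigs with no gene prediction or no hits",
    "Taxonomic assignment of contigs, and check for taxonomic disparities",
    "Coverage and abundance estimation for genes and contigs",
    "Estimation of taxa abundances",
    "Estimation of function abundances",
    "Merging of previous results to obtain the ORF table",
    "Binning with different methods",
    "Binning integration with DAS Tool",
    "Taxonomic assignment of bins, and check for taxonomic disparities",
    "Checking of bins with CheckM2 (and optionally GTDB-Tk)",
    "Merging of previous results to obtain the bin table",
    "Merging of previous results to obtain the contig table",
    "Prediction of KEGG and MetaCyc pathways for each bin",
    "Final statistics for the run",
    "Generation of tables with aggregated taxonomic and functional profiles",
]

# Assumption: only the seventh entry is flagged, because it is the one
# marked OPTIONAL in the list above.
PIPELINE = [
    Step(number, name, optional=(number == 7))
    for number, name in enumerate(STEP_NAMES, start=1)
]


def steps_from(start: int) -> list[Step]:
    """Return the steps numbered `start` or later, in pipeline order."""
    if not 1 <= start <= len(PIPELINE):
        raise ValueError(f"start must be between 1 and {len(PIPELINE)}")
    return [step for step in PIPELINE if step.number >= start]


if __name__ == "__main__":
    # List everything from the binning step onwards.
    for step in steps_from(14):
        suffix = " (optional)" if step.optional else ""
        print(f"Step {step.number}: {step.name}{suffix}")
```

Calling `steps_from(14)` returns the binning step and everything after it, which mirrors how the later, bin-related steps build on the results of the earlier ones.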
Contents
- Use cases
  - Choosing an assembly strategy
  - Analyzing metatranscriptomes
  - Combined analysis of metagenomes and metatranscriptomes
  - Alternative analysis modes
  - Working with Oxford Nanopore MinION and PacBio reads
  - Working in a low-memory environment
  - Tips for working in a computing cluster
  - Downstream analysis of SqueezeMeta results
  - Analyzing SqueezeMeta results on your desktop computer
- Installation and testing
- Execution, restart and running scripts
- Advanced annotation
- Scripts, output files and file format
  - Step 1: Assembly
  - Step 2: RNA finding
  - Step 3: Gene prediction
  - Step 4: Homology searching against taxonomic (nr) and functional (COG, KEGG) databases
  - Step 5: HMM search for Pfam database
  - Step 6: Taxonomic assignment
  - Step 7: Functional assignment
  - Step 8: Blastx on parts of the contigs without gene prediction or without hits
  - Step 9: Taxonomic assignment of contigs
  - Step 10: Mapping of reads to contigs and calculation of abundance measures
  - Step 11: Calculation of the abundance of all taxa
  - Step 12: Calculation of the abundance of all functions
  - Step 13: Creation of the ORF table
  - Step 14: Binning
  - Step 15: Merging bins with DAS Tool
  - Step 16: Taxonomic assignment of bins
  - Step 17: Running CheckM2 and optionally GTDB-Tk on bins
  - Step 18: Creation of the bin table
  - Step 19: Creation of the contig table
  - Step 20: Prediction of pathway presence in bins using MinPath
  - Step 21: Final statistics for the run
  - Step 22: Calculation of summary tables for the project
- Alternative analysis modes
- The SQMtools R package
- Adding new binners and assemblers
- Utility scripts
- Explanation of SqueezeMeta algorithms