Installation and testing

SqueezeMeta is intended to run on x86-64 Linux (tested on Ubuntu and CentOS). The easiest way to install it is with conda. The default conda solver can be slow to resolve the dependencies, however, so it is better to first set up the libmamba solver with

conda update -n base conda # if your conda version is below 22.11
conda install -n base conda-libmamba-solver
conda config --set solver libmamba

and then use conda to install SqueezeMeta

conda create -n SqueezeMeta -c conda-forge -c bioconda -c fpusan squeezemeta --no-channel-priority --override-channels

If the environment does not solve and you get a message saying that __cuda is missing in your system, try adding CONDA_OVERRIDE_CUDA=12.4 before the installation command:

CONDA_OVERRIDE_CUDA=12.4 conda create -n SqueezeMeta -c conda-forge -c bioconda -c fpusan squeezemeta=1.7 --no-channel-priority --override-channels

If you change squeezemeta to squeezemeta-dev you will instead get the latest development version. This will contain additional bugfixes and features, but potentially also new bugs, as it will not have been tested as thoroughly as the stable version.

The commands above will create a new conda environment named SqueezeMeta, which must then be activated.

conda activate SqueezeMeta

When using conda, all the scripts from the SqueezeMeta distribution will be available on $PATH.
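As a quick sanity check after activation, the sketch below tests whether the expected environment is active and whether a given script can be found through $PATH. The helper names in_env and on_path are hypothetical and not part of SqueezeMeta; CONDA_DEFAULT_ENV is the variable that conda sets when an environment is activated.

```shell
# Hypothetical helpers (not shipped with SqueezeMeta).

# Succeeds if the currently active conda environment is named $1.
in_env() {
    [ "${CONDA_DEFAULT_ENV:-}" = "$1" ]
}

# Succeeds if the command named $1 can be found through $PATH.
on_path() {
    command -v "$1" >/dev/null 2>&1
}

# Example (run after `conda activate SqueezeMeta`):
#   in_env SqueezeMeta && on_path test_install.pl && echo "environment looks ready"
```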

Alternatively, you can download the latest release from the GitHub repository and uncompress the tarball in a suitable directory. The tarball includes the SqueezeMeta scripts as well as the third-party software redistributed with SqueezeMeta (see Vendored tools). Note that you may need to provide additional dependencies, and potentially recompile some binaries from source, for the manual install to work. The conda method is now the recommended way to install SqueezeMeta, and we will not prioritize support for issues regarding manual installation.

The test_install.pl script can be run in order to check whether the required dependencies are available in your environment.

/path/to/SqueezeMeta/utils/install_utils/test_install.pl

Downloading or building databases

SqueezeMeta uses several databases: GenBank nr for taxonomic assignment, and eggNOG, KEGG and Pfam for functional assignment. The script download_databases.pl can be run to download a pre-formatted version of all the databases required by SqueezeMeta.

/path/to/SqueezeMeta/utils/install_utils/download_databases.pl /download/path

Here, /download/path is the destination folder. This is the recommended option, but the files are hosted on our institutional server, which can at times be unreachable.

Alternatively, the script make_databases.pl can be run to download from source and format the latest version of the databases.

/path/to/SqueezeMeta/utils/install_utils/make_databases.pl /download/path

Generally, download_databases.pl is the safest choice for getting your databases set up. When running make_databases.pl, data download (e.g. from the NCBI server) can be interrupted, leading to a corrupted database. Always run test_install.pl to check that the databases were properly created. If the check fails, you can try re-running make_databases.pl, or just run download_databases.pl instead.
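The re-run advice above can be wrapped in a small retry helper. This is only a sketch: the function name retry is hypothetical, and it assumes that the wrapped command returns a non-zero exit status when it fails, which this document does not state for make_databases.pl. An interrupted download may not be reflected in the exit status, so test_install.pl should still be run afterwards.

```shell
# Hypothetical helper: run a command up to N times, stopping at the first success.
# Usage: retry N command [args...]
retry() {
    attempts=$1
    shift
    n=1
    while [ "$n" -le "$attempts" ]; do
        # Stop as soon as the command reports success.
        "$@" && return 0
        echo "attempt $n of $attempts failed: $*" >&2
        n=$((n + 1))
    done
    return 1
}

# Example (assumes the scripts signal failure through their exit status):
#   retry 2 /path/to/SqueezeMeta/utils/install_utils/make_databases.pl /download/path \
#       || /path/to/SqueezeMeta/utils/install_utils/download_databases.pl /download/path
```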

The databases occupy 470 GB, but we recommend having at least 700 GB of free disk space during the building process.
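Before starting a download or build, the free space at the destination can be checked with a short helper such as the one below. It is a sketch with a hypothetical function name (has_free_space); it relies on the POSIX output format of df and treats 1 GB as 1024 x 1024 KiB blocks.

```shell
# Hypothetical helper: succeeds if the filesystem holding DIR has at least
# MIN_GB gigabytes available.
# Usage: has_free_space DIR MIN_GB
has_free_space() {
    dir=$1
    min_gb=$2
    # df -Pk prints one line per filesystem in 1024-byte blocks;
    # the fourth column of the second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge $((min_gb * 1024 * 1024)) ]
}

# Example, using the 700 GB recommendation from the text:
#   has_free_space /download/path 700 || echo "not enough free space" >&2
```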

Two directories will be generated after running either make_databases.pl or download_databases.pl.

/download/path/db, which contains the actual databases.
/download/path/test, which contains data for a test run of SqueezeMeta.
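A minimal sketch for confirming that both directories exist after the download or build finishes. The function name check_db_layout is hypothetical, and the check only verifies that the directories are present, not that the databases inside them are intact (test_install.pl remains the proper check for that).

```shell
# Hypothetical helper: succeeds if BASE contains both the db and test directories.
# Usage: check_db_layout BASE
check_db_layout() {
    base=$1
    for sub in db test; do
        if [ ! -d "$base/$sub" ]; then
            echo "missing directory: $base/$sub" >&2
            return 1
        fi
    done
}

# Example:
#   check_db_layout /download/path && echo "db and test directories found"
```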

If the SqueezeMeta databases are already built in another location in the system, a different copy of SqueezeMeta can be configured to use them with

/path/to/SqueezeMeta/utils/install_utils/configure_nodb.pl /path/to/db

Here, /path/to/db is the path to the db folder that was generated by either make_databases.pl or download_databases.pl.

After configuring the databases, test_install.pl can be run to check that SqueezeMeta is ready to work (see previous section).

Updating SqueezeMeta

Assuming your databases are not inside the SqueezeMeta directory, just remove that directory, download the new version and configure it with

/path/to/SqueezeMeta/utils/install_utils/configure_nodb.pl /path/to/db

Vendored tools

This is a list of all the tools redistributed with SqueezeMeta, and a brief description of the custom modifications (if any) that were applied to each tool.

We vendor third-party software for two reasons:

The pipeline is complex and we want to minimize the burden on our users. Initially, we aimed for SqueezeMeta to depend only on libraries that can be installed via standard packaging tools (apt, yum, etc.). Now we are trying to simplify even more, by using conda to meet all dependencies.
Some tools require modifications (e.g. parametrized rather than hardcoded database locations) to work well within our pipeline.

Over time, some of the vendored tools have been replaced by conda packages. This was a natural transition to make, as most of our users were using conda for installing SqueezeMeta, and some vendored binaries had trouble running in different Linux distributions/versions. However, we still redistribute all the tools listed below, even if some of them are no longer used by default.

The External software section of the SqueezeMeta/scripts/SqueezeMeta_conf.pl file controls all the software that is called by the pipeline. The executable called for each program is stored in a different variable. If no path to the executable is listed there, the executable will be assumed to be present in $PATH (e.g. because it is provided by a conda environment). For example:

$spades_soft = "$installpath/bin/SPAdes/spades.py"; will take the spades.py executable that we vendor with SqueezeMeta
$spades_soft = "spades.py"; will take whatever spades.py executable is available in $PATH

Note that some of these tools require additional software and libraries to be available via $PATH and $LD_LIBRARY_PATH. This is also indicated in the SqueezeMeta_conf.pl file. Normally this will not be relevant when using versions from conda, since in that case all the dependencies should be in place when activating the environment.

So, in order to control which software is called by SqueezeMeta, modify the External software section of the SqueezeMeta/scripts/SqueezeMeta_conf.pl file.
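To see at a glance how each tool is currently configured, the configuration file can be scanned from the shell. The sketch below uses a hypothetical function name (classify_soft_vars) and assumes that the relevant variables follow the same `$name_soft = "...";` pattern as the `$spades_soft` example above; other naming patterns in the file would be missed. A value containing a slash is reported as an explicit path, anything else as a $PATH lookup.

```shell
# Hypothetical helper: list the *_soft assignments in a SqueezeMeta_conf.pl-style
# file and tag each one as an explicit path or a $PATH lookup.
# Usage: classify_soft_vars FILE
classify_soft_vars() {
    # Assumption: assignments start the line and look like  $name_soft = "...";
    grep -E '^\$[A-Za-z0-9_]+_soft[[:space:]]*=' "$1" | while IFS= read -r line; do
        case $line in
            # A slash after the "=" means a path to the executable is given.
            *=*/*) printf 'explicit-path\t%s\n' "$line" ;;
            # No slash: the executable is expected to be found in $PATH.
            *)     printf 'path-lookup\t%s\n' "$line" ;;
        esac
    done
}

# Example:
#   classify_soft_vars /path/to/SqueezeMeta/scripts/SqueezeMeta_conf.pl
```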

SqueezeMeta redistributes the following third-party software. Note that, for compatibility reasons, we now use conda to provide some of these. A given tool should be replaceable by its original version if it has no custom patch listed, or has ONLY the “Work within the SQM directory structure” patch listed:

trimmomatic
MEGAHIT
SPAdes
- Work within the SQM directory structure
Canu
prinseq
kmer-db
CD-HIT
- Recompile with MAX_SEQ=20000000
amos
- Work within the SQM directory structure
- Add multithreading in nucmer calls (minimus2)
- Add a custom minimus2 script for the SQM-seqmerge mode
mummer
hmmer
barrnap
- Work within the SQM directory structure
- Add -dbdir as an additional command line argument
aragorn
prodigal
DIAMOND
bwa
minimap2
bowtie2
MaxBin
- Work within the SQM directory structure
- Add -markerpath as an additional command line argument
MetaBAT
CONCOCT
- Fix an error in transform.py with newer versions of scikit-learn
DAS Tool (https://github.com/cmks/DAS_Tool)
- Add extra logging, remove some superfluous error messages
- Explicitly load library(methods) in DAS_Tool.R since Rscript does not load it on startup (even if R console does)
checkm
- Work within the SQM directory structure
- Port to python3
checkm2
- Work within the SQM directory structure
- Work with newer versions of pandas, scikit-learn
comparem
- Work within the SQM directory structure
- Port to python3
MinPath
- Work within the SQM directory structure
- Port to python3
RDP classifier
pullseq
Short-Pair
- Work within the SQM directory structure
- Port to python3
SAMtools
Mothur
Flye
POGENOM
- Only includes the pogenom.pl script, without modifications