*******
loadSQM
*******

.. container::

   ======= ===============
   loadSQM R Documentation
   ======= ===============

.. rubric:: Load a SqueezeMeta project into R
   :name: loadSQM

.. rubric:: Description
   :name: description

This function takes the path to a project directory generated by
`SqueezeMeta `__ (whose name is specified in the ``-p`` parameter of the
SqueezeMeta.pl script) and parses the results into a SQM object.
Alternatively, it can load the project data from a zip file produced by
``sqm2zip.py``.

.. rubric:: Usage
   :name: usage

.. code:: R

   loadSQM(
     project_path,
     tax_mode = "prokfilter",
     trusted_functions_only = FALSE,
     single_copy_genes = "MGOGs",
     load_sequences = TRUE,
     engine = "data.table"
   )

.. rubric:: Arguments
   :name: arguments

+----------------------------+----------------------------------------+
| ``project_path``           | character, a vector of project         |
|                            | directories generated by SqueezeMeta,  |
|                            | and/or zip files generated by          |
|                            | ``sqm2zip.py``.                        |
+----------------------------+----------------------------------------+
| ``tax_mode``               | character, which taxonomic             |
|                            | classification should be loaded?       |
|                            | SqueezeMeta applies the identity       |
|                            | thresholds described in `Luo et al.,   |
|                            | 2014 `__.                              |
|                            | Use ``allfilter`` for applying the     |
|                            | minimum identity threshold to all      |
|                            | taxa, ``prokfilter`` for applying the  |
|                            | threshold to Bacteria and Archaea, but |
|                            | not to Eukaryotes, and ``nofilter``    |
|                            | for applying no thresholds at all      |
|                            | (default ``prokfilter``).              |
+----------------------------+----------------------------------------+
| ``trusted_functions_only`` | logical. If ``TRUE``, only highly      |
|                            | trusted functional annotations (best   |
|                            | hit + best average) will be considered |
|                            | when generating aggregated function    |
|                            | tables. If ``FALSE``, best hit         |
|                            | annotations will be used (default      |
|                            | ``FALSE``). Will only have an effect   |
|                            | if ``project_path`` is not a zip file, |
|                            | and ``project_path/results/tables`` is |
|                            | not already present.                   |
+----------------------------+----------------------------------------+
| ``single_copy_genes``      | character, source of single copy genes |
|                            | for copy number normalization, either  |
|                            | ``RecA`` (COG0468, RecA/RadA),         |
|                            | ``MGOGs`` (COGs for 10 single copy and |
|                            | housekeeping genes, Salazar, G *et     |
|                            | al.*, 2019), ``MGKOs`` (KOs for 10     |
|                            | single copy and housekeeping genes,    |
|                            | Salazar, G *et al.*, 2019) or          |
|                            | ``USiCGs`` (KOs for 15 single copy     |
|                            | genes, Carr *et al.*, 2013. Table S1). |
|                            | For ``MGOGs``, ``MGKOs`` and           |
|                            | ``USiCGs``, the median coverage of a   |
|                            | set of single copy genes will be used  |
|                            | for normalization. Default ``MGOGs``.  |
+----------------------------+----------------------------------------+
| ``load_sequences``         | logical. If ``TRUE``, contig and ORF   |
|                            | sequences will be loaded into the SQM  |
|                            | object. Setting it to ``FALSE`` will   |
|                            | reduce memory usage. Default ``TRUE``. |
+----------------------------+----------------------------------------+
| ``engine``                 | character. Engine used to load the     |
|                            | ORFs and contigs tables. Either        |
|                            | ``data.frame`` or ``data.table``       |
|                            | (significantly faster if your project  |
|                            | is large). Default ``data.table``.     |
+----------------------------+----------------------------------------+

.. rubric:: Value
   :name: value

SQM object containing the parsed project. If more than one path is
provided in ``project_path``, this function will return a SQMbunch object
instead. The structure of this object is similar to that of a SQMlite
object (see ``loadSQMlite``) but with an extra entry named ``projects``
that contains one SQM object per input project. SQM and SQMbunch objects
will otherwise behave similarly when used with the subset and plot
functions from this package.

.. rubric:: Prerequisites
   :name: prerequisites

Run `SqueezeMeta `__!
An example call for running it would be:

| ``/path/to/SqueezeMeta/scripts/SqueezeMeta.pl``
| ``-m coassembly -f fastq_dir -s samples_file -p project_dir``

.. rubric:: The SQM object structure
   :name: the-sqm-object-structure

The SQM object is a nested list which contains the following
information:

+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
| **lvl1**         | **lvl2**               | **lvl3**          | **type**           | **rows/names** | **columns**  | **data**           |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
| **$orfs**        | **$table**             |                   | *dataframe*        | orfs           | misc. data   | misc. data         |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  | **$abund**             |                   | *numeric matrix*   | orfs           | samples      | abundances (reads) |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  | **$bases**             |                   | *numeric matrix*   | orfs           | samples      | abundances (bases) |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  | **$cov**               |                   | *numeric matrix*   | orfs           | samples      | coverages          |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  | **$cpm**               |                   | *numeric matrix*   | orfs           | samples      | covs. / 10^6 reads |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  | **$tpm**               |                   | *numeric matrix*   | orfs           | samples      | tpm                |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  | **$seqs**              |                   | *character vector* | orfs           | (n/a)        | sequences          |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  | **$tax**               |                   | *character matrix* | orfs           | tax. ranks   | taxonomy           |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  | **$tax16S**            |                   | *character vector* | orfs           | (n/a)        | 16S rRNA taxonomy  |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  | **$markers**           |                   | *list*             | orfs           | (n/a)        | CheckM1 markers    |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
| **$contigs**     | **$table**             |                   | *dataframe*        | contigs        | misc. data   | misc. data         |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  | **$abund**             |                   | *numeric matrix*   | contigs        | samples      | abundances (reads) |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  | **$bases**             |                   | *numeric matrix*   | contigs        | samples      | abundances (bases) |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  | **$cov**               |                   | *numeric matrix*   | contigs        | samples      | coverages          |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  | **$cpm**               |                   | *numeric matrix*   | contigs        | samples      | covs. / 10^6 reads |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  | **$tpm**               |                   | *numeric matrix*   | contigs        | samples      | tpm                |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  | **$seqs**              |                   | *character vector* | contigs        | (n/a)        | sequences          |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  | **$tax**               |                   | *character matrix* | contigs        | tax. ranks   | taxonomies         |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  | **$bins**              |                   | *character matrix* | contigs        | bin. methods | bins               |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
| **$bins**        | **$table**             |                   | *dataframe*        | bins           | misc. data   | misc. data         |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  | **$length**            |                   | *numeric vector*   | bins           | (n/a)        | length             |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  | **$abund**             |                   | *numeric matrix*   | bins           | samples      | abundances (reads) |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  | **$percent**           |                   | *numeric matrix*   | bins           | samples      | abundances (reads) |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  | **$bases**             |                   | *numeric matrix*   | bins           | samples      | abundances (bases) |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  | **$cov**               |                   | *numeric matrix*   | bins           | samples      | coverages          |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  | **$cpm**               |                   | *numeric matrix*   | bins           | samples      | covs. / 10^6 reads |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  | **$tax**               |                   | *character matrix* | bins           | tax. ranks   | taxonomy           |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  | **$tax_gtdb**          |                   | *character matrix* | bins           | tax. ranks   | GTDB taxonomy      |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
| **$taxa**        | **$superkingdom**      | **$abund**        | *numeric matrix*   | superkingdoms  | samples      | abundances (reads) |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  |                        | **$percent**      | *numeric matrix*   | superkingdoms  | samples      | percentages        |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  | **$phylum**            | **$abund**        | *numeric matrix*   | phyla          | samples      | abundances (reads) |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  |                        | **$percent**      | *numeric matrix*   | phyla          | samples      | percentages        |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  | **$class**             | **$abund**        | *numeric matrix*   | classes        | samples      | abundances (reads) |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  |                        | **$percent**      | *numeric matrix*   | classes        | samples      | percentages        |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  | **$order**             | **$abund**        | *numeric matrix*   | orders         | samples      | abundances (reads) |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  |                        | **$percent**      | *numeric matrix*   | orders         | samples      | percentages        |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  | **$family**            | **$abund**        | *numeric matrix*   | families       | samples      | abundances (reads) |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  |                        | **$percent**      | *numeric matrix*   | families       | samples      | percentages        |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  | **$genus**             | **$abund**        | *numeric matrix*   | genera         | samples      | abundances (reads) |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  |                        | **$percent**      | *numeric matrix*   | genera         | samples      | percentages        |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  | **$species**           | **$abund**        | *numeric matrix*   | species        | samples      | abundances (reads) |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  |                        | **$percent**      | *numeric matrix*   | species        | samples      | percentages        |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
| **$functions**   | **$KEGG**              | **$abund**        | *numeric matrix*   | KEGG ids       | samples      | abundances (reads) |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  |                        | **$bases**        | *numeric matrix*   | KEGG ids       | samples      | abundances (bases) |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  |                        | **$cov**          | *numeric matrix*   | KEGG ids       | samples      | coverages          |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  |                        | **$cpm**          | *numeric matrix*   | KEGG ids       | samples      | covs. / 10^6 reads |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  |                        | **$tpm**          | *numeric matrix*   | KEGG ids       | samples      | tpm                |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  |                        | **$copy_number**  | *numeric matrix*   | KEGG ids       | samples      | avg. copies        |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  | **$COG**               | **$abund**        | *numeric matrix*   | COG ids        | samples      | abundances (reads) |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  |                        | **$bases**        | *numeric matrix*   | COG ids        | samples      | abundances (bases) |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  |                        | **$cov**          | *numeric matrix*   | COG ids        | samples      | coverages          |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  |                        | **$cpm**          | *numeric matrix*   | COG ids        | samples      | covs. / 10^6 reads |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  |                        | **$tpm**          | *numeric matrix*   | COG ids        | samples      | tpm                |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  |                        | **$copy_number**  | *numeric matrix*   | COG ids        | samples      | avg. copies        |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  | **$PFAM**              | **$abund**        | *numeric matrix*   | PFAM ids       | samples      | abundances (reads) |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  |                        | **$bases**        | *numeric matrix*   | PFAM ids       | samples      | abundances (bases) |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  |                        | **$cov**          | *numeric matrix*   | PFAM ids       | samples      | coverages          |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  |                        | **$cpm**          | *numeric matrix*   | PFAM ids       | samples      | covs. / 10^6 reads |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  |                        | **$tpm**          | *numeric matrix*   | PFAM ids       | samples      | tpm                |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  |                        | **$copy_number**  | *numeric matrix*   | PFAM ids       | samples      | avg. copies        |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
| **$total_reads** |                        |                   | *numeric vector*   | samples        | (n/a)        | total reads        |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
| **$misc**        | **$project_name**      |                   | *character vector* | (empty)        | (n/a)        | project name       |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  | **$samples**           |                   | *character vector* | (empty)        | (n/a)        | samples            |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  | **$tax_names_long**    | **$superkingdom** | *character vector* | short names    | (n/a)        | full names         |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  |                        | **$phylum**       | *character vector* | short names    | (n/a)        | full names         |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  |                        | **$class**        | *character vector* | short names    | (n/a)        | full names         |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  |                        | **$order**        | *character vector* | short names    | (n/a)        | full names         |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  |                        | **$family**       | *character vector* | short names    | (n/a)        | full names         |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  |                        | **$genus**        | *character vector* | short names    | (n/a)        | full names         |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  |                        | **$species**      | *character vector* | short names    | (n/a)        | full names         |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  | **$tax_names_short**   |                   | *character vector* | full names     | (n/a)        | short names        |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  | **$KEGG_names**        |                   | *character vector* | KEGG ids       | (n/a)        | KEGG names         |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  | **$KEGG_paths**        |                   | *character vector* | KEGG ids       | (n/a)        | KEGG hierarchy     |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  | **$COG_names**         |                   | *character vector* | COG ids        | (n/a)        | COG names          |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  | **$COG_paths**         |                   | *character vector* | COG ids        | (n/a)        | COG hierarchy      |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+
|                  | **$ext_annot_sources** |                   | *character vector* | COG ids        | (n/a)        | external databases |
+------------------+------------------------+-------------------+--------------------+----------------+--------------+--------------------+

If external databases for functional classification were provided to
SqueezeMeta via the ``-extdb`` argument, the corresponding abundance
(reads and bases), coverage, tpm and copy number profiles will be
present in ``SQM$functions`` (e.g. results for the CAZy database would
be present in ``SQM$functions$CAZy``).
Additionally, the extended names of the features present in the external
database will be present in ``SQM$misc`` (e.g. ``SQM$misc$CAZy_names``).

.. rubric:: Examples
   :name: examples

.. code:: R

   ## Not run:
   ## (outside R)
   ## Run SqueezeMeta on the test data.
   /path/to/SqueezeMeta/scripts/SqueezeMeta.pl -p Hadza -f raw -m coassembly -s test.samples
   ## Now go into R.
   library(SQMtools)
   Hadza = loadSQM("Hadza") # Where Hadza is the path to the SqueezeMeta output directory.

   ## End(Not run)

   data(Hadza) # We will illustrate the structure of the SQM object on the test data
   # Which are the ten most abundant KEGG IDs in our data?
   topKEGG = names(sort(rowSums(Hadza$functions$KEGG$tpm), decreasing=TRUE))[1:11]
   topKEGG = topKEGG[topKEGG!="Unclassified"]
   # Which functions do those KEGG IDs represent?
   Hadza$misc$KEGG_names[topKEGG]
   # What is the relative abundance of the Negativicutes class across samples?
   Hadza$taxa$class$percent["Negativicutes",]
   # Which information is stored in the orf, contig and bin tables?
   colnames(Hadza$orfs$table)
   colnames(Hadza$contigs$table)
   colnames(Hadza$bins$table)
   # What is the GC content distribution of my metagenome?
   boxplot(Hadza$contigs$table[,"GC perc"]) # Not weighted by contig length or abundance!
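
The final example notes that the GC boxplot is not weighted by contig
length or abundance. As a minimal sketch of what a weighted summary could
look like, the block below builds a small made-up list that only mimics
the ``$contigs`` part of a SQM object, so it runs without SQMtools or a
real project. The ``toy`` object, its numbers and the ``Length`` column
name are assumptions for illustration; check
``colnames(Hadza$contigs$table)`` for the actual column names before
adapting it.

.. code:: R

   # Hypothetical stand-in for SQM$contigs: same nesting, invented values.
   contig_ids = c("contig_1", "contig_2", "contig_3")
   toy = list(
     contigs = list(
       table = data.frame(
         "GC perc" = c(40, 60, 50),
         "Length"  = c(1000, 3000, 500),   # assumed column name
         check.names = FALSE,              # keep the space in "GC perc"
         row.names = contig_ids
       ),
       # Reads per contig (rows) and sample (columns), filled column-wise.
       abund = matrix(c(10, 30, 5, 20, 10, 0), nrow = 3,
                      dimnames = list(contig_ids, c("S1", "S2")))
     )
   )

   gc  = toy$contigs$table[, "GC perc"]
   len = toy$contigs$table[, "Length"]

   # Plain mean: every contig counts the same, whatever its size.
   gc_unweighted = mean(gc)

   # Length-weighted mean: long contigs contribute proportionally more.
   gc_by_length = weighted.mean(gc, w = len)

   # Abundance-weighted mean, computed separately for each sample.
   gc_by_sample = apply(toy$contigs$abund, 2,
                        function(w) weighted.mean(gc, w = w))

   print(gc_unweighted)
   print(gc_by_length)
   print(gc_by_sample)

Weighting by read counts is only one option; weighting by ``$bases`` or
``$cov`` instead would follow the same pattern.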