subsetTax

R Documentation

Filter results by taxonomy

Description

Create a SQM or SQMbunch object containing only the contigs/bins with a given consensus taxonomy, as well as the ORFs contained in them.

Usage

subsetTax(
  SQM,
  rank,
  tax,
  tax_source = NULL,
  trusted_functions_only = FALSE,
  ignore_unclassified_functions = FALSE,
  rescale_tpm = TRUE,
  rescale_copy_number = TRUE,
  recalculate_bin_stats = FALSE,
  allow_empty = FALSE
)

Arguments

SQM

SQM or SQMbunch object to be subsetted.

rank

character. The taxonomic rank from which to select the desired taxa (superkingdom, phylum, class, order, family, genus or species).

tax

character. A taxon or vector of taxa to be selected.

tax_source

character. Source data used for feature selection and for generating the taxonomy tables present in SQM$taxa: one of "orfs", "contigs", "bins" (GTDB bin taxonomy if available, SQM bin taxonomy otherwise), "bins_gtdb" (GTDB bin taxonomy) or "bins_sqm" (SQM bin taxonomy). When "bins", "bins_gtdb" or "bins_sqm", this function will select the bins from the desired taxa; otherwise it will select the contigs from the desired taxa. If using "bins_gtdb", note that GTDB taxonomy may differ from the NCBI taxonomy used throughout the rest of SqueezeMeta. If NULL (the default), "contigs" is used, unless the project was created with the ‘--onlybins’ flag, in which case "bins_gtdb" is used if GTDB taxonomy is available for the bins.

trusted_functions_only

logical. If TRUE, only highly trusted functional annotations (best hit + best average) will be considered when generating aggregated function tables. If FALSE, best hit annotations will be used (default FALSE).

ignore_unclassified_functions

logical. If FALSE, ORFs with no functional classification will be aggregated together into an “Unclassified” category. If TRUE, they will be ignored (default FALSE).

rescale_tpm

logical. If TRUE, TPMs for KEGGs, COGs, and PFAMs will be recalculated so that the TPMs in the subset add up to 1 million. Otherwise, per-function TPMs will be calculated by aggregating the TPMs of the ORFs annotated with that function, and will thus keep the scaling present in the parent object. By default it is set to TRUE, which means that the returned TPMs will be scaled per million reads of the selected taxon.
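The difference between the two settings can be illustrated with a toy calculation in base R. This is only a sketch: the ORF table, its column names and the tpm() helper are hypothetical, it assumes the standard TPM definition (reads divided by length, normalized to one million), and it is not the code SQMtools runs internally.

```r
# Hypothetical toy data: five ORFs from one sample of a parent object.
orfs = data.frame(
  fun   = c("K1", "K1", "K2", "K3", "K2"),  # made-up function labels
  taxon = c("A", "B", "A", "B", "A"),       # made-up taxa
  reads = c(100, 300, 200, 400, 50),
  len   = c(1000, 1500, 2000, 1000, 500)    # ORF length in bp
)

# Assumed TPM definition: length-normalized reads scaled to sum to 1e6.
tpm = function(reads, len) {
  rate = reads / len
  rate / sum(rate) * 1e6
}

# TPMs computed over the whole parent object.
orfs$tpm_parent = tpm(orfs$reads, orfs$len)

# Keep only the ORFs from taxon "A".
sub = orfs[orfs$taxon == "A", ]

# Analogue of rescale_tpm = FALSE: aggregate the parent TPMs per function.
tpm_kept = vapply(split(sub$tpm_parent, sub$fun), sum, numeric(1))

# Analogue of rescale_tpm = TRUE: recompute TPMs within the subset only.
tpm_rescaled = vapply(split(tpm(sub$reads, sub$len), sub$fun), sum, numeric(1))

sum(tpm_kept)      # below 1e6: scaling of the parent object is kept
sum(tpm_rescaled)  # 1e6: the subset is rescaled on its own
```

In the first case the values remain comparable with those of the parent object; in the second they describe the functional profile of the selected taxon alone.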

rescale_copy_number

logical. If TRUE, copy numbers will be recalculated using the median single-copy gene coverages in the subset. Otherwise, single-copy gene coverages will be taken from the parent object. By default it is set to TRUE, which means that the returned copy numbers for each function will represent the average copy number of that function per genome of the selected taxon.
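A toy calculation in base R can make the effect of this choice concrete. This is a sketch under a stated assumption: copy number is taken to be the coverage of a function divided by the median coverage of single-copy genes, which is how the description above reads. All values and variable names are hypothetical, and this is not the package's internal code.

```r
# Hypothetical coverages of three functions within the subset.
fun_cov = c(K1 = 30, K2 = 12, K3 = 6)

# Hypothetical single-copy gene coverages in the parent object and in the subset.
scg_cov_parent = c(40, 50, 60, 45, 55)  # median is 50
scg_cov_subset = c(10, 12, 14, 11, 13)  # median is 12

# Analogue of rescale_copy_number = FALSE: normalize by the parent median.
cn_parent_scaling = fun_cov / median(scg_cov_parent)

# Analogue of rescale_copy_number = TRUE: normalize by the subset median.
cn_rescaled = fun_cov / median(scg_cov_subset)

cn_parent_scaling  # copies per average genome in the whole community
cn_rescaled        # copies per genome of the selected taxon
```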

recalculate_bin_stats

logical. If TRUE, bin abundance, quality and taxonomy are recalculated based on the contigs present in the subsetted object (default FALSE).

allow_empty

(internal use only).

Value

SQM or SQMbunch object containing only the requested taxon or taxa.

See Also

subsetFun, subsetContigs, subsetSamples, combineSQM. The most abundant items of a particular table contained in a SQM object can be selected with mostAbundant.

Examples

data(Hadza)
Hadza.Prevotella = subsetTax(Hadza, "genus", "Prevotella")
Hadza.Bacteroidota = subsetTax(Hadza, "phylum", "Bacteroidota")
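
As a further sketch, the rescaling arguments documented above can be switched off so that the subset keeps the scaling of the parent object. The example assumes that the SQMtools package, which provides subsetTax and the Hadza data, is installed; the requireNamespace guard and the names sqmtools_available and Hadza.Prevotella.parent are illustrative additions, not part of the package.

```r
# Check whether SQMtools is installed before running the example.
sqmtools_available = requireNamespace("SQMtools", quietly = TRUE)

if (sqmtools_available) {
  library(SQMtools)
  data(Hadza)

  # Default behaviour: TPMs and copy numbers are rescaled within the subset.
  Hadza.Prevotella = subsetTax(Hadza, "genus", "Prevotella")

  # Keep the scaling of the parent object instead.
  Hadza.Prevotella.parent = subsetTax(Hadza, "genus", "Prevotella",
                                      rescale_tpm = FALSE,
                                      rescale_copy_number = FALSE)
}
```

The second object is the one to use when values from the subset are to be compared directly against the full Hadza object.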