*********
subsetTax
*********

========= ===============
subsetTax R Documentation
========= ===============

Filter results by taxonomy
--------------------------

Description
~~~~~~~~~~~

Create a SQM or SQMbunch object containing only the contigs/bins with a
given consensus taxonomy, as well as the ORFs contained in them.

Usage
~~~~~

.. code:: R

   subsetTax(
     SQM,
     rank,
     tax,
     tax_source = NULL,
     trusted_functions_only = FALSE,
     ignore_unclassified_functions = FALSE,
     rescale_tpm = TRUE,
     rescale_copy_number = TRUE,
     recalculate_bin_stats = FALSE,
     allow_empty = FALSE
   )

Arguments
~~~~~~~~~

+-----------------------------------+----------------------------------+
| ``SQM``                           | SQM object to be subsetted.      |
+-----------------------------------+----------------------------------+
| ``rank``                          | character. The taxonomic rank    |
|                                   | from which to select the desired |
|                                   | taxa (``superkingdom``,          |
|                                   | ``phylum``, ``class``,           |
|                                   | ``order``, ``family``,           |
|                                   | ``genus``, ``species``).         |
+-----------------------------------+----------------------------------+
| ``tax``                           | character. A taxon or vector of  |
|                                   | taxa to be selected.             |
+-----------------------------------+----------------------------------+
| ``tax_source``                    | character. Source data used for  |
|                                   | feature selection, and to        |
|                                   | generate the taxonomy tables     |
|                                   | present in ``SQM$taxa``, either  |
|                                   | ``"orfs"``, ``"contigs"``,       |
|                                   | ``"bins"`` (GTDB bin taxonomy if |
|                                   | available, SQM bin taxonomy      |
|                                   | otherwise), ``"bins_gtdb"``      |
|                                   | (GTDB bin taxonomy) or           |
|                                   | ``"bins_sqm"`` (SQM bin          |
|                                   | taxonomy). When ``"bins"``,      |
|                                   | ``"bins_gtdb"`` or               |
|                                   | ``"bins_sqm"``, this function    |
|                                   | will select the bins from the    |
|                                   | desired taxa, otherwise it will  |
|                                   | select the contigs from the      |
|                                   | desired taxa. If using           |
|                                   | ``"bins_gtdb"``, note that GTDB  |
|                                   | taxonomy may differ from the     |
|                                   | NCBI taxonomy used throughout    |
|                                   | the rest of SqueezeMeta. Default |
|                                   | ``"contigs"``, unless the        |
|                                   | project was created with the     |
|                                   | ``--onlybins`` flag, in which    |
|                                   | case it will be ``"bins_gtdb"``  |
|                                   | if GTDB taxonomy is available    |
|                                   | for the bins.                    |
+-----------------------------------+----------------------------------+
| ``trusted_functions_only``        | logical. If ``TRUE``, only       |
|                                   | highly trusted functional        |
|                                   | annotations (best hit + best     |
|                                   | average) will be considered when |
|                                   | generating aggregated function   |
|                                   | tables. If ``FALSE``, best hit   |
|                                   | annotations will be used         |
|                                   | (default ``FALSE``).             |
+-----------------------------------+----------------------------------+
| ``ignore_unclassified_functions`` | logical. If ``FALSE``, ORFs with |
|                                   | no functional classification     |
|                                   | will be aggregated together into |
|                                   | an "Unclassified" category. If   |
|                                   | ``TRUE``, they will be ignored   |
|                                   | (default ``FALSE``).             |
+-----------------------------------+----------------------------------+
| ``rescale_tpm``                   | logical. If ``TRUE``, TPMs for   |
|                                   | KEGGs, COGs, and PFAMs will be   |
|                                   | recalculated (so that the TPMs   |
|                                   | in the subset actually add up to |
|                                   | 1 million). Otherwise,           |
|                                   | per-function TPMs will be        |
|                                   | calculated by aggregating the    |
|                                   | TPMs of the ORFs annotated with  |
|                                   | that function, and will thus     |
|                                   | keep the scaling present in the  |
|                                   | parent object. By default it is  |
|                                   | set to ``TRUE``, which means     |
|                                   | that the returned TPMs will be   |
|                                   | scaled *per million reads of     |
|                                   | the selected taxon*.             |
+-----------------------------------+----------------------------------+
| ``rescale_copy_number``           | logical. If ``TRUE``, copy       |
|                                   | numbers will be recalculated     |
|                                   | using the median single-copy     |
|                                   | gene coverages in the subset.    |
|                                   | Otherwise, single-copy gene      |
|                                   | coverages will be taken from the |
|                                   | parent object. By default it is  |
|                                   | set to ``TRUE``, which means     |
|                                   | that the returned copy numbers   |
|                                   | for each function will represent |
|                                   | the average copy number of that  |
|                                   | function *per genome of the      |
|                                   | selected taxon*.                 |
+-----------------------------------+----------------------------------+
| ``recalculate_bin_stats``         | logical. If ``TRUE``, bin        |
|                                   | abundance, quality and taxonomy  |
|                                   | are recalculated based on the    |
|                                   | contigs present in the subsetted |
|                                   | object (default ``FALSE``).      |
+-----------------------------------+----------------------------------+
| ``allow_empty``                   | (internal use only).             |
+-----------------------------------+----------------------------------+

Value
~~~~~

SQM or SQMbunch object containing only the requested taxon.

See Also
~~~~~~~~

``subsetFun``, ``subsetContigs``, ``subsetSamples``, ``combineSQM``. The
most abundant items of a particular table contained in a SQM object can
be selected with ``mostAbundant``.

Examples
~~~~~~~~

.. code:: R

   data(Hadza)
   Hadza.Prevotella = subsetTax(Hadza, "genus", "Prevotella")
   Hadza.Bacteroidota = subsetTax(Hadza, "phylum", "Bacteroidota")
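

The effect of ``rescale_tpm`` and ``rescale_copy_number`` can be
inspected with a short sketch along the following lines. It uses only
the arguments documented on this page, but it assumes that the
aggregated KEGG TPM table is stored at ``SQM$functions$KEGG$tpm``; that
location is an assumption about the object layout and is not stated
here. The object names ``Prevotella.rescaled`` and ``Prevotella.parent``
are illustrative.

.. code:: R

   library(SQMtools)
   data(Hadza)

   # Default behaviour: TPMs and copy numbers are recalculated
   # within the subset.
   Prevotella.rescaled = subsetTax(Hadza, "genus", "Prevotella")

   # Keep the scaling of the parent object instead.
   Prevotella.parent = subsetTax(Hadza, "genus", "Prevotella",
                                 rescale_tpm = FALSE,
                                 rescale_copy_number = FALSE)

   # Per-sample TPM totals for each variant
   # (assumed table location: $functions$KEGG$tpm).
   colSums(Prevotella.rescaled$functions$KEGG$tpm)
   colSums(Prevotella.parent$functions$KEGG$tpm)

Comparing the two sets of column sums shows how much of the parent
TPM budget the selected taxon accounts for in each sample.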