subsetORFs | R Documentation
Select ORFs
Description
Create a SQM object containing only the requested ORFs, and the
contigs and bins that contain them. Internally, all the other subset
functions in this package ultimately call subsetORFs to do the
actual work.
Usage
subsetORFs(
SQM,
orfs,
tax_source = "orfs",
trusted_functions_only = FALSE,
ignore_unclassified_functions = FALSE,
rescale_tpm = FALSE,
rescale_copy_number = FALSE,
recalculate_bin_stats = TRUE,
contigs_override = NULL,
allow_empty = FALSE
)
Arguments
SQM | SQM object to be subsetted.
orfs | character. Vector of ORFs to be selected.
tax_source | character. Features used for calculating aggregated abundances at the different taxonomic ranks (default "orfs").
trusted_functions_only | logical (default FALSE).
ignore_unclassified_functions | logical (default FALSE).
rescale_tpm | logical (default FALSE).
rescale_copy_number | logical (default FALSE).
recalculate_bin_stats | logical (default TRUE).
contigs_override | character. Optional vector of contigs to be included in the subsetted object (default NULL).
allow_empty | (internal use only).
Value
SQM object containing the requested ORFs.
A note on contig/bins subsetting
While this function selects the contigs and bins that contain the
desired ORFs, it DOES NOT recalculate contig abundance and statistics
based on the selected ORFs only. This means that the abundances
presented in tables such as SQM$contigs$abund will still refer to
the complete contigs, regardless of whether only a fraction of their
ORFs are actually present in the returned SQM object. This is also
true for the statistics presented in SQM$contigs$table. Bin
statistics are recalculated when recalculate_bin_stats is set to
TRUE (the default), but the recalculation is based on contigs, not ORFs.
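The behaviour described in this note can be checked with a short sketch. It assumes that the package is loaded as SQMtools (the package name is not stated on this page) and that SQM$contigs$abund is a matrix whose row names are contig names. The names topORFs, sub, keptContigs and unchanged are illustrative only.

```r
library(SQMtools) # assumption: package providing subsetORFs and the Hadza dataset
data(Hadza)

# Select the 100 most abundant ORFs, as in the Examples section.
topORFs = names(sort(rowSums(Hadza$orfs$tpm), decreasing=TRUE))[1:100]
sub = subsetORFs(Hadza, topORFs)

# Contigs that were kept because they contain at least one selected ORF.
keptContigs = rownames(sub$contigs$abund)

# If contig abundances are not recalculated, the rows for the kept contigs
# should match the corresponding rows of the parent object.
unchanged = isTRUE(all.equal(
    sub$contigs$abund[keptContigs, , drop=FALSE],
    Hadza$contigs$abund[keptContigs, , drop=FALSE]
))
print(unchanged)
```

If unchanged is TRUE, the contig abundances in the subset still describe the complete contigs, so they should not be read as the abundance of the selected ORFs alone.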
Examples
data(Hadza)
# Select the 100 most abundant ORFs in our dataset.
mostAbundantORFnames = names(sort(rowSums(Hadza$orfs$tpm), decreasing=TRUE))[1:100]
mostAbundantORFs = subsetORFs(Hadza, mostAbundantORFnames)
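As a follow-up to the example above, the sketch below checks which ORFs ended up in the returned object. It assumes that the package is loaded as SQMtools (not stated on this page) and that the row names of SQM$orfs$tpm are ORF names, as the use of rowSums above suggests. The names returnedORFs and allRequested are illustrative.

```r
library(SQMtools) # assumption: package providing subsetORFs and the Hadza dataset
data(Hadza)

# Same selection as in the example above.
mostAbundantORFnames = names(sort(rowSums(Hadza$orfs$tpm), decreasing=TRUE))[1:100]
mostAbundantORFs = subsetORFs(Hadza, mostAbundantORFnames)

# ORFs present in the subsetted object.
returnedORFs = rownames(mostAbundantORFs$orfs$tpm)

# Compare them with the requested ORFs, ignoring order.
allRequested = setequal(returnedORFs, mostAbundantORFnames)
print(allRequested)
```

A check like this can help catch misspelled or missing ORF names before running downstream analyses on the subset.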