subsetORFs
R Documentation
Select ORFs
Description
Create a SQM object containing only the requested ORFs, and the contigs
and bins that contain them. Internally, all the other subset functions
in this package end up calling subsetORFs to do the work for them.
Usage
subsetORFs(
  SQM,
  orfs,
  tax_source = "orfs",
  trusted_functions_only = FALSE,
  ignore_unclassified_functions = FALSE,
  rescale_tpm = FALSE,
  rescale_copy_number = FALSE,
  recalculate_bin_stats = TRUE,
  contigs_override = NULL,
  allow_empty = FALSE
)
Arguments
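SQM: SQM object to be subsetted.
orfs: character. Vector of ORFs to be selected.
tax_source: character. Source data used for the taxonomy tables (default "orfs").
trusted_functions_only: logical (default FALSE).
ignore_unclassified_functions: logical (default FALSE).
rescale_tpm: logical (default FALSE).
rescale_copy_number: logical (default FALSE).
recalculate_bin_stats: logical (default TRUE).
contigs_override: character. Optional vector of contigs to be included in the subsetted object (default NULL).
allow_empty: (internal use only).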
Value
SQM object containing the requested ORFs.
A note on contig/bins subsetting
While this function selects the contigs and bins that contain the
desired ORFs, it DOES NOT recalculate contig abundances and statistics
based on the selected ORFs only. This means that the abundances
presented in tables such as SQM$contigs$abund will still refer to the
complete contigs, regardless of whether only a fraction of their ORFs
are actually present in the returned SQM object. This is also true for
the statistics presented in SQM$contigs$table. Bin statistics are
recalculated if recalculate_bin_stats is set to TRUE (the default), but
the recalculation is based on contigs, not ORFs.
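The behaviour described in this note can be sketched with a toy example in base R. This is only an illustration on made-up data, not how the package implements subsetting: the objects orf_to_contig, orf_abund, contig_abund and wanted_orfs are hypothetical names introduced here, and no SQM object is involved.

```r
# Toy illustration in base R (hypothetical data, not the package internals).
# Each ORF belongs to one contig; contig abundances are stored separately.
orf_to_contig <- c(orf1 = "contigA", orf2 = "contigA",
                   orf3 = "contigB", orf4 = "contigC")
orf_abund     <- c(orf1 = 10, orf2 = 30, orf3 = 5, orf4 = 7)
contig_abund  <- c(contigA = 40, contigB = 5, contigC = 7)

# Request a single ORF, which sits on contigA.
wanted_orfs <- c("orf1")

# Keep the requested ORFs and the contigs that contain them.
sub_orf_abund    <- orf_abund[wanted_orfs]
kept_contigs     <- unique(unname(orf_to_contig[wanted_orfs]))
sub_contig_abund <- contig_abund[kept_contigs]

# The contig abundance is carried over unchanged: it still describes the
# complete contig, not only the ORFs that were kept.
print(sub_orf_abund)
print(sub_contig_abund)
```

In this sketch the retained contig keeps its full abundance even though only one of its two ORFs survives the subsetting, which is why contig-level tables should not be read as if they were recomputed from the selected ORFs.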
Examples
library(SQMtools)
data(Hadza)
# Select the 100 most abundant ORFs in our dataset.
mostAbundantORFnames = names(sort(rowSums(Hadza$orfs$tpm), decreasing=TRUE))[1:100]
mostAbundantORFs = subsetORFs(Hadza, mostAbundantORFnames)