*************
subsetContigs
*************

============= ===============
subsetContigs R Documentation
============= ===============

Select contigs
--------------

Description
~~~~~~~~~~~

Create a SQM object containing only the requested contigs, the ORFs
contained in them and the bins that contain them.

Usage
~~~~~

.. code:: R

   subsetContigs(
     SQM,
     contigs,
     tax_source = "contigs",
     trusted_functions_only = FALSE,
     ignore_unclassified_functions = FALSE,
     rescale_tpm = FALSE,
     rescale_copy_number = FALSE,
     recalculate_bin_stats = TRUE,
     allow_empty = FALSE
   )

Arguments
~~~~~~~~~

+-----------------------------------+----------------------------------+
| ``SQM``                           | SQM object to be subsetted.      |
+-----------------------------------+----------------------------------+
| ``contigs``                       | character. Vector of contigs to  |
|                                   | be selected.                     |
+-----------------------------------+----------------------------------+
| ``tax_source``                    | character. Source data used for  |
|                                   | the taxonomy tables present in   |
|                                   | ``SQM$taxa``. Options include    |
|                                   | ``"contigs"``, ``"bins"`` (GTDB  |
|                                   | bin taxonomy if available, SQM   |
|                                   | bin taxonomy otherwise),         |
|                                   | ``"bins_gtdb"`` (GTDB bin        |
|                                   | taxonomy) or ``"bins_sqm"`` (SQM |
|                                   | bin taxonomy). Default           |
|                                   | ``"contigs"``.                   |
+-----------------------------------+----------------------------------+
| ``trusted_functions_only``        | logical. If ``TRUE``, only       |
|                                   | highly trusted functional        |
|                                   | annotations (best hit + best     |
|                                   | average) will be considered when |
|                                   | generating aggregated function   |
|                                   | tables. If ``FALSE``, best hit   |
|                                   | annotations will be used         |
|                                   | (default ``FALSE``).             |
+-----------------------------------+----------------------------------+
| ``ignore_unclassified_functions`` | logical. If ``FALSE``, ORFs with |
|                                   | no functional classification     |
|                                   | will be aggregated together into |
|                                   | an "Unclassified" category. If   |
|                                   | ``TRUE``, they will be ignored   |
|                                   | (default ``FALSE``).             |
+-----------------------------------+----------------------------------+
| ``rescale_tpm``                   | logical. If ``TRUE``, TPMs for   |
|                                   | KEGGs, COGs, and PFAMs will be   |
|                                   | recalculated (so that the TPMs   |
|                                   | in the subset actually add up to |
|                                   | 1 million). Otherwise,           |
|                                   | per-function TPMs will be        |
|                                   | calculated by aggregating the    |
|                                   | TPMs of the ORFs annotated with  |
|                                   | that function, and will thus     |
|                                   | keep the scaling present in the  |
|                                   | parent object (default           |
|                                   | ``FALSE``).                      |
+-----------------------------------+----------------------------------+
| ``rescale_copy_number``           | logical. If ``TRUE``, copy       |
|                                   | numbers will be recalculated     |
|                                   | using the median single-copy     |
|                                   | gene coverages in the subset.    |
|                                   | Otherwise, single-copy gene      |
|                                   | coverages will be taken from the |
|                                   | parent object. By default it is  |
|                                   | set to ``FALSE``, which means    |
|                                   | that the returned copy numbers   |
|                                   | for each function will represent |
|                                   | the average copy number of that  |
|                                   | function per genome in the       |
|                                   | parent object.                   |
+-----------------------------------+----------------------------------+
| ``recalculate_bin_stats``         | logical. If ``TRUE``, bin        |
|                                   | abundance, quality and taxonomy  |
|                                   | are recalculated based on the    |
|                                   | contigs present in the subsetted |
|                                   | object (default ``TRUE``).       |
+-----------------------------------+----------------------------------+
| ``allow_empty``                   | (internal use only).             |
+-----------------------------------+----------------------------------+

Value
~~~~~

SQM object containing only the selected contigs.

See Also
~~~~~~~~

``subsetORFs``

Examples
~~~~~~~~

.. code:: R

   data(Hadza)
   # Which contigs have a GC content below 40%?
   lowGCcontigNames = rownames(Hadza$contigs$table[Hadza$contigs$table[,"GC perc"]<40,])
   lowGCcontigs = subsetContigs(Hadza, lowGCcontigNames)
   hist(lowGCcontigs$contigs$table[,"GC perc"])
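As a further illustration of the ``rescale_tpm`` argument, the following
sketch builds the same low-GC subset twice, once keeping the scaling of
the parent object (the default) and once recalculating TPMs within the
subset. It assumes that the SQMtools package is installed and that
per-function TPMs are stored in ``$functions$KEGG$tpm``; that slot name
and the variable names are assumptions made for this example.

.. code:: R

   library(SQMtools)
   data(Hadza)
   # Select contigs with a GC content below 40%, as in the example above
   gcPerc = Hadza$contigs$table[, "GC perc"]
   lowGCcontigNames = rownames(Hadza$contigs$table)[which(gcPerc < 40)]
   # Default: per-function TPMs keep the scaling of the parent object
   subParentScale = subsetContigs(Hadza, lowGCcontigNames)
   # rescale_tpm = TRUE: TPMs are recalculated within the subset
   subRescaled = subsetContigs(Hadza, lowGCcontigNames, rescale_tpm = TRUE)
   # Compare per-sample KEGG TPM totals between the two subsets
   # (assumes the $functions$KEGG$tpm slot is present in the object)
   colSums(subParentScale$functions$KEGG$tpm)
   colSums(subRescaled$functions$KEGG$tpm)

Keeping the parent scaling makes the subset directly comparable with the
full dataset, whereas rescaling treats the subset as a dataset of its
own.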