
Reclassifies ambiguous non-A residues to mitigate potential errors inherited from nanopolish segmentation.
Source: R/ninetails_data_postprocessing_functions.R
High-level wrapper that combines correct_residue_data and correct_class_data into a single call, producing cleaned class and residue data frames ready for downstream analysis and visualisation.
Usage
reclassify_ninetails_data(
  residue_data,
  class_data,
  grouping_factor = NULL,
  transcript_column,
  ref = NULL
)
Arguments
- residue_data
Data frame or tibble containing non-A residue predictions from the ninetails pipeline.
- class_data
Data frame or tibble containing read_classes predictions from the ninetails pipeline.
- grouping_factor
Character string or NULL (default). A grouping variable (e.g. "sample_name").
- transcript_column
Character string. Name of the column containing transcript identifiers (e.g. "contig", "ensembl_transcript_id_short").
- ref
Character string, character vector, or NULL (default). Whitelist of transcripts with hybrid tails. Built-in options:
"athaliana": Arabidopsis thaliana
"hsapiens": Homo sapiens
"mmusculus": Mus musculus
"scerevisiae": Saccharomyces cerevisiae
"celegans": Caenorhabditis elegans
"tbrucei": Trypanosoma brucei
A custom character vector may also be provided. It must be consistent with the content of transcript_column. Using a whitelist is optional but allows retrieval of more true positive data.
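Because a custom whitelist must be consistent with the content of transcript_column, it can help to check the overlap before calling the function. The following is a minimal base R sketch under stated assumptions: the toy table, the transcript identifiers ("tx_A", "tx_B", "tx_Z") and the readname column are hypothetical and are not taken from the package.

```r
# Toy class table; "contig" stands in for the transcript column (illustrative data only).
class_data <- data.frame(
  readname = c("r1", "r2", "r3"),
  contig   = c("tx_A", "tx_B", "tx_A"),
  stringsAsFactors = FALSE
)
transcript_column <- "contig"

# Hypothetical custom whitelist; "tx_Z" does not occur in the data.
custom_ref <- c("tx_A", "tx_Z")

# Split the whitelist into entries that do and do not occur in the transcript column.
matched   <- custom_ref[custom_ref %in% class_data[[transcript_column]]]
unmatched <- setdiff(custom_ref, matched)

# Report entries that can never match, e.g. because of a different ID format.
if (length(unmatched) > 0) {
  message("Whitelist entries not found in '", transcript_column, "': ",
          paste(unmatched, collapse = ", "))
}
```

Unmatched entries usually point to a mismatch in identifier format (for example, versioned versus unversioned transcript IDs) and are worth resolving before passing the vector as ref.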
Value
A named list with two data frames:
- class_data
Data frame. Corrected read classifications with class and comments columns updated. Compatible with plotting functions.
- residue_data
Data frame. Filtered non-A residue predictions with ambiguous positions removed. Intermediate QC columns are dropped.
Details
Nanopolish segmentation can misidentify nucleotides from A-rich 3' UTR regions as part of the poly(A) tail. When tail boundaries are recognised incorrectly, non-A positions accumulate near the 3' end of the transcript, significantly affecting analysis results. This function flags and removes those ambiguous positions and reclassifies affected reads accordingly.
The procedure:
1. correct_residue_data annotates each non-A position with a quality flag (qc_pos).
2. correct_class_data reclassifies reads in which all non-A positions are flagged as ambiguous.
3. Ambiguous positions (qc_pos == "N") are dropped from the residue table.
4. Corrected columns (corr_class, corr_comments) are renamed to class and comments.
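The logic of this procedure can be illustrated on toy data in base R. This is a sketch under stated assumptions, not the package's implementation: the annotation step is taken as already done (qc_pos is present), and the readname and prediction columns, the "Y" flag for unambiguous positions and the "decorated" class label are assumptions made for illustration.

```r
# Toy residue table, already annotated with qc_pos ("N" = ambiguous; "Y" is an assumed value).
residue_data <- data.frame(
  readname   = c("r1", "r1", "r2", "r3"),
  prediction = c("C", "G", "U", "G"),
  qc_pos     = c("Y", "N", "N", "Y"),
  stringsAsFactors = FALSE
)
# Toy class table; "decorated" is an assumed label for reads carrying non-A residues.
class_data <- data.frame(
  readname = c("r1", "r2", "r3"),
  class    = c("decorated", "decorated", "decorated"),
  comments = c("YAY", "YAY", "YAY"),
  stringsAsFactors = FALSE
)

# Find reads in which every non-A position is ambiguous.
flags_by_read   <- split(residue_data$qc_pos == "N", residue_data$readname)
all_ambiguous   <- vapply(flags_by_read, all, logical(1))
ambiguous_reads <- names(all_ambiguous)[all_ambiguous]

# Reclassify those reads: class becomes "blank", comments become "MPU".
is_ambiguous <- class_data$readname %in% ambiguous_reads
class_data$corr_class    <- ifelse(is_ambiguous, "blank", class_data$class)
class_data$corr_comments <- ifelse(is_ambiguous, "MPU", class_data$comments)

# Drop ambiguous positions and the intermediate QC column from the residue table.
residue_clean <- residue_data[residue_data$qc_pos != "N", , drop = FALSE]
residue_clean$qc_pos <- NULL

# Rename the corrected columns back to class and comments.
class_clean <- class_data[, c("readname", "corr_class", "corr_comments")]
names(class_clean) <- c("readname", "class", "comments")

result <- list(class_data = class_clean, residue_data = residue_clean)
```

In this toy input, r1 keeps its classification because one of its two positions is unambiguous, while r2 has only an ambiguous position and is therefore reclassified.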
Caution
Reads containing only ambiguous non-A positions are reclassified as
"blank" in the class column, and their comments
are changed from "YAY" to "MPU".
See also
correct_residue_data and
correct_class_data for the underlying steps,
check_tails_guppy and create_outputs for
the pipeline that produces the input data.