
Merges ninetails tabular outputs (read classes and nonadenosine residue data) to produce one concise table.
Source: R/ninetails_data_postprocessing_functions.R
merge_nonA_tables.Rd
Combines the read_classes and nonadenosine_residues data
frames into a single wide-format tibble with one row per read. The
prediction column from the residue data is spread into three
columns (prediction_C, prediction_G, prediction_U)
via spread_nonA_residues, and a CIGAR-like
nonA_residues summary column is appended.
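To illustrate the reshaping described above, here is a minimal base-R sketch. It is not the spread_nonA_residues implementation: it assumes the residue table has one row per detected residue, a readname column, and a prediction column holding "C", "G" or "U", and that the spread columns hold per-read counts. The toy data and column names are hypothetical, and the CIGAR-like nonA_residues column is not reproduced because its exact format is not specified here.

```r
# Toy residue table (hypothetical data; one row per predicted residue).
residues <- data.frame(
  readname   = c("r1", "r1", "r2"),
  prediction = c("C", "U", "G"),
  stringsAsFactors = FALSE
)

# Fix the factor levels so all three columns appear even if one is absent.
residues$prediction <- factor(residues$prediction, levels = c("C", "G", "U"))

# Cross-tabulate reads against residue types: one row per read.
counts <- as.data.frame.matrix(table(residues$readname, residues$prediction))
names(counts) <- paste0("prediction_", names(counts))

# Move the read identifiers from row names into a regular column.
counts <- cbind(readname = rownames(counts), counts)
rownames(counts) <- NULL

print(counts)
```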
Arguments
- class_data
Data frame or tibble containing read_classes predictions produced by the ninetails pipeline. The qc_tag column may be character (Guppy/nanopolish: "PASS", "SUFFCLIP", etc.) or numeric (Dorado: MAPQ score).
- residue_data
Data frame or tibble containing non-A residue predictions produced by the ninetails pipeline. The qc_tag column type must match that of class_data.
- pass_only
Logical [TRUE]. Applies only when qc_tag is character (Guppy/nanopolish pipeline). If TRUE, only reads tagged as "PASS" are included. If FALSE, reads tagged as "PASS" or "SUFFCLIP" are included. Ignored when qc_tag is numeric (Dorado pipeline), in which case MAPQ > 0 filtering is applied instead.
Value
A tibble with summarised information from both ninetails
outputs: all columns from class_data plus
prediction_C, prediction_G, prediction_U, and
nonA_residues.
Details
Unclassified reads are excluded before merging. The quality-tag filter behaviour depends on the pipeline that produced the input data:
Guppy/nanopolish pipeline (character qc_tag):
The pass_only parameter controls filtering. When TRUE,
only reads tagged as "PASS" are retained. When FALSE,
reads tagged as "PASS" or "SUFFCLIP" are retained.
Dorado pipeline (numeric qc_tag):
The qc_tag column contains MAPQ scores. Filtering is based on
mapping quality: only reads with qc_tag > 0 are retained. The
pass_only parameter is ignored for numeric qc_tag.
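The two filtering rules can be sketched in base R as follows. The helper filter_by_qc_tag is hypothetical and is not part of ninetails; it only mirrors the behaviour documented above on a plain data frame. The readname column and the toy values are assumptions made for illustration.

```r
# Hypothetical helper mirroring the documented qc_tag filter rules.
filter_by_qc_tag <- function(df, pass_only = TRUE) {
  if (is.numeric(df$qc_tag)) {
    # Dorado: qc_tag holds MAPQ scores; pass_only is ignored.
    keep <- !is.na(df$qc_tag) & df$qc_tag > 0
  } else if (is.character(df$qc_tag)) {
    # Guppy/nanopolish: pass_only selects the set of accepted tags.
    allowed <- if (pass_only) "PASS" else c("PASS", "SUFFCLIP")
    keep <- df$qc_tag %in% allowed
  } else {
    stop("qc_tag must be character or numeric")
  }
  df[keep, , drop = FALSE]
}

# Toy inputs (hypothetical data).
guppy <- data.frame(
  readname = c("r1", "r2", "r3"),
  qc_tag   = c("PASS", "SUFFCLIP", "ADAPTER"),
  stringsAsFactors = FALSE
)
dorado <- data.frame(
  readname = c("r1", "r2"),
  qc_tag   = c(60, 0),
  stringsAsFactors = FALSE
)

guppy_strict  <- filter_by_qc_tag(guppy, pass_only = TRUE)
guppy_relaxed <- filter_by_qc_tag(guppy, pass_only = FALSE)
dorado_kept   <- filter_by_qc_tag(dorado, pass_only = TRUE)
```

In this sketch, guppy_strict keeps only the "PASS" read, guppy_relaxed also keeps the "SUFFCLIP" read, and dorado_kept keeps only the read with a MAPQ above zero regardless of pass_only.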
After the full join, any NA values in numeric columns are
replaced with 0.
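The join-then-fill step can be illustrated with base R. This is a sketch under assumptions, not the function's internal code: it uses merge(..., all = TRUE) as a stand-in for the full join, and the readname key, the class values and the toy counts are hypothetical.

```r
# Toy inputs (hypothetical data): r2 has no non-A residue rows.
classes <- data.frame(
  readname = c("r1", "r2"),
  class    = c("decorated", "blank"),
  stringsAsFactors = FALSE
)
counts <- data.frame(
  readname     = "r1",
  prediction_C = 2,
  stringsAsFactors = FALSE
)

# Full (outer) join: reads missing from either table are kept, with NA fill.
merged <- merge(classes, counts, by = "readname", all = TRUE)

# Replace NA with 0 in numeric columns only; character columns are untouched.
num_cols <- vapply(merged, is.numeric, logical(1))
merged[num_cols] <- lapply(merged[num_cols], function(x) replace(x, is.na(x), 0))

print(merged)
```

Restricting the fill to numeric columns means a read without residue predictions ends up with zero counts, while missing values in text columns stay visible as NA.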
Pipeline Compatibility
This function automatically detects whether data originates from the
legacy Guppy/nanopolish pipeline (character qc_tag) or the
Dorado pipeline (numeric qc_tag) and applies appropriate
filtering logic. Both class_data and residue_data must
originate from the same pipeline to ensure consistent qc_tag
types during the merge operation.
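Because mixing inputs from the two pipelines would give mismatched qc_tag types, it may be worth checking the inputs before calling the function. The helper below, detect_pipeline, is a hypothetical pre-flight check written for illustration; it is not a ninetails function and does not claim to reproduce the package's own detection code.

```r
# Hypothetical pre-flight check: both inputs must share one qc_tag type.
detect_pipeline <- function(class_data, residue_data) {
  both_numeric <- is.numeric(class_data$qc_tag) &&
    is.numeric(residue_data$qc_tag)
  both_character <- is.character(class_data$qc_tag) &&
    is.character(residue_data$qc_tag)

  if (both_numeric) {
    "dorado"
  } else if (both_character) {
    "guppy/nanopolish"
  } else {
    stop("class_data and residue_data must have the same qc_tag type")
  }
}

# Toy inputs (hypothetical data).
guppy_tbl  <- data.frame(qc_tag = "PASS", stringsAsFactors = FALSE)
dorado_tbl <- data.frame(qc_tag = 60)

pipeline_a <- detect_pipeline(guppy_tbl, guppy_tbl)
pipeline_b <- detect_pipeline(dorado_tbl, dorado_tbl)

# A mixed pair is rejected rather than silently merged.
mixed <- tryCatch(detect_pipeline(guppy_tbl, dorado_tbl),
                  error = function(e) "error")
```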
See also
spread_nonA_residues for the reshaping step,
summarize_nonA for transcript-level summaries from the
merged table,
calculate_fisher for statistical testing on the merged
table,
read_class_single and read_residue_single
for loading the input data.
Examples
if (FALSE) { # \dontrun{
# Guppy/nanopolish data (character qc_tag)
merged_tables <- ninetails::merge_nonA_tables(
  class_data = class_data,
  residue_data = residue_data,
  pass_only = TRUE)

# Dorado data (numeric qc_tag) - pass_only is ignored
merged_tables <- ninetails::merge_nonA_tables(
  class_data = dorado_class_data,
  residue_data = dorado_residue_data,
  pass_only = TRUE)
} # }