
Create ninetails output tables (Guppy legacy pipeline)
Source:R/ninetails_core_functions.R
create_outputs.RdIntegrates nanopolish poly(A) tail data, CNN predictions, and positional information to generate two main outputs: per-read classifications and non-adenosine residue predictions with estimated positions along the poly(A) tail.
Usage
create_outputs(
tail_feature_list,
tail_chunk_list,
nanopolish,
predicted_list,
num_cores,
pass_only = TRUE,
qc = TRUE
)Arguments
- tail_feature_list
List object produced by
create_tail_feature_list.- tail_chunk_list
List object produced by
create_tail_chunk_list.- nanopolish
Character string or data frame. Full path of the
.tsvfile produced by nanopolish polya function, or an in-memory data frame.- predicted_list
List object produced by
predict_gaf_classes.- num_cores
Numeric. Number of physical cores to use in processing. Do not exceed 1 less than the number of cores at your disposal.
- pass_only
Logical. If
TRUE(default), only reads tagged by nanopolish as"PASS"are taken into consideration. IfFALSE, reads tagged as"PASS"and"SUFFCLIP"are both included.- qc
Logical. If
TRUE(default), quality control of the output predictions is performed. Reads with non-A residue positions in terminal nucleotides (< 2 nt from either end of the tail) are labeled with"-WARN"suffixes, as these are most likely artifacts inherited from nanopolish segmentation. It is then up to the user whether to include or discard such reads from downstream analysis.
Value
A named list with two data frames:
- read_classes
Data frame with per-read classification results. Columns:
readname,contig,polya_length,qc_tag,class, andcomments.- nonadenosine_residues
Data frame with per-chunk predictions of non-adenosine residues for decorated reads only. Columns:
readname,contig,prediction,est_nonA_pos,polya_length,qc_tag. Whenqc = TRUE, prediction values may carry a"-WARN"suffix for terminal positions.
Details
The function implements a complete read accounting system that classifies all reads from the nanopolish output into biologically meaningful categories based on both quality control metrics and modification detection results.
Read Classification System
Reads are assigned into three main categories with specific comment codes:
decorated - Reads with detected non-adenosine modifications
YAY: Move transition present, nonA residue detected
blank - Reads without detected modifications
MAU: Move transition absent, nonA residue undetected
MPU: Move transition present, nonA residue undetected
unclassified - Reads that failed quality control
IRL: Insufficient read length (poly(A) < 10 nt)
QCF: Nanopolish QC failed
NIN: Not included in the analysis (
pass_only = TRUE)
Position Estimation
Non-adenosine residue positions are estimated using the formula:
est_nonA_pos = polya_length - ((polya_length * centr_signal_pos) /
signal_length)
Positions are reported as distance from the 3' end of the tail (position 1 = most 3' nucleotide).
See also
create_tail_feature_list for feature extraction,
create_tail_chunk_list for chunk segmentation,
predict_gaf_classes for CNN classification,
check_tails_guppy for the complete pipeline wrapper