Pipeline wrappers

Top-level functions for running the complete analysis pipeline. Choose the appropriate wrapper based on your basecaller and data type.

check_tails_dorado_DRS()
Complete Oxford Nanopore poly(A) tail analysis pipeline for Dorado DRS data.
check_tails_dorado_cDNA()
Complete Oxford Nanopore poly(A)/poly(T) tail analysis pipeline for Dorado cDNA data.
check_tails_guppy()
Wrapper function for complete DRS processing by the ninetails package (legacy mode).

Dorado DRS pipeline

Functions for processing direct RNA sequencing (DRS) data basecalled with Dorado ≥ 1.0.0 in POD5 format.

preprocess_inputs()
Preprocess Dorado inputs for ninetails analysis (no BAM processing)
process_dorado_summary()
Process and split Dorado summary file into smaller parts
filter_dorado_summary()
Filter Dorado summary for reads fulfilling ninetails quality criteria
extract_tails_from_pod5()
Extract poly(A) tail signal segments from POD5 files using parallel Python processing
create_tail_features_list_dorado()
Creates a nested list of Dorado tail features (raw signal + pseudomoves).
create_tail_chunk_list_dorado()
Creates list of poly(A) tail chunks (Dorado mode) centered on significant signal deviations.
split_tail_centered_dorado()
Extracts fragments of poly(A) tail signal (Dorado mode) containing potential modifications along with their delimitation (positional indices; coordinates) within the tail.
process_dorado_signal_files()
Process Dorado poly(A) signal files for non-A prediction and tail chunk extraction
create_outputs_dorado()
Create Ninetails output tables for Dorado DRS pipeline

Dorado cDNA pipeline

Functions for processing cDNA sequencing data, including BAM file processing, basecalled sequence extraction, and read orientation classification (polyA vs polyT).

preprocess_inputs_cdna()
Preprocess Dorado inputs for ninetails cDNA analysis
split_bam_file_cdna()
Split BAM file into parts based on read IDs from summary file
extract_data_from_bam()
Extract data from BAM file for cDNA analysis
detect_orientation_single()
Detect poly tail type for a single sequence using Dorado-style algorithm
detect_orientation_multiple()
Classify multiple cDNA read orientations using Dorado-style poly tail detection
process_polya_reads_cdna()
Process polyA reads using standard ninetails pipeline
process_polyt_reads_cdna()
Process polyT reads using ninetails pipeline
create_outputs_dorado_cdna()
Create Ninetails output tables for Dorado cDNA pipeline
merge_cdna_results()
Merge polyA and polyT processing results for cDNA analysis
save_cdna_outputs()
Save cDNA pipeline outputs in standard ninetails format

Guppy legacy pipeline

Functions for processing DRS data basecalled with Guppy ≤ 6.0.0 using fast5 format and Nanopolish poly(A) coordinates. This pipeline is no longer actively developed.

extract_polya_data()
Extract poly(A) data from nanopolish output and sequencing summary
extract_tail_data()
Extract tail features of a single RNA read from a multi-Fast5 file
create_tail_feature_list()
Create list of poly(A) tail features from multi-Fast5 files
create_tail_chunk_list()
Create list of poly(A) tail chunks centered on significant signal deviations
split_tail_centered()
Extract modification-centered signal fragments from a poly(A) tail
create_gaf()
Convert ONT signal to Gramian Angular Field
create_gaf_list()
Create list of Gramian Angular Field matrices from tail chunks
process_polya_complete()
Process a single (unsplit) poly(A) data file through the Guppy pipeline.
process_polya_parts()
Process poly(A) data split into multiple parts through the Guppy pipeline.
split_polya_data()
Split large poly(A) data file into smaller parts.
create_outputs()
Create ninetails output tables (Guppy legacy pipeline)
save_outputs()
Save pipeline outputs to files.

Training dataset production

Functions for preparing training and validation datasets for the convolutional neural network (CNN) model.

prepare_trainingset()
Filters out signals of a given nucleotide type for neural network training-set preparation.
extract_tail_data_trainingset()
Extracts tail features of a single RNA read from the corresponding basecalled multi-Fast5 file.
create_tail_feature_list_trainingset()
Extracts features of poly(A) tails of ONT RNA reads required for finding non-A nucleotides within the given tails.
create_tail_feature_list_A()
Extracts features of poly(A) tails containing only A nucleotides for training-set preparation.
create_tail_chunk_list_trainingset()
Extracts decoration-centered fragments of poly(A) tails for all reads and appends positional data to a nested list.
create_tail_chunk_list_A()
Creates list of tail chunks containing only A nucleotides.
split_tail_centered_trainingset()
Extracts decoration-centered fragments of poly(A) tail signal along with positional coordinates.
split_with_overlaps()
Splits signal into overlapping fragments of equal length.
filter_nonA_chunks_trainingset()
Filters read chunks containing non-adenosine nucleotides of interest for neural network training-set preparation.
filter_signal_by_threshold_trainingset()
Detection of outliers (peaks & valleys) in ONT signal using z-scores.
create_gaf_list_A()
Produces list of GAFs containing exclusively A-nucleotides for neural network training.
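
The overlapping split described for split_with_overlaps() can be illustrated with a minimal Python sketch. This is not the package's R implementation: the parameter names `segment` and `overlap` are assumptions, and the source does not say how a trailing remainder shorter than one fragment is handled, so the sketch simply drops it.

```python
def split_with_overlaps(signal, segment, overlap):
    """Split a sequence into windows of `segment` items sharing `overlap` items.

    Hypothetical parameter names; illustrative only.
    """
    if segment <= 0 or not 0 <= overlap < segment:
        raise ValueError("need segment > 0 and 0 <= overlap < segment")
    step = segment - overlap  # distance between consecutive window starts
    # Keep only full-length windows; a shorter trailing remainder is dropped.
    last_start = len(signal) - segment
    return [signal[i:i + segment] for i in range(0, last_start + 1, step)]


chunks = split_with_overlaps(list(range(10)), segment=4, overlap=2)
```

With a 10-sample signal, `segment=4` and `overlap=2`, consecutive windows start 2 samples apart, so each window shares its last two samples with the next one.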

Data postprocessing

Functions for correcting, reclassifying, and reshaping ninetails output tables after the pipeline has run.

correct_class_data()
Corrects the classification of reads contained in the class_data table.
correct_residue_data()
Marks uncertain positions of non-A residues in ninetails output data.
correct_labels()
Correct read class labels for backward compatibility
reclassify_ninetails_data()
Reclassifies ambiguous non-A residues to mitigate potential errors inherited from nanopolish segmentation.
read_class_single()
Reads ninetails read_classes data frame from file.
read_class_multiple()
Reads multiple ninetails read_classes outputs at once.
read_residue_single()
Reads ninetails nonadenosine_residues data from file.
read_residue_multiple()
Reads multiple ninetails nonadenosine_residues outputs at once.
merge_nonA_tables()
Merges ninetails tabular outputs (read classes and nonadenosine residue data) to produce one concise table.
spread_nonA_residues()
Reshapes nonadenosine_residues data frame to wide format.

Annotation

Functions for biological annotation of ninetails results using external databases.

annotate_with_biomart()
Annotate ninetails output data with biomaRt

Statistics

Functions for statistical analysis and quantification of non-adenosine residues across reads and conditions.

calculate_fisher()
Perform Fisher's exact test per transcript with BH p-value adjustment
nonA_fisher()
Perform Fisher's exact test on a single transcript in ninetails output
count_class()
Counts read classes found in a read_classes data frame produced by the ninetails pipeline.
count_nonA_abundance()
Counts reads by the number of non-A occurrences per read.
count_residues()
Counts non-A residues found in a nonadenosine_residues data frame produced by the ninetails pipeline.
summarize_nonA()
Produces summary table of non-A occurrences within an analyzed dataset.
nanopolish_qc()
Aggregates nanopolish polya quality control information.
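
The per-transcript testing described for calculate_fisher() combines two standard steps: a Fisher's exact test on a 2x2 table, then Benjamini-Hochberg (BH) adjustment across transcripts. The Python sketch below shows both steps from first principles. It is an illustration under assumptions, not the package code: the function names and the table layout (for example, reads with and without non-A residues in two conditions) are hypothetical.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum all tables that are at most as likely as the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for rank_from_top, idx in enumerate(order):
        rank = n - rank_from_top  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-transcript tables: (a, b, c, d)
tables = {"tx1": (3, 1, 1, 3), "tx2": (10, 0, 0, 10)}
raw = {tx: fisher_exact_2x2(*t) for tx, t in tables.items()}
adj = dict(zip(raw, bh_adjust(list(raw.values()))))
```

Adjusting after all transcripts have been tested matters because BH depends on the rank of each p-value among the full set.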

Visualisation

Plotting functions for inspection of raw signals, GAF images, classification results, and statistical summaries.

plot_class_counts()
Plots read class counts per category assigned to the analyzed reads.
plot_gaf()
Creates a visual representation of the Gramian angular field corresponding to the given poly(A) tail fragment (chunk).
plot_multiple_gaf()
Creates a visual representation of multiple Gramian angular fields based on the provided gaf_list (plots all GAFs from the given list).
plot_nanopolish_qc()
Plots QC data (qc_tag) inherited from the nanopolish polya function.
plot_nonA_abundance()
Plot abundances of reads with a given number of non-A residues per read
plot_panel_characteristics()
Plot panel characteristics of ninetails output
plot_residue_counts()
Plot counts of nonadenosine residues found in ninetails output data
plot_rug_density()
Scatterplot of non-A residue positions within the poly(A) tail
plot_squiggle_fast5()
Draws an entire squiggle for a given read.
plot_squiggle_pod5()
Draws an entire squiggle for a given read from a POD5 file.
plot_tail_chunk()
Draws a portion of the poly(A) tail squiggle (chunk) for a given read.
plot_tail_distribution()
Plots poly(A) tail length (or estimated non-A position) distribution in analyzed sample(s).
plot_tail_range_fast5()
Draws the tail range squiggle for a given read.
plot_tail_range_pod5()
Draws the tail range squiggle for a given read from a POD5 file.

Analysis dashboard

Interactive Shiny application for exploring ninetails results. Supports single-sample and multi-sample analysis via YAML configuration, including read classification, residue composition, poly(A) distributions, and raw signal visualization with non-A modification overlay.

launch_signal_browser()
Launch the Ninetails Analysis Dashboard

tailfindr compatibility

Functions for converting tailfindr output into a format compatible with the ninetails Guppy legacy pipeline.

convert_tailfindr_output()
Converts tailfindr results to format compatible with ninetails
check_polya_length_filetype()
Check and convert poly(A) length file format

Signal processing

Core signal processing utilities and CNN-related helpers used internally by the pipeline functions.

filter_signal_by_threshold()
Detect outliers (peaks and valleys) in ONT signal using z-scores
winsorize_signal()
Winsorize nanopore signal
substitute_gaps()
Substitute short zero-gaps surrounded by nonzero pseudomoves
combine_gafs()
Combine GASF and GADF into a two-channel array
predict_gaf_classes()
Classify Gramian Angular Field matrices with a pretrained CNN
load_keras_model()
Load Keras model for multiclass signal prediction
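
The Gramian Angular Field encoding referred to by combine_gafs() (GASF and GADF stacked as two channels) can be sketched in plain Python. This follows the commonly used definition: rescale the signal to [-1, 1], take the angle φ = arccos(x), then GASF[i][j] = cos(φᵢ + φⱼ) and GADF[i][j] = sin(φᵢ − φⱼ). The function names, the rescaling choice, and the channel order are assumptions, and the package may differ in these details.

```python
import math


def gramian_angular_fields(signal):
    """Return (GASF, GADF) matrices for a 1-D signal as nested lists."""
    lo, hi = min(signal), max(signal)
    if hi == lo:
        raise ValueError("signal must contain at least two distinct values")
    # Rescale to [-1, 1] so that arccos is defined for every sample.
    scaled = [2 * (x - lo) / (hi - lo) - 1 for x in signal]
    # Clamp to guard against tiny floating-point overshoot.
    phi = [math.acos(max(-1.0, min(1.0, x))) for x in scaled]
    gasf = [[math.cos(a + b) for b in phi] for a in phi]
    gadf = [[math.sin(a - b) for b in phi] for a in phi]
    return gasf, gadf


def combine_channels(gasf, gadf):
    """Stack two n x n matrices into an n x n x 2 nested list."""
    return [[[s, d] for s, d in zip(row_s, row_d)]
            for row_s, row_d in zip(gasf, gadf)]


gasf, gadf = gramian_angular_fields([0, 5, 10])
image = combine_channels(gasf, gadf)  # shape: 3 x 3 x 2
```

The summation field is symmetric and the difference field is antisymmetric with a zero diagonal, so the two channels carry complementary information about the same signal chunk.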

Sequence helpers

Helper functions for primer matching and DNA sequence manipulation used by the cDNA orientation classification step.

reverse_complement()
Generate reverse complement of a DNA sequence
edit_distance_hw()
Calculate edit distance with sliding window (HW mode)
count_trailing_chars()
Count trailing occurrences of a character in a string
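
The three helpers above correspond to well-known string operations, sketched here in Python for illustration. These are not the package's R implementations. "HW mode" is assumed to mean infix alignment as in the edlib library: the query must align in full, while unaligned target bases before and after the match are free.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T/N, case preserved)."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_trailing_chars(text, char):
    """Number of consecutive occurrences of `char` at the end of `text`."""
    if len(char) != 1:
        raise ValueError("char must be a single character")
    return len(text) - len(text.rstrip(char))


def edit_distance_hw(query, target):
    """Minimum edit distance between `query` and any substring of `target`."""
    # Row 0 is all zeros: the alignment may start anywhere in the target.
    prev = [0] * (len(target) + 1)
    for i, q in enumerate(query, start=1):
        curr = [i]  # aligning i query bases against an empty target prefix
        for j, t in enumerate(target, start=1):
            curr.append(min(prev[j] + 1,               # deletion from query
                            curr[j - 1] + 1,           # insertion into query
                            prev[j - 1] + (q != t)))   # match or mismatch
        prev = curr
    # Minimum over the last row: the alignment may end anywhere in the target.
    return min(prev)


rc = reverse_complement("AACG")
tail_len = count_trailing_chars("ACGTTT", "T")
exact = edit_distance_hw("ACGT", "TTACGTTT")
one_off = edit_distance_hw("ACGT", "TTACCTTT")
```

Because the start and end of the target are free, this kind of distance suits locating a short primer inside a longer read, which is the use case named in the section description.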

Input validation

Internal assertion and type-checking utilities used throughout the package for input validation.

assert_condition()
Assert condition is TRUE, stop with message if FALSE
assert_dir_exists()
Assert directory exists with informative error
assert_file_exists()
Assert file exists with informative error
check_fast5_filetype()
Check if the provided directory contains Fast5 files in the correct format
check_output_directory()
Check and handle existing output directory for ninetails analysis
is_RNA()
Check if fast5 file contains RNA reads
is_multifast5()
Check if fast5 file is multi-read format
is_string()
Test if x is a single non-empty character string
no_na()
Check for no NA values
get_mode()
Calculate the statistical mode of a numeric vector

Package

ninetails ninetails-package
ninetails: Nonadenosine Nucleotides in Poly(A) Tails
`%>%`
Pipe operator