This vignette describes functions for loading, processing,
annotating, and analyzing ninetails output data. All
functions documented here operate on the two output tables produced by
the main pipelines (check_tails_dorado_DRS(),
check_tails_dorado_cDNA(), or
check_tails_guppy()):
- read_classes: Per-read classification (decorated/blank/unclassified)
- nonadenosine_residues: Modification-level detail for decorated reads
For visualization of these data, see
vignette("plotting") and
vignette("shiny_app").
Reading files into R
Reading single files
# Read read_classes output
class_path <- "/path/to/read_classes.tsv"
class_data <- ninetails::read_class_single(class_path)
# Read nonadenosine_residues output
residue_path <- "/path/to/nonadenosine_residues.tsv"
residue_data <- ninetails::read_residue_single(residue_path)| Function | Input | Output |
|---|---|---|
read_class_single() |
Path to a single read_classes file |
Data frame with readname, contig,
polya_length, qc_tag, class,
comments
|
read_residue_single() |
Path to a single nonadenosine_residues file |
Data frame with readname, contig,
prediction, est_nonA_pos,
polya_length, qc_tag
|
Reading multiple files
For experiments with multiple samples, use
read_class_multiple() and
read_residue_multiple() with a metadata table. The metadata
table must contain columns sample_name, group,
class_path, and residue_path.
# Define metadata table
samples_table <- data.frame(
sample_name = c("sample_1", "sample_2", "sample_3", "sample_4"),
group = c("control", "control", "treatment", "treatment"),
class_path = c(
"/path/to/sample_1/read_classes.tsv",
"/path/to/sample_2/read_classes.tsv",
"/path/to/sample_3/read_classes.tsv",
"/path/to/sample_4/read_classes.tsv"
),
residue_path = c(
"/path/to/sample_1/nonadenosine_residues.tsv",
"/path/to/sample_2/nonadenosine_residues.tsv",
"/path/to/sample_3/nonadenosine_residues.tsv",
"/path/to/sample_4/nonadenosine_residues.tsv"
)
)
# Read all files at once
class_data <- ninetails::read_class_multiple(samples_table)
residue_data <- ninetails::read_residue_multiple(samples_table)The output data frames include additional sample_name
and group columns (as factors), which are used as grouping
variables in downstream analysis and plotting.
Using a YAML configuration file
Alternatively, provide metadata in a YAML configuration file. This is
the same format used by the Shiny dashboard (see
vignette("shiny_app")):
config <- yaml::yaml.load_file("config.yml")
samples_table <- data.frame(t(sapply(config$samples, unlist)))
rownames(samples_table) <- NULL
class_data <- ninetails::read_class_multiple(samples_table)
residue_data <- ninetails::read_residue_multiple(samples_table)Example config.yml:
samples:
sample_1:
sample_name: sample_1
group: control
class_path: /path/to/sample_1/read_classes.tsv
residue_path: /path/to/sample_1/nonadenosine_residues.tsv
sample_2:
sample_name: sample_2
group: treatment
class_path: /path/to/sample_2/read_classes.tsv
residue_path: /path/to/sample_2/nonadenosine_residues.tsvAnnotation with biomaRt
annotate_with_biomart() adds gene symbols and Ensembl
transcript IDs to ninetails output tables by querying biomaRt. This
enables transcript-level filtering by human-readable gene names instead
of raw contig identifiers.
# Annotate class data
class_data <- ninetails::annotate_with_biomart(
class_data,
species = "mmusculus"
)
# Annotate residue data
residue_data <- ninetails::annotate_with_biomart(
residue_data,
species = "mmusculus"
)Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
data |
data.frame | required | Ninetails output table with a contig column |
species |
character | required | Species identifier (see below) |
Supported species
| Identifier | Species |
|---|---|
"mmusculus" |
Mus musculus |
"hsapiens" |
Homo sapiens |
"athaliana" |
Arabidopsis thaliana |
"scerevisiae" |
Saccharomyces cerevisiae |
"celegans" |
Caenorhabditis elegans |
"tbrucei" |
Trypanosoma brucei |
See function documentation for other species and custom biomaRt datasets.
Output columns added
| Column | Description |
|---|---|
symbol |
Gene symbol (e.g., Actb, Gapdh) |
ensembl_transcript_id_short |
Ensembl transcript ID without version suffix |
Note: Annotation should be performed before merging, summarizing, or launching the dashboard, since downstream functions and the dashboard use the
symbolcolumn for transcript labels when available.
Correcting classification
Sometimes nucleotides from AT-rich transcript ends are misidentified
as poly(A) tails. Use reclassify_ninetails_data() to
minimize segmentation artifacts by comparing detected residue positions
against known transcript sequences:
ninetails_data <- ninetails::reclassify_ninetails_data(
residue_data = residue_data,
class_data = class_data,
grouping_factor = "sample_name",
transcript_column = "ensembl_transcript_id_short",
ref = "mmusculus"
)
# Retrieve corrected data frames
class_data <- ninetails_data[[1]]
residue_data <- ninetails_data[[2]]Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
residue_data |
data.frame | required | Non-A residue table |
class_data |
data.frame | required | Read classification table |
grouping_factor |
character | required | Grouping column (e.g., "sample_name") |
transcript_column |
character | required | Column with transcript IDs |
ref |
character | required | Reference species (same identifiers as
annotate_with_biomart()) |
Note: Apply this function before further analysis. It requires annotated data (i.e., run
annotate_with_biomart()first).
Merging results
Combine class and residue data into a single table with one row per
read. Decorated reads gain a nonA_residues column
summarizing all detected positions.
merged_tables <- ninetails::merge_nonA_tables(
class_data = class_data,
residue_data = residue_data,
pass_only = TRUE
)Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
class_data |
data.frame | required | Read classification table |
residue_data |
data.frame | required | Non-A residue table |
pass_only |
logical | TRUE |
If TRUE, include only QC-passed reads |
Output
The merged table contains all columns from class_data
plus a nonA_residues column. For decorated reads, this
column contains all non-A positions from 5’ to 3’ separated by
: (e.g., "C:G" for a read with a cytidine and
a guanosine). For blank and unclassified reads, the column is
NA.
Spreading residue columns
spread_nonA_residues() reshapes the merged table to
create separate columns for each non-A position, which can be useful for
per-position analysis:
spread_table <- ninetails::spread_nonA_residues(
merged_nonA_tables = merged_tables
)The output adds columns nonA_1, nonA_2,
etc., each containing the residue type at that position (or
NA if the read has fewer modifications).
Counting and summarizing
Count by class
# Simple counts
class_counts <- ninetails::count_class(class_data)
# Grouped by sample
class_counts <- ninetails::count_class(
class_data,
grouping_factor = "sample_name"
)
# Detailed (by comments code) vs simplified (by class)
counts_detailed <- ninetails::count_class(class_data, detailed = TRUE)
counts_simple <- ninetails::count_class(class_data, detailed = FALSE)Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
class_data |
data.frame | required | Read classification table |
grouping_factor |
character | NA |
Grouping column |
detailed |
logical | TRUE |
If TRUE, count by comment code; if FALSE,
count by class |
Count residues
residue_counts <- ninetails::count_residues(
residue_data,
grouping_factor = "sample_name"
)Count non-A abundance
Count how many reads have one, two, or three or more non-A residues:
abundance <- ninetails::count_nonA_abundance(
residue_data,
grouping_factor = "sample_name"
)Summarizing by transcript
Generate comprehensive summary statistics per transcript:
summarized <- ninetails::summarize_nonA(
merged_nonA_tables = merged_tables,
summary_factors = "group",
transcript_id_column = "ensembl_transcript_id_short"
)Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
merged_nonA_tables |
data.frame | required | Output from merge_nonA_tables()
|
summary_factors |
character | required | Grouping column(s) for the summary |
transcript_id_column |
character | NULL |
Column with transcript IDs (optional) |
Output columns
| Column | Description |
|---|---|
| Grouping columns | As specified by summary_factors
|
contig / transcript ID |
Transcript identifier |
n_reads |
Total reads mapping to the transcript |
n_decorated |
Number of decorated reads |
n_blank |
Number of blank reads |
n_C_counts / n_G_counts /
n_U_counts
|
Reads containing C, G, or U residues |
n_C_hits / n_G_hits /
n_U_hits
|
Total C, G, or U residue occurrences |
mean_polya_length |
Mean poly(A) tail length |
median_polya_length |
Median poly(A) tail length |
Statistical analysis
Fisher’s exact test
Compare non-adenosine frequencies between two conditions:
fisher_results <- ninetails::calculate_fisher(
ninetails_data,
grouping_factor = "condition",
comparison = c("control", "treatment")
)Parameters
| Parameter | Type | Description |
|---|---|---|
ninetails_data |
data.frame | Class data or merged table |
grouping_factor |
character | Column defining conditions |
comparison |
character(2) | Two condition names to compare |
The function returns a data frame with Fisher’s exact test results (p-value, odds ratio, confidence interval) per transcript.
Typical workflow
library(ninetails)
# 1. Load data from multiple samples
samples <- data.frame(
sample_name = c("WT_1", "WT_2", "KO_1", "KO_2"),
group = c("WT", "WT", "KO", "KO"),
class_path = c("WT_1_classes.tsv", "WT_2_classes.tsv",
"KO_1_classes.tsv", "KO_2_classes.tsv"),
residue_path = c("WT_1_residues.tsv", "WT_2_residues.tsv",
"KO_1_residues.tsv", "KO_2_residues.tsv")
)
class_data <- ninetails::read_class_multiple(samples)
residue_data <- ninetails::read_residue_multiple(samples)
# 2. Annotate with gene symbols
class_data <- ninetails::annotate_with_biomart(class_data, species = "mmusculus")
residue_data <- ninetails::annotate_with_biomart(residue_data, species = "mmusculus")
# 3. Reclassify to remove artifacts
ninetails_data <- ninetails::reclassify_ninetails_data(
residue_data = residue_data,
class_data = class_data,
grouping_factor = "sample_name",
transcript_column = "ensembl_transcript_id_short",
ref = "mmusculus"
)
class_data <- ninetails_data[[1]]
residue_data <- ninetails_data[[2]]
# 4. Count classifications
class_counts <- ninetails::count_class(class_data, grouping_factor = "group")
# 5. Merge and summarize
merged <- ninetails::merge_nonA_tables(class_data, residue_data)
summary <- ninetails::summarize_nonA(merged, summary_factors = "group")
# 6. Statistical comparison
fisher <- ninetails::calculate_fisher(
class_data,
grouping_factor = "group",
comparison = c("WT", "KO")
)
# 7. Explore interactively
ninetails::launch_signal_browser(
config = "config.yml"
)Summary of postprocessing functions
| Function | Description |
|---|---|
read_class_single() |
Load a single read_classes file |
read_class_multiple() |
Load multiple read_classes files with metadata |
read_residue_single() |
Load a single nonadenosine_residues file |
read_residue_multiple() |
Load multiple nonadenosine_residues files with metadata |
annotate_with_biomart() |
Add gene symbols and transcript IDs via biomaRt |
reclassify_ninetails_data() |
Correct misclassifications from AT-rich ends |
correct_class_data() |
Apply corrections to class table |
correct_residue_data() |
Apply corrections to residue table |
correct_labels() |
Fix label inconsistencies |
merge_nonA_tables() |
Combine class and residue data (one row per read) |
spread_nonA_residues() |
Reshape merged table with per-position columns |
count_class() |
Count reads by class or comment code |
count_residues() |
Count residues by type |
count_nonA_abundance() |
Count reads with 1, 2, 3+ non-A residues |
summarize_nonA() |
Per-transcript summary statistics |
calculate_fisher() |
Fisher’s exact test between conditions |
