
Produces a summary table of non-A occurrences within an analyzed dataset.
Source: R/ninetails_data_postprocessing_functions.R
summarize_nonA.Rd
Creates a per-transcript summary table with read counts, non-A residue counts and hits, and poly(A) tail length statistics, grouped by user-defined factors (e.g. sample, condition).
Arguments
- merged_nonA_tables
Data frame or tibble. Output of merge_nonA_tables.
- summary_factors
Character string or vector of strings. Column name(s) used for grouping (default: "group").
- transcript_id_column
Character string. Column containing the transcript identifier (default: "ensembl_transcript_id_short", as added during data pre-processing; can be changed by the user).
Value
A tibble with one row per transcript per group, containing:
- polya_median
Numeric. Median poly(A) tail length for the transcript.
- polya_mean
Numeric. Mean poly(A) tail length for the transcript.
- counts_total
Integer. Total number of reads mapped to the transcript.
- counts_blank
Integer. Number of reads with no non-A residues.
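- counts_nonA / hits_nonA
Integer. Number of reads with at least one non-A residue / total number of non-A hits.
- counts_C / hits_C
Integer. Number of reads with at least one C / total number of C hits.
- counts_G / hits_G
Integer. Number of reads with at least one G / total number of G hits.
- counts_U / hits_U
Integer. Number of reads with at least one U / total number of U hits.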
Details
The distinction between counts and hits:
counts — number of reads containing at least one occurrence of a given non-A residue type.
hits — total number of occurrences of a given non-A residue type across all reads (a single read may contribute multiple hits).
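The counts/hits distinction can be illustrated with a small base-R sketch. It does not call ninetails and does not reproduce the internals of summarize_nonA. The toy data frame, its column names (transcript, n_C, n_G, n_U) and the helper summarize_toy are all hypothetical, and the sketch assumes that non-A hits are simply the sum of C, G and U occurrences.

```r
# Toy data: one row per read, with the number of C, G and U residues
# detected in its poly(A) tail. These column names are illustrative
# assumptions, not the actual ninetails table schema.
reads <- data.frame(
  transcript = c("T1", "T1", "T1", "T2", "T2"),
  n_C = c(0L, 2L, 1L, 0L, 0L),
  n_G = c(0L, 0L, 1L, 0L, 3L),
  n_U = c(0L, 0L, 0L, 0L, 0L)
)

# Hypothetical helper: per-transcript counts (reads with >= 1 occurrence)
# and hits (total occurrences across all reads).
summarize_toy <- function(df) {
  per_transcript <- split(df, df$transcript)
  rows <- lapply(per_transcript, function(d) {
    n_nonA <- d$n_C + d$n_G + d$n_U  # assumed: non-A hits = C + G + U
    data.frame(
      transcript   = d$transcript[1],
      counts_total = nrow(d),
      counts_blank = sum(n_nonA == 0L),
      counts_nonA  = sum(n_nonA > 0L),
      hits_nonA    = sum(n_nonA),
      counts_C     = sum(d$n_C > 0L),
      hits_C       = sum(d$n_C),
      counts_G     = sum(d$n_G > 0L),
      hits_G       = sum(d$n_G),
      counts_U     = sum(d$n_U > 0L),
      hits_U       = sum(d$n_U)
    )
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

toy_summary <- summarize_toy(reads)
print(toy_summary)
```

In this toy input, transcript T1 has two reads carrying C residues and one carrying a G, but one read carries both, so counts_nonA (2) is smaller than counts_C + counts_G (3). The hits columns, by contrast, add up across residue types under the assumption stated above.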
See also
merge_nonA_tables for preparing the input, calculate_fisher for statistical testing on merged data, annotate_with_biomart for adding gene-level annotation.