Generates a five-panel summary visualization describing read categories, poly(A) tail properties, and non-A residue distributions derived from the ninetails pipeline output. Together, these panels give a comprehensive characterization of a given pool of reads (representing a particular transcript or a set of transcripts).

Usage

plot_panel_characteristics(
  input_residue_data,
  input_class_data = NULL,
  input_merged_nonA_tables_data = NULL,
  type = "default",
  max_length = 300,
  direction_5_prime = TRUE
)

Arguments

input_residue_data

Data frame containing non-A residue predictions.

input_class_data

Optional data frame containing read classification output from ninetails. Mutually exclusive with input_merged_nonA_tables_data.

input_merged_nonA_tables_data

Optional data frame returned by merge_nonA_tables. Mutually exclusive with input_class_data.

type

Character string. Either "default" or "moderna". The "moderna" option marks the default UCUAG pentamer position (100 nt).

max_length

Numeric. Maximum poly(A) tail length displayed in distribution panels.

direction_5_prime

Logical. If TRUE (default), non-A positions are reported from the 5' end of the poly(A) tail. If FALSE, positions are recalculated relative to the 3' end.
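
The recalculation described for direction_5_prime = FALSE can be pictured with a small sketch. This is not the package's own code: flip_to_3_prime is a hypothetical helper, and it assumes positions are 1-based and that the 3' position is obtained by mirroring the 5' position within the tail length. The exact offset used internally may differ.

```r
# Hypothetical helper, not part of ninetails: mirror 1-based positions counted
# from the 5' end of the tail so that they are counted from the 3' end.
# Assumption: position 1 from the 5' end equals position `polya_length`
# from the 3' end.
flip_to_3_prime <- function(est_nonA_pos, polya_length) {
  stopifnot(length(est_nonA_pos) == length(polya_length))
  polya_length - est_nonA_pos + 1
}

# Two made-up reads, both with a 100 nt tail.
flipped <- flip_to_3_prime(est_nonA_pos = c(10, 95), polya_length = c(100, 100))
print(flipped)
```

Under this convention a residue near the 5' end of the tail ends up with a large 3'-relative position, and vice versa.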

Value

A patchwork-assembled ggplot object containing panels A–E.

Details

The function assembles panels A–E into a patchwork layout:

  • A – Read categories (blank, non-A containing, total)

  • B – Counts of reads containing C, G, or U residues

  • C – Distribution of poly(A) tail lengths

  • D – Normalized distribution of non-A positions

  • E – Raw distribution of non-A positions

Internal column filtering

This function internally subsets the input data to a predefined set of columns. Any additional columns present in the input data frames are silently dropped. Conversely, every column listed below must be present in the supplied inputs.

Required columns

input_residue_data

Must contain at least the following columns:

readname

Character. Unique read identifier.

prediction

Character or factor. Predicted non-A residue (e.g., "C", "G", "U").

est_nonA_pos

Numeric. Estimated position of the non-A residue within the poly(A) tail.

polya_length

Numeric. Estimated poly(A) tail length.

These columns are used for:

  • binning positions and tail lengths (Panels D and E),

  • computing normalized residue frequencies,

  • generating residue-level counts and labels.
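
Since all four columns must be present, it can help to check an input before plotting. The following is a minimal sketch in base R; missing_columns is a hypothetical helper rather than a ninetails function, and the toy data frame holds made-up values.

```r
# Columns documented above as required in input_residue_data.
required_residue_cols <- c("readname", "prediction", "est_nonA_pos", "polya_length")

# Hypothetical helper (not a ninetails function): returns the names of
# required columns that are absent from a data frame.
missing_columns <- function(df, required) {
  setdiff(required, colnames(df))
}

# Toy data frame with made-up values; polya_length is left out on purpose.
toy_residue_data <- data.frame(
  readname = c("read_1", "read_2"),
  prediction = c("C", "G"),
  est_nonA_pos = c(12, 40)
)

absent <- missing_columns(toy_residue_data, required_residue_cols)
print(absent)
```

An empty result means every required column is available; any returned name points to a column to add before calling plot_panel_characteristics.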

input_class_data (if supplied)

Must contain at least:

readname

Character. Required for merging with residue data.

group

Character or factor. Experimental group identifier.

This input is internally merged with input_residue_data using merge_nonA_tables.

input_merged_nonA_tables_data (if supplied)

Must be the output of merge_nonA_tables and contain at least the following columns (used for tail distribution, summarization and plotting):

sample

Character. Sample identifier.

group

Character or factor. Experimental group identifier.

readname

Character. Unique read identifier.

prediction

Character or factor. Non-A residue prediction.

est_nonA_pos

Numeric. Estimated non-A position.

polya_length

Numeric. Poly(A) tail length.

class

Character. Read classification label.

comments

Character. Additional classification comments.

transcript

Character. Transcript name.

ensembl_transcript_id_full

Character. Full Ensembl transcript ID.

ensembl_transcript_id_short

Character. Short Ensembl transcript ID.

prediction_C

Numeric or logical. Indicator/count of C residues.

prediction_G

Numeric or logical. Indicator/count of G residues.

prediction_U

Numeric or logical. Indicator/count of U residues.

nonA_residues

Character. Encoded non-A residue information; NA indicates blank tail.

These columns are required because they are internally retained for:

  • classification summaries (Panels A and B),

  • tail length distributions (Panel C),

  • grouping and normalization logic,

  • detection of blank vs non-A containing reads.
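
The blank versus non-A distinction rests on NA values in nonA_residues, as documented above. A rough sketch of how the Panel A categories could be tallied from that column follows. It assumes one row per read, and the non-NA strings in the toy data are placeholders, not the real ninetails encoding.

```r
# Toy merged table with made-up values; only the columns needed here.
# Assumption: one row per read. The non-NA strings are placeholders and do
# not reproduce the actual nonA_residues encoding.
toy_merged <- data.frame(
  readname = c("read_1", "read_2", "read_3", "read_4"),
  nonA_residues = c(NA, "placeholder_a", NA, "placeholder_b"),
  stringsAsFactors = FALSE
)

# NA in nonA_residues marks a blank tail.
is_blank <- is.na(toy_merged$nonA_residues)

# Counts corresponding to the three categories named for Panel A.
read_categories <- c(
  blank = sum(is_blank),
  nonA_containing = sum(!is_blank),
  total = nrow(toy_merged)
)
print(read_categories)
```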

Single-group requirement

The input data must represent exactly one experimental group. If multiple unique values are detected in the group column, the function stops with an error.

Multiple samples within a single group are allowed.
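
A check of this kind can be run before calling the function. The sketch below is only an illustration: check_single_group and its error message are hypothetical, not the package's actual implementation or wording, and the toy table uses made-up sample and group labels.

```r
# Hypothetical pre-flight check (not the ninetails implementation):
# stop unless the group column holds exactly one unique value.
check_single_group <- function(df) {
  groups <- unique(as.character(df$group))
  if (length(groups) != 1) {
    stop("Expected exactly one group, found: ", paste(groups, collapse = ", "))
  }
  invisible(df)
}

# Toy table with made-up labels: two WT samples and one KO sample.
toy <- data.frame(
  sample = c("WT_1", "WT_2", "KO_1"),
  group = c("WT", "WT", "KO")
)

# Subsetting to one group keeps both WT samples, which is allowed.
wt_only <- toy[toy$group == "WT", ]
check_single_group(wt_only)

# The unfiltered table holds two groups, so the check fails.
mixed_result <- try(check_single_group(toy), silent = TRUE)
print(inherits(mixed_result, "try-error"))
```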

Examples

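if (FALSE) { # \dontrun{
# Subset both inputs to a single group, since the function accepts exactly one.
# residue_data and class_data are assumed to be existing ninetails outputs
# that carry a group column.
residue_data_wt <- residue_data[residue_data$group == "WT", ]
class_data_wt <- class_data[class_data$group == "WT", ]

plot_panel_characteristics(
  input_residue_data = residue_data_wt,
  input_class_data = class_data_wt,
  type = "default",
  max_length = 100
)
} # }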