Extracts decoration-centered fragments of poly(A) tail signal along with positional coordinates.

Splits a single read's poly(A) tail signal into fixed-length (100 data-point) chunks, each centered on a potential non-A decoration detected from the pseudomove vector. Chunks whose start position falls before index 1 are left-padded with values randomly sampled from the three most frequent signal values.
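
The left-padding rule can be illustrated with a minimal sketch. This is not the package's implementation: extract_padded_chunk is a hypothetical helper, the toy signal is invented, and the handling of ties among the most frequent values is an assumption.

```r
# Hypothetical helper (not part of the package): extract one fixed-length
# chunk and left-pad it when the requested start falls before index 1.
extract_padded_chunk <- function(signal, start, window = 100L) {
  end <- start + window - 1L
  # This sketch only covers left-padding; the right edge must be in range.
  stopifnot(end >= 1L, end <= length(signal))
  if (start >= 1L) {
    return(signal[start:end])
  }
  n_pad <- 1L - start  # number of positions missing before index 1
  # Up to three most frequent signal values (tie order is an assumption).
  freq <- sort(table(signal), decreasing = TRUE)
  top <- as.numeric(head(names(freq), 3))
  # Index with sample.int() to avoid sample()'s length-one special case.
  padding <- top[sample.int(length(top), n_pad, replace = TRUE)]
  c(padding, signal[1:end])
}

set.seed(1)
# Toy signal: three dominant values plus 40 values that occur once each.
signal <- c(rep(c(600, 605, 610), times = 40), 650:689)
chunk <- extract_padded_chunk(signal, start = -36L)
length(chunk)
```

With start = -36, the first 37 positions of the chunk are sampled padding and the remaining 63 positions are the first 63 points of the signal.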

Usage

split_tail_centered_trainingset(readname, tail_feature_list)

Arguments

readname: Character string. Name (UUID) of the given read.
tail_feature_list: List object produced by create_tail_feature_list_trainingset.

Value

A named nested list where each element corresponds to one chunk and contains:

chunk_sequence: Numeric vector (length 100). The signal fragment centered on the potential modification.
chunk_start_pos: Integer. Start index of the chunk within the full tail signal (may be zero or negative for left-padded chunks).
chunk_end_pos: Integer. End index of the chunk within the full tail signal.
pseudomoves: Numeric vector (length 100). Recomputed pseudomoves for the chunk. Coordinates are from the 3' end.
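
As a sketch of how the returned structure might be consumed, the snippet below builds a mock list with the documented shape and collects the chunk coordinates into a data frame. The object mock_chunks and all values in it are invented for illustration; only the element names come from the description above.

```r
# Mock output with the documented shape; all values are invented.
set.seed(1)
mock_chunks <- list(
  "1234-anexample-r3adn4m3_1" = list(
    chunk_sequence = rnorm(100, mean = 600, sd = 5),
    chunk_start_pos = -36L,
    chunk_end_pos = 63L,
    pseudomoves = rep(0, 100)
  ),
  "1234-anexample-r3adn4m3_2" = list(
    chunk_sequence = rnorm(100, mean = 600, sd = 5),
    chunk_start_pos = 192L,
    chunk_end_pos = 291L,
    pseudomoves = rep(0, 100)
  )
)

# Gather per-chunk coordinates into one table, keyed by chunk name.
coords <- data.frame(
  chunk = names(mock_chunks),
  start = vapply(mock_chunks, function(x) x$chunk_start_pos, integer(1)),
  end = vapply(mock_chunks, function(x) x$chunk_end_pos, integer(1)),
  row.names = NULL
)
coords
```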

Details

This training-set variant includes an additional pseudomoves element in each chunk sublist, making the output suitable for supervised training/validation data preparation. Pseudomoves for each extracted chunk are recomputed by calling filter_signal_by_threshold on the chunk sequence.

The centering procedure:

1. Runs RLE on the pseudomove vector.
2. Selects runs of length >= 4 with non-zero values (empirical modification threshold).
3. Centers a 100-element window on the midpoint of each selected run.
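
The steps above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: find_centered_windows is a hypothetical helper, the pseudomove values are invented, and the exact window offset around the midpoint (49 points before, 50 after) is an assumption.

```r
# Hypothetical helper (not part of the package): locate candidate runs in a
# pseudomove vector and compute centered window coordinates.
find_centered_windows <- function(pseudomoves, window = 100L, min_run = 4L) {
  runs <- rle(pseudomoves)                     # step 1: run-length encoding
  run_ends <- cumsum(runs$lengths)
  run_starts <- run_ends - runs$lengths + 1L
  # Step 2: keep non-zero runs that are at least min_run long.
  keep <- runs$lengths >= min_run & runs$values != 0
  # Step 3: center a window on the midpoint of each kept run.
  midpoints <- (run_starts[keep] + run_ends[keep]) %/% 2L
  starts <- midpoints - window %/% 2L + 1L     # assumed offset convention
  data.frame(
    midpoint = midpoints,
    chunk_start_pos = starts,
    chunk_end_pos = starts + window - 1L
  )
}

# Toy pseudomove vector: one non-zero run is too short (length 3) and is skipped.
pm <- c(rep(0, 10), rep(1, 5), rep(0, 200), rep(-1, 3),
        rep(0, 20), rep(-1, 6), rep(0, 60))
windows <- find_centered_windows(pm)
windows
```

The first window starts before index 1, which is the case where left-padding would apply.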

Chunk names follow the convention <readname>_<index>, where index is the sequential position of the modification within the read (numbered from the 3' end).

Examples

if (FALSE) { # \dontrun{

split_tail_centered_trainingset(
  readname = "1234-anexample-r3adn4m3",
  tail_feature_list = tail_feature_list)

} # }
