
Processes raw poly(A) tail signals and pseudomoves (as generated by Dorado) in parallel to extract candidate signal fragments potentially containing non-A nucleotides. Each fragment is 100 signal points long, centered on pseudomove runs of sufficient length, and is returned with its positional coordinates. The resulting data are organized into a nested list keyed by read IDs.

Usage

create_tail_chunk_list_dorado(tail_feature_list, num_cores)

Arguments

tail_feature_list

list object produced by create_tail_feature_list or an equivalent Dorado-tail feature extraction function. Must contain per-read entries with $tail_signal and $tail_pseudomoves.

num_cores

numeric [1]. Number of physical cores to use for processing. Use at most one fewer than the number of cores available on your machine.
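One possible way to pick a value that respects this limit is sketched below. The helper safe_num_cores() and its reserve argument are hypothetical and not part of ninetails; only parallel::detectCores() is an existing function.

```r
# Hypothetical helper (not part of ninetails): choose a value for num_cores
# that leaves `reserve` physical cores free for the rest of the system.
safe_num_cores <- function(reserve = 1L) {
  available <- parallel::detectCores(logical = FALSE)
  # detectCores() can return NA on some platforms; fall back to a single core.
  if (is.na(available)) {
    return(1L)
  }
  max(1L, as.integer(available) - reserve)
}

num_cores <- safe_num_cores()
```

The result can then be passed as the num_cores argument. On a single-core machine the helper still returns 1, so the call remains valid.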

Value

A nested list containing the segmented tail data (chunks and coordinates), organized by read IDs. Each read entry contains one or more fragments, where each fragment is a list with:

  • chunk_sequence: numeric vector of raw signal values (length 100)

  • chunk_start_pos: integer, starting index of the chunk

  • chunk_end_pos: integer, ending index of the chunk
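To make the shape of the return value concrete, the sketch below builds a small mock of it and shows two ways of inspecting it. The read IDs and signal values are made up, and the mock assumes that fragments within a read are stored as an unnamed list and that coordinates are inclusive (so end minus start plus one equals 100); neither detail is stated above.

```r
# Mock of the documented return structure. Read IDs and signal values are
# invented for illustration; real output comes from the function itself.
set.seed(1)
mock_chunk_list <- list(
  "read-0001" = list(
    list(chunk_sequence = rnorm(100), chunk_start_pos = 51L, chunk_end_pos = 150L)
  ),
  "read-0002" = list(
    list(chunk_sequence = rnorm(100), chunk_start_pos = 1L, chunk_end_pos = 100L),
    list(chunk_sequence = rnorm(100), chunk_start_pos = 201L, chunk_end_pos = 300L)
  )
)

# Number of fragments found in each read
fragments_per_read <- lengths(mock_chunk_list)

# Flatten the coordinates into a data frame, one row per fragment
chunk_coords <- do.call(rbind, lapply(names(mock_chunk_list), function(read_id) {
  fragments <- mock_chunk_list[[read_id]]
  data.frame(
    read_id   = read_id,
    start_pos = vapply(fragments, function(f) f$chunk_start_pos, integer(1)),
    end_pos   = vapply(fragments, function(f) f$chunk_end_pos, integer(1))
  )
}))
```

A single fragment's signal is reached by read ID and fragment index, for example mock_chunk_list[["read-0002"]][[1]]$chunk_sequence.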

Details

This Dorado-specific function differs from the Guppy-based version in two ways:

  • moves are not used (to avoid costly BAM parsing and processing)

  • pseudomoves are corrected at the tail ends (last 3 values forced to 0)
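The per-read segmentation described on this page can be illustrated roughly as follows. This is a simplified sketch, not the ninetails implementation: the function extract_chunks_sketch() is hypothetical, the run-length threshold min_run = 5 is an assumed value (the documentation only says "sufficient length"), and windows near the signal edges are simply shifted inward, which may differ from how the package handles them.

```r
# Simplified, hypothetical sketch of per-read chunk extraction.
# Assumptions: pseudomoves has the same length as the signal, non-zero runs of
# at least `min_run` values mark candidates, and each window is `chunk_len` long.
extract_chunks_sketch <- function(tail_signal, tail_pseudomoves,
                                  min_run = 5L, chunk_len = 100L) {
  n <- length(tail_signal)
  stopifnot(length(tail_pseudomoves) == n, n >= chunk_len)

  # Tail-end correction: force the last 3 pseudomoves to 0
  tail_pseudomoves[(n - 2L):n] <- 0L

  # Run-length encode and locate non-zero runs that are long enough
  runs <- rle(tail_pseudomoves)
  run_end <- cumsum(runs$lengths)
  run_start <- run_end - runs$lengths + 1L
  keep <- runs$values != 0 & runs$lengths >= min_run

  lapply(which(keep), function(i) {
    centre <- (run_start[i] + run_end[i]) %/% 2L
    # Centre the window on the run, shifting it to stay inside the signal
    start <- min(max(1L, centre - chunk_len %/% 2L + 1L), n - chunk_len + 1L)
    end <- start + chunk_len - 1L
    list(
      chunk_sequence  = tail_signal[start:end],
      chunk_start_pos = start,
      chunk_end_pos   = end
    )
  })
}

# Toy input: one long run, one short run, and one run at the very end
signal <- sin(seq_len(300) / 10)
pseudomoves <- integer(300)
pseudomoves[140:149] <- 1L   # long enough, yields a chunk
pseudomoves[200:202] <- -1L  # too short, ignored
pseudomoves[298:300] <- 1L   # removed by the tail-end correction
chunks <- extract_chunks_sketch(signal, pseudomoves)
```

With this toy input only the run at positions 140 to 149 produces a fragment; the short run and the run in the last three positions are both discarded.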

Parallelization is handled with foreach and doSNOW, allowing efficient scaling across multiple CPU cores. A progress bar is displayed to monitor job completion.
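The general pattern of combining foreach with doSNOW and a text progress bar looks roughly like the sketch below. The wrapper par_lapply_progress() is hypothetical and is not how ninetails is implemented internally; it assumes the foreach and doSNOW packages are installed.

```r
library(foreach)

# Hypothetical wrapper (not the ninetails implementation): apply `fun` to each
# element of `x` on a local cluster while showing a text progress bar.
par_lapply_progress <- function(x, fun, num_cores = 2L) {
  cl <- parallel::makeCluster(num_cores)
  on.exit(parallel::stopCluster(cl), add = TRUE)
  doSNOW::registerDoSNOW(cl)

  pb <- utils::txtProgressBar(min = 0, max = length(x), style = 3)
  on.exit(close(pb), add = TRUE)
  # doSNOW calls this function each time a task finishes
  opts <- list(progress = function(n) utils::setTxtProgressBar(pb, n))

  result <- foreach::foreach(item = x, .options.snow = opts) %dopar% fun(item)
  names(result) <- names(x)
  result
}

# Toy usage: sum two small vectors on two workers
totals <- par_lapply_progress(list(a = 1:3, b = 4:6), sum, num_cores = 2L)
```

The cluster is stopped and the progress bar closed through on.exit(), so both are cleaned up even if a worker raises an error.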

Examples

if (FALSE) { # \dontrun{
tcl_dorado <- ninetails::create_tail_chunk_list_dorado(
  tail_feature_list = tfl,
  num_cores = 3
)
} # }