
Extracts features of poly(A) tails of ONT RNA reads required for finding non-A nucleotides within the given tails.
Source: R/ninetails_training_dataset_production_functions.R
create_tail_feature_list_trainingset.Rd
This is the training-set variant of the feature extraction wrapper. It
processes all reads in parallel, calling
extract_tail_data_trainingset for each read, then filters
out reads with zero-moved tails or pseudomove chains too short to
represent potential modifications.
Usage
create_tail_feature_list_trainingset(
  nanopolish,
  sequencing_summary,
  workspace,
  num_cores,
  basecall_group,
  pass_only = TRUE
)
Arguments
- nanopolish
Character string. Full path of the .tsv file produced by nanopolish polya.
- sequencing_summary
Character string. Full path of the .txt file with the sequencing summary.
- workspace
Character string. Full path of the directory containing basecalled multi-Fast5 files.
- num_cores
Numeric [1]. Number of physical cores to use. Use at most one fewer than the number of cores available.
- basecall_group
Character string ["Basecall_1D_000"]. Name of the level in the Fast5 file hierarchy from which the data should be extracted.
- pass_only
Logical [TRUE]. If TRUE, only reads tagged by nanopolish as "PASS" are retained. Otherwise, reads tagged as "PASS" or "SUFFCLIP" are included.
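The following is a minimal sketch, not part of the package, of how two of the arguments above might be chosen or interpreted. It assumes that `parallel::detectCores(logical = FALSE)` is an acceptable way to count physical cores on your platform, and that the nanopolish output carries the tags in a `qc_tag` column next to a `readname` column; the toy data frame and the `keep_tags` helper are hypothetical and only illustrate the `pass_only` semantics described here.

```r
# Physical core count; detectCores() may return NA on some platforms.
physical_cores <- parallel::detectCores(logical = FALSE)
if (is.na(physical_cores)) physical_cores <- 1L

# Leave one core free, but never go below one worker.
num_cores <- max(1L, physical_cores - 1L)

# Toy stand-in for nanopolish polya output (assumed column names).
polya <- data.frame(
  readname = c("r1", "r2", "r3", "r4"),
  qc_tag = c("PASS", "SUFFCLIP", "ADAPTER", "PASS"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: which tags survive for a given pass_only value.
keep_tags <- function(pass_only) {
  if (pass_only) "PASS" else c("PASS", "SUFFCLIP")
}

strict <- polya$readname[polya$qc_tag %in% keep_tags(TRUE)]
relaxed <- polya$readname[polya$qc_tag %in% keep_tags(FALSE)]
```

With `pass_only = TRUE` only `r1` and `r4` remain in this toy table; with `pass_only = FALSE` the `SUFFCLIP` read `r2` is kept as well.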
Value
A named list with three elements:
- tail_feature_list
Named list of per-read feature lists (as returned by extract_tail_data_trainingset).
- zeromoved_readnames
Character vector. Read IDs discarded because their tail moves summed to zero.
- nonpseudomoved_readnames
Character vector. Read IDs discarded because their pseudomove chains were too short (< 4).
Always assign this returned list to a variable; printing the full list to the console may crash the R session.
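One way to follow this advice is to inspect the result with summaries rather than printing it. The sketch below works on a mock object with the three documented element names; the read IDs and the empty per-read lists are placeholders, not real output.

```r
# Mock of the returned structure; contents are placeholders, not real features.
res <- list(
  tail_feature_list = list(read_1 = list(), read_2 = list()),
  zeromoved_readnames = c("read_3"),
  nonpseudomoved_readnames = c("read_4", "read_5")
)

# Compact overview of the top level instead of printing the whole object.
str(res, max.level = 1)

# Count how many reads ended up in each category.
counts <- c(
  kept = length(res$tail_feature_list),
  zeromoved = length(res$zeromoved_readnames),
  nonpseudomoved = length(res$nonpseudomoved_readnames)
)
```

On real data, `str(res, max.level = 1)` and `counts` stay small regardless of how many reads were processed, which avoids dumping every per-read signal to the console.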
Details
The function differs from its production counterpart in that it retains reads whose pseudomove chains satisfy a length >= 4 criterion, which is required for subsequent modification-centered chunk splitting. Two categories of discarded reads are tracked (zero-moved and non-pseudomoved) and returned alongside the valid feature list.
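The filtering described above can be illustrated with a rough sketch. This is not the package's implementation: the field names `moves` and `pseudomoves`, the helper `longest_chain`, and the reading of "chain" as a run of consecutive identical non-zero pseudomove values are all assumptions made for the example.

```r
# Toy per-read records; field names are hypothetical.
reads <- list(
  read_a = list(moves = c(0, 0, 0, 0, 0, 0), pseudomoves = c(0, 0, 0, 0, 0, 0)),
  read_b = list(moves = c(1, 0, 1, 0, 1, 0), pseudomoves = c(0, 1, 1, 0, 0, 0)),
  read_c = list(moves = c(1, 0, 1, 1, 0, 1), pseudomoves = c(0, 1, 1, 1, 1, 0))
)

# Longest run of identical non-zero pseudomove values (assumed definition).
longest_chain <- function(pseudomoves) {
  runs <- rle(pseudomoves)
  nonzero <- runs$lengths[runs$values != 0]
  if (length(nonzero) == 0) 0L else max(nonzero)
}

# Category 1: tail moves sum to zero.
zero_moved <- vapply(reads, function(r) sum(r$moves) == 0, logical(1))

# Category 2: not zero-moved, but longest chain shorter than 4.
chain_len <- vapply(reads, function(r) longest_chain(r$pseudomoves), integer(1))
too_short <- !zero_moved & chain_len < 4

# Same three-element shape as the documented return value.
result <- list(
  tail_feature_list = reads[!zero_moved & !too_short],
  zeromoved_readnames = names(reads)[zero_moved],
  nonpseudomoved_readnames = names(reads)[too_short]
)
```

In this toy input, `read_a` is zero-moved, `read_b` has a chain of length 2 and is discarded as non-pseudomoved, and `read_c` has a chain of length 4 and is retained.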
See also
extract_tail_data_trainingset for the per-read extraction step,
create_tail_chunk_list_trainingset for the next pipeline step,
extract_polya_data for input data preparation.