
Extract tail features of a single RNA read from a multi-Fast5 file
Source: R/ninetails_core_functions.R (extract_tail_data.Rd)
Extracts metadata and signal features of a single RNA read from a
multi-Fast5 file basecalled by Guppy. The tail signal, as delimited by
the nanopolish polya function, is extracted, winsorized (to remove signal
cliffs), and downsampled to 20% of its original length to facilitate
further analysis. Pseudomoves are computed from the processed signal
using filter_signal_by_threshold.
Arguments
- readname
Character string. Name of the given read (UUID) within the analyzed dataset.
- polya_summary
Data frame. The table containing data extracted from nanopolish and sequencing summary, as produced by
extract_polya_data.
- workspace
Character string. Full path of the directory containing the basecalled multi-Fast5 files.
- basecall_group
Character string. Name of the level in the Fast5 file hierarchy from which data should be extracted (e.g.,
"Basecall_1D_000").
Value
A named list containing per-read tail features:
- fast5_filename
Character. Name of the source Fast5 file
- tail_signal
Numeric vector. Winsorized and downsampled poly(A) tail signal
- tail_moves
Numeric vector. Downsampled basecaller moves for the tail region
- tail_pseudomoves
Numeric vector. Pseudomove states computed from the tail signal ({-1, 0, 1})
Always assign the returned list to a variable. Printing the full output to the console may crash your R session.
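One way to follow this advice is to inspect the assigned result with summarizing functions instead of printing it whole. The sketch below uses a mock list that only mimics the documented element names; the values, the file name, and the variable name tail_features are made up for illustration and do not come from ninetails.

```r
set.seed(42)

# Mock object with the same element names as the documented return value.
# All values here are invented; a real call would supply them instead.
tail_features <- list(
  fast5_filename   = "example.fast5",
  tail_signal      = rnorm(2000, mean = 80, sd = 2),
  tail_moves       = sample(c(0, 1), 2000, replace = TRUE),
  tail_pseudomoves = sample(c(-1, 0, 1), 2000, replace = TRUE)
)

# Compact structural overview: shows types and the first few values only.
str(tail_features, vec.len = 3)

# Look at a short slice of the signal rather than the whole vector.
head(tail_features$tail_signal, 10)

# Summarize the pseudomove states instead of listing them.
table(tail_features$tail_pseudomoves)
```

The same pattern applies to any long per-read vector: str(), head(), length(), and table() give a bounded amount of console output regardless of tail length.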
Details
The function performs the following operations:
- Reads raw signal from the multi-Fast5 file via rhdf5
- Retrieves basecaller moves and stride information
- Extracts the poly(A) tail region of the signal based on nanopolish coordinates
- Applies winsorization to the tail signal (winsorize_signal)
- Downsamples both signal and moves to 20% of original length via linear interpolation
- Computes pseudomoves using filter_signal_by_threshold
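The winsorization and downsampling steps can be illustrated with a minimal base R sketch. It is not the ninetails implementation: the helper names winsorize_sketch and downsample_sketch are hypothetical, the 0.5% and 99.5% quantile cutoffs are an assumption, and stats::approx is used here simply as one standard way to resample by linear interpolation.

```r
# Hypothetical helper: clamp values lying outside the chosen quantiles.
# The cutoffs are an assumption, not taken from winsorize_signal.
winsorize_sketch <- function(x, probs = c(0.005, 0.995)) {
  limits <- stats::quantile(x, probs = probs, names = FALSE)
  pmin(pmax(x, limits[1]), limits[2])
}

# Hypothetical helper: resample a vector to a fraction of its length
# at equally spaced positions using linear interpolation.
downsample_sketch <- function(x, fraction = 0.2) {
  n_out <- max(1L, round(length(x) * fraction))
  stats::approx(x = seq_along(x), y = x, n = n_out)$y
}

# Simulated tail signal with two artificial "cliffs".
set.seed(1)
raw_tail <- rnorm(500, mean = 80, sd = 2)
raw_tail[c(100, 300)] <- c(200, -50)

processed <- downsample_sketch(winsorize_sketch(raw_tail))

length(raw_tail)   # 500 samples before processing
length(processed)  # 100 samples, i.e. 20% of the original length
range(processed)   # the cliffs at 200 and -50 are no longer present
```

Winsorizing before downsampling keeps extreme spikes from being blended into neighboring points by the interpolation. Basecaller moves are binary, so interpolated values may need rounding back to 0/1; whether ninetails does this is not stated here.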
See also
extract_polya_data for preparing the polya_summary
input, create_tail_feature_list for batch extraction,
filter_signal_by_threshold for pseudomove computation