
Process a single (unsplit) poly(A) data file through the Guppy pipeline.
Source: R/ninetails_check_tails_guppy.R
process_polya_complete.Rd
Runs the full ninetails analysis on a small-to-moderate poly(A) data
set (up to part_size rows). Performs format validation,
feature extraction, tail segmentation, GAF computation, CNN
classification, and output assembly.
Usage
process_polya_complete(
  polya_data,
  sequencing_summary,
  workspace,
  num_cores,
  basecall_group,
  pass_only,
  qc,
  save_dir,
  prefix,
  cli_log,
  ...
)
Arguments
- polya_data
Character string or data frame. Full path of the poly(A) length file (.tsv), or an in-memory data frame.
- sequencing_summary
Character string or data frame. Full path of the sequencing summary file, or an in-memory data frame.
- workspace
Character string. Full path of the directory containing basecalled multi-Fast5 files.
- num_cores
Numeric [1]. Number of physical cores.
- basecall_group
Character string ["Basecall_1D_000"]. Fast5 hierarchy level for data extraction.
- pass_only
Logical [TRUE]. If TRUE, only "PASS" reads are included.
- qc
Logical [TRUE]. If TRUE, terminal artefact positions are labelled with "-WARN".
- save_dir
Character string. Output directory path.
- prefix
Character string (optional). Output file name prefix.
- cli_log
Function. Logging closure defined in check_tails_guppy for formatted console and log file output.
- ...
Additional arguments (currently unused).
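Both polya_data and sequencing_summary accept either a file path or an in-memory data frame. The sketch below shows one way such an input could be normalised to a data frame. The helper resolve_input() and the column names are hypothetical and are not part of ninetails; the package's actual handling may differ.

```r
# Hypothetical helper (not part of ninetails): accept either a file path or
# a data frame and always return a data frame.
resolve_input <- function(x) {
  if (is.data.frame(x)) {
    return(x)
  }
  if (is.character(x) && length(x) == 1 && file.exists(x)) {
    # Tab-separated file with a header row, as for a .tsv poly(A) length file
    return(utils::read.delim(x, stringsAsFactors = FALSE))
  }
  stop("Input must be a data frame or a path to an existing file.")
}

# In-memory data frame (column names are illustrative assumptions)
df <- data.frame(
  readname = c("read1", "read2"),
  polya_length = c(42.5, 80.1)
)
from_memory <- resolve_input(df)

# The same data written to a temporary .tsv file and read back
tmp <- tempfile(fileext = ".tsv")
utils::write.table(df, tmp, sep = "\t", quote = FALSE, row.names = FALSE)
from_file <- resolve_input(tmp)
```

Normalising the input once at the start lets the rest of a pipeline work with a single type, whichever form the caller supplied.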
Value
A named list with two data frames:
- read_classes
Per-read classification data.
- nonadenosine_residues
Detailed non-A residue positional data.
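The returned list can be handled like any named list of data frames. The following sketch uses a mock object with the two documented element names; the columns and values are illustrative assumptions, not the real output schema.

```r
# Mock of the documented return structure. Column names and values are
# illustrative assumptions only.
result <- list(
  read_classes = data.frame(
    readname = c("read1", "read2", "read3"),
    class = c("decorated", "blank", "blank")
  ),
  nonadenosine_residues = data.frame(
    readname = "read1",
    prediction = "C",
    est_nonA_pos = 17
  )
)

# Access each element by name
classes <- result$read_classes
residues <- result$nonadenosine_residues

# Count reads per class
class_counts <- table(classes$class)

# Attach residue details to every read; reads without a non-A residue get NA
merged <- merge(classes, residues, by = "readname", all.x = TRUE)
```

A left join (all.x = TRUE) keeps reads that have no entry in nonadenosine_residues, which an inner join would drop.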
Details
This function is called internally by
check_tails_guppy (for inputs within the
part_size limit) and by
process_polya_parts (for each part of a split input).
It requires the cli_log closure defined inside
check_tails_guppy for formatted logging, and therefore
should not be called directly by the user.
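The relationship described above, a parent function that creates a logging closure and then routes the input by size, can be sketched with stand-in functions. Everything in this block is hypothetical: make_cli_log() and the *_stub functions are not ninetails functions, and the comparison against part_size is assumed to be inclusive based on the phrase "up to part_size rows".

```r
# Hypothetical logger factory: returns a closure that remembers log_file.
make_cli_log <- function(log_file) {
  function(message, level = "INFO") {
    line <- sprintf("[%s] %s", level, message)
    cat(line, "\n", sep = "", file = log_file, append = TRUE)
    invisible(line)
  }
}

# Stand-in for the unsplit path: processes all rows in one pass.
process_complete_stub <- function(data, cli_log) {
  cli_log(sprintf("Processing %d rows in one pass", nrow(data)))
  "complete"
}

# Stand-in for the split path: processes each part with the unsplit worker.
process_parts_stub <- function(data, part_size, cli_log) {
  part_id <- ceiling(seq_len(nrow(data)) / part_size)
  parts <- split(data, part_id)
  cli_log(sprintf("Splitting %d rows into %d parts", nrow(data), length(parts)))
  vapply(parts, process_complete_stub, character(1), cli_log = cli_log)
  "parts"
}

# Stand-in for the parent: builds the closure, then dispatches by row count.
dispatch_stub <- function(data, part_size, log_file) {
  cli_log <- make_cli_log(log_file)
  if (nrow(data) <= part_size) {
    process_complete_stub(data, cli_log)
  } else {
    process_parts_stub(data, part_size, cli_log)
  }
}

log_file <- tempfile(fileext = ".log")
route_small <- dispatch_stub(data.frame(id = 1:3), part_size = 5, log_file = log_file)
route_large <- dispatch_stub(data.frame(id = 1:10), part_size = 5, log_file = log_file)
log_lines <- readLines(log_file)
```

Because the worker receives cli_log as an argument rather than creating it, it has no logger of its own when called outside the parent, which illustrates why such a function is treated as internal.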
See also
check_tails_guppy which calls this function,
process_polya_parts for the split-input variant,
check_polya_length_filetype for format detection,
create_tail_feature_list,
create_tail_chunk_list,
create_gaf_list,
predict_gaf_classes,
create_outputs for the individual pipeline steps.