Runs the full ninetails analysis on a small-to-moderate poly(A) data set (up to part_size rows). Performs format validation, feature extraction, tail segmentation, Gramian angular field (GAF) computation, CNN classification, and output assembly.

Usage

process_polya_complete(
  polya_data,
  sequencing_summary,
  workspace,
  num_cores,
  basecall_group,
  pass_only,
  qc,
  save_dir,
  prefix,
  cli_log,
  ...
)

Arguments

polya_data

Character string or data frame. Full path of the poly(A) length file (.tsv), or an in-memory data frame.

sequencing_summary

Character string or data frame. Full path of the sequencing summary file, or an in-memory data frame.

workspace

Character string. Full path of the directory containing basecalled multi-Fast5 files.

num_cores

Numeric [1]. Number of physical cores to use for processing.

basecall_group

Character string ["Basecall_1D_000"]. Fast5 hierarchy level for data extraction.

pass_only

Logical [TRUE]. If TRUE, only "PASS" reads are included.

qc

Logical [TRUE]. If TRUE, terminal artefact positions are labelled with "-WARN".

save_dir

Character string. Output directory path.

prefix

Character string (optional). Output file name prefix.

cli_log

Function. Logging closure defined in check_tails_guppy for formatted console and log file output.

...

Additional arguments (currently unused).
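
The cli_log argument is a closure: a function that keeps access to variables from the function in which it was created. The sketch below shows the general pattern in base R. The names make_logger, log_path, and run_step, and the message format, are hypothetical and do not reproduce the ninetails implementation.

```r
# Hypothetical illustration of a logging closure; not the ninetails code.
make_logger <- function(log_path) {
  # The returned function captures log_path from the enclosing environment.
  function(message, type = "INFO") {
    line <- sprintf("[%s] %s", type, message)
    cat(line, "\n", sep = "")                                  # console output
    cat(line, "\n", sep = "", file = log_path, append = TRUE)  # log file output
    invisible(line)
  }
}

log_file <- tempfile(fileext = ".log")
cli_log <- make_logger(log_file)

# An inner function can receive the closure as an ordinary argument.
run_step <- function(cli_log) {
  cli_log("Starting feature extraction")
  cli_log("Finished", type = "SUCCESS")
}
run_step(cli_log)

# Read back what was written to the log file.
logged <- readLines(log_file)
```

Passing the closure as an argument lets the inner function write to the same log file without knowing its path.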

Value

A named list with two data frames:

read_classes

Per-read classification data.

nonadenosine_residues

Detailed non-A residue positional data.
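
As a sketch of how the returned list might be inspected, the example below builds a mock result with the two documented element names. The column names (readname, class, prediction) and all values are invented for illustration; they are assumptions, not the actual ninetails output schema.

```r
# Mock result with the two documented list elements; columns and values
# are invented for illustration only.
results <- list(
  read_classes = data.frame(
    readname = c("read_a", "read_b", "read_c"),
    class = c("decorated", "blank", "blank"),
    stringsAsFactors = FALSE
  ),
  nonadenosine_residues = data.frame(
    readname = "read_a",
    prediction = "C",
    stringsAsFactors = FALSE
  )
)

# Access each data frame by name and tabulate the per-read classes.
class_counts <- table(results$read_classes$class)
print(class_counts)

# Join residue-level rows back to their per-read classification.
merged <- merge(
  results$nonadenosine_residues,
  results$read_classes,
  by = "readname"
)
print(merged)
```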

Details

This function is called internally by check_tails_guppy (for inputs within the part_size limit) and by process_polya_parts (for each part of a split input). It requires the cli_log closure defined inside check_tails_guppy for formatted logging, and therefore should not be called directly by the user.
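
The dispatch described above can be sketched as follows. This is a simplified illustration, not the package source: dispatch_by_size, process_complete, and process_parts here are hypothetical stand-ins, and the exact comparison used by check_tails_guppy is an assumption.

```r
# Hypothetical stand-ins for the two processing routes.
process_complete <- function(df) {
  list(route = "complete", n = nrow(df))
}

process_parts <- function(df, part_size) {
  # Assign rows to consecutive chunks of at most part_size rows.
  groups <- ceiling(seq_len(nrow(df)) / part_size)
  parts <- split(df, groups)
  list(
    route = "parts",
    n = vapply(parts, nrow, integer(1), USE.NAMES = FALSE)
  )
}

# Inputs within the limit are processed whole; larger inputs are split.
dispatch_by_size <- function(df, part_size) {
  if (nrow(df) <= part_size) {
    process_complete(df)
  } else {
    process_parts(df, part_size)
  }
}

small <- data.frame(readname = paste0("read_", 1:5))
large <- data.frame(readname = paste0("read_", 1:12))

res_small <- dispatch_by_size(small, part_size = 5)
res_large <- dispatch_by_size(large, part_size = 5)
```

With part_size = 5, the 5-row input takes the single-pass route, while the 12-row input is split into parts of 5, 5, and 2 rows.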

See also

check_tails_guppy, which calls this function; process_polya_parts for the split-input variant; check_polya_length_filetype for format detection; and create_tail_feature_list, create_tail_chunk_list, create_gaf_list, predict_gaf_classes, and create_outputs for the individual pipeline steps.