Iterates over the poly(A) data part files produced by split_polya_data, processes each one sequentially with process_polya_complete, saves the per-part outputs, and merges all results into a single output list.

Usage

process_polya_parts(
  part_files,
  sequencing_summary,
  workspace,
  num_cores,
  basecall_group,
  pass_only,
  qc,
  save_dir,
  prefix,
  cli_log,
  ...
)

Arguments

part_files

Character vector. File paths to poly(A) data part files (as returned by split_polya_data).

sequencing_summary

Character string or data frame. Full path of the sequencing summary file, or an in-memory data frame.

workspace

Character string. Full path of the directory containing basecalled multi-Fast5 files.

num_cores

Numeric [1]. Number of physical cores.

basecall_group

Character string ["Basecall_1D_000"]. Fast5 hierarchy level for data extraction.

pass_only

Logical [TRUE]. If TRUE, only "PASS" reads are included.

qc

Logical [TRUE]. If TRUE, terminal artefact positions are labelled with "-WARN".

save_dir

Character string. Output directory path.

prefix

Character string (optional). Output file name prefix.

cli_log

Function. Logging closure defined in check_tails_guppy.

...

Additional arguments passed to process_polya_complete.

Value

A named list with two data frames (merged across all parts):

read_classes

Per-read classification data.

nonadenosine_residues

Detailed non-A residue positional data.

Details

This function is called internally by check_tails_guppy when the input exceeds the part_size threshold. It requires the cli_log closure defined inside check_tails_guppy for formatted logging, and therefore should not be called directly by the user.

For each part, intermediate results are saved as TSV files in a dedicated subdirectory (part_<i>_of_<n>) within save_dir. After all parts are processed, read_classes and nonadenosine_residues tables are merged with rbind.
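
The per-part loop and final merge described above can be sketched roughly as follows. This is a minimal illustration in base R, not the package's actual implementation: process_part_stub is a hypothetical stand-in for process_polya_complete, the column names (readname, class, prediction) and the output file names (read_classes.tsv, nonadenosine_residues.tsv) are assumptions, and the cli_log closure is omitted. Only the part_<i>_of_<n> subdirectory layout and the rbind merge are taken from the description.

```r
# Hypothetical stand-in for process_polya_complete(); the columns it returns
# here are illustrative assumptions, not the package's real output schema.
process_part_stub <- function(part_file) {
  read_id <- paste0(basename(part_file), "_read1")
  list(
    read_classes = data.frame(
      readname = read_id, class = "decorated", stringsAsFactors = FALSE
    ),
    nonadenosine_residues = data.frame(
      readname = read_id, prediction = "C", stringsAsFactors = FALSE
    )
  )
}

# Sketch of the part-wise workflow: process each part, save its tables as TSV
# in part_<i>_of_<n>, then row-bind the per-part tables.
process_parts_sketch <- function(part_files, save_dir, prefix = "") {
  n <- length(part_files)
  results <- vector("list", n)

  for (i in seq_along(part_files)) {
    # Dedicated subdirectory for this part's intermediate results
    part_dir <- file.path(save_dir, sprintf("part_%d_of_%d", i, n))
    dir.create(part_dir, recursive = TRUE, showWarnings = FALSE)

    res <- process_part_stub(part_files[i])

    # Save each table as a tab-separated file (file names are assumed)
    for (tbl in names(res)) {
      out_file <- file.path(part_dir, paste0(prefix, tbl, ".tsv"))
      utils::write.table(res[[tbl]], out_file,
                         sep = "\t", quote = FALSE, row.names = FALSE)
    }
    results[[i]] <- res
  }

  # Merge the per-part tables into one data frame per table type
  list(
    read_classes = do.call(rbind, lapply(results, `[[`, "read_classes")),
    nonadenosine_residues =
      do.call(rbind, lapply(results, `[[`, "nonadenosine_residues"))
  )
}

# Example run with placeholder part file names and a temporary output directory
save_dir <- tempfile("polya_parts_")
merged <- process_parts_sketch(
  c("part_1.tsv", "part_2.tsv", "part_3.tsv"),
  save_dir = save_dir
)
```

Keeping every part's result in a list and binding once at the end avoids growing a data frame inside the loop, which gets slow when there are many parts.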

See also

check_tails_guppy, which calls this function; split_polya_data, for the input splitting step; process_polya_complete, for the per-part processing.