
Splits a large BAM file into smaller parts based on the read IDs listed in a corresponding Dorado summary file. Splitting keeps memory use manageable when processing large cDNA datasets. The function keeps only the reads that are present in the summary file and writes output files that each hold approximately part_size reads.
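The chunking idea described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: chunk_read_ids() is a hypothetical helper, the summary is assumed to be tab-delimited with a read_id column, and the BAM filtering step itself is left out.

```r
# Hypothetical helper (not part of the package): split a vector of read IDs
# into consecutive chunks of at most `part_size` IDs each.
chunk_read_ids <- function(read_ids, part_size = 100000) {
  stopifnot(part_size >= 1)
  if (length(read_ids) == 0) {
    return(list())
  }
  # Reads 1..part_size get index 1, the next part_size get index 2, and so on.
  part_index <- ceiling(seq_along(read_ids) / part_size)
  unname(split(read_ids, part_index))
}

# Assumption: the Dorado summary is tab-delimited and has a `read_id` column.
# A tiny stand-in file is written here so the sketch runs on its own.
summary_path <- tempfile(fileext = ".txt")
writeLines(
  c(
    "filename\tread_id",
    "a.pod5\tr1",
    "a.pod5\tr2",
    "a.pod5\tr3",
    "a.pod5\tr4",
    "a.pod5\tr5"
  ),
  summary_path
)

summary_tbl <- read.delim(summary_path, stringsAsFactors = FALSE)
chunks <- chunk_read_ids(unique(summary_tbl$read_id), part_size = 2)

# Five read IDs with part_size = 2 give chunks of 2, 2 and 1 IDs.
lengths(chunks)
```

Each chunk would then be the set of read IDs used to filter the BAM file into one output part. The last chunk can be smaller than part_size, which is why part_size is described as a target rather than an exact count.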

Usage

split_bam_file_cdna(
  bam_file,
  dorado_summary,
  part_size = 100000,
  save_dir,
  part_number,
  cli_log = message
)

Arguments

bam_file

Character string. Path to input BAM file to be split.

dorado_summary

Character string. Path to corresponding Dorado summary file containing read IDs to include in this part.

part_size

Integer. Target number of reads per output file part. Defaults to 100000.

save_dir

Character string. Directory where split BAM files will be saved.

part_number

Integer. Part number for naming output files.

cli_log

Function used to log messages and progress. Defaults to message.

Value

Character vector of output BAM file paths created.

Examples

if (FALSE) { # \dontrun{
bam_files <- split_bam_file_cdna(
  bam_file = "large_dataset.bam",
  dorado_summary = "summary_part1.txt",
  part_size = 40000,
  save_dir = "bam_parts/",
  part_number = 1,
  cli_log = message
)
} # }