
Extract poly(A) data from nanopolish output and sequencing summary
Source: R/ninetails_core_functions.R
Extracts features of poly(A) tails of selected RNA reads from the output table provided by the nanopolish polya function and the sequencing summary provided by the sequencer. Filenames are taken from the sequencing summary file. Only reads with tail lengths estimated as >= 10 nt by the nanopolish polya function are taken into account.
Arguments
- nanopolish
Character string or data frame. Either the full path of the .tsv file produced by nanopolish polya function or an in-memory data frame containing nanopolish data.
- sequencing_summary
Character string or data frame. Either the full path of the .txt file with sequencing summary or an in-memory data frame containing sequencing summary data.
- pass_only
Logical. If TRUE (default), only reads tagged by nanopolish as "PASS" are taken into consideration. If FALSE, reads tagged as "PASS" and "SUFFCLIP" are both included in the analysis.
Value
A data frame containing read information organized by the read ID. Columns include:
- readname
Character. Read identifier
- polya_start
Integer. Start position of the poly(A) tail in the raw signal
- transcript_start
Integer. Start position of the transcript in the raw signal
- polya_length
Numeric. Estimated poly(A) tail length in nucleotides
- qc_tag
Character. Nanopolish quality control tag
- filename
Character. Name of the source Fast5 file
Always assign the returned data frame to a variable. Printing the full output to the console may crash your R session.
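A minimal sketch of the assign-then-inspect pattern recommended above. The two file paths are hypothetical placeholders, and the call is guarded so that the snippet does nothing when the package or the files are missing; only the argument names come from this page.

```r
# Hypothetical input paths (assumptions); replace with your own files.
nanopolish_path <- "path/to/nanopolish_output.tsv"
summary_path <- "path/to/sequencing_summary.txt"

if (requireNamespace("ninetails", quietly = TRUE) &&
    file.exists(nanopolish_path) && file.exists(summary_path)) {
  # Assign the result instead of letting it print to the console.
  polya_data <- ninetails::extract_polya_data(
    nanopolish = nanopolish_path,
    sequencing_summary = summary_path,
    pass_only = TRUE
  )
  # Inspect a bounded preview rather than the whole table.
  print(dim(polya_data))
  print(head(polya_data))
}
```

Using dim() and head() keeps the console output small regardless of how many reads the table holds.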
Details
The function performs the following operations:
Reads and validates nanopolish and sequencing summary inputs (accepts both file paths and in-memory data frames)
Filters reads by QC tag (pass_only parameter)
Joins nanopolish poly(A) data with sequencing summary by read name
Filters reads with poly(A) tail length >= 10 nt
Removes duplicate entries from secondary alignments
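The operations listed above can be sketched in base R on toy data. This is an illustration under stated assumptions, not the package's implementation: the helper name sketch_extract is hypothetical, the sequencing summary is assumed to key reads by a read_id column, the join is assumed to be an inner join, and duplicates are assumed to be identified by read name. Input validation and file reading are omitted.

```r
# Toy nanopolish polya table; column names follow the Value section,
# values are made up. Read "r4" appears twice to mimic a secondary alignment.
nanopolish <- data.frame(
  readname = c("r1", "r2", "r3", "r4", "r4"),
  polya_start = c(100L, 200L, 300L, 400L, 400L),
  transcript_start = c(500L, 600L, 700L, 800L, 800L),
  polya_length = c(55.2, 8.1, 30.0, 42.7, 42.7),
  qc_tag = c("PASS", "PASS", "SUFFCLIP", "PASS", "PASS"),
  stringsAsFactors = FALSE
)

# Toy sequencing summary; the read_id column name is an assumption.
sequencing_summary <- data.frame(
  read_id = c("r1", "r2", "r3", "r4"),
  filename = c("a.fast5", "a.fast5", "b.fast5", "b.fast5"),
  stringsAsFactors = FALSE
)

# Hypothetical helper mirroring the documented steps.
sketch_extract <- function(nanopolish, sequencing_summary, pass_only = TRUE) {
  # Step: filter by QC tag according to pass_only.
  keep_tags <- if (pass_only) "PASS" else c("PASS", "SUFFCLIP")
  np <- nanopolish[nanopolish$qc_tag %in% keep_tags, ]
  # Step: join with the sequencing summary by read name (inner join).
  joined <- merge(np, sequencing_summary, by.x = "readname", by.y = "read_id")
  # Step: keep reads with poly(A) tail length >= 10 nt.
  joined <- joined[joined$polya_length >= 10, ]
  # Step: drop duplicate entries, keeping the first row per read.
  joined <- joined[!duplicated(joined$readname), ]
  joined[, c("readname", "polya_start", "transcript_start",
             "polya_length", "qc_tag", "filename")]
}

res_pass <- sketch_extract(nanopolish, sequencing_summary, pass_only = TRUE)
res_all <- sketch_extract(nanopolish, sequencing_summary, pass_only = FALSE)
```

With this toy input, res_pass keeps reads r1 and r4 (r2 is shorter than 10 nt, r3 is tagged SUFFCLIP), while res_all additionally keeps r3.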
See also
extract_tail_data for extracting tail features from individual reads, create_tail_feature_list for batch feature extraction