
Perform Fisher's exact test per transcript with BH p-value adjustment
Source:R/ninetails_statistic.R
calculate_fisher.RdRuns Fisher's exact test for each transcript in ninetails output data
and applies the Benjamini-Hochberg (BH) step-up procedure to control
the false discovery rate. This is a high-level wrapper for
fisher.test and p.adjust with
additional data wrangling features.
Usage
calculate_fisher(
ninetails_data,
transcript_id_column = "ensembl_transcript_id_short",
min_reads = 0,
min_nonA_reads = 0,
grouping_factor = "sample_name",
condition1 = NA,
condition2 = NA,
alpha = 0.05,
base = "C",
...
)Arguments
- ninetails_data
Data frame. The output of
merge_nonA_tables(merged tabular output containing read classification and non-A position data).- transcript_id_column
Character string. Column with transcript ID data. Default:
"ensembl_transcript_id_short"; can be changed by the user.- min_reads
Numeric. Minimum number of reads representing a given transcript to include it in the analysis. Default: 0. Keep in mind that including many transcripts with low coverage increases the risk of rejecting the true null hypothesis (Benjamini-Hochberg procedure).
- min_nonA_reads
Numeric. Minimum number of reads containing non-adenosine residues (sum of C, G, U) per transcript to include it in the analysis. This prevents considering too many observations as non-significant during p-value adjustment. Non-A-containing reads are typically a small fraction of the total pool, so additional filtering can provide more meaningful results. Default: 0.
- grouping_factor
Character string. Name of the grouping variable. Default:
"sample_name".- condition1
Character string. First level of
grouping_factorto use for comparison. Required whengrouping_factorhas more than 2 levels.- condition2
Character string. Second level of
grouping_factorto use for comparison. Required whengrouping_factorhas more than 2 levels.- alpha
Numeric. Significance threshold for FDR. Default: 0.05.
- base
Character string. Letter representing the non-A nucleotide for which statistics are computed. Accepted values:
"C","G","U", or"all"(for all non-A residues combined). Default:"C".- ...
Additional parameters passed to
nonA_fisher(under development).
Value
A tibble with per-transcript test results, sorted by adjusted p-value. Columns include:
- <transcript_id_column>
Transcript identifier
- p.value
Numeric. Raw p-value from Fisher's exact test
- stats_code
Character. Descriptive status code (see
stat_codes_listfor full definitions)- padj
Numeric. BH-adjusted p-value
- significance
Character.
"FDR<alpha"if significant,"NotSig"otherwise
Acknowledgements
Inspired by the Nanotail package written and maintained by Pawel Krawczyk (smaegol): https://github.com/LRB-IIMCB/nanotail/blob/dev/R/polya_stats.R. Many thanks to the developer of the original source code.
See also
nonA_fisher for the per-transcript Fisher's exact test,
merge_nonA_tables for preparing the input,
summarize_nonA for contingency table computation
Examples
if (FALSE) { # \dontrun{
test <- ninetails::calculate_fisher(
ninetails_data = merged_nonA_tables,
transcript_id_column = "ensembl_transcript_id_short",
min_reads = 100,
min_nonA_reads = 10,
grouping_factor = "sample_name",
condition1 = "WT",
condition2 = "KO",
alpha = 0.05,
base = "C"
)
} # }