Meth_Comp API usage
Import module
# Import main module
from pycoMeth.Meth_Comp import Meth_Comp
# optionally inport jupyter helper functions
from pycoMeth.common import head, jhelp
Getting help
jhelp(Meth_Comp)
Meth_Comp (aggregate_fn_list, ref_fasta_fn, output_tsv_fn, output_bed_fn, max_missing, min_diff_llr, sample_id_list, pvalue_adj_method, pvalue_threshold, only_tested_sites, verbose, quiet, progress, kwargs)
Compare methylation values for each CpG positions or intervals between n samples and perform a statistical test to evaluate if the positions are significantly different. For 2 samples a Mann_Withney test is performed otherwise multiples samples are compared with a Kruskal Wallis test. pValues are adjusted for multiple tests using the Benjamini & Hochberg procedure for controlling the false discovery rate.
- aggregate_fn_list (required) [list(str)]
A list of output tsv files corresponding to several samples to compare generated by either CpG_Aggregate or Interval_Aggregate. (can be gzipped)
- ref_fasta_fn (required) [str]
Reference file used for alignment in Fasta format (ideally already indexed with samtools faidx)
- output_tsv_fn (default: None) [str]
Path to write an more extensive result report in TSV format (At least 1 output file is required) (can be gzipped)
- output_bed_fn (default: None) [str]
Path to write a summary result file in BED format (At least 1 output file is required) (can be gzipped)
- max_missing (default: 0) [int]
Max number of missing samples to perform the test
- min_diff_llr (default: 2) [float]
Minimal llr boundary for negative and positive median llr. The test if only performed if at least one sample has a median llr above (methylated) and 1 sample has a median llr below (unmethylated)
- sample_id_list (default: None) [list(str)]
list of sample ids to annotate results in tsv file
- pvalue_adj_method (default: fdr_bh) [str]
Method to use for pValue multiple test adjustment
- pvalue_threshold (default: 0.01) [float]
Alpha parameter (family-wise error rate) for pValue adjustment
- only_tested_sites (default: False) [bool]
Do not include sites that were not tested because of insufficient samples or effect size in the report
verbose (default: False) [bool]
quiet (default: False) [bool]
progress (default: False) [bool]
kwargs
Example usage
Usage with CpG Aggregate output
ff = Meth_Comp (
aggregate_fn_list=[
"./data/Yeast_CpG_1.tsv.gz",
"./data/Yeast_CpG_2.tsv.gz",
"./data/Yeast_CpG_3.tsv.gz",
"./data/Yeast_CpG_4.tsv.gz"],
ref_fasta_fn="./data/yeast.fa",
output_bed_fn="./results/Yeast_CpG_meth_comp.bed",
output_tsv_fn="./results/Yeast_CpG_meth_comp.tsv.gz",
sample_id_list=["S1","S2","S3","S4"],
only_tested_sites=True,
progress=True)
head("./results/Yeast_CpG_meth_comp.tsv.gz")
head("./results/Yeast_CpG_meth_comp.bed")
Usage with Interval Aggregate output with a single significant result
ff = Meth_Comp (
aggregate_fn_list=[
"./data/Yeast_CGI_1.tsv.gz",
"./data/Yeast_CGI_2.tsv.gz",
"./data/Yeast_CGI_3.tsv.gz",
"./data/Yeast_CGI_4.tsv.gz"],
ref_fasta_fn="./data/yeast.fa",
output_bed_fn="./results/Yeast_CGI_meth_comp.bed",
output_tsv_fn="./results/Yeast_CGI_meth_comp.tsv.gz",
sample_id_list=["S1","S2","S3","S4"],
max_missing = 1,
min_diff_llr = 0,
only_tested_sites=False,
progress=True)
head("./results/Yeast_CGI_meth_comp.tsv.gz")
head("./results/Yeast_CGI_meth_comp.bed")
Usage with Interval Aggregate output and larger dataset
# Generate list of file paths and sample ids from source directory
from glob import glob
import os
fn_list = sorted(glob("./data/medaka_CGI_*"))
id_list = [os.path.split(fn)[-1].strip("medaka_CGI_").strip(".tsv.gz") for fn in fn_list]
Meth_Comp (
aggregate_fn_list=fn_list,
ref_fasta_fn="./data/medaka.fa",
output_bed_fn="./results/Medaka_CGI_meth_comp.bed",
output_tsv_fn="./results/Medaka_CGI_meth_comp.tsv.gz",
sample_id_list = id_list,
max_missing = 1,
min_diff_llr = 1,
progress=True)
head("./results/Medaka_CGI_meth_comp.tsv.gz", max_char_col=40)
head("./results/Medaka_CGI_meth_comp.bed", max_char_col=40)