Meth_Comp API usage

Import module

# Import main module 
from pycoMeth.Meth_Comp import Meth_Comp

# optionally inport jupyter helper functions
from pycoMeth.common import head, jhelp

Getting help

jhelp(Meth_Comp)

Meth_Comp (aggregate_fn_list, ref_fasta_fn, output_tsv_fn, output_bed_fn, max_missing, min_diff_llr, sample_id_list, pvalue_adj_method, pvalue_threshold, only_tested_sites, verbose, quiet, progress, kwargs)

Compare methylation values for each CpG positions or intervals between n samples and perform a statistical test to evaluate if the positions are significantly different. For 2 samples a Mann_Withney test is performed otherwise multiples samples are compared with a Kruskal Wallis test. pValues are adjusted for multiple tests using the Benjamini & Hochberg procedure for controlling the false discovery rate.

aggregate_fn_list (required) [list(str)]

A list of output tsv files corresponding to several samples to compare generated by either CpG_Aggregate or Interval_Aggregate. (can be gzipped)

ref_fasta_fn (required) [str]

Reference file used for alignment in Fasta format (ideally already indexed with samtools faidx)

output_tsv_fn (default: None) [str]

Path to write an more extensive result report in TSV format (At least 1 output file is required) (can be gzipped)

output_bed_fn (default: None) [str]

Path to write a summary result file in BED format (At least 1 output file is required) (can be gzipped)

max_missing (default: 0) [int]

Max number of missing samples to perform the test

min_diff_llr (default: 2) [float]

Minimal llr boundary for negative and positive median llr. The test if only performed if at least one sample has a median llr above (methylated) and 1 sample has a median llr below (unmethylated)

sample_id_list (default: None) [list(str)]

list of sample ids to annotate results in tsv file

pvalue_adj_method (default: fdr_bh) [str]

Method to use for pValue multiple test adjustment

pvalue_threshold (default: 0.01) [float]

Alpha parameter (family-wise error rate) for pValue adjustment

only_tested_sites (default: False) [bool]

Do not include sites that were not tested because of insufficient samples or effect size in the report

verbose (default: False) [bool]
quiet (default: False) [bool]
progress (default: False) [bool]
kwargs

Example usage

Usage with CpG Aggregate output

ff = Meth_Comp (
    aggregate_fn_list=[
        "./data/Yeast_CpG_1.tsv.gz", 
        "./data/Yeast_CpG_2.tsv.gz", 
        "./data/Yeast_CpG_3.tsv.gz", 
        "./data/Yeast_CpG_4.tsv.gz"],
    ref_fasta_fn="./data/yeast.fa",
    output_bed_fn="./results/Yeast_CpG_meth_comp.bed",
    output_tsv_fn="./results/Yeast_CpG_meth_comp.tsv.gz",
    sample_id_list=["S1","S2","S3","S4"],
    only_tested_sites=True,
    progress=True)

head("./results/Yeast_CpG_meth_comp.tsv.gz")
head("./results/Yeast_CpG_meth_comp.bed")

## Checking options and input files ##
## Parsing files ##
    Reading input files header and checking consistancy between headers
    Starting asynchronous file parsing
    Progress: 37.1M bytes [00:07, 5.12M bytes/s]                       
    Adjust pvalues
    Writing output file
    Progress: 100%|██████████| 344/344 [00:00<00:00, 18.8k sites/s]
    Results summary
        Insufficient samples: 211,400
        Insufficient effect size: 33,071
        Valid: 344
        Non-significant pvalue: 344

chromosome start  end    n_samples pvalue              adj_pvalue          neg_med pos_med ambiguous_med labels                med_llr_list                raw_llr_list                                       comment                
I          118799 118800 4         0.29978058859571194 0.4279805197751711  3       1       0             ["S1","S2","S3","S4"] [2.005,-2.645,-6.64,-5.51]  [[-1.82,5.83],[-0.09,-5.2],[-2.67,-10.61],[-6.7...Non-significant pvalue 
I          141415 141416 4         0.11800901597381579 0.3742064977444858  1       1       2             ["S1","S2","S3","S4"] [-4.175,0.135,3.02,-0.02]   [[-1.33,-4.52,-3.83,-4.67],[-1.42,1.69],[6.62,-...Non-significant pvalue 
I          151819 151820 4         0.1315327225513005  0.3742064977444858  1       1       2             ["S1","S2","S3","S4"] [-2.98,-1.95,2.93,-1.48]    [[-3.68,-4.4,0.54,-2.28],[-3.45,-1.95,-1.21],[4...Non-significant pvalue 
I          167230 167237 4         0.10985439586067039 0.3742064977444858  3       1       0             ["S1","S2","S3","S4"] [-6.305,-6.94,3.6,-4.17]    [[-5.06,-7.55],[-6.94,-7.26,-1.12],[3.6,5.68,0....Non-significant pvalue 
II         64407  64408  4         0.16469672531155308 0.3742064977444858  1       1       2             ["S1","S2","S3","S4"] [-1.06,-0.19,-3.68,2.93]    [[-1.06,-0.81,-2.26],[1.65,-2.03],[-3.85,-2.85,...Non-significant pvalue 
II         102416 102417 4         0.06267205003507882 0.3742064977444858  2       1       1             ["S1","S2","S3","S4"] [0.715,-3.05,2.17,-5.535]   [[-0.21,1.64],[-3.3,-2.8],[2.24,2.1],[-4.52,-2....Non-significant pvalue 
II         108834 108835 4         0.1633613146761696  0.3742064977444858  3       1       0             ["S1","S2","S3","S4"] [2.95,-5.16,-3.53,-4.53]    [[4.7,1.2],[-5.89,-4.43],[-6.55,-0.25,-3.53],[-...Non-significant pvalue 
II         110457 110458 4         0.04436795794391088 0.3742064977444858  2       1       1             ["S1","S2","S3","S4"] [-1.405,2.69,-3.285,-7.045] [[-0.97,-1.34,-1.47,-3.97],[-0.84,6.22],[-7.1,0...Non-significant pvalue 
II         112989 112990 4         0.3618050274975317  0.46268003516412975 1       1       2             ["S1","S2","S3","S4"] [-1.38,2.1,-1.94,-2.93]     [[-1.38,4.34,-5.48,-5.45,-0.93],[3.4,0.8],[-1.3...Non-significant pvalue 

track name=meth_comp itemRgb=On
I   118799  118800  .   0   .   118799  118800  230,230,230
I   141415  141416  .   0   .   141415  141416  230,230,230
I   151819  151820  .   0   .   151819  151820  230,230,230
I   167230  167237  .   0   .   167230  167237  230,230,230
II  64407   64408   .   0   .   64407   64408   230,230,230
II  102416  102417  .   0   .   102416  102417  230,230,230
II  108834  108835  .   0   .   108834  108835  230,230,230
II  110457  110458  .   0   .   110457  110458  230,230,230
II  112989  112990  .   0   .   112989  112990  230,230,230

Usage with Interval Aggregate output with a single significant result

ff = Meth_Comp (
    aggregate_fn_list=[
        "./data/Yeast_CGI_1.tsv.gz", 
        "./data/Yeast_CGI_2.tsv.gz", 
        "./data/Yeast_CGI_3.tsv.gz", 
        "./data/Yeast_CGI_4.tsv.gz"],
    ref_fasta_fn="./data/yeast.fa",
    output_bed_fn="./results/Yeast_CGI_meth_comp.bed",
    output_tsv_fn="./results/Yeast_CGI_meth_comp.tsv.gz",
    sample_id_list=["S1","S2","S3","S4"],
    max_missing = 1,
    min_diff_llr = 0,
    only_tested_sites=False,
    progress=True)

head("./results/Yeast_CGI_meth_comp.tsv.gz")
head("./results/Yeast_CGI_meth_comp.bed")

## Checking options and input files ##
## Parsing files ##
    Reading input files header and checking consistancy between headers
    Starting asynchronous file parsing
    Progress: 776k bytes [00:00, 6.48M bytes/s]              
    Adjust pvalues
    Writing output file
    Progress: 100%|██████████| 1.86k/1.86k [00:00<00:00, 36.0k sites/s]
    Results summary
        Insufficient effect size: 938
        Insufficient samples: 921
        Valid: 4
        Non-significant pvalue: 4

chromosome start end   n_samples pvalue adj_pvalue neg_med pos_med ambiguous_med unique_cpg_pos labels           med_llr_list raw_llr_list raw_pos_list comment                  
I          17    333   2         nan    nan        2       0       0             0              ["S1","S4"]      []           []           []           Insufficient samples     
I          1804  2170  3         nan    nan        3       0       0             0              ["S1","S2","S4"] []           []           []           Insufficient effect size 
I          31835 32949 1         nan    nan        1       0       0             0              ["S1"]           []           []           []           Insufficient samples     
I          33497 34371 2         nan    nan        2       0       0             0              ["S1","S2"]      []           []           []           Insufficient samples     
I          38163 38471 2         nan    nan        2       0       0             0              ["S1","S2"]      []           []           []           Insufficient samples     
I          44294 44565 2         nan    nan        2       0       0             0              ["S1","S2"]      []           []           []           Insufficient samples     
I          44730 44988 2         nan    nan        2       0       0             0              ["S1","S2"]      []           []           []           Insufficient samples     
I          45308 45526 2         nan    nan        2       0       0             0              ["S1","S2"]      []           []           []           Insufficient samples     
I          46201 46642 2         nan    nan        2       0       0             0              ["S1","S2"]      []           []           []           Insufficient samples     

track name=meth_comp itemRgb=On
I   17  333 .   0   .   17  333 230,230,230
I   1804    2170    .   0   .   1804    2170    230,230,230
I   31835   32949   .   0   .   31835   32949   230,230,230
I   33497   34371   .   0   .   33497   34371   230,230,230
I   38163   38471   .   0   .   38163   38471   230,230,230
I   44294   44565   .   0   .   44294   44565   230,230,230
I   44730   44988   .   0   .   44730   44988   230,230,230
I   45308   45526   .   0   .   45308   45526   230,230,230
I   46201   46642   .   0   .   46201   46642   230,230,230

Usage with Interval Aggregate output and larger dataset

# Generate list of file paths and sample ids from source directory
from glob import glob
import os
fn_list = sorted(glob("./data/medaka_CGI_*"))
id_list = [os.path.split(fn)[-1].strip("medaka_CGI_").strip(".tsv.gz") for fn in fn_list]

Meth_Comp (
    aggregate_fn_list=fn_list,
    ref_fasta_fn="./data/medaka.fa",
    output_bed_fn="./results/Medaka_CGI_meth_comp.bed",
    output_tsv_fn="./results/Medaka_CGI_meth_comp.tsv.gz",
    sample_id_list = id_list,
    max_missing = 1,
    min_diff_llr = 1,
    progress=True)

head("./results/Medaka_CGI_meth_comp.tsv.gz", max_char_col=40)
head("./results/Medaka_CGI_meth_comp.bed", max_char_col=40)

## Checking options and input files ##
## Parsing files ##
    Reading input files header and checking consistancy between headers
    Starting asynchronous file parsing
    Progress: 556M bytes [00:47, 11.7M bytes/s]                       
    Adjust pvalues
    Writing output file
    Progress: 100%|██████████| 266k/266k [00:08<00:00, 31.1k sites/s] 
    Results summary
        Insufficient effect size: 201,365
        Insufficient samples: 54,136
        Valid: 10,784
        Non-significant pvalue: 7,252
        Significant pvalue: 3,532

chromosome start end   n_samples pvalue adj_pvalue neg_med pos_med ambiguous_med unique_cpg_pos labels                                   med_llr_list raw_llr_list raw_pos_list comment                  
1          1657  1963  1         nan    nan        0       1       0             0              ["7-2_F2"]                               []           []           []           Insufficient samples     
1          15653 15966 12        nan    nan        0       12      0             0              ["11-1_A3","117-2_C4","131-1_F4","134...[]           []           []           Insufficient effect size 
1          17092 17597 12        nan    nan        0       12      0             0              ["11-1_A3","117-2_C4","131-1_F4","134...[]           []           []           Insufficient effect size 
1          18071 18621 12        nan    nan        0       12      0             0              ["11-1_A3","117-2_C4","131-1_F4","134...[]           []           []           Insufficient effect size 
1          20376 21340 11        nan    nan        0       11      0             0              ["117-2_C4","131-1_F4","134-1_H4","13...[]           []           []           Insufficient effect size 
1          21578 21938 12        nan    nan        0       12      0             0              ["11-1_A3","117-2_C4","131-1_F4","134...[]           []           []           Insufficient effect size 
1          27747 28080 12        nan    nan        0       10      2             0              ["11-1_A3","117-2_C4","131-1_F4","134...[]           []           []           Insufficient effect size 
1          28288 28629 12        nan    nan        0       12      0             0              ["11-1_A3","117-2_C4","131-1_F4","134...[]           []           []           Insufficient effect size 
1          31270 31833 12        nan    nan        0       12      0             0              ["11-1_A3","117-2_C4","131-1_F4","134...[]           []           []           Insufficient effect size 

track name=meth_comp itemRgb=On
1   1657    1963    .   0   .   1657    1963    230,230,230
1   15653   15966   .   0   .   15653   15966   230,230,230
1   17092   17597   .   0   .   17092   17597   230,230,230
1   18071   18621   .   0   .   18071   18621   230,230,230
1   20376   21340   .   0   .   20376   21340   230,230,230
1   21578   21938   .   0   .   21578   21938   230,230,230
1   27747   28080   .   0   .   27747   28080   230,230,230
1   28288   28629   .   0   .   28288   28629   230,230,230
1   31270   31833   .   0   .   31270   31833   230,230,230