Meth_Comp CLI usage

Activate virtual environment

# Activate Conda env 
conda activate pycoMeth


Getting help

pycoMeth Meth_Comp --help

usage: pycoMeth Meth_Comp [-h] -i [AGGREGATE_FN_LIST [AGGREGATE_FN_LIST ...]]
                          -f REF_FASTA_FN [-b OUTPUT_BED_FN]
                          [-t OUTPUT_TSV_FN] [-m MAX_MISSING]
                          [-l MIN_DIFF_LLR]
                          [-s [SAMPLE_ID_LIST [SAMPLE_ID_LIST ...]]]
                          [--pvalue_adj_method PVALUE_ADJ_METHOD]
                          [--pvalue_threshold PVALUE_THRESHOLD]
                          [--only_tested_sites] [-v] [-q] [-p]

Compare methylation values for each CpG positions or intervals between n
samples and perform a statistical test to evaluate if the positions are
significantly different. For 2 samples a Mann_Withney test is performed
otherwise multiples samples are compared with a Kruskal Wallis test. pValues
are adjusted for multiple tests using the Benjamini & Hochberg procedure for
controlling the false discovery rate.

optional arguments:
  -h, --help            show this help message and exit

Input/Output options:
  -i [AGGREGATE_FN_LIST [AGGREGATE_FN_LIST ...]], --aggregate_fn_list [AGGREGATE_FN_LIST [AGGREGATE_FN_LIST ...]]
                        A list of output tsv files corresponding to several
                        samples to compare generated by either CpG_Aggregate
                        or Interval_Aggregate. (can be gzipped) (required)
                        [str]
  -f REF_FASTA_FN, --ref_fasta_fn REF_FASTA_FN
                        Reference file used for alignment in Fasta format
                        (ideally already indexed with samtools faidx)
                        (required) [str]
  -b OUTPUT_BED_FN, --output_bed_fn OUTPUT_BED_FN
                        Path to write a summary result file in BED format (At
                        least 1 output file is required) (can be gzipped)
                        (default: None) [str]
  -t OUTPUT_TSV_FN, --output_tsv_fn OUTPUT_TSV_FN
                        Path to write an more extensive result report in TSV
                        format (At least 1 output file is required) (can be
                        gzipped) (default: None) [str]

Misc options:
  -m MAX_MISSING, --max_missing MAX_MISSING
                        Max number of missing samples to perform the test
                        (default: 0) [int]
  -l MIN_DIFF_LLR, --min_diff_llr MIN_DIFF_LLR
                        Minimal llr boundary for negative and positive median
                        llr. The test if only performed if at least one sample
                        has a median llr above (methylated) and 1 sample has a
                        median llr below (unmethylated) (default: 2) [float]
  -s [SAMPLE_ID_LIST [SAMPLE_ID_LIST ...]], --sample_id_list [SAMPLE_ID_LIST [SAMPLE_ID_LIST ...]]
                        list of sample ids to annotate results in tsv file
                        (default: None) [str]
  --pvalue_adj_method PVALUE_ADJ_METHOD
                        Method to use for pValue multiple test adjustment
                        (default: fdr_bh) [str]
  --pvalue_threshold PVALUE_THRESHOLD
                        Alpha parameter (family-wise error rate) for pValue
                        adjustment (default: 0.01) [float]
  --only_tested_sites   Do not include sites that were not tested because of
                        insufficient samples or effect size in the report
                        (default: False) [None]

Verbosity options:
  -v, --verbose         Increase verbosity
  -q, --quiet           Reduce verbosity
  -p, --progress        Display a progress bar
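The help text above says that p-values are adjusted for multiple tests with the Benjamini & Hochberg procedure (the default fdr_bh method). As an illustration only, the following pure-Python sketch shows what that adjustment computes. It is not pycoMeth's own code, and the function name benjamini_hochberg is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order.

    Illustrative sketch only: each p-value is scaled by m / rank and the
    result is made monotone by taking a running minimum from the largest
    p-value downwards.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]
print([round(p, 4) for p in benjamini_hochberg(raw)])
# prints [0.02, 0.04, 0.04, 0.02]
```

A site would then be reported as significant when its adjusted p-value is at or below the value given with --pvalue_threshold (whether the comparison is strict is an assumption here).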

Example usage

Usage with CpG Aggregate output

pycoMeth Meth_Comp \
    -i  "./data/Yeast_CpG_1.tsv.gz" "./data/Yeast_CpG_2.tsv.gz" "./data/Yeast_CpG_3.tsv.gz" "./data/Yeast_CpG_4.tsv.gz" \
    -f "./data/yeast.fa" \
    -b "./results/CLI_Yeast_CpG_meth_comp.bed" \
    -t "./results/CLI_Yeast_CpG_meth_comp.tsv" \
    -s S1 S2 S3 S4 \
    -l 0.5 \
    --pvalue_threshold 0.05 \
    --only_tested_sites \
    --progress

head ./results/CLI_Yeast_CpG_meth_comp.bed
head ./results/CLI_Yeast_CpG_meth_comp.tsv

## Checking options and input files ##
## Parsing files ##
    Reading input files header and checking consistancy between headers
    Starting asynchronous file parsing
    Progress: 37.1M bytes [00:08, 4.31M bytes/s]                                   
    Adjust pvalues
    Writing output file
    Progress: 100%|██████████████████████| 3.28k/3.28k [00:00<00:00, 21.8k sites/s]
    Results summary
        Insufficient samples: 211,400
        Insufficient effect size: 30,132
        Valid: 3,283
        Non-significant pvalue: 3,283
track name=meth_comp itemRgb=On
I   109610  109611  .   0   .   109610  109611  230,230,230
I   109667  109668  .   0   .   109667  109668  230,230,230
I   110046  110047  .   0   .   110046  110047  230,230,230
I   110157  110158  .   0   .   110157  110158  230,230,230
I   110226  110227  .   0   .   110226  110227  230,230,230
I   110246  110247  .   0   .   110246  110247  230,230,230
I   110689  110690  .   0   .   110689  110690  230,230,230
I   110925  110926  .   0   .   110925  110926  230,230,230
I   111021  111022  .   0   .   111021  111022  230,230,230
chromosome   start   end n_samples   pvalue  adj_pvalue  neg_med pos_med ambiguous_med   labels  med_llr_list    raw_llr_list    comment
I   109610  109611  4   0.5355384957105982  0.637278314240832   3   1   0   ["S1","S2","S3","S4"]   [1.06,-3.0,-3.285,-3.575]   [[2.83,-0.71],[-4.85,1.83,-3.0],[0.87,-7.44],[0.13,-4.22,-2.93,-6.8]]   Non-significant pvalue
I   109667  109668  4   0.6210918363528521  0.6997407339555297  3   1   0   ["S1","S2","S3","S4"]   [0.555,-1.45,-4.96,-2.57]   [[5.93,-4.82],[-1.45,-1.28,-5.4],[-5.79,-4.13],[-5.81,-1.87,-0.82,-2.57,-5.08]] Non-significant pvalue
I   110046  110047  4   0.4846302240896865  0.6031859226921511  2   1   1   ["S1","S2","S3","S4"]   [0.3,-1.33,-3.355,1.32] [[-2.0,2.6],[-1.94,-1.33,-1.2],[-1.73,-4.98],[3.33,1.8,-1.03,-5.67,1.32]]   Non-significant pvalue
I   110157  110158  4   0.10079374072306878 0.4770356012343023  3   1   0   ["S1","S2","S3","S4"]   [-0.535,1.14,-3.28,-2.96]   [[1.03,-2.1],[1.14,3.45,-0.49],[-2.79,-3.77],[-3.61,-0.77,-2.96,-0.32,-7.67]]   Non-significant pvalue
I   110226  110227  4   0.2562512982158426  0.48938810639848035 3   1   0   ["S1","S2","S3","S4"]   [1.025,-1.21,-0.875,-2.85]  [[1.13,0.92],[0.46,-1.45,-1.21],[-1.62,-0.13],[-6.69,2.96,-4.64,-1.5,-2.85]]    Non-significant pvalue
I   110246  110247  4   0.16789646023633203 0.4770356012343023  2   1   1   ["S1","S2","S3","S4"]   [-2.63,0.57,-2.05,0.2]  [[-3.84,-1.42],[-1.04,0.91,0.57],[-3.46,-0.64],[-3.17,0.17,0.94,0.2,1.83]]  Non-significant pvalue
I   110689  110690  4   0.15407920997458144 0.4770356012343023  3   1   0   ["S1","S2","S3","S4"]   [-1.885,0.5,-6.53,-2.37]    [[-1.89,-1.88],[1.32,-0.32],[-2.53,-10.53],[-1.76,0.49,-5.91,-2.37,-4.05]]  Non-significant pvalue
I   110925  110926  4   0.45883970397306584 0.5920104435099945  3   1   0   ["S1","S2","S3","S4"]   [-3.06,-5.365,-1.785,0.61]  [[-3.06,-4.18,2.33],[-10.58,-0.15],[-5.51,1.94],[1.84,-1.45,0.61,2.65,0.35]]    Non-significant pvalue
I   111021  111022  4   0.10926389869102016 0.4770356012343023  3   1   0   ["S1","S2","S3","S4"]   [-7.8,-4.85,-4.94,0.51] [[-7.8,-3.83,-8.36],[-3.64,-6.06],[-4.19,-5.69],[2.71,1.03,0.32,-5.6,0.51]] Non-significant pvalue
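The neg_med, pos_med and ambiguous_med columns in the TSV output above appear to count, for each site, the sample medians below -MIN_DIFF_LLR, above +MIN_DIFF_LLR, and in between. The rows shown are consistent with that reading for -l 0.5. The sketch below reproduces the counts from med_llr_list. The function names are hypothetical, and the handling of values exactly on the boundary (<= and >=) is an assumption rather than pycoMeth's documented behaviour.

```python
def classify_medians(med_llr_list, min_diff_llr=2.0):
    """Count negative, positive and ambiguous sample medians for one site."""
    neg = sum(1 for m in med_llr_list if m <= -min_diff_llr)  # unmethylated
    pos = sum(1 for m in med_llr_list if m >= min_diff_llr)   # methylated
    ambiguous = len(med_llr_list) - neg - pos
    return neg, pos, ambiguous


def has_sufficient_effect(med_llr_list, min_diff_llr=2.0):
    """True if at least one median is clearly negative and one clearly positive."""
    neg, pos, _ = classify_medians(med_llr_list, min_diff_llr)
    return neg >= 1 and pos >= 1


# med_llr_list values copied from two rows of the TSV output above (-l 0.5)
print(classify_medians([1.06, -3.0, -3.285, -3.575], 0.5))  # prints (3, 1, 0)
print(classify_medians([0.3, -1.33, -3.355, 1.32], 0.5))    # prints (2, 1, 1)
```

With the default boundary of 2, the second site would have no clearly positive median, which is the situation the help text describes as not being tested.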

Usage with Interval Aggregate output

pycoMeth Meth_Comp \
    -i ./data/medaka_CGI_* \
    -f "./data/medaka.fa" \
    -b "./results/CLI_Medaka_CGI_meth_comp.bed" \
    -t "./results/CLI_Medaka_CGI_meth_comp.tsv" \
    --progress

head ./results/CLI_Medaka_CGI_meth_comp.bed
head ./results/CLI_Medaka_CGI_meth_comp.tsv

## Checking options and input files ##
## Parsing files ##
    Reading input files header and checking consistancy between headers
    Starting asynchronous file parsing
    Progress: 556M bytes [00:40, 13.6M bytes/s]                                    
    Adjust pvalues
    Writing output file
    Progress: 100%|████████████████████████| 266k/266k [00:05<00:00, 45.8k sites/s]
    Results summary
        Insufficient effect size: 156,368
        Insufficient samples: 108,385
        Valid: 1,532
        Significant pvalue: 1,106
        Non-significant pvalue: 426
track name=meth_comp itemRgb=On
1   1657    1963    .   0   .   1657    1963    230,230,230
1   15653   15966   .   0   .   15653   15966   230,230,230
1   17092   17597   .   0   .   17092   17597   230,230,230
1   18071   18621   .   0   .   18071   18621   230,230,230
1   20376   21340   .   0   .   20376   21340   230,230,230
1   21578   21938   .   0   .   21578   21938   230,230,230
1   27747   28080   .   0   .   27747   28080   230,230,230
1   28288   28629   .   0   .   28288   28629   230,230,230
1   31270   31833   .   0   .   31270   31833   230,230,230
chromosome   start   end n_samples   pvalue  adj_pvalue  neg_med pos_med ambiguous_med   unique_cpg_pos  labels  med_llr_list    raw_llr_list    raw_pos_list    comment
1   1657    1963    1   nan nan 0   1   0   0   [9] []  []  []  Insufficient samples
1   15653   15966   12  nan nan 0   12  0   0   [0,1,2,3,4,5,6,7,8,9,10,11] []  []  []  Insufficient effect size
1   17092   17597   12  nan nan 0   6   6   0   [0,1,2,3,4,5,6,7,8,9,10,11] []  []  []  Insufficient effect size
1   18071   18621   12  nan nan 0   12  0   0   [0,1,2,3,4,5,6,7,8,9,10,11] []  []  []  Insufficient effect size
1   20376   21340   11  nan nan 0   11  0   0   [1,2,3,4,5,6,7,8,9,10,11]   []  []  []  Insufficient samples
1   21578   21938   12  nan nan 0   10  2   0   [0,1,2,3,4,5,6,7,8,9,10,11] []  []  []  Insufficient effect size
1   27747   28080   12  nan nan 0   6   6   0   [0,1,2,3,4,5,6,7,8,9,10,11] []  []  []  Insufficient effect size
1   28288   28629   12  nan nan 0   11  1   0   [0,1,2,3,4,5,6,7,8,9,10,11] []  []  []  Insufficient effect size
1   31270   31833   12  nan nan 0   8   4   0   [0,1,2,3,4,5,6,7,8,9,10,11] []  []  []  Insufficient effect size
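For downstream work it can be convenient to load the TSV report in Python rather than inspecting it with head. The sketch below uses only the standard library and assumes that the report is tab-separated and that the bracketed list columns (labels, med_llr_list, raw_llr_list, raw_pos_list) are valid JSON, as they appear in the output above. The function names are hypothetical, and the two sample rows are copied from that output.

```python
import csv
import io
import json
from collections import Counter

# Columns holding bracketed lists in the report (assumed to be valid JSON)
LIST_COLUMNS = ("labels", "med_llr_list", "raw_llr_list", "raw_pos_list")


def read_meth_comp_tsv(handle):
    """Yield one dict per data row from an open Meth_Comp TSV report."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        for col in LIST_COLUMNS:
            if col in row:  # some columns only exist for interval input
                row[col] = json.loads(row[col])
        yield row


def tally_comments(rows):
    """Count rows per value of the comment column."""
    return Counter(row["comment"] for row in rows)


HEADER = ["chromosome", "start", "end", "n_samples", "pvalue", "adj_pvalue",
          "neg_med", "pos_med", "ambiguous_med", "unique_cpg_pos", "labels",
          "med_llr_list", "raw_llr_list", "raw_pos_list", "comment"]
SAMPLE_ROWS = [
    ["1", "1657", "1963", "1", "nan", "nan", "0", "1", "0", "0",
     "[9]", "[]", "[]", "[]", "Insufficient samples"],
    ["1", "15653", "15966", "12", "nan", "nan", "0", "12", "0", "0",
     "[0,1,2,3,4,5,6,7,8,9,10,11]", "[]", "[]", "[]",
     "Insufficient effect size"],
]
SAMPLE = "\n".join("\t".join(fields) for fields in [HEADER] + SAMPLE_ROWS) + "\n"

rows = list(read_meth_comp_tsv(io.StringIO(SAMPLE)))
print(rows[0]["labels"])  # prints [9]
print(tally_comments(rows))
```

For a real report, pass an open file handle instead of the in-memory sample, for example open("./results/CLI_Medaka_CGI_meth_comp.tsv", newline=""), or gzip.open(path, "rt") if the report was written gzipped.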