Meth_Comp CLI usage
Activate virtual environment
# Activate Conda env
conda activate pycoMeth
Getting help
pycoMeth Meth_Comp --help
usage: pycoMeth Meth_Comp [-h] -i [AGGREGATE_FN_LIST [AGGREGATE_FN_LIST ...]]
-f REF_FASTA_FN [-b OUTPUT_BED_FN]
[-t OUTPUT_TSV_FN] [-m MAX_MISSING]
[-l MIN_DIFF_LLR]
[-s [SAMPLE_ID_LIST [SAMPLE_ID_LIST ...]]]
[--pvalue_adj_method PVALUE_ADJ_METHOD]
[--pvalue_threshold PVALUE_THRESHOLD]
[--only_tested_sites] [-v] [-q] [-p]
Compare methylation values for each CpG positions or intervals between n
samples and perform a statistical test to evaluate if the positions are
significantly different. For 2 samples a Mann_Withney test is performed
otherwise multiples samples are compared with a Kruskal Wallis test. pValues
are adjusted for multiple tests using the Benjamini & Hochberg procedure for
controlling the false discovery rate.
optional arguments:
-h, --help show this help message and exit
Input/Output options:
-i [AGGREGATE_FN_LIST [AGGREGATE_FN_LIST ...]], --aggregate_fn_list [AGGREGATE_FN_LIST [AGGREGATE_FN_LIST ...]]
A list of output tsv files corresponding to several
samples to compare generated by either CpG_Aggregate
or Interval_Aggregate. (can be gzipped) (required)
[str]
-f REF_FASTA_FN, --ref_fasta_fn REF_FASTA_FN
Reference file used for alignment in Fasta format
(ideally already indexed with samtools faidx)
(required) [str]
-b OUTPUT_BED_FN, --output_bed_fn OUTPUT_BED_FN
Path to write a summary result file in BED format (At
least 1 output file is required) (can be gzipped)
(default: None) [str]
-t OUTPUT_TSV_FN, --output_tsv_fn OUTPUT_TSV_FN
Path to write an more extensive result report in TSV
format (At least 1 output file is required) (can be
gzipped) (default: None) [str]
Misc options:
-m MAX_MISSING, --max_missing MAX_MISSING
Max number of missing samples to perform the test
(default: 0) [int]
-l MIN_DIFF_LLR, --min_diff_llr MIN_DIFF_LLR
Minimal llr boundary for negative and positive median
llr. The test if only performed if at least one sample
has a median llr above (methylated) and 1 sample has a
median llr below (unmethylated) (default: 2) [float]
-s [SAMPLE_ID_LIST [SAMPLE_ID_LIST ...]], --sample_id_list [SAMPLE_ID_LIST [SAMPLE_ID_LIST ...]]
list of sample ids to annotate results in tsv file
(default: None) [str]
--pvalue_adj_method PVALUE_ADJ_METHOD
Method to use for pValue multiple test adjustment
(default: fdr_bh) [str]
--pvalue_threshold PVALUE_THRESHOLD
Alpha parameter (family-wise error rate) for pValue
adjustment (default: 0.01) [float]
--only_tested_sites Do not include sites that were not tested because of
insufficient samples or effect size in the report
(default: False) [None]
Verbosity options:
-v, --verbose Increase verbosity
-q, --quiet Reduce verbosity
-p, --progress Display a progress bar
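The help text above states that p-values are adjusted with the Benjamini & Hochberg procedure (the default of --pvalue_adj_method is fdr_bh) and then compared against --pvalue_threshold. As a rough illustration of what that adjustment does, here is a minimal pure-Python sketch. It is not pycoMeth's own code: the function name benjamini_hochberg and the raw p-values are hypothetical, and the real tool may rely on a statistics library instead.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotonic
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values, NOT taken from any pycoMeth output
raw = [0.001, 0.008, 0.039, 0.041, 0.6]
adj = benjamini_hochberg(raw)

alpha = 0.05  # plays the role of --pvalue_threshold in this sketch
significant = [p <= alpha for p in adj]

for p, q, s in zip(raw, adj, significant):
    print(f"raw={p:.3f} adjusted={q:.5f} significant={s}")
```

In this toy input the third and fourth p-values are below 0.05 before adjustment but not after it, which is the kind of site that would end up labelled as non-significant in the report.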
Example usage
Usage with CpG Aggregate output
pycoMeth Meth_Comp \
-i "./data/Yeast_CpG_1.tsv.gz" "./data/Yeast_CpG_2.tsv.gz" "./data/Yeast_CpG_3.tsv.gz" "./data/Yeast_CpG_4.tsv.gz" \
-f "./data/yeast.fa" \
-b "./results/CLI_Yeast_CpG_meth_comp.bed" \
-t "./results/CLI_Yeast_CpG_meth_comp.tsv" \
-s S1 S2 S3 S4 \
-l 0.5 \
--pvalue_threshold 0.05 \
--only_tested_sites \
--progress
head ./results/CLI_Yeast_CpG_meth_comp.bed
head ./results/CLI_Yeast_CpG_meth_comp.tsv
## Checking options and input files ##
## Parsing files ##
Reading input files header and checking consistancy between headers
Starting asynchronous file parsing
Progress: 37.1M bytes [00:08, 4.31M bytes/s]
Adjust pvalues
Writing output file
Progress: 100%|██████████████████████| 3.28k/3.28k [00:00<00:00, 21.8k sites/s]
Results summary
Insufficient samples: 211,400
Insufficient effect size: 30,132
Valid: 3,283
Non-significant pvalue: 3,283
track name=meth_comp itemRgb=On
I 109610 109611 . 0 . 109610 109611 230,230,230
I 109667 109668 . 0 . 109667 109668 230,230,230
I 110046 110047 . 0 . 110046 110047 230,230,230
I 110157 110158 . 0 . 110157 110158 230,230,230
I 110226 110227 . 0 . 110226 110227 230,230,230
I 110246 110247 . 0 . 110246 110247 230,230,230
I 110689 110690 . 0 . 110689 110690 230,230,230
I 110925 110926 . 0 . 110925 110926 230,230,230
I 111021 111022 . 0 . 111021 111022 230,230,230
chromosome start end n_samples pvalue adj_pvalue neg_med pos_med ambiguous_med labels med_llr_list raw_llr_list comment
I 109610 109611 4 0.5355384957105982 0.637278314240832 3 1 0 ["S1","S2","S3","S4"] [1.06,-3.0,-3.285,-3.575] [[2.83,-0.71],[-4.85,1.83,-3.0],[0.87,-7.44],[0.13,-4.22,-2.93,-6.8]] Non-significant pvalue
I 109667 109668 4 0.6210918363528521 0.6997407339555297 3 1 0 ["S1","S2","S3","S4"] [0.555,-1.45,-4.96,-2.57] [[5.93,-4.82],[-1.45,-1.28,-5.4],[-5.79,-4.13],[-5.81,-1.87,-0.82,-2.57,-5.08]] Non-significant pvalue
I 110046 110047 4 0.4846302240896865 0.6031859226921511 2 1 1 ["S1","S2","S3","S4"] [0.3,-1.33,-3.355,1.32] [[-2.0,2.6],[-1.94,-1.33,-1.2],[-1.73,-4.98],[3.33,1.8,-1.03,-5.67,1.32]] Non-significant pvalue
I 110157 110158 4 0.10079374072306878 0.4770356012343023 3 1 0 ["S1","S2","S3","S4"] [-0.535,1.14,-3.28,-2.96] [[1.03,-2.1],[1.14,3.45,-0.49],[-2.79,-3.77],[-3.61,-0.77,-2.96,-0.32,-7.67]] Non-significant pvalue
I 110226 110227 4 0.2562512982158426 0.48938810639848035 3 1 0 ["S1","S2","S3","S4"] [1.025,-1.21,-0.875,-2.85] [[1.13,0.92],[0.46,-1.45,-1.21],[-1.62,-0.13],[-6.69,2.96,-4.64,-1.5,-2.85]] Non-significant pvalue
I 110246 110247 4 0.16789646023633203 0.4770356012343023 2 1 1 ["S1","S2","S3","S4"] [-2.63,0.57,-2.05,0.2] [[-3.84,-1.42],[-1.04,0.91,0.57],[-3.46,-0.64],[-3.17,0.17,0.94,0.2,1.83]] Non-significant pvalue
I 110689 110690 4 0.15407920997458144 0.4770356012343023 3 1 0 ["S1","S2","S3","S4"] [-1.885,0.5,-6.53,-2.37] [[-1.89,-1.88],[1.32,-0.32],[-2.53,-10.53],[-1.76,0.49,-5.91,-2.37,-4.05]] Non-significant pvalue
I 110925 110926 4 0.45883970397306584 0.5920104435099945 3 1 0 ["S1","S2","S3","S4"] [-3.06,-5.365,-1.785,0.61] [[-3.06,-4.18,2.33],[-10.58,-0.15],[-5.51,1.94],[1.84,-1.45,0.61,2.65,0.35]] Non-significant pvalue
I 111021 111022 4 0.10926389869102016 0.4770356012343023 3 1 0 ["S1","S2","S3","S4"] [-7.8,-4.85,-4.94,0.51] [[-7.8,-3.83,-8.36],[-3.64,-6.06],[-4.19,-5.69],[2.71,1.03,0.32,-5.6,0.51]] Non-significant pvalue
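The neg_med, pos_med and ambiguous_med columns in the rows above can be reproduced from raw_llr_list and the -l 0.5 value used in the command. The sketch below is one reading of the --min_diff_llr help text, not pycoMeth's actual implementation: the function name classify_site is hypothetical, and the inclusive boundaries (>= and <=) are an assumption that happens to agree with the rows shown here.

```python
from statistics import median


def classify_site(raw_llr_list, min_diff_llr):
    """Count samples by the sign of their median llr and decide testability."""
    medians = [median(sample) for sample in raw_llr_list]
    # Assumption: the boundary itself counts as methylated / unmethylated
    neg = sum(m <= -min_diff_llr for m in medians)
    pos = sum(m >= min_diff_llr for m in medians)
    ambiguous = len(medians) - neg - pos
    # Per the help text: at least one sample above and one below the boundary
    testable = neg >= 1 and pos >= 1
    return {"neg_med": neg, "pos_med": pos,
            "ambiguous_med": ambiguous, "testable": testable}


# raw_llr_list values copied from the rows at positions 109610 and 110046
site_109610 = [[2.83, -0.71], [-4.85, 1.83, -3.0],
               [0.87, -7.44], [0.13, -4.22, -2.93, -6.8]]
site_110046 = [[-2.0, 2.6], [-1.94, -1.33, -1.2],
               [-1.73, -4.98], [3.33, 1.8, -1.03, -5.67, 1.32]]

result_a = classify_site(site_109610, min_diff_llr=0.5)
result_b = classify_site(site_110046, min_diff_llr=0.5)
print(result_a)
print(result_b)
```

Under these assumptions the first site gives 3 negative, 1 positive and 0 ambiguous medians, and the second gives 2, 1 and 1, matching the corresponding columns of the report. A site where every sample falls on the same side would not be testable and would presumably be reported as "Insufficient effect size".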
Usage with Interval Aggregate output
pycoMeth Meth_Comp \
-i ./data/medaka_CGI_* \
-f "./data/medaka.fa" \
-b "./results/CLI_Medaka_CGI_meth_comp.bed" \
-t "./results/CLI_Medaka_CGI_meth_comp.tsv" \
--progress
head ./results/CLI_Medaka_CGI_meth_comp.bed
head ./results/CLI_Medaka_CGI_meth_comp.tsv
## Checking options and input files ##
## Parsing files ##
Reading input files header and checking consistancy between headers
Starting asynchronous file parsing
Progress: 556M bytes [00:40, 13.6M bytes/s]
Adjust pvalues
Writing output file
Progress: 100%|████████████████████████| 266k/266k [00:05<00:00, 45.8k sites/s]
Results summary
Insufficient effect size: 156,368
Insufficient samples: 108,385
Valid: 1,532
Significant pvalue: 1,106
Non-significant pvalue: 426
track name=meth_comp itemRgb=On
1 1657 1963 . 0 . 1657 1963 230,230,230
1 15653 15966 . 0 . 15653 15966 230,230,230
1 17092 17597 . 0 . 17092 17597 230,230,230
1 18071 18621 . 0 . 18071 18621 230,230,230
1 20376 21340 . 0 . 20376 21340 230,230,230
1 21578 21938 . 0 . 21578 21938 230,230,230
1 27747 28080 . 0 . 27747 28080 230,230,230
1 28288 28629 . 0 . 28288 28629 230,230,230
1 31270 31833 . 0 . 31270 31833 230,230,230
chromosome start end n_samples pvalue adj_pvalue neg_med pos_med ambiguous_med unique_cpg_pos labels med_llr_list raw_llr_list raw_pos_list comment
1 1657 1963 1 nan nan 0 1 0 0 [9] [] [] [] Insufficient samples
1 15653 15966 12 nan nan 0 12 0 0 [0,1,2,3,4,5,6,7,8,9,10,11] [] [] [] Insufficient effect size
1 17092 17597 12 nan nan 0 6 6 0 [0,1,2,3,4,5,6,7,8,9,10,11] [] [] [] Insufficient effect size
1 18071 18621 12 nan nan 0 12 0 0 [0,1,2,3,4,5,6,7,8,9,10,11] [] [] [] Insufficient effect size
1 20376 21340 11 nan nan 0 11 0 0 [1,2,3,4,5,6,7,8,9,10,11] [] [] [] Insufficient samples
1 21578 21938 12 nan nan 0 10 2 0 [0,1,2,3,4,5,6,7,8,9,10,11] [] [] [] Insufficient effect size
1 27747 28080 12 nan nan 0 6 6 0 [0,1,2,3,4,5,6,7,8,9,10,11] [] [] [] Insufficient effect size
1 28288 28629 12 nan nan 0 11 1 0 [0,1,2,3,4,5,6,7,8,9,10,11] [] [] [] Insufficient effect size
1 31270 31833 12 nan nan 0 8 4 0 [0,1,2,3,4,5,6,7,8,9,10,11] [] [] [] Insufficient effect size
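Because --only_tested_sites was not set in this run, the first rows of the TSV report are untested intervals with nan p-values. A report like this can be post-processed with the Python standard library, as in the hedged sketch below. It assumes the real file is tab-separated with the header shown above. The function name summarize is hypothetical, and the three rows are copied from the output above and rebuilt in memory so the example is self-contained.

```python
import csv
import io
import json
import math
from collections import Counter

HEADER = ["chromosome", "start", "end", "n_samples", "pvalue", "adj_pvalue",
          "neg_med", "pos_med", "ambiguous_med", "unique_cpg_pos", "labels",
          "med_llr_list", "raw_llr_list", "raw_pos_list", "comment"]
ROWS = [
    ["1", "1657", "1963", "1", "nan", "nan", "0", "1", "0", "0",
     "[9]", "[]", "[]", "[]", "Insufficient samples"],
    ["1", "15653", "15966", "12", "nan", "nan", "0", "12", "0", "0",
     "[0,1,2,3,4,5,6,7,8,9,10,11]", "[]", "[]", "[]", "Insufficient effect size"],
    ["1", "20376", "21340", "11", "nan", "nan", "0", "11", "0", "0",
     "[1,2,3,4,5,6,7,8,9,10,11]", "[]", "[]", "[]", "Insufficient samples"],
]
# Assumption: fields are separated by tabs in the real report
tsv_text = "\n".join("\t".join(fields) for fields in [HEADER] + ROWS)


def summarize(handle):
    """Tally the comment column and collect intervals that were tested."""
    counts = Counter()
    tested = []
    label_counts = []
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        counts[row["comment"]] += 1
        # The list-like columns are written as JSON-style arrays
        label_counts.append(len(json.loads(row["labels"])))
        adj = float(row["adj_pvalue"])  # "nan" parses to float NaN
        if not math.isnan(adj):
            tested.append((row["chromosome"], int(row["start"]),
                           int(row["end"]), adj))
    return counts, tested, label_counts


counts, tested, label_counts = summarize(io.StringIO(tsv_text))
print(dict(counts))
print(tested)
```

For a report on disk, the same function could be given an open file handle instead, for example one from gzip.open(path, "rt") when the output was written gzipped.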