CpG_Aggregate CLI usage

Activate virtual environment

# virtualenvwrapper is used here, but a Conda environment works as well
workon pycoMeth

Getting help

pycoMeth CpG_Aggregate --help
usage: pycoMeth CpG_Aggregate [-h] -i [NANOPOLISH_FN [NANOPOLISH_FN ...]] -f
                              REF_FASTA_FN [-b OUTPUT_BED_FN]
                              [-t OUTPUT_TSV_FN] [-d MIN_DEPTH] [-s SAMPLE_ID]
                              [-l MIN_LLR] [-v] [-q] [-p]

Calculate methylation frequency at genomic CpG sites from the output of
`nanopolish call-methylation`

optional arguments:
  -h, --help            show this help message and exit

Input/Output options:
  -i [NANOPOLISH_FN [NANOPOLISH_FN ...]], --nanopolish_fn [NANOPOLISH_FN [NANOPOLISH_FN ...]]
                        Path to a nanopolish call_methylation tsv output file
                        or a list of files or a regex matching several files
                        (can be gzipped) (required) [str]
  -f REF_FASTA_FN, --ref_fasta_fn REF_FASTA_FN
                        Reference file used for alignment in Fasta format
                        (ideally already indexed with samtools faidx)
                        (required) [str]
  -b OUTPUT_BED_FN, --output_bed_fn OUTPUT_BED_FN
                        Path to write a summary result file in BED format (At
                        least 1 output file is required) (can be gzipped)
                        (default: None) [str]
  -t OUTPUT_TSV_FN, --output_tsv_fn OUTPUT_TSV_FN
                        Path to write a more extensive result report in TSV
                        format (At least 1 output file is required) (can be
                        gzipped) (default: None) [str]

Misc options:
  -d MIN_DEPTH, --min_depth MIN_DEPTH
                        Minimal number of reads covering a site to be reported
                        (default: 10) [int]
  -s SAMPLE_ID, --sample_id SAMPLE_ID
                        Sample ID to be used for the BED track header
                        (default: None) [str]
  -l MIN_LLR, --min_llr MIN_LLR
                        Minimal log likelyhood ratio to consider a site
                        significantly methylated or unmethylated in output BED
                        file (default: 2) [float]

Verbosity options:
  -v, --verbose         Increase verbosity
  -q, --quiet           Reduce verbosity
  -p, --progress        Display a progress bar
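
The `-l/--min_llr` option can be read as a symmetric threshold on the median log-likelihood ratio of a site. The sketch below is only an interpretation of the help text, not pycoMeth's own code: `classify_site` is a hypothetical helper, and treating the threshold as inclusive on both sides is an assumption.

```python
def classify_site(median_llr: float, min_llr: float = 2.0) -> str:
    """Label a site from its median LLR (hypothetical helper, not part of pycoMeth)."""
    # Assumption: the threshold is inclusive on both sides.
    if median_llr >= min_llr:
        return "methylated"
    if median_llr <= -min_llr:
        return "unmethylated"
    return "ambiguous"


# Median LLR values taken from the example BED output on this page
examples = [-2.355, -4.525, -1.27, 0.075]
labels = [classify_site(value) for value in examples]
print(labels)
```

With the default threshold of 2, the first two values fall on the unmethylated side and the last two stay ambiguous, which agrees with the grey `230,230,230` colour those two sites carry in the example BED output.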

Example usage

Basic usage

pycoMeth CpG_Aggregate \
    -i ./data/nanopolish_sample_1.tsv \
    -f ./data/ref.fa \
    -b ./results/CpG_Aggregate_sample_1_CLI.bed \
    -t ./results/CpG_Aggregate_sample_1_CLI.tsv \
    -s sample_1 \
    --progress

head ./results/CpG_Aggregate_sample_1_CLI.bed
head ./results/CpG_Aggregate_sample_1_CLI.tsv
## Checking options and input files ##
## Parsing methylation_calls file ##
    Starting to parse file Nanopolish methylation call file
    Progress: 100%|██████████████████████| 51.9M/51.9M [00:04<00:00, 11.1M bytes/s]
    Parsing summary
        Lines Parsed: 543,135
        Line successfully parsed: 543,135
        Input files: 1
    Filtering out low coverage sites
    Sorting each chromosome by coordinates
    Sites summary
        Total Valid Lines: 543,135
        Initial Sites: 218,353
        Low Count Sites: 218,114
        Valid Sites Found: 239
## Processing valid sites found and write to file ##
    Progress: 100%|██████████████████████████| 239/239 [00:00<00:00, 4.18k sites/s]
    Results summary
        Total Sites Writen: 239
        Unmethylated sites: 162
        Ambiguous sites: 77
track name=sample_1_CpG itemRgb=On
VIII    138415  138416  .   -2.355  .   138415  138416  52,168,194
VIII    138429  138430  .   -4.525  .   138429  138430  33,102,171
VIII    212351  212352  .   -2.77   .   212351  212352  52,168,194
VIII    212392  212393  .   -2.51   .   212392  212393  52,168,194
VIII    212457  212461  .   -6.08   .   212457  212461  28,45,131
VIII    212530  212531  .   -1.27   .   212530  212531  230,230,230
VIII    212581  212582  .   0.075   .   212581  212582  230,230,230
VIII    212596  212600  .   -4.86   .   212596  212600  33,102,171
VIII    212612  212613  .   -2.91   .   212612  212613  52,168,194
chromosome   start   end sequence    num_motifs  median_llr  llr_list
VIII    138415  138416  GGTCTCGCTTT 1   -2.355  [-9.42,-5.49,-5.18,-5.11,-2.43,-1.1,0.46,-0.68,1.07,-2.28]
VIII    138429  138430  AGCTTCGAGGA 1   -4.525  [-3.62,-5.58,1.12,-2.5,-10.4,-2.39,-8.33,-7.29,-0.44,-5.43]
VIII    212351  212352  TGGGGCGACAT 1   -2.77   [-2.95,-11.55,-9.31,-0.07,-11.21,-4.14,0.66,-2.54,2.05,0.54,-2.77]
VIII    212392  212393  ATTAACGTATA 1   -2.51   [-6.76,3.04,0.11,-2.51,0.32,-3.7,-2.92,-2.01,-3.52,-4.71,-1.2]
VIII    212457  212461  AGAATCGTCGATTA  2   -6.08   [-6.08,-13.01,-3.52,-1.3,-8.11,-8.88,-1.47,-4.78,-6.83,-3.04,-6.32,-0.17,-10.75]
VIII    212530  212531  CTATTCGTTTC 1   -1.27   [-5.33,-1.27,1.12,-3.72,0.48,-4.4,-0.48,-1.02,-0.07,-5.54,-2.65,0.16,-2.7]
VIII    212581  212582  GTTACCGCAGG 1   0.075   [1.19,-0.11,-0.02,-5.77,2.08,0.17,0.84,2.46,-4.36,-2.46,1.75,6.98,-11.76,-0.68]
VIII    212596  212600  TTTGTCGTCGCTGT  2   -4.86   [-13.76,-4.43,-1.37,-8.36,-6.67,-6.3,1.13,-4.67,-7.3,-2.5,-0.96,-5.05,-2.63,-7.3]
VIII    212612  212613  CACCCCGTTGG 1   -2.91   [-7.45,1.01,-2.76,-0.81,-3.06,-2.63,-3.66,-3.11,-0.21,-2.02,-6.81,-8.47,-1.18,-7.1]
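
The BED output above follows the standard nine-column BED layout, with the median log-likelihood ratio stored in the score column and a colour in the `itemRgb` column. A minimal sketch of reading such records in Python could look as follows. `BedSite` and `parse_bed` are hypothetical names introduced for illustration, and the two records are copied from the output above with tabs assumed as separators.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class BedSite(NamedTuple):
    chrom: str
    start: int
    end: int
    median_llr: float
    rgb: Tuple[int, ...]


def parse_bed(lines: Iterable[str]) -> Iterator[BedSite]:
    """Yield one BedSite per record, skipping the track header line."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("track"):
            continue
        # BED9 columns: chrom, start, end, name, score, strand,
        # thickStart, thickEnd, itemRgb
        chrom, start, end, _name, score, _strand, _ts, _te, rgb = line.split()
        yield BedSite(
            chrom,
            int(start),
            int(end),
            float(score),
            tuple(int(channel) for channel in rgb.split(",")),
        )


bed_lines = [
    "track name=sample_1_CpG itemRgb=On",
    "VIII\t138415\t138416\t.\t-2.355\t.\t138415\t138416\t52,168,194",
    "VIII\t212530\t212531\t.\t-1.27\t.\t212530\t212531\t230,230,230",
]
sites = list(parse_bed(bed_lines))
for site in sites:
    print(site.chrom, site.start, site.median_llr, site.rgb)
```

For a real run, the same generator can be fed an open file handle for the file passed to `-b`, using `gzip.open(path, "rt")` if the output was written gzipped.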

Example usage with a wildcard pattern matching several input files and a lower depth threshold

pycoMeth CpG_Aggregate \
    -i ./data/nanopolish_sample_*.tsv \
    -f ./data/ref.fa \
    -b ./results/CpG_Aggregate_sample_all_CLI.bed \
    -t ./results/CpG_Aggregate_sample_all_CLI.tsv \
    -d 5 \
    -s sample_all \
    --progress

head ./results/CpG_Aggregate_sample_all_CLI.bed
head ./results/CpG_Aggregate_sample_all_CLI.tsv
## Checking options and input files ##
## Parsing methylation_calls file ##
    Starting to parse file Nanopolish methylation call file
    Progress: 100%|████████████████████████| 209M/209M [00:17<00:00, 12.0M bytes/s]
    Parsing summary
        Lines Parsed: 2,180,231
        Line successfully parsed: 2,180,231
        Input files: 4
    Filtering out low coverage sites
    Sorting each chromosome by coordinates
    Sites summary
        Total Valid Lines: 2,180,231
        Initial Sites: 251,674
        Valid Sites Found: 228,163
        Low Count Sites: 23,511
## Processing valid sites found and write to file ##
    Progress: 100%|████████████████████████| 228k/228k [00:26<00:00, 8.67k sites/s]
    Results summary
        Total Sites Writen: 228,163
        Unmethylated sites: 168,018
        Ambiguous sites: 60,129
        Methylated sites: 16
track name=sample_all_CpG itemRgb=On
I   144 145 .   -2.2    .   144 145 52,168,194
I   175 176 .   -1.35   .   175 176 230,230,230
I   216 217 .   -2.16   .   216 217 52,168,194
I   325 326 .   -2.66   .   325 326 52,168,194
I   339 340 .   -1.21   .   339 340 230,230,230
I   354 355 .   -1.39   .   354 355 230,230,230
I   422 433 .   -10.52  .   422 433 28,45,131
I   542 543 .   -0.78   .   542 543 230,230,230
I   557 558 .   -2.3    .   557 558 52,168,194
chromosome   start   end sequence    num_motifs  median_llr  llr_list
I   144 145 CCACTCGTTAC 1   -2.2    [-2.2,-8.42,-0.7,2.77,-3.01]
I   175 176 CACTCCGAACC 1   -1.35   [1.94,-2.01,-1.35,-8.02,-1.07]
I   216 217 CCCACCGTTAC 1   -2.16   [-0.27,-0.41,-6.62,-2.16,-2.85]
I   325 326 TGAAACGCTAA 1   -2.66   [-4.93,-2.66,-0.41,0.01,-5.79]
I   339 340 ATGATCGTAAA 1   -1.21   [-1.21,-1.08,-0.02,-2.85,-4.49]
I   354 355 ACACACGTGCT 1   -1.39   [-1.39,-1.2,-1.11,-4.6,-1.63]
I   422 433 TTTTACGTACGCACACGGATG   3   -10.52  [-13.29,-10.52,-2.49,-7.21,-10.79]
I   542 543 ATGCACGGCAC 1   -0.78   [2.14,2.59,-2.03,-3.57,0.47,-3.81]
I   557 558 CTCAGCGGTCT 1   -2.3    [-4.34,-1.14,-1.11,-5.5,-1.85,-4.84,-2.3]
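
In the TSV output above, `median_llr` matches the median of the values in `llr_list`, and every list holds at least five values, in line with `-d 5`. The sketch below checks both observations on two rows copied from that output. It is an illustration under stated assumptions rather than pycoMeth's implementation: `parse_tsv_row` is a hypothetical helper, fields are assumed to be tab-separated, and the length of `llr_list` is assumed to be the read depth at the site.

```python
import json
import math
from statistics import median


def parse_tsv_row(row: str) -> dict:
    """Split one TSV record into typed fields (hypothetical helper)."""
    chrom, start, end, sequence, num_motifs, median_llr, llr_list = (
        row.rstrip("\n").split("\t")
    )
    return {
        "chromosome": chrom,
        "start": int(start),
        "end": int(end),
        "sequence": sequence,
        "num_motifs": int(num_motifs),
        "median_llr": float(median_llr),
        # The list is written as "[v1,v2,...]", which is also valid JSON
        "llr_list": json.loads(llr_list),
    }


rows = [
    "\t".join(["I", "144", "145", "CCACTCGTTAC", "1", "-2.2",
               "[-2.2,-8.42,-0.7,2.77,-3.01]"]),
    "\t".join(["I", "422", "433", "TTTTACGTACGCACACGGATG", "3", "-10.52",
               "[-13.29,-10.52,-2.49,-7.21,-10.79]"]),
]

min_depth = 5  # value passed with -d in the command above
parsed = [parse_tsv_row(row) for row in rows]
for site in parsed:
    depth = len(site["llr_list"])  # assumption: one LLR value per read
    same_median = math.isclose(median(site["llr_list"]), site["median_llr"])
    print(site["chromosome"], site["start"], depth, depth >= min_depth, same_median)
```

The same check applied to the first example, run with the default depth of 10, would need `min_depth = 10`; the lists shown there all hold ten or more values.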