Freq_meth_calculate usage
Calculate methylation frequency at genomic CpG sites from the output of nanopolish call-methylation
Output files format
Freq_meth_calculate
can generates 2 files, a standard BED file and a tabulated file containing extra information
BED file
Standard genomic BED6 (https://genome.ucsc.edu/FAQ/FAQformat.html#format1). The score correspond to the methylation frequency multiplied by 1000. The file is sorted by coordinates and can be rendered with a genome browser such as IGV
Tabulated TSV file
Contrary to the bed file, in the tabulated report, positions are ordered by decreasing methylation frequency.
The file contains the following fields:
- chrom / start / end / strand: Genomic coordinates of the motif or group of motifs in case split_group was not selected.
- site_id: Unique integer identifier of the genomic position.
- methylated_reads / unmethylated_reads / ambiguous_reads: Number of reads at a given genomic location with a higher likelyhood of being methylated or unmethylated or with an ambiguous methylation call.
- sequence: -5 to +5 sequence of the motif or group of motifs in case split_group was not selected.
- num_motifs: Number of motif in the group.
- meth_freq: Methylation frequency (out of non anbiguous calls).
Bash command line usage
Command line help
%%bash # Load local bashrc and activate virtual environment source ~/.bashrc workon NanopolishComp NanopolishComp Freq_meth_calculate --help
usage: NanopolishComp Freq_meth_calculate [-h] [-i INPUT_FN] [-b OUTPUT_BED_FN] [-t OUTPUT_TSV_FN] [-d MIN_DEPTH] [-f FASTA_INDEX] [-s SAMPLE_ID] [--strand_specific] [--min_llr MIN_LLR] [-v | -q] Calculate methylation frequency at genomic CpG sites from the output of nanopolish call-methylation optional arguments: -h, --help show this help message and exit -v, --verbose Increase verbosity -q, --quiet Reduce verbosity Input/Output options: -i INPUT_FN, --input_fn INPUT_FN Path to a nanopolish call_methylation tsv output file. If not specified read from std input -b OUTPUT_BED_FN, --output_bed_fn OUTPUT_BED_FN Path to write a summary result file in BED format -t OUTPUT_TSV_FN, --output_tsv_fn OUTPUT_TSV_FN Path to write an more extensive result report in TSV format Filtering options: -d MIN_DEPTH, --min_depth MIN_DEPTH Minimal number of reads covering a site to be reported (default: 10) Other options: -f FASTA_INDEX, --fasta_index FASTA_INDEX fasta index file obtained with samtools faidx. Required for coordinate sorting -s SAMPLE_ID, --sample_id SAMPLE_ID Sample ID to be used for the bed track header (default: ) --strand_specific Output strand specific sites --min_llr MIN_LLR Minimal log likelyhood ratio to consider a site significantly methylated or unmethylated (default: 2)
Example usage
From an existing nanopolish call_methylation file output
%%bash # Load local bashrc and activate virtual environment source ~/.bashrc workon NanopolishComp NanopolishComp Freq_meth_calculate --verbose -i data/freq_meth_calculate/methylation_calls.tsv -b ./output/freq_meth_calculate/out_freq_meth_calculate.bed -t ./output/freq_meth_calculate/out_freq_meth_calculate.tsv -s Sample1 head ./output/freq_meth_calculate/out_freq_meth_calculate.bed head ./output/freq_meth_calculate/out_freq_meth_calculate.tsv
track name=Methylation_Sample1 itemRgb=On chr-VIII 138415 138416 7224149695747901416 -3.200000 . 138415 138416 '8,121,207' chr-VIII 138429 138430 7224167352505884938 -4.722000 . 138429 138430 '8,121,207' chr-VIII 212351 212352 7296895538640012056 -4.125000 . 212351 212352 '8,121,207' chr-VIII 212392 212393 7296795904077105039 -1.866667 . 212392 212393 '100,100,100' chr-VIII 212457 212461 7296716448666179190 -5.546667 . 212457 212461 '8,121,207' chr-VIII 212530 212531 7296001349967846549 -1.850833 . 212530 212531 '100,100,100' chr-VIII 212581 212582 7295894148222946594 -2.307692 . 212581 212582 '8,121,207' chr-VIII 212596 212600 7295913066177928939 -5.900000 . 212596 212600 '8,121,207' chr-VIII 212612 212613 7295852528721985435 -3.675385 . 212612 212613 '8,121,207' chromosome start end strand site_id methylated_reads unmethylated_reads ambiguous_reads sequence num_motifs llr_list chr-VIII 138415 138416 . 7224149695747901416 0 7 3 GGTCTCGCTTT 1 -5.51,-9.63,-2.23,0.53,-0.47,-5.44,-2.38,-5.64,-2.29,1.06 chr-VIII 138429 138430 . 7224167352505884938 0 8 2 AGCTTCGAGGA 1 -5.14,-3.64,-9.53,-8.07,-2.08,1.16,-5.42,-4.97,-9.1,-0.43 chr-VIII 212351 212352 . 7296895538640012056 0 8 4 TGGGGCGACAT 1 -9.1,-6.06,-3.14,0.53,-2.48,-11.61,0.17,-2.76,-3.4,-12.44,0.66,0.13 chr-VIII 212392 212393 . 7296795904077105039 1 5 6 ATTAACGTATA 1 0.21,-1.82,-6.91,-4.89,-1.92,3.09,-3.07,-1.12,1.83,-1.82,-2.98,-3.0 chr-VIII 212457 212461 . 7296716448666179190 0 8 4 AGAATCGTCGATTA 2 -3.48,0.08,-6.33,-0.33,-4.83,-13.56,-1.71,-10.95,-3.32,-12.85,-1.55,-7.73 chr-VIII 212530 212531 . 7296001349967846549 0 5 7 CTATTCGTTTC 1 -1.48,-1.42,-5.41,-2.82,-0.49,1.29,0.15,-2.73,-1.09,-1.11,-4.52,-2.58 chr-VIII 212581 212582 . 7295894148222946594 1 4 8 GTTACCGCAGG 1 0.87,-6.56,1.21,1.75,0.86,-0.03,7.31,-12.54,-0.77,-5.27,-18.42,0.23,1.36 chr-VIII 212596 212600 . 7295913066177928939 0 10 3 TTTGTCGTCGCTGT 2 -4.4,-8.76,-13.95,-0.94,1.11,-1.47,-5.42,-3.37,-7.38,-9.96,-13.14,-6.26,-2.76 chr-VIII 212612 212613 . 7295852528721985435 0 8 5 CACCCCGTTGG 1 0.97,-0.5,-7.73,-6.92,-3.65,-2.94,-9.09,-0.83,-7.41,0.14,1.11,-2.7,-8.23 ## Options summary ## package_name: NanopolishComp package_version: 0.6.11 timestamp: 2019-09-10 16:20:56.119314 quiet: False verbose: True min_llr: 2 strand_specific: False sample_id: Sample1 min_depth: 10 output_tsv_fn: ./output/freq_meth_calculate/out_freq_meth_calculate.tsv output_bed_fn: ./output/freq_meth_calculate/out_freq_meth_calculate.bed fasta_index: input_fn: data/freq_meth_calculate/methylation_calls.tsv ## Checking arguments ## Testing input file readability Check output file Output results in bed format Output results in tsv format ## Parsing methylation_calls file ## Starting to parse file Nanopolish methylation call file : 605248 lines [00:13, 44691.91 lines/s] Filtering out low coverage sites Processing valid sites found Write output file header : 100%|██████████| 804/804 [00:01<00:00, 474.31 sites/s] ## Results summary ## Total read lines: 605,248 Valid read lines: 605,248 Total sites: 229,389 Low coverage sites: 228,585 Valid sites: 1,608
Using a fasta index for output coordinates sorting and strand specific sites
%%bash # Load local bashrc and activate virtual environment source ~/.bashrc workon NanopolishComp NanopolishComp Freq_meth_calculate --verbose --strand_specific -i data/freq_meth_calculate/methylation_calls.tsv -b ./output/freq_meth_calculate/out_freq_meth_calculate.bed -t ./output/freq_meth_calculate/out_freq_meth_calculate.tsv -s Sample1 -f data/freq_meth_calculate/ref.fa.fai head ./output/freq_meth_calculate/out_freq_meth_calculate.bed head ./output/freq_meth_calculate/out_freq_meth_calculate.tsv
track name=Methylation_Sample1 itemRgb=On chr-VIII 213351 213352 -8023785553745245992 -0.500000 - 213351 213352 '100,100,100' chr-VIII 213382 213386 -8023583102452360367 -5.103000 - 213382 213386 '8,121,207' chr-VIII 213427 213428 -8023686382448928188 0.504545 - 213427 213428 '100,100,100' chr-VIII 213516 213517 -8022685139567842841 -2.447500 - 213516 213517 '8,121,207' chr-VIII 213536 213537 -8022798660204149733 -3.103333 - 213536 213537 '8,121,207' chr-VIII 213549 213550 -8022802421743098730 -2.724167 - 213549 213550 '8,121,207' chr-VIII 213645 213646 -8022541247023329610 -2.546667 - 213645 213646 '8,121,207' chr-VIII 213668 213669 -8022652358379081217 -3.728333 - 213668 213669 '8,121,207' chr-VIII 213729 213730 -8022716548847329270 -3.253846 - 213729 213730 '8,121,207' chromosome start end strand site_id methylated_reads unmethylated_reads ambiguous_reads sequence num_motifs llr_list chr-VIII 213351 213352 - -8023785553745245992 0 1 9 ACTAACGGGGA 1 -2.75,-0.46,1.48,-1.35,0.34,0.66,-1.58,-1.91,0.06,0.51 chr-VIII 213382 213386 - -8023583102452360367 1 7 2 TTCTTCGCCGACTG 2 -7.94,3.23,-0.77,0.05,-9.8,-5.7,-11.29,-5.61,-4.64,-8.56 chr-VIII 213427 213428 - -8023686382448928188 1 0 10 TTTCTCGCAAA 1 -0.28,0.35,-0.06,0.05,-0.11,2.63,-1.02,1.27,1.31,1.05,0.36 chr-VIII 213516 213517 - -8022685139567842841 0 6 6 ACTTCCGTTGC 1 -2.01,-5.92,-0.28,-0.07,-0.59,0.87,-5.23,-0.84,-2.65,-7.24,0.47,-5.88 chr-VIII 213536 213537 - -8022798660204149733 0 11 1 CATCTCGTAAA 1 -3.33,-2.35,-4.14,-2.21,-1.92,-2.04,-2.17,-4.34,-2.79,-3.75,-4.01,-4.19 chr-VIII 213549 213550 - -8022802421743098730 1 7 4 GGATACGATAA 1 -3.3,-4.74,-0.15,-0.08,-4.84,-8.16,-3.61,2.21,-2.28,-1.69,-6.15,0.1 chr-VIII 213645 213646 - -8022541247023329610 0 6 6 AAATTCGTCAT 1 -9.16,-0.83,1.66,0.58,-2.18,-5.19,-2.13,-5.99,-3.47,-1.88,-1.55,-0.42 chr-VIII 213668 213669 - -8022652358379081217 0 8 4 CACAACGTTGG 1 -10.85,-2.37,0.81,-1.98,-4.37,-7.49,-2.94,-0.06,-4.53,-3.63,-0.61,-6.72 chr-VIII 213729 213730 - -8022716548847329270 0 9 4 ATTTGCGAATA 1 -2.79,-4.55,1.21,-0.07,-6.66,-2.28,-7.42,-2.66,-1.86,-6.36,-5.87,0.39,-3.38 ## Options summary ## package_name: NanopolishComp package_version: 0.6.11 timestamp: 2019-09-10 16:21:51.314487 quiet: False verbose: True min_llr: 2 strand_specific: True sample_id: Sample1 min_depth: 10 output_tsv_fn: ./output/freq_meth_calculate/out_freq_meth_calculate.tsv output_bed_fn: ./output/freq_meth_calculate/out_freq_meth_calculate.bed fasta_index: data/freq_meth_calculate/ref.fa.fai input_fn: data/freq_meth_calculate/methylation_calls.tsv ## Checking arguments ## Testing input file readability Check output file Output results in bed format Output results in tsv format ## Parsing methylation_calls file ## Starting to parse file Nanopolish methylation call file : 605248 lines [00:14, 43139.23 lines/s] Filtering out low coverage sites Sorting by coordinates Processing valid sites found Write output file header : 100%|██████████| 999/999 [00:01<00:00, 668.32 sites/s] ## Results summary ## Total read lines: 605,248 Valid read lines: 605,248 Total sites: 340,081 Low coverage sites: 339,082 Valid sites: 1,998
Changing filtering thresholds (not recommended)
%%bash # Load local bashrc and activate virtual environment source ~/.bashrc workon NanopolishComp NanopolishComp Freq_meth_calculate --verbose -i data/freq_meth_calculate/methylation_calls.tsv -b ./output/freq_meth_calculate/out_freq_meth_calculate.bed -t ./output/freq_meth_calculate/out_freq_meth_calculate.tsv --min_depth 5 --min_llr 1 head ./output/freq_meth_calculate/out_freq_meth_calculate.bed head ./output/freq_meth_calculate/out_freq_meth_calculate.tsv
track name=Methylation_ itemRgb=On chr-I 4399 4400 1191264914885793541 -2.344000 . 4399 4400 '8,121,207' chr-I 4534 4535 1191135011594914772 -3.033333 . 4534 4535 '8,121,207' chr-I 4591 4592 1191022765062019525 -3.181667 . 4591 4592 '8,121,207' chr-I 4654 4655 1192234775377888428 -1.055000 . 4654 4655 '8,121,207' chr-I 1755 1756 1195840537597523385 -1.706000 . 1755 1756 '8,121,207' chr-I 1782 1787 1195897291462470420 -2.776000 . 1782 1787 '8,121,207' chr-I 1797 1798 1195752253807605775 -3.224000 . 1797 1798 '8,121,207' chr-I 1814 1815 1195776216550583412 -2.454000 . 1814 1815 '8,121,207' chr-I 1829 1830 1195792612111568111 -2.896000 . 1829 1830 '8,121,207' chromosome start end strand site_id methylated_reads unmethylated_reads ambiguous_reads sequence num_motifs llr_list chr-I 4399 4400 . 1191264914885793541 1 3 1 CTTGCCGAATA 1 -3.17,-11.29,5.14,-1.8,-0.6 chr-I 4534 4535 . 1191135011594914772 0 5 1 TTATTCGTTTT 1 -3.31,0.02,-2.66,-5.82,-4.21,-2.22 chr-I 4591 4592 . 1191022765062019525 0 3 3 TCAAACGCTTG 1 -6.16,-0.6,0.28,-0.38,-6.02,-6.21 chr-I 4654 4655 . 1192234775377888428 1 4 1 TTAATCGACAA 1 -1.5,-0.67,-2.66,-2.94,-1.66,3.1 chr-I 1755 1756 . 1195840537597523385 1 2 2 TTGTTCGTTAA 1 -3.37,-0.25,-0.17,-7.47,2.73 chr-I 1782 1787 . 1195897291462470420 0 5 0 AAACCCGTTCGTAAA 2 -2.2,-3.83,-1.37,-4.8,-1.68 chr-I 1797 1798 . 1195752253807605775 0 4 1 ATTGGCGTTTG 1 -1.45,-5.09,-2.17,-6.66,-0.75 chr-I 1814 1815 . 1195776216550583412 0 4 1 GTTTGCGATAG 1 -2.2,-4.1,-1.86,-0.4,-3.71 chr-I 1829 1830 . 1195792612111568111 0 4 1 GATACCGTCCT 1 -0.16,-5.78,-3.32,-4.0,-1.22 ## Options summary ## package_name: NanopolishComp package_version: 0.6.11 timestamp: 2019-09-10 16:22:23.393609 quiet: False verbose: True min_llr: 1.0 strand_specific: False sample_id: min_depth: 5 output_tsv_fn: ./output/freq_meth_calculate/out_freq_meth_calculate.tsv output_bed_fn: ./output/freq_meth_calculate/out_freq_meth_calculate.bed fasta_index: input_fn: data/freq_meth_calculate/methylation_calls.tsv ## Checking arguments ## Testing input file readability Check output file Output results in bed format Output results in tsv format ## Parsing methylation_calls file ## Starting to parse file Nanopolish methylation call file : 605248 lines [00:13, 45934.72 lines/s] Filtering out low coverage sites Processing valid sites found Write output file header : 100%|██████████| 18531/18531 [00:05<00:00, 3528.30 sites/s] ## Results summary ## Total read lines: 605,248 Valid read lines: 605,248 Total sites: 229,389 Low coverage sites: 210,858 Valid sites: 37,062
Python API usage
Import the package
# Import main program from NanopolishComp.Freq_meth_calculate import Freq_meth_calculate # Import helper functions from NanopolishComp.common import jhelp, head
python API help
jhelp(Freq_meth_calculate)
NanopolishComp.Freq_meth_calculate.init
Calculate methylation frequency at genomic CpG sites from the output of nanopolish call-methylation
- input_fn : str (required)
Path to a nanopolish call_methylation tsv output file
- fasta_index : str (default = )
fasta index file obtained with samtools faidx. Required for coordinate sorting
- output_bed_fn : str (default = )
Path to write a summary result file in BED format
- output_tsv_fn : str (default = )
Path to write an more extensive result report in TSV format
- min_depth : int (default = 10)
Minimal number of reads covering a site to be reported
- sample_id : str (default = )
Sample ID to be used for the bed track header
- strand_specific : bool (default = False)
If True, output strand specific sites
- min_llr : float (default = 2)
Minimal log likelyhood ratio to consider a site significantly methylated or unmethylated
- verbose : bool (default = False)
Increase verbosity
- quiet : bool (default = False)
Reduce verbosity
Example usage
basic setting
f = Freq_meth_calculate( input_fn="./data/freq_meth_calculate/methylation_calls.tsv", output_bed_fn="./output/freq_meth_calculate/out_freq_meth_calculate.bed", sample_id="Sample1", verbose=True) head("./output/freq_meth_calculate/out_freq_meth_calculate.bed")
## Options summary ## package_name: NanopolishComp package_version: 0.6.11 timestamp: 2019-09-10 16:23:07.646062 quiet: False verbose: True min_llr: 2 strand_specific: False sample_id: Sample1 min_depth: 10 output_tsv_fn: output_bed_fn: ./output/freq_meth_calculate/out_freq_meth_calculate.bed fasta_index: input_fn: ./data/freq_meth_calculate/methylation_calls.tsv ## Checking arguments ## Testing input file readability Check output file Output results in bed format ## Parsing methylation_calls file ## Starting to parse file Nanopolish methylation call file : 605248 lines [00:15, 39476.23 lines/s] Filtering out low coverage sites Processing valid sites found Write output file header : 100%|██████████| 804/804 [00:01<00:00, 492.95 sites/s] ## Results summary ## Total read lines: 605,248 Valid read lines: 605,248 Total sites: 229,389 Low coverage sites: 228,585 Valid sites: 1,608 track name=Methylation_Sample1 itemRgb=On chr-VIII 138415 138416 -4387463463888030503 -3.200000 . 138415 138416 '8,121,207' chr-VIII 138429 138430 -4387486165434009317 -4.722000 . 138429 138430 '8,121,207' chr-VIII 212351 212352 -4459725007132593111 -4.125000 . 212351 212352 '8,121,207' chr-VIII 212392 212393 -4459458894565841458 -1.866667 . 212392 212393 '100,100,100' chr-VIII 212457 212461 -4459540872370764953 -5.546667 . 212457 212461 '8,121,207' chr-VIII 212530 212531 -4460596494258779804 -1.850833 . 212530 212531 '100,100,100' chr-VIII 212581 212582 -4460665860093715069 -2.307692 . 212581 212582 '8,121,207' chr-VIII 212596 212600 -4460684778048697414 -5.900000 . 212596 212600 '8,121,207' chr-VIII 212612 212613 -4460462807376904566 -3.675385 . 212612 212613 '8,121,207'
Using a fasta index for output coordinates sorting and strand specific sites
f = Freq_meth_calculate( input_fn="./data/freq_meth_calculate/methylation_calls.tsv", fasta_index="./data/freq_meth_calculate/ref.fa.fai", output_bed_fn="./output/freq_meth_calculate/out_freq_meth_calculate.bed", sample_id="Sample1", verbose=True, strand_specific=True) head("./output/freq_meth_calculate/out_freq_meth_calculate.bed")
## Options summary ## package_name: NanopolishComp package_version: 0.6.11 timestamp: 2019-09-10 16:23:56.781302 quiet: False verbose: True min_llr: 2 strand_specific: True sample_id: Sample1 min_depth: 10 output_tsv_fn: output_bed_fn: ./output/freq_meth_calculate/out_freq_meth_calculate.bed fasta_index: ./data/freq_meth_calculate/ref.fa.fai input_fn: ./data/freq_meth_calculate/methylation_calls.tsv ## Checking arguments ## Testing input file readability Check output file Output results in bed format ## Parsing methylation_calls file ## Starting to parse file Nanopolish methylation call file : 605248 lines [00:15, 40181.70 lines/s] Filtering out low coverage sites Sorting by coordinates Processing valid sites found Write output file header : 100%|██████████| 999/999 [00:01<00:00, 637.85 sites/s] ## Results summary ## Total read lines: 605,248 Valid read lines: 605,248 Total sites: 340,081 Low coverage sites: 339,082 Valid sites: 1,998 track name=Methylation_Sample1 itemRgb=On chr-VIII 213351 213352 3852928671350919267 -0.500000 - 213351 213352 '100,100,100' chr-VIII 213382 213386 3853101183011884366 -5.103000 - 213382 213386 '8,121,207' chr-VIII 213427 213428 3853155687118280343 0.504545 - 213427 213428 '100,100,100' chr-VIII 213516 213517 3846995005950784724 -2.447500 - 213516 213517 '8,121,207' chr-VIII 213536 213537 3846697452562010016 -3.103333 - 213536 213537 '8,121,207' chr-VIII 213549 213550 3846721371280116041 -2.724167 - 213549 213550 '8,121,207' chr-VIII 213645 213646 3846832673276330665 -2.546667 - 213645 213646 '8,121,207' chr-VIII 213668 213669 3847166582036626108 -3.728333 - 213668 213669 '8,121,207' chr-VIII 213729 213730 3846776908639343893 -3.253846 - 213729 213730 '8,121,207'
Changing filtering threshold (not recommended)
f = Freq_meth_calculate( input_fn="./data/freq_meth_calculate/methylation_calls.tsv", output_tsv_fn="./output/freq_meth_calculate/out_freq_meth_calculate.tsv", min_llr=1, min_depth=5) head("./output/freq_meth_calculate/out_freq_meth_calculate.tsv")
## Checking arguments ## ## Parsing methylation_calls file ## Starting to parse file Nanopolish methylation call file : 605248 lines [00:14, 40874.57 lines/s] Filtering out low coverage sites Processing valid sites found : 100%|██████████| 18531/18531 [00:05<00:00, 3458.71 sites/s] ## Results summary ## Total read lines: 605,248 Valid read lines: 605,248 Total sites: 229,389 Low coverage sites: 210,858 Valid sites: 37,062 chromosome start end strand site_id methylated_reads unmethylated_reads ambiguous_reads sequence num_motifs llr_list chr-I 4399 4400 . -2730676911920365501 1 3 1 CTTGCCGAATA 1 -3.17,-11.29,5.14,-1.8,-0.6 chr-I 4534 4535 . -2730544486235489086 0 5 1 TTATTCGTTTT 1 -3.31,0.02,-2.66,-5.82,-4.21,-2.22 chr-I 4591 4592 . -2730596195312440829 0 3 3 TCAAACGCTTG 1 -6.16,-0.6,0.28,-0.38,-6.02,-6.21 chr-I 4654 4655 . -2731644250018462742 1 4 1 TTAATCGACAA 1 -1.5,-0.67,-2.66,-2.94,-1.66,3.1 chr-I 1755 1756 . -2725162958641511345 1 2 2 TTGTTCGTTAA 1 -3.37,-0.25,-0.17,-7.47,2.73 chr-I 1782 1787 . -2725136473504536062 0 5 0 AAACCCGTTCGTAAA 2 -2.2,-3.83,-1.37,-4.8,-1.68 chr-I 1797 1798 . -2724913241635744391 0 4 1 ATTGGCGTTTG 1 -1.45,-5.09,-2.17,-6.66,-0.75 chr-I 1814 1815 . -2724934681984724382 0 4 1 GTTTGCGATAG 1 -2.2,-4.1,-1.86,-0.4,-3.71 chr-I 1829 1830 . -2724872883331782055 0 4 1 GATACCGTCCT 1 -0.16,-5.78,-3.32,-4.0,-1.22