
Interval_Aggregate API usage

Import module

# Import main module 
from pycoMeth.Interval_Aggregate import Interval_Aggregate

# optionally import Jupyter helper functions
from pycoMeth.common import head, jhelp, stdout_print

Getting help

jhelp (Interval_Aggregate)

Interval_Aggregate (cpg_aggregate_fn, ref_fasta_fn, interval_bed_fn, output_bed_fn, output_tsv_fn, interval_size, min_cpg_per_interval, sample_id, min_llr, verbose, quiet, progress, kwargs)

Bin the output of pycoMeth CpG_Aggregate in genomic intervals, using either an annotation file containing intervals or a sliding window.


  • cpg_aggregate_fn (required) [str]

Output tsv file generated by CpG_Aggregate (can be gzipped)

  • ref_fasta_fn (required) [str]

Reference file used for alignment in Fasta format (ideally already indexed with samtools faidx)

  • interval_bed_fn (default: None) [str]

SORTED bed file containing non-overlapping intervals to bin CpG data into (Optional) (can be gzipped)

  • output_bed_fn (default: None) [str]

Path to write a summary result file in BED format (At least 1 output file is required) (can be gzipped)

  • output_tsv_fn (default: None) [str]

Path to write a more extensive result report in TSV format (At least 1 output file is required) (can be gzipped)

  • interval_size (default: 1000) [int]

Size of the sliding window used to aggregate CpG site data when no BED file is provided

  • min_cpg_per_interval (default: 5) [int]

Minimal number of CpG sites per interval.

  • sample_id (default: "") [str]

Sample ID to be used for the BED track header

  • min_llr (default: 2) [float]

Minimal log likelihood ratio to consider a site significantly methylated or unmethylated in output BED file

  • verbose (default: False) [bool]

  • quiet (default: False) [bool]

  • progress (default: False) [bool]

  • kwargs
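To make the roles of interval_size and min_cpg_per_interval more concrete, here is a minimal sketch of fixed-size window binning. It is not pycoMeth's implementation: the function name `bin_sites`, the input tuple layout, and the half-open `[start, end)` window convention are assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import median


def bin_sites(sites, interval_size=1000, min_cpg_per_interval=5):
    """Group (chromosome, position, llr) tuples into fixed-size windows.

    Hypothetical helper: windows are assumed to be half-open [start, end)
    and aligned on multiples of interval_size.
    """
    windows = defaultdict(list)
    for chrom, pos, llr in sites:
        start = (pos // interval_size) * interval_size
        windows[(chrom, start)].append((pos, llr))

    results = []
    for (chrom, start), members in sorted(windows.items()):
        if len(members) < min_cpg_per_interval:
            continue  # skip windows with too few CpG sites
        members.sort()  # order sites by position within the window
        results.append({
            "chromosome": chrom,
            "start": start,
            "end": start + interval_size,
            "num_sites": len(members),
            "median_llr": median(llr for _, llr in members),
            "pos_list": [pos for pos, _ in members],
        })
    return results


# Three sites fall in window 500-1000, one in window 1000-1500
sites = [("I", 557, -1.14), ("I", 587, -3.54), ("I", 628, -7.24), ("I", 1036, -2.48)]
binned = bin_sites(sites, interval_size=500, min_cpg_per_interval=3)
for row in binned:
    print(row)
```

With min_cpg_per_interval=3, only the 500-1000 window is kept here; the 1000-1500 window holds a single site and is dropped, which mirrors the "Low CpG intervals skipped" idea in the tool's summary.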

Example usage

Default usage with sliding windows

Interval_Aggregate (
    cpg_aggregate_fn="./data/CpG_Aggregate_sample_1.tsv",
    ref_fasta_fn="./data/ref.fa",
    output_bed_fn="./results/Interval_Aggregate_sample_1.bed",
    output_tsv_fn="./results/Interval_Aggregate_sample_1.tsv",
    interval_size=500,
    min_cpg_per_interval=3,
    sample_id="sample_1",
    progress=True)

head("./results/Interval_Aggregate_sample_1.tsv")
head("./results/Interval_Aggregate_sample_1.bed")
## Checking options and input files ##
## Parsing CpG_aggregate file ##
    Progress: 100%|██████████| 5.82M/5.82M [00:02<00:00, 2.30M bytes/s]
    Results summary
        Lines parsed: 89,392
        Total number of intervals: 24,319
    Writter summary
        Empty intervals skipped: 14,390
        Valid intervals written: 9,195
        Low CpG intervals skipped: 734

chromosome start end  num_motifs median_llr llr_list                                           pos_list                                           
I          500   1000 12         -3.35      [-1.14,-3.54,-7.24,-4.3,0.56,-0.65,-4.37,-3.78,...[557,587,628,665,834,868,890,936,955,967,988]      
I          1000  1500 22         -3.65      [-2.48,-5.035,-4.16,-3.315,-3.295,-1.69,-9.885,...[1036,1095,1119,1136,1158,1178,1199,1217,1345,1...
I          1500  2000 19         -3.4       [-5.71,-6.05,-0.925,-7.165,-3.975,0.56,-1.78,-1...[1523,1584,1630,1654,1707,1755,1784,1797,1814,1...
I          2000  2500 15         -4.272     [-5.24,-3.07,-4.33,-19.055,-7.55,-1.255,-2.565,...[2003,2051,2084,2137,2302,2396,2421,2445,2462,2...
I          2500  3000 19         -2.5       [-0.705,0.385,-6.685,-10.175,-4.27,-2.3,-2.5,-2...[2546,2563,2584,2634,2666,2680,2694,2729,2752,2...
I          3000  3500 9          -1.4       [-2.34,-2.19,-1.4,-1.0,0.29,-1.3,-6.9,-1.22,-1.92] [3000,3024,3044,3056,3071,3148,3218,3367,3473]     
I          3500  4000 8          -1.75      [-1.7,-0.53,-4.46,-2.2,-1.75,-8.47,0.53]           [3516,3610,3624,3674,3722,3823,3987]               
I          4000  4500 10         -3.325     [-2.83,-3.96,-1.77,-13.895,-3.82,-1.73]            [4078,4094,4223,4276,4296,4399]                    
I          4500  5000 12         -2.628     [-2.71,-0.76,-1.55,-2.56,-0.985,-5.02,-3.46,-1....[4534,4591,4654,4706,4787,4814,4847,4859,4917,4...

track name=sample_1_Interval itemRgb=On
I   500 1000    .   -3.35   .   500 1000    29,140,190
I   1000    1500    .   -3.65   .   1000    1500    29,140,190
I   1500    2000    .   -3.4    .   1500    2000    29,140,190
I   2000    2500    .   -4.272  .   2000    2500    33,102,171
I   2500    3000    .   -2.5    .   2500    3000    52,168,194
I   3000    3500    .   -1.4    .   3000    3500    230,230,230
I   3500    4000    .   -1.75   .   3500    4000    230,230,230
I   4000    4500    .   -3.325  .   4000    4500    29,140,190
I   4500    5000    .   -2.628  .   4500    5000    52,168,194
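The BED output above can be post-processed with a few lines of Python. The sketch below parses BED rows and labels each interval using a min_llr threshold. The function name `classify_bed_line`, the labels, and the inclusive comparison at the threshold are assumptions, not pycoMeth behaviour; the sign convention (positive scores supporting methylation) is also assumed. In the output above, the two intervals with an absolute score below 2 are the ones coloured 230,230,230, which is consistent with this kind of thresholding.

```python
def classify_bed_line(line, min_llr=2.0):
    """Parse one BED row and label it from its score column (median LLR).

    Hypothetical helper: returns None for the track header or blank lines.
    """
    if not line.strip() or line.startswith("track"):
        return None
    fields = line.split()  # handles tab- or space-separated columns
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    score = float(fields[4])
    if score >= min_llr:
        label = "methylated"
    elif score <= -min_llr:
        label = "unmethylated"
    else:
        label = "ambiguous"
    return chrom, start, end, score, label


# Rows copied from the example output above
bed_lines = [
    "track name=sample_1_Interval itemRgb=On",
    "I\t500\t1000\t.\t-3.35\t.\t500\t1000\t29,140,190",
    "I\t3000\t3500\t.\t-1.4\t.\t3000\t3500\t230,230,230",
]
classified = [c for c in map(classify_bed_line, bed_lines) if c is not None]
for record in classified:
    print(record)
```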


Usage with a CpG Islands annotation Bed file

ff = Interval_Aggregate (
    cpg_aggregate_fn="./data/CpG_Aggregate_sample_1.tsv",
    ref_fasta_fn="./data/ref.fa",
    interval_bed_fn="./data/Yeast_CGI.bed",
    output_bed_fn="./results/CGI_Aggregate_sample_1.bed",
    output_tsv_fn="./results/CGI_Aggregate_sample_1.tsv",
    sample_id="sample_1",
    min_cpg_per_interval=1,
    progress=True)

head("./results/CGI_Aggregate_sample_1.tsv")
head("./results/CGI_Aggregate_sample_1.bed")
## Checking options and input files ##
## Parsing CpG_aggregate file ##
    Progress: 100%|█████████▉| 5.81M/5.82M [00:01<00:00, 5.35M bytes/s]
    Results summary
        Lines parsed: 89,235
        Total number of intervals: 2,041
    Writter summary
        Empty intervals skipped: 1,323
        Valid intervals written: 718

chromosome start end   num_motifs median_llr llr_list                                           pos_list                                           
I          1804  2170  14         -3.67      [-3.67,-3.4,-5.53,-1.06,-1.79,-1.94,-6.22,-5.24...[1814,1829,1889,1925,1949,1961,1976,2003,2051,2...
I          31835 32949 10         -5.65      [-2.925,-6.055,-1.785,-5.65,-6.83,-1.695,-12.32]   [31867,31889,31937,31960,32006,32031,32056]        
I          33497 34371 19         -3.295     [-4.38,-3.32,-1.29,-3.27,-5.89,-8.96,-6.88,-2.2...[33947,33967,34001,34021,34049,34068,34099,3416...
I          44730 44988 9          -3.2       [-2.37,-4.9,-1.63,-1.69,-8.09,-4.03]               [44748,44789,44808,44841,44877,44930]              
I          47889 48187 13         -4.55      [-4.55,-9.41,-3.37,-4.66,-3.24,-4.66,-4.535]       [47897,48003,48036,48050,48084,48100,48115]        
I          57175 57391 9          -4.76      [-7.96,-4.76,-0.33,-3.77,-1.03,-7.68,-6.66]        [57200,57255,57274,57292,57316,57335,57359]        
I          59052 59257 13         -4.05      [-8.53,-1.23,-11.15,-3.07,-3.88,-4.22,-1.59,-5.79] [59071,59109,59142,59167,59187,59219,59232,59249]  
I          60422 60656 18         -3.615     [-3.3,-3.14,-0.44,-6.65,-3.93,-2.27,-9.16,-1.73...[60427,60440,60467,60493,60519,60548,60561,6058...
I          61246 61903 38         -3.423     [-3.68,-26.15,-6.45,-0.76,-3.86,-2.33,-4.945,-4...[61262,61353,61391,61409,61424,61443,61501,6154...

track name=sample_1_Interval itemRgb=On
I   1804    2170    .   -3.67   .   1804    2170    29,140,190
I   31835   32949   .   -5.65   .   31835   32949   35,70,156
I   33497   34371   .   -3.295  .   33497   34371   29,140,190
I   44730   44988   .   -3.2    .   44730   44988   29,140,190
I   47889   48187   .   -4.55   .   47889   48187   33,102,171
I   57175   57391   .   -4.76   .   57175   57391   33,102,171
I   59052   59257   .   -4.05   .   59052   59257   33,102,171
I   60422   60656   .   -3.615  .   60422   60656   29,140,190
I   61246   61903   .   -3.423  .   61246   61903   29,140,190
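Since interval_bed_fn is documented as requiring a sorted BED file with non-overlapping intervals, it can be worth checking an annotation file before passing it in. The following is a minimal sketch of such a check, not part of pycoMeth: the function name `check_sorted_non_overlapping` is hypothetical, and "sorted" is assumed to mean that each chromosome forms one contiguous block with ascending start coordinates. Intervals are treated as half-open, so one interval may start where the previous one ends.

```python
def check_sorted_non_overlapping(intervals):
    """Return a list of problems found in (chrom, start, end) tuples."""
    problems = []
    seen_chroms = set()
    prev = None
    for chrom, start, end in intervals:
        if start >= end:
            problems.append(f"{chrom}:{start}-{end} has start >= end")
        if prev is not None and prev[0] == chrom:
            if start < prev[1]:
                problems.append(f"{chrom}:{start}-{end} is not sorted by start")
            elif start < prev[2]:
                problems.append(f"{chrom}:{start}-{end} overlaps the previous interval")
        elif chrom in seen_chroms:
            problems.append(f"{chrom} appears in more than one block")
        seen_chroms.add(chrom)
        prev = (chrom, start, end)
    return problems


# Coordinates taken from the example output above
ok = check_sorted_non_overlapping([("I", 1804, 2170), ("I", 31835, 32949)])
overlapping = check_sorted_non_overlapping([("I", 100, 200), ("I", 150, 300)])
unsorted = check_sorted_non_overlapping([("I", 500, 600), ("I", 100, 200)])
print(ok, overlapping, unsorted, sep="\n")
```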


Example with multiple files

for i in range (1, 5):
    stdout_print (f"##### SAMPLE {i} #####")
    Interval_Aggregate (
        cpg_aggregate_fn=f"./data/CpG_Aggregate_sample_{i}.tsv",
        ref_fasta_fn="./data/ref.fa",
        output_bed_fn=f"./results/Interval_Aggregate_sample_{i}.bed",
        output_tsv_fn=f"./results/Interval_Aggregate_sample_{i}.tsv",
        sample_id=f"sample_{i}",
        interval_size=500,
        min_cpg_per_interval=3,
        min_llr=1,
        quiet=True)
##### SAMPLE 1 #####
## Checking options and input files ##
## Parsing CpG_aggregate file ##

##### SAMPLE 2 #####
## Checking options and input files ##
## Parsing CpG_aggregate file ##

##### SAMPLE 3 #####
## Checking options and input files ##
## Parsing CpG_aggregate file ##

##### SAMPLE 4 #####
## Checking options and input files ##
## Parsing CpG_aggregate file ##
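
Once several samples have been processed, the per-sample TSV reports can be combined for comparison. The sketch below is one possible way to do this with the standard library, assuming the reports are tab-separated with the column names shown in the head output earlier (chromosome, start, end, median_llr). The helper names `load_median_llr` and `merge_samples` are hypothetical, and the sample_2 row is made-up toy data used only to show the merge.

```python
import csv
import io


def load_median_llr(handle):
    """Map (chromosome, start, end) -> median_llr from one TSV report."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        (row["chromosome"], int(row["start"]), int(row["end"])): float(row["median_llr"])
        for row in reader
    }


def merge_samples(handles_by_sample):
    """Collect the median LLR of every interval across samples."""
    merged = {}
    for sample_id, handle in handles_by_sample.items():
        for interval, llr in load_median_llr(handle).items():
            merged.setdefault(interval, {})[sample_id] = llr
    return merged


# In-memory stand-ins for the report files; sample_2 values are invented
HEADER = "chromosome\tstart\tend\tnum_motifs\tmedian_llr\n"
sample_1 = io.StringIO(HEADER + "I\t500\t1000\t12\t-3.35\nI\t1000\t1500\t22\t-3.65\n")
sample_2 = io.StringIO(HEADER + "I\t500\t1000\t10\t-2.1\n")

merged = merge_samples({"sample_1": sample_1, "sample_2": sample_2})
for interval, values in sorted(merged.items()):
    print(interval, values)
```

For real reports, each handle would come from something like open(path, newline=""), or gzip.open(path, "rt") if the output was written gzipped.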