
CGI_Finder API usage

Import module

# Import main module 
from pycoMeth.CGI_Finder import CGI_Finder

# Optionally import Jupyter helper functions
from pycoMeth.common import head, jhelp

Getting help

jhelp(CGI_Finder)

CGI_Finder (ref_fasta_fn, output_tsv_fn, output_bed_fn, merge_gap, min_win_len, min_CG_freq, min_obs_CG_ratio, verbose, quiet, progress, kwargs)

Simple method to find putative CpG islands in DNA sequences: a sliding window scans each sequence, and overlapping windows that satisfy the CpG island definition are merged. Results can be saved in BED and TSV format.
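The merging step described above can be illustrated with a minimal sketch. This is not pycoMeth's code: the function name `merge_windows` and the `(start, end)` tuple representation are assumptions made for illustration, and the treatment of `merge_gap` (windows are merged when the distance between them is at most `merge_gap` bases) is one plausible reading of the option.

```python
def merge_windows(windows, merge_gap=0):
    """Merge (start, end) windows that overlap or lie within merge_gap bases.

    Hypothetical helper for illustration only, not part of pycoMeth.
    """
    merged = []
    for start, end in sorted(windows):
        # Extend the previous interval if this window overlaps it,
        # touches it, or starts within merge_gap bases of its end.
        if merged and start - merged[-1][1] <= merge_gap:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged
```

With the default `merge_gap=0`, only overlapping or directly adjacent windows are joined; a larger value also bridges short gaps between neighbouring islands.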


  • ref_fasta_fn (required) [str]

Reference file used for alignment in Fasta format (ideally already indexed with samtools faidx)

  • output_tsv_fn (default: None) [str]

Path to write a more extensive result report in TSV format (at least 1 output file is required)

  • output_bed_fn (default: None) [str]

Path to write a summary result file in BED format (at least 1 output file is required)

  • merge_gap (default: 0) [int]

Merge CpG islands that lie within a given distance of each other, in bases

  • min_win_len (default: 200) [int]

Minimal length of a window containing CpGs. Also used as the sliding window length

  • min_CG_freq (default: 0.5) [float]

Minimal C+G frequency in a window to be counted as a valid CpG island

  • min_obs_CG_ratio (default: 0.6) [float]

Minimal observed CG dinucleotide frequency over the expected distribution in a window to be counted as a valid CpG island

  • verbose (default: False) [bool]

  • quiet (default: False) [bool]

  • progress (default: False) [bool]

  • kwargs
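To make the `min_CG_freq` and `min_obs_CG_ratio` thresholds concrete, here is a minimal sketch of how a single window could be tested. It uses the commonly cited observed/expected CpG formula (number of CG dinucleotides × window length / (number of C × number of G)). This is an assumption: pycoMeth's internal calculation may differ and this sketch is not guaranteed to reproduce the values in its reports. The function names `window_stats` and `is_valid_window` are hypothetical.

```python
def window_stats(seq):
    """Return (C+G frequency, observed/expected CpG ratio) for one window.

    Illustrative only: uses the common obs/exp definition, which is an
    assumption and not necessarily the formula pycoMeth implements.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    num_c = seq.count("C")
    num_g = seq.count("G")
    num_cpg = seq.count("CG")
    cg_freq = (num_c + num_g) / n
    # Expected number of CG dinucleotides if C and G were placed independently
    expected = num_c * num_g / n
    obs_exp = num_cpg / expected if expected > 0 else 0.0
    return cg_freq, obs_exp


def is_valid_window(seq, min_CG_freq=0.5, min_obs_CG_ratio=0.6):
    """Check a window against both thresholds (defaults mirror the options above)."""
    cg_freq, obs_exp = window_stats(seq)
    return cg_freq >= min_CG_freq and obs_exp >= min_obs_CG_ratio
```

In a sliding-window scan, a check like this would be applied to every window of `min_win_len` bases, and the passing windows would then be merged.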

Example usage

Basic usage with yeast genome

ff = CGI_Finder (
    ref_fasta_fn="./data/yeast.fa",
    output_bed_fn="./results/yeast_CGI.bed",
    output_tsv_fn="./results/yeast_CGI.tsv",
    progress=True)

head("./results/yeast_CGI.tsv")
head("./results/yeast_CGI.bed")
## Checking options and input files ##
## Parsing reference fasta file ##
    Parsing Reference sequence: I
    Progress: 100%|██████████| 230k/230k [00:00<00:00, 772k bases/s] 
    Parsing Reference sequence: II
    Progress: 100%|██████████| 813k/813k [00:00<00:00, 848k bases/s] 
    Parsing Reference sequence: III
    Progress: 100%|██████████| 316k/316k [00:00<00:00, 756k bases/s] 
    Parsing Reference sequence: IV
    Progress: 100%|██████████| 1.53M/1.53M [00:01<00:00, 905k bases/s]
    Parsing Reference sequence: V
    Progress: 100%|██████████| 577k/577k [00:00<00:00, 877k bases/s] 
    Parsing Reference sequence: VI
    Progress: 100%|██████████| 270k/270k [00:00<00:00, 835k bases/s] 
    Parsing Reference sequence: VII
    Progress: 100%|██████████| 1.09M/1.09M [00:01<00:00, 894k bases/s]
    Parsing Reference sequence: VIII
    Progress: 100%|██████████| 562k/562k [00:00<00:00, 848k bases/s] 
    Parsing Reference sequence: IX
    Progress: 100%|██████████| 440k/440k [00:00<00:00, 836k bases/s] 
    Parsing Reference sequence: X
    Progress: 100%|██████████| 746k/746k [00:00<00:00, 932k bases/s] 
    Parsing Reference sequence: XI
    Progress: 100%|██████████| 667k/667k [00:00<00:00, 923k bases/s] 
    Parsing Reference sequence: XII
    Progress: 100%|██████████| 1.08M/1.08M [00:01<00:00, 915k bases/s]
    Parsing Reference sequence: XIII
    Progress: 100%|██████████| 924k/924k [00:01<00:00, 891k bases/s] 
    Parsing Reference sequence: XIV
    Progress: 100%|██████████| 784k/784k [00:00<00:00, 895k bases/s] 
    Parsing Reference sequence: XV
    Progress: 100%|██████████| 1.09M/1.09M [00:01<00:00, 909k bases/s]
    Parsing Reference sequence: XVI
    Progress: 100%|██████████| 948k/948k [00:01<00:00, 906k bases/s] 
    Parsing Reference sequence: Mito
    Progress: 100%|██████████| 85.6k/85.6k [00:00<00:00, 814k bases/s]
    Results summary
        Valid minimal size windows: 216,083
        Valid merged windows: 2,041
        Number of reference sequences: 17

chromosome start end   length num_CpG CG_freq obs_exp_freq 
I          17    333   316    4       0.509   0.614        
I          1804  2170  366    14      0.495   0.650        
I          25527 25912 385    16      0.488   0.776        
I          31835 32949 1114   59      0.497   0.876        
I          33497 34371 874    39      0.506   0.715        
I          38163 38471 308    13      0.487   0.715        
I          44294 44565 271    12      0.487   0.747        
I          44730 44988 258    9       0.481   0.608        
I          45308 45526 218    12      0.495   0.908        

track name=CpG_islands
I   17  333
I   1804    2170
I   25527   25912
I   31835   32949
I   33497   34371
I   38163   38471
I   44294   44565
I   44730   44988
I   45308   45526
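
The BED output can be loaded back into Python for downstream analysis. The sketch below assumes a whitespace-separated file with an optional `track` header line and three columns (chromosome, start, end), as in the preview above. `read_bed` is a hypothetical helper, not a pycoMeth function.

```python
def read_bed(path):
    """Read a 3-column BED file into a list of (chromosome, start, end) tuples.

    Hypothetical helper for illustration; skips blank lines, comments and
    "track"/"browser" header lines.
    """
    intervals = []
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if not fields or fields[0] in ("track", "browser") or fields[0].startswith("#"):
                continue
            intervals.append((fields[0], int(fields[1]), int(fields[2])))
    return intervals

# Example call, assuming the file written in the example above exists:
# islands = read_bed("./results/yeast_CGI.bed")
```

Each tuple can then be used, for instance, to compute island lengths as `end - start`.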