
CGI_Finder CLI usage

Activate virtual environment

# Using virtualenvwrapper here, but a Conda environment works just as well
workon pycoMeth

Getting help

pycoMeth CGI_Finder --help
usage: pycoMeth CGI_Finder [-h] -f REF_FASTA_FN [-b OUTPUT_BED_FN]
                           [-t OUTPUT_TSV_FN] [-m MERGE_GAP] [-w MIN_WIN_LEN]
                           [-c MIN_CG_FREQ] [-r MIN_OBS_CG_RATIO] [-v] [-q]
                           [-p]

Simple method to find putative CpG islands in DNA sequences by using a sliding
window and merging overlapping windows satisfying the CpG island definition.
Results can be saved in bed and tsv format

optional arguments:
  -h, --help            show this help message and exit

Input/Output options:
  -f REF_FASTA_FN, --ref_fasta_fn REF_FASTA_FN
                        Reference file used for alignment in Fasta format
                        (ideally already indexed with samtools faidx)
                        (required) [str]
  -b OUTPUT_BED_FN, --output_bed_fn OUTPUT_BED_FN
                        Path to write a summary result file in BED format (At
                        least 1 output file is required) (default: None) [str]
  -t OUTPUT_TSV_FN, --output_tsv_fn OUTPUT_TSV_FN
                        Path to write an more extensive result report in TSV
                        format (At least 1 output file is required) (default:
                        None) [str]

Misc options:
  -m MERGE_GAP, --merge_gap MERGE_GAP
                        Merge close CpG island within a given distance in
                        bases (default: 0) [int]
  -w MIN_WIN_LEN, --min_win_len MIN_WIN_LEN
                        Length of the minimal window containing CpG. Used as
                        the sliding window length (default: 200) [int]
  -c MIN_CG_FREQ, --min_CG_freq MIN_CG_FREQ
                        Minimal C+G frequency in a window to be counted as a
                        valid CpG island (default: 0.5) [float]
  -r MIN_OBS_CG_RATIO, --min_obs_CG_ratio MIN_OBS_CG_RATIO
                        Minimal Observed CG dinucleotidefrequency over
                        expected distribution in a window to be counted as a
                        valid CpG island (default: 0.6) [float]

Verbosity options:
  -v, --verbose         Increase verbosity
  -q, --quiet           Reduce verbosity
  -p, --progress        Display a progress bar
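The help text summarizes the method: slide a window along each sequence, keep windows that satisfy the CpG island definition, and merge overlapping ones. The following is a minimal Python sketch of that idea, not pycoMeth's implementation. It assumes a 1-base window step, the Gardiner-Garden and Frommer observed/expected formula (num_CpG × length / (num_C × num_G)), and that windows within `merge_gap` bases of the previous island are joined to it. The function names `cg_stats` and `find_cgi` are hypothetical, and the values it produces will not necessarily match pycoMeth's output.

```python
from typing import List, Tuple


def cg_stats(window: str) -> Tuple[float, float]:
    """Return (C+G frequency, observed/expected CpG ratio) for one window.

    Assumption: obs/exp = num_CpG * length / (num_C * num_G)
    (Gardiner-Garden & Frommer); pycoMeth may compute it differently.
    """
    seq = window.upper()
    n = len(seq)
    num_c = seq.count("C")
    num_g = seq.count("G")
    num_cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    cg_freq = (num_c + num_g) / n if n else 0.0
    obs_exp = num_cpg * n / (num_c * num_g) if num_c and num_g else 0.0
    return cg_freq, obs_exp


def find_cgi(
    seq: str,
    min_win_len: int = 200,         # mirrors -w / --min_win_len default
    min_cg_freq: float = 0.5,       # mirrors -c / --min_CG_freq default
    min_obs_cg_ratio: float = 0.6,  # mirrors -r / --min_obs_CG_ratio default
    merge_gap: int = 0,             # mirrors -m / --merge_gap default
) -> List[Tuple[int, int]]:
    """Return merged putative CpG islands as 0-based half-open (start, end) pairs."""
    islands: List[List[int]] = []
    # Assumed step of 1 base; each window is scored from scratch
    for start in range(len(seq) - min_win_len + 1):
        end = start + min_win_len
        cg_freq, obs_exp = cg_stats(seq[start:end])
        if cg_freq < min_cg_freq or obs_exp < min_obs_cg_ratio:
            continue
        if islands and start <= islands[-1][1] + merge_gap:
            # Overlaps (or lies within merge_gap of) the previous island: extend it
            islands[-1][1] = max(islands[-1][1], end)
        else:
            islands.append([start, end])
    return [(s, e) for s, e in islands]
```

For example, `find_cgi("A" * 300 + "CG" * 150 + "A" * 300)` returns `[(200, 700)]`: every 200-base window with at least 100 bases inside the CG block passes both thresholds, and the overlapping windows collapse into one island. Scoring each window from scratch costs O(length × window); a practical tool would update the counts incrementally as the window slides.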

Example usage

Basic usage with the yeast genome

pycoMeth CGI_Finder \
    -f ./data/yeast.fa \
    -b ./results/yeast_CGI.bed \
    -t ./results/yeast_CGI.tsv \
    --progress

head ./results/yeast_CGI.bed
head ./results/yeast_CGI.tsv
## Checking options and input files ##
## Parsing reference fasta file ##
    Parsing Reference sequence: I
    Progress: 100%|█████████████████████████| 230k/230k [00:00<00:00, 838k bases/s]
    Parsing Reference sequence: II
    Progress: 100%|█████████████████████████| 813k/813k [00:00<00:00, 917k bases/s]
    Parsing Reference sequence: III
    Progress: 100%|█████████████████████████| 316k/316k [00:00<00:00, 854k bases/s]
    Parsing Reference sequence: IV
    Progress: 100%|███████████████████████| 1.53M/1.53M [00:01<00:00, 978k bases/s]
    Parsing Reference sequence: V
    Progress: 100%|█████████████████████████| 577k/577k [00:00<00:00, 878k bases/s]
    Parsing Reference sequence: VI
    Progress: 100%|█████████████████████████| 270k/270k [00:00<00:00, 888k bases/s]
    Parsing Reference sequence: VII
    Progress: 100%|███████████████████████| 1.09M/1.09M [00:01<00:00, 989k bases/s]
    Parsing Reference sequence: VIII
    Progress: 100%|█████████████████████████| 562k/562k [00:00<00:00, 925k bases/s]
    Parsing Reference sequence: IX
    Progress: 100%|█████████████████████████| 440k/440k [00:00<00:00, 937k bases/s]
    Parsing Reference sequence: X
    Progress: 100%|█████████████████████████| 746k/746k [00:00<00:00, 966k bases/s]
    Parsing Reference sequence: XI
    Progress: 100%|█████████████████████████| 667k/667k [00:00<00:00, 924k bases/s]
    Parsing Reference sequence: XII
    Progress: 100%|███████████████████████| 1.08M/1.08M [00:01<00:00, 906k bases/s]
    Parsing Reference sequence: XIII
    Progress: 100%|█████████████████████████| 924k/924k [00:00<00:00, 971k bases/s]
    Parsing Reference sequence: XIV
    Progress: 100%|█████████████████████████| 784k/784k [00:00<00:00, 923k bases/s]
    Parsing Reference sequence: XV
    Progress: 100%|███████████████████████| 1.09M/1.09M [00:01<00:00, 963k bases/s]
    Parsing Reference sequence: XVI
    Progress: 100%|█████████████████████████| 948k/948k [00:00<00:00, 960k bases/s]
    Parsing Reference sequence: Mito
    Progress: 100%|███████████████████████| 85.6k/85.6k [00:00<00:00, 863k bases/s]
    Results summary
        Valid minimal size windows: 216,083
        Valid merged windows: 2,041
        Number of reference sequences: 17
track name=CpG_islands
I    17       333
I    1804     2170
I    25527    25912
I    31835    32949
I    33497    34371
I    38163    38471
I    44294    44565
I    44730    44988
I    45308    45526
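The BED file holds a `track` header line followed by chromosome, start and end columns. The TSV report lists a `length` equal to end - start for each island (for example 333 - 17 = 316), which is consistent with BED's 0-based, half-open convention. A minimal sketch for loading the BED file in Python and totalling island bases per chromosome follows; it assumes whitespace-separated columns (BED is normally tab-separated) and skips `track`, `browser` and `#` header lines. `read_cgi_bed` and `bases_per_chrom` are hypothetical helper names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[str, int, int]


def read_cgi_bed(lines: Iterable[str]) -> List[Interval]:
    """Parse BED lines into (chrom, start, end), skipping header/comment lines."""
    intervals: List[Interval] = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end = line.split()[:3]  # only the first 3 BED columns are used
        intervals.append((chrom, int(start), int(end)))
    return intervals


def bases_per_chrom(intervals: Iterable[Interval]) -> Dict[str, int]:
    """Sum island lengths per chromosome (half-open, so length = end - start)."""
    totals: Dict[str, int] = defaultdict(int)
    for chrom, start, end in intervals:
        totals[chrom] += end - start
    return dict(totals)
```

With the file from the example, `with open("./results/yeast_CGI.bed") as fh: print(bases_per_chrom(read_cgi_bed(fh)))` would print the island bases covered on each yeast chromosome.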
chromosome  start  end    length  num_CpG  CG_freq  obs_exp_freq
I           17     333    316     4        0.509    0.614
I           1804   2170   366     14       0.495    0.650
I           25527  25912  385     16       0.488    0.776
I           31835  32949  1114    59       0.497    0.876
I           33497  34371  874     39       0.506    0.715
I           38163  38471  308     13       0.487    0.715
I           44294  44565  271     12       0.487    0.747
I           44730  44988  258     9        0.481    0.608
I           45308  45526  218     12       0.495    0.908
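The TSV report adds per-island statistics under the header shown. A minimal sketch for loading it with Python's standard `csv` module and keeping only the stronger islands follows. It assumes the file is tab-separated with exactly these column names; the cutoffs `min_obs_exp=0.7` and `min_length=300`, like the helper names `load_cgi_tsv` and `filter_islands`, are hypothetical and chosen purely for illustration.

```python
import csv
from typing import Dict, Iterable, List, Union

Record = Dict[str, Union[str, int, float]]


def load_cgi_tsv(lines: Iterable[str]) -> List[Record]:
    """Read the TSV report into typed dicts (assumes a tab-separated header row)."""
    records: List[Record] = []
    for row in csv.DictReader(lines, delimiter="\t"):
        records.append({
            "chromosome": row["chromosome"],
            "start": int(row["start"]),
            "end": int(row["end"]),
            "length": int(row["length"]),
            "num_CpG": int(row["num_CpG"]),
            "CG_freq": float(row["CG_freq"]),
            "obs_exp_freq": float(row["obs_exp_freq"]),
        })
    return records


def filter_islands(records: List[Record], min_obs_exp: float = 0.7,
                   min_length: int = 300) -> List[Record]:
    """Keep islands that meet both illustrative cutoffs."""
    return [r for r in records
            if r["obs_exp_freq"] >= min_obs_exp and r["length"] >= min_length]
```

To use it on the real report, open the file with `newline=""` as the `csv` documentation recommends, e.g. `with open("./results/yeast_CGI.tsv", newline="") as fh: strong = filter_islands(load_cgi_tsv(fh))`.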