Skip to content

CGI_Finder usage

Simple method to find putative CpG islands in DNA sequences by using a sliding window and merging overlapping windows satisfying the CpG island definition. Results can be saved in bed and tsv format.

Example usage

Input file

Reference FASTA file

FASTA reference file containing sequences in which CpG islands needs to be found.

Output files

Tabulated TSV file

This tabulated file contains the following fields for each CpG island found:

  • chromosome / start / end : Genomic coordinates
  • length: Length of the interval
  • num_CpG: Number of CpGs found
  • CG_freq: G+C nucleotide frequency
  • obs_exp_freq: Observed versus expected CpG frequency

BED file

Minimal standard genomic BED3 format listing the coordinates of putative CpG islands.

The picture below shows the putative CpG islands found (grey boxes) in an example sequence, overlaid with C+G frequency and observed/expected CpG frequency

Example