CpG_Aggregate usage

Calculate methylation frequency at genomic CpG sites from the output of nanopolish call-methylation.

Example usage

Input files

Nanopolish call-methylation output file

Nanopolish call-methylation TSV output file, a list of such files, or a regex matching several files.

Reference FASTA file

FASTA reference file used for read alignment and Nanopolish. This file is required and is used to sort the CpG sites by coordinates.

Output files

Tabulated TSV file

This tabulated file contains the following fields:

  • chromosome / start / end: Genomic coordinates of the CpG, or of the cluster of CpGs when sites lie less than 5 bases from each other.
  • sequence: Sequence from -5 to +5 around the motif, or around the group of motifs if split_group was not selected.
  • num_motifs: Number of motifs (CpG) found in the cluster.
  • median_llr: Median of the log likelihood ratios for all mapped reads.
  • llr_list: List of raw llr values.
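
To illustrate how median_llr relates to llr_list, here is a minimal sketch that groups per-read log likelihood ratios by site and takes the median. The function name `aggregate_llr` and the simplified input tuples are assumptions made for illustration, not the tool's actual implementation. Sites are sorted by chromosome name here, whereas the tool orders them using the reference FASTA.

```python
from collections import defaultdict
from statistics import median


def aggregate_llr(records):
    """Group per-read llr values by site and compute their median.

    `records` is an iterable of (chromosome, start, end, llr) tuples,
    a hypothetical simplified stand-in for parsed nanopolish rows.
    """
    llr_by_site = defaultdict(list)
    for chrom, start, end, llr in records:
        llr_by_site[(chrom, start, end)].append(llr)

    sites = []
    # Plain tuple sort: chromosome name, then start, then end.
    for (chrom, start, end), llr_list in sorted(llr_by_site.items()):
        sites.append({
            "chromosome": chrom,
            "start": start,
            "end": end,
            "median_llr": median(llr_list),
            "llr_list": llr_list,
        })
    return sites
```

Each returned dictionary mirrors a subset of the fields listed above (the sequence and num_motifs fields are left out because they require the reference sequence).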

BED file

Standard genomic BED9 format including an RGB color field. The score corresponds to the median log likelihood ratio. The file is already sorted by coordinates and can be rendered with a genome browser such as IGV.
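
As a rough sketch of what one such record looks like, the function below assembles a tab-separated BED9 line with the median log likelihood ratio in the score column. The helper name `bed9_line`, the "." placeholders for name and strand, the three-decimal score formatting, and the example RGB value are all assumptions for illustration. The actual values written by the tool may differ.

```python
def bed9_line(chrom, start, end, median_llr, rgb, name=".", strand="."):
    """Build one BED9 line (hypothetical helper, not the tool's code).

    BED9 columns: chrom, chromStart, chromEnd, name, score, strand,
    thickStart, thickEnd, itemRgb.
    """
    item_rgb = ",".join(str(channel) for channel in rgb)
    fields = [
        chrom, start, end,
        name,
        f"{median_llr:.3f}",  # score column carries the median llr
        strand,
        start, end,           # thickStart / thickEnd set to the site bounds
        item_rgb,
    ]
    return "\t".join(str(field) for field in fields)


# Example with an arbitrary orange color.
print(bed9_line("chr1", 100, 105, 3.25, (255, 165, 0)))
```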

The sites are color-coded as follows:

  • Median log likelihood ratio higher than 2 (Methylated): Colorscale from orange (llr = 2) to deep red (llr >= 6)
  • Median log likelihood ratio lower than -2 (Unmethylated): Colorscale from green (llr = -2) to deep blue (llr <= -6)
  • Grey: Median log likelihood ratio between -2 and 2 (ambiguous methylation status)
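
The thresholds above can be sketched as a small classifier. The function names `classify_site` and `color_scale_position` are hypothetical, and the treatment of the exact boundary values (llr = 2 and llr = -2) is an assumption, since the description does not say whether they are inclusive. The exact RGB values used by the tool are not given, so the second function only returns a relative position on the color scale.

```python
def classify_site(median_llr, threshold=2.0):
    """Return the methylation status implied by the median llr.

    Assumption: the boundary values are counted as methylated or
    unmethylated rather than ambiguous.
    """
    if median_llr >= threshold:
        return "methylated"
    if median_llr <= -threshold:
        return "unmethylated"
    return "ambiguous"


def color_scale_position(median_llr, low=2.0, high=6.0):
    """Map |llr| to a position in [0, 1] along the color scale.

    0.0 corresponds to |llr| <= 2 (orange or green end) and 1.0 to
    |llr| >= 6 (deep red or deep blue end).
    """
    magnitude = min(max(abs(median_llr), low), high)
    return (magnitude - low) / (high - low)


for llr in (7.1, 3.0, 0.4, -2.5, -9.0):
    print(llr, classify_site(llr), color_scale_position(llr))
```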

Here is an example of multiple methylation BED files rendered with IGV:

Example Bed Files