Comp_Report API usage
Import module
# Import main module
from pycoMeth.Comp_Report import Comp_Report, cpg_heatmap, cpg_ridgeplot, category_barplot, chr_ideogram_plot, tss_dist_plot
# Optionally import Jupyter helper functions
from pycoMeth.common import head, jhelp, Kaleido
Getting help
jhelp(Comp_Report)
Comp_Report (methcomp_fn, gff3_fn, ref_fasta_fn, outdir, n_top, max_tss_distance, pvalue_threshold, min_diff_llr, n_len_bin, api_mode, export_static_plots, report_non_significant, verbose, quiet, progress, kwargs)
Generate an HTML report of significantly differentially methylated CpG intervals from Meth_Comp
text output. Significant intervals are annotated with their closest transcript TSS.
- methcomp_fn (required) [str]
Input tsv file generated by Meth_comp (can be gzipped). At the moment only data binned by intervals with Interval_Aggregate are supported.
- gff3_fn (required) [str]
Path to an Ensembl GFF3 file containing genomic annotations. Only the transcript details are extracted.
- ref_fasta_fn (required) [str]
Reference file used for alignment, in FASTA format (ideally already indexed with samtools faidx)
- outdir (default: ./) [str]
Directory in which to write the HTML reports. Defaults to the current directory.
- n_top (default: 100) [int]
Number of top interval candidates for which to generate an interval report. If there are not enough significant candidates this is automatically scaled down.
- max_tss_distance (default: 100000) [int]
Maximal distance to the transcription start site (TSS) when searching for transcripts close to interval candidates
- pvalue_threshold (default: 0.01) [float]
pValue cutoff for top interval candidates
- min_diff_llr (default: 1) [float]
Minimal llr boundary for negative and positive median llr. 1 is recommended for visualization purposes.
- n_len_bin (default: 500) [int]
Number of genomic intervals for the longest chromosome of the ideogram figure
- api_mode (default: False) [bool]
Don't generate reports or tables, just parse data and return a tuple containing an overall median CpG dataframe and a dictionary of CpG dataframes for the top candidates found. These dataframes can then be used with the plotting functions contained in this module.
- export_static_plots (default: False) [bool]
Export all the plots from the reports in SVG format.
- report_non_significant (default: False) [bool]
Report all valid CpG islands, significant or not, in the text report. This option also adds a non-significant track to the TSS_distance plot.
- verbose (default: False) [bool]
- quiet (default: False) [bool]
- progress (default: False) [bool]
- kwargs
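To make the interplay between pvalue_threshold and n_top more concrete, here is a minimal sketch in plain Python. It is not the pycoMeth implementation: select_top is a hypothetical helper, and treating the cutoff as inclusive is an assumption. It only illustrates why n_top is effectively scaled down when fewer candidates pass the cutoff.

```python
def select_top(pvalues, pvalue_threshold=0.01, n_top=100):
    """Return (index, pvalue) pairs for the best candidates, lowest p-value first.

    Hypothetical helper for illustration only, not part of pycoMeth.
    """
    # Keep only candidates at or below the cutoff (inclusive cutoff is an assumption)
    significant = [(i, p) for i, p in enumerate(pvalues) if p <= pvalue_threshold]
    # Rank by ascending p-value
    significant.sort(key=lambda pair: pair[1])
    # Slicing never over-runs: with fewer significant candidates the list is just shorter
    return significant[:n_top]


top = select_top([0.5, 0.001, 0.02, 0.009], pvalue_threshold=0.01, n_top=100)
print(top)  # [(1, 0.001), (3, 0.009)]
```

Only two of the four values pass the 0.01 cutoff, so asking for 100 candidates still yields two.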
Example usage in interactive API mode
Parse data
In api_mode, Comp_Report doesn't generate any HTML reports but returns a tuple with 3 elements:
* A dataframe containing summary information about each CpG island, including the closest transcript
* A dataframe of median methylation values for all CpG islands
* A list of dataframes of CpG island level methylation values for the top candidates, ranked by ascending p-value
Setting report_non_significant significantly increases the execution time, but it reports the closest transcript for all sites and generates additional output.
all_summary_df, all_cpg_df, top_cpg_df_d = Comp_Report (
methcomp_fn = "./data/Medaka_CGI_meth_comp.tsv.gz",
gff3_fn = "./data/medaka.gff3",
ref_fasta_fn="./data/medaka.fa",
report_non_significant=True,
n_top=50,
progress=True,
api_mode=True)
display(all_summary_df.head())
display(all_cpg_df.head())
display(top_cpg_df_d[1].head())
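The parameter help describes the per-candidate result as a dictionary, while the description of the returned tuple calls it a list. To loop over all top candidates without depending on either, one possible sketch is a small helper that accepts both. iter_candidates is a hypothetical name, not a pycoMeth function, and the strings below are stand-ins for the real dataframes.

```python
def iter_candidates(top_candidates):
    """Yield (key, dataframe) pairs from either a dict or a list of candidates.

    Hypothetical helper for illustration only, not part of pycoMeth.
    """
    if isinstance(top_candidates, dict):
        # Dictionaries keep their insertion order, so any ranking is preserved
        yield from top_candidates.items()
    else:
        # For a list, use the position in the list as the key
        yield from enumerate(top_candidates)


# Stand-in values instead of real dataframes
as_dict = list(iter_candidates({1: "df_rank_1", 2: "df_rank_2"}))
as_list = list(iter_candidates(["df_a", "df_b"]))
print(as_dict)  # [(1, 'df_rank_1'), (2, 'df_rank_2')]
print(as_list)  # [(0, 'df_a'), (1, 'df_b')]
```

In a notebook, the same loop could then be written as `for key, df in iter_candidates(top_cpg_df_d): display(df.head())`.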
Plotting functions
To display static versions of the plots in Jupyter (easier to export than dynamic d3js plots), one can use the Kaleido wrapper included in pycoMeth
kaleido = Kaleido()
cpg_heatmap
Can be used either at genome scale (all_cpg_df) or for an individual CpG island (top_cpg_df_d)
fig = cpg_heatmap(top_cpg_df_d[1], lim_llr=7, column_widths=[0.9, 0.10])
display(kaleido.render_plotly_svg(fig, width=1200))
fig = cpg_heatmap(all_cpg_df, lim_llr=3, column_widths=[0.80, 0.20])
display(kaleido.render_plotly_svg(fig, width=1200))
cpg_ridgeplot
Can be used either at genome scale (all_cpg_df) or for an individual CpG island (top_cpg_df_d)
fig = cpg_ridgeplot(all_cpg_df, box=True, scatter=False)
display(kaleido.render_plotly_svg(fig, width=1200))
fig = cpg_ridgeplot(top_cpg_df_d[1])
display(kaleido.render_plotly_svg(fig, width=1200))
category_barplot
Can be used either at genome scale (all_cpg_df) or for an individual CpG island (top_cpg_df_d)
fig = category_barplot(all_cpg_df)
display(kaleido.render_plotly_svg(fig, width=1200))
fig = category_barplot(top_cpg_df_d[1])
display(kaleido.render_plotly_svg(fig, width=1200))
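The min_diff_llr parameter is described as a boundary for negative and positive median llr. A symmetric boundary of this kind splits values into three groups, which the sketch below illustrates. This is an assumption about how such a threshold is typically applied, not the pycoMeth code: classify_llr and the three labels are hypothetical, as is the convention that a positive llr supports methylation.

```python
def classify_llr(median_llr, min_diff_llr=1.0):
    """Assign a median llr to one of three groups using a symmetric boundary.

    Hypothetical helper and labels, for illustration only.
    """
    if median_llr >= min_diff_llr:
        # Assumed convention: positive llr supports methylation
        return "methylated"
    if median_llr <= -min_diff_llr:
        return "unmethylated"
    # Values strictly between -min_diff_llr and +min_diff_llr
    return "ambiguous"


labels = [classify_llr(v) for v in (2.5, -3.0, 0.4, -1.0)]
print(labels)  # ['methylated', 'unmethylated', 'ambiguous', 'unmethylated']
```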
chr_ideogram_plot
Can only be used at genome scale and requires the reference FASTA file
fig = chr_ideogram_plot(all_cpg_df, ref_fasta_fn="./data/medaka.fa", n_len_bin=250)
display(kaleido.render_plotly_svg(fig, width=1200, height=700))
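Since n_len_bin is defined as the number of intervals for the longest chromosome, shorter chromosomes presumably get proportionally fewer intervals. The arithmetic can be sketched as follows, assuming a fixed bin length derived from the longest chromosome. ideogram_bins and the chromosome lengths are made up for illustration, and the actual binning in pycoMeth may differ.

```python
import math


def ideogram_bins(chrom_lengths, n_len_bin=500):
    """Return (bin_length, bins_per_chromosome) for a fixed-width binning.

    Hypothetical helper for illustration only, not part of pycoMeth.
    """
    longest = max(chrom_lengths.values())
    # The longest chromosome is split into n_len_bin intervals
    bin_len = math.ceil(longest / n_len_bin)
    # Other chromosomes reuse the same bin length; a partial last bin counts as one
    n_bins = {name: math.ceil(length / bin_len) for name, length in chrom_lengths.items()}
    return bin_len, n_bins


bin_len, n_bins = ideogram_bins({"chr1": 1000, "chr2": 500, "chr3": 120}, n_len_bin=10)
print(bin_len)  # 100
print(n_bins)   # {'chr1': 10, 'chr2': 5, 'chr3': 2}
```

Under this assumption, lowering n_len_bin (250 in the call above instead of the default 500) gives wider bins and a coarser ideogram.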
tss_dist_plot
Can only be used with the CpG island summary dataframe (all_summary_df)
fig = tss_dist_plot(all_summary_df)
display(kaleido.render_plotly_svg(fig, width=1200))
Example usage in report mode
Example with a single significant result
Comp_Report (
methcomp_fn = "./data/Yeast_CGI_meth_comp.tsv.gz",
gff3_fn = "./data/yeast.gff3",
ref_fasta_fn="./data/yeast.fa",
outdir = "yeast_html",
pvalue_threshold = 0.05,
verbose=True)
Usage with large dataset, including static plot export
Comp_Report (
methcomp_fn = "./data/Medaka_CGI_meth_comp.tsv.gz",
gff3_fn = "./data/medaka.gff3",
ref_fasta_fn="./data/medaka.fa",
outdir = "medaka_html",
export_static_plots=True,
report_non_significant=True,
n_top=50,
progress=True)
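After a run in report mode, it can be handy to check which HTML files were written to outdir. The sketch below uses only the standard library. list_html_reports is a hypothetical helper, and it assumes nothing about the report file names beyond the .html extension.

```python
from pathlib import Path


def list_html_reports(outdir):
    """Return all HTML files found under outdir (recursively), sorted by path.

    Hypothetical helper for illustration only, not part of pycoMeth.
    """
    return sorted(Path(outdir).rglob("*.html"))


# Example: list whatever reports exist in the output directory used above
for report in list_html_reports("medaka_html"):
    print(report)
```

If the directory does not exist yet, the helper simply returns an empty list.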