pycoQC API Usage

pycoQC is a simple class that is initialized with a text summary file generated by ONT Albacore or Guppy.

The instantiated object can then be called with various methods that generate tables and plots.

There are a few different ways to get help for all the public package functions:

In a separate window with the jupyter magic "?": ?pycoQC.channels_activity
In an output cell with the standard help function: help (pycoQC.channels_activity)
Inline, with the cursor on the function of interest, using Shift + Tab

Running pycoQC in Jupyter notebook

If you want to run pycoQC interactively in Jupyter you need to install Jupyter manually. If you installed pycoQC in a virtual environment then install Jupyter in the same virtual environment.

pip3 install notebook

Launch the notebook in a shell terminal

jupyter notebook

If it does not start automatically, open the following URL in your favorite web browser: http://localhost:8888/tree

From the Jupyter homepage you can navigate to the directory you want to work in and create a new Python3 Notebook.

Imports

For plotly offline plotting

Import pycoQC main class as well as Plotly and enable inline plotting in the current Notebook.

This is the recommended option. It ensures that all your data are stored inside the notebook.

The limitation is that generating many plots from large datasets makes the notebook heavy and slow.

# Run cell with Ctrl + Enter in Jupyter

# Import main pycoQC module
from pycoQC.pycoQC import pycoQC
from pycoQC.pycoQC_plot import pycoQC_plot

# Import helper functions from pycoQC
from pycoQC.common import jhelp

# Import and setup plotly for offline plotting in Jupyter 
from plotly.offline import init_notebook_mode
init_notebook_mode (connected=False)

For plotly online plotting

This option takes advantage of the Plotly web service for hosting graphs. It requires setting up an account (https://plot.ly/python/getting-started/#initialization-for-online-plotting) and providing credentials in the notebook. This can be a good option for easily sharing the interactive plots generated by pycoQC.

# Only run this cell if you have already set up a Plotly account and want to use the Plotly web service
# from plotly.plotly import plot, iplot
# import plotly.tools as pt
# pt.set_credentials_file (username="XXXXXXXXXX", api_key="XXXXXXXXXX")

Initialisation

Upon initialization, pycoQC reads the sequencing summary file, runs a series of tests and pre-processes the data for the plotting methods.

jhelp (pycoQC)

pycoQC (summary_file, barcode_file, bam_file, runid_list, filter_calibration, filter_duplicated, min_barcode_percent, min_pass_qual, min_pass_len, sample, html_outfile, report_title, config_file, template_file, json_outfile, skip_coverage_plot, verbose, quiet)

Parse Albacore sequencing_summary.txt file and clean-up the data

summary_file (required) [str]

Path to a sequencing_summary generated by Albacore 1.0.0 + (read_fast5_basecaller.py) / Guppy 2.1.3+ (guppy_basecaller). One can also pass multiple space separated file paths or a UNIX style regex matching multiple files

barcode_file (default: "") [str]

Path to the barcode_file generated by Guppy 2.1.3+ (guppy_barcoder) or Deepbinner 0.2.0+. This is not a required file. One can also pass multiple space separated file paths or a UNIX style regex matching multiple files

bam_file (default: "") [str]

Path to a Bam file corresponding to reads in the summary_file, preferably aligned with Minimap2. One can also pass multiple space separated file paths or a UNIX style regex matching multiple files

runid_list (default: []) [list]

Select only specific runids to be analysed. Can also be used to force pycoQC to order the runids for temporal plots, if the sequencing_summary file contains several successive runs. By default pycoQC analyses all the runids in the file and uses the runid order as defined in the file.

filter_calibration (default: False) [bool]

If True, reads flagged as calibration strand by the software are removed

filter_duplicated (default: False) [bool]

If True, duplicated read_ids are removed but the first occurrence is kept (Guppy sometimes outputs the same read multiple times)

min_barcode_percent (default: 0.1) [float]

Minimal percentage of total reads required to retain a barcode label. Below this threshold, the barcode value is set to unclassified.

min_pass_qual (default: 7) [float]

Minimum quality to consider a read as 'pass'

min_pass_len (default: 0) [int]

Minimum read length to consider a read as 'pass'

sample (default: 100000) [int]

If not None, n reads are randomly selected for the plotting functions instead of the entire dataset (deterministic sampling)

html_outfile (default: "") [str]

Path to an output html file report

report_title (default: PycoQC report) [str]

Title to use in the html report

config_file (default: "") [str]

Path to a JSON configuration file for the html report. If not provided, falls back to default parameters. The first level keys are the names of the plots to be included. The second level keys are the parameters to pass to each plotting function

template_file (default: "") [str]

Jinja2 html template for the html report

json_outfile (default: "") [str]

Path to an output json file report

skip_coverage_plot (default: False) [bool]

verbose (default: False) [bool]

Increase verbosity

quiet (default: False) [bool]

Reduce verbosity

Basic initialisation

p = pycoQC("./data/Albacore-1.7.0_basecall-1D-DNA_sequencing_summary.txt.gz")

Initialisation with modification of the "pass" threshold

p = pycoQC("./data/Albacore-1.7.0_basecall-1D-DNA_sequencing_summary.txt.gz", min_pass_len=200, min_pass_qual=8)
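The effect of the two "pass" thresholds can be sketched without pycoQC itself. The snippet below is only an illustration: the Read tuple and the is_pass helper are hypothetical names, and it assumes that both thresholds are inclusive lower bounds, which the parameter descriptions suggest but do not state.

```python
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical minimal read record (not a pycoQC type)."""
    read_id: str
    length: int
    mean_qscore: float


def is_pass(read: Read, min_pass_qual: float = 7, min_pass_len: int = 0) -> bool:
    # Assumption: a read passes when it meets or exceeds both thresholds
    return read.mean_qscore >= min_pass_qual and read.length >= min_pass_len


reads = [
    Read("read_a", 150, 9.5),   # long enough? no (150 < 200)
    Read("read_b", 1200, 6.2),  # quality too low (6.2 < 8)
    Read("read_c", 5400, 8.0),  # meets both thresholds
]

pass_ids = [r.read_id for r in reads if is_pass(r, min_pass_qual=8, min_pass_len=200)]
print(pass_ids)
```

With the thresholds used in the initialisation example above (min_pass_len=200, min_pass_qual=8), only the third read would be counted as "pass" under these assumptions.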

Initialisation with calibration strands filtered out

p = pycoQC("./data/Albacore-1.7.0_basecall-1D-DNA_sequencing_summary.txt.gz", filter_calibration=True)

Initialisation with summary file regex

p = pycoQC("./data/*RNA*")
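A pattern such as "./data/*RNA*" behaves like a shell wildcard. As a rough sketch of what such an expansion looks like, the snippet below uses Python's standard glob module on temporary files. The helper expand_summary_paths and the file names are hypothetical, and pycoQC's own expansion logic may differ.

```python
import glob
import os
import tempfile


def expand_summary_paths(spec: str) -> list:
    """Hypothetical helper: expand space separated paths or wildcard patterns.

    Returns a sorted list of matching files without duplicates.
    """
    paths = []
    for pattern in spec.split():
        paths.extend(glob.glob(pattern))
    return sorted(set(paths))


with tempfile.TemporaryDirectory() as tmp:
    # Create three empty placeholder files to match against
    for name in ("run1-RNA_summary.txt", "run2-RNA_summary.txt", "run3-DNA_summary.txt"):
        open(os.path.join(tmp, name), "w").close()

    pattern = os.path.join(tmp, "*RNA*")
    matched = [os.path.basename(p) for p in expand_summary_paths(pattern)]

print(matched)
```

Checking the expansion this way before initialisation can help confirm that a pattern only picks up the intended summary files.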

Initialisation with guppy barcoding file and verbose option

p = pycoQC(
    summary_file="./data/Guppy-2.1.3_basecall-1D-DNA_sequencing_summary.txt.gz",
    barcode_file="./data/Guppy-2.1.3_basecall-1D_DNA_barcoding_summary.txt.gz", verbose=True)
print(p)

Initialisation with Deepbinner barcoding file

p = pycoQC(
    summary_file="./data/Guppy-basecall-1D-DNA_sequencing_summary.txt.gz",
    barcode_file="./data/Guppy-basecall-1D-DNA_deepbinner_barcoding_summary.txt.gz")
print(p)

Initialisation with Bam files

p = pycoQC(
    summary_file="./large_data/sample_1_sequencing_summary.txt",
    bam_file="./large_data/sample_1.bam")
print(p)

Generating inline plots and tables

Interaction with Plotly library

Most pycoQC methods return a Figure object generated with plotly for Python. The Figure object can subsequently be used for:

Further customization using the numerous methods attached to the Figure object
Inline plotting in Jupyter Notebook using iplot (either from plotly.plotly or plotly.offline)
Generating a separate HTML file with plot (either from plotly.plotly or plotly.offline)
Exporting as a static image (https://plot.ly/python/static-image-export/), pdf (https://plot.ly/python/pdf-reports/) or various text formats.

Users can also customize the figures online in a user-friendly environment by clicking on "Edit in Chart Studio" in the upper right corner of each figure.

Similarly static pictures can be exported using the "Download plot as a png" button.

Common arguments

All the methods have the arguments width and height that can be used to customize the plotting area. In general we do not recommend modifying these values as they might disrupt the plot layout.

Most of the methods also have the argument sample. By default pycoQC downsamples the number of reads to 100,000 before plotting. This drastically reduces the processing time for large datasets and has a very limited impact on the appearance of the plots. The sampling is random but deterministic, meaning that you should always obtain the same results for the same dataset. The value can be changed to increase or decrease the number of reads. Alternatively, one can deactivate the behavior by specifying sample=False.
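"Random but deterministic" sampling usually means drawing from a pseudo-random generator with a fixed seed. The following sketch illustrates the idea with the standard library only. The function sample_reads and the seed value are assumptions made for illustration and do not reflect how pycoQC implements its sampling.

```python
import random


def sample_reads(read_ids, n, seed=42):
    """Return at most n read ids, drawn with a fixed-seed generator.

    A falsy n (None, False or 0) disables sampling and returns everything.
    """
    read_ids = list(read_ids)
    if not n or n >= len(read_ids):
        return read_ids
    rng = random.Random(seed)  # fixed seed: same input gives the same subset
    return rng.sample(read_ids, n)


read_ids = [f"read_{i}" for i in range(1000)]
first = sample_reads(read_ids, 100)
second = sample_reads(read_ids, 100)

print(len(first), first == second)
```

Because the generator is re-seeded on every call, repeated calls on the same dataset select the same reads, which is what makes plots reproducible between sessions.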

summary_stats_dict

summary_stats_dict is the only pycoQC public method that does not return a plotly Figure object. Instead it returns a dictionary containing information about the run options, read counts during the initialisation step and basic statistics for both pass and all reads.

On top of the overall results, users can also get data split by run_id or barcode.

The dictionary can easily be saved in a JSON, YAML or pickle file.

jhelp (pycoQC_plot.summary_stats_dict)

summary_stats_dict ()

Return a dictionary containing exhaustive information about the run.

p = pycoQC("./data/Albacore-1.7.0_basecall-1D-DNA_sequencing_summary.txt.gz", quiet=True)
print(p.summary_stats_dict())
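Saving the returned dictionary takes only a few lines with the standard library. In the sketch below, stats is a stand-in with invented keys and values, used so that the example runs without a summary file. In a real session it would be replaced by the result of p.summary_stats_dict().

```python
import json
import os
import pickle
import tempfile

# Stand-in for the dictionary returned by p.summary_stats_dict().
# The keys and values are invented for illustration only.
stats = {"example_section": {"reads": 1000, "bases": 5000000}}

with tempfile.TemporaryDirectory() as tmp:
    json_path = os.path.join(tmp, "summary_stats.json")
    pickle_path = os.path.join(tmp, "summary_stats.pkl")

    # Human readable text format
    with open(json_path, "w") as fh:
        json.dump(stats, fh, indent=2)

    # Binary format that preserves arbitrary Python objects
    with open(pickle_path, "wb") as fh:
        pickle.dump(stats, fh)

    # Read both files back to check the round trip
    with open(json_path) as fh:
        from_json = json.load(fh)
    with open(pickle_path, "rb") as fh:
        from_pickle = pickle.load(fh)

print(from_json == stats, from_pickle == stats)
```

If the real dictionary contains values that the json module cannot serialise directly, passing default=str to json.dump is one possible fallback, at the cost of turning those values into strings.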

summary table methods

pycoQC has 3 summary methods to generate simple summary tables with a clickable button to switch from "all reads" to "pass reads" only.

* run_summary: General information about the run
* basecall_summary: Basecalling related information
* alignment_summary: Alignment related information (when a bam file is provided)

run_summary

jhelp(pycoQC_plot.run_summary)

run_summary (width, height, plot_title)

Plot an interactive overall summary table

width (default: None) [int]

Width of the plotting area in pixels

height (default: 300) [int]

Height of the plotting area in pixels

plot_title (default: General run summary) [str]

Title to display on top of the plot

p = pycoQC(summary_file="./data/Guppy-2.2.4-basecall-1D-DNA_sequencing_summary+barcode.txt.gz", quiet=True)
p.run_summary()

basecall_summary

jhelp(pycoQC_plot.basecall_summary)

basecall_summary (width, height, plot_title)

Plot an interactive basecall summary table

width (default: None) [int]

Width of the plotting area in pixels

height (default: 300) [int]

Height of the plotting area in pixels

plot_title (default: Basecall summary) [str]

Title to display on top of the plot

p = pycoQC(summary_file="./large_data/sample_2_sequencing_summary.txt", quiet=True)
display(p.basecall_summary())

alignment_summary

jhelp(pycoQC_plot.alignment_summary)

alignment_summary (width, height, plot_title)

Plot an interactive alignment summary table

width (default: None) [int]

Width of the plotting area in pixels

height (default: 300) [int]

Height of the plotting area in pixels

plot_title (default: Alignment summary) [str]

Title to display on top of the plot

p = pycoQC(summary_file="./large_data/sample_2_sequencing_summary.txt", bam_file="./large_data/sample_2.bam", quiet=True)
display(p.alignment_summary())

Read Length and Mean quality distribution

pycoQC has 3 methods to visualize the distribution of mean quality scores and of estimated read length:

* read_len_1D: A density plot of estimated read length in logarithmic scale
* read_qual_1D: A density plot of mean quality scores
* read_len_read_qual_2D: A density contour plot of estimated read length vs mean quality scores in semilog scale

Although we recommend sticking to the default values, all 3 methods allow users to customize the plots.

* The number of bins used to divide the read quality and/or length space can be specified with nbins for the 1D plots and x_nbins / y_nbins for the 2D plot.
* The intensity of line smoothing (using a Gaussian kernel filter) can be specified with smooth_sigma.
* Additional cosmetic customizations are available: color / colorscale.
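To give an intuition of how nbins and smooth_sigma interact, here is a minimal pure-Python sketch of binning read lengths on a logarithmic scale and then smoothing the counts with a Gaussian kernel. The functions histogram and gaussian_smooth are hypothetical and written for illustration. pycoQC's actual implementation is not shown in this document and may rely on different libraries and edge handling.

```python
import math


def histogram(values, nbins, lo, hi):
    """Count values falling in nbins equal-width bins between lo and hi."""
    counts = [0] * nbins
    width = (hi - lo) / nbins
    for v in values:
        if lo <= v <= hi:
            idx = min(int((v - lo) / width), nbins - 1)
            counts[idx] += 1
    return counts


def gaussian_smooth(counts, sigma):
    """Smooth counts with a normalised Gaussian kernel of standard deviation sigma."""
    radius = int(4 * sigma + 0.5)
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    total = sum(kernel)
    kernel = [w / total for w in kernel]

    n = len(counts)
    smoothed = []
    for i in range(n):
        acc = 0.0
        for k, w in zip(offsets, kernel):
            j = min(max(i + k, 0), n - 1)  # clamp indices at the edges
            acc += w * counts[j]
        smoothed.append(acc)
    return smoothed


read_lengths = [200, 450, 500, 520, 1000, 1100, 5000, 5200, 20000]
log_lengths = [math.log10(x) for x in read_lengths]  # logarithmic x axis

counts = histogram(log_lengths, nbins=20, lo=2.0, hi=5.0)
smoothed = gaussian_smooth(counts, sigma=2)

print(counts)
print([round(s, 2) for s in smoothed])
```

More bins give a finer but noisier profile, while a larger sigma spreads each count over more neighbouring bins and flattens the peaks.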

jhelp(pycoQC_plot.read_len_1D)

read_len_1D (color, nbins, smooth_sigma, width, height, plot_title)

Plot a distribution of read length (log scale)

color (default: lightsteelblue) [str]

Color of the area (hex, rgb, rgba, hsl, hsv or any CSS named colors https://www.w3.org/TR/css-color-3/#svg-color

nbins (default: 200) [int]

Number of bins to divide the x axis in

smooth_sigma (default: 2) [float]

standard deviation for Gaussian kernel

width (default: None) [int]

Width of the plotting area in pixels

height (default: 500) [int]

Height of the plotting area in pixels

plot_title (default: Basecalled reads length) [str]

Title to display on top of the plot

p = pycoQC("./data/Guppy-2.1.3_basecall-1D-DNA_sequencing_summary.txt.gz", min_pass_len=200, min_pass_qual=7, quiet=True)
p.read_len_1D()

jhelp(pycoQC_plot.read_qual_1D)

read_qual_1D (color, nbins, smooth_sigma, width, height, plot_title)

Plot a distribution of quality scores

color (default: salmon) [str]

Color of the area (hex, rgb, rgba, hsl, hsv or any CSS named colors https://www.w3.org/TR/css-color-3/#svg-color

nbins (default: 200) [int]

Number of bins to divide the x axis in

smooth_sigma (default: 2) [float]

standard deviation for Gaussian kernel

width (default: None) [int]

Width of the plotting area in pixels

height (default: 500) [int]

Height of the plotting area in pixels

plot_title (default: Basecalled reads PHRED quality) [str]

Title to display on top of the plot

p = pycoQC("./data/Albacore-2.1.10_basecall-1D-RNA_sequencing_summary.txt.gz", min_pass_len=200, min_pass_qual=7, quiet=True)
p.read_qual_1D()

jhelp(pycoQC_plot.read_len_read_qual_2D)

read_len_read_qual_2D (colorscale, x_nbins, y_nbins, smooth_sigma, width, height, plot_title)

Plot a 2D distribution of quality scores vs length of the reads

colorscale (default: [[0.0, 'rgba(255,255,255,0)'], [0.1, 'rgba(255,150,0,0)'], [0.25, 'rgb(255,100,0)'], [0.5, 'rgb(200,0,0)'], [0.75, 'rgb(120,0,0)'], [1.0, 'rgb(70,0,0)']])

A valid plotly color scale https://plot.ly/python/colorscales/ (not recommended to change)

x_nbins (default: 200) [int]

Number of bins to divide the read length values in (x axis)

y_nbins (default: 100) [int]

Number of bins to divide the read quality values in (y axis)

smooth_sigma (default: 2) [float]

standard deviation for 2D Gaussian kernel

width (default: None) [int]

Width of the plotting area in pixels

height (default: 600) [int]

Height of the plotting area in pixels

plot_title (default: Basecalled reads length vs reads PHRED quality) [str]

Title to display on top of the plot

p = pycoQC("./data/*Albacore*DNA*", min_pass_len=200, min_pass_qual=7, quiet=True)
p.read_len_read_qual_2D ()

Sequencing output, quality and length over experiment time

pycoQC can generate plots showing the evolution of the sequencing output (output_over_time), the mean read quality (read_qual_over_time) and the read length (read_len_over_time) over the course of the sequencing run.

Please be aware that if there are multiple run IDs in the source file(s), pycoQC reorders the run IDs by decreasing throughput per second, as explained in Initialisation. This means that the over_time plots could be misleading, particularly when several runs are mixed together.

For read_qual_over_time and read_len_over_time, the argument smooth_sigma can be used to adjust the smoothing factor of the Gaussian filter if you are not satisfied with the default result.

The colors of the plots can be fully customised:

* cumulative_color and interval_color for output_over_time
* median_color, quartile_color and extreme_color for read_qual_over_time and read_len_over_time
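The median, quartile and extreme colors correspond to summary statistics computed per time bin. The sketch below shows one way such statistics could be derived with the standard library. The function quartiles_over_time and the sample values are hypothetical, and the quantile method used by pycoQC may differ from the default of statistics.quantiles.

```python
import statistics


def quartiles_over_time(times, values, time_bins):
    """Group values into equal-width time bins.

    Returns one (min, q1, median, q3, max) tuple per bin,
    or None for bins holding fewer than two values.
    """
    start, end = min(times), max(times)
    width = (end - start) / time_bins or 1.0  # avoid a zero-width bin
    bins = [[] for _ in range(time_bins)]
    for t, v in zip(times, values):
        idx = min(int((t - start) / width), time_bins - 1)
        bins[idx].append(v)

    summary = []
    for vals in bins:
        if len(vals) < 2:
            summary.append(None)
            continue
        q1, med, q3 = statistics.quantiles(vals, n=4)
        summary.append((min(vals), q1, med, q3, max(vals)))
    return summary


# Invented start times (seconds) and mean quality scores
times = [0, 10, 20, 30, 40, 50, 60, 70]
quals = [9, 10, 11, 12, 7, 8, 9, 10]

summary = quartiles_over_time(times, quals, time_bins=2)
for row in summary:
    print(row)
```

In a plot of this kind, the median would typically be drawn as a line, the q1 to q3 range as the inter-quartile area and the min to max range as the extreme area, before any smoothing is applied.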

jhelp(pycoQC_plot.output_over_time)

output_over_time (cumulative_color, interval_color, time_bins, width, height, plot_title)

Plot a yield over time

cumulative_color (default: rgb(204,226,255)) [str]

Color of cumulative yield area (hex, rgb, rgba, hsl, hsv or any CSS named colors https://www.w3.org/TR/css-color-3/#svg-color

interval_color (default: rgb(102,168,255)) [str]

Color of interval yield line (hex, rgb, rgba, hsl, hsv or any CSS named colors https://www.w3.org/TR/css-color-3/#svg-color

time_bins (default: 500) [int]

Number of bins to divide the time values in (x axis)

width (default: None) [int]

Width of the plotting area in pixels

height (default: 500) [int]

Height of the plotting area in pixels

plot_title (default: Output over experiment time) [str]

Title to display on top of the plot

p  = pycoQC ("./data/Albacore-1.2.1_basecall-1D-DNA_sequencing_summary.txt.gz", min_pass_len=200, min_pass_qual=7, quiet=True)
p.output_over_time ()

jhelp (pycoQC_plot.read_qual_over_time)

read_qual_over_time (median_color, quartile_color, extreme_color, smooth_sigma, time_bins, width, height, plot_title)

Plot a mean quality over time

median_color (default: rgb(250,128,114)) [str]

Color of median line color (hex, rgb, rgba, hsl, hsv or any CSS named colors https://www.w3.org/TR/css-color-3/#svg-color

quartile_color (default: rgb(250,170,160)) [str]

Color of inter quartile area and lines (hex, rgb, rgba, hsl, hsv or any CSS named colors https://www.w3.org/TR/css-color-3/#svg-color

extreme_color (default: rgba(250,170,160,0.5)) [str]

Color of inter extreme area and lines (hex, rgb, rgba, hsl, hsv or any CSS named colors https://www.w3.org/TR/css-color-3/#svg-col

smooth_sigma (default: 1) [float]

sigma parameter for the Gaussian filter line smoothing

time_bins (default: 500) [int]

Number of bins to divide the time values in (x axis)

width (default: None) [int]

Width of the plotting area in pixels

height (default: 500) [int]

Height of the plotting area in pixels

plot_title (default: Read quality over experiment time) [str]

Title to display on top of the plot

p  = pycoQC ("./data/Albacore-2.1.10_basecall-1D-DNA_sequencing_summary.txt.gz", min_pass_len=200, min_pass_qual=7, quiet=True)
p.read_qual_over_time ()

jhelp (pycoQC_plot.read_len_over_time)

read_len_over_time (median_color, quartile_color, extreme_color, smooth_sigma, time_bins, width, height, plot_title)

Plot a read length over time

median_color (default: rgb(102,168,255)) [str]

Color of median line color (hex, rgb, rgba, hsl, hsv or any CSS named colors https://www.w3.org/TR/css-color-3/#svg-color

quartile_color (default: rgb(153,197,255)) [str]

Color of inter quartile area and lines (hex, rgb, rgba, hsl, hsv or any CSS named colors https://www.w3.org/TR/css-color-3/#svg-color

extreme_color (default: rgba(153,197,255,0.5)) [str]

Color of inter extreme area and lines (hex, rgb, rgba, hsl, hsv or any CSS named colors https://www.w3.org/TR/css-color-3/#svg-col

smooth_sigma (default: 1) [float]

sigma parameter for the Gaussian filter line smoothing

time_bins (default: 500) [int]

Number of bins to divide the time values in (x axis)

width (default: None) [int]

Width of the plotting area in pixels

height (default: 500) [int]

Height of the plotting area in pixels

plot_title (default: Read length over experiment time) [str]

Title to display on top of the plot

p  = pycoQC ("./data/Albacore-2.1.10_basecall-1D-DNA_sequencing_summary.txt.gz", min_pass_len=200, min_pass_qual=7, quiet=True)
p.read_len_over_time ()

Barcode distribution

When barcoding information is available, it is possible to generate a pie chart of the barcode count distribution. If no barcode information is available pycoQC throws an error.

It is not rare to have irrelevant barcodes detected at a very low level. By default, any barcode representing less than 0.1% of the reads is excluded from the plot, but this can be changed with min_barcode_percent at initialisation.

Similar to the previously described methods colors are customisable with colors.
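The barcode threshold described above can be illustrated with a short sketch. The helper relabel_rare_barcodes and the barcode counts are hypothetical, and the comparison is assumed to be inclusive (a barcode exactly at the threshold keeps its label), which the documentation does not specify.

```python
from collections import Counter


def relabel_rare_barcodes(barcodes, min_barcode_percent=0.1):
    """Set barcodes below min_barcode_percent of all reads to 'unclassified'."""
    counts = Counter(barcodes)
    total = len(barcodes)
    keep = {
        bc for bc, n in counts.items()
        if 100 * n / total >= min_barcode_percent  # assumption: inclusive threshold
    }
    return [bc if bc in keep else "unclassified" for bc in barcodes]


# Invented example: barcode07 accounts for only 0.1% of 1000 reads
barcodes = ["barcode01"] * 600 + ["barcode02"] * 399 + ["barcode07"] * 1
relabelled = relabel_rare_barcodes(barcodes, min_barcode_percent=0.5)

print(Counter(relabelled))
```

Raising the threshold merges more low-abundance barcodes into the unclassified slice, which keeps the pie chart readable when many spurious barcodes are detected.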

jhelp(pycoQC_plot.barcode_counts)

barcode_counts (colors, width, height, plot_title)

Plot the percentage of reads per barcode

colors (default: ['#f8bc9c', '#f6e9a1', '#f5f8f2', '#92d9f5', '#4f97ba']) [list]

List of colors (hex, rgb, rgba, hsl, hsv or any CSS named colors https://www.w3.org/TR/css-color-3/#svg-color

width (default: None) [int]

Width of the plotting area in pixels

height (default: 500) [int]

Height of the plotting area in pixels

plot_title (default: Percentage of reads per barcode) [str]

Title to display on top of the plot

Albacore output example

p  = pycoQC ("./data/Albacore-1.7.0_basecall-1D-DNA_sequencing_summary.txt.gz", min_pass_len=200, min_pass_qual=7, quiet=True)
p.barcode_counts ()

Guppy 2.1 output example

p  = pycoQC (
    summary_file="./data/Guppy-2.1.3_basecall-1D-DNA_sequencing_summary.txt.gz",
    barcode_file="./data/Guppy-2.1.3_basecall-1D_DNA_barcoding_summary.txt.gz",
    quiet=True)
p.barcode_counts ()

Deepbinner output example

p  = pycoQC (
    summary_file="./data/Guppy-basecall-1D-DNA_sequencing_summary.txt.gz",
    barcode_file="./data/Guppy-basecall-1D-DNA_deepbinner_barcoding_summary.txt.gz",
    min_pass_len=200, 
    min_pass_qual=7,
    quiet=True)
p.barcode_counts ()

Channels activity over time

Although a view of the flowcell layout can be visually attractive (see https://github.com/mattloose/flowcellvis), it is not very informative about how the channels generate data during the run.

The channels_activity method generates a heatmap style plot showing the output over time per channel.

The number of channels can be changed to match MinION flowcells (512, the default) or PromethION flowcells (3000).

The argument smooth_sigma can be used to adjust the smoothing factor of the Gaussian smoothing filter.

Colors can be changed with colorscale.
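Conceptually, the heatmap is a matrix with one column per channel and one row per time bin. The sketch below builds such a matrix from a few invented reads. The function channel_activity_matrix, its arguments and the assumption that channels are numbered from 1 are illustrative only and are not taken from the pycoQC source.

```python
def channel_activity_matrix(reads, n_channels, time_bins, run_duration):
    """Sum the bases produced per channel (columns) and per time bin (rows).

    reads is an iterable of (channel, start_time, n_bases) tuples,
    with start_time expressed in the same unit as run_duration.
    """
    matrix = [[0] * n_channels for _ in range(time_bins)]
    bin_width = run_duration / time_bins
    for channel, start_time, n_bases in reads:
        row = min(int(start_time / bin_width), time_bins - 1)
        matrix[row][channel - 1] += n_bases  # assumption: channels start at 1
    return matrix


# Invented reads: (channel, start time in seconds, number of bases)
reads = [(1, 0.0, 500), (1, 50.0, 700), (512, 3599.0, 1200)]
matrix = channel_activity_matrix(reads, n_channels=512, time_bins=100, run_duration=3600)

print(matrix[0][0], matrix[1][0], matrix[99][511])
```

Columns that stay at zero for most rows correspond to channels that produced little or no data during the run.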

jhelp(pycoQC_plot.channels_activity)

channels_activity (colorscale, smooth_sigma, time_bins, width, height, plot_title)

Plot the output per channel over experiment time

colorscale (default: [[0.0, 'rgba(255,255,255,0)'], [0.01, 'rgb(255,255,200)'], [0.25, 'rgb(255,200,0)'], [0.5, 'rgb(200,0,0)'], [0.75, 'rgb(120,0,0)'], [1.0, 'rgb(0,0,0)']]) [list]

A valid plotly color scale https://plot.ly/python/colorscales/ (not recommended to change)

smooth_sigma (default: 1) [float]

sigma parameter for the Gaussian filter line smoothing

time_bins (default: 100) [int]

Number of bins to divide the time values in (y axis)

width (default: None) [int]

Width of the plotting area in pixels

height (default: 600) [int]

Height of the plotting area in pixels

plot_title (default: Output per channel over experiment time) [str]

Title to display on top of the plot

p  = pycoQC ("./data/Albacore-2.1.10_basecall-1D-DNA_sequencing_summary.txt.gz", quiet=True)
p.channels_activity ()

Alignment length and score distribution

From version 2.5+, pycoQC can also generate distribution plots for alignment length and score if a bam file corresponding to the summary file is provided. pycoQC has 5 methods related to alignment score and length:

align_len_1D: A density plot of (primary) alignment length in logarithmic scale
identity_freq_1D: A density plot of alignment score, corresponding to the inverse edit distance normalised by alignment length (note that your bam file needs )
align_len_identity_freq_2D: A density contour plot of alignment length vs alignment score in semi-log scale.
read_len_align_len_2D: A density contour plot of estimated basecalled read length vs the actual alignment length in log-log scale.
read_qual_identity_freq_2D: A density contour plot of read PHRED quality vs alignment score.

Non-mapped reads are not represented in those plots, and attempting to use the functions without bam file will trigger an error.
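The alignment score is described above as the inverse edit distance normalised by alignment length. One plausible reading of that definition, shown here purely as an assumption, is the fraction of aligned positions that are not edits. The function identity_freq below is a hypothetical illustration and not pycoQC's code.

```python
def identity_freq(edit_distance: int, align_len: int) -> float:
    """Assumed definition: share of aligned positions that are not edits."""
    if align_len <= 0:
        raise ValueError("align_len must be a positive integer")
    return (align_len - edit_distance) / align_len


# Invented example: 50 edits over a 1000 base alignment
score = identity_freq(edit_distance=50, align_len=1000)
print(score)
```

Under this reading, a perfect alignment scores 1 and the score decreases as mismatches, insertions and deletions accumulate.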

jhelp(pycoQC_plot.align_len_1D)

align_len_1D (color, nbins, smooth_sigma, width, height, plot_title)

Plot a distribution of alignment length (log scale)

color (default: mediumseagreen) [str]

Color of the area (hex, rgb, rgba, hsl, hsv or any CSS named colors https://www.w3.org/TR/css-color-3/#svg-color

nbins (default: 200) [int]

Number of bins to divide the x axis in

smooth_sigma (default: 2) [float]

standard deviation for Gaussian kernel

width (default: None) [int]

Width of the plotting area in pixels

height (default: 500) [int]

Height of the plotting area in pixels

plot_title (default: Aligned reads length) [str]

Title to display on top of the plot

p = pycoQC(summary_file="./large_data/sample_1_sequencing_summary.txt", bam_file="./large_data/sample_1.bam", quiet=True, min_pass_len=200, min_pass_qual=7)
p.align_len_1D()

jhelp(pycoQC_plot.identity_freq_1D)

identity_freq_1D (color, nbins, smooth_sigma, width, height, plot_title)

Plot a distribution of alignments identity

color (default: sandybrown) [str]

Color of the area (hex, rgb, rgba, hsl, hsv or any CSS named colors https://www.w3.org/TR/css-color-3/#svg-color

nbins (default: 200) [int]

Number of bins to divide the x axis in

smooth_sigma (default: 2) [float]

standard deviation for Gaussian kernel

width (default: None) [int]

Width of the plotting area in pixels

height (default: 500) [int]

Height of the plotting area in pixels

plot_title (default: Aligned reads identity) [str]

Title to display on top of the plot

p = pycoQC(summary_file="./large_data/sample_1_sequencing_summary.txt", bam_file="./large_data/sample_1.bam", quiet=True, min_pass_len=200, min_pass_qual=7)
p.identity_freq_1D()

jhelp(pycoQC_plot.align_len_identity_freq_2D)

align_len_identity_freq_2D (colorscale, x_nbins, y_nbins, smooth_sigma, width, height, plot_title)

Plot a 2D distribution of alignments length vs alignments identity

colorscale (default: [[0.0, 'rgba(255,255,255,0)'], [0.1, 'rgba(255,150,0,0)'], [0.25, 'rgb(255,100,0)'], [0.5, 'rgb(200,0,0)'], [0.75, 'rgb(120,0,0)'], [1.0, 'rgb(70,0,0)']])

A valid plotly color scale https://plot.ly/python/colorscales/ (not recommended to change)

x_nbins (default: 200) [int]

Number of bins to divide the read length values in (x axis)

y_nbins (default: 100) [int]

Number of bins to divide the read quality values in (y axis)

smooth_sigma (default: 2) [float]

standard deviation for 2D Gaussian kernel

width (default: None) [int]

Width of the plotting area in pixels

height (default: 600) [int]

Height of the plotting area in pixels

plot_title (default: Aligned reads length vs alignments identity) [str]

Title to display on top of the plot

p = pycoQC(summary_file="./large_data/sample_1_sequencing_summary.txt", bam_file="./large_data/sample_1.bam", quiet=True)
p.align_len_identity_freq_2D()

jhelp(pycoQC_plot.read_len_align_len_2D)

read_len_align_len_2D (colorscale, x_nbins, y_nbins, smooth_sigma, width, height, plot_title)

Plot a 2D distribution of length of the reads vs length of the alignments

colorscale (default: [[0.0, 'rgba(255,255,255,0)'], [0.1, 'rgba(255,150,0,0)'], [0.25, 'rgb(255,100,0)'], [0.5, 'rgb(200,0,0)'], [0.75, 'rgb(120,0,0)'], [1.0, 'rgb(70,0,0)']])

A valid plotly color scale https://plot.ly/python/colorscales/ (not recommended to change)

x_nbins (default: 200) [int]

Number of bins to divide the read length values in (x axis)

y_nbins (default: 100) [int]

Number of bins to divide the read quality values in (y axis)

smooth_sigma (default: 1) [float]

standard deviation for 2D Gaussian kernel

width (default: None) [int]

Width of the plotting area in pixels

height (default: 600) [int]

Height of the plotting area in pixels

plot_title (default: Basecalled reads length vs alignments length) [str]

Title to display on top of the plot

p = pycoQC(summary_file="./large_data/sample_1_sequencing_summary.txt", bam_file="./large_data/sample_1.bam", quiet=True)
p.read_len_align_len_2D()

jhelp(pycoQC_plot.read_qual_identity_freq_2D)

read_qual_identity_freq_2D (colorscale, x_nbins, y_nbins, smooth_sigma, width, height, plot_title)

Plot a 2D distribution of read quality vs alignments identity

colorscale (default: [[0.0, 'rgba(255,255,255,0)'], [0.1, 'rgba(255,150,0,0)'], [0.25, 'rgb(255,100,0)'], [0.5, 'rgb(200,0,0)'], [0.75, 'rgb(120,0,0)'], [1.0, 'rgb(70,0,0)']])

A valid plotly color scale https://plot.ly/python/colorscales/ (not recommended to change)

x_nbins (default: 200) [int]

Number of bins to divide the read length values in (x axis)

y_nbins (default: 100) [int]

Number of bins to divide the read quality values in (y axis)

smooth_sigma (default: 1) [float]

standard deviation for 2D Gaussian kernel

width (default: None) [int]

Width of the plotting area in pixels

height (default: 600) [int]

Height of the plotting area in pixels

plot_title (default: Reads PHRED quality vs alignments identity) [str]

Title to display on top of the plot

p = pycoQC(summary_file="./large_data/sample_1_sequencing_summary.txt", bam_file="./large_data/sample_1.bam", quiet=True, min_pass_len=200, min_pass_qual=7)
p.read_qual_identity_freq_2D()

jhelp(pycoQC_plot.align_len_over_time)

align_len_over_time (median_color, quartile_color, extreme_color, smooth_sigma, time_bins, width, height, plot_title)

Plot aligned read length over time

median_color (default: rgb(102,168,255)) [str]

Color of median line color (hex, rgb, rgba, hsl, hsv or any CSS named colors https://www.w3.org/TR/css-color-3/#svg-color

quartile_color (default: rgb(153,197,255)) [str]

Color of inter quartile area and lines (hex, rgb, rgba, hsl, hsv or any CSS named colors https://www.w3.org/TR/css-color-3/#svg-color

extreme_color (default: rgba(153,197,255,0.5)) [str]

Color of inter extreme area and lines (hex, rgb, rgba, hsl, hsv or any CSS named colors https://www.w3.org/TR/css-color-3/#svg-col

smooth_sigma (default: 1) [float]

sigma parameter for the Gaussian filter line smoothing

time_bins (default: 500) [int]

Number of bins to divide the time values in (x axis)

width (default: None) [int]

Width of the plotting area in pixels

height (default: 500) [int]

Height of the plotting area in pixels

plot_title (default: Aligned reads length over experiment time) [str]

Title to display on top of the plot

p = pycoQC(summary_file="./large_data/sample_1_sequencing_summary.txt", bam_file="./large_data/sample_1.bam", quiet=True, min_pass_len=200, min_pass_qual=7)
p.align_len_over_time()

jhelp(pycoQC_plot.identity_freq_over_time)

identity_freq_over_time (median_color, quartile_color, extreme_color, smooth_sigma, time_bins, width, height, plot_title)

Plot the alignment identity scores over time

median_color (default: rgb(250,128,114)) [str]

Color of median line color (hex, rgb, rgba, hsl, hsv or any CSS named colors https://www.w3.org/TR/css-color-3/#svg-color

quartile_color (default: rgb(250,170,160)) [str]

Color of inter quartile area and lines (hex, rgb, rgba, hsl, hsv or any CSS named colors https://www.w3.org/TR/css-color-3/#svg-color

extreme_color (default: rgba(250,170,160,0.5)) [str]

Color of inter extreme area and lines (hex, rgb, rgba, hsl, hsv or any CSS named colors https://www.w3.org/TR/css-color-3/#svg-col

smooth_sigma (default: 1) [float]

sigma parameter for the Gaussian filter line smoothing

time_bins (default: 500) [int]

Number of bins to divide the time values in (x axis)

width (default: None) [int]

Width of the plotting area in pixels

height (default: 500) [int]

Height of the plotting area in pixels

plot_title (default: Aligned reads identity over experiment time) [str]

Title to display on top of the plot

p = pycoQC(summary_file="./large_data/sample_1_sequencing_summary.txt", bam_file="./large_data/sample_1.bam", quiet=True, min_pass_len=200, min_pass_qual=7)
p.identity_freq_over_time()

Alignments summaries and coverage

If a bam file is provided, pycoQC can generate 3 alignment-specific plots:

* alignment_rate: Generates a table and a Sankey diagram of the number of bases basecalled, mapped, matching...
* alignment_reads_status: Generates a table and a pie plot indicating primary, secondary, supplementary and unmapped reads
* alignment_coverage: Generates a simple coverage overview plot over the whole target genome.

jhelp(pycoQC_plot.alignment_rate)

alignment_rate (colors, width, height, plot_title)

Plot a basic alignment summary

colors (default: ['#fcaf94', '#828282', '#fc8161', '#828282', '#f44f39', '#d52221', '#828282', '#828282', '#828282', '#828282']) [list]

List of colors (hex, rgb, rgba, hsl, hsv or any CSS named colors https://www.w3.org/TR/css-color-3/#svg-color

width (default: None) [int]

Width of the plotting area in pixels

height (default: 600) [int]

Height of the plotting area in pixels

plot_title (default: Bases alignment rate) [str]

Title to display on top of the plot

p = pycoQC(summary_file="./large_data/sample_2_sequencing_summary.txt", bam_file="./large_data/sample_2.bam", quiet=True)
p.alignment_rate()

jhelp(pycoQC_plot.alignment_reads_status)

alignment_reads_status (colors, width, height, plot_title)

Plot a basic alignment summary

colors (default: ['#f44f39', '#fc8161', '#fcaf94', '#828282']) [list]

List of colors (hex, rgb, rgba, hsl, hsv or any CSS named colors https://www.w3.org/TR/css-color-3/#svg-color

width (default: None) [int]

Width of the plotting area in pixels

height (default: 500) [int]

Height of the plotting area in pixels

plot_title (default: Summary of reads alignment status) [str]

Title to display on top of the plot

p = pycoQC(summary_file="./large_data/sample_2_sequencing_summary.txt", bam_file="./large_data/sample_2.bam", quiet=True)
p.alignment_reads_status()

jhelp(pycoQC_plot.alignment_coverage)

alignment_coverage (nbins, color, smooth_sigma, width, height, plot_title)

Plot coverage over all the references

nbins (default: 500) [int]

Number of bins to divide the coverage into.

color (default: rgba(70,130,180,0.70)) [str]

smooth_sigma (default: 1) [int]

sigma parameter for the Gaussian filter line smoothing

width (default: None) [int]

Width of the plotting area in pixels

height (default: 500) [int]

Height of the plotting area in pixels

plot_title (default: Coverage overview) [str]

Title to display on top of the plot

p = pycoQC(summary_file="./large_data/sample_2_sequencing_summary.txt", bam_file="./large_data/sample_2.bam", quiet=True)
p.alignment_coverage()

Generating HTML and json reports

From version 2.3+ the PycoQC API can also generate an interactive HTML report and a text JSON report similar to the CLI functionality.

p = pycoQC(
    summary_file="./large_data/sample_2_sequencing_summary.txt",
    bam_file="./large_data/sample_2.bam",
    html_outfile="./results/sample_2.html",
    json_outfile="./results/sample_2.json",
    skip_coverage_plot=True)

print (p)
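The config_file option described in Initialisation expects a JSON file whose first level keys are plot names and whose second level keys are the parameters passed to each plotting function. A minimal sketch of writing such a file is shown below. The selection of plots and parameter values is only an example built from the method signatures documented above, and whether pycoQC accepts exactly this layout is an assumption based on that parameter description.

```python
import json
import os
import tempfile

# First level: names of the plots to include in the report
# Second level: parameters passed to each plotting function
config = {
    "read_len_1D": {"nbins": 100, "smooth_sigma": 1},
    "read_qual_1D": {"color": "salmon"},
    "output_over_time": {"time_bins": 250},
}

with tempfile.TemporaryDirectory() as tmp:
    config_path = os.path.join(tmp, "pycoQC_config.json")  # hypothetical file name

    with open(config_path, "w") as fh:
        json.dump(config, fh, indent=4)

    # Read the file back to check that it is valid JSON
    with open(config_path) as fh:
        loaded = json.load(fh)

print(sorted(loaded))
```

The resulting path would then be passed as config_file when initialising pycoQC together with html_outfile.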