Using pycoQC

User interface

PycoQC was designed to be used either through a Python application programming interface (API) inside a Jupyter notebook or through a command line interface (CLI).

Jupyter API

The Jupyter Notebook is a fantastic tool that can be used in many different ways, in particular to share your analyses in an interactive environment with other people.

One distinctive feature of pycoQC is its rich Python API, meant to be used directly inside a Jupyter notebook. The pycoQC API for Jupyter is very flexible and allows you to explore your nanopore data interactively and in more depth than with the command line interface.

An online live version of the usage notebook, served by MyBinder, is also available to help you become familiar with the package API.

Shell CLI

In addition to the Jupyter interface, pycoQC also comes with a command line interface that can generate a beautiful HTML-formatted report containing interactive D3.js plots.

Input files and options

Calibration strand reads

Depending on the run type and the version of Albacore/Guppy, calibration read information might not be available. For example, these reads were not flagged in early versions of Albacore and are no longer flagged in Guppy. By default calibration reads are kept, but they can be discarded using the filter_calibration option.

Minimal "pass" quality and length

By default, pycoQC assumes that the minimal mean quality for a "pass" read is 7 (the same as the default Albacore/Guppy value). If needed, a different value can be specified at initialisation with min_pass_qual.

In addition, a minimal length in bases for a "pass" read can be defined using min_pass_len. It is set to 0 by default, but 200 is a sensible value.
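
The effect of the two thresholds can be illustrated with a minimal sketch. This is not pycoQC's own code: the function is_pass_read and the toy read records are hypothetical, and the use of inclusive comparisons (>=) is an assumption made for the illustration.

```python
# Hypothetical helper illustrating the two "pass" thresholds.
# Inclusive comparisons (>=) are an assumption, not confirmed pycoQC behaviour.
def is_pass_read(mean_qscore, length, min_pass_qual=7, min_pass_len=0):
    """Return True when a read meets both the quality and length thresholds."""
    return mean_qscore >= min_pass_qual and length >= min_pass_len


# Toy reads, invented for the example
reads = [
    {"read_id": "r1", "mean_qscore": 9.2, "length": 1500},
    {"read_id": "r2", "mean_qscore": 6.1, "length": 3000},
    {"read_id": "r3", "mean_qscore": 8.0, "length": 150},
]

# Defaults: quality >= 7, no length restriction
default_pass = [
    r["read_id"] for r in reads if is_pass_read(r["mean_qscore"], r["length"])
]

# Same quality threshold, plus a minimal length of 200 bases
strict_pass = [
    r["read_id"]
    for r in reads
    if is_pass_read(r["mean_qscore"], r["length"], min_pass_len=200)
]
```

With the defaults, only the low-quality read r2 fails; raising min_pass_len to 200 also excludes the short read r3.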

Summary sequencing file

PycoQC needs a text summary file generated by ONT Albacore or Guppy. For a 1D run, use the file named sequencing_summary.txt available at the root of the Albacore/Guppy output directory. For a 1D2 run, use the sequencing_1dsq_summary.txt file that can be found in the 1dsq_analysis directory. The run type is automatically detected from the file.

PycoQC can read compressed sequencing_summary.txt files (‘gzip’, ‘bz2’, ‘zip’, ‘xz’). Instead of a single file, it is also possible to pass a UNIX-style regex to match multiple files.
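
As a rough illustration of what reading several, possibly compressed, summary files involves, here is a minimal standard-library sketch. It is not how pycoQC is implemented: the helpers open_summary and iter_summary_lines are hypothetical, the pattern is expanded with glob-style wildcards as an assumption, and zip archives are left out for brevity.

```python
import bz2
import glob
import gzip
import lzma

# Map file suffixes to the matching standard-library opener.
# Zip archives are deliberately not handled in this sketch.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def open_summary(path):
    """Open a summary file in text mode, decompressing based on its suffix."""
    for suffix, opener in OPENERS.items():
        if path.endswith(suffix):
            return opener(path, "rt")
    return open(path, "rt")


def iter_summary_lines(pattern):
    """Yield every line of every file matching a wildcard pattern.

    Files are visited in sorted order. Each file's header line is yielded
    as-is, so callers need to skip repeated headers themselves.
    """
    for path in sorted(glob.glob(pattern)):
        with open_summary(path) as handle:
            for line in handle:
                yield line.rstrip("\n")
```

A call such as iter_summary_lines("run_*/sequencing_summary.txt*") would then walk through plain and compressed files alike; the directory layout in that pattern is only an example.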

Depending on the run type and the version of Albacore used, some information might not be available. In particular, calibration reads were not flagged in early versions of Albacore. When the field is available, those reads can be discarded with the filter_calibration option. Similarly, barcode information is only available in multiplexed runs.

PycoQC requires the following fields in the sequencing summary file:

  • 1D run => read_id, run_id, channel, start_time, sequence_length_template, mean_qscore_template
  • 1D2 run => read_id, run_id, channel, start_time, sequence_length_2d, mean_qscore_2d
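
The field requirements above can be checked against a file header before running an analysis. The sketch below is an illustration only, not pycoQC's detection logic: the function detect_run_type is hypothetical, it assumes a tab-separated header, and it tests the 1D field set before the 1D2 one.

```python
# Required columns per run type, taken from the list above
REQUIRED_FIELDS = {
    "1D": {
        "read_id", "run_id", "channel", "start_time",
        "sequence_length_template", "mean_qscore_template",
    },
    "1D2": {
        "read_id", "run_id", "channel", "start_time",
        "sequence_length_2d", "mean_qscore_2d",
    },
}


def detect_run_type(header_line, sep="\t"):
    """Return "1D" or "1D2" depending on which required field set is present.

    Hypothetical helper: assumes a tab-separated header and checks 1D first.
    Raises ValueError when neither field set is complete.
    """
    columns = set(header_line.rstrip("\n").split(sep))
    for run_type, required in REQUIRED_FIELDS.items():
        if required <= columns:  # all required fields are present
            return run_type
    missing = {
        run_type: sorted(required - columns)
        for run_type, required in REQUIRED_FIELDS.items()
    }
    raise ValueError(f"Header matches no known run type; missing fields: {missing}")
```

Extra columns in the header are ignored, so the check stays valid for summary files that carry additional fields such as barcodes.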

Barcoded datasets

Barcode information is only available in multiplexed runs. For Albacore, it is contained directly in the sequencing summary file and is automatically fetched when available. For Guppy, barcode identification is now done after basecalling with a separate program (guppy_barcoder), which generates a barcoding_summary.txt file. PycoQC can read this file (barcode_file option) and merge the barcode information with the sequencing summary data. By default, any barcode found in less than 0.1% of the reads is automatically considered "Unclassified". This reduces the "noise" due to randomly attributed low-frequency barcodes. The threshold can be changed using min_barcode_percent.
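
The low-frequency barcode rule can be sketched in a few lines. This is an illustration under stated assumptions rather than pycoQC's implementation: the function relabel_rare_barcodes is hypothetical, the comparison is strict (a barcode at exactly the threshold is kept), and the replacement label "unclassified" is an assumed spelling.

```python
from collections import Counter


def relabel_rare_barcodes(barcodes, min_barcode_percent=0.1, label="unclassified"):
    """Replace barcodes seen in fewer than min_barcode_percent % of reads.

    Hypothetical helper: `barcodes` holds one barcode name per read.
    """
    counts = Counter(barcodes)
    total = len(barcodes)
    # Barcodes whose share of all reads falls strictly below the threshold
    rare = {
        bc for bc, n in counts.items() if 100 * n / total < min_barcode_percent
    }
    return [label if bc in rare else bc for bc in barcodes]


# Toy data: barcode02 is assigned to 1 read out of 2000 (0.05%)
calls = ["barcode01"] * 1999 + ["barcode02"]
cleaned = Counter(relabel_rare_barcodes(calls))
```

In this toy dataset the single barcode02 read falls below the 0.1% default and is folded into the unclassified group, while barcode01 is untouched.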

BAM files

Since version 2.5, pycoQC can also integrate alignment information from a BAM file corresponding to a sequencing summary file. To do so, use the bam_file option. Providing a BAM file allows pycoQC to generate 8 additional plots. To get the most out of the alignment QC, it is recommended to use an aligner that generates either an "NM" or an "MD" tag, such as Minimap2.

Example files

The pycoQC repository contains several example sequencing summary files generated with various versions of Albacore and Guppy. Each of those files contains only 10,000 reads. In addition to these summary files, for some versions of Guppy the barcode information was stored in a separate barcoding summary file. The same applies to barcode information obtained with Deepbinner. Example files can be found directly in the repository at the following address: https://github.com/a-slide/pycoQC/tree/master/docs/pycoQC/data

Larger versions of some of these files are also available from https://www.ebi.ac.uk/~aleg/data/pycoQC_test/