pycoQC CLI Usage

The pycoQC CLI generates an HTML report containing interactive D3.js plots. It can also write summary information to a JSON file that is easy to parse with third-party tools.

The report is dynamically generated depending on the information available in the summary file.

CLI Usage

Activate virtual environment

# conda is used here, but any other virtual environment manager works too
conda activate pycoQC

Getting help

pycoQC -h
usage: pycoQC [-h] [--version]
              [--summary_file [SUMMARY_FILE [SUMMARY_FILE ...]]]
              [--barcode_file [BARCODE_FILE [BARCODE_FILE ...]]]
              [--bam_file [BAM_FILE [BAM_FILE ...]]]
              [--html_outfile HTML_OUTFILE] [--json_outfile JSON_OUTFILE]
              [--min_pass_qual MIN_PASS_QUAL] [--min_pass_len MIN_PASS_LEN]
              [--filter_calibration] [--filter_duplicated]
              [--min_barcode_percent MIN_BARCODE_PERCENT]
              [--report_title REPORT_TITLE] [--template_file TEMPLATE_FILE]
              [--config_file CONFIG_FILE] [--skip_coverage_plot]
              [--sample SAMPLE] [--default_config] [-v | -q]

pycoQC computes metrics and generates interactive QC plots from the sequencing summary
report generated by Oxford Nanopore technologies basecallers

* Minimal usage
    pycoQC -f sequencing_summary.txt -o pycoQC_output.html
* Including Guppy barcoding file + html output + json output
    pycoQC -f sequencing_summary.txt -b barcoding_sequencing.txt -o pycoQC_output.html -j pycoQC_output.json
* Including Bam file + html output
    pycoQC -f sequencing_summary.txt -a alignment.bam -o pycoQC_output.html

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -v, --verbose         Increase verbosity
  -q, --quiet           Reduce verbosity

Input/output options:
  --summary_file [SUMMARY_FILE [SUMMARY_FILE ...]], -f [SUMMARY_FILE [SUMMARY_FILE ...]]
                        Path to a sequencing_summary generated by Albacore
                        1.0.0 + (read_fast5_basecaller.py) / Guppy 2.1.3+
                        (guppy_basecaller). One can also pass multiple space
                        separated file paths or a UNIX style regex matching
                        multiple files (Required)
  --barcode_file [BARCODE_FILE [BARCODE_FILE ...]], -b [BARCODE_FILE [BARCODE_FILE ...]]
                        Path to the barcode_file generated by Guppy 2.1.3+
                        (guppy_barcoder) or Deepbinner 0.2.0+. This is not a
                        required file. One can also pass multiple space
                        separated file paths or a UNIX style regex matching
                        multiple files (optional)
  --bam_file [BAM_FILE [BAM_FILE ...]], -a [BAM_FILE [BAM_FILE ...]]
                        Path to a Bam file corresponding to reads in the
                        summary_file. Preferably aligned with Minimap2 One can
                        also pass multiple space separated file paths or a
                        UNIX style regex matching multiple files (optional)
  --html_outfile HTML_OUTFILE, -o HTML_OUTFILE
                        Path to an output html file report (required if
                        json_outfile not given)
  --json_outfile JSON_OUTFILE, -j JSON_OUTFILE
                        Path to an output json file report (required if
                        html_outfile not given)

Filtering options:
  --min_pass_qual MIN_PASS_QUAL
                        Minimum quality to consider a read as 'pass' (default:
                        7)
  --min_pass_len MIN_PASS_LEN
                        Minimum read length to consider a read as 'pass'
                        (default: 0)
  --filter_calibration  If given, reads flagged as calibration strand by the
                        basecaller are removed (default: False)
  --filter_duplicated   If given, duplicated read_ids are removed but the
                        first occurence is kept (Guppy sometimes outputs the
                        same read multiple times) (default: False)
  --min_barcode_percent MIN_BARCODE_PERCENT
                        Minimal percent of total reads to retain barcode
                        label. If below, the barcode value is set as
                        `unclassified` (default: 0.1)

HTML report options:
  --report_title REPORT_TITLE
                        Title to use in the html report (default: PycoQC
                        report)
  --template_file TEMPLATE_FILE
                        Jinja2 html template for the html report (default: )
  --config_file CONFIG_FILE
                        Path to a JSON configuration file for the html report.
                        If not provided, looks for it in ~/.pycoQC and
                        ~/.config/pycoQC/config. If it's still not found,
                        falls back to default parameters. The first level keys
                        are the names of the plots to be included. The second
                        level keys are the parameters to pass to each plotting
                        function (default: )")
  --skip_coverage_plot  Skip the coverage plot in HTML report. Useful when
                        using a reference file containing many sequences, i.e.
                        transcriptome (default: False)

Other options:
  --sample SAMPLE       If not None a n number of reads will be randomly
                        selected instead of the entire dataset for ploting
                        function (deterministic sampling) (default: 100000)
  --default_config, -d  Print default configuration file. Can be used to
                        generate a template JSON file (default: False)

Usage examples

Basic usage (quiet mode)

pycoQC \
    -f ./data/Albacore-1.2.1_basecall-1D-DNA_sequencing_summary.txt.gz \
    -o ./results/Albacore-1.2.1_basecall-1D-DNA.html \
    --quiet
Checking arguments values
Check input data files
Parse data files
Merge data
Cleaning data
Loading plotting interface

JSON data output in addition to the HTML report

A JSON report can be generated in addition to (or instead of) the HTML report.

It contains a summarized version of the data collected by pycoQC in a structured, easy-to-parse format.

pycoQC \
    -f ./data/Guppy-2.1.3_basecall-1D-RNA_sequencing_summary.txt.gz \
    -o ./results/Guppy-2.1.3_basecall-1D_RNA.html \
    -j ./results/Guppy-2.1.3_basecall-1D_RNA.json \
    --quiet
Checking arguments values
Check input data files
Parse data files
Merge data
Cleaning data
Loading plotting interface
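
As a minimal sketch of how the JSON report could be consumed from Python: the internal layout of the report is not described here, so the example only loads the file and lists its top-level keys. The helper names `load_report` and `top_level_sections` are hypothetical, and the path is the one used in the command above.

```python
import json
from pathlib import Path


def load_report(json_path):
    """Load a pycoQC JSON report and return it as a plain Python object."""
    with Path(json_path).open(encoding="utf-8") as handle:
        return json.load(handle)


def top_level_sections(report):
    """Return the sorted top-level keys of the report ([] if it is not a dict)."""
    return sorted(report) if isinstance(report, dict) else []


if __name__ == "__main__":
    # Path taken from the example command; adjust to your own output file.
    report_path = Path("./results/Guppy-2.1.3_basecall-1D_RNA.json")
    if report_path.exists():
        print(top_level_sections(load_report(report_path)))
    else:
        print(f"{report_path} not found; run pycoQC with -j first")
```

Listing the keys first is a cautious way to explore the file before relying on any particular field.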

Including Guppy barcoding information

pycoQC \
    -f ./data/Guppy-2.1.3_basecall-1D-DNA_sequencing_summary.txt.gz \
    -b ./data/Guppy-2.1.3_basecall-1D_DNA_barcoding_summary.txt.gz \
    -o ./results/Guppy-2.1.3_basecall-1D_DNA_barcode.html \
    --quiet

Matching multiple files with a wildcard pattern and adding a title to the report

pycoQC \
    -f ./data/Albacore*RNA* \
    -o ./results/Albacore_all_RNA.html \
    --report_title "All RNA runs" \
    --quiet
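
To preview which files a pattern such as `Albacore*RNA*` would select, the matching can be approximated in Python with the standard `fnmatch` module. This is only an illustration of wildcard matching on file names, not pycoQC's own file discovery code. The helper `match_pattern` is hypothetical, and the last file name in the list is made up for the example.

```python
from fnmatch import fnmatchcase


def match_pattern(file_names, pattern):
    """Return the file names matching a shell-style wildcard pattern, sorted."""
    return sorted(name for name in file_names if fnmatchcase(name, pattern))


candidates = [
    "Albacore-1.2.1_basecall-1D-DNA_sequencing_summary.txt.gz",
    "Albacore-2.1.10_basecall-1D-DNA_sequencing_summary.txt.gz",
    "Guppy-2.1.3_basecall-1D-RNA_sequencing_summary.txt.gz",
    # Hypothetical file name, added only so that the pattern matches something.
    "Albacore-1.0.0_basecall-1D-RNA_sequencing_summary.txt.gz",
]

# Only names that start with "Albacore" and contain "RNA" are kept.
selected = match_pattern(candidates, "Albacore*RNA*")
print(selected)
```

On real data, `glob.glob("./data/Albacore*RNA*")` applies the same kind of pattern directly to the file system.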

Tweak filtering parameters

  • Count reads with a quality of at least 8 and a length of at least 200 bases as "pass"
  • Discard reads flagged as calibration strand by the basecaller
  • Relabel any barcode found in less than 10% of the reads as `unclassified`
pycoQC \
    -f ./data/Albacore-2.1.10_basecall-1D-DNA_sequencing_summary.txt.gz \
    -o ./results/Albacore-2.1.10_basecall-1D-DNA.html \
    --min_pass_qual 8 \
    --min_pass_len 200 \
    --filter_calibration \
    --min_barcode_percent 10 \
    --quiet
Checking arguments values
Check input data files
Parse data files
Merge data
Cleaning data
Loading plotting interface
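
The effect of these filtering options can be sketched in a few lines of Python. This illustrates the logic described in the help text and is not pycoQC's actual implementation. The function names are hypothetical, and treating the thresholds as inclusive is an assumption.

```python
from collections import Counter


def is_pass(mean_qscore, read_len, min_pass_qual=7, min_pass_len=0):
    """Classify a read as 'pass'.

    Assumption: both thresholds are inclusive minimums.
    """
    return mean_qscore >= min_pass_qual and read_len >= min_pass_len


def relabel_rare_barcodes(barcodes, min_barcode_percent=0.1):
    """Replace barcodes seen in too few reads by 'unclassified'.

    Assumption: a barcode is kept when its share of all reads (in percent)
    is at least min_barcode_percent.
    """
    counts = Counter(barcodes)
    total = len(barcodes)
    kept = {
        barcode
        for barcode, count in counts.items()
        if 100 * count / total >= min_barcode_percent
    }
    return [bc if bc in kept else "unclassified" for bc in barcodes]


# 19 reads carry BC01 and a single read (5% of the total) carries BC02.
labels = relabel_rare_barcodes(["BC01"] * 19 + ["BC02"], min_barcode_percent=10)
print(Counter(labels))
print(is_pass(8.5, 150, min_pass_qual=8, min_pass_len=200))
```

With the thresholds from the command above, a read of quality 8.5 but only 150 bases is not counted as "pass", and the rare barcode BC02 is relabelled.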

Including alignment information from a BAM file

pycoQC \
    -f ./large_data/sample_1_sequencing_summary.txt \
    -a ./large_data/sample_1.bam \
    -o ./results/Guppy-2.3_basecall-1D_alignment-DNA.html \
    -j ./results/Guppy-2.3_basecall-1D_alignment-DNA.json \
    --quiet
Checking arguments values
Check input data files
Parse data files
Merge data
Cleaning data
Loading plotting interface

Advanced configuration with a custom JSON file

Although we recommend sticking to the default parameters, a JSON-formatted configuration file can be provided to tweak the plots. A default configuration file can be generated using:

pycoQC --default_config
{
  "run_summary": {
    "plot_title": "General run summary"
  },
  "basecall_summary": {
    "plot_title": "Basecall summary"
  },
  "alignment_summary": {
    "plot_title": "Alignment summary"
  },
  "read_len_1D": {
    "plot_title": "Basecalled reads length",
    "color": "lightsteelblue",
    "nbins": 200,
    "smooth_sigma": 2
  },
  "align_len_1D": {
    "plot_title": "Aligned reads length",
    "color": "mediumseagreen",
    "nbins": 200,
    "smooth_sigma": 2
  },
  "read_qual_1D": {
    "plot_title": "Basecalled reads PHRED quality",
    "color": "salmon",
    "nbins": 200,
    "smooth_sigma": 2
  },
  "identity_freq_1D": {
    "plot_title": "Aligned reads identity",
    "color": "sandybrown",
    "nbins": 200,
    "smooth_sigma": 2
  },
  "read_len_read_qual_2D": {
    "plot_title": "Basecalled reads length vs reads PHRED quality",
    "x_nbins": 200,
    "y_nbins": 100,
    "smooth_sigma": 2
  },
  "read_len_align_len_2D": {
    "plot_title": "Basecalled reads length vs alignments length",
    "x_nbins": 200,
    "y_nbins": 100,
    "smooth_sigma": 1
  },
  "align_len_identity_freq_2D": {
    "plot_title": "Aligned reads length vs alignments identity",
    "x_nbins": 200,
    "y_nbins": 100,
    "smooth_sigma": 2
  },
  "read_qual_identity_freq_2D": {
    "plot_title": "Reads PHRED quality vs alignments identity",
    "x_nbins": 200,
    "y_nbins": 100,
    "smooth_sigma": 1
  },
  "output_over_time": {
    "plot_title": "Output over experiment time",
    "cumulative_color": "rgb(204,226,255)",
    "interval_color": "rgb(102,168,255)"
  },
  "read_len_over_time": {
    "plot_title": "Read length over experiment time",
    "median_color": "rgb(102,168,255)",
    "quartile_color": "rgb(153,197,255)",
    "extreme_color": "rgba(153,197,255,0.5)",
    "smooth_sigma": 1
  },
  "read_qual_over_time": {
    "plot_title": "Read quality over experiment time",
    "median_color": "rgb(250,128,114)",
    "quartile_color": "rgb(250,170,160)",
    "extreme_color": "rgba(250,170,160,0.5)",
    "smooth_sigma": 1
  },
  "align_len_over_time": {
    "plot_title": "Aligned reads length over experiment time",
    "median_color": "rgb(102,168,255)",
    "quartile_color": "rgb(153,197,255)",
    "extreme_color": "rgba(153,197,255,0.5)",
    "smooth_sigma": 1
  },
  "identity_freq_over_time": {
    "plot_title": "Aligned reads identity over experiment time",
    "median_color": "rgb(250,128,114)",
    "quartile_color": "rgb(250,170,160)",
    "extreme_color": "rgba(250,170,160,0.5)",
    "smooth_sigma": 1
  },
  "barcode_counts": {
    "plot_title": "Number of reads per barcode",
    "colors": [
      "#f8bc9c",
      "#f6e9a1",
      "#f5f8f2",
      "#92d9f5",
      "#4f97ba"
    ]
  },
  "channels_activity": {
    "plot_title": "Channel activity over time",
    "smooth_sigma": 1
  },
  "alignment_reads_status": {
    "plot_title": "Summary of reads alignment",
    "colors": [
      "#f8bc9c",
      "#f6e9a1",
      "#92d9f5",
      "#4f97ba",
      "#f5f8f2"
    ]
  },
  "alignment_rate": {
    "plot_title": "Bases alignment rate",
    "colors": [
      "#fcaf94",
      "#828282",
      "#fc8161",
      "#828282",
      "#f44f39",
      "#d52221",
      "#828282",
      "#828282",
      "#828282",
      "#828282"
    ]
  },
  "alignment_coverage": {
    "plot_title": "Coverage overview",
    "nbins": 500,
    "color": "rgba(102,168,255,0.75)",
    "smooth_sigma": 1
  }
}

To save and edit it, redirect the standard output to a file and make your changes using your favorite text editor.

To remove a plot from the report, just remove it (or comment it out) from the configuration file.

The configuration file accepts all the arguments of the target plotting functions. For more information, refer to the API documentation.

pycoQC --default_config > data/pycoQC_config.json
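
Instead of editing the file by hand, the same changes could be scripted. The sketch below works on a small subset of the default configuration shown above. It drops one plot and overrides one parameter, returning a new dictionary that can be written back with `json.dump`. The helper `customise_config` is hypothetical and not part of pycoQC.

```python
import json


def customise_config(config, drop_plots=(), overrides=None):
    """Return a copy of config without drop_plots and with overrides applied.

    First-level keys are plot names; second-level keys are plot parameters.
    """
    new_config = {
        name: dict(params)
        for name, params in config.items()
        if name not in drop_plots
    }
    for plot_name, params in (overrides or {}).items():
        if plot_name in new_config:
            new_config[plot_name].update(params)
    return new_config


# Subset of the default configuration printed by `pycoQC --default_config`.
default_subset = {
    "run_summary": {"plot_title": "General run summary"},
    "read_len_1D": {
        "plot_title": "Basecalled reads length",
        "color": "lightsteelblue",
        "nbins": 200,
        "smooth_sigma": 2,
    },
    "channels_activity": {
        "plot_title": "Channel activity over time",
        "smooth_sigma": 1,
    },
}

custom = customise_config(
    default_subset,
    drop_plots={"channels_activity"},
    overrides={"read_len_1D": {"nbins": 100}},
)
print(json.dumps(custom, indent=2))
```

Working on a copy keeps the default values available for comparison.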

Run pycoQC with the --config_file option

pycoQC \
    -f ./data/Albacore-1.7.0_basecall-1D-DNA_sequencing_summary.txt.gz \
    -o ./results/Albacore-1.7.0_basecall-1D-DNA.html \
    --config_file ./data/pycoQC_config.json \
    --quiet
Checking arguments values
Check input data files
Parse data files
Merge data
Cleaning data
Loading plotting interface
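
When pycoQC is called from a pipeline, it can help to assemble the command line programmatically. The sketch below builds an argument list using only long options listed in the help output above. The function `build_command` is a hypothetical helper, and the check that at least one output file is given mirrors the "required if ... not given" notes in the help text.

```python
import shlex


def build_command(summary_files, html_outfile=None, json_outfile=None,
                  config_file=None, quiet=False):
    """Assemble a pycoQC argument list from Python values."""
    if not summary_files:
        raise ValueError("at least one summary file is required")
    if html_outfile is None and json_outfile is None:
        raise ValueError("give html_outfile, json_outfile, or both")
    cmd = ["pycoQC", "--summary_file", *summary_files]
    if html_outfile is not None:
        cmd += ["--html_outfile", html_outfile]
    if json_outfile is not None:
        cmd += ["--json_outfile", json_outfile]
    if config_file is not None:
        cmd += ["--config_file", config_file]
    if quiet:
        cmd.append("--quiet")
    return cmd


command = build_command(
    ["./data/Albacore-1.7.0_basecall-1D-DNA_sequencing_summary.txt.gz"],
    html_outfile="./results/Albacore-1.7.0_basecall-1D-DNA.html",
    config_file="./data/pycoQC_config.json",
    quiet=True,
)
# Print the command in a form that can be pasted into a shell.
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` in an environment where pycoQC is installed.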