Fast5_to_seq_summary API Usage
Running Jupyter notebook
If you want to run pycoQC interactively in Jupyter you need to install Jupyter manually. If you installed pycoQC in a virtual environment then install Jupyter in the same virtual environment.
pip3 install notebook
Launch the notebook in a shell terminal
jupyter notebook
If it does not auto-start, open the following URL in your favorite web browser: http://localhost:8888/tree
From the Jupyter homepage, you can navigate to the directory you want to work in and create a new Python3 notebook.
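Because the notebook kernel has to run in the same environment as pycoQC, a quick check from the first notebook cell can confirm that both packages are visible to the running interpreter. The following is a minimal sketch using only the standard library; `find_missing` is a hypothetical helper name, not part of pycoQC or Jupyter.

```python
import importlib.util
import sys


def find_missing(packages):
    """Return the top-level package names that this interpreter cannot find."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


# Path of the interpreter running this kernel
# (it should point into your virtual environment, if you use one)
print(sys.executable)

# An empty list means both packages are installed in the current environment
print(find_missing(["pycoQC", "notebook"]))
```

If the list is not empty, install the missing package with pip3 in the environment shown by `sys.executable` and restart the kernel.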
Imports
# Run cell with Ctrl + Enter
# Import main pycoQC module
from pycoQC.Fast5_to_seq_summary import Fast5_to_seq_summary
# Import helper functions from pycoQC
from pycoQC.common import jhelp, head
Running Fast5_to_seq_summary
jhelp(Fast5_to_seq_summary)
Fast5_to_seq_summary (fast5_dir, seq_summary_fn, max_fast5, threads, basecall_id, verbose_level, include_path, fields)
Create a summary file akin to the one generated by Albacore or Guppy from a directory containing multiple fast5 files. The script will attempt to extract all the required fields but will not raise an error if a field is not found.
- fast5_dir (required) [str]
Directory containing fast5 files. Can contain multiple subdirectories
- seq_summary_fn (required) [str]
Path of the sequencing summary file in which to write the data extracted from the fast5 files
- max_fast5 (default: 0) [int]
Maximum number of files to try to parse. 0 to deactivate
- threads (default: 4) [int]
Total number of threads to use. 1 thread is used for the reader and 1 for the writer. Minimum 3 (default = 4)
- basecall_id (default: 0) [int]
ID of the basecalling group. Leave it at 0 by default, but if you performed multiple basecalling runs on the same fast5 files, this can be used to indicate the corresponding group (1, 2 ...)
- verbose_level (default: 0) [int]
Level of verbosity, from 2 (Chatty) to 0 (Nothing)
- include_path (default: False) [bool]
If True the absolute path to the corresponding file is added in an extra column
- fields (default: ['read_id', 'run_id', 'channel', 'start_time', 'sequence_length_template', 'mean_qscore_template', 'calibration_strand_genome_template', 'barcode_arrangement']) [list]
List of field names corresponding to attributes to try to fetch from the fast5 files. Valid field names: mean_qscore_template, sequence_length_template, called_events, skip_prob, stay_prob, step_prob, strand_score, read_id, start_time, duration, start_mux, read_number, channel, channel_digitisation, channel_offset, channel_range, channel_sampling, run_id, sample_id, device_id, protocol_run, flow_cell, calibration_strand, calibration_strand, calibration_strand, calibration_strand, barcode_arrangement, barcode_full, barcode_score
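Since a field that is not found does not raise an error, it can be useful to check a custom `fields` list for typos before launching a long run. The sketch below is one way to do that; `VALID_FIELDS` and `unknown_fields` are hypothetical names, and the set is copied by hand from the help text above, so it is an assumption rather than pycoQC's own list.

```python
# Field names copied from the help text above. The help text lists
# "calibration_strand" four times, so only the variant that appears in the
# default `fields` value is included here (assumption).
VALID_FIELDS = {
    "mean_qscore_template", "sequence_length_template", "called_events",
    "skip_prob", "stay_prob", "step_prob", "strand_score", "read_id",
    "start_time", "duration", "start_mux", "read_number", "channel",
    "channel_digitisation", "channel_offset", "channel_range",
    "channel_sampling", "run_id", "sample_id", "device_id", "protocol_run",
    "flow_cell", "calibration_strand_genome_template", "barcode_arrangement",
    "barcode_full", "barcode_score",
}


def unknown_fields(fields):
    """Return the requested field names that are not in VALID_FIELDS, in order."""
    return [name for name in fields if name not in VALID_FIELDS]


requested = ["mean_qscore_template", "called_events", "duration", "strand_score"]
print(unknown_fields(requested))  # prints []
```

A non-empty result points at names worth double-checking against the output of `jhelp(Fast5_to_seq_summary)` before passing them as `fields`.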
Basic usage
This minimal call creates a basic summary file compatible with pycoQC.
Fast5_to_seq_summary (
    fast5_dir="./data/",
    seq_summary_fn="./results/summary_sequencing.tsv",
    verbose_level=1)
head ("./results/summary_sequencing.tsv")
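To check the result without the `head` helper, the summary can also be read with Python's standard `csv` module. The sketch below assumes the file is tab-separated with a header row, as the `.tsv` extension suggests; `summarize` is a hypothetical helper, not part of pycoQC, and the example rows are made up for illustration.

```python
import csv


def summarize(lines):
    """Return (column names, number of data rows) for tab-separated text with a header."""
    reader = csv.DictReader(lines, delimiter="\t")
    n_rows = sum(1 for _ in reader)
    return reader.fieldnames, n_rows


# Made-up rows standing in for a real summary file
example = ["read_id\tchannel\n", "read_1\t12\n", "read_2\t48\n"]
print(summarize(example))  # prints (['read_id', 'channel'], 2)

# With the file written above (assumed to exist):
# with open("./results/summary_sequencing.tsv", newline="") as handle:
#     columns, n_reads = summarize(handle)
```

Comparing the returned column names with the requested `fields` is a quick way to see which attributes were actually written.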
Multi-threading support
Fast5_to_seq_summary (
    fast5_dir="./data/",
    seq_summary_fn="./results/summary_sequencing.tsv",
    verbose_level=1,
    threads=10)
head ("./results/summary_sequencing.tsv")
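The help text states that one thread is used for the reader and one for the writer, with a minimum of 3 in total. A minimal sketch of choosing a `threads` value from the machine's CPU count while respecting that minimum could look like this; `pick_threads` is a hypothetical helper, and falling back to 4 when the CPU count is unknown simply mirrors the documented default.

```python
import os

# Minimum stated in the help text: 1 reader + 1 writer,
# presumably leaving at least 1 thread for processing
MIN_THREADS = 3


def pick_threads(requested=None):
    """Return a thread count of at least MIN_THREADS.

    If `requested` is None, use the number of CPUs reported by the OS,
    or 4 (the documented default) when that number is unavailable.
    """
    if requested is None:
        requested = os.cpu_count() or 4
    return max(MIN_THREADS, requested)


print(pick_threads(1))   # prints 3
print(pick_threads(10))  # prints 10
```

The result can then be passed as `threads=pick_threads()` in the call to `Fast5_to_seq_summary`.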
Customize fields of the summary file
Fast5_to_seq_summary (
    fast5_dir="./data/",
    seq_summary_fn="./results/custom_summary_sequencing.tsv",
    threads=6,
    verbose_level=1,
    fields=["mean_qscore_template", "called_events", "duration", "strand_score"])
head ("./results/custom_summary_sequencing.tsv")
Add file path
Fast5_to_seq_summary (
    fast5_dir="./data/",
    seq_summary_fn="./results/fn_summary_sequencing.tsv",
    threads=6,
    verbose_level=1,
    include_path=True)
head ("./results/fn_summary_sequencing.tsv")