Barcode_split CLI Usage

Activate virtual environment

# virtualenvwrapper is used here; a Conda environment works as well
workon pycoQC
(pycoQC) 

Getting help

Barcode_split -h
usage: Barcode_split [-h] [--version] --summary_file
                     [SUMMARY_FILE [SUMMARY_FILE ...]]
                     [--barcode_file [BARCODE_FILE [BARCODE_FILE ...]]]
                     [--output_dir OUTPUT_DIR] [--output_unclassified]
                     [--min_barcode_percent MIN_BARCODE_PERCENT] [-v | -q]

Barcode_split is a simple tool to split sequencing summary report in per
barcodes

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  --summary_file [SUMMARY_FILE [SUMMARY_FILE ...]], -f [SUMMARY_FILE [SUMMARY_FILE ...]]
                        Path to a sequencing_summary generated by Albacore
                        1.0.0 + (read_fast5_basecaller.py) / Guppy 2.1.3+
                        (guppy_basecaller). One can also pass multiple space
                        separated file paths or a UNIX style regex matching
                        multiple files
  --barcode_file [BARCODE_FILE [BARCODE_FILE ...]], -b [BARCODE_FILE [BARCODE_FILE ...]]
                        Path to the barcode_file generated by Guppy 2.1.3+
                        (guppy_barcoder) or Deepbinner 0.2.0+. One can also
                        pass multiple space separated file paths or a UNIX
                        style regex matching multiple files
  --output_dir OUTPUT_DIR, -o OUTPUT_DIR
                        Folder where to output split barcode data (default:
                        current dir
  --output_unclassified, -u
                        If given, unclassified barcodes are also written in a
                        file. By default they are skiped
  --min_barcode_percent MIN_BARCODE_PERCENT, -p MIN_BARCODE_PERCENT
                        Minimal percent of total reads to retain barcode
                        label. If below, the barcode value is set as
                        `unclassified` (default: 0.1)
  -v, --verbose         Increase verbosity
  -q, --quiet           Reduce verbosity
(pycoQC) 
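
The help text for `--summary_file` and `--barcode_file` mentions a "UNIX style regex matching multiple files". The sketch below shows one way such an argument list could be expanded. It assumes the phrase means shell-style wildcards (globbing) rather than true regular expressions, and `expand_paths` is a hypothetical helper written for illustration, not a function taken from pycoQC.

```python
import glob


def expand_paths(patterns):
    """Expand paths or shell-style wildcard patterns into a sorted list
    of existing files, without duplicates."""
    found = set()
    for pattern in patterns:
        # glob.glob only returns paths that exist; a plain path without
        # wildcards is returned unchanged when the file is present
        found.update(glob.glob(pattern))
    return sorted(found)
```

Under this reading, quoting a wildcard pattern on the command line passes it to the tool unexpanded, while leaving it unquoted lets the shell expand it into space-separated paths, which the options also accept.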

Usage examples

Basic usage

Barcode_split \
    -f './data/Guppy-2.2.4-basecall-1D-DNA_sequencing_summary+barcode.txt.gz' \
    -o "./results/"
Import data from sequencing summary file(s) and cleanup
    Read files and import in a dataframe
Check input data files
Parse data files
Merge data
    Cleanup missing barcodes values
    Cleaning up low frequency barcodes
Split data per barcode
    Processing data for Barcode barcode02
    Processing data for Barcode barcode07
    Processing data for Barcode barcode08
    Processing data for Barcode barcode09
    Processing data for Barcode barcode10
    Processing data for Barcode barcode11
    Processing data for Barcode barcode12
    Processing data for Barcode unclassified
Barcode Counts
              Counts  Write
barcode02          2  False
barcode07          1  False
barcode08         30  False
barcode09       9945   True
barcode10      12644   True
barcode11      13594   True
barcode12       9813   True
unclassified    3971  False
(pycoQC) 
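
The `Write` column above is consistent with the default options: the run contains 50,000 reads, so the default `--min_barcode_percent` of 0.1 corresponds to 50 reads, and `unclassified` is skipped because `-u` was not given. The sketch below reproduces that column from the counts. It is an illustration only: `write_flags` is a hypothetical function, and the use of `>=` at the threshold is an assumption, since the help text does not say how a barcode exactly at the limit is handled.

```python
def write_flags(counts, min_barcode_percent=0.1, output_unclassified=False):
    """Return {barcode: bool} telling whether each barcode would be written."""
    total = sum(counts.values())
    flags = {}
    for barcode, n in counts.items():
        if barcode == "unclassified":
            # Unclassified reads are only written when explicitly requested
            flags[barcode] = output_unclassified
        else:
            # Assumption: a barcode is kept when its share of all reads
            # reaches the threshold (>=); the exact comparison is not documented
            flags[barcode] = 100 * n / total >= min_barcode_percent
    return flags


# Counts copied from the "Barcode Counts" table above (50,000 reads in total)
counts = {
    "barcode02": 2,
    "barcode07": 1,
    "barcode08": 30,
    "barcode09": 9945,
    "barcode10": 12644,
    "barcode11": 13594,
    "barcode12": 9813,
    "unclassified": 3971,
}
flags = write_flags(counts)
```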

With externally provided barcodes

Barcode_split \
    -f "./data/Guppy-basecall-1D-DNA_sequencing_summary.txt.gz" \
    -b "./data/Guppy-basecall-1D-DNA_deepbinner_barcoding_summary.txt.gz" \
    -o "./results/" \
    -v
General info
    package_name: pycoQC
    package_version: 2.5.0.17
    timestamp: 2020-01-09 16:57:57.525774

Runtime options
    quiet: False
    verbose: True
    min_barcode_percent: 0.1
    output_unclassified: False
    output_dir: ./results/
    barcode_file: ['./data/Guppy-basecall-1D-DNA_deepbinner_barcoding_summary.txt.gz']
    summary_file: ['./data/Guppy-basecall-1D-DNA_sequencing_summary.txt.gz']

Import data from sequencing summary file(s) and cleanup
    Read files and import in a dataframe
Check input data files
        Sequencing summary files found: ./data/Guppy-basecall-1D-DNA_sequencing_summary.txt.gz
        Barcode files found: ./data/Guppy-basecall-1D-DNA_deepbinner_barcoding_summary.txt.gz
Parse data files
    Parse summary files
        3,999 reads found in initial file
    Parse barcode files
        Found valid Deepbinner barcode file
        3,775 reads with barcodes assigned
Merge data
    Cleanup missing barcodes values
    Cleaning up low frequency barcodes
Split data per barcode
    Processing data for Barcode 1
    Processing data for Barcode 2
    Processing data for Barcode 3
    Processing data for Barcode 4
    Processing data for Barcode 5
    Processing data for Barcode 6
    Processing data for Barcode 7
    Processing data for Barcode 8
    Processing data for Barcode unclassified
Barcode Counts
              Counts  Write
1                534   True
2                206   True
3                562   True
4                579   True
5                590   True
6                655   True
7                271   True
8                378   True
unclassified     224  False
(pycoQC)
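
The "Merge data" and "Cleanup missing barcodes values" steps in the log above can be pictured as a left join of the barcode calls onto the sequencing summary, with reads that have no call labelled `unclassified`. The sketch below shows that idea with plain dictionaries. It is not pycoQC's implementation: `assign_barcodes` and the read identifiers are hypothetical, and matching on a shared read ID is an assumption.

```python
from collections import Counter


def assign_barcodes(read_ids, barcode_calls):
    """Map every read to its barcode, falling back to 'unclassified'.

    read_ids: iterable of read identifiers from the sequencing summary.
    barcode_calls: dict {read_id: barcode} built from the barcode file.
    """
    return {rid: barcode_calls.get(rid, "unclassified") for rid in read_ids}


# Toy data (hypothetical read IDs): 5 reads, 3 of them with a barcode call
reads = ["r1", "r2", "r3", "r4", "r5"]
calls = {"r1": "1", "r2": "2", "r4": "1"}
merged = assign_barcodes(reads, calls)
counts = Counter(merged.values())
```

The numbers in the run above fit this picture: 3,999 reads in the summary minus 3,775 reads with barcodes assigned leaves 224, which matches the `unclassified` row of the table.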