Fastq CLI Usage

Activate virtual environment

# Using Conda here but can also be done with virtualenvwrapper
conda activate pyBioTools


filter_reads

Get help

pyBioTools Fastq Filter -h

usage: pyBioTools Fastq Filter [-h] -i [INPUT_FN [INPUT_FN ...]] -o OUTPUT_FN
                               [-l MIN_LEN] [-u MIN_QUAL] [-r]
                               [-f QUAL_OFFSET] [-v] [-q] [--progress]

Filter fastq reads based on their length, mean quality and the presence of
duplicates. Can also be used to concatenate reads from multiple files in a
single one.

optional arguments:
  -h, --help            show this help message and exit
  -i [INPUT_FN [INPUT_FN ...]], --input_fn [INPUT_FN [INPUT_FN ...]]
                        Fastq file path or directory containing fastq files or
                        list of files, or regex or list of regex. It is quite
                        flexible. (required) [str]
  -o OUTPUT_FN, --output_fn OUTPUT_FN
                        Destination fastq file. Automatically gzipped if the
                        .gz extension is found (required) [str]
  -l MIN_LEN, --min_len MIN_LEN
                        Minimal reads length (default: None) [int]
  -u MIN_QUAL, --min_qual MIN_QUAL
                        Minimal mean read PHRED quality (default: None)
                        [float]
  -r, --remove_duplicates
                        If true duplicated reads with the same read id are
                        discarded (default: False) [None]
  -f QUAL_OFFSET, --qual_offset QUAL_OFFSET
                        Quality scoring system off set. Nowadays pretty much
                        everyone uses +33 (default: 33) [int]
  -v, --verbose         Increase verbosity (default: False)
  -q, --quiet           Reduce verbosity (default: False)
  --progress            Display a progress bar
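
The three filters described in the help text can be sketched in plain Python. This is only an illustration, not pyBioTools' actual implementation: the function names mean_phred and keep_read are hypothetical, and both the arithmetic averaging of per-base PHRED scores and the order of the checks are assumptions.

```python
def mean_phred(qual_str, qual_offset=33):
    """Arithmetic mean of the per-base PHRED scores in an ASCII quality string.

    Assumption: "mean quality" is a plain average of the decoded scores.
    """
    if not qual_str:
        return 0.0
    return sum(ord(c) - qual_offset for c in qual_str) / len(qual_str)


def keep_read(read_id, seq, qual_str, seen_ids, min_len=None, min_qual=None,
              remove_duplicates=False, qual_offset=33):
    """Return True if a read passes the length, quality and duplicate checks.

    Hypothetical helper: seen_ids is a set of read ids that were already kept.
    """
    # Length filter, skipped when min_len is None
    if min_len is not None and len(seq) < min_len:
        return False
    # Mean quality filter, skipped when min_qual is None
    if min_qual is not None and mean_phred(qual_str, qual_offset) < min_qual:
        return False
    # Duplicate filter based on the read id only
    if remove_duplicates:
        if read_id in seen_ids:
            return False
        seen_ids.add(read_id)
    return True
```

With the default offset of 33, the quality character I (ASCII 73) decodes to a PHRED score of 40, so mean_phred("IIII") is 40.0.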

Basic usage

pyBioTools Fastq Filter -i ./data/sample_1.fastq -o ./output/sample_1_filtered.fastq --min_len 100 --min_qual 7 --remove_duplicates --verbose

## Running Fastq Filter ##
    Parsing reads
    [DEBUG]: Reading file ./data/sample_1.fastq
    [DEBUG]: End of file ./data/sample_1.fastq
    Read counts summary
     total_reads: 12,000
     valid_reads: 10,882
     low_qual_reads: 643
     short_reads: 474
     source files: 1
     duplicate_reads: 1
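
In this run the valid, low quality, short and duplicate counts add up to total_reads, which suggests that each read is counted under a single category. A quick check in Python, with the numbers copied from the summary above (the dictionary is only for illustration):

```python
# Counts copied from the "Read counts summary" of the basic usage run
summary = {
    "valid_reads": 10_882,
    "low_qual_reads": 643,
    "short_reads": 474,
    "duplicate_reads": 1,
}
total_reads = 12_000

# The four categories together account for every input read
assert sum(summary.values()) == total_reads

valid_fraction = summary["valid_reads"] / total_reads
print(f"{valid_fraction:.1%} of reads passed the filters")  # 90.7%
```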

Filter all fastq files from a directory and write to a compressed fastq file

pyBioTools Fastq Filter -i ./data/ -o ./output/sample_1_filtered.fastq.gz --min_len 100 --min_qual 7 --verbose

## Running Fastq Filter ##
    Parsing reads
    [DEBUG]: Reading file ./data/sample_1.fastq
    [DEBUG]: End of file ./data/sample_1.fastq
    [DEBUG]: Reading file ./data/sample_2.fastq
    [DEBUG]: End of file ./data/sample_2.fastq
    Read counts summary
     total_reads: 24,000
     valid_reads: 21,809
     low_qual_reads: 1,304
     short_reads: 887
     source files: 2
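
To check the result independently, the records in the output file can be counted with the Python standard library. A minimal sketch, assuming the usual layout of 4 lines per fastq record; count_fastq_reads is a hypothetical helper, not part of pyBioTools:

```python
import gzip


def count_fastq_reads(path):
    """Count the records in a plain or gzipped fastq file.

    Assumes 4 lines per record (header, sequence, separator, quality).
    """
    # Pick the opener from the file extension, as the CLI does for its output
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as fh:
        n_lines = sum(1 for _ in fh)
    return n_lines // 4
```

Calling count_fastq_reads("./output/sample_1_filtered.fastq.gz") after the run above would be expected to return the valid_reads value reported in the summary.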