Fastq API Usage
Import package
from pyBioTools import Fastq
from pyBioTools.common import jhelp
Filter
jhelp(Fastq.Filter)
Filter (input_fn, output_fn, min_len, min_qual, remove_duplicates, qual_offset, verbose, quiet, progress, kwargs)
Filter fastq reads based on their length, mean quality and the presence of duplicates. Can also be used to concatenate reads from multiple files into a single file.
- input_fn (required) [list(str)]
Path to a fastq file, a directory containing fastq files, a list of files, a regex, or a list of regex.
- output_fn (required) [str]
Destination fastq file. Automatically gzipped if the path ends with the .gz extension.
- min_len (default: None) [int]
Minimum read length
- min_qual (default: None) [float]
Minimum mean PHRED quality of a read
- remove_duplicates (default: False) [bool]
If True, duplicated reads sharing the same read id are discarded
- qual_offset (default: 33) [int]
Quality scoring system offset. Most current sequencing data uses +33
- verbose (default: False) [bool]
- quiet (default: False) [bool]
- progress (default: False) [bool]
- kwargs
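To make the min_len, min_qual and qual_offset options more concrete, here is a minimal sketch of how such a filter could be evaluated for a single read. It is not the pyBioTools implementation: the helper names mean_phred and passes_filters are hypothetical, and it assumes the mean quality is the arithmetic mean of the per-base PHRED scores and that both thresholds are inclusive.

```python
def mean_phred(quality_string, qual_offset=33):
    """Arithmetic mean of the per-base PHRED scores of a fastq quality line.

    Assumption: each character encodes one score as ord(char) - qual_offset.
    """
    if not quality_string:
        return 0.0
    scores = [ord(char) - qual_offset for char in quality_string]
    return sum(scores) / len(scores)


def passes_filters(sequence, quality_string, min_len=None, min_qual=None, qual_offset=33):
    """Return True if a read satisfies the length and quality thresholds.

    Hypothetical helper: a threshold left as None is not applied, and both
    thresholds are treated as inclusive (an assumption).
    """
    if min_len is not None and len(sequence) < min_len:
        return False
    if min_qual is not None and mean_phred(quality_string, qual_offset) < min_qual:
        return False
    return True


# Example: "I" encodes a PHRED score of 40 with the +33 offset
print(mean_phred("IIII"))
print(passes_filters("ACGT", "IIII", min_len=4, min_qual=30))
```

With the +33 offset, "!" is the lowest possible score (0), so a read whose quality line is made only of "!" characters would fail any positive min_qual threshold.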
Basic usage
Fastq.Filter("./data/sample_1.fastq", "./output/sample_1_filtered.fastq", min_len=100, min_qual=7, remove_duplicates=True, verbose=True)
Filter all fastq files from a directory and write the result to a compressed fastq file
Fastq.Filter("./data/", "./output/sample_1_filtered.fastq.gz", min_len=100, min_qual=7, verbose=True)
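A quick way to sanity-check the output of a filtering run is to count the reads in the resulting file. The sketch below uses only the Python standard library, and count_fastq_reads is a hypothetical helper, not part of pyBioTools. It assumes the common four-lines-per-record fastq layout (no wrapped sequence or quality lines) and picks the gzip reader based on the .gz extension.

```python
import gzip


def count_fastq_reads(path):
    """Count the records in a plain or gzipped fastq file.

    Assumption: every record spans exactly 4 non-empty lines.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for line in handle if line.strip())
    return n_lines // 4
```

For instance, count_fastq_reads("./output/sample_1_filtered.fastq.gz") would return the number of reads retained by the second call above.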