Sekator 0.2

Multithreaded quality and adapter trimmer for PAIRED fastq files (Python2.7/Cython/C)

Creation : 2015/03/31

Last update : 2015/03/31

Motivation

Sekator is an object-oriented, semi-compiled program written in Python 2.7, Cython 0.21+ and C that performs quality trimming and adapter trimming of paired fastq files.

Specific features:

Principle

  1. A configuration file containing all program parameters (including sample/adapter associations) is parsed and thoroughly checked for validity.
  2. Paired fastq files are parsed one read at a time, sample by sample, with a custom fastq parser that supports only the Illumina 1.8 Phred+33 quality encoding.
  3. If required, reads are quality trimmed with a sliding quality window, starting from both ends of each read. Reads of insufficient quality or too short after trimming are discarded, together with their paired mate.
  4. If required, reads are adapter trimmed with the adapters provided for each sample. Imperfect matches can be found anywhere in the reads, for as many adapters as required, thanks to a fast, optimized Smith-Waterman algorithm. If adapter matches are found in a read, the longest part of the read without an adapter match is extracted. Reads too short after trimming are discarded, together with their paired mate.
  5. The paired reads that passed through the trimming steps are subsequently written to new fastq.gz files (R1 and R2) in Illumina 1.8 Phred+33 quality encoding.
  6. A progress bar indicates the advancement of sequence processing and a report is generated for each sample.
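
Steps 3 and 4 can be illustrated with a minimal Python sketch. This is not Sekator's actual code: the function names, parameters and default values below are hypothetical, the window is assumed to pass when its mean Phred score reaches a threshold, and adapter matches are assumed to be already available as (start, end) intervals rather than computed by a Smith-Waterman alignment.

```python
def phred33_scores(quality_string):
    """Decode an Illumina 1.8 (Phred+33) quality string into integer scores."""
    return [ord(char) - 33 for char in quality_string]


def sliding_window_trim(scores, window_size=5, min_quality=20):
    """Return the (start, end) slice kept after trimming both ends.

    A window slides inward from each end until its mean score reaches
    min_quality (assumed criterion). Returns None if no window passes.
    """
    n = len(scores)

    def window_ok(i):
        return sum(scores[i:i + window_size]) >= min_quality * window_size

    # Slide from the 5' end until a window passes
    start = 0
    while start + window_size <= n and not window_ok(start):
        start += 1
    if start + window_size > n:
        return None
    # Slide from the 3' end; stops at the latest on the window found above
    end = n
    while not window_ok(end - window_size):
        end -= 1
    return start, end


def trim_pair(read1, read2, window_size=5, min_quality=20, min_length=10):
    """Quality trim two (sequence, quality) mates; None if either one fails."""
    trimmed = []
    for seq, qual in (read1, read2):
        kept = sliding_window_trim(phred33_scores(qual), window_size, min_quality)
        if kept is None or kept[1] - kept[0] < min_length:
            return None  # discard the read together with its mate
        start, end = kept
        trimmed.append((seq[start:end], qual[start:end]))
    return tuple(trimmed)


def longest_adapter_free_region(read_length, matches):
    """Return the longest (start, end) interval not covered by any match.

    matches is a list of half-open (start, end) intervals of adapter hits.
    """
    best = (0, 0)
    position = 0
    for start, end in sorted(matches):
        if start - position > best[1] - best[0]:
            best = (position, start)
        position = max(position, end)
    if read_length - position > best[1] - best[0]:
        best = (position, read_length)
    return best
```

For example, trim_pair(("ACGTACGT", "##IIII##"), ("ACGTACGT", "IIIIIIII"), window_size=2, min_length=4) keeps bases 1 to 7 of the first mate and the whole second mate, while longest_adapter_free_region(50, [(0, 10), (40, 50)]) keeps the interval (10, 40). Because the window test uses a mean, one low-quality base can remain at each end of the kept region.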

Dependencies

The program was developed under Linux Mint 17 and has not been tested on other operating systems. In addition to Python 2.7 and gcc 4.8+, the following dependencies are required for proper program execution:

If pip is already installed, the packages can be installed with the following command: sudo pip install numpy cython

Get and install Sekator

Usage

Run the program from the folder where the fastq files will be created:

Usage: Sekator.py -c Conf.txt [-i -h]

Options:
  --version     show program's version number and exit
  -h, --help    show this help message and exit
  -c CONF_FILE  Path to the configuration file [Mandatory]
  -i            Generate an example configuration file and exit [Facultative]

An example configuration file can be generated by running the program with the option -i. The possible options are extensively described in the configuration file. The program can be tested from the test folder with the dataset provided and the default configuration file:

cd ./test/result
Sekator.py -i
Sekator.py -c Sekator_conf_file.txt

Authors and Contact

Adrien Leger - 2014