Alignment CLI Usage
Activate virtual environment
# Using Conda here, but this can also be done with virtualenvwrapper
conda activate pyBioTools
Reads_index
Get help
pyBioTools Alignment Reads_index -h
usage: pyBioTools Alignment Reads_index [-h] -i INPUT_FN [-u] [-s] [-p] [-v]
[-q] [--progress]
Index reads found in a coordinate-sorted bam file by read_id. The created
index file can be used to randomly access the alignment file by read_id
optional arguments:
-h, --help show this help message and exit
-i INPUT_FN, --input_fn INPUT_FN
Path to the bam file to index (required) [str]
-u, --skip_unmapped Filter out unmapped reads (default: False) [None]
-s, --skip_secondary Filter out secondary alignments (default: False) [None]
-p, --skip_supplementary
Filter out supplementary alignments (default: False)
[None]
-v, --verbose Increase verbosity (default: False)
-q, --quiet Reduce verbosity (default: False)
--progress Display a progress bar
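When several BAM files need an index, the command can be wrapped in a small loop. The sketch below is a hypothetical Bash helper (`index_all_bams` and the `DRY_RUN` variable are illustrative names, not part of pyBioTools). It only calls `Reads_index` with `-i` and passes any other flags, such as the skip options listed above, through unchanged. Setting `DRY_RUN=1` prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a read_id index for every BAM file in a folder.
# Set DRY_RUN=1 to print the commands instead of running them.
index_all_bams() {
  local dir="$1"; shift          # remaining arguments are passed through as flags
  local bam
  for bam in "$dir"/*.bam; do
    [ -e "$bam" ] || continue    # no match: the glob stays literal, so skip it
    ${DRY_RUN:+echo} pyBioTools Alignment Reads_index -i "$bam" "$@"
  done
}

# Example: index all BAMs in ./data, excluding secondary and unmapped reads
# index_all_bams ./data --skip_secondary --skip_unmapped
```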
Basic usage
pyBioTools Alignment Reads_index -i ./data/sample_1.bam
## Running Alignment Reads_index ##
Checking Bam file
Parsing reads
Read counts summary
Reads retained
total: 13,684
primary: 10,584
secondary: 1,496
unmapped: 1,416
supplementary: 188
Excluding reads from index
pyBioTools Alignment Reads_index -i ./data/sample_1.bam --verbose --skip_secondary --skip_unmapped
## Running Alignment Reads_index ##
Checking Bam file
Parsing reads
Read counts summary
Reads retained
total: 10,772
primary: 10,584
supplementary: 188
Reads discarded
total: 2,912
secondary: 1,496
unmapped: 1,416
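The two summaries are consistent with each other: the four categories add up to the total, and `--skip_secondary --skip_unmapped` removes exactly the secondary and unmapped counts. A quick check with Bash arithmetic, using the numbers copied from the outputs above:

```shell
#!/usr/bin/env bash
# Counts for ./data/sample_1.bam, copied from the Reads_index summaries above
primary=10584
secondary=1496
unmapped=1416
supplementary=188

total=$(( primary + secondary + unmapped + supplementary ))
discarded=$(( secondary + unmapped ))   # removed by --skip_secondary --skip_unmapped
retained=$(( total - discarded ))       # primary + supplementary

echo "total=$total retained=$retained discarded=$discarded"
# prints: total=13684 retained=10772 discarded=2912
```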
References_sample
Get help
pyBioTools Alignment References_sample -h
usage: pyBioTools Alignment References_sample [-h] -i
[INPUT_FN [INPUT_FN ...]]
[-o OUTPUT_FN]
[-s SELECTED_READS_FN]
[-f FRAC_READS]
[-r MIN_READS_REF]
[-t SORTING_THREADS]
[--rand_seed RAND_SEED] [-v]
[-q] [--progress]
Randomly sample reads per reference according to a fraction of the reads
mapped to this reference, for one or several files, and write the selected
reads to a new bam file
optional arguments:
-h, --help show this help message and exit
-i [INPUT_FN [INPUT_FN ...]], --input_fn [INPUT_FN [INPUT_FN ...]]
Bam file path or directory containing bam files or
list of files, or regex or list of regex. It is quite
flexible. All files need to be sorted and aligned to
the same reference file. (required) [str]
-o OUTPUT_FN, --output_fn OUTPUT_FN
Path to the output bam file (sorted and indexed)
(default: out.bam) [str]
-s SELECTED_READS_FN, --selected_reads_fn SELECTED_READS_FN
Path to the output text file containing all the selected
read ids (default: select_ref.txt) [str]
-f FRAC_READS, --frac_reads FRAC_READS
Fraction of the mapped reads to sample for each reference
(default: 0.5) [float]
-r MIN_READS_REF, --min_reads_ref MIN_READS_REF
Minimal read coverage per file and reference before
sampling (default: 30) [int]
-t SORTING_THREADS, --sorting_threads SORTING_THREADS
Number of threads to use for bam file sorting
(default: 4) [int]
--rand_seed RAND_SEED
Seed to use for the pseudo-random generator. For non-deterministic
behaviour set to None (default: 42)
[int]
-v, --verbose Increase verbosity (default: False)
-q, --quiet Reduce verbosity (default: False)
--progress Display a progress bar
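The sampling idea described above (pick a fixed fraction of the reads mapped to each reference, skipping poorly covered references) can be sketched in Bash and awk. This is an illustration under stated assumptions, not pyBioTools' implementation: the input is a hypothetical two-column `read_id reference` text stream, references with fewer than `min_reads` reads are skipped (loosely mirroring `--min_reads_ref`), and the number of reads drawn per reference is rounded to the nearest integer.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of per-reference fraction sampling.
# stdin: one "read_id reference" pair per line; stdout: selected read ids.
sample_per_ref() {
  local frac="$1" min_reads="$2" seed="${3:-42}"
  awk -v frac="$frac" -v min="$min_reads" -v seed="$seed" '
    { n[$2]++; ids[$2, n[$2]] = $1 }            # group read ids by reference
    END {
      srand(seed)                               # fixed seed -> reproducible selection
      for (ref in n) {
        if (n[ref] < min) continue              # skip poorly covered references
        k = int(frac * n[ref] + 0.5)            # reads to draw, rounded to nearest
        for (i = 1; i <= k; i++) {              # partial Fisher-Yates shuffle
          j = i + int(rand() * (n[ref] - i + 1))
          tmp = ids[ref, i]; ids[ref, i] = ids[ref, j]; ids[ref, j] = tmp
          print ids[ref, i]
        }
      }
    }'
}

# Example: keep 25% of the reads of every reference with at least 100 reads
# sample_per_ref 0.25 100 < read_ref_pairs.txt > selected_ids.txt
```

A partial shuffle draws `k` distinct reads per reference without sorting or holding more than the grouped ids in memory.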
Basic usage
pyBioTools Alignment References_sample \
--input_fn "./data/sample_*.bam" \
--output_fn "./output/sample_References_sample.bam" \
--selected_reads_fn "./output/sample_References_sample_refid.txt" \
--frac_reads 0.25 \
--min_reads_ref 100 \
--progress
## Running Alignment Ref_sample ##
## Index files ##
Indexing alignment file ./data/sample_2.bam
Reading : 13678 Reads [00:00, 19324.68 Reads/s]
Indexing alignment file ./data/sample_1.bam
Reading : 13684 Reads [00:00, 23822.68 Reads/s]
Raw read counts summary
primary reads: 21,185
secondary reads: 2,966
unmapped reads: 2,815
supplementary reads: 396
## Randomly pick reads per references ##
## Sample reads and write to output file ##
Writing selected reads for bam file ./data/sample_2.bam
Writing : 100%|██████████████████████| 2656/2656 [00:02<00:00, 1210.73 Reads/s]
Writing selected reads for bam file ./data/sample_1.bam
Writing : 100%|██████████████████████| 2653/2653 [00:02<00:00, 1247.70 Reads/s]
Sort BAM File
Index sorted BAM File
Selected read counts summary
valid reads: 21,185
valid sampled reads: 5,309
valid references: 17
Reads_sample
Get help
pyBioTools Alignment Reads_sample -h
usage: pyBioTools Alignment Reads_sample [-h] -i INPUT_FN [-o OUTPUT_FOLDER]
[-p OUTPUT_PREFIX] [-r N_READS]
[-s N_SAMPLES]
[--rand_seed RAND_SEED] [-v] [-q]
[--progress]
Randomly sample `n_reads` reads from a bam file and write the downsampled
reads to `n_samples` bam files. If the input bam file is not indexed by
read_id, `index_reads` is automatically called.
optional arguments:
-h, --help show this help message and exit
-i INPUT_FN, --input_fn INPUT_FN
Path to the indexed bam file (required) [str]
-o OUTPUT_FOLDER, --output_folder OUTPUT_FOLDER
Path to the folder where sample files are written
(default: ./) [str]
-p OUTPUT_PREFIX, --output_prefix OUTPUT_PREFIX
Prefix for the output sample file names (default: out)
[str]
-r N_READS, --n_reads N_READS
Number of randomly selected reads in each sample
(default: 1000) [int]
-s N_SAMPLES, --n_samples N_SAMPLES
Number of samples to generate files for (default: 1)
[int]
--rand_seed RAND_SEED
Seed to use for the pseudo-random generator. For non-deterministic
behaviour set to 0 (default: 42) [int]
-v, --verbose Increase verbosity (default: False)
-q, --quiet Reduce verbosity (default: False)
--progress Display a progress bar
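`Reads_sample` draws `n_reads` reads per sample using a seeded random generator. As a hedged sketch of the same idea on a plain list of read ids (one per line on stdin), the hypothetical Bash function below uses reservoir sampling (Algorithm R) in awk and writes one `<prefix>_<n>.txt` file per sample, a naming scheme made up for this example. Each sample is seeded with `seed + sample_number`, so runs are reproducible with a given awk implementation; how pyBioTools itself seeds each sample is not shown here.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: draw n_samples independent random subsets of n_reads read ids
# (one id per line on stdin) with reservoir sampling, one output file per sample.
sample_read_ids() {
  local n_reads="$1" n_samples="$2" prefix="$3" seed="${4:-42}"
  local ids s
  ids=$(cat)                                   # buffer ids so every sample sees all of them
  for (( s = 1; s <= n_samples; s++ )); do
    printf '%s\n' "$ids" | awk -v k="$n_reads" -v seed="$(( seed + s ))" '
      BEGIN { srand(seed) }                    # fixed seed -> reproducible sample
      NR <= k { res[NR] = $0; next }           # fill the reservoir with the first k ids
      { j = int(rand() * NR) + 1               # then keep id NR with probability k/NR
        if (j <= k) res[j] = $0 }
      END { m = (NR < k) ? NR : k
            for (i = 1; i <= m; i++) print res[i] }
    ' > "${prefix}_${s}.txt"
  done
}

# Example: three samples of 1000 ids each, written to 1K_1.txt, 1K_2.txt, 1K_3.txt
# sample_read_ids 1000 3 ./output/1K < read_ids.txt
```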
Basic usage
pyBioTools Alignment Reads_sample -i ./data/sample_1.bam -o ./output/sample_reads -p 1K -r 1000 -s 3 --progress --verbose
## Running Alignment Reads_sample ##
Checking Bam and index file
Load index
Index: 10772it [00:00, 528519.79it/s]
Write sample reads
Sample 1: 100%|██████████████████████| 1000/1000 [00:00<00:00, 1225.20 Reads/s]
Indexing output bam file
Sample 2: 100%|██████████████████████| 1000/1000 [00:00<00:00, 1255.67 Reads/s]
Indexing output bam file
Sample 3: 100%|██████████████████████| 1000/1000 [00:00<00:00, 1225.41 Reads/s]
Indexing output bam file
Filter
Get help
pyBioTools Alignment Filter -h
usage: pyBioTools Alignment Filter [-h] -i INPUT_FN -o OUTPUT_FN [-u] [-s]
[-p] [-t ORIENTATION] [-r MIN_READ_LEN]
[-a MIN_ALIGN_LEN] [-m MIN_MAPQ]
[-f MIN_FREQ_IDENTITY]
[--select_ref [SELECT_REF [SELECT_REF ...]]]
[--exclude_ref [EXCLUDE_REF [EXCLUDE_REF ...]]]
[-v] [-q] [--progress]
optional arguments:
-h, --help show this help message and exit
-i INPUT_FN, --input_fn INPUT_FN
Path to the bam file to filter (required) [str]
-o OUTPUT_FN, --output_fn OUTPUT_FN
Path to write the filtered bam file (required) [str]
-u, --skip_unmapped Filter out unmapped reads (default: False) [None]
-s, --skip_secondary Filter out secondary alignments (default: False) [None]
-p, --skip_supplementary
Filter out supplementary alignments (default: False)
[None]
-t ORIENTATION, --orientation ORIENTATION
Orientation of alignment on reference genome
{"+","-","."} (default: .) [str]
-r MIN_READ_LEN, --min_read_len MIN_READ_LEN
Minimal query read length (basecalled length)
(default: 0) [int]
-a MIN_ALIGN_LEN, --min_align_len MIN_ALIGN_LEN
Minimal query alignment length on reference (default:
0) [int]
-m MIN_MAPQ, --min_mapq MIN_MAPQ
Minimal mapping quality score (mapq) (default: 0)
[int]
-f MIN_FREQ_IDENTITY, --min_freq_identity MIN_FREQ_IDENTITY
Minimal frequency of alignment identity [0 to 1]
(default: 0) [float]
--select_ref [SELECT_REF [SELECT_REF ...]]
List of references on which the reads have to be
mapped. (default: None) [str]
--exclude_ref [EXCLUDE_REF [EXCLUDE_REF ...]]
List of references on which the reads should not be
mapped. (default: None) [str]
-v, --verbose Increase verbosity (default: False)
-q, --quiet Reduce verbosity (default: False)
--progress Display a progress bar
Basic usage
pyBioTools Alignment Filter \
-i "./data/sample_1.bam" \
-o "./output/sample_1_filter.bam" \
--skip_unmapped \
--skip_supplementary \
--skip_secondary \
--min_read_len 300 \
--min_align_len 300 \
--orientation "+" \
--min_mapq 10 \
--min_freq_identity 0.8 \
--verbose
## Running Alignment Filter ##
Checking input bam file
Parsing reads
Indexing output bam file
Read counts summary
Reads discarded
total: 9,262
wrong_orientation: 5,291
secondary: 1,496
unmapped: 1,416
low_identity: 510
low_mapping_quality: 283
supplementary: 188
short_alignment: 67
short_read: 11
Reads retained
primary: 4,422
total: 4,422
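In this example the discarded counts add up exactly to the discarded total (9,262), and discarded plus retained reads add up to the 13,684 reads in the file, which suggests each read is counted under a single reason. The hypothetical awk sketch below illustrates that kind of first-failing-criterion filter on a made-up tab-separated table (`read_id status strand read_len align_len mapq identity`). The order of the checks is an assumption, and all three `--skip_*` options are treated as enabled, as in the basic usage example.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: filter a made-up TSV table of reads and count discard reasons.
# Columns: read_id status strand read_len align_len mapq identity
# Kept read ids go to stdout, the summary goes to stderr.
filter_reads() {
  local orientation="$1" min_read_len="$2" min_align_len="$3" min_mapq="$4" min_identity="$5"
  awk -F'\t' -v o="$orientation" -v rl="$min_read_len" -v al="$min_align_len" \
      -v mq="$min_mapq" -v mi="$min_identity" '
    {
      reason = ""                                   # first failed check wins (assumed order)
      if ($2 != "primary")               reason = $2   # secondary, supplementary, unmapped
      else if (o != "." && $3 != o)      reason = "wrong_orientation"
      else if ($4 < rl)                  reason = "short_read"
      else if ($5 < al)                  reason = "short_alignment"
      else if ($6 < mq)                  reason = "low_mapping_quality"
      else if ($7 < mi)                  reason = "low_identity"
      if (reason == "") { kept++; print $1 } else discarded[reason]++
    }
    END {
      for (r in discarded) printf("discarded %s: %d\n", r, discarded[r]) > "/dev/stderr"
      printf("retained: %d\n", kept + 0) > "/dev/stderr"
    }'
}

# Example with the thresholds of the basic usage above:
# filter_reads + 300 300 10 0.8 < reads.tsv > kept_ids.txt
```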
To_fastq
Get help
pyBioTools Alignment To_fastq -h
usage: pyBioTools Alignment To_fastq [-h] -i [INPUT_FN [INPUT_FN ...]] -1
OUTPUT_R1_FN [-2 OUTPUT_R2_FN] [-s] [-v]
[-q] [--progress]
Dump reads from an alignment file or a set of alignment files to a fastq
file or a pair of fastq files. Only primary alignments are kept and
paired-end reads are assumed to be interleaved. Compatible with unmapped or
unaligned alignment files as well as files without a header.
optional arguments:
-h, --help show this help message and exit
-i [INPUT_FN [INPUT_FN ...]], --input_fn [INPUT_FN [INPUT_FN ...]]
Path (or list of paths) to input BAM/CRAM/SAM file(s)
(required) [str]
-1 OUTPUT_R1_FN, --output_r1_fn OUTPUT_R1_FN
Path to an output fastq file (for Read1 in paired_end
mode if output_r2_fn is provided). Automatically
gzipped if the .gz extension is found (required) [str]
-2 OUTPUT_R2_FN, --output_r2_fn OUTPUT_R2_FN
Optional path to an output fastq file. Automatically
gzipped if the .gz extension is found (default: None)
[str]
-s, --ignore_paired_end
Ignore paired_end information and output everything in
a single file. (default: False) [None]
-v, --verbose Increase verbosity (default: False)
-q, --quiet Reduce verbosity (default: False)
--progress Display a progress bar
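The output options above switch to gzip compression based on the file extension alone. A minimal Bash sketch of that convention, with `write_fastq` as a hypothetical helper name (this is not pyBioTools code):

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy stdin to out_fn, gzip-compressing it when the name ends in .gz
write_fastq() {
  local out_fn="$1"
  case "$out_fn" in
    *.gz) gzip -c > "$out_fn" ;;   # compressed output
    *)    cat > "$out_fn" ;;       # plain text output
  esac
}

# Example:
# printf '@read1\nACGT\n+\nIIII\n' | write_fastq ./output/example.fastq.gz
```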
Single-end read usage from BAM files
pyBioTools Alignment To_fastq \
-i ./data/sample_1.bam ./data/sample_2.bam \
-1 ./output/sample_1-2_SE_from_bam.fastq.gz \
--verbose \
--progress
## Running Alignment To_fastq ##
[DEBUG]: Opening file ./output/sample_1-2_SE_from_bam.fastq.gz in writing mode
Parsing reads
Reading input file ./data/sample_1.bam
Reading: 12000 Reads [00:15, 753.11 Reads/s]
[DEBUG]: Reached end of input file ./data/sample_1.bam
Reading input file ./data/sample_2.bam
Reading: 12000 Reads [00:18, 664.44 Reads/s]
[DEBUG]: Reached end of input file ./data/sample_2.bam
[DEBUG]: Closing file:./output/sample_1-2_SE_from_bam.fastq.gz
[DEBUG]: Sequences writen: 24000
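The 12,000 reads written for `./data/sample_1.bam` match the primary plus unmapped counts reported by `Reads_index` for the same file. This is consistent with only primary records being kept: unmapped records are not flagged as secondary or supplementary, so they are written as well. The breakdown for `sample_2.bam` is not shown, so only `sample_1.bam` is checked:

```shell
#!/usr/bin/env bash
# Counts for ./data/sample_1.bam, copied from the Reads_index summary above
primary=10584
secondary=1496
unmapped=1416
supplementary=188

written=$(( primary + unmapped ))          # records written to the fastq file
skipped=$(( secondary + supplementary ))   # secondary/supplementary records dropped

echo "written=$written skipped=$skipped total=$(( written + skipped ))"
# prints: written=12000 skipped=1684 total=13684
```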
Paired-end read usage from unaligned CRAM files
pyBioTools Alignment To_fastq \
-i ./data/sample_1_20k.cram ./data/sample_2_20k.cram \
-1 ./output/sample_1-2_PE_from_CRAM_1.fastq.gz \
-2 ./output/sample_1-2_PE_from_CRAM_2.fastq.gz \
--verbose \
--progress
## Running Alignment To_fastq ##
[DEBUG]: Opening file ./output/sample_1-2_PE_from_CRAM_1.fastq.gz in writing mode
[DEBUG]: Opening file ./output/sample_1-2_PE_from_CRAM_2.fastq.gz in writing mode
Parsing reads
Reading input file ./data/sample_1_20k.cram
[E::cram_index_load] Could not retrieve index file for './data/sample_1_20k.cram'
Reading: 12000 Reads [00:03, 3594.10 Reads/s]
[DEBUG]: Reached end of input file ./data/sample_1_20k.cram
Reading input file ./data/sample_2_20k.cram
[E::cram_index_load] Could not retrieve index file for './data/sample_2_20k.cram'
Reading: 12000 Reads [00:03, 3628.22 Reads/s]
[DEBUG]: Reached end of input file ./data/sample_2_20k.cram
[DEBUG]: Closing file:./output/sample_1-2_PE_from_CRAM_1.fastq.gz
[DEBUG]: Sequences writen: 24000
[DEBUG]: Closing file:./output/sample_1-2_PE_from_CRAM_2.fastq.gz
[DEBUG]: Sequences writen: 24000
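With `-2`, consecutive records are treated as read 1 and read 2 of the same pair. The hypothetical awk helper below shows what splitting an interleaved FASTQ stream (4 lines per record, mates adjacent) into two files looks like. It only illustrates the interleaving convention; it is not how pyBioTools reads CRAM records.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: split an interleaved FASTQ stream (4 lines per record,
# read 1 and read 2 of each pair adjacent) into two files.
split_interleaved() {
  local r1_fn="$1" r2_fn="$2"
  awk -v r1="$r1_fn" -v r2="$r2_fn" '
    { rec = int((NR - 1) / 4) }           # 0-based record index
    rec % 2 == 0 { print > r1; next }     # even records: read 1
    { print > r2 }                        # odd records: read 2
  '
}

# Example:
# split_interleaved reads_R1.fastq reads_R2.fastq < interleaved.fastq
```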
Split
Get help
pyBioTools Alignment Split -h
usage: pyBioTools Alignment Split [-h] -i INPUT_FN [-o OUTPUT_DIR]
[-n N_FILES]
[-l [OUTPUT_FN_LIST [OUTPUT_FN_LIST ...]]]
[-x] [-v] [-q] [--progress]
Split reads in a bam file into N files. The input bam file has to be sorted
by coordinates and indexed. The last file can contain a few extra reads.
optional arguments:
-h, --help show this help message and exit
-i INPUT_FN, --input_fn INPUT_FN
Path to the bam file to split (required) [str]
-o OUTPUT_DIR, --output_dir OUTPUT_DIR
Path to the directory where to write split bam files.
Files generated have the same basename as the source
file and are suffixed with numbers starting from 0
(default: None) [str]
-n N_FILES, --n_files N_FILES
Number of files to split the original file into
(default: 10) [int]
-l [OUTPUT_FN_LIST [OUTPUT_FN_LIST ...]], --output_fn_list [OUTPUT_FN_LIST [OUTPUT_FN_LIST ...]]
As an alternative to output_dir and n_files one can
instead give a list of output files. Reads will be
automatically split between the files in the same
order as given (default: None) [str]
-x, --index Index output BAM files (default: False) [None]
-v, --verbose Increase verbosity (default: False)
-q, --quiet Reduce verbosity (default: False)
--progress Display a progress bar
Basic usage with an output folder
pyBioTools Alignment Split \
-i "./data/sample_1.bam" \
-o "./output/split_bam" \
-n 5 \
--verbose
ls -lh "./output/split_bam"
## Running Alignment Split ##
Checking input bam file
[DEBUG]: List of output files to generate:
[DEBUG]: * ./output/split_bam/sample_1_0.bam
[DEBUG]: * ./output/split_bam/sample_1_1.bam
[DEBUG]: * ./output/split_bam/sample_1_2.bam
[DEBUG]: * ./output/split_bam/sample_1_3.bam
[DEBUG]: * ./output/split_bam/sample_1_4.bam
Parsing reads
[DEBUG]: Counting reads
[DEBUG]: Open ouput file './output/split_bam/sample_1_0.bam'
[DEBUG]: Close output file './output/split_bam/sample_1_0.bam'
[DEBUG]: Reads written: 2,736
[DEBUG]: Open ouput file './output/split_bam/sample_1_1.bam'
[DEBUG]: Close output file './output/split_bam/sample_1_1.bam'
[DEBUG]: Reads written: 2,736
[DEBUG]: Open ouput file './output/split_bam/sample_1_2.bam'
[DEBUG]: Close output file './output/split_bam/sample_1_2.bam'
[DEBUG]: Reads written: 2,736
[DEBUG]: Open ouput file './output/split_bam/sample_1_3.bam'
[DEBUG]: Close output file './output/split_bam/sample_1_3.bam'
[DEBUG]: Reads written: 2,736
[DEBUG]: Open ouput file './output/split_bam/sample_1_4.bam'
[DEBUG]: Reached end of input file
[DEBUG]: Close output file './output/split_bam/sample_1_4.bam'
[DEBUG]: Reads written: 2,740
Read counts summary
Reads from index: 13,684
Reads writen: 13,684
Reads per file: 2,736
total 38M
-rw-rw-r-- 1 aleg aleg 8.4M Jan 19 14:57 sample_1_0.bam
-rw-rw-r-- 1 aleg aleg 8.4M Jan 19 14:57 sample_1_1.bam
-rw-rw-r-- 1 aleg aleg 8.5M Jan 19 14:57 sample_1_2.bam
-rw-rw-r-- 1 aleg aleg 8.5M Jan 19 14:57 sample_1_3.bam
-rw-rw-r-- 1 aleg aleg 3.5M Jan 19 14:57 sample_1_4.bam
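The per-file counts in this example follow integer division: 13,684 reads over 5 files gives 2,736 reads per file, and the last file takes the remainder (2,740). A small Bash function reproducing that arithmetic, inferred from the logged counts rather than taken from the pyBioTools source:

```shell
#!/usr/bin/env bash
# Sketch of the per-file read counts seen in the Split logs:
# every file gets total / n_files reads (integer division), the last one gets the rest.
split_sizes() {
  local total="$1" n_files="$2"
  local per_file=$(( total / n_files ))
  local i
  for (( i = 0; i < n_files - 1; i++ )); do
    echo "$per_file"
  done
  echo $(( total - per_file * (n_files - 1) ))
}

split_sizes 13684 5   # prints 2736 four times, then 2740
```

With 4 files the division is exact, so every file gets 3,421 reads.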
Basic usage with named output files
pyBioTools Alignment Split \
-i "./data/sample_1.bam" \
-l "./output/split_bam_2/f1.bam" "./output/split_bam_2/f2.bam" "./output/split_bam_2/f3.bam" "./output/split_bam_2/f4.bam" \
--verbose \
--index
ls -lh "./output/split_bam_2"
## Running Alignment Split ##
Checking input bam file
[DEBUG]: List of output files to generate:
[DEBUG]: * ./output/split_bam_2/f1.bam
[DEBUG]: * ./output/split_bam_2/f2.bam
[DEBUG]: * ./output/split_bam_2/f3.bam
[DEBUG]: * ./output/split_bam_2/f4.bam
Parsing reads
[DEBUG]: Counting reads
[DEBUG]: Open ouput file './output/split_bam_2/f1.bam'
[DEBUG]: Close output file './output/split_bam_2/f1.bam'
[DEBUG]: Reads written: 3,421
[DEBUG]: index output file './output/split_bam_2/f1.bam'
[DEBUG]: Open ouput file './output/split_bam_2/f2.bam'
[DEBUG]: Close output file './output/split_bam_2/f2.bam'
[DEBUG]: Reads written: 3,421
[DEBUG]: index output file './output/split_bam_2/f2.bam'
[DEBUG]: Open ouput file './output/split_bam_2/f3.bam'
[DEBUG]: Close output file './output/split_bam_2/f3.bam'
[DEBUG]: Reads written: 3,421
[DEBUG]: index output file './output/split_bam_2/f3.bam'
[DEBUG]: Open ouput file './output/split_bam_2/f4.bam'
[DEBUG]: Reached end of input file
[DEBUG]: Close output file './output/split_bam_2/f4.bam'
[DEBUG]: Reads written: 3,421
[DEBUG]: index output file './output/split_bam_2/f4.bam'
Read counts summary
Reads from index: 13,684
Reads writen: 13,684
Reads per file: 3,421
total 38M
-rw-rw-r-- 1 aleg aleg 11M Jan 19 14:57 f1.bam
-rw-rw-r-- 1 aleg aleg 6.2K Jan 19 14:57 f1.bam.bai
-rw-rw-r-- 1 aleg aleg 12M Jan 19 14:57 f2.bam
-rw-rw-r-- 1 aleg aleg 7.4K Jan 19 14:57 f2.bam.bai
-rw-rw-r-- 1 aleg aleg 9.4M Jan 19 14:57 f3.bam
-rw-rw-r-- 1 aleg aleg 5.0K Jan 19 14:57 f3.bam.bai
-rw-rw-r-- 1 aleg aleg 5.6M Jan 19 14:57 f4.bam
-rw-rw-r-- 1 aleg aleg 2.5K Jan 19 14:57 f4.bam.bai
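When many output files are needed, the `-l` list can be generated rather than typed. The hypothetical Bash helper below (`split_into_named_files` and `DRY_RUN` are illustrative names) builds `f1.bam` to `fN.bam` in a folder and calls `Split` with only the options shown above. With `DRY_RUN=1` it prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split input_fn into n_files BAMs named f1.bam ... fN.bam in out_dir.
# Set DRY_RUN=1 to print the command instead of running it.
split_into_named_files() {
  local input_fn="$1" out_dir="$2" n_files="$3"
  local -a outputs=()
  local i
  for (( i = 1; i <= n_files; i++ )); do
    outputs+=("$out_dir/f$i.bam")
  done
  ${DRY_RUN:+echo} pyBioTools Alignment Split -i "$input_fn" -l "${outputs[@]}" --index
}

# Example reproducing the command above:
# split_into_named_files ./data/sample_1.bam ./output/split_bam_2 4
```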