Seqsum API Usage
Import package
from pyBioTools import Seqsum
from pyBioTools.common import jhelp, head
Merge
jhelp(Seqsum.Merge)
Merge (input_fn, output_fn, old_filename_synthax, verbose, quiet, progress, kwargs)
- input_fn (required) [list(str)]
Path to a sequencing summary file, a directory containing sequencing summary files, a list of files, a regex, or a list of regexes. Files can also be gzipped
- output_fn (required) [str]
Destination sequencing summary file. Automatically gzipped if the .gz extension is found
- old_filename_synthax (default: False) [bool]
Replace the filename_fast5 field by filename as in older versions. Useful for nanopolish index compatibility
- verbose (default: False) [bool]
- quiet (default: False) [bool]
- progress (default: False) [bool]
- kwargs
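To make the flexibility of input_fn more concrete, here is a minimal sketch of how a single path, a directory, or a list of patterns could be expanded into a file list. It is not the pyBioTools implementation: expand_inputs is a hypothetical helper, and it assumes that patterns are shell-style wildcards resolved with the standard glob module.

```python
import glob
import os


def expand_inputs(input_fn):
    """Hypothetical helper: expand files, directories and wildcard patterns into a sorted file list."""
    # Accept either a single string or a list of strings
    patterns = [input_fn] if isinstance(input_fn, str) else list(input_fn)
    found = set()
    for pattern in patterns:
        if os.path.isdir(pattern):
            # Assumption: a directory stands for every regular file directly inside it
            pattern = os.path.join(pattern, "*")
        for path in glob.glob(pattern):
            if os.path.isfile(path):
                found.add(path)
    # Sorting gives a reproducible order, which glob alone does not guarantee
    return sorted(found)
```

With this sketch, expand_inputs("./data"), expand_inputs("./data/*") and expand_inputs(["./data/seqsum*"]) would all return plain lists of existing files, with each file listed only once.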
Basic usage
Seqsum.Merge (["./data/seqsum_new1.tsv", "./data/seqsum_new2.tsv"], "./output/seqsum_merged_1.tsv", verbose=True)
head ("./output/seqsum_merged_1.tsv", 20)
## Running Seqsum Merge ##
Parsing reads
[DEBUG]: Reading file ./data/seqsum_new1.tsv
[DEBUG]: End of file ./data/seqsum_new1.tsv
[DEBUG]: Reading file ./data/seqsum_new2.tsv
[DEBUG]: End of file ./data/seqsum_new2.tsv
ERROR: 9 duplicated read ids in input files
Read counts summary
Valid lines: 9
Files found: 2
Valid files: 2
Only 10 lines in the file
filename_fastq filename_fast5 read_id run_id channel mux start_time duration num_events passes_filtering template_start num_events_template template_duration sequence_length_template mean_qscore_template strand_score_temp...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 5c068f48-bb83-4458-9aac-49f6e2 397b0214f8b495477015f88b3c3d7b 1563 1 59.920000 1.032000 825 TRUE 59.976250 780 0.975750 449 12.316098 0.000000 ...
PAF25252_fail_397b0214_0.fastq PAF25252_fail_397b0214_0.fast5 2e6bdc7f-fdd3-497f-91ea-06ad2b 397b0214f8b495477015f88b3c3d7b 1533 1 60.045000 1.475000 1180 FALSE 60.351250 935 1.168750 534 4.257434 0.000000 ...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 63600975-81d1-4613-bd24-286885 397b0214f8b495477015f88b3c3d7b 2726 1 59.956500 1.429000 1143 TRUE 60.476500 727 0.909000 372 13.115662 0.000000 ...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 81b3a753-0515-4c7d-ab5b-5e8a05 397b0214f8b495477015f88b3c3d7b 2912 1 60.213000 0.992500 794 TRUE 60.320500 708 0.885000 326 9.516800 0.000000 ...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 f679b2cc-4b13-413a-bef9-0ca6b6 397b0214f8b495477015f88b3c3d7b 2994 1 60.464000 1.078250 862 TRUE 60.674000 694 0.868250 336 14.441545 0.000000 ...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 84f25d0b-23d8-4fbd-ac58-a57d72 397b0214f8b495477015f88b3c3d7b 2881 1 60.544250 1.349750 1079 TRUE 60.980500 730 0.913500 351 10.576068 0.000000 ...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 2563d8ac-4eaa-4db1-b730-9fb83c 397b0214f8b495477015f88b3c3d7b 2580 1 60.139750 1.065750 852 TRUE 60.424750 624 0.780750 326 10.828456 0.000000 ...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 411f5895-f6ff-46e3-b92d-f99a3d 397b0214f8b495477015f88b3c3d7b 2742 1 60.923750 0.818250 654 TRUE 61.008750 586 0.733250 333 9.699732 0.000000 ...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 629498ca-9ae3-46af-a195-7645b3 397b0214f8b495477015f88b3c3d7b 1718 1 60.640750 1.332500 1066 TRUE 60.992000 785 0.981250 453 13.001200 0.000000 ...
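The log above reports 9 duplicated read ids and 9 valid lines, which suggests that each read id is written only once even when it appears in several input files. A minimal sketch of that behaviour, assuming all inputs are uncompressed TSV files with identical headers and a read_id column, could look like this. merge_unique_reads is a hypothetical name and this is not the library's own code.

```python
import csv


def merge_unique_reads(input_paths, output_path, id_field="read_id"):
    """Hypothetical sketch: concatenate TSV files, keeping the first row seen for each read id."""
    seen = set()
    duplicated = 0
    writer = None
    with open(output_path, "w", newline="") as out:
        for path in input_paths:
            with open(path, newline="") as handle:
                reader = csv.DictReader(handle, delimiter="\t")
                if writer is None:
                    # The first file defines the output columns
                    writer = csv.DictWriter(
                        out, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
                    )
                    writer.writeheader()
                for row in reader:
                    if row[id_field] in seen:
                        # Already written from an earlier file: count it and skip it
                        duplicated += 1
                        continue
                    seen.add(row[id_field])
                    writer.writerow(row)
    # Number of rows written and number of duplicated rows skipped
    return len(seen), duplicated
```

The function returns the number of rows written and the number of duplicates skipped, mirroring the two counts shown in the log.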
Using a regex (here a shell-style wildcard pattern) instead
Seqsum.Merge (["./data/seqsum*"], "./output/seqsum_merged_2.tsv.gz", verbose=True)
head ("./output/seqsum_merged_2.tsv.gz", 20)
## Running Seqsum Merge ##
Parsing reads
[DEBUG]: Reading file ./data/seqsum_new2.tsv
[DEBUG]: End of file ./data/seqsum_new2.tsv
[DEBUG]: Reading file ./data/seqsum_new1.tsv
[DEBUG]: End of file ./data/seqsum_new1.tsv
ERROR: 9 duplicated read ids in input files
Read counts summary
Valid lines: 9
Files found: 2
Valid files: 2
Only 10 lines in the file
filename_fastq filename_fast5 read_id run_id channel mux start_time duration num_events passes_filtering template_start num_events_template template_duration sequence_length_template mean_qscore_template strand_score_temp...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 5c068f48-bb83-4458-9aac-49f6e2 397b0214f8b495477015f88b3c3d7b 1563 1 59.920000 1.032000 825 TRUE 59.976250 780 0.975750 449 12.316098 0.000000 ...
PAF25252_fail_397b0214_0.fastq PAF25252_fail_397b0214_0.fast5 2e6bdc7f-fdd3-497f-91ea-06ad2b 397b0214f8b495477015f88b3c3d7b 1533 1 60.045000 1.475000 1180 FALSE 60.351250 935 1.168750 534 4.257434 0.000000 ...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 63600975-81d1-4613-bd24-286885 397b0214f8b495477015f88b3c3d7b 2726 1 59.956500 1.429000 1143 TRUE 60.476500 727 0.909000 372 13.115662 0.000000 ...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 81b3a753-0515-4c7d-ab5b-5e8a05 397b0214f8b495477015f88b3c3d7b 2912 1 60.213000 0.992500 794 TRUE 60.320500 708 0.885000 326 9.516800 0.000000 ...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 f679b2cc-4b13-413a-bef9-0ca6b6 397b0214f8b495477015f88b3c3d7b 2994 1 60.464000 1.078250 862 TRUE 60.674000 694 0.868250 336 14.441545 0.000000 ...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 84f25d0b-23d8-4fbd-ac58-a57d72 397b0214f8b495477015f88b3c3d7b 2881 1 60.544250 1.349750 1079 TRUE 60.980500 730 0.913500 351 10.576068 0.000000 ...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 2563d8ac-4eaa-4db1-b730-9fb83c 397b0214f8b495477015f88b3c3d7b 2580 1 60.139750 1.065750 852 TRUE 60.424750 624 0.780750 326 10.828456 0.000000 ...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 411f5895-f6ff-46e3-b92d-f99a3d 397b0214f8b495477015f88b3c3d7b 2742 1 60.923750 0.818250 654 TRUE 61.008750 586 0.733250 333 9.699732 0.000000 ...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 629498ca-9ae3-46af-a195-7645b3 397b0214f8b495477015f88b3c3d7b 1718 1 60.640750 1.332500 1066 TRUE 60.992000 785 0.981250 453 13.001200 0.000000 ...
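In this example the output name ends in .gz, so the merged file is compressed. Choosing the compression from the file extension can be sketched with the standard library as follows. open_by_extension is a hypothetical helper used for illustration only, not a pyBioTools function.

```python
import gzip


def open_by_extension(path, mode="rt"):
    """Hypothetical helper: use gzip.open for *.gz paths and the built-in open otherwise."""
    opener = gzip.open if path.endswith(".gz") else open
    # Text modes ("rt", "wt") work with both openers
    return opener(path, mode)
```

The same helper can be used for writing (mode "wt") and for reading the result back (mode "rt"), so the calling code does not need to know whether a file is compressed.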
Files with a non-matching header are skipped
Seqsum.Merge ("./data/*", "./output/seqsum_merged_3.tsv.gz", verbose=True)
head ("./output/seqsum_merged_3.tsv.gz", 20)
## Running Seqsum Merge ##
Parsing reads
[DEBUG]: Reading file ./data/seqsum_new2.tsv
[DEBUG]: End of file ./data/seqsum_new2.tsv
[DEBUG]: Reading file ./data/Guppy-basecall-1D-DNA_sequencing_summary.txt.gz
ERROR: Header of file `./data/Guppy-basecall-1D-DNA_sequencing_summary.txt.gz` is not consistant
[DEBUG]: Skipping file ./data/Guppy-basecall-1D-DNA_sequencing_summary.txt.gz
[DEBUG]: Reading file ./data/Guppy-2.1.3_basecall-1D-DNA_sequencing_summary.txt.gz
ERROR: Header of file `./data/Guppy-2.1.3_basecall-1D-DNA_sequencing_summary.txt.gz` is not consistant
[DEBUG]: Skipping file ./data/Guppy-2.1.3_basecall-1D-DNA_sequencing_summary.txt.gz
[DEBUG]: Reading file ./data/seqsum_new1.tsv
[DEBUG]: End of file ./data/seqsum_new1.tsv
[DEBUG]: Reading file ./data/Guppy-2.2.4-basecall-1D-DNA_sequencing_summary+barcode.txt.gz
ERROR: Header of file `./data/Guppy-2.2.4-basecall-1D-DNA_sequencing_summary+barcode.txt.gz` is not consistant
[DEBUG]: Skipping file ./data/Guppy-2.2.4-basecall-1D-DNA_sequencing_summary+barcode.txt.gz
[DEBUG]: Reading file ./data/Guppy-2.1.3_basecall-1D-RNA_sequencing_summary.txt.gz
ERROR: Header of file `./data/Guppy-2.1.3_basecall-1D-RNA_sequencing_summary.txt.gz` is not consistant
[DEBUG]: Skipping file ./data/Guppy-2.1.3_basecall-1D-RNA_sequencing_summary.txt.gz
ERROR: 9 duplicated read ids in input files
Read counts summary
Valid lines: 9
Files found: 6
Invalid files: 4
Valid files: 2
Only 10 lines in the file
filename_fastq filename_fast5 read_id run_id channel mux start_time duration num_events passes_filtering template_start num_events_template template_duration sequence_length_template mean_qscore_template strand_score_temp...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 5c068f48-bb83-4458-9aac-49f6e2 397b0214f8b495477015f88b3c3d7b 1563 1 59.920000 1.032000 825 TRUE 59.976250 780 0.975750 449 12.316098 0.000000 ...
PAF25252_fail_397b0214_0.fastq PAF25252_fail_397b0214_0.fast5 2e6bdc7f-fdd3-497f-91ea-06ad2b 397b0214f8b495477015f88b3c3d7b 1533 1 60.045000 1.475000 1180 FALSE 60.351250 935 1.168750 534 4.257434 0.000000 ...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 63600975-81d1-4613-bd24-286885 397b0214f8b495477015f88b3c3d7b 2726 1 59.956500 1.429000 1143 TRUE 60.476500 727 0.909000 372 13.115662 0.000000 ...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 81b3a753-0515-4c7d-ab5b-5e8a05 397b0214f8b495477015f88b3c3d7b 2912 1 60.213000 0.992500 794 TRUE 60.320500 708 0.885000 326 9.516800 0.000000 ...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 f679b2cc-4b13-413a-bef9-0ca6b6 397b0214f8b495477015f88b3c3d7b 2994 1 60.464000 1.078250 862 TRUE 60.674000 694 0.868250 336 14.441545 0.000000 ...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 84f25d0b-23d8-4fbd-ac58-a57d72 397b0214f8b495477015f88b3c3d7b 2881 1 60.544250 1.349750 1079 TRUE 60.980500 730 0.913500 351 10.576068 0.000000 ...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 2563d8ac-4eaa-4db1-b730-9fb83c 397b0214f8b495477015f88b3c3d7b 2580 1 60.139750 1.065750 852 TRUE 60.424750 624 0.780750 326 10.828456 0.000000 ...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 411f5895-f6ff-46e3-b92d-f99a3d 397b0214f8b495477015f88b3c3d7b 2742 1 60.923750 0.818250 654 TRUE 61.008750 586 0.733250 333 9.699732 0.000000 ...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 629498ca-9ae3-46af-a195-7645b3 397b0214f8b495477015f88b3c3d7b 1718 1 60.640750 1.332500 1066 TRUE 60.992000 785 0.981250 453 13.001200 0.000000 ...
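The header check seen in this log can be sketched as below. The sketch assumes that the first file read sets the reference header and that any later file with a different header line is skipped, which is what the log suggests but is not confirmed here. read_header and split_by_header are hypothetical names.

```python
import gzip


def read_header(path):
    """Return the column names from the first line of a plain or gzipped TSV file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return handle.readline().rstrip("\n").split("\t")


def split_by_header(paths):
    """Hypothetical sketch: separate files that match the first file's header from those that do not."""
    reference = None
    valid, invalid = [], []
    for path in paths:
        header = read_header(path)
        if reference is None:
            # Assumption: the first file read defines the expected columns
            reference = header
        if header == reference:
            valid.append(path)
        else:
            invalid.append(path)
    return valid, invalid
```

Running such a check before a merge gives the list of files that would be skipped, which corresponds to the "Invalid files" count in the summary.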