
Seqsum API Usage

Import package

from pyBioTools import Seqsum
from pyBioTools.common import jhelp, head

Merge

jhelp(Seqsum.Merge)

Merge (input_fn, output_fn, old_filename_synthax, verbose, quiet, progress, kwargs)


  • input_fn (required) [list(str)]

Path to a sequencing summary file, a directory containing sequencing summary files, a wildcard pattern (regex), or a list of any of these. Files can also be gzipped.

  • output_fn (required) [str]

Destination sequencing summary file. The output is gzipped automatically if the file name ends with .gz.

  • old_filename_synthax (default: False) [bool]

Replace the filename_fast5 field with filename, as in older versions. Useful for compatibility with nanopolish index.

  • verbose (default: False) [bool]

  • quiet (default: False) [bool]

  • progress (default: False) [bool]

  • kwargs
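The input_fn description above allows a single path, a directory, a wildcard pattern, or a list mixing these. As a rough illustration of what such flexible input handling can look like, here is a minimal standard-library sketch. It is not the pyBioTools implementation: expand_inputs is a hypothetical helper, and the rule that a directory contributes every file directly inside it is an assumption made for this example.

```python
import glob
import os


def expand_inputs(input_fn):
    """Hypothetical helper: expand paths, directories and wildcard patterns
    into a sorted list of existing files (not the pyBioTools implementation)."""
    # Accept a single string as well as a list of strings
    sources = [input_fn] if isinstance(input_fn, str) else list(input_fn)
    found = set()
    for src in sources:
        if os.path.isdir(src):
            # Assumption: a directory contributes every file directly inside it
            candidates = glob.glob(os.path.join(src, "*"))
        else:
            # A plain path matches itself; a wildcard pattern may match several files
            candidates = glob.glob(src)
        # Keep regular files only and drop duplicates through the set
        found.update(p for p in candidates if os.path.isfile(p))
    return sorted(found)
```

Calling expand_inputs("./data/seqsum*") would return the matching files in sorted order, and patterns that match nothing simply yield an empty list.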

Basic usage

Seqsum.Merge(["./data/seqsum_new1.tsv", "./data/seqsum_new2.tsv"], "./output/seqsum_merged_1.tsv", verbose=True)
head("./output/seqsum_merged_1.tsv", 20)
## Running Seqsum Merge ##
    Parsing reads
    [DEBUG]: Reading file ./data/seqsum_new1.tsv
    [DEBUG]: End of file ./data/seqsum_new1.tsv
    [DEBUG]: Reading file ./data/seqsum_new2.tsv
    [DEBUG]: End of file ./data/seqsum_new2.tsv
ERROR: 9 duplicated read ids in input files
    Read counts summary
     Valid lines: 9
     Files found: 2
     Valid files: 2

Only 10 lines in the file
filename_fastq                 filename_fast5                 read_id                        run_id                         channel mux start_time duration num_events passes_filtering template_start num_events_template template_duration sequence_length_template mean_qscore_template strand_score_temp...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 5c068f48-bb83-4458-9aac-49f6e2 397b0214f8b495477015f88b3c3d7b 1563    1   59.920000  1.032000 825        TRUE             59.976250      780                 0.975750          449                      12.316098            0.000000         ...
PAF25252_fail_397b0214_0.fastq PAF25252_fail_397b0214_0.fast5 2e6bdc7f-fdd3-497f-91ea-06ad2b 397b0214f8b495477015f88b3c3d7b 1533    1   60.045000  1.475000 1180       FALSE            60.351250      935                 1.168750          534                      4.257434             0.000000         ...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 63600975-81d1-4613-bd24-286885 397b0214f8b495477015f88b3c3d7b 2726    1   59.956500  1.429000 1143       TRUE             60.476500      727                 0.909000          372                      13.115662            0.000000         ...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 81b3a753-0515-4c7d-ab5b-5e8a05 397b0214f8b495477015f88b3c3d7b 2912    1   60.213000  0.992500 794        TRUE             60.320500      708                 0.885000          326                      9.516800             0.000000         ...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 f679b2cc-4b13-413a-bef9-0ca6b6 397b0214f8b495477015f88b3c3d7b 2994    1   60.464000  1.078250 862        TRUE             60.674000      694                 0.868250          336                      14.441545            0.000000         ...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 84f25d0b-23d8-4fbd-ac58-a57d72 397b0214f8b495477015f88b3c3d7b 2881    1   60.544250  1.349750 1079       TRUE             60.980500      730                 0.913500          351                      10.576068            0.000000         ...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 2563d8ac-4eaa-4db1-b730-9fb83c 397b0214f8b495477015f88b3c3d7b 2580    1   60.139750  1.065750 852        TRUE             60.424750      624                 0.780750          326                      10.828456            0.000000         ...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 411f5895-f6ff-46e3-b92d-f99a3d 397b0214f8b495477015f88b3c3d7b 2742    1   60.923750  0.818250 654        TRUE             61.008750      586                 0.733250          333                      9.699732             0.000000         ...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 629498ca-9ae3-46af-a195-7645b3 397b0214f8b495477015f88b3c3d7b 1718    1   60.640750  1.332500 1066       TRUE             60.992000      785                 0.981250          453                      13.001200            0.000000         ...
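
The log above reports 9 duplicated read ids while only 9 valid lines are written, which is consistent with both input files containing the same reads. To inspect this kind of overlap before merging, a minimal standard-library sketch could look like the following. It is independent of pyBioTools: count_duplicated_read_ids is a hypothetical helper, and it assumes uncompressed tab-separated input with a read_id column.

```python
import csv
from collections import Counter


def count_duplicated_read_ids(paths):
    """Hypothetical helper: return the number of read_id values that occur
    more than once across the given tab-separated files."""
    counts = Counter()
    for path in paths:
        with open(path, newline="") as handle:
            # Sequencing summary files are tab-separated with a header line
            for row in csv.DictReader(handle, delimiter="\t"):
                counts[row["read_id"]] += 1
    # A read id is considered duplicated when it was seen at least twice
    return sum(1 for n in counts.values() if n > 1)
```

For instance, count_duplicated_read_ids(["./data/seqsum_new1.tsv", "./data/seqsum_new2.tsv"]) gives the number of read ids shared by or repeated within the two files, under the counting rule assumed here.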


Using a wildcard pattern (regex) instead of an explicit file list

Seqsum.Merge(["./data/seqsum*"], "./output/seqsum_merged_2.tsv.gz", verbose=True)
head("./output/seqsum_merged_2.tsv.gz", 20)
## Running Seqsum Merge ##
    Parsing reads
    [DEBUG]: Reading file ./data/seqsum_new2.tsv
    [DEBUG]: End of file ./data/seqsum_new2.tsv
    [DEBUG]: Reading file ./data/seqsum_new1.tsv
    [DEBUG]: End of file ./data/seqsum_new1.tsv
ERROR: 9 duplicated read ids in input files
    Read counts summary
     Valid lines: 9
     Files found: 2
     Valid files: 2

Only 10 lines in the file
filename_fastq                 filename_fast5                 read_id                        run_id                         channel mux start_time duration num_events passes_filtering template_start num_events_template template_duration sequence_length_template mean_qscore_template strand_score_temp...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 5c068f48-bb83-4458-9aac-49f6e2 397b0214f8b495477015f88b3c3d7b 1563    1   59.920000  1.032000 825        TRUE             59.976250      780                 0.975750          449                      12.316098            0.000000         ...
PAF25252_fail_397b0214_0.fastq PAF25252_fail_397b0214_0.fast5 2e6bdc7f-fdd3-497f-91ea-06ad2b 397b0214f8b495477015f88b3c3d7b 1533    1   60.045000  1.475000 1180       FALSE            60.351250      935                 1.168750          534                      4.257434             0.000000         ...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 63600975-81d1-4613-bd24-286885 397b0214f8b495477015f88b3c3d7b 2726    1   59.956500  1.429000 1143       TRUE             60.476500      727                 0.909000          372                      13.115662            0.000000         ...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 81b3a753-0515-4c7d-ab5b-5e8a05 397b0214f8b495477015f88b3c3d7b 2912    1   60.213000  0.992500 794        TRUE             60.320500      708                 0.885000          326                      9.516800             0.000000         ...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 f679b2cc-4b13-413a-bef9-0ca6b6 397b0214f8b495477015f88b3c3d7b 2994    1   60.464000  1.078250 862        TRUE             60.674000      694                 0.868250          336                      14.441545            0.000000         ...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 84f25d0b-23d8-4fbd-ac58-a57d72 397b0214f8b495477015f88b3c3d7b 2881    1   60.544250  1.349750 1079       TRUE             60.980500      730                 0.913500          351                      10.576068            0.000000         ...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 2563d8ac-4eaa-4db1-b730-9fb83c 397b0214f8b495477015f88b3c3d7b 2580    1   60.139750  1.065750 852        TRUE             60.424750      624                 0.780750          326                      10.828456            0.000000         ...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 411f5895-f6ff-46e3-b92d-f99a3d 397b0214f8b495477015f88b3c3d7b 2742    1   60.923750  0.818250 654        TRUE             61.008750      586                 0.733250          333                      9.699732             0.000000         ...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 629498ca-9ae3-46af-a195-7645b3 397b0214f8b495477015f88b3c3d7b 1718    1   60.640750  1.332500 1066       TRUE             60.992000      785                 0.981250          453                      13.001200            0.000000         ...
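
Because the output name in this example ends with .gz, the merged file is gzipped. To read such a file outside of pyBioTools, one option is to pick the opener from the file extension. The sketch below uses only the standard library; open_text and read_rows are hypothetical helpers introduced for illustration, not part of pyBioTools.

```python
import csv
import gzip


def open_text(path, mode="r"):
    """Hypothetical helper: open a file as text, using gzip when the name ends with .gz."""
    if path.endswith(".gz"):
        # gzip.open needs an explicit "t" to work in text mode
        return gzip.open(path, mode + "t", newline="")
    return open(path, mode, newline="")


def read_rows(path, limit=5):
    """Return up to `limit` data rows of a tab-separated file as dictionaries."""
    rows = []
    with open_text(path) as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            if len(rows) >= limit:
                break
            rows.append(row)
    return rows
```

With these helpers, read_rows("./output/seqsum_merged_2.tsv.gz") would return the first rows keyed by the column names of the header line.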


Files with a non-matching header are skipped

Seqsum.Merge("./data/*", "./output/seqsum_merged_3.tsv.gz", verbose=True)
head("./output/seqsum_merged_3.tsv.gz", 20)
## Running Seqsum Merge ##
    Parsing reads
    [DEBUG]: Reading file ./data/seqsum_new2.tsv
    [DEBUG]: End of file ./data/seqsum_new2.tsv
    [DEBUG]: Reading file ./data/Guppy-basecall-1D-DNA_sequencing_summary.txt.gz
ERROR: Header of file `./data/Guppy-basecall-1D-DNA_sequencing_summary.txt.gz` is not consistant
    [DEBUG]: Skipping file ./data/Guppy-basecall-1D-DNA_sequencing_summary.txt.gz
    [DEBUG]: Reading file ./data/Guppy-2.1.3_basecall-1D-DNA_sequencing_summary.txt.gz
ERROR: Header of file `./data/Guppy-2.1.3_basecall-1D-DNA_sequencing_summary.txt.gz` is not consistant
    [DEBUG]: Skipping file ./data/Guppy-2.1.3_basecall-1D-DNA_sequencing_summary.txt.gz
    [DEBUG]: Reading file ./data/seqsum_new1.tsv
    [DEBUG]: End of file ./data/seqsum_new1.tsv
    [DEBUG]: Reading file ./data/Guppy-2.2.4-basecall-1D-DNA_sequencing_summary+barcode.txt.gz
ERROR: Header of file `./data/Guppy-2.2.4-basecall-1D-DNA_sequencing_summary+barcode.txt.gz` is not consistant
    [DEBUG]: Skipping file ./data/Guppy-2.2.4-basecall-1D-DNA_sequencing_summary+barcode.txt.gz
    [DEBUG]: Reading file ./data/Guppy-2.1.3_basecall-1D-RNA_sequencing_summary.txt.gz
ERROR: Header of file `./data/Guppy-2.1.3_basecall-1D-RNA_sequencing_summary.txt.gz` is not consistant
    [DEBUG]: Skipping file ./data/Guppy-2.1.3_basecall-1D-RNA_sequencing_summary.txt.gz
ERROR: 9 duplicated read ids in input files
    Read counts summary
     Valid lines: 9
     Files found: 6
     Invalid files: 4
     Valid files: 2

Only 10 lines in the file
filename_fastq                 filename_fast5                 read_id                        run_id                         channel mux start_time duration num_events passes_filtering template_start num_events_template template_duration sequence_length_template mean_qscore_template strand_score_temp...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 5c068f48-bb83-4458-9aac-49f6e2 397b0214f8b495477015f88b3c3d7b 1563    1   59.920000  1.032000 825        TRUE             59.976250      780                 0.975750          449                      12.316098            0.000000         ...
PAF25252_fail_397b0214_0.fastq PAF25252_fail_397b0214_0.fast5 2e6bdc7f-fdd3-497f-91ea-06ad2b 397b0214f8b495477015f88b3c3d7b 1533    1   60.045000  1.475000 1180       FALSE            60.351250      935                 1.168750          534                      4.257434             0.000000         ...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 63600975-81d1-4613-bd24-286885 397b0214f8b495477015f88b3c3d7b 2726    1   59.956500  1.429000 1143       TRUE             60.476500      727                 0.909000          372                      13.115662            0.000000         ...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 81b3a753-0515-4c7d-ab5b-5e8a05 397b0214f8b495477015f88b3c3d7b 2912    1   60.213000  0.992500 794        TRUE             60.320500      708                 0.885000          326                      9.516800             0.000000         ...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 f679b2cc-4b13-413a-bef9-0ca6b6 397b0214f8b495477015f88b3c3d7b 2994    1   60.464000  1.078250 862        TRUE             60.674000      694                 0.868250          336                      14.441545            0.000000         ...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 84f25d0b-23d8-4fbd-ac58-a57d72 397b0214f8b495477015f88b3c3d7b 2881    1   60.544250  1.349750 1079       TRUE             60.980500      730                 0.913500          351                      10.576068            0.000000         ...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 2563d8ac-4eaa-4db1-b730-9fb83c 397b0214f8b495477015f88b3c3d7b 2580    1   60.139750  1.065750 852        TRUE             60.424750      624                 0.780750          326                      10.828456            0.000000         ...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 411f5895-f6ff-46e3-b92d-f99a3d 397b0214f8b495477015f88b3c3d7b 2742    1   60.923750  0.818250 654        TRUE             61.008750      586                 0.733250          333                      9.699732             0.000000         ...
PAF25252_pass_397b0214_0.fastq PAF25252_pass_397b0214_0.fast5 629498ca-9ae3-46af-a195-7645b3 397b0214f8b495477015f88b3c3d7b 1718    1   60.640750  1.332500 1066       TRUE             60.992000      785                 0.981250          453                      13.001200            0.000000         ...
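
In the log above, four of the six files found are skipped because their header is not consistent with the others. To predict which files would be affected before running a merge, the headers can be compared directly. The sketch below is a standard-library approximation, not the pyBioTools logic: read_header and split_by_header are hypothetical helpers, and taking the first file's header as the reference is an assumption.

```python
import gzip


def read_header(path):
    """Return the first line of a (possibly gzipped) tab-separated file as a list of field names."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return handle.readline().rstrip("\r\n").split("\t")


def split_by_header(paths):
    """Hypothetical helper: separate files whose header matches the first file's
    header from those that do not."""
    valid, skipped = [], []
    reference = None
    for path in paths:
        header = read_header(path)
        if reference is None:
            # Assumption: the first file read defines the reference header
            reference = header
        if header == reference:
            valid.append(path)
        else:
            skipped.append(path)
    return valid, skipped
```

The order of the input list matters under this assumption, since a different first file would define a different reference header.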