NanoCount command line usage
Activate virtual environment
conda activate nanocount
Running NanoCount
NanoCount --help
usage: NanoCount [-h] [--version] -i ALIGNMENT_FILE [-o COUNT_FILE]
[-b FILTER_BAM_OUT] [-l MIN_ALIGNMENT_LENGTH]
[-f MIN_QUERY_FRACTION_ALIGNED] [-s SEC_SCORING_VALUE]
[-t SEC_SCORING_THRESHOLD] [-c CONVERGENCE_TARGET]
[-e MAX_EM_ROUNDS] [-x] [-p PRIMARY_SCORE] [-a]
[-d MAX_DIST_3_PRIME] [-u MAX_DIST_5_PRIME] [-v] [-q]
NanoCount estimates transcripts abundance from Oxford Nanopore *direct-RNA
sequencing* datasets, using an expectation-maximization approach like RSEM,
Kallisto, salmon, etc to handle the uncertainty of multi-mapping reads
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
Input/Output options:
-i ALIGNMENT_FILE, --alignment_file ALIGNMENT_FILE
Sorted and indexed BAM or SAM file containing aligned
ONT dRNA-Seq reads including secondary alignments
(required) [str]
-o COUNT_FILE, --count_file COUNT_FILE
Output file path where to write estimated counts (TSV
format) (default: None) [str]
-b FILTER_BAM_OUT, --filter_bam_out FILTER_BAM_OUT
Optional output file path where to write filtered
reads selected by NanoCount to perform quantification
estimation (BAM format) (default: None) [str]
Misc options:
-l MIN_ALIGNMENT_LENGTH, --min_alignment_length MIN_ALIGNMENT_LENGTH
Minimal length of the alignment to be considered valid
(default: 50) [int]
-f MIN_QUERY_FRACTION_ALIGNED, --min_query_fraction_aligned MIN_QUERY_FRACTION_ALIGNED
Minimal fraction of the primary alignment query
aligned to consider the read valid (default: 0.5)
[float]
-s SEC_SCORING_VALUE, --sec_scoring_value SEC_SCORING_VALUE
Value to use for score thresholding of secondary
alignments either "alignment_score" or
"alignment_length" (default: alignment_score) [str]
-t SEC_SCORING_THRESHOLD, --sec_scoring_threshold SEC_SCORING_THRESHOLD
Fraction of the alignment score or the alignment
length of secondary alignments compared to the primary
alignment to be considered valid alignments (default:
0.95) [float]
-c CONVERGENCE_TARGET, --convergence_target CONVERGENCE_TARGET
Convergence target value of the cummulative difference
between abundance values of successive EM round to
trigger the end of the EM loop. (default: 0.005)
[float]
-e MAX_EM_ROUNDS, --max_em_rounds MAX_EM_ROUNDS
Maximum number of EM rounds before triggering stop
(default: 100) [int]
-x, --extra_tx_info Add transcripts length and zero coverage transcripts
to the output file (required valid bam/sam header)
(default: False) [boolean]
-p PRIMARY_SCORE, --primary_score PRIMARY_SCORE
Method to pick the best alignment for each read. By
default ("alignment_score") uses the best alignment
score (AS optional field), but it can be changed to
use either the primary alignment defined by the
aligner ("primary") or the longest alignment
("alignment_length"). choices = [primary,
alignment_score, alignment_length] (default:
alignment_score) [str]
-a, --keep_suplementary
Retain any supplementary alignments and considered
them like secondary alignments. Discarded by default.
(default: False) [boolean]
-d MAX_DIST_3_PRIME, --max_dist_3_prime MAX_DIST_3_PRIME
Maximum distance of alignment end to 3 prime of
transcript. In ONT dRNA-Seq reads are assumed to start
from the polyA tail (-1 to deactivate) (default: 50)
[int]
-u MAX_DIST_5_PRIME, --max_dist_5_prime MAX_DIST_5_PRIME
Maximum distance of alignment start to 5 prime of
transcript. In conjunction with max_dist_3_prime it
can be used to select near full transcript reads only
(-1 to deactivate). (default: -1) [int]
Verbosity options:
-v, --verbose Increase verbosity for QC and debugging (default:
False) [boolean]
-q, --quiet Reduce verbosity (default: False) [boolean]
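When NanoCount is called from a script, the options listed in the help text can be assembled programmatically. The following is a minimal Python sketch, not part of NanoCount itself: the helper name build_nanocount_command is hypothetical, and it only relies on the long option names printed above.

```python
import shlex


def build_nanocount_command(alignment_file, count_file=None, **options):
    """Assemble a NanoCount argument list (hypothetical helper, not a NanoCount API).

    Keyword names are assumed to match the long option names from the help
    text, e.g. max_dist_3_prime=10 or extra_tx_info=True.
    """
    cmd = ["NanoCount", "--alignment_file", str(alignment_file)]
    if count_file is not None:
        cmd += ["--count_file", str(count_file)]
    for name, value in options.items():
        if value is True:
            # Boolean switches such as --extra_tx_info take no value
            cmd.append(f"--{name}")
        elif value is not False and value is not None:
            cmd += [f"--{name}", str(value)]
    return cmd


cmd = build_nanocount_command(
    "./data/aligned_reads_sorted.bam",
    "./output/tx_counts.tsv",
    max_dist_3_prime=10,
    extra_tx_info=True,
)
# Print the command as a single shell-quoted line for inspection
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) once the nanocount environment is active.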
Basic command
NanoCount -i ./data/aligned_reads_sorted.bam -o ./output/tx_counts.tsv
head ./output/tx_counts.tsv
## Checking options and input files ##
## Initialise Nanocount ##
Parse Bam file and filter low quality alignments
Summary of alignments parsed in input bam file
Valid alignments: 150,517
Discarded unmapped alignments: 9,545
Discarded alignment with invalid 3 prime end: 6,133
Discarded negative strand alignments: 4,515
Discarded supplementary alignments: 334
Summary of reads filtered
Reads with valid best alignment: 85,908
Invalid secondary alignments: 60,120
Valid secondary alignments: 2,622
Reads with low query fraction aligned: 1,628
Generate initial read/transcript compatibility index
## Start EM abundance estimate ##
Progress: 2.00 rounds [00:00, 7.41 rounds/s]
Exit EM loop after 2 rounds
Convergence value: 0.0019361726963877538
## Summarize data ##
Convert results to dataframe
Compute estimated counts and TPM
Write file
transcript_name raw est_count tpm
YHR174W_mRNA 0.5881056948454584 50522.984032783635 588105.6948454584
YGR192C_mRNA 0.02083282680839274 1789.7064854554035 20832.82680839274
YLR110C_mRNA 0.009591656190343158 824.0 9591.656190343158
YOL086C_mRNA 0.008299576290915864 713.0 8299.576290915864
YKL060C_mRNA 0.006518601294407972 560.0 6518.601294407972
YCR012W_mRNA 0.005412767146249476 464.99999999999994 5412.767146249475
YPR080W_mRNA 0.005255622293616427 451.5 5255.622293616427
YBR118W_mRNA 0.005255622293616427 451.5 5255.622293616427
YKL152C_mRNA 0.005226521394980677 449.0 5226.5213949806775
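The numbers in this example suggest how the three numeric columns relate to each other: est_count matches raw multiplied by the number of reads with a valid best alignment, and tpm matches raw multiplied by one million. The short Python check below uses rows copied from the output above. It is an observation on this particular run, not a statement about NanoCount internals, and check_row is a hypothetical helper.

```python
# Rows copied from the head output above: (transcript_name, raw, est_count, tpm)
rows = [
    ("YLR110C_mRNA", 0.009591656190343158, 824.0, 9591.656190343158),
    ("YOL086C_mRNA", 0.008299576290915864, 713.0, 8299.576290915864),
    ("YKL060C_mRNA", 0.006518601294407972, 560.0, 6518.601294407972),
]

# "Reads with valid best alignment" reported in the log above
valid_reads = 85_908


def check_row(raw, est_count, tpm, n_reads, tol=1e-6):
    """Return (est_ok, tpm_ok) for the two assumed column relationships."""
    est_ok = abs(raw * n_reads - est_count) < tol
    tpm_ok = abs(raw * 1_000_000 - tpm) < tol
    return est_ok, tpm_ok


for name, raw, est_count, tpm in rows:
    print(name, check_row(raw, est_count, tpm, valid_reads))
```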
Changing the default transcript-end distance filters
NanoCount -i ./data/aligned_reads_sorted.bam -o ./output/tx_counts.tsv --max_dist_3_prime 10 --max_dist_5_prime 10
head ./output/tx_counts.tsv
## Checking options and input files ##
## Initialise Nanocount ##
Parse Bam file and filter low quality alignments
Summary of alignments parsed in input bam file
Valid alignments: 73,329
Discarded alignment with invalid 5 prime end: 44,897
Discarded alignment with invalid 3 prime end: 38,424
Discarded unmapped alignments: 9,545
Discarded negative strand alignments: 4,515
Discarded supplementary alignments: 334
Summary of reads filtered
Reads with valid best alignment: 46,241
Invalid secondary alignments: 25,688
Reads with low query fraction aligned: 687
Valid secondary alignments: 606
Generate initial read/transcript compatibility index
## Start EM abundance estimate ##
Progress: 2.00 rounds [00:00, 13.8 rounds/s]
Exit EM loop after 2 rounds
Convergence value: 0.000702479043822885
## Summarize data ##
Convert results to dataframe
Compute estimated counts and TPM
Write file
transcript_name raw est_count tpm
YHR174W_mRNA 0.6314525433905865 29198.997058924113 631452.5433905865
YGR192C_mRNA 0.02019852511840142 934.0 20198.52511840142
YLR110C_mRNA 0.011461689842347701 530.0 11461.689842347701
YOL086C_mRNA 0.008217815358664388 379.99999999999994 8217.815358664388
YKL152C_mRNA 0.005428083302696741 251.0 5428.083302696741
YKL060C_mRNA 0.005384831642914297 249.0 5384.831642914297
YDL081C_mRNA 0.005125321684219632 237.0 5125.321684219632
YOR369C_mRNA 0.004433295127700526 205.0 4433.295127700526
YDL130W_mRNA 0.004152159339114638 191.99999999999997 4152.159339114638
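The effect of max_dist_3_prime and max_dist_5_prime can be pictured with a small Python sketch. It is an illustration under stated assumptions rather than NanoCount's actual code: the function name passes_end_filters, the 0-based coordinates with an exclusive end, and the use of a strict "greater than" comparison are all assumptions. Only the default values and the meaning of -1 come from the help text.

```python
def passes_end_filters(align_start, align_end, tx_length,
                       max_dist_3_prime=50, max_dist_5_prime=-1):
    """Return True if both alignment ends are close enough to the transcript ends.

    Positions are 0-based on the transcript and align_end is exclusive
    (assumed convention). A limit of -1 deactivates the corresponding filter.
    """
    dist_5_prime = align_start
    dist_3_prime = tx_length - align_end
    if max_dist_3_prime >= 0 and dist_3_prime > max_dist_3_prime:
        return False
    if max_dist_5_prime >= 0 and dist_5_prime > max_dist_5_prime:
        return False
    return True


# Hypothetical alignment covering positions 200..990 of a 1000 nt transcript
print(passes_end_filters(200, 990, 1000))                        # defaults
print(passes_end_filters(200, 990, 1000, max_dist_5_prime=10))   # stricter 5' filter
```

With the defaults only the 3 prime end is checked, so a read that starts far from the 5 prime end is still kept. Setting both limits to 10, as in the command above, keeps only near full-length reads, which is consistent with the lower number of valid alignments in the log.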
Adding extra transcripts information
The extra_tx_info option adds a column with the transcript lengths and also includes all zero-coverage transcripts in the results. It requires a valid BAM/SAM header.
NanoCount -i ./data/aligned_reads_sorted.bam -o ./output/tx_counts.tsv --extra_tx_info
head ./output/tx_counts.tsv
## Checking options and input files ##
## Initialise Nanocount ##
Parse Bam file and filter low quality alignments
Summary of alignments parsed in input bam file
Valid alignments: 150,517
Discarded unmapped alignments: 9,545
Discarded alignment with invalid 3 prime end: 6,133
Discarded negative strand alignments: 4,515
Discarded supplementary alignments: 334
Summary of reads filtered
Reads with valid best alignment: 85,908
Invalid secondary alignments: 60,120
Valid secondary alignments: 2,622
Reads with low query fraction aligned: 1,628
Generate initial read/transcript compatibility index
## Start EM abundance estimate ##
Progress: 2.00 rounds [00:00, 8.77 rounds/s]
Exit EM loop after 2 rounds
Convergence value: 0.0019361726963877538
## Summarize data ##
Convert results to dataframe
Compute estimated counts and TPM
Write file
transcript_name raw est_count tpm transcript_length
YHR174W_mRNA 0.5881056948454584 50522.984032783635 588105.6948454584 1314
YGR192C_mRNA 0.02083282680839274 1789.7064854554035 20832.82680839274 999
YLR110C_mRNA 0.009591656190343158 824.0 9591.656190343158 402
YOL086C_mRNA 0.008299576290915864 713.0 8299.576290915864 1047
YKL060C_mRNA 0.006518601294407972 560.0 6518.601294407972 1080
YCR012W_mRNA 0.005412767146249476 464.99999999999994 5412.767146249475 1251
YBR118W_mRNA 0.005255622293616427 451.5 5255.622293616427 1377
YPR080W_mRNA 0.005255622293616427 451.5 5255.622293616427 1377
YKL152C_mRNA 0.005226521394980677 449.0 5226.5213949806775 744
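Because the output now also lists zero-coverage transcripts, downstream scripts may want to separate them from transcripts with assigned reads. The Python sketch below reads a tab-separated table with the column names shown above. The first two rows are copied from the output; the row named HYPOTHETICAL_TX is invented purely to show a zero-coverage entry, and split_by_coverage is a hypothetical helper.

```python
import csv
import io

# Two rows from the output above plus one made-up zero-coverage row
sample_tsv = (
    "transcript_name\traw\test_count\ttpm\ttranscript_length\n"
    "YHR174W_mRNA\t0.5881056948454584\t50522.984032783635\t588105.6948454584\t1314\n"
    "YGR192C_mRNA\t0.02083282680839274\t1789.7064854554035\t20832.82680839274\t999\n"
    "HYPOTHETICAL_TX\t0.0\t0.0\t0.0\t500\n"
)


def split_by_coverage(handle):
    """Split transcript names into (covered, uncovered) based on est_count."""
    covered, uncovered = [], []
    for row in csv.DictReader(handle, delimiter="\t"):
        target = covered if float(row["est_count"]) > 0 else uncovered
        target.append(row["transcript_name"])
    return covered, uncovered


covered, uncovered = split_by_coverage(io.StringIO(sample_tsv))
print("covered:", covered)
print("zero coverage:", uncovered)
```

For a real run, the io.StringIO object would be replaced by an open file handle on ./output/tx_counts.tsv.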
Writing selected alignments to a BAM file
NanoCount -i ./data/aligned_reads_sorted.bam -o ./output/tx_counts.tsv -b ./output/aligned_reads_selected.bam --extra_tx_info
head ./output/tx_counts.tsv
## Checking options and input files ##
## Initialise Nanocount ##
Parse Bam file and filter low quality alignments
Summary of alignments parsed in input bam file
Valid alignments: 150,517
Discarded unmapped alignments: 9,545
Discarded alignment with invalid 3 prime end: 6,133
Discarded negative strand alignments: 4,515
Discarded supplementary alignments: 334
Summary of reads filtered
Reads with valid best alignment: 85,908
Invalid secondary alignments: 60,120
Valid secondary alignments: 2,622
Reads with low query fraction aligned: 1,628
Write selected alignments to BAM file
Summary of alignments written to bam
Alignments to select: 88,530
Alignments written: 88,530
Alignments skipped: 82,514
Generate initial read/transcript compatibility index
## Start EM abundance estimate ##
Progress: 2.00 rounds [00:00, 7.98 rounds/s]
Exit EM loop after 2 rounds
Convergence value: 0.0019361726963877538
## Summarize data ##
Convert results to dataframe
Compute estimated counts and TPM
Write file
transcript_name raw est_count tpm transcript_length
YHR174W_mRNA 0.5881056948454584 50522.984032783635 588105.6948454584 1314
YGR192C_mRNA 0.02083282680839274 1789.7064854554035 20832.82680839274 999
YLR110C_mRNA 0.009591656190343158 824.0 9591.656190343158 402
YOL086C_mRNA 0.008299576290915864 713.0 8299.576290915864 1047
YKL060C_mRNA 0.006518601294407972 560.0 6518.601294407972 1080
YCR012W_mRNA 0.005412767146249476 464.99999999999994 5412.767146249475 1251
YBR118W_mRNA 0.005255622293616427 451.5 5255.622293616427 1377
YPR080W_mRNA 0.005255622293616427 451.5 5255.622293616427 1377
YKL152C_mRNA 0.005226521394980677 449.0 5226.5213949806775 744
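The counts in the BAM-writing summary can be cross-checked against the earlier parts of the log. In this run, the number of alignments to select equals the reads with a valid best alignment plus the valid secondary alignments, and written plus skipped alignments equals the total of all categories listed in the parsing summary. The Python arithmetic below only restates figures from the log above; whether these identities hold for every dataset is an assumption.

```python
# Figures copied from "Summary of alignments parsed in input bam file"
parsed = {
    "valid": 150_517,
    "unmapped": 9_545,
    "invalid_3_prime": 6_133,
    "negative_strand": 4_515,
    "supplementary": 334,
}

# Figures copied from "Summary of reads filtered"
valid_best = 85_908
valid_secondary = 2_622

# Figures copied from "Summary of alignments written to bam"
to_select, written, skipped = 88_530, 88_530, 82_514

total_parsed = sum(parsed.values())
print("selected matches best + secondary:", valid_best + valid_secondary == to_select)
print("written + skipped matches total parsed:", written + skipped == total_parsed)
```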
Relaxing the secondary alignment scoring threshold
The default value is 0.95 (95% of the alignment score of the primary alignment), but it can be lowered to include more secondary alignments in the uncertainty calculation. Lowering the value below 0.75 might not be relevant and will considerably increase the computation time.
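The thresholding described above can be sketched in a few lines of Python. This is an illustration, not NanoCount's implementation: select_secondary is a hypothetical function, the scores are made-up values, and the choice of "greater than or equal" at the cutoff is an assumption.

```python
def select_secondary(primary_score, secondary_scores, sec_scoring_threshold=0.95):
    """Keep secondary alignments scoring at least threshold * primary score."""
    cutoff = primary_score * sec_scoring_threshold
    return [score for score in secondary_scores if score >= cutoff]


# Made-up alignment scores for one read
primary = 1000
secondaries = [990, 960, 900, 820, 700]

print(select_secondary(primary, secondaries))        # default threshold of 0.95
print(select_secondary(primary, secondaries, 0.8))   # relaxed threshold
```

A lower threshold lets more transcripts compete for each read, which is why the EM step needs more rounds to converge in the run below.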
NanoCount -i ./data/aligned_reads_sorted.bam -o ./output/tx_counts.tsv --sec_scoring_threshold 0.8
head ./output/tx_counts.tsv
## Checking options and input files ##
## Initialise Nanocount ##
Parse Bam file and filter low quality alignments
Summary of alignments parsed in input bam file
Valid alignments: 150,517
Discarded unmapped alignments: 9,545
Discarded alignment with invalid 3 prime end: 6,133
Discarded negative strand alignments: 4,515
Discarded supplementary alignments: 334
Summary of reads filtered
Reads with valid best alignment: 85,908
Valid secondary alignments: 49,092
Invalid secondary alignments: 13,650
Reads with low query fraction aligned: 1,628
Generate initial read/transcript compatibility index
## Start EM abundance estimate ##
Progress: 17.0 rounds [00:02, 7.01 rounds/s]
Exit EM loop after 17 rounds
Convergence value: 0.004795139982321842
## Summarize data ##
Convert results to dataframe
Compute estimated counts and TPM
Write file
transcript_name raw est_count tpm
YHR174W_mRNA 0.5770419415271139 49572.5191127113 577041.9415271139
YGR192C_mRNA 0.014985653368924351 1287.3875096175532 14985.653368924352
YGR254W_mRNA 0.012367659441483866 1062.480887298996 12367.659441483866
YLR110C_mRNA 0.009591656190343162 824.0000000000003 9591.656190343161
YJR009C_mRNA 0.00941808679575318 809.0890004495642 9418.08679575318
YOL086C_mRNA 0.008299576290915867 713.0000000000003 8299.576290915868
YKL060C_mRNA 0.006518601294407974 560.0000000000002 6518.601294407974
YCR012W_mRNA 0.005412767146249479 465.0000000000003 5412.767146249479
YPR080W_mRNA 0.0052556222936164295 451.5000000000002 5255.6222936164295
Verbose mode
Prints additional information for QC and debugging.
NanoCount -i ./data/aligned_reads_sorted.bam -o ./output/tx_counts.tsv --sec_scoring_threshold 0.8 --verbose
## Checking options and input files ##
[DEBUG]: Options summary
[DEBUG]: Package name: NanoCount
[DEBUG]: Package version: 0.3.0.dev2
[DEBUG]: Timestamp: 2021-09-08 22:54:12.755159
[DEBUG]: alignment_file: ./data/aligned_reads_sorted.bam
[DEBUG]: count_file: ./output/tx_counts.tsv
[DEBUG]: filter_bam_out:
[DEBUG]: min_alignment_length: 50
[DEBUG]: keep_suplementary: False
[DEBUG]: min_query_fraction_aligned: 0.5
[DEBUG]: sec_scoring_threshold: 0.8
[DEBUG]: sec_scoring_value: alignment_score
[DEBUG]: convergence_target: 0.005
[DEBUG]: max_em_rounds: 100
[DEBUG]: extra_tx_info: False
[DEBUG]: primary_score: alignment_score
[DEBUG]: max_dist_3_prime: 50
[DEBUG]: max_dist_5_prime: -1
[DEBUG]: verbose: True
[DEBUG]: quiet: False
## Initialise Nanocount ##
Parse Bam file and filter low quality alignments
Summary of alignments parsed in input bam file
Valid alignments: 150,517
Discarded unmapped alignments: 9,545
Discarded alignment with invalid 3 prime end: 6,133
Discarded negative strand alignments: 4,515
Discarded supplementary alignments: 334
Summary of reads filtered
Reads with valid best alignment: 85,908
Valid secondary alignments: 49,092
Invalid secondary alignments: 13,650
Reads with low query fraction aligned: 1,628
Generate initial read/transcript compatibility index
## Start EM abundance estimate ##
[DEBUG]: EM Round: 1 / Convergence value: 1
[DEBUG]: EM Round: 2 / Convergence value: 0.08982516174030376
[DEBUG]: EM Round: 3 / Convergence value: 0.07275793447585568
[DEBUG]: EM Round: 4 / Convergence value: 0.05953041461618004
[DEBUG]: EM Round: 5 / Convergence value: 0.04879243854714777
[DEBUG]: EM Round: 6 / Convergence value: 0.040022962888262556
[DEBUG]: EM Round: 7 / Convergence value: 0.03285040500110691
[DEBUG]: EM Round: 8 / Convergence value: 0.026980252318091508
[DEBUG]: EM Round: 9 / Convergence value: 0.022174110853707095
[DEBUG]: EM Round: 10 / Convergence value: 0.01823785737980107
[DEBUG]: EM Round: 11 / Convergence value: 0.015013106051349104
[DEBUG]: EM Round: 12 / Convergence value: 0.012370502416389305
[DEBUG]: EM Round: 13 / Convergence value: 0.010204386062917101
[DEBUG]: EM Round: 14 / Convergence value: 0.008428311617153536
[DEBUG]: EM Round: 15 / Convergence value: 0.0069715401043749445
[DEBUG]: EM Round: 16 / Convergence value: 0.005776253476233076
[DEBUG]: EM Round: 17 / Convergence value: 0.004795139982321842
Exit EM loop after 17 rounds
Convergence value: 0.004795139982321842
## Summarize data ##
Convert results to dataframe
Compute estimated counts and TPM
Write file
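The per-round convergence values in the debug log follow the stopping rule described in the help text: the EM loop ends once the cumulative difference between the abundance values of successive rounds drops below convergence_target, or after max_em_rounds. The Python sketch below shows a generic, simplified EM of this kind. It is not NanoCount's code: em_abundance and the toy read-to-transcript table are hypothetical, and all compatible alignments are weighted equally.

```python
def em_abundance(read_to_tx, convergence_target=0.005, max_em_rounds=100):
    """Estimate relative transcript abundances from multi-mapping reads.

    read_to_tx maps each read id to the list of transcripts it is
    compatible with. Returns (abundance, rounds, convergence).
    """
    transcripts = sorted({tx for txs in read_to_tx.values() for tx in txs})
    n_reads = len(read_to_tx)

    # Initial estimate: each read is split evenly across its transcripts
    abundance = dict.fromkeys(transcripts, 0.0)
    for txs in read_to_tx.values():
        for tx in txs:
            abundance[tx] += 1.0 / len(txs) / n_reads

    rounds, convergence = 0, 1.0
    while convergence > convergence_target and rounds < max_em_rounds:
        rounds += 1
        updated = dict.fromkeys(transcripts, 0.0)
        for txs in read_to_tx.values():
            # E-step: share the read according to the current abundances
            total = sum(abundance[tx] for tx in txs)
            for tx in txs:
                # M-step contribution: accumulate the fractional assignment
                updated[tx] += abundance[tx] / total / n_reads
        # Cumulative absolute change between successive rounds
        convergence = sum(abs(updated[tx] - abundance[tx]) for tx in transcripts)
        abundance = updated
    return abundance, rounds, convergence


# Toy example: two reads unique to tx_A, one read shared between tx_A and tx_B
reads = {"read1": ["tx_A"], "read2": ["tx_A"], "read3": ["tx_A", "tx_B"]}
abundance, rounds, convergence = em_abundance(reads)
print(abundance, rounds, convergence)
```

In this toy case the shared read is progressively reassigned to tx_A, the transcript supported by unique reads, and the convergence value shrinks from round to round, as it does in the log above.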