NanoCount Python API
Activate virtual environment
from NanoCount.NanoCount import NanoCount
from NanoCount.common import jhelp, head
Running NanoCount
jhelp(NanoCount)
NanoCount (alignment_file, count_file, filter_bam_out, min_alignment_length, keep_suplementary, min_query_fraction_aligned, sec_scoring_threshold, sec_scoring_value, convergence_target, max_em_rounds, extra_tx_info, primary_score, max_dist_3_prime, max_dist_5_prime, verbose, quiet)
Estimate abundance of transcripts using an expectation-maximization (EM) algorithm
- alignment_file (required) [str]
Sorted and indexed BAM or SAM file containing aligned ONT dRNA-Seq reads including secondary alignments
- count_file (default: "") [str]
Output file path where to write estimated counts (TSV format)
- filter_bam_out (default: "") [str]
Optional output file path where to write filtered reads selected by NanoCount to perform quantification estimation (BAM format)
- min_alignment_length (default: 50) [int]
Minimal length of the alignment to be considered valid
- keep_suplementary (default: False) [bool]
Retain any supplementary alignments and consider them like secondary alignments. Discarded by default.
- min_query_fraction_aligned (default: 0.5) [float]
Minimal fraction of the primary alignment query aligned to consider the read valid
- sec_scoring_threshold (default: 0.95) [float]
Fraction of the alignment score or the alignment length of secondary alignments compared to the primary alignment to be considered valid alignments
- sec_scoring_value (default: alignment_score) [str]
Value to use for score thresholding of secondary alignments either "alignment_score" or "alignment_length"
- convergence_target (default: 0.005) [float]
Convergence target value of the cumulative difference between abundance values of successive EM rounds to trigger the end of the EM loop.
- max_em_rounds (default: 100) [int]
Maximum number of EM rounds before triggering stop
- extra_tx_info (default: False) [bool]
Add transcript lengths and zero-coverage transcripts to the output file (requires a valid BAM/SAM header)
- primary_score (default: alignment_score) [str]
Method to pick the best alignment for each read. By default ("alignment_score") uses the best alignment score (AS optional field), but it can be changed to use either the primary alignment defined by the aligner ("primary") or the longest alignment ("alignment_length"). choices = [primary, alignment_score, alignment_length]
- max_dist_3_prime (default: 50) [int]
Maximum distance of alignment end to 3 prime of transcript. In ONT dRNA-Seq reads are assumed to start from the polyA tail (-1 to deactivate)
- max_dist_5_prime (default: -1) [int]
Maximum distance of alignment start to 5 prime of transcript. In conjunction with max_dist_3_prime it can be used to select near full transcript reads only (-1 to deactivate).
- verbose (default: False) [bool]
Increase verbosity for QC and debugging
- quiet (default: False) [bool]
Reduce verbosity
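To make the roles of convergence_target and max_em_rounds more concrete, here is a minimal toy EM loop in plain Python. It is only a sketch under simplifying assumptions (each read is reduced to a list of compatible transcript names, with no alignment scores), and it is not NanoCount's actual implementation. The function name toy_em and its input format are hypothetical.

```python
from collections import defaultdict


def toy_em(read_compat, convergence_target=0.005, max_em_rounds=100):
    """Toy EM over reads that are each compatible with one or more transcripts.

    read_compat: list of lists of transcript names (hypothetical input format).
    Returns (abundance dict summing to 1, number of EM rounds run).
    """
    if not read_compat:
        return {}, 0

    transcripts = sorted({tx for txs in read_compat for tx in txs})
    # Start from a uniform abundance estimate
    abundance = {tx: 1.0 / len(transcripts) for tx in transcripts}

    rounds = 0
    for rounds in range(1, max_em_rounds + 1):
        # E-step: split each read across its compatible transcripts,
        # proportionally to the current abundance estimates
        assigned = defaultdict(float)
        for txs in read_compat:
            total = sum(abundance[tx] for tx in txs)
            for tx in txs:
                assigned[tx] += abundance[tx] / total

        # M-step: new abundance is the share of reads assigned to each transcript
        new_abundance = {tx: assigned[tx] / len(read_compat) for tx in transcripts}

        # Cumulative absolute change between two successive rounds
        delta = sum(abs(new_abundance[tx] - abundance[tx]) for tx in transcripts)
        abundance = new_abundance
        if delta <= convergence_target:
            break  # converged before reaching max_em_rounds

    return abundance, rounds


# Two reads unique to A, one unique to B, one shared between A and B
abundance, rounds = toy_em([["A"], ["A"], ["A", "B"], ["B"]])
print(abundance, rounds)
```

In this toy case the shared read is progressively attributed mostly to A, because A already has more uniquely assigned reads. A smaller convergence_target lets the loop run for more rounds before stopping.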
Basic command
NanoCount (alignment_file="./data/aligned_reads_sorted.bam", count_file="./output/tx_counts.tsv")
head("./output/tx_counts.tsv")
Using Best Alignment score rather than Primary reads as best reads
NanoCount (alignment_file="./data/aligned_reads_sorted.bam", count_file="./output/tx_counts.tsv", primary_score="alignment_score")
head("./output/tx_counts.tsv")
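The three documented choices for primary_score ("primary", "alignment_score", "alignment_length") can be pictured with the following sketch. It assumes a simplified, hypothetical alignment record (the Aln dataclass) rather than real BAM records, and it does not reproduce NanoCount's internal code.

```python
from dataclasses import dataclass


@dataclass
class Aln:
    """Hypothetical minimal alignment record, for illustration only."""
    transcript: str
    alignment_score: int
    alignment_length: int
    is_primary: bool


def pick_best(alignments, primary_score="alignment_score"):
    """Pick the best alignment of one read according to primary_score."""
    if primary_score == "primary":
        # Trust the alignment flagged as primary by the aligner
        return next(a for a in alignments if a.is_primary)
    if primary_score in ("alignment_score", "alignment_length"):
        # Take the alignment with the highest score or the longest alignment
        return max(alignments, key=lambda a: getattr(a, primary_score))
    raise ValueError(f"Unknown primary_score value: {primary_score!r}")


read_alignments = [
    Aln("tx1", alignment_score=900, alignment_length=1200, is_primary=True),
    Aln("tx2", alignment_score=950, alignment_length=1100, is_primary=False),
    Aln("tx3", alignment_score=800, alignment_length=1500, is_primary=False),
]
for mode in ("primary", "alignment_score", "alignment_length"):
    print(mode, pick_best(read_alignments, mode).transcript)
```

With these made-up values, each mode selects a different transcript, which shows why the choice can change the downstream counts.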
Write selected alignment to BAM file
NanoCount (
    alignment_file="./data/aligned_reads_sorted.bam",
    count_file="./output/tx_counts.tsv",
    filter_bam_out="./output/aligned_reads_selected.bam",
    primary_score="alignment_score")
head("./output/tx_counts.tsv")
Basic command without file writing and Dataframe output
In interactive mode it is also possible not to write the results out but instead to access the data directly as a pandas DataFrame
nc = NanoCount (alignment_file="./data/aligned_reads_sorted.bam")
display(nc.count_df)
Adding extra transcripts information
The extra_tx_info option adds a column with the transcript lengths and also includes all the zero-coverage transcripts in the results.
nc = NanoCount (alignment_file="./data/aligned_reads_sorted.bam", extra_tx_info=True)
display(nc.count_df)
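Conceptually, this option combines the estimated counts with the reference names and lengths declared in the BAM/SAM header. The sketch below illustrates that idea with plain dictionaries. The function name add_tx_info and the column names used here are assumptions made for illustration, not necessarily the names NanoCount writes.

```python
def add_tx_info(counts, tx_lengths):
    """Merge estimated counts with header-derived transcript lengths.

    counts: {transcript_name: estimated count} for transcripts with coverage.
    tx_lengths: {transcript_name: length}, as could be read from a BAM/SAM header.
    Transcripts absent from counts are reported with a count of 0.
    """
    rows = []
    for tx, length in tx_lengths.items():
        rows.append({
            "transcript_name": tx,              # assumed column name
            "est_count": counts.get(tx, 0.0),   # assumed column name
            "transcript_length": length,        # assumed column name
        })
    # Highest counts first, zero-coverage transcripts at the end
    rows.sort(key=lambda r: r["est_count"], reverse=True)
    return rows


counts = {"tx1": 12.5, "tx3": 3.0}
tx_lengths = {"tx1": 1500, "tx2": 800, "tx3": 2300}
for row in add_tx_info(counts, tx_lengths):
    print(row)
```

Here tx2 has no assigned reads but still appears in the result, with a count of 0 and its length.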
Relaxing the secondary alignment scoring threshold
The default value is 0.95 (95% of the alignment score of the primary alignment), but it can be lowered to allow more secondary alignments to be included in the uncertainty calculation. Lowering the value below 0.75 might not be relevant and will considerably increase the computation time.
NanoCount (alignment_file="./data/aligned_reads_sorted.bam", count_file="./output/tx_counts.tsv", sec_scoring_threshold=0.8, extra_tx_info=True)
head("./output/tx_counts.tsv")
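As a rough illustration of what the threshold does, the following sketch keeps the secondary alignments whose value (alignment score or alignment length, depending on sec_scoring_value) reaches a given fraction of the primary alignment's value. The helper keep_secondary is hypothetical, and whether NanoCount uses an inclusive or strict comparison at the cutoff is an assumption here.

```python
def keep_secondary(primary_value, secondary_values, sec_scoring_threshold=0.95):
    """Return the secondary values that pass the relative threshold.

    primary_value: score (or length) of the primary alignment.
    secondary_values: scores (or lengths) of the secondary alignments.
    """
    cutoff = primary_value * sec_scoring_threshold
    # Assumption: values equal to the cutoff are kept
    return [v for v in secondary_values if v >= cutoff]


secondary_scores = [990, 960, 900, 780]
print(keep_secondary(1000, secondary_scores))        # default threshold of 0.95
print(keep_secondary(1000, secondary_scores, 0.8))   # relaxed threshold
```

With a primary score of 1000, the default threshold keeps two of the four made-up secondary alignments, while a threshold of 0.8 keeps three. More retained alignments means more read-to-transcript assignments for the EM to resolve.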