NanoCount Input / Output
Input BAM file
NanoCount is meant to be used with Oxford Nanopore direct-RNA sequencing datasets only.
Reads should be aligned to a transcriptome reference using minimap2. We recommend using the -N 10
option to retain at least 10 secondary mappings. For highly repetitive transcriptomes, this value can even be increased.
Since we use a transcriptome reference the alignment algorithm does not have to be splice aware.
Nanocount can take either BAM or SAM format and does not require reads to be sorted, although sorting the reads with samtools will make secondary sampling deterministic, so it is recommended.
Here is an example minimap2 command line with optional conversion to BAM format with samtools then sorting and indexing with samtools.
minimap2 -t 8 -ax map-ont -N 10 ./data/yeast_ref.fa.gz ./data/yeast_reads.fastq.gz | samtools view -bh > ./output/aligned_reads.bam
samtools sort -o ./output/aligned_sorted_reads.bam ./output/aligned_reads.bam
samtools index ./output/aligned_sorted_reads.bam
Output TSV file
NanoCount returns a file containing count data per transcripts.
By default only transcripts with at least one read mapped are included in the report.
This behaviour can be changed to include all transcripts with the option extra_tx_info
.
Here is a example tabulated count file returned by NanoCount:
transcript_name raw est_count tpm transcript_length
YHR174W_mRNA 0.5847481080119605 51228.61224671184 584748.1080119605 1314
YGR192C_mRNA 0.015286737423038144 1339.2404921575258 15286.737423038145 999
YGR254W_mRNA 0.011624369387633806 1018.3877533118225 11624.369387633806 1314
YLR110C_mRNA 0.00945119167199341 827.9999999999986 9451.19167199341 402
YJR009C_mRNA 0.009088112600958011 796.1913687447295 9088.112600958011 999
YOL086C_mRNA 0.008275499954342055 724.9999999999987 8275.499954342056 1047
YKL060C_mRNA 0.006597570998082356 577.9999999999991 6597.570998082356 1080
YBR118W_mRNA 0.005222125833257228 457.49999999999926 5222.125833257228 1377
YPR080W_mRNA 0.005222125833257228 457.49999999999926 5222.125833257228 1377
Description of fields
- transcript_name : Transcript name as in source Bam/Sam file.
- raw: Raw abundance estimates. The sum of all abundance values is 1.
- est_count: Estimated counts obtained by multiplying the raw abundance by the number of primary alignments.
- tpm: Estimated counts obtained by multiplying the raw abundance by 1M.
- transcript_length: Optional column included with the option
extra_tx_info
.
tpm and estimated counts are not normalised by transcript length as it is usually done with Illumina data. The reason is that in dRNA-Seq one read is supposed to represent a single transcript molecule starting from the polyA tail, even if the fragment doesn't extend to the 5' end. If using a custom protocol allowing to sequence from internal RNA fragments (whole RNA tailing, degenerated custom adapter), then this prior is not verified any more.
Output BAM file
Optionally, users can choose to dump the alignments selected by Nanocount for the transcripts estimate step, for QC or visualisation purpose. The alignments are written in the same order as the source BAM file.