SCOTCH: isoform-level characterization of gene expression through long-read single-cell RNA sequencing.
Zhuoran Xu, Hui-Qi Qu, Joe Chan, Shizhuo Mu, Charlly Kao, Hakon Hakonarson, Kai Wang
Abstract
Recent advances in long-read single-cell transcriptome sequencing (lr-scRNA-Seq) enable full-length isoform profiling at single-cell resolution. We present SCOTCH (Single-Cell Omics for Transcriptome CHaracterization), an end-to-end, platform-independent pipeline for isoform characterization from lr-scRNA-Seq data, supporting Nanopore and PacBio sequencing as well as 10X Genomics and Parse Biosciences protocols. SCOTCH models isoforms as combinations of non-overlapping sub-exons and applies dynamic thresholding for robust isoform assignment while efficiently addressing ambiguous read mappings. By refining sub-exon boundaries through integration of read coverage with existing annotations and applying an iterative clustering strategy to reconstruct novel transcripts, SCOTCH reliably recovers more true novel isoforms than existing splice-graph-based methods, with poly(A)-aware filtering further reducing false-positive structures. Extensive simulations demonstrate improved quantification of known isoforms and enhanced reconstruction of novel isoforms. Analyses of human blood and cerebral organoid datasets across multiple platforms further confirm SCOTCH's ability to resolve cell-type-specific transcriptome profiles and uncover experimentally supported novel isoforms.
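To make the sub-exon representation concrete, the following is a minimal illustrative sketch, not the SCOTCH implementation. It assumes annotation-only boundaries (no coverage-based refinement), half-open genomic intervals, and a toy gene with three hypothetical isoforms; the function names `split_into_sub_exons` and `encode_isoforms` are introduced here for illustration only. Annotated exons are cut at every exon boundary to yield non-overlapping sub-exons, and each isoform is then encoded as a binary inclusion vector over those sub-exons.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open genomic interval [start, end)


def split_into_sub_exons(isoforms: Dict[str, List[Interval]]) -> List[Interval]:
    """Cut all annotated exons at every exon boundary to obtain
    non-overlapping sub-exons (annotation-only; an assumption of this sketch)."""
    exons = [exon for exon_list in isoforms.values() for exon in exon_list]
    breakpoints = sorted({pos for start, end in exons for pos in (start, end)})
    sub_exons = []
    for left, right in zip(breakpoints, breakpoints[1:]):
        # Keep a segment only if at least one annotated exon fully covers it;
        # segments covered by no exon are purely intronic and are dropped.
        if any(start <= left and right <= end for start, end in exons):
            sub_exons.append((left, right))
    return sub_exons


def encode_isoforms(
    isoforms: Dict[str, List[Interval]], sub_exons: List[Interval]
) -> Dict[str, List[int]]:
    """Encode each isoform as a 0/1 vector: 1 if the sub-exon lies inside
    one of the isoform's exons, 0 otherwise."""
    return {
        name: [
            int(any(start <= left and right <= end for start, end in exon_list))
            for left, right in sub_exons
        ]
        for name, exon_list in isoforms.items()
    }


# Hypothetical toy gene with three isoforms (coordinates are made up).
toy_isoforms = {
    "iso_A": [(100, 200), (300, 400)],
    "iso_B": [(100, 250), (300, 400)],  # alternative 3' end of the first exon
    "iso_C": [(100, 200), (350, 400)],  # alternative 5' end of the second exon
}

sub_exons = split_into_sub_exons(toy_isoforms)
encoding = encode_isoforms(toy_isoforms, sub_exons)

for name, vector in encoding.items():
    print(name, vector)
```

Under this encoding, isoforms that share exonic sequence but differ at a single splice site differ in only a few vector positions, which is what makes read-to-isoform compatibility checks at the sub-exon level straightforward in principle.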