Differences in sequencing depth between cells are a recognized confounding factor in the downstream analysis of single-cell RNA sequencing (scRNA-seq) data. Traditionally, these differences are addressed by normalizing each cell by its total unique molecular identifier (UMI) count and then rescaling to a common depth with a scaling factor, such as 10,000 ('10k') or one million (counts per million, CPM). Although scaling data by a constant appears straightforward, our study shows that the choice of scaling factor plays a critical role in mitigating the impact of sequencing depth differences. Alternative methods, such as sctransform, instead address sequencing depth normalization through transformations based on Pearson residuals.
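As an illustration of the scaling normalization described above, the following is a minimal, dependency-free sketch rather than the pipeline used in this study. The function name `scale_normalize`, the toy count matrix, and the final `log1p` step are assumptions made for the example; the log transformation is a common companion to scaling but is not specified in the paragraph above.

```python
import math


def scale_normalize(counts, scale_factor=10_000):
    """Scale each cell (row) to a common depth, then log-transform.

    counts: list of cells, each a list of per-gene UMI counts.
    scale_factor: target depth, e.g. 10_000 ('10k') or 1_000_000 (CPM).
    The log1p step is an assumption of this sketch.
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        if total == 0:
            # An empty cell has no composition to rescale; keep it at zero.
            normalized.append([0.0] * len(cell))
            continue
        normalized.append(
            [math.log1p(c / total * scale_factor) for c in cell]
        )
    return normalized


# Two hypothetical cells with identical composition but a 10x depth difference.
cells = [[1, 3, 0, 6], [10, 30, 0, 60]]
for factor in (10_000, 1_000_000):
    print(factor, scale_normalize(cells, factor))
```

Running the loop with both factors makes it easy to inspect how the gap between zero and non-zero entries changes as the scaling factor grows.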
We systematically evaluated scaling normalization methods across a range of scaling factors and compared the outcomes with those obtained using sctransform. Our results, assessed with diverse metrics, show that both resolution and sequencing depth bias increase as the scaling factor increases, with the most pronounced changes occurring between the two most commonly used factors (10k and CPM). Notably, we demonstrate that scRNA-seq data effectively become binary at scaling factors of CPM or greater. Importantly, we show that simpler methods, when combined with an appropriate scaling factor, outperform sctransform in reducing the impact of sequencing depth bias.
Our findings highlight the effectiveness of both downsampling each cell to a defined sequencing depth and proportion fitting, or applying an additional round of normalization after log transformation, to control sequencing depth differences. When sequencing depth differences have been minimized and the normalization is applied uniformly to data from a common platform, as is typically the case, these techniques function as efficient batch integrators, performing on par with commonly used batch integration methods. Notably, in the context of disease and development, they outperform widely used methods in preserving biological signals.
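The two depth-control strategies named above can be sketched as follows. This is a hedged illustration under stated assumptions, not the authors' implementation: the function names (`proportional_fit`, `pf_log1p_pf`, `downsample`) are hypothetical, the default target depth (the mean total across cells) is an assumption, and downsampling is shown as sampling UMIs without replacement.

```python
import math
import random


def proportional_fit(matrix, target=None):
    """Rescale each cell (row) so that its total equals `target`.

    Assumption: when `target` is None, the mean row total is used.
    Expects a non-empty matrix.
    """
    totals = [sum(row) for row in matrix]
    if target is None:
        target = sum(totals) / len(totals)
    return [
        [v * target / t if t > 0 else 0.0 for v in row]
        for row, t in zip(matrix, totals)
    ]


def pf_log1p_pf(counts):
    """Proportion fitting, log1p, then a second round of proportion fitting."""
    fitted = proportional_fit(counts)
    logged = [[math.log1p(v) for v in row] for row in fitted]
    return proportional_fit(logged)


def downsample(cell, depth, rng):
    """Sample `depth` UMIs without replacement from one cell's counts.

    Cells at or below `depth` are returned unchanged.
    """
    if sum(cell) <= depth:
        return list(cell)
    # Expand to one gene index per observed molecule, then subsample.
    molecules = [gene for gene, count in enumerate(cell) for _ in range(count)]
    kept = rng.sample(molecules, depth)
    result = [0] * len(cell)
    for gene in kept:
        result[gene] += 1
    return result


# Hypothetical toy data: same composition, 10x depth difference.
cells = [[1, 3, 0, 6], [10, 30, 0, 60]]
rng = random.Random(0)  # fixed seed so the sketch is reproducible
print(pf_log1p_pf(cells))
print([downsample(cell, 10, rng) for cell in cells])
```

After the second round of proportion fitting every cell has the same total in log space, which is the property the additional normalization step is meant to enforce.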