Poster Presentation 45th Lorne Genome Conference 2024

Simulating single-cell ATAC-seq data. (#144)

Sagrika Chugh 1,2,3, Davis McCarthy 1,2,3, Heejung Shim 2,3
  1. St Vincent's Institute of Medical Research, Fitzroy, VIC, Australia
  2. School of Mathematics and Statistics, University of Melbourne, Melbourne, VIC, Australia
  3. Melbourne Integrative Genomics, University of Melbourne, Carlton, VIC, Australia

Single-cell sequencing technologies have evolved and expanded rapidly in recent years, accompanied by the development of numerous computational tools. Single-cell Assay for Transposase-Accessible Chromatin sequencing (scATAC-seq) is a powerful method for genome-wide analysis of open (accessible) chromatin regions within individual cells derived from heterogeneous populations. Robust evaluation is critical for the development of scATAC-seq data analysis workflows, which are currently far from stable or mature. Although important software tools exist, workflows remain in flux, with many open questions about how best to optimize each element of data analysis. Simulations are crucial for generating reproducible datasets with known characteristics that enable development, testing, and benchmarking of analysis methods. However, current scATAC-seq simulation frameworks do not fully account for the biological and technical characteristics of scATAC-seq data and cannot simulate population-scale scATAC-seq data with realistic genetic effects on chromatin accessibility and overall population structure. We present a generative model and software tool for flexible and reproducible simulation of scATAC-seq data. Our model enables simulation of datasets that closely resemble real scATAC-seq datasets in library size, cell sparsity, and chromatin accessibility signals. We are expanding this model to mimic complex batch, cell-group, and conditional effects between individuals from different cohorts in order to simulate population-scale scATAC-seq data. The inclusion of genetic effects in our simulation framework will establish this model as a key tool for simulating population-scale scATAC-seq data in general and, specifically, for developing methods for single-cell multi-omic quantitative trait locus mapping.
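
As a rough illustration of what simulating a dataset with a known library size and sparsity can look like, the sketch below draws a peak-by-cell count matrix from a simple Poisson model. It is not the model presented in this abstract, whose details are not given here. The function name `simulate_counts`, the log-normal library sizes, the gamma-distributed peak weights, and all parameter values are assumptions chosen only for illustration.

```python
import numpy as np


def simulate_counts(n_peaks=200, n_cells=50, mean_log_lib=5.0,
                    sd_log_lib=0.5, seed=0):
    """Toy peak-by-cell count simulation (illustrative assumption only)."""
    rng = np.random.default_rng(seed)

    # Per-cell library size: expected number of fragments falling in peaks.
    # A log-normal distribution is assumed here for convenience.
    lib_size = rng.lognormal(mean=mean_log_lib, sigma=sd_log_lib, size=n_cells)

    # Per-peak relative accessibility, normalised to sum to 1.
    # A gamma shape below 1 gives many weakly accessible peaks.
    peak_weight = rng.gamma(shape=0.5, scale=1.0, size=n_peaks)
    peak_prob = peak_weight / peak_weight.sum()

    # Expected count for each (peak, cell) pair, then Poisson sampling.
    mu = np.outer(peak_prob, lib_size)
    counts = rng.poisson(mu)
    return counts, lib_size, peak_prob


counts, lib_size, peak_prob = simulate_counts()

# Summary statistics one might compare against a real dataset.
observed_lib_size = counts.sum(axis=0)      # fragments per cell
sparsity = float((counts == 0).mean())      # fraction of zero entries
print(counts.shape, round(sparsity, 3))
```

Because the random seed is fixed, repeated calls return the same matrix, which is the sense in which a simulated dataset of this kind is reproducible. A realistic simulator would estimate these distributions from real scATAC-seq data rather than fixing them by hand.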