Poster Presentation 45th Lorne Genome Conference 2024

Squigulator: simulation of nanopore sequencing signal data with tunable noise parameters (#271)

Hasindu Gamaarachchi 1,2, James M Ferguson 1, Hiruna Samarakoon 1,2, Kisaru Liyanage 1,2, Ira W Deveson 1,3
  1. Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
  2. School of Computer Science and Engineering, University of New South Wales, Sydney, NSW, Australia
  3. St Vincent’s Clinical School, University of New South Wales, Sydney, NSW, Australia

In silico simulation of next-generation sequencing data is widely used in genomics. However, there is currently a lack of suitable tools for simulating data from ‘third-generation’ nanopore sequencing devices, which measure DNA or RNA molecules as time-series current signal data. Here, we introduce Squigulator, a fast and simple tool for simulating realistic nanopore signal data. Squigulator takes a reference genome, a transcriptome or read sequences as input and generates the corresponding raw nanopore signal data. The simulated data are compatible with basecalling software from Oxford Nanopore Technologies (ONT) and with third-party tools, providing a useful substrate for testing, debugging, validating and optimising nanopore analysis methods. The user may generate noise-free ‘ideal’ data, generate realistic data with noise profiles emulating specific ONT protocols, or deterministically modify noise parameters and other variables to shape the data to their needs. To highlight its utility, we use Squigulator to model the degree to which different types of noise affect the accuracy of ONT basecalling and downstream variant detection, revealing new insights into the properties of ONT data. We provide Squigulator as an open-source tool for the nanopore community: https://github.com/hasindu2008/squigulator. The preprint is available at https://www.biorxiv.org/content/10.1101/2023.05.09.539953v1.
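
The general idea of turning a sequence into a current signal with tunable noise can be illustrated with a toy sketch. This is not Squigulator's implementation or interface: the function names, the randomly generated k-mer table, the default values and the Gaussian noise model are all assumptions made for illustration. Each k-mer in the sequence is mapped to a current level, the level is held for a number of samples (the dwell), and optional noise is added to the amplitude and to the dwell time. Setting both noise parameters to zero gives a noise-free ‘ideal’ signal, and fixing the seed makes a noisy signal reproducible.

```python
import random


def toy_pore_model(k=3, seed=0):
    """Build a HYPOTHETICAL k-mer -> mean current (pA) table.

    Real pore models are measured empirically; these values are
    random placeholders used only to make the sketch runnable.
    """
    rng = random.Random(seed)
    kmers = [""]
    for _ in range(k):
        kmers = [prefix + base for prefix in kmers for base in "ACGT"]
    return {kmer: rng.uniform(60.0, 130.0) for kmer in kmers}


def simulate_signal(seq, model, k=3, dwell=10,
                    dwell_sd=0.0, amp_noise_sd=0.0, seed=None):
    """Return a list of current samples for `seq` (assumed noise model).

    dwell        -- mean number of samples emitted per k-mer
    dwell_sd     -- std. dev. of Gaussian jitter on the dwell (0 = fixed)
    amp_noise_sd -- std. dev. of Gaussian noise on each sample (0 = ideal)
    seed         -- fixes the random stream so output is reproducible
    """
    rng = random.Random(seed)
    signal = []
    for i in range(len(seq) - k + 1):
        level = model[seq[i:i + k]]  # ideal current for this k-mer
        n = dwell
        if dwell_sd > 0:
            # Jitter the dwell time, keeping at least one sample per k-mer.
            n = max(1, int(round(rng.gauss(dwell, dwell_sd))))
        for _ in range(n):
            noise = rng.gauss(0.0, amp_noise_sd) if amp_noise_sd > 0 else 0.0
            signal.append(level + noise)
    return signal


if __name__ == "__main__":
    model = toy_pore_model(k=3, seed=1)
    ideal = simulate_signal("ACGTACGTAC", model, k=3, dwell=10)
    noisy = simulate_signal("ACGTACGTAC", model, k=3, dwell=10,
                            dwell_sd=2.0, amp_noise_sd=1.5, seed=42)
    print(len(ideal), len(noisy))
```

Keeping amplitude noise and dwell noise as separate parameters mirrors the kind of experiment described above, in which one type of noise is varied while the others are held fixed in order to isolate its effect on downstream analysis.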