Poster Presentation 45th Lorne Genome Conference 2024

Developing computational methods for investigating key components of vertebrate enhancer sequences (#268)

Mia Gruzin 1,2,3; Kavitha Krishna Sudhakar 1,3; Ted Wong 1,3; Leslie Burnett 1,2,3,4
  1. Garvan Institute of Medical Research, Darlinghurst, NSW 2010, Australia
  2. School of Clinical Medicine, UNSW Medicine and Health, St Vincent’s Clinical Healthcare Campus, Darlinghurst, NSW 2010, Australia
  3. Genium, Potts Point, NSW 2011, Australia
  4. Northern Clinical School, Faculty of Medicine and Health, University of Sydney, St Leonards, NSW 2065, Australia

Introduction:

Enhancers regulate cell- and stage-specific gene expression through direct contact with their target genes. Although enhancer databases already exist for various candidate genes, putative enhancers are typically 100-1000 base pairs in length, and the precise sequences involved in gene contact points remain unclear. We have developed “BEES_KNEES” (Bioinformatic Exploration and Evaluation Suite of Known and New Extended Enhancer Sequences), an automated tool that evaluates the location, strength, and likelihood of potential enhancer-promoter interactions for a given gene.

Methodology:

Using a scalable Docker-based automation framework, we have developed a custom modular pipeline housing several algorithms written in Python and R. Modules already developed include: (i) a metadata module, to collect all relevant information required by downstream modules; (ii) a GeneHancer module, to rank all associated enhancers using GeneHancer data; (iii) a GC profile module, to assess GC content; and (iv) a transient transcriptome sequencing (TT-Seq) module, to map RNA transcriptional activity for evidence of enhancer RNA (eRNA). Modules under development include: (v) an evolutionary sequence conservation module; (vi) a contact domain module, to find topologically associating domain (TAD) boundaries; and (vii) a “jury” module, to weigh the evidence generated by the other modules.
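
As an illustration of the kind of calculation a GC profile module might perform, the following is a minimal Python sketch of a sliding-window GC content profile. It is not the BEES_KNEES implementation: the function names (gc_fraction, gc_profile), the window and step sizes, and the choice to ignore ambiguous bases such as N are all assumptions made for this example.

```python
from typing import List, Tuple


def gc_fraction(seq: str) -> float:
    """Return the fraction of G/C bases among unambiguous A/C/G/T bases.

    Ambiguous symbols (e.g. N) are ignored; this is an assumption of the
    sketch, not a documented BEES_KNEES behaviour.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def gc_profile(seq: str, window: int = 100, step: int = 50) -> List[Tuple[int, float]]:
    """Slide a fixed-size window along seq and return (start, GC fraction) pairs.

    The default window and step are hypothetical values chosen for illustration.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    if len(seq) <= window:
        # Sequence shorter than one window: report a single overall value.
        return [(0, gc_fraction(seq))]
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    # Toy sequence: an AT-rich block, a GC-rich block, then a mixed block.
    toy = "ATATATATAT" + "GCGCGCGCGC" + "ATGCATGCAT"
    for start, gc in gc_profile(toy, window=10, step=10):
        print(start, gc)
```

For a putative enhancer of 100-1000 base pairs, a window much smaller than the enhancer itself would be needed to show variation in GC content within the region; the appropriate size would have to be determined empirically.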

Results:

BEES_KNEES successfully identified and ranked enhancer regions in initial validation studies, providing insights into their regulatory potential based on relative locations, GeneHancer data, GC content, and transcriptional activity. BEES_KNEES is computationally parsimonious and offers immediate feedback on potential locations of suspected regulatory regions. Further validation using genes with established regulatory frameworks is needed to confirm its broader applicability. Additional modules can be added to explore which sequences within the enhancer are active. 

Conclusion:

We have developed a novel, automated computational tool to assist in defining key regulatory sequences for eukaryotic genes. Further work is being undertaken to characterise the tool’s performance and potential for wider application.