Oral Presentation 45th Lorne Genome Conference 2024

Machine learning enables pan-cancer identification of persistent CTCF binding sites as mutational hotspots (#34)

Wenhan Chen 1, Joanna Achinger-Kawecka 1, Amanda Khoury 1, Susan J. Clark 1
  1. Garvan Institute of Medical Research, Darlinghurst, NSW, Australia

CCCTC-binding factor (CTCF) is crucial in shaping 3D nuclear architecture and transcriptional regulation, and mutations at its DNA binding sites (CTCF-BSs) have been implicated in cancer development and progression. While numerous CTCF-BSs have been identified and grouped into classes, the differential involvement of each class in cancer remains unclear.

We investigated a subclass of CTCF-BSs, termed persistent CTCF binding sites (P-CTCF-BSs), that resist CTCF knockdown in prostate and breast cancer cell lines. We uncovered a markedly elevated mutation rate at these P-CTCF-BSs in both cancer types. To extend our investigation to other cancers, we developed CTCF-INSITE, a machine learning tool that predicts P-CTCF-BSs from genetic and epigenetic features. Application of CTCF-INSITE to 12 cancer types revealed a consistent increase in mutational burden at P-CTCF-BSs compared to other CTCF-BSs. This enrichment was stronger for mutations that affect CTCF binding and chromatin looping. Intriguingly, distinct mutational patterns at the core binding motif of P-CTCF-BSs enabled the classification of the 12 cancer types into discrete groups, suggesting a link to distinct cancer etiologies. Together, this study reveals the significant role of P-CTCF-BS mutations in cancer and provides insights into their importance in genome organisation in a pan-cancer setting.