Poster Presentation: 45th Lorne Genome Conference 2024

Machine learning enables pan-cancer identification of mutational hotspots at persistent CTCF binding sites. (#251)

Wenhan Chen¹, Joanna Achinger-Kawecka¹, Amanda Khoury¹, Susan Clark¹
¹ Garvan Institute of Medical Research, Darlinghurst, NSW, Australia

CCCTC-binding factor (CTCF) is a versatile insulator protein that binds to a highly conserved DNA motif and facilitates the regulation of three-dimensional (3D) nuclear architecture and transcription. CTCF binding sites (CTCF-BSs) are among the non-coding DNA sequences that are frequently mutated in cancer. However, the features of CTCF-BS DNA mutations remain largely unexplored.

Our previous study identified a small subclass of CTCF-BSs that are resistant to CTCF knockdown, termed here persistent CTCF binding sites (P-CTCF-BSs). P-CTCF-BSs show high binding conservation and potentially regulate 3D chromatin architecture that is constitutive across cell types. Here, using patient data from the International Cancer Genome Consortium (ICGC), we made the striking observation that P-CTCF-BSs display a highly elevated mutation rate in breast and prostate cancer compared with all CTCF-BSs.
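The rate comparison above can be illustrated with a minimal Python sketch. It assumes each set of sites is summarised only by its combined length in base pairs and its count of observed somatic mutations, compares length-normalised rates, and applies a one-sided Poisson test at the background rate. The names `SiteSet`, `enrichment`, `poisson_upper_tail` and `enrichment_p_value`, the Poisson model and the toy numbers are all hypothetical; this is not necessarily the statistical procedure used in the study.

```python
import math
from dataclasses import dataclass


@dataclass
class SiteSet:
    """A set of binding sites summarised by total length and mutation count."""
    name: str
    total_bp: int    # combined length of all sites in the set, in base pairs
    mutations: int   # somatic mutations observed within those sites

    def rate_per_kb(self) -> float:
        # Length-normalised mutation rate: mutations per kilobase of site sequence
        return self.mutations / (self.total_bp / 1000)


def enrichment(foreground: SiteSet, background: SiteSet) -> float:
    """Fold change of the foreground mutation rate over the background rate."""
    return foreground.rate_per_kb() / background.rate_per_kb()


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), summing pmf terms computed in log space."""
    if lam <= 0:
        raise ValueError("expected count must be positive")
    if k <= 0:
        return 1.0
    total = 0.0
    i = k
    while True:
        # log pmf(i) = -lam + i*log(lam) - log(i!)
        term = math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
        total += term
        # Past the mode, stop once new terms are negligible relative to the sum
        if i > lam and term <= 1e-17 * total:
            break
        i += 1
    return min(total, 1.0)


def enrichment_p_value(foreground: SiteSet, background: SiteSet) -> float:
    """One-sided p-value that the foreground is mutated above the background rate."""
    # Expected count if foreground sites were mutated at the background per-bp rate
    expected = background.mutations / background.total_bp * foreground.total_bp
    return poisson_upper_tail(foreground.mutations, expected)


if __name__ == "__main__":
    # Toy, made-up counts for illustration only
    persistent = SiteSet("P-CTCF-BSs", total_bp=200_000, mutations=50)
    all_sites = SiteSet("all CTCF-BSs", total_bp=10_000_000, mutations=1_000)
    print(enrichment(persistent, all_sites))
    print(enrichment_p_value(persistent, all_sites))
```

Normalising by length matters because a small subclass of sites will always carry fewer mutations in absolute terms; the Poisson test is one simple choice that treats mutations as independent events at a uniform background rate, an assumption real analyses typically relax.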

To further characterise the mutational burden of P-CTCF-BSs in other cell types, we developed CTCF-INSITE, a machine learning tool that predicts persistence from the genetic and epigenetic features of experimentally determined P-CTCF-BSs. In all twelve cancer types tested, the predicted P-CTCF-BSs show a significantly elevated mutational burden compared with all other CTCF-BSs. The enrichment is even stronger for mutations with a functional impact on CTCF binding and chromatin looping. Furthermore, based on the mutational pattern at the core binding motif, the twelve cancer types can be classified into groups that likely reflect their distinct aetiologies.
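The abstract does not state how the cancer types were grouped; one simple way to group them by motif mutation pattern could be sketched as follows. It assumes each cancer type is summarised as a vector of mutation counts at each position of the core CTCF motif. Cosine similarity compares the shape of two profiles regardless of their total mutation count, and cancer types linked by a similarity at or above a chosen threshold are merged into one group (single-linkage, implemented with union-find). The functions `cosine` and `group_profiles`, the five-position toy profiles, the cancer labels and the 0.9 threshold are all hypothetical.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two equal-length mutation-count profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same motif positions")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("profile has no mutations")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def group_profiles(profiles, threshold):
    """Single-linkage grouping of cancer types by motif mutation profile.

    Any two cancer types whose profiles have cosine similarity >= threshold are
    linked; groups are the connected components, found with union-find.
    """
    names = sorted(profiles)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent pointers to the group root, compressing the path
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(names, 2):
        if cosine(profiles[a], profiles[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())


if __name__ == "__main__":
    # Hypothetical per-position mutation counts over a toy five-position motif
    toy_profiles = {
        "cancer_A": [10, 80, 5, 3, 2],
        "cancer_B": [12, 70, 8, 5, 5],
        "cancer_C": [5, 5, 60, 20, 10],
        "cancer_D": [3, 8, 55, 25, 9],
    }
    print(group_profiles(toy_profiles, threshold=0.9))
```

Single-linkage with a fixed threshold is the simplest choice to show; hierarchical clustering or signature decomposition would be more typical for real mutational data.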

Together, this study reveals a new class of cancer-specific CTCF-BS DNA mutations and provides insights into their importance for genome organisation in a pan-cancer setting.