CCCTC-Binding Factor (CTCF) is a versatile insulator protein that binds to a highly conserved
DNA motif and facilitates the regulation of three-dimensional (3D) nuclear architecture and
transcription. CTCF binding sites (CTCF-BSs) are among the non-coding DNA sequences
that are frequently mutated in cancer. However, the features of CTCF-BS DNA mutations are
still largely unexplored.
Our previous study identified a small subclass of CTCF-BSs that are resistant to CTCF
knockdown, termed here persistent CTCF binding sites (P-CTCF-BSs). P-CTCF-BSs show high
binding conservation and potentially regulate cell-type constitutive 3D chromatin
architecture. Here, using ICGC patient data, we made the striking observation that P-CTCF-BSs
display a highly elevated mutation rate in breast and prostate cancer when compared to
all CTCF-BSs.
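
As an illustration of the comparison described above, the following is a minimal sketch of a per-base mutation rate for a subset of binding sites relative to all sites. It is not the method used in the study: the function names, the half-open `(chrom, start, end)` interval convention, and the toy coordinates are all assumptions made for the example, and it assumes the sites do not overlap one another and that at least one mutation falls in the full site set.

```python
from bisect import bisect_left

def count_mutations(sites, mutations):
    """Count point mutations inside half-open intervals (chrom, start, end)."""
    by_chrom = {}
    for chrom, pos in mutations:
        by_chrom.setdefault(chrom, []).append(pos)
    for positions in by_chrom.values():
        positions.sort()
    total = 0
    for chrom, start, end in sites:
        positions = by_chrom.get(chrom, [])
        # Number of sorted positions p with start <= p < end.
        total += bisect_left(positions, end) - bisect_left(positions, start)
    return total

def mutation_rate(sites, mutations):
    """Mutations per base pair across a set of non-overlapping sites."""
    total_bp = sum(end - start for _, start, end in sites)
    return count_mutations(sites, mutations) / total_bp

def fold_enrichment(subset, all_sites, mutations):
    """Ratio of the subset's mutation rate to the rate over all sites."""
    return mutation_rate(subset, mutations) / mutation_rate(all_sites, mutations)

# Hypothetical toy data, not taken from ICGC.
all_sites = [("chr1", 100, 120), ("chr1", 200, 220),
             ("chr2", 50, 70), ("chr2", 300, 320)]
persistent_sites = [("chr1", 100, 120)]
mutations = [("chr1", 105), ("chr1", 110), ("chr1", 210), ("chr2", 400)]

enrichment = fold_enrichment(persistent_sites, all_sites, mutations)
```

A real analysis would also need a statistical test and a correction for local background mutation rate, neither of which is attempted here.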
To further characterise the P-CTCF-BS mutational burden in other cell types, we developed
CTCF-INSITE, a tool that uses machine learning to predict persistence based on genetic and
epigenetic features of experimentally determined P-CTCF-BSs. The P-CTCF-BSs predicted
in all twelve cancer types tested show a significantly elevated mutational burden compared
to all other CTCF-BSs. The enrichment is even stronger for mutations with functional impact
on CTCF binding and chromatin looping. Furthermore, using the mutational pattern at the
core binding motif, we found that the twelve cancers can be classified into groups, likely
reflecting their distinct cancer aetiologies.
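
One way to picture grouping cancers by their mutational pattern at the core motif is to compare per-position mutation profiles. The sketch below is only an illustration under stated assumptions: the abstract does not specify the grouping method, so cosine similarity, the greedy grouping rule, the `threshold` value, the four-position profiles, and the `cancer_A`/`cancer_B`/`cancer_C` labels are all hypothetical.

```python
import math

def normalise(counts):
    """Convert per-position mutation counts into fractions that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero profiles."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def group_by_similarity(profiles, threshold=0.9):
    """Greedy grouping: add each profile to the first existing group that
    contains a member with cosine similarity >= threshold, else start a
    new group. Groups are never merged afterwards."""
    groups = []
    for name, profile in profiles.items():
        for group in groups:
            if any(cosine(profile, profiles[m]) >= threshold for m in group):
                group.append(name)
                break
        else:
            groups.append([name])
    return groups

# Hypothetical counts over a 4-position toy motif (a real motif is longer).
profiles = {
    "cancer_A": normalise([10, 1, 1, 1]),
    "cancer_B": normalise([9, 2, 1, 1]),
    "cancer_C": normalise([1, 1, 1, 10]),
}
groups = group_by_similarity(profiles)
```

With these toy profiles, the two cancers whose mutations concentrate at the first motif position end up in one group and the third forms its own. The result of the greedy rule can depend on input order, so hierarchical clustering would be a more robust choice in practice.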
Together, this study reveals a new class of cancer-specific CTCF-BS DNA mutations and
provides insights into their importance in genome organisation in a pan-cancer setting.