Poster Presentation 45th Lorne Genome Conference 2024

GenoNet: An Interpretable Biologically Informed Multi-modal Deep Neural Network for Enhanced Copy Number Variant Curation through Pathway and Gene Ontology (#230)

Ivan Bakhshayeshi 1, Mohammad Mahdi Hosseini 2, Dr Ahmadreza Argha 3,4, Prof Nigel Lovell 3,4, Dr Hamid Alinejad-Rokny 1,4
  1. UNSW BioMedical Machine Learning Lab (BML), the Graduate School of Biomedical Engineering, UNSW Sydney, Sydney, NSW, Australia
  2. Remote Internship, UNSW BioMedical Machine Learning Lab, Sydney, NSW, Australia
  3. The Graduate School of Biomedical Engineering, UNSW Sydney, Sydney, NSW, Australia
  4. Tyree Institute of Health Engineering (IHealthE), UNSW, Sydney, NSW, Australia

Developmental disorders (DDs) are characterised by diverse phenotypes and complex etiology, with 10-20% of cases linked to copy number variations (CNVs). However, pinpointing the genes within CNV regions (CNVRs) that are responsible for these disorders remains a significant challenge, because a single CNVR often spans multiple genes. Integrating machine learning into biomarker discovery workflows is promising for identifying disease genes; however, existing models often lack interpretability and are limited in how they integrate diverse data modalities.

In this study, we introduce GenoNet, a novel multi-modal deep neural network that combines CNVR curation, tissue-specific genes, biological pathways, and gene ontology to improve interpretability when identifying genetic biomarkers of developmental disorders. We employ sparse networks informed by biological pathways from Reactome and KEGG, curated tissue-specific genes from FANTOM5 or GTEx, and Gene Ontology terms. Notably, we use attention mechanisms to integrate tissue-specific expression data into the gene-centric layer, enhancing both accuracy and interpretability. GenoNet was trained and tested on a dataset comprising CNVs from 24,105 DD patients and 26,150 controls sourced from the SFARI and MSSNG consortia.
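The abstract does not give GenoNet's layer equations. The sketch below shows one common way to build a pathway-informed sparse layer (a dense weight matrix multiplied element-wise by a fixed binary gene-to-pathway membership mask), together with an assumed form of tissue attention in the gene-centric layer. Everything here is hypothetical and for illustration only: the gene and pathway sets, the CNV encoding (+1 duplication, -1 deletion, 0 none), the two toy tissues, the attention logits, and the function names `membership_mask`, `gene_layer`, and `sparse_pathway_layer`. A real model would load Reactome/KEGG and Gene Ontology annotations plus FANTOM5 or GTEx expression, and would learn the weights and attention logits during training.

```python
import math
import random

# Hypothetical toy annotations; a real model would load Reactome/KEGG and GO.
GENES = ["SHANK3", "NRXN1", "GENE_A", "GENE_B"]
PATHWAYS = {
    "pathway_1": {"SHANK3", "NRXN1"},
    "pathway_2": {"NRXN1", "GENE_A", "GENE_B"},
}


def membership_mask(genes, pathways):
    """M[p][g] = 1.0 if gene g is annotated to pathway p, else 0.0."""
    return [[1.0 if g in members else 0.0 for g in genes]
            for members in pathways.values()]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gene_layer(cnv, tissue_expr, tissue_scores):
    """Gene-centric layer with tissue attention (assumed form, not the authors').

    cnv[g]: per-gene CNV encoding; tissue_expr[g][t]: expression of gene g in
    tissue t; tissue_scores[g][t]: attention logits (learned in a real model).
    Each gene's CNV signal is scaled by its attention-weighted expression.
    """
    values, weights = [], []
    for c, expr, scores in zip(cnv, tissue_expr, tissue_scores):
        alpha = softmax(scores)
        values.append(c * sum(a * e for a, e in zip(alpha, expr)))
        weights.append(alpha)
    return values, weights


def sparse_pathway_layer(genes_in, weights, mask, bias):
    """h[p] = tanh(sum_g W[p][g] * M[p][g] * x[g] + b[p]).

    The mask removes every gene->pathway edge absent from the annotation.
    """
    return [math.tanh(sum(w * m * x for w, m, x in zip(w_row, m_row, genes_in)) + b)
            for w_row, m_row, b in zip(weights, mask, bias)]


random.seed(0)
mask = membership_mask(GENES, PATHWAYS)
W = [[random.gauss(0.0, 0.5) for _ in GENES] for _ in PATHWAYS]
b = [0.0 for _ in PATHWAYS]

cnv = [1.0, -1.0, 0.0, 1.0]                               # toy CNV encoding
expr = [[5.0, 0.5], [3.0, 1.0], [0.2, 0.2], [1.0, 4.0]]   # two toy tissues
logits = [[2.0, 0.0], [1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]

genes_out, attn = gene_layer(cnv, expr, logits)
pathways_out = sparse_pathway_layer(genes_out, W, mask, b)
print(pathways_out)
```

Because masked entries contribute nothing, changing a weight on a non-annotated gene-to-pathway edge leaves the pathway activations unchanged, which is what lets each hidden unit be read as a named pathway. The attention weights for each gene sum to 1 and indicate which tissue dominates that gene's contribution, one plausible route to the tissue-level interpretability the abstract describes.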

GenoNet demonstrated superior biomarker discovery performance compared with other approaches, achieving an average test F1-score of 79%, accuracy of 80%, precision of 78%, recall of 81%, and area under the curve (AUC) of 87%. When validated against a permuted control dataset with randomised sample labels, GenoNet showed a nearly 1.5-fold improvement in key metrics, including accuracy and F1-score. Moreover, GenoNet's biological interpretability revealed both established and novel molecularly altered candidate genes, such as SHANK3.
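The reported metrics can be cross-checked with the standard definition F1 = 2PR / (P + R). The snippet below recomputes F1 from the reported precision and recall. It then estimates the implied control accuracy under an assumption not stated in the abstract: that "nearly 1.5X" means accuracy on real labels divided by accuracy on permuted labels. That estimate is compared with the majority-class rate of the full dataset. The test-set class balance is not reported, so the dataset-wide ratio is only a proxy.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Average test metrics as reported in the abstract.
precision, recall, f1_reported, accuracy = 0.78, 0.81, 0.79, 0.80

f1_derived = f1_score(precision, recall)
print(round(f1_derived, 4))  # 0.7947

# Assumption: the ~1.5x gain is (real-label accuracy) / (permuted-label accuracy).
ratio = 1.5
implied_control_accuracy = accuracy / ratio

# Majority-class rate over the whole dataset (26,150 controls of 50,255 samples);
# used as a proxy because the test-split class balance is not reported.
majority_rate = 26150 / (24105 + 26150)
print(round(implied_control_accuracy, 2), round(majority_rate, 2))  # 0.53 0.52
```

The derived F1 of about 0.795 rounds to the reported 79%, and an implied permuted-label accuracy of roughly 0.53 sits near the roughly 0.52 majority-class rate, consistent with a label-randomised model performing close to chance.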

In summary, we present GenoNet, an interpretable neural network model informed by pathways, gene ontology, and tissue-specific gene expression. GenoNet empowers preclinical exploration and clinical prognostication for developmental disorders, with potential relevance to a wide spectrum of neurocognitive disorders.