Poster Presentation 45th Lorne Genome Conference 2024

AI-m6ARS: a comprehensive machine learning tool for m6A methylation site identification (#165)

Korawich Uthayopas 1,2,3, Alex G. C. de Sá 1,2,3,4, David B. Ascher 1,2,3,4
  1. School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
  2. Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville, Victoria, Australia
  3. Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
  4. Baker Department of Cardiometabolic Health, University of Melbourne, Parkville, Victoria, Australia

N6-Methyladenosine (m6A) is the most common RNA methylation in humans, regulating a wide range of cellular processes and showing associations with various diseases [1]. The interaction between RNA-binding proteins and an RNA molecule results in the methylation of a particular nucleotide, with notable effects on RNA stability, function, and cellular localisation. Multiple experiments have been conducted to identify human m6A sites. However, examining diverse cellular contexts and transcriptomes experimentally is laborious and costly, which hampers progress in medical applications. Several computational models [2,3] have been developed to screen for potential m6A RNA methylation sites, but their predictive capability is limited by features that fail to capture the hidden information present in methylated sites.

This study introduces AI-m6ARS, a predictive model for the accurate identification of m6A methylation sites. The AI-m6ARS model integrates four distinct feature sets: (i) one-hot encodings, (ii) iFeatures, (iii) conservation features, and (iv) geographical features. These feature sets are combined to improve the characterisation of methylated sites within DRACH motifs. Comprehensive negative sample selection and feature selection were also performed to enhance the quality of the training set.
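
As an illustration of feature set (i), the following is a minimal sketch of locating DRACH motifs and one-hot encoding the sequence window around each candidate adenosine. It is not the AI-m6ARS implementation: the function names, the window size (`flank`), and the zero-padding at sequence ends are assumptions made for this example, since the abstract does not specify them. DRACH is read with the standard IUPAC codes (D = A/G/U, R = A/G, H = A/C/U).

```python
import re

# DRACH motif in IUPAC notation: D = A/G/U, R = A/G, H = A/C/U.
# The central A (offset 2 within the motif) is the candidate m6A site.
# A lookahead is used so that overlapping motifs are all reported.
DRACH = re.compile(r"(?=([AGU][AG]AC[ACU]))")
ALPHABET = "ACGU"


def find_drach_sites(seq):
    """Return 0-based positions of the central A of every DRACH motif."""
    seq = seq.upper().replace("T", "U")
    return [m.start() + 2 for m in DRACH.finditer(seq)]


def one_hot_window(seq, center, flank=2):
    """One-hot encode positions center-flank .. center+flank.

    Each position contributes four values in A, C, G, U order.
    Positions beyond either end of the sequence (and any character
    outside ACGU) are encoded as four zeros. The flank size is an
    assumption for illustration only.
    """
    seq = seq.upper().replace("T", "U")
    vector = []
    for i in range(center - flank, center + flank + 1):
        base = seq[i] if 0 <= i < len(seq) else None
        vector.extend(1 if base == b else 0 for b in ALPHABET)
    return vector


if __name__ == "__main__":
    rna = "CCGGACUCC"  # toy sequence, not taken from the study
    for site in find_drach_sites(rna):
        print(site, one_hot_window(rna, site))
```

In practice the resulting vectors would be concatenated with the other three feature sets before training; how AI-m6ARS does this is not described in the abstract.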

AI-m6ARS demonstrates robust predictive performance, with an area under the receiver operating characteristic curve (AUC) of 0.86 on a non-redundant blind test. Consistent results were observed in cross-validation, providing confidence in the robustness and generalisability of AI-m6ARS. Feature importance analysis revealed that the four most important features are geographical features. AI-m6ARS performs comparably to state-of-the-art models while offering two significant advantages. First, it employs a machine learning pipeline that is both effective and interpretable. Second, it is available as a comprehensive web-based platform at https://biosig.lab.uq.edu.au/ai_m6ars. Our web server provides valuable insights into the landscape of m6A RNA methylation sites in the human genome, facilitating advancements in medical applications.
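
For readers unfamiliar with the metric reported above, the area under the ROC curve can be sketched as the probability that a randomly chosen methylated site receives a higher score than a randomly chosen unmethylated site, with ties counted as one half. The function below is a generic illustration with a hypothetical name; it is not the evaluation code used for AI-m6ARS, and the example values are invented for demonstration.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0 (negative) or 1 (positive)
    scores: iterable of predicted scores, higher meaning more likely positive
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    # Count positive/negative pairs ranked correctly; a tie counts as half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and scores, invented for illustration only.
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

This quadratic-time form is chosen for clarity; for large test sets, a rank-based implementation such as the one in scikit-learn would be more practical.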

  1. Jiang, X., Liu, B., Nie, Z., Duan, L., Xiong, Q., Jin, Z., Yang, C., & Chen, Y. (2021). The role of m6A modification in the biological functions and diseases. Signal Transduction and Targeted Therapy, 6(1). https://doi.org/10.1038/s41392-020-00450-x
  2. Huang, D., Chen, K., Song, B., Wei, Z., Su, J., Coenen, F., Magalhães, J. P., Rigden, D. J., & Meng, J. (2022). Geographic encoding of transcripts enabled high-accuracy and isoform-aware deep learning of RNA methylation. Nucleic Acids Research, 50(18), 10290–10310. https://doi.org/10.1093/nar/gkac830
  3. El Allali, A., Elhamraoui, Z., & Daoud, R. (2021). Machine learning applications in RNA modification sites prediction. Computational and Structural Biotechnology Journal, 19, 5510–5524. https://doi.org/10.1016/j.csbj.2021.09.025