New tool to speed up translation of genome sequences

L-R: Monash BDI collaborators Dr Jiangning Song, Professor Roger Daly, Professor Ian Smith and Fuyi Li.

In this post-genomic, big data era, genome-sequencing projects are generating an avalanche of biological sequences – proteins, peptides, DNAs, and RNAs. One of the major challenges in computational biology is how to represent each of these sequences accurately and yet simply, retaining all key information. Simple, accurate representations allow computational biologists to construct machine learning (ML) models that convey important biological information to researchers clearly and effectively.

An international research team, co-led by Monash Biomedicine Discovery Institute’s (BDI) Dr Jiangning Song and Professor Roger Daly, and Gordon Life Science Institute’s Professor Kuo-Chen Chou, has developed an answer to this challenge, called iFeature.

iFeature is a powerful and flexible computer program and web server. It can generate a large variety of numerical schemes representing measurable biological features of protein and peptide sequences. Such distinctive sequence attributes are extremely important, as they provide the information with which ML models are usually constructed, to address questions in bioinformatics, proteomics and genomics, as well as conduct proteome and genome analysis.
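To illustrate what a numerical scheme of this kind can look like, here is a minimal Python sketch of one simple and widely used descriptor, amino acid composition, which turns a protein sequence into 20 numbers (the fraction of each standard amino acid). This is an illustration only, not iFeature's own code or interface; the function name `amino_acid_composition` and the example sequence are assumptions made for this sketch.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so that every
# sequence maps to a vector with the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in `sequence`.

    Hypothetical helper written for illustration; it is not part of iFeature.
    Characters outside the 20 standard amino acids are ignored.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# A made-up peptide, used only to show the shape of the output.
features = amino_acid_composition("MKTAYIAKQR")
print(len(features))  # one value per standard amino acid
```

A fixed-length vector like this can be passed directly to most ML libraries, whatever the length of the original sequence.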

The highly collaborative work was published on 8 March in Bioinformatics, one of the two top journals in the field of bioinformatics and computational biology.

Dr Song said he was proud to see that the international collaboration yielded a powerful new tool that is publicly available to the wider research community.

“iFeature can significantly speed up the construction of ML models in important basic and clinical research areas,” Dr Song said.

“I like to think of genome sequences as a language that has its own grammar and vocabulary. We only need to decipher their meaning. iFeature is an important step toward this goal by automatically generating the ‘vocabulary’ represented by sequences, thereby allowing us to classify and annotate them using ML techniques,” he said.

Co-lead author and Head of the Monash BDI Cancer Program, Professor Roger Daly, spoke of the impact this tool will have on cancer research.

“iFeature represents a major step forward in our ability to interrogate the complex patterns of gene mutations found in human cancers, and prioritize ‘driver’ mutations over ‘passenger’ mutations that do not have a functional consequence,” Professor Daly said.

“This will aid our understanding of cancer biology, and help identify therapeutic targets and biomarkers.”

Collaborator Professor Ian Smith, Vice-Provost (Research and Research Infrastructure) of Monash University and Principal Research Fellow of the Monash BDI, expanded on the success of the extensive collaboration.

“This work exemplifies the value of international, cross-institutional and multidisciplinary collaboration and also embraces and demonstrates the power that artificial intelligence/machine learning is bringing to biomedical sciences,” Professor Smith said.

Another Monash collaborator, Professor Geoff Webb, Director of the Monash Centre for Data Science, wants the tool to enable researchers to work smarter, not harder.

“Our ambition is to develop data-driven techniques to get the most of the hidden information from the data,” Professor Webb said.

“This new tool automates the translating process for biological sequences, and will substantially reduce the resources required to create new and innovative artificial intelligence tools for biological research,” he added.

Dr Song, Professor Daly and Professor Webb have ambitious plans to build upon the foundation laid by this bioinformatics tool. They plan to combine it with cutting-edge ML algorithms to develop more powerful artificial intelligence systems that can accurately and automatically distinguish human germline (inherited) variants and somatic (cancer-causing) mutations from background mutations. Further research in this area will increase understanding of causal relationships between genome information and functional phenotypes.

Professor John Carroll, Director of the Monash BDI, spoke of the importance of collaboration in research.

“One of the core strengths of the Monash BDI is the ability to bring teams together to solve important problems in biomedicine. It is exciting to see Dr Song and his international team of collaborators developing bioinformatics tools that will help researchers all over the world,” Professor Carroll said.

Collaborators

  • Dr Jiangning Song, Professor Roger Daly, Professor Ian Smith and Fuyi Li from the Monash BDI
  • Professor Kuo-Chen Chou from the Gordon Life Science Institute
  • Professor Geoff Webb from the Monash Centre for Data Science
  • Associate Professor Tatiana Marquez-Lago and Assistant Professor Andre Leier from the University of Alabama at Birmingham
  • Associate Professor Zhen Chen, Qingdao University
  • Dr Pei Zhao, Chinese Academy of Agricultural Sciences
  • Yanan Wang, Shanghai Jiao Tong University

This research was supported by the Australian National Health and Medical Research Council, the Australian Research Council, the National Institutes of Health and a Major Inter-Disciplinary Research project grant awarded by Monash University.

Click here to access iFeature.

Read the full paper in Bioinformatics, titled “iFeature: a Python package and web server for features extraction and selection from protein and peptide sequences”.