Describing (i.e., modelling) real-life optimisation problems is not an easy task, as they are usually large and complex, containing multiple parts and competing objectives. Further, while the same problem can be modelled in many different ways, the choice of model can significantly affect solving time.
Our research in modelling has two main aims. The first is to improve the expressivity of modelling languages to make the modelling task easier and faster. The second aim is to improve the power of our analysis tools to help users find the best model for their optimisation problem.
Optimisation solving technology has advanced enormously. And yet, taking advantage of these advances is difficult, as each solving technology (MIP, SAT, SMT, etc.) requires problems to be modelled in a particular way. Further, selecting the most suitable technology requires a PhD or two, and plenty of time.
Our research in solving has three main aims. The first is to develop tools that can transform a single problem model into the input required by each solving technology. That way, users can easily take advantage of these technologies and determine experimentally which one is most suitable. The second aim is to establish rigorous theoretical foundations that can help select the most suitable technology in a more principled way. The third aim is to develop state-of-the-art solving algorithms based on lazy clause generation, efficient hybrids of different approaches, and the integration of uncertainty into the solving process.
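To give a flavour of what "transforming a single model into the input required by each solving technology" involves, here is a minimal sketch assuming one hypothetical model-level constraint: at most one of the Boolean decision variables x1..xn may be true. It encodes that constraint as a single linear inequality for a MIP solver and as pairwise CNF clauses (DIMACS-style integer literals) for a SAT solver, then checks that both encodings accept exactly the same assignments. The function names and the choice of the pairwise encoding are illustrative assumptions, not the group's actual tools.

```python
from itertools import combinations, product

# Hypothetical model-level constraint: at most one of x1..xn is true.
# Variables are numbered 1..n so they can double as DIMACS literals.

def to_mip(variables):
    """Encode for a MIP solver as one linear row: sum(x_i) <= 1, x_i in {0, 1}."""
    coefficients = {v: 1 for v in variables}
    return coefficients, "<=", 1

def to_sat(variables):
    """Encode for a SAT solver using the pairwise encoding: one clause
    (NOT x_a OR NOT x_b) for every pair a < b. A negative int is a negated variable."""
    return [[-a, -b] for a, b in combinations(variables, 2)]

def satisfies_mip(assignment, row):
    """Check a 0/1 assignment against the linear row."""
    coefficients, _, rhs = row
    return sum(c * assignment[v] for v, c in coefficients.items()) <= rhs

def satisfies_sat(assignment, clauses):
    """Check a 0/1 assignment against the CNF clauses."""
    def literal_true(lit):
        value = assignment[abs(lit)]
        return value == 1 if lit > 0 else value == 0
    return all(any(literal_true(lit) for lit in clause) for clause in clauses)

if __name__ == "__main__":
    xs = [1, 2, 3]
    row = to_mip(xs)
    clauses = to_sat(xs)
    print(row)      # ({1: 1, 2: 1, 3: 1}, '<=', 1)
    print(clauses)  # [[-1, -2], [-1, -3], [-2, -3]]
    # Both encodings accept and reject exactly the same 0/1 assignments.
    for bits in product([0, 1], repeat=len(xs)):
        assignment = dict(zip(xs, bits))
        assert satisfies_mip(assignment, row) == satisfies_sat(assignment, clauses)
```

The MIP form stays a single row as n grows, whereas the pairwise CNF form needs n(n-1)/2 clauses; other SAT encodings (such as sequential counters) trade auxiliary variables for fewer clauses, which is one reason the same model can behave very differently on different solvers.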
Bioinformatics is one of the most important cross-disciplinary areas of modern science: where computing and biology converge. 21st-century biology and its applications in medicine are fundamentally data-driven, with computing playing a critical role in rationalising the ever-growing streams of biological data and in developing methodologies for their accurate interpretation and application.
Members of the Optimisation group, and of the Monash Faculty of IT in general, have been actively involved in bioinformatics research since its early days in the late 1980s. Our current research uses discrete and continuous optimisation, statistical learning, information theory and algorithm development to analyse two important data streams in particular: the experimental 3D atomic structural data of proteins, and their 1D amino-acid sequence counterparts. Computational analyses of such data are directly used to:
Uncover major principles of protein architecture, function and evolution.
Develop state of the art methodologies for comparing and classifying protein domains.
Further, they support research on important applied problems such as predicting the three-dimensional structures of proteins from sequence information, which is essential for designing novel drugs and industrial catalysts, and for understanding the regulatory cascades underlying many diseases.
Understanding the architectural principles of protein structure: The goal of our ongoing investigation is to unravel the observed repertoire of protein folding patterns and identify a universal set of architectural themes or concepts. Such a set of architectural concepts is central to understanding how protein 3D shapes form, how they function and how they evolve. Using statistical learning, information theory and optimisation, our recent work identified a comprehensive dictionary of concepts into which any protein folding pattern can be decomposed. By decomposing the entire worldwide Protein Data Bank (wwPDB) using this dictionary, we are able to understand the constituents of protein folding patterns, establish their functional roles, and reveal the patterns of amino-acid sequence conservation that dictate protein 3D shapes. The insights gained from this investigation are useful for (i) annotation of protein function, (ii) protein engineering, (iii) drug design, and (iv) protein structure prediction. A fully navigable website of our preliminary work is available from: http://lcb.infotech.monash.edu.au/prosodic
Statistical inference of protein structural alignments: Comparing the 3D structures of proteins and establishing their relationships is a computational task that supports many biological studies. In this ARC-funded project (DP150100894) we are investigating statistically rigorous ways to assess and identify biologically meaningful structural relationships between proteins. We recently developed an information-theoretic measure that accurately discriminates between competing structural alignments and selects the best one by establishing an objective trade-off between alignment complexity and structural fidelity. Among other outputs, this research has resulted in the program MMLigner. Unlike existing alignment programs, MMLigner is able to infer closely competing, statistically significant structural alignments between proteins, instead of reporting just the single best. Identifying meaningful, closely competing alignments is an extremely challenging optimisation problem, especially when comparing protein oligomers and complexes. MMLigner is an open-source C++ program, freely available from http://lcb.infotech.monash.edu.au/mmligner.
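The trade-off between alignment complexity and structural fidelity can be pictured as a two-part message length, in the spirit of minimum message length (MML): the cost of stating an alignment plus the cost of encoding the structures given that alignment. The Python below is a minimal sketch under stated assumptions, not MMLigner's actual measure: the bit costs are hypothetical inputs supplied by the caller, "statistically significant" is taken to mean compressing better than a hypothetical null (no-alignment) message, and `margin_bits` is an illustrative threshold for what counts as "closely competing".

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    hypothesis_bits: float  # hypothetical cost of stating the alignment (its complexity)
    data_bits: float        # hypothetical cost of the coordinates given the alignment (fidelity)

    @property
    def total_bits(self) -> float:
        # Two-part message length: I(alignment) + I(structures | alignment).
        return self.hypothesis_bits + self.data_bits

def rank_alignments(candidates, null_bits, margin_bits):
    """Keep candidates whose total message is shorter than the null message,
    sort them by total length, and return those within margin_bits of the best."""
    significant = sorted(
        (c for c in candidates if c.total_bits < null_bits),
        key=lambda c: c.total_bits,
    )
    if not significant:
        return []
    best = significant[0].total_bits
    return [c for c in significant if c.total_bits - best <= margin_bits]

if __name__ == "__main__":
    # Entirely made-up costs, for illustration only.
    candidates = [
        Candidate("A", hypothesis_bits=120, data_bits=880),  # total 1000
        Candidate("B", hypothesis_bits=200, data_bits=790),  # total 990
        Candidate("C", hypothesis_bits=60, data_bits=1000),  # total 1060
        Candidate("D", hypothesis_bits=300, data_bits=900),  # total 1200, worse than null
    ]
    kept = rank_alignments(candidates, null_bits=1100, margin_bits=20)
    print([c.name for c in kept])  # ['B', 'A']
```

In this toy example, B wins even though it is more complex than A, because its better fit saves more bits than its extra complexity costs; A is reported alongside it as a close competitor rather than being discarded.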
Computational methods to study protein structure and architecture: We have contributed to the development of the theory and practice of algorithms to compare and classify protein structures. These algorithms are fundamental to understanding the evolution of protein families and to identifying patterns of conservation in sequences that fold into similar structures. Some of our popular work, used by structural biologists and crystallographers the world over:
Even when the best model and solving technology are combined, the resulting system is not yet finished. This is because it is often hard for users to understand why the system produced a particular solution to their problem, particularly when they think the solution is not quite right or is inefficient.
To help fill this gap, our group collaborates with the immersive analytics team to develop interactive, collaborative visualisations that can help users understand the reasons behind a given decision, the effects of changes to decisions, and the differences between proposed solutions. This requires a close connection between the modelling concepts, the visualisation concepts and the concepts used by the users.