Bayesian and Statistical Machine Learning research projects

AIDE Ambulance Vic - BCV Innovation Fund - Artificial Intelligence in carDiac arrEst (AIDE)

(Victorian Department of Health and Human Services, 2020-2021)

Project lead: Prof. Wray Buntine


BARD: Bayesian Argumentation via Delphi

Project lead: Prof. Kevin Korb

Analysts are routinely asked to evaluate and assess complex situations, and to justify a recommendation to proceed or not. There is a scientific way to make these assessments: it's called BARD. Monash Data Science researchers have developed BARD to improve the core process of intelligence analysis: making well-reasoned inferences from incomplete information.
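
BARD supports this reasoning with causal Bayesian networks. As a minimal, hypothetical sketch of the underlying idea only (not BARD's actual implementation), the snippet below updates belief in a hypothesis as fragments of evidence arrive, using Bayes' rule; every probability here is invented for illustration.

```python
# Minimal sketch of Bayesian belief updating from incomplete evidence.
# All numbers are invented for illustration; BARD itself elicits full
# causal Bayesian networks from groups of analysts.

def update(prior, likelihood_if_true, likelihood_if_false):
    """Posterior P(H | e) from prior P(H) and likelihoods P(e | H), P(e | ~H)."""
    numerator = prior * likelihood_if_true
    return numerator / (numerator + (1 - prior) * likelihood_if_false)

belief = 0.20  # prior: P(hypothesis)
# Each tuple: (description, P(evidence | H), P(evidence | not H)).
evidence = [
    ("intercepted communication", 0.70, 0.30),
    ("corroborating sighting",    0.60, 0.20),
]
for desc, l_true, l_false in evidence:
    belief = update(belief, l_true, l_false)
    print(f"after {desc}: P(H) = {belief:.3f}")
```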

Find more information on the BARD project, watch the introductory video or read the full project report.


Nonparametric Bayesian Machine Learning for Modern Data Analytics

(ARC DP, 2016-2019)

Project lead: Prof. Dinh Phung

We are developing next-generation machine learning methods to cope with the data deluge. Our intended outcomes include a new Bayesian nonparametric method that can express arbitrary dependency amongst multiple, heterogeneous data sources with infinite model complexity, together with algorithms to perform inference and deduce knowledge from them. Additional outcomes are a new Bayesian statistical inference framework for set-valued random variables, which moves beyond vectors and matrices to enrich our analytics toolbox with sets, and a new deterministic fast inference method to meet real-world demand.
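
For a concrete sense of "infinite model complexity": Bayesian nonparametric models such as the Dirichlet process do not fix the number of components in advance. The sketch below is a generic illustration, not this project's methods; it samples mixture weights from a Dirichlet process prior via the standard stick-breaking construction, truncated only for practicality.

```python
import numpy as np

def stick_breaking(alpha, truncation, rng):
    """Sample mixture weights from a (truncated) Dirichlet process prior via
    stick-breaking: w_k = v_k * prod_{j<k} (1 - v_j), with v_j ~ Beta(1, alpha)."""
    v = rng.beta(1.0, alpha, size=truncation)
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - v)[:-1]])
    return v * remaining

rng = np.random.default_rng(0)
weights = stick_breaking(alpha=2.0, truncation=50, rng=rng)
print(weights[:5], weights.sum())  # weights decay; sum approaches 1 as truncation grows
```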


Stay well: Analysing lifestyle data from smart monitoring devices

(ARC DP, 2015-2019)

Project lead: Prof. Dinh Phung

Modern data analytics tasks need to interpret and derive value from complex, growing data. This project aims to develop next-generation machine learning methods to cope with the data deluge. Intended outcomes include: new Bayesian nonparametric methods that can express arbitrary dependency amongst multiple, heterogeneous data sources with infinite model complexity, together with algorithms to perform inference and deduce knowledge from them; new Bayesian statistical inference for set-valued random variables that moves beyond vectors and matrices to enrich our analytics toolbox with sets; and new deterministic fast inference to meet real-world demand.
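
The set-valued random variables mentioned above treat each observation as a subset of a ground set, for instance the set of activities a monitoring device logs in a day, rather than a fixed-length vector. As a toy, hypothetical illustration only (the project's models are far richer), the sketch below fits the simplest exchangeable model over sets, where each item is included independently, and scores a new day under it.

```python
import numpy as np

# Toy data: each observation is the set of activities logged in one day.
# Activity names and counts are invented for illustration.
days = [
    {"walk", "sleep_8h"},
    {"walk", "run"},
    {"sleep_8h"},
    {"walk", "sleep_8h", "run"},
]
items = sorted(set().union(*days))

# Simplest model over sets: each item appears independently with probability
# p_i (Laplace-smoothed maximum likelihood). Real set-valued inference is richer.
counts = np.array([sum(item in d for d in days) for item in items], dtype=float)
p = (counts + 1.0) / (len(days) + 2.0)

def log_prob(day_set):
    """Log-probability of a set under the independent-inclusion model."""
    included = np.array([item in day_set for item in items])
    return float(np.sum(np.where(included, np.log(p), np.log1p(-p))))

print(dict(zip(items, p.round(2))))
print(log_prob({"walk", "run"}))
```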


Bayesian Learning with Unbounded Capacity from Heterogenous and Set-Valued Data

(AOARD, 2016-2018)

Project lead: Prof. Dinh Phung

Large-scale, modern datasets have reshaped machine learning research and practice. They are not only bigger in size, but predominantly heterogeneous and growing in complexity. This project aims to advance machine learning methods grounded in recent Bayesian nonparametric theory to deal with the growing complexity and heterogeneity of large-scale data. The proposal is unique in its approach, delivering three new bodies of theory and techniques:

  • Bayesian nonparametric methods that can express, and perform inference over, heterogeneous, set-valued data sources with infinite model capacity
  • A new framework for deterministic fast inference based on small-variance asymptotic analysis (SVAA) and Wasserstein geometry (a toy sketch of the SVAA idea follows this list)
  • New applications in pervasive healthcare exploiting electronic medical records (EMR) data.
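
To illustrate the SVAA direction, as a generic sketch rather than the project's method: letting the component variance of a Dirichlet-process Gaussian mixture shrink to zero turns Gibbs sampling into DP-means (Kulis & Jordan, 2012), a deterministic, k-means-like loop that opens a new cluster whenever a point lies farther than a penalty λ from every existing centre.

```python
import numpy as np

def dp_means(X, lam, n_iters=10):
    """DP-means (Kulis & Jordan, 2012): the small-variance asymptotic limit
    of Gibbs sampling in a Dirichlet-process Gaussian mixture. A point opens
    a new cluster when its squared distance to every centre exceeds lam."""
    centres = [X.mean(axis=0)]
    assign = np.zeros(len(X), dtype=int)
    for _ in range(n_iters):
        for i, x in enumerate(X):
            d2 = [np.sum((x - c) ** 2) for c in centres]
            k = int(np.argmin(d2))
            if d2[k] > lam:  # farther than the penalty from all centres
                centres.append(x.copy())
                k = len(centres) - 1
            assign[i] = k
        # Recompute centres; keep the old centre if a cluster emptied out.
        centres = [X[assign == k].mean(axis=0) if np.any(assign == k) else centres[k]
                   for k in range(len(centres))]
    return np.array(centres), assign

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (40, 2)), rng.normal(3, 0.3, (40, 2))])
centres, assign = dp_means(X, lam=2.0)
print(f"{len(np.unique(assign))} clusters in use")
```

Here λ plays the role the concentration parameter plays in the original Dirichlet process mixture: a smaller penalty makes new clusters cheaper to open, so more are found.
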

Target-agnostic analytics: Building agile predictive models for big data

(ARC DP, 2019-2021)

Project leads: Prof. Geoff Webb, Prof. Wray Buntine and Dr. François Petitjean

This project investigates technologies to predict any unobserved variable in a system. Government and business collect vast quantities of data, but these are wasted if we cannot use them to predict the future from the past. Presently, big-data analytics is effective at predicting a single pre-defined target variable, yet in many applications what we know about a system and what we want to find out are far more complex, and change depending on the context. The project will yield novel target-agnostic technologies with associated publications and open-source software, expanding the capabilities of machine learning and making better use of the massive data assets collected across public, commercial and industry sectors.
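
As a hypothetical toy illustration of target-agnostic prediction (invented variable names and data, not the project's technology): any model of the full joint distribution can answer a query about any variable given evidence on any others. The sketch below reads conditional distributions straight off an empirical joint.

```python
import pandas as pd

# Invented toy dataset: any column can later serve as the prediction target.
df = pd.DataFrame({
    "weather": ["sun", "rain", "sun", "rain", "sun", "sun"],
    "traffic": ["low", "high", "low", "high", "high", "low"],
    "late":    ["no",  "yes",  "no",  "yes",  "no",  "no"],
})

def predict(df, target, evidence):
    """Empirical distribution over `target` given `evidence` (a dict of
    column -> value). Target-agnostic: the same call answers any query."""
    mask = pd.Series(True, index=df.index)
    for var, val in evidence.items():
        mask &= df[var] == val
    return df.loc[mask, target].value_counts(normalize=True).to_dict()

# The same model answers queries in either direction.
print(predict(df, target="late", evidence={"weather": "rain"}))
print(predict(df, target="weather", evidence={"late": "no", "traffic": "low"}))
```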