Showcase software


Hierarchical non-parametric topic models implemented in C for multi-core systems, built on a high-performance implementation of Dirichlet process and Pitman-Yor process hierarchies.

Chordalysis

Log-linear analysis is a statistical method for capturing multi-way relationships between variables. However, because its cost grows exponentially with the number of variables, previous approaches did not scale beyond about a dozen variables. Chordalysis is a log-linear analysis method designed for big data.

Minimum Message Length (MML) Software

This suite of systems, based on the Minimum Message Length principle, provides fundamental approaches to supervised and unsupervised classification.

Weka components

We have contributed many components to the popular Weka Machine Learning Workbench including Averaged n-Dependence Estimators, MultiBoosting, Lazy Bayesian Rules, Decision Tree Grafting and Proportional k-Interval Discretization.

CaMML

CaMML (Causal discovery via MML) is a computer program, or set of programs, for learning causal Bayesian networks from sample data. The MML score is a two-part message length: the first part measures the complexity of the Bayesian network being scored, and the second measures the complexity of the data left unexplained by the network. An MCMC stochastic search (Metropolis sampling) uses the MML score to estimate the posterior probability distribution over the space of causal models. Since 1996 CaMML has been an effective tool for investigating issues in causal discovery and causal modeling, as well as in their application. There are multiple versions of CaMML, including CaMML for learning linear Gaussian models and "vanilla" CaMML for learning discrete Bayesian networks. The latter provides a wide range of options for biasing the discovery process to reflect expert prior opinions about which causal connections are likely or unlikely, including tiers, direct links, indirect links and subgraphs.
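To make the two-part score and the Metropolis search concrete, here is a minimal Python sketch. It is not CaMML's code or interface. It assumes a toy space of three hypothetical candidate models with invented message lengths measured in bits, so that a model's posterior probability is taken to be proportional to 2 raised to the negative of its total length. The function names, the model labels and all the numbers are illustrative assumptions.

```python
import random
from collections import Counter


def two_part_length(model_bits, data_bits):
    """Total message length: cost of stating the model plus cost of the
    data encoded given that model (both in bits, by assumption)."""
    return model_bits + data_bits


def metropolis(lengths, steps, seed=0):
    """Visit models with frequency approximately proportional to
    2 ** (-length), using a symmetric uniform proposal."""
    rng = random.Random(seed)
    names = list(lengths)
    current = names[0]
    visits = Counter()
    for _ in range(steps):
        proposal = rng.choice(names)
        delta = lengths[proposal] - lengths[current]
        # Always accept a shorter message; accept a longer one with
        # probability 2 ** (-delta), the ratio of posterior weights.
        if delta <= 0 or rng.random() < 2.0 ** (-delta):
            current = proposal
        visits[current] += 1
    return {name: visits[name] / steps for name in names}


# Hypothetical message lengths (model part, data part), invented for illustration.
lengths = {
    "A -> B": two_part_length(10, 90),
    "B -> A": two_part_length(10, 91),
    "no link": two_part_length(5, 98),
}

# Exact posterior for this tiny space, for comparison with the sampler.
shortest = min(lengths.values())
weights = {m: 2.0 ** (-(l - shortest)) for m, l in lengths.items()}
total = sum(weights.values())
exact = {m: w / total for m, w in weights.items()}

estimate = metropolis(lengths, steps=20000)
for model in lengths:
    print(f"{model}: estimated {estimate[model]:.3f}, exact {exact[model]:.3f}")
```

In a space this small the posterior can be computed exactly, which is what makes the comparison possible. Sampling becomes useful when the number of candidate networks is far too large to enumerate, which is the situation the paragraph above describes.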

For descriptions of the central algorithms and ideas behind CaMML see C.S. Wallace *Statistical and Inductive Inference by Minimum Message Length* (Springer, 2004), Sec 7.4, and K.B. Korb and A.E. Nicholson *Bayesian Artificial Intelligence, 2nd edition* (CRC Press, 2010), Part II.