Abstracts
Contributed talks
Title: Bayesian Clustering for Big Data Using Splinters
Speaker: David Dahl, Brigham Young University
Abstract: We propose a Bayesian method to cluster a large dataset where it is impractical or impossible to obtain samples from the full posterior distribution. Under the Bayesian paradigm, the canonical approach to choosing an estimator is to introduce a loss function and then report the Bayes rule that minimizes the posterior expectation of the chosen loss function. Except in a situation with trivially small sample size, the posterior expectation must be approximated, usually using posterior samples. Typical algorithms, however, scale poorly in the sample size and it is challenging to fit models with tens of thousands of items. In this paper, we consider the "big data" situation where obtaining samples from the full posterior distribution is infeasible. We consider instead the idea of splintering the data into overlapping subsets such that the size of each subset is manageable for existing MCMC algorithms. The model is then fit to each subset independently and posterior samples are obtained for each. Our task becomes to use these samples to obtain an estimated partition that approximates the partition that would be obtained by minimizing the posterior expectation of the full model, if only it were possible to run. The size of each subset, the number of subsets, and the amount of overlap among the subsets are important tuning parameters which we explore.
Title: Coverage of Credible Intervals under Multivariate Monotonicity
Speaker: Subhashis Ghosal, North Carolina State University
Abstract: Shape restrictions such as monotonicity in one or more dimensions sometimes naturally arise. The restriction can be effectively used for function estimation without smoothing. Several exciting results on function estimation under monotonicity, and to a lesser extent, under multivariate monotonicity have been obtained in the frequentist setting. But only a little is known about how Bayesian methods work when there are restrictions on the shape. Chakraborty and Ghosal recently studied the convergence properties of a "projection-posterior" distribution. The shape restriction is not imposed on the prior in this approach. Instead, a conjugate prior disregarding the shape is used. Samples from the posterior distribution are "corrected" via a projection map to comply with the shape restriction. In contrast to the phenomenon Cox observed for smooth functions, we demonstrate that the equal-tailed projection-posterior credible interval for the function value at a point has a limiting coverage slightly higher than the credibility. Interestingly, the correct coverage is obtained for a suitably lower credibility interval. In the multivariate context, we generalize the projection-posterior approach by using an "immersion map" given by a block maxmin operation and show that the resulting Bayesian credible intervals have similar coverage properties by explicitly evaluating the limiting coverage in terms of a function of a pair of Gaussian processes. Simulation results confirm the theoretical findings.
Title: Flexible Bayesian Nonparametric Modeling for Longitudinal Binary and Ordinal Responses
Speaker: Jizhou Kang, University of California, Santa Cruz
Abstract: Longitudinal studies with binary or ordinal responses are widely encountered in various disciplines, where the primary focus is on the temporal evolution of the probability of each response category. Traditional approaches attempt the problem under the generalized mixed effects modeling framework. Even amplified with nonparametric priors placed on the fixed or random effects, such models are restrictive due to the implied assumptions on the marginal expectation and covariance structure of the responses. We tackle the problem from a functional data analysis perspective, treating the observations for each subject as realizations from subject-specific stochastic processes at the measured times. We develop the methodology focusing initially on binary responses, for which we assume the stochastic processes have Binomial marginal distributions. Leveraging the logits representation, we model the discrete space processes through sequences of continuous space processes. We utilize a hierarchical framework to model the mean and covariance kernel of the continuous space processes nonparametrically and simultaneously through a Gaussian process prior and an Inverse-Wishart process prior, respectively. The prior structure results in flexible inference for the evolution and correlation of binary responses, while allowing for borrowing of strength across all subjects. The modeling approach can be naturally extended to ordinal responses, for which the key is the continuation-ratio logits factorization of the multinomial distribution. It also yields a practical way of dealing with unbalanced longitudinal data and incorporating covariate effects. We illustrate the methodology with several synthetic and real data examples.
Title: Computational Approaches to Bayesian Variable Selection: Random Neighbourhood Samplers and Large p Asymptotics
Speaker: Samuel Livingstone, University College London
Abstract: Choosing which variables to include in a probabilistic model is a classical problem. The Bayesian solution is to place a prior distribution on a model with each possible combination of the p variables under consideration. This leads to a posterior over 2^p possible models. In order to either decide on the best model or average over them for e.g. prediction, this model space must be explored, and when p is large this can be challenging. Recently, sophisticated Markov chain Monte Carlo (MCMC) algorithms have been proposed for this purpose. Some rely on intelligent global approximations to the posterior distribution, while others consider sophisticated moves within a neighbourhood of the current model. I will argue that in the latter case the choice of neighbourhood is crucial to performance and scalability, and that ideas from the former case can help design such a neighbourhood, leading to algorithms that combine many recently proposed approaches to produce practical and scalable methodology for variable selection. I will then discuss more recent work on convergence of algorithms in the large p regime.
Title: Simple and Effective Sampling from Probability Distributions Concentrated around Manifolds
Speaker: Hadi Mohasel Afshar, University of Technology, Sydney
Abstract: In the applications of probabilistic inference that deal with almost-deterministic transformations, the target distribution is often concentrated around a known manifold. Such problems are typically formalised by State Space Models (SSMs). State-of-the-art MCMC samplers have difficulty in dealing with such models and are prone to becoming trapped in a single mode of the density function. We introduce a simple and effective MCMC sampler that is designed for SSMs and uses the manifold (that represents the deterministic transformation) as a guide in generating quality proposals. Our experimental results suggest that the proposed sampler can navigate the distribution modes better than state-of-the-art MCMC methods. Meanwhile, as an MCMC method, our sampler does not suffer from the particle depletion problem that can negatively affect the performance of particle filters.
Title: Evidence Estimation in Finite and Infinite Mixture Models and Applications
Speaker: Christian Robert, Ceremade - Université Paris-Dauphine
Abstract: Estimating the model evidence - or marginal likelihood of the data - is a notoriously difficult task for finite and infinite mixture models and we re-examine here different Monte Carlo techniques advocated in the recent literature, as well as novel approaches based on Geyer's (1994) reverse logistic regression technique, Chib's (1995) algorithm, and Sequential Monte Carlo (SMC). Applications are numerous. In particular, testing for the number of components in a finite mixture model or against the fit of a finite mixture model for a given dataset has long been and still is an issue of much interest, albeit yet missing a fully satisfactory resolution. Using a Bayes factor to find the right number of components K in a finite mixture model is known to provide a consistent procedure. We furthermore establish the consistency of the Bayes factor when comparing a parametric family of finite mixtures against the nonparametric 'strongly identifiable' Dirichlet Process Mixture (DPM) model.
Title: Shrinking a Partition Distribution Towards an Anchor Partition, with Applications to Dependent Partitions
Speaker: Richard Warr, Brigham Young University
Abstract: Bayesian nonparametric models often rely on clustering models that borrow strength within and between groups. In a scenario where researchers have some notion of the clustering composition, we propose the shrinkage partition distribution, which allows for tractable posterior analysis based on the researchers’ prior knowledge. The shrinkage partition distribution (SPD) shrinks any baseline random partition distribution towards an anchor partition. An extension to any sequentially-allocated partition model, the SPD is extremely flexible with relatively inexpensive posterior simulation. We show several distinct advantages over the existing methods, including the ability to model dependent random partitions. Specifically, we show that the SPD can hierarchically model a collection of random partition distributions and can also model time-dependent random partitions.
Title: Verifying Sources of Identification of Structural Vector Autoregressions Using the BETEL Framework
Speaker: Tomasz Wozniak, University of Melbourne
Abstract: We employ a spike’n’slab prior distribution to discriminate between moment conditions identifying a fiscal policy structural vector autoregression. Various sources of identification, including instrumental variables as well as symmetric and asymmetric kurtosis, are presented as moment conditions informing the estimation of the structural parameters. Exclusion restrictions are also considered. The spike’n’slab prior is used to verify these conditions within a single MCMC run. We use a three-variable system for the US fiscal policy analysis to show that the structural parameters are identified thanks to the non-normal innovations and exclusion restrictions rather than via instruments or heteroskedasticity.
Title: Infinite Sparse Factor Stochastic Volatility Model
Speaker: Martina Zaharieva, CUNEF Universidad
Abstract: This paper proposes a sparse factor multivariate stochastic volatility model, in which the sparsity of the loading matrix is achieved by introducing the Indian buffet process, a Bayesian nonparametric prior defining a distribution over infinite binary matrices. The benefit of the infinite dimensional latent process is twofold. First, the sparsity-inducing prior reduces the dimensionality of the problem and second, the number of active factors is determined by the data itself and a priori set to infinity. Both the diagonal elements of the covariance matrix of the idiosyncratic term and the active factors follow univariate stochastic volatility processes. Each latent volatility is sampled independently and in parallel by means of a particle filtering and smoothing technique, based on a simulated likelihood. The model is applied to a cross section of five international stock market indices.
Title: Bayesian Non-linear Latent Variable Modeling via Random Fourier Features
Speaker: Michael Minyi Zhang, University of Hong Kong
Abstract: The Gaussian process latent variable model (GPLVM) is a popular probabilistic method used for nonlinear dimension reduction, matrix factorization, and state-space modeling. Inference for GPLVMs is computationally tractable only when the data likelihood is Gaussian. Moreover, inference for GPLVMs has typically been restricted to obtaining maximum a posteriori point estimates, which can lead to overfitting, or variational approximations, which mischaracterize the posterior uncertainty. Here, we present a method to perform Markov chain Monte Carlo (MCMC) inference for generalized Bayesian nonlinear latent variable modeling. The crucial insight necessary to generalize GPLVMs to arbitrary observation models is that we approximate the kernel function in the Gaussian process mappings with random Fourier features; this allows us to compute the gradient of the posterior in closed form with respect to the latent variables. We show that we can generalize GPLVMs to non-Gaussian observations, such as Poisson, negative binomial, and multinomial distributions, using our random feature latent variable model (RFLVM). Our generalized RFLVMs perform on par with state-of-the-art latent variable models on a wide range of applications, including motion capture, images, and text data for the purpose of estimating the latent structure and imputing the missing data of these complex data sets.
Title: Identifying Summary and Parameter Structures in ABC: A Gaussian Graphical Model Approach
Speaker: Yangqi Zhang, University of New South Wales
Abstract: This paper proposes a novel approach to conducting inference in high-dimensional approximate Bayesian computation (ABC) problems. ABC is a powerful likelihood-free inference technique, but its performance is significantly affected when models have a large number of summary statistics. Researchers have proposed and widely used several marginal adjustment strategies to address the challenges of high-dimensional ABC inference. However, these methods mainly focus on estimating univariate marginal posteriors and imposing a fixed dependency structure between the parameters, which may fail to approximate complex multivariate dependencies. To overcome these limitations, this paper proposes a novel approach that constructs a network using summary statistics and parameters as nodes and the dependency as edges, utilizing Gaussian Graphical Models (GGM) to learn the conditional independence structure among the parameters and summary statistics. The graphical lasso (glasso) is employed to estimate the network, imposing a sparse structure to group closely related parameters and summary statistics into clusters, while discarding weakly dependent relationships. Using these clusters, the paper performs multivariate marginal ABC inference for each group, overcoming the curse of dimensionality while maintaining significant multivariate dependency. This approach extends the existing marginal-adjustment strategies by incorporating the dependency structure, providing a more accurate estimation of the posterior distribution for high-dimensional models. The paper concludes by highlighting the potential benefits of this approach in several simulated and real examples.
Title: A Bayesian Stochastic Frontier Model for Analyzing Cost Efficiency of Commercial Banks in the US
Speaker: Xibin Zhang, Monash University
Abstract: We propose a Bayesian approach to the estimation of panel stochastic frontier models without specifying the distribution of inefficiency terms. The marginal posterior density of inefficiencies is approximated by a conditional density, which we approximate by the joint density of the inefficiency and the realized composite errors divided by the marginal density of the realized composite errors. A sampling algorithm is developed to estimate parameters of the frontier function as well as the error variance and bandwidths involved in the approximated conditional density of inefficiencies. With our model and another two competing models, we analyze the cost efficiencies of large commercial banks in the US and find that our model presents clearly different results in comparison to what we obtain through the two competing models.
Title: Bayesian Spatial Generalised Dissimilarity Models for Antarctic Biodiversity
Speaker: Xiaotian Zheng, University of Wollongong
Abstract: Monitoring change in species composition, often referred to as species turnover, is an informative way of measuring biodiversity. Generalised dissimilarity modelling is commonly used in ecology to understand species turnover through site-pairwise dissimilarities in species composition, which depend on environmental predictors in monotonic relations under a generalised linear model. However, this approach using predictors to explain spatial variation is unable to accommodate complex spatial dependence, especially when the ecological process underlying dissimilarity is spatially dependent. To this end, we develop a dissimilarity-analysis framework based on spatial generalised linear mixed models. The framework extends the classical model by taking into account more structured dependence and thus introduces spatial dependence among dissimilarities. Monotonic relations are modelled using Bayesian nonparametric regressions based on shape-constrained Bernstein polynomials. Our modelling approach offers avenues to incorporate prior beliefs from experts into the monotonic relations, which is crucial for Antarctic studies, especially under a sparse-data setting. Finally, we integrate Gaussian process-based spatial downscaling into the model to allow for predictors that are only available at coarse resolutions. Our approach, which is based on conditional-probability modelling, naturally accounts for the uncertainty that may arise from the downscaling procedure. Inference and prediction are developed under a Bayesian framework. We investigate model properties both analytically and through simulation studies, and we illustrate our methodology with an analysis of species turnover in Antarctica.
Posters
Title: Gaussian Processes at the Helm(holtz): A More Fluid Model for Ocean Currents
Presenter: Renato Berlinghieri, Massachusetts Institute of Technology
Abstract: Oceanographers are interested in predicting ocean currents and identifying divergences in a current vector field based on sparse observations of buoy velocities. Since we expect current velocity to be a continuous but highly non-linear function of spatial location, Gaussian processes (GPs) offer an attractive model. But we show that applying a GP with a standard stationary kernel directly to buoy data can struggle at both current prediction and divergence identification -- due to some physically unrealistic prior assumptions. To better reflect known physical properties of currents, we propose to instead put a standard stationary kernel on the divergence and curl-free components of a vector field obtained through a Helmholtz decomposition. We show that, because this decomposition relates to the original vector field just via mixed partial derivatives, we can still perform inference given the original data with only a small constant multiple of additional computational expense. We illustrate the benefits of our method on synthetic and real ocean data.
Title: Quantile Slice Sampling with Transformations to Approximate Targets
Presenter: Matthew Heiner, Brigham Young University
Abstract: While general-purpose slice samplers can be more efficient than Metropolis-type alternatives, they are often overlooked as candidates for MCMC algorithms in complex modeling scenarios, in part because they also require tuning parameters. Generalized elliptical slice samplers address this issue by substituting the problem of tuning with that of approximating the target distribution. We apply this trade-off to conventional slice samplers by extending Neal's shrinkage procedure to general continuous distributions via transformations that automatically bound the slice region and eliminate the need for a length-scale tuning parameter. This, together with a suitable approximated target that expands the slice region, yields an efficient rejection algorithm. We extend the transformation method to multivariate slice samplers that retain efficiency when natural approximations to the target are available. We demonstrate the method with a constrained state-space model for which a readily available chain of unconstrained forward-filter, backward-sampling densities provides the approximate target.
Title: Adaptive Finite Element Type Decomposition of Gaussian Random Fields
Presenter: Jaehoan Kim, Texas A&M and Duke University
Abstract: In this paper, we investigate a general class of approximate Gaussian processes (GP) obtained by taking a linear combination of compactly supported basis functions with the basis coefficients endowed with a dependence structure. This general class includes two highly scalable approximate GP methods: the finite element approximation of the stochastic partial differential equation (SPDE) associated with the Matérn GP and a linear approximation of a general GP on a regular lattice. We propose prior distributions for the number of basis functions to yield the optimal rate of posterior convergence of the underlying function, adaptively over a large class of smooth functions. We also provide two scalable algorithms and numerics to illustrate the methodology.
Title: Joint Modelling of Multiple Treatment Variable Types Through Copula-Based Latent Formulations of Propensity
Presenter: Fui Swen Kuh, University of Adelaide and Monash University
Abstract: Current frameworks for causal inference in observational studies do not readily allow for the joint modelling of different types of treatment variables, such as a mix of continuous and discrete data. In this work, we propose an extended rank likelihood method [Hoff (2007)] for the inference of two latent parametrisations of the propensity score; the latent nature of the score is due to the copula framework. This allows for the simultaneous inclusion of different types of treatment variables (discrete, ordinal, and continuous). One parametrisation, the LPF, is an adaptation of the non-latent propensity function by Imai and Van Dyk (2004), who showcase their method on a canonical data set in the causal inference literature. Our other parametrisation, LPGS, is an adaptation of the generalised propensity score by Hirano and Imbens (2004). We compare the performance of the three approaches when applied to the canonical data set, as well as the data from our work on the latent causal socio-economic health (LACSH) index [Kuh (2022)].
Title: Logistic-Beta Processes for Modeling Dependent Random Probabilities with Beta Marginals
Presenter: Changwoo Lee, Texas A&M University
Abstract: We propose a novel stochastic process called the logistic-beta process (LBP), whose finite-dimensional marginal is a multivariate generalized logistic distribution with a highly flexible dependence structure. The logistic transformation of the LBP leads to a stochastic process with common beta marginals, and we propose to use the LBP as a prior for logit-transformed dependent random probabilities for Bayesian inference and prediction. The LBP induces marginal beta priors on random probabilities while having a flexible dependence structure, which encompasses multiple existing dependent Bayesian nonparametric models involving beta distributions. The LBP facilitates posterior computation due to a normal variance-mean mixture representation distinct from copula-type constructions. We study several dependence cases, including temporal, spatial, and a flexible model class relying on feature maps. We apply LBP in dependent Bayesian nonparametric models including dependent Dirichlet processes, and illustrate its application and benefits such as for Bayesian density regression problems in a toxicology study.
Title: Scalable Posterior Sampling from Gaussian Mixture Models via Randomly Weighted Expectation-Conditional-Maximization
Presenter: Santiago Marin, The Australian National University (ANU)
Abstract: Sampling from the joint posterior distribution of Gaussian mixture models (GMMs) via standard Markov chain Monte Carlo (MCMC) imposes a number of computational challenges, which have prevented a broader full Bayesian implementation of these models. By definition, MCMC draws are correlated, may get trapped in areas of high posterior density (leading to mixing limitations between posterior modes), and require a large number of expensive linear algebra operations. Thus, we propose a method to sample, in a scalable fashion, from an approximate joint posterior distribution of GMMs. We build on recent weighted Bayesian bootstrap (WBB) ideas, and combine them with a tempered Expectation-Conditional-Maximization (ECM) algorithm to compute maximum a posteriori (MAP) estimates on many independently randomized objective posterior functions. Given the non-convex nature of these objective functions, the inclusion of a tempering profile reduces the risk of landing in sub-optimal modes. Our proposed method generates approximate posterior draws that are independent, explores the entire posterior distribution, enables uncertainty quantification and, by making use of modern numerical optimization algorithms, reduces the number of expensive linear algebra operations. We demonstrate the performance of our method and compare it with competing approaches through extensive simulations, in addition to a real-world data set.
Title: Bayesian Nonparametric Change-Point Modelling for Macroeconomic Time Series
Presenter: Carson McKee, King's College London
Abstract: In this work, we focus on uncertainty quantification in macroeconomic time series data, where the observations, y_t, are vectors. Such data exhibits dramatic structural breaks driven by events such as economic recessions or financial market crashes. These structural or ‘regime’ changes typically arise from change-points or other nonlinear dynamics. Inferring the locations of these regime switches is central to some of the most important questions in economics. In modelling this we have two key considerations to make. The first concerns how we model the regime switches and the second is how we model the within-regime dynamics. The latter has been researched extensively using vector autoregressions (VARs). VARs are linear multivariate time series models which capture the joint dynamics of multiple time series, providing a flexible prior over the within-regime dynamics. Kalli & Griffin (2018) modelled the regime switching by specifying an infinite mixture model for the joint density of y_t and its lags. This results in a transition density that is also an infinite mixture with weights depending on the lags. However, posterior computation under this model is difficult and scales poorly with the dimension of y_t. Further, while the regime switches are identified, inference about which factors are driving the switches is difficult. Our contribution aims to address these issues by adopting the Product Partition Model (PPM) of Hartigan (1990) over the regime switch locations and specifying a VAR structure for the within-regime dynamics. We modify the PPM by modelling the change-point probabilities as a function of the previous lags and other exogenous variables, taking ideas from Page, Quintana and Müller (2022) on similarity-based clustering in PPMs. It is our hope that this approach will facilitate inference on what drives regime switches and allow us to consider larger macroeconomic datasets by taking advantage of state-of-the-art computation.
Barry, D., & Hartigan, J. A. (1993). A Bayesian Analysis for Change Point Problems.
Hartigan, J. A. (1990). Partition models.
Kalli, M., & Griffin, J. E. (2018). Bayesian nonparametric vector autoregressive models.
Page, G. L., Quintana, F. A., & Müller, P. (2022). Clustering and Prediction with Variable Dimension Covariates.
Title: Understanding the Impact of Automatic Truncation in the Slice Sampler
Presenter: Artiom Rumiancev, King's College London
Abstract: The slice sampler of Kalli, Griffin and Walker (2011) is a Markov chain Monte Carlo sampling algorithm for infinite mixture models, where the weights and locations of the mixture components are generated using a stick-breaking process. The sampler avoids integrating out the random probability measure via automatic truncation, which is achieved by introducing latent variables that determine the finite number and allocation of mixture components. As such, it produces a set of component indices that show whether an observation is linked to a given mixture component. With each iteration of the Markov chain, this set of allocated components changes and in turn affects the convergence of the algorithm to the posterior density. This analysis focuses on investigating the transition density of this set of component indices in order to better understand the behaviour and the properties of the sampler. Furthermore, it examines the generalised representation of the algorithm, where the latent variable, which determines the number of components, is uncorrelated with the stick-breaking weights. In particular, it looks at the impact of introducing a positive sequence on the convergence ability of the algorithm.
Ferguson, T.S., 1973. A Bayesian Analysis of Some Nonparametric Problems. The Annals of Statistics, 1(2), pp.209–230.
Ishwaran, H. and James, L.F., 2001. Gibbs sampling methods for stick-breaking priors. Journal of the American Statistical Association, 96(453), pp.161–173.
Kalli, M., Griffin, J.E. and Walker, S.G., 2011. Slice sampling mixture models. Statistics and Computing, 21(1), pp.93–105.
Lo, A.Y., 1984. On a class of Bayesian nonparametric estimates I. Density estimates. The Annals of Statistics, 12(1), pp.351–357.
Walker, S.G., 2007. Sampling the Dirichlet mixture model with slices. Communications in Statistics - Simulation and Computation, 36(1), pp.45–54.
Title: Double Trouble: Predicting New Variant Counts Across Two Heterogeneous Populations
Presenter: Yunyi Shen, Massachusetts Institute of Technology
Abstract: Collecting genomics data across multiple heterogeneous populations (e.g., across different cancer types) has the potential to improve our understanding of disease. Despite sequencing advances, though, resources often remain a constraint when gathering data. So it would be useful for experimental design if experimenters with access to a pilot study could predict the number of new variants they might expect to find in a follow-up study: both the number of new variants shared between the populations and the total across the populations. While many authors have developed prediction methods for the single-population case, we expect these predictions to fare poorly across multiple populations that are heterogeneous. We prove that, surprisingly, a natural extension of a state-of-the-art single-population predictor to multiple populations fails for fundamental reasons. We provide the first predictor for the number of new shared variants and new total variants that can handle heterogeneity in multiple populations. We show that our proposed method works well empirically using real cancer and population genetics data.
Presenter: Ziyou Wang, King's College London
Abstract: We focus on density estimation for spatial data. The Public Use Microdata Sample (PUMS) provides a representative sample of the US population. It contains information on various factors such as demographics, socio-economic indicators, housing, and employment, making it a valuable resource for understanding population dynamics and informing policy decisions. PUMS data have complex and non-standard distributions. Their complexity makes them suitable for Bayesian nonparametric methods of density estimation. In this study, we will build on the work of Beraha and Griffin (2023) on normalised latent measure factor models. Beraha and Griffin proposed a methodology that incorporates dependent normalized random measures and a prior distribution for a collection of discrete random measures. In their paper they focused on estimating the density of personal incomes for the PUMS classifications of the state of California. We are going to extend their work by looking at how personal income is distributed in the other US states and by considering multiple discrete-valued covariates, aiming to gain valuable insights into the spatial density and distribution of various characteristics within the PUMS data. Our Bayesian nonparametric approach aims to uncover spatial dependence in the PUMS data across all states.