New statistical tool a boon for biologists

Associate Professor Traude Beilharz with Associate Professor David Powell and Dr Paul Harrison.

A cross-disciplinary collaboration between Monash Biomedicine Discovery Institute (BDI) scientists has developed a new statistical approach that is set to reap benefits for biologists working with ‘omic’ data.

The approach, called Topconfects, was developed to reconcile what is biologically interesting with what is statistically defensible. Born out of many years of conversation between bioinformaticians Dr Paul Harrison and Associate Professor David Powell, and Associate Professor Traude Beilharz, Topconfects uses confidence bounds on the effect size to rank gene expression data.

The study describing it was published last week in Genome Biology, which is ranked fourth in the field of Genetics and Heredity and is the highest-ranked open access journal in this category.

The approach was designed to replace widely used but increasingly criticised p-value-based methods. In a nutshell, p-values, when used to rank gene lists, prioritise the most highly consistent but possibly very small changes, whereas Topconfects emphasises the most ‘interesting’ ones, those with the biggest effect size (fold change, in terms of gene expression), Associate Professor Beilharz said. It does this without compromising the False Discovery Rate.

“The difference sounds like a subtle thing but it turns out to make a major difference in the way you rank what’s at the top of your list and how further research is prioritised,” she said.

“If you do an enormous transcriptomic experiment and have 20,000 genes, half of which are statistically significantly changed, we want to put them in order of the ones that are most biologically interesting.”
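
The contrast between the two rankings can be illustrated with a toy example. The sketch below is not the Topconfects algorithm: it uses made-up gene names, fold changes and standard errors, a simple normal approximation, and no multiple-testing correction. It only shows how ranking by p-value and ranking by a lower confidence bound on the size of the change can put different genes at the top.

```python
import math

# Hypothetical per-gene summaries: (name, estimated log2 fold change, standard error).
# These values are invented for illustration only.
genes = [
    ("geneA", 0.30, 0.03),  # small but very consistent change
    ("geneB", 3.00, 0.60),  # large change, noisier estimate
    ("geneC", 1.50, 0.20),
    ("geneD", 0.10, 0.30),  # no convincing change
]

Z_95 = 1.959963984540054  # two-sided 95% quantile of the standard normal


def two_sided_p(effect, se):
    """Two-sided p-value for a z-test of 'effect equals zero'."""
    z = abs(effect) / se
    return math.erfc(z / math.sqrt(2))


def confident_magnitude(effect, se, z=Z_95):
    """Lower confidence bound on the absolute effect, floored at zero."""
    return max(0.0, abs(effect) - z * se)


# Smallest p-value first.
by_p = sorted(genes, key=lambda g: two_sided_p(g[1], g[2]))
# Largest confidently-exceeded effect size first.
by_bound = sorted(genes, key=lambda g: confident_magnitude(g[1], g[2]), reverse=True)

print("Ranked by p-value:         ", [g[0] for g in by_p])
print("Ranked by confidence bound:", [g[0] for g in by_bound])
```

In this invented data set the tiny but consistent change in geneA tops the p-value ranking, while the bound-based ranking puts the large change in geneB first.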

The researchers applied the method to a breast cancer data set in the study and found that the top-ranked genes emphasised markedly different biological processes compared to genes top-ranked by p-value.

First author, Dr Paul Harrison, initiated research into the new approach five years ago to address Associate Professor Beilharz’s frustration with existing methods of ranking gene expression. She said the process of developing and iterating the method was much more involved than first thought.

“Bioinformaticians and biologists speak such different languages. To be able to cross that language barrier requires a lot of trust in each other’s leadership and expertise,” Associate Professor Beilharz said.

“The work is testimony to the value of cross-disciplinary research and to the importance of building computational biology into the fabric of any modern institution,” she said.

The research is timely. “There’s a massive global shift away from what statisticians are calling ‘the abuse of p-values’ – equating p-values with calling the findings correct, which it doesn’t do – it just gives a statistical probability. Beyond a certain small number, it becomes meaningless,” Associate Professor Beilharz said.

Coincident with publication of the Topconfects approach, commentaries on this issue have been published in Nature, Cell and a dedicated issue of The American Statistician, a journal of the American Statistical Association. Statistical discoveries become inflated when only the smallest p-values are reported, as described in the famous paper by Ioannidis, “Why Most Published Research Findings Are False”.

“Bioinformatics has long recognised the problem of selective examination of only the smallest p-values, and False Discovery Rate correction is routinely used, but the use of p-values as an effect size is unfortunately common,” Dr Harrison said.

“To make the switch to confidence bounds on a meaningful effect size, a correction similar to the FDR is needed. This is the missing piece that the Topconfects approach provides, allowing the discovery of results that are not only mostly not false, but also confidently of a meaningful size,” he said.
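
For readers unfamiliar with the routine False Discovery Rate correction mentioned here, a minimal sketch of the widely used Benjamini-Hochberg procedure follows. It covers only the standard p-value adjustment, with invented p-values and an assumed 5% threshold; it does not implement the analogous correction for confidence bounds that Topconfects provides.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented p-values for five hypothetical genes.
p_values = [0.001, 0.008, 0.039, 0.041, 0.60]
adjusted = benjamini_hochberg(p_values)

# Genes called significant at an assumed FDR threshold of 5%.
significant = [q <= 0.05 for q in adjusted]
print(adjusted)
print(significant)
```

Note that two of the invented p-values are below 0.05 on their own yet are not called significant after adjustment, which is the selective-reporting problem the correction guards against.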

The method was launched six months ago on the bioRxiv preprint server, where it prompted much discussion on social media. This feedback, together with reviewer feedback, strengthened the final manuscript. An R package implementing the approach has been incorporated into the Bioconductor suite of bioinformatic tools.

“So many people are using ‘omics’ approaches now, whether it’s transcriptomics as in my case, or proteomics or any number of high-content data-driven research approaches – we can’t study everything so we need a means to prioritise,” Associate Professor Beilharz said.

“I think biologists will prefer this tool because it emphasises the ‘effect size’ over consistent but small changes,” she said.

“I’m certainly very proud that we can bring a different perspective to our data and to everyone else’s data. It doesn’t necessarily replace the current technologies, it’s another way of looking at data.”

“I’m loving it, it’s giving me what I was looking for.”

Read the full paper in Genome Biology, titled “Topconfects: a package for confident effect sizes in differential expression analysis provides a more biologically useful ranked gene list”.


About the Monash Biomedicine Discovery Institute

Committed to making the discoveries that will relieve the future burden of disease, the newly established Monash Biomedicine Discovery Institute at Monash University brings together more than 120 internationally-renowned research teams. Our researchers are supported by world-class technology and infrastructure, and partner with industry, clinicians and researchers internationally to enhance lives through discovery.