News & Events
Want To Become An Instructor?
You can become a Software Carpentry instructor at Monash. There are a limited number of places in 2019 for Monash graduate researchers and professional staff to gain Software Carpentry Certification and become an instructor.
- Find out more about the expectations and requirements.
- Complete the form to express your interest.
Bioinformatics and Statistics Drop-In Sessions: 3pm Fridays
The Bioinformatics drop-in help session is held every Friday at 3pm in room G19, 15 Innovation Walk (in the foyer opposite Cinque Lire cafe), Clayton Campus.
MBP staff and others will be there to offer help and advice on:
- HPC (eResearch)
- Statistics (Ian Hunt, Manager of the Statistical Consulting Service, Maths Department)
New statistical tool a boon for biologists
A cross-disciplinary collaboration between Monash Biomedicine Discovery Institute (BDI) scientists has developed a new statistical approach that is set to reap benefits for biologists working with ‘omic’ data.
The approach, called Topconfects, was developed to reconcile what is biologically interesting with what is statistically defensible. Born out of many years of conversation between bioinformaticians Dr Paul Harrison and Associate Professor David Powell, and Associate Professor Traude Beilharz, Topconfects uses confidence bounds on the effect size to rank gene expression data.
The study describing it was published last week in Genome Biology, ranked fourth in the field of Genetics and Heredity and highest ranked among open access journals in this category.
The approach was designed to replace widely used but increasingly criticised p-value-based methods. In a nutshell, p-values, when used to rank gene lists, prioritise the most highly consistent but possibly very small changes, whereas Topconfects emphasises the most ‘interesting’ ones with the biggest effect size (fold change, in terms of gene expression), Associate Professor Beilharz said. It does this without compromising the False Discovery Rate.
“The difference sounds like a subtle thing but it turns out to make a major difference in the way you rank what’s at the top of your list and how further research is prioritised,” she said.
“If you do an enormous transcriptomic experiment and have 20,000 genes, half of which are statistically significantly changed, we want to put them in order of the ones that are most biologically interesting.”
The researchers applied the method to a breast cancer data set in the study and found that the top-ranked genes emphasised markedly different biological processes compared to genes top-ranked by p-value.
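The contrast between the two rankings can be illustrated with a toy sketch in Python. This is not the Topconfects algorithm or the API of its R package: it uses a plain per-gene confidence interval under a normal approximation and omits the FDR-like correction that Topconfects provides. The gene names, effect sizes and standard errors are hypothetical values chosen only for illustration.

```python
import math
from statistics import NormalDist

# Hypothetical toy data: (gene, estimated log2 fold change, standard error).
genes = [
    ("geneA", 0.30, 0.02),  # small but very consistent change
    ("geneB", 3.00, 0.60),  # large change, moderately noisy
    ("geneC", 1.50, 0.70),  # medium change, noisy
]

def p_value(effect, se):
    """Two-sided p-value for H0: effect == 0, normal approximation."""
    z = abs(effect) / se
    return math.erfc(z / math.sqrt(2.0))

def confident_effect(effect, se, level=0.95):
    """Magnitude of the confidence bound nearest zero (0 if the interval spans zero).

    Simplification: an uncorrected per-gene interval, with no adjustment
    for testing many genes at once.
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return max(abs(effect) - z_crit * se, 0.0)

# Rank by smallest p-value first, and by largest confident effect first.
by_p = [g[0] for g in sorted(genes, key=lambda g: p_value(g[1], g[2]))]
by_bound = [
    g[0]
    for g in sorted(genes, key=lambda g: confident_effect(g[1], g[2]), reverse=True)
]

print("Ranked by p-value:         ", by_p)      # geneA, geneB, geneC
print("Ranked by confident effect:", by_bound)  # geneB, geneA, geneC
```

In this toy data the small but highly consistent change (geneA) tops the p-value ranking, while the large change (geneB) tops the ranking by confidence bound. This mirrors the distinction described above, under the stated simplifications.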
First author, Dr Paul Harrison, initiated research into the new approach five years ago to address Associate Professor Beilharz’s frustration with existing methods of ranking gene expression. She said the process of developing and iterating the method was much more involved than first thought.
“Bioinformaticians and biologists speak such different languages. To be able to cross that language barrier requires a lot of trust in each other’s leadership and expertise,” Associate Professor Beilharz said.
“The work is testimony to the value of cross-disciplinary research and to the importance of building computational biology into the fabric of any modern institution,” she said.
The research is timely. “There’s a massive global shift away from what statisticians are calling ‘the abuse of p-values’ – equating p-values with calling the findings correct, which it doesn’t do – it just gives a statistical probability. Beyond a certain small number, it becomes meaningless,” Associate Professor Beilharz said.
Coincident with publication of the Topconfects approach, commentaries around this issue have been published in Nature, Cell and a dedicated issue of The American Statistician, a journal of the American Statistical Association. Statistical discoveries become inflated when only the smallest p-values are reported, as described in the famous paper by Ioannidis, "Why Most Published Research Findings Are False."
“Bioinformatics has long recognised the problem of selective examination of only the smallest p-values, and False Discovery Rate correction is routinely used, but the use of p-values as an effect size is unfortunately common,” Dr Harrison said.
“To make the switch to confidence bounds on a meaningful effect size, a correction similar to the FDR is needed. This is the missing piece that the Topconfects approach provides, allowing the discovery of results that are not only mostly not false, but also confidently of a meaningful size,” he said.
The method was launched six months ago on the bioRxiv preprint server, where it prompted much discussion on social media. That feedback, together with reviewer feedback, strengthened the final manuscript. An R package implementing the approach has been incorporated into the Bioconductor suite of bioinformatics tools.
“So many people are using ‘omics’ approaches now, whether it’s transcriptomics as in my case, or proteomics or any number of high-content data-driven research approaches – we can’t study everything so we need a means to prioritise,” Associate Professor Beilharz said.
“I think biologists will prefer this tool because it emphasises the ‘effect size’ over consistent but small changes,” she said.
“I’m certainly very proud that we can bring a different perspective to our data and to everyone else’s data. It doesn’t necessarily replace the current technologies, it’s another way of looking at data.”
“I’m loving it, it’s giving me what I was looking for.”
Read the full paper in Genome Biology, titled “Topconfects: a package for confident effect sizes in differential expression analysis provides a more biologically useful ranked gene list”.
Data Fluency For Next Generation Research
The Monash Bioinformatics Platform and Monash Library have joined forces to start a cross-disciplinary initiative called Data Fluency for Research, with the aim of building researcher capability in the use of digital research tools.
The initiative was officially launched by Provost and Senior Vice-President Marc Parlange at the Sir Louis Matheson Library, Clayton campus, on 23 March 2018. Participants at the Caulfield and Peninsula campuses were able to join via live-streaming.
A panel of experts shared their experiences in using different digital tools like R, Python, machine learning and others. The panelists included Associate Professor David Powell (Director Bioinformatics Platform), Dr Simon Musgrave (Faculty of Arts), Professor Di Cook (Faculty of Business and Economics) and Professor Geoff Webb (Faculty of Information Technology). Workshops were also held in the afternoon.
The launch is a step towards building a community of passionate, like-minded Monash researchers, graduate researchers and professional staff, growing and sharing expertise across disciplines to enhance knowledge and capability.
Three parallel workshops, introducing R, Python and deep learning, were run by instructors from the Monash Bioinformatics and eResearch platforms. More regular events are being planned under the new initiative. Be part of the community of practice and get involved. For more information, go to the Data Fluency webpage.