Bioinformatics Tools and Computing Resources
We have developed several bioinformatics tools and applications tailored to particular research problems. We also offer tools for data analysis, visualisation and exploration.
- Degust is our flagship visualisation tool for differential expression analysis of RNA-Seq data.
- RNAsik is an end-to-end RNA-Seq pipeline that applies best practices from the current literature with minimal configuration. It is built on the BigDataScript workflow engine, which allows analyses to scale from small servers to large HPC clusters.
- Varistran: Varistran is an R package providing a Variance Stabilizing Transformation appropriate for RNA-Seq data. This transformation renders RNA-Seq data suitable for visualization and further analysis such as clustering. Varistran also includes an interactive "shiny" app for assessing RNA-Seq data quality, which can help identify batch effects or mis-labelled samples.
- Topconfects: Topconfects is an R package for differential expression analysis without p-values, instead calculating confident effect sizes while maintaining an FDR. This allows the genes with the largest confident effect sizes to be identified, and is especially useful when so many genes are significantly differentially expressed that they cannot all be examined. The package builds on the TREAT method of McCarthy and Smyth.
- TCPseq is a pipeline we are actively developing to analyse TCP-seq data in yeast. (Nature Protocols 2017: 12, 697–731)
- Bio-ansible simplifies configuring servers and installing bioinformatics packages using the Ansible configuration management tool. We use this on all our main computing platforms to configure a common software stack and help improve reproducibility of analysis.
- RaftProt: A database of mammalian proteins localised to cholesterol-rich membrane microdomains (lipid rafts). Lipid rafts are specialized regions on the cell surface that regulate a diverse range of cellular functions. RaftProt hosts information about the lipid raft proteome identified through 117 mass spectrometry-based proteomics experiments.
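The variance-stabilising idea mentioned for Varistran above can be illustrated with a minimal sketch. This is not Varistran's own method or code: it assumes plain Poisson-distributed counts and uses the classical Anscombe transform, 2*sqrt(x + 3/8), whereas real RNA-Seq counts are usually overdispersed and Varistran is an R package with its own transformation. All function names below are hypothetical and chosen only for illustration.

```python
import math
import random
import statistics


def anscombe(count):
    """Classical Anscombe transform for a Poisson count (illustrative only)."""
    return 2.0 * math.sqrt(count + 3.0 / 8.0)


def sample_poisson(mean, rng):
    """Draw one Poisson variate using Knuth's multiplication method."""
    threshold = math.exp(-mean)
    k = 0
    product = rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def variances(mean, n=5000, seed=0):
    """Return (raw variance, transformed variance) for simulated counts."""
    rng = random.Random(seed)
    counts = [sample_poisson(mean, rng) for _ in range(n)]
    raw = statistics.pvariance(counts)
    stabilised = statistics.pvariance([anscombe(c) for c in counts])
    return raw, stabilised


if __name__ == "__main__":
    # Raw variance grows with the mean; transformed variance stays roughly constant.
    for mean in (10, 50, 200):
        raw, stabilised = variances(mean)
        print(f"mean={mean:>3}  raw variance={raw:8.1f}  "
              f"transformed variance={stabilised:.2f}")
```

On the raw scale, highly expressed features dominate any distance-based method such as clustering simply because their variance is larger. After a variance-stabilising transformation, features at different expression levels contribute on a comparable scale, which is the property the Varistran description refers to.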
Technology and Computing Resources
We use a range of computing platforms for our collaborative research work and are happy to provide advice and assistance to researchers who need access to compute and storage for bioinformatics. The primary platforms we use are:
- NeCTAR - the Platform uses the NeCTAR research cloud extensively, via the Monash node R@Cmon, supported by the Monash eResearch Centre.
- Monash HPC (Massive, M3) - our standard toolset is available on M3, installed via bio-ansible. M3 contains 87+ nodes, typically with 24 cores and 128 GB or 256 GB of RAM per node. Special high-RAM nodes with 1 TB+ of memory are also available for de novo genome assembly. M3 also provides GPU nodes that are particularly suited to deep learning and image processing.
- Amazon Web Services (AWS) - we use AWS to supplement the local cloud and high performance cluster computing resources at Monash.
- Galaxy - researchers who would like to use Galaxy should consider the official Galaxy Australia server. Refer to the recorded webinar video for more information on available Galaxy resources in Australia.
- VicNode - our primary data storage is provided by the VicNode arm of the nationally funded Research Data Services, supported by the Monash eResearch Centre. We store primary data and analysis outputs for projects we collaborate on. Researchers are encouraged to contact Monash eResearch if they require long-term archival of large datasets.
Where possible, we use our bio-ansible project for configuration management to provide a consistent set of tools across computing platforms. This eases the administrative burden and improves the reproducibility of analyses.