Harnessing dual strengths in computer vision and natural language processing
At the forefront of vision and language research, we are a multimodal group using interdisciplinary expertise to solve big challenges and make a lasting contribution to society.

Our projects
Our award-winning research aims to empower machines to comprehend and produce images, scenery, text and languages.

Our people
With experts in vision and language, we’re leveraging the synergy between these disciplines to drive innovation – and advance the world.

Our publications
Discover our research outputs that are shaping the future of technology and society.

Collaborate with us
Vision and language capabilities are integral to almost every organisation and sector. If you would like to collaborate or partner with us, we are keen to hear from you.

PhD opportunities
Are you a PhD student looking to support one of our projects? Head to Supervisor Connect to find an initiative.

Contact us
Associate Professor Reza Haffari
E: Gholamreza.Haffari@monash.edu

Our expertise at a glance
- Joint visual and language learning, e.g., image/video captioning, vision-and-language navigation, visual question answering
- Multilingual natural language understanding and generation, e.g., machine translation, biomedical NLP, dialogue and chatbots
- Knowledge graph representation, reasoning, question answering and network representation learning
- 3D visual computing and analytics