
Alham Fikri Aji

Adjunct Assistant Professor, Data Science

Dr. Alham Fikri Aji is an Adjunct Assistant Professor at Monash University, Indonesia, and an Assistant Professor at Mohamed bin Zayed University of Artificial Intelligence (MBZUAI). He received his Ph.D. and MSc. from the University of Edinburgh. Before returning to academia, Aji worked as an NLP scientist at an Indonesian start-up and then moved to Amazon Alexa.

Aji's work revolves around efficient and accessible NLP. Specifically, he explores how to train and deploy NLP models efficiently through parameter-efficient fine-tuning, model compression, and distillation. Because a lack of data leaves many NLP technologies inaccessible, part of his work involves constructing multilingual NLP corpora for training and evaluation, especially for the languages of Indonesia. He has also worked on evaluating LLMs on cultural and local nuances, which many NLP systems and datasets fail to capture. Aji is active in grassroots NLP communities, particularly in Indonesia and Southeast Asia, where he leads and contributes to several community-based research projects.

Aji was active in competitive programming and was one of Indonesia's representatives at IOI 2010, where he won a silver medal. He also won several ACM-ICPC competitions and competed in the World Finals in 2014.

COPAL-ID: Indonesian Language Reasoning with Local Culture and Nuances. Haryo Akbarianto Wibowo, Erland Hilman Fuadi, Made Nindyatama Nityasya, Radityo Eko Prasojo, Alham Fikri Aji (NAACL, 2024)
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions. Minghao Wu, Abdul Waheed, Chiyu Zhang, Muhammad Abdul-Mageed, Alham Fikri Aji (EACL, 2024)
LLM-powered Data Augmentation for Enhanced Crosslingual Performance. Chenxi Whitehouse, Monojit Choudhury, Alham Fikri Aji (EMNLP, 2023)
Multilingual Large Language Models Are Not (Yet) Code-Switchers. Ruochen Zhang, Samuel Cahyawijaya, Jan Christian Blaise Cruz, Alham Fikri Aji (EMNLP, 2023)
NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages. Samuel Cahyawijaya, Holy Lovenia, Fajri Koto, Dea Adhista, Emmanuel Dave, Sarah Oktavianti, Salsabil Maulana Akbar, Jhonson Lee, Nuur Shadieq, Tjeng Wawan Cenggoro, Hanung Wahyuning Linuwih, Bryan Wilie, Galih Pradipta Muridan, Genta Indra Winata, David Moeljadi, Alham Fikri Aji, Ayu Purwarianti, Pascale Fung (AACL, 2023)
Crosslingual Generalization through Multitask Finetuning. Niklas Muennighoff, Thomas Wang, Lintang Sutawika, Adam Roberts, Stella Biderman, Teven Le Scao, M Saiful Bari, Sheng Shen, Zheng Xin Yong, Hailey Schoelkopf, Xiangru Tang, Dragomir Radev, Alham Fikri Aji, Khalid Almubarak, Samuel Albanie, Zaid Alyafeai, Albert Webson, Edward Raff, Colin Raffel (ACL, 2023)
NusaCrowd: Open Source Initiative for Indonesian NLP Resources. Samuel Cahyawijaya, Holy Lovenia, Alham Fikri Aji, Genta Indra Winata, Bryan Wilie, Fajri Koto, Rahmad Mahendra, et al. (ACL, 2023)
Multi-lingual and Multi-cultural Figurative Language Understanding. Anubha Kabra, Emmy Liu, Simran Khanuja, Alham Fikri Aji, Genta Indra Winata, Samuel Cahyawijaya, Anuoluwapo Aremu, Perez Ogayo, Graham Neubig (ACL, 2023)
NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages. Genta Indra Winata, Alham Fikri Aji, Samuel Cahyawijaya, Rahmad Mahendra, Fajri Koto, Ade Romadhony, Kemal Kurniawan, David Moeljadi, Radityo Eko Prasojo, Pascale Fung, Timothy Baldwin, Jey Han Lau, Rico Sennrich, Sebastian Ruder (EACL, 2023). Outstanding Paper Award
One Country, 700+ Languages: NLP Challenges for Underrepresented Languages and Dialects in Indonesia. Alham Fikri Aji, Genta Indra Winata, Fajri Koto, Samuel Cahyawijaya, Ade Romadhony, Rahmad Mahendra, Kemal Kurniawan, David Moeljadi, Radityo Eko Prasojo, Timothy Baldwin, Jey Han Lau, Sebastian Ruder (ACL, 2022)
IndoNLI: A Natural Language Inference Dataset for Indonesian. Rahmad Mahendra, Alham Fikri Aji, Samuel Louvan, Fahrurrozi Rahman, Clara Vania (EMNLP, 2021)
In Neural Machine Translation, What Does Transfer Learning Transfer? Alham Fikri Aji, Nikolay Bogoychev, Kenneth Heafield, Rico Sennrich (ACL, 2020)
Combining Global Sparse Gradients with Local Gradients in Distributed Neural Network Training. Alham Fikri Aji, Kenneth Heafield, Nikolay Bogoychev (EMNLP, 2019)
Accelerating Asynchronous Stochastic Gradient Descent for Neural Machine Translation. Nikolay Bogoychev, Marcin Junczys-Dowmunt, Kenneth Heafield, Alham Fikri Aji (EMNLP, 2018)
Sparse Communication for Distributed Gradient Descent. Alham Fikri Aji, Kenneth Heafield (EMNLP, 2017)

For a more up-to-date publication list, see Aji's Google Scholar page.

Research

Efficient NLP, such as:

Knowledge distillation
Parameter-efficient fine-tuning
Mixture of experts
Model compression

Multilingual/Indonesian NLP, such as:

Corpus building
Synthetic data
Zero-shot cross-lingual generalization
Locally and culturally nuanced evaluation in NLP
Code-switching and code-mixing

Teaching

Introduction to Data Science (Monash University, Indonesia)
Advanced Natural Language Processing (MBZUAI)
Deep Learning for Language Processing (MBZUAI)

Exploring the Use of Generative AI for Moderating Online Polarisation (PI), Monash Data Futures Institute, Amount: AUD $49,500 (2023-2024)
Inclusive Language Technologies: Data Curation and Large Pretrained Language Models (PI), Monash University FIT, Action Lab, Amount: AUD $33k (2023-2024)
Social Media Analysis of Misinformation and Vaccine Hesitancy in Three Middle-Income Countries (Co-PI), Center for Emerging Infectious Diseases Policy & Research (CEID), Boston University, Amount: USD $25k (2022-2023)
Exploring the evolution of racial biases over time through framing analysis (PI), Google Research Scholar Award, Amount: USD $60k (2021-2022)
Explore CS Research (Research Workshop for Female Undergraduates) (PI), Google, Amount: USD $18k (2020)
LEARN: Label-Efficient Active Resilient Network (Co-PI), DARPA Learning with Less Labels (LwLL), DARPA, Amount: USD $462k (2019–2022)
Semi-supervised Learning of Multimodal Representations (Co-PI), DARPA Active Interpretation of Disparate Alternatives (AIDA), Amount: USD $200k (2019–2020)
Bridging Linguistic and Visual Knowledge through Visual Genome (PI), BU Hariri Institute Research Incubation Award, Amount: USD $28k (2019–2020)
BIGDATA: IA: Multiplatform, Multilingual, and Multimodal Tools for Analyzing Public Communication in over 100 Languages (Co-PI), NSF, Amount: USD $1m (2018–2022)

AI Governance and Research Symposium at ATxSG, organised by AI Singapore and the Digital Trust Centre Singapore, NUS FinTech Lab, 2024.
Open Sourcing Active Question Reformulation with Reinforcement Learning, Google Research, 2018.
UI Sweeps Gemastik 2010 ("UI Sapu Bersih Gemastik 2010"), Kompas.com, 2010.
Indonesian Computer Olympiad Team Wins 2 Silver and 1 Bronze ("Tim Olimpiade Komputer Indonesia Raih 2 Perak dan 1 Perunggu"), Republika.co.id, 2010.

Best Resource Paper Award, EACL 2024
Best Resource Paper Award, AACL 2023
Outstanding Paper Award, EACL 2023
Outstanding Contribution Award, WNGT 2019
World Finalist, ACM-ICPC 2014
Silver Medal, International Olympiad in Informatics (IOI) 2010