Samuel Cahyawijaya

The Hong Kong University of Science and Technology (HKUST)


I’m currently a third-year PhD student at the Centre for Artificial Intelligence Research (CAiRE), HKUST, focusing on multilingualism for low-resource languages, especially Southeast Asian languages.

In addition to pursuing my PhD, I also supervise undergraduate and master's students who are interested in NLP research. If you are interested in learning more about NLP, please feel free to reach out to me via email.

⌛ Previously

🏆 Awards

💡 Interests

As of now, I’m researching multiculturalism and multilingualism across various language families and modalities.

You can take a look at my resume here.

If you are interested in collaborating or discussing these topics, please feel free to reach out to me via email.


Dec 12, 2023 Our collaborative research papers Multilingual Large Language Models Are Not Yet Code-Switchers and Prompting Multilingual Large Language Models to Generate Code-Mixed Texts: The Case of South East Asian Languages have been published at EMNLP 2023 and the CALCS Workshop 2023, respectively!
Nov 6, 2023 🎉 So happy that 3 out of 4 of our papers published at AACL 2023 received an award!
Sep 19, 2023 So proud of our latest IndoNLP 🇮🇩 collaboration project! 🚀 Introducing NusaWrites, our groundbreaking project accepted at AACL 2023. 📚🌍 Dive deep into our analysis of corpus collection strategies and explore a comprehensive language modeling benchmark for underrepresented and extremely low-resource 🇮🇩 local languages.
May 4, 2023 EACL 2023 Outstanding Paper Award for NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages.
May 1, 2023 NusaCrowd is published in ACL Findings 2023. So proud of our IndoNLP 🇮🇩 community! From a joint collaboration to 100+ datasets.

Publication Highlight

  1. Samuel Cahyawijaya, Holy Lovenia, Fajri Koto, Dea Adhista, Emmanuel Dave, Sarah Oktavianti, Salsabil Maulana Akbar, Jhonson Lee, Nuur Shadieq, Tjeng Wawan Cenggoro, Hanung Wahyuning Linuwih, Bryan Wilie, Galih Pradipta Muridan, Genta Indra Winata, David Moeljadi, Alham Fikri Aji, Ayu Purwarianti, and Pascale Fung.
    . 2023.
  2. Samuel Cahyawijaya, Holy Lovenia, Tiezheng Yu, Willy Chung, and Pascale Fung.
    arXiv preprint arXiv:2305.13627. 2023.
  3. Ruochen Zhang, Samuel Cahyawijaya, Jan Christian Blaise Cruz, and Alham Fikri Aji.
    arXiv preprint arXiv:2305.14235. 2023.
  4. Samuel Cahyawijaya, Holy Lovenia, Alham Fikri Aji, Genta Indra Winata, Bryan Wilie, Rahmad Mahendra, Christian Wibisono, Ade Romadhony, Karissa Vincentio, Fajri Koto, and others.
    In Findings of the Association for Computational Linguistics: ACL 2023, Toronto, Canada, 9-14 July 2023. 2023.
  5. Genta Indra Winata, Alham Fikri Aji, Samuel Cahyawijaya, Rahmad Mahendra, Fajri Koto, Ade Romadhony, Kemal Kurniawan, David Moeljadi, Radityo Eko Prasojo, Pascale Fung, Timothy Baldwin, Jey Han Lau, Rico Sennrich, and Sebastian Ruder.
    In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics. May 2023.
  6. Muhammad Farid Adilazuarda, Samuel Cahyawijaya, and Ayu Purwarianti.
    In Tiny Papers of Eleventh International Conference on Learning Representations (ICLR), Kigali, Rwanda, 5 May 2023. May 2023.
  7. Samuel Cahyawijaya, Bryan Wilie, Holy Lovenia, Huan Zhong, MingQian Zhong, Yuk-Yu Nancy Ip, and Pascale Fung.
    In Proceedings of the 13th International Workshop on Health Text Mining and Information Analysis (LOUHI). Dec 2022.
  8. Alham Fikri Aji, Genta Indra Winata, Fajri Koto, Samuel Cahyawijaya, Ade Romadhony, Rahmad Mahendra, Kemal Kurniawan, David Moeljadi, Radityo Eko Prasojo, Timothy Baldwin, Jey Han Lau, and Sebastian Ruder.
    In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2022, Dublin, Ireland, May 22-27, 2022. Dec 2022.
  9. Samuel Cahyawijaya, Tiezheng Yu, Zihan Liu, Xiaopu Zhou, Tze Wing Tiffany Mak, Nancy Y. Ip, and Pascale Fung.
    In Proceedings of the 21st Workshop on Biomedical Language Processing, BioNLP@ACL 2022, Dublin, Ireland, May 26, 2022. Dec 2022.
  10. Samuel Cahyawijaya, Genta Indra Winata, Bryan Wilie, Karissa Vincentio, Xiaohong Li, Adhiguna Kuncoro, Sebastian Ruder, Zhi Yuan Lim, Syafri Bahar, Masayu Leylia Khodra, Ayu Purwarianti, and Pascale Fung.
    In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 7-11 November, 2021. Dec 2021.
  11. Bryan Wilie, Karissa Vincentio, Genta Indra Winata, Samuel Cahyawijaya, Xiaohong Li, Zhi Yuan Lim, Sidik Soleman, Rahmad Mahendra, Pascale Fung, Syafri Bahar, and Ayu Purwarianti.
    In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, AACL/IJCNLP 2020, Suzhou, China, December 4-7, 2020. Dec 2020.