Samuel Cahyawijaya
The Hong Kong University of Science and Technology (HKUST)
I am currently a third-year PhD student at the Centre for Artificial Intelligence Research (CAiRE), HKUST, focusing on multilingualism for low-resource languages, especially Southeast Asian languages.
In addition to pursuing my PhD, I supervise undergraduate and master's students who are interested in NLP research. If you are interested in learning more about NLP, please feel free to reach out to me via email.
⌛ Previously
- Together with my research colleagues, I started an Indonesian research community, 🇮🇩 IndoNLP. Since 2020, we have been active in NLP research on Indonesian indigenous languages (e.g., IndoNLU, IndoNLG, One Country 700+ Languages, NusaX, NusaCrowd, NusaWrites).
- 🎓 I received my MPhil from The Hong Kong University of Science and Technology, Hong Kong 🇭🇰, where I studied from 2019 to 2021. I worked on efficient NLP modeling and low-resource language research as part of the Centre for Artificial Intelligence Research (CAiRE), HKUST.
- From 2017 to 2019, I built my career in data science and resource-oriented research in Indonesia 🇮🇩. I worked as a Data Scientist at UangTeman, a Senior Research Engineer at Prosa.ai, and a Senior Data Scientist at Julo.
- From 2014 to 2017, I built two startups in Indonesia 🇮🇩:
  - PT. Devarta Kencana Indonesia, a for-profit corporation focusing on building a tool for rapid software development.
  - Yayasan Pelita Cakrawala Inspirasi, a non-profit organization focusing on building WeCare.id, a healthcare crowdsourcing platform in Indonesia.
- 🎓 I received my Bachelor's degree in Computer Science from Bandung Institute of Technology (ITB), Indonesia 🇮🇩, in 2014.
- From 2013 to 2015, I worked on various research projects on integrated modular avionics, real-time visual tracking, and 3D computer graphics at Bandung Institute of Technology (ITB), Indonesia 🇮🇩.
🏆 Awards
- Resource Award at IJCNLP-AACL 2023 for NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages (November 2023)
- Area Chair's Award at IJCNLP-AACL 2023 for A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity (November 2023)
- Best Paper Award at SEALP 2023 for IndoToD: A Multi-Domain Indonesian Benchmark For End-to-End Task-Oriented Dialogue Systems (November 2023)
- Outstanding Paper Award at EACL 2023 for NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages (May 2023)
- Best Student Paper Award at DialDoc 2022 for Retrieval-Free Knowledge-Grounded Dialogue Response Generation with Adapters (May 2022)
- Honorable Mention Award at NLP4ConvAI for XPersona: Evaluating Multilingual Personalized Chatbot (November 2021)
- Hong Kong PhD Fellowship from the Research Grants Council of Hong Kong (September 2021)
- Merit Award in the e-Inclusion Category at INAICTA 2014 (August 2014)
- Semi-Finalist in the World Citizenship Category of Imagine Cup 2014 (August 2014)
- 1st Place Winner of the Data Mining Competition at Gemastik 6 (October 2013)
- 1st Place Winner of the Debugging Competition at Gemastik 6 (October 2013)
- 3rd Place Winner of Samsung App Challenge 2013 (September 2013)
- 1st Place Winner of the Innovation Category of Imagine Cup 2013 (March 2013)
💡 Interests
I am currently researching multiculturalism and multilingualism across various language families and modalities.
You can take a look at my resume here.
If you are interested in collaborating or discussing my research, please feel free to reach out to me via email.
News
Date | News |
---|---|
Dec 12, 2023 | Our collaborative research papers Multilingual Large Language Models Are Not Yet Code-Switchers and Prompting Multilingual Large Language Models to Generate Code-Mixed Texts: The Case of South East Asian Languages have been published in EMNLP 2023 and the CALCS Workshop 2023, respectively! |
Nov 6, 2023 | 🎉 So happy that 3 out of 4 of our papers published at AACL 2023 received awards! |
Sep 19, 2023 | So proud of our latest IndoNLP collaboration project! 🚀 Introducing NusaWrites, our groundbreaking project accepted at AACL 2023. 📚🌍 Dive deep into our analysis of corpus collection strategies and explore a comprehensive language modeling benchmark for underrepresented and extremely low-resource 🇮🇩 local languages. |
May 4, 2023 | EACL 2023 Outstanding Paper Award for NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages. |
May 1, 2023 | NusaCrowd has been published in Findings of ACL 2023. So proud of our IndoNLP community: from a joint collaboration to 100+ datasets! |
Publication Highlights
- arXiv preprint arXiv:2305.13627. 2023.
- arXiv preprint arXiv:2305.14235. 2023.
- In Findings of the Association for Computational Linguistics: ACL 2023, Toronto, Canada, 9-14 July 2023.
- In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics. May 2023.
- In Tiny Papers of the Eleventh International Conference on Learning Representations (ICLR), Kigali, Rwanda, 5 May 2023. May 2023.
- In Proceedings of the 13th International Workshop on Health Text Mining and Information Analysis (LOUHI). Dec 2022.
- One Country, 700+ Languages: NLP Challenges for Underrepresented Languages and Dialects in Indonesia. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2022, Dublin, Ireland, May 22-27, 2022. May 2022.
- In Proceedings of the 21st Workshop on Biomedical Language Processing, BioNLP@ACL 2022, Dublin, Ireland, May 26, 2022. May 2022.
- In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 7-11 November, 2021. Nov 2021.
- In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, AACL/IJCNLP 2020, Suzhou, China, December 4-7, 2020. Dec 2020.