NativQA

Natural question answering (QA) datasets play a crucial role in evaluating the capabilities of large language models (LLMs) and in ensuring their effectiveness in real-world applications. Despite the many QA datasets that have been developed, there is a notable lack of region-specific datasets created by native users in their own languages. This gap hinders effective benchmarking of LLMs on regional and cultural specificities and limits the development of fine-tuned models.

In this study, we propose NativQA, a scalable, language-independent framework for constructing culturally and regionally aligned QA datasets in native languages for LLM evaluation and tuning. We demonstrate the efficacy of the framework by building a multilingual natural QA dataset, MultiNativQA, which consists of approximately 64k manually annotated QA pairs in seven languages, ranging from high- to extremely low-resource. The dataset is built from queries issued by native speakers in nine regions and covers 18 topics.

We benchmark both open- and closed-source LLMs using the MultiNativQA dataset. Additionally, we showcase the framework’s efficacy in constructing fine-tuning data, especially for low-resource and dialectally rich languages. Both the NativQA framework and the MultiNativQA dataset have been made publicly available to the community.

MultiNativQA Dataset

Statistics

Topic Coverage

Selected topics used as seeds to collect manual queries:
Animal, Business, Cloth, Education, Events, Food & Drinks, General, Geography, Immigration Related, Language, Literature, Names & Persons, Plants, Religion, Sports & Games, Tradition, Travel, Weather

Language Coverage

Arabic, Assamese, Bangla, English, Hindi, Nepali, Turkish
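To make the coverage above concrete, here is a minimal sketch (not part of the NativQA release) that enumerates one hypothetical seed slot per (language, topic) pair from the two lists on this page. The function name `seed_grid` is illustrative only, and the sketch assumes every topic is paired with every language; the actual framework also varies queries by region, which this sketch ignores.

```python
from itertools import product

# Seed topics and languages exactly as listed on this page.
TOPICS = [
    "Animal", "Business", "Cloth", "Education", "Events", "Food & Drinks",
    "General", "Geography", "Immigration Related", "Language", "Literature",
    "Names & Persons", "Plants", "Religion", "Sports & Games", "Tradition",
    "Travel", "Weather",
]
LANGUAGES = ["Arabic", "Assamese", "Bangla", "English", "Hindi", "Nepali", "Turkish"]


def seed_grid(languages, topics):
    """Hypothetical helper: return every (language, topic) pair as a seed slot."""
    return list(product(languages, topics))


grid = seed_grid(LANGUAGES, TOPICS)
print(len(TOPICS), len(LANGUAGES), len(grid))  # 18 7 126
```

With 18 topics and seven languages, this yields 18 × 7 = 126 language-topic seed slots before any regional split.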

News

Jan 23, 2025	Multilingual and Multimodal Cultural Inclusivity in LLMs
Nov 13, 2024	Fostering Native and Cultural Inclusivity in LLMs

Latest posts

Jul 16, 2024	Arabic Language Technologies – Medium

Publications

AraDiCE: Benchmarks for Dialectal and Cultural Capabilities in LLMs

Basel Mousi, Nadir Durrani, Fatema Ahmad, Md. Arid Hasan, Maram Hasanain, Tameem Kabbani, Fahim Dalvi, Shammur Absar Chowdhury, and Firoj Alam

In Proceedings of the 31st International Conference on Computational Linguistics, Jan 2025


@inproceedings{mousi-etal-2025-aradice,
  title = {AraDiCE: Benchmarks for Dialectal and Cultural Capabilities in LLMs},
  author = {Mousi, Basel and Durrani, Nadir and Ahmad, Fatema and Hasan, Md. Arid and Hasanain, Maram and Kabbani, Tameem and Dalvi, Fahim and Chowdhury, Shammur Absar and Alam, Firoj},
  editor = {Rambow, Owen and Wanner, Leo and Apidianaki, Marianna and Al-Khalifa, Hend and Eugenio, Barbara Di and Schockaert, Steven},
  booktitle = {Proceedings of the 31st International Conference on Computational Linguistics},
  month = jan,
  year = {2025},
  address = {Abu Dhabi, UAE},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2025.coling-main.283/},
  pages = {4186--4218},
}

NativQA: Multilingual Culturally-Aligned Natural Query for LLMs

Md. Arid Hasan, Maram Hasanain, Fatema Ahmad, Sahinur Rahman Laskar, Sunaya Upadhyay, Vrunda N Sukhadia, Mucahid Kutlu, Shammur Absar Chowdhury, and Firoj Alam

arXiv preprint arXiv:2407.09823, 2024


@article{hasan2024nativqa,
  title = {NativQA: Multilingual Culturally-Aligned Natural Query for LLMs},
  author = {Hasan, Md. Arid and Hasanain, Maram and Ahmad, Fatema and Laskar, Sahinur Rahman and Upadhyay, Sunaya and Sukhadia, Vrunda N and Kutlu, Mucahid and Chowdhury, Shammur Absar and Alam, Firoj},
  year = {2024},
  url = {https://arxiv.org/abs/2407.09823},
  journal = {arXiv preprint arXiv:2407.09823},
}