MENASpeechBank
MENASpeechBank: A Reference Voice Bank with Persona-Conditioned Multi-Turn Conversations for AudioLLMs
Building better resources for AudioLLMs through the lens of Arabic, MENASpeechBank combines a curated voice bank and a large persona-conditioned conversation pipeline to support evaluation in culturally grounded, multi-turn speech settings.
Scope: Arabic and MENA speech resources, with an extensible framework for other languages.
Overview
The development of AudioLLMs for low-resource languages depends not only on model capability, but also on speech resources that capture dialect diversity, speaker variation, and realistic conversational interaction. MENASpeechBank addresses this need for Arabic and the MENA region by pairing a reference voice bank with a large synthetic conversation pipeline designed for persona-grounded, multi-turn evaluation.
More broadly, MENASpeechBank is not only a resource for Arabic. The pipeline is designed to be reusable for building similar AudioLLM resources in other languages where real conversational speech data is scarce.
Resource at a glance
- 17,641 reference utterances
- 124 speakers in the voice bank
- 18 countries represented
- 469 personas
- 900 topics
- 4,521 conversation scenarios
- ~417,000 multi-turn dialogues
What the resource includes
Reference voice bank
A curated speech bank with 17,641 utterances from 124 speakers across 18 MENA countries, covering English, Modern Standard Arabic, and regional Arabic varieties.
Persona-conditioned conversations
A large synthetic pipeline builds 469 personas, 900 topics, and 4,521 scenarios to generate approximately 417,000 multi-turn dialogues grounded in realistic assistant interactions.
Extensible framework
The design is not limited to Arabic. MENASpeechBank also serves as a template for building AudioLLM resources in other low-resource and dialect-rich languages.
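As a concrete illustration of how these pieces fit together, a single conversation entry in a resource like this might combine speaker, persona, scenario, and turn-level audio metadata. The field names below are hypothetical, chosen only to make the structure tangible; consult the dataset card for the actual schema.

```python
# Illustrative record layout for one persona-conditioned dialogue.
# All field names here are hypothetical, not the dataset's real schema.
example_record = {
    "speaker_id": "spk_042",          # one of the 124 voice-bank speakers
    "dialect": "Gulf Arabic",         # English, MSA, or a regional Arabic variety
    "persona_id": "persona_117",      # one of the 469 personas
    "scenario_id": "scenario_2301",   # one of the 4,521 scenarios
    "turns": [
        # User turns carry synthesized audio tied to the speaker's reference voice;
        # assistant turns are text-only in this sketch.
        {"role": "user", "text": "...", "audio": "spk_042/turn_0.wav"},
        {"role": "assistant", "text": "..."},
    ],
}
```

Keeping speaker, persona, and scenario as separate identifiers (rather than flattening them into each turn) is what makes it possible to study how model behavior varies along each axis independently.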
Why it matters for AudioLLMs
MENASpeechBank supports the creation of conversational speech datasets for AudioLLMs, enabling the development and evaluation of models that maintain conversational context, adapt to speaker characteristics, and respond appropriately in culturally grounded interactions that reflect real language use across the MENA region.
Framework
The paper describes a controllable pipeline that constructs persona profiles, defines conversational scenarios, matches personas to scenarios, generates role-play conversations, and synthesizes user turns while preserving speaker identity through reference audio. This makes the resource useful not only for benchmark creation, but also for studying how model behavior changes across speakers, dialects, and interaction settings.
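The pipeline stages described above can be sketched in miniature. The sketch below is an assumption-laden outline, not the paper's implementation: the matching heuristic, turn generator, and synthesis step are all placeholders standing in for the LLM role-play and voice-cloning TTS components the paper actually uses.

```python
from dataclasses import dataclass

@dataclass
class Persona:
    name: str
    dialect: str        # e.g. "Levantine Arabic"
    traits: list[str]   # free-text interests/attributes

@dataclass
class Scenario:
    topic: str
    setting: str        # e.g. "asking a voice assistant for travel advice"

def match_personas(personas: list[Persona], scenario: Scenario) -> list[Persona]:
    """Naive stand-in for persona-scenario matching: keep personas whose
    traits mention the scenario topic, falling back to the first persona."""
    hits = [p for p in personas if scenario.topic.lower() in " ".join(p.traits).lower()]
    return hits or personas[:1]

def generate_dialogue(persona: Persona, scenario: Scenario, n_turns: int = 4) -> list[dict]:
    """Stand-in for the LLM role-play step: alternate user/assistant turns.
    A real pipeline would condition the generator on the persona profile."""
    return [
        {"role": "user" if i % 2 == 0 else "assistant",
         "text": f"[{persona.name}'s turn {i + 1} about {scenario.topic}]"}
        for i in range(n_turns)
    ]

def synthesize_user_turns(dialogue: list[dict], reference_audio: str) -> list[dict]:
    """Stand-in for voice-cloning TTS: attach the speaker's reference audio
    to user turns so synthesized speech preserves speaker identity."""
    for turn in dialogue:
        if turn["role"] == "user":
            turn["audio_ref"] = reference_audio
    return dialogue
```

Separating matching, text generation, and speech synthesis into independent stages is what makes such a pipeline controllable: each stage can be swapped or rerun (e.g. a different dialect's reference audio) without regenerating everything upstream.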
Citation
@article{ali2026menaspeechbank,
  title={MENASpeechBank: A Reference Voice Bank with Persona-Conditioned Multi-Turn Conversations for AudioLLMs},
  author={Ali, Zien Sheikh and Bhatti, Hunzalah Hassan and Nandi, Rabindra Nath and Chowdhury, Shammur Absar and Alam, Firoj},
  journal={arXiv preprint arXiv:2602.07036},
  year={2026}
}