MENASpeechBank
MENASpeechBank: A Reference Voice Bank with Persona-Conditioned Multi-Turn Conversations for AudioLLMs
Building better resources for AudioLLMs through the lens of Arabic, MENASpeechBank combines a curated voice bank and a large persona-conditioned conversation pipeline to support evaluation in culturally grounded, multi-turn speech settings.
Scope: Arabic and MENA speech resources, with an extensible framework for other languages.
Overview
The development of AudioLLMs for low-resource languages depends not only on model capability, but also on speech resources that capture dialect diversity, speaker variation, and realistic conversational interaction. MENASpeechBank addresses this need for Arabic and the MENA region by pairing a reference voice bank with a large synthetic conversation pipeline designed for persona-grounded, multi-turn evaluation.
More broadly, MENASpeechBank is not only a resource for Arabic. The pipeline is designed to be reusable for building similar AudioLLM resources in other languages where real conversational speech data is scarce.
Resource at a glance
- 17,641 reference utterances
- 124 speakers in the voice bank
- 18 countries represented
- 469 personas
- 900 topics
- 4,521 conversation scenarios
- ~417,000 multi-turn dialogues
What the resource includes
Reference voice bank
A curated speech bank with 17,641 utterances from 124 speakers across 18 MENA countries, covering English, Modern Standard Arabic, and regional Arabic varieties.
Persona-conditioned conversations
A large synthetic pipeline builds 469 personas, 900 topics, and 4,521 scenarios to generate approximately 417,000 multi-turn dialogues grounded in realistic assistant interactions.
Extensible framework
The design is not limited to Arabic. MENASpeechBank also serves as a template for building AudioLLM resources in other low-resource and dialect-rich languages.
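As a concrete illustration of how these pieces fit together, a single conversation entry in a resource like this might combine speaker, persona, scenario, and turn-level audio metadata. The field names below are hypothetical, chosen only to make the structure tangible; consult the dataset card for the actual schema.

```python
# Illustrative record layout for one persona-conditioned dialogue.
# All field names here are hypothetical, not the dataset's real schema.
example_record = {
    "speaker_id": "spk_042",          # one of the 124 voice-bank speakers
    "dialect": "Gulf Arabic",         # English, MSA, or a regional Arabic variety
    "persona_id": "persona_117",      # one of the 469 personas
    "scenario_id": "scenario_2301",   # one of the 4,521 scenarios
    "turns": [
        # User turns carry synthesized audio tied to the speaker's reference voice;
        # assistant turns are text-only in this sketch.
        {"role": "user", "text": "...", "audio": "spk_042/turn_0.wav"},
        {"role": "assistant", "text": "..."},
    ],
}
```

Keeping speaker, persona, and scenario as separate identifiers (rather than flattening them into each turn) is what makes it possible to study how model behavior varies along each axis independently.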
Why it matters for AudioLLMs
MENASpeechBank supports the creation of conversational speech datasets for AudioLLMs, enabling the development and evaluation of models that maintain conversational context, adapt to speaker characteristics, and respond appropriately in culturally grounded interactions that reflect real language use across the MENA region.
Framework
The paper describes a controllable pipeline that constructs persona profiles, defines conversational scenarios, matches personas to scenarios, generates role-play conversations, and synthesizes user turns while preserving speaker identity through reference audio. This makes the resource useful not only for benchmark creation, but also for studying how model behavior changes across speakers, dialects, and interaction settings.
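The pipeline stages described above can be sketched in miniature. The sketch below is an assumption-laden outline, not the paper's implementation: the matching heuristic, turn generator, and synthesis step are all placeholders standing in for the LLM role-play and voice-cloning TTS components the paper actually uses.

```python
from dataclasses import dataclass

@dataclass
class Persona:
    name: str
    dialect: str        # e.g. "Levantine Arabic"
    traits: list[str]   # free-text interests/attributes

@dataclass
class Scenario:
    topic: str
    setting: str        # e.g. "asking a voice assistant for travel advice"

def match_personas(personas: list[Persona], scenario: Scenario) -> list[Persona]:
    """Naive stand-in for persona-scenario matching: keep personas whose
    traits mention the scenario topic, falling back to the first persona."""
    hits = [p for p in personas if scenario.topic.lower() in " ".join(p.traits).lower()]
    return hits or personas[:1]

def generate_dialogue(persona: Persona, scenario: Scenario, n_turns: int = 4) -> list[dict]:
    """Stand-in for the LLM role-play step: alternate user/assistant turns.
    A real pipeline would condition the generator on the persona profile."""
    return [
        {"role": "user" if i % 2 == 0 else "assistant",
         "text": f"[{persona.name}'s turn {i + 1} about {scenario.topic}]"}
        for i in range(n_turns)
    ]

def synthesize_user_turns(dialogue: list[dict], reference_audio: str) -> list[dict]:
    """Stand-in for voice-cloning TTS: attach the speaker's reference audio
    to user turns so synthesized speech preserves speaker identity."""
    for turn in dialogue:
        if turn["role"] == "user":
            turn["audio_ref"] = reference_audio
    return dialogue
```

Separating matching, text generation, and speech synthesis into independent stages is what makes such a pipeline controllable: each stage can be swapped or rerun (e.g. a different dialect's reference audio) without regenerating everything upstream.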
Citation
@article{ali2026menaspeechbank,
  title={MENASpeechBank: A Reference Voice Bank with Persona-Conditioned Multi-Turn Conversations for AudioLLMs},
  author={Ali, Zien Sheikh and Bhatti, Hunzalah Hassan and Nandi, Rabindra Nath and Chowdhury, Shammur Absar and Alam, Firoj},
  journal={arXiv preprint arXiv:2602.07036},
  year={2026}
}