Google’s new WAXAL dataset is expected to bridge a critical digital divide for over 100 million speakers by providing foundational data for 21 Sub-Saharan African languages, including Shona, Luganda, Yoruba, and Swahili.
The search giant and a consortium of leading African research institutions announced the launch of WAXAL last week.
What is WAXAL?
WAXAL is a large-scale, openly accessible speech dataset designed to catalyse research and build more inclusive AI technologies.
While voice-enabled technologies have become common in much of the world, a profound scarcity of high-quality speech data has prevented their development for most of Africa’s 2,000+ languages.
This has excluded hundreds of millions of people from accessing technology in their native tongues.
Bridging the gap
Google said the WAXAL dataset was created to directly address this gap.
Developed over three years with funding from Google, the project features 1,250 hours of transcribed, natural speech and over 20 hours of high-quality studio recordings designed for building high-fidelity synthetic voices.
“The ultimate impact of WAXAL is the empowerment of people in Africa,” said Aisha Walcott-Bryant, Head of Google Research Africa.
“This dataset provides the critical foundation for students, researchers, and entrepreneurs to build technology on their own terms, in their own languages, finally reaching over 100 million people. We look forward to seeing African innovators use this data to create everything from new educational tools to voice-enabled services that create tangible economic opportunities across the continent.”
Communities
A central principle of the project was to ensure it was built by and for the community. African academic and community organizations, including Makerere University (Uganda), the University of Ghana, and Digital Umuganda (Rwanda), led the data collection with guidance from Google experts.
These partner institutions retain full ownership of the data, establishing a new framework for equitable, partnership-led AI development.
Languages
The dataset covers the following languages:
- Acholi
- Akan
- Dagaare
- Dagbani
- Dholuo
- Ewe
- Fante
- Fulani (Fula)
- Hausa
- Igbo
- Ikposo (Kposo)
- Kikuyu
- Lingala
- Luganda
- Malagasy
- Masaaba
- Nyankole
- Rukiga
- Shona
- Soga (Lusoga)
- Swahili
- Yoruba
The WAXAL dataset is now available.
Startups
Meanwhile, Google has opened applications for the 10th cohort of the Google for Startups Accelerator Africa.
The 12-week, equity-free “AI First” hybrid program, open to African startups from growth stage to Series A, aims to be a launchpad for scientific breakthroughs, providing founders with the specialised AI tools and research mentorship they need to scale their impact.
“Africa’s tech landscape is seeing a vibrant shift toward deep-tech innovation,” said Folarin Aiyegbusi, Head of Startup Ecosystem, Africa.
“For Class 10, we are focusing on the potential of AI to drive health and societal benefits, providing the infrastructure and expertise to turn these startups into the research labs of the continent.”
Support
Since its inception in 2018, the Google for Startups Accelerator: Africa program has supported 180+ startups from 17 African countries.
Collectively, these startups have raised over $350 million in funding and created more than 3,700 direct job opportunities in the region.
Startups are invited to apply by 18 March 2026.