Built with Mozilla Common Voice

Every time the hama demo transcribes phonemes in your browser, part of what it knows comes from people who donated their voices. One of the datasets behind hama's phoneme ASR model is Mozilla Common Voice English, now distributed through the Mozilla Data Collective.

hama is an on-device pronunciation runtime: grapheme-to-phoneme conversion and phoneme-level speech recognition that run locally in Python, Node, and the browser. The models ship inside the package. Nothing you type or say is sent anywhere.

Many voices, no language model

A phoneme recognizer has no language model to lean on; it transcribes exactly the sounds it hears. That makes voice diversity the whole game. LibriSpeech, another of our training sets, is built from audiobooks, which means careful, deliberate read-aloud narration. Common Voice contributes thousands of ordinary speakers with different accents, microphones, rooms, and reading styles. Training on volunteer voices from everywhere is the difference between recognizing an um in anyone's mouth and recognizing it only in a narrator's.

Clear terms, shippable weights

Licensing matters just as much. Community datasets with clear terms are what allow us to train models and ship the weights inside an Apache 2.0 package that works offline. Data agency runs in both directions here: contributors chose to share their voices, and the people who use hama keep their audio on their own machines.

You can hear the result yourself: the "Ums considered harmful" experiment runs the phoneme ASR model live in your browser, and the homepage demo does the same for grapheme-to-phoneme conversion.

The full list of datasets hama is trained on, including WikiPron for the multilingual G2P side, is in the acknowledgements. If you build on community data, credit it; it is the supply chain of everything we make.

Explore community datasets at the Mozilla Data Collective.