Modalities
This page summarizes what is currently shipped in hama and what remains on the roadmap.
Available
Text → IPA (G2P): Available in Python, Node/Bun, and the browser. Returns IPA plus per-phoneme alignment metadata.
Audio → Phoneme (ASR): Available in Python, Node/Bun, and the browser. Accepts waveform input and returns collapsed phoneme output from asr_waveform_fp16.onnx.
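This page does not define "collapsed". One common reading, assumed here, is CTC-style greedy decoding: take the best token for each frame, merge consecutive repeats, and drop the blank token. The sketch below illustrates that idea in plain Python. The function name, the blank index of 0, and the toy vocabulary are all hypothetical and are not taken from hama.

```python
from typing import List, Sequence


def collapse_frames(frame_ids: Sequence[int], blank_id: int = 0) -> List[int]:
    """Merge consecutive repeats, then drop blanks (CTC-style greedy collapse)."""
    collapsed: List[int] = []
    previous = None
    for token_id in frame_ids:
        # Emit a token only when it differs from the previous frame and is not blank.
        if token_id != previous and token_id != blank_id:
            collapsed.append(token_id)
        previous = token_id
    return collapsed


# Toy vocabulary for illustration only (an assumption, not hama's vocabulary).
VOCAB = {0: "<blank>", 1: "h", 2: "a", 3: "m"}

# Per-frame argmax token ids, as a frame-level ASR model might produce them.
frames = [0, 1, 1, 0, 2, 2, 2, 0, 3, 0, 2, 2]
phonemes = [VOCAB[i] for i in collapse_frames(frames)]
print(phonemes)
```

Note that a blank between two identical tokens keeps both of them, which is how this scheme can represent a genuinely repeated phoneme.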
Coming soon
IPA → Text: Reverse text generation from phoneme sequences is not exposed as a public runtime today.
Audio → Text: Grapheme-level ASR remains separate from the shipped phoneme ASR runtime.
Text → Embeddings: Embedding export is still planned.
Runtime coverage
- Browser supports both G2P and waveform-input phoneme ASR.
- Node/Bun supports both G2P and waveform-input phoneme ASR.
- Python supports both G2P and waveform-input phoneme ASR.
Model contracts
- Input type: text for G2P, waveform for ASR.
- Output type: IPA plus alignments, or phoneme ASR tokens.
- Shared vocabulary and tokenization rules.
- ONNX tensor names expected by the runtime wrappers.
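The contract items above can be written down as types and a small check. The following is a minimal sketch rather than hama's actual API: the class names, field names, and tensor names are hypothetical placeholders chosen for illustration. It shows one possible shape for the G2P output (IPA plus alignments) and a helper that reports which expected tensor names a model does not expose.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class PhonemeAlignment:
    """Hypothetical per-phoneme alignment record for G2P output."""
    phoneme: str
    char_index: int  # assumed: index of the source character for this phoneme


@dataclass(frozen=True)
class G2PResult:
    """Hypothetical G2P output: an IPA string plus per-phoneme alignments."""
    ipa: str
    alignments: List[PhonemeAlignment]


def missing_tensor_names(actual: Iterable[str], expected: Iterable[str]) -> Set[str]:
    """Return the expected tensor names that the model does not expose."""
    return set(expected) - set(actual)


# Placeholder names; the real ones are defined by the exported ONNX graphs.
expected_inputs = ["waveform", "waveform_length"]
model_inputs = ["waveform"]
print(missing_tensor_names(model_inputs, expected_inputs))
```

With onnxruntime, the actual input names could be collected from an `InferenceSession` via `[i.name for i in session.get_inputs()]` and passed in as `actual`, so a wrapper can fail early when a model file and a runtime disagree.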