hama on-device NLP

Modalities

This page summarizes the modalities currently shipped in hama and those still on the roadmap.

Available

  • Text → IPA (G2P): Available in Python, Node/Bun, and the browser. Returns IPA plus per-phoneme alignment metadata.

  • Audio → Phoneme (ASR): Available in Python, Node/Bun, and the browser. Accepts waveform input and returns collapsed phoneme output from asr_waveform_fp16.onnx.
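The "collapsed" phoneme output above typically refers to CTC-style decoding: consecutive duplicate frame predictions are merged, then blank tokens are dropped. A minimal sketch of that collapse step (the blank id and the frame values are illustrative, not hama's actual vocabulary or model output):

```python
from itertools import groupby

BLANK_ID = 0  # illustrative; the real blank index comes from the shipped vocabulary

def collapse_ctc(frame_ids: list[int]) -> list[int]:
    """Merge consecutive duplicate frame predictions, then drop blanks."""
    merged = [key for key, _ in groupby(frame_ids)]
    return [i for i in merged if i != BLANK_ID]

# Frame-level argmax output: blanks (0) interleaved with repeated phoneme ids.
frames = [0, 7, 7, 0, 0, 12, 12, 12, 0, 7]
print(collapse_ctc(frames))  # [7, 12, 7]
```

Running the model's per-frame argmax through such a collapse is what turns frame-rate predictions into a clean phoneme sequence.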

Coming soon

  • IPA → Text: Generating text from IPA phoneme sequences is not yet exposed as a public runtime.

  • Audio → Text: Grapheme-level ASR remains separate from the shipped phoneme ASR runtime.

  • Text → Embeddings: Embedding export is still planned.

Runtime coverage

  • Python, Node/Bun, and the browser all support both G2P and waveform-input phoneme ASR.
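Waveform input across these runtimes generally means mono floating-point samples scaled into [-1.0, 1.0] before they reach the ONNX model. A hedged sketch of that preparation step (the exact sample rate and scaling hama expects are assumptions, not its documented contract):

```python
def normalize_waveform(samples: list[float]) -> list[float]:
    """Peak-normalize mono samples into [-1.0, 1.0] before model input.

    The preprocessing the shipped model actually expects (sample rate,
    scaling, channel layout) may differ; this only illustrates the
    general shape of waveform preparation.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return samples  # silence: nothing to scale
    return [s / peak for s in samples]

print(normalize_waveform([0.5, -2.0, 1.0]))  # [0.25, -1.0, 0.5]
```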

Model contracts

  • Input type: text for G2P, waveform for ASR.
  • Output type: IPA with per-phoneme alignments (G2P), or collapsed phoneme tokens (ASR).
  • Shared vocabulary and tokenization rules.
  • ONNX tensor names expected by the runtime wrappers.
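The shared vocabulary contract can be pictured as a single phoneme-to-id table used on both sides: the G2P runtime emits ids from it, and the ASR runtime decodes ids back through it. A hypothetical sketch (the IPA symbols and ids here are illustrative; the real table ships alongside the ONNX models):

```python
# Illustrative shared table; the real vocabulary ships with the models.
PHONEME_TO_ID = {"<blank>": 0, "h": 1, "ə": 2, "l": 3, "oʊ": 4}
ID_TO_PHONEME = {i: p for p, i in PHONEME_TO_ID.items()}

def encode(phonemes: list[str]) -> list[int]:
    """G2P side: IPA symbols -> ids, via the shared table."""
    return [PHONEME_TO_ID[p] for p in phonemes]

def decode(ids: list[int]) -> list[str]:
    """ASR side: ids -> IPA symbols, via the same table inverted."""
    return [ID_TO_PHONEME[i] for i in ids if i != PHONEME_TO_ID["<blank>"]]

ids = encode(["h", "ə", "l", "oʊ"])
print(decode(ids))  # ['h', 'ə', 'l', 'oʊ']
```

Keeping one table for both directions is what lets G2P output and ASR output be compared phoneme-for-phoneme without any remapping.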