hama on-device NLP

APIs

This section documents the public interfaces exposed by the hama Python package and by hama-js through its hama-js/g2p, hama-js/asr, hama-js/g2p/browser, hama-js/asr/browser, and hama-js/browser entry points.

G2P

Quickstart

from hama import G2PModel

model = G2PModel()
result = model.predict(
  "Really? What's the orbital velocity of the moon?",
  preserve_literals="punct",
)

print(result.ipa)
print(result.display_ipa)
for alignment in result.alignments:
  print(alignment.phoneme, alignment.char_index)

G2P loads split ONNX assets (encoder plus decoder step) by default. Both runtimes still support the legacy single-file path when you explicitly pass model_path (Python) or modelPath (TypeScript).

Signatures

class G2PModel(
  model_path: Optional[PathLike] = None,
  encoder_model_path: Optional[PathLike] = None,
  decoder_step_model_path: Optional[PathLike] = None,
  vocab_path: Optional[PathLike] = None,
  max_input_len: int = 128,
  max_output_len: int = 32,
  providers: Optional[Sequence[str]] = None,
)

model.predict(
  text,
  split_delimiter=r"\s+",
  output_delimiter=" ",
  preserve_literals="none" | "punct",
)
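To illustrate how split_delimiter and output_delimiter interact, here is a toy sketch (not the hama implementation): input is split on the regex delimiter, each chunk is converted independently, and the per-chunk outputs are rejoined with output_delimiter. The lowercasing stands in for the real per-chunk G2P step.

```python
import re

def toy_predict(text, split_delimiter=r"\s+", output_delimiter=" "):
    # Illustrative only: split on the delimiter, convert each chunk
    # (a placeholder lowercase here), then rejoin with output_delimiter.
    chunks = [c for c in re.split(split_delimiter, text) if c]
    converted = [c.lower() for c in chunks]  # stand-in for per-chunk G2P
    return output_delimiter.join(converted)

print(toy_predict("Hello World"))                                  # hello world
print(toy_predict("a,b", split_delimiter=",", output_delimiter="|"))  # a|b
```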

Return values

  • Python returns G2PResult with ipa and display_ipa plus alignments: list[G2PAlignment].

  • TypeScript returns { ipa: string; displayIpa: string; alignments: Alignment[] }.

  • Alignment fields are phoneme, phoneme_index / phonemeIndex, and char_index / charIndex.
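The shapes above can be sketched as plain dataclasses. These definitions are illustrative only; the real G2PResult and G2PAlignment classes ship with the hama package and may differ in detail.

```python
from dataclasses import dataclass, field

@dataclass
class G2PAlignment:
    phoneme: str
    phoneme_index: int
    char_index: int  # -1 sentinel for whitespace-only input

@dataclass
class G2PResult:
    ipa: str
    display_ipa: str  # equals ipa unless literals are preserved
    alignments: list[G2PAlignment] = field(default_factory=list)

r = G2PResult(ipa="ipa", display_ipa="ipa",
              alignments=[G2PAlignment("i", 0, 0)])
```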

Notes

  • Python applies Unicode casefolding; TypeScript uses toLocaleLowerCase("und").
  • Whitespace is skipped during tokenization, so alignments map back to non-whitespace characters.
  • For whitespace-only input, the alignment sentinel is -1.
  • display_ipa / displayIpa equals canonical IPA by default.
  • Set preserve_literals="punct" or preserveLiterals: "punct" to keep punctuation in rendered output.
  • Browser G2P is available through hama-js/g2p/browser or the aggregate hama-js/browser export.
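The whitespace-skipping behavior described in the notes can be sketched as follows: only non-whitespace characters are candidates for alignment indices, with -1 as the sentinel when the input contains nothing else. This helper is illustrative, not part of hama.

```python
def nonspace_char_indices(text):
    # Alignments reference indices of non-whitespace characters,
    # since whitespace is skipped during tokenization.
    idxs = [i for i, ch in enumerate(text) if not ch.isspace()]
    return idxs if idxs else [-1]  # -1 sentinel for whitespace-only input

print(nonspace_char_indices("a b"))  # [0, 2]
print(nonspace_char_indices("   "))  # [-1]
```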

ASR

Quickstart

from hama import ASRModel

model = ASRModel()
result = model.transcribe_file("sample.wav")
print(result.phoneme_text)
print(result.word_phoneme_text)

ASR is waveform-input only and uses the packaged asr_waveform_fp16.onnx asset. Browser ASR uses the same model contract, loaded explicitly via modelUrl.

Signatures

class ASRModel(
  model_path: Optional[PathLike] = None,
  vocab_path: Optional[PathLike] = None,
  decode: Optional[ASRDecodeConfig] = None,
  providers: Optional[Sequence[str]] = None,
  model_sample_rate: int = 16000,
)

model.transcribe_file("sample.wav")
model.transcribe_waveform(waveform, sample_rate)

Return values

  • Python ASRResult: phonemes, phoneme_text, word_phoneme_text, token_ids, frame_token_ids, num_frames.

  • TypeScript ASRResult: phonemes, phonemeText, wordPhonemeText, tokenIds, frameTokenIds, numFrames.

Common usage patterns

# Reuse model instances across requests.
g2p = G2PModel()
asr = ASRModel()

# Explicit split G2P assets.
custom_g2p = G2PModel(
  encoder_model_path="encoder.onnx",
  decoder_step_model_path="decoder_step.onnx",
  vocab_path="g2p_vocab.json",
)

# Explicit ASR asset.
custom_asr = ASRModel(model_path="asr_waveform_fp16.onnx")
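transcribe_waveform takes raw samples plus a sample rate. A minimal stdlib sketch for decoding 16-bit mono PCM WAV data into floats in [-1, 1] is shown below; the helper name read_wav_mono_f32 is illustrative, not part of hama, and the synthetic tone exists only to exercise the reader.

```python
import io, math, struct, wave

def read_wav_mono_f32(path_or_file):
    # Decode 16-bit PCM mono WAV into floats in [-1, 1] plus its sample
    # rate, suitable for model.transcribe_waveform(waveform, sample_rate).
    with wave.open(path_or_file, "rb") as wf:
        assert wf.getsampwidth() == 2 and wf.getnchannels() == 1
        raw = wf.readframes(wf.getnframes())
        samples = [s / 32768.0 for s in struct.unpack(f"<{len(raw) // 2}h", raw)]
        return samples, wf.getframerate()

# Synthesize a 0.1 s, 440 Hz tone in memory to demonstrate the round trip.
buf = io.BytesIO()
with wave.open(buf, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(16000)
    wf.writeframes(b"".join(
        struct.pack("<h", int(32767 * 0.5 * math.sin(2 * math.pi * 440 * n / 16000)))
        for n in range(1600)))
buf.seek(0)
waveform, sr = read_wav_mono_f32(buf)
print(sr, len(waveform))  # 16000 1600
```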

Reference demo

The browser demo powering this site lives in src/scripts/g2p-demo.ts. It exposes mountBrowserDemo(), which wires DOM elements to the public browser G2P runtime.

<section id="g2p-demo">
<textarea data-demo-input placeholder="Type text…">안녕하세요</textarea>
<button data-demo-chip data-value="Alignment gives explainability">Sample</button>
<span data-demo-status-dot></span>
<p data-demo-status-text>Waiting for input.</p>
<small data-demo-status-note>(Everything stays on-device.)</small>
<output data-demo-ipa>—</output>
<div data-demo-alignments></div>
</section>

Shared helpers

  • Python: split_text_to_jamo, join_jamo_tokens, decode_ctc_tokens.
  • TypeScript: splitTextToJamo, joinJamoTokens, decodeCtcTokens.
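Assuming decode_ctc_tokens / decodeCtcTokens implement standard CTC collapsing, the contract can be sketched as: merge consecutive repeats, then drop blanks. The blank id of 0 here is an assumption for illustration.

```python
def toy_decode_ctc(frame_token_ids, blank_id=0):
    # Illustrative CTC decode: collapse consecutive repeats, drop blanks.
    out, prev = [], None
    for t in frame_token_ids:
        if t != prev and t != blank_id:
            out.append(t)
        prev = t
    return out

print(toy_decode_ctc([0, 3, 3, 0, 5, 5, 5, 0, 3]))  # [3, 5, 3]
```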

Alignments map each phoneme back to its original character index. Use them to highlight pronunciations or to validate text-to-phoneme correspondence in a UI.
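One way to use an alignment for highlighting, as a sketch: bracket the source character that produced a given phoneme. For simplicity, alignments are modeled here as (phoneme, char_index) tuples; the real objects expose .phoneme and .char_index (charIndex in TypeScript).

```python
def highlight(text, alignments, phoneme_index):
    # Bracket the source character that produced the selected phoneme.
    _, ci = alignments[phoneme_index]
    if ci < 0:  # -1 sentinel: nothing to highlight
        return text
    return text[:ci] + "[" + text[ci] + "]" + text[ci + 1:]

print(highlight("moon", [("m", 0), ("u", 1)], 1))  # m[o]on
```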