hama on-device NLP

Architecture

hama and hama-js package the same model assets and tokenizer vocabulary for Python, Node/Bun, and browser inference. The runtime layers stay thin; most behavior is defined by the ONNX graphs and g2p_vocab.json.

Published assets

  • encoder.onnx – default G2P encoder graph.
  • decoder_step.onnx – default G2P decoder-step graph.
  • asr_waveform_fp16.onnx – waveform-input phoneme ASR model.
  • g2p_vocab.json – shared tokenizer and decoder vocabulary.

Runtime bindings

  • Python runtime. Exposes G2PModel, ASRModel, tokenizer helpers, and WAV/CTC utilities from hama.

  • TypeScript runtime. Exposes hama-js/g2p, hama-js/asr, hama-js/g2p/browser, hama-js/asr/browser, and hama-js/browser. Node/Bun uses onnxruntime-node; browser G2P and browser ASR use onnxruntime-web.

G2P data flow

  1. Text is tokenized into jamo (Korean) or script-specific graphemes using the shared vocabulary.
  2. Tokens feed the G2P ONNX via input_ids and input_lengths.
  3. At each decoder step, the attention argmax maps the emitted phoneme back to its original character index.
  4. The runtime returns an IPA string plus alignment metadata.
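The alignment rule in step 3 can be sketched in a few lines of plain Python. This is an illustration of the argmax-over-attention idea, not the runtime's actual code; the function name, list shapes, and toy values are assumptions.

```python
# Illustrative sketch: one attention row per emitted phoneme, one column per
# input character. The argmax column is the character the phoneme aligns to.

def align_phonemes(attention, phonemes):
    """Pair each emitted phoneme with the input character index that
    receives the highest attention weight at that decoder step."""
    alignments = []
    for weights, phoneme in zip(attention, phonemes):
        char_index = max(range(len(weights)), key=weights.__getitem__)
        alignments.append((phoneme, char_index))
    return alignments

# Toy attention for two emitted phonemes over three input characters.
attn = [
    [0.7, 0.2, 0.1],  # first phoneme attends mostly to character 0
    [0.1, 0.1, 0.8],  # second phoneme attends mostly to character 2
]
print(align_phonemes(attn, ["k", "a"]))  # [('k', 0), ('a', 2)]
```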

ASR data flow

  1. Audio is read as mono waveform samples and resampled to the model rate when needed.
  2. The ASR ONNX receives waveform and waveform_lengths.
  3. log_probs and out_lengths are decoded with CTC post-processing.
  4. The runtime returns phoneme sequences, text forms, and frame-level token ids.
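The CTC post-processing in step 3 follows the standard greedy recipe: argmax per frame, collapse adjacent repeats, drop blanks. The sketch below shows that recipe only; the blank id, token ids, and log-probability values are illustrative assumptions, and the real vocabulary lives in g2p_vocab.json.

```python
# Greedy CTC decoding sketch: frame-level argmax over log_probs, then
# collapse repeated tokens and remove the blank symbol.

BLANK = 0  # assumed blank id for illustration

def ctc_greedy_decode(log_probs, length):
    """log_probs: [frames][vocab] log-probabilities; length: valid frames."""
    frame_ids = [max(range(len(row)), key=row.__getitem__)
                 for row in log_probs[:length]]
    decoded, prev = [], None
    for tok in frame_ids:
        if tok != prev and tok != BLANK:
            decoded.append(tok)
        prev = tok
    return decoded, frame_ids

log_probs = [
    [-0.1, -2.0, -3.0],  # blank
    [-3.0, -0.1, -2.0],  # token 1
    [-3.0, -0.1, -2.0],  # token 1 again (collapsed as a repeat)
    [-2.0, -3.0, -0.1],  # token 2
]
tokens, frames = ctc_greedy_decode(log_probs, len(log_probs))
print(tokens)  # [1, 2]
print(frames)  # [0, 1, 1, 2]  (frame-level token ids, as in step 4)
```

The frame-level ids are returned alongside the collapsed sequence because step 4 exposes both.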

Normalization note: Python casefolds input; TypeScript lowercases with toLocaleLowerCase("und"). Whitespace is ignored during tokenization, so alignments never point to whitespace characters.
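The two normalization paths are not always equivalent: casefolding can change string length where plain lowercasing cannot. A minimal Python illustration, with a hypothetical whitespace-skipping index map in the spirit of the rule above:

```python
# str.casefold() is aggressive Unicode case mapping; it can expand
# characters (German ß -> ss), whereas str.lower() leaves ß intact.
text = "Straße ABC"
print(text.casefold())  # strasse abc
print(text.lower())     # straße abc

# Hypothetical whitespace-skipping token list that records character
# indices, so no alignment entry ever points at a whitespace position.
tokens = [(ch, i) for i, ch in enumerate(text.casefold()) if not ch.isspace()]
```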

Model contracts

  • G2P inputs: input_ids and input_lengths.
  • G2P alignment rule: attention argmax maps each emitted phoneme back to an input character index.
  • ASR inputs: waveform and waveform_lengths.
  • ASR outputs: log_probs and out_lengths.
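When wiring a new binding, the contracts above can be kept as data and checked before running a session. The sketch below is an assumption-level helper, not a published API; only the tensor names come from the contracts.

```python
# Named-tensor contracts from the section above, expressed as data, plus a
# small validator for feed dicts. Illustrative only.

CONTRACTS = {
    "g2p": {"inputs": {"input_ids", "input_lengths"}},
    "asr": {"inputs": {"waveform", "waveform_lengths"},
            "outputs": {"log_probs", "out_lengths"}},
}

def check_feeds(model, feeds):
    """Raise if a required named input is missing from the feed dict."""
    missing = CONTRACTS[model]["inputs"] - feeds.keys()
    if missing:
        raise ValueError(f"{model}: missing inputs {sorted(missing)}")

check_feeds("asr", {"waveform": [0.0], "waveform_lengths": [1]})  # passes
```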

Browser export note: hama-js v1.3.7 corrected the published browser export targets, so browser G2P and browser ASR now resolve through the standard package exports without project-local aliasing.

Extending assets

  1. Regenerate vocab/tokenizer JSON for the new language or modality.
  2. Export the ONNX graph and place the artifact in the shared asset set.
  3. Update Python and TypeScript bindings together.
  4. Document the change and release both packages in the same cycle.