Architecture
hama and hama-js package the same model assets and tokenizer vocabulary for Python,
Node/Bun, and browser inference. The runtime layers stay thin; most behavior is defined by
the ONNX graphs and g2p_vocab.json.
Published assets
- encoder.onnx: default G2P encoder graph.
- decoder_step.onnx: default G2P decoder-step graph.
- asr_waveform_fp16.onnx: waveform-input phoneme ASR model.
- g2p_vocab.json: shared tokenizer and decoder vocabulary.
Runtime bindings
- Python runtime: exposes G2PModel, ASRModel, tokenizer helpers, and WAV/CTC utilities from hama.
- TypeScript runtime: exposes hama-js/g2p, hama-js/asr, hama-js/g2p/browser, hama-js/asr/browser, and hama-js/browser. Node/Bun uses onnxruntime-node; browser G2P and browser ASR use onnxruntime-web.
G2P data flow
- Text is tokenized into jamo (Korean) or script-specific graphemes using the shared vocabulary.
- Tokens feed the G2P ONNX model via input_ids and input_lengths.
- Decoder output plus attention argmax indices map each emitted phoneme back to an original character index.
- The runtime returns an IPA string plus alignment metadata.
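The alignment step can be sketched in pure Python. This is a minimal illustration, not hama's implementation: it assumes one attention row per emitted phoneme (attention[step][token]) and a token_to_char list recording which original character each input token came from. Both data shapes and the helper names are hypothetical.

```python
from __future__ import annotations

from typing import Sequence


def argmax(row: Sequence[float]) -> int:
    """Index of the largest value in a row (the first one wins on ties)."""
    best = 0
    for i in range(1, len(row)):
        if row[i] > row[best]:
            best = i
    return best


def align_phonemes(
    phonemes: Sequence[str],
    attention: Sequence[Sequence[float]],
    token_to_char: Sequence[int],
) -> list[tuple[str, int]]:
    """Pair each emitted phoneme with an original character index.

    attention[step][token]: hypothetical attention weight of decoder step
    `step` over input token `token`.
    token_to_char[token]: character index the token was produced from.
    """
    if len(phonemes) != len(attention):
        raise ValueError("expected one attention row per emitted phoneme")
    return [
        (phoneme, token_to_char[argmax(row)])
        for phoneme, row in zip(phonemes, attention)
    ]


attention = [
    [0.7, 0.2, 0.1, 0.0],  # step 0 attends most to token 0
    [0.1, 0.8, 0.1, 0.0],  # step 1 attends most to token 1
    [0.0, 0.1, 0.2, 0.7],  # step 2 attends most to token 3
]
token_to_char = [0, 0, 2, 2]  # tokens 0-1 come from char 0, tokens 2-3 from char 2
print(align_phonemes(["h", "a", "n"], attention, token_to_char))
# [('h', 0), ('a', 0), ('n', 2)]
```

Because several jamo tokens can come from one Hangul syllable, the argmax selects a token and token_to_char collapses it to that syllable's character index.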
ASR data flow
- Audio is read as mono waveform samples and resampled to the model rate when needed.
- The ASR ONNX model receives waveform and waveform_lengths.
- Its outputs, log_probs and out_lengths, are decoded with CTC post-processing.
- The runtime returns phoneme sequences, text forms, and frame-level token IDs.
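Greedy (best-path) CTC post-processing can be sketched as follows. This is an illustration under stated assumptions, not necessarily the decoder hama uses: log_probs for one utterance is taken as a [frames][vocab] list, out_length is that utterance's valid frame count, and the blank token is assumed to sit at index 0. BLANK_ID and the id_to_phoneme table are hypothetical; the real values come from the model and g2p_vocab.json.

```python
from __future__ import annotations

from typing import Sequence

BLANK_ID = 0  # assumption: index of the CTC blank token


def ctc_greedy_decode(
    log_probs: Sequence[Sequence[float]],
    out_length: int,
    blank_id: int = BLANK_ID,
) -> tuple[list[int], list[int]]:
    """Return (frame_ids, token_ids) for one utterance.

    frame_ids: argmax token id of every valid frame (frame-level ids).
    token_ids: frame_ids with consecutive repeats merged and blanks dropped.
    """
    frame_ids = [
        max(range(len(frame)), key=frame.__getitem__)
        for frame in log_probs[:out_length]  # ignore padded frames
    ]
    token_ids: list[int] = []
    previous = None
    for token in frame_ids:
        # Emit only when the id changes and is not blank; a blank between
        # two equal ids lets the same phoneme be emitted twice.
        if token != previous and token != blank_id:
            token_ids.append(token)
        previous = token
    return frame_ids, token_ids


log_probs = [
    [-0.1, -3.0, -3.0],  # blank
    [-3.0, -0.1, -3.0],  # id 1
    [-3.0, -0.2, -3.0],  # id 1 again (repeat, merged)
    [-0.1, -3.0, -3.0],  # blank separates the next id 1
    [-3.0, -0.1, -3.0],  # id 1
    [-3.0, -3.0, -0.1],  # id 2
    [-3.0, -3.0, -0.1],  # padding frame beyond out_length
]
frame_ids, token_ids = ctc_greedy_decode(log_probs, out_length=6)
id_to_phoneme = {1: "a", 2: "n"}  # hypothetical vocabulary slice
print(frame_ids, token_ids, "".join(id_to_phoneme[t] for t in token_ids))
# [0, 1, 1, 0, 1, 2] [1, 1, 2] aan
```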
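Normalization note: Python casefolds input, while TypeScript lowercases it with
toLocaleLowerCase("und"). The two are not identical for every character (Python's
str.casefold() maps "ß" to "ss", whereas lowercasing leaves "ß" unchanged). Whitespace is
ignored during tokenization, so alignments never point to whitespace characters.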
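The tokenization side of this rule can be sketched in Python. It is a minimal illustration under assumptions, not hama's tokenizer: Hangul syllables are assumed to split into conjoining jamo using the standard Unicode arithmetic, every other non-whitespace character becomes one token, and casefolding is applied per character so that length-changing folds (such as "ß" to "ss") keep pointing at the original index. The function names are hypothetical.

```python
from __future__ import annotations

# Hangul syllable decomposition constants (Unicode Standard, section 3.12).
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading consonant
S_COUNT = 19 * N_COUNT  # 11172 precomposed syllables


def decompose_hangul(ch: str) -> list[str]:
    """Split a precomposed Hangul syllable into conjoining jamo; else return [ch]."""
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return [ch]
    lead = chr(L_BASE + s_index // N_COUNT)
    vowel = chr(V_BASE + (s_index % N_COUNT) // T_COUNT)
    t_index = s_index % T_COUNT
    tail = [chr(T_BASE + t_index)] if t_index else []  # 0 means no final consonant
    return [lead, vowel] + tail


def tokenize_with_positions(text: str) -> tuple[list[str], list[int]]:
    """Return (tokens, token_to_char). Whitespace produces no tokens."""
    tokens: list[str] = []
    token_to_char: list[int] = []
    for index, ch in enumerate(text):
        if ch.isspace():
            continue  # skipped, so no alignment can point at this index
        for folded in ch.casefold():  # may yield several characters
            for piece in decompose_hangul(folded):
                tokens.append(piece)
                token_to_char.append(index)
    return tokens, token_to_char


tokens, token_to_char = tokenize_with_positions("한 국")
print(len(tokens), token_to_char)  # 6 [0, 0, 0, 2, 2, 2]
```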
Model contracts
- G2P inputs: input_ids and input_lengths.
- G2P alignment rule: the attention argmax maps each emitted phoneme back to an input character index.
- ASR inputs: waveform and waveform_lengths.
- ASR outputs: log_probs and out_lengths.
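These contracts can be turned into batch input dictionaries with NumPy. Only the input names come from the contracts; the [batch, time] layout, right-padding with pad_id or zeros, and the dtypes (int64 for ids and lengths, float32 for waveforms) are assumptions to verify against the exported graphs. The helper names g2p_feed and asr_feed are hypothetical.

```python
from __future__ import annotations

from typing import Sequence

import numpy as np


def g2p_feed(batch: Sequence[Sequence[int]], pad_id: int = 0) -> dict[str, np.ndarray]:
    """Right-pad token id sequences into input_ids / input_lengths (assumed int64)."""
    lengths = np.array([len(ids) for ids in batch], dtype=np.int64)
    input_ids = np.full((len(batch), int(lengths.max())), pad_id, dtype=np.int64)
    for row, ids in enumerate(batch):
        input_ids[row, : len(ids)] = ids
    return {"input_ids": input_ids, "input_lengths": lengths}


def asr_feed(waves: Sequence[np.ndarray]) -> dict[str, np.ndarray]:
    """Zero-pad mono waveforms into waveform / waveform_lengths (assumed float32/int64)."""
    lengths = np.array([len(w) for w in waves], dtype=np.int64)
    waveform = np.zeros((len(waves), int(lengths.max())), dtype=np.float32)
    for row, w in enumerate(waves):
        waveform[row, : len(w)] = w
    return {"waveform": waveform, "waveform_lengths": lengths}


feed = g2p_feed([[5, 6, 7], [8]])
print(feed["input_ids"].tolist(), feed["input_lengths"].tolist())
# [[5, 6, 7], [8, 0, 0]] [3, 1]
```

With onnxruntime, such a dictionary is the input_feed argument of InferenceSession.run(output_names, input_feed), for example session.run(["log_probs", "out_lengths"], asr_feed(waves)). Check session.get_inputs() for the exact element types, since an fp16 export may expect float16 waveforms.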
Browser export note: hama-js v1.3.7 corrected the published
browser export targets, so browser G2P and browser ASR now resolve through the standard
package exports without project-local aliasing.
Extending assets
- Regenerate vocab/tokenizer JSON for the new language or modality.
- Export the ONNX graph and place the artifact in the shared asset set.
- Update Python and TypeScript bindings together.
- Document the change and release both packages in the same cycle.