hama on-device NLP

Architecture

hama and hama-js package the same model assets and tokenizer vocabulary for Python, Node/Bun, and browser inference. The runtime layers stay thin; most behavior is defined by the ONNX graphs and g2p_vocab.json.

Published assets

  • encoder.onnx – default G2P encoder graph.
  • decoder_step.onnx – default G2P decoder-step graph.
  • asr_waveform_fp16.onnx – waveform-input phoneme ASR model.
  • g2p_vocab.json – shared tokenizer and decoder vocabulary.

Runtime bindings

  • Python runtime. Exposes G2PModel, ASRModel, tokenizer helpers, and WAV/CTC utilities from hama.

  • TypeScript runtime. Exposes hama-js/g2p, hama-js/asr, hama-js/g2p/browser, hama-js/asr/browser, and hama-js/browser. Node/Bun uses onnxruntime-node; browser G2P and browser ASR use onnxruntime-web.

G2P data flow

  1. Text is tokenized into jamo (Korean) or script-specific graphemes using the shared vocabulary.
  2. Tokens feed the G2P ONNX via input_ids and input_lengths.
  3. At each decoder step, the attention argmax maps the emitted phoneme back to its original character index.
  4. The runtime returns an IPA string plus alignment metadata.
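The alignment rule in step 3 can be sketched in a few lines of plain Python. This is an illustration of the argmax-over-attention idea, not the runtime's actual code; the function name, list shapes, and toy values are assumptions.

```python
# Illustrative sketch: one attention row per emitted phoneme, one column per
# input character. The argmax column is the character the phoneme aligns to.

def align_phonemes(attention, phonemes):
    """Pair each emitted phoneme with the input character index that
    receives the highest attention weight at that decoder step."""
    alignments = []
    for weights, phoneme in zip(attention, phonemes):
        char_index = max(range(len(weights)), key=weights.__getitem__)
        alignments.append((phoneme, char_index))
    return alignments

# Toy attention for two emitted phonemes over three input characters.
attn = [
    [0.7, 0.2, 0.1],  # first phoneme attends mostly to character 0
    [0.1, 0.1, 0.8],  # second phoneme attends mostly to character 2
]
print(align_phonemes(attn, ["k", "a"]))  # [('k', 0), ('a', 2)]
```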

ASR data flow

  1. Audio is read as mono waveform samples and resampled to the model rate when needed.
  2. The ASR ONNX receives waveform and waveform_lengths.
  3. log_probs and out_lengths are decoded with CTC post-processing.
  4. The runtime returns phoneme sequences, text forms, and frame-level token ids.
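The CTC post-processing in step 3 follows the standard greedy recipe: argmax per frame, collapse adjacent repeats, drop blanks. The sketch below shows that recipe only; the blank id, token ids, and log-probability values are illustrative assumptions, and the real vocabulary lives in g2p_vocab.json.

```python
# Greedy CTC decoding sketch: frame-level argmax over log_probs, then
# collapse repeated tokens and remove the blank symbol.

BLANK = 0  # assumed blank id for illustration

def ctc_greedy_decode(log_probs, length):
    """log_probs: [frames][vocab] log-probabilities; length: valid frames."""
    frame_ids = [max(range(len(row)), key=row.__getitem__)
                 for row in log_probs[:length]]
    decoded, prev = [], None
    for tok in frame_ids:
        if tok != prev and tok != BLANK:
            decoded.append(tok)
        prev = tok
    return decoded, frame_ids

log_probs = [
    [-0.1, -2.0, -3.0],  # blank
    [-3.0, -0.1, -2.0],  # token 1
    [-3.0, -0.1, -2.0],  # token 1 again (collapsed as a repeat)
    [-2.0, -3.0, -0.1],  # token 2
]
tokens, frames = ctc_greedy_decode(log_probs, len(log_probs))
print(tokens)  # [1, 2]
print(frames)  # [0, 1, 1, 2]  (frame-level token ids, as in step 4)
```

The frame-level ids are returned alongside the collapsed sequence because step 4 exposes both.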

Normalization note: Python casefolds input; TypeScript lowercases with toLocaleLowerCase("und"). Whitespace is ignored during tokenization, so alignments never point to whitespace characters.
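The two normalization paths are not always equivalent: casefolding can change string length where plain lowercasing cannot. A minimal Python illustration, with a hypothetical whitespace-skipping index map in the spirit of the rule above:

```python
# str.casefold() is aggressive Unicode case mapping; it can expand
# characters (German ß -> ss), whereas str.lower() leaves ß intact.
text = "Straße ABC"
print(text.casefold())  # strasse abc
print(text.lower())     # straße abc

# Hypothetical whitespace-skipping token list that records character
# indices, so no alignment entry ever points at a whitespace position.
tokens = [(ch, i) for i, ch in enumerate(text.casefold()) if not ch.isspace()]
```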

Model contracts

  • G2P inputs: input_ids and input_lengths.
  • G2P alignment rule: attention argmax maps each emitted phoneme back to an input character index.
  • ASR inputs: waveform and waveform_lengths.
  • ASR outputs: log_probs and out_lengths.
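When wiring a new binding, the contracts above can be kept as data and checked before running a session. The sketch below is an assumption-level helper, not a published API; only the tensor names come from the contracts.

```python
# Named-tensor contracts from the section above, expressed as data, plus a
# small validator for feed dicts. Illustrative only.

CONTRACTS = {
    "g2p": {"inputs": {"input_ids", "input_lengths"}},
    "asr": {"inputs": {"waveform", "waveform_lengths"},
            "outputs": {"log_probs", "out_lengths"}},
}

def check_feeds(model, feeds):
    """Raise if a required named input is missing from the feed dict."""
    missing = CONTRACTS[model]["inputs"] - feeds.keys()
    if missing:
        raise ValueError(f"{model}: missing inputs {sorted(missing)}")

check_feeds("asr", {"waveform": [0.0], "waveform_lengths": [1]})  # passes
```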

Browser export note: hama-js v1.3.7 corrected the published browser export targets, so browser G2P and browser ASR now resolve through the standard package exports without project-local aliasing.

Extending assets

  1. Regenerate vocab/tokenizer JSON for the new language or modality.
  2. Export the ONNX graph and place the artifact in the shared asset set.
  3. Update Python and TypeScript bindings together.
  4. Document the change and release both packages in the same cycle.