APIs
This section documents the public interfaces exposed by the `hama` Python package and the `hama-js` package via `hama-js/g2p`, `hama-js/asr`, `hama-js/g2p/browser`, `hama-js/asr/browser`, and `hama-js/browser`.
G2P
Quickstart
```python
from hama import G2PModel

model = G2PModel()
result = model.predict(
    "Really? What's the orbital velocity of the moon?",
    preserve_literals="punct",
)
print(result.ipa)
print(result.display_ipa)
for alignment in result.alignments:
    print(alignment.phoneme, alignment.char_index)
```

```ts
import { G2PNodeModel } from "hama-js/g2p";

const model = await G2PNodeModel.create();
const result = await model.predict(
  "Really? What's the orbital velocity of the moon?",
  { preserveLiterals: "punct" },
);
console.log(result.ipa, result.displayIpa, result.alignments);
```

```ts
import { G2PBrowserModel } from "hama-js/g2p/browser";

const model = await G2PBrowserModel.create();
const result = await model.predict(
  "Really? What's the orbital velocity of the moon?",
  { preserveLiterals: "punct" },
);
console.log(result.ipa, result.displayIpa, result.alignments);
```

G2P uses split ONNX assets by default. If you explicitly pass a single-file `model_path` or `modelPath`, both runtimes still support the legacy fallback path.
Signatures
```python
class G2PModel(
    model_path: Optional[PathLike] = None,
    encoder_model_path: Optional[PathLike] = None,
    decoder_step_model_path: Optional[PathLike] = None,
    vocab_path: Optional[PathLike] = None,
    max_input_len: int = 128,
    max_output_len: int = 32,
    providers: Optional[Sequence[str]] = None,
)

model.predict(
    text,
    split_delimiter=r"\s+",
    output_delimiter=" ",
    preserve_literals="none" | "punct",
)
```

```ts
type G2POptions = {
  modelPath?: string;
  encoderModelPath?: string;
  decoderStepModelPath?: string;
  maxInputLen?: number;
  maxOutputLen?: number;
};

type BrowserOptions = {
  modelUrl?: string;
  encoderUrl?: string;
  decoderStepUrl?: string;
  maxInputLen?: number;
  maxOutputLen?: number;
};

type PredictOptions = {
  splitDelimiter?: string | RegExp | null;
  outputDelimiter?: string;
  preserveLiterals?: "none" | "punct";
};
```

Return values
- Python returns `G2PResult` with `ipa` and `display_ipa`, plus `alignments: list[G2PAlignment]`.
- TypeScript returns `{ ipa: string; displayIpa: string; alignments: Alignment[] }`.
- Alignment fields are `phoneme`, `phoneme_index`/`phonemeIndex`, and `char_index`/`charIndex`.
Notes
- Python applies Unicode casefolding; TypeScript uses `toLocaleLowerCase("und")`.
- Whitespace is skipped during tokenization, so alignments map back to non-whitespace characters.
- For whitespace-only input, the alignment sentinel is `-1`.
- `display_ipa`/`displayIpa` equals the canonical IPA by default.
- Set `preserve_literals="punct"` or `preserveLiterals: "punct"` to keep punctuation in rendered output.
- Browser G2P is available through `hama-js/g2p/browser` or the aggregate `hama-js/browser` export.
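The alignment behavior above can be illustrated with plain data. The sketch below does not import `hama`; the `(phoneme, char_index)` pairs are hand-written stand-ins for `G2PAlignment` objects, showing how the character indices let you recover the source character for each phoneme:

```python
def highlight(text, alignments):
    """Map each phoneme to the character it was aligned to.

    `alignments` is a list of (phoneme, char_index) pairs; -1 is the
    whitespace-only sentinel described above and is skipped.
    """
    return [(ph, text[idx]) for ph, idx in alignments if idx != -1]

# Hypothetical alignment output for illustration only.
text = "hi there"
alignments = [("h", 0), ("aɪ", 1), ("ð", 3), ("ɛ", 4), ("ɹ", 5)]
print(highlight(text, alignments))  # → [('h', 'h'), ('aɪ', 'i'), ('ð', 't'), ('ɛ', 'h'), ('ɹ', 'e')]
```

The same pattern works with the TypeScript `charIndex` field.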
ASR
Quickstart
```python
from hama import ASRModel

model = ASRModel()
result = model.transcribe_file("sample.wav")
print(result.phoneme_text)
print(result.word_phoneme_text)
```

```ts
import { ASRNodeModel } from "hama-js/asr";

const model = await ASRNodeModel.create();
const result = await model.transcribeWavFile("sample.wav");
console.log(result.phonemeText, result.wordPhonemeText);
```

```ts
import { ASRBrowserModel } from "hama-js/asr/browser";

const model = await ASRBrowserModel.create({
  modelUrl: "/assets/asr_waveform_fp16.onnx",
});
const result = await model.transcribeWaveform(float32Samples, 16000);
console.log(result.phonemeText, result.wordPhonemeText);
```

ASR is waveform-input only and uses the packaged `asr_waveform_fp16.onnx` asset. Browser ASR uses the same model contract, loaded explicitly via `modelUrl`.
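Because the model is waveform-only, the main preprocessing step is turning audio into float samples at the model sample rate. A minimal sketch using only the standard library, assuming 16-bit PCM mono input (production code would also resample audio that is not already at 16 kHz):

```python
import struct
import wave

def wav_to_float32(path):
    """Read a 16-bit PCM mono WAV into [-1, 1] floats plus its sample rate."""
    with wave.open(path, "rb") as wav:
        assert wav.getsampwidth() == 2 and wav.getnchannels() == 1
        n = wav.getnframes()
        ints = struct.unpack(f"<{n}h", wav.readframes(n))
        return [s / 32768.0 for s in ints], wav.getframerate()
```

The resulting samples (or a NumPy/`Float32Array` equivalent) are what `transcribe_waveform` / `transcribeWaveform` expect alongside the sample rate.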
Signatures
```python
class ASRModel(
    model_path: Optional[PathLike] = None,
    vocab_path: Optional[PathLike] = None,
    decode: Optional[ASRDecodeConfig] = None,
    providers: Optional[Sequence[str]] = None,
    model_sample_rate: int = 16000,
)

model.transcribe_file("sample.wav")
model.transcribe_waveform(waveform, sample_rate)
```

```ts
type ASRNodeOptions = {
  modelPath?: string;
  vocabPath?: string;
  sampleRate?: number;
  blankToken?: string;
  unkToken?: string;
  wordBoundaryToken?: string;
  blankBias?: number;
  unkBias?: number;
  collapseRepeats?: boolean;
};

type ASRBrowserOptions = {
  modelUrl?: string;
  vocabUrl?: string;
  sampleRate?: number;
  blankToken?: string;
  unkToken?: string;
  wordBoundaryToken?: string;
  blankBias?: number;
  unkBias?: number;
  collapseRepeats?: boolean;
};
```

Return values
- Python `ASRResult`: `phonemes`, `phoneme_text`, `word_phoneme_text`, `token_ids`, `frame_token_ids`, `num_frames`.
- TypeScript `ASRResult`: `phonemes`, `phonemeText`, `wordPhonemeText`, `tokenIds`, `frameTokenIds`, `numFrames`.
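The relationship between `frame_token_ids` and the final `token_ids` follows the standard CTC greedy-decode rule: collapse consecutive repeated frame tokens, then drop blanks. A self-contained sketch of that rule (the blank id of `0` is illustrative, not hama's actual vocabulary):

```python
def greedy_ctc_decode(frame_token_ids, blank_id=0, collapse_repeats=True):
    """Collapse consecutive duplicate frame tokens, then remove blanks."""
    out, prev = [], None
    for tid in frame_token_ids:
        if collapse_repeats and tid == prev:
            continue  # same token held across adjacent frames
        prev = tid
        if tid != blank_id:
            out.append(tid)
    return out

print(greedy_ctc_decode([0, 7, 7, 0, 3, 3, 3, 0, 7]))  # → [7, 3, 7]
```

This mirrors the `collapseRepeats`, `blankToken`, and bias options in the signatures above: repeat collapsing happens per frame before blanks are stripped, which is why the same token can still appear twice in the output when a blank separates the runs.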
Common usage patterns
```python
# Reuse model instances across requests.
g2p = G2PModel()
asr = ASRModel()

# Explicit split G2P assets.
custom_g2p = G2PModel(
    encoder_model_path="encoder.onnx",
    decoder_step_model_path="decoder_step.onnx",
    vocab_path="g2p_vocab.json",
)

# Explicit ASR asset.
custom_asr = ASRModel(model_path="asr_waveform_fp16.onnx")
```

```ts
import { G2PNodeModel } from "hama-js/g2p";
import { ASRNodeModel } from "hama-js/asr";
import { G2PBrowserModel, ASRBrowserModel } from "hama-js/browser";

// Reuse model instances.
const g2p = await G2PNodeModel.create();
const asr = await ASRNodeModel.create();

// Browser: host assets next to your bundle.
const browserG2p = await G2PBrowserModel.create({
  encoderUrl: "/assets/encoder.onnx",
  decoderStepUrl: "/assets/decoder_step.onnx",
});
const browserAsr = await ASRBrowserModel.create({
  modelUrl: "/assets/asr_waveform_fp16.onnx",
});
const rendered = await browserG2p.predict(
  "Really? What's the orbital velocity of the moon?",
  { preserveLiterals: "punct" },
);
console.log(rendered.displayIpa);
```

Reference demo
The browser demo powering this site lives in `src/scripts/g2p-demo.ts`. It exposes `mountBrowserDemo()`, which wires DOM elements to the public browser G2P runtime.
```html
<section id="g2p-demo">
  <textarea data-demo-input placeholder="Type text…">안녕하세요</textarea>
  <button data-demo-chip data-value="Alignment gives explainability">Sample</button>
  <span data-demo-status-dot></span>
  <p data-demo-status-text>Waiting for input.</p>
  <small data-demo-status-note>(Everything stays on-device.)</small>
  <output data-demo-ipa>—</output>
  <div data-demo-alignments></div>
</section>
```

```ts
import { mountBrowserDemo } from "./g2p-demo";

const root = document.querySelector("#g2p-demo");
if (root) {
  mountBrowserDemo(root);
}
```

Shared helpers
- Python: `split_text_to_jamo`, `join_jamo_tokens`, `decode_ctc_tokens`.
- TypeScript: `splitTextToJamo`, `joinJamoTokens`, `decodeCtcTokens`.
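For intuition about what the jamo helpers do: precomposed Hangul syllables in the U+AC00 block encode a lead consonant, a vowel, and an optional tail arithmetically, so splitting is pure index math. A standalone sketch of the split direction (hama's actual helpers may differ in token choices and edge-case handling):

```python
CHO = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"                       # 19 lead consonants
JUNG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"                  # 21 vowels
JONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 27 tails + empty

def split_syllable(ch):
    """Decompose one precomposed Hangul syllable into compatibility jamo."""
    code = ord(ch) - 0xAC00
    if not 0 <= code < 11172:
        return [ch]  # pass non-Hangul characters through unchanged
    lead, rem = divmod(code, 21 * 28)
    vowel, tail = divmod(rem, 28)
    return [CHO[lead], JUNG[vowel]] + ([JONG[tail]] if tail else [])

print(split_syllable("안"))  # → ['ㅇ', 'ㅏ', 'ㄴ']
```

Joining runs the same arithmetic in reverse, which is what makes a jamo-level CTC vocabulary practical for Korean ASR output.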
Alignments map each phoneme back to the original character index. Use them to highlight pronunciations or validate text/phoneme correspondence in UI.