APIs
This section documents the public G2P, pronunciation correction, and phoneme ASR
interfaces exposed by the hama Python package and the
hama-js package via hama-js/g2p, hama-js/asr,
hama-js/g2p/browser, hama-js/asr/browser, and
hama-js/browser.
G2P
Quickstart
from hama import G2PModel
model = G2PModel()
result = model.predict(
    "Really? What's the orbital velocity of the moon?",
    preserve_literals="punct",
)
print(result.ipa)
print(result.display_ipa)
for alignment in result.alignments:
    print(alignment.phoneme, alignment.char_index)

import { G2PNodeModel } from "hama-js/g2p";
const model = await G2PNodeModel.create();
const result = await model.predict(
  "Really? What's the orbital velocity of the moon?",
  { preserveLiterals: "punct" },
);
console.log(result.ipa, result.displayIpa, result.alignments);

import { G2PBrowserModel } from "hama-js/g2p/browser";
const model = await G2PBrowserModel.create();
const result = await model.predict(
  "Really? What's the orbital velocity of the moon?",
  { preserveLiterals: "punct" },
);
console.log(result.ipa, result.displayIpa, result.alignments);

G2P uses split ONNX assets by default. If you explicitly pass a single-file
model_path or modelPath, both runtimes still support the legacy
fallback path.
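The asset-selection rule described above can be sketched as follows. This is an illustrative stand-in, not the library's actual resolver: the function name, option handling, and default file names are hypothetical.

```python
from pathlib import Path

def resolve_g2p_assets(model_path=None, encoder_model_path=None,
                       decoder_step_model_path=None, asset_dir=Path("assets")):
    """Prefer split encoder/decoder assets; fall back to a legacy single file.

    Hypothetical sketch of the rule: an explicit model_path selects the
    legacy single-file graph, otherwise split assets are used.
    """
    if model_path is not None:
        # Legacy fallback: one ONNX graph serves both encode and decode steps.
        return {"mode": "single", "model": Path(model_path)}
    return {
        "mode": "split",
        "encoder": Path(encoder_model_path) if encoder_model_path
                   else asset_dir / "encoder.onnx",
        "decoder_step": Path(decoder_step_model_path) if decoder_step_model_path
                        else asset_dir / "decoder_step.onnx",
    }
```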
Signatures
class G2PModel(
    model_path: Optional[PathLike] = None,
    encoder_model_path: Optional[PathLike] = None,
    decoder_step_model_path: Optional[PathLike] = None,
    vocab_path: Optional[PathLike] = None,
    max_input_len: int = 128,
    max_output_len: int = 32,
    providers: Optional[Sequence[str]] = None,
)

model.predict(
    text,
    split_delimiter=r"\s+",
    output_delimiter=" ",
    preserve_literals="none" | "punct",
)

type G2POptions = {
  modelPath?: string;
  encoderModelPath?: string;
  decoderStepModelPath?: string;
  maxInputLen?: number;
  maxOutputLen?: number;
};

type BrowserOptions = {
  modelUrl?: string;
  encoderUrl?: string;
  decoderStepUrl?: string;
  maxInputLen?: number;
  maxOutputLen?: number;
};

type PredictOptions = {
  splitDelimiter?: string | RegExp | null;
  outputDelimiter?: string;
  preserveLiterals?: "none" | "punct";
};

Return values
Python returns G2PResult with ipa and display_ipa plus alignments: list[G2PAlignment]. TypeScript returns { ipa: string; displayIpa: string; alignments: Alignment[] }. Alignment fields are phoneme, phoneme_index/phonemeIndex, and char_index/charIndex.
Notes
- Python applies Unicode casefolding; TypeScript uses toLocaleLowerCase("und").
- Whitespace is skipped during tokenization, so alignments map back to non-whitespace characters.
- For whitespace-only input, the alignment sentinel is -1.
- display_ipa/displayIpa equals canonical IPA by default.
- Set preserve_literals="punct" or preserveLiterals: "punct" to keep punctuation in rendered output.
- Browser G2P is available through hama-js/g2p/browser or the aggregate hama-js/browser export.
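The whitespace-skipping rule can be illustrated with a tiny standalone helper. This is a hypothetical sketch of the indexing behavior, not the library's tokenizer:

```python
def alignable_char_indices(text):
    """Return the character indices a G2P alignment could point at.

    Whitespace is skipped during tokenization, so only non-whitespace
    positions are alignable; whitespace-only input yields the -1 sentinel.
    """
    indices = [i for i, ch in enumerate(text) if not ch.isspace()]
    return indices or [-1]
```

For example, in `"ab c"` the alignable indices are 0, 1, and 3: the space at index 2 never appears as a char_index.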
Pronunciation correction
Quickstart
from hama import pronunciation_scan, pronunciation_replace
text = "we met (jon smyth), and later spoke with o reilly media yesterday."
terms = [{"text": "John Smythe"}, {"text": "O'Reilly Media"}]
scan = pronunciation_scan(text, terms, {"return_phonemes": True})
result = pronunciation_replace(
    text,
    terms,
    {"return_phonemes": True, "include_discarded": True},
)
print(scan["matches"])
print(result["text"])

import { pronunciationScan, pronunciationReplace } from "hama-js";
const text =
"we met (jon smyth), and later spoke with o reilly media yesterday.";
const terms = [{ text: "John Smythe" }, { text: "O'Reilly Media" }];
const scan = await pronunciationScan(text, terms, {
returnPhonemes: true,
});
const result = await pronunciationReplace(text, terms, {
returnPhonemes: true,
includeDiscarded: true,
});
console.log(scan.matches);
console.log(result.text);

import { G2PBrowserModel } from "hama-js/g2p/browser";
const model = await G2PBrowserModel.create();
const text =
"we met (jon smyth), and later spoke with o reilly media yesterday.";
const terms = [{ text: "John Smythe" }, { text: "O'Reilly Media" }];
const scan = await model.pronunciationScan(text, terms, {
returnPhonemes: true,
});
const result = await model.pronunciationReplace(text, terms, {
returnPhonemes: true,
includeDiscarded: true,
});
console.log(scan.matches);
console.log(result.text);

Signatures
pronunciation_scan(
    text: str,
    terms: Sequence[str | PronunciationTerm],
    options: PronunciationScanOptions | None = None,
) -> PronunciationScanResult

pronunciation_replace(
    text: str,
    terms: Sequence[str | PronunciationTerm],
    options: PronunciationReplaceOptions | None = None,
) -> PronunciationReplaceResult

pronunciationScan(
  text: string,
  terms: Array<string | PronunciationTerm>,
  options?: PronunciationScanOptions,
): Promise<PronunciationScanResult>

pronunciationReplace(
  text: string,
  terms: Array<string | PronunciationTerm>,
  options?: PronunciationReplaceOptions,
): Promise<PronunciationReplaceResult>

model.pronunciationScan(text, terms, options?)
model.pronunciationReplace(text, terms, options?)

Return values
Scan returns matches with the matched text, canonical replacement, original start_char/end_char or startChar/endChar, score, and optional phoneme details. Replace returns corrected text, applied patches, discarded patches, and stats for applied, ambiguous, overlap, and duplicate counts. Applied patches also include output offsets via output_start_char/outputStartChar and output_end_char/outputEndChar.
Notes
- Offsets always refer to the original input string.
- Matching is token-boundary only, so larger words are not rewritten by substring match.
- Matching is pronunciation-first, with text similarity as a secondary score.
- Replacement rewrites the original text in one pass, preserving surrounding punctuation and spacing.
- Ambiguous and overlapping candidates can be surfaced through discarded patches.
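The single-pass rewrite with both original and output offsets can be sketched as below. This is an illustrative model of the behavior described in the notes, not the library's implementation; the patch dict shape mirrors the documented field names but is otherwise hypothetical.

```python
def apply_patches(text, patches):
    """Rewrite text left to right, applying non-overlapping patches once.

    Each patch carries original-string start_char/end_char and a replacement.
    Applied patches gain output_start_char/output_end_char, the offsets of
    the replacement in the corrected output. Overlapping candidates are
    discarded rather than rewritten twice.
    """
    out, applied, cursor = [], [], 0
    for patch in sorted(patches, key=lambda p: p["start_char"]):
        if patch["start_char"] < cursor:
            continue  # overlaps an already-applied patch: discard
        out.append(text[cursor:patch["start_char"]])
        out_start = sum(len(s) for s in out)  # length emitted so far
        out.append(patch["replacement"])
        applied.append({**patch,
                        "output_start_char": out_start,
                        "output_end_char": out_start + len(patch["replacement"])})
        cursor = patch["end_char"]
    out.append(text[cursor:])
    return "".join(out), applied
```

Because the cursor only moves forward over the original string, surrounding punctuation and spacing survive untouched, matching the one-pass behavior the notes describe.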
ASR
Quickstart
from hama import ASRModel
model = ASRModel()
result = model.transcribe_file("sample.wav")
print(result.phoneme_text)
print(result.word_phoneme_text)

import { ASRNodeModel } from "hama-js/asr";
const model = await ASRNodeModel.create();
const result = await model.transcribeWavFile("sample.wav");
console.log(result.phonemeText, result.wordPhonemeText);

import { ASRBrowserModel } from "hama-js/asr/browser";
const model = await ASRBrowserModel.create({
modelUrl: "/assets/asr_waveform_fp16.onnx",
});
const result = await model.transcribeWaveform(float32Samples, 16000);
console.log(result.phonemeText, result.wordPhonemeText);

ASR is waveform-input only and uses the packaged asr_waveform_fp16.onnx asset.
Browser ASR uses the same model contract, loaded explicitly via modelUrl.
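The model contract expects 16 kHz samples (see model_sample_rate below); whether the runtime resamples other rates internally is not specified here. If you need to resample before calling transcribe_waveform, a minimal linear-interpolation sketch looks like this (real pipelines typically use a polyphase or sinc resampler for quality):

```python
def resample_linear(samples, src_rate, dst_rate=16000):
    """Resample a mono waveform to dst_rate by linear interpolation.

    Illustrative only: good enough to show the rate conversion, not
    production audio quality.
    """
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    out = []
    for i in range(n_out):
        # Map output index i to a fractional position in the source signal.
        pos = i * (len(samples) - 1) / max(1, n_out - 1)
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out
```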
Signatures
class ASRModel(
    model_path: Optional[PathLike] = None,
    vocab_path: Optional[PathLike] = None,
    decode: Optional[ASRDecodeConfig] = None,
    providers: Optional[Sequence[str]] = None,
    model_sample_rate: int = 16000,
)

model.transcribe_file("sample.wav")
model.transcribe_waveform(waveform, sample_rate)

type ASRNodeOptions = {
  modelPath?: string;
  vocabPath?: string;
  sampleRate?: number;
  blankToken?: string;
  unkToken?: string;
  wordBoundaryToken?: string;
  blankBias?: number;
  unkBias?: number;
  collapseRepeats?: boolean;
};

type ASRBrowserOptions = {
  modelUrl?: string;
  vocabUrl?: string;
  sampleRate?: number;
  blankToken?: string;
  unkToken?: string;
  wordBoundaryToken?: string;
  blankBias?: number;
  unkBias?: number;
  collapseRepeats?: boolean;
};

Return values
Python ASRResult: phonemes, phoneme_text, word_phoneme_text, token_ids, frame_token_ids, num_frames. TypeScript ASRResult: phonemes, phonemeText, wordPhonemeText, tokenIds, frameTokenIds, numFrames.
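The relationship between frame_token_ids and phonemes follows standard greedy CTC decoding, which the collapseRepeats and blankToken options suggest: repeated frame tokens collapse to one emission, then blanks are dropped. A standalone sketch (vocab layout, blank id, and function name are illustrative, not the packaged decode_ctc_tokens helper):

```python
def decode_ctc(frame_token_ids, vocab, blank_id=0, collapse_repeats=True):
    """Greedy CTC decode: collapse consecutive repeats, then drop blanks."""
    tokens = []
    prev = None
    for tid in frame_token_ids:
        if collapse_repeats and tid == prev:
            continue  # same id across consecutive frames: a single emission
        prev = tid
        if tid != blank_id:
            tokens.append(vocab[tid])
    return tokens
```

Note that a blank between two identical tokens separates them, so `[1, 0, 1]` decodes to two emissions of token 1, while `[1, 1]` decodes to one.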
Common usage patterns
# Reuse model instances across requests.
g2p = G2PModel()
asr = ASRModel()
# Explicit split G2P assets.
custom_g2p = G2PModel(
    encoder_model_path="encoder.onnx",
    decoder_step_model_path="decoder_step.onnx",
    vocab_path="g2p_vocab.json",
)
# Explicit ASR asset.
custom_asr = ASRModel(model_path="asr_waveform_fp16.onnx")

import { G2PNodeModel } from "hama-js/g2p";
import { ASRNodeModel } from "hama-js/asr";
import { G2PBrowserModel, ASRBrowserModel } from "hama-js/browser";
// Reuse model instances.
const g2p = await G2PNodeModel.create();
const asr = await ASRNodeModel.create();
// Browser: host assets next to your bundle.
const browserG2p = await G2PBrowserModel.create({
encoderUrl: "/assets/encoder.onnx",
decoderStepUrl: "/assets/decoder_step.onnx",
});
const browserAsr = await ASRBrowserModel.create({
modelUrl: "/assets/asr_waveform_fp16.onnx",
});
const rendered = await browserG2p.predict(
"Really? What's the orbital velocity of the moon?",
{ preserveLiterals: "punct" },
);
console.log(rendered.displayIpa);

Reference demo
The browser demo powering this site lives in src/scripts/g2p-demo.ts. It
exposes mountBrowserDemo(), which wires DOM elements to the public browser G2P
runtime.
<section id="g2p-demo">
<textarea data-demo-input placeholder="Type text…">안녕하세요</textarea>
<button data-demo-chip data-value="Alignment gives explainability">Sample</button>
<span data-demo-status-dot></span>
<p data-demo-status-text>Waiting for input.</p>
<small data-demo-status-note>(Everything stays on-device.)</small>
<output data-demo-ipa>—</output>
<div data-demo-alignments></div>
</section> import { mountBrowserDemo } from "./g2p-demo";
const root = document.querySelector("#g2p-demo");
if (root) {
  mountBrowserDemo(root);
}

Shared helpers
- Python: split_text_to_jamo, join_jamo_tokens, decode_ctc_tokens.
- TypeScript: splitTextToJamo, joinJamoTokens, decodeCtcTokens.
Alignments map each phoneme back to the original character index. Use them to highlight pronunciations or validate text/phoneme correspondence in UI.
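What a split_text_to_jamo-style helper does can be sketched with the standard Hangul syllable arithmetic: every precomposed syllable in U+AC00..U+D7A3 decomposes into a lead consonant, a vowel, and an optional tail. The function name and pass-through behavior below are illustrative, not the library's exact contract:

```python
# Jamo tables in Unicode's canonical order; index 0 of TAILS is "no tail".
LEADS = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")
VOWELS = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")
TAILS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")

def split_to_jamo(text):
    """Decompose Hangul syllables into jamo; pass other characters through."""
    out = []
    for ch in text:
        code = ord(ch) - 0xAC00
        if 0 <= code < 11172:  # 19 leads x 21 vowels x 28 tails
            lead, rest = divmod(code, 21 * 28)
            vowel, tail = divmod(rest, 28)
            out.append(LEADS[lead] + VOWELS[vowel] + TAILS[tail])
        else:
            out.append(ch)
    return out
```

For example, "안" decomposes to "ㅇㅏㄴ"; non-Hangul characters such as Latin letters or punctuation come through unchanged.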