hama on-device NLP

APIs

This section documents the public interfaces exposed by the hama Python package and by hama-js through its hama-js/g2p, hama-js/asr, hama-js/g2p/browser, hama-js/asr/browser, and hama-js/browser entry points.

G2P

Quickstart

from hama import G2PModel

model = G2PModel()
result = model.predict(
  "Really? What's the orbital velocity of the moon?",
  preserve_literals="punct",
)

print(result.ipa)
print(result.display_ipa)
for alignment in result.alignments:
  print(alignment.phoneme, alignment.char_index)

G2P loads split ONNX assets (encoder plus decoder step) by default. Both runtimes still support the legacy single-file path when you explicitly pass model_path (Python) or modelPath (TypeScript).

Signatures

class G2PModel(
  model_path: Optional[PathLike] = None,
  encoder_model_path: Optional[PathLike] = None,
  decoder_step_model_path: Optional[PathLike] = None,
  vocab_path: Optional[PathLike] = None,
  max_input_len: int = 128,
  max_output_len: int = 32,
  providers: Optional[Sequence[str]] = None,
)

model.predict(
  text,
  split_delimiter=r"\s+",
  output_delimiter=" ",
  preserve_literals="none" | "punct",
)
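To illustrate how split_delimiter and output_delimiter interact, here is a toy sketch (not the hama implementation): input is split on the regex delimiter, each chunk is converted independently, and the per-chunk outputs are rejoined with output_delimiter. The lowercasing stands in for the real per-chunk G2P step.

```python
import re

def toy_predict(text, split_delimiter=r"\s+", output_delimiter=" "):
    # Illustrative only: split on the delimiter, convert each chunk
    # (a placeholder lowercase here), then rejoin with output_delimiter.
    chunks = [c for c in re.split(split_delimiter, text) if c]
    converted = [c.lower() for c in chunks]  # stand-in for per-chunk G2P
    return output_delimiter.join(converted)

print(toy_predict("Hello World"))                                  # hello world
print(toy_predict("a,b", split_delimiter=",", output_delimiter="|"))  # a|b
```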

Return values

  • Python returns G2PResult with ipa and display_ipa plus alignments: list[G2PAlignment].

  • TypeScript returns { ipa: string; displayIpa: string; alignments: Alignment[] }.

  • Alignment fields are phoneme, phoneme_index / phonemeIndex, and char_index / charIndex.
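The shapes above can be sketched as plain dataclasses. These definitions are illustrative only; the real G2PResult and G2PAlignment classes ship with the hama package and may differ in detail.

```python
from dataclasses import dataclass, field

@dataclass
class G2PAlignment:
    phoneme: str
    phoneme_index: int
    char_index: int  # -1 sentinel for whitespace-only input

@dataclass
class G2PResult:
    ipa: str
    display_ipa: str  # equals ipa unless literals are preserved
    alignments: list[G2PAlignment] = field(default_factory=list)

r = G2PResult(ipa="ipa", display_ipa="ipa",
              alignments=[G2PAlignment("i", 0, 0)])
```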

Notes

  • Python applies Unicode casefolding; TypeScript uses toLocaleLowerCase("und").
  • Whitespace is skipped during tokenization, so alignments map back to non-whitespace characters.
  • For whitespace-only input, the alignment sentinel is -1.
  • display_ipa / displayIpa equals canonical IPA by default.
  • Set preserve_literals="punct" or preserveLiterals: "punct" to keep punctuation in rendered output.
  • Browser G2P is available through hama-js/g2p/browser or the aggregate hama-js/browser export.
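The whitespace-skipping behavior described in the notes can be sketched as follows: only non-whitespace characters are candidates for alignment indices, with -1 as the sentinel when the input contains nothing else. This helper is illustrative, not part of hama.

```python
def nonspace_char_indices(text):
    # Alignments reference indices of non-whitespace characters,
    # since whitespace is skipped during tokenization.
    idxs = [i for i, ch in enumerate(text) if not ch.isspace()]
    return idxs if idxs else [-1]  # -1 sentinel for whitespace-only input

print(nonspace_char_indices("a b"))  # [0, 2]
print(nonspace_char_indices("   "))  # [-1]
```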

ASR

Quickstart

from hama import ASRModel

model = ASRModel()
result = model.transcribe_file("sample.wav")
print(result.phoneme_text)
print(result.word_phoneme_text)

ASR is waveform-input only and uses the packaged asr_waveform_fp16.onnx asset. Browser ASR uses the same model contract, loaded explicitly via modelUrl.

Signatures

class ASRModel(
  model_path: Optional[PathLike] = None,
  vocab_path: Optional[PathLike] = None,
  decode: Optional[ASRDecodeConfig] = None,
  providers: Optional[Sequence[str]] = None,
  model_sample_rate: int = 16000,
)

model.transcribe_file("sample.wav")
model.transcribe_waveform(waveform, sample_rate)

Return values

  • Python ASRResult: phonemes, phoneme_text, word_phoneme_text, token_ids, frame_token_ids, num_frames.

  • TypeScript ASRResult: phonemes, phonemeText, wordPhonemeText, tokenIds, frameTokenIds, numFrames.

Common usage patterns

# Reuse model instances across requests.
g2p = G2PModel()
asr = ASRModel()

# Explicit split G2P assets.
custom_g2p = G2PModel(
  encoder_model_path="encoder.onnx",
  decoder_step_model_path="decoder_step.onnx",
  vocab_path="g2p_vocab.json",
)

# Explicit ASR asset.
custom_asr = ASRModel(model_path="asr_waveform_fp16.onnx")
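transcribe_waveform takes raw samples plus a sample rate. A minimal stdlib sketch for decoding 16-bit mono PCM WAV data into floats in [-1, 1] is shown below; the helper name read_wav_mono_f32 is illustrative, not part of hama, and the synthetic tone exists only to exercise the reader.

```python
import io, math, struct, wave

def read_wav_mono_f32(path_or_file):
    # Decode 16-bit PCM mono WAV into floats in [-1, 1] plus its sample
    # rate, suitable for model.transcribe_waveform(waveform, sample_rate).
    with wave.open(path_or_file, "rb") as wf:
        assert wf.getsampwidth() == 2 and wf.getnchannels() == 1
        raw = wf.readframes(wf.getnframes())
        samples = [s / 32768.0 for s in struct.unpack(f"<{len(raw) // 2}h", raw)]
        return samples, wf.getframerate()

# Synthesize a 0.1 s, 440 Hz tone in memory to demonstrate the round trip.
buf = io.BytesIO()
with wave.open(buf, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(16000)
    wf.writeframes(b"".join(
        struct.pack("<h", int(32767 * 0.5 * math.sin(2 * math.pi * 440 * n / 16000)))
        for n in range(1600)))
buf.seek(0)
waveform, sr = read_wav_mono_f32(buf)
print(sr, len(waveform))  # 16000 1600
```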

Reference demo

The browser demo powering this site lives in src/scripts/g2p-demo.ts. It exposes mountBrowserDemo(), which wires DOM elements to the public browser G2P runtime.

<section id="g2p-demo">
<textarea data-demo-input placeholder="Type text…">안녕하세요</textarea>
<button data-demo-chip data-value="Alignment gives explainability">Sample</button>
<span data-demo-status-dot></span>
<p data-demo-status-text>Waiting for input.</p>
<small data-demo-status-note>(Everything stays on-device.)</small>
<output data-demo-ipa>—</output>
<div data-demo-alignments></div>
</section>

Shared helpers

  • Python: split_text_to_jamo, join_jamo_tokens, decode_ctc_tokens.
  • TypeScript: splitTextToJamo, joinJamoTokens, decodeCtcTokens.
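Assuming decode_ctc_tokens / decodeCtcTokens implement standard CTC collapsing, the contract can be sketched as: merge consecutive repeats, then drop blanks. The blank id of 0 here is an assumption for illustration.

```python
def toy_decode_ctc(frame_token_ids, blank_id=0):
    # Illustrative CTC decode: collapse consecutive repeats, drop blanks.
    out, prev = [], None
    for t in frame_token_ids:
        if t != prev and t != blank_id:
            out.append(t)
        prev = t
    return out

print(toy_decode_ctc([0, 3, 3, 0, 5, 5, 5, 0, 3]))  # [3, 5, 3]
```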

Alignments map each phoneme back to its original character index. Use them to highlight pronunciations or to validate text-to-phoneme correspondence in a UI.
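One way to use an alignment for highlighting, as a sketch: bracket the source character that produced a given phoneme. For simplicity, alignments are modeled here as (phoneme, char_index) tuples; the real objects expose .phoneme and .char_index (charIndex in TypeScript).

```python
def highlight(text, alignments, phoneme_index):
    # Bracket the source character that produced the selected phoneme.
    _, ci = alignments[phoneme_index]
    if ci < 0:  # -1 sentinel: nothing to highlight
        return text
    return text[:ci] + "[" + text[ci] + "]" + text[ci + 1:]

print(highlight("moon", [("m", 0), ("u", 1)], 1))  # m[o]on
```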