Parakeet.js

What is parakeet.js

parakeet.js is browser speech-to-text for NVIDIA Parakeet ONNX models. It runs fully client-side using onnxruntime-web with WebGPU or WASM execution.

Installation

npm i parakeet.js
# or
yarn add parakeet.js

Use WebGPU when available for best throughput.
Use WASM when WebGPU is not available or for compatibility-first setups.

Quickstart

import { fromHub } from 'parakeet.js';

const model = await fromHub('parakeet-tdt-0.6b-v3', {
  backend: 'webgpu-hybrid',
  encoderQuant: 'fp32',
  decoderQuant: 'int8',
});

// `file` should be a File (for example from <input type="file">)
const pcm = await getMono16kPcm(file); // returns mono Float32Array at 16 kHz
const result = await model.transcribe(pcm, 16000, {
  returnTimestamps: true,
  returnConfidences: true,
});

console.log(result.utterance_text);

Use your existing app audio pipeline for getMono16kPcm(file) (Web Audio API, ffmpeg, server-side decode, etc.). A complete browser example is available in examples/demo/src/App.jsx (transcribeFile flow).

Loading models

fromHub(repoIdOrModelKey, options): easiest path. Accepts model keys like parakeet-tdt-0.6b-v3 or full repo IDs.
fromUrls(cfg): explicit URL wiring when you host assets yourself.

import { fromUrls } from 'parakeet.js';

const model = await fromUrls({
  encoderUrl: 'https://huggingface.co/ysdede/parakeet-tdt-0.6b-v3-onnx/resolve/main/encoder-model.onnx',
  decoderUrl: 'https://huggingface.co/ysdede/parakeet-tdt-0.6b-v3-onnx/resolve/main/decoder_joint-model.int8.onnx',
  tokenizerUrl: 'https://huggingface.co/ysdede/parakeet-tdt-0.6b-v3-onnx/resolve/main/vocab.txt',
  // Only needed if you choose preprocessorBackend: 'onnx'
  preprocessorUrl: 'https://huggingface.co/ysdede/parakeet-tdt-0.6b-v3-onnx/resolve/main/nemo128.onnx',
  backend: 'webgpu-hybrid',
  preprocessorBackend: 'js',
});

Backends and quantization

Backends are selected with backend:
- webgpu (alias accepted)
- wasm
- advanced: webgpu-hybrid, webgpu-strict
In WebGPU modes, the encoder prefers WebGPU but decoder session runs on WASM (hybrid execution).
In getParakeetModel/fromHub, if backend starts with webgpu and encoderQuant is int8, encoder quantization is forced to fp32.
Encoder/decoder quantization supports int8, fp32, and fp16.
FP16 requires FP16 ONNX artifacts (for example encoder-model.fp16.onnx).
ONNX Runtime Web does not convert FP32 model files into FP16 at load time.
getParakeetModel/fromHub are strict about requested quantization: they do not auto-switch fp16 to fp32.
If requested FP16 artifacts are missing or fail to load, API calls throw actionable errors so callers can choose a different quantization explicitly.
Decoder runs on WASM in WebGPU modes; if decoder FP16 is unsupported in your runtime, choose decoderQuant: 'int8' or decoderQuant: 'fp32' explicitly.
preprocessorBackend is js (default) or onnx.

JS Mel FFT Update (v1.4.0)

parakeet.js now uses the pr74 real-FFT path in the default JS preprocessor (preprocessorBackend: 'js'). This keeps feature compatibility with the previous implementation while reducing mel extraction cost.

Item	Previous JS path	New JS path (default)
FFT strategy	Full `N=512` complex FFT per frame	Real-FFT via one `N/2=256` complex FFT + spectrum reconstruction (`pr74`)
Expected speed	Baseline	Faster mel stage (commonly around `~1.5x` in local mel benchmarks)
Output behavior	NeMo-compatible normalized log-mel	Same behavior and ONNX-reference accuracy thresholds preserved
API changes	N/A	None (`JsPreprocessor` / `IncrementalMelProcessor` unchanged)

If you need exact ONNX preprocessor execution instead of JS mel, set preprocessorBackend: 'onnx'.

FP16 Examples

Before using FP16 examples: ensure FP16 artifacts exist in the target repo and your browser/runtime supports FP16 execution (WebGPU FP16 path).

Load known FP16 model key:

import { fromHub } from 'parakeet.js';

const model = await fromHub('parakeet-tdt-0.6b-v3', {
  backend: 'webgpu-hybrid',
  encoderQuant: 'fp16',
  decoderQuant: 'fp16',
});

Use explicit FP16 URLs:

import { fromUrls } from 'parakeet.js';

const model = await fromUrls({
  encoderUrl: 'https://huggingface.co/ysdede/parakeet-tdt-0.6b-v3-onnx/resolve/main/encoder-model.fp16.onnx',
  decoderUrl: 'https://huggingface.co/ysdede/parakeet-tdt-0.6b-v3-onnx/resolve/main/decoder_joint-model.fp16.onnx',
  tokenizerUrl: 'https://huggingface.co/ysdede/parakeet-tdt-0.6b-v3-onnx/resolve/main/vocab.txt',
  preprocessorBackend: 'js',
  backend: 'webgpu-hybrid',
});

Transcribing a file (single-shot)

The demo flow in examples/demo/src/App.jsx is:

Load a model with public APIs (fromHub(...) for hub loading, or fromUrls(...) for explicit URLs).
Decode uploaded audio with AudioContext({ sampleRate: 16000 }) + decodeAudioData(...).
Convert decoded audio to mono 16 kHz PCM (Float32Array) by averaging channels when needed.
Call model.transcribe(pcm, 16000, options) and render utterance_text.

Reference code:

App component in examples/demo/src/App.jsx (loadModel / transcribeFile flow)

Results

model.transcribe(...) returns a TranscribeResult with this shape:

type TranscribeResult = {
  utterance_text: string;
  words: Array<{
    text: string;
    start_time: number;
    end_time: number;
    confidence?: number;
  }>;
  tokens?: Array<{
    token: string;
    raw_token?: string;
    is_word_start?: boolean;
    start_time?: number;
    end_time?: number;
    confidence?: number;
  }>;
  confidence_scores?: {
    token?: number[] | null;
    token_avg?: number | null;
    word?: number[] | null;
    word_avg?: number | null;
    frame: number[] | null;
    frame_avg: number | null;
    overall_log_prob: number | null;
  };
  metrics?: {
    preprocess_ms: number;
    encode_ms: number;
    decode_ms: number;
    tokenize_ms: number;
    total_ms: number;
    rtf: number;
    mel_cache?: { cached_frames: number; new_frames: number };
  } | null;
  is_final: boolean;
  tokenIds?: number[];
  frameIndices?: number[];
  logProbs?: number[];
  tdtSteps?: number[];
};

Enable returnTimestamps for meaningful start_time/end_time.
Enable returnConfidences for per-token/per-word confidence fields.
Advanced alignment/debug outputs are opt-in: returnTokenIds, returnFrameIndices, returnLogProbs, returnTdtSteps.

Real-time streaming (Keet)

Keet is a reference real-time app built on parakeet.js (repo).

For contiguous chunk streams, Keet uses createStreamingTranscriber(...).
Keet currently defaults to v4 utterance-based merging (UtteranceBasedMerger) with cursor/windowed chunk processing.

Real-time transcription with Keet and parakeet.js

API Reference

Published API docs: https://ysdede.github.io/parakeet.js/api/
Generate locally:

npm run docs:api

License

MIT

Credits

istupakov/onnx-asr for the reference implementation and model tooling foundations.

Name		Name	Last commit message	Last commit date
Latest commit History 234 Commits
.github		.github
.jules		.jules
compat-tests		compat-tests
examples/demo		examples/demo
metrics		metrics
src		src
tests		tests
types		types
.coderabbit.yaml		.coderabbit.yaml
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
package-lock.json		package-lock.json
package.json		package.json
typedoc.json		typedoc.json
vitest.config.js		vitest.config.js

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Parakeet.js

What is parakeet.js

Installation

Quickstart

Loading models

Backends and quantization

JS Mel FFT Update (v1.4.0)

FP16 Examples

Transcribing a file (single-shot)

Results

Real-time streaming (Keet)

API Reference

License

Credits

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Parakeet.js

What is parakeet.js

Installation

Quickstart

Loading models

Backends and quantization

JS Mel FFT Update (v1.4.0)

FP16 Examples

Transcribing a file (single-shot)

Results

Real-time streaming (Keet)

API Reference

License

Credits

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages