BVoice

Local push-to-talk speech-to-text desktop app.
Hold a key, speak, release — the transcription is typed at your cursor.
Runs 100% offline using whisper.cpp.

Install

Grab the latest build from the Releases page.

Debian / Ubuntu — .deb:

sudo apt install ./BVoice_0.1.2_amd64.deb

Fedora / RHEL / openSUSE — .rpm:

sudo dnf install ./BVoice-0.1.2-1.x86_64.rpm

After install you'll find BVoice in your application menu. On first launch the selected whisper model (~75–466 MB) downloads to ~/.local/share/bvoice/models/.

The packages declare a runtime dependency on xdotool — used to type the transcription at the cursor.

To uninstall, the registered package name is b-voice (the bundler kebab-cases BVoice):

sudo apt remove b-voice          # Debian / Ubuntu
sudo dnf remove b-voice          # Fedora / RHEL / openSUSE

The terminal command is still bvoice — only the package name carries the hyphen.

Features

Push-to-talk trigger: hold Ctrl + Win (instant arm — no hold delay)
Local transcription with whisper.cpp (tiny.en / base.en / small.en, full or quantized q5_1 / q8_0)
Optional Silero VAD silence trim with tunable threshold
FFT-based resampling (rubato) for high-quality 48 kHz → 16 kHz conversion
Greedy decoding by default; configurable beam search (set beam_size ≥ 2)
Live-applied settings for threshold, input device, and model swap — no restart
Always-on-top desktop overlay reflects state (idle / recording / transcribing) with red and orange pulse animations; draggable, position persists
Single-instance enforcement; optional autostart on login
Types output directly at the cursor — never touches your clipboard

Usage

Launch BVoice — a tray icon appears (no window by default).
Click the tray icon → Settings to configure model, input device, beam size, VAD, and autostart.
Focus any text field (editor, terminal, browser, …).
Hold Ctrl + Win, speak, release.
The transcription is typed at the cursor.

Platform support

Linux / X11 — primary target, tested on Ubuntu GNOME
Wayland — not supported (global hotkeys and synthesized typing require compositor-specific portals)
macOS / Windows — not yet ported

Configuration

Settings persist at ~/.config/bvoice/config.toml:

Key	Type	Default	Description
`model`	string	`base.en`	Whisper model; append `-q5_1` or `-q8_0` for quantized
`input_device`	string\|null	`null`	Input device name; null = system default
`beam_size`	u32	`1`	Beam search size; `1` = greedy
`use_vad`	bool	`false`	Trim silence with Silero VAD before transcription
`vad_threshold`	f32	`0.5`	VAD speech probability threshold (0–1); active when on
`overlay_position`	[i32, i32]	bottom-right	Desktop overlay position; written automatically when you drag it

The trigger is hardcoded to Ctrl + Win and is not user-configurable.

The overlay icon and overlay_position update on drag — the rest is editable from the Settings window and persists on Save.

Build from source

Prerequisites

Rust (stable) via rustup
Node.js 20+ and npm
Tauri CLI: cargo install tauri-cli --version '^2.0' --locked

Linux system packages (Ubuntu/Debian):

sudo apt install \
  libwebkit2gtk-4.1-dev libsoup-3.0-dev libayatana-appindicator3-dev \
  libasound2-dev libpulse-dev libclang-dev libssl-dev libstdc++-12-dev \
  pkg-config build-essential

Run / build

npm install
npm run tauri dev          # development
npm run tauri build        # release bundles (.deb + .rpm)

Architecture

setup (background thread):  model::ensure_model  ─▶  transcribe::init  (whisper-rs context)

hotkey (rdev, X11 XRecord)  ─▶ state machine (Ctrl+Win chord)
                                   │
                             armed ▼
                             audio::start          (cpal capture on dedicated thread,
                                                    PulseAudio source via libpulse-binding)
                                   │
                          released ▼
                             audio::stop           (downmix to mono → rubato 48→16 kHz)
                                   │
                       (if use_vad) ▼
                             vad::trim_silence_with  (Silero VAD, configurable threshold)
                                   │
                                   ▼
                             transcribe::transcribe  (whisper-rs, beam_size≥2 → beam search,
                                                      else greedy; nonverbal segments filtered)
                                   │
                                   ▼
                             inject::paste         (xdotool type --delay 0 —
                                                    types at cursor, no clipboard)

watchdog thread: forces reset if Recording > 60s or Transcribing > 45s

License

MIT — see LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 24 Commits
.github/workflows		.github/workflows
.vscode		.vscode
public		public
src-tauri		src-tauri
src		src
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
index.html		index.html
package-lock.json		package-lock.json
package.json		package.json
svelte.config.js		svelte.config.js
tsconfig.json		tsconfig.json
tsconfig.node.json		tsconfig.node.json
vite.config.ts		vite.config.ts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

BVoice

Install

Features

Usage

Platform support

Configuration

Build from source

Prerequisites

Run / build

Architecture

License

About

Uh oh!

Releases 3

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

BVoice

Install

Features

Usage

Platform support

Configuration

Build from source

Prerequisites

Run / build

Architecture

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 3

Contributors

Uh oh!

Languages