Local push-to-talk speech-to-text desktop app.
Hold Ctrl + Win, speak, and release: the transcription is typed at your cursor.
Transcription runs 100% offline using whisper.cpp; only the one-time model download needs a network connection.
Grab the latest build from the Releases page.
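Debian / Ubuntu (`.deb`):

```bash
sudo apt install ./BVoice_0.1.2_amd64.deb
```

Fedora / RHEL / openSUSE (`.rpm`). openSUSE uses `zypper` rather than `dnf`:

```bash
sudo dnf install ./BVoice-0.1.2-1.x86_64.rpm      # Fedora / RHEL
sudo zypper install ./BVoice-0.1.2-1.x86_64.rpm   # openSUSE
```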
After installing, you'll find BVoice in your application menu. On first launch, the selected Whisper model (about 75–466 MB for the full-precision models; quantized variants are smaller) is downloaded to `~/.local/share/bvoice/models/`.
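The `model` setting ends up as a file under `~/.local/share/bvoice/models/`. The sketch below validates a model name against the variants listed in this README and builds a download path for it. It assumes BVoice keeps whisper.cpp's published `ggml-<model>.bin` file naming and honours `XDG_DATA_HOME` before falling back to `~/.local/share`; both are assumptions, and `is_valid_model`, `model_path`, and `default_data_home` are hypothetical helpers rather than BVoice's code.

```rust
use std::path::{Path, PathBuf};

/// Base models listed in this README.
const BASE_MODELS: [&str; 3] = ["tiny.en", "base.en", "small.en"];
/// Quantization suffixes accepted after a `-`.
const QUANT_SUFFIXES: [&str; 2] = ["q5_1", "q8_0"];

/// True if `name` is a base model, optionally followed by `-q5_1` or `-q8_0`.
fn is_valid_model(name: &str) -> bool {
    let (base, quant) = match name.rsplit_once('-') {
        Some((base, quant)) => (base, Some(quant)),
        None => (name, None),
    };
    BASE_MODELS.contains(&base) && quant.map_or(true, |q| QUANT_SUFFIXES.contains(&q))
}

/// Data directory: `$XDG_DATA_HOME` if set and non-empty, else `$HOME/.local/share`
/// (assumed lookup order).
fn default_data_home() -> Option<PathBuf> {
    std::env::var_os("XDG_DATA_HOME")
        .filter(|v| !v.is_empty())
        .map(PathBuf::from)
        .or_else(|| std::env::var_os("HOME").map(|h| PathBuf::from(h).join(".local/share")))
}

/// Hypothetical layout: `<data_home>/bvoice/models/ggml-<model>.bin`.
fn model_path(data_home: &Path, name: &str) -> Option<PathBuf> {
    if !is_valid_model(name) {
        return None;
    }
    Some(data_home.join("bvoice").join("models").join(format!("ggml-{name}.bin")))
}
```

For example, `model_path(&default_data_home()?, "base.en-q5_1")` would point at `.../bvoice/models/ggml-base.en-q5_1.bin`, while an unlisted name such as `medium.en` is rejected before any download starts.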
The packages declare a runtime dependency on `xdotool`, which BVoice uses to type the transcription at the cursor.
To uninstall, use the registered package name `b-voice` (the bundler kebab-cases `BVoice`):

```bash
sudo apt remove b-voice      # Debian / Ubuntu
sudo dnf remove b-voice      # Fedora / RHEL
sudo zypper remove b-voice   # openSUSE
```

The terminal command is still `bvoice`; only the package name carries the hyphen.
- Push-to-talk trigger: hold Ctrl + Win (arms instantly, no hold delay)
- Local transcription with whisper.cpp (`tiny.en` / `base.en` / `small.en`, full or quantized `q5_1` / `q8_0`)
- Optional Silero VAD silence trimming with a tunable threshold
- FFT-based resampling (rubato) for high-quality 48 kHz → 16 kHz conversion
- Greedy decoding by default; configurable beam search (set `beam_size` ≥ 2)
- Threshold, input device, and model changes apply live, with no restart
- Always-on-top desktop overlay reflects state (idle / recording / transcribing) with red and orange pulse animations; it is draggable and its position persists
- Single-instance enforcement; optional autostart on login
- Types output directly at the cursor and never touches your clipboard
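Typing the text, rather than pasting it, is what keeps the clipboard untouched. The architecture notes name the call as `xdotool type --delay 0`; a minimal sketch of issuing it with only `std::process::Command` follows. The `--` separator (to stop text beginning with `-` from being parsed as an option) and the helper names `xdotool_type_command` and `type_at_cursor` are assumptions, not BVoice's code.

```rust
use std::io;
use std::process::Command;

/// Build `xdotool type --delay 0 -- <text>`. `--delay 0` removes the per-keystroke
/// delay; the `--` guard is an assumption that ends option parsing before the text.
fn xdotool_type_command(text: &str) -> Command {
    let mut cmd = Command::new("xdotool");
    cmd.args(["type", "--delay", "0", "--", text]);
    cmd
}

/// Type `text` into whichever window has keyboard focus; no clipboard is involved.
fn type_at_cursor(text: &str) -> io::Result<()> {
    if text.is_empty() {
        // Nothing to type, so don't spawn a process at all.
        return Ok(());
    }
    let status = xdotool_type_command(text).status()?;
    if status.success() {
        Ok(())
    } else {
        Err(io::Error::new(io::ErrorKind::Other, format!("xdotool exited with {status}")))
    }
}
```

Separating command construction from execution lets the argument list be checked without an X server or `xdotool` installed.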
1. Launch BVoice. A tray icon appears (no window opens by default).
2. Click the tray icon → Settings to configure the model, input device, beam size, VAD, and autostart.
3. Focus any text field (editor, terminal, browser, …).
4. Hold Ctrl + Win, speak, then release.
5. The transcription is typed at the cursor.
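The hold, speak, release cycle can be pictured as a small state machine: recording starts on the key event that completes Ctrl + Win, and releasing either key hands the audio off for transcription. This is a hypothetical sketch. BVoice receives key events from rdev, whereas `Key`, `Chord`, `Phase`, and `Action` here are illustrative stand-ins, and ignoring the chord while a transcription is still running is an assumption.

```rust
/// Simplified key identity; a real listener would map rdev key codes onto this.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Key {
    Ctrl,
    Super, // the "Win" key
    Other,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Phase {
    Idle,
    Recording,
    Transcribing,
}

/// What the caller should do in response to a key event.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Action {
    None,
    StartRecording,
    StopAndTranscribe,
}

#[derive(Debug)]
struct Chord {
    ctrl: bool,
    sup: bool,
    phase: Phase,
}

impl Chord {
    fn new() -> Self {
        Self { ctrl: false, sup: false, phase: Phase::Idle }
    }

    /// Feed one key press (`pressed = true`) or release and get the resulting action.
    fn on_key(&mut self, key: Key, pressed: bool) -> Action {
        match key {
            Key::Ctrl => self.ctrl = pressed,
            Key::Super => self.sup = pressed,
            Key::Other => return Action::None,
        }
        let held = self.ctrl && self.sup;
        match self.phase {
            // Arm on the event that completes the chord: no hold timer.
            Phase::Idle if held => {
                self.phase = Phase::Recording;
                Action::StartRecording
            }
            // Releasing either key ends the recording.
            Phase::Recording if !held => {
                self.phase = Phase::Transcribing;
                Action::StopAndTranscribe
            }
            // While transcribing (or with no relevant change), do nothing.
            _ => Action::None,
        }
    }

    /// Called once the text has been typed; the chord can be armed again.
    fn on_transcription_done(&mut self) {
        self.phase = Phase::Idle;
    }
}
```

Arming on the chord-completing event, rather than after a timer, is what produces an instant arm with no hold delay.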
- Linux / X11 — primary target, tested on Ubuntu GNOME
- Wayland — not supported (global hotkeys and synthesized typing require compositor-specific portals)
- macOS / Windows — not yet ported
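Because only X11 is supported, an app like this may want to warn when it starts inside a Wayland session. A common heuristic reads `XDG_SESSION_TYPE` and falls back to `WAYLAND_DISPLAY`; the sketch below is hypothetical and does not describe what BVoice itself checks.

```rust
#[derive(Debug, PartialEq, Eq)]
enum Session {
    X11,
    Wayland,
    Unknown,
}

/// Classify the session from `XDG_SESSION_TYPE`, falling back to `WAYLAND_DISPLAY`.
fn classify_session(session_type: Option<&str>, wayland_display: Option<&str>) -> Session {
    match session_type.map(|s| s.to_ascii_lowercase()).as_deref() {
        Some("x11") => Session::X11,
        Some("wayland") => Session::Wayland,
        // Session type missing or unusual: a non-empty WAYLAND_DISPLAY still suggests Wayland.
        _ if wayland_display.map_or(false, |d| !d.is_empty()) => Session::Wayland,
        _ => Session::Unknown,
    }
}

/// Inspect the environment of the running process.
fn current_session() -> Session {
    let session_type = std::env::var("XDG_SESSION_TYPE").ok();
    let wayland_display = std::env::var("WAYLAND_DISPLAY").ok();
    classify_session(session_type.as_deref(), wayland_display.as_deref())
}
```

A startup check could then be `if current_session() == Session::Wayland { eprintln!("BVoice needs an X11 session"); }`.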
Settings persist in `~/.config/bvoice/config.toml`:

| Key | Type | Default | Description |
|---|---|---|---|
| `model` | string | `base.en` | Whisper model; append `-q5_1` or `-q8_0` for a quantized variant |
| `input_device` | string (optional) | unset | Input device name; when unset, the system default device is used |
| `beam_size` | u32 | `1` | Beam search size; `1` = greedy |
| `use_vad` | bool | `false` | Trim silence with Silero VAD before transcription |
| `vad_threshold` | f32 | `0.5` | VAD speech-probability threshold (0–1); only used when `use_vad` is `true` |
| `overlay_position` | [i32, i32] | bottom-right | Desktop overlay position; written automatically when you drag the overlay |
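The table maps naturally onto a settings struct. This sketch mirrors the documented keys and defaults and shows how `beam_size` could select a decoding strategy and how `vad_threshold` could be kept within 0–1. The `Config` struct, `Decoding` enum, and `sanitized` method are hypothetical; treating `beam_size = 0` as greedy and falling back to 0.5 for a NaN threshold are assumptions. Parsing the TOML file (for example with serde and the `toml` crate) is left out so the example has no dependencies.

```rust
/// Hypothetical mirror of the documented config.toml keys.
#[derive(Debug, Clone, PartialEq)]
struct Config {
    model: String,
    input_device: Option<String>,          // None = system default device
    beam_size: u32,
    use_vad: bool,
    vad_threshold: f32,
    overlay_position: Option<[i32; 2]>,    // None = bottom-right default
}

impl Default for Config {
    /// Defaults taken from the table above.
    fn default() -> Self {
        Self {
            model: "base.en".to_string(),
            input_device: None,
            beam_size: 1,
            use_vad: false,
            vad_threshold: 0.5,
            overlay_position: None,
        }
    }
}

#[derive(Debug, PartialEq, Eq)]
enum Decoding {
    Greedy,
    BeamSearch { beam_size: u32 },
}

impl Config {
    /// `beam_size` >= 2 selects beam search; 1 (and, by assumption, 0) is greedy.
    fn decoding(&self) -> Decoding {
        if self.beam_size >= 2 {
            Decoding::BeamSearch { beam_size: self.beam_size }
        } else {
            Decoding::Greedy
        }
    }

    /// Clamp `vad_threshold` into [0, 1]; a NaN falls back to the 0.5 default.
    fn sanitized(mut self) -> Self {
        self.vad_threshold = if self.vad_threshold.is_nan() {
            0.5
        } else {
            self.vad_threshold.clamp(0.0, 1.0)
        };
        self
    }
}
```

whisper-rs expresses the same choice through its `SamplingStrategy` enum (`Greedy` or `BeamSearch`), so a `Decoding` value like this would be translated into one of those variants when building the transcription parameters.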
The trigger is hardcoded to Ctrl + Win and is not user-configurable.
`overlay_position` is written automatically when you drag the overlay icon; every other key is editable from the Settings window and persists when you click Save.
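When `overlay_position` is absent, the overlay starts in the bottom-right corner. One way to resolve its top-left coordinate is sketched below; the 24 px margin, the single-screen geometry, and clamping a saved position back onto the screen are hypothetical choices, not documented BVoice behaviour.

```rust
/// Gap between the overlay and the screen edges (hypothetical value).
const MARGIN: i32 = 24;

/// Top-left corner for the overlay. A saved position wins but is clamped onto the
/// screen; otherwise the overlay is placed in the bottom-right corner.
fn overlay_origin(saved: Option<[i32; 2]>, screen: (i32, i32), overlay: (i32, i32)) -> [i32; 2] {
    // Largest x/y that still keeps the whole overlay visible.
    let max_x = (screen.0 - overlay.0).max(0);
    let max_y = (screen.1 - overlay.1).max(0);
    match saved {
        Some([x, y]) => [x.clamp(0, max_x), y.clamp(0, max_y)],
        None => [(max_x - MARGIN).max(0), (max_y - MARGIN).max(0)],
    }
}
```

On a 1920 × 1080 screen with a 64 × 64 overlay, the default origin works out to `[1832, 992]`; clamping guards against a saved position left over from a larger monitor.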
- Rust (stable) via rustup
- Node.js 20+ and npm
- Tauri CLI:
  ```bash
  cargo install tauri-cli --version '^2.0' --locked
  ```
- Linux system packages (Ubuntu/Debian):
  ```bash
  sudo apt install \
    libwebkit2gtk-4.1-dev libsoup-3.0-dev libayatana-appindicator3-dev \
    libasound2-dev libpulse-dev libclang-dev libssl-dev libstdc++-12-dev \
    pkg-config build-essential
  ```
```bash
npm install
npm run tauri dev     # development
npm run tauri build   # release bundles (.deb + .rpm)
```
```text
setup (background thread): model::ensure_model ─▶ transcribe::init (whisper-rs context)

hotkey (rdev, X11 XRecord) ─▶ state machine (Ctrl+Win chord)
                                     │
                               armed ▼
          audio::start (cpal capture on dedicated thread,
                        PulseAudio source via libpulse-binding)
                                     │
                            released ▼
          audio::stop (downmix to mono → rubato 48→16 kHz)
                                     │
                        (if use_vad) ▼
          vad::trim_silence_with (Silero VAD, configurable threshold)
                                     │
                                     ▼
          transcribe::transcribe (whisper-rs; beam_size ≥ 2 → beam search,
                                  else greedy; nonverbal segments filtered)
                                     │
                                     ▼
          inject::paste (xdotool type --delay 0;
                         types at cursor, no clipboard)
```
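Three of these steps are simple enough to sketch with the standard library alone: the mono downmix in `audio::stop`, the silence trim that `vad::trim_silence_with` applies once Silero has scored the audio, and the nonverbal-segment filter in `transcribe::transcribe`. The rubato resampling step is not reproduced. Everything here is an assumption about how those steps could work: the helpers are hypothetical, the VAD is assumed to yield one speech probability per fixed-size window (Silero VAD v5, for instance, scores 512-sample windows at 16 kHz), and "nonverbal" is approximated as a segment fully wrapped in `[...]`, `(...)`, or `*...*`, such as `[BLANK_AUDIO]`.

```rust
/// Average interleaved multi-channel frames into mono; a trailing partial frame is dropped.
fn downmix_to_mono(interleaved: &[f32], channels: usize) -> Vec<f32> {
    if channels <= 1 {
        return interleaved.to_vec();
    }
    interleaved
        .chunks_exact(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}

/// Keep samples from the first to the last window whose speech probability reaches
/// `threshold` (`probs[i]` scores samples `i*window .. (i+1)*window`).
/// Returns an empty Vec when no window qualifies.
fn trim_silence(samples: &[f32], probs: &[f32], window: usize, threshold: f32) -> Vec<f32> {
    let first = probs.iter().position(|&p| p >= threshold);
    let last = probs.iter().rposition(|&p| p >= threshold);
    match (first, last) {
        (Some(f), Some(l)) => {
            let start = (f * window).min(samples.len());
            let end = ((l + 1) * window).min(samples.len());
            samples[start..end].to_vec()
        }
        _ => Vec::new(),
    }
}

/// Heuristic: a segment is nonverbal if it is empty or a single bracketed tag.
fn is_nonverbal(segment: &str) -> bool {
    let t = segment.trim();
    if t.is_empty() {
        return true;
    }
    [('[', ']'), ('(', ')'), ('*', '*')].iter().any(|&(open, close)| {
        t.len() >= 2
            && t.starts_with(open)
            && t.ends_with(close)
            // Reject "(laughs) ok (sigh)": the closing mark must appear only at the end.
            && !t[1..t.len() - 1].contains(close)
    })
}

/// Drop nonverbal segments and join the rest with single spaces.
fn join_segments<'a>(segments: impl IntoIterator<Item = &'a str>) -> String {
    segments
        .into_iter()
        .filter(|s| !is_nonverbal(s))
        .map(str::trim)
        .collect::<Vec<_>>()
        .join(" ")
}
```

Returning an empty buffer when the VAD finds no speech gives the caller a cheap signal to skip transcription entirely.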
A separate watchdog thread forces a reset if the app stays in Recording for more than 60 s or in Transcribing for more than 45 s.
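The watchdog rule reduces to a per-state time limit. A minimal sketch, assuming the current phase and the instant it was entered live behind a shared `Mutex` and that the watchdog polls once per second; `Phase`, `watchdog_should_reset`, and `spawn_watchdog` are hypothetical names, and a real reset would also have to stop audio capture or abandon the running transcription.

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Phase {
    Idle,
    Recording,
    Transcribing,
}

const MAX_RECORDING: Duration = Duration::from_secs(60);
const MAX_TRANSCRIBING: Duration = Duration::from_secs(45);

/// True if the app has been in `phase` for longer than that phase's limit.
fn watchdog_should_reset(phase: Phase, elapsed: Duration) -> bool {
    match phase {
        Phase::Idle => false,
        Phase::Recording => elapsed > MAX_RECORDING,
        Phase::Transcribing => elapsed > MAX_TRANSCRIBING,
    }
}

/// Poll the shared (phase, entered_at) pair once per second and force Idle on timeout.
fn spawn_watchdog(state: Arc<Mutex<(Phase, Instant)>>) -> thread::JoinHandle<()> {
    thread::spawn(move || loop {
        thread::sleep(Duration::from_secs(1));
        let mut guard = state.lock().unwrap();
        let (phase, since) = *guard;
        if watchdog_should_reset(phase, since.elapsed()) {
            // Hypothetical reset: only the state flag is cleared here.
            *guard = (Phase::Idle, Instant::now());
        }
    })
}
```

Keeping the timeout decision in a pure function makes the limits testable without sleeping for a minute.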
MIT — see LICENSE.