ErfanFathi/BVoice

BVoice

Local push-to-talk speech-to-text desktop app.
Hold a key, speak, release — the transcription is typed at your cursor.
Runs 100% offline using whisper.cpp.

Install

Grab the latest build from the Releases page.

Debian / Ubuntu (.deb):

sudo apt install ./BVoice_0.1.2_amd64.deb

Fedora / RHEL / openSUSE (.rpm):

sudo dnf install ./BVoice-0.1.2-1.x86_64.rpm

After install you'll find BVoice in your application menu. On first launch the selected whisper model (~75–466 MB) downloads to ~/.local/share/bvoice/models/.

The packages declare a runtime dependency on xdotool — used to type the transcription at the cursor.

To uninstall, use the registered package name, b-voice (the bundler kebab-cases BVoice):

sudo apt remove b-voice          # Debian / Ubuntu
sudo dnf remove b-voice          # Fedora / RHEL / openSUSE

The terminal command is still bvoice — only the package name carries the hyphen.

Features

  • Push-to-talk trigger: hold Ctrl + Win (instant arm — no hold delay)
  • Local transcription with whisper.cpp (tiny.en / base.en / small.en, full or quantized q5_1 / q8_0)
  • Optional Silero VAD silence trim with tunable threshold
  • FFT-based resampling (rubato) for high-quality 48 kHz → 16 kHz conversion
  • Greedy decoding by default; configurable beam search (set beam_size ≥ 2)
  • Live-applied settings for threshold, input device, and model swap — no restart
  • Always-on-top desktop overlay reflects state (idle / recording / transcribing) with red and orange pulse animations; draggable, position persists
  • Single-instance enforcement; optional autostart on login
  • Types output directly at the cursor — never touches your clipboard
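The last feature relies on xdotool, which the Architecture section shows being invoked as xdotool type --delay 0. The following is a minimal sketch of how such a call could be assembled from Rust using only the standard library. The function names build_type_command and type_at_cursor are hypothetical and are not taken from BVoice's source; the only flags used are the ones this README mentions.

```rust
use std::process::Command;

/// Build (but do not run) an `xdotool type --delay 0 <text>` invocation.
/// Hypothetical helper: the name is illustrative, not BVoice's API.
pub fn build_type_command(text: &str) -> Command {
    let mut cmd = Command::new("xdotool");
    // The text is passed as a single argument, so no shell quoting is involved.
    // Note: this sketch does not guard against text that starts with '-'.
    cmd.arg("type").arg("--delay").arg("0").arg(text);
    cmd
}

/// Run the command and turn any failure into a readable error string.
/// Empty text is treated as a no-op so xdotool is never spawned for it.
pub fn type_at_cursor(text: &str) -> Result<(), String> {
    if text.is_empty() {
        return Ok(());
    }
    let status = build_type_command(text)
        .status()
        .map_err(|e| format!("failed to launch xdotool: {e}"))?;
    if status.success() {
        Ok(())
    } else {
        Err(format!("xdotool exited with {status}"))
    }
}
```

Separating command construction from execution keeps the argument list inspectable without xdotool installed, and passing the text as a process argument rather than through the clipboard matches the "never touches your clipboard" claim.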

Usage

  1. Launch BVoice — a tray icon appears (no window by default).
  2. Click the tray icon → Settings to configure model, input device, beam size, VAD, and autostart.
  3. Focus any text field (editor, terminal, browser, …).
  4. Hold Ctrl + Win, speak, release.
  5. The transcription is typed at the cursor.

Platform support

  • Linux / X11 — primary target, tested on Ubuntu GNOME
  • Wayland — not supported (global hotkeys and synthesized typing require compositor-specific portals)
  • macOS / Windows — not yet ported
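Because only X11 sessions are supported, a wrapper or startup check may want to detect the session type before launching. The sketch below is one possible approach, not BVoice's actual startup code: it classifies the value of the standard XDG_SESSION_TYPE environment variable. The names SessionKind, classify_session, and current_session are hypothetical.

```rust
/// Display-server kinds this sketch distinguishes (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SessionKind {
    X11,
    Wayland,
    Unknown,
}

/// Classify a raw XDG_SESSION_TYPE value.
/// Matching ignores case and surrounding whitespace; anything other than
/// "x11" or "wayland" (including an unset variable) maps to Unknown.
pub fn classify_session(session_type: Option<&str>) -> SessionKind {
    let lowered = session_type.map(|s| s.trim().to_ascii_lowercase());
    match lowered.as_deref() {
        Some("x11") => SessionKind::X11,
        Some("wayland") => SessionKind::Wayland,
        _ => SessionKind::Unknown,
    }
}

/// Read XDG_SESSION_TYPE from the process environment and classify it.
pub fn current_session() -> SessionKind {
    classify_session(std::env::var("XDG_SESSION_TYPE").ok().as_deref())
}
```

A caller could warn the user and exit early when current_session() returns SessionKind::Wayland, instead of failing later when the hotkey listener or xdotool does nothing.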

Configuration

Settings persist at ~/.config/bvoice/config.toml:

Key               Type          Default       Description
model             string        base.en       Whisper model; append -q5_1 or -q8_0 for a quantized variant
input_device      string|null   null          Input device name; null = system default
beam_size         u32           1             Beam search size; 1 = greedy
use_vad           bool          false         Trim silence with Silero VAD before transcription
vad_threshold     f32           0.5           VAD speech probability threshold (0–1); only used when use_vad is true
overlay_position  [i32, i32]    bottom-right  Desktop overlay position; written automatically when you drag it

The trigger is hardcoded to Ctrl + Win and is not user-configurable.

overlay_position is written whenever you drag the overlay. Every other key is edited in the Settings window and persisted when you click Save.
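To make the defaults and constraints in the table concrete, here is a minimal sketch of an in-memory mirror of config.toml. The field names follow the table, but the Config struct, its validate method, and the specific checks are illustrative assumptions rather than BVoice's actual source. In particular, rejecting beam_size = 0 and representing the bottom-right default as None are choices made for this example only.

```rust
/// Hypothetical in-memory mirror of ~/.config/bvoice/config.toml.
#[derive(Debug, Clone, PartialEq)]
pub struct Config {
    pub model: String,
    pub input_device: Option<String>, // None = system default
    pub beam_size: u32,               // 1 = greedy, >= 2 = beam search
    pub use_vad: bool,
    pub vad_threshold: f32,
    /// None stands in for the default bottom-right placement (assumption).
    pub overlay_position: Option<[i32; 2]>,
}

impl Default for Config {
    fn default() -> Self {
        Config {
            model: "base.en".to_string(),
            input_device: None,
            beam_size: 1,
            use_vad: false,
            vad_threshold: 0.5,
            overlay_position: None,
        }
    }
}

const BASE_MODELS: [&str; 3] = ["tiny.en", "base.en", "small.en"];
const QUANT_SUFFIXES: [&str; 2] = ["-q5_1", "-q8_0"];

impl Config {
    /// Beam search is selected when beam_size is 2 or more.
    pub fn uses_beam_search(&self) -> bool {
        self.beam_size >= 2
    }

    /// Check the values against the ranges described in the table.
    pub fn validate(&self) -> Result<(), String> {
        // Strip an optional quantization suffix, then check the base name.
        let base = QUANT_SUFFIXES
            .iter()
            .find_map(|suffix| self.model.strip_suffix(*suffix))
            .unwrap_or(self.model.as_str());
        if !BASE_MODELS.contains(&base) {
            return Err(format!("unknown model: {}", self.model));
        }
        if self.beam_size == 0 {
            return Err("beam_size must be at least 1".to_string());
        }
        // NaN fails this range check as well.
        if !(0.0_f32..=1.0).contains(&self.vad_threshold) {
            return Err("vad_threshold must be between 0 and 1".to_string());
        }
        Ok(())
    }
}
```

Validating after load, rather than trusting the file, means a hand-edited config.toml with a typo in the model name produces a clear error instead of a failed download or model load later on.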

Build from source

Prerequisites

  • Rust (stable) via rustup
  • Node.js 20+ and npm
  • Tauri CLI: cargo install tauri-cli --version '^2.0' --locked
  • Linux system packages (Ubuntu/Debian):
    sudo apt install \
      libwebkit2gtk-4.1-dev libsoup-3.0-dev libayatana-appindicator3-dev \
      libasound2-dev libpulse-dev libclang-dev libssl-dev libstdc++-12-dev \
      pkg-config build-essential
    

Run / build

npm install
npm run tauri dev          # development
npm run tauri build        # release bundles (.deb + .rpm)

Architecture

setup (background thread):  model::ensure_model  ─▶  transcribe::init  (whisper-rs context)

hotkey (rdev, X11 XRecord)  ─▶ state machine (Ctrl+Win chord)
                                   │
                             armed ▼
                             audio::start          (cpal capture on dedicated thread,
                                                    PulseAudio source via libpulse-binding)
                                   │
                          released ▼
                             audio::stop           (downmix to mono → rubato 48→16 kHz)
                                   │
                       (if use_vad) ▼
                             vad::trim_silence_with  (Silero VAD, configurable threshold)
                                   │
                                   ▼
                             transcribe::transcribe  (whisper-rs, beam_size≥2 → beam search,
                                                      else greedy; nonverbal segments filtered)
                                   │
                                   ▼
                             inject::paste         (xdotool type --delay 0 —
                                                    types at cursor, no clipboard)

watchdog thread: forces reset if Recording > 60s or Transcribing > 45s
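The flow above can be read as a three-state machine (idle, recording, transcribing) with a watchdog that forces a return to idle. The sketch below models only those transitions and the two timeouts stated above. The State and Event enums and the function names are hypothetical, and the real implementation (rdev events, threads, audio capture) is not represented.

```rust
use std::time::Duration;

/// The three states the overlay reflects.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Idle,
    Recording,
    Transcribing,
}

/// Inputs to the state machine (hypothetical event names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    ChordPressed,      // Ctrl and Win are both held
    ChordReleased,     // either key of the chord was released
    TranscriptionDone, // text has been produced and typed
    WatchdogTimeout,   // the watchdog decided the state is stuck
}

/// Pure transition function: events that do not apply leave the state as is.
pub fn next_state(state: State, event: Event) -> State {
    match (state, event) {
        (State::Idle, Event::ChordPressed) => State::Recording,
        (State::Recording, Event::ChordReleased) => State::Transcribing,
        (State::Transcribing, Event::TranscriptionDone) => State::Idle,
        (_, Event::WatchdogTimeout) => State::Idle,
        (current, _) => current,
    }
}

/// Maximum time allowed in each state; Idle has no limit.
pub fn watchdog_limit(state: State) -> Option<Duration> {
    match state {
        State::Idle => None,
        State::Recording => Some(Duration::from_secs(60)),
        State::Transcribing => Some(Duration::from_secs(45)),
    }
}

/// True when the time spent in `state` strictly exceeds its limit.
pub fn should_reset(state: State, elapsed: Duration) -> bool {
    watchdog_limit(state).map_or(false, |limit| elapsed > limit)
}
```

Keeping the transition function pure makes the chord logic testable without a keyboard or microphone. A watchdog thread would only need to track when the state was last entered and feed Event::WatchdogTimeout when should_reset returns true.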

License

MIT — see LICENSE.

About

Push-to-talk dictation for Linux. Hold a key, speak, release, and the transcription is typed at your cursor. 100% offline, powered by whisper.cpp. Built with Rust, Tauri and Svelte.
