Whisper Rust API


A speech-to-text transcription API powered by [OpenAI's Whisper](https://openai.com/research/whisper) model. Convert audio files to text with high accuracy.

What it does

This API transcribes audio files to text. Simply upload an audio file and get back the transcribed text along with segment-level details including timestamps.

Quick Start

Prerequisites

  • Rust (for local builds)
  • ffmpeg (for audio format conversion)
  • 2GB+ of disk space for the model file

Setup and Run

  1. Clone and navigate to the project

    git clone <repository-url>
    cd whisper-rust-api
  2. Download the transcription model

    make download-model
  3. Configure (optional)

    cp .env.example .env

    Edit .env if you need to change the port or other settings.

  4. Start the API

    make start

The API is now running at http://localhost:8000

Using the API

Transcribe Audio

Upload an audio file to be transcribed:

curl -X POST -F file=@audio.mp3 http://localhost:8000/transcribe

Supported formats: WAV, MP3, M4A, FLAC, OGG, and more.

Response:

{
  "result": {
    "text": "Hello world. How are you today?",
    "segments": [
      { "id": 0, "start": 0, "end": 1500, "text_start": 0, "text_end": 12 },
      { "id": 1, "start": 1500, "end": 3200, "text_start": 12, "text_end": 31 }
    ]
  },
  "processing_time_ms": 1234
}

Each segment includes audio timestamps (start/end in milliseconds) and character offsets (text_start/text_end) into the full text string.
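
The character offsets let a client pair each segment's timestamps with its slice of the full transcript. A minimal client sketch (Python with the requests package; the host, file name, and multipart field name simply mirror the curl example above):

import requests

# Upload an audio file to the /transcribe endpoint (same form field as the curl example).
with open("audio.mp3", "rb") as f:
    resp = requests.post("http://localhost:8000/transcribe", files={"file": f})
resp.raise_for_status()

payload = resp.json()
text = payload["result"]["text"]

# Slice the full transcript using each segment's character offsets.
for seg in payload["result"]["segments"]:
    snippet = text[seg["text_start"]:seg["text_end"]]
    print(f'{seg["start"]}-{seg["end"]} ms: {snippet}')

print(f'processed in {payload["processing_time_ms"]} ms')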

Check API Status

curl http://localhost:8000/health

Response:

{
  "status": "ok",
  "version": "0.2.0"
}
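
If you script against the API (for example in CI, or right after make start), you may want to wait until the health check reports ok before sending work. A small sketch (Python with requests; the timeout values are arbitrary):

import time
import requests

def wait_until_ready(base_url="http://localhost:8000", timeout_s=60):
    # Poll /health until the API reports status "ok" or the deadline passes.
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            health = requests.get(f"{base_url}/health", timeout=2).json()
            if health.get("status") == "ok":
                return health  # e.g. {"status": "ok", "version": "0.2.0"}
        except requests.RequestException:
            pass  # server not reachable yet
        time.sleep(1)
    raise TimeoutError("API did not become healthy in time")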

Get API Info

curl http://localhost:8000/info

Shows the current model, configuration, and available endpoints.

List Available Models

curl http://localhost:8000/models

Lists model files found in the configured model directory and indicates whether the configured model exists.

CLI

All operations are available through a single script with an interactive fzf menu:

./run/dev.sh              # Interactive menu
./run/dev.sh start        # Start locally in background (with Metal GPU on macOS)
./run/dev.sh stop         # Stop local server
./run/dev.sh logs         # Tail local server logs
./run/dev.sh dev          # Development mode with auto-reload
./run/dev.sh docker-build # Build Docker image
./run/dev.sh docker-start # Start in Docker container
./run/dev.sh docker-stop  # Stop Docker container
./run/dev.sh docker-logs  # View Docker logs
./run/dev.sh download-model  # Download the default model
./run/dev.sh test         # Run tests
./run/dev.sh doctor       # Check environment

Makefile shortcuts are also available (make start, make stop, make build, etc.).

GPU Acceleration

Local (macOS with Apple Silicon): Metal GPU is automatically enabled when running locally via ./run/dev.sh start or make build. This provides ~3-4x faster transcription compared to CPU-only.

Docker/Podman containers: Apple Metal GPU is not available inside Linux containers. Docker and Podman on macOS run a Linux VM, which has no access to the Metal framework, so containers on macOS will always use CPU-only mode.

Linux + NVIDIA GPU: Build the Docker image with CUDA support:

docker build --build-arg GPU_FEATURES=cuda -t whisper-rust-api .

For the fastest transcription on macOS, run natively instead of in a container.

Deployment Options

Local Installation

Requires Rust to be installed. Then:

./run/dev.sh start

Docker

./run/dev.sh docker-build
./run/dev.sh docker-start

Configuration

Create a .env file (copy from .env.example) to customize:

| Setting | Default | Purpose |
|---|---|---|
| WHISPER_PORT | 8000 | Port the API listens on |
| WHISPER_HOST | 0.0.0.0 | Host address |
| WHISPER_THREADS | 4 | Number of CPU threads to use |
| WHISPER_MODEL | ./models/ggml-large-v3-turbo.bin | Path to the model file |
| RUST_LOG | info | Logging detail (debug, info, warn, error) |
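
For reference, a .env that simply spells out the defaults from the table above would look like this (adjust values as needed):

WHISPER_PORT=8000
WHISPER_HOST=0.0.0.0
WHISPER_THREADS=4
WHISPER_MODEL=./models/ggml-large-v3-turbo.bin
RUST_LOG=info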

Model Selection

The default model configured in .env.example is ggml-large-v3-turbo, a multilingual model (hence the 2GB+ disk space noted in the prerequisites); this is what make download-model fetches.

For a smaller download, faster transcription, or a different accuracy/speed tradeoff, download one of the other models from Hugging Face:

  • tiny.en (75MB) - Fastest, English only
  • base.en (140MB) - English only
  • small.en (466MB) - Better accuracy, English only
  • medium.en (1.5GB) - High accuracy, English only
  • tiny (75MB) - Supports 99 languages
  • base (140MB) - Supports 99 languages
  • small (466MB) - Supports 99 languages

To use a different model:

# Download the model
wget -O models/ggml-small.en.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.en.bin

# Update .env
echo "WHISPER_MODEL=./models/ggml-small.en.bin" >> .env

# Restart the API
make stop
make start

Troubleshooting

"Model file not found" error

Run make download-model to download the default model.

"ffmpeg not found" error

Install ffmpeg:

# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt-get install ffmpeg

# Fedora
sudo dnf install ffmpeg

Port 8000 is already in use

Change the port in .env:

WHISPER_PORT=8001

Transcription is slow

  • Run locally instead of in Docker to get Metal GPU acceleration on macOS (~3-4x faster)
  • Use a smaller model (e.g., tiny.en instead of base.en)
  • Increase WHISPER_THREADS in .env (if your CPU has multiple cores)
  • Ensure no other heavy processes are running

Out of memory errors

Use a smaller model:

WHISPER_MODEL=./models/ggml-tiny.en.bin

Endpoints

| Method | Endpoint | Description |
|---|---|---|
| POST | /transcribe | Upload audio and get transcription |
| GET | /health | Check if API is running |
| GET | /info | Get API information and configuration |
| GET | /models | List available model files in model directory |
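
For a quick smoke test of the read-only endpoints, a short script such as the following can be handy (Python with requests; host assumed to be the default):

import requests

base = "http://localhost:8000"
for path in ("/health", "/info", "/models"):
    resp = requests.get(base + path, timeout=5)
    print(path, resp.status_code, resp.json())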

Performance Tips

  • Audio format: WAV files process faster than MP3 (no conversion needed; see the sketch after this list)
  • File size: Smaller audio files process faster
  • Threads: More threads = faster processing on multi-core systems (up to CPU core count)
  • Model size: Smaller models are faster but less accurate
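
If the same recording is transcribed more than once, converting it to WAV a single time up front avoids repeating the server-side ffmpeg conversion on every request. A sketch (Python; assumes ffmpeg is on PATH, and the 16 kHz mono settings are a common choice for Whisper-family models rather than something this API requires):

import subprocess
import requests

# One-time conversion to 16 kHz mono WAV (assumed parameters; requires ffmpeg).
subprocess.run(
    ["ffmpeg", "-y", "-i", "audio.mp3", "-ar", "16000", "-ac", "1", "audio.wav"],
    check=True,
)

# Upload the pre-converted WAV so the server can skip format conversion.
with open("audio.wav", "rb") as f:
    resp = requests.post("http://localhost:8000/transcribe", files={"file": f})
print(resp.json()["result"]["text"])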

Need Help?

  • Check the Docker logs: make docker-logs
  • Review the configuration in .env
  • Ensure the model file was downloaded: ls models/ggml-*.bin
  • Verify ffmpeg is installed: ffmpeg -version
