A speech-to-text transcription API powered by OpenAI's Whisper model. Convert audio files to text with high accuracy.
This API transcribes audio files to text. Simply upload an audio file and get back the transcribed text along with segment-level details including timestamps.
- Rust (for local builds)
- ffmpeg (for audio format conversion)
- 2GB+ of disk space for the model file
1. Clone and navigate to the project:

   ```bash
   git clone <repository-url>
   cd whisper-rust-api
   ```

2. Download the transcription model:

   ```bash
   make download-model
   ```

3. Configure (optional):

   ```bash
   cp .env.example .env
   ```

   Edit `.env` if you need to change the port or other settings.

4. Start the API:

   ```bash
   make start
   ```

The API is now running at http://localhost:8000.
Upload an audio file to be transcribed:
```bash
curl -X POST -F file=@audio.mp3 http://localhost:8000/transcribe
```

Supported formats: WAV, MP3, M4A, FLAC, OGG, and more.
Response:

```json
{
"result": {
"text": "Hello world. How are you today?",
"segments": [
{ "id": 0, "start": 0, "end": 1500, "text_start": 0, "text_end": 12 },
{ "id": 1, "start": 1500, "end": 3200, "text_start": 12, "text_end": 31 }
]
},
"processing_time_ms": 1234
}
```

Each segment includes audio timestamps (`start`/`end`, in milliseconds) and character offsets (`text_start`/`text_end`) into the full `text` string.
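Because `text_start`/`text_end` are character offsets into `result.text`, each segment's text can be recovered by slicing. A minimal sketch using `jq` (not part of this project; it assumes the offsets line up with `jq`'s codepoint-based string slicing, which holds for plain ASCII output):

```bash
curl -s -X POST -F file=@audio.mp3 http://localhost:8000/transcribe |
  jq -r '.result.text as $t
         | .result.segments[]
         | .text_start as $a | .text_end as $b
         | "segment \(.id) [\(.start)-\(.end) ms]: \($t[$a:$b])"'
```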
Check that the API is running:

```bash
curl http://localhost:8000/health
```

Response:

```json
{
  "status": "ok",
  "version": "0.2.0"
}
```

```bash
curl http://localhost:8000/info
```

Shows the current model, configuration, and available endpoints.

```bash
curl http://localhost:8000/models
```

Lists model files found in the configured model directory and indicates whether the configured model exists.
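In scripts, it can help to wait for `/health` before sending audio. A small sketch, using the default host and port from the configuration below:

```bash
# Poll /health for up to 30 seconds before giving up.
for i in $(seq 1 30); do
  curl -sf http://localhost:8000/health > /dev/null && { echo "ready"; break; }
  [ "$i" -eq 30 ] && { echo "API did not come up" >&2; exit 1; }
  sleep 1
done
```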
All operations are available through a single script with an interactive fzf menu:
```bash
./run/dev.sh                # Interactive menu
./run/dev.sh start          # Start locally in background (with Metal GPU on macOS)
./run/dev.sh stop           # Stop local server
./run/dev.sh logs           # Tail local server logs
./run/dev.sh dev            # Development mode with auto-reload
./run/dev.sh docker-build   # Build Docker image
./run/dev.sh docker-start   # Start in Docker container
./run/dev.sh docker-stop    # Stop Docker container
./run/dev.sh docker-logs    # View Docker logs
./run/dev.sh download-model # Download the default model
./run/dev.sh test           # Run tests
./run/dev.sh doctor         # Check environment
```

Makefile shortcuts are also available (`make start`, `make stop`, `make build`, etc.).
Local (macOS with Apple Silicon): Metal GPU is automatically enabled when running locally via ./run/dev.sh start or make build. This provides ~3-4x faster transcription compared to CPU-only.
Docker/Podman containers: Apple Metal GPU is not available inside Linux containers. Docker and Podman on macOS run a Linux VM which has no access to the Metal framework. Containers on macOS will always use CPU-only mode.
Linux + NVIDIA GPU: Build the Docker image with CUDA support:
```bash
docker build --build-arg GPU_FEATURES=cuda -t whisper-rust-api .
```

For the fastest transcription on macOS, run natively instead of in a container.
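At runtime the container also needs GPU access. A sketch of a possible `docker run` invocation, assuming the host has the NVIDIA Container Toolkit installed; the `/app/models` mount path is an assumption about the image layout, not something this README specifies:

```bash
# --gpus all requires the NVIDIA Container Toolkit on the host.
# /app/models is an assumed container path; adjust to the image's layout.
docker run --rm --gpus all \
  -p 8000:8000 \
  -v "$(pwd)/models:/app/models" \
  whisper-rust-api
```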
Running locally requires Rust to be installed. Then:

```bash
./run/dev.sh start
```

To run in Docker instead:

```bash
./run/dev.sh docker-build
./run/dev.sh docker-start
```

Create a `.env` file (copy from `.env.example`) to customize:
| Setting | Default | Purpose |
|---|---|---|
| `WHISPER_PORT` | `8000` | Port the API listens on |
| `WHISPER_HOST` | `0.0.0.0` | Host address |
| `WHISPER_THREADS` | `4` | Number of CPU threads to use |
| `WHISPER_MODEL` | `./models/ggml-large-v3-turbo.bin` | Path to the model file |
| `RUST_LOG` | `info` | Logging detail (debug, info, warn, error) |
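For example, a `.env` that raises the thread count and keeps the other defaults (values are illustrative):

```bash
WHISPER_PORT=8000
WHISPER_HOST=0.0.0.0
WHISPER_THREADS=8
WHISPER_MODEL=./models/ggml-large-v3-turbo.bin
RUST_LOG=info
```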
The default configuration points at `ggml-large-v3-turbo.bin`, a multilingual model (see `WHISPER_MODEL` above). For English-only use or different accuracy/speed tradeoffs, download a different model from Hugging Face:
- tiny.en (75MB) - Fastest, English only
- base.en (140MB) - English only
- small.en (466MB) - Better accuracy, English only
- medium.en (1.5GB) - High accuracy, English only
- tiny (75MB) - Supports 99 languages
- base (140MB) - Supports 99 languages
- small (466MB) - Supports 99 languages
To use a different model:
```bash
# Download the model
wget -O models/ggml-small.en.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.en.bin

# Update .env
echo "WHISPER_MODEL=./models/ggml-small.en.bin" >> .env

# Restart the API
make stop
make start
```

Model not found: Run `make download-model` to download the default model.
ffmpeg not found: Install it first:

```bash
# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt-get install ffmpeg

# Fedora
sudo dnf install ffmpeg
```

Port already in use: Change the port in `.env`:

```bash
WHISPER_PORT=8001
```
Slow transcription:

- Run locally instead of in Docker to get Metal GPU acceleration on macOS (~3-4x faster)
- Use a smaller model (e.g., `tiny.en` instead of `base.en`)
- Increase `WHISPER_THREADS` in `.env` (if your CPU has multiple cores)
- Ensure no other heavy processes are running
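To verify that a change helped, compare the `processing_time_ms` field that `/transcribe` returns for the same file before and after, e.g.:

```bash
# Print only the server-side processing time for a fixed test file.
curl -s -X POST -F file=@audio.mp3 http://localhost:8000/transcribe |
  jq '.processing_time_ms'
```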
High memory usage: Use a smaller model:

```bash
WHISPER_MODEL=./models/ggml-tiny.en.bin
```
| Method | Endpoint | Description |
|---|---|---|
| POST | `/transcribe` | Upload audio and get transcription |
| GET | `/health` | Check if API is running |
| GET | `/info` | Get API information and configuration |
| GET | `/models` | List available model files in the model directory |
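A quick smoke test of the three GET endpoints (a sketch; `jq` is used only for pretty-printing):

```bash
for endpoint in health info models; do
  echo "== /$endpoint =="
  curl -sf "http://localhost:8000/$endpoint" | jq .
done
```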
- Audio format: WAV files process faster than MP3 (no conversion needed; see the sketch below)
- File size: smaller audio files process faster
- Threads: more threads mean faster processing on multi-core systems (up to the CPU core count)
- Model size: smaller models are faster but less accurate
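Since non-WAV uploads go through an ffmpeg conversion, converting once up front avoids that per-request cost. A sketch, assuming the model consumes 16 kHz mono audio (the usual whisper.cpp input; the API's exact internal format is not specified here):

```bash
# Convert once to 16 kHz mono 16-bit WAV, then upload the WAV directly.
ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le input.wav
curl -X POST -F file=@input.wav http://localhost:8000/transcribe
```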
If the API fails to start:

- Check the Docker logs: `make docker-logs`
- Review the configuration in `.env`
- Ensure the model file was downloaded: `ls models/ggml-*.bin`
- Verify ffmpeg is installed: `ffmpeg -version`
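These checks can be rolled into one quick sanity pass, e.g.:

```bash
# Mirror the checklist above in one pass.
ls models/ggml-*.bin 2>/dev/null || echo "model missing: run make download-model"
command -v ffmpeg >/dev/null     || echo "ffmpeg is not installed"
[ -f .env ] && cat .env          || echo "no .env file (defaults in use)"
make docker-logs                 # inspect recent container logs for errors
```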