eraykeskinmac/strands-deepgram

strands-deepgram

Deepgram speech processing tool for the Strands Agents SDK. It gives AI agents speech-to-text, text-to-speech, and audio intelligence capabilities.

Features

  • Speech-to-Text: Transcribe audio with 30+ language support
  • Text-to-Speech: Generate natural-sounding speech in multiple voices
  • Audio Intelligence: Sentiment analysis, topic detection, intent recognition
  • Speaker Diarization: Identify and separate different speakers
  • Multi-format Support: WAV, MP3, M4A, FLAC, and more
  • Type Safe: Full type hints and validation
  • Easy Integration: Drop-in tool for Strands agents

Requirements

  • Python 3.9+
  • Strands Agents SDK 1.11.0+
  • Deepgram SDK 3.0+

Installation

pip install strands-deepgram

Quick Start

from strands import Agent
from strands_deepgram import deepgram

# Create an agent with Deepgram tool
agent = Agent(tools=[deepgram])

# Transcribe audio with speaker identification
agent("transcribe audio from recording.mp3 in Turkish with speaker diarization")

# Text-to-speech
agent("convert this text to speech and save as output.mp3: Hello world")

# Audio intelligence
agent("analyze sentiment and topics in recording.wav")

Configuration

Configure the tool with environment variables. Only the API key is required:

DEEPGRAM_API_KEY=your_deepgram_api_key  # Required
DEEPGRAM_DEFAULT_MODEL=nova-3            # Optional
DEEPGRAM_DEFAULT_LANGUAGE=en             # Optional

Get your API key at: console.deepgram.com
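
A missing key is easier to diagnose if it is caught before the agent is created. The following is a minimal sketch of such a check, not part of strands-deepgram: the helper name `load_deepgram_settings` is hypothetical, and the fallback values `nova-3` and `en` only mirror the example values above. Whether the tool uses those as its defaults is an assumption.

```python
import os
from typing import Mapping, Optional


def load_deepgram_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect the Deepgram settings from environment variables.

    Hypothetical helper for illustration; not part of strands-deepgram.
    """
    env = os.environ if env is None else env
    api_key = env.get("DEEPGRAM_API_KEY", "").strip()
    if not api_key:
        # Fail early with a clear message instead of at the first tool call.
        raise RuntimeError("DEEPGRAM_API_KEY is not set")
    return {
        "api_key": api_key,
        # Assumed fallbacks, copied from the example values above.
        "model": env.get("DEEPGRAM_DEFAULT_MODEL", "nova-3"),
        "language": env.get("DEEPGRAM_DEFAULT_LANGUAGE", "en"),
    }
```

Calling `load_deepgram_settings()` with no argument reads from the process environment. Passing a plain dictionary makes the check easy to exercise in tests.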

Supported Actions

Speech-to-Text (transcribe)

agent("transcribe this audio file: path/to/audio.mp3")

Features:

  • Multi-language transcription (30+ languages)
  • Speaker diarization (identify different speakers)
  • Smart formatting and punctuation
  • Word-level timestamps
  • Sentiment analysis (optional)
  • Topic and intent detection (optional)
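
Because the tool is driven by plain-language requests, the request string can be assembled programmatically. The sketch below is illustrative only: `build_transcribe_prompt` and `KNOWN_AUDIO_EXTENSIONS` are hypothetical names, and the extension set covers only the formats named in the Features list, which ends with "and more", so it is not exhaustive.

```python
from pathlib import Path

# Extensions taken from the formats named in the Features list. That list
# ends with "and more", so this set is an assumption, not a complete list.
KNOWN_AUDIO_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac"}


def build_transcribe_prompt(audio_path: str, language: str = "", diarize: bool = False) -> str:
    """Build a natural-language transcription request (hypothetical helper)."""
    suffix = Path(audio_path).suffix.lower()
    if suffix not in KNOWN_AUDIO_EXTENSIONS:
        raise ValueError(f"unrecognized audio extension: {suffix!r}")
    prompt = f"transcribe this audio file: {audio_path}"
    if language:
        prompt += f" in {language}"
    if diarize:
        prompt += " with speaker diarization"
    return prompt
```

The result can be passed straight to the agent, for example `agent(build_transcribe_prompt("recording.mp3", language="Turkish", diarize=True))`.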

Text-to-Speech (text_to_speech)

agent("convert this text to speech: Hello, how are you today?")

Features:

  • Natural-sounding voices (Aura series)
  • Multiple audio formats (MP3, WAV, FLAC)
  • Customizable speech parameters
  • Voice selection

Audio Intelligence (analyze)

agent("analyze sentiment and topics in audio: call.mp3")

Features:

  • Sentiment analysis
  • Topic detection
  • Intent recognition
  • Language detection
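
Since every action is triggered the same way, several requests can be run in sequence. This is a minimal sketch that assumes only that the agent is callable with a prompt string, as in the examples above. `run_requests` is a hypothetical helper, and catching `Exception` broadly is an illustrative choice because this README does not document which errors the tool raises.

```python
from typing import Any, Callable, Dict, Iterable


def run_requests(agent: Callable[[str], Any], prompts: Iterable[str]) -> Dict[str, Any]:
    """Send each prompt to the agent and collect results keyed by prompt.

    Hypothetical helper; a failed request stores the exception instead of
    stopping the remaining requests.
    """
    results: Dict[str, Any] = {}
    for prompt in prompts:
        try:
            results[prompt] = agent(prompt)
        except Exception as exc:  # broad on purpose; error types are not documented here
            results[prompt] = exc
    return results
```

With an agent created as in Quick Start, `run_requests(agent, ["transcribe this audio file: call.mp3", "analyze sentiment and topics in audio: call.mp3"])` returns one entry per request.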

Testing

# Install development dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/ -v

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Built for the Strands community 🚀
