19 changes: 15 additions & 4 deletions docs/streaming/dev-guide/part1.md
@@ -410,6 +410,10 @@ In the following sections, you'll see each phase detailed, showing exactly when

These components are created once when your application starts and shared across all streaming sessions. They define your agent's capabilities, manage conversation history, and orchestrate the streaming execution.

!!! info "Python Version Requirement"

ADK requires **Python 3.10 or higher**. As of ADK v1.19.0, Python 3.9 is no longer supported. Ensure your development and production environments meet this requirement before installing ADK.
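For early failure on unsupported interpreters, the version floor can be checked at startup. A minimal sketch; the helper name is illustrative and not part of ADK:

```python
import sys

def python_supported(version=sys.version_info) -> bool:
    """Return True when the interpreter meets ADK's 3.10+ floor."""
    # Tuple comparison: (3, 9, 18) < (3, 10) because 9 < 10
    return tuple(version[:2]) >= (3, 10)
```

Calling this before importing ADK gives a clearer error than a failed install or import.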

#### Define Your Agent

The `Agent` is the core of your streaming application—it defines what your AI can do, how it should behave, and which AI model powers it. You configure your agent with a specific model, tools it can use (like Google Search or custom APIs), and instructions that shape its personality and behavior.
@@ -457,11 +461,18 @@ session_service = InMemorySessionService()

For production applications, choose a persistent session service based on your infrastructure:

**Use `SqliteSessionService` if:**

- You need lightweight local persistence without external dependencies
- You're building a single-server application or development environment
- You want automatic database initialization with minimal configuration
- Example: `SqliteSessionService(db_path="sessions.db")`

**Use `DatabaseSessionService` if:**

- You have existing PostgreSQL/MySQL/SQLite infrastructure
- You have existing PostgreSQL/MySQL infrastructure
- You need full control over data storage and backups
- You're running outside Google Cloud or in hybrid environments
- You're running multi-server deployments requiring shared state
- Example: `DatabaseSessionService(connection_string="postgresql://...")`

**Use `VertexAiSessionService` if:**
@@ -471,7 +482,7 @@ For production applications, choose a persistent session service based on your i
- You need tight integration with Vertex AI features
- Example: `VertexAiSessionService(project="my-project")`

Both provide the same session persistence capabilities—choose based on your infrastructure. With persistent session services, the state of the `Session` will be preserved even after application shutdown. See the [ADK Session Management documentation](https://google.github.io/adk-docs/sessions/) for more details.
All three provide session persistence capabilities—choose based on your infrastructure and scale requirements. With persistent session services, the state of the `Session` will be preserved even after application shutdown. See the [ADK Session Management documentation](https://google.github.io/adk-docs/sessions/) for more details.
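The comparison above can be wired up as a small factory. This is a hedged sketch: the constructor arguments follow the examples in the bullets, while the shared `google.adk.sessions` import path and the profile names are assumptions, not ADK's prescribed pattern.

```python
import os

# Assumed import path; check the ADK session docs for your version.
from google.adk.sessions import (
    DatabaseSessionService,
    InMemorySessionService,
    SqliteSessionService,
    VertexAiSessionService,
)

def make_session_service(profile: str):
    """Pick a session service for a deployment profile (illustrative)."""
    if profile == "local":
        # Lightweight single-server persistence, no external dependencies
        return SqliteSessionService(db_path="sessions.db")
    if profile == "multi-server":
        # Shared state across servers via existing PostgreSQL/MySQL
        return DatabaseSessionService(connection_string=os.environ["SESSION_DB_URL"])
    if profile == "vertex":
        # Managed storage and Vertex AI integration on Google Cloud
        return VertexAiSessionService(project=os.environ["GOOGLE_CLOUD_PROJECT"])
    # Development fallback: state is lost on restart
    return InMemorySessionService()
```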

#### Define Your Runner

@@ -855,7 +866,7 @@ This example shows the core pattern. For production applications, consider:
- **Authentication and authorization**: Implement authentication and authorization for your endpoints
- **Rate limiting and quotas**: Add rate limiting and timeout controls. For guidance on concurrent sessions and quota management, see [Part 4: Concurrent Live API Sessions and Quota Management](part4.md#concurrent-live-api-sessions-and-quota-management).
- **Structured logging**: Use structured logging for debugging.
- **Persistent session services**: Consider using persistent session services (`DatabaseSessionService` or `VertexAiSessionService`). See the [ADK Session Services documentation](https://google.github.io/adk-docs/sessions/) for more details.
- **Persistent session services**: Consider using persistent session services (`SqliteSessionService`, `DatabaseSessionService`, or `VertexAiSessionService`). See the [ADK Session Services documentation](https://google.github.io/adk-docs/sessions/) for more details.

## 1.6 What We Will Learn

20 changes: 6 additions & 14 deletions docs/streaming/dev-guide/part3.md
@@ -5,9 +5,9 @@ The `run_live()` method is ADK's primary entry point for streaming conversations
You'll learn how to process different event types (text, audio, transcriptions, tool calls), manage conversation flow with interruption and turn completion signals, serialize events for network transport, and leverage ADK's automatic tool execution. Understanding event handling is essential for building responsive streaming applications that feel natural and real-time to users.

!!! note "Async Context Required"

All `run_live()` code requires async context. See [Part 1: FastAPI Application Example](part1.md#fastapi-application-example) for details and production examples.

## How run_live() Works

`run_live()` is an async generator that streams conversation events in real-time. It yields events immediately as they're generated—no internal buffering, no polling, no callbacks. Overall memory depends on session persistence (e.g., in-memory vs database), making it suitable for both quick exchanges and extended sessions.
@@ -30,7 +30,7 @@ async def run_live(
```

As its signature shows, every streaming conversation needs identity (user_id), continuity (session_id), communication (live_request_queue), and configuration (run_config). The return type—an async generator of Events—promises real-time delivery without overwhelming system resources.

```mermaid
sequenceDiagram
participant Client
@@ -51,7 +51,7 @@ loop Continuous Streaming
Runner-->>Client: Event (yield)
end
```

### Basic Usage Pattern

The simplest way to consume events from `run_live()` is to iterate over the async generator with an `async for` loop:
@@ -77,7 +77,6 @@ async for event in runner.run_live(
The `run_live()` method manages the underlying Live API connection lifecycle automatically:

**Connection States:**

1. **Initialization**: Connection established when `run_live()` is called
2. **Active Streaming**: Bidirectional communication via `LiveRequestQueue` (upstream to the model) and `run_live()` (downstream from the model)
3. **Graceful Closure**: Connection closes when `LiveRequestQueue.close()` is called
@@ -134,7 +133,7 @@ Not all events yielded by `run_live()` are persisted to the ADK `Session`. When

These events are persisted to the ADK `Session` and available in session history:

- **Audio Events with File Data**: Saved to ADK `Session` only if `RunConfig.save_live_model_audio_to_session` is `True`; audio data is aggregated into files in artifacts with `file_data` references
- **Audio Events with File Data**: Saved to ADK `Session` only if `RunConfig.save_live_blob` is `True`; audio data is aggregated into files in artifacts with `file_data` references
- **Usage Metadata Events**: Always saved to track token consumption across the ADK `Session`
- **Non-Partial Transcription Events**: Final transcriptions are saved; partial transcriptions are not persisted
- **Function Call and Response Events**: Always saved to maintain tool execution history
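The persistence rules above can be sketched as a predicate over a minimal event stub. This is illustrative only and not ADK's implementation; the stub fields merely mirror the cases listed.

```python
from dataclasses import dataclass

@dataclass
class StubEvent:
    """Minimal stand-in for ADK's Event, for illustration only."""
    has_audio_file_data: bool = False
    has_usage_metadata: bool = False
    is_transcription: bool = False
    partial: bool = False
    has_function_call_or_response: bool = False

def should_persist(event: StubEvent, save_live_blob: bool) -> bool:
    """Mirror the persistence rules listed above (sketch)."""
    if event.has_audio_file_data:
        # Audio events are saved only when RunConfig.save_live_blob is True
        return save_live_blob
    if event.has_usage_metadata:
        return True  # Usage metadata is always saved
    if event.is_transcription:
        return not event.partial  # Only final transcriptions persist
    if event.has_function_call_or_response:
        return True  # Tool execution history is always saved
    return False
```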
@@ -166,27 +165,23 @@ ADK's `Event` class is a Pydantic model that represents all communication in a s
#### Key Fields

**Essential for all applications:**

- `content`: Contains text, audio, or function calls as `Content.parts`
- `author`: Identifies who created the event (`"user"` or agent name)
- `partial`: Distinguishes incremental chunks from complete text
- `turn_complete`: Signals when to enable user input again
- `interrupted`: Indicates when to stop rendering current output

**For voice/audio applications:**

- `input_transcription`: User's spoken words (when enabled in `RunConfig`)
- `output_transcription`: Model's spoken words (when enabled in `RunConfig`)
- `content.parts[].inline_data`: Audio data for playback

**For tool execution:**

- `content.parts[].function_call`: Model's tool invocation requests
- `content.parts[].function_response`: Tool execution results
- `long_running_tool_ids`: Track async tool execution

**For debugging and diagnostics:**

- `usage_metadata`: Token counts and billing information
- `cache_metadata`: Context cache hit/miss statistics
- `finish_reason`: Why the model stopped generating (e.g., STOP, MAX_TOKENS, SAFETY)
@@ -374,7 +369,7 @@ Both input and output audio data are aggregated into audio files and saved in th

!!! note "Session Persistence"

To save audio events with file data to session history, enable `RunConfig.save_live_model_audio_to_session = True`. This allows audio conversations to be reviewed or replayed from persisted sessions.
To save audio events with file data to session history, enable `RunConfig.save_live_blob = True`. This allows audio conversations to be reviewed or replayed from persisted sessions.

### Metadata Events

@@ -708,7 +703,6 @@ Event 4: partial=False, text="", turn_complete=True # Turn done
```

**Important timing relationships**:

- `partial=False` can occur **multiple times** in a turn (e.g., after each sentence)
- `turn_complete=True` occurs **once** at the very end of the model's complete response, in a **separate event**
- You may receive: `partial=False` (sentence 1) → `partial=False` (sentence 2) → `turn_complete=True`
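A minimal consumer of these timing signals might look like the following sketch, where each event is a plain `(text, partial, turn_complete)` tuple standing in for the real `Event` fields:

```python
def collect_turn(events):
    """Collect the completed text pieces of one turn (illustrative stub)."""
    completed = []
    for text, partial, turn_complete in events:
        if turn_complete:
            # Arrives once, as a separate final event for the turn
            break
        if not partial:
            # Non-partial events carry complete text and can occur
            # several times in one turn (for example, once per sentence)
            completed.append(text)
        # partial=True chunks would be rendered incrementally in a UI
    return completed
```

A stream of two sentences followed by a separate `turn_complete` event yields both completed sentences.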
@@ -823,7 +817,6 @@ async for event in runner.run_live(...):
- **Streaming optimization**: Stop buffering when turn is complete

**Turn completion and caching:** Audio/transcript caches are flushed automatically at specific points during streaming:

- **On turn completion** (`turn_complete=True`): Both user and model audio caches are flushed
- **On interruption** (`interrupted=True`): Model audio cache is flushed
- **On generation completion**: Model audio cache is flushed
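These flush rules can be expressed as a small lookup, purely for illustration; the cache names are not ADK identifiers:

```python
def caches_to_flush(turn_complete: bool, interrupted: bool,
                    generation_complete: bool) -> set:
    """Return which audio caches flush at this point (sketch of the rules above)."""
    flush = set()
    if turn_complete:
        # Both user and model audio caches are flushed
        flush |= {"user", "model"}
    if interrupted or generation_complete:
        # Only the model audio cache is flushed
        flush.add("model")
    return flush
```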
@@ -1151,7 +1144,6 @@ Think of it as a traveling notebook that accompanies a conversation from start t
### What is an Invocation?

An **invocation** represents a complete interaction cycle:

- Starts with user input (text, audio, or control signal)
- May involve one or multiple agent calls
- Ends when a final response is generated or when explicitly terminated
48 changes: 33 additions & 15 deletions docs/streaming/dev-guide/part4.md
@@ -221,6 +221,33 @@ sequenceDiagram
Note over ADK,Gemini: Turn Detection: finish_reason
```

!!! info "Progressive SSE Streaming (New in v1.19.0)"

ADK v1.19.0 introduced **progressive SSE streaming**, an experimental feature that enhances how SSE mode delivers streaming responses. When enabled, this feature improves how responses are aggregated.

**Key improvements:**

- **Content ordering preservation**: Maintains the original order of mixed content types (text, function calls, inline data)
- **Intelligent text merging**: Only merges consecutive text parts of the same type (regular text vs thought text)
- **Progressive delivery**: Marks all intermediate chunks as `partial=True`, with a single final aggregated response at the end
- **Deferred function execution**: Skips executing function calls in partial events, only executing them in the final aggregated event to avoid duplicate executions

**Enabling the feature:**

This is an experimental, work-in-progress feature that is disabled by default. Enable it via an environment variable:

```bash
export ADK_ENABLE_PROGRESSIVE_SSE_STREAMING=1
```

**When to use:**

- You're using `StreamingMode.SSE` and need better handling of mixed content types (text + function calls)
- Your responses include thought text (extended thinking) mixed with regular text
- You want to ensure function calls execute only once after complete response aggregation

**Note:** This feature only affects `StreamingMode.SSE`. It does not apply to `StreamingMode.BIDI` (the focus of this guide), which uses the Live API's native bidirectional protocol.
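If you want a launch script to report whether the flag is set, a sketch like this works; the accepted truthy values here are an assumption, since ADK reads the variable internally:

```python
import os

def progressive_sse_enabled() -> bool:
    """Report whether the experimental flag is set (illustrative).

    The truthy values accepted here ("1", "true") are an assumption
    for this sketch, not documented ADK behavior.
    """
    raw = os.getenv("ADK_ENABLE_PROGRESSIVE_SSE_STREAMING", "")
    return raw.strip().lower() in {"1", "true"}
```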

### When to Use Each Mode

Your choice between BIDI and SSE depends on your application requirements and the interaction patterns you need to support. Here's a practical guide to help you choose:
@@ -278,7 +305,7 @@ When building ADK Bidi-streaming applications, it's essential to understand how
Understanding the distinction between **ADK `Session`** and **Live API session** is crucial for building reliable streaming applications with ADK Bidi-streaming.

**ADK `Session`** (managed by SessionService):
- Persistent conversation storage for conversation history, events, and state, created via `SessionService.create_session()`
- Persistent conversation storage for conversation history, events, and state, created via `SessionService.create_session()`
- Storage options: in-memory, database (PostgreSQL/MySQL/SQLite), or Vertex AI
- Survives across multiple `run_live()` calls and application restarts (with the persistent `SessionService`)

@@ -348,7 +375,6 @@ sequenceDiagram
```

**Key insights:**

- ADK Session survives across multiple `run_live()` calls and app restarts
- Live API session is ephemeral - created and destroyed per streaming session
- Conversation continuity is maintained through ADK Session's persistent storage
@@ -539,16 +565,6 @@ run_config = RunConfig(
)
)
)

# For gemini-live-2.5-flash (32k context window on Vertex AI)
run_config = RunConfig(
context_window_compression=types.ContextWindowCompressionConfig(
trigger_tokens=25000, # Start compression at ~78% of 32k context
sliding_window=types.SlidingWindow(
target_tokens=20000 # Compress to ~62% of context
)
)
)
```

**How it works:**
@@ -608,14 +624,12 @@ While compression enables unlimited session duration, consider these trade-offs:
**Common Use Cases:**

✅ **Enable compression when:**

- Sessions need to exceed platform duration limits (15/2/10 minutes)
- Extended conversations may hit token limits (128k for 2.5-flash)
- Customer support sessions that can last hours
- Educational tutoring with long interactions

❌ **Disable compression when:**

- All sessions complete within duration limits
- Precision recall of early conversation is critical
- Development/testing phase (full history aids debugging)
@@ -822,6 +836,10 @@ This parameter caps the total number of LLM invocations allowed per invocation c

This parameter controls whether audio and video streams are persisted to ADK's session and artifact services for debugging, compliance, and quality assurance purposes.

!!! warning "Migration Note: save_live_audio Deprecated"

**If you're using `save_live_audio`:** This parameter has been deprecated in favor of `save_live_blob`. ADK will automatically migrate `save_live_audio=True` to `save_live_blob=True` with a deprecation warning, but this compatibility layer will be removed in a future release. Update your code to use `save_live_blob` instead.
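The compatibility behavior can be pictured as a normalization step. This is a hypothetical sketch of the migration logic, not ADK's actual code:

```python
import warnings

def normalize_save_flags(kwargs: dict) -> dict:
    """Map the deprecated save_live_audio flag to save_live_blob (sketch)."""
    kwargs = dict(kwargs)  # don't mutate the caller's dict
    if "save_live_audio" in kwargs:
        warnings.warn(
            "save_live_audio is deprecated; use save_live_blob",
            DeprecationWarning,
            stacklevel=2,
        )
        old_value = kwargs.pop("save_live_audio")
        # An explicitly provided save_live_blob wins over the old flag
        kwargs.setdefault("save_live_blob", old_value)
    return kwargs
```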

Currently, **only audio is persisted** by ADK's implementation. When enabled, ADK persists audio streams to:

- **[Session service](https://google.github.io/adk-docs/sessions/)**: Conversation history includes audio references
@@ -957,7 +975,7 @@ run_config = RunConfig(

ADK validates CFC compatibility at session initialization and will raise an error if the model is unsupported:

- ✅ **Supported**: `gemini-2.x` models (e.g., `gemini-2.5-flash-native-audio-preview-09-2025`, `gemini-2.0-flash-live-001`)
- ✅ **Supported**: `gemini-2.x` models (e.g., `gemini-2.5-flash-native-audio-preview-09-2025`)
- ❌ **Not supported**: `gemini-1.5-x` models
- **Validation**: ADK checks that the model name starts with `gemini-2` when `support_cfc=True` ([`runners.py:1200-1203`](https://github.com/google/adk-python/blob/main/src/google/adk/runners.py#L1200-L1203))
- **Code executor**: ADK automatically injects `BuiltInCodeExecutor` when CFC is enabled for safe parallel tool execution
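The validation amounts to a prefix check on the model name; an equivalent sketch (ADK's actual error type and message may differ):

```python
def validate_cfc_model(model: str, support_cfc: bool) -> None:
    """Reject unsupported models when CFC is requested (sketch).

    Mirrors the gemini-2 prefix check described above.
    """
    if support_cfc and not model.startswith("gemini-2"):
        raise ValueError(
            f"support_cfc=True requires a gemini-2.x model, got {model!r}"
        )
```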
31 changes: 30 additions & 1 deletion docs/streaming/dev-guide/part5.md
@@ -668,6 +668,35 @@ DEMO_AGENT_MODEL=gemini-2.5-flash-native-audio-preview-09-2025
# DEMO_AGENT_MODEL=gemini-live-2.5-flash-preview-native-audio-09-2025
```

!!! note "Environment Variable Loading Order"

When using `.env` files with `python-dotenv`, you must call `load_dotenv()` **before** importing any modules that read environment variables. Otherwise, `os.getenv()` will return `None` and fall back to the default value, ignoring your `.env` configuration.

**Correct order in `main.py`:**

```python
from dotenv import load_dotenv
from pathlib import Path

# Load .env file BEFORE importing agent
load_dotenv(Path(__file__).parent / ".env")

# Now safe to import modules that use environment variables
from google_search_agent.agent import agent
```

**Incorrect order (will not work):**

```python
from dotenv import load_dotenv
from google_search_agent.agent import agent # Agent reads env var here

# Too late! Agent already initialized with default model
load_dotenv(Path(__file__).parent / ".env")
```

This is standard Python import behavior: when you import a module, its top-level code executes immediately. If your agent module calls `os.getenv("DEMO_AGENT_MODEL")` at import time, the `.env` file must already be loaded.

**Selecting the right model:**

1. **Choose platform**: Decide between Gemini Live API (public) or Vertex AI Live API (enterprise)
@@ -953,7 +982,7 @@ The automatic enablement happens in `Runner.run_live()` when both conditions are

!!! note "Source"

[`runners.py:1236-1253`](https://github.com/google/adk-python/blob/main/src/google/adk/runners.py#L1236-L1253)
[`runners.py:1245-1260`](https://github.com/google/adk-python/blob/main/src/google/adk/runners.py#L1245-L1260)

## Voice Configuration (Speech Config)
