19 changes: 15 additions & 4 deletions docs/streaming/dev-guide/part1.md
@@ -410,6 +410,10 @@ In the following sections, you'll see each phase detailed, showing exactly when

These components are created once when your application starts and shared across all streaming sessions. They define your agent's capabilities, manage conversation history, and orchestrate the streaming execution.

!!! info "Python Version Requirement"

ADK requires **Python 3.10 or higher**. As of ADK v1.19.0, Python 3.9 is no longer supported. Ensure your development and production environments meet this requirement before installing ADK.
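For early failure on unsupported interpreters, the version floor can be checked at startup. A minimal sketch; the helper name is illustrative and not part of ADK:

```python
import sys

def python_supported(version=sys.version_info) -> bool:
    """Return True when the interpreter meets ADK's 3.10+ floor."""
    # Tuple comparison: (3, 9, 18) < (3, 10) because 9 < 10
    return tuple(version[:2]) >= (3, 10)
```

Calling this before importing ADK gives a clearer error than a failed install or import.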

#### Define Your Agent

The `Agent` is the core of your streaming application—it defines what your AI can do, how it should behave, and which AI model powers it. You configure your agent with a specific model, tools it can use (like Google Search or custom APIs), and instructions that shape its personality and behavior.
@@ -457,11 +461,18 @@ session_service = InMemorySessionService()

For production applications, choose a persistent session service based on your infrastructure:

**Use `SqliteSessionService` if:**

- You need lightweight local persistence without external dependencies
- You're building a single-server application or development environment
- You want automatic database initialization with minimal configuration
- Example: `SqliteSessionService(db_path="sessions.db")`

**Use `DatabaseSessionService` if:**

- You have existing PostgreSQL/MySQL/SQLite infrastructure
- You have existing PostgreSQL/MySQL infrastructure
- You need full control over data storage and backups
- You're running outside Google Cloud or in hybrid environments
- You're running multi-server deployments requiring shared state
- Example: `DatabaseSessionService(connection_string="postgresql://...")`

**Use `VertexAiSessionService` if:**
@@ -471,7 +482,7 @@ For production applications, choose a persistent session service based on your i
- You need tight integration with Vertex AI features
- Example: `VertexAiSessionService(project="my-project")`

Both provide the same session persistence capabilities—choose based on your infrastructure. With persistent session services, the state of the `Session` will be preserved even after application shutdown. See the [ADK Session Management documentation](https://google.github.io/adk-docs/sessions/) for more details.
All three provide session persistence capabilities—choose based on your infrastructure and scale requirements. With persistent session services, the state of the `Session` will be preserved even after application shutdown. See the [ADK Session Management documentation](https://google.github.io/adk-docs/sessions/) for more details.
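The comparison above can be wired up as a small factory. This is a hedged sketch: the constructor arguments follow the examples in the bullets, while the shared `google.adk.sessions` import path and the profile names are assumptions, not ADK's prescribed pattern.

```python
import os

# Assumed import path; check the ADK session docs for your version.
from google.adk.sessions import (
    DatabaseSessionService,
    InMemorySessionService,
    SqliteSessionService,
    VertexAiSessionService,
)

def make_session_service(profile: str):
    """Pick a session service for a deployment profile (illustrative)."""
    if profile == "local":
        # Lightweight single-server persistence, no external dependencies
        return SqliteSessionService(db_path="sessions.db")
    if profile == "multi-server":
        # Shared state across servers via existing PostgreSQL/MySQL
        return DatabaseSessionService(connection_string=os.environ["SESSION_DB_URL"])
    if profile == "vertex":
        # Managed storage and Vertex AI integration on Google Cloud
        return VertexAiSessionService(project=os.environ["GOOGLE_CLOUD_PROJECT"])
    # Development fallback: state is lost on restart
    return InMemorySessionService()
```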

#### Define Your Runner

@@ -855,7 +866,7 @@ This example shows the core pattern. For production applications, consider:
- **Authentication and authorization**: Implement authentication and authorization for your endpoints
- **Rate limiting and quotas**: Add rate limiting and timeout controls. For guidance on concurrent sessions and quota management, see [Part 4: Concurrent Live API Sessions and Quota Management](part4.md#concurrent-live-api-sessions-and-quota-management).
- **Structured logging**: Use structured logging for debugging.
- **Persistent session services**: Consider using persistent session services (`DatabaseSessionService` or `VertexAiSessionService`). See the [ADK Session Services documentation](https://google.github.io/adk-docs/sessions/) for more details.
- **Persistent session services**: Consider using persistent session services (`SqliteSessionService`, `DatabaseSessionService`, or `VertexAiSessionService`). See the [ADK Session Services documentation](https://google.github.io/adk-docs/sessions/) for more details.

## 1.6 What We Will Learn

20 changes: 6 additions & 14 deletions docs/streaming/dev-guide/part3.md
@@ -5,9 +5,9 @@ The `run_live()` method is ADK's primary entry point for streaming conversations
You'll learn how to process different event types (text, audio, transcriptions, tool calls), manage conversation flow with interruption and turn completion signals, serialize events for network transport, and leverage ADK's automatic tool execution. Understanding event handling is essential for building responsive streaming applications that feel natural and real-time to users.

!!! note "Async Context Required"

All `run_live()` code requires async context. See [Part 1: FastAPI Application Example](part1.md#fastapi-application-example) for details and production examples.

## How run_live() Works

`run_live()` is an async generator that streams conversation events in real-time. It yields events immediately as they're generated—no internal buffering, no polling, no callbacks. Overall memory depends on session persistence (e.g., in-memory vs database), making it suitable for both quick exchanges and extended sessions.
@@ -30,7 +30,7 @@ async def run_live(
```

As its signature shows, every streaming conversation needs identity (user_id), continuity (session_id), communication (live_request_queue), and configuration (run_config). The return type—an async generator of Events—promises real-time delivery without overwhelming system resources.

```mermaid
sequenceDiagram
participant Client
@@ -51,7 +51,7 @@ loop Continuous Streaming
Runner-->>Client: Event (yield)
end
```

### Basic Usage Pattern

The simplest way to consume events from `run_live()` is to iterate over the async generator with an `async for` loop:
@@ -77,7 +77,6 @@ async for event in runner.run_live(
The `run_live()` method manages the underlying Live API connection lifecycle automatically:

**Connection States:**

1. **Initialization**: Connection established when `run_live()` is called
2. **Active Streaming**: Bidirectional communication via `LiveRequestQueue` (upstream to the model) and `run_live()` (downstream from the model)
3. **Graceful Closure**: Connection closes when `LiveRequestQueue.close()` is called
@@ -134,7 +133,7 @@ Not all events yielded by `run_live()` are persisted to the ADK `Session`. When

These events are persisted to the ADK `Session` and available in session history:

- **Audio Events with File Data**: Saved to ADK `Session` only if `RunConfig.save_live_model_audio_to_session` is `True`; audio data is aggregated into files in artifacts with `file_data` references
- **Audio Events with File Data**: Saved to ADK `Session` only if `RunConfig.save_live_blob` is `True`; audio data is aggregated into files in artifacts with `file_data` references
- **Usage Metadata Events**: Always saved to track token consumption across the ADK `Session`
- **Non-Partial Transcription Events**: Final transcriptions are saved; partial transcriptions are not persisted
- **Function Call and Response Events**: Always saved to maintain tool execution history
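The persistence rules above can be sketched as a predicate over a minimal event stub. This is illustrative only and not ADK's implementation; the stub fields merely mirror the cases listed.

```python
from dataclasses import dataclass

@dataclass
class StubEvent:
    """Minimal stand-in for ADK's Event, for illustration only."""
    has_audio_file_data: bool = False
    has_usage_metadata: bool = False
    is_transcription: bool = False
    partial: bool = False
    has_function_call_or_response: bool = False

def should_persist(event: StubEvent, save_live_blob: bool) -> bool:
    """Mirror the persistence rules listed above (sketch)."""
    if event.has_audio_file_data:
        # Audio events are saved only when RunConfig.save_live_blob is True
        return save_live_blob
    if event.has_usage_metadata:
        return True  # Usage metadata is always saved
    if event.is_transcription:
        return not event.partial  # Only final transcriptions persist
    if event.has_function_call_or_response:
        return True  # Tool execution history is always saved
    return False
```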
@@ -166,27 +165,23 @@ ADK's `Event` class is a Pydantic model that represents all communication in a s
#### Key Fields

**Essential for all applications:**

- `content`: Contains text, audio, or function calls as `Content.parts`
- `author`: Identifies who created the event (`"user"` or agent name)
- `partial`: Distinguishes incremental chunks from complete text
- `turn_complete`: Signals when to enable user input again
- `interrupted`: Indicates when to stop rendering current output

**For voice/audio applications:**

- `input_transcription`: User's spoken words (when enabled in `RunConfig`)
- `output_transcription`: Model's spoken words (when enabled in `RunConfig`)
- `content.parts[].inline_data`: Audio data for playback

**For tool execution:**

- `content.parts[].function_call`: Model's tool invocation requests
- `content.parts[].function_response`: Tool execution results
- `long_running_tool_ids`: Track async tool execution

**For debugging and diagnostics:**

- `usage_metadata`: Token counts and billing information
- `cache_metadata`: Context cache hit/miss statistics
- `finish_reason`: Why the model stopped generating (e.g., STOP, MAX_TOKENS, SAFETY)
@@ -374,7 +369,7 @@ Both input and output audio data are aggregated into audio files and saved in th

!!! note "Session Persistence"

To save audio events with file data to session history, enable `RunConfig.save_live_model_audio_to_session = True`. This allows audio conversations to be reviewed or replayed from persisted sessions.
To save audio events with file data to session history, enable `RunConfig.save_live_blob = True`. This allows audio conversations to be reviewed or replayed from persisted sessions.

### Metadata Events

@@ -708,7 +703,6 @@ Event 4: partial=False, text="", turn_complete=True # Turn done
```

**Important timing relationships**:

- `partial=False` can occur **multiple times** in a turn (e.g., after each sentence)
- `turn_complete=True` occurs **once** at the very end of the model's complete response, in a **separate event**
- You may receive: `partial=False` (sentence 1) → `partial=False` (sentence 2) → `turn_complete=True`
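A minimal consumer of these timing signals might look like the following sketch, where each event is a plain `(text, partial, turn_complete)` tuple standing in for the real `Event` fields:

```python
def collect_turn(events):
    """Collect the completed text pieces of one turn (illustrative stub)."""
    completed = []
    for text, partial, turn_complete in events:
        if turn_complete:
            # Arrives once, as a separate final event for the turn
            break
        if not partial:
            # Non-partial events carry complete text and can occur
            # several times in one turn (for example, once per sentence)
            completed.append(text)
        # partial=True chunks would be rendered incrementally in a UI
    return completed
```

A stream of two sentences followed by a separate `turn_complete` event yields both completed sentences.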
@@ -823,7 +817,6 @@ async for event in runner.run_live(...):
- **Streaming optimization**: Stop buffering when turn is complete

**Turn completion and caching:** Audio/transcript caches are flushed automatically at specific points during streaming:

- **On turn completion** (`turn_complete=True`): Both user and model audio caches are flushed
- **On interruption** (`interrupted=True`): Model audio cache is flushed
- **On generation completion**: Model audio cache is flushed
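These flush rules can be expressed as a small lookup, purely for illustration; the cache names are not ADK identifiers:

```python
def caches_to_flush(turn_complete: bool, interrupted: bool,
                    generation_complete: bool) -> set:
    """Return which audio caches flush at this point (sketch of the rules above)."""
    flush = set()
    if turn_complete:
        # Both user and model audio caches are flushed
        flush |= {"user", "model"}
    if interrupted or generation_complete:
        # Only the model audio cache is flushed
        flush.add("model")
    return flush
```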
@@ -1151,7 +1144,6 @@ Think of it as a traveling notebook that accompanies a conversation from start t
### What is an Invocation?

An **invocation** represents a complete interaction cycle:

- Starts with user input (text, audio, or control signal)
- May involve one or multiple agent calls
- Ends when a final response is generated or when explicitly terminated
48 changes: 33 additions & 15 deletions docs/streaming/dev-guide/part4.md
@@ -221,6 +221,33 @@ sequenceDiagram
Note over ADK,Gemini: Turn Detection: finish_reason
```

!!! info "Progressive SSE Streaming (New in v1.19.0)"

ADK v1.19.0 introduced **progressive SSE streaming**, an experimental feature that enhances how SSE mode delivers streaming responses. When enabled, this feature improves how responses are aggregated.

**Key improvements:**

- **Content ordering preservation**: Maintains the original order of mixed content types (text, function calls, inline data)
- **Intelligent text merging**: Only merges consecutive text parts of the same type (regular text vs thought text)
- **Progressive delivery**: Marks all intermediate chunks as `partial=True`, with a single final aggregated response at the end
- **Deferred function execution**: Skips executing function calls in partial events, only executing them in the final aggregated event to avoid duplicate executions

**Enabling the feature:**

This is an experimental, work-in-progress feature that is disabled by default. Enable it via an environment variable:

```bash
export ADK_ENABLE_PROGRESSIVE_SSE_STREAMING=1
```

**When to use:**

- You're using `StreamingMode.SSE` and need better handling of mixed content types (text + function calls)
- Your responses include thought text (extended thinking) mixed with regular text
- You want to ensure function calls execute only once after complete response aggregation

**Note:** This feature only affects `StreamingMode.SSE`. It does not apply to `StreamingMode.BIDI` (the focus of this guide), which uses the Live API's native bidirectional protocol.
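If you want a launch script to report whether the flag is set, a sketch like this works; the accepted truthy values here are an assumption, since ADK reads the variable internally:

```python
import os

def progressive_sse_enabled() -> bool:
    """Report whether the experimental flag is set (illustrative).

    The truthy values accepted here ("1", "true") are an assumption
    for this sketch, not documented ADK behavior.
    """
    raw = os.getenv("ADK_ENABLE_PROGRESSIVE_SSE_STREAMING", "")
    return raw.strip().lower() in {"1", "true"}
```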

### When to Use Each Mode

Your choice between BIDI and SSE depends on your application requirements and the interaction patterns you need to support. Here's a practical guide to help you choose:
@@ -278,7 +305,7 @@ When building ADK Bidi-streaming applications, it's essential to understand how
Understanding the distinction between **ADK `Session`** and **Live API session** is crucial for building reliable streaming applications with ADK Bidi-streaming.

**ADK `Session`** (managed by SessionService):
- Persistent conversation storage for conversation history, events, and state, created via `SessionService.create_session()`
- Persistent conversation storage for conversation history, events, and state, created via `SessionService.create_session()`
- Storage options: in-memory, database (PostgreSQL/MySQL/SQLite), or Vertex AI
- Survives across multiple `run_live()` calls and application restarts (with the persistent `SessionService`)

@@ -348,7 +375,6 @@ sequenceDiagram
```

**Key insights:**

- ADK Session survives across multiple `run_live()` calls and app restarts
- Live API session is ephemeral - created and destroyed per streaming session
- Conversation continuity is maintained through ADK Session's persistent storage
@@ -539,16 +565,6 @@ run_config = RunConfig(
)
)
)

# For gemini-live-2.5-flash (32k context window on Vertex AI)
run_config = RunConfig(
context_window_compression=types.ContextWindowCompressionConfig(
trigger_tokens=25000, # Start compression at ~78% of 32k context
sliding_window=types.SlidingWindow(
target_tokens=20000 # Compress to ~62% of context
)
)
)
```

**How it works:**
@@ -608,14 +624,12 @@ While compression enables unlimited session duration, consider these trade-offs:
**Common Use Cases:**

✅ **Enable compression when:**

- Sessions need to exceed platform duration limits (15/2/10 minutes)
- Extended conversations may hit token limits (128k for 2.5-flash)
- Customer support sessions that can last hours
- Educational tutoring with long interactions

❌ **Disable compression when:**

- All sessions complete within duration limits
- Precision recall of early conversation is critical
- Development/testing phase (full history aids debugging)
@@ -822,6 +836,10 @@ This parameter caps the total number of LLM invocations allowed per invocation c

This parameter controls whether audio and video streams are persisted to ADK's session and artifact services for debugging, compliance, and quality assurance purposes.

!!! warning "Migration Note: save_live_audio Deprecated"

**If you're using `save_live_audio`:** This parameter has been deprecated in favor of `save_live_blob`. ADK will automatically migrate `save_live_audio=True` to `save_live_blob=True` with a deprecation warning, but this compatibility layer will be removed in a future release. Update your code to use `save_live_blob` instead.
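The compatibility behavior can be pictured as a normalization step. This is a hypothetical sketch of the migration logic, not ADK's actual code:

```python
import warnings

def normalize_save_flags(kwargs: dict) -> dict:
    """Map the deprecated save_live_audio flag to save_live_blob (sketch)."""
    kwargs = dict(kwargs)  # don't mutate the caller's dict
    if "save_live_audio" in kwargs:
        warnings.warn(
            "save_live_audio is deprecated; use save_live_blob",
            DeprecationWarning,
            stacklevel=2,
        )
        old_value = kwargs.pop("save_live_audio")
        # An explicitly provided save_live_blob wins over the old flag
        kwargs.setdefault("save_live_blob", old_value)
    return kwargs
```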

Currently, **only audio is persisted** by ADK's implementation. When enabled, ADK persists audio streams to:

- **[Session service](https://google.github.io/adk-docs/sessions/)**: Conversation history includes audio references
@@ -957,7 +975,7 @@ run_config = RunConfig(

ADK validates CFC compatibility at session initialization and will raise an error if the model is unsupported:

- ✅ **Supported**: `gemini-2.x` models (e.g., `gemini-2.5-flash-native-audio-preview-09-2025`, `gemini-2.0-flash-live-001`)
- ✅ **Supported**: `gemini-2.x` models (e.g., `gemini-2.5-flash-native-audio-preview-09-2025`)
- ❌ **Not supported**: `gemini-1.5-x` models
- **Validation**: ADK checks that the model name starts with `gemini-2` when `support_cfc=True` ([`runners.py:1200-1203`](https://github.com/google/adk-python/blob/main/src/google/adk/runners.py#L1200-L1203))
- **Code executor**: ADK automatically injects `BuiltInCodeExecutor` when CFC is enabled for safe parallel tool execution
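The validation amounts to a prefix check on the model name; an equivalent sketch (ADK's actual error type and message may differ):

```python
def validate_cfc_model(model: str, support_cfc: bool) -> None:
    """Reject unsupported models when CFC is requested (sketch).

    Mirrors the gemini-2 prefix check described above.
    """
    if support_cfc and not model.startswith("gemini-2"):
        raise ValueError(
            f"support_cfc=True requires a gemini-2.x model, got {model!r}"
        )
```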
31 changes: 30 additions & 1 deletion docs/streaming/dev-guide/part5.md
@@ -668,6 +668,35 @@ DEMO_AGENT_MODEL=gemini-2.5-flash-native-audio-preview-09-2025
# DEMO_AGENT_MODEL=gemini-live-2.5-flash-preview-native-audio-09-2025
```

!!! note "Environment Variable Loading Order"

When using `.env` files with `python-dotenv`, you must call `load_dotenv()` **before** importing any modules that read environment variables. Otherwise, `os.getenv()` will return `None` and fall back to the default value, ignoring your `.env` configuration.

**Correct order in `main.py`:**

```python
from dotenv import load_dotenv
from pathlib import Path

# Load .env file BEFORE importing agent
load_dotenv(Path(__file__).parent / ".env")

# Now safe to import modules that use environment variables
from google_search_agent.agent import agent
```

**Incorrect order (will not work):**

```python
from dotenv import load_dotenv
from google_search_agent.agent import agent # Agent reads env var here

# Too late! Agent already initialized with default model
load_dotenv(Path(__file__).parent / ".env")
```

This is standard Python import behavior: when you import a module, its top-level code executes immediately. If your agent module calls `os.getenv("DEMO_AGENT_MODEL")` at import time, the `.env` file must already be loaded.

**Selecting the right model:**

1. **Choose platform**: Decide between Gemini Live API (public) or Vertex AI Live API (enterprise)
@@ -953,7 +982,7 @@ The automatic enablement happens in `Runner.run_live()` when both conditions are

!!! note "Source"

[`runners.py:1236-1253`](https://github.com/google/adk-python/blob/main/src/google/adk/runners.py#L1236-L1253)
[`runners.py:1245-1260`](https://github.com/google/adk-python/blob/main/src/google/adk/runners.py#L1245-L1260)

## Voice Configuration (Speech Config)
