6 changes: 6 additions & 0 deletions examples/realtime-agents/.env.example
@@ -0,0 +1,6 @@
CF_ACCOUNT_ID=
CF_API_TOKEN=
DEEPGRAM_API_KEY=
ELEVENLABS_API_KEY=
RTK_MEETING_ID=
RTK_AUTH_TOKEN=
198 changes: 198 additions & 0 deletions examples/realtime-agents/README.md
@@ -0,0 +1,198 @@
# Realtime Voice Assistant Agent

This example demonstrates how to build a complete voice assistant using Cloudflare's Agents SDK with realtime capabilities. The assistant can:

- Listen to audio input via RealtimeKit
- Convert speech to text using Deepgram STT
- Process conversations with intelligent responses
- Convert responses back to speech using ElevenLabs TTS
- Stream audio output back to the client

## Architecture

The voice assistant uses a pipeline architecture:

```
Audio Input → RealtimeKit → Deepgram STT → Agent Logic → ElevenLabs TTS → Audio Output
```

## Setup

1. **Environment Variables**: Configure the following in your `wrangler.toml` (for local development, you can instead copy `.env.example` to `.env`):

```toml
[vars]
CF_ACCOUNT_ID = "your-cloudflare-account-id"
CF_API_TOKEN = "your-cloudflare-api-token"
DEEPGRAM_API_KEY = "your-deepgram-api-key"
ELEVENLABS_API_KEY = "your-elevenlabs-api-key"
RTK_MEETING_ID = "your-realtimekit-meeting-id" # Optional
RTK_AUTH_TOKEN = "your-realtimekit-auth-token" # Optional
```

In production, store the API keys as secrets (`wrangler secret put CF_API_TOKEN`, and so on) rather than committing them to `[vars]`.

2. **API Keys**:
- Get a Deepgram API key from [https://deepgram.com](https://deepgram.com)
- Get an ElevenLabs API key from [https://elevenlabs.io](https://elevenlabs.io)
- Get your Cloudflare Account ID and API token from the Cloudflare dashboard

3. **Deploy**:

```bash
npm run dev # For local development
wrangler deploy # For production deployment
```

## Usage

Once deployed, the agent accepts WebSocket connections for real-time voice interaction.

### Basic Flow:

1. Client connects to the agent's WebSocket endpoint (see the client sketch below)
2. Agent initializes the realtime pipeline
3. Client streams audio → Agent processes → Agent streams audio back
4. Agent handles conversation logic in `onRealtimeTranscript()` method
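
A minimal client connection might look like the sketch below. The endpoint path is an assumption based on `routeAgentRequest`'s default routing (`/agents/<agent-name>/<instance-name>`); verify it against your deployment, and note that the audio itself travels through the RealtimeKit meeting configured in `RealtimeKitTransport`.

```typescript
// Sketch only: the /agents/... path assumes routeAgentRequest's default
// routing, and <your-worker> is a placeholder for your Workers hostname.
const ws = new WebSocket(
  "wss://<your-worker>.workers.dev/agents/realtime-voice-agent/my-session"
);

ws.addEventListener("open", () => {
  console.log("Connected to the voice agent");
});

ws.addEventListener("message", (event) => {
  // Agent events and state updates arrive here; audio flows via RealtimeKit
  console.log("Agent event:", event.data);
});
```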

### Customization:

- Modify the `onRealtimeTranscript()` method to add your own conversational AI logic (see the sketch after this list)
- Integrate with Workers AI, OpenAI, Anthropic, or other language models
- Add knowledge base queries, tool calling, or context management
- Customize voice settings in ElevenLabsTTS configuration
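
As a sketch of the first two points, here is `onRealtimeTranscript()` delegating to a language model via Workers AI. The `AI` binding and model name are illustrative assumptions (you would add `AI: Ai` to `Env` and an `[ai]` binding in `wrangler.toml`); any chat-completion API slots in the same way.

```typescript
// Sketch: hand transcripts to an LLM instead of canned responses.
// Assumes an `AI` binding on Env — adjust for your provider of choice.
async onRealtimeTranscript(
  text: string,
  reply: (text: string | ReadableStream<Uint8Array>) => void
): Promise<void> {
  const result = (await this.env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
    messages: [
      { role: "system", content: "You are a concise, friendly voice assistant." },
      { role: "user", content: text }
    ]
  })) as { response?: string };

  // Text-generation models on Workers AI return { response: string }
  reply(result.response ?? "Sorry, I didn't catch that.");
}
```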

## Key Components

### RealtimeVoiceAgent

- Extends the `Agent` class and runs as a Durable Object
- Implements `onRealtimeTranscript()` for conversation handling
- Wires up the pipeline in `realtimePipelineComponents` and tears it down in `cleanup()`

### Worker Entry Point

- Routes incoming requests to the agent Durable Object via `routeAgentRequest`
- Exposes a `/health` endpoint for smoke testing
- Returns structured JSON errors when request handling fails

### Pipeline Components:

- **RealtimeKitTransport**: Audio input/output via RealtimeKit
- **DeepgramSTT**: Speech-to-text conversion
- **ElevenLabsTTS**: Text-to-speech synthesis

## Pipeline Configuration

The agent wires up its pipeline in the `realtimePipelineComponents` property:

```typescript
realtimePipelineComponents = () => {
  const rtk = new RealtimeKitTransport(
    this.env.RTK_MEETING_ID || "default-meeting",
    this.env.RTK_AUTH_TOKEN || "default-token",
    [
      {
        media_kind: "audio",
        stream_kind: "microphone",
        preset_name: "*"
      }
    ]
  );

  const stt = new DeepgramSTT(this.env.DEEPGRAM_API_KEY);
  const tts = new ElevenLabsTTS(this.env.ELEVENLABS_API_KEY);

  // Pipeline: Audio Input → STT → Agent → TTS → Audio Output
  return [rtk, stt, this, tts, rtk];
};
```

Note that `rtk` appears twice: it is the audio source at the head of the pipeline and the audio sink at its tail.

### Pipeline Flow

1. **Audio Input**: RealtimeKit captures microphone audio
2. **Speech Recognition**: Deepgram converts audio to text
3. **Agent Processing**: Your agent receives transcribed text via `onRealtimeTranscript()`
4. **Response Generation**: Agent generates text response
5. **Speech Synthesis**: ElevenLabs converts response to audio
6. **Audio Output**: RealtimeKit streams audio back to client

### Customizing the Pipeline

You can swap or extend the components returned by `realtimePipelineComponents`:

```typescript
// Different STT provider
const stt = new CustomSTT(this.env.CUSTOM_API_KEY);

// Alternate TTS voices — swap tts2 into the returned array to change voices
const tts1 = new ElevenLabsTTS(this.env.ELEVENLABS_KEY, { voice_id: "voice1" });
const tts2 = new ElevenLabsTTS(this.env.ELEVENLABS_KEY, { voice_id: "voice2" });

// Audio preprocessing
const processor = new AudioProcessor();

return [rtk, processor, stt, this, tts1, rtk];
```

## Implementation Details

The Agent class implements the `RealtimePipelineComponent` interface, allowing it to be used directly in realtime pipelines:

```typescript
class RealtimeVoiceAgent extends Agent<Env> {
  realtimePipelineComponents = () => {
    const rtk = new RealtimeKitTransport(...);
    const stt = new DeepgramSTT(...);
    const tts = new ElevenLabsTTS(...);

    // Use 'this' to include the agent in the pipeline
    return [rtk, stt, this, tts, rtk];
  };

  // This method receives transcribed text
  async onRealtimeTranscript(
    text: string,
    reply: (text: string | ReadableStream<Uint8Array>) => void
  ): Promise<void> {
    // Your conversation logic here (processConversation is a placeholder)
    const response = processConversation(text);
    reply(response);
  }
}
```

**Key Features:**

- ✅ **Direct agent integration** - Use `this` to include your agent in the pipeline
- ✅ **Type safety** - Full TypeScript support for pipeline components
- ✅ **Flexible positioning** - Place the agent anywhere in the processing flow
- ✅ **Clean separation** - Clear distinction between pipeline setup and conversation logic

## Examples

The current implementation includes basic conversational responses like:

- Greetings and farewells
- Time and date queries
- Simple jokes
- Help information

You can extend this by integrating with:

- OpenAI GPT models for advanced conversations
- Knowledge bases for domain-specific responses
- Weather APIs, calendars, or other external services (see the sketch after this list)
- Custom business logic and workflows
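
For instance, the canned weather branch in `onRealtimeTranscript()` could call a live API. This sketch uses the free Open-Meteo endpoint with hard-coded coordinates; both the provider and the location are illustrative choices, not part of this example.

```typescript
// Sketch: replace the canned weather reply with a live lookup.
// Open-Meteo and the fixed London coordinates are illustrative only.
if (lowerText.includes("weather")) {
  const res = await fetch(
    "https://api.open-meteo.com/v1/forecast?latitude=51.5&longitude=-0.12&current_weather=true"
  );
  const data = (await res.json()) as {
    current_weather?: { temperature: number; windspeed: number };
  };
  response = data.current_weather
    ? `It's currently ${data.current_weather.temperature}°C with winds of ${data.current_weather.windspeed} km/h.`
    : "I couldn't reach the weather service just now.";
}
```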

## Development

Run locally:

```bash
npm run dev
```

The agent will be available at the WebSocket endpoint provided by Wrangler.

## Troubleshooting

- Ensure all API keys are properly configured
- Check that your Cloudflare account ID and API token have the required permissions
- Verify the RealtimeKit meeting ID and auth token
- Request the `/health` endpoint to confirm the Worker itself is up
- Monitor logs (`wrangler tail`) for pipeline initialization errors
11 changes: 11 additions & 0 deletions examples/realtime-agents/package.json
@@ -0,0 +1,11 @@
{
"name": "@cloudflare/realtime-agents-example",
"author": "Manish",
"keywords": [],
"private": true,
"scripts": {
"dev": "wrangler dev",
"types": "wrangler types"
},
"type": "module"
}
161 changes: 161 additions & 0 deletions examples/realtime-agents/src/index.ts
@@ -0,0 +1,161 @@
import {
RealtimeKitTransport,
DeepgramSTT,
ElevenLabsTTS
} from "agents/realtime";
import { Agent, routeAgentRequest } from "agents";

// Environment interface for required secrets and configuration
interface Env {
// Cloudflare credentials
CF_ACCOUNT_ID: string;
CF_API_TOKEN: string;

// Third-party API keys
DEEPGRAM_API_KEY: string;
ELEVENLABS_API_KEY: string;

// RealtimeKit meeting configuration
RTK_MEETING_ID?: string;
RTK_AUTH_TOKEN?: string;

// Durable Object binding
REALTIME_VOICE_AGENT: DurableObjectNamespace;
}

export class RealtimeVoiceAgent extends Agent<Env> {
realtimePipelineComponents = () => {
// RealtimeKit transport for audio I/O
const rtk = new RealtimeKitTransport(
this.env.RTK_MEETING_ID || "default-meeting",
this.env.RTK_AUTH_TOKEN || "default-token",
[
{
media_kind: "audio",
stream_kind: "microphone",
preset_name: "*"
}
]
);

// Deepgram for speech-to-text (Audio → Text)
const stt = new DeepgramSTT(this.env.DEEPGRAM_API_KEY);

// ElevenLabs for text-to-speech (Text → Audio)
const tts = new ElevenLabsTTS(this.env.ELEVENLABS_API_KEY);

return [rtk, stt, this, tts, rtk];
};

/**
* Handle incoming transcribed text and generate intelligent responses
* This is where you implement your AI logic, knowledge retrieval, etc.
*/
async onRealtimeTranscript(
text: string,
reply: (text: string | ReadableStream<Uint8Array>) => void
): Promise<void> {
console.log(`Received transcript: ${text}`);

// Simple response logic - you can enhance this with:
// - Integration with language models (OpenAI, Anthropic, etc.)
// - Knowledge base queries
// - Context management
// - Intent recognition
// - Tool calling

let response = "";

// Basic conversational responses
const lowerText = text.toLowerCase().trim();

if (lowerText.includes("hello") || lowerText.includes("hi")) {
response = "Hello! I'm your voice assistant. How can I help you today?";
} else if (lowerText.includes("time")) {
const now = new Date();
response = `The current time is ${now.toLocaleTimeString()}.`;
} else if (lowerText.includes("date")) {
const now = new Date();
response = `Today's date is ${now.toLocaleDateString()}.`;
} else if (lowerText.includes("weather")) {
response =
"I'd love to help with weather information, but I don't have access to weather data right now. You could integrate a weather API for real weather updates!";
} else if (lowerText.includes("joke")) {
const jokes = [
"Why don't scientists trust atoms? Because they make up everything!",
"Why did the scarecrow win an award? He was outstanding in his field!",
"What do you call a fake noodle? An impasta!"
];
response = jokes[Math.floor(Math.random() * jokes.length)];
} else if (
lowerText.includes("help") ||
lowerText.includes("what can you do")
) {
response =
"I can help you with basic conversations, tell you the time and date, share jokes, and more. Try asking me about the weather or saying hello!";
} else if (lowerText.includes("goodbye") || lowerText.includes("bye")) {
response = "Goodbye! It was nice talking with you.";
} else {
// Default response for unrecognized input
response = `You said: "${text}". I'm still learning how to respond to that. Try asking about the time, weather, or say hello!`;
}

// Send the response back through the pipeline
reply(response);
}

/**
* Cleanup resources when the agent is no longer needed
*/
async cleanup(): Promise<void> {
try {
if (this.realtimePipelineRunning) {
await this.stopRealtimePipeline();
console.log("Agent stopped successfully");
}
} catch (error) {
console.error("Error during cleanup:", error);
}
}
}

/**
* Worker fetch handler - routes requests to the appropriate Durable Object
*/
export default {
async fetch(request: Request, env: Env): Promise<Response> {
try {
const url = new URL(request.url);

// Health check endpoint
if (url.pathname === "/health") {
return new Response(
JSON.stringify({
status: "healthy",
timestamp: new Date().toISOString(),
service: "realtime-voice-assistant"
}),
{
headers: { "Content-Type": "application/json" }
}
);
}

// Forward the request to the Durable Object
const response = await routeAgentRequest(request, env);
return response || new Response("Not found", { status: 404 });
} catch (error) {
console.error("Worker fetch error:", error);
return new Response(
JSON.stringify({
error: "Internal server error",
message: error instanceof Error ? error.message : "Unknown error"
}),
{
status: 500,
headers: { "Content-Type": "application/json" }
}
);
}
}
};
3 changes: 3 additions & 0 deletions examples/realtime-agents/tsconfig.json
@@ -0,0 +1,3 @@
{
"extends": "../../tsconfig.base.json"
}