emqx · ysfscream · Dec 3, 2025 · Dec 3, 2025 · Dec 4, 2025 · Dec 4, 2025
diff --git a/dir.yaml b/dir.yaml
@@ -941,7 +941,7 @@
       title_ja: MCP over MQTT & Multimedia Service SDKs
       path: emqx-ai/sdks/overview
       collapsed: true
-      children: 
+      children:
         - title_en: Clients Compatible with Multimedia Services
           title_cn: 与多媒体服务适配的客户端
           title_ja: Clients Compatible with Multimedia Services
@@ -959,6 +959,12 @@
             - emqx-ai/sdks/mcp-sdk-paho-c
             - emqx-ai/sdks/mcp-sdk-python
             - emqx-ai/sdks/mcp-sdk-typescript
+    - title_en: Tutorials
+      title_cn: 教程
+      title_ja: チュートリアル
+      collapsed: true
+      children:
+        - emqx-ai/tutorials/quickstart-volcengine-voice
 - title_en: Tutorials
   title_cn: 实用教程
   title_ja: 実用的なチュートリアル

diff --git a/en_US/emqx-ai/tutorials/quickstart-volcengine-voice.md b/en_US/emqx-ai/tutorials/quickstart-volcengine-voice.md
@@ -0,0 +1,34 @@
+# Quick Start: Building an AI Agent with EMQ + Volcengine Voice Services
+
+This document describes how to use Docker Compose to quickly deploy an AI agent demo system with voice interaction and device control. The project uses a PC browser to simulate smart device capabilities (camera, expressions, volume control, etc.) and demonstrates how the MCP over MQTT protocol lets an AI Agent control devices in real time. The system uses Volcengine RTC for the voice channel, ASR/TTS for speech recognition and synthesis, and CustomLLM mode to connect to a custom AI Agent service that handles multi-turn conversations and tool invocations.
+
+Watch the [demo video](https://youtu.be/x_RxJViZyLQ) to see the complete demo in action.
+
+## Architecture Overview
+
+The system consists of three core components:
+
+### Component Overview
+
+| Component | Role | Port | Main Functions |
+|------|------|------|----------|
+| **web** | MCP Server | 8080 | Frontend UI; exposes simulated hardware control tools (camera/expressions/volume) |
+| **app** | MCP Client + AI Agent | 8081 | Exposes the `/chat-stream` endpoint; handles LLM/VLM inference and MCP tool invocations |
+| **volc-server** | Volcengine Proxy | 3002 | Manages RTC rooms and tokens; configures the CustomLLM address that Volcengine services use to call app |
+
+### Communication Flow
+
+```text
+1. Web UI → volc-server: Request scene configuration and RTC credentials
+2. Web UI ↔ Volcengine RTC: Establish real-time audio/video connection (ASR/TTS)
+3. Volcengine → app: CustomLLM callback to /chat-stream (SSE streaming response)
+4. app ↔ Web UI: Invoke MCP tools via MQTT (camera/expressions, etc.)
+5. Volcengine → Web UI: TTS synthesized voice playback
+```
+
+**Core Capabilities**:
+
+- **MCP over MQTT Protocol**: Lets the AI Agent invoke device tools (camera, expressions, volume control) across networks through the EMQX broker
+- **Multimodal Understanding**: Integrates a VLM (vision-language model) to handle visual questions such as "What am I holding?"
+- **Real-time Voice Interaction**: Uses Volcengine RTC with ASR/TTS for low-latency, end-to-end speech recognition and synthesis
+- **Parallel Processing Architecture**: Tool invocation and speech synthesis run asynchronously for a smoother user experience
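Step 3 of the communication flow has Volcengine call `app` at `/chat-stream` and consume a Server-Sent Events (SSE) stream. The Python sketch below shows one way such a stream could be framed. It assumes OpenAI-style `chat.completion.chunk` payloads followed by a `data: [DONE]` sentinel; that payload shape is an assumption, not something this tutorial states, and the helper name `sse_chat_stream` and the default model name are hypothetical.

```python
import json
from typing import Iterable, Iterator


def sse_chat_stream(text_chunks: Iterable[str], model: str = "demo-agent") -> Iterator[str]:
    """Yield SSE frames for a streamed chat reply (hypothetical helper).

    Assumption: the consumer expects OpenAI-style `chat.completion.chunk`
    objects followed by a `data: [DONE]` sentinel.
    """
    for piece in text_chunks:
        chunk = {
            "object": "chat.completion.chunk",
            "model": model,  # hypothetical model name
            "choices": [{"index": 0, "delta": {"content": piece}, "finish_reason": None}],
        }
        # An SSE event is a "data:" line terminated by a blank line.
        # json.dumps escapes newlines, so each payload stays on one line.
        yield f"data: {json.dumps(chunk, ensure_ascii=False)}\n\n"
    # Closing chunk with an empty delta marks the end of the reply.
    final = {
        "object": "chat.completion.chunk",
        "model": model,
        "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}],
    }
    yield f"data: {json.dumps(final)}\n\n"
    yield "data: [DONE]\n\n"


if __name__ == "__main__":
    for frame in sse_chat_stream(["The volume ", "is now 50%."]):
        print(frame, end="")
```

In a real HTTP server, each yielded frame would be written to a response sent with `Content-Type: text/event-stream`.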
diff --git a/ja_JP/emqx-ai/tutorials/quickstart-volcengine-voice.md b/ja_JP/emqx-ai/tutorials/quickstart-volcengine-voice.md
@@ -0,0 +1,34 @@
+# クイックスタート：EMQ + Volcengine音声サービスでAIエージェントを構築する
+
+このドキュメントでは、Docker Composeを使用して、音声インタラクションとデバイス制御をサポートするAIエージェントデモシステムを迅速にデプロイする方法について説明します。このプロジェクトは、PCブラウザを介してスマートデバイス機能（カメラ、表情、音量制御など）をシミュレートし、MCP over MQTTプロトコルがAIエージェントによるデバイスのリアルタイム制御を実現する方法を示します。システムは、Volcengine RTCを統合して音声チャネルを実現し、ASR/TTSで音声認識と合成を提供し、CustomLLMモードでカスタムAIエージェントサービスに接続して、マルチターン会話とツール呼び出しを完了します。
+
+[デモ動画](https://youtu.be/x_RxJViZyLQ)を視聴して、完全なデモ効果をご覧ください。
+
+## アーキテクチャ概要
+
+システムは3つのコアコンポーネントで構成されています：
+
+### コンポーネント概要
+
+| コンポーネント | 役割 | ポート | 主な機能 |
+|------|------|------|----------|
+| **web** | MCP Server | 8080 | フロントエンドUI、ハードウェア制御ツールの公開（カメラ/表情/音量） |
+| **app** | MCP Client + AI Agent | 8081 | `/chat-stream`エンドポイントの提供、LLM/VLM推論とMCPツール呼び出しの処理 |
+| **volc-server** | Volcengineプロキシ | 3002 | RTCルーム/トークンの管理、Volcengineサービスがappを呼び出すためのCustomLLMアドレスの設定 |
+
+### 通信フロー
+
+```text
+1. Web UI → volc-server: シーン設定とRTC認証情報のリクエスト
+2. Web UI ↔ Volcengine RTC: リアルタイムオーディオ/ビデオ接続の確立（ASR/TTS）
+3. Volcengine → app: CustomLLMコールバックで/chat-streamを呼び出し（SSEストリーミングレスポンス）
+4. app ↔ Web UI: MQTTを介してMCPツールを呼び出す（カメラ/表情など）
+5. Volcengine → Web UI: TTS合成音声の再生
+```
+
+**コア機能**：
+
+- **MCP over MQTTプロトコル**: EMQX Brokerを介してAIエージェントのデバイスへのクロスネットワークツール呼び出しを実現（カメラ、表情、音量制御）
+- **マルチモーダル理解**: VLMビジョンモデルを統合し、「手に持っているのは何ですか」などの視覚シナリオをサポート
+- **リアルタイム音声インタラクション**: Volcengine RTC + ASR/TTSに基づき、低レイテンシでエンドツーエンドの音声認識と合成を実現
+- **並列処理アーキテクチャ**: ツール呼び出しと音声合成が非同期で実行され、スムーズなユーザーエクスペリエンスを実現
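Step 4 of the communication flow (MCP tool invocation over MQTT) carries MCP messages, which are JSON-RPC 2.0 requests. As an illustration, the Python sketch below builds an MCP `tools/call` request. The helper name `build_tools_call`, the tool name `set_volume`, and its `level` argument are hypothetical; the MQTT topic and the client library used to publish the message are deliberately left out.

```python
import json
from itertools import count

# Monotonic request IDs so each response can be matched to its request.
_request_ids = count(1)


def build_tools_call(tool_name: str, arguments: dict) -> str:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message (hypothetical helper).

    The MQTT topic this payload would be published on is defined by the
    MCP over MQTT specification and is not shown here.
    """
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


if __name__ == "__main__":
    # "set_volume" and {"level": 50} are hypothetical; an MCP client would
    # normally learn the available tool names from a `tools/list` request.
    print(build_tools_call("set_volume", {"level": 50}))
```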
diff --git a/zh_CN/emqx-ai/tutorials/assets/chat-example.png b/zh_CN/emqx-ai/tutorials/assets/chat-example.png
diff --git a/zh_CN/emqx-ai/tutorials/assets/mcp-tool-example.png b/zh_CN/emqx-ai/tutorials/assets/mcp-tool-example.png
diff --git a/zh_CN/emqx-ai/tutorials/assets/mqtt-settings.png b/zh_CN/emqx-ai/tutorials/assets/mqtt-settings.png
diff --git a/zh_CN/emqx-ai/tutorials/assets/voice-connected.png b/zh_CN/emqx-ai/tutorials/assets/voice-connected.png
diff --git a/zh_CN/emqx-ai/tutorials/assets/web-ui-initial.png b/zh_CN/emqx-ai/tutorials/assets/web-ui-initial.png