feat(audio): add SenseAudio as a selectable TTS provider for narration #38
mzl163 wants to merge 2 commits into
SenseAudio's synthesis API is wire-compatible with MiniMax (POST /t2a_v2, Bearer auth, voice_setting/audio_setting body, base_resp envelope, hex data.audio), so narration gains a second engine. Background music stays MiniMax-only because SenseAudio has no music endpoint.

core
- extract the shared Bearer-audio transport into audio-http.ts (provider name and error hints parameterized); minimax.ts now reuses it
- add senseaudio.ts: resolveSenseAudioCredentials, generateTtsSenseAudio (model senseaudio-tts-1.5-260319), and listSenseAudioVoices via /get_voice

cli
- MediaConfigStore is provider-keyed (minimax | senseaudio) with back-compat shims; keys persist separately in .html-video/media-config.json
- /api/config/senseaudio (GET/POST/DELETE) + /api/config/senseaudio/voices
- generate-audio picks the TTS engine from narration.provider, resolving each provider's key on demand

studio UI
- engine selector + dynamic voice list in the narration panel; provider switch in Settings -> Audio; en/zh strings

fix(studio): the /asset route 403'd every file on Windows because the safety guard matched a forward-slash marker against a backslash path. Normalize separators before the check, and add audio MIME types (.mp3 etc.) so <audio> receives audio/mpeg instead of application/octet-stream.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
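To make the fix(studio) change concrete, here is a minimal sketch of the two ideas it describes: normalizing path separators before the guard runs, and mapping audio extensions to a MIME type. Every name here (ASSET_MARKER, toPosix, isAllowedAssetPath, contentTypeFor) is hypothetical, and the marker string is an assumption; the PR text does not show the actual guard.

```typescript
// Illustrative only: these identifiers are NOT taken from the PR's code.
// Assumed forward-slash marker that the guard looks for in an absolute path.
const ASSET_MARKER = "/.html-video/";

// A small audio MIME table; the PR only names ".mp3 etc.", so the other
// entries are assumptions about which extensions might be covered.
const AUDIO_MIME: Record<string, string> = {
  ".mp3": "audio/mpeg",
  ".wav": "audio/wav",
  ".ogg": "audio/ogg",
};

// Convert Windows backslashes to forward slashes so one marker fits both OSes.
function toPosix(p: string): string {
  return p.replace(/\\/g, "/");
}

// The guard compares against the normalized path, not the raw one.
function isAllowedAssetPath(absPath: string): boolean {
  return toPosix(absPath).includes(ASSET_MARKER);
}

// Look up the Content-Type by extension, falling back to a generic binary type.
function contentTypeFor(filePath: string): string {
  const dot = filePath.lastIndexOf(".");
  const ext = dot >= 0 ? filePath.slice(dot).toLowerCase() : "";
  return AUDIO_MIME[ext] ?? "application/octet-stream";
}
```

With this shape, a path such as `C:\proj\.html-video\assets\voice.mp3` passes the guard after normalization, whereas a raw substring check against the forward-slash marker would reject it.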
Hey @mzl163 👋 Thanks for putting this together, including the SenseAudio + MiniMax abstraction approach (shared Bearer-audio transport in …). This is still marked as a draft, so I'll hold off on a full code review until you're ready. I've added @PerishCode for code review in the meantime. 💡 To drive this PR to merge hands-free once you're ready, paste this to your AI coding agent (Claude Code / Codex / opencode / Cursor …):
getStatus() returned an empty baseUrl for a config-stored key with no explicit base URL, while resolve() silently substituted the provider default, so the Settings UI showed a blank endpoint even though requests hit the default host.

Report the same effective URL resolve() uses, falling back to the provider default. For MiniMax's region-bound keys this surfaces exactly which region a key will authenticate against (issue nexu-io#4).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
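The getStatus()/resolve() mismatch described above can be sketched as follows. This is a hedged illustration, not the PR's code: the signatures, the StoredKey shape, the effectiveBaseUrl helper, and the placeholder hosts are all assumptions, since the PR text names only getStatus(), resolve(), and the two providers.

```typescript
type Provider = "minimax" | "senseaudio";

// Placeholder hosts: the real provider defaults are not stated in the PR text.
const DEFAULT_BASE_URL: Record<Provider, string> = {
  minimax: "https://minimax.example/v1",
  senseaudio: "https://senseaudio.example/v1",
};

// Hypothetical shape of a key persisted in the media config.
interface StoredKey {
  apiKey: string;
  baseUrl?: string;
}

// Single source of truth: an explicit, non-blank base URL wins,
// otherwise fall back to the provider default.
function effectiveBaseUrl(provider: Provider, stored?: StoredKey): string {
  const explicit = stored?.baseUrl?.trim();
  return explicit ? explicit : DEFAULT_BASE_URL[provider];
}

// What requests actually use.
function resolve(provider: Provider, stored: StoredKey) {
  return { apiKey: stored.apiKey, baseUrl: effectiveBaseUrl(provider, stored) };
}

// What the Settings UI displays; it now shares the same helper as resolve().
function getStatus(provider: Provider, stored?: StoredKey) {
  return {
    configured: Boolean(stored?.apiKey),
    baseUrl: effectiveBaseUrl(provider, stored),
  };
}
```

Routing both functions through one helper is the design point: the displayed endpoint and the requested endpoint cannot drift apart again.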
Thanks for marking this ready, @mzl163! @PerishCode is on for code review, and I've also added @elihahah666 for a product look at the new narration engine selector and Settings → Audio provider switch. You should hear back from them shortly.
Hey @PerishCode and @elihahah666, this PR has been sitting for about 8 days now. Just a gentle ping to check if it's on your radar. No pressure if you're swamped; a quick ETA or any initial thoughts would help @mzl163 know what to expect. 🙏