feat(audio): add SenseAudio as a selectable TTS provider for narration by mzl163 · Pull Request #38 · nexu-io/html-video

mzl163 · 2026-06-09T06:45:41Z

SenseAudio's synthesis API is wire-compatible with MiniMax (POST /t2a_v2, Bearer auth, voice_setting/audio_setting body, base_resp envelope, hex data.audio), so narration gains a second engine. Background music stays MiniMax-only — SenseAudio has no music endpoint.
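Below is a minimal sketch of what "wire-compatible" implies for a client: one request shape and one envelope check serve both engines. The field names inside `voice_setting`, `audio_setting` and `base_resp` (`voice_id`, `format`, `status_code`, `status_msg`), and the convention that `status_code` 0 means success, are assumptions modeled on MiniMax's public t2a_v2 API rather than code from this PR. `T2aRequest`, `hexToBytes` and `unwrapT2a` are hypothetical names.

```typescript
// Hypothetical request/response shapes for POST /t2a_v2 (field names assumed).
interface T2aRequest {
  model: string;
  text: string;
  voice_setting: { voice_id: string; speed?: number };
  audio_setting: { format: "mp3" | "wav"; sample_rate?: number };
}

interface T2aResponse {
  data?: { audio?: string }; // hex-encoded audio bytes
  base_resp: { status_code: number; status_msg: string };
}

// Decode a hex string (two characters per byte) into raw bytes.
function hexToBytes(hex: string): Uint8Array {
  if (hex.length % 2 !== 0 || /[^0-9a-fA-F]/.test(hex)) {
    throw new Error("audio payload is not valid hex");
  }
  const out = new Uint8Array(hex.length / 2);
  for (let i = 0; i < out.length; i++) {
    out[i] = parseInt(hex.slice(2 * i, 2 * i + 2), 16);
  }
  return out;
}

// Unwrap the base_resp envelope; a non-zero status_code is treated as failure (assumed).
function unwrapT2a(provider: string, res: T2aResponse): Uint8Array {
  if (res.base_resp.status_code !== 0) {
    throw new Error(
      `${provider} TTS failed (${res.base_resp.status_code}): ${res.base_resp.status_msg}`,
    );
  }
  const hex = res.data?.audio;
  if (!hex) throw new Error(`${provider} TTS returned no audio`);
  return hexToBytes(hex);
}

// Example body using the model id from this PR; the voice id is a placeholder.
const exampleBody: T2aRequest = {
  model: "senseaudio-tts-1.5-260319",
  text: "Welcome to the demo.",
  voice_setting: { voice_id: "placeholder-voice", speed: 1 },
  audio_setting: { format: "mp3", sample_rate: 32000 },
};
console.log(JSON.stringify(exampleBody));
```

Because only the provider name appears in the error text, the same `unwrapT2a` can serve MiniMax and SenseAudio responses alike.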

core

- extract the shared Bearer-audio transport into audio-http.ts (provider name and error hints parameterized); minimax.ts now reuses it
- add senseaudio.ts: resolveSenseAudioCredentials, generateTtsSenseAudio (model senseaudio-tts-1.5-260319), and listSenseAudioVoices via /get_voice
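One way to picture the parameterized transport is as a single POST helper that receives a small provider descriptor. This is a sketch, not the PR's audio-http.ts: `AudioProvider`, `postAudioJson`, `FetchLike` and the hint strings are hypothetical. A fetch-like function is injected so the example type-checks without DOM or Node typings and can be exercised with a fake.

```typescript
// Minimal fetch-like signature so the sketch is self-contained and testable.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical per-provider descriptor: only the name and auth hint differ.
interface AudioProvider {
  name: string;
  authHint: string; // appended to 401/403 errors
}

interface Credentials {
  apiKey: string;
  baseUrl: string;
}

// Shared Bearer-auth JSON POST used by every audio provider in this sketch.
async function postAudioJson(
  provider: AudioProvider,
  creds: Credentials,
  path: string,
  payload: unknown,
  fetchImpl: FetchLike,
): Promise<unknown> {
  const url = creds.baseUrl.replace(/\/+$/, "") + path; // avoid a double slash
  const res = await fetchImpl(url, {
    method: "POST",
    headers: { Authorization: `Bearer ${creds.apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });
  if (res.status === 401 || res.status === 403) {
    throw new Error(`${provider.name} rejected the API key (HTTP ${res.status}). ${provider.authHint}`);
  }
  if (!res.ok) throw new Error(`${provider.name} request failed (HTTP ${res.status})`);
  return res.json();
}

// Illustrative descriptors; the hint wording is invented for this example.
const MINIMAX: AudioProvider = { name: "MiniMax", authHint: "Check that the key matches the configured region." };
const SENSEAUDIO: AudioProvider = { name: "SenseAudio", authHint: "Check the key in Settings -> Audio." };
```

Keeping provider differences in data rather than in branches is what lets a second engine reuse the MiniMax code path unchanged.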

cli

- MediaConfigStore is provider-keyed (minimax | senseaudio) with back-compat shims; keys persist separately in .html-video/media-config.json
- /api/config/senseaudio (GET/POST/DELETE) + /api/config/senseaudio/voices
- generate-audio picks the TTS engine from narration.provider, resolving each provider's key on demand
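A sketch of how a provider-keyed config with a back-compat shim and on-demand key resolution could fit together. The JSON shapes (`providers`, a legacy top-level `apiKey`), the function names `migrateMediaConfig` and `resolveTtsKey`, and the MiniMax default when `narration.provider` is unset are all assumptions for illustration; the PR does not show the file format.

```typescript
type TtsProvider = "minimax" | "senseaudio";

interface ProviderConfig {
  apiKey: string;
  baseUrl?: string;
}

// Hypothetical post-PR shape of .html-video/media-config.json.
interface MediaConfigV2 {
  providers: Partial<Record<TtsProvider, ProviderConfig>>;
}

// Hypothetical pre-PR shape: a single MiniMax key stored at the top level.
interface LegacyMediaConfig {
  apiKey?: string;
  baseUrl?: string;
}

// Back-compat shim: a legacy top-level key becomes the MiniMax entry.
function migrateMediaConfig(raw: unknown): MediaConfigV2 {
  const obj = (raw ?? {}) as Partial<MediaConfigV2> & LegacyMediaConfig;
  if (obj.providers) return { providers: obj.providers };
  if (obj.apiKey) return { providers: { minimax: { apiKey: obj.apiKey, baseUrl: obj.baseUrl } } };
  return { providers: {} };
}

// Pick the engine from narration.provider and resolve only that provider's key.
function resolveTtsKey(
  config: MediaConfigV2,
  narration: { provider?: TtsProvider },
): { provider: TtsProvider; apiKey: string } {
  const provider = narration.provider ?? "minimax"; // default assumed for old projects
  const entry = config.providers[provider];
  if (!entry?.apiKey) throw new Error(`No API key configured for ${provider}`);
  return { provider, apiKey: entry.apiKey };
}
```

Resolving lazily means a project that only narrates with SenseAudio never fails for lack of a MiniMax key, and vice versa.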

studio UI

- engine selector + dynamic voice list in the narration panel; provider switch in Settings -> Audio; en/zh strings

fix(studio): the /asset route 403'd every file on Windows because the safety guard matched a forward-slash marker against a backslash path. Normalize separators before the check, and add audio MIME types (.mp3 etc.) so <audio> receives audio/mpeg instead of application/octet-stream.
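A sketch of the shape of this fix, with hypothetical names `toPosix`, `isSafeAssetPath` and `contentTypeFor`. The PR does not show which marker the real guard checks, so a "must sit under the project root" rule stands in for it here. The point is that both sides of the comparison are normalized to forward slashes first.

```typescript
// Convert Windows backslashes to forward slashes so one check works on every OS.
function toPosix(p: string): string {
  return p.replace(/\\/g, "/");
}

// Hypothetical guard: allow only files strictly under the project root.
function isSafeAssetPath(projectRoot: string, filePath: string): boolean {
  const root = toPosix(projectRoot).replace(/\/+$/, "");
  const file = toPosix(filePath);
  if (file.split("/").includes("..")) return false; // reject traversal segments
  return file.startsWith(root + "/"); // trailing slash blocks "/proj2" matching "/proj"
}

// Audio types so <audio> gets a playable Content-Type; others fall back.
const AUDIO_MIME: Record<string, string> = {
  ".mp3": "audio/mpeg",
  ".wav": "audio/wav",
  ".m4a": "audio/mp4",
  ".ogg": "audio/ogg",
};

function contentTypeFor(filePath: string): string {
  const dot = filePath.lastIndexOf(".");
  const ext = dot >= 0 ? filePath.slice(dot).toLowerCase() : "";
  return AUDIO_MIME[ext] ?? "application/octet-stream";
}
```

This comparison is still case-sensitive; Windows paths are case-insensitive by default, so a production guard may also want to compare drive letters and paths case-insensitively.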

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

lefarcen · 2026-06-09T06:48:36Z

Hey @mzl163 👋 Thanks for putting this together — the SenseAudio + MiniMax abstraction approach (shared Bearer-audio transport in audio-http.ts, provider-keyed config store, engine selector in the UI) is a clean way to expand narration without forking the existing MiniMax logic. The Windows path-separator fix in the asset route is a nice catch to include.

This is still marked as a draft, so I'll hold off on a full code review until you're ready. I've added @PerishCode for code review in the meantime.

💡 To drive this PR to merge hands-free once you're ready, paste this to your AI coding agent (Claude Code / Codex / opencode / Cursor …):
Take over nexu-io/html-video#38 until it merges — read https://raw.githubusercontent.com/nexu-io/looper/main/skills/pr-takeover/SKILL.md and follow it.

getStatus() returned an empty baseUrl for a config-stored key with no explicit base URL, while resolve() silently substituted the provider default, so the Settings UI showed a blank endpoint even though requests hit the default host. Report the same effective URL resolve() uses, falling back to the provider default. For MiniMax's region-bound keys this surfaces exactly which region a key will authenticate against (issue nexu-io#4).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
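The fix amounts to making getStatus() and resolve() share one "effective base URL" rule. A minimal sketch under that reading: `effectiveBaseUrl`, `StoredKey` and the `.invalid` default hosts are placeholders, since the PR does not list the real default endpoints.

```typescript
type Provider = "minimax" | "senseaudio";

interface StoredKey {
  apiKey: string;
  baseUrl?: string;
}

// Placeholder defaults; the real provider hosts are not given in the PR.
const DEFAULT_BASE_URL: Record<Provider, string> = {
  minimax: "https://minimax.example.invalid/v1",
  senseaudio: "https://senseaudio.example.invalid/v1",
};

// Single rule used by both request resolution and status reporting.
function effectiveBaseUrl(provider: Provider, stored?: StoredKey): string {
  const explicit = stored?.baseUrl?.trim();
  return explicit ? explicit : DEFAULT_BASE_URL[provider];
}

// Status for the Settings UI: never blank, always the URL requests will hit.
function getStatus(provider: Provider, stored?: StoredKey): { configured: boolean; baseUrl: string } {
  return { configured: Boolean(stored?.apiKey), baseUrl: effectiveBaseUrl(provider, stored) };
}
```

Routing both code paths through one helper removes the chance of the UI and the request layer disagreeing again.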

lefarcen · 2026-06-09T07:05:58Z

Thanks for marking this ready, @mzl163! @PerishCode is on for code review, and I've also added @elihahah666 for a product look at the new narration engine selector and Settings → Audio provider switch. You should hear back from them shortly.

lefarcen · 2026-06-16T23:23:23Z

Hey @PerishCode and @elihahah666 — this PR has been sitting for about 8 days now. Just a gentle ping to check if it's on your radar. No pressure if you're swamped; a quick ETA or any initial thoughts would help @mzl163 know what to expect. 🙏

lefarcen requested a review from PerishCode June 9, 2026 06:47

lefarcen added the size/XL (Size XL, 700-1499 LOC), risk/medium (Medium risk), and type/feature (Feature change) labels Jun 9, 2026

mzl163 marked this pull request as ready for review June 9, 2026 06:58

lefarcen requested a review from elihahah666 June 9, 2026 07:05

This was referenced Jun 12, 2026

- Feature Request: add Volcengine as an audio provider and support user-selectable audio backends #45 (Open)
- feat(narration): add FishAudio as a selectable TTS provider #54 (Open)

mzl163 wants to merge 2 commits into nexu-io:main from mzl163:feat/senseaudio-tts

2 participants