feat(audio): add SenseAudio as a selectable TTS provider for narration #38

Open
mzl163 wants to merge 2 commits into nexu-io:main from mzl163:feat/senseaudio-tts
@mzl163 mzl163 commented Jun 9, 2026

SenseAudio's synthesis API is wire-compatible with MiniMax (POST /t2a_v2, Bearer auth, voice_setting/audio_setting body, base_resp envelope, hex data.audio), so narration gains a second engine. Background music stays MiniMax-only — SenseAudio has no music endpoint.
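
For reviewers unfamiliar with the shared wire format, here is a minimal sketch of how a response in that envelope might be unpacked. It assumes `base_resp` carries a numeric `status_code` (0 meaning success) and an optional `status_msg`; those two field names and the helper names `hexToBytes` and `extractAudio` are illustrative, not taken from this PR.

```typescript
// Assumed response shape: only base_resp and data.audio are named in the PR;
// status_code and status_msg are assumptions for this sketch.
interface T2aResponse {
  base_resp?: { status_code: number; status_msg?: string };
  data?: { audio?: string };
}

// Convert a hex string (two characters per byte) into raw bytes.
function hexToBytes(hex: string): Uint8Array {
  if (hex.length % 2 !== 0 || /[^0-9a-fA-F]/.test(hex)) {
    throw new Error("audio payload is not valid hex");
  }
  const bytes = new Uint8Array(hex.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  return bytes;
}

// Check the envelope first, then decode the hex audio payload.
function extractAudio(provider: string, res: T2aResponse): Uint8Array {
  const code = res.base_resp?.status_code ?? -1;
  if (code !== 0) {
    const msg = res.base_resp?.status_msg ?? "unknown error";
    throw new Error(`${provider} TTS failed (${code}): ${msg}`);
  }
  const hex = res.data?.audio;
  if (!hex) {
    throw new Error(`${provider} TTS returned no audio`);
  }
  return hexToBytes(hex);
}
```

Because the provider name is just a string argument, the same decoder can serve both engines.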

core

  • extract the shared Bearer-audio transport into audio-http.ts (provider name and error hints parameterized); minimax.ts now reuses it
  • add senseaudio.ts: resolveSenseAudioCredentials, generateTtsSenseAudio (model senseaudio-tts-1.5-260319), and listSenseAudioVoices via /get_voice
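
As a rough illustration of what "provider name and error hints parameterized" could look like, the sketch below builds a Bearer-authenticated `/t2a_v2` request from a provider descriptor. It is not the code in audio-http.ts: the `AudioProvider` shape, `buildTtsRequest`, the placeholder host, and the fields inside `voice_setting` and `audio_setting` are all assumptions.

```typescript
// Hypothetical provider descriptor; the real audio-http.ts may differ.
interface AudioProvider {
  name: string;    // used in error messages
  baseUrl: string; // placeholder host below, not a real endpoint
  keyHint: string; // appended to credential errors
}

interface TtsInput {
  model: string;
  text: string;
  voiceId: string;
}

interface PreparedRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build the request without sending it, so the same code serves any provider.
function buildTtsRequest(provider: AudioProvider, apiKey: string, input: TtsInput): PreparedRequest {
  if (!apiKey.trim()) {
    throw new Error(`${provider.name}: missing API key. ${provider.keyHint}`);
  }
  return {
    // Strip trailing slashes so the path is joined exactly once.
    url: `${provider.baseUrl.replace(/\/+$/, "")}/t2a_v2`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: input.model,
      text: input.text,
      voice_setting: { voice_id: input.voiceId }, // assumed inner field
      audio_setting: { format: "mp3" },           // assumed inner field
    }),
  };
}

const senseaudio: AudioProvider = {
  name: "SenseAudio",
  baseUrl: "https://senseaudio.example.invalid/v1/", // placeholder
  keyHint: "Add a key under Settings -> Audio.",
};

const request = buildTtsRequest(senseaudio, "sk-test", {
  model: "senseaudio-tts-1.5-260319",
  text: "Hello",
  voiceId: "voice-a", // hypothetical voice id
});
```

The prepared object can then be handed to `fetch(request.url, { method: request.method, headers: request.headers, body: request.body })`.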

cli

  • MediaConfigStore is provider-keyed (minimax | senseaudio) with back-compat shims; keys persist separately in .html-video/media-config.json
  • /api/config/senseaudio (GET/POST/DELETE) + /api/config/senseaudio/voices
  • generate-audio picks the TTS engine from narration.provider, resolving each provider's key on demand
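
The cli changes can be sketched roughly as follows. This is an illustration under assumptions, not the PR's MediaConfigStore: the legacy flat `{ apiKey }` file shape, the fallback to MiniMax when `narration.provider` is absent, and the function names are all hypothetical.

```typescript
type Provider = "minimax" | "senseaudio";

interface ProviderEntry {
  apiKey: string;
  baseUrl?: string;
}

type MediaConfig = Partial<Record<Provider, ProviderEntry>>;

const PROVIDERS: readonly Provider[] = ["minimax", "senseaudio"];

// Read a raw config object into the provider-keyed shape.
function normalizeConfig(raw: Record<string, unknown>): MediaConfig {
  const out: MediaConfig = {};
  // Assumed legacy shape: a flat { apiKey } that predates provider keys.
  const legacyKey = raw["apiKey"];
  if (typeof legacyKey === "string") {
    out.minimax = { apiKey: legacyKey };
  }
  // Provider-keyed entries take precedence over the legacy key.
  for (const p of PROVIDERS) {
    const entry = raw[p] as Partial<ProviderEntry> | undefined;
    if (entry && typeof entry.apiKey === "string") {
      out[p] = {
        apiKey: entry.apiKey,
        ...(typeof entry.baseUrl === "string" ? { baseUrl: entry.baseUrl } : {}),
      };
    }
  }
  return out;
}

// Choose the TTS engine; falling back to MiniMax is an assumption here.
function pickTtsProvider(narration: { provider?: string }): Provider {
  return narration.provider === "senseaudio" ? "senseaudio" : "minimax";
}

// Resolve only the selected provider's key, failing if it is absent.
function resolveApiKey(config: MediaConfig, provider: Provider): string {
  const key = config[provider]?.apiKey;
  if (!key) {
    throw new Error(`No API key configured for ${provider}`);
  }
  return key;
}
```

Resolving the key only after the provider is chosen means a project that narrates with one engine never needs the other engine's credentials.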

studio UI

  • engine selector + dynamic voice list in the narration panel; provider switch in Settings -> Audio; en/zh strings

fix(studio): the /asset route 403'd every file on Windows because the safety guard matched a forward-slash marker against a backslash path. Normalize separators before the check, and add audio MIME types (.mp3 etc.) so <audio> receives audio/mpeg instead of application/octet-stream.
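
A minimal sketch of the two parts of this fix, written as pure string helpers. The marker value, the helper names, and the MIME entries other than .mp3 are assumptions for illustration; the PR's actual guard may be stricter (for example, it may also resolve `..` segments, which this sketch does not).

```typescript
// Audio extensions mapped to MIME types; only .mp3 is named in the PR.
const AUDIO_MIME: Record<string, string> = {
  ".mp3": "audio/mpeg",
  ".wav": "audio/wav",
  ".ogg": "audio/ogg",
};

// Normalize Windows backslashes before comparing against a
// forward-slash marker, so the guard behaves the same on every OS.
function pathContainsMarker(absPath: string, marker: string): boolean {
  const normalized = absPath.replace(/\\/g, "/");
  return normalized.includes(marker);
}

// Pick a Content-Type from the file extension, case-insensitively.
function contentTypeFor(filePath: string): string {
  const dot = filePath.lastIndexOf(".");
  const ext = dot >= 0 ? filePath.slice(dot).toLowerCase() : "";
  return AUDIO_MIME[ext] ?? "application/octet-stream";
}

// Hypothetical marker; the real route's marker is not shown in this PR text.
const MARKER = "/.html-video/";
const windowsPath = "C:\\projects\\demo\\.html-video\\audio\\narration.mp3";
const allowed = pathContainsMarker(windowsPath, MARKER);
const mime = contentTypeFor(windowsPath);
```

Without the `replace` call, `windowsPath.includes(MARKER)` is false on Windows, which matches the blanket 403 described above.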

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@lefarcen lefarcen requested a review from PerishCode June 9, 2026 06:47
@lefarcen lefarcen added size/XL Size XL (700-1499 LOC) risk/medium Medium risk type/feature Feature change labels Jun 9, 2026
lefarcen commented Jun 9, 2026

Contributor

Hey @mzl163 👋 Thanks for putting this together — the SenseAudio + MiniMax abstraction approach (shared Bearer-audio transport in audio-http.ts, provider-keyed config store, engine selector in the UI) is a clean way to expand narration without forking the existing MiniMax logic. The Windows path-separator fix in the asset route is a nice catch to include.

This is still marked as a draft, so I'll hold off on a full code review until you're ready. I've added @PerishCode for code review in the meantime.


💡 To drive this PR to merge hands-free once you're ready, paste this to your AI coding agent (Claude Code / Codex / opencode / Cursor …):
Take over nexu-io/html-video#38 until it merges — read https://raw.githubusercontent.com/nexu-io/looper/main/skills/pr-takeover/SKILL.md and follow it.

@mzl163 mzl163 marked this pull request as ready for review June 9, 2026 06:58

getStatus() returned an empty baseUrl for a config-stored key with no explicit
base URL, while resolve() silently substituted the provider default — so the
Settings UI showed a blank endpoint even though requests hit the default host.
Report the same effective URL resolve() uses, falling back to the provider
default. For MiniMax's region-bound keys this surfaces exactly which region a
key will authenticate against (issue nexu-io#4).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
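
The idea behind this commit can be sketched as a single helper that both code paths share, so the reported URL cannot drift from the one requests use. The default hosts below are placeholders and `effectiveBaseUrl` is a hypothetical name; only `getStatus()` and `resolve()` are named in the commit message.

```typescript
type Provider = "minimax" | "senseaudio";

interface StoredEntry {
  apiKey?: string;
  baseUrl?: string;
}

// Placeholder defaults; the real provider hosts are not stated in this PR text.
const DEFAULT_BASE_URL: Record<Provider, string> = {
  minimax: "https://minimax.example.invalid",
  senseaudio: "https://senseaudio.example.invalid",
};

// Single source of truth: an explicit URL wins, else the provider default.
function effectiveBaseUrl(provider: Provider, stored?: StoredEntry): string {
  const explicit = stored?.baseUrl?.trim();
  return explicit ? explicit : DEFAULT_BASE_URL[provider];
}

// Report the same URL that request resolution would use.
function getStatus(provider: Provider, stored?: StoredEntry): { configured: boolean; baseUrl: string } {
  return {
    configured: Boolean(stored?.apiKey),
    baseUrl: effectiveBaseUrl(provider, stored),
  };
}

// A stored key with no explicit base URL now reports the default host.
const status = getStatus("minimax", { apiKey: "sk-test" });
```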
@lefarcen lefarcen requested a review from elihahah666 June 9, 2026 07:05
lefarcen commented Jun 9, 2026

Contributor

Thanks for marking this ready, @mzl163! @PerishCode is on for code review, and I've also added @elihahah666 for a product look at the new narration engine selector and Settings → Audio provider switch. You should hear back from them shortly.

@lefarcen

Contributor

Hey @PerishCode and @elihahah666 — this PR has been sitting for about 8 days now. Just a gentle ping to check if it's on your radar. No pressure if you're swamped; a quick ETA or any initial thoughts would help @mzl163 know what to expect. 🙏
