Conversation
|
Seconded. Also, options to increase/decrease the speed of TTS narration would be great if possible, since the default speed is slower than most people speak 🙂 |
you got it |
notimaginative
left a comment
There was a problem hiding this comment.
I tested it on Mac and confirmed everything was still working. Apart from the issues noted obviously. I'd like to check and test the Linux changes as well and will do that when I have a bit more time.
However, one desperately needed addition is the ability to actually test the voice changes before saving them.
| static SCP_vector<SCP_string> cached_voices; | ||
| static bool voices_cached = false; |
There was a problem hiding this comment.
This really doesn't need to happen. It's a complete waste of memory. The voices should be cached of course, but that should happen when and where it's needed, not globally at startup.
I'm only marking that issue here, but it applies to win and linux versions as well.
There was a problem hiding this comment.
this is needed for imgui, otherwise it would call the getvoices() at every frame.
There is any any to detect when we are out of the settings systems to clear memory? Ideally you would call it once when you enter to cache it and empty on exit.
There was a problem hiding this comment.
ingame_options_close() can be used for the F3 settings screen, but I'm not sure how that interacts with SCPUI. Simple in theory, but what I didn't realize until now is that the options builder prints the current values to the log, which means it has to load all of this stuff at startup without the in-game options screen ever being active. Which also means that we can't just initialize the voices in in game_options_init().
We might be able to cache the name of the currently active voice and use that whenever it requests the id of the active voice. The options could be changed so that only ttsvoice_enumerator() gets the whole list of voices and builds a <int, SCP_string> pair vector and then display doesn't have to look it up. I believe the display resolution option does it like that so take a look in 2d.cpp for an example. You'd still have to get the list of voices every frame, but only once a frame, and only when on the settings screen, so you could probably get away with not caching it.
Hopefully something like that would work. If not then I think it's going to require changes beyond the scope of this PR. If that's the case then we'll just have to accept it as-is and have a look at improving OptionBuilder in the future to make dealing with this stuff easier.
|
Regarding testing; The options UI framework doesn't really have a nice way to set that up as a separate control. I'd recommend carving out a special case for this option's selector to play a test whenever the value changes. |
I played around with this a bit, but it doesn't look like there's a way to do it. It's easy enough to add a test command to the change listener but that's only called when changes are saved. There doesn't appear to be any current mechanism to deal with control changes when they happen. I assume we'll have to push the test feature down the road, extend options manager to have such functionality, then circle back and add it for TTS. |
|
I did the requested changes. taylor when you have the time please take a look. two things to add:
Also, normally you are not going to want to use the espeak-ng voices, they are terrible, so Linux users should be able to remove the espeak-ng backend and install a neural one like PiperTTS to speech-dispatcher. I havent tested that myselft but the point of using speech-dispatcher is that it should work with any TTS backend. |
| static SCP_vector<int> ttsvoice_enumerator() | ||
| { | ||
| SCP_vector<int> vals; | ||
| auto voices = speech_enumerate_voices(); | ||
| for (int i = 0; i < static_cast<int>(voices.size()); ++i) { | ||
| vals.push_back(i); | ||
| } | ||
| return vals; | ||
| } |
There was a problem hiding this comment.
Use SCP_vector<std::pair<int, SCP_string>> here and then you don't need to enumerate again inside ttsvoice_display(). And doing that will hopefully fix it so that you don't need to use the cache on Linux still since it would only have to get the voice list once per frame.
| auto voices = speech_enumerate_voices(); | ||
| if (voice < 0 || static_cast<size_t>(voice) >= voices.size()) { | ||
| return false; | ||
| } |
There was a problem hiding this comment.
Voices are enumerated twice in this function so this block of code should be removed. Just check for voice < 0 here, then verify that voice < num_voices after line 153 (once num_voices is set and hr is tested valid).
| while (voices[num_voices] != nullptr) { | ||
| num_voices++; | ||
| } |
There was a problem hiding this comment.
What's the purpose of this? It appears to just prevent parsing of more than 600 voices. If that's the case then just remove this and add that check to the loop below here so that it will bail then the limit it hit.
e.g.:
for (int i = 0; voices[i], i < 600; i++) {
...
}
| for (int i = 0; voices[i] != nullptr; i++) { | ||
| // There are too many we cant add them all | ||
| // Only add English voices | ||
| if (num_voices < 600 || (voices[i]->language && strstr(voices[i]->language, "en") != nullptr)) { |
There was a problem hiding this comment.
Use strncmp(voices[i]->language, "en", 2) == 0 here instead of strstr(). It will be much faster and less error prone.
And of course remove the num_voices check in favor of the for loop fix as mentioned in the previous comment.
| auto voices = speech_enumerate_voices(); | ||
| if (voices.empty() || id < 0 || static_cast<size_t>(id) >= voices.size()) { | ||
| return false; | ||
| } |
There was a problem hiding this comment.
You shouldn't need to enumerate voices here since if id is wrong then something extremely bad happened with the settings ui. Just let speech_set_voice() deal with sanity checking.
The objectives of this pr are:
Add TTS Speech options to ingame options settings. Incliding voice selection, voice rate, volume settings and select places were TTS is used
Add TTS Speech support to Linux OS by using speech-dispatcher/libspeechd-dev, this is done used dlopen and a small implementation of the lib types. In this way there is not additional dependency for compiling or in runtime, if speech-dispatcher is not installed on host OS, the speech system just fails to init.
Separate the speech system cpps into diferent files for each platform, copying the way it was done for mac, this is clearer and will make it easier to add other platforms, like android in the future.
Sanitize text was moved to an earlier stage, to fsspeech.cpp, to avoid having to repeat this bit of code in every implementation.
Note:
I may have broken mac tts with these changes and i have no way to test it.