Commit 7c58b5c

Merge pull request #129 from pettarin/devel
Added AWS Polly TTS API wrapper
2 parents d73ad9f + 221cde2 commit 7c58b5c

29 files changed, with 715 additions and 168 deletions

README.md

Lines changed: 1 addition & 1 deletion
@@ -235,7 +235,7 @@ which explains how to use the built-in command line tools.
 * Output sync map formats: AUD, CSV, EAF, JSON, SMIL, SRT, SSV, SUB, TEXTGRID, TSV, TTML, TXT, VTT, XML
 * Confirmed working on 38 languages: AFR, ARA, BUL, CAT, CYM, CES, DAN, DEU, ELL, ENG, EPO, EST, FAS, FIN, FRA, GLE, GRC, HRV, HUN, ISL, ITA, JPN, LAT, LAV, LIT, NLD, NOR, RON, RUS, POL, POR, SLK, SPA, SRP, SWA, SWE, TUR, UKR
 * MFCC and DTW computed via Python C extensions to reduce the processing time
-* Several built-in TTS engine wrappers: eSpeak (default), eSpeak-ng, Festival, Nuance TTS API
+* Several built-in TTS engine wrappers: AWS Polly TTS API, eSpeak (default), eSpeak-ng, Festival, Nuance TTS API
 * Default TTS (eSpeak) called via a Python C extension for fast audio synthesis
 * Possibility of running a custom, user-provided TTS engine Python wrapper (e.g., included example for speect)
 * Batch processing of multiple audio/text pairs

README.rst

Lines changed: 2 additions & 2 deletions
@@ -251,8 +251,8 @@ Supported Features
 SWE, TUR, UKR
 - MFCC and DTW computed via Python C extensions to reduce the
 processing time
-- Several built-in TTS engine wrappers: eSpeak (default), eSpeak-ng,
-Festival, Nuance TTS API
+- Several built-in TTS engine wrappers: AWS Polly TTS API, eSpeak
+(default), eSpeak-ng, Festival, Nuance TTS API
 - Default TTS (eSpeak) called via a Python C extension for fast audio
 synthesis
 - Possibility of running a custom, user-provided TTS engine Python

aeneas/language.py

Lines changed: 1 addition & 0 deletions
@@ -40,6 +40,7 @@ class Language(object):
 Consult the documentation of your TTS engine wrapper to
 see the list of languages supported by it:

+* :class:`~aeneas.ttswrappers.awsttswrapper.AWSTTSWrapper`
 * :class:`~aeneas.ttswrappers.espeakttswrapper.ESPEAKTTSWrapper` (default TTS)
 * :class:`~aeneas.ttswrappers.espeakngttswrapper.ESPEAKNGTTSWrapper`
 * :class:`~aeneas.ttswrappers.festivalttswrapper.FESTIVALTTSWrapper`

aeneas/runtimeconfiguration.py

Lines changed: 62 additions & 44 deletions
@@ -561,29 +561,6 @@ class RuntimeConfiguration(Configuration):
 .. versionadded:: 1.5.0
 """

-NUANCE_TTS_API_SLEEP = "nuance_tts_api_sleep"
-"""
-Wait this number of seconds before the next HTTP POST request
-to the Nuance TTS API.
-This parameter can be used to throttle the HTTP usage.
-It cannot be a negative value.
-
-Default: ``1.000``.
-
-.. versionadded:: 1.5.0
-"""
-
-NUANCE_TTS_API_RETRY_ATTEMPTS = "nuance_tts_api_retry_attempts"
-"""
-Retry an HTTP POST request to the Nuance TTS API
-for this number of times before giving up.
-It must be an integer greater than zero.
-
-Default: ``5``.
-
-.. versionadded:: 1.5.0
-"""
-
 SAFETY_CHECKS = "safety_checks"
 """
 If ``True``, perform safety checks on input files and parameters.
@@ -604,9 +581,10 @@ class RuntimeConfiguration(Configuration):
 Maximum length of the audio file of a Task, in seconds.
 If a Task has an audio file longer than this value,
 it will not be executed and an error will be raised.
-Use ``0`` for disabling this check.

-Default: ``7200`` seconds.
+Use ``0`` to disable this check.
+
+Default: ``0`` seconds.

 .. versionadded:: 1.4.1
 """
@@ -617,7 +595,7 @@ class RuntimeConfiguration(Configuration):
 If a Task has more text fragments than this value,
 it will not be executed and an error will be raised.

-Use ``0`` for disabling this check.
+Use ``0`` to disable this check.

 Default: ``0`` (disabled).
@@ -670,6 +648,17 @@ class RuntimeConfiguration(Configuration):
 parameter if the command ``text2wave`` is not available in
 one of the directories listed in your ``PATH`` environment variable.

+Specify the value
+:data:`~aeneas.synthesizer.Synthesizer.AWS` (``aws``)
+to use the built-in AWS Polly TTS API wrapper;
+you will need to provide your AWS API Access Key and Secret Access Key
+by either storing them on disk
+(e.g., in ``~/.aws/credentials`` and ``~/.aws/config``)
+or setting them in environment variables.
+Please refer to
+http://boto3.readthedocs.io/en/latest/guide/configuration.html
+for further details.
+
 Specify the value
 :data:`~aeneas.synthesizer.Synthesizer.NUANCE` (``nuance``)
 to use the built-in Nuance TTS API wrapper;
@@ -691,6 +680,34 @@ class RuntimeConfiguration(Configuration):
 .. versionadded:: 1.5.0
 """

+TTS_PATH = "tts_path"
+"""
+Path to the TTS engine executable
+or the Python CustomTTSWrapper ``.py`` source file
+(see the ``aeneas/extra`` directory for examples).
+
+You might need to use a full path,
+like ``/path/to/your/ttsengine`` or
+``/path/to/your/ttswrapper.py``.
+
+Default: ``None``, implying to use the default path
+defined by each TTS wrapper, if it calls the TTS engine
+via ``subprocess`` (otherwise it does not matter).
+
+.. versionadded:: 1.5.0
+"""
+
+TTS_VOICE_CODE = "tts_voice_code"
+"""
+The code of the TTS voice to use.
+If you specify this value, it will override the default voice code
+associated with the language of your text.
+
+Default: ``None``.
+
+.. versionadded:: 1.5.0
+"""
+
 TTS_CACHE = "tts_cache"
 """
 If set to ``True``, synthesize each distinct text fragment
@@ -712,30 +729,31 @@ class RuntimeConfiguration(Configuration):
 .. versionadded:: 1.6.0
 """

-TTS_PATH = "tts_path"
+TTS_API_SLEEP = "tts_api_sleep"
 """
-Path to the TTS engine executable
-or the Python CustomTTSWrapper ``.py`` source file
-(see the ``aeneas/extra`` directory for examples).
+Wait this number of seconds before the next HTTP POST request
+to the Nuance TTS API.
+This parameter can be used to throttle the HTTP usage.
+It cannot be a negative value.

-You might need to use a full path,
-like ``/path/to/your/ttsengine`` or
-``/path/to/your/ttswrapper.py``.
+Note that this parameter was called ``nuance_tts_api_sleep``
+before v1.7.0.

-Default: ``None``, implying to use the default path
-defined by each TTS wrapper, if it calls the TTS engine
-via ``subprocess`` (otherwise it does not matter).
+Default: ``1.000``.

 .. versionadded:: 1.5.0
 """

-TTS_VOICE_CODE = "tts_voice_code"
+TTS_API_RETRY_ATTEMPTS = "tts_api_retry_attempts"
 """
-The code of the TTS voice to use.
-If you specify this value, it will override the default voice code
-associated with the language of your text.
+Retry an HTTP POST request to the Nuance TTS API
+for this number of times before giving up.
+It must be an integer greater than zero.

-Default: ``None``.
+Note that this parameter was called ``nuance_tts_api_retry_attempts``
+before v1.7.0.
+
+Default: ``5``.

 .. versionadded:: 1.5.0
 """
@@ -926,12 +944,10 @@ class RuntimeConfiguration(Configuration):

 (NUANCE_TTS_API_ID, (None, None, [], u"Nuance Developer API ID")),
 (NUANCE_TTS_API_KEY, (None, None, [], u"Nuance Developer API Key")),
-(NUANCE_TTS_API_SLEEP, ("1.000", TimeValue, [], u"sleep between Nuance API calls, in s")),
-(NUANCE_TTS_API_RETRY_ATTEMPTS, (5, int, [], u"number of retries for a failed Nuance API call")),

 (SAFETY_CHECKS, (True, bool, [], u"if True, always perform safety checks")),

-(TASK_MAX_AUDIO_LENGTH, ("7200.0", TimeValue, [], u"max length of single audio file, in s (0 to disable)")),
+(TASK_MAX_AUDIO_LENGTH, ("0", TimeValue, [], u"max length of single audio file, in s (0 to disable)")),
 (TASK_MAX_TEXT_LENGTH, (0, int, [], u"max length of single text file, in fragments (0 to disable)")),

 (TMP_PATH, (None, None, [], u"path to the temporary dir")),
@@ -940,6 +956,8 @@ class RuntimeConfiguration(Configuration):
 (TTS_PATH, (None, None, [], u"path of the TTS executable/wrapper")), # None (= default) or "espeak" or "/usr/bin/espeak"
 (TTS_VOICE_CODE, (None, None, [], u"overrides TTS voice code selected by language with this value")),
 (TTS_CACHE, (False, bool, [], u"if True, cache synthesized audio files")),
+(TTS_API_SLEEP, ("1.000", TimeValue, [], u"sleep between TTS API calls, in s")),
+(TTS_API_RETRY_ATTEMPTS, (5, int, [], u"number of retries for a failed TTS API call")),

 (TTS_L1, ("espeak", None, [], u"TTS wrapper to use at level 1 (para)")),
 (TTS_PATH_L1, (None, None, [], u"path to level 1 (para) TTS executable/wrapper")), # None (= default) or "espeak" or "/usr/bin/espeak"
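
The renamed `tts_api_sleep` and `tts_api_retry_attempts` parameters together describe a throttle-and-retry policy for remote TTS calls. The following is a minimal sketch of such a policy, not the wrappers' actual code: `call_with_retries` and its `request` argument are hypothetical names introduced here for illustration, and the choice to sleep before every attempt is an assumption based on the docstring wording.

```python
import time


def call_with_retries(request, retry_attempts=5, sleep_seconds=1.0, sleep=time.sleep):
    """Call ``request`` up to ``retry_attempts`` times, sleeping before each attempt.

    ``request`` is any zero-argument callable standing in for one HTTP call
    to a TTS API (hypothetical; not part of aeneas).
    """
    if retry_attempts < 1:
        raise ValueError("retry_attempts must be an integer greater than zero")
    if sleep_seconds < 0:
        raise ValueError("sleep_seconds cannot be negative")
    last_error = None
    for _ in range(retry_attempts):
        sleep(sleep_seconds)  # throttle: wait before the next request
        try:
            return request()
        except Exception as exc:  # a real wrapper would catch narrower errors
            last_error = exc
    raise RuntimeError("giving up after %d attempts" % retry_attempts) from last_error
```

The defaults mirror the documented ones (`5` attempts, `1.000` seconds). Passing `sleep` as a parameter keeps the sketch easy to exercise without real delays.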

aeneas/synthesizer.py

Lines changed: 21 additions & 7 deletions
@@ -37,6 +37,7 @@
 from aeneas.logger import Loggable
 from aeneas.runtimeconfiguration import RuntimeConfiguration
 from aeneas.textfile import TextFile
+from aeneas.ttswrappers.awsttswrapper import AWSTTSWrapper
 from aeneas.ttswrappers.espeakngttswrapper import ESPEAKNGTTSWrapper
 from aeneas.ttswrappers.espeakttswrapper import ESPEAKTTSWrapper
 from aeneas.ttswrappers.festivalttswrapper import FESTIVALTTSWrapper
@@ -54,11 +55,17 @@ class Synthesizer(Loggable):
 :type rconf: :class:`~aeneas.runtimeconfiguration.RuntimeConfiguration`
 :param logger: the logger object
 :type logger: :class:`~aeneas.logger.Logger`
-:raises: OSError: if a custom TTS engine is requested but it cannot be loaded
-:raises: ImportError: if the Nuance TTS API wrapper is requested but
-the``requests`` module is not installed
+:raises: OSError: if a custom TTS engine is requested
+but it cannot be loaded
+:raises: ImportError: if the AWS Polly TTS API wrapper is requested
+but the ``boto3`` module is not installed, or
+if the Nuance TTS API wrapper is requested
+but the``requests`` module is not installed
 """

+AWS = "aws"
+""" Select AWS Polly TTS API wrapper """
+
 CUSTOM = "custom"
 """ Select custom TTS engine wrapper """
@@ -74,7 +81,7 @@ class Synthesizer(Loggable):
 NUANCE = "nuance"
 """ Select Nuance TTS API wrapper """

-ALLOWED_VALUES = [CUSTOM, ESPEAK, ESPEAKNG, FESTIVAL, NUANCE]
+ALLOWED_VALUES = [AWS, CUSTOM, ESPEAK, ESPEAKNG, FESTIVAL, NUANCE]
 """ List of all the allowed values """

 TAG = u"Synthesizer"
@@ -110,9 +117,13 @@ def _select_tts_engine(self):
 self.log(u"Creating CustomTTSWrapper instance... done")
 except Exception as exc:
 self.log_exc(u"Unable to load custom TTS wrapper", exc, True, OSError)
-elif requested_tts_engine == self.FESTIVAL:
-self.log(u"TTS engine: Festival")
-self.tts_engine = FESTIVALTTSWrapper(rconf=self.rconf, logger=self.logger)
+elif requested_tts_engine == self.AWS:
+try:
+import boto3
+except ImportError as exc:
+self.log_exc(u"Unable to import boto3 for AWS Polly TTS API wrapper", exc, True, ImportError)
+self.log(u"TTS engine: AWS Polly TTS API")
+self.tts_engine = AWSTTSWrapper(rconf=self.rconf, logger=self.logger)
 elif requested_tts_engine == self.NUANCE:
 try:
 import requests
@@ -123,6 +134,9 @@ def _select_tts_engine(self):
 elif requested_tts_engine == self.ESPEAKNG:
 self.log(u"TTS engine: eSpeak-ng")
 self.tts_engine = ESPEAKNGTTSWrapper(rconf=self.rconf, logger=self.logger)
+elif requested_tts_engine == self.FESTIVAL:
+self.log(u"TTS engine: Festival")
+self.tts_engine = FESTIVALTTSWrapper(rconf=self.rconf, logger=self.logger)
 else:
 self.log(u"TTS engine: eSpeak")
 self.tts_engine = ESPEAKTTSWrapper(rconf=self.rconf, logger=self.logger)
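
The selection logic in `_select_tts_engine` checks for an optional third-party module before creating the AWS or Nuance wrapper, and falls through to eSpeak otherwise. A self-contained sketch of that pattern follows. It is an illustration under assumptions, not the project's code: `select_engine`, `REQUIRED_MODULES`, and the `known` tuple are hypothetical names, and the availability check uses `importlib.util.find_spec` rather than the `try`/`import` shown in the diff.

```python
import importlib.util

# Optional third-party modules per engine, as named in the diff above.
REQUIRED_MODULES = {"aws": "boto3", "nuance": "requests"}
DEFAULT_ENGINE = "espeak"


def select_engine(requested, known=("aws", "espeak", "festival", "nuance"),
                  required=REQUIRED_MODULES):
    """Return the engine name to use, mimicking the if/elif fallback to eSpeak."""
    if requested not in known:
        return DEFAULT_ENGINE  # unknown values fall through to the default
    module = required.get(requested)
    if module is not None and importlib.util.find_spec(module) is None:
        # The optional dependency is missing: fail early with a clear message.
        raise ImportError("engine %r needs the %r module" % (requested, module))
    return requested
```

Checking the dependency at selection time means a missing `boto3` or `requests` surfaces as one `ImportError` before any synthesis starts, which is what lets the command line tools print an install hint.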

aeneas/tests/test_runtimeconfiguration.py

Lines changed: 2 additions & 2 deletions
@@ -142,14 +142,14 @@ def test_set_rconf_string(self):
 (u"mfcc_mask_min_nonspeech_length=5", "mfcc_mask_min_nonspeech_length", 5),
 (u"nuance_tts_api_id=foo", "nuance_tts_api_id", "foo"),
 (u"nuance_tts_api_key=bar", "nuance_tts_api_key", "bar"),
-(u"nuance_tts_api_sleep=5.000", "nuance_tts_api_sleep", TimeValue("5.000")),
-(u"nuance_tts_api_retry_attempts=3", "nuance_tts_api_retry_attempts", 3),
 (u"safety_checks=False", "safety_checks", False),
 (u"task_max_audio_length=1000", "task_max_audio_length", TimeValue("1000")),
 (u"task_max_text_length=1000", "task_max_text_length", 1000),
 (u"tmp_path=/foo/bar", "tmp_path", "/foo/bar"),
 (u"tts=festival", "tts", "festival"),
 (u"tts_path=/foo/bar/festival", "tts_path", "/foo/bar/festival"),
+(u"tts_api_sleep=5.000", "tts_api_sleep", TimeValue("5.000")),
+(u"tts_api_retry_attempts=3", "tts_api_retry_attempts", 3),
 (u"tts_voice_code=ru", "tts_voice_code", "ru"),
 (u"tts_cache=True", "tts_cache", True),
 (u"tts_l1=festival", "tts_l1", "festival"),
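
Because the two Nuance-specific parameters were renamed, configuration strings written for earlier versions would still carry the old keys. The sketch below shows one way such strings could be rewritten. `upgrade_rconf_string` is a hypothetical helper that this commit does not add, and the `|` separator between `key=value` pairs is an assumption (the tests above only show single pairs).

```python
# Old parameter names (before v1.7.0) mapped to their new names, per the diff above.
RENAMED = {
    "nuance_tts_api_sleep": "tts_api_sleep",
    "nuance_tts_api_retry_attempts": "tts_api_retry_attempts",
}


def upgrade_rconf_string(config, separator="|"):
    """Rewrite renamed keys in a ``key=value`` configuration string."""
    upgraded = []
    for pair in config.split(separator):
        # partition keeps pairs without "=" intact (sep and value are empty)
        key, sep, value = pair.partition("=")
        upgraded.append(RENAMED.get(key, key) + sep + value)
    return separator.join(upgraded)
```

Keys that were not renamed pass through unchanged, so the helper can be applied to any existing string.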

aeneas/tests/tool_test_execute_task.py

Lines changed: 3 additions & 0 deletions
@@ -68,6 +68,9 @@ def test_list_values_help(self):
 def test_list_values_bad(self):
 self.execute([("", "--list-values=foo")], 2)

+def test_list_values_aws(self):
+self.execute([("", "--list-values=aws")], 2)
+
 def test_list_values_espeak(self):
 self.execute([("", "--list-values=espeak")], 2)

aeneas/tools/execute_task.py

Lines changed: 3 additions & 1 deletion
@@ -45,6 +45,7 @@
 from aeneas.task import TaskConfiguration
 from aeneas.textfile import TextFileFormat
 from aeneas.tools.abstract_cli_program import AbstractCLIProgram
+from aeneas.ttswrappers.awsttswrapper import AWSTTSWrapper
 from aeneas.ttswrappers.espeakngttswrapper import ESPEAKNGTTSWrapper
 from aeneas.ttswrappers.espeakttswrapper import ESPEAKTTSWrapper
 from aeneas.ttswrappers.festivalttswrapper import FESTIVALTTSWrapper
@@ -394,6 +395,7 @@ class ExecuteTaskCLI(AbstractCLIProgram):
 PARAMETERS = TaskConfiguration.parameters(sort=True, as_strings=True)

 VALUES = {
+"aws": AWSTTSWrapper.CODE_TO_HUMAN_LIST,
 "espeak": ESPEAKTTSWrapper.CODE_TO_HUMAN_LIST,
 "espeak-ng": ESPEAKNGTTSWrapper.CODE_TO_HUMAN_LIST,
 "festival": FESTIVALTTSWrapper.CODE_TO_HUMAN_LIST,
@@ -428,7 +430,7 @@ class ExecuteTaskCLI(AbstractCLIProgram):
 u"--list-values : list all parameters for which values can be listed",
 u"--list-values=PARAM : list all allowed values for parameter PARAM",
 u"--output-html : output HTML file for fine tuning",
-u"--presets-word : apply presets for word-level alignment (MFCC masking)",
+u"--presets-word : apply presets for word-level alignment (MFCC masking)",
 u"--rate : print rate of each fragment",
 u"--skip-validator : do not validate the given config string",
 u"--zero : print fragments with zero duration",

aeneas/tools/synthesize_text.py

Lines changed: 15 additions & 4 deletions
@@ -168,12 +168,23 @@ def perform_command(self):
 backwards=backwards
 )
 self.print_success(u"Created file '%s'" % output_file_path)
+synt.clear_cache()
 return self.NO_ERROR_EXIT_CODE
 except ImportError as exc:
-self.print_error(u"You need to install Python module requests to use the Nuance TTS API wrapper. Run:")
-self.print_error(u"$ pip install requests")
-self.print_error(u"or, to install for all users:")
-self.print_error(u"$ sudo pip install requests")
+tts = self.rconf[RuntimeConfiguration.TTS]
+if tts == Synthesizer.AWS:
+self.print_error(u"You need to install Python module boto3 to use the AWS Polly TTS API wrapper. Run:")
+self.print_error(u"$ pip install boto3")
+self.print_error(u"or, to install for all users:")
+self.print_error(u"$ sudo pip install boto3")
+elif tts == Synthesizer.NUANCE:
+self.print_error(u"You need to install Python module requests to use the Nuance TTS API wrapper. Run:")
+self.print_error(u"$ pip install requests")
+self.print_error(u"or, to install for all users:")
+self.print_error(u"$ sudo pip install requests")
+else:
+self.print_error(u"An unexpected error occurred while synthesizing text:")
+self.print_error(u"%s" % exc)
 except Exception as exc:
 self.print_error(u"An unexpected error occurred while synthesizing text:")
 self.print_error(u"%s" % exc)

aeneas/ttswrappers/README.md

Lines changed: 4 additions & 2 deletions
@@ -13,8 +13,10 @@ The TTS engine can be called using one of these methods:

 Currently, the available TTS engines are:

-* `ESPEAKTTSWrapper` for `eSpeak` (C extension, subprocess)
-* `FESTIVALTTSWrapper` for `festival` (subprocess)
+* `AWSTTSWrapper` for `AWS Polly TTS API` (Python calling remote AWS Polly API)
+* `ESPEAKTTSWrapper` for `eSpeak` (C extension, subprocess; default TTS Wrapper)
+* `ESPEAKNGTTSWrapper` for `eSpeak-ng` (subprocess)
+* `FESTIVALTTSWrapper` for `Festival` (subprocess)
 * `NuanceTTSWrapper` for `Nuance TTS API` (Python calling remote Nuance API)

 Moreover, custom TTS wrappers can be specified at runtime.
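
The AWS wrapper depends on `boto3` finding an Access Key and Secret Access Key either on disk or in environment variables. As a rough pre-flight check, the sketch below reports which of those two sources is populated. `find_aws_credentials` is a hypothetical helper, not part of aeneas or boto3, and it covers only the `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY` variables and the `[default]` profile of the shared credentials file; boto3's own lookup considers more sources than this.

```python
import configparser
import os


def find_aws_credentials(environ=None, credentials_path="~/.aws/credentials",
                         profile="default"):
    """Report where AWS keys would come from: "environment", "file", or None."""
    environ = os.environ if environ is None else environ
    if environ.get("AWS_ACCESS_KEY_ID") and environ.get("AWS_SECRET_ACCESS_KEY"):
        return "environment"
    parser = configparser.ConfigParser()
    # ConfigParser.read() silently skips files that do not exist.
    parser.read(os.path.expanduser(credentials_path))
    if (parser.has_section(profile)
            and parser.has_option(profile, "aws_access_key_id")
            and parser.has_option(profile, "aws_secret_access_key")):
        return "file"
    return None
```

A `None` result suggests that selecting `tts=aws` would likely fail at request time, so it can be worth checking before starting a long batch.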
