Commit d73ad9f

Merge pull request #128 from pettarin/devel
Updated venv script. Added --presets-word. Improved README
2 parents e942391 + edac750 commit d73ad9f

7 files changed

+150
-28
lines changed

README.md

Lines changed: 38 additions & 7 deletions
Original file line number | Diff line number | Diff line change
@@ -70,12 +70,13 @@ in several formats, depending on its application:
7070
### Supported Platforms
7171

7272
**aeneas** has been developed and tested on **Debian 64bit**,
73-
which is the **only supported OS** at the moment.
73+
with **Python 2.7** and **Python 3.5**,
74+
which are the **only supported platforms** at the moment.
7475
Nevertheless, **aeneas** has been confirmed to work on
7576
other Linux distributions, Mac OS X, and Windows.
7677
See the
7778
[PLATFORMS file](https://github.com/readbeyond/aeneas/blob/master/wiki/PLATFORMS.md)
78-
for the details.
79+
for details.
7980

8081
If installing **aeneas** natively on your OS proves difficult,
8182
you are strongly encouraged to use
@@ -99,15 +100,15 @@ for detailed, step-by-step installation procedures for different operating syste
99100

100101
The generic OS-independent procedure is simple:
101102

102-
1. Install
103+
1. **Install**
103104
[Python](https://python.org/) (2.7.x preferred),
104105
[FFmpeg](https://www.ffmpeg.org/), and
105106
[eSpeak](http://espeak.sourceforge.net/)
106107

107-
2. Make sure the following executables can be called from your shell:
108+
2. Make sure the following **executables** can be called from your **shell**:
108109
`espeak`, `ffmpeg`, `ffprobe`, `pip`, and `python`
109110

110-
3. First install `numpy` with `pip` and then `aeneas`:
111+
3. First install `numpy` with `pip` and then `aeneas` (this order is important):
111112

112113
```bash
113114
pip install numpy
@@ -218,6 +219,8 @@ which explains how to use the built-in command line tools.
218219
[HOWITWORKS](https://github.com/readbeyond/aeneas/blob/master/wiki/HOWITWORKS.md)
219220
* Development history:
220221
[HISTORY](https://github.com/readbeyond/aeneas/blob/master/wiki/HISTORY.md)
222+
* Testing:
223+
[TESTING](https://github.com/readbeyond/aeneas/blob/master/wiki/TESTING.md)
221224
* Benchmark suite:
222225
[https://readbeyond.github.io/aeneas-benchmark/](https://readbeyond.github.io/aeneas-benchmark/)
223226
@@ -245,16 +248,44 @@ which explains how to use the built-in command line tools.
245248
* Output an HTML file for fine tuning the sync map manually (`finetuneas` project)
246249
* Execution parameters tunable at runtime
247250
* Code suitable for Web app deployment (e.g., on-demand cloud computing instances)
248-
* Extensive test suite including 1,000+ unit/integration/performance tests, that run and must pass before each release
251+
* Extensive test suite including 1,200+ unit/integration/performance tests, that run and must pass before each release
249252
250253
251254
## Limitations and Missing Features
252255
253256
* Audio should match the text: large portions of spurious text or audio might produce a wrong sync map
254257
* Audio is assumed to be spoken: not suitable for song captioning, YMMV for CC applications
255-
* No protection against memory trashing if you feed extremely long audio files (>1.5h per single audio file)
258+
* No protection against memory swapping: be sure your amount of RAM is adequate for the maximum duration of a single audio file (e.g., 4 GB RAM => max 2h audio; 16 GB RAM => max 10h audio)
256259
* [Open issues](https://github.com/readbeyond/aeneas/issues)
257260
261+
### A Note on Word-Level Alignment
262+
263+
A significant number of users run **aeneas** to align audio and text
264+
at word-level (i.e., each fragment is a word).
265+
Although **aeneas** was not designed with word-level alignment in mind
266+
and the results might be inferior to
267+
[ASR-based forced aligners](https://github.com/pettarin/forced-alignment-tools)
268+
for languages with good ASR models,
269+
**aeneas** offers some options to improve
270+
the quality of the alignment at word-level:
271+
272+
* multilevel text (since v1.5.1), and/or
273+
* MFCC nonspeech masking (since v1.7.0, disabled by default).
274+
275+
If you use the ``aeneas.tools.execute_task`` command line tool,
276+
you can add the ``--presets-word`` switch to enable MFCC nonspeech masking, for example:
277+
278+
```bash
279+
$ python -m aeneas.tools.execute_task --example-words --presets-word
280+
$ python -m aeneas.tools.execute_task --example-words-multilevel --presets-word
281+
```
282+
283+
If you use **aeneas** as a library, just set the appropriate
284+
``RuntimeConfiguration`` parameters.
285+
Please see the
286+
[command line tutorial](http://www.readbeyond.it/aeneas/docs/clitutorial.html)
287+
for details.
288+
258289
259290
## License
260291

README.rst

Lines changed: 46 additions & 12 deletions
@@ -87,12 +87,13 @@ System Requirements
8787
Supported Platforms
8888
~~~~~~~~~~~~~~~~~~~
8989

90-
**aeneas** has been developed and tested on **Debian 64bit**, which is
91-
the **only supported OS** at the moment. Nevertheless, **aeneas** has
92-
been confirmed to work on other Linux distributions, Mac OS X, and
93-
Windows. See the `PLATFORMS
90+
**aeneas** has been developed and tested on **Debian 64bit**, with
91+
**Python 2.7** and **Python 3.5**, which are the **only supported
92+
platforms** at the moment. Nevertheless, **aeneas** has been confirmed
93+
to work on other Linux distributions, Mac OS X, and Windows. See the
94+
`PLATFORMS
9495
file <https://github.com/readbeyond/aeneas/blob/master/wiki/PLATFORMS.md>`__
95-
for the details.
96+
for details.
9697

9798
If installing **aeneas** natively on your OS proves difficult, you are
9899
strongly encouraged to use
@@ -115,14 +116,16 @@ operating systems.
115116

116117
The generic OS-independent procedure is simple:
117118

118-
1. Install `Python <https://python.org/>`__ (2.7.x preferred),
119+
1. **Install** `Python <https://python.org/>`__ (2.7.x preferred),
119120
`FFmpeg <https://www.ffmpeg.org/>`__, and
120121
`eSpeak <http://espeak.sourceforge.net/>`__
121122

122-
2. Make sure the following executables can be called from your shell:
123-
``espeak``, ``ffmpeg``, ``ffprobe``, ``pip``, and ``python``
123+
2. Make sure the following **executables** can be called from your
124+
**shell**: ``espeak``, ``ffmpeg``, ``ffprobe``, ``pip``, and
125+
``python``
124126

125-
3. First install ``numpy`` with ``pip`` and then ``aeneas``:
127+
3. First install ``numpy`` with ``pip`` and then ``aeneas`` (this order
128+
is important):
126129

127130
.. code:: bash
128131
@@ -224,6 +227,8 @@ Documentation and Support
224227
`HOWITWORKS <https://github.com/readbeyond/aeneas/blob/master/wiki/HOWITWORKS.md>`__
225228
- Development history:
226229
`HISTORY <https://github.com/readbeyond/aeneas/blob/master/wiki/HISTORY.md>`__
230+
- Testing:
231+
`TESTING <https://github.com/readbeyond/aeneas/blob/master/wiki/TESTING.md>`__
227232
- Benchmark suite: https://readbeyond.github.io/aeneas-benchmark/
228233

229234
Supported Features
@@ -268,7 +273,7 @@ Supported Features
268273
- Execution parameters tunable at runtime
269274
- Code suitable for Web app deployment (e.g., on-demand cloud computing
270275
instances)
271-
- Extensive test suite including 1,000+ unit/integration/performance
276+
- Extensive test suite including 1,200+ unit/integration/performance
272277
tests, that run and must pass before each release
273278

274279
Limitations and Missing Features
@@ -278,10 +283,39 @@ Limitations and Missing Features
278283
might produce a wrong sync map
279284
- Audio is assumed to be spoken: not suitable for song captioning, YMMV
280285
for CC applications
281-
- No protection against memory trashing if you feed extremely long
282-
audio files (>1.5h per single audio file)
286+
- No protection against memory swapping: be sure your amount of RAM is
287+
adequate for the maximum duration of a single audio file (e.g., 4 GB
288+
RAM => max 2h audio; 16 GB RAM => max 10h audio)
283289
- `Open issues <https://github.com/readbeyond/aeneas/issues>`__
284290

291+
A Note on Word-Level Alignment
292+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
293+
294+
A significant number of users run **aeneas** to align audio and text at
295+
word-level (i.e., each fragment is a word). Although **aeneas** was not
296+
designed with word-level alignment in mind and the results might be
297+
inferior to `ASR-based forced
298+
aligners <https://github.com/pettarin/forced-alignment-tools>`__ for
299+
languages with good ASR models, **aeneas** offers some options to
300+
improve the quality of the alignment at word-level:
301+
302+
- multilevel text (since v1.5.1), and/or
303+
- MFCC nonspeech masking (since v1.7.0, disabled by default).
304+
305+
If you use the ``aeneas.tools.execute_task`` command line tool, you can
306+
add the ``--presets-word`` switch to enable MFCC nonspeech masking, for
307+
example:
308+
309+
.. code:: bash
310+
311+
$ python -m aeneas.tools.execute_task --example-words --presets-word
312+
$ python -m aeneas.tools.execute_task --example-words-multilevel --presets-word
313+
314+
If you use **aeneas** as a library, just set the appropriate
315+
``RuntimeConfiguration`` parameters. Please see the `command line
316+
tutorial <http://www.readbeyond.it/aeneas/docs/clitutorial.html>`__ for
317+
details.
318+
285319
License
286320
-------
287321

aeneas/tests/long_test_task_switches.py

Lines changed: 9 additions & 0 deletions
@@ -76,6 +76,15 @@ def test_exec_output_html(self):
             ("", "--output-html")
         ], 0)
 
+    def test_exec_presets_word(self):
+        self.execute([
+            ("in", "../tools/res/audio.mp3"),
+            ("in", "../tools/res/words.txt"),
+            ("", "task_language=eng|is_text_type=plain|os_task_file_format=json"),
+            ("out", "sonnet.json"),
+            ("", "--presets-word")
+        ], 0)
+
     def test_exec_rate(self):
         self.execute([
             ("in", "../tools/res/audio.mp3"),

aeneas/tools/execute_task.py

Lines changed: 7 additions & 0 deletions
@@ -428,6 +428,7 @@ class ExecuteTaskCLI(AbstractCLIProgram):
             u"--list-values : list all parameters for which values can be listed",
             u"--list-values=PARAM : list all allowed values for parameter PARAM",
             u"--output-html : output HTML file for fine tuning",
+            u"--presets-word : apply presets for word-level alignment (MFCC masking)",
             u"--rate : print rate of each fragment",
             u"--skip-validator : do not validate the given config string",
             u"--zero : print fragments with zero duration",
@@ -470,6 +471,7 @@ def perform_command(self):
         print_faster_rate = self.has_option(u"--faster-rate")
         print_rates = self.has_option(u"--rate")
         print_zero = self.has_option(u"--zero")
+        presets_word = self.has_option(u"--presets-word")
 
         if demo:
             validate = False
@@ -522,6 +524,11 @@ def perform_command(self):
             config_string = self.actual_arguments[2]
             sync_map_file_path = self.actual_arguments[3]
 
+        if presets_word:
+            self.print_info(u"Preset for word-level alignment")
+            self.rconf[RuntimeConfiguration.MFCC_MASK_NONSPEECH] = True
+            self.rconf[RuntimeConfiguration.MFCC_MASK_NONSPEECH_L3] = True
+
         html_file_path = None
         if output_html:
             keep_audio = True
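
The effect of the new switch can be illustrated outside of aeneas with a minimal sketch. It uses a plain dictionary as a stand-in for the runtime configuration object. The function name `apply_word_presets` is hypothetical, and the two string keys are assumed to mirror `RuntimeConfiguration.MFCC_MASK_NONSPEECH` and `RuntimeConfiguration.MFCC_MASK_NONSPEECH_L3`; neither is taken from this commit.

```python
# Stand-in keys: assumed to mirror RuntimeConfiguration.MFCC_MASK_NONSPEECH
# and RuntimeConfiguration.MFCC_MASK_NONSPEECH_L3 (not imported from aeneas).
MFCC_MASK_NONSPEECH = "mfcc_mask_nonspeech"
MFCC_MASK_NONSPEECH_L3 = "mfcc_mask_nonspeech_l3"


def apply_word_presets(argv, rconf):
    """Return a copy of rconf with the word-level presets applied.

    The presets are applied only if "--presets-word" appears in argv;
    the input dictionary is never modified.
    """
    updated = dict(rconf)
    if "--presets-word" in argv:
        # Single-level tasks read the first key, multilevel tasks
        # read the level 3 (word) key, so both are switched on.
        updated[MFCC_MASK_NONSPEECH] = True
        updated[MFCC_MASK_NONSPEECH_L3] = True
    return updated


if __name__ == "__main__":
    defaults = {MFCC_MASK_NONSPEECH: False, MFCC_MASK_NONSPEECH_L3: False}
    print(apply_word_presets(["--example-words", "--presets-word"], defaults))
    print(apply_word_presets(["--example-words"], defaults))
```

Setting both keys matches the hunk above: one switch covers single-level and multilevel tasks without the user having to know which key applies.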

docs/source/changelog.rst

Lines changed: 1 addition & 0 deletions
@@ -49,6 +49,7 @@ v1.7.0 (2016-12-07)
4949
#. Changed ``--rates`` to ``--rate`` in ``ExecuteTaskCLI``
5050
#. Fixes issue with ``gf.relative_path()`` in Windows, if executed from a drive different than install drive
5151
#. Fixed a bug with empty fragments when using subprocess TTS with TTS cache enabled
52+
#. Added ``--presets-word`` switch to ``aeneas.tools.execute_task``
5253

5354
v1.6.0.1 (2016-09-30)
5455
---------------------

docs/source/clitutorial.rst

Lines changed: 35 additions & 5 deletions
@@ -304,8 +304,13 @@ you need to provide the following additional parameters:
304304

305305
.. note::
306306
If you are interested in synchronizing at **word granularity**,
307-
it is highly suggested to use a **multilevel text format**,
308-
even if you are going to use only the timings for the finer granularity.
307+
it is highly suggested to use:
308+
309+
1. MFCC nonspeech masking; and/or
310+
2. a **multilevel text format**,
311+
even if you are going to use only the timings for the finer granularity.
312+
313+
as they generally yield more accurate timings.
309314

310315
(If you do not want the output sync map file to contain
311316
the multilevel tree hierarchy for the timings,
@@ -315,10 +320,20 @@ you need to provide the following additional parameters:
315320
:data:`~aeneas.globalconstants.PPN_TASK_OS_FILE_LEVELS`
316321
with value ``3``).
317322

318-
There are two main reasons for this suggestion:
323+
Since ``aeneas`` v1.7.0,
324+
the ``aeneas.tools.execute_task`` tool has a switch ``--presets-word``
325+
that enables MFCC nonspeech masking for single level tasks or
326+
MFCC nonspeech masking on level 3 (word) for multilevel tasks.
327+
For example::
328+
329+
$ python -m aeneas.tools.execute_task --example-words
330+
$ python -m aeneas.tools.execute_task --example-words --presets-word
331+
$ python -m aeneas.tools.execute_task --example-words-multilevel
332+
$ python -m aeneas.tools.execute_task --example-words-multilevel --presets-word
319333

320-
1. the computation should be faster, and
321-
2. likely, the timings will be more accurate.
334+
The other default settings should be fine for most users,
335+
however if you need finer control, feel free to experiment
336+
with the following parameters.
322337

323338
Starting with ``aeneas`` v1.5.1,
324339
you can specify different MFCC parameters for each level, see:
@@ -336,7 +351,22 @@ you need to provide the following additional parameters:
336351
* :data:`~aeneas.runtimeconfiguration.RuntimeConfiguration.TTS_L1`,
337352
* :data:`~aeneas.runtimeconfiguration.RuntimeConfiguration.TTS_L2`,
338353
* :data:`~aeneas.runtimeconfiguration.RuntimeConfiguration.TTS_L3`.
354+
355+
Starting with ``aeneas`` v1.7.0,
356+
you can specify the MFCC nonspeech masking, for both
357+
single level tasks and multilevel tasks.
358+
In the latter case, you can apply it to each level separately, see:
359+
360+
* :data:`~aeneas.runtimeconfiguration.RuntimeConfiguration.MFCC_MASK_NONSPEECH`,
361+
* :data:`~aeneas.runtimeconfiguration.RuntimeConfiguration.MFCC_MASK_NONSPEECH_L1`,
362+
* :data:`~aeneas.runtimeconfiguration.RuntimeConfiguration.MFCC_MASK_NONSPEECH_L2`,
363+
* :data:`~aeneas.runtimeconfiguration.RuntimeConfiguration.MFCC_MASK_NONSPEECH_L3`.
339364

365+
If you are using a multilevel text format,
366+
you might want to enable MFCC masking only for level 3 (word),
367+
as enabling it for levels 1 and 2 does not seem to yield significantly
368+
better results.
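
As a rough illustration of per-level masking, the sketch below builds a runtime configuration string in the same `key=value|key=value` form used by the task configuration strings in this commit. The helper name `masking_config_string` is hypothetical, and the key names `mfcc_mask_nonspeech_l1` to `mfcc_mask_nonspeech_l3` are assumed to be the string values behind the `MFCC_MASK_NONSPEECH_L*` constants listed above; check them against the aeneas documentation before relying on them.

```python
def masking_config_string(levels):
    """Return a 'key=True|key=True' string enabling masking on the given levels.

    levels is an iterable of integers in 1..3; duplicates are ignored.
    The key names are assumptions, not confirmed by this commit.
    """
    unique_levels = sorted(set(levels))
    for level in unique_levels:
        if level not in (1, 2, 3):
            raise ValueError("level must be 1, 2, or 3, got %r" % (level,))
    return "|".join(
        "mfcc_mask_nonspeech_l%d=True" % level for level in unique_levels
    )


if __name__ == "__main__":
    # Mask only level 3 (word), as suggested for multilevel text.
    print(masking_config_string([3]))
```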
369+
340370
The ``aeneas`` mailing list contains some interesting threads
341371
about using aeneas for word-level synchronization.
342372

venvs/manage_venvs.sh

Lines changed: 14 additions & 4 deletions
@@ -48,10 +48,20 @@ deps() {
     cd $1
     source bin/activate
     pip install -U pip
-    pip install -U numpy
+    if [ "$2" == "pypy" ]
+    then
+        # on pypy install cython and numpy from its devel repo
+        # as recommended in http://pypy.org/download.html
+        pip install -U cython git+https://github.com/numpy/numpy.git
+    else
+        # otherwise, just install regular numpy
+        pip install -U numpy
+    fi
     pip install -U lxml BeautifulSoup4
     pip install -U pafy requests tgt youtube-dl
-    # NOTE Pillow might cause errors with pypy: try installing it as last
+    # NOTE Pillow might raise errors due to missing libraries
+    # (e.g., libjpeg, libpng, zlib)
+    # so install it as the last one
     pip install -U Pillow
     deactivate
     cd ..
@@ -226,7 +236,7 @@ fi
 
 if [ "$ACTION" == "deps" ]
 then
-    deps $D
+    deps $D $EX
 fi
 
 if [ "$ACTION" == "sdist" ]
@@ -247,6 +257,6 @@ fi
 if [ "$ACTION" == "full" ]
 then
     create $D $FULLEX
-    deps $D
+    deps $D $EX
     copytests $D
 fi
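
The interpreter-dependent branch added to `deps()` can be restated as a small Python sketch, which makes the selection logic easy to check in isolation. The function names are hypothetical, and the sketch assumes that the second argument passed to `deps` (`$EX`) is an interpreter name such as `pypy`; the commit does not show how `$EX` is set.

```python
def numpy_requirements(interpreter):
    """Return the pip requirements used to obtain numpy for an interpreter."""
    if interpreter == "pypy":
        # On pypy: cython plus numpy from its development repository.
        return ["cython", "git+https://github.com/numpy/numpy.git"]
    # Any other interpreter: the regular numpy release.
    return ["numpy"]


def numpy_pip_command(interpreter):
    """Return the argument list for the pip call that installs numpy."""
    return ["pip", "install", "-U"] + numpy_requirements(interpreter)


if __name__ == "__main__":
    for name in ("python2.7", "pypy"):
        print(name, " ".join(numpy_pip_command(name)))
```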
