Merge pull request #128 from pettarin/devel

readbeyond · web-flow · commit d73ad9f0730d · 2016-11-29T20:47:27.000+01:00
Updated venv script. Added --presets-word. Improved README
diff --git a/README.md b/README.md
@@ -70,12 +70,13 @@ in several formats, depending on its application:
 ### Supported Platforms
 
 **aeneas** has been developed and tested on **Debian 64bit**,
-which is the **only supported OS** at the moment.
+with **Python 2.7** and **Python 3.5**,
+which are the **only supported platforms** at the moment.
 Nevertheless, **aeneas** has been confirmed to work on
 other Linux distributions, Mac OS X, and Windows.
 See the
 [PLATFORMS file](https://github.com/readbeyond/aeneas/blob/master/wiki/PLATFORMS.md)
-for the details.
+for details.
 
 If installing **aeneas** natively on your OS proves difficult,
 you are strongly encouraged to use
@@ -99,15 +100,15 @@ for detailed, step-by-step installation procedures for different operating syste
 
 The generic OS-independent procedure is simple:
 
-1. Install
+1. **Install**
    [Python](https://python.org/) (2.7.x preferred),
    [FFmpeg](https://www.ffmpeg.org/), and
    [eSpeak](http://espeak.sourceforge.net/)
 
-2. Make sure the following executables can be called from your shell:
+2. Make sure the following **executables** can be called from your **shell**:
    `espeak`, `ffmpeg`, `ffprobe`, `pip`, and `python`
 
-3. First install `numpy` with `pip` and then `aeneas`:
+3. First install `numpy` with `pip` and then `aeneas` (this order is important):
     
     ```bash
     pip install numpy
@@ -218,6 +219,8 @@ which explains how to use the built-in command line tools.
   [HOWITWORKS](https://github.com/readbeyond/aeneas/blob/master/wiki/HOWITWORKS.md)
 * Development history:
   [HISTORY](https://github.com/readbeyond/aeneas/blob/master/wiki/HISTORY.md)
+* Testing:
+  [TESTING](https://github.com/readbeyond/aeneas/blob/master/wiki/TESTING.md)
 * Benchmark suite:
   [https://readbeyond.github.io/aeneas-benchmark/](https://readbeyond.github.io/aeneas-benchmark/)
 
@@ -245,16 +248,44 @@ which explains how to use the built-in command line tools.
 * Output an HTML file for fine tuning the sync map manually (`finetuneas` project)
 * Execution parameters tunable at runtime
 * Code suitable for Web app deployment (e.g., on-demand cloud computing instances)
-* Extensive test suite including 1,000+ unit/integration/performance tests, that run and must pass before each release
+* Extensive test suite including 1,200+ unit/integration/performance tests that run and must pass before each release
 
 
 ## Limitations and Missing Features 
 
 * Audio should match the text: large portions of spurious text or audio might produce a wrong sync map
 * Audio is assumed to be spoken: not suitable for song captioning, YMMV for CC applications
-* No protection against memory trashing if you feed extremely long audio files (>1.5h per single audio file)
+* No protection against memory swapping: make sure your machine has enough RAM for the longest single audio file you process (e.g., 4 GB RAM => max 2h of audio; 16 GB RAM => max 10h of audio)
 * [Open issues](https://github.com/readbeyond/aeneas/issues)
 
+### A Note on Word-Level Alignment
+
+A significant number of users run **aeneas** to align audio and text
+at word level (i.e., each fragment is a single word).
+Although **aeneas** was not designed with word-level alignment in mind
+and its results might be inferior to those of
+[ASR-based forced aligners](https://github.com/pettarin/forced-alignment-tools)
+for languages with good ASR models,
+**aeneas** offers some options to improve
+the quality of word-level alignment:
+
+* multilevel text (since v1.5.1), and/or
+* MFCC nonspeech masking (since v1.7.0, disabled by default).
+
+If you use the ``aeneas.tools.execute_task`` command line tool,
+you can add the ``--presets-word`` switch to enable MFCC nonspeech masking, for example:
+
+```bash
+$ python -m aeneas.tools.execute_task --example-words --presets-word
+$ python -m aeneas.tools.execute_task --example-words-multilevel --presets-word
+```
+
+If you use **aeneas** as a library, set the appropriate
+``RuntimeConfiguration`` parameters (e.g., ``MFCC_MASK_NONSPEECH``).
+Please see the
+[command line tutorial](http://www.readbeyond.it/aeneas/docs/clitutorial.html)
+for details.
+
 
 ## License
 
diff --git a/README.rst b/README.rst
@@ -87,12 +87,13 @@ System Requirements
 Supported Platforms
 ~~~~~~~~~~~~~~~~~~~
 
-**aeneas** has been developed and tested on **Debian 64bit**, which is
-the **only supported OS** at the moment. Nevertheless, **aeneas** has
-been confirmed to work on other Linux distributions, Mac OS X, and
-Windows. See the `PLATFORMS
+**aeneas** has been developed and tested on **Debian 64bit**, with
+**Python 2.7** and **Python 3.5**, which are the **only supported
+platforms** at the moment. Nevertheless, **aeneas** has been confirmed
+to work on other Linux distributions, Mac OS X, and Windows. See the
+`PLATFORMS
 file <https://github.com/readbeyond/aeneas/blob/master/wiki/PLATFORMS.md>`__
-for the details.
+for details.
 
 If installing **aeneas** natively on your OS proves difficult, you are
 strongly encouraged to use
@@ -115,14 +116,16 @@ operating systems.
 
 The generic OS-independent procedure is simple:
 
-1. Install `Python <https://python.org/>`__ (2.7.x preferred),
+1. **Install** `Python <https://python.org/>`__ (2.7.x preferred),
    `FFmpeg <https://www.ffmpeg.org/>`__, and
    `eSpeak <http://espeak.sourceforge.net/>`__
 
-2. Make sure the following executables can be called from your shell:
-   ``espeak``, ``ffmpeg``, ``ffprobe``, ``pip``, and ``python``
+2. Make sure the following **executables** can be called from your
+   **shell**: ``espeak``, ``ffmpeg``, ``ffprobe``, ``pip``, and
+   ``python``
 
-3. First install ``numpy`` with ``pip`` and then ``aeneas``:
+3. First install ``numpy`` with ``pip`` and then ``aeneas`` (this order
+   is important):
 
    .. code:: bash
 
@@ -224,6 +227,8 @@ Documentation and Support
    `HOWITWORKS <https://github.com/readbeyond/aeneas/blob/master/wiki/HOWITWORKS.md>`__
 -  Development history:
    `HISTORY <https://github.com/readbeyond/aeneas/blob/master/wiki/HISTORY.md>`__
+-  Testing:
+   `TESTING <https://github.com/readbeyond/aeneas/blob/master/wiki/TESTING.md>`__
 -  Benchmark suite: https://readbeyond.github.io/aeneas-benchmark/
 
 Supported Features
@@ -268,7 +273,7 @@ Supported Features
 -  Execution parameters tunable at runtime
 -  Code suitable for Web app deployment (e.g., on-demand cloud computing
    instances)
--  Extensive test suite including 1,000+ unit/integration/performance
+-  Extensive test suite including 1,200+ unit/integration/performance
    tests, that run and must pass before each release
 
 Limitations and Missing Features
@@ -278,10 +283,39 @@ Limitations and Missing Features
    might produce a wrong sync map
 -  Audio is assumed to be spoken: not suitable for song captioning, YMMV
    for CC applications
--  No protection against memory trashing if you feed extremely long
-   audio files (>1.5h per single audio file)
+-  No protection against memory swapping: make sure your machine has
+   enough RAM for the longest single audio file you process (e.g., 4 GB
+   RAM => max 2h of audio; 16 GB RAM => max 10h of audio)
 -  `Open issues <https://github.com/readbeyond/aeneas/issues>`__
 
+A Note on Word-Level Alignment
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+A significant number of users run **aeneas** to align audio and text at
+word level (i.e., each fragment is a single word). Although **aeneas** was
+not designed with word-level alignment in mind and its results might be
+inferior to those of `ASR-based forced
+aligners <https://github.com/pettarin/forced-alignment-tools>`__ for
+languages with good ASR models, **aeneas** offers some options to
+improve the quality of word-level alignment:
+
+-  multilevel text (since v1.5.1), and/or
+-  MFCC nonspeech masking (since v1.7.0, disabled by default).
+
+If you use the ``aeneas.tools.execute_task`` command line tool, you can
+add the ``--presets-word`` switch to enable MFCC nonspeech masking, for
+example:
+
+.. code:: bash
+
+    $ python -m aeneas.tools.execute_task --example-words --presets-word
+    $ python -m aeneas.tools.execute_task --example-words-multilevel --presets-word
+
+If you use **aeneas** as a library, set the appropriate
+``RuntimeConfiguration`` parameters (e.g., ``MFCC_MASK_NONSPEECH``).
+Please see the `command line tutorial
+<http://www.readbeyond.it/aeneas/docs/clitutorial.html>`__ for details.
+
 License
 -------
 
diff --git a/aeneas/tests/long_test_task_switches.py b/aeneas/tests/long_test_task_switches.py
@@ -76,6 +76,15 @@ def test_exec_output_html(self):
             ("", "--output-html")
         ], 0)
 
+    def test_exec_presets_word(self):
+        self.execute([
+            ("in", "../tools/res/audio.mp3"),
+            ("in", "../tools/res/words.txt"),
+            ("", "task_language=eng|is_text_type=plain|os_task_file_format=json"),
+            ("out", "sonnet.json"),
+            ("", "--presets-word")
+        ], 0)
+
     def test_exec_rate(self):
         self.execute([
             ("in", "../tools/res/audio.mp3"),
diff --git a/aeneas/tools/execute_task.py b/aeneas/tools/execute_task.py
@@ -428,6 +428,7 @@ class ExecuteTaskCLI(AbstractCLIProgram):
             u"--list-values : list all parameters for which values can be listed",
             u"--list-values=PARAM : list all allowed values for parameter PARAM",
             u"--output-html : output HTML file for fine tuning",
+            u"--presets-word : apply presets for word-level alignment (MFCC masking)",
             u"--rate : print rate of each fragment",
             u"--skip-validator : do not validate the given config string",
             u"--zero : print fragments with zero duration",
@@ -470,6 +471,7 @@ def perform_command(self):
         print_faster_rate = self.has_option(u"--faster-rate")
         print_rates = self.has_option(u"--rate")
         print_zero = self.has_option(u"--zero")
+        presets_word = self.has_option(u"--presets-word")
 
         if demo:
             validate = False
@@ -522,6 +524,11 @@ def perform_command(self):
             config_string = self.actual_arguments[2]
             sync_map_file_path = self.actual_arguments[3]
 
+        if presets_word:
+            self.print_info(u"Preset for word-level alignment")
+            self.rconf[RuntimeConfiguration.MFCC_MASK_NONSPEECH] = True
+            self.rconf[RuntimeConfiguration.MFCC_MASK_NONSPEECH_L3] = True
+
         html_file_path = None
         if output_html:
             keep_audio = True
diff --git a/docs/source/changelog.rst b/docs/source/changelog.rst
@@ -49,6 +49,7 @@ v1.7.0 (2016-12-07)
 #. Changed ``--rates`` to ``--rate`` in ``ExecuteTaskCLI``
 #. Fixes issue with ``gf.relative_path()`` in Windows, if executed from a drive different than install drive
 #. Fixed a bug with empty fragments when using subprocess TTS with TTS cache enabled
+#. Added ``--presets-word`` switch to ``aeneas.tools.execute_task``
 
 v1.6.0.1 (2016-09-30)
 ---------------------
diff --git a/docs/source/clitutorial.rst b/docs/source/clitutorial.rst
@@ -304,8 +304,13 @@ you need to provide the following additional parameters:
 
 .. note::
     If you are interested in synchronizing at **word granularity**,
-    it is highly suggested to use a **multilevel text format**,
-    even if you are going to use only the timings for the finer granularity.
+    it is highly recommended to use:
+
+    1. MFCC nonspeech masking; and/or
+    2. a **multilevel text format**,
+       even if you are going to use only the timings for the finer granularity.
+
+    Both options generally yield more accurate timings.
 
     (If you do not want the output sync map file to contain
     the multilevel tree hierarchy for the timings,
@@ -315,10 +320,20 @@ you need to provide the following additional parameters:
     :data:`~aeneas.globalconstants.PPN_TASK_OS_FILE_LEVELS`
     with value ``3``).
 
-    There are two main reasons for this suggestion:
+    Since ``aeneas`` v1.7.0,
+    the ``aeneas.tools.execute_task`` tool has a ``--presets-word`` switch
+    that enables MFCC nonspeech masking for single-level tasks, or
+    MFCC nonspeech masking on level 3 (word) for multilevel tasks.
+    For example::
+
+        $ python -m aeneas.tools.execute_task --example-words
+        $ python -m aeneas.tools.execute_task --example-words --presets-word
+        $ python -m aeneas.tools.execute_task --example-words-multilevel
+        $ python -m aeneas.tools.execute_task --example-words-multilevel --presets-word
 
-    1. the computation should be faster, and
-    2. likely, the timings will be more accurate.
+    The other default settings should be fine for most users;
+    however, if you need finer control, feel free to experiment
+    with the following parameters.
 
     Starting with ``aeneas`` v1.5.1,
     you can specify different MFCC parameters for each level, see:
@@ -336,7 +351,22 @@ you need to provide the following additional parameters:
     * :data:`~aeneas.runtimeconfiguration.RuntimeConfiguration.TTS_L1`,
     * :data:`~aeneas.runtimeconfiguration.RuntimeConfiguration.TTS_L2`,
     * :data:`~aeneas.runtimeconfiguration.RuntimeConfiguration.TTS_L3`.
+
+    Starting with ``aeneas`` v1.7.0,
+    you can enable MFCC nonspeech masking for both
+    single-level and multilevel tasks.
+    In the latter case, you can apply it to each level separately, see:
+
+    * :data:`~aeneas.runtimeconfiguration.RuntimeConfiguration.MFCC_MASK_NONSPEECH`,
+    * :data:`~aeneas.runtimeconfiguration.RuntimeConfiguration.MFCC_MASK_NONSPEECH_L1`,
+    * :data:`~aeneas.runtimeconfiguration.RuntimeConfiguration.MFCC_MASK_NONSPEECH_L2`,
+    * :data:`~aeneas.runtimeconfiguration.RuntimeConfiguration.MFCC_MASK_NONSPEECH_L3`.
     
+    If you are using a multilevel text format,
+    you might want to enable MFCC masking only for level 3 (word),
+    as enabling it for levels 1 and 2 does not seem to yield significantly
+    better results.
+
     The ``aeneas`` mailing list contains some interesting threads
     about using aeneas for word-level synchronization.
 
diff --git a/venvs/manage_venvs.sh b/venvs/manage_venvs.sh
@@ -48,10 +48,20 @@ deps() {
         cd $1
         source bin/activate
         pip install -U pip
-        pip install -U numpy
+        if [ "$2" == "pypy" ]
+        then
+            # on PyPy, install cython and numpy from numpy's development repository,
+            # as recommended in http://pypy.org/download.html
+            pip install -U cython git+https://github.com/numpy/numpy.git
+        else
+            # otherwise, just install regular numpy
+            pip install -U numpy
+        fi
         pip install -U lxml BeautifulSoup4
         pip install -U pafy requests tgt youtube-dl
-        # NOTE Pillow might cause errors with pypy: try installing it as last
+        # NOTE Pillow might raise errors due to missing libraries
+        #      (e.g., libjpeg, libpng, zlib),
+        #      so install it last
         pip install -U Pillow 
         deactivate
         cd ..
@@ -226,7 +236,7 @@ fi
 
 if [ "$ACTION" == "deps" ]
 then
-    deps $D
+    deps $D $EX
 fi
 
 if [ "$ACTION" == "sdist" ]
@@ -247,6 +257,6 @@ fi
 if [ "$ACTION" == "full" ]
 then
     create $D $FULLEX
-    deps $D
+    deps $D $EX
     copytests $D
 fi