• Ralf Gross
  • 29.11.2018

During the EU-FP7 Project Embodied Audition for RobotS (EARS) a real-time demonstrator for robot audition was developed, which consists of a signal enhancement stage containing a multichannel spatial filter and a single-channel post-filter, followed by a cloud-based, i.e., online, automatic speech recognition (ASR) system.

In previous work, the possibility of integrating an offline speech recognition system, primarily as a backup for the online system in situations without access to an internet connection, was investigated. The respective approach used CMUSphinx, which uses Hidden Markov Model – Gaussian Mixture Model (HMM-GMM)-based acoustic models, as the ASR engine and the Python SpeechRecognition package as the interface to the demonstrator. The evaluation showed that, with the available training and evaluation data, no satisfactory recognition accuracy could be obtained.

The goal of this research internship is to train a word model-based ASR system for a small-vocabulary speech recognition task. The research internship therefore involves the following tasks:

  1. Training of a word model-based acoustic model for the small-vocabulary recognition task of the CHiME challenge,

  2. Verification of the trained acoustic model with evaluation data recorded by the 12-element microphone array integrated into the robot head,

  3. Investigation of the impact of the speech enhancement algorithms employed in the current version of the demonstrator on the recorded evaluation data.
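For tasks 2 and 3, a natural evaluation metric is the word error rate (WER) of the recognizer output on the recorded evaluation data, with and without the speech enhancement stage. As an illustrative sketch (the actual evaluation setup and metric are to be defined during the internship), WER can be computed as the Levenshtein distance between reference and hypothesis word sequences, normalized by the reference length; the example sentence is a hypothetical small-vocabulary command of the kind used in the CHiME task:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: Levenshtein distance over words, divided by
    the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            d[i][j] = min(substitution, deletion, insertion)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)


# Hypothetical example: one substituted word out of six -> WER = 1/6
print(wer("bin blue at f two now", "bin blue at e two now"))
```

Comparing this score on the raw microphone-array recordings against the enhanced signals would quantify the impact of the enhancement algorithms on recognition accuracy.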

The implementation is to be done using either Matlab or Python. Well-structured and well-documented code has to be handed in at the end of the research internship.