• Daniel Gerber
  • 01.12.2015
  • 31.05.2016

For many audio signal processing tasks, detecting the time frames in which a target source is dominant is crucial. Possible applications include automatic speech recognition, where the speech recognizer should only be active during target source activity, and system identification, where the acoustic path between the target source and the microphones should be estimated during pauses in interference and noise.


In the literature, various methods for target source activity detection have been published. They typically rely on signal-to-interference-and-noise ratio (SINR) estimation, exploit cross-correlation, or distinguish speech from noise by incorporating knowledge of the spectral and temporal structure of speech signals. Since the quality of the resulting activity estimates varies with target source position, interferer activity, and interferer or noise type, artificial neural networks (ANNs) have proved to be a good way to combine different kinds of methods and improve the overall detection rates. Because they lack memory, however, conventional (feedforward) ANNs cannot exploit dependencies between successive time frames. Speech signals, in contrast, exhibit a high temporal correlation of the features. Therefore, the use of recurrent neural networks (RNNs) can be expected to be beneficial.
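
The difference between a memoryless network and a recurrent one can be illustrated with a minimal sketch. It assumes a simple Elman-style hidden layer with a tanh activation; the function names, weights, and the toy one-dimensional feature sequence are hypothetical and are not taken from the thesis framework. With a nonzero recurrent weight, the hidden state of a frame depends on earlier frames; with the recurrent weight set to zero, the layer reduces to a feedforward layer applied to each frame independently.

```python
import math


def elman_step(x, h_prev, w_in, w_rec, b):
    """One recurrent step: h_t = tanh(W_in x_t + W_rec h_{t-1} + b)."""
    h = []
    for i in range(len(b)):
        a = b[i]
        a += sum(w_in[i][j] * x[j] for j in range(len(x)))
        a += sum(w_rec[i][k] * h_prev[k] for k in range(len(h_prev)))
        h.append(math.tanh(a))
    return h


def run_sequence(frames, w_in, w_rec, b):
    """Process a sequence of feature vectors, starting from a zero state."""
    h = [0.0] * len(b)
    states = []
    for x in frames:
        h = elman_step(x, h, w_in, w_rec, b)
        states.append(h)
    return states


# Toy sequence (assumption): one active frame followed by two silent frames.
frames = [[1.0], [0.0], [0.0]]

# Recurrent layer: the activity of frame 0 decays into later frames.
recurrent = run_sequence(frames, w_in=[[1.0]], w_rec=[[0.5]], b=[0.0])

# Same layer without feedback: every frame is treated in isolation.
memoryless = run_sequence(frames, w_in=[[1.0]], w_rec=[[0.0]], b=[0.0])
```

In this toy setup, the recurrent layer still carries a nonzero state in the silent frames, whereas the memoryless layer returns exactly zero for them. Which feedback connections are actually useful for target activity detection is one of the questions of the thesis.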


In this thesis, recurrent neural networks are to be investigated in the context of target activity detection. The work involves implementing the RNN, including its training algorithms, which can be based on existing frameworks. For the features, a framework providing different methods is available. Moreover, possible feedback connections should be identified. Finally, the network should be optimized and evaluated in terms of detection rates (false positive rate and false negative rate).
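
The two detection rates can be computed per frame as in the following sketch. It assumes the common definitions (false positive rate = false alarms divided by the number of inactive reference frames, false negative rate = misses divided by the number of active reference frames) and binary frame-wise labels; the function name and the example labels are hypothetical.

```python
def detection_rates(reference, detected):
    """Return (false positive rate, false negative rate) for binary frame labels.

    reference: ground-truth target activity per frame (truthy = target active)
    detected:  detector decision per frame (truthy = target detected)
    """
    if len(reference) != len(detected):
        raise ValueError("reference and detected must have the same length")

    pairs = list(zip(reference, detected))
    false_pos = sum(1 for r, d in pairs if not r and d)
    false_neg = sum(1 for r, d in pairs if r and not d)
    negatives = sum(1 for r in reference if not r)
    positives = len(reference) - negatives

    # A rate is undefined if the corresponding class never occurs.
    fpr = false_pos / negatives if negatives else float("nan")
    fnr = false_neg / positives if positives else float("nan")
    return fpr, fnr


# Made-up example: 8 frames, one false alarm and one missed frame.
reference = [1, 1, 1, 0, 0, 0, 0, 1]
detected = [1, 0, 1, 1, 0, 0, 0, 1]
fpr, fnr = detection_rates(reference, detected)
```

Reporting both rates separately, rather than a single accuracy figure, keeps the trade-off between false alarms and missed target activity visible when the network is optimized.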

Well-documented and well-structured software is important. The thesis can be written in German or English.