Demo Beamforming

Koen Eneman

Why beamforming?

Consider the following hands-free teleconferencing setup:

Typical teleconferencing setup.

As the distance between the signal source and the microphones is large, the background noise picked up by the microphones will typically result in a low signal-to-noise ratio at the microphone outputs. In order to provide a high speech transmission quality and speech intelligibility, the recorded signals need to be enhanced.
The speech signal (desired signal) and the background noise overlap in both the time and the frequency domain, so they cannot be separated by temporal or spectral filtering alone. By using multiple microphones, e.g. a microphone array, one can exploit the different spatial characteristics of speech and noise to suppress the undesired signals.


Recording setup.

Some recordings were made in the ESAT speech laboratory. Six omnidirectional electret microphones were placed in a linear array, 5 cm apart. A speech source was located in front of the microphone array (broadside direction). A noise source was placed at an angle of 55 degrees with respect to the array. All signals were sampled and processed at 8 kHz:
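The geometry above determines how much later a wavefront reaches each successive microphone. The following is a minimal sketch under stated assumptions, not the setup's actual calibration: it assumes far-field (plane-wave) propagation, a speed of sound of 343 m/s, and that the 55 degree angle is measured from the broadside direction (the page only says "with respect to the array"). The function name `arrival_delays` is illustrative.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value (not from the page)
NUM_MICS = 6            # from the recording setup
SPACING = 0.05          # m, from the recording setup
FS = 8000.0             # Hz, from the recording setup


def arrival_delays(angle_deg, num_mics=NUM_MICS, spacing=SPACING, c=SPEED_OF_SOUND):
    """Far-field arrival delays in seconds, relative to the first microphone.

    angle_deg is measured from broadside; this convention is an assumption.
    """
    theta = math.radians(angle_deg)
    return [m * spacing * math.sin(theta) / c for m in range(num_mics)]


speech_delays = arrival_delays(0.0)    # broadside source: all delays are zero
noise_delays = arrival_delays(55.0)    # noise source under the assumed convention
noise_delays_in_samples = [d * FS for d in noise_delays]
```

Under these assumptions the speech arrives at all microphones simultaneously, while the noise is shifted by a little under one sample period between adjacent microphones. It is this difference in spatial signature that the beamformers below exploit.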

Signal recorded by microphone 4. Click on the figure to listen to the signal (*).

The signal-to-noise ratio corresponding to this reference signal is -9.9 dB.
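For reference, a signal-to-noise ratio in dB is ten times the base-10 logarithm of the ratio of speech power to noise power. The sketch below assumes the speech and noise components are available separately (for example from separate recordings); the page does not say how its values were measured, and the helper names `power` and `snr_db` are illustrative.

```python
import math


def power(x):
    """Mean squared value of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def snr_db(speech, noise):
    """Signal-to-noise ratio in dB from separate speech and noise components."""
    return 10.0 * math.log10(power(speech) / power(noise))


# Toy example: noise with ten times the power of the speech.
toy_speech = [1.0, -1.0, 1.0, -1.0]
toy_noise = [math.sqrt(10.0) * v for v in toy_speech]
toy_snr = snr_db(toy_speech, toy_noise)
```

A negative value, such as the -9.9 dB quoted above, means that the noise power exceeds the speech power.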

By applying a standard delay-and-sum beamformer, the following signal was obtained:

Output delay-and-sum beamformer. Click on the figure to listen to the signal (*).

The signal-to-noise ratio has now increased to 0.5 dB, a gain of 10.4 dB over the reference signal.
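A delay-and-sum beamformer time-aligns the microphone signals for the desired direction and averages them. The sketch below is a simplified illustration, not the implementation used for the demo: it supports integer-sample steering delays only, and it is exercised on synthetic data (a sine wave standing in for speech, plus independent Gaussian noise per microphone). All names are illustrative.

```python
import math
import random


def delay_and_sum(channels, delays_in_samples):
    """Advance each channel by its integer steering delay, then average."""
    num_samples = len(channels[0])
    out = [0.0] * num_samples
    for x, d in zip(channels, delays_in_samples):
        for n in range(num_samples):
            k = n + d
            if 0 <= k < num_samples:  # samples shifted outside the record are dropped
                out[n] += x[k]
    return [v / len(channels) for v in out]


def power(x):
    """Mean squared value of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


random.seed(0)  # fixed seed so the sketch is repeatable
num_mics, num_samples = 6, 8000
speech = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(num_samples)]
noises = [[random.gauss(0.0, 1.0) for _ in range(num_samples)]
          for _ in range(num_mics)]
channels = [[s + v for s, v in zip(speech, noise)] for noise in noises]

# Broadside source: the speech is already aligned, so all steering delays are zero.
output = delay_and_sum(channels, [0] * num_mics)
residual = [y - s for y, s in zip(output, speech)]
gain_db = 10 * math.log10(power(noises[0]) / power(residual))
```

For noise that is uncorrelated between microphones, averaging six channels reduces the noise power by about a factor of six, i.e. roughly 10 log10(6), or about 7.8 dB. The demo recording contains a single directional noise source rather than uncorrelated noise, so its measured gain need not match this figure.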

Finally, a more advanced beamforming technique, called Griffiths-Jim beamforming, was applied. This technique combines a standard fixed beamformer (delay-and-sum) with adaptive filtering.

A blocking matrix was designed to obtain a single noise reference containing as little desired speech as possible. It was verified that the signal-to-noise ratio of the noise reference is -108 dB.
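The idea behind a blocking matrix can be sketched as follows. For a broadside source the speech component is, ideally, identical in every microphone signal, so any weighted sum whose weights add up to zero cancels it while generally letting signals from other directions through. The weights below are a hypothetical choice for illustration only; the page does not describe the actual design.

```python
def noise_reference(channels, weights):
    """Weighted sum of time-aligned channels using weights that sum to zero."""
    if abs(sum(weights)) > 1e-12:
        raise ValueError("blocking weights must sum to zero")
    num_samples = len(channels[0])
    return [sum(w * x[n] for w, x in zip(weights, channels))
            for n in range(num_samples)]


BLOCKING_WEIGHTS = [1, -1, 1, -1, 1, -1]  # hypothetical, not the author's design

# Identical speech on all six microphones (ideal broadside case) is cancelled.
toy_speech = [0.0, 1.0, -0.5, 0.25]
speech_only = [list(toy_speech) for _ in range(6)]
blocked = noise_reference(speech_only, BLOCKING_WEIGHTS)
```

In practice microphone mismatch and room reverberation prevent perfect cancellation, so some speech leaks into the noise reference.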

Noise reference. Click on the figure to listen to the signal (*).

Using this reference, the following final result was obtained after convergence of the adaptive filter:

Output Griffiths-Jim + speech detection. Click on the figure to listen to the signal (*).

The resulting signal-to-noise ratio is 8.0 dB: a big improvement of 17.9 dB over the reference signal of microphone 4.
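The adaptive stage of such a structure can be sketched as a noise canceller: an adaptive filter estimates the noise remaining in the fixed beamformer output from the noise reference and subtracts it. The sketch below uses a normalized LMS (NLMS) update and freezes adaptation whenever speech is flagged as active. The choice of NLMS, the filter length, the step size, and the per-sample `speech_active` flags are all assumptions for illustration; the page does not specify the adaptive algorithm or the speech detector.

```python
import random


def nlms_cancel(primary, reference, speech_active, num_taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the noise in `primary` using `reference`.

    The filter weights are updated only on samples where speech_active is False.
    Returns the enhanced signal and the final filter weights.
    """
    w = [0.0] * num_taps
    buf = [0.0] * num_taps  # most recent reference samples, newest first
    out = []
    for d, x, active in zip(primary, reference, speech_active):
        buf = [x] + buf[:-1]
        estimate = sum(wi * bi for wi, bi in zip(w, buf))
        e = d - estimate  # enhanced output sample
        if not active:    # adapt during noise-only periods only
            norm = sum(b * b for b in buf) + eps
            w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        out.append(e)
    return out, w


# Toy check: the noise in the primary channel is a filtered copy of the reference.
random.seed(0)  # fixed seed so the sketch is repeatable
reference = [random.gauss(0.0, 1.0) for _ in range(4000)]
primary = [0.5 * reference[n] - 0.3 * (reference[n - 1] if n > 0 else 0.0)
           for n in range(len(reference))]
enhanced, weights = nlms_cancel(primary, reference, [False] * len(reference))
```

Freezing the update while speech is present keeps speech that leaks into the noise reference from steering the filter toward cancelling the desired signal, which is presumably the role of the speech detection mentioned in the caption above.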

(*) You will need a helper application that can play .au files to listen to the signals.
This page is maintained by Koen Eneman. Last modification: October 16, 1998.