
This page demonstrates audio source separation from deformable arrays, in which microphones can move relative to the sound sources and also relative to each other. Five simultaneous speech sources were separated using arrays of 11 moving microphones and a time-invariant beamformer. The beamformer was designed based on the full-rank second-order statistics of training data from each source. The method will be explained in a future paper.
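The exact filter design is not given on this page. As a rough illustration only, one common time-invariant filter built from full-rank second-order statistics is the multichannel Wiener filter. The formula below is an assumption, not necessarily the method used here. The symbols are hypothetical: x(f,t) is the vector of 11 microphone signals in the time-frequency domain, R_k(f) is the full-rank spatial covariance matrix of source k, and e_r selects a reference microphone r, such as one near an ear. The sources are assumed to be mutually uncorrelated.

```latex
\mathbf{w}_{n}^{H}(f) = \mathbf{e}_{r}^{T}\, \mathbf{R}_{n}(f) \left( \sum_{k=1}^{5} \mathbf{R}_{k}(f) \right)^{-1},
\qquad
\hat{s}_{n,r}(f,t) = \mathbf{w}_{n}^{H}(f)\, \mathbf{x}(f,t)
```

Because the filter depends only on frequency f and not on time t, it stays fixed while the microphones move. Any motion therefore has to be absorbed by the covariance matrices themselves.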

The speech data used in this experiment was taken from the VCTK corpus and played through five loudspeakers in the Illinois Augmented Listening Laboratory. The data was captured by 11 microphones placed on a mannequin or human subject, as shown below. The human subject moved in different ways during each recording.

For each experiment, the spatial statistics of each source were inferred from separate training recordings. These statistics were used to design a separate set of source separation filters for each of the four array configurations. The results are shown in the table below. The first two rows are the unprocessed mixture and the clean* source signals. The next four rows contain the sources separated using arrays on a nonmoving mannequin, a human standing as still as possible, a human moving his head and arms, and a human dancing in place.
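The procedure above, estimating spatial statistics from training recordings and then designing separation filters, can be sketched as follows. This is a minimal illustration that assumes a multichannel Wiener filter applied per frequency bin to short-time Fourier transform (STFT) data. It is not necessarily the method used for these recordings. The function names, array layouts, and the diagonal loading term are all hypothetical.

```python
import numpy as np

def spatial_covariances(stft):
    """Estimate one spatial covariance matrix per frequency bin.

    stft: complex array of shape (freqs, frames, mics) from a training
    recording of a single source. Returns shape (freqs, mics, mics).
    """
    frames = stft.shape[1]
    # Average the outer product x x^H over the time frames of each bin.
    return np.einsum("ftm,ftn->fmn", stft, stft.conj()) / frames

def wiener_filters(source_covs, target, ref_mics, loading=1e-6):
    """Design time-invariant filters for one source (assumed Wiener design).

    source_covs: array of shape (sources, freqs, mics, mics).
    target: index of the source to keep.
    ref_mics: indices of the output microphones, e.g. the two ear microphones.
    Returns an array of shape (freqs, len(ref_mics), mics).
    """
    # Assuming uncorrelated sources, the mixture covariance is the sum.
    mix_cov = source_covs.sum(axis=0)
    mics = mix_cov.shape[-1]
    # Small diagonal loading (an assumption) keeps the solve well conditioned.
    scale = np.trace(mix_cov, axis1=-2, axis2=-1).real / mics
    mix_cov = mix_cov + loading * scale[:, None, None] * np.eye(mics)
    # Rows of the target covariance that correspond to the reference mics.
    target_rows = source_covs[target][:, ref_mics, :]
    # Solve W R_mix = target_rows for W through the conjugate transpose.
    w_h = np.linalg.solve(
        mix_cov.conj().transpose(0, 2, 1),
        target_rows.conj().transpose(0, 2, 1),
    )
    return w_h.conj().transpose(0, 2, 1)

def apply_filters(filters, mixture_stft):
    """Apply the same filters to every frame of the mixture STFT."""
    return np.einsum("frm,ftm->ftr", filters, mixture_stft)
```

With 11 microphones and the two ear microphones as references, `apply_filters` would return a two-channel estimate of the chosen source in each time-frequency bin. An inverse STFT would then convert that estimate back to a stereo waveform.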

The samples below represent the separated sources as captured by the microphones nearest the left and right ears. The clips are in stereo and are best experienced using headphones. In the final two experiments, where the human subject moved his head, the sound sources should appear to move.

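Array | Source 1 | Source 2 | Source 3 | Source 4 | Source 5
Unprocessed | (audio) | (audio) | (audio) | (audio) | (audio)
Clean* | (audio) | (audio) | (audio) | (audio) | (audio)
Mannequin | (audio) | (audio) | (audio) | (audio) | (audio)
Human standing still | (audio) | (audio) | (audio) | (audio) | (audio)
Human gesturing | (audio) | (audio) | (audio) | (audio) | (audio)
Human dancing | (audio) | (audio) | (audio) | (audio) | (audio)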

* "Clean" samples were recorded separately, so their head movement patterns differ from those in the recorded mixtures.