Télécom Paris

Blind speech separation by combining beamformers and a time frequency binary mask
  • Jan Cermak (NTT Communication Science Laboratories)
  • Shoko Araki (NTT Communication Science Laboratories)
  • Hiroshi Sawada (NTT Communication Science Laboratories)
  • Shoji Makino (NTT Communication Science Laboratories)
  • Sound enhancement and sound separation
  • Microphone arrays and array signal processing

This paper describes a new method for blind speech separation (BSS) of convolutive mixtures. Our approach builds on beamforming, a widely used speech enhancement technique, and applies it to BSS by combining a beamformer and a time-frequency binary mask (TFBM) in one system. We propose two approaches that share the same basis but differ in setup. The first is designed for (over-)determined configurations, i.e., the number of sensors is equal to or greater than the number of sources. The second is designed for underdetermined configurations, i.e., the sources outnumber the sensors. Experimental results show that the proposed approach provides better results than either a conventional TFBM or a conventional beamformer used alone.
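
To make the two building blocks named in the abstract concrete, here is a minimal sketch in Python. It is not the authors' system: it assumes a simple delay-and-sum beamformer, a far-field linear array, known source directions, and a mask that assigns each time-frequency cell to the source whose beamformer output is largest. All function names, the array geometry, and the numeric values are hypothetical choices made for illustration.

```python
import cmath
import math


def steering_vector(freq_hz, angle_rad, mic_positions_m, c=343.0):
    """Far-field steering vector of a linear array (assumed model)."""
    return [
        cmath.exp(-2j * math.pi * freq_hz * p * math.cos(angle_rad) / c)
        for p in mic_positions_m
    ]


def delay_and_sum(x, steer):
    """Delay-and-sum beamformer output for one time-frequency cell.

    x     : complex observation at each microphone for this cell
    steer : steering vector for the look direction
    """
    return sum(s.conjugate() * xi for s, xi in zip(steer, x)) / len(x)


def binary_mask(outputs_per_source):
    """Build one 0/1 mask per source over the time-frequency cells.

    Each cell is given to the source whose beamformer output has the
    largest magnitude, so exactly one mask is 1 in every cell.
    """
    n_src = len(outputs_per_source)
    n_cells = len(outputs_per_source[0])
    masks = [[0] * n_cells for _ in range(n_src)]
    for k in range(n_cells):
        winner = max(range(n_src), key=lambda s: abs(outputs_per_source[s][k]))
        masks[winner][k] = 1
    return masks


def demo_masks():
    """Two microphones, two sources, two cells (all values hypothetical)."""
    mics = [0.0, 0.04]          # microphone positions in metres
    freq = 3000.0               # frequency of the bin in Hz
    steer_a = steering_vector(freq, math.radians(30), mics)
    steer_b = steering_vector(freq, math.radians(120), mics)
    # Cell 0 is dominated by source A, cell 1 by source B.
    gains = [(1.0, 0.1), (0.1, 1.0)]
    cells = [
        [a * sa + b * sb for sa, sb in zip(steer_a, steer_b)]
        for a, b in gains
    ]
    out_a = [delay_and_sum(x, steer_a) for x in cells]
    out_b = [delay_and_sum(x, steer_b) for x in cells]
    return binary_mask([out_a, out_b])
```

Calling `demo_masks()` returns one mask per source. Multiplying each beamformer output by its mask, cell by cell, would then suppress the cells in which the other source dominates, which is the general idea of cascading the two techniques.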

©2006 Télécom Paris/TSI
Publisher: Télécom Paris, 2006