Télécom Paris

Voice activity detection in the DFT domain based on a parametric noise model
  • Colin Breithaupt (Ruhr-Universität Bochum)
  • Rainer Martin (Ruhr-Universität Bochum)
  • Voice activity detection and double-talk detection

We present a robust voice activity detection (VAD) algorithm based on the statistics of discrete Fourier transform (DFT) coefficients computed from short signal segments. The algorithm uses a common parametric noise probability density function (PDF) for all frequency bins. The noise model is a Rayleigh inverse Gaussian distribution that is adapted to the noise statistics during speech absence. Because only the current and past signal frames are analysed, the detection is causal and introduces no additional delay. A framework for protecting low-energy syllables at the end of utterances is also described.
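
As a rough illustration of the general structure described in the abstract (a causal, frame-wise decision in the DFT domain, a noise estimate adapted only during speech absence, and a hangover that protects weak utterance endings), the following Python sketch may help. It is not the authors' algorithm: the Rayleigh inverse Gaussian noise model is replaced here by a much simpler test on the mean a-posteriori SNR across bins, and all names and parameter values (`simple_vad`, `threshold`, `alpha`, `hangover`, `init_frames`, `make_demo_frames`) are hypothetical choices made for this example only.

```python
import cmath
import math
import random


def dft_power(frame):
    """Return |X_k|^2 for bins 0..N/2 of a real frame (naive O(N^2) DFT)."""
    n = len(frame)
    powers = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        powers.append(abs(acc) ** 2)
    return powers


def simple_vad(frames, threshold=3.0, alpha=0.9, hangover=3, init_frames=8):
    """Causal frame-wise VAD sketch; returns one boolean per frame.

    Assumption: the first `init_frames` frames contain noise only.
    The decision statistic is the mean ratio of frame power to the
    estimated noise power, a stand-in for a parametric noise PDF.
    """
    noise_sum = None   # accumulated power during initialisation
    noise = None       # per-bin noise power estimate
    hang = 0           # remaining hangover frames
    decisions = []
    for t, frame in enumerate(frames):
        power = dft_power(frame)
        if t < init_frames:
            # Average the leading frames to initialise the noise estimate.
            if noise_sum is None:
                noise_sum = list(power)
            else:
                noise_sum = [s + p for s, p in zip(noise_sum, power)]
            noise = [s / (t + 1) for s in noise_sum]
            decisions.append(False)
            continue
        stat = sum(p / max(n, 1e-12) for p, n in zip(power, noise)) / len(power)
        if stat > threshold:
            speech = True
            hang = hangover            # re-arm the hangover counter
        elif hang > 0:
            speech = True              # protect a possible weak ending
            hang -= 1
        else:
            speech = False
            # Adapt the noise estimate only during speech absence.
            noise = [alpha * n + (1 - alpha) * p for n, p in zip(noise, power)]
        decisions.append(speech)
    return decisions


def make_demo_frames(seed=0, frame_len=64):
    """Synthetic test signal: 12 noise frames, 5 'speech' frames, 10 noise frames."""
    rng = random.Random(seed)
    frames = []
    for t in range(27):
        frame = [rng.gauss(0.0, 0.1) for _ in range(frame_len)]
        if 12 <= t < 17:
            # A strong sinusoid centred on bin 5 stands in for speech.
            frame = [x + math.sin(2 * math.pi * 5 * i / frame_len)
                     for i, x in enumerate(frame)]
        frames.append(frame)
    return frames


if __name__ == "__main__":
    print(simple_vad(make_demo_frames()))
```

Because each decision uses only the current frame and state carried over from earlier frames, the sketch adds no look-ahead delay. The hangover counter is one simple way to keep the detector active for a few frames after the statistic drops, which is the role the abstract assigns to its framework for low-energy syllables.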

©2006 Télécom Paris/TSI
Publisher: Télécom Paris, 2006