WO2000017855A1 - Noise suppression for low bitrate speech coder - Google Patents

Noise suppression for low bitrate speech coder

Info

Publication number
WO2000017855A1
Authority
WO
WIPO (PCT)
Prior art keywords
noise
input signal
signal
band spectrum
perceptual
Prior art date
Application number
PCT/KR1999/000577
Other languages
French (fr)
Inventor
Steven H. Isabelle
Original Assignee
Samsung Electronics Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Samsung Electronics Co., Ltd.
Priority to KR1020007005629A (patent KR100330230B1)
Priority to CA002310491A (patent CA2310491A1)
Priority to IL13609099A (patent IL136090A0)
Priority to AU60079/99A (patent AU6007999A)
Priority to BR9913011-4A (patent BR9913011A)
Publication of WO2000017855A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02168 Noise filtering characterised by the method used for estimating noise the estimation exclusively taking place during speech pauses
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 Processing in the frequency domain

Abstract

Noise is suppressed in an input signal that carries a combination of noise and speech. The input signal is divided into signal blocks, which are processed to provide an estimate of a short-time perceptual band spectrum of the input signal. A determination is made at various points in time as to whether the input signal is carrying noise only or a combination of noise and speech. When the input signal is carrying noise only, the corresponding estimated short-time perceptual band spectrum of the input signal is used to update an estimate of a long term perceptual band spectrum of the noise. A noise suppression frequency response is then determined based on the estimate of the long term perceptual band spectrum of the noise and the short-time perceptual band spectrum of the input signal, and used to shape a current block of the input signal in accordance with the noise suppression frequency response.

Description

NOISE SUPPRESSION FOR LOW BITRATE SPEECH CODER
BACKGROUND OF THE INVENTION
The present invention provides a noise suppression technique suitable for
use as a front end to a low-bitrate speech coder. The inventive technique is
particularly suitable for use in cellular telephony applications.
The following prior art documents provide technological background for
the present invention:
"ENHANCED VARIABLE RATE CODEC, SPEECH SERVICE OPTION 3 FOR WIDEBAND
SPREAD SPECTRUM DIGITAL SYSTEMS", TIA/EIA/IS-127 Standard.
"THE STUDY OF SPEECH/PAUSE DETECTORS FOR SPEECH ENHANCEMENT METHODS",
P. Sovka and P. Pollak, Eurospeech 95, Madrid, 1995, pp. 1575-1578.
"SPEECH ENHANCEMENT USING A MINIMUM MEAN-SQUARE ERROR SHORT-TIME
SPECTRAL AMPLITUDE ESTIMATOR", Y. Ephraim and D. Malah, IEEE Transactions
on Acoustics, Speech, and Signal Processing, Vol. ASSP-32, No. 6, Dec. 1984,
pp. 1109-1121.
"SUPPRESSION OF ACOUSTIC NOISE USING SPECTRAL SUBTRACTION", S. Boll,
IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-27,
No. 2, April 1979, pp. 113-120.
"STATISTICAL-MODEL-BASED SPEECH ENHANCEMENT SYSTEMS",
Proceedings of the IEEE, Vol. 80, No. 10, October 1992, pp. 1526-1544.
A low complexity approach to noise suppression is spectral modification
(also known as spectral subtraction). Noise suppression algorithms using spectral
modification first divide the noisy speech signal into several frequency bands. A
gain, typically based on an estimated signal-to-noise ratio in that band, is
computed for each band. These gains are applied and a signal is reconstructed.
This type of scheme must estimate signal and noise characteristics from the
observed noisy speech signal. Several implementations of spectral modification
techniques can be found in US patents 5,687,285; 5,680,393; 5,668,927;
5,659,622; 5,651,071; 5,630,015; 5,625,684; 5,621,850; 5,617,505; 5,617,472;
5,602,962; 5,577,161; 5,555,287; 5,550,924; 5,544,250; 5,539,859; 5,533,133;
5,530,768; 5,479,560; 5,432,859; 5,406,635; 5,402,496; 5,388,182; 5,388,160;
5,353,376; 5,319,736; 5,278,780; 5,251,263; 5,168,526; 5,133,013; 5,081,681;
5,040,156; 5,012,519; 4,908,855; 4,897,878; 4,811,404; 4,747,143; 4,737,976;
4,630,305; 4,630,304; 4,628,529 and 4,468,804.
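As a rough illustration of the generic spectral modification scheme outlined above (and not of any one of the cited patents), the following sketch computes one gain per frequency band from an estimated signal-to-noise ratio. The Wiener-style gain rule, the gain floor, and the function name are assumptions chosen only for illustration.

```python
def band_gains(noisy_power, noise_power, floor=0.1):
    """Return one suppression gain per frequency band.

    noisy_power and noise_power are per-band power estimates.
    The rule snr / (1 + snr) and the floor value are illustrative
    assumptions, not taken from the text.
    """
    gains = []
    for p_noisy, p_noise in zip(noisy_power, noise_power):
        # The estimated speech power cannot be negative.
        p_speech = max(p_noisy - p_noise, 0.0)
        if p_noise > 0.0:
            snr = p_speech / p_noise
            gain = snr / (1.0 + snr)
        else:
            # No noise estimated in this band: pass it unchanged.
            gain = 1.0
        # The floor limits how strongly any band is attenuated.
        gains.append(max(gain, floor))
    return gains
```

Each gain would then scale the corresponding band of the noisy signal before reconstruction.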
Spectral modification has several desirable properties. First, it can be
made to be adaptive and hence can handle a changing noise environment. Second,
much of the computation can be performed in the discrete Fourier transform
(DFT) domain. Thus, fast algorithms (like the fast Fourier transform (FFT)) can
be used.
There are, however, several shortcomings in the current state of the art.
These include:
(i) objectionable distortion of the desired speech signal in moderate to
high noise levels (such distortions have several causes, some of which
are detailed below); and
(ii) excessive computational complexity.
It would be advantageous to provide a noise suppression technique that overcomes the disadvantages of the prior art. In particular, it would be advantageous to provide a noise suppression technique that accounts for time-domain discontinuities typical in block-based noise suppression techniques. It would be further advantageous to provide such a technique that reduces distortion due to frequency-domain discontinuities inherent in spectral subtraction. It would be still further advantageous to reduce the complexity of spectral shaping operations in providing noise suppression, and to increase the reliability of estimated noise statistics in a noise suppression technique.
The present invention provides a noise suppression technique having
these and other advantages.
SUMMARY OF THE INVENTION
In accordance with the present invention, a noise suppression technique is
provided in which a reduction is achieved in distortion due to time-domain
discontinuities that are typical in block based noise suppression techniques.
Distortion due to frequency-domain discontinuities inherent in spectral
subtraction is also reduced, as is the complexity of the spectral shaping operations
used in the noise suppression process. The invention also increases the reliability
of estimated noise statistics by using an improved voice activity detector.
A method in accordance with the invention suppresses noise in an input
signal that carries a combination of noise and speech. The input signal is
divided into signal blocks, which are processed to provide an
estimate of a short-time perceptual band spectrum of the input
signal. A determination is made at various points in time as to whether
the input signal is carrying noise only or a combination of noise and
speech. When the input signal is carrying noise only, the corresponding estimated
short-time perceptual band spectrum of the input signal is used to update an
estimate of a long term perceptual band spectrum of the noise. A noise suppression frequency response is then determined based on the estimate of the
long term perceptual band spectrum of the noise and the short-time perceptual
band spectrum of the input signal, and used to shape a current block of the input
signal in accordance with the noise suppression frequency response.
The method can comprise the further step of prefiltering the input signal
to emphasize high frequency components thereof. In an illustrated embodiment,
the processing of the input signal comprises the application of a discrete Fourier
transform to the signal blocks to provide a complex-valued frequency domain
representation of each block. The frequency domain representations of the signal
blocks are converted to magnitude only signals, which are averaged across
disjoint frequency bands to provide a long term perceptual-band spectrum
estimate. Time variations in the perceptual band spectrum are smoothed to
provide the short-time perceptual band spectrum estimate.
The noise suppression frequency response can be modeled using an all-
pole filter for use in shaping the current block of the input signal.
Apparatus is provided for suppressing noise in an input signal that carries a combination of noise and speech. A signal preprocessor, which can pre-filter the
input signal to emphasize high frequency components thereof, divides the input
signal into blocks. A fast Fourier transform processor then processes the blocks to
provide a complex-valued frequency domain spectrum of the input signal. An
accumulator is provided to accumulate the complex-valued frequency domain
spectrum into a long term perceptual-band spectrum comprising frequency bands
of unequal width. The long term perceptual-band spectrum is filtered to generate
an estimate of a short-time perceptual-band spectrum comprising a current
segment of said long term perceptual-band spectrum plus noise. A speech/pause
detector determines whether the input signal is, at a given point in time, noise
only or a combination of speech and noise. A noise spectrum estimator,
responsive to the speech/pause detection circuit when the input signal is noise
only, updates an estimate of the long term perceptual band spectrum of the noise
based on the short-time perceptual band spectrum. A spectral gain processor
responsive to the noise spectrum estimator determines a noise suppression
frequency response. A spectral shaping processor responsive to the spectral gain
processor then shapes a current block of the input signal to suppress noise therein.
The spectral shaping processor can comprise, for example, an all-pole filter.
Also disclosed is a method for suppressing noise in an input signal that
carries a combination of noise and audio information, such as speech. A noise
suppression frequency response is computed for the input signal in the frequency
domain. The computed noise suppression frequency response is then applied to
the input signal in the time domain to suppress noise in the input signal. This
method can comprise the further step of dividing the input signal into blocks prior
to computing the noise suppression frequency response thereof. In an illustrated
embodiment, the noise suppression frequency response is applied to the input
signal via an all-pole filter generated by determining an autocorrelation function
of the noise suppression frequency response.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 is a block diagram of a noise suppression algorithm in
accordance with the present invention;
Figure 2 is a diagram illustrating the block processing of an input signal
in accordance with the invention;
Figure 3 is a diagram illustrating the correlation of various noise spectrum bands (NS Band), which are of different widths, with discrete Fourier
transform (DFT) bins;
Figure 4 is a block diagram of one possible embodiment of a
speech/pause detector;
Figure 5 comprises waveforms providing an example of the energy
measure of a noisy speech utterance;
Figure 6 comprises waveforms providing an example of the spectral
transition measure of a noisy speech utterance;
Figure 7 comprises waveforms providing an example of the spectral
similarity measure of a noisy speech utterance;
Figure 8 is an illustration of a signal-state machine that models a noisy
speech signal;
Figure 9 illustrates a piecewise-constant frequency response; and
Figure 10 illustrates the smoothing of the piecewise-constant frequency
response of Figure 9.
DETAILED DESCRIPTION OF THE INVENTION
In accordance with the present invention, a noise suppression algorithm
computes a time varying filter response and applies it to the noisy speech. A
block diagram of the algorithm is shown in Figure 1, wherein the blocks labeled
"AR Parameter Computation" and "AR Spectral Shaping" are related to the
application of the time varying filter response, and "AR" designates "auto-
regressive". All other blocks in Figure 1 correspond to computing the time-
varying filter response from the noisy speech.
A noisy input signal is preprocessed in a signal preprocessor 10 using a
simple high-pass filter to slightly emphasize its high frequencies. The
preprocessor then divides the filtered signal into blocks that are passed to a fast
Fourier transform (FFT) module 12. The FFT module 12 applies a window to the
signal blocks and a discrete Fourier transform to the signal. The resulting
complex-valued frequency domain representation is processed to generate a
magnitude only signal. These magnitude-only signal values are averaged in
disjoint frequency bands yielding a "perceptual-band spectrum". The averaging
results in a reduction of the amount of data that must be processed.
Time-variations in the perceptual-band spectrum are smoothed in a signal
and noise spectrum estimation module 14 to generate an estimate of the short-time perceptual-band spectrum of the input signal. This estimate is passed on to a
speech/pause detector 16, a noise spectrum estimator 18, and a spectral gain
computation module 20.
The speech/pause detector 16 determines whether the current input signal
is simply noise, or a combination of speech and noise. It makes this determination
by measuring several properties of the input speech signal, using these
measurements to update a model of the input signal, and using the state of this
model to make the final speech/pause decision. The decision is then passed on to
the noise spectrum estimator.
When the speech/pause detector 16 determines that the input signal
consists of noise only, the noise spectrum estimator 18 uses the current
perceptual-band spectrum to update an estimate of the perceptual-band spectrum
of the noise. In addition, certain parameters of the noise spectrum estimator are
updated in this module and passed back to the speech/pause detector 16. The
perceptual band spectrum estimate of the noise is then passed to a spectral gain
computation module 20.
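The gating of the noise estimate by the speech/pause decision can be sketched as follows. This is a minimal illustration only: the linear-domain recursive average, the smoothing constant `gamma`, and the function name are assumptions, and the estimator actually used may operate on a different representation of the spectrum.

```python
def update_noise_spectrum(noise_est, short_time_spectrum, is_pause, gamma=0.9):
    """Update the per-band noise estimate only while no speech is detected.

    gamma is an assumed smoothing constant; values close to 1 average
    over many blocks, giving a long term estimate.
    """
    if not is_pause:
        # Speech present: leave the noise estimate untouched.
        return list(noise_est)
    return [gamma * n + (1.0 - gamma) * s
            for n, s in zip(noise_est, short_time_spectrum)]
```

Freezing the estimate during speech keeps speech energy from leaking into the noise statistics.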
Using the estimate of the perceptual-band spectra of the current signal
and the noise, the spectral gain computation module 20 determines a noise suppression frequency response. This noise suppression frequency response is
piecewise constant, as shown in Figure 9. Each piecewise constant segment
corresponds to one element of the critical band spectrum. This frequency
response is passed to the AR parameter computation module 22.
The AR parameter computation module models the noise suppression
frequency response with an all-pole filter. Because the noise suppression
frequency response is piecewise constant, its auto-correlation function can easily
be determined in closed form. The all-pole filter parameters can then be
efficiently computed from the auto-correlation function. The all pole modeling of
the piecewise constant spectrum has the effect of smoothing out discontinuities in
the noise suppression spectrum. It should be appreciated that other modeling
techniques now known or hereafter discovered may be substituted for the use of
an all-pole filter and all such equivalents are intended to be covered by the
invention claimed herein.
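The closed-form autocorrelation and the all-pole fit described above can be sketched as follows. The sketch assumes that the piecewise constant response is given as a list of (lower edge, upper edge, gain) tuples with edges in radians on [0, pi], that the autocorrelation is taken of the squared gain (the power response), and that the normal equations are solved with the Levinson-Durbin recursion. These choices and all names are illustrative assumptions.

```python
import math


def piecewise_autocorrelation(bands, max_lag):
    """Autocorrelation r[0..max_lag] of a piecewise constant power response.

    For a band with power P on [lo, hi], the contribution to r[m] is
    P * (sin(m*hi) - sin(m*lo)) / (pi*m), and P * (hi - lo) / pi for m = 0.
    """
    r = []
    for m in range(max_lag + 1):
        total = 0.0
        for lo, hi, gain in bands:
            power = gain * gain
            if m == 0:
                total += power * (hi - lo)
            else:
                total += power * (math.sin(m * hi) - math.sin(m * lo)) / m
        r.append(total / math.pi)
    return r


def levinson_durbin(r, order):
    """Return (a, err) for A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err
```

The resulting all-pole filter is sqrt(err) / A(z). Because a low-order A(z) cannot follow sharp steps, its frequency response is a smoothed version of the piecewise constant target.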
The AR spectral shaping module 24 uses the AR parameters to apply a
filter to the current block of the input signal. By implementing the spectral
shaping in the time domain, time discontinuities due to block processing are
reduced. Also, because the noise suppression frequency response can be modeled with a low-order all-pole filter, time domain shaping may result in a more
efficient implementation on certain processors.
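A minimal sketch of time-domain shaping with an all-pole filter is given below. It assumes a direct-form recursion with the convention A(z) = 1 + a[1] z^-1 + ..., and a `state` list holding the most recent output samples; the function name and argument layout are hypothetical.

```python
def all_pole_filter(block, a, gain, state):
    """Filter one block through gain / A(z), carrying state across blocks.

    state holds the last len(a) - 1 output samples, most recent first,
    and is updated in place so the next block continues seamlessly.
    """
    out = []
    for x in block:
        y = gain * x
        for j in range(1, len(a)):
            y -= a[j] * state[j - 1]
        # Shift the new output sample into the filter memory.
        state.insert(0, y)
        state.pop()
        out.append(y)
    return out
```

Because the filter memory is carried from one block to the next, the output stays continuous at block boundaries even when the coefficients are updated between blocks.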
In signal preprocessing module 10, the signal is first pre-emphasized with
a high-pass filter of the form $H(z) = 1 - 0.8z^{-1}$. This high-pass filter is chosen
to partially compensate for the spectral tilt inherent in speech. Signals thus
preprocessed generate more accurate noise suppression frequency responses.
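The first-order pre-emphasis filter with coefficient 0.8 corresponds to the difference equation y[n] = x[n] - 0.8 x[n-1]. A minimal sketch follows; the function name and the way the previous sample is carried between calls are assumptions.

```python
def pre_emphasize(samples, prev_sample=0.0, coeff=0.8):
    """Apply y[n] = x[n] - coeff * x[n-1] to a sequence of samples.

    Returns the filtered samples and the last input sample, which
    should be passed as prev_sample when filtering the next block.
    """
    out = []
    for x in samples:
        out.append(x - coeff * prev_sample)
        prev_sample = x
    return out, prev_sample
```

A constant input is attenuated to 0.2 of its value after the first sample, while rapid sample-to-sample changes pass with little attenuation.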
As illustrated in Figure 2, the input signal 30 is processed in blocks of
eighty samples (corresponding to 10 ms at a sampling rate of 8 kHz). This is
illustrated by analysis block 34, which, as shown, is eighty samples in length.
More particularly, in the illustrated example embodiment, the input signal is
divided into blocks of one hundred twenty-eight samples. Each block consists of
the last twenty-four samples from the previous block (reference numeral 32), the
eighty new samples of the analysis block 34, and twenty-four samples of zeros
(reference numeral 36). Each block is windowed with a Hamming window and
Fourier transformed.
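The block structure described above can be sketched as follows. The sketch assumes that the Hamming window spans the full 128-point block, which the text does not state explicitly, and the helper names are hypothetical.

```python
import math

OVERLAP = 24   # samples carried over from the previous block
BLOCK = 80     # new samples per analysis block (10 ms at 8 kHz)
FRAME = 128    # total length after zero-padding


def hamming(n_points):
    """Symmetric Hamming window of n_points samples."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_points - 1))
            for n in range(n_points)]


def build_frame(prev_tail, new_block):
    """Assemble and window one 128-point noise suppression frame.

    Returns the windowed frame and the tail to reuse for the next frame.
    """
    if len(prev_tail) != OVERLAP or len(new_block) != BLOCK:
        raise ValueError("expected 24 overlap samples and 80 new samples")
    padding = [0.0] * (FRAME - OVERLAP - BLOCK)
    frame = list(prev_tail) + list(new_block) + padding
    # Assumption: the window covers all 128 points of the frame.
    window = hamming(FRAME)
    windowed = [w * x for w, x in zip(window, frame)]
    return windowed, list(new_block[-OVERLAP:])
```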
The zero-padding implicit in the block structure deserves further
explanation. In particular, from a signal processing standpoint, zero-padding is unnecessary because the spectral shaping (described below) is not implemented
using a Discrete Fourier Transform. However, including the zero-padding eases
the integration of this algorithm into the existing EVRC voice codec implemented
by Solana Technology Development Corporation, the assignee of the present
invention. This block structure requires no change in the overall buffer
management strategy of the existing EVRC code.
Each noise suppression frame can be viewed as a 128-point sequence.
Denoting this sequence by g[n], the frequency-domain representation of a signal
block is defined as the discrete Fourier transform
$$G[k] = C \sum_{n=0}^{M-1} g[n]\, e^{-j 2\pi n k / M},$$
where C is a normalization constant.
The signal spectrum is then accumulated into bands of unequal width as
follows:
[Equation image imgf000016_0001: accumulation of the signal spectrum into bands, band k spanning DFT bins F_l[k] through F_h[k]]
where
$$F_l[k] = \{2, 4, 6, 8, 10, 12, 14, 17, 20, 23, 27, 31, 36, 42, 49, 56\}$$
$$F_h[k] = \{3, 5, 7, 9, 11, 13, 16, 19, 22, 26, 30, 35, 41, 48, 55, 63\}$$
This is referred to as the perceptual-band spectrum. The bands, generally
designated 50, are illustrated in Figure 3. As shown, the noise spectrum bands
(NS Band) are of different widths, and are correlated with discrete Fourier
transform (DFT) bins.
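The accumulation of DFT bins into the sixteen bands can be sketched as follows, using the lower and upper band-edge lists given above. The sketch assumes a normalization constant of 1 and averages the bin magnitudes within each band; since the defining equation is not legible here, the exact quantity accumulated (for example, magnitude versus squared magnitude) is an assumption.

```python
import cmath

F_LOW = [2, 4, 6, 8, 10, 12, 14, 17, 20, 23, 27, 31, 36, 42, 49, 56]
F_HIGH = [3, 5, 7, 9, 11, 13, 16, 19, 22, 26, 30, 35, 41, 48, 55, 63]


def dft(frame):
    """Direct discrete Fourier transform (normalization constant of 1)."""
    m = len(frame)
    return [sum(frame[n] * cmath.exp(-2j * cmath.pi * n * k / m)
                for n in range(m))
            for k in range(m)]


def perceptual_band_spectrum(spectrum):
    """Average the bin magnitudes inside each of the 16 unequal bands."""
    bands = []
    for lo, hi in zip(F_LOW, F_HIGH):
        mags = [abs(spectrum[j]) for j in range(lo, hi + 1)]
        bands.append(sum(mags) / len(mags))
    return bands
```

The direct transform is written out for clarity; a practical implementation would use an FFT. The band averaging reduces 128 complex values to 16 real values per block.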
The estimate of the perceptual band spectrum of the signal plus noise is
generated in module 14 (Figure 1) by filtering the perceptual-band spectra, e.g.,
with a single-pole recursive filter. The estimate of the power spectrum of the
signal plus noise is:
$$S_u[k] = \beta \cdot S_u[k] + (1-\beta) \cdot S[k],$$
where $S_u[k]$ on the right-hand side is the estimate from the previous block and
$S[k]$ is the perceptual-band spectrum of the current block.
Because the properties of speech are stationary only over relatively short time
periods, the filter parameter β is chosen to perform smoothing over only a few
(e.g., 2-3) noise suppression blocks. This smoothing is referred to as "short-time"
smoothing, and provides an estimate of a "short-time perceptual band spectrum".
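The short-time smoothing can be sketched as a single-pole recursive filter applied independently to each band. The default β of 0.6 is an assumption: as a rule of thumb such a filter averages over roughly 1/(1 - β) blocks, which gives about 2.5 blocks here.

```python
def smooth_short_time(prev_estimate, current_spectrum, beta=0.6):
    """One step of the single-pole recursive smoother, per band.

    beta is an assumed value; larger values smooth over more blocks.
    """
    return [beta * p + (1.0 - beta) * s
            for p, s in zip(prev_estimate, current_spectrum)]
```

Calling the function once per block, and feeding its result back as `prev_estimate`, yields the running short-time estimate.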
The noise suppression system requires an accurate estimate of the noise
statistics in order to function properly. This function is provided by the speech/pause detection module 16. In one possible embodiment, a single
microphone is provided that measures both the speech and the noise. Because the
noise suppression algorithm requires an estimate of noise statistics, a method for
distinguishing between noisy speech signals and noise-only signals is required.
This method must essentially detect pauses in noisy speech. This task is made
more difficult by several factors:
1. The pause detector must perform acceptably in low signal-to-noise
ratios (on the order of 0 to 5 dB).
2. The pause detector must be insensitive to slow variations in
background noise statistics.
3. The pause detector must accurately distinguish between noise-like
speech sounds (e.g. fricatives) and background noise.
A block diagram of one possible embodiment of the speech/pause detector 16 is
provided in Figure 4.
The pause detector models the noisy speech signal as it is being generated
by switching between a finite number of signal models. A finite-state machine (FSM) 64 governs transitions between the models. The speech/pause decision is a
function of the current state of the FSM along with measurements made on the
current signal and other appropriate state variables. Transitions between states are
functions of the current FSM state and measurements made on the current signal.
The measured quantities described below are used to determine binary
valued parameters that drive the signal-state state machine 64. In general these
binary valued parameters are determined by comparing the appropriate real-
valued measurements to an adaptive threshold. The signal measurements
provided by measurement module 60 quantify the following signal properties:
1. An energy measure determines whether the signal is of high or low
energy. This signal energy, denoted $E_i$, is defined as
$$E_i = \log \sum_{k=0}^{63} |G[k]|^2.$$
An example of the energy measure of a noisy speech utterance is shown in Figure 5,
where the amplitude of individual speech samples is indicated by curve 70 and
the energy measure of the corresponding NS blocks is indicated by curve 72.
2. A spectral transition measure determines whether the signal spectrum
is steady-state or transient over a short time window. This measure is computed
by determining an empirical mean and variance of each band of the perceptual
band spectrum. The sum of the variances of all bands of the perceptual band
spectrum is used as a measure of spectral transition. More specifically, the
transition measure, denoted T, is computed as follows:
The mean of each band of the perceptual spectrum is computed by the single-pole
recursive filter $\bar{S}_i[k] = \alpha \bar{S}_{i-1}[k] + (1 - \alpha) S_i[k]$. The variance of each band of the
perceptual spectrum is computed by the recursive filter
$\hat{S}_i[k] = \alpha \hat{S}_{i-1}[k] + (1 - \alpha)(S_i[k] - \bar{S}_i[k])^2$. The filter parameter $\alpha$ is chosen to
perform smoothing over a relatively long period of time, i.e. 10 to 12 noise
suppression blocks.
The total variance is computed as the sum of the variance of each band,
$\sigma_i^2 = \sum_{k=0}^{15} \hat{S}_i[k]$. Note that the variance of $\sigma_i^2$ itself will be smallest when the
perceptual band spectrum does not vary greatly from its long term mean. It
follows that a reasonable measure of spectral transition is the variance of $\sigma_i^2$, which is computed as follows:
$\tilde{\sigma}_i^2 = \omega_i \tilde{\sigma}_{i-1}^2 + (1 - \omega_i)\sigma_i^2$
Figure imgf000021_0001
The adaptive time constant $\omega_i$ is given by:
$\omega_i = 0.875$ if $\sigma_i^2 > \tilde{\sigma}_{i-1}^2$
$\omega_i = 0.25$ if $\sigma_i^2 \le \tilde{\sigma}_{i-1}^2$
By adapting the time constant, the spectral transition measure properly
tracks portions of the signal that are stationary. An example of the spectral
transition measure of a noisy speech utterance is shown in Figure 6, where the
amplitude of individual speech samples is indicated by curve 74 and the spectral
transition measure of the corresponding NS blocks is indicated by curve 75.
3. A spectral similarity measure, denoted SS, measures the degree to
which the current signal spectrum is similar to the estimated noise spectrum. In
order to define the spectral similarity measure, we assume that an estimate of the
logarithm of the perceptual band spectrum of the noise, denoted by $N_i[k]$, is
available (the definition of $N_i[k]$ is provided below in connection with the discussion on the noise spectrum estimator). The spectral similarity measure is
then defined in terms of the difference $S_i[k] - N_i[k]$ (the complete expression for $SS_i$ is contained in equation images imgf000022_0001 and imgf000022_0002). An example of the spectral similarity
measure of a noisy utterance is shown in Figure 7, where the amplitude of
individual speech samples is indicated by curve 76 and the spectral similarity measure of the
corresponding NS blocks is indicated by curve 78. Note that a low value of
the spectral similarity measure corresponds to highly similar spectra, while a
higher spectral similarity measure corresponds to dissimilar spectra.
4. An energy similarity measure determines whether the current signal
energy $E_i = \log \sum_{k=0}^{63} |G[k]|^2$ is similar to the estimated noise energy. This is
determined by comparing the signal energy to a threshold applied by threshold
application module 62. The actual threshold is computed by a threshold
computation processor 66, which can comprise a microprocessor.
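The energy and spectral transition measurements listed above can be sketched in Python as follows. This is a minimal illustration, not the implementation of measurement module 60: the names `block_energy`, `BandStatistics` and `smooth_total_variance`, the zero initial state, and the value `alpha = 0.9` (roughly ten blocks of memory) are assumptions. Selecting the time constant by comparing the total variance with the previous smoothed value is also an assumed reading of the text.

```python
import math


def block_energy(G):
    """Log energy of one block: E = log(sum over k of |G[k]|^2)."""
    return math.log(sum(abs(g) ** 2 for g in G))


class BandStatistics:
    """Recursive per-band mean and variance of the perceptual band spectrum."""

    def __init__(self, num_bands, alpha=0.9):
        self.alpha = alpha  # illustrative smoothing constant (assumption)
        self.mean = [0.0] * num_bands
        self.var = [0.0] * num_bands

    def update(self, S):
        """Fold in the band spectrum S of the current block; return the total variance."""
        a = self.alpha
        for k, s in enumerate(S):
            self.mean[k] = a * self.mean[k] + (1.0 - a) * s
            self.var[k] = a * self.var[k] + (1.0 - a) * (s - self.mean[k]) ** 2
        return sum(self.var)


def smooth_total_variance(total_var, prev_smoothed):
    """Smooth the total variance with the adaptive time constant (0.875 or 0.25)."""
    # Assumption: the comparison is made against the previous smoothed value.
    w = 0.875 if total_var > prev_smoothed else 0.25
    return w * prev_smoothed + (1.0 - w) * total_var
```

With these constants a rising total variance is tracked slowly (0.875) and a falling one quickly (0.25), which is consistent with the stated goal of tracking stationary portions of the signal.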
The binary parameters are defined by denoting the current estimate of the
signal spectrum by $S_i[k]$, the current estimate of the signal energy by $E_i$, the
current estimate of the log noise spectrum by $N_i[k]$, the current estimate of the noise energy by $\bar{N}_i$, and the variance of the noise energy estimate by $\hat{N}_i$.
The parameter high_low_energy indicates whether the signal has a high
energy content. High energy is defined relative to the estimated energy of the
background noise. It is computed by estimating the energy in the current signal
frame and applying a threshold. It is defined as
$$\text{high\_low\_energy} = \begin{cases} 1, & E_i > E_t \\ 0, & E_i \le E_t \end{cases}$$
where $E_i$ is defined by $E_i = \log \sum_{k=0}^{63} |G[k]|^2$ and $E_t$ is an adaptive threshold.
The parameter transition indicates when the signal spectrum is going
through a transition. It is measured by observing the deviation of the current
short-time spectrum from the average value of the spectrum.
Mathematically it is defined by
$$\text{transition} = \begin{cases} 1, & T_i \ge T_t \\ 0, & T_i < T_t \end{cases}$$
where $T_i$ is the spectral transition measure defined in the previous section
and $T_t$ is an adaptively computed threshold described in greater detail hereinafter.
The parameter spectral_similarity measures similarity between the
spectrum of the current signal and the estimated noise spectrum. It is measured by
computing the distance between the log spectrum of the current signal and the
estimated log spectrum of the noise.
$$\text{spectral\_similarity} = \begin{cases} 1, & SS_i < SS_t \\ 0, & SS_i \ge SS_t \end{cases}$$
where $SS_i$ is described above and $SS_t$ is a threshold (e.g., a constant) as
discussed below.
The parameter energy_similarity measures the similarity between the
energy in the current signal and the estimated noise energy.
$$\text{energy\_similarity} = \begin{cases} 1, & E_i < ES_t \\ 0, & E_i \ge ES_t \end{cases}$$
where $E_i$ is defined by $E_i = \log \sum_{k=0}^{63} |G[k]|^2$
and $ES_t$ is an adaptively computed threshold defined below.
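The four binary parameters reduce to threshold comparisons, which can be sketched as below. The function name `binary_parameters` and the dictionary layout are hypothetical. The handling of equality in the energy and transition tests is an assumption, because those comparison operators are not fully legible in the text.

```python
def binary_parameters(E, T, SS, E_t, T_t, SS_t, ES_t):
    """Map real-valued measurements to the binary parameters that drive the FSM.

    E, T, SS         -- energy, spectral transition and spectral similarity measures
    E_t, T_t, SS_t,
    ES_t             -- the corresponding thresholds
    """
    return {
        "high_low_energy": 1 if E > E_t else 0,
        # Assumption: a measure equal to the threshold counts as a transition.
        "transition": 1 if T >= T_t else 0,
        # A low spectral similarity measure means the spectra are alike.
        "spectral_similarity": 1 if SS < SS_t else 0,
        "energy_similarity": 1 if E < ES_t else 0,
    }
```

For example, `binary_parameters(5.0, 1.0, 3.0, 4.0, 2.0, 10.0, 6.0)` marks the block as high energy, not in transition, spectrally similar to the noise, and similar in energy to the noise.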
The variables described above are all computed by comparing a number
to a threshold. The first three thresholds reflect the properties of a dynamic signal
and will depend on the properties of the noise. These three thresholds are the sum
of an estimated mean and some multiple of the standard deviation. The threshold
for the spectral similarity measure does not depend on the specific properties of
the noise and can be set to a constant value.
The high/low energy threshold is computed by threshold computation
processor 66 (Figure 4) as $E_t = \bar{E}_{i-1} + 2\sqrt{\hat{E}_{i-1}}$, where $\hat{E}_i$ is the empirical variance
defined as $\hat{E}_i = \gamma \hat{E}_{i-1} + (1 - \gamma)(E_i - \bar{E}_{i-1})^2$, and $\bar{E}_i$ is the empirical mean
defined as $\bar{E}_i = \gamma \bar{E}_{i-1} + (1 - \gamma) E_i$.
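A threshold of the form "mean plus two standard deviations", with recursively tracked mean and variance, can be sketched as follows. The class name `EnergyThreshold`, the zero initial state and the value `gamma = 0.9` are illustrative assumptions. The use of the previous mean inside the variance update is likewise an assumed reading of the formula.

```python
import math


class EnergyThreshold:
    """Adaptive threshold: empirical mean plus two empirical standard deviations."""

    def __init__(self, gamma=0.9, mean=0.0, var=0.0):
        self.gamma = gamma  # illustrative filter constant (assumption)
        self.mean = mean
        self.var = var

    def threshold(self):
        """Threshold for the current block, from statistics of earlier blocks."""
        return self.mean + 2.0 * math.sqrt(self.var)

    def update(self, E):
        """Fold the current block energy E into the running statistics."""
        g = self.gamma
        # The variance is updated first so that it uses the previous mean.
        self.var = g * self.var + (1.0 - g) * (E - self.mean) ** 2
        self.mean = g * self.mean + (1.0 - g) * E
```

In use, `threshold()` would be called before `update(E)` for each block, so that the current block is compared against statistics that do not yet include it.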
The energy similarity threshold is computed as
$$ES_t[i] = \begin{cases} \bar{N}_i + 2\sqrt{\hat{N}_i}, & \bar{N}_i + 2\sqrt{\hat{N}_i} < 1.05\,ES_t[i-1] \\ 1.05\,ES_t[i-1], & \text{otherwise} \end{cases}$$
Note that the growth rate of the energy similarity threshold is limited by the factor
1.05 in the present example. This ensures that high noise energies do not have a
disproportionate influence on the value of the threshold.
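The growth-limited threshold can be illustrated with a short sketch. The function and parameter names are hypothetical, and the example assumes a positive previous threshold so that the factor 1.05 acts as an upper bound on growth.

```python
import math


def energy_similarity_threshold(noise_energy, noise_var, prev_threshold, growth=1.05):
    """Noise energy plus two standard deviations, limited to 5% growth per block."""
    candidate = noise_energy + 2.0 * math.sqrt(noise_var)
    limit = growth * prev_threshold  # assumes prev_threshold > 0
    return candidate if candidate < limit else limit
```

With a previous threshold of 20, a candidate of 14 is accepted unchanged, while a candidate of 34 is clipped to 21.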
The spectral transition threshold is computed as Tt = 2JV, . The spectral
similarity threshold is constant with value $SS_t = 10$.
The signal-state state machine 64 that models the noisy speech signal is
illustrated in greater detail in Figure 8. Its state transitions are governed by the
signal measurements described in the previous section. The signal states are
steady-state low energy, shown as element 80, transient, shown as element 82,
and steady-state high energy, shown as element 84. During steady-state low
energy, no spectral transition is occurring and the signal energy is below a
threshold. During transient, a spectral transition is occurring. During steady-state
high energy, no spectral transition is occurring and the signal energy is above a
threshold. The transitions between states are governed by the signal
measurements described above.
The state machine transitions are defined in Table 1.
TABLE 1
Figure imgf000027_0001
In this table, "X" means "any value". Note that a state transition is
assured for any measurement.
The speech/pause decision provided by detector 16 (Figure 1) depends on the current state of the signal-state state machine and on the signal measurements
described in connection with Figure 4. The speech/pause decision is governed by
the following pseudocode (pause: dec = 0; speech: dec = 1):
dec = 1;                          % default decision: speech
if spectral_similarity == 1       % spectrum resembles the noise estimate
    dec = 0;                      % pause
elseif current_state == 1         % state machine is in state 1
    if energy_similarity == 1     % energy resembles the noise energy
        dec = 0;                  % pause
    end
end
The noise spectrum is estimated by noise parameter estimation module 68
(Figure 4) during frames classified as pauses using the formula
$N_i[k] = \beta N_{i-1}[k] + (1 - \beta)\log(S_i[k])$, where $\beta$ is a constant between 0 and 1. The
current estimate of the noise energy, $\bar{N}_i$, and the variance of the noise energy
estimate, $\hat{N}_i$, are defined as follows:
Figure imgf000028_0001
$\hat{N}_i = \lambda \hat{N}_{i-1} + (1 - \lambda)(\bar{N}_i - \log(E_i))^2$,
where the filter constant $\lambda$ is chosen to average 10-20 noise suppression
blocks.
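The pause-gated update of the log noise spectrum can be sketched as below. The function name `update_noise_spectrum`, the `is_pause` flag and the default `beta = 0.9` are assumptions chosen for illustration. The band spectrum values are assumed to be strictly positive so that the logarithm is defined.

```python
import math


def update_noise_spectrum(N_prev, S, is_pause, beta=0.9):
    """Update the log noise spectrum estimate, but only for blocks classified as pauses.

    N_prev   -- previous log noise spectrum, one value per perceptual band
    S        -- short-time perceptual band spectrum of the current block (positive)
    is_pause -- True when the speech/pause detector reports a pause
    """
    if not is_pause:
        return list(N_prev)  # hold the estimate while speech is present
    return [beta * n + (1.0 - beta) * math.log(s) for n, s in zip(N_prev, S)]
```

Holding the estimate during speech prevents speech energy from being absorbed into the noise model.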
The spectral gains can be computed by a variety of methods well known
in the art. One method that is well-suited to the current implementation comprises
defining the signal to noise ratio as $SNR[k] = c\,(\log S_u[k] - N_i[k])$, where c is a
constant and $S_u[k]$ and $N_i[k]$ are as defined above. The noise dependent
component of the gain is defined as $\gamma_N = -10 \sum_k N_i[k]$. The instantaneous gain is
computed as
Figure imgf000029_0001
10 w+C:(S 7?(<:!"6))' :0 . Once the instantaneous gain has been
computed, it is smoothed using the single-pole smoothing filter
$G_s[k] = \beta G_s[k-1] + (1 - \beta) G_{ch}[k]$, where vector $G_s[k]$ is the smoothed channel
gain vector at time k.
Once a target frequency response has been computed, it must be applied
to the noisy speech. This corresponds to a (time-varying) filtering operation that
modifies the short-time spectrum of the noisy speech signal. The result is the noise-suppressed signal. Contrary to current practice, this spectral modification
need not be applied in the frequency domain. Indeed, a frequency domain
implementation may have the following disadvantages:
1. It may be unnecessarily complex.
2. It may result in lower quality noise suppressed speech.
A time domain implementation of the spectral shaping has the added
advantage that the impulse response of the shaping filter need not be linear phase.
Also, a time-domain implementation eliminates the possibility of artifacts due to
circular convolution.
The spectral shaping technique described herein consists of a method for
designing a low complexity filter that implements the noise suppression
frequency response along with the application of that filter. This filter is provided
by the AR spectral shaping module 24 (Figure 1) based on parameters provided
by AR parameter computation processor 22.
Because the desired frequency response is piecewise-constant with relatively few segments, as illustrated in Figure 9, its auto-correlation function
can be efficiently determined in closed form. Given the auto-correlation
coefficients, an all-pole filter that approximates the piecewise constant frequency
response can be determined. This approach has several advantages. First, spectral
discontinuities associated with the piecewise constant frequency response are
smoothed out. Second, the time discontinuities associated with FFT block
processing are eliminated. Third, because the shaping is applied in the time-
domain, an inverse DFT is not required. Given the low order of the all-pole filter,
this may provide a computational advantage in a fixed point implementation.
Such a frequency response can be expressed mathematically as
$H(\omega) = \sum_i G_s[i]\, I(\omega, \omega_{i-1}, \omega_i)$, where $G_s[i]$ is the smoothed channel gain, which
sets the amplitude of the ith piecewise-constant segment, and $I(\omega, \omega_{i-1}, \omega_i)$ is the
indicator function for the interval bounded by the frequencies $\omega_{i-1}$, $\omega_i$, i.e.,
$I(\omega, \omega_{i-1}, \omega_i)$ equals 1 when $\omega_{i-1} < \omega < \omega_i$, and 0 otherwise. The auto-correlation
function is the inverse Fourier transform of $H^2(\omega)$, i.e.,
$R_{HH}(n) = 2 \sum_i G_s^2[i] \frac{\sin(\gamma_i n)\cos(\beta_i n)}{\pi n}$
where $\gamma_i = (\omega_i - \omega_{i-1})/2$ and $\beta_i = (\omega_{i-1} + \omega_i)/2$. This can be easily
imple .
Figure imgf000032_0001
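The closed-form auto-correlation of a piecewise-constant response can be sketched as follows. This is an illustration under stated assumptions: the spectrum is taken to be symmetric about zero frequency, the power in each segment is the squared gain, band edges are given in radians per sample between 0 and pi, and the name `piecewise_autocorrelation` is hypothetical.

```python
import math


def piecewise_autocorrelation(gains, edges, num_lags):
    """Auto-correlation lags 0..num_lags-1 of a piecewise-constant response.

    gains -- amplitude of each segment (len(gains) == len(edges) - 1)
    edges -- increasing band edges in radians per sample, from 0 up to pi
    """
    r = []
    for n in range(num_lags):
        total = 0.0
        for i, g in enumerate(gains):
            lo, hi = edges[i], edges[i + 1]
            half_width = (hi - lo) / 2.0  # gamma_i
            center = (hi + lo) / 2.0      # beta_i
            if n == 0:
                # Limit of 2*sin(gamma*n)*cos(beta*n)/(pi*n) as n tends to 0.
                term = 2.0 * half_width / math.pi
            else:
                term = 2.0 * math.sin(half_width * n) * math.cos(center * n) / (math.pi * n)
            total += g * g * term
        r.append(total)
    return r
```

As a sanity check, a single unit-gain segment covering 0 to pi gives a lag-zero value of 1 and essentially zero at every other lag, as expected for a flat spectrum.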
Given the auto-correlation function set forth above, an all-pole model of
the spectrum can be determined by solving the normal equations. The required
matrix inversion can be computed efficiently using, e.g., the Levinson/Durbin
recursion.
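For reference, the textbook Levinson/Durbin recursion can be sketched as below. The sign convention, in which the all-pole denominator is A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p, and the function name `levinson_durbin` are choices made for this example and are not taken from the source.

```python
def levinson_durbin(r, order):
    """Solve the normal equations for an all-pole model.

    r     -- auto-correlation lags r[0]..r[order]
    order -- model order p
    Returns (a, err): a = [1, a1, ..., ap] and the final prediction error power.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        # Correlation between the current predictor residual and lag m.
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        k = -acc / err  # reflection coefficient for stage m
        new_a = a[:]
        for j in range(1, m):
            new_a[j] = a[j] + k * a[m - j]
        new_a[m] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err
```

The recursion needs on the order of p squared operations, compared with p cubed for a general matrix inversion, which is why it is attractive for a sixteenth-order model.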
An example of the effectiveness of all-pole modeling with an order
sixteen filter is shown in Figure 10. Note that the spectral discontinuities have
been smoothed out. Obviously, the model can be made more accurate by
increasing the all-pole filter order. However, a filter order of sixteen provides
good performance at reasonable computational cost.
The all-pole filter provided by the parameters computed by the AR
parameter computation processor 22 is applied to the current block of the noisy
input signal in the AR spectral shaping module 24, in order to provide the spectrally shaped output signal.
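Applying an all-pole filter to one block in the time domain can be sketched as follows. The function name `all_pole_filter`, the optional `gain` argument and the convention of carrying the filter state from one block to the next are assumptions made for illustration. The coefficient convention is A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p.

```python
def all_pole_filter(x, a, gain=1.0, state=None):
    """Filter block x with gain / A(z): y[n] = gain*x[n] - sum_j a[j]*y[n-j].

    x     -- input samples of the current block
    a     -- denominator coefficients [1, a1, ..., ap]
    state -- previous outputs [y[n-1], y[n-2], ...] carried over from the last block
    Returns (y, state) so that the next block can continue where this one ended.
    """
    p = len(a) - 1
    hist = list(state) if state is not None else [0.0] * p
    y = []
    for sample in x:
        out = gain * sample - sum(a[j] * hist[j - 1] for j in range(1, p + 1))
        y.append(out)
        if p:
            hist = [out] + hist[:-1]  # newest output first
    return y, hist
```

Passing the returned state into the call for the next block keeps the output continuous across block boundaries even when the coefficients change from block to block.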
It should now be appreciated that the present invention provides a method
and apparatus for noise suppression with various unique features. In particular, a
voice activity detector is provided which consists of a state-machine model for
the input signal. This state-machine is driven by a variety of measurements made
from the input signal. This structure yields a low complexity yet highly accurate
speech/pause decision. In addition, the noise suppression frequency response is
computed in the frequency-domain but applied in the time-domain. This has the
effect of eliminating time-domain discontinuities that would occur in "block-
based" methods that apply the noise suppression frequency response in the
frequency domain. Moreover, the noise suppression filter is designed using the
novel approach of determining an auto-correlation function of the noise
suppression frequency response. This auto-correlation sequence is then used to
generate an all pole filter. The all-pole filter may, in some cases, be less complex
to implement than a frequency domain method.
Although the invention has been described in connection with a particular
embodiment thereof, it should be appreciated that numerous modifications and
adaptations may be made thereto without departing from the scope of the invention as set forth in the claims.

Claims

WHAT IS CLAIMED IS:
1. A method for suppressing noise in an input signal that carries a
combination of noise and speech, comprising the steps of:
dividing said input signal into signal blocks;
processing said signal blocks to provide an estimate of a short-time
perceptual band spectrum of said input signal;
determining at various points in time whether said input signal is carrying
noise only or a combination of noise and speech, and when the input signal is
carrying noise only, using the corresponding estimated short-time perceptual band
spectrum of the input signal to update an estimate of a long term perceptual band
spectrum of the noise;
determining a noise suppression frequency response based on said estimate
of the long term perceptual band spectrum of the noise and the estimated short-
time perceptual band spectrum of the input signal; and
shaping a current block of the input signal in accordance with said noise
suppression frequency response.
2. A method in accordance with claim 1 comprising the further step of :
pre-filtering said input signal prior to said processing step to emphasize
high frequency components thereof.
3. A method in accordance with claim 2 wherein said processing step
comprises the steps of:
applying a discrete Fourier transform to the signal blocks to provide a
complex-valued frequency domain representation of each block;
converting the frequency domain representations of the signal blocks to
magnitude only signals;
averaging the magnitude only signals across disjoint frequency bands to
provide said long term perceptual-band spectrum estimate; and
smoothing time variations in the perceptual band spectrum to provide said
short-time perceptual band spectrum estimate.
4. A method in accordance with claim 3 wherein said noise suppression
frequency response is modeled using an all-pole filter during said shaping step.
5. A method in accordance with claim 1 wherein said noise suppression
frequency response is modeled using an all-pole filter during said shaping step.
6. A method in accordance with claim 1 wherein said processing step
comprises the steps of:
applying a discrete Fourier transform to the signal blocks to provide a
complex-valued frequency domain representation of each block;
converting the frequency domain representations of the signal blocks to
magnitude only signals;
averaging the magnitude only signals across disjoint frequency bands to
provide said long term perceptual-band spectrum estimate; and
smoothing time variations in the perceptual band spectrum to provide said
short-time perceptual band spectrum estimate.
7. Apparatus for suppressing noise in an input signal that carries a
combination of noise and speech, comprising:
a signal preprocessor for dividing said input signal into blocks;
a fast Fourier transform processor for processing said blocks to provide a complex-valued frequency domain spectrum of said input signal;
an accumulator for accumulating said complex-valued frequency domain
spectrum into a long term perceptual-band spectrum comprising frequency bands
of unequal width;
a filter for filtering the long term perceptual-band spectrum to generate an
estimate of a short-time perceptual-band spectrum comprising a current segment
of said long term perceptual-band spectrum plus noise;
a speech/pause detector for determining whether said input signal is
currently noise only or a combination of speech and noise;
a noise spectrum estimator responsive to said speech/pause detection
circuit when the input signal is noise only for updating an estimate of the long
term perceptual band spectrum of the noise based on the short-time perceptual
band spectrum of the input signal;
a spectral gain processor responsive to said noise spectrum estimator for
determining a noise suppression frequency response; and
a spectral shaping processor responsive to said spectral gain processor for
shaping a current block of the input signal to suppress noise therein.
8. Apparatus in accordance with claim 7 wherein said spectral shaping
processor comprises an all-pole filter.
9. Apparatus in accordance with claim 8 wherein said signal preprocessor
pre-filters said input signal to emphasize high frequency components thereof.
10. Apparatus in accordance with claim 7 wherein said signal preprocessor
pre-filters said input signal to emphasize high frequency components thereof.
11. A method for suppressing noise in an input signal that carries a
combination of noise and audio information, comprising the steps of:
computing a noise suppression frequency response for said input signal in
the frequency domain; and
applying said noise suppression frequency response to said input signal in
the time domain to suppress noise in the input signal.
12. A method in accordance with claim 11 comprising the further step of
dividing said input signal into blocks prior to computing the noise suppression frequency response thereof.
13. A method in accordance with claim 12 wherein said noise
suppression frequency response is applied to said input signal via an all-pole filter
generated by determining an autocorrelation function of the noise suppression
frequency response.
14. A method in accordance with claim 11 wherein said noise suppression
frequency response is applied to said input signal via an all-pole filter generated
by determining an autocorrelation function of the noise suppression frequency
response.
PCT/KR1999/000577 1998-09-23 1999-09-22 Noise suppression for low bitrate speech coder WO2000017855A1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
KR1020007005629A KR100330230B1 (en) 1998-09-23 1999-09-22 Noise suppression for low bitrate speech coder
CA002310491A CA2310491A1 (en) 1998-09-23 1999-09-22 Noise suppression for low bitrate speech coder
IL13609099A IL136090A0 (en) 1998-09-23 1999-09-22 Noise supression for low bitrate speech coder
AU60079/99A AU6007999A (en) 1998-09-23 1999-09-22 Noise suppression for low bitrate speech coder
BR9913011-4A BR9913011A (en) 1998-09-23 1999-09-22 Process and apparatus for suppressing noise in an input signal that carries a combination of noise and voice

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US09/159,358 1998-09-23
US09/159,358 US6122610A (en) 1998-09-23 1998-09-23 Noise suppression for low bitrate speech coder

Publications (1)

Publication Number Publication Date
WO2000017855A1 true WO2000017855A1 (en) 2000-03-30

Family

ID=22572262

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/US1999/021033 WO2000017859A1 (en) 1998-09-23 1999-09-15 Noise suppression for low bitrate speech coder
PCT/KR1999/000577 WO2000017855A1 (en) 1998-09-23 1999-09-22 Noise suppression for low bitrate speech coder

Family Applications Before (1)

Application Number Title Priority Date Filing Date
PCT/US1999/021033 WO2000017859A1 (en) 1998-09-23 1999-09-15 Noise suppression for low bitrate speech coder

Country Status (10)

Country Link
US (1) US6122610A (en)
EP (1) EP1116224A4 (en)
JP (1) JP2003517624A (en)
KR (2) KR20010075343A (en)
CN (2) CN1326584A (en)
AU (2) AU6037899A (en)
BR (1) BR9913011A (en)
CA (2) CA2344695A1 (en)
IL (1) IL136090A0 (en)
WO (2) WO2000017859A1 (en)

Families Citing this family (95)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6415253B1 (en) * 1998-02-20 2002-07-02 Meta-C Corporation Method and apparatus for enhancing noise-corrupted speech
US6351731B1 (en) 1998-08-21 2002-02-26 Polycom, Inc. Adaptive filter featuring spectral gain smoothing and variable noise multiplier for noise reduction, and method therefor
US6453285B1 (en) * 1998-08-21 2002-09-17 Polycom, Inc. Speech activity detector for use in noise reduction system, and methods therefor
KR100281181B1 (en) * 1998-10-16 2001-02-01 윤종용 Codec Noise Reduction of Code Division Multiple Access Systems in Weak Electric Fields
US7177805B1 (en) * 1999-02-01 2007-02-13 Texas Instruments Incorporated Simplified noise suppression circuit
US6397177B1 (en) * 1999-03-10 2002-05-28 Samsung Electronics, Co., Ltd. Speech-encoding rate decision apparatus and method in a variable rate
US6507623B1 (en) * 1999-04-12 2003-01-14 Telefonaktiebolaget Lm Ericsson (Publ) Signal noise reduction by time-domain spectral subtraction
US6351729B1 (en) * 1999-07-12 2002-02-26 Lucent Technologies Inc. Multiple-window method for obtaining improved spectrograms of signals
US6980950B1 (en) * 1999-10-22 2005-12-27 Texas Instruments Incorporated Automatic utterance detector with high noise immunity
JP3878482B2 (en) * 1999-11-24 2007-02-07 富士通株式会社 Voice detection apparatus and voice detection method
US6473733B1 (en) * 1999-12-01 2002-10-29 Research In Motion Limited Signal enhancement for voice coding
JP2001166782A (en) * 1999-12-07 2001-06-22 Nec Corp Method and device for generating alarm signal
US6317456B1 (en) * 2000-01-10 2001-11-13 The Lucent Technologies Inc. Methods of estimating signal-to-noise ratios
US9609278B2 (en) 2000-04-07 2017-03-28 Koplar Interactive Systems International, Llc Method and system for auxiliary data detection and delivery
DE10017646A1 (en) * 2000-04-08 2001-10-11 Alcatel Sa Noise suppression in the time domain
US6463408B1 (en) * 2000-11-22 2002-10-08 Ericsson, Inc. Systems and methods for improving power spectral estimation of speech signals
US7617099B2 (en) * 2001-02-12 2009-11-10 FortMedia Inc. Noise suppression by two-channel tandem spectrum modification for speech signal in an automobile
EP1244094A1 (en) * 2001-03-20 2002-09-25 Swissqual AG Method and apparatus for determining a quality measure for an audio signal
KR20020082643A (en) * 2001-04-25 2002-10-31 주식회사 호서텔넷 synchronous detector by using fast fonrier transform(FFT) and inverse fast fourier transform (IFFT)
WO2003001173A1 (en) * 2001-06-22 2003-01-03 Rti Tech Pte Ltd A noise-stripping device
US6952482B2 (en) * 2001-10-02 2005-10-04 Siemens Corporation Research, Inc. Method and apparatus for noise filtering
KR100434723B1 (en) * 2001-12-24 2004-06-07 주식회사 케이티 Sporadic noise cancellation apparatus and method utilizing a speech characteristics
US8718687B2 (en) * 2002-03-26 2014-05-06 Zoove Corp. System and method for mediating service invocation from a communication device
US7949522B2 (en) 2003-02-21 2011-05-24 Qnx Software Systems Co. System for suppressing rain noise
US7885420B2 (en) * 2003-02-21 2011-02-08 Qnx Software Systems Co. Wind noise suppression system
US8271279B2 (en) 2003-02-21 2012-09-18 Qnx Software Systems Limited Signature noise removal
US8326621B2 (en) * 2003-02-21 2012-12-04 Qnx Software Systems Limited Repetitive transient noise removal
US7593851B2 (en) * 2003-03-21 2009-09-22 Intel Corporation Precision piecewise polynomial approximation for Ephraim-Malah filter
US7330511B2 (en) 2003-08-18 2008-02-12 Koplar Interactive Systems International, L.L.C. Method and system for embedding device positional data in video signals
US7224810B2 (en) * 2003-09-12 2007-05-29 Spatializer Audio Laboratories, Inc. Noise reduction system
US9055239B2 (en) 2003-10-08 2015-06-09 Verance Corporation Signal continuity assessment using embedded watermarks
US7454332B2 (en) * 2004-06-15 2008-11-18 Microsoft Corporation Gain constrained noise suppression
KR100657912B1 (en) * 2004-11-18 2006-12-14 삼성전자주식회사 Noise reduction method and apparatus
US20070116300A1 (en) * 2004-12-22 2007-05-24 Broadcom Corporation Channel decoding for wireless telephones with multiple microphones and multiple description transmission
US20060147063A1 (en) * 2004-12-22 2006-07-06 Broadcom Corporation Echo cancellation in telephones with multiple microphones
US7983720B2 (en) * 2004-12-22 2011-07-19 Broadcom Corporation Wireless telephone with adaptive microphone array
US20060133621A1 (en) * 2004-12-22 2006-06-22 Broadcom Corporation Wireless telephone having multiple microphones
US8509703B2 (en) * 2004-12-22 2013-08-13 Broadcom Corporation Wireless telephone with multiple microphones and multiple description transmission
KR100738341B1 (en) * 2005-12-08 2007-07-12 한국전자통신연구원 Apparatus and method for voice recognition using vocal band signal
KR100784456B1 (en) * 2005-12-08 2007-12-11 한국전자통신연구원 Voice Enhancement System using GMM
US8345890B2 (en) 2006-01-05 2013-01-01 Audience, Inc. System and method for utilizing inter-microphone level differences for speech enhancement
US8194880B2 (en) 2006-01-30 2012-06-05 Audience, Inc. System and method for utilizing omni-directional microphones for speech enhancement
US8744844B2 (en) * 2007-07-06 2014-06-03 Audience, Inc. System and method for adaptive intelligent noise suppression
US8204252B1 (en) 2006-10-10 2012-06-19 Audience, Inc. System and method for providing close microphone adaptive array processing
US9185487B2 (en) * 2006-01-30 2015-11-10 Audience, Inc. System and method for providing noise suppression utilizing null processing noise subtraction
US8150065B2 (en) 2006-05-25 2012-04-03 Audience, Inc. System and method for processing an audio signal
US8204253B1 (en) 2008-06-30 2012-06-19 Audience, Inc. Self calibration of audio device
US8849231B1 (en) 2007-08-08 2014-09-30 Audience, Inc. System and method for adaptive power control
US8934641B2 (en) 2006-05-25 2015-01-13 Audience, Inc. Systems and methods for reconstructing decomposed audio signals
US8949120B1 (en) 2006-05-25 2015-02-03 Audience, Inc. Adaptive noise cancelation
US8259926B1 (en) 2007-02-23 2012-09-04 Audience, Inc. System and method for 2-channel and 3-channel acoustic echo cancellation
US8189766B1 (en) 2007-07-26 2012-05-29 Audience, Inc. System and method for blind subband acoustic echo cancellation postfiltering
US8428661B2 (en) * 2007-10-30 2013-04-23 Broadcom Corporation Speech intelligibility in telephones with multiple microphones
US20090111584A1 (en) 2007-10-31 2009-04-30 Koplar Interactive Systems International, L.L.C. Method and system for encoded information processing
US8296136B2 (en) * 2007-11-15 2012-10-23 Qnx Software Systems Limited Dynamic controller for improving speech intelligibility
US8143620B1 (en) 2007-12-21 2012-03-27 Audience, Inc. System and method for adaptive classification of audio sources
US8180064B1 (en) 2007-12-21 2012-05-15 Audience, Inc. System and method for providing voice equalization
US8194882B2 (en) 2008-02-29 2012-06-05 Audience, Inc. System and method for providing single microphone noise suppression fallback
US8355511B2 (en) 2008-03-18 2013-01-15 Audience, Inc. System and method for envelope-based acoustic echo cancellation
US9142221B2 (en) * 2008-04-07 2015-09-22 Cambridge Silicon Radio Limited Noise reduction
US8521530B1 (en) 2008-06-30 2013-08-27 Audience, Inc. System and method for enhancing a monaural audio signal
US8774423B1 (en) 2008-06-30 2014-07-08 Audience, Inc. System and method for controlling adaptivity of signal modification using a phantom coefficient
CN101770776B (en) 2008-12-29 2011-06-08 华为技术有限公司 Coding method and device, decoding method and device for instantaneous signal and processing system
US8582781B2 (en) 2009-01-20 2013-11-12 Koplar Interactive Systems International, L.L.C. Echo modulation methods and systems
US8715083B2 (en) 2009-06-18 2014-05-06 Koplar Interactive Systems International, L.L.C. Methods and systems for processing gaming data
USRE48462E1 (en) * 2009-07-29 2021-03-09 Northwestern University Systems, methods, and apparatus for equalization preference learning
CN102044241B (en) * 2009-10-15 2012-04-04 华为技术有限公司 Method and device for tracking background noise in communication system
US20110125497A1 (en) * 2009-11-20 2011-05-26 Takahiro Unno Method and System for Voice Activity Detection
US9008329B1 (en) 2010-01-26 2015-04-14 Audience, Inc. Noise reduction using multi-feature cluster tracker
US9558755B1 (en) 2010-05-20 2017-01-31 Knowles Electronics, Llc Noise suppression assisted automatic speech recognition
US8745403B2 (en) 2011-11-23 2014-06-03 Verance Corporation Enhanced content management based on watermark extraction records
US8712076B2 (en) 2012-02-08 2014-04-29 Dolby Laboratories Licensing Corporation Post-processing including median filtering of noise suppression gains
US9173025B2 (en) 2012-02-08 2015-10-27 Dolby Laboratories Licensing Corporation Combined suppression of noise, echo, and out-of-location signals
US8726304B2 (en) 2012-09-13 2014-05-13 Verance Corporation Time varying evaluation of multimedia content
US9640194B1 (en) 2012-10-04 2017-05-02 Knowles Electronics, Llc Noise suppression for speech processing based on machine-learning mask estimation
JP6059003B2 (en) * 2012-12-26 2017-01-11 パナソニック株式会社 Distortion compensation apparatus and distortion compensation method
US9262794B2 (en) 2013-03-14 2016-02-16 Verance Corporation Transactional video marking system
US9485089B2 (en) 2013-06-20 2016-11-01 Verance Corporation Stego key management
US9536540B2 (en) 2013-07-19 2017-01-03 Knowles Electronics, Llc Speech signal separation and synthesis based on auditory scene analysis and speech modeling
US10504200B2 (en) 2014-03-13 2019-12-10 Verance Corporation Metadata acquisition using embedded watermarks
US9596521B2 (en) 2014-03-13 2017-03-14 Verance Corporation Interactive content acquisition using embedded codes
JP6134078B1 (en) * 2014-03-17 2017-05-24 コーニンクレッカ フィリップス エヌ ヴェKoninklijke Philips N.V. Noise suppression
WO2016028934A1 (en) 2014-08-20 2016-02-25 Verance Corporation Content management based on dither-like watermark embedding
US9799330B2 (en) 2014-08-28 2017-10-24 Knowles Electronics, Llc Multi-sourced noise suppression
US9942602B2 (en) 2014-11-25 2018-04-10 Verance Corporation Watermark detection and metadata delivery associated with a primary content
US9769543B2 (en) 2014-11-25 2017-09-19 Verance Corporation Enhanced metadata and content delivery using watermarks
WO2016100916A1 (en) 2014-12-18 2016-06-23 Verance Corporation Service signaling recovery for multimedia content using embedded watermarks
US10257567B2 (en) 2015-04-30 2019-04-09 Verance Corporation Watermark based content recognition improvements
US10477285B2 (en) 2015-07-20 2019-11-12 Verance Corporation Watermark-based data recovery for content with multiple alternative components
US20190132652A1 (en) 2016-04-18 2019-05-02 Verance Corporation System and method for signaling security and database population
US11297398B2 (en) 2017-06-21 2022-04-05 Verance Corporation Watermark-based metadata acquisition and processing
US11468149B2 (en) 2018-04-17 2022-10-11 Verance Corporation Device authentication in collaborative content screening
CN112562701B (en) * 2020-11-16 2023-03-28 华南理工大学 Heart sound signal double-channel self-adaptive noise reduction algorithm, device, medium and equipment
US11722741B2 (en) 2021-02-08 2023-08-08 Verance Corporation System and method for tracking content timeline in the presence of playback rate changes
CN115173971B (en) * 2022-07-08 2023-10-03 电信科学技术第五研究所有限公司 Broadband signal real-time detection method based on frequency spectrum data

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0424016A2 (en) * 1989-10-18 1991-04-24 AT&T Corp. Perceptual coding of audio signals
US5535300A (en) * 1988-12-30 1996-07-09 At&T Corp. Perceptual coding of audio signals using entropy coding and/or multiple power spectra
US5682463A (en) * 1995-02-06 1997-10-28 Lucent Technologies Inc. Perceptual audio compression based on loudness uncertainty

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4628529A (en) * 1985-07-01 1986-12-09 Motorola, Inc. Noise suppression system
US4630304A (en) * 1985-07-01 1986-12-16 Motorola, Inc. Automatic background noise estimator for a noise suppression system
US4630305A (en) * 1985-07-01 1986-12-16 Motorola, Inc. Automatic gain selector for a noise suppression system
US4658426A (en) * 1985-10-10 1987-04-14 Harold Antin Adaptive noise suppressor
US4811404A (en) * 1987-10-01 1989-03-07 Motorola, Inc. Noise suppression system
US5450522A (en) * 1991-08-19 1995-09-12 U S West Advanced Technologies, Inc. Auditory model for parametrization of speech
FI92535C (en) * 1992-02-14 1994-11-25 Nokia Mobile Phones Ltd Noise reduction system for speech signals
US5432859A (en) * 1993-02-23 1995-07-11 Novatel Communications Ltd. Noise-reduction system
WO1995002288A1 (en) * 1993-07-07 1995-01-19 Picturetel Corporation Reduction of background noise for speech enhancement
IT1272653B (en) * 1993-09-20 1997-06-26 Alcatel Italia NOISE REDUCTION METHOD, IN PARTICULAR FOR AUTOMATIC SPEECH RECOGNITION, AND FILTER SUITABLE TO IMPLEMENT THE SAME
JPH08506434A (en) * 1993-11-30 1996-07-09 エイ・ティ・アンド・ティ・コーポレーション Transmission noise reduction in communication systems
JP3484757B2 (en) * 1994-05-13 2004-01-06 ソニー株式会社 Noise reduction method and noise section detection method for voice signal
US5544250A (en) * 1994-07-18 1996-08-06 Motorola Noise suppression system and method therefor
FR2726392B1 (en) * 1994-10-28 1997-01-10 Alcatel Mobile Comm France METHOD AND APPARATUS FOR SUPPRESSING NOISE IN A SPEAKING SIGNAL, AND SYSTEM WITH CORRESPONDING ECHO CANCELLATION
SE505156C2 (en) * 1995-01-30 1997-07-07 Ericsson Telefon Ab L M Procedure for noise suppression by spectral subtraction
US5659622A (en) * 1995-11-13 1997-08-19 Motorola, Inc. Method and apparatus for suppressing noise in a communication system

Also Published As

Publication number Publication date
CN1326584A (en) 2001-12-12
JP2003517624A (en) 2003-05-27
US6122610A (en) 2000-09-19
KR100330230B1 (en) 2002-05-09
IL136090A0 (en) 2001-05-20
EP1116224A4 (en) 2003-06-25
CN1286788A (en) 2001-03-07
AU6007999A (en) 2000-04-10
BR9913011A (en) 2001-03-27
KR20010032390A (en) 2001-04-16
CA2310491A1 (en) 2000-03-30
KR20010075343A (en) 2001-08-09
AU6037899A (en) 2000-04-10
CA2344695A1 (en) 2000-03-30
WO2000017859A8 (en) 2000-07-20
WO2000017859A1 (en) 2000-03-30
EP1116224A1 (en) 2001-07-18

Similar Documents

Publication Publication Date Title
US6122610A (en) Noise suppression for low bitrate speech coder
US6415253B1 (en) Method and apparatus for enhancing noise-corrupted speech
RU2329550C2 (en) Method and device for enhancement of voice signal in presence of background noise
US6529868B1 (en) Communication system noise cancellation power signal calculation techniques
US6523003B1 (en) Spectrally interdependent gain adjustment techniques
US6766292B1 (en) Relative noise ratio weighting techniques for adaptive noise cancellation
EP1157377B1 (en) Speech enhancement with gain limitations based on speech activity
US6289309B1 (en) Noise spectrum tracking for speech enhancement
Arslan et al. New methods for adaptive noise suppression
Martin et al. New speech enhancement techniques for low bit rate speech coding
US6671667B1 (en) Speech presence measurement detection techniques
EP1386313B1 (en) Speech enhancement device
Jafer et al. Wavelet-based perceptual speech enhancement using adaptive threshold estimation.
Lin et al. Speech enhancement based on a perceptual modification of Wiener filtering
Krini et al. Model-based speech enhancement for automotive applications
Jax et al. A noise suppression system for the AMR speech codec

Legal Events

Date Code Title Description
WWE WIPO information: entry into national phase

Ref document number: 136090

Country of ref document: IL

Ref document number: 99801661.6

Country of ref document: CN

AK Designated states

Kind code of ref document: A1

Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

ENP Entry into the national phase

Ref document number: 2310491

Country of ref document: CA

Kind code of ref document: A

WWE WIPO information: entry into national phase

Ref document number: 60079/99

Country of ref document: AU

Ref document number: 1020007005629

Country of ref document: KR

121 EP: the EPO has been informed by WIPO that EP was designated in this application
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

WWP WIPO information: published in national office

Ref document number: 1020007005629

Country of ref document: KR

122 EP: PCT application non-entry in European phase
WWG WIPO information: grant in national office

Ref document number: 1020007005629

Country of ref document: KR