WO2007130766A2 - Narrow band noise reduction for speech enhancement - Google Patents

Info

Publication number
WO2007130766A2
Authority
WO
WIPO (PCT)
Prior art keywords
signal
narrow band
console
noise
instructions
Prior art date
Application number
PCT/US2007/065701
Other languages
French (fr)
Other versions
WO2007130766A3 (en
Inventor
Xiadong Mao
Original Assignee
Sony Computer Entertainment Inc.
Priority claimed from US 11/381,724 (US8073157B2)
Priority claimed from US 11/418,988 (US8160269B2)
Priority claimed from US 11/418,989 (US8139793B2)
Priority claimed from US 11/381,729 (US7809145B2)
Priority claimed from US 11/381,727 (US7697700B2)
Priority claimed from US 11/381,728 (US7545926B2)
Priority claimed from PCT/US2006/017483 (WO2006121896A2)
Priority claimed from US 11/381,725 (US7783061B2)
Priority claimed from US 11/381,721 (US8947347B2)
Priority claimed from US 11/429,047 (US8233642B2)
Application filed by Sony Computer Entertainment Inc.
Priority to JP2009509909A (JP4866958B2)
Priority to EP07759884A (EP2012725A4)
Publication of WO2007130766A2
Publication of WO2007130766A3

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M9/00Arrangements for interconnection not involving centralised switching
    • H04M9/08Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
    • H04M9/082Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic using echo cancellers
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0216Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02163Only one microphone
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band


Abstract

Reduction of noise in a device having a console with one or more microphones and a source of narrow band distributed noise located on the console is disclosed. A microphone signal containing a broad band distributed desired sound and narrow band distributed noise is divided amongst a plurality of frequency bins. For each frequency bin, it is determined whether a portion of the signal within the frequency bin belongs to a narrow band distribution characteristic of the source of narrow band noise located on the console. Any frequency bins containing portions of the signal belonging to the narrow band distribution are filtered to reduce the narrow band noise.

Description

NOISE REMOVAL FOR ELECTRONIC DEVICE WITH FAR FIELD MICROPHONE ON CONSOLE
CLAIM OF PRIORITY
This application claims the benefit of commonly-assigned, co-pending application number 11/381,727, to Xiao Dong Mao, entitled "NOISE REMOVAL FOR ELECTRONIC DEVICE WITH FAR FIELD MICROPHONE ON CONSOLE", (Attorney Docket SCEA05073US00), filed on May 4, 2006, the entire disclosures of which are incorporated herein by reference. This application also claims the benefit of commonly-assigned, co-pending application number 11/381,729, to Xiao Dong Mao, entitled "ULTRA SMALL MICROPHONE ARRAY", (Attorney Docket SCEA05062US00), filed on May 4, 2006, the entire disclosures of which are incorporated herein by reference. This application also claims the benefit of commonly-assigned, co-pending application number 11/381,728, to Xiao Dong Mao, entitled "ECHO AND NOISE CANCELLATION", (Attorney Docket SCEA05064US00), filed on May 4, 2006, the entire disclosures of which are incorporated herein by reference. This application also claims the benefit of commonly-assigned, co-pending application number 11/381,725, to Xiao Dong Mao, entitled "METHODS AND APPARATUS FOR TARGETED SOUND DETECTION", (Attorney Docket SCEA05072US00), filed on May 4, 2006, the entire disclosures of which are incorporated herein by reference. This application also claims the benefit of commonly-assigned, co-pending application number 11/381,724, to Xiao Dong Mao, entitled "METHODS AND APPARATUS FOR TARGETED SOUND DETECTION AND CHARACTERIZATION", (Attorney Docket SCEA05079US00), filed on May 4, 2006, the entire disclosures of which are incorporated herein by reference. This application also claims the benefit of commonly-assigned, co-pending application number 11/381,721, to Xiao Dong Mao, entitled "SELECTIVE SOUND SOURCE LISTENING IN CONJUNCTION WITH COMPUTER INTERACTIVE PROCESSING", (Attorney Docket SCEA04005 JUMBOUS), filed on May 4, 2006, the entire disclosures of which are incorporated herein by reference. This application also claims the benefit of commonly-assigned, co-pending International Patent Application number PCT/US06/17483, to Xiao Dong Mao, entitled "SELECTIVE SOUND SOURCE LISTENING IN CONJUNCTION WITH COMPUTER INTERACTIVE PROCESSING", (Attorney Docket SCEA04005 JUMBOPCT), filed on May 4, 2006, the entire disclosures of which are incorporated herein by reference. This application also claims the benefit of commonly-assigned, co-pending application number 11/418,988, to Xiao Dong Mao, entitled "METHODS AND APPARATUSES FOR ADJUSTING A LISTENING AREA FOR CAPTURING SOUNDS", (Attorney Docket SCEA-00300), filed on May 4, 2006, the entire disclosures of which are incorporated herein by reference. This application also claims the benefit of commonly-assigned, co-pending application number 11/418,989, to Xiao Dong Mao, entitled "METHODS AND APPARATUSES FOR CAPTURING AN AUDIO SIGNAL BASED ON VISUAL IMAGE", (Attorney Docket SCEA-00400), filed on May 4, 2006, the entire disclosures of which are incorporated herein by reference. This application also claims the benefit of commonly-assigned, co-pending application number 11/429,047, to Xiao Dong Mao, entitled "METHODS AND APPARATUSES FOR CAPTURING AN AUDIO SIGNAL BASED ON A LOCATION OF THE SIGNAL", (Attorney Docket SCEA-00500), filed on May 4, 2006, the entire disclosures of which are incorporated herein by reference.
FIELD OF THE INVENTION
Embodiments of the present invention are directed to audio signal processing and more particularly to removal of console noise in a device having a microphone located on a device console.
BACKGROUND OF THE INVENTION
Many consumer electronic devices utilize a console that includes various user controls and inputs. In many applications, such as video game consoles, cable television set top boxes and digital video recorders, it is desirable to incorporate a microphone into the console. To reduce cost, the microphone is typically a conventional omni-directional microphone having no preferred listening direction. Unfortunately, such electronic device consoles also contain noise sources, such as cooling fans, hard-disk drives, CD-ROM drives and digital video disk (DVD) drives. A microphone located on the console would pick up noise from these sources. Since these noise sources are often located quite close to the microphone(s), they can greatly interfere with desired sound inputs, e.g., user voice commands. To address this problem, techniques for filtering out noise from these sources have been implemented in these devices.
Most previous techniques have been effective in filtering out broad band distributed noise. For example, fan noise is Gaussian distributed and therefore distributed over a broad band of frequencies. Such noise can be simulated with a Gaussian and cancelled out from the input signal to the microphone on the console. Noise from a disk drive, e.g., a hard disk or DVD drive, is characterized by a narrow-band frequency distribution such as a gamma distribution or a narrow band Laplacian distribution. Unfortunately, deterministic methods that work with Gaussian noise are not suitable for removal of gamma-distributed noise.
Thus, there is a need in the art for a noise reduction technique that overcomes the above disadvantages.
SUMMARY OF THE INVENTION
Embodiments of the invention are directed to reduction of noise in a device having a console with one or more microphones and a source of narrow band distributed noise located on the console. A microphone signal containing a broad band distributed desired sound and narrow band distributed noise is divided amongst a plurality of frequency bins. For each frequency bin, it is determined whether a portion of the signal within the frequency bin belongs to a narrow band distribution characteristic of the source of narrow band noise located on the console. Any frequency bins containing portions of the signal belonging to the narrow band distribution are filtered to reduce the narrow band noise.
BRIEF DESCRIPTION OF THE DRAWINGS
The teachings of the present invention can be readily understood by considering the following detailed description in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic diagram of an electronic device according to an embodiment of the present invention.
FIG. 2 is a flow diagram of a method for reduction of noise in a device of the type shown in FIG. 1.
FIGs. 3A-3B are graphs of microphone signal as a function of frequency illustrating reduction of narrow band noise according to embodiments of the present invention.
FIGs. 4A-4B are graphs of microphone signals for different microphones as a function of frequency illustrating reduction of narrow band noise according to alternative embodiments of the present invention.
DESCRIPTION OF THE SPECIFIC EMBODIMENTS
Although the following detailed description contains many specific details for the purposes of illustration, anyone of ordinary skill in the art will appreciate that many variations and alterations to the following details are within the scope of the invention. Accordingly, the exemplary embodiments of the invention described below are set forth without any loss of generality to, and without imposing limitations upon, the claimed invention.
As depicted in FIG. 1, an electronic device 100 according to an embodiment of the present invention includes a console 102 having one or more microphones 104A, 104B. As used herein, the term console generally refers to a stand-alone unit containing electronic components that perform computation and/or signal processing functions. The console may receive inputs from one or more external input devices, e.g., a joystick 106, and provide outputs to one or more external output devices such as a monitor 108. The console 102 may include a central processor unit 110 and memory 112. The console may include an optional fan 114 to provide cooling of the console components. By way of example, the console 102 may be a console for a video game system, such as a Sony PlayStation®, a cable television set top box, or a digital video recorder, such as a TiVo® digital video recorder available from TiVo Inc. of Alviso, California.
The processor unit 110 and memory 112 may be coupled to each other via a system bus 116. The microphones 104A, 104B may be coupled to the processor and/or memory through input/output (I/O) elements 118. As used herein, the term I/O generally refers to any program, operation or device that transfers data to or from the console 100 and to or from a peripheral device. Every data transfer may be regarded as an output from one device and an input into another.
The device 100 may include one or more additional peripheral units, which may be internal to the console 102 or external to it. Peripheral devices include input-only devices, such as keyboards and mice, output-only devices, such as printers, as well as devices such as a writable CD-ROM that can act as both an input and an output device. The term "peripheral device" includes external devices, such as a mouse, keyboard, printer, monitor, microphone, game controller, camera, external Zip drive or scanner, as well as internal devices, e.g., a disk drive 120 such as a CD-ROM drive, CD-R drive, hard disk drive or DVD drive, an internal modem, or other peripheral such as a flash memory reader/writer or hard drive.
The console includes at least one source of narrow-band distributed noise such as the disk drive 120. Narrow band noise from the disk drive 120 may be filtered from digital signal data generated from microphone inputs x_A(t), x_B(t) so that desired sounds, e.g., voice, from a remote source 101 are not drowned out by the sound of the disk drive 120. The narrow band noise may be characterized by a gamma distribution. The desired sound from the source 101 is preferably characterized by a broad band probability density function distribution such as a Gaussian-distributed probability density function.
The memory 112 may contain coded instructions 113 that can be executed by the processor 110 and/or data 115 that facilitate removal of the narrow band disk drive noise. Specifically, the data 115 may include a distribution function generated from training data consisting of many hours of recordings of sounds from the disk drive. The distribution function may be stored in the form of a lookup table.
The coded instructions 113 may implement a method 200 for reducing narrow band noise in a device of the type shown in FIG. 1. According to the method 200, a signal from one or more of the console microphones 104A, 104B is divided into frequency bins, as indicated at 202. Dividing the signal into a plurality of frequency bins may include capturing a time-windowed portion of the signal (e.g., microphone signal x_A(t)), converting the time-windowed portion to a frequency domain signal x(f) (e.g., using a fast Fourier transform) and dividing the frequency domain signal amongst the frequency bins. By way of example, approximately 32 ms of microphone data may be stored in a buffer for classification into frequency bins. For each frequency bin, it is determined whether a portion of the signal within the frequency bin belongs to a narrow band distribution characteristic of the narrow band disk drive noise, as indicated at 204. Any frequency bins containing portions of the signal belonging to the narrow band distribution are filtered from the input signal, as indicated at 206.
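The division into frequency bins at 202 can be illustrated with a minimal sketch. It assumes a Hann window and a plain discrete Fourier transform written out with the standard library; the function name frame_to_bins, the window shape and the sample rate used in the usage note are illustrative assumptions and are not specified in this description. A fast Fourier transform would yield the same bin values more efficiently.

```python
import cmath
import math


def frame_to_bins(samples, sample_rate_hz, window_ms=32.0):
    """Return (bin_frequencies_hz, complex_spectrum) for one time window.

    Hypothetical helper: takes the first `window_ms` of `samples`,
    applies a Hann window (an assumption) and computes a one-sided DFT.
    """
    n = int(round(sample_rate_hz * window_ms / 1000.0))
    if n < 2 or len(samples) < n:
        raise ValueError("not enough samples for one window")
    frame = samples[:n]
    # Taper the frame edges to limit leakage between neighboring bins.
    windowed = [
        frame[i] * 0.5 * (1.0 - math.cos(2.0 * math.pi * i / (n - 1)))
        for i in range(n)
    ]
    half = n // 2 + 1  # one-sided spectrum is enough for a real signal
    spectrum = []
    for k in range(half):
        acc = 0j
        for i, value in enumerate(windowed):
            acc += value * cmath.exp(-2j * math.pi * k * i / n)
        spectrum.append(acc)
    freqs = [k * sample_rate_hz / n for k in range(half)]
    return freqs, spectrum
```

With an assumed sample rate of 8000 Hz, a 32 ms window holds 256 samples and each bin is 31.25 Hz wide, so a 1000 Hz tone peaks in bin 32.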
Filtering the input signal may be understood with respect to FIGs. 3A-3B. Specifically, as shown in FIG. 3A, the frequency domain signal x(f) may be regarded as a combination of a broadband signal 302 and a narrow band signal 304. When these signals are divided into frequency bins 306, as shown in FIG. 3B, each bin contains a value corresponding to a portion of the broadband signal 302 and a portion of the narrow band signal 304. The portion of the signal x(f) in a given frequency bin 306 due to the narrow band signal 304 (indicated by the dashed bars in FIG. 3B) may be estimated from the training data. This portion may be subtracted from the value in the frequency bin 306 to filter out the narrow band noise from that bin.
The narrow band signal 304 may be estimated as follows. First, narrow band signal samples may be collected in a large volume to train a distribution model. Distribution models are widely known to those of skill in the pattern recognition arts, such as speech modeling. The distribution model for the narrow band signal 304 is similar to those used in speech modeling with a few exceptions. Specifically, unlike speech, which is considered broadband with a Gaussian distribution, the narrow band noise in the narrow band signal 304 has a "Gamma" distribution density function. The distribution model is known as a "Gamma-Mixture-Model". Speech applications, such as speaker/language identification, by comparison usually use a "Gaussian-Mixture-Model". The two models are quite similar. The underlying distribution function is the only significant difference.
The model training procedure follows an "Estimate-Maximize" (EM) algorithm, which is widely available in speech modeling. The EM algorithm is an iterative likelihood maximization method, which estimates a set of model parameters from a training data set. A feature vector is generated directly from a logarithm of the power spectrum. By contrast, a speech model usually applies further compression, such as a DCT or cepstrum coefficients. No such compression is applied here because the signal of interest is narrow band, and band averaging, which may attenuate it relative to the broadband background, is not desired. In real time, the model is utilized to estimate a narrow-band noise power spectral density (PSD).
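For readers unfamiliar with gamma mixtures, the following sketch evaluates the probability density of a generic gamma mixture. The component layout (weight, shape, scale) and the function names are assumptions made for illustration; this description does not specify how the trained Gamma-Mixture-Model is parameterized or stored, and EM training is not shown.

```python
import math


def gamma_pdf(x, shape, scale):
    """Density of a gamma distribution with the given shape and scale."""
    if x <= 0.0:
        return 0.0  # the gamma distribution has support on x > 0 only
    log_p = (
        (shape - 1.0) * math.log(x)
        - x / scale
        - math.lgamma(shape)
        - shape * math.log(scale)
    )
    return math.exp(log_p)


def gamma_mixture_pdf(x, components):
    """Density of a mixture; `components` is a list of
    (weight, shape, scale) tuples whose weights sum to 1 (hypothetical layout)."""
    return sum(w * gamma_pdf(x, a, s) for w, a, s in components)
```

Structurally this mirrors a Gaussian mixture evaluation with the component density swapped, which is the similarity between the two models noted in the text.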
An algorithm for such a model may proceed as follows:
First, the signal x(t) is transformed from the time domain to the frequency domain.
X(k) = fft(x(t)), where k is a frequency index.
Next, a power spectrum is obtained from the frequency domain signal X(k).
Syy(k) = X(k) .* conj(X(k)), where "conj" refers to the complex conjugate.
Next, a feature vector V(k) is obtained from the logarithm of power spectrum.
V(k) = log(Syy(k))
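The power spectrum and feature vector steps can be sketched as follows, taking a list of complex bin values X(k) as input. The function name log_power_features and the small floor that guards against log(0) for empty bins are assumptions added for this illustration.

```python
import math


def log_power_features(spectrum, floor=1e-12):
    """Return (Syy, V) for a list of complex bin values X(k).

    Syy(k) = X(k) * conj(X(k)) and V(k) = log(Syy(k)).
    `floor` is an assumed guard so that empty bins do not hit log(0).
    """
    syy = [(x * x.conjugate()).real for x in spectrum]
    v = [math.log(max(s, floor)) for s in syy]
    return syy, v
```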
The term "feature Vector" is a common term in pattern recognition. Essentially any pattern matching includes 1) a pre-trained model that defines the distribution in priori feature space, and 2) runtime observed feature vectors. The task is to match the feature vector against the model. Given a prior trained gamma <Model>, the narrow-band noise presence probability <Pn(k)> may be obtained for this observed feature V(k).
Pn (k) = Gamma (Model, VQc))
The narrow-band noise PSD Snn(k) is adaptively updated:
Snn(k) = { α*Snn(k) + (1 - α)*Syy(k) } * Pn(k) + Snn(k) * (1 - Pn(k))
If Pn(k) is zero, that is, no narrow-band noise is present, Snn(k) does not change. If Pn(k) = 1, that is, frequency k is entirely narrow-band noise, then:
Snn(k) = α*Snn(k) + (1 - α)*Syy(k)
This is essentially statistical periodogram averaging, where α is a smoothing factor.
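The adaptive update of the noise PSD can be sketched per frequency bin as follows. The function name and the example values, including the smoothing factor, are assumptions chosen for illustration; the text does not give a value for α.

```python
def update_noise_psd(snn, syy, pn, alpha=0.9):
    """Blend the previous noise PSD with the new power spectrum.

    snn   -- previous narrow-band noise PSD estimate, one value per bin
    syy   -- power spectrum of the current frame, one value per bin
    pn    -- narrow-band noise presence probability per bin, in [0, 1]
    alpha -- smoothing factor (0.9 is a hypothetical default)
    """
    return [
        (alpha * s_nn + (1.0 - alpha) * s_yy) * p + s_nn * (1.0 - p)
        for s_nn, s_yy, p in zip(snn, syy, pn)
    ]

# Three bins with presence probabilities 0, 0.5 and 1.
snn = [1.0, 1.0, 1.0]
syy = [5.0, 5.0, 5.0]
pn = [0.0, 0.5, 1.0]
updated = update_noise_psd(snn, syy, pn, alpha=0.5)
# Bin 0 is unchanged (1.0), bin 2 is fully smoothed (3.0),
# and bin 1 lies halfway between the two (2.0).
```

The two limiting cases in the example correspond to the Pn(k) = 0 and Pn(k) = 1 cases discussed above.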
Given the estimated noise PSD, it is straightforward to estimate the clean voice signal. An example of an algorithm for performing such an estimation is based on the well-known MMSE estimator, which is described by Y. Ephraim and D. Malah in "Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator," IEEE Trans. Acoust., Speech, Signal Processing, Vol. ASSP-32, pp. 1109-1121, Dec. 1984, and Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error log-spectral amplitude estimator," IEEE Trans. Acoust., Speech, Signal Processing, Vol. ASSP-33, pp. 443-445, Apr. 1985, the disclosures of both of which are incorporated herein by reference.
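To make the use of the estimated noise PSD concrete, the following sketch applies plain power spectral subtraction, in line with the earlier statement that the narrow band portion may be subtracted from the value in each frequency bin. This is a simpler stand-in and is not the MMSE estimator of Ephraim and Malah cited above. The function names and the gain floor are assumptions made for illustration.

```python
import math

def spectral_subtraction_gain(syy, snn, gain_floor=0.05):
    """Per-bin amplitude gain from power spectral subtraction.

    syy        -- power spectrum of the noisy frame
    snn        -- estimated narrow-band noise PSD
    gain_floor -- hypothetical lower bound that limits over-suppression
    """
    gains = []
    for s_yy, s_nn in zip(syy, snn):
        if s_yy <= 0.0:
            gains.append(gain_floor)
            continue
        clean_power = max(s_yy - s_nn, 0.0)  # subtract the noise portion
        gains.append(max(math.sqrt(clean_power / s_yy), gain_floor))
    return gains

def apply_gain(spectrum, gains):
    """Scale each complex frequency bin X(k) by its gain."""
    return [g * x for g, x in zip(gains, spectrum)]

# Bin 0 has no estimated noise, bin 1 is three-quarters noise power,
# and bin 2 has more estimated noise than signal, so it hits the floor.
gains = spectral_subtraction_gain([4.0, 4.0, 1.0], [0.0, 3.0, 2.0])
filtered = apply_gain([2 + 0j, 0 + 2j, 1 + 0j], gains)
```

The filtered spectrum would then be converted back to the time domain to obtain the enhanced signal.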
In alternative embodiments, the filtering may take advantage of the presence of two or more microphones 104A, 104B on the console 102. If there are two microphones 104A, 104B on the console 102, one of them (104B) may be closer to the disk drive 120 than the other (104A). As a result, there is a difference in the time of arrival of the noise from the disk drive 120 for the microphone input signals xA(t) and xB(t). The difference in time of arrival results in different frequency distributions for the input signals when they are frequency converted to XA(f), XB(f), as illustrated in FIGs. 4A-4B. The frequency distribution of broadband sound from remote sources, by contrast, will not be significantly different for XA(f), XB(f). However, the frequency distribution for the narrow band signal 304A from microphone 104A will be frequency shifted relative to the frequency distribution 304B from microphone 104B. The narrow band noise contribution to the frequency bins 306 can be determined by generating a feature vector V(k) from the frequency domain signals XA(f), XB(f) from the two microphones 104A, 104B.
By way of example, a first feature vector V(k,A) is generated from the power spectrum Syy(k,A) for microphone 104A:
V(k,A) = log(Syy(k,A))
A second feature vector V(k,B) is generated from the power spectrum Syy(k,B) for microphone 104B:
V(k,B) = log(Syy(k,B))
The feature vector V(k) is then obtained from a simple concatenation of V(k,A) and V(k,B):
V(k) = [V(k,A), V(k,B)]
The rest of the procedure, including model training and real-time detection, is the same, except that the model size and feature vector dimension are doubled. Although the above technique uses neither array beam forming nor anything that depends explicitly on time difference of arrival, the spatial information is implicitly included in the trained model and the runtime feature vectors, which can greatly improve detection accuracy.
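The two-microphone feature construction can be sketched as follows. The sketch assumes that the two log-power vectors are joined end to end, which doubles the feature dimension as stated above; pairing the two values bin by bin would be an equally plausible reading of the text. The function names and the floor that guards against log(0) are assumptions made for illustration.

```python
import math

def log_features(syy, floor=1e-12):
    """V(k) = log(Syy(k)) for one microphone; floor guards against log(0)."""
    return [math.log(max(s, floor)) for s in syy]

def combined_features(syy_a, syy_b):
    """Concatenate the log-power features of microphones A and B."""
    if len(syy_a) != len(syy_b):
        raise ValueError("both microphones must use the same number of bins")
    return log_features(syy_a) + log_features(syy_b)

# Hypothetical three-bin power spectra in which the narrow band peak
# appears in bin 1 at microphone A and in bin 2 at microphone B.
syy_a = [1.0, math.e, 1.0]
syy_b = [1.0, 1.0, math.e]
v = combined_features(syy_a, syy_b)
```

Because the combined vector keeps both spectra side by side, a model trained on such vectors can capture differences between the two microphones without any explicit beam forming step.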
Embodiments of the present invention may be used as presented herein or in combination with other user input mechanisms, notwithstanding mechanisms that track or profile the angular direction or volume of sound, mechanisms that track the position of an object actively or passively, mechanisms using machine vision, and combinations thereof. The object tracked may include ancillary controls or buttons that manipulate feedback to the system. Such feedback may include, but is not limited to, light emission from light sources, sound distortion means, or other suitable transmitters and modulators, as well as controls, buttons, pressure pads, etc. that may influence the transmission or modulation of the same, encode state, and/or transmit commands from or to a device, including devices that are tracked by the system, whether such devices are part of, interacting with or influencing a system used in connection with embodiments of the present invention.
While the above is a complete description of the preferred embodiment of the present invention, it is possible to use various alternatives, modifications and equivalents. Therefore, the scope of the present invention should be determined not with reference to the above description but should, instead, be determined with reference to the appended claims, along with their full scope of equivalents. Any feature described herein, whether preferred or not, may be combined with any other feature described herein, whether preferred or not. In the claims that follow, the indefinite article "A", or "An" refers to a quantity of one or more of the item following the article, except where expressly stated otherwise. The appended claims are not to be interpreted as including means-plus-function limitations, unless such a limitation is explicitly recited in a given claim using the phrase "means for."

Claims

WHAT IS CLAIMED IS:
1. A method for reduction of noise in a device having a console with one or more microphones and a source of narrow band distributed noise located on the console, the method comprising: obtaining a signal from the one or more microphones containing a broad band distributed desired sound and narrow band distributed noise from the source located on the console; dividing the signal amongst a plurality of frequency bins; for each frequency bin, determining whether a portion of the signal within the frequency bin belongs to a narrow band distribution characteristic of the source of narrow band noise located on the console; and filtering from the signal any frequency bins containing portions of the signal belonging to the narrow band distribution.
2. The method of claim 1, wherein determining whether a portion of the signal within the frequency bin belongs to the narrow band distribution includes comparing a value corresponding to the portion of the signal in the frequency bin to a stored value for that frequency bin derived from a known signal from the source of narrow band noise located on the console.
3. The method of claim 1, wherein the one or more microphones include a first microphone and a second microphone, wherein obtaining a signal from the one or more microphones includes obtaining a first signal from the first microphone and obtaining a second signal from the second microphone, wherein determining whether a portion of the signal within the frequency bin belongs to the narrow band distribution includes determining a first vector feature from the first signal and obtaining a second vector feature from the second signal, concatenating the first and second signals to form a combined vector feature and matching the combined feature vector against a model.
4. The method of claim 1, wherein dividing the signal amongst a plurality of frequency bins includes capturing a time-windowed portion of the signal, converting the time-windowed portion to a frequency domain signal and dividing the frequency domain signal amongst the plurality of frequency bins.
5. The method of claim 1 wherein the broad band distributed desired sound is a voice sound.
6. The method of claim 1 wherein the source of narrow band distributed noise is a disk drive.
7. The method of claim 1 wherein the broad band distributed desired sound is characterized by a Gaussian-distributed probability density function.
8. The method of claim 1 wherein the narrow band noise is characterized by a gamma-distributed probability density function.
9. An electronic device, comprising: a console; one or more microphones located on the console; a source of narrow band distributed noise located on the console; a processor coupled to the microphone; a memory coupled to the processor, the memory having embodied therein a set of processor readable instructions for implementing a method for reduction of noise, the processor readable instructions including: instructions which, when executed, cause the device to obtain a signal from the one or more microphones containing a broad band distributed desired sound and narrow band distributed noise from the source located on the console; instructions which, when executed, divide the signal amongst a plurality of frequency bins; instructions which, when executed, determine, for each frequency bin, whether a portion of the signal within the frequency bin belongs to a narrow band distribution characteristic of the source of narrow band noise located on the console; and instructions which, when executed, filter from the signal any frequency bins containing portions of the signal belonging to the narrow band distribution.
10. The device of claim 9, wherein the instructions which, when executed, determine whether a portion of the signal within the frequency bin belongs to the narrow band distribution include one or more instructions which, when executed, compare a value corresponding to the portion of the signal in the frequency bin to a stored value for that frequency bin derived from a known signal from the source of narrow band noise located on the console.
11. The device of claim 10 further comprising a look-up table stored in the memory, wherein the look-up table contains the stored value.
12. The device of claim 9, wherein the one or more microphones include a first microphone and a second microphone.
13. The device of claim 9 wherein the instructions which, when executed, obtain a signal from the one or more microphones include one or more instructions which, when executed cause the device to obtain a first signal from the first microphone and obtain a second signal from the second microphone, wherein determining whether a portion of the signal within the frequency bin belongs to the narrow band distribution includes determining a first vector feature from the first signal and obtaining a second vector feature from the second signal, concatenating the first and second signals to form a combined vector feature and matching the combined feature vector against a model.
14. The device of claim 9 wherein instructions which, when executed, divide the signal amongst a plurality of frequency bins include instructions which, when executed, directed the device to capture a time-windowed portion of the signal, converting the time-windowed portion to a frequency domain signal and divide the frequency domain signal amongst the plurality of frequency bins.
15. The device of claim 9 wherein the broad band distributed desired sound is a voice sound.
16. The device of claim 9 wherein the source of narrow band distributed noise is a disk drive.
17. The device of claim 9 wherein the broad band distributed desired sound is characterized by a Gaussian-distributed probability density function.
18. The device of claim 9 wherein the narrow band noise is characterized by a gamma-distributed probability density function.
19. The device of claim 9, wherein the console is a video game console.
20. The device of claim 9 wherein the console is a cable television set top box or a digital video recorder.
21. A processor readable medium having embodied therein a set of processor readable instructions for implementing a method for reduction of noise in an electronic device having a console, one or more microphones located on the console, a source of narrow band distributed noise located on the console, a processor coupled to the microphone and a memory coupled to the processor, the processor readable instructions including: instructions which, when executed, cause the device to obtain a signal from the one or more microphones containing a broad band distributed desired sound and narrow band distributed noise from the source located on the console; instructions which, when executed, divide the signal amongst a plurality of frequency bins; instructions which, when executed, determine, for each frequency bin, whether a portion of the signal within the frequency bin belongs to a narrow band distribution characteristic of the source of narrow band noise located on the console; and instructions which, when executed, filter from an output signal any frequency bins containing portions of the signal belonging to the narrow band distribution.
PCT/US2007/065701 2006-05-04 2007-03-30 Narrow band noise reduction for speech enhancement WO2007130766A2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2009509909A JP4866958B2 (en) 2006-05-04 2007-03-30 Noise reduction in electronic devices with farfield microphones on the console
EP07759884A EP2012725A4 (en) 2006-05-04 2007-03-30 Narrow band noise reduction for speech enhancement

Applications Claiming Priority (20)

Application Number Priority Date Filing Date Title
US11/381,727 2006-05-04
US11/381,728 US7545926B2 (en) 2006-05-04 2006-05-04 Echo and noise cancellation
US11/381,721 US8947347B2 (en) 2003-08-27 2006-05-04 Controlling actions in a video game unit
US11/381,729 2006-05-04
US11/429,047 US8233642B2 (en) 2003-08-27 2006-05-04 Methods and apparatuses for capturing an audio signal based on a location of the signal
US11/381,725 2006-05-04
US11/418,988 2006-05-04
US11/429,047 2006-05-04
US11/381,725 US7783061B2 (en) 2003-08-27 2006-05-04 Methods and apparatus for the targeted sound detection
US11/381,724 US8073157B2 (en) 2003-08-27 2006-05-04 Methods and apparatus for targeted sound detection and characterization
US11/418,988 US8160269B2 (en) 2003-08-27 2006-05-04 Methods and apparatuses for adjusting a listening area for capturing sounds
US11/381,728 2006-05-04
PCT/US2006/017483 WO2006121896A2 (en) 2005-05-05 2006-05-04 Microphone array based selective sound source listening and video game control
US11/381,721 2006-05-04
US11/381,727 US7697700B2 (en) 2006-05-04 2006-05-04 Noise removal for electronic device with far field microphone on console
US11/381,724 2006-05-04
USPCT/US2006/017483 2006-05-04
US11/381,729 US7809145B2 (en) 2006-05-04 2006-05-04 Ultra small microphone array
US11/418,989 US8139793B2 (en) 2003-08-27 2006-05-04 Methods and apparatus for capturing audio signals based on a visual image
US11/418,989 2006-05-04

Publications (2)

Publication Number Publication Date
WO2007130766A2 true WO2007130766A2 (en) 2007-11-15
WO2007130766A3 WO2007130766A3 (en) 2008-09-04

Family

ID=56290936

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/US2007/065701 WO2007130766A2 (en) 2006-05-04 2007-03-30 Narrow band noise reduction for speech enhancement
PCT/US2007/065686 WO2007130765A2 (en) 2006-05-04 2007-03-30 Echo and noise cancellation

Family Applications After (1)

Application Number Title Priority Date Filing Date
PCT/US2007/065686 WO2007130765A2 (en) 2006-05-04 2007-03-30 Echo and noise cancellation

Country Status (3)

Country Link
EP (2) EP2012725A4 (en)
JP (3) JP4476355B2 (en)
WO (2) WO2007130766A2 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010106734A1 (en) * 2009-03-18 2010-09-23 日本電気株式会社 Audio signal processing device
JP2010249939A (en) * 2009-04-13 2010-11-04 Sony Corp Noise reducing device and noise determination method
CN109166589A (en) * 2018-08-13 2019-01-08 深圳市腾讯网络信息技术有限公司 Using sound suppressing method, device, medium and equipment

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4964267B2 (en) * 2009-04-03 2012-06-27 有限会社ケプストラム Adaptive filter and echo canceller having the same
WO2013179464A1 (en) * 2012-05-31 2013-12-05 トヨタ自動車株式会社 Audio source detection device, noise model generation device, noise reduction device, audio source direction estimation device, approaching vehicle detection device and noise reduction method
US11837248B2 (en) 2019-12-18 2023-12-05 Dolby Laboratories Licensing Corporation Filter adaptation step size control for echo cancellation
CN113689871A (en) * 2020-05-19 2021-11-23 阿里巴巴集团控股有限公司 Echo cancellation method and device
CN112017679B (en) * 2020-08-05 2024-01-26 海尔优家智能科技(北京)有限公司 Method, device and equipment for updating adaptive filter coefficients

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4802227A (en) 1987-04-03 1989-01-31 American Telephone And Telegraph Company Noise reduction processing arrangement for microphone arrays
US5550924A (en) 1993-07-07 1996-08-27 Picturetel Corporation Reduction of background noise for speech enhancement

Family Cites Families (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3135937B2 (en) * 1991-05-16 2001-02-19 株式会社リコー Noise removal device
JP3110201B2 (en) * 1993-04-16 2000-11-20 沖電気工業株式会社 Noise removal device
US5806025A (en) * 1996-08-07 1998-09-08 U S West, Inc. Method and system for adaptive filtering of speech signals using signal-to-noise ratio to choose subband filter bank
SE9700772D0 (en) * 1997-03-03 1997-03-03 Ericsson Telefon Ab L M A high resolution post processing method for a speech decoder
FR2771542B1 (en) * 1997-11-21 2000-02-11 Sextant Avionique FREQUENTIAL FILTERING METHOD APPLIED TO NOISE NOISE OF SOUND SIGNALS USING A WIENER FILTER
DE19806015C2 (en) * 1998-02-13 1999-12-23 Siemens Ag Process for improving acoustic attenuation in hands-free systems
US6263078B1 (en) * 1999-01-07 2001-07-17 Signalworks, Inc. Acoustic echo canceller with fast volume control compensation
WO2000049602A1 (en) * 1999-02-18 2000-08-24 Andrea Electronics Corporation System, method and apparatus for cancelling noise
US6426979B1 (en) * 1999-04-29 2002-07-30 Legerity, Inc. Adaptation control algorithm for echo cancellation using signal-value based analysis
US6526139B1 (en) * 1999-11-03 2003-02-25 Tellabs Operations, Inc. Consolidated noise injection in a voice processing system
JP3358731B2 (en) * 2000-04-24 2002-12-24 株式会社富建設 Nursing equipment
US7139401B2 (en) * 2002-01-03 2006-11-21 Hitachi Global Storage Technologies B.V. Hard disk drive with self-contained active acoustic noise reduction
JP2003284181A (en) * 2002-03-20 2003-10-03 Matsushita Electric Ind Co Ltd Sound collection apparatus
DE10305369B4 (en) * 2003-02-10 2005-05-19 Siemens Ag User-adaptive method for noise modeling
US6947549B2 (en) * 2003-02-19 2005-09-20 The Hong Kong Polytechnic University Echo canceller
US7885420B2 (en) * 2003-02-21 2011-02-08 Qnx Software Systems Co. Wind noise suppression system
JP4227529B2 (en) * 2004-01-06 2009-02-18 パナソニック株式会社 Periodic noise suppressor
US7254535B2 (en) * 2004-06-30 2007-08-07 Motorola, Inc. Method and apparatus for equalizing a speech signal generated within a pressurized air delivery system
US9509854B2 (en) * 2004-10-13 2016-11-29 Koninklijke Philips N.V. Echo cancellation

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
JONG WIN SHIN ET AL.: "Voice Activity Detection based on Generalised Gamma Distribution", 2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, 2005, pages 781 - 784, XP010792154, DOI: doi:10.1109/ICASSP.2005.1415230
See also references of EP2012725A4
Y. EPHRAIM; D. MALAH: "Speech enhancement using a minimum mean-square error log-spectral amplitude estimator", IEEE TRANS. ACOUST., SPEECH, SIGNAL PROCESSING, vol. ASSP-33, April 1985 (1985-04-01), pages 443 - 445, XP000931203, DOI: doi:10.1109/TASSP.1985.1164550
Y. EPHRAIM; D. MALAH: "Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator", IEEE TRANS. ACOUST., SPEECH, SIGNAL PROCESSING, vol. ASSP-32, December 1984 (1984-12-01), pages 1109 - 1121, XP002435684, DOI: doi:10.1109/TASSP.1984.1164453

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010106734A1 (en) * 2009-03-18 2010-09-23 日本電気株式会社 Audio signal processing device
US8738367B2 (en) 2009-03-18 2014-05-27 Nec Corporation Speech signal processing device
JP5772591B2 (en) * 2009-03-18 2015-09-02 日本電気株式会社 Audio signal processing device
JP2010249939A (en) * 2009-04-13 2010-11-04 Sony Corp Noise reducing device and noise determination method
CN109166589A (en) * 2018-08-13 2019-01-08 深圳市腾讯网络信息技术有限公司 Using sound suppressing method, device, medium and equipment

Also Published As

Publication number Publication date
EP2014132A2 (en) 2009-01-14
WO2007130765A2 (en) 2007-11-15
EP2012725A2 (en) 2009-01-14
JP2009535996A (en) 2009-10-01
EP2012725A4 (en) 2011-10-12
EP2014132A4 (en) 2013-01-02
JP2009535997A (en) 2009-10-01
WO2007130766A3 (en) 2008-09-04
JP4833343B2 (en) 2011-12-07
JP4476355B2 (en) 2010-06-09
JP2010171985A (en) 2010-08-05
WO2007130765A3 (en) 2008-12-18
JP4866958B2 (en) 2012-02-01

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07759884

Country of ref document: EP

Kind code of ref document: A2

WWE Wipo information: entry into national phase

Ref document number: 2009509909

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2007759884

Country of ref document: EP