NOISE REMOVAL FOR ELECTRONIC DEVICE WITH FAR FIELD MICROPHONE ON CONSOLE
CLAIM OF PRIORITY
This application claims the benefit of commonly-assigned, co-pending application number 11/381,727, to Xiao Dong Mao, entitled "NOISE REMOVAL FOR ELECTRONIC DEVICE WITH FAR FIELD MICROPHONE ON CONSOLE", (Attorney Docket SCEA05073US00), filed on May 4, 2006, the entire disclosures of which are incorporated herein by reference. This application also claims the benefit of commonly-assigned, co-pending application number 11/381,729, to Xiao Dong Mao, entitled "ULTRA SMALL MICROPHONE ARRAY", (Attorney Docket SCEA05062US00), filed on May 4, 2006, the entire disclosures of which are incorporated herein by reference. This application also claims the benefit of commonly-assigned, co-pending application number 11/381,728, to Xiao Dong Mao, entitled "ECHO AND NOISE CANCELLATION", (Attorney Docket SCEA05064US00), filed on May 4, 2006, the entire disclosures of which are incorporated herein by reference. This application also claims the benefit of commonly-assigned, co-pending application number 11/381,725, to Xiao Dong Mao, entitled "METHODS AND APPARATUS FOR TARGETED SOUND DETECTION", (Attorney Docket SCEA05072US00), filed on May 4, 2006, the entire disclosures of which are incorporated herein by reference. This application also claims the benefit of commonly-assigned, co-pending application number 11/381,724, to Xiao Dong Mao, entitled "METHODS AND APPARATUS FOR TARGETED SOUND DETECTION AND CHARACTERIZATION", (Attorney Docket SCEA05079US00), filed on May 4, 2006, the entire disclosures of which are incorporated herein by reference. This application also claims the benefit of commonly-assigned, co-pending application number 11/381,721, to Xiao Dong Mao, entitled "SELECTIVE SOUND SOURCE LISTENING IN CONJUNCTION WITH COMPUTER INTERACTIVE PROCESSING", (Attorney Docket SCEA04005 JUMBOUS), filed on May 4, 2006, the entire disclosures of which are incorporated herein by reference.
This application also claims the benefit of commonly-assigned, co-pending International Patent Application number PCT/US06/17483, to Xiao Dong Mao, entitled "SELECTIVE SOUND SOURCE LISTENING IN CONJUNCTION WITH COMPUTER INTERACTIVE PROCESSING", (Attorney Docket SCEA04005 JUMBOPCT), filed on May 4, 2006, the entire disclosures of which are incorporated herein by reference. This application also claims the benefit of commonly-assigned, co-pending application number 11/418,988, to Xiao Dong Mao, entitled "METHODS AND APPARATUSES FOR ADJUSTING A LISTENING AREA FOR CAPTURING SOUNDS", (Attorney Docket SCEA-00300), filed on May 4, 2006, the entire disclosures of which are incorporated herein by reference. This application also claims the benefit of commonly-assigned, co-pending application number 11/418,989, to Xiao Dong Mao, entitled "METHODS AND APPARATUSES FOR CAPTURING AN AUDIO SIGNAL BASED ON VISUAL IMAGE", (Attorney Docket SCEA-00400), filed on May 4, 2006, the entire disclosures of which are incorporated herein by reference. This application also claims the benefit of commonly-assigned, co-pending application number 11/429,047, to Xiao Dong Mao, entitled "METHODS AND APPARATUSES FOR CAPTURING AN AUDIO SIGNAL BASED ON A LOCATION OF THE SIGNAL", (Attorney Docket SCEA-00500), filed on May 4, 2006, the entire disclosures of which are incorporated herein by reference.
FIELD OF THE INVENTION
Embodiments of the present invention are directed to audio signal processing and more particularly to removal of console noise in a device having a microphone located on a device console.
BACKGROUND OF THE INVENTION
Many consumer electronic devices utilize a console that includes various user controls and inputs. In many applications, such as video game consoles, cable television set top boxes and digital video recorders, it is desirable to incorporate a microphone into the console. To reduce cost, the microphone is typically a conventional omni-directional microphone having no preferred listening direction. Unfortunately, such electronic device consoles also contain noise sources, such as cooling fans, hard-disk drives, CD-ROM drives and digital video disk (DVD) drives. A microphone located on the console would pick up noise from these sources. Since these noise sources are often located quite close to the microphone(s), they can greatly interfere with desired sound inputs, e.g., user voice commands. To address this problem, techniques for filtering out noise from these sources have been implemented in these devices.
Most previous techniques have been effective in filtering out broad band distributed noise. For example, fan noise is Gaussian distributed and is therefore spread over a broad band of frequencies. Such noise can be simulated with a Gaussian and cancelled out from the input signal to the microphone on the console. Noise from a disk drive, e.g., a hard disk or DVD drive, is instead characterized by a narrow-band frequency distribution such as a gamma distribution or a narrow band Laplacian distribution. Unfortunately, deterministic methods that work with Gaussian noise are not suitable for removal of gamma-distributed noise.
Thus, there is a need in the art for a noise reduction technique that overcomes the above disadvantages.
SUMMARY OF THE INVENTION
Embodiments of the invention are directed to reduction of noise in a device having a console with one or more microphones and a source of narrow band distributed noise located on the console. A microphone signal containing a broad band distributed desired sound and narrow band distributed noise is divided amongst a plurality of frequency bins. For each frequency bin, it is determined whether a portion of the signal within the frequency bin belongs to a narrow band distribution characteristic of the source of narrow band noise located on the console. Any frequency bins containing portions of the signal belonging to the narrow band distribution are filtered to reduce the narrow band noise.
BRIEF DESCRIPTION OF THE DRAWINGS
The teachings of the present invention can be readily understood by considering the following detailed description in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic diagram of an electronic device according to an embodiment of the present invention.
FIG. 2 is a flow diagram of a method for reduction of noise in a device of the type shown in FIG. 1.
FIGs. 3A-3B are graphs of microphone signal as a function of frequency illustrating reduction of narrow band noise according to embodiments of the present invention.
FIGs. 4A-4B are graphs of microphone signals for different microphones as a function of frequency illustrating reduction of narrow band noise according to alternative embodiments of the present invention.
DESCRIPTION OF THE SPECIFIC EMBODIMENTS
Although the following detailed description contains many specific details for the purposes of illustration, anyone of ordinary skill in the art will appreciate that many variations and alterations to the following details are within the scope of the invention. Accordingly, the exemplary embodiments of the invention described below are set forth without any loss of generality to, and without imposing limitations upon, the claimed invention.
As depicted in FIG. 1, an electronic device 100 according to an embodiment of the present invention includes a console 102 having one or more microphones 104A, 104B. As used herein, the term console generally refers to a stand-alone unit containing electronic components that perform computation and/or signal processing functions. The console may receive inputs from one or more external input devices, e.g., a joystick 106, and provide outputs to one or more external output devices such as a monitor 108. The console 102 may include a central processor unit 110 and memory 112. The console may include an optional fan 114 to provide cooling of the console components. By way of example, the console 102 may be a console for a video game system, such as a Sony PlayStation®, a cable television set top box, or a digital video recorder, such as a TiVo® digital video recorder available from TiVo Inc. of Alviso, California.
The processor unit 110 and memory 112 may be coupled to each other via a system bus 116. The microphones 104A, 104B may be coupled to the processor and/or memory through input/output (I/O) elements 118. As used herein, the term I/O generally refers to any program, operation or device that transfers data to or from the console 102 and to or from a peripheral device. Every data transfer may be regarded as an output from one device and an input into another.
The device 100 may include one or more additional peripheral units which may be internal to the console 102 or external to it. Peripheral devices include input-only devices, such as keyboards and mice, output-only devices, such as printers, as well as devices such as a writable CD-ROM that can act as both an input and an output device. The term "peripheral device" includes external devices, such as a mouse, keyboard, printer, monitor, microphone, game controller, camera, external Zip drive or scanner, as well as internal devices, e.g., a disk drive 120 such as a CD-ROM drive, CD-R drive, hard disk drive or DVD drive, an internal modem, or another peripheral such as a flash memory reader/writer or hard drive.
The console includes at least one source of narrow-band distributed noise such as the disk drive 120. Narrow band noise from the disk drive 120 may be filtered from digital signal data generated from microphone inputs xA(t), xB(t) so that desired sounds, e.g., voice, from a remote source 101 are not drowned out by the sound of the disk drive 120. The narrow band noise may be characterized by a gamma distribution. The desired sound from the source 101 is preferably characterized by a broad band probability density function distribution such as a Gaussian-distributed probability density function.
The memory 112 may contain coded instructions 113 that can be executed by the processor 110 and/or data 115 that facilitate removal of the narrow band disk drive noise. Specifically, the data 115 may include a distribution function generated from training data consisting of many hours of recordings of sounds from the disk drive. The distribution function may be stored in the form of a lookup table.
The coded instructions 113 may implement a method 200 for reducing narrow band noise in a device of the type shown in FIG. 1. According to the method 200, a signal from one or more of the console microphones 104A, 104B is divided into frequency bins, as indicated at 202. Dividing the signal into a plurality of frequency bins may include capturing a time-windowed portion of the signal (e.g., microphone signal xA(t)), converting the time-windowed portion to a frequency domain signal x(f) (e.g., using a fast Fourier transform) and dividing the frequency domain signal amongst the frequency bins. By way of example, approximately 32 ms of microphone data may be stored in a buffer for classification into frequency bins. For each frequency bin it is determined whether a portion of the signal within the frequency bin belongs to a narrow band distribution characteristic of the narrow band disk drive noise, as indicated at 204. Any frequency bins containing portions of the signal belonging to the narrow band distribution are filtered from the input signal, as indicated at 206.
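By way of illustration only, the windowing and frequency conversion at 202 might be sketched in Python as follows. The function name frame_to_bins, the Hann window shape, and the use of a naive discrete Fourier transform in place of a fast Fourier transform are assumptions made for readability; none of them is specified in this description.

```python
import cmath
import math

def frame_to_bins(samples, sample_rate_hz, window_ms=32.0):
    """Return one complex value per frequency bin for the first window of samples.

    Hypothetical helper: the name, the Hann window, and the naive DFT are
    illustrative assumptions, not details taken from the description.
    """
    n = int(round(sample_rate_hz * window_ms / 1000.0))
    if len(samples) < n:
        raise ValueError("need at least one full window of samples")
    frame = samples[:n]
    # Taper the frame with a periodic Hann window to limit spectral leakage.
    windowed = [s * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / n))
                for i, s in enumerate(frame)]
    # Naive O(n^2) DFT for clarity; keep only the non-negative frequencies.
    bins = []
    for k in range(n // 2 + 1):
        acc = 0j
        for i, v in enumerate(windowed):
            acc += v * cmath.exp(-2j * math.pi * k * i / n)
        bins.append(acc)
    return bins
```

At an assumed sample rate of 8 kHz, a 32 ms window holds 256 samples, so adjacent bins are spaced 31.25 Hz apart and bin k corresponds to a frequency of k times 31.25 Hz.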
Filtering the input signal may be understood with respect to FIGs. 3A-3B. Specifically, as shown in FIG. 3A, the frequency domain signal x(f) may be regarded as a combination of a broadband signal 302 and a narrow band signal 304. When these signals are divided into frequency bins 306, as shown in FIG. 3B, each bin contains a value corresponding to a portion of the broadband signal 302 and a portion of the narrow band signal 304. The portion of the signal x(f) in a given frequency bin 306 due to the narrow band signal 304 (indicated by the dashed bars in FIG. 3B) may be estimated from the training data. This portion may be subtracted from the value in the frequency bin 306 to filter out the narrow band noise from that bin.
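The per-bin subtraction described with respect to FIG. 3B might be sketched as follows. The function name subtract_narrow_band and the clamping of results to a floor value are assumptions; the description states only that the estimated narrow band portion is subtracted from the value in the bin.

```python
def subtract_narrow_band(bin_power, noise_power_estimate, floor=0.0):
    """Subtract the estimated narrow band power from each frequency bin.

    Hypothetical helper. The floor keeps a bin from going negative when the
    noise estimate exceeds the observed power; that clamp is an assumption.
    """
    if len(bin_power) != len(noise_power_estimate):
        raise ValueError("both inputs must have one value per frequency bin")
    return [max(p - n, floor) for p, n in zip(bin_power, noise_power_estimate)]
```

Bins whose noise estimate is zero pass through unchanged, so only the bins attributed to the narrow band source are altered.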
The narrow band signal 304 may be estimated as follows. First, narrow band signal samples may be collected in a large volume to train a distribution model. Distribution models are widely known to those of skill in the pattern recognition arts, such as speech modeling. The distribution model for the narrow band signal 304 is similar to those used in speech modeling, with a few exceptions. Specifically, unlike speech, which is considered broadband with a Gaussian distribution, the narrow band noise in the narrow band signal 304 has a "Gamma" distribution density function. The distribution model is known as a "Gamma-Mixture-Model". Speech applications, such as speaker/language identification, by comparison usually use a "Gaussian-Mixture-Model". The two models are quite similar. The underlying distribution function is the only significant difference. The model training procedure follows an "Expectation-Maximization" (EM) algorithm, which is widely used in speech modeling. The EM algorithm is an iterative likelihood maximization method, which estimates a set of model parameters from a training data set. A feature vector is generated directly from a logarithm of the power spectrum. By contrast, a speech model usually applies further compression, such as a DCT or cepstrum coefficients. This is because the signal of interest is narrow band, so band averaging, which may attenuate it relative to the broadband background, is not desired. At run time, the model is utilized to estimate a narrow-band noise power spectrum density (PSD).
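For illustration, the density of a gamma mixture model may be evaluated as a weighted sum of gamma density functions. The sketch below assumes a shape/scale parameterization and a list of (weight, shape, scale) components; these names and the parameter layout are hypothetical, and the training of the parameters by the EM algorithm is not shown. Because a gamma density is defined only for positive arguments, any offset or scaling applied to the feature values before evaluation is likewise an assumption left to the implementer.

```python
import math

def gamma_pdf(x, shape, scale):
    """Gamma probability density at x for the given shape and scale."""
    if x <= 0.0:
        return 0.0
    # Work in the log domain to avoid overflow for large shape values.
    log_p = ((shape - 1.0) * math.log(x) - x / scale
             - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_p)

def gamma_mixture_pdf(x, components):
    """Weighted sum of gamma densities.

    components: list of (weight, shape, scale) tuples whose weights sum to 1.
    The layout of this list is an illustrative assumption.
    """
    return sum(w * gamma_pdf(x, a, s) for w, a, s in components)
```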
An algorithm for such a model may proceed as follows:
First, the signal x(t) is transformed from the time domain to the frequency domain.
X(k) = fft(x(t)), where k is a frequency index.
Next, a power spectrum is obtained from the frequency domain signal X(k).
Syy(k) = X(k) .* conj(X(k)), where "conj" refers to the complex conjugate.
Next, a feature vector V(k) is obtained from the logarithm of the power spectrum.
V(k) = log(Syy(k))
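The power spectrum and feature vector steps above might be sketched as follows, taking the complex frequency domain values X(k) as input. The function name log_power_features and the small constant eps, added to avoid taking the logarithm of zero in an empty bin, are assumptions not found in this description.

```python
import math

def log_power_features(spectrum, eps=1e-12):
    """Compute V(k) = log(Syy(k)) with Syy(k) = X(k) * conj(X(k)).

    Hypothetical helper; eps guards against log(0) and is an assumption.
    """
    features = []
    for value in spectrum:
        x_k = complex(value)
        syy = (x_k * x_k.conjugate()).real  # squared magnitude of X(k)
        features.append(math.log(syy + eps))
    return features
```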
The term "feature vector" is a common term in pattern recognition. Essentially any pattern matching includes 1) a pre-trained model that defines the distribution in an a priori feature space, and 2) runtime observed feature vectors. The task is to match the feature vector against the model. Given a previously trained gamma <Model>, the narrow-band noise presence probability <Pn(k)> may be obtained for this observed feature V(k).
Pn(k) = Gamma(Model, V(k))
The narrow-band noise PSD is adaptively updated:
Snn(k) = { α * Snn(k) + (1 - α) * Syy(k) } * Pn(k) + Snn(k) * (1 - Pn(k))
If Pn(k) is zero, that is, no narrow-band noise is present, then Snn(k) does not change. If Pn(k) = 1, that is, this frequency <k> is entirely narrow-band noise, then:
Snn(k) = α * Snn(k) + (1 - α) * Syy(k)
This is essentially a statistical periodogram averaging, where α is a smoothing factor.
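The adaptive update above might be sketched per frequency bin as follows. The function name update_noise_psd and the default smoothing factor of 0.9 are assumptions chosen for illustration; this description does not give a value for α.

```python
def update_noise_psd(snn, syy, pn, alpha=0.9):
    """Blend the previous noise PSD Snn(k) with the observed power Syy(k).

    Each bin moves toward the smoothed estimate in proportion to the
    narrow-band noise presence probability Pn(k). The default alpha is
    a hypothetical value.
    """
    if not (len(snn) == len(syy) == len(pn)):
        raise ValueError("snn, syy and pn must have one value per bin")
    updated = []
    for s_nn, s_yy, p in zip(snn, syy, pn):
        smoothed = alpha * s_nn + (1.0 - alpha) * s_yy
        updated.append(smoothed * p + s_nn * (1.0 - p))
    return updated
```

With Pn(k) = 0 a bin keeps its previous estimate, with Pn(k) = 1 it takes the fully smoothed value, and intermediate probabilities interpolate between the two.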
Given the estimated noise PSD, it is thus straightforward to estimate the clean voice signal. An example of an algorithm for performing such an estimation is based on the well-known MMSE estimator, which is described by Y. Ephraim and D. Malah, in "Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator," IEEE Trans. Acoust., Speech, Signal Processing, Vol. ASSP-32, pp. 1109-1121, Dec. 1984 and Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error log-spectral amplitude estimator," IEEE Trans. Acoust., Speech, Signal Processing, Vol. ASSP-33, pp. 443-445, Apr. 1985, the disclosures of both of which are incorporated herein by reference.
In alternative embodiments, the filtering may take advantage of the presence of two or more microphones 104A, 104B on the console 102. If there are two microphones 104A, 104B on the console 102, one of them (104B) may be closer to the disk drive than the other (104A). As a result there is a difference in the time of arrival of the noise from the disk drive 120 for the microphone input signals xA(t) and xB(t). The difference in time of arrival results in different frequency distributions for the input signals when they are frequency converted to xA(f), xB(f), as illustrated in FIGs. 4A-4B. The frequency distribution of broadband sound from remote sources, by contrast, will not be significantly different for xA(f), xB(f). However, the frequency distribution for the narrow band signal 304A from microphone 104A will be frequency shifted relative to the frequency distribution 304B from microphone 104B. The narrow band noise contribution to the frequency bins 306 can be determined by generating a feature vector V(k) from the frequency domain signals xA(f), xB(f) from the two microphones 104A, 104B.
By way of example, a first feature vector V(k,A) is generated from the power spectrum Syy(k,A) for microphone 104A:
V(k,A) = log(Syy(k,A))
A second feature vector V(k,B) is generated from the power spectrum Syy(k,B) for microphone 104B:
V(k,B) = log(Syy(k,B))
The feature vector V(k) is then obtained from a simple concatenation of V(k,A) and V(k,B):
V(k) = [V(k,A), V(k,B)]
The remaining steps, i.e., model training and real-time detection, are the same as described above, except that the model size and feature vector dimension are doubled. Although the above technique uses neither array beam forming nor anything that depends explicitly on time difference of arrival, the spatial information is implicitly included in the trained model and the runtime feature vectors, which can greatly improve detection accuracy.
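The two-microphone feature vector might be assembled as in the following sketch, which takes the per-bin power spectra for microphones 104A and 104B as input. The function name two_mic_features and the small constant eps are assumptions made for illustration.

```python
import math

def two_mic_features(syy_a, syy_b, eps=1e-12):
    """Concatenate log power features from two microphones.

    Hypothetical helper: returns [V(k,A) for all k] followed by
    [V(k,B) for all k]; eps guards against log(0) and is an assumption.
    """
    if len(syy_a) != len(syy_b):
        raise ValueError("both power spectra must cover the same bins")
    v_a = [math.log(p + eps) for p in syy_a]
    v_b = [math.log(p + eps) for p in syy_b]
    return v_a + v_b
```

The result has twice as many entries as there are frequency bins, which corresponds to the doubled feature vector dimension noted above.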
Embodiments of the present invention may be used as presented herein or in combination with other user input mechanisms and notwithstanding mechanisms that track or profile the angular direction or volume of sound and/or mechanisms that track the position of the object actively or passively, mechanisms using machine vision, combinations thereof and where the object tracked may include ancillary controls or buttons that manipulate feedback to the system and where such feedback may include but is not limited to light emission from light sources, sound distortion means, or other suitable transmitters and modulators as well as controls, buttons, pressure pad, etc. that may influence the transmission or modulation of the same, encode state, and/or transmit commands from or to a device, including devices that are tracked by the system and whether such devices are part of, interacting with or influencing a system used in connection with embodiments of the present invention.
While the above is a complete description of the preferred embodiment of the present invention, it is possible to use various alternatives, modifications and equivalents. Therefore, the scope of the present invention should be determined not with reference to the above description but should, instead, be determined with reference to the appended claims, along with their full scope of equivalents. Any feature described herein, whether preferred or not, may be combined with any other feature described herein, whether preferred or not. In the claims that follow, the indefinite article "A" or "An" refers to a quantity of one or more of the item following the article, except where expressly stated otherwise. The appended claims are not to be interpreted as including means-plus-function limitations, unless such a limitation is explicitly recited in a given claim using the phrase "means for."