CN103310789A - Sound event recognition method based on optimized parallel model combination - Google Patents

Sound event recognition method based on optimized parallel model combination

Info

Publication number
CN103310789A
CN103310789A, CN2013102397247A, CN201310239724A
Authority
CN
China
Prior art keywords
sound event
noise
model
template
spectral domain
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2013102397247A
Other languages
Chinese (zh)
Other versions
CN103310789B (en)
Inventor
刘宏
王一
李晓飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University Shenzhen Graduate School
Original Assignee
Peking University Shenzhen Graduate School
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University Shenzhen Graduate School filed Critical Peking University Shenzhen Graduate School
Priority to CN201310239724.7A priority Critical patent/CN103310789B/en
Publication of CN103310789A publication Critical patent/CN103310789A/en
Application granted granted Critical
Publication of CN103310789B publication Critical patent/CN103310789B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention relates to a sound event recognition method based on optimized parallel model combination. The method includes: 1) recording sound event data, training a GMM (Gaussian mixture model) on clean sound events, and establishing clean sound event templates; 2) acquiring noise data of the current environment in a real noisy indoor environment, training a GMM on the noise data, and establishing a noise template; 3) processing the noise template and the clean sound event templates with the optimized parallel model combination method to obtain templates of the noisy sound events; 4) sampling to obtain noisy sound event sample signals and recognizing them according to the parameters in the noisy sound event templates. The method builds a GMM that better describes the distribution of background noise features and uses it as one input of the PMC (parallel model combination) method, while the clean GMMs of five sound events form the other input, thereby ensuring the robustness of the recognition system to noise.

Description

A sound event recognition method based on improved parallel model combination
Technical field
The invention belongs to the field of audio signal processing for intelligent monitoring and relates to sound event recognition in indoor environments, specifically to a sound event recognition method based on improved parallel model combination.
Background art
Compared with the mature speech recognition methods in the field of artificial intelligence, using computers to recognize sound events is a relatively recent research direction. Sound event recognition aims to automatically detect and classify sounds produced in the physical environment that carry a certain meaning or reflect human behavior. In a smart home monitoring system, sound event recognition can help people remotely monitor what is happening in the indoor environment and inform the user in time which event has occurred, which helps the user react promptly. However, real environments contain complex noise, so achieving effective monitoring in a real environment makes handling the noise both necessary and urgent.
First, sound event recognition is a pattern recognition problem similar to automatic speech recognition. The fundamental approach is signal processing followed by pattern recognition. Existing sound event recognition methods comprise the following steps:
(1) Recording, pre-filtering, and analog-to-digital conversion of the sound event signal. The recorded analog sound signal is first pre-filtered: high-pass filtering suppresses the 50 Hz power-line noise, and low-pass filtering removes the frequency components above half the sampling frequency to prevent aliasing. The analog sound signal is then sampled and quantized into a digital signal.
(2) Framing and windowing. Like speech signals, sound signals are non-stationary as a whole but stationary over short segments; by analogy with speech, a sound signal can be considered stationary within 10-30 ms, so it can be divided into frames of about 30 ms. A window function is used to extract the signal when framing, and the choice of window function (shape and length) strongly affects the short-time analysis parameters. Common window functions include the rectangular window, the Hanning window, and the Hamming window; the Hamming window is generally chosen because it reflects the characteristic variations of the sound signal well (an illustrative sketch of framing, windowing, and feature extraction follows this list of steps).
(3) Feature extraction. Different sound events have different features, and to distinguish different sound signals the features of the audio signal must be described mathematically. Commonly used features for sound event recognition include time-domain features such as short-time energy and short-time zero-crossing rate; frequency-domain features such as sub-band energy and wavelet time-frequency features; and cepstral-domain features such as linear prediction cepstral coefficients (LPCC) and Mel-frequency cepstral coefficients (MFCC).
(4) Recognition. Sound event recognition also adopts algorithms similar to those of speech recognition. Commonly used methods include classification based on support vector machines (SVM), clustering based on Gaussian mixture models (GMM), hidden Markov model (HMM) methods, and Bayesian algorithms.
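As an illustration of steps (2) and (3) above, the following sketch frames a signal with a Hamming window and extracts 13-dimensional MFCC features. It is not part of the patent: the file name is hypothetical, the 256-sample frame length, 128-sample shift, and 13 MFCC dimensions are the values used in the embodiment described later, and librosa is just one possible library (its MFCC routine performs its own framing and windowing internally).

```python
import numpy as np
import librosa

def frame_and_window(x, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping frames and apply a Hamming window."""
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    win = np.hamming(frame_len)
    return np.stack([x[i * hop:i * hop + frame_len] * win for i in range(n_frames)])

# "sound_event.wav" is a hypothetical recording; the embodiment samples at 11025 Hz, 16 bit.
x, sr = librosa.load("sound_event.wav", sr=11025)

frames = frame_and_window(x)                            # (n_frames, 256) windowed frames
mfcc = librosa.feature.mfcc(y=x, sr=sr, n_mfcc=13,
                            n_fft=256, hop_length=128)  # 13 coefficients per frame
features = mfcc.T                                       # (n_frames, 13) feature vector sequence
```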
Second, the handling of noise. When the recognition methods described above are used in a real environment, the performance of the recognition system deteriorates sharply with the mismatch between training data and test data, and environmental noise is exactly what causes this mismatch. The training/test mismatch caused by noise can be analyzed in three spaces: the signal space, the feature space, and the model space. Commonly used noise-handling methods that improve the robustness of the system include sound enhancement methods similar to speech enhancement, robust feature extraction, feature compensation, and model compensation such as parallel model combination (PMC).
Most existing methods still follow the practice of speech recognition, and the handling of noise is limited to the methods above. Among them, PMC-based methods are widely adopted because they describe the environmental noise and can fully exploit the information in the environment to improve the robustness of recognition. However, in existing PMC methods the noise characteristics are described with a single Gaussian model (SGM); when the noise is complex, an SGM cannot characterize the noise well, so the recognition rate under complex noise is not satisfactory.
Summary of the invention
In order to solve the above technical problems, the object of the present invention is to provide a method that obtains, through improved model parameter fusion, noisy sound event models matching the noise environment, so as to recognize sound events to be identified in a real noisy environment.
In order to achieve the above purpose, the technical solution of the present invention is a sound event recognition method based on improved parallel model combination, whose steps comprise:
1) training a GMM (Gaussian mixture model) on clean sound events and establishing clean sound event templates;
2) training a GMM on noise data and establishing a noise template;
3) applying the parallel model fusion method to the noise template and the clean sound event templates to obtain noisy sound event templates;
4) sampling to obtain a noisy sound event sample signal and recognizing the sound of the sample signal according to the parameters in the noisy sound event templates.
Further, the method for establishing the templates of the clean sound events is as follows:
1) the sound event data are recorded in a quiet, noise-free indoor environment, and the recorded sound events are pre-filtered, analog-to-digital converted, and then framed and windowed;
2) MFCC (Mel-frequency cepstral coefficient) features are extracted and the GMM (Gaussian mixture) templates of the sound events are trained.
Further, the Gaussian mixture model is trained with the EM algorithm, which updates the Gaussian model parameters. The GMM parameters of a clean sound event obtained by training are $\lambda_x = \{w_{xk}, \mu_{xk}, \Sigma_{xk}\},\ k = 1, 2, \dots, M$, where $w_{xk}$, $\mu_{xk}$, and $\Sigma_{xk}$ are the mixture weight, mean, and variance of the clean sound event model, and $M$ is the number of Gaussian mixture components.
Further, the noise data of the current environment are acquired in the real noisy indoor environment, and the noise template is established by extracting the MFCC features and building the GMM template of the noise. The noise template GMM parameters are $\lambda_n = \{w_{nk}, \mu_{nk}, \Sigma_{nk}\},\ k = 1, 2, \dots, M$, where $w_{nk}$, $\mu_{nk}$, and $\Sigma_{nk}$ are the mixture weight, mean, and variance of the noise model, and $M$ is the number of Gaussian mixture components.
Further, the improved parallel model fusion applied to the noise template and the clean sound event templates is as follows:
(1) the model parameters are mapped from the cepstral domain by the inverse discrete cosine transform, giving the log-spectral-domain mean $\mu^{log} = C^{-1}\mu$ and variance $\Sigma^{log} = C^{-1}\Sigma(C^{-1})^{T}$, where $C$ is the discrete cosine transform matrix and $\mu$, $\Sigma$ are respectively the cepstral-domain mean and variance of the model;
(2) the log-spectral mean and variance are transformed to the linear spectral domain by the exponential function: $\mu_i^{lin} = \exp(\mu_i^{log} + \Sigma_{ii}^{log}/2)$ is the $i$-th element of the linear-spectral mean vector and $\Sigma_{ij}^{lin} = \mu_i^{lin}\mu_j^{lin}[\exp(\Sigma_{ij}^{log}) - 1]$ is the element in row $i$, column $j$ of the linear-spectral covariance matrix, where $\mu_i^{log}$ and $\Sigma_{ij}^{log}$ are the corresponding elements of the log-spectral mean vector and covariance matrix;
(3) the improved parallel model combination fuses the clean sound event model parameters and the noise model parameters in the linear spectral domain: $\mu_{yk}^{lin} = g\,\mu_{xk}^{lin} + (1-g)\sum_{k=1}^{K} w_{nk}\,\mu_{nk}^{lin}$ is the mean and $\Sigma_{yk}^{lin} = g^{2}\,\Sigma_{xk}^{lin} + (1-g)^{2}\sum_{k=1}^{K} w_{nk}\,\Sigma_{nk}^{lin}$ is the variance of the fused noisy sound event model in the linear spectral domain, where $\mu_{xk}^{lin}$ and $\Sigma_{xk}^{lin}$ are the mean and variance of the clean sound event model and $\mu_{nk}^{lin}$ and $\Sigma_{nk}^{lin}$ are the mean and variance of the noise model after the transforms of steps (1) and (2);
(4) the linear-spectral mean and variance of the fused noisy sound event model are transformed back to the log-spectral domain by the inverse of step (2) and then to the cepstral-domain characteristic parameters by the inverse of step (1), giving the mean vector and variance of the noisy sound event model.
Further, the parameters of the noisy sound event model are $\lambda_y = \{w_{yk}, \mu_{yk}, \Sigma_{yk}\},\ k = 1, 2, \dots, M$, where $w_{yk}$, $\mu_{yk}$, and $\Sigma_{yk}$ are respectively its mixture weight, mean, and variance. Since the mixture weights do not differ between the linear spectral, log-spectral, and cepstral domains, the mixture weight $w_{yk}$ of the noisy sound event model is the weight $w_{xk}$ of the clean sound event template, and $M$ is the number of Gaussian mixture components.
Further, the method of recognizing the sound of the sample signal according to the parameters in the noisy sound event models is as follows:
1) the sample signal is pre-filtered and analog-to-digital converted, then framed and windowed, and multi-dimensional MFCC features are extracted to obtain the feature sequence of the sample signal;
2) the feature vector sequence of the sample signal is matched against the noisy sound event models, the matching likelihood is computed, and the template with the maximum likelihood is the recognition result.
Further, the noise data use the babble noise of NoiseX-92 and/or air-conditioning noise in an indoor environment.
Technical effects of the invention:
Under a complex noise background, the present invention builds a background GMM that better describes the distribution of background noise features and uses it as one input of the PMC method, while the clean GMMs of the five sound events serve as the other input. The improved model parameter fusion yields noisy sound event models that match the noise environment and gives good recognition of the sound events to be identified under real noise, ensuring the robustness of the recognition system to noise.
Description of drawings
Fig. 1 is a schematic diagram of the overall recognition process of the sound event recognition method based on improved parallel model combination according to the present invention.
Fig. 2 is a schematic diagram of the fusion method in one embodiment of the sound event recognition method based on improved parallel model combination according to the present invention.
Fig. 3 is a schematic diagram of the recognition results for the five sound events in one embodiment of the sound event recognition method based on improved parallel model combination according to the present invention.
Detailed description of the embodiments
The technical scheme in the embodiments of the invention is described clearly and completely below with reference to the accompanying drawings. It should be understood that the described embodiments are only part of the embodiments of the present invention rather than all of them; all other embodiments obtained by those skilled in the art without creative work, based on the embodiments of the present invention, fall within the scope of protection of the present invention.
The present invention recognizes five sound events that frequently occur in indoor environments and need attention. In addition, complex noise is fully taken into account (air-conditioning noise recorded in an indoor environment and the babble noise of the public noise database NoiseX-92). A Gaussian mixture model (GMM; see "Speech Signal Processing", 2nd edition, Zhao Li, China Machine Press, pp. 228-230) is used to describe the background noise signal: a GMM describes the background feature distribution as a weighted sum of several Gaussian distributions and can therefore describe the information of the background noise more fully. The background noise model parameters are then used to compensate the clean sound event model parameters at the model level, yielding noisy sound event models and preventing the mismatch between training data and test data caused by noise.
The present invention is a sound event recognition method based on improved parallel model combination; its content is as follows:
First, establish the templates of the clean sound events.
(1) The data of the five sound events are recorded in a quiet environment and preprocessed (framing, windowing, etc.) according to the sound signal processing steps described above.
(2) The robust MFCC features are then extracted as described above, and a Gaussian mixture model is trained for each of the five sound events. The Gaussian mixture models are trained with the EM algorithm, which updates the Gaussian model parameters. The GMM parameters of one of the clean sound events obtained by training are assumed to be:
$\lambda_x = \{w_{xk}, \mu_{xk}, \Sigma_{xk}\},\ k = 1, 2, \dots, M \quad (1)$
Second, acquire the noise data of the current environment, extract the MFCC features, and build the GMM template of the noise. The noise template parameters obtained are:
$\lambda_n = \{w_{nk}, \mu_{nk}, \Sigma_{nk}\},\ k = 1, 2, \dots, M \quad (2)$
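To make the two training steps concrete, the sketch below fits the clean and noise GMMs with scikit-learn, whose GaussianMixture estimator uses the EM algorithm; the random arrays merely stand in for stacked 13-dimensional MFCC vectors, the 8 diagonal-covariance components follow the embodiment described below, and all names are illustrative rather than the patent's own code.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Stand-ins for stacked 13-dimensional MFCC vectors; in practice they come from
# the clean sound event recordings and the background noise recordings.
rng = np.random.default_rng(0)
mfcc_clean = rng.standard_normal((600, 13))
mfcc_noise = rng.standard_normal((200, 13))

# GaussianMixture is fitted by expectation-maximization; M = 8 components as in the embodiment.
gmm_clean = GaussianMixture(n_components=8, covariance_type="diag", max_iter=200).fit(mfcc_clean)
gmm_noise = GaussianMixture(n_components=8, covariance_type="diag", max_iter=200).fit(mfcc_noise)

# The fitted parameters correspond to the sets in formulas (1) and (2).
w_x, mu_x, var_x = gmm_clean.weights_, gmm_clean.means_, gmm_clean.covariances_
w_n, mu_n, var_n = gmm_noise.weights_, gmm_noise.means_, gmm_noise.covariances_
```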
Third, perform the model fusion. Since the data used to train the GMMs in the present invention are MFCC features, which belong to the cepstral domain, while background noise and sound event model parameters are only additive in the linear spectral domain, both models are processed as follows (using $\lambda = \{w_k, \mu_k, \Sigma_k\},\ k = 1, 2, \dots, M$ to denote either the clean sound GMM or the background noise GMM):
1) The model parameters are mapped from the cepstral domain toward the linear spectral domain; specifically, the inverse of the discrete cosine transform is applied first. Here the difference (delta) coefficients of the MFCC are not extracted. The calculation is given by formulas (3) and (4):
$\mu^{log} = C^{-1}\mu \quad (3)$
$\Sigma^{log} = C^{-1}\Sigma(C^{-1})^{T} \quad (4)$
where $\mu^{log}$ and $\Sigma^{log}$ are the mean and variance of the log-spectral-domain model, $\mu$ and $\Sigma$ are the cepstral-domain mean and variance of the model, and $C$ is the discrete cosine transform matrix.
2) The normally distributed random variables of the log-spectral domain are transformed to the linear spectral domain by the exponential function, as in formulas (5) and (6):
$\mu_i^{lin} = \exp\left(\mu_i^{log} + \frac{\Sigma_{ii}^{log}}{2}\right) \quad (5)$
$\Sigma_{ij}^{lin} = \mu_i^{lin}\,\mu_j^{lin}\left[\exp\left(\Sigma_{ij}^{log}\right) - 1\right] \quad (6)$
where $\mu_i^{lin}$ and $\Sigma_{ij}^{lin}$ are respectively the $i$-th element of the linear-spectral mean vector and the element in row $i$, column $j$ of the linear-spectral covariance matrix, and $\mu_i^{log}$ and $\Sigma_{ij}^{log}$ are the corresponding elements of the log-spectral mean vector and covariance matrix.
3) Let the linear-spectral mean vector and variance of the clean sound event model obtained from the above formulas be $\mu_{xk}^{lin}$ and $\Sigma_{xk}^{lin}$, and the linear-spectral mean vector and variance of the noise model be $\mu_{nk}^{lin}$ and $\Sigma_{nk}^{lin}$. The two models are fused with formulas (7) and (8):
$\mu_{yk}^{lin} = g\,\mu_{xk}^{lin} + (1-g)\sum_{k=1}^{K} w_{nk}\,\mu_{nk}^{lin} \quad (7)$
$\Sigma_{yk}^{lin} = g^{2}\,\Sigma_{xk}^{lin} + (1-g)^{2}\sum_{k=1}^{K} w_{nk}\,\Sigma_{nk}^{lin} \quad (8)$
where $\mu_{yk}^{lin}$ and $\Sigma_{yk}^{lin}$ are the mean vector and variance of the fused noisy sound event model and $g$ is the gain factor.
4) The fused linear-spectral model parameters are transformed back to the log-spectral domain through the inverses of formulas (5) and (6), and then to the cepstral-domain model parameters through the inverses of formulas (3) and (4). Applying this processing to all five sound event models finally yields the parameters of the five fused noisy sound event models.
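The following numerical sketch walks through formulas (3)-(8) for a single clean mixture component, assuming diagonal covariances, a square orthonormal DCT matrix, and no delta coefficients; the function and variable names are illustrative and this is not the patent's own implementation.

```python
import numpy as np
from scipy.fftpack import dct

def pmc_fuse(mu_x, var_x, mu_n, var_n, w_n, g=0.5):
    """Fuse one clean GMM component with a noise GMM (sketch of formulas (3)-(8)).

    mu_x, var_x : cepstral mean / diagonal variance of one clean component, shape (D,)
    mu_n, var_n : cepstral means / diagonal variances of the K noise components, shape (K, D)
    w_n         : noise mixture weights, shape (K,)
    g           : gain factor (the embodiment uses g = 0.5)
    """
    D = mu_x.shape[0]
    # DCT-II matrix C such that C @ log_spectrum gives the cepstrum (assumed square, orthonormal)
    C = dct(np.eye(D), type=2, norm="ortho", axis=0)
    C_inv = np.linalg.inv(C)

    def to_linear(mu_c, var_c):
        # cepstral -> log-spectral, formulas (3)-(4) (diagonal elements only)
        mu_log = C_inv @ mu_c
        var_log = np.diag(C_inv @ np.diag(var_c) @ C_inv.T)
        # log-spectral -> linear spectral, formulas (5)-(6)
        mu_lin = np.exp(mu_log + var_log / 2.0)
        var_lin = mu_lin ** 2 * (np.exp(var_log) - 1.0)
        return mu_lin, var_lin

    def to_cepstral(mu_lin, var_lin):
        # inverse of formulas (5)-(6), then of formulas (3)-(4)
        var_log = np.log(var_lin / mu_lin ** 2 + 1.0)
        mu_log = np.log(mu_lin) - var_log / 2.0
        mu_c = C @ mu_log
        var_c = np.diag(C @ np.diag(var_log) @ C.T)
        return mu_c, var_c

    mu_x_lin, var_x_lin = to_linear(mu_x, var_x)

    # weight-average the noise components in the linear spectral domain
    mu_n_lin = np.zeros(D)
    var_n_lin = np.zeros(D)
    for k in range(len(w_n)):
        m_lin, v_lin = to_linear(mu_n[k], var_n[k])
        mu_n_lin += w_n[k] * m_lin
        var_n_lin += w_n[k] * v_lin

    # formulas (7)-(8): fuse clean and noise statistics with gain factor g
    mu_y_lin = g * mu_x_lin + (1.0 - g) * mu_n_lin
    var_y_lin = g ** 2 * var_x_lin + (1.0 - g) ** 2 * var_n_lin

    return to_cepstral(mu_y_lin, var_y_lin)
```

Applying this fusion to every component of each of the five clean GMMs, with the mixture weights copied from the clean templates, would yield the five noisy sound event models used for matching.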
Fourth, for a noisy sound event sample extracted from the indoor noisy environment, the purpose of recognition is to determine which of the five sound events the current sample belongs to, i.e. to compute the posterior probability of the sample under each of the five models; the model with the largest posterior probability gives the class of the sample. By Bayes' rule, since the five sound events are equally likely a priori, for a given observation vector the maximum a posteriori decision is equivalent to computing the probability of the observation under each of the five sound event models and taking the model with the largest probability as the class of the sample.
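A minimal sketch of this decision rule, assuming the five fused models are scikit-learn GaussianMixture objects (an assumption, not the patent's implementation): with equal priors, the maximum a posteriori decision reduces to choosing the model with the largest total log-likelihood over the frame sequence.

```python
import numpy as np

def classify(sample_feats, noisy_models, class_names):
    """Return the class whose fused GMM gives the highest likelihood.

    sample_feats : (n_frames, 13) MFCC sequence of the noisy test sample
    noisy_models : the five fused GMMs (e.g. sklearn GaussianMixture objects)
    class_names  : the five sound event labels
    """
    # GaussianMixture.score() returns the average per-frame log-likelihood;
    # multiplying by the number of frames gives the total for the sequence.
    totals = [m.score(sample_feats) * len(sample_feats) for m in noisy_models]
    return class_names[int(np.argmax(totals))]
```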
Fig. 1 is a schematic diagram of the overall recognition process of the sound event recognition method based on improved parallel model combination, comprising a training part and a recognition part.
The present invention considers five sound events that occur frequently in indoor environments and need attention: door-closing sounds, knocking, clapping, speech, and bird calls. The training of the five sound event templates and the noise template proceeds as follows:
1. The five sound event databases are recorded in a quiet environment and labeled. One hundred samples of each sound event type are obtained from 5 male and 5 female subjects producing the sound or the corresponding action. The noise uses the babble noise of NoiseX-92 and air-conditioning noise in an indoor environment.
2. Pre-filtering: high-pass filtering suppresses the 50 Hz power-line noise, and low-pass filtering removes the frequency components above half the sampling frequency. Analog-to-digital conversion: the sampling frequency is 11025 Hz and the sampling precision is 16 bits;
3. Each complete sound segment is framed and windowed. The frame length is 256 samples and the frame shift is 128 samples; a Hamming window is used;
4. Feature extraction: 13-dimensional MFCC features are extracted;
5. Sixty feature vector sequences are used for each sound event and ten for the noise. The GMM templates $\lambda_{xk},\ k = 1, 2, \dots, 5$ of the five sounds and the noise template $\lambda_n$ are trained with the expectation-maximization (EM) algorithm; each template is a Gaussian mixture model with 8 Gaussian components.
The model fusion process of the present invention is shown in Fig. 2, a schematic diagram of the fusion method in one embodiment of the sound event recognition method based on improved parallel model combination.
Concrete steps are as follows:
1. The background noise model and the five clean sound event model parameters are converted from the cepstral domain to the linear spectral domain using formulas (3), (4), (5), and (6).
2. The linear-spectral parameters of the five clean sound events are each fused with the linear-spectral parameters of the noise using formulas (7) and (8), with g = 0.5.
3. The linear-spectral parameters of the fused noisy sound event models are transformed back through the inverses of formulas (5) and (6) and then of formulas (3) and (4), yielding the five noisy sound event GMMs $\lambda_{yk},\ k = 1, 2, \dots, 5$.
The recognition process of the present invention is as follows:
1. Under the above two noise conditions, a total of 110 noisy signals of the five sound events are recorded. They are pre-filtered and analog-to-digital converted at a sampling frequency of 11025 Hz with 16-bit precision.
2. Framing and windowing: the frame length is 256 samples, the frame shift is 128 samples, and a Hamming window is used. 13-dimensional MFCC features are extracted.
3. Template matching: the feature vector sequence of the current audio signal is matched against the five noisy sound event templates. The feature vector sequence is $X_k,\ k = 1, \dots, N$, and the five templates are $\lambda_{yk},\ k = 1, 2, \dots, 5$. The matching likelihood is computed, and the template with the maximum likelihood is the recognition result. Fig. 3 is a schematic diagram of the recognition results for the five sound events in this embodiment of the sound event recognition method based on improved parallel model combination.
The above is an example of the present invention. Although this example is disclosed for the purpose of illustration, those skilled in the art will appreciate that various substitutions, variations, and modifications are possible without departing from the spirit and scope of the invention and the appended claims. Therefore, the present invention should not be limited to the content of this example.

Claims (9)

1. A sound event recognition method based on improved parallel model combination, whose steps comprise:
1) training a GMM (Gaussian mixture model) on clean sound events and establishing clean sound event templates;
2) training a GMM on noise data and establishing a noise template;
3) applying the parallel model fusion method to the noise template and the clean sound event templates to obtain noisy sound event templates;
4) sampling to obtain a noisy sound event sample signal and recognizing the sound of the sample signal according to the parameters in the noisy sound event templates.
2. The sound event recognition method based on improved parallel model combination of claim 1, characterized in that the templates of the clean sound events are established as follows:
1) the sound event data are recorded in a quiet, noise-free indoor environment, and the recorded sound events are pre-filtered, analog-to-digital converted, and then framed and windowed;
2) MFCC (Mel-frequency cepstral coefficient) features are extracted and the GMM (Gaussian mixture) templates of the sound events are trained.
3. The sound event recognition method based on improved parallel model combination of claim 1, characterized in that the Gaussian mixture model is trained with the EM algorithm, which updates the Gaussian model parameters, and the GMM parameters of the clean sound event obtained by training are $\lambda_x = \{w_{xk}, \mu_{xk}, \Sigma_{xk}\},\ k = 1, 2, \dots, M$, where $w_{xk}$, $\mu_{xk}$, and $\Sigma_{xk}$ are the mixture weight, mean, and variance of the clean sound event model, and $M$ is the number of Gaussian mixture components.
4. The sound event recognition method based on improved parallel model combination of claim 1, characterized in that the noise data of the current environment are acquired in the real noisy indoor environment and the noise template is established by extracting the MFCC features and building the GMM template of the noise, whose parameters are $\lambda_n = \{w_{nk}, \mu_{nk}, \Sigma_{nk}\},\ k = 1, 2, \dots, M$, where $w_{nk}$, $\mu_{nk}$, and $\Sigma_{nk}$ are the mixture weight, mean, and variance of the noise model, and $M$ is the number of Gaussian mixture components.
5. The sound event recognition method based on improved parallel model combination of claim 1, characterized in that the parallel model fusion applied to the noise template and the clean sound event templates is as follows:
(1) the model parameters are mapped from the cepstral domain by the inverse discrete cosine transform, giving the log-spectral-domain mean $\mu^{log} = C^{-1}\mu$ and variance $\Sigma^{log} = C^{-1}\Sigma(C^{-1})^{T}$, where $C$ is the discrete cosine transform matrix and $\mu$, $\Sigma$ are respectively the cepstral-domain mean and variance of the model;
(2) the log-spectral mean and variance are transformed to the linear spectral domain by the exponential function: $\mu_i^{lin} = \exp(\mu_i^{log} + \Sigma_{ii}^{log}/2)$ is the $i$-th element of the linear-spectral mean vector and $\Sigma_{ij}^{lin} = \mu_i^{lin}\mu_j^{lin}[\exp(\Sigma_{ij}^{log}) - 1]$ is the element in row $i$, column $j$ of the linear-spectral covariance matrix, where $\mu_i^{log}$ and $\Sigma_{ij}^{log}$ are the corresponding elements of the log-spectral mean vector and covariance matrix;
(3) the improved parallel model combination fuses the clean sound event model parameters and the noise model parameters in the linear spectral domain: $\mu_{yk}^{lin} = g\,\mu_{xk}^{lin} + (1-g)\sum_{k=1}^{K} w_{nk}\,\mu_{nk}^{lin}$ is the mean and $\Sigma_{yk}^{lin} = g^{2}\,\Sigma_{xk}^{lin} + (1-g)^{2}\sum_{k=1}^{K} w_{nk}\,\Sigma_{nk}^{lin}$ is the variance of the fused noisy sound event model in the linear spectral domain, where $\mu_{xk}^{lin}$ and $\Sigma_{xk}^{lin}$ are the mean and variance of the clean sound event model and $\mu_{nk}^{lin}$ and $\Sigma_{nk}^{lin}$ are the mean and variance of the noise model after the transforms of steps (1) and (2);
(4) the linear-spectral mean and variance of the fused noisy sound event model are transformed back to the log-spectral domain by the inverse of step (2) and then to the cepstral-domain characteristic parameters by the inverse of step (1), giving the mean vector and variance of the noisy sound event model.
6. The sound event recognition method based on improved parallel model combination of claim 1, characterized in that the parameters of the noisy sound event model are $\lambda_y = \{w_{yk}, \mu_{yk}, \Sigma_{yk}\},\ k = 1, 2, \dots, M$, where $w_{yk}$, $\mu_{yk}$, and $\Sigma_{yk}$ are respectively its mixture weight, mean, and variance; since the mixture weights do not differ between the linear spectral, log-spectral, and cepstral domains, the mixture weight $w_{yk}$ of the noisy sound event model is the weight $w_{xk}$ of the clean sound event template, and $M$ is the number of Gaussian mixture components.
7. The sound event recognition method based on improved parallel model combination of any one of claims 1-6, characterized in that the noise data use the babble noise of NoiseX-92 and/or air-conditioning noise in an indoor environment.
8. The sound event recognition method based on improved parallel model combination of any one of claims 1-6, characterized in that the extracted features are Mel-frequency cepstral coefficients (MFCC).
9. The sound event recognition method based on improved parallel model combination of claim 1, characterized in that the method of recognizing the sound of the sample signal according to the parameters in the noisy sound event models is as follows:
1) the sample signal is pre-filtered and analog-to-digital converted, then framed and windowed, and multi-dimensional MFCC features are extracted to obtain the feature sequence of the sample signal;
2) the feature vector sequence of the sample signal is matched against the noisy sound event models, the matching likelihood is computed, and the template with the maximum likelihood is the recognition result.
CN201310239724.7A 2013-05-08 2013-06-17 A sound event recognition method based on improved parallel model combination Expired - Fee Related CN103310789B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310239724.7A CN103310789B (en) 2013-05-08 2013-06-17 A sound event recognition method based on improved parallel model combination

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201310166660.2 2013-05-08
CN201310166660 2013-05-08
CN2013101666602 2013-05-08
CN201310239724.7A CN103310789B (en) 2013-05-08 2013-06-17 A sound event recognition method based on improved parallel model combination

Publications (2)

Publication Number Publication Date
CN103310789A true CN103310789A (en) 2013-09-18
CN103310789B CN103310789B (en) 2016-04-06

Family

ID=49135932

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310239724.7A Expired - Fee Related CN103310789B (en) 2013-05-08 2013-06-17 A sound event recognition method based on improved parallel model combination

Country Status (1)

Country Link
CN (1) CN103310789B (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6876966B1 (en) * 2000-10-16 2005-04-05 Microsoft Corporation Pattern recognition training method and apparatus using inserted noise followed by noise reduction
CN1397929A (en) * 2002-07-12 2003-02-19 清华大学 Speech intensifying-characteristic weighing-logrithmic spectrum addition method for anti-noise speech recognization
US20040138882A1 (en) * 2002-10-31 2004-07-15 Seiko Epson Corporation Acoustic model creating method, speech recognition apparatus, and vehicle having the speech recognition apparatus
CN1819019A (en) * 2006-03-13 2006-08-16 华南理工大学 Phonetic identifier based on matrix characteristic vector function and identification thereof
CN102426837A (en) * 2011-12-30 2012-04-25 中国农业科学院农业信息研究所 Robustness method used for voice recognition on mobile equipment during agricultural field data acquisition

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104485108A (en) * 2014-11-26 2015-04-01 河海大学 Noise and speaker combined compensation method based on multi-speaker model
CN105657338A (en) * 2014-12-02 2016-06-08 深圳大学 Internet based remote mobile terminal control system and control method
CN104408440B (en) * 2014-12-10 2017-10-17 重庆邮电大学 A kind of facial expression recognizing method merged based on two step dimensionality reductions and Concurrent Feature
CN104408440A (en) * 2014-12-10 2015-03-11 重庆邮电大学 Identification method for human facial expression based on two-step dimensionality reduction and parallel feature fusion
CN105118516A (en) * 2015-09-29 2015-12-02 浙江图维电力科技有限公司 Identification method of engineering machinery based on sound linear prediction cepstrum coefficients (LPCC)
CN105405447A (en) * 2015-10-27 2016-03-16 航宇救生装备有限公司 Telephone transmitter respiration noise shielding method
CN105405447B (en) * 2015-10-27 2019-05-24 航宇救生装备有限公司 One kind sending words respiratory noise screen method
CN107492153A (en) * 2016-06-07 2017-12-19 腾讯科技(深圳)有限公司 Attendance checking system, method, work attendance server and attendance record terminal
CN106340292A (en) * 2016-09-08 2017-01-18 河海大学 Voice enhancement method based on continuous noise estimation
CN106340292B (en) * 2016-09-08 2019-08-20 河海大学 A kind of sound enhancement method based on continuing noise estimation
CN108922518A (en) * 2018-07-18 2018-11-30 苏州思必驰信息科技有限公司 voice data amplification method and system
WO2020029332A1 (en) * 2018-08-09 2020-02-13 厦门亿联网络技术股份有限公司 Rnn-based noise reduction method and device for real-time conference
CN109631104A (en) * 2018-11-01 2019-04-16 广东万和热能科技有限公司 Air quantity Automatic adjustment method, device, equipment and the storage medium of kitchen ventilator
CN109472311A (en) * 2018-11-13 2019-03-15 北京物灵智能科技有限公司 A kind of user behavior recognition method and device
CN110120230A (en) * 2019-01-08 2019-08-13 国家计算机网络与信息安全管理中心 A kind of acoustic events detection method and device
CN110120230B (en) * 2019-01-08 2021-06-01 国家计算机网络与信息安全管理中心 Acoustic event detection method and device
CN110544469A (en) * 2019-09-04 2019-12-06 秒针信息技术有限公司 Training method and device of voice recognition model, storage medium and electronic device
CN110544469B (en) * 2019-09-04 2022-04-19 秒针信息技术有限公司 Training method and device of voice recognition model, storage medium and electronic device
CN110838306A (en) * 2019-11-12 2020-02-25 广州视源电子科技股份有限公司 Voice signal detection method, computer storage medium and related equipment
CN110838306B (en) * 2019-11-12 2022-05-13 广州视源电子科技股份有限公司 Voice signal detection method, computer storage medium and related equipment
CN113112681A (en) * 2020-01-13 2021-07-13 阿里健康信息技术有限公司 Vending equipment, and shipment detection method and device
CN111028841A (en) * 2020-03-10 2020-04-17 深圳市友杰智新科技有限公司 Method and device for awakening system to adjust parameters, computer equipment and storage medium
CN111711881A (en) * 2020-06-29 2020-09-25 深圳市科奈信科技有限公司 Self-adaptive volume adjustment method according to environmental sound and wireless earphone
CN111711881B (en) * 2020-06-29 2022-02-18 深圳市科奈信科技有限公司 Self-adaptive volume adjustment method according to environmental sound and wireless earphone
CN112820318A (en) * 2020-12-31 2021-05-18 西安合谱声学科技有限公司 Impact sound model establishment and impact sound detection method and system based on GMM-UBM

Also Published As

Publication number Publication date
CN103310789B (en) 2016-04-06

Similar Documents

Publication Publication Date Title
CN103310789A (en) Sound event recognition method based on optimized parallel model combination
Rabaoui et al. Using one-class SVMs and wavelets for audio surveillance
Chachada et al. Environmental sound recognition: A survey
Cowling et al. Comparison of techniques for environmental sound recognition
Wang et al. Robust environmental sound recognition for home automation
CN101136199B (en) Voice data processing method and equipment
Stowell et al. Birdsong and C4DM: A survey of UK birdsong and machine recognition for music researchers
Souli et al. Audio sounds classification using scattering features and support vectors machines for medical surveillance
CN104795064A (en) Recognition method for sound event under scene of low signal to noise ratio
CN106024010A (en) Speech signal dynamic characteristic extraction method based on formant curves
Todkar et al. Speaker recognition techniques: A review
Nishida et al. Unsupervised speaker indexing using speaker model selection based on Bayesian information criterion
Maganti et al. Unsupervised speech/non-speech detection for automatic speech recognition in meeting rooms
Gupta et al. Automatic speech recognition technique for voice command
Nyodu et al. Automatic identification of Arunachal language using K-nearest neighbor algorithm
Ravindran et al. Improving the noise-robustness of mel-frequency cepstral coefficients for speech processing
Biagetti et al. Robust speaker identification in a meeting with short audio segments
Venkatesan et al. Deep recurrent neural networks based binaural speech segregation for the selection of closest target of interest
Camarena-Ibarrola et al. Speaker identification through spectral entropy analysis
Suzuki et al. MFCC enhancement using joint corrupted and noise feature space for highly non-stationary noise environments
Yue et al. Speaker age recognition based on isolated words by using SVM
Therese et al. A linear visual assessment tendency based clustering with power normalized cepstral coefficients for audio signal recognition system
Fang et al. A generalized denoising method with an optimized loss function for automated bird sound recognition
Jun A speaker recognition system based on MFCC and SCHMM
Ai et al. Application of hierarchical clustering analysis for vocal feature extraction

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160406

Termination date: 20170617