WO2004072953A1 - Method for reducing computational quantity in utterance verification using anti-phoneme model - Google Patents

Method for reducing computational quantity in utterance verification using anti-phoneme model

Info

Publication number
WO2004072953A1
Authority
WO
WIPO (PCT)
Prior art keywords
phoneme
classes
phonemes
phoneme model
model
Prior art date
Application number
PCT/KR2003/000863
Other languages
French (fr)
Inventor
Soon-Hyob Kim
Ho-Jun Lee
Original Assignee
Speechsoundnet Co., Ltd.
Institute Information Technology Assessment
Kwangwoon Foundation
Priority date
Filing date
Publication date
Application filed by Speechsoundnet Co., Ltd., Institute Information Technology Assessment and Kwangwoon Foundation
Priority to AU2003223135A priority Critical patent/AU2003223135A1/en
Publication of WO2004072953A1 publication Critical patent/WO2004072953A1/en

Classifications

    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B44: DECORATIVE ARTS
    • B44D: PAINTING OR ARTISTIC DRAWING, NOT OTHERWISE PROVIDED FOR; PRESERVING PAINTINGS; SURFACE TREATMENT TO OBTAIN SPECIAL ARTISTIC SURFACE EFFECTS OR FINISHES
    • B44D3/00: Accessories or implements for use in connection with painting or artistic drawing, not otherwise provided for; Methods or devices for colour determination, selection, or synthesis, e.g. use of colour tables
    • B44D3/02: Palettes
    • B44D3/12: Paint cans; Brush holders; Containers for storing residual paint
    • B44D3/127: Covers or lids for paint cans
    • B44D3/14: Holders for paint cans
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/02: Feature extraction for speech recognition; Selection of recognition unit
    • G10L15/08: Speech classification or search
    • G10L2015/025: Phonemes, fenemes or fenones being the recognition units

Abstract

Disclosed is a method for reducing the computational quantity of utterance verification using an anti-phoneme model, intended to reduce scenario errors caused by misrecognition in a speech recognition application system. In the method, a plurality of phonemes are arranged, and the distances between the phonemes are measured using the Bhattacharyya distance measure. The phonemes are merged one by one, starting from the most similar phoneme, by Agglomerative Hierarchical Clustering, which groups the anti-phoneme models into nine classes, each containing similar phonemes. During utterance verification, a degree of similarity with respect to an uttered phoneme is calculated based on the anti-phoneme model classes grouped into the nine classes.

Description

METHOD FOR REDUCING COMPUTATIONAL QUANTITY IN UTTERANCE VERIFICATION USING ANTI-PHONEME MODEL
Technical Field
The present invention relates to a method for reducing the computational load of utterance verification using an anti-phoneme model. More specifically, the present invention relates to such a method intended to reduce scenario errors caused by misrecognition in a speech recognition application system.
Background Art
Speech recognition refers to a machine's capability to understand human speech and to perform a task according to that speech.
With the development of computers and information technology, people can easily obtain information remotely, without physical effort. Accordingly, speech recognition devices, that is, systems that operate according to a spoken command, have been developed.
Various application systems based on such speech recognition have also been developed. One of them is a system that provides desired information in response to an utterance spoken by the user.
Assume, for example, that there is a telephone directory guide system for a group of organizations. When a user utters the name of a department of one of the organizations as speech, the speech recognition system displays the telephone number of the corresponding department. Such a system is one kind of speech recognition application. Hereinafter, a conventional speech recognition system and utterance verification system will be described with reference to FIG. 9. FIG. 9 illustrates a conventional speech recognition system and utterance verification system. When a user speaks a desired utterance, various parameters of the corresponding speech signal are extracted in a preprocessing step and input to an Automatic Speech Recognition (ASR) system. The registered vocabulary and a phoneme model are also input to the ASR system. The ASR system then recognizes the speech signal and performs a post-process that either rejects or accepts (approves) the recognition result. This post-process is called the utterance verification step.
That is, the utterance verification step determines whether the speech input to the ASR system should be rejected or accepted (approved).
According to a conventional method of rejecting incorrect input by utterance verification, an anti-phoneme model is formed from mono-phoneme models. The recognition result is analyzed in a post-process of the recognition engine (program), and frame-level phoneme label information is extracted. Then, based on the extracted label information, an anti-phoneme model class is formed from all mono-phoneme models except the mono-phoneme model representing each frame. Conventionally, as shown in FIG. 10, the anti-phoneme models are derived from the mono-phoneme models produced during word recognition training. Here, the total number of mono-phoneme models is forty-five, and the total number of anti-phoneme models is forty-four.
In FIG. 10, the total number of phonemes is essentially identical with the number of anti-phoneme models for each phoneme. The reliance (confidence) of the uttered speech is calculated from these anti-phoneme models and evaluated in the ASR system.
For example, as shown in FIG. 11, when a user utters "Kwang woon university", each mono-phoneme is represented by feature vectors. The first arranged mono-phoneme, K (ㄱ), is compared with its anti-phoneme model (the remaining 44 phonemes), and the reliance of the uttered speech is calculated.
That is, in order to express the alternative hypothesis for the reliance calculation, the model most similar to the feature parameters of each frame is searched within the anti-phoneme model class over all frames, and the reliance of the uttered speech is calculated from the most similar model found, so that the input speech can be verified.
Minor recognition errors are detected from this reliance, and the discrimination between registered words and unregistered words is decided based on the reliance of the uttered speech. The reliance represents the relative degree of similarity between the recognized model and the competing (unrecognized) models. The models searched as competitors of each model are called anti-phoneme models.
When the anti-phoneme models are searched in order to compute the reliance, the computational load increases in proportion to the length of the recognized speech and to the number of similar phoneme units. Consequently, the computation takes a long time, which results in a long response time.
In detail, since the 44 anti-phoneme models are compared sequentially against each mono-phoneme, from the first arranged mono-phoneme to the last, the computation takes a considerably long time. As described above, in the conventional rejection method for incorrect input using utterance verification, all similar phoneme areas are searched, so the computational load grows in proportion to the number of similar phoneme units.
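For illustration only, the following Python sketch shows one way the conventional frame-level reliance computation described above could be expressed: each frame of the recognized phoneme is scored against every remaining mono-phoneme model, and the best-scoring competitor serves as the alternative hypothesis of a log-likelihood ratio. The diagonal-Gaussian scoring, the log-likelihood-ratio form of the reliance, and all identifiers are assumptions made for this sketch; the patent itself contains no code.

    import numpy as np

    def log_gaussian(x, mean, var):
        # Log density of a diagonal-covariance Gaussian (an assumed model form).
        x, mean, var = map(np.asarray, (x, mean, var))
        return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

    def frame_reliance_full_search(frame, recognized_model, anti_models):
        # Log-likelihood ratio of the recognized phoneme model against the best
        # of all remaining mono-phoneme models (44 comparisons per frame).
        null_score = log_gaussian(frame, *recognized_model)
        alt_score = max(log_gaussian(frame, *m) for m in anti_models.values())
        return null_score - alt_score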
Disclosure of the Invention
Therefore, it is an object of the present invention to provide a method capable of significantly reducing the computational load of utterance verification using an anti-phoneme model, and thereby obtaining a high computational speed, by measuring the distances between phonemes with the Bhattacharyya distance measuring method and forming anti-phoneme model classes of similar phonemes with Agglomerative Hierarchical Clustering.
According to the present invention, there is provided a method for reducing the computational load of utterance verification using an anti-phoneme model, the method comprising the steps of: arranging a plurality of phonemes; measuring the distances between the phonemes by using the Bhattacharyya distance measuring method; merging the phonemes one by one, starting from the phoneme having the greatest degree of similarity, to perform Agglomerative Hierarchical Clustering; forming anti-phoneme model classes which are grouped into nine classes by the Agglomerative Hierarchical Clustering, each of the nine classes containing similar phonemes; and
computing a degree of similarity with respect to an uttered phoneme, during utterance verification, based on the anti-phoneme model classes which are grouped into the nine classes. In a preferred embodiment of the present invention, the nine anti-phoneme model classes include: {ti (final sound), τ= (final sound), ~ι (final sound), T-, Ξ (final sound)}, μ-, -i , -, T, T^} H, -11 , 4 , , Ξ (initial sound), -sr},
{o, ϋ, },
{π _ t. , π (initial sound), "6" (initial sound), ^ (initial sound)}, {-l , *, *κ, },
{mj, re, , Λ, H, n, t (initial sound), t. (initial sound)}, (TT, ^, =.1 , -] }, and
{ > , JT TI, C }.
In a more preferred embodiment of the present invention, when the anti-phoneme model is searched during the utterance verification, only the classes containing the recognized phoneme among the nine classes are searched, which reduces the computational load of the similarity calculation and increases its speed. The Bhattacharyya distance measuring method measures the distance between two Gaussian distributions using the following equation:
D_bhat = (1/8)(μ2 − μ1)^T [(Σ1 + Σ2)/2]^(−1) (μ2 − μ1) + (1/2) ln( det((Σ1 + Σ2)/2) / sqrt(det(Σ1) det(Σ2)) ),

where μ1, Σ1 and μ2, Σ2 are the mean vectors and covariance matrices of the two Gaussian distributions.
Brief Description of the Drawings
The foregoing and other objects, features and advantages of the present invention will become more apparent from the following detailed description when taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a block diagram showing the configurations of a speech recognition system and an utterance verification system according to an embodiment of the present invention;
FIG. 2 is a block diagram illustrating an anti-phoneme model forming method according to an embodiment of the present invention;
FIG. 3 is a flow chart of a phoneme classifying procedure using the Bhattacharyya distance measuring method and Agglomerative Hierarchical Clustering to form an anti-phoneme model according to an embodiment of the present invention;
FIG. 4 is a view showing a phoneme classification tree built with Agglomerative Hierarchical Clustering and the Bhattacharyya distance measuring method to form an anti-phoneme model according to an embodiment of the present invention;
FIG. 5 is a view showing the final anti-phoneme model classes formed by the Agglomerative Hierarchical Clustering and the Bhattacharyya distance measuring method used in an embodiment of the present invention;
FIG. 6 is a view showing the execution of utterance verification using an anti-phoneme model according to the present invention;
FIG. 7 is a view showing a performance evaluation reference of a conventional utterance verification function;
FIG. 8 is a view showing a performance evaluation result of an utterance verification function according to the present invention;
FIG. 9 is a view illustrating a conventional speech recognition system and utterance verification system;
FIG. 10 is a view illustrating a conventional anti-phoneme model forming method used in utterance verification;
FIG. 11 is a view illustrating an utterance verification method using the conventional anti-phoneme model forming method;
FIG. 12 is a view showing the performance of utterance verification using the conventional method; and
FIG. 13 is a view comparing the performance, according to threshold values, of the method of the present invention and the conventional method.
Best Mode for Carrying Out the Invention
Reference will now be made in detail to the preferred embodiments of the present invention.
FIG. 1 is a block diagram showing configurations of a speech recognition system and an utterance verification system according to an embodiment of the present invention. FIG. 2 is a block diagram for illustrating an anti-phoneme model forming method according to an embodiment of the present invention.
When a user speaks desired utterances, various parameters of a speech signal corresponding to the utterances are preprocessed and inputted to an ASR system. Registered vocabulary and phoneme models are also inputted to the ASR system.
Then, the ASR system recognizes the corresponding speech signal and performs a post-process which either rejects or accepts (approves) it. This post-process is called the utterance verification step.
That is, the utterance verification step determines whether the speech input to the ASR system should be rejected or accepted (approved). In general, the anti-phoneme model is formed as a class of all phoneme models other than the recognized phoneme. In order to calculate the alternative hypothesis, the phoneme model having the greatest degree of similarity among the anti-phoneme models is searched. The alternative hypothesis is the probability that the recognized phoneme is a wrong phoneme.
Conventionally, when 46 mono-phoneme models are used for utterance verification, 45 similarity computations are required. Accordingly, in order to obtain the reliance (degree of similarity) of the uttered speech signal, the computational load of searching the anti-phoneme models increases in proportion to the length of the recognized speech and the number of similar phoneme units. Consequently, the computation takes a long time, leading to a long response time.
Therefore, the present invention is characterized by merging the mono-phoneme models extracted during recognition model training one by one, starting from the phoneme having the highest degree of similarity, using the Bhattacharyya distance measuring method and Agglomerative Hierarchical Clustering. Accordingly, when the anti-phoneme model is searched during utterance verification, only the clusters containing the recognized phonemes among the previously classified clusters are searched, so the number of computations for the degree of similarity is reduced from 5 to 3. This reduces the computational load and increases the computational speed.
That is, the present invention searches for similar phonemes within the classified anti-phoneme model classes, obtained with the Bhattacharyya distance measuring method and Agglomerative Hierarchical Clustering, thereby greatly reducing the computational load of the similarity calculation and increasing the computational speed. FIG. 3 is a flow chart of the phoneme classifying procedure using the Bhattacharyya distance measuring method and Agglomerative Hierarchical Clustering to form the anti-phoneme model according to the present invention.
The phoneme classifying procedure includes the following steps (a code sketch of these steps is given after this paragraph):
1) arranging a plurality of phonemes;
2) sequentially comparing each phoneme with the remaining phonemes using the Bhattacharyya distance measuring method, and clustering the phonemes into a plurality of classes such that phonetically similar phonemes belong to the same class;
3) finding the minimum distance between the phonemes (or clusters) and merging them using Agglomerative Hierarchical Clustering; and
4) repeating steps (2) and (3) until the number of clusters reaches the desired number.
The present invention uses Agglomerative Hierarchical Clustering to reduce the anti-phoneme model search: the anti-phoneme models similar to a recognized phoneme model are formed into a similar-phoneme class, thereby reducing the number of searches and the search time.
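Under simplifying assumptions, the following Python sketch illustrates the clustering procedure listed above: every phoneme starts as its own cluster, and the two closest clusters are merged repeatedly until nine classes remain. The single-linkage merge rule, the representation of each phoneme by a (mean, covariance) pair, and all identifiers are assumptions made for this sketch rather than details taken from the patent.

    def cluster_phonemes(models, distance_fn, num_classes=9):
        # models: dict mapping a phoneme symbol to its (mean, covariance) pair.
        # distance_fn: distance between two such pairs (e.g. a Bhattacharyya distance).
        # Returns a list of phoneme sets, i.e. the anti-phoneme model classes.
        clusters = [{p} for p in models]  # start with one cluster per phoneme

        def cluster_distance(a, b):
            # Single-linkage: distance of the closest pair of members (an assumption).
            return min(distance_fn(models[x], models[y]) for x in a for y in b)

        while len(clusters) > num_classes:
            # Find the two closest clusters and merge them.
            _, i, j = min(
                (cluster_distance(clusters[i], clusters[j]), i, j)
                for i in range(len(clusters))
                for j in range(i + 1, len(clusters))
            )
            clusters[i] |= clusters[j]
            del clusters[j]
        return clusters

With the Bhattacharyya function sketched further below, this could be invoked as, for example, cluster_phonemes(models, lambda a, b: bhattacharyya_distance(*a, *b)).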
The present invention could simply use the N most similar phoneme models. However, a class formed from a fixed number N of phoneme models cannot adapt to the varying number of similar phonemes, which differs from phoneme to phoneme according to its features. Agglomerative Hierarchical Clustering is an unsupervised clustering method that groups similar phonemes and forms a hierarchical classification; it forms similar-phoneme classes based on the features of the phonemes. The present invention uses the Bhattacharyya distance measuring method as the distance measure. The Bhattacharyya distance measures the distance between two Gaussian distributions. Because its computation is simple and it provides a bound on the error rather than an exact computation of the distance, it offers flexibility.
The Bhattacharyya's distance measuring method measures the distance between two Gaussian distributions using the equation:

D_bhat = (1/8)(μ2 − μ1)^T [(Σ1 + Σ2)/2]^(−1) (μ2 − μ1) + (1/2) ln( det((Σ1 + Σ2)/2) / sqrt(det(Σ1) det(Σ2)) ),

and a bound on the error between the two Gaussian distributions is expressed by

ε ≤ sqrt(P1 P2) exp(−D_bhat),

where P1 and P2 are the prior probabilities of the two distributions.
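As a concrete illustration of the equation above, here is a small Python sketch of the Bhattacharyya distance between two Gaussian models and the corresponding error bound. The function and variable names are chosen for this sketch only; the patent itself contains no code.

    import numpy as np

    def bhattacharyya_distance(mu1, S1, mu2, S2):
        # Bhattacharyya distance between the Gaussians N(mu1, S1) and N(mu2, S2).
        mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
        S1, S2 = np.asarray(S1, float), np.asarray(S2, float)
        S = 0.5 * (S1 + S2)                              # averaged covariance
        diff = mu2 - mu1
        term1 = 0.125 * diff @ np.linalg.solve(S, diff)  # mean-separation term
        _, logdet_S = np.linalg.slogdet(S)
        _, logdet_S1 = np.linalg.slogdet(S1)
        _, logdet_S2 = np.linalg.slogdet(S2)
        term2 = 0.5 * (logdet_S - 0.5 * (logdet_S1 + logdet_S2))
        return term1 + term2

    def error_bound(p1, p2, d_bhat):
        # Bound on the classification error: eps <= sqrt(P1 * P2) * exp(-D_bhat).
        return np.sqrt(p1 * p2) * np.exp(-d_bhat)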
FIG. 4 shows the phoneme classification tree obtained with Agglomerative Hierarchical Clustering and the Bhattacharyya distance measuring method to form the anti-phoneme model according to the present invention. In the tree, phonetically similar phonemes are grouped together.
FIG. 5 shows the final anti-phoneme model classes formed by the Agglomerative Hierarchical Clustering and the Bhattacharyya distance measuring method used in the present invention. As shown in FIG. 5, the anti-phoneme models are grouped into nine classes. Preferably, the nine classes include:
{ ti (final sound), t_: (final sound), ~ι (final sound), 1-, ≡ (final sound)}, P-, 4 , - T, )}
{τ-11, "11 , , Y , Ξ (initial sound), -&}, {o, ^, 1 ,-rϊ},
{π , ^ , (initial sound), ~& (initial sound), ^ (initial sound)}, {=ι , X-, , },
[m, ιx. , A, E, jr . t-: (initial sound), ti (initial sound)}, {IT, -^, 41 , -l }, and { h 4, TI, τ=}.
FIG. 6 shows the execution of utterance verification using the anti-phoneme model according to the present invention. The utterance verification is performed with the anti-phoneme model classes above.
For example, when the user utters "Kwang woon university", each mono-phoneme is represented by feature vectors. The first arranged mono-phoneme, K (ㄱ), is compared only with the anti-phoneme models included in its class (class E of FIG. 5), and the reliance of the uttered speech is calculated.
That is, the first arranged mono-phoneme, K (ㄱ), is compared with {ti , π (initial sound), "S" , ^ (initial sound)}, and the reliance of the uttered speech is calculated.
As described above, when the anti-phoneme model is searched during the utterance verification, only the clusters containing the recognized phonemes among the previously classified clusters are searched, so the number of computations for the degree of similarity is reduced from 5 to 3, thereby reducing the computational load while increasing the computational speed.
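For illustration, a minimal Python sketch of this class-restricted search follows, reusing the log_gaussian helper and the log-likelihood-ratio form of the reliance assumed in the earlier sketch; the class table and all identifiers are likewise assumptions, not details taken from the patent.

    def frame_reliance_class_search(frame, phoneme, models, phoneme_classes):
        # Score one frame against only the anti-phoneme models in the class that
        # contains the recognized phoneme, instead of against all other models.
        # phoneme_classes: list of sets of phoneme symbols (the nine classes).
        # models: dict mapping a phoneme symbol to its (mean, variance) pair.
        own_class = next(c for c in phoneme_classes if phoneme in c)
        null_score = log_gaussian(frame, *models[phoneme])
        alt_score = max(log_gaussian(frame, *models[p])
                        for p in own_class if p != phoneme)
        return null_score - alt_score

An utterance-level reliance could then be obtained by accumulating such frame scores over all frames of the recognized word, exactly as in the full-search case, but with far fewer model evaluations per frame.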
FIG. 7 shows a performance evaluation reference of a conventional utterance verification function, and FIG. 8 shows the performance evaluation result of the utterance verification function according to the present invention. FIG. 12 shows the performance of utterance verification using the conventional method, and FIG. 13 compares the performance of the method of the present invention and the conventional method according to threshold values. When the performance result of FIG. 8 is compared with that of FIG. 12, the total recognition ratio of the present invention is only slightly lower than that of the conventional method. The difference falls within a range that has a great influence on the recognition ratio.
Industrial Applicability
As seen from the foregoing, according to the method of the present invention for reducing the computational load of utterance verification using an anti-phoneme model, the computational load during the search for similar phonemes is reduced by more than 50% by forming the anti-phoneme model with Agglomerative Hierarchical Clustering and the Bhattacharyya distance measuring method, in an utterance verification function that reduces scenario errors caused by incorrect recognition in a speech recognition application system. Also, because only a limited area is searched, the effect of a change of the threshold value is minimized. Furthermore, by minimizing the computation required during utterance verification, the present invention makes it practical to use utterance verification to minimize scenario errors caused by incorrect recognition in the actual field, thereby providing a more convenient interface to the user.
While this invention has been described in connection with what is presently considered to be the most practical and preferred embodiment, it is to be understood that the invention is not limited to the disclosed embodiment and the drawings, but to the contrary, it is intended to cover various modifications and variations within the spirit and scope of the appended claims.

Claims

What is claimed is:
1. A method for reducing the computational load of utterance verification using an anti-phoneme model, the method comprising the steps of: arranging a plurality of phonemes; measuring distances between the phonemes by using a Bhattacharyya distance measuring method; merging the phonemes one by one, starting from the phoneme having the greatest degree of similarity, to perform Agglomerative Hierarchical Clustering; generating anti-phoneme model classes which are grouped into nine classes by the Agglomerative Hierarchical Clustering, each of said nine classes containing similar phonemes; and computing a degree of similarity with respect to an uttered phoneme, during utterance verification, based on the anti-phoneme model classes which are grouped into the nine classes.
2. The method according to claim 1, wherein said nine classes and said anti- phoneme model classes classified into said nine classes include: {u (final sound), ^ (final sound), ~ι (final sound), L, s (final sound)}, μ-, -. , -, T, i}
H, "fl , 4 , , Ξ (initial sound), "&},
{ , y , (initial sound), "er (initial sound), A (initial sound)}, {=ι , *, , A }, [m, v , , , ≡, -si, τ=: (initial sound), ti (initial sound)}, {TT, >-, =fl , -l }, and { V , , TI, t=}.
3. The method according to claim 1 or 2, wherein, when the anti-phoneme model is searched during the utterance verification, only classes containing a recognized phoneme among said nine classes are searched, so as to reduce the computational load and increase the speed of the similarity computation.
4. The method according to claim 1 or 2, wherein said Bhattacharyya's distance measuring method measures a distance between two Gaussian distributions using the following equation:

D_bhat = (1/8)(μ2 − μ1)^T [(Σ1 + Σ2)/2]^(−1) (μ2 − μ1) + (1/2) ln( det((Σ1 + Σ2)/2) / sqrt(det(Σ1) det(Σ2)) ),

and a bound on the error between the two Gaussian distributions is expressed by ε ≤ sqrt(P1 P2) exp(−D_bhat).
PCT/KR2003/000863 2003-02-12 2003-04-29 Method for reducing computational quantity amount utterrance verification using anti-phoneme model WO2004072953A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2003223135A AU2003223135A1 (en) 2003-02-12 2003-04-29 Method for reducing computational quantity amount utterrance verification using anti-phoneme model

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2003-0008685 2003-02-12
KR10-2003-0008685A KR100492089B1 (en) 2003-02-12 2003-02-12 Method for reducing compute quantity amount uttrrance verification using anti-phoneme model

Publications (1)

Publication Number Publication Date
WO2004072953A1 true WO2004072953A1 (en) 2004-08-26

Family

ID=32866879

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2003/000863 WO2004072953A1 (en) 2003-02-12 2003-04-29 Method for reducing computational quantity amount utterrance verification using anti-phoneme model

Country Status (3)

Country Link
KR (1) KR100492089B1 (en)
AU (1) AU2003223135A1 (en)
WO (1) WO2004072953A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4803729A (en) * 1987-04-03 1989-02-07 Dragon Systems, Inc. Speech recognition method
JP2000105597A (en) * 1998-09-29 2000-04-11 Atr Interpreting Telecommunications Res Lab Speech recognition error correction device
KR20000025827A (en) * 1998-10-14 2000-05-06 이계철 Method for constructing anti-phone model in speech recognition system and method for verifying phonetic
KR20020045960A (en) * 2000-12-12 2002-06-20 이계철 Method for performance improvement of keyword detection in speech recognition
US6526379B1 (en) * 1999-11-29 2003-02-25 Matsushita Electric Industrial Co., Ltd. Discriminative clustering methods for automatic speech recognition

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4803729A (en) * 1987-04-03 1989-02-07 Dragon Systems, Inc. Speech recognition method
JP2000105597A (en) * 1998-09-29 2000-04-11 Atr Interpreting Telecommunications Res Lab Speech recognition error correction device
KR20000025827A (en) * 1998-10-14 2000-05-06 이계철 Method for constructing anti-phone model in speech recognition system and method for verifying phonetic
US6526379B1 (en) * 1999-11-29 2003-02-25 Matsushita Electric Industrial Co., Ltd. Discriminative clustering methods for automatic speech recognition
KR20020045960A (en) * 2000-12-12 2002-06-20 이계철 Method for performance improvement of keyword detection in speech recognition

Also Published As

Publication number Publication date
AU2003223135A1 (en) 2004-09-06
AU2003223135A8 (en) 2004-09-06
KR100492089B1 (en) 2005-06-02
KR20040072989A (en) 2004-08-19

Similar Documents

Publication Publication Date Title
US7657432B2 (en) Speaker recognition method based on structured speaker modeling and a scoring technique
Kamppari et al. Word and phone level acoustic confidence scoring
US8271283B2 (en) Method and apparatus for recognizing speech by measuring confidence levels of respective frames
Sainath et al. Exemplar-based sparse representation features: From TIMIT to LVCSR
US5717826A (en) Utterance verification using word based minimum verification error training for recognizing a keyboard string
US5822729A (en) Feature-based speech recognizer having probabilistic linguistic processor providing word matching based on the entire space of feature vectors
US7966183B1 (en) Multiplying confidence scores for utterance verification in a mobile telephone
US5745649A (en) Automated speech recognition using a plurality of different multilayer perception structures to model a plurality of distinct phoneme categories
US20120316879A1 (en) System for detecting speech interval and recognizing continous speech in a noisy environment through real-time recognition of call commands
Chou Discriminant-function-based minimum recognition error rate pattern-recognition approach to speech recognition
Ma et al. A support vector machines-based rejection technique for speech recognition
US20060287856A1 (en) Speech models generated using competitive training, asymmetric training, and data boosting
US8229744B2 (en) Class detection scheme and time mediated averaging of class dependent models
Sukkar et al. A two pass classifier for utterance rejection in keyword spotting
Zhao A speaker-independent continuous speech recognition system using continuous mixture Gaussian density HMM of phoneme-sized units
McDermott et al. Prototype-based minimum classification error/generalized probabilistic descent training for various speech units
JP4769098B2 (en) Speech recognition reliability estimation apparatus, method thereof, and program
McDermott et al. String-level MCE for continuous phoneme recognition.
Sukkar Rejection for connected digit recognition based on GPD segmental discrimination
WO2004072953A1 (en) Method for reducing computational quantity amount utterrance verification using anti-phoneme model
Shinozaki et al. Gaussian mixture optimization based on efficient cross-validation
KR20020045960A (en) Method for performance improvement of keyword detection in speech recognition
Anguita et al. Detection of confusable words in automatic speech recognition
Fujimura et al. Simultaneous Flexible Keyword Detection and Text-dependent Speaker Recognition for Low-resource Devices.
Kaewtip et al. A Hierarchical Classification Framework for Phonemes and Broad Phonetic Groups (BPGs): a Discriminative Template-Based Approach

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NI NO NZ OM PH PL PT RO RU SC SD SE SG SK SL TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
NENP Non-entry into the national phase

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP