US20140095162A1 - Hierarchical methods and apparatus for extracting user intent from spoken utterances - Google Patents


Publication number
US20140095162A1
US20140095162A1 (Application US14/043,647)
Authority
US
United States
Prior art keywords
semantic analysis
level
classification
sub
class
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US14/043,647
Inventor
Dimitri Kanevsky
Joseph Simon Reisinger
Roberto Sicconi
Mahesh Viswanathan
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nuance Communications Inc
Original Assignee
Nuance Communications Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nuance Communications Inc
Priority to US14/043,647
Publication of US20140095162A1
Status: Abandoned

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/08 Speech classification or search
    • G10L 15/18 Speech classification or search using natural language modelling
    • G10L 15/1822 Parsing for meaning understanding
    • G10L 15/1815 Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/226 Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics

Definitions

  • the present invention relates generally to speech processing systems and, more particularly, to systems for hierarchically extracting user intent from spoken utterances, such as spoken instructions or commands.
  • The use of a speech recognition system (or voice system) to translate a user's spoken command to a precise text command that the target system can input and process is well known. For example, in a conventional vehicle-based voice system, a user (e.g., the driver) interacts by uttering very specific commands consistent with the machine-based grammar understood by the target system. By way of example, assume that the climate control system in the vehicle is the target system.
  • the user of a conventional voice system may typically have to utter several predetermined machine-based grammar commands, such as the command “climate control” followed by the command “air conditioner” followed by the command “decrease temperature” followed by the command “five degrees.”
  • One approach that attempts to overcome the machine-based grammar problem is to use a single-stage front end action classifier that detects a very general subject from the user's speech, which is then provided to a human operator for further intent determination. This is typically the approach used in the General Motors' OnStar™ system. However, a major problem with this approach is that a human operator is required.
  • Another approach is to build a full-fledged statistical parser, which takes the input as transcribed and builds a parse tree which is mined later to extract intent.
  • One major difficulty in this second approach is that statistical parsers are huge in terms of storage requirements. Further, they require hand-tuning in every step. That is, every time data is added, the statistical parser requires a tremendous amount of hand-tuning and balancing of the new data with the old data.
  • Principles of the present invention provide improved techniques for permitting a user to employ more human-based grammar (i.e., free form or conversational input) while addressing a target system via a voice system.
  • a technique for determining intent associated with a spoken utterance of a user comprises the following steps/operations. Decoded speech uttered by the user is obtained. An intent is then extracted from the decoded speech uttered by the user. The intent is extracted in an iterative manner such that a first class is determined after a first iteration and a sub-class of the first class is determined after a second iteration. The first class and the sub-class of the first class are hierarchically indicative of the intent of the user, e.g., a target and data that may be associated with the target.
  • the multi-stage intent extraction approach may have more than two iterations.
  • the user intent extracting step may further determine a sub-class of the sub-class of the first class after a third iteration, such that the first class, the sub-class of the first class, and the sub-class of the sub-class of the first class are hierarchically indicative of the intent of the user.
  • the first class may represent a target (e.g., topic) associated with the user intent
  • the sub-class of the first class may represent an action (e.g., function) associated with the target
  • the sub-class of the sub-class of the first class may represent data associated with the action.
  • One or more commands may then be provided to a target system based on the class and sub-class determinations.
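As a purely illustrative sketch, the class, sub-class, and sub-sub-class determinations can be viewed as a three-level path that is mapped to a target-system command. The class names and command format below are assumptions for demonstration, not taken from the patent:

```python
# Hypothetical mapping from a hierarchically extracted intent to a
# target-system command. The path elements correspond to the target,
# the action, and the data, per the three-iteration extraction above.

def dispatch(intent_path):
    """intent_path: (class, sub-class, sub-sub-class) from three iterations."""
    target, action, data = intent_path
    # A real system would route this over a command bus to the target system.
    return f"{target}.{action}({data})"

print(dispatch(("Audio", "Volume", "Up")))        # -> Audio.Volume(Up)
print(dispatch(("Climate", "Temperature", "-5")))
```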
  • FIG. 1 illustrates a block diagram of a hierarchical system for extracting user intent from a spoken utterance, according to an embodiment of the invention
  • FIG. 2 illustrates a block diagram of a hierarchy manager, according to an embodiment of the invention
  • FIG. 3 illustrates a block diagram of an intent recognition manager, according to an embodiment of the invention
  • FIG. 4 illustrates a block diagram of a confidence/rejection module, according to an embodiment of the invention
  • FIG. 5 illustrates a flow diagram of a run-time methodology for use in hierarchically extracting user intent from a spoken utterance, according to an embodiment of the invention
  • FIG. 6 illustrates a flow diagram of a training methodology for use in hierarchically extracting user intent from a spoken utterance, according to an embodiment of the invention.
  • FIG. 7 illustrates a block diagram of a computing system for use in implementing a hierarchical system for extracting user intent from a spoken utterance, according to an embodiment of the invention.
  • Principles of the invention address the problem of extracting user intent from free form-type spoken utterances. For example, returning to the vehicle-based climate control example described above, principles of the invention permit a driver to interact with a voice system in the vehicle by giving free form voice instructions that are different than the precise (machine-based grammar) voice commands understood by the climate control system. Thus, in this particular example, instead of saying the precise commands “decrease temperature” and “five degrees,” in accordance with principles of the invention, the drivers may say “make it cooler.” The system interprets “it” and “cooler” and associates the phrase with a temperature and asks one or more additional questions to clarify the user intent.
  • the system detects a dialog domain for each free form-type spoken utterance; for example, an utterance such as “make it cooler” maps to the climate control dialog domain.
  • principles of the invention provide a multi-stage system that extracts more and more information from the same sentence as it goes along.
  • the free form utterance “turn the volume up” may result in a detected class “Audio” after a first stage (or first iteration), a sub-class “Audio_Volume” after a second stage (or second iteration), and a sub-class “Audio_Volume_Up” (which is a sub-class of the sub-class “Audio”) after a third stage (or third iteration).
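The staged refinement described above can be sketched as a classifier that, at each level, chooses among the children of the class selected at the previous level. The tiny keyword "models" below are stand-ins for trained per-node models and are assumptions for demonstration only:

```python
# Illustrative multi-stage (iterative) intent extraction: each iteration
# refines the class chosen at the previous iteration into a sub-class.

HIERARCHY = {
    "Audio": {"Audio_Volume": {"Audio_Volume_Up": {}, "Audio_Volume_Down": {}},
              "Audio_Radio": {"Audio_Radio_On": {}, "Audio_Radio_Off": {}}},
    "Climate": {"Climate_Temperature": {}},
}

KEYWORDS = {  # per-class keyword evidence (a stand-in for trained models)
    "Audio": {"volume", "radio", "music"},
    "Climate": {"cooler", "warmer", "temperature"},
    "Audio_Volume": {"volume"},
    "Audio_Radio": {"radio", "station"},
    "Audio_Volume_Up": {"up", "louder"},
    "Audio_Volume_Down": {"down", "quieter"},
    "Audio_Radio_On": {"on"},
    "Audio_Radio_Off": {"off"},
}

def classify(utterance):
    words = set(utterance.lower().split())
    path, level = [], HIERARCHY
    while level:
        # score each candidate class at this level by keyword overlap
        best = max(level, key=lambda c: len(words & KEYWORDS.get(c, set())))
        if not words & KEYWORDS.get(best, set()):
            break  # no evidence at this level; stop refining
        path.append(best)
        level = level[best]
    return path

print(classify("turn the volume up"))
# ['Audio', 'Audio_Volume', 'Audio_Volume_Up']
```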
  • each stage or level in the multi-stage system acts as an elemental AVP (attribute value pair) extractor or semantic analyzer of the sentence.
  • the multi-stage system of the invention is not tagging each word with labels as would occur in a statistical parser or attaching a semantic label as would occur in a linguistic parser, rather the multi-stage system is adding class, sub-class, and sub-class (of the sub-class) information, which is far simpler to do.
  • the methodology is iterative because the same process is applied at each subsequent level with only finer and finer class labels.
  • Table 1 below is an example of the multi-level class labels (e.g., hierarchical structure) that may be associated with the audio example:
  • Level 1: AUDIO
    Level 2: AUDIO_RADIO, AUDIO_VOLUME
    Level 3: Aud._Radio_on, Aud._Radio_off, A_Radio_Station (under AUDIO_RADIO); Aud._volume_down, Aud._volume_up (under AUDIO_VOLUME)
  • an initial training data set may be used.
  • the process is automated wherein a small model is built with a relatively small data set. Then, the training process iterates when new data is added, using the initial model to label the new data set.
  • the multi-stage system can also be employed with lower level parsers or metadata. That is, most of the intent determination processing uses the hierarchical action classification approach of the invention. However, the system may get down to some very specific part of the user request, e.g., a complicated navigation request that has a “to city,” a “from city,” and/or some other peripheral information (such as avoiding the most congested roads), which can make the request complicated.
  • the system can utilize added metadata and/or use a simple kind of parser, at the lowest stage or level, for extracting items such as “to” and “from” information.
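A minimal sketch of such a simple lowest-level parser, used only for fine-grained slot extraction (e.g., the “to” and “from” cities of a navigation request). The regex patterns and slot names are illustrative assumptions:

```python
import re

def extract_slots(text):
    """Extract hypothetical 'from'/'to' city slots from a navigation request."""
    slots = {}
    m = re.search(r"\bfrom\s+([A-Z][a-zA-Z]*)", text)
    if m:
        slots["from_city"] = m.group(1)
    m = re.search(r"\bto\s+([A-Z][a-zA-Z]*)", text)
    if m:
        slots["to_city"] = m.group(1)
    return slots

print(extract_slots("navigate from Boston to Albany avoiding congested roads"))
# {'from_city': 'Boston', 'to_city': 'Albany'}
```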
  • principles of the invention are able to use a smaller domain dependent subset of the data.
  • a hierarchical system for extracting user intent from a spoken utterance is depicted.
  • The system, referred to as a dialog domain detection (DDE) engine 10 , comprises conversational system 100 , command bus 101 , hierarchy manager 102 , intent recognition manager 103 , question module 104 , situation manager 105 , audio input 106 , speech recognition system 107 , and sensors 108 .
  • Conversational system 100 functions as a dialog manager. Audio input 106 represents the spoken utterances captured by the system that are being processed to determine intent. Conversational system 100 sends the audio input to speech recognition engine 107 , which then decodes the audio and returns text, representative of what the speech recognition engine recognized, back to conversational system 100 . It is to be appreciated that the invention is not limited to any particular speech recognition engine and, thus, any suitable speech recognition system can be employed. By way of example only, the IBM Corporation (Armonk, N.Y.) Embedded ViaVoice™ engine could be employed.
  • the command bus 101 serves as a central communication bus between the components of the DDE engine.
  • Hierarchy manager 102 (as will be explained in further detail below in the context of FIG. 2 ) imposes the top-down iterative structure used by intent recognition manager 103 (as will be explained in further detail below in the context of FIG. 3 ) to extract intent from the spoken utterance of the user.
  • the above-described multi-level class labels in Table 1 may serve as the imposed hierarchical structure.
  • hierarchy manager 102 sets the number of levels or stages that intent recognition manager 103 will traverse for a given intent determination session. More particularly, hierarchy manager dictates, at each level, the permitted inputs and the permitted results (e.g., class labels). Then, intent recognition manager 103 traverses (top to bottom) the hierarchical structure set by the hierarchy manager. As it traverses down the structure, intent recognition manager 103 expects hierarchy manager 102 to inform it, at this level, what structure can be imposed. Thus, intent recognition manager keeps referring back to the hierarchy manager.
  • Intent recognition manager 103 has an additional function. It also serves as an interface for the logical, multi-tiered view of the user-input sentence. Conversational system 100 may utilize such a logical view of the sentence.
  • the intent gets clarified as the intent recognition manager walks down the structure.
  • the intent recognition manager walks down the structure and determines a particular intent at each level, from broad to narrow.
  • the particular intent determined at each level is referred to herein as an “interpretation.”
  • the top level intent is going to be the audio system. However, this does not mean much since there are any number of actions that can be taken with respect to the audio system.
  • the next level could determine that the user is referring to a radio station.
  • the next level could determine a particular radio station that the user wishes to be selected.
  • the DDE engine of the invention permits the user to say “I want to listen to channel 47.” Therefore, the intent recognition manager starts with a vague picture, or actually with nothing, and tries to come up with a highly tuned view of what the intent is.
  • Question module 104 generates questions that can be asked of the user that may be used to assist the system with determining intent.
  • dialog managers are able to coordinate the asking of questions to a speaker, the responses to which further clarify any ambiguity that remains from the previous user input.
  • question module may comprise a text-to-speech engine capable of generating questions that are audibly output to the user. The responses are processed through the speech recognition engine and provided to the conversational system which coordinates their use with the intent recognition manager. Further, when an intent is determined by the system, question module 104 could serve to ask the user to confirm that intent before the system sends the appropriate command(s) to the target system.
  • Sensors 108 may comprise one or more sensors that describe external situations (e.g., weather, speed, humidity, temperature, location via a global positioning system, etc.) and personal characteristics (e.g., biometrics—voice, face characteristics, tired, sleepiness conditions). This information, coordinated by situation manager 105 , may also be used to determine intent of the user and/or assist in providing a response to the user.
  • hierarchy manager ( 102 in FIG. 1 ) comprises parser 201 , labeler 202 , semantic processing module 203 , sequencing module 204 , topic 205 , function and data 206 , text input 208 , and training module 210 .
  • Parser 201 receives as input text 208 .
  • text 208 represents the decoded speech, i.e., the result of the audio input ( 106 in FIG. 1 ) being decoded by the speech recognition engine ( 107 in FIG. 1 ).
  • the role of parser 201 is to tag the parts of speech of the decoded text, e.g., nouns, verbs, other grammatical terms or phrases.
  • the parser can utilize meta information or even external mark up to describe the tagged portions of the text.
  • Labeler 202 separates function and non-function words in the text. That is, it is understood that some words in the text are more valuable (function words) than other words (non-function words) in determining intent. To do this, the words in the text are weighted by the labeler. The weighting may be done by accessing the domain dependent model and scoring the words in the text against all potential words. The importance of the word depends on its score, i.e., words with higher scores are considered more important. Words at or above a threshold score may be considered function words, while words below a threshold score may be considered non-function words.
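The labeler's function/non-function split can be sketched as scoring each word against a domain-dependent model and comparing the score with a threshold. The scores and threshold below are invented for illustration, not taken from any trained model:

```python
# Minimal sketch of labeler 202: weight words via a domain model and
# split them into function words (at or above threshold) and
# non-function words (below threshold).

DOMAIN_SCORES = {"volume": 0.9, "up": 0.7, "radio": 0.85, "turn": 0.4}
THRESHOLD = 0.5

def label_words(words):
    function, non_function = [], []
    for w in words:
        score = DOMAIN_SCORES.get(w, 0.0)   # unseen words score 0
        (function if score >= THRESHOLD else non_function).append((w, score))
    return function, non_function

func, non_func = label_words(["turn", "the", "volume", "up"])
print(func)      # higher-scoring words are the function words
```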
  • Semantic processor 203 interprets the scores assigned by the labeler. For example, the semantic processor may determine for a given input sentence that terms associated with audio have more weight than terms associated with climate control. Thus, the semantic processor accepts all the interpretations, does a relative scoring, applies a threshold, and decides, for example, that the top three interpretations should be taken as the most relevant ones.
  • Interpretation means intent in this context.
  • the labeler produces a list of interpretations and attendant scores. Since this is a statistical approach, no single unambiguously correct label is produced; rather, the output is a ranked list covering all possible interpretations.
  • the semantic processor applies intelligent thresholding to discard low scores that are possible but of low probability based on prior knowledge or simple thresholding.
  • Prior knowledge can include user knowledge derived from the training data, and simple thresholding can include retaining a fixed number of interpretations (e.g., three), or retaining all interpretations within a fixed percentage of the best-scoring label. These are all parameters that can be made available to an agent deploying the system via operating panels.
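The intelligent thresholding just described can be sketched as follows; both cut-offs (fixed count and fraction of the best score) are deployment parameters, and the values used here are assumptions:

```python
# Sketch of the semantic processor's thresholding: keep at most top_n
# interpretations, and only those within a fixed fraction of the best score.

def prune(interpretations, top_n=3, within=0.5):
    """interpretations: list of (label, score) pairs."""
    ranked = sorted(interpretations, key=lambda p: p[1], reverse=True)
    best = ranked[0][1]
    kept = [p for p in ranked if p[1] >= best * within]
    return kept[:top_n]

scored = [("Audio", 0.9), ("Climate", 0.5), ("Phone", 0.2), ("Nav", 0.1)]
print(prune(scored))   # retains only the relevant high scorers
```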
  • semantic processor 203 may employ techniques disclosed in U.S. Pat. No. 6,236,968.
  • the interpreted result is a three-tuple (a group of three sub-results). That is, in this particular embodiment, to “understand” a command, three entities are extracted and analyzed: (1) the machine (target or topic 205 ) that is operated upon (e.g., Audio, Navigation); (2) the action (function 206 ) to be performed (e.g., switch, turn, move); and (3) the data 206 that is provided with the action (e.g., on/off, up/down, left/right).
  • Table 1 above illustrates the hierarchical structure from which the three-tuple may be determined. It is to be understood that while hierarchy manager 102 and intent recognition manager 103 are illustrated in FIG. 1 as logically separate components, the components may be implemented in a single functional module due to their tightly coupled functionality.
  • Sequencing module 204 applies global rules to determine which part of the sentence is more important, for example, because it comes first in the sentence, because it is the premise of the sentence, or because the user placed more emphasis on it.
  • sequencing or timing relates to separating, within a complex request from the user, the primary request from a secondary one.
  • the target system is a navigation system
  • the principal request is “find me a McDonald's.”
  • the parking is a secondary request.
  • the sequencer informs the semantic processor that the concept of “finding a McDonald's” should take precedence or is more important than the concept of “parking.”
  • Such sequencing may be determined from any nuances in the user's utterance that guide the search for the correct interpretation.
  • An emphasized word or phrase carries more weight.
  • the speeding up of a phrase within a sentence may carry additional indicators of importance, etc. So this module attempts to perform a fine-grained analysis of the user's nuances.
  • Training module 210 serves to train parser 201 , labeler 202 , and semantic processor 203 .
  • intent recognition manager ( 103 in FIG. 1 ) comprises weight computation module 300 , pruning module 301 , list preparation module 302 , feedback 303 , and external input 304 .
  • Weight computation module 300 computes the weights of the different words in the user utterance and applies two kinds of quantitative tests. The first is to compute whether the words in the utterance are above a fixed threshold. This is the rejection mechanism which decides whether to accept the user utterance for analysis or reject it outright as being outside the realm of its capability. Systems built for use in a car are unlikely to “understand” questions about other general subjects. In other words, it has to be able to detect that the user used words that are outside its vocabulary. The rejection mechanism is one way to do this. The second quantitative test is the confidence scores. These are the relative scores of the multiple interpretations of the user utterance.
  • Pruning module 301 prunes the list from weight computation module 300 .
  • the output from weight computation module 300 nominally will include all possible candidate interpretations. Pruning module 301 decides which ones are worth keeping. Some scores from weight computation module 300 may be too small to consider, not relevant, or too small in magnitude relative to the top scoring interpretations. A “worthiness” test may be derived from the training data. Further, the pruning module can include a control panel and additional controls that can be adjusted with input from customer satisfaction tests (feedback 303 ).
  • List preparation module 302 prepares the final intent list.
  • the search for the interpretation is usually done in a hierarchical fashion with each level in turn revealing the topic, function, and data.
  • the scoring, pruning and list preparing tasks are iterative as the scores are carried from one level to the next.
  • the top three scorers from the top level are expanded to the next level. Keeping the top three is appropriate because computation with training data has shown that 98.5% of the time the correct interpretation is within the top three results.
  • external inputs 304 e.g., other intent recognition scores
  • Referring now to FIG. 4 , a confidence/rejection module, according to an embodiment of the invention, is depicted. It is to be understood that FIG. 4 depicts the confidence score and rejection mechanisms shown in weight computation module 300 of FIG. 3 .
  • the confidence score for an utterance is the ratio of words in-vocabulary to the total number of words in the utterance. Hence, if all the words in the utterance are found in the system's vocabulary, then the confidence score is 1. If none are, it is zero. If the ratio is less than 0.5, then the utterance is rejected. Block 400 computes the confidence score and block 401 applies the rejection mechanism.
  • the confidence score tries to determine how many of the words are in the system vocabulary versus out of the system vocabulary. If all of the words are in the vocabulary, the word scores are accepted as is. If a fraction of the words are not in the vocabulary, then those words are handicapped to the extent they are not in the vocabulary. For example, if 75 percent of the words are in the vocabulary, every score coming out of the word score computation is handicapped (i.e., by multiplying by 0.75). That cascades down the hierarchy. The siblings are also penalized to that extent.
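The confidence/rejection mechanism of FIG. 4 can be sketched directly from the description above: confidence is the in-vocabulary word ratio, utterances under 0.5 are rejected, and otherwise every downstream score is handicapped by the ratio. The vocabulary and scores below are assumptions:

```python
# Sketch of blocks 400/401: confidence scoring, rejection, and
# handicapping of interpretation scores by the in-vocabulary ratio.

VOCAB = {"turn", "the", "volume", "up", "radio", "on", "off"}

def confidence(words):
    return sum(w in VOCAB for w in words) / len(words)

def apply_confidence(words, scores):
    c = confidence(words)
    if c < 0.5:
        return None            # reject: outside the system's realm
    return [(label, s * c) for label, s in scores]   # handicap every score

print(confidence(["turn", "the", "volume", "up"]))   # 1.0: all in vocabulary
```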
  • Referring now to FIG. 5 , a run-time methodology for use in hierarchically extracting user intent from a spoken utterance, according to an embodiment of the invention, is depicted.
  • the input utterance is applied to the system (i.e., applied against the system model) and the system will return an interpretation, e.g., a three-tuple comprising [topic][function][data].
  • an input “turn the volume up” will generate multiple interpretations, e.g., “Audio,” “Audio_Volume,” and “Audio_Volume_Up” at successive levels.
  • FIG. 5 shows a flow chart of how these interpretations are generated.
  • An initial model tree created during training contains all possible paths that can yield a result. Traversing down this tree from the top node to a leaf node yields several interpretations per level. So, for example, nine interpretations from the top level are pruned down to three. Each node of the tree is then expanded to its child nodes. For example, “Audio” above may yield “Audio_Volume,” “Audio_Treble,” and “Audio_CD,” and “Climate” may yield three more of its children. Similarly, “Audio_Volume” will be split into its children. The process stops after three levels. In some cases, there may be fewer than three levels simply because there is not adequate data to warrant a third level.
  • Step 501 Push top-level interpretation that operates with the text input 500 .
  • Step 502 Assign scores for interpretations from step 501 .
  • Step 503 Get next interpretation.
  • Step 504 Check if anything is left (None Left?).
  • Step 505 If “No” for step 504 , then check whether the node is expandable.
  • Step 506 If not expandable, then add to interpretation list and go to get next interpretation (step 503 ).
  • Step 507 Otherwise (if expandable), calculate children and go to assign scores (step 502 ).
  • If “Yes” for step 504 , the methodology is done (block 508 ).
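The steps above can be sketched as a simple loop over a stack of candidate interpretations: expandable nodes are replaced by their children, and leaves are collected and scored. The model tree and the stand-in scorer below are illustrative assumptions:

```python
# Sketch of the FIG. 5 run-time loop (steps 501-508).

MODEL_TREE = {
    "Audio": ["Audio_Volume", "Audio_Treble"],
    "Audio_Volume": ["Audio_Volume_Up", "Audio_Volume_Down"],
}

def score(node, words):
    # stand-in scorer: fraction of the node's name parts seen in the input
    parts = [p.lower() for p in node.split("_")]
    return sum(p in words for p in parts) / len(parts)

def interpret(utterance):
    words = set(utterance.lower().split())
    stack = [n for n in MODEL_TREE if "_" not in n]    # step 501: top level
    results = []
    while stack:                                       # steps 503/504
        node = stack.pop()
        children = MODEL_TREE.get(node)
        if children:                                   # step 505: expandable?
            stack.extend(children)                     # step 507: children
        else:
            results.append((node, score(node, words))) # step 506: collect leaf
    return sorted(results, key=lambda p: p[1], reverse=True)  # step 502

print(interpret("turn the volume up")[0][0])   # best-scoring interpretation
```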
  • Referring now to FIG. 6 , a training methodology for use in hierarchically extracting user intent from a spoken utterance, according to an embodiment of the invention, is depicted.
  • Step 600 Collect text data in domain.
  • Step 601 Split data into individual domains.
  • Step 602 Tag domains.
  • Step 603 Gather more data.
  • Step 604 None left? If no, go to step 601 .
  • Step 605 Build system model, if yes in step 604 .
  • we preferably split training data into one set for each node in the hierarchy, and build a model for each node.
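The training flow above (one data set and one model per hierarchy node) can be sketched as follows. The tagged sentences and the word-frequency "model" are simplified assumptions for demonstration:

```python
# Sketch of FIG. 6 training: split tagged in-domain sentences into one
# data set per hierarchy node, then build a model for each node.

from collections import Counter, defaultdict

TRAINING = [  # (sentence, hierarchy path it was tagged with)
    ("turn the volume up", ["Audio", "Audio_Volume", "Audio_Volume_Up"]),
    ("turn the radio on", ["Audio", "Audio_Radio", "Audio_Radio_On"]),
    ("make it cooler", ["Climate", "Climate_Temp", "Climate_Temp_Down"]),
]

def build_models(data):
    per_node = defaultdict(list)
    for sentence, path in data:          # step 601: split data per node
        for node in path:
            per_node[node].append(sentence)
    # step 605: one model (here a word-frequency counter) per node
    return {node: Counter(" ".join(ss).split()) for node, ss in per_node.items()}

models = build_models(TRAINING)
print(models["Audio"]["turn"])   # "turn" appears in both Audio sentences
```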
  • Referring now to FIG. 7 , a block diagram of an illustrative implementation of a computing system for use in implementing techniques of the invention is shown. More particularly, FIG. 7 represents a computing system which may implement the user intent extraction components and methodologies of the invention, as described above in the context of FIGS. 1 through 6 . The architecture shown may also be used to implement a target system.
  • a processor 701 for controlling and performing methodologies described herein is coupled to a memory 702 and a user interface 703 via a computer bus 704 .
  • processor as used herein is intended to include any processing device, such as, for example, one that includes a CPU (central processing unit) or other suitable processing circuitry.
  • the processor may be a digital signal processor (DSP), as is known in the art.
  • processor may refer to more than one individual processor.
  • the invention is not limited to any particular processor type or configuration.
  • memory as used herein is intended to include memory associated with a processor or CPU, such as, for example, RAM, ROM, a fixed memory device (e.g., hard drive), a removable memory device (e.g., diskette), flash memory, etc.
  • the invention is not limited to any particular memory type or configuration.
  • user interface as used herein is intended to include, for example, one or more input devices, e.g., keyboard, for inputting data to the processing unit, and/or one or more output devices, e.g., CRT display and/or printer, for providing results associated with the processing unit.
  • the user interface may also include one or more microphones for receiving user speech.
  • the invention is not limited to any particular user interface type or configuration.
  • computer software including instructions or code for performing the methodologies of the invention, as described herein, may be stored in one or more of the associated memory devices (e.g., ROM, fixed or removable memory) and, when ready to be utilized, loaded in part or in whole (e.g., into RAM) and executed by a CPU.
  • FIGS. 1 through 7 may be implemented in various forms of hardware, software, or combinations thereof, e.g., one or more digital signal processors with associated memory, application specific integrated circuit(s), functional circuitry, one or more appropriately programmed general purpose digital computers with associated memory, etc.

Abstract

Improved techniques are disclosed for permitting a user to employ more human-based grammar (i.e., free form or conversational input) while addressing a target system via a voice system. For example, a technique for determining intent associated with a spoken utterance of a user comprises the following steps/operations. Decoded speech uttered by the user is obtained. An intent is then extracted from the decoded speech uttered by the user. The intent is extracted in an iterative manner such that a first class is determined after a first iteration and a sub-class of the first class is determined after a second iteration. The first class and the sub-class of the first class are hierarchically indicative of the intent of the user, e.g., a target and data that may be associated with the target. The multi-stage intent extraction approach may have more than two iterations. By way of example only, the user intent extracting step may further determine a sub-class of the sub-class of the first class after a third iteration, such that the first class, the sub-class of the first class, and the sub-class of the sub-class of the first class are hierarchically indicative of the intent of the user.

Description

    FIELD OF THE INVENTION
  • The present invention relates generally to speech processing systems and, more particularly, to systems for hierarchically extracting user intent from spoken utterances, such as spoken instructions or commands.
  • BACKGROUND OF THE INVENTION
  • The use of a speech recognition system (or a voice system) to translate a user's spoken command to a precise text command that the target system can input and process is well known. For example, in a conventional voice system based in a vehicle, a user (e.g., driver) interacts with the voice system by uttering very specific commands that must be consistent with machine-based grammar that is understood by the target system.
  • By way of example, assume that the climate control system in the vehicle is the target system. In order to decrease the temperature in the vehicle, the user of a conventional voice system may typically have to utter several predetermined machine-based grammar commands, such as the command “climate control” followed by the command “air conditioner” followed by the command “decrease temperature” followed by the command “five degrees.”
  • Unfortunately, people do not talk or think in terms of specific machine-based grammar, and may also forget the precise predetermined commands that must be uttered to effectuate their wishes.
  • One approach that attempts to overcome the machine-based grammar problem is to use a single-stage front end action classifier that detects a very general subject from the user's speech, which is then provided to a human operator for further intent determination. This is typically the approach used in the General Motors' OnStar™ system. However, a major problem with this approach is that a human operator is required.
  • Another approach is to build a full-fledged statistical parser, which takes the input as transcribed and builds a parse tree which is mined later to extract intent. One major difficulty in this second approach is that statistical parsers are huge in terms of storage requirements. Further, they require hand-tuning in every step. That is, every time data is added, the statistical parser requires a tremendous amount of hand-tuning and balancing of the new data with the old data.
  • Accordingly, improved techniques are needed that permit a user to employ more human-based grammar (i.e., free form or conversational input) while addressing a target system via a voice system.
  • SUMMARY OF THE INVENTION
  • Principles of the present invention provide improved techniques for permitting a user to employ more human-based grammar (i.e., free form or conversational input) while addressing a target system via a voice system.
  • In one aspect of the invention, a technique for determining intent associated with a spoken utterance of a user comprises the following steps/operations. Decoded speech uttered by the user is obtained. An intent is then extracted from the decoded speech uttered by the user. The intent is extracted in an iterative manner such that a first class is determined after a first iteration and a sub-class of the first class is determined after a second iteration. The first class and the sub-class of the first class are hierarchically indicative of the intent of the user, e.g., a target and data that may be associated with the target.
  • The multi-stage intent extraction approach may have more than two iterations. By way of example only, the user intent extracting step may further determine a sub-class of the sub-class of the first class after a third iteration, such that the first class, the sub-class of the first class, and the sub-class of the sub-class of the first class are hierarchically indicative of the intent of the user.
  • In a preferred embodiment, as will be explained in further detail below, the first class may represent a target (e.g., topic) associated with the user intent, the sub-class of the first class may represent an action (e.g., function) associated with the target, and the sub-class of the sub-class of the first class may represent data associated with the action. One or more commands may then be provided to a target system based on the class and sub-class determinations.
  • These and other objects, features and advantages of the present invention will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 illustrates a block diagram of a hierarchical system for extracting user intent from a spoken utterance, according to an embodiment of the invention;
  • FIG. 2 illustrates a block diagram of a hierarchy manager, according to an embodiment of the invention;
  • FIG. 3 illustrates a block diagram of an intent recognition manager, according to an embodiment of the invention;
  • FIG. 4 illustrates a block diagram of a confidence/rejection module, according to an embodiment of the invention;
  • FIG. 5 illustrates a flow diagram of a run-time methodology for use in hierarchically extracting user intent from a spoken utterance, according to an embodiment of the invention;
  • FIG. 6 illustrates a flow diagram of a training methodology for use in hierarchically extracting user intent from a spoken utterance, according to an embodiment of the invention; and
  • FIG. 7 illustrates a block diagram of a computing system for use in implementing a hierarchical system for extracting user intent from a spoken utterance, according to an embodiment of the invention.
  • DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
  • While the present invention may be illustratively described below in the context of a vehicle-based voice system, it is to be understood that principles of the invention are not limited to any particular computing system environment or any particular speech recognition application. Rather, principles of the invention are more generally applicable to any computing system environment and any speech recognition application in which it would be desirable to permit the user to provide free form or conversational speech input.
  • Principles of the invention address the problem of extracting user intent from free form-type spoken utterances. For example, returning to the vehicle-based climate control example described above, principles of the invention permit a driver to interact with a voice system in the vehicle by giving free form voice instructions that are different from the precise (machine-based grammar) voice commands understood by the climate control system. Thus, in this particular example, instead of saying the precise commands "decrease temperature" and "five degrees," in accordance with principles of the invention, the driver may say "make it cooler." The system interprets "it" and "cooler," associates the phrase with a temperature, and asks one or more additional questions to clarify the user intent.
  • To do this, the system detects a dialog domain, such as in the following examples (the illustrative free form-type spoken utterance is to the left of the arrow and the illustrative detected dialog domain is to the right of the arrow):
      • Turn the AC up→CLIMATE
      • Set the temperature to 76 degrees→CLIMATE
      • Set the radio to one oh one point seven FM→AUDIO and AUDIO_RadioStation
      • What features are available in this system→HELP
      • Switch off the CD player→AUDIO or AUDIO_CD
      • What are the current traffic conditions→TRAFFIC
      • How is the rush hour traffic in New York city→TRAFFIC
      • What is tomorrow's weather forecast for Boston→WEATHER
      • What are the road conditions for my route→TRAFFIC
      • How do I use the point of interest application→HELP
      • How far is Hollywood→NAVIGATION
      • Increase volume→AUDIO or AUDIO Volume
      • Raise fan speed→CLIMATE
      • Scan for a rock-and-roll station in this area→AUDIO and AUDIO RadioStation
      • I am looking for Chinese food→RESTAURANTS
      • My destination is the Mid-Hudson bridge→NAVIGATION
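  • By way of a purely illustrative sketch (not the statistical classifier of the invention), dialog domain detection of this kind can be approximated with keyword overlap; the keyword sets below are invented for the example:

```python
# Toy domain detector: score a free form utterance against hypothetical
# per-domain keyword sets and pick the best-scoring dialog domain.
DOMAIN_KEYWORDS = {
    "CLIMATE": {"ac", "temperature", "fan", "cooler", "warmer", "degrees"},
    "AUDIO": {"radio", "volume", "cd", "station", "fm", "am"},
    "TRAFFIC": {"traffic", "road", "conditions", "rush", "congested"},
    "NAVIGATION": {"destination", "route", "far", "bridge"},
    "HELP": {"help", "features", "use", "application"},
}

def detect_domain(utterance: str) -> str:
    words = set(utterance.lower().split())
    scores = {d: len(words & kw) for d, kw in DOMAIN_KEYWORDS.items()}
    return max(scores, key=scores.get)

print(detect_domain("Set the temperature to 76 degrees"))  # CLIMATE
```

A deployed system would replace the keyword overlap with the trained per-level statistical models described below.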
  • As will be illustratively explained herein, principles of the invention are able to determine intent associated with a spoken utterance of a user by obtaining decoded speech uttered by the user (e.g., from a speech recognition engine), and extracting an intent from the decoded speech uttered by the user, wherein the intent is extracted in an iterative manner such that a first class is determined after a first iteration and a sub-class of the first class is determined after a second iteration. The first class and the sub-class of the first class are hierarchically indicative of the intent of the user, e.g., a target and data that may be associated with the target. Of course, the multi-stage approach may have more than two iterations. By way of example only, the user intent extracting step may further determine a sub-class of the sub-class of the first class after a third iteration, such that the first class, the sub-class of the first class, and the sub-class of the sub-class of the first class are hierarchically indicative of the intent of the user.
  • In a preferred embodiment, as will be explained in further detail below, the first class may represent a target (e.g., topic) associated with the user intent, the sub-class of the first class may represent an action (e.g., function) associated with the target, and the sub-class of the sub-class of the first class may represent data associated with the action. One or more commands may then be provided to a target system based on the class and sub-class determinations.
  • Advantageously, principles of the invention provide a multi-stage system that extracts more and more information from the same sentence as it goes along.
  • In another example where the target system is an audio system of the vehicle, the free form utterance “turn the volume up” may result in a detected class “Audio” after a first stage (or first iteration), a sub-class “Audio_Volume” after a second stage (or second iteration), and a sub-class “Audio_Volume_Up” (which is a sub-class of the sub-class “Audio”) after a third stage (or third iteration).
  • In a preferred embodiment, this may be accomplished via attribute value pair (AVP) extraction in a top-down fashion. Thus, each stage or level in the multi-stage system acts as an elemental AVP extractor or semantic analyzer of the sentence. The advantage is that the multi-stage system of the invention is not tagging each word with labels as would occur in a statistical parser, or attaching a semantic label as would occur in a linguistic parser; rather, the multi-stage system is adding class, sub-class, and sub-class (of the sub-class) information, which is far simpler to do. Also, the methodology is iterative because the same process is applied at each subsequent level with only finer and finer class labels.
  • Table 1 below is an example of the multi-level class labels (e.g., hierarchical structure) that may be associated with the audio example:
  • TABLE 1
    Level 1: AUDIO
    Level 2: AUDIO_RADIO, AUDIO_VOLUME
    Level 3: Aud._Radio_on, Aud._Radio_off, A_Radio_Station, Aud._volume_down, Aud._volume_up
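  • One plausible in-memory form for this hierarchy (an assumption for illustration; the data structure is not specified in the text) is a nested mapping whose root-to-leaf paths enumerate the candidate [topic][function][data] interpretations:

```python
# Sketch only: Table 1's class labels as a nested dict. Traversing from the
# root to a leaf yields one candidate interpretation, refined level by level.
HIERARCHY = {
    "AUDIO": {
        "AUDIO_RADIO": ["Aud._Radio_on", "Aud._Radio_off", "A_Radio_Station"],
        "AUDIO_VOLUME": ["Aud._volume_down", "Aud._volume_up"],
    },
}

def paths(tree):
    """Enumerate every root-to-leaf label path in the hierarchy."""
    for topic, functions in tree.items():
        for function, data_labels in functions.items():
            for data in data_labels:
                yield (topic, function, data)

for p in paths(HIERARCHY):
    print(" > ".join(p))
```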
  • In order to be able to decode (or recognize) the free form speech, an initial training data set may be used. The process is automated wherein a small model is built with a relatively small data set. Then, the training process iterates when new data is added, using the initial model to label the new data set.
  • Further, the multi-stage system can also be employed with lower level parsers or metadata. That is, most of the intent determination processing uses the hierarchical action classification approach of the invention. However, when the system gets down to some very specific part of the user request, e.g., a complicated navigation request that has a "to city," a "from city," and/or other peripheral information such as avoiding the most congested roads, the request becomes harder to classify. Within the hierarchical action classification of the invention, while this lower level information in the utterance can be annotated, the system can utilize added metadata and/or a simple kind of parser, at the lowest stage or level, to extract items such as "to" and "from" information. Thus, instead of building an entire statistical parser for the entire corpus of data, principles of the invention are able to use a smaller domain dependent subset of the data.
  • Referring initially to FIG. 1, a hierarchical system for extracting user intent from a spoken utterance, according to an embodiment of the invention, is depicted. As shown, the system referred to as a dialog domain detection (DDE) engine 10 comprises conversational system 100, command bus 101, hierarchy manager 102, intent recognition manager 103, question module 104, situation manager 105, audio input 106, speech recognition system 107, and sensors 108.
  • Conversational system 100 functions as a dialog manager. Audio input 106 represents the spoken utterances captured by the system that are being processed to determine intent. Conversational system 100 sends the audio input to speech recognition engine 107, which then decodes the audio and returns text, representative of what the speech recognition engine recognized, back to conversational system 100. It is to be appreciated that the invention is not limited to any particular speech recognition engine and, thus, any suitable speech recognition system can be employed. By way of example only, the IBM Corporation (Armonk, N.Y.) Embedded ViaVoice™ engine could be employed.
  • The command bus 101 serves as a central communication bus between the components of the DDE engine.
  • Hierarchy manager 102 (as will be explained in further detail below in the context of FIG. 2) imposes the top-down iterative structure used by intent recognition manager 103 (as will be explained in further detail below in the context of FIG. 3) to extract intent from the spoken utterance of the user. For example, in the audio example, the above-described multi-level class labels in Table 1 may serve as the imposed hierarchical structure.
  • That is, hierarchy manager 102 sets the number of levels or stages that intent recognition manager 103 will traverse for a given intent determination session. More particularly, hierarchy manager dictates, at each level, the permitted inputs and the permitted results (e.g., class labels). Then, intent recognition manager 103 traverses (top to bottom) the hierarchical structure set by the hierarchy manager. As it traverses down the structure, intent recognition manager 103 expects hierarchy manager 102 to inform it, at this level, what structure can be imposed. Thus, intent recognition manager keeps referring back to the hierarchy manager.
  • Intent recognition manager 103 has an additional function. It also serves as an interface for the logical, multi-tiered view of the user-input sentence. Conversational system 100 may utilize such a logical view of the sentence.
  • Thus, the intent gets clarified as the intent recognition manager walks down the structure. As the hierarchy manager indicates what information it can provide, the intent recognition manager walks down the structure and determines a particular intent at each level, from broad to narrow. The particular intent determined at each level is referred to herein as an "interpretation." In the audio example, the top level intent is going to be the audio system. However, this by itself does not mean much, since any number of actions can be taken with respect to the audio system. The next level could determine that the user is referring to a radio station. The next level could determine a particular radio station that the user wishes to be selected. Thus, instead of saying "XM Radio," "set radio channel," and "channel 47," the DDE engine of the invention permits the user to say "I want to listen to channel 47." Therefore, the intent recognition manager starts with a vague picture, or actually with nothing, and refines it into a highly tuned view of what the intent is.
  • Question module 104 generates questions that can be asked of the user that may be used to assist the system with determining intent. As is known, dialog managers are able to coordinate the asking of questions to a speaker, the responses to which further clarify any ambiguity that remains from the previous user input. Thus, as is known, question module may comprise a text-to-speech engine capable of generating questions that are audibly output to the user. The responses are processed through the speech recognition engine and provided to the conversational system which coordinates their use with the intent recognition manager. Further, when an intent is determined by the system, question module 104 could serve to ask the user to confirm that intent before the system sends the appropriate command(s) to the target system.
  • Sensors 108 may comprise one or more sensors that describe external situations (e.g., weather, speed, humidity, temperature, location via a global positioning system, etc.) and personal characteristics (e.g., biometrics—voice, face characteristics, tired, sleepiness conditions). This information, coordinated by situation manager 105, may also be used to determine intent of the user and/or assist in providing a response to the user.
  • While the invention is not limited to any particular question module architecture or external situation manager architecture, examples of techniques that could be employed here are described in U.S. Pat. Nos. 6,092,192; 6,587,818; and 6,236,968.
  • Referring now to FIG. 2, a hierarchy manager, according to an embodiment of the invention, is depicted. As shown, hierarchy manager (102 in FIG. 1) comprises parser 201, labeler 202, semantic processing module 203, sequencing module 204, topic 205, function and data 206, text input 208, and training module 210.
  • Parser 201 receives as input text 208. It is to be appreciated that text 208 represents the decoded speech, i.e., the result of the audio input (106 in FIG. 1) being decoded by the speech recognition engine (107 in FIG. 1). The role of parser 201 is to tag the parts of speech of the decoded text, e.g., nouns, verbs, other grammatical terms or phrases. The parser can utilize meta information or even external mark up to describe the tagged portions of the text.
  • Labeler 202 separates function and non-function words in the text. That is, it is understood that some words in the text are more valuable (function words) than other words (non-function words) in determining intent. To do this, the words in the text are weighted by the labeler. The weighting may be done by accessing the domain dependent model and scoring the words in the text against all potential words. The importance of the word depends on its score, i.e., words with higher scores are considered more important. Words at or above a threshold score may be considered function words, while words below a threshold score may be considered non-function words.
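  • A minimal sketch of this separation, assuming an invented per-word score table standing in for the domain dependent model:

```python
# Split an utterance into function and non-function words by scoring each
# word against a (hypothetical) domain dependent model and thresholding.
DOMAIN_WORD_SCORES = {"volume": 0.9, "radio": 0.85, "temperature": 0.9,
                      "up": 0.6, "turn": 0.4, "the": 0.05}

def label_words(words, threshold=0.5):
    function_words, non_function_words = [], []
    for w in words:
        score = DOMAIN_WORD_SCORES.get(w, 0.0)
        if score >= threshold:
            function_words.append((w, score))
        else:
            non_function_words.append((w, score))
    return function_words, non_function_words

print(label_words("turn the volume up".split()))
```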
  • Semantic processor 203 then interprets the scores assigned by the labeler. For example, the semantic processor may determine for a given input sentence that terms associated with audio have more weight than terms associated with climate control. Thus, the semantic processor accepts all the interpretations, does a relative scoring, applies a threshold, and decides, for example, that the top three interpretations should be taken as the most relevant ones.
  • Interpretation means intent in this context. Thus, for each input utterance, the labeler produces a list of interpretations and attendant scores. Since this is a statistical approach, there are no unambiguously correct labels produced, but instead a list covering all possible interpretations. The semantic processor applies intelligent thresholding to discard scores that are possible but of low probability, based on prior knowledge or simple thresholding. Prior knowledge can include user knowledge derived from the training data, and simple thresholding can include retaining a fixed number of interpretations (e.g., three), or retaining all interpretations within a fixed percentage of the best scoring label. These are all parameters that can be made available to an agent deploying the system via operating panels. By way of one example, semantic processor 203 may employ techniques disclosed in U.S. Pat. No. 6,236,968.
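  • The two simple thresholding policies just mentioned can be sketched as follows (the function and parameter names are assumptions for illustration):

```python
# Keep either a fixed number of interpretations, or every interpretation
# whose score falls within a fixed percentage of the best score.
def prune_interpretations(scored, top_n=3, within_pct=None):
    """scored: list of (interpretation, score) pairs; higher scores are better."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    if within_pct is not None:
        best = ranked[0][1]
        return [(i, s) for i, s in ranked if s >= best * (1 - within_pct)]
    return ranked[:top_n]
```

For example, with scores (0.9, 0.5, 0.85, 0.1) the fixed-number policy keeps the top three, while a 10% band around the best score keeps only the 0.9 and 0.85 interpretations.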
  • The interpreted result is a three-tuple (a group of three sub-results). That is, in this particular embodiment, to "understand" a command three entities are extracted and analyzed: (1) the machine (target or topic 205) that is operated upon (e.g., Audio, Navigation); (2) the action (function 206) to be performed (e.g., switch, turn, move); and (3) the data 206 that is provided with the action (e.g., on/off, up/down, left/right). By way of example, Table 1 above illustrates the hierarchical structure from which the three-tuple may be determined. It is to be understood that while hierarchy manager 102 and intent recognition manager 103 are illustrated in FIG. 1 as logically separate components, the components may be implemented in a single functional module due to their tightly coupled functionality.
  • Sequencing module 204 is used to apply global rules on which part of the sentence is more important because, for example, it is first in order in the sentence or because it is the premise of the sentence or because the user used more emphasis on it.
  • The idea of sequencing or timing here relates to separating, within a complex request from the user, the primary request from a secondary one. For example, where the target system is a navigation system, assume a user says “Find me a McDonald's with parking.” The principal request is find me a McDonald's. The parking is a secondary request. The sequencer informs the semantic processor that the concept of “finding a McDonald's” should take precedence or is more important than the concept of “parking.”
  • Such sequencing may be determined from any nuances in the user's utterance that guide the search for the correct interpretation. An emphasized word or phrase carries more weight. The speeding up of a phrase within a sentence may carry additional indicators of importance, etc. So this module attempts to perform a fine-grained analysis of the user's nuances.
  • Training module 210 serves to train parser 201, labeler 202, and semantic processor 203.
  • Referring now to FIG. 3, an intent recognition manager, according to an embodiment of the invention, is depicted. As shown, intent recognition manager (103 in FIG. 1) comprises weight computation module 300, pruning module 301, list preparation module 302, feedback 303, and external input 304.
  • Weight computation module 300 computes the weights of the different words in the user utterance and applies two kinds of quantitative tests. The first is to compute whether the words in the utterance are above a fixed threshold. This is the rejection mechanism which decides whether to accept the user utterance for analysis or reject it outright as being outside the realm of its capability. Systems built for use in a car are unlikely to “understand” questions about other general subjects. In other words, it has to be able to detect that the user used words that are outside its vocabulary. The rejection mechanism is one way to do this. The second quantitative test is the confidence scores. These are the relative scores of the multiple interpretations of the user utterance.
  • Pruning module 301 prunes the list from weight computation module 300. The output from weight computation module 300 nominally will include all possible candidate interpretations. Pruning module 301 decides which ones are worth keeping. Some scores from weight computation module 300 may be too small to consider, not relevant, or too small in magnitude relative to the top scoring interpretations. A “worthiness” test may be derived from the training data. Further, the pruning module can include a control panel and additional controls that can be adjusted with input from customer satisfaction tests (feedback 303).
  • List preparation module 302 prepares the final intent list. The search for the interpretation is usually done in a hierarchical fashion, with each level in turn revealing the topic, function, and data. Hence, the scoring, pruning, and list preparing tasks are iterative, as the scores are carried from one level to the next. In one embodiment, the top three scorers from the top level are expanded to the next level. The top three are appropriate because it has been shown, from computations with training data, that 98.5% of the time the correct interpretation is within the top three results.
  • In addition, external inputs 304 (e.g., other intent recognition scores) can be utilized to generate the list in 302.
  • Referring now to FIG. 4, a confidence/rejection module, according to an embodiment of the invention, is depicted. It is to be understood that FIG. 4 depicts the confidence score and rejection mechanisms shown in weight computation module 300 of FIG. 3.
  • More particularly, in one embodiment, the confidence score for an utterance is the ratio of words in-vocabulary to the total number of words in the utterance. Hence, if all the words in the utterance are found in the system's vocabulary, then the confidence score is 1. If none are, it is zero. If the ratio is less than 0.5, then the utterance is rejected. Block 400 computes the confidence score and block 401 applies the rejection mechanism.
  • This operation can also be understood as follows. The confidence score tries to determine how many of the words are in the system vocabulary versus out of the system vocabulary. If all of the words are in the vocabulary, the word scores are accepted as is. If a fraction of the words are not in the vocabulary, then those words are handicapped to the extent they are not in the vocabulary. For example, if 75 percent of the words are in the vocabulary, every score coming out of the word score computation is handicapped (i.e., by multiplying by 0.75). That cascades down the hierarchy. The siblings are also penalized to that extent.
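  • The confidence and rejection mechanism described above reduces to a few lines (a sketch; the function names and the word-score table interface are assumed):

```python
# Confidence score = in-vocabulary ratio; reject utterances below 0.5;
# otherwise handicap every word score by multiplying it by the ratio.
def confidence(words, vocabulary):
    return sum(1 for w in words if w in vocabulary) / len(words)

def score_with_rejection(utterance, vocabulary, word_scores):
    words = utterance.lower().split()
    ratio = confidence(words, vocabulary)
    if ratio < 0.5:
        return None  # rejected: outside the realm of the system's capability
    return {w: word_scores.get(w, 0.0) * ratio for w in words}
```

With four of five words in-vocabulary, every downstream score is multiplied by 0.8, and that handicap cascades down the hierarchy as described.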
  • Referring now to FIG. 5, a run-time methodology for use in hierarchically extracting user intent from a spoken utterance, according to an embodiment of the invention, is depicted.
  • In general, the input utterance is applied to the system (i.e., applied against the system model) and the system will return an interpretation, e.g., a three-tuple comprising [topic][function][data]. Hence, an input “turn the volume up” will generate multiple interpretations:
  • [Audio][Volume][up]
    [Climate][temperature][up]
    [Audio][Volume][down]
    . . .
  • Each will have a computed score associated with it. FIG. 5 shows a flow chart of how these interpretations are generated. An initial model tree created during training contains all possible paths that can yield a result. Traversing down this tree from the top node to a leaf node yields several interpretations per level. So, for example, nine interpretations from the top level are pruned down to three. Each node of the tree is expanded to its child nodes. For example, "Audio" above may yield "Audio_Volume," "Audio_Treble," and "Audio_CD," and "Climate" may yield three more of its children. Similarly, "Audio_Volume" will be split into its children. The process stops after three levels. In some cases, there may be fewer than three levels simply because there is not adequate data to warrant a third level.
  • Thus, as specifically shown in FIG. 5:
  • Step 501—Push top-level interpretation that operates with the text input 500.
  • Step 502—Assign scores for interpretations from step 501.
  • Step 503—Get next interpretation.
  • Step 504—Check if anything is left (None Left?).
  • Step 505—If “No” for step 504, then check if node is expandable
  • Step 506—If not expandable, then add to interpretation list and go to get next interpretation (step 503).
  • Step 507—Otherwise (if expandable), calculate children and go to assign scores (step 502).
  • If none left in step 504, then methodology is done (508).
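  • The steps above can be sketched as a scored tree traversal (the model interface and the function names are assumptions; the beam width of three follows the embodiment described earlier):

```python
# FIG. 5 flow, sketched: score the top-level interpretations, expand any
# expandable node into its scored children, and collect the leaves.
def extract_interpretations(children, top_nodes, score_fn, text, beam=3):
    """children: {node: [child, ...]}; score_fn(node, text) -> float."""
    # Steps 501/502: push and score the top-level interpretations, pruned to the beam
    frontier = sorted(top_nodes, key=lambda n: score_fn(n, text), reverse=True)[:beam]
    results = []
    while frontier:                  # Steps 503/504: get next interpretation, if any
        node = frontier.pop()
        kids = children.get(node, [])
        if not kids:                 # Steps 505/506: not expandable -> final list
            results.append(node)
        else:                        # Step 507: calculate, score, and prune children
            frontier.extend(
                sorted(kids, key=lambda n: score_fn(n, text), reverse=True)[:beam])
    return results                   # Step 508: done
```

With a hierarchy such as Table 1 and a scoring model per node, the traversal yields leaf interpretations such as "Audio_Volume_Up" for the utterance "turn the volume up."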
  • Referring now to FIG. 6, a training methodology for use in hierarchically extracting user intent from a spoken utterance, according to an embodiment of the invention, is depicted.
  • In general, first, we decide on the domain in which this system will operate. Data is then collected in that domain, rejecting all data that is outside the domain. These data are then carefully divided into multiple “topic” domains. Within each “topic,” the sentences are further bucketed into sub-domains by “function,” and then each function into “data.” This process of bucketing may be done using a tool that allows for easy “tagging” of such data in a visual manner. We may then gather more data in sub-domains that do not have adequate representation. The more common approach is to build a model, run a test with data withheld from the training set. “Topics” that perform poorly are candidates for adding more sentences. This approach allows for more targeted data collection.
  • Thus, as specifically shown in FIG. 6:
  • Step 600—Collect text data in domain.
  • Step 601—Split data into individual domains.
  • Step 602—Tag domains.
  • Step 603—Gather more data.
  • Step 604—None left? If no, go to step 601.
  • Step 605—Build system model, if yes in step 604.
  • Further, we preferably split training data into one set for each node in the hierarchy, and build a model for each node.
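  • A sketch of that per-node split (the tagging format is an assumption): each sentence, tagged with its [topic][function][data] path, contributes to the training set of every node on that path.

```python
from collections import defaultdict

def split_by_node(tagged_sentences):
    """tagged_sentences: iterable of (sentence, [topic, function, data]) pairs."""
    per_node = defaultdict(list)
    for sentence, path in tagged_sentences:
        for label in path:  # one training set per hierarchy node on the path
            per_node[label].append(sentence)
    return dict(per_node)  # a small model would then be built for each key
```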
  • Referring lastly to FIG. 7, a block diagram of an illustrative implementation of a computing system for use in implementing techniques of the invention is shown. More particularly, FIG. 7 represents a computing system which may implement the user intent extraction components and methodologies of the invention, as described above in the context of FIGS. 1 through 6. The architecture shown may also be used to implement a target system.
  • In this particular implementation, a processor 701 for controlling and performing methodologies described herein is coupled to a memory 702 and a user interface 703 via a computer bus 704.
  • It is to be appreciated that the term “processor” as used herein is intended to include any processing device, such as, for example, one that includes a CPU (central processing unit) or other suitable processing circuitry. For example, the processor may be a digital signal processor (DSP), as is known in the art. Also the term “processor” may refer to more than one individual processor. However, the invention is not limited to any particular processor type or configuration.
  • The term “memory” as used herein is intended to include memory associated with a processor or CPU, such as, for example, RAM, ROM, a fixed memory device (e.g., hard drive), a removable memory device (e.g., diskette), flash memory, etc. However, the invention is not limited to any particular memory type or configuration.
  • In addition, the term “user interface” as used herein is intended to include, for example, one or more input devices, e.g., keyboard, for inputting data to the processing unit, and/or one or more output devices, e.g., CRT display and/or printer, for providing results associated with the processing unit. The user interface may also include one or more microphones for receiving user speech. However, the invention is not limited to any particular user interface type or configuration.
  • Accordingly, computer software including instructions or code for performing the methodologies of the invention, as described herein, may be stored in one or more of the associated memory devices (e.g., ROM, fixed or removable memory) and, when ready to be utilized, loaded in part or in whole (e.g., into RAM) and executed by a CPU.
  • In any case, it should be understood that the components/steps illustrated in FIGS. 1 through 7 may be implemented in various forms of hardware, software, or combinations thereof, e.g., one or more digital signal processors with associated memory, application specific integrated circuit(s), functional circuitry, one or more appropriately programmed general purpose digital computers with associated memory, etc. Given the teachings of the invention provided herein, one of ordinary skill in the related art will be able to contemplate other implementations of the elements of the invention.
  • Although illustrative embodiments of the present invention have been described herein with reference to the accompanying drawings, it is to be understood that the invention is not limited to those precise embodiments, and that various other changes and modifications may be made by one skilled in the art without departing from the scope or spirit of the invention.

Claims (21)

1.-15. (canceled)
16. A method, comprising:
obtaining a decoding of a free form voice instruction of a user, the free form voice instruction specifying an intended action;
determining a first level of classification of the intended action by analyzing a portion of the decoding during a first semantic analysis stage of an iterative semantic analysis process, the first level of classification including a plurality of sub-classifications; and
analyzing the portion of the decoding during a second semantic analysis stage of the iterative semantic analysis process to determine a second level of classification of the intended action,
wherein the second level of classification represents one of the sub-classifications of the first level of classification.
17. The method of claim 16, wherein the second level of classification includes a plurality of sub-classifications, and wherein the method further comprises analyzing the portion of the decoding during a third semantic analysis stage of the iterative semantic analysis process to determine a third level of classification of the intended action, wherein the third level of classification represents one of the sub-classifications of the second level of classification.
18. The method of claim 16, wherein the method comprises extracting a value for an attribute at each of the first semantic analysis stage and the second semantic analysis stage.
19. The method of claim 16, wherein analyzing a portion of the decoding during the first semantic analysis stage comprises analyzing the decoding in its entirety during the first semantic analysis stage.
20. The method of claim 16, wherein neither the first semantic analysis stage nor the second semantic analysis stage involves tagging each word of the portion of the decoding.
21. The method of claim 16, wherein determining a first level of classification of the intended action by analyzing a portion of the decoding during a first semantic analysis stage of an iterative semantic analysis process comprises weighting words of the portion of the decoding and pruning a list of potential classifications to determine the first level of classification.
22. The method of claim 21, wherein analyzing the portion of the decoding during a second semantic analysis stage of the iterative semantic analysis process comprises weighting words of the portion of the decoding and pruning a list of potential sub-classifications to determine the second level of classification.
23. At least one computer readable storage device encoded with a plurality of instructions that, when executed, cause at least one processor to perform a method comprising:
obtaining a decoding of a free form voice instruction of a user, the free form voice instruction specifying an intended action;
determining a first level of classification of the intended action by analyzing a portion of the decoding during a first semantic analysis stage of an iterative semantic analysis process, the first level of classification including a plurality of sub-classifications; and
analyzing the portion of the decoding during a second semantic analysis stage of the iterative semantic analysis process to determine a second level of classification of the intended action,
wherein the second level of classification represents one of the sub-classifications of the first level of classification.
24. The at least one computer readable storage device of claim 23, wherein the second level of classification includes a plurality of sub-classifications, and wherein the method further comprises analyzing the portion of the decoding during a third semantic analysis stage of the iterative semantic analysis process to determine a third level of classification of the intended action, wherein the third level of classification represents one of the sub-classifications of the second level of classification.
25. The at least one computer readable storage device of claim 23, wherein the method comprises extracting a value for an attribute at each of the first semantic analysis stage and the second semantic analysis stage.
26. The at least one computer readable storage device of claim 23, wherein analyzing the portion of the decoding during the first semantic analysis stage comprises analyzing the decoding in its entirety during the first semantic analysis stage.
27. The at least one computer readable storage device of claim 23, wherein neither the first semantic analysis stage nor the second semantic analysis stage involves tagging each word of the portion of the decoding.
28. The at least one computer readable storage device of claim 23, wherein determining a first level of classification of the intended action by analyzing a portion of the decoding during a first semantic analysis stage of an iterative semantic analysis process comprises weighting words of the portion of the decoding and pruning a list of potential classifications to determine the first level of classification.
29. The at least one computer readable storage device of claim 28, wherein analyzing the portion of the decoding during a second semantic analysis stage of the iterative semantic analysis process comprises weighting words of the portion of the decoding and pruning a list of potential sub-classifications to determine the second level of classification.
30. An apparatus comprising:
at least one processor circuit programmed to perform a method comprising:
obtaining a decoding of a free form voice instruction of a user, the free form voice instruction specifying an intended action;
determining a first level of classification of the intended action by analyzing a portion of the decoding during a first semantic analysis stage of an iterative semantic analysis process, the first level of classification including a plurality of sub-classifications; and
analyzing the portion of the decoding during a second semantic analysis stage of the iterative semantic analysis process to determine a second level of classification of the intended action,
wherein the second level of classification represents one of the sub-classifications of the first level of classification.
31. The apparatus of claim 30, wherein the second level of classification includes a plurality of sub-classifications, and wherein the method further comprises analyzing the portion of the decoding during a third semantic analysis stage of the iterative semantic analysis process to determine a third level of classification of the intended action, wherein the third level of classification represents one of the sub-classifications of the second level of classification.
32. The apparatus of claim 30, wherein the method comprises extracting a value for an attribute at each of the first semantic analysis stage and the second semantic analysis stage.
33. The apparatus of claim 30, wherein analyzing a portion of the decoding during the first semantic analysis stage comprises analyzing the decoding in its entirety during the first semantic analysis stage.
34. The apparatus of claim 30, wherein neither the first semantic analysis stage nor the second semantic analysis stage involves tagging each word of the portion of the decoding.
35. The apparatus of claim 30, wherein determining a first level of classification of the intended action by analyzing a portion of the decoding during a first semantic analysis stage of an iterative semantic analysis process comprises weighting words of the portion of the decoding and pruning a list of potential classifications to determine the first level of classification.
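The iterative semantic analysis recited in claims 16, 21, and 22 — weight the words of the decoding against a list of potential classifications, prune that list, select a level of classification, then repeat against the winner's sub-classifications — can be sketched as follows. This is a minimal illustration, not the patented implementation: the taxonomy, keyword weights, and pruning threshold are invented for the example, and the attribute-value extraction of claim 18 is omitted.

```python
# Illustrative sketch of iterative hierarchical intent classification:
# at each stage, words of the decoded free-form utterance are weighted
# against the candidate classes, low-scoring candidates are pruned, and
# analysis descends into the best survivor's sub-classifications.
# Taxonomy and weights below are hypothetical examples only.

TAXONOMY = {
    "navigation": {
        "keywords": {"drive": 2.0, "route": 2.0, "nearest": 1.0, "find": 0.5},
        "subclasses": {
            "find_poi": {"keywords": {"restaurant": 2.0, "gas": 2.0, "nearest": 1.5},
                         "subclasses": {}},
            "get_route": {"keywords": {"route": 2.0, "drive": 1.5, "home": 1.0},
                          "subclasses": {}},
        },
    },
    "media": {
        "keywords": {"play": 2.0, "song": 2.0, "radio": 1.5},
        "subclasses": {
            "play_music": {"keywords": {"play": 2.0, "song": 2.0}, "subclasses": {}},
            "tune_radio": {"keywords": {"radio": 2.0, "station": 2.0}, "subclasses": {}},
        },
    },
}


def classify(decoding, taxonomy=TAXONOMY, prune_threshold=1.0):
    """Return the levels of classification found for a decoded utterance."""
    words = decoding.lower().split()
    levels = []
    candidates = taxonomy
    while candidates:
        # Weight the words of the decoding against each potential classification.
        scores = {
            name: sum(spec["keywords"].get(w, 0.0) for w in words)
            for name, spec in candidates.items()
        }
        # Prune the list of potential classifications at this level.
        survivors = {n: s for n, s in scores.items() if s >= prune_threshold}
        if not survivors:
            break
        best = max(survivors, key=survivors.get)
        levels.append(best)
        # Descend into the sub-classifications of the chosen class.
        candidates = candidates[best]["subclasses"]
    return levels
```

For example, `classify("find the nearest gas station")` first selects the "navigation" level (the "media" candidate scores zero and is pruned), then re-analyzes the same decoding against navigation's sub-classifications and selects "find_poi", mirroring the two-stage process of claims 16 and 17.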
US14/043,647 2005-08-31 2013-10-01 Hierarchical methods and apparatus for extracting user intent from spoken utterances Abandoned US20140095162A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US14/043,647 US20140095162A1 (en) 2005-08-31 2013-10-01 Hierarchical methods and apparatus for extracting user intent from spoken utterances

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US11/216,483 US8265939B2 (en) 2005-08-31 2005-08-31 Hierarchical methods and apparatus for extracting user intent from spoken utterances
US13/564,596 US8560325B2 (en) 2005-08-31 2012-08-01 Hierarchical methods and apparatus for extracting user intent from spoken utterances
US14/043,647 US20140095162A1 (en) 2005-08-31 2013-10-01 Hierarchical methods and apparatus for extracting user intent from spoken utterances

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US13/564,596 Continuation US8560325B2 (en) 2005-08-31 2012-08-01 Hierarchical methods and apparatus for extracting user intent from spoken utterances

Publications (1)

Publication Number Publication Date
US20140095162A1 true US20140095162A1 (en) 2014-04-03

Family

ID=37831070

Family Applications (4)

Application Number Title Priority Date Filing Date
US11/216,483 Active 2026-11-21 US8265939B2 (en) 2005-08-31 2005-08-31 Hierarchical methods and apparatus for extracting user intent from spoken utterances
US12/125,441 Abandoned US20080221903A1 (en) 2005-08-31 2008-05-22 Hierarchical Methods and Apparatus for Extracting User Intent from Spoken Utterances
US13/564,596 Expired - Fee Related US8560325B2 (en) 2005-08-31 2012-08-01 Hierarchical methods and apparatus for extracting user intent from spoken utterances
US14/043,647 Abandoned US20140095162A1 (en) 2005-08-31 2013-10-01 Hierarchical methods and apparatus for extracting user intent from spoken utterances

Family Applications Before (3)

Application Number Title Priority Date Filing Date
US11/216,483 Active 2026-11-21 US8265939B2 (en) 2005-08-31 2005-08-31 Hierarchical methods and apparatus for extracting user intent from spoken utterances
US12/125,441 Abandoned US20080221903A1 (en) 2005-08-31 2008-05-22 Hierarchical Methods and Apparatus for Extracting User Intent from Spoken Utterances
US13/564,596 Expired - Fee Related US8560325B2 (en) 2005-08-31 2012-08-01 Hierarchical methods and apparatus for extracting user intent from spoken utterances

Country Status (1)

Country Link
US (4) US8265939B2 (en)

US9940637B2 (en) 2015-06-05 2018-04-10 Apple Inc. User interface for loyalty accounts and private label accounts
US10101822B2 (en) 2015-06-05 2018-10-16 Apple Inc. Language input correction
US9578173B2 (en) 2015-06-05 2017-02-21 Apple Inc. Virtual assistant aided communication with 3rd party service in a communication session
US11025565B2 (en) 2015-06-07 2021-06-01 Apple Inc. Personalized prediction of responses for instant messaging
US10255907B2 (en) 2015-06-07 2019-04-09 Apple Inc. Automatic accent detection using acoustic models
US10186254B2 (en) 2015-06-07 2019-01-22 Apple Inc. Context-based endpoint detection
US20160378747A1 (en) 2015-06-29 2016-12-29 Apple Inc. Virtual assistant for media playback
US10671428B2 (en) 2015-09-08 2020-06-02 Apple Inc. Distributed personal assistant
US10331312B2 (en) 2015-09-08 2019-06-25 Apple Inc. Intelligent automated assistant in a media environment
US10740384B2 (en) 2015-09-08 2020-08-11 Apple Inc. Intelligent automated assistant for media search and playback
US10747498B2 (en) 2015-09-08 2020-08-18 Apple Inc. Zero latency digital assistant
US9697820B2 (en) 2015-09-24 2017-07-04 Apple Inc. Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks
US11010550B2 (en) 2015-09-29 2021-05-18 Apple Inc. Unified language modeling framework for word prediction, auto-completion and auto-correction
US10366158B2 (en) 2015-09-29 2019-07-30 Apple Inc. Efficient word encoding for recurrent neural network language models
US11587559B2 (en) 2015-09-30 2023-02-21 Apple Inc. Intelligent device identification
US10691473B2 (en) 2015-11-06 2020-06-23 Apple Inc. Intelligent automated assistant in a messaging environment
US10956666B2 (en) 2015-11-09 2021-03-23 Apple Inc. Unconventional virtual assistant interactions
US10049668B2 (en) 2015-12-02 2018-08-14 Apple Inc. Applying neural network language models to weighted finite state transducers for automatic speech recognition
US10223066B2 (en) 2015-12-23 2019-03-05 Apple Inc. Proactive assistance based on dialog communication between devices
DE112016006496T5 (en) * 2016-02-26 2018-11-15 Mitsubishi Electric Corporation Voice recognition device
US10446143B2 (en) 2016-03-14 2019-10-15 Apple Inc. Identification of voice inputs providing credentials
US20200285678A1 (en) * 2016-04-11 2020-09-10 Kiss Digital Media Pty Ltd Method and system for machine-assisted cross-platform design synchronisation
US9934775B2 (en) 2016-05-26 2018-04-03 Apple Inc. Unit-selection text-to-speech synthesis based on predicted concatenation parameters
US9972304B2 (en) 2016-06-03 2018-05-15 Apple Inc. Privacy preserving distributed evaluation framework for embedded personalized systems
US11227589B2 (en) 2016-06-06 2022-01-18 Apple Inc. Intelligent list reading
US10249300B2 (en) 2016-06-06 2019-04-02 Apple Inc. Intelligent list reading
US10049663B2 (en) 2016-06-08 2018-08-14 Apple Inc. Intelligent automated assistant for media exploration
DK179309B1 (en) 2016-06-09 2018-04-23 Apple Inc Intelligent automated assistant in a home environment
US10509862B2 (en) 2016-06-10 2019-12-17 Apple Inc. Dynamic phrase expansion of language input
US10586535B2 (en) 2016-06-10 2020-03-10 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US10192552B2 (en) 2016-06-10 2019-01-29 Apple Inc. Digital assistant providing whispered speech
US10490187B2 (en) 2016-06-10 2019-11-26 Apple Inc. Digital assistant providing automated status report
US10067938B2 (en) 2016-06-10 2018-09-04 Apple Inc. Multilingual word prediction
DK201670540A1 (en) 2016-06-11 2018-01-08 Apple Inc Application integration with a digital assistant
DK179415B1 (en) 2016-06-11 2018-06-14 Apple Inc Intelligent device arbitration and control
DK179049B1 (en) 2016-06-11 2017-09-18 Apple Inc Data driven natural language event detection and classification
US10621581B2 (en) 2016-06-11 2020-04-14 Apple Inc. User interface for transactions
DK179343B1 (en) 2016-06-11 2018-05-14 Apple Inc Intelligent task discovery
KR20180025634A (en) * 2016-09-01 2018-03-09 삼성전자주식회사 Voice recognition apparatus and method
US10474753B2 (en) 2016-09-07 2019-11-12 Apple Inc. Language identification using recurrent neural networks
US10043516B2 (en) 2016-09-23 2018-08-07 Apple Inc. Intelligent automated assistant
JP2018054850A (en) * 2016-09-28 2018-04-05 株式会社東芝 Information processing system, information processor, information processing method, and program
US11281993B2 (en) 2016-12-05 2022-03-22 Apple Inc. Model and ensemble compression for metric learning
US10216832B2 (en) 2016-12-19 2019-02-26 Interactions Llc Underspecification of intents in a natural language processing system
US10593346B2 (en) 2016-12-22 2020-03-17 Apple Inc. Rank-reduced token representation for automatic speech recognition
US11204787B2 (en) 2017-01-09 2021-12-21 Apple Inc. Application integration with a digital assistant
DK201770383A1 (en) 2017-05-09 2018-12-14 Apple Inc. User interface for correcting recognition errors
US10417266B2 (en) 2017-05-09 2019-09-17 Apple Inc. Context-aware ranking of intelligent response suggestions
DK180048B1 (en) 2017-05-11 2020-02-04 Apple Inc. Maintaining the data protection of personal information
US10395654B2 (en) 2017-05-11 2019-08-27 Apple Inc. Text normalization based on a data-driven learning network
US10726832B2 (en) 2017-05-11 2020-07-28 Apple Inc. Maintaining privacy of personal information
DK201770439A1 (en) 2017-05-11 2018-12-13 Apple Inc. Offline personal assistant
DK179496B1 (en) 2017-05-12 2019-01-15 Apple Inc. USER-SPECIFIC Acoustic Models
DK201770428A1 (en) 2017-05-12 2019-02-18 Apple Inc. Low-latency intelligent automated assistant
US11301477B2 (en) 2017-05-12 2022-04-12 Apple Inc. Feedback analysis of a digital assistant
DK179745B1 (en) 2017-05-12 2019-05-01 Apple Inc. Synchronization and task delegation of a digital assistant
DK201770431A1 (en) 2017-05-15 2018-12-20 Apple Inc. Optimizing dialogue policy decisions for digital assistants using implicit feedback
DK201770432A1 (en) 2017-05-15 2018-12-21 Apple Inc. Hierarchical belief states for digital assistants
US10311144B2 (en) 2017-05-16 2019-06-04 Apple Inc. Emoji word sense disambiguation
US20180336892A1 (en) 2017-05-16 2018-11-22 Apple Inc. Detecting a trigger of a digital assistant
US10303715B2 (en) 2017-05-16 2019-05-28 Apple Inc. Intelligent automated assistant for media exploration
US10403278B2 (en) 2017-05-16 2019-09-03 Apple Inc. Methods and systems for phonetic matching in digital assistant services
DK179560B1 (en) 2017-05-16 2019-02-18 Apple Inc. Far-field extension for digital assistant services
AU2018273197B2 (en) * 2017-05-22 2021-08-12 Genesys Cloud Services Holdings II, LLC System and method for dynamic dialog control for contact center systems
US10657328B2 (en) 2017-06-02 2020-05-19 Apple Inc. Multi-task recurrent neural network architecture for efficient morphology handling in neural language modeling
US10445429B2 (en) 2017-09-21 2019-10-15 Apple Inc. Natural language understanding using vocabularies with compressed serialized tries
US10755051B2 (en) 2017-09-29 2020-08-25 Apple Inc. Rule-based natural language processing
WO2019098803A1 (en) * 2017-11-20 2019-05-23 Lg Electronics Inc. Device for providing toolkit for agent developer
US10636424B2 (en) 2017-11-30 2020-04-28 Apple Inc. Multi-turn canned dialog
US10733982B2 (en) 2018-01-08 2020-08-04 Apple Inc. Multi-directional dialog
US10733375B2 (en) 2018-01-31 2020-08-04 Apple Inc. Knowledge-based framework for improving natural language understanding
US10789959B2 (en) 2018-03-02 2020-09-29 Apple Inc. Training speaker recognition models for digital assistants
US10592604B2 (en) 2018-03-12 2020-03-17 Apple Inc. Inverse text normalization for automatic speech recognition
US10818288B2 (en) 2018-03-26 2020-10-27 Apple Inc. Natural assistant interaction
US10909331B2 (en) 2018-03-30 2021-02-02 Apple Inc. Implicit identification of translation payload with neural machine translation
US11145294B2 (en) 2018-05-07 2021-10-12 Apple Inc. Intelligent automated assistant for delivering content from user experiences
US10928918B2 (en) 2018-05-07 2021-02-23 Apple Inc. Raise to speak
US10984780B2 (en) 2018-05-21 2021-04-20 Apple Inc. Global semantic word embeddings using bi-directional recurrent neural networks
DK180639B1 (en) 2018-06-01 2021-11-04 Apple Inc. Disabling of attention-aware virtual assistant
DK179822B1 (en) 2018-06-01 2019-07-12 Apple Inc. Voice interaction at a primary device to access call functionality of a companion device
US10892996B2 (en) 2018-06-01 2021-01-12 Apple Inc. Variable latency device coordination
DK201870355A1 (en) 2018-06-01 2019-12-16 Apple Inc. Virtual assistant operation in multi-device environments
US11386266B2 (en) 2018-06-01 2022-07-12 Apple Inc. Text correction
US10496705B1 (en) 2018-06-03 2019-12-03 Apple Inc. Accelerated task performance
CN110580335B (en) 2018-06-07 2023-05-26 阿里巴巴集团控股有限公司 User intention determining method and device
US20200019641A1 (en) * 2018-07-10 2020-01-16 International Business Machines Corporation Responding to multi-intent user input to a dialog system
US10685645B2 (en) * 2018-08-09 2020-06-16 Bank Of America Corporation Identification of candidate training utterances from human conversations with an intelligent interactive assistant
US11010561B2 (en) 2018-09-27 2021-05-18 Apple Inc. Sentiment prediction from textual data
US11170166B2 (en) 2018-09-28 2021-11-09 Apple Inc. Neural typographical error modeling via generative adversarial networks
US11462215B2 (en) 2018-09-28 2022-10-04 Apple Inc. Multi-modal inputs for voice commands
US10839159B2 (en) 2018-09-28 2020-11-17 Apple Inc. Named entity normalization in a spoken dialog system
US11475898B2 (en) 2018-10-26 2022-10-18 Apple Inc. Low-latency multi-speaker speech recognition
US11638059B2 (en) 2019-01-04 2023-04-25 Apple Inc. Content playback on multiple devices
US11176935B2 (en) 2019-02-15 2021-11-16 Wipro Limited System and method for controlling devices through voice interaction
US11348573B2 (en) 2019-03-18 2022-05-31 Apple Inc. Multimodality in digital assistant systems
US10902220B2 (en) 2019-04-12 2021-01-26 The Toronto-Dominion Bank Systems and methods of generating responses associated with natural language input
US11307752B2 (en) 2019-05-06 2022-04-19 Apple Inc. User configurable task triggers
US11423908B2 (en) 2019-05-06 2022-08-23 Apple Inc. Interpreting spoken requests
US11475884B2 (en) 2019-05-06 2022-10-18 Apple Inc. Reducing digital assistant latency when a language is incorrectly determined
DK201970509A1 (en) 2019-05-06 2021-01-15 Apple Inc Spoken notifications
US11270077B2 (en) * 2019-05-13 2022-03-08 International Business Machines Corporation Routing text classifications within a cross-domain conversational service
US11140099B2 (en) 2019-05-21 2021-10-05 Apple Inc. Providing message response suggestions
DK201970511A1 (en) 2019-05-31 2021-02-15 Apple Inc Voice identification in digital assistant systems
DK180129B1 (en) 2019-05-31 2020-06-02 Apple Inc. User activity shortcut suggestions
US11289073B2 (en) 2019-05-31 2022-03-29 Apple Inc. Device text to speech
US11496600B2 (en) 2019-05-31 2022-11-08 Apple Inc. Remote execution of machine-learned models
US11468890B2 (en) 2019-06-01 2022-10-11 Apple Inc. Methods and user interfaces for voice-based control of electronic devices
US11360641B2 (en) 2019-06-01 2022-06-14 Apple Inc. Increasing the relevance of new available information
US11347939B2 (en) 2019-09-16 2022-05-31 Microsoft Technology Licensing, Llc Resolving temporal ambiguities in natural language inputs leveraging syntax tree permutations
US11488406B2 (en) 2019-09-25 2022-11-01 Apple Inc. Text detection using global geometry estimators
KR20210036169A (en) * 2019-09-25 2021-04-02 현대자동차주식회사 Dialogue system, dialogue processing method, translating apparatus and method of translation
KR20210054800A (en) * 2019-11-06 2021-05-14 엘지전자 주식회사 Collecting user voice sample
US11288322B2 (en) 2020-01-03 2022-03-29 International Business Machines Corporation Conversational agents over domain structured knowledge
US11823082B2 (en) 2020-05-06 2023-11-21 Kore.Ai, Inc. Methods for orchestrating an automated conversation in one or more networks and devices thereof
US11061543B1 (en) 2020-05-11 2021-07-13 Apple Inc. Providing relevant data items based on context
US11043220B1 (en) 2020-05-11 2021-06-22 Apple Inc. Digital assistant hardware abstraction
US11755276B2 (en) 2020-05-12 2023-09-12 Apple Inc. Reducing description length based on confidence
US11490204B2 (en) 2020-07-20 2022-11-01 Apple Inc. Multi-device audio adjustment coordination
US11438683B2 (en) 2020-07-21 2022-09-06 Apple Inc. User identification using headphones

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6311159B1 (en) * 1998-10-05 2001-10-30 Lernout & Hauspie Speech Products N.V. Speech controlled computer user interface
US6490698B1 (en) * 1999-06-04 2002-12-03 Microsoft Corporation Multi-level decision-analytic approach to failure and repair in human-computer interactions
US20030144831A1 (en) * 2003-03-14 2003-07-31 Holy Grail Technologies, Inc. Natural language processor
US7392185B2 (en) * 1999-11-12 2008-06-24 Phoenix Solutions, Inc. Speech based learning/training system using semantic decoding

Family Cites Families (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6272455B1 (en) * 1997-10-22 2001-08-07 Lucent Technologies, Inc. Method and apparatus for understanding natural language
US6092192A (en) 1998-01-16 2000-07-18 International Business Machines Corporation Apparatus and methods for providing repetitive enrollment in a plurality of biometric recognition systems based on an initial enrollment
US6236968B1 (en) 1998-05-14 2001-05-22 International Business Machines Corporation Sleep prevention dialog based car system
US6631346B1 (en) * 1999-04-07 2003-10-07 Matsushita Electric Industrial Co., Ltd. Method and apparatus for natural language parsing using multiple passes and tags
TW501046B (en) * 1999-06-11 2002-09-01 Ind Tech Res Inst A portable dialogue manager
US6513006B2 (en) * 1999-08-26 2003-01-28 Matsushita Electric Industrial Co., Ltd. Automatic control of household activity using speech recognition and natural language
US6553345B1 (en) * 1999-08-26 2003-04-22 Matsushita Electric Industrial Co., Ltd. Universal remote control allowing natural language modality for television and multimedia searches and requests
US6587818B2 (en) 1999-10-28 2003-07-01 International Business Machines Corporation System and method for resolving decoding ambiguity via dialog
US6766320B1 (en) * 2000-08-24 2004-07-20 Microsoft Corporation Search engine with natural language-based robust parsing for user query and relevance feedback learning
US6856958B2 (en) * 2000-09-05 2005-02-15 Lucent Technologies Inc. Methods and apparatus for text to speech processing using language independent prosody markup
US6785651B1 (en) * 2000-09-14 2004-08-31 Microsoft Corporation Method and apparatus for performing plan-based dialog
WO2002050816A1 (en) * 2000-12-18 2002-06-27 Koninklijke Philips Electronics N.V. Store speech, select vocabulary to recognize word
US20020103837A1 (en) * 2001-01-31 2002-08-01 International Business Machines Corporation Method for handling requests for information in a natural language understanding system
US6964023B2 (en) * 2001-02-05 2005-11-08 International Business Machines Corporation System and method for multi-modal focus detection, referential ambiguity resolution and mood classification using multi-modal input
US20020198714A1 (en) * 2001-06-26 2002-12-26 Guojun Zhou Statistical spoken dialog system
US6839896B2 (en) * 2001-06-29 2005-01-04 International Business Machines Corporation System and method for providing dialog management and arbitration in a multi-modal environment
US7152029B2 (en) * 2001-07-18 2006-12-19 At&T Corp. Spoken language understanding that incorporates prior knowledge into boosting
US8190436B2 (en) * 2001-12-07 2012-05-29 At&T Intellectual Property Ii, L.P. System and method of spoken language understanding in human computer dialogs
US20030115289A1 (en) * 2001-12-14 2003-06-19 Garry Chinn Navigation in a voice recognition system
KR100434545B1 (en) * 2002-03-15 2004-06-05 삼성전자주식회사 Method and apparatus for controlling devices connected with home network
US20030233230A1 (en) * 2002-06-12 2003-12-18 Lucent Technologies Inc. System and method for representing and resolving ambiguity in spoken dialogue systems
JP2004163590A (en) * 2002-11-12 2004-06-10 Denso Corp Reproducing device and program
CA2431183A1 (en) * 2003-06-05 2004-12-05 Atc Dynamics Inc. Method and system for natural language recognition command interface and data management
US7386440B2 (en) * 2003-10-01 2008-06-10 International Business Machines Corporation Method, system, and apparatus for natural language mixed-initiative dialogue processing
US20050119894A1 (en) * 2003-10-20 2005-06-02 Cutler Ann R. System and process for feedback speech instruction
US20050165607A1 (en) * 2004-01-22 2005-07-28 At&T Corp. System and method to disambiguate and clarify user intention in a spoken dialog system
US20050187772A1 (en) * 2004-02-25 2005-08-25 Fuji Xerox Co., Ltd. Systems and methods for synthesizing speech using discourse function level prosodic features
GB0411377D0 (en) * 2004-05-21 2004-06-23 Univ Belfast Dialogue manager
KR100679043B1 (en) * 2005-02-15 2007-02-05 삼성전자주식회사 Apparatus and method for spoken dialogue interface with task-structured frames
US7962340B2 (en) * 2005-08-22 2011-06-14 Nuance Communications, Inc. Methods and apparatus for buffering data for use in accordance with a speech recognition system

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10275452B2 (en) 2017-05-12 2019-04-30 International Business Machines Corporation Automatic, unsupervised paraphrase detection
US10275453B2 (en) 2017-05-12 2019-04-30 International Business Machines Corporation Automatic, unsupervised paraphrase detection
US10395655B1 (en) * 2017-09-13 2019-08-27 Amazon Technologies, Inc. Proactive command framework
US11270698B2 (en) * 2017-09-13 2022-03-08 Amazon Technologies, Inc. Proactive command framework
US20220246149A1 (en) * 2017-09-13 2022-08-04 Amazon Technologies, Inc. Proactive command framework
US11823678B2 (en) * 2017-09-13 2023-11-21 Amazon Technologies, Inc. Proactive command framework
US10620912B2 (en) 2017-10-25 2020-04-14 International Business Machines Corporation Machine learning to determine and execute a user interface trace
US10620911B2 (en) 2017-10-25 2020-04-14 International Business Machines Corporation Machine learning to identify a user interface trace

Also Published As

Publication number Publication date
US20130006637A1 (en) 2013-01-03
US8560325B2 (en) 2013-10-15
US20080221903A1 (en) 2008-09-11
US20070055529A1 (en) 2007-03-08
US8265939B2 (en) 2012-09-11

Similar Documents

Publication Publication Date Title
US8560325B2 (en) Hierarchical methods and apparatus for extracting user intent from spoken utterances
CN108305634B (en) Decoding method, decoder and storage medium
US11830485B2 (en) Multiple speech processing system with synthesized speech styles
KR101383552B1 (en) Speech recognition method of sentence having multiple instruction
US8200491B2 (en) Method and system for automatically detecting morphemes in a task classification system using lattices
EP1696421B1 (en) Learning in automatic speech recognition
US8548806B2 (en) Voice recognition device, voice recognition method, and voice recognition program
KR100755677B1 (en) Apparatus and method for dialogue speech recognition using topic detection
Gruhn et al. Statistical pronunciation modeling for non-native speech processing
US6618702B1 (en) Method of and device for phone-based speaker recognition
US20080177541A1 (en) Voice recognition device, voice recognition method, and voice recognition program
US20020013706A1 (en) Key-subword spotting for speech recognition and understanding
KR20170088164A (en) Self-learning based dialogue apparatus for incremental dialogue knowledge, and method thereof
JP2008233678A (en) Voice interaction apparatus, voice interaction method, and program for voice interaction
KR20050082249A (en) Method and apparatus for domain-based dialog speech recognition
JP2003308090A (en) Device, method and program for recognizing speech
US11715472B2 (en) Speech-processing system
WO2013163494A1 (en) Negative example (anti-word) based performance improvement for speech recognition
JP2005275348A (en) Speech recognition method, device, program and recording medium for executing the method
JP2001242885A (en) Device and method for speech recognition, and recording medium
US11783824B1 (en) Cross-assistant command processing
US11551666B1 (en) Natural language processing
Homma et al. In-vehicle voice interface with improved utterance classification accuracy using off-the-shelf cloud speech recognizer
JP3621922B2 (en) Sentence recognition apparatus, sentence recognition method, program, and medium
KR20000025827A (en) Method for constructing anti-phone model in speech recognition system and method for verifying phonetic

Legal Events

Date Code Title Description
STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION