US20030149564A1 - User interface for data access and entry

User interface for data access and entry

Info

Publication number
US20030149564A1
Authority
US
United States
Prior art keywords
voice recognition
user
options
address
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US10/184,069
Inventor
Li Gong
Richard Swan
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SAP SE
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to US10/184,069 priority Critical patent/US20030149564A1/en
Priority to US10/305,267 priority patent/US7177814B2/en
Priority to US10/358,665 priority patent/US7337405B2/en
Priority to PCT/US2003/003752 priority patent/WO2003067443A1/en
Priority to AU2003215100A priority patent/AU2003215100A1/en
Priority to EP03710916.2A priority patent/EP1481328B1/en
Priority to CN2009101621446A priority patent/CN101621547B/en
Assigned to SAP AKTIENGESELLSCHAFT reassignment SAP AKTIENGESELLSCHAFT ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: GONG, LI, SWAN, RICHARD J.
Publication of US20030149564A1 publication Critical patent/US20030149564A1/en
Priority to US11/623,455 priority patent/US20070179778A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M3/00Automatic or semi-automatic exchanges
    • H04M3/42Systems providing special services or facilities to subscribers
    • H04M3/487Arrangements for providing information services, e.g. recorded voice services or time announcements
    • H04M3/493Interactive information services, e.g. directory enquiries ; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals
    • H04M3/4938Interactive information services, e.g. directory enquiries ; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals comprising a voice browser which renders and interprets, e.g. VoiceXML
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M2201/00Electronic components, circuits, software, systems or apparatus used in telephone systems
    • H04M2201/22Synchronisation circuits

Definitions

  • Certain implementations relate generally to a user interface and speech grammar, and more particularly to a user interface and speech grammar for voice-based data access and entry on a mobile device.
  • a user interface may allow a user to gain access to data, such as, for example, products in a catalog database, or to enter data into a system, such as, for example, entering customer information into a customer database.
  • User interfaces are used for applications residing on relatively stationary computing devices, such as desktop computers, as well as for applications residing on mobile computing devices, such as laptops, palmtops, and portable electronic organizers.
  • a voice-activated user interface can be created to provide data access and entry to a system, and voice input may be particularly appealing for mobile devices.
  • a grammar for speech recognition for a given voice-driven application can be written to enable accurate and efficient recognition.
  • Particular implementations described below provide a user interface that allows a user to input data in one or more of a variety of different modes, including, for example, stylus and voice input. Output may also be in one or more of a variety of modes, such as, for example, display or voice.
  • Particular implementations may be used with mobile devices, such as, for example, palmtops, and the combination of voice and stylus input with voice and display output may allow such mobile devices to be more useful to a user. Implementations may also be used with the multi-modal synchronization system described in the incorporated provisional application.
  • Implementations allow enhanced voice recognition accuracy and/or speed due in part to the use of a structured grammar that allows a grammar to be narrowed to a relevant part for a particular voice recognition operation. For example, narrowing of the grammar for a voice recognition operation on a full search string may be achieved by using the results of an earlier, or parallel, voice recognition operation on a component of the full search string. Other implementations may narrow the grammar by accepting parameters of a search string in a particular order from a user, and, optionally, using the initial parameter(s) to narrow the grammar for subsequent parameters.
  • Examples include (i) reversing the standard order of receiving street address information so that, for example, the country is received before the state and the grammar used to recognize the state is narrowed to the states in the selected country, (ii) segmenting an electronic mail address or web site address so that a user supplies a domain identifier, such as, for example “com” separately, or (iii) automatically inserting the “at sign” and the “dot” into an electronic mail address and only prompting the user for the remaining terms, thus obviating the often complex process of recognizing these spoken characters.
  • Implementations may also increase recognition accuracy and speed by augmenting a grammar with possible search strings, or utterances, thus decreasing the likelihood that a voice recognition system will need to identify an entry by its spelling.
  • the voice recognition system also obviates the need to ask the user to spell out a term that is not recognized when spoken.
  • the voice recognition system may include, for example, the names of all “Fortune 100” companies and a variety of popular commercial sites in the grammar for the server identifier of the electronic mail address.
  • Implementations also allow enhanced database searching. This may be achieved, for example, by using a structured grammar and associating grammar entries with specific database entries. In this manner, when the structured grammar is used to recognize the search string, then particular database entries or relevant portions of the database may be identified at the same time.
  • performing voice recognition includes accessing a voice input including at least a first part and a second part, performing voice recognition on the first part of the voice input, performing voice recognition on a combination of the first part and the second part using a search space, and limiting the search space based on a result from performing voice recognition on the first part of the voice input. Limiting the search space allows enhanced voice recognition of the combination compared to performing voice recognition on the unlimited search space.
  • Performing voice recognition on the first part may produce a recognized string, and the recognized string may be associated with a set of recognizable utterances from the search space.
  • Limiting the search space may include limiting the search space to a set of recognizable utterances.
  • Voice recognition on the first part may be performed in parallel with voice recognition on the combination, such that the search space is not limited until after voice recognition on the combination has begun.
  • Voice recognition on the first part may be performed before voice recognition on the combination, such that the search space is limited before voice recognition on the combination has begun.
  • Performing voice recognition on the first part of the voice input may include comparing the first part to a set of high-occurrence patterns in the search space, followed by comparing the first part to a set of low-occurrence patterns in the search space.
  • Performing voice recognition on the first part of the voice input may include using a second search space.
  • Voice recognition may be performed on the second part of the voice input.
  • the second search space may be limited based on a result from performing voice recognition on the second part of the voice input. Limiting the search space may also be based on the result from performing voice recognition on the second part of the voice input.
  • Accessing circuitry may be used to access a voice input including at least a first part and a second part.
  • Recognition circuitry may be used to perform voice recognition on the first part of the voice input and on the combination of the first part and the second part, wherein voice recognition may be performed on the combination using a search space.
  • a recognition engine may be used and may include the recognition circuitry.
  • Limiting circuitry may be used to limit the search space based on a result from performing voice recognition on the first part of the voice input. Limiting the search space may allow enhanced voice recognition of the voice input compared to performing voice recognition on the unlimited search space.
  • One or more of the accessing circuitry, the recognition circuitry, and the limiting circuitry may include a memory with instructions for performing one or more of the operations of accessing the voice input, performing voice recognition, and limiting the search space based on the result from performing voice recognition on the first part of the voice input.
  • One or more of the accessing circuitry, the recognition circuitry, and the limiting circuitry may include a processor to perform one or more of the operations of accessing the voice input, performing voice recognition, and limiting the search space based on the result from performing voice recognition on the first part of the voice input.
  • the circuitry may be used to perform one of the other features described for this or another aspect.
  • accepting input from a user includes providing a first set of options to a user, the first set of options relating to a first parameter of a search string, and being provided to the user in a page.
  • a first input is accepted from the user, the first input being selected from the first set of options.
  • a second set of options is limited based on the accepted first input, the second set of options relating to a second parameter of the search string.
  • the second set of options is provided to the user in the page, such that the user is presented with a single page that provides the first set of options and the second set of options.
  • Accepting the first input from the user may include receiving an auditory input and performing voice recognition. Performing voice recognition on the first input in isolation may allow enhanced voice recognition compared to performing voice recognition on the search string. Accepting the first input from the user may include receiving a digital input.
  • a second input may be accepted from the user, the second input being selected from the second set of options.
  • Accepting the first input may include receiving the first input auditorily from the user.
  • Voice recognition may be performed on the first input in isolation. Performing voice recognition on the first input in isolation may allow enhanced voice recognition compared to performing voice recognition on the search string.
  • Providing the second set of options may include searching a set of data items for the first input and including in the second set of options references only to those data items that include the first input.
  • Accepting the second input may include receiving the second input auditorily from the user.
  • Voice recognition may be performed on the second input in isolation. Performing voice recognition on the second input in isolation may allow enhanced voice recognition compared to performing voice recognition on the search string.
  • a third set of options may be provided to the user, and the third set of options may relate to a third parameter of the search string and be provided to the user in the page.
  • a third input may be accepted from the user, and the third input may be selected from the third set of options.
  • the second set of options provided to the user may also be based on the accepted third input.
  • the second set of options provided to the user may be modified based on the accepted third input.
  • Providing the second set of options may include searching a set of data for the first input and providing only data items from the set of data that include the first input.
  • the first input may include a manufacturer designation that identifies a manufacturer.
  • Providing the second set of options may be limited to providing only data items manufactured by the identified manufacturer.
  • Circuitry may be used (i) to provide a first set of options to a user, the first set of options relating to a first parameter of a search string, and being provided to the user in a page, (ii) to accept a first input from the user, the first input being selected from the first set of options, (iii) to limit a second set of options based on the accepted first input, the second set of options relating to a second parameter of the search string, and/or (iv) to provide the second set of options to the user in the page, such that the user is presented with a single page that provides the first set of options and the second set of options.
  • the circuitry may include a memory having instructions stored thereon that when executed by a machine result in at least one of the enumerated operations being performed.
  • the circuitry may include a processor operable to perform at least one of the enumerated operations.
  • the circuitry may be used to perform one of the other features described for this or another aspect.
  • receiving items of an address from a user includes providing the user a first set of options for a first item of an address, receiving from the user the first address item taken from the first set of options, limiting a second set of options for a second item of the address based on the received first item, providing the user the limited second set of options for the second address item, and receiving the second address item.
  • the first address item may include a state identifier.
  • the second address item may include a city identifier identifying a city.
  • the user may be provided a third list of options for a zip code identifier.
  • the third list of options may exclude a zip code not in the identified city.
  • the zip code identifier may be received auditorily from the user.
  • the user may select the zip code identifier from the third list of options.
  • the zip code identifier may identify a zip code.
  • Voice recognition may be performed on the auditorily received zip code identifier. Excluding a zip code in the third list of options may allow enhanced voice recognition compared to not excluding a zip code.
  • the user may be provided a fourth list of options for a street address identifier.
  • the fourth list of options may exclude a street not in the identified zip code.
  • the street address identifier may be received auditorily from the user. The user may select the street address identifier from the fourth list of options.
  • the street address identifier may identify a street address.
  • Voice recognition may be performed on the auditorily received street address identifier. Exclusion of a street in the fourth list of options may allow enhanced voice recognition compared to not excluding a street.
  • Providing the user the first list of options may include providing the first list on a display.
  • Providing the user the second list of options may include providing the second list auditorily.
  • Circuitry may be used (i) to provide the user a first set of options for a first item of an address, (ii) to receive from the user the first address item taken from the first set of options, (iii) to limit a second set of options for a second item of the address based on the received first item, (iv) to provide the user the limited second set of options for the second address item, and/or (v) to receive the second address item.
  • the circuitry may include a memory having instructions stored thereon that when executed by a machine result in at least one of the enumerated operations being performed.
  • the circuitry may include a processor operable to perform at least one of the enumerated operations.
  • the circuitry may be used to perform one of the other features described for this or another aspect.
  • receiving an Internet address from a user includes prompting a user for a first portion of an Internet address.
  • the first portion of the Internet address is received auditorily from the user.
  • Voice recognition is performed on the received first portion.
  • Performing voice recognition on only the first portion of the Internet address allows enhanced recognition compared to performing voice recognition on more than the first portion of the Internet address.
  • the user is prompted for a second portion of the Internet address.
  • the second portion of the Internet address is received auditorily from the user.
  • Voice recognition is performed on the received second portion. Performing voice recognition on only the second portion of the Internet address allows enhanced recognition compared to performing voice recognition on more than the second portion of the Internet address.
  • the Internet address may include an electronic mail address.
  • the first portion may include a domain identifier of an electronic mail address.
  • the second portion may include a server identifier of an electronic mail address.
  • the user may be prompted for a user identifier portion of an electronic mail address.
  • a user identifier portion may be received auditorily from the user.
  • Voice recognition may be performed on a received user identifier portion. Performing voice recognition on only a user identifier portion may allow enhanced recognition compared to performing voice recognition on more than the user identifier portion of an electronic mail address.
  • Performing voice recognition on a domain identifier may include using a domain vocabulary including common three-letter domain identifiers, which may allow enhanced recognition.
  • Performing voice recognition on a server identifier may include using a server vocabulary including common server identifiers, which may allow enhanced recognition.
  • Performing voice recognition on a user identifier may include using a user vocabulary including common user identifiers, which may allow enhanced recognition.
  • the server vocabulary may be based on a domain identifier.
  • the Internet address may include a web site address.
  • the first portion may include a domain identifier of the web site address.
  • the second portion may include a server identifier of the web site address.
  • the user may be prompted for a network identifier portion of the web site address.
  • the network identifier portion may be received auditorily from the user.
  • Voice recognition may be performed on the received network identifier portion. Performing voice recognition on only the network identifier portion may allow enhanced recognition compared to performing voice recognition on more than the network identifier portion of the web site address.
  • Circuitry may be used (i) to prompt a user for a first portion of an Internet address, (ii) to receive auditorily from the user the first portion of the Internet address, (iii) to perform voice recognition on the received first portion, wherein performing voice recognition on only the first portion of the Internet address allows enhanced recognition compared to performing voice recognition on more than the first portion of the Internet address, (iv) to prompt the user for a second portion of the Internet address, (v) to receive auditorily from the user the second portion of the Internet address; and/or (vi) to perform voice recognition on the received second portion, wherein performing voice recognition on only the second portion of the Internet address allows enhanced recognition compared to performing voice recognition on more than the second portion of the Internet address.
  • the circuitry may include a memory having instructions stored thereon that when executed by a machine result in at least one of the enumerated operations being performed.
  • the circuitry may include a processor operable to perform at least one of the enumerated operations.
  • the circuitry may be used to perform one of the other features described for this or another aspect.
  • FIG. 1 is a flow chart of a process for recognizing a search string using a multi-cluster approach.
  • FIG. 2 is a diagrammatic flow chart depicting the process of FIG. 1.
  • FIG. 3 is a flow chart of a process for performing a search for a search string using a multi-level, multi-parameter cascade approach.
  • FIG. 4 is a picture of a page for implementing the process of FIG. 3.
  • FIG. 5 is a flow chart of a process for recognizing an address.
  • FIG. 6 is a block diagram of a pop-up wizard for entering address information.
  • FIG. 7 is a block diagram of a format for entering an electronic mail address.
  • FIG. 8 is a block diagram of a format for entering a web site address.
  • FIG. 9 is a flow chart of a process for searching for one or more matches to a search string.
  • FIG. 10 is a block diagram of a system for performing one or more of the described processes.
  • Various implementations include a user interface that provides a user with access to data. These user interfaces may be designed to accept various modes of input and to deliver various modes of output. Examples of input and output modes include manual, visual (for example, display or print), auditory (for example, voice or alarms), haptic, pressure, temperature, and smell. Manual modes may include, for example, keyboard, stylus, keypad, button, mouse, touch (for example, touch screen), and other hand inputs. Certain implementations are particularly suited for mobile applications, for which stylus or voice input is preferred, and for which output is presented visually on the screen and/or auditorily with text-to-speech or recorded human speech.
  • Various implementations also make use of structured grammars for voice recognition.
  • the structured grammars may allow for quicker recognition, for quicker searching for an item in a corresponding database, and/or for enhanced voice recognition due to the decreased likelihood of misrecognizing a voice input.
  • a process 100 for recognizing a search string using a multi-cluster approach includes entering a search string using a voice input ( 110 ).
  • the search string may represent, for example, an item in a database that a user wants to find.
  • the user may enter “Sony laptop superslim 505Z” into a voice recognition engine of a computer database to pull up information on that (hypothetical) computer model.
  • the grammar is structured around the database entries, including the actual database entries, or keywords, etc., and possibly also including additional category descriptions and other vocabulary entries.
  • the process 100 includes parsing the entered search string into at least one component in addition to the full search string ( 120 ).
  • the full search string is also referred to as a component.
  • a component may be a word or other recognized symbol, or group of words or symbols.
  • the search string may be parsed into all of its components, or a single component may be parsed out. Parsing may be performed by recognizing silence between words, symbols, or other components, and the voice entry system may require such silence. Parsing may also be performed on voice inputs entered in a more natural delivery, without obvious pauses between components.
  • the process 100 includes performing voice recognition on at least two components ( 130 ).
  • the parsing ( 120 ) may be performed simultaneously with the voice recognition ( 130 ). For example, as the search string is processed from left to right, a component may be recognized ( 130 ) and, upon recognition, may be parsed ( 120 ). One of the two components may be the full search string.
  • the process 100 includes determining a resulting solution space in the grammar for at least one of the voice recognition operations ( 140 ).
  • the solution space represents possible matches for the full search string.
  • the first component may be the first word of the search string, for example, “Sony,” and may correspond to a cluster in the speech recognition grammar. This cluster defined by “Sony” may contain, perhaps, only one hundred entries out of tens of thousands of entries in the grammar (and the corresponding database). Those one hundred entries would form the solution space for the component “Sony.”
  • the process 100 includes modifying the search space for the voice recognition operation ( 130 ) of at least one of the components using the solution space determined in operation 140 ( 150 ).
  • the search space being used to perform the voice recognition on the full string can be narrowed to include only the one hundred grammar entries that include the component “Sony.”
  • both recognition processes ( 130 ) are performed at least partially in parallel and recognizing the smaller component, such as “Sony,” is faster than recognizing the entire search string.
  • the recognition process for the full search string is started on the entire search space of grammar entries and is narrowed after the resulting solution space for the smaller component is determined in operation 140 .
  • Other implementations perform the voice recognition processes serially. For example, one implementation performs voice recognition on a smaller component, and afterwards performs voice recognition for a larger component using the smaller component's solution space as the search space for the larger component.
  • the process 100 includes determining a list of one or more matches for the full search string ( 160 ).
  • Voice recognition algorithms often return confidence scores associated with the results. These confidence scores can be used, for example, to rank order the results and a selected number of the highest scoring results can be returned to the user.
  • the list of matches might not necessarily be good matches.
  • Various implementations may use a threshold confidence score to determine if a good match has been found. If a good match has not been found, then a variety of options are available. For example, (i) the user may be prompted for more information, (ii) the search string may be modified automatically, if it has not already been, by, for example, using synonyms of recognized components, transposing components, etc., or (iii) the user may be presented with information on the size of the solution space for each component, and the confidence scores, which may reveal a component that the system had a difficult time recognizing.
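  • By way of illustration only, the process 100 may be sketched in Python as follows (the toy grammar, names, and text-based scorer are hypothetical; a real engine would compare acoustic models rather than strings): the first component is recognized against its cluster index, the cluster becomes the solution space, and the full string is recognized against only that cluster.

```python
# Hypothetical sketch of process 100 (multi-cluster recognition).
# Each common component is a grammar entry associated with the complete
# database entries that contain it (the component's "cluster").
GRAMMAR_CLUSTERS = {
    "sony": {"sony laptop superslim 505z", "sony desktop tower 300"},
    "laptop": {"sony laptop superslim 505z", "dell laptop inspiron 8100"},
}
ALL_ENTRIES = set().union(*GRAMMAR_CLUSTERS.values())

def recognize(utterance, search_space):
    """Stand-in for a voice recognition engine: score every entry in the
    search space against the utterance and return (entry, confidence)
    pairs, best first. A real engine would compare acoustic models."""
    def confidence(entry):
        shared = set(utterance.split()) & set(entry.split())
        return len(shared) / len(entry.split())
    return sorted(((e, confidence(e)) for e in search_space),
                  key=lambda pair: pair[1], reverse=True)

def process_100(voice_input, threshold=0.5):
    first = voice_input.split()[0]                 # parse (120)
    # Recognize the small component first (130) and take its cluster as
    # the resulting solution space (140).
    solution_space = GRAMMAR_CLUSTERS.get(first, ALL_ENTRIES)
    # Recognize the full string on the narrowed search space (150).
    matches = recognize(voice_input, solution_space)
    # Keep the matches that clear the confidence threshold (160).
    return [(e, c) for e, c in matches if c >= threshold]

print(process_100("sony laptop superslim 505z"))
```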
  • a diagrammatic flow chart 200 depicting the process 100 includes a search string 210 .
  • the search string 210 includes a first component 220 and a second component 230 .
  • the search string 210 may be, for example, a voice segment.
  • the search string 210 is parsed using a parse process 240 into the first and second components 220 , 230 .
  • a voice recognition process 250 is performed on each component 210 , 220 , 230 , in parallel, using a search space 260 .
  • the parse process 240 and the voice recognition process 250 may be implemented using, for example, a processor or other computing device or combination of devices.
  • Voice recognition of the first component 220 results in a first solution space 270 . Assuming that voice recognition of the first component 220 finishes before voice recognition of the second component 230 and of the full string 210 , then each of the latter voice recognition operations can be restricted to the first solution space 270 .
  • Voice recognition of the second component 230 results in a second solution space 280 . Assuming that voice recognition of the second component 230 finishes before voice recognition of the full string 210 , then voice recognition of the full string 210 can be restricted to an overlap 290 of the first solution space 270 and the second solution space 280 . Voice recognition of the full string 210 results in a third solution space 295 .
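  • Reusing the hypothetical names from the sketch above, the parallel arrangement of FIG. 2 reduces to set intersection: the per-component solution spaces 270 and 280 are intersected, and recognition of the full string 210 is restricted to the overlap 290.

```python
# Continuation of the sketch above (same hypothetical grammar).
first_space = GRAMMAR_CLUSTERS["sony"]     # solution space 270
second_space = GRAMMAR_CLUSTERS["laptop"]  # solution space 280
overlap = first_space & second_space       # overlap 290

# Recognition of the full string 210 searches only the overlap; its
# result corresponds to solution space 295.
print(recognize("sony laptop superslim 505z", overlap))
```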
  • the time required for performing voice recognition on a small component can be decreased by structuring the grammar so that common components of the database entries (which are included in the grammar) are compared with the components of the search string before other components of the database entries (which are also included in the grammar).
  • common components may be entered as separate vocabulary entries in a grammar, even though those components do not constitute complete database entries.
  • the word “Sony” may be entered into the vocabulary even though it does not refer to an individual product (database entry).
  • the component “Sony” can then be associated with all of the grammar entries that include the word “Sony” and that correspond to complete database entries. The same can be done for the individual word “laptop,” as well as the two-word component “Sony laptop,” for example.
  • Such a structure may allow for relatively quick recognition of the component “Sony laptop” and a corresponding narrowing of the search space for the recognition of the full search string “Sony laptop superslim 505Z.”
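  • One hypothetical way to build such a structure is to index every word and every adjacent two-word component of each database entry as a separate vocabulary entry pointing back at the complete entries that contain it:

```python
from collections import defaultdict

def build_component_index(entries):
    """Map each word and each adjacent two-word component (e.g., "sony",
    "laptop", "sony laptop") of the database entries to the complete
    entries that contain it."""
    index = defaultdict(set)
    for entry in entries:
        words = entry.split()
        for word in words:
            index[word].add(entry)
        for pair in zip(words, words[1:]):
            index[" ".join(pair)].add(entry)
    return index

index = build_component_index(["sony laptop superslim 505z",
                               "sony desktop tower 300"])
print(index["sony laptop"])   # {'sony laptop superslim 505z'}
```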
  • the list of matches determined in the process 100 ( 160 ) may return matches that correspond to actual database entries that match the entered search string. Accordingly, in such implementations, the voice recognition process may effectively perform the database search simultaneously. For example, each of the listed matches may serve as an index into the database for easy retrieval of the corresponding database entry.
  • search strings may include components that are not part of the database, however.
  • a user may be allowed to enter a price range for a computer.
  • the grammar could include, and be able to recognize, price ranges entered in a determined format.
  • the grammar may be structured in a variety of ways to support recognizing such search strings. For example, if a user enters only a price range, the voice recognition engine may recognize the search string and associate it with a set of database entries satisfying the price range. Alternatively, the voice recognition engine may query the user for more data by, for example, returning a list of manufacturers having computers (computers being the assumed content of the database ultimately being searched) in that price range.
  • the voice recognition system may use that additional information to narrow the solution space. If the user enters sufficient information, the grammar may be structured to allow the voice recognition system to determine, for the various price ranges that are recognizable, the grammar entries for all actual products (corresponding to actual database entries) that satisfy the entered price range and the other components of the search string. These entries may then be presented to the user.
  • the process 100 can also be applied to systems that do not use voice input.
  • other modes of input may require a recognition process that could be performed in an analogous manner to that already described.
  • a process 300 for performing a search for a search string using a multi-level, multi-parameter cascade approach includes providing a first set of options for a first parameter ( 310 ).
  • a user interface to a database of computers may provide a list of manufacturers as the first set of options, with the first parameter being the manufacturer.
  • the first set of options may be provided, for example, on a display, or through a voice response system.
  • the process 300 includes entering a first parameter selected from the first set of options ( 320 ).
  • a user may select, and enter, a manufacturer from a list provided in operation 310 .
  • the user may enter the first parameter by using, for example, a stylus, keyboard, touch screen, or voice input.
  • the process 300 includes providing a second set of options for a second parameter based on the first parameter ( 330 ).
  • a user interface may provide a list of product types, including, for example, desktops, laptops, and palmtops, that are available from the manufacturer entered in operation 320 .
  • the process 300 includes entering a second parameter selected from the second set of options ( 340 ). Continuing the example from above, a user may select, and enter, a product type from the list provided in operation 330 .
  • the process 300 includes providing a list of matches, based on the first and second parameters ( 350 ).
  • the list of matches may include all computers in the database that are manufactured by the entered manufacturer and that are of the entered product type.
  • the list of matches may include all Sony laptops.
  • the process 300 may be used, for example, instead of having a user enter a one-time, full search phrase.
  • the process 300 presents a set of structured searches or selections from, for example, drop-down lists.
  • the first and second parameters can be considered to be parts of a search string, with the cumulative search string producing the list of matches provided in operation 350 .
  • the database may be structured to allow for efficient searches based on the parameters provided in operations 310 and 330 . Additionally, in voice input applications, by structuring the data entry, the grammar and vocabulary for each parameter may be simplified, thus potentially increasing recognition accuracy and speed.
  • Implementations may present multiple parameters and sets of options, and these may be organized into levels. In the process 300 , one parameter was used at each of two levels. However, for example, multiple parameters may be presented at a first level, with both entries determining the list of options presented for additional multiple parameters at a second level, and with all entries determining a list of matches.
  • Such parameters may include, for example, manufacturer, brand, product type, price range, and a variety of features of the products in the product type. Examples of features for computers include processor speed, amount of random access memory, storage capacity of a hard disk, video card speed and memory, and service contract options.
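  • A minimal sketch of the cascade (the catalog contents and field names are hypothetical): the options offered at each level are computed by filtering the catalog on every parameter entered so far, which keeps the option set, and any grammar built from it, small.

```python
# Hypothetical product catalog; each record is one database entry.
CATALOG = [
    {"manufacturer": "Sony", "type": "laptop", "model": "Superslim 505Z"},
    {"manufacturer": "Sony", "type": "desktop", "model": "Tower 300"},
    {"manufacturer": "Dell", "type": "laptop", "model": "Inspiron 8100"},
]

def options(field, selections):
    """Options for one parameter, limited by all parameters entered so far."""
    return {item[field] for item in CATALOG
            if all(item[k] == v for k, v in selections.items())}

# Level one: provide manufacturers (operation 310); user enters one (320).
chosen = {"manufacturer": "Sony"}
# Level two: product types limited to the chosen manufacturer (330/340).
print(options("type", chosen))     # e.g., {'laptop', 'desktop'}
chosen["type"] = "laptop"
# Matches based on both parameters (350).
print(options("model", chosen))    # {'Superslim 505Z'}
```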
  • a picture of a page 400 for implementing the process 300 includes a first level 410 and a second level 420 .
  • the first level 410 provides a first parameter 430 for the product, with a corresponding pull-down menu 440 that includes a set of options.
  • the set of options in pull-down menu 440 may include, for example, desktop, laptop, and palmtop.
  • the second level 420 provides a second parameter 450 for the brand, with a corresponding pull-down menu 460 that includes a set of options.
  • the options in pull-down menu 460 are all assumed to satisfy the product parameter entered by the user in pull-down menu 440 and may include, for example, Sony, HP/Compaq, Dell, and IBM. If “laptop” was selected in the pull-down menu 440, the pull-down menu 460 would include only brands (manufacturers) that sell laptops.
  • the page 400 also includes a category 470 for models that match the parameters entered in the first and second levels 410 and 420 .
  • the matching models are viewable using a pull-down menu 480 .
  • all of the search string information as well as the results may be presented in a single page.
  • the page 400 is also presentable in a single screen shot, but other single-page implementations may use, for example, a web page that spans multiple screen lengths and requires scrolling to view all of the information.
  • a process 500 for recognizing an address includes determining a list of options for a first part of an address ( 510 ).
  • the address may be, for example, a street address or an Internet address, where Internet addresses include, for example, electronic mail addresses and web site addresses. If the address is a street address, the first part may be, for example, a state identifier.
  • the process 500 includes prompting a user for the first part of the address ( 520 ).
  • the prompt may, for example, simply include a request to enter information, or it may include a list of options.
  • the process 500 includes receiving the first part of the address ( 530 ). If the first part is received auditorily, the process 500 includes performing voice recognition of the first part of the address ( 540 ).
  • the process 500 includes determining a list of options for a second part of the address based on the received first part ( 550 ).
  • the second part may be, for example, a city identifier.
  • the list of options may include, for example, only those cities that are in the state identified by the received state identifier.
  • the process 500 includes prompting the user for the second part of the address ( 560 ). Again, the prompt need not include the list of options.
  • the process 500 includes receiving the second part of the address ( 570 ). If the second part is received auditorily, the process 500 includes performing voice recognition of the second part of the address ( 580 ).
  • the process 500 could continue with subsequent determinations of lists of options for further parts of the address.
  • a list of options for a zip code could be determined based on the city identified by the received city identifier. Such a list could be determined from the available zip codes in the identified city. City streets in the city or the zip code could also be determined. Further, country information could be obtained before obtaining state information.
  • the range of possibilities for each subsequent piece of address information can be narrowed by entering the data in an order that is the reverse of ordinary practice, that is, by entering data from geographically broad categories to geographically narrow categories. If multiple countries are involved, the impact of using the reverse order may be even greater because standard designations for streets vary across languages.
  • the process 500 may prompt the user in a number of ways. For example, the user may be prompted to enter address information in a particular order, allowing a system to process the address information as it is entered and to prepare the lists of options. Entry fields for country, state or province, city, zip or postal code, street, etc., for example, may be presented top-down on a screen or sequentially presented in speech output.
  • a system may use a pop-up wizard 600 on the screen of a device to ask the user to enter specific address information. Further, a system may preserve the normative order of address information, but use visual cues, for example, to prompt the user to enter the information in a particular order. Visual cues may include, for example, highlighting or coloring the border or the title of an entry field.
  • the process 500 may be applied to data entered using a voice mode or another mode. After the data is entered at each prompt, and after it is recognized if voice input is used, a database of addresses may be searched to determine the list of options for the next address field. Such systems allow database searching on an ongoing basis instead of waiting until all address information is entered. Such systems also allow for guided entry using pull-down menus and, with or without guided entry, alerting a user at the time of entry if an invalid entry is made for a particular part of an address.
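  • The broad-to-narrow entry order may be sketched with a hypothetical nested address table: each answer selects the sub-table whose keys become the option list, and hence the vocabulary, for the next prompt.

```python
# Hypothetical address data, keyed from geographically broad to narrow:
# country -> state -> city -> zip -> streets.
ADDRESSES = {
    "US": {"CA": {"Palo Alto": {"94301": ["Hamilton Ave", "University Ave"]}}},
}

def address_wizard(answer):
    """Walk the table broad-to-narrow. `answer(field, options)` stands in
    for prompting the user and recognizing the (possibly spoken) reply
    against the narrowed option list."""
    level, entered = ADDRESSES, {}
    for field in ("country", "state", "city", "zip", "street"):
        options = sorted(level)            # only the still-valid choices
        entered[field] = answer(field, options)
        if isinstance(level, dict):
            level = level[entered[field]]  # descend into the chosen branch
        else:
            break                          # streets are the final level
    return entered

# Example: auto-select the first option at every prompt.
print(address_wizard(lambda field, options: options[0]))
```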
  • the process 500 may also be applied to other addresses, in addition to street addresses or parts thereof.
  • the process 500 may be applied to Internet addresses, including, for example, electronic mail addresses and web site addresses.
  • a format 700 for entering an electronic mail address includes using a user identifier 710 , a server identifier 720 , and a domain identifier 730 .
  • the “at sign” separating the user identifier 710 and the server identifier 720 , and the “dot” separating the server identifier 720 and the domain identifier 730 may be implicit and inserted automatically, that is, without human intervention.
  • the domain identifier 730 is entered first due to the small number of options available for this field.
  • a list of options for the server identifier 720 can be generated based on the entered domain. For example, if “com” is entered for the domain, then a list of options for the server identifier 720 may include, for example, all “Fortune 100” companies and the twenty-five most frequently visited commercial web sites. Similar lists may be generated for “gov,” “net,” and other domain identifiers 730 .
  • a list of options for the user identifier 710 may include, for example, common last names and first names and other conventions, such as, for example, a first initial followed by a last name.
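  • A sketch of entry under the format 700 (all vocabularies hypothetical): the domain is requested first because it has few options, the server vocabulary is keyed by the recognized domain, and the separators are inserted automatically rather than recognized as speech.

```python
# Hypothetical vocabularies for the fields of user@server.domain.
DOMAIN_VOCAB = ["com", "gov", "net", "org"]
SERVER_VOCAB = {
    "com": ["amazon", "ibm", "sony"],     # e.g., Fortune 100 + popular sites
    "gov": ["irs", "nasa"],
}
USER_VOCAB = ["jsmith", "john", "smith"]  # common names and conventions

def enter_email(answer):
    """`answer(field, vocab)` stands in for prompting the user and
    recognizing one spoken field against its (narrowed) vocabulary."""
    domain = answer("domain", DOMAIN_VOCAB)           # few options: asked first
    server = answer("server", SERVER_VOCAB.get(domain, []))
    user = answer("user", USER_VOCAB)
    # The "at sign" and the "dot" are implicit and inserted automatically.
    return f"{user}@{server}.{domain}"

print(enter_email(lambda field, vocab: vocab[0]))     # jsmith@amazon.com
```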
  • a format 800 for entering a web site address includes using a network identifier 810 , a server identifier 820 , and a domain identifier 830 .
  • the two “dots” separating the three identifiers 810 , 820 , 830 may be implicit and inserted automatically.
  • the network identifier may be selected from, for example, “www,” “www1,” “www2,” etc.
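  • The format 800 follows the same pattern; reusing the hypothetical vocabularies and answer callback from the sketch above, both dots are inserted automatically:

```python
NETWORK_VOCAB = ["www", "www1", "www2"]

def enter_web_address(answer):
    domain = answer("domain", DOMAIN_VOCAB)
    server = answer("server", SERVER_VOCAB.get(domain, []))
    network = answer("network", NETWORK_VOCAB)
    # Both "dots" are implicit and inserted automatically.
    return f"{network}.{server}.{domain}"

print(enter_web_address(lambda field, vocab: vocab[0]))  # www.amazon.com
```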
  • a process 900 for searching for one or more matches to a search string includes accessing at least a first part of a search string ( 910 ). Such accessing may include, for example, receiving a voice input, a stylus input, or a menu selection, and the first part may include the entire search string.
  • the process 900 includes searching a first search space for a match for the first part of the search string ( 920 ).
  • the first search space may include, for example, a search space in a grammar of a voice recognition engine, a search space in a database, or a search space in a list of options presented to a user in a pull-down menu. Searching may include, for example, comparing text entries, voice waveforms, or codes representing entries in a codebook of vector-quantized waveforms.
  • the process 900 includes limiting a second search space based on a result of searching the first search space ( 930 ).
  • the second search space may, for example, be similar to or the same as the first search space.
  • Limiting may include, for example, paring down the possible grammar or vocabulary entries that could be examined, paring down the possible database entries that could be examined, or paring down the number of options that could be displayed or made available for a parameter of the search string. Paring down the possibilities or options may be done, for example, to exclude possibilities or options that do not satisfy the first part of the search string.
  • the process 900 includes accessing at least a second part of the search string ( 940 ) and searching the limited second search space for a match for the second part of the search string ( 950 ).
  • Accessing the second part of the search string may include, for example, receiving a voice input, a stylus input, or a menu selection, and the second part may include the entire search string.
  • Searching the limited second search space may be performed, for example, in the same way or in a similar way as searching the first search space is performed.
  • the process 900 is intended to cover all of the disclosed processes.
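  • Because the process 900 is mode-agnostic, it may be sketched with any comparable search primitive; the substring test below is a hypothetical stand-in for comparing text entries, voice waveforms, or codebook entries:

```python
def search(part, space):
    """Stand-in for searching a grammar, database, or option list for a
    match: keep the entries consistent with this part of the string."""
    return {entry for entry in space if part in entry}

def process_900(first_part, second_part, space):
    first_matches = search(first_part, space)   # search first space (920)
    limited_space = first_matches               # limit second space (930)
    return search(second_part, limited_space)   # search limited space (950)

print(process_900("sony", "laptop",
                  {"sony laptop superslim 505z", "dell laptop inspiron"}))
# {'sony laptop superslim 505z'}
```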
  • a system 1000 for implementing one or more of the above processes includes a computing device 1010 , a first memory 1020 located internal to the computing device 1010 , a second memory 1030 located external to the computing device 1010 , and a recognition engine 1040 located external to the computing device 1010 .
  • the computing device may be, for example, a desktop, laptop, palmtop, or other type of electronic device capable of performing one or more of the processes described.
  • the computing device 1010 may include circuitry, such as, for example, a processor, a controller, a programmed logic device, and a memory having instructions stored thereon.
  • the circuitry may include, for example, analog and/or digital circuitry.
  • the first and second memories 1020 , 1030 may be, for example, permanent or temporary memory capable of storing data or instructions at least temporarily.
  • the recognition engine 1040 may be a voice recognition engine or a recognition engine for another mode of input.
  • the second memory 1030 and the recognition engine 1040 are shown as being external to, and optionally connected to, the computing device 1010 . However, the second memory 1030 and the recognition engine 1040 may also be integrated into the computing device 1010 or be omitted from the system 1000 .

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)
  • Information Transfer Between Computers (AREA)
  • Mobile Radio Communication Systems (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

Performing voice recognition may include accessing a voice input including at least a first part and a second part, performing voice recognition on the first part of the voice input, performing voice recognition on a combination of the first part and the second part using a search space, and limiting the search space based on a result from performing voice recognition on the first part of the voice input. Communicating with a user may include presenting the user a first set of options and a second set of options, wherein the second set of options is limited based on the user's selection from the first set of options. The two sets of options may be presented in a single page. The user's selection from the first set of options may be used to select a vocabulary used to recognize the user's response to the second set of options.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This application claims priority from (i) U.S. Provisional Application No. 60/354,324, filed Feb. 7, 2002, and titled MOBILE APPLICATION ARCHITECTURE, (ii) U.S. Application No. 10/131,216 (Attorney Docket No. 13909-017001), filed Apr. 25, 2002, and titled MULTI-MODAL SYNCHRONIZATION, and (iii) U.S. application Ser. No. 10/157,030 (Attorney Docket No. 13909-017002), filed May 30, 2002, and titled USER INTERFACE FOR DATA ACCESS AND ENTRY, all three of which are hereby incorporated by reference in their entirety for all purposes.[0001]
  • TECHNICAL FIELD
  • Certain implementations relate generally to a user interface and speech grammar, and more particularly to a user interface and speech grammar for voice-based data access and entry on a mobile device. [0002]
  • BACKGROUND
  • A user interface may allow a user to gain access to data, such as, for example, products in a catalog database, or to enter data into a system, such as, for example, entering customer information into a customer database. User interfaces are used for applications residing on relatively stationary computing devices, such as desktop computers, as well as for applications residing on mobile computing devices, such as laptops, palmtops, and portable electronic organizers. A voice-activated user interface can be created to provide data access and entry to a system, and voice input may be particularly appealing for mobile devices. [0003]
  • SUMMARY
  • In various implementations, a grammar for speech recognition for a given voice-driven application, mobile or otherwise, can be written to enable accurate and efficient recognition. Particular implementations described below provide a user interface that allows a user to input data in one or more of a variety of different modes, including, for example, stylus and voice input. Output may also be in one or more of a variety of modes, such as, for example, display or voice. Particular implementations may be used with mobile devices, such as, for example, palmtops, and the combination of voice and stylus input with voice and display output may allow such mobile devices to be more useful to a user. Implementations may also be used with the multi-modal synchronization system described in the incorporated provisional application. [0004]
  • Implementations allow enhanced voice recognition accuracy and/or speed due in part to the use of a structured grammar that allows a grammar to be narrowed to a relevant part for a particular voice recognition operation. For example, narrowing of the grammar for a voice recognition operation on a full search string may be achieved by using the results of an earlier, or parallel, voice recognition operation on a component of the full search string. Other implementations may narrow the grammar by accepting parameters of a search string in a particular order from a user, and, optionally, using the initial parameter(s) to narrow the grammar for subsequent parameters. Examples include (i) reversing the standard order of receiving street address information so that, for example, the country is received before the state and the grammar used to recognize the state is narrowed to the states in the selected country, (ii) segmenting an electronic mail address or web site address so that a user supplies a domain identifier, such as, for example “com” separately, or (iii) automatically inserting the “at sign” and the “dot” into an electronic mail address and only prompting the user for the remaining terms, thus obviating the often complex process of recognizing these spoken characters. [0005]
  • Implementations may also increase recognition accuracy and speed by augmenting a grammar with possible search strings, or utterances, thus decreasing the likelihood that a voice recognition system will need to identify an entry by its spelling. In such situations, the voice recognition system also obviates the need to ask the user to spell out a term that is not recognized when spoken. For example, after a user enters “com” as a domain identifier in an electronic mail address, the voice recognition system may include, for example, the names of all “Fortune 100” companies and a variety of popular commercial sites in the grammar for the server identifier of the electronic mail address. Thus, if the user then enters “amazon” as the server identifier, and if “amazon” has been included in the grammar, the system will recognize the entry without having to ask the user to spell it out. [0006]
  • Implementations also allow enhanced database searching. This may be achieved, for example, by using a structured grammar and associating grammar entries with specific database entries. In this manner, when the structured grammar is used to recognize the search string, then particular database entries or relevant portions of the database may be identified at the same time. [0007]
  • According to one general aspect, performing voice recognition includes accessing a voice input including at least a first part and a second part, performing voice recognition on the first part of the voice input, performing voice recognition on a combination of the first part and the second part using a search space, and limiting the search space based on a result from performing voice recognition on the first part of the voice input. Limiting the search space allows enhanced voice recognition of the combination compared to performing voice recognition on the unlimited search space. [0008]
  • Performing voice recognition on the first part may produce a recognized string, and the recognized string may be associated with a set of recognizable utterances from the search space. Limiting the search space may include limiting the search space to a set of recognizable utterances. Voice recognition on the first part may be performed in parallel with voice recognition on the combination, such that the search space is not limited until after voice recognition on the combination has begun. Voice recognition on the first part may be performed before voice recognition on the combination, such that the search space is limited before voice recognition on the combination has begun. Performing voice recognition on the first part of the voice input may include comparing the first part to a set of high-occurrence patterns in the search space, followed by comparing the first part to a set of low-occurrence patterns in the search space. [0009]
  • Performing voice recognition on the first part of the voice input may include using a second search space. Voice recognition may be performed on the second part of the voice input. The second search space may be limited based on a result from performing voice recognition on the second part of the voice input. Limiting the search space may also be based on the result from performing voice recognition on the second part of the voice input. [0010]
  • Accessing circuitry may be used to access a voice input including at least a first part and a second part. Recognition circuitry may be used to perform voice recognition on the first part of the voice input and on the combination of the first part and the second part, wherein voice recognition may be performed on the combination using a search space. A recognition engine may be used and may include the recognition circuitry. Limiting circuitry may be used to limit the search space based on a result from performing voice recognition on the first part of the voice input. Limiting the search space may allow enhanced voice recognition of the voice input compared to performing voice recognition on the unlimited search space. [0011]
  • One or more of the accessing circuitry, the recognition circuitry, and the limiting circuitry may include a memory with instructions for performing one or more of the operations of accessing the voice input, performing voice recognition, and limiting the search space based on the result from performing voice recognition on the first part of the voice input. One or more of the accessing circuitry, the recognition circuitry, and the limiting circuitry may include a processor to perform one or more of the operations of accessing the voice input, performing voice recognition, and limiting the search space based on the result from performing voice recognition on the first part of the voice input. The circuitry may be used to perform one of the other features described for this or another aspect. [0012]
  • According to another general aspect, accepting input from a user includes providing a first set of options to a user, the first set of options relating to a first parameter of a search string, and being provided to the user in a page. A first input is accepted from the user, the first input being selected from the first set of options. A second set of options is limited based on the accepted first input, the second set of options relating to a second parameter of the search string. The second set of options is provided to the user in the page, such that the user is presented with a single page that provides the first set of options and the second set of options. [0013]
  • Accepting the first input from the user may include receiving an auditory input and performing voice recognition. Performing voice recognition on the first input in isolation may allow enhanced voice recognition compared to performing voice recognition on the search string. Accepting the first input from the user may include receiving a digital input. [0014]
  • A second input may be accepted from the user, the second input being selected from the second set of options. Accepting the first input may include receiving the first input auditorily from the user. Voice recognition may be performed on the first input in isolation. Performing voice recognition on the first input in isolation may allow enhanced voice recognition compared to performing voice recognition on the search string. Providing the second set of options may include searching a set of data items for the first input and including in the second set of options references only to those data items that include the first input. Accepting the second input may include receiving the second input auditorily from the user. Voice recognition may be performed on the second input in isolation. Performing voice recognition on the second input in isolation may allow enhanced voice recognition compared to performing voice recognition on the search string. [0015]
  • A third set of options may be provided to the user, and the third set of options may relate to a third parameter of the search string and be provided to the user in the page. A third input may be accepted from the user, and the third input may be selected from the third set of options. The second set of options provided to the user may also be based on the accepted third input. The second set of options provided to the user may be modified based on the accepted third input. [0016]
  • Providing the second set of options may include searching a set of data for the first input and providing only data items from the set of data that include the first input. The first input may include a manufacturer designation that identifies a manufacturer. Providing the second set of options may be limited to providing only data items manufactured by the identified manufacturer. [0017]
  • Circuitry may be used (i) to provide a first set of options to a user, the first set of options relating to a first parameter of a search string, and being provided to the user in a page, (ii) to accept a first input from the user, the first input being selected from the first set of options, (iii) to limit a second set of options based on the accepted first input, the second set of options relating to a second parameter of the search string, and/or (iv) to provide the second set of options to the user in the page, such that the user is presented with a single page that provides the first set of options and the second set of options. The circuitry may include a memory having instructions stored thereon that when executed by a machine result in at least one of the enumerated operations being performed. The circuitry may include a processor operable to perform at least one of the enumerated operations. The circuitry may be used to perform one of the other features described for this or another aspect. [0018]
  • According to another general aspect, receiving items of an address from a user includes providing the user a first set of options for a first item of an address, receiving from the user the first address item taken from the first set of options, limiting a second set of options for a second item of the address based on the received first item, providing the user the limited second set of options for the second address item, and receiving the second address item. [0019]
  • Receiving the first address item may include receiving the first address item auditorily. Recognition may be performed on the received first address item. Performing voice recognition on the first address item in isolation may allow enhanced voice recognition compared to performing voice recognition on the address. Receiving the second address item may include receiving the second address item auditorily. Recognition may be performed on the received second address item. Performing voice recognition on the second address item in isolation may allow enhanced voice recognition compared to performing voice recognition on a combination of the first address item and the second address item or on the address. [0020]
  • The first address item may include a state identifier. The second address item may include a city identifier identifying a city. The user may be provided a third list of options for a zip code identifier. The third list of options may exclude a zip code not in the identified city. The zip code identifier may be received auditorily from the user. The user may select the zip code identifier from the third list of options. The zip code identifier may identify a zip code. Voice recognition may be performed on the auditorily received zip code identifier. Excluding a zip code in the third list of options may allow enhanced voice recognition compared to not excluding a zip code. The user may be provided a fourth list of options for a street address identifier. The fourth list of options may exclude a street not in the identified zip code. The street address identifier may be received auditorily from the user. The user may select the street address identifier from the fourth list of options. The street address identifier may identify a street address. Voice recognition may be performed on the auditorily received street address identifier. Exclusion of a street in the fourth list of options may allow enhanced voice recognition compared to not excluding a street. [0021]
  • Providing the user the first list of options may include providing the first list on a display. Providing the user the second list of options may include providing the second list auditorily. [0022]
  • Circuitry may be used (i) to provide the user a first set of options for a first item of an address, (ii) to receive from the user the first address item taken from the first set of options, (iii) to limit a second set of options for a second item of the address based on the received first item, (iv) to provide the user the limited second set of options for the second address item, and/or (v) to receive the second address item. The circuitry may include a memory having instructions stored thereon that when executed by a machine result in at least one of the enumerated operations being performed. The circuitry may include a processor operable to perform at least one of the enumerated operations. The circuitry may be used to perform one of the other features described for this or another aspect. [0023]
  • According to another general aspect, receiving an Internet address from a user includes prompting a user for a first portion of an Internet address. The first portion of the Internet address is received auditorily from the user. Voice recognition is performed on the received first portion. Performing voice recognition on only the first portion of the Internet address allows enhanced recognition compared to performing voice recognition on more than the first portion of the Internet address. The user is prompted for a second portion of the Internet address. The second portion of the Internet address is received auditorily from the user. Voice recognition is performed on the received second portion. Performing voice recognition on only the second portion of the Internet address allows enhanced recognition compared to performing voice recognition on more than the second portion of the Internet address. [0024]
  • The Internet address may include an electronic mail address. The first portion may include a domain identifier of an electronic mail address. The second portion may include a server identifier of an electronic mail address. The user may be prompted for a user identifier portion of an electronic mail address. A user identifier portion may be received auditorily from the user. Voice recognition may be performed on a received user identifier portion. Performing voice recognition on only a user identifier portion may allow enhanced recognition compared to performing voice recognition on more than the user identifier portion of an electronic mail address. [0025]
  • Performing voice recognition on a domain identifier may include using a domain vocabulary including common three-letter domain identifiers, which may allow enhanced recognition. Performing voice recognition on a server identifier may include using a server vocabulary including common server identifiers, which may allow enhanced recognition. Performing voice recognition on a user identifier may include using a user vocabulary including common user identifiers, which may allow enhanced recognition. The server vocabulary may be based on a domain identifier. [0026]
  • The Internet address may include a web site address. The first portion may include a domain identifier of the web site address. The second portion may include a server identifier of the web site address. The user may be prompted for a network identifier portion of the web site address. The network identifier portion may be received auditorily from the user. Voice recognition may be performed on the received network identifier portion. Performing voice recognition on only the network identifier portion may allow enhanced recognition compared to performing voice recognition on more than the network identifier portion of the web site address. [0027]
  • Circuitry may be used (i) to prompt a user for a first portion of an Internet address, (ii) to receive auditorily from the user the first portion of the Internet address, (iii) to perform voice recognition on the received first portion, wherein performing voice recognition on only the first portion of the Internet address allows enhanced recognition compared to performing voice recognition on more than the first portion of the Internet address, (iv) to prompt the user for a second portion of the Internet address, (v) to receive auditorily from the user the second portion of the Internet address; and/or (vi) to perform voice recognition on the received second portion, wherein performing voice recognition on only the second portion of the Internet address allows enhanced recognition compared to performing voice recognition on more than the second portion of the Internet address. The circuitry may include a memory having instructions stored thereon that when executed by a machine result in at least one of the enumerated operations being performed. The circuitry may include a processor operable to perform at least one of the enumerated operations. The circuitry may be used to perform one of the other features described for this or another aspect. [0028]
  • The details of one or more implementations are set forth in the accompanying drawings and the description below. Other features will be apparent from the description and the drawings, and from the claims.[0029]
  • DESCRIPTION OF DRAWINGS
  • FIG. 1 is a flow chart of a process for recognizing a search string using a multi-cluster approach. [0030]
  • FIG. 2 is a diagrammatic flow chart depicting the process of FIG. 1. [0031]
  • FIG. 3 is a flow chart of a process for performing a search for a search string using a multi-level, multi-parameter cascade approach. [0032]
  • FIG. 4 is a picture of a page for implementing the process of FIG. 3. [0033]
  • FIG. 5 is a flow chart of a process for recognizing an address. [0034]
  • FIG. 6 is a block diagram of a pop-up wizard for entering address information. [0035]
  • FIG. 7 is a block diagram of a format for entering an electronic mail address. [0036]
  • FIG. 8 is a block diagram of a format for entering a web site address. [0037]
  • FIG. 9 is a flow chart of a process for searching for one or more matches to a search string. [0038]
  • FIG. 10 is a block diagram of a system for performing one or more of the described processes.[0039]
  • DETAILED DESCRIPTION
  • Various implementations include a user interface that provides a user with access to data. These user interfaces may be designed to accept various modes of input and to deliver various modes of output. Examples of input and output modes include manual, visual (for example, display or print), auditory (for example, voice or alarms), haptic, pressure, temperature, and smell. Manual modes may include, for example, keyboard, stylus, keypad, button, mouse, touch (for example, touch screen), and other hand inputs. Certain implementations are particularly suited for mobile applications, for which stylus or voice input is preferred, and for which output is presented visually on the screen and/or auditorily with text-to-speech or recorded human speech. [0040]
  • Various implementations also make use of structured grammars for voice recognition. The structured grammars may allow for quicker recognition, for quicker searching for an item in a corresponding database, and/or for enhanced voice recognition due to the decreased likelihood of misrecognizing a voice input. [0041]
• Referring to FIG. 1, a process 100 for recognizing a search string using a multi-cluster approach includes entering a search string using a voice input (110). The search string may represent, for example, an item in a database that a user wants to find. For example, the user may enter “Sony laptop superslim 505Z” into a voice recognition engine of a computer database to pull up information on that (hypothetical) computer model. As explained below, the grammar is structured around the database entries, including the actual database entries or keywords, and possibly also additional category descriptions and other vocabulary entries. [0042]
• The process 100 includes parsing the entered search string into at least one component in addition to the full search string (120). The full search string is also referred to as a component. A component may be a word or other recognized symbol, or a group of words or symbols. The search string may be parsed into all of its components, or a single component may be parsed out. Parsing may be performed by recognizing silence between words, symbols, or other components, and the voice entry system may require such silence. Parsing may also be performed on voice inputs entered in a more natural delivery, without obvious pauses between components. [0043]
• The process 100 includes performing voice recognition on at least two components (130). The parsing (120) may be performed simultaneously with the voice recognition (130). For example, as the search string is processed from left to right, a component may be recognized (130) and, upon recognition, parsed (120). One of the two components may be the full search string. [0044]
• The process 100 includes determining a resulting solution space in the grammar for at least one of the voice recognition operations (140). The solution space represents possible matches for the full search string. For example, the first component may be the first word of the search string, for example, “Sony,” and may correspond to a cluster in the speech recognition grammar. This cluster defined by “Sony” may contain, perhaps, only one hundred entries out of tens of thousands of entries in the grammar (and the corresponding database). Those one hundred entries would form the solution space for the component “Sony.” [0045]
• The process 100 includes modifying the search space for the voice recognition operation (130) of at least one of the components using the solution space determined in operation 140 (150). Continuing with the example from above, if the full search string is “Sony laptop superslim 505Z,” then the search space being used to perform the voice recognition on the full string can be narrowed to include only the one hundred grammar entries that include the component “Sony.” [0046]
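• The narrowing can be pictured with a minimal sketch. The grammar contents and the helper names (solution_space, recognize_against) below are invented for illustration and stand in for a real recognizer; they are not part of any implementation described here.

```python
# Illustrative sketch of multi-cluster narrowing; all names and data are
# invented for this example.

GRAMMAR = {
    "sony laptop superslim 505z",
    "sony laptop superslim 505x",
    "sony desktop vaio rx500",
    "dell laptop inspiron 8100",
    "ibm laptop thinkpad t23",
}  # stands in for tens of thousands of entries

def solution_space(component: str, grammar: set) -> set:
    """Cluster of grammar entries that contain the recognized component."""
    return {entry for entry in grammar if component in entry.split()}

def recognize_against(utterance: str, search_space: set) -> str:
    """Stand-in for a recognizer: match an utterance within a search space."""
    matches = [entry for entry in search_space if entry == utterance]
    return matches[0] if matches else None

# Recognize the first component, then limit the full-string search space
# to the cluster that component defines (operations 130-150).
first = "sony"                              # recognized first component
limited = solution_space(first, GRAMMAR)    # 3 candidate entries, not 5
print(recognize_against("sony laptop superslim 505z", limited))
```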
• By narrowing the search space, one or more advantages may be realized in particular implementations. For example, narrowing the search space may reduce the complexity of the searched grammar and the size of the searched vocabulary, which may enhance recognition accuracy. Further, the speed of the recognition process may be increased. [0047]
• In one implementation, both recognition processes (130) are performed at least partially in parallel, and recognizing the smaller component, such as “Sony,” is faster than recognizing the entire search string. As a result, the recognition process for the full search string is started on the entire search space of grammar entries and is narrowed after the resulting solution space for the smaller component is determined in operation 140. Other implementations perform the voice recognition processes serially. For example, one implementation performs voice recognition on a smaller component, and afterwards performs voice recognition for a larger component using the smaller component's solution space as the search space for the larger component. [0048]
• The process 100 includes determining a list of one or more matches for the full search string (160). Voice recognition algorithms often return confidence scores associated with the results. These confidence scores can be used, for example, to rank-order the results, and a selected number of the highest-scoring results can be returned to the user. [0049]
• The list of matches might not contain any good matches. Various implementations may use a threshold confidence score to determine whether a good match has been found. If a good match has not been found, then a variety of options are available. For example, (i) the user may be prompted for more information, (ii) the search string may be modified automatically, if it has not already been, by, for example, using synonyms of recognized components, transposing components, etc., or (iii) the user may be presented with information on the size of the solution space for each component, along with the confidence scores, which may reveal a component that the system had a difficult time recognizing. [0050]
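• A minimal sketch of the ranking and threshold step follows, assuming the recognizer returns (entry, confidence) pairs; the threshold value, function name, and fallback behavior are assumptions for illustration only.

```python
CONFIDENCE_THRESHOLD = 0.6  # assumed cutoff for a "good" match

def rank_matches(hypotheses, max_results=5):
    """hypotheses: (entry, confidence) pairs from the recognizer."""
    ranked = sorted(hypotheses, key=lambda pair: pair[1], reverse=True)
    good = [pair for pair in ranked if pair[1] >= CONFIDENCE_THRESHOLD]
    # An empty result signals that no good match was found, so the caller
    # can prompt for more information or modify the search string.
    return good[:max_results]

print(rank_matches([("sony laptop superslim 505z", 0.87),
                    ("sony laptop superslim 505x", 0.41)]))
```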
• Referring to FIG. 2, a diagrammatic flow chart 200 depicting the process 100 includes a search string 210. The search string 210 includes a first component 220 and a second component 230. The search string 210 may be, for example, a voice segment. The search string 210 is parsed using a parse process 240 into the first and second components 220, 230. A voice recognition process 250 is performed on each component 210, 220, 230, in parallel, using a search space 260. The parse process 240 and the voice recognition process 250 may be implemented using, for example, a processor or other computing device or combination of devices. [0051]
• Voice recognition of the first component 220 results in a first solution space 270. Assuming that voice recognition of the first component 220 finishes before voice recognition of the second component 230 and of the full string 210, then each of the latter voice recognition operations can be restricted to the first solution space 270. [0052]
• Voice recognition of the second component 230 results in a second solution space 280. Assuming that voice recognition of the second component 230 finishes before voice recognition of the full string 210, then voice recognition of the full string 210 can be restricted to an overlap 290 of the first solution space 270 and the second solution space 280. Voice recognition of the full string 210 results in a third solution space 295. [0053]
  • The time required for performing voice recognition on a small component can be decreased by structuring the grammar so that common components of the database entries (which are included in the grammar) are compared with the components of the search string before other components of the database entries (which are also included in the grammar). Further, common components may be entered as separate vocabulary entries in a grammar, even though those components do not constitute complete database entries. For example, the word “Sony” may be entered into the vocabulary even though it does not refer to an individual product (database entry). The component “Sony” can then be associated with all of the grammar entries that include the word “Sony” and that correspond to complete database entries. The same can be done for the individual word “laptop,” as well as the two-word component “Sony laptop,” for example. Such a structure may allow for relatively quick recognition of the component “Sony laptop” and a corresponding narrowing of the search space for the recognition of the full search string “Sony laptop superslim 505Z.”[0054]
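• The structure described above can be approximated with an inverted index from components to complete entries, so that a recognized component yields its cluster directly. The helper below, including its restriction to one- and two-word components, is an invented illustration, not the patented grammar itself.

```python
from collections import defaultdict

def build_component_index(entries):
    """Map one- and two-word components to the complete entries that
    contain them."""
    index = defaultdict(set)
    for entry in entries:
        words = entry.split()
        for i, word in enumerate(words):
            index[word].add(entry)                          # e.g. "sony"
            if i + 1 < len(words):
                index[" ".join(words[i:i + 2])].add(entry)  # e.g. "sony laptop"
    return index

index = build_component_index(["sony laptop superslim 505z",
                               "sony laptop superslim 505x",
                               "dell laptop inspiron 8100"])
print(sorted(index["sony laptop"]))  # both Sony laptop entries
```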
• Note that the list of matches determined in the process 100 (160) may return matches that correspond to actual database entries that match the entered search string. Accordingly, in such implementations, the voice recognition process may effectively perform the database search simultaneously. For example, each of the listed matches may serve as an index into the database for easy retrieval of the corresponding database entry. [0055]
• Other implementations, however, may allow search strings to include components that are not part of the database. For example, a user may be allowed to enter a price range for a computer. In such an example, the grammar could include, and be able to recognize, price ranges entered in a determined format. The grammar may be structured in a variety of ways to support recognizing such search strings. For example, if a user enters only a price range, the voice recognition engine may recognize the search string and associate it with a set of database entries satisfying the price range. Alternatively, the voice recognition engine may query the user for more data by, for example, returning a list of manufacturers having computers (computers being the assumed content of the database ultimately being searched) in that price range. If the user enters additional information, such as, for example, a manufacturer, the voice recognition system may use that additional information to narrow the solution space. If the user enters sufficient information, the grammar may be structured to allow the voice recognition system to determine, for the various price ranges that are recognizable, the grammar entries for all actual products (corresponding to actual database entries) that satisfy the entered price range and the other components of the search string. These entries may then be presented to the user. [0056]
• The process 100 can also be applied to systems that do not use voice input. For example, other modes of input may require a recognition process that could be performed in a manner analogous to that already described. [0057]
• Referring to FIG. 3, a process 300 for performing a search for a search string using a multi-level, multi-parameter cascade approach includes providing a first set of options for a first parameter (310). For example, a user interface to a database of computers may provide a list of manufacturers as the first set of options, with the first parameter being the manufacturer. The first set of options may be provided, for example, on a display, or through a voice response system. [0058]
• The process 300 includes entering a first parameter selected from the first set of options (320). Continuing the example from above, a user may select, and enter, a manufacturer from the list provided in operation 310. The user may enter the first parameter by using, for example, a stylus, keyboard, touch screen, or voice input. [0059]
• The process 300 includes providing a second set of options for a second parameter based on the first parameter (330). Continuing the example from above, a user interface may provide a list of product types, including, for example, desktops, laptops, and palmtops, that are available from the manufacturer entered in operation 320. [0060]
• The process 300 includes entering a second parameter selected from the second set of options (340). Continuing the example from above, a user may select, and enter, a product type from the list provided in operation 330. [0061]
• The process 300 includes providing a list of matches, based on the first and second parameters (350). Continuing the example from above, the list of matches may include all computers in the database that are manufactured by the entered manufacturer and that are of the entered product type. For example, the list of matches may include all Sony laptops. [0062]
• The process 300 may be used, for example, instead of having a user enter a one-time, full search phrase. The process 300 presents a set of structured searches or selections from, for example, drop-down lists. The first and second parameters can be considered to be parts of a search string, with the cumulative search string producing the list of matches provided in operation 350. The database may be structured to allow for efficient searches based on the parameters provided in operations 310 and 330. Additionally, in voice input applications, structuring the data entry may simplify the grammar and vocabulary for each parameter, thus potentially increasing recognition accuracy and speed. [0063]
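• One way to picture the cascade is a filter over a catalog, where each selection limits the options for the next parameter. The catalog rows and the options() helper below are assumptions invented for illustration.

```python
CATALOG = [
    {"brand": "Sony", "product": "laptop",  "model": "Superslim 505Z"},
    {"brand": "Sony", "product": "desktop", "model": "Vaio RX500"},
    {"brand": "Dell", "product": "laptop",  "model": "Inspiron 8100"},
]

def options(rows, parameter, **selections):
    """Values available for `parameter` given the selections made so far."""
    matching = [row for row in rows
                if all(row[key] == value for key, value in selections.items())]
    return sorted({row[parameter] for row in matching})

print(options(CATALOG, "brand", product="laptop"))   # ['Dell', 'Sony']
print(options(CATALOG, "model", product="laptop", brand="Sony"))
```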
• Implementations may present multiple parameters and sets of options, and these may be organized into levels. In the process 300, one parameter was used at each of two levels. However, multiple parameters may be presented at a first level, for example, with the entries at that level determining the list of options presented for additional parameters at a second level, and with all entries determining the list of matches. Such parameters may include, for example, manufacturer, brand, product type, price range, and a variety of features of the products in the product type. Examples of features for computers include processor speed, amount of random access memory, storage capacity of a hard disk, video card speed and memory, and service contract options. [0064]
• Referring to FIG. 4, a picture of a page 400 for implementing the process 300 includes a first level 410 and a second level 420. The first level 410 provides a first parameter 430 for the product, with a corresponding pull-down menu 440 that includes a set of options. The set of options in pull-down menu 440 may include, for example, desktop, laptop, and palmtop. The second level 420 provides a second parameter 450 for the brand, with a corresponding pull-down menu 460 that includes a set of options. The options in pull-down menu 460 are all assumed to satisfy the product parameter entered by the user in pull-down menu 440 and may include, for example, Sony, HP/Compaq, Dell, and IBM. Assuming that “laptop” was selected in the pull-down menu 440, the pull-down menu 460 would only include brands (manufacturers) that sell laptops. [0065]
• The page 400 also includes a category 470 for models that match the parameters entered in the first and second levels 410 and 420. The matching models are viewable using a pull-down menu 480. As the page 400 indicates, all of the search string information, as well as the results, may be presented in a single page. The page 400 is also presentable in a single screen shot, but other single-page implementations may use, for example, a web page that spans multiple screen lengths and requires scrolling to view all of the information. [0066]
• Referring to FIG. 5, a process 500 for recognizing an address includes determining a list of options for a first part of an address (510). The address may be, for example, a street address or an Internet address, where Internet addresses include, for example, electronic mail addresses and web site addresses. If the address is a street address, the first part may be, for example, a state identifier. [0067]
• The process 500 includes prompting a user for the first part of the address (520). The prompt may, for example, simply include a request to enter information, or it may include a list of options. The process 500 includes receiving the first part of the address (530). If the first part is received auditorily, the process 500 includes performing voice recognition on the first part of the address (540). [0068]
• The process 500 includes determining a list of options for a second part of the address based on the received first part (550). Continuing the example from above, the second part may be, for example, a city identifier, and the list of options may include, for example, only those cities that are in the state identified by the received state identifier. By inverting the usual order of state and city in entering street addresses, a voice recognition system can simplify the relevant grammar and vocabulary for the city identifier, thus facilitating enhanced voice recognition accuracy and speed. [0069]
• The process 500 includes prompting the user for the second part of the address (560). Again, the prompt need not include the list of options. The process 500 includes receiving the second part of the address (570). If the second part is received auditorily, the process 500 includes performing voice recognition on the second part of the address (580). [0070]
• The process 500 could continue with subsequent determinations of lists of options for further parts of the address. Continuing the example from above, a list of options for a zip code could be determined based on the city identified by the received city identifier. Such a list could be determined from the available zip codes in the identified city. City streets in the city or the zip code could also be determined. Further, country information could be obtained before obtaining state information. [0071]
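• The broad-to-narrow ordering can be sketched as nested lookups in which each received item limits the option list, and thus the recognition vocabulary, for the next prompt. The geographic data and helper names below are invented for illustration.

```python
ADDRESS_DATA = {                      # state -> city -> zip -> streets
    "California": {
        "San Jose": {"95110": ["First St", "Market St"],
                     "95112": ["Santa Clara St"]},
        "Los Angeles": {"90001": ["Main St"]},
    },
    "New York": {"New York": {"10001": ["Eighth Ave"]}},
}

def city_options(state):
    return sorted(ADDRESS_DATA[state])

def zip_options(state, city):
    return sorted(ADDRESS_DATA[state][city])

def street_options(state, city, zip_code):
    return sorted(ADDRESS_DATA[state][city][zip_code])

print(city_options("California"))              # only Californian cities
print(zip_options("California", "San Jose"))   # only San Jose zip codes
```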
• As the above example and the process 500 indicate, the range of possibilities for each subsequent piece of address information can be narrowed by entering the data in an order that is the reverse of ordinary practice, that is, by entering data from geographically broad categories to geographically narrow categories. If multiple countries are involved, the impact of using the reverse order may be even greater, because standard designations for streets vary across different languages. [0072]
• The process 500 may prompt the user in a number of ways. For example, the user may be prompted to enter address information in a particular order, allowing a system to process the address information as it is entered and to prepare the lists of options. Entry fields for country, state or province, city, zip or postal code, street, etc., for example, may be presented top-down on a screen or sequentially presented in speech output. [0073]
• Referring to FIG. 6, there is shown another way to prompt the user in the process 500. A system may use a pop-up wizard 600 on the screen of a device to ask the user to enter specific address information. Further, a system may preserve the normative order of address information but use visual cues, for example, to prompt the user to enter the information in a particular order. Visual cues may include, for example, highlighting or coloring the border or the title of an entry field. [0074]
• The process 500 may be applied to data entered using a voice mode or another mode. After the data is entered at each prompt, and after it is recognized if voice input is used, a database of addresses may be searched to determine the list of options for the next address field. Such systems allow database searching on an ongoing basis, instead of waiting until all address information is entered. Such systems also allow for guided entry using pull-down menus and, with or without guided entry, for alerting a user at the time of entry if an invalid entry is made for a particular part of an address. [0075]
• The process 500 may also be applied to other addresses, in addition to street addresses or parts thereof. For example, the process 500 may be applied to Internet addresses, including, for example, electronic mail addresses and web site addresses. [0076]
• Referring to FIG. 7, a format 700 for entering an electronic mail address includes a user identifier 710, a server identifier 720, and a domain identifier 730. The “at sign” separating the user identifier 710 and the server identifier 720, and the “dot” separating the server identifier 720 and the domain identifier 730, may be implicit and inserted automatically, that is, without human intervention. [0077]
• In one implementation, the domain identifier 730 is entered first due to the small number of options available for this field. A list of options for the server identifier 720 can be generated based on the entered domain. For example, if “com” is entered for the domain, then a list of options for the server identifier 720 may include, for example, all “Fortune 100” companies and the twenty-five most frequently visited commercial web sites. Similar lists may be generated for “gov,” “net,” and other domain identifiers 730. A list of options for the user identifier 710 may include, for example, common last names and first names and other conventions, such as, for example, a first initial followed by a last name. [0078]
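• A sketch of the domain-first idea follows: the server vocabulary is keyed on the already-entered domain, as described above, while the specific vocabulary contents and function names are invented assumptions.

```python
DOMAIN_VOCABULARY = ["com", "net", "org", "gov", "edu"]

SERVER_VOCABULARY = {                # keyed on the already-entered domain
    "com": ["sap", "ibm", "sony", "amazon"],
    "gov": ["nasa", "irs"],
    "net": ["earthlink", "att"],
}

USER_VOCABULARY = ["smith", "jsmith", "jones", "mjones"]  # common patterns

def server_options(domain):
    """Server vocabulary limited by the recognized domain identifier."""
    return SERVER_VOCABULARY.get(domain, [])

def assemble_email(user, server, domain):
    """The separators are inserted implicitly, without user input."""
    return "{}@{}.{}".format(user, server, domain)

print(server_options("com"))
print(assemble_email("jsmith", "sap", "com"))  # jsmith@sap.com
```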
• Referring to FIG. 8, a format 800 for entering a web site address includes a network identifier 810, a server identifier 820, and a domain identifier 830. The two “dots” separating the three identifiers 810, 820, 830 may be implicit and inserted automatically. The network identifier 810 may be selected from, for example, “www,” “www1,” “www2,” etc. [0079]
• Referring to FIG. 9, a process 900 for searching for one or more matches to a search string includes accessing at least a first part of a search string (910). Such accessing may include, for example, receiving a voice input, a stylus input, or a menu selection, and the first part may include the entire search string. [0080]
• The process 900 includes searching a first search space for a match for the first part of the search string (920). The first search space may include, for example, a search space in a grammar of a voice recognition engine, a search space in a database, or a search space in a list of options presented to a user in a pull-down menu. Searching may include, for example, comparing text entries, voice waveforms, or codes representing entries in a codebook of vector-quantized waveforms. [0081]
• The process 900 includes limiting a second search space based on a result of searching the first search space (930). The second search space may, for example, be similar to or the same as the first search space. Limiting may include, for example, paring down the possible grammar or vocabulary entries that could be examined, paring down the possible database entries that could be examined, or paring down the number of options that could be displayed or made available for a parameter of the search string. Paring down the possibilities or options may be done, for example, so as to exclude possibilities or options that do not satisfy the first part of the search string. [0082]
• The process 900 includes accessing at least a second part of the search string (940) and searching the limited second search space for a match for the second part of the search string (950). Accessing the second part of the search string may include, for example, receiving a voice input, a stylus input, or a menu selection, and the second part may include the entire search string. Searching the limited second search space may be performed, for example, in the same or a similar way as searching the first search space. As suggested by the discussion in this paragraph and the preceding paragraphs, the process 900 is intended to cover all of the disclosed processes. [0083]
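• Because the process 900 generalizes the earlier examples beyond voice input, a mode-independent sketch is short. The substring test standing in for "searching" is an assumption made purely for illustration.

```python
def search(space, part):
    """Stand-in for any search: entries in `space` that match `part`."""
    return {entry for entry in space if part in entry}

def two_stage_search(space, first_part, second_part):
    limited = search(space, first_part)    # limit the second search space
    return search(limited, second_part)    # search only within the limit

entries = {"sony laptop superslim 505z", "dell laptop inspiron 8100"}
print(two_stage_search(entries, "sony", "505z"))
```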
• Referring to FIG. 10, a system 1000 for implementing one or more of the above processes includes a computing device 1010, a first memory 1020 located internal to the computing device 1010, a second memory 1030 located external to the computing device 1010, and a recognition engine 1040 located external to the computing device 1010. The computing device 1010 may be, for example, a desktop, laptop, palmtop, or other type of electronic device capable of performing one or more of the processes described. The computing device 1010 may include circuitry, such as, for example, a processor, a controller, a programmed logic device, and a memory having instructions stored thereon. The circuitry may include, for example, analog and/or digital circuitry. The first and second memories 1020, 1030 may be, for example, permanent or temporary memory capable of storing data or instructions at least temporarily. The recognition engine 1040 may be a voice recognition engine or a recognition engine for another mode of input. The second memory 1030 and the recognition engine 1040 are shown as being external to, and optionally connected to, the computing device 1010. However, the second memory 1030 and the recognition engine 1040 may also be integrated into the computing device 1010 or be omitted from the system 1000. [0084]
  • A number of implementations have been described. Nevertheless, it will be understood that various modifications may be made. For example, the operations of the disclosed processes need not necessarily be performed in the order(s) indicated. Accordingly, other implementations are within the scope of the following claims. [0085]

Claims (47)

What is claimed is:
1. A method of performing voice recognition, the method comprising:
accessing a voice input including at least a first part and a second part;
performing voice recognition on the first part of the voice input;
performing voice recognition on a combination of the first part and the second part using a search space; and
limiting the search space based on a result from performing voice recognition on the first part of the voice input, wherein limiting the search space allows enhanced voice recognition of the combination compared to performing voice recognition on the unlimited search space.
2. The method of claim 1 wherein:
performing voice recognition on the first part produces a recognized string, and the recognized string is associated with a set of recognizable utterances from the search space; and
limiting the search space comprises limiting the search space to the set of recognizable utterances.
3. The method of claim 1 wherein voice recognition on the first part is performed in parallel with voice recognition on the combination, such that the search space is not limited until after voice recognition on the combination has begun.
4. The method of claim 1 wherein voice recognition on the first part is performed before voice recognition on the combination, such that the search space is limited before voice recognition on the combination has begun.
5. The method of claim 1 wherein performing voice recognition on the first part of the voice input comprises comparing the first part to a set of high-occurrence patterns in the search space, followed by comparing the first part to a set of low-occurrence patterns in the search space.
6. The method of claim 1 wherein performing voice recognition on the first part of the voice input comprises using a second search space, and the method further comprises:
performing voice recognition on the second part of the voice input; and
limiting the second search space based on a result from performing voice recognition on the second part of the voice input.
7. The method of claim 6 wherein limiting the search space is also based on the result from performing voice recognition on the second part of the voice input.
8. An apparatus for performing voice recognition, the apparatus comprising:
accessing circuitry to access a voice input including at least a first part and a second part;
recognition circuitry to perform voice recognition on the first part of the voice input and on the combination of the first part and the second part, wherein voice recognition is performed on the combination using a search space; and
limiting circuitry to limit the search space based on a result from performing voice recognition on the first part of the voice input, wherein limiting the search space allows enhanced voice recognition of the voice input compared to performing voice recognition on the unlimited search space.
9. The apparatus of claim 8 further comprising a recognition engine that includes the recognition circuitry.
10. The apparatus of claim 8 wherein one or more of the accessing circuitry, the recognition circuitry, and the limiting circuitry comprise a memory with instructions for performing one or more of the operations of accessing the voice input, performing voice recognition, and limiting the search space based on the result from performing voice recognition on the first part of the voice input.
11. The apparatus of claim 8 wherein one or more of the accessing circuitry, the recognition circuitry, and the limiting circuitry comprise a processor to perform one or more of the operations of accessing the voice input, performing voice recognition, and limiting the search space based on the result from performing voice recognition on the first part of the voice input.
12. The apparatus of claim 8 wherein:
the recognition circuitry is operable to perform voice recognition on the first part of the voice input using a second search space;
the recognition circuitry is operable to perform voice recognition on the second part of the voice input;
the limiting circuitry is operable to limit the second search space based, at least in part, on a result from performing voice recognition on the second part of the voice input; and
the limiting circuitry is operable to limit the search space based, at least in part, on a result from performing voice recognition on the second part of the voice input.
13. A method of accepting input from a user, the method comprising:
providing a first set of options to a user, the first set of options relating to a first parameter of a search string, and being provided to the user in a page;
accepting a first input from the user, the first input being selected from the first set of options;
limiting a second set of options based on the accepted first input, the second set of options relating to a second parameter of the search string; and
providing the second set of options to the user in the page, such that the user is presented with a single page that provides the first set of options and the second set of options.
14. The method of claim 13 wherein accepting the first input from the user comprises receiving an auditory input and performing voice recognition, wherein performing voice recognition on the first input in isolation allows enhanced voice recognition compared to performing voice recognition on the search string.
15. The method of claim 13 wherein accepting the first input from the user comprises receiving a digital input.
16. The method of claim 13 further comprising accepting a second input from the user, the second input being selected from the second set of options.
17. The method of claim 13 further comprising:
providing a third set of options to the user, the third set of options relating to a third parameter of the search string and being provided to the user in the page; and
accepting a third input from the user, the third input being selected from the third set of options,
wherein the second set of options provided to the user is also based on the accepted third input.
18. The method of claim 13 further comprising:
providing a third set of options to the user, the third set of options relating to a third parameter of the search string and being provided to the user in the page;
accepting a third input from the user, the third input being selected from the third set of options; and
modifying the second set of options provided to the user based on the accepted third input.
19. The method of claim 13 wherein providing the second set of options comprises searching a set of data for the first input and providing only data items from the set of data that include the first input.
20. The method of claim 19 wherein the first input comprises a manufacturer designation and only data items manufactured by the manufacturer identified by the manufacturer designation are provided in the second set of options.
21. The method of claim 16 wherein:
accepting the first input comprises receiving the first input auditorily from the user,
the method further comprises performing voice recognition on the first input in isolation, wherein performing voice recognition on the first input in isolation allows enhanced voice recognition compared to performing voice recognition on the search string,
providing the second set of options comprises searching a set of data items for the first input and including in the second set of options references only to those data items that include the first input,
accepting the second input comprises receiving the second input auditorily from the user, and
the method further comprises performing voice recognition on the second input in isolation, wherein performing voice recognition on the second input in isolation allows enhanced voice recognition compared to performing voice recognition on the search string.
22. An apparatus for accepting input from a user, the apparatus comprising circuitry operable to perform at least the following operations:
provide a first set of options to a user, the first set of options relating to a first parameter of a search string, and being provided to the user in a page;
accept a first input from the user, the first input being selected from the first set of options;
limit a second set of options based on the accepted first input, the second set of options relating to a second parameter of the search string; and
provide the second set of options to the user in the page, such that the user is presented with a single page that provides the first set of options and the second set of options.
23. The apparatus of claim 22 wherein the circuitry comprises a memory having instructions stored thereon that when executed by a machine result in at least one of the enumerated operations being performed.
24. The apparatus of claim 22 wherein the circuitry comprises a processor operable to perform at least one of the enumerated operations.
25. The apparatus of claim 22 wherein:
accepting the first input comprises receiving the first input auditorily from the user;
providing the second set of options comprises searching a set of data items for the first input and including in the second set of options references only to those data items that include the first input; and
the circuitry is further operable to perform at least the following operations:
perform voice recognition on the first input in isolation, wherein performing voice recognition on the first input in isolation allows enhanced voice recognition compared to performing voice recognition on the search string;
receive a second input auditorily from the user, the second input being selected from the second set of options; and
perform voice recognition on the second input in isolation, wherein performing voice recognition on the second input in isolation allows enhanced voice recognition compared to performing voice recognition on the search string.
26. A method of receiving items of an address from a user, the method comprising:
providing the user a first set of options for a first item of an address;
receiving from the user the first address item taken from the first set of options;
limiting a second set of options for a second item of the address based on the received first item;
providing the user the limited second set of options for the second address item; and
receiving the second address item.
27. The method of claim 26 wherein:
receiving the first address item comprises receiving the first address item auditorily, and
the method further comprises performing recognition on the received first address item, wherein performing voice recognition on the first address item in isolation allows enhanced voice recognition compared to performing voice recognition on the address.
28. The method of claim 27 wherein:
receiving the second address item comprises receiving the second address item auditorily, and
the method further comprises performing recognition on the received second address item, wherein performing voice recognition on the second address item in isolation allows enhanced voice recognition compared to performing voice recognition on a combination of the first address item and the second address item or on the address.
29. The method of claim 28 wherein:
the first address item comprises a state identifier,
the second address item comprises a city identifier identifying a city, and
the method further comprises:
providing the user a third list of options for a zip code identifier, wherein the third list of options excludes an excluded zip code not in the identified city;
receiving auditorily from the user the zip code identifier taken from the third list of options and identifying a zip code;
performing voice recognition on the auditorily received zip code identifier, wherein the exclusion of the excluded zip code in the third list of options allows enhanced voice recognition compared to not excluding the excluded zip code;
providing the user a fourth list of options for a street address identifier, wherein the fourth list of options excludes an excluded street not in the identified zip code;
receiving auditorily from the user the street address identifier taken from the fourth list of options and identifying a street address; and
performing voice recognition on the auditorily received street address identifier, wherein the exclusion of the excluded street in the fourth list of options allows enhanced voice recognition compared to not excluding the excluded street.
30. The method of claim 26 wherein providing the user the first list of options comprises providing the first list on a display.
31. The method of claim 26 wherein providing the user the second list of options comprises providing the second list auditorily.
32. An apparatus for receiving items of an address from a user, the apparatus comprising circuitry operable to perform at least the following operations:
provide the user a first set of options for a first item of an address;
receive from the user the first address item taken from the first set of options;
limit a second set of options for a second item of the address based on the received first item;
provide the user the limited second set of options for the second address item; and
receive the second address item.
33. The apparatus of claim 32 wherein the circuitry comprises a memory having instructions stored thereon that when executed by a machine result in at least one of the enumerated operations being performed.
34. The apparatus of claim 32 wherein the circuitry comprises a processor operable to perform at least one of the enumerated operations.
35. The apparatus of claim 32 wherein:
receiving the first address item comprises receiving the first address item auditorily;
receiving the second address item comprises receiving the second address item auditorily;
the circuitry is further operable to perform recognition on the received first address item, wherein performing voice recognition on the first address item in isolation allows enhanced voice recognition compared to performing voice recognition on the address;
the circuitry is further operable to perform recognition on the received second address item, wherein performing voice recognition on the second address item in isolation allows enhanced voice recognition compared to performing voice recognition on a combination of the first address item and the second address item or on the address;
the first address item comprises a state identifier;
the second address item comprises a city identifier identifying a city;
the circuitry is further operable to perform at least the following operations:
provide the user a third list of options for a zip code identifier, wherein the third list of options excludes some zip codes not in the identified city;
receive auditorily from the user the zip code identifier taken from the third list of options and identifying a zip code;
perform voice recognition on the auditorily received zip code identifier, wherein the exclusion of some zip codes in the third list of options allows enhanced voice recognition compared to not excluding some zip codes;
provide the user a fourth list of options for a street address identifier, wherein the fourth list of options excludes some streets not in the identified zip code;
receive auditorily from the user the street address identifier taken from the fourth list of options and identifying a street address; and
perform voice recognition on the auditorily received street address identifier,
wherein the exclusion of some streets in the fourth list of options allows enhanced voice recognition compared to not excluding some streets.
36. A method of receiving an Internet address from a user, the method comprising:
prompting a user for a first portion of an Internet address;
receiving auditorily from the user the first portion of the Internet address;
performing voice recognition on the received first portion, wherein performing voice recognition on only the first portion of the Internet address allows enhanced recognition compared to performing voice recognition on more than the first portion of the Internet address;
prompting the user for a second portion of the Internet address;
receiving auditorily from the user the second portion of the Internet address; and
performing voice recognition on the received second portion, wherein performing voice recognition on only the second portion of the Internet address allows enhanced recognition compared to performing voice recognition on more than the second portion of the Internet address.
37. The method of claim 36 wherein the Internet address comprises an electronic mail address.
38. The method of claim 37 wherein:
the first portion comprises a domain identifier of the electronic mail address,
the second portion comprises a server identifier of the electronic mail address, and
the method further comprises:
prompting the user for a user identifier portion of the electronic mail address;
receiving auditorily from the user the user identifier portion; and
performing voice recognition on the received user identifier portion, wherein performing voice recognition on only the user identifier portion allows enhanced recognition compared to performing voice recognition on more than the user identifier portion of the electronic mail address.
39. The method of claim 38 wherein:
performing voice recognition on the domain identifier comprises using a domain vocabulary including common three-letter domain identifiers, thereby allowing the enhanced recognition,
performing voice recognition on the server identifier comprises using a server vocabulary including common server identifiers, thereby allowing the enhanced recognition, and
performing voice recognition on the user identifier comprises using a user vocabulary including common user identifiers, thereby allowing the enhanced recognition.
40. The method of claim 39 wherein the server vocabulary is based on the domain identifier.
41. The method of claim 36 wherein the Internet address comprises a web site address.
42. The method of claim 41 wherein:
the first portion comprises a domain identifier of the web site address, and
the second portion comprises a server identifier of the web site address.
43. The method of claim 42 further comprising:
prompting the user for a network identifier portion of the web site address;
receiving auditorily from the user the network identifier portion; and
performing voice recognition on the received network identifier portion, wherein performing voice recognition on only the network identifier portion allows enhanced recognition compared to performing voice recognition on more than the network identifier portion of the web site address.
44. An apparatus for receiving an Internet address from a user, the apparatus comprising circuitry operable to perform at least the following operations:
prompt a user for a first portion of an Internet address;
receive auditorily from the user the first portion of the Internet address;
perform voice recognition on the received first portion, wherein performing voice recognition on only the first portion of the Internet address allows enhanced recognition compared to performing voice recognition on more than the first portion of the Internet address;
prompt the user for a second portion of the Internet address;
receive auditorily from the user the second portion of the Internet address; and
perform voice recognition on the received second portion, wherein performing voice recognition on only the second portion of the Internet address allows enhanced recognition compared to performing voice recognition on more than the second portion of the Internet address.
45. The apparatus of claim 44 wherein the circuitry comprises a memory having instructions stored thereon that when executed by a machine result in at least one of the enumerated operations being performed.
46. The apparatus of claim 44 wherein the circuitry comprises a processor operable to perform at least one of the enumerated operations.
47. The apparatus of claim 44 wherein:
the Internet address comprises an electronic mail address;
the first portion comprises a domain identifier of the electronic mail address;
performing voice recognition on the domain identifier comprises using a domain vocabulary including common three-letter domain identifiers, thereby allowing the enhanced recognition;
the second portion comprises a server identifier of the electronic mail address;
performing voice recognition on the server identifier comprises using a server vocabulary that is based on the domain identifier and includes common server identifiers, thereby allowing the enhanced recognition; and
the circuitry is further operable to perform at least the following operations:
prompt the user for a user identifier portion of the electronic mail address;
receive auditorily from the user the user identifier portion; and
perform voice recognition on the received user identifier portion, wherein performing voice recognition on only the user identifier portion allows enhanced recognition compared to performing voice recognition on more than the user identifier portion of the electronic mail address, and performing voice recognition on the user identifier comprises using a user vocabulary including common user identifiers, thereby allowing the enhanced recognition.
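
Illustrative sketch (an editorial addition, not part of the claims): the portion-wise entry method recited in claims 36-47 can be outlined in a few lines of Python. Everything below is a hypothetical stand-in: the vocabularies are invented, text typed at input() substitutes for the auditorily received utterance, and difflib.get_close_matches substitutes for a speech recognizer constrained to a per-portion grammar. What it does reflect is the claimed mechanism: each address portion is matched against a small, portion-specific vocabulary, and the server vocabulary is selected based on the already-recognized domain (claims 40 and 47).

```python
# Minimal sketch of portion-wise Internet-address entry (claims 36-47).
# All vocabularies and prompts are hypothetical; a real system would compile
# each vocabulary into the recognizer's active speech grammar.
from difflib import get_close_matches

DOMAIN_VOCAB = ["com", "net", "org", "edu", "gov"]   # common three-letter domains (claim 39)
SERVER_VOCAB_BY_DOMAIN = {                           # server vocabulary keyed by domain (claim 40)
    "com": ["yahoo", "hotmail", "sap"],
    "edu": ["stanford", "mit", "berkeley"],
}
USER_VOCAB = ["john", "mary", "info", "support"]     # common user identifiers (claim 39)
NETWORK_VOCAB = ["www", "mail", "ftp"]               # common network identifiers (claim 43)

def recognize(utterance: str, vocabulary: list[str]) -> str:
    """Stand-in for a grammar-constrained recognizer: returns the
    vocabulary entry closest to the (here, typed) utterance."""
    return get_close_matches(utterance.lower(), vocabulary, n=1, cutoff=0.0)[0]

def enter_email_address() -> str:
    """Prompt for, receive, and recognize each e-mail portion separately (claims 36-40)."""
    domain = recognize(input("Domain (e.g. 'com')? "), DOMAIN_VOCAB)
    # Select the server vocabulary based on the recognized domain (claim 40).
    server_vocab = SERVER_VOCAB_BY_DOMAIN.get(domain, ["mail", "web"])
    server = recognize(input("Server (e.g. 'yahoo')? "), server_vocab)
    user = recognize(input("User identifier? "), USER_VOCAB)
    return f"{user}@{server}.{domain}"

def enter_web_address() -> str:
    """Apply the same strategy to a web-site address (claims 41-43)."""
    domain = recognize(input("Domain (e.g. 'com')? "), DOMAIN_VOCAB)
    server = recognize(input("Server? "), SERVER_VOCAB_BY_DOMAIN.get(domain, ["mail", "web"]))
    network = recognize(input("Network identifier (e.g. 'www')? "), NETWORK_VOCAB)
    return f"{network}.{server}.{domain}"

if __name__ == "__main__":
    print("Recognized address:", enter_email_address())
```

Because each step searches only a handful of candidates rather than an open vocabulary, even this crude matcher seldom misfires; that narrowing is the "enhanced recognition" rationale the claims recite. Claims 44-47 cover the same operations embodied as stored instructions executed by a machine or as processor circuitry.
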
US10/184,069 2002-02-07 2002-06-28 User interface for data access and entry Abandoned US20030149564A1 (en)

Priority Applications (8)

Application Number Priority Date Filing Date Title
US10/184,069 US20030149564A1 (en) 2002-02-07 2002-06-28 User interface for data access and entry
US10/305,267 US7177814B2 (en) 2002-02-07 2002-11-27 Dynamic grammar for voice-enabled applications
US10/358,665 US7337405B2 (en) 2002-02-07 2003-02-06 Multi-modal synchronization
PCT/US2003/003752 WO2003067443A1 (en) 2002-02-07 2003-02-07 User interface and dynamic grammar in a multi-modal synchronization architecture
AU2003215100A AU2003215100A1 (en) 2002-02-07 2003-02-07 User interface and dynamic grammar in a multi-modal synchronization architecture
EP03710916.2A EP1481328B1 (en) 2002-02-07 2003-02-07 User interface and dynamic grammar in a multi-modal synchronization architecture
CN2009101621446A CN101621547B (en) 2002-02-07 2003-02-07 Method and device for receiving input or address stem from the user
US11/623,455 US20070179778A1 (en) 2002-02-07 2007-01-16 Dynamic Grammar for Voice-Enabled Applications

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US35432402P 2002-02-07 2002-02-07
US10/184,069 US20030149564A1 (en) 2002-02-07 2002-06-28 User interface for data access and entry

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US10/157,030 Continuation-In-Part US7359858B2 (en) 2002-02-07 2002-05-30 User interface for data access and entry

Related Child Applications (3)

Application Number Title Priority Date Filing Date
US10/305,267 Continuation-In-Part US7177814B2 (en) 2002-02-07 2002-11-27 Dynamic grammar for voice-enabled applications
US10/358,665 Continuation-In-Part US7337405B2 (en) 2002-02-07 2003-02-06 Multi-modal synchronization
US11/623,455 Continuation-In-Part US20070179778A1 (en) 2002-02-07 2007-01-16 Dynamic Grammar for Voice-Enabled Applications

Publications (1)

Publication Number Publication Date
US20030149564A1 (en) 2003-08-07

Family

ID=38252238

Family Applications (2)

Application Number Title Priority Date Filing Date
US10/131,216 Expired - Lifetime US7203907B2 (en) 2002-02-07 2002-04-25 Multi-modal synchronization
US10/184,069 Abandoned US20030149564A1 (en) 2002-02-07 2002-06-28 User interface for data access and entry

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US10/131,216 Expired - Lifetime US7203907B2 (en) 2002-02-07 2002-04-25 Multi-modal synchronization

Country Status (3)

Country Link
US (2) US7203907B2 (en)
CN (2) CN100578474C (en)
DE (1) DE60319962T2 (en)

Families Citing this family (105)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7286651B1 (en) * 2002-02-12 2007-10-23 Sprint Spectrum L.P. Method and system for multi-modal interaction
US8590013B2 (en) 2002-02-25 2013-11-19 C. S. Lee Crawford Method of managing and communicating data pertaining to software applications for processor-based devices comprising wireless communication circuitry
US7330894B2 (en) * 2002-04-19 2008-02-12 International Business Machines Corporation System and method for preventing timeout of a client
US20040181467A1 (en) * 2003-03-14 2004-09-16 Samir Raiyani Multi-modal warehouse applications
US7490183B2 (en) * 2004-02-12 2009-02-10 International Business Machines Corporation Information kit integration architecture for end-user systems
US20060161778A1 (en) * 2004-03-29 2006-07-20 Nokia Corporation Distinguishing between devices of different types in a wireless local area network (WLAN)
US7580867B2 (en) * 2004-05-04 2009-08-25 Paul Nykamp Methods for interactively displaying product information and for collaborative product design
US20060036770A1 (en) * 2004-07-30 2006-02-16 International Business Machines Corporation System for factoring synchronization strategies from multimodal programming model runtimes
US20060075100A1 (en) * 2004-09-28 2006-04-06 Nokia Corporation System, device, software and method for providing enhanced UPnP support on devices
US20060149550A1 (en) * 2004-12-30 2006-07-06 Henri Salminen Multimodal interaction
US20060235694A1 (en) * 2005-04-14 2006-10-19 International Business Machines Corporation Integrating conversational speech into Web browsers
US8171493B2 (en) * 2005-09-06 2012-05-01 Nvoq Incorporated VXML browser control channel
US8156128B2 (en) 2005-09-14 2012-04-10 Jumptap, Inc. Contextual mobile content placement on a mobile communication facility
US20070060114A1 (en) * 2005-09-14 2007-03-15 Jorey Ramer Predictive text completion for a mobile communication facility
US8615719B2 (en) 2005-09-14 2013-12-24 Jumptap, Inc. Managing sponsored content for delivery to mobile communication facilities
US8515401B2 (en) 2005-09-14 2013-08-20 Jumptap, Inc. System for targeting advertising content to a plurality of mobile communication facilities
US20070061335A1 (en) * 2005-09-14 2007-03-15 Jorey Ramer Multimodal search query processing
US8503995B2 (en) 2005-09-14 2013-08-06 Jumptap, Inc. Mobile dynamic advertisement creation and placement
US9201979B2 (en) 2005-09-14 2015-12-01 Millennial Media, Inc. Syndication of a behavioral profile associated with an availability condition using a monetization platform
US7548915B2 (en) * 2005-09-14 2009-06-16 Jorey Ramer Contextual mobile content placement on a mobile communication facility
US8238888B2 (en) 2006-09-13 2012-08-07 Jumptap, Inc. Methods and systems for mobile coupon placement
US20070198485A1 (en) * 2005-09-14 2007-08-23 Jorey Ramer Mobile search service discovery
US7769764B2 (en) 2005-09-14 2010-08-03 Jumptap, Inc. Mobile advertisement syndication
US7702318B2 (en) 2005-09-14 2010-04-20 Jumptap, Inc. Presentation of sponsored content based on mobile transaction event
US7752209B2 (en) 2005-09-14 2010-07-06 Jumptap, Inc. Presenting sponsored content on a mobile communication facility
US8311888B2 (en) 2005-09-14 2012-11-13 Jumptap, Inc. Revenue models associated with syndication of a behavioral profile using a monetization platform
US8832100B2 (en) 2005-09-14 2014-09-09 Millennial Media, Inc. User transaction history influenced search results
US9076175B2 (en) 2005-09-14 2015-07-07 Millennial Media, Inc. Mobile comparison shopping
US8805339B2 (en) 2005-09-14 2014-08-12 Millennial Media, Inc. Categorization of a mobile user profile based on browse and viewing behavior
US9058406B2 (en) 2005-09-14 2015-06-16 Millennial Media, Inc. Management of multiple advertising inventories using a monetization platform
US10911894B2 (en) 2005-09-14 2021-02-02 Verizon Media Inc. Use of dynamic content generation parameters based on previous performance of those parameters
US8302030B2 (en) 2005-09-14 2012-10-30 Jumptap, Inc. Management of multiple advertising inventories using a monetization platform
US7676394B2 (en) 2005-09-14 2010-03-09 Jumptap, Inc. Dynamic bidding and expected value
US8364521B2 (en) 2005-09-14 2013-01-29 Jumptap, Inc. Rendering targeted advertisement on mobile communication facilities
US8195133B2 (en) 2005-09-14 2012-06-05 Jumptap, Inc. Mobile dynamic advertisement creation and placement
US7660581B2 (en) 2005-09-14 2010-02-09 Jumptap, Inc. Managing sponsored content based on usage history
US20070061198A1 (en) * 2005-09-14 2007-03-15 Jorey Ramer Mobile pay-per-call campaign creation
US8290810B2 (en) 2005-09-14 2012-10-16 Jumptap, Inc. Realtime surveying within mobile sponsored content
US10038756B2 (en) 2005-09-14 2018-07-31 Millenial Media LLC Managing sponsored content based on device characteristics
US8660891B2 (en) 2005-11-01 2014-02-25 Millennial Media Interactive mobile advertisement banners
US8688671B2 (en) 2005-09-14 2014-04-01 Millennial Media Managing sponsored content based on geographic region
US8103545B2 (en) 2005-09-14 2012-01-24 Jumptap, Inc. Managing payment for sponsored content presented to mobile communication facilities
US10592930B2 (en) 2005-09-14 2020-03-17 Millenial Media, LLC Syndication of a behavioral profile using a monetization platform
US8819659B2 (en) 2005-09-14 2014-08-26 Millennial Media, Inc. Mobile search service instant activation
US7860871B2 (en) 2005-09-14 2010-12-28 Jumptap, Inc. User history influenced search results
US9703892B2 (en) 2005-09-14 2017-07-11 Millennial Media Llc Predictive text completion for a mobile communication facility
US20110313853A1 (en) 2005-09-14 2011-12-22 Jorey Ramer System for targeting advertising content to a plurality of mobile communication facilities
US8027879B2 (en) 2005-11-05 2011-09-27 Jumptap, Inc. Exclusivity bidding for mobile sponsored content
US9471925B2 (en) 2005-09-14 2016-10-18 Millennial Media Llc Increasing mobile interactivity
US7577665B2 (en) 2005-09-14 2009-08-18 Jumptap, Inc. User characteristic influenced search results
US8131271B2 (en) 2005-11-05 2012-03-06 Jumptap, Inc. Categorization of a mobile user profile based on browse behavior
US8364540B2 (en) 2005-09-14 2013-01-29 Jumptap, Inc. Contextual targeting of content using a monetization platform
US7912458B2 (en) 2005-09-14 2011-03-22 Jumptap, Inc. Interaction analysis and prioritization of mobile content
US8209344B2 (en) 2005-09-14 2012-06-26 Jumptap, Inc. Embedding sponsored content in mobile applications
US8812526B2 (en) 2005-09-14 2014-08-19 Millennial Media, Inc. Mobile content cross-inventory yield optimization
US8229914B2 (en) 2005-09-14 2012-07-24 Jumptap, Inc. Mobile content spidering and compatibility determination
US8666376B2 (en) 2005-09-14 2014-03-04 Millennial Media Location based mobile shopping affinity program
US8989718B2 (en) 2005-09-14 2015-03-24 Millennial Media, Inc. Idle screen advertising
US8175585B2 (en) 2005-11-05 2012-05-08 Jumptap, Inc. System for targeting advertising content to a plurality of mobile communication facilities
US8571999B2 (en) 2005-11-14 2013-10-29 C. S. Lee Crawford Method of conducting operations for a social network application including activity list generation
US8189563B2 (en) 2005-12-08 2012-05-29 International Business Machines Corporation View coordination for callers in a composite services enablement environment
US7827288B2 (en) * 2005-12-08 2010-11-02 International Business Machines Corporation Model autocompletion for composite services synchronization
US7818432B2 (en) * 2005-12-08 2010-10-19 International Business Machines Corporation Seamless reflection of model updates in a visual page for a visual channel in a composite services delivery system
US7809838B2 (en) * 2005-12-08 2010-10-05 International Business Machines Corporation Managing concurrent data updates in a composite services delivery system
US7890635B2 (en) * 2005-12-08 2011-02-15 International Business Machines Corporation Selective view synchronization for composite services delivery
US7877486B2 (en) * 2005-12-08 2011-01-25 International Business Machines Corporation Auto-establishment of a voice channel of access to a session for a composite service from a visual channel of access to the session for the composite service
US20070133769A1 (en) * 2005-12-08 2007-06-14 International Business Machines Corporation Voice navigation of a visual view for a session in a composite services enablement environment
US8259923B2 (en) 2007-02-28 2012-09-04 International Business Machines Corporation Implementing a contact center using open standards and non-proprietary components
US20070133511A1 (en) * 2005-12-08 2007-06-14 International Business Machines Corporation Composite services delivery utilizing lightweight messaging
US10332071B2 (en) 2005-12-08 2019-06-25 International Business Machines Corporation Solution for adding context to a text exchange modality during interactions with a composite services application
US20070136449A1 (en) * 2005-12-08 2007-06-14 International Business Machines Corporation Update notification for peer views in a composite services delivery environment
US20070133509A1 (en) * 2005-12-08 2007-06-14 International Business Machines Corporation Initiating voice access to a session from a visual access channel to the session in a composite services delivery system
US11093898B2 (en) 2005-12-08 2021-08-17 International Business Machines Corporation Solution for adding context to a text exchange modality during interactions with a composite services application
US20070136421A1 (en) * 2005-12-08 2007-06-14 International Business Machines Corporation Synchronized view state for composite services delivery
US20070147355A1 (en) * 2005-12-08 2007-06-28 International Business Machines Corporation Composite services generation tool
US8005934B2 (en) * 2005-12-08 2011-08-23 International Business Machines Corporation Channel presence in a composite services enablement environment
US20070133773A1 (en) 2005-12-08 2007-06-14 International Business Machines Corporation Composite services delivery
US20070133512A1 (en) * 2005-12-08 2007-06-14 International Business Machines Corporation Composite services enablement of visual navigation into a call center
US7792971B2 (en) * 2005-12-08 2010-09-07 International Business Machines Corporation Visual channel refresh rate control for composite services delivery
US20070245383A1 * 2005-12-21 2007-10-18 Bhide Madhav P System and method for interface navigation
US7487453B2 (en) * 2006-03-24 2009-02-03 Sap Ag Multi-modal content presentation
US7827033B2 (en) * 2006-12-06 2010-11-02 Nuance Communications, Inc. Enabling grammars in web page frames
US8594305B2 (en) 2006-12-22 2013-11-26 International Business Machines Corporation Enhancing contact centers with dialog contracts
US20090013255A1 (en) * 2006-12-30 2009-01-08 Matthew John Yuschik Method and System for Supporting Graphical User Interfaces
US9055150B2 (en) 2007-02-28 2015-06-09 International Business Machines Corporation Skills based routing in a standards based contact center using a presence server and expertise specific watchers
US9247056B2 (en) 2007-02-28 2016-01-26 International Business Machines Corporation Identifying contact center agents based upon biometric characteristics of an agent's speech
US8102975B2 (en) 2007-04-04 2012-01-24 Sap Ag Voice business client
US8862475B2 (en) * 2007-04-12 2014-10-14 Nuance Communications, Inc. Speech-enabled content navigation and control of a distributed multimodal browser
US7912963B2 (en) * 2007-06-28 2011-03-22 At&T Intellectual Property I, L.P. Methods and apparatus to control a voice extensible markup language (VXML) session
US8726170B2 (en) * 2008-10-30 2014-05-13 Sap Ag Delivery of contextual information
US20100241612A1 (en) * 2009-03-20 2010-09-23 Research In Motion Limited Method, system and apparatus for managing media files
EP2230811A1 * 2009-03-20 2010-09-22 Research In Motion Limited Synchronization between a mobile device and a computing terminal
US20120185543A1 (en) * 2011-01-18 2012-07-19 Samsung Electronics Co., Ltd. Apparatus and method for sharing information on a webpage
US9230549B1 (en) 2011-05-18 2016-01-05 The United States Of America As Represented By The Secretary Of The Air Force Multi-modal communications (MMC)
JP5601353B2 (en) * 2012-06-29 2014-10-08 横河電機株式会社 Network management system
JP5556858B2 (en) * 2012-06-29 2014-07-23 横河電機株式会社 Network management system
US9093072B2 (en) * 2012-07-20 2015-07-28 Microsoft Technology Licensing, Llc Speech and gesture recognition enhancement
US9734819B2 (en) 2013-02-21 2017-08-15 Google Technology Holdings LLC Recognizing accented speech
CN104065613B * 2013-03-18 2017-11-21 中国移动通信集团内蒙古有限公司 Method, system and device for synchronizing offline application operation data
CN103346954A (en) * 2013-06-17 2013-10-09 北京印刷学院 Email sending-receiving device
US20160269349A1 (en) * 2015-03-12 2016-09-15 General Electric Company System and method for orchestrating and correlating multiple software-controlled collaborative sessions through a unified conversational interface
WO2016171822A1 (en) * 2015-04-18 2016-10-27 Intel Corporation Multimodal interface
CN107945796B (en) * 2017-11-13 2021-05-25 百度在线网络技术(北京)有限公司 Speech recognition method, device, equipment and computer readable medium
CN109299352B (en) * 2018-11-14 2022-02-01 百度在线网络技术(北京)有限公司 Method and device for updating website data in search engine and search engine
CN111008326A (en) * 2019-08-27 2020-04-14 嘉兴太美医疗科技有限公司 Clinical study subject retrieval method, device and computer readable medium

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6421675B1 (en) * 1998-03-16 2002-07-16 S. L. I. Systems, Inc. Search engine
JP2000295125A (en) * 1999-04-06 2000-10-20 Sony Corp Receiver for digital audio broadcast
FR2796234A1 (en) * 1999-07-09 2001-01-12 Thomson Multimedia Sa SYSTEM AND METHOD FOR CONTROLLING THE USER INTERFACE OF A GENERAL PUBLIC ELECTRONIC DEVICE
US7685252B1 (en) 1999-10-12 2010-03-23 International Business Machines Corporation Methods and systems for multi-modal browsing and implementation of a conversational markup language
US6895558B1 (en) * 2000-02-11 2005-05-17 Microsoft Corporation Multi-access mode electronic personal assistant
US20020007379A1 (en) * 2000-05-19 2002-01-17 Zhi Wang System and method for transcoding information for an audio or limited display user interface
US6745163B1 (en) 2000-09-27 2004-06-01 International Business Machines Corporation Method and system for synchronizing audio and visual presentation in a multi-modal content renderer
US7028306B2 (en) 2000-12-04 2006-04-11 International Business Machines Corporation Systems and methods for implementing modular DOM (Document Object Model)-based multi-modal browsers
US6996800B2 (en) 2000-12-04 2006-02-07 International Business Machines Corporation MVC (model-view-controller) based multi-modal authoring tool and development environment
EP1451679A2 (en) 2001-03-30 2004-09-01 BRITISH TELECOMMUNICATIONS public limited company Multi-modal interface
US20030046316A1 (en) * 2001-04-18 2003-03-06 Jaroslav Gergic Systems and methods for providing conversational computing via javaserver pages and javabeans
US7020841B2 (en) * 2001-06-07 2006-03-28 International Business Machines Corporation System and method for generating and presenting multi-modal applications from intent-based markup scripts
US6983307B2 (en) * 2001-07-11 2006-01-03 Kirusa, Inc. Synchronization among plural browsers
US8799464B2 (en) 2001-12-28 2014-08-05 Motorola Mobility Llc Multi-modal communication using a session specific proxy server
US6807529B2 (en) 2002-02-27 2004-10-19 Motorola, Inc. System and method for concurrent multimodal communication
US20030187944A1 (en) 2002-02-27 2003-10-02 Greg Johnson System and method for concurrent multimodal communication using concurrent multimodal tags

Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5734910A (en) * 1995-12-22 1998-03-31 International Business Machines Corporation Integrating multi-modal synchronous interrupt handlers for computer system
US5945989A (en) * 1997-03-25 1999-08-31 Premiere Communications, Inc. Method and apparatus for adding and altering content on websites
US6173266B1 (en) * 1997-05-06 2001-01-09 Speechworks International, Inc. System and method for developing interactive speech applications
US6330539B1 (en) * 1998-02-05 2001-12-11 Fujitsu Limited Dialog interface system
US6363393B1 (en) * 1998-02-23 2002-03-26 Ron Ribitzky Component based object-relational database infrastructure and user interface
US6012030A (en) * 1998-04-21 2000-01-04 Nortel Networks Corporation Management of speech and audio prompts in multimodal interfaces
US6119147A (en) * 1998-07-28 2000-09-12 Fuji Xerox Co., Ltd. Method and system for computer-mediated, multi-modal, asynchronous meetings in a virtual space
US6393149B2 (en) * 1998-09-17 2002-05-21 Navigation Technologies Corp. Method and system for compressing data and a geographic database formed therewith and methods for use thereof in a navigation application program
US6513063B1 (en) * 1999-01-05 2003-01-28 Sri International Accessing network-based electronic information through scripted online interfaces using spoken input
US6523061B1 (en) * 1999-01-05 2003-02-18 Sri International, Inc. System, method, and article of manufacture for agent-based navigation in a speech-based data navigation system
US6377913B1 (en) * 1999-08-13 2002-04-23 International Business Machines Corporation Method and system for multi-client access to a dialog system
US6501832B1 (en) * 1999-08-24 2002-12-31 Microstrategy, Inc. Voice code registration system and method for registering voice codes for voice pages in a voice network access provider system
US20020046209A1 (en) * 2000-02-25 2002-04-18 Joseph De Bellis Search-on-the-fly with merge function
US20010037200A1 (en) * 2000-03-02 2001-11-01 Hiroaki Ogawa Voice recognition apparatus and method, and recording medium
US20010049603A1 (en) * 2000-03-10 2001-12-06 Sravanapudi Ajay P. Multimodal information services
US6687734B1 (en) * 2000-03-21 2004-02-03 America Online, Incorporated System and method for determining if one web site has the same information as another web site

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060143007A1 (en) * 2000-07-24 2006-06-29 Koh V E User interaction with voice information services
US9529929B1 (en) * 2002-10-15 2016-12-27 Mitchell Joseph Aiosa Morris Systems, apparatus, methods of operation thereof, program products thereof and methods for web page organization
US7873390B2 (en) * 2002-12-09 2011-01-18 Voice Signal Technologies, Inc. Provider-activated software for mobile communication devices
US20040110494A1 (en) * 2002-12-09 2004-06-10 Voice Signal Technologies, Inc. Provider-activated software for mobile communication devices
US20060217881A1 (en) * 2005-03-28 2006-09-28 Sap Aktiengesellschaft Incident command post
US7881862B2 (en) 2005-03-28 2011-02-01 Sap Ag Incident command post
US20110066947A1 (en) * 2005-03-28 2011-03-17 Sap Ag Incident Command Post
US8352172B2 (en) 2005-03-28 2013-01-08 Sap Ag Incident command post
US20110184730A1 (en) * 2010-01-22 2011-07-28 Google Inc. Multi-dimensional disambiguation of voice commands
US8626511B2 (en) * 2010-01-22 2014-01-07 Google Inc. Multi-dimensional disambiguation of voice commands
US20130066635A1 (en) * 2011-09-08 2013-03-14 Samsung Electronics Co., Ltd. Apparatus and method for controlling home network service in portable terminal
US9317605B1 (en) 2012-03-21 2016-04-19 Google Inc. Presenting forked auto-completions
US10210242B1 (en) 2012-03-21 2019-02-19 Google Llc Presenting forked auto-completions
US9646606B2 (en) 2013-07-03 2017-05-09 Google Inc. Speech recognition using domain knowledge

Also Published As

Publication number Publication date
CN101621547A (en) 2010-01-06
US20030146932A1 (en) 2003-08-07
CN101621547B (en) 2013-01-30
US7203907B2 (en) 2007-04-10
CN100578474C (en) 2010-01-06
CN1997976A (en) 2007-07-11
DE60319962D1 (en) 2008-05-08
DE60319962T2 (en) 2009-04-16

Similar Documents

Publication Publication Date Title
US20030149564A1 (en) User interface for data access and entry
CN105408890B (en) Performing operations related to listing data based on voice input
US7729913B1 (en) Generation and selection of voice recognition grammars for conducting database searches
US7177814B2 (en) Dynamic grammar for voice-enabled applications
JP4997601B2 (en) WEB site system for voice data search
US9330661B2 (en) Accuracy improvement of spoken queries transcription using co-occurrence information
US9684741B2 (en) Presenting search results according to query domains
US6311182B1 (en) Voice activated web browser
JP3488174B2 (en) Method and apparatus for retrieving speech information using content information and speaker information
US7310601B2 (en) Speech recognition apparatus and speech recognition method
EP2058800B1 (en) Method and system for recognizing speech for searching a database
EP3032532A1 (en) Disambiguating heteronyms in speech synthesis
US11016968B1 (en) Mutation architecture for contextual data aggregator
US20040167875A1 (en) Information processing method and system
CN109637537B (en) Method for automatically acquiring annotated data to optimize user-defined awakening model
WO2004072780A2 (en) Method for automatic and semi-automatic classification and clustering of non-deterministic texts
Carvalho et al. A critical survey on the use of fuzzy sets in speech and natural language processing
US20220165257A1 (en) Neural sentence generator for virtual assistants
US7359858B2 (en) User interface for data access and entry
Kawahara New perspectives on spoken language understanding: Does machine need to fully understand speech?
Wang et al. Voice search
JP2004295578A (en) Translation device
US8024347B2 (en) Method and apparatus for automatically differentiating between types of names stored in a data collection
US7805291B1 (en) Method of identifying topic of text using nouns
WO2005076259A1 (en) Speech input system, electronic device, speech input method, and speech input program

Legal Events

Date Code Title Description
AS Assignment

Owner name: SAP AKTIENGESELLSCHAFT, GERMANY

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:GONG, LI;SWAN, RICHARD J.;REEL/FRAME:014243/0976

Effective date: 20030603

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION