WO2001026350A1 - Vocal interface system and method - Google Patents

Vocal interface system and method

Info

Publication number
WO2001026350A1
Authority
WO
WIPO (PCT)
Prior art keywords
caller
service
information
business
database
Prior art date
Application number
PCT/US2000/026935
Other languages
French (fr)
Inventor
C. Mikael Berner
Amol M. Joshi
Lisa M. Guerra
Kevin M. Stone
Steve T. Tran
Original Assignee
Bevocal, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bevocal, Inc.
Priority to EP00965546A (EP1224797A1)
Priority to AU76248/00A (AU7624800A)
Publication of WO2001026350A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/487 Arrangements for providing information services, e.g. recorded voice services or time announcements
    • H04M3/493 Interactive information services, e.g. directory enquiries; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals
    • H04M3/4936 Speech interaction details
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M2242/00 Special services or facilities
    • H04M2242/22 Automatic class or number identification arrangements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/38 Graded-service arrangements, i.e. some subscribers prevented from establishing certain connections
    • H04M3/382 Graded-service arrangements, i.e. some subscribers prevented from establishing certain connections using authorisation codes or passwords
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/42025 Calling or Called party identification service
    • H04M3/42034 Calling party identification service
    • H04M3/42059 Making use of the calling party identifier
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04Q SELECTING
    • H04Q3/00 Selecting arrangements
    • H04Q3/72 Finding out and indicating number of calling subscriber

Abstract

A VUI Speech Object Application comprised of program module speech objects that interface with the APIs of service-databases to retrieve a caller's desired information.

Description

VOCAL INTERFACE SYSTEM AND METHOD
Field of the Invention
The present invention relates to the field of methods for enabling a caller to
vocally access and retrieve information over a computer network.
Background
Customers have come to rely on and expect the quick availability of
information from their merchants. Accordingly, merchants have devised
methods of allowing their customers to have easy access to information about
their products and services. The retrieval of information via merchant Internet
Web pages has, for example, experienced explosive growth. Moreover, the
development of practical speech recognition hardware and software has now
made it possible to allow customers to access merchant information with a
vocal user interface (VUI).
To date however, the development of effective or even tolerable VUIs
has lagged the development of the technology to implement VUIs. This is the
case because much of the communication between people in a typical
conversation is nonverbal. This is particularly true when one individual is
attempting to ascertain a specific item of information from the other. These
conversations, when effective, are heavily influenced by educated and context
dependent inferential answers and prompts for more information to aid in
pinpointing the particular information of interest. When these elements are
missing from the conversation, VUI implementations tend to be cumbersome
and unpleasant to use and, accordingly, frustrating to users. There is therefore a need for user-friendly VUIs that will increase the likelihood that callers
will take advantage of the services.
Summary of the Invention
The present invention comprises a VUI Speech Object Application
comprised of program module speech objects that interface with the APIs of
service-databases to retrieve a caller's desired information. Moreover,
inferential and educated decisions are made regarding a caller's desired
information to enable a more caller friendly experience with the VUI Speech
Object Application.
The novel features that are considered characteristic of the invention
are set forth with particularity in the appended claims. The invention itself,
however, both as to its structure and its operation together with the additional
objects and advantages thereof, will best be understood from the following
description of the preferred embodiment of the present invention when read in
conjunction with the accompanying drawings. Unless specifically noted, it is
intended that the words and phrases in the specification and claims be given
the ordinary and accustomed meaning to those of ordinary skill in the
applicable art or arts. If any other meaning is intended, the specification will
specifically state that a special meaning is being applied to a word or phrase.
Likewise, the use of the words "function" or "means" in the Description of
Preferred Embodiments is not intended to indicate a desire to invoke the
special provision of 35 U.S.C. §112, paragraph 6 to define the invention. To
the contrary, if the provisions of 35 U.S.C. §112, paragraph 6, are sought to be invoked to define the invention(s), the claims will specifically state the
phrases "means for" or "step for" and a function, without also reciting in such
phrases any structure, material, or act in support of the function. Even when
the claims recite a "means for" or "step for" performing a function, if they also
recite any structure, material or acts in support of that means or step, then the
intention is not to invoke the provisions of 35 U.S.C. §112, paragraph 6.
Moreover, even if the provisions of 35 U.S.C. §112, paragraph 6, are invoked
to define the inventions, it is intended that the inventions not be limited only to
the specific structure, material or acts that are described in the preferred
embodiments, but in addition, include any and all structures, materials or acts
that perform the claimed function, along with any and all known or later-
developed equivalent structures, materials or acts for performing the claimed
function.
Brief Description of the Drawings
Fig. A1 depicts a conversational state diagram of an embodiment of a Main Menu Speech Object.
Fig. A2 depicts a conversational state diagram of an embodiment of a Login Speech Object.
Fig. A3 depicts a conversational state diagram of an embodiment of a New Account Speech Object.
Fig. A4 depicts a conversational state diagram of an embodiment of a Passcode Speech Object.
Fig. B1 depicts a conversational state diagram of an embodiment of a Traffic Condition Speech Object.
Fig. B2 depicts a conversational state diagram of an alternate embodiment of a Traffic Condition Speech Object.
Fig. C1 depicts a conversational state diagram of an embodiment of a Business Finder Speech Object.
Fig. C2 depicts a conversational state diagram of an embodiment of extended functionality of the Business Finder Speech Object.
Fig. D1 depicts a conversational state diagram of an embodiment of a Stock Information Speech Object.
Fig. D2 depicts a conversational state diagram of an embodiment of extended functionality of a Stock Information Speech Object.
Fig. D3 depicts a conversational state diagram of an embodiment of extended functionality of a Stock Information Speech Object.
Fig. E1 depicts a conversational state diagram of an embodiment of a Weather Speech Object.
Fig. E2 depicts a conversational state diagram of an embodiment of a List Speech Object for conveying weather information to the caller.
Fig. F1 depicts a conversational state diagram of an embodiment of an Address Locating Speech Object.
Fig. F2 depicts a conversational state diagram of an embodiment of an Address Disambiguation Speech Object.
Fig. G1 depicts a conversational state diagram of an embodiment of a Flight Finder Speech Object.
Fig. G2 depicts a conversational state diagram of an embodiment of a Flight Information Speech Object.
Fig. G3 depicts a conversational state diagram of an embodiment of an Itinerary Speech Object.
Fig. H depicts a conversational state diagram of an embodiment of a Driving Directions Speech Object.
Description of Preferred Embodiments
The preferred embodiment of the present invention is implemented in a
scalable system architecture that includes at least one each of the following:
a Vocal User Interface (VUI) Application Server, a Telephony Server, a Speech Recognition Server, a Text-to-Speech Server, and a Media Server
coupled to an Application Program Interface (API) of an independent Service-
Database. The above mentioned components of the preferred embodiment
are coupled together in a backbone network, and accordingly, each of the
above components includes a network interface comprising hardware under
program control to enable transceiving communications between the network
components. The presently preferred backbone network comprises a TCP/IP
network. Multiples of each of the above mentioned components can be
incorporated together with a load-balancer for efficient handling of increased
demand processing requirements.
A caller connects to the Telephony Server by dialing a telephone
number associated with the Telephony Server by a Public Switched
Telephone Network (PSTN). The Telephony Server includes a Telephone
Network Interface for transceiving and managing phone calls received over a
telephone network. Figure 1 depicts the Telephone Network Interface coupled
to a Public Switched Telephone Network (PSTN) using T1 lines. The
Telephone Network Interface further comprises speech signal processing
hardware under program control for creating and outputting digitized speech-
to-data streams and analog data-to-speech streams (collectively "speech-
data-streams") that are conveyed to and from the Telephone Network
Interface.
The VUI Application Server comprises hardware under control of a VUI
Application. The VUI Application implements a vocally navigable Speech Object interface between the caller and the API of the independent Service-
Database that is responsive to recognized spoken commands ("utterances")
and further includes recorded vocal navigation prompts that are conveyed to
the caller to aid the caller's retrieval of information from the Service-
Databases. Moreover, the VUI Application further comprises distinct program
modules associated with each Service-Database that enable the employment
of module specific speech objects and software code that is responsive to the
caller's utterances ("speech grammars") that are particularly germane to the
Service-Database.
The presently preferred program modules of the VUI Application
include: a Traffic Condition Module, a Business Finder Module, a Stock
Information Module, a Driving Directions Module, a Flight Information Module,
and a Weather Conditions Module. It is contemplated that additional program
modules will be integrated with the VUI Application to service the demand for
additional vocally searchable Service-Databases.
The Media Server comprises hardware under program control to store
the recorded prompts associated with the Speech Object program modules of
the VUI Application. The Media Server conveys speech objects to the
Telephony Server according to the process flow of the VUI Application. The
Speech Recognition Server comprises hardware under program control to
interpret the caller's vocalized navigation commands for the VUI Application.
Thereafter, the VUI Application translates the caller's uttered Service-content
requests into database search expressions, which it passes through the API of
the selected Service-Database to search for and retrieve the requested
information. The retrieved information is conveyed to the Text-to-Speech and
Media Servers and is ultimately conveyed vocally to the caller
through the Telephony Server.
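The request flow described above can be pictured as a chain of function calls. The following is only a minimal sketch under stated assumptions, not the implementation disclosed here: the function names (`recognize`, `build_query`, `service_api`, `handle_utterance`), the in-memory stand-in for a Service-Database, and the sample data are all hypothetical.

```python
# Hypothetical sketch of the request flow; none of these names come from the patent.

# Stand-in for an independent Service-Database reachable through its API.
FAKE_WEATHER_DB = {"san jose": "sunny, 72 degrees"}


def recognize(audio_text: str) -> str:
    """Stand-in for the Speech Recognition Server: normalizes an utterance."""
    return audio_text.strip().lower()


def build_query(utterance: str) -> dict:
    """VUI Application step: translate an utterance into a search expression."""
    prefix = "weather in "
    if utterance.startswith(prefix):
        return {"service": "weather", "city": utterance[len(prefix):]}
    return {"service": "unknown"}


def service_api(query: dict) -> str:
    """Stand-in for the API of a Service-Database."""
    if query["service"] == "weather":
        return FAKE_WEATHER_DB.get(query["city"], "no data")
    return "no data"


def handle_utterance(audio_text: str) -> str:
    """Return the text that would be handed to Text-to-Speech for playback."""
    result = service_api(build_query(recognize(audio_text)))
    return f"The result is: {result}"
```

In a deployment along the lines described, each of these steps would run on a separate networked server; the sketch collapses them into one process purely for readability.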
The presently preferred embodiment of the VUI Application is
implemented in speech objects with the program modules being implemented
in distinct program module Speech Objects that further comprise reused
component speech objects and custom component Speech Objects.
Additionally, although speech objects are mainly intended to evoke specific
recognized utterances from the caller, speech objects are also contemplated
to be useful for conveying advertisements or other public and private
information to the caller. The VUI Application and program module Speech
Objects are more fully described below in the specification and the depicted
conversational state diagrams.
VUI Application Universal Vocal Navigation
In accordance with the VUI Application, and because specific universal
grammar associated with the VUI Application Main Menu is available within
each distinct Speech Object program module, the VUI Application also
provides for Universal Navigation commands at any point in the caller's
navigation. For example, at any point after establishing a VUI Application
session, the caller may vocalize a primary specific navigable point or a
Service-Database (e.g. "Traffic Conditions Database", "Home Menu", or "Stock Information Database") by vocalizing a recognized utterance that is
enabled within the program module Speech Object.
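One way to picture Universal Navigation is as a shared grammar that is merged into the grammar of whichever program module is active. The sketch below is illustrative only: the dictionary layout, the state names, and the rule that module-specific entries override universal ones are assumptions, not details taken from this specification.

```python
# Hypothetical sketch: universal grammar merged into every module's grammar.

UNIVERSAL_GRAMMAR = {
    "home menu": "MainMenu",
    "traffic conditions database": "Traffic",
    "stock information database": "Stocks",
}


def active_grammar(module_grammar: dict) -> dict:
    """Combine a module's own grammar with the universal navigation grammar.

    Module-specific entries win on conflict (an assumption of this sketch).
    """
    merged = dict(UNIVERSAL_GRAMMAR)
    merged.update(module_grammar)
    return merged


def next_state(utterance: str, module_grammar: dict, current: str) -> str:
    """Return the state an utterance leads to, or stay put if unrecognized."""
    return active_grammar(module_grammar).get(utterance.strip().lower(), current)
```

For example, with a hypothetical weather grammar `{"today": "ReadToday"}`, the utterance "Home Menu" still resolves to the main menu state because the universal entries are always present.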
Standard Treatment of Multi-Item Search Results
Because it is the intention of some searches of the Service-Databases
to return several information items, it becomes necessary to effectively
present the information to the caller in a manner that permits the caller to
select the desired item. Thus, the VUI Application presents the information
items to the caller by engaging the caller in a List Speech Object.
The List Speech Object comprises a preamble that will convey
acceptable speech grammars to navigate the list, the number of items in a
multi-item list, and an audible separator that will alert the caller that the next
item on the list will be conveyed and that the response period within which to
select the previous item has passed. Both auto-advance and mandatory
vocalized navigation modes are available methods of navigating a List
Speech Object. Further, selection of an item in the list or getting more
information about an item in a list is accomplished by appropriate recognized
speech grammars such as "that one" or "more details".
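The List Speech Object behavior can be sketched as a loop that emits a preamble, then each item preceded by a separator. This is a simplified illustration that models only the auto-advance mode; the function name `read_list`, the `<beep>` separator token, and the prompt wording are hypothetical.

```python
# Hypothetical sketch of a List Speech Object; names and wording are assumptions.

SEPARATOR = "<beep>"  # stands in for the audible separator between items


def read_list(items, responses):
    """Return (prompts, selected_item).

    `responses` maps an item index to what the caller said after that item;
    a missing entry models silence, so the list auto-advances.
    """
    prompts = [
        f"There are {len(items)} items. "
        'Say "that one" to select an item or "more details" to hear more.'
    ]
    for index, item in enumerate(items):
        if index > 0:
            # The separator signals that the response window for the
            # previous item has closed and the next item follows.
            prompts.append(SEPARATOR)
        prompts.append(item)
        if responses.get(index) == "that one":
            return prompts, item
    return prompts, None
```

A mandatory vocalized navigation mode could be modeled in the same way by waiting for an explicit "next" response instead of advancing on silence.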
Standard Treatment of Search Result Ambiguities
Alternatively, there are circumstances when the user's utterance may
be ambiguous to the VUI Application. Resolving the ambiguity, or
disambiguation, in accordance with the present invention comprises a method
of efficiently presenting the ambiguity to the caller and letting the caller select the desired item. To disambiguate, the program modules of the VUI
Application engage the caller in a standard disambiguation dialog to remove
the ambiguity.
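The disambiguation dialog of this section, including grammars built dynamically from a subset of each candidate item, can be sketched as follows. The sketch assumes that any contiguous sub-phrase matching exactly one candidate is an acceptable utterance for that candidate; the specification only states that grammars are based on a subset of each item, so this rule and the function names are hypothetical.

```python
# Hypothetical sketch of dynamic grammar creation for disambiguation.
import re


def _words(text: str) -> list:
    """Lowercase a phrase and split it into words, ignoring punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_dynamic_grammar(items: list) -> dict:
    """Map each sub-phrase that identifies exactly one item to that item."""
    owners = {}
    for item in items:
        words = _words(item)
        for start in range(len(words)):
            for end in range(start + 1, len(words) + 1):
                owners.setdefault(" ".join(words[start:end]), set()).add(item)
    # Keep only phrases owned by a single candidate; shared phrases stay ambiguous.
    return {p: next(iter(o)) for p, o in owners.items() if len(o) == 1}


def disambiguate(utterance: str, items: list):
    """Return the chosen item, or None for "none of these" or no match."""
    grammar = build_dynamic_grammar(items)
    return grammar.get(" ".join(_words(utterance)))
```

Under this rule, a phrase shared by several candidates (such as a city name common to two airports) is left out of the grammar, so the dialog would have to re-prompt rather than guess.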
The VUI Application transitions to a disambiguation Speech Object
when an ambiguity is detected. The first step is to convey the ambiguous
items in a list to the caller. For example, the caller is first prompted that an
ambiguity exists by conveying an appropriate prompt such as "Did you mean
<item 1>, <item 2>, <item n> ...?" Further, the last item included in the list
and conveyed to the caller is "none of the above" or a prompt with similar
meaning. Then, selection of the desired item and navigation of the list by the
caller is accomplished with appropriate utterances and speech grammars (e.g.
"that one" or "previous item" and "the first one", respectively). In addition, the
Disambiguation Speech Object creates dynamic speech grammars
based upon the caller's utterance of a subset of each item to be
disambiguated. For instance, in a Disambiguation Speech Object to
determine a caller's actual desired New York airport, the Disambiguation
Speech Object will prompt, "Did you mean, New York JFK, New York
LaGuardia, Newark New Jersey, or none of these?" Both "New York, JFK"
and "JFK" are acceptable utterances. Upon disambiguating the search items,
the disambiguation dialog transitions to the next conversation state in the
program module where the ambiguity arose.
VUI Application Program Modules
VUI Application Main Menu Speech Object
Referring to Figure A1, the Main Menu Speech Object comprises
several component speech objects that transition either to other component
speech objects within the Main Menu Speech Object or to other program
module Speech Objects in the VUI Application.
Upon entry into the Main Menu Speech Object, the caller is greeted
and prompted to utter a personal identification code or service name
associated with a particular program module Speech Object (SO_Pin/Service
A1). Diagram A depicts several possible transitions depending upon the
caller's utterance. The caller may utter a grammar associated with a
particular Service-Database program module (e.g. "Traffic") or with one of
several caller administrative program modules (e.g. "Login", "New Account",
"Service Tips", "Forgot Passcode").
If the caller utters a speech grammar corresponding to a Service-Database,
the Main Menu document confirms the caller's choice (e.g.
SO_Traffic, SO_Stocks) while transitioning to the program module Speech
Object associated with the Service-Database (A5). The Login program module
Speech Object ("Login Speech Object") is of particular significance and
enables accessing and creating a private caller profile to effect the
customization of preferences and/or settings for each caller. For instance, the
caller may enter home or work addresses, telephone numbers and other personal information that enables the other program module Speech Objects
to make educated inferential decisions regarding what will be the caller's
most probable selections or utterances.
Figure A2 depicts a Login Speech Object (SOLogin A4) that permits
the VUI Application to distinguish between callers. The Login Speech Object
permits the caller to enter a personal identification number ("PIN") and
enables the caller to invoke dialogs to determine a forgotten PIN
(ForgotPasscode A2) or establish a new account (SONewAccount A3). The
Login Speech Object associates each caller's PIN with their telephone
number (A10); thus, upon login, the Login Speech Object processes the caller's
telephone number and verifies that it corresponds with the caller's PIN. If so,
the Login Speech Object confirms the verification to the caller and returns to
the program module it came from. If the caller does not log in, or if the caller
does not invoke the Login Speech Object, the Main Menu Speech Object
transitions to the next state. Figure A3 depicts a Passcode Speech Object to
retrieve a forgotten PIN. Figure A4 depicts a New Account Speech Object to
establish a new account.
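The association between a caller's telephone number and PIN can be illustrated with a simple lookup. The profile store, the sample number, and the behavior after a failed verification are assumptions made for this sketch; a real system would also avoid storing PINs in plain text.

```python
# Hypothetical sketch of the login check; the storage format is an assumption.

# Maps a caller's telephone number to the PIN stored in the caller profile.
# Plain-text storage is used here for illustration only.
PROFILES = {"4085550100": "1234"}


def login(caller_number: str, spoken_pin: str) -> bool:
    """Return True when the PIN matches the one on file for the calling number."""
    stored_pin = PROFILES.get(caller_number)
    return stored_pin is not None and stored_pin == spoken_pin


def main_menu_after_login(caller_number: str, spoken_pin) -> str:
    """Pick the prompt that follows a login attempt."""
    if spoken_pin is not None and login(caller_number, spoken_pin):
        return "You are logged in."
    # No login attempt, or a failed one (assumed): continue without a profile.
    return "Continuing without a profile."
```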
The caller may at any time return to the Main Menu document by
uttering an acceptable speech grammar (e.g. "Home"). The Main Menu
document jumps directly to a second abbreviated greeting (SO_HomeMenu
A6) to account for the caller's familiarity with the VUI Application and to avoid
the caller having to retrace the same navigated path. At this conversation state, the caller may utter several of the same navigational choices previously
discussed.
Traffic Conditions Program Module Conversational State Diagram
Figure B1 depicts a Traffic Conditions Program Module conversational
state diagram ("Traffic Speech Object"). The caller can select the Traffic
Speech Object from the above Main Menu Speech Object module by uttering
a speech grammar that evokes the Traffic Speech Object (e.g. "Traffic"). In
turn, the Traffic Speech Object conveys prompts that direct the caller to utterances that indicate a region of interest for traffic condition information.
The Traffic Speech Object first processes the caller's area code and
prefix, and if the metro area associated with the caller's area code and prefix
is supported by the traffic Service-Database, the Traffic Speech Object
conveys to the caller the presumption that the Traffic Speech Object will use
the associated metro area for the caller's traffic region of interest
(SO_GetMetroTraffic B2). Otherwise, if the metro area for the city-state
combination is not supported, the Traffic Speech Object prompts the caller to
utter a particular city-state combination of interest (SO_City/State B1), and the
Traffic Module confirms a metro area associated with the caller's selected
city-state combination. If the selected metro area is supported by the traffic
Service-Database, the Traffic Module confirms that it will search the traffic
Service-Database (e.g. SO_GetMetroTraffic B2) for traffic related incidents in
that metro area. The Traffic Module will prompt the caller for a new metro
area if the caller at this time cancels the pending search by uttering a "cancel"
or "stop" speech grammar.
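The region selection described above can be sketched as follows. This is only an illustrative sketch: the lookup tables, the function names, and the `max_prompts` limit are assumptions made for the example and do not appear in the description.

```python
from typing import Callable, Optional, Tuple

# Hypothetical lookup tables; the description does not specify their contents.
METRO_BY_PHONE = {("650", "555"): "San Francisco Bay Area"}
METRO_BY_CITY_STATE = {
    ("Palo Alto", "CA"): "San Francisco Bay Area",
    ("Seattle", "WA"): "Seattle",
}
SUPPORTED_METROS = {"San Francisco Bay Area", "Seattle"}


def supported(metro: Optional[str]) -> Optional[str]:
    """Return the metro area only if the traffic Service-Database covers it."""
    return metro if metro in SUPPORTED_METROS else None


def resolve_traffic_region(
    area_code: str,
    prefix: str,
    ask_city_state: Callable[[], Tuple[str, str]],
    max_prompts: int = 3,  # assumed limit so the sketch always terminates
) -> Optional[str]:
    """Presume a metro area from the phone number, else prompt for city and state."""
    metro = supported(METRO_BY_PHONE.get((area_code, prefix)))
    prompts = 0
    while metro is None and prompts < max_prompts:
        city, state = ask_city_state()  # stands in for the city-state dialog
        metro = supported(METRO_BY_CITY_STATE.get((city, state)))
        prompts += 1
    return metro
```

The `ask_city_state` callback stands in for the spoken city-state prompt, so the same function can be exercised without a telephony front end.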
Differing prompts are conveyed to the caller depending upon the
amount of traffic information retrieved from the traffic Service-Database. For
instance, "no traffic incidents" can be directly conveyed to the caller, or the
occurrence of a single traffic incident can also be directly conveyed to the caller
(SO_ReadTraffic B3). Else, if there are several incidents, the Traffic Speech
Object checks whether it has grammars of the metro area and if so prompts
the caller to enter a primary road or utter whether all the traffic incident reports
for that road are desired (SO_MajorRoad B4). If the Traffic Speech Object
supports the major road, a confirmatory prompt (SO_RoadConfirm B5) is
conveyed to the caller who may vocalize a confirmation and hear the traffic
incident report for that major road (SO_ReadTraffic B3). Otherwise, if there
are no highway grammars, all the traffic incidents are conveyed to the caller
(SO_ReadTraffic B3). The list of traffic related incidents is initially brief, but
the caller can request additional information by uttering "that one." After
providing the additional information, the Traffic Speech Object continues
reading the list. The Traffic Module prompts the caller to optionally perform
another search prior to exiting to the Main Menu document.
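The choice of prompt based on the amount of retrieved traffic information can be illustrated with a short sketch. The function name, the state labels "read" and "ask_road", and the incident record layout are hypothetical and were chosen only for this example.

```python
from typing import Dict, List, Optional, Tuple

Incident = Dict[str, str]


def plan_traffic_report(
    incidents: List[Incident],
    has_road_grammars: bool,
    road: Optional[str] = None,
) -> Tuple[str, List[Incident]]:
    """Return the next dialog step and the incidents to read at that step."""
    # Zero or one incident is conveyed directly.
    if len(incidents) <= 1:
        return ("read", incidents)
    # Without road grammars for the metro area, every incident is read.
    if not has_road_grammars:
        return ("read", incidents)
    # Several incidents and road grammars available: ask for a major road first.
    if road is None:
        return ("ask_road", [])
    # A road was confirmed: read only the incidents on that road.
    return ("read", [i for i in incidents if i["road"] == road])
```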
In an alternate embodiment, the Traffic Speech Object engages the
caller in a series of dialogs to determine a specific road. The Traffic Speech
Object subsequently interfaces with the API of the Traffic Service-Database to
determine and convey any available information to the caller.
Business Finder Program Module Conversational State Diagram
Figure C depicts a Business Finder Service Program Module
conversational state diagram ("Business Finder Speech Object"). The caller
can select the Business Finder Module from the above Main Menu Speech
Object by uttering a speech grammar that evokes the Business Finder Module
(e.g. "Business Finder").
In a preferred embodiment of the Business Finder Speech Object, the
region of interest for the caller is presumed based upon information retained
in the caller profile or based upon the area code and prefix of the caller's
telephone number. The presumption is conveyed to the caller (e.g.
SO_AssumeLocation C1) who may either convey an affirmative speech
grammar to confirm that the presumption is correct, or optionally select
another region of interest with an utterance having a negative connotation (e.g.
"cancel") that will invoke a transition to another Speech Object that prompts
the caller to enter the desired region of interest (e.g. SO_CityState C2).
Upon the caller's utterance of a speech grammar acquiescing to the
Business Finder Speech Object's presumption or utterance and confirmation
of an alternate city and state, the caller is prompted (e.g. SO_Brand/Category
C3) to utter a specific brand name or to vocalize a category to search (e.g.
"grocery stores"). Upon receipt of the caller's response, the Business
Finder Speech Object interfaces with the API of the Business Finder
Database and retrieves the information that most probably fulfills the caller's
desires. The Business Finder Speech Object automatically first filters out
matches that are more than a specified distance away (e.g. more than 50
miles). If however, there are no matches within the specified distance, the
Business Finder Speech Object adds back in the matches removed in the
previous step.
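The distance filter and its fallback can be sketched in a few lines. The record layout (a `miles` field on each match) and the function name are assumptions for illustration; only the 50-mile example comes from the description.

```python
from typing import Dict, List


def filter_by_distance(matches: List[Dict], max_miles: float = 50.0) -> List[Dict]:
    """Drop matches beyond max_miles, unless that would leave nothing."""
    nearby = [m for m in matches if m["miles"] <= max_miles]
    # If nothing is within range, add back the matches that were removed.
    return nearby if nearby else list(matches)
```

Returning a copy of the original list in the fallback case keeps the caller's input untouched in both branches.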
If there are multiple matches retrieved from the search, the Business
Finder Speech Object prompts (e.g. SO_FindNearest C4) the caller whether
the caller wants to hear the match that is closest to the caller's presumed
vicinity. If a confirming speech grammar is conveyed by the caller, the
Business Finder Speech Object transitions to the Address Finder Speech
Object (SOAddress C5) which returns processing control to the Business
Finder Speech Object when the Address Finder Speech Object has recorded
the caller's desired address. The Business Finder Speech Object then
conveys the matches that are nearest the vicinity of the caller's address.
However, if the caller elected not to provide an address, or if the search
retrieves no matches within a specified radius (e.g. fifty miles) that meet the
caller's request, a search of maximum radius is performed and the results are
conveyed to the caller. If a maximum radius search retrieves zero matches,
the caller is prompted whether a new search is desired.
The Business Finder Speech Object further includes the ability for the
caller to initiate a telephone call that will connect the current call to a business
establishment on the list, or provide driving directions to a business
establishment. Figure C2 depicts a conversational state diagram of this
additional functionality. As the search results are being read to the caller
(SO_ListResults C6), the caller can navigate the list by uttering an appropriate
grammar such as "next" or "previous". The caller may also select a business
on the list by uttering an appropriate speech grammar such as "that one" or
"more information". Once a caller selects a business on the list, the caller may
choose either to receive driving directions or to place a telephone call to the
business selected. The Business Finder Speech Object audibly confirms the
caller's choice and prompts the caller to utter the caller's desired action
(SO_OneLocation C7). If the caller utters a "connect me" or similar speech
grammar, the Telephony Network Server initiates a telephone call to the
telephone number associated with the business selected by the caller. Else, if
the caller utters a speech grammar associated with "directions", the Business
Finder Speech Object will access the Driving Directions Speech Object if the
full address of the business selected by the caller was available in the
Service-Database.
Finally, the Business Finder Speech Object permits the caller to
perform additional searches C9, search for similar type businesses C10 (i.e.
within a same business type category), or find the nearest business C11 on
the list by uttering appropriate speech grammars (e.g. "new search", "find
similar", or "find nearest", respectively).
Stock Information Program Module Conversational State Diagram
Figure D1 depicts a Stock Information Program Module conversational
state diagram ("Stock Information Speech Object"). The caller can select the
Stock Information Program Module from the above Main Menu program Speech Object by uttering a speech grammar that evokes the Stock
Information Speech Object.
Upon the caller's utterance of a speech grammar indicating that stock
information is desired (e.g. "Stock"), the Stock Information Speech Object
conveys an audible confirmation to the caller (SO_AssumePortfolio D1).
Further, if the caller has established a stock portfolio in the private caller
profile, a speech object audibly alerts the caller that the Stock Information
Speech Object will assume what stocks are of particular interest to the caller
based upon the private caller profile. The Stock Information Speech Object
then interfaces with the API of the Stock Information database and retrieves
the most currently available data and reads it to the caller in a List Speech
Object.
The caller may opt out of the assumption by uttering a speech
grammar that indicates the caller's desire to do so (e.g. the caller names a
particular stock). If the caller's assumed portfolio is empty, or if the caller
cancels the presumption, the Stock Information Speech Object prompts the
caller to utter a stock information indicator (e.g. company name, ticker symbol,
or market index name). Upon receipt of the caller's uttered stock information
indicator, the Stock Information Speech Object performs a search of the stock
information database and reads the stock information to the caller.
Moreover, the Stock Information Speech Object permits the caller to
customize preferences regarding how the information is to be conveyed to the
caller. For example, the caller may wish to receive detailed information about stocks or abbreviated information. Thus, the Stock Information Speech
Object recognizes both contextually global - non temporal utterances (e.g.
"long quotes") that will globally affect the extent of information to be conveyed
about all stocks of interest, and item specific temporal utterances (e.g. "more
information", or "more details") that affect the extent of information to be
conveyed only about the specific stock that was just conveyed to the caller.
Item specific temporal utterances are characterized by a finite temporal
duration during which a caller's utterance is interpreted as an utterance only
for a specific item in the list.
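The distinction between global non-temporal utterances and item specific temporal utterances can be illustrated as follows. This is a sketch under stated assumptions: the class name, the three-second window, and the "short quotes" utterance are hypothetical and are not given in the description.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# "long quotes" is from the description; "short quotes" is a hypothetical opposite.
GLOBAL_UTTERANCES = {"long quotes": True, "short quotes": False}
ITEM_UTTERANCES = {"more information", "more details"}


@dataclass
class QuoteReader:
    detailed: bool = False        # global preference applied to every stock
    window_seconds: float = 3.0   # assumed length of the item specific window

    def handle(
        self, utterance: str, seconds_since_item: float, current_item: str
    ) -> Tuple[str, Optional[str]]:
        """Classify an utterance heard while the list is being read."""
        if utterance in GLOBAL_UTTERANCES:
            # Global utterances are valid at any time and change all later output.
            self.detailed = GLOBAL_UTTERANCES[utterance]
            return ("global", None)
        if utterance in ITEM_UTTERANCES and seconds_since_item <= self.window_seconds:
            # Item specific utterances apply only to the item just conveyed.
            return ("item", current_item)
        return ("ignored", None)
```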
Figure D2 depicts a conversational state diagram reflecting additional
functionality of the Stock Information Speech Object including adding and
removing a stock to the caller private profile and effecting the previously
described global - non temporal speech grammars and item specific temporal
speech grammars. If the caller utters a speech grammar to hear more about
a particular stock while it is being read to the caller (e.g. "long quotes" or
"details"), the Stock Sub-Module performs a search of the stock information
database for more information, and subsequently transitions back to the Stock
Information Speech Object and conveys an appropriate audible prompt to the
caller depending upon the information retrieved by the performed search.
Figure D3 depicts an example of a conversational state diagram for
conveying stock information list items to the caller (SOReadData D2). The
List Speech Object automatically sequentially relays the stock information to
the caller stopping only to interpret caller utterances and make modifications to the caller's preferences in accordance with previously described
capabilities. Process flow control returns to the Stock Information Speech
Object upon completion of the list of stock information.
Weather Conditions Program Module Conversational State Diagram
Figure E1 depicts the Weather Conditions Speech Object ("Weather
Speech Object"). Upon receipt of a caller's utterance meeting an acceptable
speech grammar for the Weather Speech Object, the caller is greeted.
Moreover, the Weather Speech Object infers a city for the caller based upon
the caller private profile or caller's telephone number and conveys the
inference to the caller (WeatherAssumeCity E1). Unless the caller utters a
context specific cancellation speech grammar (e.g. "new city") within a finite
time interval, the Weather Speech Object retrieves the weather information for
the city. Else the Weather Speech Object prompts the caller to enter another
city for which the Weather Conditions Speech Object will gather information
(WeatherGetCity E2) and convey the most relevant weather information to the
caller. Figure E2 depicts a List Speech Object for conveying the weather
information to the caller in a list format and further checks the caller private
profile to detect the preferred manner of receiving the weather information E3
(i.e. extended or abbreviated forecasts).
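A minimal sketch of the city inference and the timed cancellation is given below. The precedence of the profile over the telephone number, the two-second window, and all names are assumptions made for the example; the description states only that either source may be used and that the interval is finite.

```python
from typing import Callable, Dict, Optional


def infer_city(profile: Dict[str, str], phone_city: Optional[str]) -> Optional[str]:
    """Prefer the city in the caller private profile, else the phone-derived city."""
    return profile.get("city") or phone_city


def weather_city(
    profile: Dict[str, str],
    phone_city: Optional[str],
    utterance: Optional[str],
    seconds_elapsed: float,
    ask_city: Callable[[], str],
    window_seconds: float = 2.0,  # assumed length of the cancellation interval
) -> str:
    """Return the city to report on, prompting only when the inference is rejected."""
    presumed = infer_city(profile, phone_city)
    cancelled = utterance == "new city" and seconds_elapsed <= window_seconds
    if presumed is None or cancelled:
        return ask_city()  # stands in for the prompt for another city
    return presumed
```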
Address Locating Program Module Conversational State Diagram
Figure F1 depicts the conversation state diagram of the Address
Locating Program Module ("Address Speech Object"). The Address Speech
Object is ordinarily entered by a transition from another Speech Object that needs to
locate a specific address to perform a function that requires knowing a
specific street address (e.g. driving directions). Upon a transition to the
Address Speech Object, the caller is prompted to utter a particular city and
state of interest (SO_CityState F1) or to utter a Landmark.
Landmarks are preassigned speech grammars that can be either global
or particular to each caller and stored in each caller's private profile. "Airport"
is a special global grammar landmark that evokes an Airport Finder Speech
Object. The Airport Finder Speech Object searches the caller private profile
for a preferred airport and confirms this preference with the caller, who
may opt otherwise and engage in a dialog to pick an alternate airport.
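Landmark resolution can be sketched as a lookup that checks the global landmark first and then the caller's own landmarks. The profile layout, the result labels, and the function name are hypothetical and serve only to illustrate the two kinds of landmark.

```python
from typing import Dict, Optional, Tuple

GLOBAL_LANDMARKS = {"airport"}  # the one global landmark named in the description


def resolve_landmark(utterance: str, profile: Dict) -> Tuple[str, Optional[str]]:
    """Map an uttered landmark to the next step and any stored value."""
    key = utterance.strip().lower()
    if key in GLOBAL_LANDMARKS:
        # Hand off to the airport dialog with the caller's preferred airport, if any.
        return ("airport_finder", profile.get("preferred_airport"))
    address = profile.get("landmarks", {}).get(key)
    if address is not None:
        # A caller specific landmark resolves directly to a stored address.
        return ("address", address)
    return ("unknown", None)
```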
If the caller utters another landmark that is particular to the caller
private profile, the Address Module will access the address associated with
the Landmark and return to the original program module from where the
transition came. If however, there is more than one address that meets the
caller's uttered Landmark (e.g. "airport"), the Address Module will transition to
an airport city disambiguation list dialog (SO_AirportCity F2) to identify the
caller's desired airport and subsequently confirm and convey the desired
information to the caller. The Address Speech Object will loop back and
reengage the caller in the airport city list dialog if the city and state is not
supported by the database or if the caller conveys that the choice of airports is
incorrect.
Upon uttering a city and state, the caller is engaged in dialogs to name
and confirm a desired street (SO_StreetName F3), name and confirm a street
number (SO_StreetNumber F4), or alternatively, if the street number is not
known, a cross street (SO_CrossStreet F5). When the Address Module has
received the caller's desired address, the Address Module prompts the caller
to confirm the address or, in appropriate circumstances will engage the caller
in a disambiguation and confirmation Speech Object
(SO_AddressDisambiguationAndConfirm F6) to resolve the ambiguity.
Upon obtaining the caller's final address, the Address Disambiguation
Speech Object engages the caller in speech objects that enable the caller to
change only a subset of information conveyed to the Address Disambiguation
Speech Object or alternatively, to begin searching from scratch. For example,
the caller may change the street name, or the cross street name.
Flight Information Program Module Conversational State Diagram
Figure G1 depicts the conversation state diagram of the Flight
Finder Program Module ("Flight Finder Speech Object"). Upon a transition to
the Flight Information Speech Object, the caller is greeted and prompted to
utter whether the caller wants arrival or departure information (SO Arrival
Departure G1) upon which the Flight Finder Speech Object transitions to the
Flight Information Program Module ("Flight Information Speech Object").
Figure G2 depicts the conversation state diagram of the Flight
Information Program Module ("Flight Information Speech Object"). Upon a
transition to the Flight Information Speech Object, the caller is prompted to
utter an airline and flight number (SOAirlineFlight G2), to utter an airline or
flight number, or to utter neither the airline nor the flight number.
Alternate speech objects are transitioned to depending upon the extent
of information known by the caller and conveyed to the Flight Information
Speech Object. If the caller utters either only the airline or flight number, the
Flight Information Speech Object transitions to speech objects that prompt the
caller to utter either the flight number (SOFlightNumber G3) or the airline
(SOAirline G4), respectively. The Flight Information Speech Object then
confirms the caller's airline and flight number (SOFlightInfoCandC G5) and
assumes that flight information is desired on the day of the call, but optionally
allows the caller to check flight information for another date (SOFlightDate
G6). Upon confirmation of the caller's desired flight information the Flight
Information Speech Object interfaces with the API of the flight information
Service Database to retrieve the caller's desired flight information.
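The way the dialog fills in whichever of the airline and flight number is missing, and then defaults the date to the day of the call, can be sketched as below. The step labels and function names are hypothetical; the itinerary branch corresponds to the case in which the caller knows neither value.

```python
from datetime import date
from typing import Dict, Optional


def next_flight_step(airline: Optional[str], flight_number: Optional[str]) -> str:
    """Pick the next dialog step from what the caller has supplied so far."""
    if airline and flight_number:
        return "confirm"              # confirm airline and flight number
    if airline:
        return "ask_flight_number"    # only the airline is known
    if flight_number:
        return "ask_airline"          # only the flight number is known
    return "itinerary"                # neither is known


def build_flight_query(
    airline: str, flight_number: str, today: date, other_date: Optional[date] = None
) -> Dict[str, object]:
    """Assume the day of the call unless the caller asks for another date."""
    return {"airline": airline, "flight": flight_number, "date": other_date or today}
```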
The Flight Information Speech Object will transition to alternate speech
objects depending upon the flight information retrieved from the Service-
Database. No available information results in a prompt to perform another
search (tSearchAnotherFlight G7). The existence of multiple flight legs
invokes a Speech Object that allows the caller to optionally choose a specific
leg of the flight (SOChooseLeg G8) for which to hear information. Otherwise,
the Flight Information Speech Object will convey the flight status information
to the caller (SOFlightStatus G9).
The caller may also pick a flight without any specific information about
an airline or flight number. Figure G3 depicts the Itinerary Speech Object
which is invoked if the caller does not know the airline or flight number when
conversing with the Flight Information Speech Object. The Itinerary Speech
Object includes speech objects that allow the caller to choose a flight
regardless of whether an airline is known. For example, if the caller has not entered an
airline in the Flight Information Speech Object, the Itinerary Speech Object will
engage the caller in a dialog to determine an airline (e.g. tCheckAirline G10,
soAirline G11). If the caller does not know the airline, the Itinerary Speech
Object will engage the caller in a speech object to determine the airline if it is
supported by the Service-Database 60. If not, the caller is prompted for
another airline (tAnotherAirline G12).
The Itinerary Speech Object engages the caller in speech objects to
determine the departure city (SOAirportCity(departure) G13) and arrival city
(SOAirportCity(arrival) G14) and then in a flight time Speech Object
(soFlightTime G15). The Itinerary Speech Object next checks with an
Itinerary check and confirm speech object (SO ItineraryCandC G16), gets a
list of flights meeting the caller's requirements from the flight information
Service-Database 60, and conveys it to the caller (SOReadRoutes G17).
Driving Directions Program Module Speech Object
The Driving Directions Speech Object determines point-to-point driving
directions given two addresses. See Figure H. The Driving Directions
Speech Object can be evoked either as a stand-alone program module Speech
Object or from another program module, such as the Business Finder Speech
Object. If the Speech Object is evoked from another program module such as
Business Finder Speech Object, known addresses are passed from the first
program module to the Driving Directions Speech Object. Otherwise, the
Driving Directions Speech Object contains speech objects to determine either
the caller's source or destination address (SOSourceAddress H1 and
SODesitinationAddress H2, respectively).
The Driving Directions Speech Object interfaces with the API of an
independent service-database to retrieve point-to-point driving directions.
Upon retrieval of the driving directions, the Driving Directions Speech Object
formats the driving directions into a list that is conveyed to the caller
(SOReadDirections H3). The caller can navigate the list by uttering
appropriate speech grammars (e.g. "next", "previous", "start over", "stop",
"pause"). Alternatively, the caller may also receive the driving directions by
email or facsimile (SODrectionDeliveryMethod H4). Moreover, if a portion of
the caller's driving directions includes a particularly long stretch of road, the
Driving Directions Speech Object dynamically creates a prompt to query
whether the caller wants to hear directions from the first step or starting after
that road (SOStartFrom H5).
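List navigation and the long-stretch prompt can be illustrated with the sketch below. The class and function names and the 20-mile threshold are assumptions for the example; the description does not say how long a stretch must be to trigger the prompt.

```python
from typing import List, Optional


class DirectionsList:
    """Steps of a route that the caller can move through by voice command."""

    def __init__(self, steps: List[str]) -> None:
        self.steps = list(steps)
        self.index = 0

    def handle(self, utterance: str) -> str:
        """Apply a navigation command and return the step to read next."""
        if utterance == "next":
            self.index = min(self.index + 1, len(self.steps) - 1)
        elif utterance == "previous":
            self.index = max(self.index - 1, 0)
        elif utterance == "start over":
            self.index = 0
        # Any other utterance leaves the position unchanged in this sketch.
        return self.steps[self.index]


def first_step_after_long_stretch(
    miles_per_step: List[float], threshold_miles: float = 20.0
) -> Optional[int]:
    """Index of the step after the first long stretch, or None if no prompt is needed."""
    for i, miles in enumerate(miles_per_step):
        if miles >= threshold_miles and i + 1 < len(miles_per_step):
            return i + 1
    return None
```

When `first_step_after_long_stretch` returns an index, the dialog would offer the caller a choice between starting at step zero and starting at that index.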
For logged in callers, the Driving Directions Speech Object has the
added capability of storing a set of driving directions in the caller private
profile after they have been determined, and further creating a prompt that
evokes the saved directions. Thus a caller may use the Driving Directions Speech Object to determine directions to a particular location, save the driving
directions to the caller private profile, and disconnect the telephone call. The
caller is prompted whether they would like to resume using the saved driving
directions when they call again and login (SOResume H6).
The preferred embodiment of the invention is described above in the
Drawings and Description of Preferred Embodiments. While these
descriptions directly describe the above embodiments, it is understood that
those skilled in the art may conceive modifications and/or variations to the
specific embodiments shown and described herein. Any such modifications
or variations that fall within the purview of this description are intended to be
included therein as well. Unless specifically noted, it is the intention of the
inventor that the words and phrases in the specification and claims be given
the ordinary and accustomed meanings to those of ordinary skill in the
applicable art(s). The foregoing description of a preferred embodiment and
best mode of the invention known to the applicant at the time of filing the
application has been presented and is intended for the purposes of illustration
and description. It is not intended to be exhaustive or to limit the invention to
the precise form disclosed, and many modifications and variations are
possible in the light of the above teachings. The embodiment was chosen
and described in order to best explain the principles of the invention and its
practical application and to enable others skilled in the art to best utilize the
invention in various embodiments and with various modifications as are suited to the particular use contemplated.

Claims

1. In a computer system, a method of retrieving and conveying information requested by a caller, comprising:
initiating a vocal user interface application session upon receipt of a telephone call from the caller;
within a first vocal user interface program module, prompting the caller to enter a first vocal expression that describes the caller's desired information;
prompting the caller to enter a second vocal expression that more narrowly describes, relative to the vocal expression in the first vocal user interface program module, the caller's desired information,
accessing a service-database that contains information regarding the caller's desired information,
searching the service-database for a database sample that most closely satisfies the service-search expression,
retrieving the caller's desired information from the service-database,
formatting the caller's desired information into a vocal output, and
conveying the vocal output of the caller's desired information to the caller.
2. The method in claim 1 further comprising the steps of,
recognizing the first vocal expression as a command to transition to a second vocal user interface program module;
transitioning to a second vocal user interface program module, and performing the steps of,
prompting the caller to enter a second vocal expression that more narrowly describes, relative to the vocal expression in the first vocal user interface program module, the caller's desired information,
accessing a service-database that contains information regarding the caller's desired information,
searching the service-database for a database sample that most closely satisfies the service-search expression,
retrieving the caller's desired information from the service-database,
formatting the caller's desired information into a vocal output, and
conveying the vocal output of the caller's desired information to the caller,
within the second vocal user interface program module.
3. The method in claim 1 further comprising the step of,
inferring at least a portion of the caller's desired information based upon information delivered during a telephone call.
4. The method in claim 3 wherein the information delivered during a telephone call comprises,
at least a portion of a telephone number.
5. The method in claim 1 further comprising:
aborting the processing of any step in the method upon receipt of a vocal command to do so by the caller.
6. The method in claim 3 further comprising,
checking a caller private profile that is identifiable with the caller by an account number.
7. The method in claim 6 further comprising,
inferring the caller account number based upon information delivered during the telephone call.
8. The method in claim 7 further comprising,
storing a portion of the caller's desired information in the caller private profile, and
inferring at least a portion of the caller's desired information based upon the stored portion of the caller's desired information.
9. The method in claim 1 wherein the service-database that contains information regarding the caller's desired information is selected from a group of service-databases consisting of;
a traffic condition service-database, a stock information service-database, a business finder service-database, a weather condition service-database, a flight information service-database, or a driving directions service-database.
10. The method in claim 9 wherein if the traffic condition service-database is selected, the step of accessing the service-database further comprises:
prompting the caller to vocally enter a city and state;
searching the number of traffic related incidents that pertain to the city and state;
prompting the caller to enter a road by name;
searching the location of traffic related incidents that pertain to the road, and if there is traffic related information regarding the incidents on the road;
conveying the traffic related information regarding the incidents to the caller.
11. The method in claim 10 further comprising,
prompting the caller to vocally select a particular traffic related incident about which to receive more detail, and
conveying more detail to the caller upon receipt of the caller's vocal selection.
12. The method in claim 9 wherein if the traffic condition service-database is selected, the step of accessing the service-database further comprises:
inferring a city and state of interest for the caller based upon at least a portion of the caller's telephone number;
searching the number of traffic related incidents that pertain to the city and state;
prompting the caller to enter a road by name;
searching the location of traffic related incidents that pertain to the road, and if there is traffic related information regarding the incidents on the road;
conveying the traffic related information regarding the incidents to the caller.
13. The method in claim 12 further comprising,
prompting the caller to vocally select a particular traffic related incident about which to receive more detail, and
conveying more detail to the caller upon receipt of the caller's vocal selection.
14. The method in claim 9 wherein if the stock information service-database is selected, the step of accessing the service-database further comprises:
prompting the caller to enter an investment indicator;
searching the independent service-database for the most recent value associated with the investment indicator,
conveying the most recent value to the caller.
15. The method in claim 14 further comprising,
prompting the caller to vocally select an investment indicator about which to receive more detail, and
conveying more detail to the caller upon receipt of the caller's vocal selection.
16. The method in claim 9 wherein if the stock information service-database is selected, the step of accessing the service-database further comprises:
retrieving an investment indicator from a private caller profile;
searching the service-database for the most recent value associated with the investment indicator,
conveying the most recent value to the caller.
17. The method in claim 14 further comprising the step of,
adding or removing an investment indicator from the private caller profile upon receipt of a vocal command to do so by the caller.
18. The method in claim 14 further comprising the step of,
providing more or less information about a particular investment indicator upon receipt of a vocal command to do so by the caller.
19. The method in claim 9 wherein if the weather condition service-database is selected, the step of accessing a service-database further comprises;
inferring a city and state for which weather information is desired based upon information conveyed by the caller,
searching the service-database to find current weather conditions for the city and state;
conveying the current weather conditions to the caller.
20. The method in claim 19 wherein the information conveyed by the caller is selected from a group of information items consisting of; a portion of the caller's telephone number, or a city and state stored in a caller private profile.
21. The method in claim 9 wherein if the weather condition service-database is selected, the step of accessing a service-database further comprises;
prompting the caller to vocally enter a city and state for which weather information is desired,
searching the service-database to find current weather conditions for the city and state;
conveying the current weather conditions to the caller.
22. The method in claim 21 wherein the step of conveying further comprises;
prompting the caller to vocally enter a request for extended forecast information, and if it is available,
conveying the extended forecast information to the caller.
23. The method in claim 9 wherein if the business finder service-database is selected, the step of accessing a service-database further comprises;
prompting the caller to vocally enter a city and state,
prompting the caller to vocally enter a business search request,
searching the service-database to find a business address satisfying the business search request,
conveying the business address to the caller.
24. The method in claim 23 further comprising the step of,
prompting whether the caller desires to hear the business address, and if the caller answers affirmatively,
prompting the caller to enter a reference address from which to determine driving directions,
determining driving directions between the reference address and the business address, and
conveying the driving directions to the caller.
25. The method in claim 9 wherein if the business finder service-database is selected, the step of accessing a service-database further comprises;
inferring a city and state based upon information delivered during the telephone call,
prompting the caller to vocally enter a business search request,
searching the service-database to find a business address satisfying the business search request,
conveying the business address to the caller.
26. The method in claim 25 wherein the information delivered during the telephone call is selected from a group of information items consisting of; a portion of the caller's telephone number, or a city and state stored in a caller private profile.
27. The method in claim 26 further comprising the step of,
conveying to the caller the number of businesses found and
prompting whether the caller desires to hear the business addresses
closest to a reference address, and if the caller answers affirmatively,
obtaining a reference address from the caller,
searching the independent service-database to find relevant business
addresses,
conveying the business addresses to the caller.
28. The method in claim 9 wherein if the flight information service-database is selected, the step of accessing a service-database further comprises;
prompting the caller to vocally enter a flight information request,
searching the independent flight information service-database,
retrieving information that most closely matches the caller's flight information request,
conveying the information to the caller.
29. The method in claim 28 wherein
the flight information request comprises arrival or departure information for unknown airlines, and the method further comprises,
searching the independent service database for information items meeting the request,
retrieving information items meeting the request,
conveying the information items to the caller in a list.
30. The method in claim 28 wherein
the flight information request comprises arrival or departure information regarding a particular flight that has multiple legs, and the method further comprises,
prompting if the caller wants to hear information regarding all the legs or regarding a particular leg.
31. The method in claim 9 wherein,
the caller selected the driving directions service-database, and the method further comprises,
prompting the caller to enter at least one address,
searching the driving directions service-database,
conveying the driving directions to the caller.
32. The method in claim 31 further comprising,
pausing the step of conveying the driving directions to the caller upon a vocal command from the caller.
33. The method in claim 31 further comprising,
saving the list of driving directions in a caller private profile,
resuming the step of conveying the driving directions to the caller.
34. The method in claim 33 wherein,
the step of resuming is performed in a subsequent vocal user interface application session.
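Read together, claims 32 through 34 describe pausing playback, saving the directions in the caller's private profile, and resuming later, possibly in a new session. The following is a minimal sketch under stated assumptions: the class names `CallerProfile` and `DirectionsPlayback`, their fields, and the idea of storing a step index alongside the list are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CallerProfile:
    """Hypothetical private profile holding saved directions."""
    saved_directions: List[str] = field(default_factory=list)
    next_step: int = 0  # index of the first step not yet conveyed


class DirectionsPlayback:
    """Conveys direction steps one at a time and supports pause/resume."""

    def __init__(self, steps: List[str], profile: CallerProfile, start: int = 0):
        self.steps = list(steps)
        self.profile = profile
        self.position = start
        self.paused = False

    def speak_next(self) -> Optional[str]:
        """Return the next step to convey, or None if paused or finished."""
        if self.paused or self.position >= len(self.steps):
            return None
        step = self.steps[self.position]
        self.position += 1
        return step

    def pause(self) -> None:
        """Handle a vocal 'pause' command and save state in the profile."""
        self.paused = True
        self.profile.saved_directions = list(self.steps)
        self.profile.next_step = self.position

    @classmethod
    def resume(cls, profile: CallerProfile) -> "DirectionsPlayback":
        """Start a new playback (for example in a later session) from the saved point."""
        return cls(profile.saved_directions, profile, start=profile.next_step)
```

Because the saved state lives in the profile rather than in the playback object, `DirectionsPlayback.resume(profile)` works equally well in the same call or in a subsequent one.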
35. The method in claim 31 wherein,
the step of conveying the driving directions to the caller is performed in a manner selected from the group consisting of; email, fax, WAP, or audible.
36. The method in claim 31 further comprising,
prompting whether the caller wants to hear all of the directions on a route from the start or from another point in the route.
37. An address finder vocal user interface for use in a computer system comprising;
a first caller address search software dialog that prompts for and accepts search requests from the group consisting of; an address within a city and state, a landmark address non specific to callers, a landmark address specific to a caller,
a software interface that accesses a service-database to search for and retrieve results for the search request, and
text-to-speech software that translates the results to speech.
38. The address finder vocal user interface in claim 37 wherein,
the address search request is an address within a city and state, and further comprising,
a second caller address search software dialog that prompts for and accepts search requests from the group consisting of; a street number or a cross-street name.
39. The address finder vocal user interface in claim 37 wherein,
the address search request is a landmark address specific to a caller, and further comprising,
software that accesses and retrieves an address stored in a private caller profile.
40. A business finder vocal user interface for use in a computer system comprising;
software code that infers a city and state for a caller's desired business location based upon information delivered during a telephone call,
a business finder software dialog that prompts the caller to vocally enter a business search request,
a software module that accesses a service-database to search for and retrieve results for the business search request, and
text-to-speech software that conveys the results to a caller.
41. The business finder vocal user interface in claim 40 wherein,
the information delivered during a telephone call is at least a portion of a caller telephone number.
42. The business finder vocal user interface in claim 40 wherein,
in a subsequent caller interaction with the vocal user interface, the software code that infers a city and state for a caller's desired business location based upon information delivered during a telephone call, infers based on a previous caller interaction with the vocal user interface.
43. The business finder vocal user interface in claim 40 wherein,
acceptable caller vocal entries are selected from the group consisting of; a particular business name, and a business type.
44. The business finder vocal user interface in claim 40 further comprising,
a telephony software dialog that accepts a caller's vocal command to initiate a telephone call to contact the business.
45. The business finder vocal user interface in claim 40 further comprising,
a driving directions software dialog that accepts a caller's source address and computes driving directions to the retrieved results of the business search request.
46. A driving directions vocal user interface in a computer system comprising,
an address software routine that accepts source and destination addresses during a telephone call,
a software interface that accesses a service-database to search for and retrieve driving directions between the source and destination addresses, and
text-to-speech software that translates the retrieved driving directions to speech for conveyance to a caller.
47. The driving directions vocal user interface in claim 46 wherein the address software dialog further comprises,
a second software dialog that prompts for and accepts a caller's request that the directions be given from a point other than from the source address.
48. The driving directions vocal user interface in claim 46 wherein,
the address software routine further comprises a software dialog that accepts verbally entered addresses from the caller.
49. The driving directions vocal user interface in claim 46 wherein,
the address software routine accepts addresses from an independent software program.
50. The driving directions vocal user interface in claim 46 further comprising,
software means for storing the driving directions, and
software means for conveying the driving directions to the caller in a subsequent interaction with the driving directions vocal user interface.
51. The driving directions vocal user interface in claim 49 wherein,
the independent software program is selected from the group consisting of; a business finder program module, or an address finder program module.
52. In a vocal user interface for use on a computer system, a method of interfacing with a caller, comprising:
querying the caller with a prompt intended to evoke either an affirmative response or a negative response from the caller, and
interpreting any response other than an affirmative response or a negative response as a negative response to the query, and
utilizing the caller response in a subsequent query.
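The interpretation step of claim 52 amounts to a three-way classification with a fixed fallback. A minimal sketch follows; the word lists and the function name `interpret_yes_no` are assumptions for illustration, and a deployed system would work from speech recognizer output rather than plain strings.

```python
# Hypothetical vocabularies; a real grammar would be far larger.
AFFIRMATIVE = {"yes", "yeah", "yep", "sure", "ok", "okay"}
NEGATIVE = {"no", "nope", "nah"}


def interpret_yes_no(utterance: str, default: bool = False) -> bool:
    """Map a caller utterance to True (affirmative) or False (negative).

    Any utterance that is neither recognizably affirmative nor negative
    is treated as `default`. With default=False, this follows the
    "treat anything else as negative" rule of claim 52.
    """
    # Normalize case, surrounding whitespace, and trailing punctuation.
    word = utterance.strip().lower().rstrip(".!?")
    if word in AFFIRMATIVE:
        return True
    if word in NEGATIVE:
        return False
    return default
```

Passing `default=True` gives the opposite convention, which is the variant recited in claim 61. The returned value can then drive the wording of the subsequent query.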
53. The method in claim 14 wherein the investment indicator comprises,
an investment indicator selected from the group consisting of; a publicly traded investment vehicle ticker symbol, the name of the business entity issuing the publicly traded investment vehicle, a market indicator ticker symbol, or a market indicator name.
54. The method in claim 21 wherein the step of conveying further comprises;
prompting the caller to vocally enter a request for extended forecast information, and if it is available,
conveying the extended forecast information to the caller.
55. The method in claim 23 wherein,
acceptable caller vocal entries are selected from the group consisting of; a particular business name, and a business type.
56. The method in claim 23 further comprising the step of,
conveying to the caller the number of businesses found and
prompting whether the caller desires to hear the business addresses closest to a reference address, and if the caller answers affirmatively,
obtaining a reference address from the caller,
searching the independent service-database to find relevant business addresses,
conveying the business addresses to the caller.
57. The method in claim 23 further comprising the step of,
conveying that the caller can request to hear directions to a business address and if the caller requests to hear directions,
prompting the caller to enter a reference address from which to determine driving directions,
determining driving directions between the reference address and the business address, and
conveying the driving directions to the caller.
58. The method in claim 25 further comprising the step of,
conveying to the caller the number of businesses found and
prompting whether the caller desires to hear the business addresses closest to a reference address, and if the caller answers affirmatively,
obtaining a reference address from the caller,
searching the independent service-database to find relevant business addresses,
conveying the business addresses to the caller.
59. The method in claim 25 further comprising the step of,
conveying that the caller can request to hear directions to a business address and if the caller requests to hear directions,
prompting the caller to enter a reference address from which to determine driving directions,
determining driving directions between the reference address and the business address, and
conveying the driving directions to the caller.
60. The method in claim 27 further comprising the step of,
conveying that the caller can request to hear directions to a business address and if the caller requests to hear directions,
prompting the caller to enter a reference address from which to determine driving directions,
determining driving directions between the reference address and the business address, and
conveying the driving directions to the caller.
61. In a vocal user interface for use on a computer system, a method of interfacing with a caller, comprising:
querying the caller with a prompt intended to evoke either an affirmative response or a negative response from the caller, and
interpreting any response other than an affirmative response or a negative response as an affirmative response to the query, and
utilizing the caller response in a subsequent query to the caller.
62. In a vocal user interface implemented on a computer, a method of interfacing with a caller, comprising:
making an educated inferential decision regarding what will be the caller's most probable utterances in a conversation state where caller-specific information is accessible,
performing a subsequent action within the vocal user interface based on the educated inferential decision, and
canceling the subsequent action upon a caller's corrective vocalization.
PCT/US2000/026935 1999-10-01 2000-09-28 Vocal interface system and method WO2001026350A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP00965546A EP1224797A1 (en) 1999-10-01 2000-09-28 Vocal interface system and method
AU76248/00A AU7624800A (en) 1999-10-01 2000-09-28 Vocal interface system and method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US15696899P 1999-10-01 1999-10-01
US60/156,968 1999-10-01

Publications (1)

Publication Number Publication Date
WO2001026350A1 (en) 2001-04-12

Family

ID=22561826

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2000/026935 WO2001026350A1 (en) 1999-10-01 2000-09-28 Vocal interface system and method

Country Status (3)

Country Link
EP (1) EP1224797A1 (en)
AU (1) AU7624800A (en)
WO (1) WO2001026350A1 (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0175503A1 (en) * 1984-09-06 1986-03-26 BRITISH TELECOMMUNICATIONS public limited company Method and apparatus for use in interactive dialogue
US5774860A (en) * 1994-06-27 1998-06-30 U S West Technologies, Inc. Adaptive knowledge base of complex information through interactive voice dialogue
WO1998024225A1 (en) * 1996-11-28 1998-06-04 British Telecommunications Public Limited Company Interactive apparatus
EP0922279A2 (en) * 1997-01-09 1999-06-16 Scansoft, Inc. Method and apparatus for executing a human-machine dialogue in the form of two-sided speech as based on a modular dialogue structure
US6035275A (en) * 1997-01-09 2000-03-07 U.S. Philips Corporation Method and apparatus for executing a human-machine dialogue in the form of two-sided speech as based on a modular dialogue structure
EP0895396A2 (en) * 1997-07-03 1999-02-03 Texas Instruments Incorporated Spoken dialogue system for information access
WO2000064137A1 (en) * 1999-04-21 2000-10-26 Ranjeet Nabha Method and system for the provision of internet-based information in audible form

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1253569A2 (en) * 2001-04-23 2002-10-30 Hitachi, Ltd. Audio interactive navigation system, mobile terminal device, and audio interactive server
EP1253569A3 (en) * 2001-04-23 2004-03-31 Hitachi, Ltd. Audio interactive navigation system, mobile terminal device, and audio interactive server
US7076362B2 (en) 2001-04-23 2006-07-11 Hitachi, Ltd. Audio interactive navigation system, moving terminal device, and audio interactive server
FR2827695A1 (en) * 2001-07-23 2003-01-24 France Telecom Telecommunication services portal with server using speech recognition and associated navigation services, uses telephone link and GPS data as input to server which delivers navigation information taking account of traffic information
WO2003010494A1 (en) * 2001-07-23 2003-02-06 France Telecom Telecommunication service portal comprising a voice recognition server and navigation and guidance equipment using said portal
WO2017132660A1 (en) * 2016-01-29 2017-08-03 Liquid Analytics, Inc. Systems and methods for dynamic prediction of workflows
US10339481B2 (en) 2016-01-29 2019-07-02 Liquid Analytics, Inc. Systems and methods for generating user interface-based service workflows utilizing voice data

Also Published As

Publication number Publication date
AU7624800A (en) 2001-05-10
EP1224797A1 (en) 2002-07-24

Similar Documents

Publication Publication Date Title
US20210201932A1 (en) Method of and system for real time feedback in an incremental speech input interface
US7627096B2 (en) System and method for independently recognizing and selecting actions and objects in a speech recognition system
US7450698B2 (en) System and method of utilizing a hybrid semantic model for speech recognition
US6708150B1 (en) Speech recognition apparatus and speech recognition navigation apparatus
US9202247B2 (en) System and method utilizing voice search to locate a product in stores from a phone
US7376640B1 (en) Method and system for searching an information retrieval system according to user-specified location information
KR100383352B1 (en) Voice-operated service
US6246986B1 (en) User barge-in enablement in large vocabulary speech recognition systems
US6944594B2 (en) Multi-context conversational environment system and method
JP5315289B2 (en) Operating system and operating method
US20030191639A1 (en) Dynamic and adaptive selection of vocabulary and acoustic models based on a call context for speech recognition
US7783475B2 (en) Menu-based, speech actuated system with speak-ahead capability
US20030149566A1 (en) System and method for a spoken language interface to a large database of changing records
US7689425B2 (en) Quality of service call routing system using counselor and speech recognition engine and method thereof
US20020143548A1 (en) Automated database assistance via telephone
EP2289231A1 (en) A system and method utilizing voice search to locate a procuct in stores from a phone
US8428241B2 (en) Semi-supervised training of destination map for call handling applications
TWI698756B (en) System for inquiry service and method thereof
US20200193984A1 (en) Conversation guidance method of speech recognition system
WO2001026350A1 (en) Vocal interface system and method
JP2003016087A (en) Operating method for automatic sector information system and the system
KR20020077422A (en) Distributed speech recognition for internet access
JP2008216461A (en) Speech recognition, keyword extraction, and knowledge base retrieval coordinating device
JP2003223187A (en) Method of operating speech dialogue system
JP2001134285A (en) Speech recognition device

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
WWE Wipo information: entry into national phase

Ref document number: 10089753

Country of ref document: US

WWE Wipo information: entry into national phase

Ref document number: 2000965546

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 2000965546

Country of ref document: EP

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

WWW Wipo information: withdrawn in national office

Ref document number: 2000965546

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: JP