WO2001026350A1

WO2001026350A1 - Vocal interface system and method

Info

Publication number: WO2001026350A1
Application number: PCT/US2000/026935
Authority: WO
Inventors: C. Mikael Berner; Amol M. Joshi; Lisa M. Guerra; Kevin M. Stone; Steve T. Tran
Original assignee: Bevocal, Inc.
Priority date: 1999-10-01
Filing date: 2000-09-28
Publication date: 2001-04-12
Also published as: AU7624800A; EP1224797A1

Abstract

A VUI Speech Object Application comprised of program module speech objects that interface with the APIs of service-databases to retrieve a caller's desired information.

Description

VOCAL INTERFACE SYSTEM AND METHOD

Field of the Invention

The present invention relates to the field of methods for enabling a caller to

vocally access and retrieve information over a computer network.

Background

Customers have come to rely on and expect the quick availability of

information from their merchants. Accordingly, merchants have devised

methods of allowing their customers to have easy access to information about

their products and services. The retrieval of information via merchant Internet

Web pages has, for example, experienced explosive growth. Moreover, the

development of practical speech recognition hardware and software has now

made it possible to allow customers to access merchant information with a

vocal user interface (VUI).

To date, however, the development of effective or even tolerable VUIs

has lagged the development of the technology to implement VUIs. This is the

case because much of the communication between people in a typical

conversation is nonverbal. This is particularly true when one individual is

attempting to ascertain a specific item of information from the other. These

conversations, when effective, are heavily influenced by educated and context

dependent inferential answers and prompts for more information to aid in

pinpointing the particular information of interest. When these elements are

missing from the conversation, VUI implementations tend to be cumbersome

and unpleasant to use and, accordingly, frustrating to users. There is therefore a need for user-friendly VUIs that will increase the likelihood that callers

will take advantage of the services.

Summary of the Invention

The present invention comprises a VUI Speech Object Application

comprised of program module speech objects that interface with the APIs of

service-databases to retrieve a caller's desired information. Moreover,

inferential and educated decisions are made regarding a caller's desired

information to enable a more caller friendly experience with the VUI Speech

Object Application.

The novel features that are considered characteristic of the invention

are set forth with particularity in the appended claims. The invention itself,

however, both as to its structure and its operation together with the additional

objects and advantages thereof, will best be understood from the following

description of the preferred embodiment of the present invention when read in

conjunction with the accompanying drawings. Unless specifically noted, it is

intended that the words and phrases in the specification and claims be given

the ordinary and accustomed meaning to those of ordinary skill in the

applicable art or arts. If any other meaning is intended, the specification will

specifically state that a special meaning is being applied to a word or phrase.

Likewise, the use of the words "function" or "means" in the Description of

Preferred Embodiments is not intended to indicate a desire to invoke the

special provision of 35 U.S.C. §112, paragraph 6 to define the invention. To

the contrary, if the provisions of 35 U.S.C. §112, paragraph 6, are sought to be invoked to define the invention(s), the claims will specifically state the

phrases "means for" or "step for" and a function, without also reciting in such

phrases any structure, material, or act in support of the function. Even when

the claims recite a "means for" or "step for" performing a function, if they also

recite any structure, material or acts in support of that means or step, then the

intention is not to invoke the provisions of 35 U.S.C. §112, paragraph 6.

Moreover, even if the provisions of 35 U.S.C. §112, paragraph 6, are invoked

to define the inventions, it is intended that the inventions not be limited only to

the specific structure, material or acts that are described in the preferred

embodiments, but in addition, include any and all structures, materials or acts

that perform the claimed function, along with any and all known or later-

developed equivalent structures, materials or acts for performing the claimed

function.

Brief Description of the Drawings

Fig. A1 Depicts a conversational state diagram of an embodiment of a

Main Menu Speech Object.

Fig. A2 Depicts a conversational state diagram of an embodiment of a

Login Speech Object.

Fig. A3 Depicts a conversational state diagram of an embodiment of a

New Account Speech Object.

Fig. A4 Depicts a conversational state diagram of an embodiment of a

Passcode Speech Object.

Fig. B1 Depicts a conversational state diagram of an embodiment of a

Traffic Condition Speech Object.

Fig. B2 Depicts a conversational state diagram of an alternate embodiment of a Traffic Condition Speech Object.

Fig. C1 Depicts a conversational state diagram of an embodiment of a

Business Finder Speech Object.

Fig. C2 Depicts a conversational state diagram of an embodiment of extended functionality of the Business Finder Speech Object.

Fig. D1 Depicts a conversational state diagram of an embodiment of a Stock Information Speech Object.

Fig. D2 Depicts a conversational state diagram of an embodiment of extended functionality of a Stock Information Speech Object.

Fig. D3 Depicts a conversational state diagram of an embodiment of extended functionality of a Stock Information Speech Object.

Fig. E1 Depicts a conversational state diagram of an embodiment of a Weather Speech Object.

Fig. E2 Depicts a conversational state diagram of an embodiment of a List Speech Object for conveying weather information to the caller.

Fig. F1 Depicts a conversational state diagram of an embodiment of an Address Locating Speech Object.

Fig. F2 Depicts a conversational state diagram of an embodiment of an Address Disambiguation Speech Object.

Fig. G1 Depicts a conversational state diagram of an embodiment of a Flight Finder Speech Object.

Fig. G2 Depicts a conversational state diagram of an embodiment of a Flight Information Speech Object.

Fig. G3 Depicts a conversational state diagram of an embodiment of an Itinerary Speech Object.

Fig. H Depicts a conversational state diagram of an embodiment of a Driving Directions Speech Object.

Description of Preferred Embodiments

The preferred embodiment of the present invention is implemented in a

scalable system architecture that includes at least one of each of the following:

a Vocal User Interface (VUI) Application Server, a Telephony Server, a Speech Recognition Server, a Text-to-Speech Server, and a Media Server

coupled to an Application Program Interface (API) of an independent Service-

Database. The above mentioned components of the preferred embodiment

are coupled together in a backbone network, and accordingly, each of the

above components includes a network interface comprising hardware under

program control to enable transceiving communications between the network

components. The presently preferred backbone network comprises a TCP/IP

network. Multiples of each of the above mentioned components can be

incorporated together with a load-balancer for efficient handling of increased

demand processing requirements.

A caller connects to the Telephony Server by dialing a telephone

number associated with the Telephony Server by a Public Switched

Telephone Network (PSTN). The Telephony Server includes a Telephone

Network Interface for transceiving and managing phone calls received over a

telephone network. Figure 1 depicts the Telephone Network Interface coupled

to a Public Switched Telephone Network (PSTN) using T1 lines. The

Telephone Network Interface further comprises speech signal processing

hardware under program control for creating and outputting digitized speech-

to-data streams and analog data-to-speech streams (collectively "speech-

data-streams") that are conveyed to and from the Telephone Network

Interface.

The VUI Application Server comprises hardware under control of a VUI

Application. The VUI Application implements a vocally navigable Speech Object interface between the caller and the API of the independent Service-

Database that is responsive to recognized spoken commands ("utterances")

and further includes recorded vocal navigation prompts that are conveyed to

the caller to aid the caller's retrieval of information from the Service-

Databases. Moreover, the VUI Application further comprises distinct program

modules associated with each Service-Database that enable the employment

of module specific speech objects and software code that is responsive to the

caller's utterances ("speech grammars") that are particularly germane to the

Service-Database.

The presently preferred program modules of the VUI Application

include: a Traffic Condition Module, a Business Finder Module, a Stock

Information Module, a Driving Directions Module, a Flight Information Module,

and a Weather Conditions Module. It is contemplated that additional program

modules will be integrated with the VUI Application to service the demand for

additional vocally searchable Service-Databases.

The Media Server comprises hardware under program control to store

the recorded prompts associated with the Speech Object program modules of

the VUI Application. The Media Server conveys speech objects to the

Telephony Server according to the process flow of the VUI Application. The

Speech Recognition Server comprises hardware under program control to

interpret the caller's vocalized navigation commands for the VUI Application.

Thereafter the VUI Application translates the caller's uttered Service-content

requests into database search expressions that are passed by the VUI

Application through the API of the selected Service-Database to search for,

retrieve, and convey the retrieved information to the Text-to-Speech and

Media Server, which information is ultimately conveyed vocally to the caller

through the Telephony Server.

The presently preferred embodiment of the VUI Application is

implemented in speech objects with the program modules being implemented

in distinct program module Speech Objects that further comprise reused

component speech objects and custom component Speech Objects.

Additionally, although speech objects are mainly intended to evoke specific

recognized utterances from the caller, speech objects are also contemplated

to be useful for conveying advertisements or other public and private

information to the caller. The VUI Application and program module Speech

Objects are more fully described below in the specification and the depicted

conversational state diagrams.

VUI Application Universal Vocal Navigation

In accordance with the VUI Application, and because specific universal

grammar associated with the VUI Application Main Menu is available within

each distinct Speech Object program module, the VUI Application also

provides for Universal Navigation commands at any point in the caller's

navigation. For example, at any point after establishing a VUI Application

session, the caller may vocalize a primary specific navigable point or a

Service-Database (e.g. "Traffic Conditions Database", "Home Menu", or "Stock Information Database") by vocalizing a recognized utterance that is

enabled within the program module Speech Object.
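As a minimal sketch of this universal navigation, assume each program module keeps its speech grammar as a mapping from recognized utterances to target conversation states. The names `UNIVERSAL_GRAMMAR`, `active_grammar`, and `resolve`, the utterance strings, and the rule that module-specific entries take precedence are illustrative assumptions and are not part of the specification.

```python
# Hypothetical universal commands that remain active in every module.
UNIVERSAL_GRAMMAR = {
    "home menu": "SO_HomeMenu",
    "traffic conditions": "SO_Traffic",
    "stock information": "SO_Stocks",
}


def active_grammar(module_grammar):
    """Return the module grammar extended with the universal commands."""
    merged = dict(UNIVERSAL_GRAMMAR)
    merged.update(module_grammar)  # assumption: module entries take precedence
    return merged


def resolve(utterance, module_grammar):
    """Map a recognized utterance to a target state, or None if unrecognized."""
    return active_grammar(module_grammar).get(utterance.strip().lower())


# Example: a caller inside a weather module can still jump to the home menu.
weather_grammar = {"new city": "WeatherGetCity"}
target = resolve("Home Menu", weather_grammar)
```

Merging the universal grammar into every module, rather than checking it in a separate step, keeps a single lookup per utterance.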

Standard Treatment of Multi-Item Search Results

Because it is the intention of some searches of the Service-Databases

to return several information items, it becomes necessary to effectively

present the information to the caller in a manner that permits the caller to

select the desired item. Thus, the VUI Application presents the information

items to the caller by engaging the caller in a List Speech Object.

The List Speech Object comprises a preamble that will convey

acceptable speech grammars to navigate the list, the number of items in a

multi-item list, and an audible separator that will alert the caller that the next

item on the list will be conveyed and that the response period within which to

select the previous item has passed. Both auto-advance and mandatory

vocalized navigation modes are available methods of navigating a List

Speech Object. Further, selection of an item in the list or getting more

information about an item in a list is accomplished by appropriate recognized

speech grammars such as "that one" or "more details".
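The list dialog described above might be sketched as follows. This is an illustration only: the function `run_list`, the `<beep>` marker standing in for the audible separator, and the convention that a response of `None` means the response period elapsed in silence (auto-advance) are assumptions, not details taken from the specification.

```python
def run_list(items, responses, details=None):
    """Walk a list and collect the prompts that would be conveyed.

    responses[i] is the utterance heard during item i's response period;
    None (or a missing entry) means silence, so the list auto-advances.
    Returns (selected_item_or_None, prompts).
    """
    details = details or {}
    prompts = [
        f"There are {len(items)} items. Say 'that one' to select an item "
        "or 'more details' to hear more."
    ]
    for index, item in enumerate(items):
        prompts.append(item)
        heard = responses[index] if index < len(responses) else None
        if heard == "that one":
            return item, prompts
        if heard == "more details":
            prompts.append(details.get(item, "No further details."))
        if index < len(items) - 1:
            prompts.append("<beep>")  # audible separator before the next item
    return None, prompts
```

The separator is emitted only between items, which matches its role of signaling that the response period for the previous item has passed.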

Standard Treatment of Search Result Ambiguities

Alternatively, there are circumstances when the user's utterance may

be ambiguous to the VUI Application. Resolving the ambiguity, or

disambiguation, in accordance with the present invention comprises a method

of efficiently presenting the ambiguity to the caller and letting the caller select the desired item. To disambiguate, the program modules of the VUI

Application engage the caller in a standard disambiguation dialog to remove

the ambiguity.
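The standard disambiguation dialog of this section, including the acceptance of an utterance that names only a subset of an item, might be sketched as follows. The helper names `build_dynamic_grammar`, `prompt`, and `disambiguate` are hypothetical, and the rule used here (a single word is accepted only if it occurs in exactly one item) is one assumed way to build a dynamic grammar; the specification does not state how the subsets are chosen.

```python
def _normalize(text):
    """Lower-case, drop commas, and collapse whitespace."""
    return " ".join(text.lower().replace(",", "").split())


def build_dynamic_grammar(items):
    """Map each full item, and each word unique to one item, to that item."""
    counts = {}
    for item in items:
        for word in set(_normalize(item).split()):
            counts[word] = counts.get(word, 0) + 1
    grammar = {}
    for item in items:
        grammar[_normalize(item)] = item
        for word in _normalize(item).split():
            if counts[word] == 1:  # word identifies this item unambiguously
                grammar[word] = item
    return grammar


def prompt(items):
    """Build the 'Did you mean ...' prompt, ending with a none-of-these option."""
    return "Did you mean, " + ", ".join(items) + ", or none of these?"


def disambiguate(items, utterance):
    """Return the selected item, or None if nothing was selected."""
    key = _normalize(utterance)
    if key in ("none of these", "none of the above"):
        return None
    return build_dynamic_grammar(items).get(key)


airports = ["New York JFK", "New York LaGuardia", "Newark New Jersey"]
choice = disambiguate(airports, "JFK")
```

With this rule an utterance such as "New York" stays unresolved, because it matches more than one item, and the dialog would need to prompt again.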

The VUI Application transitions to a disambiguation Speech Object

when an ambiguity is detected. The first step is to convey the ambiguous

items in a list to the caller. For example, the caller is first prompted that an

ambiguity exists by conveying an appropriate prompt such as "Did you mean

<item 1>, <item 2>, <item n> ...?" Further, the last item included in the list

and conveyed to the caller is "none of the above" or a prompt with similar

meaning. Then, selection of the desired item and navigation of the list by the

caller is accomplished with appropriate utterances and speech grammars (e.g.

"that one", or "previous item" and "the first one", respectively). The

Disambiguation Speech Object also creates dynamic speech grammars

based upon the caller's utterance of a subset of each item to be

disambiguated. For instance, in a Disambiguation Speech Object to

determine a caller's actual desired New York airport, the Disambiguation

Speech Object will prompt, "Did you mean, New York JFK, New York

LaGuardia, Newark New Jersey, or none of these?" Both "New York, JFK"

and "JFK" are acceptable utterances. Upon disambiguating the search items,

the disambiguation dialog transitions to the next conversation state in the

program module where the ambiguity arose.

VUI Application Program Modules

VUI Application Main Menu Speech Object

Referring to Figure A1, the Main Menu Speech Object comprises

several component speech objects that transition either to other component

speech objects within the Main Menu Speech Object or to other program

module Speech Objects in the VUI Application.

Upon entry into the Main Menu Speech Object, the caller is greeted

and prompted to utter a personal identification code or service name

associated with a particular program module Speech Object (SO_Pin/Service

A1). Diagram A depicts several possible transitions depending upon the

caller's utterance. The caller may utter a grammar associated with a

particular Service-Database program module (e.g. "Traffic") or with one of

several caller administrative program modules (e.g. "Login", "New Account",

"Service Tips", "Forgot Passcode").

If the caller utters a speech grammar corresponding to a Service-

Database, the Main Menu document confirms the caller's choice (e.g.

SO_Traffic, SO_Stocks) while transitioning to the program module Speech

Object associated with the Service-Database A5. The Login program module

Speech Object ("Login Speech Object") is of particular significance and

enables accessing and creating a private caller profile to effect the

customization of preferences and/or settings for each caller. For instance, the

caller may enter home or work addresses, telephone numbers and other personal information that enables the other program module Speech Objects

to make educated inferential decisions regarding what will be the caller's

most probable selections or utterances.

Figure A2 depicts a Login Speech Object (SOLogin A4) that permits

the VUI Application to distinguish between callers. The Login Speech Object

permits the caller to enter a personal identification number ("PIN") and

enables the caller to invoke dialogs to determine a forgotten PIN

(ForgotPasscode A2) or establish a new account (SONewAccount A3). The

Login Speech Object associates each caller's PIN with their telephone

number A10; thus, upon login, the Login Speech Object processes the caller's

telephone number and verifies that it corresponds with the caller's PIN. If so,

the Login Speech Object confirms the verification to the caller and returns to

the program module it came from. If the caller does not login, or if the caller

does not invoke the Login Speech Object, the Main Menu Speech Object

transitions to the next state. Figure A3 depicts a Passcode Speech Object to

retrieve a forgotten PIN. Figure A4 depicts a New Account Speech Object to

establish a new account.
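The verification step of the Login Speech Object, in which the caller's telephone number is checked against the PIN associated with it, could look roughly like the sketch below. The in-memory `ACCOUNTS` table, the sample number and PIN, and the returned state names other than `ForgotPasscode` and `SONewAccount` are assumptions made for illustration; a real deployment would not store PINs in plain text.

```python
# Hypothetical account store: caller telephone number -> PIN.
ACCOUNTS = {"4085551234": "1357"}


def verify_login(caller_number, spoken_pin, accounts=ACCOUNTS):
    """Return True when the PIN matches the one stored for this number."""
    stored = accounts.get(caller_number)
    return stored is not None and stored == spoken_pin


def login_transition(caller_number, utterance, accounts=ACCOUNTS):
    """Pick the next state from the caller's utterance in the login dialog."""
    if utterance == "forgot passcode":
        return "ForgotPasscode"
    if utterance == "new account":
        return "SONewAccount"
    if verify_login(caller_number, utterance, accounts):
        return "ReturnToCallingModule"  # hypothetical state name
    return "NextState"  # hypothetical: caller did not log in
```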

The caller may at any time return to the Main Menu document by

uttering an acceptable speech grammar (e.g. "Home"). The Main Menu

document jumps directly to a second abbreviated greeting (SO_HomeMenu

A6) to account for the caller's familiarity with the VUI Application and to avoid

the caller having to retrace the same navigated path. At this conversation state, the caller may utter several of the same navigational choices previously

discussed.

Traffic Conditions Program Module Conversational State Diagram

Figure B1 depicts a Traffic Conditions Program Module conversational

state diagram ("Traffic Speech Object"). The caller can select the Traffic

Speech Object from the above Main Menu Speech Object module by uttering

a speech grammar that evokes the Traffic Speech Object (e.g. "Traffic"). In

turn, the Traffic Speech Object conveys prompts that direct the caller to utterances that indicate a region of interest for traffic condition information.

The Traffic Speech Object first processes the caller's area code and

prefix, and if the metro area associated with the caller's area code and prefix

is supported by the traffic Service-Database, the Traffic Speech Object

conveys to the caller the presumption that the Traffic Speech Object will use

the associated metro area for the caller's traffic region of interest

(SO_GetMetroTraffic B2). Otherwise, if the metro area for the city-state

combination is not supported, the Traffic Speech Object prompts the caller to

utter a particular city-state combination of interest (SO_City/State B1), and the

Traffic Module confirms a metro area associated with the caller's selected

city-state combination. If the selected metro area is supported by the traffic

Service-Database, the Traffic Module confirms that it will search the traffic

Service-Database (e.g. SO_GetMetroTraffic B2) for traffic related incidents in

that metro area. The Traffic Module will prompt the caller for a new metro area if the caller at this time cancels the pending search by uttering a "cancel"

or "stop" speech grammar.
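The metro-area presumption described above can be sketched as a lookup on the caller's area code and prefix with a fallback to a spoken city-state combination. The lookup tables, their sample contents, the ten-digit number format, and the function `choose_metro` are all assumptions for illustration; only the state names `SO_GetMetroTraffic` and `SO_City/State` come from the description.

```python
# Hypothetical lookup tables; real data would come from the Service-Database.
METRO_BY_PREFIX = {("408", "555"): "San Jose"}
METRO_BY_CITY_STATE = {("oakland", "ca"): "San Francisco"}
SUPPORTED_METROS = {"San Jose", "San Francisco"}


def choose_metro(caller_number, ask_city_state):
    """Return (metro, next_state) for a ten-digit caller number.

    ask_city_state is a callable that prompts the caller and returns a
    (city, state) pair; it is only invoked when the presumption fails.
    """
    area_code, prefix = caller_number[:3], caller_number[3:6]
    metro = METRO_BY_PREFIX.get((area_code, prefix))
    if metro in SUPPORTED_METROS:
        return metro, "SO_GetMetroTraffic"
    city, state = ask_city_state()
    metro = METRO_BY_CITY_STATE.get((city.lower(), state.lower()))
    if metro in SUPPORTED_METROS:
        return metro, "SO_GetMetroTraffic"
    return None, "SO_City/State"  # prompt again for a supported area
```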

Differing prompts are conveyed to the caller depending upon the

amount of traffic information retrieved from the traffic Service-Database. For

instance, "no traffic incidents" can be directly conveyed to the caller, or the

occurrence of a single traffic incident can also be directly conveyed to the caller

(SO_ReadTraffic B3). Else, if there are several incidents, the Traffic Speech

Object checks whether it has grammars of the metro area and if so prompts

the caller to enter a primary road or utter whether all the traffic incident reports

for that road are desired (SO_MajorRoad B4). If the Traffic Speech Object

supports the major road, a confirmatory prompt (SO_RoadConfirm B5) is

conveyed to the caller who may vocalize a confirmation and hear the traffic

incident report for that major road (SO_ReadTraffic B3). Otherwise, if there

are no highway grammars, all the traffic incidents are conveyed to the caller

(SO_ReadTraffic B3). The list of traffic related incidents is initially brief, but

the caller can request additional information by uttering "that one." After

providing the additional information, the Traffic Speech Object continues

reading the list. The Traffic Module prompts the caller to optionally perform

another search prior to exiting to the Main Menu document.
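The choice of prompt according to the amount of retrieved traffic information might be expressed as the small decision function below. The function name `traffic_prompt`, the prompt wording for the major-road question, and the representation of incidents as plain strings are assumptions; the state names are taken from the description.

```python
def traffic_prompt(incidents, has_road_grammar):
    """Return (next_state, prompt_text) for a list of incident summaries."""
    if not incidents:
        return "SO_ReadTraffic", "No traffic incidents."
    if len(incidents) == 1:
        return "SO_ReadTraffic", incidents[0]
    if has_road_grammar:
        # Several incidents and road grammars exist: narrow by major road.
        return "SO_MajorRoad", "Say a major road, or say 'all' for every report."
    # Several incidents but no road grammars: read everything.
    return "SO_ReadTraffic", " ".join(incidents)
```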

In an alternate embodiment, the Traffic Speech Object engages the

caller in a series of dialogs to determine a specific road. The Traffic Speech

Object subsequently interfaces with the API of the Traffic Service-Database to

determine and convey any available information to the caller.

Business Finder Program Module Conversational State Diagram

Figure C depicts a Business Finder Service Program Module

conversational state diagram ("Business Finder Speech Object"). The caller

can select the Business Finder Module from the above Main Menu Speech

Object by uttering a speech grammar that evokes the Business Finder Module

(e.g. "Business Finder").

In a preferred embodiment of the Business Finder Speech Object, the

region of interest for the caller is presumed based upon information retained

in the caller profile or based upon the area code and prefix of the caller's

telephone number. The presumption is conveyed to the caller (e.g.

SO_AssumeLocation C1) who may either convey an affirmative speech

grammar to confirm that the presumption is correct, or optionally select

another region of interest with an utterance having a negative connotation (e.g.

"cancel") that will invoke a transition to another Speech Object that prompts

the caller to enter the desired region of interest (e.g. SO_CityState C2).

Upon the caller's utterance of a speech grammar acquiescing to the

Business Finder Speech Object's presumption or utterance and confirmation

of an alternate city and state, the caller is prompted (e.g. SO_Brand/Category

C3) to utter a specific brand name or to vocalize a category to search (e.g.

"grocery stores"). Upon receipt of the caller's response, the Business

Finder Speech Object interfaces with the API of the Business Finder

Database and retrieves the information that most probably fulfills the caller's

desires. The Business Finder Speech Object automatically first filters out matches that are more than a specified distance away (e.g. more than 50

miles). If, however, there are no matches within the specified distance, the

Business Finder Speech Object adds back in the matches removed in the

previous step.
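The distance filter with its fallback can be sketched in a few lines. The record layout (a dictionary with a `miles` key) and the function name `filter_by_distance` are assumptions for illustration; the 50-mile default mirrors the example distance given above.

```python
def filter_by_distance(matches, max_miles=50.0):
    """Keep matches within max_miles; if none remain, restore all matches."""
    nearby = [m for m in matches if m["miles"] <= max_miles]
    return nearby if nearby else list(matches)


results = filter_by_distance(
    [{"name": "Store A", "miles": 12.0}, {"name": "Store B", "miles": 80.0}]
)
```

Here `results` holds only Store A; had both stores been farther than 50 miles away, both would have been returned.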

If there are multiple matches retrieved from the search, the Business

Finder Speech Object prompts (e.g. SO_FindNearest C4) the caller whether

the caller wants to hear the match that is closest to the caller's presumed

vicinity. If a confirming speech grammar is conveyed by the caller, the

Business Finder Speech Object transitions to the Address Finder Speech

Object (SOAddress C5) which returns processing control to the Business

Finder Speech Object when the Address Finder Speech Object has recorded

the caller's desired address. The Business Finder Speech Object then

conveys the matches that are nearest the vicinity of the caller's address.

However, if the caller elected not to provide an address, or if the search

retrieves no matches within a specified radius (e.g. fifty miles) that meet the

caller's request, a search of maximum radius is performed and the results are

conveyed to the caller. If a maximum radius search retrieves zero matches,

the caller is prompted whether a new search is desired.

The Business Finder Speech Object further includes the ability for the

caller to initiate a telephone call that will connect the current call to a business

establishment on the list, or provide driving directions to a business

establishment. Figure C2 depicts a conversational state diagram of this

additional functionality. As the search results are being read to the caller (SO_ListResults C6), the caller can navigate the list by uttering an appropriate

grammar such as "next" or "previous". The caller may also select a business

on the list by uttering an appropriate speech grammar such as "that one" or

"more information". Once a caller selects a business on the list, the caller may

choose either to receive driving directions or to place a telephone call to the

business selected. The Business Finder Speech Object audibly confirms the

caller's choice and prompts the caller to utter the caller's desired action

(SO_OneLocation C7). If the caller utters a "connect me" or similar speech

grammar, the Telephony Network Server initiates a telephone call to the

telephone number associated with the business selected by the caller. Else, if

the caller utters a speech grammar associated with "directions", the Business

Finder Speech Object will access the Driving Directions Speech Object if the

full address of the business selected by the caller was available in the

Service-Database.

Finally, the Business Finder Speech Object permits the caller to

perform additional searches C9, search for similar type businesses C10 (i.e.

within a same business type category), or find the nearest business C11 on

the list by uttering appropriate speech grammars (e.g. "new search", "find

similar", or "find nearest", respectively).

Stock Information Program Module Conversational State Diagram

Figure D1 depicts a Stock Information Program Module conversational

state diagram ("Stock Information Speech Object"). The caller can select the

Stock Information Program Module from the above Main Menu program Speech Object by uttering a speech grammar that evokes the Stock

Information Speech Object.

Upon the caller's utterance of a speech grammar indicating that stock

information is desired (e.g. "Stock"), the Stock Information Speech Object

conveys an audible confirmation to the caller (SO_AssumePortfolio D1).

Further, if the caller has established a stock portfolio in the private caller

profile, a speech object audibly alerts the caller that the Stock Information

Speech Object will assume what stocks are of particular interest to the caller

based upon the private caller profile. The Stock Information Speech Object

then interfaces with the API of the Stock Information database and retrieves

the most currently available data and reads it to the caller in a List Speech

Object.

The caller may opt out of the assumption by uttering a speech

grammar that indicates the caller's desire to do so (e.g. the caller names a

particular stock). If the caller's assumed portfolio is empty, or if the caller

cancels the presumption, the Stock Information Speech Object prompts the

caller to utter a stock information indicator (e.g. company name, ticker symbol,

or market index name). Upon receipt of the caller's uttered stock information

indicator, the Stock Information Speech Object performs a search of the stock

information database and reads the stock information to the caller.

Moreover, the Stock Information Speech Object permits the caller to

customize preferences regarding how the information is to be conveyed to the

caller. For example, the caller may wish to receive detailed information about stocks or abbreviated information. Thus, the Stock Information Speech

Object recognizes both contextually global - non temporal utterances (e.g.

"long quotes") that will globally affect the extent of information to be conveyed

about all stocks of interest, and item specific temporal utterances (e.g. "more

information", or "more details") that affect the extent of information to be

conveyed only about the specific stock that was just conveyed to the caller.

Item specific temporal utterances are characterized by a finite temporal

duration during which a caller's utterance is interpreted as an utterance only

for a specific item in the list.
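The distinction between global non-temporal utterances and item-specific temporal utterances might be modeled as follows. The class `QuoteReader`, the three-second window, and the use of explicit timestamps passed in by the caller of the class are assumptions chosen to keep the sketch deterministic; the specification says only that the temporal duration is finite.

```python
class QuoteReader:
    """Tracks the global quote preference and the per-item response window."""

    def __init__(self, window_seconds=3.0):
        self.long_quotes = False          # global, non-temporal preference
        self.window_seconds = window_seconds
        self.last_item = None
        self.last_item_time = None

    def item_read(self, symbol, now):
        """Record that a stock was just conveyed at time `now` (seconds)."""
        self.last_item = symbol
        self.last_item_time = now

    def hear(self, utterance, now):
        """Classify an utterance as ('global'|'item'|'ignored', symbol)."""
        if utterance == "long quotes":
            self.long_quotes = True       # applies to all later stocks
            return "global", None
        if utterance in ("more information", "more details"):
            in_window = (
                self.last_item is not None
                and now - self.last_item_time <= self.window_seconds
            )
            if in_window:
                return "item", self.last_item
        return "ignored", None
```

An item-specific utterance heard after the window has closed is ignored in this sketch; an implementation could equally re-prompt the caller.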

Figure D2 depicts a conversational state diagram reflecting additional

functionality of the Stock Information Speech Object including adding and

removing a stock to the caller private profile and effecting the previously

described global - non temporal speech grammars and item specific temporal

speech grammars. If the caller utters a speech grammar to hear more about

a particular stock while it is being read to the caller (e.g. "long quotes" or

"details"), the Stock Sub-Module performs a search of the stock information

database for more information, and subsequently transitions back to the Stock

Information Speech Object and conveys an appropriate audible prompt to the

caller depending upon the information retrieved by the performed search.

Figure D3 depicts an example of a conversational state diagram for

conveying stock information list items to the caller (SOReadData D2). The

List Speech Object automatically sequentially relays the stock information to

the caller stopping only to interpret caller utterances and make modifications to the caller's preferences in accordance with previously described

capabilities. Process flow control returns to the Stock Information Speech

Object upon completion of the list of stock information.

Weather Conditions Program Module Conversational State Diagram

Figure E1 depicts the Weather Conditions Speech Object ("Weather

Speech Object"). Upon receipt of a caller's utterance meeting an acceptable

speech grammar for the Weather Speech Object, the caller is greeted.

Moreover, the Weather Speech Object infers a city for the caller based upon

the caller private profile or caller's telephone number and conveys the

inference to the caller (WeatherAssumeCity E1). Unless the caller utters a

context specific cancellation speech grammar (e.g. "new city") within a finite

time interval, the Weather Speech Object retrieves the weather information for

the city. Else the Weather Speech Object prompts the caller to enter another

city for which the Weather Conditions Speech Object will gather information

(WeatherGetCity E2) and convey the most relevant weather information to the

caller. Figure E2 depicts a List Speech Object for conveying the weather

information to the caller in a list format and further checks the caller private

profile to detect the preferred manner of receiving the weather information E3

(i.e. extended or abbreviated forecasts).
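The city inference of the Weather Speech Object might be sketched as below. The function `weather_city`, the preference for the profile city over the city derived from the telephone number, and the use of a single argument to represent whatever was heard during the finite cancellation interval are assumptions; the state names come from the description.

```python
def weather_city(profile_city, phone_city, heard_in_window, ask_city):
    """Return (city, state_name) for the weather lookup.

    heard_in_window is the utterance recognized during the cancellation
    interval, or None if the caller stayed silent. ask_city is a callable
    that prompts the caller for another city.
    """
    assumed = profile_city or phone_city  # assumption: profile wins
    if assumed and heard_in_window != "new city":
        return assumed, "WeatherAssumeCity"
    return ask_city(), "WeatherGetCity"
```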

Address Locating Program Module Conversational State Diagram

Figure F1 depicts the conversation state diagram of the Address

Locating Program Module ("Address Speech Object"). The Address Speech Object is ordinarily transitioned to from another Speech Object that needs to

locate a specific address to perform a function that requires knowing a

specific street address (e.g. driving directions). Upon a transition to the

Address Speech Object, the caller is prompted to utter a particular city and

state of interest (SO_CityState F1) or to utter a Landmark.

Landmarks are preassigned speech grammars that can be either global

or particular to each caller and stored in each caller's private profile. "Airport"

is a special global grammar landmark that evokes an Airport Finder Speech

Object. The Airport Finder Speech Object searches the caller private profile

for a preferred airport and confirms this preference with the caller, who

may opt otherwise and engage in a dialog to pick an alternate airport.

If the caller utters another landmark that is particular to the caller

private profile, the Address Module will access the address associated with

the Landmark and return to the original program module from where the

transition came. If, however, there is more than one address that meets the

caller's uttered Landmark (e.g. "airport"), the Address Module will transition to

an airport city disambiguation list dialog (SO_AirportCity F2) to identify the

caller's desired airport and subsequently confirm and convey the desired

information to the caller. The Address Speech Object will loop back and

reengage the caller in the airport city list dialog if the city and state is not

supported by the database or if the caller conveys that the choice of airports is

incorrect.

After the caller utters a city and state, the caller is engaged in dialogs to name

and confirm a desired street (SO_StreetName F3), name and confirm a street

number (SO_StreetNumber F4), or alternatively, if the street number is not

known, a cross street (SO_CrossStreet F5). When the Address Module has

received the caller's desired address, the Address Module prompts the caller

to confirm the address or, in appropriate circumstances, will engage the caller

in a disambiguation and confirmation Speech Object

(SO_AddressDisambiguationAndConfirm F6) to resolve the ambiguity.

Upon obtaining the caller's final address, the Address Disambiguation

Speech Object engages the caller in speech objects that enable the caller to

change only a subset of information conveyed to the Address Disambiguation

Speech Object or alternatively, to begin searching from scratch. For example,

the caller may change the street name, or the cross street name.
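The landmark handling described for the Address Speech Object can be sketched as a lookup followed by an optional disambiguation step. This is only an illustrative sketch: GLOBAL_LANDMARKS, its placeholder addresses, the resolve_landmark function, and the disambiguate callback (a stand-in for a list dialog such as SO_AirportCity) are hypothetical, and giving caller-specific landmarks priority over global ones is an assumption the text does not state.

```python
from typing import Callable, Dict, List

# Hypothetical global landmark table: each landmark maps to candidate addresses.
GLOBAL_LANDMARKS: Dict[str, List[str]] = {
    "airport": ["Airport A, City A", "Airport B, City B"],
}


def resolve_landmark(
    utterance: str,
    caller_landmarks: Dict[str, str],
    disambiguate: Callable[[List[str]], str],
) -> str:
    """Map an uttered landmark to a single address.

    caller_landmarks stands in for landmarks stored in the caller private
    profile; disambiguate stands in for a disambiguation list dialog.
    """
    key = utterance.strip().lower()
    # Assumption: caller-specific landmarks take priority over global ones.
    if key in caller_landmarks:
        return caller_landmarks[key]
    candidates = GLOBAL_LANDMARKS.get(key, [])
    if not candidates:
        raise KeyError(f"unknown landmark: {utterance!r}")
    if len(candidates) == 1:
        return candidates[0]
    # More than one address matches, so let the caller choose.
    return disambiguate(candidates)
```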

Flight Information Program Module Conversational State Diagram

Figure G1 depicts the conversation state diagram of the Flight

Finder Program Module ("Flight Finder Speech Object"). Upon a transition to

the Flight Information Speech Object, the caller is greeted and prompted to

utter whether the caller wants arrival or departure information (SOArrivalDeparture

G1), upon which the Flight Finder Speech Object transitions to the

Flight Information Program Module ("Flight Information Speech Object").

Figure G2 depicts the conversation state diagram of the Flight

Information Program Module ("Flight Information Speech Object"). Upon a transition to the Flight Information Speech Object, the caller is prompted to

utter an airline and flight number (SOAirlineFlight G2), to utter an airline or

flight number, or to utter neither the airline nor the flight number.

Alternate speech objects are transitioned to depending upon the extent

of information known by the caller and conveyed to the Flight Information

Speech Object. If the caller utters only the airline or only the flight number, the

Flight Information Speech Object transitions to speech objects that prompt the

caller to utter either the flight number (SOFlightNumber G3) or the airline

(SOAirline G4), respectively. The Flight Information Speech Object then

confirms the caller's airline and flight number (SOFlightInfoCandC G5) and

assumes that flight information is desired on the day of the call, but optionally

allows the caller to check flight information for another date (SOFlightDate

G6). Upon confirmation of the caller's desired flight information the Flight

Information Speech Object interfaces with the API of the flight information

Service Database to retrieve the caller's desired flight information.
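The choice of the next speech object from what the caller has already supplied can be sketched as a small slot-filling function. This is a hedged illustration, not the disclosed implementation: the FlightQuery class and next_state function are hypothetical, the returned strings reuse the state labels from Figure G2 as read here, and "Itinerary" is a placeholder label for the hand-off taken when neither value is known.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlightQuery:
    """What the caller has supplied so far (hypothetical container)."""
    airline: Optional[str] = None
    flight_number: Optional[str] = None


def next_state(query: FlightQuery) -> str:
    """Pick the next speech object from the information already known."""
    if query.airline and query.flight_number:
        return "SOFlightInfoCandC"  # both known: check and confirm
    if query.airline:
        return "SOFlightNumber"     # only airline known: ask for flight number
    if query.flight_number:
        return "SOAirline"          # only flight number known: ask for airline
    return "Itinerary"              # neither known: search by itinerary instead
```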

The Flight Information Speech Object will transition to alternate speech

objects depending upon the flight information retrieved from the Service-

Database. No available information results in a prompt to perform another

search (tSearchAnotherFlight G7). The existence of multiple flight legs

invokes a Speech Object that allows the caller to optionally choose a specific

leg of the flight (SOChooseLeg G8) for which to hear information. Otherwise,

the Flight Information Speech Object will convey the flight status information

to the caller (SOFlightStatus G9).

The caller may also pick a flight without any specific information about

an airline or flight number. Figure G3 depicts the Itinerary Speech Object

which is invoked if the caller does not know the airline or flight number when

conversing with the Flight Information Speech Object. The Itinerary Speech

Object includes speech objects that allow the caller to choose a flight

regardless of whether an airline is known. For example, if the caller has not entered an

airline in the Flight Information Speech Object, the Itinerary Speech Object will

engage the caller in a dialog to determine an airline (e.g. tCheckAirline G10,

soAirline G11). If the caller does not know the airline, the Itinerary Speech

Object will engage the caller in a speech object to determine the airline if it is

supported by the Service-Database 60. If not, the caller is prompted for

another airline (tAnotherAirline G12).

The Itinerary Speech Object engages the caller in speech objects to

determine the departure city (SOAirportCity(departure) G13) and arrival city

(SOAirportCity(arrival) G14) and then in a flight time Speech Object

(soFlightTime G15). The Itinerary Speech Object next checks with an

Itinerary check and confirm speech object (SOItineraryCandC G16), gets a

list of flights meeting the caller's requirements from the flight information

Service-Database 60, and conveys it to the caller (SOReadRoutes G17).
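The itinerary search described above (departure city, arrival city, flight time, and optionally an airline) can be sketched as a filter over candidate flights. This is an assumption-laden sketch: the Flight record, the find_routes function, and the two-hour tolerance are hypothetical, and a real system would query the flight information Service-Database rather than an in-memory list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Flight:
    """Hypothetical flight record."""
    airline: str
    number: str
    origin: str
    destination: str
    departure_hour: int  # 0-23, local time at the origin


def find_routes(
    flights: List[Flight],
    origin: str,
    destination: str,
    around_hour: int,
    airline: Optional[str] = None,
    tolerance_hours: int = 2,
) -> List[Flight]:
    """Return flights matching the caller's itinerary, earliest first."""
    matches = [
        f for f in flights
        if f.origin == origin
        and f.destination == destination
        and abs(f.departure_hour - around_hour) <= tolerance_hours
        # The airline filter applies only when the caller named one.
        and (airline is None or f.airline == airline)
    ]
    return sorted(matches, key=lambda f: f.departure_hour)
```

The resulting list is what a dialog such as SOReadRoutes would then read back to the caller.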

Driving Directions Program Module Speech Object

The Driving Directions Speech Object determines point-to-point driving

directions given two addresses. See Figure H. The Driving Directions

Speech Object can be evoked either as a stand-alone program module Speech Object or from another program module, such as the Business Finder Speech

Object. If the Speech Object is evoked from another program module such as

Business Finder Speech Object, known addresses are passed from the first

program module to the Driving Directions Speech Object. Otherwise, the

Driving Directions Speech Object contains speech objects to determine either

the caller's source or destination address (SOSourceAddress H1 and

SODesitinationAddress H2, respectively).

The Driving Directions Speech Object interfaces with the API of an

independent service-database to retrieve point-to-point driving directions.

Upon retrieval of the driving directions, the Driving Directions Speech Object

formats the driving directions into a list that is conveyed to the caller

(SOReadDirections H3). The caller can navigate the list by uttering

appropriate speech grammars (e.g. "next", "previous", "start over", "stop",

"pause"). Alternatively, the caller may also receive the driving directions by

email or facsimile (SODrectionDeliveryMethod H4). Moreover, if a portion of

the caller's driving directions includes a particularly long stretch of road, the

Driving Directions Speech Object dynamically creates a prompt to query

whether the caller wants to hear directions from the first step or starting after

that road (SOStartFrom H5).
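Navigating the list of directions with speech grammars can be sketched as a cursor over the steps. This is a minimal sketch, assuming the recognizer has already turned the utterance into text: the DirectionsReader class is hypothetical, and only the "next", "previous", and "start over" grammars are modeled ("stop" and "pause" are left out for brevity).

```python
from typing import List


class DirectionsReader:
    """Hypothetical cursor over a list of driving-direction steps."""

    def __init__(self, steps: List[str]) -> None:
        if not steps:
            raise ValueError("steps must not be empty")
        self.steps = steps
        self.index = 0

    def current(self) -> str:
        return self.steps[self.index]

    def handle(self, command: str) -> str:
        """Apply a navigation command and return the step to read out."""
        command = command.strip().lower()
        if command == "next":
            # Stay on the last step rather than running off the end.
            self.index = min(self.index + 1, len(self.steps) - 1)
        elif command == "previous":
            self.index = max(self.index - 1, 0)
        elif command == "start over":
            self.index = 0
        # Unrecognized commands leave the position unchanged.
        return self.current()
```

Starting the reader at a later index would model the option of hearing directions from a point after a long stretch of road.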

For logged-in callers, the Driving Directions Speech Object has the

added capability of storing a set of driving directions in the caller private

profile after they have been determined, and further creating a prompt that

evokes the saved directions. Thus a caller may use the Driving Directions Speech Object to determine directions to a particular location, save the driving

directions to the caller private profile, and disconnect the telephone call. The

caller is prompted whether they would like to resume using the saved driving

directions when they call again and login (SOResume H6).
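Saving directions in one call and resuming them in a later one can be sketched with a small store keyed by the caller's account. This is only an illustration: CallerProfileStore and its methods are hypothetical, and an in-memory dictionary stands in for whatever persistent storage holds the caller private profile.

```python
from typing import Dict, List, Optional


class CallerProfileStore:
    """Hypothetical in-memory stand-in for the caller private profile."""

    def __init__(self) -> None:
        self._saved_directions: Dict[str, List[str]] = {}

    def save_directions(self, account: str, steps: List[str]) -> None:
        # Copy so later changes to the caller's list do not alter the saved set.
        self._saved_directions[account] = list(steps)

    def resume_directions(self, account: str) -> Optional[List[str]]:
        """Return the saved directions, or None if nothing was saved."""
        return self._saved_directions.get(account)
```

On a later login, a non-None result from resume_directions would be the condition for offering a resume prompt such as SOResume.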

The preferred embodiment of the invention is described above in the

Drawings and Description of Preferred Embodiments. While these

descriptions directly describe the above embodiments, it is understood that

those skilled in the art may conceive modifications and/or variations to the

specific embodiments shown and described herein. Any such modifications

or variations that fall within the purview of this description are intended to be

included therein as well. Unless specifically noted, it is the intention of the

inventor that the words and phrases in the specification and claims be given

the ordinary and accustomed meanings to those of ordinary skill in the

applicable art(s). The foregoing description of a preferred embodiment and

best mode of the invention known to the applicant at the time of filing the

application has been presented and is intended for the purposes of illustration

and description. It is not intended to be exhaustive or to limit the invention to

the precise form disclosed, and many modifications and variations are

possible in the light of the above teachings. The embodiment was chosen

and described in order to best explain the principles of the invention and its

practical application and to enable others skilled in the art to best utilize the

invention in various embodiments and with various modifications as are suited to the particular use contemplated.

Claims

1. In a computer system, a method of retrieving and conveying information requested by a caller, comprising:

initiating a vocal user interface application session upon receipt of a telephone call from the caller;

within a first vocal user interface program module, prompting the caller to enter a first vocal expression that describes the caller's desired information;

prompting the caller to enter a second vocal expression that more narrowly describes, relative to the vocal expression in the first vocal user interface program module, the caller's desired information,

accessing a service-database that contains information regarding the caller's desired information,

searching the service-database for a database sample that most closely satisfies the service-search expression,

retrieving the caller's desired information from the service-database,

formatting the caller's desired information into a vocal output, and

conveying the vocal output of the caller's desired information to the caller.

2. The method in claim 1 further comprising the steps of,

recognizing the first vocal expression as a command to transition to a second vocal user interface program module;

transitioning to a second vocal user interface program module, and performing the steps of, prompting the caller to enter a second vocal expression that more narrowly describes, relative to the vocal expression in the first vocal user interface program module, the caller's desired information,

retrieving the caller's desired information from the service-database,

formatting the caller's desired information into a vocal output, and

conveying the vocal output of the caller's desired information to the caller,

within the second vocal user interface program module.

3. The method in claim 1 further comprising the step of,

inferring at least a portion of the caller's desired information based upon information delivered during a telephone call.

4. The method in claim 3 wherein the information delivered during a telephone call comprises,

at least a portion of a telephone number.

5. The method in claim 1 further comprising: aborting the processing of any step in the method upon receipt of a vocal command to do so by the caller.

6. The method in claim 3 further comprising,

checking a caller private profile that is identifiable with the caller by an account number.

7. The method in claim 6 further comprising,

inferring the caller account number based upon information delivered during the telephone call.

8. The method in claim 7 further comprising,

storing a portion of the caller's desired information in the caller private profile, and

inferring at least a portion of the caller's desired information based upon the stored portion of the caller's desired information.

9. The method in claim 1 wherein the service-database that contains information regarding the caller's desired information is selected from a group of service-databases consisting of;

a traffic condition service-database, a stock information service-database, a business finder service-database, a weather condition service-database, a flight information service-database, or a driving directions service-database.

10. The method in claim 9 wherein if the traffic condition service-database is selected, the step of accessing the service-database further comprises:

prompting the caller to vocally enter a city and state;

searching the number of traffic related incidents that pertain to the city and state;

prompting the caller to enter a road by name;

searching the location of traffic related incidents that pertain to the road, and if there is traffic related information regarding the incidents on the road;

conveying the traffic related information regarding the incidents to the caller.

11. The method in claim 10 further comprising,

prompting the caller to vocally select a particular traffic related incident about which to receive more detail, and

conveying more detail to the caller upon receipt of the caller's vocal selection.

12. The method in claim 9 wherein if the traffic condition service-database is selected, the step of accessing the service-database further comprises:

inferring a city and state of interest for the caller based upon at least a portion of the caller's telephone number; searching the number of traffic related incidents that pertain to the city and state;

prompting the caller to enter a road by name;

searching the location of traffic related incidents that pertain to the road, and if there is traffic related information regarding the incidents on the road;

13. The method in claim 12 further comprising,

14. The method in claim 9 wherein if the stock information service-database is selected, the step of accessing the service-database further comprises:

prompting the caller to enter an investment indicator;

searching the independent service-database for the most recent value associated with the investment indicator,

conveying the most recent value to the caller.

15. The method in claim 14 further comprising, prompting the caller to vocally select an investment indicator about which to receive more detail, and

16. The method in claim 9 wherein if the stock information service-database is selected, the step of accessing the service-database further comprises:

retrieving an investment indicator from a private caller profile;

searching the service-database for the most recent value associated with the investment indicator,

conveying the most recent value to the caller.

17. The method in claim 14 further comprising the step of,

adding or removing an investment indicator from the private caller profile upon receipt of a vocal command to do so by the caller.

18. The method in claim 14 further comprising the step of,

providing more or less information about a particular investment indicator upon receipt of a vocal command to do so by the caller.

19. The method in claim 9 wherein if the weather condition service-database is selected, the step of accessing a service-database further comprises; inferring a city and state for which weather information is desired based upon information conveyed by the caller,

searching the service-database to find current weather conditions for the city and state;

conveying the current weather conditions to the caller.

20. The method in claim 19 wherein the information conveyed by the caller is selected from a group of information items consisting of; a portion of the caller's telephone number, or a city and state stored in a caller private profile.

21. The method in claim 9 wherein if the weather condition service-database is selected, the step of accessing a service-database further comprises;

prompting the caller to vocally enter a city and state for which weather information is desired,

conveying the current weather conditions to the caller.

22. The method in claim 21 wherein the step of conveying further comprises;

prompting the caller to vocally enter a request for extended forecast information, and if it is available,

conveying the extended forecast information to the caller.

23. The method in claim 9 wherein if the business finder service-database is selected, the step of accessing a service-database further comprises;

prompting the caller to vocally enter a city and state,

prompting the caller to vocally enter a business search request,

searching the service-database to find a business address satisfying the business search request,

conveying the business address to the caller.

24. The method in claim 23 further comprising the step of,

prompting whether the caller desires to hear the business address, and if the caller answers affirmatively,

prompting the caller to enter a reference address from which to determine driving directions,

determining driving directions between the reference address and the business address, and

conveying the driving directions to the caller.

25. The method in claim 9 wherein if the business finder service-database is selected, the step of accessing a service-database further comprises;

inferring a city and state based upon information delivered during the telephone call,

prompting the caller to vocally enter a business search request, searching the service-database to find a business address satisfying the business search request,

conveying the business address to the caller.

26. The method in claim 25 wherein the information delivered during the telephone call is selected from a group of information items consisting of; a portion of the caller's telephone number, or a city and state stored in a caller private profile.

27. The method in claim 26 further comprising the step of, conveying to the caller the number of businesses found and

prompting whether the caller desires to hear the business addresses

closest to a reference address, and if the caller answers affirmatively,

obtaining a reference address from the caller,

searching the independent service-database to find relevant business

addresses,

conveying the business addresses to the caller.

28. The method in claim 9 wherein if the flight information service-database is selected, the step of accessing a service-database further comprises;

prompting the caller to vocally enter a flight information request,

searching the independent flight information service-database,

retrieving information that most closely matches the caller's flight information request, conveying the information to the caller.

29. The method in claim 28 wherein

the flight information request comprises arrival or departure information for unknown airlines, and the method further comprises,

searching the independent service database for information items meeting the request,

retrieving information items meeting the request,

conveying the information items to the caller in a list.

30. The method in claim 28 wherein

the flight information request comprises arrival or departure information regarding a particular flight that has multiple legs, and the method further comprises,

prompting if the caller wants to hear information regarding all the legs or regarding a particular leg.

31. The method in claim 9 wherein,

the caller selected the driving directions service-database, and the method further comprises,

prompting the caller to enter at least one address,

searching the driving directions service-database,

conveying the driving directions to the caller.

32. The method in claim 31 further comprising,

pausing the step of conveying the driving directions to the caller upon a vocal command from the caller.

33. The method in claim 31 further comprising,

saving the list of driving directions in a caller private profile,

resuming the step of conveying the driving directions to the caller.

34. The method in claim 33 wherein,

the step of resuming is performed in a subsequent vocal user interface application session.

35. The method in claim 31 wherein,

the step of conveying the driving directions to the caller is performed in a manner selected from the group consisting of; email, fax, wap, or audible.

36. The method in claim 31 further comprising,

prompting whether the caller wants to hear all of the directions on a route from the start or from another point in the route.

37. An address finder vocal user interface for use in a computer system comprising;

a first caller address search software dialog that prompts for and accepts search requests from the group consisting of; an address within a city and state, a landmark address non specific to callers, a landmark address specific to a caller,

a software interface that accesses a service-database to search for and retrieve results for the search request, and

text-to-speech software that translates the results to speech.

38. The address finder vocal user interface in claim 37 wherein,

the address search request is an address within a city and state, and further comprising,

a second caller address search software dialog that prompts for and accepts search requests from the group consisting of; a street number or a cross-street name.

39. The address finder vocal user interface in claim 37 wherein,

the address search request is a landmark address specific to a caller, and further comprising,

software that accesses and retrieves an address stored in a private caller profile.

40. A business finder vocal user interface for use in a computer system comprising, software code that infers a city and state for a caller's desired business location based upon information delivered during a telephone call,

a business finder software dialog that prompts the caller to vocally enter a business search request,

a software module that accesses a service-database to search for and retrieve results for the business search request, and

text-to-speech software that conveys the results to a caller.

41. The business finder vocal user interface in claim 40 wherein,

the information delivered during a telephone call is at least a portion of a caller telephone number.

42. The business finder vocal user interface in claim 40 wherein,

in a subsequent caller interaction with the vocal user interface, the software code that infers a city and state for a caller's desired business location based upon information delivered during a telephone call, infers based on a previous caller interaction with the vocal user interface.

43. The business finder vocal user interface in claim 40 wherein,

acceptable caller vocal entries are selected from the group consisting of; a particular business name, and a business type.

44. The business finder vocal user interface in claim 40 further comprising, a telephony software dialog that accepts a caller's vocal command to initiate a telephone call to contact the business.

45. The business finder vocal user interface in claim 40 further comprising,

a driving directions software dialog that accepts a caller's source address and computes driving directions to the retrieved results of the business search request.

46. A driving directions vocal user interface in a computer system comprising,

an address software routine that accepts source and destination addresses during a telephone call,

a software interface that accesses a service-database to search for and retrieve driving directions between the source and destination addresses, and

text-to-speech software that translates the retrieved driving directions to speech for conveyance to a caller.

47. The driving directions vocal user interface in claim 46 wherein the address software dialog further comprises,

a second software dialog that prompts for and accepts a caller's request that the directions be given from a point other than from the source address.

48. The driving directions vocal user interface in claim 46 wherein, the address software routine further comprises a software dialog that accepts verbally entered addresses from the caller.

49. The driving directions vocal user interface in claim 46 wherein,

the address software routine accepts addresses from an independent software program.

50. The driving directions vocal user interface in claim 46 further comprising,

software means for storing the driving directions, and

software means for conveying the driving directions to the caller in a subsequent interaction with the driving directions vocal user interface.

51. The driving directions vocal user interface in claim 49 wherein,

the independent software program is selected from the group consisting of; a business finder program module, or an address finder program module.

52. In a vocal user interface for use on a computer system, a method of interfacing with a caller, comprising:

querying the caller with a prompt intended to evoke either an affirmative response or a negative response from the caller, and

interpreting any response other than an affirmative response or a negative response as a negative response to the query, and utilizing the caller response in a subsequent query.

53. The method in claim 14 wherein the investment indicator comprises,

an investment indicator selected from the group consisting of; a publicly traded investment vehicle ticker symbol, the name of the business entity issuing the publicly traded investment vehicle, a market indicator ticker symbol, or a market indicator name.

54. The method in claim 21 wherein the step of conveying further comprises;

conveying the extended forecast information to the caller.

55. The method in claim 23 wherein,

56. The method in claim 23 further comprising the step of,

conveying to the caller the number of businesses found and

prompting whether the caller desires to hear the business addresses

closest to a reference address, and if the caller answers affirmatively,

obtaining a reference address from the caller,

searching the independent service-database to find relevant business

addresses, conveying the business addresses to the caller.

57. The method in claim 23 further comprising the step of,

conveying that the caller can request to hear directions to a business

address and if the caller requests to hear directions,

prompting the caller to enter a reference address from which to

determine driving directions,

determining driving directions between the reference address and the

business address, and conveying the driving directions to the caller.

58. The method in claim 25 further comprising the step of,

conveying to the caller the number of businesses found and

prompting whether the caller desires to hear the business addresses

closest to a reference address, and if the caller answers affirmatively,

obtaining a reference address from the caller,

searching the independent service-database to find relevant business

addresses,

conveying the business addresses to the caller.

59. The method in claim 25 further comprising the step of,

conveying that the caller can request to hear directions to a business

address and if the caller requests to hear directions,

prompting the caller to enter a reference address from which to

determine driving directions, determining driving directions between the reference address and the business address, and conveying the driving directions to the caller.

60. The method in claim 27 further comprising the step of,

conveying that the caller can request to hear directions to a business

address and if the caller requests to hear directions,

prompting the caller to enter a reference address from which to

determine driving directions,

determining driving directions between the reference address and the business address, and conveying the driving directions to the caller.

61. In a vocal user interface for use on a computer system, a method of interfacing with a caller, comprising:

interpreting any response other than an affirmative response or a negative response as an affirmative response to the query, and

utilizing the caller response in a subsequent query to the caller.

62. In a vocal user interface implemented on a computer, a method

of interfacing with a caller, comprising:

making an educated inferential decision regarding what will be the

caller's most probable utterances in a conversation state where caller-specific

information is accessible,

performing a subsequent action within the vocal user interface based

on the educated inferential decision, and canceling the subsequent action upon a caller's corrective vocalization.