WO1999044345A2 - Controlling navigation paths of a speech-recognition process - Google Patents

Info

Publication number
WO1999044345A2
Authority
WO
WIPO (PCT)
Prior art keywords
nodes
speech
actions
computer program
prompts
Application number
PCT/US1999/004747
Other languages
French (fr)
Other versions
WO1999044345A3 (en)
Inventor
Mark S. Pondsack
Gareth L. Gabrys
Raja K. Sait
Peter Grossman
Matthew D. Womer
Tim J. Collins
Lois W. Kaznicki
Diane P. Ballestas
Gary M. Jaspersohn
Original Assignee
Koninklijke Philips Electronics N.V.
Application filed by Koninklijke Philips Electronics N.V.
Priority to JP2000533989A (published as JP2002505556A)
Priority to KR1020007009503A (published as KR20010086258A)
Priority to AU29826/99A (published as AU2982699A)
Priority to EP99911100A (published as EP1057317A2)
Publication of WO1999044345A2
Publication of WO1999044345A3

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/487 Arrangements for providing information services, e.g. recorded voice services or time announcements
    • H04M3/493 Interactive information services, e.g. directory enquiries; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals
    • H04M3/4936 Speech interaction details
    • H04M2201/00 Electronic components, circuits, software, systems or apparatus used in telephone systems
    • H04M2201/40 Electronic components, circuits, software, systems or apparatus used in telephone systems using speech recognition
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L15/26 Speech to text systems


Abstract

A method and computer program for controlling navigation paths in a speech-recognition process in which prompts are provided to a user, and actions are taken based on comparison of the user's spoken responses to stored speech elements. The method includes organizing the prompts, actions, and speech elements in a hierarchy of nodes that include group nodes and terminal nodes, and displaying the nodes in a hierarchical list that indicates navigation paths in the speech-recognition process. A user can alter the navigation paths in the speech-recognition process by editing the position of the nodes in the hierarchical list. The speech-recognition process may be a call routing process.

Description

CONTROLLING NAVIGATION PATHS OF A SPEECH-RECOGNITION PROCESS
Background of the Invention
Businesses often offer main phone numbers as an entry point into their
phone network. Customers who call a main number frequently need assistance in
reaching a particular person or department. Speech-based automated attendant
(autoattendant) software uses voice-recognition technology to quickly forward
calls from a main number to a particular extension. For example, an autoattendant
might ask "What is the name of the employee you are trying to reach?" then
analyze a caller's spoken reply to determine a call destination. Automating this
task reduces the burden of offering main business phone numbers.
Unfortunately, information management tasks can partially offset the
advantages of installing a speech-based autoattendant. For example, scripting the
autoattendant's queries and responses to caller speech ("call flow") can become
time consuming. Additionally, coordinating the different speech generating,
speech recognizing, and phone directory data can require significant database
management efforts.
Summary of the Invention
In general, the invention facilitates control of navigation paths in a
speech-recognition process in which prompts are provided to a user, and actions
are taken based on comparison of the user's spoken responses to stored speech elements. The invention organizes the prompts, actions, and speech elements in a
hierarchy of nodes comprising group nodes and terminal nodes and displays the
nodes in a hierarchical list that indicates navigation paths in the
speech-recognition process. A user can alter the navigation paths in the
speech-recognition process by editing the position of the nodes in the hierarchical
list.
Embodiments may include one or more of the following features. Altering
the navigation paths can be done by interacting with a graphical user interface.
The user can edit the position of a node in the hierarchical list by a drag and drop
operation that can also edit the position of hierarchically included nodes. A user
can add, delete, or edit nodes. The prompts, actions, and speech elements can be
organized as records in a relational database.
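The drag-and-drop behavior described above, where moving a node also moves its hierarchically included nodes, can be sketched as a re-parenting operation on a tree. This is a minimal illustration under stated assumptions: the `Node` class, `move_node`, and `path` are hypothetical names, and the patent does not say how cycles are prevented; the ancestor check here is one reasonable choice.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity equality: two nodes with the same name stay distinct
class Node:
    """One node of the call-flow hierarchy (hypothetical model, not the patent's)."""
    name: str
    is_group: bool = False
    parent: Optional["Node"] = field(default=None, repr=False)
    children: list["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        """Attach `child` under this node; only group nodes may parent others."""
        if not self.is_group:
            raise ValueError(f"{self.name} is a terminal node")
        child.parent = self
        self.children.append(child)
        return child


def is_ancestor(candidate: Node, node: Optional[Node]) -> bool:
    """True if `candidate` is `node` itself or one of its ancestors."""
    while node is not None:
        if node is candidate:
            return True
        node = node.parent
    return False


def move_node(node: Node, new_parent: Node) -> None:
    """Drag-and-drop: re-parent `node`; its included nodes move with it."""
    if not new_parent.is_group:
        raise ValueError("drop target must be a group node")
    if is_ancestor(node, new_parent):
        raise ValueError("cannot drop a node into itself or its descendants")
    if node.parent is not None:
        node.parent.children.remove(node)
    new_parent.add(node)


def path(node: Optional[Node]) -> str:
    """Navigation path from the root, e.g. 'Main/Sales/Alice'."""
    parts = []
    while node is not None:
        parts.append(node.name)
        node = node.parent
    return "/".join(reversed(parts))
```

Because children hang off their parent by reference, moving a group changes the navigation path of every node beneath it without touching those nodes individually.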
The user may also be presented with a display of hierarchically included
nodes of a selected group node in a separate list of hierarchically included nodes.
The user may be able to collapse or expand group nodes to alter the display of the
hierarchical list.
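Collapsing and expanding group nodes changes only what the hierarchical list shows, not the underlying hierarchy. A minimal sketch, assuming a hypothetical `OutlineNode` display model and the "+"/"-" sign convention of the hierarchical list display ("-" for an expanded group, "+" for a collapsed one):

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class OutlineNode:
    """Hypothetical display model for one row of the hierarchical list."""
    name: str
    is_group: bool = False
    expanded: bool = True
    children: list["OutlineNode"] = field(default_factory=list)


def render(node: OutlineNode, depth: int = 0) -> list[str]:
    """Return the visible rows; a collapsed group hides everything beneath it."""
    if node.is_group:
        sign = "-" if node.expanded else "+"
    else:
        sign = " "  # terminal nodes have nothing to expand
    rows = [f"{'  ' * depth}{sign} {node.name}"]
    if node.is_group and node.expanded:
        for child in node.children:
            rows.extend(render(child, depth + 1))
    return rows


def toggle(node: OutlineNode) -> None:
    """Collapse an expanded group or expand a collapsed one."""
    if node.is_group:
        node.expanded = not node.expanded
```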
The speech-recognition process can be a call routing process. The call
routing process can include forwarding calls to phone extensions or playing
speech files.
Advantages may include one or more of the following.
Controlling call flow with an intuitive user interface reduces the burden of
call flow management. Further, the ability to process call flow based on hierarchical data can not only help callers reach an appropriate extension but can
also speed autoattendant response by narrowing acceptable responses to any
particular prompt.
The invention may be implemented in hardware or software, or a
combination of both. Preferably, the technique is implemented in computer
programs executing on programmable computers that each include a processor, a
storage medium readable by the processor (including volatile and non-volatile
memory and/or storage elements), at least one input device, and at least one
output device. Program code is applied to data entered using the input device to
perform the functions described above and to generate output information. The
output information is applied to one or more output devices.
Each program is preferably implemented in a high level procedural or
object oriented programming language to communicate with a computer system.
However, the programs can be implemented in assembly or machine language, if
desired. In any case, the language may be a compiled or interpreted language.
Each such computer program is preferably stored on a storage medium or
device (e.g., ROM or magnetic diskette) that is readable by a general or special
purpose programmable computer for configuring and operating the computer
when the storage medium or device is read by the computer to perform the
procedures described in this document. The system may also be considered to be
implemented as a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer to
operate in a specific and predefined manner.
Other features and advantages will be apparent from the following
detailed description, including the drawings, and from the claims.
Brief Description of the Drawing
FIGS. 1A-1D are diagrams illustrating autoattendant functions.
FIG. 2 is a diagram of a computer platform that includes autoattendant
components.
FIG. 3 is a diagram of autoattendant components.
FIG. 4 is a diagram of table interrelations in an autoattendant relational
database.
FIG. 5 is a diagram of hierarchy records.
FIG. 6 is a flowchart illustrating the relationship between hierarchically
organized nodes and call flow navigation paths.
FIG. 7 is a screen display of a graphical user interface (GUI) that manages
an autoattendant.
FIGS. 8A-8D are screen displays of autoattendant GUI dialogs.
Description of the Preferred Embodiments
Referring to FIGS. 1A-1D, an autoattendant configuration 10 forwards
incoming calls 12 received by a switch 20, such as a PBX (private branch
exchange), to an autoattendant 24. The autoattendant 24 can ask a caller questions
to determine which extension 14, 16, 18, or 22 the caller wants to reach. The autoattendant 24 instructs the switch 20 to connect the incoming call 12 with the
determined extension, as shown in FIG. 1B.
Different extensions may belong to different groups. For example,
extensions 16 and 18 may be phones of employees in a sales department 26. If
provided with a description of this hierarchical department/employee relationship,
the autoattendant 24 can ask a caller which department they would like to reach.
After determining the caller's response, the autoattendant 24 can either forward
the call to an extension in the sales department 26 or ask the caller which
extension 16 or 18 within the sales department 26 the caller wishes to reach.
As shown in FIG. 1C, the autoattendant 24 can also process calls
originating from extensions connected to the switch 20, for example, when an
employee 14 needs to talk with another employee in a particular department 26.
After analyzing the caller's responses to questions, the autoattendant 24 can
connect the caller, as shown in FIG. 1D.
Instead of routing an incoming call 12, the autoattendant 24 can perform
further call processing. For example, the autoattendant 24 can play speech files of
technical support information or road directions.
Referring to FIG. 2, an autoattendant 24 can include a computer system
32 that includes a processor 36, memory 34, and other components such as bus
interface circuits (not shown). The computer platform 24 includes a standard PC
type keyboard 28, a pointing device such as a mouse 30, and a monitor 27. The
computer system 32 includes a mass storage element 38 such as a CD, floppy disk, hard disk, etc. The computer system 32 receives incoming calls through a
line card 37. Portions of mass storage element 38 are transferred to memory 34
and processor 36 in the course of operation.
Mass storage element 38 includes autoattendant management software 40,
data 44, and voice user interface (VUI) software 42. The management software
40 provides a graphical user interface (GUI) that displays autoattendant
information on the monitor 27 and enables a manager to quickly edit and
configure data 44 to provide a desired call flow. The VUI 42 processes incoming
calls based on the data 44 as arranged by the management software 40. Locking
techniques permit the management software 40 to alter data 44 without
interrupting VUI 42 service.
Referring to FIG. 3, data 44 includes different relational databases 50 and
52; however, other implementations use a single relational database. Each
database 50 and 52 corresponds to a different call flow and produces different
prompt 54 and 58 and grammar 56 and 60 files.
Prompt files 54 and 58 include indexed signal information used by the
VUI 42 to produce autoattendant speech. For example, after accessing a relational
database 50 or 52 to determine which prompt to play, the VUI 42 may retrieve
prompt file 54 or 58 information needed to produce a particular prompt (e.g.,
"Thank you for calling our business"). Prompt files 54 and 58 can include both
prerecorded and site-specific prompts. Grammar files 56 and 60 include indexed signal information that
represents the different speech elements the autoattendant can recognize in
response to a particular prompt (e.g., "sales, please"). After analyzing caller
speech, the VUI 42 can access a relational database 50 or 52 to determine how the
autoattendant should respond (e.g., forwarding the call to an extension or playing
another prompt).
Compiling software 64 produces grammar files 56 and 60 from relational
database 50 and 52 records. Compiling can occur incrementally, en masse
(for better run-time retrieval), or both, as when the manager initiates an incremental
compilation and also allows an off-hours compilation to be scheduled automatically.
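The difference between incremental and en-masse compilation can be illustrated with a small compiler that tracks which records changed since the last build. This is a sketch under assumptions: the `GrammarCompiler` class, the phoneme-tuple grammar keys, and the dirty-set bookkeeping are hypothetical and stand in for whatever format compiling software 64 actually produces.

```python
class GrammarCompiler:
    """Hypothetical compiler from pronunciation records to a phoneme-keyed grammar."""

    def __init__(self):
        self.source: dict[int, tuple[str, ...]] = {}           # hierarchy id -> phonemes
        self.grammar: dict[tuple[str, ...], set[int]] = {}     # phonemes -> hierarchy ids
        self._compiled: dict[int, tuple[str, ...]] = {}        # what each id was compiled as
        self._dirty: set[int] = set()

    def update(self, hierarchy_id: int, phonemes) -> None:
        """Record an edit; it takes effect at the next compile."""
        self.source[hierarchy_id] = tuple(phonemes)
        self._dirty.add(hierarchy_id)

    def _unlink(self, hierarchy_id: int) -> None:
        """Remove the stale grammar entry for a record, if it was compiled before."""
        old = self._compiled.pop(hierarchy_id, None)
        if old is not None:
            ids = self.grammar[old]
            ids.discard(hierarchy_id)
            if not ids:
                del self.grammar[old]

    def compile_incremental(self) -> None:
        """Recompile only the records changed since the last compile."""
        for hierarchy_id in self._dirty:
            self._unlink(hierarchy_id)
            key = self.source[hierarchy_id]
            self.grammar.setdefault(key, set()).add(hierarchy_id)
            self._compiled[hierarchy_id] = key
        self._dirty.clear()

    def compile_full(self) -> None:
        """Rebuild the whole grammar, e.g. as a scheduled off-hours job."""
        self.grammar.clear()
        self._compiled.clear()
        for hierarchy_id, key in self.source.items():
            self.grammar.setdefault(key, set()).add(hierarchy_id)
            self._compiled[hierarchy_id] = key
        self._dirty.clear()
```

Either path should yield the same grammar; the incremental path only does less work, which is why it suits edits made during business hours.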
The service manager 66 of the management software 40 enables a manager to
assign different call flows, as embodied in different databases 50 and 52, to
different incoming channels 62. For example, a business may have one set of phone
lines for customer inquiries and another set for technical support. The VUI 42 can
examine a call directing file 48 to determine which relational database 50 or 52
provides the assigned call flow. Many channels 62 can simultaneously use the same
relational database 50 or 52.
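The channel-to-call-flow assignment amounts to a many-to-one mapping from channels to databases. The patent does not describe the layout of call directing file 48, so the `channel range = database` line format and the `parse_call_directing` function below are hypothetical, shown only to make the mapping concrete.

```python
def parse_call_directing(text: str) -> dict[int, str]:
    """Parse lines like '1-4 = customer_inquiries' into {channel: database}.

    The line format is a hypothetical stand-in for call directing file 48.
    """
    mapping: dict[int, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop comments and blank lines
        if not line:
            continue
        channels, database = (part.strip() for part in line.split("=", 1))
        if "-" in channels:
            first, last = (int(x) for x in channels.split("-", 1))
        else:
            first = last = int(channels)
        for channel in range(first, last + 1):
            if channel in mapping:               # each channel gets exactly one call flow
                raise ValueError(f"channel {channel} assigned twice")
            mapping[channel] = database          # many channels may share one database
    return mapping
```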
Referring to FIG. 4, each relational database, such as database 50,
includes a call flow configuration (configuration) record 64. The configuration
record 64 stores data describing general parameters such as the type of switch
connected to the autoattendant and the language being used (e.g., English or
Spanish). The configuration record 64 can also store information that indicates normal business hours, holidays, and an extension (e.g., voice mail or an
operator) for handling messages received after hours.
The configuration record 64 also stores a configuration type identifier that
controls the prompts the VUI 42 uses to query a caller. For example, a "Basic"
configuration produces prompts for a flat configuration that does not nest
extensions or other groups within groups. A basic configuration type
might prompt "Please say the name of the person or department you would like to
reach" and forward the call based on the caller's response.
A "Department-Name" configuration produces prompts for a multi-level
configuration that nests extensions, groups, etc. within other groups. A
department-name configuration might prompt a caller: "To reach a party please
say the name of their department or wait for a list of departments." If the caller
responds with a department name, the VUI 42 can list the names of people in that
department for selection.
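The configuration type both selects the opening prompt and constrains how deep the hierarchy may be. A minimal sketch, assuming a hypothetical `parent_of` map (node name to parent name) and a `check_configuration` helper; the prompt wording is quoted from the description, while the depth rule for "Basic" is an interpretation of "does not nest extensions or other groups within groups".

```python
from enum import Enum


class ConfigType(Enum):
    BASIC = "Basic"
    DEPARTMENT_NAME = "Department-Name"


# Opening prompts quoted from the description; the table itself is an assumption.
ROOT_PROMPTS = {
    ConfigType.BASIC: "Please say the name of the person or department you would like to reach.",
    ConfigType.DEPARTMENT_NAME: ("To reach a party please say the name of their department "
                                 "or wait for a list of departments."),
}


def depth(node: str, parent_of: dict[str, str]) -> int:
    """Number of steps from `node` up to the root (the root has no entry)."""
    seen = set()
    d = 0
    while node in parent_of:
        if node in seen:
            raise ValueError(f"cycle at {node}")
        seen.add(node)
        node = parent_of[node]
        d += 1
    return d


def check_configuration(config_type: ConfigType, parent_of: dict[str, str]) -> str:
    """Return the opening prompt, rejecting nested nodes in a Basic configuration."""
    if config_type is ConfigType.BASIC:
        nested = sorted(n for n in parent_of if depth(n, parent_of) > 1)
        if nested:
            raise ValueError(f"Basic configuration cannot nest: {nested}")
    return ROOT_PROMPTS[config_type]
```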
Each relational database 50 includes a table of hierarchy records 66.
Referring also to FIG. 5, each hierarchy record 66 describes a node in a hierarchy
68. A node can be a group node 26, 130, 132, 134, or a terminal node. A terminal
node can represent an extension 14, 16, 18, 24, 140, a speech file 136, 138, or a
file that includes further call processing instructions (not shown). A group 26,
130, 132, 134 can hierarchically include (i.e., parent) any of the other node types.
The connections between nodes form the different navigation paths a caller can follow. Referring again to FIG. 4, a hierarchy table 66 record includes a unique
identification number used as an index into other database 50 tables.
Records in name detail 70 and group detail 72 tables further describe each
hierarchy table 66 record. For example, a name detail table 70 record includes an
employee's name and an extension the VUI 42 can use to forward an incoming
call. A group detail table 72 record includes a group name, but does not include
extension data, since a group in a department-name type configuration does not
result in call forwarding until further caller querying (e.g., "who in the department
would you like to speak with?").
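The relationship between the hierarchy table and its detail tables can be sketched with Python's built-in `sqlite3` module. The table and column names below (`hierarchy`, `name_detail`, `group_detail`, `node_type`, and so on) are hypothetical; only the overall shape follows FIG. 4 as described: a unique id in the hierarchy table, an extension in name details, and no extension in group details.

```python
import sqlite3

SCHEMA = """
CREATE TABLE hierarchy (
    id        INTEGER PRIMARY KEY,                 -- unique id, also stored in grammar entries
    parent_id INTEGER REFERENCES hierarchy(id),
    node_type TEXT NOT NULL
              CHECK (node_type IN ('config', 'group', 'name', 'speech_file'))
);
CREATE TABLE name_detail (
    hierarchy_id  INTEGER PRIMARY KEY REFERENCES hierarchy(id),
    employee_name TEXT NOT NULL,
    extension     TEXT NOT NULL
);
CREATE TABLE group_detail (
    hierarchy_id INTEGER PRIMARY KEY REFERENCES hierarchy(id),
    group_name   TEXT NOT NULL                     -- no extension: groups prompt further
);
"""


def open_db() -> sqlite3.Connection:
    """Create an empty in-memory database with the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def describe(conn: sqlite3.Connection, hierarchy_id: int):
    """Resolve a hierarchy id to the action the VUI would take for it."""
    row = conn.execute(
        """SELECT h.node_type, n.extension, g.group_name
           FROM hierarchy h
           LEFT JOIN name_detail n ON n.hierarchy_id = h.id
           LEFT JOIN group_detail g ON g.hierarchy_id = h.id
           WHERE h.id = ?""",
        (hierarchy_id,),
    ).fetchone()
    if row is None:
        raise KeyError(hierarchy_id)
    node_type, extension, group_name = row
    if node_type == "name":
        return ("forward", extension)
    if node_type == "group":
        return ("group_prompt", group_name)
    return (node_type, None)
```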
A pronunciation table 74 describes words in both the name detail 70 (e.g.,
a person's name) and group detail 72 (e.g., the name of the group) tables. For
example, a name of "John Doe" contains two words and is represented by two
corresponding pronunciation table 74 records. A pronunciation table 74 record
includes both the word and the phonemes that construct the word. For example,
the phonemes "j", "ah" and "n" describe the word "John." The compilation
process (64 in FIG. 3) stores the collected phonemes as an entry in a grammar file
along with the unique identification number of the hierarchy table 66 record that
corresponds to the pronunciation record 74. When a caller speaks, the VUI 42
detects phonemes in the caller's speech, checks the phonemes against phonemes
in the grammar file, and retrieves the hierarchy table 66 record that corresponds
with the grammar file phonemes that match the caller's speech. For example,
when a caller says "John", the VUI 42 finds j-ah-n in a grammar file and searches the relational database 50 hierarchy table 66 for the corresponding hierarchy 66
record. If the hierarchy 66 record stores a name, the VUI 42 can forward the
caller to the extension stored in a corresponding name detail 70 record. If the
hierarchy 66 record stores a speech file reference, the VUI 42 can play the speech
file to the caller. If the hierarchy 66 record stores a group, the VUI 42 can play a
group-level prompt to further query a caller.
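The lookup-and-dispatch step above can be reduced to a small sketch. It assumes exact phoneme matching, which is a simplification: a real recognizer scores acoustic hypotheses rather than comparing strings. The `HierarchyRecord` type, the action labels, and the "reprompt"/"disambiguate" outcomes for unmatched or ambiguous speech are hypothetical additions not described in the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HierarchyRecord:
    kind: str      # "name", "speech_file", or "group" (hypothetical labels)
    payload: str   # extension, speech-file reference, or group name


ACTIONS = {"name": "forward", "speech_file": "play", "group": "group_prompt"}


def compile_grammar(pronunciations):
    """Map each phoneme sequence to the hierarchy ids it stands for."""
    grammar: dict[tuple[str, ...], set[int]] = {}
    for hierarchy_id, phonemes in pronunciations:
        grammar.setdefault(tuple(phonemes), set()).add(hierarchy_id)
    return grammar


def respond(grammar, hierarchy, heard):
    """Choose the autoattendant's reaction to a recognized phoneme sequence."""
    ids = grammar.get(tuple(heard))
    if not ids:
        return ("reprompt", None)                 # not in the grammar
    if len(ids) > 1:
        return ("disambiguate", sorted(ids))      # e.g. two employees named John
    record = hierarchy[next(iter(ids))]
    return (ACTIONS[record.kind], record.payload)
```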
The management system 40 can import data into a database 50 from a
variety of sources including CSV (Comma Separated Value) files and any ODBC
(Open Database Connectivity) compliant data source (e.g., Microsoft Excel™ or
Access™), provided a manager supplies an appropriate ODBC driver. After a
manager links fields in the imported data source with autoattendant database
fields, the autoattendant can load each data source record into hierarchy 66, name
detail 70, and group detail 72 records. The autoattendant automatically produces
corresponding pronunciation table 74 records. The manager can also specify
whether an import record (e.g., a particular person's information) overwrites an
existing record or is ignored. By importing pre-existing human resources data
files, managers can quickly begin using an autoattendant without laborious data
entry.
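The field-linking and overwrite-or-ignore choices of the import step can be sketched with Python's standard `csv` module. The field names, the use of the employee name as the matching key, and the `import_csv` function are assumptions for illustration; ODBC sources and automatic pronunciation generation are not shown.

```python
import csv
import io


def import_csv(text, field_map, existing, overwrite=False):
    """Load CSV rows into `existing`, keyed by employee name.

    field_map links a CSV column to an autoattendant field; both the field
    names and the name-as-key rule are assumptions for illustration.
    Returns (added, updated, skipped).
    """
    added = updated = skipped = 0
    for row in csv.DictReader(io.StringIO(text)):
        record = {dst: row[src].strip() for src, dst in field_map.items()}
        key = record["employee_name"]
        if key not in existing:
            existing[key] = record
            added += 1
        elif overwrite:
            existing[key] = record        # manager chose "overwrite" for duplicates
            updated += 1
        else:
            skipped += 1                  # manager chose "ignore" for duplicates
    return added, updated, skipped
```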
The database 50 also includes data that controls the prompts the VUI 42
plays to a caller. The autoattendant data 44 includes pre-recorded prompts for
responses to predefined events (e.g., the caller fails to respond or the caller says
something not in the grammar file). Referring again to FIG. 5, the prompts correspond to caller navigation to different hierarchy nodes (records). For
example, navigating to a configuration node 64 can trigger a message telling a
caller that a call occurred after hours. Navigating to a group node 26, 130, 132,
134 can trigger a group prompt telling a caller to choose a particular employee or
other node within a group. Each node can have several associated prompts. The
VUI 42 can choose prompts based on caller behavior. For example, the VUI 42
can keep track of how many times a caller fails to respond and play a series of
different prompts before terminating the call.
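The escalating no-response behavior can be sketched as a per-node counter that walks through a list of prompts and ends the call when the list runs out. The `NoResponseHandler` class and its prompt texts are hypothetical; the patent does not say how many prompts are played before terminating.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NoResponseHandler:
    """Escalating prompts for a caller who keeps failing to respond at one node."""
    prompts: list[str]      # hypothetical prompt texts, played in order
    failures: int = 0

    def on_no_response(self) -> tuple[str, Optional[str]]:
        """Return ('play', prompt) until the prompts run out, then ('terminate', None)."""
        if self.failures >= len(self.prompts):
            return ("terminate", None)
        prompt = self.prompts[self.failures]
        self.failures += 1
        return ("play", prompt)
```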
Referring back to FIG. 4, the template prompt table 73 stores references to
pre-recorded prompts in a prompt file. A manager can record over a pre-recorded
prompt, perhaps including business specific information, producing an overriding
prompt table 75 record that references a different prompt in the prompt file.
During a call, the VUI 42 first checks the prompt table 75 for a prompt record
before checking the template prompt table 73. The VUI 42 can then retrieve
corresponding prompt file information to produce autoattendant speech.
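The override-then-template lookup order maps directly onto `collections.ChainMap`, which searches its mappings in order. A minimal sketch, assuming both tables are dictionaries keyed by a hypothetical `(node id, event)` pair and valued by a prompt-file reference:

```python
from collections import ChainMap


def resolve_prompt(key, prompt_table, template_prompt_table):
    """Return the prompt-file reference for `key`.

    A manager's override (prompt table 75) wins; otherwise fall back to the
    shipped default (template prompt table 73). Raises KeyError if neither has it.
    """
    return ChainMap(prompt_table, template_prompt_table)[key]
```

Keeping overrides in a separate table means a manager can restore the shipped prompt simply by deleting the override record.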
Referring to FIG. 6, call flow follows the hierarchy defined in the
database. After receiving an incoming call, the VUI 42 positions the caller at the
root of the hierarchy, the configuration node 64 (110). Thereafter, the VUI 42
engages the caller in "conversation" that controls navigation through the hierarchy
(114-124).
As discussed, each node has an associated set of prompts. The VUI 42
plays a prompt for the caller's current node position (112) based on caller behavior (e.g., how many times the caller has visited the same node). The VUI 42
analyzes a caller's response to a prompt by checking the node's associated
grammar file (114). The VUI 42 identifies a hierarchy table 66 record that
corresponds to the caller's speech. If the caller has specified a name record (i.e., a
record with an extension) (118), the autoattendant can forward the call (120). If
the caller has instead specified a group record, the VUI 42 advances the caller to
the group node corresponding to the caller's speech (124) and begins the
prompt/response exchange again (112).
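The call flow of FIG. 6 can be sketched as a loop over the hierarchy. This assumes a hypothetical `Node` class in which a node's "grammar" is simply the set of its children's names, and in which recognized speech arrives as plain strings. Real prompt playback, grammar files, and speech recognition are not modeled.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    name: str
    extension: Optional[str] = None        # set only on name (terminal) nodes
    children: list = field(default_factory=list)

    def grammar(self):
        # Hypothetical simplification: the grammar is the children's names.
        return {child.name.lower(): child for child in self.children}

def route_call(root, utterances):
    """Walk the hierarchy from the root (110). Return the extension to forward
    to, or None if the caller's speech runs out before reaching a name node."""
    node = root
    for said in utterances:
        # A real VUI would play the node's prompt (112) and recognize speech here.
        match = node.grammar().get(said.lower())  # grammar check (114)
        if match is None:
            continue                        # not in grammar: re-prompt at same node
        if match.extension is not None:
            return match.extension          # name record: forward the call (120)
        node = match                        # group record: advance (124)
    return None

root = Node("Acme", children=[
    Node("Sales", children=[Node("Ada", "101"), Node("Ben", "102")]),
    Node("Support", children=[Node("Cy", "201")]),
])
print(route_call(root, ["sales", "ben"]))
```

Because each node only matches against its own children, the sketch also shows why deeper nesting shrinks the set of responses the recognizer must consider at each step.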
Referring to FIG. 7, management software 40 significantly reduces the
complexity of managing such a system, even though the autoattendant data
architecture and call flow described above may seem complicated.
The management software 40 includes a graphical user interface (GUI) 84
constructed from different Microsoft Foundation Class (MFC) controls (e.g.,
buttons, list controls, and dialogs). The GUI 84 provides a manager with the
ability to quickly define and alter the database node hierarchy (FIG. 6) in addition
to providing an intuitive relational database management system. The GUI 84
includes a menu bar 86, toolbar buttons 88, and a side-by-side display that
includes a hierarchical list display 90, and a display of hierarchically included
nodes 92 of a selected group in the hierarchical list display 90.
The hierarchical list display 90 shows an outline of call flow as embodied
in the node hierarchy. The hierarchical list display 90 lists the names of the
configuration and hierarchy nodes. Alongside each listed name appears a folder icon 94 and a sign 93 (e.g., "+" or "-"). The sign 93 indicates whether the
hierarchical list display 90 currently shows the nodes included in that node.
Expanding a hierarchy node in the hierarchical list display 90 (e.g., by clicking
its "+" sign 93) expands the hierarchical list display 90 to show the nodes
hierarchically included within the expanded node. For example, expanding group
node 96 produces a hierarchical list display 90 that lists the included
group nodes 95 indented relative to the expanded group node 96. Closing a
hierarchy node (e.g., by clicking its "-" sign) conceals the nodes within it
from the hierarchical list display 90. For example, closing group node 96 would
conceal group nodes 95 from the hierarchical list display 90.
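The expand/collapse behavior of the hierarchical list display 90 can be sketched as an outline renderer. The `TreeItem` class and the text layout are hypothetical: "+" marks a collapsed group, "-" an expanded one, and children appear indented only under expanded nodes.

```python
from dataclasses import dataclass, field

@dataclass
class TreeItem:
    name: str
    children: list = field(default_factory=list)
    expanded: bool = False

def render(item, depth=0):
    """Return outline lines. '+' marks a collapsed group, '-' an expanded one,
    and a node with no children gets no sign."""
    if item.children:
        sign = "-" if item.expanded else "+"
    else:
        sign = " "
    lines = ["  " * depth + f"{sign} {item.name}"]
    if item.expanded:
        for child in item.children:
            lines.extend(render(child, depth + 1))
    return lines

sales = TreeItem("Sales", [TreeItem("East"), TreeItem("West")])
root = TreeItem("Main configuration", [sales, TreeItem("Support")], expanded=True)
print("\n".join(render(root)))
sales.expanded = True  # like clicking the "+" sign next to Sales
print("\n".join(render(root)))
```

Expansion state is kept on the displayed items rather than in the database, so opening or closing a node changes only the view, not the call flow.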
A manager can manipulate groups from the hierarchical list display 90.
For example, a manager can add and delete group nodes from a configuration.
The hierarchical list display 90 also offers a "drag-and-drop" capability. For
example, a manager can drag a selected group into another group. Doing so alters
the hierarchy, nesting the selected group within the other group. While presenting a
caller with more levels of navigation may seem undesirable, this technique can
help a caller quickly winnow through a tremendous amount of information,
whether extensions, technical support information, or other data. Nesting also narrows the
possible responses the VUI 42 needs to consider when matching caller speech
against a grammar file, speeding the VUI 42 response.
The hierarchically included node display 92 shows the contents of a
selected hierarchical list display 90 element. For example, selecting a group node
97 in the hierarchical list display 90 changes the selected group element's icon to
an open folder and lists hierarchically included groups and extensions in the
hierarchically included node display 92. The display 92 can include node
information (e.g., name, extension, or remarks). The display 92 can further
include management information about each node. For example, if an employee
has not recorded a pronunciation of her name, the display 92 can indicate this by
marking a node with an exclamation point (not shown). A manager can sort the
display 92 by a variety of criteria such as alphabetical order, when the node was
added to the hierarchy, etc.
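The contents listing of display 92, with a "!" on names lacking a recorded pronunciation and a choice of sort order, can be sketched as follows. The `Entry` fields, including the `added` sequence number used for "when the node was added", are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    name: str
    kind: str               # "group" or "name"
    extension: str = ""
    added: int = 0          # hypothetical insertion sequence number
    has_recording: bool = True

def contents_view(entries, sort_by="name"):
    """Rows for the included-node display. A '!' flags name nodes with no
    recorded pronunciation. Rows sort alphabetically or by when added."""
    key = {"name": lambda e: e.name.lower(), "added": lambda e: e.added}[sort_by]
    rows = []
    for e in sorted(entries, key=key):
        flag = "!" if e.kind == "name" and not e.has_recording else " "
        rows.append(f"{flag} {e.name:<10} {e.extension}".rstrip())
    return rows

entries = [
    Entry("Ben", "name", "102", added=2, has_recording=False),
    Entry("Ada", "name", "101", added=3),
    Entry("East", "group", added=1),
]
print("\n".join(contents_view(entries)))
print("\n".join(contents_view(entries, sort_by="added")))
```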
A manager can add, delete, and edit display 92 elements. A manager can
also move elements (i.e., groups or names) into different positions in the call
hierarchy by dragging-and-dropping the element into a different group in either
the hierarchically included node display 92 or the hierarchical list display 90. The
management system 40 alters database contents based on these actions. This
allows a manager to quickly reorganize data and alter call flow.
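A drag-and-drop move can be reflected in the database as a single re-parenting update. This minimal sketch assumes a hypothetical `hierarchy` table in which each row stores its parent's id. It also refuses a drop that would place a group inside its own subtree, a safeguard the patent does not describe.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    # Hypothetical stand-in for hierarchy table 66: each row points to its parent.
    conn.execute("CREATE TABLE hierarchy (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER)")
    conn.executemany("INSERT INTO hierarchy VALUES (?, ?, ?)", [
        (1, "Main", None), (2, "Sales", 1), (3, "Support", 1), (4, "Ada", 2),
    ])
    return conn

def is_descendant(conn, node_id, ancestor_id):
    """True if node_id lies in the subtree rooted at ancestor_id (inclusive)."""
    while node_id is not None:
        if node_id == ancestor_id:
            return True
        row = conn.execute("SELECT parent_id FROM hierarchy WHERE id = ?", (node_id,)).fetchone()
        node_id = row[0] if row else None
    return False

def move_node(conn, node_id, new_parent_id):
    """Re-parent a node after a drop. Refuse drops that would create a cycle."""
    if is_descendant(conn, new_parent_id, node_id):
        raise ValueError("cannot move a group into itself or its own subgroup")
    with conn:
        conn.execute("UPDATE hierarchy SET parent_id = ? WHERE id = ?", (new_parent_id, node_id))

conn = make_db()
move_node(conn, 2, 3)  # nest Sales inside Support
print(conn.execute("SELECT name, parent_id FROM hierarchy ORDER BY id").fetchall())
```

Storing only a parent pointer means moving a group automatically carries along every node nested beneath it.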
Referring to FIGS. 8A and 8B, GUI dialogs provide easy management of
database information. For example, selecting a configuration node (from FIG. 7)
for editing produces the tabbed dialogs shown in FIGS. 8A and 8B. In FIG. 8A, a
manager can edit information in dialog fields that describe a configuration record.
In FIG. 8B, a manager can alter the configuration-level prompt messages issued by the VUI 42 in response to events caused by navigation to a configuration node,
including initial call processing.
Selecting a dialog's "OK" button 100 saves the edited information in the
relational database, in this case, potentially updating the configuration record and
adding new prompt records. The management software further records when
database changes occur, in order to coordinate record locking. The "Close" button 102
discards edits. The GUI presents a manager with a "Keep changes made" dialog
when the manager ends a management session. Those familiar with database
concepts will recognize that the "OK" button resembles a SQL INSERT
command while the "Keep changes made" dialog causes a database commit or
rollback. Familiarity with such database concepts, however, is unnecessary for
system management since the GUI presents these database concepts in dialog
buttons familiar even to casual word-processing users.
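The database behavior behind the dialog buttons can be sketched with Python's `sqlite3` module. Each "OK" writes inside an open transaction, and the end-of-session "Keep changes made" answer commits or rolls back. The `ManagementSession` class and `prompt` table are hypothetical, and SQLite stands in for whatever relational database the system actually uses.

```python
import sqlite3

class ManagementSession:
    """Hypothetical sketch: dialog "OK" writes pending changes; the
    end-of-session "Keep changes made" answer commits or rolls them back."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS prompt (node TEXT, event TEXT, file TEXT)")
        self.conn.commit()

    def on_ok(self, node, event, prompt_file):
        # The sqlite3 module implicitly opens a transaction before an INSERT.
        self.conn.execute("INSERT INTO prompt VALUES (?, ?, ?)", (node, event, prompt_file))

    def end_session(self, keep_changes):
        if keep_changes:
            self.conn.commit()
        else:
            self.conn.rollback()

    def count(self):
        return self.conn.execute("SELECT COUNT(*) FROM prompt").fetchone()[0]

session = ManagementSession(":memory:")
session.on_ok("config", "greeting", "greet.wav")
session.end_session(keep_changes=False)   # discard
print(session.count())
session.on_ok("config", "greeting", "greet.wav")
session.end_session(keep_changes=True)    # keep
print(session.count())
```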
Referring to FIG. 8C, another dialog permits editing of group node
information and optionally recording a pronunciation of the group name. Altering
group node information can alter the node's description in the hierarchy table and
potentially add, delete, or modify prompt and pronunciation records. Again, the
management system 40 conceals this cascade of database changes from a
manager to ease autoattendant management.
Referring to FIG. 8D, selecting a name node produces a name properties
dialog. In this dialog, a manager can alter an employee's extension or alter the
phonemes that make up the employee's name. As in the group node dialog, a manager can record a pronunciation of the employee's name or let the
management software generate a pronunciation based on spelling. The
management software 40 also allows individual employees to remotely (i.e., from
any phone) record pronunciation of their own name. Any alterations can produce
or alter pronunciation records and produce entries in a prompt file.
Other embodiments are within the scope of the following claims. The
techniques described should not be considered limited to autoattendant functions,
but instead can be incorporated into a variety of applications.

Claims

What is claimed is:
1. A method of controlling navigation paths in a speech-recognition process in
which prompts are provided to a user, and actions are taken based on comparison of the user's
spoken responses to stored speech elements, the method comprising:
organizing the prompts, actions, and speech elements in a hierarchy of nodes
comprising group nodes and terminal nodes;
displaying the nodes in a hierarchical list that indicates navigation paths in the
speech-recognition process; and
altering the navigation paths in the speech-recognition process in response to the user's
editing of the position of the nodes in the hierarchical list.
2. The method of claim 1, wherein altering the navigation paths comprises
interacting with a graphical user interface.
3. The method of claim 1, wherein editing the position of a node in the
hierarchical list comprises a drag and drop operation which comprises editing the position of
hierarchically included nodes.
4. The method of claim 1, wherein organizing the prompts, actions, and speech
elements in a hierarchy of nodes comprises organizing records in a relational database.
5. The method of claim 1, wherein organizing the prompts, actions, and speech
elements in a hierarchy of nodes comprises adding, deleting, or editing nodes.
6. The method of claim 1, further comprising displaying hierarchically included
nodes of a selected group node in a separate list of hierarchically included nodes.
7. The method of claim 1, wherein group nodes in the hierarchical list can be
expanded to display hierarchically included nodes or collapsed to hide hierarchically included
nodes.
8. The method of claim 1, wherein the speech-recognition process comprises a
call routing process.
9. The method of claim 8, wherein terminal nodes comprise phone extensions.
10. The method of claim 9, wherein actions comprise forwarding a call to a phone
extension when the user navigates to a phone extension terminal node.
11. The method of claim 8, wherein terminal nodes comprise speech files.
12. A computer program, residing on a computer readable medium, for controlling
navigation paths in a speech-recognition process in which prompts are provided to a user, and
actions are taken based on comparison of the user's spoken responses to stored speech
elements, the program comprising instructions for:
organizing the prompts, actions, and speech elements in a hierarchy of nodes
comprising group nodes and terminal nodes;
displaying the nodes in a hierarchical list that indicates navigation paths in the
speech-recognition process; and
altering the navigation paths in the speech-recognition process in response to the
user's editing of the position of the nodes in the hierarchical list.
13. The computer program of claim 12, wherein altering the navigation paths
comprises interacting with a graphical user interface.
14. The computer program of claim 12, wherein editing the position of a node in
the hierarchical list comprises a drag and drop operation which comprises editing the position
of hierarchically included nodes.
15. The computer program of claim 12, wherein organizing the prompts, actions,
and speech elements in a hierarchy of nodes comprises organizing records in a relational
database.
16. The computer program of claim 12, wherein organizing the prompts, actions,
and speech elements in a hierarchy of nodes comprises adding, deleting, or editing nodes.
17. The computer program of claim 12, further comprising displaying
hierarchically included nodes of a selected group node in a separate list of hierarchically
included nodes.
18. The computer program of claim 12, wherein group nodes in the hierarchical
list can be expanded to display hierarchically included nodes or collapsed to hide
hierarchically included nodes.
19. The computer program of claim 12, wherein the speech-recognition process
comprises a call routing process.
20. The computer program of claim 19, wherein terminal nodes comprise phone
extensions.
21. The computer program of claim 20, wherein actions comprise forwarding a
call to a phone extension when the user navigates to a phone extension terminal node.
22. The computer program of claim 19, wherein terminal nodes comprise speech
files.
PCT/US1999/004747 1998-02-27 1999-03-01 Controlling navigation paths of a speech-recognition process WO1999044345A2 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
JP2000533989A JP2002505556A (en) 1998-02-27 1999-03-01 Controlling the course of the speech recognition process
KR1020007009503A KR20010086258A (en) 1998-02-27 1999-03-01 Controlling navigation paths of a speech-recognition process
AU29826/99A AU2982699A (en) 1998-02-27 1999-03-01 Controlling navigation paths of a speech-recognition process
EP99911100A EP1057317A2 (en) 1998-02-27 1999-03-01 Controlling navigation paths of a speech-recognition process

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US3226698A 1998-02-27 1998-02-27
US09/032,266 1998-02-27

Publications (2)

Publication Number Publication Date
WO1999044345A2 true WO1999044345A2 (en) 1999-09-02
WO1999044345A3 WO1999044345A3 (en) 1999-10-21

Family

ID=21864005

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1999/004747 WO1999044345A2 (en) 1998-02-27 1999-03-01 Controlling navigation paths of a speech-recognition process

Country Status (5)

Country Link
EP (1) EP1057317A2 (en)
JP (1) JP2002505556A (en)
KR (1) KR20010086258A (en)
AU (1) AU2982699A (en)
WO (1) WO1999044345A2 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20020023294A (en) * 2002-01-12 2002-03-28 (주)코리아리더스 테크놀러지 GUI Context based Command and Control Method with Speech recognition

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0317480A2 (en) * 1987-11-19 1989-05-24 International Business Machines Corporation Graphical menu tree
WO1993006680A1 (en) * 1991-09-24 1993-04-01 Active Voice Corporation Configurable telephone interface for electronic devices
US5414809A (en) * 1993-04-30 1995-05-09 Texas Instruments Incorporated Graphical display of data
US5493606A (en) * 1994-05-31 1996-02-20 Unisys Corporation Multi-lingual prompt management system for a network applications platform
WO1996016500A1 (en) * 1994-11-22 1996-05-30 Voysys Corporation Voice response system with programming language extension

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110297616A (en) * 2019-05-31 2019-10-01 百度在线网络技术(北京)有限公司 Talk about generation method, device, equipment and the storage medium of art
CN110297616B (en) * 2019-05-31 2023-06-02 百度在线网络技术(北京)有限公司 Method, device, equipment and storage medium for generating speech technology

Also Published As

Publication number Publication date
AU2982699A (en) 1999-09-15
KR20010086258A (en) 2001-09-10
JP2002505556A (en) 2002-02-19
WO1999044345A3 (en) 1999-10-21
EP1057317A2 (en) 2000-12-06

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AU CA JP KR

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE

AK Designated states

Kind code of ref document: A3

Designated state(s): AU CA JP KR

AL Designated countries for regional patents

Kind code of ref document: A3

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
WWE Wipo information: entry into national phase

Ref document number: 1999911100

Country of ref document: EP

ENP Entry into the national phase in:

Ref country code: JP

Ref document number: 2000 533989

Kind code of ref document: A

Format of ref document f/p: F

WWE Wipo information: entry into national phase

Ref document number: 1020007009503

Country of ref document: KR

WWP Wipo information: published in national office

Ref document number: 1999911100

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 1020007009503

Country of ref document: KR

WWW Wipo information: withdrawn in national office

Ref document number: 1020007009503

Country of ref document: KR

WWW Wipo information: withdrawn in national office

Ref document number: 1999911100

Country of ref document: EP