US20080134020A1 - Method and system for the generation of a voice extensible markup language application for a voice interface process - Google Patents


Info

Publication number
US20080134020A1
US20080134020A1 (U.S. application Ser. No. 11/877,571)
Authority
US
United States
Prior art keywords
states
application
audio
format
xml
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US11/877,571
Inventor
Ramy M. Adeeb
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsoft Technology Licensing LLC
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to US11/877,571
Publication of US20080134020A1
Assigned to MICROSOFT CORPORATION reassignment MICROSOFT CORPORATION ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: TELLME NETWORKS, INC.
Assigned to MICROSOFT TECHNOLOGY LICENSING, LLC reassignment MICROSOFT TECHNOLOGY LICENSING, LLC ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: MICROSOFT CORPORATION
Legal status: Abandoned

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/80 Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
    • G06F16/84 Mapping; Conversion
    • G06F16/88 Mark-up to mark-up conversion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/30 Creation or generation of source code
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/30 Creation or generation of source code
    • G06F8/35 Creation or generation of source code model driven
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/226 Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
    • G10L2015/228 Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics of application context

Definitions

  • Embodiments of the present invention relate to the field of data processing systems having an audio user interface and are applicable to electronic commerce. More particularly, embodiments of the present invention relate generally to the generation of markup language applications for a voice interface process.
  • The term “caller” refers generically to any user interacting over a voice interface, whether via telephone or otherwise.
  • a number of these types of phone services utilize computer implemented automatic voice recognition tools (e.g., automated speech recognition systems) to allow a computer system to understand and react to a caller's spoken commands and information.
  • the caller listens to information and prompts provided by the service and can speak to the service giving it commands and other information, thus forming a voice interface.
  • these phone services can be integrated with the World Wide Web (e.g., the Internet) to move audio data efficiently across the web to a telephonic user. More and more web devices will be developed to take advantage of the Internet infrastructure for providing information. In particular, voice can be used to interface with these phone services.
  • the phone service via a voice interface performs some task as requested or commanded by the user of the voice interface (e.g., information retrieval, electronic commerce, voice dialing, etc.).
  • a computer implemented application is written that provides the instructions necessary for allowing the user to interact with the voice interface to accomplish the task.
  • the VXML language is a web-based markup language for representing human to computer dialogs, and is analogous to the Hypertext Markup Language (HTML).
  • the VXML language interacts with a voice browser that outputs audio that is either recorded or computer generated. Also, the VXML language assumes that input through voice or telephone pad is provided as audio input.
  • VXML as a high-level, domain-specific markup language is currently being proposed to the World Wide Web Consortium (W3C) as the standard language for voice applications over the voice web marketplace.
  • Writing a VXML application for a particular phone service can be particularly time consuming and an inefficient use of human resources once the actual coding process begins.
  • the process includes creating the design documents that outline the overall voice interface process as envisioned by the customer and the voice application developer.
  • the voice application is coded by hand in VXML from the design documentation to provide the instructions necessary for the user to interact with a phone service using the voice interface through a network.
  • a software developer is assigned the task of coding each of the various steps required in the voice interface process. At times, this becomes a redundant exercise as many sequences of instructions and various parts of the coded instructions are repeatedly used throughout the final coded voice application. Furthermore, as the voice interface process becomes more complex, the amount of repetition and the chance for error in writing the code increases.
  • additional documentation may be provided to the phone service in support of the voice application.
  • this additional documentation provides for further representations of the VXML application in a coded format (e.g., a web based representation of the voice interface process).
  • various embodiments of the present invention disclose a method and system for an extensible framework from which a Voice Extensible Markup Language (VXML) application can be automatically generated from design documentation of a voice interface process, thus utilizing human resources more efficiently, and reducing the chance for errors in writing the coded application.
  • embodiments of the present invention allow for the automatic generation of various other representations of a voice interface process, such as Hypertext Markup Language (HTML) documentation, or any other application-based markup.
  • embodiments of the present invention describe a method and system for Extensible Markup Language (XML) application transformation.
  • a method is disclosed for the automatic generation of markup language applications (e.g., a VXML application) for a voice interface process.
  • a call flow diagram is converted into a list of states in an XML format.
  • the call flow diagram is part of the design documentation that describes the steps to the voice interface process.
  • Each of the steps in the call flow diagram is represented by a state in the list of states. Descriptions relating to the type of state and the next transition state are included in the list of states.
  • the list of states is a high level and intermediate representation of the call flow diagram.
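  • As a concrete illustration, a short list of states in such an XML format might look like the following sketch. The element and attribute names here are hypothetical, chosen only to illustrate how states, types, and transitions could be represented; they are not the patent's actual schema.

```xml
<!-- Hypothetical list of states; names and attributes are illustrative only -->
<application>
  <state type="start" name="ConsumerServices">
    <transition next="Welcome"/>
  </state>
  <state type="audio" name="Welcome" audiopath="welcome">
    <transition next="MainMenu"/>
  </state>
  <state type="input" name="MainMenu" audiopath="main_menu">
    <transition input="account balance" next="AccountBalance"/>
    <transition next="Operator"/> <!-- no text: the default transition -->
  </state>
</application>
```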
  • a lookup table of entries in XML is created to map audio prompts and their audio files with corresponding audio states in the list of states.
  • the lookup table of entries is created from a textual format of a spreadsheet that displays a plurality of audio prompts for audio files and their corresponding textual representations with their corresponding states that play an audio file. More particularly, the lookup table of entries comprises an audio path to the location of each of the particular audio files, or the particular audio file itself.
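  • For example, an entry in such a lookup table might take the following hypothetical form, pairing a state that plays audio with its prompt text and the path to its audio file (element names are assumed for illustration):

```xml
<!-- Hypothetical lookup-table entry; element names are illustrative only -->
<prompts>
  <prompt state="Welcome">
    <audiopath>audio/welcome.wav</audiopath>
    <text>Welcome to consumer services.</text>
  </prompt>
</prompts>
```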
  • an intermediate application is created in the XML format by starting from the list of states along with their corresponding state and transition information, and in particular, merging corresponding entries in the lookup table with associated audio states.
  • the intermediate application at this point is still a high-level XML representation of the call flow diagram and the voice interface process.
  • the XML representation provides for a well defined and highly flexible representation of the voice interface process.
  • the intermediate application is then transformed into a second application of a second format that is a representation of the call flow diagram. Since the intermediate application is in a structured and well defined extensible XML format, transformation to other extensible and non-extensible markup languages is possible.
  • the second application is in a VXML format.
  • the second application is in an HTML format to provide for web page documentation of the voice interface process.
  • the second application is in a text format to provide for test case documentation in a quality assurance capacity.
  • each of the states and their associated information in the intermediate XML representation is transformed into preliminary VXML instructions. This is accomplished using a standard template that corresponds to the particular state that is being transformed.
  • Second, features that have not been implemented in the XML code for the intermediate XML representation are fully expanded in the VXML code format. This provides for a detailed coded implementation of the voice interface process.
  • Third, optimization of the VXML code is performed in order to streamline and conform to the VXML format. In particular, redundant states or steps are eliminated and various “if” steps are combined.
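  • As a sketch of the template step, a non-interactive audio state in the intermediate XML could expand into a VXML form such as the one below. The intermediate-state schema and file names are assumptions for illustration; the VXML elements used (form, block, audio, goto) are standard VXML.

```xml
<!-- Hypothetical intermediate state -->
<state type="audio" name="Welcome" audiopath="welcome">
  <transition next="MainMenu"/>
</state>

<!-- Preliminary VXML produced from a standard template for audio states -->
<form id="Welcome">
  <block>
    <audio src="audio/welcome.wav">Welcome to consumer services.</audio>
    <goto next="#MainMenu"/>
  </block>
</form>
```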
  • FIG. 1 is a logical block diagram of a computer system with Extensible Markup Language (XML) transformation capabilities, in accordance with one embodiment of the present invention.
  • FIG. 2 is a flow chart of steps in a method for the transformation of design documentation into a web based application that is a detailed representation of the call flow of the design documentation, in accordance with one embodiment of the present invention.
  • FIG. 3 is a data flow diagram illustrating the flow of data through the application generator, in accordance with one embodiment of the present invention.
  • FIG. 4 is a data flow diagram illustrating the merging of audio states including audio prompts from a look-up table with corresponding states that play audio files corresponding to the audio prompts during the creation of the intermediate XML application, in accordance with one embodiment of the present invention.
  • FIG. 5 is a flow chart of steps in a method for the transformation of design documentation to a VXML application that is a detailed representation of the call flow from the design documentation, in accordance with one embodiment of the present invention.
  • FIG. 6 is a data flow diagram illustrating the flow of data to transform scripts of states in the intermediate XML application into default preliminary VXML instructions, in accordance with one embodiment of the present invention.
  • FIG. 7 is an exemplary call flow diagram of steps in a first module of states for services performed in connection with accessing account information via a voice interface process, in accordance with one embodiment of the present invention.
  • FIG. 8 is a diagram of an exemplary web page illustrating the transformation of the intermediate application into the hypertext markup language format, in accordance with one embodiment of the present invention.
  • FIG. 1 is a block diagram of exemplary embedded components of such a computer system 100 upon which embodiments of the present invention may be implemented.
  • Exemplary computer system 100 includes an internal address/data bus 120 for communicating information, a central processor 101 coupled with the bus 120 for processing information and instructions, a volatile memory 102 (e.g., random access memory (RAM), static RAM, dynamic RAM, etc.) coupled with the bus 120 for storing information and instructions for the central processor 101 , and a non-volatile memory 103 (e.g., read only memory (ROM), programmable ROM, flash memory, EPROM, EEPROM, etc.) coupled to the bus 120 for storing static information and instructions for the processor 101 .
  • Computer system 100 may also include various forms of disc storage 104 for storing large amounts of information.
  • an optional signal Input/Output device 108 is coupled to bus 120 for providing a communication link between computer system 100 and a network environment.
  • signal Input/Output (I/O) device 108 enables the central processor unit 101 to communicate with or monitor other electronic systems or analog circuit blocks that are coupled to the computer system 100 .
  • the computer system 100 is coupled to the network (e.g., the Internet) using the network connection, I/O device 108 , such as an Ethernet adapter coupling the electronic system 100 through a firewall and/or a local network to the Internet.
  • An output mechanism may be provided in order to present information at a display 105 or print output for the computer system 100 .
  • input devices 107 such as a keyboard and a mouse may be provided for the input of information to the computer system 100 .
  • various embodiments of the present invention disclose a method and system for an extensible framework from which various markup language applications can be automatically generated from design documentation of a voice interface process, thus utilizing human resources more efficiently.
  • embodiments of the present invention allow for the automatic generation of various other representations of a voice interface process, such as Hypertext Markup Language (HTML) documentation, or any other application-based markup.
  • the extensible framework generates a VXML application as a representation of a voice interface and is implemented via a gateway system running voice browsers that interpret a voice dialog markup language in order to deliver web content and services to telephone and other wireless devices.
  • the VXML language is a web-based markup language for representing human to computer dialogs, and is analogous to HTML.
  • the VXML language assumes a voice browser with audio output that is either recorded or computer generated. Also, the VXML language assumes that input is provided as audio, through voice or the telephone keypad.
  • VXML is an XML application that defines a tree-like structure that the user can traverse through using voice commands.
  • a VXML Document Type Definition (DTD) defines the structure and grammar of a particular VXML application or related applications.
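  • A minimal VXML document illustrating this tree-like dialog structure is sketched below. This is a generic example of standard VXML, not output generated by the described system; the prompt text and grammar file name are assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="main_menu">
    <field name="choice">
      <prompt>Please say balance or operator.</prompt>
      <grammar src="menu.grxml" type="application/srgs+xml"/>
      <filled>
        <!-- Echo the recognized input back to the caller -->
        <prompt>You said <value expr="choice"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```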
  • FIG. 2 is a flow chart 200 of steps in a computer implemented method for the generation of applications from design documents describing a voice interface process, in accordance with one embodiment of the present invention.
  • the method describes an extensible framework from which the generation of markup language applications from design documentation of a voice interface process is possible.
  • the process disclosed in FIG. 2 is first discussed to provide a general overview of the method of generating a VXML application from design documentation. The particularities of the method are discussed in more detail with respect to the figures following FIG. 2 .
  • the present embodiment begins by converting a call flow diagram into a list of states in an XML format, in step 210 of FIG. 2 .
  • the list of states comprises a finite state machine.
  • the call flow diagram outlines each of the steps implemented in a voice interface process.
  • the list of states describes each of the steps in a voice interface process as outlined in the call flow diagram.
  • the list of states provides for a high level representation of the call flow diagram of the voice interface process.
  • the present embodiment creates a lookup table of audio states in the XML format that maps audio prompts to audio files to corresponding audio states in the list of states.
  • the lookup table of audio states comprises an audio path that describes the web based path to the location of the audio file, and a textual representation of the audio file.
  • the lookup table of audio states comprises the actual audio file itself along with the textual representation of the audio file.
  • the present embodiment creates an intermediate application representing the voice interface process in the aforementioned XML format.
  • the intermediate application is created by merging the lookup table of audio states into the list of states.
  • audio states in the lookup table are merged into corresponding states in the list of states playing an audio playback from an associated audio file.
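  • The effect of the merge can be sketched as follows, with hypothetical element names: the prompt text and audio path from the lookup-table entry are folded into the matching state of the list of states.

```xml
<!-- Before the merge: the state references its prompt only by audio path -->
<state type="audio" name="Welcome" audiopath="welcome">
  <transition next="MainMenu"/>
</state>

<!-- After the merge: the lookup-table entry is embedded in the state -->
<state type="audio" name="Welcome">
  <audio src="audio/welcome.wav">Welcome to consumer services.</audio>
  <transition next="MainMenu"/>
</state>
```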
  • the present embodiment transforms the intermediate application in the XML format into a second application of a second markup language format.
  • the second application is of an HTML format
  • the second application is a source code for generating a web page comprising a tabular representation of the list of states including links between related states.
  • the present embodiment transforms the intermediate application in the XML format into a second application of a VXML format.
  • the generated VXML application is a static representation of the call flow diagram describing the voice interface, in one embodiment.
  • the static nature of the VXML application of the voice interface process allows the voice interface to be implemented in any browser environment using any supporting electronic device.
  • FIG. 3 is a data flow diagram 300 illustrating the transformation of the design documentation describing a voice interface process into various applications representing the voice interface process through a markup application generator 310 , in accordance with one embodiment of the present invention.
  • In stage 1 of the data flow diagram 300 , the user interface design of the voice interface process is documented as a call flow diagram 320 .
  • the call flow diagram 320 is a flow chart outlining the various steps and procedures necessary to implement the voice interface process. As such, the call flow diagram 320 is a high-level representation of the voice interface process.
  • the user interface design of the voice interface process is documented as a master script 325 .
  • the master script 325 represents a set of audio states with the audio prompts that are associated with corresponding states in the list of states that play an audio file. More particularly, the master script comprises the audio path through a network to each of the locations of audio files played by those states that play an audio file.
  • the corresponding textual representations of the audio files are included within the master script 325 .
  • the actual audio file can be contained in the master script 325 , in one embodiment. As such, the audio path or audio files and their corresponding textual representations can be cross-referenced with the corresponding states that play an audio file.
  • the master script 325 is created in a textual format, such as the Excel spreadsheet format, and can be saved as a tab delimited text file. Moreover, the master script is written in normal script and not concatenated script, in one embodiment.
  • Both the call flow diagram 320 and the master script 325 are input into the application generator 310 in stage 2 of FIG. 3 , in one embodiment.
  • the call flow diagram 320 is converted into the XML format that conforms to a control flow language (CFL) outlined by a document type definition (DTD), in one embodiment.
  • the CFL document is an XML representation of an application consisting of one or more modules.
  • Each of the modules is a collection of states, or more accurately, a finite state machine.
  • the CFL document is a list of states 330 .
  • Each of the states include the type of state, the name of the state, and the transitions between states.
  • Embodiments of the present invention enable the conversion to the CFL format through a transformation script or through a web interface.
  • the call flow diagram 320 is created using the Microsoft Visio application.
  • By following a predetermined set of rules for representing the user interface design of a voice interface process in Visio, the application generator 310 can, through a transformation script, automatically transform the call flow diagram into the CFL format.
  • a document type definition (DTD) for XML scripts conforming to the CFL language is outlined below. It is appreciated that the CFL DTD is exemplary only, and that other DTDs can be created to transform the call flow diagram 320 into a corresponding XML format for further transformation.
  • the exemplary CFL DTD is as follows in Table 1:
  • CFL is an XML representation of the call flow of a voice application.
  • CFL represents a finite state machine with a type and a name for each state and the transitions between states. CFL does not include any information on the inner components of the states or the associated output.
  • <!-- Describes an application as a finite state machine of one or more states --> <!ELEMENT application (state+)> <!-- Used to uniquely identify the state. -->
  • Each state has a type and a unique name. The type can be one of six different types: "start", a start state that has one transition; "fork", a state where a Boolean decision is evaluated that determines the call flow.
  • FIG. 7 illustrates a call flow diagram 700 of an exemplary voice interface process used as an example throughout this Specification, in accordance with one embodiment of the present invention.
  • the call flow diagram 700 describes a voice interface allowing a user to interact with the consumer services division of a company in order to access an account balance.
  • An exemplary set of rules as outlined in the CFL DTD for representing the user interface design of a voice interface process is outlined in the following paragraphs, and as is shown in FIG. 7 . It is appreciated that the predetermined set of rules can vary depending on the various approaches that can be implemented for transforming the call flow diagram 320 into the CFL language.
  • the Visio call flow is comprised of one or more modules that represent the call flow diagram 320 .
  • a module consists of a finite set of states, wherein each of the states is a represented block or step in the call flow diagram 320 .
  • block 710 represents a non-interactive input state, where the voice interface application is not expecting a response from the user.
  • a module is specified using a set of states connected to each other via state transitions.
  • a module must have exactly one start state. Module names must be unique throughout the application generated from the call flow diagram 320 .
  • a module may reference other modules in the application via module states.
  • modules may be internal (e.g., by copy) or external (e.g., by reference only).
  • An internal module is a module that is not a standalone application.
  • A classic example is explicit confirmation.
  • internal modules are implemented by replacing the call to the module with the actual module code, hence the synonym “by copy.”
  • an external module is one that can be a standalone application. Examples of external modules include functions like Main Menu, Address Capture, Package Tracking, and trading. An external module is implemented by referencing the module code, hence the synonym “by reference.”
  • a state in a module is represented via a block shape in Visio.
  • Each state may have zero or more state transitions depending on its type.
  • a state transition is represented by connecting between the various blocks in the call flow diagram.
  • a state transition may have associated text, depending on the type of the predecessor state. The text associated with state transitions is referred to as transition text.
  • a state must be one of the following types: start, input, binary fork, multiple fork, non-interactive audio, system, magic word, module, and end state.
  • the state type is determined through the shape used to represent the state, as will be discussed as follows:
  • a start state is represented in the call flow diagram 320 using the shape of a circle.
  • Block 705 of FIG. 7 is an example of a start state.
  • a start state must have no predecessor.
  • a start state must have exactly one successor. Transition text coming out of the start state is not required and will be ignored.
  • a start state must have a state name that indicates the name of the module. The name can be specified either through the “State Name” property, or through the actual text inside the state shape.
  • a start state must have a “Module Type” property indicating the type of the module.
  • An input state is represented using the “Input or Form” square box.
  • Block 715 of FIG. 7 is an example of an input state.
  • An input state is one where user is prompted for an input that is then recognized against a grammar.
  • An input state must have one or more predecessors.
  • An input state must have one or more successors. Transitions to the next step or block indicate the input result associated with the transition. At most one transition out of an input state may have no associated text, in which case it will be considered the default transition.
  • the “Audio Path” custom property for an input state must be specified. It must match a path in the associated lookup table of the master script 325 .
  • a binary fork state is represented using the “Fork Decision” diamond box.
  • Block 720 of FIG. 7 is an example of a binary fork state.
  • a binary fork state indicates the performance of a Boolean decision that is either true or false.
  • a binary fork state must have one or more predecessors.
  • a binary fork state must have exactly two successors. Transitions out of the binary fork states must have the associated text “YES” and “NO”.
  • a multiple fork state is represented using the “Fork Decision” diamond box.
  • a multiple fork state indicates forking the call flow into various paths depending on the value of a certain variable or state.
  • a multiple fork state must have one or more predecessors.
  • a multiple fork state must have at least two successors. Transitions out of the multiple fork state can have associated text. At most one transition out of a multiple fork state may have no associated text, in which case it will be considered the default transition.
  • a non-interactive audio state is represented using the “non-interactive audio” box.
  • Block 725 of FIG. 7 represents a non-interactive audio block.
  • a non-interactive audio state must have one or more predecessors.
  • a non-interactive audio state must have exactly one successor. Transition text coming out of the non-interactive audio block is not necessary.
  • the “audio path” property for a non-interactive audio state must be specified. It must match a path in the associated lookup table in the master script 325 .
  • the non-interactive state has a required “Function” property.
  • the “function” can be either “Queue Audio” or “Queue and Play Audio”. “Queue Audio” is the default value and means the audio will be queued but will not be played until the next listen state. “Queue and Play Audio” means the audio will be played in the current state. If the audio is played, no special state grammar will be active, but the user will be allowed to utter any of the universal commands recognized by the application generated by the application generator 310 .
  • a system process state represents one of the various system functions.
  • Block 725 of FIG. 7 illustrates a system state.
  • a system process state must have one or more predecessors.
  • a system process state may have zero or one successor depending on the system function.
  • Functions include: Transfer, Record, Application Programming Interface (API) Call, Data, and Disconnect.
  • Transfer Function represents a call transfer, and may or may not have a successor.
  • Record represents a recording state.
  • a Record state must have one successor.
  • the API Call is a call to an external API through the data tag. API Calls must have one successor.
  • the data function is where actual manipulation of data takes place. Data manipulation implies assigning values to variables that are used later in the application. Data functions must have one successor.
  • Disconnect function ends the call by hanging up on the user.
  • a disconnect function may have no successors implying end of the call, or may have one successor implying post hang up processing.
  • a magic-word content audio state is represented using the “magic-word content” box.
  • the application implementing the call flow diagram 320 can be interrupted with a particular “magic-word,” but is otherwise not interruptible.
  • a magic-word content state must have one or more predecessors.
  • a magic-word content state must have exactly one successor. Transition text coming out of the magic-word state is not necessary.
  • a module state is represented using the “subroutine or module” box.
  • Block 730 of FIG. 7 illustrates a module block.
  • a module state must have one or more predecessors.
  • a Module state may have zero or one successor.
  • a module is allowed to have a successor if and only if the actual called module has a return state. The actual module to be called is specified through the “Module” property of the state. If the “Module” property is empty, the state text is used instead.
  • An end state is represented using the “End” circle box.
  • An end state is only allowed in internal modules. External Modules may or may not have an end state.
  • An end state must have one or more predecessors.
  • An end state can not have a successor.
  • An end state must be one of two types: “Return” end state or “Reprompt” end state.
  • the end state type is specified through the state text.
  • a “Return” state implies returning from the current module. The transition to the return state is replaced with a transition to the (then required) successor of the calling module state.
  • a “Reprompt” state implies transitioning to a previously visited prompt state. The transition to the “Reprompt” state will be replaced with a transition to the first input state that is a predecessor of the actual module state.
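The predecessor and successor constraints enumerated above lend themselves to a mechanical check. The sketch below is a hypothetical validation routine, assuming a simplified mapping of state types to allowed successor counts; the type names and the rule table are illustrative, not part of the CFL format itself:

```python
# Hypothetical sketch: validating successor counts for call-flow states,
# based on the rules listed above. State-type names are illustrative.
SUCCESSOR_RULES = {
    "record":     {1},        # a Record state must have exactly one successor
    "api_call":   {1},        # API Calls must have one successor
    "data":       {1},        # Data functions must have one successor
    "disconnect": {0, 1},     # none, or one for post-hang-up processing
    "transfer":   {0, 1},     # may or may not have a successor
    "magic_word": {1},        # exactly one successor
    "module":     {0, 1},     # successor allowed only if the module returns
    "end":        {0},        # an end state cannot have a successor
}

def validate_state(state_type, successor_count):
    """Return True if the successor count is legal for the state type."""
    allowed = SUCCESSOR_RULES.get(state_type)
    if allowed is None:
        raise ValueError(f"unknown state type: {state_type}")
    return successor_count in allowed
```

A checker of this shape can run over every state in the list of states before any transformation is attempted, so malformed call flow diagrams are rejected early.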
  • the example of the list of states 330 in the CFL language corresponds to a portion of the blocks in FIG. 7 (blocks 705 , 710 , 715 , 717 , 718 , 719 , 720 , and 725 ) as outlined below in Table 2. Corresponding blocks are noted in enclosed brackets, such as, (block 705 ).
  • the master script 325 text document containing the audio prompts is converted into the XML format that conforms to a master script language (MSL) outlined by a document type definition (DTD), in one embodiment.
  • the MSL document is an XML representation of the states that play an audio file.
  • the MSL document represents a look-up table of audio states 335 with the audio prompts necessary for states in the list of states to play their associated audio files. Conversion to the look-up table of audio states 335 corresponds to step 220 of FIG. 2 .
  • Embodiments of the present invention enable the conversion to the MSL language through a transformation script or through a web interface.
  • a document type definition (DTD) for XML scripts conforming to the MSL language is outlined below. It is appreciated that the MSL DTD is exemplary only, and that other DTDs can be created to transform the master script 325 into a corresponding XML format for further transformation.
  • the exemplary MSL DTD is as follows in Table 3:
  • MSL is an XML representation of the Master Script submitted with a Voice Application.
  • MSL represents a set of states with the audio prompts played in each state. MSL does not describe the transitions between the states or their relationship to each other.
  • --> <!-- Describes an application as a set of one or more states --> <!ELEMENT application (state+)> <!-- Used to uniquely identify the state.
  • Each state has a name and an optional audiopath as attributes.
  • a state can have audio elements as direct children, or can have audio elements grouped together under some sub-state, one of: ni1, ni2, nm1, nm2, nm3, and help.
  • <!ATTLIST state name ID #REQUIRED audiopath CDATA #IMPLIED > <!-- audio can be either a file, or the playback of some variable, such as playing back a phone number obtained at the state GetPhoneNumber. In this case value will be GetPhoneNumber and type will be phoneNumber.
  • An example of the look-up table of audio states 335 in the MSL language of the XML format is provided below.
  • the example of the look-up table of audio states in the MSL language corresponds to block 717 of FIG. 7 as outlined below in Table 4.
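A look-up table of this kind can be built by walking an MSL-style document. The sketch below assumes element and attribute names suggested by the DTD fragment above (application, state, audio; name and audiopath); the sample document is hypothetical, not taken from Table 4:

```python
import xml.etree.ElementTree as ET

# Hypothetical MSL-style document; the element and attribute names are
# assumptions based on the DTD fragment above.
MSL_DOC = """
<application>
  <state name="DemoMainGetHomePhone" audiopath="0300_demo/main/get_home_phone/">
    <audio src="initial.wav">Please say your home phone number.</audio>
    <audio src="help.wav">Please say or enter your home phone number.</audio>
  </state>
</application>
"""

def build_audio_lookup(msl_xml):
    """Build a look-up table: state name -> list of (audiopath, src, text)."""
    table = {}
    root = ET.fromstring(msl_xml)
    for state in root.findall("state"):
        path = state.get("audiopath", "")
        prompts = [(path, a.get("src"), (a.text or "").strip())
                   for a in state.findall("audio")]
        table[state.get("name")] = prompts
    return table
```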
  • the intermediate presentation II, the list of states 330 , which conforms to the CFL language, and the look-up table of audio states 335 , which conforms to the MSL language, are combined together into an XML representation of the entire user interface design documents (e.g., the call flow diagram 320 and the master script 325 ).
  • the combined XML representation is referred to as an intermediate XML application, and corresponds to step 240 of FIG. 2 .
  • the combined XML representation is referred to as the Tellme User Interface Design Language, or TUIDL.
  • the TUIDL document represents an application as a set of modules. Each module is a finite state machine. The actual content of the state and the transition between states is explicitly specified as a high level representation of the voice interface process.
  • a document type definition (DTD) for XML scripts conforming to the TUIDL language is outlined below. It is appreciated that the TUIDL DTD is exemplary only, and that other DTDs can be created to merge the look-up table 335 of audio states with the list of states 330 .
  • the exemplary TUIDL DTD is as follows in Table 5:
  • TUIDL is an XML representation of the complete design of the User Interface Voice Application.
  • TUIDL represents an application as a set of modules.
  • Each module is a finite state machine. The actual content of the state and the transition between states is explicitly specified.
  • of each state --> <!-- Describes an application as a finite state machine of one or more modules --> <!ELEMENT application (module+)> <!-- Used to uniquely identify a module.
  • Each module has a type and unique name.
  • Type can be either internal or external --> <!ELEMENT module (state+)> <!ATTLIST module name ID #REQUIRED type (internal | external) #REQUIRED >
  • Children include: transition (transition to the next state); property (set of state-specific properties); feature (UI features to be applied to the state) --> <!ELEMENT state (property
  • help)*> <!-- Attributes for a state include: name (required ID); audiopath (required for states where audio is queued); type.
  • the merging of the list of states 330 in the CFL language and the look-up table of audio states 335 in the MSL language is accomplished by mapping the audiopath properties of the various states of the CFL document 330 with the audio paths of the various states of the MSL document 335 .
  • States in the CFL document 330 may maintain a many-to-one relationship with states in the MSL document 335 , e.g., more than one state in the CFL document 330 may map to the same audio state playing an audio file in the MSL document 335 . However, each state in the CFL document maps to at most one audio state in the MSL document 335 .
  • the merging of the look-up table of audio states 335 with the corresponding audiopath properties of states playing an audio file in the list of states 330 corresponds to step 230 of FIG. 2 .
  • the merging of the audiopath properties into corresponding states playing an audio file in the list of states is a high level XML representation of the voice interface process.
  • FIG. 4 is a data flow diagram 400 illustrating the merging of the audio prompts in the look-up table 335 of audio states with corresponding states in the list of states 330 conforming to the CFL language.
  • a module 410 is presented in a state machine format.
  • a collection of states 415 comprises module 410 and includes states 1 , 2 , 3 , 4 , etc.
  • State 2 , shown as state 417 , and state 4 , shown as state 419 , are states that play an audio file.
  • audio path properties are contained in audio script for each of the states in the list of states that play an audio file.
  • a plurality of audio states 420 containing audio prompts for each of the states playing an audio file comprises the look-up table 335 in the MSL language.
  • the audio states refer to audiopath properties for the playing of the audio files. For example, the audiopath properties 425 for input state 2 and the audio path properties 427 for the audio state 4 are illustrated.
  • the list of states in the CFL language is merged with the look-up table 335 containing the audio path properties for audio files that are played, in one embodiment of the present invention.
  • each of the audio path properties is incorporated directly into the corresponding state that plays an audio file.
  • the audio path properties 425 for state 2 are directly incorporated into state 417 corresponding to input state 2 .
  • the audio path properties 427 for state 4 are directly incorporated into the state 419 corresponding to input state 4 .
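The merge shown in FIG. 4 can be sketched as follows, assuming the list of states and the look-up table are held in simple dictionaries and the look-up table is keyed by audiopath; the field names are illustrative:

```python
# Hypothetical sketch of the merge step: each CFL state that plays audio
# carries an audiopath, which is looked up in the MSL table, and the audio
# prompts are incorporated directly into the state. Field names are
# illustrative, not the actual CFL/MSL schema.
def merge_cfl_msl(cfl_states, msl_table):
    """Return CFL states with audio prompts merged in from the MSL table.

    Several CFL states may share one audiopath (many-to-one), but each
    CFL state maps to at most one entry in the MSL table.
    """
    merged = []
    for state in cfl_states:
        state = dict(state)                   # do not mutate the input
        path = state.get("audiopath")
        if path is not None:
            state["audio"] = msl_table[path]  # incorporate prompts directly
        merged.append(state)
    return merged
```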
  • An example of the intermediate XML application 340 in the TUIDL language is provided below, and corresponds to a portion of the blocks in FIG. 7 (blocks 717 , 725 , and 720 ) as outlined below in Table 6. Corresponding blocks are noted in enclosed brackets, such as, (block 717 ).
  • the audio prompts are not separated from the call flow diagram 320 .
  • in that case, the CFL document 330 and the MSL document 335 would be unnecessary.
  • two inputs are directly used in part 2 of stage 2 , the intermediate presentation II.
  • the list of states, and corresponding audio paths with their textual representations are used to create the intermediate XML application that represents the voice interface process.
  • the application generator 310 establishes an extensible framework allowing the generation of various markup language applications from the design documentation.
  • the extensible design of the application generator 310 allows for the generation of VXML applications, HTML applications, or any other markup-based applications, as an output.
  • the intermediate XML application 340 is transformed into applications of various formats, in one embodiment of the present invention.
  • the XML format is a general and highly flexible representation of any type of data. As such, transformation to any markup language based application can be systematically performed in an extensible manner.
  • the application generator 310 can transform the intermediate XML application 340 into a VXML application 350 that is a static representation of the call flow diagram 320 , in one embodiment.
  • the static nature of the VXML application 350 of the voice interface process allows the voice interface to be implemented in any browser environment using any supporting electronic device.
  • the application generator 310 can also transform the intermediate XML application 340 into an HTML application 360 , in one embodiment.
  • the HTML application 360 is source code for generating a web page comprising a tabular representation of the list of states with links between related states.
  • FIG. 8 is a diagram illustrating the web page or the HTML document 800 for block 717 of FIG. 7 which corresponds to the “DemoMainGetHomePhone” state.
  • the HTML document 800 corresponds to the voice interface process as outlined in the call flow diagram 320 .
  • the directory name for the state is presented in cell 810 .
  • the various audio prompts and files that are played are displayed in logical fashion to present an overall process view of the voice interface. For example, the main prompt is presented in cell 820 .
  • the transition state is presented in cell 860 .
  • links to other states in the HTML document 800 can also be provided, in one embodiment. As such, by clicking on the link to “UsedVoice,” the portion of the HTML document corresponding to the “UsedVoice” state would be presented.
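The tabular rendering with inter-state links can be sketched as below. The cell labels loosely follow FIG. 8, but the layout, field names, and sample state are assumptions for illustration:

```python
from html import escape

# Hypothetical sketch: rendering one state as an HTML table with links to
# successor states, so clicking a link jumps to that state's section.
def state_to_html(state):
    links = ", ".join(f'<a href="#{escape(t)}">{escape(t)}</a>'
                      for t in state.get("transitions", []))
    rows = [("Directory", escape(state["name"])),
            ("Main prompt", escape(state.get("prompt", ""))),
            ("Transition", links)]        # links are already safe HTML
    cells = "\n".join(f"<tr><th>{label}</th><td>{value}</td></tr>"
                      for label, value in rows)
    return f'<table id="{escape(state["name"])}">\n{cells}\n</table>'
```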
  • the application generator 310 can also transform the intermediate XML application 340 into any other application based markup, or any textual format, in one embodiment of the present invention.
  • the application generator 310 can transform the XML application 340 into an application of a text format, wherein the textual application is a quality assurance (QA) application that is used for testing performance of the VXML application 350 .
  • the application generator 310 is not limited to creating certain functionalities of a voice interface application, but is designed in an extensible fashion allowing the generation of VXML coded applications that can perform any task, as long as the task can be represented in a clear and well defined set of VXML instructions.
  • FIG. 5 is a flow chart 500 of steps illustrating a method for converting the intermediate XML application 340 in the TUIDL language into a VXML application 350 , in accordance with one embodiment of the present invention.
  • the conversion occurs in a three step process.
  • In step 510 , the present embodiment transforms each state in the intermediate XML application into preliminary VXML instructions.
  • Standard templates are used to convert each state in the intermediate XML application 340 into a default VXML instruction or representation.
  • FIG. 6 is a diagram illustrating the application of the standard templates to convert states in the intermediate XML application into VXML instructions.
  • FIG. 6 corresponds to the process illustrated in step 510 of FIG. 5 .
  • the script 610 for state “x” in the intermediate XML application has a defined state type.
  • the standard template for the state type corresponding to state “x” is applied to the script 610 in the conversion process to VXML instructions.
  • a plurality of standard templates can be applied to the script 610 in order to convert the script for state “x” into VXML instructions.
  • Embodiments of the present inventions include numerous standard templates for converting script for states into default VXML instructions, including numerous standard templates for a single type of state. The selected standard templates are chosen according to design preference.
  • the plurality of standard templates includes the start state template 612 .
  • the template 612 would be applied to the script 610 to generate preliminary VXML instructions 620 .
  • the template 614 would be applied to the script 610 to generate corresponding preliminary VXML instructions 620 . This process would occur for every state in the intermediate XML application.
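Conceptually, this step is template dispatch on state type. The sketch below assumes a trivial template library keyed by type; the templates, attribute names, and state fields are illustrative placeholders, not the system's actual templates:

```python
# Hypothetical sketch of step 510: each state type selects a standard
# template that renders default VXML instructions for that state.
TEMPLATES = {
    "audio": ('<block><audio expr="appsAudioRootPath + \'{audiopath}\'"/>'
              '<goto next="#{next}"/></block>'),
    "input": ('<field name="{name}"><prompt>'
              '<audio expr="appsAudioRootPath + \'{audiopath}\'"/>'
              '</prompt></field>'),
}

def render_state(state):
    """Apply the standard template for the state's type to produce VXML."""
    template = TEMPLATES[state["type"]]
    return template.format(**state)
```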
  • the resulting preliminary VXML instructions are outlined below in Table 7:
  • </audio> </noinput> </help> <audio expr="appsAudioRootPath + '0300_demo/main/get_home_phone/help.wav'"> Please say or enter your home phone number.
  • In step 520 , the present embodiment expands features embedded in the states in the intermediate XML application to be included in the preliminary VXML instructions.
  • user interface features are applied to the generated VXML instructions implementing commonly used logic and functionality.
  • features are coded tasks that are used over and over in various applications; the same code is repeated in the various applications.
  • User interface features are applied through the manipulation of the document object model that is generated by the standard templates 610 of FIG. 6 .
  • the actual code need not be entered until the last phase of the transformation process, during the feature expansion phase.
  • predetermined instructions can be substituted in the VXML instructions that correspond to the features. This is done for each of the features that are embedded in the preliminary VXML instructions.
  • Table 8 illustrates how the feature named “UsedVoice” as shown in Table 7 is expanded with the appropriate code, as follows:
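Independently of Table 8, the expansion mechanism can be sketched as a substitution pass over the preliminary VXML. The placeholder syntax and the feature library below are assumptions, not the system's actual format:

```python
import re

# Hypothetical sketch of step 520: feature placeholders embedded in the
# preliminary VXML are replaced with predetermined code from a library.
FEATURE_LIBRARY = {
    "UsedVoice": '<form id="UsedVoice"><!-- expanded UsedVoice logic --></form>',
}

def expand_features(vxml):
    """Replace every <feature name="X"/> placeholder with its library code."""
    return re.sub(r'<feature name="([^"]+)"/>',
                  lambda m: FEATURE_LIBRARY[m.group(1)],
                  vxml)
```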
  • In step 530 , the present embodiment optimizes the preliminary VXML instructions. Optimization passes are then performed to clean up the code. Optimizations include eliminating redundant states and combining various “if” conditions together.
  • the VXML instructions in Table 7 have separate instructions for Form “UsedVoice” and for Form “AniLookup,” as is illustrated below in Table 9:
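One such optimization, eliminating redundant states, can be sketched as collapsing forms with identical bodies into a single form and recording a name remapping for the transitions that referenced the duplicates. The form representation below is a simplification for illustration:

```python
# Hypothetical sketch of step 530: collapse forms whose bodies are
# identical, keeping the first occurrence and remapping the duplicates.
def eliminate_redundant_forms(forms):
    """forms: dict of form name -> body string.

    Returns (kept forms, remapping of duplicate name -> canonical name).
    """
    canonical = {}   # body -> first form name seen with that body
    remap = {}       # duplicate name -> canonical name
    kept = {}
    for name, body in forms.items():
        if body in canonical:
            remap[name] = canonical[body]
        else:
            canonical[body] = name
            kept[name] = body
    return kept, remap
```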
  • each of the steps 510 , 520 , and 530 can be customized to meet certain output requirements, in accordance with embodiments of the present invention.
  • the transformation into the VXML application of the voice interface process includes the generation of necessary and accompanying code written in the JavaScript language, in accordance with one embodiment of the present invention.
  • the VXML language integrates JavaScript in order to support operations that the VXML language normally cannot support.
  • supporting JavaScript code is integrated within the VXML application to support the necessary and accompanying operations representing the voice interface process.
  • each of the steps in the flow charts of FIGS. 2 and 5 is executed automatically, in accordance with one embodiment of the present invention.
  • given the design documents (e.g., the call flow diagram 320 and the master script 325 ),
  • the appropriate VXML instructions in the VXML application of the voice interface can be automatically generated.
  • HTML documentation of the voice interface process can be generated automatically.
  • other markup based language documents can be generated automatically, such as quality assurance applications, and other markup based language applications that are representations of the voice interface process.
  • Embodiments of the present invention, a method and system for the generation of markup language applications (e.g., a VXML application) for a voice interface process, are thus described. While the present invention has been described in particular embodiments, it should be appreciated that the present invention should not be construed as limited by such embodiments, but rather construed according to the below claims.

Abstract

A method and system for Extensible Markup Language (XML) application transformation. Specifically, in one embodiment, a method is disclosed for the generation of markup language applications (e.g., a VXML application) for a voice interface process. First, a call flow diagram is converted into a list of states in an XML format. The call flow diagram describes the voice interface process. Next, a lookup table of entries in XML is created by mapping a plurality of audio files and their corresponding textual representations with audio states in the list of states. Then, an intermediate application is created in the XML format from the list of states by merging corresponding entries in the lookup table with the audio states. Finally, the intermediate application is transformed into a second application of a second markup language format that is a static representation of the call flow diagram.

Description

    BACKGROUND OF THE INVENTION
  • 1. Field of the Invention
  • Embodiments of the present invention relate to the field of data processing systems having an audio user interface and is applicable to electronic commerce. More particularly, embodiments of the present invention relate generally to the generation of markup language applications for a voice interface process.
  • 2. Related Art
  • As computer systems and telephone networks modernize, it has become commercially feasible to provide information to users or subscribers over a voice interface, e.g., telephone and other audio networks and systems. These services allow users, i.e., “callers,” to interface with a computer system for receiving and entering information. As used herein, “caller” refers generically to any user interacting over a voice interface, whether via telephone or otherwise.
  • A number of these types of phone services utilize computer implemented automatic voice recognition tools (e.g., automated speech recognition systems) to allow a computer system to understand and react to a caller's spoken commands and information. This has proven to be an effective mechanism for providing information since telephone systems are ubiquitous, familiar to most people and relatively easy to use, understand and operate. When connected, the caller listens to information and prompts provided by the service and can speak to the service giving it commands and other information, thus forming a voice interface.
  • Additionally, these phone services can be integrated within the world wide web (e.g., Internet) to move audio data efficiently across the web to a telephonic user. More and more web devices will be developed to take advantage of the internet infrastructure for providing information data. In particular, voice can be used to interface with these phone services.
  • The phone service via a voice interface performs some task as requested or commanded by the user of the voice interface (e.g., information retrieval, electronic commerce, voice dialing, etc.). Once the task is understood and an overall process is outlined for accomplishing the task, a computer implemented application is written that provides the instructions necessary for allowing the user to interact with the voice interface to accomplish the task.
  • In particular, instructions for implementing the process can be written in the Voice Extensible Markup Language (VXML). The VXML language is a web-based markup language for representing human to computer dialogs, and is analogous to the Hypertext Markup Language (HTML). The VXML language interacts with a voice browser that outputs audio that is either recorded or computer generated. Also, the VXML language assumes that input through voice or telephone pad is provided as audio input. Additionally, VXML as a high-level, domain-specific markup language is currently being proposed to the World Wide Web Consortium (W3C) as the standard language for voice applications over the voice web marketplace.
  • Creating the particular VXML application for a particular phone service can be particularly time consuming and an inefficient use of human resources once the actual coding process begins. To create the VXML application, the process includes creating the design documents that outline the overall voice interface process as envisioned by the customer and the voice application developer. Next, the voice application is coded by hand in VXML from the design documentation to provide the instructions necessary for the user to interact with a phone service using the voice interface through a network.
  • Typically, a software developer is assigned the task of coding each of the various steps required in the voice interface process. At times, this becomes a redundant exercise as many sequences of instructions and various parts of the coded instructions are repeatedly used throughout the final coded voice application. Furthermore, as the voice interface process becomes more complex, the amount of repetition and the chance for error in writing the code increases.
  • Moreover, once the VXML application is completed, additional documentation may be provided to the phone service in support of the voice application. Usually this additional documentation provides for further representations of the VXML application in a coded format (e.g., a web based representation of the voice interface process). However, additional time and resources are necessary to generate and code these further representations of the VXML application.
  • SUMMARY OF THE INVENTION
  • Accordingly, various embodiments of the present invention disclose a method and system for an extensible framework from which a Voice Extensible Markup Language (VXML) application can be automatically generated from design documentation of a voice interface process, thus utilizing human resources more efficiently, and reducing the chance for errors in writing the coded application. Moreover, embodiments of the present invention allow for the automatic generation of various other representations of a voice interface process, such as, hypertext markup language (HTML) documentation, or any other application based markup.
  • Specifically, embodiments of the present invention describe a method and system for Extensible Markup Language (XML) application transformation. Specifically, in one embodiment, a method is disclosed for the automatic generation of markup language applications (e.g., a VXML application) for a voice interface process.
  • A call flow diagram is converted into a list of states in an XML format. The call flow diagram is part of the design documentation that describes the steps to the voice interface process. Each of the steps in the call flow diagram is represented by a state in the list of states. Descriptions relating to the type of state and the next transition state are included in the list of states. As such, the list of states is a high level and intermediate representation of the call flow diagram.
  • Next, a lookup table of entries in XML is created to map audio prompts and their audio files with corresponding audio states in the list of states. The lookup table of entries is created from a textual format of a spreadsheet that displays a plurality of audio prompts for audio files and their corresponding textual representations with their corresponding states that play an audio file. More particularly, the lookup table of entries comprises an audio path to the location of each of the particular audio files, or the particular audio file itself.
  • Then, an intermediate application is created in the XML format by starting from the list of states along with their corresponding state and transition information, and in particular, merging corresponding entries in the lookup table with associated audio states. The intermediate application at this point is still a high-level XML representation of the call flow diagram and the voice interface process. The XML representation provides for a well defined and highly flexible representation of the voice interface process.
  • The intermediate application is then transformed into a second application of a second format that is a representation of the call flow diagram. Since the intermediate application is in a structured and well defined extensible XML format, transformation to other extensible and non-extensible markup languages is possible. In one embodiment, the second application is in a VXML format. In another embodiment, the second application is in an HTML format to provide for web page documentation of the voice interface process. In still another embodiment, the second application is in a text format to provide for test case documentation in a quality assurance capacity.
  • The transformation operations used to generate the VXML application from the intermediate XML representation of the call flow diagram are described in a three stage process, in one embodiment. First, each of the states and their associated information in the intermediate XML representation is transformed into preliminary VXML instructions. This is accomplished using a standard template that corresponds to the particular state that is being transformed. Second, features that have not been implemented in the XML code for the intermediate XML representation are fully expanded in the VXML code format. This provides for a detailed coded implementation of the voice interface process. Third, optimization of the VXML code is performed in order to streamline and conform to the VXML format. In particular, redundant states or steps are eliminated and various “if” steps are combined.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a logical block diagram of a computer system with Extensible Markup Language (XML) transformation capabilities, in accordance with one embodiment of the present invention.
  • FIG. 2 is a flow chart of steps in a method for the transformation of design documentation into a web based application that is a detailed representation of the call flow of the design documentation, in accordance with one embodiment of the present invention.
  • FIG. 3 is a data flow diagram illustrating the flow of data through the application generator, in accordance with one embodiment of the present invention.
  • FIG. 4 is a data flow diagram illustrating the merging of audio states including audio prompts from a look-up table with corresponding states that play audio files corresponding to the audio prompts during the creation of the intermediate XML application, in accordance with one embodiment of the present invention.
  • FIG. 5 is a flow chart of steps in a method for the transformation of design documentation to a VXML application that is a detailed representation of the call flow from the design documentation, in accordance with one embodiment of the present invention.
  • FIG. 6 is a data flow diagram illustrating the flow of data to transform scripts of states in the intermediate XML application into default preliminary VXML instructions, in accordance with one embodiment of the present invention.
  • FIG. 7 is an exemplary call flow diagram of steps in a first module of states for services performed in connection with accessing account information via a voice interface process, in accordance with one embodiment of the present invention.
  • FIG. 8 is a diagram of an exemplary web page illustrating the transformation of the intermediate application into the hypertext markup language format, in accordance with one embodiment of the present invention.
  • DETAILED DESCRIPTION OF THE INVENTION
  • Reference will now be made in detail to the preferred embodiments of the present invention, a method of automatic generation of a Voice Extensible Markup Language (VXML) application from design documentation of a voice interface process, and a system for implementing the method, examples of which are illustrated in the accompanying drawings. While the invention will be described in conjunction with the preferred embodiments, it will be understood that they are not intended to limit the invention to these embodiments. On the contrary, the invention is intended to cover alternatives, modifications and equivalents, which may be included within the spirit and scope of the invention as defined by the appended claims.
  • Furthermore, in the following detailed description of the present invention, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be recognized by one of ordinary skill in the art that the present invention may be practiced without these specific details. In other instances, well known methods, procedures, components, and circuits have not been described in detail as not to unnecessarily obscure aspects of the present invention.
  • Notation and Nomenclature
  • Some portions of the detailed descriptions which follow are presented in terms of procedures, steps, logic blocks, processing, and other symbolic representations of operations on data bits that can be performed on computer memory. These descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. A procedure, computer executed step, logic block, process, etc., is here, and generally, conceived to be a self-consistent sequence of steps or instructions leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated in a computer system. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
  • It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussions, it is appreciated that throughout the present invention, discussions utilizing terms such as “creating,” “transforming,” “merging,” “expanding,” “optimizing,” “applying,” “combining,” “eliminating,” or the like, refer to the action and processes of a computer system, or similar electronic computing device, including an embedded system, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.
  • Referring to FIG. 1, embodiments of the present invention are comprised of computer-readable and computer-executable instructions which reside, for example, in computer-readable media of a computer system, such as a VXML generator. FIG. 1 is a block diagram of exemplary embedded components of such a computer system 100 upon which embodiments of the present invention may be implemented.
  • Exemplary computer system 100 includes an internal address/data bus 120 for communicating information, a central processor 101 coupled with the bus 120 for processing information and instructions, a volatile memory 102 (e.g., random access memory (RAM), static RAM, dynamic RAM, etc.) coupled with the bus 120 for storing information and instructions for the central processor 101, and a non-volatile memory 103 (e.g., read only memory (ROM), programmable ROM, flash memory, EPROM, EEPROM, etc.) coupled to the bus 120 for storing static information and instructions for the processor 101. Computer system 100 may also include various forms of disc storage 104 for storing large amounts of information.
  • With reference still to FIG. 1, an optional signal Input/Output device 108 is coupled to bus 120 for providing a communication link between computer system 100 and a network environment. As such, signal Input/Output (I/O) device 108 enables the central processor unit 101 to communicate with or monitor other electronic systems or analog circuit blocks that are coupled to the computer system 100. The computer system 100 is coupled to the network (e.g., the Internet) using the network connection I/O device 108, such as an Ethernet adapter that couples the computer system 100 through a firewall and/or a local network to the Internet.
  • An output mechanism may be provided in order to present information at a display 105 or print output for the computer system 100. Similarly, input devices 107 such as a keyboard and a mouse may be provided for the input of information to the computer system 100.
  • Voice Extensible Markup Language Generator
  • Accordingly, various embodiments of the present invention disclose a method and system for an extensible framework from which various markup language applications can be automatically generated from design documentation of a voice interface process, thus utilizing human resources more efficiently. Moreover, embodiments of the present invention allow for the automatic generation of various other representations of a voice interface process, such as, Hypertext Markup Language (HTML) documentation, or any other application based markup.
  • In one embodiment, the extensible framework generates a VXML application as a representation of a voice interface and is implemented via a gateway system running voice browsers that interpret a voice dialog markup language in order to deliver web content and services to telephone and other wireless devices.
  • The VXML language is a web-based markup language for representing human to computer dialogs, and is analogous to HTML. The VXML language assumes a voice browser with audio output that is either recorded or computer generated. Also, the VXML language assumes that input is provided as audio, through either voice or the telephone keypad. VXML is an XML application that defines a tree-like structure that the user can traverse using voice commands. A VXML Document Type Definition (DTD) defines the structure and grammar of a particular VXML application or related applications.
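  • For illustration only, the following sketch parses a minimal VXML-style document to show the tree-like structure a voice browser traverses. The element names follow general VoiceXML conventions, but the dialog content and form identifiers are hypothetical and are not taken from this Specification.

```python
# Illustrative sketch: a minimal VXML-style document, parsed with
# ElementTree to expose the tree structure a voice browser traverses.
# The forms, prompts, and ids here are invented for this example.
import xml.etree.ElementTree as ET

VXML = """\
<vxml version="2.0">
  <form id="welcome">
    <block>
      <prompt>Welcome to consumer services.</prompt>
      <goto next="#get_phone"/>
    </block>
  </form>
  <form id="get_phone">
    <field name="phone">
      <prompt>Please say your home phone number.</prompt>
    </field>
  </form>
</vxml>
"""

doc = ET.fromstring(VXML)
# The user traverses the forms of the tree; listed here in document order.
form_ids = [form.get("id") for form in doc.findall("form")]
```
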
  • FIG. 2 is a flow chart 200 of steps in a computer implemented method for the generation of applications from design documents describing a voice interface process, in accordance with one embodiment of the present invention. The method describes an extensible framework from which the generation of markup language applications from design documentation of a voice interface process is possible. The process disclosed in FIG. 2 is first discussed to provide a general overview to the method of generating a VXML application from design documentation. The particularities of the method are discussed in more detail with respect to the figures following FIG. 2.
  • The present embodiment begins by converting a call flow diagram into a list of states in an XML format, in step 210 of FIG. 2. In effect, the list of states comprises a finite state machine. The call flow diagram outlines each of the steps implemented in a voice interface process. As such, the list of states describes each of the steps in a voice interface process as outlined in the call flow diagram. The list of states provides for a high level representation of the call flow diagram of the voice interface process.
  • In step 220, the present embodiment creates a lookup table of audio states in the XML format that maps the audio prompts of audio files to corresponding audio states in the list of states. The lookup table of audio states comprises an audio path that describes the web based path to the location of the audio file, and a textual representation of the audio file. In another embodiment, the lookup table of audio states comprises the actual audio file itself along with the textual representation of the audio file.
  • In step 230, the present embodiment creates an intermediate application representing the voice interface process in the aforementioned XML format. The intermediate application is created by merging the lookup table of audio states into the list of states. In particular, audio states in the lookup table are merged into the corresponding states in the list of states that play audio from an associated audio file.
  • In step 240, the present embodiment transforms the intermediate application in the XML format into a second application of a second markup language format. In one embodiment, the second application is of an HTML format, wherein the second application is source code for generating a web page comprising a tabular representation of the list of states, including links between related states.
  • In another embodiment, the present embodiment transforms the intermediate application in the XML format into a second application of a VXML format. The generated VXML application is a static representation of the call flow diagram describing the voice interface, in one embodiment. As such, the static nature of the VXML application of the voice interface process allows the voice interface to be implemented in any browser environment using any supporting electronic device.
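  • As a rough, non-normative sketch of steps 210 through 240, the following Python fragment models the three transformations with ElementTree. The element and attribute names (state, transition, audiopath) follow the CFL examples given later in this Specification, while the helper functions and sample data are assumptions made for illustration, not the patent's actual implementation.

```python
# Hedged sketch of the three-step method of FIG. 2; names are illustrative.
import xml.etree.ElementTree as ET

def convert_call_flow(steps):
    """Step 210: build a CFL-style list of states (a finite state machine)."""
    app = ET.Element("application")
    for name, state_type, successor, audiopath in steps:
        attrs = {"name": name, "type": state_type}
        if audiopath:
            attrs["audiopath"] = audiopath
        state = ET.SubElement(app, "state", attrs)
        if successor:
            ET.SubElement(state, "transition", next=successor)
    return app

def merge_audio(app, lookup):
    """Step 230: merge audio prompts into states sharing an audiopath."""
    for state in app.iter("state"):
        src = lookup.get(state.get("audiopath"))
        if src:
            ET.SubElement(state, "audio", src=src)
    return app

def to_html(app):
    """Step 240 (HTML variant): a tabular representation of the states."""
    rows = "".join("<tr><td>%s</td><td>%s</td></tr>"
                   % (s.get("name"), s.get("type")) for s in app.iter("state"))
    return "<table>%s</table>" % rows

app = convert_call_flow([("Main", "start", "Welcome", None),
                         ("Welcome", "audio", None, "demo/welcome/")])
app = merge_audio(app, {"demo/welcome/": "welcome.wav"})
html = to_html(app)
```

In this sketch, the same intermediate tree feeds both the merge and the HTML rendering, mirroring how one intermediate application can yield several target formats.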
  • FIG. 3 is a data flow diagram 300 illustrating the transformation of the design documentation describing a voice interface process into various applications representing the voice interface process through a markup application generator 310, in accordance with one embodiment of the present invention. A three stage process, as described in the flow chart 200, is illustrated in the data flow diagram 300.
  • In stage 1 of the data flow diagram 300, the user interface design of the voice interface process is documented as a call flow diagram 320. The call flow diagram 320 is a flow chart outlining the various steps and procedures necessary to implement the voice interface process. As such, the call flow diagram 320 is a high-level representation of the voice interface process.
  • Also, in stage 1 of the data flow diagram 300, the user interface design of the voice interface process is documented as a master script 325. The master script 325 represents a set of audio states with the audio prompts that are associated with corresponding states in the list of states that play an audio file. More particularly, the master script comprises the audio path through a network to each of the locations of audio files played by those states that play an audio file. In addition, the corresponding textual representations of the audio files are included within the master script 325. Also, the actual audio file can be contained in the master script 325, in one embodiment. As such, the audio path or audio files and their corresponding textual representations can be cross-referenced with the corresponding states that play an audio file.
  • In one embodiment, the master script 325 is created in a textual format, such as, the Excel spreadsheet format, and can be saved as a tab delimited text file. Moreover, the master script is written in normal script and not concatenated script, in one embodiment.
  • Both the call flow diagram 320 and the master script 325 are input into the application generator 310 in stage 2 of FIG. 3, in one embodiment. In the first half of stage 2 of the data flow diagram 300, intermediate presentation I, the call flow diagram 320 is converted into the XML format that conforms to a control flow language (CFL) outlined by a document type definition (DTD), in one embodiment. The conversion creates the list of states in the CFL language of the XML format, and corresponds to step 210 of FIG. 2.
  • The CFL document is an XML representation of an application consisting of one or more modules. Each of the modules is a collection of states, or more accurately, a finite state machine. As such, the CFL document is a list of states 330. Each of the states includes the type of the state, the name of the state, and the transitions between states. Embodiments of the present invention enable the conversion to the CFL format through a transformation script or through a web interface.
  • In one embodiment, the call flow diagram 320 is created using the Microsoft Visio application. By following a predetermined set of rules for representing the user interface design of a voice interface process in Visio, the application generator 310 through a transformation script can automatically transform the call flow diagram into the CFL format.
  • A document type definition (DTD) for XML scripts conforming to the CFL language is outlined below. It is appreciated that the CFL DTD is exemplary only, and that other DTDs can be created to transform the call flow diagram 320 into a corresponding XML format for further transformation. The exemplary CFL DTD is as follows in Table 1:
  • TABLE 1
    <!--
    * Call flow Language DTD. CFL is an XML
    * representation of the Call flow of a Voice
    * Application. CFL represents a finite state
    * machine with a type and a name for each state
    * and the transitions between states. CFL does not
    * include any information on the inner components
    * of the states or the associated output.
    -->
    <!--
    Describes an application as a finite state machine
    of one or more states
    -->
    <!ELEMENT application (state+)>
    <!--
    Used to uniquely identify the state.  Each state
    has a type and unique name.  Type can be one of
    eight different types:
    start: start state, has one transition
    fork: a state where a Boolean decision is evaluated
      that determines the call flow. Has two
      elements, ontrue and onfalse
    audio: A state where audio is queued, has one
      transition
    input: A state where user input is obtained. Can
      have multiple transitions based on the user's
      input, determined through the ifresult
      attribute of the transition tag
    system: A state where system operation takes place
    magicaudio: A state where audio is queued using the
      magic audio property
    module: A link to a different module altogether.
    end: The last state, only one per application.
      Has no child elements
    -->
    <!ELEMENT state (transition*, ontrue*, onfalse*,
      module*)>
    <!ATTLIST state
      name ID #REQUIRED
      type (start|fork|audio|input|system|
        magicaudio|module|end) #REQUIRED
    >
    <!--
    Defines a transition from one state to another.
    Either one transition exists determining the next
    state, or multiple transitions exist based on the
    result of the current state in which case the
    ifresult tag is used
    -->
    <!ELEMENT transition EMPTY>
    <!ATTLIST transition
      next CDATA #REQUIRED
      ifresult CDATA #IMPLIED
    >
    <!--
    Defines a transition for “fork” type states when
    the result of the conditional is true
    -->
    <!ELEMENT ontrue EMPTY>
    <!ATTLIST ontrue
      next CDATA #REQUIRED
    >
    <!--
    Defines a transition for “fork” type states when
    the result of the conditional is false
    -->
    <!ELEMENT onfalse EMPTY>
    <!ATTLIST onfalse
      next CDATA #REQUIRED
    >
    <!--
    Defines the module properties for “module” type
    state name is the name of the module, while
    location is the URI for the CFL representation of
    the module
    -->
    <!ELEMENT module EMPTY>
    <!ATTLIST module
      name CDATA #REQUIRED
      location CDATA #REQUIRED
    >
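  • As a minimal, non-authoritative sketch of consuming a CFL document that follows the Table 1 DTD, the fragment below reduces each state to its type and successor list, recovering the finite state machine. The XML fragment itself is a made-up example, not taken from this Specification.

```python
# Sketch: reduce a CFL document to a name -> (type, successors) mapping.
# The document content here is an invented example conforming to Table 1.
import xml.etree.ElementTree as ET

CFL = """\
<application>
  <state name="Main" type="start">
    <transition next="Welcome"/>
  </state>
  <state name="Welcome" type="audio">
    <transition next="Done"/>
  </state>
  <state name="Done" type="end"/>
</application>
"""

app = ET.fromstring(CFL)
# name -> (type, [successor names]): the finite state machine.
machine = {
    state.get("name"): (state.get("type"),
                        [t.get("next") for t in state.findall("transition")])
    for state in app.findall("state")
}
```
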
  • FIG. 7 illustrates a call flow diagram 700 of an exemplary voice interface process used as an example throughout this Specification, in accordance with one embodiment of the present invention. The call flow diagram 700 describes a voice interface allowing a user to interact with the consumer services division of a company in order to access an account balance.
  • An exemplary set of rules as outlined in the CFL DTD for representing the user interface design of a voice interface process is outlined in the following paragraphs, and as is shown in FIG. 7. It is appreciated that the predetermined set of rules can vary depending on the various approaches that can be implemented for transforming the call flow diagram 320 into the CFL language.
  • The Visio call flow is comprised of one or more modules that represent the call flow diagram 320. A module consists of a finite set of states, wherein each of the states is represented as a block or step in the call flow diagram 320. For example, in FIG. 7, block 710 represents a non-interactive input state, where the voice interface application is not expecting a response from the user. More particularly, a module is specified using a set of states connected to each other via state transitions. In addition, a module must have exactly one start state. Module names must be unique throughout the application generated from the call flow diagram 320. Also, a module may reference other modules in the application via module states.
  • In one embodiment, modules may be internal (e.g., by copy) or external (e.g., by reference only). An internal module is a module that is not a standalone application. A classic example is explicit confirmation. During implementation, internal modules are implemented by replacing the call to the module with the actual module code, hence the synonym "by copy."
  • On the other hand, an external module is one that can be a standalone application. Examples of external modules include functions like Main Menu, Address Capture, Package Tracking, and Trading. An external module is implemented by referencing the module code, hence the synonym "by reference."
  • In one embodiment, a state in a module is represented via a block shape in Visio. Each state may have zero or more state transitions depending on its type. A state transition is represented by a connection between the various blocks in the call flow diagram. A state transition may have associated text, depending on the type of the predecessor state. The text associated with state transitions is referred to as transition text.
  • A state must be one of the following types: start, input, binary fork, multiple fork, non-interactive audio, system, magic word, module, and end state. The state type is determined through the shape used to represent the state, as discussed below:
  • A start state is represented in the call flow diagram 320 using the shape of a circle. Block 705 of FIG. 7 is an example of a start state. A start state must have no predecessor. A start state must have exactly one successor. Transition text coming out of the start state is not required and will be ignored. A start state must have a state name that indicates the name of the module. The name can be specified either through the “State Name” property, or through the actual text inside the state shape. In addition, a start state must have a “Module Type” property indicating the type of the module.
  • An input state is represented using the "Input or Form" square box. Block 715 of FIG. 7 is an example of an input state. An input state is one where the user is prompted for an input that is then recognized against a grammar. An input state must have one or more predecessors. An input state must have one or more successors. Transitions to the next step or block indicate the input result associated with the transition. At most one transition out of an input state may have no associated text, in which case it will be considered the default transition. The "Audio Path" custom property for an input state must be specified. It must match a path in the associated master script 325.
  • A binary fork state is represented using the “Fork Decision” diamond box. Block 720 of FIG. 7 is an example of a binary fork state. A binary fork state indicates the performance of a Boolean decision that is either true or false. A binary fork state must have one or more predecessors. Also, a binary fork state must have exactly two successors. Transitions out of the binary fork states must have the associated text “YES” and “NO”.
  • A multiple fork state is represented using the "Fork Decision" diamond box. A multiple fork state indicates forking the call flow into various paths depending on the value of a certain variable or state. A multiple fork state must have one or more predecessors. A multiple fork state must have at least two successors. Transitions out of the multiple fork state can have associated text. At most one transition out of a multiple fork state may have no associated text, in which case it will be considered the default transition.
  • A non-interactive audio state is represented using the "non-interactive audio" box. Block 725 of FIG. 7 represents a non-interactive audio block. A non-interactive audio state must have one or more predecessors. A non-interactive audio state must have exactly one successor. Transition text coming out of the non-interactive audio block is not necessary. The "audio path" property for a non-interactive audio state must be specified. It must match a path in the associated master script 325.
  • The non-interactive state has a required "Function" property. The "function" can be either "Queue Audio" or "Queue and Play Audio". "Queue Audio" is the default value and means the audio will be queued but will not be played until the next listen state. "Queue and Play Audio" means the audio will be played in the current state. If the audio is played, no special state grammar will be active, but the user will be allowed to utter any of the universal commands recognized by the application generated by the application generator 310.
  • A system process state represents one of the various system functions. Block 725 of FIG. 7 illustrates a system state. A system process state must have one or more predecessors. A system process state may have zero or one successors depending on the system function. Functions include: Transfer, Record, Application Programming Interface (API) Call, Data, and Disconnect. The Transfer function represents a call transfer, and may or may not have a successor. Record represents a recording state. A Record state must have one successor. The API Call is a call to an external API through the data tag. API Calls must have one successor. The Data function is where actual manipulation of data takes place. Data manipulation implies assigning values to variables that are used later in the application. Data functions must have one successor. The Disconnect function ends the call by hanging up on the user. A Disconnect function may have no successors, implying the end of the call, or may have one successor, implying post-hang-up processing.
  • A magic-word content audio state is represented using the "magic-word content" box. The application implementing the call flow diagram 320 can be interrupted with a particular "magic-word," but is otherwise not interruptible. A magic-word content state must have one or more predecessors. A magic-word content state must have exactly one successor. Transition text coming out of the magic-word state is not necessary.
  • A module state is represented using the "subroutine or module" box. Block 730 of FIG. 7 illustrates a module block. A module state must have one or more predecessors. A module state may have zero or one successors. A module is allowed to have a successor if and only if the actual called module has a return state. The actual module to be called is specified through the "Module" property of the state. If the "Module" property is empty, the state text is used instead.
  • An end state is represented using the "End" circle box. An end state is only allowed in internal modules. External modules may or may not have an end state. An end state must have one or more predecessors. An end state cannot have a successor. An end state must be one of two types: a "Return" end state or a "Reprompt" end state. The end state type is specified through the state text. A "Return" state implies returning from the current module. The transition to the return state is replaced with a transition to the (then required) successor of the calling module state. A "Reprompt" state implies transitioning to a previously visited prompt state. The transition to the "Reprompt" state will be replaced with a transition to the first input state that is a predecessor of the actual module state.
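  • The predecessor and successor constraints described above can be summarized as bounds per state type. The following sketch encodes several of those rules as hypothetical (minimum, maximum) successor counts, where None means unbounded; the table reflects this Specification's stated rules, not any external standard.

```python
# Hypothetical encoding of the successor-count rules described above.
SUCCESSOR_RULES = {
    "start": (1, 1),            # exactly one successor
    "input": (1, None),         # one or more successors
    "fork": (2, 2),             # binary fork: exactly two
    "multiplefork": (2, None),  # at least two
    "audio": (1, 1),            # non-interactive audio: exactly one
    "magicaudio": (1, 1),       # magic-word content: exactly one
    "module": (0, 1),           # zero or one successors
    "end": (0, 0),              # an end state cannot have a successor
}

def successors_ok(state_type, successors):
    """Check a state's successor count against its (min, max) rule."""
    lo, hi = SUCCESSOR_RULES[state_type]
    count = len(successors)
    return count >= lo and (hi is None or count <= hi)
```

A validation pass of this kind could run as each block of the Visio call flow is converted, rejecting diagrams that violate the rules.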
  • An example of the list of states 330 in the CFL language is provided below. The example corresponds to a portion of the blocks in FIG. 7 (blocks 705, 710, 715, 717, 718, 719, 720, and 725), as outlined below in Table 2. Corresponding blocks are noted in enclosed brackets, such as (block 705).
  • TABLE 2
    </module>
    <module name="Main" type="external">
      (block 705)
      <state type="start" name="Main">
        <transition next="DemoMainWelcome"/>
      </state>
      (block 710)
      <state type="audio" name="DemoMainWelcome"
          audiopath="0300_demo/main/welcome/">
        <transition next="DemoMainGetLanguage"/>
      </state>
      (block 715)
      <state type="input" name="DemoMainGetLanguage"
          audiopath="0300_demo/main/get-language/">
        <transition next="DemoMainGetHomePhone"
            ifresult="English"/>
        <transition next="DemoMainGetLanguageSpanish"
            ifresult="Espanol"/>
      </state>
      (block 717)
      <state type="input" name="DemoMainGetHomePhone"
          audiopath="0300_demo/main/get_home_phone/">
        <transition next="UsedVoice"
            ifresult="650-428-0919"/>
      </state>
      (block 725)
      <state type="audio"
          name="DemoMainGetLanguageSpanish"
          audiopath="0300_demo/main/get_language/spanish.wav">
        <transition next="DemoMainGetHomePhone"/>
      </state>
      (block 720)
      <state type="fork" name="UsedVoice">
        <transition next="DemoCommonConfirmPhonenumber"
            ifcond="true"/>
        <transition next="AniLookup" ifcond="false"/>
        <feature name="used_voice"/>
      </state>
      (block 725)
      <state type="system" name="AniLookup">
        <transition next="Registered"/>
        <property name="Function" value="Data"/>
  • Returning to FIG. 3, the master script 325 text document containing the audio prompts is converted into the XML format that conforms to a master script language (MSL) outlined by a document type definition (DTD), in one embodiment. The MSL document is an XML representation of the states that play an audio file. The MSL document represents a look-up table of audio states 335 with the audio prompts necessary for states in the list of states to play their associated audio files. Conversion to the look-up table of audio states 335 corresponds to step 220 of FIG. 2. Embodiments of the present invention enable the conversion to the MSL language through a transformation script or through a web interface.
  • A document type definition (DTD) for XML scripts conforming to the MSL language is outlined below. It is appreciated that the MSL DTD is exemplary only, and that other DTDs can be created to transform the master script 325 into a corresponding XML format for further transformation. The exemplary MSL DTD is as follows in Table 3:
  • TABLE 3
    <!--
    * Master Script Language DTD. MSL is an XML
    * representation of the Master Script submitted
    * with a Voice Application. MSL represents a set
    * of states with the audio prompts played in each
    * state. MSL does not describe the transitions
    * between the states or their relationship to each
    * other.
    -->
    <!--
    Describes an application as a set of one or more
    states
    -->
    <!ELEMENT application (state+)>
    <!--
    Used to uniquely identify the state. Each state has
    a name and an optional audiopath as attributes. A
    state can have audio elements as direct children,
    or can have audio elements grouped together under
    some sub-state, one of: ni1, ni2, nm1, nm2, nm3,
    and help.
    -->
    <!ELEMENT state (audio*, feature*, ni1?, ni2?,
      nm1?, nm2?, nm3?, help?)>
    <!ATTLIST state
      name ID #REQUIRED
      audiopath CDATA #IMPLIED
    >
    <!--
    audio can be either a file, or the playback of some
    variable, such as on playing back a phone number
    obtained at the state GetPhoneNumber. In this case
    value will be GetPhoneNumber and type will be
    phoneNumber. Value is used to determine the data
    flow, while type is used to determine the
    JavaScript function used to generate the audio
    -->
    <!ELEMENT audio EMPTY>
    <!ATTLIST audio
      src CDATA #IMPLIED
      tts CDATA #IMPLIED
      value CDATA #IMPLIED
      type CDATA #IMPLIED
    >
    <!--
    Intrastate components
    -->
    <!ELEMENT ni1 (audio*)>
    <!ELEMENT ni2 (audio*)>
    <!ELEMENT nm1 (audio*)>
    <!ELEMENT nm2 (audio*)>
    <!ELEMENT nm3 (audio*)>
    <!ELEMENT help (audio*)>
    <!--
    feature
    -->
    <!ELEMENT feature EMPTY>
    <!ATTLIST feature
      name CDATA #REQUIRED
      id CDATA #REQUIRED
    >
  • An example of the look-up table of audio states 335 in the MSL language of the XML format is provided below. The example corresponds to block 717 of FIG. 7, as outlined below in Table 4.
  • TABLE 4
    </state>
    <state name="0300DemoMainGetHomePhone"
        audiopath="0300_demo/main/get_home_phone/"
        type="input">
      <audio src="prompt.wav" tts="Please say or
          enter your home number, starting with the
          area code."/>
      <ni1>
        <audio src="ni1.wav" tts="I&apos;m sorry, I
            didn&apos;t hear you. Please say or enter
            your home phone number."/>
      </ni1>
      <ni2>
        <audio src="ni2.wav" tts="I&apos;m sorry, I
            still didn&apos;t hear you. Please enter
            your home phone number."/>
      </ni2>
      <nm1>
        <audio src="nm1.wav" tts="I&apos;m sorry, I
            didn&apos;t get that. Please say or enter
            your home phone number."/>
      </nm1>
      <nm2>
        <audio src="nm2.wav" tts="I&apos;m sorry, I
            still didn&apos;t get that. Please enter
            your home phone number."/>
      </nm2>
      <nm3>
        <audio src="nm3.wav" tts="I&apos;m sorry,
            I&apos;m having trouble understanding.
            Using your telephone keypad, please enter
            your home phone number."/>
      </nm3>
      <help>
        <audio src="help.wav" tts="Please say or
            enter your home phone number."/>
      </help>
    </state>
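  • A look-up table of this kind can be derived mechanically from an MSL document by mapping each audiopath to its prompt pairs in document order. The following sketch, whose abbreviated fragment mirrors Table 4, is illustrative only.

```python
# Sketch: derive the look-up table of audio states from an MSL document.
# Each audiopath maps to its (src, tts) pairs in document order.
import xml.etree.ElementTree as ET

MSL = """\
<application>
  <state name="0300DemoMainGetHomePhone"
         audiopath="0300_demo/main/get_home_phone/" type="input">
    <audio src="prompt.wav" tts="Please say or enter your home number."/>
    <ni1><audio src="ni1.wav" tts="Please say your home phone number."/></ni1>
  </state>
</application>
"""

lookup = {}
for state in ET.fromstring(MSL).findall("state"):
    prompts = [(a.get("src"), a.get("tts")) for a in state.iter("audio")]
    lookup[state.get("audiopath")] = prompts
```
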
  • Returning to FIG. 3, in part 2 of stage 2, the intermediate presentation II, the list of states 330, which conforms to the CFL language, and the look-up table of audio states 335, which conforms to the MSL language, are combined together into an XML representation of the entire user interface design documents (e.g., the call flow diagram 320 and the master script 325). The combined XML representation is referred to as an intermediate XML application, and corresponds to step 230 of FIG. 2.
  • In one embodiment, the combined XML representation is referred to as the Tellme User Interface Design Language, or TUIDL. The TUIDL document represents an application as a set of modules. Each module is a finite state machine. The actual content of the state and the transition between states is explicitly specified as a high level representation of the voice interface process.
  • A document type definition (DTD) for XML scripts conforming to the TUIDL language is outlined below. It is appreciated that the TUIDL DTD is exemplary only, and that other DTDs can be created to merge the look-up table of audio states 335 with the list of states 330. The exemplary TUIDL DTD is as follows in Table 5:
  • TABLE 5
    <!--
    * Tellme User Interface Design Language DTD. TUIDL
    * is an XML representation of the complete design
    * of the User Interface Voice Application. TUIDL
    * represents an application as a set of modules.
    * Each module is a finite state machine. The actual
    * content of the state and the transition between
    * states is explicitly specified.
    -->
    <!--
    Describes an application as a finite state machine
    of one or more modules
    -->
    <!ELEMENT application (module+)>
    <!--
    Used to uniquely identify a module.  Each module
    has a type and unique name. Type can be either
    internal or external
    -->
    <!ELEMENT module (state+)>
    <!ATTLIST module
      name ID #REQUIRED
      type (internal|external) #IMPLIED
    >
    <!--
    Used to uniquely identify the state.
    Children include:
    transition: transition to the next state
    property: Set of state specific properties
    feature: UI Features to be applied to the state
    -->
    <!ELEMENT state (property | feature | transition |
    audio | ni1 | ni2 | nm1 | nm2 | nm3 | help)*>
    <!--
    Attributes for a state include:
    name: Required ID
    audiopath.  Required for states where audio is
    queued
    type. Can be one of:
      start: start state, has one transition
      fork: a state where a Boolean decision is
        evaluated that determines the call flow.
        Has two elements, ontrue and onfalse
      audio: A state where audio is queued. has one
        transition
      multiplefork: a state where a forking takes
        place
      input: A state where user input is obtained.
      Can have multiple transitions based on the
        user's input, determined through the
        ifresult attribute of the transition tag
      system: A state where system operation takes
        place. Can have anywhere between 0-2
        transitions
      magicaudio: A state where audio is queued
        using the magic audio property
      module: A link to a different module.
      return:
      reprompt:
      end: The last state in a module. Has no child
        elements
    -->
    <!ATTLIST state
      name ID #REQUIRED
      type (start | fork | multiplefork |
    audio | input | system | magicaudio | module |
    return | reprompt | end) #REQUIRED
      audiopath CDATA #IMPLIED
    >
    <!--
    audio can be either a file, or the playback of some
    variable, such as on playing back a phone number.
    value is used to determine the data flow, while
    type is used to determine the JavaScript function
    used to generate the audio
    -->
    <!ELEMENT audio EMPTY>
    <!ATTLIST audio
      src CDATA #IMPLIED
      tts CDATA #IMPLIED
      value CDATA #IMPLIED
      type CDATA #IMPLIED
    >
    <!--
    Intrastate components for input states
    -->
    <!ELEMENT ni1 (audio*)>
    <!ELEMENT ni2 (audio*)>
    <!ELEMENT nm1 (audio*)>
    <!ELEMENT nm2 (audio*)>
    <!ELEMENT nm3 (audio*)>
    <!ELEMENT help (audio*)>
    <!--
    Defines a transition from one state to another.
    Either one transition exists determining the next
    state, or multiple transitions exist based on the
    result of the current state in which case the
    ifresult tag is used
    -->
    <!ELEMENT transition EMPTY>
    <!ATTLIST transition
      next CDATA #REQUIRED
      ifresult CDATA #IMPLIED
      ifcond (true|false) #IMPLIED
    >
    <!ELEMENT property EMPTY>
    <!ATTLIST property
      name CDATA #REQUIRED
      value CDATA #IMPLIED
    >
    <!ELEMENT feature EMPTY>
    <!ATTLIST feature
      name CDATA #REQUIRED
      value CDATA #IMPLIED
    >
  • The merging of the list of states 330 in the CFL language and the look-up table of audio states 335 in the MSL language is accomplished by mapping the audiopath properties of the various states of the CFL document 330 to the audio paths of the various states of the master script 335. States in the CFL document 330 may maintain a many-to-one relationship with states in the MSL document 335, e.g., more than one state in the CFL document 330 may map to the same audio state playing an audio file in the MSL document 335. Conversely, each state in the CFL document maps to at most one audio state in the MSL document 335.
  • The merging of the look-up table of audio states 335 with the corresponding audiopath properties of states playing an audio file in the list of states 330 corresponds to step 230 of FIG. 2. The result of merging the audiopath properties into the corresponding states playing an audio file in the list of states is a high level XML representation of the voice interface process.
  • FIG. 4 is a data flow diagram 400 illustrating the merging of the audio prompts in the look-up table 335 of audio states with corresponding states in the list of states 330 conforming to the CFL language. In the list of states 330, a module 410 is presented in a state machine format. A collection of states 415 comprises module 410 and includes states 1, 2, 3, 4, etc. State 2 (state 417) and state 4 (state 419) are states that play an audio file.
  • In the look-up table 335, audio path properties are contained in audio script for each of the states in the list of states that play an audio file. A plurality of audio states 420 containing audio prompts for each of the states playing an audio file comprises the look-up table 335 in the MSL language. The audio states refer to audiopath properties for the playing of the audio files. For example, the audiopath properties 425 for input state 2 and the audio path properties 427 for the audio state 4 are illustrated.
  • To create the TUIDL document 340, the list of states in the CFL language is merged with the look-up table 335 containing the audio path properties for the audio files that are played, in one embodiment of the present invention. In essence, each of the audio path properties is incorporated directly into the corresponding state that plays an audio file. For example, the audio path properties 425 for state 2 are directly incorporated into state 417 corresponding to input state 2. Also, the audio path properties 427 for state 4 are directly incorporated into state 419 corresponding to audio state 4.
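  • The merge described above can be sketched in a few lines. The following is a hypothetical illustration only (the dictionary shapes and the `merge` helper are invented for this sketch and are not the generator's actual data structures), using values drawn from the example below:

```python
# Hypothetical sketch of the CFL/MSL merge (step 230). The dict
# shapes and the merge() helper are invented for illustration.

# CFL side: list of states; states that play audio carry an audiopath.
cfl_states = [
    {"name": "DemoMainGetHomePhone", "type": "input",
     "audiopath": "0300_demo/main/get_homephone/"},
    {"name": "UsedVoice", "type": "fork"},  # plays no audio
]

# MSL side: look-up table keyed by audiopath, holding prompt file
# names and their textual (tts) representations.
msl_table = {
    "0300_demo/main/get_homephone/": [
        {"src": "prompt.wav",
         "tts": "Please say or enter your home number, "
                "starting with the area code."},
    ],
}

def merge(states, table):
    """Incorporate each audiopath's prompts into its matching state.

    Several CFL states may share one audiopath (many-to-one), but
    each state receives at most one set of audio entries.
    """
    for state in states:
        path = state.get("audiopath")
        if path is not None:
            state["audio"] = table[path]
    return states

merged = merge(cfl_states, msl_table)
```

Each CFL state that carries an audiopath pulls in the audio entries stored under that path, honoring the many-to-one mapping: several states may name the same path, but each state receives exactly one entry set.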
  • An example of the intermediate XML application 340 in the TUIDL language is provided below, and corresponds to a portion of the blocks in FIG. 7 (blocks 717, 725, and 720) as outlined below in Table 6. Corresponding blocks are noted in parentheses, such as (block 717).
  • TABLE 6
      </state>  (Block 717)
      <state type=“input”
    name=“DemoMainGetHomePhone”
    audiopath=“0300_demo/main/get_homephone/”>
        <transition next=“UsedVoice”
    ifresult=“650-428-0919”/>
        <audio src=“prompt.wav” tts=“Please say
    or enter your home number, starting with the area
    code.”/>
        <ni1>
          <audio src=“ni1.wav” tts=“I'm sorry,
    I didn't hear you. Please say or enter your home
    phone number.”/>
        </ni1>
        <ni2>
          <audio src=“ni2.wav” tts=“I'm sorry,
    I still didn't hear you. Please say or enter your
    home phone number.”/>
        </ni2>
        <nm1>
          <audio src=“nm1.wav” tts=“I'm sorry,
    I didn't get that. Please enter your home phone
    number.”/>
        </nm1>
        <nm2>
          <audio src=“nm2.wav” tts=“I'm sorry,
    I still didn't get that. Please enter your
    home phone number.”/>
        </nm2>
        <nm3>
          <audio src=“nm3.wav” tts=“I'm sorry,
    I'm having trouble understanding. Using your
    telephone keypad, please enter your home phone
    number.”/>
        </nm3>
        <help>
          <audio src=“help.wav” tts=“Please
    say or enter your home phone number.”/>
        </help>
      </state>  (BLOCK 725)
      <state type=“audio”
    name=“DemoMainGetLanguageSpanish”
    audiopath=“0300_demo/main/get_language/”>
        <transition next=“DemoMainGetHomePhone”/>
      <audio src=“spanish.wav” tts=“Sorry, this demo
    doesn't support Spanish. Now continuing in English.
    ”/>
      </state>  (BLOCK 720)
      <state type=“fork” name=“UsedVoice”>
        <transition next=
    “DemoCommonConfirmPhonenumber” ifcond=“true”/>
        <transition next=“AniLookup”
    ifcond=“false”/>
      <feature name=“used_voice”/>
    . . .
  • In another embodiment, in the design phase, the audio prompts are not separated from the call flow diagram 320. In that case, the CFL document 330 and the MSL document 335 would be unnecessary. Instead, two inputs are directly used in part 2 of stage 2, the intermediate presentation II. As inputs, the list of states and the corresponding audio paths with their textual representations are used to create the intermediate XML application that represents the voice interface process.
  • As such, the application generator 310 establishes an extensible framework allowing the generation of various markup language applications from the design documentation. The extensible design of the application generator 310 allows for the generation of VXML applications, HTML applications, or any other markup-based applications as output.
  • To implement the transformation, the intermediate XML application 340 is transformed into applications of various formats, in one embodiment of the present invention. The XML format is a general and highly flexible representation of any type of data. As such, transformation to any markup language based application can be systematically performed in an extensible manner.
  • As shown in FIG. 3, the application generator 310 can transform the intermediate XML application 340 into a VXML application 350 that is a static representation of the call flow diagram 320, in one embodiment. As such, the static nature of the VXML application 350 of the voice interface process allows the voice interface to be implemented in any browser environment using any supporting electronic device.
  • The application generator 310 can also transform the intermediate XML application 340 into an HTML application 360, in one embodiment. As such, the HTML application 360 is a source code for generating a web page comprising a tabular representation of the list of states with links between related states.
  • FIG. 8 is a diagram illustrating the web page or HTML document 800 for block 717 of FIG. 7, which corresponds to the “DemoMainGetHomePhone” state. The HTML document 800 corresponds to the voice interface process as outlined in the call flow diagram 320. The HTML document 800 is presented in tabular format in one embodiment, but could easily be presented in other formats in other embodiments. The directory name for the state is presented in cell 810. The various audio prompts and files that are played are displayed in logical fashion to present an overall process view of the voice interface. For example, the main prompt is presented in cell 820.
  • The transition state is presented in cell 860. As an added feature in the HTML document 800, links to other states in the HTML document 800 can also be provided, in one embodiment. As such, by clicking on the link to “UsedVoice,” the portion of the HTML document corresponding to the “UsedVoice” state would be presented.
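  • One way to picture the per-state HTML rendering is a row generator that emits one table row per state, with the transition cell rendered as a link. This is a hypothetical sketch keyed to the cell numbers of FIG. 8; the field names and cell layout are assumptions, not the generator's actual code:

```python
# Hypothetical sketch of rendering one state as an HTML table row,
# keyed to the cells of FIG. 8. Field names are assumptions.
from html import escape

def state_to_html_row(state):
    cells = [
        escape(state["audiopath"]),           # cell 810: directory name
        escape(state["prompt"]),              # cell 820: main prompt
        '<a href="#{0}">{0}</a>'.format(state["next"]),  # cell 860: link
    ]
    return "<tr>" + "".join("<td>%s</td>" % c for c in cells) + "</tr>"

row = state_to_html_row({
    "audiopath": "0300_demo/main/get_homephone/",
    "prompt": "Please say or enter your home number.",
    "next": "UsedVoice",
})
```

Clicking the generated link jumps to the portion of the HTML document for the target state, mirroring the “UsedVoice” navigation described above.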
  • The application generator 310 can also transform the intermediate XML application 340 into any other markup-based application, or any textual format, in one embodiment of the present invention. For example, the application generator 310 can transform the XML application 340 into an application of a text format, wherein the textual application is a quality assurance (QA) application that is used for testing performance of the VXML application 350.
  • The application generator 310 is not limited to creating certain functionalities of a voice interface application, but is designed in an extensible fashion allowing the generation of VXML coded applications that can perform any task, as long as the task can be represented in a clear and well defined set of VXML instructions.
  • FIG. 5 is a flow chart 500 of steps illustrating a method for converting the intermediate XML application 340 in the TUIDL language into a VXML application 350, in accordance with one embodiment of the present invention. The conversion occurs in a three-step process. In step 510, the present embodiment transforms each state in the intermediate XML application into preliminary VXML instructions. Standard templates are used to convert each state in the intermediate XML application 340 into a default VXML instruction or representation.
  • FIG. 6 is a diagram illustrating the application of the standard templates to convert states in the intermediate XML application into VXML instructions. FIG. 6 corresponds to the process illustrated in step 510 of FIG. 5. The script 610 for state “x” in the intermediate XML application has a defined state type. The standard template for the state type corresponding to state “x” is applied to the script 610 in the conversion process to VXML instructions.
  • A plurality of standard templates can be applied to the script 610 in order to convert the script for state “x” into VXML instructions. Embodiments of the present invention include numerous standard templates for converting state scripts into default VXML instructions, including numerous standard templates for a single type of state. The standard templates are selected according to design preference.
  • In FIG. 6, the plurality of standard templates includes the start state template 612. Should the script 610 be of the start type, the template 612 would be applied to the script 610 to generate preliminary VXML instructions 620. Should the script 610 be of the input state type, the template 614 would be applied to the script 610 to generate corresponding preliminary VXML instructions 620. Similarly, should the script 610 be of the audio state type, the template 614 would be applied to the script 610 to generate corresponding preliminary VXML instructions 620. This process would occur for every state in the intermediate XML application.
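  • The per-type template dispatch can be pictured with a small sketch. The template text below is a hypothetical simplification modeled loosely on the audio-state output in Table 7; the real standard templates, and their selection logic, are richer than this:

```python
# Hypothetical sketch of step 510: choose a standard template by
# state type and render preliminary VXML. The template text is a
# simplification modeled on the audio-state form in Table 7.
AUDIO_TEMPLATE = (
    '<form id="{name}">\n'
    '  <block>\n'
    "    <audio expr=\"appsAudioRootPath + '{audiopath}{src}'\"/>\n"
    '    <goto next="#{next}"/>\n'
    '  </block>\n'
    '</form>'
)

# One (or more) templates per state type; only "audio" is sketched here.
TEMPLATES = {"audio": AUDIO_TEMPLATE}

def to_preliminary_vxml(state):
    template = TEMPLATES[state["type"]]
    return template.format(**state)

vxml = to_preliminary_vxml({
    "type": "audio",
    "name": "DemoMainGetLanguageSpanish",
    "audiopath": "0300_demo/main/get_language/",
    "src": "spanish.wav",
    "next": "DemoMainGetHomePhone",
})
```

Running the dispatcher over every state in the intermediate XML application yields the kind of preliminary VXML shown in Table 7.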
  • An example of the application of the plurality of standard templates is provided below, and corresponds to the generation of VXML instructions for the blocks surrounding block 717 of FIG. 7. The VXML instructions are outlined below in Table 7:
  • TABLE 7
      <!--***********************************************
      * State: DemoMainGetHomePhone
      ************************************************-->
      <form id=″DemoMainGetHomePhone″>
        <field name=″DemoMainGetHomePhone″>
          <grammar src=″demomaingethomephone.gsl″/>
          <prompt>
            <audio expr=″appsAudioRootPath +
      ‘0300_demo/main/get_home_phone/prompt.wav’”>
              Please say or enter your home
      number, starting with the area code.
          </audio>
          </prompt>
          <!--*************************************
          Nomatch Handlers
          **************************************-->
          <nomatch count=”1”>
            <audio expr=″appsAudioRootPath +
      ‘0300_demo/main/get_home_phone/nm1.wav’”>
              I&apos;m sorry, I didn&apos;t
      get that. Please say or enter your home phone
      number.
            </audio>
          </nomatch>
          <nomatch count=″2″>
            <audio expr=″appsAudioRootPath +
      ‘0300_demo/main/get_home_phone/nm2.wav’”>
              I&apos;m sorry, I didn&apos;t
      get that. Please enter your home phone number.
            </audio>
          </nomatch>
          <nomatch count=″3″>
            <audio expr=″appsAudioRootPath +
      ‘0300_demo/main/get_home_phone/nm3.wav’”>
              I&apos;m sorry, I&apos;m
      having trouble understanding. Using your telephone
      keypad, please enter your home phone number.
            </audio>
          </nomatch>
          <!--*************************************
          Noinput Handlers
          **************************************-->
          <noinput count=″1″>
            <audio expr=″appsAudioRootPath +
      ‘0300_demo/main/get_home_phone/ni1.wav’”>
            I&apos;m sorry, I didn&apos;t hear
      you. Please say or enter your home phone number.
            </audio>
          </noinput>
          <noinput count=″2″>
            <audio expr=″appsAudioRootPath +
      ‘0300_demo/main/get_home_phone/ni2.wav’”>
              I&apos;m sorry, I still
      didn&apos;t hear you. Please enter your home phone
      number.
            </audio>
          </noinput>
          <help>
            <audio expr=″appsAudioRootPath +
      ‘0300_demo/main/get_home_phone/help.wav’”>
              Please say or enter your home
    phone number.
            </audio>
          </help>
          <filled>
            <goto next=″#UsedVoice″/>
          </filled>
        </field>
      </form>
      <form id=″DemoMainGetLanguageSpanish″>
        <block>
          <audio expr=″appsAudioRootPath +
      ‘0300_demo/main/get_language/spanish.wav’”>
            Sorry, this demo doesn&apos;t
      support Spanish. Now continuing in English.
          </audio>
          <goto next=″#DemoMainGetHomePhone″/>
        </block>
      </form>
      <form id=″UsedVoice″>
        <block>
          <if cond=″UsedVoice( )″>
            <goto
      next=″#DemoCommonConfirmPhonenumber″/>
          <else/>
            <goto next=″#AniLookup″/>
          </if>
        </block>
      </form>
      <form id=″AniLookup″>
        <block>
          <!-- TODO Please insert functionality for
      system state AniLookup of Function: Data -->
          <goto next=″#Registered″/>
        </block>
      </form>
  • Returning to flow chart 500 of FIG. 5, in step 520, the present embodiment expands features embedded in the states of the intermediate XML application to be included in the preliminary VXML instructions. As such, user interface features are applied to the generated VXML instructions, implementing commonly used logic and functionality. In other words, features are coded tasks that are used over and over in various applications, with the same code repeated in each. User interface features are applied through manipulation of the document object model that is generated by the standard templates of FIG. 6.
  • With the use of features, the actual code need not be entered until the last phase of the transformation process, during the feature expansion phase. At that point, predetermined instructions can be substituted in the VXML instructions that correspond to the features. This is done for each of the features that are embedded in the preliminary VXML instructions.
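  • As a toy illustration of that substitution, the placeholder call emitted for the “used_voice” feature in Table 7 can be rewritten into the predetermined condition of Table 8. The sketch below uses a plain string replacement purely for brevity; as noted above, the actual expansion manipulates the document object model:

```python
# Hypothetical sketch of feature expansion (step 520): substitute the
# predetermined instruction for the feature's placeholder call. The
# patent performs the expansion on the document object model; plain
# string replacement is used here only to keep the sketch short.
FEATURE_CODE = {
    "UsedVoice( )": "application.lastresult$[0].inputmode == 'voice'",
}

def expand_features(vxml):
    for placeholder, code in FEATURE_CODE.items():
        vxml = vxml.replace(placeholder, code)
    return vxml

expanded = expand_features('<if cond="UsedVoice( )">')
```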
  • Paying particular attention to Table 7, the script pertaining to “<form id=“UsedVoice”>” has not expanded the feature named “UsedVoice.” However, Table 8 illustrates how the feature named “UsedVoice” as shown in Table 7 is expanded with the appropriate code, as follows:
  • TABLE 8
    </form>
    <form id=“UsedVoice”>
      <block>
        <if cond=“application.lastresult$[0].
    inputmode == ‘voice’”>
          <goto
    next=“#DemoCommonConfirmPhonenumber”/>
        <else/>
          <goto next=“#AniLookup”/>
        </if>
      </block>
    </form>
  • Returning to flow chart 500 of FIG. 5, in step 530, the present embodiment optimizes the preliminary VXML instructions. Optimization passes are then performed to clean up the code. Optimizations include eliminating redundant states and combining various “if” conditions together.
  • As an example of optimization, prior to optimization, the VXML instructions in Table 7 have separate instructions for Form “Used Voice” and for Form “AniLookup,” as is illustrated below in Table 9:
  • TABLE 9
    <form id=“UsedVoice”>
      <block>
        <if cond=“UsedVoice( )”>
          <goto
    next=“#DemoCommonConfirmPhonenumber”/>
        <else/>
          <goto next=“#AniLookup”/>
        </if>
      </block>
    </form>
    <form id=“AniLookup”>
      <block>
        <!-- TODO Please insert functionality for
    system state AniLookup of Function:
     Data -->
        <goto next=“#Registered”/>
      </block>
    </form>
  • However, after optimization, the VXML instructions in Table 9 have been combined such that Form “AniLookup” is eliminated, and its content inserted into the state Form “Used Voice” as is illustrated below in Table 10:
  • TABLE 10
    <form id=“UsedVoice”>
      <block>
        <if cond=“UsedVoice( )”>
          <goto
    next=“#DemoCommonConfirmPhonenumber”/>
        <else/>
          <!-- TODO Please insert functionality
    for system state AniLookup of Function:
     Data -->
          <goto next=“#Registered”/>
        </if>
      </block>
    </form>
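  • The Table 9 to Table 10 rewrite can be modeled abstractly as a single-use-form inlining pass. The representation below (forms as lists of op tuples) is invented for illustration and is not the generator's internal form:

```python
# Hypothetical sketch of the single-use-form inlining shown in
# Tables 9 and 10. Forms are modeled as lists of (op, ...) tuples.
def optimize(forms):
    """Splice any form with exactly one incoming goto into its caller."""
    counts = {}
    for body in forms.values():
        for op in body:
            if op[0] == "goto":
                counts[op[1]] = counts.get(op[1], 0) + 1
    for name in list(forms):
        if counts.get(name) == 1:
            body = forms.pop(name)
            for caller in forms.values():
                for i, op in enumerate(caller):
                    if op == ("goto", name):
                        caller[i:i + 1] = body  # inline callee in place
    return forms

forms = {
    "UsedVoice": [
        ("if", "UsedVoice( )"),
        ("goto", "DemoCommonConfirmPhonenumber"),
        ("else",),
        ("goto", "AniLookup"),
        ("endif",),
    ],
    "AniLookup": [("todo", "system state AniLookup"),
                  ("goto", "Registered")],
}
optimized = optimize(forms)  # AniLookup is eliminated, as in Table 10
```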
  • Referring back to FIG. 5, each of the steps 510, 520, and 530 can be customized to meet certain output requirements, in accordance with embodiments of the present invention.
  • In addition, the transformation into the VXML application of the voice interface process includes the generation of necessary and accompanying code written in JavaScript, in accordance with one embodiment of the present invention. The VXML language integrates JavaScript in order to support operations that the VXML language normally cannot support. As such, supporting JavaScript code is integrated within the VXML application to support the necessary and accompanying operations representing the voice interface process.
  • Moreover, each of the steps in the flow charts of FIGS. 2 and 5 is executed automatically, in accordance with one embodiment of the present invention. As such, by inputting the design documents (e.g., the call flow diagram 320 and the master script 335) into the application generator 310, the appropriate VXML instructions in the VXML application of the voice interface can be automatically generated. Correspondingly, HTML documentation of the voice interface process can be generated automatically. In addition, other markup based language documents can be generated automatically, such as quality assurance applications, and other markup based language applications that are representations of the voice interface process.
  • While the methods of embodiments illustrated in flow charts 200 and 500 show specific sequences and quantity of steps, the present invention is suitable to alternative embodiments. For example, not all the steps provided for in the method are required for the present invention. Furthermore, additional steps can be added to the steps presented in the present embodiment. Likewise, the sequences of steps can be modified depending upon the application.
  • Embodiments of the present invention, a method and system for the generation of markup language applications (e.g., a VXML application) for a voice interface process, are thus described. While the present invention has been described in particular embodiments, it should be appreciated that the present invention should not be construed as limited by such embodiments, but rather construed according to the below claims.

Claims (35)

1. A method of transformation comprising:
a) converting a call flow diagram describing a voice interface process into a list of states in an Extensible Markup Language (XML) format;
b) creating a lookup table of audio states in said XML format by mapping a plurality of audio prompts and their corresponding textual representations with states of said list of states that play audio files associated with said plurality of audio prompts;
c) creating an intermediate application in said XML format and from said list of states by merging audio prompts in said lookup table with states of said list of states that play said audio files; and
d) transforming said intermediate application into a second application of a second format that is a representation of said call flow diagram.
2. The method as described in claim 1, wherein said d) comprises automatically transforming said intermediate application into said second application of said second format that is a static representation of said call flow diagram.
3. The method as described in claim 1, wherein said second format is HyperText Markup Language (HTML), and wherein said second application is a source code for generating a web page comprising a tabular representation of said list of states with links between related states in said list of states.
4. The method as described in claim 1, wherein said second format is VXML.
5. The method as described in claim 4, wherein said d) comprises:
d1) transforming each of said list of states in said intermediate application into preliminary VXML instructions;
d2) expanding features embedded in said list of states to be included in said preliminary VXML instructions; and
d3) optimizing said preliminary VXML instructions.
6. The method as described in claim 5, wherein said d1) comprises:
applying standard templates for each of the various types of states in said list of states to generate said preliminary VXML instructions.
7. The method as described in claim 5, wherein said d3) comprises:
eliminating redundant states; and
combining various “if” conditions.
8. The method as described in claim 1, further comprising:
receiving said call flow diagram in a Microsoft VISIO format before said a).
9. The method as described in claim 1, further comprising:
before said b), receiving a Microsoft Excel spreadsheet in a text format comprising said plurality of audio prompts and their corresponding textual representations that are cross referenced with corresponding states in said list of states that play said audio files.
10. A method of transformation comprising:
a) creating a call flow application by converting a call flow diagram describing a voice interface process into a plurality of states substantially following an Extensible Markup Language (XML) format;
b) creating a lookup table comprising a plurality of entries in said XML format by associating audio prompts for accessing a plurality of audio files and their corresponding textual representations with corresponding states of said plurality of states that play said plurality of audio files;
c) merging said call flow application and said lookup table into an XML application that is a high level XML representation of said voice interface process, by incorporating each of said plurality of entries into corresponding states in said call flow application that play audio files; and
d) transforming said XML application into a second application of a VXML format that is a static representation of said call flow diagram.
11. The method of transformation as described in claim 10, wherein said call flow application is comprised of at least one module representing said plurality of states.
12. The method as described in claim 10, further comprising:
automatically transforming said XML application into said second application of a HyperText Markup Language (HTML) format, and wherein said second application is a source code for generating a web page comprising a tabular representation of said plurality of states with links between related states in said plurality of states.
13. The method as described in claim 10, wherein said d) comprises:
d1) applying standard templates for each of the various types of states in said plurality of states to transform each of said plurality of states as described in said XML application into preliminary VXML instructions;
d2) expanding features included in said plurality of states to be included in said preliminary VXML instructions; and
d3) optimizing said preliminary VXML instructions.
14. The method as described in claim 10, further comprising:
automatically transforming said XML application into a third application of a text format, and wherein said third application is a quality assurance (QA) application that is used for testing performance of said second application.
15. The method as described in claim 10, further comprising:
receiving said call flow diagram in a Microsoft VISIO format before said a).
16. The method as described in claim 10, further comprising:
before said b), receiving a Microsoft Excel spreadsheet in a text format comprising said audio prompts for accessing said plurality of audio files and their corresponding textual representations that are cross referenced with corresponding states in said plurality of states that play said audio files.
17. A method of Extensible Markup Language (XML) transformation comprising:
a) accessing a first input of a plurality of states associated with a voice interface process and complying substantially with an XML format;
b) accessing a lookup table of entries in said XML format that maps a plurality of audio files and their corresponding textual representations with audio states in said plurality of states that play said plurality of audio files;
c) creating an intermediate application in said XML format by merging said audio states with corresponding entries in said lookup table into said plurality of states in said a); and
d) transforming said intermediate application into a second application of a second format that is a detailed low level representation of said call flow diagram.
18. The method as described in claim 17, wherein said c) and said d) comprises, respectively:
c1) automatically creating said intermediate application in said XML format from said plurality of states in said a) by merging said audio states with corresponding entries in said lookup table; and
d1) automatically transforming said intermediate application into said second application of said second format that is a detailed low level representation of said call flow diagram.
19. The method as described in claim 17, wherein said second format is HyperText Markup Language (HTML), and wherein said second application is a source code for generating a web page comprising a tabular representation of said plurality of states with links between related states in said plurality of states.
20. The method as described in claim 17, wherein said second format is VXML.
21. The method as described in claim 17, wherein said second application is of a text format, and wherein said second application is a quality assurance (QA) application.
22. A method of transforming from Extensible Markup Language (XML) to VXML comprising:
a) transforming an application substantially complying with an XML format into preliminary VXML instructions, said application comprising a plurality of states corresponding to a call flow diagram that describes a voice interface process;
b) expanding features embedded in said plurality of states to be included in said preliminary VXML instructions; and
c) optimizing said preliminary VXML instructions.
23. The method as described in claim 22, wherein audio states in said plurality of states comprise a plurality of audio prompts to audio files and their corresponding textual representations.
24. The method as described in claim 22, wherein said a) comprises:
generating said preliminary VXML instructions by applying standard templates for each of the various types of states of said plurality of states.
25. The method as described in claim 22, wherein said b) comprises:
substituting predetermined instructions corresponding to said features for each of said features embedded in said plurality of states.
26. The method as described in claim 22, wherein said c) further comprises:
eliminating redundant states; and
combining various “if” conditions.
27. A transformation generator comprising:
a processor; and
a computer readable memory coupled to said processor and containing program instructions that, when executed, implement a method of transformation comprising:
a) converting a call flow diagram describing a voice interface process into a list of states in an Extensible Markup Language (XML) format;
b) creating a lookup table of audio states in said XML format by mapping a plurality of audio prompts and their corresponding textual representations with states of said list of states that play audio files associated with said plurality of audio prompts;
c) creating an intermediate application in said XML format and from said list of states by merging audio prompts in said lookup table with states of said list of states that play said audio files; and
d) transforming said intermediate application into a second application of a second format that is a representation of said call flow diagram.
28. The transformation generator as described in claim 27, wherein said d) comprises automatically transforming said intermediate application into said second application of said second format that is a static representation of said call flow diagram.
29. The transformation generator as described in claim 27, wherein said second format is HyperText Markup Language (HTML), and wherein said second application is a source code for generating a web page comprising a tabular representation of said list of states with links between related states in said list of states.
30. The transformation generator as described in claim 27, wherein said second format is VXML.
31. The transformation generator as described in claim 30, wherein said d) comprises:
d1) transforming each of said list of states in said intermediate application into preliminary VXML instructions;
d2) expanding features embedded in said list of states to be included in said preliminary VXML instructions; and
d3) optimizing said preliminary VXML instructions.
32. The transformation generator as described in claim 31, wherein said d1) comprises:
applying standard templates for each of the various types of states in said list of states to generate said preliminary VXML instructions.
33. The transformation generator as described in claim 31, wherein said d3) comprises:
eliminating redundant states; and
combining various “if” conditions.
34. The transformation generator as described in claim 27, further comprising:
receiving said call flow diagram in a Microsoft VISIO format before said a).
35. The transformation generator as described in claim 27, further comprising:
before said b), receiving a Microsoft Excel spreadsheet in a text format comprising said plurality of audio prompts and their corresponding textual representations that are cross referenced with corresponding states in said list of states that play said audio files.
US11/877,571 2002-10-31 2007-10-23 Method and system for the generation of a voice extensible markup language application for a voice interface process Abandoned US20080134020A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US11/877,571 US20080134020A1 (en) 2002-10-31 2007-10-23 Method and system for the generation of a voice extensible markup language application for a voice interface process

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US10/285,894 US7287248B1 (en) 2002-10-31 2002-10-31 Method and system for the generation of a voice extensible markup language application for a voice interface process
US11/877,571 US20080134020A1 (en) 2002-10-31 2007-10-23 Method and system for the generation of a voice extensible markup language application for a voice interface process

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US10/285,894 Continuation US7287248B1 (en) 2002-10-31 2002-10-31 Method and system for the generation of a voice extensible markup language application for a voice interface process

Publications (1)

Publication Number Publication Date
US20080134020A1 true US20080134020A1 (en) 2008-06-05

Family

ID=38607168

Family Applications (2)

Application Number Title Priority Date Filing Date
US10/285,894 Expired - Fee Related US7287248B1 (en) 2002-10-31 2002-10-31 Method and system for the generation of a voice extensible markup language application for a voice interface process
US11/877,571 Abandoned US20080134020A1 (en) 2002-10-31 2007-10-23 Method and system for the generation of a voice extensible markup language application for a voice interface process

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US10/285,894 Expired - Fee Related US7287248B1 (en) 2002-10-31 2002-10-31 Method and system for the generation of a voice extensible markup language application for a voice interface process

Country Status (1)

Country Link
US (2) US7287248B1 (en)


Families Citing this family (76)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7287248B1 (en) * 2002-10-31 2007-10-23 Tellme Networks, Inc. Method and system for the generation of a voice extensible markup language application for a voice interface process
US7206391B2 (en) * 2003-12-23 2007-04-17 Apptera Inc. Method for creating and deploying system changes in a voice application system
US7697673B2 (en) 2003-11-17 2010-04-13 Apptera Inc. System for advertisement selection, placement and delivery within a multiple-tenant voice interaction service system
US7817784B2 (en) * 2003-12-23 2010-10-19 Apptera, Inc. System for managing voice files of a voice prompt server
US20050246174A1 (en) * 2004-04-28 2005-11-03 Degolia Richard C Method and system for presenting dynamic commercial content to clients interacting with a voice extensible markup language system
US8768711B2 (en) * 2004-06-17 2014-07-01 Nuance Communications, Inc. Method and apparatus for voice-enabling an application
US7519946B2 (en) * 2004-12-20 2009-04-14 International Business Machines Corporation Automatically adding code to voice enable a GUI component
US20060241947A1 (en) * 2005-04-25 2006-10-26 Belhaj Said O Voice prompt generation using downloadable scripts
US7848928B2 (en) * 2005-08-10 2010-12-07 Nuance Communications, Inc. Overriding default speech processing behavior using a default focus receiver
US7937687B2 (en) * 2006-09-01 2011-05-03 Verizon Patent And Licensing Inc. Generating voice extensible markup language (VXML) documents
US8713542B2 (en) * 2007-02-27 2014-04-29 Nuance Communications, Inc. Pausing a VoiceXML dialog of a multimodal application
US8086460B2 (en) * 2007-06-20 2011-12-27 International Business Machines Corporation Speech-enabled application that uses web 2.0 concepts to interface with speech engines
WO2009124223A1 (en) 2008-04-02 2009-10-08 Twilio Inc. System and method for processing telephony sessions
US8837465B2 (en) 2008-04-02 2014-09-16 Twilio, Inc. System and method for processing telephony sessions
US8321226B2 (en) * 2008-08-08 2012-11-27 Hewlett-Packard Development Company, L.P. Generating speech-enabled user interfaces
WO2010040010A1 (en) 2008-10-01 2010-04-08 Twilio Inc Telephony web event system and method
US8117538B2 (en) * 2008-12-19 2012-02-14 Genesys Telecommunications Laboratories, Inc. Method for dynamically converting voice XML scripts into other compatible markup language scripts based on required modality
US8509415B2 (en) 2009-03-02 2013-08-13 Twilio, Inc. Method and system for a multitenancy telephony network
WO2010101935A1 (en) 2009-03-02 2010-09-10 Twilio Inc. Method and system for a multitenancy telephone network
US8582737B2 (en) 2009-10-07 2013-11-12 Twilio, Inc. System and method for running a multi-module telephony application
US9210275B2 (en) 2009-10-07 2015-12-08 Twilio, Inc. System and method for running a multi-module telephony application
WO2011091085A1 (en) 2010-01-19 2011-07-28 Twilio Inc. Method and system for preserving telephony session state
US9590849B2 (en) 2010-06-23 2017-03-07 Twilio, Inc. System and method for managing a computing cluster
US9459926B2 (en) 2010-06-23 2016-10-04 Twilio, Inc. System and method for managing a computing cluster
US20120208495A1 (en) 2010-06-23 2012-08-16 Twilio, Inc. System and method for monitoring account usage on a platform
US9338064B2 (en) 2010-06-23 2016-05-10 Twilio, Inc. System and method for managing a computing cluster
US9459925B2 (en) 2010-06-23 2016-10-04 Twilio, Inc. System and method for managing a computing cluster
US8416923B2 (en) 2010-06-23 2013-04-09 Twilio, Inc. Method for providing clean endpoint addresses
US8838707B2 (en) 2010-06-25 2014-09-16 Twilio, Inc. System and method for enabling real-time eventing
US20120121108A1 (en) * 2010-11-16 2012-05-17 Dennis Doubleday Cooperative voice dialog and business logic interpreters for a voice-enabled software application
US8649268B2 (en) 2011-02-04 2014-02-11 Twilio, Inc. Method for processing telephony sessions of a network
US9081550B2 (en) * 2011-02-18 2015-07-14 Nuance Communications, Inc. Adding speech capabilities to existing computer applications with complex graphical user interfaces
US9398622B2 (en) 2011-05-23 2016-07-19 Twilio, Inc. System and method for connecting a communication to a client
US20140044123A1 (en) 2011-05-23 2014-02-13 Twilio, Inc. System and method for real time communicating with a client application
US9648006B2 (en) 2011-05-23 2017-05-09 Twilio, Inc. System and method for communicating with a client application
WO2013044138A1 (en) 2011-09-21 2013-03-28 Twilio, Inc. System and method for authorizing and connecting application developers and users
US10182147B2 (en) 2011-09-21 2019-01-15 Twilio Inc. System and method for determining and communicating presence information
US9495227B2 (en) 2012-02-10 2016-11-15 Twilio, Inc. System and method for managing concurrent events
US20130304928A1 (en) 2012-05-09 2013-11-14 Twilio, Inc. System and method for managing latency in a distributed telephony network
US9602586B2 (en) 2012-05-09 2017-03-21 Twilio, Inc. System and method for managing media in a distributed communication network
US9240941B2 (en) 2012-05-09 2016-01-19 Twilio, Inc. System and method for managing media in a distributed communication network
US9247062B2 (en) 2012-06-19 2016-01-26 Twilio, Inc. System and method for queuing a communication session
US8737962B2 (en) 2012-07-24 2014-05-27 Twilio, Inc. Method and system for preventing illicit use of a telephony platform
US8738051B2 (en) 2012-07-26 2014-05-27 Twilio, Inc. Method and system for controlling message routing
US8948356B2 (en) 2012-10-15 2015-02-03 Twilio, Inc. System and method for routing communications
US8938053B2 (en) 2012-10-15 2015-01-20 Twilio, Inc. System and method for triggering on platform usage
US9253254B2 (en) 2013-01-14 2016-02-02 Twilio, Inc. System and method for offering a multi-partner delegated platform
US9282124B2 (en) 2013-03-14 2016-03-08 Twilio, Inc. System and method for integrating session initiation protocol communication in a telecommunications platform
US9001666B2 (en) 2013-03-15 2015-04-07 Twilio, Inc. System and method for improving routing in a distributed communication platform
US9338280B2 (en) 2013-06-19 2016-05-10 Twilio, Inc. System and method for managing telephony endpoint inventory
US9240966B2 (en) 2013-06-19 2016-01-19 Twilio, Inc. System and method for transmitting and receiving media messages
US9225840B2 (en) 2013-06-19 2015-12-29 Twilio, Inc. System and method for providing a communication endpoint information service
US9483328B2 (en) 2013-07-19 2016-11-01 Twilio, Inc. System and method for delivering application content
US9137127B2 (en) 2013-09-17 2015-09-15 Twilio, Inc. System and method for providing communication platform metadata
US9338018B2 (en) 2013-09-17 2016-05-10 Twilio, Inc. System and method for pricing communication of a telecommunication platform
US9274858B2 (en) 2013-09-17 2016-03-01 Twilio, Inc. System and method for tagging and tracking events of an application platform
US9553799B2 (en) 2013-11-12 2017-01-24 Twilio, Inc. System and method for client communication in a distributed telephony network
US9325624B2 (en) 2013-11-12 2016-04-26 Twilio, Inc. System and method for enabling dynamic multi-modal communication
US9344573B2 (en) 2014-03-14 2016-05-17 Twilio, Inc. System and method for a work distribution service
US9226217B2 (en) 2014-04-17 2015-12-29 Twilio, Inc. System and method for enabling multi-modal communication
US9246694B1 (en) 2014-07-07 2016-01-26 Twilio, Inc. System and method for managing conferencing in a distributed communication network
US9516101B2 (en) 2014-07-07 2016-12-06 Twilio, Inc. System and method for collecting feedback in a multi-tenant communication platform
US9774687B2 (en) 2014-07-07 2017-09-26 Twilio, Inc. System and method for managing media and signaling in a communication platform
US9251371B2 (en) 2014-07-07 2016-02-02 Twilio, Inc. Method and system for applying data retention policies in a computing platform
WO2016065080A1 (en) 2014-10-21 2016-04-28 Twilio, Inc. System and method for providing a micro-services communication platform
CN105808217A (en) * 2014-12-30 2016-07-27 航天信息软件技术有限公司 Flow chart drawing method and system based on XML
US10291776B2 (en) * 2015-01-06 2019-05-14 Cyara Solutions Pty Ltd Interactive voice response system crawler
US11489962B2 (en) 2015-01-06 2022-11-01 Cyara Solutions Pty Ltd System and methods for automated customer response system mapping and duplication
US9477975B2 (en) 2015-02-03 2016-10-25 Twilio, Inc. System and method for a media intelligence platform
US10419891B2 (en) 2015-05-14 2019-09-17 Twilio, Inc. System and method for communicating through multiple endpoints
US9948703B2 (en) 2015-05-14 2018-04-17 Twilio, Inc. System and method for signaling through data storage
US10659349B2 (en) 2016-02-04 2020-05-19 Twilio Inc. Systems and methods for providing secure network exchanged for a multitenant virtual private cloud
US10063713B2 (en) 2016-05-23 2018-08-28 Twilio Inc. System and method for programmatic device connectivity
US10686902B2 (en) 2016-05-23 2020-06-16 Twilio Inc. System and method for a multi-channel notification service
US10824401B2 (en) * 2018-04-30 2020-11-03 MphasiS Limited Method and system for automated creation of graphical user interfaces
CN112669839B (en) * 2020-12-17 2023-08-08 阿波罗智联(北京)科技有限公司 Voice interaction method, device, equipment and storage medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5633916A (en) * 1994-12-30 1997-05-27 Unisys Corporation Universal messaging service using single voice grade telephone line within a client/server architecture
US20020194388A1 (en) * 2000-12-04 2002-12-19 David Boloker Systems and methods for implementing modular DOM (Document Object Model)-based multi-modal browsers
US20020198719A1 (en) * 2000-12-04 2002-12-26 International Business Machines Corporation Reusable voiceXML dialog components, subdialogs and beans
US20030083882A1 (en) * 2001-05-14 2003-05-01 Schemers Iii Roland J. Method and apparatus for incorporating application logic into a voice responsive system
US20030139928A1 (en) * 2002-01-22 2003-07-24 Raven Technology, Inc. System and method for dynamically creating a voice portal in voice XML
US20030147518A1 (en) * 1999-06-30 2003-08-07 Nandakishore A. Albal Methods and apparatus to deliver caller identification information
US20030182305A1 (en) * 2002-03-05 2003-09-25 Alexander Balva Advanced techniques for web applications
US20030212561A1 (en) * 2002-05-08 2003-11-13 Williams Douglas Carter Method of generating test scripts using a voice-capable markup language
US20040093217A1 (en) * 2001-02-02 2004-05-13 International Business Machines Corporation Method and system for automatically creating voice XML file
US7287248B1 (en) * 2002-10-31 2007-10-23 Tellme Networks, Inc. Method and system for the generation of a voice extensible markup language application for a voice interface process

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070282607A1 (en) * 2004-04-28 2007-12-06 Otodio Limited System For Distributing A Text Document
US20080162138A1 (en) * 2005-03-08 2008-07-03 Sap Aktiengesellschaft, A German Corporation Enhanced application of spoken input
US7672851B2 (en) * 2005-03-08 2010-03-02 Sap Ag Enhanced application of spoken input
US20090307664A1 (en) * 2006-09-20 2009-12-10 National Ict Australia Limited Generating a transition system for use with model checking
US8850415B2 (en) * 2006-09-20 2014-09-30 National Ict Australia Limited Generating a transition system for use with model checking
US20080162140A1 (en) * 2006-12-28 2008-07-03 International Business Machines Corporation Dynamic grammars for reusable dialogue components
US8417511B2 (en) * 2006-12-28 2013-04-09 Nuance Communications Dynamic grammars for reusable dialogue components
US20100281076A1 (en) * 2009-05-04 2010-11-04 National Taiwan University Assisting method and apparatus for accessing markup language document
US8150834B2 (en) * 2009-05-04 2012-04-03 National Taiwan University Assisting method and apparatus for accessing markup language document
US9135952B2 (en) * 2010-12-17 2015-09-15 Adobe Systems Incorporated Systems and methods for semi-automatic audio problem detection and correction

Also Published As

Publication number Publication date
US7287248B1 (en) 2007-10-23

Similar Documents

Publication Publication Date Title
US7287248B1 (en) Method and system for the generation of a voice extensible markup language application for a voice interface process
US7143040B2 (en) Interactive dialogues
US7487440B2 (en) Reusable voiceXML dialog components, subdialogs and beans
US6832196B2 (en) Speech driven data selection in a voice-enabled program
EP1535453B1 (en) System and process for developing a voice application
US7171361B2 (en) Idiom handling in voice service systems
US7877260B2 (en) Content creation, graphical user interface system and display
US8175248B2 (en) Method and an apparatus to disambiguate requests
US20020077823A1 (en) Software development systems and methods
US20060122836A1 (en) Dynamic switching between local and remote speech rendering
US9047869B2 (en) Free form input field support for automated voice enablement of a web page
US20080034032A1 (en) Methods and Systems for Authoring of Mixed-Initiative Multi-Modal Interactions and Related Browsing Mechanisms
US20030145062A1 (en) Data conversion server for voice browsing system
US20050028085A1 (en) Dynamic generation of voice application information from a web server
CA2493261A1 (en) System and method to disambiguate and clarify user intention in a spoken dialog system
US20090254347A1 (en) Proactive completion of input fields for automated voice enablement of a web page
US20070233495A1 (en) Partially automated technology for converting a graphical interface to a speech-enabled interface
US20030187656A1 (en) Method for the computer-supported transformation of structured documents
US20080120111A1 (en) Speech recognition application grammar modeling
CN110244941A (en) Task development approach, device, electronic equipment and computer readable storage medium
US20020193907A1 (en) Interface control
US7937687B2 (en) Generating voice extensible markup language (VXML) documents
US20030121002A1 (en) Method and system for exchanging information through speech via a packet-oriented network
CN109814916A (en) A kind of configuration method, device, storage medium and the server of IVR process
Leavitt Two technologies vie for recognition in speech market

Legal Events

Date Code Title Description
STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION

AS Assignment

Owner name: MICROSOFT CORPORATION, WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:TELLME NETWORKS, INC.;REEL/FRAME:027910/0585

Effective date: 20120319

AS Assignment

Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MICROSOFT CORPORATION;REEL/FRAME:034766/0509

Effective date: 20141014