WO2001084539A1 - Voice commands depend on semantics of content information - Google Patents
Voice commands depend on semantics of content information
- Publication number
- WO2001084539A1 PCT/EP2001/004714 WO0184539A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- content information
- user
- control
- speech
- command
- Prior art date
Links
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/41—Structure of client; Structure of client peripherals
- H04N21/422—Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
- H04N21/42203—Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS] sound input device, e.g. microphone
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/41—Structure of client; Structure of client peripherals
- H04N21/422—Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
- H04N21/42204—User interfaces specially adapted for controlling a client device through a remote control device; Remote control devices therefor
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/482—End-user interface for program selection
- H04N21/4821—End-user interface for program selection using a grid, e.g. sorted out by channel and broadcast time
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/76—Television signal recording
- H04N5/78—Television signal recording using magnetic recording
- H04N5/781—Television signal recording using magnetic recording on disks or drums
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/225—Feedback of the input speech
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/76—Television signal recording
- H04N5/84—Television signal recording using optical recording
- H04N5/85—Television signal recording using optical recording on discs or drums
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/76—Television signal recording
- H04N5/907—Television signal recording using static stores, e.g. storage tubes or semiconductor memories
Definitions
- the invention relates to voice control, especially for the play-out of content information by consumer electronics (CE) equipment.
- U.S. patent 5,255,326 in particular addresses an interactive audio system that employs a sound signal processor coupled with a microprocessor as an interactive audio control system.
- a pair of transceivers, operated as stereophonic loudspeakers and also as receiving microphones, is coupled with the signal processor for receiving voice commands from a principal user.
- the voice commands are processed to operate a variety of different devices, such as television, tape, radio or CD player for supplying signals to the processor, from which signals then are supplied to the loudspeakers of the transceivers to produce the desired sound.
- Additional infrared sensors may be utilized to constantly triangulate the position of the principal listener to supply signals back through the transceiver system to the processor for constantly adjusting the balance of the sound to maintain the "sweet spot" of the sound focused on the principal listener.
- Additional devices also may be controlled by the signal processor in response to voice commands which are matched with stored commands to produce an output from the signal processor to operate these other devices in accordance with the spoken voice commands.
- the system is capable of responding to voice commands simultaneously with the reproduction of stereophonic sound from any one of the sources of sound which are operated by the system.
- Speech recognition is a technology, aspects of which are discussed in, e.g., U.S. patent 5,987,409; U.S. patent 5,946,655; U.S. patent 5,613,034; U.S. patent 5,228,110; and U.S. patent 5,995,930, all incorporated herein by reference.
- the known speech control and voice control of devices or applications is limited to a fixed set of commands that is tied to the equipment.
- the inventors have realized that user-friendliness of, and ergonomic aspects during operational use of, voice-controllable equipment are enhanced if the voice command or voice commands are linked to the information content to be played out, rather than to the apparatus or platform. That is, the inventors believe that control of CE equipment should be content-centric, rather than device-centric.
- the commands are preferably tailored to the semantics of the content information.
- the content information comprises audio, e.g., a collection of songs
- selection of one or more specific ones of the songs is achieved by speaking the title or part of the lyrics of the song.
- Special meta-data is added to the content of the CD to enable this feature.
- This meta-data is typically, but not necessarily, a representation of the vocabulary required by the voice controller of the device or application to enable voice control for that particular CD and the music on it.
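As a concrete illustration, such per-disc vocabulary metadata could be as simple as a table mapping spoken titles or lyric fragments to track numbers. The table contents, function name, and layout below are invented for this sketch, not taken from the patent:

```python
# Hypothetical per-disc voice-control vocabulary: each entry maps a
# spoken phrase (a title or lyric fragment) to a track number.
DISC_VOCABULARY = {
    "mustang danny": 3,
    "my beemer fits my crewcut": 1,
    "nat the lab": 7,
}

def resolve_track(spoken):
    """Return the track number for a recognized phrase, or None."""
    return DISC_VOCABULARY.get(spoken.strip().lower())
```

A player would load such a table from the disc at insertion time and hand its keys to the speech recognizer as the active vocabulary for that particular CD.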
- the user can hum or (attempt to) sing a part of the desired piece of music in order to select it for play out.
- U.S. patent 5,963,957, issued 10/5/99 to Mark Hoffberg for BIBLIOGRAPHIC MUSIC DATA BASE WITH NORMALIZED MUSICAL THEMES (attorney docket PHA 23,241), is incorporated herein by reference.
- This latter patent relates to an information processing system that comprises a music database.
- the music database stores homophonic reference sequences of music notes.
- the reference sequences are all normalized to the same scale degree so that they can be stored lexicographically.
- Upon finding a match between a string of input music notes and a particular reference sequence through an N-ary query, the system provides bibliographic information associated with the matching reference sequence. This system can also be used to convert the input hummed by the user into a play command via the N-ary query.
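The idea summarized above, reduced to a toy sketch: reference melodies are normalized so they become transposition-invariant, stored in sorted order, and a hummed query is located by lexicographic (binary) search. The melodies, names, and the interval-based normalization are illustrative only; the patent speaks of normalization to the same scale degree:

```python
import bisect

def normalize(notes):
    """Represent a melody by its successive pitch intervals,
    which removes the key/transposition."""
    return tuple(b - a for a, b in zip(notes, notes[1:]))

# Reference sequences stored sorted so binary search applies
# (MIDI-style note numbers, invented for the sketch).
REFERENCES = sorted([
    (normalize([60, 62, 64, 65]), "Song A"),
    (normalize([60, 64, 67, 72]), "Song B"),
])

def lookup(hummed_notes):
    """Find the song whose normalized melody matches the hummed input."""
    key = normalize(hummed_notes)
    i = bisect.bisect_left(REFERENCES, (key,))
    if i < len(REFERENCES) and REFERENCES[i][0] == key:
        return REFERENCES[i][1]
    return None
```

Because the stored form is transposition-invariant, humming the melody in a different key still matches.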
- the audio output of the system may trigger an undesirable activation of the speech-controlled processing, e.g., when a song is being played out.
- This undesirable activation is prevented, e.g., through echo cancellation, by pressing an activation button on the remote, e.g., the Pronto (TM), the universal programmable remote from Philips Electronics, to activate speech command receipt, or by having the equipment register a specific gesture made by the user, etc.
- the content information comprises video
- key scenes are labeled by key words so that speaking those words sets the playing out at the start of the relevant scene.
- a key word profile of the video content may be used to identify certain scenes, either through a one-to-one mapping of the user's voice input to the keywords or through a semantic mapping of the user's voice input onto an indexed list of the content's keyword labels and their synonyms.
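One purely illustrative way to realize such a keyword profile with synonym mapping; all labels, synonyms, and scene start offsets below are made up:

```python
# Indexed list of keyword-labeled scenes; each label carries a small
# synonym set so a semantic (rather than literal) match is possible.
SCENE_INDEX = [
    {"label": "car chase", "synonyms": {"chase", "pursuit"}, "start": 1830},
    {"label": "finale", "synonyms": {"ending", "climax"}, "start": 5400},
]

def find_scene(word):
    """Return the start offset (seconds) of the scene whose label or
    synonyms match the spoken word, or None."""
    w = word.lower()
    for scene in SCENE_INDEX:
        if w == scene["label"] or w in scene["synonyms"]:
            return scene["start"]
    return None
```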
- undesired activation is prevented from occurring, e.g., by using certain fixed commands or parts thereof such as a prefix.
- interactive software applications using graphics e.g., virtual reality or video games, are made speech-controllable by allowing the processes to associate speech input with controllable features of graphics objects displayed or to be displayed.
- actions to be carried out by a graphics object are made speech-controllable or speech-selectable by having the user say the proper words fitting the semantic context.
- This is suitable for video games allowing multiple modalities of control (e.g., both hand-input through joy-stick and speech input), as well as educational programs for teaching another language, or for teaching children the proper words and expressions for certain concepts such as tangible objects or actions.
- the speech is converted into data for being processed so as to identify the proper action intended. This is achieved through, e.g., semantic matching of the speech data with items in a pre-determined look-up table and finding the candidate for the closest match.
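A hedged sketch of the closest-match step, using fuzzy string matching against a pre-determined look-up table. The table contents and the 0.6 cutoff are assumptions for the example, not values from the patent:

```python
import difflib

# Pre-determined look-up table mapping phrases to game actions
# (contents invented for the sketch).
COMMAND_TABLE = {
    "jump": "ACTION_JUMP",
    "run forward": "ACTION_RUN",
    "open the door": "ACTION_OPEN_DOOR",
}

def resolve_action(speech_text, cutoff=0.6):
    """Map recognized speech to the closest known command, if any
    candidate is close enough."""
    matches = difflib.get_close_matches(
        speech_text.lower(), list(COMMAND_TABLE), n=1, cutoff=cutoff)
    return COMMAND_TABLE[matches[0]] if matches else None
```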
- the association between speech input and action intended may be made trainable by virtue of taking user-history into account.
- speech commands are derived from the content when the content is stored locally after downloading from the Web and/or playing-out. For example, key words in the lyrics are identified and stored as associated with the piece of audio whereto they pertain. This can be done by a dedicated software application. Either the digital data are analyzed or the audible lyrics are analyzed during the first play out of the audio content, for example, by isolating the voice part from the instrumental part and analyzing the former.
- the speech commands thus created can be used in addition to, or instead of, the basic set that comes with the specific content.
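An illustrative stand-in for the keyword derivation step: pick distinctive words from the lyrics text (here simply words above a length threshold, minus stop words) and associate them with the track. A real system would be far more sophisticated; everything here is an assumption:

```python
# Minimal stop-word list for the sketch.
STOP_WORDS = {"the", "and", "my", "a", "in", "of"}

def derive_keywords(lyrics, min_len=4):
    """Return a sorted list of candidate command keywords from lyrics."""
    words = {w.strip(".,!?").lower() for w in lyrics.split()}
    return sorted(w for w in words if len(w) >= min_len and w not in STOP_WORDS)
```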
- the user is enabled to download preexisting or customized commands from the Web that pertain to specific content information and that are to be stored at the user's equipment as semantically associated with the information content for the purpose of enabling voice control.
- the user can make his/her home library of electronic content information, considered as a resource for the home network, fully speech driven.
- the user has a collection of CD's, DVD's, in his/her jukebox and/or on a hard disk. If the content relates to publicly available audio and video, a service provider can create a library of annotations for each piece of the content in advance, and the user can download those elements that are relevant to his/her collection.
- the annotations for a CD or DVD can be tied to the disk's identifier as well as to its segments.
- the name of an album spoken by the user, is linked to a certain identifier that in turn enables retrieval and selection of the CD or DVD in the jukebox.
- the name of a song or scene can be linked to both the identifier of the CD or DVD and to the relevant key frames. The user then speaks the terms "movie" and "car chase" and gets in return the movies available that have scenes in them that relate to a car chase.
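The annotation lookup described above might be sketched as follows; the disc identifiers, categories, scene labels, and offsets are all invented for the example:

```python
# Per-disc annotations tied to disc identifiers and their segments.
LIBRARY = {
    "DVD-0042": {"category": "movie",
                 "scenes": {"car chase": 1830, "finale": 5400}},
    "DVD-0043": {"category": "movie", "scenes": {"wedding": 300}},
    "CD-0101": {"category": "album", "scenes": {}},
}

def find_content(category, scene_keyword):
    """Return (disc_id, scene offset) pairs whose annotations match
    the spoken category and scene keyword."""
    return [(disc, info["scenes"][scene_keyword])
            for disc, info in LIBRARY.items()
            if info["category"] == category and scene_keyword in info["scenes"]]
```

Speaking "movie" and "car chase" would then return the one disc whose annotations contain a matching scene.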
- the speech commands are linked to the content as presented in an electronic program guide (EPG), e.g., as broadcast by a service provider.
- a speech interface enables the user to select a specific program or program category that matches the words spoken by the user.
- commands as spoken by the user are processed via a server, e.g., a home server or a server on the Web and routed back to the Web-enabled play-out equipment as instructions.
- the server has an inventory of content available and a dictionary of words that are representative of the content's semantics.
- the Web-enabled equipment identifies to the server the content, e.g., through the identifier code of a CD or DVD, or through the header of a file, whereupon the speech commands for this content are readily matched to instructions for the control through, e.g., a look-up table.
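A minimal sketch of that server-side mapping: the player reports a disc identifier, and the server resolves a spoken command to a player instruction via a per-disc look-up table. The identifiers, commands, and instruction strings are hypothetical:

```python
# Server-side look-up tables, keyed first by content identifier
# (e.g., a CD identifier code), then by spoken command.
COMMAND_TABLES = {
    "CD-0101": {
        "play greatest hits": "PLAY TRACK 1",
        "shuffle": "PLAY RANDOM",
    },
}

def resolve(disc_id, spoken):
    """Return the player instruction for this disc and spoken command,
    or None if the content or command is unknown."""
    table = COMMAND_TABLES.get(disc_id, {})
    return table.get(spoken.lower())
```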
- the voice control enables, e.g., the selection of a piece of content information for play-out, or for storage or for fast forward until a stop, etc. Also, content bookmarked with key words in advance can be browsed under voice control for retrieval of certain excerpts matching the voice input at the key word level.
- the first storage medium comprises the content information and the control information that enables voice control as explained above.
- the information for the voice control is copy-protected, as a result of which the copy does not have the control commands.
- This is considered a feature supporting the content information industry. If the consumer wants to have a full copy of the voice controlled version, he or she can download the voice control information from a server on the Internet identified by a link to the CD number or DVD number, at a certain price. This has the advantage that the author's rights are acknowledged, even if the price is merely symbolic. Thus, this feature contributes to maintaining awareness that content information is the intellectual property of the author or his/her assignees.
- voice command as used herein is meant to indicate a voice control input that may consist of one or more keywords but it may also comprise a more verbose linguistic expression.
- Figs. 1 and 2 are block diagrams of systems in the invention.
- the invention allows for voice control of apparatus or software applications, in particular of those that use content pre-recorded on a storage medium.
- Voice commands are used that semantically relate to, are associated with or based on, the content as stored in the storage medium.
- the commands are therefore different per sample of the medium's content. For example, the commands available for a CD with music from composer or lyrics author X are different from those for a CD with music composed by composer or lyrics author Y.
- the operation is as follows.
- the user inserts a CD of performer Daan van Schooneveld into the player.
- the CD stores the music and the software to enable the user to interact with the CD through voice control.
- the user says "Mustang Danny"
- the player starts to play the rock song of that title, one of the tracks of the CD.
- a jukebox application is a software application that allows for archiving CD content on the PC's hard disk drive (HDD).
- the user has archived the Jos Swillens "Greatest Hits" CD on the HDD.
- the jukebox starts to play "My Beemer fits my crewcut", one of the tracks of Swillens' CD archived on the PC.
- the voice commands need not consist of only keywords but may comprise more verbose linguistic expressions.
- the system processes the voice input to match it with one of the options available using, e.g., a suitable search algorithm in an index list.
- the user has also archived the "Greatest Hits" CD from Koos Middeljans on the PC.
- the jukebox starts to play the folk song with that title, one of the tracks of the CD archived.
- the jukebox starts playing “Nat the Lab”.
- the jukebox starts playing the tracks of this CD in a random order.
- Copy protection measures are available and implemented, e.g., DRM (Digital Rights Management).
- the speech commands as supplied together with the semantically related content information on a CD or DVD could be implemented in such a manner that they cannot be copied to a location other than the onboard memory of a player. Any copy to another location would lose this feature and become less attractive.
- the user downloads the content via the Internet together with the semantically related control data that enables voice controlled selection and play out in a similar manner as discussed for the jukebox.
- the control data is preferably an integral part of the downloaded data in this example.
- the same content information can be tied to phonetically different sets of voice commands, for example, to allow for differences in language and in pronunciation in different geographic regions so as to facilitate voice recognition.
- the user preferably has a choice of the language he or she wants to use for voice control of the system.
- the storage medium may have too small a storage capacity for storing the commands of all the languages likely to be used. If voice commands are not available from the medium in one of the languages most likely to be used, the play out device is preferably able to download the equivalent speech commands in the desired language whereupon the system will translate the commands at run time into the corresponding instructions.
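The language fallback could be sketched as follows; the language codes, command words, and the stubbed download function are all hypothetical:

```python
# Command sets stored on the medium, keyed by language code
# (contents invented for the sketch).
ON_DISC_LANGUAGES = {
    "en": {"play": "PLAY"},
    "nl": {"speel": "PLAY"},
}

def commands_for(language, download=lambda lang: {"jouer": "PLAY"}):
    """Use the on-disc command set if present for this language,
    else fall back to a downloaded equivalent set."""
    if language in ON_DISC_LANGUAGES:
        return ON_DISC_LANGUAGES[language]
    return download(language)
```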
- a dedicated service can be made available on the Internet.
- the recording is then accomplished at home under secure circumstances.
- the local recording preferably allows the consumer to create his/her own command set semantically related to a specific piece of content information. This needs some editing and preferably a specific graphical user interface (GUI) that assists the user with establishing the relationships between content segments, voice input commands and actions or processing desired. For example, if the content information is not annotated at all, the user has to specify which segments he/she wants to control as separate items, with what voice commands he/she wants to control them, and what actions should be taken upon what segment under what command.
- the phonetic transcription covers any relevant form of phonetic transcription, independent of phoneme inventory, for example, limited to a subset of the vocabulary, or just for exceptions to a standard pronunciation.
- optionally, a description of how people typically interact with the system and phrase sentences (the so-called "language model") can be used, be it via example sentences, patterns or phrases, via (stochastic) finite state grammars, via (stochastic) context-free grammars, or another kind of grammar.
- the language model may just contain a modification of any standard way of communicating.
- the system optionally includes any description of what action should be triggered by certain words, commands, phrases, expressions, typically as given via a grammar.
- the system may include a dialogue model that includes a description of how the system should react to user's input and how the system enters a dialogue mode. For example, the system may ask for clarification, or to reconfirm a command, etc., under specific circumstances.
- the system may use a relationship between the data configuring the speech recognizer and other data. For example, the system has a display that shows what the user can say in order to play a current track.
- the storage medium e.g., a CD, DVD, solid state (e.g., flash) memory, etc.
- the storage medium has a bit pattern that gets recognized during start-up and that confirms the availability of the voice command feature.
- the confirmation can be conveyed to the user through, e.g., a pop-up screen on a display or spoken pre-recorded text supplied via the loudspeakers.
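A hypothetical start-up check along those lines: scan the beginning of the medium's data for a marker bit pattern. The magic bytes and the 4 KB scan window are invented for the sketch:

```python
# Marker advertising the voice-command feature ("VCMD", invented).
VOICE_FEATURE_MAGIC = b"\x56\x43\x4d\x44"

def has_voice_feature(disc_image):
    """True if the start of the medium carries the marker bit pattern."""
    return VOICE_FEATURE_MAGIC in disc_image[:4096]
```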
- CD-DA has the extra capacity of the R-W channels that can be used for adding the voice command feature without losing the CD's backwards compatibility.
- the lead-in tracks may not have adequate storage for the various language versions, but the data can be downloaded from the disc into a local memory. In this case, each language needs to be present only once on the disc.
- CD ROM, on the other hand, has a file structure which makes it easy to accommodate the speech control file on the disc as required.
- DVD also has a file structure and allows for the same approach as the CD ROM. Flash, HDD etc can be handled in the same way.
- Fig. 1 is a block diagram of a system 100 in the invention.
- System 100 comprises a play-out apparatus 102 for playing out content information 104 stored on a carrier 106.
- Carrier 106 comprises, for example, a CD, a DVD or a solid state memory.
- carrier 106 comprises a HDD onto which content information 104 has been downloaded via the Internet or another data network.
- Content information 104 in these examples is stored in a digital format.
- content information 104 may also be stored in an analog format.
- Apparatus 102 has a rendering sub-system 108 for making content information 104 available to the end-user. For example, if content information 104 comprises audio, sub-system 108 comprises one or more loudspeakers; if content information 104 comprises video information, sub-system 108 comprises a display monitor.
- carrier 106 comprises control information 110 that is semantically associated with content information 104.
- Control information 110 enables a data processing sub-system 112 to determine if a voice input 114 by the user via a microphone (not shown) matches an information item in the control information. If there is a match, the relevant play-out mode is selected, examples of which have been given above.
- the semantic relationship between control information 110 on the one hand, and content information 104 on the other hand facilitates user-interaction with apparatus 102, owing to the highly intuitive correspondence, as explained above in the play-out examples of audio content.
- visual feedback is provided via a local display, e.g., a small LCD 116, as to the content available and/or mode selected.
- Carrier 106 can be a component that can be inserted into apparatus 102 one at a time.
- apparatus 102 comprises a jukebox functionality 118 that enables the user to select content from among multiple carriers (not shown) like carrier 106, or even from among physically different ones, e.g., a CD and a solid state memory.
- Control information 110 is shown here as stored or recorded with content information 104 on carrier 106.
- a CD, DVD or flash can thus be supplied having prerecorded voice control applications and commands.
- control information 110 cooperates with a dedicated software application running on data processing system 112 for matching voice input 114 with one or more items available in control information 110.
- the software application is provided via a channel other than that of the control information, e.g., via the Internet or a set-up diskette for setting up apparatus 102.
- Voice control itself is known, and so is user-interaction with an apparatus for selecting an operational mode of the apparatus.
- the invention here relates to using a control interface, part of which is semantically associated with the content information available for playing-out.
- System 100 provides auditory or visual feedback in response to the user having entered a spoken command. For example, if there is a match, system 100 confirms receipt of the command, e.g., by repeating the command word or words in a pre-recorded voice, or by supplying the word "confirmed" in a pre-recorded voice. This feature can be readily implemented with a relatively small number of predetermined commands per information content item.
- the confirmation data can be integrated within control data 110.
- system 100 supplies auditory feedback indicating the negative status. For example, system 100 supplies in a pre-recorded voice "cannot process this command", "cannot find this artist", or "cannot find this song", or words of a similar meaning.
- instead of, or in addition to, auditory feedback, system 100 can give visual feedback, e.g., a green blinking light if system 100 is capable of processing the voice input, and a red light if it is not.
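The confirm/reject feedback behavior described above can be sketched in a few lines; the command set and feedback strings are illustrative:

```python
# Known commands for the currently loaded content (invented).
KNOWN_COMMANDS = {"play", "stop", "next track"}

def feedback(spoken):
    """Return (matched, auditory feedback text) for a voice input."""
    if spoken.strip().lower() in KNOWN_COMMANDS:
        return True, "confirmed"
    return False, "cannot process this command"
```

The boolean could equally drive the visual channel, e.g., a green light on a match and a red light otherwise.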
- system 100 preferably pronounces, in a pre-recorded or synthetic voice, the name of the artist and the song title or album title of the content selected for being played out.
- the synthetic voice uses a text-to-speech engine for this feature so the system can use the information that becomes available from the download or the media carrier.
- Text-to-Speech (TTS) systems convert words from a computer document (e.g., a word processor document, a web page) into audible speech through a loudspeaker.
- the words are stored together with their phonetic transcription, comprising intonation of carrier sentences, etc.
- control data 110 comprises pre-recorded or synthetic voice data explaining to the user which commands, e.g., which song keywords, are available.
- the pre-recorded or synthetic voice data can again be part of control data 110.
- the user should be able to turn this on or off when he/she does not want the system to provide auditory feedback.
- Fig.2 is a diagram illustrating a system 200 with an EPG wherein available content information is identified and arranged in rows 202 and columns 204 on a display monitor 206. For example, each respective row represents a respective TV channel and each of the columns represents a specific time slot.
- a label or title 212 is shown that represents the content available from that specific channel and in that particular time slot.
- Other types of arrangements can be used instead, e.g., by topical category and time, or ranked by user- preference according to a profile per channel or resource (e.g., on the Internet), etc.
- the user can browse the EPG by, e.g., moving a window 214 across the grid of the EPG through a suitable user-interface (e.g., arrow keys on a wireless keyboard or another directional device, not shown) in order to get the portion of the EPG displayed that falls within the boundaries of window 214.
- the user can thereupon select particular content information by clicking or highlighting the associated label in the portion displayed.
- an EPG is supplied via the Internet by a service provider.
- the EPG is enhanced with additional control software 216 that enables a mode of user-interaction with the EPG other than the conventional clicking or highlighting of a desired label.
- Control software 216 is preferably downloaded, updated or refreshed together with the EPG.
- Control software 216 comprises control information 218 associated with the semantics of the labels that identify the programs in the EPG for user-selection.
- the EPG's grid is re-organized to only show the available programs according to the category "movie" in window 214, or the movie programs are graphically represented as distinct from programs in the other categories.
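One possible shape of the category filter applied to the EPG grid when the viewer says "movie"; the EPG entries below are invented for the example:

```python
# A tiny EPG: each entry carries a title, a topical category, and a
# time slot (all contents invented for the sketch).
EPG = [
    {"title": "The Magnificent Six and Okke", "category": "movie", "time": "20:00"},
    {"title": "Evening News", "category": "news", "time": "20:00"},
    {"title": "Okke Flies Again", "category": "movie", "time": "22:00"},
]

def filter_category(category):
    """Return only the EPG entries matching the spoken category."""
    return [e for e in EPG if e["category"] == category]
```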
- the user browses through the category "movies", preferably also under speech command.
- the user sees the movie of his/her liking and enters as voice input the expression "The Magnificent Six and Okke", the title indicated in the EPG of the classic movie about an aviation event.
- the user enters "tonight" and "from eight o'clock", upon which window 214 is positioned to show, at least partly, the collection of programs available that day from eight o'clock (8:00 pm) on.
- the user has identified an interesting program in the portion of the EPG displayed in window 214 and speaks the words, representative of the title of the program, into microphone 220. Then, the user speaks "watch” or "record”. The words that represent the title are converted into a suitable format for comparison with control information 218.
- the control software 216 enables a microprocessor 222 to control a tuner 224 and display monitor 206 or a recording device 226. In this manner, the user can interact with the EPG using voice control.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020017016976A KR20020027382A (en) | 2000-05-03 | 2001-04-26 | Voice commands depend on semantics of content information |
EP01940369A EP1281173A1 (en) | 2000-05-03 | 2001-04-26 | Voice commands depend on semantics of content information |
JP2001581272A JP2003532164A (en) | 2000-05-03 | 2001-04-26 | How to control the processing of content information |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US20148800P | 2000-05-03 | 2000-05-03 | |
US60/201,488 | 2000-05-03 | ||
US62152200A | 2000-07-21 | 2000-07-21 | |
US09/621,522 | 2000-07-21 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2001084539A1 true WO2001084539A1 (en) | 2001-11-08 |
Family
ID=26896795
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP2001/004714 WO2001084539A1 (en) | 2000-05-03 | 2001-04-26 | Voice commands depend on semantics of content information |
Country Status (5)
Country | Link |
---|---|
EP (1) | EP1281173A1 (en) |
JP (1) | JP2003532164A (en) |
KR (1) | KR20020027382A (en) |
CN (1) | CN1193343C (en) |
WO (1) | WO2001084539A1 (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1455342A1 (en) * | 2003-03-05 | 2004-09-08 | Delphi Technologies, Inc. | System and method for voice enabling audio compact disc players via descriptive voice commands |
GB2402507A (en) * | 2003-06-03 | 2004-12-08 | Canon Kk | A user input interpreter and a method of interpreting user input |
EP1686796A1 (en) * | 2005-01-05 | 2006-08-02 | Alcatel | Electronic program guide presented by an avatar featuring a talking head speaking with a synthesized voice |
EP2675153A1 (en) * | 2012-06-14 | 2013-12-18 | Samsung Electronics Co., Ltd | Display apparatus, interactive server, and method for providing response information |
EP1259071B1 (en) * | 2001-05-15 | 2018-12-05 | Thomson Licensing | Method for modifying a user interface of a consumer electronic apparatus, corresponding consumer electronic apparatus |
US10257576B2 (en) | 2001-10-03 | 2019-04-09 | Promptu Systems Corporation | Global speech user interface |
US20190147052A1 (en) * | 2017-11-16 | 2019-05-16 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method and apparatus for playing multimedia |
WO2019108257A1 (en) * | 2017-11-28 | 2019-06-06 | Rovi Guides, Inc. | Methods and systems for recommending content in context of a conversation |
CN110880321A (en) * | 2019-10-18 | 2020-03-13 | 平安科技(深圳)有限公司 | Intelligent braking method, device and equipment based on voice and storage medium |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2410682A3 (en) * | 2005-03-31 | 2012-05-02 | Yamaha Corporation | Control apparatus for music system comprising a plurality of equipments connected together via network, and integrated software for controlling the music system |
JP4655722B2 (en) * | 2005-03-31 | 2011-03-23 | ヤマハ株式会社 | Integrated program for operation and connection settings of multiple devices connected to the network |
EP2933796B1 (en) * | 2014-04-17 | 2018-10-03 | Softbank Robotics Europe | Executing software applications on a robot |
US10659851B2 (en) * | 2014-06-30 | 2020-05-19 | Apple Inc. | Real-time digital assistant knowledge updates |
US10609454B2 (en) * | 2015-07-31 | 2020-03-31 | Promptu Systems Corporation | Natural language navigation and assisted viewing of indexed audio video streams, notably sports contests |
US20170127150A1 (en) * | 2015-11-04 | 2017-05-04 | Ubitus Inc. | Interactive applications implemented in video streams |
2001
- 2001-04-26 WO PCT/EP2001/004714 patent/WO2001084539A1/en not_active Application Discontinuation
- 2001-04-26 KR KR1020017016976A patent/KR20020027382A/en not_active Application Discontinuation
- 2001-04-26 JP JP2001581272A patent/JP2003532164A/en active Pending
- 2001-04-26 EP EP01940369A patent/EP1281173A1/en not_active Withdrawn
- 2001-04-26 CN CNB018011926A patent/CN1193343C/en not_active Expired - Fee Related
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5774859A (en) * | 1995-01-03 | 1998-06-30 | Scientific-Atlanta, Inc. | Information system having a speech interface |
EP1037463A2 (en) * | 1999-03-15 | 2000-09-20 | Matsushita Electric Industrial Co., Ltd. | Voice activated controller for recording and retrieving audio/video programs |
EP1079371A1 (en) * | 1999-08-26 | 2001-02-28 | Matsushita Electric Industrial Co., Ltd. | Universal remote control allowing natural language modality for television and multimedia searches and requests |
Non-Patent Citations (1)
Title |
---|
See also references of EP1281173A1 * |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1259071B1 (en) * | 2001-05-15 | 2018-12-05 | Thomson Licensing | Method for modifying a user interface of a consumer electronic apparatus, corresponding consumer electronic apparatus |
US10932005B2 (en) | 2001-10-03 | 2021-02-23 | Promptu Systems Corporation | Speech interface |
US11172260B2 (en) | 2001-10-03 | 2021-11-09 | Promptu Systems Corporation | Speech interface |
US10257576B2 (en) | 2001-10-03 | 2019-04-09 | Promptu Systems Corporation | Global speech user interface |
US11070882B2 (en) | 2001-10-03 | 2021-07-20 | Promptu Systems Corporation | Global speech user interface |
EP1455342A1 (en) * | 2003-03-05 | 2004-09-08 | Delphi Technologies, Inc. | System and method for voice enabling audio compact disc players via descriptive voice commands |
GB2402507A (en) * | 2003-06-03 | 2004-12-08 | Canon Kk | A user input interpreter and a method of interpreting user input |
EP1686796A1 (en) * | 2005-01-05 | 2006-08-02 | Alcatel | Electronic program guide presented by an avatar featuring a talking head speaking with a synthesized voice |
EP2675153A1 (en) * | 2012-06-14 | 2013-12-18 | Samsung Electronics Co., Ltd | Display apparatus, interactive server, and method for providing response information |
US9219949B2 (en) | 2012-06-14 | 2015-12-22 | Samsung Electronics Co., Ltd. | Display apparatus, interactive server, and method for providing response information |
US20190147052A1 (en) * | 2017-11-16 | 2019-05-16 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method and apparatus for playing multimedia |
WO2019108257A1 (en) * | 2017-11-28 | 2019-06-06 | Rovi Guides, Inc. | Methods and systems for recommending content in context of a conversation |
US11140450B2 (en) | 2017-11-28 | 2021-10-05 | Rovi Guides, Inc. | Methods and systems for recommending content in context of a conversation |
US11716514B2 (en) | 2017-11-28 | 2023-08-01 | Rovi Guides, Inc. | Methods and systems for recommending content in context of a conversation |
CN110880321A (en) * | 2019-10-18 | 2020-03-13 | 平安科技(深圳)有限公司 | Intelligent braking method, device and equipment based on voice and storage medium |
Also Published As
Publication number | Publication date |
---|---|
EP1281173A1 (en) | 2003-02-05 |
CN1193343C (en) | 2005-03-16 |
CN1381039A (en) | 2002-11-20 |
JP2003532164A (en) | 2003-10-28 |
KR20020027382A (en) | 2002-04-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10956006B2 (en) | Intelligent automated assistant in a media environment | |
JP3577454B2 (en) | Mechanism for storing information about recorded television broadcasts | |
US20090076821A1 (en) | Method and apparatus to control operation of a playback device | |
US7684991B2 (en) | Digital audio file search method and apparatus using text-to-speech processing | |
JP3554262B2 (en) | Universal remote control that enables natural language modality for television and multimedia retrieval and demand | |
US9153233B2 (en) | Voice-controlled selection of media files utilizing phonetic data | |
EP1693830B1 (en) | Voice-controlled data system | |
US8106285B2 (en) | Speech-driven selection of an audio file | |
US6643620B1 (en) | Voice activated controller for recording and retrieving audio/video programs | |
US7870142B2 (en) | Text to grammar enhancements for media files | |
US20040266337A1 (en) | Method and apparatus for synchronizing lyrics | |
EP1281173A1 (en) | Voice commands depend on semantics of content information | |
JPH09185879A (en) | Recording indexing method | |
JP2005539254A (en) | System and method for media file access and retrieval using speech recognition | |
KR20100005177A (en) | Customized learning system, customized learning method, and learning device | |
US6741791B1 (en) | Using speech to select a position in a program | |
JP2002189483A (en) | Voice input-type musical composition search system | |
Lindsay et al. | Representation and linking mechanisms for audio in MPEG-7 | |
JP5431817B2 (en) | Music database update device and music database update method | |
KR20080065205A (en) | Customized learning system, customized learning method, and learning device | |
Laia et al. | Designed for Enablement or Disabled by Design? Choosing the Path to Effective Speech Application Design |
Legal Events
Code | Title | Description |
---|---|---|
AK | Designated states | Kind code of ref document: A1; Designated state(s): CN JP KR |
AL | Designated countries for regional patents | Kind code of ref document: A1; Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR |
WWE | Wipo information: entry into national phase | Ref document number: 2001940369; Country of ref document: EP |
WWE | Wipo information: entry into national phase | Ref document number: 1020017016976; Country of ref document: KR |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
WWE | Wipo information: entry into national phase | Ref document number: 018011926; Country of ref document: CN |
WWP | Wipo information: published in national office | Ref document number: 2001940369; Country of ref document: EP |
WWW | Wipo information: withdrawn in national office | Ref document number: 2001940369; Country of ref document: EP |