WO2005041455A1 - Video content detection - Google Patents

Video content detection

Info

Publication number
WO2005041455A1
Authority
WO
WIPO (PCT)
Prior art keywords
fingerprint data
audio fingerprint
data item
video
content
Prior art date
Application number
PCT/IB2003/050031
Other languages
French (fr)
Inventor
Jozef P. Van Gassel
Declan P. Kelly
Jan A. D. Nesvadba
Original Assignee
Koninklijke Philips Electronics N.V.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Koninklijke Philips Electronics N.V.
Priority to AU2003283783A (AU2003283783A1)
Publication of WO2005041455A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04H BROADCAST COMMUNICATION
    • H04H60/00 Arrangements for broadcast applications with a direct linking to broadcast information or broadcast space-time; Broadcast-related systems
    • H04H60/56 Arrangements characterised by components specially adapted for monitoring, identification or recognition covered by groups H04H60/29-H04H60/54
    • H04H60/58 Arrangements characterised by components specially adapted for monitoring, identification or recognition covered by groups H04H60/29-H04H60/54 of audio
    • H04H20/00 Arrangements for broadcast or for distribution combined with broadcast
    • H04H20/10 Arrangements for replacing or switching information during the broadcast or the distribution
    • H04H2201/00 Aspects of broadcast communication
    • H04H2201/90 Aspects of broadcast communication characterised by the use of signatures

Definitions

  • the invention relates to the detection of content, e.g. commercials, in multimedia signals such as multimedia data streams.
  • PVR personal video recorder
  • An electronic program guide is an application used with digital set-top boxes, modern television sets, video recorders, etc. in order to list current and scheduled programs that are or will be available on each channel together with a short summary or commentary for each program.
  • EPG electronic program guide
  • an EPG is accessed using a remote control device.
  • Menus are provided that allow the user to view a list of programs scheduled for the next few hours up to the next seven days.
  • a typical EPG includes options to set parental controls, to order pay-per-view programming, to search for programs based on theme or category, and to set up a VCR to record programs.
  • an EPG may be used, sometimes in combination with the personal preferences or profile of the user, to (automatically) schedule recordings of programs selected from a wide range of television channels.
  • This approach to recording television broadcasts has emerged because of the random accessibility of hard disk drives, which creates a number of interesting possibilities, such as the simultaneous recording and playback of programs, the simultaneous recording of multiple programs, etc.
  • the huge storage capacity of current hard disk drives and the availability of consumer priced video encoders are of key importance as well.
  • If the broadcaster decides to broadcast a program of interest to the user, e.g. as defined by a stored preference list, a user profile, or the like, the PVR is not able to record it unless it is specifically added to the EPG.
  • EPGs are rarely up-to-date to the last minute, because they are often compiled by a third party, i.e. not necessarily the broadcaster, or the PVR box manufacturer.
  • the updating process typically takes place via a dial-up connection, i.e. it is only updated periodically.
  • Another complication is the fact that many broadcasters are still broadcasting their programs using analog technologies. Consequently, the start and end of programs are either not explicitly signaled by the broadcaster or are signaled incorrectly.
  • a system that can detect commercials may allow substitute advertisements to be inserted in a video stream ("commercial swapping") or temporary halting of the video at the end of a commercial to prevent a user, momentarily distracted during a commercial, from missing any of the main program content.
  • a robust and efficient method of identifying content in a multimedia signal is provided.
  • a method is provided for robustly recognizing the specific content of the received multimedia signal, e.g. the start or finish of a specific television program, such as a specific television show, a specific news program, the start or finish of commercials, or the like.
  • multimedia signals comprise a video part representing (moving) pictures and an audio part representing the associated audio content.
  • the visual content and the audio content may be encoded in a multimedia signal by a number of different encoding schemes.
  • the multimedia signal may represent a sequence of picture frames that are grouped into blocks, where each block comprises a number of frames and the associated audio data.
  • MPEG Moving Pictures Experts Group
  • multimedia signal refers to an analog or digital signal or data comprising the actual video content and the associated audio content to be presented together, preferably synchronized, with the video content.
  • the data representing the video content alone will be referred to as video data, while the data representing the audio content will be referred to as audio data.
  • the audio fingerprints provide a particularly reliable method of recognizing specific content in a multimedia signal. For example, it has been observed that e.g. audio trailers of news programs, television shows, etc., remain unchanged over long periods of time and, therefore, they provide a reliable source for recognizing these programs.
  • the term audio fingerprint comprises any suitable method of extracting robust features from audio data indicative of the audio content in such data, and storing the extracted features in a compact form.
  • an audio fingerprint is a representation of the corresponding audio content in question.
  • the fingerprint is shorter than the original audio signal.
  • the fingerprint represents the most relevant perceptual features of the audio signal in question.
  • Such fingerprints are sometimes also known as "robust hashes".
  • robust hashes refers to a hash function which, to a certain extent, is robust with respect to data processing and signal degradation, e.g. due to compression/decompression, coding, AD/DA conversion, etc.
  • Robust hashes are sometimes also referred to as robust summaries, robust signatures, or perceptual hashes.
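
The notion of a robust hash described above can be illustrated with a deliberately simplified sketch. The source does not prescribe a particular fingerprint algorithm, so the scheme below (one bit per pair of neighbouring frames, set when the frame energy rises) and the names `frame_energies`, `toy_fingerprint` and `bit_errors` are assumptions chosen only to show two properties named in the text: the fingerprint is much shorter than the audio, and it survives a simple degradation such as a change in volume.

```python
from typing import List, Sequence


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Split the signal into non-overlapping frames and return each frame's energy."""
    return [
        sum(s * s for s in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def toy_fingerprint(samples: Sequence[float], frame_len: int = 4) -> List[int]:
    """Hypothetical robust hash: one bit per pair of neighbouring frames,
    1 if the energy rises from one frame to the next, else 0."""
    energies = frame_energies(samples, frame_len)
    return [1 if nxt > cur else 0 for cur, nxt in zip(energies, energies[1:])]


def bit_errors(fp_a: Sequence[int], fp_b: Sequence[int]) -> int:
    """Hamming distance between two equal-length fingerprints."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    return sum(a != b for a, b in zip(fp_a, fp_b))
```

Because only the sign of the energy difference is kept, attenuating the whole signal leaves the bits unchanged. A practical system would typically accept a match when `bit_errors` stays below some threshold rather than requiring identical bits.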
  • the term content element comprises any fragment of a video program, e.g. a trailer, a leader, a jingle, or the like.
  • a content element may further comprise the entire content of a video program, e.g. an entire commercial, or the like.
  • the audio fingerprints of a large number of video programs are stored, e.g. in a database.
  • the content in a multimedia signal is recognized by computing an audio fingerprint of the associated audio content and by performing a lookup or query in the database using the computed fingerprint as a lookup key or query parameter. It is understood that more than one fingerprint may be associated with a given content element.
  • the reference audio fingerprints may be stored in a database locally in the device, e.g. a PVR, thereby allowing efficient content identification by the device without the need for establishing a communications link to a remote fingerprint server.
  • the matching of the computed fingerprints against reference fingerprints may be done at a remote location, for example on a server connected to the Internet.
  • the client device computes the fingerprint and sends it to the server, which returns a content identifier.
  • a combination of a remote and a local database may be used too, e.g. for supplementing a remote database with a personal local database of fingerprints related to content of personal interest of the user.
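
The combination of a local and a remote reference database can be sketched as follows. This is a minimal illustration, assuming fingerprints are compact byte strings used as exact lookup keys; the function name `identify` and the `remote_lookup` callable are hypothetical and stand in for whatever server interface an implementation would use.

```python
from typing import Callable, Dict, Optional

Fingerprint = bytes  # assumed compact encoding of a fingerprint


def identify(
    fp: Fingerprint,
    local_db: Dict[Fingerprint, str],
    remote_lookup: Optional[Callable[[Fingerprint], Optional[str]]] = None,
) -> Optional[str]:
    """Return a content identifier for fp, or None if nothing matches.

    The local database is consulted first, so no communications link is
    needed when the content is already known to the device.
    """
    content_id = local_db.get(fp)
    if content_id is None and remote_lookup is not None:
        content_id = remote_lookup(fp)  # stand-in for a query to a remote server
    return content_id
```

Exact-key lookup is a simplification: a robust fingerprint may differ in a few bits between broadcasts, so a real lookup would likely need a tolerance-based search.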
  • the extracted fingerprint data items may be added to the set of reference audio fingerprint data items, e.g. subject to an approval by a user, thereby gradually extending the set of reference audio fingerprints.
  • the method further comprises:
  • the audio fingerprint data items are stored along with a video content identifier, i.e. an identification of their respective content, e.g. the title of the program, a number identifying the program, or the like. Accordingly, a lookup in the database returns the stored video content identifier indicative of the content of the recorded video program.
  • the method further comprises:
  • a predetermined part of a video program may comprise the entire program.
  • the method further comprises: comparing the generated audio fingerprint data item with at least one previously generated audio fingerprint data item to generate viewing frequency information indicative of a number of times a corresponding content element has previously been presented to a user; and
  • the PVR can compile a 'hot list' of frequently occurring audio fingerprint information derived from the recorded programs that are previously stored/recorded onto the hard disk of the PVR. Using this hot list, the PVR can automatically record such programs in the future, even for programs and channels that do not have an entry associated with them in the EPG. At the same time this information as derived from the contents stored on the hard disk of the PVR can be used to improve the profile of the user.
  • such a list of frequently occurring fingerprints may be used to trigger other types of decisions, e.g. to control the PVR not to record the video content corresponding to selected ones of the frequent fingerprints, as they may relate to commercials or the like.
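
The 'hot list' of frequently occurring fingerprints amounts to counting how often each fingerprint has been seen. The sketch below assumes fingerprints are hashable values and uses a hypothetical threshold parameter `min_count`; the source does not state how "frequent" is decided.

```python
from collections import Counter
from typing import Hashable, Iterable, List


def hot_list(observed: Iterable[Hashable], min_count: int = 3) -> List[Hashable]:
    """Return fingerprints seen at least min_count times, most frequent first."""
    counts = Counter(observed)
    return [fp for fp, n in counts.most_common() if n >= min_count]
```

The resulting list could then drive either decision mentioned above: recording matching content automatically, or excluding it from recordings when the frequent fingerprints turn out to belong to commercials.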
  • the audio fingerprints are calculated for certain characteristic parts of a video program only, e.g. leaders and/or trailers of video programs, thereby reducing the amount of data that has to be calculated, stored, and compared.
  • the method further comprises controlling a video recording device in response to the retrieved video content identifier.
  • the method of identifying content in a multimedia signal according to the invention is combined with other commercial detection algorithms, thereby significantly improving the reliability and accuracy of existing algorithms.
  • Usually commercial blocks are separated from the normal program by a leader and trailer that signal the start and end of these blocks, respectively. Since these leaders and trailers only rarely change (typically at most once a season) they are well suited for identification by the audio fingerprinting according to the invention.
  • the method further comprises communicating information about the detected content element to a remote data processing system.
  • the invention provides a mechanism for providing feedback information about the viewed programs by the user of a PVR to a third party.
  • the present invention can be implemented in different ways including the method described above and in the following, further methods and arrangements, a video recorder, and further product means, each yielding one or more of the benefits and advantages described in connection with the first-mentioned method, and each having one or more preferred embodiments corresponding to the preferred embodiments described in connection with the first-mentioned method and disclosed in the dependent claims.
  • a second aspect of the invention relates to a method of recording a video program by a video recording device.
  • One of the interesting features of modern PVRs is the possibility of a pre-programmed recording of a series of programs, e.g. a complete set of episodes of a television series.
  • programs do not always start and stop exactly at the times indicated in the (electronic) program guide. This can have a number of different causes, such as cancellation, delays or radio interference.
  • programs may be longer than anticipated, e.g. due to inserted news flashes or extra-time in sports broadcasts.
  • Such unannounced content can nevertheless be very interesting to the users. Examples of such unannounced content include news flashes, weather forecasts and stock market updates that might be broadcast at random intervals during the day (without explicitly being listed in the EPG).
  • the first-mentioned method may be used in a PVR as a tool to provide feedback information about the viewed programs by the user of the PVR to a third party.
  • the method may be applied as a tool to market products offered in commercials or during regular programs, thereby enabling e-commerce applications.
  • the invention relates to a method of communicating information about a content element of a video program, the method comprising:
  • receiving an audio fingerprint data item generated by a device for presenting multimedia content, the audio fingerprint data item representing a predetermined content element of the presented multimedia content;
  • the user can initiate that an audio fingerprint data item is sent back to the provider of the television program or to a third party, thereby showing his/her interest in this item.
  • the transaction can be limited to be performed between the end-user and the third party, e.g. via the Internet or another communications channel, e.g. a telecommunications link.
  • the features of the methods described above and in the following may be implemented in software and carried out in a data processing system or other processing means caused by the execution of computer-executable instructions.
  • the instructions may be program code means loaded in a memory, such as a RAM, from a storage medium or from another computer via a computer network.
  • the described features may be implemented by hardwired circuitry instead of software or in combination with software.
  • the invention further relates to an arrangement for detecting content in a multimedia signal, the arrangement comprising:
  • - processing means adapted to compare the determined audio fingerprint data item with at least one of a number of reference audio fingerprint data items each related to a corresponding content element and, if at least a first one of the number of reference audio fingerprint data items is identified to correspond to the determined audio fingerprint data item, to identify the corresponding related content element as detected.
  • processing means comprises general- or special-purpose programmable microprocessors, Digital Signal Processors (DSP), Application Specific Integrated Circuits (ASIC), Programmable Logic Arrays (PLA), Field Programmable Gate Arrays (FPGA), special purpose electronic circuits, etc., or a combination thereof.
  • DSP Digital Signal Processors
  • ASIC Application Specific Integrated Circuits
  • PLA Programmable Logic Arrays
  • FPGA Field Programmable Gate Arrays
  • Examples of storage means include magnetic tape, optical disc, digital video disk (DVD), compact disc (CD or CD-ROM), mini-disc, hard disk, floppy disk, ferroelectric memory, electrically erasable programmable read only memory (EEPROM), flash memory, EPROM, read only memory (ROM), static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), ferromagnetic memory, optical storage, charge coupled devices, smart cards, PCMCIA card, etc.
  • DVD digital video disk
  • CD or CD-ROM compact disc
  • EEPROM electrically erasable programmable read only memory
  • ROM read only memory
  • SRAM static random access memory
  • DRAM dynamic random access memory
  • SDRAM synchronous dynamic random access memory
  • the arrangement may further comprise means for providing a number of reference audio fingerprint data items.
  • Such means may comprise storage means for storing such data items, communications means for receiving such data items, or any other circuitry or device suitable for providing such data items.
  • the means for providing a multimedia signal may comprise any circuitry or device for receiving a multimedia signal, such as a receiver, e.g. a television receiver, a satellite receiver, a storage means for multimedia signals, or any other suitable communications means.
  • the arrangement may further comprise communications means for communicating information about the detected content element, e.g. a retrieved video content identifier, to a remote data processing system.
  • the term communications means comprises circuitry and/or devices suitable for enabling the communication of data, e.g. via a wired or a wireless data link.
  • communications means include a network interface, a network card, a radio transmitter/receiver, a cable modem, a telephone modem, an Integrated Services Digital Network (ISDN) adapter, a Digital Subscriber Line (DSL) adapter, a satellite transceiver, an Ethernet adapter, or the like.
  • ISDN Integrated Services Digital Network
  • DSL Digital Subscriber Line
  • the invention further relates to a video recorder comprising such an arrangement.
  • the invention further relates to a system for communicating information about a content element of a video program, the system comprising:
  • a device for presenting a multimedia signal, the multimedia signal comprising video data and corresponding audio data, the device for presenting a multimedia signal comprising
  • - processing means for determining an audio fingerprint data item from a predetermined part of the presented audio data;
  • - communications means for transmitting the determined audio fingerprint data item;
  • a data processing system comprising
  • - storage means having stored thereon a number of reference audio fingerprint data items each related to a corresponding content element;
  • - processing means adapted to compare the received audio fingerprint data item with at least one of the number of reference audio fingerprint data items, and, if at least a first one of the number of reference audio fingerprint data items is identified to correspond to the determined audio fingerprint data item, to identify the corresponding related content element as detected.
  • the invention further relates to a device for presenting a multimedia signal in such a system and a data processing system in such a system.
  • Fig. 1 shows a block diagram of a video recorder including an arrangement for detecting content in a multimedia signal according to an embodiment of the invention;
  • Fig. 2 shows a flow diagram of a method of detecting content in a multimedia signal according to an embodiment of the invention;
  • Fig. 3 shows a block diagram of a video recorder comprising an arrangement for generating reference audio fingerprint data items according to an embodiment of the invention;
  • Fig. 4 schematically shows a fingerprint database module according to an embodiment of the invention;
  • Fig. 5 shows a system for communicating information about multimedia content according to an embodiment of the invention.
  • Fig. 1 shows a block diagram of a video recorder comprising an arrangement for detecting content in a multimedia signal according to an embodiment of the invention.
  • the video recorder 101 receives a multimedia data stream 107.
  • the multimedia data stream comprises video data representing moving pictures and audio data representing the corresponding audio content.
  • the multimedia data may be received by a receiver, e.g. a digital or analog receiver for receiving television programs or other multimedia content, e.g. via an antenna, a cable network, a satellite, or another communications network, such as the Internet.
  • the multimedia data may be coded according to any suitable coding scheme, e.g. an MPEG scheme. It is noted that the multimedia data may comprise a plurality of parallel data streams corresponding to a plurality of video channels or the like.
  • the video recorder 101 comprises a recorder block 102 which stores received multimedia data 107 onto a storage medium 103.
  • the storage medium 103 may be a hard disk or any other suitable storage means.
  • the recorder may record one or more video programs at the same time.
  • the video recorder 101 further comprises an audio fingerprint calculation block 104 which receives the input data and computes one or more audio fingerprints from the audio component of the input data.
  • the video recorder further comprises a fingerprint database module 105 which receives the calculated audio fingerprint item(s) from the audio fingerprint calculation block 104.
  • the fingerprint database module 105 has further access to a fingerprint database 106 which comprises a number of reference fingerprint data items. Based on a comparison of the calculated audio fingerprint item(s) and the reference fingerprint data items in the database 106, the fingerprint database module 105 controls the recording operations of the recorder block 102, e.g. by starting a recording, stopping, pausing, resuming a recording, or the like.
  • the reference fingerprints may be stored in a different way, e.g. as files in a file system. It is an advantage of a database system that it allows an efficient search when a large number of reference fingerprints are stored.
  • the multimedia input data 107 may originate from the storage medium 103, i.e. the multimedia data may be previously stored program material which is to be re-edited, e.g. in order to remove commercials or other unwanted material, and the processed data is stored on the storage medium 103.
  • Fig. 2 shows a flow diagram of a method of detecting content in a multimedia signal according to an embodiment of the invention.
  • a recording device receives a segment of an input signal representing multimedia data comprising video data and audio data.
  • the received segment may comprise a predetermined number of video frames and the corresponding audio data.
  • a fingerprint H is calculated for a segment of the audio data of the received multimedia data.
  • In step 203, the calculated fingerprint is compared to the reference fingerprints stored in a database 106, e.g. by performing a database query using the calculated fingerprint H as a key. If no matching reference fingerprint is found, i.e. no content is recognized in the multimedia data, the process returns to step 201 and receives the next segment of the input signal. If a match is found, a predetermined content corresponding to the matching reference audio fingerprint is determined to be detected (step 204) in the multimedia data, and the process continues at step 205.
  • In step 205, additional data corresponding to the identified content is retrieved from the database 106.
  • this information comprises a content identifier, e.g. a title of a video program or another suitable identifier identifying a specific program, a specific series or type of programs, e.g. identifying the content as an episode of a specific series or show, as a news flash of a specific news program, as a specific commercial for a specific product, or the like.
  • the additional data comprises control code information indicative of a predetermined operation to be initiated in response to the detected content, such as displaying additional information, or starting or stopping a recording.
  • the operation may further be conditioned on an explicit acknowledgment by a user.
  • In step 206, the operation of the video recorder is controlled according to the detected content.
  • Examples of the control of operations include the pausing of a recording at the beginning of a detected commercial, news program, etc., which interrupts the current program, and resuming the recording at the end of the commercial or the like.
  • This skipping of (single) commercials can also be applied in live viewing situations.
  • additional commercial detection, scene-change and/or genre-change detection algorithms can be used in combination with the present invention. It is understood that a commercial enforcement may also be implemented.
  • If the process is not stopped (step 207), the process continues at step 201 and receives the next signal segment.
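
The flow of Fig. 2 described above can be summarized as a simple loop. The sketch below is an illustration under stated assumptions, not the patented implementation: `fingerprint_of` is a hypothetical callable standing in for the fingerprint calculation, the database is modelled as a dictionary mapping a fingerprint to a (content identifier, control code) pair, and "controlling the recorder" is reduced to collecting the actions that would be issued.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

Entry = Tuple[str, str]  # (content identifier, control code); assumed layout


def detect_content(
    segments: Iterable[object],
    fingerprint_of: Callable[[object], Hashable],
    database: Dict[Hashable, Entry],
    should_stop: Callable[[], bool] = lambda: False,
) -> List[Entry]:
    """Run the detection loop and return the actions that would be issued."""
    actions: List[Entry] = []
    for segment in segments:               # step 201: receive the next segment
        h = fingerprint_of(segment)        # calculate fingerprint H for the segment
        entry = database.get(h)            # step 203: query using H as the key
        if entry is not None:              # step 204: content detected
            content_id, control_code = entry  # step 205: retrieve additional data
            actions.append((content_id, control_code))  # step 206: control recorder
        if should_stop():                  # step 207: stop or continue at step 201
            break
    return actions
```

In a device, the entry appended in the step 206 line would instead be translated into a recorder command such as pausing or resuming a recording.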
  • Fig. 3 shows a block diagram of a video recorder comprising an arrangement for generating reference audio fingerprint data items according to an embodiment of the invention.
  • the recognition of content in a multimedia signal is based on a comparison with a number of reference audio fingerprints, e.g. of leaders and trailers of commercial blocks and/or programs of interest, available in the database 106.
  • the video recorder 101 of fig. 3 comprises an arrangement for acquiring such information.
  • the video recorder 101 receives input multimedia data 107, e.g. during recordings or during a separate configuration or training session.
  • the video recorder comprises a recorder block 102 that stores a recording of a predetermined program material on a storage medium 103.
  • the video recorder further comprises a fingerprint calculation block 104 that receives the input data and computes audio fingerprints from the audio component of the input data.
  • the generated audio fingerprint information is written as a separate stream or file to the storage medium 103.
  • the fingerprint information may be embedded in private data of the encoded multimedia stream.
  • the video recorder further comprises a fingerprint management block 301 which retrieves previously recorded multimedia data and the corresponding fingerprints from the storage medium 103 and identifies the fingerprint information associated with leaders and trailers of commercials and other programs. The identified fingerprint information is stored in the fingerprint database 106.
  • this identification of relevant fingerprint information is performed by comparing the beginning and ending of different recordings of the same program, e.g. of different episodes of a television series or show or of different news bulletins.
  • the fingerprint management block 301 provides a user interface allowing a user to select a number of stored recordings to be used as a basis for identifying fingerprint data.
  • the identification of key program segments providing recognizable program content which is indicative for a given program may be performed automatically, e.g. by correlating the fingerprint information of different episodes.
  • the fingerprint management block 301 may provide a user interface allowing a user to explicitly select such key fragments of a program.
  • an automatic and a manual identification of key fragments are combined, e.g. by requesting a confirmation from the user on fragments identified by the video recorder. It is noted that the present invention may be combined with other automatic commercial detectors in order to identify the boundaries of the commercials and/or the boundaries of the single commercial clips to be stored in the fingerprint database.
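
One way to picture the automatic identification of key fragments is to look for the longest run of fingerprint values that two recordings of the same program have in common, for instance a shared leader. The source only says the fingerprint information of different episodes is correlated; the longest-common-run approach and the name `longest_shared_run` below are assumptions made for illustration.

```python
from typing import Hashable, List, Sequence


def longest_shared_run(a: Sequence[Hashable], b: Sequence[Hashable]) -> List[Hashable]:
    """Return the longest contiguous run of fingerprint values present in both
    sequences (classic longest-common-substring dynamic programming)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended at the previous position in both sequences.
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return list(a[best_end - best_len:best_end])
```

A candidate found this way could then be shown to the user for confirmation before being stored as a reference fingerprint, in line with the combined automatic and manual identification described above.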
  • the fingerprint management block 301 further provides a user interface allowing a user to input additional data, such as a content identifier or other descriptive information about the content of the program material.
  • additional data may further comprise control code information, e.g. an indication whether the identified content is to be recorded whenever detected, whether the identified content is to be skipped during recordings, etc.
  • the method according to the invention may be used to implement a commercial skip on a commercial-by-commercial basis.
  • a commercial 'thumbs down' button may be supplied allowing the user to disqualify a specific commercial that will be skipped automatically in the future.
  • the fingerprint management block 301 can compile a 'hot list' of frequently occurring audio fingerprints derived from the recorded programs that are previously stored on the storage medium 103. Using this hot list, a number of decisions are facilitated. For example, the video recorder may be controlled to automatically record such programs in the future, even programs and channels that do not have an entry associated with them in the EPG. At the same time this information as derived from the hard disk contents can be used to improve the profile of the user.
  • the fingerprint information may be made available by service providers, e.g. on a web-site of the Internet.
  • the reference fingerprint information may be accessed on-line via the Internet, or the fingerprint information may be downloaded and stored in the video recorder, e.g. embedded into the EPG.
  • fingerprints may be distributed on a storage medium, such as on a disc, e.g. a CD, DVD, etc.
  • Fig. 4 schematically shows a fingerprint database module according to an embodiment of the invention.
  • the fingerprint database module 105 comprises an input module 401, a Database Management System (DBMS) backend module 403, and a response module 404.
  • DBMS Database Management System
  • the input module 401 receives an audio fingerprint and supplies the fingerprint to the DBMS backend module 403.
  • the DBMS backend module 403 performs a query on the database 106 to identify any matching reference fingerprints and to retrieve any additional data associated with the matching reference fingerprint.
  • the database 106 comprises fingerprints FP1, FP2, FP3, FP4 and FP5 and respective associated sets of additional information D1, D2, D3, D4 and D5.
  • the database 106 can be organized in various ways to optimize query time and/or data organization.
  • the output from the input module 401 should be taken into account when designing the tables in the database 106.
  • the database 106 comprises a single table with entries (records) comprising respective fingerprints and sets of additional data.
  • the DBMS backend module 403 feeds the results of the query to the response module 404, which returns the results to a requesting application or directly generates a control signal for controlling a device, e.g. a video recorder, in response to the retrieved additional information.
  • each reference audio fingerprint data item is stored together with an associated control code, where each control code causes a video recorder to perform a specific action, such as starting a recording, stopping an ongoing recording, etc, thereby allowing a control of a video recorder based on detected content in a current video program.
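
The single-table organization, with each reference fingerprint stored next to its additional data and control code, might look like the following sketch. SQLite is used only because it ships with Python; the table name, column names and control code strings are assumptions, and the helper functions loosely mirror the input module, DBMS backend and response module without claiming to reproduce them.

```python
import sqlite3
from typing import Optional, Tuple


def open_reference_db() -> sqlite3.Connection:
    """Create an in-memory database with one table of reference fingerprints."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reference_fingerprints ("
        " fingerprint BLOB PRIMARY KEY,"
        " content_id TEXT NOT NULL,"
        " control_code TEXT NOT NULL)"
    )
    return conn


def add_reference(
    conn: sqlite3.Connection, fingerprint: bytes, content_id: str, control_code: str
) -> None:
    """Store one reference fingerprint together with its additional data."""
    conn.execute(
        "INSERT INTO reference_fingerprints VALUES (?, ?, ?)",
        (fingerprint, content_id, control_code),
    )


def query(conn: sqlite3.Connection, fingerprint: bytes) -> Optional[Tuple[str, str]]:
    """Return (content_id, control_code) for a matching fingerprint, else None."""
    return conn.execute(
        "SELECT content_id, control_code FROM reference_fingerprints"
        " WHERE fingerprint = ?",
        (fingerprint,),
    ).fetchone()
```

Making the fingerprint the primary key gives an indexed exact-match lookup, which is the simplest case of the query-time optimization mentioned above.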
  • Fig. 5 shows a system for communicating information about multimedia content according to an embodiment of the invention.
  • the system comprises a set-top box 501 which receives a multimedia data stream 502, e.g. via a cable network, a satellite, a communications network, the Internet, or the like.
  • the set-top box comprises a control unit 503 which feeds the multimedia data to a television set 512 for presentation to a user.
  • the set-top box further comprises a user interface module 505 for providing a user interface allowing a user to select programs to be viewed, etc.
  • the set-top box 501 further comprises a fingerprint calculation module 504 which receives the input multimedia data stream 502 and computes audio fingerprint information from the audio data of the received input.
  • the user can, via the user interface 505, initiate the transmission of the computed fingerprint data for a selected program fragment to a service provider or other third party, thereby indicating his/her interest in the displayed item.
  • the set-top box 501 comprises a communications interface 506, e.g. a modem, a network adapter, or the like, for transmitting the computed and selected fingerprint data to a service provider system 507, e.g. a network server or other data processing system. Additionally, further information may be transmitted, such as the identification of the sending set-top box, an indication of the type of interest, e.g. a request for further information, a purchase order, etc.
  • the service provider system 507 comprises a corresponding communications interface 508 for receiving the fingerprint data along with any additional information transmitted by the set-top box 501.
  • the service provider system further comprises a fingerprint database module 509 and a fingerprint database 510 as described above.
  • the fingerprint database module 509 compares the received fingerprint data item with reference fingerprints in the database 510. If a matching reference fingerprint is found, the provider system 507 initiates a suitable transaction in response to the recognized item viewed by the user of the set-top box. For example, the provider system may send additional information, initiate a purchase transaction, send a control signal back to the set-top box, thereby causing the set-top box to display a suitable menu, or the like.
  • the service provider may cause e.g. an e-commerce application to be launched by the set-top box, thereby providing the user with the option of buying an item that is featured in a commercial or other broadcast, or of engaging in some other e-commerce transaction.
  • DSP Digital Signal Processor
  • ASIC Application Specific Integrated Circuit
  • PLA Programmable Logic Arrays
  • FPGA Field Programmable Gate Arrays
  • the invention is not limited to video recorders but may also be implemented in other devices or systems for processing multimedia data, such as set-top boxes, television sets, multimedia data viewers implemented in software or hardware, or the like.
  • the invention is not limited to commercials but can easily be applied to other program material to be recorded and/or skipped from recording, in particular program material comprising identifiable fragments, such as leaders and trailers.
  • Examples of such program material comprise inserted news programs, weather forecasts, episodes of television shows, etc.
  • any reference signs placed between parentheses shall not be construed as limiting the claim.
  • the word “comprising” does not exclude the presence of elements or steps other than those listed in a claim.
  • the word “a” or “an” preceding an element does not exclude the presence of a plurality of such elements.
  • the invention can be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the device claim enumerating several means, several of these means can be embodied by one and the same item of hardware. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.

Abstract

A method of detecting content in a multimedia signal, the method comprising the steps of providing (201) a multimedia signal comprising video data and corresponding audio data; determining (202) an audio fingerprint data item from a predetermined part of the audio data; comparing (203) the determined audio fingerprint data item with at least one of a number of reference audio fingerprint data items each related to a corresponding content element; and if at least a first one of the number of reference audio fingerprint data items is identified to correspond to the determined audio fingerprint data item, identifying (205) the corresponding related content element as detected.

Description

Video content detection
[1] The invention relates to the detection of content, e.g. commercials, in multimedia signals such as multimedia data streams.
[2] Personal video receivers/recorders, devices that receive, process and/or record the content of broadcast video, are becoming increasingly popular. Modern hard disk based personal video recorder (PVR) devices that are currently available in the market (e.g. TiVo, UltimateTV, EchoStar and ReplayTV) use so-called electronic program guides.
[3] An electronic program guide (EPG) is an application used with digital set-top boxes, modern television sets, video recorders, etc. in order to list current and scheduled programs that are or will be available on each channel together with a short summary or commentary for each program. Hence, an EPG is the electronic equivalent of a printed television program guide.
[4] Typically, an EPG is accessed using a remote control device. Menus are provided that allow the user to view a list of programs scheduled for the next few hours up to the next seven days. A typical EPG includes options to set parental controls, to order pay-per-view programming, to search for programs based on theme or category, and to set up a VCR to record programs.
[5] In the context of video recording, an EPG may be used, sometimes in combination with the personal preferences or profile of the user, to (automatically) schedule recordings of programs selected from a wide range of television channels. This approach of recording television broadcasts has emerged, because of the random accessibility of hard disc drives, thereby creating a number of interesting possibilities, such as the simultaneous recording and playback of programs, the simultaneous recording of multiple programs, etc. Furthermore, the huge storage capacity of current hard disk drives and the availability of consumer priced video encoders are of key importance as well.
[6] These systems, however, suffer from a number of shortcomings inherent to their way of operation. If an EPG is available on a video recording device, the scheduling of programmed recordings heavily relies on the accuracy of the EPG. In the situation where scheduled programs are delayed or broadcast earlier than advertised in the EPG, the programmed recording is disrupted, unless the EPG information consulted by the recording device is updated in due time. Another cause of annoyance to the user is the interruption of an intended recording by commercial blocks or other inserted programs, e.g. weather forecasts or news bulletins.
[7] Furthermore, if for some reason the broadcaster decides to broadcast a program of interest to the user, e.g. as defined by a stored preference list, a user profile, or the like, the PVR is not able to record it unless it is specifically added to the EPG. However, it is a general problem that EPGs are rarely up-to-date to the last minute, because they are often compiled by a third party, i.e. not necessarily the broadcaster, or the PVR box manufacturer. The updating process typically takes place via a dial-up connection, i.e. it is only updated periodically. Another complication is the fact that many broadcasters are still broadcasting their programs using analog technologies. Consequently, the start and end of programs are either not explicitly signaled or are signaled incorrectly by the broadcaster.
[8] One of the features under investigation for such systems is content detection. For example, a system that can detect commercials may allow substitute advertisements to be inserted in a video stream ("commercial swapping") or temporary halting of the video at the end of a commercial to prevent a user, momentarily distracted during a commercial, from missing any of the main program content.
[9] There are known methods for detecting commercials. One method is the detection of high cut rate due to a sudden change in the scene with no fade or movement transition between temporally-adjacent frames. Cuts can include fades so the cuts do not have to be hard cuts. A more robust criterion may be high transition rates. Another indicator is the presence of a black frame (or monochrome frame) coupled with silence, which may indicate the beginning of a commercial break. Another known indicator of commercials is high activity, an indicator derived from the observation/assumption that objects move faster and change more frequently during commercials than during the feature (non-commercial) material. These methods show somewhat promising results, but reliability is still wanting.
[10] Hence, it is an object of the present invention to solve the problem of providing an accurate method of content detection.
[11] The above and other problems are solved by a method of detecting content in a multimedia signal, the method comprising:
[12] - providing a multimedia signal comprising video data and corresponding audio data;
[13] - determining an audio fingerprint data item from a predetermined part of the audio data;
[14] - comparing the determined audio fingerprint data item with at least one of a number of reference audio fingerprint data items each related to a corresponding content element; and
[15] - if at least a first one of the number of reference audio fingerprint data items is identified to correspond to the determined audio fingerprint data item, identifying the corresponding related content element as detected.
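The steps above can be illustrated with a minimal sketch. It assumes that fingerprints are fixed-width bit strings held as integers and that "corresponds to" means a bit error rate below a threshold; the text prescribes neither, and the names `REFERENCES`, `bit_error_rate`, `detect_content` and the threshold value are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical reference set: content element name -> 32-bit fingerprint.
REFERENCES: Dict[str, int] = {
    "news_leader": 0xA5A5F00F,
    "commercial_block_leader": 0x12345678,
}


def bit_error_rate(a: int, b: int, nbits: int = 32) -> float:
    """Fraction of differing bits between two nbits-wide fingerprints."""
    mask = (1 << nbits) - 1
    return bin((a ^ b) & mask).count("1") / nbits


def detect_content(
    fingerprint: int,
    references: Dict[str, int],
    max_ber: float = 0.1,  # assumed matching threshold, not from the text
) -> Optional[str]:
    """Return the best-matching content element, or None if nothing matches."""
    best_name, best_ber = None, max_ber
    for name, ref in references.items():
        ber = bit_error_rate(fingerprint, ref)
        if ber <= best_ber:
            best_name, best_ber = name, ber
    return best_name
```

A fingerprint that differs from a stored reference in only a few bits, e.g. after lossy coding, would still be identified under this assumed criterion, while an unrelated fingerprint yields `None`.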
[16] Hence, by identifying content in a multimedia signal based on a comparison of audio fingerprints with previously determined and stored reference audio fingerprints, a robust and efficient method of identifying content in a multimedia signal is provided. Hence, a method is provided for robustly recognizing the specific content of the received multimedia signal, e.g. the start or finish of a specific television program, such as a specific television show, a specific news program, the start or finish of commercials, or the like.
[17] It is a further advantage of the invention that it provides a computationally efficient method of identifying content in a multimedia signal, since it is based on the processing of the audio part of a multimedia signal. Audio processing, e.g. the calculation and comparison of audio fingerprints, is less computationally complex than the processing of the actual video part of a multimedia signal.
[18] For the purpose of the current description, multimedia signals comprise a video part representing (moving) pictures and an audio part representing the associated audio content. The visual content and the audio content may be encoded in a multimedia signal by a number of different encoding schemes. For example, the multimedia signal may represent a sequence of picture frames that are grouped into blocks, where each block comprises a number of frames and the associated audio data. Known encoding schemes for multimedia signals include the schemes provided by the Moving Pictures Experts Group (MPEG).
[19] For the purpose of the present description, the term multimedia signal refers to an analog or digital signal or data comprising the actual video content and the associated audio content to be presented together, preferably synchronized, with the video content. The data representing the video content alone will be referred to as video data, while the data representing the audio content will be referred to as audio data.
[20] It has been realized by the inventors that the audio fingerprints provide a particularly reliable method of recognizing specific content in a multimedia signal. For example, it has been observed that e.g. audio trailers of news programs, television shows, etc., remain unchanged over long periods of time and, therefore, they provide a reliable source for recognizing these programs.
[21] Here, the term audio fingerprint comprises any suitable method of extracting robust features from audio data indicative of the audio content in such data, and storing the extracted features in a compact form. Hence, an audio fingerprint is a representation of the corresponding audio content in question. Preferably, the fingerprint is shorter than the original audio signal. Furthermore, preferably the fingerprint represents the most relevant perceptual features of the audio signal in question. Such fingerprints are sometimes also known as "robust hashes". The term robust hashes refers to a hash function which, to a certain extent, is robust with respect to data processing and signal degradation, e.g. due to compression/decompression, coding, AD/DA conversion, etc. Robust hashes are sometimes also referred to as robust summaries, robust signatures, or perceptual hashes.
[22] The term content element comprises any fragment of a video program, e.g. a trailer, a leader, a jingle, or the like. A content element may further comprise the entire content of a video program, e.g. an entire commercial, or the like.
[23] According to the invention, the audio fingerprints of a large number of video programs are stored, e.g. in a database. Hence, the content in a multimedia signal is recognized by computing an audio fingerprint of the associated audio content and by performing a lookup or query in the database using the computed fingerprint as a lookup key or query parameter. It is understood that more than one fingerprint may be associated with a given content element.
[24] The reference audio fingerprints may be stored in a database locally in the device, e.g. a PVR, thereby allowing efficient content identification by the device without the need for establishing a communications link to a remote fingerprint server.
[25] Alternatively, the matching of the computed fingerprints against reference fingerprints may be done at a remote location, for example on a server connected to the Internet. In this embodiment, the client device computes the fingerprint and sends it to the server, which returns a content identifier. It is understood, that a combination of a remote and a local database may be used too, e.g. for supplementing a remote database with a personal local database of fingerprints related to content of personal interest of the user.
[26] In one embodiment, the extracted fingerprint data items may be added to the set of reference audio fingerprint data items, e.g. subject to an approval by a user, thereby gradually extending the set of reference audio fingerprints.
[27] In a preferred embodiment, the method further comprises:
[28] - storing each of the number of reference audio fingerprint data items in relation to a corresponding video content identifier; and
[29] - if at least a first one of the number of reference audio fingerprint data items is identified to correspond to the determined audio fingerprint data item, retrieving the video content identifier corresponding to the identified first reference audio fingerprint data item.
[30] Hence, the audio fingerprint data items are stored along with a video content identifier, i.e. an identification of their respective content, e.g. the title of the program, a number identifying the program, or the like. Accordingly, a lookup in the database returns the stored video content identifier indicative of the content of the recorded video program.
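As a rough sketch of such a lookup, the following uses Python's standard `sqlite3` module with an in-memory database. The table name, column names and helper functions are hypothetical, and the query assumes an exact fingerprint match; a practical system would likely need a tolerant comparison instead.

```python
import sqlite3
from typing import Optional


def make_db() -> sqlite3.Connection:
    """Create an in-memory table of reference fingerprints (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reference_fingerprints ("
        " fingerprint BLOB PRIMARY KEY,"
        " content_id TEXT NOT NULL)"
    )
    return conn


def store_reference(conn: sqlite3.Connection, fingerprint: bytes, content_id: str) -> None:
    """Store a reference fingerprint in relation to its video content identifier."""
    conn.execute(
        "INSERT OR REPLACE INTO reference_fingerprints VALUES (?, ?)",
        (fingerprint, content_id),
    )


def lookup_content_id(conn: sqlite3.Connection, fingerprint: bytes) -> Optional[str]:
    """Use the computed fingerprint as the query parameter; None if no match."""
    row = conn.execute(
        "SELECT content_id FROM reference_fingerprints WHERE fingerprint = ?",
        (fingerprint,),
    ).fetchone()
    return row[0] if row else None
```

Because several fingerprints may relate to the same content element, the fingerprint rather than the identifier is used as the key in this sketch.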
[31] According to another preferred embodiment of the invention, the method further comprises:
[32] - recording a video program resulting in a multimedia signal;
[33] - identifying at least a predetermined part of the video program corresponding to a predetermined content element; and
[34] - generating at least one audio fingerprint data item corresponding to the predetermined part of the video program.
[35] It is understood that a predetermined part of a video program may comprise the entire program.
[36] According to a further preferred embodiment, the method further comprises:
[37] - comparing the generated audio fingerprint data item with at least one previously generated audio fingerprint data item to generate viewing frequency information indicative of a number of times a corresponding content element has previously been presented to a user; and
[38] - storing the generated audio fingerprint data item in relation to the generated viewing frequency information.
[39] Hence, it is an advantage of the invention, that information about how often a given video content has been watched by a user may be stored. Based on this data, the PVR can compile a 'hot list' of frequently occurring audio fingerprint information derived from the recorded programs that are previously stored/recorded onto the hard disk of the PVR. Using this hot list, the PVR can automatically record such programs in the future, even for programs and channels that do not have an entry associated with them in the EPG. At the same time this information as derived from the contents stored on the hard disk of the PVR can be used to improve the profile of the user.
[40] Alternatively or additionally, such a list of frequently occurring fingerprints may be used to trigger other types of decisions, e.g. to control the PVR not to record the video content corresponding to selected ones of the frequent fingerprints, as they may relate to commercials or the like.
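The viewing frequency bookkeeping and the resulting 'hot list' might be sketched as follows. The class name, the `min_views` cut-off and the use of exact fingerprint equality are assumptions made for illustration only.

```python
from collections import Counter
from typing import List


class ViewingHistory:
    """Counts how often each fingerprint has been presented (hypothetical helper)."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def record_viewing(self, fingerprint: bytes) -> int:
        """Register one presentation and return the updated viewing frequency."""
        self._counts[fingerprint] += 1
        return self._counts[fingerprint]

    def hot_list(self, min_views: int = 3) -> List[bytes]:
        """Fingerprints seen at least min_views times, most frequent first."""
        return [fp for fp, n in self._counts.most_common() if n >= min_views]
```

The same list could drive either decision described above: recording content that matches a frequent fingerprint, or skipping it when the fingerprint is known to relate to a commercial.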
[41] There are several advantages in storing audio fingerprints in a database instead of the multimedia signal itself. To name a few:
[42] - The memory/storage requirements for the database are reduced.
[43] - The comparison of fingerprints is more efficient than the comparison of the multimedia signal, as fingerprints are substantially shorter than the signals they are calculated from.
[44] - Searching in a database for a matching fingerprint is more efficient than searching for a complete video signal, since it involves matching shorter items.
[45] - Searching for a matching fingerprint is more likely to be successful, as small changes to a video signal (such as encoding in a different format or changing the bit rate) do not affect the fingerprint.
[46] An example of a method of generating an audio fingerprint is described in Jaap Haitsma, Ton Kalker and Job Oostveen, "Robust Audio Hashing For Content Identification", International Workshop on Content-Based Multimedia Indexing, Brescia, September 2001, which discloses the computation of audio fingerprints and the obtaining of identifiers from them as such.
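To give a feel for how a robust hash of this kind can be formed, the following simplified sketch derives one sub-fingerprint from the signs of energy differences between neighbouring frequency bands and consecutive frames, in the spirit of the cited paper. It assumes the per-band energies have already been computed; the function name and the example values are hypothetical, and framing, windowing and band selection are omitted.

```python
from typing import Sequence


def sub_fingerprint(prev_energies: Sequence[float], cur_energies: Sequence[float]) -> int:
    """Pack sign bits of band-energy differences into an integer.

    Bit m is 1 when the energy difference between bands m and m+1 has
    increased from the previous frame to the current frame.
    """
    if len(prev_energies) != len(cur_energies) or len(cur_energies) < 2:
        raise ValueError("need two frames with the same number (>= 2) of bands")
    bits = 0
    for m in range(len(cur_energies) - 1):
        diff = (cur_energies[m] - cur_energies[m + 1]) - (
            prev_energies[m] - prev_energies[m + 1]
        )
        bits = (bits << 1) | (1 if diff > 0 else 0)
    return bits
```

Since only the signs of the differences are kept, scaling all energies by the same positive factor, e.g. a volume change, leaves the bits unchanged, which illustrates the kind of robustness described for such hashes.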
[47] Preferably, the audio fingerprints are calculated for certain characteristic parts of a video program only, e.g. leaders and/or trailers of video programs, thereby reducing the amount of data that has to be calculated, stored, and compared.
[48] According to a preferred embodiment of the invention, the method further comprises controlling a video recording device in response to the retrieved video content identifier.
[49] It is an advantage of the invention that it provides an improved control of a video recording device. In particular, it is an advantage of the invention that it provides an improved accuracy of commercial detection. In a further preferred embodiment, the method of identifying content in a multimedia signal according to the invention is combined with other commercial detection algorithms, thereby significantly improving the reliability and accuracy of existing algorithms. Usually commercial blocks are separated from the normal program by a leader and trailer that signal the start and end of these blocks, respectively. Since these leaders and trailers only rarely change (typically at most once a season) they are well suited for identification by the audio fingerprinting according to the invention.
[50] According to another preferred embodiment of the invention, the method further comprises communicating information about the detected content element to a remote data processing system. Hence, the invention provides a mechanism for providing feedback information about the viewed programs by the user of a PVR to a third party.
[51] The present invention can be implemented in different ways including the method described above and in the following, further methods and arrangements, a video recorder, and further product means, each yielding one or more of the benefits and advantages described in connection with the first-mentioned method, and each having one or more preferred embodiments corresponding to the preferred embodiments described in connection with the first-mentioned method and disclosed in the dependent claims.
[52] A second aspect of the invention relates to a method of recording a video program by a video recording device. One of the interesting features of modern PVRs is the possibility of a pre-programmed recording of a series of programs, e.g. a complete set of episodes of a television series. However, such programs do not always start and stop exactly at the times indicated in the (electronic) program guide. This can have a number of different causes, such as cancellation, delays or radio interference. Furthermore, programs may be longer than anticipated, e.g. due to inserted news flashes or extra time in sports broadcasts.
[53] The above problem is solved by a method of recording a video program by a video recording device, the method comprising:
[54] - detecting a content element in a multimedia signal corresponding to a predetermined part of the video program by performing the steps of the first-mentioned method;
[55] - controlling a recording operation of the video recording device in response to the detected content element.
[56] Hence, by detecting the content corresponding to a predetermined part of a video program, such as leaders and trailers according to the invention, recorded programs can be assured to be complete, and they do not take up more disc space than strictly necessary. Hence, the accuracy of the start and end times of programmed recordings is improved significantly.
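A minimal sketch of such recording control is given below. It assumes the detector delivers time-stamped identifiers of detected content elements and that a program is delimited by one leader and one trailer; the function name and the identifiers in the usage note are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def recording_intervals(
    detections: Iterable[Tuple[float, str]],
    leader_id: str,
    trailer_id: str,
) -> List[Tuple[float, float]]:
    """Derive (start, stop) recording times from detected content elements.

    detections: (time in seconds, detected content element id), in time order.
    Recording starts at the program's leader and stops at its trailer;
    any other detected element is ignored.
    """
    intervals: List[Tuple[float, float]] = []
    start: Optional[float] = None
    for t, element in detections:
        if element == leader_id and start is None:
            start = t  # leader detected: start recording
        elif element == trailer_id and start is not None:
            intervals.append((start, t))  # trailer detected: stop recording
            start = None
    return intervals
```

For example, detections of `"ad_leader"` at 10 s, `"show_leader"` at 120 s and `"show_trailer"` at 1900 s would yield a single interval from 120 s to 1900 s, regardless of the times announced in the program guide. A real device would presumably keep recording until the trailer itself has finished.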
[57] It is a further advantage of the invention that it facilitates the recording of programs that are not scheduled and, consequently, not announced in the (Electronic) Program Guide at all. Such unannounced content can nevertheless be very interesting to the users. Examples of such unannounced content include news flashes, weather forecasts and stock market updates that might be broadcast at random intervals during the day (without explicitly being listed in the EPG).
[58] According to yet another aspect of the invention, the first-mentioned method may be used in a PVR as a tool to provide feedback information about the viewed programs by the user of the PVR to a third party. For example, the method may be applied as a tool to market products offered in commercials or during regular programs, thereby enabling e-commerce applications.
[59] Accordingly, the invention relates to a method of communicating information about a content element of a video program, the method comprising:
[60] - receiving an audio fingerprint data item generated by a device for presenting multimedia content, the audio fingerprint data item representing a predetermined content element of the presented multimedia content;
[61] - providing a number of reference audio fingerprint data items each related to a corresponding content element;
[62] - comparing the received audio fingerprint data item with at least one of the number of reference audio fingerprint data items; and
[63] - if at least a first one of the number of reference audio fingerprint data items is identified to correspond to the determined audio fingerprint data item, identifying the corresponding related content element as detected.
[64] For example, as soon as an item of interest is shown to a user within a video program by a device for presenting multimedia content, such as a PVR, a television set, or the like, the user can initiate that an audio fingerprint data item is sent back to the provider of the television program or to a third party, thereby showing his/her interest in this item. There is no additional data required to be sent with the audio/video stream to identify the product. Consequently it is an advantage that no hardware is required at the head-end, broadcaster, or other provider of the video program to create such data and insert it into the audio/video stream. Furthermore, there is no longer a need to involve the broadcaster in this e-commerce chain. The transaction can be limited to be performed between the end-user and the third party, e.g. via the Internet or another communications channel, e.g. a telecommunications link.
[65] It is noted that the features of the methods described above and in the following may be implemented in software and carried out in a data processing system or other processing means caused by the execution of computer-executable instructions. The instructions may be program code means loaded in a memory, such as a RAM, from a storage medium or from another computer via a computer network. Alternatively, the described features may be implemented by hardwired circuitry instead of software or in combination with software.
[66] The invention further relates to an arrangement for detecting content in a multimedia signal, the arrangement comprising:
[67] - means for providing a multimedia signal, the multimedia signal comprising video data and corresponding audio data;
[68] - processing means for determining an audio fingerprint data item from a predetermined part of the audio data;
[69] - processing means adapted to compare the determined audio fingerprint data item with at least one of a number of reference audio fingerprint data items each related to a corresponding content element and, if at least a first one of the number of reference audio fingerprint data items is identified to correspond to the determined audio fingerprint data item, to identify the corresponding related content element as detected.
[70] Here and in the following, the term processing means comprises general- or special-purpose programmable microprocessors, Digital Signal Processors (DSP), Application Specific Integrated Circuits (ASIC), Programmable Logic Arrays (PLA), Field Programmable Gate Arrays (FPGA), special purpose electronic circuits, etc., or a combination thereof.
[71] Examples of storage means include magnetic tape, optical disc, digital video disk (DVD), compact disc (CD or CD-ROM), mini-disc, hard disk, floppy disk, ferroelectric memory, electrically erasable programmable read only memory (EEPROM), flash memory, EPROM, read only memory (ROM), static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), ferromagnetic memory, optical storage, charge coupled devices, smart cards, PCMCIA card, etc.
[72] The arrangement may further comprise means for providing a number of reference audio fingerprint data items. Such means may comprise storage means for storing such data items, communications means for receiving such data items, or any other circuitry or device suitable for providing such data items.
[73] The means for providing a multimedia signal may comprise any circuitry or device for receiving a multimedia signal, such as a receiver, e.g. a television receiver, a satellite receiver, a storage means for multimedia signals, or any other suitable communications means.
[74] The arrangement may further comprise communications means for communicating information about the detected content element, e.g. a retrieved video content identifier, to a remote data processing system.
[75] Here and in the following, the term communications means comprises circuitry and/or devices suitable for enabling the communication of data, e.g. via a wired or a wireless data link. Examples of such communications means include a network interface, a network card, a radio transmitter/receiver, a cable modem, a telephone modem, an Integrated Services Digital Network (ISDN) adapter, a Digital Subscriber Line (DSL) adapter, a satellite transceiver, an Ethernet adapter, or the like.
[76] The invention further relates to a video recorder comprising such an arrangement.
[77] The invention further relates to a system for communicating information about a content element of a video program, the system comprising:
[78] a device for presenting a multimedia signal, the multimedia signal comprising video data and corresponding audio data, the device for presenting a multimedia signal comprising
[79] - processing means for determining an audio fingerprint data item from a predetermined part of the presented audio data;
[80] - communications means for transmitting the determined audio fingerprint data item;
[81] a data processing system comprising
[82] - communications means for receiving the transmitted audio fingerprint data item;
[83] - storage means having stored thereon a number of reference audio fingerprint data items each related to a corresponding content element;
[84] - processing means adapted to compare the received audio fingerprint data item with at least one of the number of reference audio fingerprint data items, and, if at least a first one of the number of reference audio fingerprint data items is identified to correspond to the determined audio fingerprint data item, to identify the corresponding related content element as detected.
[85] The invention further relates to a device for presenting a multimedia signal in such a system and a data processing system in such a system.
[86] These and other aspects of the invention will be apparent and elucidated from the embodiments described in the following with reference to the drawing in which:
[87] fig. 1 shows a block diagram of a video recorder including an arrangement for detecting content in a multimedia signal according to an embodiment of the invention;
[88] fig. 2 shows a flow diagram of a method of detecting content in a multimedia signal according to an embodiment of the invention;
[89] fig. 3 shows a block diagram of a video recorder comprising an arrangement for generating reference audio fingerprint data items according to an embodiment of the invention;
[90] fig. 4 schematically shows a fingerprint database module according to an embodiment of the invention; and
[91] fig. 5 shows a system for communicating information about multimedia content according to an embodiment of the invention.
[92] Fig. 1 shows a block diagram of a video recorder comprising an arrangement for detecting content in a multimedia signal according to an embodiment of the invention. The video recorder 101 receives a multimedia data stream 107. The multimedia data stream comprises video data representing moving pictures and audio data representing the corresponding audio content. The multimedia data may be received by a receiver, e.g. a digital or analog receiver for receiving television programs or other multimedia content, e.g. via an antenna, a cable network, a satellite, or another communications network, such as the Internet. The multimedia data may be coded according to any suitable coding scheme, e.g. an MPEG scheme. It is noted that the multimedia data may comprise a plurality of parallel data streams corresponding to a plurality of video channels or the like.
[93] The video recorder 101 comprises a recorder block 102 which stores received multimedia data 107 onto a storage medium 103. The storage medium 103 may be a hard disk or any other suitable storage means.
[94] It is noted that the recorder may record one or more video programs at the same time.
[95] According to the invention, the video recorder 101 further comprises an audio fingerprint calculation block 104 which receives the input data and computes one or more audio fingerprints from the audio component of the input data.
[96] The video recorder further comprises a fingerprint database module 105 which receives the calculated audio fingerprint item(s) from the audio fingerprint calculation block 104. The fingerprint database module 105 has further access to a fingerprint database 106 which comprises a number of reference fingerprint data items. Based on a comparison of the calculated audio fingerprint item(s) and the reference fingerprint data items in the database 106, the fingerprint database module 105 controls the recording operations of the recorder block 102, e.g. by starting a recording, stopping, pausing, resuming a recording, or the like.
[97] It is noted that, instead of a database 106, the reference fingerprints may be stored in a different way, e.g. as files in a file system. It is an advantage of a database system that it allows an efficient search when a large number of reference fingerprints are stored.
[98] It is understood that, in one embodiment, the multimedia input data 107 may originate from the storage medium 103, i.e. the multimedia data may be previously stored program material which is to be re-edited, e.g. in order to remove commercials or other unwanted material, and the processed data is stored on the storage medium 103.
[99] Fig. 2 shows a flow diagram of a method of detecting content in a multimedia signal according to an embodiment of the invention. In an initial step 201 a recording device receives a segment of an input signal representing multimedia data comprising video data and audio data. For example, the received segment may comprise a predetermined number of video frames and the corresponding audio data. In step 202, a fingerprint H is calculated for a segment of the audio data of the received multimedia data.
[100] In step 203, the calculated fingerprint is compared to the reference fingerprints stored in a database 106, e.g. by performing a database query using the calculated fingerprint H as a key. If no matching reference fingerprint is found, i.e. no content is recognized in the multimedia data, the process returns to step 201 receiving a next segment of the input signal. If a match is found, a predetermined content corresponding to the matching reference audio fingerprint is determined to be detected (step 204) in the multimedia data, and the process continues at step 205.
[101] In step 205, additional data corresponding to the identified content is retrieved from the database 106. In one embodiment this information comprises a content identifier, e.g. a title of a video program or another suitable identifier identifying a specific program, a specific series or type of programs, e.g. identifying the content as an episode of a specific series or show, as a news flash of specific news program, as a specific commercial for a specific product, or the like.
[102] Alternatively or additionally, the additional data comprises control code information indicative of a predetermined operation to be initiated in response to the detected content, such as displaying additional information, starting, stopping, etc. of a recording, etc. The operation may further be conditioned on an explicit acknowledgment by a user.
[103] In step 206, the operation of the video recorder is controlled according to the detected content. Examples of the control of operations include the pausing of a recording at the beginning of a detected commercial, news program, etc., which interrupts the current program, and resuming the recording at the end of the commercial or the like. This skipping of (single) commercials can also be applied in live viewing situations. For fine tuning, i.e. determining a frame-accurate in- and out-point to pause and resume the recording, additional commercial detection, scene-change and/or genre change detection algorithms can be used in combination with the present invention. It is understood that commercial enforcement may also be implemented.
[104] By detecting the leaders and trailers of the program to be recorded and of the program material to be skipped, it can be assured that the program to be recorded is recorded in full, even if the start and/or finish times differ from the announced times. Furthermore, unwanted program material may efficiently be excluded from the recording. Hence, recorded programs do not take up more disc space than strictly necessary.
[105] If the process is not stopped (step 207), the process continues at step 201 and receives the next signal segment.
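The loop of steps 201 to 207 can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function names, the dictionary standing in for the database 106, and the exact-match lookup are assumptions made for the example. A practical fingerprint comparison would normally tolerate some bit errors rather than require equality.

```python
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical stand-in for database 106:
# fingerprint -> (content identifier, control code).
ReferenceDB = Dict[int, Tuple[str, str]]


def detect_content(
    segments: Iterable[bytes],
    fingerprint: Callable[[bytes], int],
    reference_db: ReferenceDB,
    control: Callable[[str, str], None],
) -> None:
    """Run steps 201-206 for each received segment until the input ends."""
    for audio in segments:                  # step 201: receive the next segment
        h = fingerprint(audio)              # step 202: compute fingerprint H
        entry = reference_db.get(h)         # step 203: query using H as the key
        if entry is None:
            continue                        # no match: return to step 201
        content_id, control_code = entry    # steps 204-205: content detected,
                                            # additional data retrieved
        control(content_id, control_code)   # step 206: control the recorder
```

The `fingerprint` and `control` callables are passed in so that the loop itself stays independent of how fingerprints are computed and of which device is being controlled.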
[106] Fig. 3 shows a block diagram of a video recorder comprising an arrangement for generating reference audio fingerprint data items according to an embodiment of the invention. As described above, the recognition of content in a multimedia signal is based on a comparison with a number of reference audio fingerprints, e.g. of leaders and trailers of commercial blocks and/or programs of interest, available in the database 106. The video recorder 101 of fig. 3 comprises an arrangement for acquiring such information. The video recorder 101 receives input multimedia data 107, e.g. during recordings or during a separate configuration or training session. The video recorder comprises a recorder block 102 that stores a recording of a predetermined program material on a storage medium 103. The video recorder further comprises a fingerprint calculation block 104 that receives the input data and computes audio fingerprints from the audio component of the input data. The generated audio fingerprint information is written as a separate stream or file to the storage medium 103. Alternatively, the fingerprint information may be embedded in private data of the encoded multimedia stream. The video recorder further comprises a fingerprint management block 301 which retrieves previously recorded multimedia data and the corresponding fingerprints from the storage medium 103 and identifies the fingerprint information associated with leaders and trailers of commercials and other programs. The identified fingerprint information is stored in the fingerprint database 106.
[107] In one embodiment, this identification of relevant fingerprint information is performed by comparing the beginning and ending of different recordings of the same program, e.g. of different episodes of a television series or show or of different news bulletins. In one embodiment, the fingerprint management block 301 provides a user interface allowing a user to select a number of stored recordings to be used as a basis for identifying fingerprint data. The identification of key program segments providing recognizable program content which is indicative for a given program may be performed automatically, e.g. by correlating the fingerprint information of different episodes. In another embodiment, the fingerprint management block 301 may provide a user interface allowing a user to explicitly select such key fragments of a program. In yet a further embodiment, an automatic and a manual identification of key fragments are combined, e.g. by requesting a confirmation from the user on fragments identified by the video recorder. It is noted that the present invention may be combined with other automatic commercial detectors in order to identify the boundaries of the commercials and/or the boundaries of the single commercial clips to be stored in the fingerprint database.
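As a rough illustration of the automatic identification described above, the sketch below looks for fingerprints that occur near the beginning of every selected recording. It assumes each recording has already been reduced to a sequence of per-segment fingerprint values that can be compared for equality. The function name and the `window` parameter are hypothetical, and a set intersection is only one simple stand-in for "correlating" fingerprint information.

```python
from typing import Sequence, Set


def common_fingerprints(recordings: Sequence[Sequence[int]], window: int) -> Set[int]:
    """Return fingerprints found in the first `window` segments of every recording."""
    if not recordings:
        return set()
    shared = set(recordings[0][:window])
    for rec in recordings[1:]:
        # Keep only fingerprints that also occur at the start of this recording.
        shared &= set(rec[:window])
    return shared
```

Applying the same function to the reversed sequences would give candidate trailer fingerprints. The result could then be shown to the user for confirmation, as in the combined automatic and manual embodiment.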
[108] The fingerprint management block 301 further provides a user interface allowing a user to input additional data, such as a content identifier or other descriptive information about the content of the program material. As mentioned above, the additional data may further comprise control code information, e.g. an indication whether the identified content is to be recorded whenever detected, whether the identified content is to be skipped during recordings, etc.
[109] For example, in one embodiment, the method according to the invention may be used to implement a commercial skip on a commercial-by-commercial basis. Some people like to watch commercials and only tend to get bored after repeated viewings. Hence, a commercial 'thumbs down' button may be supplied allowing the user to disqualify a specific commercial that will be skipped automatically in the future.
[110] In one embodiment, the fingerprint management block 301 can compile a 'hot list' of frequently occurring audio fingerprints derived from the recorded programs that are previously stored on the storage medium 103. Using this hot list, a number of decisions are facilitated. For example, the video recorder may be controlled to automatically record such programs in the future, even programs and channels that do not have an entry associated with them in the EPG. At the same time this information as derived from the hard disk contents can be used to improve the profile of the user.
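One possible way to compile such a hot list is sketched below. It assumes each stored recording is available as a sequence of fingerprint values and counts in how many recordings each fingerprint occurs. The function name and the `min_count` threshold are illustrative assumptions.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def hot_list(recordings: Sequence[Sequence[int]], min_count: int) -> List[Tuple[int, int]]:
    """Return (fingerprint, number of recordings) pairs, most frequent first."""
    counts: Counter = Counter()
    for rec in recordings:
        # Count each fingerprint at most once per recording.
        counts.update(set(rec))
    return [(fp, n) for fp, n in counts.most_common() if n >= min_count]
```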
[111] It is noted that alternative methods of acquiring reference fingerprint data may be employed. For example, the fingerprint information may be made available by service providers, e.g. on a web-site of the Internet. Thus, the reference fingerprint information may be accessed on-line via the Internet, or the fingerprint information may be downloaded and stored in the video recorder, e.g. embedded into the EPG. Alternatively or additionally, fingerprints may be distributed on a storage medium, such as on a disc, e.g. a CD, DVD, etc.
[112] Fig. 4 schematically shows a fingerprint database module according to an embodiment of the invention. The fingerprint database module 105 comprises an input module 401, a Database Management System (DBMS) backend module 403, and a response module 404.
[113] The input module 401 receives an audio fingerprint and supplies the fingerprint to the DBMS backend module 403. The DBMS backend module 403 performs a query on the database 106 to identify any matching reference fingerprints and to retrieve any additional data associated with the matching reference fingerprint. As shown in Fig. 4, the database 106 comprises fingerprints FP1, FP2, FP3, FP4 and FP5 and respective associated sets of additional information D1, D2, D3, D4 and D5. International patent application WO 02/065782, which is included herein by reference in its entirety, describes various matching strategies for matching fingerprints computed for an audio clip with fingerprints stored in a database as well as an efficient method of matching a fingerprint representing an unknown information signal with a plurality of fingerprints of identified information signals stored in a database to identify the unknown signal. This method uses reliability information of the extracted fingerprint bits. The fingerprint bits are determined by computing features of an information signal and thresholding said features to obtain the fingerprint bits. If a feature has a value very close to the threshold, a small change in the signal may lead to a fingerprint bit with opposite value. The absolute value of the difference between feature value and threshold is used to mark each fingerprint bit as reliable or unreliable. The reliabilities are subsequently used to improve the actual matching procedure.
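The thresholding and reliability marking described above can be illustrated with the following sketch. It is not the matching procedure of WO 02/065782. The `margin` parameter, the function names, and the choice to count errors on reliable bits only are assumptions, chosen as one simple way to put the reliability information to use.

```python
from typing import List, Sequence, Tuple


def extract_bits(
    features: Sequence[float], threshold: float = 0.0, margin: float = 0.1
) -> Tuple[List[bool], List[bool]]:
    """Threshold features into fingerprint bits and mark each bit's reliability."""
    bits = [f > threshold for f in features]
    # A bit is treated as reliable when its feature lies at least `margin`
    # away from the threshold (hypothetical criterion).
    reliable = [abs(f - threshold) >= margin for f in features]
    return bits, reliable


def reliable_bit_errors(
    bits: Sequence[bool], reliable: Sequence[bool], reference: Sequence[bool]
) -> int:
    """Count mismatches against a reference fingerprint, ignoring unreliable bits."""
    return sum(1 for b, r, ref in zip(bits, reliable, reference) if r and b != ref)
```

With features `[0.8, -0.02, -0.5, 0.03]`, the second and fourth bits lie close to the threshold and are marked unreliable, so a reference that differs only in those positions still yields zero errors on the reliable bits.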
[114] The database 106 can be organized in various ways to optimize query time and/or data organization. The output from the input module 401 should be taken into account when designing the tables in the database 106. In the embodiment shown in Fig. 4, the database 106 comprises a single table with entries (records) comprising respective fingerprints and sets of additional data. The DBMS backend module 403 feeds the results of the query to the response module 404, which returns the results to a requesting application or directly generates a control signal for controlling a device, e.g. a video recorder, in response to the retrieved additional information.
[115] In one embodiment, each reference audio fingerprint data item is stored together with an associated control code, where each control code causes a video recorder to perform a specific action, such as starting a recording, stopping an ongoing recording, etc, thereby allowing a control of a video recorder based on detected content in a current video program.
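The association between control codes and recorder actions might look as follows. The code values, the `Recorder` class, and its three states are hypothetical and serve only to show how a retrieved control code could drive a recording operation.

```python
from enum import Enum


class ControlCode(Enum):
    START = "start"
    STOP = "stop"
    PAUSE = "pause"
    RESUME = "resume"


class Recorder:
    """Toy recorder that tracks only its recording state."""

    # Resulting state for each control code (illustrative mapping).
    _TRANSITIONS = {
        ControlCode.START: "recording",
        ControlCode.STOP: "idle",
        ControlCode.PAUSE: "paused",
        ControlCode.RESUME: "recording",
    }

    def __init__(self) -> None:
        self.state = "idle"

    def apply(self, code: ControlCode) -> None:
        """Perform the action associated with a retrieved control code."""
        self.state = self._TRANSITIONS[code]
```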
[116] Fig. 5 shows a system for communicating information about multimedia content according to an embodiment of the invention. The system comprises a set-top box 501 which receives a multimedia data stream 502, e.g. via a cable network, a satellite, a communications network, the Internet, or the like. The set-top box comprises a control unit 503 which feeds the multimedia data to a television set 512 for presentation to a user. The set-top box further comprises a user interface module 505 for providing a user interface allowing a user to select programs to be viewed, etc.
[117] According to the invention, the set-top box 501 further comprises a fingerprint calculation module 504 which receives the input multimedia data stream 502 and computes audio fingerprint information from the audio data of the received input. As soon as an item of interest is shown in the program, the user can, via the user interface 505, initiate the transmission of the computed fingerprint data for a selected program fragment to a service provider or other third party, thereby indicating his/her interest in the displayed item.
[118] Consequently, according to this embodiment, the set-top box 501 comprises a communications interface 506, e.g. a modem, a network adapter, or the like, for transmitting the computed and selected fingerprint data to a service provider system 507, e.g. a network server or other data processing system. Additionally, further information may be transmitted, such as the identification of the sending set-top box, an indication of the type of interest, e.g. a request for further information, a purchase order, etc.
[119] The service provider system 507 comprises a corresponding communications interface 508 for receiving the fingerprint data along with any additional information transmitted by the set-top box 501. The service provider system further comprises a fingerprint database module 509 and a fingerprint database 510 as described above. The fingerprint database module 509 compares the received fingerprint data item with reference fingerprints in the database 510. If a matching reference fingerprint is found, the provider system 507 initiates a suitable transaction in response to the recognized item viewed by the user of the set-top box. For example, the provider system may send additional information, initiate a purchase transaction, send a control signal back to the set-top box, thereby causing the set-top box to display a suitable menu, or the like.
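The exchange between the set-top box 501 and the service provider system 507 could be modelled along the following lines. The message fields, the action names, and the exact-match lookup are assumptions made for illustration. The description does not prescribe a message format.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class InterestRequest:
    box_id: str        # identification of the sending set-top box
    fingerprint: int   # fingerprint of the selected program fragment
    interest: str      # type of interest, e.g. "info" or "purchase"


@dataclass(frozen=True)
class ProviderResponse:
    box_id: str
    content_id: str
    action: str


def handle_request(
    req: InterestRequest, reference_db: Dict[int, str]
) -> Optional[ProviderResponse]:
    """Look up the received fingerprint and choose a transaction to initiate."""
    content_id = reference_db.get(req.fingerprint)
    if content_id is None:
        return None  # no matching reference fingerprint
    action = "start_purchase" if req.interest == "purchase" else "send_info"
    return ProviderResponse(req.box_id, content_id, action)
```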
[120] It is an advantage of this embodiment that third-party product vendors or service providers can offer an extra service to the end-user of the set-top box 501 without having to establish a costly infrastructure. This extra service is based on the recognition of content being broadcast by the broadcaster during regular broadcasts on existing channels without the broadcaster having to include any special metadata in the broadcast. In fact, the broadcaster is not involved in the resulting e-commerce transaction, which is limited to the end-user and the service provider. The content is recognized by the service provider, e.g. set-top-box provider, based on audio fragments, thereby simplifying the required infrastructure. In particular, since no additional data needs to be sent with the multimedia stream 502, no hardware is needed at the broadcaster of the multimedia data stream for creating such data and inserting it into the multimedia stream.
[121] Based on the content the user is watching, the service provider may cause e.g. an e-commerce application to be launched by the set-top box, thereby providing the user with the option of buying an item that is featured in a commercial or other broadcast, or of engaging in some other e-commerce transaction.
[122] It is noted that, alternatively, the above functionality may be implemented by a video recorder or a television set instead of a set-top box.
[123] It is noted that the above arrangements may be implemented as general- or special-purpose programmable microprocessors, Digital Signal Processors (DSP), Application Specific Integrated Circuits (ASIC), Programmable Logic Arrays (PLA), Field Programmable Gate Arrays (FPGA), special purpose electronic circuits, etc., or a combination thereof.
[124] It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims.
[125] For example, the invention is not limited to video recorders but may also be implemented in other devices or systems for processing multimedia data, such as set-top boxes, television sets, multimedia data viewers implemented in software or hardware, or the like.
[126] Furthermore, the invention is not limited to commercials but can easily be applied to other program material to be recorded and/or skipped from recording, in particular program material comprising identifiable fragments, such as leaders and trailers. Examples of such program material comprise inserted news programs, weather forecasts, episodes of television shows, etc.
[127] In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps other than those listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements.
[128] The invention can be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the device claim enumerating several means, several of these means can be embodied by one and the same item of hardware. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
[129] Hence, in the above, methods and systems for the detection of content in a multimedia signal have been disclosed that significantly improve the commercial detection, programming accuracy and overall functionality of PVRs. These advantages may be achieved even in the absence of EPGs, using analog broadcasts and/or on conventional devices like Video Cassette Recorders (VCRs). Furthermore, it should be noted that the above applications of content detection may also be applied to audio broadcasts, e.g. radio broadcasts, since the content detection is done on the basis of the audio signal.

Claims

[1] A method of detecting content in a multimedia signal, the method comprising: - providing a multimedia signal comprising video data and corresponding audio data; - determining an audio fingerprint data item from a predetermined part of the audio data; - comparing the determined audio fingerprint data item with at least one of a number of reference audio fingerprint data items each related to a corresponding content element; and - if at least a first one of the number of reference audio fingerprint data items is identified to correspond to the determined audio fingerprint data item, identifying the corresponding related content element as detected.
[2] A method according to claim 1, further comprising: storing each of the number of reference audio fingerprint data items in relation to a corresponding video content identifier; and - if at least a first one of the number of reference audio fingerprint data items is identified to correspond to the determined audio fingerprint data item, retrieving the video content identifier corresponding to the identified first reference audio fingerprint data item.
[3] A method according to claim 1 or 2, further comprising - recording a video program resulting in a multimedia signal; - identifying at least a predetermined part of the video program corresponding to a predetermined content element; and - generating at least one audio fingerprint data item corresponding to the predetermined part of the video program.
[4] A method according to claim 3, further comprising - comparing the generated audio fingerprint data item with at least one previously generated audio fingerprint data item to generate viewing frequency information indicative of a number of times a corresponding content element has previously been presented to a user; and - storing the generated audio fingerprint data item in relation to the generated viewing frequency information.
[5] A method according to any one of claims 1 through 4, wherein the predetermined part of the provided multimedia signal corresponds to at least one of a leader and a trailer of a predetermined video program.
[6] A method according to any one of claims 1 through 5, wherein the step of determining an audio fingerprint data item comprises calculating a robust hash value from a predetermined part of the audio content represented by the provided multimedia signal.
[7] A method according to any one of claims 1 through 6, wherein the method further comprises controlling a video recording device in response to the retrieved video content identifier.
[8] A method according to any one of claims 1 through 7, wherein the method further comprises communicating information about the detected content element to a remote data processing system.
[9] A method of recording a video program by a video recording device, the method comprising: - detecting a content element in a multimedia signal corresponding to a predetermined part of the video program by performing the steps of the method according to any one of claims 1 through 6; - controlling a recording operation of the video recording device in response to the detected content element.
[10] A method of communicating information about a content element of a video program, the method comprising: - receiving an audio fingerprint data item generated by a device for presenting multimedia content, the audio fingerprint data item representing a predetermined content element of the presented multimedia content; - providing a number of reference audio fingerprint data items each related to a corresponding content element; - comparing the received audio fingerprint data item with at least one of the number of reference audio fingerprint data items; and - if at least a first one of the number of reference audio fingerprint data items is identified to correspond to the determined audio fingerprint data item, identifying the corresponding related content element as detected.
[11] An arrangement for detecting content in a multimedia signal, the arrangement comprising: means for providing a multimedia signal, the multimedia signal comprising video data and corresponding audio data; processing means for determining an audio fingerprint data item from a predetermined part of the audio data; processing means adapted to compare the determined audio fingerprint data item with at least one of a number of reference audio fingerprint data items each related to a corresponding content element and, if at least a first one of the number of reference audio fingerprint data items is identified to correspond to the determined audio fingerprint data item, to identify the corresponding related content element as detected.
[12] A recorder for recording video program material, the recorder comprising an arrangement according to claim 10.
[13] A system for communicating information about a content element of a video program, the system comprising: a device for presenting a multimedia signal, the multimedia signal comprising video data and corresponding audio data, the device for presenting a multimedia signal comprising - processing means for determining an audio fingerprint data item from a predetermined part of the presented audio data; - communications means for transmitting the determined audio fingerprint data item; a data processing system comprising - communications means for receiving the transmitted audio fingerprint data item; - storage means having stored thereon a number of reference audio fingerprint data items each related to a corresponding content element; - processing means adapted to compare the received audio fingerprint data item with at least one of the number of reference audio fingerprint data items, and, if at least a first one of the number of reference audio fingerprint data items is identified to correspond to the determined audio fingerprint data item, to identify the corresponding related content element as detected.
PCT/IB2003/050031 2002-12-20 2003-12-03 Video content detection WO2005041455A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2003283783A AU2003283783A1 (en) 2002-12-20 2003-12-03 Video content detection

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP02080528.9 2002-12-20
EP02080528 2002-12-20

Publications (1)

Publication Number Publication Date
WO2005041455A1 (en) 2005-05-06

Family

ID=34486046

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2003/050031 WO2005041455A1 (en) 2002-12-20 2003-12-03 Video content detection

Country Status (2)

Country Link
AU (1) AU2003283783A1 (en)
WO (1) WO2005041455A1 (en)

Cited By (132)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006003490A1 (en) * 2004-06-30 2006-01-12 Nokia Corporation Multiple services within a channel-identification in a device
EP1855481A2 (en) * 2006-05-10 2007-11-14 Lee S. Weinblatt System and method for providing incentive rewards to an audience tuned to a broadcast signal
US7979464B2 (en) 2007-02-27 2011-07-12 Motion Picture Laboratories, Inc. Associating rights to multimedia content
WO2013095893A1 (en) * 2011-12-20 2013-06-27 Yahoo! Inc. Audio fingerprint for content identification
WO2014138632A1 (en) * 2013-03-07 2014-09-12 Google Inc. Personal video recorder with limited attached local storage
US8892446B2 (en) 2010-01-18 2014-11-18 Apple Inc. Service orchestration for intelligent automated assistant
US9262612B2 (en) 2011-03-21 2016-02-16 Apple Inc. Device access using voice authentication
US9300784B2 (en) 2013-06-13 2016-03-29 Apple Inc. System and method for emergency calls initiated by voice command
US9330720B2 (en) 2008-01-03 2016-05-03 Apple Inc. Methods and apparatus for altering audio output signals
US9338493B2 (en) 2014-06-30 2016-05-10 Apple Inc. Intelligent automated assistant for TV user interactions
US9368114B2 (en) 2013-03-14 2016-06-14 Apple Inc. Context-sensitive handling of interruptions
US9430463B2 (en) 2014-05-30 2016-08-30 Apple Inc. Exemplar-based natural language processing
US9483461B2 (en) 2012-03-06 2016-11-01 Apple Inc. Handling speech synthesis of content for multiple languages
US9495129B2 (en) 2012-06-29 2016-11-15 Apple Inc. Device, method, and user interface for voice-activated navigation and browsing of a document
US9502031B2 (en) 2014-05-27 2016-11-22 Apple Inc. Method for supporting dynamic grammars in WFST-based ASR
US9535906B2 (en) 2008-07-31 2017-01-03 Apple Inc. Mobile device having human language translation capability with positional feedback
US9560425B2 (en) 2008-11-26 2017-01-31 Free Stream Media Corp. Remotely control devices over a network without authentication or registration
US9576574B2 (en) 2012-09-10 2017-02-21 Apple Inc. Context-sensitive handling of interruptions by intelligent digital assistant
US9582608B2 (en) 2013-06-07 2017-02-28 Apple Inc. Unified ranking with entropy-weighted information for phrase-based semantic auto-completion
US9606986B2 (en) 2014-09-29 2017-03-28 Apple Inc. Integrated word N-gram and class M-gram language models
US9620104B2 (en) 2013-06-07 2017-04-11 Apple Inc. System and method for user-specified pronunciation of words for speech synthesis and recognition
US9620105B2 (en) 2014-05-15 2017-04-11 Apple Inc. Analyzing audio input for efficient speech and music recognition
US9626955B2 (en) 2008-04-05 2017-04-18 Apple Inc. Intelligent text-to-speech conversion
US9633004B2 (en) 2014-05-30 2017-04-25 Apple Inc. Better resolution when referencing to concepts
US9633674B2 (en) 2013-06-07 2017-04-25 Apple Inc. System and method for detecting errors in interactions with a voice-based digital assistant
US9633660B2 (en) 2010-02-25 2017-04-25 Apple Inc. User profiling for voice input processing
US9646609B2 (en) 2014-09-30 2017-05-09 Apple Inc. Caching apparatus for serving phonetic pronunciations
US9646614B2 (en) 2000-03-16 2017-05-09 Apple Inc. Fast, language-independent method for user authentication by voice
US9668121B2 (en) 2014-09-30 2017-05-30 Apple Inc. Social reminders
US9697820B2 (en) 2015-09-24 2017-07-04 Apple Inc. Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks
US9697822B1 (en) 2013-03-15 2017-07-04 Apple Inc. System and method for updating an adaptive speech recognition model
US9703947B2 (en) 2008-11-26 2017-07-11 Free Stream Media Corp. Relevancy improvement through targeting of information based on data gathered from a networked device associated with a security sandbox of a client device
US9711141B2 (en) 2014-12-09 2017-07-18 Apple Inc. Disambiguating heteronyms in speech synthesis
US9715875B2 (en) 2014-05-30 2017-07-25 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US9716736B2 (en) 2008-11-26 2017-07-25 Free Stream Media Corp. System and method of discovery and launch associated with a networked media device
US9721566B2 (en) 2015-03-08 2017-08-01 Apple Inc. Competing devices responding to voice triggers
US9734193B2 (en) 2014-05-30 2017-08-15 Apple Inc. Determining domain salience ranking from ambiguous words in natural speech
US9760559B2 (en) 2014-05-30 2017-09-12 Apple Inc. Predictive text input
US9785630B2 (en) 2014-05-30 2017-10-10 Apple Inc. Text prediction using combined word N-gram and unigram language models
US9798393B2 (en) 2011-08-29 2017-10-24 Apple Inc. Text correction processing
US9818400B2 (en) 2014-09-11 2017-11-14 Apple Inc. Method and apparatus for discovering trending terms in speech requests
US9842101B2 (en) 2014-05-30 2017-12-12 Apple Inc. Predictive conversion of language input
US9842105B2 (en) 2015-04-16 2017-12-12 Apple Inc. Parsimonious continuous-space phrase representations for natural language processing
US9858925B2 (en) 2009-06-05 2018-01-02 Apple Inc. Using context information to facilitate processing of commands in a virtual assistant
US9865280B2 (en) 2015-03-06 2018-01-09 Apple Inc. Structured dictation using intelligent automated assistants
US9886953B2 (en) 2015-03-08 2018-02-06 Apple Inc. Virtual assistant activation
US9886432B2 (en) 2014-09-30 2018-02-06 Apple Inc. Parsimonious handling of word inflection via categorical stem + suffix N-gram language models
US9899019B2 (en) 2015-03-18 2018-02-20 Apple Inc. Systems and methods for structured stem and suffix language models
US9922642B2 (en) 2013-03-15 2018-03-20 Apple Inc. Training an at least partial voice command system
US9934775B2 (en) 2016-05-26 2018-04-03 Apple Inc. Unit-selection text-to-speech synthesis based on predicted concatenation parameters
US9953088B2 (en) 2012-05-14 2018-04-24 Apple Inc. Crowd sourcing information to fulfill user requests
US9961388B2 (en) 2008-11-26 2018-05-01 David Harrison Exposure of public internet protocol addresses in an advertising exchange server to improve relevancy of advertisements
US9959870B2 (en) 2008-12-11 2018-05-01 Apple Inc. Speech recognition involving a mobile device
US9966065B2 (en) 2014-05-30 2018-05-08 Apple Inc. Multi-command single utterance input method
US9966068B2 (en) 2013-06-08 2018-05-08 Apple Inc. Interpreting and acting upon commands that involve sharing information with remote devices
US9972304B2 (en) 2016-06-03 2018-05-15 Apple Inc. Privacy preserving distributed evaluation framework for embedded personalized systems
US9971774B2 (en) 2012-09-19 2018-05-15 Apple Inc. Voice-based media searching
US9986279B2 (en) 2008-11-26 2018-05-29 Free Stream Media Corp. Discovery, access control, and communication with networked services
US10043516B2 (en) 2016-09-23 2018-08-07 Apple Inc. Intelligent automated assistant
US10049668B2 (en) 2015-12-02 2018-08-14 Apple Inc. Applying neural network language models to weighted finite state transducers for automatic speech recognition
US10049663B2 (en) 2016-06-08 2018-08-14 Apple Inc. Intelligent automated assistant for media exploration
US10057736B2 (en) 2011-06-03 2018-08-21 Apple Inc. Active transport based notifications
US10067938B2 (en) 2016-06-10 2018-09-04 Apple Inc. Multilingual word prediction
US10074360B2 (en) 2014-09-30 2018-09-11 Apple Inc. Providing an indication of the suitability of speech recognition
US10079014B2 (en) 2012-06-08 2018-09-18 Apple Inc. Name recognition system
US10078631B2 (en) 2014-05-30 2018-09-18 Apple Inc. Entropy-guided text prediction using combined word and character n-gram language models
US10083688B2 (en) 2015-05-27 2018-09-25 Apple Inc. Device voice control for selecting a displayed affordance
US10089072B2 (en) 2016-06-11 2018-10-02 Apple Inc. Intelligent device arbitration and control
US10101822B2 (en) 2015-06-05 2018-10-16 Apple Inc. Language input correction
US10127911B2 (en) 2014-09-30 2018-11-13 Apple Inc. Speaker identification and unsupervised speaker adaptation techniques
US10127220B2 (en) 2015-06-04 2018-11-13 Apple Inc. Language identification from short strings
US10134385B2 (en) 2012-03-02 2018-11-20 Apple Inc. Systems and methods for name pronunciation
US10170123B2 (en) 2014-05-30 2019-01-01 Apple Inc. Intelligent assistant for home automation
US10176167B2 (en) 2013-06-09 2019-01-08 Apple Inc. System and method for inferring user intent from speech inputs
US10186254B2 (en) 2015-06-07 2019-01-22 Apple Inc. Context-based endpoint detection
US10185542B2 (en) 2013-06-09 2019-01-22 Apple Inc. Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant
WO2019018164A1 (en) * 2017-07-19 2019-01-24 Netflix, Inc. Identifying previously streamed portions of a media title to avoid repetitive playback
US10192552B2 (en) 2016-06-10 2019-01-29 Apple Inc. Digital assistant providing whispered speech
US10199051B2 (en) 2013-02-07 2019-02-05 Apple Inc. Voice trigger for a digital assistant
US10223066B2 (en) 2015-12-23 2019-03-05 Apple Inc. Proactive assistance based on dialog communication between devices
US10241644B2 (en) 2011-06-03 2019-03-26 Apple Inc. Actionable reminder entries
US10241752B2 (en) 2011-09-30 2019-03-26 Apple Inc. Interface for a virtual digital assistant
US10249300B2 (en) 2016-06-06 2019-04-02 Apple Inc. Intelligent list reading
US10255907B2 (en) 2015-06-07 2019-04-09 Apple Inc. Automatic accent detection using acoustic models
US10269345B2 (en) 2016-06-11 2019-04-23 Apple Inc. Intelligent task discovery
US10276170B2 (en) 2010-01-18 2019-04-30 Apple Inc. Intelligent automated assistant
US10283110B2 (en) 2009-07-02 2019-05-07 Apple Inc. Methods and apparatuses for automatic speech recognition
US10289433B2 (en) 2014-05-30 2019-05-14 Apple Inc. Domain specific language for encoding assistant dialog
US10297253B2 (en) 2016-06-11 2019-05-21 Apple Inc. Application integration with a digital assistant
US10318871B2 (en) 2005-09-08 2019-06-11 Apple Inc. Method and apparatus for building an intelligent automated assistant
US10334324B2 (en) 2008-11-26 2019-06-25 Free Stream Media Corp. Relevant advertisement generation based on a user operating a client device communicatively coupled with a networked media device
US10356243B2 (en) 2015-06-05 2019-07-16 Apple Inc. Virtual assistant aided communication with 3rd party service in a communication session
US10354011B2 (en) 2016-06-09 2019-07-16 Apple Inc. Intelligent automated assistant in a home environment
US10366158B2 (en) 2015-09-29 2019-07-30 Apple Inc. Efficient word encoding for recurrent neural network language models
US10410637B2 (en) 2017-05-12 2019-09-10 Apple Inc. User-specific acoustic models
US10419541B2 (en) 2008-11-26 2019-09-17 Free Stream Media Corp. Remotely control devices over a network without authentication or registration
US10446141B2 (en) 2014-08-28 2019-10-15 Apple Inc. Automatic speech recognition based on user feedback
US10446143B2 (en) 2016-03-14 2019-10-15 Apple Inc. Identification of voice inputs providing credentials
US10482874B2 (en) 2017-05-15 2019-11-19 Apple Inc. Hierarchical belief states for digital assistants
US10490187B2 (en) 2016-06-10 2019-11-26 Apple Inc. Digital assistant providing automated status report
US10496753B2 (en) 2010-01-18 2019-12-03 Apple Inc. Automatically adapting user interfaces for hands-free interaction
US10509862B2 (en) 2016-06-10 2019-12-17 Apple Inc. Dynamic phrase expansion of language input
US10521466B2 (en) 2016-06-11 2019-12-31 Apple Inc. Data driven natural language event detection and classification
US10552013B2 (en) 2014-12-02 2020-02-04 Apple Inc. Data detection
US10553209B2 (en) 2010-01-18 2020-02-04 Apple Inc. Systems and methods for hands-free notification summaries
US10567823B2 (en) 2008-11-26 2020-02-18 Free Stream Media Corp. Relevant advertisement generation based on a user operating a client device communicatively coupled with a networked media device
US10568032B2 (en) 2007-04-03 2020-02-18 Apple Inc. Method and system for operating a multi-function portable electronic device using voice-activation
US10567477B2 (en) 2015-03-08 2020-02-18 Apple Inc. Virtual assistant continuity
US10592095B2 (en) 2014-05-23 2020-03-17 Apple Inc. Instantaneous speaking of content on touch devices
US10593346B2 (en) 2016-12-22 2020-03-17 Apple Inc. Rank-reduced token representation for automatic speech recognition
US10607141B2 (en) 2010-01-25 2020-03-31 Newvaluexchange Ltd. Apparatuses, methods and systems for a digital conversation management platform
US10631068B2 (en) 2008-11-26 2020-04-21 Free Stream Media Corp. Content exposure attribution based on renderings of related content across multiple devices
US10659851B2 (en) 2014-06-30 2020-05-19 Apple Inc. Real-time digital assistant knowledge updates
US10671428B2 (en) 2015-09-08 2020-06-02 Apple Inc. Distributed personal assistant
US10679605B2 (en) 2010-01-18 2020-06-09 Apple Inc. Hands-free list-reading by intelligent automated assistant
US10691473B2 (en) 2015-11-06 2020-06-23 Apple Inc. Intelligent automated assistant in a messaging environment
US10705794B2 (en) 2010-01-18 2020-07-07 Apple Inc. Automatically adapting user interfaces for hands-free interaction
US10706373B2 (en) 2011-06-03 2020-07-07 Apple Inc. Performing actions associated with task items that represent tasks to perform
US10733993B2 (en) 2016-06-10 2020-08-04 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US10747498B2 (en) 2015-09-08 2020-08-18 Apple Inc. Zero latency digital assistant
US10755703B2 (en) 2017-05-11 2020-08-25 Apple Inc. Offline personal assistant
US10762293B2 (en) 2010-12-22 2020-09-01 Apple Inc. Using parts-of-speech tagging and named entity recognition for spelling correction
US10791176B2 (en) 2017-05-12 2020-09-29 Apple Inc. Synchronization and task delegation of a digital assistant
US10791216B2 (en) 2013-08-06 2020-09-29 Apple Inc. Auto-activating smart responses based on activities from remote devices
US10789041B2 (en) 2014-09-12 2020-09-29 Apple Inc. Dynamic thresholds for always listening speech trigger
US10810274B2 (en) 2017-05-15 2020-10-20 Apple Inc. Optimizing dialogue policy decisions for digital assistants using implicit feedback
US10880340B2 (en) 2008-11-26 2020-12-29 Free Stream Media Corp. Relevancy improvement through targeting of information based on data gathered from a networked device associated with a security sandbox of a client device
US10977693B2 (en) 2008-11-26 2021-04-13 Free Stream Media Corp. Association of content identifier of audio-visual data with additional data through capture infrastructure
US11010550B2 (en) 2015-09-29 2021-05-18 Apple Inc. Unified language modeling framework for word prediction, auto-completion and auto-correction
US11025565B2 (en) 2015-06-07 2021-06-01 Apple Inc. Personalized prediction of responses for instant messaging
US11217255B2 (en) 2017-05-16 2022-01-04 Apple Inc. Far-field extension for digital assistant services
US11587559B2 (en) 2015-09-30 2023-02-21 Apple Inc. Intelligent device identification

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5612729A (en) * 1992-04-30 1997-03-18 The Arbitron Company Method and system for producing a signature characterizing an audio broadcast signal
GB2375907A (en) * 2001-05-14 2002-11-27 British Broadcasting Corp An automated recognition system

Cited By (202)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9646614B2 (en) 2000-03-16 2017-05-09 Apple Inc. Fast, language-independent method for user authentication by voice
KR100841710B1 (en) 2004-06-30 2008-07-02 Nokia Corporation Multiple services within a channel-identification in a device
WO2006003490A1 (en) * 2004-06-30 2006-01-12 Nokia Corporation Multiple services within a channel-identification in a device
US10318871B2 (en) 2005-09-08 2019-06-11 Apple Inc. Method and apparatus for building an intelligent automated assistant
EP1855481A2 (en) * 2006-05-10 2007-11-14 Lee S. Weinblatt System and method for providing incentive rewards to an audience tuned to a broadcast signal
EP1855481A3 (en) * 2006-05-10 2010-08-04 Lee S. Weinblatt System and method for providing incentive rewards to an audience tuned to a broadcast signal
US9554092B2 (en) 2006-05-10 2017-01-24 Winmore, Inc. System and method for providing incentive rewards to an audience tuned to a broadcast signal
US9117447B2 (en) 2006-09-08 2015-08-25 Apple Inc. Using event alert text as input to an automated assistant
US8930191B2 (en) 2006-09-08 2015-01-06 Apple Inc. Paraphrasing of user requests and results by automated digital assistant
US8942986B2 (en) 2006-09-08 2015-01-27 Apple Inc. Determining user intent based on ontologies of domains
US7979464B2 (en) 2007-02-27 2011-07-12 Motion Picture Laboratories, Inc. Associating rights to multimedia content
US10568032B2 (en) 2007-04-03 2020-02-18 Apple Inc. Method and system for operating a multi-function portable electronic device using voice-activation
US10381016B2 (en) 2008-01-03 2019-08-13 Apple Inc. Methods and apparatus for altering audio output signals
US9330720B2 (en) 2008-01-03 2016-05-03 Apple Inc. Methods and apparatus for altering audio output signals
US9626955B2 (en) 2008-04-05 2017-04-18 Apple Inc. Intelligent text-to-speech conversion
US9865248B2 (en) 2008-04-05 2018-01-09 Apple Inc. Intelligent text-to-speech conversion
US9535906B2 (en) 2008-07-31 2017-01-03 Apple Inc. Mobile device having human language translation capability with positional feedback
US10108612B2 (en) 2008-07-31 2018-10-23 Apple Inc. Mobile device having human language translation capability with positional feedback
US9961388B2 (en) 2008-11-26 2018-05-01 David Harrison Exposure of public internet protocol addresses in an advertising exchange server to improve relevancy of advertisements
US10567823B2 (en) 2008-11-26 2020-02-18 Free Stream Media Corp. Relevant advertisement generation based on a user operating a client device communicatively coupled with a networked media device
US10142377B2 (en) 2008-11-26 2018-11-27 Free Stream Media Corp. Relevancy improvement through targeting of information based on data gathered from a networked device associated with a security sandbox of a client device
US10986141B2 (en) 2008-11-26 2021-04-20 Free Stream Media Corp. Relevancy improvement through targeting of information based on data gathered from a networked device associated with a security sandbox of a client device
US10977693B2 (en) 2008-11-26 2021-04-13 Free Stream Media Corp. Association of content identifier of audio-visual data with additional data through capture infrastructure
US9854330B2 (en) 2008-11-26 2017-12-26 David Harrison Relevancy improvement through targeting of information based on data gathered from a networked device associated with a security sandbox of a client device
US9848250B2 (en) 2008-11-26 2017-12-19 Free Stream Media Corp. Relevancy improvement through targeting of information based on data gathered from a networked device associated with a security sandbox of a client device
US9716736B2 (en) 2008-11-26 2017-07-25 Free Stream Media Corp. System and method of discovery and launch associated with a networked media device
US9560425B2 (en) 2008-11-26 2017-01-31 Free Stream Media Corp. Remotely control devices over a network without authentication or registration
US10880340B2 (en) 2008-11-26 2020-12-29 Free Stream Media Corp. Relevancy improvement through targeting of information based on data gathered from a networked device associated with a security sandbox of a client device
US9866925B2 (en) 2008-11-26 2018-01-09 Free Stream Media Corp. Relevancy improvement through targeting of information based on data gathered from a networked device associated with a security sandbox of a client device
US9591381B2 (en) 2008-11-26 2017-03-07 Free Stream Media Corp. Automated discovery and launch of an application on a network enabled device
US10791152B2 (en) 2008-11-26 2020-09-29 Free Stream Media Corp. Automatic communications between networked devices such as televisions and mobile devices
US10771525B2 (en) 2008-11-26 2020-09-08 Free Stream Media Corp. System and method of discovery and launch associated with a networked media device
US10425675B2 (en) 2008-11-26 2019-09-24 Free Stream Media Corp. Discovery, access control, and communication with networked services
US10032191B2 (en) 2008-11-26 2018-07-24 Free Stream Media Corp. Advertisement targeting through embedded scripts in supply-side and demand-side platforms
US9703947B2 (en) 2008-11-26 2017-07-11 Free Stream Media Corp. Relevancy improvement through targeting of information based on data gathered from a networked device associated with a security sandbox of a client device
US9706265B2 (en) 2008-11-26 2017-07-11 Free Stream Media Corp. Automatic communications between networked devices such as televisions and mobile devices
US10334324B2 (en) 2008-11-26 2019-06-25 Free Stream Media Corp. Relevant advertisement generation based on a user operating a client device communicatively coupled with a networked media device
US9986279B2 (en) 2008-11-26 2018-05-29 Free Stream Media Corp. Discovery, access control, and communication with networked services
US10074108B2 (en) 2008-11-26 2018-09-11 Free Stream Media Corp. Annotation of metadata through capture infrastructure
US9838758B2 (en) 2008-11-26 2017-12-05 David Harrison Relevancy improvement through targeting of information based on data gathered from a networked device associated with a security sandbox of a client device
US10631068B2 (en) 2008-11-26 2020-04-21 Free Stream Media Corp. Content exposure attribution based on renderings of related content across multiple devices
US9686596B2 (en) 2008-11-26 2017-06-20 Free Stream Media Corp. Advertisement targeting through embedded scripts in supply-side and demand-side platforms
US9967295B2 (en) 2008-11-26 2018-05-08 David Harrison Automated discovery and launch of an application on a network enabled device
US10419541B2 (en) 2008-11-26 2019-09-17 Free Stream Media Corp. Remotely control devices over a network without authentication or registration
US9959870B2 (en) 2008-12-11 2018-05-01 Apple Inc. Speech recognition involving a mobile device
US9858925B2 (en) 2009-06-05 2018-01-02 Apple Inc. Using context information to facilitate processing of commands in a virtual assistant
US10795541B2 (en) 2009-06-05 2020-10-06 Apple Inc. Intelligent organization of tasks items
US11080012B2 (en) 2009-06-05 2021-08-03 Apple Inc. Interface for a virtual digital assistant
US10475446B2 (en) 2009-06-05 2019-11-12 Apple Inc. Using context information to facilitate processing of commands in a virtual assistant
US10283110B2 (en) 2009-07-02 2019-05-07 Apple Inc. Methods and apparatuses for automatic speech recognition
US10706841B2 (en) 2010-01-18 2020-07-07 Apple Inc. Task flow identification based on user intent
US10276170B2 (en) 2010-01-18 2019-04-30 Apple Inc. Intelligent automated assistant
US10553209B2 (en) 2010-01-18 2020-02-04 Apple Inc. Systems and methods for hands-free notification summaries
US11423886B2 (en) 2010-01-18 2022-08-23 Apple Inc. Task flow identification based on user intent
US8892446B2 (en) 2010-01-18 2014-11-18 Apple Inc. Service orchestration for intelligent automated assistant
US8903716B2 (en) 2010-01-18 2014-12-02 Apple Inc. Personalized vocabulary for digital assistant
US10679605B2 (en) 2010-01-18 2020-06-09 Apple Inc. Hands-free list-reading by intelligent automated assistant
US9318108B2 (en) 2010-01-18 2016-04-19 Apple Inc. Intelligent automated assistant
US10705794B2 (en) 2010-01-18 2020-07-07 Apple Inc. Automatically adapting user interfaces for hands-free interaction
US10496753B2 (en) 2010-01-18 2019-12-03 Apple Inc. Automatically adapting user interfaces for hands-free interaction
US9548050B2 (en) 2010-01-18 2017-01-17 Apple Inc. Intelligent automated assistant
US11410053B2 (en) 2010-01-25 2022-08-09 Newvaluexchange Ltd. Apparatuses, methods and systems for a digital conversation management platform
US10984326B2 (en) 2010-01-25 2021-04-20 Newvaluexchange Ltd. Apparatuses, methods and systems for a digital conversation management platform
US10984327B2 (en) 2010-01-25 2021-04-20 Newvaluexchange Ltd. Apparatuses, methods and systems for a digital conversation management platform
US10607141B2 (en) 2010-01-25 2020-03-31 Newvaluexchange Ltd. Apparatuses, methods and systems for a digital conversation management platform
US10607140B2 (en) 2010-01-25 2020-03-31 Newvaluexchange Ltd. Apparatuses, methods and systems for a digital conversation management platform
US10049675B2 (en) 2010-02-25 2018-08-14 Apple Inc. User profiling for voice input processing
US9633660B2 (en) 2010-02-25 2017-04-25 Apple Inc. User profiling for voice input processing
US10762293B2 (en) 2010-12-22 2020-09-01 Apple Inc. Using parts-of-speech tagging and named entity recognition for spelling correction
US10102359B2 (en) 2011-03-21 2018-10-16 Apple Inc. Device access using voice authentication
US9262612B2 (en) 2011-03-21 2016-02-16 Apple Inc. Device access using voice authentication
US10241644B2 (en) 2011-06-03 2019-03-26 Apple Inc. Actionable reminder entries
US10706373B2 (en) 2011-06-03 2020-07-07 Apple Inc. Performing actions associated with task items that represent tasks to perform
US10057736B2 (en) 2011-06-03 2018-08-21 Apple Inc. Active transport based notifications
US11120372B2 (en) 2011-06-03 2021-09-14 Apple Inc. Performing actions associated with task items that represent tasks to perform
US9798393B2 (en) 2011-08-29 2017-10-24 Apple Inc. Text correction processing
US10241752B2 (en) 2011-09-30 2019-03-26 Apple Inc. Interface for a virtual digital assistant
US8949872B2 (en) 2011-12-20 2015-02-03 Yahoo! Inc. Audio fingerprint for content identification
WO2013095893A1 (en) * 2011-12-20 2013-06-27 Yahoo! Inc. Audio fingerprint for content identification
US10134385B2 (en) 2012-03-02 2018-11-20 Apple Inc. Systems and methods for name pronunciation
US9483461B2 (en) 2012-03-06 2016-11-01 Apple Inc. Handling speech synthesis of content for multiple languages
US9953088B2 (en) 2012-05-14 2018-04-24 Apple Inc. Crowd sourcing information to fulfill user requests
US10079014B2 (en) 2012-06-08 2018-09-18 Apple Inc. Name recognition system
US9495129B2 (en) 2012-06-29 2016-11-15 Apple Inc. Device, method, and user interface for voice-activated navigation and browsing of a document
US9576574B2 (en) 2012-09-10 2017-02-21 Apple Inc. Context-sensitive handling of interruptions by intelligent digital assistant
US9971774B2 (en) 2012-09-19 2018-05-15 Apple Inc. Voice-based media searching
US10199051B2 (en) 2013-02-07 2019-02-05 Apple Inc. Voice trigger for a digital assistant
US10978090B2 (en) 2013-02-07 2021-04-13 Apple Inc. Voice trigger for a digital assistant
US11522930B2 (en) 2013-03-07 2022-12-06 Google Llc Personal video recorder with limited attached local storage
US10735243B2 (en) 2013-03-07 2020-08-04 Google Llc Personal video recorder with limited attached local storage
US9819531B2 (en) 2013-03-07 2017-11-14 Google Inc. Personal video recorder with limited attached local storage
WO2014138632A1 (en) * 2013-03-07 2014-09-12 Google Inc. Personal video recorder with limited attached local storage
US9368114B2 (en) 2013-03-14 2016-06-14 Apple Inc. Context-sensitive handling of interruptions
US9922642B2 (en) 2013-03-15 2018-03-20 Apple Inc. Training an at least partial voice command system
US9697822B1 (en) 2013-03-15 2017-07-04 Apple Inc. System and method for updating an adaptive speech recognition model
US9633674B2 (en) 2013-06-07 2017-04-25 Apple Inc. System and method for detecting errors in interactions with a voice-based digital assistant
US9966060B2 (en) 2013-06-07 2018-05-08 Apple Inc. System and method for user-specified pronunciation of words for speech synthesis and recognition
US9620104B2 (en) 2013-06-07 2017-04-11 Apple Inc. System and method for user-specified pronunciation of words for speech synthesis and recognition
US9582608B2 (en) 2013-06-07 2017-02-28 Apple Inc. Unified ranking with entropy-weighted information for phrase-based semantic auto-completion
US10657961B2 (en) 2013-06-08 2020-05-19 Apple Inc. Interpreting and acting upon commands that involve sharing information with remote devices
US9966068B2 (en) 2013-06-08 2018-05-08 Apple Inc. Interpreting and acting upon commands that involve sharing information with remote devices
US10176167B2 (en) 2013-06-09 2019-01-08 Apple Inc. System and method for inferring user intent from speech inputs
US10185542B2 (en) 2013-06-09 2019-01-22 Apple Inc. Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant
US9300784B2 (en) 2013-06-13 2016-03-29 Apple Inc. System and method for emergency calls initiated by voice command
US10791216B2 (en) 2013-08-06 2020-09-29 Apple Inc. Auto-activating smart responses based on activities from remote devices
US9620105B2 (en) 2014-05-15 2017-04-11 Apple Inc. Analyzing audio input for efficient speech and music recognition
US10592095B2 (en) 2014-05-23 2020-03-17 Apple Inc. Instantaneous speaking of content on touch devices
US9502031B2 (en) 2014-05-27 2016-11-22 Apple Inc. Method for supporting dynamic grammars in WFST-based ASR
US10289433B2 (en) 2014-05-30 2019-05-14 Apple Inc. Domain specific language for encoding assistant dialog
US10083690B2 (en) 2014-05-30 2018-09-25 Apple Inc. Better resolution when referencing to concepts
US9785630B2 (en) 2014-05-30 2017-10-10 Apple Inc. Text prediction using combined word N-gram and unigram language models
US11257504B2 (en) 2014-05-30 2022-02-22 Apple Inc. Intelligent assistant for home automation
US9430463B2 (en) 2014-05-30 2016-08-30 Apple Inc. Exemplar-based natural language processing
US10078631B2 (en) 2014-05-30 2018-09-18 Apple Inc. Entropy-guided text prediction using combined word and character n-gram language models
US9734193B2 (en) 2014-05-30 2017-08-15 Apple Inc. Determining domain salience ranking from ambiguous words in natural speech
US10169329B2 (en) 2014-05-30 2019-01-01 Apple Inc. Exemplar-based natural language processing
US9715875B2 (en) 2014-05-30 2017-07-25 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US10497365B2 (en) 2014-05-30 2019-12-03 Apple Inc. Multi-command single utterance input method
US9966065B2 (en) 2014-05-30 2018-05-08 Apple Inc. Multi-command single utterance input method
US10170123B2 (en) 2014-05-30 2019-01-01 Apple Inc. Intelligent assistant for home automation
US11133008B2 (en) 2014-05-30 2021-09-28 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US9760559B2 (en) 2014-05-30 2017-09-12 Apple Inc. Predictive text input
US9842101B2 (en) 2014-05-30 2017-12-12 Apple Inc. Predictive conversion of language input
US9633004B2 (en) 2014-05-30 2017-04-25 Apple Inc. Better resolution when referencing to concepts
US9338493B2 (en) 2014-06-30 2016-05-10 Apple Inc. Intelligent automated assistant for TV user interactions
US9668024B2 (en) 2014-06-30 2017-05-30 Apple Inc. Intelligent automated assistant for TV user interactions
US10904611B2 (en) 2014-06-30 2021-01-26 Apple Inc. Intelligent automated assistant for TV user interactions
US10659851B2 (en) 2014-06-30 2020-05-19 Apple Inc. Real-time digital assistant knowledge updates
US10446141B2 (en) 2014-08-28 2019-10-15 Apple Inc. Automatic speech recognition based on user feedback
US9818400B2 (en) 2014-09-11 2017-11-14 Apple Inc. Method and apparatus for discovering trending terms in speech requests
US10431204B2 (en) 2014-09-11 2019-10-01 Apple Inc. Method and apparatus for discovering trending terms in speech requests
US10789041B2 (en) 2014-09-12 2020-09-29 Apple Inc. Dynamic thresholds for always listening speech trigger
US9606986B2 (en) 2014-09-29 2017-03-28 Apple Inc. Integrated word N-gram and class M-gram language models
US9646609B2 (en) 2014-09-30 2017-05-09 Apple Inc. Caching apparatus for serving phonetic pronunciations
US10127911B2 (en) 2014-09-30 2018-11-13 Apple Inc. Speaker identification and unsupervised speaker adaptation techniques
US9668121B2 (en) 2014-09-30 2017-05-30 Apple Inc. Social reminders
US10074360B2 (en) 2014-09-30 2018-09-11 Apple Inc. Providing an indication of the suitability of speech recognition
US9886432B2 (en) 2014-09-30 2018-02-06 Apple Inc. Parsimonious handling of word inflection via categorical stem + suffix N-gram language models
US9986419B2 (en) 2014-09-30 2018-05-29 Apple Inc. Social reminders
US11556230B2 (en) 2014-12-02 2023-01-17 Apple Inc. Data detection
US10552013B2 (en) 2014-12-02 2020-02-04 Apple Inc. Data detection
US9711141B2 (en) 2014-12-09 2017-07-18 Apple Inc. Disambiguating heteronyms in speech synthesis
US9865280B2 (en) 2015-03-06 2018-01-09 Apple Inc. Structured dictation using intelligent automated assistants
US9721566B2 (en) 2015-03-08 2017-08-01 Apple Inc. Competing devices responding to voice triggers
US10311871B2 (en) 2015-03-08 2019-06-04 Apple Inc. Competing devices responding to voice triggers
US10567477B2 (en) 2015-03-08 2020-02-18 Apple Inc. Virtual assistant continuity
US9886953B2 (en) 2015-03-08 2018-02-06 Apple Inc. Virtual assistant activation
US11087759B2 (en) 2015-03-08 2021-08-10 Apple Inc. Virtual assistant activation
US9899019B2 (en) 2015-03-18 2018-02-20 Apple Inc. Systems and methods for structured stem and suffix language models
US9842105B2 (en) 2015-04-16 2017-12-12 Apple Inc. Parsimonious continuous-space phrase representations for natural language processing
US10083688B2 (en) 2015-05-27 2018-09-25 Apple Inc. Device voice control for selecting a displayed affordance
US10127220B2 (en) 2015-06-04 2018-11-13 Apple Inc. Language identification from short strings
US10101822B2 (en) 2015-06-05 2018-10-16 Apple Inc. Language input correction
US10356243B2 (en) 2015-06-05 2019-07-16 Apple Inc. Virtual assistant aided communication with 3rd party service in a communication session
US11025565B2 (en) 2015-06-07 2021-06-01 Apple Inc. Personalized prediction of responses for instant messaging
US10186254B2 (en) 2015-06-07 2019-01-22 Apple Inc. Context-based endpoint detection
US10255907B2 (en) 2015-06-07 2019-04-09 Apple Inc. Automatic accent detection using acoustic models
US10671428B2 (en) 2015-09-08 2020-06-02 Apple Inc. Distributed personal assistant
US11500672B2 (en) 2015-09-08 2022-11-15 Apple Inc. Distributed personal assistant
US10747498B2 (en) 2015-09-08 2020-08-18 Apple Inc. Zero latency digital assistant
US9697820B2 (en) 2015-09-24 2017-07-04 Apple Inc. Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks
US11010550B2 (en) 2015-09-29 2021-05-18 Apple Inc. Unified language modeling framework for word prediction, auto-completion and auto-correction
US10366158B2 (en) 2015-09-29 2019-07-30 Apple Inc. Efficient word encoding for recurrent neural network language models
US11587559B2 (en) 2015-09-30 2023-02-21 Apple Inc. Intelligent device identification
US10691473B2 (en) 2015-11-06 2020-06-23 Apple Inc. Intelligent automated assistant in a messaging environment
US11526368B2 (en) 2015-11-06 2022-12-13 Apple Inc. Intelligent automated assistant in a messaging environment
US10049668B2 (en) 2015-12-02 2018-08-14 Apple Inc. Applying neural network language models to weighted finite state transducers for automatic speech recognition
US10223066B2 (en) 2015-12-23 2019-03-05 Apple Inc. Proactive assistance based on dialog communication between devices
US10446143B2 (en) 2016-03-14 2019-10-15 Apple Inc. Identification of voice inputs providing credentials
US9934775B2 (en) 2016-05-26 2018-04-03 Apple Inc. Unit-selection text-to-speech synthesis based on predicted concatenation parameters
US9972304B2 (en) 2016-06-03 2018-05-15 Apple Inc. Privacy preserving distributed evaluation framework for embedded personalized systems
US10249300B2 (en) 2016-06-06 2019-04-02 Apple Inc. Intelligent list reading
US10049663B2 (en) 2016-06-08 2018-08-14 Apple Inc. Intelligent automated assistant for media exploration
US11069347B2 (en) 2016-06-08 2021-07-20 Apple Inc. Intelligent automated assistant for media exploration
US10354011B2 (en) 2016-06-09 2019-07-16 Apple Inc. Intelligent automated assistant in a home environment
US11037565B2 (en) 2016-06-10 2021-06-15 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US10067938B2 (en) 2016-06-10 2018-09-04 Apple Inc. Multilingual word prediction
US10192552B2 (en) 2016-06-10 2019-01-29 Apple Inc. Digital assistant providing whispered speech
US10509862B2 (en) 2016-06-10 2019-12-17 Apple Inc. Dynamic phrase expansion of language input
US10490187B2 (en) 2016-06-10 2019-11-26 Apple Inc. Digital assistant providing automated status report
US10733993B2 (en) 2016-06-10 2020-08-04 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US10297253B2 (en) 2016-06-11 2019-05-21 Apple Inc. Application integration with a digital assistant
US11152002B2 (en) 2016-06-11 2021-10-19 Apple Inc. Application integration with a digital assistant
US10521466B2 (en) 2016-06-11 2019-12-31 Apple Inc. Data driven natural language event detection and classification
US10269345B2 (en) 2016-06-11 2019-04-23 Apple Inc. Intelligent task discovery
US10089072B2 (en) 2016-06-11 2018-10-02 Apple Inc. Intelligent device arbitration and control
US10043516B2 (en) 2016-09-23 2018-08-07 Apple Inc. Intelligent automated assistant
US10553215B2 (en) 2016-09-23 2020-02-04 Apple Inc. Intelligent automated assistant
US10593346B2 (en) 2016-12-22 2020-03-17 Apple Inc. Rank-reduced token representation for automatic speech recognition
US10755703B2 (en) 2017-05-11 2020-08-25 Apple Inc. Offline personal assistant
US10410637B2 (en) 2017-05-12 2019-09-10 Apple Inc. User-specific acoustic models
US11405466B2 (en) 2017-05-12 2022-08-02 Apple Inc. Synchronization and task delegation of a digital assistant
US10791176B2 (en) 2017-05-12 2020-09-29 Apple Inc. Synchronization and task delegation of a digital assistant
US10482874B2 (en) 2017-05-15 2019-11-19 Apple Inc. Hierarchical belief states for digital assistants
US10810274B2 (en) 2017-05-15 2020-10-20 Apple Inc. Optimizing dialogue policy decisions for digital assistants using implicit feedback
US11217255B2 (en) 2017-05-16 2022-01-04 Apple Inc. Far-field extension for digital assistant services
US10560506B2 (en) 2017-07-19 2020-02-11 Netflix, Inc. Identifying previously streamed portions of a media title to avoid repetitive playback
JP7175957B2 (en) 2017-07-19 2022-11-21 Netflix, Inc. Identifying previously streamed portions of media titles to avoid repeated playback
CN111095939A (en) * 2017-07-19 2020-05-01 Netflix, Inc. Identifying previously streamed portions of a media item to avoid repeated playback
WO2019018164A1 (en) * 2017-07-19 2019-01-24 Netflix, Inc. Identifying previously streamed portions of a media title to avoid repetitive playback
AU2018304058B2 (en) * 2017-07-19 2021-01-21 Netflix, Inc. Identifying previously streamed portions of a media title to avoid repetitive playback
JP2020530954A (en) * 2017-07-19 2020-10-29 Netflix, Inc. Identifying previously streamed parts of a media title to avoid repeated playback

Also Published As

Publication number Publication date
AU2003283783A1 (en) 2005-05-11

Similar Documents

Publication Publication Date Title
WO2005041455A1 (en) Video content detection
US9967514B2 (en) Recording system
US20070136782A1 (en) Methods and apparatus for identifying media content
US8155498B2 (en) System and method for indexing commercials in a video presentation
US7251413B2 (en) System and method for improved blackfield detection
US6469749B1 (en) Automatic signature-based spotting, learning and extracting of commercials and other video content
US8270810B2 (en) Method and system for advertisement insertion and playback for STB with PVR functionality
US20150163545A1 (en) Identification of video content segments based on signature analysis of the video content
US8661471B2 (en) Information processing apparatus and information processing method
CA2761031A1 (en) Correlation of media metadata gathered from diverse sources
US7904936B2 (en) Technique for resegmenting assets containing programming content delivered through a communications network
US20040086263A1 (en) System for maintaining history of multimedia content and method thereof
US20230291974A1 (en) Apparatus, systems and methods for song play using a media device having a buffer
JP4650555B2 (en) Information processing apparatus and information processing method
JP2002354391A (en) Method for recording program signal, and method for transmitting record program control signal
WO2010076266A2 (en) Recording media content
JP2006510286A (en) System, method, and apparatus for retrieving and automatically recording broadcast program

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): BW GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 EP: the EPO has been informed by WIPO that EP was designated in this application
122 EP: PCT application non-entry in European phase
NENP Non-entry into the national phase

Ref country code: JP

WWW WIPO information: withdrawn in national office

Country of ref document: JP