WO1999035631A1 - Method and apparatus for providing interactive karaoke entertainment - Google Patents

Method and apparatus for providing interactive karaoke entertainment

Info

Publication number
WO1999035631A1
Authority
WO
WIPO (PCT)
Prior art keywords
karaoke
recited
performer
video
local
Prior art date
Application number
PCT/US1999/000407
Other languages
French (fr)
Other versions
WO1999035631A8 (en)
WO1999035631A9 (en)
Inventor
David Kumar
Subutai Ahmad
Original Assignee
Electric Planet, Inc.
Priority date
Filing date
Publication date
Application filed by Electric Planet, Inc. filed Critical Electric Planet, Inc.
Priority to AU24538/99A
Publication of WO1999035631A1
Publication of WO1999035631A8
Publication of WO1999035631A9

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/36 Accompaniment arrangements
    • G10H1/361 Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems
    • G10H1/368 Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems displaying animated or moving pictures synchronized with the music or audio part
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/36 Accompaniment arrangements
    • G10H1/361 Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems
    • G10H1/365 Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems, the accompaniment information being stored on a host computer and transmitted to a reproducing terminal by means of a network, e.g. public telephone lines
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2220/00 Input/output interfacing specifically adapted for electrophonic musical tools or instruments
    • G10H2220/005 Non-interactive screen display of musical or status data
    • G10H2220/011 Lyrics displays, e.g. for karaoke applications
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2220/00 Input/output interfacing specifically adapted for electrophonic musical tools or instruments
    • G10H2220/155 User input interfaces for electrophonic musical instruments
    • G10H2220/441 Image sensing, i.e. capturing images or optical patterns for musical purposes or musical control purposes
    • G10H2220/455 Camera input, e.g. analyzing pictures from a video camera and using the analysis results as control data
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2240/00 Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
    • G10H2240/171 Transmission of musical instrument data, control or status information; Transmission, remote access or control of music data for electrophonic musical instruments
    • G10H2240/201 Physical layer or hardware aspects of transmission to or from an electrophonic musical instrument, e.g. voltage levels, bit streams, code words or symbols over a physical link connecting network nodes or instruments
    • G10H2240/241 Telephone transmission, i.e. using twisted pair telephone lines or any type of telephone network
    • G10H2240/245 ISDN [Integrated Services Digital Network]
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2240/00 Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
    • G10H2240/171 Transmission of musical instrument data, control or status information; Transmission, remote access or control of music data for electrophonic musical instruments
    • G10H2240/281 Protocol or standard connector for transmission of analog or digital data to or from an electrophonic musical instrument
    • G10H2240/295 Packet switched network, e.g. token ring
    • G10H2240/305 Internet or TCP/IP protocol use for any electrophonic musical instrument data or musical parameter transmission purposes

Definitions

  • This invention relates generally to multimedia entertainment systems, and more particularly to karaoke systems.
  • Karaoke is a form of entertainment, originating in Japan, that features a live singer with pre-recorded accompaniment.
  • Karaoke is a Japanese abbreviated compound word, where "kara” comes from “karappo” meaning empty, and “oke” is the abbreviation of "okesutura,” or orchestra. Therefore, karaoke literally means "empty orchestra.” While originating in Japan, the karaoke boom has spread abroad, and is popular in Korea, China and other parts of Southeast Asia, as well as in the U.S. and Europe.
  • Karaoke music was originally recorded on audio tape, but quickly evolved with the advent of the compact disk, which not only allows rapid, non-serial access to new songs, but which also can include multimedia effects such as video and lyrics. Therefore, the advent of the compact disk made it possible to enhance the karaoke experience with video scenes synchronized with the music and the accompanying lyrics.
  • Karaoke has grown to be a major entertainment industry.
  • Family-use karaoke sets are also available.
  • The first karaoke box appeared in the Kansai area; it was built from a converted freight car. Since then, karaoke boxes have been built on unoccupied grounds all over Japan, and in urban areas, karaoke rooms, which consist of compartments made by partitioning and soundproofing rooms in a building, were introduced and set up one after another.
  • Karaoke is a common form of entertainment for Japanese business people. It is not at all uncommon for workers to drop into a bar with colleagues after work, have a drink, and enjoy singing popular songs to the accompaniment of karaoke. Karaoke has been entertaining people ever since its invention 20 years ago, and has become firmly established in Japanese society.
  • Karaoke is available in a wide variety of formats, suitable for any venue, from a soloist rehearsing up to large crowds at community gatherings.
  • A typical karaoke show includes one or two singers, and possibly a karaoke operator to operate the karaoke equipment. Couples will often enjoy a karaoke session together.
  • The equipment typically includes a player, an amplifier, and a television monitor for the music video. There may be an additional television monitor facing the singers to display the lyrics, or the lyrics can be displayed on the television monitor that is displaying the music video.
  • While karaoke is very popular, it may be reaching a saturation point, at least in Japan. This is because there are many thousands of karaoke boxes and bars having karaoke systems and, as such, the novelty is beginning to wear off.
  • One attempt to increase the interest in karaoke is the use of "blue screen" technology, which allows a video camera to capture the image of one or more persons standing in front of a blue screen and insert the images of those persons into the music video.
  • However, this technology is somewhat cumbersome in that it requires a specialized stage including the blue screen, and in that the karaoke customers are merely superimposed upon a background image of the music video without any interactivity with that background scene.
  • What is needed is a karaoke system which allows new, enhanced, and interactive participation of karaoke customers in their karaoke experience.
  • A personal computer is paired with a karaoke audio/video system and a video camera to provide interactivity between the karaoke customers (i.e. the karaoke performers) and the karaoke system.
  • Images of the karaoke customers are captured with the video camera, processed in the personal computer, and composited into the musical video presentation.
  • A process for providing interactive karaoke entertainment includes the acts of determining if there is a user initiation and, if so, whether the requested content is local. If not, the content is retrieved. Next, a "frame" of video information is received from the video camera, and background subtraction is performed. Then, there is a tracking analysis, with the results being put into a tracking buffer. A gesture analysis is then performed. Next, the image is "composited" based upon the tracking and gesture analysis and the requested content. The resulting multimedia content is then outputted and, preferably, recorded. The next frame is then retrieved from the video camera and the process is repeated.
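The per-frame loop just described can be sketched in a few lines of Python. This is a deliberately simplified, illustrative toy (images as nested lists of grayscale intensities, a centroid "tracker", and a mask-based composite); every function name is an assumption for illustration and none is taken from the patent disclosure.

```python
# Toy sketch of the per-frame karaoke loop: background subtraction,
# tracking analysis, then compositing the performers into the content.

def subtract_background(frame, background, threshold=10):
    """Foreground mask: True where a pixel differs from the background model."""
    return [[abs(p - b) > threshold for p, b in zip(fr, br)]
            for fr, br in zip(frame, background)]

def track(mask):
    """Toy tracking analysis: centroid (x, y) of the foreground pixels."""
    pts = [(x, y) for y, row in enumerate(mask)
           for x, on in enumerate(row) if on]
    if not pts:
        return None
    return (sum(x for x, _ in pts) / len(pts),
            sum(y for _, y in pts) / len(pts))

def composite(frame, mask, scene):
    """Place foreground pixels over the requested content (scene)."""
    return [[p if on else s for p, on, s in zip(fr, mr, sr)]
            for fr, mr, sr in zip(frame, mask, scene)]

def karaoke_loop(frames, background, scene):
    tracking_buffer, out = [], []
    for frame in frames:                          # one camera "frame" at a time
        mask = subtract_background(frame, background)
        tracking_buffer.append(track(mask))       # results into tracking buffer
        # (a gesture analysis over tracking_buffer would go here)
        out.append(composite(frame, mask, scene)) # output / record the result
    return out, tracking_buffer
```

A real implementation would of course operate on full-color video frames and use the gesture results to trigger effects, but the control flow matches the process described above.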
  • The interactive karaoke entertainment system is designed so that it can form a part of a larger network of karaoke entertainment systems. More particularly, a number of interactive karaoke entertainment systems are adapted to be coupled to a local area network (LAN) which is served by a local PC server.
  • The local PC server can communicate with an Internet-based content server to download content that is not locally available and to upload accounting information.
  • A process running on the local PC server includes the acts of determining whether it has been polled by a content server and, if so, accounting information is transferred to the content server and other information, software, or content can be uploaded or downloaded with the content server. If there has been no polling, the local PC server then determines whether there is a request from a local PC that is coupled to the local area network. If there is, it is determined whether the content is locally available and, if not, the local PC server communicates with the remote content server to obtain the desired content. The content is then downloaded to the requesting PC over the local area network, and an accounting entry is created at the local PC server reflecting the karaoke customer's use of that content.
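The local PC server logic above (serve content from a local cache, fetch misses from the content server, record an accounting entry per use, and hand accounting over when polled) can be sketched as follows. The class and method names are hypothetical placeholders, not part of the disclosure.

```python
# Illustrative sketch of the local PC server's content and accounting flow.

class ContentServer:
    """Stand-in for the remote (or mirror) content server."""
    def __init__(self, catalog):
        self.catalog = catalog          # song id -> content
        self.received = []              # accounting entries uploaded to us
    def fetch(self, song_id):
        return self.catalog[song_id]
    def receive_accounting(self, entries):
        self.received.extend(entries)

class LocalPCServer:
    def __init__(self, local_content=None):
        self.local_content = dict(local_content or {})  # local cache
        self.accounting = []                            # usage entries

    def handle_poll(self, content_server):
        """The content server polled us: transfer accounting information."""
        content_server.receive_accounting(self.accounting)
        self.accounting = []

    def handle_request(self, song_id, content_server):
        """A local karaoke PC requested content over the LAN."""
        if song_id not in self.local_content:   # not locally available
            self.local_content[song_id] = content_server.fetch(song_id)
        self.accounting.append(song_id)         # create an accounting entry
        return self.local_content[song_id]      # download to the local PC
```

Caching the fetched content locally means the second request for the same karaoke video never crosses the (slow, metered) telephone link, which is the economic point of the mirror-site arrangement described below.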
  • The interactive karaoke system of the present invention will add a new dimension of enjoyment to the karaoke experience.
  • The interactive nature allows karaoke to transcend a simple performance and take on aspects of an interactive game. This increases the enjoyment, and therefore the use, of the interactive karaoke systems of the present invention.
  • Fig. 1 is a representation of an interactive karaoke entertainment system in accordance with the present invention;
  • Fig. 2 is a block diagram of a portion of the system of Fig. 1;
  • Fig. 3 is a pictorial representation of the personal computer (PC) portion of the system of Fig. 1;
  • Fig. 4 is a flow diagram illustrating the computer implemented operations performed by the personal computer of Fig. 3;
  • Fig. 4A is an illustration of the compositing act of Fig. 4.
  • Fig. 4B is an illustration of the compositing act of Fig. 4.
  • Fig. 5 is a representation of a networked karaoke entertainment system of the present invention.
  • Fig. 6 is a flow diagram illustrating computer implemented acts performed by the local PC server of Fig. 5 ;
  • Fig. 7 is a pictorial representation illustrating one implementation of the interactive karaoke entertainment system of the present invention.
  • Figs. 8A and 8B illustrate another, more integrated, implementation of the interactive karaoke entertainment system of the present invention;
  • Fig. 9 is a more detailed view of the karaoke module used in the DVD and VCD player of Figs. 8A and 8B;
  • Fig. 10 is a block diagram of the vision processor of the karaoke module illustrated in Fig. 9;
  • Figs. 10A and 10B illustrate a preferred integrated circuit package arrangement for the vision processor of Fig. 10;
  • Fig. 11 illustrates a typical set-up of an interactive karaoke entertainment system of the present invention
  • Fig. 12 illustrates an embodiment of the present invention that utilizes a digital television system
  • Figures 13A and 13B are flowcharts showing a preferred embodiment of a method for model-based compositing of the present invention;
  • Figure 14 is a flowchart showing a process for capturing a frame of an average (background) image
  • Figure 15 is a flowchart showing a process for updating the background model
  • Figure 16 is a flowchart showing a process for updating the minimum and maximum values for pixels in the average image
  • Figure 17A is a replica of a sample background model or average image
  • Figure 17B is a replica of a sample input image consisting of the background image including the object being composited
  • Figure 18A is a flowchart showing a process for subtracting a background to isolate the object being composited;
  • Figure 18B shows an initial alpha image of an object being composited after the background subtraction procedure described with respect to Figure 18A is done;
  • Figures 19A and 19B are flowcharts showing a preferred embodiment of the shadow reduction process;
  • Figures 20A through 20D are flowcharts showing a process for matching the object to a model of the object made up of object part templates;
  • Figure 21 is a flowchart showing the process for fitting parts of the object with the templates
  • Figure 22A is a flowchart showing a process for eliminating background artifacts and clutter close to the boundary of the object so that such items are not unintentionally composited with the object onto the destination image;
  • Figure 22B shows an alpha image of the object after the shadow reduction, hole filling, and background clutter procedures have been performed.
  • Figure 23 is a flowchart showing a process for blending the object from the input image onto the destination image using the alpha image as a blending coefficient.
  • an interactive karaoke entertainment system 10 in accordance with the present invention includes karaoke audio and video equipment 12, a personal computer (PC) 14, a TV monitor 16, and a video camera 18.
  • Associated with the karaoke audio and video equipment 12 is an input microphone 20 and a remote control 22.
  • An optional photo-printer 24 can be coupled to the PC 14.
  • the karaoke audio and video equipment can be provided by any number of vendors. In this embodiment of the entertainment system 10, only the audio portion of the karaoke equipment 12 is used. In other words, as a karaoke customer sings into the microphone 20, the karaoke equipment 12 will amplify and process the sound and play it from speakers (not shown) and/or the TV monitor 16. However, the image for the TV monitor 16, in the present embodiment, is provided by the PC 14 via a video input line 26 to the karaoke equipment 12.
  • Karaoke equipment, such as karaoke equipment 12, typically has an external video input to receive external video information. The combined video and audio is then provided by the karaoke equipment 12 to the TV monitor 16 as illustrated by arrow 28.
  • the karaoke equipment 12 typically includes a control and data port (often a serial port) which is coupled to the PC by a bus 30.
  • the output of the video camera 18 is coupled to the PC 14 by a cable 32 and, in alternate embodiments of the invention, may be coupled to the PC by a control cable to allow specialized software and utilities to be loaded into the camera 18 from the PC 14.
  • The photo-printer 24 allows images displayed on the TV monitor to be captured and printed as photographs, photographic buttons, rubber stamps, etc. There are several vendors for such photo-printers.
  • the PC 14 is coupled to a local network server by a local area network (LAN) cable 34.
  • The PC 14 is preferably a standard microcomputer, available from a variety of sources, including a microprocessor 36 that is coupled to dynamic random access memory (DRAM) 38 and to read-only memory (ROM) 40.
  • the microprocessor 36 is also coupled to one or more I/O buses 42 to which peripherals, such as peripheral 44 is coupled.
  • peripheral 44 can be a CD-ROM drive, a DVD drive, a hard disk drive, or any number of input/output (I/O) interfaces.
  • The voice input from the microphone 20 is coupled to the karaoke audio video equipment 12 via a cable 46 and, optionally, to the I/O bus 42 by an audio input card 48.
  • The image input from the video camera 18 is input to a video input card 50, which is also coupled to the I/O bus 42.
  • the LAN 34 is coupled to the I/O bus 42 by a network card 52.
  • a video output card 54 is coupled to the I/O bus and produces NTSC (and possibly stereo) output for the karaoke audio visual system 12 on the line 26.
  • a parallel card 56 is coupled to the I/O bus 42 and produces photo-printer output signals for the photo-printer 24.
  • An audio card 58 produces an audio output for a power amplifier (not shown) that may be hooked up to loudspeakers (also not shown).
  • A control card 60 can be provided for purposes such as lighting control. In Fig. 3, a preferred physical implementation of the PC 14 is illustrated.
  • The PC 14 is of a "tower" design which provides a multiplicity of I/O slots for the various cards of the present invention. More particularly, a memory expansion board 62, a video card 54, the audio card 58, the camera interface card 50, the network interface card 52, the control card 60, and the parallel card 56 are preferably plugged into I/O slots within the PC tower 14. A keyboard 64 and a mouse 66 are coupled to the PC tower 14 in a conventional manner. Likewise, the PC tower 14 is preferably provided with a CD-ROM drive, a floppy drive, and a pair of hard disks in a conventional fashion. It is preferred to have two hard disks operating in parallel (i.e. "mirroring" each other) for redundancy, since this is the most common area of failure in the PC. By having redundant hard disk drives, the karaoke operator can be virtually assured that the karaoke entertainment system will be continuously operable.
  • The computer implemented process running on the PC 14 is illustrated in flow-diagram form. More particularly, the process 68 begins at 70 and, in a decision operation, it is determined whether a user (i.e. a "karaoke customer") is initiating the use of the karaoke entertainment system. This is typically accomplished by using the remote control 22 to activate the selection of a karaoke song. If there is no user initiation, the operation 72 cycles until an initiation is detected. Once an initiation is detected, the process 68 determines whether the requested content is local. By "content" it is meant the requested music video, along with any accompanying multimedia effects and software required for the interactivity with the karaoke entertainment system. If the content is not local, an operation 76 retrieves the content.
  • A "frame" of video data is retrieved from the video camera 18. Once the frame has been retrieved and buffered in the memory of the personal computer 14, a background subtraction is performed. Next, a tracking analysis operation 82 is performed and the results are placed in a tracking buffer of the PC 14 in an operation 84. Next, a gesture analysis operation 86 is performed. Subsequently, the image is composited based upon the tracking and gesture analysis of operations 82 and 86, respectively, and upon the content requested by the karaoke customer. Finally, in operation 90 the resulting composited multimedia content is outputted and, preferably, recorded in a suitable recording device such as a video cassette recorder, recordable CD-ROM, recordable DVD disk, etc. It is then determined in operation 92 if the karaoke customer is done with their particular karaoke session. If so, process control is returned to operation 72 and, if not, process control is returned to operation 78 to retrieve a new frame from the video camera.
  • a "frame" 94 of video derived from the camera 18 is loaded into the memory 62 of the PC 14.
  • The frame 94 includes the "true" background image and the images of two karaoke customers or "players" or "performers" 96 and 98.
  • the frame is retrieved by operation 78 and a background subtraction is performed by operation 80 to remove all but the karaoke customers 96 and 98. It should be noted that this background subtraction is accomplished without the use of the awkward blue screen apparatus of the prior art.
  • The operation 82 performs the tracking analysis to provide a tracked image 100.
  • the compositing operation 88 then composites the karaoke customers 96 and 98 into an interactive environment 102.
  • Gestures and movements of the karaoke customers 96 and 98 permit them to interact with the environment 102.
  • For example, animated sparks 104 can be caused to fly from the fingertips of karaoke customer 96.
  • The grasping of the hand of the karaoke customer 96 by the karaoke customer 98 can be used as a gesture which produces the images of hearts 106 in the interactive environment 102.
  • Other gestures or body positions can also interact with various objects 108 in the interactive environment, or change the scene of the interactive environment. Therefore, with the technology of the present invention, karaoke becomes a truly interactive activity, somewhat akin to a game, wherein the multi-media, enhanced reality, and virtual reality effects are possible. It should also be noted that this is a true multi-media experience for the karaoke customers.
  • The operation 88 includes a "media merging" engine 112 which has, as inputs, lyrics, audio (e.g. such as from the microphone), sound effects, graphics, animation, camera images, alpha images, tracking information, and gestures.
  • the output is a video stream which provides the video signals for a television monitor, and an audio stream which provides the audio signals for the television monitor and/or separate loudspeakers.
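The core of merging camera images with the destination content, using the alpha image as a per-pixel blending coefficient (as in Figure 23), can be sketched as follows. This is a minimal illustration assuming alpha values in the range 0..1 and grayscale images as nested lists; the function name is an assumption.

```python
# Per-pixel alpha blending: out = alpha * input + (1 - alpha) * destination.
# Where alpha is 1 the performer's pixel shows; where it is 0 the
# destination (content) pixel shows; fractional values blend the two.

def blend(input_img, dest_img, alpha_img):
    return [[a * p + (1.0 - a) * d
             for p, d, a in zip(pr, dr, ar)]
            for pr, dr, ar in zip(input_img, dest_img, alpha_img)]
```

Fractional alpha values at the performer's silhouette edge are what make the composite look merged rather than cut out with scissors.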
  • a network configuration for the interactive karaoke entertainment system 10 is illustrated. More particularly, a karaoke entertainment system 10 is shown in the lower left hand corner of the page, while a number of other similar systems 10A, 10B, 10C, etc. are also illustrated.
  • Each of the interactive karaoke entertainment systems 10 are coupled to a local area network (LAN) backbone or hub 114 to communicate with a local PC server 116.
  • the local PC server 116 is simply a powerful personal computer system.
  • the local PC server 116 and the interactive karaoke entertainment systems are in fairly close proximity, e.g. within the same building.
  • each of the interactive karaoke entertainment systems 10 can be located in its own, soundproofed room, while the local PC server can be provided in a server or operator room in the same building.
  • The implementation of local area networks is well known to those skilled in the art.
  • the local PC server is coupled to a content server 118 by a telephone line 120.
  • the content server 118 includes karaoke "content", which is defined as musical video accompanied by lyrics and any data or software programs required for the interactive use of the "content.”
  • The telephone line connecting the local PC server to the content server can be a standard analog telephone line (with the use of appropriate modems at both the local PC server 116 and the content server 118), or can be a digital line such as an ISDN line, T1 line, etc.
  • The advantage of digital lines is, of course, a significantly higher data transfer rate, with the disadvantage of higher cost.
  • Other data transmission media are also well known to those skilled in the art.
  • the content server 118 is a "mirror site” that is coupled to a remote content server 122 by, for example, the Internet 124.
  • a "mirror site” is a site which is updated on a periodic basis, to reflect or "mirror” the contents of another or “master” site, such as content server 122.
  • the purpose of the mirror site 118 is to prevent unnecessary communication delays, especially when transferring large amounts of data, over a relatively slow transmission media such as the Internet 124.
  • one or more content servers can be provided in various cities in Japan while a single content server can be provided in Palo Alto, California.
  • A number of content development systems 124 can then be used to load new content on content server 122 which, as explained previously, creates a mirror image of itself at the content server mirror site 118 via the Internet 124 on a periodic basis.
  • a computer implemented process 126 running on the local PC server 116 begins at 128 and, in an operation 130, it determines whether it has been polled by the content server mirror site 118. It should be noted here that the mirror sites 118 are not required, as the local PC server could communicate directly with the content server 122 via the Internet 124. However, for purposes of efficiency, it is often more desirable to access a local mirror site 118.
  • the local PC server 116 determines that it has been polled, it connects with the appropriate content server and transfers accounting information in an operation 132.
  • This accounting information can include the number of times a particular karaoke video has been played and what the appropriate charge for the karaoke operator should be.
  • An operation 134 can be used to upload and download other information, content, software, etc. Process control is then returned to operation 130.
  • An operation 136 determines whether there is a request from a local PC, i.e. one of the interactive karaoke entertainment systems 10. If not, process control is returned to operation 130. If there is a request from a local PC, an operation 138 determines whether the requested content is locally available. If not, the content is retrieved from the content server in an operation 140. It should be noted that the local PC server 116 can be connected to the content server mirror site 118 either on a continuous basis (such as with an ISDN line) or on an "on demand" basis, such as with dial-up modem access. Next, an operation 142 downloads the requested content to the requesting local PC, and an operation 144 creates an accounting entry at the local PC server 116. This accounting entry, along with other data, is what is transferred to the content server in the operation 132.
  • an alternative interactive karaoke entertainment system 10' includes a DVD and VCD player 146, a karaoke adapter 148 of the present invention, a recorder 150, a binocular camera 18', and a television monitor 16.
  • the player 146 and adapter 148 are controlled by a remote control 152.
  • a microphone 20 is coupled to the player 146, and a number of DVD and/or VCD disks 154 are inserted to the player 146.
  • the output of the player 146 goes into the adapter 148, as does the output of the camera 18'.
  • The adapter 148 performs the functionality described previously with regards to the PC 14 running the computer implemented process 68 of Fig. 4.
  • The advantage of this system is that a separate, dedicated personal computer 14 is not required, since that functionality has been integrated into the adapter 148.
  • the output of the adapter 148 is input into the television monitor and/or loudspeakers (not shown).
  • a VCR, recordable CD-ROM or recordable DVD recorder 150 can be used to record the output of the adapter 148.
  • In Figs. 8A and 8B, yet another alternate embodiment of the present invention integrates the functionality of the player 146 with the adapter 148 of Fig. 7. More particularly, a combination DVD/VCD karaoke player 156 is shown in a front elevational view in Fig. 8A and a top plan view, with the top lid removed, in Fig. 8B.
  • the combined unit 156 includes a VCD and DVD logic module 158, a disk loader 160, a VCD and DVD drive 162, and a karaoke module 164.
  • A power supply 166 is coupled to a source of AC power by a cord and plug 168.
  • the unit 156 has, as inputs, an input 170 from the server, and an input 172 from the camera.
  • The unit 156 has, as outputs, an output 174 to the television monitor 16 and an output 176 to recorder 150.
  • The advantages of integrating the karaoke module 164 into a DVD and VCD player include both size and cost reductions.
  • the interactive video karaoke module 164 is shown in a conceptual form. It includes, as inputs, an input 174 for receiving video input from the disk player, and an input 180 for receiving input from the camera 18'.
  • the module 164 includes an output 182 to the television monitor 16 and an optional output 184 to the camera 18'. It is therefore contemplated that the camera 18' being used with the interactive karaoke entertainment system 10' may be a "smart" camera which can receive programs, data, and commands from the karaoke module 164.
  • the karaoke module 164 includes a vision processor 186 and an ASIC 188 to handle data communications between the karaoke module 164 and the rest of the unit 156.
  • the vision processor 186 includes a digital video interface 190, the color processing unit 192, a microcontroller 194, a vision algorithm core 196, a compression unit 198, an ASIC 200 to handle various glue logic functions, memory 202, a Universal Serial Bus (USB) module 204, a memory controller 206, a field programmable gate array (FPGA) controller 208, and a PAL/NTSC module 210.
  • JTAG circuitry can be included to provide boundary scan capabilities.
  • the input signals (at the digital video interface 190) are processed by the vision processor 186 under microcontroller 194 control.
  • a first output 212 is provided by the USB, and a second output, either for European (PAL) or U.S. (NTSC) video formats is provided at an output 214.
  • External DRAM 216 is coupled to the memory controller 206, and an external FPGA 218 is coupled to the FPGA controller 208.
  • A top plan view of a preferred packaging for the vision processor 186 is shown in Fig. 10A, with a side elevational view, taken along line 10B-10B, shown in Fig. 10B.
  • In Fig. 11, an exemplary use of an interactive karaoke entertainment system 10 is illustrated.
  • The camera 18 of the unit is aimed toward a play area 220 where the karaoke customers may sing and otherwise perform. It is preferred that the customers stay within the play area 220 so as to remain within the "field of sight" 222 of the camera 18.
  • a wired or wireless microphone 20 can be used by the karaoke customers as they sing, and a remote control can be used to activate the system and to select the karaoke music video they wish to accompany.
  • as the karaoke customers move about in the play area 220 and make pre-determined gestures and poses, they can interact with the video and other content displayed on the television monitor 16.
  • a digital television 224 is used as the display unit of an interactive karaoke entertainment system 10".
  • the real time video interaction and vision technologies 226 disclosed herein provide an interaction between the digital television and a number of peripheral sources. More particularly, the real time video interaction vision technologies provide an interaction with a computer 228, a digital camera 230, a DVD player 232, a VCD player 234, a game console 236, a digital broadcast receiver 238, a video telephone 240, a "set top" box 242, a satellite receiver 244, or a camcorder 246.
  • the functionality of the interactive karaoke entertainment systems 10, as described with reference to the analog television monitor, is quite transportable to the digital television system as well.
  • Figures 13A and 13B are flowcharts showing a preferred embodiment of a method for model-based compositing of the present invention.
  • the system is initialized by setting the variable N to zero.
  • N is the number of iterations the system will perform to create a background model.
  • a background model (“average image" ) is created or built by averaging several frames of a background image.
  • An average image is essentially an image of a backdrop that does not contain the object that is being composited.
  • an average image could be a simple image of the karaoke room.
  • the background model is essentially a model of the generally static (i.e., non-moving or unchanging) background in which the object being composited, such as a person, will enter.
  • before the object enters, however, this background model must be created.
  • the system captures one frame of the background image. With every image frame captured, the system updates the background model as shown in block 304.
  • the process of updating the background model involves updating the average image and maintaining a minimum and maximum pixel value chart for each pixel examined in the average image. These charts are maintained by comparing the value of the pixel to the minimum and maximum value of that pixel based on previous images. This process is described in greater detail in Figure 16. Blocks 302 and 304 are described in greater detail in Figures 14 and 15 respectively.
  • the system determines whether the background model update is complete by checking if the number of iterations (i.e., number of captured frames of the average image) has reached N.
  • N is the number of iterations the user wants the system to perform in order to create the background model. For example, the user may want the system to go through 30 or 100 iterations to build the background model, depending on how much time the user wants to spend building the model and how accurate the user wants it to be. If the number of iterations has not reached N, the system returns to block 302 where another frame of the average image is retrieved. If the number of iterations has reached N and no more frames are needed to build the background model, the system proceeds to block 308.
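The N-iteration averaging loop described above can be sketched as follows. This is a minimal illustrative sketch, not the disclosed implementation; frames are flat lists of pixel intensities, and all names and values are hypothetical:

```python
def build_background_model(frames):
    """Average N captured frames of the static background, pixel by pixel,
    while tracking each pixel's minimum and maximum observed value."""
    n = len(frames)
    num_pixels = len(frames[0])
    total = [0] * num_pixels
    minimum = list(frames[0])
    maximum = list(frames[0])
    for frame in frames:
        for i, value in enumerate(frame):
            total[i] += value
            if value < minimum[i]:
                minimum[i] = value
            if value > maximum[i]:
                maximum[i] = value
    # The "average image" is the per-pixel mean over all N frames.
    average = [t / n for t in total]
    return average, minimum, maximum

# Three frames of a 4-pixel background; the last pixel flickers
# (e.g. a television screen visible behind the play area).
frames = [
    [10, 20, 30, 100],
    [10, 20, 30, 140],
    [10, 20, 30, 120],
]
avg, mn, mx = build_background_model(frames)
```

Static pixels keep identical minimum and maximum values, while the flickering pixel accumulates a wide range, which feeds the tolerance computation described next in the text.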
  • the system retrieves the minimum and maximum values for each pixel in the average image from the minimum and maximum pixel value charts discussed above.
  • the system computes the tolerance for each pixel value to be the difference between the maximum pixel value and the minimum pixel value for that pixel. For many of the pixels that are stationary, this tolerance will be close to zero. The tolerance for non-static pixels will likely be greater than zero.
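Expressed as code, the per-pixel tolerance is simply the spread between the extremes recorded while the model was built (an illustrative sketch; names are not from the disclosure):

```python
def pixel_tolerances(minimum, maximum):
    """Tolerance per pixel: the difference between the maximum and minimum
    values observed during background-model construction. Stationary pixels
    yield a tolerance near zero; dynamic pixels (a TV screen, a window)
    yield a larger one."""
    return [hi - lo for lo, hi in zip(minimum, maximum)]

# Three static pixels (one with slight camera noise) and one dynamic pixel.
tolerances = pixel_tolerances([10, 20, 30, 100], [10, 21, 30, 140])
```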
  • the system checks whether there are any more pixels. If there are more pixels, the process returns to block 308 where the minimum and maximum values for the next pixel in the average image are retrieved. If there are no more pixels in the average image, the process continues with block 314.
  • the system captures a frame of an input image.
  • the input image is essentially a background image containing the object being composited.
  • the object could be a human being (e.g., a child) and the background image could be a living room or bedroom.
  • the system begins creating a new image called the alpha image, which contains a representation of the object being composited, by first isolating the object. This is done by subtracting the background from the input image.
  • the background subtraction block is described in greater detail in Figure 18 A.
  • the system performs a procedure for improving the alpha image referred to generally as shadow reduction at 318.
  • This procedure reduces the effect of shadows cast by the object on other background objects in creating the alpha image. It is described in greater detail in Figures 19A and 19B.
  • the system performs another procedure for improving the alpha image called model fitting as shown at 320.
  • in model fitting, the system creates a configuration of templates where each template fits entirely within a particular part of the object. For example, if the object is a person, one template could be for the torso or head.
  • the configuration of templates makes up the model which fits the object.
  • the model fitting allows the system to fill up holes in the object while the object is being created in the alpha image. This process is described in greater detail in Figures 20A to 20D.
  • the block following the creation of the templates is simply that of matching each template to its appropriate object part and setting the alpha pixels within the templates to one. This object fill process is shown at 322 and is described in greater detail in Figure 21.
  • Figure 18 is the alpha image, which now contains fewer holes than the previous alpha image, after the object fill block.
  • the system eliminates as much background clutter and artifacts as possible without affecting the object itself. In order to do this, it assumes that artifacts greater than a predetermined distance from the closest template (created in block 320 above) are clutter or some type of extraneous artifact not part of the object being composited, and ensures that they are not composited onto the destination image.
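The distance-from-template clutter test just described can be sketched as follows. This is a hypothetical illustration, assuming rectangular templates and a sparse alpha representation; none of the names come from the disclosure:

```python
def remove_clutter(alpha, templates, max_dist):
    """Zero any alpha pixel farther than max_dist from every template.
    alpha maps (x, y) -> 0/1; templates are (x0, y0, x1, y1) rectangles."""
    def dist_to_rect(x, y, rect):
        # Euclidean distance from a point to the nearest edge of a rectangle
        # (zero if the point lies inside it).
        x0, y0, x1, y1 = rect
        dx = max(x0 - x, 0, x - x1)
        dy = max(y0 - y, 0, y - y1)
        return (dx * dx + dy * dy) ** 0.5

    cleaned = {}
    for (x, y), value in alpha.items():
        near = any(dist_to_rect(x, y, t) <= max_dist for t in templates)
        cleaned[(x, y)] = value if near else 0
    return cleaned

torso = (0, 0, 10, 10)                              # one body-part template
alpha = {(5, 5): 1, (12, 5): 1, (40, 40): 1}        # last pixel is stray clutter
cleaned = remove_clutter(alpha, [torso], max_dist=5)
```

Pixels inside or just outside the template survive; the far-away artifact is cleared and therefore never reaches the destination image.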
  • the system uses the alpha image to blend the object from the input image onto the destination image.
  • This procedure, known in the art as an alpha blend, uses the value of the pixels in the alpha image to determine which pixels from the input image should be blended onto the destination image. It is described in greater detail in Figure 23.
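With the binary alpha image described in this disclosure, the blend reduces to a per-pixel selection (the general alpha blend interpolates, out = a*src + (1-a)*dst, but here alpha is 0 or 1). A minimal sketch with illustrative names:

```python
def alpha_blend(input_img, destination, alpha):
    """Composite the extracted object onto the destination image: where the
    alpha pixel is 1, take the input (object) pixel; where it is 0, keep
    the destination (background) pixel."""
    return [src if a else dst
            for src, dst, a in zip(input_img, destination, alpha)]

# The middle two pixels belong to the object and replace the destination.
composited = alpha_blend([9, 9, 9, 9], [1, 2, 3, 4], [0, 1, 1, 0])
```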
  • the system checks whether there are any other images to be composited at 328. If there are, the system returns to block 314 where it captures a frame of the next input image that contains the new object to be composited.
  • Figure 14 is a flowchart showing a process for capturing a frame of an average (background) image.
  • the current pixel is from the image that was captured in the present iteration.
  • the system looks at the current value and the previous value for each pixel in the average image frame.
  • the system determines whether the sum of the differences computed in block 400 is greater than a predetermined threshold value. If not, the system proceeds to 408 where the number of iterations is incremented by one. If the number of iterations reaches N, the process of capturing frames of an average image is complete. If the sum of differences is greater than the threshold value, then there has been too much activity in the background image, thereby preventing a background model from being built. This can occur, for example, if a large object passes through the image or if an object in the image is moved.
  • the threshold value is set such that some non-static activity, such as a television screen displaying images or a window showing a road with occasional passing objects, is acceptable and will not prevent a background model from being built. However, significant activity will cause the system to re-initialize itself (setting N to zero) and re-start the process from block 300 of Figure 13A, as shown in block 406.
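The activity check that decides between continuing and re-initializing can be sketched as follows (an illustrative sketch; the threshold value is hypothetical, not taken from the disclosure):

```python
def too_much_activity(current, previous, threshold):
    """Compare successive background frames. If the summed per-pixel change
    exceeds the threshold, the scene changed too much (e.g. a large object
    passed through) and the background-model build must restart."""
    diff = sum(abs(c - p) for c, p in zip(current, previous))
    return diff > threshold

# Small flicker (one pixel off by one) is tolerated; a gross scene change
# trips the check and would send the system back to block 300.
ok = too_much_activity([10, 20, 30], [10, 21, 30], threshold=50)
bad = too_much_activity([200, 200, 200], [10, 20, 30], threshold=50)
```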
  • Figure 15 is a flowchart showing a process for updating the background model.
  • the background model is updated, if necessary, with each new background image frame captured as described in Figure 14. Once the number of frames captured equals N, the updating process is complete and the background model has been created.
  • the average (background) image is comprised of pixels.
  • the system retrieves a pixel from the average image.
  • the system updates the average image color pixel value. Each pixel in the average image has an average color value.
  • the average color value for the pixels is determined in a preferred embodiment according to the RGB color scheme, well-known in the art. Other color schemes such as YUV can also be used in another preferred embodiment.
  • a low pixel color value indicates a dark pixel.
  • a color pixel value of zero would essentially be a black pixel.
  • the brightest pixel will have the maximum color value for a pixel.
  • in building the background model, the system also maintains a minimum image and a maximum image.
  • the minimum color image and the maximum color image are used to provide a tolerance or variance for each pixel in the background model.
  • a pixel that is part of a stationary object, for example a piece of furniture in the living room, will have little variance or none at all. Any variance for such a pixel would most likely result from camera noise.
  • a pixel that is part of a background image that is dynamic, such as a television screen or the view through a window, will have a greater tolerance.
  • Such pixels are not stationary and the brightness of such pixels can vary while the background model is being updated. For these pixels, the system needs to have a variance or tolerance level.
  • the system updates the minimum and maximum values for each pixel if needed.
  • the minimum and maximum values for each pixel provide the tolerance for each pixel. Thus, if the new color pixel value is less than the previous minimum color value for that pixel, the minimum value is updated. Similarly, if the color pixel value is greater than the maximum value for that pixel, the maximum value is updated. Block 504 is described in greater detail in Figure 16.
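The per-pixel update just described is a two-way comparison (a minimal sketch; names are illustrative):

```python
def update_min_max(value, minimum, maximum):
    """Widen a pixel's recorded range whenever a new frame's value falls
    outside the current minimum or maximum for that pixel location."""
    if value > maximum:
        maximum = value
    if value < minimum:
        minimum = value
    return minimum, maximum

# A new value of 150 exceeds the previous maximum of 140, so the maximum
# is raised; the minimum is untouched.
lo, hi = update_min_max(150, minimum=100, maximum=140)
```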
  • the system checks to see whether there are any more pixels in the average image that need to be checked. If there are, the process returns to block 500 where the next pixel from the average image is retrieved. If not, the system returns to the background model update process as shown in block 306 of Figure 13A.
  • Figure 16 is a flowchart showing a process for updating the minimum and maximum values for pixels in the average image.
  • the system determines whether the color pixel value of the pixel just retrieved is greater than the maximum value of the corresponding pixel from previous frames. If the current color pixel value is greater, the system sets the maximum color pixel value to the current color pixel value in block 602. Once this is done, the maximum color value for the pixel in that location is set to a new high value. If the current color pixel value is not greater than the maximum value, the system proceeds to block 604. At 604, the same process as in blocks 600 and 602 takes place except the minimum color pixel value is compared to the color pixel value of the pixel just retrieved.
  • the system sets the new minimum color pixel value to the current color pixel value in block 606. Once the system determines whether the minimum or maximum pixel values need to be updated, the system continues the process of updating the background model.
  • Figure 17A is a replica of a sample background model or average image. It shows a typical karaoke area without the object to be composited.
  • Figure 17B is a replica of a sample input image (discussed below) that consists of the average image including the object being composited, in this example, a figure of a person.
  • Figure 18 A is a flowchart showing a process for subtracting a background to isolate the object being composited.
  • Background subtraction is basically the first process in creating an alpha image of the object that is being composited to a destination image.
  • Each frame of an alpha image is digitally composed such that each pixel is either a 0 or a 1 based on whether that pixel is part of the object. If a pixel has a value of one, that pixel is within the object being composited. Where the value of the alpha pixel is zero, the pixel is not part of the object (i.e., it may be part of the background) and is not composited onto the destination image.
  • the alpha image is used in an alpha blend, a technique well-known in the art, to blend the object in the input image with the destination image.
  • the system retrieves a pixel in the input image frame.
  • the input image contains the background and the object being composited.
  • the system also determines its value and sets it to be the current pixel value.
  • the system determines whether the absolute value of the difference between the current pixel value and the value of its corresponding pixel from the average image is greater than the tolerance of the current pixel plus a constant. As described in block 310 of Figure 13A, each pixel in the average image has a tolerance which is essentially the difference between the maximum and minimum pixel values. If the absolute value of the difference between the current pixel value and the average image pixel value is greater than the tolerance of the current pixel, the system proceeds to block 804 where the system sets the alpha pixel value to one.
  • the alpha image is initially set to all zeros and the value for each pixel in the alpha image is changed to one only if the corresponding pixel in the input image frame is determined to be part of the object that is being composited. Otherwise the value of the alpha pixel is unaffected.
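The background-subtraction test above can be sketched as a single pass over the frame. This is an illustrative sketch; the constant and all pixel values are hypothetical:

```python
def background_subtract(input_frame, average, tolerance, constant=2):
    """Build the initial alpha image: a pixel belongs to the object (alpha=1)
    when it differs from the background average by more than that pixel's
    tolerance plus a small constant; otherwise it stays at zero."""
    return [1 if abs(p - a) > t + constant else 0
            for p, a, t in zip(input_frame, average, tolerance)]

average   = [10, 20, 30, 120]
tolerance = [0, 1, 0, 40]      # last pixel is dynamic (wide tolerance)
frame     = [11, 20, 90, 125]  # third pixel is covered by the object
alpha = background_subtract(frame, average, tolerance)
```

Note that the dynamic pixel's large tolerance absorbs its flicker, so only the pixel genuinely occluded by the object is marked as part of it.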
  • the system provides a means for recognizing gestures, positions, and movements made by one or more subjects (karaoke singers) within a sequence of images and performing an operation based on the semantic meaning of the gesture.
  • a subject such as a human being, enters the viewing field of a camera connected to a computer and performs a gesture.
  • the gesture is then examined by the system one image frame at a time.
  • Positional data is derived from the input frame and compared to previously derived data representing gestures known to the system. The comparisons are done in real time and the system can be trained to better recognize known gestures or to recognize new gestures.
  • a computer-implemented gesture recognition system is described.
  • a background image model is created by examining frames of an average background image before the subject that will perform the gesture enters the image.
  • a frame of the input image containing the subject, such as a human being, is obtained after the background image model has been created.
  • the frame captures the person in the action of performing the gesture at one moment in time.
  • the input frame is used to derive a frame data set that contains particular coordinates of the subject at that given moment.
  • a sequence of frame data sets taken over a period of time is compared to sequences of positional data making up one or more recognizable gestures, i.e., gestures already known to the system. If the gesture performed by the subject is recognizable to the system, an operation based on the semantic meaning of the gesture may be performed by the system.
  • the gesture recognition procedure may include a routine setting its confidence level according to the degree of mismatch between the input gesture data and the patterns of positional data making up the system's recognizable gestures. If the confidence passes a threshold, a match is considered found.
  • the gesture recognition procedure may further include a partial completion query routine that updates a status report which provides information on how many of the requirements of the known gestures have been met by the input gesture. This allows queries of how much or what percentage of a known gesture is completed by probing the status report. This is done by determining how many key points of a recognizable gesture have been met.
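The partial-completion query reduces to counting satisfied key points against the gesture's total. A minimal hypothetical sketch (the key-point counts are invented for illustration):

```python
def gesture_completion(key_points_met, key_points_total):
    """Status report for the partial-completion query: the percentage of a
    known gesture's key points that the input gesture has satisfied so far."""
    return 100.0 * key_points_met / key_points_total

# A gesture with four key points, three of which the performer has hit.
percent = gesture_completion(key_points_met=3, key_points_total=4)
```

An application could probe this status report each frame, for example to drive on-screen feedback as the karaoke performer works through a gesture.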
  • the gesture recognition procedure preferably includes a routine for training the system to recognize new gestures or to recognize certain gestures performed by an individual more efficiently.
  • a probability distribution for each key point indicating the likelihood of producing a particular observable output at that key point is also derived.
  • Once a characteristic data pattern is obtained for the new gesture it can be compared to patterns of previously stored known gestures to produce a confusion matrix.
  • the confusion matrix describes possible similarities between the new gesture and known gestures as well as the likelihood that the system will confuse these similar gestures.
  • an interactive karaoke system of the present invention includes a microphone developing an audio input from at least one karaoke performer, a camera producing a series of video frames including the at least one karaoke performer, and a karaoke processor system including a video environment and a related audio environment for the karaoke performer.
  • the karaoke processor system is coupled to the camera to create extracted images of the at least one karaoke performer from the series of video frames and to composite the extracted images with a background derived from the video environment, where the video environment is affected by at least one of a position and a movement of the at least one karaoke performer as detected, for example, by a gesture recognizer.
  • the karaoke performer image can be recognized for position, movement, and/or semantic content either before or after image extraction from the background.
  • the present invention includes a karaoke network including a local area network, a local karaoke server coupled to the local area network and storing local karaoke content; and a plurality of karaoke systems coupled to the local area network, each of which can request karaoke content from the local karaoke server.
  • the karaoke network also includes a distal content server system coupled to the local karaoke server.

Abstract

An interactive karaoke system includes a microphone (20) developing an audio input from at least one karaoke performer; a camera (18) producing a series of video frames including the at least one karaoke performer; and a karaoke processor system including a video environment and a related audio environment for the karaoke performer. The karaoke processor system is coupled to the camera to create extracted images of the at least one karaoke performer from the series of video frames and to composite the extracted images with a background derived from the video environment. The video environment is affected by at least one of a position and a movement of the at least one karaoke performer. A karaoke network includes a local area network, a local karaoke server coupled to the local area network and storing local karaoke content; and a number of karaoke systems coupled to the local area network, each of which can request karaoke content from the local karaoke server.

Description

Method and Apparatus for Providing Interactive Karaoke Entertainment
Technical Field
This invention relates generally to multimedia entertainment systems, and more particularly to karaoke systems.
Background Art
Karaoke is a form of entertainment, originating in Japan, that features a live singer with pre-recorded accompaniment. Karaoke is a Japanese abbreviated compound word, where "kara" comes from "karappo" meaning empty, and "oke" is the abbreviation of "okesutura," or orchestra. Therefore, karaoke literally means "empty orchestra." While originating in Japan, the karaoke boom has spread abroad, and is popular in Korea, China and other parts of Southeast Asia, as well as in the U.S. and Europe.
Karaoke music was originally recorded on audio tape, but quickly evolved with the advent of the compact disk, which not only allows rapid, non-serial access to new songs, but which also can include multimedia effects such as video and lyrics. Therefore, the advent of the compact disk made it possible to enhance the karaoke experience with video scenes synchronized with the music and the accompanying lyrics.
Using technological innovations such as the video disk, laser disk, and CD graphics, karaoke has grown to be a major entertainment industry. Family-use karaoke sets are also available. However, there is an obstacle to this end of the business: since most Japanese houses stand close to each other and are still built of wood, with poor soundproofing, it would be very annoying to the neighbors to sing into an amplified karaoke system at night.
Reacting to the opportunity created by this problem, entrepreneurs created the "karaoke box", a roadside facility containing closed-door insulated rooms for singing. They are advertised as a place where you can "sing to your heart's content." The first karaoke box appeared in 1984 in a rice field in the countryside of Okayama Prefecture, just west of the
Kansai area. It was built from a converted freight car. Since then, karaoke boxes have been built on unoccupied grounds all over Japan, and in urban areas, karaoke rooms, which consist of compartments made by partitioning and soundproofing rooms in a building, were introduced and set up one after another. Karaoke is a common form of entertainment for Japanese business people. It is not at all uncommon for workers to drop into a bar with colleagues after work, have a drink, and enjoy singing popular songs to the accompaniment of karaoke. Karaoke has been entertaining people ever since its invention 20 years ago, and has become firmly established in Japanese society.
Today, karaoke is available in a wide variety of formats, suitable for any venue, from a soloist rehearsing up to large crowds at community gatherings. However, a typical karaoke show includes one or two singers, and possibly a karaoke operator to operate the karaoke equipment. Couples will often enjoy a karaoke session together. The equipment typically includes a player, an amplifier, and a television monitor for the music video. There may be an additional television monitor facing the singers to display the lyrics, or the lyrics can be displayed on the television monitor that is displaying the music video.
While karaoke is very popular, it may be reaching a saturation point, at least in Japan. This is because there are many thousands of karaoke boxes and bars having karaoke systems and, as such, the novelty is beginning to wear off.
One attempt to increase the interest in karaoke is the use of "blue screen" technology, which allows a video camera to capture the image of one or more persons standing in front of a blue screen and insert the images of those persons into the music video. However, this technology is somewhat cumbersome in that it requires a specialized stage including the blue screen, and in that the karaoke customers are merely superimposed upon a background image of the music video without any interactivity with that background scene.
What would therefore be desirable is a karaoke system which allows new, enhanced, and interactive participation of karaoke customers with their karaoke experience.
Disclosure of the Invention
In one embodiment of the interactive karaoke system of the present invention, a personal computer (PC) is paired with a karaoke audio/video system and a video camera to provide interactivity between the karaoke customers (i.e. the karaoke performers) and the karaoke system. In one aspect of the present invention, images of the karaoke customers are captured with a video camera, processed in the personal computer, and composited into the musical video presentation.
However, unlike prior art "blue screen" technologies, no special blue screen is required, and the user can interact with the karaoke content as portrayed on the TV monitor. For example, the karaoke customer may make gestures which cause the images on the TV monitor to change. A process for providing interactive karaoke entertainment includes the acts of determining if there is a user initiation and, if so, whether the requested content is local. If not, the content is retrieved. Next, a "frame" of video information is received from the video camera, and background subtraction is performed. Then, there is a tracking analysis, with the results being put into a tracking buffer. A gesture analysis is then performed. Next, the image is "composited" based upon the tracking and gesture analysis and the requested content. The resulting multimedia content is then outputted and, preferably, recorded. The next frame is then retrieved from the video camera and the process is repeated.
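The per-frame loop just outlined can be sketched as a pipeline of stages. This is a hypothetical skeleton: the stage callables are stand-ins for the background subtraction, tracking, gesture analysis, and compositing steps described in the text, and the toy stages below are invented for illustration:

```python
def process_frame(frame, background_subtract, track, recognize_gesture, composite):
    """One iteration of the disclosed loop: subtract the background, run the
    tracking analysis, run the gesture analysis on the tracking result,
    then composite the output frame."""
    mask = background_subtract(frame)
    track_result = track(mask)
    gesture = recognize_gesture(track_result)
    return composite(frame, mask, gesture)

# Toy stages: the "object" is any pixel brighter than 50.
result = process_frame(
    [10, 60, 70],
    background_subtract=lambda f: [1 if p > 50 else 0 for p in f],
    track=lambda m: m.index(1),                        # first object pixel
    recognize_gesture=lambda pos: "wave" if pos == 1 else None,
    composite=lambda f, m, g: (sum(m), g),             # (object size, gesture)
)
```

Each real stage would be substantially more involved; the point of the sketch is only the ordering of the acts within one frame of the loop.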
The interactive karaoke entertainment system is designed so that it can form a part of a larger network of karaoke entertainment systems. More particularly, a number of interactive karaoke entertainment systems are adapted to be coupled to a local area network (LAN) which is served by a local PC server. The local PC server can communicate with an Internet-based content server to download content that is not locally available and to upload accounting information.
The process performed by the local PC server includes the acts of determining whether it has been polled by a content server and, if so, transferring accounting information to the content server; other information, software, or content can be uploaded or downloaded with the content server. If there has been no polling, the local PC server then determines whether there is a request from a local PC that is coupled to the local area network. If there is, it is determined whether the content is locally available and, if not, the local PC server communicates with the remote content server to obtain the desired content. The content is then downloaded to the requesting PC over the local area network, and an accounting entry is created at the local PC server reflecting the karaoke customer's use of that content.
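The local server's request-handling flow can be sketched as follows. This is an illustrative sketch only: the dictionaries stand in for the local cache and the remote content server, and all identifiers are hypothetical:

```python
def handle_request(song_id, local_content, remote_server, accounting_log):
    """Local-server flow for one request: serve locally cached karaoke
    content if present, otherwise fetch it from the remote content server
    and cache it, then record an accounting entry for the customer's use."""
    if song_id not in local_content:
        local_content[song_id] = remote_server[song_id]  # remote download
    accounting_log.append(song_id)                       # accounting entry
    return local_content[song_id]

local = {"song-1": b"local data"}
remote = {"song-2": b"remote data"}
log = []
first = handle_request("song-1", local, remote, log)    # served locally
second = handle_request("song-2", local, remote, log)   # fetched remotely
```

After the second request, the fetched content remains in the local cache, so later requests for it never leave the LAN.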
It will therefore be appreciated that the interactive karaoke system of the present invention will add a new dimension of enjoyment to the karaoke experience. The interactive nature allows the karaoke to transcend a simple performance and take on aspects of an interactive game. This increases the enjoyment and therefore the use of the interactive karaoke systems of the present invention.
Brief Description of the Drawings
Fig. 1 is a representation of an interactive karaoke entertainment system in accordance with the present invention;
Fig. 2 is a block diagram of a portion of the system of Fig. 1; Fig. 3 is a pictorial representation of the personal computer (PC) portion of the system of Fig. 1;
Fig. 4 is a flow diagram illustrating the computer-implemented operations performed by the personal computer of Fig. 3;
Fig. 4A is an illustration of the compositing act of Fig. 4;
Fig. 4B is an illustration of the compositing act of Fig. 4;
Fig. 5 is a representation of a networked karaoke entertainment system of the present invention;
Fig. 6 is a flow diagram illustrating computer implemented acts performed by the local PC server of Fig. 5;
Fig. 7 is a pictorial representation illustrating one implementation of the interactive karaoke entertainment system of the present invention;
Figs. 8A and 8B illustrate another, more integrated, implementation of the interactive karaoke entertainment system of the present invention;
Fig. 9 is a more detailed view of the karaoke module used in the DVD and VCD player of Fig. 8B;
Fig. 10 is a block diagram of the vision processor of the karaoke module illustrated in Fig. 9;
Figs. 10A and 10B illustrate a preferred integrated circuit package arrangement for the vision processor of Fig. 10;
Fig. 11 illustrates a typical set-up of an interactive karaoke entertainment system of the present invention;
Fig. 12 illustrates an embodiment of the present invention that utilizes a digital television system;
Figures 13A and 13B are flowcharts showing a preferred embodiment of a method for model-based compositing of the present invention;
Figure 14 is a flowchart showing a process for capturing a frame of an average (background) image; Figure 15 is a flowchart showing a process for updating the background model;
Figure 16 is a flowchart showing a process for updating the minimum and maximum values for pixels in the average image;
Figure 17A is a replica of a sample background model or average image;
Figure 17B is a replica of a sample input image consisting of the background image including the object being composited;
Figure 18 A is a flowchart showing a process for subtracting a background to isolate the object being composited;
Figure 18B shows an initial alpha image of an object being composited after the background subtraction procedure described with respect to Figure 18A is done;
Figures 19A and 19B are flowcharts showing a preferred embodiment of the shadow reduction process;
Figures 20A through 20D are flowcharts showing a process for matching the object to a model of the object made up of object part templates;
Figure 21 is a flowchart showing the process for fitting parts of the object with the templates;
Figure 22A is a flowchart showing a process for eliminating background artifacts and clutter close to the boundary of the object so that such items are not unintentionally composited with the object onto the destination image;
Figure 22B shows an alpha image of the object after the shadow reduction, hole filling, and background clutter procedures have been performed; and
Figure 23 is a flowchart showing a process for blending the object from the input image onto the destination image using the alpha image as a blending coefficient.
Best Modes for Carrying out the Invention
In Fig. 1, an interactive karaoke entertainment system 10 in accordance with the present invention includes karaoke audio and video equipment 12, a personal computer (PC) 14, a TV monitor 16, and a video camera 18. Associated with the karaoke audio and video equipment 12 is an input microphone 20 and a remote control 22. An optional photo-printer 24 can be coupled to the PC 14.
The karaoke audio and video equipment can be provided by any number of vendors. In this embodiment of the entertainment system 10, only the audio portion of the karaoke equipment 12 is used. In other words, as a karaoke customer sings into the microphone 20, the karaoke equipment 12 will amplify and process the sound and play it from speakers (not shown) and/or the TV monitor 16. However, the image for the TV monitor 16, in the present embodiment, is provided by the PC 14 via a video input line 26 to the karaoke equipment 12. Karaoke equipment, such as karaoke equipment 12, typically has an external video input to receive external video information. The combined video and audio is then provided by the karaoke equipment 12 to the TV monitor 16 as illustrated by arrow 28.
In addition, the karaoke equipment 12 typically includes a control and data port (often a serial port) which is coupled to the PC by a bus 30. The output of the video camera 18 is coupled to the PC 14 by a cable 32 and, in alternate embodiments of the invention, may be coupled to the PC by a control cable to allow specialized software and utilities to be loaded into the camera 18 from the PC 14. The photo-printer 24 allows the capture of images that are displayed on the TV monitor that can be printed as photographs, photographic buttons, rubber stamps, etc. There are several vendors for such photo-printers. Preferably, the PC 14 is coupled to a local network server by a local area network (LAN) cable 34.
In Fig. 2, the PC 14 and some peripheral components connected thereto are illustrated in block diagram form. The PC 14 is preferably a standard microcomputer available from a variety of sources, including a microprocessor 36 that is coupled to dynamic random access memory (DRAM) 38 and to read only memory (ROM) 40. The microprocessor 36 is also coupled to one or more I/O buses 42 to which peripherals, such as peripheral 44, are coupled. For example, peripheral 44 can be a CD-ROM drive, a DVD drive, a hard disk drive, or any number of input/output (I/O) interfaces. The voice input from the microphone 20 is coupled to the karaoke audio video equipment 12 via a cable 46 and, optionally, to the I/O bus 42 by an audio input card 48. The image input from the video camera 18 is input to a video input card 50 which, also, is coupled to the I/O bus 42. The LAN 34 is coupled to the I/O bus 42 by a network card 52. A video output card 54 is coupled to the I/O bus and produces NTSC (and possibly stereo) output for the karaoke audio visual system 12 on the line 26. A parallel card 56 is coupled to the I/O bus 42 and produces photo-printer output signals for the photo-printer 24. An audio card 58 produces an audio output for a power amplifier (not shown) that may be hooked up to loudspeakers (also not shown). A control card 60 can be provided for purposes such as lighting control.

In Fig. 3, a preferred physical implementation of the PC 14 is illustrated. In the present embodiment, the PC 14 is of a "tower" design which provides a multiplicity of I/O slots for the various cards of the present invention. More particularly, a memory expansion board 62, the video card 54, the audio card 58, the camera interface card 50, the network interface card 52, the control card 60, and the parallel card 56 are preferably plugged into I/O slots within the PC tower 14. A keyboard 64 and a mouse 66 are coupled to the PC tower 14 in a conventional manner.
Likewise, the PC tower 14 is preferably provided with a CD-ROM drive, a floppy drive, and a pair of hard disks in a conventional fashion. It is preferred to have two hard disks operating in parallel (i.e. "mirroring" each other) for redundancy, since this is the most common area of failure in the PC. By having redundant hard disks drives, the karaoke operator can be virtually assured that the karaoke entertainment system will be continuously operable.
In Fig. 4, the computer implemented process running on the PC 14 is illustrated in flow-diagram form. More particularly, the process 68 begins at 70 and, in a decision operation 72, it is determined whether a user (i.e. a "karaoke customer") is initiating the use of the karaoke entertainment system. This is typically accomplished by using the remote control 22 to activate the selection of a karaoke song. If there is no user initiation, the operation 72 cycles until an initiation is detected. Once an initiation is detected, the process 68 determines whether the requested content is local. By "content" it is meant the requested music video, along with any accompanying multi-media effects and software required for the interactivity with the karaoke entertainment system. If the content is not local, an operation 76 retrieves the content.
Next, in an operation 78, a "frame" of video data is retrieved from the video camera 18. Once the frame has been retrieved and buffered in the memory of the personal computer 14, a background subtraction is performed in an operation 80. Next, a tracking analysis operation 82 is performed and the results are placed in a tracking buffer of the PC 14 in an operation 84. Next, a gesture analysis operation 86 is performed. Subsequently, in an operation 88, the images are composited based upon the tracking and gesture analysis of operations 82 and 86, respectively, and upon the content requested by the karaoke customer. Finally, in an operation 90 the resulting composited multi-media content is output and, preferably, recorded in a suitable recording device such as a video cassette recorder, recordable CD-ROM, recordable DVD disk, etc. It is then determined in an operation 92 if the karaoke customer is done with their particular karaoke session. If so, process control is returned to operation 72 and, if not, process control is returned to operation 78 to retrieve a new frame from the video camera.
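The per-frame loop of operations 78-90 can be sketched as follows; the function arguments (`get_frame`, `subtract_background`, and so on) are illustrative stand-ins, not names used in the patent:

```python
# Hypothetical sketch of the per-frame loop of process 68. Each stage is
# passed in as a callable so the pipeline structure is visible on its own.

def run_session(get_frame, subtract_background, track, analyze_gestures,
                composite, output, num_frames):
    """Run the per-frame pipeline: retrieve (78), subtract (80),
    track (82/84), analyze gestures (86), composite (88), output (90)."""
    results = []
    for _ in range(num_frames):
        frame = get_frame()                      # operation 78
        foreground = subtract_background(frame)  # operation 80
        tracks = track(foreground)               # operations 82/84
        gestures = analyze_gestures(tracks)      # operation 86
        composed = composite(foreground, tracks, gestures)  # operation 88
        output(composed)                         # operation 90
        results.append(composed)
    return results
```

In the patent's flow, the loop repeats until operation 92 detects the end of the session; a fixed frame count stands in for that check here.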
In Fig. 4A, the operation of the process 68 is illustrated. More particularly, a "frame" 94 of video derived from the camera 18 is loaded into the memory 62 of the PC 14. Those skilled in the art of digital video are well acquainted with the concept of frames. The frame 94 includes the "true" background image and the images of two karaoke customers or "players" or "performers" 96 and 98. The frame is retrieved by operation 78 and a background subtraction is performed by operation 80 to remove all but the karaoke customers 96 and 98. It should be noted that this background subtraction is accomplished without the use of the awkward blue screen apparatus of the prior art. With the background subtracted, the operation 82 performs the tracking analysis to provide a tracked image 100. The compositing operation 88 then composites the karaoke customers 96 and 98 into an interactive environment 102.
The aforementioned technologies permit the karaoke customers 96 and 98 to interact with the environment 102. For example, when karaoke customer 96 raises her hands above her head, animated sparks 104 can be caused to fly from her fingertips. As another example, the grasping of the hand of the karaoke customer 96 by the karaoke customer 98 can be used as a gesture which produces the images of hearts 106 in the interactive environment 102. Other gestures or body positions can also interact with various objects 108 in the interactive environment, or change the scene of the interactive environment. Therefore, with the technology of the present invention, karaoke becomes a truly interactive activity, somewhat akin to a game, wherein multi-media, enhanced reality, and virtual reality effects are possible. It should also be noted that this is a true multi-media experience for the karaoke customers. In addition to video and audio outputs, there are the lyrics 110 of the song, animation effects, etc.
In Fig. 4B, some of the activities of the compositing operation 88 are illustrated in a conceptual form. The operation 88 includes a "media merging" engine 112 which has, as inputs, lyrics, audio (e.g. from the microphone), sound effects, graphics, animation, camera images, alpha images, tracking information, and gestures. The output is a video stream which provides the video signals for a television monitor, and an audio stream which provides the audio signals for the television monitor and/or separate loudspeakers.
In Fig. 5, a network configuration for the interactive karaoke entertainment system 10 is illustrated. More particularly, a karaoke entertainment system 10 is shown in the lower left hand corner of the page, while a number of other similar systems 10A, 10B, 10C, etc. are also illustrated. Each of the interactive karaoke entertainment systems 10 is coupled to a local area network (LAN) backbone or hub 114 to communicate with a local PC server 116. Preferably, the local PC server 116 is simply a powerful personal computer system.
Also preferably, the local PC server 116 and the interactive karaoke entertainment systems are in fairly close proximity, e.g. within the same building. For example, each of the interactive karaoke entertainment systems 10 can be located in its own, soundproofed room, while the local PC server can be provided in a server or operator room in the same building. The implementation of local area networks is well known to those skilled in the art. Preferably, the local PC server is coupled to a content server 118 by a telephone line 120. The content server 118 includes karaoke "content", which is defined as music video accompanied by lyrics and any data or software programs required for the interactive use of the "content." The telephone line connecting the local PC server to the content server can be a standard analog telephone line (with the use of appropriate modems at both the local PC server 116 and the content server 118), or can be a digital line such as an ISDN line, T1 line, or the like. The advantage of the digital lines is, of course, a significantly higher data transfer rate, with the disadvantage of higher cost. Other data transmission media are also well known to those skilled in the art.
In the present example, the content server 118 is a "mirror site" that is coupled to a remote content server 122 by, for example, the Internet 124. As is well known to those skilled in the art, a "mirror site" is a site which is updated on a periodic basis to reflect or "mirror" the contents of another or "master" site, such as content server 122. The purpose of the mirror site 118 is to prevent unnecessary communication delays, especially when transferring large amounts of data, over a relatively slow transmission medium such as the Internet 124. For example, one or more content servers can be provided in various cities in Japan while a single content server can be provided in Palo Alto, California. A number of content development systems can then be used to load new content on the content server 122 which, as explained previously, creates a mirror image of itself at the content server mirror site 118 via the Internet 124 on a periodic basis.
In Fig. 6, a computer implemented process 126 running on the local PC server 116 begins at 128 and, in an operation 130, it determines whether it has been polled by the content server mirror site 118. It should be noted here that the mirror sites 118 are not required, as the local PC server could communicate directly with the content server 122 via the Internet 124. However, for purposes of efficiency, it is often more desirable to access a local mirror site 118.
If the local PC server 116 determines that it has been polled, it connects with the appropriate content server and transfers accounting information in an operation 132. This accounting information can include the number of times a particular karaoke video has been played and what the appropriate charge for the karaoke operator should be. In addition, an operation 134 can be used to upload and download other information, content, software, etc. Process control is then returned to operation 130.
If operation 130 does not detect a polling from a content server, an operation 136 determines whether there is a request from a local PC, i.e. one of the interactive karaoke entertainment systems 10. If not, process control is returned to operation 130. If there is a request from a local PC, an operation 138 determines whether the requested content is locally available. If not, the content is retrieved from the content server in an operation 140. It should be noted that the local PC server 116 can be connected to the content server mirror site 118 either on a continuous basis (such as with an ISDN line) or on an "on demand" basis, such as with dial-up modem access. Next, an operation 142 downloads the requested content to the requesting local PC, and an operation 144 creates an accounting entry at the local PC server 116. This accounting entry, along with other data, is what is transferred to the content server in the operation 132.
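The request path of operations 136-144 can be sketched as follows; the class and method names are hypothetical, not taken from the disclosure:

```python
# Illustrative sketch of the request path of process 126.

class LocalContentServer:
    def __init__(self, remote_fetch):
        self.cache = {}               # content already stored locally
        self.ledger = []              # accounting entries (operation 144)
        self.remote_fetch = remote_fetch  # stand-in for the mirror-site link

    def handle_request(self, content_id):
        """Serve a request from a local karaoke PC (operations 138-144)."""
        if content_id not in self.cache:                           # operation 138
            self.cache[content_id] = self.remote_fetch(content_id)  # operation 140
        self.ledger.append(content_id)                             # operation 144
        return self.cache[content_id]                              # operation 142

    def transfer_accounting(self):
        """Hand the accumulated entries to the content server when polled
        (operation 132) and start a fresh ledger."""
        entries, self.ledger = self.ledger, []
        return entries
```

A second request for the same song is served from the local cache, while every play is still recorded for billing.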
In Fig. 7, an alternative interactive karaoke entertainment system 10' includes a DVD and VCD player 146, a karaoke adapter 148 of the present invention, a recorder 150, a binocular camera 18', and a television monitor 16. Preferably, the player 146 and adapter 148 are controlled by a remote control 152. In this embodiment, a microphone 20 is coupled to the player 146, and a number of DVD and/or VCD disks 154 are inserted into the player 146. The output of the player 146 goes into the adapter 148, as does the output of the camera 18'.
In this embodiment of the present invention, the adapter 148 performs the functionality described previously with regards to the PC 14 running the computer implemented process 68 of Fig. 4. However, the advantage of this system is that a separate, dedicated personal computer 14 is not required, since that functionality has been integrated into the adapter 148. The output of the adapter 148 is input into the television monitor and/or loudspeakers (not shown). In addition, a VCR, recordable CD-ROM or recordable DVD recorder 150 can be used to record the output of the adapter 148.
In Figs. 8A and 8B, yet another alternate embodiment of the present invention integrates the functionality of the player 146 with the adapter 148 of Fig. 7. More particularly, a combination DVD/VCD karaoke player 156 is shown in a front elevational view in Fig. 8A and a top plan view with the top lid removed in Fig. 8B.
With primary reference to Fig. 8B, the combined unit 156 includes a VCD and DVD logic module 158, a disk loader 160, a VCD and DVD drive 162, and a karaoke module 164. A power supply 166 is coupled to a source of AC power by a cord and plug 168. The unit 156 has, as inputs, an input 170 from the server, and an input 172 from the camera. The unit 156 has, as outputs, an output 174 to the television monitor 16 and an output 176 to the recorder 150. The advantages of integrating the karaoke module 164 into a DVD and VCD player include both size and cost reductions.
In Fig. 9, the interactive video karaoke module 164 is shown in a conceptual form. It includes, as inputs, an input 174 for receiving video input from the disk player, and an input 180 for receiving input from the camera 18'. In addition, the module 164 includes an output 182 to the television monitor 16 and an optional output 184 to the camera 18'. It is therefore contemplated that the camera 18' being used with the interactive karaoke entertainment system 10' may be a "smart" camera which can receive programs, data, and commands from the karaoke module 164. The karaoke module 164 includes a vision processor 186 and an ASIC 188 to handle data communications between the karaoke module 164 and the rest of the unit 156.
In Fig. 10, a block diagram of the major components of the vision processor 186 is illustrated. More particularly, the vision processor 186 includes a digital video interface 190, a color processing unit 192, a microcontroller 194, a vision algorithm core 196, a compression unit 198, an ASIC 200 to handle various glue logic functions, memory 202, a Universal Serial Bus (USB) module 204, a memory controller 206, a field programmable gate array (FPGA) controller 208, and a PAL/NTSC module 210. JTAG circuitry can be included to provide boundary scan capabilities. The input signals (at the digital video interface 190) are processed by the vision processor 186 under microcontroller 194 control. A first output 212 is provided by the USB module, and a second output, either for European (PAL) or U.S. (NTSC) video formats, is provided at an output 214. External DRAM 216 is coupled to the memory controller 206, and an external FPGA 218 is coupled to the FPGA controller 208. A top plan view of a preferred packaging for the vision processor 186 is shown in Fig. 10A, with a side elevational view taken along line 10B-10B shown in Fig. 10B.
In Fig. 11, an exemplary use of an interactive karaoke entertainment system 10 is illustrated. The camera 18 of the unit is aimed toward a play area 220 where the karaoke customers may sing and otherwise perform. It is preferred that the customers stay within the play area 220 so as to remain within the "field of sight" 222 of the camera 18. A wired or wireless microphone 20 can be used by the karaoke customers as they sing, and a remote control can be used to activate the system and to select the karaoke music video they wish to accompany. As the karaoke customers move about in the play area 220 and make pre-determined gestures and poses, they can interact with the video and other content displayed on the television monitor 16.
While the present invention has been described primarily with reference to standard television (analog) monitors, an embodiment of the present invention utilizes the new digital television standards. More particularly, in Fig. 12 a digital television 224 is used as the display unit of an interactive karaoke entertainment system 10". The real time video interaction and vision technologies 226, as disclosed herein, provide an interaction between the digital television and a number of peripheral sources. More particularly, the real time video interaction vision technologies provide an interaction with a computer 228, a digital camera 230, a DVD player 232, a VCD player 234, a game console 236, a digital broadcast receiver 238, a video telephone 240, a "set top" box 242, a satellite receiver 244, or a camcorder 246. It will be appreciated by those skilled in the art that the functionality of the interactive karaoke entertainment systems 10 as described with reference to the analog television monitor is quite transportable to the digital television system as well.
Figures 13A and 13B are flowcharts showing a preferred embodiment of a method for model-based compositing of the present invention. At 300 the system is initialized by setting the variable N to zero. N is the number of iterations the system will perform to create a background model. A background model ("average image") is created or built by averaging several frames of a background image. An average image is essentially an image of a backdrop that does not contain the object that is being composited. For example, an average image could be a simple image of the karaoke room. The background model is essentially a model of the generally static (i.e., non-moving or unchanging) background in which the object being composited, such as a person, will enter. Before the object enters, however, this background model must be created. At 302 the system captures one frame of the background image. With every image frame captured, the system updates the background model as shown in block 304. The process of updating the background model involves updating the average image and maintaining a minimum and maximum pixel value chart for each pixel in the average image examined. These charts are maintained by comparing the value of each pixel to the minimum and maximum values of that pixel based on previous images. This process is described in greater detail in Figure 16. Blocks 302 and 304 are described in greater detail in Figures 14 and 15, respectively.
At 306 the system determines whether the background model update is complete by checking if the number of iterations (i.e., the number of captured frames of the average image) has reached N. As mentioned above, N is the number of iterations the user wants the system to perform in order to create the background model. For example, the user may want the system to go through 30 or 100 iterations to build the background model, depending on how much time the user wants to spend building the model and how accurate the user wants it to be. If the number of iterations has not reached N, the system returns to block 302 where another frame of the average image is retrieved. If the number of iterations has reached N and no more frames are needed to build the background model, the system proceeds to block 308.
At 308 the system retrieves the minimum and maximum values for each pixel in the average image from the minimum and maximum pixel value charts discussed above. At 310 the system computes the tolerance for each pixel value to be the difference between the maximum pixel value and the minimum pixel value for that pixel. For many of the pixels that are stationary, this tolerance will be close to zero. The tolerance for non-static pixels will likely be greater than zero. At 312 the system checks whether there are any more pixels. If there are more pixels, the process returns to block 308 where the minimum and maximum values for the next pixel in the average image are retrieved. If there are no more pixels in the average image, the process continues with block 314.
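The tolerance computation of blocks 308-312 can be sketched as follows, assuming images are stored as 2-D lists of scalar pixel values:

```python
# Minimal sketch of blocks 308-312: the per-pixel tolerance is the spread
# between the maximum and minimum values observed while the model was built.

def pixel_tolerances(min_image, max_image):
    """Tolerance per pixel = maximum observed value - minimum observed value."""
    return [[mx - mn for mn, mx in zip(row_min, row_max)]
            for row_min, row_max in zip(min_image, max_image)]
```

A stationary pixel, whose minimum and maximum coincide, gets a tolerance of zero; a dynamic pixel (e.g. part of a television screen) gets a wider band.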
At 314 the system captures a frame of an input image. The input image is essentially a background image containing the object being composited. For example, the object could be a human being (e.g., a child) and the background image could be a living room or bedroom. At 316 the system begins creating a new image, called the alpha image, which contains a representation of the object being composited, by first isolating the object. This is done by subtracting the background from the input image. The background subtraction block is described in greater detail in Figure 18A.
In a preferred embodiment, the system performs a procedure for improving the alpha image referred to generally as shadow reduction at 318. This procedure reduces the effect of shadows cast by the object on other background objects in creating the alpha image. It is described in greater detail in Figures 19A and 19B.
In another preferred embodiment, the system performs another procedure for improving the alpha image called model fitting, as shown at 320. In this block the system creates a configuration of templates where each template fits entirely within a particular part of the object. For example, if the object is a person, one template could be for the torso or head. The configuration of templates makes up the model which fits the object. The model fitting allows the system to fill up holes in the object while the object is being created in the alpha image. This process is described in greater detail in Figures 20A to 20D. The block following the creation of the templates is simply that of matching each template to its appropriate object part and setting the alpha pixels within the templates to one. This object fill process is shown at 322 and is described in greater detail in Figure 21. Figure 22 shows the alpha image, which now contains fewer holes than the previous alpha image, after the object fill block.
At 324 the system eliminates as much background clutter and artifacts as possible without affecting the object itself. In order to do this, it assumes that artifacts greater than a predetermined distance from the closest template (created in block 320 above) are clutter or some type of extraneous artifact not part of the object being composited, and ensures that they are not composited onto the destination image.
At 326 the system uses the alpha image to blend the object from the input image onto the destination image. This procedure, known in the art as an alpha blend, uses the value of the pixels in the alpha image to determine which pixels from the input image should be blended onto the destination image. It is described in greater detail in Figure 23. Once the alpha blend is complete, the system checks whether there are any other images to be composited at 328. If there are, the system returns to block 314 where it captures a frame of the next input image that contains the new object to be composited.
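With a binary (0/1) alpha image, the alpha blend of block 326 reduces to copying object pixels over the destination; a minimal sketch, assuming 2-D lists of scalar pixel values:

```python
# Binary alpha blend: where the alpha image is 1, take the input-image pixel;
# elsewhere keep the destination pixel unchanged.

def alpha_blend(alpha, src, dest):
    """Composite the object (src pixels flagged by alpha) onto dest."""
    return [[s if a else d for a, s, d in zip(ra, rs, rd)]
            for ra, rs, rd in zip(alpha, src, dest)]
```

A fractional alpha would instead weight the two pixels (a general alpha blend); the binary case shown matches the 0/1 alpha image described in the text.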
Figure 14 is a flowchart showing a process for capturing a frame of an average (background) image. At 400 the system computes the sum of all differences between the current pixel value of an image just captured and the previous pixel value from an image captured immediately before the current image. In a preferred embodiment, this is done through the formula: Sum = Σi Σj |P(i,j) - P0(i,j)|, where i and j are the coordinates for each pixel. The current pixel is from the image that was captured in the present iteration. Thus, at 400 the system looks at the current value and the previous value for each pixel in the average image frame. At 402 the system prepares for the next iteration by setting the value of the previous pixel value to the current pixel value (P0 = P).
At 404 the system determines whether the sum of the differences computed in block 400 is greater than a predetermined threshold value. If not, the system proceeds to 408 where the number of iterations is incremented by one. If the number of iterations reaches N, the process of capturing frames of an average image is complete. If the sum of differences is greater than the threshold value, then there has been too much activity in the background image, thereby preventing a background model from being built. This can occur, for example, if a large object passes through the image or if an object in the image is moved. The threshold value is set such that some non-static activity, such as a television screen displaying images or a window showing a road with occasional passing objects, is acceptable and will not prevent a background model from being built. However, significant activity will cause the system to re-initialize itself (setting N to zero) and re-start the process from block 300 of Figure 13A, as shown in block 406.
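The activity check of blocks 400-406 can be sketched as follows; frames are assumed to be 2-D lists of scalar pixel values, and the threshold value is application-dependent:

```python
# Sketch of blocks 400-406: sum of absolute per-pixel differences between
# consecutive frames, compared against an activity threshold.

def frame_activity(current, previous):
    """Sum = sum over i,j of |P(i,j) - P0(i,j)| (block 400)."""
    return sum(abs(p - p0)
               for row, row0 in zip(current, previous)
               for p, p0 in zip(row, row0))

def frame_acceptable(current, previous, threshold):
    """True if the frame may be folded into the background model (block 404);
    False means the model build should re-initialize (block 406)."""
    return frame_activity(current, previous) <= threshold
```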
Figure 15 is a flowchart showing a process for updating the background model. The background model is updated, if necessary, with each new background image frame captured as described in Figure 14. Once the number of frames captured equals N, the updating process is complete and the background model has been created. Like all images discussed herein, the average (background) image is comprised of pixels. At 500 the system retrieves a pixel from the average image. At 502 the system updates the average image color pixel value. Each pixel in the average image has an average color value. The average color value for the pixels is determined in a preferred embodiment according to the RGB color scheme, well-known in the art. Other color schemes such as YUV can also be used in another preferred embodiment. A low pixel color value indicates a dark pixel. Thus, a color pixel value of zero would essentially be a black pixel. Similarly, the brightest pixel will have the maximum color value for a pixel. By way of example, the pixel from the average image corresponding to the pixel retrieved in block 500 can have a color pixel value of 0.4 and the pixel in the current frame can have a color pixel value of 0.3. If an averaging coefficient of 0.5 is used, the system would update the average color pixel value for that particular pixel from 0.4 to 0.35, i.e., (0.5)(0.4) + (0.5)(0.3) = 0.35.
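The averaging step from the worked example above can be sketched as a single update rule (the 0.5 averaging coefficient matches the one used in the text):

```python
# Running average of block 502: each new frame nudges the stored average
# toward the current pixel value, weighted by the averaging coefficient.

def update_average(average, current, coeff=0.5):
    """New average = coeff * old average + (1 - coeff) * current value."""
    return coeff * average + (1 - coeff) * current
```

With `average = 0.4`, `current = 0.3`, and `coeff = 0.5`, the result is 0.35, as in the example.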
In building the background model, the system also maintains a minimum image and a maximum image. The minimum color image and the maximum color image are used to provide a tolerance or variance for each pixel in the background model. A pixel that is part of a stationary object, for example a piece of furniture in the living room, will have little variance or none at all. Any variance for such a pixel would most likely result from camera noise. On the other hand, a pixel that is part of a background image that is dynamic, such as a television screen or the view through a window, will have a greater tolerance. Such pixels are not stationary and the brightness of such pixels can vary while the background model is being updated. For these pixels, the system needs to have a variance or tolerance level. At 504 the system updates the minimum and maximum values for each pixel if needed. The minimum and maximum values for each pixel provide the tolerance for that pixel. Thus, if the new color pixel value is less than the previous minimum color value for that pixel, the minimum value is updated. Similarly, if the color pixel value is greater than the maximum value for that pixel, the maximum value is updated. Block 504 is described in greater detail in Figure 16. At 506 the system checks to see whether there are any more pixels in the average image that need to be checked. If there are, the process returns to block 500 where the next pixel from the average image is retrieved. If not, the system returns to the background model update process as shown in block 306 of Figure 13A.
Figure 16 is a flowchart showing a process for updating the minimum and maximum values for pixels in the average image. At 600 the system determines whether the color pixel value of the pixel just retrieved is greater than the maximum value of the corresponding pixel from previous frames. If the current color pixel value is greater, the system sets the maximum color pixel value to the current color pixel value in block 602. Once this is done, the maximum color value for the pixel in that location is set to a new high value. If the current color pixel value is not greater than the maximum value, the system proceeds to block 604. At 604, the same process as in blocks 600 and 602 takes place except the minimum color pixel value is compared to the color pixel value of the pixel just retrieved. If the current color pixel value is less than the minimum value, the system sets the new minimum color pixel value to the current color pixel value in block 606. Once the system determines whether the minimum or maximum pixel values need to be updated, the system continues the process of updating the background model.
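The min/max update of blocks 600-606 amounts to widening a pixel's [min, max] range when a new value falls outside it; a minimal sketch:

```python
# Sketch of blocks 600-606: widen the stored per-pixel range if the current
# value exceeds the maximum or falls below the minimum; otherwise leave it.

def update_min_max(current_min, current_max, value):
    """Return the (possibly widened) [min, max] pair for one pixel."""
    return min(current_min, value), max(current_max, value)
```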
Figure 17A is a replica of a sample background model or average image. It shows a typical karaoke area without the object to be composited. Figure 17B is a replica of a sample input image (discussed below) that consists of the average image including the object being composited, in this example, a figure of a person.
Figure 18A is a flowchart showing a process for subtracting a background to isolate the object being composited. Background subtraction is basically the first process in creating an alpha image of the object that is being composited onto a destination image. Each frame of an alpha image is digitally composed such that each pixel is either a 0 or a 1 based on whether that pixel is part of the object. If a pixel has a value of one, that pixel is within the object being composited. Where the value of the alpha pixel is zero, the pixel is not part of the object (i.e., it may be part of the background) and is not composited onto the destination image. As will be described in greater detail below, the alpha image is used in an alpha blend, a technique well-known in the art, to blend the object in the input image with the destination image.
At 800 of Figure 18A the system retrieves a pixel in the input image frame. As mentioned above, the input image contains the background and the object being composited. The system also determines its value and sets it to be the current pixel value. At 802 the system determines whether the absolute value of the difference between the current pixel value and the value of its corresponding pixel from the average image is greater than the tolerance of the current pixel plus a constant. As described in block 310 of Figure 13A, each pixel in the average image has a tolerance which is essentially the difference between the maximum and minimum pixel values. If the absolute value of the difference between the current pixel value and the average image pixel value is greater than the tolerance of the current pixel, the system proceeds to block 804 where the system sets the alpha pixel value to one. This indicates that the pixel retrieved from the input image is part of the object because that pixel's color value has changed by more than a "tolerable" amount. A color value change this significant means that there is a new pixel in that position, and that new pixel could be part of the object since the object is the main change in the background model. If the absolute value of the difference is not greater than the tolerance of the current pixel value, the system proceeds to block 806 and simply checks whether there are any more pixels in the input image frame. If there are pixels remaining in the frame the system returns to block 800 and repeats the process. Otherwise, the background subtraction process is complete. It should be noted that in a preferred embodiment, the alpha image is initially set to all zeros and the value for each pixel in the alpha image is changed to one only if the corresponding pixel in the input image frame is determined to be part of the object that is being composited. Otherwise the value of the alpha pixel is unaffected.
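The per-pixel test of blocks 800-804 can be sketched as follows, assuming 2-D lists of scalar pixel values; the additive constant guards against camera noise:

```python
# Sketch of the background subtraction of Figure 18A: a pixel is flagged as
# part of the object (alpha = 1) when its value deviates from the background
# model by more than that pixel's tolerance plus a constant.

def background_subtract(input_image, average_image, tolerances, constant=0):
    """Build the binary alpha image (blocks 800-806)."""
    return [[1 if abs(p - a) > t + constant else 0
             for p, a, t in zip(rp, ra, rt)]
            for rp, ra, rt in zip(input_image, average_image, tolerances)]
```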
The system provides a means for recognizing gestures, positions, and movements made by one or more subjects (karaoke singers) within a sequence of images and performing an operation based on the semantic meaning of the gesture. In a preferred embodiment, a subject, such as a human being, enters the viewing field of a camera connected to a computer and performs a gesture. The gesture is then examined by the system one image frame at a time. Positional data is derived from the input frame and compared to previously derived data representing gestures known to the system. The comparisons are done in real time and the system can be trained to better recognize known gestures or to recognize new gestures.
In a preferred embodiment, a computer-implemented gesture recognition system is described. A background image model is created by examining frames of an average background image before the subject that will perform the gesture enters the image. A frame of the input image containing the subject, such as a human being, is obtained after the background image model has been created. The frame captures the person in the action of performing the gesture at one moment in time. The input frame is used to derive a frame data set that contains particular coordinates of the subject at that given moment. This sequence of frame data sets, taken over a period of time, is compared to sequences of positional data making up one or more recognizable gestures, i.e., gestures already known to the system. If the gesture performed by the subject is recognizable to the system, an operation based on the semantic meaning of the gesture may be performed by the system.
The gesture recognition procedure may include a routine setting its confidence level according to the degree of mismatch between the input gesture data and the patterns of positional data making up the system's recognizable gestures. If the confidence passes a threshold, a match is considered found.
The gesture recognition procedure may further include a partial completion query routine that updates a status report which provides information on how many of the requirements of the known gestures have been met by the input gesture. This allows queries of how much or what percentage of a known gesture is completed by probing the status report. This is done by determining how many key points of a recognizable gesture have been met.
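Such a partial-completion query might be sketched as follows. This is an illustration only: the key-point representation, coordinate tolerance, and function name are assumptions, not the patent's data format.

```python
def gesture_completion(frame_positions, key_points, tolerance=10.0):
    """Report what fraction of a known gesture's key points have been
    met, in order, by the observed sequence of frame data sets.

    frame_positions: observed (x, y) coordinates, one per input frame.
    key_points: ordered (x, y) coordinates defining the known gesture.
    """
    met = 0
    for x, y in frame_positions:
        if met >= len(key_points):
            break  # every key point already satisfied
        kx, ky = key_points[met]
        # A key point counts as met when the subject passes
        # close enough to it (within the coordinate tolerance).
        if abs(x - kx) <= tolerance and abs(y - ky) <= tolerance:
            met += 1
    return met / len(key_points)
```

Probing this status report at any time answers "how much of the gesture is complete," exactly the query the routine above supports.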
The gesture recognition procedure preferably includes a routine for training the system to recognize new gestures or to recognize certain gestures performed by an individual more efficiently. Several samples of the subject, i.e., individual, performing the new gesture are used by the system to extract the number of key points, the dimensions, and other relevant characteristics of the gesture. A probability distribution for each key point indicating the likelihood of producing a particular observable output at that key point is also derived. Once a characteristic data pattern is obtained for the new gesture, it can be compared to patterns of previously stored known gestures to produce a confusion matrix. The confusion matrix describes possible similarities between the new gesture and known gestures as well as the likelihood that the system will confuse these similar gestures.
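A confusion matrix of this kind can be illustrated with a crude pairwise-similarity measure over stored key-point patterns. The similarity function and names below are invented for the sketch; the patent does not specify how similarity is computed.

```python
def pattern_similarity(a, b):
    """Crude similarity between two equal-length key-point patterns:
    1.0 for identical patterns, falling toward 0 as they diverge."""
    if len(a) != len(b):
        return 0.0
    # Sum of per-key-point Manhattan distances between the patterns.
    dist = sum(abs(x1 - x2) + abs(y1 - y2)
               for (x1, y1), (x2, y2) in zip(a, b))
    return 1.0 / (1.0 + dist)

def confusion_matrix(gestures):
    """Matrix of pairwise similarities between stored gesture patterns.
    High off-diagonal entries flag pairs of gestures the recognizer
    is likely to confuse."""
    names = sorted(gestures)
    return {a: {b: pattern_similarity(gestures[a], gestures[b])
                for b in names}
            for a in names}
```

In practice, a high off-diagonal entry would prompt either retraining one of the gestures or requiring a higher confidence threshold to distinguish the pair.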
It will therefore be appreciated that an interactive karaoke system of the present invention includes a microphone developing an audio input from at least one karaoke performer, a camera producing a series of video frames including the at least one karaoke performer, and a karaoke processor system including a video environment and a related audio environment for the karaoke performer. The karaoke processor system is coupled to the camera to create extracted images of the at least one karaoke performer from the series of video frames and to composite the extracted images with a background derived from the video environment, where the video environment is affected by at least one of a position and a movement of the at least one karaoke performer as detected, for example, by a gesture recognizer. The karaoke performer image can be recognized for position, movement, and/or semantic content either before or after image extraction from the background.
It will also be appreciated that the present invention includes a karaoke network including a local area network, a local karaoke server coupled to the local area network and storing local karaoke content, and a plurality of karaoke systems coupled to the local area network, each of which can request karaoke content from the local karaoke server. Preferably, the karaoke network also includes a distal content server system coupled to the local karaoke server.
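The local-server request flow summarized above (and detailed in the networked-karaoke method claims below) might be sketched as follows. All names and signatures here are illustrative assumptions; the patent does not specify an API.

```python
def handle_request(song_id, local_content, remote_fetch, accounting_log):
    """Serve a karaoke content request from the local server's store,
    falling back to the distal content server when the content is not
    held locally, and record an accounting entry for the request.

    local_content: dict mapping song id -> content held locally.
    remote_fetch: callable obtaining content from the remote server.
    accounting_log: list of song ids served, for later upload.
    """
    if song_id not in local_content:
        # Content not available locally: obtain it from the remote
        # (distal) content server and cache it for future requests.
        local_content[song_id] = remote_fetch(song_id)
    # Create an accounting entry concerning this provision of content.
    accounting_log.append(song_id)
    return local_content[song_id]
```

On an accounting polling event, the entries accumulated in `accounting_log` would be uploaded to the remote server, matching the accounting steps described in the method claims.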

Claims

1. An interactive karaoke system comprising:
a microphone developing an audio input from at least one karaoke performer;
a camera producing a series of video frames including said at least one karaoke performer; and
a karaoke processor system including a video environment and a related audio environment for said karaoke performer, said karaoke processor system being coupled to said camera to create extracted images of said at least one karaoke performer from said series of video frames and to composite said extracted images with a background derived from said video environment, where said video environment is affected by at least one of a position and a movement of said at least one karaoke performer.
2. An interactive karaoke system as recited in claim 1 wherein said related audio environment is affected by at least one of a position and a movement of said at least one karaoke performer.
3. An interactive karaoke system as recited in claim 1 wherein there are multiple karaoke performers, and wherein said video environment is affected by at least one of the positions and movements of said multiple karaoke performers.
4. An interactive karaoke system as recited in claim 3 wherein said related audio environment is affected by at least one of the positions and movements of said multiple karaoke performers.
5. An interactive karaoke system as recited in claim 1 wherein said karaoke processor system includes a karaoke unit having a microphone input, a control and data input, a video input, an audio output, and a video output, and wherein said karaoke processor system further includes a digital computer system having a camera input coupled to said camera, a control and data output coupled to said control and data input of said karaoke unit, and a video output coupled to said video input of said karaoke unit.
6. An interactive karaoke system as recited in claim 5 wherein said digital computer system is coupled to a network.
7. An interactive karaoke system as recited in claim 6 further comprising a karaoke server coupled to said network for two-way communication with said digital computer system.
8. An interactive karaoke system as recited in claim 5 further comprising a video display unit coupled to said video output of said karaoke processor system.
9. A method for providing interactive karaoke entertainment comprising:
receiving a plurality of video frames which include images of at least one karaoke performer;
subtracting background images from said video frames to create extracted images of said at least one karaoke performer;
performing an analysis of at least one of the position and motion of said extracted images to provide a visual performer input; and
providing background images with accompanying sound, at least one of which is affected by said visual performer input.
10. A method for providing interactive karaoke entertainment as recited in claim 9 wherein said background images and accompanying sound are based upon stored content.
11. A method for providing interactive karaoke entertainment as recited in claim 10 further comprising retrieving said stored content from a server.
12. A method for providing interactive karaoke entertainment as recited in claim 9 wherein said analysis includes a tracking analysis and a gesture analysis.
13. A method for providing interactive karaoke entertainment as recited in claim 12 further comprising compositing said extracted images with said background images to create composited images.
14. A method for providing interactive karaoke entertainment as recited in claim 13 further comprising recording said composited images and accompanying sound.
15. A karaoke network comprising:
a local area network;
a local karaoke server coupled to said local area network and storing local karaoke content; and
a plurality of karaoke systems coupled to said local area network, each of which can request karaoke content from said local karaoke server.
16. A karaoke network as recited in claim 15 further comprising a distal content server system coupled to said local karaoke server.
17. A karaoke network as recited in claim 16 wherein said distal content server system includes a connection over a world-wide network system.
18. A karaoke network as recited in claim 16 wherein said distal content server system is coupled to said local karaoke server, at least in part, by a local telephone exchange.
19. A karaoke network as recited in claim 16 wherein said distal content server system includes a mirror site content server coupled to a master site content server.
20. A karaoke network as recited in claim 19 wherein said mirror site content server is coupled to said master site content server by a TCP/IP network.
21. A karaoke network as recited in claim 20 wherein said mirror site content server is coupled to said local karaoke server, at least in part, by a local telephone exchange.
22. A karaoke network as recited in claim 15 wherein said local karaoke server further stores accounting information concerning requests for karaoke content from said plurality of karaoke systems.
23. A karaoke network as recited in claim 15 wherein at least one of said plurality of karaoke systems includes:
a microphone developing an audio input from at least one karaoke performer;
a camera producing a series of video frames including said at least one karaoke performer; and
a karaoke processor system including a video environment and a related audio environment for said karaoke performer, said karaoke processor system being coupled to said camera to create extracted images of said at least one karaoke performer from said series of video frames and to composite said extracted images with a background derived from said video environment, where said video environment is affected by at least one of a position and a movement of said at least one karaoke performer.
24. A method for providing networked karaoke entertainment comprising:
determining whether an accounting polling event has occurred and, if so, uploading accounting information to a remote server;
providing a requested karaoke content to a local karaoke unit; and
creating an accounting entry concerning the provision of said karaoke content to said local karaoke unit.
25. A method for providing networked karaoke entertainment as recited in claim 24 further comprising determining whether said requested karaoke content is available locally and, if not, obtaining said requested karaoke content from said remote server.
26. A method for providing networked karaoke entertainment as recited in claim 24 wherein at least one local karaoke unit:
receives a plurality of video frames which include images of at least one karaoke performer;
subtracts background images from said video frames to create extracted images of said at least one karaoke performer;
performs an analysis of at least one of the position and motion of said extracted images to provide a visual performer input; and
provides background images with accompanying sound, at least one of which is affected by said visual performer input.
PCT/US1999/000407 1998-01-07 1999-01-07 Method and apparatus for providing interactive karaoke entertainment WO1999035631A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU24538/99A AU2453899A (en) 1998-01-07 1999-01-07 Method and apparatus for providing interactive karaoke entertainment

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US7062698P 1998-01-07 1998-01-07
US60/070,626 1998-01-07

Publications (3)

Publication Number Publication Date
WO1999035631A1 true WO1999035631A1 (en) 1999-07-15
WO1999035631A8 WO1999035631A8 (en) 1999-09-30
WO1999035631A9 WO1999035631A9 (en) 1999-11-04

Family

ID=22096440

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1999/000407 WO1999035631A1 (en) 1998-01-07 1999-01-07 Method and apparatus for providing interactive karaoke entertainment

Country Status (3)

Country Link
US (2) US6514083B1 (en)
AU (1) AU2453899A (en)
WO (1) WO1999035631A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6520776B1 (en) * 1998-11-11 2003-02-18 U's Bmb Entertainment Corp. Portable karaoke microphone device and karaoke apparatus

Families Citing this family (96)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8352400B2 (en) 1991-12-23 2013-01-08 Hoffberg Steven M Adaptive pattern recognition based controller apparatus and method and human-factored interface therefore
EP0982695B1 (en) * 1998-08-21 2004-08-18 NSM Music Group Limited Network for multimedia devices
US7904187B2 (en) 1999-02-01 2011-03-08 Hoffberg Steven M Internet appliance system and method
TW495735B (en) * 1999-07-28 2002-07-21 Yamaha Corp Audio controller and the portable terminal and system using the same
JP2001070652A (en) * 1999-09-07 2001-03-21 Konami Co Ltd Game machine
US9818386B2 (en) * 1999-10-19 2017-11-14 Medialab Solutions Corp. Interactive digital music recorder and player
US20020072047A1 (en) * 1999-12-13 2002-06-13 Michelson Daniel R. System and method for generating composite video images for karaoke applications
JP2001325195A (en) * 2000-03-06 2001-11-22 Sony Computer Entertainment Inc Communication system, entertainment device, recording medium and program
KR20020026374A (en) * 2000-06-20 2002-04-09 요트.게.아. 롤페즈 Karaoke system
US7068596B1 (en) * 2000-07-07 2006-06-27 Nevco Technology, Inc. Interactive data transmission system having staged servers
KR20030060917A (en) * 2000-10-20 2003-07-16 웨벡스프레스 인코포레이티드 System and method of providing relevant interactive content to a broadcast display
JP2002369126A (en) * 2001-06-11 2002-12-20 Hitachi Ltd Linked service method for viewing attraction and content, receiver used therefor, and system of attraction
US20030025726A1 (en) * 2001-07-17 2003-02-06 Eiji Yamamoto Original video creating system and recording medium thereof
TWI244838B (en) * 2002-01-07 2005-12-01 Compal Electronics Inc Method of karaoke by network system
US7312816B2 (en) 2002-07-24 2007-12-25 Freestone Systems, Inc. Digital observation system
US20040017333A1 (en) * 2002-07-24 2004-01-29 Cooper Alan Neal Universal serial bus display unit
US20050076376A1 (en) * 2002-07-24 2005-04-07 Raymond Lind Video entertainment satellite network system
US7053915B1 (en) * 2002-07-30 2006-05-30 Advanced Interfaces, Inc Method and system for enhancing virtual stage experience
US20030159567A1 (en) * 2002-10-18 2003-08-28 Morton Subotnick Interactive music playback system utilizing gestures
US20050153265A1 (en) * 2002-12-31 2005-07-14 Kavana Jordan S. Entertainment device
CA2415533A1 (en) * 2002-12-31 2004-06-30 Jordan Kavana Entertainment device
AU2004281154A1 (en) * 2003-10-16 2005-04-28 Novartis Vaccines And Diagnostics, Inc. 2,6-disubstituted quinazolines, quinoxalines, quinolines and isoquinolines as inhibitors of Raf kinase for treatment of cancer
US7707039B2 (en) 2004-02-15 2010-04-27 Exbiblio B.V. Automatic modification of web pages
US8442331B2 (en) 2004-02-15 2013-05-14 Google Inc. Capturing text from rendered documents using supplemental information
US10635723B2 (en) 2004-02-15 2020-04-28 Google Llc Search engines and systems with handheld document data capture devices
US7812860B2 (en) 2004-04-01 2010-10-12 Exbiblio B.V. Handheld device for capturing text from both a document printed on paper and a document displayed on a dynamic display device
US20060041605A1 (en) * 2004-04-01 2006-02-23 King Martin T Determining actions involving captured information and electronic content associated with rendered documents
US20060122983A1 (en) * 2004-12-03 2006-06-08 King Martin T Locating electronic instances of documents based on rendered instances, document fragment digest generation, and digest based document fragment determination
US9008447B2 (en) 2004-04-01 2015-04-14 Google Inc. Method and system for character recognition
US20080313172A1 (en) * 2004-12-03 2008-12-18 King Martin T Determining actions involving captured information and electronic content associated with rendered documents
US7990556B2 (en) 2004-12-03 2011-08-02 Google Inc. Association of a portable scanner with input/output and storage devices
US9143638B2 (en) 2004-04-01 2015-09-22 Google Inc. Data capture from rendered documents using handheld device
US7500176B2 (en) * 2004-04-01 2009-03-03 Pinnacle Systems, Inc. Method and apparatus for automatically creating a movie
US20070300142A1 (en) * 2005-04-01 2007-12-27 King Martin T Contextual dynamic advertising based upon captured rendered text
US9116890B2 (en) 2004-04-01 2015-08-25 Google Inc. Triggering actions in response to optically or acoustically capturing keywords from a rendered document
US20060098900A1 (en) 2004-09-27 2006-05-11 King Martin T Secure data gathering from rendered documents
US8146156B2 (en) 2004-04-01 2012-03-27 Google Inc. Archive of text captures from rendered documents
US20060081714A1 (en) 2004-08-23 2006-04-20 King Martin T Portable scanning device
US8081849B2 (en) 2004-12-03 2011-12-20 Google Inc. Portable scanning and memory device
US7894670B2 (en) 2004-04-01 2011-02-22 Exbiblio B.V. Triggering actions in response to optically or acoustically capturing keywords from a rendered document
US8713418B2 (en) 2004-04-12 2014-04-29 Google Inc. Adding value to a rendered document
US8874504B2 (en) 2004-12-03 2014-10-28 Google Inc. Processing techniques for visual capture data from a rendered document
US8620083B2 (en) 2004-12-03 2013-12-31 Google Inc. Method and system for character recognition
US8489624B2 (en) 2004-05-17 2013-07-16 Google, Inc. Processing techniques for text capture from a rendered document
US8346620B2 (en) 2004-07-19 2013-01-01 Google Inc. Automatic modification of web pages
US20060058101A1 (en) * 2004-09-16 2006-03-16 Harmonix Music Systems, Inc. Creating and selling a music-based video game
FR2884029B1 (en) * 2005-04-05 2007-05-25 Cyrille David INTERACTIVE LUTRIN EQUIPMENT FOR MUSICAL INSTRUMENT
CN1845591A (en) * 2005-04-06 2006-10-11 上海渐华科技发展有限公司 Kara-Ok receiver
US20070073837A1 (en) * 2005-05-24 2007-03-29 Johnson-Mccormick David B Online multimedia file distribution system and method
US20060292537A1 (en) * 2005-06-27 2006-12-28 Arcturus Media, Inc. System and method for conducting multimedia karaoke sessions
US20070122786A1 (en) * 2005-11-29 2007-05-31 Broadcom Corporation Video karaoke system
US7643422B1 (en) * 2006-03-24 2010-01-05 Hewlett-Packard Development Company, L.P. Dynamic trans-framing and trans-rating for interactive playback control
US7459624B2 (en) 2006-03-29 2008-12-02 Harmonix Music Systems, Inc. Game controller simulating a musical instrument
US20070287141A1 (en) * 2006-05-11 2007-12-13 Duane Milner Internet based client server to provide multi-user interactive online Karaoke singing
US20070299694A1 (en) * 2006-06-26 2007-12-27 Merck David E Patient education management database system
EP2584530A2 (en) * 2006-08-03 2013-04-24 Alterface S.A. Method and device for identifying and extracting images of multiple users, and for recognizing user gestures
US8160489B2 (en) 2006-09-01 2012-04-17 Jack Strauser Karaoke device with integrated mixing, echo and volume control
EP2067119A2 (en) 2006-09-08 2009-06-10 Exbiblio B.V. Optical scanners, such as hand-held optical scanners
JP2010512673A (en) * 2006-09-22 2010-04-22 ローレンス ジー リックマン Live broadcast interviews conducted between the studio booth and remote interviewers
EP2173444A2 (en) * 2007-06-14 2010-04-14 Harmonix Music Systems, Inc. Systems and methods for simulating a rock band experience
US8678896B2 (en) * 2007-06-14 2014-03-25 Harmonix Music Systems, Inc. Systems and methods for asynchronous band interaction in a rhythm action game
KR20070099501A (en) 2007-09-18 2007-10-09 테크온팜 주식회사 System and methode of learning the song
US8380119B2 (en) * 2008-05-15 2013-02-19 Microsoft Corporation Gesture-related feedback in eletronic entertainment system
US20090305782A1 (en) * 2008-06-10 2009-12-10 Oberg Gregory Keith Double render processing for handheld video game device
US8663013B2 (en) * 2008-07-08 2014-03-04 Harmonix Music Systems, Inc. Systems and methods for simulating a rock band experience
CN105930311B (en) 2009-02-18 2018-10-09 谷歌有限责任公司 Execute method, mobile device and the readable medium with the associated action of rendered document
US8447066B2 (en) 2009-03-12 2013-05-21 Google Inc. Performing actions based on capturing information from rendered documents, such as documents under copyright
WO2010105245A2 (en) 2009-03-12 2010-09-16 Exbiblio B.V. Automatically providing content associated with captured information, such as information captured in real-time
US8465366B2 (en) * 2009-05-29 2013-06-18 Harmonix Music Systems, Inc. Biasing a musical performance input to a part
US8449360B2 (en) * 2009-05-29 2013-05-28 Harmonix Music Systems, Inc. Displaying song lyrics and vocal cues
US9981193B2 (en) 2009-10-27 2018-05-29 Harmonix Music Systems, Inc. Movement based recognition and evaluation
EP2494432B1 (en) 2009-10-27 2019-05-29 Harmonix Music Systems, Inc. Gesture-based user interface
US9081799B2 (en) 2009-12-04 2015-07-14 Google Inc. Using gestalt information to identify locations in printed information
US9323784B2 (en) 2009-12-09 2016-04-26 Google Inc. Image search using text-based elements within the contents of images
US20110141253A1 (en) * 2009-12-16 2011-06-16 Heran Co., Ltd. Remote interactive monitoring and alarm system utilizing television apparatus with video-song accompaniment function
US8874243B2 (en) 2010-03-16 2014-10-28 Harmonix Music Systems, Inc. Simulating musical instruments
CN102742261A (en) * 2010-05-24 2012-10-17 联发科技(新加坡)私人有限公司 Method for generating multimedia data to be displayed on display apparatus and associated multimedia player
US9358456B1 (en) 2010-06-11 2016-06-07 Harmonix Music Systems, Inc. Dance competition game
CA2802348A1 (en) 2010-06-11 2011-12-15 Harmonix Music Systems, Inc. Dance game and tutorial
US8562403B2 (en) 2010-06-11 2013-10-22 Harmonix Music Systems, Inc. Prompting a player of a dance game
US9024166B2 (en) 2010-09-09 2015-05-05 Harmonix Music Systems, Inc. Preventing subtractive track separation
US9601118B2 (en) * 2010-10-20 2017-03-21 Megachips Corporation Amusement system
WO2013058678A1 (en) 2011-10-19 2013-04-25 Ikonomov Artashes Valer Evich Device for controlling network user data
US20140298174A1 (en) * 2012-05-28 2014-10-02 Artashes Valeryevich Ikonomov Video-karaoke system
US10115084B2 (en) 2012-10-10 2018-10-30 Artashes Valeryevich Ikonomov Electronic payment system
TWI497960B (en) * 2013-03-04 2015-08-21 Hon Hai Prec Ind Co Ltd Tv set and method for displaying video image
US9127473B1 (en) 2014-10-28 2015-09-08 Darrel Scipio Home entertainment stage
CN104966527B (en) * 2015-05-27 2017-04-19 广州酷狗计算机科技有限公司 Karaoke processing method, apparatus, and system
US10275446B2 (en) 2015-08-26 2019-04-30 International Business Machines Corporation Linguistic based determination of text location origin
US9639524B2 (en) 2015-08-26 2017-05-02 International Business Machines Corporation Linguistic based determination of text creation date
US9659007B2 (en) 2015-08-26 2017-05-23 International Business Machines Corporation Linguistic based determination of text location origin
USD774553S1 (en) * 2015-08-29 2016-12-20 Sithon Chan Handheld karaoke
CN109478342B (en) * 2016-07-15 2020-03-10 纳维株式会社 Image display device and image display system
US20190147841A1 (en) * 2017-11-13 2019-05-16 Facebook, Inc. Methods and systems for displaying a karaoke interface
US10599916B2 (en) 2017-11-13 2020-03-24 Facebook, Inc. Methods and systems for playing musical elements based on a tracked face or facial feature
US10810779B2 (en) 2017-12-07 2020-10-20 Facebook, Inc. Methods and systems for identifying target images for a media effect

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5144454A (en) * 1989-10-31 1992-09-01 Cury Brian L Method and apparatus for producing customized video recordings
US5151793A (en) * 1990-02-26 1992-09-29 Pioneer Electronic Corporation Recording medium playing apparatus
US5689081A (en) * 1995-05-02 1997-11-18 Yamaha Corporation Network karaoke system of broadcast type having supplementary communication channel
US5691494A (en) * 1994-10-14 1997-11-25 Yamaha Corporation Centralized system providing karaoke service and extraneous service to terminals
US5725383A (en) * 1993-07-16 1998-03-10 Brother Kogyo Kabushiki Kaisha Data transmission system
US5803747A (en) * 1994-04-18 1998-09-08 Yamaha Corporation Karaoke apparatus and method for displaying mixture of lyric words and background scene in fade-in and fade-out manner
US5810603A (en) * 1993-08-26 1998-09-22 Yamaha Corporation Karaoke network system with broadcasting of background pictures
US5827990A (en) * 1996-03-27 1998-10-27 Yamaha Corporation Karaoke apparatus applying effect sound to background video

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5099337A (en) * 1989-10-31 1992-03-24 Cury Brian L Method and apparatus for producing customized video recordings
US5296643A (en) * 1992-09-24 1994-03-22 Kuo Jen Wei Automatic musical key adjustment system for karaoke equipment
US5464946A (en) * 1993-02-11 1995-11-07 Multimedia Systems Corporation System and apparatus for interactive multimedia entertainment
US5649234A (en) * 1994-07-07 1997-07-15 Time Warner Interactive Group, Inc. Method and apparatus for encoding graphical cues on a compact disc synchronized with the lyrics of a song to be played back
US6072933A (en) * 1995-03-06 2000-06-06 Green; David System for producing personalized video recordings
JPH09275524A (en) * 1996-04-08 1997-10-21 Aba Internatl:Kk Image synthesizing method and device therefor
US6343987B2 (en) * 1996-11-07 2002-02-05 Kabushiki Kaisha Sega Enterprises Image processing device, image processing method and recording medium
KR19990011180A (en) * 1997-07-22 1999-02-18 구자홍 How to select menu using image recognition
US5913259A (en) * 1997-09-23 1999-06-15 Carnegie Mellon University System and method for stochastic score following
US6072494A (en) * 1997-10-15 2000-06-06 Electric Planet, Inc. Method and apparatus for real-time gesture recognition
US6130677A (en) * 1997-10-15 2000-10-10 Electric Planet, Inc. Interactive computer vision system
US6411744B1 (en) * 1997-10-15 2002-06-25 Electric Planet, Inc. Method and apparatus for performing a clean background subtraction
US6192135B1 (en) * 1997-11-18 2001-02-20 Donald S. Monopoli Dearticulator
KR100270988B1 (en) * 1998-03-12 2000-11-01 최길호 Recording and regenerating apparatus in microphone
JP2000029483A (en) * 1998-07-15 2000-01-28 Ricoh Co Ltd Karaoke machine
US6086380A (en) * 1998-08-20 2000-07-11 Chu; Chia Chen Personalized karaoke recording studio



Also Published As

Publication number Publication date
AU2453899A (en) 1999-07-26
WO1999035631A8 (en) 1999-09-30
US6692259B2 (en) 2004-02-17
US20030124499A1 (en) 2003-07-03
WO1999035631A9 (en) 1999-11-04
US6514083B1 (en) 2003-02-04

Similar Documents

Publication Publication Date Title
WO1999035631A1 (en) Method and apparatus for providing interactive karaoke entertainment
US6971882B1 (en) Method and apparatus for providing interactive karaoke entertainment
US6537078B2 (en) System and apparatus for a karaoke entertainment center
US5830065A (en) User image integration into audiovisual presentation system and methodology
US8758130B2 (en) Image integration, mapping and linking system and methodology
US5464946A (en) System and apparatus for interactive multimedia entertainment
CN101770772B (en) Embedded Internet kara OK entertainment device and method for controlling sound and images thereof
US20070098368A1 (en) Mobile recording studio system
US20030049591A1 (en) Method and system for multimedia production and recording
CN1383543A (en) Karaoka system
JP3197506B2 (en) Bowling alley management system and bowling console
CA2148089A1 (en) System and apparatus for interactive multimedia entertainment
JPH09247532A (en) Image synthesis method and its device
US20120094758A1 (en) Image integration, mapping and linking system and methodology
JP2012198380A (en) Display control device
JP5498341B2 (en) Karaoke system
JP3315333B2 (en) Karaoke equipment
CN1251963A (en) Multifunctional network singing installation
JPH06110480A (en) Karaoke (recorded accompaniment) device
US20020083091A1 (en) Seamless integration of video on a background object
JP5550593B2 (en) Karaoke equipment
KR101295862B1 (en) Karaoke apparatus and method thereof for providing augmented reality images
KR100383019B1 (en) Apparatus for authoring a music video
KR100462826B1 (en) A portable multimedia playing device of synchronizing independently produced at least two multimedia data, a method for controlling the device, and a system of providing the multimedia data with the device
JP2000132150A (en) Image display device

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH GM HR HU ID IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
AK Designated states

Kind code of ref document: C1

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH GM HR HU ID IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: C1

Designated state(s): GH GM KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

CFP Corrected version of a pamphlet front page
CR1 Correction of entry in section i
AK Designated states

Kind code of ref document: C2

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH GM HR HU ID IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: C2

Designated state(s): GH GM KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

COP Corrected version of pamphlet

Free format text: PAGES 1/32-32/32, DRAWINGS, REPLACED BY NEW PAGES 1/32-32/32; DUE TO LATE TRANSMITTAL BY THE RECEIVING OFFICE

NENP Non-entry into the national phase

Ref country code: KR

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase