METHOD AND SYSTEM FOR DELIVERING REAL TIME VIDEO AND AUDIO
CROSS REFERENCE TO RELATED APPLICATIONS
This application claims priority from provisional U.S. patent application serial number 60/169,111, entitled "Internet Real Time Video Caster", filed December 6, 1999.
The above referenced application is incorporated herein by reference for all purposes. The prior application, in some parts, may indicate earlier efforts at describing the invention or describing specific embodiments and examples. The present invention is, therefore, best understood as described herein.
FIELD OF THE INVENTION The present invention is related generally to transmitting continuous media through a computer network and, more particularly, to broadcasting audio and video signals via the Internet for efficient broadcasting of programs.
Copyright Notice
A portion of the disclosure of this patent document may contain materials that are subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever.
BACKGROUND OF THE INVENTION Many applications exist for broadcasting video presentations to multiple destinations. Many of these conventional systems are directed to online
conferences, broadcasting a recorded video conference to a number of users. This can occur either simultaneously or, to some extent, on demand.
Applications also exist to provide for the broadcast of live presentations via the Internet. The main problem with these systems is the inefficiency with which the live broadcast is sent to users viewing and listening to the conferences. Many times the audio and visual signals are mismatched, making the presentation frustrating to watch, especially when the audio delivery does not track the movement of the people speaking. Another problem in live broadcasts is the inability to deliver the signal to a multitude of disparate systems that operate at different speeds. These systems receive and display data at different rates, making it difficult for a live performance to accommodate these different systems, causing varying levels of quality in the final display product, the presentation on any one individual user's display and audio player.
These problems are compounded when a system strives to serve a large number of users wanting to view, listen to and participate in the presentation. For example, an online conference may broadcast video and audio signals so that a user can see the presentation and, additionally, can receive data from individual users for feedback to the presentation and even rebroadcast to the other users. Video and audio data are a large burden on such a system, especially with a live video performance that produces a large amount of data. Having to broadcast to and synchronize with the disparate systems is a monumental challenge. As will be seen, the invention rises to this challenge and resolves many of these problems in a simple and elegant manner.
SUMMARY OF THE INVENTION
The invention provides a method and apparatus for broadcasting real-time presentations over a network. In one embodiment, a memory is configured to store digital data in a circular buffer of a server. Recording circuitry is configured to write formatted strings of digital data into individual storage locations in the circular memory buffer. The writing of data into the circular buffer is done according to a writing pointer that directs the formatted digital data to be stored at available locations in the circular memory buffer. Formatted data from a particular session is recorded into the circular buffer that,
at some point, overwrites previously recorded data. This minimizes the number of memory locations used by the writing pointer, freeing up the memory for other uses. Reading circuitry is configured to read formatted strings of data from the circular data buffer according to a reading pointer. The reading pointer tracks consecutively stored data at storage locations previously written to by a corresponding writing pointer. Transmission circuitry is configured to transmit strings of formatted video digital data and formatted audio digital data read from the circular buffer in synchronicity to deliver the session in real-time.
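The write/read pointer scheme summarized above can be sketched in a few lines of Python. This is a minimal illustration only, assuming the encoded data arrives in discrete chunks; the class and method names are illustrative and not part of the disclosed circuitry.

```python
class CircularBuffer:
    """Fixed-size circular buffer: one write pointer appends chunks,
    overwriting the oldest data once the buffer wraps around, so only a
    bounded number of memory locations is ever used."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity
        self.write_pos = 0       # index of the next slot the writer fills
        self.total_written = 0   # running count of chunks ever written

    def write(self, chunk):
        """Writing pointer: store at the next available location, wrapping."""
        self.slots[self.write_pos] = chunk
        self.write_pos = (self.write_pos + 1) % self.capacity
        self.total_written += 1

    def read(self, chunk_index):
        """Reading pointer: fetch the chunk at a given absolute index.

        Returns None if the writer has not reached that index yet; raises
        if the chunk has already been overwritten by a later write."""
        if chunk_index >= self.total_written:
            return None
        if self.total_written - chunk_index > self.capacity:
            raise IndexError("chunk already overwritten")
        return self.slots[chunk_index % self.capacity]
```

Each connected user would hold its own chunk index into the single shared buffer, so many readers are served from one writer's storage.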
The invention further provides a scheduler configured to schedule a plurality of sessions for broadcast to users having access to the server. An editor is also provided and is configured to read data from the circular buffer. An indexer is also provided and is configured to index edited data pertaining to a session. A text generator is also provided to translate audio data into textual data and to store the textual data in memory. The indexer can be configured to index a session according to predetermined times during the presentation or according to the audio or textual data.
A system is provided for broadcasting a real-time presentation session. The system provides a primary server having control circuitry configured to temporarily store and transmit video data and audio data recorded from a session and memory for storing data. A proxy server is also provided and is configured to receive and store video and audio data transmitted by the primary server and to transmit the video and audio data in real-time and at selected rates to serve disparate user systems. The proxy server provides additional broadcast bandwidth to the primary server and trans-codes the video and audio data for user systems of varying specifications. A video-on-demand server is configured to receive video and audio data from the primary server, store data in memory and transmit video and audio data to users on demand. The video-on-demand server can be configured to store prerecorded, edited versions of live presentations, including audio and video data, in memory, wherein the sessions include textual data pertaining to the audio data, the video-on-demand server further comprising indexing circuitry for indexing the textual data.
A method of recording and broadcasting real-time video presentations begins with receiving video and audio data signals in real time
from a live presentation or performance. The data signals are then converted into strings of video digital data and audio digital data and encoded into a format. The format allows for the storage and transmission of the data. The data is stored by temporarily writing the formatted strings of digital data into individual storage locations in a circular memory buffer according to a writing pointer. The writing pointer directs the formatted digital data to be stored at available locations in the circular memory buffer. The formatted digital data from a particular session is then recorded in the buffer memory. During the process, the writing pointer may direct the data to overwrite previously recorded data. This minimizes the number of memory locations utilized by the writing pointer, freeing up the memory for use by other recording sessions or other functions. The data is read from the circular data buffer according to a reading pointer. The reading pointer tracks consecutively stored data at storage locations previously written to according to a corresponding writing pointer. The data read from the circular buffer is then transmitted in synchronicity to deliver the session in real-time to a user.
The method can further include reading formatted strings of data from the circular buffer at a rate that corresponds to a user's particular specifications. The reading pointer can be reset according to the location of the writing pointer to avoid reading stored data out of order. This would help account for any disparity in writing speeds as compared to reading speeds.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 shows the relationships among the Realtime Video Encoding Adapter, the encoder buffers, and encoder cards.
Figure 2 shows the procedure of adding a new program to the schedule list.
Figure 3 shows the sessions' read/write pointers and the encoder write pointer.
Figure 4 illustrates a real-time video casting system according to the invention.
Figure 5 shows a more detailed illustration of the primary server in Figure 4.
Figure 6 shows a detailed illustration of the VOD server.
Figure 7 presents a more detailed illustration of the user system in Figure 4.
Figure 8 shows the steps followed to add a task to the program schedule in one embodiment.
Figure 9 shows procedures for creating a video file on a remote server.
Figure 10 gives procedures for starting an encoder.
Figure 11 is a block diagram showing a representative example logic device in which aspects of the present invention may be embodied.
The invention and various specific aspects and embodiments will be better understood with reference to the drawings and detailed descriptions. In the different figures, similarly numbered items are intended to represent similar functions within the scope of the teachings provided herein.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
Although the media delivery system described below is described mainly in terms of video streaming signals, those skilled in the art can appreciate that the description can equally be applied to other continuous media, such as audio streaming signals, combined audio and video, or multimedia signals. It can also be applied to non-continuous media where the amount of data being transmitted for a single media data title is, although not continuous, very large. An example is the transmission of an image, for example a high-resolution X-ray, where the amount of data may be of sufficient size that it is more practical to transmit the particular media data title broken up into blocks as is done for the continuous case. The detailed description of the present invention here provides numerous specific details in order to provide a thorough understanding of the present invention. However, it will become obvious to those skilled in the art that the present invention may be practiced without these specific details. In other instances, well known methods, procedures, components, and circuitry have not been described in detail to avoid unnecessarily obscuring aspects of the present invention. Before the specifics of the embodiments are described, some aspects are outlined generally below for an embodiment to transmit video over the Internet.
The fast development of Internet and streaming technologies makes the broadcasting of video through the Internet possible. This emphasizes the need for efficient scheduling of Internet video programs on the server side. The scheduler should be able to start and stop the video programs at a given time and, additionally, in many cases the broadcast programs need to be saved on other video servers or sent to proxy servers.
An exemplary embodiment of the present invention may include the following parts:
1. Real-time encoding part: The real-time encoding part communicates between the video encoder and the client-session handling part in the system.
By using the one-buffer-for-all technology, the buffer management component successfully handles the data from a video encoder card and passes the data to different clients through the Internet. The auto-stop and auto-start mechanism built into this part automatically stops and starts the encoder card according to given criteria, making the encoding part robust and fault resilient.
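The auto-stop/auto-start behavior can be sketched as a polling watchdog. The restart criterion used here (no new encoder output within a stall window) is an assumption for illustration; the text specifies only that the card is stopped and started "according to given criteria".

```python
class EncoderWatchdog:
    """Sketch of an auto-stop/auto-start mechanism for an encoder card.

    The stall criterion (no new data within `stall_s` seconds) is an
    assumption, not a criterion stated in the disclosure."""

    def __init__(self, stall_s):
        self.stall_s = stall_s
        self.last_total = 0      # byte count seen at the last change
        self.last_change = 0.0   # timestamp of the last observed progress
        self.restarts = 0

    def poll(self, now, total_bytes):
        """Call periodically; returns True when a restart is triggered."""
        if total_bytes != self.last_total:
            # Encoder is making progress: remember it and keep running.
            self.last_total, self.last_change = total_bytes, now
            return False
        if now - self.last_change >= self.stall_s:
            # Encoder appears stalled: stop and restart the card.
            self.restarts += 1
            self.last_change = now   # open a fresh stall window
            return True
        return False
```

A real implementation would call the encoder card's stop/start routines where this sketch merely counts restarts.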
2. Video program scheduling part: This enables a user to schedule video programs and check the validity of a given program schedule. It controls the start and stop of a video server and also communicates with remote Video-On-Demand (VOD) servers or proxy servers to record a video program being broadcast.
3. Client-Session handling part: This part of the system handles requests from the Internet. It fetches data from the video data buffer in the real-time encoding part and distributes the video/audio data among the connected client sessions.
The fast development of the Internet has made video-through-Internet applications possible. The Internet real-time video caster embodiment supports the MPEG1 and MPEG2 video formats with a wide range of bandwidths from 64 Kb/s to 16 Mb/s. This makes the system adaptable to different network conditions and provides different-quality video programs to users with different bandwidth access.
The real-time video encoder can handle several video encoders at the same time. The adapter is designed with a plug-in interface and can process the data from encoders of the same type or different types. In the one-buffer-for-all technology, each video encoder Ej is assigned a circular data buffer Mj in memory that contains the video data. The buffer size S for the encoder Ej is determined by the video bit-rate bj from the encoder and can be expressed as S = bj × t, where t equals the time duration of the video clip that is saved in memory. By calling the callback functions, the encoder writes data into the encoder buffer in the system main memory; the data is finally read by the client-session handling part and sent to the clients connected to the system. This is explained in Figure 1.
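The buffer-size relation S = bj t above can be checked numerically. The unit choices here (bit-rate in bits per second, duration in seconds, size in bytes) are assumptions made for illustration.

```python
def encoder_buffer_size(bit_rate_bps, duration_s):
    """S = bj * t: the encoder's bit-rate times the number of seconds of
    video kept in memory, converted from bits to bytes."""
    return (bit_rate_bps * duration_s) // 8
```

For example, a 2 Mb/s stream with a 10-second window needs 2,500,000 bytes, while a 64 Kb/s stream kept for 60 seconds needs 480,000 bytes.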
Figure 1 shows the relationships among the realtime video encoding adapter, the encoder buffers and encoder cards. In a server 10, a number of sources supply continuous media. Each of these streams is fed to an encoder, such as hardware encoders 10-1 and 10-2 or a software encoder 14-1.
Each of these encoder cards E supplies data in a format, for example MPEG1 or MPEG4, and at a bit-rate bj. The real-time video encoder adapter then allows the continuous media, encoded according to any of the acceptable set of formats, to be stored in a respective circular buffer 11-i.
Internet video providers need automatic and efficient schemes to arrange video programs that will start and stop at certain time slots. The program scheduler core is designed to help users arrange video programs. In the program scheduler core, video programs are considered as tasks. Tasks are categorized into two types: periodic tasks and sporadic tasks. A job in the program scheduler core refers to a unit of program that is played and broadcast at a certain time. It can be one occurrence of a periodic task or a sporadic task that does not repeat according to a given period. When a user adds a program to the schedule, a validation procedure is first invoked to check the validity of the program. If the active time of the inserted program conflicts with other programs, it is set invalid and rejected. Otherwise, the program is added to the calendar and will be started at the assigned time. The validity of a given task is determined by its start time and duration. If the task is a periodic task, it is also determined by its expiration time. Besides the start time, end time, frequency and expiration time, each task has other parameters, such as whether or not to save video clips on the local disk, whether or not to stream the video program to a remote media server or proxy server, their Internet protocol (IP) address, login name and password, the hue, contrast, stream type and so on.
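The validation test can be sketched as a plain interval-overlap check. Expanding a periodic task into its individual jobs is omitted for brevity, and the function signature is illustrative rather than taken from the disclosure.

```python
from datetime import datetime, timedelta

def conflicts(new_start, new_duration, scheduled):
    """Return True if the new program's active interval overlaps any
    already-scheduled program; such a program would be set invalid
    and rejected."""
    new_end = new_start + new_duration
    for start, duration in scheduled:
        end = start + duration
        # Two intervals overlap iff each starts before the other ends.
        if new_start < end and start < new_end:
            return True
    return False
```

Back-to-back programs (one ending exactly when the next starts) are accepted by this check, which is one reasonable reading of "conflicting".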
Figure 2 shows the procedure for adding a new program to the schedule list. The added program is input as a new task 21. The validation test 22 compares the new program with the other scheduled programs 23 and determines whether it is valid. If not, it is rejected at 25. A valid request is added to the scheduled programs 23. The process then gets the program that will occur soonest 26.
The scheduled task sets a timer after it is added to the task list 27 and the process of adding the new task ends 28. When the start time comes, the system brings up the task, sets the encoder card according to the parameters provided by the task, and starts to wait for connections from the Internet. If the task needs to write the video program to another Video-On-Demand server or proxy server, the system invokes the creation component in the program scheduler core and writes data to the servers. The creation component is composed of two different clients: a reading client and a writing client. The reading client makes a connection locally with the media caster server and gets video data. The writing client connects with the video-on-demand server or proxy server and writes video data from the reading client to the remote servers. The client-session handling component handles requests from clients through the Internet. At the initialization stage, the client-session handling part calculates the upper limit of the system according to the CPU time, memory capacity and network bandwidth. The system rejects new connection requests when it reaches this upper limit in order to maintain the reliability of the system. This limit is determined in a method similar to that described in U.S. patent application serial number 09/665,827, "Method and System for Providing World Wide Streaming Media Services", by Horng-Juing Lee and Joe M-J Lin, filed on September 20, 2000, and which is hereby incorporated by this reference. The system will only accept a request if it does not lead the system to exceed the maximum number of streams that it can simultaneously manage. Here, the limit is usually determined by the smaller of the number of streams available from the server to the network, as determined by its network interface card (NIC), and the number of streams available into the server from the media sources. More details are given in the cited application, although in this case the bandwidth is not limited by the memory size, as the incoming media is stored on the circular buffers, whose size is selected as described in further detail below, and does not require the memory-to-disk access needed there.
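The upper limit and the admission decision described above might be computed as follows. The inputs (outbound NIC bandwidth, per-stream bit-rate, streams available from the media sources) follow the text, while the function names are illustrative.

```python
def max_sessions(nic_bandwidth_bps, per_stream_bps, source_streams):
    """Upper limit on simultaneous client sessions: the smaller of the
    number of streams the NIC can push to the network and the number of
    streams available into the server from the media sources."""
    outbound = nic_bandwidth_bps // per_stream_bps
    return min(outbound, source_streams)

def admit(current_sessions, limit):
    """Reject new connection requests once the server is at its limit,
    keeping the system within its reliable operating range."""
    return current_sessions < limit
```

For a 100 Mb/s NIC serving 2 Mb/s streams, the NIC side allows 50 sessions; the effective limit is whichever side is smaller.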
Each client session keeps its own information, such as its reading pointer position, video buffer writing pointer position, distance between the writing pointer and reading pointer and so on. Because all clients connecting to the same encoder card use the video encoder buffer as their own reading buffer, the buffer scheme in the client-session handling component saves considerable CPU time. Each session updates its reading position according to the size of the circular buffer and its reading pointer position according to the video encoder writing pointer position in the circular buffer. Figure 3 explains this concept.
Figure 3 shows the sessions' read/write pointers 34-j and the encoder write pointer 33-i. The encoder write pointer 33-i writes the formatted, encoded data onto the circular buffer 11-i of the server 10. Each of the read/write pointers 34-j independently keeps track of the current location of the write pointer on the buffer as well as where the buffer is being read, and transmitted over the network 35, for the corresponding user 35-j.
The client-session handling component processes two kinds of requests closely related to the sessions. One is the session read request, which supplies video data to the session and finally sends it to the client who is receiving video data. Every time a session reads data from the video data buffer, its session read pointer is updated. The other is the session write request, which updates the session write pointer according to the encoder card pointer. The encoder card writes video data to the video circular buffer, and the encoder writing pointer points to the video data boundary.
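The per-session bookkeeping against the shared encoder buffer can be sketched as follows, representing the buffer by its list of slots and the encoder write pointer by a running chunk count; both representations are assumptions made for illustration.

```python
def handle_session_requests(session, slots, total_written):
    """Drain one session's read requests against the shared encoder buffer.

    `slots` is the circular buffer's storage; `total_written` stands in
    for the encoder write pointer as a running count of chunks ever
    written.  The session dict carries its own read pointer, as each
    session does in the client-session handling component."""
    capacity = len(slots)
    sent = []
    while session["read_count"] < total_written:
        # If the encoder has lapped this session, jump forward to the
        # oldest chunk still present instead of reading overwritten data.
        if total_written - session["read_count"] > capacity:
            session["read_count"] = total_written - capacity
            continue
        sent.append(slots[session["read_count"] % capacity])
        session["read_count"] += 1
    return sent
```

Because every session reads from the same slots, serving an additional client costs only a pointer, not a copy of the stream.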
It is well known in the art that logic or digital systems and/or methods can include a wide variety of different components and different functions in a modular fashion. The following will be apparent to those of skill in the art from the teachings provided herein. Different embodiments of the present invention can include different combinations of elements and/or functions. Different embodiments of the present invention can include actions or steps performed in a different order than described in any specific example herein. Different embodiments of the present invention can include groupings
of parts or components into larger parts or components different than described in any specific example herein. For purposes of clarity, the invention is described in terms of systems that include many different innovative components and innovative combinations of innovative components and known components. No inference should be taken to limit the invention to combinations containing all of the innovative components listed in any illustrative embodiment in this specification. The functional aspects of the invention, as will be understood from the teachings herein, may be implemented or accomplished using any appropriate implementation environment or programming language, such as C++, COBOL, Pascal, Java, Java-script, etc. All publications, patents, and patent applications cited herein are hereby incorporated by reference in their entirety for all purposes.
The invention therefore in specific aspects provides streaming of continuous media such as video/audio signals that can be played on various types of video-capable terminal devices operating under any type of operating system, regardless of what type of players are pre-installed in the terminal devices.
In specific embodiments, the present invention involves methods and systems suitable for providing multimedia streaming over a communication data network, including a cable network, a local area network, other private networks and the Internet.
The present invention is presented largely in terms of procedures, steps, logic blocks, processing, and other symbolic representations of operations within data processing devices. These process descriptions and representations are the means used by those experienced or skilled in the art to most effectively convey the substance of their work to others skilled in the art. The method along with the system to be described in detail below is a self-consistent sequence of processes or steps leading to a desired result. These steps or processes are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities may take the form of electrical signals capable of being stored, transferred, combined, compared, displayed and otherwise manipulated in a computer system or electronic computing devices. It proves convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, operations, messages, terms, numbers,
or the like. It should be borne in mind that all of these similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following description, it is appreciated that throughout the present invention, discussions utilizing terms such as "processing" or "computing" or "verifying" or "displaying" or the like refer to the actions and processes of a computing device that manipulates and transforms data represented as physical quantities within the device's registers and memories into analog output signals via resident transducers.
The present invention provides a primary server configured to broadcast real-time presentation sessions to a number of end-users of disparate computer systems. These disparate systems can be configured under different specifications, including the rates at which a system receives, processes, displays and sends video and audio data. This server includes a memory configured to store data in a circular buffer, which is characterized by a reusable memory device that stores data. The memory is configurable to provide the consecutive storage of video and audio data in memory locations for subsequent retrieval and broadcast to disparate systems. In order to save memory space, a finite number of memory locations can be designated for the storage of video and audio data. As data floods into the system from an event such as a live presentation, data is written into the circular buffer, and the memory locations are written over with new data as the performance progresses and the data is retrieved and broadcast. The circular memory includes a writing pointer that directs data to be stored at available locations in the circular memory buffer and to record the data from a particular session. Reading pointers are also included in the memory to read the video and audio data.
Preferably, one reading pointer is provided for each user who is to receive the data. The reading pointer tracks consecutively stored data at storage locations previously written according to a corresponding writing pointer. The stored audio and video data are then broadcast to deliver a presentation session in real-time. The video and audio are preferably broadcast together and decoded in synchronicity to ensure that the audio play matches the corresponding video presentation. Whether broadcast together or apart, the video and audio data should be able to be displayed in synchronicity after the data is decoded.
In one embodiment, the invention further provides a scheduling mechanism configured to schedule a number of sessions over a period of time. This scheduling mechanism can be more broadly utilized by many types of systems, including a system for broadcasting conventional television programs over a digital network. In such an application, television and cable systems can be replaced by broadcasting programs over a computer network. Such a network may also incorporate conventional cable system technology.
The invention further provides for a proxy server that works in conjunction with a primary server to deliver sessions in order to add bandwidth and flexibility to the functions of the primary server. Proxy servers can be located in remote geographical locations in order to service certain end-users of the broadcast sessions. The proxy server can add bandwidth by servicing a number of users and groups of users that may have common needs, common system specifications, geographical proximity, or other commonalities that can be best handled by a separate server. The proxy server may include all of the functionality of the primary server, receiving a broadcast from the primary server and re-broadcasting a session to the proxy server's end-users.
Either or both of the primary server and proxy server may include a scheduler. Such a scheduler can individually schedule sessions directed to particular end-users. Such a scheduler can be independent among servers or, in other applications, may be inter-dependent. Furthermore, a single scheduler can govern all the servers if uniform broadcasting is desired.
The invention further provides a video on demand server to work in conjunction with primary or proxy servers to provide prerecorded sessions to end-users on demand of the end users. The sessions available on demand can be full recordings of the sessions broadcast by the primary or proxy servers. Furthermore, these sessions on demand can be edited versions of the sessions for end-users that prefer such presentations. These edited versions can also include textual information generated in conjunction with the audio data to provide readable text along with or separate from the audio signal. This text can be easily indexed according to key words and phrases. The edited version of the video can also be indexed according to certain highlights or other distinctive portions of the video presentation.
Referring to Figure 4, a real-time video casting system according to the invention is illustrated in more detail. Such a system is adaptable to many types of applications where video and audio data can be gathered from a location and easily broadcast to a number of end-users. For example, online conferencing for professional conferences can be adapted to such a system to deliver real-time video, audio and possibly textual data to end-users participating in the conference. Other applications include any conventional live television-type broadcast, where real-time visual and audio data is desired. Such a system is adaptable to even a simple system such as the recording and broadcast of traffic conditions in different locations to end-users. End-users in this application can include law enforcement authorities, traffic control authorities, radio and television entities and even individual commuters who desire to plan around bad traffic conditions.
The invention described below is directed to an on-line conferencing system, illustrating one embodiment of the invention. More detail on such a system is given in the U.S. patent application entitled "Global Messaging with Distributed Adaptive Streaming Control", by Yen-Jen Lee, Chiun-An Chao, Ray Ngai, Ming-Chao Chiang, and Yu-Ping Huang, filed concurrently with the present application, and which is hereby incorporated herein by this reference. It will be understood, however, that such an embodiment is merely illustrative of the invention, which is not limited to such an embodiment. The invention is not limited by the summary of the invention above or this detailed description.
The invention incorporates a type of data technology referred to in the art as streaming. Streaming technology is a broadly used term that refers to the optimization of data delivery between the data source and the ultimate end-user. This optimization can include encoding techniques for recording and formatting data for storage, processing, broadcasting and end-use and end-storage. The invention utilizes conventional encoding and decoding techniques well known to those skilled in the art, including artisans of streaming technology. Streaming can include real-time delivery, in which the video is viewed as it is received, such that the video data need not be stored at the recipient's location.
Referring to Figure 4, a system for broadcasting real-time presentation sessions according to the invention is illustrated. The real-time video/audio broadcasting system 100 operates over network 102. This network can be an Internet system connected to a number of entities via different communication channels and connected to various service providers that enable the exchange of data in the system. For example, if network 102 were the Internet, Internet service providers 104 would be connected to network 102 for providing Internet connections to entities connected to network 102. Different entities can use a single Internet service provider (ISP), or could have separate ISPs for each individual entity. Conventional ISPs include companies such as
Yahoo!™, America On Line (AOL™), or other service providers.
System 100 further includes a video/audio recording system 106 that is connected to a camera 108 for gathering video data and a microphone 110 for gathering audio data to be incorporated into a session. The video/audio recorder 106 can be connected to a primary server 112 for processing of the information and ultimate broadcast. As described above with respect to Figure 1, this input is received at one of the encoder cards 113-i and passed on to the encoding adapter. This encoded, formatted data is then ready to be placed on the corresponding one of the circular buffers 130. The recorder 106 can further be directly connected to network 102, giving the primary server remote access to a data-gathering location. The primary server 112 includes a CPU 114 for controlling the functions of the primary server, including the receiving, processing and broadcasting of video and audio data received from the recorder 106. The primary server further includes cache memory 116 for storing information that is commonly used by the CPU. The cache is configured to store and transfer data quickly and conveniently under control of the CPU for efficient processing of the data. Memory 118 is further included for storing larger amounts of data for performing primary server functions.
Scheduler application 122 is included in memory 118 for scheduling sessions over a period of time. This scheduling function is similar in some respects to the scheduling of conventional television programs by a television network. These are discussed in further detail below. Editor 124 is also included in memory 118 for editing sessions, eliminating unwanted portions of session recordings. The editor is configured to produce useful abridged versions of the sessions for end-users who are interested in only particular portions of the session. An indexer application 126 is also included for indexing prerecorded sessions. An end-user can find this useful for advancing to or returning to different portions of a session while viewing it. A text generation application 128 is also included for generating textual information according to the processed audio data. This textual data can be used for display during a presentation. It can also be used in subsequent editing to adapt text to the presentation. This is also useful for hearing-impaired persons who participate in a conference, the substance of which is relayed verbally. A set of circular buffers 130 is also included in memory 118 for the receiving, processing and broadcasting of data of a session. These applications are discussed in further detail below. An optional database 132 is further included to provide access to a large amount of data that can be easily accessed and searched.
Modem 120 is also included for exchanging information between the primary server 112 and other entities connected to network 102. When incoming continuous media is received in a stream over the network 102, as with the input from the video/audio recorder 106, it may be fed to an encoder card 113-i and the encoding adapter 115 before storage in one of the circular buffers 130.
System 100 further includes a user system 134 connected to Network 102 via modem 136 for exchange of data between the primary server 112 and the user system 134, as well as other entities connected to network 102. User system 134 includes a user computer 138 connected to a monitor 140 having a display 142 for displaying video presentations of a session. The system 134 further includes an audio system 144 for playing audio recordings that correspond with the sessions. User computer 138 includes a CPU 146 for controlling the inner workings of the computer including the receiving and transfer of information and the processing of video and audio data received from primary server 112.
User system 134 further includes memory 148 containing software applications configured to receive and view broadcast sessions sent by the primary server. The memory 148, much like other memory devices discussed herein, can be any volatile type memory such as a random access
memory (RAM), a dynamic random access memory (DRAM), and other volatile type memory systems that allow the transfer of information to and from the memory within the user system 134.
Included in the memory is scheduling application 150. The scheduling application provides a user with a schedule of sessions from one or a number of servers that broadcast both real-time and prerecorded video sessions. The scheduling application may allow a user to view a schedule much like a television guide, which displays certain programs and the time in which they are offered. Through a user interface, the scheduling application may allow a user to interact with the schedule, selecting certain programs for broadcast to the user system 134. In one embodiment, a user may be able to preview a certain scheduled session before fully broadcasting it to the user.
Referring to Figure 7, a more detailed illustration of user system 134 is shown. Scheduling application 150 includes software for previewing schedules, reviewing schedules and ordering scheduled sessions. Application 150 can further include the ability to order prerecorded videos that may not be on the schedule. Session application 152 includes the decoder 400 for decoding the data received from the primary server or from other servers that are related to ordered sessions. The session application also includes a session playing application 404 that allows the computer 138 to play or display audio, video and textual data during a session or during the playing of a prerecorded session. Further included in the session application is an interact application 405 which allows a user to interact during a session when allowed. This application allows for data to be sent back to the primary server from the user during the session. In one embodiment, the data sent from the user is rebroadcast to other participants watching a live session together at remote locations.
In one embodiment of the invention, a proxy server 156 is provided to work in conjunction with primary server 112 to serve participants in sessions and other users that request prerecorded sessions. The proxy server 156 includes CPU 158 for controlling the inner workings of the proxy server and a modem 160 for communicating with network 102 to share data among other entities connected to the network. The proxy server further includes a cache memory 162 for storing information commonly used by the CPU and for providing fast transfer of information within the proxy server and outward to network 102. The proxy server further includes memory 164 containing software applications that provide many of the same features as the primary server. This is appropriate because the proxy server works in conjunction with the primary server to serve outside users connected to network 102 that are participating in live sessions and that are ordering prerecorded sessions. The proxy server includes circular buffers 166, encoder cards 157-i, and encoding adapter 159 that operate much in the same manner as those of the primary server 112. The operation of the circular buffers is discussed in more detail below.
The memory further includes scheduler application 168 for scheduling programs to be accessed by the users accessing sessions from proxy server 156. The scheduler may work independently or interdependently with scheduler 122 of the primary server, depending on the particular application. Proxy server 156 may be completely subservient to primary server 112, only offering the same sessions in order to expand the bandwidth of the primary server. In other applications, the scheduler 168 may work independently from scheduler 122 of the primary server to cater to the specific needs of the proxy server's particular users. In certain applications, the proxy server and primary server may be catering to a single group of users for a particular session. However, the proxy server can also act independently of the primary server, offering sessions that may not be offered by the primary server. The proxy server may be in a different geographical location, where different sessions may be of particular interest. More details on the use of proxy servers and suitable caching techniques are described in copending U.S. patent application serial number 09/658,705, by Horng-Juing Lee, entitled "Method and Apparatus for Caching for Streaming Data", filed on September 8, 2000, U.S. patent application serial number 09/668,498, entitled "Method and System for Providing Real-Time Streaming Video Services", filed on September 22, 2000, and U.S. patent application serial number 09/679,763, entitled "Streaming Machine Aware Binary and Multimedia Content for Multimedia Synchronization", filed on October 5, 2000, which are hereby incorporated herein by this reference.
In certain applications, the primary server and proxy server may be catering to either individual users, individual user groups, or other groups of users having common interests in certain sessions. User systems 134 may act
independently and access certain live sessions or certain prerecorded sessions at the user's request. In other configurations, a user group 135 may include a number of users sharing a single access to network 102 in order to participate in sessions offered by the primary and proxy servers. In different situations, different configurations of primary server/proxy servers may be required in order to handle the demand for data to be delivered.
The proxy server further includes an editor 170 that, similarly to the schedulers, may work independently or interdependently with editor 124 of the primary server. The editor 170, like the other editor, may edit and record certain sessions for later viewing.
The editor application 170 and indexer application 172, both in the primary server and proxy server, are applications directed to the recording of a session for subsequent viewing by a user. During a live broadcast, a user can view the entire session as it occurs. However, if the user wishes to view the session at a later time, the user may have the benefit of an edited version of the session, indexed for easy use and manipulation, such as skipping ahead to desired portions of the session.
In one embodiment, however, an editor can be used during a live broadcast to insert certain prerecorded video and/or audio clips to fill in periods during the session where nothing of interest is happening in the seminar. For example, advertising can be inserted during these dead times in order to stimulate the user's interests and, possibly, to plug other sessions, other advertisements, or any other type of video and audio content that may benefit the user. In certain applications, it is even possible to have a "picture inside a picture", where multiple sessions can be viewed at once by a user.
Similar to the primary server, the proxy server also includes a text generator 174 for generating text that corresponds to the audio data. This text generator may also have the ability to generate text other than that related to the audio feed. Text can be created by the primary or proxy servers and, in some applications, can even be created by users interacting with the session.
The proxy server further includes a transcoder 176 that is configured to encode a session in various ways to assimilate with disparate systems having uncommon specifications. This transcoder may be unique to the proxy server or the proxy server may be serving as a counterpart to the primary
server by providing video and audio data delivery at different speeds and, possibly, different formats. Such a configuration would allow a single broadcast of a session among disparate user systems having uncommon specifications. The primary server may then be freed up to do more demanding tasks such as capturing data from other sessions.
Database 178 communicates with the proxy server 156 to provide access to data that is easily accessed and searched. This database could include data pertaining to scheduling, editing, indexing or text generation. The database could be a relational database that allows easy access to data as well as easy searching of data and related files. The database could provide access to certain files such as other sessions that can be broadcast to users or periodically scheduled. The database could also include video clips that may be inserted into sessions during slow periods where no activities are occurring during certain portions of a session. Still referring to Figure 4, another feature which may be included in system 100 is video-on-demand (VOD) server 180. The VOD server may communicate with one or both of the primary server and proxy server for storing previously recorded sessions. The VOD server 180 allows users to access sessions upon request. The sessions may include the full sessions as they occurred and were recorded in full. The sessions, however, could also include edited sessions created by the editors 124, 170. They could also be indexed by indexers 126, 172. Even further, these sessions may include text generated by text generators 128, 174. In one embodiment, the recorded sessions may be made available to a user in the VOD server on the primary server before the real-time live session is completed. For example, a user may access the VOD server if the user arrived at the session late, after it has begun, in order to catch up with what has occurred up to that point.
The VOD server includes a CPU 182 configured to control the inner workings of the VOD server. It also includes a cache memory 186 for temporarily storing data commonly used by the CPU. The VOD server further includes memory 188 containing VOD application 190. The VOD application may include a software application program for performing the VOD server operations. More details are given in U.S. patent application serial number 09/665,827, "Method and System for Providing World Wide Streaming Media Services", by Horng-Juing Lee and Joe M-J Lin, filed on September 20, 2000, which is incorporated herein by reference.
Referring now to Figure 6, a further detailed illustration of the VOD server is shown. The VOD application 190 includes an audio application 300 and a video application 302 for providing audio and video data to a user via network 102. Like the primary server and proxy server, the VOD server delivers audio and video data in synchronicity in order to provide a quality session presentation. This data is delivered on demand by the VOD server to a user system 134 or user group 135, Figure 4. The VOD server further includes a sessions application 304 for providing recorded sessions to a user. The sessions to be delivered can include edited sessions 306 as well as original sessions 308. As discussed above, edited sessions are ideal for subsequent use by users because they may be reduced in length by editing out periods during the session where no activity is occurring. Also included in sessions application 304 is a text application 310 configured to store text related to sessions. This text could include transcribed text from the audio portion of a session or could be text added later during editing. Also included in the VOD application is index application 312 which provides indexing for edited and original sessions. Also included in the VOD server may be an optional database 314 communicating with the server to provide data that is easily accessed and searched.
Referring now to Figure 5, a more detailed illustration of primary server is shown. Included in the memory 118 is client controller 200. The client controller is a software application configured to control multiple users that access the primary server in order to participate in real-time live sessions. This is discussed in further detail below in connection with the circular buffers 130, only one of which, 130-j, is shown to simplify the figure.
Scheduler 122 includes a periodic control 202 that controls sessions that are offered on a plurality of occasions throughout a certain time period. This application, for example, could be used to schedule sessions that are to be shown once a day at a particular time over the period of, for example, a month. Application code may be required to record the duration of the session 204 as well as the time slot 206 during the day in which the session is to be broadcast. Furthermore, when a session is periodic, an end date 208 may be required to automatically stop broadcasting the daily session after a particular day. This periodic control can be used to repeatedly show prerecorded sessions during certain time slots. Furthermore, the periodic control in the scheduler can control the broadcast of a particular live real-time session at a particular time.
Sporadic control application 210 is configured to place certain sessions into particular time slots sporadically. These sporadic controls, like the periodic control, would need the session duration application 212 and a time slot application 214 to define the length of the session and the time slot into which the session may be placed. Unlike the periodic control, however, the sporadic control would only need the date on which the single session would be broadcast, using date application 216.
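The periodic and sporadic scheduling records described above can be sketched roughly as follows. This is a hypothetical Python sketch; the field names and the `airs_on` helper are illustrative assumptions, not the actual application code.

```python
from dataclasses import dataclass
from datetime import date, time, timedelta
from typing import Optional

@dataclass
class ScheduledSession:
    duration: timedelta             # session duration (cf. 204/212)
    time_slot: time                 # time of day the broadcast begins (cf. 206/214)
    # A periodic task repeats until an end date (cf. 208); a sporadic task
    # carries the single date on which it is broadcast (cf. 216).
    periodic: bool = False
    end_date: Optional[date] = None   # periodic tasks only
    air_date: Optional[date] = None   # sporadic tasks only

    def airs_on(self, day: date) -> bool:
        """True if this session is broadcast on the given day."""
        if self.periodic:
            return self.end_date is None or day <= self.end_date
        return day == self.air_date
```

A daily program with an expiration date would set `periodic=True` and `end_date`, while a one-off broadcast would set only `air_date`.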
The scheduler application 122 further includes the merger application 218 configured to merge and set the schedule for the sessions. Editor application 220 allows the primary server to edit the schedule. Adding application 222 is configured to allow the server to add new sessions, either periodic or sporadic, to the schedule for broadcast. Delete application 224 may allow the server to remove certain sessions from a particular time slot or a particular series of time slots. Rearrange application 226 is also included in the schedule editor to allow the server to rearrange sessions in certain time slots.
Circular buffer 130-j is configured to receive, store, and transmit video and audio data in and out of data locations 228 in memory. The configuration allows for data to be written into certain data locations as it is received from a video/audio recorder 106 (Figure 4), while the same data is read and transmitted after it is stored. The incoming continuous media is first supplied to one of the encoder cards 113-i and then to the encoding adapter 115 at a bit rate of bj. The formatted data is then written to the appropriate circular buffer.
In operation, the write pointer 230 is configured to consecutively record data into data locations 228 of the circular buffer 130-j as it is received during a session performance. In a preferred embodiment, a distinct circular buffer is allocated to correspond to each particular live session in progress and being recorded. Once the data is recorded by the write pointer, a
first read pointer 234 can read data out of the data locations after they have been stored by the first write pointer 230.
In this embodiment, there is only one write pointer per circular buffer, as each session is written to its own buffer, although these could alternatively be arranged to have several different write pointers. The read and write pointers do not need to have a one-to-one correspondence; there will generally be many readers accessing a particular buffer, each with a distinct and independent read pointer. Thus, a number of read pointers may read data written by a single write pointer. Similarly, in alternate embodiments with multiple write pointers, a read pointer may read data written by a plurality of write pointers. In operation, each read pointer can read the data after it is stored by the write pointer. The various read pointers continue to independently follow the write pointer in time, reading data after it is stored. This data is subsequently transmitted, or broadcast, through the modem 120 to network 102 and, ultimately, to end-users viewing the live session. In order to save data space, the write pointer 230 is allocated a predetermined finite amount of space, determined by the size of the circular buffer, to record data. After the end of this space is reached, the write pointer overwrites the earlier written data locations.
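A minimal sketch of this write-pointer/read-pointer arrangement, under the assumption of one write pointer per buffer and independent per-user read pointers (class and method names are illustrative, not the patented implementation):

```python
class CircularBuffer:
    """One buffer per live session; a single write pointer wraps around,
    overwriting the oldest data locations once the space is exhausted."""
    def __init__(self, size):
        self.size = size
        self.data = [None] * size   # the data locations (cf. 228)
        self.write_pos = 0          # absolute count of items written

    def write(self, item):
        self.data[self.write_pos % self.size] = item
        self.write_pos += 1

class Reader:
    """An independent read pointer allocated to one end-user."""
    def __init__(self, buf, start=0):
        self.buf = buf
        self.read_pos = start

    def read(self):
        # Only data already stored by the write pointer may be read.
        if self.read_pos >= self.buf.write_pos:
            return None             # caught up with the writer
        item = self.buf.data[self.read_pos % self.buf.size]
        self.read_pos += 1
        return item
```

Because each `Reader` advances independently, many end-users can follow the same live session at slightly different points in time, limited only by the buffer size.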
The buffer size Sj for the encoder Ej is determined by the video bit-rate bj from the encoder and can be expressed as Sj = bj · t, where t equals the time duration of the video clip that is saved in the memory and therefore corresponds to the maximum delay time of the buffer. The larger the value of t, the greater the variability that the different users' read pointers are allowed in their read rates. Thus, a larger value for t will make the transmission more stable. However, a larger value of t can result in longer delay times and larger memory requirements. Therefore, a value for t is usually taken to be as small as is consistent with allowing enough overhead for the majority of the users' requirements. For most applications, a few seconds is a suitable value. By using the information on users' requirements from the scheduling process, the value of Sj can be optimized.
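The sizing relation Sj = bj · t can be illustrated with a small helper. The function name is hypothetical, and the 1.5 Mbit/s MPEG1-class rate with a 4-second delay window is an assumed example, not a figure from the specification:

```python
def buffer_size(bit_rate_bps: float, max_delay_s: float) -> int:
    """Circular buffer size S_j = b_j * t, in bits, where t is the
    maximum delay time the buffer can absorb."""
    return int(bit_rate_bps * max_delay_s)

# An assumed 1.5 Mbit/s stream with a 4-second window needs
# 6,000,000 bits (750 kB) of buffer space:
size_bits = buffer_size(1_500_000, 4)
```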
As a practical matter, it is possible that a read pointer will fall behind a write pointer by more than the maximum delay time. The problem that occurs is that the write pointer reaches the end of its allocated memory space while the read pointer is still reading information from earlier memory locations allocated to the write pointer. In live sessions, however, the data keeps flowing in and must be stored. If the read pointer is passed by the write pointer, the streamed data to the user will jump ahead by a time t and miss the content in the buffer at that moment. In this event, if the write pointer is anticipated to write over information that has not yet been read by the corresponding read pointer, the read pointer is reset to subsequently written locations allocated by the corresponding write pointer. From here, the read pointer can continue reading and transmitting information to the end-user. In an optimized application, the resetting of the read pointer would be minimized so that a quality video and audio presentation can be delivered to the end-user. The end-user is the ultimate bottleneck. Each read pointer is allocated to a particular end-user and reads data according to the end-user's ability to receive the data. For example, if the read pointer lags the write pointer by an amount near the maximum delay time and the user has buffer space, the transmission rate to this user can be increased. This depends to a great degree on the speed of the end-user's system, the end-user's ability to buffer the data it receives, the consistency of the connection between the end-user and the primary server via network 102, and many other factors that may occur between the primary server and end-users.
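The reset policy described above amounts to a simple check on the reader's lag behind the writer. This is an illustrative sketch; the function name and the exact reset target are assumptions:

```python
def next_read_position(read_pos: int, write_pos: int, buf_size: int) -> int:
    """If the writer has lapped the reader (lag exceeds the buffer size),
    the data at read_pos has been overwritten; reset the read pointer to
    the oldest location still present. The user skips ahead by up to t."""
    lag = write_pos - read_pos
    if lag > buf_size:
        return write_pos - buf_size   # oldest data still in the buffer
    return read_pos                   # still safe; no reset needed
```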
In one embodiment, a multitude of write pointers can be allocated to separate sessions, where each write pointer records data from a particular session to its allocated memory locations.
Additionally, a multitude of readers can read data from the locations 228 corresponding to a selected write pointer, which in turn corresponds to the session that the end-user has selected.
In operation, an end-user 134, Figure 4, may select a particular session from the scheduling application 150 and send the request to the primary server 112 so that the user can participate in the live session. In response, the primary server 112 will allocate a read pointer, for example first read pointer 234, to the end-user so that data can be read from data locations 228. A write pointer 230 will write video and audio data from the selected session into data locations 228 in the corresponding buffer 130-j, within which a finite number of allocated memory spaces is accessed by the write pointer 230 to record the
session data. After the session has commenced, data will have been written into data locations 228 that pertain to the write pointer and are available to the first read pointer for transmission, or broadcast, of the session data to the end-user.
In the event that the end-user is unable to receive data fast enough, such that the write pointer is catching up to the read pointer in the allocated data locations, the first read pointer 234 will be reset so that some data will be skipped and not broadcast to the end-user. Throughout the broadcast, the first read pointer 234 and write pointer 230 will be continuously synchronized by the system so that the read pointer stays behind the write pointer when reading consecutively recorded data, but does not fall so far behind that its data is overwritten.
The above described operations of the primary server 112 can be mimicked by the proxy server 156 in one embodiment. Allocated to certain users or user groups, a proxy server can perform the same functions as the primary server. The proxy server can even be configured to receive video and audio data from a video recorder 106. This is illustrated in Figure 4. In other embodiments, the primary, proxy, and VOD servers can be combined into a single module to perform any or all of the operations of the individual servers.
Referring again to Figure 7, a more detailed illustration of a user computer is shown. The session application can include a decoder 400 that is configured to decode the video and audio data as it is received from the primary, proxy or VOD servers. The decoder can be configured to decode MPEG1, MPEG2 and other video encoder formats. The play application 404 is included in the user computer to play audio, video and text data sent by the servers. Interact application 405 is included to enable a user to interact with a live presentation by sending input data. This input data can be in the form of audio, video or text data. The VOD application 154 includes applications for reviewing the schedule 406, ordering video presentations from the VOD server 408, previewing videos from the server, and playing videos from the server 412. The apparatus and method described above comprise a flexible system that delivers real-time video and audio data signals to an end-user. Although this embodiment is described and illustrated in the context of a real-time video and audio conference system, the scope of the invention extends to other applications where efficient real-time data transfer between systems is
useful. Furthermore, while the foregoing description has been with reference to particular embodiments of the invention, it will be appreciated that these are only illustrative of the invention and that changes may be made to those embodiments without departing from the principles of the invention. Several application cases are now discussed. To use the described Internet real-time video caster embodiment, the system will have Internet access and a video encoder card installed that encodes in, for example, an MPEG1 or MPEG2 format.
A first application is a real-time video broadcasting through the Internet embodiment. When a user wants to provide a video through the Internet at a certain time, the user should connect the video source to the encoder card and use the program scheduler to set the program begin time, end time, and the video encoder parameters. When it is time to begin the program, the scheduler starts the video encoder automatically. Users interested in the broadcast video can connect to the server and receive the video. Using the program scheduler, the user can also set the video program to start and stop on a daily, weekly, or monthly basis.
Another application is real-time monitoring through the Internet or another network. Besides the entertainment field, the present invention can also be used in traffic monitoring, security, video conferencing and so on. The user may set up a video camera at the place they want to monitor and connect the video source to the video encoder. After the system starts running, the user may go anywhere that has Internet access and use the client to watch the monitoring video. A further application is in real-time creation. As mentioned before, using the discussed system, a user can write video data to another video-on-demand server while broadcasting a program. To do this, the user should enable this function and input the IP address of the remote media server, his login name and the video clip duration that will be created on that server. After the encoder is started, the system will connect with the remote server and send video data to it. A video-on-demand server would save this incoming data as a new video file. Users connected to the video-on-demand server can then use a video player to play the file even while it is still being created. If the real-time Internet video caster is broadcasting a live football game, a football fan who connects to the VOD server in the middle of the game will be able to browse the portion of the game he has already missed, up to the very latest action.
An additional application is video broadcasting without using the scheduler. A user may go directly to the encoder control panel in the system to start and stop the encoder. Also, in the control panel, there are options that enable the user to set the encoder properties such as bit-rate, hue, contrast, and so on. Users can also input the IP address of the remote server in this panel. After all these settings are made and the encoder is started, the video data will be available to clients connected to the system and a video file will be sent to remote servers as requested by the user.
Some detailed procedures are now presented for some of the aspects of the present invention described above. The first of these are procedures for scheduling a video program task.
Figure 8 shows the steps that are followed to add a task to the program schedule in one embodiment. In step 801, a set of parameters is input. From the user interface (UI), each user inputs the parameters for the program. The parameters the user inputs may include three types. The first of these are the time parameters, including information such as start time, duration, frequency and expiration time. The second type comprises the encoder parameters, such as stream type, bit-rate, resolution, hue, contrast, and so on, which are sent from the server to the user. Normally, when a clip is sent to a user, it includes a header containing these various encoder parameters. However, as the present system allows the user to connect to a clip, such as a live broadcast, that has already begun, a user joining at this later time would normally not receive these parameters. Therefore, under the present process, the encoder parameters are supplied at this step. If a schedule is being updated or otherwise altered, these parameters may have previously been supplied. The third type comprises the remote server parameters, used if the user wants to write video data to a remote media server or proxy server. These could include the IP address, password, and login name for that server.
Step 803 is the validation process. Each individual encoder in the system has its own scheduler list. According to the time parameters of the input task, the scheduler compares the task against those already in the list to see if there are any conflicts, as shown in Figure 2. If there is any conflict, the input task is either rejected or the conflict is resolved by other rescheduling; otherwise, the input task is added to the schedule list.
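The conflict check in step 803 amounts to an interval-overlap test against the encoder's existing task list. A hedged sketch follows; the tuple layout and function names are assumptions, not the actual scheduler code:

```python
from datetime import datetime, timedelta

def conflicts(start_a: datetime, dur_a: timedelta,
              start_b: datetime, dur_b: timedelta) -> bool:
    """Two broadcast intervals conflict if they overlap in time."""
    return start_a < start_b + dur_b and start_b < start_a + dur_a

def validate(task, task_list) -> bool:
    """Accept the input task only if it overlaps no task already
    scheduled on this encoder; each task is a (start, duration) pair."""
    start, dur = task
    return not any(conflicts(start, dur, s, d) for s, d in task_list)
```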
In step 805, the scheduler sorts the task list and gets the program nearest in time for each encoder. The scheduler then sets the system timer according to the time parameters of the nearest program in step 807 and transmits the continuous media over the network in step 809.
Once this transmission concludes, the schedule is checked in step 811 to see if there are more programs on the schedule. If so, the timer is set for the next program. Once the schedule is empty, the process stops until additional programs are scheduled. The schedule may be altered by the user while a particular program is being transmitted, in which case the new programs would go through the validation and sorting processes before the timer is reset.
Figure 9 shows procedures for creating a video file on a remote server. In order to create a video file on a remote video-on-demand server while the Internet real-time video caster is broadcasting video, the user must enable this function in step 901 and input the IP address, login name and password of the remote server. When the encoder is started, in step 903 the creation module starts a read client and a write client. In step 905, the read client connects to the caster system as a common client does. The client-session-handling module distributes data among all clients, and thus the read client gets data in step 907. The write client connects to the remote server in step 909 according to the IP address, login name and password provided by the user. The write client in the creation module gets data from the read client and writes the data to the remote server in step 911. The remote server creates a video file and writes to this file in step 913 using data from the write client connected from the caster server.
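The read-client/write-client relay of Figure 9 can be sketched as follows, with an in-process queue standing in for the network connections. All names are illustrative; socket handling, login, and real file I/O are omitted:

```python
from queue import Queue

def relay(caster_stream, remote_file: list, chunks: int) -> None:
    """Pull session data from the caster (read client) and forward it to
    the remote VOD server (write client), which appends it to a growing
    video file (here simply a list)."""
    q: Queue = Queue()
    # Read client: receives data like any other session client (steps 905/907).
    for _ in range(chunks):
        q.put(next(caster_stream))
    # Write client: forwards data to the remote server (steps 909/911),
    # which writes it into the new video file (step 913).
    while not q.empty():
        remote_file.append(q.get())
```

Because the remote file grows as the relay runs, a viewer connected to the VOD server can begin playing it before the broadcast is finished, as described above.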
Procedures for starting an encoder are given in Figure 10. This starts in step 1001 when the system gets the video parameters from the configuration supplied by the host and initializes the encoder. The encodings that the system will support are entered, for example, as part of C++ code. The system starts the encoder by calling the application programmer's interface (API) function in step 1003. In step 1005, the encoder begins to write data to the encoder buffer through callback functions. The encoder adapter then allows the different cards to pass the data in their differing formats to the buffer, where it can be stored in its original form. After an encoder is started, the session-client-handling module is started in step 1007. This enables users to connect to the video server through the Internet and receive real-time video data.
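The callback arrangement of steps 1003 and 1005 can be sketched as follows. The names `EncoderBuffer`, `makeBufferCallback`, and `startEncoder` are hypothetical; a real vendor API would drive the callback from capture hardware, while here a stub emits two chunks so the sketch runs on its own.

```cpp
#include <functional>
#include <vector>

// Shared buffer the encoder writes into and the session-client-handling
// module later reads from; data is kept in its original format.
struct EncoderBuffer {
    std::vector<unsigned char> data;
};

// Callback signature the (hypothetical) encoder API invokes with each
// chunk of encoded video.
using EncoderCallback = std::function<void(const unsigned char*, size_t)>;

// Adapter role: whatever format a particular capture card produces, the
// chunk is appended to the buffer unchanged (step 1005).
EncoderCallback makeBufferCallback(EncoderBuffer& buf) {
    return [&buf](const unsigned char* chunk, size_t len) {
        buf.data.insert(buf.data.end(), chunk, chunk + len);
    };
}

// Stand-in for the API call of step 1003: the real function would start
// the capture hardware; this stub just pushes two chunks through the
// callback to demonstrate the flow.
void startEncoder(const EncoderCallback& cb) {
    unsigned char a[] = {0x00, 0x01};
    unsigned char b[] = {0x02, 0x03, 0x04};
    cb(a, sizeof a);
    cb(b, sizeof b);
}
```

Because the adapter only appends bytes, cards with differing output formats can share one buffer, which is what lets the stored data remain in its original form.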
Application Domains
As already noted, the described structures and methods are suitable for continuous media services besides video, and are deliverable over the Internet, intranets, and other network environments. This includes (but is not limited to) video-on-demand service, video mail service, movie-on-demand service, etc.
Additionally, although this discussion has focused on streaming continuous media, these techniques extend to non-continuous data. This is particularly so where the amount of data being transmitted for a single media data title, although not continuous, is very large. An example is the transmission of an image, for example a high-resolution X-ray. Here the amount of data may be of sufficient size that it is more practical to transmit the particular media data title broken up into blocks, as is done in the continuous case. The limits on transmitting this data then become the same as in the continuous case, with similar storage and transmission bandwidth concerns. The media delivery system as described herein is robust, operationally efficient and cost-effective. In addition, the present invention may be used in connection with presentations of any type, including sales presentations and product/service promotion, which provides video service providers with additional revenue sources. The processes, sequences or steps and features discussed herein are related to each other, and each is believed independently novel in the art. The disclosed processes and sequences may be performed alone or in any combination to provide a novel and nonobvious file structure system suitable for a media delivery system. It should be understood that the processes and sequences in combination yield an equally independently novel combination as well, even if combined in their broadest sense.
Other Embodiments
The invention has now been described with reference to specific embodiments. Other embodiments will be apparent to those of skill in the art. In particular, a user's digital information appliance has generally been illustrated as a personal computer. However, the digital computing device is meant to be any device for interacting with a remote data application, and could include such devices as a digitally enabled television, a cell phone, a personal digital assistant, etc.
Furthermore, while the invention has in some instances been described in terms of client/server application environments, this is not intended to limit the invention to only those logic environments described as client/server. As used herein, "client" is intended to be understood broadly to comprise any logic used to access data from a remote system and "server" is intended to be understood broadly to comprise any logic used to provide data to a remote system. It is understood that the examples and embodiments described herein are for illustrative purposes only and that various modifications or changes in light thereof will be suggested by the teachings herein to persons skilled in the art and are to be included within the spirit and purview of this application and scope of the claims and their equivalents.
Embodiment in a Programmed Information Appliance
As shown in Figure 11, the invention can be implemented in hardware and/or software. In some embodiments of the invention, different aspects of the invention can be implemented in either client-side or server-side logic. As will be understood in the art, the invention or components thereof may be embodied in a fixed media program component containing logic instructions and/or data that, when loaded into an appropriately configured computing device, cause that device to perform according to the invention. As will be understood in the art, a fixed media program may be delivered to a user on a fixed media for loading in a user's computer, or a fixed media program can reside on a remote server that a user accesses through a communication medium in order to download a program component.
Figure 11 shows an information appliance (or digital device) 1400 that may be understood as a logical apparatus that can read instructions from media 1417 and/or network port 1419. Apparatus 1400 can thereafter use those instructions to direct server or client logic, as understood in the art, to embody aspects of the invention. One type of logical apparatus that may embody the invention is a computer system as illustrated in 1400, containing CPU 1407, optional input devices 1409 and 1411, disk drives 1415 and optional monitor 1405. Fixed media 1417 may be used to program such a system and may represent disk-type optical or magnetic media, magnetic tape, solid state memory, etc. The invention may be embodied in whole or in part as software recorded on this fixed media. Communication port 1419 may also be used to initially receive instructions that are used to program such a system and may represent any type of communication connection.
The invention also may be embodied in whole or in part within the circuitry of an application specific integrated circuit (ASIC) or a programmable logic device (PLD). In such a case, the invention may be embodied in a computer understandable descriptor language which may be used to create an ASIC or PLD that operates as herein described.