WO1998037699A1

WO1998037699A1 - System and method for sending and receiving a video as a slide show over a computer network

Info

Publication number: WO1998037699A1
Application number: PCT/US1998/003904
Authority: WO
Inventors: Kenneth W. Colby; Brian Kenner; Guy P. Weathersby; Lonnie J. Brownell; Peter K. Flynn
Original assignee: Intervu, Inc.
Priority date: 1997-02-25
Filing date: 1998-02-25
Publication date: 1998-08-27
Also published as: AU6672298A

Abstract

A system and method for encoding and decoding digitized audio/video files prepares a slide show of still images and a low bit rate audio stream which can be downloaded in real time over a typical connection to a computer network. The quality of the audio/video file is subsequently improved by downloading in successive passes the remaining video frames, which are restored to their original order, and the original high-quality audio content.

Description

SYSTEM AND METHOD FOR SENDING AND RECEIVING A VIDEO AS A SLIDE SHOW OVER A COMPUTER NETWORK

The invention relates to a system and method whereby a digitized audio-video file is reconfigured and downloaded over a computer network to a user terminal in successive passes of data, so that during or after each pass, the user can see and hear the audio-video file with increasing quality. In a preferred embodiment, the audio-video file can be viewed as a high quality slide show with low bit rate audio during the download process and replayed as a video with full audio after completing the download process.

BACKGROUND OF THE INVENTION

Video data has extremely high storage and bandwidth requirements. In order to reduce the bandwidth required to transmit video data, digitized video files can be compressed to reduce the data comprising the video file. During the process of video compression, video information is deleted that would be imperceptible to the human eye. As more video data is deleted the size of the video file decreases and the bandwidth required to deliver the video file is reduced. A variety of methods and protocols exist for compressing digitized video files and are well known in the art.

MPEG (Moving Picture Experts Group) is regarded by many as the standard for digital video compression. Videos produced in the MPEG format and played at a rate of 24 frames per second provide high quality, high resolution video and high quality audio.

MPEG video files, like other compressed video files, are still rather large compared to smaller text and graphic files, and can take from several minutes to hours of constant data flow to download. High capacity host/client architecture capable of high storage and transmission rates is required to transmit and receive this data error-free without corruption or loss of data. In a distributed computer network such as the Internet, it is difficult, if not impossible, to provide a host/client architecture which has the capacity for accurate, sustained, high speed transmission of large audio/video files.

Even where the capacity of the network distribution system is improved to permit more data transfer, a bottleneck typically occurs at the user modem which establishes the connection between the user and the network. A typical user modem only receives data at a rate of 28.8 kilobits per second. A 30 second MPEG video can take 5 minutes or more to download over a 28.8 modem. Because the data is often transferred from afar, many factors can cause the loss of parts or all of a transmission, thus slowing the receipt as re-transmission of the lost data occurs.
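As a rough check on these figures, the sketch below computes the ideal transfer time of a file over a modem link. The file size is an assumed illustration (the text does not state the size of the 30 second clip), and protocol overhead and re-transmission are ignored, so actual times would be longer.

```python
def download_seconds(file_bytes: int, link_bits_per_second: float) -> float:
    """Ideal transfer time: total bits divided by the link rate (no overhead)."""
    return file_bytes * 8 / link_bits_per_second


# Assumed example: a 1,080,000-byte clip over a 28,800 bit-per-second modem.
seconds = download_seconds(1_080_000, 28_800)
print(f"{seconds / 60:.1f} minutes")
```

Under these assumptions a clip of roughly one megabyte already occupies the link for about five minutes, which is consistent with the figure given above.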

Real time video delivery has even more specific and stringent transfer and display timing requirements. In this case, the user wants to be able to view the video at the user terminal while the video data is being downloaded. In order to do this, the line between the user terminal and the server must have enough bandwidth to accommodate a steady stream of data comprising all the information necessary for playing the video. If the bandwidth is not available, the data stream will be delayed during the download and there will be insufficient data available at the user terminal to play back the video in real time, as it was originally encoded. As a result, the user will observe interruptions and delays in the video and audio content.

One attempt to improve real time video delivery has been to further compress the video. To accomplish this, some video content providers compress the video data by encoding at a slower frame rate of 6-7 frames per second (fps) and encoding the audio data at a lower bit rate, thereby deleting large portions of content. The resulting video has poor quality and very choppy motion and the sound quality is poor. The video and audio data which is deleted during this compression process is permanently lost. Therefore, even if the download is successful, the quality of the video cannot be improved; it will look and sound just as poor on subsequent replays. Even at this reduced size, the video may consist of more data than can be transmitted at the necessary viewing speed (in real time) over a 28.8 kbaud modem, so that picture and sound quality is further degraded when the user views it.

Another solution involves a compression format wherein data can be added to a video file during transmission to progressively improve the image. As the video file is being downloaded, the content server is continuously testing the bandwidth of the network link to the user and making decisions on a frame-by-frame basis whether to pass more or less data to the user. As more bandwidth becomes available, more data can be passed down and the quality of the video image and audio is improved. Like the previous example, the video data is lost and cannot be recovered once the video file is downloaded. The resulting video is of uneven quality, and subsequent replays will look and sound the same.

Neither solution provides a means to transmit meaningful and entertaining audio/video data to a user in real time that gives the user the option to replay the video in its original format, i.e., a high quality video with high quality sound. The invention solves this problem by providing a method and system whereby a digitized audio-video file can be reconfigured and downloaded over a computer network to a user terminal where it can be viewed as a high quality video slide show with low bit rate audio during the download process and replayed as a full-motion video with high quality audio after completing the download process.

SUMMARY OF THE INVENTION

In a first embodiment, the audio portion of an original audio-video (AV) file is compressed into a low bit rate (LBR) audio data stream by means known in the art. The order of the individual frames comprising the original video data stream is then rearranged. In a first pass, a frame selector module is used to select individual video frames from among all the frames comprising the original video data stream. These frames will be stored at the front end of a reconfigured AV file along with the LBR audio stream. In subsequent passes, the remaining video frames are selected. The video frame data, LBR audio data stream and audio data stream of the original AV file are then assembled as an AV file having a selectively reordered download sequence and stored for delivery at a server site. When a video clip is requested by a client, the server downloads the video data to the client according to the selectively reordered sequence. As the "front-loaded" portion of the new AV file is downloaded, the client is able to view a comprehensive audio/video slide show representative of the whole video.

The "front-loaded" portion of the new AV file comprising the slide show is orders of magnitude smaller than the original AV file (Fig. 1). Thus, even when the bandwidth available for transmission is limited, as is the case with a 28.8 kbaud modem, a high quality video slide show with audio can still be displayed during the download process because the data stream required to support the slide show and compressed audio is much smaller.

Once the slide show frames and LBR audio portion of the new AV file have been downloaded, the remaining video frames and the original audio data stream are downloaded in stages. The client software displays the front loaded data as a slide show during the download process and then resequences the front-loaded data and remaining video frames into the original order. This makes it possible for the client's player to replay portions of the video clip as a low frame rate video during download. If all of the AV data is downloaded, the client software can display the video in its original format and speed with the originally recorded audio quality.

In a second embodiment, the audio portion of an original AV file is highly compressed into an LBR audio data stream by means known in the art. A reconfigured AV file is created consisting of the LBR audio data stream, the original audio data stream and a resequenced video data stream. The frame selector module is used to determine different download orders of video frame data for a variety of given connection speeds. A corresponding index file is created for each download order. The index file records both the download order and information for locating the video data in the new AV file for reassembly in the original order. A frame sequencing interface (FSI) is responsible for delivering AV files from the server to the client. The FSI, among other functions, reads the index file that matches the client's connection speed and downloads the video frame data to the client according to the order recorded on the index file.

In both embodiments, the client software downloads the file until the entire AV file is delivered or the user discontinues the download. As each pass of video data is downloaded, the client software reshuffles the data into its original temporal order making it possible for the client to display the video data with progressively improved quality. Regardless of the number of frames downloaded, each frame is displayed with the full quality of the originally recorded video file. If all the video data is downloaded the video can be displayed in its originally recorded condition with high quality audio.

In both embodiments, the user has the option to stop transmission of a reconfigured AV file at any point. The user can elect, for example, to see only the first frame of the video, to view part or all of a slide show with LBR audio, to view a high quality video with LBR audio, or to view a progressively higher quality video with LBR or originally recorded sound. Thus, the user does not have to use up valuable bandwidth or time waiting for or viewing video content that does not significantly enhance the viewing experience.

In one embodiment, the client software is configured to permit the full download to occur in the background so the user can perform other operations during the download process. Once the video is completely downloaded, the user can be signaled, and can replay the high quality video. The client software can also interrupt, delay, and later resume the download process when it senses competition for the communication interface.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is a graph comparing the size (in bytes) of an MPEG video file, a low frame rate video with low bit rate audio, and a video slide show with low bit rate audio;

Fig. 2 is a block diagram representative of a standard MPEG audio/video decoder;

Fig. 3 is a block diagram of a video delivery system according to the invention;

Fig. 4 is a flowchart illustrating the operation of a transcoder module according to Fig. 3;

Fig. 5 is a flowchart illustrating the operation of a frame selector module according to Fig. 3; and

Fig. 6 is a flowchart illustrating the operation of the video delivery system of Fig. 3.

DEFINITIONS

The terms used herein have their ordinary meaning in the art, and in addition, specific terms set forth below have the meanings given.

Slide Show. A sequence of visual images or frames presented as a condensed or slow-motion version of a video presentation or clip. In an embodiment of the invention, a slide show comprises a sequence of video frames taken from an original full motion audio/video data file, rearranged and adjusted in timing and sequence so as to make an attractive and synchronized presentation. A slide show may be presented with or without accompanying audio content.

Video Clip. A video clip is a sequence, of any length, of images, with or without audio content (sound), defining a moving picture or animation.

Audio/Video Data File. An audio/video data file is a digitized computer file representative of a video clip. The audio/video data file can be in any machine readable format and can be compressed, or reduced in size, by any of several known compression techniques, such as MPEG.

Video Data Stream. A video data stream is that portion of an audio/video data file attributable to the storage of visual images. A video data stream typically comprises at least one sequence of video frames, in presentation or viewing order, or indexed to represent a viewing order. Other possible portions of an audio/video data file include an audio data stream and a system stream, such as a timing stream or an index representative of a viewing order.

Audio Data Stream. An audio data stream is that portion of an audio/video data file attributable to the storage of audio content. An audio data stream may be made up of a sequence of audio frames.

Video Frame. A video frame is a single static image taken from a video clip. A sequence of video frames, viewed in fast succession, provides an illusion of motion.

Audio Frame. An audio frame is a time-divided portion of an audio data stream. Audio frames typically are used for simplicity in handling and processing audio data streams; there is no necessary relationship between individual audio frames and individual video frames. Moreover, individual audio frames may vary in length.

Reconfigured Audio/Video (RAV) File. An RAV file is produced from an audio/video data file, which may be referred to as an original or source file, and includes a video data stream having video frames in a different presentation or viewing order than the original audio/video data file. An RAV file may have one or more video data streams and one or more audio data streams, one of which may be LBR audio. An RAV file may be produced or displayed in one or more passes, and may have less than, more than, or the same audio and video information as the original audio/video data file.

Presentation Order. A presentation order is an order, or sequence, in which audio or video frames are stored in an audio/video data file. In certain compression schemes, such as MPEG, the presentation order of certain video frames may differ from the viewing order, as certain video frames are decoded based on information in other video frames which have not yet been displayed.

Viewing Order. A viewing order is an order, or sequence, in which audio or video frames are displayed. Viewing order may differ from presentation order.

Low Bit Rate (LBR) Audio. LBR audio is highly-compressed sound information derived from the audio content of an original audio/video data file. In an embodiment of the invention, LBR audio frames are interleaved with video frames comprising a slide show, so that both the slide show video frames and the LBR audio frames can be downloaded simultaneously and displayed in real-time; the original (non-LBR) audio data stream can be downloaded at a later time.

Low Frame Rate Video. A low frame rate video is a slow motion or reduced-quality version of an original video clip. An audio/video data file representing a low frame rate video includes a subset of the video frames included in the original audio/video data file.

Transcoder Module. In an embodiment of the invention, a transcoder module is a combination of computer hardware and software that decodes an audio/video data file, extracts its video stream and audio stream, and optionally compresses the audio stream into LBR audio.

Frame Selector Module. In an embodiment of the invention, a frame selector module is a combination of computer hardware and software that allows certain video frames to be selected from an audio/video data file for use in a slide show or low frame rate video. Information taken from the selection process is used to generate an RAV file or an index file.

Frame Sequencing Interface (FSI). In an embodiment of the invention, an FSI is used to generate a second version of an RAV file, having a different viewing order or presentation order, from a first RAV file and an index file. The second version can then be transmitted over a communication link having different properties than the one for which the first RAV file was created.

User Terminal. A computer system capable of displaying audio/video data files. A user terminal may be coupled to a communications network.

Server. A computer system coupled to a communications network, capable of transmitting (downloading) stored information to another computer system coupled to the network.

DETAILED DESCRIPTION OF THE INVENTION

For purposes of definition, the term video data, as used herein, can mean both video frame data and audio frame data or just video frame data. To display video data means to process an audio-video file in a computer so video images are displayed on the computer monitor and corresponding audio is broadcast on the computer speakers. The term playback or played back has the same meaning as display.

The invention as described below and in each of the following examples is discussed in terms of its application to the delivery of video data in the MPEG format, but the scope of the invention is not limited to the MPEG format or to the examples given. MPEG is one protocol for compression of digitized video. There are a number of other compression protocols and formats which are used to reduce the size of an AV file, e.g., JPEG, H.261, Indeo, Cinepak, AVI, QuickTime, TrueMotion and Wavelet. The invention can easily be adapted by one skilled in the art to reconfigure video data compressed by any of these methods, and such adaptations are within the scope of the invention.

Regardless of the compression protocol used, the corresponding AV file would comprise an original audio data stream, an original video data stream and a user stream containing information related to the synchronization and playback of the audio/video streams. The video data stream consists of encoded information for video frames comprising all of the picture information for a given video. The video frames are arranged in a preselected order so that when they are processed by a video player at a certain speed (frames per second) a full-motion video can be displayed.

In the MPEG format, a discrete cosine transform compression algorithm is used to identify and delete redundant video information both between frames and within an individual frame. The video stream of an MPEG movie comprises a series of video frames flanked by a header sequence and an end-of-sequence code. Much of the information in a frame within a video sequence is similar to information in the previous or subsequent frame. The MPEG standard takes advantage of this temporal redundancy by representing some frames in terms of their differences from other (reference) frames.

The MPEG standard specifically defines three types of frames: intra, predicted, and bidirectional. Intra (I) frames are coded using only information present in the frame itself and are present at unpredictable points within the sequential frames of compressed video data. Predicted (P) frames are coded with respect to the nearest previous I or P frame. Bidirectional (B) frames are frames that use both a past and future frame as a reference. I and P frames both serve as reference frames for B frames. B frames are never used as a reference.

The frequency and location of I frames is based on the need for random accessibility and the location of scene cuts in the video sequence. Where random access is important, I frames are typically used two times a second. The MPEG encoder reorders the sequence of frames in the video stream to present frames to the decoder in the most efficient sequence. In particular, the I or P reference frames needed to reconstruct B frames are sent before the associated B frames.
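The reordering described above can be illustrated with a short sketch. It is a simplified model, not an MPEG encoder: frames are represented only by their type letters, the function name is hypothetical, and each B frame is simply held back until the next reference frame (I or P) in display order has been emitted.

```python
def display_to_coded_order(frame_types):
    """Return frame indices in transmission (coded) order.

    frame_types lists 'I', 'P' or 'B' in display order. A B frame needs the
    following reference frame, so that reference is emitted first.
    """
    coded, pending_b = [], []
    for index, kind in enumerate(frame_types):
        if kind == "B":
            pending_b.append(index)      # wait for the next reference frame
        else:
            coded.append(index)          # emit the I or P reference frame
            coded.extend(pending_b)      # then the B frames that depend on it
            pending_b.clear()
    coded.extend(pending_b)              # trailing B frames, if any
    return coded


order = display_to_coded_order(list("IBBPBBP"))
print(order)
```

For the display sequence I B B P B B P, the sketch sends each P frame ahead of the two B frames that precede it in display order.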

The MPEG audio stream is similar to the MPEG video stream in that it contains an audio header sequence and one or more audio frames. It should be noted that individual audio frames do not necessarily correspond to individual video frames. Audio frames are simply "packetized" versions of the audio data, that is, the audio data stream divided into frames by any convenient or useful means. For example, a particular audio compression scheme used to create LBR audio might create frames of substantially equal size, but unequal duration. In contrast, video frames typically have substantially equal duration but unequal size (in particular, I frames are typically larger than P and B frames).

The timing mechanism that ensures synchronization of audio and video includes two parameters: a system clock (SC) and presentation time stamps (PTS). The values for these timing mechanisms are coded in the MPEG bitstream. PTS are samples of the system clock that are associated with an individual video frame or audio frame. The PTS indicates the order and timing in which the video frame is to be displayed or the starting playback time for the audio frame.

The MPEG AV file consists of both a compression layer and a system layer. The audio and video data streams comprise the compression layer. The system layer contains timing and other information needed to demultiplex the audio and video data streams and to synchronize audio and video during playback.

Fig. 2 shows a generalized decoding system for MPEG videos. The system decoder is responsible for extracting the timing information from the MPEG system stream and sending it to the other system components. The system decoder also demultiplexes the video and audio streams from the system stream and sends the data to the appropriate audio or video decoder. Chapter 10 of Video Demystified by Keith Jack, High Tech Publications, 1996, provides a file format for implementing an MPEG video player that is incorporated by reference and can be adapted for use in the video delivery system described herein.

EXAMPLE ONE

A preferred embodiment of the video delivery system allows a user to download a video clip in four passes, the first of which occurs in real time. The system and method according to which the video delivery is performed are discussed in detail below.

With reference to Fig. 3, a reconfigured AV (RAV) file 112 is created from an MPEG video and stored at a server site 126 on the Internet. A client 132 at a user terminal builds a video request in the form of a URL 130 containing the address of the stored file. The client transmits the URL to the server 126. A connection is made between the client and the server and the server downloads the file to the user terminal (receive sequencing interface) 72 in its precoded order. The user terminal initially processes and displays the slide show data in the order it is received, as it is being received. As additional data is downloaded, it is reshuffled with the slide show data in original temporal order making it possible to replay the video with progressively enhanced quality.
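The reshuffling performed at the user terminal can be pictured as inserting each newly received frame into a list kept sorted by its original time stamp. The following is a minimal sketch under that assumption; the class name, method names and frame labels are hypothetical and are not taken from the described system.

```python
import bisect


class FrameStore:
    """Keeps received frames sorted by their original presentation time stamp."""

    def __init__(self):
        self._pts = []      # sorted time stamps
        self._frames = []   # frames, parallel to self._pts

    def add(self, pts, frame):
        # Find the insertion point that keeps the time stamps ordered.
        pos = bisect.bisect_left(self._pts, pts)
        self._pts.insert(pos, pts)
        self._frames.insert(pos, frame)

    def playback_order(self):
        return list(self._frames)


store = FrameStore()
for pts, frame in [(0.0, "I0"), (4.0, "I4")]:   # first pass: slide show frames
    store.add(pts, frame)
store.add(2.0, "P2")                             # a later pass fills in the gap
print(store.playback_order())
```

Whatever has arrived so far can be replayed in temporal order, so each completed pass raises the frame rate of the replay without altering the frames already held.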

Transcoder Module

In Fig. 3, a transcoder module 120 is shown as a component of the content manager 118 of the video delivery system. As will be discussed below, the transcoder module 120 is used in the video delivery system to create an LBR audio data stream and prepare an MPEG video file for resequencing. Accordingly, the transcoder module 120 is used in place of the system decoder of a standard MPEG player (Fig. 2) and performs a similar function.

With reference to Fig. 4, the operations performed by the transcoder module are shown. The transcoder module 120 is used to separate the compression layer of the MPEG file from an original system layer 20. The original system layer is discarded 22 and the transcoder module 120 then disassembles the remaining compression layer into pure MPEG video and MPEG audio data streams 32 and 24, respectively. The data streams 32 and 24 consist of sequential streams of bytes or characters.

The transcoder module 120 compresses the MPEG audio data stream 26 using standard audio compression techniques such as GSM (Global System For Mobile Telecommunications, an international standard for audio compression) to produce an LBR audio data stream which requires transmission bandwidth of approximately 13,000 bits per second or less. The transcoder module 120 also associates 28 a copy of the corresponding PTS with each LBR audio frame indicating the display order of the audio data. Both the original MPEG audio component and the LBR audio component are retained for incorporation into the RAV file.

The transcoder module 120, using markers embedded in the MPEG video data streams, locates all of the pure MPEG data necessary to construct a single video frame 34 and encodes that data 36 in an information block (see Table A). Each audio frame in the original MPEG and LBR audio component is also encoded as information blocks 30 and 40. Each block comprises one byte of block ID representative of the block type, followed by four bytes of block length, followed by the individual block data.

Table A (Information Blocks)

Block Type
Block Length
Block Data
Next Block Type
Block Length
Block Data

The file block types are: slide show file header block, I frame block, P frame block, B frame block, video sequence header block, end of video file block, GSM (LBR) audio frame block, and MPEG (high quality) audio frame block. The layout of each type of block is shown in Table B.

Table B

(Block layouts not reproduced; the table comprises three parts: Slide Show Header Block; I, P and B Frame Blocks; Sequence Header, End of Video File, LBR Audio, and MPEG Audio Blocks.)
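The information block layout of Table A (one byte of block ID, four bytes of block length, then the block data) can be sketched as follows. The numeric block IDs and the big-endian byte order are assumptions made for illustration; the text does not specify either, and the function names are hypothetical.

```python
import struct

# Hypothetical block IDs; the actual values are not given in the text.
BLOCK_I_FRAME = 1
BLOCK_LBR_AUDIO = 6

# '>BI' = 1-byte unsigned ID followed by a 4-byte unsigned length.
# Big-endian byte order is an assumption.
HEADER = struct.Struct(">BI")


def pack_block(block_id: int, data: bytes) -> bytes:
    """Encode one information block: ID, length, data."""
    return HEADER.pack(block_id, len(data)) + data


def unpack_blocks(buffer: bytes):
    """Yield (block_id, data) for each block in a concatenated buffer."""
    offset = 0
    while offset < len(buffer):
        block_id, length = HEADER.unpack_from(buffer, offset)
        offset += HEADER.size
        yield block_id, buffer[offset:offset + length]
        offset += length


stream = pack_block(BLOCK_I_FRAME, b"abc") + pack_block(BLOCK_LBR_AUDIO, b"")
print(list(unpack_blocks(stream)))
```

Because every block carries its own length, a reader can skip over block types it does not need without decoding their contents.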

As the audio and video frame data is converted to information blocks, the blocks are stored in temporary files which retain the data in its original stream order 38, 42, and 44. As these files are created, a temporary index file is generated which records information indicating in which files the sequential audio and video information blocks are located 46. Finally, the index tables and data stream information are forwarded to a frame selector module 48, as will be discussed in detail below.

Frame Selector Module

The content manager 118 in Fig. 3 also includes a frame selector module 116. The frame selector module is used to select the video data in successive passes for slide show and download sequencing, and thus to encode the RAV and index files. The operations performed by the frame selector module 116 are shown in Fig. 5. The frame selector module 116 is used to select and assemble the data that will be used to build the RAV file 112. The frame selector module 116 picks the video frame data in successive passes using the index information to choose and locate the respective information blocks. In a first pass, the frame selector 116 picks certain I frame blocks. The chosen I frames are intended to provide a comprehensive "slide show" sampling of the entire video. In a preferred embodiment, I frames are chosen at a rate no greater than approximately one frame every two seconds. Where an exemplary MPEG file contains two I frames per second, every fourth I frame would be chosen.

The frames that appear in the first pass are chosen as follows. First, the average bit size of an I frame is computed 50. The target delivery bandwidth (for example, 28,800 bits per second) is multiplied by a typical usage factor (such as 70%) to give a predicted available bandwidth. The amount of bandwidth needed for the LBR audio is subtracted from the predicted available bandwidth to give the available video bandwidth in bits per second (this assures that there is always sufficient bandwidth to transmit the LBR audio error-free in real time).

The average bit size of an I frame is divided by the available video bandwidth to give the time needed to download the slide. In a preferred embodiment, this value is used as the interval between slides, unless the number is less than two seconds, in which case two seconds is used as the interval. Each slide chosen is the one which has its PTS closest to (but not less than) the next frame interval 52. The last I frame in the video is generally selected as a slide, and an end-of-pass marker is associated with the last frame 54. Accordingly, a slide show representation of a 5 minute (300 second) video would include approximately 150 selected I frames.
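The first-pass calculation described above can be sketched as follows. This is a minimal illustration rather than the actual frame selector: the function names are hypothetical, the default figures (28,800 bits per second, a 70% usage factor, 13,000 bits per second of LBR audio) are the examples given in the text, and interval boundaries are assumed to fall at fixed multiples of the interval measured from the start of the video.

```python
def slide_interval_seconds(avg_i_frame_bits, link_bps=28_800, usage=0.70,
                           lbr_audio_bps=13_000, minimum=2.0):
    """Time needed to download an average I frame, but never under two seconds."""
    video_bps = link_bps * usage - lbr_audio_bps   # bandwidth left for video
    return max(minimum, avg_i_frame_bits / video_bps)


def select_slides(i_frame_pts, interval):
    """Pick the first I frame at or after each interval boundary.

    i_frame_pts holds non-negative time stamps in seconds. The last I frame
    is always included, as described in the text.
    """
    ordered = sorted(i_frame_pts)
    chosen, target = [], 0.0
    for pts in ordered:
        if pts >= target:
            chosen.append(pts)
            while target <= pts:        # advance to the next boundary
                target += interval
    if ordered and (not chosen or chosen[-1] != ordered[-1]):
        chosen.append(ordered[-1])
    return chosen


# A 300 second video with two I frames per second, sampled every two seconds.
i_frames = [i * 0.5 for i in range(600)]
slides = select_slides(i_frames, slide_interval_seconds(10_000))
print(len(slides))
```

With the example figures about 7,160 bits per second remain for video, so any average I frame smaller than roughly 14,000 bits falls back to the two-second minimum interval.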

Each selected I frame is marked with a second PTS 56 corresponding to its order and timing within the slide show. The frames are then stored in a temporary file according to their original order. The second PTS makes it possible to vary when and how long each frame is displayed during the slide show.

Once the frames comprising the slide show have been selected, the revised order of frames is stored in a temporary video file and indexed 58. As will be discussed in detail below, the slide show can then be viewed frame-by-frame 60, 62 by an operator using the video player component of the frame selector module 116. The video player utilizes standard MPEG video and audio decoders and has a rewind and replay function. The frame selector module 116 permits the operator to edit the slide show by adding or deleting frames 64 and 66, or by substituting individual frames 68 in place of ones picked randomly by the frame selector module 116. The frame selector 116 also allows the operator to add, delete or change slide show PTS values 70 in order to vary when and how long a slide is displayed. When the operator finishes editing, the frame selector begins 92 to write the actual RAV file which will be stored at the server site.

An RAV file header sequence is prepared containing information on the total number of video and audio frames in the video and the bit rate the download order was prepared for. The header sequence is encoded at the front end of the RAV file 94. The information blocks representing the I frames chosen in the first pass and the corresponding LBR audio (the entire LBR audio data stream) are written into the front end of the RAV file 112 immediately following the header sequence 96. The file is written such that a portion of the LBR audio data precedes the initial corresponding I frame data. The remaining audio data is arranged in temporal order with the remaining I frame data; however, the file is written such that an audio frame is always downloaded sometime prior to its corresponding video frame. In this manner, LBR audio data is always available to be played when the corresponding slides are displayed. This addresses the experience that short gaps in audio playback are more easily discernible, and more distracting, than short gaps in the visual slideshow presentation.

On the second pass, the frame selector 116 selects video frames which, when played back with the video frames and audio data from the first pass, produce a low frame rate video (1/4 to 1/2 the original frame rate) with LBR audio. This video plays back with good to very good motion. However, unlike the first pass, the second pass need not be downloaded in real time.

The frames on the second pass 74 are chosen in one of two ways, depending on the makeup of the MPEG file. If the total number of I frames in the file is more than 25% of all frames 76, then approximately every fourth frame is chosen 78 (unless that frame was already selected during the slide show pass). If the fourth frame is not an I frame, then the next valid frame is chosen instead. If the number of I frames is less than 25% of the total number of frames, then the second pass consists of all the remaining I frames plus all P frames 80. This results in a video which displays at approximately 1/2 the original frame rate. The actual frame rate ultimately achieved depends on the combination of frames used to make the original video but can be from 5 frames per second (fps) to 15 fps. The quantity of data selected for the second pass is typically more than is able to be downloaded in real time over a 28.8 kilobaud modem connection.
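The second-pass rule can be sketched as below. Several points are assumptions, since the text leaves them open: "the next valid frame" is read as the next I frame at or after every fourth frame, a file with exactly 25% I frames is handled by the second branch, and the function name is hypothetical.

```python
def second_pass(frame_types, first_pass):
    """Return the frame indices chosen in the second pass.

    frame_types lists 'I', 'P' or 'B' in original order; first_pass holds the
    indices already chosen for the slide show.
    """
    taken = set(first_pass)
    total = len(frame_types)
    if frame_types.count("I") > 0.25 * total:
        picked = []
        for start in range(0, total, 4):          # approximately every fourth frame
            for j in range(start, total):
                if frame_types[j] == "I":         # assumed meaning of "valid frame"
                    if j not in taken:
                        picked.append(j)
                        taken.add(j)
                    break
        return picked
    # Otherwise: all remaining I frames plus all P frames.
    return [i for i, kind in enumerate(frame_types)
            if kind in ("I", "P") and i not in taken]


print(second_pass(list("IBBPBBPBBIBBPBB"), first_pass=[0]))
```

In the example, I frames make up fewer than 25% of the frames, so the pass consists of the remaining I frame and every P frame, leaving only B frames for the third pass.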

The information blocks representing the video frame data chosen in the second pass are written into the RAV file immediately following the slide show video frame data and LBR audio 98.

A third pass 86 includes all remaining video frames which have not been selected in either of the two preceding passes. The information blocks representing the video frame data chosen in the third pass are written into the RAV file immediately following the video frame data chosen in the second pass 100. Like the second pass, the third pass comprises a quantity of data which is typically more than is able to be downloaded in real time over a 28.8 kilobaud modem connection.

In a fourth pass, the information blocks representing the MPEG audio data stream are written into the end of the RAV file, followed by the end of sequence block 102 and 104. Like the second and third passes, the fourth pass comprises a quantity of data which is typically more than is able to be downloaded in real time over a 28.8 kilobaud modem connection.

As discussed, the second, third, and fourth passes may have more data than can be downloaded in real time. Accordingly, the transfer can take place in the background without user intervention. For example, if a user is using the invention in the context of browsing the World Wide Web, a certain Web page might contain a video clip. The user, by actuating a software control, can choose to receive the video clip, which is then displayed as a slide show in a portion of the Web page. If the user decides to download subsequent passes, the user can continue to browse other Web pages as the download continues. When the download pass is complete, the user is alerted and given the option to return to the Web page containing the video to view the downloaded file.
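Whether a pass can be downloaded in real time is a matter of simple arithmetic, sketched below in Python. The function names are hypothetical. The sketch assumes that a 28.8 kilobaud connection carries 28,800 bits per second and it ignores protocol overhead, so actual figures would be somewhat less favorable.

```python
def download_seconds(size_bytes, link_bps=28_800):
    """Time needed to move size_bytes over a link of link_bps bits per second."""
    return size_bytes * 8 / link_bps


def fits_real_time(size_bytes, playback_seconds, link_bps=28_800):
    """True when the pass arrives no slower than it is played back."""
    return download_seconds(size_bytes, link_bps) <= playback_seconds
```

A pass for which `fits_real_time` is false is a candidate for the background transfer described above.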

By way of example, the previously described RAV file is arranged to download over a 28.8 kilobaud channel in the following order: slide show frames and low bit rate audio in the first pass, video frames for building a low frame rate video in the second pass, the remaining frames (frames for building the original MPEG video) in the third pass, and the high quality MPEG audio in the fourth pass. Given this arrangement, the slide show data and low bit rate audio would be downloaded or passed down first, so a slide show could be displayed during the download process. During the second and subsequent passes, the slide show and low bit rate audio, or a higher quality presentation if more data is available, is shown during the beginning of the download. After the playback is finished, the download is able to proceed in the background until the pass is completed.

If the user terminal is connected to the network by a faster connection, more bandwidth is available to transmit more video data. In this case, the content provider might elect to arrange the RAV file so that video frames necessary to display a low-frame-rate video could be downloaded or passed down first, at the same time as the low bit rate audio data. In this way, a low-frame-rate video with LBR audio, instead of a slide show, can be displayed during the initial download process.

In the latter case, video frames which would normally be selected in a first and second pass would be selected in a first pass for incorporation into the front end of the RAV file. The RAV file components would then be arranged to download in the following preferred order: video frames for building the low frame rate video, the remaining video frames, and the MPEG audio. The LBR audio is preferably downloaded simultaneously with the low frame rate video; alternatively, it can be downloaded before or after any of the RAV file components.

It is also possible to assemble RAV files comprising a variety of different download arrangements including arrangements where audio data is downloaded last, or not at all, in which case a slide show or video could be displayed without audio. In this case, more frame data can be transmitted during the download process. These embodiments are also within the scope of the invention.

The audio and video information blocks of the RAV file 112 would be prearranged in the necessary download order for a given baud rate and stored in that order at the content provider's server sites as a data structure encoded on a computer-readable medium.

Receive Sequencing Interface

When a client requests a video, a URL (Uniform Resource Locator) describing the name and address of the file to be downloaded is transmitted to the server 126 storing the RAV file 112. The server 126 uses the URL address to locate the RAV file 112. The server then forwards a URL to the Receive Sequencing Interface (RSI) 72 requesting authorization to begin transmitting the RAV file 112.

With reference to Fig. 3, the RSI 72 comprises a URL processor 130, a block transfer interface 128, a frame builder 116, an index file generator 134, a frame sequence table 140, audio and video playlists 136, 138 and an MPEG video decoder/player 144. The components of the RSI 72 cooperate to receive and process video data at the user terminal so it can be displayed.

Upon notification of the URL processor 130, the RSI 72 establishes a TCP/IP connection to the server via the block transfer interface 128 which starts a flow of block data from the server 126 to the RSI 72. The frame builder 116 stores the blocks of data in the order received so that the RAV file 112 is reassembled. At the same time, the index file generator begins to construct the audio and video playlists 136 and 138 and the frame sequence table 140.

The frame sequence table 140 is constructed from information extracted from the header of each video information block. The frame sequence table 140 has an entry for each block of video frame data. The layout of each entry is shown in Table C. The information in the frame sequence table 140 is used by the system sequencer 142 of the player 144 to locate video information blocks in the RAV file 112.

TABLE C

Frame Sequence Table

The video and audio playlists 138 and 136 are computed from information extracted from the RAV file header sequence. Each playlist consists of a plurality of entries, and each entry stores data for an individual video or audio frame. Each playlist is created with enough entries to accept information for every frame in the data stream. The information stored in an entry in the playlist is shown in Table D.

TABLE D

Video Playlist

Audio Playlist

The video playlist 138 tells the system sequencer 142 in what order the video frames are to be decoded in a given cycle. The video playlist also contains pointers into the frame sequence table 140 for each frame entry.

Frame Builder Module

Fig. 6 shows the operation of the frame builder 116 and index file generator 134. Initially, the video playlist 138 is created by the file generator 134 with a -2 in each index entry. As the first video frame to be played is downloaded 168, the frame sequence table 140 is updated 170, 172 with the location of that video frame block in the RAV file 112, and the negative number in the playlist 138 index entry corresponding to that video frame is updated with a positive number 172 pointing into the frame sequence table 140.

When the next video frame is downloaded 176, the frame sequence table 140 is updated 178 and the negative number in the video playlist 138 index entry corresponding to that frame is updated with a positive number 180 into the frame sequence table 140. The negative two (-2) in each entry between the entries containing the positive numbers is then changed to negative one (-1) 182, and the video data block is saved to the RAV file by the frame builder 184. As each video frame is downloaded, the process is repeated 186 until a positive number is entered in the video playlist 138 for every frame in the slide show and all of the intervening entries have been changed from -2 to -1.
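The bookkeeping described above can be illustrated with a short Python sketch. The class, field, and constant names are hypothetical. The sketch assumes that slide show frames arrive in temporal order and that pointers into the frame sequence table are 1-based, so that every pointer is a positive number.

```python
NOT_RECEIVED = -2  # initial value of every video playlist entry
BETWEEN_SLIDES = -1  # not received, lies between two received slide frames


class VideoIndex:
    def __init__(self, total_frames):
        self.playlist = [NOT_RECEIVED] * total_frames
        self.frame_sequence_table = []  # one entry per received video block
        self._last_slide = None  # playlist position of the previous slide

    def register_slide(self, frame_number, file_offset):
        """Record a slide show frame that has just been downloaded."""
        self.frame_sequence_table.append(
            {"frame": frame_number, "offset": file_offset})
        # Assumption: 1-based pointer, so the entry is always positive.
        self.playlist[frame_number] = len(self.frame_sequence_table)
        if self._last_slide is not None:
            # Change -2 to -1 in every entry between the two slides.
            for k in range(self._last_slide + 1, frame_number):
                if self.playlist[k] == NOT_RECEIVED:
                    self.playlist[k] = BETWEEN_SLIDES
        self._last_slide = frame_number
```

After two calls, for frames 0 and 4 of an eight-frame video, the playlist holds positive pointers at positions 0 and 4, the value -1 in between, and -2 for the frames beyond the last slide received.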

The audio playlist 136 contains pointers into the RAV file 112. There are two audio playlists: the first points into the LBR audio, and the second points into the MPEG audio. Audio frames are stored by the frame builder into the RAV file in the same order they are received 166. As each LBR audio block is received 162, the file generator registers the audio block's file location in its corresponding entry in the LBR audio playlist 220.

Once the slide show frame data and LBR audio data have been downloaded, the video frame data selected in the second pass is downloaded 188. As each video frame is received, the frame sequence table 140 is updated 190, a positive number is entered 192 in the corresponding entry on the video playlist 138 and the slide show PTS is deselected for every preceding video frame. The video data is then saved to the RAV file 194 by the frame builder. This process is repeated for the video frame data which was selected on the third pass 196. After the third pass data is downloaded, the frame sequence table 140 would contain a complete record of all video frame data and the video playlist 138 would have a positive number in every entry. The order of frame data in the video playlist 138 reflects the same order in which data is presented by an MPEG encoder to an MPEG decoder. The presentation order is different than the display order.

Once the second and third pass video data is downloaded, the MPEG audio data is downloaded 198, timing information is extracted 200, each audio frame is registered 202 in an entry in the MPEG audio playlist 136, and the MPEG audio frame data is stored 202 in the RAV file.

Video Player Module

With reference to Fig. 3, the video player module 144 is shown. The player operates as a standard MPEG decoder/player, as shown in Fig. 2, except the standard MPEG system decoder is replaced with a system sequencer 142. The system sequencer 142 is responsible for synchronizing and directing the playback of the audio/video streams and is invoked as soon as the frame builder module 116 begins to receive the RAV file 112 from the server 126.

The system sequencer 142 determines the next frame to decode by looking at the video and audio playlists 138, 136 and the current status of the video and audio output buffers 146, 148 of the player 144. When the system sequencer 142 reads through the video playlist 138, it will retrieve the corresponding video frame block for each positive entry it comes to and forward the blocks from the RAV file 112 to the video decoder 154 for decompression. However, if the system sequencer 142 sees that the video output buffer 146 is full or the audio output buffer 148 is near empty, the system sequencer 142 will look to the audio playlist 136 to determine the next audio frame to decode and retrieve this audio block from the RAV file 112 for decoding.
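The decision made by the system sequencer can be sketched in Python as follows. The function name, its arguments, and the return convention are hypothetical. The final fallback to audio when no video frame is available is an assumption added for completeness and is not stated in the specification.

```python
def next_block(video_playlist, video_pos, audio_playlist, audio_pos,
               video_buffer_full, audio_buffer_low):
    """Return (kind, playlist position) for the next block to decode.

    video_playlist: list of entries, positive when the frame is available
    audio_playlist: list of file locations of received audio frames
    kind is "video", "audio", or None when nothing can be decoded yet.
    """
    audio_ready = audio_pos < len(audio_playlist)
    # Audio takes priority when the video buffer is full or audio runs low.
    if (video_buffer_full or audio_buffer_low) and audio_ready:
        return ("audio", audio_pos)
    # Otherwise walk the video playlist to the next positive entry.
    for k in range(video_pos, len(video_playlist)):
        if video_playlist[k] > 0:
            return ("video", k)
    # Assumption: with no video left, continue with any remaining audio.
    if audio_ready:
        return ("audio", audio_pos)
    return (None, -1)
```

Entries holding -1 or -2 are skipped, so the same routine serves the slide show and the later, more complete replays.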

The video player module 144 decompresses audio and video frame data in the order presented by the system sequencer 142. Once decompressed, the video frames are stored in the buffers and displayed in the order and for the length of time referenced by the slide show PTS.

If the user chooses to replay the downloaded data after the end of the slide show, the system sequencer 142 will read through the video playlist 138 again decoding the corresponding video blocks for each positive number it comes to. Since more video frames will have been downloaded, more video frames will be available for decompression and the resulting video image will be enhanced. If all of the video frames in the second pass have been downloaded, the system sequencer 142 will be able to direct the playback of the low frame rate video with sound.

In one embodiment, the system sequencer 142 is disabled from selecting video frames from the second or third pass for decoding until the last frame in that pass has been downloaded and the system sequencer 142 has read an end of pass marker. In that case, a display on the user terminal screen indicates when a given pass is downloaded and the user can elect to replay the slide show or wait until the download is complete.

Depending on the arrangement of video data in the RAV file 112 and the amount of data downloaded, the following non-limiting playback configurations are possible: slide shows, with or without LBR audio, where the video frame display rate is from 1 frame every 10 seconds up to 4 frames per second; a standard video at a preselected frame rate from as low as 4 fps to 24 fps, with or without LBR audio; an MPEG video (if all I, B, and P frames are downloaded) with or without LBR audio; and, if the full RAV file 112 has been downloaded, an MPEG video with MPEG audio can be assembled and played back. Standard videos with frame playback rates slower than 7.5 fps are within the scope of the invention but are not desirable due to the poor image quality.
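For the four-pass arrangement of Example 1, the relationship between the amount of data downloaded and the best available presentation can be summarized in a small Python sketch. The function name and the returned strings are hypothetical. The sketch assumes the embodiment in which a pass is used only once it has been completely downloaded.

```python
def playback_mode(passes_complete):
    """Best presentation available after a number of completed passes (0 to 4)."""
    if passes_complete >= 4:
        return "MPEG video with MPEG audio"
    if passes_complete == 3:
        return "MPEG video with LBR audio"
    if passes_complete == 2:
        return "low frame rate video with LBR audio"
    if passes_complete == 1:
        return "slide show with LBR audio"
    return "nothing to play yet"
```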

EXAMPLE TWO

In this alternative embodiment, the RAV file 112 is created from an MPEG video as described in Example 1 and stored at a server site. However, the RAV file 112 does not have to be downloaded in its prearranged order. Instead, the server site is equipped with a frame sequencing interface (FSI) which can rearrange, in real time, the download order of the RAV file 112.

With reference to Fig. 7, a video distribution system is partitioned into a content management system 118 which comprises the transcoder 120 and frame selector 116 programs, an FSI 204 which is located on the video pump 126 (the principal storage unit for the RAV files and index files), a title manager 206 for processing video requests from the user terminal, and a client 208 which comprises the RSI 72 programming for receiving and displaying the RAV file 112 and the player 144.

The video distribution system operates as follows. The user registers for the video service via client/title manager interaction. This process compiles user hardware and software configuration, preferences, and password data.

The user, in interaction with the title manager 206, will select a video either from a video guide provided by the title manager or from a Web site. The title manager then selects a URL specifying the address of the video at an appropriate video pump, and transmits it to the client 208. The client then requests this video by transmitting the URL to the video pump 204.

The FSI and video pump system 204 respond by providing this video to the client 208 in a format and frame rate selected by the client, or one which matches the hardware configuration (e.g. modem speed) of the particular user. If the modem speed will not support the download of a video, the user will receive a slide show with real-time LBR audio. As the amount of local video data increases during the downloading process as described above in connection with Example 1, low frame rate videos can be displayed with progressively enhanced quality. Upon completion of the download, the user will be able to view the full frame rate MPEG audio/video presentation.

With reference to Fig. 7, the MPEG video files are converted to RAV files 112 within the content management system 118 where the transcoder 120 and frame selector 116 programs reside. The transcoder 120 and frame selector 116 may perform the same function in the same way as described in Example 1. Thus, video data is selected in four passes and an RAV file 112 is created in which the video data is stored in the following order: slide show frames and LBR audio, low frame rate video frames, remaining video frame data, and MPEG audio.

At the same time the RAV file 112 is being assembled, a primary index file 122 (Table E) is created (see Fig. 5, step 114) which contains a record of the download order of information blocks in the RAV file 112 and information for locating each block in the RAV file or the original MPEG file. The primary index file is stored with the RAV file 112 at the server site 126.

TABLE E

The frame selection process performed by the frame selector module 116 in Example 1 is repeated so that download sequences for different baud rates can be calculated. For instance, if the user has an ISDN connection, it may be possible to download sufficient data to play a low frame rate video with LBR audio during the download process instead of a slide show. In that case, the frame selector module 116 would make a first pass and select all the frames necessary to make a low frame rate video (all the frames that were previously chosen in the first and second pass). The remaining video frame data would be selected on a second pass and the MPEG audio would be selected on a third and final pass as described in Example 1.
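The recalculation for a faster connection can be pictured as a merge of the first two passes of Example 1, as in the Python sketch below. The function name is hypothetical, and ordering the merged frames by frame number is an assumption made for illustration; an actual sequence might instead follow the decoding order of the MPEG stream.

```python
def merge_passes_for_faster_link(pass1, pass2, pass3):
    """Return the video passes for a link that can carry low frame rate video.

    pass1, pass2, pass3: frame numbers chosen in the passes of Example 1.
    The new first pass holds every frame of the old first and second passes.
    """
    # Assumption: merged frames are ordered by frame number.
    first = sorted(set(pass1) | set(pass2))
    return [first, list(pass3)]
```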

After the operator has finished making the final slide selection, the frame selector 116, instead of writing a new RAV file, creates a secondary index file 122 (Table E) which records the new download order and information about where the blocks are located in the original RAV file 112.

The secondary index files 122 are stored with the RAV file 112 and primary index file at the server site 126. A number of secondary indices would be prepared for a variety of different download arrangements and each index would contain pointers into the same RAV file 112. Thus only one large AV file need be stored, along with a number of small index files 122.

When a user attempts to download a RAV file using a low bit rate connection (e.g. a 28.8 kilobaud modem), the RAV file 112 created by the content manager 118 will be downloaded directly by the video pump 204. When a higher bit rate connection (e.g. ISDN) is used, the FSI 204 will resequence the RAV file 112, according to the information in the primary and secondary index files, so that the most appropriate sequence is used. All of the frame selection and ordering calculations would have been made, in advance, in connection with the content manager 118, and stored in an appropriate secondary index file as discussed above.

With reference to Fig. 7, the content manager 118 is responsible for transferring the RAV file 112 and index files to the FSI/video pump storage unit 204, and upgrading the database of the title manager 206 to include the new video clip title. In a preferred embodiment, the title manager 206 and FSI/video pumps would be located at the head end in an Internet or intranet service provider facility or on an Internet backbone.

With reference to Fig. 7, the FSI video pump 204 comprises a transfer monitor 124, storage for the RAV and index files 112 and 122, and a block transfer system 210. When the server 126 receives a URL from the client for a particular video clip, the server creates a TCP/IP socket connection to the client's RSI 72. The URL contains both address information into the RAV file 112 and client information. The server 126 starts the transfer monitor 124 by passing the name of the file to be transferred and the connection speed of the user to the transfer monitor.

The transfer monitor 124 searches the index files 122 for the secondary index that contains the download sequence for the given connection speed. The transfer monitor 124 then uses the index 122 to locate the appropriate information blocks in the RAV file 112, so they can be downloaded according to the download sequence recorded in that index.
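The lookup performed by the transfer monitor can be sketched in Python as follows. The layout of an index as a list of (offset, length) pairs, the function names, and the rule of choosing the fastest sequence that does not exceed the connection speed are all assumptions made for illustration; the actual contents of an index file are those given in Table E.

```python
def pick_index(indices, connection_bps):
    """Choose a download sequence for the given connection speed.

    indices: dict mapping a target bit rate (bits per second) to a list of
             (offset, length) pointers into the single stored RAV file
    """
    usable = [rate for rate in indices if rate <= connection_bps]
    # Assumption: fall back to the slowest sequence if none fits.
    rate = max(usable) if usable else min(indices)
    return indices[rate]


def stream_blocks(rav_bytes, index):
    """Yield information blocks from the RAV data in the indexed order."""
    for offset, length in index:
        yield rav_bytes[offset:offset + length]
```

Because every index points into the same stored data, adding a new download arrangement costs only one more small index.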

In one embodiment, the FSI can respond to a user request for a particular RAV file format. For example, a user may elect to preview a slide show first, even though the connection speed may accommodate the download and real-time display of a low-frame-rate video. In this case, the transfer monitor would accept the request and search the index files for an index which contains a record of a download sequence which is front loaded with slide show video frames, such as the RAV file 112 described in Example 1.

The transfer monitor 124 then uses the secondary index to locate the appropriate information blocks in the RAV file 112, so they can be downloaded according to the download sequence recorded in that index 122.

The output of the transfer monitor 124 comprises a series of information blocks from the RAV file 112, transmitted according to the download sequence recorded in the appropriate secondary index file 122. The referenced blocks are then forwarded via the video pump TCP/IP block transfer system 210 to the client's RSI 72.

Once the FSI/video pump 204 has delivered the data blocks to the RSI 72, the data is processed by the RSI 72 and video player as discussed in Example 1 and as shown in Figures 5 and 6.

While certain exemplary structures and operations have been described, the invention is not so limited, and its scope is to be determined according to the claims set forth below.

Claims

WE CLAIM:

1. A method for encoding and decoding an audio/video data file, comprising the steps of: obtaining a digitized audio/video data file comprising a video data stream representing a sequence of video frames having an original order and a first timing mechanism for indicating the display order of the video frames; decoding the audio/video data file into at least its component video data stream; reordering the video frames in the video data stream; assembling a reconfigured audio/video file comprising the reordered video frames; and displaying the reordered video frames according to a second timing mechanism.

2. The method of claim 1, further comprising the step of rearranging the video frames according to the original order so the original audio/video data file can be displayed.

3. The method of claim 2, further comprising the steps of: decoding the audio/video data file into its component audio data stream; and compressing the audio data stream to produce a low bit rate audio data stream.

4. The method of claim 3, wherein the assembling step further comprises the step of incorporating the low bit rate audio stream into the reconfigured audio/video file.

5. The method of claim 4, wherein the assembling step further comprises the step of associating the second timing mechanism with the low bit rate audio data stream to synchronize the low bit rate audio data stream to the reordered video frames.

6. The method of claim 1, wherein the reordering step further comprises the steps of: selecting video frames that can be used to assemble a slide show having a first pre-selected display rate; associating the video frames in the slide show with the second timing mechanism to indicate the display order and timing of display of each slide show frame; and selecting video frames which can be combined with the video frames selected in previous passes to assemble a video file having a second, higher, display rate.

7. The method of claim 6, further comprising the step of selecting all remaining video frames not previously selected.

8. The method of claim 7, further comprising the step of storing the video frames in a reconfigured audio/video file.

9. The method of claim 1, further comprising the steps of: receiving and processing a request for the reconfigured audio/video file from a user terminal; and downloading the reconfigured audio/video file from a storage unit to the user terminal in the selectively reordered download sequence.

10. The method of claim 9, wherein the download is accomplished in at least two passes.

11. The method of claim 10, wherein the download is accomplished in four passes.

12. The method of claim 11, further comprising the step of arranging the video frames for downloading in the following order: slide show frames, subsequent pass video frames, final pass video frames.

13. The method of claim 11, further comprising the step of arranging the reconfigured audio/video file for downloading in the following order: slide show frames, subsequent pass video frames, final pass video frames, original audio data stream.

14. The method of claim 10, further comprising the step of arranging the video frames for downloading in the following order: low frame rate video frames, subsequent pass video frames.

15. The method of claim 1, wherein the assembling step further comprises the step of creating an index file representative of an order for the reconfigured audio/video data file.

16. The method of claim 1, wherein preparing the reordering step comprises the steps of: using a frame selector module in at least two successive passes to select and identify appropriate video frames and their download order; and creating and updating an index file with a timing and a sequence for each frame.

17. A system for encoding and decoding an original audio/video data file having an original order, comprising: a transcoder module for decoding the original audio/video data file into a video data stream; a frame selector module for specifying a reordered sequence for the video data stream; a storage unit comprising a computer readable medium, for storing a reconfigured audio/video file representative of the original audio/video data file in the reordered sequence; and a user terminal through which a user may display the reconfigured audio/video file.

18. The system of claim 17, further comprising: a frame sequencing interface for retrieving and sending the reconfigured audio/video file from the storage unit to the user terminal; and a receive sequencing interface for downloading and processing the reconfigured audio/video file for display.

19. The system of claim 18, wherein the transcoder module is capable of decoding an original audio data stream from the original audio/video data file and compressing the original audio data stream into a low bit rate audio data stream.

20. The system of claim 19, wherein the transcoder module creates a plurality of data files representative of the video frames of the original audio/video data file, the original audio data stream, and the low bit rate audio data stream.

21. The system of claim 17, wherein the reconfigured audio/video file comprises an original audio data stream, a low bit rate audio data stream, a video data stream, and a first timing mechanism associated with each stream indicating a display order for the audio stream data and the video stream data.

22. The system of claim 21, wherein the video data stream of the reconfigured audio/video file comprises data representing a sequence of video frames.

23. The system of claim 18, wherein the video data stream of the reconfigured audio/video file is ordered so that video frames comprising a slide show are downloaded first.

24. The system of claim 23, wherein the reconfigured audio/video file is ordered so that the low bit rate audio stream is interleaved with video frames comprising the slide show, such that the slide show with low bit rate audio can be displayed while the data is being downloaded.

25. The system of claim 24, wherein the reconfigured audio/video file is structured such that additional video frame data is downloaded after the data comprising the slide show and low bit rate audio.

26. The system of claim 25, wherein the reconfigured audio/video file is rearranged into its original order.

27. The system of claim 26, wherein the original audio/video data file has a video frame rate between approximately four frames per second and approximately thirty frames per second.

28. The system of claim 19, wherein the original audio data stream is downloaded after all video data stream information.

29. The system of claim 17, wherein the reconfigured audio/video file is arranged for playback according to its original order.

30. The system of claim 23, wherein each video frame in the slide show is associated with a second timing mechanism to indicate a display order and time.

31. The system of claim 30, wherein the slide show has a display rate between approximately one frame every ten seconds and approximately four frames per second.

32. The system of claim 31, wherein the display rate is approximately one frame every two seconds.

33. The system of claim 31, wherein the display rate is variable.

34. The system of claim 24, wherein the low bit rate audio stream has a transmission bandwidth between approximately twelve and approximately fourteen kilobits per second.

35. The system of claim 18, wherein the video data file is reordered so that video frames comprising a low frame rate video are downloaded first, so that the low frame rate video can be displayed while the data is being downloaded.

36. The system of claim 35, wherein the video data file is reordered so that the low bit rate audio stream is passed down at the same time as the video frames comprising the low frame rate video, such that a low frame rate video with low bit rate audio can be displayed while the data is being downloaded.

37. The system of claim 18, wherein the frame selector module prepares an index file representative of an order for the reconfigured audio/video file.

38. The system of claim 37, wherein the frame sequencing interface utilizes the index file to create a reconfigured audio/video file for downloading.

39. The system of claim 38, wherein the frame sequencing interface selects and downloads video data in an appropriate order selected from one of the following orders: a) slide show frames, low frame rate video frames, faster frame rate video frames; b) low frame rate video frames, faster frame rate video frames; c) slide show frames with low bit rate audio, low frame rate video frames, faster frame rate video frames, original audio; d) low frame rate video frames with low bit rate audio, faster frame rate video frames, original audio.

40. The system of claim 39, wherein the appropriate order is selected according to a download speed.
