WO2007072244A1 - A device for and a method of processing a data stream comprising a plurality of frames - Google Patents

A device for and a method of processing a data stream comprising a plurality of frames

Info

Publication number
WO2007072244A1
Authority
WO
WIPO (PCT)
Prior art keywords
frames
frame
stream
reproduction mode
play
Prior art date
Application number
PCT/IB2006/054417
Other languages
French (fr)
Inventor
Albert M. A. Rijckaert
Eric W. J. Moors
Roland P. J. M. Manders
Jozef P. Van Gassel
Original Assignee
Koninklijke Philips Electronics N.V.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Koninklijke Philips Electronics N.V.
Publication of WO2007072244A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/76 Television signal recording
    • H04N5/78 Television signal recording using magnetic recording
    • H04N5/782 Television signal recording using magnetic recording on tape
    • H04N5/783 Adaptations for reproducing at a rate different from the recording rate
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/005 Reproducing at a different information rate from the information rate of recording
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/40 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video transcoding, i.e. partial or full decoding of a coded input stream followed by re-encoding of the decoded output stream
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/44 Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/48 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using compressed domain processing techniques other than decoding, e.g. modification of transform coefficients, variable length coding [VLC] data or run-length data
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/577 Motion compensation with bidirectional frame interpolation, i.e. using B-pictures
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/587 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal sub-sampling or interpolation, e.g. decimation or subsequent interpolation of pictures in a video sequence
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/70 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/76 Television signal recording
    • H04N5/78 Television signal recording using magnetic recording
    • H04N5/781 Television signal recording using magnetic recording on disks or drums
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/76 Television signal recording
    • H04N5/84 Television signal recording using optical recording
    • H04N5/85 Television signal recording using optical recording on discs or drums
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/76 Television signal recording
    • H04N5/907 Television signal recording using static stores, e.g. storage tubes or semiconductor memories
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N9/00 Details of colour television systems
    • H04N9/79 Processing of colour television signals in connection with recording
    • H04N9/80 Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback
    • H04N9/804 Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback involving pulse code modulation of the colour picture signal components
    • H04N9/8042 Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback involving pulse code modulation of the colour picture signal components involving data reduction
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N9/00 Details of colour television systems
    • H04N9/79 Processing of colour television signals in connection with recording
    • H04N9/80 Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback
    • H04N9/82 Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback the individual colour picture signal components being recorded simultaneously only
    • H04N9/8205 Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback the individual colour picture signal components being recorded simultaneously only involving the multiplexing of an additional signal and the colour video signal

Definitions

  • the invention relates to a device for processing a data stream comprising a plurality of frames.
  • the invention further relates to a method of processing a data stream comprising a plurality of frames.
  • the invention further relates to a program element.
  • the invention further relates to a computer-readable medium.
  • audio and video data are often stored in a compressed manner, and for security reasons in an encrypted manner.
  • MPEG2 is a standard for the generic coding of moving pictures and associated audio and creates a video stream out of frame data that can be arranged in a specified order called the GOP ("Group Of Pictures") structure.
  • An MPEG2 video bit stream is made up of a series of data frames encoding pictures.
  • the three ways of encoding a picture are intra-coded (I picture), forward predictive (P picture) and bi-directional predictive (B picture).
  • An intra-coded frame (I-frame) is an independently decodable frame.
  • a forward predictive frame (P-frame) needs information of a preceding I-frame or P-frame.
  • a bi-directional predictive frame (B-frame) is dependent on information of a preceding and/or subsequent I-frame or P-frame.
  • the coding type of each frame in the MPEG coded video data is identified, and freeze frames are inserted as a predefined function of the identified coding type and as a predefined function of a desired slow down factor.
  • For a slow-down factor of n, (n-1) backward-predicted freeze frames are inserted for each original I- or P-frame, and (n-1) copies of the original B-frame are added for each original B-frame; a selected amount of padding is added to each copy in order to obtain a normal play bit rate and avoid video buffer overflow or underflow.
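
The counting rule of this prior-art scheme can be sketched as follows. This is a minimal illustration only: the function name `slow_down`, the string labels, and the choice to place the added frames directly after each original frame are assumptions, since the text states how many frames are added but not where.

```python
def slow_down(coding_types, n):
    """Expand a list of coding types ('I', 'P', 'B') for a slow-down factor n.

    Assumption: added frames are placed right after the original frame;
    the source only specifies how many frames are added per original.
    """
    if n < 1:
        raise ValueError("slow-down factor must be at least 1")
    out = []
    for coding_type in coding_types:
        out.append(coding_type)
        if coding_type in ("I", "P"):
            # (n-1) backward-predicted freeze frames per original I- or P-frame
            out.extend(["freeze"] * (n - 1))
        else:
            # (n-1) padded copies per original B-frame
            out.extend(["B-copy+padding"] * (n - 1))
    return out


expanded = slow_down(["I", "B", "B", "P"], 3)
print(len(expanded))  # every original frame now occupies n output slots
```

With n = 1 the input is returned unchanged, which matches normal play.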
  • a device for processing a data stream comprising a plurality of frames, a method of processing a data stream comprising a plurality of frames, a program element and a computer-readable medium according to the independent claims are provided.
  • the device comprises a detection unit for detecting switching from a first reproduction mode to a second reproduction mode, a delay unit for delaying the switch to the second reproduction mode by a delay time which corresponds to the time difference between the switching and a start of the next anchor frame of the plurality of frames, and a correction unit for correcting a temporal reference of the plurality of frames.
  • a method of processing a data stream comprising a plurality of frames comprising detecting switching from a first reproduction mode to a second reproduction mode, delaying the switch to the second reproduction mode by a delay time which corresponds to the time difference between the switching and a start of the next anchor frame of the plurality of frames, and correcting a temporal reference of the plurality of frames.
  • a computer-readable medium in which a computer program is stored, which computer program, when being executed by a processor, is adapted to control or carry out the above-mentioned method.
  • a program element is provided, which program element, when being executed by a processor, is adapted to control or carry out the above-mentioned method.
  • the data processing according to the invention can be realized by a computer program, that is to say by software, or by using one or more special electronic optimization circuits, that is to say in hardware, or in hybrid form, that is to say by means of software components and hardware components.
  • a switch is detected from one playback operation mode to another playback operation mode, more particularly from a normal play mode to a trick-play mode, and still more particularly from a normal play mode to a slow-forward mode, or vice versa.
  • measures may be taken to smoothly change the operation mode so as to achieve proper audio and/or video playback quality for a user. Since switching between different playback modes may include reordering different frames, repeating frames or, more generally, modifying the velocity of playback, such a switch should be properly synchronized with the altered characteristics of playing back the frames.
  • the switch from the outgoing mode to the incoming mode may be delayed, wherein the delay time may be selected so that the next anchor frame of the frame sequence is awaited.
  • the waiting time may be used for calculations related to a necessary correction of the timing or other temporal references between the subsequent frames.
  • a proper playback quality may be obtained when the start of the new operation mode is postponed until a start of a next anchor frame (in the MPEG2 technology the start of the next I-frame or P-frame).
  • Such an anchor frame may be a frame which may serve as an anchor for playing back content with a sufficient degree of independency from other frames.
  • Such a procedure may simultaneously allow a fast switch to a new desired operation mode, a proper quality and sufficient time for a processor to calculate a sequence of frames in accordance with the new operation mode and in accordance with the requirements of a transition between the two playback modes.
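
The delay described above (waiting from the switching command until the start of the next anchor frame) can be sketched as below. The function name, the tuple layout, and the use of integer millisecond start times are illustrative assumptions, not details taken from the application.

```python
ANCHOR_TYPES = {"I", "P"}  # in MPEG2 terms, I-frames and P-frames are anchor frames


def delay_until_next_anchor(frames, command_time):
    """Return (delay, index) for the first anchor frame starting at or after
    command_time, or None if no such frame is available.

    frames: list of (start_time, coding_type) tuples sorted by start_time
            (hypothetical layout chosen for this sketch).
    """
    for index, (start_time, coding_type) in enumerate(frames):
        if start_time >= command_time and coding_type in ANCHOR_TYPES:
            return start_time - command_time, index
    return None


# Frame start times in milliseconds (assumed 40 ms frame period).
stream = [(0, "I"), (40, "B"), (80, "B"), (120, "P"), (160, "B")]
print(delay_until_next_anchor(stream, 50))
```

The returned delay is the interval during which the corrections of the temporal references for the transition can be computed.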
  • an exemplary embodiment of the invention provides correction of temporal references and switching effects for slow-forward.
  • a scheme for correcting possibly incorrect temporal reference of MPEG frames is provided, which may occur when there is a switch from normal play mode to slow-forward mode, or vice versa, in an MPEG decoder.
  • the switching may be delayed until the start of a next anchor frame.
  • the temporal references may be corrected during a transition phase from normal play mode to slow-forward mode, or vice versa.
  • An exemplary embodiment of the invention relates to a storage device for storing MPEG transport streams with a digital interface to an MPEG compliant decoder that is capable of providing an MPEG compliant stream during the transition phase from normal play to slow-forward, or vice versa.
  • the temporal reference of the MPEG frames may be incorrect. Due to reordering the temporal reference of already transmitted frames cannot be corrected without using large buffers, which would add cost and latency.
  • the storage device may delay the switching point to slow-forward until the start of a next anchor frame (which may be an I-frame or a P-frame in the MPEG terminology).
  • the temporal references are corrected during a transition phase from normal play to slow-forward. Thereafter, a slow-forward stream may be created.
  • the switching point may be delayed until the start of the next anchor frame after receiving the switching command and the correction of the temporal references may be continued until the start of the next I-frame. Thereafter, the normal play stream may be transmitted.
  • the first or the second reproduction mode may be a standstill mode which may be interpreted as a very slow (or an infinite) slow-forward mode. Therefore, when switching between a normal play mode and a standstill mode, or vice versa, or when switching between a slow-forward or slow-backward mode and a standstill mode, methods of correcting the temporal references of the frames according to an exemplary embodiment of the invention may be applied as well.
  • Embodiments of the invention may therefore solve problems related to the temporal reference of frames in a switching regime.
  • anchor frame may particularly denote a frame which, in transmission order and/or in display order, keeps its relative temporal position with respect to other anchor frames.
  • I-frames and P-frames may be denoted as anchor frames.
  • B-frames would not be denoted as anchor frames in the context of MPEG2.
  • At least two types of empty B-frames may be distinguished.
  • Empty forward predictive B-frames (so-called Bf-frames) may particularly denote frames referring to an anchor frame preceding the Bf-frame in a display mode.
  • Bb-frames may particularly denote frames referring to an anchor frame following the Bb-frame in a display mode.
  • Bf-frames and Bb-frames particularly differ concerning the property at which position they should be inserted in a data stream.
  • switching effects occurring between different reproduction modes may be taken into account according to an exemplary embodiment of the invention.
  • the use of a so-called Bf-empty frame may be even more advantageous as compared to the use of empty P-frames or of Bb-frames.
  • This phenomenon will be described below in more detail with reference to Fig. 42 to Fig. 44.
  • the command to switch to slow-forward may be received somewhere in the time period indicated with the switching area 4002.
  • Both Bf-frames and Pe-frames may result in a slow-forward effect starting with the repetition of the first anchor frame (in display order) following the reception of the slow-forward command. In the embodiment of Fig. 42 and Fig. 43, this is anchor frame P7.
  • Bb-frames may result in a slow-forward stream starting one frame later, as can be taken from Fig. 44. Repetition starts with the first B-frame following the first anchor frame (B8 in the example of Fig. 44). So one advantage of using Bf-frames is that it is possible to start one frame sooner with the slow-forward stream than when Bb-frames are used.
  • a further advantage may be that, when constructing the slow-forward stream, and when an anchor frame is encountered, only Bf-frames are post-inserted (that is to say after the anchor frame). Both Bb-frames and Pe-frames are pre-inserted. Therefore, when using Bf-frames, the anchor frame may not need to be kept in memory after it was read, but can be played out immediately, thus supporting an improved memory usage during slow-forward trick-play reconstruction.
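
The difference between post-insertion (Bf-frames) and pre-insertion (Bb-frames, Pe-frames) can be sketched as follows. This is a simplified illustration under stated assumptions: the function name, the frame labels, the use of (factor - 1) empty frames per anchor frame, and the plain repetition of B-frames are all hypothetical and are not taken from the figures of the application.

```python
def build_slow_forward(frames, factor, empty_type="Bf"):
    """Sketch of slow-forward construction with empty frames around anchors.

    frames: labels such as 'I0', 'B1', 'P2' (first letter = coding type).
    Assumptions: (factor - 1) empty frames per anchor frame, and B-frames
    simply repeated 'factor' times.
    """
    if empty_type not in ("Bf", "Bb", "Pe"):
        raise ValueError("empty_type must be 'Bf', 'Bb' or 'Pe'")
    out = []
    for frame in frames:
        if frame[0] in "IP":
            empties = [empty_type] * (factor - 1)
            if empty_type == "Bf":
                # Post-insertion: the anchor is emitted as soon as it is read.
                out.append(frame)
                out.extend(empties)
            else:
                # Pre-insertion: the anchor is held back until the empty
                # frames have been emitted.
                out.extend(empties)
                out.append(frame)
        else:
            out.extend([frame] * factor)
    return out


print(build_slow_forward(["I0", "B1", "P2"], 2, "Bf"))
print(build_slow_forward(["I0", "B1", "P2"], 2, "Bb"))
```

In the Bf case the anchor frame never has to be buffered while empty frames are generated, which is the memory advantage named in the text.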
  • the first reproduction mode or the second reproduction mode may be a normal play mode.
  • the other one of the first reproduction mode and the second reproduction mode may be a trick-play mode.
  • the switching may be between a normal play mode and a trick-play mode, in both directions.
  • playback parameters like the playback speed may be modified in accordance with a (predetermined or user-defined) trick-play factor, for instance a slow-motion factor or a fast-forward factor.
  • the trick-play mode may be a slow-forward reproduction mode, a slow-reverse reproduction mode, a freeze frame reproduction mode, a standstill reproduction mode or an instant replay reproduction mode.
  • the invention is not restricted to these trick-play modes but may be applied to other trick-play modes as well, like fast-forward or fast-reverse.
  • the frame of the data stream to be processed may include at least one of the group consisting of an intra-coded frame (I-frame), a forward predictive frame (P-frame) and a bi-directional predictive frame (B-frame). Such frames may be part of an MPEG2 video bit stream.
  • An intra-coded frame (I-frame) is an independently decodable frame.
  • a forward predictive frame (P-frame) needs information of a preceding I-frame or P-frame.
  • a bidirectional predictive frame (B-frame) may be dependent on information of a preceding and/or of a subsequent I-frame or P-frame.
  • the anchor frame may be an intra-coded frame or a forward predictive frame, since these kinds of frames may have a higher degree of independence of other frames as compared to a B-frame.
  • the correction unit may be adapted for correcting a sequence of the plurality of frames by means of an empty forward predictive bi-directional predictive frame (a so-called Bf-frame).
  • the correction unit may be adapted for correcting a sequence of the plurality of frames by means of an empty backward predictive bi-directional predictive frame (a so-called Bb-frame). It may be more desirable to play back a Bf-frame after the anchor frame, as has been described above.
  • the correction unit may be adapted for correcting a sequence of the plurality of frames by means of an empty forward predictive frame (a so-called Pe-frame).
  • the correction unit may be adapted for correcting the temporal reference of the plurality of frames during a transition phase after the delay. Between the delay time and a time interval in which the data is presented in the second reproduction mode, a transition phase occurs in which the timing of the frames may be adjusted to the modified reproduction mode.
  • the correction unit may be adapted for correcting the temporal reference of the plurality of frames in such a manner that an order of the plurality of frames is corrected.
  • the altered time dependencies between the different frames, or the different numbers assigned to the frames and related to the order of playback may require a reordered time sequence of the plurality of frames.
  • the device may comprise an insertion unit adapted for inserting frames, particularly empty frames, after having switched (and/or during switching) from the first reproduction mode to the second reproduction mode.
  • Such an insertion may take into account modified requirements when switching from one reproduction mode to another one.
  • the insertion unit may be adapted to insert forward predictive frames and/or bi-directional predictive frames as the empty frames. Therefore, the playback velocity may be modified by insertion of frames having low data storage requirements, for instance as compared to an I-frame. This may allow for a calculation of the played back stream with low computational burden, and may reduce the storage capacities and the computing resources involved.
  • the device may comprise a repetition unit adapted for repeating frames after having switched from the first reproduction mode to the second reproduction mode. Therefore, it is also possible to simply repeat frames several times so as to achieve a desired reproduction factor.
  • the repetition unit may be adapted for repeating forward predictive frames and/or bi-directional predictive frames. This is advantageous from the storage memory point of view, that is to say may keep the required storage capacities in a reasonable limit.
  • the device may comprise a storage unit for storing the data stream. Such a storage unit may be a harddisk, a flash card or any other data carrier like a CD or a DVD. However, the storage unit may also be an Internet server to which the device has (network-)access for downloading required information.
  • the device may further be adapted to process a plaintext data stream, a fully encrypted data stream or a mixture of encrypted parts and plaintext parts (a so-called hybrid stream).
  • the data stream may be entirely encrypted, entirely decrypted, or a combination of both.
  • decryptors and/or encryptors may be foreseen at appropriate positions of a data processing system according to an embodiment of the invention.
  • the device may further be adapted to process a data stream of video data or audio data.
  • such content is not the only type of data which may be processed with the scheme according to embodiments of the invention.
  • Trick-play generation and similar applications may be an issue for both video processing and (pure) audio processing.
  • the device may further be adapted to process a data stream of digital data.
  • the device may comprise a reproduction unit for reproducing the processed data stream.
  • a reproduction unit may comprise a loudspeaker or earphones and/or an optical display device, so that both audio and visual data can be reproduced in a form perceivable by a human being.
  • the device may comprise a generation unit for processing the data stream for reproduction in a trick-play reproduction mode.
  • Such a generation unit may be adjusted or controlled by a user by selecting corresponding options in a user interface, for instance buttons of a device, a keypad or a remote control.
  • the trick-play reproduction mode selected by a user may be one of the group consisting of a fast-forward reproduction mode, a fast-reverse reproduction mode, a slow-forward reproduction mode, a slow-backward reproduction mode, a freeze frame reproduction mode, an instant reproduction mode, and a reverse reproduction mode.
  • Other trick-play streams are however possible.
  • For trick-play only a portion of subsequent data shall be used for output (for instance for visual display and/or for acoustical output) or one and the same content shall be used several times.
  • the device according to exemplary embodiments of the invention may be adapted to process an MPEG2 data stream.
  • MPEG2 is a designation for a group of audio and video coding standards agreed upon by MPEG (Moving Picture Experts Group), and published as the ISO/IEC 13818 International Standard.
  • MPEG2 is used to encode audio and video broadcast signals including digital satellite and cable TV, but may also be used for DVD.
  • the device according to exemplary embodiments of the invention may also be adapted to process an MPEG4 data stream. More generally, any codec scheme may be implemented which uses anchor frames from which other frames are dependent, particularly any type of encoding using predictive frames and thus any kind of MPEG encoding/decoding.
  • the device may be realized as one of the group consisting of a digital video recording device, a network-enabled device, a conditional access system, a portable audio player, a portable video player, a mobile phone, a DVD player, a CD player, a hard disk based media player, an Internet radio device, a public entertainment device, and an MP3 player.
  • Fig. 1 illustrates a time-stamped transport stream packet.
  • Fig. 2 shows an MPEG2 group of picture structure with intra-coded frames and forward predictive frames.
  • Fig. 3 illustrates an MPEG2 group of picture structure with intra-coded frames, forward predictive frames and bi-directional predictive frames.
  • Fig. 4 illustrates a structure of a characteristic point information file and stored stream content.
  • Fig. 5 illustrates a system for trick-play on a plaintext stream.
  • Fig. 6 illustrates time compression in trick-play.
  • Fig. 7 illustrates trick-play with fractional distance.
  • Fig. 8 illustrates low speed trick-play.
  • Fig. 9 illustrates a general conditional access system structure.
  • Fig. 10 illustrates a digital video broadcasting encrypted transport stream packet.
  • Fig. 11 illustrates a transport stream packet header of the digital video broadcasting encrypted transport stream packet of Fig. 10.
  • Fig. 12 illustrates a system allowing the performance of trick-play on a fully encrypted stream.
  • Fig. 13 illustrates a full transport stream and a partial transport stream.
  • Fig. 14 illustrates Entitlement Control Messages for a stream type I and for a stream type II.
  • Fig. 15 illustrates writing Control Words to a decrypter.
  • Fig. 16 illustrates Entitlement Control Message handling in a fast forward mode.
  • Fig. 17 illustrates detection of one or two Control Words.
  • Fig. 18 illustrates a device for processing a data stream according to an exemplary embodiment.
  • Fig. 19 illustrates splitting of the packet at a frame boundary.
  • Fig. 20 illustrates slow-forward construction after decryption of normal play data.
  • Fig. 21 illustrates a hybrid stream with plaintext packets on each frame boundary.
  • Fig. 22 illustrates slow-forward construction on a stored hybrid stream.
  • Fig. 23 illustrates an incomplete picture start code at the concatenation point.
  • Fig. 24 illustrates the effect of reordering in normal play.
  • Fig. 25 illustrates the effect of reordering in slow-forward mode.
  • Fig. 26 illustrates the insertion of empty P-frames before the anchor frames.
  • Fig. 27 illustrates the use of backward predictive empty B-frames.
  • Fig. 28 illustrates the use of forward predictive empty B-frames.
  • Fig. 29 illustrates a temporal reference for normal play.
  • Fig. 30 illustrates a temporal reference for slow-forward with Bf-frames.
  • Fig. 31 illustrates a temporal reference for pre-insertion of Bb-frames.
  • Fig. 32 illustrates a temporal reference for pre-insertion of Pe-frames.
  • Fig. 33 illustrates a temporal reference for three types of B-frames.
  • Fig. 34 illustrates a distance D and a slow motion factor L for a normal play and a slow-forward stream.
  • Fig. 35 illustrates a temporal reference for the I-frame with empty B-frames used.
  • Fig. 36 illustrates a temporal reference for the P-frame when empty B-frames are used.
  • Fig. 37 illustrates a temporal reference for the I-frame when empty P-frames are used.
  • Fig. 38 illustrates a temporal reference for the P-frame when empty P-frames are used.
  • Fig. 39 illustrates a temporal reference for empty P-frames.
  • Fig. 40 illustrates switching from normal play to slow-forward at the start of a GOP.
  • Fig. 41 illustrates switching at the first frame after the switching command.
  • Fig. 42 illustrates switching from normal play to slow-forward along a GOP.
  • Fig. 43 illustrates switching from normal play to slow-forward with Pe-frames.
  • Fig. 44 illustrates switching from normal play to slow-forward with Bb-frames.
  • Fig. 45 illustrates switching from slow-forward to normal play at the start of a GOP.
  • Fig. 46 illustrates switching from slow-forward to normal play along a GOP.
  • Fig. 47 illustrates switching from slow-forward to normal play with Pe-frames.
  • Fig. 48 illustrates the splitting of the stream for one PES packet per frame.
  • Fig. 49 illustrates the splitting of the stream at the start of a PES header.
  • Fig. 50 illustrates the splitting of the stream at the start of a Picture Start Code.
  • Fig. 51 illustrates the splitting of the stream within a Picture Start Code.
  • Fig. 52 illustrates an incomplete picture start code at the concatenation point.
  • Fig. 54 illustrates an example of n+m>4.
  • Fig. 55 illustrates an example of n+m<4.
  • A time-stamped transport stream comprises transport stream packets, each of which is prepended with a 4-byte header in which the transport stream packet arrival time is placed. This time may be derived from the value of the program clock reference (PCR) time-base at the time the first byte of the packet is received at the recording device. Storing the timing information with the stream in this way makes playback of the stream a relatively easy process.
  • Fig. 1 illustrates a time-stamped transport stream packet 100 having a total length 104 of 188 bytes and comprising a time stamp 101 having a length 105 of 4 bytes, a packet header 102, and a packet payload 103 having a length of 184 bytes.
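
Prepending and stripping the 4-byte arrival time can be sketched as follows. The function names, the big-endian byte order, and the wrap-around to 32 bits are assumptions made for illustration; the text only states that a 4-byte header carrying the arrival time precedes each packet.

```python
import struct

TIMESTAMP_LEN = 4  # length of the prepended time stamp in bytes, per the text


def stamp(packet: bytes, arrival_time: int) -> bytes:
    """Prepend a 4-byte arrival time (assumed big-endian, wrapped to 32 bits)."""
    return struct.pack(">I", arrival_time & 0xFFFFFFFF) + packet


def unstamp(record: bytes):
    """Split a stored record into (arrival_time, original packet bytes)."""
    if len(record) < TIMESTAMP_LEN:
        raise ValueError("record shorter than the time stamp")
    (arrival_time,) = struct.unpack(">I", record[:TIMESTAMP_LEN])
    return arrival_time, record[TIMESTAMP_LEN:]


packet = bytes(range(16))  # stand-in bytes, not a real transport stream packet
record = stamp(packet, 123456)
print(unstamp(record) == (123456, packet))
```

On playback, the recovered arrival times allow the packets to be released with their original relative timing.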
  • When creating trick-play for an MPEG/DVB transport stream, problems may arise when the content is at least partially encrypted. It may not be possible to descend to the elementary stream level, which is the usual approach, or even to access any packetized elementary stream (PES) headers before decryption. This also means that finding picture frames is not possible.
  • ECM denotes an Entitlement Control Message.
  • This message may particularly comprise secret provider proprietary information and may, among others, contain encrypted Control Words (CW) needed to decrypt the MPEG stream. Typically, Control Words expire in 10-20 seconds.
  • the term "keys" particularly denotes data that may be stored in a smart card and may be transferred to the smart card using EMMs, that is, so-called "Entitlement Management Messages" that may be embedded in the transport stream. These keys may be used by the smart card to decrypt the Control Words present in the ECM. An exemplary validity period of such a key is one month.
  • the term “Control Words” (CW) particularly denotes decryption information needed to decrypt actual content. Control words may be decrypted by the smart card and then stored in a memory of the decryption core.
  • any MPEG2 streams created are MPEG2 compliant transport streams. This is because the decoder may not only be integrated within a device, but may also be connected via a standard digital interface, such as an IEEE 1394 interface, for example.
  • FIG. 2 shows a stream 200 comprising several MPEG2 GOP structures with a sequence of I-frames 201 and P-frames 202.
  • the GOP size is denoted with reference numeral 203.
  • the GOP size 203 is set to 12 frames, and only I-frames 201 and P-frames 202 are shown here.
  • a GOP structure may be used in which only the first frame is coded independently of other frames. This is the so-called intra-coded or I-frame 201.
  • the predictive frames or P-frames 202 are coded with a unidirectional prediction, meaning that they only rely on the previous I-frame 201 or P-frame 202 as indicated by arrows 204 in Figure 2.
  • Such a GOP structure typically has a size of 12 or 16 frames 201, 202.
  • Another structure 300 of a plurality of GOPs is shown in Fig. 3. Particularly, Fig. 3 shows the MPEG2 GOP structure with a sequence of I-frames 201, P-frames 202 and B-frames 301.
  • the GOP size is again denoted with reference numeral 203.
  • It is possible to use a GOP structure that also contains bi-directionally predictive frames or B-frames 301, as shown in Fig. 3.
  • a GOP size 203 of 12 frames is chosen for the example.
  • the B-frames 301 are coded with a bi-directional prediction, meaning that they rely on a previous and a next I- or P-frame 201, 202 as indicated for some B-frames 301 by curved arrows 204.
  • the transmission order of the compressed frames may not be the same as the order in which they are displayed.
  • both reference frames before and after the B-frame 301 are needed.
  • the compressed frames may be reordered. So in transmission, the reference frames may come first.
  • the reordered stream, as it is transmitted, is also shown in Fig. 3, lower part.
  • the reordering is indicated by straight arrows 302.
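The reordering between display order and transmission order can be sketched as follows. This is a minimal illustration, assuming frames are given as labels in display order and that each B-frame is sent directly after the anchor frame that follows it in display order; the function name `to_transmission_order` and the frame labels are hypothetical and not taken from the source.

```python
def to_transmission_order(display_order):
    """Reorder frames so that every anchor (I or P) precedes the B-frames that need it."""
    transmitted = []
    pending_b = []  # B-frames waiting for their following reference frame
    for frame in display_order:
        if frame[0] == "B":
            pending_b.append(frame)
        else:
            transmitted.append(frame)      # the reference frame comes first
            transmitted.extend(pending_b)  # then the B-frames displayed before it
            pending_b = []
    # Trailing B-frames would depend on the first anchor of the next GOP;
    # this sketch simply keeps them at the end.
    transmitted.extend(pending_b)
    return transmitted


display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
print(to_transmission_order(display))
```

For the display sequence above, the sketch yields I0, P3, B1, B2, P6, B4, B5, so both reference frames of every B-frame arrive before the B-frame itself.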
  • a stream containing B-frames 301 can give a nice looking trick-play picture if all the B-frames 301 are skipped. For the present example, this leads to a trick-play speed of 3x forward.
  • the distance between I-frames in normal play is around half a second and for slow-forward/reverse it is multiplied by the slow motion factor. So this type of slow-forward or slow-reverse is not really the slow motion consumers are used to but in fact it is more like a slide show with a large temporal distance between the successive pictures.
  • In another trick-play mode, the still picture mode, the display picture is halted. This can be achieved by adding empty P-frames to the I-frame for the duration of the still picture mode. This means that the picture resulting from the last I-frame is halted. When switching to still picture from normal play, this can also be the nearest I-frame according to the data in the CPI file.
  • This technique is an extension of the fast-forward/reverse modes and results in nice still pictures especially if interlace kill is used. However the positional accuracy is often not sufficient when switching from normal play or slow-forward/reverse to still picture.
  • the still picture mode can be extended to implement a step mode.
  • the step command advances the stream to some next or previous I-frame.
  • the step size is at minimum one GOP but can also be set to a higher value equal to an integer number of GOPs.
  • Step forward and step backward are both possible in this case because only I-frames are used.
  • the slow-forward can also be based on a repetition of every frame, which results in a much smoother slow motion.
  • the best form of slow-forward would in fact be a repetition of fields instead of frames because the temporal resolution is doubled and there are no interlace artifacts. This is however practically impossible for the intrinsically frame based MPEG2 streams and even more so if they are largely encrypted.
  • interlace artifacts can be significantly reduced for the I- and P-frames by using special empty frames to force the repetition. Such an interlace reduction technique is not available for the B-frames though. Whether the use of interlace kill for the I- and P-frames is still advantageous in this case or in fact leads to a more annoying picture for the viewer can only be verified by experiments.
  • Still picture mode can be defined as an extension of the frame-based slow-forward mode. It is based on a repeated display of the current frame for the duration of the still picture mode whatever the type of this frame is. This is in fact a slow-forward with an infinite slow motion factor if this indicates the factor with which the normal play stream is slowed down. No interlace kill is possible if the picture is halted on a B-frame. In that sense this still picture mode is worse than the trick-play GOP based still picture mode. This can be corrected by only halting the picture at an I- or P-frame at the cost of a somewhat less accurate still picture position. Discontinuities in the temporal reference and the PTS can also be avoided in this case.
  • bit rate is significantly reduced because the repetition of an I- or P-frame is forced by the insertion of empty frames instead of a repetition of the frame data itself as is necessary for the B-frames. So, technically speaking, the halting of a picture at an I- or P-frame is the best choice.
  • the still picture mode can also be extended with a step mode.
  • the step command advances the stream in principle to the next frame. Larger step sizes are possible by stepping to the next P-frame or some next I-frame. A step backward on frame basis is not possible. The only option is to step backward to one of the previous I-frames.
  • Two types of still picture mode have been mentioned, namely trick-play GOP based and frame based. The first one is most logically connected to fast-forward/reverse whereas the second one is related to slow-forward. When switching from some mode to still picture, it is preferable to choose the related still picture mode to minimize the switching delay.
  • the streams resulting from both methods look very alike because they are both based on the insertion of empty frames to force the repetition of an anchor frame. But on detailed stream construction level there are some differences.
  • characteristic point information file 400 is visualized in Fig.
  • the CPI file 400 may also contain some other data that are not discussed here.
  • With the data from the CPI file 400 it is possible to jump to the start of any I-frame 201 in the stream. If the CPI file 400 also contains the end of the I-frames 201, the amount of data to read from the transport stream file is exactly known to get a complete I-frame 201. If for some reason the I-frame end is not known, the entire GOP or at least a large part of the GOP data is to be read to be sure that the entire I-frame 201 is read. The end of the GOP is given by the start of the next I-frame 201. It is known from measurements that the amount of I-frame data can be 40% or more of the total GOP data.
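The choice of how much data to read for one I-frame can be sketched as follows. This is a minimal sketch, assuming the CPI data is available as a list of byte offsets; the names `CpiEntry` and `read_range`, and the fallback to the file size for the last entry, are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CpiEntry:
    start: int                 # byte offset of the I-frame start in the stream file
    end: Optional[int] = None  # byte offset just past the I-frame, if it was recorded


def read_range(cpi: List[CpiEntry], index: int, file_size: int) -> Tuple[int, int]:
    """Return (offset, length) of a read that is sure to cover I-frame `index`."""
    entry = cpi[index]
    if entry.end is not None:
        stop = entry.end               # exact I-frame end is known
    elif index + 1 < len(cpi):
        stop = cpi[index + 1].start    # end of GOP = start of the next I-frame
    else:
        stop = file_size               # last GOP: read to the end (assumption)
    return entry.start, stop - entry.start
```

When the I-frame end is missing, the sketch reads the whole GOP, which costs more storage bandwidth than the exact read.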
  • trick-play picture refresh rate can be achieved by displaying each I-frame 201 several times.
  • the bit rate will be reduced accordingly. This may be achieved by adding so-called empty P-frames 202 between the I-frames 201.
  • Such an empty P-frame 202 is not really empty but may contain data instructing the decoder to repeat the previous frame. This has a limited bit cost, which can in many cases be neglected compared to an I-frame 201.
  • trick-play GOP structures like IPP or IPPP may be acceptable for the trick-play picture quality and even advantageous at high trick-play speeds.
  • the resulting trick-play bit rate is of the same order as the normal play bit rate. It is also mentioned that these structures may reduce the required sustained bandwidth from the storage device.
  • a trick-play system 500 is schematically depicted in Fig. 5.
  • the trick-play system 500 comprises a recording unit 501, an I-frame selection unit 502, a trick-play generation block 503 and an MPEG2 decoder 504.
  • the trick-play generation block 503 includes a parsing unit 505, an adding unit 506, a packetizer unit 507, a table memory unit 508 and a multiplexer 509.
  • the recording unit 501 provides the I-frame selection unit 502 with plaintext
  • the multiplexer 509 provides the MPEG2 decoder 504 with an MPEG2 DVB compliant transport stream 511.
  • the I-frame selector 502 reads specific I-frames 201 from the storage device 501. Which I-frames 201 are chosen depends on the trick-play speed as will be described below. The retrieved I-frames 201 are used to construct an MPEG-2/DVB compliant trick-play stream that is then sent to the MPEG-2 decoder 504 for decoding and rendering.
  • the position of the I-frame packets in the trick-play stream cannot be coupled to the relative timing of the original transport stream.
  • the time axis may be compressed or expanded with the speed factor and additionally inverted for reverse trick-play. Therefore, the time stamps of the original time stamped transport stream may not be suitable for trick-play generation.
  • the original PCR time base may be disturbing for trick-play.
  • a PCR will be available within the selected I-frame 201.
  • the frequency of the PCR time base would be changed. According to the MPEG2 specification, this frequency should be within 30 ppm from 27
  • I-frames 201 normally contain two time stamps that tell the decoder
  • Decoding and presentation may be started when DTS respectively PTS are equal to the PCR time base, which is reconstructed in the decoder 504 by means of the PCRs in the stream.
  • the distance between, e.g., the PTS values of 2 I-frames 201 corresponds to their nominal distance in display time. In trick-play this time distance is compressed or expanded with the speed factor. Since a new PCR time base is used in trick-play, and because the distance for DTS and PTS is no longer correct, the original DTS and PTS of the I-frame 201 have to be replaced.
  • the I-frame 201 may first be parsed into an elementary stream in the parsing unit 505. Then the empty P-frames 202 are added on elementary stream level. The obtained trick-play GOP is mapped into one PES packet and packetized to transport stream packets. Then corrected tables like PAT, PMT, etc. are added. At this stage, a new PCR time base together with DTS and PTS are included. The transport stream packets are pre-pended with a 4-byte time stamp that is coupled to the PCR time base such that the trick-play stream can be handled by the same output circuitry as used for normal play.
  • In the following, some aspects related to trick-play speeds will be described. In this context, firstly, fixed trick-play speeds will be discussed.
  • Nb = G/T (1)
  • the basic speed is an integer but this is not necessarily the case.
  • In some cases the set of trick-play speeds resulting from the method described above is satisfactory, in other cases it is not.
  • the trick-play speed formula will be inverted and the distance D will be calculated which is given by:
  • next ideal point Ip with the distance D may be calculated and one of the I-frames 201 may be chosen closest to this ideal point to construct a trick-play GOP.
  • next ideal point may be calculated by increasing the last ideal point by D.
  • trick-play speed N does not need to be an integer but can be any number above the basic speed Nb. Also speeds below this minimum can be chosen, but then the picture refresh rate may be lowered locally because the effective trick-play GOP size T is doubled or at still lower speeds even tripled or more. This is due to a repetition of the trick-play GOPs, as the algorithm will choose the same I-frame 201 more than once.
  • the round function is used to select the I-frames 201 and as can be seen frames 2 and 4 are selected twice.
  • the described method will allow for a continuously variable trick-play speed.
  • a negative value is chosen for N.
  • the method described will also include the sets of fixed trick-play speeds mentioned earlier and they will have the same quality, especially if the round function is used. Therefore, it might be appropriate that the flexible method described in this section should always be implemented whatever the choice of the speeds will be.
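The selection of I-frames around ideal points can be sketched as follows. This is a minimal sketch, assuming the distance between ideal points is D = N·T/G expressed in I-frame positions, where N is the trick-play speed, G the normal play GOP size and T the trick-play GOP size; this form of D, the function names and the round-half-up rule are assumptions of the sketch. Exact fractions are used so that the ideal point does not drift through accumulated floating-point error.

```python
import math
from fractions import Fraction


def iframe_distance(speed, gop_size, trick_gop_size):
    """Distance D between ideal points, in I-frame positions (assumed D = N*T/G)."""
    return Fraction(speed) * trick_gop_size / gop_size


def select_iframes(speed, gop_size, trick_gop_size, count, start=0):
    """Pick `count` I-frame indices, each the one closest to the current ideal point."""
    d = iframe_distance(speed, gop_size, trick_gop_size)
    ideal = Fraction(start)
    picks = []
    for _ in range(count):
        # Round half up; Python's built-in round() rounds halves to even instead.
        picks.append(math.floor(ideal + Fraction(1, 2)))
        ideal += d  # next ideal point = last ideal point + D
    return picks


# G = 12, T = 2 gives a basic speed of 6.
print(select_iframes(12, 12, 2, 4))            # twice the basic speed
print(select_iframes(3, 12, 2, 5))             # below the basic speed
print(select_iframes(-6, 12, 2, 3, start=10))  # reverse
```

With these inputs the sketch selects I-frames 0, 2, 4, 6 at speed 12, repeats I-frames at speed 3 (0, 1, 1, 2, 2), and steps backwards 10, 9, 8 for a negative speed. The repetition at low speed is the local drop in refresh rate mentioned above.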
  • refresh rate particularly denotes the frequency with which new pictures are displayed. Although not speed dependent, it will be briefly discussed here because it can influence the choice of T. If the refresh rate of the original picture is denoted by R (25 Hz or 30 Hz), the refresh rate of the trick-play picture (Rt) is given by:
  • Fig. 9 illustrates a conditional access system 900 which will now be described.
  • content 901 may be provided to a content encryption unit 902.
  • the content encryption unit 902 supplies a content decryption unit 904 with encrypted content 903.
  • ECM denotes Entitlement Control Messages.
  • KMM denotes Key Management Messages
  • GKM denotes Group Key Messages
  • EMM denotes Entitlement Management Messages.
  • the Control Word 906 may be supplied to the content encryption unit 902 and to an ECM generation unit 907.
  • the ECM generation unit 907 generates an ECM and provides the same to an ECM decoding unit 908 of a smart card 905.
  • the ECM decoding unit 908 generates from the ECM a Control Word, that is, the decryption information that is needed by and provided to the content decryption unit 904 to decrypt the encrypted content 903.
  • an authorization key 910 is provided to the ECM generation unit 907 and to a KMM generation unit 911, wherein the latter generates a KMM and provides the same to a KMM decoding unit 912 of the smart card 905.
  • the KMM decoding unit 912 provides an output signal to the ECM decoding unit 908.
  • a group key 914 may be provided to the KMM generation unit 911 and to a GKM generation unit 915 which may further be provided with a user key 918.
  • the GKM generation unit 915 generates a GKM and provides the same to a GKM decoding unit 916 of the smart card 905, wherein the GKM decoding unit 916 gets as a further input a user key 917.
  • entitlements 919 may be provided to an EMM generation unit
  • the EMM decoding unit 921 located in the smart card 905 is coupled with an entitlement list unit 913 which provides the ECM decoding unit 908 with corresponding control information.
  • CA conditional access
  • the broadcasted content 901 is encrypted under the control of the CA system 900.
  • content is decrypted before decoding and rendering if access is granted by the CA system 900.
  • the CA system 900 uses a layered hierarchy (see Fig. 9).
  • the CA system 900 transfers the content decryption key (Control Word CW 906, 909) from server to client in the form of an encrypted message, called an ECM.
  • ECMs are encrypted using an authorization key (AK) 910.
  • the CA server 900 may renew the authorization key 910 by issuing a KMM.
  • a KMM is in fact a special type of EMM, but for clarity the term KMM may be used.
  • KMMs are also encrypted using a key that for instance can be a group key (GK) 914, which is renewed by sending a GKM that is again a special type of EMM.
  • GK group key
  • GKMs are then encrypted with the user key (UK) 917, 918, which is a fixed unique key embedded in the smart card 905 and known by the CA system 900 of the provider only.
  • Authorization keys and group keys are stored in the smart card 905 of the receiver.
  • Entitlements 919 are sent to individual customers in the form of an EMM and stored locally in a secure device (smart card 905). Entitlements 919 are coupled to a specific program. An entitlements list 913 gives access to a group of programs depending on the type of subscription. ECMs are only processed into keys (Control Words) by the smart card 905 if an entitlement 919 is available for the specific program.
  • Entitlement EMMs are subject to an identical layered structure as the KMMs (not depicted in Fig. 9).
  • ECMs and EMMs are all multiplexed into a single MPEG2 transport stream.
  • the description above is a generalized view of the CA system 900.
  • In digital video broadcasting, only the encryption algorithm, the odd/even Control Word structure, the global structure of ECMs and EMMs and their referencing are defined.
  • the detailed structure of the CA system 900 and the way the payloads of ECMs and EMMs are encoded and used are provider specific. Also the smart card is provider specific. However, from experience it is known that many providers follow essentially the structure of the generalized view of Fig. 9.
  • the applied encryption and decryption algorithm is defined by the DVB standardization organization. In principle two encryption possibilities are defined namely PES level encryption and TS level encryption. However, in real life mainly the TS level encryption method is used. Encryption and decryption of the transport stream packets is done packet based. This means that the encryption and decryption algorithm is restarted every time a new transport stream packet is received. Therefore, packets can be encrypted or decrypted individually. In the transport stream, encrypted and plaintext packets are mixed because some stream parts are encrypted (e.g. audio/video) and others are not (e.g. tables). Even within one stream part (e.g. video) encrypted and plaintext packets may be mixed.
  • a DVB encrypted transport stream packet 1000 will be described.
  • the stream packet 1000 has a length 1001 of 188 Bytes and comprises three portions.
  • a packet header 1002 has a size 1003 of 4 Bytes.
  • an adaptation field 1004 may be included in the stream packet 1000. After that, a DVB encrypted packet payload 1005 may be sent.
  • Fig. 11 illustrates a detailed structure of the transport stream packet header 1002 of Fig. 10.
  • the transport stream packet header 1002 comprises a synchronization unit
  • SYNC synchronization unit
  • TEI transport error indicator
  • PUSI payload unit start indicator
  • PID packet identifier
  • SCB transport scrambling control
  • AFLD adaptation field control
  • CC continuity counter
  • Packet header 1002 is in plaintext. It serves to obtain important information such as a packet identifier (PID) number, presence of an adaptation field, scrambling control bits, etc.
  • Adaptation field 1004 is also in plaintext. It can contain important timing information such as the PCR.
  • DVB Encrypted Packet Payload 1005 contains the actual program content that may have been encrypted using the DVB algorithm.
  • SCB scrambling control bits
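Because the packet header is in plaintext, its fields can be read without any key. The following is a minimal sketch of such a reader, assuming the standard MPEG-2 transport packet header layout (sync byte 0x47, 13-bit PID, 2-bit scrambling control, 2-bit adaptation field control, 4-bit continuity counter); the function name `parse_ts_header` and the dictionary keys are hypothetical.

```python
def parse_ts_header(packet: bytes) -> dict:
    """Extract the plaintext header fields of one 188-byte transport stream packet."""
    if len(packet) != 188 or packet[0] != 0x47:
        raise ValueError("expected a 188-byte packet starting with sync byte 0x47")
    b1, b2, b3 = packet[1], packet[2], packet[3]
    return {
        "tei": bool(b1 & 0x80),          # transport error indicator
        "pusi": bool(b1 & 0x40),         # payload unit start indicator
        "pid": ((b1 & 0x1F) << 8) | b2,  # 13-bit packet identifier
        "scb": (b3 >> 6) & 0x03,         # scrambling control bits, 0 = not scrambled
        "afld": (b3 >> 4) & 0x03,        # adaptation field control
        "cc": b3 & 0x0F,                 # continuity counter
    }


example = bytes([0x47, 0x41, 0x00, 0xB5]) + bytes(184)
print(parse_ts_header(example))
```

For the example packet the sketch reports PID 256 with the payload unit start indicator set, scrambling control value 2, adaptation field control value 3 and continuity counter 5.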
  • FIG. 12 shows the basic principle of trick-play on a fully encrypted stream.
  • data stored on a hard disk 1201 are provided as a transport stream 1202 to a decrypter 1203.
  • the hard disk 1201 provides a smart card 1204 with an ECM, wherein the smart card 1204 generates Control Words from this ECM and sends the same to the decrypter 1203.
  • the decrypter 1203 decrypts the encrypted transport stream 1202 and sends the decrypted data to an I-frame detector and filter 1205. From there, the data are provided to an insert empty P frame unit 1206 which conveys the data to a set top box 1207. From there, data are provided to a television 1208.
  • Fig. 13 illustrates a full transport stream 1301.
  • NIT network information table
  • BAT bouquet association table
  • DIT discontinuity information table
  • Jumping to the next block during trick-play can mean jumping back in the stream. It will be explained that this may not be only the case for trick-play reverse but also for trick-play forward at moderate speeds. The situation for forward trick-play with forward jumps and for reverse trick-play with inherently backward jumps will be explained afterwards.
  • a conditional access system may be designed for transmission. In normal play, the transmitted stream may be reconstructed with original timings. But trick-play may have severe implications for the handling of cryptographic metadata due to changed timings.
  • the data may be compressed or expanded in time due to trick-play, but the latency of the smart card may remain constant.
  • The Control Words used in the encryption process are needed to decrypt the data blocks.
  • These Control Words may also be encrypted and stored in ECMs.
  • In a normal set-top-box (STB), these ECMs may be part of the program tuned to.
  • a conditional access module may extract the ECMs, send them to a smart card, and, if the card has rights or an authorization to decrypt these ECMs, may receive the decrypted Control Words from it.
  • Control Words usually have a relatively short lifetime of, for instance, approximately 10 seconds. This lifetime may be indicated by the Scrambling Control Bit, SCB 1014, in the transport stream packet headers. If it changes, the next Control Word has to be used. This SCB change or toggle is indicated in Fig. 14 by a vertical line and with a reference numeral 1402.
  • In a stream type I, shown in a lower row 1401 in Fig. 14, two Control Words (CWs) are provided per ECM.
  • CWs Control Words
  • In a stream type II, shown in an upper row 1400 in Fig. 14, only one Control Word (CW) is provided per ECM.
  • Fig. 14 illustrates the two data streams 1400, 1401 comprising subsequently arranged periods or segments A, B, C denoted with reference numeral 1403.
  • each ECM comprises two Control Words, namely the Control Word relating to the current period or ECM, and additionally the Control Word of the subsequent period or ECM.
  • the conditional access module may only send the first unique ECM it finds to the smart card to reduce or minimize the traffic to the card, as it may have a fairly slow processor.
  • ECM A may be defined as being the ECM that is present during the major part of period A. It can be seen that, in that case, ECM A holds the CW for the current period A and for stream type I additionally for the next period B. In general, an ECM may hold at least the CW for the current period and might hold the CW for the next period. Due to zapping, this may probably be true for all or many providers.
  • the decrypter may contain two registers, one for the “odd” and one for the “even” CW. “Odd” and “even” do not have to mean that the values of the CWs themselves are odd or even. The terms are particularly used to distinguish between two subsequent CWs in the stream. Which CW has to be used for the decryption of a packet is indicated by the SCB 1014 in the packet header. So the CWs used to encrypt the stream are alternating between odd and even. In Fig. 14 this means that, for instance, CW A and CW C are odd, whereas CW B and CW D are even. After the decryption by the smart card, CWs may be written to the corresponding registers in the decrypter overwriting previous values, as indicated in Fig. 15.
  • Fig. 15 illustrates the two registers 1501, 1502 containing even CWs (register 1501) and containing odd CWs (register 1502). Further, smart card latency 1500, that is a time needed by the smart card to retrieve or decrypt a CW from an ECM, is illustrated in Fig. 15.
  • each ECM holds two CWs and as a result both registers 1501, 1502 may be overwritten after the decryption of the ECM.
  • One of the registers 1501, 1502 is active and the other is inactive. Which one is active depends on the SCB 1014. In the example, the SCB 1014 will indicate during period B that the even register 1501 is the active one.
  • the active register may only be overwritten with a CW identical to the one it already holds because it is still needed for decryption of the remainder of that particular period. Therefore, only the inactive register may be overwritten with a new value.
  • This ECM should hold CW C to ensure a timely decryption by the smart card for usage at the start of period C.
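The rule for the two registers can be sketched as a small model. This is a toy sketch, assuming Control Words are represented by plain labels and that the active parity has already been derived from the SCB; the class `CwRegisters` and its method names are hypothetical.

```python
class CwRegisters:
    """Toy model of the decrypter's odd and even Control Word registers."""

    def __init__(self):
        self.regs = {"even": None, "odd": None}
        self.active = None  # parity currently indicated by the SCB

    def set_active(self, parity):
        self.active = parity

    def load(self, parity, cw):
        """Write a CW; refuse to change the value held by the active register."""
        current = self.regs[parity]
        if parity == self.active and current is not None and current != cw:
            return False  # still needed for the remainder of the current period
        self.regs[parity] = cw
        return True

    def key_for_packet(self):
        return self.regs[self.active]


regs = CwRegisters()
regs.load("odd", "CW A")
regs.load("even", "CW B")
regs.set_active("even")            # period B: the even register is active
print(regs.load("even", "CW B"))   # identical value, allowed
print(regs.load("odd", "CW C"))    # inactive register, allowed
print(regs.load("even", "CW D"))   # would corrupt period B, refused
```

In the usage shown, an ECM holding CW B and CW C can be applied during period B, whereas writing CW D to the even register is rejected until the SCB has toggled.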
  • Fig. 16 shows ECM handling in a fast forward mode.
  • In a plurality of subsequent periods 1403 separated by SCB toggles 1402, a plurality of data blocks 1600 are reproduced, wherein a switching 1601 occurs between different data blocks.
  • an ECM B is sent at a border between periods A and B.
  • an ECM C is sent at a border between period A and period B.
  • an ECM C is sent at a border between period B and period C.
  • an ECM D is sent at a border between period B and period C.
  • the ECMs may be stored in a separate file. In this file it may also be indicated to which period an ECM belongs (which part of the recorded stream).
  • the packets in the MPEG stream file may be numbered. The number of the first packet of a period (SCB toggle 1402) may be stored alongside with the ECM for this same period 1403.
  • the ECM file may be generated during recording of the stream.
  • the ECM file is a file that may be created during the recording.
  • ECM packets may be located which may contain the Control Words needed to decrypt the video data. Every ECM may be used for a certain period, for instance 10 seconds, and may be transmitted (repeated) several times during this period (for instance 100 times).
  • the ECM file may contain every first new ECM of such a period.
  • the ECM data may be written into this file, and may be accompanied by some metadata. First of all, a serial number (counting up from 1) may be given.
  • the ECM file may contain the position of the SCB toggle. This may denote the first packet that can use this ECM to correctly decrypt its content. Then the position in time of this SCB toggle may follow as the third field. These three fields may be followed by the ECM packet data itself.
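One way to picture an entry of the ECM file and its use during trick-play is sketched below. This is a minimal sketch, assuming the fields described above are held in memory in ascending order of SCB toggle position; the names `EcmRecord` and `ecm_for_packet`, and the use of seconds for the time field, are hypothetical.

```python
import bisect
from dataclasses import dataclass
from typing import List


@dataclass
class EcmRecord:
    serial: int         # serial number, counting up from 1
    toggle_packet: int  # number of the first packet that can use this ECM
    toggle_time: float  # position in time of the SCB toggle (seconds assumed)
    ecm: bytes          # the ECM packet data itself


def ecm_for_packet(records: List[EcmRecord], packet_number: int) -> EcmRecord:
    """Find the ECM of the period that contains the given packet number."""
    positions = [r.toggle_packet for r in records]
    i = bisect.bisect_right(positions, packet_number) - 1
    if i < 0:
        raise LookupError("packet lies before the first recorded SCB toggle")
    return records[i]
```

After a jump to an arbitrary data block, a lookup of this kind tells which ECM has to be sent to the smart card for that block.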
  • the position of a PES header may be easily detected because a PUSI bit in the plaintext header of the packet may indicate its presence. If correct PES headers are only found during the first period (after the latency of the smartcard), the ECM contains one CW. If they are also found during the second period, it contains two CWs. Such a situation is depicted in Fig. 17.
  • Fig. 17 illustrates a situation for one CW detection and for two CW detection.
  • different periods 1403 of encrypted content 1700 are provided.
  • an ECM A may be decrypted to generate corresponding CWs.
  • decrypted content 1701 may be generated.
  • PES headers 1702 namely a PES header A in period A (left) and a PES header B in period B (right).
  • the area 1703 of period B for one CW in Fig. 17 indicates that the data is decrypted with the wrong key and therefore scrambled. This checking could be done while recording, in which case it will take for instance 20 to 30 seconds.
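The check for one or two Control Words per ECM can be sketched as follows. This is a minimal sketch, assuming the payloads passed in are the decrypted payloads of packets whose header signals the start of a PES packet, and that a correct PES header is recognized by its start code prefix 0x000001; the function names and the return value 0 for a failed decryption are assumptions of the sketch.

```python
PES_START_CODE_PREFIX = b"\x00\x00\x01"


def looks_like_pes_header(payload: bytes) -> bool:
    """True if the decrypted payload begins with the PES start code prefix."""
    return payload[:3] == PES_START_CODE_PREFIX


def count_control_words(first_period_payloads, second_period_payloads):
    """Estimate how many CWs one ECM carried: 0 (nothing decrypted), 1 or 2."""
    ok_first = any(looks_like_pes_header(p) for p in first_period_payloads)
    ok_second = any(looks_like_pes_header(p) for p in second_period_payloads)
    if not ok_first:
        return 0  # not even the current period decrypts correctly
    return 2 if ok_second else 1


good = b"\x00\x00\x01\xe0" + bytes(10)  # plausible start of a video PES packet
scrambled = b"\x8f\x13\x77\x02"         # data decrypted with the wrong key
print(count_control_words([good], [scrambled]))
print(count_control_words([good], [good]))
```

The first call corresponds to the one-CW case, where period B stays scrambled, and the second call to the two-CW case.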
  • Slow-forward, which may also be denoted as slow motion forward, is a mode in which the display picture runs at a lower than normal speed.
  • slow-forward is already possible with the technique explained above referring to Fig. 7 and Fig. 8. Setting the fast-forward speed to a value between zero and one results in a slow-forward stream based on a repetition of fast-forward trick-play GOPs. For a plaintext stream, this is a proper solution, but for an encrypted stream it may lead to the erroneous decryption of a part of the I-frame in certain specific conditions.
  • One option to solve this problem is not to repeat the fast-forward trick-play GOP but to extend the size of the trick-play GOP by the addition of empty P-frames. This technique in fact may also enable slow-reverse, because it is based on the trick-play GOPs used for fast-forward/reverse and therefore on the independently decodable I-frames.
  • Such an I-frame based slow-forward or slow-reverse may be inappropriate in special cases for the following reason.
  • the distance between I-frames in normal play is around half a second and for slow-forward/reverse it is multiplied by the slow motion factor. So this type of slow-forward or slow-reverse is not exactly what is usually understood as slow motion but is in fact more like a slide show with a large temporal distance between the successive pictures.
  • the display picture may be halted. This can be achieved by adding empty P-frames to the I-frame for the duration of the still picture mode. This means that the picture resulting from the last I-frame is halted. When switching from normal play to still picture, this can also be the nearest I-frame according to the data in the CPI file.
  • This technique is an extension of the fast-forward/reverse modes and results in nice still pictures especially if interlace kill is used. However, the positional accuracy is not always satisfactory when switching from normal play or slow-forward/reverse to still picture.
  • the still picture mode can be extended to implement a step mode.
  • the step command advances the stream to some next or previous I-frame.
  • the step size is at minimum one GOP but can also be set to a higher value equal to an integer number of GOPs.
  • Step forward and step backward are both possible in this case because only I-frames are used.
  • For the construction of a slow-forward stream many considerations apply. For example, the construction of a slow-forward stream on elementary stream level can only be performed on fully plaintext data. As a consequence, the slow-forward stream will be fully plaintext, even if the normal play stream was originally encrypted. Such a situation may be unacceptable to a copyright holder. Furthermore, this is worse than in the case of a fast-forward/reverse stream because all information, i.e. each and every frame, is present in plaintext in the slow-forward stream and not just a subset of the frames as is the case for true fast-forward/reverse streams. Therefore a plaintext normal play stream can easily be reconstructed from a plaintext slow-forward stream. So the slow-forward stream should be encrypted if the normal play stream is encrypted. Since a DVB encryptor is not permissible in a consumer device, this can only be realized if the slow-forward stream is constructed on transport stream level using the encrypted data packets from the originally transmitted encrypted data stream.
  • the device 1800 is adapted for processing an MPEG2 data stream comprising a plurality of frames, for instance a sequence of I-frames, P-frames and B-frames. At least one or a sequence of I-frames should be included in the MPEG2 data stream, all other frame types are optional.
  • a central processing unit or control unit may have access to the harddisk 1801 and may provide the harddisk 1801 with corresponding control signals so that data stored on the harddisk 1801 may be supplied to a first switch unit 1803.
  • control unit is under control of a human user operating a user input/output interface 1808.
  • a user interface 1808 may include a display, input means like a keypad, a joystick, a trackball, or the like and may allow a user to specify a mode according to which she or he wishes to reproduce content stored on the harddisk 1801. For instance, the user may adjust, via the user input/output unit 1808, parameters like volume, playback speed, a trick-play reproduction mode, equalizing, etc.
  • a data stream comprising a plurality of frames may be supplied from the harddisk 1801 to the first switch unit 1803 being under control of a detection unit 1802 which is in functional relationship with a delay unit 1804.
  • the detection unit 1802 is adapted for controlling the first switch unit 1803 to switch from a first reproduction mode (for instance normal-play, NP) to a second reproduction mode (for instance trick-play, TP, particularly slow-forward trick-play).
  • NP normal-play
  • TP trick-play
  • the user may indicate via the user input/output unit 1808 that she or he wishes to switch from a normal play reproduction mode to a slow-forward reproduction mode.
  • Such a switch may be detected by the detection unit 1802.
  • the detection unit 1802 is adapted for detecting the first anchor frame after a switch from one reproduction mode to another one, triggered by a user.
  • When the detection unit 1802 has detected a switch between two operation modes, for instance a switch from a normal play mode to a slow-forward mode, the first switch unit 1803 may be controlled accordingly.
  • a normal play mode is present so that the output of the harddisk 1801 is coupled, via the first switch 1803, to a reproduction unit 1806 for reproducing the content in accordance with a normal play mode.
  • the reproduction unit 1806 which may include a display, a loudspeaker, an earpiece, or the like may have a communication connection with the control unit.
  • the first switch 1803 may be brought in another position so as to start the slow-forward mode.
  • the data stream to be played back is delayed by a delay unit 1804 for delaying the switch to the slow-forward mode by a delay time which corresponds to the time difference between the point of time of the switch and the point of time of the start of the next anchor frame in the sequence of the plurality of frames.
  • an anchor frame may be an I-frame or a P-frame (in the nomenclature of MPEG).
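The delay time described above can be illustrated with a minimal sketch. It assumes that the start time of every frame is known in milliseconds; the function name and the (frame type, start time) tuples are hypothetical and only serve the illustration.

```python
def delay_until_next_anchor(frames, switch_time_ms):
    """Return the delay between a switching command and the next anchor frame.

    frames: list of (frame_type, start_time_ms) in stream order, where
            frame_type is "I", "P" or "B" (assumed representation).
    Returns None when no anchor frame starts at or after the command.
    """
    for frame_type, start_ms in frames:
        # Anchor frames are I-frames and P-frames.
        if frame_type in ("I", "P") and start_ms >= switch_time_ms:
            return start_ms - switch_time_ms
    return None


# Frames of 40 ms each: I B B P B B
frames = [("I", 0), ("B", 40), ("B", 80), ("P", 120), ("B", 160), ("B", 200)]
print(delay_until_next_anchor(frames, 50))
```

In this example the command arrives during the first B-frame, so the switch is postponed until the P-frame starts.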
  • the data stream provided at an output of the harddisk 1801 is connected to input of a replication unit 1809 for repeating frames.
  • the replication unit 1809 which is also in communication with the control unit, is coupled with a correction unit 1805 for correcting a temporal reference of the plurality of frames after the delay time.
  • the correction unit 1805 corrects the temporal reference of these frames.
  • An output of the correction unit 1805 is coupled to the reproduction unit 1806 so as to provide the reproduction unit 1806 with the data modified in accordance with the slow-forward reproduction mode.
  • When a user operates the user input/output unit 1808 so as to go back to a normal play mode, the first switch unit 1803 is brought back to the normal play reproduction state (shown in Fig. 18) so that, after a corresponding delay or waiting time, the normal play reproduction mode is continued.
  • the correction unit 1805 may perform features like reordering the sequence of frames to be played back, inserting (empty) frames so as to take into account a trick-play factor, or to repeat frames for delayed playback.
  • a second switch unit 1810 is foreseen which can be switched under control of an additional detection unit 1811 which, in turn, can be controlled via the user interface 1808.
  • Fig. 18 shows the second switch unit 1810 in a normal- play operation mode NP. However, by means of the additional detection unit 1811, the second switch unit 1810 can also be brought in a trick-play operation mode TP (not shown).
  • the additional detection unit 1811 may detect a GOP start after switch to NP, and may provide for a time correction when switching back to NP.
  • the first switch 1803 and the second switch 1810 are in the positions as is given in Fig. 18.
  • the data is read from harddisk 1801 and flows directly to the reproduction unit 1806.
  • the detection unit 1802 is triggered by the user interface unit indicating a switch to slow forward.
  • the detection unit 1802 searches for the first anchor frame after the user-induced switch and switches, when the anchor frame is found, the first switch 1803 to the position with the branch named TP.
  • the replication unit 1809 takes care of the replication of the frames.
  • the data flows through the correction unit 1805 to the reproduction unit 1806.
  • the detection unit 1802 is triggered by the user interface unit 1808 indicating a switch to normal play.
  • the detection unit 1802 searches for the first anchor frame after the user- induced switch and switches, when the anchor frame is found, the first switch 1803 to the position with the branch named NP.
  • the additional detection unit 1811 simultaneously starts looking for a GOP start.
  • the second switch 1810 should be set to the position where the correction unit 1805 is connected. When a GOP start is found, the second switch 1810 will return to the normal play position as is indicated in Fig. 18.
  • the second switch unit 1810 is optional. Although the processing of temporal references after a switch back is preferred, and might provide better visual results during the switch back to normal play, there may be other discontinuities during this switch back that cannot be compensated for. Good use of the discontinuity flags may be made, and because of this, a system without the second switch unit 1810 may also provide a proper performance.
  • Fig. 19 shows splitting of the packet at a frame boundary.
  • Fig. 19 illustrates a plurality of TS packets 1900 each comprising a header 1901 and a frame portion 1902.
  • a packet comprising a header 1901 and two subsequent frames 1902 is split up into two separate portions each having a separate header 1901 followed by an Adaptation Field 1903 and followed by the corresponding frame 1902.
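The split of Fig. 19 can be sketched as follows. This is an illustrative sketch only: it assumes a 188-byte transport stream packet that does not yet contain an Adaptation Field, it relies on the standard MPEG-2 transport stream header layout (adaptation_field_control in bits 5 and 4 of the fourth header byte), and it leaves out continuity counter and time stamp handling. The names `stuffed_packet` and `split_packet` are hypothetical.

```python
TS_PACKET_SIZE = 188
TS_HEADER_SIZE = 4


def stuffed_packet(header, payload):
    """Build a 188-byte packet: header, stuffing Adaptation Field, payload."""
    if not 0 < len(payload) <= TS_PACKET_SIZE - TS_HEADER_SIZE - 1:
        raise ValueError("payload must be 1..183 bytes")
    # Bytes available for the Adaptation Field, including its length byte.
    af_total = TS_PACKET_SIZE - TS_HEADER_SIZE - len(payload)
    af_length = af_total - 1
    if af_length == 0:
        adaptation_field = bytes([0])
    else:
        # Length byte, one flags byte (all flags off), then 0xFF stuffing.
        adaptation_field = bytes([af_length, 0x00]) + b"\xff" * (af_length - 1)
    new_header = bytearray(header)
    new_header[3] |= 0x30  # adaptation_field_control = '11'
    return bytes(new_header) + adaptation_field + payload


def split_packet(packet, split_at):
    """Split one packet at payload offset split_at (the frame boundary)."""
    if len(packet) != TS_PACKET_SIZE:
        raise ValueError("expected a 188-byte packet")
    if (packet[3] >> 4) & 0x3 != 0x1:
        raise ValueError("sketch only handles packets without Adaptation Field")
    header, payload = packet[:TS_HEADER_SIZE], packet[TS_HEADER_SIZE:]
    first = stuffed_packet(header, payload[:split_at])   # end of previous frame
    second = stuffed_packet(header, payload[split_at:])  # start of next frame
    return first, second
```

Both resulting packets keep the original header bytes apart from the adaptation_field_control bits; a real implementation would also have to renumber the continuity counter of the second and all following packets.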
  • Fig. 20 shows a slow-forward construction after decryption of normal play data.
  • Encrypted normal play data 2000 from a harddisk 2001 are supplied to a decrypter 2002 generating a plaintext stream 2003.
  • the plaintext stream 2003 is supplied to a frame splitting unit 2004 for splitting the different frames in a manner as shown in Fig. 19. Then, this data is supplied to a slow-forward construction unit 2005 constructing a slow-forward stream, which is then supplied to a set top box 2006.
  • the decryption and slow-forward mode of a stored fully encrypted stream 2000 or a stored hybrid stream is not difficult because no stream data is skipped or duplicated in the stream by the decrypter 2002.
  • the stored stream 2000 (fully encrypted or hybrid) is simply fed at a lower than normal rate through the decrypter 2002, which also means that there are no problems with embedded ECMs (Entitlement Control Messages).
  • the plaintext stream 2003 coming from the decrypter unit 2002 can then be used to split the packets or in fact to perform any necessary stream manipulation in the frame splitting unit 2004.
  • the resulting slow-forward stream is a plaintext stream in this case.
  • The construction of an encrypted slow-forward stream from an encrypted normal play stream is performed on transport level because the use of a DVB (Digital Video Broadcasting) encryptor in consumer devices may not be allowed in special cases.
  • For this, a hybrid stream (see Fig. 21) with only a few plaintext packets 2100 and 2102 on all frame boundaries is needed.
  • Fig. 21 furthermore shows encrypted packets 2101 which belong to the I-frames 2103, B-frames 2104 or P-frames 2105.
  • the decrypter unit 2002 in Fig. 20 may be a selective type that only decrypts the necessary packets. But preferably the stream is already stored as a hybrid stream as indicated in Fig. 22.
  • Fig. 22 illustrates slow-forward construction on a stored hybrid stream 2200.
  • no decryption unit 2002 is foreseen between the harddisk 2001 and the frame splitting unit 2004.
  • a decrypter unit 2201 may then be foreseen in the set top box 2006.
  • the plaintext packets 2100, 2102 in the hybrid stream should now also allow for the splitting of packets containing data from two frames. This may be guaranteed by a criterion which will be described below in more detail. However, some part of the sequence header code or picture start code can still be located in an encrypted packet. In this case, an ideal splitting is not easily possible. In fact the split may be made between the encrypted and plaintext packets. Solutions for these problems will be described below in more detail.
  • Fig. 23 illustrates a data stream in which a previous frame 2300, a current frame 2312 and a next frame 2301 are shown. At the end of the previous frame 2300, three bytes of picture start code 2302 are provided. Furthermore, at the beginning of the current frame 2312 one byte of picture start code 2303 is foreseen.
  • the frame end of the packet before comprises one byte of picture start code 2304.
  • three bytes of picture start code 2305 are provided.
  • Fig. 23 shows that an incomplete picture start code may be present at the concatenation point. This may make a gluing necessary at a connection region 2306. Thus, gluing should be performed between the B-frame 2307 and a repetition of the B-frame 2308.
  • Fig. 23 particularly illustrates a packet header 2309, plaintext data 2310 and encrypted data 2311. In the example of Fig. 23, there is only one byte of the picture start code at the start and the end of the B-frame. As a result, two bytes are missing at the concatenation point.
  • the gluing algorithm which will be described below in more detail may heal such a problem.
  • the picture start code is split. This information may be obtained with a method that will be described below in more detail.
  • repetition of the frames will be described in more detail.
  • In a slow-forward mode, the decoder has somehow to be forced to repeat the display of a picture in accordance with the slow-forward factor. Empty P-frames may be used to force the repetition of a picture resulting from an I-frame.
  • This technique can also be applied for pictures resulting from P-frames. However, this technique cannot be easily applied for B-frames because empty P-frames always point to an anchor frame, that is an I-frame or a P-frame.
  • the repetition of the I- and P-frames may be enforced by the insertion of empty P-frames in the transmission stream after the original I-frame or P-frame.
  • Such a method may be used for the fast forward/reverse stream comprising I-frames followed by empty P-frames.
  • this method may not be correct for a stream that also includes B-frames, as is the case for a slow-forward stream constructed from a stored transmission stream with B-frames. Due to the reordering from transmission order to display order, the I-frames and P-frames will be repeated in the wrong position, thus disturbing the normal display order of the frames. This is illustrated in Fig. 24 and Fig. 25.
  • Fig. 24 illustrates the effect of reordering in normal play.
  • Fig. 24 shows a transmission order 2400 and a display order 2401. Particularly, Fig. 24 depicts the effect of reordering in normal play.
  • the top line shows a normal play transmission stream 2400 with a GOP size of 12 frames comprising I-frames 2103, P-frames 2105 and B-frames 2104. The first four frames of the next transmission GOP are also shown for clarity.
  • the bottom line of Fig. 24 shows the stream 2401 after reordering to the display order.
  • the index indicates the display frame order.
  • the reordering may be performed as follows:
  • Anchor frames, that is I-frames and P-frames, are shifted to the position of the next anchor frame.
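The reordering rule can be illustrated with a short sketch. It assumes that frames are given as labels such as "I2" or "B0", where the letter is the frame type and the number is the display index; this representation and the function name are hypothetical. B-frames are passed on directly, while each anchor frame is held back until the next anchor frame arrives.

```python
def reorder_to_display(transmission):
    """Reorder frame labels from transmission order to display order."""
    display = []
    pending_anchor = None  # anchor frame waiting for the next anchor
    for frame in transmission:
        if frame.startswith("B"):
            display.append(frame)
        else:
            # A new anchor releases the previously held anchor frame.
            if pending_anchor is not None:
                display.append(pending_anchor)
            pending_anchor = frame
    if pending_anchor is not None:
        display.append(pending_anchor)  # flush at the end of the stream
    return display


print(reorder_to_display(["I2", "B0", "B1", "P5", "B3", "B4", "P8", "B6", "B7"]))
```

At the very start of a stream no earlier anchor frame is available, so the output of this sketch is one slot shorter than a continuous stream would be at that point.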
  • Fig. 25 shows the effect of reordering in slow- forward mode.
  • Fig. 25 illustrates the transmission order 2500, an order after the reordering 2501 and an order of the displayed pictures 2502.
  • the top line of Fig. 25 shows the transmission order 2500 of the first part of the slow-motion stream for this case, assuming a slow-motion factor of three.
  • Empty P-frames may be inserted after the I-frames and the P-frames, and the B-frames may be repeated.
  • the middle line of Fig. 25 shows the effect of the reordering.
  • FIG. 25 shows how the I-frames and the P-frames are repeated by the empty P-frames in this case.
  • An empty P-frame may result in a display picture that is a copy of the picture resulting from the previous anchor frame, which itself could also be an empty P-frame. It is visible in Fig. 25 that the normal display order 2502 indicated by the index is disturbed because the display of frame I4 is split up into two parts. Only the last time frame I4 is displayed in the correct position. This also means that the B-frames may be decoded erroneously.
  • Fig. 26 shows the insertion of empty P- frames before the anchor frames.
  • the three rows in Fig. 26 are similar to the three lines of Fig. 25.
  • the empty P-frames are inserted before the anchor frames in the transmitted stream extracted from the storage device as is shown in the top line 2500.
  • the empty P-frames are now positioned after the anchor frames. This is where they should be for a correct repetition of the anchor frames as is clear from the display pictures 2502 of Fig. 26.
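The difference between inserting empty P-frames after and before the anchor frames can be reproduced with a small simulation. This is a sketch under simplifying assumptions: frames are labels such as "I2" or "B0" (type plus display index), "Pe" denotes an empty P-frame, and "prev" stands for an unknown anchor frame that precedes the example. All function names are hypothetical.

```python
def is_b(frame):
    return frame.startswith("B")


def slow_forward(transmission, factor, pre_insert):
    """Repeat B-frames and add (factor - 1) empty P-frames per anchor."""
    out = []
    for frame in transmission:
        if is_b(frame):
            out += [frame] * factor
        elif pre_insert:
            out += ["Pe"] * (factor - 1) + [frame]
        else:
            out += [frame] + ["Pe"] * (factor - 1)
    return out


def resolve_empty(frames, previous_anchor="prev"):
    """Replace each Pe by the picture of the anchor decoded just before it."""
    out, last_anchor = [], previous_anchor
    for frame in frames:
        if frame == "Pe":
            out.append(last_anchor)
        else:
            out.append(frame)
            if not is_b(frame):
                last_anchor = frame
    return out


def reorder(frames):
    """B-frames keep their position, anchors move to the next anchor slot."""
    out, pending = [], None
    for frame in frames:
        if is_b(frame):
            out.append(frame)
        else:
            if pending is not None:
                out.append(pending)
            pending = frame
    if pending is not None:
        out.append(pending)
    return out


def display_indices(pictures):
    return [int(p[1:]) for p in pictures if p != "prev"]


stream = ["I2", "B0", "B1", "P5", "B3", "B4"]
post = reorder(resolve_empty(slow_forward(stream, 3, pre_insert=False)))
pre = reorder(resolve_empty(slow_forward(stream, 3, pre_insert=True)))
print(post)
print(pre)
```

With post-insertion the copies of the I-frame picture end up on both sides of its B-frames, whereas with pre-insertion the displayed indices never decrease.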
  • It may be appropriate to avoid empty P-frames. One reason is related to the propagation of errors within a GOP.
  • a data error during the transfer to the set top box results in coding errors and therefore disturbances in the picture. If this error is in an anchor frame, it propagates until the end of the GOP because subsequent P-frames depend on this anchor frame. The B-frames are also affected because they use the pictures from the disturbed surrounding anchor frames for the decoding. This may have the consequence that the picture disturbances gradually increase towards the end of the GOP. This may be especially important for slow-forward, where the GOP size can be very large and therefore very long in time. On the other hand, a data error in a B-frame has only a very limited effect because no other frames depend on it. So the picture disturbances are restricted to this B-frame and its repetitions.
  • Referring to the construction of empty frames, several types of empty B-frames can be constructed. They may have the advantage that no additional error propagation is introduced and that interlace kill can be used. Possible types of empty B-frames are the forward predictive empty B-frames (Bf-frames) and the backward predictive empty B-frames (Bb-frames).
  • a B-frame is normally bi-directionally predictive, but uni-directional predictive B-frames can also exist. In the latter case they can be forward or backward predictive.
  • Forward predictive means that an anchor frame is used to predict the following B- frames during encoding. So the picture resulting from a forward predictive B-frame is reconstructed during decoding from the previous anchor frame. This means that the Bf- frame forces the repetition of the previous anchor frame. Therefore, it has the same effect as an empty P- or Pe-frame.
  • the Bb-frame has the opposite effect. It forces the display of the anchor frame following it. For both types of empty B-frames, an interlace kill version is possible as well.
  • A first possibility on the basis of Bb-frames is depicted in Fig. 27.
  • the Bb-frames are inserted before the anchor frames and keep their position during the reordering.
  • the anchor frames are shifted to the position of the next anchor frame.
  • the Bb frame forces the display of the anchor frame following it in the reordered stream.
  • A second possibility, on the basis of Bf-frames, is shown in Fig. 28.
  • the Bf-frames are inserted after the anchor frames in the transmission stream.
  • the repeated display of the anchor frames in the reordered stream is forced by the Bf-frames that follow them.
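Both possibilities can be compared with a small simulation. It is a sketch under simplifying assumptions: frames are labels such as "I2" or "B0" (type plus display index), "prev" stands for an anchor frame that was transmitted before the example, and a Bf-frame or Bb-frame is resolved to the picture of the anchor frame before or after it in the reordered stream. All names are hypothetical.

```python
def is_b(frame):
    return frame.startswith("B")  # also true for "Bf" and "Bb"


def build(transmission, factor, empty):
    """empty = "Bf": insert after each anchor; "Bb": insert before it."""
    out = []
    for frame in transmission:
        if is_b(frame):
            out += [frame] * factor
        elif empty == "Bf":
            out += [frame] + ["Bf"] * (factor - 1)
        else:
            out += ["Bb"] * (factor - 1) + [frame]
    return out


def reorder(frames, pending="prev"):
    """B-frames keep their position, anchors move to the next anchor slot."""
    out = []
    for frame in frames:
        if is_b(frame):
            out.append(frame)
        else:
            out.append(pending)
            pending = frame
    out.append(pending)
    return out


def displayed(reordered):
    shown = list(reordered)
    last_anchor = None
    for i, frame in enumerate(shown):  # Bf repeats the anchor before it
        if frame == "Bf":
            shown[i] = last_anchor
        elif not is_b(frame):
            last_anchor = frame
    next_anchor = None
    for i in range(len(shown) - 1, -1, -1):  # Bb shows the anchor after it
        if shown[i] == "Bb":
            shown[i] = next_anchor
        elif not is_b(shown[i]):
            next_anchor = shown[i]
    return shown


stream = ["I2", "B0", "B1", "P5", "B3", "B4"]
print(displayed(reorder(build(stream, 3, "Bf"))))
print(displayed(reorder(build(stream, 3, "Bb"))))
```

In this continuous slow-forward example both variants yield the same sequence of displayed pictures, with every anchor picture repeated directly next to its own position.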
  • The use of Bf-frames is similar to the use of empty P-frames for the construction of fast-forward and fast-reverse streams. In fact the use of Bf-frames is also possible in that case, thus unifying the trick-play generation even further. But when Bf-frames are used for fast-forward and fast-reverse, the effect of reordering should be considered. This means that some parameters in the fast-forward/reverse stream like PTS/DTS and temporal reference have to be chosen appropriately.
  • Fig. 29 illustrates a temporal reference 2900 for the transmission order 2902 and illustrates a temporal reference 2901 for a display order 2903.
  • the temporal references 2901 are a monotonically increasing series from 0 to 11. Due to the reordering, the temporal references of the anchor frames in the transmission stream are shifted.
  • Fig. 30 indicates the temporal reference for slow-forward with Bf-frames.
  • the top line of Fig. 30 indicates the frames taken from the normal play stream shown in Fig. 29 with the original temporal references.
  • the second line of Fig. 30 shows the insertion of Bf-frames and the repetition of the B-frames.
  • the original temporal references are shown above this line, and the values they should have are shown below this line.
  • the third line of Fig. 30 shows the frames after reordering, and the bottom line of Fig. 30 shows the displayed pictures.
  • the temporal references of the reordered frames are shown below these lines. They form an increasing series from 0 to 35.
  • the temporal references in the case of pre-insertion of B-frames or Pe- frames are depicted in Fig. 31 and in Fig. 32 for comparison.
  • Fig. 33 shows an example for the case that Bf- or Bb-frames are inserted (note that $B_B$ is not Bb).
  • three types of B-frames are distinguished:
  • this B-frame is displayed after the last anchor frame preceding the P-frame in the transmission stream in front of this B-frame.
  • This last anchor frame is denoted by $A_L$ and can be an I-frame, a P-frame or an empty P-frame.
  • the temporal reference of the B-frame is equal to the temporal reference of the preceding B-frame increased by 1:
  • $T(B_B) = T(B_L) + 1 \qquad (7)$
  • Due to the reordering, the anchor frames will be displayed after the sequence of B-frames following them in the transmission stream. So it is important to know how many B-frames will follow the I-frames and P-frames in the slow-forward stream to determine their new temporal reference. In the case of a varying GOP size or of a varying GOP structure this cannot be derived from history. In practice, a varying GOP structure is not common. Even for stations having a varying GOP size, the anchor frames will always be followed by the same number of B-frames. Nevertheless, a varying GOP structure is possible and will be considered.
  • the number of B-frames that will follow an individual anchor frame in a transmitted slow-forward stream has to be determined. This can be calculated from the slow motion factor and the number of B-frames following this anchor frame in the original recorded stream, taking into account whether empty B-frames or empty P-frames are inserted. So the number of B-frames in the original recorded stream has to be determined in some way.
  • one possibility is to read all the data up to the next anchor frame, but this requires a substantial amount of buffering.
  • Another possibility avoiding this buffering is to store this information in the CPI file and extract it from there.
  • the number of B-frames can be easily derived from the distance in frames to the next anchor frame in the transmitted stream. In fact it is equal to this distance minus one. There are two ways to store this information in the CPI file:
  • the CPI file holds an entry for each frame including its type;
  • the CPI file holds an entry for each anchor frame that includes the distance in frames to the previous anchor frame.
  • the distance in frames to the next anchor frame can easily be counted in the CPI file.
  • the second case may seem a bit strange because the distance to the previous anchor frame is stored with the frame instead of the distance to the next anchor frame. This is chosen because the distance to the previous anchor frame is known at the moment that an anchor frame is received.
  • the distance from the current anchor frame to the next anchor frame is simply found by reading the distance information from the next anchor frame in the CPI file. This distance will be denoted by D and the slow motion factor will be denoted by L, both being integers larger than zero (see Fig. 34).
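The second storage option can be sketched as follows. The `CpiEntry` record, its field names and both helper functions are hypothetical; the real CPI file layout is not specified here. The counts of B-frames in the slow-forward stream follow from the text: with empty B-frames the distance to the next anchor frame becomes L × D, so L × D − 1 B-frames follow the anchor frame, and with empty P-frames only the D − 1 original B-frames are repeated L times.

```python
from dataclasses import dataclass


@dataclass
class CpiEntry:
    frame_type: str               # "I" or "P" (anchor frames only)
    distance_to_prev_anchor: int  # in frames, known when the anchor arrives


def distance_to_next_anchor(entries, index):
    """D for entries[index], read from the entry of the next anchor frame."""
    if index + 1 >= len(entries):
        return None  # next anchor frame not recorded yet
    return entries[index + 1].distance_to_prev_anchor


def b_frames_following(distance, factor, empty_type):
    """Number of B-frames after an anchor in the slow-forward stream.

    empty_type "B": empty B-frames (Bf or Bb) are inserted.
    empty_type "P": empty P-frames (Pe) are inserted.
    """
    if empty_type == "B":
        return factor * distance - 1
    return factor * (distance - 1)


# Anchor entries of a stream I B B P B B P ...
cpi = [CpiEntry("I", 3), CpiEntry("P", 3), CpiEntry("P", 3)]
D = distance_to_next_anchor(cpi, 0)
print(D, b_frames_following(D, 3, "B"), b_frames_following(D, 3, "P"))
```

For a factor of one both counts reduce to D − 1, the number of B-frames in the normal play stream.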
  • Fig. 34 shows the distance D and the slow motion factor L for normal play 3400 and for slow- forward play 3401.
  • the factor L is therefore not the speed factor but the slow down factor.
  • the total number of B-frames following the anchor frame depends on the insertion of empty B-frames or P-frames. So it is distinguished between two situations, namely that empty B-frames (Bf or Bb) or empty P-frames (Pe) are inserted. In case no GOP header is present, the I-frame is treated as a P-frame.
  • the original distance to the next anchor frame is equal to D (see Fig. 34).
  • the distance to the next anchor frame in the slow-forward stream is equal to $L \times D$.
  • the first B-frame following an I-frame has a temporal reference of zero (see Fig. 35). So the last B-frame following the I-frame has a temporal reference equal to $L \times D - 2$.
  • the I-frame is the next one to be displayed, so its temporal reference is one higher. Then the temporal reference for the I-frames is given by: $T(I) = L \times D - 1$
  • the temporal reference for the P-frame also depends on the temporal reference of the previous anchor frame in the slow-forward stream.
  • This previous anchor frame (an I-frame or a P-frame) is denoted by $A_L$ and its temporal reference by $T(A_L)$.
  • the first B-frame following the P-frame will be displayed after the previous anchor frame $A_L$.
  • So the temporal reference of this B-frame is equal to $T(A_L) + 1$.
  • the temporal reference of the last B-frame following the P-frame is $T(A_L) + L \times D - 1$.
  • the P-frame is the next one to be displayed, so its temporal reference is one higher.
  • the temporal reference for the P-frames is given by: $T(P) = T(A_L) + L \times D$
  • For the case that empty P-frames (Pe) are inserted, the previous anchor frame will again be denoted by $A_L$ and its temporal reference by $T(A_L)$, see Fig. 38.
  • $T(P) = T(A_L) + L \times (D - 1) + 1 \qquad (11)$
  • After the reordering, a Pe-frame will immediately follow a previous I-frame, P-frame, or Pe-frame, that is a previous anchor frame. As a result, the temporal reference of the Pe-frame is always one higher than that of the previous anchor frame $A_L$ (see Fig. 39).
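The temporal reference rules can be collected in a short sketch. The formulas for the empty B-frame case are taken from the reasoning in the text (the last B-frame after an I-frame has L × D − 2 and the last B-frame after a P-frame has T(A_L) + L × D − 1, with the anchor frame one higher in each case); the Pe case uses equation (11). The function names and the (type, D) tuples are hypothetical.

```python
def anchor_refs_with_empty_b(anchors, factor):
    """Temporal references of anchors when Bf- or Bb-frames are inserted.

    anchors: list of (frame_type, D) in transmission order, where D is the
             distance in frames to the next anchor frame in normal play.
    """
    refs = []
    previous = None  # T(A_L)
    for frame_type, distance in anchors:
        if frame_type == "I":
            ref = factor * distance - 1          # counting restarts at the I-frame
        else:
            ref = previous + factor * distance   # P-frame
        refs.append(ref)
        previous = ref
    return refs


def p_ref_with_empty_p(previous_anchor_ref, factor, distance):
    """Equation (11): P-frame when empty P-frames (Pe) are inserted."""
    return previous_anchor_ref + factor * (distance - 1) + 1


def pe_ref(previous_anchor_ref):
    """A Pe-frame is always one higher than the previous anchor frame."""
    return previous_anchor_ref + 1


# GOP of 12 frames: I B B P B B P B B P B B, slow motion factor 3
print(anchor_refs_with_empty_b([("I", 3), ("P", 3), ("P", 3), ("P", 3)], 3))
```

For a slow motion factor of three and a GOP of 12 frames the last P-frame receives the value 35, in line with the series from 0 to 35 mentioned for Fig. 30; with a factor of one the normal play values 2, 5, 8 and 11 are obtained.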
  • Fig. 40 shows a transition from a normal play mode 4000 to a slow-forward mode 4001.
  • a switching area 4002 is indicated by an arrow.
  • a transition area is indicated by an arrow 4003. From a point in time 4004 onwards, the situation is identical to a continuous slow-forward situation.
  • the top line of Fig. 40 indicates the stream to the decoder/renderer with a switching point after frame B3. It also shows the insertion of Bf-frames and the repetition of the B-frames after the switching point for a slow motion factor of three.
  • the original temporal references of the stored normal play stream are shown above this line and the new temporal references needed for a continuous display below this line.
  • the second line shows the frames after reordering and the bottom line the displayed pictures. The reordered new temporal references are again shown below these lines.
  • the switching area 4002 indicates the area in which a switching command is received from the user. The fastest response would be to switch to slow- forward after the current frame. Assuming that the switching command is received during frame B2 as is indicated in Fig. 41, then switching could be done at the start of frame B3.
  • Fig. 41 indicates the time of a switching command 4100, the original temporal reference 4101, the transmission order 4102 and the new temporal reference 4103.
  • If the switching is done at the start of frame B3, this means that frame B3 will be repeated.
  • the number of B-frames following frame I4 is changed. This change is even larger if the switching command is received during frame I4.
  • the temporal reference of an anchor frame depends on the number of B-frames following it in the transmission stream as was explained above.
  • the temporal reference of frame I4 should be changed for a correct display. In the example of Fig. 41 it is indicated that it should change from 2 to 4.
  • the slow-forward processing is started, resulting in an insertion of Bf-frames after the anchor frame and a repetition of the B-frames.
  • new temporal references are calculated according to the method formulated above.
  • a transition area is indicated in Fig. 40 up to the start of the next I-frame. This does not mean that something special has to be done during this area. It merely indicates that the resulting temporal references of the frames in this area are different from the temporal references in the continuous slow-forward situation.
  • Fig. 42 shows the situation where the switching command is received during a P-frame or the B-frame following it.
  • a top line of Fig. 42 again shows an original temporal reference 4101, a transmission order 4102, and a new temporal reference 4103. Furthermore, a middle line of Fig. 42 shows a sequence after reordering 4200, and a new temporal reference 4201. Furthermore, a bottom line of Fig. 42 shows the displayed pictures 4202 and the new temporal reference 4203.
  • the switching could occur at the start of the next anchor frame. So in general, the switching to slow- forward occurs at the start of the next anchor frame following the reception of the switching command.
  • Fig. 43 shows a situation when pre-insertion of Pe-frames is used to construct the slow- forward stream. Also here the switching to slow- forward occurs at the start of the first anchor frame following the reception of the switching command. In this case it is started with the insertion of the Pe-frames.
  • Fig. 44 shows the situation when pre-insertion of Bb-frames is used to construct the slow-forward stream. Also in this case the switching to slow-forward is performed at the start of the next anchor frame following the reception of the switching command. But, in contrast to the situation with Pe-frames, no empty frames are inserted before this first anchor frame of the slow-forward stream.
  • pre-insertion of the Bb-frames would require a change of the temporal reference of the previous anchor frames due to the additional Bb-frames following this previous anchor frame.
  • pre- insertion of Bb-frames effectively delays the switching in the display stream by one frame with respect to post-insertion of Bf- frames or pre-insertion of Pe-frames.
  • Switching rules are derived for a normal play stream containing B-frames but can be applied to a normal play stream containing no such frames. In this case the switch occurs at a next frame boundary (minimum delay).
  • For post-insertion of Bf-frames and pre-insertion of Pe-frames this means that the next picture to be displayed after the decoding delay is immediately part of the displayed slow-forward picture sequence.
  • For pre-insertion of Bb-frames, the transition to slow-forward on the display screen is delayed by one additional frame time.
  • the temporal reference of frame I4 in the slow-forward stream is based on the number of B-frames that will follow it in this mode. Without additional buffering, this number cannot be changed and switching to normal play can only occur at the start of the next anchor frame, which is P7 in this example.
  • a transition area is indicated up to the start of the next I-frame in the transmission stream. It can be seen that in this transition area the needed temporal references (indicated below the frames in transmission order) are not identical to the original temporal references of the stored normal play stream. So the generation of temporal references as described above has to be continued until the start of the next I-frame.
  • the switching to normal play is therefore performed in two steps. The switching to the normal play data will occur at the start of the next anchor frame following the reception of the switching command, but the generation and correction of the temporal references is continued. The complete switch to normal play occurs at the start of the next I-frame.
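The two-step switch can be sketched as a search for two positions in the stored stream. The sketch assumes a list of frame types in transmission order and the index of the frame during which the switching command is received; the function name and this representation are hypothetical.

```python
def switch_back_points(frame_types, command_index):
    """Return (data_switch, full_switch) indices for a switch to normal play.

    data_switch: first anchor frame after the command; from here on normal
                 play data is used, but temporal references are still corrected.
    full_switch: first I-frame at or after data_switch; from here on the
                 stream is passed on unmodified.
    """
    data_switch = None
    for i in range(command_index + 1, len(frame_types)):
        if frame_types[i] in ("I", "P"):
            data_switch = i
            break
    if data_switch is None:
        return None, None
    full_switch = None
    for i in range(data_switch, len(frame_types)):
        if frame_types[i] == "I":
            full_switch = i
            break
    return data_switch, full_switch


frames = list("IBBPBBPBBPBBIBB")
print(switch_back_points(frames, 1))
print(switch_back_points(frames, 10))
```

When the first anchor frame after the command is itself an I-frame, both indices coincide and no transition area remains.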
  • Fig. 46 shows the situation where the switching command is received during a
  • Fig. 47 shows the situation when switching from slow-forward to normal play in the case that pre-insertion of Pe-frames is used to construct the slow-forward stream. As described before, the switching has to be performed in two steps in this case as well: switching to the normal play data, with a continuation of the temporal reference generation, at the start of the first anchor frame following the switching command, and fully switching at the start of the next I-frame. When the switching command is received during a Pe-frame, the same two-step switching method may be applied. The first switching step will then occur at the start of the next frame because this will always be an anchor frame.
  • the first anchor frame following the switching command could also be an I-frame.
  • the transition area is absent in that case.
  • the switching from slow- forward to normal play is described in the relation to a normal play stream containing B- frames. Even if the normal play stream contains no B-frames, a slow- forward stream with B- frames results from the insertion of empty B-frames. In any case the same switching rules are also applied if the original normal play stream contains no B-frames.
  • the described switching method leads to a large switching delay from slow- forward to normal play especially for a large slow motion factor. This cannot be avoided if a correct temporal reference is wanted unless a serious amount of buffering is used. On the other hand, some discontinuities in other items will always occur.
  • fast-forward and fast-reverse a discontinuity in the PCR time base can be avoided when switching from normal play to slow- forward but never when switching the other way round unless a complete new PCR time base is also generated for the normal play stream.
  • this could of course be done but practically it is expected that at such a moment a maximum use is made of discontinuity flags of the MPEG stream. This could also help to reduce effects of incorrect temporal references, thus allowing a much faster switching from slow- forward to normal play.
  • In Fig. 48, the splitting of the stream for one PES packet per frame is illustrated.
  • the data streams shown in Fig. 48 include plaintext packet headers 4800, Adaptation Fields 4801, plaintext data 4802, encrypted data 4803 and plaintext PES header 4804. Furthermore, a PLUSI present is denoted with reference numeral 4805, and a PES header is denoted with reference numeral 4806.
  • the individual frames comprise a number of complete original packets. So no packet splitting is necessary. This frame splitting could also be performed in a completely encrypted stream, but access to some plaintext data is still necessary for the construction of the slow- forward stream.
  • the splitting at the start of a packet with a PLUSI also means that there are no picture start codes that are spread over two packets. Each individual frame contains its own correct and complete picture start code. Therefore, no gluing activity is necessary in this case. However, in the case of one PES packet per GOP, the situation is different.
  • the split between frames is made at the picture start code of a new frame, unless a PES header precedes it.
  • the original stream is simultaneously researched for a packet with a PLUSI bit set, a picture start code and a picture coding extension;
  • the split is made at the start of this packet (see Fig. 49, including a picture start code 4900 and a picture code extension 4901). Subsequently, the stream is searched for the picture coding extension. After this is found, the search is continued as described in point 1.; 3. If the picture start code is encountered first, the split is made at the start of the picture start code. In many cases this means that the packet containing the picture start code has to be split in two packets of which the first is assigned to the previous frame and the second to the subsequent frame (see Fig. 50 illustrating splitting of a stream at the start of a picture start code 4900, wherein places of insertion of an Adaptation Field are denoted with reference numeral 5000).
  • Both packets are stuffed with an Adaptation Field 5000.
  • the payload of the second packet then starts with the picture start code 4900.
  • the recording time stamp of the original packet is copied to each of the two packets resulting from the split. Whether the two packets from the split or the original packet will be used at a concatenation point of two frames depends on the specific situation as will be explained below. Subsequently, the stream is searched for the picture coding extension 4901. After having found this, the search is continued as described in point 1.;
  • the picture start code may be undetectable because it is partially encrypted. This means that the current plaintext area starts with some bytes of the picture start code. In this case the split is made at the start of the first plaintext packet of the current plaintext area (see Fig. 51 showing the splitting of the stream within a picture start code 4900, and illustrating bytes of picture start code 5100 as well as picture code extension 4901).
  • the search which is described in point 1. is continued after having found picture coding extension 4901.
  • the described algorithm would also result in the correct splitting points for a stream with one PES packet per frame.
  • the algorithm is designed for application to plaintext streams as well as the hybrid streams mentioned above.
  • Gluing is only necessary in the case of incomplete picture start codes that can only result from point 4. of the given algorithm. So only point 4. leads to a non-ideal splitting point.
  • a plaintext stream contains only ideal splitting points because the picture start code is always found. So no gluing is necessary in this case. But hybrid streams will contain non-ideal splitting points.
  • a method described below may be used to determine how many bytes of the picture start code are on either side of the non-ideal splitting points. The effects of a non-ideal splitting point will be explained in detail hereinafter. Next, the situation will be considered that empty P-frames of any type are inserted at such a non-ideal splitting point. How to handle the first empty frame will be explained below.
  • a number of bytes equal to the part of the picture start code after the splitting point is removed from the picture start code of the first empty frame.
  • the intermediate empty frames are unchanged.
  • the last empty frame has to be corrected for the missing part of the picture start code of the subsequent frame. So this missing part may be added to the end of the last empty frame. No changes are necessary to empty frames that are inserted at ideal splitting points.
  • Fig. 52 illustrates incomplete picture start code at the concatenation point.
  • the start code may be 4 bytes in length
  • the number n for one frame and the number m for the subsequent frame may be determined with a method which will be illustrated below.
  • n and m represent the number of bytes of the picture start code at the end and start of a B-frame that has to be repeated. As a consequence, they also represent a number of bytes of the picture start code before and after an intermediate concatenation point.
  • the last packet of frame N is denoted with reference numeral 5300, and Fig. 53 further shows the first packet of frame N denoted with reference numeral 5301. No gluing action is necessary at a border 5302.
  • Fig. 54 shows the situation with n+m>4. This means that there are 1, 2 or 3 bytes too many at the concatenation point.
  • a number of bytes equal to n+m-4 is removed from the start of the second frame. This is accomplished by replacing these plaintext bytes by an Adaptation Field (AF) containing stuffing bytes. If an Adaptation Field is already present, its length has to be increased by n+m-4 and the data to be discarded is replaced by stuffing bytes that, according to the standard, have a hexadecimal value FF.
  • AF Adaptation Field
  • any of the embodiments described comprise implicit features, such as, an internal current supply, for example, a battery or an accumulator.
  • any reference signs placed in parentheses shall not be construed as limiting the claims.
  • the words “comprising” and “comprises”, and the like, do not exclude the presence of elements or steps other than those listed in any claim or the specification as a whole.
  • the singular reference of an element does not exclude the plural reference of such elements and vice versa.
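The gluing rule given in the list above for an incomplete picture start code at a concatenation point may be sketched as follows. This is a minimal illustration only: the function name `glue_action` and the returned labels are hypothetical, the start code length of 4 bytes is taken from the text, and for n+m<4 the sketch merely reports how many bytes are missing rather than showing how they are restored.

```python
from typing import Tuple

START_CODE_LEN = 4  # the text states that the start code may be 4 bytes in length


def glue_action(n: int, m: int) -> Tuple[str, int]:
    """Classify a concatenation point (hypothetical helper).

    n: picture start code bytes at the end of the first frame.
    m: picture start code bytes at the start of the second frame.
    Returns a (label, byte_count) pair.
    """
    if not (0 <= n <= START_CODE_LEN and 0 <= m <= START_CODE_LEN):
        raise ValueError("n and m must lie between 0 and 4")
    total = n + m
    if total == START_CODE_LEN:
        # The start code is complete across the border: no gluing action.
        return ("none", 0)
    if total > START_CODE_LEN:
        # Surplus bytes are removed from the start of the second frame
        # (in the text: replaced by Adaptation Field stuffing bytes 0xFF).
        return ("remove_from_second_frame", total - START_CODE_LEN)
    # Fewer than 4 bytes present: only the shortfall is reported here.
    return ("bytes_missing", START_CODE_LEN - total)
```

For instance, with n=3 and m=3 the sketch reports that n+m-4=2 bytes are to be removed from the start of the second frame.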

Abstract

A device (1800) for processing a data stream comprising a plurality of frames, wherein the device (1800) comprises a detection unit (1802) for detecting switching from a first reproduction mode to a second reproduction mode, a delay unit (1804) for delaying the switch to the second reproduction mode by a delay time which corresponds to the time difference between the switching and a start of the next anchor frame of the plurality of frames, and a correction unit (1805) for correcting a temporal reference of the plurality of frames.

Description

A device for and a method of processing a data stream comprising a plurality of frames
FIELD OF THE INVENTION
The invention relates to a device for processing a data stream comprising a plurality of frames.
The invention further relates to a method of processing a data stream comprising a plurality of frames.
The invention further relates to a program element.
The invention further relates to a computer-readable medium.
BACKGROUND OF THE INVENTION
Electronic entertainment devices become more and more important.
Particularly, an increasing number of users buy hard disk based audio/video players and other entertainment equipment.
Since the reduction of storage space is an important issue in the field of audio/video players, audio and video data are often stored in a compressed manner, and for security reasons in an encrypted manner.
MPEG2 is a standard for the generic coding of moving pictures and associated audio and creates a video stream out of frame data that can be arranged in a specified order called the GOP ("Group Of Pictures") structure. An MPEG2 video bit stream is made up of a series of data frames encoding pictures. The three ways of encoding a picture are intra-coded (I picture), forward predictive (P picture) and bi-directional predictive (B picture). An intra-coded frame (I-frame) is an independently decodable frame. A forward predictive frame (P-frame) needs information of a preceding I-frame or P-frame. A bi-directional predictive frame (B-frame) is dependent on information of a preceding and/or subsequent I-frame or P-frame.
It is an interesting function in a media playback device to switch from a normal reproduction mode, in which media content is played back at normal speed, to a trick-play reproduction mode, in which media content is played back in a modified manner, for instance at a reduced speed ("slow forward") or as a still picture, or vice versa.
US 2003/0053540 A1 discloses processing MPEG coded video data including groups of pictures (GOPs). Each group of pictures includes one or more I-frames and a plurality of B- or P-frames. To produce an MPEG slow-forward coded video stream, the coding type of each frame in the MPEG coded video data is identified, and freeze frames are inserted as a predefined function of the identified coding type and as a predefined function of a desired slow-down factor. In one implementation, for a slow-down factor of n, for each original I- or P-frame, (n-1) backward-predicted freeze frames are inserted, and for each original B-frame, (n-1) copies of the original B-frame are added, and a selected amount of padding is added to each copy of each original B-frame in order to obtain a normal play bit rate and avoid video buffer overflow or underflow.
BRIEF SUMMARY OF THE INVENTION
It is an object of the invention to enable efficient processing of a data stream. In order to achieve the object defined above, a device for processing a data stream comprising a plurality of frames, a method of processing a data stream comprising a plurality of frames, a program element and a computer-readable medium according to the independent claims are provided.
According to an exemplary embodiment of the invention, a device for processing a data stream comprising a plurality of frames is provided, wherein the device comprises a detection unit for detecting switching from a first reproduction mode to a second reproduction mode, a delay unit for delaying the switch to the second reproduction mode by a delay time which corresponds to the time difference between the switching and a start of the next anchor frame of the plurality of frames, and a correction unit for correcting a temporal reference of the plurality of frames. According to another exemplary embodiment of the invention, a method of processing a data stream comprising a plurality of frames is provided, the method comprising detecting switching from a first reproduction mode to a second reproduction mode, delaying the switch to the second reproduction mode by a delay time which corresponds to the time difference between the switching and a start of the next anchor frame of the plurality of frames, and correcting a temporal reference of the plurality of frames.
Beyond this, according to another exemplary embodiment of the invention, a computer-readable medium is provided, in which a computer program is stored, which computer program, when being executed by a processor, is adapted to control or carry out the above-mentioned method. Moreover, according to still another exemplary embodiment of the invention, a program element is provided, which program element, when being executed by a processor, is adapted to control or carry out the above-mentioned method.
The data processing according to the invention can be realized by a computer program, that is to say by software, or by using one or more special electronic optimization circuits, that is to say in hardware, or in hybrid form, that is to say by means of software components and hardware components.
According to an exemplary embodiment of the invention, a switch is detected from one playback operation mode to another playback operation mode, more particularly from a normal play mode to a trick-play mode, and still more particularly from a normal play mode to a slow-forward mode, or vice versa. After having detected such a switch, measures may be taken to smoothly change the operation mode so as to achieve proper audio and/or video playback quality for a user. Since switching between different playback modes may include reordering different frames, repeating frames or, more generally, modifying the velocity of playback, such a switch should be properly synchronized with the altered characteristics of playing back the frames.
In order to achieve a smooth transition, the switch from the outgoing mode to the incoming mode may be delayed, wherein the delay time may be selected so that the next anchor frame of the frame sequence is awaited. The waiting time may be used for calculations related to a necessary correction of the timing or other temporal references between the subsequent frames. A proper playback quality may be obtained when the start of the new operation mode is postponed until a start of a next anchor frame (in the MPEG2 technology the start of the next I-frame or P-frame). Such an anchor frame may be a frame which may serve as an anchor for playing back content with a sufficient degree of independency from other frames.
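The delay until the next anchor frame may be illustrated by a short sketch. It assumes that the frame coding types are available as a list in transmission order and that the switching command arrives while the frame at `command_index` is being transmitted; the function name and this indexing convention are hypothetical and serve only as an illustration.

```python
ANCHOR_TYPES = {"I", "P"}  # anchor frames in MPEG2 terminology


def delayed_switch_index(frame_types, command_index):
    """Return the index of the next anchor frame after the command, or None.

    frame_types: coding types ("I", "P" or "B") in transmission order.
    command_index: index of the frame being transmitted when the
    switching command is received (assumed convention).
    """
    for i in range(command_index + 1, len(frame_types)):
        if frame_types[i] in ANCHOR_TYPES:
            return i  # the new reproduction mode starts at this frame
    return None  # no further anchor frame in the given sequence


frames = ["I", "B", "B", "P", "B", "B", "P"]
switch_at = delayed_switch_index(frames, 1)  # command arrives during a B-frame
```

The frames between `command_index` and the returned index correspond to the delay time, during which the correction of the temporal references may be prepared.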
Such a procedure may simultaneously allow a fast switch to a new desired operation mode, a proper quality and sufficient time for a processor to calculate a sequence of frames in accordance with the new operation mode and in accordance with the requirements of a transition between the two playback modes. Particularly, an exemplary embodiment of the invention provides correction of temporal references and switching effects for slow-forward. Thus, a scheme for correcting possibly incorrect temporal reference of MPEG frames is provided, which may occur when there is a switch from normal play mode to slow-forward mode, or vice versa, in an MPEG decoder. For this purpose, the switching may be delayed until the start of a next anchor frame. Then, the temporal references may be corrected during a transition phase from normal play mode to slow-forward mode, or vice versa.
An exemplary embodiment of the invention relates to a storage device for storing MPEG transport streams with a digital interface to an MPEG compliant decoder that is capable of providing an MPEG compliant stream during the transition phase from normal play to slow-forward, or vice versa. During the transition phase, the temporal reference of the MPEG frames may be incorrect. Due to reordering the temporal reference of already transmitted frames cannot be corrected without using large buffers, which would add cost and latency. To solve this, the storage device may delay the switching point to slow-forward until the start of a next anchor frame (which may be an I-frame or a P-frame in the MPEG terminology). Furthermore, the temporal references are corrected during a transition phase from normal play to slow-forward. Thereafter, a slow-forward stream may be created. When switching from slow-forward back to normal play, the switching point may be delayed until the start of the next anchor frame after receiving the switching command and the correction of the temporal references may be continued until the start of the next I-frame. Thereafter, the normal play stream may be transmitted.
However, the first or the second reproduction mode may be a standstill mode which may be interpreted as a very slow (or an infinite) slow-forward mode. Therefore, when switching between a normal play mode and a standstill mode, or vice versa, or when switching between a slow-forward or slow-backward mode and a standstill mode, methods of correcting the temporal references of the frames according to an exemplary embodiment of the invention may be applied as well.
Embodiments of the invention may therefore solve problems related to the temporal reference of frames in a switching regime. The term "anchor frame" may particularly denote a frame which, in transmission order and/or in display order, keeps its relative temporal position with respect to other anchor frames. In the context of MPEG2, I-frames and P-frames may be denoted as anchor frames. In contrast to this, B-frames would not be denoted as anchor frames in the context of MPEG2. At least two types of empty B-frames may be distinguished. Empty forward predictive B-frames (so-called Bf-frames) may particularly denote frames referring to an anchor frame preceding the Bf-frame in a display mode. Empty backward predictive B-frames (so-called Bb-frames) may particularly denote frames referring to an anchor frame following the Bb-frame in a display mode. Bf-frames and Bb-frames particularly differ concerning the position at which they should be inserted in a data stream.
Therefore, switching effects occurring between different reproduction modes may be taken into account according to an exemplary embodiment of the invention. Particularly, the use of a so-called Bf-empty frame may be even more advantageous as compared to the use of empty P-frames or of Bb-frames. This phenomenon will be described below in more detail with reference to Fig. 42 to Fig. 44. As can be taken from Fig. 42, the command to switch to slow-forward may be received somewhere in the time period indicated with the switching area 4002. Both Bf-frames and Pe-frames may result in a slow-forward effect starting with the repetition of the first anchor frame (in display order) following the reception of the slow-forward command. In the embodiment of Fig. 42 and Fig. 43, this is anchor frame P7.
Using Bb-frames may result in a slow-forward stream starting one frame later, as can be taken from Fig. 44. Repetition starts with the first B-frame following the first anchor frame (B8 in the example of Fig. 44). So one advantage of using Bf-frames is that it is possible to start one frame sooner with the slow-forward stream than when Bb-frames are used.
A further advantage may be that, when constructing the slow-forward stream, and when an anchor frame is encountered, only Bf-frames are post-inserted (that is to say after the anchor frame). Both Bb-frames and Pe-frames are pre-inserted. Therefore, when using Bf-frames, the anchor frame may not need to be kept in memory after it was read, but can be played out immediately, thus supporting an improved memory usage during slow-forward trick-play reconstruction.
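The difference between post-insertion of Bf-frames and pre-insertion of Bb-frames or Pe-frames may be sketched as follows. The sketch makes several assumptions that are not taken from this passage: frames are represented by labels such as "I1" or "B2" in transmission order, (factor-1) empty frames are used per anchor frame, each B-frame is output factor times, and temporal references are not corrected. The function name `slow_forward` is hypothetical.

```python
def slow_forward(frames, factor, empty="Bf"):
    """Build a slow-forward frame sequence (illustrative sketch only).

    frames: labels in transmission order; the first character is the
            coding type ("I", "P" or "B").
    factor: integer slow motion factor (assumed >= 1).
    empty:  type of empty frame to use: "Bf", "Bb" or "Pe".
    """
    if factor < 1:
        raise ValueError("factor must be at least 1")
    if empty not in ("Bf", "Bb", "Pe"):
        raise ValueError("unknown empty frame type")
    out = []
    for frame in frames:
        if frame[0] in "IP":
            fillers = [empty] * (factor - 1)
            if empty == "Bf":
                # Post-insertion: the anchor frame is output as soon as
                # it is read, so it need not be kept in memory.
                out.append(frame)
                out.extend(fillers)
            else:
                # Pre-insertion: empty frames are output first, so the
                # anchor frame has to be held back until they are sent.
                out.extend(fillers)
                out.append(frame)
        else:
            # B-frames are simply repeated.
            out.extend([frame] * factor)
    return out
```

With a factor of 2, the sequence I1, B0, P4 becomes I1, Bf, B0, B0, P4, Bf when Bf-frames are used, and Pe, I1, B0, B0, Pe, P4 when Pe-frames are used.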
Next, further exemplary embodiments of the invention will be described. In the following, further exemplary embodiments of the device for processing a data stream comprising a plurality of frames will be described. However, these embodiments also apply for the method of processing a data stream comprising a plurality of frames, for the program element and for the computer-readable medium.
The first reproduction mode or the second reproduction mode may be a normal play mode. Furthermore, the other one of the first reproduction mode and the second reproduction mode may be a trick-play mode. In other words, the switching may be between a normal play mode and a trick-play mode, in both directions. When entering a trick-play mode, playback parameters like the playback speed may be modified in accordance with a (predetermined or user-defined) trick-play factor, for instance a slow-motion factor or a fast-forward factor. Thus, the trick-play mode may be a slow-forward reproduction mode, a slow-reverse reproduction mode, a freeze frame reproduction mode, a standstill reproduction mode or an instant replay reproduction mode. However, the invention is not restricted to these trick-play modes but may be applied to other trick-play modes as well, like fast-forward or fast-reverse.
The frame of the data stream to be processed may include at least one of the group consisting of an intra-coded frame (I-frame), a forward predictive frame (P-frame) and a bi-directional predictive frame (B-frame). Such frames may be part of an MPEG2 video bit stream. An intra-coded frame (I-frame) is an independently decodable frame. A forward predictive frame (P-frame) needs information of a preceding I-frame or P-frame. A bidirectional predictive frame (B-frame) may be dependent on information of a preceding and/or of a subsequent I-frame or P-frame.
Particularly, the anchor frame may be an intra-coded frame or a forward predictive frame, since these kinds of frames may have a higher degree of independence of other frames as compared to a B-frame.
The correction unit may be adapted for correcting a sequence of the plurality of frames by means of an empty forward predictive bi-directional predictive frame (a so-called Bf-frame). By inserting empty frames, that is to say frames without independent information, in the sequence of frames to be reproduced, a reduced playback velocity with low computational burden for complex calculations of intermediate frames may be obtained. However, the correction unit may be adapted for correcting a sequence of the plurality of frames by means of an empty backward predictive bi-directional predictive frame (a so-called Bb-frame). It may be more desirable to play back a Bf-frame after the anchor frame, as has been described above. Furthermore, the correction unit may be adapted for correcting a sequence of the plurality of frames by means of an empty forward predictive frame (a so-called Pe-frame).
The correction unit may be adapted for correcting the temporal reference of the plurality of frames during a transition phase after the delay. Between the delay time and a time interval in which the data is presented in the second reproduction mode, a transition phase occurs in which the timing of the frames may be adjusted to the modified reproduction mode. Particularly, the correction unit may be adapted for correcting the temporal reference of the plurality of frames in such a manner that an order of the plurality of frames is corrected. Thus, the altered time dependencies between the different frames, or the different numbers assigned to the frames and related to the order of playback, may require a reordered time sequence of the plurality of frames.
The device may comprise an insertion unit adapted for inserting frames, particularly empty frames, after having switched (and/or during switching) from the first reproduction mode to the second reproduction mode. Such an insertion may take into account modified requirements when switching from one reproduction mode to another one.
The insertion unit may be adapted to insert forward predictive frames and/or bi-directional predictive frames as the empty frames. Therefore, the playback velocity may be modified by insertion of frames having low data storage requirements, for instance as compared to an I-frame. This may allow for a calculation of the played back stream with low computational burden, and may reduce the storage capacities and the computing resources involved.
The device may comprise a repetition unit adapted for repeating frames after having switched from the first reproduction mode to the second reproduction mode. Therefore, it is also possible to simply repeat frames several times so as to achieve a desired reproduction factor.
The repetition unit may be adapted for repeating forward predictive frames and/or bi-directional predictive frames. This is advantageous from the storage memory point of view, that is to say it may keep the required storage capacities within reasonable limits.
The device may comprise a storage unit for storing the data stream. Such a storage unit may be a hard disk, a flash card or any other data carrier like a CD or a DVD. However, the storage unit may also be an Internet server to which the device has (network) access for downloading required information.
The device may further be adapted to process a plaintext data stream, a fully encrypted data stream or a mixture of encrypted parts and plaintext parts (a so-called hybrid stream). In other words, the data stream may be entirely encrypted, entirely plaintext, or a combination of both. Thus, decryptors and/or encryptors may be foreseen at appropriate positions of a data processing system according to an embodiment of the invention.
The device may further be adapted to process a data stream of video data or audio data. However, such content is not the only type of data which may be processed with the scheme according to embodiments of the invention. Trick-play generation and similar applications may be an issue for both video processing and (pure) audio processing.
The device may further be adapted to process a data stream of digital data. Furthermore, the device may comprise a reproduction unit for reproducing the processed data stream. Such a reproduction unit may comprise a loudspeaker or earphones and/or an optical display device so that both audio and visual data can be reproduced in a manner perceivable for a human being.
The device may comprise a generation unit for processing the data stream for reproduction in a trick-play reproduction mode. Such a generation unit may be adjusted or controlled by a user by selecting corresponding options in a user interface, for instance buttons of a device, a keypad or a remote control. The trick-play reproduction mode selected by a user may be one of the group consisting of a fast-forward reproduction mode, a fast-reverse reproduction mode, a slow-forward reproduction mode, a slow-backward reproduction mode, a freeze frame reproduction mode, an instant replay reproduction mode, and a reverse reproduction mode. Other trick-play streams are however possible. For trick-play, only a portion of subsequent data shall be used for output (for instance for visual display and/or for acoustical output) or one and the same content shall be used several times.
The device according to exemplary embodiments of the invention may be adapted to process an MPEG2 data stream. MPEG2 is a designation for a group of audio and video coding standards agreed upon by MPEG (Moving Picture Experts Group), and published as the ISO/IEC 13818 International Standard. For example, MPEG2 is used to encode audio and video broadcast signals including digital satellite and cable TV, but may also be used for DVD.
However, the device according to exemplary embodiments of the invention may also be adapted to process an MPEG4 data stream. More generally, any codec scheme may be implemented which uses anchor frames from which other frames are dependent, particularly any type of encoding using predictive frames and thus any kind of MPEG encoding/decoding.
The device according to embodiments of the invention may be realized as one of the group consisting of a digital video recording device, a network-enabled device, a conditional access system, a portable audio player, a portable video player, a mobile phone, a DVD player, a CD player, a hard disk based media player, an Internet radio device, a public entertainment device, and an MP3 player. However, these applications are only exemplary.
The aspects defined above and further aspects of the invention are apparent from the examples of embodiment to be described hereinafter and are explained with reference to these examples of embodiment.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention will be described in more detail hereinafter with reference to examples of embodiment but to which the invention is not limited.
Fig. 1 illustrates a time-stamped transport stream packet.
Fig. 2 shows an MPEG2 group of picture structure with intra-coded frames and forward predictive frames.
Fig. 3 illustrates an MPEG2 group of picture structure with intra-coded frames, forward predictive frames and bi-directional predictive frames.
Fig. 4 illustrates a structure of a characteristic point information file and stored stream content.
Fig. 5 illustrates a system for trick-play on a plaintext stream.
Fig. 6 illustrates time compression in trick-play.
Fig. 7 illustrates trick-play with fractional distance.
Fig. 8 illustrates low speed trick-play.
Fig. 9 illustrates a general conditional access system structure.
Fig. 10 illustrates a digital video broadcasting encrypted transport stream packet.
Fig. 11 illustrates a transport stream packet header of the digital video broadcasting encrypted transport stream packet of Fig. 10.
Fig. 12 illustrates a system allowing the performance of trick-play on a fully encrypted stream.
Fig. 13 illustrates a full transport stream and a partial transport stream.
Fig. 14 illustrates Entitlement Control Messages for a stream type I and for a stream type II.
Fig. 15 illustrates writing Control Words to a decrypter.
Fig. 16 illustrates Entitlement Control Message handling in a fast forward mode.
Fig. 17 illustrates detection of one or two Control Words.
Fig. 18 illustrates a device for processing a data stream according to an exemplary embodiment.
Fig. 19 illustrates splitting of the packet at a frame boundary.
Fig. 20 illustrates slow-forward construction after decryption of normal play data.
Fig. 21 illustrates a hybrid stream with plaintext packets on each frame boundary.
Fig. 22 illustrates slow-forward construction on a stored hybrid stream.
Fig. 23 illustrates an incomplete picture start code at the concatenation point.
Fig. 24 illustrates the effect of reordering in normal play.
Fig. 25 illustrates the effect of reordering in slow-forward mode.
Fig. 26 illustrates the insertion of empty P-frames before the anchor frames.
Fig. 27 illustrates the use of backward predictive empty B-frames.
Fig. 28 illustrates the use of forward predictive empty B-frames.
Fig. 29 illustrates a temporal reference for normal play.
Fig. 30 illustrates a temporal reference for slow-forward with Bf-frames.
Fig. 31 illustrates a temporal reference for pre-insertion of Bb-frames.
Fig. 32 illustrates a temporal reference for pre-insertion of Pe-frames.
Fig. 33 illustrates a temporal reference for three types of B-frames.
Fig. 34 illustrates a distance D and a slow motion factor L for a normal play and a slow-forward stream.
Fig. 35 illustrates a temporal reference for the I-frame when empty B-frames are used.
Fig. 36 illustrates a temporal reference for the P-frame when empty B-frames are used.
Fig. 37 illustrates a temporal reference for the I-frame when empty P-frames are used.
Fig. 38 illustrates a temporal reference for the P-frame when empty P-frames are used.
Fig. 39 illustrates a temporal reference for empty P-frames.
Fig. 40 illustrates switching from normal play to slow-forward at the start of a GOP.
Fig. 41 illustrates switching at the first frame after the switching command.
Fig. 42 illustrates switching from normal play to slow-forward along a GOP.
Fig. 43 illustrates switching from normal play to slow-forward with Pe-frames.
Fig. 44 illustrates switching from normal play to slow-forward with Bb-frames.
Fig. 45 illustrates switching from slow-forward to normal play at the start of a GOP.
Fig. 46 illustrates switching from slow-forward to normal play along a GOP.
Fig. 47 illustrates switching from slow-forward to normal play with Pe-frames.
Fig. 48 illustrates the splitting of the stream for one PES packet per frame.
Fig. 49 illustrates the splitting of the stream at the start of a PES header.
Fig. 50 illustrates the splitting of the stream at the start of a Picture Start Code.
Fig. 51 illustrates the splitting of the stream within a Picture Start Code.
Fig. 52 illustrates an incomplete picture start code at the concatenation point.
Fig. 53 illustrates an example of n+m=4.
Fig. 54 illustrates an example of n+m>4.
Fig. 55 illustrates an example of n+m<4.
The Figures are schematically drawn and not true to scale, and the identical reference numerals in different Figures refer to corresponding elements. It will be clear for those skilled in the art, that alternative but equivalent embodiments of the invention are possible without deviating from the true inventive concept, and that the scope of the invention will be limited by the claims only.
DETAILED DESCRIPTION OF THE INVENTION
In the following, referring to Fig. 1 to Fig. 13, different aspects of trick-play implementation for transport streams according to exemplary embodiments of the invention will be described.
Particularly, several possibilities to perform trick-play on an MPEG2 encoded stream will be described, which may be partly or totally encrypted, or non-encrypted. The following description will target methods specific to the MPEG2 transport stream format. However, the invention is not restricted to this format. Experiments were actually done with an extension, the so-called time-stamped transport stream. This comprises transport stream packets, all of which are pre-pended with a 4-byte header in which the transport stream packet arrival time is placed. This time may be derived from the value of the program clock reference (PCR) time-base at the time the first byte of the packet is received at the recording device. This is a proper method to store the timing information with the stream, so that playback of the stream becomes a relatively easy process.
One problem during playback is to ensure that the MPEG2 decoder buffer will neither overflow nor underflow. If the input stream was compliant to the decoder buffer model, restoring the relative timing ensures that the output stream is also compliant. Some of the trick-play methods described herein are independent of the time stamp and perform equally well on transport streams with and without time stamps.
Fig. 1 illustrates a time stamped transport stream packet 100 having a total length 104 of 188 Bytes and comprising a time stamp 101 having a length 105 of 4 Bytes, a packet header 102, and a packet payload 103 having a length of 184 Bytes.
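Reading such a time-stamped packet may be sketched as follows. The sketch assumes a standard 188-byte MPEG2 transport stream packet (4-byte packet header plus 184-byte payload) preceded by the 4-byte time stamp, so that one stored record is 192 bytes long; this is one possible reading of the lengths given for Fig. 1. Storing the time stamp as a big-endian unsigned integer is likewise an assumption, and the function name `parse_stamped_packet` is hypothetical. The header bit positions follow the MPEG2 transport stream packet header.

```python
import struct

TIMESTAMP_LEN = 4    # pre-pended arrival time stamp (from the text)
TS_PACKET_LEN = 188  # standard transport stream packet: 4 + 184 bytes
SYNC_BYTE = 0x47     # first byte of every transport stream packet


def parse_stamped_packet(record: bytes) -> dict:
    """Split a time-stamped record into time stamp and header fields."""
    if len(record) != TIMESTAMP_LEN + TS_PACKET_LEN:
        raise ValueError("expected a 192-byte time-stamped packet")
    # Assumption: the arrival time is a big-endian 32-bit unsigned value.
    (arrival_time,) = struct.unpack(">I", record[:TIMESTAMP_LEN])
    ts = record[TIMESTAMP_LEN:]
    if ts[0] != SYNC_BYTE:
        raise ValueError("transport packet does not start with 0x47")
    return {
        "arrival_time": arrival_time,
        # payload_unit_start_indicator bit of the packet header
        "payload_unit_start": bool(ts[1] & 0x40),
        "pid": ((ts[1] & 0x1F) << 8) | ts[2],
        # non-zero when the payload is scrambled (encrypted)
        "scrambling_control": (ts[3] >> 6) & 0x03,
        # 1: payload only, 2: adaptation field only, 3: both
        "adaptation_field_control": (ts[3] >> 4) & 0x03,
        "payload": ts[4:],
    }
```

Only the time stamp and the packet header are inspected, so a sketch of this kind would apply equally to packets whose payload is encrypted.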
The following description gives an overview of the possibilities to create an MPEG/DVB (digital video broadcasting) compliant trick-play stream from a recorded transport stream. It intends to cover the full spectrum of recorded streams, from those that are completely plaintext, so that every bit of data can be manipulated, to streams that are completely encrypted (for instance according to the DVB scheme), so that only transport stream headers and some tables may be accessible for manipulation.
When creating trick-play for an MPEG/DVB transport stream, problems may arise when the content is at least partially encrypted. It may not be possible to descend to the elementary stream level, which is the usual approach, or even access any packetized elementary stream (PES) headers before decryption. This also means that finding picture frames is not possible. Known trick-play engines need to be able to access and process this information.
In the frame of this description, the term "ECM" denotes an Entitlement Control Message. This message may particularly comprise secret provider proprietary information and may, among others, contain encrypted Control Words (CW) needed to decrypt the MPEG stream. Typically, Control Words expire in 10-20 seconds. The ECMs are embedded in packets in the transport stream.
In the frame of this description, the term "keys" particularly denotes data that may be stored in a smart card and may be transferred to the smart card using EMMs, that is so-called "Entitlement Management Messages" that may be embedded in the transport stream. These keys may be used by the smart card to decrypt the Control Words present in the ECM. An exemplary validity period of such a key is one month. In the frame of this description, the term "Control Words" (CW) particularly denotes decryption information needed to decrypt actual content. Control words may be decrypted by the smart card and then stored in a memory of the decryption core.
Some aspects related to trick-play on plaintext streams will now be described.
It is preferable that any MPEG2 streams created are MPEG2 compliant transport streams. This is because the decoder may not only be integrated within a device, but may also be connected via a standard digital interface, such as an IEEE 1394 interface, for example.
Account should also be taken of any problems that may occur when using a video coding technique like MPEG2 that exploits the temporal redundancy of video to achieve high compression ratios. Frames may no longer be decoded independently.
A structure of a plurality of groups of pictures (GOPs) is shown in Fig. 2. Particularly, Fig. 2 shows a stream 200 comprising several MPEG2 GOP structures with a sequence of I-frames 201 and P-frames 202. The GOP size is denoted with reference numeral 203. The GOP size 203 is set to 12 frames, and only I-frames 201 and P-frames 202 are shown here. In MPEG, a GOP structure may be used in which only the first frame is coded independently of other frames. This is the so-called intra-coded or I-frame 201. The predictive frames or P-frames 202 are coded with a unidirectional prediction, meaning that they only rely on the previous I-frame 201 or P-frame 202 as indicated by arrows 204 in Figure 2. Such a GOP structure has typically a size of 12 or 16 frames 201, 202. Another structure 300 of a plurality of GOPs is shown in Fig. 3. Particularly, Fig. 3 shows the MPEG2 GOP structure with a sequence of I-frames 201, P-frames 202 and B-frames 301. The GOP size is again denoted with reference numeral 203.
It is possible to use a GOP structure that also contains bi-directionally predictive frames or B-frames 301, as shown in Fig. 3. A GOP size 203 of 12 frames is chosen for the example. The B-frames 301 are coded with a bi-directional prediction, meaning that they rely on a previous and a next I- or P-frame 201, 202 as indicated for some B-frames 301 by curved arrows 204. The transmission order of the compressed frames may not be the same as the order in which they are displayed.
To decode a B-frame 301, both reference frames before and after the B-frame 301 (in display order) are needed. To minimize the buffer demand in a decoder, the compressed frames may be reordered. So in transmission, the reference frames may come first. The reordered stream, as it is transmitted, is also shown in Fig. 3, lower part. The reordering is indicated by straight arrows 302. A stream containing B-frames 301 can give a nice looking trick-play picture if all the B-frames 301 are skipped. For the present example, this leads to a trick-play speed of 3x forward.
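The 3x figure follows from simple counting, which the following minimal Python sketch illustrates. The GOP pattern with two B-frames between successive anchor frames and the function names are assumptions made for illustration; the text only states a GOP size of 12 frames and the resulting speed.

```python
from fractions import Fraction

def skip_b_frames(gop_display_order: str) -> str:
    """Keep only the I- and P-frames of a GOP written as a string like 'IBBP...'."""
    return "".join(frame for frame in gop_display_order if frame in "IP")

def speed_after_skipping(gop_display_order: str) -> Fraction:
    """Speed factor when the kept frames are displayed at the normal frame rate."""
    kept = skip_b_frames(gop_display_order)
    return Fraction(len(gop_display_order), len(kept))

# Assumed 12-frame GOP in display order, two B-frames between anchor frames.
gop = "IBBPBBPBBPBB"
print(skip_b_frames(gop))         # IPPP
print(speed_after_skipping(gop))  # 3
```

Since only 4 of the 12 frames remain, 12 frames of original display time are shown in 4 frame periods, which gives the factor of 3.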
Even if an MPEG2 stream is not encrypted (that is to say plaintext), trick-play is not trivial. The possibility of a slow-reverse based on I-frames only is briefly mentioned. An efficient frame-based slow-reverse is more difficult though, due to the necessary inversion of the MPEG2 GOP. Slow-forward, which is also known as slow motion forward, is a mode in which the display picture runs at a lower than normal speed. A rudimentary form of slow-forward is already possible with the technique making use of a fast-forward algorithm that generates trick-play GOPs. Setting the fast-forward speed to a value between zero and one results in a slow-forward stream based on a repetition of fast-forward trick-play GOPs. For a plaintext stream this is no problem, but for an encrypted stream it can lead to the erroneous decryption of part of the I-frame in certain specific conditions. There are several options to solve this problem, but the most suitable way is not to repeat the fast-forward trick-play GOP but to extend the size of the trick-play GOP by the addition of empty P-frames. This technique in fact also enables slow-reverse, because it is based on the trick-play GOPs used for fast-forward/reverse and therefore on the independently decodable I-frames. However, it is not preferred to make use of this kind of I-frame based slow-forward or slow-reverse for the following reason. The distance between I-frames in normal play is around half a second, and for slow-forward/reverse it is multiplied by the slow motion factor. So this type of slow-forward or slow-reverse is not really the slow motion consumers are used to; it is more like a slide show with a large temporal distance between the successive pictures.
In another trick-play mode, called still picture mode, the display picture is halted. This can be achieved by adding empty P-frames to the I-frame for the duration of the still picture mode. This means that the picture resulting from the last I-frame is halted. When switching to still picture from normal play, this can also be the nearest I-frame according to the data in the CPI file. This technique is an extension of the fast-forward/reverse modes and results in nice still pictures, especially if interlace kill is used. However, the positional accuracy is often not sufficient when switching from normal play or slow-forward/reverse to still picture.
The still picture mode can be extended to implement a step mode. The step command advances the stream to some next or previous I-frame. The step size is at minimum one GOP but can also be set to a higher value equal to an integer number of GOPs. Step forward and step backward are both possible in this case because only I-frames are used. The slow-forward can also be based on a repetition of every frame, which results in a much smoother slow motion. The best form of slow-forward would in fact be a repetition of fields instead of frames because the temporal resolution is doubled and there are no interlace artifacts. This is however practically impossible for the intrinsically frame based MPEG2 streams and even more so if they are largely encrypted. The interlace artifacts can be significantly reduced for the I- and P-frames by using special empty frames to force the repetition. Such an interlace reduction technique is not available for the B-frames though. Whether the use of interlace kill for the I- and P-frames is still advantageous in this case or in fact leads to a more annoying picture for the viewer can only be verified by experiments.
Slow-reverse on the basis of individual frames is in fact very complicated for MPEG signals due to the temporal predictions. A complete GOP has to be buffered and reversed. There is no simple method that we know of to recode the frames in a GOP to the reverse order. So an almost complete decoding and encoding might be necessary with an inversion of the frame order between these two. This asks for the buffering of a complete decoded GOP as well as an MPEG decoder and encoder.
Still picture mode can be defined as an extension of the frame-based slow-forward mode. It is based on a repeated display of the current frame for the duration of the still picture mode whatever the type of this frame is. This is in fact a slow-forward with an infinite slow motion factor if this indicates the factor with which the normal play stream is slowed down. No interlace kill is possible if the picture is halted on a B-frame. In that sense this still picture mode is worse than the trick-play GOP based still picture mode. This can be corrected by only halting the picture at an I- or P-frame at the cost of a somewhat less accurate still picture position. Discontinuities in the temporal reference and the PTS can also be avoided in this case. Moreover, the bit rate is significantly reduced because the repetition of an I- or P-frame is forced by the insertion of empty frames instead of a repetition of the frame data itself as is necessary for the B-frames. So, technically speaking, the halting of a picture at an I- or P-frame is the best choice.
The still picture mode can also be extended with a step mode. The step command advances the stream in principle to the next frame. Larger step sizes are possible by stepping to the next P-frame or some next I-frame. A step backward on frame basis is not possible. The only option is to step backward to one of the previous I-frames. Two types of still picture mode have been mentioned, namely trick-play GOP based and frame based. The first one is most logically connected to fast-forward/reverse whereas the second one is related to slow-forward. When switching from some mode to still picture, it is preferable to choose the related still picture mode to minimize the switching delay. The streams resulting from both methods look very alike because they are both based on the insertion of empty frames to force the repetition of an anchor frame. But on detailed stream construction level there are some differences.
In the following, some aspects related to a CPI ("characteristic point information") file will be described. Finding I-frames in a stream usually requires parsing the stream to find the frame headers. Locating the positions where the I-frames start can be done while the recording is being made, or off-line after the recording is completed, or semi on-line, in fact being off-line but with a small delay with respect to the moment of recording. The I-frame end can be found by detecting the start of the next P-frame or B-frame. The meta-data derived this way can be stored in a separate but coupled file that may be denoted as characteristic point information file or CPI file. This file may contain pointers to the start and optionally the end of each I-frame in the transport stream file. Each individual recording may have its own CPI file. The structure of a characteristic point information file 400 is visualized in Fig. 4.
Apart from the CPI file 400, stored information 401 is shown. The CPI file 400 may also contain some other data that are not discussed here.
With the data from the CPI file 400 it is possible to jump to the start of any I-frame 201 in the stream. If the CPI file 400 also contains the end of the I-frames 201, the amount of data to read from the transport stream file is exactly known to get a complete I-frame 201. If for some reason the I-frame end is not known, the entire GOP or at least a large part of the GOP data is to be read to be sure that the entire I-frame 201 is read. The end of the GOP is given by the start of the next I-frame 201. It is known from measurements that the amount of I-frame data can be 40% or more of the total GOP data.
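The read strategy just described can be sketched in Python as follows. The record layout `CpiEntry` (byte offsets of the I-frame start and, optionally, its end) and the function name `read_i_frame` are hypothetical; the actual CPI file format is not specified here.

```python
import io
from dataclasses import dataclass
from typing import BinaryIO, Optional

@dataclass
class CpiEntry:
    """Hypothetical CPI record: byte offsets of one I-frame in the stream file."""
    start: int
    end: Optional[int] = None  # None when the I-frame end was not recorded

def read_i_frame(stream: BinaryIO, cpi: list[CpiEntry], index: int) -> bytes:
    """Read I-frame `index`; without a known end, read the whole GOP instead."""
    entry = cpi[index]
    if entry.end is not None:
        stop = entry.end               # exact amount of I-frame data is known
    elif index + 1 < len(cpi):
        stop = cpi[index + 1].start    # GOP end = start of the next I-frame
    else:
        stop = None                    # last GOP: read up to the end of the file
    stream.seek(entry.start)
    return stream.read() if stop is None else stream.read(stop - entry.start)

# Toy "stream": 'I' bytes stand for I-frame data, 'p' bytes for the rest of a GOP.
recording = io.BytesIO(b"IIIIppppIIpppp")
cpi_file = [CpiEntry(start=0, end=4), CpiEntry(start=8)]
print(read_i_frame(recording, cpi_file, 0))  # b'IIII'
print(read_i_frame(recording, cpi_file, 1))  # b'IIpppp'
```

In the second call the end of the I-frame is unknown, so more data than strictly needed is read, which is the bandwidth cost mentioned in the text.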
It is known that reducing the trick-play picture refresh rate can be achieved by displaying each I-frame 201 several times. The bit rate will be reduced accordingly. This may be achieved by adding so-called empty P-frames 202 between the I-frames 201. Such an empty P-frame 202 is not really empty but may contain data instructing the decoder to repeat the previous frame. This has a limited bit cost, which can in many cases be neglected compared to an I-frame 201. From experiments it is known that trick-play GOP structures like IPP or IPPP may be acceptable for the trick-play picture quality and even advantageous at high trick-play speeds. The resulting trick-play bit rate is of the same order as the normal play bit rate. It is also mentioned that these structures may reduce the required sustained bandwidth from the storage device.
Here some aspects related to timing issues and stream construction will be described.
A trick-play system 500 is schematically depicted in Fig. 5. The trick-play system 500 comprises a recording unit 501, an I-frame selection unit 502, a trick-play generation block 503 and an MPEG2 decoder 504. The trick-play generation block 503 includes a parsing unit 505, an adding unit 506, a packetizer unit 507, a table memory unit 508 and a multiplexer 509. The recording unit 501 provides the I-frame selection unit 502 with plaintext MPEG2 data 510. The multiplexer 509 provides the MPEG2 decoder 504 with an MPEG2 DVB compliant transport stream 511.
The I-frame selector 502 reads specific I-frames 201 from the storage device 501. Which I-frames 201 are chosen depends on the trick-play speed as will be described below. The retrieved I-frames 201 are used to construct an MPEG-2/DVB compliant trick-play stream that is then sent to the MPEG-2 decoder 504 for decoding and rendering.
The position of the I-frame packets in the trick-play stream cannot be coupled to the relative timing of the original transport stream. In trick-play, the time axis may be compressed or expanded with the speed factor and additionally inversed for reverse trick-play. Therefore, the time stamps of the original time stamped transport stream may not be suitable for trick-play generation.
Moreover, the original PCR time base may be disturbing for trick-play. First of all, it is not guaranteed that a PCR will be available within the selected I-frame 201. But even more important is that the frequency of the PCR time base would be changed. According to the MPEG2 specification, this frequency should be within 30 ppm from 27 MHz. The original PCR time base fulfils this requirement, but if used for trick-play it would be multiplied by the trick-play speed factor. For reverse trick-play this even leads to a time base running in the wrong direction. Therefore, the old PCR time base has to be removed and a new one added to the trick-play stream.
Finally, I-frames 201 normally contain two time stamps that tell the decoder 504 when to start decoding the frame (decoding time stamp, DTS) and when to start presenting, for instance displaying, it (presentation time stamp, PTS). Decoding and presentation may be started when DTS and PTS, respectively, are equal to the PCR time base, which is reconstructed in the decoder 504 by means of the PCRs in the stream. The distance between, e.g., the PTS values of two I-frames 201 corresponds to their nominal distance in display time. In trick-play this time distance is compressed or expanded with the speed factor. Since a new PCR time base is used in trick-play, and because the distance for DTS and PTS is no longer correct, the original DTS and PTS of the I-frame 201 have to be replaced.
To solve the above-mentioned complications, the I-frame 201 may first be parsed into an elementary stream in the parsing unit 505. Then the empty P-frames 202 are added on elementary stream level. The obtained trick-play GOP is mapped into one PES packet and packetized to transport stream packets. Then corrected tables like PAT, PMT, etc. are added. At this stage, a new PCR time base together with DTS and PTS are included. The transport stream packets are pre-pended with a 4-byte time stamp that is coupled to the PCR time base such that the trick-play stream can be handled by the same output circuitry as used for normal play.
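The reason why the original PCR time base cannot be reused may be illustrated numerically. The following minimal Python sketch uses only the 27 MHz nominal frequency and the 30 ppm tolerance stated above; the function names are assumptions introduced for illustration.

```python
PCR_NOMINAL_HZ = 27_000_000  # nominal frequency of the PCR time base
PCR_TOLERANCE_PPM = 30       # allowed deviation as stated in the text

def pcr_frequency_ok(freq_hz: float) -> bool:
    """True if the frequency lies within 30 ppm (810 Hz) of 27 MHz."""
    limit_hz = PCR_NOMINAL_HZ * PCR_TOLERANCE_PPM / 1_000_000
    return abs(freq_hz - PCR_NOMINAL_HZ) <= limit_hz

def apparent_pcr_frequency(speed_factor: float) -> float:
    """Frequency a decoder would see if the original PCRs were kept in trick-play."""
    return PCR_NOMINAL_HZ * speed_factor

for speed in (1, 4, -4):
    freq = apparent_pcr_frequency(speed)
    print(speed, freq, pcr_frequency_ok(freq))
# 1 27000000 True
# 4 108000000 False
# -4 -108000000 False
```

The negative value for reverse trick-play corresponds to a time base running in the wrong direction, so in both cases a newly generated time base is needed.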
In the following, some aspects related to trick-play speeds will be described. In this context, firstly, fixed trick-play speeds will be discussed.
As mentioned before, a trick-play GOP structure like IPP may be used in which the I-frame 201 is followed by two empty P-frames 202. It is assumed that the original GOP has a GOP size 203 of 12 frames and that all the original I-frames 201 are used for trick-play. This means that the I-frames 201 in the normal play stream have a distance of 12 frames and the same I-frames 201 in the trick-play stream a distance of 3 frames. This leads to a trick-play speed of 12/3 = 4x. If the original GOP size 203 in frames is denoted by G, the trick-play GOP size in frames by T and the trick-play speed factor by Nb, the trick-play speed in general is given by:
Nb=G/T (1)
Nb will also be denoted as the basic speed. Higher speeds can be realized by skipping I-frames 201 from the original stream. If every second I-frame 201 is taken, the trick-play speed is doubled, if every third I-frame 201 is taken, the trick-play speed is tripled and so on. In other words, the distance between the used I-frames 201 of the original stream is 2, 3 and so on. This distance may always be an integer number. If the distance between the I-frames 201 used for trick-play generation is denoted by D (D=1 meaning that every I-frame 201 is used), then the general trick-play speed factor N is given by:
N=D*G/T (2)
This means that all integer multiples of the basic speed can be realized, leading to an acceptable set of speeds. It should be noticed that D is negative for reverse trick-play and that D=0 results in a still picture. Data can only be read in a forward direction. Therefore, in reverse trick-play, data is read forward and jumps are made backwards to retrieve the preceding I-frame 201 given by D. It should also be noticed that a larger trick-play GOP size T results in a lower basic speed. For instance, IPPP leads to a finer grained set of speeds than IPP. Referring to Fig. 6, time compression in trick-play will be explained.
Fig. 6 shows the situation for T=3 (IPP) and G=12. For D=2, an original display time of 24 frames is compressed into a trick-play display time of 3 frames resulting in N=8. In the given example, the basic speed is an integer but this is not necessarily the case. For G=16 and T=3, the basic speed is 16/3 = 5 1/3 which does not result in a set of integer trick-play speeds. Therefore, the IPPP structure (T=4) is better suited for a GOP size of 16 resulting in a basic speed of 4x. If a single trick-play structure is desired that fits to the most common GOP sizes of 12 and 16, IPPP may be chosen.
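As a worked illustration of equations (1) and (2), the following minimal Python sketch evaluates the speed factor for the GOP sizes discussed above. Exact fractions are used so that non-integer basic speeds remain visible; the function name `trick_play_speed` is an assumption introduced here.

```python
from fractions import Fraction

def trick_play_speed(G: int, T: int, D: int = 1) -> Fraction:
    """Equation (2): N = D*G/T; with D=1 this is the basic speed Nb of equation (1)."""
    return Fraction(D * G, T)

print(trick_play_speed(G=12, T=3))        # 4     basic speed for IPP, GOP size 12
print(trick_play_speed(G=12, T=3, D=2))   # 8     every second I-frame
print(trick_play_speed(G=16, T=3))        # 16/3  not an integer
print(trick_play_speed(G=16, T=4))        # 4     IPPP, GOP size 16
print(trick_play_speed(G=12, T=4))        # 3     IPPP, GOP size 12
print(trick_play_speed(G=12, T=3, D=-1))  # -4    reverse trick-play
print(trick_play_speed(G=12, T=3, D=0))   # 0     still picture
```

The output shows why IPPP (T=4) fits both common GOP sizes: it yields integer basic speeds of 3x and 4x for G=12 and G=16, respectively.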
Secondly, arbitrary trick-play speeds will be discussed.
In some cases, the set of trick-play speeds resulting from the method described above is satisfying, in some cases not. In the case of G=16 and T=3 one probably still would prefer integer trick-play speed factors. Even in the case of G=12 and T=4 it might be preferred to have a speed not available in the set like for instance 7x. Now, the trick-play speed formula will be inverted and the distance D will be calculated which is given by:
D=N*T/G (3)
Using the above example with G=12, T=4 and N=7 results in D=2 1/3. Instead of skipping a fixed number of I-frames 201, an adaptive skipping algorithm might be used that chooses the next I-frame 201 based on which I-frame 201 best matches the required speed. To choose the best matching I-frame 201, the next ideal point Ip with the distance D may be calculated and one of the I-frames 201 closest to this ideal point may be chosen to construct a trick-play GOP. In the following step, again the next ideal point may be calculated by increasing the last ideal point by D.
As visualized in Fig. 7 illustrating trick-play with fractional distances, there are particularly three possibilities to choose the I-frame 201 :
A. The I-frame closest to the ideal point; I = round(Ip)
B. The last I-frame before the ideal point; I = int(Ip)
C. The first I-frame after the ideal point; I = int(Ip)+1
As can clearly be seen, the actual distance is varying between int(D) and int(D)+1, the ratio between the occurrences of the two being dependent on the fraction of D, such that the average distance is equal to D. This means that the average trick-play speed is equal to N, but that the actually used frame has a small jitter with respect to the ideal frame. Several experiments have been performed with this, and although the trick-play speed may vary locally, this is not visually disturbing. Usually, it is not even noticeable especially at somewhat higher trick-play speeds. It is also clear from Fig. 7 that it makes no essential difference whether to choose method A, B or C.
With this method, trick-play speed N does not need to be an integer but can be any number above the basic speed Nb. Also speeds below this minimum can be chosen, but then the picture refresh rate may be lowered locally because the effective trick-play GOP size T is doubled or at still lower speeds even tripled or more. This is due to a repetition of the trick-play GOPs, as the algorithm will choose the same I-frame 201 more than once.
Fig. 8 shows an example for D=2/3 which is equivalent to N=2/3 Nb. Here, the round function is used to select the I-frames 201 and as can be seen frames 2 and 4 are selected twice.
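The adaptive selection of I-frames may be sketched as follows in Python. This is a minimal illustration for forward trick-play only (D not negative), not a definitive implementation: the function name, the `start` parameter and the convention that halves are rounded upwards are assumptions, and exact fractions are used to avoid floating-point drift when the ideal point is repeatedly increased by D.

```python
from fractions import Fraction
from math import floor

def select_i_frames(D, count, start=0, method="round"):
    """Return `count` I-frame indices for an ideal step of D I-frames (D >= 0)."""
    step = Fraction(D)
    ideal = Fraction(start)  # current ideal point Ip
    chosen = []
    for _ in range(count):
        if method == "round":     # A: I-frame closest to the ideal point
            index = floor(ideal + Fraction(1, 2))
        elif method == "before":  # B: int(Ip)
            index = floor(ideal)
        elif method == "after":   # C: int(Ip) + 1
            index = floor(ideal) + 1
        else:
            raise ValueError(f"unknown method: {method}")
        chosen.append(index)
        ideal += step             # next ideal point
    return chosen

# D = 2 1/3 (G=12, T=4, N=7): the actual distance alternates between 2 and 3.
print(select_i_frames(Fraction(7, 3), 7))            # [0, 2, 5, 7, 9, 12, 14]
print(select_i_frames(Fraction(7, 3), 7, method="before"))  # [0, 2, 4, 7, 9, 11, 14]

# D = 2/3, starting at I-frame 1: frames 2 and 4 are selected twice.
print(select_i_frames(Fraction(2, 3), 7, start=1))   # [1, 2, 2, 3, 4, 4, 5]
```

In the first example, six steps cover 14 I-frames, so the average distance equals 7/3 although each individual step is either 2 or 3.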
Anyway, the described method will allow for a continuously variable trick-play speed. For reverse trick-play a negative value is chosen for N. For the example of Fig. 7 this simply means that the arrows 700 are pointing in the other direction. The method described will also include the sets of fixed trick-play speeds mentioned earlier and they will have the same quality, especially if the round function is used. Therefore, it might be appropriate that the flexible method described in this section should always be implemented whatever the choice of the speeds will be.
Now some aspects related to the refresh rate of the trick-play picture will be discussed.
The term "refresh rate" particularly denotes the frequency with which new pictures are displayed. Although not speed dependent, it will be briefly discussed here because it can influence the choice of T. If the refresh rate of the original picture is denoted by R (25Hz or 30Hz), the refresh rate of the trick-play picture (Rt) is given by:
Rt=R/T (4)
With a trick-play GOP structure of IPP (T=3) or IPPP (T=4), the refresh rate Rt is 8 1/3 Hz respectively 6 1/4 Hz for Europe and 10 Hz respectively 7 1/2 Hz for the USA. Although the judgment of trick-play picture quality is a somewhat subjective matter, there are clear hints from experiments that these refresh rates are acceptable for low speeds and even advantageous at higher speeds.
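The refresh rates quoted above follow directly from equation (4), as the following minimal Python sketch shows. The function name is an assumption introduced for illustration.

```python
from fractions import Fraction

def trick_play_refresh_rate(R: int, T: int) -> Fraction:
    """Equation (4): Rt = R/T, with R in Hz and T the trick-play GOP size."""
    return Fraction(R, T)

for R in (25, 30):      # original refresh rate in Hz
    for T in (3, 4):    # IPP and IPPP
        print(R, T, trick_play_refresh_rate(R, T))
# 25 3 25/3   (8 1/3 Hz)
# 25 4 25/4   (6 1/4 Hz)
# 30 3 10
# 30 4 15/2   (7 1/2 Hz)
```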
In the following, some aspects related to encrypted stream environments will be described.
Here some information about encrypted transport streams is presented as a basis for the description of trick-play on encrypted streams. It is focussed on the Conditional Access System used for broadcast.
Fig. 9 illustrates a conditional access system 900 which will now be described. In the conditional access system 900, content 901 may be provided to a content encryption unit 902. After having encrypted the content 901, the content encryption unit 902 supplies a content decryption unit 904 with encrypted content 903. As stated above, ECM denotes Entitlement Control Message. Furthermore, KMM denotes Key Management Message, GKM denotes Group Key Message and EMM denotes Entitlement Management Message. A Control Word 906 may be supplied to the content encryption unit 902 and to an ECM generation unit 907. The ECM generation unit 907 generates an ECM and provides the same to an ECM decoding unit 908 of a smart card 905. The ECM decoding unit 908 generates from the ECM a Control Word, that is, the decryption information that is needed by and provided to the content decryption unit 904 to decrypt the encrypted content 903.
Furthermore, an authorization key 910 is provided to the ECM generation unit 907 and to a KMM generation unit 911, wherein the latter generates a KMM and provides the same to a KMM decoding unit 912 of the smart card 905. The KMM decoding unit 912 provides an output signal to the ECM decoding unit 908. Moreover, a group key 914 may be provided to the KMM generation unit 911 and to a GKM generation unit 915 which may further be provided with a user key 918. The GKM generation unit 915 generates a GKM signal and provides the same to a GKM decoding unit 916 of the smart card 905, wherein the GKM decoding unit 916 gets as a further input a user key 917. Beyond this, entitlements 919 may be provided to an EMM generation unit 920 that generates an EMM signal and provides the same to an EMM decoding unit 921. The EMM decoding unit 921 located in the smart card 905 is coupled with an entitlement list unit 913 which provides the ECM decoding unit 908 with corresponding control information. In many cases, content providers and service providers want to control access to certain content items through a conditional access (CA) system.
To achieve this, the broadcasted content 901 is encrypted under the control of the CA system 900. In the receiver, content is decrypted before decoding and rendering if access is granted by the CA system 900.
The CA system 900 uses a layered hierarchy (see Fig. 9). The CA system 900 transfers the content decryption key (Control Word CW 906, 909) from server to client in the form of an encrypted message, called an ECM. ECMs are encrypted using an authorization key (AK) 910. For security reasons, the CA server 900 may renew the authorization key 910 by issuing a KMM. A KMM is in fact a special type of EMM, but for clarity the term KMM may be used. KMMs are also encrypted using a key that for instance can be a group key (GK) 914, which is renewed by sending a GKM that is again a special type of EMM. GKMs are then encrypted with the user key (UK) 917, 918, which is a fixed unique key embedded in the smart card 905 and known by the CA system 900 of the provider only. Authorization keys and group keys are stored in the smart card 905 of the receiver.
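The layering of keys may be illustrated with a toy model in Python. This is only a sketch of the hierarchy described above: the XOR operation is a stand-in for the provider-specific encryption, which is not disclosed here, and all key values and function names are invented for illustration. The entitlement check performed by the smart card is omitted.

```python
def toy_cipher(data: bytes, key: bytes) -> bytes:
    """Placeholder XOR cipher (its own inverse); NOT a real CA algorithm."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

# Provider side: each key is sent encrypted under the key one layer above it.
user_key = b"UK-in-smartcard"        # fixed key, already present in the smart card
group_key = b"group-key-0001"
authorization_key = b"auth-key-0001"
control_word = b"CW-00001"

gkm = toy_cipher(group_key, user_key)               # GKM: GK encrypted with UK
kmm = toy_cipher(authorization_key, group_key)      # KMM: AK encrypted with GK
ecm = toy_cipher(control_word, authorization_key)   # ECM: CW encrypted with AK

def smart_card_recover_cw(ecm: bytes, kmm: bytes, gkm: bytes, uk: bytes) -> bytes:
    """Receiver side: unwrap the layers in the order UK -> GK -> AK -> CW."""
    gk = toy_cipher(gkm, uk)
    ak = toy_cipher(kmm, gk)
    return toy_cipher(ecm, ak)

print(smart_card_recover_cw(ecm, kmm, gkm, user_key))  # b'CW-00001'
```

The point of the layering is that renewing the authorization key or the group key only requires sending a new KMM or GKM, while the user key never has to be transmitted.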
Entitlements 919 (for instance viewing rights) are sent to individual customers in the form of an EMM and stored locally in a secure device (smart card 905). Entitlements 919 are coupled to a specific program. An entitlements list 913 gives access to a group of programs depending on the type of subscription. ECMs are only processed into keys (Control Words) by the smart card 905 if an entitlement 919 is available for the specific program.
Entitlement EMMs are subject to an identical layered structure as the KMMs (not depicted in Fig. 9).
In an MPEG2 system, encrypted content, ECMs and EMMs (including the KMM and GKM types) are all multiplexed into a single MPEG2 transport stream. The description above is a generalized view of the CA system 900. In digital video broadcasting, only the encryption algorithm, the odd/even Control Word structure, the global structure of ECMs and EMMs and their referencing are defined. The detailed structure of the CA system 900 and the way the payloads of ECMs and EMMs are encoded and used are provider specific. Also the smart card is provider specific. However, from experience it is known that many providers follow essentially the structure of the generalized view of Fig. 9.
In the following, DVB Encryption/Decryption topics will be discussed.
The applied encryption and decryption algorithm is defined by the DVB standardization organization. In principle two encryption possibilities are defined namely PES level encryption and TS level encryption. However, in real life mainly the TS level encryption method is used. Encryption and decryption of the transport stream packets is done packet based. This means that the encryption and decryption algorithm is restarted every time a new transport stream packet is received. Therefore, packets can be encrypted or decrypted individually. In the transport stream, encrypted and plaintext packets are mixed because some stream parts are encrypted (e.g. audio/video) and others are not (e.g. tables). Even within one stream part (e.g. video) encrypted and plaintext packets may be mixed.
Referring to Fig. 10, a DVB encrypted transport stream packet 1000 will be described.
The stream packet 1000 has a length 1001 of 188 Bytes and comprises three portions. A packet header 1002 has a size 1003 of 4 Bytes. Subsequent to the packet header 1002, an adaptation field 1004 may be included in the stream packet 1000. After that, a DVB encrypted packet payload 1005 may be sent.
Fig. 11 illustrates a detailed structure of the transport stream packet header 1002 of Fig. 10. The transport stream packet header 1002 comprises a synchronization unit (SYNC) 1010, a transport error indicator (TEI) 1011 which may indicate transport errors in a packet, a payload unit start indicator (PLUSI) 1012 which may particularly indicate a possible start of a PES packet in the subsequent payload 1005, a transport priority unit (TPI) 1017 indicating priority of the transport, a packet identifier (PID) 1013 used for determining the assignment of the packet, a transport scrambling control (SCB) 1014 which is used to select the CW that is needed for decrypting the transport stream packet, an adaptation field control (AFLD) 1015, and a continuity counter (CC) 1016.
Thus, Fig. 10 and Fig. 11 show the MPEG2 transport stream packet 1000 that has been encrypted and which comprises different parts:
- Packet header 1002 is in plaintext. It serves to obtain important information such as a packet identifier (PID) number, presence of an adaptation field, scrambling control bits, etc.
- Adaptation field 1004 is also in plaintext. It can contain important timing information such as the PCR.
- DVB Encrypted Packet Payload 1005 contains the actual program content that may have been encrypted using the DVB algorithm.
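Because the packet header is in plaintext, its fields can be read without any key. The following Python sketch parses the 4-byte header using the standard MPEG2 transport stream bit layout; the class and function names are assumptions introduced for illustration, and the table of scrambling control values reflects the usual DVB convention.

```python
from dataclasses import dataclass

@dataclass
class TsHeader:
    tei: bool                      # transport error indicator
    pusi: bool                     # payload unit start indicator
    priority: bool                 # transport priority
    pid: int                       # 13-bit packet identifier
    scrambling_control: int        # 2-bit SCB field
    adaptation_field_control: int  # 2-bit field
    continuity_counter: int        # 4-bit counter

def parse_ts_header(packet: bytes) -> TsHeader:
    """Parse the plaintext 4-byte header of a transport stream packet."""
    if len(packet) < 4 or packet[0] != 0x47:
        raise ValueError("sync byte 0x47 not found")
    b1, b2, b3 = packet[1], packet[2], packet[3]
    return TsHeader(
        tei=bool(b1 & 0x80),
        pusi=bool(b1 & 0x40),
        priority=bool(b1 & 0x20),
        pid=((b1 & 0x1F) << 8) | b2,
        scrambling_control=(b3 >> 6) & 0x03,
        adaptation_field_control=(b3 >> 4) & 0x03,
        continuity_counter=b3 & 0x0F,
    )

# Usual DVB interpretation of the scrambling control bits.
SCB_MEANING = {0: "plaintext", 1: "reserved",
               2: "encrypted, even CW", 3: "encrypted, odd CW"}

header = parse_ts_header(bytes([0x47, 0x41, 0x00, 0x9A]) + bytes(184))
print(hex(header.pid), header.pusi, SCB_MEANING[header.scrambling_control])
# 0x100 True encrypted, even CW
```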
In order to select the correct CW that is needed to decrypt the broadcasted program it is necessary to parse the transport stream packet header. A schematic overview of this header is given in Fig. 11. An important field for the decryption of the broadcasted program is the scrambling control bits (SCB) field 1014. This SCB field 1014 indicates which CW the decrypter must use to decrypt the broadcasted program. Moreover, it indicates whether the payload of the packet is encrypted or in plaintext. For every new transport stream packet, this SCB 1014 must be parsed since it changes over time and can change from packet to packet.
In the following, some aspects related to trick-play on fully encrypted streams will be described.
The first reason why this is an interesting topic is that trick-play on plaintext and fully encrypted streams are the two extremes of a range of possibilities. Another reason is that there exist applications in which it may be necessary to record fully encrypted streams. Thus, it would be useful to have a technique at hand to perform trick-play on a fully encrypted stream. A basic principle is to read a large enough block of data from the storage device, decrypt it, select an I-frame in the block and construct a trick-play stream with it. Such a system 1200 is depicted in Fig. 12. Fig. 12 shows the basic principle of trick-play on a fully encrypted stream. For this purpose, data stored on a hard disk 1201 are provided as a transport stream 1202 to a decrypter 1203. Further, the hard disk 1201 provides a smart card 1204 with an ECM, wherein the smart card 1204 generates Control Words from this ECM and sends the same to the decrypter 1203.
Using the Control Words, the decrypter 1203 decrypts the encrypted transport stream 1202 and sends the decrypted data to an I-frame detector and filter 1205. From there, the data are provided to an insert empty P frame unit 1206 which conveys the data to a set top box 1207. From there, data are provided to a television 1208.
Some aspects will be mentioned with respect to the question of what a recording contains. When making a recording of a single channel, the recording must contain all the data required to play back the recording of the channel at a later stage. One can resort to recording everything on a certain transponder, but this way one would record far more than is needed to play back the intended program. This means that both bandwidth and storage space would be wasted. So instead of this, only the packets really needed should be recorded. For each program this means one must record all the MPEG2 mandatory packets like PAT (program association table), CAT (conditional access table), and obviously for each program the video and audio packets as well as the PMT (program map table) that describes which packets belong to a program. Furthermore, the CAT/PMT may describe CA packets (ECMs) needed for decryption of the stream. Unless the recording is made in plaintext after decryption, those ECM packets have to be recorded as well.
If the recording made does not consist of all packets from the full multiplex, the recording becomes a so-called partial transport stream 1300 (see Fig. 13). Further, Fig. 13 illustrates a full transport stream 1301. The DVB standard requires that if a partial transport stream 1300 is played, all normal DVB mandatory tables like NIT (network information table), BAT (bouquet association table) etc. are removed. Instead of these tables, the partial stream should have SIT (selection information table) and DIT (discontinuity information table) tables inserted. In the following, some aspects related to dealing with ECMs will be described.
Jumping to the next block during trick-play can mean jumping back in the stream. It will be explained that this may not be only the case for trick-play reverse but also for trick-play forward at moderate speeds. The situation for forward trick-play with forward jumps and for reverse trick-play with inherently backward jumps will be explained afterwards.
Specific problems may occur caused by the fact that data has to be decrypted. A conditional access system may be designed for transmission. In normal play, the transmitted stream may be reconstructed with original timings. But trick-play may have severe implications for the handling of cryptographic metadata due to changed timings. The data may be compressed or expanded in time due to trick-play, but the latency of the smart card may remain constant.
To create a trick-play stream, the mentioned data blocks may go through a decrypter. This decrypter needs the Control Words used in the encryption process to decrypt the data blocks. These Control Words may also be encrypted and stored in ECMs. In a normal set-top-box (STB), these ECMs may be part of the program tuned to. A conditional access module may extract the ECMs, send them to a smart card, and, if the card has rights or an authorization to decrypt these ECMs, may receive the decrypted Control Words from it. Control Words usually have a relatively short lifetime of, for instance, approximately 10 seconds. This lifetime may be indicated by the Scrambling Control Bit, SCB 1014, in the transport stream packet headers. If it changes, the next Control Word has to be used. This SCB change or toggle is indicated in Fig. 14 by a vertical line and with a reference numeral 1402.
Referring to Fig. 14, particularly two different scenarios or stream types may be distinguished: According to a stream type I shown in a lower row 1401 in Fig. 14, two Control Words (CWs) are provided per ECM.
According to a stream type II shown in an upper row 1400 in Fig. 14, only one Control Word (CW) is provided per ECM. Fig. 14 illustrates the two data streams 1400, 1401 comprising subsequently arranged periods or segments A, B, C denoted with reference numeral 1403. In the scenario illustrated in the upper row 1400 of Fig. 14, essentially one Control Word per corresponding ECM is provided. In contrast to this, in the lower row 1401, each ECM comprises two Control Words, namely the Control Word relating to the current period or ECM, and additionally the Control Word of the subsequent period or ECM. Thus, there is some redundancy concerning the provision of the Control Words.
During the short lifespan, items of the decryption information may be transmitted several times, so that tuning to such a channel halfway through the lifespan of such a Control Word does not mean waiting for the next Control Word. The conditional access module may only send the first unique ECM it finds to the smart card to reduce or minimize the traffic to the card, as it may have a fairly slow processor.
This shows that there may be a limitation of trick-play on encrypted streams. There may be an implicit upper speed limit, coming from the limited speed of the processing capability of the smart card. In trick-play, the Control Word lifetime of 10 seconds may be compressed or expanded with the trick-play speed factor. Sending an ECM to a smart card and receiving the decrypted Control Words may take approximately half a second. The way Control Words are packed into an ECM may be provider-specific and particularly different for stream type I and stream type II, as depicted in Fig. 14.
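The implicit speed limit can be made concrete with a small calculation, given here as a rough sketch only. It assumes that one ECM has to be processed by the smart card per Control Word period and that the smart card latency is the only limiting factor; both function names are hypothetical. With the example figures above (a lifetime of 10 seconds and a latency of half a second), the period would shrink to the card latency at a speed factor of about 20.

```python
def period_duration_in_trick_play(cw_lifetime_s: float, speed_factor: float) -> float:
    """Duration of one CW period as seen during trick-play.

    The period is compressed for |speed_factor| > 1 and expanded for
    |speed_factor| < 1 (slow motion). The sign only encodes the direction.
    """
    if speed_factor == 0:
        raise ValueError("speed factor must be non-zero")
    return cw_lifetime_s / abs(speed_factor)


def max_trick_play_speed(cw_lifetime_s: float, card_latency_s: float) -> float:
    """Speed factor at which a compressed period equals the card latency.

    Assumption: one ECM per period must pass through the smart card, and
    nothing else limits the speed.
    """
    if cw_lifetime_s <= 0 or card_latency_s <= 0:
        raise ValueError("durations must be positive")
    return cw_lifetime_s / card_latency_s
```

For instance, `max_trick_play_speed(10.0, 0.5)` evaluates to 20.0, and at a speed factor of 4 each period lasts 2.5 seconds.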
CW A denotes the CW that was used to encrypt period A, CW B denotes the CW that was used to encrypt period B, and so on. Horizontally, the transmission time axis is plotted. ECM A may be defined as being the ECM that is present during the major part of period A. It can be seen that, in that case, ECM A holds the CW for the current period A and for stream type I additionally for the next period B. In general, an ECM may hold at least the CW for the current period and might hold the CW for the next period. Due to zapping, this may probably be true for all or many providers.
Before going on, more information will be provided about a decrypter and how it may handle the CWs. The decrypter may contain two registers, one for the "odd" and one for the "even" CW. "Odd" and "even" does not have to mean that the values of the CWs themselves are odd or even. The terms are particularly used to distinguish between two subsequent CWs in the stream. Which CW has to be used for the decryption of a packet is indicated by the SCB 1014 in the packet header. So the CWs used to encrypt the stream are alternating between odd and even. In Fig. 14 this means that, for instance, CW A and CW C are odd, whereas CW B and CW D are even. After the decryption by the smart card, CWs may be written to the corresponding registers in the decrypter overwriting previous values, as indicated in Fig. 15.
Fig. 15 illustrates the two registers 1501, 1502 containing even CWs (register 1501) and containing odd CWs (register 1502). Further, smart card latency 1500, that is a time needed by the smart card to retrieve or decrypt a CW from an ECM, is illustrated in Fig. 15.
In the case of stream type I, each ECM holds two CWs and as a result both registers 1501, 1502 may be overwritten after the decryption of the ECM. One of the registers 1501, 1502 is active and the other is inactive. Which one is active depends on the SCB 1014. In the example, the SCB 1014 will indicate during period B that the even register 1501 is the active one. The active register may only be overwritten with a CW identical to the one it already holds because it is still needed for decryption of the remainder of that particular period. Therefore, only the inactive register may be overwritten with a new value.
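The overwrite rule for the two registers may be sketched as follows. This is a minimal model under stated assumptions, not the register logic of any actual decrypter: the class name CwRegisters, its methods and the use of the strings "even" and "odd" as register names are hypothetical.

```python
from typing import Dict, Optional


class CwRegisters:
    """Two CW registers of a decrypter; the SCB selects which one is active."""

    def __init__(self) -> None:
        self.regs: Dict[str, Optional[bytes]] = {"even": None, "odd": None}

    def write(self, parity: str, cw: bytes, active: Optional[str]) -> None:
        """Write one CW; the active register may only receive the CW it already holds."""
        current = self.regs[parity]
        if parity == active and current is not None and current != cw:
            raise ValueError("active register may only be rewritten with the CW it already holds")
        self.regs[parity] = cw

    def load_ecm(self, cws: Dict[str, bytes], active: Optional[str]) -> None:
        """Write all CWs obtained from one decrypted ECM (one or two entries)."""
        for parity, cw in cws.items():
            self.write(parity, cw, active)
```

In period B of the example, the even register is active. Loading an ECM that holds CW B (even) and CW C (odd) succeeds, because the even register receives an identical value and only the inactive odd register changes.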
Consider now period B in trick-play more closely. Assume that an ECM is sent to the smart card at the start of this period, that is, at the moment the SCB toggle 1402 is crossed. The question is which ECM should then be sent to the smart card.
This ECM should hold CW C to ensure a timely decryption by the smart card for usage at the start of period C.
It may also hold CW B without disturbing the correct availability of CWs in the decrypter. Looking again at Fig. 14, it can be seen that for stream type I this means sending ECM B and for stream type II ECM C at the start of period B. In general, the current ECM can be sent in case it holds two CWs, and one period in advance if it holds only one CW. Sending an ECM one period in advance may be contradictory though to the embedded ECMs, so the latter have to be removed from the stream in that case. For a more generalized approach it may be preferred that the original ECMs are always removed from the stream by the trick-play generation circuitry or software. However, this cannot always be true.
Fig. 16 shows ECM handling in a fast forward mode. In a plurality of subsequent periods 1403 separated by SCB toggles 1402, a plurality of data blocks 1600 are reproduced, wherein a switching 1601 occurs between different data blocks.
For stream type I, an ECM B is sent at a border between periods A and B. For stream type II, an ECM C is sent at a border between period A and period B. Furthermore, according to stream type I, an ECM C is sent at a border between period B and period C. For a stream type II, an ECM D is sent at a border between period B and period C.
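The selection rule for forward trick-play stated above can be summarised in a few lines. The sketch below is illustrative only; the function name ecm_to_send and the numbering of periods (0 for A, 1 for B, and so on) are assumptions, and reverse trick-play is deliberately not covered.

```python
PERIODS = "ABCD"  # period labels as used in Fig. 14 and Fig. 16


def ecm_to_send(entered_period: int, cws_per_ecm: int) -> int:
    """Index of the ECM to send when an SCB toggle into `entered_period` is crossed.

    Forward trick-play only. With two CWs per ECM (stream type I) the current
    ECM is sent; with one CW per ECM (stream type II) the ECM of the next
    period is sent, so that its CW is available in time.
    """
    if cws_per_ecm == 2:
        return entered_period
    if cws_per_ecm == 1:
        return entered_period + 1
    raise ValueError("an ECM is assumed to hold one or two CWs")
```

At the border between periods A and B this yields ECM B for stream type I and ECM C for stream type II, in line with Fig. 16.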
For ECMs to be available for trick-play at the correct moment, the ECMs may be stored in a separate file. In this file it may also be indicated to which period an ECM belongs (which part of the recorded stream). The packets in the MPEG stream file may be numbered. The number of the first packet of a period (SCB toggle 1402) may be stored alongside with the ECM for this same period 1403. The ECM file may be generated during recording of the stream.
The ECM file is a file that may be created during the recording. In the stream, ECM packets may be located which may contain the Control Words needed to decrypt the video data. Every ECM may be used for a certain period, for instance 10 seconds, and may be transmitted (repeated) several times during this period (for instance 100 times). The ECM file may contain every first new ECM of such a period. The ECM data may be written into this file, and may be accompanied by some metadata. First of all, a serial number (counting up from 1) may be given. As a second field, the ECM file may contain the position of the SCB toggle. This may denote the first packet that can use this ECM to correctly decrypt its content. Then the position in time of this SCB toggle may follow as the third field. These three fields may be followed by the ECM packet data itself.
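One possible in-memory model of such an ECM file entry is sketched below, together with a lookup that finds the period to which a given packet number belongs. The field names, the use of seconds for the time field and the helper names are assumptions made for illustration; the on-disk format is not specified here.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EcmRecord:
    serial: int            # serial number, counting up from 1
    toggle_packet: int     # number of the first packet that uses this ECM
    toggle_time_s: float   # position in time of the SCB toggle (unit assumed)
    ecm: bytes             # the ECM packet data itself


def period_index(records: List[EcmRecord], packet_no: int) -> int:
    """Index of the record whose period contains packet_no.

    Records are assumed to be sorted by toggle_packet. Returns -1 if the
    packet lies before the first stored toggle.
    """
    positions = [r.toggle_packet for r in records]
    return bisect_right(positions, packet_no) - 1


def toggle_crossed(records: List[EcmRecord], from_packet: int, to_packet: int) -> bool:
    """True if a jump between the two packet numbers changes the period."""
    return period_index(records, from_packet) != period_index(records, to_packet)
```

Because only packet numbers are compared, the same check applies to forward and backward jumps.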
Using the SCB toggles stored in the ECM file, it may be easy to detect if such toggle is crossed even if this would be during a jump. To send the correct ECM, it may be required to know whether the ECMs contain one or two CWs. In principle, this is not known because it is provider-specific and secret. However, this can easily be determined experimentally by sending ECMs at various moments and observing the results on the display. An alternative method that is particularly suitable for implementation in the storage device itself is as follows. Send one single ECM to the smart card at the moment of an SCB toggle, decrypt the stream and check for PES headers in the coming two periods. With one PES header per GOP, there are around twenty PES headers in each period. The position of a PES header may be easily detected because a PLUSI bit in the plaintext header of the packet may indicate its presence. If correct PES headers are only found during the first period (after the latency of the smartcard), the ECM contains one CW. If they are also found during the second period, it contains two CWs. Such a situation is depicted in Fig. 17.
Fig. 17 illustrates a situation for one CW detection and for two CW detection. As can be seen, different periods 1403 of encrypted content 1700 are provided. With a smartcard latency 1500, an ECM A may be decrypted to generate corresponding CWs. By decrypting the encrypted content 1700, decrypted content 1701 may be generated. Further shown in Fig. 17 are PES headers 1702, namely a PES header A in period A (left) and a PES header B in period B (right). The area 1703 of period B for one CW in Fig. 17 indicates that the data is decrypted with the wrong key and therefore scrambled. This checking could be done while recording, in which case it will take for instance 20 to 30 seconds. It could also be done offline and, because only two packets indicated by the PLUSIs (one in each period) would have to be checked, it could be very quick. In the unlikely event that adequate PES headers are not available, the picture headers could be used instead. In fact, any known information may be useable for detection. Anyway, a one/two CW indication may be stored in the ECM file.
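The decision described with reference to Fig. 17 may be sketched as follows. The check for a PES header relies on the fact that a PES packet begins with the start code prefix 0x000001; the function names and the reduction of each period to a single boolean observation are assumptions of this sketch.

```python
PES_START_CODE_PREFIX = b"\x00\x00\x01"


def looks_like_pes_header(decrypted_payload: bytes) -> bool:
    """True if the payload of a packet flagged by the PLUSI bit starts a PES packet."""
    return decrypted_payload[:3] == PES_START_CODE_PREFIX


def cws_per_ecm(pes_ok_first_period: bool, pes_ok_second_period: bool) -> int:
    """Derive the number of CWs in an ECM from the two observations.

    A single ECM is sent at an SCB toggle. Correct PES headers in the first
    period only mean one CW; correct headers in both periods mean two CWs.
    """
    if not pes_ok_first_period:
        raise ValueError("no correct PES header in the first period; test inconclusive")
    return 2 if pes_ok_second_period else 1
```

The result could then be stored as the one/two CW indication mentioned above.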
In the following, some aspects related to dealing with slow-forward streams in particular will be described.
Next, trick-play GOP based slow-forward, still picture and step mode will be explained.
Slow-forward, which may also be denoted as slow motion forward, is a mode in which the display picture runs at a lower than normal speed. One form of slow-forward is already possible with the technique explained above referring to Fig. 7 and Fig. 8. Setting the fast-forward speed to a value between zero and one results in a slow-forward stream based on a repetition of fast-forward trick-play GOPs. For a plaintext stream, this is a proper solution, but for an encrypted stream it may lead to the erroneous decryption of a part of the I-frame in certain specific conditions. One option to solve this problem is not to repeat the fast-forward trick-play GOP but to extend the size of the trick-play GOP by the addition of empty P-frames. This technique in fact may also enable slow-reverse, because it is based on the trick-play GOPs used for fast-forward/reverse and therefore on the independently decodable I-frames.
Such an I-frame based slow-forward or slow-reverse may be inappropriate in special cases for the following reason. The distance between I-frames in normal play is around half a second and for slow-forward/reverse it is multiplied with the slow motion factor. So this type of slow-forward or slow-reverse is not exactly what is usually understood as slow motion but is in fact more like a slide show with a large temporal distance between the successive pictures.
In a still picture mode, the display picture may be halted. This can be achieved by adding empty P-frames to the I-frame for the duration of the still picture mode. This means that the picture resulting from the last I-frame is halted. When switching from normal play to still picture, this can also be the nearest I-frame according to the data in the CPI file. This technique is an extension of the fast-forward/reverse modes and results in nice still pictures especially if interlace kill is used. However, the positional accuracy is not always satisfactory when switching from normal play or slow-forward/reverse to still picture.
The still picture mode can be extended to implement a step mode. The step command advances the stream to some next or previous I-frame. The step size is at minimum one GOP but can also be set to a higher value equal to an integer number of GOPs. Step forward and step backward are both possible in this case because only I-frames are used.
For the construction of a slow-forward stream many considerations apply. For example, the construction of a slow-forward stream on elementary stream level can only be performed on fully plaintext data. As a consequence, the slow-forward stream will be fully plaintext, even if the normal play stream was originally encrypted. Such a situation may be unacceptable to a copyright holder. Furthermore, this is worse than in the case of a fast-forward/reverse stream because all information, i.e. each and every frame, is present in plaintext in the slow-forward stream and not just a subset of the frames as is the case for true fast-forward/reverse streams. Therefore a plaintext normal play stream can easily be reconstructed from a plaintext slow-forward stream. So the slow-forward stream should be encrypted if the normal play stream is encrypted. Since a DVB encryptor is not permissible in a consumer device this can only be realized if the slow-forward stream is constructed on transport stream level using the encrypted data packets from the originally transmitted encrypted data stream.
In the following, referring to Fig. 18 to Fig. 55, systems will be described which are capable of processing a data stream in a system according to exemplary embodiments of the invention.
It is emphasized that the systems described in the following can be implemented in the frame of and in combination with any of the systems described referring to Fig. 1 to Fig. 17. In the following, referring to Fig. 18, a data processing device 1800 for processing an MPEG2 data stream according to an exemplary embodiment of the invention will be described.
The device 1800 is adapted for processing an MPEG2 data stream comprising a plurality of frames, for instance a sequence of I-frames, P-frames and B-frames. At least one or a sequence of I-frames should be included in the MPEG2 data stream, all other frame types are optional.
Content may be stored on a harddisk 1801 as a storage device. A central processing unit or control unit (not shown) may have access to the harddisk 1801 and may provide the harddisk 1801 with corresponding control signals so that data stored on the harddisk 1801 may be supplied to a first switch unit 1803.
However, the control unit is under control of a human user operating a user input/output interface 1808. Such a user interface 1808 may include a display, input means like a keypad, a joystick, a trackball, or the like and may allow a user to specify a mode according to which she or he wishes to reproduce content stored on the harddisk 1801. For instance, the user may adjust, via the user input/output unit 1808, parameters like volume, playback speed, a trick-play reproduction mode, equalizing, etc.
A data stream comprising a plurality of frames may be supplied from the harddisk 1801 to the first switch unit 1803 being under control of a detection unit 1802 which is in functional relationship with a delay unit 1804. The detection unit 1802 is adapted for controlling the first switch unit 1803 to switch from a first reproduction mode (for instance normal-play, NP) to a second reproduction mode (for instance trick-play, TP, particularly slow-forward trick-play). Thus, the user may indicate via the user input/output unit 1808, that she or he wishes to switch from a normal play reproduction mode to a slow-forward reproduction mode. Such a switch may be detected by the detection unit 1802. The detection unit 1802 is adapted for detecting the first anchor frame after a switch from one reproduction mode to another one, triggered by a user.
When the detection unit 1802 has detected a switch between two operation modes, for instance a switch from a normal play mode to a slow-forward mode, the first switch unit 1803 may be controlled accordingly. In the operation state shown in Fig. 18, a normal play mode is present so that the output of the harddisk 1801 is coupled, via the first switch 1803, to a reproduction unit 1806 for reproducing the content in accordance with a normal play mode. Also the reproduction unit 1806 which may include a display, a loudspeaker, an earpiece, or the like may have a communication connection with the control unit.
After the mode switching has been detected by the detection unit 1802, the first switch 1803 may be brought in another position so as to start the slow-forward mode. For this purpose, the data stream to be played back is delayed by a delay unit 1804 for delaying the switch to the slow-forward mode by a delay time which corresponds to the time difference between the point of time of the switch and a point of time of the start of a next anchor frame in the sequence of the plurality of frames. As will be described below in more detail, such an anchor frame may be an I-frame or a P-frame (in the nomenclature of MPEG). By delaying the start of the slow-forward mode by a corresponding time difference, it is possible to obtain a smooth transition between the different playback modes. In other words, before expiry of the delay time, the playback is continued in the normal play mode, and only after expiry of this delay time interval, the system 1800 starts playing back in the slow-forward mode.
When the first switch unit 1803 has switched to the trick-play operation mode TP (not shown in Fig. 18), the data stream provided at an output of the harddisk 1801 is connected to an input of a replication unit 1809 for repeating frames. The replication unit 1809, which is also in communication with the control unit, is coupled with a correction unit 1805 for correcting a temporal reference of the plurality of frames after the delay time. In other words, it may be necessary in the slow-forward mode that the order of playback of the frames is altered, or that a plurality of frames have to be repeated several times, or that additional (empty) frames are inserted in the data stream. In order to account for such modified frame conditions, the correction unit 1805 corrects the temporal reference of these frames. An output of the correction unit 1805 is coupled to the reproduction unit 1806 so as to provide the reproduction unit 1806 with the data modified in accordance with the slow-forward reproduction mode.
When a user operates the user input/output unit 1808 so as to go back to a normal play mode, then the first switching unit 1803 is brought back to the normal play reproduction state (shown in Fig. 18) so that, after a corresponding delay or waiting time, the normal play reproduction mode is continued.
The correction unit 1805 may perform features like reordering the sequence of frames to be played back, inserting (empty) frames so as to take into account a trick-play factor, or repeating frames for delayed playback. As can be taken from Fig. 18, a second switch unit 1810 is foreseen which can be switched under control of an additional detection unit 1811 which, in turn, can be controlled via the user interface 1808. Fig. 18 shows the second switch unit 1810 in a normal-play operation mode NP. However, by means of the additional detection unit 1811, the second switch unit 1810 can also be brought in a trick-play operation mode TP (not shown). The additional detection unit 1811 may detect a GOP start after a switch to NP, and may provide for a time correction when switching back to NP.
In the following, different operation modes of the system 1800 will be explained. Normal play: The first switch 1803 and the second switch 1810 are in the positions as is given in Fig. 18. The data is read from harddisk 1801 and flows directly to the reproduction unit 1806.
Normal play to slow forward: The detection unit 1802 is triggered by the user interface unit indicating a switch to slow forward. The detection unit 1802 searches for the first anchor frame after the user-induced switch and switches, when the anchor frame is found, the first switch 1803 to the position with the branch named TP. The replication unit 1809 takes care of the replication of the frames. The data flows through the correction unit 1805 to the reproduction unit 1806.
Slow forward to normal play: The detection unit 1802 is triggered by the user interface unit 1808 indicating a switch to normal play. The detection unit 1802 searches for the first anchor frame after the user-induced switch and switches, when the anchor frame is found, the first switch 1803 to the position with the branch named NP. The additional detection unit 1811 simultaneously starts looking for a GOP start. The second switch 1810 should be set to the position where the correction unit 1805 is connected. When a GOP start is found, the second switch 1810 will return to the normal play position as is indicated in Fig. 18.
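The search for the first anchor frame after a user-induced switch, which is common to both transitions described above, may be sketched as follows. The representation of the stream as a string of frame type letters in transmission order and both function names are assumptions made for illustration only.

```python
from typing import Optional, Tuple

ANCHOR_TYPES = "IP"  # anchor frames are I-frames and P-frames


def first_anchor_after(frame_types: str, switch_index: int) -> Optional[int]:
    """Index of the first anchor frame at or after switch_index, or None.

    switch_index is taken to be the index of the first frame that has not
    yet been passed on when the user requests the mode switch.
    """
    for i in range(switch_index, len(frame_types)):
        if frame_types[i] in ANCHOR_TYPES:
            return i
    return None


def route_frames(frame_types: str, switch_index: int) -> Tuple[str, str]:
    """Split the stream into the part still sent through the old branch
    and the part sent through the new branch, starting at the anchor frame."""
    anchor = first_anchor_after(frame_types, switch_index)
    if anchor is None:
        return frame_types, ""
    return frame_types[:anchor], frame_types[anchor:]
```

For the sequence "IBBPBBPBB" and a switch requested at index 1, the anchor is the P-frame at index 3, so the two intervening B-frames are still reproduced in the previous mode. This corresponds to the delay time introduced by the delay unit 1804.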
It should be mentioned that the second switch unit 1810 is optional. Although the processing of temporal references after a switch back is preferred, and might provide better visual results during the switch back to normal play, there may be other discontinuities during this switch back that cannot be compensated for. Good use of the discontinuity flags may be made, and because of this, a system without the second switch unit 1810 may also provide a proper performance.
In the following, further details concerning the slow-forward trick-play reproduction according to exemplary embodiments of the invention will be explained. Next, splitting of the stream into separate frames will be explained. To be able to construct a slow-forward stream on transport level it is advantageous that each individual frame is available as a series of transport stream packets. In case of one PES packet per frame this comes naturally. A PES packet is contained in a series of transport stream packets because PES and transport stream packets are aligned. In the case of one PES packet per GOP this is only the case for the start of the I-frame. All other frame boundaries are mostly located somewhere inside a packet. This packet contains information from the two frames. So first this packet may be split up into two packets, the first one containing the data from the first frame and the second one containing the data from the next frame. Each of the two packets resulting from the splitting may be stuffed with an Adaptation Field (AF).
This situation is indicated in Fig. 19.
Fig. 19 shows splitting of the packet at a frame boundary. Particularly, Fig. 19 illustrates a plurality of TS packets 1900 each comprising a header 1901 and a frame portion 1902. As can be taken from a central portion of the data stream shown in Fig. 19, a packet comprising a header 1901 and two subsequent frames 1902 is split up into two separate portions each having a separate header 1901 followed by an Adaptation Field 1903 and followed by the corresponding frame 1902.
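The splitting shown in Fig. 19 may be sketched as follows for a plaintext packet. The sketch assumes a payload-only input packet (no adaptation field present, payload unit start indicator not set) and uses the adaptation field purely as stuffing. The function names are hypothetical. The continuity counter of the second packet is incremented, which implies that the counters of all later packets with the same PID would have to be adjusted as well; this is not shown.

```python
from typing import Tuple

TS_SIZE = 188
HEADER_SIZE = 4


def _stuffed_packet(header: bytes, payload: bytes) -> bytes:
    """Build a 188-byte packet: header, stuffing adaptation field, payload."""
    af_total = TS_SIZE - HEADER_SIZE - len(payload)  # bytes left for the AF
    if af_total < 1:
        raise ValueError("payload leaves no room for an adaptation field")
    af_length = af_total - 1                         # value of the length byte
    af = bytes([af_length])
    if af_length > 0:
        # one flags byte (all flags cleared) followed by 0xFF stuffing bytes
        af += b"\x00" + b"\xff" * (af_length - 1)
    b3 = (header[3] & 0xCF) | 0x30                   # adaptation field control = '11'
    return header[:3] + bytes([b3]) + af + payload


def split_packet(packet: bytes, boundary: int) -> Tuple[bytes, bytes]:
    """Split one packet at a frame boundary given as an offset into its payload."""
    if len(packet) != TS_SIZE or packet[0] != 0x47:
        raise ValueError("not a valid transport stream packet")
    if (packet[3] >> 4) & 0x03 != 0x01 or packet[1] & 0x40:
        raise ValueError("sketch handles payload-only packets without PLUSI")
    payload = packet[HEADER_SIZE:]
    if not 0 < boundary < len(payload):
        raise ValueError("boundary must lie strictly inside the payload")
    first = _stuffed_packet(packet[:HEADER_SIZE], payload[:boundary])
    next_cc = (packet[3] + 1) & 0x0F
    second_header = packet[:3] + bytes([(packet[3] & 0xF0) | next_cc])
    second = _stuffed_packet(second_header, payload[boundary:])
    return first, second
```

Both resulting packets again have the full length of 188 bytes, with the frame data placed at the end of each packet behind the Adaptation Field 1903.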
The splitting of packets is not difficult for a plaintext stream. A first option is to fully decrypt the normal play data as depicted in Fig. 20. Fig. 20 shows a slow-forward construction after decryption of normal play data. Encrypted normal play data 2000 from a harddisk 2001 are supplied to a decrypter 2002 generating a plaintext stream 2003. The plaintext stream 2003 is supplied to a frame splitting unit 2004 for splitting the different frames in a manner as shown in Fig. 19. Then, this data is supplied to a slow-forward construction unit 2005 constructing a slow-forward stream, which is then supplied to a set top box 2006.
The decryption and slow-forward mode of a stored fully encrypted stream 2000 or a stored hybrid stream is not difficult because no stream data is skipped or duplicated in the stream by the decrypter 2002. The stored stream 2000 (fully encrypted or hybrid) is simply fed at a lower than normal rate through the decrypter 2002 which also means that there are no problems with embedded ECMs (Entitlement Control Messages). The plaintext stream 2003 coming from the decrypter unit 2002 can then be used to split the packets or in fact to perform any necessary stream manipulation in the frame splitting unit 2004. The resulting slow-forward stream is a plaintext stream in this case. The construction of an encrypted slow-forward stream from an encrypted normal play stream is performed on transport level because the use of DVB (Digital Video Broadcasting) encryptors in consumer devices may not be allowed in special cases. For this, a hybrid stream (see Fig. 21) with only a few plaintext packets 2100 and 2102 on all frame boundaries is needed. Fig. 21 furthermore shows encrypted packets 2101 which belong to the I-frames 2103, B-frames 2104 or P-frames 2105.
Below, it will be described how such a stream could be generated on the playback side of the storage device if the stored stream is fully encrypted. In this case, the decrypter unit 2002 in Fig. 20 may be a selective type that only decrypts the necessary packets. But preferably the stream is already stored as a hybrid stream as indicated in Fig. 22.
Fig. 22 illustrates slow-forward construction on a stored hybrid stream 2200. In the array shown in Fig. 22, no decryption unit 2002 is foreseen between the harddisk 2001 and the frame splitting unit 2004. However, a decrypter unit 2201 may then be foreseen in the set top box 2006. The plaintext packets 2100, 2102 in the hybrid stream should now also allow for the splitting of packets containing data from the two frames. This may be guaranteed by a criterion which will be described below in more detail. However, some part of the sequence header code or picture start code can still be located in an encrypted packet. In this case, an ideal splitting is not easily possible. In fact the split may be made between the encrypted and plaintext packets. Solutions for these problems will be described below in more detail. In that situation only empty P-frames are concatenated to an I-frame and vice versa. For a frame based slow-forward, also other types of concatenation may be considered among which the concatenation of B-frames to B-frames. This may require some kind of gluing algorithm at these frame boundaries as will be clarified referring to Fig. 23.
Fig. 23 illustrates a data stream in which a previous frame 2300, a current frame 2312 and a next frame 2301 are shown. At the end of the previous frame 2300, three bytes of picture start code 2302 are provided. Furthermore, at the beginning of the current frame 2312 one byte of picture start code 2303 is foreseen. Coming now to the next frame 2301, the frame end of the packet before comprises one byte of picture start code 2304. At the beginning of the next frame 2301, three bytes of picture start code 2305 are provided. Fig. 23 shows that an incomplete picture start code may be present at the concatenation point. This may make a gluing necessary at a connection region 2306. Thus, gluing should be performed between the B-frame 2307 and a repetition of the B-frame 2308. Fig. 23 particularly illustrates a packet header 2309, plaintext data 2310 and encrypted data 2311. In the example of Fig. 23, there is only one byte of the picture start code at the start and the end of the B-frame. As a result, two bytes are missing at the concatenation point. The gluing algorithm, which will be described below in more detail, may heal such a problem. For this gluing it should be known how the picture start code is split. This information may be obtained with a method that will be described below in more detail.
In the following, repetition of the frames will be described in more detail. In a slow-forward mode, the decoder has somehow to be forced to repeat the display of a picture in accordance with the slow-forward factor. Empty P-frames may be used to force the repetition of a picture resulting from an I-frame. This technique can also be applied for pictures resulting from P-frames. However, this technique cannot be easily applied for B-frames because empty P-frames always point to an anchor frame being an I-frame or a P-frame. This is in fact the case for any type of empty frame. So the repetition of a picture resulting from a B-frame has to be realized in another way. A possible method is to repeat the B-frame data itself. Since the repeated B-frames point to the same anchor frames as the original B-frame the resulting pictures will be identical. The amount of data for a B-frame is usually much more than for an empty P-frame but in general it is still significantly less than for an I-frame. Anyway, the transmission time is also multiplied with the slow-motion factor so there need not be an increasing bit rate at least on average. The empty frames used to force the repetition of pictures resulting from an I-frame or a P-frame can be of the interlace kill type thus reducing interlace artefacts for these pictures.
But such a reduction is not easily possible for pictures resulting from the B-frames because the repetition is not forced by an empty frame but the repetition of the B-frame data itself. So the B-frames will have the original interlace effects. If interlace kill would be used for the I-frames and P-frames this might look very awkward because pictures with and without interlace effects are sequentially present in the stream of displayed pictures. It is presently believed that it might be better to only use empty frames without interlace kill to construct the slow- forward stream.
The repetition of the I-frames and P-frames may be enforced by the insertion of empty P-frames into the transmission stream after the original I-frame or P-frame. Such a method may be used for the fast-forward/reverse stream comprising I-frames followed by empty P-frames. However, this method may not be entirely correct for a stream that also includes B-frames, as is the case for a slow-forward stream constructed from a stored transmission stream with B-frames. Due to the reordering from transmission order to display order, the I-frames and P-frames will be repeated in the wrong position, thus disturbing the normal display order of the frames. This is illustrated in Fig. 24 and Fig. 25.
Fig. 24 illustrates the effect of reordering in normal play, showing a transmission order 2400 and a display order 2401. The top line shows a normal play transmission stream 2400 with a GOP size of 12 frames comprising I-frames 2103, P-frames 2105 and B-frames 2104. The first four frames of the next transmission GOP are also shown for clarity. The bottom line of Fig. 24 shows the stream 2401 after reordering to the display order. The index indicates the display frame order. According to the MPEG2 standard ISO/IEC 13818-2: 1995(E) (see in particular pages 24 and 25), the reordering may be performed as follows:
- B-frames keep their original position;
- Anchor frames (that is I-frames and P-frames) are shifted to the position of the next anchor frame.
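The two reordering rules above can be sketched in a few lines of Python. This is only an illustrative model, not part of the described device: the function name `reorder_to_display` and the frame labels are assumptions, and frames are classified simply by the first letter of their label.

```python
def reorder_to_display(transmission):
    """Reorder frame labels from transmission order to display order.

    Frames are classified by the first letter of their label ('I', 'P' or 'B').
    Returns the display-order list and the anchor still waiting for its slot.
    """
    display = []
    pending_anchor = None
    for frame in transmission:
        if frame[0] == "B":
            # Rule 1: a B-frame keeps its original position.
            display.append(frame)
        else:
            # Rule 2: an anchor frame moves to the position of the next anchor,
            # so the previous anchor is released when a new one arrives.
            if pending_anchor is not None:
                display.append(pending_anchor)
            pending_anchor = frame
    return display, pending_anchor


# Illustrative labels; the numeric index is the display position.
transmission = ["I4", "B2", "B3", "P7", "B5", "B6", "P10", "B8", "B9"]
display, pending = reorder_to_display(transmission)
print(display)   # ['B2', 'B3', 'I4', 'B5', 'B6', 'P7', 'B8', 'B9']
print(pending)   # P10
```

The last anchor stays pending because its display slot is only known once the next anchor frame arrives.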
Fig. 25 shows the effect of reordering in slow-forward mode. Particularly, Fig. 25 illustrates the transmission order 2500, an order after the reordering 2501 and an order of the displayed pictures 2502. Looking at the slow-forward stream constructed from the normal play stream in more detail, the top line of Fig. 25 shows the transmission order 2500 of the first part of the slow-motion stream for this case, assuming a slow-motion factor of three. Empty P-frames may be inserted after the I-frames and the P-frames, and the B-frames may be repeated. The middle line of Fig. 25 shows the effect of the reordering. The bottom line of Fig. 25 shows how the I-frames and the P-frames are repeated by the empty P-frames in this case. An empty P-frame may result in a display picture that is a copy of the picture resulting from the previous anchor frame, which itself could also be an empty P-frame. It is visible in Fig. 25 that the normal display order 2502 indicated by the index is disturbed because the display of frame I4 is split up into two parts. Only the last time is frame I4 displayed in the correct position. This also means that the B-frames may be decoded erroneously.
In the following, several options will be described how to correct such deficiencies. One possibility is shown in Fig. 26, which shows the insertion of empty P-frames before the anchor frames. The three rows in Fig. 26 are similar to the three lines of Fig. 25. In Fig. 26, the empty P-frames are inserted before the anchor frames in the transmitted stream extracted from the storage device, as is shown in the top line 2500. In the reordered stream 2501, the empty P-frames are now positioned after the anchor frames. This is where they should be for a correct repetition of the anchor frames, as is clear from the displayed pictures 2502 of Fig. 26.
However, there are arguments why it may be appropriate to avoid empty P-frames. One is related to the propagation of errors within a GOP. P-frames depend on the previous anchor frame and B-frames depend on the surrounding anchor frames. A data error during the transfer to the set top box results in coding errors and therefore disturbances in the picture. If this error is in an anchor frame, it propagates until the end of the GOP because subsequent P-frames depend on this anchor frame. Also the B-frames are affected because they use the pictures from the disturbed surrounding anchor frames for the decoding. This may have the consequence that the picture disturbances gradually increase towards the end of the GOP. This may be especially important for slow-forward, where the GOP size can be very large and therefore very long in time. On the other hand, a data error in a B-frame has only a very limited effect because no other frames depend on it. So the picture disturbances are restricted to this B-frame and its repetitions. One might argue that data errors should not occur on a digital interface, but there may be a second advantage in avoiding the use of empty P-frames. If these are of the interlace kill type, they change the decoded picture by nature, resulting in decoding errors for the subsequent frames. So interlace kill may not be possible.
Referring to the construction of empty frames, several types of empty B-frames can be constructed. They may have the advantage that no additional error propagation is introduced and that interlace kill can be used. Possible types of empty B-frames are the forward predictive empty B-frames (which may be denoted as Bf-frames) and backward predictive empty B-frames (which may be denoted as Bb-frames).
A B-frame is normally bi-directionally predictive, but uni-directionally predictive B-frames can also exist. In the latter case they can be forward or backward predictive. Forward predictive means that an anchor frame is used to predict the following B-frames during encoding. So the picture resulting from a forward predictive B-frame is reconstructed during decoding from the previous anchor frame. This means that the Bf-frame forces the repetition of the previous anchor frame. Therefore, it has the same effect as an empty P-frame (Pe-frame). The Bb-frame has the opposite effect. It forces the display of the anchor frame following it. For both types of empty B-frames, an interlace kill version is possible as well.
In the following, it will be described how to use such empty B-frames for the construction of a slow-forward stream.
A first possibility on the basis of Bb-frames is depicted in Fig. 27. The Bb-frames are inserted before the anchor frames and keep their position during the reordering. The anchor frames are shifted to the position of the next anchor frame. The Bb-frame forces the display of the anchor frame following it in the reordered stream.
Another option is the use of Bf-frames as shown in Fig. 28. The Bf-frames are inserted after the anchor frames in the transmission stream.
The repeated display of the anchor frames in the reordered stream is forced by the Bf-frames that follow them.
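The post-insertion of Bf-frames can be illustrated with a small Python sketch. It is a simplified model under stated assumptions: the function names, the label "Bf" and the decoder model (a Bf-frame shows the forward reference, that is the anchor before the most recently transmitted one) are illustrative choices, not taken from the described device.

```python
def build_slow_forward_bf(transmission, factor):
    """Repeat each B-frame `factor` times and add factor-1 Bf-frames after each anchor."""
    out = []
    for frame in transmission:
        if frame[0] == "B":
            out.extend([frame] * factor)
        else:
            out.append(frame)
            out.extend(["Bf"] * (factor - 1))
    return out


def displayed_pictures(transmission):
    """Simplified decoder model returning the picture shown in each display slot.

    None marks a Bf-frame whose forward reference lies before the given excerpt.
    The last anchor is not shown because its display slot is not yet reached.
    """
    shown = []
    forward_ref = None    # anchor before the most recent one
    backward_ref = None   # most recent anchor in transmission order
    for frame in transmission:
        if frame == "Bf":
            shown.append(forward_ref)      # copy of the previous anchor picture
        elif frame[0] == "B":
            shown.append(frame)
        else:
            if backward_ref is not None:
                shown.append(backward_ref)  # anchor is displayed at the next anchor slot
            forward_ref, backward_ref = backward_ref, frame
    return shown


slow = build_slow_forward_bf(["I4", "B2", "B3", "P7", "B5", "B6"], 3)
print(slow)
print(displayed_pictures(slow))
```

In this model every picture, including the anchor I4, appears three times in a row, and the repetitions of I4 are contiguous and in the correct display position.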
The use of Bf-frames is similar to the use of empty P-frames for the construction of fast-forward and fast-reverse streams. In fact the use of Bf-frames is also possible in that case, thus commonising the trick-play generation even further. But when Bf-frames are used for fast-forward and fast-reverse, the effect of reordering should be considered. This means that some parameters in the fast-forward/reverse stream like PTS/DTS and temporal reference have to be chosen appropriately.
In the following, further details concerning the temporal reference will be explained.
The display order within the transmission GOP starting with a GOP header is indicated by the temporal reference in each picture header. The first frame to be displayed has a temporal reference equal to zero. This is depicted in Fig. 29 for a normal play stream. Fig. 29 illustrates a temporal reference 2900 for the transmission order 2902 and illustrates a temporal reference 2901 for a display order 2903.
In display order 2903, the temporal references 2901 are a monotonically increasing series from 0 to 11. Due to the reordering, the temporal references of the anchor frames in the transmission stream are shifted.
Considering the temporal references in the case of a slow-forward stream, the situation for the preferred case that the Bf-frames are inserted is depicted in Fig. 30 for a slow motion factor of three.
Fig. 30 indicates the temporal reference for slow-forward with Bf-frames. The top line of Fig. 30 indicates the frames taken from the normal play stream shown in Fig. 29 with the original temporal references. The second line of Fig. 30 shows the insertion of Bf-frames and the repetition of the B-frames. The original temporal references are shown above this line and the values they should have below this line. The third line of Fig. 30 shows the frames after reordering, and the bottom line of Fig. 30 shows the displayed pictures. The temporal references of the reordered frames are shown below these lines. They form an increasing series from 0 to 35. The temporal references in the case of pre-insertion of Bb-frames or Pe-frames are depicted in Fig. 31 and in Fig. 32 for comparison.
It can be taken from Fig. 30, Fig. 31 and Fig. 32 that the frames of the slow-forward stream should be provided with new temporal references. How these are derived is explained hereinafter. It should be mentioned that in theory a GOP does not need to be preceded by a GOP header. Although a GOP without GOP header has not been encountered in practice, this situation will also be considered. The temporal reference is only reset to zero for the first displayed frame after a GOP header. So in the absence of a GOP header the temporal reference will not be reset to zero but will increase to its maximum value of 1023 and then return to zero. In this case, the I-frame has to be treated in the same way as a P-frame, and a B-frame following an I-frame in the same way as a B-frame following a P-frame. All calculations are performed on a modulo 1024 basis. For the generation of new temporal references, a distinction is made between the new temporal references for the B-frames and for the anchor frames. In the following, new temporal references for the B-frames will be described.
No distinction is here made between original B-frames, repeated B-frames or inserted empty B-frames. But another categorization of the B-frames is made in relation to the temporal reference.
Fig. 33 shows an example for the case that Bf- or Bb-frames are inserted (note that BB is not Bb). In general, three types of B-frames are distinguished:
1. B-frames following an I-frame (Bi).
This is always the first frame to be displayed of the current transmission GOP. If no GOP header is present, it is treated as a B-frame following a P-frame. When a GOP header is present, the temporal reference in this B-frame is zero:
T(Bi) = 0 (5)
2. B-frames following a P-frame (Bp).
Due to the reordering, this B-frame is displayed after the last anchor frame that precedes the P-frame in the transmission stream. This last anchor frame is denoted by AL and can be an I-frame, a P-frame or an empty P-frame. In this case, the temporal reference of the B-frame is equal to the temporal reference of the last anchor frame AL increased by 1:
T(Bp) = T(AL) + 1 (6)
3. B-frames following another B-frame (BB).
It is displayed after the preceding B-frame (BL) in the transmission stream, which can also be an empty B-frame.
In this case, the temporal reference of the B-frame is equal to the temporal reference of the preceding B-frame increased by 1 :
T(BB) = T(BL) + 1 (7)
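Equations (5) to (7), together with the modulo 1024 rule stated earlier, can be summarised in a short Python sketch. The function and parameter names are assumptions introduced for illustration only.

```python
TEMPORAL_REFERENCE_MODULUS = 1024  # the temporal reference wraps after 1023


def b_frame_temporal_reference(preceding_kind, gop_header_present,
                               t_last_anchor=None, t_preceding_b=None):
    """New temporal reference of a B-frame.

    preceding_kind: 'I', 'P' or 'B', the frame directly before this B-frame
                    in the transmission stream.
    t_last_anchor:  T(AL), needed when the B-frame follows an anchor frame.
    t_preceding_b:  T(BL), needed when the B-frame follows another B-frame.
    """
    if preceding_kind == "I" and gop_header_present:
        return 0                                                   # equation (5)
    if preceding_kind in ("I", "P"):
        # Without a GOP header an I-frame is treated as a P-frame.
        return (t_last_anchor + 1) % TEMPORAL_REFERENCE_MODULUS    # equation (6)
    if preceding_kind == "B":
        return (t_preceding_b + 1) % TEMPORAL_REFERENCE_MODULUS    # equation (7)
    raise ValueError("preceding_kind must be 'I', 'P' or 'B'")


print(b_frame_temporal_reference("I", True))                       # 0
print(b_frame_temporal_reference("P", True, t_last_anchor=8))      # 9
print(b_frame_temporal_reference("B", True, t_preceding_b=1023))   # 0 (wrap-around)
```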
Next, new temporal references for the anchor frames will be described. Due to the reordering, the anchor frames will be displayed after the sequence of B-frames following them in the transmission stream. So it is important to know how many B-frames will follow the I-frames and P-frames in the slow-forward stream to determine their new temporal reference. In the case of a varying GOP size or of a varying GOP structure this cannot be derived from history. In practice, a varying GOP structure is not common. Even for stations having a varying GOP size, the anchor frames will always be followed by the same number of B-frames. Nevertheless, a varying GOP structure is possible and will be considered. To be able to handle a varying GOP structure, the number of B-frames that will follow an individual anchor frame in a transmitted slow-forward stream has to be determined. This can be calculated from the slow motion factor and the number of B-frames following this anchor frame in the original recorded stream, taking into account whether empty B-frames or empty P-frames are inserted. So the number of B-frames in the original stream has to be determined first. One possibility is to read all the data up to the next anchor frame, but this demands a substantial amount of buffering. Another possibility, which avoids this buffering, is to store this information in the CPI file and extract it from there. The number of B-frames can be easily derived from the distance in frames to the next anchor frame in the transmitted stream. In fact it is equal to this distance minus one. There are two ways to store this information in the CPI file:
- The CPI file holds an entry for each frame including its type;
- The CPI file holds an entry for each anchor frame that includes the distance in frames to the previous anchor frame.
In the first case, the distance in frames to the next anchor frame can easily be counted in the CPI file. The second case may seem a bit strange because the distance to the previous anchor frame is stored with the frame instead of the distance to the next anchor frame. This is chosen because the distance to the previous anchor frame is known at the moment that an anchor frame is received. The distance from the current anchor frame to the next anchor frame is simply found by reading the distance information from the next anchor frame in the CPI file. This distance will be denoted by D and the slow motion factor will be denoted by L, both of which are integers larger than zero (see Fig. 34).
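Both ways of obtaining the distance D can be sketched as follows. The in-memory lists used here are hypothetical stand-ins for the CPI file, whose actual layout is not specified in this passage; the function names are likewise assumptions.

```python
def distance_from_frame_types(frame_types, anchor_index):
    """First option: one entry per frame with its type.

    Counts the frames from the anchor at `anchor_index` to the next anchor.
    Returns None if no later anchor is listed.
    """
    for i in range(anchor_index + 1, len(frame_types)):
        if frame_types[i] in ("I", "P"):
            return i - anchor_index
    return None


def distance_from_anchor_entries(prev_distances, k):
    """Second option: one entry per anchor holding the distance to the previous anchor.

    The distance D from anchor k to the next anchor is read from entry k + 1.
    Returns None if anchor k is the last entry.
    """
    if k + 1 < len(prev_distances):
        return prev_distances[k + 1]
    return None


frame_types = list("IBBPBBBPP")          # hypothetical per-frame entries
prev_distances = [None, 3, 4, 1]         # hypothetical per-anchor entries
d = distance_from_frame_types(frame_types, 0)
print(d, d - 1)                          # distance 3, so 2 B-frames follow the I-frame
print(distance_from_anchor_entries(prev_distances, 1))   # 4
```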
Fig. 34 shows the distance D and the slow motion factor L for normal play 3400 and for slow-forward play 3401.
The factor L is therefore not the speed factor but the slow down factor.
The total number of B-frames following the anchor frame depends on whether empty B-frames or empty P-frames are inserted. So a distinction is made between two situations, namely that empty B-frames (Bf or Bb) or empty P-frames (Pe) are inserted. In case no GOP header is present, the I-frame is treated as a P-frame.
Next, the new temporal reference in case that empty B-frames (Bf or Bb) are inserted will be described.
The original distance to the next anchor frame is equal to D (see Fig. 34).
The distance to the next anchor frame in the slow-forward stream is equal to L x D.
So the total number of B-frames following the anchor frames is equal to L x D - 1.
The first B-frame following an I-frame has a temporal reference of zero (see Fig. 35). So the last B-frame following the I-frame has a temporal reference equal to L x D - 2. The I-frame is the next one to be displayed, so its temporal reference is one higher. Then the temporal reference for the I-frames is given by:
T(I) = L x D - 1 (8)
The temporal reference for the P-frame also depends on the temporal reference of the previous anchor frame in the slow-forward stream. This previous anchor frame (I-frame or P-frame) will be denoted by AL, and its temporal reference is denoted by T(AL) (see Fig. 36). The B-frame following the P-frame will be displayed after the previous anchor frame AL. So the temporal reference of this B-frame is equal to T(AL) + 1.
The temporal reference of the last B-frame following the P-frame is T(AL) + L x D - 1. The P-frame is the next one to be displayed, so its temporal reference is one higher. Then the temporal reference for the P-frames is given by:
T(P) = T(AL) + L x D (9)
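Equations (8) and (9) can be checked with a small Python sketch for the case that empty B-frames are inserted. The function name and its parameters are illustrative assumptions; the modulo 1024 arithmetic follows the rule stated earlier.

```python
TEMPORAL_REFERENCE_MODULUS = 1024


def anchor_temporal_reference_empty_b(kind, distance, factor,
                                      t_last_anchor=None, gop_header_present=True):
    """New temporal reference of an anchor frame when empty B-frames are inserted.

    kind:     'I' or 'P'
    distance: D, original distance in frames to the next anchor frame
    factor:   L, slow motion factor
    """
    if kind == "I" and gop_header_present:
        return (factor * distance - 1) % TEMPORAL_REFERENCE_MODULUS          # equation (8)
    # P-frame, or I-frame without GOP header (treated as a P-frame).
    return (t_last_anchor + factor * distance) % TEMPORAL_REFERENCE_MODULUS  # equation (9)


# GOP of 12 frames with D = 3 for every anchor, slow motion factor L = 3.
refs = []
t = None
for kind in ["I", "P", "P", "P"]:
    t = anchor_temporal_reference_empty_b(kind, 3, 3, t_last_anchor=t)
    refs.append(t)
print(refs)  # [8, 17, 26, 35]
```

The last value, 35, agrees with the series from 0 to 35 mentioned for Fig. 30, and with L = 1 the function returns the normal play values.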
In the following, it will be explained how the temporal reference is defined in case that empty P-frames (Pe) are inserted.
Since no empty B-frames are inserted, the total number of B-frames following an anchor frame is now L x (D - 1) instead of L x D - 1 (see Fig. 37).
The temporal reference for the I-frames is now given by:
T(I) = L x (D - 1) (10)
A distinction is now made between P-frames and Pe-frames.
The anchor frame previous to the P-frame is normally a Pe-frame except for the case L = 1 where it is an I-frame or P-frame. In any case the previous anchor frame will be denoted by AL and its temporal reference by T(AL), see Fig. 38.
The temporal reference for the P-frames excluding the Pe-frames is now given by:
T(P) = T(AL) + L x (D - 1) + 1 (11)
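Equations (10) and (11) can be sketched in the same way for the case that empty P-frames are inserted. The function name and parameters are assumptions for illustration. A call with D = 1 models an anchor frame that is directly followed by another anchor frame in the transmission stream.

```python
TEMPORAL_REFERENCE_MODULUS = 1024


def anchor_temporal_reference_pe(kind, distance, factor,
                                 t_last_anchor=None, gop_header_present=True):
    """New temporal reference of an anchor frame when empty P-frames are inserted.

    kind:     'I' or 'P'
    distance: D, original distance in frames to the next anchor frame
    factor:   L, slow motion factor
    """
    b_frames = factor * (distance - 1)   # B-frames following this anchor
    if kind == "I" and gop_header_present:
        return b_frames % TEMPORAL_REFERENCE_MODULUS                          # equation (10)
    return (t_last_anchor + b_frames + 1) % TEMPORAL_REFERENCE_MODULUS        # equation (11)


t_i = anchor_temporal_reference_pe("I", 3, 3)
print(t_i)                                                        # 6
print(anchor_temporal_reference_pe("P", 1, 3, t_last_anchor=6))   # 7  (D = 1)
print(anchor_temporal_reference_pe("P", 3, 3, t_last_anchor=8))   # 15
```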
After the reordering, a Pe-frame will immediately follow a previous I-frame, P-frame, or Pe-frame, that is, a previous anchor frame. As a result, the temporal reference of the Pe-frame is always one higher than that of the previous anchor frame AL (see Fig. 39). The temporal reference for the Pe-frame can also be calculated with the formula for the P-frame by taking D = 1. This results from the fact that a Pe-frame in the transmission stream is always followed by another anchor frame. It should also be noted that L = 1 corresponds to normal play and results in a normal temporal reference in all cases.
In the following, switching from normal play to slow-forward and vice versa will be explained. In this context, the effect on the temporal reference (and corresponding PTS) of the frames will be considered.
Next, switching from normal play to slow-forward will be explained.
This situation is depicted in Fig. 40 for the post-insertion of Bf-frames.
Fig. 40 shows a transition from a normal play mode 4000 to a slow-forward mode 4001. A switching area 4002 is indicated by an arrow. Furthermore, a transition area is indicated by an arrow 4003. From a point in time 4004 onwards, the situation is identical to a continuous slow-forward situation.
The top line of Fig. 40 indicates the stream of the decoder/renderer with a switching point after frame B3. It also shows the insertion of Bf-frames and the repetition of the B-frames after the switching point for a slow motion factor of three. The original temporal references of the stored normal play stream are shown above this line and the new temporal references needed for a continuous display below this line. The second line shows the frames after reordering and the bottom line the displayed pictures. The reordered new temporal references are again shown below these lines.
The switching area 4002 indicates the area in which a switching command is received from the user. The fastest response would be to switch to slow-forward after the current frame. Assuming that the switching command is received during frame B2 as is indicated in Fig. 41, then switching could be done at the start of frame B3.
Fig. 41 indicates the time of a switching command 4100, the original temporal reference 4101, the transmission order 4102 and the new temporal reference 4103. When the switching is done at the start of frame B3, this means that frame B3 will be repeated. As a result of this, the number of B-frames following frame I4 is changed. This change is even larger if the switching command is received during frame I4. The temporal reference of an anchor frame depends on the number of B-frames following it in the transmission stream as was explained above. As a consequence, the temporal reference of frame I4 should be changed for a correct display. In the example of Fig. 41 it is indicated that it should change from 2 to 4.
As the temporal reference is transmitted at the start of a frame, it is no longer available for correction at the switching moment unless the transmission stream has been buffered from the start of the last anchor frame onwards. This not only demands additional buffering but also delays the display start of a normal play stream. This effect can be avoided or suppressed if the switching to slow-forward is delayed to the start of the next anchor frame, in this case the start of frame P7. This is the situation depicted in Fig. 40.
At the start of frame P7, the slow-forward processing is started, resulting in an insertion of Bf-frames after the anchor frame and a repetition of the B-frames. Moreover, new temporal references are calculated according to the method formulated above. A transition area is indicated in Fig. 40 up to the start of the next I-frame. This does not mean that something special has to be done during this area. It merely indicates that the resulting temporal references of the frames in this area are different from the temporal references in the continuous slow-forward situation.
Fig. 42 shows the situation where the switching command is received during a P-frame or the B-frame following it.
The top line of Fig. 42 again shows an original temporal reference 4101, a transmission order 4102, and a new temporal reference 4103. Furthermore, the middle line of Fig. 42 shows a sequence after reordering 4200, and a new temporal reference 4201. Furthermore, the bottom line of Fig. 42 shows the displayed pictures 4202 and the new temporal reference 4203.
Also in this case the switching could occur at the start of the next anchor frame. So in general, the switching to slow-forward occurs at the start of the next anchor frame following the reception of the switching command.
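The general rule for switching to slow-forward can be written as a small Python sketch. The function name and the list-of-kinds representation of the transmission stream are assumptions made for illustration.

```python
def slow_forward_switch_index(frame_kinds, command_index):
    """Index of the frame at whose start slow-forward processing begins.

    frame_kinds:   frame types in transmission order ('I', 'P' or 'B').
    command_index: index of the frame during which the command is received.
    Returns the index of the next anchor frame, or None if none is listed.
    """
    for i in range(command_index + 1, len(frame_kinds)):
        if frame_kinds[i] in ("I", "P"):
            return i
    return None


# Transmission order I4 B2 B3 P7 B5 B6 followed by the next I-frame.
kinds = ["I", "B", "B", "P", "B", "B", "I"]
print(slow_forward_switch_index(kinds, 1))  # 3: command during B2, switch at P7
print(slow_forward_switch_index(kinds, 0))  # 3: command during I4, switch at P7
print(slow_forward_switch_index(kinds, 3))  # 6: command during P7, switch at the next I-frame
```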
Fig. 43 shows the situation when pre-insertion of Pe-frames is used to construct the slow-forward stream. Also here the switching to slow-forward occurs at the start of the first anchor frame following the reception of the switching command. In this case the slow-forward stream starts with the insertion of the Pe-frames.
Fig. 44 shows the situation when pre-insertion of Bb-frames is used to construct the slow-forward stream. Also in this case the switching to slow-forward is performed at the start of the next anchor frame following the reception of the switching command. But, in contrast to the situation with Pe-frames, no empty frames are inserted before this first anchor frame of the slow-forward stream. This is because the pre-insertion of the Bb-frames would require a change of the temporal reference of the previous anchor frame due to the additional Bb-frames following this previous anchor frame. So pre-insertion of Bb-frames effectively delays the switching in the display stream by one frame with respect to post-insertion of Bf-frames or pre-insertion of Pe-frames.
The switching rules are derived for a normal play stream containing B-frames but can also be applied to a normal play stream containing no such frames. In this case the switch occurs at the next frame boundary (minimum delay). For post-insertion of Bf-frames and pre-insertion of Pe-frames this means that the next picture to be displayed after the decoding delay is immediately part of the displayed slow-forward picture sequence. For the pre-insertion of Bb-frames, the transition to slow-forward on the display screen is delayed by one additional frame time.
Next, switching from slow-forward to normal play will be described. Also here the problem is the temporal reference of the anchor frames. The situation for post-insertion of Bf-frames is depicted in Fig. 45 showing, among others, a slow-forward temporal reference 4500.
The temporal reference of frame I4 in the slow-forward stream is based on the number of B-frames that will follow it in this mode. Without additional buffering, this number cannot be changed and switching to normal play can only occur at the start of the next anchor frame, which is P7 in this example.
After the switching point, a transition area is indicated up to the start of the next I-frame in the transmission stream. It can be seen that in this transition area the needed temporal references (indicated below the frames in transmission order) are not identical to the original temporal references of the stored normal play stream. So the generation of temporal references as described above has to be continued until the start of the next I-frame. The switching to normal play is therefore performed in two steps. The switching to the normal play data will occur at the start of the next anchor frame following the reception of the switching command, but the generation and correction of the temporal references is continued. The complete switch to normal play occurs at the start of the next I-frame.
Fig. 46 shows the situation where the switching command is received during a P-frame or the B-frames following it. Also in this case the switching method as described above may be used.
Fig. 47 shows the situation when switching from slow-forward to normal play in the case that pre-insertion of Pe-frames is used to construct the slow-forward stream. It can be seen that, as described before, the switching has to be performed in two steps also in this case: the switch to the normal play data, with continued generation of the temporal references, occurs at the start of the first anchor frame following the switching command, and the full switch occurs at the start of the next I-frame. When the switching command is received during a Pe-frame, the same two-step switching method may be applied. The first switching step then will occur at the start of the next frame because this will always be an anchor frame.
The switching from slow-forward to normal play when pre-insertion of Bb-frames is used to construct the slow-forward stream is fully identical to the case of Bf-frame post-insertion. It makes no difference whether the empty B-frames come first or last in the total series of B-frames following an anchor frame.
It will be clear that the first anchor frame following the switching command could also be an I-frame. The transition area is absent in that case. The switching from slow-forward to normal play is described in relation to a normal play stream containing B-frames. Even if the normal play stream contains no B-frames, a slow-forward stream with B-frames results from the insertion of empty B-frames. In any case the same switching rules are also applied if the original normal play stream contains no B-frames.
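The two-step switch from slow-forward to normal play can be sketched as follows for the case of inserted empty B-frames. The function name and the list-of-kinds model are assumptions; empty B-frames are simply listed as 'B'.

```python
def normal_play_switch_points(frame_kinds, command_index):
    """Return (data_switch, full_switch) indices for switching to normal play.

    data_switch: start of the first anchor frame after the command; from here
                 normal play data is sent while temporal references are still
                 generated.
    full_switch: start of the next I-frame at or after data_switch; from here
                 the original temporal references are used again.
    Either value is None if the corresponding frame is not in the list.
    """
    data_switch = None
    for i in range(command_index + 1, len(frame_kinds)):
        kind = frame_kinds[i]
        if data_switch is None and kind in ("I", "P"):
            data_switch = i
        if data_switch is not None and kind == "I":
            return data_switch, i
    return data_switch, None


kinds = ["I", "B", "B", "P", "B", "B", "P", "B", "B", "I", "B"]
print(normal_play_switch_points(kinds, 1))  # (3, 9): transition area from frame 3 up to frame 9
print(normal_play_switch_points(kinds, 7))  # (9, 9): next anchor is an I-frame, no transition area
```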
The described switching method leads to a large switching delay from slow-forward to normal play, especially for a large slow motion factor. This cannot be avoided if a correct temporal reference is wanted unless a substantial amount of buffering is used. On the other hand, some discontinuities in other items will always occur. As for fast-forward and fast-reverse, a discontinuity in the PCR time base can be avoided when switching from normal play to slow-forward but never when switching the other way round, unless a completely new PCR time base is also generated for the normal play stream. Technically speaking this could of course be done, but practically it is expected that at such a moment maximum use is made of the discontinuity flags of the MPEG stream. This could also help to reduce the effects of incorrect temporal references, thus allowing a much faster switching from slow-forward to normal play.
Next, gluing of the individual frames will be described. Particularly, the gluing of frames in the case of incomplete picture start codes will be discussed. In order to determine the required gluing activities at the concatenation point in the slow-forward stream, it should first be clear where the original stream is explicitly split into individual frames. In the following, the practical situation of one PES packet per GOP or per frame will be considered. In the case of one PES packet per frame, the original stream may be split between the packet with the PLUSI and the preceding packet, as indicated in Fig. 48.
In Fig. 48, the splitting of the stream for one PES packet per frame is illustrated. The data streams shown in Fig. 48 include plaintext packet headers 4800, Adaptation Fields 4801, plaintext data 4802, encrypted data 4803 and plaintext PES header 4804. Furthermore, a PLUSI present is denoted with reference numeral 4805, and a PES header is denoted with reference numeral 4806.
The individual frames comprise a number of complete original packets. So no packet splitting is necessary. This frame splitting could also be performed in a completely encrypted stream, but access to some plaintext data is still necessary for the construction of the slow-forward stream. The splitting at the start of a packet with a PLUSI also means that there are no picture start codes that are spread over two packets. Each individual frame contains its own correct and complete picture start code. Therefore, no gluing activity is necessary in this case. However, in the case of one PES packet per GOP, the situation is different.
The split between frames is made at the picture start code of a new frame, unless a PES header precedes it.
The following algorithm may be used to determine the splitting point:
1. The original stream is simultaneously searched for a packet with a PLUSI bit set, a picture start code and a picture coding extension;
2. If the packet with the PLUSI bit set is encountered first, the split is made at the start of this packet (see Fig. 49, including a picture start code 4900 and a picture coding extension 4901). Subsequently, the stream is searched for the picture coding extension. After this is found, the search is continued as described in point 1.;
3. If the picture start code is encountered first, the split is made at the start of the picture start code. In many cases this means that the packet containing the picture start code has to be split in two packets of which the first is assigned to the previous frame and the second to the subsequent frame (see Fig. 50 illustrating splitting of a stream at the start of a picture start code 4900, wherein places of insertion of an Adaptation Field are denoted with reference numeral 5000). Both packets are stuffed with an Adaptation Field 5000. The payload of the second packet then starts with the picture start code 4900. The recording time stamp of the original packet is copied to each of the two packets resulting from the split. Whether the two packets from the split or the original packet will be used at a concatenation point of two frames depends on the specific situation as will be explained below. Subsequently, the stream is searched for the picture coding extension 4901. After having found this, the search is continued as described in point 1.;
4. If the picture coding extension is encountered first, the picture start code must be undetectable because it is partially encrypted. This means that the current plaintext area starts with some bytes of the picture start code. In this case the split is made at the start of the first plaintext packet of the current plaintext area (see Fig. 51 showing the splitting of the stream within a picture start code 4900, and illustrating bytes of picture start code 5100 as well as picture coding extension 4901). The search which is described in point 1. is continued after having found the picture coding extension 4901.
The described algorithm would also result in the correct splitting points for a stream with one PES packet per frame. Moreover, the algorithm is designed for application to plaintext streams as well as the hybrid streams mentioned above.
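The decision logic of points 1. to 4. can be sketched in Python on an abstract event list. This is a simplified model under stated assumptions: the stream parser that produces the events, the event names ("PLUSI", "PSC", "PCE") and the tuple layout are hypothetical and only serve to show which item is encountered first.

```python
def find_split_points(events):
    """Determine splitting points from detected stream items.

    events: list of (kind, position, plaintext_area_start) in stream order, with
            kind 'PLUSI' (packet with PLUSI bit set; position = packet start),
                 'PSC'   (picture start code; position = start of the code),
                 'PCE'   (picture coding extension).
    Returns a list of (split_type, position).
    """
    splits = []
    waiting_for_pce = False
    for kind, position, area_start in events:
        if waiting_for_pce:
            # After a split, only the picture coding extension is searched for.
            if kind == "PCE":
                waiting_for_pce = False
            continue
        if kind == "PLUSI":
            splits.append(("packet_start", position))             # point 2.
            waiting_for_pce = True
        elif kind == "PSC":
            splits.append(("start_code", position))               # point 3.
            waiting_for_pce = True
        elif kind == "PCE":
            # Point 4.: start code partially encrypted, non-ideal splitting point.
            splits.append(("plaintext_area_start", area_start))
    return splits


events = [
    ("PLUSI", 0, 0), ("PSC", 20, 0), ("PCE", 30, 0),   # frame starting a PES packet
    ("PSC", 500, 400), ("PCE", 510, 400),              # start code fully in plaintext
    ("PCE", 905, 900),                                 # start code partially encrypted
]
print(find_split_points(events))
```

Skipping everything up to the picture coding extension after a split keeps the picture start code that follows a PLUSI packet from producing a second split for the same frame.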
Gluing is only necessary in the case of incomplete picture start codes that can only result from point 4. of the given algorithm. So only point 4. leads to a non-ideal splitting point. A plaintext stream contains only ideal splitting points because the picture start code is always found. So no gluing is necessary in this case. But hybrid streams will contain non-ideal splitting points. A method described below may be used to determine how many bytes of the picture start code are on either side of the non-ideal splitting points. The effects of a non-ideal splitting point will be explained in detail hereinafter.
Next, the situation will be considered that empty P-frames of any type are inserted at such a non-ideal splitting point. How to handle the first empty frame will be explained below. A number of bytes equal to the part of the picture start code after the splitting point is removed from the picture start code of the first empty frame. The intermediate empty frames are unchanged. The last empty frame has to be corrected for the missing part of the picture start code of the subsequent frame. So this missing part may be added to the end of the last empty frame. No changes are necessary to empty frames that are inserted at ideal splitting points.
In the following, the repetition of the B-frames will be considered. In case the B-frame has ideal splitting points on both sides, no gluing action is necessary for the repetition. But if a non-ideal splitting point is present on either side of the frame, gluing actions may be necessary or advantageous. The original frame and its repetition form a series of identical B-frames. No gluing action is necessary at the start or end of the series because here the frame is either connected to the same frame as in the normal play stream or to an empty frame. In the first case there is no discontinuity because normal order of the data is restored at this point. The solution for the second case has been given above. So only the intermediate concatenation points have to be considered where the end of a B-frame is connected to the start of the same B-frame. The example described here refers to the example given above referring to Fig. 23 and is repeated in more detail in Fig. 52 for clarity.
Fig. 52 illustrates an incomplete picture start code at the concatenation point. For a correct gluing it is necessary to know the number of bytes of the picture start code (within MPEG2 the start code may be 4 bytes in length) at the end and at the start of the B-frame. Denoting the number of bytes at the end by n and at the start by m, for an ideal splitting point n=0 and m=4. In the case of a non-ideal splitting point, the number n for one frame and the number m for the subsequent frame may be determined with a method which will be illustrated below.
It is evident that n can never be equal to 4, because then the split would have been made at the start of the picture start code, resulting in n=0. On the other hand, m can never be 0, because in that case the picture start code would be completely in a previous frame and the split would have been made in the ideal position, thus leading to m=4. So 0 ≤ n ≤ 3 and 1 ≤ m ≤ 4 is the usual situation.
In order to get the numbers n and m for one and the same frame N, these numbers have to be extracted from the information of the two splitting points surrounding the frame. So n and m now represent the number of bytes of the picture start code at the end and start of a B-frame that has to be repeated. As a consequence, they also represent a number of bytes of the picture start code before and after an intermediate concatenation point.
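The extraction of n and m for one frame from its two surrounding splitting points can be sketched as follows. This is only an illustrative sketch, not the method of the specification: the function names and the input convention (each splitting point is described by the number of picture start code bytes that lie before the split) are assumptions, as is the inference that the bytes on both sides of a single splitting point add up to the 4-byte start code.

```python
PSC_LEN = 4  # length of an MPEG-2 picture start code in bytes


def split_counts(bytes_before_split: int) -> tuple[int, int]:
    """Return (n, m) at a single splitting point.

    bytes_before_split is a hypothetical input: the number of picture
    start code bytes located before the split (0 for an ideal split).
    Assumption: the two parts together form the complete 4-byte code.
    """
    if not 0 <= bytes_before_split <= PSC_LEN - 1:
        raise ValueError("a split leaves 0..3 start code bytes before it")
    n = bytes_before_split      # bytes at the end of the preceding frame
    m = PSC_LEN - n             # bytes at the start of the following frame
    return n, m


def frame_counts(split_at_start: int, split_at_end: int) -> tuple[int, int]:
    """Return (n, m) for the frame enclosed by two splitting points.

    m comes from the splitting point in front of the frame,
    n comes from the splitting point behind the frame.
    """
    _, m = split_counts(split_at_start)
    n, _ = split_counts(split_at_end)
    return n, m
```

Under these assumptions, `frame_counts(3, 3)` yields n=3 and m=1 (the situation of Fig. 53), `frame_counts(1, 2)` yields n=2 and m=3, and `frame_counts(3, 2)` yields n=2 and m=1.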
Next, it will be assumed that n+m=4. This is the case when both splitting points surrounding the B-frame are ideal. But it is already known that no gluing action is needed in that case. However, this can also be the case when both splitting points are non-ideal. This is the situation depicted in Fig. 53.
Fig. 53 therefore illustrates the example of n+m=4.
The last packet of frame N is denoted with reference numeral 5300, and Fig. 53 further shows the first packet of frame N denoted with reference numeral 5301. No gluing action is necessary at a border 5302. The bytes of the picture start code (n=3) are denoted with reference numeral 5303, and the byte of the picture start code (m=1) is denoted with reference numeral 5304.
The fact that n+m=4 means that the correct number of picture start code bytes is present at the concatenation point and that no gluing action is necessary.
However, Fig. 54 shows the situation with n+m>4. This means that there are 1, 2 or 3 bytes too many at the concatenation point.
In this case a number of bytes equal to n+m-4 is removed from the start of the second frame. This is accomplished by replacing these plaintext bytes by an Adaptation Field (AF) containing stuffing bytes. If an Adaptation Field is already present, its length has to be increased by n+m-4 and the data to be discarded is replaced by stuffing bytes that, according to the standard, have a hexadecimal value FF.
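The replacement of surplus bytes by Adaptation Field stuffing can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `drop_leading_payload_bytes` is hypothetical, the packet is a plaintext 188-byte transport stream packet, and the bytes to be discarded are the very first payload bytes of that packet. The byte layout (sync byte 0x47, adaptation field control bits in the fourth header byte, a length byte followed by a flags byte and 0xFF stuffing) follows the MPEG-2 transport stream packet format.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def drop_leading_payload_bytes(packet: bytes, k: int) -> bytes:
    """Discard the first k payload bytes (k = n+m-4) by growing the AF."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a 188-byte transport stream packet")
    if not 1 <= k <= 3:
        raise ValueError("n+m-4 is expected to be 1, 2 or 3")
    afc = (packet[3] >> 4) & 0x3          # adaptation_field_control
    if not afc & 0x1:
        raise ValueError("packet carries no payload")

    out = bytearray(packet)
    if afc == 0x1:
        # No AF present: the k discarded bytes become a new k-byte AF
        # (one length byte, then a flags byte, then stuffing).
        new_af = bytes([k - 1])
        if k > 1:
            new_af += bytes([0x00]) + b"\xff" * (k - 2)
        out[4:4 + k] = new_af
    else:
        # AF present: extend it over the first k payload bytes.
        old_len = packet[4]
        start = 5 + old_len               # first payload byte
        if start + k > TS_PACKET_SIZE:
            raise ValueError("payload shorter than k bytes")
        out[4] = old_len + k
        if old_len == 0:
            out[start] = 0x00             # flags byte is now required
            out[start + 1:start + k] = b"\xff" * (k - 1)
        else:
            out[start:start + k] = b"\xff" * k
    out[3] |= 0x20                        # mark the AF as present
    return bytes(out)
```

The packet length and the continuity counter are left unchanged; only the boundary between Adaptation Field and payload moves by k bytes.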
In the special case of n+m>4 and n<3 it is also possible to do no gluing. Effectively, one gets elementary stream stuffing. A point at which a gluing action is necessary is denoted with reference numeral 5400. In the example, the bytes of picture start code (n=2) are denoted with reference numeral 5401. Bytes of picture start code (m=3) are denoted with reference numeral 5402. Furthermore, bytes of picture start code (n=2) are denoted with reference numeral 5403 and bytes of picture start code (m=2) are denoted with reference numeral 5404. A position of replaced bytes using Adaptation Fields (n+m-4) is denoted with reference numeral 5405.
Referring to Fig. 55, it is assumed that n+m<4. This means that 1, 2 or 3 bytes are missing from the picture start code at the concatenation point. In this case it should be known which byte or bytes are missing. Because n and m are both known, the missing bytes can be uniquely identified. The missing bytes are now placed in a new packet that is further stuffed with an Adaptation Field. This gluing packet is then placed between the two frames. This gluing packet is denoted with reference numeral 5500. Reference numeral 5501 denotes bytes of picture start code (n=2), and reference numeral 5502 denotes bytes of picture start code (m=1). Reference numeral 5504 denotes inserted bytes (4-n-m). Reference numeral 5505 illustrates bytes of picture start code (m=1).
A list of abbreviations used in the specification is provided in Table 1.
AFLD Adaptation Field Control
BAT Bouquet Association Table
CA Conditional Access
CAT Conditional Access Table
CC Continuity Counter
CW Control Word
CPI Characteristic Point Information
DIT Discontinuity Information Table
DTS Decoding Time Stamp
DVB Digital Video Broadcast
ECM Entitlement Control Messages
EMM Entitlement Management Messages
GK Group Key
GKM Group Key Message
GOP Group Of Pictures
HDD Hard Disk Drive
KMM Key Management Message
MPEG Moving Picture Experts Group
NIT Network Information Table
PAT Program Association Table
PCR Program Clock Reference
PES Packetized Elementary Stream
PID Packet Identifier
PLUSI Payload Unit Start Indicator
PMT Program Map Table
PTS Presentation Time Stamp
SIT Selection Information Table
SCB Scrambling Control Bits
STB Set-top-box
SYNC Synchronization Unit
TEI Transport Error Indicator
TPI Transport Priority Unit
TS Transport Stream
UK User Key
Table 1 Abbreviations of terms related to trick-play
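For the n+m<4 case of Fig. 55 described above, identifying the missing bytes and building the gluing packet can be sketched as follows. This is a hedged illustration only: the function names, the choice of PID and the handling of the continuity counter are assumptions and are left to the caller. The sketch relies on the MPEG-2 picture start code being the byte sequence 00 00 01 00 and on the standard 188-byte transport stream packet layout.

```python
PICTURE_START_CODE = bytes([0x00, 0x00, 0x01, 0x00])
TS_PACKET_SIZE = 188


def missing_psc_bytes(n: int, m: int) -> bytes:
    """Return the start code bytes absent at the concatenation point.

    n bytes are present at the end of the frame, m bytes at its start.
    An empty result means that nothing has to be inserted.
    """
    if not (0 <= n <= 3 and 1 <= m <= 4):
        raise ValueError("expected 0 <= n <= 3 and 1 <= m <= 4")
    if n + m >= 4:
        return b""
    # The first n and the last m bytes exist; the middle part is missing.
    return PICTURE_START_CODE[n:4 - m]


def build_gluing_packet(missing: bytes, pid: int, continuity_counter: int) -> bytes:
    """Build one TS packet whose payload is only the missing bytes.

    The rest of the packet is filled with an Adaptation Field that
    contains a flags byte and 0xFF stuffing bytes.
    """
    if not 1 <= len(missing) <= 3:
        raise ValueError("1, 2 or 3 bytes can be missing")
    if not 0 <= pid <= 0x1FFF:
        raise ValueError("PID is a 13-bit value")
    header = bytes([
        0x47,                               # sync byte
        (pid >> 8) & 0x1F,                  # no error, no unit start
        pid & 0xFF,
        0x30 | (continuity_counter & 0x0F), # AF followed by payload
    ])
    af_length = TS_PACKET_SIZE - 4 - 1 - len(missing)
    adaptation_field = bytes([af_length, 0x00]) + b"\xff" * (af_length - 1)
    return header + adaptation_field + missing
```

With n=2 and m=1 as in Fig. 55, `missing_psc_bytes(2, 1)` returns the single byte 01, which then forms the whole payload of the gluing packet.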
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be capable of designing many alternative embodiments without departing from the scope of the invention as defined by the appended claims. Furthermore, any of the embodiments described comprise implicit features, such as an internal current supply, for example, a battery or an accumulator. In the claims, any reference signs placed in parentheses shall not be construed as limiting the claims. The words "comprising" and "comprises", and the like, do not exclude the presence of elements or steps other than those listed in any claim or the specification as a whole. The singular reference of an element does not exclude the plural reference of such elements and vice versa. In a device claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage. The terms "data" and "content" have been used interchangeably throughout the text and are to be understood as equivalents.

Claims

1. A device (1800) for processing a data stream comprising a plurality of frames, wherein the device (1800) comprises a detection unit (1802) for detecting switching from a first reproduction mode to a second reproduction mode; a delay unit (1804) for delaying the switch to the second reproduction mode by a delay time which corresponds to the time difference between the switching and a start of the next anchor frame of the plurality of frames; and a correction unit (1805) for correcting a temporal reference of the plurality of frames.
2. The device (1800) according to claim 1, wherein one of the first reproduction mode and the second reproduction mode is a normal play mode.
3. The device (1800) according to claim 1, wherein one of the first reproduction mode and the second reproduction mode is a trick-play mode.
4. The device (1800) according to claim 3, wherein the trick-play mode is one of the group consisting of a slow-forward reproduction mode, a slow-reverse reproduction mode, a stand still reproduction mode, a step reproduction mode, and an instant replay reproduction mode.
5. The device (1800) according to claim 1, wherein the plurality of frames includes at least one frame of the group consisting of an intra-coded frame, a forward predictive frame and a bi-directional predictive frame.
6. The device (1800) according to claim 1, wherein the anchor frame is an intra-coded frame or a forward predictive frame.
7. The device (1800) according to claim 1, wherein the correction unit (1805) is adapted for correcting a sequence of the plurality of frames by means of a forward predictive bi-directional predictive frame, particularly an empty forward predictive bi-directional predictive frame.
8. The device (1800) according to claim 1, wherein the correction unit (1805) is adapted for correcting a sequence of the plurality of frames by means of a backward predictive bi-directional predictive frame, particularly an empty backward predictive bi-directional predictive frame.
9. The device (1800) according to claim 1, wherein the correction unit (1805) is adapted for correcting the temporal reference of the plurality of frames during a transition phase after the delay.
10. The device (1800) according to claim 1, wherein the correction unit (1805) is adapted for correcting the temporal reference of the plurality of frames in such a manner that an order of the plurality of frames is corrected.
11. The device (1800) according to claim 1, comprising an insertion unit (1805) adapted for inserting empty frames after having switched from the first reproduction mode to the second reproduction mode.
12. The device (1800) according to claim 11, wherein the insertion unit (1805) is adapted for inserting forward predictive frames and/or bi-directional predictive frames as the empty frames.
13. The device (1800) according to claim 1, comprising a repetition unit (1809) adapted for repeating frames a plurality of times in accordance with a predetermined repetition rate after having switched from the first reproduction mode to the second reproduction mode.
14. The device (1800) according to claim 13, wherein the repetition unit (1809) is adapted for repeating bi-directional predictive frames.
15. The device (1800) according to claim 1, comprising a storing unit (1801) for storing the data stream.
16. The device (1800) according to claim 1, adapted to process a data stream of video data or audio data.
17. The device (1800) according to claim 1, adapted to process a data stream of digital data.
18. The device (1800) according to claim 1, comprising a reproduction unit (1806) for reproducing the processed data stream.
19. The device (1800) according to claim 1, adapted to process an MPEG2 encrypted data stream or an MPEG4 encrypted data stream.
20. The device (1800) according to claim 1, realized as at least one of the group consisting of a digital video recording device; a network-enabled device; a conditional access system; a portable audio player; a portable video player; a mobile phone; a DVD player; a CD player; a hard disk based media player; an Internet radio device; a computer; a television; a public entertainment device; and an MP3 player.
21. A method of processing a data stream comprising a plurality of frames, the method comprising detecting switching from a first reproduction mode to a second reproduction mode; delaying the switch to the second reproduction mode by a delay time which corresponds to the time difference between the switching and a start of the next anchor frame of the plurality of frames; and correcting a temporal reference of the plurality of frames.
22. A computer-readable medium, in which a computer program of processing a data stream comprising a plurality of frames is stored, which computer program, when being executed by a processor, is adapted to control or carry out the following method: detecting switching from a first reproduction mode to a second reproduction mode; delaying the switch to the second reproduction mode by a delay time which corresponds to the time difference between the switching and a start of the next anchor frame of the plurality of frames; and correcting a temporal reference of the plurality of frames.
23. A program element of processing a data stream comprising a plurality of frames, which program element, when being executed by a processor, is adapted to control or carry out the method: detecting switching from a first reproduction mode to a second reproduction mode; delaying the switch to the second reproduction mode by a delay time which corresponds to the time difference between the switching and a start of the next anchor frame of the plurality of frames; and correcting a temporal reference of the plurality of frames.
PCT/IB2006/054417 2005-12-23 2006-11-24 A device for and a method of processing a data stream comprising a plurality of frames WO2007072244A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP05112862.7 2005-12-23
EP05112862 2005-12-23

Publications (1)

Publication Number Publication Date
WO2007072244A1 (en) 2007-06-28

Family

ID=37905850

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2006/054417 WO2007072244A1 (en) 2005-12-23 2006-11-24 A device for and a method of processing a data stream comprising a plurality of frames

Country Status (1)

Country Link
WO (1) WO2007072244A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6192186B1 (en) * 1997-11-06 2001-02-20 Sanyo Electric Co. Ltd Method and apparatus for providing/reproducing MPEG data
WO2002087232A1 (en) * 2001-04-24 2002-10-31 Koninklijke Philips Electronics N.V. Method and device for generating a video signal
US20030053540A1 (en) * 2001-09-11 2003-03-20 Jie Wang Generation of MPEG slow motion playout
WO2003036971A1 (en) * 2001-10-23 2003-05-01 Thomson Licensing S.A. Trick modes using non-progressive dummy bidirectional predictive pictures

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
VAN GASSEL J P ET AL: "MPEG-2 compliant trick play over a digital interface", 2002 DIGEST OF TECHNICAL PAPERS. INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS (IEEE CAT. NO.02CH37300) IEEE PISCATAWAY, NJ, USA, 2002, pages 170 - 171, XP002429170, ISBN: 0-7803-7300-6 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2507707C2 (en) * 2009-02-18 2014-02-20 Тенсент Текнолоджи (Шэньчжэнь) Компани Лимитед Method and apparatus for controlling video and audio data reproduction
CN106937141A (en) * 2017-03-24 2017-07-07 北京奇艺世纪科技有限公司 A kind of bitstreams switching method and device
EP3547684A1 (en) * 2018-03-28 2019-10-02 Axis AB Method, device and system for method of encoding a sequence of frames in a video stream
KR20190113546A (en) * 2018-03-28 2019-10-08 엑시스 에이비 Method, device and system for method of encoding a sequence of frames in a video stream
KR102113948B1 (en) 2018-03-28 2020-05-21 엑시스 에이비 Method, device and system for method of encoding a sequence of frames in a video stream
US10856002B2 (en) 2018-03-28 2020-12-01 Axis Ab Method, device and system for method of encoding a sequence of frames in a video stream
TWI763983B (en) * 2018-03-28 2022-05-11 瑞典商安訊士有限公司 Method, device and system of encoding a sequence of frames in a video stream
CN111949616A (en) * 2020-09-08 2020-11-17 天津云遥宇航科技有限公司 Ground real-time inversion demonstration system for GNSS occultation data
CN111949616B (en) * 2020-09-08 2023-05-26 天津云遥宇航科技有限公司 GNSS occultation data ground real-time inversion demonstration system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 06821540

Country of ref document: EP

Kind code of ref document: A1