WO2003041413A1 - Error control to video encoder - Google Patents


Info

Publication number
WO2003041413A1
Authority
WO
WIPO (PCT)
Prior art keywords
transmitter
receiver
wireless channel
cvds
session
Prior art date
Application number
PCT/GB2001/004924
Other languages
French (fr)
Inventor
Simon Durrant
Pat Mulroy
Dekun Yang
Original Assignee
Pa Consulting Services Limited
Priority date
Filing date
Publication date
Application filed by Pa Consulting Services Limited
Priority to PCT/GB2001/004924
Publication of WO2003041413A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/2343 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/234354 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by altering signal-to-noise ratio parameters, e.g. requantization
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60 Network streaming of media packets
    • H04L65/70 Media network packetisation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/80 Responding to QoS
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/2343 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/234327 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by decomposing into layers, e.g. base layer and one or more enhancement layers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/2343 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/234381 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by altering the temporal resolution, e.g. decreasing the frame rate by frame skipping
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/238 Interfacing the downstream path of the transmission network, e.g. adapting the transmission rate of a video stream to network bandwidth; Processing of multiplex streams
    • H04N21/2381 Adapting the multiplex stream to a specific network, e.g. an Internet Protocol [IP] network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/25 Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/266 Channel or content management, e.g. generation and management of keys and entitlement messages in a conditional access system, merging a VOD unicast channel into a multicast channel
    • H04N21/2662 Controlling the complexity of the video stream, e.g. by scaling the resolution or bitrate of the video stream based on the client capabilities
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41 Structure of client; Structure of client peripherals
    • H04N21/414 Specialised client platforms, e.g. receiver in car or embedded in a mobile appliance
    • H04N21/41407 Specialised client platforms, e.g. receiver in car or embedded in a mobile appliance embedded in a portable device, e.g. video client on a mobile phone, PDA, laptop
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/60 Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
    • H04N21/61 Network physical structure; Signal processing
    • H04N21/6106 Network physical structure; Signal processing specially adapted to the downstream path of the transmission network
    • H04N21/6131 Network physical structure; Signal processing specially adapted to the downstream path of the transmission network involving transmission via a mobile phone network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/60 Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
    • H04N21/63 Control signaling related to video distribution between client, server and network components; Network processes for video distribution between server and clients or between remote clients, e.g. transmitting basic layer and enhancement layers over different transmission paths, setting up a peer-to-peer communication via Internet between remote STB's; Communication protocols; Addressing
    • H04N21/631 Multimode Transmission, e.g. transmitting basic layers and enhancement layers of the content over different transmission paths or transmitting with different error corrections, different keys or with different transmission protocols
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/60 Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
    • H04N21/63 Control signaling related to video distribution between client, server and network components; Network processes for video distribution between server and clients or between remote clients, e.g. transmitting basic layer and enhancement layers over different transmission paths, setting up a peer-to-peer communication via Internet between remote STB's; Communication protocols; Addressing
    • H04N21/632 Control signaling related to video distribution between client, server and network components; Network processes for video distribution between server and clients or between remote clients, e.g. transmitting basic layer and enhancement layers over different transmission paths, setting up a peer-to-peer communication via Internet between remote STB's; Communication protocols; Addressing using a connection between clients on a wide area network, e.g. setting up a peer-to-peer communication via Internet for retrieving video segments from the hard-disk of other client devices
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/60 Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
    • H04N21/63 Control signaling related to video distribution between client, server and network components; Network processes for video distribution between server and clients or between remote clients, e.g. transmitting basic layer and enhancement layers over different transmission paths, setting up a peer-to-peer communication via Internet between remote STB's; Communication protocols; Addressing
    • H04N21/637 Control signals issued by the client directed to the server or network components
    • H04N21/6377 Control signals issued by the client directed to the server or network components directed to server
    • H04N21/6379 Control signals issued by the client directed to the server or network components directed to encoder, e.g. for requesting a lower encoding rate
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/60 Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
    • H04N21/63 Control signaling related to video distribution between client, server and network components; Network processes for video distribution between server and clients or between remote clients, e.g. transmitting basic layer and enhancement layers over different transmission paths, setting up a peer-to-peer communication via Internet between remote STB's; Communication protocols; Addressing
    • H04N21/643 Communication protocols
    • H04N21/64322 IP
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/60 Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
    • H04N21/63 Control signaling related to video distribution between client, server and network components; Network processes for video distribution between server and clients or between remote clients, e.g. transmitting basic layer and enhancement layers over different transmission paths, setting up a peer-to-peer communication via Internet between remote STB's; Communication protocols; Addressing
    • H04N21/643 Communication protocols
    • H04N21/6437 Real-time Transport Protocol [RTP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/60 Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
    • H04N21/63 Control signaling related to video distribution between client, server and network components; Network processes for video distribution between server and clients or between remote clients, e.g. transmitting basic layer and enhancement layers over different transmission paths, setting up a peer-to-peer communication via Internet between remote STB's; Communication protocols; Addressing
    • H04N21/647 Control signaling between network components and server or clients; Network processes for video distribution between server and clients, e.g. controlling the quality of the video stream, by dropping packets, protecting content from unauthorised alteration within the network, monitoring of network load, bridging between two different networks, e.g. between IP and wireless
    • H04N21/64723 Monitoring of network processes or resources, e.g. monitoring of network load
    • H04N21/64738 Monitoring network characteristics, e.g. bandwidth, congestion level
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/60 Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
    • H04N21/65 Transmission of management data between client and server
    • H04N21/654 Transmission by server directed to the client
    • H04N21/6547 Transmission by server directed to the client comprising parameters, e.g. for client setup
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/60 Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
    • H04N21/65 Transmission of management data between client and server
    • H04N21/658 Transmission by the client directed to the server
    • H04N21/6582 Data stored in the client, e.g. viewing habits, hardware capabilities, credit card number
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/141 Systems for two-way working between two video terminals, e.g. videophone
    • H04N7/148 Interfacing a video terminal to a particular transmission medium, e.g. ISDN

Definitions

  • The present invention relates to video streaming and, more particularly, to the streaming of video data over a wireless communications network.
  • The invention has been developed primarily to allow video to be streamed in a UMTS or GPRS mobile telecommunications network using streamable formats such as MPEG-4 and H.263. However, it will be appreciated by those skilled in the art that the invention is not limited to use with those particular standards.
  • Robust transmission of real-time video data over wireless networks promises to be one of the most important applications of wireless packet data services.
  • Robustness is one of the key issues because a wireless network is a bandwidth-constrained as well as error-prone environment. The bandwidth constraint forces video data to be compressed into lower bit-rate video streams before it can be transmitted over a wireless mobile network. Channel errors in a wireless mobile network range from random bit errors to burst errors.
  • Video compression standards such as MPEG-4 and H.263 use three kinds of coding techniques: motion-compensated prediction to encode temporal redundancy, the Discrete Cosine Transform (DCT) with quantisation to encode spatial redundancy, and Variable-Length Codes (VLC) for entropy encoding.
  • DCT Discrete Cosine Transform
  • VLC Variable-Length Code
  • QoS Quality of Service
  • The present invention provides a method of generating a Compressed Video Data Stream (CVDS) for transmission from a transmitter to a receiver over a wireless channel within a communications network, the method including the steps of: a) determining values of one or more error rate parameters associated with the wireless channel; b) determining values of one or more bit rate parameters associated with the wireless channel; c) calculating values of one or more encoding parameters using the values of the error rate parameters and the bit rate parameters; d) controlling encoding of the video stream on the basis of the encoding parameters, thereby to generate the CVDS.
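  The four steps above can be sketched as follows. This is an illustrative sketch only: the function name, the thresholds, and the particular mappings from channel readings to encoder settings are assumptions made for exposition, not values disclosed in the patent.

```python
def derive_encoding_params(bit_error_rate: float, available_bitrate_bps: int) -> dict:
    """Steps a-c: turn channel error-rate and bit-rate readings into encoding parameters."""
    # Noisier channels get more frequent intra-coded frames (shorter error propagation)
    # and a larger share of intra-coded blocks per inter-coded frame.
    intra_frame_period = 5 if bit_error_rate > 1e-4 else 30
    intra_block_ratio = min(0.5, bit_error_rate * 1e3)
    # Lower available bit rates get a lower frame rate and coarser quantisation.
    frame_rate = 15 if available_bitrate_bps >= 64_000 else 7
    quantiser = 10 if available_bitrate_bps >= 128_000 else 20
    return {
        "intra_frame_period": intra_frame_period,
        "intra_block_ratio": intra_block_ratio,
        "frame_rate": frame_rate,
        "quantiser": quantiser,
    }

# Step d would pass these parameters to the video encoder for each frame:
params = derive_encoding_params(bit_error_rate=1e-3, available_bitrate_bps=64_000)
```

  In this sketch a noisy 64 kbps channel yields frequent intra frames and a high intra-block ratio, trading compression efficiency for error resilience.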
  • CVDS Compressed Video Data Stream
  • A CVDS transmitter in a mobile handset comprises an encoder controller and a prior-art video encoder.
  • The video encoder has several adjustable encoding parameters for altering the degree of error resilience of the generated CVDS.
  • The encoder controller ascertains the error rate parameters associated with the wireless channel between the transmitter and the receiver, and derives the encoding parameters for generating the CVDS with improved error resilience while meeting the bandwidth constraints of the wireless channel.
  • The basic idea underlying the present invention is to make the best use of the error rate parameters associated with the wireless channels to generate a CVDS with improved quality when transmitted over those channels.
  • The encoder controller may ascertain the error rate parameters from several sources. In an embodiment of the invention where the transmitter is a mobile transmitter and the receiver is a network receiver, the encoder controller can ascertain the error rate parameters associated with the wireless channel between the transmitter and the network from the air interface. In another embodiment of the invention where the transmitter is a network transmitter and the receiver is a mobile receiver, the encoder controller can ascertain the error rate parameters associated with the wireless channel between the network and the receiver. In yet another embodiment of the invention where a mobile transmitter transmits the CVDS to a mobile receiver, the encoder controller can ascertain the error rate parameters from both of the sources described above.
  • The CVDS is packetised into IP format packets before being sent to the transmitter air interface.
  • The IP packet is constructed by three successive wrapping operations: (1) wrapping the CVDS in an RTP format packet, (2) wrapping the RTP packet in a UDP format packet, and (3) wrapping the UDP packet in an IP format packet.
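  The three wrapping operations can be illustrated with deliberately simplified headers: a minimal 12-byte RTP header (no CSRC list or extension), a UDP header with the checksum zeroed (which IPv4 permits), and a 20-byte IPv4 header with its checksum omitted. All names, ports and field values here are illustrative, not taken from the patent.

```python
import struct

def wrap_rtp(payload: bytes, seq: int, timestamp: int, ssrc: int, pt: int = 96) -> bytes:
    # Minimal 12-byte RTP header: version 2, no padding, extension or CSRC list.
    return struct.pack("!BBHII", 0x80, pt & 0x7F, seq, timestamp, ssrc) + payload

def wrap_udp(payload: bytes, src_port: int, dst_port: int) -> bytes:
    # 8-byte UDP header; a zero checksum is permitted over IPv4.
    return struct.pack("!HHHH", src_port, dst_port, 8 + len(payload), 0) + payload

def wrap_ip(payload: bytes, src: bytes, dst: bytes) -> bytes:
    # Simplified 20-byte IPv4 header (protocol 17 = UDP); header checksum omitted.
    return struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(payload),
                       0, 0, 64, 17, 0, src, dst) + payload

cvds_chunk = b"\x00" * 100  # stand-in for a CVDS fragment
packet = wrap_ip(wrap_udp(wrap_rtp(cvds_chunk, seq=1, timestamp=3000, ssrc=0x1234),
                          src_port=5004, dst_port=5004),
                 src=bytes(4), dst=bytes(4))
```

  The nesting of the three calls mirrors wrapping operations (1) to (3): the CVDS fragment gains 12 bytes of RTP, 8 bytes of UDP and 20 bytes of IP header.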
  • The present invention can apply to both the unidirectional streaming arrangement and the bidirectional conversational arrangement.
  • The end to end feedback of error rate parameters can be performed via an RTCP session or an RTSP session.
  • RTCP packets are wrapped in UDP format and further in IP format packets.
  • RTSP packets are wrapped in TCP format and further in IP format packets.
  • The end to end feedback of error rate parameters can also be performed via a SIP session.
  • SIP packets are wrapped in UDP format and further in IP format packets.
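  As an illustration of the RTCP feedback path, the sketch below extracts the "fraction lost" field from the first report block of an RTCP receiver report, following the standard RTCP field layout (4-byte header, reporter SSRC, then 24-byte report blocks). The function name and the sample packet are hypothetical.

```python
import struct

def fraction_lost(rtcp_rr: bytes) -> float:
    """Return the fraction of packets lost reported in the first report block
    of an RTCP receiver report (packet type 201)."""
    assert rtcp_rr[1] == 201, "not an RTCP receiver report"
    # 4-byte header + 4-byte reporter SSRC + 4-byte source SSRC,
    # then the 1-byte fraction-lost field (fixed-point, out of 256).
    return rtcp_rr[12] / 256.0

# Build a minimal one-block receiver report for illustration:
# header (V=2, RC=1, PT=201, length=7 words), reporter SSRC, then a 24-byte
# report block whose fraction-lost byte is 64 (i.e. 25% loss).
rr = (struct.pack("!BBH", 0x81, 201, 7)
      + struct.pack("!I", 0xAABBCCDD)          # reporter SSRC
      + struct.pack("!I", 0x11223344)          # source SSRC
      + bytes([64]) + bytes(19))               # fraction lost + rest of block
```

  A transmitter-side encoder controller could feed this loss fraction into its error rate parameters between reports.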
  • The wireless networks are UMTS or GPRS mobile telecommunications networks.
  • Each wireless channel is associated with Quality of Service (QoS) parameters indicative of, among other things, the error rate information.
  • The network provides each mobile handset with a QoS parameter set that is indicative of the available bit rate and error rate conditions of the wireless channel between the mobile handset and the network.
  • The QoS parameter set is read at the mobile handset by the encoder controller or the decoder controller.
  • The error rate parameters are derived from the QoS parameters associated with the wireless channels.
  • The video data is divided into a sequence of frames, with each frame being further divided into a number of Groups of Blocks (GOBs) or a number of slices, and with each GOB or slice being further divided into a number of blocks.
  • The encoding parameters include the frame rate, the frame type (indicating either an intra-coded or an inter-coded frame), the block type (indicating either an intra-coded or an inter-coded block), and the quantisation parameter.
  • The error control of video encoding is achieved by a combination of the following operations: a) setting a proper ratio of intra-coded frames to inter-coded frames; b) allocating GOBs or slices with suitable sizes such that the headers of the GOBs or slices can be used as resynchronisation markers for the decoder to regain synchronisation quickly when transmission errors occur; c) setting a proper ratio of intra-coded blocks to inter-coded blocks for each GOB or slice in each inter-coded frame.
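  Operation c) is often realised as a cyclic intra refresh, in which a sliding window of blocks is intra-coded in each inter-coded frame so that every block is refreshed within a bounded number of frames. A minimal sketch with hypothetical names:

```python
def intra_refresh_schedule(num_blocks: int, refresh_ratio: float, start: int = 0) -> list:
    """Select which blocks of an inter-coded frame to intra-code, cycling
    through the frame so every block is refreshed within 1/refresh_ratio frames."""
    count = max(1, int(num_blocks * refresh_ratio))
    return [(start + i) % num_blocks for i in range(count)]

# With 99 blocks per frame and a 10% refresh ratio, each call intra-codes
# 9 blocks, wrapping around at the end of the frame.
blocks = intra_refresh_schedule(99, 0.10, start=95)
```

  The encoder controller would advance `start` by `count` after each inter-coded frame, so errors in any block are flushed within roughly ten frames at this ratio.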
  • The rate control of video encoding operates in conjunction with the error control such that the generated CVDS is error resilient as well as within the bit rate budget associated with the channel bandwidth.
  • The encoder controller performs the rate control based on the derived error control parameters and the current encoding status information (e.g. the instantaneous bit rate generated by the encoder and the motion estimation).
  • The rate control is used for: a) setting a proper frame rate; and b) setting proper quantisation parameters; such that the generated CVDS can match the bit rate constraints of the wireless channels.
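  A buffer-based sketch of one such rate control step, adjusting the quantisation parameter and skipping frames against a per-frame bit budget. The thresholds, names and step sizes are illustrative assumptions, not the patent's algorithm.

```python
def rate_control_step(buffer_bits: int, target_bits_per_frame: int,
                      qp: int, qp_min: int = 2, qp_max: int = 31):
    """One rate-control step: decide whether to skip the next frame (operation a)
    and adjust the quantisation parameter (operation b) from buffer fullness."""
    # Skip a frame outright when the output buffer is far over budget.
    skip = buffer_bits > 2 * target_bits_per_frame
    if buffer_bits > target_bits_per_frame:
        qp = min(qp_max, qp + 1)   # coarser quantisation to cut the bit rate
    elif buffer_bits < target_bits_per_frame // 2:
        qp = max(qp_min, qp - 1)   # finer quantisation when there is headroom
    return skip, qp
```

  Called once per frame with the encoder's instantaneous bit count, this keeps the generated CVDS near the channel's bit rate budget while the error control above fixes the intra/inter mix.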
  • The CVDS comprises a plurality of layers, including a base layer and enhancement layers, in which each layer comprises a sequence of frames which may be of the same type (intra-coded or inter-coded) or a mixture of types.
  • Each layer can be carried via an RTP session.
  • The error control, in conjunction with the above error control operations, also includes mapping the CVDS layers onto the wireless channels such that the base layer is mapped onto the wireless channel with the better transmission quality.
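  This layer-to-channel mapping amounts to sorting the available channels by quality and assigning the base layer first. A minimal sketch; the dictionary structure and key names are hypothetical.

```python
def map_layers_to_channels(layers: list, channels: list) -> dict:
    """Assign the base layer to the channel with the lowest error rate, then
    enhancement layers to the remaining channels in quality order.
    Assumes layers are listed base-first and len(layers) <= len(channels)."""
    ranked = sorted(channels, key=lambda ch: ch["error_rate"])
    return {layer: ranked[i]["id"] for i, layer in enumerate(layers)}

layers = ["base", "enh1"]
channels = [{"id": 1, "error_rate": 1e-3}, {"id": 2, "error_rate": 1e-5}]
mapping = map_layers_to_channels(layers, channels)
```

  Here the base layer lands on channel 2, the cleaner of the two, so a loss on the noisier channel degrades only the enhancement layer.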
  • Figure 1 is a simplified schematic diagram of a UMTS communication system showing a network and two mobile handsets, where one handset is the transmitter and the other a receiver, for streaming video;
  • Figure 2 is a simplified schematic diagram of a UMTS communication system showing a network and two mobile handsets, where each handset is simultaneously a transmitter and a receiver for conversational video;
  • Figure 3 shows the construction of a UDP/IP packet containing MPEG-4 video payload data;
  • Figure 4 shows the construction of a UDP/IP packet containing RTCP data;
  • Figure 5 shows the construction of multiple substreams with a common IP address and different UDP addresses per substream;
  • Figure 6 shows the construction of a TCP/IP packet containing RTSP and SDP data;
  • Figure 7 shows the construction of a TCP/IP packet containing SIP and SDP data;
  • Figures 8, 10 and 12 show single and multiple RTP/RTCP sessions and an RTSP session (also labelled more generically as IP sessions), mapped onto single and multiple wireless channels for streaming video;
  • Figure 9 shows single and multiple RTP/RTCP sessions and a SIP session (also labelled more generically as IP sessions), mapped onto single and multiple wireless channels for conversational video;
  • Figure 14 shows Quality of Service (QoS) parameters associated with the wireless channels between mobile handsets and networks, and the sending of these and other QoS parameters between the mobile handsets via IP sessions for streaming video;
  • Figure 15 shows QoS parameters associated with the wireless channels between mobile handsets and networks, and the sending of these and other QoS parameters between the mobile handsets via IP sessions for conversational video;
  • Figures 16 and 17 show typical mappings of compressed video data stream frames onto base and enhanced substreams;
  • Figure 18 is a flowchart showing sequential operations performed by the encoder controller;
  • Figure 19 is a flowchart showing sequential operations performed by the decoder controller; and
  • Figure 20 is a flowchart showing sequential operations performed by the encoder.
  • The preferred embodiment of the present invention is applied to a network and associated mobile handsets designed to operate under the current GPRS or the proposed UMTS standard.
  • A UMTS network 100 is used to establish an end to end link between a first mobile handset 102 and a second mobile handset 104.
  • The communication session between the first mobile handset 102 and the second mobile handset 104 is unidirectional.
  • Mobile handset 102 is acting solely as a transmitter of compressed video data, while mobile handset 104 is acting solely as a receiver of compressed video data. This is termed the streaming arrangement.
  • The communication session between the first mobile handset 142 and the second mobile handset 144 is bi-directional.
  • Mobile handset 142 is acting both as a transmitter and a receiver of compressed video data, and mobile handset 144 is likewise acting as both a transmitter and a receiver of compressed video data. This is termed the conversational arrangement.
  • The functions and operations of the second mobile handset 144 are identical to those of the first mobile handset 142.
  • The transmitter function and operation of mobile handsets 142 and 144 are identical to those of mobile handset 102. It will also be appreciated that the receiver function and operation of mobile handsets 142 and 144 are identical to those of mobile handset 104. In the case of mobile handsets 142 and 144, the receiver and transmitter functions and operations are present within the same mobile handset; in mobile handsets 102 and 104, only the transmitter or the receiver function is present, respectively. It will be appreciated that the mobile handsets 142 and 144 can also operate exclusively as transmitters or receivers to produce the streaming arrangement, in addition to their conventional conversational arrangement, if so configured.
  • The first mobile handset 102 includes a transmitter controller 108, a Real Time Streaming Protocol (RTSP) server 116, an encoder 106, a Real Time Protocol (RTP) packetiser 117, a Real Time Control Protocol (RTCP) client 119 and a transmitter air interface 110, which are operatively interconnected with each other as shown.
  • The encoder 106 accepts raw video data (RVD) from a video source, such as a camera (not shown), associated with the first mobile handset 102 and encodes it into a compressed video data stream (CVDS) format, as discussed in detail below. This stream is then packetised by the RTP packetiser 117.
  • The transmitter air interface 110 establishes a wireless channel with a transmit side air interface 112 in the network 100, which in turn is in communication with a network backbone 114.
  • The network 100 also includes a receive side air interface 118 that establishes a wireless channel with a receiver air interface 120 disposed in the second mobile handset 104.
  • The second mobile handset 104 also includes a receiver controller 122, an RTSP client 123, an RTP depacketiser 125, an RTCP server 128 and a decoder 124. These are operatively interconnected with each other as shown.
  • The mobile handsets 142 and 144 include all the components that are present in both the mobile handset 102 for the transmitter function, with the exception of the RTSP server 116, and the mobile handset 104 for the receiver function, with the exception of the RTSP client 123.
  • The RTSP server 116 is replaced by a SIP User Agent (UA) 146 in mobile handset 142, and the RTSP client 123 is replaced by a SIP UA 148 in mobile handset 144.
  • The components of the transmitter function, the components of the receiver function, the User Agent and the air interface for mobile handsets 142 and 144 are operatively interconnected with each other as shown. It will be appreciated that the function and operation of the mobile communications network shown in Figure 2 is as described above.
  • The mobile handsets 102, 104, 142 and 144 are designated User Equipment (UE), the air interface elements 112 and 118 correspond to the Universal Terrestrial Radio Access Network (UTRAN), and the backbone element 114 corresponds to the Core Network (CN).
  • an end to end link is established between the first and second mobile handsets 102, and 104 (or 142 and 144), comprising a first wireless channel 126 between the first mobile handset and the network and a second wireless channel 127 between the network and the second mobile handset.
  • Wireless channels are established using different frequencies and/or spreading codes and/or time slots in a manner well known in the mobile communications art. They allow for bi-directional communication, both for data and control information.
  • the wireless channel 126 between the transmitter air interface 110 and the transmit side air interface 112 carries Quality of Service parameters (QoS), from the network 100 to the first mobile handset 102 (or 142).
  • the wireless channel 127 between the receive side air interface 118 and the receiver air interface 120 carries Quality of Service parameters (QoS) from the network 100 to the second mobile handset 104 (or 144).
  • the packetised CVDS is transmitted over the end to end link defined between the two mobile handsets 102 and 104 (or 142 and 144), across wireless channels 126 and 127.
  • the CVDS takes the form of an MPEG-4 stream, but other suitable streaming formats, such as H.263, can also be used. Both of these standards are applicable to variable bitrate and low bitrate video, e.g. bitrates of 10kbps or higher. It is particularly preferred that the transmission be in RTP format, as discussed in detail below in relation to Figure 3, Figure 4 and Figure 5.
  • FIG. 3 the packetisation of the raw MPEG-4 data for transmission is shown. This packetisation takes place in the first mobile handset 102 or 142 under the control of the transmitter controller 108 before sending the packets to the transmitter air interface 110. Upon emerging from the wireless network of the transmitter the packets travel over a packet switched network to the wireless network of the receiver. Here the packets are sent to the receive side air interface 118 and on to the second mobile handset 104 or 144 via wireless channel 127.
  • FIG 3 shows the packetisation layers for a single MPEG-4 packet 200 in the form in which it leaves the encoder 106. It will be appreciated that a stream of such packets will be generated from the incoming RVD.
  • the MPEG-4 video data 200 is wrapped in an RTP format packet layer 201.
  • This, in turn, is wrapped in a UDP format packet layer 202, which in turn is packetised into an Internet Protocol (IP) packet 203.
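The nested packetisation of layers 200-203 can be sketched as follows. This is an illustrative Python sketch (not part of the patent), using only the typical minimum header size of each protocol layer; the header contents are zeroed placeholders.

```python
# Illustrative sketch of the Figure 3 packetisation: MPEG-4 data (200)
# wrapped in an RTP layer (201), then a UDP layer (202), then an IP
# packet (203). Sizes are the usual minimum header lengths.

RTP_HEADER = 12   # bytes: version, sequence number, timestamp, SSRC
UDP_HEADER = 8    # bytes: source port, destination port, length, checksum
IP_HEADER = 20    # bytes: minimal IPv4 header carrying the destination address

def packetise(mpeg4_payload: bytes) -> bytes:
    """Wrap raw MPEG-4 video data in RTP, UDP and IP layers (sizes only)."""
    rtp_packet = b"\x00" * RTP_HEADER + mpeg4_payload     # layer 201
    udp_packet = b"\x00" * UDP_HEADER + rtp_packet        # layer 202
    ip_packet = b"\x00" * IP_HEADER + udp_packet          # packet 203
    return ip_packet

frame = b"\xAA" * 100                 # 100 bytes of coded video (layer 200)
packet = packetise(frame)
overhead = len(packet) - len(frame)   # 12 + 8 + 20 = 40 bytes of headers
```

The fixed 40-byte overhead per packet is one reason low-bitrate video favours larger payloads per RTP packet.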
  • Each of the packetisation layers of the packet is directed to a particular part of the overall communication. They will not be described in detail because they are already known in the art and conform to the respective standards. However the principal component of each packet will be described insofar as is necessary to understand the embodiments of the invention that follow.
  • the MPEG-4 layer 200 contains the coded video data.
  • the RTP layer 201 contains sequence numbers, time stamps, and payload bits that enable a depacketiser and decoder to decode it and replay at the correct time and in the correct sequence in relation to other packets from the same stream.
  • the UDP layer is used for asynchronous communication of the data over the wireless communications channel and is a "best effort" connectionless protocol.
  • the IP packet 203 contains an IP address which identifies the mobile receiver 104 or 144 as the destination.
  • the IP packet header may also contain a Differentiated Services Code Point (DSCP) which could be used by a diffserv-enabled core network to determine how that packet should be forwarded by nodes inside that network.
  • Figure 4 shows the packetisation layers for a single RTCP packet 205 in the form in which it leaves the RTCP server 128.
  • the RTCP packet is wrapped in a UDP format packet layer 206, and packetised into an IP packet 207.
  • the CVDS can be transmitted over the wireless channels in one or multiple substreams, each transported by an RTP session, (and an associated RTCP session) where these are mapped to one or multiple wireless channels that may have different quality parameters.
  • the IP address 208 is common to both substreams. Routing through the transmission chain is achieved by characterising different substreams by different socket numbers 210 and 212 in the UDP address.
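The routing of substreams by UDP socket number can be sketched as below. The port values are hypothetical choices for illustration; the patent only specifies that the substreams share one IP address and differ in socket number.

```python
# Hypothetical sketch of substream demultiplexing: both substreams share
# a single IP address (208) and are distinguished only by destination
# UDP socket number (210 and 212). Port values are illustrative.

BASE_PORT = 5004         # assumed socket number for the base substream
ENHANCEMENT_PORT = 5006  # assumed socket number for the enhancement substream

SUBSTREAM_BY_PORT = {
    BASE_PORT: "base",
    ENHANCEMENT_PORT: "enhancement",
}

def route(dest_port: int) -> str:
    """Return which substream a received UDP datagram belongs to."""
    return SUBSTREAM_BY_PORT[dest_port]
```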
  • the receiving handsets may not be capable of forming a multi wireless channel connection with the network. This may be because of equipment incompatibilities or network resource issues, for example.
  • in that case it is still possible for the video layers to be allocated to multiple wireless channels in accordance with the above embodiment for handsets that support it, whilst the video is multiplexed onto a single wireless channel for those not capable of forming the requisite multi wireless channel connection.
  • the IP packet of Figure 3 is transmitted directly from the first mobile handset 102, via the network 100, to the second mobile handset 104.
  • the packets are forwarded to the RTP depacketiser 125, where the MPEG-4 data 200 is re-constructed.
  • the packets must be re-ordered using RTP layer data 201 such as frame timestamps and the data from the plurality of substreams must be re-assembled.
  • the reconstructed MPEG-4 data 200 is then sent from the RTP depacketiser 125 to the decoder 124, where it is decoded for replay on, for example, a visual display (not shown) on the second mobile handset 104.
  • control information is returned in accordance with the known RTCP and RTSP protocols, with the latter using the known Session Description Protocol (SDP).
  • Figure 8 and Figure 10 show the IP sessions between a streaming server and client.
  • the RTCP sessions provide feedback on the data transmission quality for each RTP session.
  • RTSP additionally provides an overriding control connection from RTSP client 123 to RTSP server 116, using SDP to provide a description of the connection between client and server. It is known in the art that, when a session changes, an RTSP control packet containing a new SDP packet is sent to the remote entity.
  • the packetisation of the SDP and RTSP information is shown in Figure 6.
  • the SDP information 306 is wrapped in an RTSP packet 300.
  • An RTSP packet 300 is wrapped in a Transmission Control Protocol (TCP) packet 302, which is within an IP packet 304.
  • This packet is built up by the RTSP client 123 and supplied to the receiver air interface 120 for transmission to the receive side air interface 118 of the network 100.
  • the destination address of this RTSP packet is that of the first mobile handset 102.
  • the RTSP packet passes to the transmit side air interface 112 via the backbone 114 for transmission to the transmitter air interface 110, to the RTSP server 116 and thence to the transmitter controller 108. Also, control information is exchanged between both mobile handsets 102 and 104 in accordance with the known RTCP protocol.
  • the IP packet of Figure 3 is transmitted and received between the mobile handsets 142 and 144, via the network 100, in the same way as for the streaming arrangement described above.
  • the encoded video packet is also processed in the same way as for the streaming arrangement described above.
  • Session Initiation Protocol: for the control of bi-directional communications sessions the Session Initiation Protocol (SIP) is used instead of RTSP. This is managed by user agents 146 and 148 in mobile handsets 142 and 144 respectively. SIP uses SDP to provide a description of the connection between two peer user agents. It is known in the art that, when a session changes, a SIP control packet containing a new SDP payload is sent to the peer user agent.
  • FIG. 9 and Figure 11 show the IP sessions between two peer user agents 146 and 148.
  • the SIP session provides an overriding connection control between user agents 146 and 148 using the SDP. Again, control information is also exchanged between both mobile handsets 142 and 144 in accordance with the known RTCP protocol.
  • the RTCP session will provide feedback on the data transmission quality for each RTP session.
  • the packetisation of the SIP information is shown in Figure 7, where the packet is denoted 308.
  • An SDP payload 307 is encapsulated within the SIP packet 308.
  • a SIP packet 308 is wrapped in a User Datagram Protocol (UDP) packet 310, which is within an IP packet 305.
  • This packet is built up by the user agent 148 and supplied to the receiver air interface 120 for transmission to the receive side air interface 118 of the network 100.
  • the destination address of this SIP packet is that of the other mobile handset 142. It will be understood that user agents 146 and 148 are interchangeable in this scenario.
  • the SIP packet passes to the transmit side air interface 112 via the backbone 114 for transmission to the transmitter air interface 110 and thence to the transmitter controller 108 via the user agent 146.
  • a user of, say, the first mobile handset, 102 or 142 places a call to the second mobile handset 104 or 144, by dialling the second handset's mobile number.
  • the first mobile handset's number is mapped to a first IP address taken from a pool of IP addresses, and the second mobile handset's number is mapped to a second IP address taken from the pool of IP addresses.
  • This mapping persists for as long as the connection is maintained. Once the connection is broken, usually because one or both of the users hang up, the mapping is removed and the IP addresses are returned to the pool for reuse. It will be appreciated that this arrangement means that packets can be routed using the allocated IP addresses instead of the phone numbers.
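The number-to-address mapping described above can be sketched as a simple address pool. This is a hedged illustration under assumptions (the class name, pool contents and phone numbers are invented); the patent only specifies that a number is bound to a pooled IP address for the life of the connection and released on hang-up.

```python
# Sketch of the call set-up mapping: each dialled mobile number is bound
# to an IP address taken from a pool while the connection persists, and
# the address is returned to the pool for reuse once the call ends.

class AddressPool:
    def __init__(self, addresses):
        self.free = list(addresses)     # addresses available for allocation
        self.mapping = {}               # phone number -> allocated IP address

    def connect(self, number: str) -> str:
        """Map a phone number to an IP address for the connection."""
        if number not in self.mapping:
            self.mapping[number] = self.free.pop()
        return self.mapping[number]

    def hang_up(self, number: str) -> None:
        """Remove the mapping and return the address to the pool."""
        self.free.append(self.mapping.pop(number))
```

Packets are then routed on the allocated IP addresses rather than the phone numbers, as the text notes.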
  • the requested quality class of the wireless channel is communicated to the network 100.
  • the first mobile handset 102 or 142 can request a particular QoS from the UMTS network, which specifies, for example, guaranteed and maximum bitrates.
  • a wireless communications channel is established between the first mobile handset 102 or 142 and the network 100, the wireless channel having defined QoS criteria.
  • the first mobile handset 102 or 142 might request the network resources as a number of wireless channels each with associated QoS (see later for multiple wireless channel discussion).
  • the second mobile handset 104 or 144 must similarly establish a connection with the network 100, establishing a wireless communications channel with QoS criteria independent of those of the wireless channel established by the first handset 102 or 142.
  • video data from, say, a camera (not shown) associated with the first mobile handset 102 or 142 is received by the encoder 106, which in turn generates a sequence of CVDS video data.
  • the video data is sent to the RTP packetiser 117, where it is packetised as described above and sent to the second mobile handset as described above.
  • the receiving mobile handset is mobile handset 104 if mobile handset 102 is the transmitter; mobile handset 144 if mobile handset 142 is the transmitter; and mobile handset 142 if mobile handset 144 is the transmitter.
  • bandwidth changes can take place at a number of points along a given wireless communications channel. For example, it could take place between the first mobile handset 102 (or 142) and the network, or between the network and the second handset 104 (or 144).
  • any change in effective bandwidth or quality can have two consequences. If the bandwidth increases it could in principle be possible to transmit video data at a higher bitrate. If it decreases, however, the rate at which video data can reliably be transmitted also decreases, possibly below the value that was set at the start of transmission. To accommodate these consequences the QoS parameter set including the available bitrate and bit error rate on the wireless channel between the first mobile handset 102 or 142 and the network 100 is monitored by the controller 108 at the first mobile handset 102 or 142.
  • the UMTS network provides the first mobile handset with a QoS parameter set that is indicative of the available bitrate (i.e. bandwidth) on the wireless communications channel.
  • a QoS parameter set is supplied through the protocol stack in known wireless systems from the air interface 112 of the network to the air interface 110 of the transmitting mobile handset 102 or 142. It is normally supplied across a wireless control channel on the downlink of the call.
  • the QoS parameter set is indicative of various transmission parameters, including the transmissible bitrate over the wireless channel, the signal to noise ratio, the error rate and a priority indicator, which is an indication provided from the network to the transmitting mobile handset of the likely priority to be placed on the call. This is therefore an indicator of the bandwidth and likely reliability of the wireless communication channel that has been opened for the particular wireless channel. It will be appreciated in this context that the word "call" is used herein to describe the transmission of video data as well as, or instead of, voice data.
  • the QoS parameter set is read at the mobile handset 102 or 142 by the controller 108 and the transmissible bitrate is extracted from it.
  • the quality of the wireless communication channel between the network 100 and the second mobile handset 104 or 144 is also monitored.
  • a QoS parameter set indicative of, amongst other things, the available bandwidth for the RTP session mapped onto this wireless channel is ascertained in the second mobile handset 104 or 144, derived from the wireless control channel information it receives from the network according to the relevant wireless standard (in this case UMTS).
  • the QoS parameter set is dealt with at the second mobile handset 104 or 144 in a novel manner.
  • the session control protocols (bi-directional using SIP and unidirectional using RTSP) have already been discussed.
  • SDP provides for the exchange and updating of session description information such as codec and bitrate.
  • various control parameters are conveyed by the RTSP packets including, for example, video commands such as PLAY and PAUSE.
  • the standard also provides an ANNOUNCE instruction.
  • the system described herein uses the ANNOUNCE provision in the RTSP standard to cause elements of the QoS parameter set determined in the wireless environment and/or other derived parameters to be placed into an SDP payload which itself is placed in an RTSP packet for transmission from the second mobile handset 104 to the first mobile handset 102.
  • the thus constructed novel packets are transmitted by the RTSP client 123, via the receiver air interface and the receive side air interface to the network backbone 114. From here they travel to the RTSP server 116 via the transmit side air interface 112 and the transmitter air interface 110.
  • SDP payload are conveyed by the SIP packets to control the bidirectional communication between two mobile handsets.
  • a session is initiated using the INVITE instruction, which itself contains a session description in the SDP format.
  • the standard provides for a session to be modified by either agent by issuing a subsequent INVITE instruction.
  • the system described herein uses the re-INVITE provision in the SIP standard to cause the quality parameter determined in the wireless environment and/or other derived parameters to be placed into an SDP packet which itself is placed in a SIP packet for transmission from the receiving mobile handset 144 to the transmitting mobile handset 142.
  • the thus constructed novel packets are transmitted by the session control agent 148, via the receiver air interface 120 and the receive side air interface 118 to the network backbone 114. From here they travel to the session control agent 146 via the transmit side air interface 112 and the transmitter air interface 110.
  • RTP sessions carrying the video data each have associated RTCP sessions carrying control information back to the transmitter.
  • the system described herein can, in addition to or instead of using RTSP or SIP, use the RTCP application defined (APP) packet to transfer application data (in this case the wireless and other derived QoS parameters) from the receiver to the transmitter.
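The RTCP APP feedback path can be sketched as follows. This is a hedged sketch: the standard RTCP APP framing (payload type 204, SSRC, a 4-byte name, then application data) is real, but the name "QOSP" and the payload layout (bitrate plus a BER exponent) are assumptions invented for illustration, not a defined profile.

```python
import struct

# Sketch of packing wireless QoS feedback (the receive side bitrate BR'
# in kbps and a bit error rate exponent, e.g. -4 for BER' = 1e-4) into
# an RTCP application-defined (APP) packet for return to the transmitter.
# The 4-byte name "QOSP" and the payload layout are illustrative only.

RTCP_APP = 204  # RTCP payload type for APP packets

def make_app_packet(ssrc: int, bitrate_kbps: int, ber_exponent: int) -> bytes:
    """Build an RTCP APP packet carrying QoS parameters as its payload."""
    payload = struct.pack("!Ii", bitrate_kbps, ber_exponent)
    # RTCP length field: total packet length in 32-bit words, minus one.
    length_words = (4 + 4 + 4 + len(payload)) // 4 - 1
    header = struct.pack("!BBH", 0x80, RTCP_APP, length_words)  # V=2, subtype 0
    return header + struct.pack("!I", ssrc) + b"QOSP" + payload
```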
  • RTSP control messages are sent between an RTSP server 116 and an RTSP client 123.
  • Figure 8 shows a prior art method of streaming CVDS transported by an RTP session 804 and associated RTCP session 805.
  • RTSP control messages are sent via RTSP session 806. It will be appreciated that at least some part of the end to end communications channel is wireless.
  • the RTP session containing all the frames of the video stream and the RTCP session containing all RTCP control messages are multiplexed onto a single wireless channel, with the CVDS parameters and frame sequencing being derived according to the QoS parameter of the wireless channel at call set up.
  • FIG. 9 shows the prior art for conversational arrangement.
  • Each mobile handset is depicted with an RTP packetiser (117 or 134), an RTP depacketiser (133 or 125) and a user agent (146 or 148).
  • a CVDS is transmitted from RTP packetiser 117 to RTP depacketiser 125 using RTP session 814 and RTCP session 815 simultaneously with a CVDS transmitted from RTP packetiser 134 to RTP depacketiser 133 using RTP session 816 and RTCP session 817.
  • a single SIP session 818 controls all the RTP/RTCP sessions.
  • the RTP sessions containing all the frames of the video stream, the RTCP sessions corresponding to these and the SIP session containing all of the SIP control messages are multiplexed onto a single wireless channel, with the CVDS parameters and frame sequencing being selected according to the QoS parameter of the wireless channel at call set up.
  • FIG 10 shows a preferred embodiment of the invention for the streaming arrangement, in which there is a plurality of wireless channels.
  • RTP/RTCP sessions 824/825 and 826/827 are used for transmitting the two substreams of a CVDS and RTSP control messages are sent via RTSP session 828.
  • the wireless channels are provided at call set up as previously described in a situation where the encoder 106, RTP packetiser 117, RTCP client 119 and RTSP server 116 are in a first mobile handset, and the decoder 124, RTP depacketiser 125, RTCP server 128 and RTSP client 123 are in a second mobile handset in accordance with requested QoS and bandwidth parameters according to the UMTS standard.
  • the CVDS substreams are transported via RTP between the RTP packetiser 117 and RTP depacketiser 125, RTCP control messages are sent between RTCP server 128 and RTCP client 119 and control messages are transmitted via an RTSP session between the RTSP server 116 and RTSP client 123.
  • one RTSP session covers all the RTP/RTCP sessions between any two entities while each CVDS substream requires an individual RTP session and an associated RTCP session.
  • FIG 11 shows a preferred embodiment of the invention for the conversational arrangement, in which there is a plurality of wireless channels.
  • RTP/RTCP sessions 834/835 and 836/837 are used for transmitting the two substreams of a CVDS generated by the encoder 106.
  • RTP/RTCP sessions 838/839 and 840/841 are used for transmitting the two substreams of a CVDS generated by the encoder 129.
  • the SIP control messages are sent via SIP session 842.
  • Figure 11 shows that the base layers produced by encoders 106 and 129, carried by RTP sessions 834 and 838 and packetised by RTP packetisers 117 and 134, use the up and down links of the same wireless bearer at each mobile handset.
  • the wireless channels are provided at call set up as previously described and in accordance with requested QoS and bandwidth parameters according to the UMTS standard. As shown in the example, one SIP session 842 covers all the RTP/RTCP sessions between any two entities while each CVDS substream requires an individual RTP session and an associated RTCP session.
  • since wireless channels for the transmit side and receive side are allocated separately, there is no guarantee that the number of transmit side and receive side wireless channels will be the same.
  • on the side with three wireless channels, RTP/RTCP sessions 844/845 and 846/847 and RTSP session 848 are each mapped to a separate wireless channel.
  • on the side with a single wireless channel, IP sessions 844-848 are all mapped to the same wireless channel.
  • Figure 13 shows the situation for the conversational arrangement, where for the handset containing user agent 146 there is only one wireless channel, whilst for the handset containing user agent 148 there are three wireless channels. In the handset containing user agent 148, the RTP/RTCP sessions 853/854 and 857/858 are mapped to the same wireless channel, the RTP/RTCP sessions 851/852 and 855/856 are mapped to another wireless channel, and the SIP control session 859 is mapped to yet another wireless channel. In the handset containing user agent 146, all IP sessions 851-859 are mapped to the same wireless channel. Under the proposed UMTS standard (and others), wireless channels can be defined in terms of a number of quality of service parameters, such as priority, maximum and guaranteed bandwidth, residual bit error rate and delay.
  • the first RTP session 824 is defined as carrying the base substream, having an example bitrate of 16kbps, whilst the second RTP session 826 has a bitrate of 32kbps.
  • the first wireless channel has the lowest bitrate but highest priority and the base substream is allocated to it.
  • the enhancement substream is allocated to the second wireless channel, since it has the lower priority. This ensures that the most important video data is allocated to the wireless channel with the highest priority.
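The priority-based allocation described above can be sketched as below, using the example bitrates from the text (a 16 kbps channel with the higher priority and a 32 kbps channel with the lower). The channel names and priority values are illustrative.

```python
# Illustrative allocation of CVDS substreams to wireless channels: the
# base substream (the most important video data) is allocated to the
# highest-priority channel regardless of its bitrate, and the
# enhancement substream to the remaining, lower-priority channel.

def allocate(substreams, channels):
    """Pair substreams (most important first) with channels by priority."""
    by_priority = sorted(channels, key=lambda c: c["priority"], reverse=True)
    return {s: c["name"] for s, c in zip(substreams, by_priority)}

channels = [
    {"name": "ch1", "bitrate_kbps": 16, "priority": 2},  # highest priority
    {"name": "ch2", "bitrate_kbps": 32, "priority": 1},
]
mapping = allocate(["base", "enhancement"], channels)
# the base substream lands on ch1 despite ch1 having the lower bitrate
```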
  • the first RTP session can also be marked with the highest priority DSCP for prioritised transport over the IP component of a diffserv enabled core network.
  • the allocation of resources within a UMTS network is dynamic, and this can mean that bandwidths allocated to either of the RTP sessions can fluctuate with (amongst other things) network load.
  • the bandwidth available for each wireless channel is known to the transmitter controller 108 as it monitors the network messages at the transmitter air interface 110.
  • an assessment is made as to whether it is desirable to reallocate the frames between substreams.
  • the preferred embodiment is configured to maintain a history of wireless channel behaviour in relation to the quality parameter.
  • a sudden drop in bandwidth on a wireless channel to which a relatively high priority substream or frame type is mapped may not be a trigger for the frame mapping to be changed.
  • where there is a history of short-term bursts of bandwidth loss, it is likely that the higher bandwidth will be available again shortly, and it may ultimately be more efficient to ignore the short-term reduction.
  • an assessment of this type will be made by the transmitter controller 108. It will be understood that quite sophisticated proportional, integral and differential factors can be taken into account to build a relatively sophisticated model of any wireless channel's behaviour (and likely future behaviour) over time. Such modelling is well known to those skilled in the relevant art, and so is not described further here.
  • Similar history data can be collected for the other types of quality data collected in earlier embodiments of the invention, and similarly used to make decisions about how and when to alter outputs of, for example, the encoder.
  • it may be more efficient or may provide a visibly better overall video streamed image if the bitrate out of the encoder is not immediately altered when the bandwidth initially drops. Rather, it will in some cases be preferable to wait until the bandwidth has remained low for a predetermined time period or number of frames before changing the output of the encoder.
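The hold-off behaviour described above can be sketched as a small governor. This is a minimal sketch under assumptions: the threshold of five consecutive low-bandwidth frames is an arbitrary illustrative value, not one specified in the text.

```python
# Sketch of the hysteresis described above: the encoder's target bitrate
# is only reduced once the reported channel bandwidth has remained below
# the current rate for a predetermined number of frames, so that
# short-term bursts of bandwidth loss are ignored.

HOLD_OFF_FRAMES = 5  # illustrative threshold for a "sustained" drop

class BandwidthGovernor:
    def __init__(self, target_kbps: int):
        self.target_kbps = target_kbps
        self.low_frames = 0   # consecutive frames with bandwidth below target

    def update(self, channel_kbps: int) -> int:
        """Feed one frame's channel bandwidth; return encoder target bitrate."""
        if channel_kbps < self.target_kbps:
            self.low_frames += 1
            if self.low_frames >= HOLD_OFF_FRAMES:
                self.target_kbps = channel_kbps   # sustained drop: adapt
        else:
            self.low_frames = 0                   # short burst: ignore it
        return self.target_kbps
```

A real controller would also use the proportional/integral/differential history modelling the text mentions; this shows only the simplest debouncing of a drop.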
  • Wireless channels between mobile handset and network have a certain QoS, which is provided for the mobile user of a network service.
  • a set of QoS parameters including Bitrate (BR) and Bit Error Rate (BER) are used for controlling the video encoder.
  • These QoS parameters are conveyed between the encoder controller and the decoder controller via IP sessions.
  • a wireless channel between transmitter 102 and network has QoS parameters BR and BER.
  • a wireless channel between network and receiver 104 has QoS parameters BR' and BER'.
  • the encoder controller 108 sends BR to decoder controller 122 via an RTSP session 866.
  • Having received BR from the encoder controller 108, the decoder controller 122 sends BER' and the calculated Request Bitrate (RBR) to the encoder controller 108 via an RTSP session 866 or RTCP session 865.
  • the encoder controller and the decoder controller will be discussed in detail below.
  • the encoder controller 108 is used to control the video encoder 106 with the objective of improving the error resilience of video encoding while meeting the bitrate constraint of wireless channels.
  • the video encoder is an MPEG-4 or H.263 compliant encoder.
  • the input video source is encoded into an MPEG-4 or H.263 compliant bit stream.
  • the video data can be constructed using a plurality of different types of frames, which are referred to as I, P and B frames.
  • I frames are self-contained still frames of video data, whereas P and B frames constitute intermediate frames that are predictively encoded.
  • the precise composition of the frames varies in accordance with the particular standards and application of the standards and is known per se.
  • MPEG-4 specifies a plurality of layers, including the base layer and enhanced layers, in which each layer is comprised of a sequence of frames which may be of the same type (I, P, B) or a mixture of types.
  • in a wireless network the mobile may be allocated one wireless channel or a plurality of wireless channels.
  • each wireless channel is used to transmit a single RTP/RTCP session pair.
  • Each RTP session carries an optimum sequence of I, P and B frames, known as a substream.
  • the term "substream" is used rather than "layer" because the frame sequencing onto wireless channels can be varied dynamically and need not be one of the layer sequences predefined in MPEG-4 or other known video encoding standards.
  • Other partitions of coded video data for error resilient purposes (e.g. Data Partitioning Modes of MPEG-4/H.263) are also possible and could be represented by substreams.
  • Figure 16 and Figure 17 illustrate example compositions of a video data stream in accordance with the MPEG-4 video standard.
  • in Figure 16, two temporally scalable substreams are used, with I frames on the base substream while P and B frames are carried on the enhanced substream.
  • in Figure 17, only the base substream is used, comprising interleaved I and P frames.
  • the encoder controller can thus control the bitrate of the video data stream for each wireless channel by manipulation of the number of substreams used for transmission, and the number and type of frames per substream.
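The two compositions described for Figures 16 and 17 can be sketched as a frame allocator. This is an illustrative sketch; the frame pattern and function name are invented for the example.

```python
# Illustrative allocation of a frame-type sequence to substreams: either
# a two-substream temporally scalable split (I frames on the base
# substream, P and B frames on the enhancement substream, as in Figure
# 16) or a single base substream of interleaved frames (as in Figure 17).

def split_frames(frames, two_substreams: bool):
    """Allocate a sequence of frame types ('I', 'P', 'B') to substreams."""
    if not two_substreams:
        return {"base": list(frames)}
    return {
        "base": [f for f in frames if f == "I"],
        "enhancement": [f for f in frames if f in ("P", "B")],
    }

gop = ["I", "P", "B", "P", "B", "I", "P"]   # an example frame sequence
scalable = split_frames(gop, two_substreams=True)
# the base substream carries only the I frames of the sequence
```

Varying `two_substreams` (and, more generally, the number and type of frames per substream) is how the encoder controller manipulates the bitrate presented to each wireless channel.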
  • Video encoding under MPEG-4 or H.263 standards operates on a frame- by-frame basis. Each frame is divided into either Group of Blocks (GOBs) or slices.
  • a GOB comprises macroblocks of one or several rows in a video frame. Slices are more flexible and can partition the frame into a variable number of macroblocks.
  • a macroblock in turn comprises four luminance blocks and two spatially corresponding colour difference blocks of image data. All blocks in an I-frame are intra-coded. Blocks in an inter-coded P or B-frame can be a mixture of intra-coded blocks (I-blocks) and inter-coded blocks (P-blocks or B-blocks).
  • Increasing the I-block/P-block ratio (Ib/Pb) in P-frames or the I-block/B-block ratio (Ib/Bb) in B-frames has two consequences: (1) it improves error resilience, because more intra-coded blocks result in less error propagation; (2) it increases the bitrate, because inter-coded blocks comprise substantially smaller amounts of data than intra-coded blocks.
  • the encoder controller controls the encoder to make the best use of wireless channel utilisation for error resilient video encoding.
  • the error control can also be achieved by allocating GOBs or slices in a frame, wherein the header of each GOB or slice can serve as a synchronisation marker for the decoder to regain synchronisation.
  • Figure 18 illustrates the operation of the encoder controller 108.
  • the encoder controller operates by a closed-loop process.
  • the encoder controller obtains relevant information from various sources: the BR and BER associated with the wireless channel between the encoder and the network, from the air interface; the RBR and BER', via an IP control session from the decoder; the latency jitter (ΔL) of RTP packets, from the RTCP client 119; and the instantaneous bitrate (IBR), from the encoder.
  • the encoder controller determines the target BR (BRtarget) and the frame type (FT) based on the BR, RBR and IBR.
  • the encoder controller determines the Ib/Pb ratio for P-frames, the Ib/Bb ratio for B-frames and the synchronisation marker rate Rsync for all frames.
  • the encoder controller determines the quantisation parameter (QP) and the frame rate (FR) for the frame based on the Ib/Pb or Ib/Bb ratio and BRtarget.
  • the encoder controller sends the encoding parameters FT, FR, Rsync, Ib/Pb or Ib/Bb, and QP to the encoder.
  • the encoder controller sends BR via IP control session to the decoder. Then the encoder controller goes back to the first step 400.
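One pass of the encoder controller's closed loop can be sketched as follows. This is a hedged sketch, not the patent's formulas: the clamping of the target bitrate and the step-wise QP adjustment are illustrative assumptions, using the 1-31 QP range of MPEG-4/H.263.

```python
# Sketch of one iteration of the encoder controller loop: gather the
# channel bitrate (BR), the receiver's requested bitrate (RBR) and the
# encoder's instantaneous bitrate (IBR), derive a target bitrate, then
# nudge the quantisation parameter (QP) toward that target.

def control_step(br_kbps, rbr_kbps, ibr_kbps, qp):
    # Target bitrate: never exceed what the transmit channel (BR) or
    # the receiver's request (RBR) allows.
    br_target = min(br_kbps, rbr_kbps)
    # Crude QP heuristic: coarser quantisation if the encoder is
    # overshooting the target, finer if it is well under it.
    if ibr_kbps > br_target:
        qp = min(31, qp + 1)      # reduce bitrate at some quality cost
    elif ibr_kbps < 0.8 * br_target:
        qp = max(1, qp - 1)       # spare bandwidth: improve quality
    return br_target, qp

# e.g. channel 64 kbps, receiver requests 48 kbps, encoder running hot:
target, qp = control_step(br_kbps=64, rbr_kbps=48, ibr_kbps=60, qp=10)
# target == 48 and QP is stepped up to 11 to pull the bitrate down
```

The real controller additionally sets the frame type, frame rate, Ib/Pb or Ib/Bb ratios and Rsync, as listed in the steps above.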
  • Figure 19 illustrates the operation of the decoder controller 122.
  • the decoder controller operates by a closed-loop process.
  • the decoder controller obtains relevant information from various sources: the BR' and BER' associated with the wireless channel between the decoder and the network, from the air interface; and the BR associated with the wireless channel between the encoder and the network, via an IP session from the encoder.
  • the decoder controller determines the latency jitter (ΔL) of the RTP packets received.
  • the decoder controller calculates RBR based on ΔL, BR and BR'.
  • the decoder controller sends the RBR and BER' via IP control session to the encoder controller.
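The RBR calculation can be sketched as below. This is a sketch under stated assumptions: the patent does not give the formula, so the bounding by both channel rates and the jitter-triggered back-off (10 ms threshold, 25% reduction) are illustrative values only.

```python
# Sketch of the decoder controller's Request Bitrate (RBR) calculation:
# the requested rate is bounded by both the transmit side (BR) and
# receive side (BR') channel bitrates, and backed off when latency
# jitter of received RTP packets suggests congestion.

def calc_rbr(br_kbps: float, br_prime_kbps: float, jitter_ms: float) -> float:
    rbr = min(br_kbps, br_prime_kbps)   # cannot exceed either channel
    if jitter_ms > 10.0:                # high jitter: request a lower rate
        rbr *= 0.75
    return rbr
```

The resulting RBR, together with BER', is what the decoder controller returns to the encoder controller over the IP control session.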
  • Figure 20 illustrates the operation of the encoder 106.
  • the encoder operates on a frame-by-frame basis.
  • the encoder obtains the encoding parameters, including FT, FR, Rsync, Ib/Pb or Ib/Bb, and QP, from the encoder controller.
  • the encoder allocates GOBs or slices for inter-coded frames based on Rsync.
  • the encoder further allocates the I-block distribution within P or B-frames based on Ib/Pb or Ib/Bb.
  • the encoder encodes the frame using the above encoding parameters and adds it to the CVDS.
  • the encoder calculates the IBR and sends it to the encoder controller. Then the encoder goes back to the first step, to process the next frame.
  • WIRELESS CHANNEL a physical radio channel with associated QoS parameters, e.g. UMTS Radio Access Bearer.
  • END TO END LINK an end to end communications link between transmitter and receiver containing a transmit side wireless channel and / or a receive side wireless channel;
  • IP SESSION An IP communications session between two IP hosts carrying either control or application data. Examples are an RTP session, an RTCP session, an RTSP session and a SIP session. One or more IP sessions can be mapped to a wireless channel;
  • COMPRESSED VIDEO DATA STREAM an overall video stream, where the original stream of image frames is compressed by means of an encoder;
  • FRAME a video encoder outputs a number of different frame types. These include full still images and derivative images that have different data transmission requirements, have different sensitivity to errors and may have dependency on other frames.
  • frames are also known as Video Object Planes (VOPs);
  • a CVDS may be split into a number of substreams for the purposes of transmission over a channel or plurality of channels.
  • Each substream can be used as a means of transmitting a sequence of video frames that may be of different types.
  • a substream is not necessarily the same as a layer as defined in video standards such as MPEG-4 and H.263.
  • a substream could also be used to transmit any part of the coded frame data that can be successfully partitioned for error resilient purposes (e.g. DCT coefficients and motion vector data in data partitioning modes of MPEG-4/H.263).
  • Each substream is transported by an RTP session and an associated RTCP session.
  • RTCP SERVER an entity generating RTCP Receiver Reports based on the reception of RTP packets. These are sent to the transmitter of the RTP packets, where an RTCP client uses them.
  • RTCP CLIENT an entity that uses RTCP Receiver Reports. These are sent from the receiver of the RTP packets, where an RTCP server generates them.

Abstract

A method of generating a compressed video data stream (CVDS) from a video stream, for transmission from a transmitter to a receiver over a wireless channel within a communications network, the method including the steps of: (a) determining values of one or more error rate parameters associated with the wireless channel; (b) determining values of one or more bit rate parameters associated with the wireless channel; (c) calculating values of one or more encoding parameters using the values of the error rate parameters and the bit rate parameters; (d) controlling encoding of the video stream on the basis of the encoding parameters, thereby to generate the CVDS.

Description

ERROR CONTROL TO VIDEO ENCODER
FIELD OF THE INVENTION
The present invention relates to video streaming, and, more particularly, to the streaming of video data over a wireless communications network.
The invention has been developed primarily to allow video to be streamed in a UMTS or GPRS mobile telecommunications network using streamable formats such as MPEG-4 and H.263. However, it will be appreciated by those skilled in the art that the invention is not limited to use with those particular standards.
BACKGROUND TO THE INVENTION
Robust transmission of real time video data over wireless networks promises to be one of the important applications of wireless packet data services. Robustness is a key issue because a wireless network is a bandwidth-constrained as well as error-prone environment. The bandwidth constraint forces video data to be compressed into lower bit-rate video streams before it can be transmitted over a wireless mobile network. Channel errors in a wireless mobile network range from random bit errors to burst errors.
Existing video compression standards such as MPEG-4 and H.263 use three kinds of coding techniques: motion compensated prediction to encode temporal redundancy, the Discrete Cosine Transform (DCT) with quantisation to encode spatial redundancy, and Variable-Length Codes (VLC) for entropy encoding. These compression mechanisms exploit both spatial and temporal dependency, which makes the compressed video stream susceptible to any transmission errors. Both motion-compensation based predictive coding and VLC are sensitive to transmission errors: a single bit error can cause failure in VLC decoding, and the use of predictive coding leads to error propagation. Error resilient encoding is therefore one of the important issues in the reliable transmission of video data over noisy channels.
Existing video compression standards provide some error resilience to minimise the amount of data that has to be discarded whenever errors are detected: for example, the use of reversible VLC and the periodic insertion of resynchronisation markers so that the decoder regains synchronisation more quickly when transmission errors occur. Another mechanism is to increase the ratio of intra to inter-coding, both at frame and block level, to stop any residual error propagation at the expense of reducing the compression gain.
For real-time wireless streaming video applications, one of the key issues is how to achieve the best possible encoding robustness under the bandwidth and time constraints of the wireless networks. Each wireless channel is associated with Quality of Service (QoS) parameters indicative of, among other things, the error rate parameters. During the transmission of video data between a transmitter and a receiver, the QoS parameters vary with time, randomly and asynchronously for the transmitter and the receiver. It is important for the transmitter to exploit wireless channel conditions and select appropriate error resilient mechanisms for successful end to end real time video transmission over the wireless channel.
It is the object of this invention to provide a method of making best use of the error rate parameters associated with wireless channels for generating a compressed video data stream with improved quality when transmitted over those wireless channels.
SUMMARY OF INVENTION
The present invention provides a method of generating a Compressed Video Data Stream (CVDS) from a video stream, for transmission from a transmitter to a receiver over a wireless channel within a communications network, the method including the steps of: a) determining values of one or more error rate parameters associated with the wireless channel; b) determining values of one or more bit rate parameters associated with the wireless channel; c) calculating values of one or more encoding parameters using the values of the error rate parameters and the bit rate parameters; d) controlling encoding of the video stream on the basis of the encoding parameters, thereby to generate the CVDS.
In accordance with the described embodiment of the present invention, a CVDS transmitter in a mobile handset comprises an encoder controller and a prior art video encoder. Preferably the video encoder has several adjustable encoding parameters for altering the degree of error resilience of the generated CVDS. The encoder controller ascertains the error rate parameters associated with the wireless channel between the transmitter and the receiver, and derives the encoding parameters for generating the CVDS with improved error resilience while meeting the bandwidth constraints of the wireless channel.
The basic idea underlying the present invention is to make best use of the error rate parameters associated with the wireless channels for generating a CVDS with improved quality when transmitted over said wireless channels. The encoder controller may ascertain the error rate parameters from several sources. In an embodiment of the invention where the transmitter is a mobile transmitter and the receiver is a network receiver, the encoder controller can ascertain the error rate parameters associated with the wireless channel between the transmitter and the network from the air interface. In another embodiment of the invention where the transmitter is a network transmitter and the receiver is a mobile receiver, the encoder controller can ascertain the error rate parameters associated with the wireless channel between the network and the receiver. In yet another embodiment of the invention where a mobile transmitter transmits the CVDS to a mobile receiver, the encoder controller can ascertain the error rate parameters from the two sources described above. End to end feedback from the receiver to the transmitter is used to convey the information related to the error rate parameters. In a preferred embodiment of the invention, the CVDS is packetised into IP format packets before being sent to the transmitter air interface. Each IP packet is constructed by three wrapping operations: (1) wrapping the CVDS in an RTP format packet, (2) wrapping the RTP packet in a UDP format packet, and (3) wrapping the UDP packet in an IP format packet.
The present invention can apply to both the unidirectional streaming arrangement and the bidirectional conversational arrangement. In the streaming arrangement the end to end feedback of error rate parameters can be performed via an RTCP session or an RTSP session. RTCP packets are wrapped in UDP format and further in IP format packets; RTSP packets are wrapped in TCP format and further in IP format packets. In the conversational arrangement the end to end feedback of error rate parameters can be performed via a SIP session. SIP packets are wrapped in UDP format and further in IP format packets.
Preferably, the wireless networks are UMTS or GPRS mobile telecommunications networks. Each wireless channel is associated with Quality of Service (QoS) parameters indicative of, among other things, the error rate information. At session initiation, and when wireless channel characteristics are to be modified, the network provides each mobile handset with a QoS parameter set that is indicative of the available bit rate and error rate conditions of the wireless channel between the mobile handset and the network. The QoS parameter set is read at the mobile handset by the encoder controller or the decoder controller. The error rate parameters are derived from the QoS parameters associated with the wireless channels.
In a preferred embodiment of the invention, the video data is divided into a sequence of frames, with each frame being further divided into a number of Groups of Blocks (GOBs) or a number of slices, and with each GOB or slice being further divided into a number of blocks. Preferably, the encoding parameters include the frame rate, the frame type (indicating either an intra-coded or an inter-coded frame), the block type (indicating either an intra-coded or an inter-coded block), and the quantisation parameter.
According to the preferred embodiment, the error control of video encoding is achieved by a combination of the following operations: a) setting a proper ratio of intra-coded frames to inter-coded frames; b) allocating GOBs or slices with suitable sizes such that the headers of the GOBs or slices can be used as resynchronisation markers for the decoder to regain synchronisation quickly when transmission errors occur; c) setting a proper ratio of intra-coded blocks to inter-coded blocks for each GOB or slice in each inter-coded frame.
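Operation (b) above, sizing GOBs or slices so that their headers act as resynchronisation markers, can be sketched as follows. The parameter names and the even-split policy are assumptions for this illustration:

```python
def gob_plan(blocks_per_frame, target_gob_bits, bits_per_block):
    """Partition a frame into GOBs/slices of roughly equal coded size
    (illustrative). Each returned (start, end) pair is a half-open block
    range; a smaller target size yields more resynchronisation points."""
    blocks_per_gob = max(1, target_gob_bits // bits_per_block)
    gobs, start = [], 0
    while start < blocks_per_frame:
        end = min(start + blocks_per_gob, blocks_per_frame)
        gobs.append((start, end))
        start = end
    return gobs
```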
In a preferred embodiment of the invention, the rate control of video encoding operates in conjunction with the error control such that the generated CVDS is error resilient as well as within the bit rate budget associated with the channel bandwidth. Preferably, the encoder controller performs the rate control based on the derived parameters of the error control and the current encoding status information (e.g. the instantaneous bit rate generated by the encoder and the motion estimation). The rate control is used for: a) setting a proper frame rate; and b) setting proper quantisation parameters; such that the generated CVDS can match the bit rate constraints of the wireless channels.
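A minimal sketch of the quantisation-parameter side of such rate control follows; the ±1 step policy and the 0.9 dead-band are assumptions, not the claimed method:

```python
def adjust_qp(qp, ibr, br_target, qp_min=2, qp_max=31):
    """Nudge the quantisation parameter toward the bit-rate budget
    (illustrative). A larger QP quantises more coarsely and so lowers
    the instantaneous bit rate (IBR) generated by the encoder."""
    if ibr > br_target:
        qp = min(qp + 1, qp_max)   # over budget: quantise more coarsely
    elif ibr < 0.9 * br_target:
        qp = max(qp - 1, qp_min)   # comfortably under budget: improve quality
    return qp
```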
In an embodiment of the present invention, the CVDS comprises a plurality of layers, including the base layer and enhanced layers, in which each layer comprises a sequence of frames which may be of the same type (intra-coded or inter-coded) or a mixture of types. Each layer can be carried via an RTP session. Preferably, more than one wireless channel is available and each wireless channel is used to transmit an RTP session. In this embodiment, in conjunction with the above error control operations, the error control also includes mapping the CVDS layers onto the wireless channels such that the base layer is mapped onto the wireless channel with the best transmission quality.
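The mapping of layers onto wireless channels described above can be sketched as follows; ranking channels by bit error rate alone is an assumption made for the sketch (other QoS parameters could equally be used):

```python
def map_layers_to_channels(layers, channels):
    """Assign CVDS layers to wireless channels so that the base layer
    gets the best channel (illustrative).

    layers   -- list of layer names, base layer first
    channels -- list of (channel_id, bit_error_rate) pairs, one per layer
    """
    ranked = sorted(channels, key=lambda c: c[1])  # lowest BER first
    return {layer: ranked[i][0] for i, layer in enumerate(layers)}
```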
BRIEF DESCRIPTION OF DRAWINGS
Preferred embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings, in which: Figure 1 is a simplified schematic diagram of a UMTS communication system showing a network and two mobile handsets, where one handset is the transmitter and the other a receiver, for streaming video;
Figure 2 is a simplified schematic diagram of a UMTS communication system showing a network and two mobile handsets, where each handset is simultaneously a transmitter and a receiver for conversational video;
Figure 3 shows the construction of a UDP/IP packet containing MPEG-4 video payload data;
Figure 4 shows the construction of a UDP/IP packet containing RTCP data;
Figure 5 shows the construction of multiple substreams with a common IP address and different UDP addresses per substream;
Figure 6 shows the construction of a TCP/IP packet containing RTSP and SDP data;
Figure 7 shows the construction of a TCP/IP packet containing SIP and SDP data;
Figure 8, Figure 10 and Figure 12 show single and multiple RTP/RTCP sessions and an RTSP session (also labelled more generically as IP Sessions), mapped onto single and multiple wireless channels for streaming video;
Figure 9, Figure 11 and Figure 13 show single and multiple RTP/RTCP sessions and a SIP session (also labelled more generically as IP Sessions), mapped onto single and multiple wireless channels for conversational video;
Figure 14 shows Quality of Service (QoS) parameters associated with the wireless channels between mobile handsets and networks, and the sending of these and other QoS parameters between the mobile handsets via IP sessions for streaming video;
Figure 15 shows QoS parameters associated with the wireless channels between mobile handsets and networks, and the sending of these and other QoS parameters between the mobile handsets via IP sessions for conversational video;
Figure 16 and Figure 17 show typical mappings of compressed video data stream frames onto base and enhanced substreams;
Figure 18 is a flowchart showing sequential operations performed by the encoder controller;
Figure 19 is a flowchart showing sequential operations performed by the decoder controller; and
Figure 20 is a flowchart showing sequential operations performed by the encoder.
DETAILED DESCRIPTION OF PREFERRED AND OTHER EMBODIMENTS
The preferred embodiment of the present invention is applied to a network and associated mobile handsets designed to operate under the current GPRS or proposed UMTS standard.
Referring to the drawings, and Figure 1 in particular, there is shown a UMTS network 100 that is used to establish an end to end link between a first mobile handset 102 and a second mobile handset 104. The communication session between the first mobile handset 102 and the second mobile handset 104 is unidirectional. Mobile handset 102 is acting solely as a transmitter of compressed video data, whilst mobile handset 104 is acting solely as a receiver of compressed video data. This is termed the streaming arrangement.
In the embodiment shown in Figure 2, the communication session between the first mobile handset 142 and the second mobile handset 144 is bi-directional. Mobile handset 142 is acting both as a transmitter and a receiver of compressed video data, and mobile handset 144 is also acting as a transmitter and receiver of compressed video data. This is termed the conversational arrangement. In Figure 2 the functions and operations of the second mobile handset 144 are identical to the functions and operations of the first mobile handset 142.
It will be appreciated that the transmitter function and operation of mobile handsets 142 and 144 is identical to that of the transmitter function and operation of mobile handset 102. It will also be appreciated that the receiver function and operation of mobile handsets 142 and 144 is identical to that of the receiver function and operation of mobile handset 104. In the case of mobile handsets 142 and 144, the receiver and transmitter functions and operations are present within the same mobile handset. In mobile handsets 102 and 104 only the transmitter or the receiver function is present respectively. It will be appreciated that the mobile handsets 142 and 144 can also operate exclusively as transmitters or receivers to produce the streaming arrangement, in addition to their conventional conversational arrangement, if so configured.
In Figure 1 the first mobile handset includes a transmitter controller 108, a Real Time Streaming Protocol (RTSP) server 116, an encoder 106, a Real Time Protocol (RTP) packetiser 117, a Real Time Control Protocol (RTCP) client 119 and a transmitter air interface 110, which are operatively interconnected with each other as shown. The encoder 106 accepts raw video data (RVD) from a video source such as a camera (not shown) associated with the first mobile handset 102 and encodes it into a compressed video data stream (CVDS) format, as discussed in detail below. This is then packetised by the RTP packetiser 117. According to the normal operation of a mobile communications network, the transmitter air interface 110 establishes a wireless channel with a transmit side air interface 112 in the network 100, which in turn is in communication with a network backbone 114.
The network 100 also includes a receive side air interface 118 that establishes a wireless channel with a receiver air interface 120 disposed in the second mobile handset 104. The second mobile handset 104 also includes a receiver controller 122, an RTSP client 123, an RTP depacketiser 125, an RTCP server 128 and a decoder 124. These are operatively interconnected with each other as shown.
In Figure 2 the mobile handsets 142 and 144 include all the components that are present in the mobile handset 102 for the transmitter function, with the exception of the RTSP server 116, and in the mobile handset 104 for the receiver function, with the exception of the RTSP client 123. The RTSP server 116 is replaced by a SIP User Agent (UA) 146 in mobile handset 142 and the RTSP client 123 is replaced by a SIP UA 148 in mobile handset 144. The components of the transmitter function, the components of the receiver function, the User Agent and the air interface for mobile handsets 142 and 144 are operatively interconnected with each other as shown. It will be appreciated that the function and operation of the mobile communications network shown in Figure 2 is as described above.
In a UMTS network, the mobile handsets 102, 104, 142 and 144 are designated User Equipment (UE), the air interface elements 112 and 118 correspond to the Universal Terrestrial Radio Access Network (UTRAN), the backbone element 114 corresponds to the Core Network (CN).
According to normal mobile communications network operation, an end to end link is established between the first and second mobile handsets 102, and 104 (or 142 and 144), comprising a first wireless channel 126 between the first mobile handset and the network and a second wireless channel 127 between the network and the second mobile handset. Wireless channels are established using different frequencies and/or spreading codes and/or time slots in a manner well known in the mobile communications art. They allow for bi-directional communication, both for data and control information.
As part of the wireless channel control information, the wireless channel 126 between the transmitter air interface 110 and the transmit side air interface 112 carries Quality of Service parameters (QoS), from the network 100 to the first mobile handset 102 (or 142). Similarly, the wireless channel 127 between the air interface 118 and the receiver side air interface 120 carries Quality of Service parameters (QoS), from the network 100 to the second mobile handset 104 (or 144).
The packetised CVDS is transmitted over the end to end link defined between the two mobile handsets 102 and 104 (or 142 and 144), across wireless channels 126 and 127. In the preferred embodiment, the CVDS takes the form of an MPEG-4 stream, but other suitable streaming formats, such as H.263, can also be used. Both of these standards are applicable to variable bitrate and low bitrate video, e.g. bitrates of 10kbps or higher. It is particularly preferred that the transmission be in RTP format, as discussed in detail below in relation to Figure 3, Figure 4 and Figure 5.
Turning to Figure 3, the packetisation of the raw MPEG-4 data for transmission is shown. This packetisation takes place in the first mobile handset 102 or 142 under the control of the transmitter controller 108 before sending the packets to the transmitter air interface 110. Upon emerging from the wireless network of the transmitter the packets travel over a packet switched network to the wireless network of the receiver. Here the packets are sent to the receive side air interface 118 and on to the second mobile handset 104 or 144 via wireless channel 127.
Figure 3 shows the packetisation layers for a single MPEG-4 packet 200 in the form in which it leaves the encoder 106. It will be appreciated that a stream of such packets will be generated from the incoming RVD. The MPEG-4 video data 200 is wrapped in an RTP format packet layer 201. This, in turn, is wrapped in a UDP format packet layer 202, which in turn is packetised into an Internet Protocol (IP) packet 203. It is this IP packet 203 that is presented by the RTP packetiser 117 to the transmitter air interface 110 for transmission over the wireless channel. This layering format, defined by IETF, is known and is included in the developing UMTS standard for video transmission.
Each of the packetisation layers of the packet is directed to a particular part of the overall communication. They will not be described in detail because they are already known in the art and conform to the respective standards. However the principal component of each packet will be described insofar as is necessary to understand the embodiments of the invention that follow.
The MPEG-4 layer 200 contains the coded video data. The RTP layer 201 contains sequence numbers, time stamps, and payload bits that enable a depacketiser and decoder to decode it and replay it at the correct time and in the correct sequence in relation to other packets from the same stream. The UDP layer is used for asynchronous communication of the data over the wireless communications channel and is a "best effort" connectionless protocol. The IP packet 203 contains an IP address which identifies the mobile receiver 104 or 144 as the destination. The IP packet header may also contain a Differentiated Services Code Point (DSCP) which could be used by a diffserv-enabled core network to determine how that packet should be forwarded by nodes inside that network.

Figure 4 shows the packetisation layers for a single RTCP packet 205 in the form in which it leaves the RTCP server 128. The RTCP packet is wrapped in a UDP format packet layer 206, and packetised into an IP packet 207.
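The RTP layer's sequence number, timestamp and payload type fields described above follow the fixed header layout of RFC 3550, which can be packed as follows. The payload type 96 is a dynamic value commonly mapped to MPEG-4 video via SDP, assumed here for illustration:

```python
import struct

def rtp_header(seq, timestamp, ssrc, payload_type=96, marker=0):
    """Pack the 12-byte RTP fixed header (RFC 3550): version 2,
    no padding, no extension, no CSRC list."""
    byte0 = 2 << 6                       # V=2, P=0, X=0, CC=0
    byte1 = (marker << 7) | payload_type
    return struct.pack("!BBHII", byte0, byte1, seq, timestamp, ssrc)
```

The UDP and IP layers would in practice be added by the handset's socket stack rather than by application code.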
Turning to Figure 5, the CVDS can be transmitted over the wireless channels in one or multiple substreams, each transported by an RTP session, (and an associated RTCP session) where these are mapped to one or multiple wireless channels that may have different quality parameters. In this case the IP address 208 is common to both substreams. Routing through the transmission chain is achieved by characterising different substreams by different socket numbers 210 and 212 in the UDP address.
It will be understood that some of the receiving handsets, or, indeed, the transmitting handset may not be capable of forming a multi wireless channel connection with the network. This may be because of equipment incompatibilities or network resource issues, for example. In this case, it is still possible for the video layers to be allocated to multiple wireless channels in accordance with the above embodiment, whilst multiplexing the video onto a single wireless channel for those not capable of forming the requisite multi wireless channel connection.
In the streaming arrangement of Figure 1, the IP packet of Figure 3 is transmitted directly from the first mobile handset 102, via the network 100, to the second mobile handset 104.
Once received at the receiver air interface 120, the packets are forwarded to the RTP depacketiser 125, where the MPEG-4 data 200 is re-constructed. The packets must be re-ordered using RTP layer data 201 such as frame timestamps and the data from the plurality of substreams must be re-assembled. The reconstructed MPEG-4 data 200 is then sent from the RTP depacketiser 125 to the decoder 124, where it is decoded for replay on, for example, a visual display (not shown) on the second mobile handset 104.
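The re-ordering step can be sketched as follows; for brevity the sketch keys on the RTP sequence number and ignores 16-bit sequence-number wrap-around:

```python
def reorder_packets(packets):
    """Restore decode order for received RTP packets (illustrative).

    packets -- list of (sequence_number, payload) pairs as received,
               possibly out of order.
    """
    return [payload for _, payload in sorted(packets)]
```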
In the return direction (i.e., from the second mobile handset 104 to the first mobile handset 102 in this example), control information is returned in accordance with the known RTCP and RTSP protocols, with the latter using the known Session Description Protocol (SDP). Figure 8 and Figure 10 show the IP sessions between a streaming server and client. The RTCP sessions provide feedback on the data transmission quality for each RTP session. RTSP additionally provides an overriding control connection from RTSP client 123 to RTSP server 116, using SDP to provide a description of the connection between client and server. It is known in the art that, when a session changes, an RTSP control packet containing a new SDP packet is sent to the remote entity.
The packetisation of the SDP and RTSP information is shown in Figure 6. The SDP information 306 is wrapped in an RTSP packet 300. An RTSP packet 300 is wrapped in a Transport Control Protocol (TCP) packet 302, which is within an IP packet 304. This packet is built up by the RTSP client 123 and supplied to the receiver air interface 120 for transmission to the receive side air interface 118 of the network 100. The destination address of this RTSP packet is that of the first mobile handset 102.
In the preferred embodiment shown in Figure 1, the RTSP packet passes to the transmit side air interface 112 via the backbone 114 for transmission to the transmitter air interface 110, to the RTSP server 116 and thence to the transmitter controller 108. Also, control information is exchanged between both mobile handsets 102 and 104 in accordance with the known RTCP protocol.
In the conversational arrangement of Figure 2 the IP packet of Figure 3 is transmitted and received between the mobile handsets 142 and 144, via the network 100, in the same way as for the streaming arrangement described above. The encoded video packet is also processed in the same way as for the streaming arrangement described above.
For the control of bi-directional communications sessions the Session Initiation Protocol (SIP) is used instead of RTSP. This is managed by user agents 146 and 148 in mobile handsets 142 and 144 respectively. SIP uses SDP to provide a description of the connection between two peer user agents. It is known in the art that, when a session changes, a SIP control packet containing a new SDP payload is sent to the peer user agent.
Figure 9 and Figure 11 show the IP sessions between two peer user agents 146 and 148. The SIP session provides an overriding connection control between user agents 146 and 148 using the SDP. Again, control information is also exchanged between both mobile handsets 142 and 144 in accordance with the known RTCP protocol. The RTCP session will provide feedback on the data transmission quality for each RTP session.
The packetisation of the SIP information is shown in Figure 7, where the packet is denoted 308. An SDP payload 307 is encapsulated within the SIP packet 308. A SIP packet 308 is wrapped in a User Datagram Protocol (UDP) packet 310, which is within an IP packet 305. This packet is built up by the user agent 148 and supplied to the air interface 127 for transmission to the receive side air interface 118 of the network 100. The destination address of this SIP packet is that of the other mobile handset 142. It will be understood that user agents 146 and 148 are interchangeable in this scenario.
In the preferred embodiment shown in Figure 2, the SIP packet passes to the transmit side air interface 112 via the backbone 114 for transmission to the transmitter air interface 126 and thence to the transmitter controller 108 via the user agent 146.
In use, a user of, say, the first mobile handset, 102 or 142, places a call to the second mobile handset 104 or 144, by dialling the second handset's mobile number.
The first mobile handset's number is mapped to a first IP address taken from a pool of IP addresses, and the second mobile handset's number is mapped to a second IP address taken from the pool of IP addresses. This mapping persists for as long as the connection is maintained. Once the connection is broken, usually because one or both of the users hang up, the mapping is removed and the IP addresses are returned to the pool for reuse. It will be appreciated that this arrangement means that packets can be routed using the allocated IP addresses instead of the phone numbers.
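The number-to-address mapping described above can be sketched as a simple pool; the class and method names are illustrative only:

```python
class AddressPool:
    """Map phone numbers to IP addresses for the duration of a call."""

    def __init__(self, addresses):
        self.free = list(addresses)   # addresses available for allocation
        self.active = {}              # number -> allocated IP address

    def connect(self, number):
        ip = self.free.pop(0)
        self.active[number] = ip
        return ip

    def disconnect(self, number):
        # Return the address to the pool for reuse once the call ends
        self.free.append(self.active.pop(number))
```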
There follows a description of the basic elements of call set-up at the wireless level to aid understanding of the invention.
When a user requests a call, the requested quality class of the wireless channel is communicated to the network 100. In particular, the first mobile handset 102 or 142 can request a particular QoS from the UMTS network, which specifies, for example, guaranteed and maximum bitrates. On this basis, and assuming there are sufficient network resources available, a wireless communications channel is established between the first mobile handset 102 or 142 and the network 100, the wireless channel having defined QoS criteria.
Alternatively the first mobile handset 102 or 142 might request the network resources as a number of wireless channels each with associated QoS (see later for multiple wireless channel discussion).
The second mobile handset 104 or 144 must similarly establish a connection with the network 100, establishing a wireless communications channel with QoS criteria independent of those of the wireless channel established by the first handset 102 or 142.
Once the wireless communications channel or set of channels is established, video data from, say, a camera (not shown) associated with the first mobile handset 102 or 142 is received by the encoder 106, which in turn generates a sequence of CVDS video data. The video data is sent to the RTP packetiser 117, where it is packetised as described above and sent to the second mobile handset as described above. The receiving mobile handset is mobile handset 104 if mobile handset 102 is the transmitter; mobile handset 144 if mobile handset 142 is the transmitter; and mobile handset 142 if mobile handset 144 is the transmitter.
As other users make and break wireless communications channels and the network 100 continuously monitors user resource allocations, it can be the case that available bandwidth for any particular call changes, either increasing or decreasing. If a user has a multiple wireless channel allocation, one or more of the wireless channels may be terminated. It will be understood that these bandwidth changes can take place at a number of points along a given wireless communications channel. For example, it could take place between the first mobile handset 102 (or 142) and the network, or between the network and the second handset 104 (or 144).
Other factors, such as the distance of a handset from the base station with which it is communicating, or strong multi-path reflections, can also cause the effective bandwidth or quality of a wireless channel to be reduced. Any change in effective bandwidth or quality can have two consequences. If the bandwidth increases it could in principle be possible to transmit video data at a higher bitrate. If it decreases, however, the rate at which video data can reliably be transmitted also decreases, possibly below the value that was set at the start of transmission. To accommodate these consequences, the QoS parameter set, including the available bitrate and bit error rate on the wireless channel between the first mobile handset 102 or 142 and the network 100, is monitored by the controller 108 at the first mobile handset 102 or 142.
In the preferred form, at session initiation and when wireless channel characteristics are to be modified, the UMTS network provides the first mobile handset with a QoS parameter set that is indicative of the available bitrate (i.e. bandwidth) on the wireless communications channel. Such a QoS parameter set is supplied through the protocol stack in known wireless systems from the air interface 112 of the network to the air interface 110 of the transmitting mobile handset 102 or 142. It is normally supplied across a wireless control channel on the downlink of the call.
According to the developing UMTS standard, the QoS parameter set is indicative of various transmission parameters, including the transmissible bitrate over the wireless channel, the signal to noise ratio, the error rate and a priority indicator, which is an indication provided from the network to the transmitting mobile handset of the likely priority to be placed on the call. This is therefore an indicator of the bandwidth and likely reliability of the wireless communication channel that has been opened for the particular call. It will be appreciated in this context that the word "call" is used herein to describe the transmission of video data as well as, or instead of, voice data. The QoS parameter set is read at the mobile handset 102 or 142 by the controller 108 and the transmissible bitrate is extracted from it.
The quality of the wireless communication channel between the network 100 and the second mobile handset 104 or 144 is also monitored. In the preferred form, a QoS parameter set indicative of, amongst other things, the available bandwidth for the RTP session mapped onto this wireless channel is ascertained in the second mobile handset 104 or 144, being derived from the wireless control channel information it receives from the network according to the relevant wireless standard (in this case UMTS).
The QoS parameter set is dealt with at the second mobile handset 104 or 144 in a novel manner. The session control protocols (bi-directional using SIP and unidirectional using RTSP) have already been discussed. SDP provides for the exchange and updating of session description information such as codec and bitrate.
For the streaming arrangement (shown in Figure 1), according to the existing RTSP standard, various control parameters are conveyed by the RTSP packets including, for example, video commands such as PLAY and PAUSE. The standard also provides an ANNOUNCE instruction. The system described herein uses the ANNOUNCE provision in the RTSP standard to cause elements of the QoS parameter set determined in the wireless environment and/or other derived parameters to be placed into an SDP payload which itself is placed in an RTSP packet for transmission from the second mobile handset 104 to the mobile handset 102. The thus constructed novel packets are transmitted by the RTSP client 123, via the receiver air interface and the received side air interface to the network backbone 114. From here they travel to the RTSP server 116 via the transmission side air interface 112 and the transmitter air interface 110.
For the conversational arrangement (shown in Figure 2), according to the existing SIP standard, SDP payloads are conveyed by the SIP packets to control the bidirectional communication between two mobile handsets. A session is initiated using the INVITE instruction, which itself contains a session description in the SDP format. The standard provides for a session to be modified by either agent by issuing a subsequent INVITE instruction. The system described herein uses the re-INVITE provision in the SIP standard to cause the quality parameters determined in the wireless environment and/or other derived parameters to be placed into an SDP payload which itself is placed in a SIP packet for transmission from the receiving mobile handset 144 to the transmitting mobile handset 142. The thus constructed novel packets are transmitted by the session control agent 148, via the receiver air interface 120 and the received side air interface 118 to the network backbone 114. From here they travel to the session control agent 146 via the transmission side air interface 112 and the transmitter air interface 110.
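The construction of such a QoS-carrying SDP payload may be sketched, purely illustratively, as follows. The "b=AS:" bandwidth line is standard SDP; the "a=x-ber:" attribute, the function name and the session values are assumptions introduced here for illustration and are not prescribed by the description above.

```python
# Illustrative sketch: placing wireless QoS values into an SDP payload for
# carriage in a SIP re-INVITE (or an RTSP ANNOUNCE).  "b=AS:" is the
# standard SDP application-specific bandwidth line; "a=x-ber:" is an
# assumed private extension attribute for the bit error rate.

def build_sdp(session_name, bitrate_kbps, ber):
    lines = [
        "v=0",
        "o=- 0 0 IN IP4 0.0.0.0",
        f"s={session_name}",
        "t=0 0",
        "m=video 49170 RTP/AVP 96",
        f"b=AS:{bitrate_kbps}",   # available bitrate on the receive side, kbps
        f"a=x-ber:{ber:g}",       # hypothetical bit error rate attribute
    ]
    return "\r\n".join(lines) + "\r\n"

sdp = build_sdp("qos-update", bitrate_kbps=32, ber=1e-4)
```

The resulting text block would form the SDP body placed inside the SIP (or RTSP) packet sent from the receiving handset back to the transmitting handset.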
It can be understood that user agents 146 and 148 are interchangeable in this scenario. For both streaming and conversational arrangements the RTP sessions carrying the video data each have associated RTCP sessions carrying control information back to the transmitter. The system described herein can, in addition to or instead of using RTSP or SIP, use the RTCP application defined (APP) packet to transfer application data (in this case the wireless and other derived QoS parameters) from the receiver to the transmitter.
In both the streaming (Figure 1) and conversational (Figure 2) arrangements, when a session control packet is received at the transmitting mobile handset, this, along with locally derived QoS parameters, can be used to modify the encoded bitrate as already discussed above. In this way, the bitrate of the video stream transmitted from the encoder can be adapted to the wireless channel between a transmitting mobile handset and the network and also to the wireless channel between the network and a receiving mobile handset. For the streaming arrangement, referring now to Figure 8 and Figure 10, there are shown two ways of transmitting a CVDS from an encoder 106 to a decoder 124 using RTP between an RTP packetiser 117 and an RTP depacketiser 125. RTCP control messages are sent between RTCP server 128 and RTCP client 119. RTSP control messages are sent between an RTSP server 116 and an RTSP client 123. Figure 8 shows a prior art method of streaming a CVDS transported by an RTP session 804 and associated RTCP session 805. RTSP control messages are sent via RTSP session 806. It will be appreciated that at least some part of the end to end communications channel is wireless. In Figure 8 the RTP session containing all the frames of the video stream and the RTCP session containing all RTCP control messages are multiplexed onto a single wireless channel, with the CVDS parameters and frame sequencing being derived according to the QoS parameters of the wireless channel at call set up.
Figure 9 shows the prior art for conversational arrangement. Each mobile handset is depicted with an RTP packetiser (117 or 134), an RTP depacketiser (133 or 125) and a user agent (146 or 148). For a conversational arrangement, a CVDS is transmitted from RTP packetiser 117 to RTP depacketiser 125 using RTP session 814 and RTCP session 815 simultaneously with a CVDS transmitted from RTP packetiser 134 to RTP depacketiser 133 using RTP session 816 and RTCP session 817. A single SIP session 818 controls all the RTP/RTCP sessions. In Figure 9, as in Figure 8, the RTP sessions containing all the frames of the video stream, the RTCP sessions corresponding to these and the SIP session containing all of the SIP control messages are multiplexed onto a single wireless channel, with the CVDS parameters and frame sequencing being selected according to the QoS parameter of the wireless channel at call set up.
Figure 10 shows a preferred embodiment of the invention for the streaming arrangement, in which there is a plurality of wireless channels. In the example shown there are three wireless channels. RTP/RTCP sessions 824/825 and 826/827 are used for transmitting the two substreams of a CVDS and RTSP control messages are sent via RTSP session 828. The wireless channels are provided at call set up as previously described in a situation where the encoder 106, RTP packetiser 117, RTCP client 119 and RTSP server 116 are in a first mobile handset, and the decoder 124, RTP depacketiser 125, RTCP server 128 and RTSP client 123 are in a second mobile handset in accordance with requested QoS and bandwidth parameters according to the UMTS standard.
As with previous embodiments, the CVDS substreams are transported via RTP between the RTP packetiser 117 and RTP depacketiser 125, RTCP control messages are sent between RTCP server 128 and RTCP client 119 and control messages are transmitted via an RTSP session between the RTSP server 116 and RTSP client 123. As shown in the example of Figure 10, one RTSP session covers all the RTP/RTCP sessions between any two entities while each CVDS substream requires an individual RTP session and an associated RTCP session.
Figure 11 shows a preferred embodiment of the invention for the conversational arrangement, in which there is a plurality of wireless channels. In the example shown there are three wireless channels. RTP/RTCP sessions 834/835 and 836/837 are used for transmitting the two substreams of a CVDS generated by the encoder 106. RTP/RTCP sessions 838/839 and 840/841 are used for transmitting the two substreams of a CVDS generated by the encoder 129. The SIP control messages are sent via SIP session 842. Figure 11 shows that the base layers produced by encoders 106 and 129, carried by RTP sessions 834 and 838 and packetised by RTP packetisers 117 and 134, use the up and down links of the same wireless bearer at each mobile handset. Similarly, the enhanced layers from the encoders 106 and 129, carried by RTP sessions 836 and 840 and packetised by RTP packetisers 117 and 134, use the up and down links of the same wireless bearer at each mobile handset. It will be appreciated that there are many other possible mappings of RTP/RTCP sessions to wireless bearers. The wireless channels are provided at call set up as previously described and in accordance with requested QoS and bandwidth parameters according to the UMTS standard. As shown in the example, one SIP session 842 covers all the RTP/RTCP sessions between any two entities while each CVDS substream requires an individual RTP session and an associated RTCP session.
Since wireless channels for the transmit side and receive side are allocated separately there is no guarantee that the number of transmit side and receive side wireless channels will be the same. In the streaming arrangement shown in Figure 12 there is one transmit side wireless channel and three receive side wireless channels. On the receive side RTP/RTCP sessions 844/845 and 846/847 and RTSP session 848 are each mapped to a separate wireless channel. On the transmit side all IP sessions 844-848 are mapped to the same wireless channel.
Figure 13 shows the situation for the conversational arrangement, where for the handset containing user agent 146 there is only one wireless channel, whilst for the handset containing user agent 148 there are three wireless channels. In the handset containing user agent 148 the RTP/RTCP sessions 853/854 and 857/858 are mapped to the same wireless channel. However the RTP/RTCP sessions 851/852 and 855/856 are mapped to another wireless channel and the SIP control session 859 is mapped to yet another wireless channel. In the handset containing user agent 146 all IP sessions of all types 851-859 are mapped to the same wireless channel. Under the proposed UMTS standard (and others), wireless channels can be defined in terms of a number of quality of service parameters, such as priority, maximum and guaranteed bandwidth, residual bit error rate and delay.
In the embodiment of Figure 10, the first RTP session 824 is defined as carrying the base substream, having an example bitrate of 16kbps, whilst the second RTP session 826 has a bitrate of 32kbps. The first wireless channel has the lowest bitrate but highest priority and the base substream is allocated to it. The enhancement substream is allocated to the second wireless channel, since it has the lower priority. This ensures that the most important video data is allocated to the wireless channel with the highest priority. The first RTP session can also be marked with the highest priority DSCP for prioritised transport over the IP component of a diffserv enabled core network.
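The priority-based allocation just described — most important substream onto the highest-priority channel — can be sketched as follows. The data structures, field names and example values are assumptions for illustration only, not part of the described method.

```python
# Illustrative sketch of substream-to-channel allocation: the base
# substream (most important) is mapped to the highest-priority wireless
# channel, the enhancement substream to the next channel, and so on.

def allocate_substreams(substreams, channels):
    """Map substreams to channels so importance matches channel priority."""
    by_importance = sorted(substreams, key=lambda s: s["importance"])
    by_priority = sorted(channels, key=lambda c: c["priority"])
    return {s["name"]: c["name"] for s, c in zip(by_importance, by_priority)}

substreams = [
    {"name": "base", "bitrate_kbps": 16, "importance": 0},
    {"name": "enhancement", "bitrate_kbps": 32, "importance": 1},
]
channels = [
    {"name": "ch1", "priority": 1},  # highest priority, lowest bitrate
    {"name": "ch2", "priority": 2},  # lower priority, higher bitrate
]
mapping = allocate_substreams(substreams, channels)
```

With these example inputs the base substream is allocated to the highest-priority channel, mirroring the 16kbps/32kbps example in the text.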
As discussed above in relation to other embodiments, the allocation of resources within a UMTS network is dynamic, and this can mean that the bandwidth allocated to either of the RTP sessions can fluctuate with (amongst other things) network load. In the preferred embodiment shown in Figure 1 and Figure 2, the bandwidth available for each wireless channel is known to the transmitter controller 108 as it monitors the network messages at the transmitter air interface 110. In the event that the available bandwidth on one or more of the wireless channels is commanded by the network 100 to be reduced, an assessment is made as to whether it is desirable to reallocate the frames between substreams. For example, assuming the substream frame structure of Figure 16, if the second RTP session bandwidth fell to, say, 16kbps an assessment would be made to determine whether to simply reduce the number of P and B frames generated and leave other substreams unchanged, or whether a better overall quality would be achieved by including some of the P and B frames on the base substream at the expense of, say, reduced resolution in the I frames.
It is not necessarily the case that reallocation will happen automatically and immediately without any assessment of context. In one form, the preferred embodiment is configured to maintain a history of wireless channel behaviour in relation to the quality parameter. As an example, a sudden drop in bandwidth on a wireless channel to which a relatively high priority substream or frame type is mapped may not be a trigger for the frame mapping to be changed. If there is a history of short-term bursts of bandwidth loss, it is likely that the higher bandwidth will be available again shortly, and it may ultimately be more efficient to allow the short-term reduction to be ignored. Typically, an assessment of this type will be made by the transmitter controller 108. It will be understood that quite sophisticated proportional, integral and differential factors can be taken into account to build a relatively sophisticated model of any wireless channel's behaviour (and likely future behaviour) over time. Such modelling is well known to those skilled in the relevant art, and so is not described further here.
Similar history data can be collected for the other types of quality data collected in earlier embodiments of the invention, and similarly used to make decisions about how and when to alter the outputs of, for example, the encoder. In general, if there is a history of short-term bandwidth problems, then it may be more efficient, or may provide a visibly better overall streamed video image, if the bitrate out of the encoder is not immediately altered when the bandwidth initially drops. Rather, it will in some cases be preferable to wait until the bandwidth has remained low for a predetermined time period or number of frames before changing the output of the encoder.
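One minimal form of this "wait before reacting" policy can be sketched as follows: the encoder's target bitrate is lowered only once the reported channel bandwidth has stayed below the current target for a number of consecutive QoS reports. The class name, window length and example values are illustrative assumptions.

```python
# Illustrative sketch: only adapt the encoder target bitrate after a
# sustained bandwidth drop, ignoring short-term bursts of bandwidth loss.

class BandwidthFilter:
    def __init__(self, target_kbps, window=3):
        self.target_kbps = target_kbps  # current encoder target bitrate
        self.window = window            # reports required to confirm a drop
        self.low_count = 0              # consecutive low-bandwidth reports

    def report(self, available_kbps):
        """Process one QoS report; return the (possibly updated) target."""
        if available_kbps < self.target_kbps:
            self.low_count += 1
            if self.low_count >= self.window:   # sustained drop: adapt
                self.target_kbps = available_kbps
                self.low_count = 0
        else:
            self.low_count = 0                  # burst over; history resets
        return self.target_kbps

f = BandwidthFilter(target_kbps=32, window=3)
f.report(16)              # single dip: ignored
f.report(32)              # bandwidth recovers, history resets
for _ in range(3):
    f.report(16)          # sustained drop: target adapts to 16 kbps
```

A production controller would replace the simple counter with the proportional, integral and differential modelling mentioned above.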
Wireless channels between mobile handset and network have a certain QoS, which is provided for the mobile user of a network service. In an embodiment of the invention, a set of QoS parameters including Bitrate (BR) and Bit Error Rate (BER) are used for controlling the video encoder. These QoS parameters are conveyed between the encoder controller and the decoder controller via IP sessions. Referring to Figure 14, a wireless channel between transmitter 102 and network has QoS parameters BR and BER. A wireless channel between network and receiver 104 has QoS parameters BR' and BER'. The encoder controller 108 sends BR to decoder controller 122 via an RTSP session 866. Having received BR from the encoder controller 108, the decoder controller 122 sends BER' and the calculated Request Bitrate (RBR) to the encoder controller 108 via an RTSP session 866 or RTCP session 865. The encoder controller and the decoder controller will be discussed in detail below. The encoder controller 108 is used to control the video encoder 106 with the objective of improving the error resilience of video encoding while meeting the bitrate constraint of wireless channels.
In a preferred embodiment of the invention, the video encoder is an MPEG-4 or H.263 compliant encoder. The input video source is encoded into an MPEG-4 or H.263 compliant bit stream. The video data can be constructed using a plurality of different types of frames, which are referred to as I, P and B frames. I frames are self-contained still frames of video data, whereas P and B frames constitute intermediate frames that are predictively encoded. The precise composition of the frames varies in accordance with the particular standards and applications of the standards and is known per se.
MPEG-4 specifies a plurality of layers, including the base layer and enhanced layers, in which each layer is comprised of a sequence of frames which may be of the same type (I, P, B) or a mixture of types. As already described, in a wireless network the mobile may be allocated one wireless channel or a plurality of wireless channels. In the preferred form, each wireless channel is used to transmit a single RTP/RTCP session pair. Each RTP session carries an optimum sequence of I, P and B frames, known as a substream. The term "substream" is used rather than "layer" because the frame sequencing onto wireless channels can be varied dynamically and need not be one of the layer sequences predefined in MPEG-4 or other known video encoding standards. Other partitions of coded video data for error resilient purposes (e.g. Data Partitioning Modes of MPEG-4/H.263) are also possible and could be represented by substreams.
Figure 16 and Figure 17 illustrate example compositions of a video data stream in accordance with the MPEG-4 video standard. In the example of Figure 16, two temporally scalable substreams are used, with I frames on the base substream while P and B frames are carried on the enhanced substream. In the illustrated example in Figure 17, only the base substream is used, comprising interleaved I and P frames. The encoder controller can thus control the bitrate of the video data stream for each wireless channel by manipulation of the number of substreams used for transmission, and the number and type of frames per substream. Video encoding under the MPEG-4 or H.263 standards operates on a frame-by-frame basis. Each frame is divided into either Groups of Blocks (GOBs) or slices. A GOB comprises macroblocks of one or several rows in a video frame. Slices are more flexible and can partition the frame into a variable number of macroblocks. A macroblock in turn comprises four luminance and two spatially corresponding colour difference blocks of image data. All blocks in an I-frame are intra-coded. Blocks in an inter-coded P or B-frame can be a mixture of intra-coded blocks (I-blocks) and inter-coded blocks (P-blocks or B-blocks).
Increasing the I-block/P-block ratio (Ib/Pb) in P-frames or the I-block/B-block ratio (Ib/Bb) in B-frames has two consequences: (1) it improves the error resilience, because more intra-coded blocks result in less error propagation; and (2) it increases the bitrate, because inter-coded blocks comprise substantially smaller amounts of data than intra-coded blocks. The encoder controller controls the encoder to make the best use of the wireless channel for error resilient video encoding.
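The bitrate side of this trade-off can be made concrete with a simple estimate: for a fixed number of blocks per frame, the frame size grows as the share of intra-coded blocks grows. The function name and the per-block sizes below are assumed example values, not figures from the standards.

```python
# Illustrative sketch of the resilience/bitrate trade-off: an intra-coded
# block carries more data than an inter-coded one, so raising the intra
# share in a P-frame raises the frame size (and hence the bitrate).

def p_frame_bits(total_blocks, intra_fraction,
                 i_block_bits=800, p_block_bits=200):
    """Estimate P-frame size for a given fraction of intra-coded blocks."""
    i_blocks = int(total_blocks * intra_fraction)
    p_blocks = total_blocks - i_blocks
    return i_blocks * i_block_bits + p_blocks * p_block_bits

# 396 macroblocks corresponds to a CIF frame; the fractions are examples.
low = p_frame_bits(396, intra_fraction=0.1)   # few intra blocks
high = p_frame_bits(396, intra_fraction=0.5)  # many intra blocks
```

The controller's task is to pick the largest intra share (best resilience) whose estimated frame sizes still fit the target bitrate.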
Error control can also be achieved by allocating GOBs or slices within a frame, wherein the header of each GOB or slice can serve as a synchronisation marker for the decoder to regain synchronisation.
Figure 18 illustrates the operation of the encoder controller 108. The encoder controller operates by a closed-loop process. In the first step 400, the encoder controller obtains relevant information from various sources: the BR and BER associated with the wireless channel between the encoder and the network from the air interface, the RBR and BER' from the decoder via an IP control session, the latency jitter (ΔL) of RTP packets from the RTCP client 119 and the instantaneous bitrate (IBR) from the encoder. In the second step 410, the encoder controller determines the target bitrate (BRtarget) and the frame type (FT) based on the BR, RBR and IBR. In the third step 420, the encoder controller determines the Ib/Pb ratio for P-frames, the Ib/Bb ratio for B-frames and the synchronisation marker rate Rsync for all frames. In the fourth step 430, the encoder controller determines the quantisation parameter (QP) and the frame rate (FR) for the frame based on the Ib/Pb or Ib/Bb ratio and BRtarget. In the fifth step 440, the encoder controller sends the encoding parameters FT, FR, Rsync, Ib/Pb or Ib/Bb, and QP to the encoder. In the sixth step 445, the encoder controller sends BR to the decoder via an IP control session. The encoder controller then returns to the first step 400.
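One pass of the Figure 18 loop can be sketched as below. The function name and the simple heuristics inside each step are assumptions introduced for illustration; the description above specifies which quantities are derived at each step, not the formulas used.

```python
# Illustrative sketch of one pass of the encoder-controller closed loop
# (steps 410-430 of Figure 18): QoS inputs in, encoding parameters out.

def encoder_controller_step(br, rbr, ibr, ber, ber_prime):
    # Step 410: the target bitrate is bounded by the local channel (BR)
    # and by the receiver's requested bitrate (RBR).
    br_target = min(br, rbr)
    frame_type = "P" if ibr <= br_target else "I"
    # Step 420: more intra blocks and denser sync markers as errors rise
    # (the scaling factor and threshold are assumed values).
    worst_ber = max(ber, ber_prime)
    ib_pb = min(1.0, worst_ber * 1e4)       # Ib/Pb ratio, capped at 1
    r_sync = 1 if worst_ber > 1e-4 else 4   # GOBs per sync marker
    # Step 430: coarser quantisation when the encoder is over budget.
    qp = 10 if ibr <= br_target else 20
    frame_rate = 15
    return {"FT": frame_type, "FR": frame_rate, "Rsync": r_sync,
            "Ib/Pb": ib_pb, "QP": qp, "BRtarget": br_target}

params = encoder_controller_step(br=64, rbr=32, ibr=30,
                                 ber=1e-5, ber_prime=1e-3)
```

In the loop of Figure 18, the returned parameters would be sent to the encoder (step 440) and BR sent to the decoder (step 445) before the next pass.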
Figure 19 illustrates the operation of the decoder controller 122. The decoder controller operates by a closed-loop process. In the first step 450, the decoder controller obtains relevant information from various sources: the BR' and BER' associated with the wireless channel between the decoder and the network from the air interface, and the BR associated with the wireless channel between the encoder and the network, received from the encoder via an IP session. In the second step 460, the decoder controller determines the ΔL of the RTP packets received. In the third step 470, the decoder controller calculates the RBR based on ΔL, BR and BR'. In the fourth step 480, the decoder controller sends the RBR and BER' to the encoder controller via an IP control session. The decoder controller then returns to the first step 450.

Figure 20 illustrates the operation of the encoder 106. The encoder operates on a frame-by-frame basis. In the first step 490, the encoder obtains the encoding parameters, including FT, FR, Rsync, Ib/Pb or Ib/Bb, and QP, from the encoder controller. In the second step 492, the encoder allocates GOBs or slices for inter-coded frames based on Rsync. In the third step 494, the encoder further allocates the I-block distribution within P or B-frames based on Ib/Pb or Ib/Bb. In the fourth step 496, the encoder encodes the frame using the above encoding parameters and adds it to the CVDS. In the fifth step 498, the encoder calculates the IBR and sends it to the encoder controller. The encoder then returns to the first step to process the next frame.
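The RBR calculation at the heart of the Figure 19 loop (step 470) can be sketched as below. The formula is an assumed heuristic, since the description states only that RBR is calculated from ΔL, BR and BR': start from the smaller of the two wireless-hop bitrates and back off when RTP latency jitter suggests congestion.

```python
# Illustrative sketch of the decoder controller's Request Bitrate (RBR)
# calculation: bounded by both wireless hops, reduced under high jitter.
# The jitter limit and the 20% back-off are assumed example values.

def decoder_controller_step(br, br_prime, delta_l_ms, jitter_limit_ms=50):
    """Compute the RBR to feed back to the encoder controller (step 470)."""
    rbr = min(br, br_prime)            # cannot exceed either wireless hop
    if delta_l_ms > jitter_limit_ms:   # jitter suggests queuing: ask for less
        rbr = int(rbr * 0.8)
    return rbr

# Transmit-side channel at 64 kbps, receive-side at 32 kbps, 80 ms jitter.
rbr = decoder_controller_step(br=64, br_prime=32, delta_l_ms=80)
```

In the loop of Figure 19, this RBR would then be sent with BER' back to the encoder controller in step 480.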
Although various aspects of the invention have been described with reference to specific embodiments, it will be appreciated by those skilled in the art that the invention can be embodied in many other forms.
DEFINITIONS
Unless the contrary intention clearly appears from the context in which certain words are used, the following definitions apply to words used in this specification: WIRELESS CHANNEL; a physical radio channel with associated QoS parameters, e.g. UMTS Radio Access Bearer.
END TO END LINK; an end to end communications link between transmitter and receiver containing a transmit side wireless channel and / or a receive side wireless channel;
IP SESSION; An IP communications session between two IP hosts carrying either control or application data. Examples are a RTP session, a RTCP session, a RTSP session and a SIP session. One or more IP sessions can be mapped to a wireless channel;
COMPRESSED VIDEO DATA STREAM (CVDS); an overall video stream, where the original stream of image frames is compressed by means of an encoder;
FRAME; a video encoder outputs a number of different frame types. These include full still images and derivative images that have different data transmission requirements, have different sensitivity to errors and may have dependencies on other frames. In the MPEG-4 video standard, frames are also known as Video Object Planes (VOPs);
SUBSTREAM; a CVDS may be split into a number of substreams for the purposes of transmission over a channel or plurality of channels. Each substream can be used as a means of transmitting a sequence of video frames that may be of different types. A substream is not necessarily the same as a layer as defined in video standards such as MPEG-4 and H.263. A substream could also be used to transmit any part of the coded frame data that can be successfully partitioned for error resilient purposes (e.g. DCT coefficients and motion vector data in data partitioning modes of MPEG-4/H.263). Each substream is transported by an RTP session and an associated RTCP session.
RTCP SERVER; an entity generating RTCP Receiver Reports based on the reception of RTP packets. These are sent to the transmitter of the RTP packets, where an RTCP client uses them.
RTCP CLIENT; an entity that uses RTCP Receiver Reports. These are sent from the receiver of the RTP packets, where an RTCP server generates them.

Claims

1. A method of generating a compressed video data stream (CVDS) from a video stream, for transmission from a transmitter to a receiver over a wireless channel within a communications network, the method including the steps of:
(a) determining values of one or more error rate parameters associated with the wireless channel;
(b) determining values of one or more bit rate parameters associated with the wireless channel;
(c) calculating values of one or more encoding parameters using the values of the error rate parameters and the bit rate parameters;
(d) controlling encoding of the video stream on the basis of the encoding parameters, thereby to generate the CVDS.
2. A method according to claim 1 , wherein step (a) includes ascertaining the error rate parameters associated with the wireless channel between the transmitter and the receiver, and step (d) includes controlling encoding of the video stream to improve an error resilience of the CVDS whilst meeting the bandwidth constraints of the wireless channel.
3. A method according to claim 2, where the transmitter is a mobile transmitter and the receiver is a network receiver, wherein the error rate parameters are derived from data taken from an air interface associated with the transmitter.
4. A method according to claim 2, where the transmitter is a network transmitter and the receiver is a mobile receiver, wherein the error rate parameters are derived from data taken from an air interface associated with the receiver and communicated via end to end feedback between the receiver and the transmitter.
5. A method according to claim 2, where both receiver and transmitter are mobile, wherein the error rate parameters are derived as in claims 3 and 4 and are combined in the transmitter.
6. A method according to claims 4 & 5, wherein the end to end feedback is carried out by one or more IP sessions between the receiver and the transmitter, wherein each IP session is one of an RTCP, RTSP or SIP session.
7. A method according to any one of the preceding claims, wherein the communications network is a UMTS or GPRS mobile telecommunications network.
8. A method according to claim 7, wherein the error rate and bit rate parameters are derived from Quality of Service (QoS) parameters associated with the wireless channel.
9. A method according to claim 8, further including the step, prior to step (a), of initiating a session in which one or more of the wireless channels is established for transmission of the CVDS, the initiation of the session including receiving in the transmitter the QoS parameters from the network.
10. A method according to claim 9, wherein the QoS parameters are provided to the transmitter again in the event that they are modified.
11. A method according to any one of the preceding claims, wherein the CVDS is packetised into IP format packets prior to transmission.
12. A method according to claim 11, wherein the transmitter is a mobile handset and the IP packetisation includes the substeps of: wrapping the CVDS in an RTP format packet; wrapping the RTP packet in a UDP format packet; and wrapping the UDP packet in an IP format packet for supply to an air interface associated with the transmitter for transmission to the receiver via the network.
13. A method according to claim 12, wherein the receiver is configured to transmit a CVDS to the transmitter in accordance with any one of the preceding claims.
14. A method according to any one of the preceding claims, wherein the encoding parameters include any one or more of the following: a frame rate; a frame type, being indicative of either intra-coded frame or inter-coded frame; a block type, being indicative of either intra-coded block or inter-coded block; and a quantisation parameter.
15. A method according to claim 14, wherein the video data is divided into a sequence of frames, with each frame being further divided into Groups of Blocks (GOBs) or slices, each GOB or slice being further divided into a number of macroblocks.
16. A method according to claim 15, wherein the encoding parameters include any one or more of the following: a ratio of intra-coded frames to inter-coded frames; sizes of GOBs or slices, the sizes being selected such that headers of GOBs or slices can be used as resynchronisation markers for the decoder to regain synchronisation quickly when transmission errors occur; and a ratio of intra-coded blocks to inter-coded blocks for each GOB or slice in each inter-coded frame.
17. A method according to any one of the preceding claims, wherein the CVDS comprises a plurality of layers including a base layer and enhanced layers, each layer comprising a sequence of frames.
18. A method according to claim 17, wherein each layer is carried via an RTP session.
19. A method according to claim 18, wherein more than one transmit side and/or receive side wireless channel is open between the transmitter and the receiver, and each wireless channel is used to transmit an RTP session.
20. A method according to claim 19, wherein layers of the CVDS are mapped onto the wireless channels such that the base layer is mapped into the wireless channel with better transmission quality.
PCT/GB2001/004924 2001-11-05 2001-11-05 Error control to video encoder WO2003041413A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/GB2001/004924 WO2003041413A1 (en) 2001-11-05 2001-11-05 Error control to video encoder


Publications (1)

Publication Number Publication Date
WO2003041413A1 true WO2003041413A1 (en) 2003-05-15

Family

ID=9909681

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2001/004924 WO2003041413A1 (en) 2001-11-05 2001-11-05 Error control to video encoder

Country Status (1)

Country Link
WO (1) WO2003041413A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013012838A1 (en) * 2011-07-18 2013-01-24 Motorola Solutions, Inc. Source rate and channel rate matching for scalable video transmission
US11470138B2 (en) 2004-04-30 2022-10-11 DISH Technologies L.L.C. Apparatus, system, and method for multi-bitrate content streaming

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0757490A2 (en) * 1995-08-02 1997-02-05 Matsushita Electric Industrial Co., Ltd. Video coding device and video transmission system using the same, quantization control method and average throughput calculation method used therein
US5825430A (en) * 1995-12-20 1998-10-20 Deutsche Thomson Brandt Gmbh Method, encoder and decoder for the transmission of digital signals which are hierarchically structured into a plurality of parts
US6014694A (en) * 1997-06-26 2000-01-11 Citrix Systems, Inc. System for adaptive video/audio transport over a network
US6282240B1 (en) * 1997-09-03 2001-08-28 Oki Electric Industry Co., Ltd. Picture coder, picture decoder, and transmission system


Non-Patent Citations (6)

Title
GUERRI J C ET AL: "A feedback packet-level error control for real-time applications in wireless networks", VEHICULAR TECHNOLOGY CONFERENCE, 1999. VTC 1999 - FALL. IEEE VTS 50TH, AMSTERDAM, NETHERLANDS, 19-22 SEPT. 1999, PISCATAWAY, NJ, USA, IEEE, US, 19 September 1999 (1999-09-19), pages 879-883, XP010353105, ISBN: 0-7803-5435-4 *
HOFFMAN D ET AL: "Hierarchical video distribution over Internet-style networks", PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), LAUSANNE, SEPT. 16-19, 1996, NEW YORK, IEEE, US, vol. 1, 16 September 1996 (1996-09-16), pages 5-8, XP010202065, ISBN: 0-7803-3259-8 *
KASSLER A ET AL: "Support for multimedia applications in a wireless ATM system", 1998 1ST. IEEE INTERNATIONAL CONFERENCE ON ATM. ICATM'98. CONFERENCE PROCEEDINGS, COLMAR, FRANCE, JUNE 22-24, 1998, IEEE INTERNATIONAL CONFERENCE ON ATM, NEW YORK, NY: IEEE, US, 22 June 1998 (1998-06-22), pages 31-40, XP010290987, ISBN: 0-7803-4982-2 *
KYUNGRAN KANG ET AL: "Dynamic rate control mechanism for large scale sessions", INFORMATION NETWORKING, 1998 (ICOIN-12). PROCEEDINGS, TWELFTH INTERNATIONAL CONFERENCE ON, TOKYO, JAPAN, 21-23 JAN. 1998, LOS ALAMITOS, CA, USA, IEEE COMPUT. SOC, US, 21 January 1998 (1998-01-21), pages 21-24, XP010265270, ISBN: 0-8186-7225-0 *
RADHA H ET AL: "Scalable Internet video using MPEG-4", SIGNAL PROCESSING. IMAGE COMMUNICATION, ELSEVIER SCIENCE PUBLISHERS, AMSTERDAM, NL, vol. 15, no. 1-2, September 1999 (1999-09-01), pages 95-126, XP004180640, ISSN: 0923-5965 *
RAMANUJAN R S ET AL: "Adaptive streaming of MPEG video over IP networks", LOCAL COMPUTER NETWORKS, 1997. PROCEEDINGS, 22ND ANNUAL CONFERENCE ON, MINNEAPOLIS, MN, USA, 2-5 NOV. 1997, LOS ALAMITOS, CA, USA, IEEE COMPUT. SOC, US, 2 November 1997 (1997-11-02), pages 398-409, XP010252445, ISBN: 0-8186-8141-1 *

Cited By (4)

Publication number Priority date Publication date Assignee Title
US11470138B2 (en) 2004-04-30 2022-10-11 DISH Technologies L.L.C. Apparatus, system, and method for multi-bitrate content streaming
US11677798B2 (en) 2004-04-30 2023-06-13 DISH Technologies L.L.C. Apparatus, system, and method for multi-bitrate content streaming
WO2013012838A1 (en) * 2011-07-18 2013-01-24 Motorola Solutions, Inc. Source rate and channel rate matching for scalable video transmission
US8914834B2 (en) 2011-07-18 2014-12-16 Motorola Solutions, Inc. Source rate and channel rate matching for scalable video transmission

Similar Documents

Publication Publication Date Title
Wu et al. On end-to-end architecture for transporting MPEG-4 video over the Internet
US8230105B2 (en) Adaptive bitrate management for streaming media over packet networks
KR100932692B1 (en) Transmission of Video Using Variable Rate Modulation
US7346007B2 (en) Bandwidth adaptation
JP4472347B2 (en) Streaming multimedia data over networks with variable bandwidth
EP1552655B1 (en) Bandwidth adaptation
CA2470128C (en) Data transmission
US20050002453A1 (en) Network-aware adaptive video compression for variable bit rate transmission
EP1825684A1 (en) Wireless video streaming using single layer coding and prioritized streaming
JP2005526455A (en) Transmission method that absorbs fluctuation of channel transmission rate using virtual reception buffer
WO2002052860A1 (en) Video layer mapping
Balk et al. Adaptive MPEG-4 video streaming with bandwidth estimation
Feamster Adaptive delivery of real-time streaming video
Viéron et al. Real-time constrained TCP-compatible rate control for video over the Internet
WO2002052859A1 (en) Feedback control from decoder
Changuel et al. Online learning for QoE-based video streaming to mobile receivers
Liu et al. Advanced rate adaption for unicast streaming of scalable video
WO2003041413A1 (en) Error control to video encoder
WO2002052858A1 (en) Feedback control to encoder
Semsarzadeh et al. An adaptive rate control for faster bitrate shaping in x264 based video conferencing
Futemma et al. TFRC-based rate control scheme for real-time JPEG 2000 video transmission
Schulzrinne Transport protocols for multimedia
Zheng Scalable multiple description coding and distributed video streaming over 3G mobile networks
Wu et al. Optimal mode selection in Internet video communication: An end-to-end approach
WO2007031924A2 (en) Video telephone system, video telephone terminal, and method for video telephoning

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PH PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 EP: The EPO has been informed by WIPO that EP was designated in this application
122 EP: PCT application non-entry in European phase
NENP Non-entry into the national phase

Ref country code: JP

WWW WIPO information: withdrawn in national office

Country of ref document: JP