WO2016209132A1 - Method and system for encoding an input video stream into a compressed output video stream with parallel encoding - Google Patents


Info

Publication number
WO2016209132A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
frame
data frame
system node
node
Application number
PCT/SE2015/050746
Other languages
French (fr)
Inventor
Julien Michot
Thomas Rusert
Kenneth Andersson
Jack ENHORN
Per Wennersten
Per Hermansson
Original Assignee
Telefonaktiebolaget Lm Ericsson (Publ)
Application filed by Telefonaktiebolaget Lm Ericsson (Publ)
Priority to PCT/SE2015/050746
Publication of WO2016209132A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/436: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation, using parallelised computational arrangements
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/30: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • H04N19/395: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability, involving distributed video coding [DVC], e.g. Wyner-Ziv video coding or Slepian-Wolf video coding
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51: Motion estimation or motion compensation

Definitions

  • the present disclosure relates generally to a method, a system and a computer program, for encoding an input video stream into a compressed output video stream.
  • When a digital video stream is to be distributed over a communication network, it is encoded into a compressed mode so as to reduce the amount of data that has to be sent over the communication network.
  • the process of reducing the size of data to be transmitted may be referred to as data compression or source coding, i.e. encoding done at the source of the data before it is stored or transmitted.
  • There are several known coding formats for video compression in use today, such as the ITU-T standardized recommendations H.262, H.263, H.264 (also called Advanced Video Coding, AVC) and H.265 (also called High Efficiency Video Coding, HEVC). Most video coding formats combine spatial intra-picture compression and motion compensated temporal inter-picture compression techniques.
  • a video stream that is received at an encoder for video compression comprises a plurality of consecutive frames or pictures.
  • the frames are grouped into groups of pictures, GOPs, each GOP comprising a plurality of data frames.
  • the grouping is done before actual encoding of the frames.
  • the grouping may involve reordering of the frames.
  • An input video stream is historically coded by a single network node, e.g. a single machine or single computer before it is distributed over a communication network to viewers.
  • A use case where input video has to be encoded quickly is real-time encoding that is to be distributed to viewers as "live" as possible, such as live broadcasts of an event such as a football match or a rock concert.
  • The delay that has to be introduced to be able to perform the encoding and corresponding decoding at the receiver side, i.e. at the communication device of the viewer, needs to be short.
  • To speed up encoding, distributed video encoding has been introduced. Distributed video encoding signifies splitting the encoding work among several machines.
  • Another approach is to split the individual video frames into tiles, creating sub-videos of reduced resolution that can be encoded independently.
  • A major drawback with the existing methods is that they break the dependency between the sub-videos. Thereby they reduce the coding efficiency, making the encoded bit-stream that is sent over the communication network larger than it could have been, which results in slower transmission over the communication network and/or a need for larger communication resources for transmission of the encoded bit-stream. Consequently, there is a need for an efficient and quick distributed video compression method that can handle e.g. encoding and distribution of real-time video to communication devices efficiently.
  • a method is provided, performed by a system, for encoding an input video stream into a compressed output video stream, the input video stream comprising a plurality of data frames.
  • the system comprises a first system node and a second system node.
  • the method comprises receiving, at the first node, a first data frame of the input video stream, receiving at the second node, a second data frame of the input video stream, and producing, at the first node, dependency data representing a dependency between the first data frame and the second data frame, based on the first data frame.
  • the method further comprises sending, by the first node to the second node, information on the produced dependency data, producing, at the second node, a compressed representation of the second data frame based on the second data frame and the received information on the dependency data, producing a compressed representation of the first data frame based at least on the first data frame, and combining the compressed representation of the first data frame and the compressed representation of the second data frame into the compressed output data stream.
  • a system configured for encoding an input video stream into a compressed output video stream, the system comprising a first system node and a second system node and the input video stream comprising a plurality of data frames.
  • the system comprises at least one processor and at least one memory.
  • the at least one memory contains instructions executable by said at least one processor, whereby the system is operative for receiving, at the first system node, a first data frame of the input video stream, receiving, at the second system node, a second data frame of the input video stream, and producing, at the first system node, dependency data representing a dependency between the first data frame and the second data frame, based on the first data frame.
  • the system is further operative for receiving, at the second system node from the first system node, information on the produced dependency data, producing, at the second system node, a compressed representation of the second data frame based on the second data frame and the received information on the dependency data, producing, at the first or the second system node, a compressed representation of the first data frame based at least on the first data frame, and combining the compressed representation of the first data frame and the compressed representation of the second data frame into the compressed output data stream.
  • a first system node of a system configured for encoding an input video stream into a compressed output video stream, the system comprising apart from the first system node also a second system node, and the input video stream comprising a plurality of data frames.
  • the first system node comprises a processor and a memory, said memory containing instructions executable by said processor, whereby the first system node is operative for receiving a first data frame of the input video stream, producing dependency data representing a dependency between the first data frame and a second data frame of the input video stream, based on the first data frame, and sending to the second system node, information on the produced dependency data; which information the second system node uses to produce a compressed representation of the second data frame.
  • The first system node is optionally also operative for producing a compressed representation of the first data frame based at least on the first data frame.
  • the first system node is optionally also operative for combining the compressed representation of the first data frame and the compressed representation of the second data frame into the compressed output data stream.
  • a second system node of a system configured for encoding an input video stream into a compressed output video stream, the system comprising apart from the second system node also a first system node, the input video stream comprising a plurality of data frames.
  • the second system node comprises a processor and a memory.
  • the memory contains instructions executable by said processor, whereby the second system node is operative for receiving a second data frame of the input video stream, receiving, from the first system node, information on dependency data representing a dependency between a first data frame of the input video stream and the second data frame, the dependency data being produced at the first system node, and producing a compressed representation of the second data frame based on the second data frame and the received information on the dependency data.
  • the second system node is optionally also operative for producing a compressed representation of the first data frame based at least on the first data frame.
  • the second system node is optionally also operative for combining the compressed representation of the first data frame and the compressed representation of the second data frame into the compressed output data stream.
  • Fig. 1 is a block diagram illustrating a system, according to some possible embodiments.
  • Fig. 2 is a block diagram of an input video stream that may be input to the system of fig. 1.
  • Fig. 3 is a flow chart illustrating a method, according to possible embodiments.
  • Fig. 4 is another block diagram of a system according to a possible embodiment, illustrating an example of messages sent on an exemplary GOP.
  • Fig. 5 is a block diagram of an exemplary GOP structure.
  • Fig. 6 is another block diagram of the exemplary GOP structure, this time as a tree structure.
  • Fig. 7 is a block diagram of work allocation between two working units.
  • Fig. 8 is a block diagram illustrating a system in more detail, according to a possible embodiment.
  • Fig. 9 is another block diagram illustrating a system in more detail, according to a further possible embodiment.
  • an improved distributed video encoding method is disclosed, together with its corresponding system.
  • The method distributes an incoming video stream between two or more network nodes, e.g. computers, for encoding the video stream. In comparison to existing methods, this method distributes the encoding of a video stream so that two frames that have impact on each other, e.g. two frames of the same GOP, are treated at different network nodes, while still taking the dependencies between the two frames into consideration.
  • The method comprises receiving a first data frame at a first computer and receiving a second data frame, in close proximity to the first data frame, at a second computer.
  • data representing a dependency between the first data frame and the second data frame is produced, such as a reference frame or a motion estimate.
  • the dependency data is sent to the second network node that uses the dependency data to encode the second data frame into a compressed second frame which can be sent in an output stream together with a compressed version of the first frame.
  • the compressed version of the first frame may have been produced at any of the first or the second computer.
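The flow just described can be sketched in Python. This is a toy model under stated assumptions: the function names are invented for the sketch, the "codec" is a coarse quantiser standing in for a real video codec, and the dependency data is taken to be a reconstructed first frame used as a reference.

```python
import numpy as np

def produce_dependency_data(first_frame):
    # Node 1: in the simplest variant the dependency data is a
    # reconstructed (compressed-then-decoded) first frame, used as a
    # reference frame for the second frame. Coarse quantisation stands
    # in for a real codec here.
    return ((first_frame.astype(np.int16) // 2) * 2).astype(np.uint8)

def encode_second_frame(second_frame, dependency_data):
    # Node 2: code the second frame as a residual against the reference.
    return second_frame.astype(np.int16) - dependency_data.astype(np.int16)

def encode_first_frame(first_frame):
    # Intra-style coding of the first frame (identity stand-in).
    return first_frame.copy()

# Two consecutive frames of a toy "video stream".
first = np.full((4, 4), 100, dtype=np.uint8)
second = np.full((4, 4), 103, dtype=np.uint8)

ref = produce_dependency_data(first)            # produced at node 1
residual = encode_second_frame(second, ref)     # produced at node 2
stream = [encode_first_frame(first), residual]  # combined output stream
```

The point of the sketch is the division of labour: only node 1 needs the first frame, only node 2 needs the second frame, and the reference (or information about it) is the single item exchanged between them.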
  • Fig. 1 shows an exemplary system 10 for encoding an input video stream into a compressed output video stream, in which system the present invention may be used.
  • The system comprises an input unit 11 for receiving an incoming video stream.
  • the incoming video stream may be a raw video stream or a video stream that has been pre-processed in some way.
  • The input unit 11 is further connected to a first system node 12 and a second system node 14.
  • the first and the second system nodes are separate nodes, each node being arranged for encoding data frames of a video signal.
  • the system nodes may be separate machines, such as computers, or separate logical machines. Even though the system nodes 12, 14 are separate nodes they are connected to each other so that they can exchange data between each other.
  • the system may further comprise an output unit 16 connected to the first and the second system node 12, 14.
  • The input unit 11 may be arranged to distribute the incoming video stream, frame by frame, to the first or the second system node so that a first frame of the video stream is sent to the first system node and a second frame of the video stream is sent to the second node.
  • the first system node 12 is then arranged to produce data representing a dependency between the first data frame and the second data frame and send this data or information of the dependency data to the second system node that uses the dependency data to produce a compressed version of the second data frame.
  • the compressed version of the second data frame is then sent from the second system node 14 to the output unit 16 that arranges the compressed second data frame into a compressed output video stream together with a compressed version of the first frame that has been received from the first or the second node.
  • the compressed output video stream may then be sent over a communication network such as a wireless communication network to a receiving unit, which may be a communication device, such as a mobile phone, belonging to a user that wants to see the distributed video.
  • the receiving unit has a decoding unit that is arranged to decode the received encoded video stream.
  • Although the system 10 of fig. 1 has an input unit 11 and an output unit 16 that are separate from the first and the second system nodes 12, 14, it may also be the case that the input unit and the output unit are integrated into the first system node or the second system node or both.
  • Figure 2 shows an input video stream, or more likely a part of an input video stream, comprising two GOPs, the first GOP having a number of consecutive frames 0 to 8 and the second GOP having a number of consecutive frames 9 to 17.
  • The sizes of the frames as illustrated in fig. 2 differ slightly, but this is only for illustrative purposes: two digits take more space than one digit.
  • Each GOP may have more or fewer than the exemplary 9 frames.
  • a frame is considered to be the same as a picture throughout this disclosure.
  • Fig. 3 in conjunction with fig. 1 shows an embodiment of a method performed by a system 10, for encoding an input video stream into a compressed output video stream, the input video stream comprising a plurality of data frames.
  • the system comprises a first system node 12 and a second system node 14.
  • The method comprises receiving 106, at the first node 12, a first data frame of the input video stream, receiving 108, at the second node 14, a second data frame of the input video stream, producing 110, at the first node, dependency data representing a dependency between the first data frame and the second data frame, based on the first data frame, and sending 112, by the first node to the second node, information on the produced dependency data.
  • The method further comprises producing 114, at the second node, a compressed representation of the second data frame based on the second data frame and the received information on the dependency data, producing 116 a compressed representation of the first data frame based at least on the first data frame, and combining 118 the compressed representation of the first data frame and the compressed representation of the second data frame into the compressed output data stream.
  • the first and the second node of the system may be for example physical computers or virtual computers.
  • the first and the second data frame may belong to the same group of pictures, GOP.
  • the data frames of a GOP may be coded so that at least some of the frames are depending on the encoding outcome of another frame or frames in the GOP.
  • a common GOP comprises one l-frame and a number of P-frames and/or B-frames.
  • other combinations may apply, such as one GOP consisting of only P- and B-frames.
  • A data frame may in this disclosure, apart from signifying a whole frame, also signify a part of a frame, such as a slice or a tile.
  • The parts of the method shown in fig. 3 may be performed in a different order than shown. For example, when the dependency data is the reconstructed compressed first frame, the compressed representation of the first frame is produced 116 before the reconstructed compressed first frame is produced 110.
  • the dependency data may be any input data defining a dependency between the first and the second frame, such as a reference frame and/or a motion estimate used by the second system node to calculate the compressed representation of the second frame, statistics from a pre-pass encoding, activity calculations etc.
  • A pre-pass encoding is in this case an encoding performed by the first node of coding work that would normally be performed by the second node, such as the work of producing 114, at the second node, the compressed representation of the second data frame.
  • the information on the dependency data may be any information that the second system node is configured to accept/require as input, e.g. the actual reference frame, a motion vector or motion vector field defining the motion estimate(s), statistics from a first-pass encoding, matrix of picture activity measurements, etc.
  • the wording "producing a compressed representation" may signify determining a compressed representation.
  • The combining 118 may be performed at any of the first or the second system node or at a separate node, such as at the output unit 16 of fig. 1, which may be a bit stream mixer.
  • The producing 116 of a compressed representation of the first data frame based at least on the first data frame may be accomplished at the first system node or at the second system node. In the case where the dependency data is a motion estimation pre-pass, it may have been accomplished earlier, i.e. before this method starts, and/or at another node of the system.
  • The produced 110 dependency data is a reconstructed compressed representation of the first frame.
  • the reconstructed compressed representation of the first frame functions as a reference frame for the second frame.
  • The produced 110 dependency data is a motion estimate of the difference between the first frame and the second frame.
  • The method comprises receiving 107, at the first system node, the second frame. Further, the motion estimate is produced 116 based on the first frame and the second frame.
  • The "first frame" may be the original first frame of the input video stream, or the reconstructed first frame.
  • the information on the motion estimate, which is sent to the second node may be a motion vector, such as a global motion vector or series of motion vectors for each pixel or block in the frame.
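A global motion vector of the kind mentioned here can be illustrated with a minimal exhaustive search. This is an illustrative sketch, not the patent's method: real encoders search per block with sub-pixel refinement, and `np.roll` wraps around the frame edges, which is good enough only for a toy example.

```python
import numpy as np

def global_motion_estimate(ref, cur, search=2):
    # Exhaustive search for the single translation (dy, dx) that
    # minimises the sum of absolute differences (SAD) between the
    # current frame and the shifted reference frame.
    best, best_sad = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            shifted = np.roll(np.roll(ref, dy, axis=0), dx, axis=1)
            sad = int(np.abs(cur.astype(int) - shifted.astype(int)).sum())
            if best_sad is None or sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best

ref = np.zeros((8, 8), dtype=np.uint8)
ref[2:4, 2:4] = 255            # a bright block...
cur = np.roll(ref, 1, axis=1)  # ...moved one pixel to the right
mv = global_motion_estimate(ref, cur)  # (0, 1)
```

Sending only `mv` (a handful of bytes) instead of the whole reference frame is exactly the bandwidth saving the text alludes to.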
  • The produced 110 dependency data is a pre-pass encoding, or other statistical data, of the second frame.
  • The second node receives the first frame in addition to the first-pass compression to produce 116 the compressed representation of the first frame based on the received first frame and the pre-pass encoding.
  • The first node receives the second frame in addition to the first frame in order to be able to produce the pre-pass encoding of the second frame.
  • The sent 112 information on the dependency data is the actual dependency data.
  • The sent 112 information on the dependency data is a compressed representation of the dependency data.
  • The sent 112 information on the dependency data is compressed relative to information already present on the second system node.
  • the information already present on the second node may be the second data frame.
  • the dependency data may be the reconstructed compressed representation of the first frame, and the information on this dependency data may then be sent as the difference between the reconstructed compressed representation of the first frame and the second data frame.
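The difference-based variant above can be sketched as follows. The function names are invented for the sketch, and the "difference" is a plain pixel-wise subtraction; a real system would entropy-code it.

```python
import numpy as np

def pack_dependency_info(reconstructed_first, second_frame):
    # Node 1: instead of sending the full reconstructed first frame,
    # send only its difference against data the second node already
    # holds, namely the second frame.
    return reconstructed_first.astype(np.int16) - second_frame.astype(np.int16)

def unpack_dependency_info(diff, second_frame):
    # Node 2: recover the reference frame from the difference and the
    # locally available second frame.
    return (second_frame.astype(np.int16) + diff).astype(np.uint8)

second = np.arange(16, dtype=np.uint8).reshape(4, 4)
recon_first = second + 5  # toy reconstructed first frame

diff = pack_dependency_info(recon_first, second)
recovered = unpack_dependency_info(diff, second)  # equals recon_first
```

Because consecutive frames are usually similar, the difference is small and compresses far better than the reference frame itself.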
  • the method further comprises receiving 109 at the second system node, the first data frame.
  • The sent 112 information on the dependency data represents a pixel-wise difference between the first frame and a reconstructed compressed representation of the first frame.
  • The input video stream is a compressed video stream comprising a first compressed sub-bit stream and a second compressed sub-bit stream.
  • the method then further comprises, before receiving 106, 108 the first frame and the second frame at the first and the second system node, respectively, decompressing 102, by the first system node, the first compressed sub-bit stream into a first decompressed sub-bit stream comprising the first frame, and decompressing 104, by the second system node, the second compressed sub-bit stream into a second decompressed sub-bit stream comprising the second frame.
  • The method further comprises sending, by the first system node, information of the decompressed 102 first sub-bit stream to the second system node.
  • the first and the second sub-bit stream might be divided so that some frames in the second sub-bit stream has
  • The system comprises a plurality of system nodes including the first system node 12 and the second system node 14. Further, the method comprises selecting the first system node for processing the first data frame based on one or more of the following conditions: presence of a reference frame for the first data frame at each of the plurality of system nodes; presence of the first data frame at each of the plurality of system nodes; processor load of each of the plurality of system nodes.
  • the selecting of the first system node is performed before the receiving 106 of the first data frame at the first system node.
  • The selecting may be performed by an input unit 11 of the system.
  • the input unit may be a dispatcher.
  • the selecting may be performed before the decompression 102, when such a step exists.
  • the first node of the plurality of nodes may be prioritized as selection alternative when the first data frame is already present on the first node and/or when the reference frame for the first data frame is already present on the first node.
  • the system comprises a plurality of system nodes including the first system node 12 and the second system node 14, and the input video stream comprises a first GOP comprising a plurality of data frames including the first data frame and the second data frame, the first GOP having a hierarchical GOP structure.
  • the method further comprises:
  • the GOP concept is illustrated in fig. 2.
  • An example of a GOP structure is illustrated in fig. 5 and in fig. 6.
  • the GOP exemplary structure is described as a tree structure with a trunk (frames 0, 8, 4 and e.g. 6, 5) from which branches are branching off.
  • The degree of parallelism is how many coding tasks may be performed simultaneously. This may also be seen in the tree structure as how many parallel branches there are.
  • the parallelism comes from the fact that encoding a picture in a hierarchical GOP structure "unlocks" (two in the example case) pictures in a higher temporal layer, which can then be encoded in parallel.
  • Temporal layer is defined such that pictures can only depend on other pictures in the same or lower layer.
  • The lowest temporal layer of the GOP structure is the most critical part, since it unlocks all the other layers and also the next GOP. Since the picture in the lowest layer (picture 8) only has dependencies on the picture in the lowest layer of the previous GOP, it is natural that the first node is always responsible for coding the lowest temporal layer in all GOPs.
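The "unlocking" idea can be sketched by walking the dependency tree level by level. The tree below is an assumed reconstruction of the fig. 6 structure as described in the text (trunk 0-8-4, branches 2 -> (1, 3) and 6 -> (5, 7)); with it, the widest level has four pictures, matching the four working units used later in this description.

```python
# "Last-satisfied" dependency tree for the GOP of figs. 5 and 6,
# reconstructed from the text; assumed for illustration.
children = {0: [8], 8: [4], 4: [2, 6], 2: [1, 3], 6: [5, 7],
            1: [], 3: [], 5: [], 7: []}

def degree_of_parallelism(tree, root=0):
    # Encoding a picture "unlocks" its children in the tree; the degree
    # of parallelism is the widest set of pictures unlocked at once,
    # i.e. the widest level of the tree.
    level, width = [root], 1
    while level:
        level = [c for parent in level for c in tree[parent]]
        width = max(width, len(level))
    return width

degree = degree_of_parallelism(children)  # 4, so four working units suffice
```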
  • representations of data frames are spread out on the three other system nodes of the fig. 6 example.
  • the three other system nodes are assigned the tasks of producing dependency data, e.g. motion estimations.
  • FIG. 4 shows an embodiment of a system 200 for encoding an incoming video stream of raw data frames into a compressed video stream.
  • the system 200 comprises four working units, i.e. system nodes A 212, B 214, C 216 and D 218 performing video encoding, a dispatcher 210 that receives incoming video frames and distributes them to the specific allocated working unit, and a bit stream mixer 220 that receives encoded bit streams from the working units 212, 214, 216, 218 and mixes them into a final outgoing bit stream of compressed video.
  • Fig. 4 is based on a certain exemplary GOP structure shown in fig. 5.
  • the GOP structure of fig. 5 shows 9 frames (0 to 8).
  • Fig. 5 further describes an exemplary coding order of the frames and the coding dependency between the frames.
  • The coding order and coding dependency are the same as those used for deciding the allocation of encoding work as shown in fig. 4, i.e. which frames are treated on which computer and where reference frames and motion estimates are calculated.
  • Frame 0 is an I-type or P-type frame, frame 8 is a P-type frame, and frames 1-7 are B-type frames, where "B" stands for a frame coded using bi-directional prediction.
  • Typical encoding steps for encoding a video sequence comprising a plurality of GOPs are:
  • 1. Motion estimation, ME, between two frames of the GOP, wherein the difference between two pictures/frames due to movement of items in the picture is estimated. This can be made either a) by comparing a current original frame with a previous original frame, or b) by comparing it with a reconstructed reference frame.
  • Filtering techniques including de-blocking and Sample Adaptive Offset filter.
  • Step 1 can be performed per block independently and can even start as soon as a new frame appears, in the case of 1a.
  • Step 2 cannot start before step 1 has been finalized, since intra prediction needs top and left neighboring blocks to be available.
  • Steps 2 to 4 can be considered as one work item and will be referred to as Coding in the following description. In terms of dependency, the coding task of one frame (i) has to be performed after its associated motion estimations.
  • the encoding work of this GOP is split among the four working units A, B, C and D, e.g. four different computers. As will be shown further down in this document, five working units would be a waste of resources and wouldn't bring any gain.
  • an embodiment of the invention aims to minimize the amount of data to be transmitted between the working units, such data typically being motion vectors and reconstructed reference frames. Since it is costly in terms of bandwidth to transmit the reference frames between the working units, this embodiment allocates the production of the reference frames and the MVs according to the reference frame dependencies. In other words, this allocating is based on where the reference frames are to be used so that the reference frames are produced at a computer which also handles the production of the compressed outgoing frame based on the reference frame.
  • Table 1 below defines for how many other frames each frame in the exemplary GOP of fig. 5 is used as a reference frame; e.g. frame 0 is used as a reference frame for four other frames.
  • Table 2 below defines which of the reference pictures in the GOP of fig. 5 each frame uses; e.g. frame 1 uses frames 0 and 2 as reference frames.
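Table 2 determines Table 1: counting how often each frame occurs in the other frames' reference lists gives the usage counts. The reference lists below are an assumed reconstruction consistent with the examples in the text (frame 1 uses frames 0 and 2, and frame 0 comes out as a reference for four other frames), not a verbatim copy of the patent's tables.

```python
from collections import Counter

# Table 2 as assumed reference lists for the GOP of fig. 5
# (illustrative reconstruction, see lead-in).
uses = {
    8: [0], 4: [0, 8],
    2: [0, 4], 6: [4, 8],
    1: [0, 2], 3: [2, 4], 5: [4, 6], 7: [6, 8],
}

# Table 1 follows directly: count how often each frame appears in the
# other frames' reference lists.
used_as_ref = Counter(r for refs in uses.values() for r in refs)
```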
  • The number of working units is determined so as to maximize the parallelism, i.e. to minimize the overall GOP encoding time, while the number of working units should be small enough in order not to waste processing power. This is accomplished by considering the coding tasks and their dependencies, so that one working unit is assigned the coding tasks that are on the critical path. The rest of the tasks, e.g. the MEs, are dispatched among the remaining working units.
  • The first thing to do in this assignment process may be to estimate the degree of parallelism the GOP structure allows, e.g. the GOP structure of fig. 5, and from the estimated degree of parallelism derive the number of working units needed. To do so, the GOP structure of fig. 5 is transformed into a tree structure, keeping only the dependencies (arrows) of each frame that are satisfied last. This is illustrated in fig. 6. Applying a depth-first search algorithm to this tree structure gives the coding order as specified in fig. 6. The depth-first search algorithm is an algorithm where all pictures in the GOP are visited once, starting at the first picture 0.
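The depth-first traversal can be sketched as follows. The tree is the assumed reconstruction of fig. 6 used throughout these examples (trunk 0-8-4, branches 2 -> (1, 3) and 6 -> (5, 7)); the resulting order matches the one quoted below in the text.

```python
def depth_first_order(tree, root=0):
    # Visit every picture once, descending as far as possible before
    # backtracking; this reproduces the coding order of fig. 6.
    order = []
    def visit(node):
        order.append(node)
        for child in tree.get(node, []):
            visit(child)
    visit(root)
    return order

# Assumed dependency tree for the GOP of fig. 6 (see lead-in).
tree = {0: [8], 8: [4], 4: [2, 6], 2: [1, 3], 6: [5, 7]}
coding_order = depth_first_order(tree)  # [0, 8, 4, 2, 1, 3, 6, 5, 7]
```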
  • The next step is to assign the coding tasks, i.e. the producing of compressed representations of frames, to each machine, since they are the most time-consuming tasks, while preserving the tasks' dependencies but minimizing the total time.
  • The actual assigning of coding tasks to nodes may be based on which reference frames are needed, i.e. frame dependencies. Which reference frames are needed is known beforehand, since that is decided by the GOP structure. To do so, we apply a breadth-first search algorithm on the tree and assign parallel tasks to different working units.
  • The breadth-first search algorithm is similar to the depth-first algorithm. It also visits all pictures in the graph once. The only difference compared to the depth-first algorithm is that all neighbors are visited/marked before any of the neighbors' neighbors.
  • The breadth-first search starts at the root node, i.e. picture 0. It then visits its only neighbor, picture 8. From picture 8 it visits its only neighbor, picture 4. Picture 4 has two neighbors, which are both then marked as visited; in the output graph, lines between pictures 4-2 and 4-6 are added. Then one of the neighbors is selected to continue the search, e.g. picture 2. Picture 2 has two neighbors, pictures 1 and 3; afterwards that branch is closed. The search continues with picture 6's neighbors, 5 and 7.
  • the depth-first search algorithm gives the order 0-8-4-2-1-3-6-5-7; the breadth-first search algorithm gives the order 0-8-4-2-6-1-3-5-7. For instance, we start by assigning coding tasks C0, C8 and C4 to working unit A.
  • coding tasks C2 and C6 can be done in parallel, so we assign C2 to working unit A and C6 to working unit B.
  • coding task C1 is assigned to working unit A, C5 to working unit B, C3 to working unit C and C7 to working unit D.
  • working unit A contains the critical path of the tree, i.e. the trunk, and has 4 or 5 encoding tasks to perform.
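The tree traversal and the resulting coding orders described above can be sketched as follows. This is an illustrative sketch, not part of the patent: the dictionary encodes the fig. 6 dependency tree (each picture maps to the pictures that depend on it), and the function names are chosen here for clarity.

```python
# Dependency tree of fig. 6: picture 0 -> 8 -> 4 -> {2 -> {1, 3}, 6 -> {5, 7}}.
TREE = {0: [8], 8: [4], 4: [2, 6], 2: [1, 3], 6: [5, 7],
        1: [], 3: [], 5: [], 7: []}

def depth_first(tree, root=0):
    """Visit every picture once, descending into a branch before its siblings."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(reversed(tree[node]))  # reversed so the left child is popped first
    return order

def breadth_first(tree, root=0):
    """Visit every picture once, all neighbors before any neighbors' neighbors."""
    from collections import deque
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(tree[node])
    return order

print(depth_first(TREE))    # [0, 8, 4, 2, 1, 3, 6, 5, 7]
print(breadth_first(TREE))  # [0, 8, 4, 2, 6, 1, 3, 5, 7]
```

The two printed orders match the depth-first order 0-8-4-2-1-3-6-5-7 and the breadth-first order 0-8-4-2-6-1-3-5-7 stated in the text; pictures appearing at the same breadth-first level (e.g. 2 and 6) are the ones that can be coded in parallel.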
  • the next step is to assign ME production tasks, which are simpler/faster tasks that have to be performed right before the associated coding task.
  • MEs are to be assigned to working units having free time before their coding tasks, which is working units B, C and D in this example, and we assign them their associated MEs.
  • MV: motion vector
  • since working unit D encodes frame 7, it can also perform ME67 and ME87.
  • MEs ME84, ME42, ME21, ME01, ME02, ME04, and ME08 are essentially MEs of frames encoded by working unit A.
  • since ME tasks are performed faster than encoding tasks, we can let these tasks be performed by the three remaining working units B, C and D, and after having produced the MEs, they send their results back to working unit A, e.g. as MVs.
  • the decision of which ME task will be assigned to which working unit is made so as to have roughly the same number of MEs among the three working units and roughly the same number of original frames to send to the different working units. This way, the work load is balanced between the working units while minimizing the data transfer between the working units, considering a transmission cost hypothesis.
  • the transmission cost hypothesis signifies the following: The amount of data to send for an original picture is larger than the amount of data to send for a reference picture which in turn is larger than the amount of data to send for a motion vector which in turn is larger than zero.
  • the term picture is equivalent with the term frame throughout the description.
  • the term working unit is equivalent with the term worker and with the term system node.
  • a working unit may be a physical computer or a virtual computer.
  • Case A is when the first frame/picture of the GOP is not already coded. This typically corresponds to the beginning of a video or to when picture 0 is an I-frame.
  • Case B is when picture 0 has already been coded. This is typically the case when picture 0 corresponds to the picture 8 of the previous GOP that has already been encoded.
  • Case A: frame 0 is an I-frame
  • Worker A will carry out coding of frames 0, 8, 4, 2, and 1.
  • Worker B will carry out coding of frames 6, 5, and producing of MEs 46, 86, 45, 65 and (84). To accomplish this, B needs reconstructed reference frames 4, 6, 8 and original pixels of frames 5, 6, (+4, 8).
  • Worker C will carry out coding of frame 3 and producing of MEs 23, 43, (+01, 21, 42). To accomplish this, C needs reconstructed reference frames 2, 4 (+0) and original pixels of frames 3, (+1, 2, 4).
  • Worker D will carry out coding of frame 7 and producing of MEs 67, 87, (+02, 04, 08). To accomplish this, D needs reconstructed reference frames 6, 8 (+0) and original pixels of frames 7, (+8, 6, 4, 2, 0).
  • S(i) means that data associated with the reconstructed frame, i.e. either ME or C, has to be sent to the working units that need it.
  • S is there to signify that time is taken also for this task of sending the data.
  • MEij' means that the motion estimation ME between frames i and j can be done on the reconstructed reference frame i' instead of the original pixels i, since it is available at the time specified (sent by step Si).
  • An alternative is to perform MEij, i.e. on original frame i instead of on the reconstructed reference frame i'.
  • MEs needed for worker A (08, 04, 84, 02, 42, 01, 21) are in italics since there might be other partitionings where these jobs are done on different machines and at different times.
  • the job assignment might also depend on the cost, i.e. time, of sending MVs or Cs (what is called "S" in the diagram) to the working unit that needs them, but we can make the assumption that it takes less time to send data than to perform ME.
  • An alternative to the time allocation described in the diagram is to have ME84+S assigned to worker D instead of B, since B has 2 encoding tasks to perform and D already has everything needed. However, D already has some time-constrained tasks to do (ME08+S and ME04'+S), so it is doubtful whether this is better.
  • Case B: frame 0 is a P-frame, i.e. already encoded
  • Worker A will carry out coding of frames 8, 4, 2, 1 and producing of ME 08. To accomplish this, worker A needs reconstructed reference frames 0, 2, 4, 8 and original pixels of frames 1, 2, 4, 8.
  • Worker B will carry out coding of frames 6, 5 and producing of MEs 46, 86, 45, 65, (+84). To accomplish this, worker B needs reconstructed reference frames 4, 6, 8 and original pixels of frames 5, 6, (+4, 8).
  • Worker C will carry out coding of frame 3 and producing of MEs 23, 43, (+01, 21, 42). To accomplish this, worker C needs reconstructed reference frames 2, 4 (+0) and original pixels of frames 3, (+1, 2, 4).
  • Worker D will carry out coding of frame 7 and producing of MEs 67, 87, (+02, 04). To accomplish this, worker D needs reconstructed reference frames 6, 8 (+0) and original pixels of frames 7, (+6, 4, 2).
  • time allocation diagram for the job allocation described in the former paragraph for case A. On the x-axis of the diagram is time. The time allocation starts at the left and proceeds to the right in the diagram. In the time allocation diagram it is estimated that ME+S is at least 2 times faster than coding.
  • MVs: motion vectors
  • MVs may need to be sent to one or more workers.
  • MVs of ME42 estimated on worker C need to be sent to worker A.
  • One efficient way to send MVs at a limited cost, i.e. a limited number of bits, is by encoding the MVs according to the HEVC standard, with a fixed known block structure, e.g. a regular NxM grid.
  • in H.264, the block structure is called a macroblock structure.
  • the macroblock structure is 16x16 pixels.
  • in HEVC, the block structure is called a coding unit, CU, structure.
  • the size of a block in HEVC can vary within the picture depending on image features.
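The benefit of a fixed, known grid is that both sender and receiver agree on the block layout in advance, so only the vector components need to be transmitted. The following is a simplified illustration of that idea, not actual HEVC entropy coding: MVs laid out in raster-scan order over a regular grid are delta-coded against the left neighbour, which keeps the values small. Function names are chosen here for the sketch.

```python
def pack_mvs(mvs, blocks_per_row):
    """mvs: list of (dx, dy) vectors in raster-scan order over the regular grid.
    Returns deltas against the left-neighbour predictor (0,0 at row starts)."""
    packed = []
    for i, (dx, dy) in enumerate(mvs):
        if i % blocks_per_row == 0:
            pdx, pdy = 0, 0           # first block in a row: no predictor
        else:
            pdx, pdy = mvs[i - 1]     # predict from the left neighbour
        packed.append((dx - pdx, dy - pdy))
    return packed

def unpack_mvs(packed, blocks_per_row):
    """Exact inverse of pack_mvs: rebuild the MVs from the deltas."""
    mvs = []
    for i, (ddx, ddy) in enumerate(packed):
        if i % blocks_per_row == 0:
            pdx, pdy = 0, 0
        else:
            pdx, pdy = mvs[i - 1]
        mvs.append((pdx + ddx, pdy + ddy))
    return mvs
```

Since neighbouring blocks tend to move together, most deltas are near zero, which is what makes the subsequent entropy coding (in real HEVC, CABAC) cheap.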
  • Sending of reference frames from one worker to another may be performed by sending the reference frames as compressed bit streams (option 1) using a state-of-the-art video codec like HEVC.
  • the compressed bit stream needs to be decoded by the receiving worker, and thus the receiving worker requires access to some extra data, including the reference frames of the RF, unless the RF is an I-frame.
  • Another alternative (option 2) for sending of reference frames from one worker to another is to send a lossless compressed difference between the reconstructed reference frame and the original frame.
  • the receiving worker has access to the original frame in advance.
  • the encoded RF0 is either already available in working unit A from the previous GOP encoding, since frame 0 is the previous frame 8, or is missing from one or more working units and has to be sent to working units A, C and/or D.
  • One way to do this is to send a losslessly and independently encoded version of the reconstructed reference frame (option 1), as a compromise with sending the original pixels plus delta (option 2).
  • Option 2 breaks the dependencies of the RF with its own RFs, and allows the system to change one or more physical working units or restart the coding of one entire GOP for a limited but substantial cost.
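Option 2 can be sketched as follows. This is an illustrative assumption-laden sketch: frames are modelled as flat 8-bit pixel buffers, and zlib stands in for the lossless codec, which the text does not specify. The sender transmits only the compressed difference between the reconstructed reference frame and the original frame; the receiver, which already holds the original (per the bullet above), rebuilds the reconstructed reference exactly.

```python
import zlib

def encode_reference_delta(original: bytes, reconstructed: bytes) -> bytes:
    """Sender side: losslessly compress the pixel-wise difference (mod 256)."""
    delta = bytes((r - o) & 0xFF for o, r in zip(original, reconstructed))
    return zlib.compress(delta)  # small when coding distortion is low (low QP)

def decode_reference_delta(original: bytes, payload: bytes) -> bytes:
    """Receiver side: add the decompressed delta back onto the original frame."""
    delta = zlib.decompress(payload)
    return bytes((o + d) & 0xFF for o, d in zip(original, delta))
```

Because the delta is coded independently of any other reference frame, this reproduces the property claimed for option 2: the transmitted data carries no dependency on the RF's own reference frames.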
  • RF8: Option 2, since working unit B has the original frame 8.
  • RF4: Option 2, since working unit B has the original frame 4.
  • RF0: For case A, option 1.
  • RF4: Option 2, since working unit C has the original frame 4.
  • RF2: Option 2, since working unit C has the original frame 2.
  • RF0: Option 2 in case A, since working unit D has the original frame 0.
  • RF8: Option 2 in case A, since working unit D has the original frame 8.
  • RF6: Option 2, since working unit D has the original frame 6.
  • a case when it may be better to only use option 2 is, for instance, when the Quantization Parameter is low or the bit stream only contains I-frames.
  • the choice of selecting best transmission method for sharing of data between the working units depends on several factors like content type, bit-rate and latency requirements. In some cases it is cheaper in terms of amount of data and/or time, to send the encoded difference between the reconstructed reference frame and the original frame than to send the encoded reference frame bits.
  • a possible static approach may be to always select the first option when QP is higher than a threshold number, for example 15.
  • An alternative method may be to select the transmission method adaptively. All B-pictures need at least two reference pictures. As shown in fig. 7, one of those reference frames, in fig. 7 called x, determined from frame 8, may have been coded by working unit A at least 1 frame in-between when it is to be used, and the other frame, called y in fig.
  • Fig. 8 describes an embodiment of a system 10 configured for encoding an input video stream into a compressed output video stream.
  • the system comprises a first system node 12 and a second system node 14.
  • the input video stream comprises a plurality of data frames.
  • the system 10 comprises at least one processor 603, 623 and at least one memory 604, 624.
  • the at least one memory contains instructions executable by said at least one processor, whereby the system 10 is operative for receiving, at the first system node 12, a first data frame of the input video stream, receiving, at the second system node 14, a second data frame of the input video stream and producing, at the first system node, dependency data representing a dependency between the first data frame and the second data frame, based on the first data frame.
  • the system is further operative for receiving, at the second system node from the first system node, information on the produced dependency data and producing, at the second system node, a compressed representation of the second data frame based on the second data frame and the received information on the dependency data.
  • the system is further operative for producing, at the first or the second system node, a compressed representation of the first data frame based at least on the first data frame, and combining the compressed representation of the first data frame and the compressed representation of the second data frame into the compressed output data stream.
  • the system 10 is realized by the first system node 12 comprising a first processor 603 and a first memory 604, and the second system node 14 comprising a second processor 623 and a second memory 624.
  • the first memory 604, i.e. the memory of the first system node, contains instructions executable by the first processor 603, whereby the first system node 12 is operative for receiving a first data frame of the input video stream, producing dependency data representing a dependency between the first data frame and the second data frame, based on the first data frame, and sending, to the second system node, information on the produced dependency data, and optionally producing a compressed representation of the first data frame based at least on the first data frame.
  • the second memory 624, i.e. the memory of the second system node, contains instructions executable by the second processor 623, whereby the second system node 14 is operative for receiving a second data frame of the input video stream, receiving, from the first system node, the information on the produced dependency data, producing a compressed representation of the second data frame based on the second data frame and the received information on the dependency data, and producing a compressed representation of the first data frame based at least on the first data frame, when the compressed representation of the first data frame was not produced by the first system node.
  • the combining of the compressed representation of the first data frame and the compressed representation of the second data frame into the compressed output data stream may be performed in any of the first system node or the second system node or in another node of the system.
  • the produced dependency data is a motion estimate of the difference between the first frame and the second frame, and wherein the system is further operative for receiving, at the first system node, the second frame, and wherein the system is operative for producing the motion estimate based on the first frame and the second frame.
  • the information on the dependency data that the system is operative to send is a compressed representation of the dependency data.
  • the compressed representation of the dependency data is compressed relative to information already present on the second system node.
  • the produced dependency data is a reconstructed compressed representation of the first frame and the system is further operative for receiving, at the second system node, the first data frame. Further, the information on the dependency data represents a pixel-wise difference between the first frame and the reconstructed compressed representation of the first frame.
  • the input video stream is a
  • the system is further operative for, before the first frame and the second frame are received at the first and the second system node,
  • the system comprises a plurality of system nodes including the first system node 12 and the second system node 14.
  • the system is further operative for selecting the first system node for processing the first data frame based on one or more of the following conditions: presence of a reference frame for the first data frame at each of the plurality of system nodes; presence of the first data frame at each of the plurality of system nodes; processor load of each of the plurality of system nodes.
  • the system comprises a plurality of system nodes including the first system node 12 and the second system node 14.
  • the input video stream comprises a first GOP comprising a plurality of data frames including the first data frame and the second data frame, the first GOP having a GOP structure.
  • the system is operative for determining a number of the plurality of system nodes to use for the encoding, based on a degree of parallelism of the GOP structure, the number of system nodes comprising the first system node; assigning tasks of producing compressed representations of the plurality of data frames of the first GOP to the determined number of system nodes based on the degree of parallelism of the GOP structure and so that the first system node is assigned the trunk of the GOP structure; and assigning motion estimation tasks to the determined number of system nodes except the first system node.
  • Another embodiment of a system 10 configured for encoding an input video stream into a compressed output video stream is shown in fig. 9.
  • the system comprises a first system node 12 and a second system node 14, the input video stream comprises a plurality of data frames.
  • the system 10 further comprises a first receiving module 702 for receiving, at the first system node 12, a first data frame of the input video stream, a second receiving module 704 for receiving, at the second system node 14, a second data frame of the input video stream, and a first producing module 706 for producing, at the first system node, dependency data representing a dependency between the first data frame and the second data frame, based on the first data frame.
  • the system further comprises a third receiving module 708 for receiving, at the second system node from the first system node, information on the produced dependency data, and a second producing module 710 for producing, at the second system node, a compressed representation of the second data frame based on the second data frame and the received information on the dependency data.
  • the system further comprises a third producing module 712 for producing, at the first or the second system node, a compressed representation of the first data frame based at least on the first data frame, and a combining module 714 for combining the compressed representation of the first data frame and the compressed representation of the second data frame into the compressed output data stream.
  • a first system node 12 is provided, of a system 10 configured for encoding an input video stream into a compressed output video stream, the system comprising apart from the first system node 12 also a second system node 14.
  • the input video stream comprises a plurality of data frames.
  • the first system node 12 comprises a processor 603 and a memory 604, said memory containing instructions executable by said processor, whereby the first system node 12 is operative for receiving a first data frame of the input video stream, producing dependency data representing a dependency between the first data frame and a second data frame of the input video stream, based on the first data frame, and sending to the second system node, information on the produced dependency data; which information the second system node uses to produce a compressed representation of the second data frame.
  • the first system node is optionally also operative for producing, at the first system node, a compressed representation of the first data frame based at least on the first data frame.
  • the first system node is optionally also operative for combining the compressed representation of the first data frame and the compressed representation of the second data frame into the compressed output data stream.
  • a second system node 14 is provided, of a system 10 configured for encoding an input video stream into a compressed output video stream, the system comprising apart from the second system node 14 also a first system node 12.
  • the input video stream comprises a plurality of data frames.
  • the second system node 14 comprises a processor 623 and a memory 624, said memory containing instructions executable by said processor, whereby the second system node 14 is operative for receiving a second data frame of the input video stream, receiving, from the first system node, information on dependency data representing a dependency between a first data frame of the input video stream and the second data frame, the dependency data being produced at the first system node, and producing a compressed representation of the second data frame based on the second data frame and the received information on the dependency data.
  • the second system node is further optionally operative for producing a compressed representation of the first data frame based at least on the first data frame.
  • the second system node is further optionally operative for combining the compressed representation of the first data frame and the compressed representation of the second data frame into the compressed output data stream.
  • the first system node 12 may further comprise a communication unit 602, which may be considered to comprise conventional means for communicating from and/or to other nodes in the system 10, such as the second node and an optional input unit 11 and output unit 16, see fig. 1.
  • the communication unit 602 may comprise one or more communication ports for communicating with the other nodes in the system.
  • the instructions executable by said processor 603 may be arranged as a computer program 605 stored in said memory 604.
  • the processor 603 and the memory 604 may be arranged in a sub-arrangement 601.
  • the sub-arrangement 601 may be a micro-processor and adequate software and storage therefor, a Programmable Logic Device, PLD, or other electronic component(s)/processing circuit(s) configured to perform the actions or methods mentioned above.
  • the second system node 14 may further comprise a communication unit 622, which may be considered to comprise conventional means for communicating from and/or to other nodes in the system 10, such as the first node and the optional input unit 11 and output unit 16.
  • the communication unit 622 of the second system node may comprise one or more communication ports for communicating with the other nodes in the system.
  • the instructions executable by the processor 623 of the second system node may be arranged as a computer program 625 stored in the memory 624 of the second system node.
  • the processor 623 and the memory 624 of the second system node may be arranged in a sub-arrangement 621, which may be a micro-processor and adequate software and storage therefor, a Programmable Logic Device, PLD, or other electronic component(s)/processing circuit(s) configured to perform the actions or methods mentioned above.
  • the computer programs 605 and 625 may respectively comprise computer readable code means, which when run in the first/second node causes the system to perform the steps described in any of the described embodiments.
  • the computer program 605; 625 may be carried by a computer program product connectable to the processor 603; 623.
  • the computer program product may be the memory 604; 624.
  • the memory 604; 624 may be realized as for example a RAM (Random-access memory), ROM (Read-Only Memory) or an EEPROM (Electrical Erasable Programmable ROM).
  • the computer program may be carried by a separate computer-readable medium, such as a CD, DVD or flash memory, from which the program could be downloaded into the memory 604; 624.
  • the computer program may be stored on a server or any other entity connected to the communication network to which the system has access via the
  • the computer program may then be downloaded from the server into the memory 604; 624.

Abstract

A method is disclosed performed by a system (10), for encoding an input video stream into a compressed output video stream, the input video stream comprising a plurality of data frames, the system comprising a first system node (12) and a second system node (14). The method comprises receiving, at the first node (12), a first data frame of the input video stream, receiving at the second node (14), a second data frame of the input video stream and producing, at the first node, dependency data representing a dependency between the first data frame and the second data frame, based on the first data frame. The method further comprises sending, by the first node to the second node, information on the produced dependency data, producing, at the second node, a compressed representation of the second data frame based on the second data frame and the received information on the dependency data, producing a compressed representation of the first data frame based at least on the first data frame, and combining the compressed representation of the first data frame and the compressed representation of the second data frame into the compressed output data stream.

Description

METHOD AND SYSTEM FOR ENCODING AN INPUT VIDEO STREAM INTO A COMPRESSED OUTPUT VIDEO STREAM
Technical field
[0001 ] The present disclosure relates generally to a method, a system and a computer program, for encoding an input video stream into a compressed output video stream.
Background
[0002] When a digital video stream is to be distributed over a communication network, it is encoded into a compressed mode so as to reduce the amount of data that has to be sent over the communication network. The process of reducing the size of data to be transmitted may be referred to as data compression or source coding, i.e. encoding done at the source of the data before it is stored or transmitted. By such video compression, limited communication interface resources are used more efficiently, and the encoded/compressed video stream is distributed quicker than if the original video stream were sent. Also, there are many similarities between consecutive pictures of a video stream, so the video stream can be effectively compressed by taking those similarities into account. In this disclosure the wordings coding, encoding and compressing are used interchangeably, all signifying to encode a video stream into a compressed mode.
[0003] There are several known coding formats for video compression that are used today, such as the ITU-T standardized recommendations H.262, H.263, H.264 (also called Advanced Video Coding, AVC) and H.265 (also called High Efficiency Video Coding, HEVC). Most video coding formats combine spatial intra-picture compression and motion-compensated temporal inter-picture compression techniques.
[0004] A video stream that is received at an encoder for video compression comprises a plurality of consecutive frames or pictures. In a video stream the frames are grouped into groups of pictures, GOPs, each GOP comprising a plurality of data frames. Normally, it is the encoder that groups frames into GOPs. Typically, the grouping is done before actual encoding of the frames. The grouping may involve reordering of the frames. The frames of a GOP are coded/compressed in relation to each other.
[0005] An input video stream is historically coded by a single network node, e.g. a single machine or single computer, before it is distributed over a communication network to viewers. However, in some cases it is important to have a quick encoding process. A use case when input video has to be encoded quickly is real-time encodings that are to be distributed to viewers as "live" as possible, such as live broadcasts of an event such as a football match or a rock concert. In such cases, the delay that has to be introduced to be able to perform the encoding and corresponding decoding at the receiver side, i.e. at the communication device of the viewer, needs to be short. To speed up the encoding process, distributed video encoding has been introduced. Distributed video encoding signifies splitting the encoding work among several machines.
[0006] Existing distributed video encoding approaches split the input video stream into video chunks that are thereafter encoded separately and independently of each other. This is in most known methods performed by splitting the input video temporally so that an Instantaneous Decoder Refresh, IDR, frame is inserted every N frames, thereby creating segments, i.e. sub-videos, that can be decoded independently of each other. A drawback with that method is the encoding latency that it produces, i.e. the delay between feeding a picture into the parallel encoder and being able to stream out a continuous compressed video stream. The delay is at least as high as the playout duration of the number of frames that are in the system for processing at any time. I.e., if 10 segments of 60 frames each are in the system for parallel processing, and the video frame rate is 30 fps (frames per second), then the system introduces 600 frames / 30 fps = 20 sec delay. The higher the desired degree of parallel processing, the more segments of video need to be processed in parallel, and thus the higher the delay. High delay may be acceptable for offline encoding applications, e.g. for video-on-demand services; however, it is typically not acceptable for live TV/video feeds, e.g. live sports events etc.
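The delay figure in the paragraph above follows directly from the buffering requirement and can be reproduced with a small helper (illustrative only; the function name is not from the source):

```python
def chunking_delay_seconds(num_segments, frames_per_segment, fps):
    """Minimum delay of chunk-based parallel encoding: all frames held in the
    system for processing must be buffered before continuous playout."""
    return num_segments * frames_per_segment / fps

# The example from the text: 10 segments x 60 frames at 30 fps.
print(chunking_delay_seconds(10, 60, 30))  # 20.0
```

The formula also makes the stated trade-off explicit: delay grows linearly with the number of segments, i.e. with the desired degree of parallelism.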
[0007] Another approach is to split the individual video frames into tiles, creating sub-videos of reduced resolution that can be encoded independently. A major drawback with the existing methods is that they break the dependency between the sub-videos. Thereby they reduce the coding efficiency, making the encoded bit-stream that is sent over the communication network larger than it could have been, which results in slower transmission over the communication network and/or a need for larger communication resources for transmission of the encoded bit-stream. Consequently, there is a need for an efficient and quick distributed video compression method that can handle e.g. encoding and distribution of real-time video to communication devices efficiently.
Summary
[0008] It is an object of the invention to address at least some of the problems and issues outlined above. It is a further object to achieve an efficient and quick coding process for encoding of a video stream that is to be distributed over a communication network. It is possible to achieve these objects and others by using a method and an apparatus as defined in the attached independent claims.
[0009] According to one aspect, a method is provided, performed by a system, for encoding an input video stream into a compressed output video stream, the input video stream comprising a plurality of data frames. The system comprises a first system node and a second system node. The method comprises receiving, at the first node, a first data frame of the input video stream, receiving at the second node, a second data frame of the input video stream, and producing, at the first node, dependency data representing a dependency between the first data frame and the second data frame, based on the first data frame. The method further comprises sending, by the first node to the second node, information on the produced dependency data, producing, at the second node, a compressed representation of the second data frame based on the second data frame and the received information on the dependency data, producing a compressed representation of the first data frame based at least on the first data frame, and combining the compressed representation of the first data frame and the compressed representation of the second data frame into the compressed output data stream.
[00010] According to another aspect, a system is provided, configured for encoding an input video stream into a compressed output video stream, the system comprising a first system node and a second system node and the input video stream comprising a plurality of data frames. The system comprises at least one processor and at least one memory. The at least one memory contains instructions executable by said at least one processor, whereby the system is operative for receiving, at the first system node, a first data frame of the input video stream, receiving, at the second system node, a second data frame of the input video stream, and producing, at the first system node, dependency data representing a dependency between the first data frame and the second data frame, based on the first data frame. The system is further operative for receiving, at the second system node from the first system node, information on the produced dependency data, producing, at the second system node, a compressed representation of the second data frame based on the second data frame and the received information on the dependency data, producing, at the first or the second system node, a compressed representation of the first data frame based at least on the first data frame, and combining the compressed representation of the first data frame and the compressed representation of the second data frame into the compressed output data stream.
[00011] According to another aspect, a first system node is provided, of a system configured for encoding an input video stream into a compressed output video stream, the system comprising apart from the first system node also a second system node, and the input video stream comprising a plurality of data frames. The first system node comprises a processor and a memory, said memory containing instructions executable by said processor, whereby the first system node is operative for receiving a first data frame of the input video stream, producing dependency data representing a dependency between the first data frame and a second data frame of the input video stream, based on the first data frame, and sending to the second system node, information on the produced dependency data; which information the second system node uses to produce a compressed representation of the second data frame. The first system node is optionally also operative for producing a compressed representation of the first data frame based at least on the first data frame. The first system node is optionally also operative for combining the compressed representation of the first data frame and the compressed representation of the second data frame into the compressed output data stream.
[00012] According to another aspect, a second system node is provided, of a system configured for encoding an input video stream into a compressed output video stream, the system comprising apart from the second system node also a first system node, the input video stream comprising a plurality of data frames. The second system node comprises a processor and a memory. The memory contains instructions executable by said processor, whereby the second system node is operative for receiving a second data frame of the input video stream, receiving, from the first system node, information on dependency data representing a dependency between a first data frame of the input video stream and the second data frame, the dependency data being produced at the first system node, and producing a compressed representation of the second data frame based on the second data frame and the received information on the dependency data. The second system node is optionally also operative for producing a compressed representation of the first data frame based at least on the first data frame. The second system node is optionally also operative for combining the compressed representation of the first data frame and the compressed representation of the second data frame into the compressed output data stream.
[00013] According to other aspects, computer programs and carriers are also provided, the details of which will be described in the claims and the detailed description.
[00014] Further possible features and benefits of this solution will become apparent from the detailed description below.
Brief description of drawings
[00015] The solution will now be described in more detail by means of exemplary embodiments and with reference to the accompanying drawings, in which:
[00016] Fig. 1 is a block diagram illustrating a system, according to some possible embodiments.
[00017] Fig. 2 is a block diagram of an input video stream that may be input to the system of fig. 1.
[00018] Fig. 3 is a flow chart illustrating a method, according to possible embodiments.
[00019] Fig. 4 is another block diagram of a system according to a possible embodiment, illustrating an example of messages sent on an exemplary GOP.
[00020] Fig. 5 is a block diagram of an exemplary GOP structure.
[00021] Fig. 6 is another block diagram of the exemplary GOP structure, this time as a tree structure.
[00022] Fig. 7 is a block diagram of work allocation between two working units.
[00023] Fig. 8 is a block diagram illustrating a system in more detail, according to a possible embodiment.
[00024] Fig. 9 is another block diagram illustrating a system in more detail, according to a further possible embodiment.
Detailed description
[00025] Briefly described, an improved distributed video encoding method is disclosed, together with its corresponding system. The method distributes an incoming video stream between two or more network nodes, e.g. computers, for encoding the video stream. In comparison to existing methods, however, this method distributes the encoding of a video stream so that data for two frames that have impact on each other, e.g. two frames of the same GOP, are treated at different network nodes, while still taking dependencies between the two frames into consideration. In more detail, the method comprises receiving a first data frame at a first computer and receiving a second data frame, temporally close to the first data frame, at a second computer. At the first computer, data representing a dependency between the first data frame and the second data frame is produced, such as a reference frame or a motion estimate. Thereafter, the dependency data is sent to the second computer, which uses the dependency data to encode the second data frame into a compressed second frame that can be sent in an output stream together with a compressed version of the first frame. The compressed version of the first frame may have been produced at either the first or the second computer. By such a sharing of the encoding process, efficient coding can be achieved, since dependencies between frames are taken into consideration while the steps of the encoding process are still distributed between different computers so as to increase the processing power.
[00026] Fig. 1 shows an exemplary system 10 for encoding an input video stream into a compressed output video stream, in which system the present invention may be used. The system comprises an input unit 11 for receiving an incoming video stream. The incoming video stream may be a raw video stream or a video stream that has been pre-processed in some way. The input unit 11 is further connected to a first system node 12 and a second system node 14. The first and the second system nodes are separate nodes, each node being arranged for encoding data frames of a video signal. The system nodes may be separate machines, such as computers, or separate logical machines. Even though the system nodes 12, 14 are separate nodes, they are connected to each other so that they can exchange data with each other. The system may further comprise an output unit 16 connected to the first and the second system node 12, 14. The input unit 11 may be arranged to distribute the incoming video stream, frame by frame, to the first or the second system node so that a first frame of the video stream is sent to the first system node and a second frame of the video stream is sent to the second node. The first system node 12 is then arranged to produce data representing a dependency between the first data frame and the second data frame and to send this data, or information on the dependency data, to the second system node, which uses the dependency data to produce a compressed version of the second data frame. The compressed version of the second data frame is then sent from the second system node 14 to the output unit 16, which arranges the compressed second data frame into a compressed output video stream together with a compressed version of the first frame that has been received from the first or the second node.
The compressed output video stream may then be sent over a communication network, such as a wireless communication network, to a receiving unit, which may be a communication device, such as a mobile phone, belonging to a user that wants to see the distributed video. To be able to view the video, the receiving unit has a decoding unit that is arranged to decode the received encoded video stream. Even though the system 10 of fig. 1 has an input unit 11 and an output unit 16 that are separate from the first and the second system nodes 12, 14, it may also be the case that the input unit and the output unit are integrated into the first system node or the second system node or both.
[00027] Figure 2 shows an input video stream, or more likely a part of an input video stream, comprising two GOPs, the first GOP having a number of consecutive frames 0 to 8 and the second GOP having a number of consecutive frames 9 to 17. The sizes of the frames as illustrated in fig. 2 differ a bit, but this is only for illustrative purposes: two digits take more space than one digit. Each GOP may have more or fewer than the exemplary 9 frames. A frame is considered to be the same as a picture throughout this disclosure.
[00028] Fig. 3 in conjunction with fig. 1 shows an embodiment of a method performed by a system 10, for encoding an input video stream into a compressed output video stream, the input video stream comprising a plurality of data frames. The system comprises a first system node 12 and a second system node 14. The method comprises receiving 106, at the first node 12, a first data frame of the input video stream, receiving 108, at the second node 14, a second data frame of the input video stream, producing 110, at the first node, dependency data representing a dependency between the first data frame and the second data frame, based on the first data frame, and sending 112, by the first node to the second node, information on the produced dependency data. The method further comprises producing 114, at the second node, a compressed representation of the second data frame based on the second data frame and the received information on the dependency data, producing 116 a compressed representation of the first data frame based at least on the first data frame, and combining 118 the compressed representation of the first data frame and the compressed representation of the second data frame into the compressed output data stream.
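As an illustration only, and not as the claimed method itself, the numbered steps 106 to 118 for one dependent frame pair can be sketched in Python. Coarse quantization is used as a hypothetical stand-in for real video compression, and a reconstructed reference frame serves as the dependency data; all function names are illustrative:

```python
Q = 4  # quantization step; a toy stand-in for real lossy compression

def compress(frame):
    # step 116: produce a compressed representation of the first frame
    return [v // Q for v in frame]

def reconstruct(code):
    # step 110: reconstructed compressed frame, used as the reference frame
    return [v * Q for v in code]

def compress_with_reference(frame, reference):
    # step 114: at the second node, code only the residual against the reference
    return [f - r for f, r in zip(frame, reference)]

def encode_pair(first, second):
    c1 = compress(first)                         # node 1, step 116
    reference = reconstruct(c1)                  # node 1, step 110
    # step 112: information on the dependency data is sent to node 2
    c2 = compress_with_reference(second, reference)  # node 2, step 114
    return c1, c2                                # step 118: combined output

first = [10, 20, 30, 40]
second = [12, 22, 31, 43]
c1, c2 = encode_pair(first, second)
# a decoder recovers the second frame as reference + residual
decoded_second = [r + d for r, d in zip(reconstruct(c1), c2)]
assert decoded_second == second
```

A real encoder would of course use motion-compensated prediction and entropy coding rather than plain quantization; the sketch only shows how the dependency data produced at one node feeds the compression at the other.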
[00029] The first and the second node of the system may be, for example, physical computers or virtual computers. The first and the second data frame may belong to the same group of pictures, GOP. The data frames of a GOP may be coded so that at least some of the frames depend on the encoding outcome of another frame or frames in the GOP. A common GOP comprises one I-frame and a number of P-frames and/or B-frames. However, other combinations may also apply, such as one GOP consisting of only P- and B-frames. A data frame may in this disclosure, apart from signifying a whole frame, also signify a part of a frame, such as a slice or a tile. Within the scope of this embodiment, the order of the parts of the method shown in fig. 3 may be different than what is shown in fig. 3. For example, when the dependency data is the reconstructed compressed first frame, the compressed representation of the first frame is produced 116 before the reconstructed compressed first frame is produced 110.
[00030] The dependency data may be any input data defining a dependency between the first and the second frame, such as a reference frame and/or a motion estimate used by the second system node to calculate the compressed representation of the second frame, statistics from a pre-pass encoding, activity calculations etc. A pre-pass encoding is in this case an encoding, performed by the first node, of coding work that would normally be performed by the second node, such as the work of producing 114, at the second node, the compressed representation of the second frame. The information on the dependency data may be any information that the second system node is configured to accept/require as input, e.g. the actual reference frame, a motion vector or motion vector field defining the motion estimate(s), statistics from a first-pass encoding, a matrix of picture activity measurements, etc. The wording "producing a compressed representation" may signify determining a compressed representation. The combining 118 may be performed at any of the first or the second system node or at a separate node, such as at the output unit 16 of fig. 1, which may be a bit stream mixer. The producing 116 of a compressed representation of the first data frame based at least on the first data frame may be accomplished at the first system node or at the second system node or, in the case where the dependency data is a motion estimation pre-pass, it may have been accomplished earlier, i.e. before this method starts, and/or at another node of the system.
[00031] By dividing the encoding work between two or more system nodes, e.g. computers as described above, a quicker encoding process is achieved than is possible if only one system node, e.g. one computer, is used. This is especially advantageous in real-time encoding applications when a recorded input video stream is to be distributed to viewers as quickly as possible, such as a live broadcast of an event. In particular, to produce dependency data defining a dependency between a first frame and a second frame at a first node based on at least the first frame, to send this produced dependency data to the second node, and to, at the second node, use this dependency data in addition to the second frame to produce a compressed representation of the second frame, results in an efficient division of the encoding work between the two system nodes.
[00032] According to an embodiment, the produced 110 dependency data is a reconstructed compressed representation of the first frame. The reconstructed compressed representation of the first frame functions as a reference frame for the second frame.
[00033] According to another embodiment, the produced 110 dependency data is a motion estimate of the difference between the first frame and the second frame. Further, the method comprises receiving 107, at the first system node, the second frame. Further, the motion estimate is produced 110 based on the first frame and the second frame. The "first frame" may be the original first frame of the input video stream, or the reconstructed first frame. The information on the motion estimate, which is sent to the second node, may be a motion vector, such as a global motion vector, or a series of motion vectors for each pixel or block in the frame.
[00034] According to another embodiment, the produced 110 dependency data is a pre-pass encoding, or other statistical data, of the second frame. Further, the second node receives the first frame in addition to the first-pass compression, to produce 116 the compressed representation of the first frame based on the received first frame and the pre-pass encoding. Further, the first node receives the second frame in addition to the first frame, in order to be able to produce the pre-pass encoding of the second frame.
[00035] According to another embodiment, the sent 112 information on the dependency data is the actual dependency data.
[00036] According to another embodiment, the sent 112 information on the dependency data is a compressed representation of the dependency data.
[00037] According to another embodiment, the sent 112 information on the dependency data is compressed relative to information already present on the second system node. The information already present on the second node may be the second data frame. The dependency data may be the reconstructed compressed representation of the first frame, and the information on this dependency data may then be sent as the difference between the reconstructed compressed representation of the first frame and the second data frame. Compressing the information on the dependency data in such a way lowers the amount of data that has to be sent between the system nodes compared to sending the actual dependency data.
[00038] According to another embodiment, the method further comprises receiving 109, at the second system node, the first data frame. Further, the sent 112 information on the dependency data represents a pixel-wise difference between the first frame and a reconstructed compressed representation of the first frame. Such a measure reduces the amount of data to send to the second node after the first node has finished compression, and therefore reduces the delay before the second node can start producing the compressed representation of the second frame.
[00039] According to another embodiment, the input video stream is a compressed video stream, divided into a first compressed sub-bit stream comprising the first frame and a second compressed sub-bit stream comprising the second frame. The method then further comprises, before receiving 106, 108 the first frame and the second frame at the first and the second system node, respectively, decompressing 102, by the first system node, the first compressed sub-bit stream into a first decompressed sub-bit stream comprising the first frame, and decompressing 104, by the second system node, the second compressed sub-bit stream into a second decompressed sub-bit stream comprising the second frame.
[00040] According to another embodiment, the method further comprises sending, by the first system node, information on the decompressed 102 first sub-bit stream to the second system node. The first and the second sub-bit stream might be divided so that some frames in the second sub-bit stream have dependencies on some frames of the first sub-bit stream. By sending this information, it is possible for the second node to take advantage of the already performed decompression of the first sub-bit stream when decompressing the second sub-bit stream.
[00041] According to another embodiment, the system comprises a plurality of system nodes including the first system node 12 and the second system node 14. Further, the method comprises selecting the first system node for processing the first data frame based on one or more of the following conditions: presence of a reference frame for the first data frame at each of the plurality of system nodes; presence of the first data frame at each of the plurality of system nodes; processor load of each of the plurality of system nodes.
[00042] The selecting of the first system node is performed before the receiving 106 of the first data frame at the first system node. The selecting may be performed by an input unit 11 of the system. The input unit may be a dispatcher. The selecting may be performed before the decompression 102, when such a step exists. The first node of the plurality of nodes may be prioritized as selection alternative when the first data frame is already present on the first node and/or when the reference frame for the first data frame is already present on the first node. By such a method, it is possible to select a system node out of a plurality of system nodes that is most suitable for performing the encoding.
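A minimal sketch of such a selection rule is given below, assuming a hypothetical node representation with a set of locally present frames and a processor-load figure; the names and data layout are illustrative, not taken from the description:

```python
# hypothetical bookkeeping for three candidate system nodes
NODES = [
    {"name": "node1", "frames": set(),          "load": 0.2},
    {"name": "node2", "frames": {"f1", "ref1"}, "load": 0.9},
    {"name": "node3", "frames": {"f1"},         "load": 0.1},
]

def select_node(nodes, frame_id, ref_id):
    # prioritize nodes that already hold the data frame and/or its reference
    # frame; use (low) processor load only as the tie-breaker
    def score(node):
        return (frame_id in node["frames"],
                ref_id in node["frames"],
                -node["load"])
    return max(nodes, key=score)

# node2 holds both the frame and its reference, so it wins despite high load
assert select_node(NODES, "f1", "ref1")["name"] == "node2"
# with nothing cached anywhere, the least loaded node is chosen
assert select_node(NODES, "f9", "ref9")["name"] == "node3"
```

The relative weighting of the three conditions (frame presence, reference presence, load) is a design choice; the tuple ordering above is only one plausible instantiation.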
[00043] According to another embodiment, the system comprises a plurality of system nodes including the first system node 12 and the second system node 14, and the input video stream comprises a first GOP comprising a plurality of data frames including the first data frame and the second data frame, the first GOP having a hierarchical GOP structure. The method further comprises:
determining a number of the plurality of system nodes to use for the encoding, based on a degree of parallelism of the GOP structure, the number of system nodes comprising the first system node;
assigning tasks of producing compressed representations of the plurality of data frames of the first GOP among the determined number of system nodes, based on the degree of parallelism of the GOP and so that the first system node is assigned the trunk of the GOP structure; and
assigning motion estimation tasks to the determined number of system nodes.
[00044] The GOP concept is illustrated in fig. 2. An example of a GOP structure is illustrated in fig. 5 and in fig. 6. In fig. 6, the exemplary GOP structure is described as a tree structure with a trunk (frames 0, 8, 4 and e.g. 6, 5) from which branches are branching off. The degree of parallelism is how many coding tasks may be performed simultaneously. This may also be seen in the tree structure as how many parallel branches there are. The parallelism comes from the fact that encoding a picture in a hierarchical GOP structure "unlocks" (two, in the example case) pictures in a higher temporal layer, which can then be encoded in parallel. Temporal layer is defined such that pictures can only depend on other pictures in the same or a lower layer. The lowest temporal layer of the GOP structure is the most critical part, since it unlocks all the other layers and also the next GOP. Since the picture in the lowest layer (picture 8) only has a dependency on the picture in the lowest layer of the previous GOP, it is natural that the first node is always responsible for coding the lowest temporal layer in all GOPs.
[00045] In the example of figs. 5 and 6, encoding of picture 4 unlocks pictures 2 and 6. At the highest layer there are 4 pictures (1, 3, 5, and 7) that can be encoded in parallel without any dependencies on each other. In other words, in the fig. 6 example, the degree of parallelism is 4 (frames 1, 3, 5, 7), i.e. four system nodes are to be used. In the method, the first system node is then assigned five coding tasks of producing compressed representations of data frames, i.e. the coding tasks of the trunk of the tree structure, e.g. frames 0, 8, 4, 2, and 1. The remaining four tasks of producing compressed representations of data frames are spread out over the three other system nodes of the fig. 6 example. While the first system node is occupied by these five tasks, the three other system nodes are meanwhile assigned the tasks of producing dependency data, e.g. motion estimations.
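The relationship between the fig. 6 tree, its leaves and its trunk can be sketched as follows, representing the tree as a child map; this dictionary encoding of the figure is only illustrative:

```python
# fig. 6 tree: each frame mapped to the frames it "unlocks"
GOP_TREE = {0: [8], 8: [4], 4: [2, 6], 2: [1, 3], 6: [5, 7],
            1: [], 3: [], 5: [], 7: []}

def degree_of_parallelism(tree):
    # the achievable parallelism equals the number of leaves of the tree
    return sum(1 for children in tree.values() if not children)

def trunk(tree, node=0):
    # longest root-to-leaf path, i.e. the critical path kept on the first node
    if not tree[node]:
        return [node]
    return [node] + max((trunk(tree, child) for child in tree[node]), key=len)

assert degree_of_parallelism(GOP_TREE) == 4   # leaves are frames 1, 3, 5, 7
assert trunk(GOP_TREE) == [0, 8, 4, 2, 1]
```

Since both branches below frame 4 have equal depth, the trunk choice between e.g. 0-8-4-2-1 and 0-8-4-6-5 is arbitrary; the function above simply keeps the first branch it finds.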
[00046] Fig. 4 shows an embodiment of a system 200 for encoding an incoming video stream of raw data frames into a compressed video stream. The system 200 comprises four working units, i.e. system nodes A 212, B 214, C 216 and D 218 performing video encoding, a dispatcher 210 that receives incoming video frames and distributes them to the specific allocated working unit, and a bit stream mixer 220 that receives encoded bit streams from the working units 212, 214, 216, 218 and mixes them into a final outgoing bit stream of compressed video.
[00047] In the figure, an example is described for the coding of a received GOP. The arrows from the dispatcher to the working units are labeled with the raw data frames of the GOP that are sent to each working unit for further treatment. For example, frames 1, 2, 4, 8 and possibly also frame 0 are sent to Working unit A 212. Further, there are arrows between the working units, on which RF stands for reference frame and Mv stands for motion vector. For example, the text "RF4,8" by the arrow between working unit A and working unit B signifies that reference frames 4 and 8 are to be sent from working unit A to working unit B. This also implicitly signifies that reference frames 4 and 8 are produced by working unit A. Similarly, "Mv84" by the arrow between working unit B and A signifies that a motion vector defining the motion difference between frames 8 and 4 is sent from Working unit B to Working unit A. The example of fig. 4 is based on the exemplary GOP structure shown in fig. 5.
[00048] The GOP structure of fig. 5 shows 9 frames (0 to 8). Fig. 5 further describes an exemplary coding order of the frames and the coding dependency between the frames. The coding order and coding dependency are the same as those used for deciding the allocation of encoding work as shown in fig. 4, i.e. which frames are treated on which computer and where reference frames and motion estimates are calculated. Frame 0 is of I-type or P-type, frame 8 is of P-type and frames 1-7 are of B-type, wherein "I" stands for a frame coded independently of all other frames, "P" stands for a frame that contains motion-compensated difference information relative to previously coded frames, and "B" stands for a frame that contains motion-compensated difference information relative to, most often, two previously coded frames. The coding order of the frames of the GOP structure of fig. 5 is 0, 4, 3, 5, 2, 7, 6, 8, and 1.
[00049] Typical encoding steps for encoding a video sequence comprising a plurality of GOPs are:
1. Motion estimation, ME, between two frames of the GOP, wherein the difference between two pictures/frames due to movement of items in the picture is estimated. This can be made either a) by comparing a current original frame with a reconstructed reference frame or b) by comparing a current original frame with an original reference frame.
2. Intra-frame predictions, motion-compensated inter-picture prediction, mode decision, Rate Distortion Optimization;
3. Frame reconstruction;
4. Filtering techniques, including de-blocking and Sample Adaptive Offset filter.
Step 1 can be performed per block independently and can even start as soon as a new frame appears, in the case of 1a. Step 2 cannot start before step 1 has been finalized, since intra prediction needs top and left neighboring blocks to be available. Steps 2 to 4 can be considered as one work item and will be referred to as "Coding" in the following description. In terms of dependency, the coding tasks of one frame have to be performed after its associated motion estimations.
[00050] According to embodiments, when used in the context of fig. 4 and 5, the encoding work of this GOP is split among the four working units A, B, C and D, e.g. four different computers. As will be shown further down in this document, five working units would be a waste of resources and wouldn't bring any gain.
[00051] When encoding the exemplary GOP, the following work items are to be distributed between the four working units 212-218: 15 motion estimates, "MEij", i.e. 08, 04, 84, 02, 42, 46, 86, 01, 21, 23, 43, 45, 65, 67, 87, where MEij means motion estimation from frame J to the reference frame I; and 9 encoding procedures, frames 0 to 8. Typically there may be only 8 encoding procedures if frame 0 is a P-frame, since this frame was frame number 8 of the previous GOP.
[00052] The transmission bandwidth between the working units is typically a bottleneck. Therefore, an embodiment of the invention aims to minimize the amount of data to be transmitted between the working units, such data typically being motion vectors and reconstructed reference frames. Since it is costly in terms of bandwidth to transmit the reference frames between the working units, this embodiment allocates the production of the reference frames and the MVs according to the reference frame dependencies. In other words, this allocation is based on where the reference frames are to be used, so that the reference frames are produced at a computer which also handles the production of the compressed outgoing frame based on the reference frame.
Table 1 below defines for how many other frames each frame in the exemplary GOP of fig. 5 is used as a reference frame, i.e. frame 0 is used as reference frame for four other frames.

Frame                           0  1  2  3  4  5  6  7  8
Used as reference for N frames  4  0  2  0  4  0  2  0  3
Table 2 below defines which of the reference pictures in the GOP of fig. 5 each frame uses, i.e. frame 1 uses frames 0 and 2 as reference frames.

Frame                     0    1    2    3    4    5    6    7    8
Using reference pictures  -    0,2  0,4  2,4  0,8  4,6  4,8  6,8  0

[00053] Below is described an allocation procedure according to an embodiment, for allocating encoding tasks/jobs to different system nodes of a system for encoding video into a compressed outgoing video stream. The problem of assigning N tasks to M system nodes/working units when the tasks have dependencies is known as task-shop scheduling and is an NP-hard problem with multiple solutions. Thus, this problem is difficult, i.e. slow, but feasible to solve, using methods such as branch and bound, local search methods, etc. The aim is to find the best assignment of encoding tasks so as to minimize the overall processing time. Since, in our case, all the tasks have unknown exact durations and only rough approximations, it is very difficult, if not impossible, to solve the problem optimally with an algorithm. Thus, we here present a set of heuristics that allow us to simplify the problem, partly solve it and obtain an effective job assignment.
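Table 1 can be derived from table 2 by inverting the dependency relation; a small sketch, where the dictionary literal simply transcribes table 2:

```python
# table 2: frame -> reference pictures it uses (GOP of fig. 5)
USES = {0: [], 1: [0, 2], 2: [0, 4], 3: [2, 4], 4: [0, 8],
        5: [4, 6], 6: [4, 8], 7: [6, 8], 8: [0]}

def reference_use_counts(uses):
    # invert the relation: for each frame, count the frames that reference it
    counts = {frame: 0 for frame in uses}
    for refs in uses.values():
        for ref in refs:
            counts[ref] += 1
    return counts

counts = reference_use_counts(USES)
# frame 0 is used as reference frame for four other frames (1, 2, 4, 8)
assert counts == {0: 4, 1: 0, 2: 2, 3: 0, 4: 4, 5: 0, 6: 2, 7: 0, 8: 3}
```

Frames with high counts (0 and 4 here) are the ones whose reconstructed versions are most expensive to ship between working units, which motivates producing them where they are consumed.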
[00054] According to an embodiment, the number of working units is determined so as to maximize the parallelism, i.e. to minimize the overall GOP encoding time, while the number of working units should be small enough in order not to waste processing power. This is accomplished by considering the coding tasks and their dependencies so that one working unit is assigned coding tasks that are dependent on each other in a consecutive order. Then the rest of the tasks, e.g. the MEs, are dispatched among the remaining working units.
[00055] More details of a variant of the above embodiment are given in the following steps, in order to obtain the exemplary job assignment presented in table 3. The first thing to do in this assignment process may be to estimate the degree of parallelism that the GOP structure allows, e.g. the GOP structure of fig. 5, and from the estimated degree of parallelism derive the number of working units needed. To do so, the GOP structure of fig. 5 is transformed into a tree structure, only keeping the dependencies (arrows) of each frame that are satisfied last. This is illustrated in fig. 6. Applying a depth-first search algorithm to this tree structure gives us the coding order as specified in fig. 6. The depth-first search algorithm is an algorithm where all pictures in the GOP are visited once, starting at the first picture 0. In principle it selects one of the neighbor pictures and continues searching its neighbor pictures until it reaches the end of that branch. It then backtracks and considers the other neighbor pictures until the whole graph has been visited. In the example of figs. 5 and 6: Picture 0 is added to an output graph. Out of the four neighboring pictures (1, 2, 4, 8), the one in the lowest temporal layer is selected, i.e. picture 8. In the output graph, an arrow between pictures 0 and 8 is created. From picture 8 there are three neighbor pictures (4, 6, 7), and out of these, picture 4 has the lowest temporal layer. In the output graph, an arrow between 8 and 4 is drawn. Picture 4 has two neighbor pictures, both at the same temporal layer; one of them is selected first (i.e. depth-first search), e.g. picture 2. In the output graph, a line between pictures 4 and 2 is drawn. From picture 2, lines are then added in the output graph to pictures 1 and 3. Since no more pictures remain in this branch, the search goes back to picture 6. From picture 6, lines to pictures 5 and 7 are added to the output graph. Since all pictures have now been visited, the search is done. The degree of parallelism we can achieve here corresponds to the number of tree leaves, 4 in this example (1, 3, 5, and 7). Thus, at most 4 working units are to be used.
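The depth-first traversal described above, applied to the fig. 6 tree with children ordered lowest temporal layer first, reproduces the stated coding order; a sketch, with the tree as a hypothetical child map:

```python
# fig. 6 output tree: each frame mapped to its children in the tree
GOP_TREE = {0: [8], 8: [4], 4: [2, 6], 2: [1, 3], 6: [5, 7],
            1: [], 3: [], 5: [], 7: []}

def dfs_order(tree, root=0):
    # iterative depth-first search; children are pushed reversed so that the
    # left-most (lowest-layer) branch is explored first
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(reversed(tree[node]))
    return order

# matches the order given in the text: 0-8-4-2-1-3-6-5-7
assert dfs_order(GOP_TREE) == [0, 8, 4, 2, 1, 3, 6, 5, 7]
```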
[00056] The next step is to assign the coding tasks, i.e. the producing of compressed representations of frames, to each machine, since they are the most time-consuming tasks, while preserving the tasks' dependencies but minimizing the total time. The actual assigning of coding tasks to nodes may be based on which reference frames are needed, i.e. frame dependencies. Which reference frames are needed is known beforehand, since that is decided by the GOP structure. To do so, we apply a breadth-first search algorithm on the tree and assign parallel tasks to different working units. The breadth-first search algorithm is similar to the depth-first algorithm. It also visits all pictures in the graph once. The only difference compared to the depth-first algorithm is that all neighbors are visited/marked before any of the neighbors' neighbors. So in fig. 6, the breadth-first search starts at the root node, i.e. picture 0. It then visits its only neighbor, picture 8. From picture 8 it visits its only neighbor, picture 4. Picture 4 has two neighbors, which are both then marked as visited; in the output graph, lines between pictures 4-2 and 4-6 are added. Then one of the neighbors is selected to continue the search, e.g. picture 2. Picture 2 has two neighbors, pictures 1 and 3, after which that branch is closed. The search continues with picture 6's neighbors, 5 and 7. In summary: the depth-first search algorithm gives the order 0-8-4-2-1-3-6-5-7; the breadth-first search algorithm gives the order 0-8-4-2-6-1-3-5-7. For instance, we start by assigning coding tasks C0, C8 and C4 to working unit A. Then, we see that coding tasks C2 and C6 can be done in parallel, so we assign C2 to working unit A and C6 to working unit B. We continue with coding task C1 being assigned to working unit A, C5 to working unit B, C3 to working unit C and C7 to working unit D. We can observe that working unit A contains the critical path of the tree, i.e. the trunk, and has 4 or 5 encoding tasks to perform.
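The breadth-first order and the resulting task split can be sketched likewise; the assignment dictionary simply transcribes the allocation given in the text:

```python
from collections import deque

# fig. 6 output tree as a child map (illustrative representation)
GOP_TREE = {0: [8], 8: [4], 4: [2, 6], 2: [1, 3], 6: [5, 7],
            1: [], 3: [], 5: [], 7: []}

def bfs_order(tree, root=0):
    # breadth-first search: all neighbors before any neighbors' neighbors
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(tree[node])
    return order

# matches the order given in the text: 0-8-4-2-6-1-3-5-7
assert bfs_order(GOP_TREE) == [0, 8, 4, 2, 6, 1, 3, 5, 7]

# coding-task assignment from the text: unit A holds the trunk (critical path)
ASSIGNMENT = {"A": [0, 8, 4, 2, 1], "B": [6, 5], "C": [3], "D": [7]}
# every frame of the GOP is coded exactly once
assert sorted(f for tasks in ASSIGNMENT.values() for f in tasks) == list(range(9))
```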
[00057] The next step is to assign the ME production tasks, which are simpler/faster tasks that have to be performed right before the associated coding task. To do so, we start by considering MEs to be assigned to working units having free time before their coding tasks, which is working units B, C and D in this example, and we assign them their associated MEs. This way, we can avoid at least some of the transmission of motion vectors, MVs, since the MEs are done locally. For instance, since working unit D encodes frame 7, it can also perform ME67 and ME87. We then consider the remaining MEs, ME84, ME42, ME21, ME01, ME02, ME04 and ME08 in our example. These are essentially MEs of frames encoded by working unit A. Since ME tasks are performed faster than encoding tasks, we can let these tasks be performed by the three remaining working units B, C and D, and after having produced the MEs, they send their results back to working unit A, e.g. as MVs. The decision of which ME task will be assigned to which working unit is made so as to have roughly the same number of MEs among the three working units and roughly the same number of original frames to send to the different working units. This way, the work load is balanced between the working units while minimizing the data transfer between them, considering a transmission cost (delay) hypothesis: Orig Pic > Ref Pic > MVs > 0, and considering that the number of transmission tasks should be smaller than the number of ME tasks. The transmission cost hypothesis signifies the following: the amount of data to send for an original picture is larger than the amount of data to send for a reference picture, which in turn is larger than the amount of data to send for a motion vector, which in turn is larger than zero. The term picture is equivalent to the term frame throughout the description. Further, the term working unit is equivalent to the term worker and to the term system node. A working unit may be a physical computer or a virtual computer.
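The balancing of the remaining MEs over workers B, C and D can be illustrated by the following hypothetical sketch (a plain round-robin split; the actual decision described above also weighs the number of original frames to transfer, which this sketch omits):

```python
def assign_remaining_mes(mes, workers):
    """Hypothetical round-robin balancing: spread worker A's remaining
    ME tasks over the idle workers so each gets roughly the same count."""
    assignment = {w: [] for w in workers}
    for i, me in enumerate(mes):
        assignment[workers[i % len(workers)]].append(me)
    return assignment

# The remaining MEs of frames encoded by worker A, from the example:
remaining = ["ME84", "ME42", "ME21", "ME01", "ME02", "ME04", "ME08"]
split = assign_remaining_mes(remaining, ["B", "C", "D"])
```

With seven remaining MEs and three idle workers, one worker receives three tasks and the other two receive two each, matching the "roughly the same number of MEs" criterion.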
[00058] These heuristics are general enough to be employed on various GOP structures and allow us to keep all dependencies satisfied, to constrain the overall coding time of the GOP to be equal, or at least close, to the critical path of the GOP tree, and to minimize the amount of transmitted data, i.e. MVs, reference frames and original frames, since transmission adds delay, while satisfying the two previous points. The critical path of the GOP tree is the same as the number of coding tasks running on working unit A.
[00059] In the following we consider two cases of GOPs. Case A is when the first frame/picture of the GOP is not already coded. This typically corresponds to the beginning of a video or when picture 0 is an I-frame. Case B is when picture 0 has already been coded. This is typically the case when picture 0 corresponds to picture 8 of the previous GOP, which has already been encoded.
[00060] Case A: frame 0 is an I-frame
Applying the heuristics specified in the previous section and deriving the required data in order to run the tasks, we obtain the following job allocation for the GOP of fig. 5 and 6:
Worker A will carry out coding of frames 0, 8, 4, 2, and 1. To accomplish this, A needs reconstructed reference frames 0, 2, 4, 8 and original pixels of frames 0, 1, 2, 4, 8.
Worker B will carry out coding of frames 6, 5, and producing of MEs 46, 86, 45, 65 and (84). To accomplish this, B needs reconstructed reference frames 4, 6, 8 and original pixels of frames 5, 6, (+4, 8).
Worker C will carry out coding of frame 3 and producing of MEs 23, 43, (+01, 21, 42). To accomplish this, C needs reconstructed reference frames 2, 4 (+0) and original pixels of frames 3, (+1, 2, 4). Worker D will carry out coding of frame 7 and producing of MEs 67, 87, (+02, 04, 08). To accomplish this, D needs reconstructed reference frames 6, 8 (+0) and original pixels of frames 7, (+8, 6, 4, 2, 0).
In bold are the reference frames that need to be sent to the worker, since they are not already stored locally at the worker. In parentheses/italics are tasks and required data that could be performed on a different machine, resulting in a different solution. For instance, it would be possible to swap these tasks and required data between workers C and D.
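As a sanity check, the allocation above can be written down as a table and verified to code each of the nine frames of the GOP exactly once (an illustrative sketch; the task identifiers are transcribed from the text, and the parenthesized tasks are placed per one possible solution):

```python
# One possible job allocation for case A, transcribed from the text.
ALLOCATION_A = {
    "A": {"coding": [0, 8, 4, 2, 1], "me": []},
    "B": {"coding": [6, 5], "me": ["46", "86", "45", "65", "84"]},
    "C": {"coding": [3], "me": ["23", "43", "01", "21", "42"]},
    "D": {"coding": [7], "me": ["67", "87", "02", "04", "08"]},
}

def coded_frames(allocation):
    """Collect every frame coded by any worker; for a valid allocation
    each frame of the 9-frame GOP appears exactly once."""
    frames = [f for jobs in allocation.values() for f in jobs["coding"]]
    return sorted(frames)
```

Worker A carries the five critical-path coding tasks, while the thirteen ME tasks are spread over the three remaining workers.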
[00061] Below is a time allocation diagram for the job allocation described in the former paragraph for case A. On the x-axis of the diagram is time. The time allocation starts at the left and proceeds to the right in the diagram. In the time allocation diagram it is estimated that ME+S is at least 1.66 times faster than Coding. This means that approximately five MEs can be performed during three Coding tasks on one worker:
[Figure imgf000023_0001: time allocation diagram for case A]
[00062] In the diagram, "S(i)" means that reconstructed frame associated data, i.e. either ME or C, has to be sent to the working units that need it. In this diagram S is there to signify that time is also taken for this task of sending the data. "S(i)" can probably be done in parallel to other work depending on the complexity. Further, MEij' means that the motion estimation ME between frames i and j can be done on the reconstructed reference frame i' instead of the original pixels i, since it is available at the time specified (sent by step Si). An alternative is to perform MEij, i.e. on original frame i instead of on the reconstructed reference frame i'. Further, the MEs needed for worker A (08, 04, 84, 02, 42, 01, 21) are in italics since there might be other partitionings where these jobs are done on different machines and at different times. The job assignment might also depend on the cost, i.e. time, of sending MVs or Cs (what is called "S" in the diagram) to the working unit that needs them, but we can make the assumption that it takes less time to send data than to perform ME. An alternative to the time allocation described in the diagram is to have ME84+S assigned to worker D instead of B, since B has 2 encoding tasks to perform and D already has everything needed. However, D already has some time-constrained tasks to do (ME08+S and ME04'+S), so it is doubtful whether it is better.
[00063] Case B: frame 0 is a P-frame, i.e. already encoded
In this case we obtain the following job allocation for the GOP of fig. 5 and 6:
Worker A will carry out coding of frames 8, 4, 2, 1 and producing of ME 08. To accomplish this, worker A needs reconstructed reference frames 0, 2, 4, 8 and original pixels of frames 1, 2, 4, 8.
Worker B will carry out coding of frames 6, 5 and producing of MEs 46,86,45,65, (+84). To accomplish this, worker B needs reconstructed reference frames 4, 6, 8 and original pixels of frames 5, 6, (+4, 8).
Worker C will carry out coding of frame 3 and producing of MEs 23, 43, (+01, 21, 42). To accomplish this, worker C needs reconstructed reference frames 2, 4 (+0) and original pixels of frames 3, (+1, 2, 4).
Worker D will carry out coding of frame 7 and producing of MEs 67, 87, (+02, 04). To accomplish this, worker D needs reconstructed reference frames 6, 8 (+0) and original pixels of frames 7, (+6, 4, 2). [00064] Below is a time allocation diagram for the job allocation described in the former paragraph for case B. On the x-axis of the diagram is time. The time allocation starts at the left and proceeds to the right in the diagram. In the time allocation diagram it is estimated that ME+S is at least 2 times faster than Coding.
[Figure imgf000025_0001: time allocation diagram for case B]
[00065] Comments to the table: job allocation depends on the ME/coding time ratio, but for current HEVC encoding, Coding is around 10x ME. If the encoded bit stream RF4 is sent to worker C, C has no access to frame 8 and thus cannot decode RF4. An alternative may then be to send RF4 + RF8, or even better to send delta-RF4, which only requires access to original frame 4, which worker C already has. Similarly for working unit D with RF8: in case B, if D has no access to frame 8, one option is to send frame 8 + delta-RF8 to D; an alternative is to send an encoded bit-stream representation of RF8, which requires RF0, but if D already has RF0, the latter seems the better option.
[00066] The following benefits may be achieved by case A and case B: Maximum parallelism gain may be: 5/9 for case A and 4/8 for case B; ME can be produced on the available reconstructed reference frame instead of its original pixels; very efficient partitioning since the amount of MEs done on reconstructed pixels is maximized while performing the ME tasks in parallel.
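The stated gains follow directly from the ratio of critical-path coding tasks to total coding tasks, as this small sketch illustrates (case A: 5 of 9 coding tasks on the critical path; case B: 4 of 8):

```python
from fractions import Fraction

def parallelism_gain(critical_path_len, total_coding_tasks):
    """Parallel coding time relative to serial coding time: the GOP
    coding time is bounded below by the critical path of its tree."""
    return Fraction(critical_path_len, total_coding_tasks)

case_a = parallelism_gain(5, 9)  # I-frame GOP: 9 frames, 5 on trunk
case_b = parallelism_gain(4, 8)  # frame 0 already coded: 8 frames, 4 on trunk
```

Note that `Fraction` reduces 4/8 to 1/2, i.e. case B can finish in half the serial coding time under this idealized model.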
[00067] Depending on where ME is performed, its output, motion vectors, MVs, may need to be sent to one or more workers. For instance, MVs of ME42, estimated on worker C, need to be sent to worker A. One efficient way to send MVs at a limited cost, i.e. a limited number of bits, is by encoding the MVs according to the HEVC standard, with a fixed known block structure, e.g. a regular NxM grid. The block structure is in H.264 called a macroblock structure. A macroblock is 16x16 pixels. In HEVC the block structure is called a coding unit, CU, structure (CU = 1 block). The size of a block in HEVC can vary within the picture depending on image features.
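With a fixed, known grid, both sender and receiver can derive the number of MVs without signaling any block structure. The following hypothetical helper sketches this (the frame and block dimensions are example values only):

```python
import math

def mv_grid_size(width, height, n=16, m=16):
    """Number of MVs to transmit for a fixed, regular NxM block grid.
    16x16 corresponds to an H.264 macroblock; HEVC CU sizes may vary
    within a picture, but a fixed grid keeps the MV layout implicit."""
    return math.ceil(width / n) * math.ceil(height / m)
```

For a 1920x1080 frame with a 16x16 grid this gives 120 x 68 = 8160 MVs; a coarser 64x64 grid would need only 510.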
[00068] Sending of reference frames from one worker to another may be performed by sending the reference frames as compressed bit streams (option 1) using a state-of-the-art video codec like HEVC. The compressed bit stream needs to be decoded by the receiving worker, and thus the receiving worker requires access to some extra data, including the reference frames of the RF, unless the RF is an I-frame. Another alternative (option 2) for sending reference frames from one worker to another is to send a lossless compressed difference between the reconstructed reference frame and the original frame. Here we assume that the receiving worker has access to the original frame in advance.
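Option 2 can be illustrated at the pixel level as follows (a minimal sketch on flat pixel lists, assuming the receiver already holds the original frame; a real implementation operates on full frames and compresses the delta losslessly before transmission):

```python
def make_delta(reconstructed, original):
    """Sender side of option 2: pixel-wise difference between the
    reconstructed reference frame and the original frame."""
    return [r - o for r, o in zip(reconstructed, original)]

def rebuild_reference(original, delta):
    """Receiver side: original pixels + delta = reconstructed
    reference frame, with no dependency on other reference frames."""
    return [o + d for o, d in zip(original, delta)]
```

Because the delta is taken against the original frame rather than against other reference frames, the receiver needs no further decoding context, which is what lets option 2 break the RF dependency chain.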
[00069] Another alternative for sending of reference frames, RF, in the example GOP for each of the four working units A, B, C and D will now be described for the above mentioned case A and case B.
Working unit A
In case A, all reference frames are available locally. In case B, the encoded RF0 is either already available in working unit A from the previous GOP encoding, since frame 0 is the previous frame 8, or is missing from one or more working units and has to be sent to working units A, C and/or D. One way to do this is to send a losslessly and independently encoded version of the reconstructed reference frame (option 1), as a compromise with sending the original pixels plus delta (option 2). Option 2 breaks the dependencies of the RF on its own RFs, and allows the system to change one or more physical working units or restart the coding of one entire GOP at a limited but substantial cost.
Working unit B
RF8: Option 2 since working unit B has the original frame 8. RF4: Option 2 since working unit B has the original frame 4.
Working unit C
RF0: For case A, option 1. For case B, see working unit A, case B. RF4: Option 2 since working unit C has the original frame 4. RF2: Option 2 since working unit C has the original frame 2.
Working unit D
RF0: Option 2 in case A since working unit D has the original frame 0. For case B, see working unit A, case B. RF8: Option 2 in case A since working unit D has the original frame 8, and option 1 in case B since working unit D has the encoded reference 0. RF6: Option 2 since working unit D has the original frame 6.
A case when it may be better to only use option 2 is, for instance, when the quantization parameter is low or the bit stream only contains I-frames.
[00070] The choice of the best transmission method for sharing data between the working units depends on several factors such as content type, bit-rate and latency requirements. In some cases it is cheaper, in terms of amount of data and/or time, to send the encoded difference between the reconstructed reference frame and the original frame than to send the encoded reference frame bits. A possible static approach may be to always select the first option when QP is higher than a threshold number, for example 15. An alternative method may be to select the transmission method adaptively. All B-pictures need at least two reference pictures. As shown in fig. 7, one of those reference frames, in fig. 7 called x, determined from frame 8, may have been coded by working unit A at least one frame before it is to be used, and the other frame, called y in fig. 7 and coded from frame 4, may have been coded by working unit A just in time for worker B to start coding the current B-picture using the two reference frames x and y. Selecting the best transmission method for x is less critical since its transmission time is allowed to be longer. Selecting the transmission method for y is more important since that is blocking working unit B from starting to code frame 6 using the reference frames x and y. The best transmission method for y is to decide adaptively, just in time for transmission, by selecting the method which results in the lowest number of bytes necessary for the transmission. Additionally, by always using the first option for x, the dependencies on earlier frames (frames 0 and 8) do not need to be transmitted again if that method is also selected for y. Which option is used can be coded as a flag in the header of the sub-bit stream sent to the receiver.
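The static and adaptive selection rules can be sketched as follows (hypothetical helpers; the option numbering follows paragraph [00068], and the QP threshold of 15 is the example value given above):

```python
def select_static(qp, qp_threshold=15):
    """Static rule: option 1 (encoded reference-frame bit stream) when
    QP exceeds the threshold, otherwise option 2 (encoded difference
    between reconstructed reference frame and original frame)."""
    return 1 if qp > qp_threshold else 2

def select_adaptive(option1_bytes, option2_bytes):
    """Adaptive just-in-time rule for a blocking reference such as y:
    pick whichever payload needs the fewest bytes."""
    return 1 if option1_bytes <= option2_bytes else 2
```

The adaptive rule matches the just-in-time decision for y, while the static rule suits the non-blocking reference x, whose transmission is not on the critical path.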
[00071 ] Fig. 8 describes an embodiment of a system 10 configured for encoding an input video stream into a compressed output video stream. The system comprises a first system node 12 and a second system node 14. The input video stream comprises a plurality of data frames. The system 10 comprises at least one processor 603, 623 and at least one memory 604, 624. The at least one memory contains instructions executable by said at least one processor, whereby the system 10 is operative for receiving, at the first system node 12, a first data frame of the input video stream, receiving, at the second system node 14, a second data frame of the input video stream and producing, at the first system node, dependency data representing a dependency between the first data frame and the second data frame, based on the first data frame. The system is further operative for receiving, at the second system node from the first system node, information on the produced dependency data and producing, at the second system node, a compressed representation of the second data frame based on the second data frame and the received information on the dependency data. The system is further operative for producing, at the first or the second system node, a compressed representation of the first data frame based at least on the first data frame, and combining the compressed representation of the first data frame and the compressed representation of the second data frame into the compressed output data stream. [00072] According to an embodiment, which is the embodiment described in fig. 8, the system 10 is realized by the first system node 12 comprising a first processor 603 and a first memory 604, and the second system node 14
comprising a second processor 623 and a second memory 624. In this
embodiment, the first memory 604, i.e. the memory of the first system node, contains instructions executable by the first processor 603, whereby the first system node 12 is operative for receiving a first data frame of the input video stream, producing dependency data representing a dependency between the first data frame and the second data frame, based on the first data frame, and sending, to the second system node, information on the produced dependency data, and optionally producing a compressed representation of the first data frame based at least on the first data frame. Further, the second memory 624, i.e. the memory of the second system node, contains instructions executable by the second processor 623, whereby the second system node 14 is operative for receiving a second data frame of the input video stream, receiving, from the first system node, the information on the produced dependency data, producing a compressed representation of the second data frame based on the second data frame and the received information on the dependency data, and producing a compressed representation of the first data frame based at least on the first data frame, when the compressed representation of the first data frame was not produced by the first system node. Further, the combining of the compressed representation of the first data frame and the compressed representation of the second data frame into the compressed output data stream may be performed in any of the first system node or the second system node or in another node of the system.
[00073] According to an embodiment, the produced dependency data is a motion estimate of the difference between the first frame and the second frame, and wherein the system is further operative for receiving, at the first system node, the second frame, and wherein the system is operative for producing the motion estimate based on the first frame and the second frame. [00074] According to another embodiment, the information on the dependency data that the system is operative to send is a compressed representation of the dependency data.
[00075] According to another embodiment, the compressed representation of the dependency data is compressed relative to information already present on the second system node.
[00076] According to another embodiment, the produced dependency data is a reconstructed compressed representation of the first frame and the system is further operative for receiving, at the second system node, the first data frame. Further, the information on the dependency data represents a pixel-wise difference between the first frame and the reconstructed compressed representation of the first frame.
[00077] According to another embodiment, the input video stream is a
compressed video stream, divided into a first compressed sub-bit stream
comprising the first frame and a second compressed sub-bit stream comprising the second frame. The system is further operative for, before the first frame and the second frame is received at the first and the second system node,
respectively: decompressing, at the first system node, the first compressed sub-bit stream into a first decompressed sub-bit stream comprising the first frame, and decompressing, at the second system node, the second compressed sub-bit stream into a second decompressed sub-bit stream comprising the second frame.
[00078] According to another embodiment, the system comprises a plurality of system nodes including the first system node 12 and the second system node 14. The system is further operative for selecting the first system node for processing the first data frame based on one or more of the following conditions: presence of a reference frame for the first data frame at each of the plurality of system nodes; presence of the first data frame at each of the plurality of system nodes, processor load of each of the plurality of system nodes. [00079] According to another embodiment, the system comprises a plurality of system nodes including the first system node 12 and the second system node 14. Further, the input video stream comprises a first GOP comprising a plurality of data frames including the first data frame and the second data frame, the first GOP having a GOP structure. Further, the system is operative for determining a number of the plurality of system nodes to use for the encoding, based on a degree of parallelism of the GOP structure, the number of system nodes comprising the first system node, assigning tasks of producing compressed representations of the plurality of data frames of the first GOP to the different determined number of system nodes based on the degree of parallelism of the GOP and so that the first system node is assigned the trunk of the GOP structure; and assigning motion estimation tasks to the determined number of system nodes except for to the first system node.
[00080] Another embodiment of a system 10 configured for encoding an input video stream into a compressed output video stream is shown in fig. 9. The system comprises a first system node 12 and a second system node 14, the input video stream comprises a plurality of data frames. The system 10 further comprises a first receiving module 702 for receiving, at the first system node 12, a first data frame of the input video stream, a second receiving module 704 for receiving, at the second system node 14, a second data frame of the input video stream, and a first producing module 706 for producing, at the first system node, dependency data representing a dependency between the first data frame and the second data frame, based on the first data frame. The system further comprises a third receiving module 708 for receiving, at the second system node from the first system node, information on the produced dependency data, and a second producing module 710 for producing, at the second system node, a compressed representation of the second data frame based on the second data frame and the received information on the dependency data. The system further comprises a third producing module 712 for producing, at the first or the second system node, a compressed representation of the first data frame based at least on the first data frame, and a combining module 714 for combining the compressed representation of the first data frame and the compressed representation of the second data frame into the compressed output data stream.
[00081] According to another embodiment, a first system node 12 is provided, of a system 10 configured for encoding an input video stream into a compressed output video stream, the system comprising apart from the first system node 12 also a second system node 14. The input video stream comprises a plurality of data frames. The first system node 12 comprises a processor 603 and a memory 604, said memory containing instructions executable by said processor, whereby the first system node 12 is operative for receiving a first data frame of the input video stream, producing dependency data representing a dependency between the first data frame and a second data frame of the input video stream, based on the first data frame, and sending to the second system node, information on the produced dependency data; which information the second system node uses to produce a compressed representation of the second data frame. The first system node is optionally also operative for producing, at the first system node, a compressed representation of the first data frame based at least on the first data frame. The first system node is optionally also operative for combining the compressed representation of the first data frame and the compressed
representation of the second data frame into the compressed output data stream.
[00082] According to another embodiment, a second system node 14 is provided, of a system 10 configured for encoding an input video stream into a compressed output video stream, the system comprising apart from the second system node 14 also a first system node 12. The input video stream comprises a plurality of data frames. The second system node 14 comprises a processor 623 and a memory 624, said memory containing instructions executable by said processor, whereby the second system node 14 is operative for receiving a second data frame of the input video stream, receiving, from the first system node, information on dependency data representing a dependency between a first data frame of the input video stream and the second data frame, the dependency data being produced at the first system node, and producing a compressed representation of the second data frame based on the second data frame and the received information on the dependency data. The second system node is further optionally operative for producing a compressed representation of the first data frame based at least on the first data frame. The second system node is further optionally operative for combining the compressed representation of the first data frame and the compressed representation of the second data frame into the compressed output data stream.
[00083] The first system node 12 may further comprise a communication unit 602, which may be considered to comprise conventional means for communicating from and/or to other nodes in the system 10, such as the second node and an optional input unit 11 and output unit 16, see fig. 1. The communication unit 602 may comprise one or more communication ports for communicating with the other nodes in the system. The instructions executable by said processor 603 may be arranged as a computer program 605 stored in said memory 604. The processor 603 and the memory 604 may be arranged in a sub-arrangement 601. The sub-arrangement 601 may be a micro-processor and adequate software and storage therefor, a Programmable Logic Device, PLD, or other electronic
component(s)/processing circuit(s) configured to perform the actions, or methods mentioned above.
[00084] The second system node 14 may further comprise a communication unit 622, which may be considered to comprise conventional means for communicating from and/or to other nodes in the system 10, such as the first node and the optional input unit 11 and output unit 16. The communication unit 622 of the second system node may comprise one or more communication ports for communicating with the other nodes in the system. The instructions executable by the processor 623 of the second system node may be arranged as a computer program 625 stored in the memory 624 of the second system node. The processor 623 and the memory 624 of the second system node may be arranged in a sub-arrangement 621, which may be a micro-processor and adequate software and storage therefor, a Programmable Logic Device, PLD, or other electronic component(s)/processing circuit(s) configured to perform the actions, or methods mentioned above.
[00085] The computer programs 605 and 625 may respectively comprise computer readable code means, which when run in the first/second node cause the system to perform the steps described in any of the described embodiments. The computer program 605; 625 may be carried by a computer program product connectable to the processor 603; 623. The computer program product may be the memory 604; 624. The memory 604; 624 may be realized as for example a RAM (Random Access Memory), ROM (Read-Only Memory) or an EEPROM (Electrically Erasable Programmable ROM). Further, the computer program may be carried by a separate computer-readable medium, such as a CD, DVD or flash memory, from which the program could be downloaded into the memory 604; 624. Alternatively, the computer program may be stored on a server or any other entity connected to the communication network to which the system has access via the
communication unit 602; 622 of the respective first and second node. The computer program may then be downloaded from the server into the memory 604; 624.
[00086] Although the description above contains a plurality of specificities, these should not be construed as limiting the scope of the concept described herein but as merely providing illustrations of some exemplifying embodiments of the described concept. It will be appreciated that the scope of the presently described concept fully encompasses other embodiments which may become obvious to those skilled in the art, and that the scope of the presently described concept is accordingly not to be limited. Reference to an element in the singular is not intended to mean "one and only one" unless explicitly so stated, but rather "one or more." All structural and functional equivalents to the elements of the above- described embodiments that are known to those of ordinary skill in the art are expressly incorporated herein by reference and are intended to be encompassed hereby. Moreover, it is not necessary for an apparatus or method to address each and every problem sought to be solved by the presently described concept, for it to be encompassed hereby.

Claims

1. A method performed by a system (10), for encoding an input video stream into a compressed output video stream, the input video stream comprising a plurality of data frames, the system comprising a first system node (12) and a second system node (14), the method comprising:
receiving (106), at the first node (12), a first data frame of the input video stream;
receiving (108) at the second node (14), a second data frame of the input video stream;
producing (110), at the first node, dependency data representing a dependency between the first data frame and the second data frame, based on the first data frame;
sending (112), by the first node to the second node, information on the produced dependency data;
producing (114), at the second node, a compressed representation of the second data frame based on the second data frame and the received information on the dependency data;
producing (116) a compressed representation of the first data frame based at least on the first data frame, and
combining (118) the compressed representation of the first data frame and the compressed representation of the second data frame into the compressed output data stream.
2. Method according to claim 1, wherein the produced (110) dependency data is a reconstructed compressed representation of the first frame.
3. Method according to claim 1, wherein the produced (110) dependency data is a motion estimate of the difference between the first frame and the second frame, and wherein the method further comprises receiving (107), at the first system node, the second frame, and wherein the motion estimate is produced (110) based on the first frame and the second frame.
4. Method according to any of the preceding claims, wherein the sent (112) information on the dependency data is the actual dependency data.
5. Method according to any of claims 1-3, wherein the sent (112) information on the dependency data is a compressed representation of the dependency data.
6. Method according to claim 5, wherein the sent (112) information on the dependency data is compressed relative to information already present on the second system node.
7. Method according to claim 2, further comprising receiving (109), at the second system node, the first data frame, and where the sent (112) information on the dependency data represents a pixel-wise difference between the first frame and the reconstructed compressed representation of the first frame.
8. Method according to any of the preceding claims, where the input video stream is a compressed video stream, divided into a first compressed sub-bit stream comprising the first frame and a second compressed sub-bit stream comprising the second frame, wherein the method comprises, before receiving (106, 108) the first frame and the second frame at the first and the second system node, respectively:
decompressing (102), by the first system node, the first compressed sub-bit stream into a first decompressed sub-bit stream comprising the first frame, and
decompressing (104), by the second system node, the second compressed sub-bit stream into a second decompressed sub-bit stream
comprising the second frame.
9. Method according to claim 8, wherein the method further comprises sending, by the first system node, information of the decompressed (102) first sub- bit stream to the second system node.
10. Method according to any of the preceding claims, wherein the system comprises a plurality of system nodes including the first system node (12) and the second system node (14), and the method further comprises selecting the first system node for processing the first data frame based on one or more of the following conditions: presence of a reference frame for the first data frame at each of the plurality of system nodes; presence of the first data frame at each of the plurality of system nodes; processor load of each of the plurality of system nodes.
11. Method according to any of claims 3-10, wherein the system comprises a plurality of system nodes including the first system node (12) and the second system node (14), and the input video stream comprises a first GOP comprising a plurality of data frames including the first data frame and the second data frame, the first GOP having a GOP structure, and the method further comprises:
determining a number of the plurality of system nodes to use for the encoding, based on a degree of parallelism of the GOP structure, the number of system nodes comprising the first system node ;
assigning tasks of producing compressed representations of the plurality of data frames of the first GOP to the different determined number of system nodes based on the degree of parallelism of the GOP and so that the first system node is assigned the trunk of the GOP structure; and
assigning motion estimation tasks to the determined number of system nodes except for to the first system node.
12. A system (10) configured for encoding an input video stream into a compressed output video stream, the system comprising a first system node (12) and a second system node (14), the input video stream comprising a plurality of data frames, the system (10) comprising at least one processor (603, 623) and at least one memory (604, 624), said at least one memory containing instructions executable by said at least one processor, whereby the system (10) is operative for
receiving, at the first system node (12), a first data frame of the input video stream; receiving, at the second system node (14), a second data frame of the input video stream;
producing, at the first system node, dependency data representing a dependency between the first data frame and the second data frame, based on the first data frame;
receiving, at the second system node from the first system node, information on the produced dependency data;
producing, at the second system node, a compressed representation of the second data frame based on the second data frame and the received information on the dependency data;
producing, at the first or the second system node, a compressed representation of the first data frame based at least on the first data frame, and combining the compressed representation of the first data frame and the compressed representation of the second data frame into the compressed output data stream.
13. System according to claim 12, wherein the produced dependency data is a motion estimate of the difference between the first frame and the second frame, and wherein the system is further operative for receiving, at the first system node, the second frame, and wherein the system is operative for producing the motion estimate based on the first frame and the second frame.
14. System according to claim 12 or 13, wherein the information on the dependency data that the system is operative to send is a compressed
representation of the dependency data.
15. System according to claim 14, wherein the compressed representation of the dependency data is compressed relative to information already present on the second system node.
16. System according to any of claims 12-15, wherein the produced dependency data is a reconstructed compressed representation of the first frame and wherein the system is further operative for receiving, at the second system node, the first data frame, and wherein the information on the dependency data represents a pixel-wise difference between the first frame and the reconstructed compressed representation of the first frame.
17. System according to any of claims 12-16, wherein the input video stream is a compressed video stream, divided into a first compressed sub-bit stream comprising the first frame and a second compressed sub-bit stream comprising the second frame, wherein the system is further operative for, before the first frame and the second frame are received at the first and the second system node, respectively: decompressing, at the first system node, the first compressed sub-bit stream into a first decompressed sub-bit stream comprising the first frame, and decompressing, at the second system node, the second compressed sub-bit stream into a second decompressed sub-bit stream comprising the second frame.
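Claims 15 and 16 describe sending dependency information that is compressed relative to data the second node already holds, e.g. a pixel-wise difference between the first frame and its reconstructed compressed representation. A toy sketch, not from the patent (frames as nested lists of integers, function names `residual_info` and `rebuild_reference` are assumed for illustration), shows why the difference alone suffices when the second node already has the first frame:

```python
def residual_info(first_frame, reconstructed_first_frame):
    # Pixel-wise difference between the original first frame and the
    # reconstructed compressed representation -- the "information on the
    # dependency data" in the sense of claim 16.
    return [[a - b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(first_frame, reconstructed_first_frame)]

def rebuild_reference(first_frame, residual):
    # The second node already holds the original first frame, so receiving
    # only the residual lets it recover the reconstructed reference frame:
    # reconstructed = original - (original - reconstructed).
    return [[a - r for a, r in zip(row_a, row_r)]
            for row_a, row_r in zip(first_frame, residual)]
```

Because the residual is typically small and sparse, transmitting it costs far less than transmitting the reconstructed frame itself, which is the point of compressing "relative to information already present on the second system node" (claim 15).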
18. System according to any of claims 12-17, comprising a plurality of system nodes including the first system node (12) and the second system node (14), and the system further being operative for selecting the first system node for processing the first data frame based on one or more of the following conditions: presence of a reference frame for the first data frame at each of the plurality of system nodes; presence of the first data frame at each of the plurality of system nodes; and processor load of each of the plurality of system nodes.
19. System according to any of claims 13-18, comprising a plurality of system nodes including the first system node (12) and the second system node (14), and the input video stream comprises a first GOP comprising a plurality of data frames including the first data frame and the second data frame, the first GOP having a GOP structure, and the system further being operative for:
determining a number of the plurality of system nodes to use for the encoding, based on a degree of parallelism of the GOP structure, the number of system nodes comprising the first system node;
assigning tasks of producing compressed representations of the plurality of data frames of the first GOP to the determined number of system nodes based on the degree of parallelism of the GOP structure and so that the first system node is assigned the trunk of the GOP structure; and
assigning motion estimation tasks to the determined number of system nodes except for the first system node.
20. A system (10) configured for encoding an input video stream into a compressed output video stream, the system comprising a first system node (12) and a second system node (14), the input video stream comprising a plurality of data frames, the system (10) comprising:
a first receiving module (702) for receiving, at the first system node (12), a first data frame of the input video stream;
a second receiving module (704) for receiving, at the second system node (14), a second data frame of the input video stream;
a first producing module (706) for producing, at the first system node, dependency data representing a dependency between the first data frame and the second data frame, based on the first data frame;
a third receiving module (708) for receiving, at the second system node from the first system node, information on the produced dependency data;
a second producing module (710) for producing, at the second system node, a compressed representation of the second data frame based on the second data frame and the received information on the dependency data;
a third producing module (712) for producing, at the first or the second system node, a compressed representation of the first data frame based at least on the first data frame, and
a combining module (714) for combining the compressed representation of the first data frame and the compressed representation of the second data frame into the compressed output data stream.
21. A first system node (12) of a system (10) configured for encoding an input video stream into a compressed output video stream, the system comprising apart from the first system node (12) also a second system node (14), the input video stream comprising a plurality of data frames, the first system node (12) comprising a processor (603) and a memory (604), said memory containing instructions executable by said processor, whereby the first system node (12) is operative for:
receiving a first data frame of the input video stream;
producing dependency data representing a dependency between the first data frame and a second data frame of the input video stream, based on the first data frame;
sending, to the second system node, information on the produced dependency data, which information the second system node uses to produce a compressed representation of the second data frame;
optionally producing, a compressed representation of the first data frame based at least on the first data frame, and
optionally combining the compressed representation of the first data frame and the compressed representation of the second data frame into the compressed output data stream.
22. A second system node (14) of a system (10) configured for encoding an input video stream into a compressed output video stream, the system comprising apart from the second system node (14) also a first system node (12), the input video stream comprising a plurality of data frames, the second system node (14) comprising a processor (623) and a memory (624), said memory containing instructions executable by said processor, whereby the second system node (14) is operative for:
receiving a second data frame of the input video stream; receiving, from the first system node, information on dependency data representing a dependency between a first data frame of the input video stream and the second data frame, the dependency data being produced at the first system node,
producing a compressed representation of the second data frame based on the second data frame and the received information on the dependency data; optionally producing a compressed representation of the first data frame based at least on the first data frame, and
optionally combining the compressed representation of the first data frame and the compressed representation of the second data frame into the compressed output data stream.
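The division of labour in claims 21 and 22 can be sketched end to end. This is an illustrative toy, not the patent's method: the "motion estimate" here is just a global mean-brightness shift between two frames represented as flat pixel lists, and the names `first_node`, `second_node`, and `combine` are assumptions. It shows the claimed flow: the first node produces dependency data from the first frame, the second node uses the received information to compress the second frame, and the two compressed representations are combined into one output stream.

```python
def first_node(first_frame, second_frame):
    # Dependency data: a toy stand-in for a motion estimate -- the mean
    # brightness shift from the first frame to the second frame.
    shift = (sum(second_frame) - sum(first_frame)) / len(first_frame)
    # The first node also produces the compressed representation of the
    # first frame (here tagged as an intra-coded frame).
    compressed_first = ("I", first_frame)
    return shift, compressed_first

def second_node(second_frame, first_frame_ref, shift):
    # Predict the second frame from the reference plus the received
    # dependency information, and encode only the prediction error.
    prediction = [p + shift for p in first_frame_ref]
    residual = [s - p for s, p in zip(second_frame, prediction)]
    return ("P", shift, residual)

def combine(compressed_first, compressed_second):
    # Combine both compressed representations into the output stream.
    return [compressed_first, compressed_second]
```

In this toy, a uniform brightness increase is captured entirely by the dependency data, so the second node's residual is all zeros; real motion estimation would operate block-wise on pixel data, but the message flow between the two nodes is the same.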
23. A computer program (605, 625) comprising computer readable code means to be run in a system (10) configured for encoding an input video stream into a compressed output video stream, the input video stream comprising a plurality of data frames, the system comprising a first system node (12) and a second system node (14), which computer readable code means when run in the system causes the system (10) to perform the following steps:
receiving, at the first system node (12), a first data frame of the input video stream;
receiving, at the second system node (14), a second data frame of the input video stream;
producing, at the first system node, dependency data representing a dependency between the first data frame and the second data frame, based on the first data frame;
receiving, at the second system node from the first system node, information on the produced dependency data;
producing, at the second system node, a compressed representation of the second data frame based on the second data frame and the received information on the dependency data;
producing, at the first or the second system node, a compressed representation of the first data frame based at least on the first data frame, and combining the compressed representation of the first data frame and the compressed representation of the second data frame into the compressed output data stream.
24. A carrier containing the computer program (605, 625) according to claim 23, wherein the carrier is one of an electronic signal, optical signal, radio signal or computer readable storage medium.
PCT/SE2015/050746 2015-06-25 2015-06-25 Method and system for encoding an input video stream into a compressed output video stream with parallel encoding WO2016209132A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/SE2015/050746 WO2016209132A1 (en) 2015-06-25 2015-06-25 Method and system for encoding an input video stream into a compressed output video stream with parallel encoding


Publications (1)

Publication Number Publication Date
WO2016209132A1 true WO2016209132A1 (en) 2016-12-29

Family

ID=53524936

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SE2015/050746 WO2016209132A1 (en) 2015-06-25 2015-06-25 Method and system for encoding an input video stream into a compressed output video stream with parallel encoding

Country Status (1)

Country Link
WO (1) WO2016209132A1 (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070074266A1 (en) * 2005-09-27 2007-03-29 Raveendran Vijayalakshmi R Methods and device for data alignment with time domain boundary
US20080152014A1 (en) * 2006-12-21 2008-06-26 On Demand Microelectronics Method and apparatus for encoding and decoding of video streams
US20080159408A1 (en) * 2006-12-27 2008-07-03 Degtyarenko Nikolay Nikolaevic Methods and apparatus to decode and encode video information


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KIM M ET AL: "On parallel decoding with SEI message containing Reference Dependency Tree", 11. JCT-VC MEETING; 102. MPEG MEETING; 10-10-2012 - 19-10-2012; SHANGHAI; (JOINT COLLABORATIVE TEAM ON VIDEO CODING OF ISO/IEC JTC1/SC29/WG11 AND ITU-T SG.16); URL: HTTP://WFTP3.ITU.INT/AV-ARCH/JCTVC-SITE/, no. JCTVC-K0108, 28 September 2012 (2012-09-28), XP030112990 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3396959A1 (en) * 2017-04-24 2018-10-31 INTEL Corporation Intelligent video frame grouping based on predicted performance
US10979728B2 (en) 2017-04-24 2021-04-13 Intel Corporation Intelligent video frame grouping based on predicted performance


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 15734744; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
Ref country code: DE
122 Ep: pct application non-entry in european phase
Ref document number: 15734744; Country of ref document: EP; Kind code of ref document: A1