WO2008068623A2 - Adaptive interpolation method and system for motion compensated predictive video coding and decoding - Google Patents


Info

Publication number
WO2008068623A2
WO2008068623A2 (PCT/IB2007/004305)
Authority
WO
WIPO (PCT)
Prior art keywords
filter
samples
pixel
filters
horizontal
Prior art date
Application number
PCT/IB2007/004305
Other languages
French (fr)
Other versions
WO2008068623A3 (en)
Inventor
Ronggang Wang
Zhen-Nadine Ren
Original Assignee
France Telecom
Priority date
Filing date
Publication date
Application filed by France Telecom
Priority to EP07859334A (EP2092752A2)
Priority to CN200780050842.6 (CN101632306B)
Publication of WO2008068623A2
Publication of WO2008068623A3

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/577 Motion compensation with bidirectional frame interpolation, i.e. using B-pictures
    • H04N19/117 Filters, e.g. for pre-processing or post-processing
    • H04N19/523 Motion estimation or motion compensation with sub-pixel accuracy
    • H04N19/53 Multi-resolution motion estimation; Hierarchical motion estimation
    • H04N19/533 Motion estimation using multistep search, e.g. 2D-log search or one-at-a-time search [OTS]
    • H04N19/82 Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation involving filtering within a prediction loop

Definitions

  • Quantization coefficients of the filter are searched using the heuristic search method.
  • Float coefficients of the filter set are derived using the Least Square Estimation method. Therefore, the filter set obtained using the present invention is a global optimum interpolation filter set.
  • In step 415, F2 is optimized by Least Square Estimation under the constraints of S2 and the optimized F1 obtained in step 405.
  • The optimizing procedure of F2 is similar to that of F1 as described in step 405, so it is omitted here.
  • The sample at a half sample position labeled "b" is derived by calculating an intermediate value b1, obtained by applying the fixed 6-tap filter to the nearest integer-position samples E, F, G, H, I and J in the horizontal direction.
  • the samples at quarter sample positions labeled as a, c, d, n, f, i, k, and q shall be derived by averaging with upward rounding of the two nearest samples at integer and half sample positions.
  • the samples at quarter sample positions labeled as e, g, p, and r shall be derived by averaging with upward rounding of the two nearest samples at half sample positions in the diagonal direction.
  • the filtering pattern is also determined in step 305 of Fig. 3.
  • An asymmetrical 6-tap filter F1 (x0, x1, x2, x3, x4, x5) is used to interpolate samples like "b" and "h".
  • the filtering operation is the same as that of H.264/AVC.
  • F2 filters the up-left 3x3 integer samples of sample "j" (A0, A1, A, C0, C1, C, E, F and G4) and gets a middle result G1.
  • F2 filters the up-right 3x3 integer samples of sample "j" (B0, B1, B, D0, D1, D, J, I and H) and gets a middle result H1.
  • F2 further filters the down-left 3x3 integer samples of sample "j" (T0, T1, T, R0, R1, R, K, L and M) and gets a middle result M1, and filters the down-right 3x3 integer samples of sample "j" (U0, U1, U, S0, S1, S, Q, P and N) and gets a middle result N1. Then, the interpolated sample "j" is computed by averaging G1, H1, M1 and N1.
  • The filter F2 can be optimized by the Least Square Estimation method under the constraints of F1 and the samples at sub-pixel positions of "j", "f", "i", "k" and "q". F1 and F2 are optimized iteratively until the coefficients of F1 and F2 are both convergent.
  • Luma samples 'A' to 'U' at full-sample locations are derived by the following rules.
  • The Luma prediction values at horizontal or vertical half sample positions S1 (e.g., the sample position at b) shall be derived by applying filter F1 with tap values (x0, x1, x2, x3, x4, x5).
  • The Luma prediction values at horizontal and vertical half sample positions S2 (e.g., the sample position at h) shall be derived by applying filter F2 with tap values (y0, y1, y2, y3, y4, y5, y6, y7, y8) and an averaging filter.
  • b1 = (x0 * E + x1 * F + x2 * G + x3 * H + x4 * I + x5 * J)
  • h1 = (x0 * A + x1 * C + x2 * G + x3 * M + x4 * R + x5 * T)
  • E, F, G, H, I and J represent the six full samples in the horizontal direction, and A, C, G, M, R and T represent the six full samples in the vertical direction. Since the same filter is applied to both half samples b and h, each tap is applied to the corresponding full sample in each direction.
  • The sample at the horizontal and vertical half sample position labeled "j" is derived by applying F2 with tap values (y0, y1, y2, y3, y4, y5, y6, y7, y8) respectively to the 3x3 integer samples at each corner of "j".
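The fixed-filter rules above can be sketched numerically. The following is a minimal illustration assuming 8-bit samples and the standard H.264/AVC tap values (1, -5, 20, 20, -5, 1), which the adaptive filter F1 of this document would replace:

```python
def half_sample(E, F, G, H, I, J):
    # Fixed 6-tap filter with taps (1, -5, 20, 20, -5, 1): intermediate
    # value b1, then rounding and clipping to the 8-bit sample range.
    b1 = E - 5 * F + 20 * G + 20 * H - 5 * I + J
    return min(max((b1 + 16) >> 5, 0), 255)

def quarter_sample(p, q):
    # Quarter samples: average of the two nearest samples, upward rounding.
    return (p + q + 1) >> 1

# In a flat area all samples are equal, so interpolation preserves the value.
assert half_sample(80, 80, 80, 80, 80, 80) == 80
assert quarter_sample(80, 81) == 81
```

The upward rounding in `quarter_sample` matches the "averaging with upward rounding" rule stated for the quarter sample positions.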

Abstract

Disclosed is an adaptive interpolation method and system for a motion compensated predictive video codec, and a corresponding decoding method and system. The interpolation method comprises providing a set of filters including F1 and F2 for a current frame; interpolating a reference frame according to the filters; calculating motion vectors to generate a prediction frame; constructing and adaptively training F1 for a first part of sub-pixel positions; constructing and adaptively training F2 for a second part of sub-pixel positions under the constraint of F1; re-training F1 under the constraint of F2; and updating the filters by the trained filters F1 and F2 to further optimize the filters. In the invention, it is possible to minimize the difference between the current frame and its prediction frame by a one-pass fast algorithm, making the method feasible for real-time coding applications.

Description

ADAPTIVE INTERPOLATION METHOD AND SYSTEM FOR MOTION COMPENSATED PREDICTIVE VIDEO CODING AND DECODING
FIELD OF THE INVENTION
[0001] The invention relates to video coding and decoding technology, and particularly to an adaptive interpolation method and system for improving motion compensated predictive video coding and decoding.
TECHNICAL BACKGROUND OF THE INVENTION
[0002] A typical video encoding system is based on the motion compensated prediction technique with motion vectors of a fractional pixel resolution. For example, in the video coding standard MPEG-2, the motion vectors can be in half-pixel resolution (or precision). In the MPEG-4 video coding standard, the resolution of the motion vectors can be higher, i.e., 1/4-pixel resolution. Another technique known as Advanced Video Coding (AVC) allows 1/8-pixel resolution for the motion vectors.
[0003] Recently, a new technique known as adaptive interpolation filtering (AIF) has been developed to interpolate the various pixel resolutions of motion compensated prediction. AIF takes into account the alteration of image signal properties, especially aliasing, for the purpose of minimizing the prediction error energy. AIF was introduced in the paper "Motion- and Aliasing-Compensated Prediction for Hybrid Video Coding", IEEE Transactions on Circuits and Systems for Video Technology, Vol. 13, Issue 7, July 2003, pp. 577-586. Basically, AIF depends on filter coefficients that are adapted once per frame to non-stationary statistical properties (e.g. aliasing, motion) of video signals. The adapted coefficients are coded and transmitted as part of the frame. However, the authors of the paper employ a Downhill Simplex search method that finds local minimum filters, instead of global minimum filters, and incurs a heavy computing load.
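To illustrate the idea of adapting filter coefficients by minimizing prediction error energy, here is a toy sketch (not the cited paper's algorithm): a one-dimensional least-squares fit that recovers a known 6-tap filter from reference samples and their filtered targets. The signal and the "unknown" taps are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
ref = rng.standard_normal(200)                 # toy 1-D reference signal

# Hypothetical "unknown" 6-tap interpolation filter to be recovered.
true_taps = np.array([1.0, -5.0, 20.0, 20.0, -5.0, 1.0]) / 32.0

# Each row holds the six reference samples feeding one interpolated sample.
rows = np.array([ref[k:k + 6] for k in range(len(ref) - 6)])
target = rows @ true_taps                      # observed sub-pel samples

# Least-squares adaptation: choose taps h minimizing the squared error.
h, *_ = np.linalg.lstsq(rows, target, rcond=None)
assert np.allclose(h, true_taps, atol=1e-6)
```

In a real codec the target would be the current frame's pixels at motion-compensated positions, so the fitted taps compensate for aliasing and motion rather than reproduce a fixed filter.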
[0004] US Patent Application No. 2004/0076333, entitled "Adaptive Interpolation Filter System for Motion Compensated Predictive Video Coding", also disclosed an adaptive interpolation filter for minimizing the prediction error in a video codec. The adaptive interpolation filter utilizes a heuristic search technique to increase the efficiency of coding. The main disadvantage of the heuristic search method is that it fails to converge to "optimum" or "near optimum" solutions unless it begins from a "good" initial starting point. The global minimum filter may never be found if the initial starting point is poorly chosen. One way to counteract this problem is to execute multiple search passes. However, multiple search passes definitely increase the computing load, which is not suitable for real-time coding applications.
[0005] In the 30th meeting of ITU-T (ITU Telecommunication Standardization Sector), a document VCEG-AD08 entitled "Prediction of P- and B-Frames Using a Two-dimensional Non-separable Adaptive Wiener Interpolation Filter for H.264/AVC" disclosed a two-dimensional (2D) non-separable interpolation filtering technique, which is composed of five groups of filters calculated independently for each frame by minimizing prediction errors. The problem of this method is that there is no relationship among the five groups of filters, so a large number of bits is required to transmit the five groups of filter coefficients for each frame. Moreover, the 2D non-separable interpolation filtering technique imposes a heavy computing complexity in both the filter training and the interpolation operation.
SUMMARY OF THE INVENTION
[0006] In order to overcome the shortcomings of the prior art, the present invention provides an adaptive interpolation method and system for a motion compensated predictive video codec, which is capable of minimizing the difference between the raw picture and the predicted picture. Moreover, the present invention also provides a decoding method and system corresponding to the interpolation method and system. According to the invention, the training procedure to find an optimal interpolation filter can be performed by a one-pass fast algorithm, making it feasible for real-time coding applications.
[0007] According to an aspect, the present invention provides an adaptive interpolation method for a motion compensated predictive video codec, which comprises: providing a set of filters for a current frame; interpolating a reference frame having a certain precision according to the set of filters; calculating motion vectors to generate a prediction frame of the current frame in view of the interpolated reference frame; constructing a first interpolation filter F1 of the set of filters in view of a first part of all the sub-pixel positions according to a fixed linear relationship among samples of the first part; training the first filter F1 by performing Least Square Estimation on the sub-pixel positions of the first part; constructing a second filter F2 in view of a second part of all the sub-pixel positions according to a fixed linear relationship among samples of the second part; training the second filter F2 of the set of filters by performing Least Square Estimation on the sub-pixel positions of the second part under the constraint of F1; re-training the first filter F1 on the sub-pixel positions of the first part under the constraint of the second filter F2; and updating the set of filters by the trained filters F1 and F2 to further optimize the set of filters by iteratively performing the above steps from the step for interpolating to the step for updating until a stopping condition is satisfied.
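The alternating training of F1 and F2 described above can be sketched as block-coordinate least squares. In the sketch below, the matrices A and B are illustrative stand-ins for the interpolation terms contributed by the first and second parts of sub-pixel positions, not actual frame data:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((400, 6))   # stand-in: terms interpolated by F1
B = rng.standard_normal((400, 9))   # stand-in: terms interpolated by F2
y = A @ rng.standard_normal(6) + B @ rng.standard_normal(9)  # "current frame"

f1, f2 = np.zeros(6), np.zeros(9)
for _ in range(100):
    prev = np.concatenate([f1, f2])
    # train F1 by LSE with F2 held fixed, then F2 under the constraint of F1
    f1, *_ = np.linalg.lstsq(A, y - B @ f2, rcond=None)
    f2, *_ = np.linalg.lstsq(B, y - A @ f1, rcond=None)
    if np.linalg.norm(np.concatenate([f1, f2]) - prev) < 1e-12:
        break                       # stopping condition: coefficients converged

assert np.linalg.norm(y - A @ f1 - B @ f2) < 1e-6
```

Because each step solves a convex least-squares subproblem exactly, the alternation drives the joint prediction error monotonically down, which is the one-pass behavior the method relies on.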
[0008] Preferably, the first filter F1 is employed to interpolate samples at horizontal half-pixel positions or vertical half-pixel positions.
[0009] Preferably, the second filter F2 is employed to interpolate samples at horizontal and vertical half-pixel positions.
[0010] Preferably, after the samples at half-pixel positions are interpolated, the samples at other sub-pixel positions are interpolated under a fixed linear relationship between the samples at half-pixel or integer-pixel positions and the samples at sub-pixel position with higher precision.
[0011] Preferably, said step for employing the filter F2 to interpolate samples at horizontal and vertical half-pixel positions further comprises: filtering the up-left NxN integer samples of horizontal and vertical half-pixel sample positions using the filter F2 to obtain a first middle result; filtering the up-right NxN integer samples of horizontal and vertical half-pixel sample positions using the filter F2 to obtain a second middle result; filtering the down-left NxN integer samples of horizontal and vertical half-pixel sample positions using the filter F2 to obtain a third middle result; filtering the down-right NxN integer samples of horizontal and vertical half-pixel sample positions using the filter F2 to obtain a fourth middle result; and interpolating samples at horizontal and vertical half-pixel sample positions by averaging the first, the second, the third, and the fourth obtained results, wherein N is an integer.
[0012] Moreover, the present invention further provides a video encoder, which comprises a summer, a motion compensation module, a motion estimation module, an encoding module, a feedback decoding module and an adaptive interpolation system, wherein said adaptive interpolation system further comprises: a device configured to provide a set of filters for a current frame; a device configured to interpolate a reference frame having a certain precision according to the set of filters; a device configured to calculate motion vectors of the current frame in view of the interpolated reference frame; a device configured to train at least one of the filter sets by performing Least Square Estimation using the calculated motion vectors according to an equation as below:
(e)² = Σ_{x,y} ( S(x, y) − Σ_{i=0..N−1} Σ_{j=0..M−1} h(i, j) · P(x + mvx + i, y + mvy + j) )²
and to update the filter sets by replacing the at least one filter set with the trained filter to obtain an optimum filter set, wherein e represents the difference between the current frame and a prediction of the current frame; S represents the current frame; P represents the reference frame; x, y represent the x and y coordinates, respectively; NxM is the size of the filter; (mvx, mvy) represents the motion vector; h represents the float filter coefficients; i, j represent the coordinates of the filter coefficients; and a device configured to obtain a desirable prediction of the current frame by using the optimum filter set.
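As a rough illustration of the formula, the following toy function accumulates the squared prediction error over a frame for one integer motion vector. Frame borders are excluded and a non-negative motion vector is assumed for simplicity; these restrictions are not part of the invention:

```python
import numpy as np

def prediction_error_sq(S, P, h, mvx, mvy):
    # (e)^2 accumulated over the frame: each pixel of the current frame S is
    # predicted by filtering an NxM window of the reference frame P with the
    # coefficients h, displaced by the motion vector (mvx, mvy).
    N, M = h.shape
    total = 0.0
    for y in range(S.shape[0] - N - abs(mvy)):
        for x in range(S.shape[1] - M - abs(mvx)):
            window = P[y + mvy:y + mvy + N, x + mvx:x + mvx + M]
            e = S[y, x] - np.sum(h * window)
            total += e * e
    return total

# Sanity check: if P is S shifted by (1, 1) and h is a delta filter,
# the prediction is exact and the error is zero.
S = np.arange(100.0).reshape(10, 10)
P = np.zeros((12, 12)); P[1:11, 1:11] = S
h = np.zeros((3, 3)); h[0, 0] = 1.0
assert prediction_error_sq(S, P, h, 1, 1) == 0.0
```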
[0013] According to another aspect, the present invention provides a decoding method for a motion compensated predictive video codec, which comprises receiving an encoded set of filters, motion vectors and prediction error, in which said filters include a first filter F1 and a second filter F2; decoding the received set of filters, motion vectors and prediction error by using predictive coding and the Exp-Golomb method; determining samples to be interpolated according to the decoded motion vectors; interpolating a reference frame using the decoded set of filters, which further includes: applying the filter F1 to interpolate a first plurality of samples among said determined samples where the first plurality of samples are at horizontal or vertical half-pixel sample positions; and applying the filter F2 to interpolate a second plurality of samples among said determined samples where the second plurality of samples are at horizontal and vertical half-pixel sample positions; and reconstructing the current frame using the interpolated reference frame, the decoded motion vectors and the decoded prediction error.
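The Exp-Golomb code mentioned above is the universal variable-length code used for syntax elements in H.264/AVC. A minimal decoder for one unsigned codeword ue(v) might look like this; the bit-string input is an illustrative simplification of a real bitstream reader:

```python
def decode_ue(bits):
    # Decode one unsigned Exp-Golomb codeword ue(v) from a '0'/'1' string;
    # returns (value, bits consumed). Codeword: k zeros, a '1', k info bits,
    # and the decoded value is 2^k - 1 + info.
    k = 0
    while bits[k] == '0':           # count leading zeros
        k += 1
    info = bits[k + 1:k + 1 + k]    # k info bits after the marker '1'
    value = (1 << k) - 1 + (int(info, 2) if info else 0)
    return value, 2 * k + 1

assert decode_ue('1') == (0, 1)
assert decode_ue('010') == (1, 3)
assert decode_ue('00111') == (6, 5)
```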
[0014] Preferably, said step for applying the filter F2 to interpolate said second plurality of samples further comprises: filtering the up-left NxN integer samples of horizontal and vertical half-pixel sample positions using the filter F2 to obtain a first middle result; filtering the up-right NxN integer samples of horizontal and vertical half-pixel sample positions using the filter F2 to obtain a second middle result; filtering the down-left NxN integer samples of horizontal and vertical half-pixel sample positions using the filter F2 to obtain a third middle result; filtering the down-right NxN integer samples of horizontal and vertical half-pixel sample positions by using the filter F2 to obtain a fourth middle result; and interpolating samples at horizontal and vertical half-pixel sample positions by averaging the first, the second, the third, and the fourth obtained results, wherein N is an integer.
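The four-corner scheme of this step can be sketched as follows. This is a toy floating-point version with N = 3 and no rounding or clipping, and `f2` is a stand-in for the trained 3x3 filter F2:

```python
import numpy as np

def interpolate_j(ref, y, x, f2):
    # Half-pel sample between integer positions (y, x) and (y+1, x+1):
    # filter the 3x3 integer samples of each corner with F2, then average
    # the four middle results.
    n = f2.shape[0]
    up_left    = ref[y - n + 1:y + 1,  x - n + 1:x + 1]
    up_right   = ref[y - n + 1:y + 1,  x + 1:x + 1 + n]
    down_left  = ref[y + 1:y + 1 + n,  x - n + 1:x + 1]
    down_right = ref[y + 1:y + 1 + n,  x + 1:x + 1 + n]
    middles = [np.sum(f2 * c) for c in (up_left, up_right, down_left, down_right)]
    return sum(middles) / 4.0

# With a flat reference and a filter whose taps sum to 1, "j" keeps the value.
flat = np.full((8, 8), 50.0)
assert np.isclose(interpolate_j(flat, 3, 3, np.full((3, 3), 1.0 / 9.0)), 50.0)
```

Because one 9-tap filter is shared by all four corners, only nine coefficients per frame describe the two-dimensional position, instead of an independent filter per corner.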
[0015] Preferably, said interpolating the reference frame using the decoded set of filters further includes: applying a fixed filter to interpolate samples at other sub-pixel sample positions under a fixed linear relationship between the samples at half-pixel or integer-pixel positions and the samples at sub-pixel position with higher precision after the samples at half-pixel positions are interpolated using the filter F1 or filter F2.
[0016] Moreover, the present invention further provides a video decoder, which comprises a decoding module configured to receive and decode an encoded set of filters, motion vectors and prediction error; a motion compensation module configured to interpolate the reference frame using the decoded set of filters including a first filter F1 and a second filter F2, which further comprises: means for determining samples to be interpolated according to the decoded motion vectors; means for applying the filter F1 to interpolate a first plurality of samples among said determined samples if the first plurality of samples are at horizontal or vertical half-pixel sample positions; and means for applying the filter F2 to interpolate a second plurality of samples among said determined samples if the second plurality of samples are at horizontal and vertical half-pixel sample positions; and a reconstruction module configured for reconstructing the current frame using the interpolated reference frame, the decoded motion vectors and the decoded prediction error.
[0017] Preferably, said means for applying the filter F2 to interpolate said second samples further comprises: means for filtering the up-left NxN integer samples of horizontal and vertical half-pixel sample positions using the filter F2 to obtain a first middle result; means for filtering the up-right NxN integer samples of horizontal and vertical half-pixel sample positions using the filter F2 to obtain a second middle result; means for filtering the down-left NxN integer samples of horizontal and vertical half-pixel sample positions using the filter F2 to obtain a third middle result; means for filtering the down-right NxN integer samples of horizontal and vertical half-pixel sample positions by using the filter F2 to obtain a fourth middle result; and means for interpolating samples at horizontal and vertical half-pixel sample positions by averaging the first, the second, the third, and the fourth obtained results, wherein N is an integer.
[0018] Preferably, said motion compensation module further comprises: means for applying a fixed filter to interpolate samples at other sub-pixel sample positions under a fixed linear relationship between the samples at half-pixel or integer-pixel positions and the samples at sub-pixel position with higher precision after the samples at half-pixel positions are interpolated using the filter F1 or filter F2.
BRIEF DESCRIPTION OF DRAWINGS
[0019] Fig. 1 is a block diagram of a video codec having an adaptive interpolation system;
[0020] Fig. 2 is a flow chart illustrating the process of the video encoding with adaptive interpolation filtering;
[0021] Fig. 3 is a flow chart illustrating the first embodiment of the process of training adaptive interpolation filters;
[0022] Fig. 4 is a flow chart illustrating the second embodiment of the process of training adaptive interpolation filters;
[0023] Fig. 5 shows a sub-pixel interpolation scheme of H.264/AVC by incorporating the interpolation method according to the present invention, wherein those shaded blocks with upper-case letters represent integer samples and unshaded blocks with lower-case letters represent fractional sample positions for quarter sample Luma interpolation;
[0024] Fig. 6 is a photo showing a subjective comparison of reconstructed video quality with and without the adaptive interpolation system in H.264/AVC;
[0025] Fig. 7 is a flow chart illustrating a decoding method according to the present invention; and
[0026] Fig. 8 is a block diagram of a decoder for implementing the decoding method according to Fig. 7.
DETAILED DESCRIPTION OF THE EMBODIMENTS
[0027] The present invention and various advantages thereof will be described in conjunction with exemplary embodiments with reference to the accompanying drawings.
[0028] Fig. 1 is a block diagram showing a video codec 170 with an adaptive interpolation system 110, which is capable of improving the video compression efficiency by utilizing an adaptive filter set in the process of motion compensated prediction.
[0029] As shown in Fig. 1, the video codec 170 comprises an encoder 171 and a decoder 172. The encoder 171 comprises a summer 120, a motion compensation module 115, a motion estimation module 105, an encoding module 125, a feedback decoding module 130, and an adaptive interpolation system 110. The decoder 172 comprises a decoding module 135, a motion compensation module 140, and a reconstruction module 145.
[0030] A current frame s(t), namely a raw image signal to be coded, is input into the encoder 171, that is, to the summer 120, the adaptive interpolation system 110 and the motion estimation module 105. The current frame s(t) may be predicted by a motion compensated prediction technique based on a reference frame s'(t-1), which was obtained by reconstructing the previously encoded frame in the feedback decoding module 130.
[0031] As shown in Fig. 1, an interpolated frame is transmitted from the adaptive interpolation filter system 110 to the motion estimation module 105. The interpolated frame is obtained by interpolating the reference frame s'(t-1) according to a default filter set of the adaptive interpolation system 110. The default filter set may be a fixed filter set or an adaptive filter set trained on the immediately preceding frame.
[0032] A filter set in the present invention comprises a set of filters, each of which is designed for its specific sub-pixel resolution positions. For example, in a 1/4 pixel resolution interpolation filter system 110, two kinds of filters may be required: the first one for interpolating horizontal 1/2 sub-pixel positions and the vertical 1/2 sub-pixel positions of the reference frame, and the second one for interpolating the 1/4 sub-pixel positions of the reference frame. Moreover, the interpolation filter system 110 is also capable of determining the pattern of the filter set, such as the relationship among the filters.
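As an illustration of such a filter-set pattern, here is a sketch mapping half-pel position classes to filter kinds. The names and tap values are hypothetical placeholders, not the trained coefficients of the invention:

```python
# Hypothetical 1/4-pel filter-set pattern: one filter serves the
# horizontal-or-vertical half-pel class, another the horizontal-and-vertical
# class; quarter-pel samples are then derived by fixed averaging.
filter_set = {
    "S1": [1, -5, 20, 20, -5, 1],             # 6-tap, one direction
    "S2": [[1, 2, 1], [2, 4, 2], [1, 2, 1]],  # 3x3, applied per corner
}

def half_pel_class(sub_x, sub_y):
    # Classify a half-pel offset (each component is 0 or 1 half-pel unit).
    if sub_x and sub_y:
        return "S2"        # horizontal AND vertical half-pel (e.g. sample "j")
    if sub_x or sub_y:
        return "S1"        # horizontal OR vertical half-pel (e.g. "b", "h")
    return None            # integer position: no interpolation needed

assert half_pel_class(1, 0) == "S1"
assert half_pel_class(1, 1) == "S2"
```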
[0033] The motion estimation module 105 partitions the input current frame s(t) into multiple blocks and assigns a motion vector MV to each of the blocks in view of the interpolated frame. It is apparent that the motion vectors relating to the interpolated frame and the current frame may have a fractional pixel resolution. The motion vectors MV for all the blocks in the current frame s(t) are provided to the adaptive interpolation system 110, the motion compensation module 115, and the encoding module 125. The motion compensation 115 utilizes the received motion vectors as well as the interpolation filter set from the adaptive interpolation filter system 110 to generate a prediction so as to obtain the prediction frame spre(t).
[0034] According to the present invention, the adaptive interpolation filter system 110 receives the current frame s(t) from the input of the encoder 171 , the reference frame s'(t-1) from the feedback decoding 130, and motion vectors from the motion estimation 105, and adaptively optimizes a filter set by utilizing the above received information until an optimum filter set occurs. The principle of the adaptive interpolation filter system 110 as well as an optimization process employed therein will be described in detail later.
[0035] The motion compensation 115 utilizes the optimum filter set derived from the adaptive interpolation filter system 110 to improve the prediction spre(t) of the current frame s(t). The prediction spre(t) of the current frame s(t) is transmitted to the summer 120 and subtracted from the current frame s(t). The difference between the input current frame s(t) and the prediction spre(t) is encoded by the encoding module 125.
[0036] The encoded difference, together with the encoded motion vectors of the current frame, is sent to the decoding module 135. The optimum filter set obtained by the adaptive interpolation system 110 is also transmitted to motion compensation module 140.
[0037] The decoding module 135 decodes the encoded difference and the encoded MV, and transmits the decoded signals to the motion compensation module 140.
[0038] The motion compensation module 140 is used for determining the samples to be interpolated according to the decoded MV and for interpolating the reference frame so as to recover the motion compensated prediction frame based on the decoded difference and motion vectors by using the optimum filter set from the adaptive interpolation system 110.
[0039] The reconstruction module 145 receives the decoded difference from the decoding module 135 and the motion compensated prediction frame from the motion compensation module 140, and reconstructs the required video signal s'(t) by summing the decoded difference and the decoded prediction.
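The reconstruction step reduces to an element-wise sum, as in this toy sketch with illustrative 2x2 frames:

```python
import numpy as np

# s'(t) = motion compensated prediction + decoded prediction error
prediction = np.array([[100.0, 102.0], [98.0, 101.0]])
residual = np.array([[2.0, -1.0], [0.0, 3.0]])
reconstructed = prediction + residual
assert np.array_equal(reconstructed, np.array([[102.0, 101.0], [98.0, 104.0]]))
```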
[0040] As stated above, the adaptive interpolation filter system 110 is able to adaptively optimize a filter set according to the current frame s(t), the previously reconstructed reference frame s'(t-1) and motion vectors having a fractional pixel resolution to obtain an optimum filter set. The optimization process carried out by the adaptive interpolation filter system 110 is described with reference to Figs. 2, 3 and 4.
[0041] Fig. 2 shows an encoding process of a current frame carried out by encoder 171. The frame to be processed is an inter-frame. The inter-frame refers to a frame in a video codec which is expressed as the change from one or more other frames. The "inter" part of the term refers to the use of inter-frame prediction.
[0042] As shown in Fig. 2, step 200 is carried out to determine whether the current frame to be coded is the first inter-frame.
[0043] If "yes", a default filter set is selected in step 210 and a reference frame of the first inter-frame is interpolated in step 215 by the default filter set. As mentioned above, the default filter set may be a fixed filter set preset in the system 110. [0044] If the current frame is not the first inter-frame, namely, one or more inter-frames have been processed before the current frame, an adaptive filter set is selected in step 205. This adaptive filter set may be the optimum filter set obtained by the training process of the immediately preceding inter-frame.
[0045] Similarly, a reference frame will be interpolated in step 215 by the selected adaptive filter set.
[0046] In step 220, for each block of the current frame, the corresponding block of the reference frame with a fractional pixel resolution (the interpolated frame) is searched so that motion vectors representing the least distortion between the current frame and its prediction frame are obtained.
[0047] It is understood that until now the motion estimation is implemented based on a default filter set selected in step 210 or an adaptive filter set selected in step 205. In the following step 225, the default filter set or the adaptive filter set (hereafter "designated filter set") will be optimized to derive an optimum filter set for the current frame so as to improve the motion estimation and thereby enhance the coding efficiency. The objective of the optimization is to minimize the prediction error between the current frame and the prediction frame by a Least Square Estimation. The prediction error is represented by (e)2 using the following formula
(e)^2 = Σ_x Σ_y (S_{x,y} − Spre_{x,y})^2    (formula 1-1)
wherein S represents the current frame to be coded; Spre represents the prediction frame from the motion compensation module 115; and x and y represent the x and y coordinates, respectively, of a pixel of the current frame.
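As a concrete illustration, formula 1-1 is simply the sum of squared differences between the current frame and its prediction; a minimal sketch (array shapes and pixel values are hypothetical, for illustration only):

```python
import numpy as np

def prediction_error(current, prediction):
    """Squared prediction error (e)^2 of formula 1-1: sum over all
    pixel positions (x, y) of the squared difference between the
    current frame S and its motion-compensated prediction Spre."""
    diff = current.astype(np.int64) - prediction.astype(np.int64)
    return int(np.sum(diff * diff))

# Toy 2x2 frames (hypothetical values).
S = np.array([[10, 12], [14, 16]])
Spre = np.array([[11, 12], [13, 18]])
print(prediction_error(S, Spre))  # 1 + 0 + 1 + 4 = 6
```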
[0048] If, in step 230, the optimized filter set satisfies a stopping condition, the optimized filter set is then identified to be the optimum interpolation filter set for the current frame. The motion compensation prediction for the current frame will be executed in step 235. Afterwards, the current frame is encoded using the motion predictive estimation with the optimum filter set of the invention in step 240.
[0049] Otherwise, the procedure returns to step 205, at which the obtained optimized filter set is selected as the current adaptive filter set. Then, steps 205 to 230 are repeated to iteratively optimize the filter set until the stopping condition is satisfied. According to the present embodiment, the stopping condition may be a preset number of iteration cycles, a desired set of coefficients of the filter set, or a desired prediction error. The stopping condition should be determined by a trade-off between the distortion of an image and the complexity of processing it.
[0050] As stated above, the present invention aims to minimize the prediction error by optimizing the filter set using the Least Square Estimation. The detailed optimization procedure will be described hereinafter by referring to Fig. 3.
[0051] Fig. 3 is a flowchart illustrating the adaptive optimizing step
225 carried out by the adaptive interpolation system 110 according to the first embodiment. According to the present embodiment, the coefficients of all the filters of the filter set can be simultaneously trained to minimize the prediction error by using Least Square Estimation.
[0052] Before optimizing a filter set, it is necessary to determine parameter values of the filter set in step 300 and the filtering pattern in step 305 according to the practical requirements.
[0053] The parameter values include the sub-pixel resolution, which determines the number of filters needed for the filter set, and the number of filter taps, which determines the size of each filter of the filter set. The filtering pattern includes the filtering pattern for each sub-pixel position as well as the relationship among the filters.
[0054] In step 310, coefficients of the filter set (i.e. coefficients of each filter with a specific sub-pixel resolution) are adaptively trained for minimizing the square error (e)2 in formula 1 -1. According to the present invention, the prediction frame Spre in formula 1 -1 can be calculated using the following formula
Spre_{x,y} = Σ_{i=1}^{N} Σ_{j=1}^{M} h_{i,j} · P_{x+mvx+i−N/2, y+mvy+j−M/2}    (formula 1-2)
wherein NxM is the size of a filter; P represents the reference frame; (mvx, mvy) represents the motion vector of the current sub-pixel at the position (x, y); h represents the filter coefficients for the current sub-pixel position; and the filter size is decided by the number of filter taps determined in step 300 as shown in Fig. 3.
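Formula 1-2 can be sketched as a direct 2D weighted sum over the reference frame. The border handling and the exact centering convention (0-based indices with an N//2 offset here, versus the formula's 1-based indices) are simplifying assumptions of this sketch, not taken from the patent text:

```python
import numpy as np

def predict_sample(P, h, x, y, mvx, mvy):
    """Formula 1-2 (sketch): the predicted sample Spre[x, y] is an
    NxM weighted sum of reference-frame samples P around the
    motion-compensated position (x + mvx, y + mvy), centered by the
    i - N/2, j - M/2 offsets of the formula (0-based here).
    Border handling is omitted for clarity."""
    N, M = h.shape
    acc = 0.0
    for i in range(N):
        for j in range(M):
            acc += h[i, j] * P[x + mvx + i - N // 2, y + mvy + j - M // 2]
    return acc
```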
[0055] As stated above, the square error (e)2 can be obtained by using the following formula
(e)^2 = Σ_x Σ_y ( S_{x,y} − Σ_{i=1}^{N} Σ_{j=1}^{M} h_{i,j} · P_{x+mvx+i−N/2, y+mvy+j−M/2} )^2    (formula 1-3)
wherein e represents the difference between the current frame and a prediction of the current frame; NxM is the size of a filter; S represents the current frame; P represents the reference frame; x and y represent the x and y coordinates, respectively; (mvx, mvy) represents the motion vectors; h represents the float filter coefficients; and i, j represent the coordinates of the filter coefficients.
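Minimizing (e)2 over the coefficients h, with the motion vectors held fixed, is a standard linear least-squares problem: each pixel contributes one observation row containing the NxM reference samples the filter sees, and one target value from the current frame. A sketch of the training step (gathering the rows from the reference frame is assumed to have been done elsewhere):

```python
import numpy as np

def train_filter_lse(rows, targets):
    """Train float filter coefficients h minimizing (e)^2 of formula 1-3.
    rows   : K x (N*M) matrix; row k holds the N*M reference samples
             used to predict target pixel k (already shifted by the
             fixed motion vector of that pixel).
    targets: length-K vector of current-frame pixels S[x, y].
    Least-squares solution of the resulting linear system."""
    A = np.asarray(rows, dtype=np.float64)
    b = np.asarray(targets, dtype=np.float64)
    h, *_ = np.linalg.lstsq(A, b, rcond=None)
    return h
```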
[0056] The training of the filter set in step 310 calculates the optimum filter coefficients h that minimize the square error (e)2. This training step is achieved by using Least Square Estimation. Moreover, in carrying out step 310, the coefficients h of the present invention are float coefficients, which differ from the quantization coefficients used in US patent application No. 2004/0076333 as stated in the background. In order to minimize the prediction error, US 2004/0076333 searches the quantization coefficients of the filter using a heuristic search method, whereas the present invention derives the float coefficients of the filter set using the Least Square Estimation method. Therefore, the filter set obtained using the present invention is a globally optimum interpolation filter set.
[0057] Then, step 315 is carried out for mapping the float filter coefficients to quantization filter coefficients according to the required precision of the present embodiment. It is understood that this mapping step is employed for facilitating the training of the interpolation filter set.
[0058] Now, the filter set with quantization coefficients is the trained filter set in the current iteration. The procedure will go to step 230 of Fig. 2 to determine whether the trained filter set of the current iteration satisfies a stopping condition. If it is "yes", the trained filter of this iteration is the desired optimized interpolation filter, namely, optimum filter set.
[0059] Briefly, according to Figs. 2 and 3, the objective of the optimization is to minimize the square error as mentioned above. It is impossible to directly apply the Least Square Estimation to the error e because both the motion vectors (mvx, mvy) and the coefficients h in formula 1-3 are unknown. The above embodiment therefore addresses this issue as follows: (1) setting a default filter set or an adaptive interpolation filter set H'; (2) finding, by motion estimation, the motion vectors which optimize the objective; (3) performing the Least Square Estimation on the interpolation filter set H under the constraints of the just obtained motion vectors; and (4) replacing the filter set H' in step (1) by the filter set H, and iteratively performing steps (1) to (3) until the coefficients of the filter set H converge.
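The alternating optimization described above — motion estimation with the filters held fixed, then Least Square Estimation with the motion vectors held fixed, repeated until the coefficients converge — can be sketched as a simple loop. The two callables are hypothetical stand-ins for the encoder's actual stages:

```python
import numpy as np

def optimize_filter_set(initial_h, motion_estimate, lse_train,
                        max_iters=8, tol=1e-6):
    """Alternating optimization: vectors that best fit the current
    filters, then filters that best fit those vectors, until the
    coefficient change falls below tol (the convergence stopping
    condition) or max_iters is reached."""
    h = np.asarray(initial_h, dtype=np.float64)
    for _ in range(max_iters):
        mvs = motion_estimate(h)      # step 2: filters fixed
        h_new = lse_train(mvs)        # step 3: motion vectors fixed
        if np.max(np.abs(h_new - h)) < tol:
            return h_new
        h = h_new
    return h
```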
[0060] As stated above, it is possible to get an optimum filter for each sub-pixel position, so it is also possible to get an optimum filter set Hopt for all sub-pixel positions.
[0061] The present invention proposes a second embodiment which is capable of reducing the bit rate of the coefficients of a filter set H and the computational complexity of filtering the whole set S of sub-pixel positions. With reference to Fig. 4, a filter set H in step 225 as shown in Fig. 2 can also be optimized.
[0062] Step 400 is to construct a filter F1 according to a predetermined filtering pattern and an assumed relationship among a first sub-set of sub-pixel positions. [0063] In detail, for illustrative purposes, samples at horizontal half-pixel positions and samples at vertical half-pixel positions share the same interpolation filter F1. In addition, F1 is used to compute middle results which are further used to interpolate the samples at other related sub-pixel positions with higher than half-pixel precision. The relationship between the samples at sub-pixel positions with higher than half-pixel precision and the samples at horizontal or vertical half-pixel positions is defined by fixed linear functions, such as a linear averaging function. The set of all samples at the related sub-pixel positions in this step is called S1.
[0064] In step 405, F1 is optimized by Least Square Estimation for minimizing the prediction error of S1 between the current frame and the prediction frame. The difference between this embodiment and the above-mentioned first embodiment is that the prediction frame of the first embodiment is obtained based on the whole filter set including all filters in the set, and all the filters are trained simultaneously for minimizing the prediction error, while in the second embodiment the prediction frame in step 405 is obtained based on the filter F1 only and therefore the training procedure herein is only for the filter F1.
[0065] Step 410 is to construct another filter F2 according to the predetermined filter pattern and assumed relationship among another sub-set of sub-pixel positions.
[0066] In detail, it is assumed that the samples at positions which are half-pixel in both the horizontal and vertical directions are interpolated by another filter F2. It is also assumed that F2 is used to compute middle results which are further used to interpolate samples which are at sub-pixel positions with higher than half-pixel precision and are related, by said fixed linear functions or by F1, to the samples at the horizontal and vertical half-pixel positions. The set of all samples at the related sub-pixel positions in this step is called S2.
[0067] In step 415, F2 is optimized by Least Square Estimation under the constraints of S2 and optimized F1 obtained in step 405. The optimizing procedure of F2 is similar to that of F1 as described in step 405, so it is omitted herein.
[0068] After that, F1 is further optimized by Least Square Estimation under the constraints of S1 and optimized F2 in step 420.
[0069] Then, the procedure goes to step 425 to determine whether the optimizing procedure satisfies a stopping condition. As stated above, the stopping condition may be a preset number of iteration cycles, convergence of the coefficients of the filter set, or the prediction error between the current frame and the prediction frame falling within a desirable range.
[0070] If the stopping condition is satisfied, the current F1 and F2 form an optimum filter set and the procedure goes to step 235 of Fig. 2. Otherwise, the procedure goes to step 205 of Fig. 2.
[0071] Thus, the filter set H is reduced to two interpolation filters F1 and F2, which can be used together with said fixed linear relationships among the sub-pixel positions to interpolate the samples at all of the sub-pixel positions.
[0072] In this example, we assume S = S1 + S2, so F1 and F2 are optimized, respectively, under the constraints of the whole set S of sub-pixel positions. It is understood by those skilled in the art that, although the present embodiment employs two adaptively optimized filters F1 and F2, the present invention is not limited thereto. The number of adaptively optimized filters can be determined in practice. For example, if there are samples not in S1 or S2 but in S3, where S = S1 + S2 + S3, another filter F3 can be introduced and optimized by steps similar to those for F2, until all the samples in S are covered.
[0073] Following is an example of the present invention implemented on the platform of H.264/AVC.
[0074] For the purpose of comparison, the sub-pixel interpolation scheme of H.264/AVC for the Luma component is first described in conjunction with Fig. 5. Given Luma samples 'A' to 'U' at full-sample locations, Luma samples 'a' to 's' at fractional sample positions are derived by the following rules. The Luma prediction values at half sample positions shall be derived by applying a fixed 6-tap filter with tap values (1, -5, 20, 20, -5, 1). The Luma prediction values at quarter sample positions shall be derived by averaging samples at full and half sample positions. The sample at the half sample position labeled "b" is derived by calculating an intermediate value b1, obtained by applying the fixed 6-tap filter to the nearest integer position samples E, F, G, H, I and J in the horizontal direction. The sample at the half sample position labeled "h" is derived by calculating an intermediate value h1, obtained by applying the fixed filter to the nearest integer position samples A, C, G, M, R and T in the vertical direction, namely: b1 = (E - 5 * F + 20 * G + 20 * H - 5 * I + J), h1 = (A - 5 * C + 20 * G + 20 * M - 5 * R + T), wherein E, F, G, H, I and J represent the six full samples in the horizontal direction, respectively; and A, C, G, M, R and T represent the six full samples in the vertical direction, respectively. In applying the fixed filter to derive the half samples b and h, each tap is applied to one full sample in the corresponding direction.
[0075] The final prediction values b and h shall be derived using: b = Clip1( ( b1 + 16 ) » 5 ), h = Clip1( ( h1 + 16 ) » 5 ), wherein the shift sign "» n" means shifting (b1 + 16) or (h1 + 16) rightwards by n bits (here n is an integer) and "Clip1" is a mechanism which constrains the filtered results b and h to the range of 0 to 255. In the above equations, n equals 5, namely, the value of b1 or h1 is divided by 2^5 = 32 (because it has been scaled by 32 by the filter (1, -5, 20, 20, -5, 1) in the above process).
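The fixed-filter half-sample rule above can be written directly in code; a small sketch of the filter, rounding, shift, and Clip1 steps:

```python
def clip1(v):
    """Clip1: constrain a sample value to the 8-bit range 0..255."""
    return max(0, min(255, v))

def half_sample(p0, p1, p2, p3, p4, p5):
    """H.264/AVC half-sample interpolation with the fixed 6-tap filter
    (1, -5, 20, 20, -5, 1): the intermediate value is rounded (+16)
    and shifted right by 5 bits, i.e. divided by the filter's gain of
    2^5 = 32, then clipped to 0..255."""
    inter = p0 - 5 * p1 + 20 * p2 + 20 * p3 - 5 * p4 + p5
    return clip1((inter + 16) >> 5)

# In a flat area, interpolation reproduces the sample value.
print(half_sample(100, 100, 100, 100, 100, 100))  # 100
```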
[0076] Moreover, according to the conventional interpolation scheme of H.264/AVC for Luma component, the samples at quarter sample positions labeled as a, c, d, n, f, i, k, and q shall be derived by averaging with upward rounding of the two nearest samples at integer and half sample positions. And, the samples at quarter sample positions labeled as e, g, p, and r shall be derived by averaging with upward rounding of the two nearest samples at half sample positions in the diagonal direction.
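The quarter-sample rule (averaging the two nearest samples with upward rounding) is a one-liner:

```python
def quarter_sample(s1, s2):
    """Quarter-sample positions in H.264/AVC: average of the two
    nearest samples with upward rounding, e.g. a = (G + b + 1) >> 1."""
    return (s1 + s2 + 1) >> 1

print(quarter_sample(100, 101))  # rounds upward to 101
```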
[0077] Compared with the conventional interpolation scheme of
H.264/AVC for Luma component, the interpolation method of the present invention is described as below.
[0078] According to the present embodiment, the motion vector precision is set to be 1/4 pixel and the largest reference area of one sub-pixel position is set as 6x6, which may be done in step 300 of Fig. 3.
[0079] Further, the filtering pattern is also determined in step 305 of Fig. 3.
[0080] First, an asymmetrical 6-tap filter F1 (x0, x1, x2, x3, x4, x5) is used to interpolate samples like "b" and "h". The filtering operation is the same as that of H.264/AVC. F1 is constrained by the condition "x0 + x1 + x2 + x3 + x4 + x5 = 1", so at most five filter coefficients are enough to denote F1.
[0081] According to the embodiment, "a" is computed by averaging "G" and "b", "d" is computed by averaging "G" and "h". F1 is also used to interpolate the samples at sub-pixel positions "a" and "d" by firstly interpolating samples at "b" and "h".
[0082] Under the assumption that "c" is computed by averaging "b" and "H", "n" is computed by averaging "h" and "M ", F1 also is used to interpolate the samples at sub-pixel positions "c" and "n" by firstly interpolating samples at "b" and "h".
[0083] Under the assumption that "e" is computed by averaging "b" and "h", "g" is computed by averaging "b" and "m", "p" is computed by averaging "h" and "m", "r" is computed by averaging "s" and "m", F1 is also used to interpolate sub-pixel positions of "e", "g", "p" and "r" by firstly interpolating the samples at positions "b", "h", "m", and "s".
[0084] By now, the filter F1 can be optimized by the Least Square Estimation method using formula 1-3 as described above, under the constraints of the samples at the sub-pixel positions "a", "b", "c", "d", "e", "g", "h", "n", "p" and "r".
[0085] A 9-tap filter F2 (y0, y1, y2, y3, y4, y5, y6, y7, y8) is used to interpolate the sample "j". It filters, respectively, the up-left 3x3 integer samples, up-right 3x3 integer samples, down-left 3x3 integer samples and down-right 3x3 integer samples of the sub-pixel position sample "j". F2 is constrained by the condition "y0 + y1 + y2 + y3 + y4 + y5 + y6 + y7 + y8 = 1/4", so at most eight filter coefficients are enough to denote F2.
[0086] In detail, to interpolate sample "j", F2 filters the up-left 3x3 integer samples of sample "j" (A0, A1, A, C0, C1, C, E, F and G4) and gets a middle result G1. F2 filters the up-right 3x3 integer samples of sample "j" (B0, B1, B, D0, D1, D, J, I and H) and gets a middle result H1. F2 further filters the down-left 3x3 integer samples of sample "j" (T0, T1, T, R0, R1, R, K, L and M) and gets a middle result M1, and filters the down-right 3x3 integer samples of sample "j" (U0, U1, U, S0, S1, S, Q, P and N) and gets a middle result N1. Then, the interpolated sample "j" is computed by averaging G1, H1, M1 and N1.
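The four-corner filtering of sample "j" can be sketched as follows. One interpretive assumption is made explicit here: since each application of F2 has gain 1/4 (its taps sum to 1/4), this sketch combines the four middle results by summation so that the overall gain is 1, reading the text's "averaging" as having its normalization already folded into the tap constraint; this reading is an assumption of the sketch, not stated verbatim in the text:

```python
import numpy as np

def interpolate_j(corners, y):
    """Interpolate the center half-sample "j" with the 9-tap filter F2.
    corners: four 3x3 arrays of integer samples (up-left, up-right,
             down-left, down-right), giving middle results G1, H1,
             M1, N1.
    y      : the nine tap values y0..y8, constrained to sum to 1/4.
    The four quarter-gain middle results are summed, for unit overall
    gain (interpretive assumption, see lead-in). Float version; the
    fixed-point variant with Clip1 and a right shift appears later."""
    y = np.asarray(y, dtype=float)
    mids = [float(np.asarray(c, dtype=float).ravel() @ y) for c in corners]
    return sum(mids)
```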
[0087] Under the assumption that "f" is computed by averaging "b" and "j", "k" is computed by averaging "m" and "j", "q" is computed by averaging "s" and "j" (using a conventional averaging filter), "i" is computed by averaging "h" and "j", F2 and averaging filters are used for interpolating sub-pixel positions of "f", "i", "k" and "q". Sub-pixel positions samples of "b", "m", "s" and "h" are computed by filter F1.
[0088] By now, the filter F2 can be optimized by the Least Square Estimation method under the constraints of the optimized F1 and the samples at the sub-pixel positions "j", "f", "i", "k" and "q". [0089] F1 and F2 are optimized iteratively until the coefficients of F1 and F2 both converge.
[0090] In the conventional technology, coefficients corresponding to F1 and F2 are fixed or adaptively searched by the Downhill simplex search or heuristic search method. According to the present invention, in step 310 as shown in Fig. 3, filters F1 and F2 are trained by the Least Square Estimation method, and they are optimized iteratively. Moreover, LDLT (lower triangular matrix, diagonal matrix, lower triangular matrix transposed) decomposition can be employed to accelerate the calculation of the filter coefficients. As stated above, the coefficients obtained according to the present invention are float coefficients. In step 315, filters F1 and F2 are scalar quantized with a step size of 1/128 (which can be realized using the equation Q(x) = (x + 1/256)/(1/128)). Furthermore, the quantized F1 and F2 can be used to interpolate the reference frame in the next loop if necessary (namely, if the current iteration cannot make F1 and F2 satisfy the stopping condition). The quantized F1 and F2 are encoded by, e.g., the known method called "predictive coding and Exponential-Golomb coding". The encoded filters F1 and F2 will be transmitted to the encoding module 125 as a part of the raw frame.
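The scalar quantization of step 315 maps each float coefficient to the nearest multiple of the step size 1/128; the equation Q(x) = (x + 1/256)/(1/128) adds half a step before truncation, i.e. round-to-nearest. A sketch (returning both the integer level that would be entropy coded and the dequantized value):

```python
def quantize_coeff(x, step=1.0 / 128.0):
    """Scalar quantization of a float filter coefficient with step
    size 1/128: adding half a step (1/256) before the floor division
    implements round-to-nearest. Returns (integer level, dequantized
    coefficient)."""
    index = int((x + step / 2.0) // step)  # level to be entropy coded
    return index, index * step             # dequantized value = level/128

print(quantize_coeff(0.2))  # (26, 0.203125)
```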
[0091] Fig. 6 shows an experimental result by employing the interpolation method according to the present invention.
[0092] It is observed from Fig. 6 that the subjective quality is improved. The H.264/AVC codec with the adaptive interpolation module provides better quality at a lower bit-rate compared with the codec without the adaptive interpolation module. By using the adaptively trained interpolation filter of the present invention, about 0.4 dB improvement is gained for the decoded frame. [0093] From the above description of the invention as well as the experimental photos and graphs, it is understood that the adaptive interpolation system according to the present invention can be integrated in the reference models of the video coding standards H.264/AVC and AVS. By using the adaptive interpolation system, both the subjective and objective quality of the reconstructed video is greatly improved in comparison with that of H.264/AVC and AVS, at little computational cost. The time cost of the method is much less than that of the known methods given in the background; one-pass training still obtains improvement comparable to their multiple-pass training.
[0094] In the following, the decoding method for motion compensated predictive video codec is described in detail with reference to Fig. 7.
[0095] As shown in Fig. 7, step 700 is implemented for receiving encoded information including the encoded set of filters, motion vectors and prediction error from an encoder like the encoder 171. According to the present embodiment of the present invention, the set of filters includes a first filter F1 and a second filter F2, but is not limited to this proposal.
[0096] In step 705, the received filters F1 and F2, motion vectors and prediction error are entropy decoded and recovered from the bitstream according to the known technique named "predictive coding and Exponential-Golomb coding".
[0097] Then, step 710 is implemented to determine the samples to be interpolated according to the decoded motion vectors. [0098] After that, a reference frame is interpolated using the decoded set of filters: in step 715, by applying the filter F1 to interpolate a first plurality of samples among said determined samples, wherein the first plurality of samples are at horizontal or vertical half-pixel sample positions; and in step 720, by applying the filter F2 to interpolate a second plurality of samples among said determined samples, wherein the second plurality of samples are at horizontal and vertical half-pixel sample positions.
[0099] In step 725, the current frame is reconstructed using the interpolated reference frame, the decoded motion vectors and the decoded prediction error.
[0100] According to an embodiment, given Luma samples 'A' to 'U' at full-sample locations, as shown in Fig. 5, Luma samples 'a' to 's' at fractional sample positions are derived by the following rules. The Luma prediction values at horizontal or vertical half sample positions S1 (e.g., the sample position b) shall be derived by applying filter F1 with tap values (x0, x1, x2, x3, x4, x5). The Luma prediction values at horizontal and vertical half sample positions S2 (e.g., the sample position j) shall be derived by applying filter F2 with tap values (y0, y1, y2, y3, y4, y5, y6, y7, y8) and an averaging filter.
[0101] The Luma prediction values at quarter sample positions shall be derived by averaging samples at full and half sample positions. The sample at the half sample position labeled "b" is derived by calculating an intermediate value b1, obtained by applying the adaptive filter F1 to the nearest integer position samples E, F, G, H, I and J in the horizontal direction. The sample at the half sample position labeled "h" is derived by calculating an intermediate value h1, obtained by applying the adaptive filter F1 to the nearest integer position samples A, C, G, M, R and T in the vertical direction, namely:
b1 = (x0 * E + x1 * F + x2 * G + x3 * H + x4 * I + x5 * J), h1 = (x0 * A + x1 * C + x2 * G + x3 * M + x4 * R + x5 * T), wherein E, F, G, H, I and J represent the six full samples in the horizontal direction, respectively; and A, C, G, M, R and T represent the six full samples in the vertical direction, respectively. In applying the adaptive filter F1 to derive the half samples b and h, each tap is applied to one full sample in the corresponding direction.
[0102] The final prediction values b and h shall be derived using: b = Clip1( ( b1 + 64 ) » 7 ), h = Clip1( ( h1 + 64 ) » 7 ), wherein the shift sign "» n" means shifting (b1 + 64) or (h1 + 64) rightwards by n bits (here n is an integer) and "Clip1" is a mechanism which constrains the filtered results b and h to the range of 0 to 255. In the above equations, n equals 7, namely, the value of b1 or h1 is divided by 2^7 = 128 (because it has been scaled by 128 by the filter F1 (x0, x1, x2, x3, x4, x5) in the above process).
[0103] The sample at the horizontal and vertical half sample position labeled "j" is derived by applying F2 with tap values (y0, y1, y2, y3, y4, y5, y6, y7, y8) respectively to the 3x3 integer samples at each corner of "j". In detail, to interpolate sample "j", F2 filters the up-left 3x3 integer samples of sample "j" (A0, A1, A, C0, C1, C, E, F and G4) and gets a middle result G1. F2 filters the up-right 3x3 integer samples of sample "j" (B0, B1, B, D0, D1, D, J, I and H) and gets a middle result H1. F2 further filters the down-left 3x3 integer samples of sample "j" (T0, T1, T, R0, R1, R, K, L and M) and gets a middle result M1, and F2 also filters the down-right 3x3 integer samples of sample "j" (U0, U1, U, S0, S1, S, Q, P and N) and gets a middle result N1.
G1 = (y0 * A0 + y1 * A1 + y2 * A + y3 * C0 + y4 * C1 + y5 * C + y6 * E + y7 * F + y8 * G4),
H1 = (y0 * B0 + y1 * B1 + y2 * B + y3 * D0 + y4 * D1 + y5 * D + y6 * J + y7 * I + y8 * H),
M1 = (y0 * T0 + y1 * T1 + y2 * T + y3 * R0 + y4 * R1 + y5 * R + y6 * K + y7 * L + y8 * M),
N1 = (y0 * U0 + y1 * U1 + y2 * U + y3 * S0 + y4 * S1 + y5 * S + y6 * Q + y7 * P + y8 * N).
[0104] Then, the interpolated sample "j" is computed by averaging G1, H1, M1 and N1: j = Clip1( (G1 + H1 + M1 + N1 + 256) » 9 ), wherein the shift sign "» n" means shifting (G1 + H1 + M1 + N1 + 256) rightwards by n bits (here n is an integer) and "Clip1" is a mechanism which constrains the filtered result j to the range of 0 to 255. In the above equation, n equals 9, namely, the value of j is divided by 2^9 = 512 (because it has been scaled by 512 in the above process).
[0105] Moreover, as known in the conventional interpolation scheme of H.264/AVC for Luma component, the samples at quarter sample positions labeled as "a, c, d, n, f, i, k, and q" shall be derived by averaging with upward rounding of the two nearest samples at integer and half sample positions. And, the samples at quarter sample positions labeled as "e, g, p, and r" shall be derived by averaging with upward rounding of the two nearest samples at half sample positions in the diagonal direction.
[0106] In order to implement the above-mentioned decoding steps as shown in Fig. 7, the video decoder 172 of Fig.1 is illustrated in more detail with reference to Fig. 8. As shown in Fig. 8, video decoder 172 comprises a decoding module 135 configured to receive and decode an encoded set of filters, motion vectors and a prediction error; a motion compensation module 140 configured to interpolate the reference frame using the decoded set of filters including a first filter F1 and a second filter F2, which further comprises a sub-module 805 for determining samples to be interpolated according to the decoded motion vectors, a sub-module 810 for applying the filter F1 to interpolate a first plurality of samples among said determined samples where the first plurality of samples are at horizontal or vertical half-pixel sample positions, and a sub-module 815 for applying the filter F2 to interpolate a second plurality of samples among said determined samples where the second plurality of samples are at horizontal and vertical half-pixel sample positions; and a reconstruction module 145 configured to reconstruct the current frame using the interpolated reference frame, the decoded motion vectors and the decoded prediction error.
[0107] While the present invention has been described with reference to specific exemplified embodiments, it will be evident that various modifications and changes without deviation from the spirit of the invention could be made within the scope of the invention. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.

Claims

1. An adaptive interpolation method for motion compensated predictive video codec, comprising: providing a set of filters for a current frame; interpolating a reference frame having a certain precision according to the set of filters; calculating motion vectors to generate a prediction frame of the current frame in view of the interpolated reference frame; constructing a first interpolation filter F1 of the set of filters in view of a first part of all the sub-pixel positions according to a fixed linear relationship among samples of the first part; training the first filter F1 by performing Least Square Estimation on the sub-pixel positions of the first part; constructing a second filter F2 in view of a second part of all the sub-pixel positions according to a fixed linear relationship among samples of the second part; training the second filter F2 of the set of filters by performing Least Square Estimation on the sub-pixel positions of the second part under the constraint of F1; re-training the first filter F1 on the sub-pixel positions of the first part under the constraint of the second filter F2; and updating the set of filters by the trained filters F1 and F2, so as to optimize the set of filters by iteratively performing the above steps from the interpolating step to the updating step until a stopping condition is satisfied.
2. The method according to claim 1 , wherein the first filter F1 is employed to interpolate samples at horizontal half-pixel positions or vertical half-pixel positions.
3. The method according to claim 1 , wherein the second filter F2 is employed to interpolate samples at horizontal and vertical half-pixel positions.
4. The method according to claim 3, wherein said step for employing the filter F2 to interpolate samples at horizontal and vertical half-pixel positions further comprises: filtering the up-left NxN integer samples of horizontal and vertical half-pixel sample positions using the filter F2 to obtain a first middle result; filtering the up-right NxN integer samples of horizontal and vertical half-pixel sample positions using the filter F2 to obtain a second middle result; filtering the down-left NxN integer samples of horizontal and vertical half-pixel sample positions using the filter F2 to obtain a third middle result; filtering the down-right NxN integer samples of horizontal and vertical half-pixel sample positions by using the filter F2 to obtain a fourth middle result; and interpolating samples at horizontal and vertical half-pixel sample positions by averaging the first, the second, the third, and the fourth obtained results, wherein N is an integer.
5. The method according to claim 4, wherein N is equal to 3.
6. The method according to claim 1 , wherein after the samples at half-pixel positions are interpolated, the samples at other sub-pixel positions are interpolated under a fixed linear relationship between the samples at half-pixel or integer-pixel positions and the samples at sub-pixel position with higher precision.
7. The method according to claim 1, wherein the steps for training F1 and F2 further comprise: calculating float filter coefficients denoting each filter by minimizing the square error (e)2; and mapping the float filter coefficients to quantization filter coefficients based on a required precision of the prediction of the current frame.
8. The method according to claim 7, wherein the Least Square Estimation minimizing the square error (e)² is implemented using a fast LDLᵀ decomposition algorithm.
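Minimizing (e)² leads to a symmetric positive-definite normal-equation system A·h = b for the filter coefficients, which the fast LDLᵀ route of claim 8 can solve without a full matrix inversion. A minimal sketch (function and variable names are assumptions, not the patent's code):

```python
import numpy as np

def ldlt_solve(a, b):
    """Solve the symmetric positive-definite system a @ h = b via an
    LDL^T decomposition: a = L D L^T with unit-lower-triangular L and
    diagonal D, then forward/diagonal/backward substitution."""
    n = a.shape[0]
    l = np.eye(n)
    d = np.zeros(n)
    for j in range(n):
        d[j] = a[j, j] - np.sum(l[j, :j] ** 2 * d[:j])
        for i in range(j + 1, n):
            l[i, j] = (a[i, j] - np.sum(l[i, :j] * l[j, :j] * d[:j])) / d[j]
    z = np.empty(n)
    for i in range(n):                  # forward substitution: L z = b
        z[i] = b[i] - l[i, :i] @ z[:i]
    y = z / d                           # diagonal solve: D y = z
    h = np.empty(n)
    for i in range(n - 1, -1, -1):      # back substitution: L^T h = y
        h[i] = y[i] - l[i + 1:, i] @ h[i + 1:]
    return h
```

Because the normal-equation matrix is symmetric, the LDLᵀ route needs roughly half the work of a general LU factorization, which is why the claim calls it "fast".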
9. The method according to any one of claims 1-8, wherein the stopping condition is that the number of iteration cycles equals a preset value.
10. The method according to any one of claims 1-8, wherein the stopping condition is that the coefficients of F1 and F2 begin to converge.
11. The method according to any one of claims 1-8, wherein the stopping condition is that the prediction error is smaller than a predetermined value.
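The three stopping conditions of claims 9-11 can be combined in one training loop. The harness below is a sketch in which `step` stands in for one motion-estimation/filter-training cycle (a hypothetical callback, not part of the patent):

```python
def train_until_stop(step, max_cycles=20, eps=1e-3):
    """Run training cycles until one of three stopping conditions holds:
    (a) the cycle count reaches a preset value       (claim 9),
    (b) the filter coefficients begin to converge    (claim 10),
    (c) the prediction error falls below a threshold (claim 11).
    `step(prev_coeffs)` performs one cycle and returns (coeffs, error)."""
    prev = None
    for cycle in range(1, max_cycles + 1):
        coeffs, err = step(prev)
        if err < eps:                                   # condition (c)
            return coeffs, cycle, "error-threshold"
        if prev is not None and all(
            abs(c - p) < eps for c, p in zip(coeffs, prev)
        ):
            return coeffs, cycle, "converged"           # condition (b)
        prev = coeffs
    return prev, max_cycles, "max-cycles"               # condition (a)
```

Whichever condition fires first ends the training, so the encoder's worst-case cost is bounded by the preset cycle count even when the coefficients are slow to converge.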
12. The method according to claim 1, wherein the provided set of filters is a set of default filters upon determining that the current frame is a first inter-frame, or a set of filters optimized for the immediately preceding inter-frame upon determining that the current frame is not the first inter-frame.
13. The method according to claim 1, wherein the optimized set of filters, the motion vectors and the prediction error are encoded and transmitted to a video decoder.
14. A video encoder (171), comprising a summer (120), a motion compensation module (115), a motion estimation module (105), an encoding module (125), a feedback decoding module (130) and an adaptive interpolation system (110), wherein said adaptive interpolation system (110) further comprises: a device configured to provide a set of filters for a current frame; a device configured to interpolate a reference frame having a certain precision according to the set of filters; a device configured to calculate motion vectors of the current frame in view of the interpolated reference frame; a device configured to train at least one of the filter sets by performing Least Square Estimation using the calculated motion vectors according to an equation as below:
(e)² = Σ_{x,y} ( S_{x,y} − Σ_{i=0..N−1} Σ_{j=0..M−1} h_{i,j} · P_{x+i+mvx, y+j+mvy} )²
and to update the filter sets by replacing the at least one filter set with the trained filter to obtain an optimum filter set, wherein e represents the difference between the current frame and a prediction of the current frame; S represents the current frame; P represents the reference frame; x, y represent the x and y coordinates, respectively; NxM is the size of the filter; (mvx, mvy) represents the motion vectors; h represents the float filter coefficients; i, j represent the coordinates of the filter coefficients; and a device configured to obtain a desirable prediction of the current frame by using the optimum filter set.
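The Least Square Estimation above amounts to an ordinary linear least-squares fit of the NxM coefficients against motion-compensated reference windows. A minimal sketch, assuming non-negative motion vectors and illustrative names (a codec would instead accumulate normal equations and solve them with the LDLᵀ step of claim 8):

```python
import numpy as np

def fit_filter_lse(s, p, mvx, mvy, n=3, m=3):
    """Fit the MxN taps h minimising
    (e)^2 = sum_{x,y} ( S[y,x] - sum_{i,j} h[j,i] * P[y+j+mvy, x+i+mvx] )^2.
    `s` is the current frame, `p` the reference frame; mvx, mvy >= 0
    is assumed to keep the window indexing simple."""
    rows, targets = [], []
    for y in range(s.shape[0] - m - mvy):
        for x in range(s.shape[1] - n - mvx):
            # Motion-compensated MxN reference window feeding sample (x, y).
            window = p[y + mvy : y + mvy + m, x + mvx : x + mvx + n]
            rows.append(window.ravel())
            targets.append(s[y, x])
    h, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return h.reshape(m, n)
```

When the current frame really is a filtered, motion-shifted copy of the reference, the fit recovers the generating coefficients exactly; in practice the residual is the prediction error e of the claim.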
15. A decoding method for motion compensated predictive video codec, comprising: receiving an encoded set of filters, motion vectors and a prediction error, in which said filters include a first filter F1 and a second filter F2; decoding the received set of filters, motion vectors and prediction error using predictive coding and the Exponential-Golomb method; determining samples to be interpolated according to the decoded motion vectors; interpolating a reference frame using the decoded set of filters, which further includes: applying the filter F1 to interpolate a first plurality of samples among said determined samples, where the first plurality of samples are at horizontal or vertical half-pixel sample positions; and applying the filter F2 to interpolate a second plurality of samples among said determined samples, where the second plurality of samples are at horizontal and vertical half-pixel sample positions; and reconstructing the current frame using the interpolated reference frame, the decoded motion vectors and the decoded prediction error.
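The entropy decoding of claim 15 relies on Exponential-Golomb codes (rendered "exponent-Glomob" in the published text). An unsigned Exp-Golomb codeword with k leading zeros encodes the value 2^k − 1 + info, where info is the k bits following the marker '1'. A minimal decoder sketch (the bitstream layout and names are illustrative, not the patent's syntax):

```python
def decode_ue(bits, pos=0):
    """Decode one unsigned Exponential-Golomb codeword from `bits`
    (a string of '0'/'1' characters) starting at index `pos`;
    returns (value, next_position)."""
    leading_zeros = 0
    while bits[pos] == "0":          # count leading zeros
        leading_zeros += 1
        pos += 1
    pos += 1                         # consume the terminating '1'
    info = 0
    for _ in range(leading_zeros):   # read the info bits
        info = (info << 1) | int(bits[pos])
        pos += 1
    return (1 << leading_zeros) - 1 + info, pos
```

For example, the codeword '00111' has two leading zeros and info bits '11', so it decodes to 2² − 1 + 3 = 6.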
16. The decoding method according to claim 15, wherein said step for applying the filter F2 to interpolate said second plurality of samples further comprises: filtering the up-left NxN integer samples of horizontal and vertical half-pixel sample positions using the filter F2 to obtain a first middle result; filtering the up-right NxN integer samples of horizontal and vertical half-pixel sample positions using the filter F2 to obtain a second middle result; filtering the down-left NxN integer samples of horizontal and vertical half-pixel sample positions using the filter F2 to obtain a third middle result; filtering the down-right NxN integer samples of horizontal and vertical half-pixel sample positions by using the filter F2 to obtain a fourth middle result; and interpolating samples at horizontal and vertical half-pixel sample positions by averaging the first, the second, the third, and the fourth obtained results, wherein N is an integer.
17. The method according to claim 16, wherein N equals 3.
18. The decoding method according to claim 15, wherein said interpolating the reference frame using the decoded set of filters further includes: applying a fixed filter to interpolate samples at the other sub-pixel sample positions according to a fixed linear relationship between the samples at half-pixel or integer-pixel positions and the samples at sub-pixel positions of higher precision, after the samples at half-pixel positions are interpolated using the filter F1 or the filter F2.
19. A video decoder (172), comprising a decoding module (135) configured to receive and decode an encoded set of filters, motion vectors and prediction error; a motion compensation module (140) configured to interpolate the reference frame using the decoded set of filters including a first filter F1 and a second filter F2, which further comprises: means for determining samples to be interpolated according to the decoded motion vectors; means for applying the filter F1 to interpolate a first plurality of samples among said determined samples if the first plurality of samples are at horizontal or vertical half-pixel sample positions; and means for applying the filter F2 to interpolate a second plurality of samples among said determined samples if the second plurality of samples are at horizontal and vertical half-pixel sample positions; and a reconstruction module (145) configured to reconstruct the current frame using the interpolated reference frame, the decoded motion vectors and the decoded prediction error.
20. The decoder according to claim 19, wherein said means for applying the filter F2 to interpolate said second samples further comprises: means for filtering the up-left NxN integer samples of horizontal and vertical half-pixel sample positions using the filter F2 to obtain a first middle result; means for filtering the up-right NxN integer samples of horizontal and vertical half-pixel sample positions using the filter F2 to obtain a second middle result; means for filtering the down-left NxN integer samples of horizontal and vertical half-pixel sample positions using the filter F2 to obtain a third middle result; means for filtering the down-right NxN integer samples of horizontal and vertical half-pixel sample positions by using the filter F2 to obtain a fourth middle result; and means for interpolating samples at horizontal and vertical half-pixel sample positions by averaging the first, the second, the third, and the fourth obtained results, wherein N is an integer.
21. The decoder according to claim 19, wherein said motion compensation module (140) further comprises: means for applying a fixed filter to interpolate samples at the other sub-pixel sample positions according to a fixed linear relationship between the samples at half-pixel or integer-pixel positions and the samples at sub-pixel positions of higher precision, after the samples at half-pixel positions are interpolated using the filter F1 or the filter F2.
PCT/IB2007/004305 2006-12-01 2007-11-30 Adaptive interpolation method and system for motion compensated predictive video coding and decoding WO2008068623A2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP07859334A EP2092752A2 (en) 2006-12-01 2007-11-30 Adaptive interpolation method and system for motion compensated predictive video coding and decoding
CN200780050842.6A CN101632306B (en) 2006-12-01 2007-11-30 Adaptive interpolation method and system for motion compensated predictive video coding and decoding

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CNPCT/CN2006/003239 2006-12-01
CN2006003239 2006-12-01

Publications (2)

Publication Number Publication Date
WO2008068623A2 true WO2008068623A2 (en) 2008-06-12
WO2008068623A3 WO2008068623A3 (en) 2009-07-30

Family

ID=39492687

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2007/004305 WO2008068623A2 (en) 2006-12-01 2007-11-30 Adaptive interpolation method and system for motion compensated predictive video coding and decoding

Country Status (3)

Country Link
EP (1) EP2092752A2 (en)
CN (1) CN101632306B (en)
WO (1) WO2008068623A2 (en)


Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9219921B2 (en) * 2010-04-12 2015-12-22 Qualcomm Incorporated Mixed tap filters
CN101984669A (en) * 2010-12-10 2011-03-09 河海大学 Iteration method of frame-hierarchy adaptive Wiener interpolation filter
WO2020125628A1 (en) * 2018-12-17 2020-06-25 Beijing Bytedance Network Technology Co., Ltd. Shape dependent interpolation filter

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE19730305A1 (en) * 1997-07-15 1999-01-21 Bosch Gmbh Robert Method for generating an improved image signal in the motion estimation of image sequences, in particular a prediction signal for moving images with motion-compensating prediction
EP1359763A2 (en) * 2002-04-10 2003-11-05 Microsoft Corporation Approximate bicubic filter
US20040076333A1 (en) * 2002-10-22 2004-04-22 Huipin Zhang Adaptive interpolation filter system for motion compensated predictive video coding
EP1617672A1 (en) * 2004-07-13 2006-01-18 Matsushita Electric Industrial Co., Ltd. Motion estimator/compensator including a 16-bit 1/8 pel interpolation filter

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1216495C (en) * 2003-09-27 2005-08-24 浙江大学 Video image sub-picture-element interpolation method and device
EP1578137A2 (en) * 2004-03-17 2005-09-21 Matsushita Electric Industrial Co., Ltd. Moving picture coding apparatus with multistep interpolation process


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
WEDI, T.: "Advanced motion compensated prediction methods", ITU-T Video Coding Experts Group (ITU-T SG16 Q.6), 18 October 2003, pages 1-8, XP002454495 *
WEDI, T.; MUSMANN, H. G.: "Motion- and aliasing-compensated prediction for hybrid video coding", IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, 1 July 2003, pages 577-586, XP011099250, ISSN 1051-8215, cited in the application *
VATIS, Y.; OSTERMANN, J.: "Prediction of P- and B-Frames Using a Two-dimensional Non-separable Adaptive Wiener Interpolation Filter for H.264/AVC", VCEG-AD08, ITU-T SG16 Q.6 Video Coding Experts Group, 30th Meeting, Hangzhou, CN, 23-27 October 2006, XP002527472, retrieved from the Internet: URL:http://wftp3.itu.int/av-arch/video-site/0610_Han/VCEG-AD08.zip [retrieved on 2009-05-11], cited in the application *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009153104A1 (en) * 2008-06-19 2009-12-23 Thomson Licensing Method for determining a filter for interpolating one or more pixels of a frame
EP2136565A1 (en) * 2008-06-19 2009-12-23 Thomson Licensing Method for determining a filter for interpolating one or more pixels of a frame, method for encoding or reconstructing a frame and method for transmitting a frame
US8804833B2 (en) 2008-06-19 2014-08-12 Thomson Licensing Method for determining a filter for interpolating one or more pixels of a frame
TWI513279B (en) * 2008-06-19 2015-12-11 Thomson Licensing Method for determining a filter for interpolating one or more pixels of a frame and method and device for encoding or decoding a frame
RU2530327C2 (en) * 2008-07-29 2014-10-10 Франс Телеком Method of updating encoder by filter interpolation
CN102172022B (en) * 2008-10-03 2016-08-17 高通股份有限公司 Use interpolation filter and the digital video transcoding of skew
CN102172022A (en) * 2008-10-03 2011-08-31 高通股份有限公司 Digital video coding with interpolation filters and offsets
US9078007B2 (en) 2008-10-03 2015-07-07 Qualcomm Incorporated Digital video coding with interpolation filters and offsets
CN102577390A (en) * 2009-08-28 2012-07-11 索尼公司 Image processing device and method
WO2017002283A1 (en) * 2015-07-01 2017-01-05 パナソニックIpマネジメント株式会社 Encoding method, decoding method, encoding device, decoding device, and encoding/decoding device
JPWO2017002283A1 (en) * 2015-07-01 2018-04-19 パナソニックIpマネジメント株式会社 Encoding method, decoding method, encoding device, decoding device, and encoding / decoding device
CN112131529A (en) * 2020-09-22 2020-12-25 南京大学 Pairing transaction coordination relationship accelerated verification method based on E-G two-step method
CN112131529B (en) * 2020-09-22 2023-10-13 南京大学 E-G two-step method-based pairing transaction coordination relation acceleration verification method

Also Published As

Publication number Publication date
WO2008068623A3 (en) 2009-07-30
CN101632306B (en) 2014-03-19
EP2092752A2 (en) 2009-08-26
CN101632306A (en) 2010-01-20

Similar Documents

Publication Publication Date Title
TWI735172B (en) Mutual excluding settings for multiple tools
CN102396230B (en) Image processing apparatus and method
EP2092752A2 (en) Adaptive interpolation method and system for motion compensated predictive video coding and decoding
CN104041041B (en) Motion vector scaling for the vectorial grid of nonuniform motion
CN102804779A (en) Image processing device and method
TW202315408A (en) Block-based prediction
WO2013002144A1 (en) Method and device for encoding video image, method and device for decoding video image, and program therefor
JP2021502031A (en) Interpolation filters for inter-prediction equipment and methods for video coding
US8170110B2 (en) Method and apparatus for zoom motion estimation
WO2008148272A1 (en) Method and apparatus for sub-pixel motion-compensated video coding
JP7375224B2 (en) Encoding/decoding methods, apparatus and devices thereof
WO2013002150A1 (en) Method and device for encoding video image, method and device for decoding video image, and program therefor
KR20140010174A (en) Video encoding device, video decoding device, video encoding method, video decoding method, video encoding program, video decoding program
CN103069803B (en) Method for video coding, video encoding/decoding method, video coding apparatus, video decoder
CN113994692A (en) Method and apparatus for predictive refinement with optical flow
WO2022022278A1 (en) Inter-frame prediction method, encoder, decoder, and computer storage medium
WO2022061680A1 (en) Inter-frame prediction method, encoder, decoder, and computer storage medium
JP2023528609A (en) Encoding/decoding method, apparatus and device
KR102435316B1 (en) Image processing apparatus and method
WO2022037344A1 (en) Inter-frame prediction method, encoder, decoder, and computer storage medium
WO2022077495A1 (en) Inter-frame prediction methods, encoder and decoders and computer storage medium
Rusanovskyy et al. Video coding with pixel-aligned directional adaptive interpolation filters
TW202209893A Inter-frame prediction method, coder, decoder and computer storage medium
Wu et al. Combined adaptive-fixed interpolation with multi-directional filters
WO2020251469A1 (en) Sample value clipping on mip reduced prediction

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 200780050842.6

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07859334

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase

Ref country code: DE

REEP Request for entry into the european phase

Ref document number: 2007859334

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 2007859334

Country of ref document: EP