WO2009007133A2 - Method and apparatus for determining the visual quality of processed visual information

Info

Publication number: WO2009007133A2
Authority: WO (WIPO, PCT)
Prior art keywords: visual, quality, visual quality, video, processed
Application number: PCT/EP2008/005693
Other languages: French (fr)
Other versions: WO2009007133A3 (en)
Inventor: Tobias Oelbaum
Original assignee: Technische Universität München
Application filed by Technische Universität München
Publication of WO2009007133A2 (en)
Publication of WO2009007133A3 (en)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 17/00 Diagnosis, testing or measuring for television systems or their details
    • H04N 17/004 Diagnosis, testing or measuring for digital television systems
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30168 Image quality inspection


Abstract

The invention relates to a method and an apparatus for determining the visual quality of processed visual information such as compressed images or videos.

Description

Method and apparatus for determining the visual quality of processed visual information
The invention relates to a method and an apparatus for determining the visual quality of processed visual information such as compressed images or videos.
Knowing the visual quality of a compressed (and therefore distorted) video or image is essential for most applications dealing with compressed image or video data. The most accurate way to determine the visual quality of such a compressed video or image would be to conduct subjective tests. But these tests are time consuming, expensive and cannot be part of a running system. For these reasons many automatic objective visual quality metrics were proposed in recent years, but despite some advances in the field of visual quality estimation, the correlation of the results of those metrics to results gained in subjective tests is still limited.
A good objective video quality metric would produce results highly correlated to those obtained by a subjective test.
The most popular video quality metric is the Peak Signal to Noise Ratio (PSNR). This simple metric just calculates the mathematical difference between every pixel of the encoded video and the original video and is calculated according to

PSNR = 10 · log10( I_max² / ( (1/(N·M)) · Σ_n Σ_m ( I_cod(n,m) − I_orig(n,m) )² ) )

where I_max is the maximum value one pixel can have (e.g. 255 for 8 bit), N and M are the number of rows and columns respectively, and I_cod and I_orig are the actual pixel values for the coded and original image respectively.
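As an illustration, the formula can be written directly in a few lines of Python/NumPy; this is a minimal sketch, and the function and argument names are chosen freely here rather than taken from the patent:

    import numpy as np

    def psnr(coded: np.ndarray, orig: np.ndarray, i_max: float = 255.0) -> float:
        """PSNR between a coded and an original image (one color plane each).

        i_max is the maximum value one pixel can have, e.g. 255 for 8 bit.
        """
        diff = coded.astype(np.float64) - orig.astype(np.float64)
        mse = np.mean(diff ** 2)  # (1/(N*M)) * sum of squared differences
        if mse == 0.0:
            return float("inf")  # identical images
        return 10.0 * np.log10(i_max ** 2 / mse)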
In fact, up to now PSNR is the only video quality metric that is widely accepted, and PSNR is therefore the de-facto standard for measuring video quality. Being the de-facto standard for objective video quality metrics, PSNR is still used for comparing AVC/H.264 encoded video with other video codecs or for comparing different encoder implementations or coding settings for AVC/H.264. This is despite the knowledge that PSNR values may be heavily misleading. In 2004, the ITU released a recommendation which included four different full reference metrics (metrics for which not only the coded video but also the original video is needed for the evaluation) that outperformed PSNR in terms of correlation to the results of extensive subjective tests [ITU-T J.144, Objective perceptual video quality measurement techniques for digital cable television in the presence of a full reference, ITU-T 2004].
One full reference (FR) image metric which has gained high popularity since it was introduced in 2002 is the so-called SSIM (Structural SIMilarity index). The SSIM performs a separate comparison of luminance, contrast and structure in the original and the coded image and uses this information to calculate one overall quality index.
A very common approach for a full reference metric is to combine measurements such as block fidelity, content richness and distortions with some visibility masking functions.
Comparably few approaches were presented for reduced reference (RR) quality evaluation, and even fewer for no reference (NR) quality evaluation. Compared to full reference metrics, an RR metric needs only parts of the original video, or some extracted properties of the original video, for the evaluation.
For an NR metric, no information about the original video is needed. One popular approach for an NR image and video quality metric is the inclusion of watermarks in the original image and then measuring the amount to which these watermarks can be recovered at the receiver. Other common methods are estimation of PSNR or calculating the visual quality by evaluating different types of distortion such as blockiness.
Objective visual quality metrics normally deliver very imprecise results. A reason for this is that different images or videos have different properties, for example regarding sharpness or color spectrum. These different properties are distorted in a different way during compression or transmission.
It is therefore an object of the present invention to improve the preciseness of the results which are obtained by objective quality metrics.
This object is solved by the method according to claim 1 and the apparatus according to claim 17. Advantageous embodiments of the method and the apparatus are given by the respective dependent claims.
The invention is based on the insight that the preciseness of an objective visual quality metric can be improved if, for a certain processed visual information, the parameters of a regression function are known which correlate the quality metric with the actual visual quality of the processed visual information. The processed visual information is preferably at least one image or video or a part of at least one image or video. The processing is preferably compressing and/or transmitting of the information.
According to the invention, a visual information is processed not only once, for example for a transmission, but at least once more, preferably with a further compressor, to generate a further processed visual information. Thus, if the visual information is an image or a video and the processing is compressing or transmitting, the image or video is compressed once for a transmission and at least once in addition. This further image or video is used only for determining the visual quality of the transmitted image or video. The method according to the invention is preferably carried out automatically on a calculation device or a computer, which is for example part of a transmission system or a compression system.
If the actual visual quality v_1 of this further processed visual information can be estimated, determined or set sufficiently precisely, it is possible to determine the parameters of the regression function that relates the actual visual quality to the visual quality calculated by a visual quality metric. Setting of the further visual quality preferably means that this further quality is stored or entered and is based on estimated, determined (for example in tests determining the actual or subjective quality) or otherwise chosen values. The preciseness of the objective quality metrics can therefore be improved. The determination of the actual visual quality is preferably performed on the basis of the parameters which were used for the processing of this further processed information. For the determination of the parameters of the regression function, the calculated visual quality y_1 also has to be calculated with an objective quality metric. In the simplest case, only one further processed version of the information is generated. In this case, the regression function can be assumed to be linear. It is then determined by the parameters slope and offset (or intersect). With the calculated quality of the further processed information, i.e. for example the compressed picture or video, either the slope or the offset of the linear regression function can be determined.
The visual information is not necessarily a complete image or video, but can also be a part of an image or video. The visual information can also be at least one feature of at least a part of an image or video.
The slope s can here be calculated as s = y_1 / v_1. For the case that two further processed versions of the visual information are generated, having the set visual qualities v_h and v_n with v_h > v_n and the calculated visual qualities y_h and y_n, the slope s is calculated as

s = (y_h − y_n) / (v_h − v_n).

The offset o is then o = y_h − v_h · s or o = y_n − v_n · s, respectively. The corrected quality value y' to be determined is then

y' = (y − o) / s.
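A minimal sketch of this correction step, assuming the linear regression function described above; the function names are illustrative, not taken from the patent:

    def regression_params(y_h, v_h, y_n=None, v_n=None):
        """Slope s and offset o of the assumed linear regression function.

        With a single further processed version (y_h, v_h) the line is taken
        through the origin, so s = y_h / v_h and o = 0.  With two versions,
        s and o follow from the two points (v_h, y_h) and (v_n, y_n).
        """
        if y_n is None:
            return y_h / v_h, 0.0
        s = (y_h - y_n) / (v_h - v_n)
        o = y_h - v_h * s  # equivalently o = y_n - v_n * s
        return s, o

    def correct_quality(y, s, o):
        """Corrected quality value y' = (y - o) / s."""
        return (y - o) / s

For example, with set qualities v_h = 0.9 and v_n = 0.2 and metric outputs y_h = 0.8 and y_n = 0.4, regression_params(0.8, 0.9, 0.4, 0.2) yields s ≈ 0.57 and o ≈ 0.29, so a raw metric output y = 0.6 is corrected to y' ≈ 0.55.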
In the following, the description of the invention relates to images and videos as visual information. However, the same description also holds for general visual information.
For the present invention, a visual quality metric can preferably be regarded as a black box building block as shown in figure 1, which produces a quality estimation y at the output if confronted with a coded video or image at the input. If the original image or video is also needed as input, this is a "full reference" (FR) or "reduced reference" (RR) quality metric; otherwise the metric is a "no reference" (NR) metric.
As mentioned before, the overall correlation between results of subjective tests and predicted quality can be increased if it is possible to estimate the parameters of the regression line of each sequence. In the present invention, these regression parameters are therefore preferably determined by producing at least one, preferably two, additional instances of the original image or video and using these instances to calculate the slope s and offset o of the linear regression line. The visual quality of these additional instances should preferably be inherently known. The gained parameters s and o are then used to correct the original quality prediction. An overview of an embodiment is given in figure 2.
The accuracy of the correction mainly depends on three attributes:
1. difference between the actual visual quality and the assumed visual quality of the additional instances,
2. sensitivity of the parameters s and o to the error between the actual visual quality and the assumed visual quality of the additional instances,
3. ability of the additional instances to represent the regression line for the given image or sequence.
For this reason, the following is preferred for the generation of these additional instances:
• The visual quality of the two additional instances should preferably be very different. These instances can be produced e.g. by encoding the original image or video using a fixed quantization parameter (QP).
• If the visual quality metric is an NR metric, one instance can preferably be the uncoded original; otherwise, a coded version of the video or image can be generated that most likely has no or only very few impairments. The visual quality v_high of this instance will preferably be assumed to be in the range of 0.8 to 1.0 on a 0 to 1 scale.
• The low quality instance should preferably be of low visual quality but should not contain artifacts that are not present in the image or video of interest (e.g. it should not contain skipped frames if the video of interest does not contain skipped frames). The visual quality v_low of this instance will preferably be assumed to be in the range of 0.1 to 0.3 on a 0 to 1 scale.
• The encoder used to encode the additional instances should be close to the encoder used to encode the image or video of interest. If the encoder and its settings are unknown, at least the same coding technology should preferably be used.
The additional instances are then preferably rated by the same visual quality metric that is used to gain the prediction y. The gained values y_high and y_low are used to predict the slope s and offset o of the regression line:

s = (y_high − y_low) / (v_high − v_low)
o = y_low − v_low · s
In the following an embodiment of the invention for a reduced reference method is described.
A set of simple no reference feature measurements can be selected representing the most common kinds of distortion, namely blocking, blurriness and noise. One feature measurement can be added to measure the amount of detail present in the encoded video. To take into account the time dimension of a video, four different continuity measurements can be performed: predictability (how well one frame can be predicted using the previous frame only), motion continuity (a measurement for the smoothness of the motion), color continuity (how much the color changes between two successive images) and edge continuity (how much edge regions change between two successive images). The following quantities can be used.
• Blur: the blur measurement used is described in S. Winkler, T. Ebrahimi, A no-reference perceptual blur metric . Proceedings of the International Conference on Image Processing, vol. 3, pp. 57—60, Rochester, NY, Sep. 22-25, 2002. The algorithm measures the width of an edge and then calculates the blur by assuming that blur is reflected by wide edges. As blur is something natural in a fast moving sequence, this measurement is adjusted if the video contains a high amount of fast motion.
• Blocking: for measuring the blockiness, the algorithm introduced in A. C. Bovik, S. Liu, DCT-domain blind measurement of blocking artifacts in DCT-coded images. International Conference on Acoustics, Speech, and Signal Processing, IEEE Proceedings, Volume 3, pages 1725-1728, 07-11 May 2001, is used. This algorithm calculates the blockiness by applying an FFT along each line or column. The unwanted blockiness can be easily detected by its location in the spectra.
• Noise: to detect the noise present in the video, a very simple noise detector was designed. First, a prediction of the actual image is built by motion compensation using a simple block matching algorithm. Second, a difference image between the actual image and its prediction is calculated, and a low pass version of this difference image is produced by first applying a median filter and a Gaussian low pass filter afterwards. A pixel is classified to contain noise if the difference value between the original difference image and the low pass difference image exceeds a threshold of 25 (assuming 8 bit values ranging from 0 to 255) for one of the three color planes.
• Details: to measure the amount of detail present in a video, the percentage of turning points along each line and each row is calculated. This measurement is part of a BTFR metric included in ITU-T J.144, Objective perceptual video quality measurement techniques for digital cable television in the presence of a full reference, 2004. As the amount of detail noticed by an observer decreases with increasing motion, the activity measurement is adjusted if high motion is detected in the video.
• Predictability: A predicted image is built by motion compensation using a simple block matching algorithm. The actual image and its prediction are then compared block by block. An 8 x 8 block is considered to be noticeably different if the SAD exceeds 384. To avoid that single pixels dominate the SAD measurement, both images are filtered, first with a Gaussian blur filter and with a median filter afterwards (a sketch of this measurement is given in the code after this list).
• Edge Continuity: The actual image and its motion compensated prediction are compared using the Edge-PSNR algorithm as described in C. Lee, S. Cho, J. Choe, T. Jeong, W. Ahn and E. Lee: Objective video quality assessment, SPIE Journal of Optical Engineering, Volume 45, Issue 1, Jan. 2006.
• Motion Continuity: Two motion vector fields are calculated: between the current and the previous frame, and between the following and the current frame. The percentage of motion vectors where the difference between the two corresponding motion vectors exceeds 5 pixels (in either the x or y direction) determines the motion continuity.
• Color continuity: A color histogram with 51 bins for each RGB channel is calculated for the actual image and its prediction. Color continuity is then given as the linear correlation between those two histograms (see also the sketch after this list).
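As an illustration of two of these measurements, the sketch below implements predictability and color continuity in Python with NumPy and SciPy. Several details are assumptions not fixed by the text: the motion compensated prediction is taken as given, the input is 8 bit, predictability is expressed as the fraction of blocks that are not noticeably different, and the blur/median filter parameters are illustrative:

    import numpy as np
    from scipy.ndimage import gaussian_filter, median_filter

    def predictability(frame, predicted, block=8, sad_thresh=384.0):
        """Fraction of 8x8 luminance blocks whose SAD against the motion
        compensated prediction does not exceed 384 (both inputs 2-D arrays).

        Both images are smoothed first (Gaussian blur, then median filter)
        so that single pixels do not dominate the SAD measurement; the
        filter parameters here are illustrative.
        """
        a = median_filter(gaussian_filter(frame.astype(np.float64), sigma=1.0), size=3)
        b = median_filter(gaussian_filter(predicted.astype(np.float64), sigma=1.0), size=3)
        h, w = a.shape
        blocks, different = 0, 0
        for y in range(0, h - block + 1, block):
            for x in range(0, w - block + 1, block):
                sad = np.abs(a[y:y + block, x:x + block]
                             - b[y:y + block, x:x + block]).sum()
                blocks += 1
                different += sad > sad_thresh
        return 1.0 - different / max(blocks, 1)

    def color_continuity(frame, predicted, bins=51):
        """Linear correlation between the 51-bin RGB histograms of the
        actual image and its prediction (both H x W x 3, 8 bit)."""
        hists = []
        for img in (frame, predicted):
            h = [np.histogram(img[..., c], bins=bins, range=(0, 256))[0]
                 for c in range(3)]
            hists.append(np.concatenate(h).astype(np.float64))
        return float(np.corrcoef(hists[0], hists[1])[0, 1])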
All feature measurements can be done for each frame of the video separately and the mean value of all frames can then be used for further processing.
The above selected measurements are just one example of a set of variables that can be used for building such a model. The presented variables can be used for their simplicity; using more complex measurements for artifacts like noise or blur, or adding measurements for further artifacts (e.g. ringing), results in even more accurate models. In this case, preferably only no reference feature measurements are considered; by including some feature measurements that require the original video, an RR or FR metric could be built. The nature of the multivariate calibration allows including an unrestricted number of fixed variables in the calibration step. If the calibration phase is done properly, fixed variables that do not contribute to the latent variable "video quality" do not spoil the calibration process. The regression model will preferably contain these useless fixed variables with zero (or very close to zero) weight, and those variables can then be removed from the model.
Correcting the Features using MSC: As it is expected that the measured features are not free from multiplicative and additive effects (e.g. the measurement for noise may be correlated with and affected by the amount of detail present in the video), a multiplicative signal correction (MSC) step can be performed before starting the multivariate regression. MSC was originally developed to correct measurements in reflectance spectroscopy, but can also help in this context to remove multiplicative and additive effects between different objective features. The MSC corrected value of one feature m for one sequence i is calculated as follows:

f'_mi = (f_mi − d_i) / c_i
The two variables c and d are obtained by simple linear regression of the feature values of the sequence i against the average of the feature values of all calibration sequences. For a detailed description of MSC, see chapter 7.4 in H. Martens, T. Naes, Multivariate Calibration. Wiley & Sons, 1992.
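A minimal NumPy sketch of this MSC step, assuming the feature values of the calibration sequences are arranged as the rows of a matrix F (sequences x features):

    import numpy as np

    def msc_correct(F):
        """Multiplicative signal correction of a feature matrix F.

        Each row f_i (the feature values of sequence i) is regressed onto
        the mean feature row of all calibration sequences,
            f_i ~ d_i + c_i * f_mean,
        and then corrected as f'_i = (f_i - d_i) / c_i.
        """
        F = np.asarray(F, dtype=np.float64)
        f_mean = F.mean(axis=0)
        F_corr = np.empty_like(F)
        for i, f_i in enumerate(F):
            c_i, d_i = np.polyfit(f_mean, f_i, deg=1)  # slope c, intercept d
            F_corr[i] = (f_i - d_i) / c_i
        return F_corr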
Multivariate Regression with PLS: The obtained feature values f'_mi can then be used, together with the corresponding subjective ratings y_i that form the column vector y, to build a regression model using the method of Partial Least Squares Regression (PLSR). PLSR is an extension of Principal Component Regression (PCR) that tries to find the principal components (PCs) that are most relevant not only for the interpretation of the variation in the input values in F but also for the variation in the output values y. So while PCR is a bilinear regression method that consists of a Principal Component Analysis (PCA) of F' into the matrix T that contains the PCs of F', followed by a regression of y on T, for PLSR the modelling of F' and y is done simultaneously to ensure that the PCs gained from F' are relevant for y.

F' can be modelled as:

F' = f̄ + T · Pᵀ + E_F

with P being the loadings of the k input features, T being the scores of the l input sequences, f̄ representing the row vector of the mean values of the features, and E_F being the error in F' that cannot be modelled.
Likewise y can be modelled as:

y = ȳ + T · q + e_y

with q being the loadings of y and e_y the error in y that cannot be modelled.
The prediction ŷ can then be modelled as:

ŷ_i = b_0 + f'_i · b

where b is the column vector of the single estimation weights b_m and b_0 is the model offset. A detailed description of PLSR can be found in chapter 3.5 of H. Martens, T. Naes, Multivariate Calibration. Wiley & Sons, 1992.
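Such a model can be fitted with any PLSR implementation; the sketch below uses scikit-learn's PLSRegression on synthetic stand-in data, since the patent does not prescribe a particular library, data set or number of latent components:

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression

    rng = np.random.default_rng(0)
    F_corr = rng.random((40, 12))  # MSC-corrected features: 40 sequences x 12 features
    y = rng.random(40)             # corresponding subjective ratings (stand-in values)

    pls = PLSRegression(n_components=4)  # number of latent components is a tuning choice
    pls.fit(F_corr, y)

    # Fixed variables whose weights are (very close to) zero do not contribute
    # to the latent variable "video quality" and can be removed from the model.
    weights = np.ravel(pls.coef_)
    y_pred = pls.predict(F_corr).ravel()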
The NR quality metric gained by the previous steps faces the problem that even the original video may contain a certain amount of blur or blocking, and different sequences not only have a different amount of detail but also have different motion properties. However, the prediction accuracy for each single sequence is very high: the data points for one single sequence lie on one straight line, only with unknown slope s and unknown offset o. The overall prediction accuracy therefore can be improved by estimating the slope and the offset of these lines by calculating the predicted quality of the original video (y_orig) and of a low quality version of the video (y_low), preferably using the same quality predictor.
The proposed method determines these parameters by introducing at least one, preferably two, additional (coded) instances of the respective image or video and making a quality prediction not only for the video or image that should be evaluated but also for these additional instances.
While the original video is available and the subjective visual quality of this original can inherently be given as 1 on a 0 to 1 scale with only a comparably small error, an estimation of a low quality video can be produced by e.g. encoding the original with a low bit rate. Obviously the subjective visual quality of this low quality video can only be guessed (for example 0.25). Including the predicted quality of the original video and the predicted quality of the low quality video, the NR model becomes an RR model, even though the additional data that has to be sent is very small (only two values per sequence). The corrected prediction y'_i can be calculated as

y'_i = (y_i − o) / s, with s = (y_orig − y_low) / (1 − 0.25) and o = y_low − 0.25 · s.
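With illustrative (made-up) metric outputs, this correction looks as follows:

    # Metric outputs for the original, the low quality version and the
    # video under test (made-up numbers for the example):
    y_orig, y_low, y = 0.92, 0.35, 0.61

    v_orig, v_low = 1.0, 0.25  # assumed actual qualities of the two instances
    s = (y_orig - y_low) / (v_orig - v_low)
    o = y_low - v_low * s
    y_corrected = (y - o) / s  # corrected prediction y', here about 0.59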
At the extremes of the test range (very good or very bad quality), subjective testing yields nonlinear quality ratings, and ratings do not reach the very extremes of the scale but saturate before. For this reason, the prediction values y' can preferably be refined using a sigmoid nonlinear correction. The general sigmoid function is given as
y'' = a / (1 + e^(−(y' − b) / c))

For the correction, the following values can for example be chosen: a = 1.0, b = 0.5, c = 0.2. The applied correction function is preferably very close to linear over a wide quality range.
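One common way to write such a three-parameter sigmoid is sketched below; the exact parameterization is an assumption consistent with the given values a = 1.0, b = 0.5 and c = 0.2:

    import math

    def sigmoid_correct(y_prime, a=1.0, b=0.5, c=0.2):
        """y'' = a / (1 + exp(-(y' - b) / c)); with a = 1.0, b = 0.5, c = 0.2
        this stays close to linear over the middle of the quality range."""
        return a / (1.0 + math.exp(-(y_prime - b) / c))

With these values the curve passes through (0.5, 0.5), is nearly linear between roughly 0.2 and 0.8, and saturates towards the ends of the scale.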
The task of an objective visual quality metric is to rate the quality of a video or image in a way that the result is as close as possible to the quality a human observer would assign to this video/image. The most commonly used metric for this task is the already mentioned Peak Signal to Noise Ratio (PSNR). But as PSNR only calculates the mathematical difference between the original images and the coded images (for PSNR calculation a video is just a series of single images) and does not take into account the properties of the human visual system (HVS), the correlation between PSNR values and subjective quality is quite limited. But up to now, PSNR is the only visual quality metric that is widely accepted. The value is normally calculated as

PSNR = 10 · log10( I_max² / ( (1/(N·M)) · Σ_n Σ_m ( I_cod(n,m) − I_orig(n,m) )² ) )
In one embodiment of the present invention, the further processed information can be generated by processing the first processed visual information of which the quality is to be determined.
This method preferably allows measuring the visual quality of processed images or videos. The method preferably comprises the step of extracting a number of features from the video or image I. Furthermore, additional processed versions of the video or image, denoted as I_proc,n with n ∈ [1...N], are preferably generated using the already processed video or image I as input to the processing step. Afterwards, a number of features is preferably extracted from the videos or images I_proc,n. The extracted features from I and I_proc,n are preferably combined into one quality value y.
It is preferred that the feature extraction process for I is performed in a way that no access to the reference image or video I_ref is needed. The same feature extraction process can be applied to I and I_proc,n. The additional processing preferably comprises encoding and decoding the image or video using a steerable video or image encoder. It is furthermore preferred that the encoding is done in a way to produce a processed I_proc,n that most probably has a visual quality that is lower than the visual quality of I.
The encoding can be done using the same encoding technology that was used to generate the processed image or video I from the reference image or video I_ref. It is sufficient that only one additional instance I_proc is generated, which is therefore an option.
It is preferred that the gained features are combined into one quality value by a weighted summation. The weights can be adjusted according to the extracted features. The basic weights can be gained by the use of training data.
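A sketch of such a weighted combination; the arrangement of the feature vectors and the optional bias term are assumptions, and the weights would be obtained from training data as described:

    import numpy as np

    def combine_features(features_i, features_proc, weights, bias=0.0):
        """Combine the features extracted from I and from I_proc,n into one
        quality value y by weighted summation."""
        f = np.concatenate([np.ravel(features_i), np.ravel(features_proc)])
        return float(bias + np.dot(weights, f))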
The invention allows improving the preciseness of objective visual quality metrics for images, videos or other visual information. The invention can be used for images or videos where the picture quality is impaired by compression and/or transmission. Methods for compression of pictures and videos are e.g. JPEG, JPEG2000, MPEG-2 or AVC/H.264. The invention can be used for images or videos with arbitrary data rate and for pictures or videos which are compressed arbitrarily strongly.
The invention can be applied to images and videos which are transmitted over an arbitrary channel. The channel can have a restricted bandwidth and/or can be error-prone. The transmission can be packet-oriented or connection-oriented. The images and videos can have arbitrary spatial and temporal resolution, i.e. arbitrary image size or, in the case of videos, arbitrary pictures per second. The invention is also applicable to quality metrics which need the compressed/transmitted image as well as the original image or video. The invention is also applicable to quality metrics which only require the compressed/transmitted picture or video. In the latter case, the calculated parameters of the regression function can be transmitted together with the image or video.
It is normally sufficient for the invention if one further processed/compressed picture or video of the picture or video which is transmitted is generated. If further versions are generated, the parameters of the regression function can be determined more precisely. Thus, the preciseness of the results of the objective quality metric can be improved further. For the generation of additional further processed versions of the image or video, an arbitrary compressor can be used. Employing the same compressor for all further compressed versions normally leads to a higher preciseness of the results. Preferably, the compressor is the same as that used for the transmitted image or video. However, it is also possible to use different compressors for the further processed visual information. It is sufficient, and therefore an option, to generate the further processed information only for parts of the image or video and judge those with the objective quality metric. However, using the complete image or video leads to more precise results.
The channel over which the additional further processed visual information is transmitted can be arbitrary and does not have to be identical to the channel over which the actual image or video is transmitted. In the simplest case, the channel over which the further information is transmitted is regarded as having an unlimited bandwidth and being free of errors. For the transmission of further processed images or videos, multiple different channels can be employed.
In the following, the invention is described by way of examples. The features shown can also be realized separately within the invention.
Figure 1 shows a general use of a visual quality metric for an encoded video.
Figure 2 shows an example for the present invention.
Figure 3 shows a regression function according to the prior art as well as according to the present invention.
Figure 4 shows an example of the present invention where the quality is determined based on features extracted from the video.
Figure 5 shows an example of a preferred embodiment of the present invention.

Figure 1 shows a general setup of a visual quality metric of the full reference or reduced reference type. An encoded video 11, the quality of which is to be determined, is input into the visual quality metric 13. The visual quality metric 13 generates an output value 14 which is an approximate measure of the visual quality of the encoded video 11. If the visual quality metric is of full reference or reduced reference type, it needs the input of the full or partial original video 12.
Figure 2 shows a block diagram of a method according to the present invention. The quality of the encoded video 20 is determined with the visual quality metric 21, which generates the value y. According to the method, a processed video 22 with high picture quality is produced. Furthermore, a video or picture 23 with low picture quality is produced. Afterwards, the picture quality of the video 22 with high picture quality is determined with the visual quality metric 21b. Furthermore, the picture quality of the video 23 with low picture quality is calculated with the visual quality metric 21c, which is identical to the visual quality metric 21b.
Now, for the video 22 with high quality, a visual or actual image quality vh is set, assumed or determined experimentally, which preferably lies in a region between 0.8 and 1.0 on a scale reaching from 0 to 1. For the video 23 with low image quality, a visual or actual image quality vl is set or estimated, preferably in a region between 0.1 and 0.3 on the same scale. The slope s and the zero-point or offset o are then determined from the values yh, yl, vh and vl in the regression function determination 24. The determination 24 thus produces the slope s and the zero-point o. Those values are input into the correction 25, where the quality y determined from the encoded video 20 by the visual quality metric 21a is corrected to yield the corrected quality value y+. If the visual quality metric is of the full reference or reduced reference type, parts of or the full original video 25 can be input into the visual quality metric 21.
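In this two-point case of Figure 2, the determination 24 and the correction 25 reduce to elementary arithmetic. A minimal sketch, assuming the linear regression function v = s*y + o and using the preferred regions for vh and vl as example default values:

    def correct_quality(y, y_h, y_l, v_h=0.9, v_l=0.2):
        # Slope s and zero-point (offset) o from the two anchor points
        # (y_h, v_h) and (y_l, v_l) of the high and low quality versions.
        s = (v_h - v_l) / (y_h - y_l)
        o = v_h - s * y_h
        # Corrected quality value y+ for the encoded video 20.
        return s * y + o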
Figure 3 shows the regression functions 31 and 32 for two different pictures according to the prior art (left diagram) as well as according to the present invention (right diagram). The vertical axis 33 shows the quality determined by a quality metric, while the horizontal axis 34 shows the visual or actual quality of the respective image. The dashed line 35 shows an optimal regression function where the determined value is always equal to the actual value. It can be clearly seen that for both pictures 31 and 32, the regression functions determined according to the present invention are much closer to the optimal regression function 35 than those of the prior art.
Figure 4 shows the method according to the present invention where features are extracted from the original video 41. The features are extracted in steps 43a, 43b and 43c. Step 44 combines the quality values y obtained by applying the quality model 42 to the features of the original video 41 as well as to the features of a low quality video 45 and an encoded video 46 of unknown quality. The low quality video 45 is generated from the original video 41 by encoding and decoding the original video 41. From the low quality video 45, the features are extracted in 43b and then input into the quality model 42. The encoded video 46 is produced by encoding the original video 41, transmitting the encoded video and decoding it in the decoding step 47. Also from the encoded video 46, features are extracted in the feature extraction 43c and then input into the quality model 42 to give the quality value y. Employing the quality values obtained from the original video 41 and the low quality video 45, the regression function can be corrected and the value y obtained from the encoded video 46 can be corrected in the correction means 44. The corrected quality value y' output by the correction means 44 is then input into the sigmoid correction 49 to yield the final quality value y'' 40.
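The text does not specify the sigmoid correction 49 in detail; a logistic function that maps the corrected value y' smoothly into the range from 0 to 1 is one plausible assumed realization, with steepness k and midpoint m as purely illustrative parameters:

    import math

    def sigmoid_correction(y_prime, k=6.0, m=0.5):
        # Map the corrected quality value y' to the final value y'' in [0, 1].
        return 1.0 / (1.0 + math.exp(-k * (y_prime - m)))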
Figure 5 shows a method according to a preferred embodiment of the present invention. A first processed image or video 50, which may be processed by compressing, transmitting or also just recording visual information, is on the one hand subject to feature extraction 53a and on the other hand subject to further processing 51. The further processed image or video 52 generated from the image or video 50 in processing step 51 is then subject to feature extraction 53b.
Preferably, the feature extraction 53a which extracts features from the first processed image or video 50 is the same as the feature extraction 53b which extracts features from the image or video 52. In step 54, the features extracted from the image or video 50 as well as from the processed image or video 52 are then combined into one quality value y 55. The additional processing 51 can comprise encoding and/or decoding of the image or video 50, e.g. using a steerable video or image encoder. Preferably, such encoding is done in a way to produce a processed image or video 52 that most probably has a visual quality which is lower than the visual quality of the image or video 50. The image or video 50 can itself be processed visual information; such a processed image or video 50 can e.g. be generated by the same kind of processing as step 51. However, the image or video 50 can also be some other reference image or video. The combination of the features extracted in steps 53a and 53b in step 54 can e.g. be done by weighted summation. The weights of such a weighted summation can be adjusted according to the extracted features. Basic weights can be gained by using training data.
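Putting the elements of Figure 5 together, the overall flow might be sketched as follows; extract_features() and the weights are placeholders for whatever feature set and training data an implementer chooses, and combining the two feature sets via their differences is only one assumed option:

    def quality_value(video_50, video_52, extract_features, weights):
        f50 = extract_features(video_50)  # feature extraction 53a
        f52 = extract_features(video_52)  # feature extraction 53b, same extractor
        # Combination step 54: here a weighted summation over the feature
        # differences between the first and the further processed version.
        return sum(weights[k] * (f50[k] - f52[k]) for k in weights)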

Claims
1. Method for determining the visual quality of processed visual information, comprising the steps of
processing first processed visual information having a visual quality to be determined,
generating at least one further processed visual information, each further processed visual information having different further visual qualities,
setting the at least one further visual quality,
calculating the at least one further visual quality by employing a first visual quality metric on the further processed visual information,
determining a regression function correlating the at least one calculated further visual quality with the at least one set further visual quality,
determining the visual quality to be determined by employing a second visual quality metric on the first processed visual information to obtain a calculated visual quality and determining the visual quality to be determined by employing the regression function on the calculated visual quality.
2. Method according to the preceding claim, characterized in that the processing comprises compressing and/or transmitting of visual information.
3. Method according to one of the preceding claims, characterized in that the visual information is at least a part of an image or a part of a video or an image or a video.
4. Method according to one of the preceding claims, characterized in that the second visual quality metric used to calculate the visual quality of the first processed visual information is equal to the first visual quality metric that is used to calculate the visual quality of the further processed visual information.
5. Method according to one of the preceding claims, characterized in that the at least one further processed information is generated by processing the first processed visual information.
6. Method according to one of the preceding claims, characterized in that each further processed visual information is generated by processing visual information which is common for all further processed visual information and which is preferably at least a part of a visual information from which the first processed visual information is processed.
7. Method according to one of the preceding claims, characterized in that the set visual quality is the actual visual quality.
8. Method according to one of the preceding claims, characterized in that two further processed visual informations are generated, one of which has a high visual quality and one of which has a low visual quality.
9. Method according to one of the preceding claims, characterized in that the visual quality metric is a peak signal to noise ratio (PSNR) and/or a full reference metric, a reduced reference metric or a no reference metric and/or a metric, which models the human visual system.
10. Method according to one of the preceding claims, characterized in that the regression function is determined by parameters, preferably slope and/or intercept and/or offset.
11. Method according to one of the preceding claims, characterized in that the processing includes transmitting the information over at least one channel, whereby optionally at least some of the first processed visual information and at least some of the further processed visual information are transmitted over different channels.
12. Method according to one of the preceding claims, characterized in that the visual information is at least one feature of at least a part of an image or a video.
13. Method according to the preceding claim, characterized in that the feature is at least one out of the group comprising blur, blocking, noise, details, predictability, edge continuity, motion continuity and colour continuity.
14. Method according to one of claims 12 or 13, characterized in that the visual information is a video, whereby the features are features of each separate frame and the visual information is a mean value of each feature over multiple, preferably over all frames.
15. Method according to one of claims 12 to 14, characterized in that one quality value is generated from the features by summing the values of the features, preferably each multiplied with a weighting factor.
16. Method according to one of claims 12 to 15, characterized in that the features are the same for the first processed visual information as well as for the further processed visual information.
17. Apparatus for determining the visual quality of processed visual information, comprising
- processing means being adapted to process first processed visual information having a visual quality to be determined,
generating means being adapted to generate at least one further processed visual information such that each further processed visual information has different further visual qualities,
- setting means being adapted to set the at least one further visual quality,
- calculating means being adapted to calculate the at least one further visual quality by employing a first visual quality metric on the further processed visual information,
determining means being adapted to determine a regression function correlating the calculated further visual qualities with the set further visual qualities, and
determining means being adapted to determine the visual quality to be determined by employing a second visual quality metric on the first processed visual information to obtain a calculated visual quality and to determine the visual quality to be determined by employing the regression function on the calculated visual quality.
18. Apparatus according to the preceding claim, characterized in that the apparatus is adapted to carry out a method according to one of claims 1 to 16.
PCT/EP2008/005693 2007-07-11 2008-07-11 Method and apparatus for determining the visual quality of processed visual information WO2009007133A2 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
EP07013551.2 2007-07-11
EP07013551 2007-07-11
EP07021460 2007-11-05
EP07021460.6 2007-11-05

Publications (2)

Publication Number Publication Date
WO2009007133A2 true WO2009007133A2 (en) 2009-01-15
WO2009007133A3 WO2009007133A3 (en) 2009-07-16

Family

ID=40229138

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2008/005693 WO2009007133A2 (en) 2007-07-11 2008-07-11 Method and apparatus for determining the visual quality of processed visual information

Country Status (1)

Country Link
WO (1) WO2009007133A2 (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5568400A (en) * 1989-09-01 1996-10-22 Stark; Edward W. Multiplicative signal correction method and apparatus
US20040001633A1 (en) * 2002-06-26 2004-01-01 Koninklijke Philips Electronics N.V. Objective method and system for estimating perceived image and video sharpness

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
DE ANGELIS A ET AL: "Image Quality Assessment: an Overview and some Metrological Considerations" ADVANCED METHODS FOR UNCERTAINTY ESTIMATION IN MEASUREMENT, 2007 IEEE INTERNATIONAL WORKSHOP ON, IEEE, PI, 1 July 2007 (2007-07-01), pages 47-52, XP031152239 ISBN: 978-1-4244-0932-7 *
ROSARIO FEGHALI ET AL: "Video Quality Metric for Bit Rate Control via Joint Adjustment of Quantization and Frame Rate" IEEE TRANSACTIONS ON BROADCASTING, IEEE SERVICE CENTER, PISCATAWAY, NJ, US, vol. 53, no. 1, 1 March 2007 (2007-03-01), pages 441-446, XP011172020 ISSN: 0018-9316 *
SURESH S ET AL: "Image Quality Measurement Using Sparse Extreme Learning Machine Classifier" CONTROL, AUTOMATION, ROBOTICS AND VISION, 2006. ICARCV '06. 9TH INTERNATIONAL CONFERENCE ON, IEEE, PI, 1 December 2006 (2006-12-01), pages 1-6, XP031103420 ISBN: 978-1-4244-0341-7 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011134110A1 (en) * 2010-04-30 2011-11-03 Thomson Licensing Method and apparatus for measuring video quality using at least one semi-supervised learning regressor for mean observer score prediction
US8824783B2 (en) 2010-04-30 2014-09-02 Thomson Licensing Method and apparatus for measuring video quality using at least one semi-supervised learning regressor for mean observer score prediction
EP2833639A1 (en) * 2012-05-22 2015-02-04 Huawei Technologies Co., Ltd. Method and device for evaluating video quality
EP2833639A4 (en) * 2012-05-22 2015-04-22 Huawei Tech Co Ltd Method and device for evaluating video quality
US10045051B2 (en) 2012-05-22 2018-08-07 Huawei Technologies Co., Ltd. Method and apparatus for assessing video quality
CN105264896A (en) * 2014-05-08 2016-01-20 华为终端有限公司 Video quality detection method and device
EP3076674A4 (en) * 2014-05-08 2017-01-25 Huawei Device Co., Ltd. Video quality detection method and device

Also Published As

Publication number Publication date
WO2009007133A3 (en) 2009-07-16

Similar Documents

Publication Publication Date Title
Korhonen Two-level approach for no-reference consumer video quality assessment
Eden No-reference estimation of the coding PSNR for H.264-coded sequences
Winkler Perceptual video quality metrics—A review
US9756323B2 (en) Video quality objective assessment method based on spatiotemporal domain structure
Pinson et al. A new standardized method for objectively measuring video quality
KR100798834B1 (en) Video quality evaluation device, video quality evaluation method, recording medium for recording video quality evaluation program
Yang et al. A novel objective no-reference metric for digital video quality assessment
Thung et al. A survey of image quality measures
Winkler et al. Perceptual video quality and blockiness metrics for multimedia streaming applications
Narwaria et al. Low-complexity video quality assessment using temporal quality variations
You et al. Attention modeling for video quality assessment: Balancing global quality and local quality
EP1525753A1 (en) A method and apparatus for measuring the quality of video data
Feng et al. Saliency inspired full-reference quality metrics for packet-loss-impaired video
US20030039404A1 (en) Image processing
US8855213B2 (en) Restore filter for restoring preprocessed video image
US20040175056A1 (en) Methods and systems for objective measurement of video quality
Keimel et al. No-reference video quality evaluation for high-definition video
Oelbaum et al. Rule-based no-reference video quality evaluation using additionally coded videos
WO2009007133A2 (en) Method and apparatus for determining the visual quality of processed visual information
Oelbaum et al. Building a reduced reference video quality metric with very low overhead using multivariate data analysis
Keimel et al. Improving the prediction accuracy of video quality metrics
Ma et al. Reduced reference video quality assessment based on spatial HVS mutual masking and temporal motion estimation
Oelbaum et al. A reduced reference video quality metric for AVC/H.264
Bosse et al. A perceptually relevant shearlet-based adaptation of the PSNR
Shang et al. A new combined PSNR for objective video quality assessment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08784732

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase in:

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08784732

Country of ref document: EP

Kind code of ref document: A2