CN102055982B

CN102055982B - Coding and decoding methods and devices for three-dimensional video

Info

Publication number: CN102055982B
Application number: CN 201110006090
Authority: CN
Inventors: 唐慧明; 林圣辉; 虞露; 陈珊莎
Original assignee: Zhejiang University ZJU
Current assignee: Hangzhou Hua Yanyun state information technology Co. Ltd.
Priority date: 2011-01-13
Filing date: 2011-01-13
Publication date: 2012-06-27
Anticipated expiration: 2031-01-13
Also published as: CN102055982A

Abstract

The invention discloses a coding method for a three-dimensional video. The method comprises the following steps of: inputting a first frame image which comprises image texture information and depth information at a plurality of different viewpoints at the same time so as to form depth pixel images of the plurality of the viewpoints; selecting a viewpoint which is closest to a center as a main viewpoint and mapping the depth pixel image of each viewpoint onto the main viewpoint; acquiring motion information from the texture information by a motion target detection method, rebuilding all depth pixel points in the mapped depth pixel images by using the depth information and/or the motion information to acquire a background image layer image and one or more foreground image layer images; and coding the background image layer image and the foreground image layer images respectively, wherein the depth information and the texture information are coded respectively. The invention also discloses a decoding method for the three-dimensional video, a coder and a decoder. The coding method is particularly suitable for coding a multi-viewpoint video sequence with a stationary background, can enhance prediction compensation accuracy and decreases code rate on the premise of ensuring subjective quality.

Description

3 D video decoding method and device

Technical field

The invention belongs to compression of digital video coding techniques field, particularly a kind of decoding method efficiently and device to the 3 D video that the static background scene is arranged.

Background technology

In recent years, along with the appearance of various multimedia services, the demand that virtual reality is experienced just constantly increases, and various 3 D video technology are also among constantly developing and applying.The core of 3 D video is and can pictures different information be provided according to the different points of view position, thereby realizes experience on the spot in person.Therefore, multi-view point video is encoded into a key technology for 3 D video service.

What multiple view video coding the earliest adopted is the form of two-way or multi-channel video predictive coding, through the compression of reference between looking and interframe reference implementation code stream.The two-way image of this form offer respectively the user about two, can only provide the 3D vision of a fixed viewpoint to experience.Want to realize free-viewing angle experience with the same manner, just need encode respectively to the multitude of video sequences from each viewpoint, this will produce very large video data, no matter to storage or to transmission, very high requirement arranged all, be difficult to move towards practicality.

Be to solve the free-viewing angle problem, one road video that occurred subsequently encoding adds the form of one tunnel degree of depth, utilizes depth information that main visual point image is mapped on other viewpoint (side is looked), realizes the dynamic generation of each visual angle video content.But,, exist inevitably on the front view picture to be blocked for actual video; And the zones that in the side is looked, reveal because of the visual angle is different, i.e. video content occlusion issue, serious cavity will appear in the other view picture of generation; Promptly do not have content area, can't really satisfy the demand of free-viewing angle.

Present stage mainly adopts multi-channel video to add the form that the multichannel degree of depth (MVD), particularly two-path video add the two-way degree of depth in the world, carries out multiple view video coding.Because the part that in one road video, is blocked generally can reveal in another road, this form has solved occlusion issue to a certain extent.Reference between a very important instrument of multi-channel video compression is looked is exactly removed redundancy through the predictive compensation between looking, and realizes data compression.But owing to reference vector and interframe movement vector between looking can not be predicted each other, and difference vector is often bigger, has influenced code efficiency; Simultaneously, because the light problem of actual scene, the imaging difference that difference is looked, and the skew of the object of non-whole macro block, the efficient of prediction can't be very high between looking.

On the basis of MVD, PHILIPS Co. has proposed the expression way of depth of seam division video again, and multi-channel video all is mapped on the viewpoint, generates a complete foreground picture layer and a Background From Layer that contains side information, encodes again.Compare with MVD, this method has been abandoned main other objective quality looked of looking outer, when guaranteeing good subjective quality, has significantly reduced code check, and looks in virtual generation the quality of not failing in MVD is arranged.But, limited by angular field of view, the view edge of generation generally has cavity in blocks to need to fill; Simultaneously, even the background of scene does not change, because foreground moving causes depth map to change, can make all has no small interframe to change on scope and content as the Background From Layer that replenishes, and the predictive coding that this has just influenced Background From Layer has reduced compression efficiency.

The applicant discloses a kind of video coding, coding/decoding method and device that decomposes based on the figure layer in one Chinese patent application 201010039620.8, improved the code efficiency of ordinary video signal effectively.But this patent is not considered the coding characteristic of 3 D video, can not be applicable to 3 d video encoding well.

At present, the multiple view video coding technology is also among constantly studying and using, and multiple view video coding has also got into the monitoring field.To the particularity of monitor video, be generally long-term static scene or the like like background, the compression efficiency of multiple view video coding can also have bigger room for promotion.

Summary of the invention

In view of this, the object of the present invention is to provide a kind of 3 D video decoding method and device, be used to improve the coding and the compression efficiency of 3 D video, especially for the encoding compression efficient that promotes the 3 D monitoring video.

Embodiments of the invention provide a kind of coding method of 3 D video, comprising:

Import a two field picture, a said two field picture comprises the image texture information and the depth information of a plurality of different points of view of synchronization, constitutes the degree of depth pixel map of a plurality of viewpoints;

The degree of depth pixel map of each viewpoint is mapped on the main viewpoint, and the image size of main viewpoint view is expanded, the selection of said main viewpoint is according to making an appointment;

From said texture information, obtain movable information through moving target detecting method; All degree of depth pixels are redeveloped into a Background From Layer image and one or more foreground picture tomographic images in the degree of depth pixel map after utilizing depth information and/or movable information to shine upon, and the Background From Layer image is carried out the time domain cumulative operation;

Said Background From Layer image and foreground picture tomographic image are encoded respectively, and wherein depth information and texture information are encoded respectively, and in coding, realize the mutual reference of figure interlayer.

Embodiments of the invention also provide a kind of 3 D video coding/decoding method, comprising:

Obtain input code flow to be decoded;

Said input code flow is resolved and the entropy decoding, obtain coded message;

Successively Background From Layer image and foreground picture tomographic image are decoded, obtain the reconstructed image of each figure layer, and generate the reference picture of each figure layer;

Reconstructed image and camera parameters according to said each figure layer generate the output image of specifying viewpoint.

Embodiments of the invention also provide a kind of code device of 3 D video, comprising:

The video input unit is used to realize collection of video signal or reads in;

Figure layer resolving cell is used for the picture breakdown of input video is become foreground picture tomographic image and Background From Layer image;

Figure layer coding unit is used for based on the reconstructed image of each figure layer said foreground picture tomographic image and Background From Layer image being encoded;

A figure layer reconstructed image generation unit is used to generate the reconstructed image of a view picture of corresponding diagram layer;

Code stream forms the unit, is used for synthetic grammatical code stream of data set that figure layer coding unit generated;

The code stream output unit is used to realize the output to said code stream.

Embodiments of the invention also provide a kind of 3 D video decoding device, comprising:

The code stream input unit is used for reading in video flowing from hard disk;

The code stream analyzing unit is used for the video code flow of input is resolved, and isolates Background From Layer code stream and foreground picture layer bit stream;

Figure layer decoder unit is used for based on the reconstructed image of each figure layer Background From Layer image and foreground picture tomographic image being decoded;

Scheme layer reconstructed image generation unit, be used to generate the reconstructed image of corresponding diagram layer;

The visual point image generation unit is used for reconstructed image and camera parameters according to each figure layer, generates the output image of specifying viewpoint;

The image output unit is used for reconstructed image is sent to output interface output.

Because certain point in the actual scene possibly observed by a plurality of visual angles, so has a large amount of duplicate messages between the different points of view sequence.The image mapped of many viewpoints on same main viewpoint, can be found out these duplicate messages, and can in figure layer reconstruction operation, remove between this looking redundant, improve the efficient of 3 d video encoding and compression through the pixel union operation.Simultaneously,, can solve the in blocks empty problem that generates the view edge in the tradition figure layer coding, improve the quality of decoding output image the expansion of main visual point image scope.

Secondly, suitable figure layer is cut apart, and has particularly utilized the figure layer of movable information to cut apart, and can guarantee that the pixel on same degree of depth level and the same target is encoded in same image sets, thereby improves forecasting efficiency, helps compressed encoding.

Further, the present invention has also carried out the time domain cumulative operation to Background From Layer.Through the synthetic background image that obtains of time domain cumulative operation, will bring in constant renewal in along with input picture, both utilized the information accumulation on the time domain, utilized the visual angle information of many viewpoints again, thereby set up background model comparatively accurate.Use a kind of so synthetic background as reference picture,, can reduce the information redundancy on the time domain better, particularly when the reciprocating motion of foreground moving target, can reach sizable coding gain using under the less situation with reference to buffer memory.To the 3 D monitoring video, its scene is general long-term static, and the time domain accumulation method of this Background From Layer will be particularly effective.

Under desirable figure layer decomposition situation, the information of different figure layers is fully independently.But, in concrete the realization, be difficult to accomplish that accurate figure layer cuts apart, particularly the part at the target edge possibly have the pixel that should belong to other figure layer.The inaccurate of depth map edge will cause this problem, and this situation is very common.For addressing this problem, the present invention has used the mutual reference of figure interlayer.Interlayer is with reference to comprising interlayer inter-frame predicated reference and the reference of interlayer inter prediction; Can reduce requirement to figure layer segmentation precision; Reduce the dependence of compression performance, thereby improve precision of prediction and compression efficiency, and help to promote quality of reconstructed images figure layer segmentation precision.

Description of drawings

The flow chart of a kind of 3 d video encoding method that Fig. 1 provides for the embodiment of the invention;

Fig. 2 is a method flow diagram of rebuilding Background From Layer image and foreground picture tomographic image in the embodiment of the invention;

The flow chart of a kind of 3 D video coding/decoding method that Fig. 3 provides for the embodiment of the invention;

The realization schematic diagram of a kind of encoder that Fig. 4 provides for the embodiment of the invention;

The structure chart of a kind of 3 d video encoding device that Fig. 5 provides for the embodiment of the invention;

The realization schematic diagram of a kind of decoder that Fig. 6 provides for the embodiment of the invention;

The structure chart of a kind of 3 D video decoding device that Fig. 7 provides for the embodiment of the invention.

Embodiment

The embodiment of the invention utilizes image mapped comprehensive with the figure layer, under the prerequisite that does not influence the image subjective quality, realizes the minimizing significantly of image data amount.Simultaneously, utilize the reference of figure interlayer and the synthetic further compression that realizes code stream of time domain of Background From Layer, be particularly suitable for the fixing this application scenarios that has stationary background of video clips, video conference, 3 D monitoring of video camera.

For making the object of the invention, technical scheme and advantage clearer, the present invention is made further detailed description below in conjunction with accompanying drawing.

The flow chart of a kind of 3 d video encoding method that provides for the embodiment of the invention shown in Figure 1, this method comprises:

Step 101, through video acquisition, communicate by letter or the mode of reading disk (or other storage medium) is imported a two field picture.A two field picture described here comprises the image texture information and the depth information of a plurality of different points of view of synchronization; Show as the degree of depth pixel map of a plurality of viewpoints; The civilian said pixel in back is degree of depth pixel, and each degree of depth pixel comprises its depth information and texture information.If the input multi-view point video does not contain depth information, then adopt depth extraction algorithm computation depth information, construct degree of depth pixel map.

Step 102, to select the viewpoint near the center be main viewpoint, utilizes camera parameters to look a mapping algorithm degree of depth pixel map of each viewpoint is mapped on the main viewpoint.The expanded in size of main viewpoint is for can comprise all pixels.Describedly look the method that a mapping algorithm can use the solid space mapping matrix.Agreement is main viewpoint near the viewpoint at center, is of value to map operation.

Step 103, all degree of depth pixels after utilizing depth information and/or movable information to shine upon are redeveloped into a Background From Layer degree of depth pixel image (abbreviating the Background From Layer image as) and one or more foreground picture layer depth pixel image (abbreviating the foreground picture tomographic image as).Depth information is carried by degree of depth pixel.Movable information is obtained from texture information by moving target detecting method.The method of rebuilding is: all degree of depth pixels that will be mapped to main viewpoint are assigned to each figure layer by operation as shown in Figure 3 with pixel then as a set of pixels to be allocated:

Step 1031, merging pixel.To each location of pixels, with depth value difference in setting threshold TH1 and the pixel of texture value difference in setting threshold TH2 merge into a pixel, the pixel value of new pixel is the average of original pixel value, perhaps gets intermediate value.Two kinds of threshold values can be made as corresponding peaked 1%～5%.Because certain point in the practical application scene possibly observed by a plurality of visual angles, so has a large amount of duplicate messages between the different points of view sequence.This pixel union operation can be removed between this looking redundant.

Step 1032, Background From Layer image are set up.To each location of pixels; Take out the darkest pixel P of the degree of depth in the set of pixels to be allocated; If have a plurality of; The average pixel value of then getting them constitutes pixel P, compares with the pixel of former Background From Layer image: if the degree of depth of pixel P darker (greatly) is then replaced this Background From Layer image pixel value with the pixel value of pixel P; If the degree of depth is identical, then replace the Background From Layer image pixel value with after the pixel value of pixel P and the former Background From Layer image pixel value weighted average; If the degree of depth more shallow (little) is then put back to set of pixels to be allocated with pixel P.The generation that in fact this operation is exactly Background From Layer, renewal operation are upgraded Background From Layer through depth information, generate at the figure tomographic image and have realized the time domain cumulative operation.Here the identical difference that is meant depth value or its equivalent parallax value of the degree of depth is got TH1 here less than a specified threshold value.

Step 1033, foreground picture tomographic image are set up.To each location of pixels, take out the pixel value of the pixel of degree of depth minimum in the set of pixels to be allocated, if this location of pixels has not had pixel value, then put sky or be changed to this location context figure layer pixel value, constitute the first foreground picture tomographic image.

Step 1034, if also have pixel in the set of pixels to be allocated, then repeated execution of steps 1033, generate second, third tomographic image such as foreground picture such as grade, until distributing all pixels.As a pixel set to be allocated, in fact figure layer reorganization operation is exactly that a pixel is distributed or the process of pixel mapping with residual pixel o'clock.

In step 1033 and 1034, can utilize movable information that pixel is carried out Object Segmentation, be that unit takes out pixel value again with the object from set of pixels to be allocated, with the pixel that guarantees same target in same figure layer, raising predictive compensation precision.Cutting apart of said object, can use various traditional image detection algorithms to detect moving target, like mixed Gauss model method, frame difference method, optical flow method etc., carry out the target zone correction through depth information again.The degree of depth of considering same target often has continuity; This correction operations is meant; To near the pixel the target edge; Remove the point that surpasses threshold value TH1 with adjacent picture point depth difference, add and the point of adjacent picture point depth difference, make not occur the cavity when utilizing each figure layer pixel value and depth value thereof to rebuild each visual point image less than threshold value TH1.Simultaneously, the continuous one-tenth panel region of still not moving of the degree of depth is also as foreground target.

Step 104, each figure tomographic image is sent into corresponding figure tomographic image group respectively and encoded, wherein depth information and texture information are encoded respectively.Background From Layer is as time domain cumulative chart layer, and the foreground picture layer is as non-time domain cumulative chart layer.Each schemes tomographic image by encoding with the coded sequence of decoder agreement, and this order will influence interlayer reference in the frame, and promptly the figure layer of coding can't be with reference to the figure layer of back coding earlier.Present embodiment elder generation coded time domain cumulative chart layer, the non-time domain cumulative chart layer of encoding again to the figure layer of same nature, is encoded by the order of (degree of depth is descending) from deep to shallow.Encoder adopts traditional block-based predictive coding device, and present embodiment adopts AVS, and adds the part grammar element according to needs of the present invention, and the coding flow process is changed slightly, realizes the coding of multi-view point video sequence.

The interpolation of syntactic element comprises:

1, concentrates in former sequential parameter, add camera parameters;

2, in the original image parameter set, add elements such as layer count, Background image width height;

3, in former macro block data, add the reference layer numbering.

The modification of coding flow process comprises:

1, to each two field picture, begin from the Background From Layer image, each figure tomographic image is all done the encoding operation that is equivalent to the original encoding device, generate the code stream of chip level and macro-block level, and the reconstructed image (reference frame) of each figure layer is done block elimination filtering and interlayer filtering operation.

2,, select long image sets (comprising), thereby reduce code check than multiframe because the correlation between the Background From Layer picture frame is stronger.Usually, the Background From Layer image sets preestablishes image group structure and length, but when Background From Layer content changings such as camera switching, can form new image sets immediately.To the long-term static 3 D monitoring application scenarios of background, Background From Layer can be only in refreshed image frame place coding transmission immediately, can under the prerequisite that guarantees subjective quality, significantly reduce code check like this.

3, during coded macroblocks; Reference frame except from this figure layer with reference to selecting (realize in the layer in the infra-frame prediction and layer an inter prediction) the formation; Can also from the reference formation of other figure layer, select, to realize that interlayer is with reference to (comprising interlayer infra-frame prediction and interlayer inter prediction).When Background From Layer image during only at refreshed image frame place coding immediately, the reference picture that Background From Layer generates only is used for the interlayer reference of foreground picture layer.At this moment, the Background From Layer reference picture is time domain accumulation reference picture, and it is by the reconstructed image initialization of refreshed image frame background image immediately, and is brought in constant renewal in by the foreground picture tomographic image.The renewal here is meant, on a location of pixels, if the difference of the degree of depth of foreground picture tomographic image and Background From Layer image in certain threshold range, is then carried out weighted average with the foreground pixel value to background pixel value and upgraded.Here threshold value adopts TH1.

4, when each figure layer is encoded, the zone that does not belong to this figure layer is not done coding or adopted the skip mode coding.For example; With the figure layer rectangular area of original position, length and wide definition, be unit with pixel or macro block, the time to figure layer coding; Image block to not belonging to this figure layer can not done coding; Rebuild pixel value and obtain, or directly adopt the skip mode coding, rebuild pixel value and can select to obtain or obtain through the skip mode decoding from other figure layer from other figure layer.

5, when the reference frame of generation background figure layer, to each location of pixels, if when there is the reconstruction pixel in this figure layer reconstructed image correspondence position, value is this weighted average of rebuilding pixel value and last reference frame correspondence position pixel value; Otherwise value is the pixel value of last reference frame correspondence position.Weighted average operation has realized the recursive filtering of reference frame, and weight parameter has realized the time domain cumulative operation by preestablishing in the generation of reference frame.

When 6, generating the reference frame of each foreground picture layer, to each location of pixels, if when there is the reconstruction pixel in this figure layer reconstructed image correspondence position, value is that this rebuilds pixel value; Otherwise value is the pixel value of Background From Layer correspondence position.

7, figure layer depth information model coding.The same object surfaces degree of depth generally has regularity, can carry out depth modelling, can use areal model approximate under a lot of situation, thereby realizes more efficient depth coding.When using this pattern, need to add the modelling parameter in the code stream element.

Coding is exported formed code stream after accomplishing.As: store disk, through the network transmission etc., or be used for and code stream multiple connection such as audio frequency, form system flow.

Above coding method is through the method for solid space mapping; Utilize depth information that a plurality of sequences of 3 D video are mapped on the viewpoint; And after the image detection all data recombination are several figure layers in depth ratio, last, each figure sequence of layer is encoded respectively.

The difference of other coding method in this coding method and the background technology also is that the key of its compression efficiency is: the multi-view point video sequence is reassembled as a plurality of figure layers on the viewpoint, reduces data volume; Each schemes interlayer reference each other; Figure layer can cumulative information on time domain; Simultaneously, each figure layer can dynamically enlarge angular field of view, looks the generation quality to improve void.

Use the figure layer of time domain cumulative operation, be referred to as time domain cumulative chart layer; Do not use the figure layer of time domain cumulative operation, be referred to as non-time domain cumulative chart layer.For the generation of reference frame, the time domain cumulative operation is meant that this code pattern layer has time domain accumulation reference frame, and this reference frame long-term existence and is upgraded by the reconstructed image that decoding obtains in reference to formation.The accumulation of figure layer information on time domain is mainly used in the Background From Layer that background image constitutes.Concerning Background From Layer, the time domain cumulative operation is meant that specifically codec preserves a width of cloth background image, and this image is realized the time domain accumulation along with input picture is brought in constant renewal in.Use a kind of so synthetic background as reference picture,, can reduce the information redundancy on the time domain better using under the less situation with reference to buffer memory.Particularly when the reciprocating motion of foreground moving target, can reach sizable coding gain.In addition; Synthetic background had both been utilized the information accumulation on the time domain, had utilized the visual angle information of many viewpoints again, can set up background model comparatively accurate; Thereby can generate the image of other viewpoint better, even might generate the visual point image outside the viewpoint scope of source.

Based on the 3 d video encoding method that the embodiment of the invention provides, its corresponding coding/decoding method is as shown in Figure 3, may further comprise the steps:

Step 301, obtain input code flow through reading disk or communications reception.

Step 302, input code flow is resolved and the entropy decoding, obtain necessary coded message,, current to be decoded image block high like layer count, Background image width adopted which kind of prediction mode etc.

Step 303, successively All Layers is decoded, obtain the reconstructed image of each figure layer, and generate the reference picture of each figure layer.Coding/decoding method of each figure layer is analogous to traditional block-based prediction decoding device, as H.264, AVS.Present embodiment uses AVS to decode, and needs according to the present invention are changed the decoding of realization multi-view point video sequence slightly to decoding process.

Except resolving the syntactic element that adds in the corresponding coding flow process, the modification of decoding process also comprises:

1, during decoded macroblock, reference frame except from this figure layer with reference to selecting the formation, can also from the reference formation of other figure layer, select, determine by the reference layer in the syntactic element.Corresponding with encoder, predictive mode comprises inter prediction, interlayer infra-frame prediction and interlayer inter prediction in a layer interior infra-frame prediction, the layer.

2, Background From Layer is as time domain cumulative chart layer; Its reference picture create-rule is: to each location of pixels; When there is the reconstruction pixel in this figure layer reconstructed image correspondence position; Value is rebuild the weighted average of pixel value and last reference frame correspondence position pixel value for this, otherwise value is the pixel value of last reference frame correspondence position.

3, the foreground picture layer is as non-time domain cumulative chart layer, and its reference picture create-rule is: to each location of pixels, if when there is the reconstruction pixel in this figure layer reconstructed image correspondence position, value is that this rebuilds pixel value; Otherwise value is the pixel value of Background From Layer correspondence position.

Step 304, according to the reconstructed image and the camera parameters of each figure layer, generate the output image of specifying viewpoint.Said generation specifies the method for viewpoint output image to be: utilize camera parameters, adopt and look a mapping algorithm, each figure tomographic image is mapped to the appointment viewpoint, then the minimum pixel value of selected depth on each location of pixels.

Fig. 4 is the realization schematic diagram of a kind of encoder of providing of the embodiment of the invention.The realization of whole encoder structure is compared with conventional codec; Main difference is to have increased image mapped and decomposes with the figure layer; Wherein, it is to utilize depth information and reference background frame that the figure layer decomposes, and picture breakdown to be encoded is become foreground picture layer and Background From Layer; Rationally the exploded view layer can improve code efficiency, adopts foregoing figure layer decomposition algorithm in the present embodiment; In addition; Also the coding unit of simple sequence is expanded for multi-layer image coding unit (like foreground picture layer coding unit and the Background From Layer coding unit among the figure); And in foreground picture layer coding, adopted the interlayer reference model,, adopted the Background From Layer reference frame time domain cumulative operation when generating.

Method and conventional codec basically identical that the Background From Layer coding adopts; Comprise infra-frame prediction, inter prediction, residual computations, change quantization, inverse quantization inverse transformation, block elimination filtering etc.; Just increased a time domain cumulative operation in addition; Utilize Background From Layer reconstructed image and former Background From Layer reference picture, generate new Background From Layer reference picture.The renewal principle is if the degree of depth is different, then to get the big pixel value of depth value; If the degree of depth is identical, then get weighted average.

Foreground picture layer coding compared conventional codec, increased interlayer infra-frame prediction and interlayer inter prediction, promptly can adopt Background From Layer reconstructed image image for referencial use.Simultaneously,, do not having the location of pixels of rebuilding pixel value, filling by the pixel value of Background From Layer image respective pixel position because the foreground picture tomographic image not necessarily all has information (promptly the cavity might be arranged) at all location of pixels.

Reference picture shown in Fig. 4, subscript n are represented current newly-generated reference picture, the previous reference picture of subscript n-1 expression.

Data after data after the Background From Layer transcoding, coding transform quantizes quantize with foreground picture layer transcoding, coding transform will be respectively by entropy coding, and the note rules then form output code flow, is used for transmission or stores.

Based on above-mentioned realization principle; The structure chart of the 3 d video encoding device that Fig. 5 provides for the embodiment of the invention has video input unit, a figure layer resolving cell, a figure layer coding unit, a figure layer reconstructed image generation unit, code stream and forms unit and code stream output unit.

The video input unit realizes collection of video signal or reads in, like the collection of collection, CCD or the cmos sensor output signal of analog video signal, reading images, the code stream that will compress are decoded and obtained digital video signal, obtain digital video signal etc. from network or other interface from storage device.The multi-view point video input unit comprises the texture signal collecting device of at least two viewpoints, and corresponding depth calculation unit or depth information acquisition hardware equipment.The depth calculation unit adopts the method for three-dimensional coupling to calculate the degree of depth of each pixel, and both available hardware realized, also available software realizes.Depth information acquisition hardware equipment can adopt TOF (flight time) method, and the degree of depth that can obtain than calculate is degree of depth result more accurately.

Figure layer resolving cell becomes foreground picture tomographic image and Background From Layer image with the picture breakdown in the input video, and the Background From Layer image is a time domain cumulative chart layer, and the foreground picture tomographic image is non-time domain cumulative chart layer, and decomposition method as previously mentioned.

Figure layer coding unit is used for foreground picture tomographic image and Background From Layer image are encoded; Comprise foreground picture layer coding unit and Background From Layer coding unit, the two includes infra-frame prediction, inter prediction, model selection, be used for that subtracter, change quantization, entropy coding, inverse transformation inverse quantization, the adder that is used for reconstructed image, reference picture that prediction residual calculates are synthetic elementary cell such as deposits with block elimination filtering, reference frame.Wherein the reference frame in the Background From Layer coding unit is deposited and is used to deposit the Background From Layer reconstructed image, comprises time domain accumulation reference picture, and the reference frame in the foreground picture layer coding unit is deposited and is used to deposit foreground picture layer reconstructed image.The Background From Layer coding unit only uses the image of Background From Layer reference frame in depositing when doing inter prediction, and the image that foreground picture layer coding unit can use the Background From Layer reference frame to deposit in depositing with foreground picture layer reference frame has been realized the interlayer reference prediction.The video coding of each figure layer coding unit and general video encoder, as H.264, unanimity such as AVS.

A figure layer reconstructed image generation unit is used to generate the reconstructed image of a view picture of corresponding diagram layer, deposits through depositing separately frame behind the block elimination filtering in, and the inter prediction reference of coded frame being used for after, the existing in front introduction of generation method is repeated no more here.

Code stream forms the unit and is used for the data set from Background From Layer coding unit and foreground picture layer coding unit is synthesized a grammatical code stream.

The code stream output unit is realized the output to code stream, can export through communication interface.

Fig. 6 is the realization schematic diagram of a kind of decoder of providing of the embodiment of the invention.Compare with conventional decoder; Main difference is to have increased the figure layer bit stream and decomposes; And with simple sequence decoding expansion for multi-layer image decoding (decoding) like foreground picture layer decoder among the figure and Background From Layer; And in the foreground picture layer decoder, added inter-layer prediction mode,, adopted the Background From Layer reference frame time domain accumulation when generating.

The Background From Layer decoding is consistent with conventional decoder; The decoded residual error data of entropy is recovered out through the inverse transformation inverse quantization; And according to predictive mode selection and motion vector information; Select that identical prediction mode obtains its motion compensation prediction value when encoding, predicted value and residual error addition are just obtained the reconstructed image of Background From Layer.Increase the time domain cumulative operation in addition, utilized Background From Layer reconstructed image and former Background From Layer reference picture, generated new Background From Layer reference picture.The renewal principle is if the degree of depth is different, then to get the big pixel value of depth value; If the degree of depth is identical, then get weighted average.

The foreground picture layer decoder has increased multiple prediction, is corresponding with encoder.The prediction details of concrete each unit is consistent with encoder section, and the data decode flow process also is consistent with conventional decoder.

It is reconstructed image and the camera parameters according to each figure layer that visual point image generates, and generates the output image of specifying viewpoint.Generate and specify the method such as the preamble of viewpoint output image said.

Based on above-mentioned realization principle; The structure chart of the 3 D video decoding device that Fig. 7 provides for the embodiment of the invention has code stream input unit, code stream analyzing unit, figure layer decoder unit, a figure layer reconstructed image generation unit, visual point image generation unit and decoded picture output unit.

The code stream input unit is the input interface that reads in video flowing from hard disk, or the Ethernet interface of receiver, video stream, or other video input communication interface.If be input as the multimedia data stream that comprises audio frequency, system, then the code stream input unit is isolated video flowing wherein through separating multiple connection.

The code stream analyzing unit is used for according to syntax rule the video code flow of input being resolved, and comprises the entropy decoding and goes multiple connection etc., isolates Background From Layer code stream and foreground picture layer bit stream, imports Background From Layer decoding unit and foreground picture layer decoder unit respectively.

Figure layer decoder unit is used for Background From Layer image and foreground picture tomographic image are decoded; Comprise Background From Layer decoding unit and foreground picture layer decoder unit; The two includes infra-frame prediction, inter prediction, inverse transformation inverse quantization, elementary cells such as reference picture is synthetic and block elimination filtering, reference frame are deposited, model selection, and this is consistent with conventional Video Decoder.Wherein the reference frame of Background From Layer decoding unit is chosen in the reference frame of Background From Layer is deposited, and the reference frame of foreground picture layer decoder unit is deposited at the reference frame of foreground picture layer and chosen in depositing with the reference frame of Background From Layer, is determined by the reference layer in the syntactic element.These frames are deposited and are deposited former frame or a few frame decoding reconstructed image of corresponding diagram layer behind block elimination filtering respectively, are used for interframe prediction decoding.

A figure layer reconstructed image generation unit is used to generate the reconstructed image of corresponding diagram layer, deposits through depositing separately frame behind the block elimination filtering in, and the inter prediction reference of coded frame being used for after, the existing in front introduction of generation method is repeated no more here.

The visual point image generation unit is used for reconstructed image and the camera parameters according to each figure layer, generates the output image of specifying viewpoint.Generate and specify the method such as the preamble of viewpoint output image said.

The image output unit realizes through reconstructed image being sent to output interface, as writes file, sends to various communication networks or export through various demonstration output interfaces.

On AVS codec rm09.02 version, realize the foregoing description, adopt cycle tests BookArrival_man, main viewpoint chosen position 10; Code allocation is following: coding structure is IPPP ..., I interframe is divided into 20 frames, and cycle tests length is 100 frames; Open RDO, fixedly QP.In addition, with realizing the depth of seam division video coding on the AVS codec rm09.02 version, code allocation is constant, and coding result and the present invention are compared.

The test result that obtains shows; Compare and use the depth of seam division video coding to realize multi-vision-point encoding; Coding bit rate output of the present invention has reduced more than 35%; The objective quality (PSNR) of the reconstructed image on viewpoint 10 and subjective quality and depth of seam division video coding are basic identical simultaneously, and the objective quality (PSNR) and the subjective quality of the generation image on viewpoint 8 slightly are better than the depth of seam division video coding.

In a word, the above is merely preferred embodiment of the present invention, is not to be used to limit protection scope of the present invention.

Claims

1. the coding method of a 3 D video is characterized in that, comprising:

Said Background From Layer image and foreground picture tomographic image are encoded respectively, and wherein depth information and texture information are encoded respectively, and in coding, realize the mutual reference of figure interlayer;

Wherein,

Said time domain cumulative operation is: when generation of figure tomographic image and reference picture generation, and the information of frame before using;

The method of said reconstruction Background From Layer image and foreground picture tomographic image comprises: all degree of depth pixels that will be mapped to main viewpoint are as a set of pixels to be allocated, and

Merge pixel, depth value difference and the pixel of texture value difference in predetermined threshold value are merged into a pixel, the pixel value of new pixel is the average of original pixel value;

Set up the Background From Layer image;

Set up the foreground picture tomographic image;

If also have pixel in the said set of pixels to be allocated, then repeat last step, until distributing all pixels;

The said method of setting up the Background From Layer image comprises:

To each pixel, take out the darkest pixel of the degree of depth in the said set of pixels to be allocated, compare with the pixel of former Background From Layer image;

If the degree of depth of the darkest pixel of the said degree of depth is darker, then with this Background From Layer image pixel value of pixel value replacement of this pixel; If the degree of depth is identical, then the said degree of depth is replaced the Background From Layer image pixel value after pixel value and the former Background From Layer image pixel value weighted average of dark pixel; If the degree of depth of the darkest pixel of the said degree of depth is more shallow, then the darkest pixel of the said degree of depth is put back to set of pixels to be allocated;

The said method of setting up the foreground picture tomographic image comprises:

To each pixel, take out the pixel value of the pixel of degree of depth minimum in the said set of pixels to be allocated, if this pixel has not had pixel value, then put sky or be changed to this location context figure layer pixel value, constitute the foreground picture tomographic image.

2. coding method according to claim 1 is characterized in that, the said method of setting up the foreground picture tomographic image further comprises:

Utilizing said movable information that all pixels in the set of pixels to be allocated are carried out Object Segmentation, and utilize depth information to carry out the object range correction, is that unit takes out pixel value again with the object from set of pixels to be allocated.

3. coding method according to claim 1 is characterized in that, said coding method is not done coding or adopted the skip mode coding the zone that does not belong to this figure layer when each figure layer is encoded; And

When the reference frame of generation background figure layer, to each location of pixels, if when there is the reconstruction pixel in this figure layer reconstructed image correspondence position, value is this weighted average of rebuilding pixel value and last reference frame correspondence position pixel value; Otherwise value is the pixel value of last reference frame correspondence position;

When generating the reference frame of each foreground picture layer, to each location of pixels, if when there is the reconstruction pixel in this figure layer reconstructed image correspondence position, value is rebuild pixel value for this; Otherwise value is the pixel value of Background From Layer reference frame correspondence position.

4. the 3 D video coding/decoding method based on the said coding method of claim 1 is characterized in that, comprising:

Obtain input code flow to be decoded;

5. the code device of a 3 D video is characterized in that, comprising:

The video input unit is used to realize collection of video signal or reads in that a two field picture of said input comprises the image texture information and the depth information of a plurality of different points of view of synchronization, constitutes the degree of depth pixel map of a plurality of viewpoints;

Figure layer resolving cell is used for the degree of depth pixel map of each viewpoint is mapped to main viewpoint, and the image size of main viewpoint view expanded, and the selection of said main viewpoint is according to making an appointment; With all degree of depth pixels that are mapped to main viewpoint as a set of pixels to be allocated; And merging pixel; Depth value difference and the pixel of texture value difference in predetermined threshold value are merged into a pixel, and the pixel value of new pixel is the average of original pixel value; Set up Background From Layer image and foreground picture tomographic image,, then repeat and set up Background From Layer image and the operation of foreground picture tomographic image, until distributing all pixels if also have pixel in the said set of pixels to be allocated;

The said Background From Layer image of setting up comprises and takes out the darkest pixel of the degree of depth in the said set of pixels to be allocated to each pixel, compares with the pixel of former Background From Layer image; If the degree of depth of the darkest pixel of the said degree of depth is darker, then with this Background From Layer image pixel value of pixel value replacement of this pixel; If the degree of depth is identical, then the said degree of depth is replaced the Background From Layer image pixel value after pixel value and the former Background From Layer image pixel value weighted average of dark pixel; If the degree of depth of the darkest pixel of the said degree of depth is more shallow, then the darkest pixel of the said degree of depth is put back to set of pixels to be allocated;

The said foreground picture tomographic image of setting up comprises to each pixel, takes out the pixel value of the pixel of degree of depth minimum in the said set of pixels to be allocated, if this pixel has not had pixel value, then puts sky or is changed to this location context figure layer pixel value, constitutes the foreground picture tomographic image;

The code stream output unit is used to realize the output to said code stream.

6. code device according to claim 5 is characterized in that, said figure layer coding unit comprises:

Foreground picture layer coding unit; Be used to realize infra-frame prediction, inter prediction, model selection, be used for that subtracter, change quantization, entropy coding, inverse transformation inverse quantization, the adder that is used for reconstructed image, reference picture that prediction residual calculates are synthetic to be deposited with block elimination filtering, reference frame, said reference frame is deposited and is used to deposit foreground picture layer reconstructed image;

The Background From Layer coding unit; Be used to realize infra-frame prediction, inter prediction, model selection, be used for that subtracter, change quantization, entropy coding, inverse transformation inverse quantization, the adder that is used for reconstructed image, reference picture that prediction residual calculates are synthetic to be deposited with block elimination filtering, reference frame, said reference frame is deposited and is used to deposit the Background From Layer reconstructed image.

7. the 3 D video decoding device based on the said code device of claim 5 is characterized in that, comprising:

The code stream input unit is used for reading in video flowing from hard disk;