US9602814B2 - Methods and apparatus for sampling-based super resolution video encoding and decoding - Google Patents

Methods and apparatus for sampling-based super resolution video encoding and decoding

Info

Publication number
US9602814B2
Authority
US
United States
Prior art keywords
high resolution
resolution pictures
pictures
sampling
metadata
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US13/574,428
Other versions
US20120294369A1 (en)
Inventor
Sitaram Bhagavathy
Joan Llach
Dong-Qing Zhang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
InterDigital Madison Patent Holdings SAS
Original Assignee
Thomson Licensing SAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Thomson Licensing SAS
Priority to US13/574,428
Assigned to THOMSON LICENSING. Assignment of assignors interest (see document for details). Assignors: BHAGAVATHY, SITARAM; LLACH, JOAN; ZHANG, DONG-QING
Publication of US20120294369A1
Assigned to THOMSON LICENSING DTV. Assignment of assignors interest (see document for details). Assignors: THOMSON LICENSING
Application granted
Publication of US9602814B2
Assigned to INTERDIGITAL MADISON PATENT HOLDINGS. Assignment of assignors interest (see document for details). Assignors: THOMSON LICENSING DTV
Legal status: Active
Adjusted expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/577 Motion compensation with bidirectional frame interpolation, i.e. using B-pictures
    • H04N19/00721
    • H04N19/00072
    • H04N19/00078
    • H04N19/00127
    • H04N19/00145
    • H04N19/00266
    • H04N19/00387
    • H04N19/00436
    • H04N19/00545
    • H04N19/00593
    • H04N19/00715
    • H04N19/00745
    • H04N19/00757
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103 Selection of coding mode or of prediction mode
    • H04N19/11 Selection of coding mode or of prediction mode among a plurality of spatial predictive coding modes
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/119 Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/12 Selection from among a plurality of transforms or standards, e.g. selection between discrete cosine transform [DCT] and sub-band transform or selection between H.263 and H.264
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/132 Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136 Incoming video signal characteristics or properties
    • H04N19/137 Motion inside a coding unit, e.g. average field, frame or block difference
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/172 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/20 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/30 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • H04N19/33 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability in the spatial domain
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/46 Embedding additional information in the video signal during the compression process
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/527 Global motion vector estimation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/53 Multi-resolution motion estimation; Hierarchical motion estimation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/537 Motion estimation other than block-based
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/573 Motion compensation with multiple frame prediction using two or more reference frames in a given prediction direction
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/59 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/85 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression

Definitions

  • the present principles relate generally to video encoding and decoding and, more particularly, to methods and apparatus for sampling-based super resolution video encoding and decoding.
  • a video compression approach using super resolution was proposed in a first prior art approach.
  • the spatial size of the input video is reduced to a certain predetermined low resolution (LR) size before encoding.
  • the low resolution video is up-scaled to the original size using a super resolution method along with some side information (metadata) transmitted with the bitstream.
  • the metadata includes a block-based segmentation of frames where each block is labeled as moving, non-moving flat, and non-moving textured.
  • Non-moving flat blocks are up-scaled by spatial interpolation.
  • motion vectors are sent to the receiver where a super resolution technique is applied in order to recover sub-pixel information.
  • a jittered down-sampling strategy is used wherein four complementary down-sampling grids are applied in rotating order.
  • the aforementioned first prior art approach disadvantageously does not use a smart sampling strategy for moving regions. Rather, the first prior art approach relies on the presence of sub-pixel motion between the low resolution frames in order to obtain super resolution. However, sub-pixel motion is not always guaranteed.
  • a camera is mechanically moved in sub-pixel shifts between frame captures.
  • the goal is to capture low resolution video which is better suited for subsequent super resolution.
  • the method of the second prior art approach is analogous to the jittered sampling idea in the aforementioned first prior art approach.
  • a fixed jitter is not an effective strategy for the case of non-static backgrounds which is likely in our targeted application, namely, down-sampling high resolution video for subsequent super resolution.
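The jittered down-sampling described for the first prior art approach can be illustrated with a minimal sketch. It assumes 2x decimation in each direction and a particular rotation order of the four grid phases; neither detail is stated in the text, and the function names are hypothetical. The sketch also shows why a fixed jitter only works for static content: re-interleaving the four low resolution frames restores the high resolution frame exactly only when nothing moved between captures.

```python
# Four complementary sampling grids for 2x decimation (assumed factor).
# Each entry is the (row, col) phase of one grid; the rotation order is an assumption.
GRID_OFFSETS = [(0, 0), (0, 1), (1, 1), (1, 0)]


def jittered_downsample(frame, frame_index):
    """Keep one pixel from every 2x2 block; the phase rotates with the frame index."""
    dr, dc = GRID_OFFSETS[frame_index % 4]
    return [row[dc::2] for row in frame[dr::2]]


def merge_static(lr_frames, height, width):
    """Re-interleave four LR frames. Exact only when the scene did not move."""
    hr = [[None] * width for _ in range(height)]
    for index, lr in enumerate(lr_frames):
        dr, dc = GRID_OFFSETS[index % 4]
        for r, row in enumerate(lr):
            for c, value in enumerate(row):
                hr[dr + 2 * r][dc + 2 * c] = value
    return hr


# A static 4x4 test frame captured four times.
hr_frame = [[10 * r + c for c in range(4)] for r in range(4)]
lr_frames = [jittered_downsample(hr_frame, t) for t in range(4)]
restored = merge_static(lr_frames, 4, 4)
print(restored == hr_frame)
```

If the scene moves between frames, the four grids no longer sample complementary positions of the same content, which is the limitation the text attributes to a fixed jitter.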
  • an apparatus includes a down-sampler and metadata generator for receiving high resolution pictures and generating low resolution pictures and metadata therefrom.
  • the metadata is for guiding post-decoding post-processing of the low resolution pictures and the metadata.
  • the apparatus further includes at least one encoder for encoding the low resolution pictures and the metadata.
  • the method includes receiving high resolution pictures and generating low resolution pictures and metadata therefrom.
  • the metadata is for guiding post-decoding post-processing of the low resolution pictures and the metadata.
  • the method further includes encoding the low resolution pictures and the metadata using at least one encoder.
  • an apparatus includes a decoder for receiving a bitstream and decoding low resolution pictures and metadata therefrom.
  • the apparatus further includes a super resolution post-processor for reconstructing high resolution pictures respectively corresponding to the low resolution pictures using the low resolution pictures and the metadata.
  • the method includes receiving a bitstream and decoding low resolution pictures and metadata therefrom using a decoder.
  • the method further includes reconstructing high resolution pictures respectively corresponding to the low resolution pictures using the low resolution pictures and the metadata.
  • FIG. 1 is a high level block diagram showing an exemplary system/method for sampling-based super resolution, in accordance with an embodiment of the present principles.
  • FIG. 2 is a block diagram showing an exemplary video encoder to which the present principles may be applied, in accordance with an embodiment of the present principles.
  • FIG. 3 is a block diagram showing an exemplary video decoder to which the present principles may be applied, in accordance with an embodiment of the present principles.
  • FIGS. 4A-D are diagrams showing data and steps relating to the pre-processing stage of a sampling-based super resolution method, in accordance with an embodiment of the present principles.
  • FIGS. 5A-5D are diagrams showing data and steps relating to the post-processing stage of a sampling-based super resolution method, in accordance with an embodiment of the present principles.
  • FIG. 6 is a flow diagram showing an exemplary method relating to a pre-processing stage of a sampling-based super resolution method, in accordance with an embodiment of the present principles.
  • FIGS. 7A-7F are diagrams showing examples of sampling grids used for down-sampling high resolution (HR) frames to low resolution (LR), in accordance with an embodiment of the present principles.
  • FIGS. 8A-8D are diagrams showing additional uniform sampling grids, in accordance with an embodiment of the present principles.
  • FIG. 9 is a diagram showing steps relating to the selection of sampling grids, in accordance with an embodiment of the present principles.
  • FIG. 10 is a flow diagram showing an exemplary method relating to a post-processing stage of a sampling-based super resolution method, in accordance with an embodiment of the present principles.
  • FIGS. 11A-11B are diagrams showing the motion of a foreground object between two frames, in accordance with an embodiment of the present principles.
  • the present principles are directed to methods and apparatus for sampling-based super resolution video encoding and decoding.
  • “processor” or “controller” should not be construed to refer exclusively to hardware capable of executing software, and may implicitly include, without limitation, digital signal processor (“DSP”) hardware, read-only memory (“ROM”) for storing software, random access memory (“RAM”), and non-volatile storage.
  • DSP digital signal processor
  • ROM read-only memory
  • RAM random access memory
  • any switches shown in the figures are conceptual only. Their function may be carried out through the operation of program logic, through dedicated logic, through the interaction of program control and dedicated logic, or even manually, the particular technique being selectable by the implementer as more specifically understood from the context.
  • any element expressed as a means for performing a specified function is intended to encompass any way of performing that function including, for example, a) a combination of circuit elements that performs that function or b) software in any form, including, therefore, firmware, microcode or the like, combined with appropriate circuitry for executing that software to perform the function.
  • the present principles as defined by such claims reside in the fact that the functionalities provided by the various recited means are combined and brought together in the manner which the claims call for. It is thus regarded that any means that can provide those functionalities are equivalent to those shown herein.
  • any of the following “/”, “and/or”, and “at least one of”, for example, in the cases of “A/B”, “A and/or B” and “at least one of A and B”, is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of both options (A and B).
  • such phrasing is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of the third listed option (C) only, or the selection of the first and the second listed options (A and B) only, or the selection of the first and third listed options (A and C) only, or the selection of the second and third listed options (B and C) only, or the selection of all three options (A and B and C).
  • This may be extended, as readily apparent by one of ordinary skill in this and related arts, for as many items listed.
  • “picture” and “image” are used interchangeably and refer to a still image or a picture from a video sequence.
  • a picture may be a frame or a field.
  • the words “surrounding co-located pixels” when used, for example, with respect to creating the high resolution mosaic described herein by interpolating pixel values at pixel positions in the high resolution mosaic from pixel values of “surrounding co-located pixels” in the low resolution pictures refers to pixels in the low resolution pictures that surround a particular pixel that is co-located (i.e., has the same position) as a target pixel currently being interpolated in the high resolution mosaic.
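As an illustration of the "surrounding co-located pixels" definition, the following sketch locates the low resolution pixels that surround a target position of the high resolution mosaic and averages them. The plain average, the one-pixel neighbourhood, the 2x sampling step, and the function names are all assumptions made for the example; the definition above does not prescribe a particular interpolation filter.

```python
def surrounding_colocated(lr, offset, target, step=2):
    """Values of LR pixels whose HR positions lie within one pixel of `target`.

    `offset` is the (row, col) phase of the sampling grid that produced `lr`,
    so LR pixel (r, c) came from HR position (offset_r + step*r, offset_c + step*c).
    """
    dr, dc = offset
    tr, tc = target
    values = []
    for r, row in enumerate(lr):
        for c, value in enumerate(row):
            hr_r, hr_c = dr + step * r, dc + step * c
            is_target = (hr_r, hr_c) == (tr, tc)
            if not is_target and abs(hr_r - tr) <= 1 and abs(hr_c - tc) <= 1:
                values.append(value)
    return values


def interpolate(lr, offset, target, step=2):
    """Average the surrounding pixels (assumed filter); None if there are none."""
    neighbours = surrounding_colocated(lr, offset, target, step)
    return sum(neighbours) / len(neighbours) if neighbours else None


# LR picture sampled from a 4x4 HR picture at grid phase (0, 0).
lr = [[0, 2], [20, 22]]
print(interpolate(lr, (0, 0), (1, 1)))  # uses the four diagonal neighbours
```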
  • the present principles are directed to methods and apparatus for sampling-based super resolution video encoding and decoding. It is to be appreciated that the present principles advantageously improve video compression efficiency.
  • a smart down-sampling strategy which is capable of handling motion between frames is proposed.
  • HR high-resolution
  • LR low-resolution
  • metadata is generated to guide post-processing.
  • the decoded low resolution frames and received metadata are used within a novel super resolution framework to reconstruct high resolution frames. Since only low resolution frames are encoded and the amount of metadata transmitted is low to moderate, this approach has the potential to provide increased compression ratios.
  • the smart down-sampling strategy takes into account motion between frames.
  • the down-sampling strategy contributes to an improved super resolution result by creating LR frames such that they complement one another in the pixel information they carry (in other words, reducing pixel redundancy between frames). In some sense, the strategy attempts to enforce sub-pixel motion between frames.
  • In FIG. 1, an exemplary system/method for sampling-based super resolution is indicated generally by the reference numeral 100.
  • High resolution (HR) frames are input and subjected to down-sampling and metadata generation at step 110 (by a down-sampler and metadata generator 151) in order to obtain low resolution (LR) frames and metadata.
  • the low resolution frames and metadata are encoded (by an encoder 152) at step 115.
  • the encoded low resolution frames and metadata are decoded (by a decoder 153) at step 120.
  • the low resolution frames and metadata are subjected to super resolution post-processing (by a super resolution post-processor 154) in order to provide high resolution output frames at step 130.
  • high resolution frames are down-sampled to low resolution and metadata is generated to guide post-processing.
  • a smart down-sampling strategy which is capable of handling motion between frames is proposed.
  • the decoded low resolution frames and received metadata are used within a novel super resolution framework to reconstruct high resolution frames. Since only low resolution frames are encoded and the amount of metadata transmitted is low to moderate, increased compression ratios can be obtained using this approach.
  • the down-sampler and metadata generator 151 may also be considered and referred to as a pre-processor herein.
  • encoder 152 and decoder 153 can be implemented as shown in FIGS. 2 and 3, respectively.
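The data flow of FIG. 1 can be summarized in a short sketch. Only the order of the four stages and their reference numerals come from the text; the `Metadata` fields, the function names, and the trivial stand-in stages (plain decimation, a pass-through "codec", pixel replication) are hypothetical placeholders and not the method of the present principles.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Frame = List[List[int]]


@dataclass
class Metadata:
    grid_offset: Tuple[int, int]  # assumed field: which sampling grid was used
    motion: Tuple[int, int]       # assumed field: motion relative to the reference frame


def run_pipeline(hr_frames: List[Frame], pre_process: Callable, encode: Callable,
                 decode: Callable, post_process: Callable) -> List[Frame]:
    lr_frames, metadata = pre_process(hr_frames)   # step 110, generator 151
    bitstream = encode(lr_frames, metadata)        # step 115, encoder 152
    decoded_lr, decoded_meta = decode(bitstream)   # step 120, decoder 153
    return post_process(decoded_lr, decoded_meta)  # step 130, post-processor 154


# Stand-in stages, for illustration only.
def decimate(hr_frames):
    lr = [[row[::2] for row in frame[::2]] for frame in hr_frames]
    return lr, [Metadata((0, 0), (0, 0)) for _ in hr_frames]


def replicate(lr_frames, metadata):
    out = []
    for frame in lr_frames:
        up = []
        for row in frame:
            wide = [value for value in row for _ in range(2)]
            up.extend([wide, list(wide)])
        out.append(up)
    return out


hr = [[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]]
result = run_pipeline(hr, decimate, lambda lr, meta: (lr, meta), lambda bs: bs, replicate)
```

Only the low resolution frames and the metadata cross the encode/decode boundary, which is where the compression gain described above is expected to come from.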
  • the video encoder 200 includes a frame ordering buffer 210 having an output in signal communication with a non-inverting input of a combiner 285.
  • An output of the combiner 285 is connected in signal communication with a first input of a transformer and quantizer 225.
  • An output of the transformer and quantizer 225 is connected in signal communication with a first input of an entropy coder 245 and a first input of an inverse transformer and inverse quantizer 250.
  • An output of the entropy coder 245 is connected in signal communication with a first non-inverting input of a combiner 290.
  • An output of the combiner 290 is connected in signal communication with a first input of an output buffer 235.
  • a first output of an encoder controller 205 is connected in signal communication with a second input of the frame ordering buffer 210, a second input of the inverse transformer and inverse quantizer 250, an input of a picture-type decision module 215, a first input of a macroblock-type (MB-type) decision module 220, a second input of an intra prediction module 260, a second input of a deblocking filter 265, a first input of a motion compensator 270, a first input of a motion estimator 275, and a second input of a reference picture buffer 280.
  • MB-type macroblock-type
  • a second output of the encoder controller 205 is connected in signal communication with a first input of a Supplemental Enhancement Information (SEI) inserter 230, a second input of the transformer and quantizer 225, a second input of the entropy coder 245, a second input of the output buffer 235, and an input of the Sequence Parameter Set (SPS) and Picture Parameter Set (PPS) inserter 240.
  • SEI Supplemental Enhancement Information
  • An output of the SEI inserter 230 is connected in signal communication with a second non-inverting input of the combiner 290.
  • a first output of the picture-type decision module 215 is connected in signal communication with a third input of the frame ordering buffer 210.
  • a second output of the picture-type decision module 215 is connected in signal communication with a second input of a macroblock-type decision module 220.
  • SPS Sequence Parameter Set
  • PPS Picture Parameter Set
  • An output of the inverse transformer and inverse quantizer 250 is connected in signal communication with a first non-inverting input of a combiner 219.
  • An output of the combiner 219 is connected in signal communication with a first input of the intra prediction module 260 and a first input of the deblocking filter 265.
  • An output of the deblocking filter 265 is connected in signal communication with a first input of a reference picture buffer 280.
  • An output of the reference picture buffer 280 is connected in signal communication with a second input of the motion estimator 275 and a third input of the motion compensator 270.
  • a first output of the motion estimator 275 is connected in signal communication with a second input of the motion compensator 270.
  • a second output of the motion estimator 275 is connected in signal communication with a third input of the entropy coder 245.
  • An output of the motion compensator 270 is connected in signal communication with a first input of a switch 297.
  • An output of the intra prediction module 260 is connected in signal communication with a second input of the switch 297.
  • An output of the macroblock-type decision module 220 is connected in signal communication with a third input of the switch 297.
  • the third input of the switch 297 determines whether the “data” input of the switch (as compared to the control input, i.e., the third input) is provided by the motion compensator 270 or the intra prediction module 260.
  • the output of the switch 297 is connected in signal communication with a second non-inverting input of the combiner 219 and an inverting input of the combiner 285.
  • a first input of the frame ordering buffer 210 and an input of the encoder controller 205 are available as inputs of the encoder 200, for receiving an input picture.
  • a second input of the Supplemental Enhancement Information (SEI) inserter 230 is available as an input of the encoder 200, for receiving metadata.
  • An output of the output buffer 235 is available as an output of the encoder 200, for outputting a bitstream.
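The core prediction loop wired up above (combiner 285, transformer and quantizer 225, inverse transformer and inverse quantizer 250, combiner 219) can be sketched as follows. This is a simplified illustration under stated assumptions: the transform is omitted, the quantizer is a uniform scalar quantizer with a hypothetical step size, and a block is a flat list of pixel values. Entropy coding, deblocking and motion estimation are left out.

```python
def encode_block(block, prediction, step=4):
    """One pass through the prediction loop of FIG. 2 for a flat list of pixels.

    Returns the quantized levels (what would go to the entropy coder 245) and
    the reconstructed block (what would feed prediction of later blocks).
    """
    # Combiner 285: subtract the prediction selected by switch 297.
    residual = [x - p for x, p in zip(block, prediction)]
    # Stand-in for transformer and quantizer 225 (no transform, assumed step size).
    levels = [round(r / step) for r in residual]
    # Stand-in for inverse transformer and inverse quantizer 250.
    dequantized = [level * step for level in levels]
    # Combiner 219: add the prediction back to obtain the reconstruction.
    reconstructed = [p + d for p, d in zip(prediction, dequantized)]
    return levels, reconstructed


levels, reconstructed = encode_block([100, 103, 97, 109], [100, 100, 100, 100])
print(levels, reconstructed)
```

The encoder reconstructs the block from the quantized data rather than from the original pixels so that its reference pictures match what a decoder can rebuild.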
  • the video decoder 300 includes an input buffer 310 having an output connected in signal communication with a first input of an entropy decoder 345.
  • a first output of the entropy decoder 345 is connected in signal communication with a first input of an inverse transformer and inverse quantizer 350.
  • An output of the inverse transformer and inverse quantizer 350 is connected in signal communication with a second non-inverting input of a combiner 325.
  • An output of the combiner 325 is connected in signal communication with a second input of a deblocking filter 365 and a first input of an intra prediction module 360.
  • a second output of the deblocking filter 365 is connected in signal communication with a first input of a reference picture buffer 380.
  • An output of the reference picture buffer 380 is connected in signal communication with a second input of a motion compensator 370.
  • a second output of the entropy decoder 345 is connected in signal communication with a third input of the motion compensator 370, a first input of the deblocking filter 365, and a third input of the intra prediction module 360.
  • a third output of the entropy decoder 345 is connected in signal communication with an input of a decoder controller 305.
  • a first output of the decoder controller 305 is connected in signal communication with a second input of the entropy decoder 345.
  • a second output of the decoder controller 305 is connected in signal communication with a second input of the inverse transformer and inverse quantizer 350.
  • a third output of the decoder controller 305 is connected in signal communication with a third input of the deblocking filter 365.
  • a fourth output of the decoder controller 305 is connected in signal communication with a second input of the intra prediction module 360, a first input of the motion compensator 370, and a second input of the reference picture buffer
  • An output of the motion compensator 370 is connected in signal communication with a first input of a switch 397.
  • An output of the intra prediction module 360 is connected in signal communication with a second input of the switch 397.
  • An output of the switch 397 is connected in signal communication with a first non-inverting input of the combiner 325.
  • An input of the input buffer 310 is available as an input of the decoder 300, for receiving an input bitstream.
  • a first output of the deblocking filter 365 is available as an output of the decoder 300, for outputting an output picture.
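The reconstruction path of the decoder (switch 397 feeding combiner 325 together with the output of the inverse transformer and inverse quantizer 350) can be sketched in the same simplified way. The `mode` string that drives the switch, the uniform scalar dequantizer and its step size are assumptions made for the example; the text above does not describe how switch 397 is controlled, and the deblocking filter 365 is omitted.

```python
def select_prediction(mode, motion_compensated, intra_predicted):
    """Switch 397: choose which predictor feeds combiner 325 (mode is hypothetical)."""
    if mode == "inter":
        return motion_compensated
    if mode == "intra":
        return intra_predicted
    raise ValueError(f"unknown prediction mode: {mode!r}")


def decode_block(levels, mode, motion_compensated, intra_predicted, step=4):
    """Rebuild a flat block of pixels from quantized residual levels."""
    prediction = select_prediction(mode, motion_compensated, intra_predicted)
    # Stand-in for inverse transformer and inverse quantizer 350 (assumed step size).
    residual = [level * step for level in levels]
    # Combiner 325: prediction plus decoded residual.
    return [p + r for p, r in zip(prediction, residual)]


inter_block = decode_block([0, 1, -1, 2], "inter", [100] * 4, [90] * 4)
intra_block = decode_block([0, 1, -1, 2], "intra", [100] * 4, [90] * 4)
print(inter_block, intra_block)
```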
  • The central idea of sampling-based SR is illustrated in FIGS. 4A-D and 5A-D.
  • In FIGS. 4A-D, data and steps relating to the pre-processing stage of a sampling-based super resolution method are indicated generally by the reference numeral 400.
  • FIG. 4A shows an input set of high resolution (HR) frames 410.
  • FIG. 4B shows the estimation 420 of motion transformations Θ_t1 with respect to the reference frame.
  • FIG. 4C shows the estimation 430 of sampling grids S t based on a super resolution filling factor in reference frame coordinates (note that I refers to an identity transform).
  • FIG. 4D shows down-sampled low resolution (LR) frames and corresponding metadata 440 .
  • In FIGS. 5A-5D, data and steps relating to the post-processing stage of a sampling-based super resolution method are indicated generally by the reference numeral 500 .
  • FIG. 5A shows a decoded set 510 of low resolution frames and metadata.
  • FIG. 5B shows the creation 520 of a super resolution mosaic from low resolution frames in reference frame coordinates (note that I refers to an identity transform).
  • FIG. 5C shows the reconstruction 530 of high resolution frames from the super resolution mosaic.
  • FIG. 5D shows super resolved high resolution frames 540 .
  • In FIG. 5A, which relates to the post-processing stage, initially a set of decoded LR frames L̂ 1 - L̂ 4 is available along with some relevant metadata.
  • One of the frames is known to be a reference frame (L̂ 1 in FIG. 5A ).
  • In FIG. 5B, a super resolution mosaic is constructed after transforming the pixels in the low resolution frames into a common coordinate system (coinciding with that of the reference frame) using the metadata information. Thereafter, with respect to FIG. 5C, each high resolution frame in the set is reconstructed by combining the information in the super resolution mosaic (transformed back to the current frame coordinates) and the corresponding low resolution frame using the metadata information.
  • the metadata needs to describe the motion of pixels between each frame and the reference frame, and the down-sampling process used to create each low resolution frame from the corresponding high resolution frame (at the pre-processing stage). This information is determined at the pre-processing stage and sent as metadata.
  • an input high resolution video is divided into sets of frames which are processed separately.
  • H 1 is taken to be the reference frame.
  • In FIG. 4B, the motion between each frame and the reference frame is estimated.
  • the motion transformation from H t to H 1 is denoted by Θ t1 .
  • In FIG. 4C, the (down-)sampling grids S t are selected for each frame H t in order to create the corresponding low resolution frame L t .
  • the (downsampled) low resolution frames L t can be compressed using an encoder and sent to the receiver along with the corresponding metadata (motion and sampling grid information).
  • the decoded low resolution frames along with the metadata information are used to reconstruct the high resolution frames as described earlier.
  • the input high resolution video is first divided into sets of contiguous frames. Each set is then processed separately.
  • M is the down-sampling factor, i.e., the ratio of high resolution to low resolution frame dimensions.
  • the method 600 includes a start block 605 that passes control to a function block 610 .
  • the function block 615 performs global motion estimation between each frame and the reference frame to obtain motion parameters therefor, and passes control to a function block 620 .
  • the function block 620 performs a sampling grid selection for each frame based on criteria related to super resolution quality to obtain sampling grid indices, and passes control to a function block 625 .
  • the function block 625 down-samples the high resolution frames in order to obtain low resolution frames, and passes control to a function block 630 .
  • Let H 1 be the reference frame. Estimate the motion from each frame H t to the reference frame ( FIG. 4B ). The motion transformation from H t to H 1 is denoted by Θ t1 .
  • Sampling grid selection: For each frame H t , the sampling grid S t indicates the pixels that are taken from H t in order to create the corresponding LR frame L t .
  • the grids S t are chosen such that each frame provides complementary pixel information for the super resolution process in the post-processing stage ( FIGS. 5A-5D ). Motion between frames is accounted for during the grid selection process.
  • each of the low resolution frames L t is created.
  • the low resolution frames are then compressed using an encoder and sent to the receiver.
  • Information regarding the motion between the frames and the sampling grids used are also sent as metadata.
  • $$x' = \frac{a_1 x + a_2 y + a_3}{c_1 x + c_2 y + 1}, \qquad y' = \frac{b_1 x + b_2 y + b_3}{c_1 x + c_2 y + 1}.$$
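As a worked illustration of the projective model above, the following minimal Python sketch maps a pixel position (x, y) to (x', y') using the eight parameters. The function name `apply_projective`, the constant `IDENTITY`, and the ordering of the parameter tuple are assumptions made for this example only.

```python
def apply_projective(theta, x, y):
    """Map (x, y) to (x', y') with the 8-parameter projective model.

    theta = (a1, a2, a3, b1, b2, b3, c1, c2); this ordering is an
    assumption of the sketch, not something fixed by the text.
    """
    a1, a2, a3, b1, b2, b3, c1, c2 = theta
    denom = c1 * x + c2 * y + 1.0
    if denom == 0.0:
        raise ZeroDivisionError("point maps to infinity under this transform")
    x_new = (a1 * x + a2 * y + a3) / denom
    y_new = (b1 * x + b2 * y + b3) / denom
    return x_new, y_new


# Identity transform: every pixel maps to itself.
IDENTITY = (1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0)
```

With c1 = c2 = 0 the model reduces to an affine transform, so a pure sub-pixel translation is expressed by setting only a3 and b3.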
  • the parameters are usually estimated by first determining a set of point correspondences between the two frames and then using a robust estimation framework such as RANdom SAmple Consensus (RANSAC) or its variants. Point correspondences between frames can be determined by a number of methods, e.g. extracting and matching Scale-invariant Feature Transform (SIFT) features or using optical flow.
  • the motion between each frame H t and the reference frame (H 1 ) has to be estimated.
  • three sets of parameters are estimated: θ 21 ; θ 31 ; and θ 41 (corresponding to transformations Θ 21 , Θ 31 and Θ 41 , respectively).
  • For each high resolution frame H t , a sampling grid S t has to be selected in order to down-sample the frame and create the low resolution version L t .
  • a sampling grid indicates the pixels in the high resolution frame that are taken and packed into the corresponding low resolution frame.
  • In FIGS. 7A-7F, examples of sampling grids used for down-sampling high resolution (HR) frames to low resolution (LR) are indicated generally by the reference numeral 700 .
  • FIG. 7A shows the pixels 710 in the high resolution frames.
  • FIG. 7B shows four uniform sampling grids 720 with a down-sampling factor of 2.
  • FIG. 7C shows the low resolution frame 730 resulting from the first sampling grid g 1 .
  • FIG. 7D shows the low resolution frame 740 resulting from the second grid g 2 .
  • FIG. 7E shows the low resolution frame 750 resulting from the third grid g 3 .
  • FIG. 7F shows the low resolution frame 760 resulting from the fourth grid g 4 .
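The uniform grids with a down-sampling factor of 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: each grid is represented by a (row offset, column offset) pair, frames are lists of rows, and the helper names `uniform_grids` and `downsample` are hypothetical. The mapping of offsets to the labels g 1 through g 4 is not specified here.

```python
def uniform_grids(m):
    """Return the m*m (row_offset, col_offset) pairs of the uniform grids."""
    return [(dy, dx) for dy in range(m) for dx in range(m)]


def downsample(frame, offset, m):
    """Keep every m-th pixel of `frame` (a list of rows), starting at `offset`."""
    dy, dx = offset
    return [row[dx::m] for row in frame[dy::m]]


# A 4x4 test frame whose pixel value encodes its position (10*row + col).
hr = [[10 * r + c for c in range(4)] for r in range(4)]
grids = uniform_grids(2)
lr_frames = [downsample(hr, g, 2) for g in grids]
```

Taken together, the four low resolution frames contain every pixel of the high resolution frame exactly once, which is the sense in which the grids are complementary when there is no motion.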
  • In FIGS. 8A-8D, additional uniform sampling grids are indicated generally by the reference numeral 800 .
  • FIG. 8A shows a horizontally staggered grid 810 .
  • FIG. 8B shows sampling grids g 5 , g 6 , g 7 , and g 8 , collectively indicated by the reference numeral 820 , and individually represented by the following respective symbols: o; +; x; and ⁇ .
  • FIG. 8C shows a vertically staggered sampling grid 830 .
  • FIG. 8D shows sampling grids g 9 , g 10 , g 11 , and g 12 , collectively indicated by the reference numeral 840 , and individually represented by the following respective symbols: o; +; x; and ⁇ .
  • the staggered grids g 5 -g 12 can potentially capture slightly rotated or sheared grids of pixels better than the rectangular grids g 1 -g 4 .
  • the basic criterion we shall employ in selecting grids is to maximize the expected quality of the super resolution result (i.e., the super resolution mosaic) at the post-processing stage. In practice, this is achieved by choosing grids S t such that each frame provides complementary pixel information for the super resolution process.
  • the grid selection process proceeds by replicating part of the super resolution mosaic creation process.
  • the criterion used to select grids is the super resolution filling factor.
  • In FIG. 9, steps relating to the selection of sampling grids are indicated generally by the reference numeral 900 .
  • FIG. 9A shows a step 910 where a sampling grid is chosen for a reference frame.
  • FIG. 9B shows a step 920 where an unfilled super resolution frame (H SR ) of the same size as H 1 is initialized (and where I is the identity transform since we presume that there is no motion between H SR and H 1 ).
  • FIG. 9C shows a step 930 where a filling factor is chosen for each candidate grid.
  • FIG. 9D shows a step 940 where the previous steps are repeated for each frame H t to select the corresponding S t .
  • the preceding method 900 for sampling grid selection may also be further described as follows (presuming a set of four frames, H 1 being the reference frame):
  • the filling factor of a candidate grid g i is defined as the number of previously unfilled pixels in H SR that are filled when g i is selected for H t .
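The filling factor defined above can be sketched in Python as follows. This is an illustrative sketch, not the described implementation: it assumes uniform grids given as (row offset, column offset) pairs, a boolean `filled` map for H SR, a caller-supplied `transform` that maps frame coordinates to reference coordinates, and rounding of transformed positions to the nearest integer pixel. All function names are hypothetical.

```python
def newly_filled(filled, offset, m, transform):
    """Return the set of unfilled H_SR pixels that a candidate grid would fill."""
    height, width = len(filled), len(filled[0])
    dy, dx = offset
    hits = set()
    for y in range(dy, height, m):
        for x in range(dx, width, m):
            xr, yr = transform(x, y)              # frame -> reference coordinates
            xi, yi = int(round(xr)), int(round(yr))
            if 0 <= xi < width and 0 <= yi < height and not filled[yi][xi]:
                hits.add((xi, yi))
    return hits


def filling_factor(filled, offset, m, transform):
    """Number of previously unfilled pixels filled by the candidate grid."""
    return len(newly_filled(filled, offset, m, transform))


def select_grid(filled, candidates, m, transform):
    """Pick the candidate with the largest filling factor and mark its pixels."""
    best = max(candidates, key=lambda g: filling_factor(filled, g, m, transform))
    for xi, yi in newly_filled(filled, best, m, transform):
        filled[yi][xi] = True
    return best
```

Calling `select_grid` once per frame, starting with the reference frame and an identity transform, reproduces the greedy selection loop: each frame is assigned the grid that contributes the most new pixels given the grids already chosen.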
  • Variations of the filling factor measure or entirely different metrics involving super resolution quality may be used as criteria for grid selection. For example, instead of declaring each pixel in H SR as being filled or unfilled, we could keep track of the number of grid pixels mapped to each pixel therein. Thereafter, the filling factor could be redefined as a measure of incremental information wherein grids that have greater incremental contribution to H SR score higher. Another criterion for grid selection could involve completely replicating the super resolution process (using the previously selected grids S 1 -S t−1 and the current candidate grids for S t ) and choosing a grid S t that results in the highest SR quality, e.g., based on PSNR with respect to the reference frame.
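For the PSNR-based criterion, a minimal Python sketch of the standard PSNR computation is given below. The function name `psnr` and the default peak value of 255 (8-bit samples) are assumptions of this example; frames are assumed to be equally sized lists of rows.

```python
import math


def psnr(reference, candidate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized frames."""
    sq_errors = [
        (a - b) ** 2
        for ref_row, cand_row in zip(reference, candidate)
        for a, b in zip(ref_row, cand_row)
    ]
    mse = sum(sq_errors) / len(sq_errors)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak * peak / mse)
```

A grid selection loop using this criterion would run the super resolution process once per candidate grid and keep the candidate with the highest PSNR, which is considerably more expensive than counting filled pixels.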
  • each high resolution frame H t has a corresponding sampling grid S t .
  • H t is down-sampled to the low resolution frame L t as follows:
  • a suitable packing strategy may be devised so as to form a rectangular low resolution frame using the pixels sampled from the high resolution frame.
  • the low resolution frames thus created are then compressed using a video encoder.
  • the grids are then known from a lookup table at the post-processing stage.
  • In FIG. 10, an exemplary method relating to a post-processing stage of a sampling-based super resolution method is indicated generally by the reference numeral 1000 .
  • the method 1000 includes a start block 1005 that passes control to a function block 1010 .
  • the function block 1015 transforms valid pixels from each low resolution frame to super resolution mosaic coordinates, and passes control to a function block 1020 .
  • the function block 1020 creates a super resolution mosaic by interpolating values at integer pixel positions, and passes control to a function block 1025 .
  • the function block 1025 reconstructs each high resolution frame by reverse transforming the super resolution mosaic to the high resolution frame coordinates, and passes control to a function block 1030 .
  • a function block 1035 provides sampling grid indices to the function block 1015 for use thereby.
  • a function block 1040 provides motion parameters (metadata) to the function blocks 1015 and 1025 for use thereby.
  • the metadata includes the motion parameters and the sampling grid indices.
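One possible in-memory layout for this metadata is sketched below in Python. The class and field names are hypothetical, as is the choice to store one parameter tuple and one grid index per frame; the actual bitstream syntax is not specified here.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class FrameMetadata:
    # Parameters of the motion transformation to the reference frame (assumed layout).
    motion_params: Tuple[float, ...]
    # Index into a grid lookup table shared by the pre- and post-processing stages.
    grid_index: int


@dataclass
class FrameSetMetadata:
    reference_index: int           # which frame of the set is the reference
    frames: List[FrameMetadata]    # one entry per frame in the set


def grid_for(meta: FrameMetadata, table: Sequence[Tuple[int, int]]) -> Tuple[int, int]:
    """Resolve a frame's grid index through the shared lookup table."""
    return table[meta.grid_index]
```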
  • the following are the steps (see FIGS. 5A-5D ) involved in reconstructing the high resolution frames Ĥ t using the decoded low resolution frames and the side information:
  • Each high resolution frame Ĥ t in the set is reconstructed using the super resolution mosaic image Ĥ SR and the low resolution frame L̂ t , using the side information to guide the process.
  • Ĥ SR is assumed to be in the same coordinates as the reference frame, i.e., there is no motion between Ĥ SR and Ĥ 1 .
  • the following are the steps to construct Ĥ SR :
  • an image Ĥ SR is constructed by interpolating the pixel values at all integer pixel positions where sufficient (e.g., as determined using a threshold) data is available, from the surrounding pixel values at each of those positions.
  • a variety of (non-uniform) spatial interpolation methods are available for this operation. These methods take a set of pixel positions and corresponding values, and output interpolated values at any number of other positions.
  • the griddata function of MATLAB can be used to carry out this interpolation.
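A minimal Python sketch of the interpolation step follows. It uses inverse-distance weighting as a simple stand-in for a general scattered-data interpolator; that substitution, the function name `build_mosaic`, and the parameters `radius`, `min_samples`, and `eps` are all assumptions of this example. Positions with too few nearby samples are left as `None`, mirroring the idea of only interpolating where sufficient data is available.

```python
def build_mosaic(samples, height, width, radius=1.5, min_samples=1, eps=1e-9):
    """Interpolate values at integer pixel positions from scattered samples.

    samples: iterable of (x, y, value) with possibly non-integer x and y.
    """
    samples = list(samples)
    r2 = radius * radius
    needed = max(1, min_samples)
    mosaic = [[None] * width for _ in range(height)]
    for yi in range(height):
        for xi in range(width):
            num = den = 0.0
            count = 0
            for x, y, value in samples:
                d2 = (x - xi) ** 2 + (y - yi) ** 2
                if d2 <= r2:
                    w = 1.0 / (d2 + eps)   # nearer samples weigh more
                    num += w * value
                    den += w
                    count += 1
            if count >= needed:
                mosaic[yi][xi] = num / den
    return mosaic
```

The brute-force loop over all samples is quadratic and only suitable for small examples; a practical implementation would bin the samples spatially before interpolating.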
  • a validity map may be computed to determine which pixels of Ĥ SR include reliable information so that only these pixels are used in the reconstruction of high resolution frames.
  • a measure of validity may be computed at each pixel of the mosaic image based on the samples (e.g., the number or density of the samples) in a neighborhood around the pixel. Thereafter, a pixel in the mosaic is used in the reconstruction process only if its validity value is high enough (e.g., above a given threshold).
  • Ĥ t is a continuous 2-D pixel space wherein non-integer pixel positions may exist. Fill in the pixel positions in Ĥ t given by the grid S t with the corresponding pixel values in L̂ t .
  • the high resolution frame ⁇ t is reconstructed by interpolating the pixel values at all integer pixel positions in the frame from the surrounding pixel values at each of those positions. This is handled using a spatial interpolation method as described in the previous section (step 3). Pixels outside the frame boundaries are not determined.
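The fill-in step that precedes this interpolation can be sketched in Python for the simple case of a uniform grid. The sketch assumes the grid is given as a (row offset, column offset) pair with down-sampling factor `m`, and that the low resolution frame was packed row by row; the function name `place_lr_pixels` is hypothetical. Positions not covered by the grid stay `None` and would then be filled from the transformed mosaic and by interpolation.

```python
def place_lr_pixels(lr, offset, m, height, width):
    """Put low resolution pixels back at their grid positions in an HR-sized frame."""
    dy, dx = offset
    hr = [[None] * width for _ in range(height)]
    for r, row in enumerate(lr):
        for c, value in enumerate(row):
            hr[dy + r * m][dx + c * m] = value
    return hr
```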
  • Foreground objects are defined as objects (or regions) that do not follow the global motion between frames. In other words, these objects have motions that are different from the global motion between frames.
  • In FIGS. 11A-B, the motion of a foreground object between two frames (Frame 1 and Frame t) is indicated generally by the reference numeral 1100 .
  • the mask F 1 is filled with zeros. In other words, all pixels are considered as background.
  • H 1t = Θ 1t (H 1 ), i.e., H 1 is transformed to the coordinates of H t .
  • Since the masks are computed at the pre-processing stage, they have to be transmitted as side-information to the receiver. It may not be necessary to transmit a high resolution version of the foreground masks.
  • the masks may be down-sampled to low resolution using the same strategy as used to create the low resolution frames L t from H t , and then up-sampled at the post-processing stage.
  • the masks may also be compressed (e.g., using ZIP, the MPEG-4 AVC Standard, and/or any other data compression scheme) prior to transmission.
  • transmitting the masks may be entirely avoided by computing them at the receiver side using the decoded low resolution frames and the metadata. However, it is a difficult problem to compute a reliable mask at the receiver.
  • one advantage/feature is an apparatus having a down-sampler and metadata generator and at least one encoder.
  • the down-sampler and metadata generator is for receiving high resolution pictures and generating low resolution pictures and metadata therefrom.
  • the metadata is for guiding post-decoding post-processing of the low resolution pictures and the metadata.
  • the at least one encoder ( 152 ) is for encoding the low resolution pictures and the metadata.
  • Another advantage/feature is the apparatus having the down-sampler and metadata generator and the at least one encoder as described above, wherein the metadata includes motion transformation information and sampling grid information.
  • Yet another advantage/feature is the apparatus having the down-sampler and metadata generator and the at least one encoder wherein the metadata includes motion transformation information and sampling grid information as described above, wherein the motion transformation information comprises global motion transformation information relating to global motion between two or more of the high resolution pictures.
  • Still another advantage/feature is the apparatus having the down-sampler and metadata generator and the at least one encoder wherein the metadata includes motion transformation information and sampling grid information as described above, wherein the sampling grid information comprises sampling grid indices for indicating each respective one of a plurality of down-sampling grids used to generate the low resolution pictures from the high resolution pictures by down-sampling.
  • a further advantage/feature is the apparatus having the down-sampler and metadata generator and the at least one encoder as described above, wherein the high resolution pictures includes at least one reference picture and one or more non-reference pictures, and the down-sampler and metadata generator generates the low resolution pictures by estimating motion from a reference picture to each of the one or more non-reference pictures, selecting one or more down-sampling grids from a plurality of candidate down-sampling grids for use in down-sampling the high resolution pictures based on the motion information, and down-sampling the high resolution pictures using the one or more down-sampling grids.
  • the apparatus having the down-sampler and metadata generator and the at least one encoder wherein the high resolution pictures includes at least one reference picture and one or more non-reference pictures, and the down-sampler and metadata generator generates the low resolution pictures by estimating motion from a reference picture to each of the one or more non-reference pictures, selecting one or more down-sampling grids from a plurality of candidate down-sampling grids for use in down-sampling the high resolution pictures based on the motion information, and down-sampling the high resolution pictures using the one or more down-sampling grids as described above, wherein the one or more down-sampling grids are selected based on the motion information such that each of the high resolution pictures, when down-sampled using the one or more down-sampling grids, provide complementary pixel information for the post-decoding post-processing of the low resolution pictures.
  • another advantage/feature is the apparatus having the down-sampler and metadata generator and the at least one encoder wherein the one or more down-sampling grids are selected based on the motion information such that each of the high resolution pictures, when down-sampled using the one or more down-sampling grids, provide complementary pixel information for the post-decoding post-processing of the low resolution pictures as described above, wherein the grids are further selected based upon a filling factor that indicates a number of previously unfilled pixels in a super resolution picture generated using a particular one of the one or more down-sampling grids, the super resolution picture corresponding to an output provided by the post-decoding post-processing of the low resolution pictures and the metadata.
  • another advantage/feature is the apparatus having the down-sampler and metadata generator and the at least one encoder wherein the one or more down-sampling grids are selected based on the motion information such that each of the high resolution pictures, when down-sampled using the one or more down-sampling grids, provide complementary pixel information for the post-decoding post-processing of the low resolution pictures as described above, wherein the grids are further selected based upon a distortion measure.
  • another advantage/feature is the apparatus having the down-sampler and metadata generator and the at least one encoder wherein the high resolution pictures includes at least one reference picture and one or more non-reference pictures, and the down-sampler and metadata generator generates the low resolution pictures by estimating motion from a reference picture to each of the one or more non-reference pictures, selecting one or more down-sampling grids from a plurality of candidate down-sampling grids for use in down-sampling the high resolution pictures based on the motion information, and down-sampling the high resolution pictures using the one or more down-sampling grids as described above, wherein different ones of the plurality of down-sampling grids are used to down-sample different portions of a particular one of at least one of the high resolution pictures.
  • the apparatus having the down-sampler and metadata generator and the at least one encoder wherein the high resolution pictures includes at least one reference picture and one or more non-reference pictures, and the down-sampler and metadata generator generates the low resolution pictures by estimating motion from a reference picture to each of the one or more non-reference pictures, selecting one or more down-sampling grids from a plurality of candidate down-sampling grids for use in down-sampling the high resolution pictures based on the motion information, and down-sampling the high resolution pictures using the one or more down-sampling grids as described above, wherein a respective binary mask is constructed for each of the high resolution pictures, the binary mask indicating respective locations of foreground pixels in the high resolution pictures.
  • the teachings of the present principles are implemented as a combination of hardware and software.
  • the software may be implemented as an application program tangibly embodied on a program storage unit.
  • the application program may be uploaded to, and executed by, a machine comprising any suitable architecture.
  • the machine is implemented on a computer platform having hardware such as one or more central processing units (“CPU”), a random access memory (“RAM”), and input/output (“I/O”) interfaces.
  • the computer platform may also include an operating system and microinstruction code.
  • the various processes and functions described herein may be either part of the microinstruction code or part of the application program, or any combination thereof, which may be executed by a CPU.
  • various other peripheral units may be connected to the computer platform such as an additional data storage unit and a printing unit.

Abstract

Methods and apparatus are provided for sampling-based super resolution video encoding and decoding. The encoding method receives high resolution pictures and generates low resolution pictures and metadata therefrom, the metadata for guiding post-decoding post-processing of the low resolution pictures and the metadata; and then encodes the low resolution pictures and the metadata using at least one encoder. The corresponding decoding method receives a bitstream and decodes low resolution pictures and metadata therefrom using a decoder; and then reconstructs high resolution pictures respectively corresponding to the low resolution pictures using the low resolution pictures and the metadata.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims the benefit, under 35 U.S.C. §365, of International Application PCT/US2011/000107, filed Jan. 20, 2011, which was published in accordance with PCT Article 21(2) on Jul. 28, 2011, in English, and which claims the benefit of United States Provisional Patent Application Ser. No. 61/297,320, filed on Jan. 22, 2010, in English, which are incorporated by reference in their respective entireties.
TECHNICAL FIELD
The present principles relate generally to video encoding and decoding and, more particularly, to methods and apparatus for sampling-based super resolution video encoding and decoding.
BACKGROUND
A video compression approach using super resolution was proposed in a first prior art approach. In the first prior art approach, the spatial size of the input video is reduced to a certain predetermined low resolution (LR) size before encoding. After the low resolution video is received at the decoder side, the low resolution video is up-scaled to the original size using a super resolution method along with some side information (metadata) transmitted with the bitstream. The metadata includes a block-based segmentation of frames where each block is labeled as moving, non-moving flat, and non-moving textured. Non-moving flat blocks are up-scaled by spatial interpolation. For moving blocks, motion vectors are sent to the receiver where a super resolution technique is applied in order to recover sub-pixel information. For non-moving textured blocks, a jittered down-sampling strategy is used wherein four complementary down-sampling grids are applied in rotating order.
However, the aforementioned first prior art approach disadvantageously does not use a smart sampling strategy for moving regions. Rather, the first prior art approach relies on the presence of sub-pixel motion between the low resolution frames in order to obtain super resolution. However, sub-pixel motion is not always guaranteed.
In a second prior art approach, a camera is mechanically moved in sub-pixel shifts between frame captures. The goal is to capture low resolution video which is better suited for subsequent super resolution. For static backgrounds, the method of the second prior art approach is analogous to the jittered sampling idea in the aforementioned first prior art approach. However, a fixed jitter is not an effective strategy for the case of non-static backgrounds, which is likely in our targeted application, namely, down-sampling high resolution video for subsequent super resolution.
SUMMARY
These and other drawbacks and disadvantages of the prior art are addressed by the present principles, which are directed to methods and apparatus for sampling-based super resolution video encoding and decoding.
According to an aspect of the present principles, there is provided an apparatus. The apparatus includes a down-sampler and metadata generator for receiving high resolution pictures and generating low resolution pictures and metadata therefrom. The metadata is for guiding post-decoding post-processing of the low resolution pictures and the metadata. The apparatus further includes at least one encoder for encoding the low resolution pictures and the metadata.
According to another aspect of the present principles, there is provided a method. The method includes receiving high resolution pictures and generating low resolution pictures and metadata therefrom. The metadata is for guiding post-decoding post-processing of the low resolution pictures and the metadata. The method further includes encoding the low resolution pictures and the metadata using at least one encoder.
According to yet another aspect of the present principles, there is provided an apparatus. The apparatus includes a decoder for receiving a bitstream and decoding low resolution pictures and metadata therefrom. The apparatus further includes a super resolution post-processor for reconstructing high resolution pictures respectively corresponding to the low resolution pictures using the low resolution pictures and the metadata.
According to still another aspect of the present principles, there is provided a method. The method includes receiving a bitstream and decoding low resolution pictures and metadata therefrom using a decoder. The method further includes reconstructing high resolution pictures respectively corresponding to the low resolution pictures using the low resolution pictures and the metadata.
These and other aspects, features and advantages of the present principles will become apparent from the following detailed description of exemplary embodiments, which is to be read in connection with the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
The present principles may be better understood in accordance with the following exemplary figures, in which:
FIG. 1 is a high level block diagram showing an exemplary system/method for sampling-based super resolution, in accordance with an embodiment of the present principles;
FIG. 2 is a block diagram showing an exemplary video encoder to which the present principles may be applied, in accordance with an embodiment of the present principles;
FIG. 3 is a block diagram showing an exemplary video decoder to which the present principles may be applied, in accordance with an embodiment of the present principles;
FIGS. 4A-D are diagrams showing data and steps relating to the pre-processing stage of a sampling-based super resolution method, in accordance with an embodiment of the present principles;
FIGS. 5A-5D are diagrams showing data and steps relating to the post-processing stage of a sampling-based super resolution method, in accordance with an embodiment of the present principles;
FIG. 6 is a flow diagram showing an exemplary method relating to a pre-processing stage of a sampling-based super resolution method, in accordance with an embodiment of the present principles;
FIGS. 7A-7F are diagrams showing examples of sampling grids used for down-sampling high resolution (HR) frames to low resolution (LR), in accordance with an embodiment of the present principles;
FIGS. 8A-8D are diagrams showing additional uniform sampling grids, in accordance with an embodiment of the present principles;
FIG. 9 is a diagram showing steps relating to the selection of sampling grids, in accordance with an embodiment of the present principles;
FIG. 10 is a flow diagram showing an exemplary method relating to a post-processing stage of a sampling-based super resolution method, in accordance with an embodiment of the present principles; and
FIGS. 11A-11B are diagrams showing the motion of a foreground object between two frames, in accordance with an embodiment of the present principles.
DETAILED DESCRIPTION
The present principles are directed to methods and apparatus for sampling-based super resolution video encoding and decoding.
The present description illustrates the present principles. It will thus be appreciated that those skilled in the art will be able to devise various arrangements that, although not explicitly described or shown herein, embody the present principles and are included within its spirit and scope.
All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the present principles and the concepts contributed by the inventor(s) to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions.
Moreover, all statements herein reciting principles, aspects, and embodiments of the present principles, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure.
Thus, for example, it will be appreciated by those skilled in the art that the block diagrams presented herein represent conceptual views of illustrative circuitry embodying the present principles. Similarly, it will be appreciated that any flow charts, flow diagrams, state transition diagrams, pseudocode, and the like represent various processes which may be substantially represented in computer readable media and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.
The functions of the various elements shown in the figures may be provided through the use of dedicated hardware as well as hardware capable of executing software in association with appropriate software. When provided by a processor, the functions may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which may be shared. Moreover, explicit use of the term “processor” or “controller” should not be construed to refer exclusively to hardware capable of executing software, and may implicitly include, without limitation, digital signal processor (“DSP”) hardware, read-only memory (“ROM”) for storing software, random access memory (“RAM”), and non-volatile storage.
Other hardware, conventional and/or custom, may also be included. Similarly, any switches shown in the figures are conceptual only. Their function may be carried out through the operation of program logic, through dedicated logic, through the interaction of program control and dedicated logic, or even manually, the particular technique being selectable by the implementer as more specifically understood from the context.
In the claims hereof, any element expressed as a means for performing a specified function is intended to encompass any way of performing that function including, for example, a) a combination of circuit elements that performs that function or b) software in any form, including, therefore, firmware, microcode or the like, combined with appropriate circuitry for executing that software to perform the function. The present principles as defined by such claims reside in the fact that the functionalities provided by the various recited means are combined and brought together in the manner which the claims call for. It is thus regarded that any means that can provide those functionalities are equivalent to those shown herein.
Reference in the specification to “one embodiment” or “an embodiment” of the present principles, as well as other variations thereof, means that a particular feature, structure, characteristic, and so forth described in connection with the embodiment is included in at least one embodiment of the present principles. Thus, the appearances of the phrase “in one embodiment” or “in an embodiment”, as well as any other variations, appearing in various places throughout the specification are not necessarily all referring to the same embodiment.
It is to be appreciated that the use of any of the following “/”, “and/or”, and “at least one of”, for example, in the cases of “A/B”, “A and/or B” and “at least one of A and B”, is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of both options (A and B). As a further example, in the cases of “A, B, and/or C” and “at least one of A, B, and C”, such phrasing is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of the third listed option (C) only, or the selection of the first and the second listed options (A and B) only, or the selection of the first and third listed options (A and C) only, or the selection of the second and third listed options (B and C) only, or the selection of all three options (A and B and C). This may be extended, as readily apparent by one of ordinary skill in this and related arts, for as many items listed.
Also, as used herein, the words “picture” and “image” are used interchangeably and refer to a still image or a picture from a video sequence. As is known, a picture may be a frame or a field.
Additionally, as used herein, the words “surrounding co-located pixels” when used, for example, with respect to creating the high resolution mosaic described herein by interpolating pixel values at pixel positions in the high resolution mosaic from pixel values of “surrounding co-located pixels” in the low resolution pictures, refers to pixels in the low resolution pictures that surround a particular pixel that is co-located (i.e., has the same position) as a target pixel currently being interpolated in the high resolution mosaic.
As noted above, the present principles are directed to methods and apparatus for sampling-based super resolution video encoding and decoding. It is to be appreciated that the present principles advantageously improve video compression efficiency. In particular, a smart down-sampling strategy which is capable of handling motion between frames is proposed. At the pre-processing stage, high-resolution (HR) frames are down-sampled to low-resolution (LR) and metadata is generated to guide post-processing. In the post-processing stage, the decoded low resolution frames and received metadata are used within a novel super resolution framework to reconstruct high resolution frames. Since only low resolution frames are encoded and the amount of metadata transmitted is low to moderate, this approach has the potential to provide increased compression ratios.
The smart down-sampling strategy takes into account motion between frames. The down-sampling strategy contributes to an improved super resolution result by creating LR frames such that they complement one another in the pixel information they carry (in other words, reducing pixel redundancy between frames). In some sense, the strategy attempts to enforce sub-pixel motion between frames.
We note that conventional video compression methods (mainly block-based prediction methods such as, for example, the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) Moving Picture Experts Group-4 (MPEG-4) Part 10 Advanced Video Coding (AVC) Standard/International Telecommunication Union, Telecommunication Sector (ITU-T) H.264 Recommendation (hereinafter the “MPEG-4 AVC Standard”)) have started reaching saturation points in compression ratios. Data pruning methods aim at improving compression efficiency beyond that achieved by standard compression methods. The main principle of such methods is to remove data before (or during) encoding and to put back the removed data at the receiver after (or during) decoding. Data pruning methods have exploited a variety of pre- and post-processing techniques for achieving their goal, e.g., block/region removal and inpainting, line removal and interpolation, and so forth.
In accordance with the present principles, intelligent down-sampling (at the transmitter) and super resolution (at the receiver) are the techniques exploited for data pruning. Super resolution is the process of increasing the resolution of images or videos by temporally integrating information across several low resolution images or frames. The principle of this data pruning approach is illustrated in FIG. 1. Turning to FIG. 1, an exemplary system/method for sampling-based super resolution is indicated generally by the reference numeral 100. High resolution (HR) frames are input and subjected to down-sampling and metadata generation at step 110 (by a down-sampler and metadata generator 151) in order to obtain low resolution (LR) frames and metadata. The low resolution frames and metadata are encoded (by an encoder 152) at step 115. The encoded low resolution frames and metadata are decoded (by a decoder 153) at step 120. The low resolution frames and metadata are subjected to super resolution post-processing (by a super resolution post-processor 154) in order to provide high resolution output frames at step 130. Thus, at the pre-processing stage (step 110), high resolution frames are down-sampled to low resolution and metadata is generated to guide post-processing. In particular, a smart down-sampling strategy which is capable of handling motion between frames is proposed. In the post-processing stage (step 125), the decoded low resolution frames and received metadata are used within a novel super resolution framework to reconstruct high resolution frames. Since only low resolution frames are encoded and the amount of metadata transmitted is low to moderate, increased compression ratios can be obtained using this approach. We note that the down-sampler and metadata generator 151 may also be considered and referred to as a pre-processor herein.
While not limited to the specific configurations of the following described encoder and decoder, encoder 152 and decoder 153 can be implemented as shown in FIGS. 2 and 3, respectively.
Turning to FIG. 2, an exemplary video encoder to which the present principles may be applied is indicated generally by the reference numeral 200. The video encoder 200 includes a frame ordering buffer 210 having an output in signal communication with a non-inverting input of a combiner 285. An output of the combiner 285 is connected in signal communication with a first input of a transformer and quantizer 225. An output of the transformer and quantizer 225 is connected in signal communication with a first input of an entropy coder 245 and a first input of an inverse transformer and inverse quantizer 250. An output of the entropy coder 245 is connected in signal communication with a first non-inverting input of a combiner 290. An output of the combiner 290 is connected in signal communication with a first input of an output buffer 235.
A first output of an encoder controller 205 is connected in signal communication with a second input of the frame ordering buffer 210, a second input of the inverse transformer and inverse quantizer 250, an input of a picture-type decision module 215, a first input of a macroblock-type (MB-type) decision module 220, a second input of an intra prediction module 260, a second input of a deblocking filter 265, a first input of a motion compensator 270, a first input of a motion estimator 275, and a second input of a reference picture buffer 280.
A second output of the encoder controller 205 is connected in signal communication with a first input of a Supplemental Enhancement Information (SEI) inserter 230, a second input of the transformer and quantizer 225, a second input of the entropy coder 245, a second input of the output buffer 235, and an input of the Sequence Parameter Set (SPS) and Picture Parameter Set (PPS) inserter 240.
An output of the SEI inserter 230 is connected in signal communication with a second non-inverting input of the combiner 290.
A first output of the picture-type decision module 215 is connected in signal communication with a third input of the frame ordering buffer 210. A second output of the picture-type decision module 215 is connected in signal communication with a second input of a macroblock-type decision module 220.
An output of the Sequence Parameter Set (SPS) and Picture Parameter Set (PPS) inserter 240 is connected in signal communication with a third non-inverting input of the combiner 290.
An output of the inverse quantizer and inverse transformer 250 is connected in signal communication with a first non-inverting input of a combiner 219. An output of the combiner 219 is connected in signal communication with a first input of the intra prediction module 260 and a first input of the deblocking filter 265. An output of the deblocking filter 265 is connected in signal communication with a first input of a reference picture buffer 280. An output of the reference picture buffer 280 is connected in signal communication with a second input of the motion estimator 275 and a third input of the motion compensator 270. A first output of the motion estimator 275 is connected in signal communication with a second input of the motion compensator 270. A second output of the motion estimator 275 is connected in signal communication with a third input of the entropy coder 245.
An output of the motion compensator 270 is connected in signal communication with a first input of a switch 297. An output of the intra prediction module 260 is connected in signal communication with a second input of the switch 297. An output of the macroblock-type decision module 220 is connected in signal communication with a third input of the switch 297. The third input of the switch 297 determines whether or not the “data” input of the switch (as compared to the control input, i.e., the third input) is to be provided by the motion compensator 270 or the intra prediction module 260. The output of the switch 297 is connected in signal communication with a second non-inverting input of the combiner 219 and an inverting input of the combiner 285.
A first input of the frame ordering buffer 210 and an input of the encoder controller 205 are available as inputs of the encoder 200, for receiving an input picture. Moreover, a second input of the Supplemental Enhancement Information (SEI) inserter 230 is available as an input of the encoder 200, for receiving metadata. An output of the output buffer 235 is available as an output of the encoder 200, for outputting a bitstream.
Turning to FIG. 3, an exemplary video decoder to which the present principles may be applied is indicated generally by the reference numeral 300. The video decoder 300 includes an input buffer 310 having an output connected in signal communication with a first input of an entropy decoder 345. A first output of the entropy decoder 345 is connected in signal communication with a first input of an inverse transformer and inverse quantizer 350. An output of the inverse transformer and inverse quantizer 350 is connected in signal communication with a second non-inverting input of a combiner 325. An output of the combiner 325 is connected in signal communication with a second input of a deblocking filter 365 and a first input of an intra prediction module 360. A second output of the deblocking filter 365 is connected in signal communication with a first input of a reference picture buffer 380. An output of the reference picture buffer 380 is connected in signal communication with a second input of a motion compensator 370.
A second output of the entropy decoder 345 is connected in signal communication with a third input of the motion compensator 370, a first input of the deblocking filter 365, and a third input of the intra predictor 360. A third output of the entropy decoder 345 is connected in signal communication with an input of a decoder controller 305. A first output of the decoder controller 305 is connected in signal communication with a second input of the entropy decoder 345. A second output of the decoder controller 305 is connected in signal communication with a second input of the inverse transformer and inverse quantizer 350. A third output of the decoder controller 305 is connected in signal communication with a third input of the deblocking filter 365. A fourth output of the decoder controller 305 is connected in signal communication with a second input of the intra prediction module 360, a first input of the motion compensator 370, and a second input of the reference picture buffer 380.
An output of the motion compensator 370 is connected in signal communication with a first input of a switch 397. An output of the intra prediction module 360 is connected in signal communication with a second input of the switch 397. An output of the switch 397 is connected in signal communication with a first non-inverting input of the combiner 325.
An input of the input buffer 310 is available as an input of the decoder 300, for receiving an input bitstream. A first output of the deblocking filter 365 is available as an output of the decoder 300, for outputting an output picture.
Principle of Sampling-Based Super Resolution
The central idea of sampling-based SR is illustrated in FIGS. 4A-4D and 5A-5D. Turning to FIGS. 4A-4D, data and steps relating to the pre-processing stage of a sampling-based super resolution method are indicated generally by the reference numeral 400. In particular, FIG. 4A shows an input set of high resolution (HR) frames 410. FIG. 4B shows the estimation 420 of motion transformations Θt1 with respect to the reference frame. FIG. 4C shows the estimation 430 of sampling grids St based on a super resolution filling factor in reference frame coordinates (note that I refers to an identity transform). FIG. 4D shows down-sampled low resolution (LR) frames and corresponding metadata 440.
Turning to FIGS. 5A-5D, data and steps relating to the post-processing stage of a sampling-based super resolution method are indicated generally by the reference numeral 500. In particular, FIG. 5A shows a decoded set 510 of low resolution frames and metadata. FIG. 5B shows the creation 520 of a super resolution mosaic from low resolution frames in reference frame coordinates (note that I refers to an identity transform). FIG. 5C shows the reconstruction 530 of high resolution frames from the super resolution mosaic. FIG. 5D shows super resolved high resolution frames 540.
Referring to FIG. 5A, which relates to the post-processing stage, initially a set of decoded LR frames L̂1-L̂4 is available along with some relevant metadata. One of the frames is known to be a reference frame (L̂1 in FIG. 5A). In FIG. 5B, a super resolution mosaic is constructed after transforming the pixels in the low resolution frames into a common coordinate system (coinciding with that of the reference frame) using the metadata information. Thereafter, with respect to FIG. 5C, each high resolution frame in the set is reconstructed by combining the information in the super resolution mosaic (transformed back to the current frame coordinates) and the corresponding low resolution frame using the metadata information. In order to carry out the above post-processing steps, the metadata needs to describe the motion of pixels between each frame and the reference frame, and the down-sampling process used to create each low resolution frame from the corresponding high resolution frame (at the pre-processing stage). This information is determined at the pre-processing stage and sent as metadata.
Referring to FIG. 4A, an input high resolution video is divided into sets of frames which are processed separately. Let us consider a set of high resolution frames H1-H4 where H1 is taken to be the reference frame. In FIG. 4B, the motion between each frame and the reference frame is estimated. In FIG. 4B, the motion transformation from Ht to H1 is denoted by Θt1. In FIG. 4C, the (down-)sampling grids St are selected for each frame Ht in order to create the corresponding low resolution frame Lt. In FIG. 4D, the (down-sampled) low resolution frames Lt can be compressed using an encoder and sent to the receiver along with the corresponding metadata (motion and sampling grid information). In the post-processing stage at the receiver, the decoded low resolution frames along with the metadata information are used to reconstruct the high resolution frames as described earlier.
In the following, we shall further describe the steps involved in the pre-processing and post-processing stages.
Pre-Processing Stage of Sampling-Based Super Resolution
In the pre-processing stage, the input high resolution video is first divided into sets of contiguous frames. Each set is then processed separately. Typically, we choose M² frames in each set where M is the down-sampling factor, i.e., the ratio of high resolution to low resolution frame dimensions. The reasoning here is that a high resolution frame includes M² times the number of pixels as a low resolution frame, therefore it should take M² LR frames to construct a super resolution mosaic with the same size as a high resolution frame.
Let us now consider the case where the down-sampling factor is 2 (i.e., M=2) and then consider a set of four high resolution frames (Ht; t=1, 2, 3, 4).
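The grouping of frames into sets can be sketched as follows. This is only an illustrative sketch: the function name `split_into_sets` and the handling of a final, shorter set are assumptions made here and are not prescribed by the present principles.

```python
def split_into_sets(num_frames, M):
    """Group frame indices 0..num_frames-1 into contiguous sets of M*M frames.

    M is the down-sampling factor. A trailing set may hold fewer than M*M
    frames; how such a set is treated is an assumption of this sketch.
    """
    set_size = M * M  # an HR frame holds M*M times the pixels of an LR frame
    return [
        list(range(start, min(start + set_size, num_frames)))
        for start in range(0, num_frames, set_size)
    ]


# With M = 2, each full set holds four frames (H1..H4 in the text).
frame_sets = split_into_sets(10, 2)
```

For M=2 and ten input frames, the sketch yields two full sets of four frames and one remaining set of two frames.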
Turning to FIG. 6, an exemplary method relating to a pre-processing stage of a sampling-based super resolution method is indicated generally by the reference numeral 600. The method 600 includes a start block 605 that passes control to a function block 610. The function block 610 inputs high resolution video frames, t=1, . . . , N, and passes control to a function block 615. The function block 615 performs global motion estimation between each frame and the reference frame to obtain motion parameters therefor, and passes control to a function block 620. The function block 620 performs a sampling grid selection for each frame based on criteria related to super resolution quality to obtain sampling grid indices, and passes control to a function block 625. The function block 625 down-samples the high resolution frames in order to obtain low resolution frames, and passes control to a function block 630. The function block 630 outputs the low resolution frames, t=1, . . . , N, to an encoder, and passes control to an end block 699.
Further details relating to the steps involved in the pre-processing stage (e.g., as shown with respect to FIGS. 4 and 6) are provided as follows:
1. Motion estimation: Let H1 be the reference frame. Estimate the motion from each frame Ht to the reference frame (FIG. 4B). The motion transformation from Ht to H1 is denoted by Θt1.
2. Sampling grid selection: For each frame Ht, the sampling grid St indicates the pixels that are taken from Ht in order to create the corresponding LR frame Lt. The grids St are chosen such that each frame provides complementary pixel information for the super resolution process in the post-processing stage (FIGS. 5A-5D). Motion between frames is accounted for during the grid selection process.
3. Down-sampling: Using the selected grids St, each of the low resolution frames Lt is created. The low resolution frames are then compressed using an encoder and sent to the receiver. Information regarding the motion between the frames and the sampling grids used are also sent as metadata.
Each of the preceding steps will be further described hereinafter.
Motion Estimation
For illustrative purposes, we will now discuss one way of estimating the motion from each frame Ht in a given set to the reference frame of the set (FIG. 4B). Without loss of generality, it is presumed that the reference frame is H1. Let us simplify the problem by presuming that there is only global motion among the frames. In other words, we presume that the motion of the pixels between any two frames can be described by a global transformation with a few parameters. Examples of global transformations include translation, rotation, affine warp, projective transformation, and so forth.
In order to estimate the motion from frame Hi to frame Hj, we first choose a parametric global motion model that describes the motion between frames. Using the data from Hi and Hj, the parameters θij of the model are then determined. Henceforth, we shall denote the transformation by Θij and its parameters by θij. The transformation Θij can then be used to align (or warp) Hi to Hj (or vice versa using the inverse model Θji=Θij⁻¹).
Global motion can be estimated using a variety of models and methods. One commonly used model is the projective transformation given as follows:
$$x' = \frac{a_1 x + a_2 y + a_3}{c_1 x + c_2 y + 1}, \qquad y' = \frac{b_1 x + b_2 y + b_3}{c_1 x + c_2 y + 1}. \tag{1}$$
The above equations give the new position (x′, y′) in Hj to which the pixel at (x, y) in Hi has moved. Thus, the eight model parameters θij={a1, a2, a3, b1, b2, b3, c1, c2} describe the motion from Hi to Hj. The parameters are usually estimated by first determining a set of point correspondences between the two frames and then using a robust estimation framework such as RANdom SAmple Consensus (RANSAC) or its variants. Point correspondences between frames can be determined by a number of methods, e.g. extracting and matching Scale-invariant Feature Transform (SIFT) features or using optical flow.
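As an illustration of equation (1), the following sketch maps a pixel position through the eight-parameter projective model. The function name `apply_projective` and the ordering of the parameter tuple are assumptions made for this example; estimating the parameters (e.g., via RANSAC) is not shown.

```python
def apply_projective(theta, x, y):
    """Map (x, y) in Hi to (x', y') in Hj using the eight-parameter model.

    theta is assumed to be ordered (a1, a2, a3, b1, b2, b3, c1, c2),
    matching the parameter set listed in the text.
    """
    a1, a2, a3, b1, b2, b3, c1, c2 = theta
    w = c1 * x + c2 * y + 1.0  # common denominator of both coordinates
    x_new = (a1 * x + a2 * y + a3) / w
    y_new = (b1 * x + b2 * y + b3) / w
    return x_new, y_new


# Identity model: every pixel stays in place.
IDENTITY = (1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0)
# Pure translation by (+2, -1), a special case of the projective model.
TRANSLATION = (1.0, 0.0, 2.0, 0.0, 1.0, -1.0, 0.0, 0.0)

moved = apply_projective(TRANSLATION, 3.0, 4.0)
```

With c1=c2=0 the denominator is 1 and the model reduces to an affine warp, which is why translation, rotation, and affine motion are all covered by the same eight parameters.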
For the sampling-based super resolution procedure, the motion from each frame Ht to the reference frame (H1) has to be estimated. Hence, three sets of parameters are estimated: θ21; θ31; and θ41 (corresponding to transformations Θ21, Θ31 and Θ41, respectively). The transformation is invertible and the inverse model Θji=Θij⁻¹ describes the motion from Hj to Hi.
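One way to obtain the inverse model, sketched here under the assumption that the eight parameters are arranged as a 3x3 matrix with rows (a1, a2, a3), (b1, b2, b3), (c1, c2, 1), is to invert that matrix and rescale it so that its lower-right entry is again 1. The helper names below are hypothetical.

```python
def apply_projective(theta, x, y):
    """Apply the eight-parameter projective model to (x, y)."""
    a1, a2, a3, b1, b2, b3, c1, c2 = theta
    w = c1 * x + c2 * y + 1.0
    return (a1 * x + a2 * y + a3) / w, (b1 * x + b2 * y + b3) / w


def invert_projective(theta, eps=1e-12):
    """Return the parameters of the inverse transformation.

    The 3x3 matrix is inverted through its adjugate. A projective
    transformation is defined only up to scale, so the result is
    normalized by its lower-right entry instead of by the determinant.
    """
    a, b, c, d, e, f, g, h = theta
    i = 1.0
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    det = a * adj[0][0] + b * adj[1][0] + c * adj[2][0]
    scale = adj[2][2]
    if abs(det) < eps or abs(scale) < eps:
        raise ValueError("transformation cannot be inverted in this form")
    n = [[v / scale for v in row] for row in adj]
    return (n[0][0], n[0][1], n[0][2], n[1][0], n[1][1], n[1][2], n[2][0], n[2][1])


theta_21 = (1.1, 0.02, 3.0, -0.01, 0.95, -2.0, 1e-4, 2e-4)  # made-up values
theta_12 = invert_projective(theta_21)
# Mapping a point forward and then back should return the starting point.
x_back, y_back = apply_projective(theta_12, *apply_projective(theta_21, 10.0, 20.0))
```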
Sampling Grid Selection
For each high resolution frame Ht, a sampling grid St has to be selected in order to down-sample the frame and create the low resolution version Lt. A sampling grid indicates the pixels in the high resolution frame that are taken and packed into the corresponding low resolution frame. Turning to FIGS. 7A-7F, examples of sampling grids used for down-sampling high resolution (HR) frames to low resolution (LR) are indicated generally by the reference numeral 700. In further detail, FIG. 7A shows the pixels 710 in the high resolution frames. FIG. 7B shows four uniform sampling grids 720 with a down-sampling factor of 2. A symbol “o” represents a first sampling grid g1, a symbol “+” represents a second sampling grid g2, a symbol “x” represents a third sampling grid g3, and a symbol “Δ” represents a fourth sampling grid g4. FIG. 7C shows the low resolution frame 730 resulting from the first sampling grid g1. FIG. 7D shows the low resolution frame 740 resulting from the second grid g2. FIG. 7E shows the low resolution frame 750 resulting from the third grid g3. FIG. 7F shows the low resolution frame 760 resulting from the fourth grid g4.
Turning to FIGS. 8A-8D, additional uniform sampling grids are indicated generally by the reference numeral 800. In further detail, FIG. 8A shows a horizontally staggered grid 810. FIG. 8B shows sampling grids g5, g6, g7, and g8, collectively indicated by the reference numeral 820, and individually represented by the following respective symbols: o; +; x; and Δ. FIG. 8C shows a vertically staggered sampling grid 830. FIG. 8D shows sampling grids g9, g10, g11, and g12, collectively indicated by the reference numeral 840, and individually represented by the following respective symbols: o; +; x; and Δ.
Let us constrain ourselves here to use only uniform sampling grids, i.e., those that have a uniform density of coverage across all portions of the high resolution frame. There are distinct advantages of using a uniform grid. First, it roughly preserves the spatial and temporal relationships present among pixels in the high resolution frames and this helps the encoder (e.g., encoder 152 in FIG. 1, encoder 200 in FIG. 2) exploit spatio-temporal redundancies in the video for efficient compression. Second, in case the sampling-based super resolution system fails, a uniformly sampled frame can then be spatially interpolated to create a high resolution frame, thus ensuring a minimum quality of experience. Third, it is easier to pack pixels sampled using a uniform grid into a low resolution frame.
The sampling grid selection process is posed as a problem of selecting, for each high resolution frame Ht, an appropriate sampling grid St from a candidate pool of grids G={gi; i=1, . . . , NG}. In one embodiment, we choose from 12 candidate grids g1-g12 shown in FIGS. 7B, 8B, and 8D. Note that the staggered grids g5-g12 can potentially capture slightly rotated or sheared grids of pixels better than the rectangular grids g1-g4.
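By way of illustration, the four rectangular candidate grids for a down-sampling factor of 2 may be generated as sketched below. The mapping of grid indices to pixel offsets is an assumption of this sketch (the actual assignment is given by FIG. 7B), and the staggered grids g5-g12 are not generated here.

```python
def rectangular_grids(width, height, M=2):
    """Return {index: [(x, y), ...]} for the M*M rectangular sampling grids.

    Assumed index convention: index 1 starts at offset (0, 0), and indices
    advance first along x and then along y.
    """
    grids = {}
    index = 1
    for dy in range(M):
        for dx in range(M):
            grids[index] = [
                (x, y)
                for y in range(dy, height, M)
                for x in range(dx, width, M)
            ]
            index += 1
    return grids


candidate_pool = rectangular_grids(4, 4)
covered = set()
for positions in candidate_pool.values():
    covered.update(positions)
```

In this sketch the four grids are disjoint and together cover every pixel of the high resolution frame, which is the complementary property that the grid selection process seeks to achieve across frames.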
The basic criterion we shall employ in selecting grids is to maximize the expected quality of the super resolution result (i.e., the super resolution mosaic) at the post-processing stage. In practice, this is achieved by choosing grids St such that each frame provides complementary pixel information for the super resolution process. The grid selection process proceeds by replicating part of the super resolution mosaic creation process. In one embodiment, the criterion used to select grids is the super resolution filling factor.
Turning to FIGS. 9A-9D, steps relating to the selection of sampling grids are indicated generally by the reference numeral 900. In particular, FIG. 9A shows a step 910 where a sampling grid is chosen for a reference frame. FIG. 9B shows a step 920 where an unfilled super resolution frame (HSR) of the same size as H1 is initialized (and where I is the identity transform since we presume that there is no motion between HSR and H1). FIG. 9C shows a step 930 where a filling factor is computed for each candidate grid. FIG. 9D shows a step 940 where the previous steps are repeated for each frame Ht to select the corresponding St.
The preceding method 900 for sampling grid selection may also be further described as follows (presuming a set of four frames, H1 being the reference frame):
1. Compute the motion transformations Θt1 from each frame Ht to the reference frame (H1).
2. Choose the sampling grid for the reference frame as S1=g1.
3. Initialize an “unfilled” super resolution frame (HSR) in the coordinates of the reference frame (i.e. assuming there is no motion between HSR and H1). “Fill” the pixels in HSR corresponding to pixel positions given by grid S1.
4. For each remaining HR frame Ht (t≠1), compute the filling factor of each possible candidate grid in G. The filling factor of a candidate grid gi is defined as the number of previously unfilled pixels in HSR that are filled when gi is selected for Ht. The grid gi* that results in the highest filling factor is then selected (i.e., St=gi*) and the corresponding pixels in HSR are filled (taking into account the motion transformation Θt1).
5. If all the frames Ht in the set have been processed, terminate. Otherwise, go back to step 4.
In step 4, the filling factor of a candidate grid gi is computed as follows. First consider each grid gi∈G in turn for Ht, transform (move) the pixels given by gi to HSR using Θt1 (rounding to the nearest pixel position in HSR), and compute the filling factor by recording how many previously unfilled pixel positions in HSR are filled by the transformed pixels. Thereafter, the grid gi* that results in the highest filling factor is selected (i.e. St=gi*). Note that the selected grids St and the resulting super resolution quality may depend on the order in which the frames Ht are processed. One ordering strategy is to consider frames in increasing order of their temporal distance from the reference frame. For example, if H2 is the reference frame, then the other frames are processed in the following order: H1; H3; and H4.
Variations of the filling factor measure or entirely different metrics involving super resolution quality may be used as criteria for grid selection. For example, instead of declaring each pixel in HSR as being filled or unfilled, we could keep track of the number of grid pixels mapped to each pixel therein. Thereafter, the filling factor could be redefined as a measure of incremental information wherein grids that have greater incremental contribution to HSR score higher. Another criterion for grid selection could involve completely replicating the super resolution process (using the previously selected grids S1-St−1 and the current candidate grids for St) and choosing a grid St that results in the highest SR quality, e.g., based on PSNR with respect to the reference frame.
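Steps 2 through 5 above may be sketched as a greedy loop, as follows. This is a minimal sketch under several assumptions: only the four rectangular grids are used as candidates (with an assumed index-to-offset mapping), each motion transformation is supplied as a plain function, ties between grids are broken in favor of the lowest index, and half-way positions are rounded upward. None of these choices is dictated by the present principles.

```python
import math


def nearest(v):
    """Round to the nearest integer position (halves rounded upward)."""
    return math.floor(v + 0.5)


def select_grids(frame_order, grids, warps, width, height):
    """Greedy sampling grid selection based on the filling factor.

    frame_order: frame identifiers, reference frame first.
    grids: {i: [(x, y), ...]}, the candidate pool G.
    warps: {t: function (x, y) -> (x', y')} standing in for each motion
           transformation to the reference frame.
    """
    reference = frame_order[0]
    selected = {reference: 1}      # step 2: S1 = g1
    filled = set(grids[1])         # step 3: fill HSR at the positions of S1
    for t in frame_order[1:]:      # steps 4 and 5
        best_i, best_new = None, set()
        for i in sorted(grids):
            mapped = set()
            for x, y in grids[i]:
                xr, yr = warps[t](x, y)
                px, py = nearest(xr), nearest(yr)
                if 0 <= px < width and 0 <= py < height:
                    mapped.add((px, py))
            new = mapped - filled  # previously unfilled positions only
            if best_i is None or len(new) > len(best_new):
                best_i, best_new = i, new
        selected[t] = best_i
        filled |= best_new
    return selected, filled


def demo():
    w = h = 4
    offsets = {1: (0, 0), 2: (1, 0), 3: (0, 1), 4: (1, 1)}  # assumed mapping
    grids = {}
    for i, (dx, dy) in offsets.items():
        grids[i] = [(x, y) for y in range(dy, h, 2) for x in range(dx, w, 2)]
    warps = {
        2: lambda x, y: (x + 1, y),  # H2 is shifted one pixel relative to H1
        3: lambda x, y: (x, y),
        4: lambda x, y: (x, y),
    }
    return select_grids([1, 2, 3, 4], grids, warps, w, h)


selected, filled = demo()
```

In this toy example the frame H2 is assigned the same grid index as the reference frame: because of the one-pixel shift, its pixels land on positions of HSR that are still unfilled. This illustrates how motion between frames is accounted for during grid selection.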
Down-Sampling High Resolution to Low Resolution
After the grid selection process, each high resolution frame Ht has a corresponding sampling grid St. Depending on the nature of St, Ht is down-sampled to the low resolution frame Lt as follows:
    • In the case that St is a rectangular grid (FIG. 7B), i.e., St=gi (i=1, 2, 3, 4), Lt is formed by taking the pixels from St and packing them horizontally and vertically as illustrated in FIGS. 7C-7F.
    • In the case that St is a horizontally staggered grid (FIG. 8B), i.e., St=gi (i=5, 6, 7, 8), each row with sampled pixels is shifted to the left so that the first sampled pixels in all these rows align vertically. Thereafter, Lt is formed by packing the pixels as described above.
    • In the case that St is a vertically staggered grid (FIG. 8D), i.e., St=gi (i=9, 10, 11, 12), each column with sampled pixels is shifted upward so that the first sampled pixels in all these columns align horizontally. Thereafter, Lt is formed by packing the pixels as described above.
For uniform sampling grids with various other structures, a suitable packing strategy may be devised so as to form a rectangular low resolution frame using the pixels sampled from the high resolution frame.
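The packing of sampled pixels into a low resolution frame may be sketched as follows for the rectangular and horizontally staggered cases. The function name `pack_rows` is hypothetical, and the staggered grid used in the example is an assumed geometry chosen only for illustration (the actual grids g5-g8 are those of FIG. 8B). A vertically staggered grid would be handled analogously, column by column.

```python
def pack_rows(frame, grid):
    """Pack the pixels of `frame` selected by `grid` into an LR frame.

    frame: list of rows of pixel values (frame[y][x]).
    grid:  iterable of sampled positions (x, y).
    Sampled pixels of each row are placed left-aligned in increasing x
    order, which corresponds to shifting every sampled row to the left;
    rows without sampled pixels are dropped.
    """
    rows = {}
    for x, y in grid:
        rows.setdefault(y, []).append(x)
    return [[frame[y][x] for x in sorted(xs)] for y, xs in sorted(rows.items())]


# A 4x4 HR frame whose pixel values equal their raster index.
hr = [[4 * y + x for x in range(4)] for y in range(4)]

# Rectangular grid with offset (0, 0) and down-sampling factor 2.
rect = [(0, 0), (2, 0), (0, 2), (2, 2)]
lr_rect = pack_rows(hr, rect)

# Assumed horizontally staggered grid: alternate sampled rows are offset by one pixel.
staggered = [(0, 0), (2, 0), (1, 2), (3, 2)]
lr_staggered = pack_rows(hr, staggered)
```

Both grids yield a rectangular 2x2 low resolution frame, as required for compression by a conventional encoder.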
The low resolution frames thus created are then compressed using a video encoder. The side information, including the estimated motion transformation parameters (θ21, θ31, θ41) and the selected sampling grids (S1, S2, S3, S4), is transmitted as metadata. Note here that it is sufficient to send the sampling grid indices instead of the grids themselves (i.e., if St=gi, send i). The grids are then known from a lookup table at the post-processing stage.
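A possible in-memory layout for this metadata is sketched below. The class name `FrameSetMetadata`, its fields, the parameter values, and the contents of the lookup table are all hypothetical; the bitstream syntax actually used to carry the metadata (e.g., within an SEI message) is not shown.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class FrameSetMetadata:
    """Side information for one set of frames (hypothetical layout)."""
    reference: int                        # index t of the reference frame
    grid_index: Dict[int, int]            # t -> i such that St = gi
    motion: Dict[int, Tuple[float, ...]]  # t -> eight motion parameters


# Lookup table shared by pre- and post-processing. Here each rectangular
# grid is described by an assumed (dx, dy) pixel offset.
GRID_TABLE = {1: (0, 0), 2: (1, 0), 3: (0, 1), 4: (1, 1)}


def grid_for(meta, t):
    """Recover the sampling grid description of frame t from its index."""
    return GRID_TABLE[meta.grid_index[t]]


meta = FrameSetMetadata(
    reference=1,
    grid_index={1: 1, 2: 1, 3: 3, 4: 4},
    motion={
        2: (1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0),  # made-up values
        3: (1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0),
        4: (1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0),
    },
)
```

Only one small integer per frame is needed for the grid, and no motion parameters are needed for the reference frame since its transformation is the identity.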
Post-Processing Stage of Sampling-Based SR
At the post-processing stage, we use the decoded low resolution frames and the metadata to reconstruct the corresponding high resolution frames, a process known as super resolution (SR). Turning to FIG. 10, an exemplary method relating to a post-processing stage of a sampling-based super resolution method is indicated generally by the reference numeral 1000. The method 1000 includes a start block 1005 that passes control to a function block 1010. The function block 1010 inputs low resolution video frames from a decoder, t=1, . . . , N, and passes control to a function block 1015. The function block 1015 transforms valid pixels from each low resolution frame to super resolution mosaic coordinates, and passes control to a function block 1020. The function block 1020 creates a super resolution mosaic by interpolating values at integer pixel positions, and passes control to a function block 1025. The function block 1025 reconstructs each high resolution frame by reverse transforming the super resolution mosaic to the high resolution frame coordinates, and passes control to a function block 1030. The function block 1030 reconstructs the high resolution frames, t=1, . . . , N, and passes control to an end block 1099. A function block 1035 provides sampling grid indices to the function block 1015 for use thereby. A function block 1040 provides motion parameters (metadata) to the function blocks 1015 and 1025 for use thereby.
Suppose that we have a set of decoded LR frames L̂t corresponding to the set of high resolution frames Ht (t=1, 2, 3, 4) at the pre-processing stage (FIGS. 4A-4D). The metadata includes the motion parameters and the sampling grid indices. The following are the steps (see FIGS. 5A-5D) involved in reconstructing the high resolution frames Ĥt using the decoded low resolution frames and the side information:
1. Creation of super resolution mosaic from low resolution frames: In this step, a high-resolution “SR” mosaic image ĤSR is created using the pixels from the set of decoded low resolution frames and the side-information. This will serve as a reference image from which the HR frames will be reconstructed. In further detail, a portion of each reconstructed HR frame will come from the SR mosaic and the remaining portions will be spatially interpolated from the corresponding LR frame pixels.
2. Reconstruction of high resolution frames: Each high resolution frame Ĥt in the set is reconstructed using the super resolution mosaic image ĤSR and the low resolution frame L̂t, using the side-information to guide the process.
These steps are further explained herein below.
Creation of Super Resolution Mosaic from Low Resolution Frames
In this step, a high-resolution super resolution mosaic image ĤSR is constructed using the set of decoded low resolution frames L̂t (t=1, 2, 3, 4) and the associated metadata, which comprises the grids St used to create the low resolution frames Lt and the transformations Θt1 from each frame to the reference frame in the set (Frame at t=1 in FIG. 5A). ĤSR is assumed to be in the same coordinates as the reference frame, i.e., there is no motion between ĤSR and Ĥ1. The following are the steps to construct ĤSR:
1. For the time being, consider ĤSR to be a continuous 2-D pixel space wherein non-integer pixel positions may exist, e.g., ĤSR (1.44, 2.35)=128.
2. Fill in the pixel positions in ĤSR given by the transformed grid positions Θt1(St) with the corresponding pixel values in the decoded low resolution frame L̂t. Do this for each decoded low resolution frame in the set (t=1, 2, 3, 4). Note that Θ11=I (identity transformation) since there is no motion between ĤSR and Ĥ1.
3. Finally, an image ĤSR is constructed by interpolating the pixel values at all integer pixel positions where sufficient (e.g., as determined using a threshold) data is available, from the surrounding pixel values at each of those positions. A variety of (non-uniform) spatial interpolation methods are available for this operation. These methods take a set of pixel positions and corresponding values, and output interpolated values at any number of other positions. For example, the griddata function of MATLAB can be used to carry out this interpolation.
The result of the above steps is the super resolution mosaic image ĤSR. In addition, a validity map may be computed to determine which pixels of ĤSR include reliable information so that only these pixels are used in the reconstruction of high resolution frames. A measure of validity may be computed at each pixel of the mosaic image based on the samples (e.g., the number or density of the samples) in a neighborhood around the pixel. Thereafter, a pixel in the mosaic is used in the reconstruction process only if its validity value is high enough (e.g., above a given threshold).
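The mosaic creation steps and the validity map may be sketched as follows. This is only a minimal illustration under simplifying assumptions: each transformation to the reference frame is taken to be a pure translation (dx, dy), inverse-distance weighting within a fixed radius is substituted for a general scattered-data interpolator, and the validity measure is simply the number of samples within that radius. The function name and the parameters radius and min_samples are hypothetical.

```python
import math

def build_sr_mosaic(low_res_frames, grids, shifts, height, width,
                    radius=1.5, min_samples=2):
    """Scatter LR samples into reference coordinates, then interpolate.

    low_res_frames[t][k] is the value of the k-th sample of frame t,
    grids[t][k] is its (x, y) position in frame t, and shifts[t] is an
    assumed translation (dx, dy) standing in for the transformation to
    the reference frame.
    """
    # Steps 1-2: place every sample at its (possibly non-integer) position.
    samples = []
    for values, grid, (dx, dy) in zip(low_res_frames, grids, shifts):
        for (x, y), v in zip(grid, values):
            samples.append((x + dx, y + dy, v))

    mosaic = [[None] * width for _ in range(height)]
    validity = [[0] * width for _ in range(height)]
    # Step 3: interpolate at integer positions from nearby samples.
    for yi in range(height):
        for xi in range(width):
            near = [(math.hypot(x - xi, y - yi), v)
                    for x, y, v in samples
                    if math.hypot(x - xi, y - yi) <= radius]
            validity[yi][xi] = len(near)   # sample count as validity
            if len(near) < min_samples:
                continue                   # not enough data: leave as None
            exact = [v for d, v in near if d == 0.0]
            if exact:
                mosaic[yi][xi] = exact[0]  # a sample sits exactly here
            else:
                wsum = sum(1.0 / d for d, _ in near)
                mosaic[yi][xi] = sum(v / d for d, v in near) / wsum
    return mosaic, validity
```

Pixels left as None correspond to mosaic positions whose validity falls below the threshold and which would therefore not be used when reconstructing the high resolution frames.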
Reconstruction of High Resolution Frames
Now each high resolution frame Ĥt (t=1, 2, 3, 4) is reconstructed as follows:
1. For the time being, consider Ĥt to be a continuous 2-D pixel space wherein non-integer pixel positions may exist. Fill in the pixel positions in Ĥt given by the grid St with the corresponding pixel values in L̂t.
2. Transform the pixel positions in ĤSR using the motion transformation Θ1t. Note that Θ1t is the inverse transformation of Θt1. If an integer pixel position x in ĤSR maps to a position y in the Ĥt space after transformation [i.e., y=Θ1t(x)], then fill in y with the corresponding value in ĤSR, i.e., Ĥt (y)=ĤSR(x).
3. Finally, the high resolution frame Ĥt is reconstructed by interpolating the pixel values at all integer pixel positions in the frame from the surrounding pixel values at each of those positions. This is handled using a spatial interpolation method as described in the previous section (step 3). Pixels outside the frame boundaries are not determined.
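The three reconstruction steps above may be sketched in the same spirit. The sketch assumes that the transformation from the mosaic to frame t is a pure translation (dx, dy), that invalid mosaic pixels are marked with None, and that inverse-distance weighting is an acceptable stand-in for the spatial interpolation method; all function and parameter names are hypothetical.

```python
import math

def idw(samples, xi, yi, radius=1.5):
    """Inverse-distance-weighted value at (xi, yi), or None if no data."""
    num = den = 0.0
    for x, y, v in samples:
        d = math.hypot(x - xi, y - yi)
        if d == 0.0:
            return v                   # a sample sits exactly here
        if d <= radius:
            num += v / d
            den += 1.0 / d
    return num / den if den else None

def reconstruct_frame(mosaic, lr_values, grid, shift_1t, height, width):
    """Rebuild one HR frame from its own LR samples plus the mosaic.

    shift_1t is an assumed translation (dx, dy) standing in for the
    transformation from the reference (mosaic) coordinates to frame t.
    """
    # Step 1: samples the frame itself contributed, at grid positions.
    samples = [(x, y, v) for (x, y), v in zip(grid, lr_values)]
    # Step 2: move valid mosaic pixels into frame-t coordinates.
    dx, dy = shift_1t
    for y, row in enumerate(mosaic):
        for x, v in enumerate(row):
            if v is not None:          # None marks an invalid mosaic pixel
                samples.append((x + dx, y + dy, v))
    # Step 3: interpolate at every integer position inside the frame.
    return [[idw(samples, xi, yi) for xi in range(width)]
            for yi in range(height)]
```

Positions for which no sample lies within the radius are returned as None, which mirrors the statement that pixels without supporting data are not determined.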
Handling Foreground Objects
So far, we have assumed that the motion between frames is fully described by a global motion model, i.e., all pixels adhere to this motion model. We now present a strategy to handle foreground objects. Foreground objects are defined as objects (or regions) that do not follow the global motion between frames. In other words, these objects have motions that are different from the global motion between frames. Turning to FIGS. 11A-B, the motion of a foreground object between two frames (Frame 1 and Frame t) is indicated generally by the reference numeral 1100. To avoid artifacts in the sampling-based super resolution process, it is important to locate foreground objects and use this knowledge during certain steps in the procedure. The foreground may be represented by a binary mask Ft, where Ft=1 indicates the foreground pixels, and Ft=0 indicates the background pixels.
Suppose we have obtained a binary mask Ft (as shown in FIG. 11B) for each frame indicating the foreground pixels therein. Let FGt be the set of all pixels with Ft=1 and F̅G̅t (the complement of FGt) be the set of all pixels with Ft=0. Then, this information may be used as follows:
    • In the sampling grid selection process, the foreground regions may be excluded while determining the sampling grids for sampling Ht to create Lt. In steps 3 and 4, we could avoid mapping pixels in FGt from Ht to HSR. Thus, the filling factor (or other measure) is computed based on background pixels alone. Furthermore, during sampling grid estimation, sufficiently flat regions in Ht could be considered to be part of FGt. This could improve super resolution by giving higher importance to regions with details during the grid selection process. Flat regions could be determined based on measures such as spatial variance.
    • Prior to down-sampling a high resolution frame to low resolution, an anti-aliasing filter may be applied to foreground regions in the frame. Since foreground regions are not super-resolved in the current embodiment, the anti-aliasing operation may help in obtaining a better spatial interpolation result for these regions at the post-processing stage.
    • Consider the super resolution mosaic creation process. In step 2, we may avoid transforming foreground pixels (FGt) of L̂t to ĤSR.
    • In step 2 of the high resolution frame reconstruction process, we could discard any transformed pixels from ĤSR to Ĥt falling inside regions defined by FGt. Furthermore, in step 1, we could (optionally) choose not to use pixels from St that map inside regions defined by FGt .
    • In the previous two alterations, sufficiently (as determined, e.g., using a threshold) flat regions in L̂t or Ĥt could be considered to be part of FGt. In this case, spatial interpolation will be used to up-sample these regions.
    • Thus far, the foreground regions in the reconstructed high resolution frame Ĥt are just spatially interpolated from the pixels in the corresponding decoded low resolution frame L̂t. Information from other low resolution frames is not explicitly exploited to super-resolve these regions. However, it may be possible to send some additional information such as block motion vectors (to exploit sub-pixel motion between frames) or high resolution patches as metadata in order to super-resolve the foreground regions in part or in full at the receiver side.
In addition to the above, other criteria using the foreground information may be used to improve the quality of the result.
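Two of the uses of the foreground mask listed above may be illustrated with a short sketch: skipping foreground pixels when filling the super resolution mosaic, and treating sufficiently flat regions as foreground based on spatial variance. The sketch assumes a translational motion model, a 3x3 neighborhood for the variance, and hypothetical function names and thresholds.

```python
def background_samples(lr_values, grid, fg_mask, shift):
    """Return mosaic samples (x, y, value) from background pixels only.

    grid holds integer (x, y) positions in the HR frame, fg_mask[y][x]
    is the binary mask F_t, and shift is an assumed translation (dx, dy)
    to the reference frame.
    """
    dx, dy = shift
    kept = []
    for (x, y), v in zip(grid, lr_values):
        if fg_mask[y][x] == 1:     # F_t = 1: foreground, do not transform
            continue
        kept.append((x + dx, y + dy, v))
    return kept

def local_variance(frame, r, c):
    """Variance of the 3x3 neighborhood around (r, c), clipped at borders."""
    vals = [frame[i][j]
            for i in range(max(0, r - 1), min(len(frame), r + 2))
            for j in range(max(0, c - 1), min(len(frame[0]), c + 2))]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)

def add_flat_regions(frame, fg_mask, var_threshold):
    """Return a copy of the mask with flat pixels also marked as 1."""
    return [[1 if fg_mask[r][c] == 1
                  or local_variance(frame, r, c) < var_threshold else 0
             for c in range(len(frame[0]))]
            for r in range(len(frame))]
```

With such a combined mask, both moving objects and flat regions would be reconstructed by spatial interpolation alone, while the remaining pixels contribute to the mosaic.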
Foreground Mask Estimation
It is a difficult problem to extract a clean and reliable foreground mask from frames with independently moving regions. Errors in global motion estimation along with the noise in the pixel values complicate the process. Furthermore, there is also the issue of compactly representing and transmitting the foreground information as metadata to the decoder.
One method for extracting foreground masks Ft for each high resolution frame Ht is now described. This takes place in the pre-processing stage where the high resolution frames are available. The following are the steps in the process.
1. For frame H1, the mask F1 is filled with zeros. In other words, all pixels are considered as background.
2. To extract Ft, the frame Ht is compared with H1t=Θ1t(H1), i.e., H1 is transformed to the coordinates of Ht. A normalized correlation metric Nt1(x) is computed between each pixel x in Ht and the corresponding pixel in H1t considering a small neighborhood around the pixels. If there is no corresponding pixel in H1t (i.e., Θt1(x) lies outside the boundaries of H1), then Ft(x) is set to 1. Otherwise, if Nt1(x)>T, where T is a chosen threshold, then Ft(x)=0. Otherwise, Ft(x)=1.
Other methods including variations of the above may be used instead.
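The mask extraction steps above may be sketched as follows. This is a minimal sketch, assuming an integer translational motion between frame t and the reference, a 3x3 neighborhood with border clamping, and a zero-mean form of the normalized correlation; the function names and the default threshold are hypothetical.

```python
import math

def normalized_correlation(a, b):
    """Zero-mean normalized correlation of two equal-length patches."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) *
                    sum((y - mb) ** 2 for y in b))
    if den == 0.0:
        # Flat patches: call them matching only if they are identical.
        return 1.0 if a == b else 0.0
    return num / den

def patch(frame, r, c):
    """3x3 neighborhood around (r, c), with coordinates clamped to the frame."""
    h, w = len(frame), len(frame[0])
    return [frame[min(max(i, 0), h - 1)][min(max(j, 0), w - 1)]
            for i in range(r - 1, r + 2) for j in range(c - 1, c + 2)]

def foreground_mask(frame_t, ref, shift_t1, threshold=0.8):
    """F_t(x) = 0 where frame t agrees with the motion-compensated reference."""
    h, w = len(frame_t), len(frame_t[0])
    dr, dc = shift_t1   # assumed integer translation from frame t to frame 1
    mask = [[1] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            rr, rc = r + dr, c + dc
            if not (0 <= rr < len(ref) and 0 <= rc < len(ref[0])):
                continue            # no corresponding pixel: foreground
            n = normalized_correlation(patch(frame_t, r, c),
                                       patch(ref, rr, rc))
            if n > threshold:
                mask[r][c] = 0      # well correlated: background
    return mask
```

For the reference frame itself, the mask is simply all zeros, as stated in step 1.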
If the masks are computed at the pre-processing stage, they have to be transmitted as side-information to the receiver. It may not be necessary to transmit a high resolution version of the foreground masks. The masks may be down-sampled to low resolution using the same strategy as used to create the low resolution frames Lt from Ht, and then up-sampled at the post-processing stage. The masks may also be compressed (e.g., using ZIP, the MPEG-4 AVC Standard, and/or any other data compression scheme) prior to transmission. Alternatively, transmitting the masks may be entirely avoided by computing them at the receiver side using the decoded low resolution frames and the metadata. However, it is a difficult problem to compute a reliable mask at the receiver.
We note the following possible variations that may be employed in one or more embodiments of the present principles and still remain within the scope of the present invention, as would be apparent to one skilled in the art.
    • 1. Although the method is described for a set of four frames, the number of frames N in the set has no upper bound. In practice, N should be at least 4. The size of the set could be determined based on the down-sampling factor and the amount of motion between frames.
    • 2. A sequence with K>N frames can be broken down into a number of sets of N frames each. Each set could be treated using the proposed method.
    • 3. The reference frame need not always be the first frame in the set. It may be advantageous to use a frame near the (temporal) center of the set in order to minimize the amount of motion between reference and non-reference frames.
    • 4. While reconstructing a set of frames, information from other sets of frames may be used. For example, the reconstructed high resolution reference frame from the previous set may be used to reconstruct a non-reference frame in the current set. For this purpose, motion information between sets of frames may be determined and transmitted as metadata. Also, information from frames outside the current set may be used during the super resolution mosaic creation process.
    • 5. The treatment herein is valid for both grayscale (single component) and color (multi-component) frames. One or more of the pre- and post-processing steps (e.g., sampling grid selection) may be independently carried out for each color component or by jointly considering all of them. For example, a different sampling grid may be determined for each color component.
    • 6. Multiple sampling grids may be estimated for different regions of a single frame. For example, a frame may be divided into four rectangular quarters, and a sampling grid selected for each one. In this case, steps 2, 3 and 4 in the section above entitled “Sampling grid selection” are carried out on a per-quarter basis instead of a per-frame basis. All the subsequent processes (down-sampling, post-processing) are modified accordingly to use different sampling grids for different regions of a frame.
    • 7. Different global motion transformations may be estimated for different regions of a single frame. For example, in the aforementioned section entitled “Motion estimation”, a frame may be divided into four rectangular quarters and a different transformation may be estimated between each one and the reference frame in the set. All subsequent processes will use the corresponding transformation for each region in the frame.
    • 8. In the aforementioned section entitled “Motion estimation”, instead of estimating the transformation from each frame in the set to the reference frame, it is possible to estimate the transformation from each frame to the next (or vice versa) and combine one or more of these to derive the required transformations.
    • 9. In the foreground masks (in the aforementioned section entitled “Foreground mask estimation”), a band of border pixels may be considered as foreground, e.g., for handling fixed black borders. As mentioned earlier, it is also possible to consider sufficiently flat regions as foreground.
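Variations 2 and 8 above lend themselves to a brief sketch. The code assumes that each transformation can be written as a 3x3 homogeneous matrix (translations are used here purely for illustration) and that a final set with fewer than N frames is acceptable; the function names are hypothetical.

```python
def matmul3(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def translation(dx, dy):
    """3x3 homogeneous matrix for a translation by (dx, dy)."""
    return [[1.0, 0.0, dx], [0.0, 1.0, dy], [0.0, 0.0, 1.0]]

def to_reference(step_transforms):
    """Chain frame-to-previous-frame transforms into frame-to-reference ones.

    step_transforms[k] maps frame k+2 coordinates to frame k+1 coordinates
    (1-based frame numbers), so the result for frame t is the product
    T(2->1) * T(3->2) * ... * T(t->t-1).
    """
    result = [translation(0.0, 0.0)]   # the reference maps to itself
    for step in step_transforms:
        result.append(matmul3(result[-1], step))
    return result

def split_into_sets(frames, n):
    """Break a sequence of K frames into consecutive sets of at most n."""
    return [frames[i:i + n] for i in range(0, len(frames), n)]
```

Each set returned by split_into_sets could then be processed independently, with to_reference supplying the per-frame transformations when only frame-to-frame motion has been estimated.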
A description will now be given of some of the many attendant advantages/features of the present invention, some of which have been mentioned above. For example, one advantage/feature is an apparatus having a down-sampler and metadata generator and at least one encoder. The down-sampler and metadata generator is for receiving high resolution pictures and generating low resolution pictures and metadata there from. The metadata is for guiding post-decoding post-processing of the low resolution pictures and the metadata. The at least one encoder (152) is for encoding the low resolution pictures and the metadata.
Another advantage/feature is the apparatus having the down-sampler and metadata generator and the at least one encoder as described above, wherein the metadata includes motion transformation information and sampling grid information.
Yet another advantage/feature is the apparatus having the down-sampler and metadata generator and the at least one encoder wherein the metadata includes motion transformation information and sampling grid information as described above, wherein the motion transformation information comprises global motion transformation information relating to global motion between two or more of the high resolution pictures.
Still another advantage/feature is the apparatus having the down-sampler and metadata generator and the at least one encoder wherein the metadata includes motion transformation information and sampling grid information as described above, wherein the sampling grid information comprises sampling grid indices for indicating each respective one of a plurality of down-sampling grids used to generate the low resolution pictures from the high resolution pictures by down-sampling.
A further advantage/feature is the apparatus having the down-sampler and metadata generator and the at least one encoder as described above, wherein the high resolution pictures include at least one reference picture and one or more non-reference pictures, and the down-sampler and metadata generator generates the low resolution pictures by estimating motion from a reference picture to each of the one or more non-reference pictures, selecting one or more down-sampling grids from a plurality of candidate down-sampling grids for use in down-sampling the high resolution pictures based on the motion information, and down-sampling the high resolution pictures using the one or more down-sampling grids.
Moreover, another advantage/feature is the apparatus having the down-sampler and metadata generator and the at least one encoder wherein the high resolution pictures include at least one reference picture and one or more non-reference pictures, and the down-sampler and metadata generator generates the low resolution pictures by estimating motion from a reference picture to each of the one or more non-reference pictures, selecting one or more down-sampling grids from a plurality of candidate down-sampling grids for use in down-sampling the high resolution pictures based on the motion information, and down-sampling the high resolution pictures using the one or more down-sampling grids as described above, wherein the one or more down-sampling grids are selected based on the motion information such that each of the high resolution pictures, when down-sampled using the one or more down-sampling grids, provide complementary pixel information for the post-decoding post-processing of the low resolution pictures.
Further, another advantage/feature is the apparatus having the down-sampler and metadata generator and the at least one encoder wherein the one or more down-sampling grids are selected based on the motion information such that each of the high resolution pictures, when down-sampled using the one or more down-sampling grids, provide complementary pixel information for the post-decoding post-processing of the low resolution pictures as described above, wherein the grids are further selected based upon a filling factor that indicates a number of previously unfilled pixels in a super resolution picture generated using a particular one of the one or more down-sampling grids, the super resolution picture corresponding to an output provided by the post-decoding post-processing of the low resolution pictures and the metadata.
Also, another advantage/feature is the apparatus having the down-sampler and metadata generator and the at least one encoder wherein the one or more down-sampling grids are selected based on the motion information such that each of the high resolution pictures, when down-sampled using the one or more down-sampling grids, provide complementary pixel information for the post-decoding post-processing of the low resolution pictures as described above, wherein the grids are further selected based upon a distortion measure.
Additionally, another advantage/feature is the apparatus having the down-sampler and metadata generator and the at least one encoder wherein the high resolution pictures include at least one reference picture and one or more non-reference pictures, and the down-sampler and metadata generator generates the low resolution pictures by estimating motion from a reference picture to each of the one or more non-reference pictures, selecting one or more down-sampling grids from a plurality of candidate down-sampling grids for use in down-sampling the high resolution pictures based on the motion information, and down-sampling the high resolution pictures using the one or more down-sampling grids as described above, wherein different ones of the plurality of down-sampling grids are used to down-sample different portions of a particular one of at least one of the high resolution pictures.
Moreover, another advantage/feature is the apparatus having the down-sampler and metadata generator and the at least one encoder wherein the high resolution pictures include at least one reference picture and one or more non-reference pictures, and the down-sampler and metadata generator generates the low resolution pictures by estimating motion from a reference picture to each of the one or more non-reference pictures, selecting one or more down-sampling grids from a plurality of candidate down-sampling grids for use in down-sampling the high resolution pictures based on the motion information, and down-sampling the high resolution pictures using the one or more down-sampling grids as described above, wherein a respective binary mask is constructed for each of the high resolution pictures, the binary mask indicating respective locations of foreground pixels in the high resolution pictures.
These and other features and advantages of the present principles may be readily ascertained by one of ordinary skill in the pertinent art based on the teachings herein. It is to be understood that the teachings of the present principles may be implemented in various forms of hardware, software, firmware, special purpose processors, or combinations thereof.
Most preferably, the teachings of the present principles are implemented as a combination of hardware and software. Moreover, the software may be implemented as an application program tangibly embodied on a program storage unit. The application program may be uploaded to, and executed by, a machine comprising any suitable architecture. Preferably, the machine is implemented on a computer platform having hardware such as one or more central processing units (“CPU”), a random access memory (“RAM”), and input/output (“I/O”) interfaces. The computer platform may also include an operating system and microinstruction code. The various processes and functions described herein may be either part of the microinstruction code or part of the application program, or any combination thereof, which may be executed by a CPU. In addition, various other peripheral units may be connected to the computer platform such as an additional data storage unit and a printing unit.
It is to be further understood that, because some of the constituent system components and methods depicted in the accompanying drawings are preferably implemented in software, the actual connections between the system components or the process function blocks may differ depending upon the manner in which the present principles are programmed. Given the teachings herein, one of ordinary skill in the pertinent art will be able to contemplate these and similar implementations or configurations of the present principles.
Although the illustrative embodiments have been described herein with reference to the accompanying drawings, it is to be understood that the present principles are not limited to those precise embodiments, and that various changes and modifications may be effected therein by one of ordinary skill in the pertinent art without departing from the scope or spirit of the present principles. All such changes and modifications are intended to be included within the scope of the present principles as set forth in the appended claims.

Claims (33)

The invention claimed is:
1. An apparatus, comprising:
a down-sampler and metadata generator for receiving high resolution pictures and generating low resolution pictures and metadata there from, the metadata for guiding post-decoding post-processing of the low resolution pictures and the metadata; and
at least one encoder for encoding the low resolution pictures and the metadata,
wherein the metadata comprises motion transformation information relating to motion between two or more of the high resolution pictures and sampling grid indices, and
wherein the high resolution pictures include at least one reference picture and one or more non-reference pictures, and the down-sampler and metadata generator generate the low resolution pictures by estimating motion from a reference picture to each of the one or more non-reference pictures, selecting one or more down-sampling grids from a plurality of candidate down-sampling grids for use in down-sampling the high resolution pictures based on the motion information, and down-sampling the high resolution pictures using the one or more down-sampling grids.
2. The apparatus of claim 1, wherein the motion transformation information comprises global motion transformation information relating to global motion between two or more of the high resolution pictures.
3. The apparatus of claim 1, wherein the sampling grid indices indicate each respective one of a plurality of down-sampling grids used to generate the low resolution pictures from the high resolution pictures by down-sampling.
4. The apparatus of claim 1, wherein the one or more down-sampling grids are selected based on the motion information such that each of the high resolution pictures, when down-sampled using the one or more down-sampling grids, provide complementary pixel information for the post-decoding post-processing of the low resolution pictures.
5. The apparatus of claim 4, wherein the grids are further selected based upon a filling factor that indicates a number of previously unfilled pixels in a super resolution picture generated using a particular one of the one or more down-sampling grids, the super resolution picture corresponding to an output provided by the post-decoding post-processing of the low resolution pictures and the metadata.
6. The apparatus of claim 4, wherein the grids are further selected based upon a distortion measure.
7. The apparatus of claim 1, wherein different ones of the plurality of down-sampling grids are used to down-sample different portions of a particular one of at least one of the high resolution pictures.
8. The apparatus of claim 1, wherein a respective binary mask is constructed for each of the high resolution pictures, the binary mask indicating respective locations of foreground pixels in the high resolution pictures.
9. A method, comprising:
receiving high resolution pictures and generating low resolution pictures and metadata there from, the metadata for guiding post-decoding post-processing of the low resolution pictures and the metadata; and
encoding the low resolution pictures and the metadata using at least one encoder,
wherein the metadata comprises motion transformation information relating to motion between two or more of the high resolution pictures and sampling grid indices, and
wherein the high resolution pictures include at least one reference picture and one or more non-reference pictures, and the down-sampler and metadata generator generate the low resolution pictures by estimating motion from a reference picture to each of the one or more non-reference pictures, selecting one or more down-sampling grids from a plurality of candidate down-sampling grids for use in down-sampling the high resolution pictures based on the motion information, and down-sampling the high resolution pictures using the one or more down-sampling grids.
10. The method of claim 9, wherein the motion transformation information comprises global motion transformation information relating to global motion between two or more of the high resolution pictures.
11. The method of claim 9, wherein the sampling grid indices indicate each respective one of a plurality of down-sampling grids used to generate the low resolution pictures from the high resolution pictures by down-sampling.
12. The method of claim 9, wherein the one or more down-sampling grids are selected based on the motion information such that each of the high resolution pictures, when down-sampled using the one or more down-sampling grids, provide complementary pixel information for the post-decoding post-processing of the low resolution pictures.
13. The method of claim 12, wherein the grids are further selected based upon a filling factor that indicates a number of previously unfilled pixels in a super resolution picture generated using a particular one of the one or more down-sampling grids, the super resolution picture corresponding to an output provided by the post-decoding post-processing of the low resolution pictures and the metadata.
14. The method of claim 12, wherein the grids are further selected based upon a distortion measure.
15. The method of claim 9, wherein different ones of the plurality of down-sampling grids are used to down-sample different portions of a particular one of at least one of the high resolution pictures.
16. The method of claim 9, wherein a respective binary mask is constructed for each of the high resolution pictures, the binary mask indicating respective locations of foreground pixels in the high resolution pictures.
17. An apparatus, comprising:
a decoder for receiving a bitstream and decoding low resolution pictures and metadata there from; and
a super resolution post-processor for reconstructing high resolution pictures respectively corresponding to the low resolution pictures using the low resolution pictures and the metadata,
wherein the metadata comprises motion transformation information and sampling grid indices, and
wherein the super-resolution post-processor creates a high resolution mosaic from the metadata and the low resolution pictures by interpolating pixel values at pixel positions in the low resolution pictures, and reconstructs the high resolution pictures using the low resolution pictures, the metadata, and the high resolution mosaic.
18. The apparatus of claim 17, wherein the motion transformation information comprises global motion transformation information relating to global motion between two or more of the high resolution pictures.
19. The apparatus of claim 17, wherein the sampling grid indices indicate each respective one of a plurality of down-sampling grids used to generate the low resolution pictures from the high resolution pictures by down-sampling.
20. The apparatus of claim 17, wherein the high resolution mosaic is created by interpolating pixel values at pixel positions in the high resolution mosaic from pixel values of surrounding co-located pixels in the low resolution pictures.
21. The apparatus of claim 17, wherein said super-resolution post-processor generates a validity map that includes a measure of validity for each of pixels in the high resolution mosaic.
22. The apparatus of claim 21, wherein the measure of validity, for a given one of the pixels in the high resolution mosaic, is computed based on samples in a neighborhood around the given one of the pixels, and the given one of the pixels is designated as acceptable for use in reconstructing the high resolution pictures only if the measure of validity computed for the given one of the pixels is above a threshold value.
23. The apparatus of claim 17, wherein a given one of the high resolution pictures is reconstructed by interpolating pixel values at pixel positions in the given one of the high resolution pictures from pixels values of at least one of surrounding co-located pixels in a corresponding one of the low resolution pictures, surrounding co-located pixels in the high resolution mosaic, and surrounding co-located pixels in at least another one of the low resolution pictures, wherein the interpolating from the surrounding co-located pixels in the high resolution mosaic involves a motion transformation of pixels between the given one of the high resolution pictures and the high resolution mosaic, and wherein the interpolating from the surrounding co-located pixels in the at least other one of the low resolution pictures involves the motion transformation of pixels between the given one of the high resolution pictures and the at least other one of the low resolution pictures.
24. The apparatus of claim 17, wherein foreground pixels of a particular one of the high resolution pictures are reconstructed by interpolating from surrounding co-located pixels in the low resolution pictures.
25. A method, comprising:
receiving a bitstream and decoding low resolution pictures and metadata there from using a decoder; and
reconstructing high resolution pictures respectively corresponding to the low resolution pictures using the low resolution pictures and the metadata,
wherein the metadata comprises motion transformation information and sampling grid indices, and
wherein the super-resolution post-processor creates a high resolution mosaic from the metadata and the low resolution pictures by interpolating pixel values at pixel positions in the low resolution pictures, and reconstructs the high resolution pictures using the low resolution pictures, the metadata, and the high resolution mosaic.
26. The method of claim 25, wherein the motion transformation information comprises global motion transformation information relating to global motion between two or more of the high resolution pictures.
27. The method of claim 25, wherein the sampling grid indices indicate each respective one of a plurality of down-sampling grids used to generate the low resolution pictures from the high resolution pictures by down-sampling.
28. The method of claim 25, wherein the high resolution mosaic is created by interpolating pixel values at pixel positions in the high resolution mosaic from pixel values of surrounding co-located pixels in the low resolution pictures.
29. The method of claim 25, further comprising generating a validity map that includes a measure of validity for each of pixels in the high resolution mosaic.
30. The method of claim 29, wherein the measure of validity, for a given one of the pixels in the high resolution mosaic, is computed based on samples in a neighborhood around the given one of the pixels, and the given one of the pixels is designated as acceptable for use in reconstructing the high resolution pictures only if the measure of validity computed for the given one of the pixels is above a threshold value.
31. The method of claim 25, wherein a given one of the high resolution pictures is reconstructed by interpolating pixel values at pixel positions in the given one of the high resolution pictures from pixel values of at least one of surrounding co-located pixels in a corresponding one of the low resolution pictures, surrounding co-located pixels in the high resolution mosaic, and surrounding co-located pixels in at least another one of the low resolution pictures, wherein the interpolating from the surrounding co-located pixels in the high resolution mosaic involves a motion transformation of pixels between the given one of the high resolution pictures and the high resolution mosaic, and wherein the interpolating from the surrounding co-located pixels in the at least other one of the low resolution pictures involves the motion transformation of pixels between the given one of the high resolution pictures and the at least other one of the low resolution pictures.
32. The apparatus of claim 25, wherein foreground pixels of a particular one of the high resolution pictures are reconstructed by interpolating from surrounding co-located pixels in the low resolution pictures.
33. A non-transitory computer readable storage media having video signal data encoded thereupon, comprising:
encoded low resolution pictures and metadata generated from high resolution pictures, the metadata for guiding post-decoding post-processing of the low resolution pictures and the metadata,
wherein the metadata comprises motion transformation information and sampling grid indices, and
wherein super-resolution post-processing creates a high resolution mosaic from the metadata and the low resolution pictures by interpolating pixel values at pixel positions in the low resolution pictures, and reconstructs the high resolution pictures using the low resolution pictures, the metadata, and the high resolution mosaic.
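The validity map recited in claims 29 and 30 can be illustrated with a minimal sketch. The claims do not fix a particular measure of validity, so this sketch assumes one: the number of low resolution samples that land within a square neighborhood around each mosaic pixel. The names `validity_map`, `acceptable_mask`, `sample_counts`, `radius`, and `threshold` are hypothetical and do not come from the patent.

```python
from typing import List


def validity_map(sample_counts: List[List[int]], radius: int = 1) -> List[List[int]]:
    """Assumed validity measure: for each mosaic pixel, sum the number of
    low resolution samples inside a (2*radius+1) x (2*radius+1) window,
    clipped at the mosaic borders."""
    height = len(sample_counts)
    width = len(sample_counts[0]) if height else 0
    validity = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0
            for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                    total += sample_counts[ny][nx]
            validity[y][x] = total
    return validity


def acceptable_mask(validity: List[List[int]], threshold: int) -> List[List[bool]]:
    """A pixel is acceptable for reconstruction only if its validity
    measure is strictly above the threshold, as in claim 30."""
    return [[value > threshold for value in row] for row in validity]


# Hypothetical 3x3 mosaic: entry (y, x) counts the samples mapped to that pixel.
counts = [
    [1, 0, 0],
    [0, 0, 0],
    [0, 0, 1],
]
validity = validity_map(counts, radius=1)
mask = acceptable_mask(validity, threshold=1)
```

In this toy input only the center pixel has two samples in its neighborhood, so it is the only pixel that passes a threshold of 1. A real post-processor could use a different measure, such as the variance of the neighboring samples; the claims leave that choice open.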
US13/574,428 2010-01-22 2011-01-20 Methods and apparatus for sampling-based super resolution video encoding and decoding Active 2032-04-17 US9602814B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US13/574,428 US9602814B2 (en) 2010-01-22 2011-01-20 Methods and apparatus for sampling-based super resolution video encoding and decoding

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US29732010P 2010-01-22 2010-01-22
PCT/US2011/000107 WO2011090790A1 (en) 2010-01-22 2011-01-20 Methods and apparatus for sampling-based super resolution video encoding and decoding
US13/574,428 US9602814B2 (en) 2010-01-22 2011-01-20 Methods and apparatus for sampling-based super resolution video encoding and decoding

Publications (2)

Publication Number Publication Date
US20120294369A1 US20120294369A1 (en) 2012-11-22
US9602814B2 true US9602814B2 (en) 2017-03-21

Family

ID=43755097

Family Applications (1)

Application Number Title Priority Date Filing Date
US13/574,428 Active 2032-04-17 US9602814B2 (en) 2010-01-22 2011-01-20 Methods and apparatus for sampling-based super resolution video encoding and decoding

Country Status (6)

Country Link
US (1) US9602814B2 (en)
EP (1) EP2526698A1 (en)
JP (1) JP5911809B2 (en)
KR (1) KR101789845B1 (en)
CN (1) CN102823242B (en)
WO (1) WO2011090790A1 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180139362A1 (en) * 2016-11-11 2018-05-17 Industrial Technology Research Institute Method and system for generating a video frame
US10349069B2 (en) * 2012-12-11 2019-07-09 Sony Interactive Entertainment Inc. Software hardware hybrid video encoder
US10401143B2 (en) * 2014-09-10 2019-09-03 Faro Technologies, Inc. Method for optically measuring three-dimensional coordinates and controlling a three-dimensional measuring device
US10825206B2 (en) * 2018-10-19 2020-11-03 Samsung Electronics Co., Ltd. Methods and apparatuses for performing artificial intelligence encoding and artificial intelligence decoding on image
US20210358083A1 (en) 2018-10-19 2021-11-18 Samsung Electronics Co., Ltd. Method and apparatus for streaming data
US11395001B2 (en) 2019-10-29 2022-07-19 Samsung Electronics Co., Ltd. Image encoding and decoding methods and apparatuses using artificial intelligence
US11688038B2 (en) 2018-10-19 2023-06-27 Samsung Electronics Co., Ltd. Apparatuses and methods for performing artificial intelligence encoding and artificial intelligence decoding on image

Families Citing this family (49)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8681866B1 (en) 2011-04-28 2014-03-25 Google Inc. Method and apparatus for encoding video by downsampling frame resolution
US8780976B1 (en) * 2011-04-28 2014-07-15 Google Inc. Method and apparatus for encoding video using granular downsampling of frame resolution
EP2555521A1 (en) * 2011-08-01 2013-02-06 Advanced Digital Broadcast S.A. A method and system for transmitting a high resolution video stream as a low resolution video stream
EP2557789B1 (en) 2011-08-09 2017-09-27 Dolby Laboratories Licensing Corporation Guided image up-sampling in video coding
WO2013056129A1 (en) * 2011-10-14 2013-04-18 Advanced Micro Devices, Inc. Region-based image compression
US9236024B2 (en) 2011-12-06 2016-01-12 Glasses.Com Inc. Systems and methods for obtaining a pupillary distance measurement using a mobile computing device
WO2013105946A1 (en) * 2012-01-11 2013-07-18 Thomson Licensing Motion compensating transformation for video coding
US9286715B2 (en) 2012-05-23 2016-03-15 Glasses.Com Inc. Systems and methods for adjusting a virtual try-on
US9311746B2 (en) 2012-05-23 2016-04-12 Glasses.Com Inc. Systems and methods for generating a 3-D model of a virtual try-on product
US9483853B2 (en) 2012-05-23 2016-11-01 Glasses.Com Inc. Systems and methods to display rendered images
TWI674792B (en) * 2012-08-06 2019-10-11 美商Vid衡器股份有限公司 Sampling grid information for spatial layers in multi-layer video coding
KR102114509B1 (en) * 2012-08-24 2020-05-22 아이큐브드 연구소 주식회사 Receiving device, transmission device, and image transmission method
US20140118460A1 (en) * 2012-11-01 2014-05-01 Microsoft Corporation Video Coding
US20140119456A1 (en) * 2012-11-01 2014-05-01 Microsoft Corporation Encoding video into lower resolution streams
US20140119446A1 (en) * 2012-11-01 2014-05-01 Microsoft Corporation Preserving rounding errors in video coding
US9185437B2 (en) 2012-11-01 2015-11-10 Microsoft Technology Licensing, Llc Video data
KR101420638B1 (en) * 2012-12-14 2014-07-17 에스케이플래닛 주식회사 An apparatus for presenting contents in streaming services and a method thereof
KR102121558B1 (en) * 2013-03-15 2020-06-10 삼성전자주식회사 Method of stabilizing video image, post-processing device and video encoder including the same
KR102131326B1 (en) * 2013-08-22 2020-07-07 삼성전자 주식회사 Image Frame Motion Estimation Device, Encoding Method Thereof
US9774865B2 (en) * 2013-12-16 2017-09-26 Samsung Electronics Co., Ltd. Method for real-time implementation of super resolution
KR20160103012A (en) * 2014-01-03 2016-08-31 톰슨 라이센싱 Method, apparatus, and computer program product for optimising the upscaling to ultrahigh definition resolution when rendering video content
WO2016132153A1 (en) 2015-02-19 2016-08-25 Magic Pony Technology Limited Offline training of hierarchical algorithms
GB201603144D0 (en) 2016-02-23 2016-04-06 Magic Pony Technology Ltd Training end-to-end video processes
GB201604672D0 (en) 2016-03-18 2016-05-04 Magic Pony Technology Ltd Generative methods of super resolution
WO2016156864A1 (en) 2015-03-31 2016-10-06 Magic Pony Technology Limited Training end-to-end video processes
KR20170023484A (en) 2015-08-24 2017-03-06 삼성전자주식회사 Device and method for processing image
US10764602B2 (en) 2016-01-25 2020-09-01 Koninklijke Kpn N.V. Spatial scalable video coding
JP6516695B2 (en) * 2016-02-24 2019-05-22 株式会社 日立産業制御ソリューションズ Image processing system and image processing apparatus
WO2017178808A1 (en) 2016-04-12 2017-10-19 Magic Pony Technology Limited Visual data processing using energy networks
KR102520957B1 (en) * 2016-04-15 2023-04-12 삼성전자주식회사 Encoding apparatus, decoding apparatus and method thereof
GB201607994D0 (en) 2016-05-06 2016-06-22 Magic Pony Technology Ltd Encoder pre-analyser
US10701394B1 (en) 2016-11-10 2020-06-30 Twitter, Inc. Real-time video super-resolution with spatio-temporal networks and motion compensation
US10721471B2 (en) * 2017-10-26 2020-07-21 Intel Corporation Deep learning based quantization parameter estimation for video encoding
CN108184116A (en) * 2017-12-18 2018-06-19 西南技术物理研究所 A kind of image rebuilding method suitable for the transmission of wireless data chain
JP7269257B2 (en) 2018-04-13 2023-05-08 コニンクリーケ・ケイピーエヌ・ナムローゼ・フェンノートシャップ Frame-level super-resolution-based video coding
KR102082816B1 (en) * 2018-04-24 2020-02-28 주식회사 지디에프랩 Method for improving the resolution of streaming files
US10824917B2 (en) 2018-12-03 2020-11-03 Bank Of America Corporation Transformation of electronic documents by low-resolution intelligent up-sampling
KR102619516B1 (en) * 2019-03-25 2023-12-28 텔레다인 디지털 이미징, 아이엔씨. Method and related device for generating super-resolution images
WO2021096159A1 (en) * 2019-11-15 2021-05-20 한국과학기술원 System and method for ingesting live video
US11367165B2 (en) * 2020-05-19 2022-06-21 Facebook Technologies, Llc. Neural super-sampling for real-time rendering
AU2020281143B1 (en) * 2020-12-04 2021-03-25 Commonwealth Scientific And Industrial Research Organisation Creating super-resolution images
FR3118380A1 (en) * 2020-12-22 2022-06-24 Fondation B-Com Method of coding images of a video sequence to be coded, method of decoding, corresponding devices and system.
US11924408B2 (en) 2021-01-14 2024-03-05 Tencent America LLC Method and apparatus for video coding
EP4300963A1 (en) * 2021-03-30 2024-01-03 Panasonic Intellectual Property Corporation of America Image encoding method, image decoding method, image processing method, image encoding device, and image decoding device
CN113329228A (en) * 2021-05-27 2021-08-31 杭州朗和科技有限公司 Video encoding method, decoding method, device, electronic device and storage medium
KR102632638B1 (en) * 2022-01-03 2024-02-01 네이버 주식회사 Method and system for generating super resolution image in hardware decoder environment
CN114092337B (en) * 2022-01-19 2022-04-22 苏州浪潮智能科技有限公司 Method and device for super-resolution amplification of image at any scale
CN114650449A (en) * 2022-03-03 2022-06-21 京东科技信息技术有限公司 Video data processing method and device
CN116776228B (en) * 2023-08-17 2023-10-20 合肥工业大学 Power grid time sequence data decoupling self-supervision pre-training method and system

Citations (125)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US845751A (en) 1906-03-03 1907-03-05 Max Ams Machine Co Method of forming covers for sheet-metal cans.
WO1994006099A1 (en) 1992-09-01 1994-03-17 Apple Computer, Inc. Improved vector quantization
JPH07222145A (en) 1994-01-31 1995-08-18 Mitsubishi Electric Corp Picture encoding device
US5446806A (en) 1993-11-15 1995-08-29 National Semiconductor Corporation Quadtree-structured Walsh transform video/image coding
US5537155A (en) 1994-04-29 1996-07-16 Motorola, Inc. Method for estimating motion in a video sequence
US5557684A (en) 1993-03-15 1996-09-17 Massachusetts Institute Of Technology System for encoding image data into multiple layers representing regions of coherent motion and associated motion parameters
JPH08336134A (en) 1995-04-06 1996-12-17 Sanyo Electric Co Ltd Method and device for moving image compression coding, method and device for moving image decoding and recording medium
US5754236A (en) 1995-05-29 1998-05-19 Samsung Electronics Co., Ltd. Variable bit rate coding using the BFOS algorithm
US5764374A (en) 1996-02-05 1998-06-09 Hewlett-Packard Company System and method for lossless image compression having improved sequential determination of golomb parameter
US5768434A (en) 1993-11-15 1998-06-16 National Semiconductor Corp. Quadtree-structured walsh transform coding
US5784491A (en) 1994-02-18 1998-07-21 Fujitsu Limited Method and apparatus for coding and decoding image data using vector quantization
US5862342A (en) 1996-10-31 1999-01-19 Sensormatic Electronics Corporation Intelligent video information management system with information archiving capabilities
WO1998019450A3 (en) 1996-10-31 1999-02-25 Sensormatic Electronics Corp Intelligent video information management system
KR0169662B1 (en) 1994-12-28 1999-03-20 배순훈 A classified pts vq encoder
US6043838A (en) 1997-11-07 2000-03-28 General Instrument Corporation View offset estimation for stereoscopic video coding
JP3027670B2 (en) 1993-05-07 2000-04-04 キヤノン株式会社 Photovoltaic element
JP2000215318A (en) 1999-01-20 2000-08-04 Univ Of Washington Method for clustering input vector
US6173089B1 (en) 1997-04-02 2001-01-09 U.S. Philips Corporation Image handling system and method
US6278446B1 (en) 1998-02-23 2001-08-21 Siemens Corporate Research, Inc. System for interactive organization and browsing of video
US20010055340A1 (en) * 1998-10-09 2001-12-27 Hee-Yong Kim Efficient down conversion system for 2:1 decimation
US20020009230A1 (en) 1998-12-18 2002-01-24 Shijun Sun Template matching using correlative auto-predicative search
US20020036705A1 (en) 2000-06-13 2002-03-28 Samsung Electronics Co., Ltd. Format converter using bi-directional motion vector and method thereof
US6397166B1 (en) 1998-11-06 2002-05-28 International Business Machines Corporation Method and system for model-based clustering and signal-bearing medium for storing program of same
US20020172434A1 (en) 2001-04-20 2002-11-21 Mitsubishi Electric Research Laboratories, Inc. One-pass super-resolution images
US20030005258A1 (en) 2001-03-22 2003-01-02 Modha Dharmendra Shantilal Feature weighting in k-means clustering
US20030021343A1 (en) 1998-07-08 2003-01-30 Philips Electronics North America Corporation Low bandwidth encoding scheme for video transmission
US6526183B1 (en) 1998-08-05 2003-02-25 Koninklijke Philips Electronics N.V. Static image generation method and device
US20030058943A1 (en) 2001-07-18 2003-03-27 Tru Video Corporation Dictionary generation method for video and image compression
CN1128097C (en) 1998-06-19 2003-11-19 中山医科大学中山眼科中心 Improved preparing process for changing coral into hydroxy-apatite 'artificial bone'
US20040001705A1 (en) 2002-06-28 2004-01-01 Andreas Soupliotis Video processing system and method for automatic enhancement of digital video
US20040017852A1 (en) 2002-05-29 2004-01-29 Diego Garrido Predictive interpolation of a video signal
WO2003084238A3 (en) 2002-03-26 2004-02-05 Gen Instrument Corp Methods and apparatus for efficient global motion compensation encoding and associated decoding
JP2004222218A (en) 2003-01-15 2004-08-05 Toa Corp Method for compressing and extending image
US20040170330A1 (en) 1998-08-12 2004-09-02 Pixonics, Inc. Video coding reconstruction apparatus and methods
US6795578B1 (en) 1999-09-29 2004-09-21 Canon Kabushiki Kaisha Image processing apparatus and method, and storage medium
JP2004266794A (en) 2002-09-04 2004-09-24 Microsoft Corp Multi-resolution video coding and decoding
US6798834B1 (en) 1996-08-15 2004-09-28 Mitsubishi Denki Kabushiki Kaisha Image coding apparatus with segment classification and segmentation-type motion prediction circuit
EP1401211A3 (en) 2002-09-04 2004-10-27 Microsoft Corporation Multi-resolution video coding and decoding
US20040218834A1 (en) 2003-04-30 2004-11-04 Microsoft Corporation Patch-based video super-resolution
US20040258148A1 (en) 2001-07-27 2004-12-23 Paul Kerbiriou Method and device for coding a scene
US20050015259A1 (en) 2003-07-18 2005-01-20 Microsoft Corporation Constant bitrate media encoding techniques
JP2005020761A (en) 2003-06-27 2005-01-20 Sungjin C & C Co Ltd Method of restoring and reconstructing super-resolution image from low-resolution image subjected to data compression
WO2005043882B1 (en) 2003-10-21 2005-10-13 Prismvideo Inc Video source coding with side information
US20050225553A1 (en) 2004-04-09 2005-10-13 Cheng-Jan Chi Hybrid model sprite generator (HMSG) and a method for generating sprite of the same
US20050243921A1 (en) 2004-03-26 2005-11-03 The Hong Kong University Of Science And Technology Efficient multi-frame motion estimation for video compression
US20060013303A1 (en) 1997-11-14 2006-01-19 Ac Capital Management, Inc. Apparatus and method for compressing video information
US20060039617A1 (en) 2003-02-28 2006-02-23 Bela Makai Method and assembly for video encoding, the video encoding including texture analysis and texture synthesis, and corresponding computer program and corresponding computer-readable storage medium
WO2006025339A1 (en) 2004-08-30 2006-03-09 Matsushita Electric Industrial Co., Ltd. Decoder, encoder, decoding method and encoding method
US20060088191A1 (en) 2004-10-25 2006-04-27 Tong Zhang Video content understanding through real time video motion analysis
CN1777287A (en) 2004-11-19 2006-05-24 株式会社Ntt都科摩 Image decoding apparatus, image decoding program, image decoding method, image encoding apparatus, image encoding program, and image encoding method
EP1659532A2 (en) 2004-11-19 2006-05-24 NTT DoCoMo, Inc. Image decoding apparatus, image decoding program, image decoding method, image encoding apparatus, image encoding program, and image encoding method
US20060126960A1 (en) 2004-12-15 2006-06-15 Lingxiang Zhou Pattern classification and filter design for increasing image resolution
JP2006203744A (en) 2005-01-24 2006-08-03 Victor Co Of Japan Ltd Still image generating apparatus and still image generation method
CN1276946C (en) 2001-06-08 2006-09-27 第一毛织株式会社 Flame retardant thermoplastic resin composition
US20060239345A1 (en) 2002-09-20 2006-10-26 David Taubman Method of signalling motion information for efficient scalable video compression
US20060245502A1 (en) 2005-04-08 2006-11-02 Hui Cheng Macro-block based mixed resolution video compression system
CN1863272A (en) 2006-02-14 2006-11-15 华为技术有限公司 Ultra-resolution ratio reconstructing method for video-image
US20060269149A1 (en) 2005-05-24 2006-11-30 Samsung Electronics Co., Ltd. Encoding and decoding apparatus and method for reducing blocking phenomenon and computer-readable recording medium storing program for executing the method
US20070041663A1 (en) 2005-08-03 2007-02-22 Samsung Electronics Co., Ltd. Apparatus and method for super-resolution enhancement processing
US20070118376A1 (en) 2005-11-18 2007-05-24 Microsoft Corporation Word clustering for input data
US20070223808A1 (en) 2006-03-23 2007-09-27 Canon Information Systems Research Australia Pty Ltd Motion characterisation
US20070223825A1 (en) 2006-03-27 2007-09-27 Yan Ye Methods and systems for significance coefficient coding in video compression
US20070248272A1 (en) 2006-04-19 2007-10-25 Microsoft Corporation Vision-Based Compression
WO2007111966A9 (en) 2006-03-24 2007-11-15 Cernium Corp System for pruning video data, and application thereof
US20080107346A1 (en) 2006-10-17 2008-05-08 Chao Zhang Scene-based non-uniformity correction and enhancement method using super-resolution
US20080131000A1 (en) 2006-12-01 2008-06-05 Compal Electronics, Inc. Method for generating typographical line
WO2008066025A1 (en) 2006-11-27 2008-06-05 Panasonic Corporation Image encoding apparatus and image decoding part
JP2008148119A (en) 2006-12-12 2008-06-26 Sony Corp Portable terminal device and display method as well as program
US20080152243A1 (en) 2006-12-20 2008-06-26 Samsung Electronics Co., Ltd. Image encoding and decoding method and apparatus using texture synthesis
US20080159401A1 (en) 2007-01-03 2008-07-03 Samsung Electronics Co., Ltd. Method and apparatus for estimating motion vector using plurality of motion vector predictors, encoder, decoder, and decoding method
US20080172379A1 (en) 2007-01-17 2008-07-17 Fujitsu Limited Recording medium storing a design support program, design support method, and design support apparatus
US20080187305A1 (en) 2007-02-06 2008-08-07 Ramesh Raskar 4D light field cameras
US7433526B2 (en) 2002-04-30 2008-10-07 Hewlett-Packard Development Company, L.P. Method for compressing images and image sequences through adaptive partitioning
JP2008289005A (en) 2007-05-18 2008-11-27 Ntt Docomo Inc Image-predicting/encoding device, image prediction/encoding method, image prediction/encoding program, image-predicting/decoding device, image prediction/decoding method, and image prediction/decoding program
US20090003443A1 (en) 2007-06-26 2009-01-01 Nokia Corporation Priority-based template matching intra prediction video and image coding
US20090002379A1 (en) 2007-06-30 2009-01-01 Microsoft Corporation Video decoding implementations for a graphics processing unit
US20090041367A1 (en) 2007-08-07 2009-02-12 Texas Instruments Incorporated Quantization method and apparatus
US20090080804A1 (en) 2007-09-21 2009-03-26 Hitachi, Ltd. Method for transmitting and receiving image, receiving device, and image storage device
US20090097564A1 (en) 2007-10-10 2009-04-16 To-Wei Chen Matching-pixel Sub-sampling Motion Estimation Method for Video Compression
US20090097756A1 (en) 2007-10-11 2009-04-16 Fuji Xerox Co., Ltd. Similar image search apparatus and computer readable medium
WO2009052742A1 (en) 2007-10-15 2009-04-30 Huawei Technologies Co., Ltd. An interframe prediction encoding/decoding method and apparatus
US20090116759A1 (en) 2005-07-05 2009-05-07 Ntt Docomo, Inc. Video encoding device, video encoding method, video encoding program, video decoding device, video decoding method, and video decoding program
CN101459842A (en) 2008-12-17 2009-06-17 浙江大学 Decoding method and apparatus for space desampling
US20090175538A1 (en) 2007-07-16 2009-07-09 Novafora, Inc. Methods and systems for representation and matching of video content
WO2009087641A2 (en) 2008-01-10 2009-07-16 Ramot At Tel-Aviv University Ltd. System and method for real-time super-resolution
US20090180538A1 (en) 2008-01-16 2009-07-16 The Regents Of The University Of California Template matching scheme using multiple predictors as candidates for intra-prediction
WO2009091080A1 (en) 2008-01-18 2009-07-23 Sharp Kabushiki Kaisha Methods, device, program and media for texture synthesis for video coding with side information
WO2009094036A1 (en) 2008-01-25 2009-07-30 Hewlett-Packard Development Company, L.P. Coding mode selection for block-based encoding
US20090196350A1 (en) 2007-01-11 2009-08-06 Huawei Technologies Co., Ltd. Methods and devices of intra prediction encoding and decoding
US20090232215A1 (en) 2008-03-12 2009-09-17 Lg Electronics Inc. Method and an Apparatus for Encoding or Decoding a Video Signal
US20090245587A1 (en) 2008-03-31 2009-10-01 Microsoft Corporation Classifying and controlling encoding quality for textured, dark smooth and smooth video content
CN101551903A (en) 2009-05-11 2009-10-07 天津大学 Super-resolution image restoration method in gait recognition
US20090252431A1 (en) 2007-09-07 2009-10-08 Microsoft Corporation Image Resizing for Web-based Image Search
JP2009239686A (en) 2008-03-27 2009-10-15 Hitachi Ltd Broadcast receiving system, home gateway device, and broadcast receiving terminal device
US20090274377A1 (en) 2005-11-11 2009-11-05 Japan Advanced Institute Of Science And Technology Clustering System and Image Processing System Having the Same
JP2009267710A (en) 2008-04-24 2009-11-12 Ntt Docomo Inc Image prediction encoding device, image prediction encoding method, image prediction encoding program, image prediction decoding device, image prediction decoding method, and image prediction decoding program
US7623706B1 (en) 2000-09-29 2009-11-24 Hewlett-Packard Development Company, L.P. Reduction of chromatic bleeding artifacts in images containing subsampled chrominance values
WO2009157904A1 (en) 2008-06-27 2009-12-30 Thomson Licensing Methods and apparatus for texture compression using patch-based sampling texture synthesis
US7671894B2 (en) 2004-12-17 2010-03-02 Mitsubishi Electric Research Laboratories, Inc. Method and system for processing multiview videos for view synthesis using skip and direct modes
US20100074549A1 (en) 2008-09-22 2010-03-25 Microsoft Corporation Image upsampling with training images
WO2010033151A1 (en) 2008-09-18 2010-03-25 Thomson Licensing Methods and apparatus for video imaging pruning
US20100091846A1 (en) 2007-04-09 2010-04-15 Ntt Docomo, Inc Image prediction/encoding device, image prediction/encoding method, image prediction/encoding program, image prediction/decoding device, image prediction/decoding method, and image prediction decoding program
US20100104184A1 (en) 2007-07-16 2010-04-29 Novafora, Inc. Methods and systems for representation and matching of video content
US20100150394A1 (en) 2007-06-14 2010-06-17 Jeffrey Adam Bloom Modifying a coded bitstream
FR2941581A1 (en) 2009-01-28 2010-07-30 France Telecom Video image sequence coding method, involves identifying better candidate zone in set of candidate zones, minimizing reconstruction error with respect to target zone, and determining indication representing identified better candidate zone
US20100196721A1 (en) 2007-05-30 2010-08-05 Kazufumi Ogawa Adhesion method, and biochemical chip and optical component made by the same
CN101389021B (en) 2007-09-14 2010-12-22 华为技术有限公司 Video encoding/decoding method and apparatus
US20110007800A1 (en) 2008-01-10 2011-01-13 Thomson Licensing Methods and apparatus for illumination compensation of intra-predicted video
US20110047163A1 (en) 2009-08-24 2011-02-24 Google Inc. Relevance-Based Image Selection
US20110142330A1 (en) 2009-12-10 2011-06-16 Samsung Electronics Co., Ltd. Image processing apparatus and method
WO2011090798A1 (en) 2010-01-22 2011-07-28 Thomson Licensing Data pruning for video compression using example-based super-resolution
US20110210960A1 (en) 2010-02-26 2011-09-01 Google Inc. Hierarchical blurring of texture maps
WO2011154127A1 (en) 2010-06-08 2011-12-15 Phoenix Contact Gmbh & Co. Kg Electrical device with a plug-type connector and electrical plug-type connection
US20120106862A1 (en) 2009-05-15 2012-05-03 Sony Corporation Image processing device, method, and program
US20120155766A1 (en) 2010-12-17 2012-06-21 Sony Corporation Patch description and modeling for image subscene recognition
US20120201475A1 (en) 2009-10-05 2012-08-09 I.C.V.T. Ltd. Method and system for processing an image
US20120320983A1 (en) 2010-01-19 2012-12-20 Thomson Licensing Methods and apparatus for reduced complexity template matching prediction for video encoding and decoding
US8340463B1 (en) 2008-08-29 2012-12-25 Adobe Systems Incorporated Candidate pruning for patch transforms
US20130163676A1 (en) 2010-09-10 2013-06-27 Thomson Licensing Methods and apparatus for decoding video signals using motion compensated example-based super-resolution for video compression
US20130163679A1 (en) 2010-09-10 2013-06-27 Dong-Qing Zhang Video decoding using example-based data pruning
US20130170558A1 (en) 2010-09-10 2013-07-04 Thomson Licensing Video decoding using block-based mixed-resolution data pruning
US20130170746A1 (en) 2010-09-10 2013-07-04 Thomson Licensing Recovering a pruned version of a picture in a video sequence for example-based data pruning using intra-frame patch similarity
US20140036054A1 (en) 2012-03-28 2014-02-06 George Zouridakis Methods and Software for Screening and Diagnosing Skin Lesions and Plant Diseases
US20140056518A1 (en) 2012-08-22 2014-02-27 Canon Kabushiki Kaisha Image processing apparatus, image processing method, and program
CN101556690B (en) 2009-05-14 2015-01-07 复旦大学 Image super-resolution method based on overcomplete dictionary learning and sparse representation

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101198064A (en) * 2007-12-10 2008-06-11 武汉大学 Movement vector prediction method in resolution demixing technology

Patent Citations (152)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US845751A (en) 1906-03-03 1907-03-05 Max Ams Machine Co Method of forming covers for sheet-metal cans.
WO1994006099A1 (en) 1992-09-01 1994-03-17 Apple Computer, Inc. Improved vector quantization
US5822465A (en) 1992-09-01 1998-10-13 Apple Computer, Inc. Image encoding by vector quantization of regions of an image and codebook updates
US5557684A (en) 1993-03-15 1996-09-17 Massachusetts Institute Of Technology System for encoding image data into multiple layers representing regions of coherent motion and associated motion parameters
JP3027670B2 (en) 1993-05-07 2000-04-04 キヤノン株式会社 Photovoltaic element
US5768434A (en) 1993-11-15 1998-06-16 National Semiconductor Corp. Quadtree-structured walsh transform coding
US5446806A (en) 1993-11-15 1995-08-29 National Semiconductor Corporation Quadtree-structured Walsh transform video/image coding
JPH07222145A (en) 1994-01-31 1995-08-18 Mitsubishi Electric Corp Picture encoding device
US20070014354A1 (en) 1994-01-31 2007-01-18 Mitsubishi Denki Kabushiki Kaisha Image coding apparatus with segment classification and segmentation-type motion prediction circuit
US5784491A (en) 1994-02-18 1998-07-21 Fujitsu Limited Method and apparatus for coding and decoding image data using vector quantization
US5537155A (en) 1994-04-29 1996-07-16 Motorola, Inc. Method for estimating motion in a video sequence
KR0169662B1 (en) 1994-12-28 1999-03-20 배순훈 A classified pts vq encoder
JPH08336134A (en) 1995-04-06 1996-12-17 Sanyo Electric Co Ltd Method and device for moving image compression coding, method and device for moving image decoding and recording medium
US5754236A (en) 1995-05-29 1998-05-19 Samsung Electronics Co., Ltd. Variable bit rate coding using the BFOS algorithm
US5764374A (en) 1996-02-05 1998-06-09 Hewlett-Packard Company System and method for lossless image compression having improved sequential determination of golomb parameter
US6798834B1 (en) 1996-08-15 2004-09-28 Mitsubishi Denki Kabushiki Kaisha Image coding apparatus with segment classification and segmentation-type motion prediction circuit
WO1998019450A3 (en) 1996-10-31 1999-02-25 Sensormatic Electronics Corp Intelligent video information management system
CN1495636A (en) 1996-10-31 2004-05-12 传感电子公司 Video information intelligent management system
US5862342A (en) 1996-10-31 1999-01-19 Sensormatic Electronics Corporation Intelligent video information management system with information archiving capabilities
US6173089B1 (en) 1997-04-02 2001-01-09 U.S. Philips Corporation Image handling system and method
US6043838A (en) 1997-11-07 2000-03-28 General Instrument Corporation View offset estimation for stereoscopic video coding
US20060013303A1 (en) 1997-11-14 2006-01-19 Ac Capital Management, Inc. Apparatus and method for compressing video information
US6278446B1 (en) 1998-02-23 2001-08-21 Siemens Corporate Research, Inc. System for interactive organization and browsing of video
CN1128097C (en) 1998-06-19 2003-11-19 中山医科大学中山眼科中心 Improved preparing process for changing coral into hydroxy-apatite 'artificial bone'
US20030021343A1 (en) 1998-07-08 2003-01-30 Philips Electronics North America Corporation Low bandwidth encoding scheme for video transmission
US6526183B1 (en) 1998-08-05 2003-02-25 Koninklijke Philips Electronics N.V. Static image generation method and device
US20040170330A1 (en) 1998-08-12 2004-09-02 Pixonics, Inc. Video coding reconstruction apparatus and methods
US20010055340A1 (en) * 1998-10-09 2001-12-27 Hee-Yong Kim Efficient down conversion system for 2:1 decimation
US6397166B1 (en) 1998-11-06 2002-05-28 International Business Machines Corporation Method and system for model-based clustering and signal-bearing medium for storing program of same
US20020009230A1 (en) 1998-12-18 2002-01-24 Shijun Sun Template matching using correlative auto-predicative search
JP2000215318A (en) 1999-01-20 2000-08-04 Univ Of Washington Method for clustering input vector
US6795578B1 (en) 1999-09-29 2004-09-21 Canon Kabushiki Kaisha Image processing apparatus and method, and storage medium
US20020036705A1 (en) 2000-06-13 2002-03-28 Samsung Electronics Co., Ltd. Format converter using bi-directional motion vector and method thereof
US7623706B1 (en) 2000-09-29 2009-11-24 Hewlett-Packard Development Company, L.P. Reduction of chromatic bleeding artifacts in images containing subsampled chrominance values
US20030005258A1 (en) 2001-03-22 2003-01-02 Modha Dharmendra Shantilal Feature weighting in k-means clustering
JP2003018398A (en) 2001-04-20 2003-01-17 Mitsubishi Electric Research Laboratories Inc Method for generating a super-resolution image from pixel image
US20020172434A1 (en) 2001-04-20 2002-11-21 Mitsubishi Electric Research Laboratories, Inc. One-pass super-resolution images
CN1276946C (en) 2001-06-08 2006-09-27 第一毛织株式会社 Flame retardant thermoplastic resin composition
US20030058943A1 (en) 2001-07-18 2003-03-27 Tru Video Corporation Dictionary generation method for video and image compression
US20040258148A1 (en) 2001-07-27 2004-12-23 Paul Kerbiriou Method and device for coding a scene
WO2003084238A3 (en) 2002-03-26 2004-02-05 Gen Instrument Corp Methods and apparatus for efficient global motion compensation encoding and associated decoding
US7433526B2 (en) 2002-04-30 2008-10-07 Hewlett-Packard Development Company, L.P. Method for compressing images and image sequences through adaptive partitioning
WO2003102868A3 (en) 2002-05-29 2004-04-08 Pixonics Inc Classifying image areas of a video signal
US20040017852A1 (en) 2002-05-29 2004-01-29 Diego Garrido Predictive interpolation of a video signal
US7386049B2 (en) 2002-05-29 2008-06-10 Innovation Management Sciences, Llc Predictive interpolation of a video signal
US20040001705A1 (en) 2002-06-28 2004-01-01 Andreas Soupliotis Video processing system and method for automatic enhancement of digital video
EP1401211A3 (en) 2002-09-04 2004-10-27 Microsoft Corporation Multi-resolution video coding and decoding
US20040213345A1 (en) 2002-09-04 2004-10-28 Microsoft Corporation Multi-resolution video coding and decoding
JP2004266794A (en) 2002-09-04 2004-09-24 Microsoft Corp Multi-resolution video coding and decoding
US20060239345A1 (en) 2002-09-20 2006-10-26 David Taubman Method of signalling motion information for efficient scalable video compression
JP2004222218A (en) 2003-01-15 2004-08-05 Toa Corp Method for compressing and extending image
US20060039617A1 (en) 2003-02-28 2006-02-23 Bela Makai Method and assembly for video encoding, the video encoding including texture analysis and texture synthesis, and corresponding computer program and corresponding computer-readable storage medium
JP2006519533A (en) 2003-02-28 2006-08-24 フラウンホッファー−ゲゼルシャフト ツァ フェルダールング デァ アンゲヴァンテン フォアシュンク エー.ファオ Method and assembly for video coding where video coding includes texture analysis and texture synthesis, corresponding computer program and corresponding computer-readable recording medium
US20040218834A1 (en) 2003-04-30 2004-11-04 Microsoft Corporation Patch-based video super-resolution
US20050019000A1 (en) 2003-06-27 2005-01-27 In-Keon Lim Method of restoring and reconstructing super-resolution image from low-resolution compressed image
JP2005020761A (en) 2003-06-27 2005-01-20 Sungjin C & C Co Ltd Method of restoring and reconstructing super-resolution image from low-resolution image subjected to data compression
US20050015259A1 (en) 2003-07-18 2005-01-20 Microsoft Corporation Constant bitrate media encoding techniques
WO2005043882B1 (en) 2003-10-21 2005-10-13 Prismvideo Inc Video source coding with side information
US20050243921A1 (en) 2004-03-26 2005-11-03 The Hong Kong University Of Science And Technology Efficient multi-frame motion estimation for video compression
US20050225553A1 (en) 2004-04-09 2005-10-13 Cheng-Jan Chi Hybrid model sprite generator (HMSG) and a method for generating sprite of the same
WO2006025339A1 (en) 2004-08-30 2006-03-09 Matsushita Electric Industrial Co., Ltd. Decoder, encoder, decoding method and encoding method
US20080117975A1 (en) 2004-08-30 2008-05-22 Hisao Sasai Decoder, Encoder, Decoding Method and Encoding Method
CN101048799A (en) 2004-10-25 2007-10-03 惠普开发有限公司 Video content understanding through real time video motion analysis
US7447337B2 (en) 2004-10-25 2008-11-04 Hewlett-Packard Development Company, L.P. Video content understanding through real time video motion analysis
US20060088191A1 (en) 2004-10-25 2006-04-27 Tong Zhang Video content understanding through real time video motion analysis
US7643690B2 (en) 2004-11-19 2010-01-05 Ntt Docomo, Inc. Image decoding and encoding apparatus, method and computer readable storage medium
US20100054338A1 (en) 2004-11-19 2010-03-04 Ntt Docomo, Inc. Image decoding apparatus, image decoding program, image decoding method, image encoding apparatus, image encoding program, and image encoding method
EP1659532A2 (en) 2004-11-19 2006-05-24 NTT DoCoMo, Inc. Image decoding apparatus, image decoding program, image decoding method, image encoding apparatus, image encoding program, and image encoding method
CN1777287A (en) 2004-11-19 2006-05-24 株式会社Ntt都科摩 Image decoding apparatus, image decoding program, image decoding method, image encoding apparatus, image encoding program, and image encoding method
US20060126960A1 (en) 2004-12-15 2006-06-15 Lingxiang Zhou Pattern classification and filter design for increasing image resolution
US7671894B2 (en) 2004-12-17 2010-03-02 Mitsubishi Electric Research Laboratories, Inc. Method and system for processing multiview videos for view synthesis using skip and direct modes
JP2006203744A (en) 2005-01-24 2006-08-03 Victor Co Of Japan Ltd Still image generating apparatus and still image generation method
US20060245502A1 (en) 2005-04-08 2006-11-02 Hui Cheng Macro-block based mixed resolution video compression system
US20060269149A1 (en) 2005-05-24 2006-11-30 Samsung Electronics Co., Ltd. Encoding and decoding apparatus and method for reducing blocking phenomenon and computer-readable recording medium storing program for executing the method
US20090116759A1 (en) 2005-07-05 2009-05-07 Ntt Docomo, Inc. Video encoding device, video encoding method, video encoding program, video decoding device, video decoding method, and video decoding program
US20070041663A1 (en) 2005-08-03 2007-02-22 Samsung Electronics Co., Ltd. Apparatus and method for super-resolution enhancement processing
US7715658B2 (en) 2005-08-03 2010-05-11 Samsung Electronics Co., Ltd. Apparatus and method for super-resolution enhancement processing
US20090274377A1 (en) 2005-11-11 2009-11-05 Japan Advanced Institute Of Science And Technology Clustering System and Image Processing System Having the Same
US20070118376A1 (en) 2005-11-18 2007-05-24 Microsoft Corporation Word clustering for input data
CN1863272A (en) 2006-02-14 2006-11-15 华为技术有限公司 Ultra-resolution ratio reconstructing method for video-image
US20070223808A1 (en) 2006-03-23 2007-09-27 Canon Information Systems Research Australia Pty Ltd Motion characterisation
WO2007111966A9 (en) 2006-03-24 2007-11-15 Cernium Corp System for pruning video data, and application thereof
US20070223825A1 (en) 2006-03-27 2007-09-27 Yan Ye Methods and systems for significance coefficient coding in video compression
US20070248272A1 (en) 2006-04-19 2007-10-25 Microsoft Corporation Vision-Based Compression
US20080107346A1 (en) 2006-10-17 2008-05-08 Chao Zhang Scene-based non-uniformity correction and enhancement method using super-resolution
WO2008066025A1 (en) 2006-11-27 2008-06-05 Panasonic Corporation Image encoding apparatus and image decoding part
US20100046845A1 (en) 2006-11-27 2010-02-25 Panasonic Corporation Image coding apparatus and image decoding apparatus
US20080131000A1 (en) 2006-12-01 2008-06-05 Compal Electronics, Inc. Method for generating typographical line
JP2008148119A (en) 2006-12-12 2008-06-26 Sony Corp Portable terminal device and display method as well as program
US20080152243A1 (en) 2006-12-20 2008-06-26 Samsung Electronics Co., Ltd. Image encoding and decoding method and apparatus using texture synthesis
JP2010514325A (en) 2006-12-20 2010-04-30 サムスン エレクトロニクス カンパニー リミテッド Video coding and decoding method and apparatus using texture synthesis
US20080159401A1 (en) 2007-01-03 2008-07-03 Samsung Electronics Co., Ltd. Method and apparatus for estimating motion vector using plurality of motion vector predictors, encoder, decoder, and decoding method
US20090196350A1 (en) 2007-01-11 2009-08-06 Huawei Technologies Co., Ltd. Methods and devices of intra prediction encoding and decoding
US20080172379A1 (en) 2007-01-17 2008-07-17 Fujitsu Limited Recording medium storing a design support program, design support method, and design support apparatus
US20080187305A1 (en) 2007-02-06 2008-08-07 Ramesh Raskar 4D light field cameras
US9031130B2 (en) 2007-04-09 2015-05-12 Ntt Docomo, Inc. Image prediction/encoding device, image prediction/encoding method, image prediction/encoding program, image prediction/decoding device, image prediction/decoding method, and image prediction decoding program
US20100091846A1 (en) 2007-04-09 2010-04-15 Ntt Docomo, Inc Image prediction/encoding device, image prediction/encoding method, image prediction/encoding program, image prediction/decoding device, image prediction/decoding method, and image prediction decoding program
JP2008289005A (en) 2007-05-18 2008-11-27 Ntt Docomo Inc Image-predicting/encoding device, image prediction/encoding method, image prediction/encoding program, image-predicting/decoding device, image prediction/decoding method, and image prediction/decoding program
US20100196721A1 (en) 2007-05-30 2010-08-05 Kazufumi Ogawa Adhesion method, and biochemical chip and optical component made by the same
US20100150394A1 (en) 2007-06-14 2010-06-17 Jeffrey Adam Bloom Modifying a coded bitstream
US20090003443A1 (en) 2007-06-26 2009-01-01 Nokia Corporation Priority-based template matching intra prediction video and image coding
US20090002379A1 (en) 2007-06-30 2009-01-01 Microsoft Corporation Video decoding implementations for a graphics processing unit
US20090175538A1 (en) 2007-07-16 2009-07-09 Novafora, Inc. Methods and systems for representation and matching of video content
US20100104184A1 (en) 2007-07-16 2010-04-29 Novafora, Inc. Methods and systems for representation and matching of video content
US20090041367A1 (en) 2007-08-07 2009-02-12 Texas Instruments Incorporated Quantization method and apparatus
US20090252431A1 (en) 2007-09-07 2009-10-08 Microsoft Corporation Image Resizing for Web-based Image Search
CN101389021B (en) 2007-09-14 2010-12-22 华为技术有限公司 Video encoding/decoding method and apparatus
US8831107B2 (en) 2007-09-14 2014-09-09 Tsinghua University Method and device for video coding and decoding
JP2009077189A (en) 2007-09-21 2009-04-09 Hitachi Ltd Video transmission/reception method, receiver, and video storage device
US20090080804A1 (en) 2007-09-21 2009-03-26 Hitachi, Ltd. Method for transmitting and receiving image, receiving device, and image storage device
US20090097564A1 (en) 2007-10-10 2009-04-16 To-Wei Chen Matching-pixel Sub-sampling Motion Estimation Method for Video Compression
US20090097756A1 (en) 2007-10-11 2009-04-16 Fuji Xerox Co., Ltd. Similar image search apparatus and computer readable medium
JP2011501542A (en) 2007-10-15 2011-01-06 華為技術有限公司 Method and apparatus for interframe predictive coding
US20100208814A1 (en) 2007-10-15 2010-08-19 Huawei Technologies Co., Ltd. Inter-frame prediction coding method and device
WO2009052742A1 (en) 2007-10-15 2009-04-30 Huawei Technologies Co., Ltd. An interframe prediction encoding/decoding method and apparatus
US20100272184A1 (en) 2008-01-10 2010-10-28 Ramot At Tel-Aviv University Ltd. System and Method for Real-Time Super-Resolution
WO2009087641A2 (en) 2008-01-10 2009-07-16 Ramot At Tel-Aviv University Ltd. System and method for real-time super-resolution
US20110007800A1 (en) 2008-01-10 2011-01-13 Thomson Licensing Methods and apparatus for illumination compensation of intra-predicted video
US20090180538A1 (en) 2008-01-16 2009-07-16 The Regents Of The University Of California Template matching scheme using multiple predictors as candidates for intra-prediction
US20090185747A1 (en) 2008-01-18 2009-07-23 Sharp Laboratories Of America, Inc. Systems and methods for texture synthesis for video coding with side information
WO2009091080A1 (en) 2008-01-18 2009-07-23 Sharp Kabushiki Kaisha Methods, device, program and media for texture synthesis for video coding with side information
WO2009094036A1 (en) 2008-01-25 2009-07-30 Hewlett-Packard Development Company, L.P. Coding mode selection for block-based encoding
US20090232215A1 (en) 2008-03-12 2009-09-17 Lg Electronics Inc. Method and an Apparatus for Encoding or Decoding a Video Signal
JP2009239686A (en) 2008-03-27 2009-10-15 Hitachi Ltd Broadcast receiving system, home gateway device, and broadcast receiving terminal device
US20090245587A1 (en) 2008-03-31 2009-10-01 Microsoft Corporation Classifying and controlling encoding quality for textured, dark smooth and smooth video content
US20110261886A1 (en) 2008-04-24 2011-10-27 Yoshinori Suzuki Image prediction encoding device, image prediction encoding method, image prediction encoding program, image prediction decoding device, image prediction decoding method, and image prediction decoding program
JP2009267710A (en) 2008-04-24 2009-11-12 Ntt Docomo Inc Image prediction encoding device, image prediction encoding method, image prediction encoding program, image prediction decoding device, image prediction decoding method, and image prediction decoding program
WO2009157904A1 (en) 2008-06-27 2009-12-30 Thomson Licensing Methods and apparatus for texture compression using patch-based sampling texture synthesis
US8340463B1 (en) 2008-08-29 2012-12-25 Adobe Systems Incorporated Candidate pruning for patch transforms
WO2010033151A1 (en) 2008-09-18 2010-03-25 Thomson Licensing Methods and apparatus for video imaging pruning
US20110170615A1 (en) 2008-09-18 2011-07-14 Dung Trung Vo Methods and apparatus for video imaging pruning
US20100074549A1 (en) 2008-09-22 2010-03-25 Microsoft Corporation Image upsampling with training images
CN101459842A (en) 2008-12-17 2009-06-17 浙江大学 Decoding method and apparatus for space desampling
FR2941581A1 (en) 2009-01-28 2010-07-30 France Telecom Video image sequence coding method, involves identifying better candidate zone in set of candidate zones, minimizing reconstruction error with respect to target zone, and determining indication representing identified better candidate zone
CN101551903A (en) 2009-05-11 2009-10-07 天津大学 Super-resolution image restoration method in gait recognition
CN101556690B (en) 2009-05-14 2015-01-07 复旦大学 Image super-resolution method based on overcomplete dictionary learning and sparse representation
US20120106862A1 (en) 2009-05-15 2012-05-03 Sony Corporation Image processing device, method, and program
US20110047163A1 (en) 2009-08-24 2011-02-24 Google Inc. Relevance-Based Image Selection
US20120201475A1 (en) 2009-10-05 2012-08-09 I.C.V.T. Ltd. Method and system for processing an image
US20110142330A1 (en) 2009-12-10 2011-06-16 Samsung Electronics Co., Ltd. Image processing apparatus and method
US20120320983A1 (en) 2010-01-19 2012-12-20 Thomson Licensing Methods and apparatus for reduced complexity template matching prediction for video encoding and decoding
WO2011090798A1 (en) 2010-01-22 2011-07-28 Thomson Licensing Data pruning for video compression using example-based super-resolution
US20110210960A1 (en) 2010-02-26 2011-09-01 Google Inc. Hierarchical blurring of texture maps
JP2013528309A (en) 2010-06-08 2013-07-08 フェニックス コンタクト ゲゼルシャフト ミット ベシュレンクテル ハフツング ウント コンパニー コマンディートゲゼルシャフト Electrical device with plug-type connector and electrical plug-type connector
WO2011154127A1 (en) 2010-06-08 2011-12-15 Phoenix Contact Gmbh & Co. Kg Electrical device with a plug-type connector and electrical plug-type connection
US20130163679A1 (en) 2010-09-10 2013-06-27 Dong-Qing Zhang Video decoding using example-based data pruning
US20130170558A1 (en) 2010-09-10 2013-07-04 Thomson Licensing Video decoding using block-based mixed-resolution data pruning
US20130170746A1 (en) 2010-09-10 2013-07-04 Thomson Licensing Recovering a pruned version of a picture in a video sequence for example-based data pruning using intra-frame patch similarity
US20130163676A1 (en) 2010-09-10 2013-06-27 Thomson Licensing Methods and apparatus for decoding video signals using motion compensated example-based super-resolution for video compression
US20120155766A1 (en) 2010-12-17 2012-06-21 Sony Corporation Patch description and modeling for image subscene recognition
US20140036054A1 (en) 2012-03-28 2014-02-06 George Zouridakis Methods and Software for Screening and Diagnosing Skin Lesions and Plant Diseases
US20140056518A1 (en) 2012-08-22 2014-02-27 Canon Kabushiki Kaisha Image processing apparatus, image processing method, and program

Non-Patent Citations (102)

* Cited by examiner, † Cited by third party
Title
Barreto, D. et al., "Region-Based Super-Resolution for Compression", Multidimensional Systems and Signal Processing, Special Issue on papers presented at the I International Conference in Super Resolution (Hong Kong, 2006), vol. 18, No. 2-3, pp. 59-81, Sep. 2007.
Ben-Ezra, M. et al., "Video Super-Resolution Using Controlled Subpixel Detector Shifts", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, No. 6, Jun. 2005, pp. 977-987.
Bertalmio et al., "Image Inpainting", Proceedings of SIGGRAPH 2000, New Orleans, USA, Jul. 2000, pp. 1-8.
Bhagavathy et al., "A Data Pruning Approach for Video Compression Using Motion-Guided Down-Sampling and Super-Resolution", submitted to ICIP 2010, pp. 1-4.
Bishop et al., "Super-resolution Enhancement of Video," Proceedings of the 9th Int'l. Workshop on Artificial Intelligence and Statistics, Jan. 3, 2003, pp. 1-8, Society for Artificial Intelligence and Statistics, Key West, Florida.
Black, M. et al., "The Robust Estimation of Multiple Motions: Parametric and Piecewise-Smooth Flow Fields", Computer Vision and Image Understanding, vol. 63, No. 1, 1996, pp. 75-104.
Cheng et al., "Reduced Resolution Residual Coding for H.264-based Compression System," Proceedings of the 2006 IEEE Int'l. Symposium on Circuits and Systems (ISCAS 2006), May 21, 2006, pp. 3486-3489.
CN Search Report for Related CN Application 2011800432940 dated Jul. 28, 2015.
CN Search Report for Related CN Application 2011800437234 dated Sep. 16, 2015.
CN Search Report for Related CN Application 201180053976.X dated Sep. 23, 2015.
CN Search Report for Related CN Application 201180054419X dated Sep. 8, 2015.
CN Search Report for Related CN Application No. 201180006921.3 dated Nov. 21, 2014.
CN Search Report for Related CN Application No. 2011800153355 dated Nov. 22, 2014.
CN Search Report for Related CN Application No. 2011800432758 dated Sep. 23, 2015.
CN Search Report for Related CN Application No. 2011800435953 dated Aug. 18, 2015.
CN Search report for Related CN Application No. 201180054405.8 dated Nov. 30, 2015.
Dorr et al., "Clustering Sequences by Overlap", International Journal Data Mining and Bioinformatics, vol. 3, No. 3, 2009, pp. 260-279.
Dumitras et al., "A Texture Replacement Method at the Encoder for Bit-Rate Reduction of Compressed Video", IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, No. 2, Feb. 2003, pp. 163-175.
Dumitras et al., "An Encoder-Decoder Texture Replacement Method with Application to Content-Based Movie Coding", IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, No. 6, Jun. 2004, pp. 825-840.
Fischler, M. et al., "Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography", Communications of the ACM, Jun. 1981, vol. 24, No. 6, pp. 381-395.
Freeman et al., "Example-based Super-Resolution", IEEE Computer Graphics and Applications, Mar./Apr. 2002, pp. 56-65.
Han et al., "Rank-based Image Transformation for Entropy Coding Efficiently", Proceedings of the Fourth Annual ACIS International Conference on Computer and Information Science (ICIS'05), IEEE 2005.
International Search Report for Corresponding Appln. PCT/US2011/050915 dated Jul. 30, 2012.
International Search Report for Corresponding Appln. PCT/US2011/050922 dated Jan. 4, 2012.
International Search Report for Corresponding Appln. PCT/US2011/050925 dated Jan. 6, 2012.
International Search Report for Corresponding International Appln. PCT/US2011/050921 dated Jan. 4, 2012.
International Search Report for Corresponding International Appln. PCT/US2011/050923 dated Jan. 5, 2012.
International Search Report for International Application PCT/US11/050924 dated Jan. 5, 2012.
ISR for related International Application No. PCT/US2011/000107 dated Apr. 20, 2011.
ISR for related International Application No. PCT/US2011/000117 dated Apr. 29, 2011.
ISR for related International Application No. PCT/US2011/050913 dated Jul. 30, 2012.
ISR for related International Application No. PCT/US2011/050917 dated Jan. 5, 2012.
ISR for related International Application No. PCT/US2011/050920 dated Jan. 4, 2012.
ISR for related International Application PCT/US2011/050919 dated Jan. 4, 2012.
ISR for related International Patent Application No. PCT/US2011/050918 dated Jan. 5, 2012.
ITU-T, H.264, "Series H: Audiovisual and Multimedia Systems, Infrastructure of audiovisual services-Coding of moving video", "Advanced Video Coding for Generic Audiovisual Services", ITU-T Recommendation H.264, Mar. 2005, 343 pages.
Komodakis et al., "Image Completion Using Efficient Belief Propagation Via Priority Scheduling and Dynamic Pruning", IEEE Transactions on Image Processing, vol. 16, No. 11, Nov. 1, 2007, pp. 2649-2661.
Krutz et al., "Windowed Image Registration for Robust Mosaicing of Scenes with Large Background Occlusions", ICIP 2006, vol. 1-7, IEEE, 2006, pp. 353-356.
Lee et al., "Robust Frame Synchronization for Low Signal-to-Noise Ratio Channels Using Energy-Corrected Differential Correlation", EURASIP Journal on Wireless Communications and Networking, vol. 2009 (2009), Article ID 345989, online May 17, 2010, 8 pages.
Li et al., "Example-Based Image Super-Resolution with Class-Specific Predictors", Journal of Visual Communication and Image Representation, vol. 20, No. 5, Jul. 1, 2009, pp. 312-322.
Liu et al., "Intra Prediction via Edge-Based Inpainting", IEEE 2008 Data Compression Conference, Mar. 25-27, 2008, pp. 282-291.
Lowe, D., "Distinctive Image Features from Scale-Invariant Keypoints", International Journal of Computer Vision, vol. 60, No. 2, 2004, pp. 91-110.
Moffat et al., "Chapter 3. Static Codes," Compression and Coding Algorithms, Feb. 2002, pp. 29-50.
Ndjiki-Nya, P. et al., "A Generic and Automatic Content-Based Approach for Improved H.264/MPEG4-AVC Video Coding", IEEE International Conference on Image Processing (ICIP), 2005, Image Processing Department, FhG Heinrich-Hertz-Institut (HHI), Berlin, Germany.
Non-Final Office Action for related U.S. Appl. No. 13/522,024 dated Mar. 27, 2015.
Non-Final Office Action for related U.S. Appl. No. 13/820,901 dated May 5, 2015.
Non-Final Office Action for related U.S. Appl. No. 13/821,083 dated Jul. 16, 2015.
Non-Final Office Action for related U.S. Appl. No. 13/821,257 dated Aug. 19, 2015.
Non-Final Office Action for related U.S. Appl. No. 13/821,270 dated Jul. 16, 2015.
Non-Final Office Action for related U.S. Appl. No. 13/821,283 dated Aug. 17, 2015.
Non-Final US Office Action for related U.S. Appl. No. 13/821,357 dated Aug. 13, 2015.
Notice of Allowance for U.S. Appl. No. 13/821,393 Dated Mar. 18, 2016.
Park, S.C. et al., "Super-Resolution Image Reconstruction: A Technical Overview", IEEE Signal Processing Magazine, IEEE Service Center, Piscataway, NJ, US, vol. 20, No. 3, May 1, 2003, pp. 21-36, XP011097476.
PCT International Search Report Mailed: Apr. 20, 2011.
Porikli et al., "Compressed Domain Video Object Segmentation", IEEE Transactions on Circuits and Systems for Video Technology, vol. 20, No. 1, Jan. 2010, pp. 1-14.
Sawhney, H. et al., "Hybrid Stereo Camera: An IBR Approach for Synthesis of Very High Resolution Stereoscopic Image Sequences", Proc. SIGGRAPH, pp. 451-460, 2001, Vision Technologies Lab., Sarnoff Corp.
Schuster et al., "An Optimal Polygonal Boundary Encoding Scheme in the Rate Distortion Sense", IEEE Transactions on Image Processing, vol. 7, No. 1, Jan. 1998, pp. 13-26.
Segall, C. A. et al., "High-Resolution Images from Low-Resolution Compressed Video", IEEE Signal Processing Magazine, IEEE Service Center, Piscataway, NJ, US, vol. 20, No. 3, May 1, 2003, pp. 37-48, XP011097477.
Sermadevi et al., "Efficient Bit Allocation for Dependent Video Coding", Proceedings of the Data Compression Conference (DCC'04), IEEE, 2004.
Shen et al., "Optimal Pruning Quad-Tree Block-Based Binary Shape Coding", IEEE Proceedings 2007, International Conference on Image Processing, ICIP, 2007, pp. V1-437-V1-440.
Shimauchi Kazuhiro, "JPEG Based Image Compression Using Adaptive Multi Resolution Conversion," The 17th Workshop on Circuits and Systems in Karuizawa, The Institute of Electronics, Information and Communication Engineers, Apr. 27, 2004, pp. 147-152.
Shimauchi, et al., "JPEG Based Image Compression Using Adaptive Multi Resolution Conversion," The 17th Workshop On Circuits and Systems in Karuizawa. The Institute of Electronics, Information and Communication Engineers, pp. 147-152, Apr. 27, 2004.
Smolic, A. et al., "Improved Video Coding Using Long-Term Global Motion Compensation", Proceedings of SPIE, SPIE, USA, vol. 5308, No. 1, Jan. 22, 2004, pp. 343-354, XP008046986.
Sun et al., "Classified Patch Learning for Spatially Scalable Video Coding", Proceedings of the 16th IEEE International Conference on Image Processing, Nov. 7, 2009, pp. 2301-2304.
Symes, "Digital Video Compression," McGraw-Hill, ISBN 0-07-142487, pp. 116-121 and 242-243.
Torr, P. et al., "MLESAC: A New Robust Estimator with Application to Estimating Image Geometry", Journal of Computer Vision and Image Understanding, vol. 78, No. 1, 2000, pp. 138-156.
US Final Office Action for U.S. Appl. No. 13/821,270 dated Feb. 26, 2016.
US Non-Final Office Action for U.S. Appl. No. 13/820,901 Dated May 18, 2016.
US Non-Final Office Action for U.S. Appl. No. 13/821,130 Dated Jul. 11, 2016.
US Non-Final Office Action for U.S. Appl. No. 13/821,436 Dated Jul. 11, 2016.
US Notice of Allowance for U.S. Appl. No. 13/522,024 dated Mar. 14, 2016.
US Notice of Allowance for U.S. Appl. No. 13/821,424 dated Mar. 14, 2016.
US Office Action for Related U.S. Appl. No. 13/820,901 dated Dec. 18, 2015.
US Office Action for Related U.S. Appl. No. 13/821,078 dated Jan. 13, 2016.
US Office Action for Related U.S. Appl. No. 13/821,078 Dated Jun. 5, 2015.
US Office Action for Related U.S. Appl. No. 13/821,083 dated Jan. 29, 2016.
US Office Action for Related U.S. Appl. No. 13/821,130 dated Jan. 14, 2016.
US Office Action for Related U.S. Appl. No. 13/821,130 Dated Jun. 16, 2015.
US Office Action for Related U.S. Appl. No. 13/821,257 dated Dec. 21, 2015.
US Office Action for Related U.S. Appl. No. 13/821,283 dated Dec. 22, 2015.
US Office Action for Related U.S. Appl. No. 13/821,357 dated Dec. 21, 2015.
US Office Action for Related U.S. Appl. No. 13/821,393 dated Dec. 11, 2015.
US Office Action for Related U.S. Appl. No. 13/821,393 Dated Jul. 10, 2015.
US Office Action for Related U.S. Appl. No. 13/821,436 Dated Jun. 18, 2015.
US Office Action for Related U.S. Appl. No. 13/821,436 dated Nov. 25, 2015.
Vo, D. T. et al., "Data Pruning-Based Compression Using High Order Edge-Directed Interpolation", Thomson Research Technical Report, submitted to the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2009, Video Processing Laboratory, UC San Diego, CA 92092, Thomson Inc. Corporate Research, Princeton, NJ USA.
Vu et al., "Efficient Pruning Schemes for Distance-Based Outlier Detection", Springer Verlag, Proceedings European Conference 2009, pp. 160-175.
Wiegand et al., "Overview of the H.264/AVC Video Coding Standard", IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, No. 7, Jul. 2003, pp. 560-576.
Wu et al., "Image Compression by Visual Pattern Vector Quantization (VPVQ)", Proceedings of the 2008 Data Compression Conference, Mar. 25, 2008, pp. 123-131.
Xiong et al., "Block-Based Image Compression with Parameter-Assistant Inpainting", IEEE Transactions on Image Processing, vol. 19, No. 6, Jun. 2010, pp. 1651-1657.
Xu et al., "Probability Updating-based Adaptive Hybrid Coding (PUAHC)", ISCAS 2006, IEEE 2006, pp. 361-364.
Yap et al., "Unsupervised Texture Segmentation Using Dominant Image Modulations", IEEE Conference Recordings of the 34th Asilomar Conference on Signals, Systems and Computers, IEEE 2000, pp. 911-915.
Zhang et al, "Video Decoding Using Block-based Mixed-Resolution Data Pruning", Invention Disclosure, Mar. 2010.
Zhang et al, "Video Decoding Using Blocked-Based Mixed-Resolution", Invention Disclosure, Mar. 2010.
Zhang et al., "A Pattern-based Lossy Compression Scheme for Document Images," Electronic Publishing, vol. 8, No. 2-3, Sep. 24, 1995, pp. 221-233.
Zhang et al., "Example-Based Data Pruning for Improving Video Compression Efficiency", Invention Disclosure, Apr. 2010.
Zhang et al., "Method and Apparatus for Data Pruning for Video Compression Using Example-Based Super-Resolution", Invention Disclosure, Apr. 2010.
Zhang et al., "Segmentation for Extended Target in Complex Backgrounds Based on Clustering and Fractal", Optics and Precision Engineering, vol. 17, No. 7, Jul. 2009, pp. 1665-1671.
Zheng et al., "Intra Prediction Using Template Matching with Adaptive Illumination Compensation", ICIP 2008, IEEE 2008, pp. 125-128.
Zhu, C. et al., "Video Coding With Spatio-Temporal Texture Synthesis and Edge-Based Inpainting", IEEE International Conference on Multimedia and Expo (ICME), 2008, University of Science and Technology of China, Hefei, 230027, China, Microsoft Research Asia, Beijing, 100080, China, pp. 813-816.
Zhu, C. et al., "Video Coding With Spatio-Temporal Texture Synthesis", IEEE International Conference on Multimedia and Expo (ICME), 2007, University of Science and Technology of China, Hefei, 230027, China, Microsoft Research Asia, Beijing, 100080, China, pp. 112-115.

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10349069B2 (en) * 2012-12-11 2019-07-09 Sony Interactive Entertainment Inc. Software hardware hybrid video encoder
US10401143B2 (en) * 2014-09-10 2019-09-03 Faro Technologies, Inc. Method for optically measuring three-dimensional coordinates and controlling a three-dimensional measuring device
US20180139362A1 (en) * 2016-11-11 2018-05-17 Industrial Technology Research Institute Method and system for generating a video frame
US10200574B2 (en) * 2016-11-11 2019-02-05 Industrial Technology Research Institute Method and system for generating a video frame
US10825206B2 (en) * 2018-10-19 2020-11-03 Samsung Electronics Co., Ltd. Methods and apparatuses for performing artificial intelligence encoding and artificial intelligence decoding on image
US20210358083A1 (en) 2018-10-19 2021-11-18 Samsung Electronics Co., Ltd. Method and apparatus for streaming data
US11663747B2 (en) 2018-10-19 2023-05-30 Samsung Electronics Co., Ltd. Methods and apparatuses for performing artificial intelligence encoding and artificial intelligence decoding on image
US11688038B2 (en) 2018-10-19 2023-06-27 Samsung Electronics Co., Ltd. Apparatuses and methods for performing artificial intelligence encoding and artificial intelligence decoding on image
US11748847B2 (en) 2018-10-19 2023-09-05 Samsung Electronics Co., Ltd. Method and apparatus for streaming data
US11395001B2 (en) 2019-10-29 2022-07-19 Samsung Electronics Co., Ltd. Image encoding and decoding methods and apparatuses using artificial intelligence
US11405637B2 (en) 2019-10-29 2022-08-02 Samsung Electronics Co., Ltd. Image encoding method and apparatus and image decoding method and apparatus

Also Published As

Publication number Publication date
US20120294369A1 (en) 2012-11-22
WO2011090790A8 (en) 2012-08-30
KR101789845B1 (en) 2017-11-20
WO2011090790A1 (en) 2011-07-28
JP5911809B2 (en) 2016-04-27
EP2526698A1 (en) 2012-11-28
KR20120118477A (en) 2012-10-26
CN102823242A (en) 2012-12-12
JP2013518463A (en) 2013-05-20
CN102823242B (en) 2016-08-10

Similar Documents

Publication Publication Date Title
US9602814B2 (en) Methods and apparatus for sampling-based super resolution video encoding and decoding
CN111741289B (en) Method and apparatus for processing cube face images
JP5535625B2 (en) Method and apparatus for adaptive reference filtering
US8135234B2 (en) Method and apparatus for edge-based spatio-temporal filtering
WO2017125030A1 (en) Apparatus of inter prediction for spherical images and cubic images
EP2304958B1 (en) Methods and apparatus for texture compression using patch-based sampling texture synthesis
KR101906614B1 (en) Video decoding using motion compensated example-based super resolution
EP2596636A1 (en) Reference processing using advanced motion models for video coding
JP6042899B2 (en) Video encoding method and device, video decoding method and device, program and recording medium thereof
US20230069953A1 (en) Learned downsampling based cnn filter for image and video coding using learned downsampling feature
US9420291B2 (en) Methods and apparatus for reducing vector quantization error through patch shifting
EP3038370A1 (en) Devices and method for video compression and reconstruction
CN115552905A (en) Global skip connection based CNN filter for image and video coding
CN116114246B (en) Intra-frame prediction smoothing filter system and method
WO2024006167A1 (en) Inter coding using deep learning in video compression
WO2013105946A1 (en) Motion compensating transformation for video coding
EP2981086A1 (en) Video encoding device, video decoding device, video encoding method, video decoding method, and program

Legal Events

Date Code Title Description
AS Assignment

Owner name: THOMSON LICENSING, FRANCE

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:BHAGAVATHY, SITARAM;LLACH, JOAN;ZHANG, DONG-QING;REEL/FRAME:028600/0501

Effective date: 20100621

AS Assignment

Owner name: THOMSON LICENSING DTV, FRANCE

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:THOMSON LICENSING;REEL/FRAME:041370/0433

Effective date: 20170113

AS Assignment

Owner name: THOMSON LICENSING DTV, FRANCE

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:THOMSON LICENSING;REEL/FRAME:041378/0630

Effective date: 20170113

STCF Information on status: patent grant

Free format text: PATENTED CASE

AS Assignment

Owner name: INTERDIGITAL MADISON PATENT HOLDINGS, FRANCE

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:THOMSON LICENSING DTV;REEL/FRAME:046763/0001

Effective date: 20180723

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4