WO2001061650A1 - 3D image processing system and method - Google Patents

3D image processing system and method

Info

Publication number
WO2001061650A1
Authority
WO
WIPO (PCT)
Prior art keywords
subject
image
representation
probability
camera
Prior art date
Application number
PCT/GB2001/000639
Other languages
French (fr)
Inventor
Michael Turner
Simon Moss
Paul Zanelli
Original Assignee
Pc Multimedia Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from PCT/GB2000/000492 external-priority patent/WO2000049527A1/en
Priority claimed from GB0020741A external-priority patent/GB0020741D0/en
Application filed by Pc Multimedia Limited filed Critical Pc Multimedia Limited
Priority to AU2001233865A priority Critical patent/AU2001233865A1/en
Publication of WO2001061650A1 publication Critical patent/WO2001061650A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/10Constructive solid geometry [CSG] using solid primitives, e.g. cylinders, cubes

Definitions

  • The present invention relates to image processing, and in particular to a method and system for generating a three dimensional representation of a subject from two dimensional video images of the subject.
  • The present invention also relates to recognising a subject in video images, the manipulation of a subject in video images and the generation and encoded transmission of video images including three dimensional representations of a subject.
  • A subject three dimensional (3D) object 110 has a 3D surface 112.
  • Three camera positions 114, 116 and 118 are shown, each having a camera centre 120 and a corresponding image plane 122.
  • The cameras can each be considered to be acting as pinhole cameras, with the camera centre at the pinhole.
  • A lens system to focus light on the image plane is typically provided.
  • A camera captures a sequence of images of a subject object taken from different viewpoints.
  • A matching system then identifies matching image regions ("correspondences") in pairs of captured images. This is commonly achieved through a feature extraction and template matching process.
  • These correspondences are then passed on to a geometry-based system, which estimates the relative camera parameters. Given the matched regions in the images and the camera parameters, a reconstruction engine can then reconstruct 3D surface points by back-projecting rays from the matched regions through the camera centres for cameras having the estimated camera parameters.
  • Such systems can be classified according to whether image correspondences or the camera parameters are known a priori. If both are available this greatly simplifies the problem since reconstruction becomes simply a case of back-projecting rays.
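As background to the back-projection just described, the following is a minimal sketch, not the patent's own code, of how a pixel can be back-projected to a ray and two such rays intersected to recover a 3D surface point. The intrinsics matrix `K` and the world-to-camera pose `R`, `t` are assumed inputs; the function names are illustrative.

```python
import numpy as np

def back_project(pixel, K, R, t):
    """Back-project a pixel to a ray in world space, given assumed
    camera intrinsics K and a world-to-camera pose (R, t)."""
    d_cam = np.linalg.inv(K) @ np.array([pixel[0], pixel[1], 1.0])
    origin = -R.T @ t                     # camera centre in the world frame
    direction = R.T @ d_cam
    return origin, direction / np.linalg.norm(direction)

def intersect_rays(o1, d1, o2, d2):
    """Least-squares midpoint of the shortest segment between two rays:
    a standard estimate of the reconstructed surface point."""
    A = np.stack([d1, -d2], axis=1)       # solve A @ [s, u] ~= o2 - o1
    (s, u), *_ = np.linalg.lstsq(A, o2 - o1, rcond=None)
    return 0.5 * ((o1 + s * d1) + (o2 + u * d2))
```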
  • In calibrated systems, the camera parameters are known but correspondences are not, and must be estimated from the images.
  • The present invention is concerned with uncalibrated systems in which neither the camera geometry nor the image correspondences are available to the system before processing and must be identified as part of the reconstruction process. There are problems associated with available uncalibrated systems. They are slow and they may work satisfactorily on only one or two carefully selected objects. In general, they give poor reconstructions.
  • The present invention relates to a new approach to 3D reconstruction which is fast and gives good reconstructions under a wide range of realistic conditions.
  • A method of constructing a three dimensional representation of a subject, comprising the steps of: (i) providing at least two images, in which each image is a different view of the subject; (ii) defining a volume enclosing possible surfaces of the subject, the volume being made up from a plurality of cells; (iii) determining an upper bound on the probability that one of the plurality of cells encloses the surface of the representation of the subject; (iv) determining a threshold probability; (v) comparing the upper probability bound for the cell with the threshold probability; and (vi) eliminating the cell from the volume if the upper probability bound is less than the threshold probability.
  • In this way, a 3D representation of the subject can be generated by eliminating cells in a volume of space enclosing the reconstructed object and retaining cells enclosing or on the surface of the reconstructed object, so that the surface of the subject is generated. A minimal sketch of this elimination loop follows.
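The claimed steps (iii) to (vi) can be pictured with the following sketch. It is an illustration of the elimination principle, not the patent's implementation; the `upper_bound` scoring callable is a hypothetical stand-in for the Bayesian bound described later.

```python
def carve_volume(cells, images, upper_bound, threshold):
    """Steps (iii)-(vi): keep only cells whose upper-bound probability
    of enclosing the subject's surface reaches the threshold.
    `upper_bound(cell, images)` is any scoring callable (hypothetical)."""
    return [cell for cell in cells
            if upper_bound(cell, images) >= threshold]
```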
  • A subject is considered to include single items, collections of items, with and without backgrounds, scenes and any other image, real or imaginary, that can be captured or created.
  • The method can create a representation of the whole or the part, or parts, of the subject.
  • The method includes the step of comparing image regions in the at least two images, to determine whether the image regions match.
  • The method can include the step of repeating steps (iv) to (vi) for all cells located between the periphery of the volume and the surface of the representation of the subject. As cells are only retained if they are on the surface of the object, those cells within the surface of the object do not need to be eliminated.
  • The method can also include the step of eliminating those cells which lie on an axis of a back projection from a cell determined not to be on the surface of the representation to an image plane, and between the cell and the periphery of the volume.
  • The method can include the further steps of: sub-dividing a cell into smaller sub-cells which span the original cell; determining a new upper bound; determining a new threshold probability; and eliminating those sub-cells having an upper bound less than the threshold probability, so as to generate a more accurate surface of the representation of the subject.
  • Initially, a coarse set of cells can be used to span the volume. After the non-surface cells have been eliminated, the remaining surface cells can be sub-divided so as to provide a finer scale for definition of the surface features of the subject. This allows a more accurate representation to be generated. Further, the method is still computationally tractable, as a large amount of the possible volume can be eliminated by the initial coarse cell processing step, rather than using a fine cell size in the initial step. A sketch of this coarse-to-fine refinement follows.
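A sketch of the coarse-to-fine refinement just described, reusing the `carve_volume` sketch above. `subdivide` (splitting a cell into its spanning sub-cells) and the per-level thresholds are assumed inputs, not part of the patent text.

```python
def refine(cells, images, upper_bound, subdivide, thresholds):
    """Carve with coarse cells first, then subdivide the surviving
    surface cells and carve again with a newly determined threshold
    at each finer level."""
    for threshold in thresholds:        # one (new) threshold per level
        cells = carve_volume(cells, images, upper_bound, threshold)
        cells = [sub for cell in cells for sub in subdivide(cell)]
    return cells
```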
  • The method can include the further steps of: (i) defining an initial space representing sets of possible camera parameters; (ii) determining an upper bound on the probability that a set of camera parameters relates to the surface of the representation of the subject; (iii) determining a threshold probability; (iv) comparing the upper probability bound for the set of camera parameters with the threshold probability; and (v) eliminating the set of camera parameters from the initial space if the upper probability bound is less than the threshold probability.
  • This allows the method to generate a reconstruction when the camera parameters are not known, or not all known, a priori.
  • The number of variables being calculated over by the method can be decreased by eliminating, from all possible camera parameter sets, those sets of camera parameters corresponding to image locations which are not considered to include sufficient surface features of the subject being assessed. Reducing the number of possible camera positions that need to be considered in this way reduces the amount of processing that is required in order to determine plausible surface cells.
  • A set of camera parameters includes translation, rotation and internal camera parameters.
  • The step of eliminating camera parameter sets is iterated, so as to identify cells enclosing a surface of the representation best matching the subject, or a set of surfaces of the representation best matching the subject.
  • The step of eliminating camera parameters can be iterated until a surface best matching the subject has been generated, or until a set or sets of surfaces overlapping the representation have been generated.
  • Preferably, the upper bound is determined using Bayesian probability theory.
  • An image processing system for generating a 3D representation of a subject from at least two images, in which each image is a different view of the subject, and including data processing means in communication with a storage device storing image data representing the at least two images, the data processing means operating on the stored image data to: (i) define a volume enclosing possible surfaces of the subject and being made up from a plurality of cells; (ii) calculate an upper bound on the probability that one of the plurality of cells encloses the surface of the representation of the subject; (iii) determine a threshold probability; (iv) compare the upper probability bound for the cell with the threshold probability; and (v) eliminate the cell from the volume if the upper probability bound is less than the threshold probability.
  • A method for identifying a subject from a video signal including different views of the subject, including constructing a three dimensional representation of the subject from the video signal by using the method according to the first aspect of the invention, and comparing the 3D representation with representations of known subjects.
  • The representations of known subjects can be 2D images or can be parameters of the known subjects which can be compared with parameters determined from the 3D representation, such as the ratio of the width and length of a subject.
  • The representations of known subjects have been constructed according to the method of the first aspect of the invention.
  • The identification then becomes a process of comparing the similarity of the shapes of the wholes or parts of the representations to determine the best match.
  • A method for tracking a subject in a video signal including views of the subject, and including the step of matching a 3D representation of the subject created according to the first aspect of the invention with 2D video images derived from the video signal.
  • A method for altering a video image including a subject, including identifying the subject according to a previous aspect of the invention, and replacing the image of the subject in the video signal with an altered image of the subject derived from the representation of the subject. For instance, the colour or other aspect of the surface decoration of the subject in the video signal can be altered.
  • A method for altering a video image including a subject, including identifying the subject according to the method of a previous aspect of the invention, and replacing the image of the subject in the video signal with an image derived from a representation of a different subject. In this way an entirely different subject can be used in place of the original subject in the video signal.
  • A method for generating 3D video signals for a 3D television system, including capturing images of a subject with three or more cameras placed around the subject and constructing a representation of the subject according to the first aspect of the invention, for each of a sequence of time steps.
  • A moving 3D TV image can then be generated by displaying the 3D representations in time sequence.
  • There is provided a method for transmitting a video signal of a subject, the method including generating a 3D representation of the subject according to the first aspect of the invention and transmitting camera position data so as to generate different views of the subject.
  • A method for transmitting a video signal of a subject, including generating 3D representations of the subject according to the first aspect of the invention, and transmitting data relating to the differences between the 3D representations of the subject.
  • Figure 1 shows a diagram illustrating the capture of two dimensional video images of a three dimensional subject object
  • Figure 2 shows a flow chart illustrating a prior art 3D reconstruction method
  • Figure 3 shows a flow chart illustrating in general terms a 3D reconstruction method according to the present invention
  • Figure 4 shows a 3D reconstruction system according to an aspect of the present invention
  • Figure 5 shows a flow chart illustrating the 3D reconstruction method of the present invention in more detail
  • Figure 6 shows an illustration of the steps in the method of reducing the camera parameter phase space for pairs of images and constructing the corresponding candidate surface locations
  • Figure 7 shows an illustration of the steps of eliminating cells and identifying cells enclosing the surface of the representation of the subject, by the comparison of image regions
  • Figure 8 shows an illustration of the image capture step in greater detail
  • Figure 9 illustrates the format of the data corresponding to the reconstructed representation that is generated by the method
  • Figure 10 shows a matching system
  • Figure 11 shows a system for identifying and/or manipulating video images
  • Figure 12 shows a 3D video signal transmission system
  • Figure 13 shows 2D images of different views of a subject
  • Figure 14 shows different views of a 3D representation of the subject shown in Figure 13 reconstructed according to the method of the invention.
  • The methods and systems of the present invention are based on a new philosophy of pattern recognition, which is based upon three key conditions:
  • Calculations are underpinned by Bayesian probability theory.
  • The method requires that all regions of the solution space be assessed.
  • Processing is resource-driven, such that the calculations that can be performed are constrained by the memory available and the speed of operations required, as defined by the operator.
  • In brief, the approach uses the key conditions as follows.
  • Given the available resources, a suitable means of computing an upper bound probability for regions of the solution space is defined.
  • Through an iterative process, regions with low upper bounds are eliminated, and then effort is re-applied to those regions that remain.
  • As more and more of the solution space is eliminated, the size of the regions covering the remaining space can be reduced without compromising resources, and more accurate upper bounds can be evaluated.
  • In this way, good solutions are identified through a process of exclusion.
  • The general method of the invention is illustrated by the flow chart 300 shown in Figure 3.
  • The approach is applied simultaneously to two modules: a geometry engine 310 for assessing hypothesised camera parameters and a reconstruction engine 320 for analysing hypothesised 3D surface structure.
  • An important aspect of the invention is that all plausible geometric and surface hypotheses are examined, processing being a task of eliminating implausible hypotheses so as to hone in on the best solution through a process of exclusion.
  • 2D image data of a first view 302 of the subject object or scene and 2D image data of a second, different view 304 are processed by the method to arrive at the 3D reconstruction.
  • The geometry interpreter 310 evaluates possible camera parameters based upon the currently possible 3D surface structures and the image data. In brief, it eliminates parameters that are not consistent with the visible surface and the image data.
  • Processing is a task of eliminating outermost cells in the volume if they cannot plausibly contain a surface. That is, implausible volume is cut away from the original volume in a manner analogous to a sculptor chipping away stone to reveal the finished object.
  • Processing 330 is then a task of eliminating implausible parameters and 3D cells and seeing how this affects the system. This is an iterative process. For example, elimination of an unlikely set of camera parameters in itself leads to the elimination of certain cells, since the cells were dependent upon these parameters for their existence in the first place. Likewise, eliminating part of the 3D volume affects the support for certain camera parameters since they may no longer be consistent with both scene and image data.
  • A system 400 for generating a three dimensional representation of a subject object 410 includes a digital camera 420 movable relative to the object 410 to capture images showing different views of the object.
  • The camera is connected to a computer 430 which stores image data from the camera and processes it to generate data providing the three dimensional representation of the re-constructed object.
  • The computer operates under control of a computer program which implements the method described herein. The specific details of a suitable computer program are considered to be within the ability of a person of ordinary skill in the art in light of this description, and so have not been described in detail.
  • Figure 5 shows a flow chart 500 illustrating steps of the method in more detail.
  • The camera 420 is used to capture two dimensional images of the subject object 410 as seen from different views. It will be appreciated that all that is required is relative re-orientation between the object and camera.
  • The object may be stationary and the camera mobile, the camera stationary and the object mobile, or both camera and object may move. Further, sufficient 2D images from different views must be captured so as to cover all features of the object that it is desired to re-construct.
  • For example, the object may be mounted on a turntable and rotated relative to a static camera mounted on a tripod.
  • Images I1, I2 and I3 are two dimensional images generated by the camera 420.
  • The camera image data from the digital camera 420 are converted into bitmap file format and stored as separate bitmap data files on computer 430.
  • A series of five 2D bitmap images taken from different views of an example object is shown in Figure 13.
  • The 2D images I1, I2 and I3 are filtered at step 530 by applying colour, texture and edge filters to the data. Pairs of filtered images, I1 and I2, I1 and I3, and I2 and I3, are compared to identify characteristics of small regions of the respective images.
  • A set of candidate correspondences (i.e. plausible images of the same surface feature) can be generated 550 from, and for, each of the pairs of images, as follows.
  • Figure 6 schematically illustrates step 550 of flow chart 500.
  • A volume 610 is defined within which the 3D reconstruction 615 of subject object 410 is to be constructed.
  • Also shown in Fig 6 is a phase space diagram 620 illustrating a set of possible camera parameters that has been determined from the candidate correspondences derived from images I1 and I2. The entire area enclosed by the perimeter indicates all possible camera parameter sets. The shaded regions indicate camera parameter sets that are impossible in respect of the different views shown in I1 and I2. For instance, if I1 and I2 are related by a rotation in a flat two dimensional plane, then no rotation of the camera out of that plane can be possible. As such, all camera parameter sets including such rotations out of the plane can be excluded.
  • The unshaded area represents the possible camera parameter sets remaining.
  • 630 shows the phase space diagram for the set of camera parameters related to the pair of images I1 and I3. From the set of candidate correspondences generated at step 530 for potentially matching image regions in images I1 and I3, it has been possible to exclude some camera parameters (shown in shading) from the set of all possible camera parameters. A sketch of this pruning step follows.
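The shaded exclusions in diagrams 620 and 630 amount to discarding camera parameter sets that cannot explain any candidate correspondence. A minimal sketch, where `is_consistent` is a hypothetical geometric test (e.g. whether the back-projected rays of a matched pair can intersect under parameters `g`):

```python
def prune_camera_sets(camera_sets, correspondences, is_consistent):
    """Keep only camera parameter sets under which at least one
    candidate correspondence remains geometrically possible."""
    return [g for g in camera_sets
            if any(is_consistent(g, c) for c in correspondences)]
```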
  • Candidate surfaces are created within volume 610.
  • A local region of phase space 625 for images I1 and I2 is selected, and the candidate correspondences for that set of camera parameters for that pair of images are used to generate a region 628 of the possible surface locations of the reconstructed object.
  • Regions 715 and 725 of images I1 and I2 have been compared and found to include sufficiently similar content: i.e. a correspondence.
  • Image regions 745 and 750 of images I1 and I2 have been determined to match sufficiently to constitute a candidate correspondence, and the intersection 629 of their back projected rays through the respective camera centres is identified as a possible surface location.
  • The region 628 is generated by determining the intersection of back projected rays for a particular correspondence for all possible camera image positions, as indicated by dashed lines 714 and 724 in Figure 6, so as to cover all the camera parameter sets corresponding to that region of phase space.
  • The process is carried out for all the non-excluded regions of phase space so that all possible camera parameters for all possible correspondences in images I1 and I2 have been considered, thereby generating a volume of possible surface locations 640.
  • The result of this is a volume of candidate surface locations 640 which are considered to be possible features of the surface of the object being re-constructed 615.
  • (Reconstruction 615 has not yet been generated but is shown to highlight the overlap between the volume of possible object surface locations 640 and the actual surface which is to be re-constructed.)
  • A similar process is carried out for the set of candidate correspondences and camera parameters determined for image pair I1 and I3, and a further 3D volume of possible reconstructed object surface points 650 is generated.
  • The process is repeated for all pairs of images being considered so as to create volumes of candidate surface locations which enclose the object to be re-constructed.
  • The exclusion of certain camera parameter values inherently leads to a reduction in the volume of possible object surface points.
  • Implausible candidate surface points are eliminated in step 560 in the following manner, as described with particular reference to Figure 7.
  • The volume 610 is spanned by a number of cells 705.
  • A cell 710 existing at a candidate object surface location 627 for the pair of images I1 and I2 is considered.
  • A theoretical light ray is projected from cell 710 through the camera centre onto the image I2.
  • The image region 725 which corresponds to that cell 710, for the camera in that position, is determined from the 2D images stored by the computer. As explained previously, there are a certain number of camera parameters, and therefore camera positions, at which image I2 was acquired.
  • One of the possible camera positions for image I2 from the possible set of camera parameters is shown.
  • A light ray 720 is projected from the cell 710 through the camera centre, and the region 725 of image I2 corresponding therewith is determined. The degree of similarity of the image regions is then assessed.
  • The image regions from image I1 and image I2 are both passed through a texture filter and compared. If the filtered image regions are considered to match within a specified low threshold, e.g. 10%, then the image regions are both passed through a colour filter and again their degree of match is compared to a 10% threshold value. If the image regions are still considered to match, then they are passed through an edge filter and the degree of match between the filtered images is again compared with a low threshold. If the filtered image regions pass all three match criteria, then the cell is still considered to be a plausible surface feature of the object. A sketch of this cascaded comparison follows.
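The cascaded comparison described above might look as follows. The filter functions and `mismatch` measure are illustrative stand-ins; the 10% default is taken from the example threshold in the text.

```python
def regions_match(region_a, region_b, filters, mismatch, tolerance=0.10):
    """Compare two image regions through a cascade of filters (e.g.
    texture, colour, edge); failing any single stage rejects the match,
    passing all three keeps the cell plausible."""
    for apply_filter in filters:
        fa, fb = apply_filter(region_a), apply_filter(region_b)
        if mismatch(fa, fb) > tolerance:
            return False                # cell cannot be a surface feature here
    return True                         # still a plausible surface feature
```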
  • Cell 710 is then verified with respect to image I3 for all possible parameter sets.
  • A light ray is projected from cell 710 through camera centre 752 and onto image region 735.
  • The content of image region 735 is filtered and its degree of match with image region 715 determined as described above. In this case, as the image regions are insufficiently similar for any possible camera parameters, cell 710 can be eliminated as not corresponding to a plausible surface feature of the reconstructed object.
  • The cell 710 is considered not to lie on the surface of the object and so can be eliminated. Further, as the cell had been hypothesised as lying on the surface of the object being re-constructed, all those cells 730 lying on the set of light rays extended from the cell to the camera positions considered may similarly be eliminated.
  • In Fig 7 this has been represented schematically by a volume 730 being eliminated from the initial volume 610.
  • In practice, the cells that may be eliminated will have a more complex shape, reflecting the degrees of freedom of the camera parameters that have been considered.
  • The elimination of these cells has a concomitant reduction in the set of possible camera parameters.
  • The procedure is repeated for all cells falling within the volume of the candidate surface locations 640.
  • Consider cell 740. The image region of image I1 corresponding to that point in the three dimensional volume 610 is determined by projecting a light ray onto image I1, and the image content for that region 745 is determined from the bitmap images stored in the computer.
  • The image region in image I2 corresponding to cell 740 in the volume 610 is determined for the set of possible camera parameters for image I2 and individually compared with those for image I1.
  • The individual image regions 745 and 750 are filtered as previously and the degree of match compared. In this case, the image regions 745 and 750 match sufficiently for the cell to be considered to be on or enclose the surface of the object, and therefore to constitute a part of the surface of the reconstructed object.
  • Cell 740, as a candidate surface point of the reconstructed object, is verified by comparing the image region for image I1 with image region 755 for image I3, via the filtering and matching procedure discussed above. This is possible because the volume of candidate surface locations 650 for the images I1 and I3 overlaps with that for the pair of images I1 and I2. Hence, the surface feature of the original object 410, which cell 740 reconstructs, is present in both images I1 and I3 as originally captured. This would not be the case if, for instance, image I3 had been captured by the camera viewing an opposite side of the object, which view would not show the surface point 740 being reconstructed. Further verifications can be carried out using different images and different sets of possible camera parameters as required.
  • The procedure is repeated for all cells falling within the volume 640 of possible surface locations within volume 610 until the implausible cells have been eliminated and those cells forming the surface of the reconstructed object have been identified.
  • The set of camera parameters which could result in the reconstructed object surface reduces to a singularity, as shown in phase space diagram 760, such that the camera parameters are also uniquely identified.
  • The process is then repeated for the pair of camera images I1 and I3 and the set of camera parameters relating to images I1 and I3.
  • The entire process is then repeated again for the pair of camera images I2 and I3 and the sets of camera parameters relating to the pair of images I2 and I3.
  • The process of eliminating candidate cells from the correspondences between images I2 and I3 can result in the elimination of cells previously identified from I1 and I2 as being likely parts of the surface of the reconstructed object, thereby improving the accuracy of the re-construction.
  • The entire procedure can be repeated for a finer scale of cells spanning the cells identified as enclosing the re-constructed object surface, using a smaller volume 610 encompassing only the reconstructed object surface volume identified by the first iteration, and using smaller image regions in the image matching step, as the available processing power allows.
  • Figure 14 shows images of different views of a reconstruction of the object shown in Figure 13, as obtained using the method and system described.
  • Figure 8 illustrates the 2D image capture step 510 in more detail.
  • The object 410 is stationary and the camera is rotated about the vertical axis of the object in a two dimensional plane.
  • An image I1 is captured at a base position, and the camera is rotated through small angle steps with a sequence of images I2, I3, ..., In being captured. It is important to ensure that the step between the sequence of images is sufficiently small that features of the object are not lost. It is also important that the initial base camera position, relative to which subsequent camera positions can be determined, is identified.
  • The end result is a data file, illustrated in Fig 9, containing cartesian co-ordinates x, y, z for the points of the surface of the reconstructed object relative to an arbitrary origin, and a set of camera parameters including cartesian co-ordinates a, b, c, three angles of rotation and internal camera parameters, e.g. the distance of the image plane from the lens and any other internal camera parameters required.
  • The camera parameters are relative to the starting point I1 and are used to provide the reconstruction.
  • The cartesian co-ordinates of the surface of the reconstructed object relative to an arbitrary origin provide a representation of the reconstructed object. An illustrative sketch of this data layout follows.
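One plausible layout for such a data file, sketched as Python dataclasses. The field names are illustrative assumptions, not the patent's file format.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class CameraParams:
    a: float; b: float; c: float       # camera centre relative to base position I1
    rotation: Tuple[float, float, float]  # three angles of rotation
    image_plane_distance: float        # internal parameter, e.g. lens to plane

@dataclass
class Reconstruction:
    surface_points: List[Tuple[float, float, float]]  # x, y, z, arbitrary origin
    cameras: List[CameraParams]                       # one per captured image
```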
  • The set of surface points for the reconstructed object is used to construct a surface.
  • A triangulation routine is used to connect the surface points so as to generate a series of connected triangular surfaces covering the surface of the object.
  • A smoothing routine is applied to the flat facets so as to provide a smooth surface. Texture is then applied to the smooth surface of the object.
  • A centre point of each triangle is determined.
  • A normal to that surface is projected and extrapolated onto the image stored in the computer most nearly corresponding to that part of the surface.
  • The triangle on the surface of the reconstructed object is then projected onto the most nearly corresponding captured image stored in the computer and the triangular image portion grabbed. That triangular image portion is then mapped onto the triangular surface region of the reconstructed object so as to provide texture for that triangle.
  • The procedure is then repeated for all the triangles covering the surface of the reconstructed object. Once the textured surface has been completed, the data is saved as a VRML data file for subsequent use. A sketch of this texture mapping loop follows.
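A sketch of the texturing loop just described. `centroid`, `face_normal`, `best_camera` (choosing the stored image whose view most nearly faces the triangle), `project` and `grab_triangle` are assumed helpers, not the patent's code.

```python
def texture_surface(triangles, images, cameras):
    """For each triangle, pick the captured image most nearly facing it,
    project the triangle into that image, and grab the enclosed pixels
    as that triangle's texture."""
    textured = []
    for tri in triangles:
        cam = best_camera(centroid(tri), face_normal(tri), cameras)
        uv = [project(vertex, cam) for vertex in tri]  # triangle in image plane
        patch = grab_triangle(images[cam], uv)         # images keyed by camera
        textured.append((tri, patch))
    return textured                                    # ready for VRML export
```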
  • The information held at a cell may be extended to include surface properties such as surface normal and surface curvature information, but this is omitted from the following discussion for the sake of simplicity.
  • $g' \equiv \{g_1, \ldots, g_{t-1}, g_{t+1}, \ldots, g_T\}$ denotes the camera parameters at all times bar the time $t$ under consideration.
  • $G'(n)$ is the space of all possible solutions for $g'$.
  • Processing will continue until no solutions fall below the relevant threshold. At any time processing may be re-started by heuristically increasing the threshold, or alternatively, the remaining solutions may be recorded and processed in some manner.
  • The image data $x_{t'}$ can be viewed as generated by (a) mapping image data $x_t$ onto the surface $s$, followed by (b) the projection of the visible surface onto the $t'$-th image plane. Assuming that the data generated in each projected region is conditionally independent, the likelihood factorises over the projected regions; a sketch of this factorisation is given below.
  • $q_2 \le q_1 \le 1$, and the decision whether image regions match is based upon some similarity measure.
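A hedged reconstruction of the factorisation implied by the conditional independence assumption above, with $j$ indexing the projected regions (the exact notation of the original equations is not recoverable from this text, and the precise roles of $q_1$ and $q_2$ as match probabilities are left open here):

$$p\bigl(x_{t'} \mid s, g\bigr) \;=\; \prod_{j} p\bigl(x_{t'}^{(j)} \mid s, g\bigr)$$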
  • A number of alternative metrics may be used, based upon texture, colour and the like.
  • The expected projection of a surface region is compared with the actual image data, and this is dependent upon a variety of factors such as lighting conditions, local surface shape and texture, image quality and so on. (Note that a match may be invalid if the imaging geometry is unsuitable, for example, if the camera has rotated through too large an angle.)
  • The quantity in (18) is essentially a tracking mechanism which counts how many of the previous images onto which cell $j$ must project are consistent with the projection of cell $j$ onto the current image.
  • $S_j(n)$ is the current space of possible cell assignments for cell $j$.
  • The shorthand $L_j$ has been used to denote those cells that lie along a line from cell $j$ through the camera centre at time $t$.
  • An important feature of the invention is the computation of upper bound scores in (18) and (22). It is worth mentioning that complexity can be reduced further by considering only those times when the number of possible camera parameters is small. In any case, if the parameters are better defined this will provide greater powers of discrimination.
  • The size of the cells in 3D space may also be used to meet resource requirements.
  • At the onset of processing, cells may be quite large.
  • Only those cells that are not consistent with the image will be eliminated. Once elimination has taken place, the remaining cells can be subdivided and the process can be repeated. In this way resources can be focused on interesting surface regions, and it provides an efficient means of achieving high resolution reconstructions given limited computing power.
  • Figure 10 shows a matching system for identifying objects from a video signal including views of the object from different directions.
  • The system includes a camera 850 connected to a computer 860, which is connected to a random access storage device 870 storing a database of image data.
  • The system can be applied to the recognition of vehicles by monitoring vehicles passing the view of the camera 850.
  • A number of two dimensional video images of the scene including the vehicle are captured and stored on the computer.
  • The computer system uses the method described above to construct a 3D representation of the whole or part of the vehicle from the images captured by the camera 850. (It will be appreciated that alternatively, or in addition, the camera moves relative to a stationary object or moving object in order to capture sufficient 2D images.)
  • The database stores data relating to a number of images with which the constructed model can be compared to try and identify the object.
  • The database can store a number of side elevations of vehicles, and by rotating the 3D representation of the car and comparing it with the stored images, the stored image most closely matching the reconstructed object can be identified and thereby the identity of the car determined.
  • The images with which the reconstructed model are compared will have associated with them in the database data relating to the identity of the image, such as model name and manufacturer name in the car identification example.
  • The database can store image data relating to reconstructed objects as created by the method described above, or some other 3D representation of the object.
  • The matching procedure will then be a matter of comparing the representation constructed from the captured video images with the 3D models stored in the database to determine the most likely match. In this way, although only a part of the subject may be captured by the camera, the surface detail on that part of the object may be sufficient to enable a match to be made with the models stored in the database. A sketch of such a matching loop follows.
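A sketch of such a matching loop, under the assumption that each stored entry carries an identity and one or more reference views; `render_view` and `similarity` are illustrative callables, not part of the patent.

```python
def identify(reconstruction, database, render_view, similarity):
    """Render/rotate the reconstructed model into each stored reference
    view and return the identity of the best-scoring database entry."""
    best_identity, best_score = None, float("-inf")
    for entry in database:
        for view in entry.views:        # e.g. stored side elevations
            score = similarity(render_view(reconstruction, view.pose),
                               view.image)
            if score > best_score:
                best_identity, best_score = entry.identity, score
    return best_identity
```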
  • Figure 11 shows a system for identifying and/or manipulating an object in a video signal.
  • The system includes a source of video signals in the form of a camera or a video recording device 880.
  • The system includes a computer 860 to which video signals are supplied.
  • The system also includes a video recording device 882 and a video display device 884 for recording and displaying video signals respectively.
  • A video signal, such as a live TV feed or a recorded program, is supplied by the camera or recorder to be displayed on the video display 884 or recorded on the recording device 882.
  • The computer 860 monitors the video signal being transmitted and processes the video signal as described below.
  • The system can be used to identify an object in a video stream.
  • Suppose the video signal 886 being transmitted includes a particular manufacturer's product.
  • The computer 860 includes in its memory a representation of the manufacturer's product as obtained by the method described above.
  • The computer system 860 monitors the video signal 886 and captures video frames. From the different views of the objects shown in the video frames, the system reconstructs a representation of the object and compares it with the representation of the product stored in the memory to identify the product. This information can then be used to alert a party that the video being transmitted includes the manufacturer's product.
  • The system could also replace the object in the video stream with an altered object or an alternative object, a representation of which is stored in the computer's memory. Hence, the system could identify the existence of a particular object in the video signal from the video frames captured and, on identifying the object, replace it with the same object but having altered properties, e.g. colour or surface decoration.
  • The altered object is then substituted in the video image in place of the original object.
  • The manipulated video signal can then be stored, broadcast, or displayed. For instance, the colour of a car could be changed or the surface decoration of an object updated to correspond with current packaging or decoration.
  • Alternatively, an entirely different object could be inserted in the video signal, derived from a 3D representation of the different object which is accessible by the computer.
  • The system of Fig 11 can also be used to manipulate images in a desired way. For instance, a view of a rural scene could be captured as a number of two dimensional images and a three dimensional representation of the scene generated by the computer system.
  • The computer system analyses the content of the three dimensional representation to identify parts of the representation having particular features, e.g. degree of surface curvature, thereby identifying different articles. For instance, buildings would have flat surfaces, and so those parts of the 3D reconstruction corresponding to buildings could be identified.
  • A tree would have a highly fractal surface shape, and other natural objects could likewise be identified from the three dimensional reconstruction of the scene. Selected parts of the scene could then be altered independently of the other parts. For instance, the surface decoration of buildings could be altered or the colours of objects in the scene changed. Hence, the system would allow the manipulation of parts of an image of a scene for purposes of special effects and the like.
  • Figure 12 shows a system for encoding video signals for transmission.
  • A camera 850 or video playing device 880 supplies video signals to a computer 860 which processes the images.
  • A set of at least three cameras viewing the subject is preferred.
  • Individual image frames are converted into a three dimensional representation of the scene from the individual two dimensional images.
  • The three dimensional representation of the scene created from the 2D visual images is transmitted by a transmitter 890 to a receiver 892 and supplied to a further computer.
  • The three dimensional representation is then stored in a random access memory device 896.
  • Once the 3D representation of the scene has been stored in device 896, different views of the scene can be provided on a display device 898 merely by transmitting camera parameter data between the transmitter and receiver.
  • This provides an object based encoded signal in which, in order to provide a video signal of different views of a scene, all that is required is data relating to the direction from which the scene is to be viewed, rather than complete image data. A sketch of this transmission scheme follows.
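A sketch of the object-based encoding: the 3D representation crosses the channel once, after which each displayed view costs only a small camera-parameter record. `channel.send` and `serialise` are assumed primitives, and the `CameraParams` fields mirror the illustrative data layout sketched earlier.

```python
def transmit_views(channel, reconstruction, serialise, view_sequence):
    """Send the 3D scene once, then stream only viewing-direction data;
    the receiver re-renders each view from its stored copy."""
    channel.send(("model", serialise(reconstruction)))   # once, up front
    for cam in view_sequence:                            # per displayed frame
        channel.send(("view", cam.a, cam.b, cam.c, cam.rotation))
```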
  • In this way, the present invention can be exploited in the field of three dimensional TV systems.

Abstract

A method of constructing a three dimensional representation of a subject, the method comprising the steps of: (i) providing at least two images, in which each image is a different view of the subject; (ii) defining a volume enclosing possible surfaces of the subject, the volume being made up from a plurality of cells; (iii) determining an upper bound on the probability that one of the plurality of the cells encloses the surface of the representation of the subject; (iv) determining a threshold probability; (v) comparing the upper probability bound for the cell with the threshold probability; and (vi) eliminating the cell from the volume if the upper probability bound is less than the threshold probability. The method is usable in various video applications, including image processing, manipulation, tracking, identification, transmission and 3D television systems.

Description

3D Image Processing System and Method
The present invention relates to image processing, and in particular to a method and system for generating a three dimensional representation of a subject from two dimensional video images of the subject.
The present invention also relates to recognising a subject in video images, the manipulation of a subject in video images and the generation and encoded transmission of video images including three dimensional representations of a subject.
The task of reconstructing a three dimensional object from two dimensional images can be understood with reference to Figure 1. A subject three dimensional (3D) object 110 has a 3D surface 112. Three camera positions 114, 116 and 118 are shown, each having a camera centre 120 and a corresponding image plane 122. For the purposes of this discussion the cameras can each be considered to be acting as pinhole cameras, with the camera centre at the pinhole. In reality, a lens system to focus light on the image plane is typically provided.
In the process of image generation, light rays 124 pass from the surface and pass through the camera centres 120, projecting a two dimensional (2D) image 126 onto each image plane. Reconstruction of the 3D object is the task of recovering 3D surface structure from the projected 2D images. This can be achieved by back-projecting rays from the images through the camera centres. However, in order to do this it is necessary to know which points in the different camera images 127, 128, 129 correspond to the same point 130 on the surface of the subject object, and the camera parameters: i.e. the camera translation, rotation and internal parameters. Reconstructed points on the surface of the object can then be determined from the location of the intersection of these back-projected rays in 3D Euclidean space.
There are a number of existing techniques for 3D reconstruction, which are illustrated with reference to Figure 2. A camera captures a sequence of images of a subject object taken from different viewpoints. A matching system then identifies matching image regions ("correspondences") in pairs of captured images. This is commonly achieved through a feature extraction and template matching process. These correspondences are then passed on to a geometry-based system, which estimates the relative camera parameters. Given the matched regions in the images and the camera parameters, a reconstruction engine can then reconstruct 3D surface points by back-projecting rays from the matched regions through the camera centres for cameras having the estimated camera parameters.
Such systems can be classified according to whether image correspondences or the camera parameters are known a priori. If both are available this greatly simplifies the problem, since reconstruction becomes simply a case of back-projecting rays. In calibrated systems the camera parameters are known but correspondences are not, and must be estimated from the images. The present invention is concerned with uncalibrated systems in which neither the camera geometry nor the image correspondences are available to the system before processing and must be identified as part of the reconstruction process. There are problems associated with available uncalibrated systems. They are slow and they may work satisfactorily on only one or two carefully selected objects. In general, they give poor reconstructions. In particular, only the best-guess image correspondences are used to determine the camera parameters, and, in turn, only the best-guess parameters are used to obtain a 3D reconstruction: i.e., best-guess information is passed up the processing chain. This approach is flawed as it depends critically on obtaining a good initial set of correspondences. However, that is not possible in general, since good matches can only be reliably obtained if the camera parameters and the 3D object are known in the first place. The result is that errors introduced at the match stage necessarily pass on to the geometry stage, causing mistakes in geometry estimation and thereby leading to a skewed or unconvincing reconstruction. Attempts to improve on the recovered reconstruction may subsequently be made, but in essence, these simply refine the current best-guess solution (i.e. perform a local gradient-based search) and are incapable of recovering from non-trivial errors.
The present invention relates to a new approach to 3D reconstruction which is fast and gives good reconstructions under a wide range of realistic conditions.
According to a first aspect of the present invention, there is provided a method of constructing a three dimensional representation of a subject, comprising the steps of:
(i) providing at least two images, in which each image is a different view of the subject;
(ii) defining a volume enclosing possible surfaces of the subject, the volume being made up from a plurality of cells;
(iii) determining an upper bound on the probability that one of the plurality of cells encloses the surface of the representation of the subject;
(iv) determining a threshold probability;
(v) comparing the upper probability bound for the cell with the threshold probability; and
(vi) eliminating the cell from the volume if the upper probability bound is less than the threshold probability.
In this way, a 3D representation of the subject can be generated by eliminating cells in a volume of space enclosing the reconstructed object and retaining cells enclosing or on the surface of the reconstructed object, so that the surface of the subject is generated.
A subject is considered to include single items, collections of items, with and without backgrounds, scenes and any other image, real or imaginary, that can be captured or created. The method can create a representation of the whole or the part, or parts, of the subject.
Preferably, the method includes the step of comparing image regions in the at least two images, to determine whether the image regions match.
The method can include the step of repeating steps (iv) to (vi) for all cells located between the periphery of the volume and the surface of the representation of the subject. As cells are only retained if they are on the surface of the object, those cells within the surface of the object do not need to be eliminated. The method can also include the step of eliminating those cells which lie on an axis of a back projection from a cell determined not to be on the surface of the representation to an image plane, and between the cell and the periphery of the volume. For a connected subject, as cells are only retained if they are on the surface of the representation of the subject, those cells falling on a back projected light ray between the eliminated cell and the wall of the volume can also be eliminated without having to assess the cells individually.
The method can include the further steps of: sub-dividing a cell into smaller sub-cells which span the original cell; determining a new upper bound; determining a new threshold probability; and eliminating those sub-cells having an upper bound less than the threshold probability, so as to generate a more accurate surface of the representation of the subject.
Initially a coarse set of cells can be used to span the volume. After the non-surface cells have been eliminated, the remaining surface cells can be sub-divided so as to provide a finer scale for definition of the surface features of the subject. This allows a more accurate representation to be generated. Further, the method is still computationally tractable, as a large amount of the possible volume can be eliminated by the initial coarse cell processing step, rather than using a fine cell size in the initial step.
The method can include the further steps of:
(i) defining an initial space representing sets of possible camera parameters;
(ii) determining an upper bound on the probability that a set of camera parameters relates to the surface of the representation of the subject;
(iii) determining a threshold probability;
(iv) comparing the upper probability bound for the set of camera parameters with the threshold probability; and
(v) eliminating the set of camera parameters from the initial space, if the upper probability bound is less than the threshold probability.
This allows the method to generate a reconstruction when the camera parameters are not known, or not all known, a priori. The number of variables being calculated over by the method can be decreased by eliminating, from all possible camera parameter sets, those sets of camera parameters corresponding to image locations which are not considered to include sufficient surface features of the subject being assessed. Reducing the number of possible camera positions that need to be considered in this way reduces the amount of processing that is required in order to determine plausible surface cells.
Preferably, a set of camera parameters includes translation, rotation and internal camera parameters.
Preferably, the step of eliminating camera parameter sets is iterated, so as to identify cells enclosing a surface of the representation best matching the subject, or a set of surfaces of the representation best matching the subject. The step of eliminating camera parameters can be iterated until a surface best matching the subject has been generated, or until a set or sets of surfaces overlapping the representation have been generated. Preferably, the upper bound is determined using Bayesian probability theory.
According to a further aspect of the invention, there is provided an image processing system for generating a 3D representation of a subject from at least two images, in which each image is a different view of the subject, and including data processing means in communication with a storage device storing image data representing the at least two images, the data processing means operating on the stored image data to: (i) define a volume enclosing possible surfaces of the subject and being made up from a plurality of cells; (ii) calculate an upper bound on the probability that one of the plurality of cells encloses the surface of the representation of the subject; (iii) determine a threshold probability;
(iv) compare the upper probability bound for the cell with the threshold probability; and (v) eliminate the cell from the volume if the upper probability bound is less than the threshold probability.
According to a further aspect of the invention there is provided computer program code executable on a computer to carry out a method according to the first aspect of the invention .
According to a yet further aspect of the invention, there is provided a method for identifying a subject from a video signal including different views of the subject, the method including constructing a three dimensional representation of the subject from the video signal by using the method according to the first aspect of the invention, and comparing the 3D representation with representations of known subjects.
The representations of known subjects can be 2D images or can be parameters of the known subjects which can be compared with parameters determined from the 3D representation, such as the ratio of the width and length of a subject.
Preferably, the representations of known subjects have been constructed according to the method of the first aspect of the invention. The identification then becomes a process of comparing the similarity of the shapes of the wholes or parts of the representations to determine the best match.
According to a further aspect of the invention, there is provided a method for tracking a subject in a video signal including views of the subject, and including the step of matching a 3D representation of the subject created according to the first aspect of the invention with 2D video images derived from the video signal.
In this way, the mere existence of the subject in the video signal can be identified.
According to a further aspect of the invention, there is provided a method for altering a video image including a subject, including identifying the subject according to a previous aspect of the invention, and replacing the image of the subject in the video signal with an altered image of the subject derived from the representation of the subject. For instance, the colour or other aspect of the surface decoration of the subject in the video signal can be altered. According to a further aspect of the invention, there is provided a method for altering a video image including a subject, including identifying the subject according to the method of a previous aspect of the invention, and replacing the image of the subject in the video signal with an image derived from a representation of a different subject. In this way an entirely different subject can be used in place of the original subject in the video signal.
According to a further aspect of the invention, there is provided a method for generating 3D video signals for a 3D television system, including capturing images of a subject with three or more cameras placed around the subject and constructing a representation of the subject according to the first aspect of the invention, for each of a sequence of time steps. A moving 3D TV image can then be generated by displaying the 3D representations in time sequence.
According to a further aspect of the invention, there is provided a method for transmitting a video signal of a subject, the method including generating a 3D representation of the subject according to the first aspect of the invention and transmitting camera position data so as to generate different views of the subject. There is a significant compression of the amount of video data that needs to be transmitted, as only data relating to the view of the representation that is to be displayed needs to be transmitted in order to provide a moving image of the subject.
According to a further aspect of the invention, there is provided a method for transmitting a video signal of a subject, the method including generating 3D representations of the subject according to the first aspect of the invention, and transmitting data relating to the differences between the 3D representations of the subject. By transmitting only the differences between subsequent representations of a moving subject, a significant compression of the amount of data that needs to be transmitted in order to display a moving 3D image can be achieved.
An embodiment of the invention will now be described in detail, by way of example only, and with reference to the accompanying drawings, in which:
Figure 1 shows a diagram illustrating the capture of two dimensional video images of a three dimensional subject object;
Figure 2 shows a flow chart illustrating a prior art 3D reconstruction method;
Figure 3 shows a flow chart illustrating in general terms a 3D reconstruction method according to the present invention;
Figure 4 shows a 3D reconstruction system according to an aspect of the present invention;
Figure 5 shows a flow chart illustrating the 3D reconstruction method of the present invention in more detail;
Figure 6 shows an illustration of the steps in the method of reducing the camera parameter phase space for pairs of images and constructing the corresponding candidate surface locations; Figure 7 shows an illustration of the steps of eliminating cells and identifying cells enclosing the surface of the representation of the subject, by the comparison of image regions; Figure 8 shows an illustration of the image capture step in greater detail;
Figure 9 illustrates the format of the data corresponding to the reconstructed representation that is generated by the method;
Figure 10 shows a matching system; Figure 11 shows a system for identifying and/or manipulating video images;
Figure 12 shows a 3D video signal transmission system; Figure 13 shows 2D images of different views of a subject; and
Figure 14 shows different views of a 3D representation of the subject shown in Figure 13 reconstructed according to the method of the invention.
The same elements in different Figures share common reference numerals unless indicated otherwise.
The methods and systems of the present invention are based on a new philosophy of pattern recognition, which is based upon three key conditions:
1. Calculations are underpinned by Bayesian probability theory.
2. The method requires that all regions of the solution space be assessed.
3. Processing is resource-driven such that the calculations that can be performed are constrained by the memory available and the speed of operations required, as defined by the operator. In brief, the approach uses the key conditions as follows.
Given the available resources, a suitable means of computing an upper bound probability for regions of the solution space is defined. Through an iterative process, regions with low upper bounds are eliminated, and then effort is re-applied to those regions that remain. As more and more of the solution space is eliminated, so the size of the regions covering the remaining space can be reduced without compromising resources, and more accurate upper bounds can be evaluated. In this way, good solutions are identified through a process of exclusion.
The general method of the invention is illustrated by the flow chart 300 shown in Figure 3. In relation to the current invention, the approach is applied simultaneously to two modules: a geometry engine 310 for assessing hypothesised camera parameters and a reconstruction engine 320 for analysing hypothesised 3D surface structure. An important aspect of the invention is that all plausible geometric and surface hypotheses are examined, processing being a task of eliminating implausible hypotheses so as to home in on the best solution through a process of exclusion. 2D image data of a first view 302 of the subject object or scene and 2D image data of a second, different view 304 are processed by the method to arrive at the 3D reconstruction.
The geometry engine 310 evaluates possible camera parameters based upon the currently possible 3D surface structures and the image data. In brief, it eliminates parameters that are not consistent with the visible surface and the image data.
Likewise, the reconstruction engine 320 scores possible 3D surface structures given all currently possible camera parameters and the image data. In brief, it eliminates visible surface structure that is not consistent with the geometry and the image data.
By representing the subject object or scene structure through a volume of cells in 3D Euclidean space, processing is a task of eliminating outermost cells in the volume if they cannot plausibly contain a surface. That is, implausible volume is cut away from the original volume in a manner analogous to a sculptor chipping away stone to reveal the finished object.
At the onset of processing all possible camera parameters and 3D cells are available to the system as possibilities, bar those that can be eliminated due to prior knowledge. For instance, if it is known that the camera is rotating around an object, then some of the camera parameters can be pinpointed a priori. If the extent of 3D space which contains an object of interest is known, then a small initial volume can be defined.
Processing 330 is then a task of eliminating implausible parameters and 3D cells and seeing how this affects the system. This is an iterative process. For example, elimination of an unlikely set of camera parameters in itself leads to the elimination of certain cells, since the cells were dependent upon these parameters for their existence in the first place. Likewise, eliminating part of the 3D volume affects the support for certain camera parameters, since they may no longer be consistent with both scene and image data.
Through this iterative process of eliminating implausible parameters and eliminating implausible cells, a good 3D reconstruction 340 is homed in on by exclusion. This is in stark contrast to all existing methodologies that attempt to
identify a reconstruction directly through the propagation of best-guess interpretations. All possible solutions are passed between the geometry and reconstruction modules.
With reference to Fig 4, there is shown a system 400 for generating a three dimensional representation of a subject object 410. The system includes a digital camera 420 movable relative to the object 410 to capture images showing different views of the object. The camera is connected to a computer 430 which stores image data from the camera and processes it to generate data providing the three dimensional representation of the re-constructed object. The computer operates under control of a computer program which implements the method described herein. The specific details of a suitable computer program are considered to be within the ability of a man of ordinary skill in the art in light of this description and so have not been described in detail.
Figure 5 shows a flow chart 500 illustrating steps of the method in more detail. Initially 510, the camera 420 is used to capture two dimensional images of the subject object 410 as seen from different views. It will be appreciated that all that is required is relative re-orientation between the object and camera. Hence, the object may be stationary and the camera mobile, the camera stationary and the object mobile, or both camera and object may move. Further, sufficient 2D images from different views must be captured so as to cover all features of the object that it is desired to re-construct. With a sufficient field of view for the camera, the object may be mounted on a turntable and rotated relative to a static camera mounted on a tripod.
In order to simplify the description, only three images of different views of the object, I1, I2 and I3, will be referred to in Figures 6 and 7. Images I1, I2 and I3 are two dimensional images generated by the camera 420. At step 520, the camera image data from the digital camera 420 are converted into bitmap file format and stored as separate bitmap data files on computer 430. A series of five 2D bitmap images taken from different views of an example object are shown in Figure 13.
The 2D images I1, I2 and I3 are filtered at step 530 by applying colour, texture and edge filters to the data. Pairs of filtered images, I1 and I2, I1 and I3, and I2 and I3, are compared to identify characteristics of small regions of the respective images.
By analysing the filtered pairs of images, it is possible to identify pairs of points, or small regions, in the images which are plausibly the same surface feature of the object 540. By analysing the entirety of the pairs of images, a set of candidate correspondences (i.e. plausible images of the same surface feature) can be generated 550 from, and for, each of the pairs of images, as follows.
Figure 6 schematically illustrates step 550 of flow chart 500. A volume 610 is defined within which the 3D reconstruction 615 of subject object 410 is to be constructed. Also shown in Fig 6 is a phase space diagram 620 illustrating a set of possible camera parameters that has been determined from the candidate correspondences derived from images I1 and I2. The entire area enclosed by the perimeter indicates all possible camera parameter sets. The shaded regions indicate camera parameter sets that are impossible in respect of the different views shown in I1 and I2. For instance, if I1 and I2 are related by a rotation in a flat two dimensional plane, then no rotation of the camera out of that plane can be possible. As such, all camera parameter sets including such rotations out of the plane can be excluded. The unshaded area represents the possible camera parameter sets remaining.
Similarly, 630 shows the phase space diagram for the set of camera parameters related to the pair of images I1 and I3. From the set of candidate correspondences generated at step 530 for potentially matching image regions in images I1 and I3, it has been possible to exclude some camera parameters (shown in shading) from the set of all possible camera parameters.
Candidate surfaces are created within volume 610. A local region of phase space 625 for images I1 and I2 is selected, and the candidate correspondences for that set of camera parameters for that pair of images are used to generate a region 628 of the possible surface locations of the reconstructed object. Regions 715 and 725 of images I1 and I2 have been compared and found to include sufficiently similar content: i.e. a correspondence. For a particular set of camera parameters for the position of image I1 and a particular set of camera parameters for the position of image I2, light rays are back projected from the position of the region in the image through the camera centre for each image, and the location of the intersection 627 is determined to be a possible surface feature for the reconstructed object 615.
The process is repeated for further correspondences identified. For instance, image regions 745 and 750 of images I1 and I2 have been determined to match sufficiently to constitute a candidate correspondence, and the intersection 629 of their back projected rays through the respective camera centres is identified as a possible surface location.
The region 628 is generated by determining the intersection of back projected rays for a particular correspondence for all possible camera image positions, as indicated by dashed lines 714 and 724 in Figure 6, so as to cover all the camera parameter sets corresponding to that region of phase space. The process is carried out for all the non-excluded regions of phase space so that all possible camera parameters for all possible correspondences in images I1 and I2 have been considered, thereby generating a volume of possible surface locations 640. The result of this is a volume of candidate surface locations 640 which are considered to be possible features of the surface of the object being re-constructed 615. (In Fig 6, reconstruction 615 has not yet been generated but is shown to highlight the overlap between the volume of possible object surface locations 640 and the actual surface which is to be re-constructed.)
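In practice, back projected rays from noisy image regions rarely intersect exactly, so a candidate surface location can be taken as the midpoint of the rays' common perpendicular. The following sketch (not part of the original disclosure; NumPy is assumed) illustrates the calculation for one correspondence under one hypothesised pair of camera centres and ray directions:

```python
import numpy as np

def backproject_intersection(c1, d1, c2, d2):
    """Closest point between two back-projected rays c + t*d: the candidate
    surface location for one correspondence and one camera hypothesis."""
    d1, d2 = d1 / np.linalg.norm(d1), d2 / np.linalg.norm(d2)
    # Solve for ray parameters minimising |(c1 + t1*d1) - (c2 + t2*d2)|.
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    e, f = d1 @ (c2 - c1), d2 @ (c2 - c1)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        return None                      # rays (nearly) parallel: no candidate
    t1 = (e * c - b * f) / denom
    t2 = (b * e - a * f) / denom
    p1, p2 = c1 + t1 * d1, c2 + t2 * d2
    return (p1 + p2) / 2.0               # midpoint of the common perpendicular
```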
A similar process is carried out for the set of candidate correspondences and camera parameters determined for image pair I1 and I3, and a further 3D volume of possible reconstructed object surface points 650 is generated. The process is repeated for all pairs of images being considered so as to create volumes of candidate surface locations which enclose the object to be re-constructed. The exclusion of certain camera parameter values inherently leads to a reduction in the volume of possible object surface points.
Once a complete set of candidate re-constructed object surface points has been created, implausible candidate surface points are eliminated in step 560 in the following manner, as described with particular reference to Figure 7. The volume 610 is spanned by a number of cells 705. A cell 710 existing at a candidate object surface location 627 for the pair of images I1 and I2 is considered. A theoretical light ray is projected from cell 710 through the camera centre onto the image I2. The image region 725 which corresponds to that cell 710, for the camera in that position, is determined from the 2D images stored by the computer. As explained previously, there are a certain number of camera parameters, and therefore camera positions, at which image I2 was acquired.
In Fig 7, one of the possible camera positions for image I2 from the possible set of camera parameters is shown. A light ray 720 is projected from the cell 710 through the camera centre, and the region 725 of image I2 corresponding therewith is determined. The degree of similarity of the image regions is then assessed.
The image regions from image I1 and image I2 are both passed through a texture filter and compared. If the filtered image regions are considered to match within a specified low threshold, e.g. 10%, then the image regions are both passed through a colour filter and again their degree of match compared to a low threshold value. If the image regions are still considered to match, then they are passed through an edge filter and the degree of match between the filtered images again compared with a low threshold. If the filtered image regions pass all three match criteria, then the cell is still considered to be a plausible surface feature of the object.
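As an illustrative sketch only, the cascaded test can be expressed as follows, where `mismatch_fns` is a hypothetical list of functions, each returning the fractional mismatch of one filtered comparison (texture, then colour, then edge):

```python
def regions_match(region_a, region_b, mismatch_fns, threshold=0.10):
    """A pair of image regions stays a plausible correspondence only if
    every filtered comparison falls within the mismatch threshold
    (10% in the example above); failing any one criterion rejects it."""
    return all(fn(region_a, region_b) <= threshold for fn in mismatch_fns)
```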
According to image I2 for that camera position, the cell is still plausible. Cell 710 is then verified with respect to image I3 for all possible parameter sets. A light ray is projected from cell 710 through camera centre 752 and onto image region 735. The content of image region 735 is filtered and its degree of match with image region 715 determined as described above. In this case, as the image regions are insufficiently similar for any possible camera parameters, cell 710 can be eliminated as not corresponding to a plausible surface feature of the reconstructed object.
Hence the cell 710 is considered not to lie on the surface of the object and so can be eliminated. Further, as the cell has been hypothesised as lying on the surface of the object being re-constructed, all those cells 730 lying on the set of light rays extended from the cell to the camera positions considered may similarly be eliminated.
In Fig 7, this has been represented schematically by a volume 730 being eliminated from the initial volume 610. In reality the cells that may be eliminated will have a more complex shape, reflecting the degrees of freedom of the camera parameters that have been considered. As the possible cell locations and camera parameters are interconnected, the elimination of these cells has a concomitant reduction in the set of possible camera parameters.
The procedure is repeated for all cells falling within the volume of the candidate surface locations 640. Consider cell 740. The image region of image I1 corresponding to that point in the three dimensional volume 610 is determined by projecting a light ray onto image I1, and the image content for that region 745 determined from the bitmap images stored in the computer. As previously, the image region in image I2 corresponding to cell 740 in the volume 610 is determined for the set of possible camera parameters for image I2 and individually compared with those for image I1. The individual image regions 745 and 750 are filtered as previously and the degree of match compared. In this case, the image regions 745 and 750 match sufficiently for the cell to be considered to be on, or enclose, the surface of the object and therefore to constitute a part of the surface of the reconstructed object.
The existence of cell 740 as a candidate surface point of the reconstructed object is verified by comparing the image region for image I1 with image region 755 for image I3, via the filtering and matching procedure discussed above. This is possible because the volume of candidate surface locations 650 for the images I1 and I3 overlaps with that for the pair of images I1 and I2. Hence, the surface feature of the original object 410, which cell 740 reconstructs, is present in both images I1 and I3 as originally captured. This would not be the case if, for instance, image I3 had been captured by the camera viewing an opposite side of the object, which view would not show the surface point 740 being reconstructed. Further verifications can be carried out using different images and different sets of possible camera parameters as required.
The procedure is repeated for all cells falling within the volume 640 of possible surface locations within volume 610 until the implausible cells have been eliminated and those cells forming the surface of the reconstructed object have been identified. During this process, the set of camera parameters which could result in the reconstructed object surface reduces to a singularity, as shown in phase space diagram 760, such that the camera parameters are also uniquely identified. The process is then repeated for the pair of camera images I1 and I3 and the set of camera parameters relating to images I1 and I3. The entire process is then repeated again for the pair of camera images I2 and I3 and the sets of camera parameters relating to the pair of images I2 and I3. The process of eliminating candidate cells from the correspondences between images I2 and I3 can result in the elimination of cells previously identified from I1 and I2 as being likely parts of the surface of the reconstructed object, thereby improving the accuracy of the re-construction.
If improved surface detail is required, then the entire procedure can be repeated for a finer scale of cells spanning the cells identified as enclosing the re-constructed object surface, and using a smaller volume 610 encompassing only the reconstructed object surface volume identified by the first iteration and using smaller image regions in the image matching step, as the available processing power allows.
Figure 14 shows images of different views of a reconstruction of the object shown in Figure 13, as obtained using the method and system described.
Figure 8 illustrates the 2D image capture step 510 in more detail. Here the object 410 is stationary and the camera is rotated about the vertical axis of the object in a two dimensional plane. An image I1 is captured at the base position, and the camera is rotated through small angle steps with a sequence of images I2, I3, ..., In being captured. It is important to ensure that the step between the sequence of images is sufficiently small that features of the object are not lost. It is also important that the initial base camera position for I1, relative to which subsequent camera positions can be determined, is identified.
A second set of images is captured: the first image I1' is captured at the position of image In, and a sequence of images I2', I3', ..., In' are captured similarly to the first set of images. A third set of images, with the first image I1'' captured at the position of image In', is then taken, and the procedure is continued until a final image is captured at a position corresponding to that of the first image capture I1. The full set of images is then analysed according to the above described steps of the method. The end result is a data file, illustrated in Fig 9, containing Cartesian co-ordinates x, y, z for the points of the surface of the reconstructed object relative to an arbitrary origin, and a set of camera parameters including Cartesian co-ordinates a, b, c, three angles of rotation α, β, γ and internal camera parameters, e.g. the distance of the image plane from the lens and any other internal camera parameters required. The camera parameters are relative to the starting point I1 and are used to provide the reconstruction. The Cartesian co-ordinates of the surface of the reconstructed object relative to an arbitrary origin provide a representation of the reconstructed object.
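A minimal sketch of such a data record, with field and type names chosen for illustration only (they are not part of the original disclosure), might look as follows:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class CameraRecord:
    position: Tuple[float, float, float]   # a, b, c, relative to start point I1
    rotation: Tuple[float, float, float]   # alpha, beta, gamma
    internals: dict = field(default_factory=dict)  # e.g. image-plane distance

@dataclass
class Reconstruction:
    # x, y, z surface points relative to an arbitrary origin
    surface_points: List[Tuple[float, float, float]]
    cameras: List[CameraRecord]
```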
The set of surface points for the reconstructed object is used to construct a surface. A triangulation routine is used to connect the surface points so as to generate a series of connected triangular surfaces covering the surface of the object. A smoothing routine is applied to the flat facets so as to provide a smooth surface. Texture is then applied to the smooth surface of the object.
A centre point of each triangle is determined. A normal to that surface is projected and extrapolated onto the image stored in the computer most nearly corresponding to that part of the surface. The triangle on the surface of the reconstructed object is then projected onto the most nearly corresponding captured image stored in the computer and the triangular image portion grabbed. That triangular image portion is then mapped onto the triangular surface region of the reconstructed object so as to provide texture for that triangle. The procedure is then repeated for all the triangles covering the surface of the reconstructed object. Once the textured surface has been completed, the data is saved as a VRML data file for subsequent use.
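A sketch of the per-triangle texture selection is given below, reusing the illustrative CameraRecord above; `project` and `crop_triangle` are hypothetical helpers standing in for the projection of a 3D point into an image and the grabbing of a triangular image portion:

```python
import numpy as np

def texture_for_triangle(tri, cameras, images, project, crop_triangle):
    """Select the captured image that most nearly faces the triangle,
    project the triangle's vertices into it, and grab the patch."""
    centre = tri.mean(axis=0)                       # tri: 3x3 array of vertices
    normal = np.cross(tri[1] - tri[0], tri[2] - tri[0])
    normal /= np.linalg.norm(normal)
    def facing(k):                                  # alignment of normal with
        view = np.asarray(cameras[k].position) - centre  # direction to camera
        return normal @ (view / np.linalg.norm(view))
    best = max(range(len(cameras)), key=facing)
    uv = [project(v, cameras[best]) for v in tri]   # vertices in image coords
    return crop_triangle(images[best], uv)
```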
A general mathematical discussion of the invention follows, by way of further explanation of the present invention. Consider a set of $T$ digitised images $x = \{x_1, \ldots, x_T\}$ of an object or scene, captured by an image capture device such as a digital camera. The goal is to derive a realistic 3D reconstruction of the scene in the form of a surface in 3D space.
This reconstruction process involves identifying the set of camera parameters $g = \{g_1, \ldots, g_T\}$ for the image set, and a representation of the surface structure of the object or scene. The surface can be represented via a quantised volume of $N = N_x N_y N_z$ cells in Euclidean space, where each cell $i \in N$ has a label $s_i$, and where the label assignment $s_i = 1$ means that the cell lies on, or encloses, an object's surface, and $s_i = 0$ means it does not. The information held at a cell may be extended to include surface properties such as surface normal and surface curvature information, but this is omitted from the following discussion for the sake of simplicity.
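For illustration only, the quantised volume and its labels can be held in a simple array (the resolution below is an arbitrary assumption, and NumPy is assumed):

```python
import numpy as np

Nx = Ny = Nz = 64                        # illustrative resolution
# s[i, j, k] = 1: the cell may lie on, or enclose, a surface; 0: eliminated.
s = np.ones((Nx, Ny, Nz), dtype=np.uint8)

def carve(cells):
    """Eliminate cells shown to be implausible ('chipping away stone')."""
    for i, j, k in cells:
        s[i, j, k] = 0

def volume():
    """The surviving volume V is the union of all cells still labelled 1."""
    return np.argwhere(s == 1)
```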
From condition 1, the aim is to find the best global solution for camera parameters and 3D surface reconstruction. From conditions 2 and 3, a holistic probability theory approach requires:
(1) $\{g,s\} = \arg\max_{\{G,S\}} P(g,s \,|\, x)$

where $\{G,S\}$ is the space of all possible solutions for camera parameters and surface descriptions. This aim is not realised directly, i.e., by actively searching for and refining solutions within the global solution space. That is the approach of existing gradient-based or exhaustive search techniques. Rather, an indirect method is employed, by eliminating bad solutions from $\{G,S\}$. In doing so, all of the solution space is implicitly examined, in line with condition 2, as follows.
Initially solutions are grouped together, since examining each individual solution in isolation would be computationally intractable in general, which is against condition 3. Consider all solutions that contain the individual hypothesis $s_i = \alpha$ (for example, in the case $\alpha = 1$ we consider all solutions that take cell $i$ to be on, or enclose, the surface of an object).
The lowest upper bound on the probability of any solution containing the assignment $s_i = \alpha$ is computed:

(2) $U^{(n)}(s_i=\alpha) = \max_{g \in G^{(n)},\, s' \in S'^{(n)}} P(g, s', s_i=\alpha \,|\, x)$

where $s' = \{s_1, \ldots, s_{i-1}, s_{i+1}, \ldots, s_N\}$ denotes the labels at all cells bar that under consideration, $S'^{(n)}$ is the space of all possible solutions for $s'$, and the index $n$ has been introduced to indicate some time step associated with processing.
Now any group of solutions whose upper bound probability is below some known lower bound value, $L_s^{(n)}$, cannot contain the optimum solution. Therefore, these groups can be eliminated from consideration. The rule for $s_i$ at some iteration time $n$ is: eliminate any solution containing the cell assignment $s_i = \alpha$ if

(3) $U^{(n)}(s_i=\alpha) < L_s^{(n)}$
By eliminating this set of solutions, the size of the space which needs to be considered at the next time step is effectively reduced. That is, the new search space at time $n+1$, $S^{(n+1)}$, will not contain these solutions, which will affect future processing. In relation to the volume of cells, if the possibility $s_i = 1$ is excluded, then the cell cannot contain a surface and it can be eliminated from the volume, i.e., this part of space can be carved away. $S^{(n)}$ is used to generate a volume $V^{(n)}$ that encloses an object. $V^{(n)}$ is simply the union of all cells $i$ for which $s_i = 1$.
In a similar fashion, an upper bound for camera parameter assignments can be computed. Consider all solutions that contain the individual hypothesis $g_t = \gamma$, where $\gamma$ is a set of possible camera parameters encompassing camera translation, rotation and internal camera parameters.
The lowest upper bound on any one solution containing this hypothesis is:

(4) $U^{(n)}(g_t=\gamma) = \max_{g' \in G'^{(n)},\, s \in S^{(n)}} P(g_t=\gamma, g', s \,|\, x)$

where $g' = \{g_1, \ldots, g_{t-1}, g_{t+1}, \ldots, g_T\}$ denotes the camera parameters at all times bar the time under consideration, and $G'^{(n)}$ is the space of all possible solutions for $g'$.
Now again, any group of solutions whose upper bound probability is below some known lower bound value, $L_g^{(n)}$, cannot contain the optimum solution. Therefore, these groups can be eliminated from consideration. The rule for $g_t$ at some iteration time $n$ is: eliminate any solution containing the camera parameters $g_t = \gamma$ if

(5) $U^{(n)}(g_t=\gamma) < L_g^{(n)}$
Again, by eliminating this set of solutions, the size of the space which needs to be considered at the next time step is reduced. That is, the new search space at time $n+1$, $G^{(n+1)}$, will not contain these solutions, and this will affect future processing.
The computation of the upper bound has not yet been defined, and in general it may be computationally expensive, contrary to condition 3. The solution is to identify quantities of the form $Y^{(n)}$, such that $Y^{(n)} \ge U^{(n)}$, which can be computed in a given time and using a given amount of memory. The elimination rules then become: eliminate any solution containing the cell assignment $s_i = \alpha$ if

(6) $Y^{(n)}(s_i=\alpha) < L_s^{(n)}$

and eliminate any solution containing the camera parameters $g_t = \gamma$ if

(7) $Y^{(n)}(g_t=\gamma) < L_g^{(n)}$

$Y^{(n)}$ is evaluated by combining Bayesian probability theory with rules of inequality. Its form may change over the iterative cycles in order to accommodate condition 3. For example, at the onset of processing $Y^{(n)}$ may be coarsely and quickly evaluated, but provided it obeys $Y^{(n)} \ge U^{(n)}$ then only bad solutions will be eliminated. Towards the end of processing, when only a few solutions remain, a more sophisticated and computationally intensive means of computing $Y^{(n)}$ may be employed, such that $Y^{(n)}$ approximates $U^{(n)}$, provided condition 3 is not violated.
Processing will continue until no solutions fall below the relevant threshold. At any time processing may be re-started by heuristically increasing the threshold, or alternatively, the remaining solutions may be recorded and processed in some manner.
In summary, solutions are simultaneously eliminated from the surface and geometry solution spaces S and G. Bad local surfaces are eliminated based upon other local surfaces and geometry. Likewise, bad camera geometries are eliminated based upon other geometries and the surface. Computational overheads are addressed using a coarseness function Y.
A more detailed mathematical description of the general method described above is now given, in relation to 3D reconstruction, and in particular, to the computation of upper bound quantities for parameter and cell assignments. Note that the actual implementation details will vary dependent upon the form of the application and the various constraints imposed on the system. This discussion is for the case when no information about the constraints on the imaging geometry or surface smoothness is available a priori. The development leads to relatively simple expressions for the upper bound quantities.
The upper bound quantities for the reconstruction module can be developed through Bayesian probability theory. Consider the $i$th cell in the volume and hypothesise that $s_i = \alpha$ is part of the global solution.
Bayes' rule is applied to the joint probability in (2), giving

(8) $P(g, s', s_i=\alpha \,|\, x) = p(x \,|\, g, s_i=\alpha, s')\, P(g \,|\, s_i=\alpha, s')\, P(s_i=\alpha, s') / p(x)$

Now the simplifying assumptions are made that all surface structures and all camera parameters are equi-probable a priori, and that no information about the image data for the image indexed $t$ is available from the parameters and surface structure alone. Under these assumptions (which we have included here for simplicity but may change without compromising the general methodology described herein), the lowest upper bound for any solution containing the assignment $s_i = \alpha$ is

(9) $U^{(n)}(s_i=\alpha) = k_i \max_{g \in G^{(n)},\, s' \in S'^{(n)}} p(x_1, \ldots, x_{t-1}, x_{t+1}, \ldots, x_T \,|\, x_t, g, s_i=\alpha, s')$

where $k_i = c\, p(x_t)/p(x)$ is independent of the assignment $s_i = \alpha$.
The evaluation of this expression is computationally intractable in general, since it is necessary to search over all of the entries in the space $\{G^{(n)}, S'^{(n)}\}$ of camera parameters and cell assignments. In line with the general methodology described previously, what is required is a means of computing an upper bound $Y^{(n)} \ge U^{(n)}$ which is adequate for the purpose of locating implausible solutions, and which can be computed within the computational resources available to the user.
To begin, the assumption is made that the data in each image indexed $1, \ldots, t-1, t+1, \ldots, T$ are conditionally independent, which gives

(10) $U^{(n)}(s_i=\alpha) = k_i \max_{g \in G^{(n)},\, s' \in S'^{(n)}} \prod_{\tau \ne t} p(x_\tau \,|\, x_t, g, s_i=\alpha, s')$

which, by expressing the current space of possible camera parameters as $G^{(n)} = \{G^{(n)}_1, \ldots, G^{(n)}_T\}$, and using the inequality

(11) $\max_{a \in A,\, b \in B} \le \max_{a \in A} \max_{b \in B}$

gives

(12) $U^{(n)}(s_i=\alpha) \le k_i \prod_{\tau \ne t} \max_{g_\tau \in G^{(n)}_\tau,\, g_t \in G^{(n)}_t} \max_{s' \in S'^{(n)}} p(x_\tau \,|\, x_t, g_t, g_\tau, s_i=\alpha, s')$
Consider the probability distribution in (12). The image data $x_\tau$ can be viewed as generated by (a) mapping image data $x_t$ onto the surface $s$, followed by (b) the projection of the visible surface onto the $\tau$th image plane. Assuming that the data generated in each projected region is conditionally independent,

(13) $p(x_\tau \,|\, x_t, g_t, g_\tau, s) = \prod_{j \in v_\tau(s)} p(x^j_\tau \,|\, x_t, g_t, g_\tau, s)$

where $v_\tau(s)$ is the set of cells visible in image plane $\tau$, given the surface solution $s$, and the shorthand $x^j_\tau$ is used to denote the image data formed via the projection of cell $j$ onto the $\tau$th image plane. In essence the images $x_\tau$ are taken as the ground truth by which hypotheses about the surface are tested. This is achieved by directly matching potentially visible surface regions given the potential geometries of the system.
Further development of this expression can take place once the model for the probability distribution is available.
In a similar way, an upper bound for any solution containing a set of camera parameters can be developed. Consider the camera parameters for the $t$th image and hypothesise that $g_t = \gamma$ is part of the global solution.
By analogy with the above development for cell assignments, in particular by applying Bayes' rule and using the inequality in (11), it follows that

(14) $U^{(n)}(g_t=\gamma) \le k_t \prod_{\tau \ne t} \max_{g_\tau \in G^{(n)}_\tau} \max_{s \in S^{(n)}} \prod_{j \in v_\tau(s)} p(x^j_\tau \,|\, x_t, g_t=\gamma, g_\tau, s)$
In order to develop the equations for upper bound probabilities in (13) and (14), a model is needed for the probability distribution of generating data in an image region given the data in a second image, the surface structure and the camera parameters.
A simple model to adopt is of the following form:

(15) $p(x^j_\tau \,|\, x_t, g_t, g_\tau, s) = \begin{cases} q_1 & \text{if } s_k = 0, \text{ or cell } k \text{ is hidden, or any attempted match is invalid} \\ 1 & \text{if } s_j = 1 \text{ and the image data match is valid} \\ q_2 & \text{otherwise} \end{cases}$

where $q_2 \ll q_1 < 1$ and the decision whether image regions match is based upon some similarity measure. In order to determine whether there is a match, a number of alternative metrics may be used, based upon texture, colour and the like. In essence, the expected projection of a surface region is compared with the actual image data, and this is dependent upon a variety of factors such as lighting conditions, local surface shape and texture, image quality and so on. (Note that a match may be invalid if the imaging geometry is unsuitable, for example, if the camera has rotated through too large an angle.)
This expression can be put into the equations for upper bound probabilities. From (13) we have

(16) $U^{(n)}(s_i=\alpha) \le k_i \prod_{\tau \ne t} \max_{g_\tau \in G^{(n)}_\tau,\, g_t \in G^{(n)}_t} \max_{s' \in S'^{(n)}} \prod_{j \in v_\tau(s)} p(x^j_\tau \,|\, x_t, g_t, g_\tau, s)$
It is not possible to evaluate (16) in general, since it is of exponential complexity due to the need to find a maximum value over all combinatorial solutions $s' \in S'^{(n)}$. However, it is possible to find an upper bound over all solutions. In particular, it is possible to determine from $S'^{(n)}$ certain occasions when a cell must be visible. This allows the hypothesis to be tested by matching image regions. On other occasions it may not be possible to say for sure whether a cell is visible or not. Therefore, an image match may not provide meaningful information, and so the maximum value for the distribution is taken in such cases. Under this approach,
(17) $U^{(n)}(s_i=\alpha) \le k_i \prod_{\tau^*} \max_{g_{\tau^*} \in G^{(n)}_{\tau^*},\, g_t \in G^{(n)}_t} \left( (1 - q_2)\, \delta_{i t \tau^*} + q_2 \right)$

where the shorthand $\tau^*$ has been used to indicate that the product is over all images onto which cell $i$ must project given $S'^{(n)}$, and $\delta_{i t \tau^*}$ is 1 if the projections from cell $i$ onto image planes $t$ and $\tau^*$ match, 0 otherwise. It is the right hand side of (17) which equates to the coarse upper bound measure, $Y^{(n)}$.
Taking logarithms,

(18) $\log U^{(n)}(s_i=\alpha) \le c + \sum_{\tau^*} \max_{g_{\tau^*} \in G^{(n)}_{\tau^*},\, g_t \in G^{(n)}_t} (1 - \delta_{i t \tau^*}) \log q_2$

The quantity in (18) is essentially a tracking mechanism which counts how many of the previous images onto which cell $i$ must project are consistent with the projection of cell $i$ onto the current image.
If a hypothesised cell assignment is inconsistent with the image data as the cell is tracked in this way, then it will be penalised, and it may be excluded from consideration if its score falls below a threshold.
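As a sketch only, and assuming the best-case match indicators $\delta$ over the remaining camera parameter sets have already been evaluated, the coarse score in (18) reduces to a simple count:

```python
import math

def cell_log_upper_bound(deltas, q2, c=0.0):
    """Coarse log upper bound for a cell assignment as in (18): every image
    onto which the cell must project but whose region fails to match
    contributes log(q2) -- a penalty, since q2 << 1."""
    return c + sum((1 - d) * math.log(q2) for d in deltas)

# deltas[k] is 1 if the cell's projections onto image t and the k-th image
# match for some remaining camera hypothesis, 0 otherwise; the cell is
# eliminated if this bound falls below the current threshold.
```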
Moving on to camera parameters,

(19) $U^{(n)}(g_t=\gamma) \le k_t \prod_{\tau \ne t} \max_{g_\tau \in G^{(n)}_\tau} \max_{s \in S^{(n)}} \prod_{j \in v_\tau(s)} p(x^j_\tau \,|\, x_t, g_t=\gamma, g_\tau, s)$

Again, it is not possible to evaluate (19) in general, since it is of exponential complexity due to the need to find a maximum value over all combinatorial solutions $s \in S^{(n)}$. However, it can be shown that

(20) $\max_{s \in S^{(n)}} \prod_{j \in v_\tau(s)} p(x^j_\tau \,|\, x^j_t, g_t, g_\tau, s_j) \le \prod_{j \in v_t(S^{(n)})} \max_{k \in L^t_j} \max_{s_j \in S^{(n)}_j} p(x^k_\tau \,|\, x^j_t, g_t, g_\tau, s_j)$

where $S^{(n)}_j$ is the current space of possible cell assignments for cell $j$, and where the shorthand $L^t_j$ has been used to denote those cells that lie along a line from cell $j$ through the camera centre at time $t$.
It then follows that

(21) $U^{(n)}(g_t=\gamma) \le k_t \prod_{\tau \ne t} \max_{g_\tau \in G^{(n)}_\tau} \prod_{j \in v_t(S^{(n)})} \max_{k \in L^t_j} \left( (1 - q_1)\, \delta_{k t \tau} + q_1 \right)$

which, by taking logarithms, gives

(22) $\log U^{(n)}(g_t=\gamma) \le c + \sum_{\tau \ne t} \max_{g_\tau \in G^{(n)}_\tau} \sum_{j \in v_t(S^{(n)})} \max_{k \in L^t_j} (1 - \delta_{k t \tau}) \log q_1$
which, for each image $\tau \in \{1, \ldots, t-1, t+1, \ldots, T\}$, is essentially a matching mechanism which counts how many of the image regions in that image may match regions in the current image, given the hypothesised camera parameters. Camera parameters will be eliminated if they do not give rise to a sufficient number of matches.
An important feature of the invention is the computation of upper bound scores in (18) and (22). It is worth mentioning that complexity can be reduced further by considering only those times when the number of possible camera parameters is small. In any case, if the parameters are better defined this will provide greater powers of discrimination.
It is also possible to sample in such a way as to enhance the probability of rejecting a hypothesis by only considering unusual image regions, since these will be less likely to match erroneously. That is, unusual features will tend to support only the true surface or camera parameter hypotheses, and have more power in eliminating false hypotheses.
The size of the cells in 3D space may also be used to meet resource requirements. In particular, at the onset of processing, cells may be quite large. Only those cells that are not consistent with the image will be eliminated. Once elimination has taken place, the remaining cells can be subdivided and the process can be repeated. In this way resources can be focused on interesting surface regions, and it provides an efficient means of achieving high resolution reconstructions given limited computing power.
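A sketch of such a coarse-to-fine step, using an octree-style split of each surviving cell into eight children, follows; the cell representation (corner point plus edge length) is an assumption made for illustration:

```python
def subdivide(cell):
    """Split one surviving cell into eight octants so that elimination can
    continue at finer resolution, focusing effort on the remaining
    (interesting) surface regions."""
    (x, y, z), size = cell               # corner co-ordinates and edge length
    half = size / 2.0
    return [((x + dx * half, y + dy * half, z + dz * half), half)
            for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)]
```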
Finally, whilst hypotheses about points in camera parameter space (e.g. $g_t = \gamma$) have been considered, in practice the number of possible parameters may be large. It can be preferable to estimate an upper bound for regions in the space. That is, rather than hypothesise $g_t = \gamma$, hypothesise $g_t \in R$.
There follows a skeleton indication of the processing routines implemented by suitable software to eliminate implausible cells and implausible camera parameters.
Elimination Procedures

1. Reconstruction at time t

EliminateCells(G^(n), S^(n))
For each cell i in V^(n) {
    match = 0
    For each image indexed τ ≠ t {
        imagematch = 0
        For each possible pair of camera parameters g_τ, g_t {
            If cell i must project onto both image t and image τ {
                If comparison valid {
                    If projected image regions match, imagematch = 1
                }
            } else { imagematch = 1 }
        }
        match = match + imagematch
    }
    If match < threshold, eliminate cell i
}
2. Camera parameters at time t

EliminateParameters(G^(n), S^(n))
For each set of camera parameters g_t {
    match = 0
    For each image indexed τ ≠ t {
        For each possible set of camera parameters g_τ {
            For each region in image t {
                imagematch = 0
                Access cells in line from the region through the camera centre
                For each cell {
                    If cell must project onto both image t and image τ {
                        If comparison valid {
                            If projected regions match, imagematch = 1
                        }
                    }
                }
                match = match + imagematch
            }
        }
    }
    If match < threshold, eliminate camera parameters g_t
}
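Read as Python, and with the geometric tests abstracted behind hypothetical predicates (`must_project`, `comparison_valid`, `regions_match` are illustrative names, not part of the original disclosure), the cell-elimination skeleton might be rendered as in the following sketch of the control flow:

```python
def eliminate_cells(cells, camera_sets, images, t, threshold,
                    must_project, comparison_valid, regions_match):
    """One elimination pass for image t: a cell survives only if enough
    other images are consistent with its hypothesised projection."""
    for cell in list(cells):
        match = 0
        for tau in (k for k in range(len(images)) if k != t):
            image_match = 0
            for g_t in camera_sets[t]:
                for g_tau in camera_sets[tau]:
                    if must_project(cell, g_t, g_tau):
                        if (comparison_valid(cell, g_t, g_tau) and
                                regions_match(cell, images[t], images[tau],
                                              g_t, g_tau)):
                            image_match = 1
                    else:
                        image_match = 1   # cannot be tested: not penalised
            match += image_match
        if match < threshold:
            cells.remove(cell)            # implausible: carve the cell away
```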
There now follows a skeleton indication of the processing routines implemented by suitable software to process batches of image data or sequences of image data.
Batch Mode

Obtain image data x = {x_1, ..., x_T}
n = 1
Initialise sets of possible camera parameters G^(n)_τ, τ = 1, ..., T
Initialise 3D volume of cells and their assignments S^(n)
10: While there is change in the solution space {
    EliminateCells(G^(n), S^(n))
    EliminateParameters(G^(n), S^(n))
    n = n + 1
}
Optionally subdivide cells and go to 10
StandardRefineSurface(G^(n), S^(n))
Sequential Mode

Initialise 3D volume of cells and their assignments S^(n)
Read image data x_1
n = 1
Initialise set of possible camera parameters G^(n)_1
For t = 2, ..., T {
    Read image data x_t
    Initialise set of possible camera parameters G^(n)_t
    10: While there is change in the solution space {
        EliminateCells({G^(n)_1, ..., G^(n)_t}, S^(n))
        EliminateParameters({G^(n)_1, ..., G^(n)_t}, S^(n))
        n = n + 1
    }
    Optionally subdivide cells and go to 10
}
StandardRefineSurface(G^(n), S^(n))
The above discussion has focussed on the construction of a three dimensional representation of a three dimensional object from a plurality of two dimensional images including different views of the object. The method can be applied in a number of image processing systems, as discussed below.
Figure 10 shows a matching system for identifying objects from a video signal including views of the object from different directions. The system includes a camera 850 connected to a computer 860 connected to a random access storage device 870 storing a database of image data. As an example, the system can be applied to the recognition of vehicles by monitoring vehicles passing the view of the camera 850. As a vehicle passes through the field of view of the camera, a number of two dimensional video images of the scene including the vehicle are captured and stored on the computer. The computer system then uses the method described above to construct a 3D representation of the whole or part of the vehicle from the images captured by the camera 850. (It will be appreciated that, alternatively or in addition, the camera may move relative to a stationary or moving object in order to capture sufficient 2D images.)
The database stores data relating to a number of images with which the constructed model can be compared to try and identify the object. For instance, the database can store a number of side elevations of vehicles, and by rotating the 3D representation of the car and comparing it with the stored images, the stored image most closely matching the reconstructed object can be identified and thereby the identity of the car established. The images with which the reconstructed model are compared will have associated with them in the database data relating to the identity of the image, such as model name and manufacturer name in the car identification example.
Alternatively, the database can store image data relating to reconstructed objects as created by the method described above or some other 3D representation of the object. The matching procedure will then be a matter of comparing the representation constructed from the captured video images with the 3D models stored in the database to determine the most likely match. In this way, although only a part of the subject may be captured by the camera, the surface detail on that part of the object may be sufficient to enable a match to be made with the models stored in the database.
Figure 11 shows a system for identifying and/or manipulating an object in a video signal. The system includes a source of video signals in the form of a camera or a video recording device 880. The system includes a computer 860 to which video signals are supplied. The system also includes a video recording device 882 and a video display device 884 for recording and displaying video signals respectively. A video signal, such as a live TV feed or a recorded program is supplied by the camera or recorder to be displayed on the video display 884 or recorded on the recording device 882. The computer 860 monitors the video signal being transmitted and processes the video signal as described below.
The system can be used to identify an object in a video stream. As an example, the video signal 886 being transmitted includes a particular manufacturer's product. The computer 860 includes in its memory a representation of the manufacturer's product as obtained by the method described above. The computer system 860 monitors the video signal 886 and captures video frames. From the different views of the objects shown in the video frames, the system reconstructs a representation of the object and compares it with the representation of the product stored in the memory to identify the product. This information can then be used to alert a party that the video being transmitted includes the manufacturer's product.
The system could also replace the object in the video stream with an altered object or an alternative object, a representation of which is stored in the computer's memory. Hence, the system could identify the existence of a particular object in the video signal from the video frames captured and, on identifying the object, it could be replaced by the same object but having altered properties, e.g. colour or surface decoration. The altered object is then substituted in the video image in place of the original object. The manipulated video signal can then be stored, broadcast, or displayed. For instance, the colour of a car could be changed or the surface decoration of an object updated to correspond with current packaging or decoration.
Instead of merely altering the appearance of the same object, an entirely different object could be inserted in the video signal, derived from a 3D representation of the different object which is accessible by the computer. The system of Fig 11 can also be used to manipulate images in a desired way. For instance, a view of a rural scene could be captured as a number of two dimensional images and a three dimensional representation of the scene generated by the computer system. The computer system then analyses the content of the three dimensional representation to identify parts of the representation having particular features, e.g. a degree of surface curvature, thereby identifying different articles. For instance, buildings would have flat surfaces, and so those parts of the 3D reconstruction corresponding to buildings could be identified. A tree would have a highly fractal surface shape, and other natural objects could likewise be identified from the three dimensional reconstruction of the scene. Selected parts of the scene could then be altered independently of the other parts. For instance, the surface decoration of buildings could be altered or the colours of objects in the scene changed. Hence, the system would allow the manipulation of parts of an image of a scene for purposes of special effects and the like.
Figure 12 shows a system for encoding video signals for transmission. A camera 850 or video playing device 880 supplies video signals to a computer 860 which processes the images. For a 3D TV system a set of at least three cameras viewing a subject is preferred.
Individual image frames are converted into a three dimensional representation of the scene from the individual two dimensional images. The three dimensional representation of the scene created from the 2D visual images is transmitted by a transmitter 890 to a receiver 892 and supplied to a further computer. The three dimensional representation is then stored in a random access memory device 896. Once the 3D representation of the scene has been stored in device 896, different views of the scene can be provided on a display device 898 merely by transmitting camera parameter data between the transmitter and receiver. This provides an object based encoded signal in which, in order to provide a video signal of different views of a scene, all that is required is data relating to the direction from which the scene is to be viewed, rather than complete image data.
In order to handle a non-static scene, e.g. one in which there is relative movement between articles in the scene, all that is required is to transmit the differences between subsequent views of the scene. For instance, in the case of a scene showing a person walking down an otherwise static street, the first frame of the video image can be generated from the representation of the street scene that has been transmitted and a 3D representation of the person. For the next scene, the only data that needs to be transmitted is camera parameter data for any changing view of the scene and any changes in the representation of the entire scene. The only change would be owing to the movement of the person between frames, and so only the data required to define the changes in the reconstruction of the moving parts of the person need be transmitted. This represents a significant compression in the amount of information required in order to transmit a three dimensional television signal. Hence it will be appreciated that the present invention can be exploited in the field of three dimensional TV systems.
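By way of a sketch only (the `Representation` container and its set of surface cells are illustrative assumptions, not part of the original disclosure), the difference-based transmission might look like this:

```python
from dataclasses import dataclass

@dataclass
class Representation:
    cells: frozenset            # surviving surface cells of the 3D model

def encode_delta(prev, curr):
    """Transmit only what changed between successive 3D representations,
    rather than complete image data for every frame."""
    return {"added": curr.cells - prev.cells,
            "removed": prev.cells - curr.cells}

def apply_delta(prev, delta):
    """Reconstruct the next representation at the receiver."""
    return Representation(frozenset((prev.cells - delta["removed"])
                                    | delta["added"]))
```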

Claims

1. A method of constructing a three dimensional representation of a subject, comprising the steps of:
(i) providing at least two images, in which each image is a different view of the subject;
(ii) defining a volume enclosing possible surfaces of the subject, the volume being made up from a plurality of cells;
(iii) determining an upper bound on the probability that a one of the plurality of the cells encloses the surface of the representation of the subject;
(iv) determining a threshold probability;
(v) comparing the upper probability bound for the cell with the threshold probability; and
(vi) eliminating the cell from the volume if the upper probability bound is less than the threshold probability.
2. A method as claimed in claim 1, and including the step of repeating steps (iii) to (vi) for all cells located between the periphery of the volume and the surface of the representation of the subject.
3. A method as claimed in claim 1, and including the further steps of: sub-dividing a cell into smaller sub cells which span the original cell; determining a new upper bound; determining a new threshold probability; and eliminating those sub cells having an upper bound less than the threshold probability so as to generate a more accurate surface of the representation of the subject.
4. A method as claimed in claim 1, and including the further steps of:
(i) defining an initial space representing sets of possible camera parameters;
(ii) determining an upper bound on the probability that a set of camera parameters is true;
(iii) determining a threshold probability;
(iv) comparing the upper probability bound for the set of camera parameters with the threshold probability; and
(v) eliminating the set of camera parameters from the initial space if the upper probability bound is less than the threshold probability.
5. A method as claimed in claim 4, in which the set of camera parameters includes translation, rotation and internal camera parameters.
6. A method as claimed in claim 4, and including the step of iterating the steps of claim 4 so as to identify cells enclosing a surface of the representation best matching the subject, or a set of surfaces of the representation best matching the subject.
7. A method as claimed in claim 1, in which the upper bound is determined using Bayesian probability theory.
8. Computer program code executable on a computer to carry out a method as claimed in claim 1.
9. A method for identifying a subject from a video signal including different views of the subject, including constructing a three dimensional representation of the subject from the video signal by using the method of claim 1, and comparing the 3D representation with representations of known subjects.
10. A method as claimed in claim 9, in which the representations of known subjects have been constructed according to the method of claim 1.
11. A method for tracking a subject in a video signal including views of the subject, including the step of matching a 3D representation of the subject created according to claim 1 with 2D video images derived from the video signal.
12. A method for altering a video image including a subject, including identifying the subject according to the method of claim 9, and replacing the image of the subject in the video signal with an altered image of the subject derived from the representation of the subject.
13. A method for altering a video image including a subject, including identifying the subject according to the method of claim 9, and replacing the image of the subject in the video signal with an image derived from a representation of a different subject.
14. A method for generating 3D video signals, for a 3D television system, including capturing images of a subject with three or more cameras placed around the subject and constructing a representation of the subject according to the method of claim 1, for each of a sequence of time steps.
15. A method for transmitting a video signal of a subject, the method including generating a 3D representation of the subject according to the method of claim 1 and transmitting camera position data so as to generate different views of the subject.
16. A method for transmitting a video signal of a subject, the method including generating 3D representations of the subject according to the method of claim 1, and transmitting data relating to the differences between the 3D representations of the subject.