US20010048753A1 - Semantic video object segmentation and tracking - Google Patents

Semantic video object segmentation and tracking

Info

Publication number
US20010048753A1
Authority
US
United States
Prior art keywords
pixel
boundary
pixels
cluster
marker
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US09/054,280
Other versions
US6400831B2 (en)
Inventor
Ming-Chieh Lee
Chuang Gu
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsoft Technology Licensing LLC
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to US09/054,280
Assigned to MICROSOFT CORPORATION (assignment of assignors interest; assignors: GU, CHUANG; LEE, MING-CHIEH)
Publication of US20010048753A1
Application granted
Publication of US6400831B2
Assigned to MICROSOFT TECHNOLOGY LICENSING, LLC (assignment of assignors interest; assignor: MICROSOFT CORPORATION)
Anticipated expiration
Legal status: Expired - Lifetime

Classifications

    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
        • G06T7/00 Image analysis > G06T7/20 Analysis of motion > G06T7/215 Motion-based segmentation
        • G06T7/00 Image analysis > G06T7/10 Segmentation; Edge detection > G06T7/11 Region-based segmentation
        • G06T7/00 Image analysis > G06T7/10 Segmentation; Edge detection > G06T7/155 Segmentation; Edge detection involving morphological operators
        • G06T2207/00 Indexing scheme for image analysis or image enhancement > G06T2207/10 Image acquisition modality > G06T2207/10016 Video; Image sequence
        • G06T2207/00 Indexing scheme for image analysis or image enhancement > G06T2207/20 Special algorithmic details > G06T2207/20092 Interactive image processing based on input by user > G06T2207/20104 Interactive definition of region of interest [ROI]

Definitions

  • the invention relates to semantic video object extraction and tracking.
  • a video sequence is composed of a series of video frames, where each frame records objects at discrete moments in time.
  • each frame is represented by an array of pixels.
  • recognizing a portion of the video frame as meaningful to the viewer is called attaching semantic meaning to that portion of the video frame.
  • a ball, an aircraft, a building, a cell, a human body, etc. all represent some meaningful entities in the world. Semantic meaning is defined with respect to the user's context. Although vision seems simple to people, a computer does not know that a certain collection of pixels within a frame depicts a person.
  • a user can identify a part of a video frame based upon some semantic criterion (such as by applying an "is a person" criterion), and thus assign semantic meaning to that part of the frame; such identified data is typically referred to as a semantic video object.
  • An advantage to breaking video stream frames into one or more semantic objects is that in addition to compression efficiency inherent to coding only active objects, received data may also be more accurately reconstructed because knowledge of the object characteristics allows better prediction of its appearance in any given frame.
  • object tracking and extraction can be very useful in many fields.
  • video compression is important due to a large bandwidth requirement for transmitting video data.
  • bandwidth requirements may be reduced if one identifies (segments) a speaker within a video frame, removes (extracts) the speaker from the background, and then skips transmitting the background unless it changes.
  • the invention allows automatic tracking of an object through a video sequence. Initially a user is allowed to roughly identify an outline of the object in a first key frame. This rough outline is then automatically refined to locate the object's actual outline. Motion estimation techniques, such as global and local motion estimation, are used to track the movement of the object through the video sequence. The motion estimation is also applied to the refined boundary to generate a new rough outline in the next video frame, which is then refined for the next video frame. This automatic outline identification and refinement is repeated for subsequent frames.
  • the user is presented with a graphical user interface showing a frame of video data, and the user identifies, with a mouse, pen, tablet, etc., the rough outline of an object by selecting points around the perimeter of the object. Curve-fitting algorithms can be applied to fill in any gaps in the user-selected points.
  • after this initial segmentation of the object, unsupervised tracking is performed. During unsupervised tracking, the motion of the object is identified from frame to frame. The system automatically locates similar semantic video objects in the remaining frames of the video sequence, and the identified object boundary is adjusted based on the motion transforms.
  • Mathematical morphology and global perspective motion estimation/compensation are used to accomplish these unsupervised steps.
  • mathematical morphology can estimate many features of the geometrical structure in the video data, and aid image segmentation. Instead of simply segmenting an image into square pixel regions unrelated to frame content (i.e. not semantically based), objects are identified according to a semantic basis and their movement tracked throughout video frames. This object-based information is encoded into the video data stream, and on the receiving end, the object data is used to re-generate the original data, rather than just blindly reconstruct it from compressed pixel regions.
  • Global motion estimation is used to provide a very complete motion description for scene change from frame to frame, and is employed to track object motion during unsupervised processing.
  • other motion tracking methods, e.g. block-based, mesh-based, parametric motion estimation, and the like, may also be used.
  • the invention also allows for irregularly shaped objects, while remaining compatible with current compression algorithms.
  • Most video compression algorithms expect to receive a regular array of pixels. This does not correspond well with objects in the real world, as real-world objects are usually irregularly shaped.
  • a user identifies a semantically interesting portion of the video stream (i.e. the object), and this irregularly shaped object is converted into a regular array of pixels before being sent to a compression algorithm.
  • a computer can be programmed with software programming instructions for implementing a method of tracking rigid and non-rigid motion of an object across multiple video frames.
  • the object has a perimeter, and initially a user identifies a first boundary approximating this perimeter in a first video frame.
  • a global motion transformation is computed which encodes the movement of the object between the first video frame and a second video frame.
  • the global motion transformation is applied to the first boundary to identify a second boundary approximating the perimeter of the object in the second video frame.
  • an inner boundary inside the approximate boundary is defined, and an outer boundary outside the approximate boundary is defined.
  • the inner border is expanded and the outer boundary contracted so as to identify an outline corresponding to the actual border of the object roughly identified in the first frame.
  • expansion and contraction of the boundaries utilizes a morphological watershed computation to classify the object and its actual border.
  • a motion transformation function representing the transformation between the object in the first frame and the object of the second frame can be applied to the outline to warp it into a new approximate boundary for the object in the second frame.
  • inner and outer boundaries are defined for the automatically generated new approximate boundary, and then snapped to the object.
  • implementations can provide for setting an error threshold on boundary approximations (e.g. by a pixel-error analysis), allowing opportunity to re-identify the object's boundary in subsequent frames.
  • FIG. 1 is a flowchart of an implementation of a semantic object extraction system.
  • FIG. 2 is a continuation flow-chart of FIG. 1.
  • FIG. 3 shows a two-stage boundary outline approximation procedure.
  • FIG. 4 shows the definition and adjustment of In and Out boundaries.
  • FIG. 5 shows an example of pixel-wise classification for object boundary identification.
  • FIG. 6 shows an example of morphological watershed pixel-classification for object boundary identification.
  • FIG. 7 shows a hierarchical queue structure used by the FIG. 6 watershed algorithm.
  • FIG. 8 is a flowchart showing automatic tracking of a semantic object.
  • FIG. 9 shows an example of separable bilinear interpolation used by the FIG. 8 tracking.
  • FIG. 10 shows automatic warping of the FIG. 6 identified object boundary to generate a new approximate boundary in a subsequent video frame.
  • FIGS. 11 - 13 show sample output from the semantic video object extraction system for different types of video sequences.
  • the invention will be implemented as computer program instructions for controlling a computer system; these instructions can be encoded into firmware chips such as ROMS or EPROMS. Such instructions can originate as code written in a high-level language such as C or C++, which is then compiled or interpreted into the controlling instructions.
  • Computer systems include as their basic elements a computer, an input device, and an output device.
  • the computer generally includes a central processing unit (CPU), and a memory system communicating through a bus structure.
  • the CPU includes an arithmetic logic unit (ALU) for performing computations, registers for temporary storage of data and instructions, and a control unit for controlling the operation of the computer system in response to instructions from a computer program such as an application or an operating system.
  • the memory system generally includes high-speed main memory in the form of random access memory (RAM) and read only memory (ROM) semiconductor devices, and secondary storage in the form of floppy disks, hard disks, tape, CD-ROM, etc. and other devices that use optical or magnetic recording material.
  • Main memory stores programs, such as a computer's operating system and currently running application programs, and also includes video display memory.
  • the input and output devices are typically peripheral devices connected by the bus structure to the computer.
  • An input device may be a keyboard, modem, pointing device, pen, or other device for providing input data to the computer.
  • An output device may be a display device, printer, sound device or other device for providing output data from the computer. It should be understood that these are illustrative elements of a basic computer system, and are not intended to suggest a specific architecture for a computer system.
  • the invention is implemented as a software program code that is executed by the computer.
  • the invention can be encoded into hardware devices such as video processing boards and the like.
  • the implementation of the invention described below is basically an object tracking and extraction system that does not require any specific prior knowledge of the color, shape or motion of the data the system processes.
  • Based on initial user input, the system automatically generates accurate boundaries for an identified semantic object as the object moves through a video sequence.
  • the semantic object is first defined by a user's tracing an initial outline for the object within an initial video frame. After the object is defined, the object is tracked in subsequent frames.
  • a graphical user interface is presented to the user which allows the user to identify as well as refine indication of the object's outline.
  • Preferred segmentation and tracking systems use one or more homogeneity (i.e. similarity) criteria to indicate how to partition input data.
  • Such criteria overcome limitations in prior art methods of color or motion identification that do not provide for identification of semantic video objects.
  • here, the identified object's semantics are the basis for evaluating the homogeneity criteria. That is, color, motion, or other identification can be used to identify a semantic object boundary, but the criteria are evaluated with respect to the user-identified semantic object. Therefore object color, shape or motion is not restricted.
  • FIG. 1 shows the two basic steps of the present system of semantic video object extraction.
  • the system needs a good semantic boundary for the initial frame, which will be used as a starting 2D-template for successive video frames.
  • a user indicates 110 the rough boundary of a semantic video object in the first frame with an input device such as a mouse, touch sensitive surface, pen, drawing tablet, or the like.
  • the system defines one boundary lying inside the object, called the In boundary 102, and another boundary lying outside the object, called the Out boundary 104.
  • These two boundaries roughly indicate the representative pixels inside and outside the user-identified semantic video object.
  • These two boundaries are then snapped 106 into a precise boundary that identifies an extracted semantic video object boundary.
  • the user is given the opportunity to accept or reject 112, 114 the user-selected and computer-generated outlines.
  • the goal of the user assistance is to provide an approximation of the object boundary using just the input device, without the user having to precisely define or otherwise indicate control points around the image feature. Requiring precise identification of control points is time-consuming, and it limits the resulting segmentation to the accuracy of the initial pixel definitions.
  • a preferred alternative to such a prior art method is to allow the user to identify and portray the initial object boundary easily and not precisely, and then have this initial approximation modified into a precise boundary.
  • FIG. 2 shows the second step 108, in which the system finds similar templates in successive frames. Shown in FIG. 2 are F i , representing each original frame, V i , representing the corresponding motion information between the current semantic object boundary and the next one, and S i , representing the final extracted semantic boundary. Note that after boundary extraction S i is complete, this S i becomes the starting template for the next frame i+1. That is, the results of a previous step become the starting input for the next step.
  • FIG. 2 shows the initial frame F 0 , and the tracking of an object's boundaries (from FIG. 1) through two successive frames F 1 , and F 2 .
  • Step 108 depends primarily on a motion estimation algorithm 116 that describes the evolution between the previous semantic video object boundary and the current one.
  • a global perspective algorithm is used, although other algorithms may be used instead.
  • a tracking procedure 118 receives as its input the boundary data S 0 and motion estimation data V 0 .
  • the approximate semantic video object boundary in the current frame can be obtained by taking the previous boundary identified by the user in the first step 100 , and warping it towards the current frame. That is, tracking function 118 is able to compute a new approximate boundary for the semantic object in current frame F 1 by adjusting previous boundary data S 0 according to motion data V 0 .
  • the new approximate boundary is snapped to a precise boundary S 1 , and the process repeats with boundary S 1 becoming a new input for processing a subsequent frame F 2 .
  • both step 100 and step 108 require snapping an approximate boundary to a precise one.
  • a morphological segmentation can be used to refine the initial user-defined boundary (step 110 ) and the motion compensated boundary (S 0 ) to get the final precise boundary of the semantic video object.
  • an error value may be included in the processing of the subsequent frames to allow setting a threshold after which a frame can be declared to be another initial frame requiring user assistance.
  • a good prediction mechanism should result in small error values, resulting in efficient coding of a video sequence.
  • errors may accumulate. Although allowing for further user-based refinement is not necessary, such assistance can increase the compression quality for complex video sequences.
  • FIG. 3 shows the results of a two-part approximation procedure, where the first part is the user's initial approximation of an image feature's outline 148 , and the second part is refining that outline 150 to allow segmentation of the object from the frame.
  • in the first part 148, there are two general methods for identifying the initial boundary.
  • the first is a pixel-based method in which a user inputs the position of interior (opaque) pixels and exterior (transparent) pixels. This method has the serious shortcoming that collecting the points is time consuming and prone to inaccuracies. In addition, unless many points are collected, the points do not adequately disclose the true shape of the image feature.
  • the second is a contour-based method in which a user only indicates control points along the outline of an object boundary, and splines or polygons are used to approximate a boundary based upon the control points.
  • the second, spline-based method is superior to the first because splines fill in the gaps between the indicated points.
  • the drawback is that a spline or polygon will generally produce a best-fit result for the input points given. With few points, broad curves or shapes will result. Thus, to get an accurate shape, many points need to be accurately placed about the image feature's true boundary.
  • if n nodes are needed to guarantee a desired maximal boundary approximation error of e pixels, then at a minimum the user must enter n keystrokes to define a border.
  • n may be a very large number. To reduce user effort, n can be decreased, but this approach yields larger e values.
  • a user has marked, with white points, portions of the left image 148 to identify an image feature of interest. Although it is preferable that the user define an entire outline around the image feature, doing so is unnecessary. As indicated above, gaps in the outline will be filled in with the hybrid pixel-polygon method.
  • the right image 150 shows the initial object boundary after gaps in the initial outline of the left image 148 have been filled in.
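  • As a concrete illustration of the gap-filling step, the sketch below densifies a handful of user clicks into a closed outline by linear interpolation between successive points. It is a minimal stand-in for the hybrid pixel-polygon method described above; the function name and the fixed pixel spacing are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def densify_outline(points, spacing=1.0):
    """Close and densify a sparse user-clicked outline.

    points is an ordered (n, 2) array of (x, y) clicks around the
    object; each gap is filled by linear interpolation at roughly
    `spacing`-pixel steps, yielding a closed approximate boundary.
    """
    pts = np.asarray(points, dtype=float)
    closed = np.vstack([pts, pts[:1]])                # close the loop
    out = []
    for p, q in zip(closed[:-1], closed[1:]):
        n = max(int(np.ceil(np.hypot(*(q - p)) / spacing)), 1)
        t = np.linspace(0.0, 1.0, n, endpoint=False)[:, None]
        out.append(p + t * (q - p))
    return np.vstack(out)

# Example: four rough clicks around an image feature
rough_clicks = [(40, 30), (120, 35), (115, 100), (45, 95)]
outline = densify_outline(rough_clicks)
```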
  • FIG. 4 shows in detail the definition of In and Out boundaries.
  • the initial boundary B init 200 is the one initially provided by the user assistance (FIG. 3) as an approximation of the image feature's true object boundary B 202 . Since the user is attempting to trace the real boundary as closely as possible, it is known that the real video object boundary is not too far away from B init 200 . Therefore, an interior In boundary B in 204 and an exterior Out boundary B out 206 are selected to limit the searching area for the real object boundary. B in lies strictly inside the image feature while B out lies outside the image feature.
  • Morphology is a method of performing shape-based processing that allows extraction of portions of an image. Morphology is applicable to 2D and 3D data, and works well with segmentation methods, since it was developed for processing multidimensional images. The following is a brief overview of the erosion and dilation operations used to define B in and B out. More detailed mathematical definitions can be found in many textbooks.
  • B in = ⊖ s (B init )
  • B out = ⊕ s (B init )
  • ⊖ and ⊕ are respectively the morphological erosion and dilation operators with structure element s, where B in ⊂ B init ⊂ B out.
  • erosion refers to an operation in which a structure element of a particular shape is moved over the input image, and wherever the structure element fits completely within the boundaries of a shape in the input image, a pixel is placed in the output image.
  • the net effect is that eroded shapes are smaller in size in the output image, and any input shapes smaller than the size of the probe disappear altogether (being smaller means they cannot contain the structure element).
  • dilation refers to an operation in which a structure element is moved over the input image, and when the structure element touches the boundary of a shape in the input image, then a pixel is placed in the output image.
  • a square structure element s will be used for the erosion and dilation operations, although it is understood by those skilled in the art that different shapes may be used to achieve different results.
  • a user can interactively choose the size and shape of the structure element, as well as perform preliminary trials of the effectiveness of the element so chosen, so long as the selection satisfies B in ⁇ B ⁇ B out .
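  • A minimal sketch of the In/Out construction, assuming the rough outline has already been filled into a boolean mask: SciPy's binary erosion and dilation with a square structure element play the roles of ⊖ s and ⊕ s, and each boundary is recovered as a region minus its own erosion. The function name and the mask representation are our own.

```python
import numpy as np
from scipy import ndimage

def in_out_boundaries(init_mask, size=2):
    """Derive the In/Out boundaries from an initial object mask.

    init_mask is True inside the user's rough outline B_init.  Erosion
    with a square structure element shrinks it, dilation grows it, so
    that B_in < B_init < B_out; each boundary is then the ring of
    pixels of its region (the region minus its own erosion).
    """
    s = np.ones((2 * size + 1, 2 * size + 1), dtype=bool)   # square element
    eroded = ndimage.binary_erosion(init_mask, structure=s)
    dilated = ndimage.binary_dilation(init_mask, structure=s)
    b_in = eroded & ~ndimage.binary_erosion(eroded, structure=s)
    b_out = dilated & ~ndimage.binary_erosion(dilated, structure=s)
    return b_in, b_out
```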
  • Pixels lying along B in 204 and B out 206 respectively represent pixels belonging inside and outside the semantic video object defined by the user.
  • the next step is to classify (see FIG. 1 , step 106 ) each pixel between B out and B in to determine whether it belongs to the semantic video object or not (i.e. determine whether it is an interior pixel).
  • Classification means employing some test to determine whether a pixel belongs to a particular group of pixels; in this case, classification refers to determining whether a particular pixel belongs to Out pixels (pixels outside the semantic video object) or to In pixels (pixels inside the object). Defining In and Out boundaries has reduced the classification search space since the boundaries give representative pixels inside and outside of the semantic object. It is understood by those skilled in the art that different classification methods may be used to classify pixels.
  • Classifying a pixel requires finding cluster centers and then grouping (classifying) pixels as belonging to a particular cluster center. Two types of cluster centers are defined, the first being an In cluster-center 216 for pixels inside the semantic video object, and the second being an Out cluster-center 218 for those pixels outside of the object. The more cluster centers that are defined, the more accurately a pixel may be classified. Since we already know that B in and B out identify inside and outside pixels, a preferred method is to define cluster centers to be all of the pixels along the B in and B out boundaries.
  • Cluster centers are denoted as {I 0 , I 1 , . . . , I m−1 } and {O 0 , O 1 , . . . , O n−1 }, where each I j and O j is a 5-dimensional vector (r, g, b, x, y) representing the color and position values of the center. As denoted, there are m In cluster vectors and n Out cluster vectors. To classify the pixels, the three color components (r, g, b) and the pixel location (x, y) are used as the classification basis.
  • each pixel inside the subset of pixels defined by B in and B out (a reduced search area) is assigned to the closest cluster center.
  • pixels are assigned to a cluster center by one of two methods: the first is pixel-wise classification (see FIG. 5), and the second is morphological watershed classification (see FIG. 6), which produces results superior to pixel-wise analysis.
  • FIG. 5 shows an example of pixel-wise classification. For each pixel p 250 between the In 252 and Out 254 boundaries, which surround the object's real boundary 256 , the pixel's absolute distance to each cluster center is computed, such that
  • d j = w color · (|r − r j | + |g − g j | + |b − b j |) + w coord · (|x − x j | + |y − y j |), for j = 0, . . . , m+n−1
  • w color and w coord are the weights for the color and coordinate information, and the sum of w color and w coord is 1.
  • each pixel of the In and Out boundaries is used to define a cluster center; shown are three representative pixels from each boundary 252 , 254 .
  • a pixel 250 is assigned to a cluster-center 252 , 254 according to its minimal distance from a cluster-center. If the pixel is classified to one of the In cluster-centers 252 , then the pixel is considered inside the user-defined semantic object. If a pixel is assigned to one of the Out clusters 254 , then the pixel is considered to be outside the semantic object.
  • a precise semantic object boundary is located at the meeting of the In and Out pixel regions. That is, as pixels are classified, In and Out regions are grown around the cluster centers. When there are no more pixels to classify, the boundary where the In and Out regions meet defines the semantic object's precise boundary.
  • the final In area constitutes the segmented semantic video object (i.e. the identified real border 202 of FIG. 4).
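  • The sketch below implements the pixel-wise classification just described: each In/Out boundary pixel contributes a 5-dimensional cluster center, and every uncertain pixel is assigned to the nearest center under the weighted distance d j above. The helper name, the default weights, and the (row, col) point representation are illustrative assumptions.

```python
import numpy as np

def classify_pixelwise(img, uncertain, in_pts, out_pts, w_color=0.7, w_coord=0.3):
    """Assign each uncertain pixel to its nearest In/Out cluster center.

    in_pts/out_pts are (k, 2) arrays of (row, col) boundary pixels; each
    becomes a 5-vector center (r, g, b, x, y).  uncertain is a list of
    (row, col) pixels between the boundaries.  Returns a boolean image
    that is True where a pixel was classified as In.  The weights are
    illustrative and must sum to 1.
    """
    def centers(pts):
        colors = img[pts[:, 0], pts[:, 1]].astype(float)
        return np.hstack([colors, pts[:, ::-1].astype(float)])  # (r,g,b,x,y)

    c_in, c_out = centers(np.asarray(in_pts)), centers(np.asarray(out_pts))
    all_c = np.vstack([c_in, c_out])
    inside = np.zeros(img.shape[:2], dtype=bool)
    for y, x in uncertain:
        v = np.concatenate([img[y, x].astype(float), [x, y]])
        d = (w_color * np.abs(all_c[:, :3] - v[:3]).sum(axis=1)
             + w_coord * np.abs(all_c[:, 3:] - v[3:]).sum(axis=1))
        inside[y, x] = d.argmin() < len(c_in)   # nearest center is an In center
    return inside
```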
  • a drawback to such pixel-wise classification is that it requires an object to have a color fairly different from the background. Although this is often the case (and is usually pre-arranged to be so, e.g. blue-screening), when the colors are close, edges will be imprecisely snapped to the middle of the interior and exterior outlines, depending on where the user draws the outline and the number of expanded pixels. (The term "snapped" represents the cumulative effect of classifying pixels, in which the In and Out borders are effectively moved closer to the actual object boundary.)
  • An additional drawback is that during classification, no use is made of a pixel's spatial relation to neighboring pixels. That is, a pixel could be tagged with higher-level semantic-type characteristics of the image (e.g. sizes, shapes and orientation of pixel regions), which would facilitate segmentation and reconstruction of the image. But pixel-wise classification ignores the spatial relations of pixels, resulting in a process sensitive to noise, and which may also destroy pixel geometrical relationships.
  • FIG. 6 shows a morphological watershed classification approach, a preferred method over pixel-based classification.
  • the morphological watershed approach overcomes the pixel-based limitation of color distinctiveness, and it also uses the semantic-type information contained in pixel spatial relationships.
  • Program code implementing the morphological watershed method starts from the cluster centers and works toward each pixel p between B in 302 and B out 304 . It is based upon an extension of a gray-tone-only, region-growing version of the watershed algorithm into a multi-valued watershed method able to handle color images (see Gu Ph.D.).
  • This multi-valued watershed starts from a set of markers extracted from the zones of interest and extends them until they occupy all the available space.
  • markers are chosen to be the pixels of the In and Out borders.
  • the available space to classify is then the points between B in 302 and B out 304 .
  • the multi-valued watershed classification process differs from the classical pixel-wise gray-scale approach which does not emphasize spatial coherence of the pixels.
  • the classical pixel-wise gray-scale approach just uses a distance function to measure the similarity of two pixels.
  • the multi-valued watershed method chooses a point because it is in the neighborhood of a marker and, at that moment, the similarity between the point and the marker is higher than between any other point and neighboring marker pair.
  • Calculation of similarity can be divided into two steps. First, the multi-valued representation of the marker is evaluated. Second, the difference between the point and multi-valued representation is calculated.
  • the multi-valued representation of the marker uses the multi-valued mean of the color image over the marker. The distance function is defined as the absolute distance d j = |r − r̄ j | + |g − ḡ j | + |b − b̄ j |, where (r̄ j , ḡ j , b̄ j ) is the mean color over marker j.
  • a hierarchical queue is a set of queues with different priorities. Each queue is a first-in-first-out data structure. The elements processed by the queue are the pixel positions in the space, which also defines the way of scanning.
  • the hierarchical queue structure bears the notion of two orders: the priority of the queues and the order inside a queue. At any time, the pixel position pulled out of the queue is the one that is in the queue of highest priority and entered that queue the earliest. If the queues of higher priority are empty, a pixel in the first non-empty queue of lower priority is considered.
  • FIG. 7 shows a hierarchical queue structure that can be used by the FIG. 6 multi-valued watershed algorithm.
  • the classification decision step (FIG. 1, step 106 ) is fulfilled by the multi-valued watershed to classify all uncertain areas between B in and B out to the In and Out markers.
  • the priority in the hierarchical queue is defined as the opposite of the distance between the pixel concerned and the representation of the marker. In practice, the representation of the marker is calculated as its mean color value.
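  • A small sketch of such a hierarchical queue, assuming distances quantized to a fixed number of levels; a queue's index is its quantized distance, so lower indices mean higher priority, matching the definition of priority as the opposite of the distance. The class name and level count are our own.

```python
from collections import deque

class HierarchicalQueue:
    """Bank of FIFO queues indexed by quantized distance.

    The patent defines priority as the opposite of the pixel-to-marker
    distance, so a smaller distance index means higher priority.  pop()
    returns the earliest-inserted element of the most similar non-empty
    queue, preserving both orders described in the text.
    """

    def __init__(self, levels=256):
        self.queues = [deque() for _ in range(levels)]
        self.count = 0

    def push(self, dist, item):
        self.queues[min(dist, len(self.queues) - 1)].append(item)
        self.count += 1

    def pop(self):
        for q in self.queues:  # scan from highest priority (smallest distance)
            if q:
                self.count -= 1
                return q.popleft()
        raise IndexError("pop from empty hierarchical queue")

    def __bool__(self):
        return self.count > 0
```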
  • a multi-valued watershed is composed of two stages: initialization of the hierarchical queue and the flooding.
  • the initialization consists of putting all the neighborhood pixels of all ‘in’ and ‘out’ markers into the hierarchical queue according to their similarity with the corresponding markers. The more similar the pair, the higher the priority. Note that it may happen that a pixel is put into different queues several times because it is in the neighborhood of several markers.
  • once initialization is complete, the flooding procedure starts.
  • the flooding follows a region growing process (e.g. defining a region based upon pixels sharing a certain characteristic), but from a set of known markers and under the constraint of the In and Out boundaries defining the scope of the classification process.
  • the flooding procedure begins by extracting a pixel from the hierarchical queue. If this pixel has not yet been classified to any marker, the distances between this pixel and all neighboring markers are calculated. Finally, this pixel is classified to the most similar marker, and the multi-valued representation of that marker is then updated to take into account this newly arrived pixel.
  • pixels assigned to an In marker are pixels interior to the image feature (semantic video object) defined by the user (FIG. 1, step 110 ), and pixels assigned to an Out marker are similarly considered pixels exterior to the semantic object.
  • the locations where the In and Out pixel regions meet identify the semantic object's boundary. The combination of all In pixels constitutes the segmented semantic video object.
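  • Putting the pieces together, the sketch below floods the uncertain band from two markers (one In, one Out) using the HierarchicalQueue above, with each marker represented by its running mean color. This is a simplified two-marker reading of the multi-valued watershed; the patent treats the In and Out border pixels themselves as the markers.

```python
import numpy as np

def watershed_classify(img, band, in_seed, out_seed):
    """Two-marker sketch of the multi-valued watershed flooding.

    img is an (h, w, 3) color frame; band marks the uncertain pixels
    between B_in and B_out; in_seed/out_seed are boolean masks of the
    In/Out marker pixels.  Each marker is represented by its running
    mean color, and pixels are flooded in order of color similarity.
    Returns the pixels claimed by the In marker (its seed ring plus the
    claimed part of the band).
    """
    h, w = band.shape
    label = np.zeros((h, w), np.int8)              # 0 unknown, 1 In, 2 Out
    label[in_seed], label[out_seed] = 1, 2
    sums = [img[in_seed].sum(0).astype(float), img[out_seed].sum(0).astype(float)]
    cnts = [int(in_seed.sum()), int(out_seed.sum())]
    hq = HierarchicalQueue()                       # sketched above

    def dist(y, x, m):                             # L1 distance to marker mean
        return int(np.abs(img[y, x] - sums[m] / cnts[m]).sum() / 3)

    def push_neighbors(y, x, m):
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and band[ny, nx] and label[ny, nx] == 0:
                hq.push(dist(ny, nx, m), (ny, nx, m))

    for y, x in zip(*np.nonzero(in_seed | out_seed)):   # initialization stage
        push_neighbors(y, x, label[y, x] - 1)

    while hq:                                           # flooding stage
        y, x, m = hq.pop()
        if label[y, x]:
            continue                                    # already claimed
        label[y, x] = m + 1
        sums[m] += img[y, x]                            # update the marker's mean
        cnts[m] += 1
        push_neighbors(y, x, m)
    return label == 1
```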
  • FIG. 8 is a flowchart showing automatic subsequent-frame boundary tracking, performed after a semantic video object has been identified in an initial frame, and its approximate boundary adjusted (i.e. after pixel classification). Once the adjusted boundary has been determined, it is tracked into successive predicted frames. Such tracking continues iteratively until the next initial frame (if one is provided for). Subsequent frame tracking consists of four steps: motion prediction 350 , motion estimation 352 , boundary warping 354 , and boundary adjustment 356 . Motion estimation 352 may track rigid-body as well as non-rigid motion.
  • Rigid motion can also be used to simulate non-rigid motion by applying rigid-motion analysis to sub-portions of an object, in addition to applying rigid-motion analysis to the overall object.
  • color information inside the semantic video object can be used since it is a good indicator of the global evolution of the semantic video object from frame to frame. For example, assume two color images under consideration are the previous frame F k ⁇ 1 (x, y) and the current frame F k (x′, y′).
  • (x′, y′) generally do not fall on integer pixel coordinates. Consequently, an interpolation of the color in F k should be performed when re-sampling values.
  • FIG. 9 shows an example of a separable bilinear interpolation that can be used as the FIG. 8 interpolation step.
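  • A sketch of the separable bilinear interpolation: the color is interpolated first along x on the two bracketing rows, then along y between the partial results. It assumes (x, y) lies within the image; the function name is our own.

```python
import numpy as np

def bilinear(img, x, y):
    """Separable bilinear interpolation at a non-integer point (x, y).

    Interpolates first along x on the two bracketing rows, then along y
    between the two partial results.  Works for grayscale or color
    frames (per-channel).
    """
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1 = min(x0 + 1, img.shape[1] - 1)
    y1 = min(y0 + 1, img.shape[0] - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * img[y0, x0] + fx * img[y0, x1]   # along x, upper row
    bot = (1 - fx) * img[y1, x0] + fx * img[y1, x1]   # along x, lower row
    return (1 - fy) * top + fy * bot                  # then along y
```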
  • a Levenberg-Marquardt iterative nonlinear algorithm is employed to perform the object-based minimization in order to get the perspective parameters (a, b, c, d, e, f, g, h).
  • the Levenberg-Marquardt algorithm is a non-linear curve fitting method useful for finding solutions to complex fitting problems. However, other least-squares or equivalent techniques may also be used.
  • the next step is motion estimation 352 .
  • a good estimation starts with a good initial setting.
  • the trajectory of a semantic video object is generally smooth, and the motion information in a previous frame provides a good initial guess for the motion in the current frame; exploiting this smoothness when interpreting recorded data also improves compression efficiency. Therefore, the previous motion parameters can be used as the starting point of the current motion estimation process.
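  • The estimation step can be sketched with SciPy's Levenberg-Marquardt solver. For brevity this fits the eight perspective parameters to point correspondences rather than to the patent's object-based color minimization, and it assumes the standard eight-parameter model x′ = (ax+by+c)/(gx+hy+1), y′ = (dx+ey+f)/(gx+hy+1); the previous frame's parameters seed p0, as just described.

```python
import numpy as np
from scipy.optimize import least_squares

def perspective(p, xy):
    """Apply perspective parameters p = (a,b,c,d,e,f,g,h) to (x, y) points."""
    a, b, c, d, e, f, g, h = p
    x, y = xy[:, 0], xy[:, 1]
    den = g * x + h * y + 1.0
    return np.stack([(a * x + b * y + c) / den,
                     (d * x + e * y + f) / den], axis=1)

def fit_perspective(src, dst, p0=None):
    """Levenberg-Marquardt fit of the eight perspective parameters.

    src/dst are (k, 2) corresponding points (k >= 4 so the problem is
    determined).  p0 defaults to the identity warp; in tracking it would
    be the previous frame's parameters, per the smoothness assumption.
    """
    if p0 is None:
        p0 = np.array([1, 0, 0, 0, 1, 0, 0, 0], dtype=float)
    res = least_squares(lambda p: (perspective(p, src) - dst).ravel(),
                        p0, method="lm")    # Levenberg-Marquardt
    return res.x
```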
  • the previous boundary is then warped 354 according to the predicted motion parameters (a, b, c, d, e, f, g, h), i.e., the semantic object boundary in the previous frame (B i−1 ) is warped towards the current frame to become the current estimated boundary (B i ′). Since the warped points generally do not fall on integer pixel coordinates, an inverse warping process is performed in order to get the warped semantic object boundary for the current frame. Although one skilled in the art will recognize that alternate methods may be employed, one method of accomplishing warping is as follows.
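  • Continuing the sketch, warping applies the fitted parameters (via perspective() above) to every point of the previous precise boundary and rounds to the pixel grid; this rounding is a simplification of the inverse warping the text calls for.

```python
import numpy as np

def warp_boundary(boundary_xy, params, shape):
    """Warp the previous precise boundary into the current frame.

    Applies the predicted perspective motion to every boundary point
    and rounds to the pixel grid to get the approximate boundary B_i';
    a fuller implementation would inverse-warp to avoid holes.
    """
    warped = perspective(params, np.asarray(boundary_xy, dtype=float))
    warped = np.rint(warped).astype(int)
    warped[:, 0] = warped[:, 0].clip(0, shape[1] - 1)   # clip x to frame
    warped[:, 1] = warped[:, 1].clip(0, shape[0] - 1)   # clip y to frame
    return np.unique(warped, axis=0)                    # drop rounding duplicates
```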
  • non-rigid body motion also exists in many real situations. Such motion is difficult to model. As noted above, it can be modeled with rigid-motion analysis.
  • a preferred implementation treats non-rigid motion as a boundary refinement problem to be solved with the boundary adjustment step 356 .
  • to obtain B i , the same method used in the initial frame segmentation to solve the boundary adjustment problem may be used again. The only difference is that B i ′ in the initial frame is provided interactively by a user, while B i ′ in a subsequent frame is produced by the motion estimation/compensation procedure (i.e. by warping).
  • B i ′ can be used to generate In boundary B in ′ and Out boundary B out ′ in the current frame. Once In and Out boundaries are obtained, the morphological watershed step (see FIG. 6 discussion above) will produce the real semantic object boundary B i .
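  • The boundary adjustment step can then reuse the earlier sketches: build In/Out boundaries around the rough mask (user-drawn in the initial frame, motion-warped afterwards), flood the band between them with the watershed, and merge the claimed pixels with the eroded core. This is our own glue code rather than the patent's, but it mirrors the loop of FIG. 8.

```python
import numpy as np
from scipy import ndimage

def snap_to_object(frame, rough_mask, size=2):
    """Snap a rough object mask to a precise one (both pipeline stages).

    rough_mask comes from the user's outline in the initial frame and
    from motion-compensated warping in subsequent frames.  It is eroded
    and dilated to get the In/Out boundaries, the band between them is
    flooded by watershed_classify, and the claimed pixels are merged
    with the eroded core to form the full object mask.
    """
    b_in, b_out = in_out_boundaries(rough_mask, size)        # sketched earlier
    s = np.ones((2 * size + 1, 2 * size + 1), bool)
    band = (ndimage.binary_dilation(rough_mask, structure=s)
            & ~ndimage.binary_erosion(rough_mask, structure=s))
    in_region = watershed_classify(frame, band, b_in, b_out) # sketched earlier
    return in_region | ndimage.binary_erosion(rough_mask, structure=s)
```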
  • FIG. 10 shows the creation of a subsequent frame's (see FIG. 2) In 370 and Out 372 boundaries based on such warping.
  • FIGS. 11 - 13 show sample output from the semantic video object extraction system for several video sequences. These sequences represent different degrees of extraction difficulty in real situations. To parallel the operation of the invention, the samples are broken into two parts, the first representing initial frame (user-assisted) segmentation results, and the second representing subsequent frame (automatic) tracking results.
  • the three selected color video sequences are all in QCIF format (176 ⁇ 144) at 30 Hz.
  • the first Akiyo 450 sequence contains a woman sitting in front of a still background.
  • the motion of the human body is relatively small. However, this motion is a non-rigid body motion because the human body may contain moving and still parts at the same time.
  • the goal is to extract the human body 452 (semantic video object) from the background 454 .
  • the second Foreman 456 includes a man 458 talking in front of a building 460 .
  • This video data is more complex than Akiyo due to the camera being in motion while the man is talking.
  • the third video sequence is the well-known Mobile-calendar sequence 462 .
  • This sequence has a moving ball 464 that is traveling over a complex background 466 . This sequence is the most complex since the motion of the ball contains not only translational motion, but also rotational and zooming factors.
  • FIG. 11 shows initial frame segmentation results.
  • the first row 468 shows an initial boundary obtained by user assistance; this outline indicates an image feature within the video frame of semantic interest to the user.
  • the second row 470 shows the In and Out boundaries defined inside and outside of the semantic video object. For the output shown, the invention was configured with a size of 2 for the square structure element used for dilation and erosion.
  • the third row 472 shows the precise boundaries 474 located using the morphological segmentation tool (see FIG. 6 above).
  • the fourth row 476 shows the final extracted semantic objects.
  • FIG. 12 shows subsequent frame boundary tracking results. For the output shown, the tracking was done at 30 Hz (no skipped frames).
  • Each column 478 , 480 , 482 represents four frames randomly chosen from each video sequence.
  • FIG. 13 shows the corresponding final extracted semantic video objects from the FIG. 12 frames.
  • the initial precise boundary 474 has been iteratively warped (FIG. 8, step 354 ) into a tracked 484 boundary throughout the video sequences; this allows implementations of the invention to automatically extract user-identified image features.

Abstract

A semantic video object extraction system using mathematical morphology and perspective motion modeling. A user indicates a rough outline around an image feature of interest for a first frame in a video sequence. Without further user assistance, the rough outline is processed by a morphological segmentation tool to snap the rough outline into a precise boundary surrounding the image feature. Motion modeling is performed on the image feature to track its movement into a subsequent video frame. The motion model is applied to the precise boundary to warp the precise outline into a new rough outline for the image feature in the subsequent video frame. This new rough outline is then snapped to locate a new precise boundary. Automatic processing is repeated for subsequent video frames.

Description

    FIELD OF THE INVENTION
  • The invention relates to semantic video object extraction and tracking. [0001]
  • BACKGROUND OF THE INVENTION
  • A video sequence is composed of a series of video frames, where each frame records objects at discrete moments in time. In a digital video sequence, each frame is represented by an array of pixels. When a person views a video frame, it is easy to recognize objects in the video frame, because the person can identify a portion of the video frame as being meaningful to the user. This is called attaching semantic meaning to that portion of the video frame. For example, a ball, an aircraft, a building, a cell, a human body, etc., all represent some meaningful entities in the world. Semantic meaning is defined with respect to the user's context. Although vision seems simple to people, a computer does not know that a certain collection of pixels within a frame depicts a person. To the computer, it is only a collection of pixels. However, a user can identify a part of a video frame based upon some semantic criterion (such as by applying an "is a person" criterion), and thus assign semantic meaning to that part of the frame; such identified data is typically referred to as a semantic video object. [0002]
  • An advantage to breaking video stream frames into one or more semantic objects (segmenting, or content based encoding) is that in addition to compression efficiency inherent to coding only active objects, received data may also be more accurately reconstructed because knowledge of the object characteristics allows better prediction of its appearance in any given frame. Such object tracking and extraction can be very useful in many fields. For example, in broadcasting and telecommunication, video compression is important due to a large bandwidth requirement for transmitting video data. For example, in a newscast monologue with a speaker in front of a fairly static background, bandwidth requirements may be reduced if one identifies (segments) a speaker within a video frame, removes (extracts) the speaker from the background, and then skips transmitting the background unless it changes. [0003]
  • Using semantic video objects to improve coding efficiency and reduce storage and transmission bandwidth has been investigated in the upcoming international video coding standard MPEG4. (See ISO/IEC JTC1/SC29/WG11. MPEG4 Video Verification Model Version 8.0, July 1997; Lee, et al., A layered video object coding system using sprite and affine motion model, IEEE Tran. on Circuits and System for Video Technology, Vol. 7, No. 1, January 1997.) In the computer domain, web technology has new opportunities involving searching and interacting with meaningful video objects in a still or dynamic scene. To do so, extraction of semantic video objects is very important. In the pattern recognition domain, accurate and robust semantic visual information extraction aids medical imaging, industrial robotics, remote sensing, and military applications. (See Marr, Vision, W. H. Freeman, New York, 1982 (hereafter Marr).) [0004]
  • But, although useful, general semantic visual information extraction is difficult. Although human eyes see data that is easily interpreted by our brains as semantic video objects, such identification is a fundamental problem for image analysis. This problem is termed a segmentation problem, where the goal is to aid a computer in distinguishing between different objects within a video frame. Objects are separated from each other using some homogeneous criteria. Homogeneity refers to grouping data according to some similar characteristic. Different definitions for homogeneity can lead to different segmentation results for the same input data. For example, homogeneous segmentation may be based on a combination of motion and texture analysis. The criteria chosen for semantic video object extraction will determine the effectiveness of the segmentation process. [0005]
  • During the past two decades, researchers have investigated unsupervised segmentation. Some researchers proposed using homogeneous grayscale or homogeneous color as a criterion for identifying regions. Others suggest using homogeneous motion information to identify moving objects. (See Haralick and Shapiro, Image segmentation techniques, CVGIP, Vol. 29, pp. 100-132, 1985; C. Gu, Multi-valued morphology and segmentation-based coding, Ph.D. dissertation, LTS/EPFL, (hereafter Gu Ph.D.), http://ltswww.epfl.ch/Staff/gu.html, 1995.) [0006]
  • This research in grayscale-oriented analysis can be classified into single-level methods and multi-level approaches. Single-level methods generally use edge-based detection methods, k-nearest neighbor, or estimation algorithms. (See Canny, A computational approach to edge detection, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 8, pp. 679-698, 1986; Cover and Hart, Nearest neighbor pattern classification, IEEE Trans. Information Theory, Vol. 13, pp. 21-27, 1967; Chen and Pavlidis, Image segmentation as an estimation problem, Computer Graphics and Image Processing, Vol. 13, pp. 153-172, 1980). [0007]
  • Unfortunately, although these techniques work well when the input data is relatively simple, clean, and fits the model well, they lack generality and robustness. To overcome these limitations, researchers focused on multi-level methods such as split and merge, pyramid linking, and morphological methods. (See Burt, et al., Segmentation and estimation of image region properties through cooperative hierarchical computation, IEEE Trans. On System, Man and Cybernetics, Vol. 11, pp. 802-809, 1981). [0008]
  • These technologies provide better performance than the prior single-level methods, but results are inadequate because these methods do not properly handle video objects that contain completely different grayscales/colors. An additional drawback to these approaches is that research in the motion oriented segmentation domain assumes that a semantic object has homogeneous motion. [0009]
  • Well known attempts have been made to deal with these problems. These include Hough transformation, multi-resolution region-growing, and relaxation clustering. But, each of these methods is based on optical flow estimation. This estimation technique is known to frequently produce inaccurately determined motion boundaries. In addition, these methods are not suitable to semantic video object extraction because they only employ homogeneous motion information while a semantic video object can have complex motions inside the object (e.g. rigid-body motion). [0010]
  • In an attempt to overcome these limitations, subsequent research focused on object tracking. This is a class of methods related to semantic video object extraction, and which is premised on estimating an object's current dynamic state based on a previous one, where the trajectories of dynamic states are temporally linked. Different features of an image have been used for tracking frame to frame changes, e.g., tracking points, intensity edges, and textures. But these features do not include semantic information about the object being tracked; simply tracking control points or features ignores important information about the nature of the object that can be used to facilitate encoding and decoding compression data. Notwithstanding significant research in video compression, little of this research considers semantic video object tracking. [0011]
  • Recently, some effort has been invested in semantic video object extraction problem with tracking. (See Gu Ph.D.; C. Gu, T. Ebrahimi and M. Kunt, Morphological moving object segmentation and tracking for content-based video coding, International Symposium on Multimedia Communication and Video Coding, New York, 1995, Plenum Press.) This research primarily attempts to segment a dynamic image sequence into regions with homogeneous motions that correspond to real moving objects. A joint spatio-temporal method for representing spatial and temporal relationships between objects in a video sequence was developed using a morphological motion tracking approach. However, this method relies on the estimated optical flow, which, as noted above, generally is not sufficiently accurate. In addition, since different parts of a semantic video object can have both moving and non-moving elements, results can be further imprecise. [0012]
  • Thus, methods for extracting semantic visual information based on homogeneous color or motion criteria are unsatisfactory, because each homogeneous criterion only deals with a limited set of input configurations, and cannot handle a general semantic video object having multiple colors and multiple motions. Processing such a restricted set of input configurations results in partial solutions for semantic visual information extraction. [0013]
  • One approach to overcome limited input configurations has been to detect shapes through user selected points using an energy formulation. However, a problem with this approach is that positioning the points is an imprecise process. This results in imprecise identification of an image feature (an object within the video frame) of interest. [0014]
  • SUMMARY OF THE INVENTION
  • The invention allows automatic tracking of an object through a video sequence. Initially a user is allowed to roughly identify an outline of the object in a first key frame. This rough outline is then automatically refined to locate the object's actual outline. Motion estimation techniques, such as global and local motion estimation, are used to track the movement of the object through the video sequence. The motion estimation is also applied to the refined boundary to generate a new rough outline in the next video frame, which is then refined for the next video frame. This automatic outline identification and refinement is repeated for subsequent frames. [0015]
  • Preferably, the user is presented with a graphical user interface showing a frame of video data, and the user identifies, with a mouse, pen, tablet, etc., the rough outline of an object by selecting points around the perimeter of the object. Curve-fitting algorithms can be applied to fill in any gaps in the user-selected points. After this initial segmentation of the object, the unsupervised tracking is performed. During unsupervised tracking, the motion of the object is identified from frame to frame. The system automatically locates similar semantic video objects in the remaining frames of the video sequence, and the identified object boundary is adjusted based on the motion transforms. [0016]
  • Mathematical morphology and global perspective motion estimation/compensation (or an equivalent object tracking system) is used to accomplish these unsupervised steps. Using a set-theoretical methodology for image analysis (i.e. providing a mathematical framework to define image abstraction), mathematical morphology can estimate many features of the geometrical structure in the video data, and aid image segmentation. Instead of simply segmenting an image into square pixel regions unrelated to frame content (i.e. not semantically based), objects are identified according to a semantic basis and their movement tracked throughout video frames. This object-based information is encoded into the video data stream, and on the receiving end, the object data is used to re-generate the original data, rather than just blindly reconstruct it from compressed pixel regions. Global motion estimation is used to provide a very complete motion description for scene change from frame to frame, and is employed to track object motion during unsupervised processing. However, other motion tracking methods, e.g. block-based, mesh-based, parametric motion estimation, and the like, may also be used. [0017]
  • The invention also allows for irregularly shaped objects, while remaining compatible with current compression algorithms. Most video compression algorithms expect to receive a regular array of pixels. This does not correspond well with objects in the real world, as real-world objects are usually irregularly shaped. To allow processing of arbitrarily shaped objects by conventional compression schemes, a user identifies a semantically interesting portion of the video stream (i.e. the object), and this irregularly shaped object is converted into a regular array of pixels before being sent to a compression algorithm. [0018]
  • Thus, a computer can be programmed with software programming instructions for implementing a method of tracking rigid and non-rigid motion of an object across multiple video frames. The object has a perimeter, and initially a user identifies a first boundary approximating this perimeter in a first video frame. A global motion transformation is computed which encodes the movement of the object between the first video frame and a second video frame. The global motion transformation is applied to the first boundary to identify a second boundary approximating the perimeter of the object in the second video frame. By successive application of motion transformations, boundaries for the object can be automatically identified in successive frames. [0019]
  • Alternatively, after the user identifies an initial approximate boundary near the border/perimeter of the object, an inner boundary inside the approximate boundary is defined, and an outer boundary outside the approximate boundary is defined. The inner border is expanded and the outer boundary contracted so as to identify an outline corresponding to the actual border of the object roughly identified in the first frame. Preferably expansion and contraction of the boundaries utilizes a morphological watershed computation to classify the object and its actual border. [0020]
  • A motion transformation function representing the transformation between the object in the first frame and the object of the second frame, can be applied to the outline to warp it into a new approximate boundary for the object in the second frame. In subsequent video frames, inner and outer boundaries are defined for the automatically generated new approximate boundary, and then snapped to the object. Note that implementations can provide for setting an error threshold on boundary approximations (e.g. by a pixel-error analysis), allowing opportunity to re-identify the object's boundary in subsequent frames.[0021]
  • The foregoing and other features and advantages will be more readily apparent from the following detailed description, which proceeds with reference to the accompanying drawings. [0022]
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The file of this patent contains at least one drawing executed in color. Copies of this patent with color drawing(s) will be provided by the Patent and Trademark Office upon request and payment of the necessary fee. [0023]
  • FIG. 1 is a flowchart of an implementation of a semantic object extraction system. [0024]
  • FIG. 2 is a continuation flow-chart of FIG. 1. [0025]
  • FIG. 3 shows a two-stage boundary outline approximation procedure. [0026]
  • FIG. 4 shows the definition and adjustment of In and Out boundaries. [0027]
  • FIG. 5 shows an example of pixel-wise classification for object boundary identification. [0028]
  • FIG. 6 shows an example of morphological watershed pixel-classification for object boundary identification. [0029]
  • FIG. 7 shows a hierarchical queue structure used by the FIG. 6 watershed algorithm. [0030]
  • FIG. 8 is a flowchart showing automatic tracking of a semantic object. [0031]
  • FIG. 9 shows an example of separable bilinear interpolation used by the FIG. 8 tracking. [0032]
  • FIG. 10 shows automatic warping of the FIG. 6 identified object boundary to generate a new approximate boundary in a subsequent video frame. [0033]
  • FIGS. 11-13 show sample output from the semantic video object extraction system for different types of video sequences. [0034]
  • DETAILED DESCRIPTION
  • It is expected that the invention will be implemented as computer program instructions for controlling a computer system; these instructions can be encoded into firmware chips such as ROMS or EPROMS. Such instructions can originate as code written in a high-level language such as C or C++, which is then compiled or interpreted into the controlling instructions. [0035]
  • Computer systems include as their basic elements a computer, an input device, and an output device. The computer generally includes a central processing unit (CPU), and a memory system communicating through a bus structure. The CPU includes an arithmetic logic unit (ALU) for performing computations, registers for temporary storage of data and instructions, and a control unit for controlling the operation of the computer system in response to instructions from a computer program such as an application or an operating system. [0036]
  • The memory system generally includes high-speed main memory in the form of random access memory (RAM) and read only memory (ROM) semiconductor devices, and secondary storage in the form of floppy disks, hard disks, tape, CD-ROM, etc. and other devices that use optical or magnetic recording material. Main memory stores programs, such as a computer's operating system and currently running application programs, and also includes video display memory. The input and output devices are typically peripheral devices connected by the bus structure to the computer. An input device may be a keyboard, modem, pointing device, pen, or other device for providing input data to the computer. An output device may be a display device, printer, sound device or other device for providing output data from the computer. It should be understood that these are illustrative elements of a basic computer system, and are not intended to suggest a specific architecture for a computer system. [0037]
  • In a preferred embodiment, the invention is implemented as software program code that is executed by the computer. However, as noted above, the invention can also be encoded into hardware devices such as video processing boards and the like. [0038]
  • Overview
  • The implementation of the invention described below is basically an object tracking and extraction system that does not require any specific prior knowledge of the color, shape or motion of the data the system processes. Based on initial user input, the system automatically generates accurate boundaries for an identified semantic object as the object moves through a video sequence. Preferably, the semantic object is first defined by a user's tracing an initial outline for the object within an initial video frame. After the object is defined, it is tracked in subsequent frames. Preferably, a graphical user interface is presented that allows the user both to identify the object's outline and to refine that indication. [0039]
  • Preferred segmentation and tracking systems use one or more homogeneity (i.e. similarity) criteria to indicate how to partition input data. Such criteria overcome limitations in prior art methods of color or motion identification that do not provide for identification of semantic video objects. Here, identified object semantics is the basis for evaluating the homogeneity criteria. That is, color, motion, or other identification can be used to identify a semantic object boundary, but the criteria are evaluated with respect to the user-identified semantic object. Therefore object color, shape or motion is not restricted. [0040]
  • FIG. 1 shows the two basic steps of the present system of semantic video object extraction. In the first step 100, the system needs a good semantic boundary for the initial frame, which will be used as a starting 2D template for successive video frames. During this step a user indicates 110 the rough boundary of a semantic video object in the first frame with an input device such as a mouse, touch sensitive surface, pen, drawing tablet, or the like. Using this initial boundary, the system defines one boundary lying inside the object, called the In boundary 102, and another boundary lying outside the object, called the Out boundary 104. These two boundaries roughly indicate the representative pixels inside and outside the user-identified semantic video object. These two boundaries are then snapped 106 into a precise boundary that identifies an extracted semantic video object boundary. Preferably the user is given the opportunity to accept or reject 112, 114 the user-selected and computer-generated outlines. [0041]
  • The goal of the user assistance is to provide an approximation of the object boundary using just the input device, without the user having to precisely define or otherwise indicate control points around the image feature. Requiring precise identification of control points is time consuming, and it limits the resulting segmentation to the accuracy of the initial pixel definitions. A preferred alternative to such a prior art method is to allow the user to identify and portray the initial object boundary quickly and imprecisely, and then have this initial approximation modified into a precise boundary. [0042]
  • FIG. 2 shows the second step 108, in which the system finds similar templates in successive frames. Shown in FIG. 2 are Fi, representing each original frame; Vi, representing the corresponding motion information between the current semantic object boundary and the next one; and Si, representing the final extracted semantic boundary. Note that after boundary extraction is completed for Si, this Si becomes the starting input for processing the next frame Fi+1. That is, the results of a previous step become the starting input for the next step. FIG. 2 shows the initial frame F0 and the tracking of an object's boundaries (from FIG. 1) through two successive frames F1 and F2. [0043]
  • Step 108 depends primarily on a motion estimation algorithm 116 that describes the evolution between the previous semantic video object boundary and the current one. Preferably a global perspective algorithm is used, although other algorithms may be used instead. A tracking procedure 118 receives as its input the boundary data S0 and motion estimation data V0. Once the motion information V0 is known, the approximate semantic video object boundary in the current frame can be obtained by taking the previous boundary identified by the user in the first step 100, and warping it towards the current frame. That is, tracking function 118 is able to compute a new approximate boundary for the semantic object in current frame F1 by adjusting previous boundary data S0 according to motion data V0. As was done with the user-defined initial boundary, the new approximate boundary is snapped to a precise boundary S1, and the process repeats with boundary S1 becoming a new input for processing a subsequent frame F2. [0044]
  • Both step 100 and step 108 require the snapping of an approximate boundary to a precise one. As described below, a morphological segmentation can be used to refine both the initial user-defined boundary (step 110) and the motion compensated boundary (S0) to get the final precise boundary of the semantic video object. [0045]
  • Note that an error value may be included in the processing of the subsequent frames to allow setting a threshold, after which a frame can be declared to be another initial frame requiring user assistance. A good prediction mechanism should yield small error values, allowing efficient coding of a video sequence. In a lossy system, however, errors may accumulate. Although allowing for further user-based refinement is not necessary, such assistance can increase the compression quality for complex video sequences. [0046]
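  • As a rough sketch of such a threshold test (the patent does not prescribe a particular error metric; the function name, the per-pixel mean absolute color difference, and the threshold value below are all illustrative assumptions):

```python
import numpy as np

def needs_user_refresh(prev_frame, cur_frame, boundary_pixels, threshold=15.0):
    """Flag a frame for renewed user assistance when the mean absolute
    color error along the predicted boundary exceeds a threshold.
    boundary_pixels: iterable of integer (x, y) coordinates assumed to
    lie on the warped (predicted) boundary in both frames."""
    errors = [np.abs(prev_frame[y, x].astype(float) -
                     cur_frame[y, x].astype(float)).mean()
              for (x, y) in boundary_pixels]
    return float(np.mean(errors)) > threshold
```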
  • Boundary Approximation
  • FIG. 3 shows the results of a two-part approximation procedure, where the first part is the user's initial approximation of an image feature's outline 148, and the second part is refining that outline 150 to allow segmentation of the object from the frame. [0047]
  • For the first part 148, there are two general methods for identifying the initial boundary. The first is a pixel-based method in which a user inputs the position of interior (opaque) pixels and exterior (transparent) pixels. This method has the serious shortcoming that collecting the points is time consuming and prone to inaccuracies. In addition, unless many points are collected, the points do not adequately disclose the true shape of the image feature. [0048]
  • The second is a contour-based method in which a user only indicates control points along the outline of an object boundary, and splines or polygons are used to approximate a boundary based upon the control points. The use of splines is superior to the first method because it allows one to fill in the gaps between the indicated points. The drawback, however, is that a spline or polygon will generally produce a best-fit result for the input points given. With few points, broad curves or shapes will result. Thus, to get an accurate shape, many points need to be accurately placed about the image feature's true boundary. But if it is assumed that n nodes guarantee a desired maximal boundary approximation error of e pixels, at a minimum the user must enter n keystrokes to define a border. For complex shapes, n may be a very large number. To reduce user effort, n can be decreased, but this approach yields larger e values. [0049]
  • The limitations inherent to either prior art method may be overcome by combining the precision of the first pixel-based approach with the efficiency of the second spline/polygonal one, into a pixel-polygon approach for fixing an initial border around an image feature of interest. The complexity of the shape, e.g. straight or complicated boundary, can control whether a polygonal or pixel-wise approach is used for a particular portion of the boundary surrounding the image feature of interest. After the initial border is fixed, it is adjusted (FIG. 1 steps [0050] 102-106) to fit the semantic object's actual border.
  • As shown, a user has marked, with white points, portions of the [0051] left image 148 to identify an image feature of interest. Although it is preferable that the user define an entire outline around the image feature, doing so is unnecessary. As indicated above, gaps in the outline will be filled in with the hybrid pixel-polygon method. The right image 150 shows the initial object boundary after gaps in the initial outline of the left image 148 have been filled in. By allowing the user to draw the outline, the user is able to define many control points without the tedium of specifying each one individually. In the prior art, allowing such gaps in the border required a tradeoff between precision and convenience. The present invention avoids such a tradeoff by defining In and Out boundaries and modifying them to precisely locate the actual boundary of the (roughly) indicated image feature.
  • Approximation Adjustment
  • FIG. 4 shows in detail the definition of In and Out boundaries. The [0052] initial boundary B init 200 is the one initially provided by the user assistance (FIG. 3) as an approximation of the image feature's true object boundary B 202. Since the user is attempting to trace the real boundary as closely as possible, it is known that the real video object boundary is not too far away from B init 200. Therefore, an interior In boundary B in 204 and an exterior Out boundary B out 206 are selected to limit the searching area for the real object boundary. Bin lies strictly inside the image feature while Bout lies outside the image feature.
  • Preferably, morphological operators are used to obtain Bin and Bout. Morphology is a method of performing shape-based processing that allows extraction of portions of an image. Morphology is applicable to 2D and 3D data, and works well with segmentation methods developed for processing multidimensional images. The following is a brief overview of the erosion and dilation operations used to obtain Bin and Bout. More detailed mathematical definitions can be found in many textbooks. [0053]
  • For dilation of a set X by a symmetrical structuring element S, the dilation is the locus of the center of S when S touches X. This can be written as δs(X) = {x+s, x∈X, s∈S}, which is also known as Minkowski addition. Similarly, for erosion of a set X by a symmetrical structuring element S, the erosion is the locus of the center of S when S is included in X. This can be written as εs(X) = {y, ∀s∈S, y+s∈X}, which is Minkowski subtraction. Here, Bin = εs(Binit) and Bout = δs(Binit), where ε and δ are respectively the morphological erosion and dilation operators, and Bin⊂Binit⊂Bout. [0054]
  • The term erosion refers to an operation in which a structuring element of particular shape is moved over the input image, and wherever the element fits completely within the boundaries of a shape in the input image, a pixel is placed in the output image. The net effect is that eroded shapes are smaller in size in the output image, and any input shapes smaller than the structuring element disappear altogether (being smaller means they cannot contain the structuring element). The term dilation refers to an operation in which a structuring element is moved over the input image, and whenever the element touches the boundary of a shape in the input image, a pixel is placed in the output image. [0055]
  • Preferably a square structuring element s is used for the erosion and dilation operations, although it is understood by those skilled in the art that different shapes may be used to achieve different results. With use of a proper user interface, a user can interactively choose the size and shape of the structuring element, as well as perform preliminary trials of the effectiveness of the element so chosen, so long as the selection satisfies Bin⊂B⊂Bout. [0056]
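  • A minimal sketch of this step in Python, assuming SciPy's morphology routines and a filled binary mask of the user's initial outline Binit; the size-2 square structuring element mirrors the configuration reported for the sample output below:

```python
import numpy as np
from scipy import ndimage

def in_out_boundaries(init_mask, size=2):
    """Derive Bin and Bout from a filled initial-outline mask: erode for
    the In region, dilate for the Out region, then keep each region's
    one-pixel edge as its boundary pixels."""
    s = np.ones((2 * size + 1, 2 * size + 1), dtype=bool)       # square element s
    in_region = ndimage.binary_erosion(init_mask, structure=s)
    out_region = ndimage.binary_dilation(init_mask, structure=s)
    b_in = in_region & ~ndimage.binary_erosion(in_region)       # edge of eroded set
    b_out = out_region & ~ndimage.binary_erosion(out_region)    # edge of dilated set
    return b_in, b_out
```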
  • Pixels lying along Bin 204 and Bout 206 respectively represent pixels belonging inside and outside the semantic video object defined by the user. After defining the In and Out boundaries, the next step is to classify (see FIG. 1, step 106) each pixel between Bout and Bin to determine whether it belongs to the semantic video object or not (i.e. determine whether it is an interior pixel). Classification means employing some test to determine whether a pixel belongs to a particular group of pixels; in this case, classification refers to determining whether a particular pixel belongs to the Out pixels (pixels outside the semantic video object) or to the In pixels (pixels inside the object). Defining In and Out boundaries has reduced the classification search space, since the boundaries give representative pixels inside and outside of the semantic object. It is understood by those skilled in the art that different classification methods may be used to classify pixels. [0057]
  • Classifying a pixel requires finding cluster centers and then grouping (classifying) pixels as belonging to a particular cluster center. Two types of cluster centers are defined, the first being an In cluster-center 216 for pixels inside the semantic video object, and the second being an Out cluster-center 218 for those pixels outside of the object. The more cluster centers that are defined, the more accurately a pixel may be classified. Since Bin and Bout already identify inside and outside pixels, a preferred method is to define the cluster centers to be all of the pixels along the Bin and Bout boundaries. [0058]
  • Cluster centers are denoted as {I0, I1, . . . , Im−1} and {O0, O1, . . . , On−1}, where each Ii and Oj is a 5-dimensional vector (r, g, b, x, y) representing the color and position values of that center. As denoted, there are m In cluster vectors and n Out cluster vectors. To classify the pixels, the three color components (r, g, b) and the pixel location (x, y) are used as the classification basis. To group the pixels, each pixel inside the subset of pixels defined by Bin and Bout (a reduced search area) is assigned to the closest cluster center. Once the cluster centers have been defined, pixels are assigned to a cluster center by one of two methods. The first method is pixel-wise classification (see FIG. 5), and the second is morphological watershed classification (see FIG. 6), which produces results superior to pixel-wise analysis. [0059]
  • Pixel-wise Classification
  • FIG. 5 shows an example of pixel-wise classification. For each pixel p 250 between the In 252 and Out 254 boundaries, which surround the object's real boundary 256, the pixel's absolute distance to each cluster center is computed, such that [0060]
  • di = wcolor*(|r−ri| + |g−gi| + |b−bi|) + wcoord*(|x−xi| + |y−yi|), 0<i<m,
  • dj = wcolor*(|r−rj| + |g−gj| + |b−bj|) + wcoord*(|x−xj| + |y−yj|), 0<j<n,
  • where wcolor and wcoord are the weights for the color and coordinate information, and the summation of wcolor and wcoord is 1. As noted above, preferably each pixel along the In and Out boundaries is used to define a cluster center; shown are three representative pixels from each boundary 252, 254. [0061]
  • A pixel 250 is assigned to a cluster-center 252, 254 according to its minimal distance from a cluster-center. If the pixel is classified to one of the In cluster-centers 252, then the pixel is considered inside the user-defined semantic object. If a pixel is assigned to one of the Out clusters 254, then the pixel is considered to be outside the semantic object. A precise semantic object boundary is located at the meeting of the In and Out pixel regions. That is, as pixels are classified, In and Out regions are grown around the cluster centers. When there are no more pixels to classify, the boundary where the In and Out regions meet defines the semantic object's precise boundary. The final In area constitutes the segmented semantic video object (i.e. the identified real border 202 of FIG. 4). [0062]
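  • A brute-force sketch of this pixel-wise classification (written for clarity rather than speed; a practical implementation would vectorize the distance computation or index the cluster centers spatially):

```python
import numpy as np

def classify_pixels(frame, in_centers, out_centers, uncertain, w_color=0.5):
    """Assign each uncertain pixel to its nearest cluster center using the
    weighted color + coordinate distance; returns True for In, False for Out.
    in_centers, out_centers: float arrays of shape (k, 5) holding (r, g, b, x, y).
    uncertain: list of (x, y) pixels lying between Bin and Bout."""
    w_coord = 1.0 - w_color                        # the two weights sum to 1
    centers = np.vstack([in_centers, out_centers]).astype(float)
    m = len(in_centers)                            # the first m centers are In centers
    inside = []
    for (x, y) in uncertain:
        r, g, b = frame[y, x].astype(float)
        d = (w_color * np.abs(centers[:, :3] - (r, g, b)).sum(axis=1) +
             w_coord * np.abs(centers[:, 3:] - (x, y)).sum(axis=1))
        inside.append(int(np.argmin(d)) < m)       # nearest center decides In vs. Out
    return np.array(inside)
```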
  • A drawback to such pixel-wise classification is that it requires an object to have a color fairly different from the background. Although this is often the case (and is usually pre-arranged to be so, e.g. blue-screening), when the colors are close, the boundary will be imprecisely snapped toward the middle of the region between the interior and exterior outlines, depending on where the user draws the outline and on how far the In and Out boundaries are expanded. (The term snapped represents the cumulative effect of classifying pixels, in which the In and Out borders are effectively moved closer to the actual object boundary.) An additional drawback is that during classification, no use is made of a pixel's spatial relation to neighboring pixels. That is, a pixel could be tagged with higher-level semantic-type characteristics of the image (e.g. sizes, shapes and orientation of pixel regions), which would facilitate segmentation and reconstruction of the image. But pixel-wise classification ignores the spatial relations of pixels, resulting in a process that is sensitive to noise and may also destroy pixel geometrical relationships. [0063]
  • Watershed Classification
  • FIG. 6 shows a morphological watershed classification approach, a method preferred over pixel-based classification. The morphological watershed approach overcomes the pixel-based limitation of color distinctiveness, and it also uses the semantic-type information contained in pixel spatial relationships. [0064]
  • Program code implementing the morphological watershed method starts from the cluster centers and approaches each pixel p between the clusters of Bin 302 and Bout 304. It is based upon an extension of a gray-tone-only, region-growing version of the watershed algorithm into a multi-valued watershed method able to handle color images (see Gu Ph.D.). [0065]
  • This multi-valued watershed starts from a set of markers extracted from the zones of interest and extends them until they occupy all the available space. As with pixel-based classification, preferably the markers are chosen to be the pixels of the In and Out borders. The available space to classify is then the points between Bin 302 and Bout 304. The multi-valued watershed classification process differs from the classical pixel-wise gray-scale approach, which does not emphasize spatial coherence of the pixels and just uses a distance function to measure the similarity of two pixels. In contrast, the multi-valued watershed method chooses a point because it is in the neighborhood of a marker and, at that time, the similarity between the point and that marker is higher than between any other pair of point and neighboring marker. [0066]
  • Calculation of similarity can be divided into two steps. First, the multi-valued representation of the marker is evaluated. Second, the difference between the point and multi-valued representation is calculated. The multi-valued representation of the marker uses the multi-valued mean of the color image over the marker. The distance function is defined as the absolute distance[0067]
  • di = |r−ri| + |g−gi| + |b−bi|, 0<i<(m+n).
  • Intuitively, two filling floods start from the In and Out positions, and these floods meet in the middle, where the object boundary is defined. In this method, spatial coherence is considered in the region-growing procedure. Therefore, the result is much less sensitive to noise in the data. [0068]
  • The efficacy of a multi-valued watershed approach depends on the scanning method used. Preferred implementations use code for a scanning method based on a hierarchical queue. (See Meyer, Color image segmentation, 4th International Conference on Image Processing and its applications, pp. 303-304, Netherlands, May 1992.) A hierarchical queue is a set of queues with different priorities, where each queue is a first-in-first-out data structure. The elements processed by the queue are the pixel positions in the space, which also defines the way of scanning. The hierarchical queue structure bears the notion of two orders: the priority of the queues and the order inside a queue. At any time, the pixel position pulled out of the queue is the one that is in the queue of highest priority and entered that queue the earliest. Once the queues of higher priority are empty, pixels in the first non-empty queue of lower priority are considered. [0069]
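  • A compact sketch of such a hierarchical queue, assuming integer-quantized priorities (a higher index meaning higher priority) and Python's collections.deque for the first-in-first-out behavior:

```python
from collections import deque

class HierarchicalQueue:
    """A fixed set of FIFO queues ranked by priority. pop() returns the
    earliest-entered element of the highest-priority non-empty queue."""
    def __init__(self, num_levels):
        self.queues = [deque() for _ in range(num_levels)]

    def push(self, priority, item):
        self.queues[priority].append(item)

    def pop(self):
        for q in reversed(self.queues):            # scan from highest priority down
            if q:
                return q.popleft()
        raise IndexError("hierarchical queue is empty")

    def __bool__(self):
        return any(self.queues)
```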
  • FIG. 7 shows a hierarchical queue structure that can be used by the FIG. 6 multi-valued watershed algorithm. Once the In and Out markers are extracted, the classification decision step (FIG. 1, step 106) is fulfilled by the multi-valued watershed, which classifies all uncertain areas between Bin and Bout to the In and Out markers. The priority in the hierarchical queue is defined as the opposite of the distance between the pixel concerned and the representation of the marker. In practice, the representation of the marker is calculated as its mean color value. [0070]
  • Generally, a multi-valued watershed is composed of two stages: initialization of the hierarchical queue and the flooding. The initialization consists of putting all the neighborhood pixels of all ‘in’ and ‘out’ markers into the hierarchical queue according to their similarity with the corresponding markers. The more similar the pair, the higher the priority. Note that it may happen that a pixel is put into different queues several times because it is in the neighborhood of several markers. [0071]
  • After the initialization, the flooding procedure starts. The flooding follows a region growing process (e.g. defining a region based upon pixels sharing a certain characteristic), but from a set of known markers and under the constraint of the In and Out boundaries defining the scope of the classification process. The flooding procedure begins by extracting a pixel from the hierarchical queue. If this pixel has not yet been classified to any marker, the distances between this pixel and all neighboring markers are calculated. The pixel is then classified to the most similar marker, and the multi-valued representation of that marker is updated to take into account the newly arrived pixel. Similarly, all pixels in the neighborhood of the recently classified pixel are then processed, and they are placed into the hierarchical queue according to their similarity (distance value) to the representation of the marker. The more similar the points, the higher the pixel's priority in the queue. Gradually, all the uncertain areas between Bin and Bout are assigned to the markers. [0072]
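  • The two stages might be combined as in the following sketch, which reuses the HierarchicalQueue above; it simplifies the scheme by classifying a popped pixel to the marker it was queued with, and keeps each marker's representation as a running color mean (all names are illustrative, not from the patent):

```python
import numpy as np

def watershed_classify(frame, marker_of, uncertain, num_levels=766):
    """Grow In/Out markers over the uncertain pixels between Bin and Bout.
    marker_of: dict (x, y) -> marker_id for the marker (seed) pixels.
    uncertain: set of (x, y) pixels to classify.
    Returns dict (x, y) -> marker_id covering markers and classified pixels."""
    sums, counts = {}, {}                      # running color mean per marker
    for (x, y), mid in marker_of.items():
        sums[mid] = sums.get(mid, np.zeros(3)) + frame[y, x]
        counts[mid] = counts.get(mid, 0) + 1

    def dist(px, mid):                         # |r-rj| + |g-gj| + |b-bj|
        return np.abs(frame[px[1], px[0]] - sums[mid] / counts[mid]).sum()

    hq = HierarchicalQueue(num_levels)
    label = dict(marker_of)

    def enqueue_neighbors(px, mid):
        x, y = px
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nb in uncertain and nb not in label:
                d = min(int(dist(nb, mid)), num_levels - 1)
                hq.push(num_levels - 1 - d, (nb, mid))  # more similar, higher priority

    for px, mid in marker_of.items():          # initialization stage
        enqueue_neighbors(px, mid)
    while hq:                                  # flooding stage
        px, mid = hq.pop()
        if px in label:                        # may have been queued several times
            continue
        label[px] = mid
        sums[mid] += frame[px[1], px[0]]       # update the marker representation
        counts[mid] += 1
        enqueue_neighbors(px, mid)
    return label
```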
  • When there are no more pixels to classify, pixels assigned to an In marker are pixels interior to the image feature (semantic video object) defined by the user (FIG. 1, step 110), and pixels assigned to an Out marker are similarly considered pixels exterior to the semantic object. As with pixel-wise classification, the locations where the In and Out pixel regions meet identify the semantic object's boundary. The combination of all In pixels constitutes the segmented semantic video object. [0073]
  • Semantic Object Tracking
  • FIG. 8 is a flowchart showing automatic subsequent-frame boundary tracking, performed after a semantic video object has been identified in an initial frame and its approximate boundary adjusted (i.e. after pixel classification). Once the adjusted boundary has been determined, it is tracked into successive predicted frames. Such tracking continues iteratively until the next initial frame (if one is provided for). Subsequent frame tracking consists of four steps: motion prediction 350, motion estimation 352, boundary warping 354, and boundary adjustment 356. Motion estimation 352 may track rigid-body as well as non-rigid motion. [0074]
  • In a given frame sequence, there are generally two types of motion: rigid-body in-place movement and translational movement. Rigid motion can also be used to simulate non-rigid motion by applying rigid-motion analysis to sub-portions of an object, in addition to applying rigid-motion analysis to the overall object. Rigid body motion can be modeled by a perspective motion model. That is, assume the two boundary images under consideration are Bk−1(x, y), which includes a boundary indicating the previous semantic video object, and a current boundary indicated by Bk(x′, y′). Using homogeneous coordinates, a 2D planar perspective transformation can be described as: [0075]
  • x′ = (a*x + b*y + c)/(g*x + h*y + 1)
  • y′ = (d*x + e*y + f)/(g*x + h*y + 1) [0076]
  • The perspective motion model can represent a more general motion than a translational or affine motion model, such that if g=h=0 and a=1, b=0, d=0, e=1, then x′=x+c and y′=y+f, which becomes the translational motion model. Also, if g=h=0, then x′=a*x+b*y+c and y′=d*x+e*y+f, which is the affine motion model. [0077]
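  • For concreteness, a small sketch of the forward map, with the translational special case falling out of the parameter settings just described:

```python
def perspective_map(x, y, m):
    """Apply x' = (a*x + b*y + c)/(g*x + h*y + 1) and
             y' = (d*x + e*y + f)/(g*x + h*y + 1),
    with m = (a, b, c, d, e, f, g, h)."""
    a, b, c, d, e, f, g, h = m
    denom = g * x + h * y + 1.0
    return (a * x + b * y + c) / denom, (d * x + e * y + f) / denom

# Translational special case: a=e=1 and b=d=g=h=0 gives (x + c, y + f).
assert perspective_map(2.0, 3.0, (1, 0, 5, 0, 1, -2, 0, 0)) == (7.0, 1.0)
```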
  • To find the parameters of the perspective motion model (a through h), color information inside the semantic video object can be used, since it is a good indicator of the global evolution of the semantic video object from frame to frame. For example, assume the two color images under consideration are the previous frame Fk−1(x, y) and the current frame Fk(x′, y′). Since the focus is on the evolution of the color information inside the semantic video object, the goal is to minimize the prediction error E over all corresponding pairs of pixels j inside the semantic mask of Fk−1 and the current frame Fk: E = Σj wj*(Fk−1(xj, yj) − Fk(xj′, yj′))² = Σj wj*ej², where wj is set to 1 if (xj, yj) is inside the semantic object and (xj′, yj′) is inside the frame; otherwise wj is set to zero. [0078]
  • Note that (x′, y′) generally do not fall on integer pixel coordinates. Consequently, an interpolation of the color in Fk should be performed when re-sampling values. Preferably a bilinear interpolation in Fk is used (see FIG. 9). So, assuming the four integer corner pixel coordinates surrounding (x′, y′) in Fk are v0, v1, v2 and v3 (with v=(x, y) and v′=(x′, y′)), the interpolated pixel value (see FIG. 9) is Fk(v′) = Fk(v0) + (Fk(v1)−Fk(v0))*p + (Fk(v2)−Fk(v0))*q + (Fk(v3)−Fk(v2)−Fk(v1)+Fk(v0))*p*q. [0079]
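  • A sketch of that separable bilinear re-sampling, assuming v0..v3 are the top-left, top-right, bottom-left, and bottom-right integer neighbors of v′ and (p, q) are its fractional offsets within that pixel cell:

```python
import numpy as np

def bilinear_sample(frame, x, y):
    """Sample an image at a non-integer location (x, y)."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    p, q = x - x0, y - y0                        # fractional offsets
    v0 = frame[y0, x0].astype(float)             # top-left
    v1 = frame[y0, x0 + 1].astype(float)         # top-right
    v2 = frame[y0 + 1, x0].astype(float)         # bottom-left
    v3 = frame[y0 + 1, x0 + 1].astype(float)     # bottom-right
    return v0 + (v1 - v0) * p + (v2 - v0) * q + (v3 - v2 - v1 + v0) * p * q
```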
  • FIG. 9 shows an example of a separable bilinear interpolation that can be used as the FIG. 8 interpolation step. A Levenberg-Marquardt iterative nonlinear algorithm is employed to perform the object-based minimization in order to get the perspective parameters (a, b, c, d, e, f, g, h). The Levenberg-Marquardt algorithm is a non-linear curve fitting method useful for finding solutions to complex fitting problems. However, other least-squares or equivalent techniques may also be used. [0080]
  • The algorithm computes the partial derivatives of ei in the semantic video object with respect to the unknown motion parameters (a, b, c, d, e, f, g, h). That is, [0081]
  • ∂ei/∂m0 = (xi/Di)*I′x, . . . , ∂ei/∂m7 = −(yi/Di)*(x′i*I′x + y′i*I′y),
  • akl = Σi (∂ei/∂mk)*(∂ei/∂ml), bk = −Σi ei*(∂ei/∂mk),
  • where Di = g*xi + h*yi + 1 is the denominator, I′ = Fk, I = Fk−1, and (m0, m1, m2, m3, m4, m5, m6, m7) = (a, b, c, d, e, f, g, h). [0082]
  • From these partial derivatives, the Levenberg-Marquardt algorithm computes an approximate Hessian matrix A (with components akl) and a weighted gradient vector b (with components bk), and updates the motion parameter estimate m by an amount Δm = A⁻¹b. [0083]
  • A preferred implementation of the Levenberg-Marquardt algorithm includes the following steps: computing, for each pixel i at location (xi, yi) inside the semantic video object, the pixel's corresponding position (x′i, y′i); computing the error ei; computing the partial derivatives of ei with respect to the mk; and adding the pixel's contribution to A and b. Then, the system of equations AΔm = b is solved and the motion parameters m(t+1) = m(t) + Δm are updated. These steps are iterated until the error is below a predetermined threshold. [0084]
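  • A condensed sketch of one such iteration, reusing perspective_map and bilinear_sample from the earlier sketches; it assumes grayscale numpy frames, a boolean object mask, and gradients from np.gradient (a simplification of the color-based scheme), with lam as the standard Levenberg-Marquardt damping term:

```python
import numpy as np

def lm_step(prev, cur, mask, m, lam=1e-3):
    """One Levenberg-Marquardt update of the eight perspective parameters m,
    accumulating A and b over the pixels of the boolean object mask."""
    gy, gx = np.gradient(cur.astype(float))        # I'_y, I'_x
    A, b = np.zeros((8, 8)), np.zeros(8)
    H, W = cur.shape
    ys, xs = np.nonzero(mask)
    for x, y in zip(xs, ys):
        xp, yp = perspective_map(x, y, m)
        if not (0 <= xp < W - 1 and 0 <= yp < H - 1):
            continue                               # w_j = 0 outside the frame
        e = bilinear_sample(cur, xp, yp) - float(prev[y, x])
        D = m[6] * x + m[7] * y + 1.0              # denominator D_i
        Ix, Iy = bilinear_sample(gx, xp, yp), bilinear_sample(gy, xp, yp)
        common = xp * Ix + yp * Iy
        # Partial derivatives of e_i with respect to (a, b, ..., h), cf. above.
        J = np.array([x * Ix, y * Ix, Ix,
                      x * Iy, y * Iy, Iy,
                      -x * common, -y * common]) / D
        A += np.outer(J, J)                        # a_kl accumulation
        b += -e * J                                # b_k accumulation
    A += lam * np.diag(np.diag(A))                 # LM damping of the Hessian
    return np.array(m, dtype=float) + np.linalg.solve(A, b)
```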
  • Motion Estimation
  • Returning to FIG. 8, after prediction 350, the next step is motion estimation 352. It is somewhat axiomatic that a good estimation starts with a good initial setting. By recognizing that in the real world the trajectory of an object is generally smooth, this information can be applied to interpreting recorded data to improve compression efficiency. For simplicity, it is assumed that the trajectory of a semantic video object is basically smooth, and that the motion information in a previous frame provides a good initial guess for motion in a current frame. Therefore, the previous motion parameters can be used as the starting point of the current motion estimation process. (Note, however, that these assumptions are for simplicity, and all embodiments need not have this limitation.) For the first motion estimation, since there is no previous frame from which to extrapolate, the initial transformation is set to a=e=1, and b=c=d=f=g=h=0. [0085]
  • Boundary Warping
  • Once motion prediction 350 and estimation 352 are computed, the previous boundary is then warped 354 according to the predicted motion parameters (a, b, c, d, e, f, g, h); i.e., the semantic object boundary in the previous frame (Bi−1) is warped towards the current frame to become the current estimated boundary (Bi′). Since the warped points generally do not fall on integer pixel coordinates, an inverse warping process is performed in order to get the warped semantic object boundary for the current frame. Although one skilled in the art will recognize that alternate methods may be employed, one method of accomplishing warping is as follows. [0086]
  • For each pixel (x′, y′) in Fi, the inverse perspective transformation based on the motion parameters (a, b, c, d, e, f, g, h) gives the inversely warped pixel (x, y) in Fi−1. If any of the four integer pixels bounding (x, y) belongs to the previous object boundary, then (x′, y′) is a boundary pixel in the current frame. Based on the goal of the motion estimation, it is clear that Bi′ is an approximation of the semantic video object boundary Bi in the current frame, where this approximation has taken into account the rigid-body motion. [0087]
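  • One possible sketch of that inverse-warping test, inverting the perspective map through its 3x3 homography matrix (the function and argument names are illustrative; prev_boundary is assumed to be a boolean mask of the previous boundary Bi−1):

```python
import numpy as np

def warp_boundary(prev_boundary, m):
    """Mark (x', y') as a current-frame boundary pixel when any of the four
    integer pixels bounding its inverse warp lies on the previous boundary.
    m = (a, b, c, d, e, f, g, h)."""
    Hgt, Wdt = prev_boundary.shape
    Hinv = np.linalg.inv(np.array([[m[0], m[1], m[2]],
                                   [m[3], m[4], m[5]],
                                   [m[6], m[7], 1.0]]))
    cur = np.zeros_like(prev_boundary)
    for yp in range(Hgt):
        for xp in range(Wdt):
            v = Hinv @ np.array([xp, yp, 1.0])     # inverse perspective transform
            x, y = v[0] / v[2], v[1] / v[2]
            x0, y0 = int(np.floor(x)), int(np.floor(y))
            for xi, yi in ((x0, y0), (x0 + 1, y0), (x0, y0 + 1), (x0 + 1, y0 + 1)):
                if 0 <= xi < Wdt and 0 <= yi < Hgt and prev_boundary[yi, xi]:
                    cur[yp, xp] = True
                    break
    return cur
```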
  • Unfortunately, besides rigid body motion, non-rigid body motion also exists in many real situations. Such motion is difficult to model. As noted above, it can be modeled with rigid-motion analysis of sub-portions of an object. A preferred implementation treats non-rigid motion as a boundary refinement problem to be solved with the boundary adjustment step 356. At this point, Bi′, the approximation of Bi obtained by warping the previous object boundary Bi−1, has already been computed. With Bi′, the same method used in the initial frame segmentation to solve the boundary adjustment problem may be used again. The only difference is that the approximate boundary in the initial frame is provided interactively by a user, while Bi′ in a subsequent frame is produced by a motion estimation/motion compensation procedure (i.e. automatically, without user intervention). Bi′ can be used to generate the In boundary Bin′ and Out boundary Bout′ in the current frame. Once the In and Out boundaries are obtained, the morphological watershed step (see the FIG. 6 discussion above) will produce the real semantic object boundary Bi. [0088]
  • The whole procedure is illustrated in FIG. 10, which shows the creation of a subsequent frame's (see FIG. 2) In 370 and Out 372 boundaries based on such warping. [0089]
  • Sample Output
  • FIGS. 11-13 show sample output from the semantic video object extraction system for several video sequences. These sequences represent different degrees of extraction difficulty in real situations. To parallel the operation of the invention, the samples are broken into two parts: the first shows initial frame (user assisted) segmentation results, and the second shows subsequent frame (automatic) tracking results. [0090]
  • The three selected color video sequences are all in QCIF format (176×144) at 30 Hz. The first, Akiyo 450, contains a woman sitting in front of a still background. The motion of the human body is relatively small. However, this motion is non-rigid body motion, because the human body may contain moving and still parts at the same time. The goal is to extract the human body 452 (semantic video object) from the background 454. The second, Foreman 456, includes a man 458 talking in front of a building 460. This video data is more complex than Akiyo because the camera is in motion while the man is talking. The third video sequence is the well-known Mobile-calendar sequence 462. This sequence has a moving ball 464 traveling over a complex background 466. This sequence is the most complex, since the motion of the ball contains not only translational motion, but also rotational and zooming factors. [0091]
  • FIG. 11 shows initial frame segmentation results. The first row 468 shows an initial boundary obtained by user assistance; this outline indicates an image feature within the video frame of semantic interest to the user. The second row 470 shows the In and Out boundaries defined inside and outside of the semantic video object. For the output shown, the invention was configured with a size of 2 for the square structuring element used for dilation and erosion. The third row 472 shows the precise boundaries 474 located using the morphological segmentation tool (see FIG. 6 above). The fourth row 476 shows the final extracted semantic objects. [0092]
  • FIG. 12 shows subsequent frame boundary tracking results. For the output shown, the tracking was done at 30 Hz (no skipped frames). Each column 478, 480, 482 represents four frames randomly chosen from each video sequence. FIG. 13 shows the corresponding final extracted semantic video objects from the FIG. 12 frames. As shown, the initial precise boundary 474 has been iteratively warped (FIG. 8, step 354) into a tracked boundary 484 throughout the video sequences; this allows implementations of the invention to automatically extract user-identified image features. [0093]
  • Conclusion
  • Having illustrated and described the principles of the present invention in a preferred embodiment, and several variations thereof, it should be apparent to those skilled in the art that these embodiments can be modified in arrangement and detail without departing from such principles. In view of the wide range of embodiments to which the principles of the invention may be applied, it should be recognized that the detailed embodiment is illustrative and should not be taken as limiting the invention. Accordingly, we claim as our invention all such modifications as may come within the scope and spirit of the following claims and equivalents thereto. [0094]

Claims (30)

We claim:
1. A method of semantic object tracking of an object depicted in a first, second, and third video frame, the object having a precise border, the method comprising:
(a) defining an initial approximate boundary near the border of the object in the first frame;
(b) defining an inner boundary inside the approximate boundary;
(c) defining an outer boundary outside the approximate boundary;
(d) expanding the inner boundary and contracting the outer boundary to identify an outline corresponding to the border of the object on the first frame;
(e) identifying a motion transformation function representing the transformation between the object in the first frame and the object of the second frame;
(f) warping the outline according to the motion transformation function to define a new approximate boundary for object in the second frame; and
(g) repeating steps (b) through (d) with the new approximate boundary so as to automatically track the boundary of the object between the second and third frames.
2. The method of
claim 1
, in which E is a morphological erosion operator, O is a morphological dilation operator, and Binit is the approximate boundary initially selected by the user, where
the step of defining an inner boundary further requires satisfying morphological relation Bin=Es(Binit), and
the step of defining the outer boundary requires satisfying morphological relation Bout=Os(Binit).
3. The method of
claim 1
, in which each video frame is defined by a set of pixels, the inner and outer boundaries defining a subset of pixels, wherein the step of expanding the inner and outer boundaries includes:
sampling pixels within the object to define at least one inside cluster-center pixel represented in multi-valued format;
sampling pixels outside of the object to define at least one outside cluster-center pixel represented in multi-valued format; and
classifying each pixel of the subset of pixels to its closest cluster-center.
4. The method of
claim 3
, wherein classifying each pixel of the subset of pixels is morphological watershed based.
5. The method of
claim 4
, having m inside cluster-centers and n outside cluster-centers, where morphological watershed pixel classification includes the step of:
starting with each cluster-center and calculating a similarity to each pixel in the subset of pixels;
wherein similarity for an ith pixel is evaluated by computing the ith pixel's absolute distance di from each cluster-center c as determined by di=(|rc−ri|)+(|gc−gi|)+(|bc−bi|), 0<i<(m+n).
6. The method of
claim 4
, in which morphological watershed pixel classification uses a hierarchical queue data-structure for tracking a set of first-in first-out pixel queues, each queue having a priority ranking such that a pixel removed from the hierarchical queue is removed from a highest-priority queue within the set of queues, the method including the steps of:
identifying a set of m in markers and a set of n out markers;
initializing the hierarchical queue by placing all neighborhood pixels of all in markers and out markers into the hierarchical queue according to a distance between each neighborhood pixel and each marker;
removing a first pixel from the hierarchical queue;
identifying a subset of markers consisting of those markers within a predetermined distance from the first pixel; and
determining if the first pixel has been classified to any marker, and if not classified, classifying the first pixel to a nearest marker of the subset of markers.
7. The method of
claim 6
, in which distance between a pixel and a marker is determined by computing the mean color value difference between the pixel and the marker.
8. The method of
claim 6
, in which pixels have red, green, and blue color component values r, g, b, and the distance between an ith pixel and a marker is determined by di=(|rmarker−ri|)+(|gmarker−gi|)+(|bmarker−bi|), 0<i<(m+n), wherein pixels with lower di values are first placed in the hierarchical queue.
9. The method of
claim 6
, in which pixels have color component values, and markers are represented in multi-valued format, wherein:
the step of classifying the first pixel to the nearest marker includes updating the multi-valued representation of the nearest marker with the color component values of the first pixel.
10. A computer readable medium having stored therein computer programming code for causing a computer to segment an image feature in a first video frame, the image feature having a border, comprising:
code for defining an approximate boundary near the border, where the approximate boundary is initially selected by a user;
code for defining an inner boundary inside the border;
code for defining an outer boundary outside the border; and
code for expanding the inner boundary and contracting the outer boundary to define an outline corresponding to the border.
11. The medium of
claim 10
, in which movement of the image feature is tracked across video frames, further comprising:
code for identifying a transform expressing a transformation of the image feature between the first and a second video frame; and
code for applying the transform to the outline of the image feature in the first frame to define a second approximate boundary for the image feature in the second frame.
12. The medium of
claim 11
, wherein the code for defining an inner and outer boundary, and the code for expanding the inner boundary and contracting the outer boundary, are applied to the second approximate boundary to identify a second outline.
13. The medium of
claim 10
, further comprising code for receiving input from a hand-held input device, wherein such input is used to define the initial approximate boundary.
14. The medium of
claim 10
, in which each video frame is defined by a set of pixels having color and position values, and where the inner and outer boundaries define a subset of pixels, wherein the code for expanding the inner and outer boundaries includes:
sampling pixels within the image feature to define at least one inside cluster-center pixel represented in multi-valued format;
sampling pixels outside of the image feature to define at least one outside cluster-center pixel represented in multi-valued format; and
classifying each pixel of the subset of pixels to its closest cluster-center.
15. The medium of
claim 14
, wherein the code for classifying each pixel of the subset of pixels is pixel-wise based.
16. The medium of
claim 15
, where there are m inside and n outside cluster-centers, and pixel-wise classification includes computing, for each pixel within the subset of pixels, such pixel's absolute distance to a cluster as determined by:
di = wcolor*(|r−ri| + |g−gi| + |b−bi|) + wcoord*(|x−xi| + |y−yi|), 0<i<m, dj = wcolor*(|r−rj| + |g−gj| + |b−bj|) + wcoord*(|x−xj| + |y−yj|), 0<j<n,
where wcolor and wcoord are the weights for the color and coordinate information, and the summation of wcolor and wcoord is one.
17. The medium of
claim 14
, wherein the function for classifying each pixel of the subset of pixels is morphological watershed based.
18. The medium of
claim 17
, wherein there are m inside cluster-centers and n outside cluster-centers, and each pixel and marker has color component values r, g, b, where morphological watershed classification includes code for:
starting with each cluster center and calculating a similarity to each pixel in the subset of pixels;
wherein similarity is evaluated by computing such pixel's absolute distance from such cluster as determined by di=(|r−ri|)+(|g−gi|)+(|b−bi|), 0<i<(m+n).
19. The medium of
claim 17
, in which there is code for a hierarchical queue data structure tracking a set of first-in first-out pixel queues, each pixel queue ranked from a lowest to a highest priority queue, where a pixel removed from the hierarchical queue is an earliest pixel to enter the highest-priority queue, the code for morphological watershed classification including code for:
identifying a set of m in markers and a set of n out markers;
initializing the hierarchical queue by placing all neighborhood pixels of all markers into the hierarchical queue according to a distance between each neighborhood pixel and each marker;
removing a first pixel from the hierarchical queue;
determining if the first pixel has been classified to a marker;
identifying a set of markers consisting of those markers within a predetermined distance from the first pixel; and
classifying the first pixel to a nearest marker of the set of markers.
20. The medium of
claim 19
, in which determining the distance to a marker is by computing the mean color value difference between neighborhood pixels and each marker.
21. The medium of
claim 19
, in which the code for computing an ith pixel's distance to a marker is determined by di=(|rmarker−ri|)+(|gmarker−gi|)+(|bmarker−bi|), 0<i<(m+n);
wherein pixels with lower di values are first placed in the hierarchical queue.
22. The medium of
claim 19
, in which markers are stored in multi-valued format, and wherein classifying the first pixel to the nearest marker includes updating the marker's multi-valued representation with the color component values of the first pixel.
23. The medium of
claim 10
, in which E is a morphological erosion operator, O is a morphological dilation operator, and Binit is the approximate boundary initially selected by the user, wherein:
the code for defining the inner boundary requires satisfying morphological relation Bin=Es(Binit), and the code for defining the outer boundary requires satisfying morphological relation Bout=Os(Binit).
24. A method of tracking motion of an object across multiple video frames, the object having a perimeter, the method comprising:
identifying a first boundary approximating the perimeter of the object in a first video frame;
identifying a global motion transformation indicating the movement of the object between the first video frame and a second video frame; and
applying the global motion transformation to the first boundary to identify a second boundary approximating the perimeter of the object in the second video frame.
25. The method of
claim 24
, further comprising the steps of:
defining an inner boundary inside the first boundary;
defining an outer boundary outside the first boundary; and
snapping the inner and outer boundaries to the perimeter of the object in the first frame by expanding the inner boundary and contracting the outer boundary to identify an outline;
wherein the global motion transformation is applied to the outline to identify the second boundary approximating the perimeter of the object in the second video frame.
26. The method of
claim 25
, wherein for the first frame a user identifies the first boundary, and for subsequent frames, global motion transformations are used to identify tentative boundaries of an object in such subsequent frames, which are then snapped to identify an outline for the object in each such subsequent frame.
27. The method of
claim 26
, further including the step of:
computing an average error value E corresponding to pixel coloration error across a particular tentative boundary in a corresponding subsequent frame;
wherein if E exceeds a predetermined threshold, a user can be prompted to identify the boundary of the object in the corresponding subsequent frame, such identified boundary serving as the tentative boundary from which subsequent boundaries are determined in subsequent frames.
28. The method of
claim 26
, in which each video frame is defined by a set of pixels from which the inner Bin and outer Bout boundaries define a subset of pixels, and where E is a morphological erosion operator, O is a morphological dilation operator, and Binit is the boundary selected by the user, the method further comprising:
sampling pixels within the object to define at least one inside cluster-center pixel represented in multi-valued format;
sampling pixels outside of the object to define at least one outside cluster-center pixel represented in multi-valued format; and
classifying each pixel of the subset of pixels to its closest cluster-center by calculating a similarity between such pixel and each cluster-center;
wherein Bin satisfies morphological relation Bin=Es(Binit), and Bout satisfies morphological relation Bout=Os(Binit).
29. The method of
claim 28
, where clusters and pixels have color components r, g, and b, and the step of classifying pixels of the subset of pixels is morphological watershed based, having m inside cluster-centers and n outside cluster-centers, wherein similarity for an ith pixel is the absolute distance di between the ith pixel and each cluster-center c, as determined by di=(|rc−ri|)+(|gc−gi|)+(|bc−bi|), 0<i<(m+n).
30. The method of tracking the object of
claim 24
, wherein non-rigid motion is tracked across multiple video frames by identifying a global motion transformation for movement of the object between the first and second video frame, and by identifying a local motion transformation for movement of at least one sub-object within the object.
US09/054,280 1998-04-02 1998-04-02 Semantic video object segmentation and tracking Expired - Lifetime US6400831B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US09/054,280 US6400831B2 (en) 1998-04-02 1998-04-02 Semantic video object segmentation and tracking


Publications (2)

Publication Number Publication Date
US20010048753A1 true US20010048753A1 (en) 2001-12-06
US6400831B2 US6400831B2 (en) 2002-06-04


Cited By (52)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20010040924A1 (en) * 2000-05-11 2001-11-15 Osamu Hori Object region data describing method and object region data creating apparatus
US20030068074A1 (en) * 2001-10-05 2003-04-10 Horst Hahn Computer system and a method for segmentation of a digital image
US20030146915A1 (en) * 2001-10-12 2003-08-07 Brook John Charles Interactive animation of sprites in a video production
US6665423B1 (en) * 2000-01-27 2003-12-16 Eastman Kodak Company Method and system for object-oriented motion-based video description
US20040017939A1 (en) * 2002-07-23 2004-01-29 Microsoft Corporation Segmentation of digital video and images into continuous tone and palettized regions
US6731799B1 (en) * 2000-06-01 2004-05-04 University Of Washington Object segmentation with background extraction and moving boundary techniques
US20040189863A1 (en) * 1998-09-10 2004-09-30 Microsoft Corporation Tracking semantic objects in vector image sequences
US20050041102A1 (en) * 2003-08-22 2005-02-24 Bongiovanni Kevin Paul Automatic target detection and motion analysis from image data
US20050175219A1 (en) * 2003-11-13 2005-08-11 Ming-Hsuan Yang Adaptive probabilistic visual tracking with incremental subspace update
US20050216274A1 (en) * 2004-02-18 2005-09-29 Samsung Electronics Co., Ltd. Object tracking method and apparatus using stereo images
US20050271271A1 (en) * 2002-09-19 2005-12-08 Koninklijke Philips Electronics N.V. Segmenting a series of 2d or 3d images
US20060023916A1 (en) * 2004-07-09 2006-02-02 Ming-Hsuan Yang Visual tracking using incremental fisher discriminant analysis
US20060285770A1 (en) * 2005-06-20 2006-12-21 Jongwoo Lim Direct method for modeling non-rigid motion with thin plate spline transformation
US20070058647A1 (en) * 2004-06-30 2007-03-15 Bettis Sonny R Video based interfaces for video message systems and services
EP1792299A2 (en) * 2004-09-07 2007-06-06 Adobe Systems Incorporated A method and system to perform localized activity with respect to digital data
US20080226161A1 (en) * 2007-03-12 2008-09-18 Jeffrey Kimball Tidd Determining Edgeless Areas in a Digital Image
US20080303913A1 (en) * 2005-08-26 2008-12-11 Koninklijke Philips Electronics, N.V. Imaging Camera Processing Unit and Method
US20080309629A1 (en) * 2007-06-13 2008-12-18 Apple Inc. Bottom up watershed dataflow method and region-specific segmentation based on historic data
US20090174595A1 (en) * 2005-09-22 2009-07-09 Nader Khatib SAR ATR treeline extended operating condition
US20090259679A1 (en) * 2008-04-14 2009-10-15 Microsoft Corporation Parsimonious multi-resolution value-item lists
WO2010007423A2 (en) * 2008-07-14 2010-01-21 Musion Ip Limited Video processing and telepresence system and method
US20100149340A1 (en) * 2008-12-17 2010-06-17 Richard Lee Marks Compensating for blooming of a shape in an image
KR101021409B1 (en) 2002-11-11 2011-03-14 소니 일렉트로닉스 인코포레이티드 Method and apparatus for nonlinear multiple motion model and moving boundary extraction
US20110194742A1 (en) * 2008-10-14 2011-08-11 Koninklijke Philips Electronics N.V. One-click correction of tumor segmentation results
US20110282897A1 (en) * 2008-06-06 2011-11-17 Agency For Science, Technology And Research Method and system for maintaining a database of reference images
US20130236099A1 (en) * 2012-03-08 2013-09-12 Electronics And Telecommunications Research Institute Apparatus and method for extracting foreground layer in image sequence
US20130254825A1 (en) * 2012-03-22 2013-09-26 Nokia Siemens Networks Oy Enhanced policy control framework for object-based media transmission in evolved packet systems
US20130287259A1 (en) * 2011-11-17 2013-10-31 Yasunori Ishii Image processing device, image capturing device, and image processing method
US8712989B2 (en) 2010-12-03 2014-04-29 Microsoft Corporation Wild card auto completion
GB2529813A (en) * 2014-08-28 2016-03-09 Canon Kk Scale estimation for object segmentation in a medical image
US9336302B1 (en) 2012-07-20 2016-05-10 Zuci Realty Llc Insight and algorithmic clustering for automated synthesis
EP2374107A4 (en) * 2008-12-11 2016-11-16 Imax Corp Devices and methods for processing images using scale space
US9921665B2 (en) 2012-06-25 2018-03-20 Microsoft Technology Licensing, Llc Input method editor application platform
US10032077B1 (en) * 2015-10-29 2018-07-24 National Technology & Engineering Solutions Of Sandia, Llc Vehicle track identification in synthetic aperture radar images
CN108345890A (en) * 2018-03-01 2018-07-31 腾讯科技(深圳)有限公司 Image processing method, device and relevant device
WO2019033509A1 (en) * 2017-08-16 2019-02-21 歌尔科技有限公司 Interior and exterior identification method for image contour and device
DE102017124600A1 (en) * 2017-10-20 2019-04-25 Connaught Electronics Ltd. Semantic segmentation of an object in an image
US10313686B2 (en) * 2016-09-20 2019-06-04 Gopro, Inc. Apparatus and methods for compressing video content using adaptive projection selection
DE102018128184A1 (en) * 2018-11-12 2020-05-14 Bayerische Motoren Werke Aktiengesellschaft Method, device, computer program and computer program product for generating a labeled image
CN111723769A (en) * 2020-06-30 2020-09-29 北京百度网讯科技有限公司 Method, apparatus, device and storage medium for processing image
US10878280B2 (en) * 2019-05-23 2020-12-29 Webkontrol, Inc. Video content indexing and searching
US20200410710A1 (en) * 2018-11-06 2020-12-31 Wuyi University Method for measuring antenna downtilt based on linear regression fitting
US10902249B2 (en) * 2016-10-31 2021-01-26 Hewlett-Packard Development Company, L.P. Video monitoring
US11120280B2 (en) * 2019-11-15 2021-09-14 Argo AI, LLC Geometry-aware instance segmentation in stereo image capture processes
CN113420612A (en) * 2021-06-02 2021-09-21 深圳中集智能科技有限公司 Production beat calculation method based on machine vision
US11157744B2 (en) * 2020-01-15 2021-10-26 International Business Machines Corporation Automated detection and approximation of objects in video
US11205103B2 (en) 2016-12-09 2021-12-21 The Research Foundation for the State University Semisupervised autoencoder for sentiment analysis
US20220043853A1 (en) * 2010-05-06 2022-02-10 Soon Teck Frederick Noel Liau System And Method For Directing Content To Users Of A Social Networking Engine
US11301099B1 (en) 2019-09-27 2022-04-12 Apple Inc. Methods and apparatus for finger detection and separation on a touch sensor panel using machine learning models
WO2022073409A1 (en) * 2020-10-10 2022-04-14 腾讯科技(深圳)有限公司 Video processing method and apparatus, computer device, and storage medium
CN114503946A (en) * 2022-01-24 2022-05-17 海南大学 Fishing ground feeding system and method based on high frame rate dynamic frame difference accurate identification
TWI774620B (en) * 2021-12-01 2022-08-11 晶睿通訊股份有限公司 Object classifying and tracking method and surveillance camera

Families Citing this family (76)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100327103B1 (en) * 1998-06-03 2002-09-17 한국전자통신연구원 Method for objects sehmentation in video sequences by object tracking and assistance
JP4324327B2 (en) * 1998-09-10 2009-09-02 株式会社エッチャンデス Visual equipment
US6766037B1 (en) * 1998-10-02 2004-07-20 Canon Kabushiki Kaisha Segmenting moving objects and determining their motion
US6748421B1 (en) * 1998-12-23 2004-06-08 Canon Kabushiki Kaisha Method and system for conveying video messages
US6643387B1 (en) * 1999-01-28 2003-11-04 Sarnoff Corporation Apparatus and method for context-based indexing and retrieval of image sequences
US6647131B1 (en) 1999-08-27 2003-11-11 Intel Corporation Motion detection using normal optical flow
US6658136B1 (en) * 1999-12-06 2003-12-02 Microsoft Corporation System and process for locating and tracking a person or object in a scene using a series of range images
US6654483B1 (en) * 1999-12-22 2003-11-25 Intel Corporation Motion detection using normal optical flow
US6674925B1 (en) * 2000-02-08 2004-01-06 University Of Washington Morphological postprocessing for object tracking and segmentation
US7367042B1 (en) * 2000-02-29 2008-04-29 Goldpocket Interactive, Inc. Method and apparatus for hyperlinking in a television broadcast
US7343617B1 (en) 2000-02-29 2008-03-11 Goldpocket Interactive, Inc. Method and apparatus for interaction with hyperlinks in a television broadcast
JP4612760B2 (en) * 2000-04-25 2011-01-12 Canon Inc. Image processing apparatus and method
AUPQ849200A0 (en) * 2000-06-30 2000-07-27 Cea Technologies Inc. Unsupervised scene segmentation
EP1317857A1 (en) * 2000-08-30 2003-06-11 Watchpoint Media Inc. A method and apparatus for hyperlinking in a television broadcast
FR2814312B1 (en) * 2000-09-07 2003-01-24 France Telecom METHOD FOR SEGMENTATION OF A VIDEO IMAGE SURFACE BY ELEMENTARY OBJECTS
US7219364B2 (en) * 2000-11-22 2007-05-15 International Business Machines Corporation System and method for selectable semantic codec pairs for very low data-rate video transmission
IL156250A0 (en) * 2000-12-05 2004-01-04 Yeda Res & Dev Apparatus and method for alignment of spatial or temporal non-overlapping image sequences
US7003061B2 (en) * 2000-12-21 2006-02-21 Adobe Systems Incorporated Image extraction from complex scenes in digital video
ITRM20010045A1 (en) * 2001-01-29 2002-07-29 Consiglio Nazionale Ricerche SYSTEM AND METHOD FOR DETECTING THE RELATIVE POSITION OF AN OBJECT WITH RESPECT TO A REFERENCE POINT.
US20020171742A1 (en) * 2001-03-30 2002-11-21 Wataru Ito Method and apparatus for controlling a view field of an image picking-up apparatus and computer program therefor
US20050129274A1 (en) * 2001-05-30 2005-06-16 Farmer Michael E. Motion-based segmentor detecting vehicle occupants using optical flow method to remove effects of illumination
US6870945B2 (en) * 2001-06-04 2005-03-22 University Of Washington Video object tracking by estimating and subtracting background
TW530498B (en) * 2001-08-14 2003-05-01 Nat Univ Chung Cheng Object segmentation method using MPEG-7
US7068809B2 (en) * 2001-08-27 2006-06-27 Digimarc Corporation Segmentation in digital watermarking
JP3735552B2 (en) * 2001-09-28 2006-01-18 Toshiba Corporation Processing method of spatio-temporal region information
US6847682B2 (en) * 2002-02-01 2005-01-25 Hughes Electronics Corporation Method, system, device and computer program product for MPEG variable bit rate (VBR) video traffic classification using a nearest neighbor classifier
TW554629B (en) * 2002-03-22 2003-09-21 Ind Tech Res Inst Layered object segmentation method based on motion picture compression standard
US20030185227A1 (en) * 2002-03-29 2003-10-02 International Business Machines Corporation Secondary queue for sequential processing of related queue elements
US7119837B2 (en) * 2002-06-28 2006-10-10 Microsoft Corporation Video processing system and method for automatic enhancement of digital video
US7085420B2 (en) * 2002-06-28 2006-08-01 Microsoft Corporation Text detection in continuous tone image segments
JP2005537608A (en) * 2002-09-02 2005-12-08 Samsung Electronics Co., Ltd. Optical information storage medium, method and apparatus for recording and/or reproducing information on and/or from the optical information storage medium
US7421129B2 (en) * 2002-09-04 2008-09-02 Microsoft Corporation Image compression and synthesis for video effects
US7221775B2 (en) * 2002-11-12 2007-05-22 Intellivid Corporation Method and apparatus for computerized image background analysis
ATE454789T1 (en) * 2002-11-12 2010-01-15 Intellivid Corp METHOD AND SYSTEM FOR TRACKING AND MONITORING BEHAVIOR OF MULTIPLE OBJECTS MOVING THROUGH MULTIPLE FIELDS OF VIEW
US7408986B2 (en) * 2003-06-13 2008-08-05 Microsoft Corporation Increasing motion smoothness using frame interpolation with motion analysis
US7558320B2 (en) * 2003-06-13 2009-07-07 Microsoft Corporation Quality control in frame interpolation with motion analysis
US7061401B2 (en) * 2003-08-07 2006-06-13 Bodenseewerk Gerätetechnik GmbH Method and apparatus for detecting a flight obstacle
US7286157B2 (en) * 2003-09-11 2007-10-23 Intellivid Corporation Computerized method and apparatus for determining field-of-view relationships among multiple image sensors
US7346187B2 (en) * 2003-10-10 2008-03-18 Intellivid Corporation Method of counting objects in a monitored environment and apparatus for the same
US7280673B2 (en) * 2003-10-10 2007-10-09 Intellivid Corporation System and method for searching for changes in surveillance video
US7447331B2 (en) * 2004-02-24 2008-11-04 International Business Machines Corporation System and method for generating a viewable video index for low bandwidth applications
US20060053374A1 (en) * 2004-09-07 2006-03-09 Adobe Systems Incorporated Localization of activity with respect to digital data
US7262713B1 (en) * 2004-09-30 2007-08-28 Rockwell Collins, Inc. System and method for a safe depiction of terrain, airport and other dimensional data on a perspective flight display with limited bandwidth of data presentation
US20060126737A1 (en) * 2004-12-15 2006-06-15 International Business Machines Corporation Method, system and program product for a camera to track an object using motion vector data
US20060126738A1 (en) * 2004-12-15 2006-06-15 International Business Machines Corporation Method, system and program product for a plurality of cameras to track an object using motion vector data
US7710452B1 (en) 2005-03-16 2010-05-04 Eric Lindberg Remote video monitoring of non-urban outdoor sites
ATE500580T1 (en) 2005-03-25 2011-03-15 Sensormatic Electronics Llc INTELLIGENT CAMERA SELECTION AND OBJECT TRACKING
US9036028B2 (en) 2005-09-02 2015-05-19 Sensormatic Electronics, LLC Object tracking and alerts
US7720283B2 (en) * 2005-12-09 2010-05-18 Microsoft Corporation Background removal in a live video
US8026931B2 (en) * 2006-03-16 2011-09-27 Microsoft Corporation Digital video effects
US7825792B2 (en) * 2006-06-02 2010-11-02 Sensormatic Electronics Llc Systems and methods for distributed monitoring of remote sites
US7671728B2 (en) 2006-06-02 2010-03-02 Sensormatic Electronics, LLC Systems and methods for distributed monitoring of remote sites
US8004536B2 (en) * 2006-12-01 2011-08-23 Adobe Systems Incorporated Coherent image selection and modification
US8175409B1 (en) 2006-12-01 2012-05-08 Adobe Systems Incorporated Coherent image selection and modification
JP5121258B2 (en) * 2007-03-06 2013-01-16 Toshiba Corporation Suspicious behavior detection system and method
US8027513B2 (en) * 2007-03-23 2011-09-27 Technion Research And Development Foundation Ltd. Bitmap tracker for visual tracking under very general conditions
CN101682750A (en) * 2007-06-09 2010-03-24 Sensormatic Electronics Corporation System and method for integrating video analytics and data analytics/mining
US8233676B2 (en) * 2008-03-07 2012-07-31 The Chinese University Of Hong Kong Real-time body segmentation system
US9141862B2 (en) * 2008-09-26 2015-09-22 Harris Corporation Unattended surveillance device and associated methods
US9355469B2 (en) 2009-01-09 2016-05-31 Adobe Systems Incorporated Mode-based graphical editing
US8934545B2 (en) * 2009-02-13 2015-01-13 Yahoo! Inc. Extraction of video fingerprints and identification of multimedia using video fingerprinting
US20100247062A1 (en) * 2009-03-27 2010-09-30 Bailey Scott J Interactive media player system
US8988525B2 (en) * 2009-08-27 2015-03-24 Robert Bosch Gmbh System and method for providing guidance information to a driver of a vehicle
WO2011028837A2 (en) * 2009-09-01 2011-03-10 Prime Focus Vfx Services Ii Inc. System and process for transforming two-dimensional images into three-dimensional images
US8824823B1 (en) * 2011-01-20 2014-09-02 Verint Americas Inc. Increased quality of image objects based on depth in scene
US9268996B1 (en) 2011-01-20 2016-02-23 Verint Systems Inc. Evaluation of models generated from objects in video
JP4855556B1 (en) * 2011-03-22 2012-01-18 Morpho, Inc. Moving object detection apparatus, moving object detection method, moving object detection program, moving object tracking apparatus, moving object tracking method, and moving object tracking program
DE102012020778B4 (en) * 2012-10-23 2018-01-18 Audi Ag Method for tagging a sequence of images captured in temporal succession, with integrated quality control
US9129191B2 (en) 2013-12-16 2015-09-08 Adobe Systems Incorporated Semantic object selection
US9129192B2 (en) * 2013-12-16 2015-09-08 Adobe Systems Incorporated Semantic object proposal generation and validation
US9928592B2 (en) 2016-03-14 2018-03-27 Sensors Unlimited, Inc. Image-based signal detection for object metrology
US10007971B2 (en) 2016-03-14 2018-06-26 Sensors Unlimited, Inc. Systems and methods for user machine interaction for image-based metrology
US10970553B2 (en) * 2017-11-15 2021-04-06 Uatc, Llc Semantic segmentation of three-dimensional data
EP3651056A1 (en) 2018-11-06 2020-05-13 Rovco Limited Computing device and method for video object detection
US10885386B1 (en) 2019-09-16 2021-01-05 The Boeing Company Systems and methods for automatically generating training image sets for an object
US11113570B2 (en) 2019-09-16 2021-09-07 The Boeing Company Systems and methods for automatically generating training image sets for an environment

Family Cites Families (85)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3873972A (en) 1971-11-01 1975-03-25 Theodore H Levine Analytic character recognition system
GB8311813D0 (en) 1983-04-29 1983-06-02 West G A W Coding and storing raster scan images
DE3515159A1 (en) 1984-04-27 1985-10-31 Canon K.K., Tokyo IMAGE PROCESSING DEVICE
US4751742A (en) 1985-05-07 1988-06-14 Avelex Priority coding of transform coefficients
US4754492A (en) 1985-06-03 1988-06-28 Picturetel Corporation Method and system for adapting a digitized signal processing system for block processing with minimal blocking artifacts
JPH0766446B2 (en) 1985-11-27 1995-07-19 Hitachi, Ltd. Method of extracting moving object image
JP2540809B2 (en) 1986-07-30 1996-10-09 Sony Corporation High efficiency encoder
US4745633A (en) 1986-08-18 1988-05-17 Peter Waksman Optical image encoding and comparing using scan autocorrelation
US4905295A (en) 1986-11-13 1990-02-27 Ricoh Company, Ltd. Code sequence matching method and apparatus
US4961231A (en) 1987-01-20 1990-10-02 Ricoh Company, Ltd. Pattern recognition method
US5070465A (en) 1987-02-25 1991-12-03 Sony Corporation Video image transforming method and apparatus
US5136659A (en) 1987-06-30 1992-08-04 Kokusai Denshin Denwa Kabushiki Kaisha Intelligent coding system for picture signal
US5031225A (en) 1987-12-09 1991-07-09 Ricoh Company, Ltd. Character recognition method for recognizing character in an arbitrary rotation position
US4912549A (en) 1988-09-07 1990-03-27 Rca Licensing Corporation Video signal synchronization system as for an extended definition widescreen television signal processing system
US5034986A (en) 1989-03-01 1991-07-23 Siemens Aktiengesellschaft Method for detecting and tracking moving objects in a digital image sequence having a stationary background
GB8909498D0 (en) 1989-04-26 1989-06-14 British Telecomm Motion estimator
US5073955A (en) 1989-06-16 1991-12-17 Siemens Aktiengesellschaft Method for recognizing previously localized characters present in digital gray tone images, particularly for recognizing characters struck into metal surfaces
JP2953712B2 (en) 1989-09-27 1999-09-27 Toshiba Corporation Moving object detection device
GB9001468D0 (en) 1990-01-23 1990-03-21 Sarnoff David Res Center Computing multiple motions within an image region
JP2569219B2 (en) 1990-01-31 1997-01-08 Fujitsu Limited Video prediction method
US5148497A (en) 1990-02-14 1992-09-15 Massachusetts Institute Of Technology Fractal-based image compression and interpolation
JPH082107B2 (en) 1990-03-02 1996-01-10 Kokusai Denshin Denwa Co., Ltd. Method and apparatus for moving picture hybrid coding
US5103306A (en) 1990-03-28 1992-04-07 Transitions Research Corporation Digital image compression employing a resolution gradient
US4999705A (en) 1990-05-03 1991-03-12 AT&T Bell Laboratories Three dimensional motion compensated video coding
US5155594A (en) 1990-05-11 1992-10-13 Picturetel Corporation Hierarchical encoding method and apparatus employing background references for efficiently communicating image sequences
US5086477A (en) 1990-08-07 1992-02-04 Northwest Technology Corp. Automated system for extracting design and layout information from an integrated circuit
US5020121A (en) 1990-08-16 1991-05-28 Hewlett-Packard Company Neighborhood block prediction bit compression
JP3037383B2 (en) 1990-09-03 2000-04-24 Canon Inc. Image processing system and method
GB9019538D0 (en) 1990-09-07 1990-10-24 Philips Electronic Associated Tracking a moving object
EP0497586A3 (en) 1991-01-31 1994-05-18 Sony Corp Motion detection circuit
JPH04334188A (en) 1991-05-08 1992-11-20 Nec Corp Coding system for moving picture signal
JP2866222B2 (en) 1991-06-12 1999-03-08 Mitsubishi Electric Corporation Motion compensation prediction method
KR930001678A (en) 1991-06-13 1993-01-16 Jin-ku Kang Noise Detection Algorithm in Video Signal
JP2873338B2 (en) 1991-09-17 1999-03-24 Fujitsu Limited Moving object recognition device
JP2856229B2 (en) 1991-09-18 1999-02-10 New Media Development Association Image clipping point detection method
US5259040A (en) 1991-10-04 1993-11-02 David Sarnoff Research Center, Inc. Method for determining sensor motion and scene structure and image processing system therefor
JP2790562B2 (en) 1992-01-06 1998-08-27 Fuji Photo Film Co., Ltd. Image processing method
JP3068304B2 (en) 1992-01-21 2000-07-24 NEC Corporation Video coding and decoding systems
DE69322423T2 (en) 1992-03-13 1999-06-02 Canon Kk Device for the detection of motion vectors
CA2132515C (en) * 1992-03-20 2006-01-31 Glen William Auty An object monitoring system
US5706417A (en) 1992-05-27 1998-01-06 Massachusetts Institute Of Technology Layered representation for image coding
GB9215102D0 (en) 1992-07-16 1992-08-26 Philips Electronics Uk Ltd Tracking moving objects
EP0584559A3 (en) * 1992-08-21 1994-06-22 United Parcel Service Inc Method and apparatus for finding areas of interest in images
TW250555B (en) 1992-09-30 1995-07-01 Hudson Kk
JPH06113287A (en) 1992-09-30 1994-04-22 Matsushita Electric Ind Co Ltd Picture coder and picture decoder
US5424783A (en) 1993-02-10 1995-06-13 Wong; Yiu-Fai Clustering filter method for noise filtering, scale-space filtering and image processing
US5592228A (en) 1993-03-04 1997-01-07 Kabushiki Kaisha Toshiba Video encoder using global motion estimation and polygonal patch motion estimation
JP3679426B2 (en) 1993-03-15 2005-08-03 Massachusetts Institute of Technology A system that encodes image data into multiple layers, each representing a coherent region of motion, and motion parameters associated with the layers.
DE69434131T2 (en) * 1993-05-05 2005-11-03 Koninklijke Philips Electronics N.V. Device for the segmentation of textured images
US5329311A (en) 1993-05-11 1994-07-12 The University Of British Columbia System for determining noise content of a video signal in the disclosure
EP0625853B1 (en) 1993-05-21 1999-03-03 Nippon Telegraph And Telephone Corporation Moving image encoder and decoder
DE69329332T2 (en) 1993-05-26 2001-02-22 St Microelectronics Srl TV picture decoding architecture for executing a 40 ms process algorithm in HDTV
US5517327A (en) 1993-06-30 1996-05-14 Minolta Camera Kabushiki Kaisha Data processor for image data using orthogonal transformation
US5477272A (en) 1993-07-22 1995-12-19 Gte Laboratories Incorporated Variable-block size multi-resolution motion estimation scheme for pyramid coding
JP2576771B2 (en) 1993-09-28 1997-01-29 NEC Corporation Motion compensation prediction device
JPH07299053A (en) * 1994-04-29 1995-11-14 Arch Dev Corp Computer diagnosis support method
US5594504A (en) 1994-07-06 1997-01-14 Lucent Technologies Inc. Predictive video coding using a motion vector updating routine
JP2870415B2 (en) 1994-08-22 1999-03-17 NEC Corporation Area division method and apparatus
KR100287211B1 (en) 1994-08-30 2001-04-16 Jong-yong Yun Bidirectional motion estimation method and system
US5574572A (en) 1994-09-07 1996-11-12 Harris Corporation Video scaling method and device
US5978497A (en) * 1994-09-20 1999-11-02 Neopath, Inc. Apparatus for the identification of free-lying cells
DE69525127T2 (en) 1994-10-28 2002-10-02 Oki Electric Ind Co Ltd Device and method for encoding and decoding images using edge synthesis and wavelet inverse transformation
KR0171146B1 (en) 1995-03-18 1999-03-20 Soon-hoon Bae Feature point based motion vectors detecting apparatus
KR0171147B1 (en) 1995-03-20 1999-03-20 Soon-hoon Bae Apparatus for selecting feature point by means of gradient
KR0171143B1 (en) 1995-03-20 1999-03-20 Soon-hoon Bae Apparatus for composing triangle in the hexagonal grid
KR0171118B1 (en) 1995-03-20 1999-03-20 Soon-hoon Bae Apparatus for encoding video signal
JP3612360B2 (en) 1995-04-10 2005-01-19 Daewoo Electronics Co., Ltd. Motion estimation method using moving object segmentation method
US5621660A (en) 1995-04-18 1997-04-15 Sun Microsystems, Inc. Software-based encoder for a software-implemented end-to-end scalable video delivery system
KR0171154B1 (en) 1995-04-29 1999-03-20 Soon-hoon Bae Method and apparatus for encoding video signals using feature point based motion prediction
KR0181063B1 (en) 1995-04-29 1999-05-01 Soon-hoon Bae Method and apparatus for forming grid in motion compensation technique using feature point
US5717463A (en) 1995-07-24 1998-02-10 Motorola, Inc. Method and system for estimating motion within a video sequence
US5668608A (en) 1995-07-26 1997-09-16 Daewoo Electronics Co., Ltd. Motion vector estimation method and apparatus for use in an image signal encoding system
DE69615812T2 (en) 1995-08-02 2002-06-20 Koninkl Philips Electronics Nv METHOD AND SYSTEM FOR CODING A PICTURE SEQUENCE
KR0178229B1 (en) 1995-08-08 1999-05-01 Soon-hoon Bae Image processing system using a pixel-by-pixel motion estimation based on feature points
KR100304660B1 (en) * 1995-09-22 2001-11-22 Jong-yong Yun Method for encoding video signals by accumulative error processing and encoder
US5959673A (en) 1995-10-05 1999-09-28 Microsoft Corporation Transform coding of dense motion vector fields for frame and object based video coding applications
US5802220A (en) 1995-12-15 1998-09-01 Xerox Corporation Apparatus and method for tracking facial motion through a sequence of images
US5692063A (en) 1996-01-19 1997-11-25 Microsoft Corporation Method and system for unrestricted motion estimation for video
US5799113A (en) * 1996-01-19 1998-08-25 Microsoft Corporation Method for expanding contracted video images
US5778098A (en) 1996-03-22 1998-07-07 Microsoft Corporation Sprite coding
US6037988A (en) 1996-03-22 2000-03-14 Microsoft Corp Method for generating sprites for object-based coding systems using masks and rounding average
US5764814A (en) 1996-03-22 1998-06-09 Microsoft Corporation Representation and encoding of general arbitrary shapes
EP0831422B1 (en) * 1996-09-20 2007-11-14 Hitachi, Ltd. Method of displaying moving object for enabling identification of its moving route, display system using the same, and program recording medium therefor
US5748789A (en) 1996-10-31 1998-05-05 Microsoft Corporation Transparent block skipping in object-based video coding systems
US5946043A (en) 1997-12-31 1999-08-31 Microsoft Corporation Video coding using adaptive coding of block parameters for coded/uncoded blocks

Cited By (92)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7088845B2 (en) * 1998-09-10 2006-08-08 Microsoft Corporation Region extraction in vector images
US20040189863A1 (en) * 1998-09-10 2004-09-30 Microsoft Corporation Tracking semantic objects in vector image sequences
US6665423B1 (en) * 2000-01-27 2003-12-16 Eastman Kodak Company Method and system for object-oriented motion-based video description
US7304649B2 (en) 2000-05-11 2007-12-04 Kabushiki Kaisha Toshiba Object region data describing method and object region data creating apparatus
US20010040924A1 (en) * 2000-05-11 2001-11-15 Osamu Hori Object region data describing method and object region data creating apparatus
US7091988B2 (en) * 2000-05-11 2006-08-15 Kabushiki Kaisha Toshiba Object region data describing method and object region data creating apparatus
US20050280657A1 (en) * 2000-05-11 2005-12-22 Osamu Hori Object region data describing method and object region data creating apparatus
US6731799B1 (en) * 2000-06-01 2004-05-04 University Of Washington Object segmentation with background extraction and moving boundary techniques
US6985612B2 (en) * 2001-10-05 2006-01-10 Mevis - Centrum Fur Medizinische Diagnosesysteme Und Visualisierung Gmbh Computer system and a method for segmentation of a digital image
US20030068074A1 (en) * 2001-10-05 2003-04-10 Horst Hahn Computer system and a method for segmentation of a digital image
US20030146915A1 (en) * 2001-10-12 2003-08-07 Brook John Charles Interactive animation of sprites in a video production
US7432940B2 (en) * 2001-10-12 2008-10-07 Canon Kabushiki Kaisha Interactive animation of sprites in a video production
US7072512B2 (en) 2002-07-23 2006-07-04 Microsoft Corporation Segmentation of digital video and images into continuous tone and palettized regions
US20040017939A1 (en) * 2002-07-23 2004-01-29 Microsoft Corporation Segmentation of digital video and images into continuous tone and palettized regions
US7668370B2 (en) * 2002-09-19 2010-02-23 Koninklijke Philips Electronics N.V. Segmenting a series of 2D or 3D images
US20050271271A1 (en) * 2002-09-19 2005-12-08 Koninklijke Philips Electronics N.V. Segmenting a series of 2d or 3d images
KR101021409B1 (en) 2002-11-11 2011-03-14 Sony Electronics Inc. Method and apparatus for nonlinear multiple motion model and moving boundary extraction
US20050041102A1 (en) * 2003-08-22 2005-02-24 Bongiovanni Kevin Paul Automatic target detection and motion analysis from image data
US7239719B2 (en) * 2003-08-22 2007-07-03 Bbn Technologies Corp. Automatic target detection and motion analysis from image data
US20050175219A1 (en) * 2003-11-13 2005-08-11 Ming-Hsuan Yang Adaptive probabilistic visual tracking with incremental subspace update
US7463754B2 (en) 2003-11-13 2008-12-09 Honda Motor Co. Adaptive probabilistic visual tracking with incremental subspace update
US20050216274A1 (en) * 2004-02-18 2005-09-29 Samsung Electronics Co., Ltd. Object tracking method and apparatus using stereo images
US20070058647A1 (en) * 2004-06-30 2007-03-15 Bettis Sonny R Video based interfaces for video message systems and services
US7826831B2 (en) * 2004-06-30 2010-11-02 Bettis Sonny R Video based interfaces for video message systems and services
US7369682B2 (en) 2004-07-09 2008-05-06 Honda Motor Co., Ltd. Adaptive discriminative generative model and application to visual tracking
US20060023916A1 (en) * 2004-07-09 2006-02-02 Ming-Hsuan Yang Visual tracking using incremental fisher discriminant analysis
US7650011B2 (en) 2004-07-09 2010-01-19 Honda Motor Co., Inc. Visual tracking using incremental fisher discriminant analysis
US20060036399A1 (en) * 2004-07-09 2006-02-16 Ming-Hsuan Yang Adaptive discriminative generative model and application to visual tracking
EP1792299A4 (en) * 2004-09-07 2012-10-24 Adobe Systems Inc A method and system to perform localized activity with respect to digital data
EP1792299A2 (en) * 2004-09-07 2007-06-06 Adobe Systems Incorporated A method and system to perform localized activity with respect to digital data
US7623731B2 (en) 2005-06-20 2009-11-24 Honda Motor Co., Ltd. Direct method for modeling non-rigid motion with thin plate spline transformation
US20060285770A1 (en) * 2005-06-20 2006-12-21 Jongwoo Lim Direct method for modeling non-rigid motion with thin plate spline transformation
US20080303913A1 (en) * 2005-08-26 2008-12-11 Koninklijke Philips Electronics, N.V. Imaging Camera Processing Unit and Method
US8237809B2 (en) * 2005-08-26 2012-08-07 Koninklijke Philips Electronics N.V. Imaging camera processing unit and method
US20090174595A1 (en) * 2005-09-22 2009-07-09 Nader Khatib SAR ATR treeline extended operating condition
US7787657B2 (en) * 2005-09-22 2010-08-31 Raytheon Company SAR ATR treeline extended operating condition
US7929762B2 (en) * 2007-03-12 2011-04-19 Jeffrey Kimball Tidd Determining edgeless areas in a digital image
US20080226161A1 (en) * 2007-03-12 2008-09-18 Jeffrey Kimball Tidd Determining Edgeless Areas in a Digital Image
US9367167B2 (en) * 2007-06-13 2016-06-14 Apple Inc. Bottom-up watershed dataflow method and region-specific segmentation based on historic data to identify patches on a touch sensor panel
US10175805B2 (en) * 2007-06-13 2019-01-08 Apple Inc. Bottom-up watershed dataflow method and region-specific segmentation based on historic data to identify patches on a touch sensor panel
US7916126B2 (en) * 2007-06-13 2011-03-29 Apple Inc. Bottom-up watershed dataflow method and region-specific segmentation based on historic data to identify patches on a touch sensor panel
US20080309629A1 (en) * 2007-06-13 2008-12-18 Apple Inc. Bottom up watershed dataflow method and region-specific segmentation based on historic data
US20110169763A1 (en) * 2007-06-13 2011-07-14 Wayne Carl Westerman Bottom-up watershed dataflow method and region-specific segmentation based on historic data to identify patches on a touch sensor panel
US20110175837A1 (en) * 2007-06-13 2011-07-21 Wayne Carl Westerman Bottom-up watershed dataflow method and region-specific segmentation based on historic data to identify patches on a touch sensor panel
US20090259679A1 (en) * 2008-04-14 2009-10-15 Microsoft Corporation Parsimonious multi-resolution value-item lists
US8015129B2 (en) * 2008-04-14 2011-09-06 Microsoft Corporation Parsimonious multi-resolution value-item lists
US20110282897A1 (en) * 2008-06-06 2011-11-17 Agency For Science, Technology And Research Method and system for maintaining a database of reference images
WO2010007423A3 (en) * 2008-07-14 2010-07-15 Musion Ip Limited Video processing and telepresence system and method
EA018293B1 (ru) * 2008-07-14 2013-06-28 Musion IP Limited Video and telepresence processing method
US20110235702A1 (en) * 2008-07-14 2011-09-29 Ian Christopher O'connell Video processing and telepresence system and method
WO2010007423A2 (en) * 2008-07-14 2010-01-21 Musion Ip Limited Video processing and telepresence system and method
US20110194742A1 (en) * 2008-10-14 2011-08-11 Koninklijke Philips Electronics N.V. One-click correction of tumor segmentation results
EP2374107A4 (en) * 2008-12-11 2016-11-16 Imax Corp Devices and methods for processing images using scale space
US8970707B2 (en) * 2008-12-17 2015-03-03 Sony Computer Entertainment Inc. Compensating for blooming of a shape in an image
US20100149340A1 (en) * 2008-12-17 2010-06-17 Richard Lee Marks Compensating for blooming of a shape in an image
US20220043853A1 (en) * 2010-05-06 2022-02-10 Soon Teck Frederick Noel Liau System And Method For Directing Content To Users Of A Social Networking Engine
US8712989B2 (en) 2010-12-03 2014-04-29 Microsoft Corporation Wild card auto completion
US9171222B2 (en) * 2011-11-17 2015-10-27 Panasonic Intellectual Property Corporation Of America Image processing device, image capturing device, and image processing method for tracking a subject in images
US20130287259A1 (en) * 2011-11-17 2013-10-31 Yasunori Ishii Image processing device, image capturing device, and image processing method
US20130236099A1 (en) * 2012-03-08 2013-09-12 Electronics And Telecommunications Research Institute Apparatus and method for extracting foreground layer in image sequence
US8873855B2 (en) * 2012-03-08 2014-10-28 Electronics And Telecommunications Research Institute Apparatus and method for extracting foreground layer in image sequence
US20130254825A1 (en) * 2012-03-22 2013-09-26 Nokia Siemens Networks Oy Enhanced policy control framework for object-based media transmission in evolved packet systems
US9921665B2 (en) 2012-06-25 2018-03-20 Microsoft Technology Licensing, Llc Input method editor application platform
US10867131B2 (en) 2012-06-25 2020-12-15 Microsoft Technology Licensing Llc Input method editor application platform
US9607023B1 (en) 2012-07-20 2017-03-28 Ool Llc Insight and algorithmic clustering for automated synthesis
US9336302B1 (en) 2012-07-20 2016-05-10 Zuci Realty Llc Insight and algorithmic clustering for automated synthesis
US10318503B1 (en) 2012-07-20 2019-06-11 Ool Llc Insight and algorithmic clustering for automated synthesis
US11216428B1 (en) 2012-07-20 2022-01-04 Ool Llc Insight and algorithmic clustering for automated synthesis
GB2529813B (en) * 2014-08-28 2017-11-15 Canon Kk Scale estimation for object segmentation in a medical image
GB2529813A (en) * 2014-08-28 2016-03-09 Canon Kk Scale estimation for object segmentation in a medical image
US10032077B1 (en) * 2015-10-29 2018-07-24 National Technology & Engineering Solutions Of Sandia, Llc Vehicle track identification in synthetic aperture radar images
US10313686B2 (en) * 2016-09-20 2019-06-04 Gopro, Inc. Apparatus and methods for compressing video content using adaptive projection selection
US20190289302A1 (en) * 2016-09-20 2019-09-19 Gopro, Inc. Apparatus and methods for compressing video content using adaptive projection selection
US10757423B2 (en) * 2016-09-20 2020-08-25 Gopro, Inc. Apparatus and methods for compressing video content using adaptive projection selection
US10902249B2 (en) * 2016-10-31 2021-01-26 Hewlett-Packard Development Company, L.P. Video monitoring
US11205103B2 (en) 2016-12-09 2021-12-21 The Research Foundation for The State University of New York Semisupervised autoencoder for sentiment analysis
WO2019033509A1 (en) * 2017-08-16 2019-02-21 Goertek Technology Co., Ltd. Method and device for identifying the interior and exterior of an image contour
DE102017124600A1 (en) * 2017-10-20 2019-04-25 Connaught Electronics Ltd. Semantic segmentation of an object in an image
CN108345890A (en) * 2018-03-01 2018-07-31 Tencent Technology (Shenzhen) Co., Ltd. Image processing method, device and related device
US20200410710A1 (en) * 2018-11-06 2020-12-31 Wuyi University Method for measuring antenna downtilt based on linear regression fitting
DE102018128184A1 (en) * 2018-11-12 2020-05-14 Bayerische Motoren Werke Aktiengesellschaft Method, device, computer program and computer program product for generating a labeled image
US10878280B2 (en) * 2019-05-23 2020-12-29 Webkontrol, Inc. Video content indexing and searching
US10997459B2 (en) * 2019-05-23 2021-05-04 Webkontrol, Inc. Video content indexing and searching
US11301099B1 (en) 2019-09-27 2022-04-12 Apple Inc. Methods and apparatus for finger detection and separation on a touch sensor panel using machine learning models
US11120280B2 (en) * 2019-11-15 2021-09-14 Argo AI, LLC Geometry-aware instance segmentation in stereo image capture processes
US11669972B2 (en) 2019-11-15 2023-06-06 Argo AI, LLC Geometry-aware instance segmentation in stereo image capture processes
US11157744B2 (en) * 2020-01-15 2021-10-26 International Business Machines Corporation Automated detection and approximation of objects in video
CN111723769A (en) * 2020-06-30 2020-09-29 Beijing Baidu Netcom Science and Technology Co., Ltd. Method, apparatus, device and storage medium for processing images
WO2022073409A1 (en) * 2020-10-10 2022-04-14 Tencent Technology (Shenzhen) Co., Ltd. Video processing method and apparatus, computer device, and storage medium
CN113420612A (en) * 2021-06-02 2021-09-21 Shenzhen CIMC Intelligent Technology Co., Ltd. Production takt time calculation method based on machine vision
TWI774620B (en) * 2021-12-01 2022-08-11 Vivotek Inc. Object classifying and tracking method and surveillance camera
CN114503946A (en) * 2022-01-24 2022-05-17 Hainan University Fish farm feeding system and method based on accurate identification using high-frame-rate dynamic frame differences

Also Published As

Publication number Publication date
US6400831B2 (en) 2002-06-04

Similar Documents

Publication Publication Date Title
US6400831B2 (en) Semantic video object segmentation and tracking
Jain et al. Deformable template models: A review
Masood et al. A survey on medical image segmentation
Zhang et al. Segmentation of moving objects in image sequence: A review
Gu et al. Semiautomatic segmentation and tracking of semantic video objects
Fu et al. Tracking visible boundary of objects using occlusion adaptive motion snake
Bray et al. Posecut: Simultaneous segmentation and 3d pose estimation of humans using dynamic graph-cuts
US7430321B2 (en) System and method for volumetric tumor segmentation using joint space-intensity likelihood ratio test
CN111310659B (en) Human body action recognition method based on enhanced graph convolution neural network
US20030169812A1 (en) Method for segmenting a video image into elementary objects
JP2008518331A (en) Understanding video content through real-time video motion analysis
Gong et al. Advanced image and video processing using MATLAB
Zhao et al. Stochastic human segmentation from a static camera
Weiss et al. Perceptually organized EM: A framework for motion segmentation that combines information about form and motion
US20030095707A1 (en) Computer vision method and system for blob-based analysis using a probabilistic framework
WO2019071976A1 (en) Panoramic image saliency detection method based on region growing and an eye movement model
US20060242218A1 (en) Prior-constrained mean shift analysis
JP2005165791A (en) Object tracking method and tracking system
Gui et al. Reliable and dynamic appearance modeling and label consistency enforcing for fast and coherent video object segmentation with the bilateral grid
Litvin et al. Coupled shape distribution-based segmentation of multiple objects
JP2002525988A (en) System and method for semantic video object segmentation
Gu et al. Semantic video object segmentation and tracking using mathematical morphology and perspective motion model
Bouaynaya et al. A complete system for head tracking using motion-based particle filter and randomly perturbed active contour
Gao et al. Articulated motion modeling for activity analysis
Szeliski et al. Segmentation

Legal Events

Date Code Title Description
AS Assignment

Owner name: MICROSOFT CORPORATION, WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:LEE, MING-CHIEH;GU, CHUANG;REEL/FRAME:009083/0770

Effective date: 19980401

STCF Information on status: patent grant

Free format text: PATENTED CASE

FPAY Fee payment

Year of fee payment: 4

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

FPAY Fee payment

Year of fee payment: 8

FPAY Fee payment

Year of fee payment: 12

AS Assignment

Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MICROSOFT CORPORATION;REEL/FRAME:034541/0001

Effective date: 20141014