WO2011155902A1 - General motion-based face recognition - Google Patents

General motion-based face recognition

Info

Publication number
WO2011155902A1
Authority
WO
WIPO (PCT)
Prior art keywords
face
individual
computer program
pixel
video
Application number
PCT/SG2011/000208
Other languages
French (fr)
Inventor
Ning Ye
Terence Mong Cheng Sim
Original Assignee
National University Of Singapore
Application filed by National University Of Singapore filed Critical National University Of Singapore
Publication of WO2011155902A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 - Detection; Localisation; Normalisation
    • G06V40/167 - Detection; Localisation; Normalisation using comparisons between temporally consecutive images
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 - Facial expression recognition
    • G06V40/176 - Dynamic expression

Abstract

A local deformation profile is generated for two or more facial expressions in two or more video clips. The local deformation profile includes motion information such as the direction or magnitude of displacement of a point on the individual's face during the course of an expression. Motion and deformation data from two or more local deformation profiles in which an individual expresses different facial expressions may be compared to determine whether the same individual is in both video clips. If the individual's localized facial motion is similar in the two video clips, this suggests high confidence in the deformation computation. A small difference in the deformation patterns of the video clips, together with similar localized facial motion, suggests that the same individual is present in the two or more video clips. Thus, an individual expressing facial expressions may be identified in video clips.

Description

GENERAL MOTION-BASED FACE RECOGNITION
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. Provisional Application No.
61/353,812, filed June 11, 2010, which is incorporated by reference in its entirety.
BACKGROUND
[0002] This invention relates generally to facial recognition and more specifically to motion-based face recognition.
[0003] Facial recognition systems typically identify individuals based on their facial features, including shape and color of the eyes, nose, mouth, chin or distances between these parts. These systems typically match an image with an individual if facial features measured from the image match an individual's known features. However, such a method of recognizing an individual from a facial image has drawbacks. For example, if an individual provides different facial expressions that alter the shape or distances of facial features, the system may not be able to correctly identify the individual.
[0004] Some facial recognition systems use facial motion to recognize an individual. For example, motion recognition systems may calculate displacement vectors associated with certain facial movements such as smiles or frowns to recognize an individual. However, such motion recognition systems require a fixed, predefined motion. Therefore, an individual must perform a specific facial motion each time in order to be successfully recognized. These constraints limit the use and versatility of facial motion recognition systems.
SUMMARY
[0005] According to an embodiment of the invention, a computer-implemented method performs motion-based face recognition. In one embodiment, the method extracts motion and deformation information from one or more facial expressions provided by an individual and uses the extracted information to identify an individual in another facial motion video.
[0006] In one embodiment, the method extracts a local deformation profile from a frontal view facial motion by the following steps: 1) use a face detection and localization algorithm to find a set of key points on the neutral face of the first video frame; 2) remove any rigid head motion from the video; 3) crop the face region from the video to get a cropped face image sequence; 4) track each pixel on the neutral face (the first cropped face image) throughout the image sequence to obtain its displacement in each frame; 5) warp the displacement fields defined on the neutral face using a transformation which normalizes the face shape to a given mean face shape; and 6) from the shape-free displacement fields, construct the local deformation profile.
[0007] In one embodiment, the method constructs a local deformation profile for two or more facial expressions provided by one or more individuals. The local deformation profile includes motion information, such as the direction or magnitude of displacement of a point on the individual's face during the course of an expression. Motion and deformation data from two or more local deformation profiles in which an individual expresses different facial expressions may be compared to determine if the same individual is in both video clips. In one embodiment, the method determines if the individual's localized facial motion is similar in the two local deformation profiles.
[0008] Localized facial motions can be similar even if expressions are dissimilar. For example, facial expressions of surprise and fear may be globally different, but they are locally similar around the eyebrows. A large difference in local motion suggests that the deformation patterns in the two or more video clips are caused by very different facial expressions and thus will not provide reliable results. On the other hand, a low difference in local motion suggests that the deformation patterns are similar enough to compare with one another. In such an instance, the method may compare the deformation of at least one point on the individual's face in each local deformation profile, wherein a sufficiently small difference in deformation pattern suggests that the same individual is present in the two or more video clips. In this way, the identity of an individual expressing different facial expressions may be identified in video clips.
[0009] The foregoing description of the embodiments of the invention has been presented for the purpose of illustration; it is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Persons skilled in the relevant art can appreciate that many modifications and variations are possible in light of the above disclosure.
BRIEF DESCRIPTION OF THE DRAWINGS
[0010] FIG. 1 is a high-level block diagram of a system environment for identifying an individual based on his or her facial movement video clip, in accordance with an embodiment of the invention.
[0011] FIG. 2 is a block diagram of a computing device in accordance with an embodiment of the invention.
[0012] FIG. 3 is a block diagram of a local deformation profile generation module in accordance with an embodiment of the invention.
[0013] FIG. 4 is a block diagram of a local deformation profile comparison module in accordance with an embodiment of the invention.
[0014] FIG. 5 is a flow chart of a process for generating a local deformation profile in accordance with an embodiment of the invention.
[0015] FIG. 6A illustrates an example of a neutral face image in accordance with an embodiment of the invention.
[0016] FIG. 6B illustrates areas of interest in a facial image in accordance with an embodiment of the invention.
[0017] FIG. 6C illustrates a deformation profile of a facial expression in accordance with an embodiment of the invention.
[0018] FIG. 6D illustrates an overlap in two deformation patterns in accordance with an embodiment of the invention.
[0019] FIG. 7 is a flow chart of a process for matching local deformation profiles in accordance with an embodiment of the invention.
[0020] The figures depict various embodiments of the present invention for purposes of illustration only. One skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles of the invention described herein.
DETAILED DESCRIPTION
Overview
[0021] Embodiments of the invention may be implemented using various architectures, such as the example architecture illustrated in FIG. 1. In this embodiment, a facial expression video 104 is compared to a training video 102. A local deformation profile (LDP) generation module 106 generates an LDP for each video and the LDP comparison module 108 compares the LDP profiles of each video to one another to identify if a same individual is in both videos. The results of the comparison are provided in the identification result 110.
[0022] A training video 102, also referred to as an enrolment video, includes two or more frames showing an individual's facial expression. A facial expression may include, but is not limited to, displays of surprise, happiness, fear, anger and disgust. The display of each expression may vary per individual. For example, a display of happiness may include a smile for an individual. In one embodiment, the training video 102 includes a neutral face image for an individual. The neutral face image is one in which the individual is not expressing an emotion. It may be used as a baseline for comparison against other expressions. The training video 102 is provided to the LDP generation module 106 to extract an LDP profile from the video.
[0023] Similarly, a facial expression video 104 includes two or more frames showing an individual's facial expression. As noted, the facial expression may include displays of surprise, anger, happiness, or any other facial expression. Additionally, the facial expression video 104 also includes at least one frame displaying a neutral expression. The facial expression video 104 is provided to the LDP generation module 106 to extract an LDP profile from the video.
[0024] The LDP generation module 106 extracts an LDP profile for a facial expression from a video, including a training video 102 and a facial expression video 104. An LDP includes a set of deformation-displacement pairs that represent the motion and deformation of a face at a particular location through an expression. For example, the LDP may be represented by a two-dimensional displacement field that changes the neutral face to a specific deformed face, together with a deformation tensor. The LDP is discussed in greater detail below in reference to FIG. 3. In one embodiment, the LDP generation module 106 generates an LDP for each of the training video 102 and the facial expression video 104.
[0025] The LDP comparison module 108 compares the LDP for the training video 102 with the LDP for the facial expression video 104. The LDP comparison module 108 measures the difference in displacement vectors associated with facial motion between the videos and the difference in deformation patterns associated with an expression in each video clip. The differences in motion and deformation may be provided in the identification result 110, wherein a large difference in displacement vectors suggests that the facial motions are dissimilar and a comparison of deformation would not provide reliable results. Additionally, a large difference in deformation patterns suggests a difference in identity of the individuals in the training video 102 and the facial expression video 104. On the other hand, a small difference (or a high verification score) suggests that the same individual is present in both the training video 102 and the facial expression video 104. Thus, the individual in the facial expression video 104 is given the identity of the individual in the training video 102. The verification score is discussed in greater detail below.
System Environment
[0026] FIG. 2 illustrates an embodiment of a computing device 200 used to implement the LDP generation module 106 and the LDP comparison module 108. In the embodiment shown in FIG. 2, the computing device 200 comprises a processor 210, a data store 220, an input device 230, an output device 240, a power supply 250 and a communication module 260. It should be understood, however, that not all of the above components are required for the computing device 200, and this is not an exhaustive list of components for all embodiments of the computing device 200 or of all possible variations of the above components. A computing device 200 may have any combination of fewer than all of the capabilities and components described herein.
[0027] The processor 210, data store 220, and the power supply 250 enable the computing device 200 to perform computing functionalities. The processor 210 is coupled to the input device 230 and the output device 240 enabling applications running on the computing device 200 to use these devices. In one embodiment, the data store 220 comprises a small amount of random access memory (RAM) and a larger amount of flash or other persistent memory, allowing applications or other computer executable code to be stored and executed by the processor 210. In one embodiment, the data store 220 includes instructions that when executed cause the processor to perform the actions described above in FIG. 1 in conjunction with the LDP generation module 106 and the LDP comparison module 108, allowing the computing device 200 to retrieve data from one or more data sources including the training video 102 and the facial expression video 104. The computing device 200 also executes an operating system or other software supporting one or more input modalities for receiving input from the input device 230 and/or one or more output modalities presenting data via the output device 240, such as audio playback or display of visual data.
[0028] The output device 240 may comprise any suitable display system for providing visual feedback, such as an organic light emitting diode (OLED) display. The output device 240 may also include a speaker or other audio playback device to provide auditory feedback. For example, the output device 240 may communicate audio feedback (e.g., prompts, commands, and system status) according to an application running on the computing device 200 using the speaker and also display word phrases, static or dynamic images, or prompts as directed by the application using the display. The input device 230 comprises any suitable device for receiving input from a user, such as a keyboard, touch-sensitive display or gesture capture system. In one instance, the input device 230 includes a device capable of providing a video, such as a video camera.
[0029] The communication module 260 comprises a wireless communication circuit allowing wireless communication with the network 120 (e.g., via Bluetooth, WiFi, RF, infrared, or ultrasonic). For example, the communication module 260 identifies and communicates with one or more wireless access points using WiFi or identifies and communicates with one or more cell towers using RF. In an embodiment, the communication module 260 also includes a jack for receiving a data cable (e.g., Mini-USB or Micro-USB).
LDP Generation and Comparison
[0030] The LDP generation module 106, as illustrated in FIG. 3, generates an LDP for a facial motion video clip, including the training video 102 and the facial expression video 104. The LDP generation module 106 includes a pixel detection module 310, a motion mitigation module 320, a resizing module 330, a pixel tracking module 340, a normalization module 350 and a computation module 360.
[0031] The pixel detection module 310 finds key points or pixels on a neutral face of a facial motion video clip. In one embodiment, the pixel detection module 310 identifies a video frame displaying a neutral facial expression. The neutral face frame may be the first frame of a video. In one embodiment, the pixel detection module 310 uses a face detection and localization algorithm to find a set of key points on the neutral face. In another embodiment, statistical models such as active shape models may be used to find key points on the neutral face. Additionally, software libraries such as STASM may be used to find the key points, as described by S. Milborrow and F. Nicolls in "Locating facial features with an extended active shape model," Proc. ECCV, 2008. In one embodiment, the pixel detection module 310 finds key points or pixels on the neutral face by computing the key points from several neutral faces in a database. In one instance, the pixel detection module 310 finds regions of interest enclosed by the key points. Regions of interest exclude areas that may produce distortion in computing the LDP. Some regions of the face, including the eyes and the chin, can be occluded or move out of the image during some facial expressions. For example, eyes may be occluded when blinking occurs, and the chin may move out of the image with a wide-open mouth during facial expressions of surprise. In one embodiment, the region of interest is computed as a convex hull of the key points on a neutral face, excluding the areas which may produce distortion in computing the LDP.
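A minimal sketch of this region-of-interest construction is shown below. It assumes the face landmarks have already been produced by some landmark detector (STASM, an active shape model, or any equivalent); the landmark coordinates, the excluded eye polygon, and the helper name build_roi_mask are illustrative choices, not details fixed by the patent.

```python
import numpy as np
import cv2  # OpenCV


def build_roi_mask(frame_shape, keypoints, excluded_polygons=()):
    """Region of interest = convex hull of the neutral-face key points,
    minus areas (eyes, chin, ...) that may distort the LDP.

    keypoints: (N, 2) array of (x, y) landmarks from any face detection and
    localization algorithm (assumed to be available).
    excluded_polygons: iterable of (M, 2) arrays outlining regions to drop.
    """
    mask = np.zeros(frame_shape[:2], dtype=np.uint8)
    hull = cv2.convexHull(keypoints.astype(np.int32))
    cv2.fillConvexPoly(mask, hull, 255)                 # convex hull of key points
    for poly in excluded_polygons:                      # carve out distortion-prone areas
        cv2.fillPoly(mask, [poly.astype(np.int32)], 0)
    return mask > 0


if __name__ == "__main__":
    # Synthetic landmarks stand in for detector output on a 160x128 frame.
    pts = np.array([[30, 40], [98, 40], [110, 90], [64, 140], [18, 90]], dtype=np.float32)
    eyes = np.array([[40, 50], [60, 50], [60, 62], [40, 62]], dtype=np.float32)
    roi = build_roi_mask((160, 128, 3), pts, [eyes])
    print("ROI pixels:", int(roi.sum()))
```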
[0032] The motion mitigation module 320 removes rigid motion from a facial movement video clip. A rigid head motion may be detected by a motion detection algorithm, wherein head motion such as turning or tilting may be detected within a video clip. Video frames containing rigid motions may be removed from the video clip. In other embodiments, other methods of removing head motion from a video clip may be applied by the motion mitigation module 320. In other embodiments, the motion mitigation module 320 measures persistent head motion in a facial movement video clip. For example, the motion mitigation module 320 may measure an individual's head tilt or gesture when performing an expression. The motion mitigation module 320 sends the measured head motion information to the computation module 360, wherein the individual's head motion is accounted for in the individual's LDP.
[0033] In one embodiment, the resizing module 330 resizes the video frames containing a frontal view of a face such that all the video frames are the same size. In one instance, the video frames are resized to 128 x 160 pixels for processing purposes. In other instances, the resizing module 330 resizes video frames such that the face occupies the same amount of space in every video frame.
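As an illustration of this resizing step, the cropped face frames can simply be resampled to a fixed resolution; the sketch below uses OpenCV, with the 128 x 160 target taken from the instance above and the interpolation choice being an assumption.

```python
import cv2


def resize_frames(frames, size=(128, 160)):
    """Resize every cropped face frame to the same size.

    frames: list of HxW or HxWx3 numpy arrays (cropped face images).
    size: (width, height) as expected by cv2.resize.
    """
    return [cv2.resize(f, size, interpolation=cv2.INTER_AREA) for f in frames]
```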
[0034] The pixel tracking module 340 tracks each pixel or key point on the neutral face throughout the image sequence to obtain its displacement in each frame. The pixel tracking module 340 may track pixels in a variety of ways, such as using the optical flow estimation described by B. D. Lucas and T. Kanade, "An iterative image registration technique with an application to stereo vision," in Proc. of the 1981 DARPA Image Understanding Workshop, pages 121-130, April 1981. The flow estimation method assumes that the flow is essentially constant in a local neighborhood of the pixel under consideration, and the method solves the basic optical flow equations for all the pixels in that neighborhood by the least squares criterion. In other embodiments, the pixel tracking module 340 may compute a point's location at two or more video frames within the image sequence.
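A sketch of this tracking step follows, using OpenCV's pyramidal Lucas-Kanade tracker as a stand-in for the cited method. The window size, pyramid depth, and the helper name track_displacements are illustrative choices, not values given in the patent.

```python
import numpy as np
import cv2


def track_displacements(gray_frames, neutral_points):
    """Track neutral-face points through the sequence with pyramidal Lucas-Kanade.

    gray_frames: list of grayscale frames; frame 0 is the neutral face.
    neutral_points: (N, 2) float32 array of point locations in frame 0.
    Returns a (T, N, 2) array of displacements relative to the neutral frame
    (the displacement for frame 0 is zero).
    """
    lk_params = dict(winSize=(21, 21), maxLevel=3,
                     criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01))
    pts = neutral_points.reshape(-1, 1, 2).astype(np.float32)
    prev = gray_frames[0]
    displacements = [np.zeros_like(neutral_points)]
    for frame in gray_frames[1:]:
        pts, status, _err = cv2.calcOpticalFlowPyrLK(prev, frame, pts, None, **lk_params)
        disp = pts.reshape(-1, 2) - neutral_points      # displacement w.r.t. the neutral face
        disp[status.ravel() == 0] = np.nan              # mark points that were lost
        displacements.append(disp)
        prev = frame
    return np.stack(displacements)
```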
[0035] The normalization module 350 warps the displacement fields defined on the neutral face using a transformation that normalizes the face shape to a given mean face shape. The normalization module 350 may use any transformation that normalizes the face shape. The normalization module 350 therefore may deform the face shape such that a location on one facial image corresponds to the same location in another facial image in a video frame. As such, the computation module 360 may compute an LDP based on the normalized face shapes.
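One way to realize this shape normalization is a piecewise affine warp driven by the key points, resampling each displacement field onto the mean-face coordinate frame. The sketch below uses scikit-image for the warp; the mean-shape landmarks and the function name are assumptions made for illustration, not details prescribed by the patent.

```python
import numpy as np
from skimage.transform import PiecewiseAffineTransform, warp


def normalize_displacement_field(disp_field, subject_landmarks, mean_landmarks, out_shape):
    """Resample a per-pixel displacement field onto the mean face shape.

    disp_field: (H, W, 2) displacement field defined on the subject's neutral face.
    subject_landmarks, mean_landmarks: (N, 2) arrays of corresponding (x, y) key points.
    out_shape: (rows, cols) of the shape-normalized output grid.
    """
    tform = PiecewiseAffineTransform()
    # warp() maps output coordinates to input coordinates, so the transform is
    # estimated from mean-face points (output) to subject points (input).
    tform.estimate(mean_landmarks, subject_landmarks)
    channels = [warp(disp_field[..., c], tform, output_shape=out_shape,
                     order=1, preserve_range=True)
                for c in range(disp_field.shape[-1])]
    return np.stack(channels, axis=-1)
```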
[0036] The computation module 360 computes an LDP for an expression in a facial motion video clip. In one instance, the neutral face is used as the initial state and u denotes the two-dimensional displacement field, which changes the neutral face to a specific deformed face. Then, the computation module 360 computes the deformation tensor C as

C = ∇u^T ∇u + ∇u + ∇u^T + I,    (1)

where I is an identity matrix and ∇ is the gradient operator. Although both u and C are generally supposed to be continuous in space, in the above implementation u is defined on each pixel in the neutral face and thus, so is C. The two orthogonal eigenvectors of C give the two principal deformation directions, and the square root of the corresponding eigenvalue measures the deformation magnitude. If the eigenvalue is smaller than one, a compression is observed; if the eigenvalue is larger than one, a stretch is observed. Such a deformation pattern can be well represented by an ellipse, as shown in FIG. 6C. The directions and lengths of the major/minor axes of the ellipse are determined by the eigenvectors and the square roots of the eigenvalues of C, respectively.
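The per-pixel tensor of equation (1) can be computed directly from the displacement field with finite differences; below is a small numpy sketch. The use of np.gradient and the eigen-decomposition helper are implementation choices for illustration, not steps prescribed by the patent.

```python
import numpy as np


def deformation_tensors(u):
    """Per-pixel deformation tensor C = (I + grad u)^T (I + grad u), i.e. equation (1).

    u: (H, W, 2) displacement field; u[..., 0] is the x (column) component,
       u[..., 1] is the y (row) component.
    Returns C of shape (H, W, 2, 2), the principal stretches, and the eigenvectors.
    """
    ux, uy = u[..., 0], u[..., 1]
    dux_dy, dux_dx = np.gradient(ux)        # axis 0 = rows (y), axis 1 = cols (x)
    duy_dy, duy_dx = np.gradient(uy)

    F = np.empty(u.shape[:2] + (2, 2))      # deformation gradient F = I + grad u
    F[..., 0, 0] = 1.0 + dux_dx
    F[..., 0, 1] = dux_dy
    F[..., 1, 0] = duy_dx
    F[..., 1, 1] = 1.0 + duy_dy

    C = np.einsum('...ki,...kj->...ij', F, F)   # C = F^T F
    evals, evecs = np.linalg.eigh(C)            # C is symmetric
    # Square roots of the eigenvalues are the principal deformation magnitudes
    # (ellipse half-axes); < 1 means compression, > 1 means stretch.
    stretches = np.sqrt(np.clip(evals, 0.0, None))
    return C, stretches, evecs


if __name__ == "__main__":
    u = np.zeros((4, 4, 2))
    u[..., 0] = 0.1 * np.arange(4)[None, :]     # uniform 10% stretch along x
    _, s, _ = deformation_tensors(u)
    print(s[2, 2])                               # approx [1.0, 1.1]
```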
[0037] Additionally, the computation module 360 computes the LDP as a set of deformation-displacement pairs. The LDP may be represented mathematically as
[0038] D = {(C_{x,t}, u_{x,t})},    (2)
where x denotes a pixel in the shape-normalized neutral face image of the subject and t is an index. An order for the elements in the set D may not be imposed in one embodiment. The index t is used to refer to different deformed states of the face, and u_{x,t} excludes rigid head motion. In one embodiment, the LDP generation module 106 generates an LDP for the training video 102 and the facial expression video 104. In one instance, an individual's LDP includes the individual's head motion information, including but not limited to a head tilt or a gesture.
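As a data structure, an LDP of this form can be held as parallel arrays indexed by deformed state t and pixel x. The layout below is one convenient choice for illustration; the class and field names are not from the patent.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class LocalDeformationProfile:
    """Set of deformation-displacement pairs, D = {(C_{x,t}, u_{x,t})}.

    C: (T, H, W, 2, 2) deformation tensors per deformed state t and pixel x.
    u: (T, H, W, 2) displacement fields (rigid head motion already removed).
    roi: (H, W) boolean mask of the region of interest.
    """
    C: np.ndarray
    u: np.ndarray
    roi: np.ndarray

    def pairs_at(self, y, x):
        """All (C, u) pairs observed at pixel (y, x) across the deformed states."""
        return list(zip(self.C[:, y, x], self.u[:, y, x]))
```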
[0039] FIG. 4 illustrates an LDP comparison module 108. The LDP comparison module 108 compares the LDPs of the training video 102 and the facial expression video 104 at each pixel in the image. The LDP comparison module 108 includes a deformation computation module 410, a motion computation module 420 and a verification module 430.
[0040] In one embodiment, the LDP comparison module 108 compares two LDPs at a particular pixel. If the training video LDP (TV-LDP) is matched with a facial expression video LDP (FE-LDP), for example, for each TV-LDP displacement u a closest FE-LDP displacement is found. The LDP comparison module 108 then measures the similarity between their corresponding deformation tensors C. The deformation similarity s_d and the motion similarity s_m are computed as weighted averages of the local deformation similarity and the local motion similarity, which are measured on each pixel respectively:

s_d = Σ_{x∈Ω} w(x) s_d(x),    s_m = Σ_{x∈Ω} w(x) s_m(x),    (3)

w(x) = s_m(x) / Σ_{x'∈Ω} s_m(x'),    (4)

where x denotes a pixel in the region of interest Ω (illustrated in FIG. 6B). The normalized local motion similarity serves as the weight.
[0041] The deformation computation module 410 computes the deformation similarity, s_d, representing the difference in deformation patterns. In one embodiment, the deformation computation module 410 defines a function ψ for comparing two deformation patterns as the normalized overlap of the ellipses that represent them,

ψ(C_1, C_2) = 2A / (A_1 + A_2),    (5)

where A_1 and A_2 are the areas that represent the two deformation patterns C_1 and C_2, respectively, and A is the area of overlap of the two ellipses after being translated to be concentric. FIG. 6D shows an example of A_1 608, A_2 610 and A 612.
[0042] In one instance, to match LDP A, represented as D_A = {(C^A_{x,s}, u^A_{x,s})}, against LDP B, represented as D_B = {(C^B_{x,t}, u^B_{x,t})}, on a pixel x, the deformation computation module 410 finds, for each local motion in D_A, the most similar local motion in D_B (on the same pixel). Mathematically, for each index s the best match is

t*(s) = arg max_t φ(u^A_{x,s}, u^B_{x,t}),

with

φ(u_1, u_2) = (1 - r) exp(-r / σ_1),    (8)

r = |u_1 - u_2| / (|u_1| + |u_2|),    (9)

p(u_1, u_2) = 1 - exp(-(|u_1| + |u_2|) / σ_2),    (10)

where |·| denotes the l2-norm; r is a commonly used relative measurement of vector difference; σ_1 alters the value of r so that a larger difference is penalized more severely; and the factor in equation (10) is a penalty for small displacement vectors. As such, when the displacement is small, the deformation pattern is also slight and thus does not provide much personal characteristic. σ_1 and σ_2 are two parameters; in one embodiment, they may be set to 0.3 and 1.0 respectively.
φ is the motion similarity between two displacement vectors; 0 ≤ φ ≤ 1, and a bigger value indicates a higher motion similarity. In one instance, the motion similarity can be considered as the confidence of the deformation similarity measurement. Thus, in such an instance, the deformation computation module 410 converts the motion similarity scores to normalized weights,

ω(s) = φ(u^A_{x,s}, u^B_{x,t*(s)}) / Σ_{s'} φ(u^A_{x,s'}, u^B_{x,t*(s')}),

and the local deformation similarity at pixel x is computed as a weighted average of the matched deformation comparisons,

s_d(x) = Σ_s ω(s) ψ(C^A_{x,s}, C^B_{x,t*(s)}).
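The per-pixel comparison just described can be sketched as follows. Because the exact forms of ψ and φ are only partially given above, the rasterized overlap ratio, the exponential weighting, and the parameter handling below are assumptions for illustration; only the overall structure (ellipse overlap for deformation similarity, motion-similarity-weighted averaging of matched pairs) follows the description. The pairs_at output of the LDP data structure shown earlier is a natural input for this function.

```python
import numpy as np


def ellipse_overlap_similarity(C1, C2, extent=4.0, n=201):
    """Deformation similarity psi: overlap of the two concentric deformation
    ellipses {p : p^T C^{-1} p <= 1}, computed by rasterization (assumed form).
    The grid extent should exceed the largest expected stretch."""
    xs = np.linspace(-extent, extent, n)
    X, Y = np.meshgrid(xs, xs)
    P = np.stack([X, Y], axis=-1)                     # (n, n, 2) grid of points

    def inside(C):
        q = np.einsum('...i,ij,...j->...', P, np.linalg.inv(C), P)
        return q <= 1.0

    A1, A2 = inside(C1), inside(C2)
    A = A1 & A2
    return 2.0 * A.sum() / (A1.sum() + A2.sum())


def motion_similarity(u1, u2, sigma1=0.3, sigma2=1.0):
    """Motion similarity phi in [0, 1] (assumed form): penalizes the relative
    difference r and down-weights very small displacements."""
    u1, u2 = np.asarray(u1, float), np.asarray(u2, float)
    n1, n2 = np.linalg.norm(u1), np.linalg.norm(u2)
    r = np.linalg.norm(u1 - u2) / (n1 + n2 + 1e-12)
    return (1.0 - r) * np.exp(-r / sigma1) * (1.0 - np.exp(-(n1 + n2) / sigma2))


def local_similarities(pairs_a, pairs_b):
    """For one pixel, match each (C, u) pair of LDP A to the most similar motion
    in LDP B, then return the motion-weighted deformation similarity s_d(x)
    and an average motion similarity s_m(x)."""
    d_sims, m_sims = [], []
    for C_a, u_a in pairs_a:
        phis = [motion_similarity(u_a, u_b) for _, u_b in pairs_b]
        j = int(np.argmax(phis))                       # best-matching local motion in B
        m_sims.append(phis[j])
        d_sims.append(ellipse_overlap_similarity(C_a, pairs_b[j][0]))
    w = np.asarray(m_sims) / (np.sum(m_sims) + 1e-12)  # normalized motion similarity as weights
    return float(np.dot(w, d_sims)), float(np.mean(m_sims))
```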
[0043] The motion computation module 420 computes the motion similarity between two LDPs at each pixel or key point on the neutral face of a video frame. In one embodiment, the LDP comparison module 108 compares the head motion information in two LDPs to determine the identity of an individual in a facial expression video 104. As described in the specification, head motion may include a head tilt, a gesture, etc., that may be characteristic of an individual. Given two LDPs D_A = {(C^A_{x,s}, u^A_{x,s})} and D_B = {(C^B_{x,t}, u^B_{x,t})}, the motion computation module 420 may measure the local motion similarity s_m(x) on a pixel x from the pairwise motion similarities φ(u^A_{x,s}, u^B_{x,t}) of the displacement vectors observed at that pixel.
[0044] The verification module 430 identifies whether the same individual is in both the training video 102 and the facial expression video 104. In one embodiment, the verification module 430 computes a verification score by multiplying the overall motion similarity and deformation similarity scores:

s(D_A, D_B) = s_m(D_A, D_B) · s_d(D_A, D_B),    (15)

where s_m(D_A, D_B) denotes the motion similarity measured with D_A against D_B, and similarly s_d(D_A, D_B) denotes the deformation similarity measured with D_A against D_B. s(D_A, D_B) is the verification score of matching D_A against D_B, with 0 ≤ s(D_A, D_B) ≤ 1.0. A higher score denotes a higher similarity in identity of the individuals in each video represented by each LDP. In one embodiment, if s(D_A, D_B) ≥ θ, subject A is considered to be the same person as subject B. In one instance, the parameter θ is determined based on users' requirements. If s(D_A, D_B) < θ, subject A is considered to be a different person from subject B.
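A compact sketch of this final aggregation is given below, combining per-pixel similarity maps into the overall scores of equations (3)-(4) and the verification score of equation (15). The threshold value, map layout, and function name are illustrative assumptions.

```python
import numpy as np


def verification_score(sd_map, sm_map, roi_mask, theta=0.5):
    """Combine per-pixel similarities into a verification decision.

    sd_map, sm_map: (H, W) local deformation / motion similarity maps.
    roi_mask: (H, W) boolean region-of-interest mask.
    theta: acceptance threshold chosen to suit the application (assumed value).
    """
    sm = np.where(roi_mask, sm_map, 0.0)
    w = sm / (sm.sum() + 1e-12)                              # normalized motion similarity weights (eq. 4)
    s_d = float((w * np.where(roi_mask, sd_map, 0.0)).sum()) # weighted deformation similarity (eq. 3)
    s_m = float((w * sm).sum())                              # weighted motion similarity (eq. 3)
    s = s_m * s_d                                            # verification score (eq. 15)
    return s, s >= theta


if __name__ == "__main__":
    H, W = 160, 128
    roi = np.zeros((H, W), dtype=bool)
    roi[40:120, 30:100] = True
    sd = np.full((H, W), 0.9)                                # similar deformation patterns
    sm = np.full((H, W), 0.8)                                # confident, similar motions
    score, same_person = verification_score(sd, sm, roi)
    print(round(score, 3), same_person)                      # 0.72 True
```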
Method of LDP Generation and Comparison
[0045] FIG. 5 is a flow chart of a process for generating a local deformation profile in accordance with an embodiment of the invention. The process starts by receiving at least one facial motion video 510. As described in reference to FIG. 1, facial motion videos include a training video and a facial expression video. Each video may or may not include the same individual. Additionally, each video includes a neutral face wherein the individual in the video is not performing any expression. The neutral face may be flagged in a video clip or may be the first frame in the video clip. The process finds 520 key pixels on the neutral face by using a face detection and localization algorithm. FIG. 6A illustrates key pixels 604 on a neutral face of an individual in a video frame. In one embodiment, the pixels are in a region of interest on the individual's face. FIG. 6B illustrates regions of interest 606 on a face in accordance with an embodiment. As illustrated in the figure, the area of interest 606 excludes the eyes, chin and forehead. The region of interest excludes these regions because they may cause distortion in an LDP. The process removes 530 any head motion recorded in the video. As such, the process can remove video frames that may provide distorted LDPs. The process resizes 540 the video frames such that they are the same size. The images may also be cropped such that the head occupies a similar area within each video frame. The process also tracks 550 each pixel on the neutral face throughout the image sequence to obtain its displacement in each frame. In one embodiment, an optical flow estimation method may be used to track a pixel, including the Lucas-Kanade optical flow estimation with pyramidal refinement described by B. D. Lucas and T. Kanade, "An iterative image registration technique with an application to stereo vision," in Proc. of the 1981 DARPA Image Understanding Workshop, pages 121-130, April 1981. Other models may be used to track the pixels in other embodiments. The process normalizes 560 the displacement fields defined on the neutral face using a transformation to a mean face. The mean face may include an average of pixel locations throughout the image sequence. The process uses the normalized video frames to construct 570 an LDP for each video. As described in the specification above in reference to FIG. 3, the LDP may be represented as a set of deformation-displacement pairs. The process ends once the LDP for a received 510 video is constructed.
[0046] FIG. 7 is a flow chart of a process for matching local deformation profiles in accordance with an embodiment of the invention. The process starts by receiving 710 a plurality of LDPs. As described in the specification above, an LDP is a set of deformation-displacement pairs represented by {(C_{x,t}, u_{x,t})}, where x denotes a pixel in a shape-normalized neutral face image in a video clip. For each pixel, the process calculates 720 a local deformation similarity score. The local deformation similarity score may include a measure of motion similarity between the LDPs. Additionally, the process calculates 730 a local motion similarity score at each pixel. In one embodiment, the local motion similarity score is computed as a weighted average. The process calculates 740 a verification score indicating whether the same individual is in two or more videos represented by the LDPs. In one embodiment, a high verification score indicates a high likelihood that the same individual is in both facial motion video clips. Conversely, a low verification score may indicate a low likelihood that the same individual is in both facial motion videos represented by the plurality of LDPs received 710 by the process.
[0047] An advantage of the embodiments disclosed herein is that the system and the method provide a way to determine the identity of an individual in a video clip. The video clip may be a part of security surveillance video, an advertisement video, etc. As such, the embodiments may be applied to identify an individual in a variety of situations, including but not limited to providing surveillance security, providing individual specific advertising, etc. Additionally, the embodiments described herein may be used to identify an individual's sentiment or expression, allowing a system to perform sentiment analysis on a video feed.
Summary
[0048] The foregoing description of the embodiments of the invention has been presented for the purpose of illustration; it is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Persons skilled in the relevant art can appreciate that many modifications and variations are possible in light of the above disclosure.
[0049] Some portions of this description describe the embodiments of the invention in terms of algorithms and symbolic representations of operations on information. These algorithmic descriptions and representations are commonly used by those skilled in the data processing arts to convey the substance of their work effectively to others skilled in the art. These operations, while described functionally, computationally, or logically, are understood to be implemented by computer programs or equivalent electrical circuits, microcode, or the like. Furthermore, it has also proven convenient at times to refer to these arrangements of operations as modules, without loss of generality. The described operations and their associated modules may be embodied in software, firmware, hardware, or any combinations thereof.
[0050] Any of the steps, operations, or processes described herein may be performed or implemented with one or more hardware or software modules, alone or in combination with other devices. In one embodiment, a software module is implemented with a computer program product comprising a computer-readable medium containing computer program code, which can be executed by a computer processor for performing any or all of the steps, operations, or processes described.
[0051] Embodiments of the invention may also relate to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, and/or it may comprise a general-purpose computing device selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a tangible computer readable storage medium or any type of media suitable for storing electronic instructions, and coupled to a computer system bus. Furthermore, any computing systems referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability.
[0052] Finally, the language used in the specification has been principally selected for readability and instructional purposes, and it may not have been selected to delineate or circumscribe the inventive subject matter. It is therefore intended that the scope of the invention be limited not by this detailed description, but rather by any claims that issue on an application based hereon. Accordingly, the disclosure of the embodiments of the invention is intended to be illustrative, but not limiting, of the scope of the invention, which is set forth in the following claims.

Claims

What is claimed is:
1. A method for identifying an individual in a video clip based on the
individual's local deformation profile, the method comprising:
detecting a face in the video clip;
identifying a neutral facial expression on a video frame within the video clip;
identifying at least one pixel within the neutral face video frame;
tracking an identified pixel in at least two frames of the video clip to obtain its displacement in the frames;
generating a local deformation profile including a deformation and displacement pair based on the displacement of each pixel;
comparing the local deformation profile with an individual's local deformation profile by comparing differences in motion and deformation between the two local deformation profiles; and
identifying an individual based on the comparing between the local deformation profiles.
2. The method of claim 1, wherein detecting a face in the video clip comprises using a face detection and localization algorithm.
3. The method of claim 1, wherein the neutral facial expression is a first frame of the video clip.
4. The method of claim 1, further comprising removing rigid head motion from the video clip.
5. The method of claim 1, further comprising cropping the face region of video frames within the video clip.
6. The method of claim 1, wherein the pixel within the neutral face is computed from a mean of neutral faces in a dataset.
7. The method of claim 1, wherein the pixel is within a region of interest of a face, the region of interest excluding eyes, a forehead and a chin of a face in a video frame.
8. The method of claim 1, wherein the pixel is tracked using an optical flow estimation method.
9. The method of claim 1, wherein the pixel tracking method is a Lucas-Kanade optical flow estimation with pyramidal refinement.
10. The method of claim 1, further comprising warping displacement fields of each pixel on the neutral face using a transformation to normalize the face shape to a mean face shape.
11. The method of claim 1, wherein a large difference in displacement vectors associated with motion between the two local deformation profiles indicates a low confidence in the deformation similarity score.
12. The method of claim 1, wherein a large difference in deformation patterns suggests a difference in the identity of the individuals represented by each local deformation profile.
13. The method of claim 1, further comprising computing a verification score by combining the motion similarity score and the deformation similarity score to determine the identity of an individual in the video clip.
14. The method of claim 1, further comprising:
measuring a head motion in the video clip;
including the measured head motion in the local deformation profile; and
comparing the head motion in the local deformation profile with an individual's local deformation profile to identify the individual.
15. A computer program product for identifying an individual in a video clip based on the individual's local deformation profile, the computer program product comprising a computer-readable storage medium containing computer program code for:
detecting a face in the video clip;
identifying a neutral facial expression on a video frame within the video clip;
identifying at least one pixel within the neutral face video frame;
tracking an identified pixel in at least two frames of the video clip to obtain its displacement in the frames;
generating a local deformation profile including a deformation and displacement pair based on the displacement of each pixel;
comparing the local deformation profile with an individual's local deformation profile by comparing differences in motion and deformation between the two local deformation profiles; and
identifying an individual based on the comparing between the local deformation profiles.
16. The computer program product of claim 15, wherein detecting a face in the video clip comprises using a face detection and localization algorithm.
17. The computer program product of claim 15, wherein the neutral facial expression is a first frame of the video clip.
18. The computer program product of claim 15, further comprising a computer-readable storage medium containing computer program code for removing rigid head motion from the video clip.
19. The computer program product of claim 15, further comprising a computer-readable storage medium containing computer program code for cropping the face region of video frames within the video clip.
20. The computer program product of claim 15, wherein the pixel within the neutral face is computed from a mean of neutral faces in a dataset.
21. The computer program product of claim 15, wherein the pixel is within a region of interest of a face, the region of interest excluding eyes, a forehead and a chin of a face in a video frame.
22. The computer program product of claim 15, wherein the pixel is tracked using an optical flow estimation method.
23. The computer program product of claim 15, wherein the pixel tracking method is a Lucas-Kanade optical flow estimation with pyramidal refinement.
24. The computer program product of claim 15, further comprising a computer-readable storage medium containing computer program code for warping displacement fields of each pixel on the neutral face using a transformation to normalize the face shape to a mean face shape.
25. The computer program product of claim 15, wherein a large difference in displacement vectors associated with motion between the two local deformation profiles indicates a low confidence in the deformation similarity score.
26. The computer program product of claim 15, wherein a large difference in deformation patterns suggests a difference in the identity of the individuals represented by each local deformation profile.
27. The computer program product of claim 15, further comprising a computer-readable storage medium containing computer program code for computing a verification score by combining the motion similarity score and the deformation similarity score to determine the identity of an individual in the video clip.
28. The computer program product of claim 15, further comprising a computer-readable storage medium containing computer program code for:
measuring a head motion in the video clip;
including the measured head motion in the local deformation profile; and
comparing the head motion in the local deformation profile with an individual's local deformation profile to identify the individual.
PCT/SG2011/000208 2010-06-11 2011-06-10 General motion-based face recognition WO2011155902A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US35381210P 2010-06-11 2010-06-11
US61/353,812 2010-06-11

Publications (1)

Publication Number Publication Date
WO2011155902A1 true WO2011155902A1 (en) 2011-12-15

Family

ID=45098315

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SG2011/000208 WO2011155902A1 (en) 2010-06-11 2011-06-10 General motion-based face recognition

Country Status (1)

Country Link
WO (1) WO2011155902A1 (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5774591A (en) * 1995-12-15 1998-06-30 Xerox Corporation Apparatus and method for recognizing facial expressions and facial gestures in a sequence of images
US6492986B1 (en) * 1997-06-02 2002-12-10 The Trustees Of The University Of Pennsylvania Method for human face shape and motion estimation based on integrating optical flow and deformable models
US6879709B2 (en) * 2002-01-17 2005-04-12 International Business Machines Corporation System and method for automatically detecting neutral expressionless faces in digital images
US20090195545A1 (en) * 2008-01-31 2009-08-06 University of Southern California Facial Performance Synthesis Using Deformation Driven Polynomial Displacement Maps
WO2009128784A1 (en) * 2008-04-14 2009-10-22 Xid Technologies Pte Ltd Face expressions identification

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Face Detection", 21 December 2008 (2008-12-21), Retrieved from the Internet <URL:http://web.archive.or//web/20081221095054/http://en.wikipedia.ort/wiki/Face_detection> [retrieved on 20110808] *
"Facial Recognition System", 25 January 2010 (2010-01-25), Retrieved from the Internet <URL:http://web.archive.or//web/20100125071637/http://en.wikipedia.org_/wiki/Facial_recognition_system> [retrieved on 20110808] *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015115681A1 (en) * 2014-01-28 2015-08-06 영남대학교 산학협력단 Method and apparatus for recognising expression using expression-gesture dictionary
KR101549645B1 (en) 2014-01-28 2015-09-03 영남대학교 산학협력단 Method and apparatus of recognizing facial expression using motion dictionary
US10068131B2 (en) 2014-01-28 2018-09-04 Industry-Academic Cooperation Foundation, Yeungnam University Method and apparatus for recognising expression using expression-gesture dictionary
US10055821B2 (en) 2016-01-30 2018-08-21 John W. Glotzbach Device for and method of enhancing quality of an image
US10783617B2 (en) 2016-01-30 2020-09-22 Samsung Electronics Co., Ltd. Device for and method of enhancing quality of an image
CN106127139A (en) * 2016-06-21 2016-11-16 东北大学 A kind of dynamic identifying method of MOOC course middle school student's facial expression
CN106127139B (en) * 2016-06-21 2019-06-25 东北大学 A kind of dynamic identifying method of MOOC course middle school student's facial expression
CN108875633A (en) * 2018-06-19 2018-11-23 北京旷视科技有限公司 Expression detection and expression driving method, device and system and storage medium
CN108875633B (en) * 2018-06-19 2022-02-08 北京旷视科技有限公司 Expression detection and expression driving method, device and system and storage medium
WO2021139475A1 (en) * 2020-01-08 2021-07-15 上海商汤临港智能科技有限公司 Facial expression recognition method and apparatus, device, computer-readable storage medium and computer program product

Similar Documents

Publication Publication Date Title
Zhang et al. Fast and robust occluded face detection in ATM surveillance
Kumano et al. Pose-invariant facial expression recognition using variable-intensity templates
US9224060B1 (en) Object tracking using depth information
Sánchez et al. Differential optical flow applied to automatic facial expression recognition
Shreve et al. Automatic expression spotting in videos
Ahmad et al. Human action recognition using multi-view image sequences
Cherla et al. Towards fast, view-invariant human action recognition
Hernández-Vela et al. BoVDW: Bag-of-Visual-and-Depth-Words for gesture recognition
US20140056490A1 (en) Image recognition apparatus, an image recognition method, and a non-transitory computer readable medium thereof
Lee et al. Time-sliced averaged motion history image for gait recognition
Akakın et al. Robust classification of face and head gestures in video
Ouanan et al. Facial landmark localization: Past, present and future
WO2011155902A1 (en) General motion-based face recognition
Unzueta et al. Efficient generic face model fitting to images and videos
Hayat et al. Evaluation of spatiotemporal detectors and descriptors for facial expression recognition
US9349038B2 (en) Method and apparatus for estimating position of head, computer readable storage medium thereof
Koutras et al. Estimation of eye gaze direction angles based on active appearance models
Krisandria et al. Hog-based hand gesture recognition using Kinect
Patil et al. Features classification using support vector machine for a facial expression recognition system
Khademi et al. Relative facial action unit detection
EP2998928B1 (en) Apparatus and method for extracting high watermark image from continuously photographed images
Holt et al. Static pose estimation from depth images using random regression forests and hough voting
Zhang et al. Using multiple views for gait-based gender classification
Talha et al. Human action recognition from body-part directional velocity using hidden Markov models
Manolova et al. Facial expression classification using supervised descent method combined with PCA and SVM

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11792754

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11792754

Country of ref document: EP

Kind code of ref document: A1