WO2011155902A1 - General motion-based face recognition - Google Patents

General motion-based face recognition

Info

Publication number
WO2011155902A1
Authority
WO
WIPO (PCT)
Prior art keywords
face
individual
computer program
pixel
video
Application number
PCT/SG2011/000208
Other languages
French (fr)
Inventor
Ning Ye
Terence Mong Cheng Sim
Original Assignee
National University Of Singapore
Application filed by National University Of Singapore filed Critical National University Of Singapore
Publication of WO2011155902A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 - Detection; Localisation; Normalisation
    • G06V40/167 - Detection; Localisation; Normalisation using comparisons between temporally consecutive images
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 - Facial expression recognition
    • G06V40/176 - Dynamic expression

Abstract

A local deformation profile is generated for two or more facial expressions in two or more video clips. The local deformation profile includes motion information such as the direction or magnitude of displacement of a point on the individual's face during the course of an expression. Motion and deformation data from two or more local deformation profiles in which an individual expresses different facial expressions may be compared to determine whether the same individual is in both video clips. If the individual's localized facial motion is similar in the two video clips, this suggests high confidence in the deformation computation. A small difference in the deformation patterns of the video clips, together with similar localized facial motion, suggests that the same individual is present in the two or more video clips. Thus, an individual expressing facial expressions may be identified in video clips.

Description

GENERAL MOTION-BASED FACE RECOGNITION
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. Provisional Application No.
61/353,812, filed June 11, 2010, which is incorporated by reference in its entirety.
BACKGROUND
[0002] This invention relates generally to facial recognition and more specifically to motion-based face recognition.
[0003] Facial recognition systems typically identify individuals based on their facial features, including shape and color of the eyes, nose, mouth, chin or distances between these parts. These systems typically match an image with an individual if facial features measured from the image match an individual's known features. However, such a method of recognizing an individual from a facial image has drawbacks. For example, if an individual provides different facial expressions that alter the shape or distances of facial features, the system may not be able to correctly identify the individual.
[0004] Some facial recognition systems use facial motion to recognize an individual. For example, motion recognition systems may calculate displacement vectors associated with certain facial movements such as smiles or frowns to recognize an individual. However, such motion recognition systems require a fixed, predefined motion. Therefore, an individual must perform a specific facial motion each time in order to be successfully recognized. These constraints limit the use and versatility of facial motion recognition systems.
SUMMARY
[0005] According to an embodiment of the invention, a computer-implemented method performs motion-based face recognition. In one embodiment, the method extracts motion and deformation information from one or more facial expressions provided by an individual and uses the extracted information to identify an individual in another facial motion video.
[0006] In one embodiment, the method extracts a local deformation profile from a frontal view facial motion by the following steps: 1) use a face detection and localization algorithm to find a set of key points on the neutral face of the first video frame; 2) remove any rigid head motion from the video; 3) crop the face region from the video to get a cropped face image sequence; 4) track each pixel on the neutral face (the first cropped face image) throughout the image sequence to obtain its displacement in each frame; 5) warp the displacement fields defined on the neutral face using a transformation which normalizes the face shape to a given mean face shape; and 6) from the shape-free displacement fields, construct the local deformation profile.
[0007] In one embodiment, the method constructs a local deformation profile for two or more facial expressions provided by one or more individuals. The local deformation profile includes motion information, such as the direction or magnitude of displacement of a point on the individual's face during the course of an expression. Motion and deformation data from two or more local deformation profiles in which an individual expresses different facial expressions may be compared to determine if the same individual is in both video clips. In one embodiment, the method determines if the individual's localized facial motion is similar in the two local deformation profiles.
[0008] Localized facial motions can be similar even if expressions are dissimilar. For example, facial expressions of surprise and fear may be globally different, but they are locally similar around the eyebrows. A large difference in local motion suggests that the deformation patterns in the two or more video clips are caused by very different facial expressions and thus will not provide reliable results. On the other hand, a low difference in local motion suggests that the deformation patterns are similar enough to compare with one another. In such an instance, the method may compare the deformation of at least one point on the individual's face in each local deformation profile, wherein a sufficiently small difference in deformation pattern suggests that the same individual is present in the two or more video clips. In this way, the identity of an individual expressing different facial expressions may be identified in video clips.
[0009] The foregoing description of the embodiments of the invention has been presented for the purpose of illustration; it is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Persons skilled in the relevant art can appreciate that many modifications and variations are possible in light of the above disclosure.
BRIEF DESCRIPTION OF THE DRAWINGS
[0010] FIG. 1 is a high-level block diagram of a system environment for identifying an individual based on his or her facial movement video clip, in accordance with an embodiment of the invention.
[0011] FIG. 2 is a block diagram of a computing device in accordance with an embodiment of the invention.
[0012] FIG. 3 is a block diagram of a local deformation profile generation module in accordance with an embodiment of the invention.
[0013] FIG. 4 is a block diagram of a local deformation profile comparison module in accordance with an embodiment of the invention.
[0014] FIG. 5 is a flow chart of a process for generating a local deformation profile in accordance with an embodiment of the invention.
[0015] FIG. 6A illustrates an example of a neutral face image in accordance with an embodiment of the invention.
[0016] FIG. 6B illustrates areas of interest in a facial image in accordance with an embodiment of the invention.
[0017] FIG. 6C illustrates a deformation profile of a facial expression in accordance with an embodiment of the invention.
[0018] FIG. 6D illustrates an overlap in two deformation patterns in accordance with an embodiment of the invention.
[0019] FIG. 7 is a flow chart of a process for matching local deformation profiles in accordance with an embodiment of the invention.
[0020] The figures depict various embodiments of the present invention for purposes of illustration only. One skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles of the invention described herein.
DETAILED DESCRIPTION
Overview
[0021] Embodiments of the invention may be implemented using various architectures, such as the example architecture illustrated in FIG. 1. In this embodiment, a facial expression video 104 is compared to a training video 102. A local deformation profile (LDP) generation module 106 generates an LDP for each video and the LDP comparison module 108 compares the LDP profiles of each video to one another to identify if a same individual is in both videos. The results of the comparison are provided in the identification result 110.
[0022] A training video 102, also referred to as an enrolment video, includes two or more frames showing an individual's facial expression. A facial expression may include, but is not limited to, displays of surprise, happiness, fear, anger and disgust. The display of each expression may vary per individual. For example, a display of happiness may include a smile for an individual. In one embodiment, the training video 102 includes a neutral face image for an individual. The neutral face image is one in which the individual is not expressing an emotion. It may be used as a baseline for comparison against other expressions. The training video 102 is provided to the LDP generation module 106 to extract an LDP profile from the video.
[0023] Similarly, a facial expression video 104 includes two or more frames showing an individual's facial expression. As noted, the facial expression may include displays of surprise, anger, happiness, or any other facial expression. Additionally, the facial expression video 104 also includes at least one frame displaying a neutral expression. The facial expression video 104 is provided to the LDP generation module 106 to extract an LDP profile from the video.
[0024] The LDP generation module 106 extracts an LDP profile for a facial expression from a video, including a training video 102 and a facial expression video 104. An LDP includes a set of deformation-displacement pairs that represent the motion and deformation of a face at a particular location through an expression. For example, the LDP may be represented by a two-dimensional displacement field that changes the neutral face to a specific deformed face, together with a deformation tensor. The LDP is discussed in greater detail below in reference to FIG. 3. In one embodiment, the LDP generation module 106 generates an LDP for each of the training video 102 and the facial expression video 104.
[0025] The LDP comparison module 108 compares the LDP for the training video 102 with the LDP for the facial expression video 104. The LDP comparison module 108 measures the difference in displacement vectors associated with facial motion between the videos and the difference in deformation patterns associated with an expression in each video clip. The differences in motion and deformation may be provided in the identification result 110, wherein a large difference in displacement vectors suggests that the facial motions are dissimilar and a comparison of deformation would not provide reliable results. Additionally, a large difference in deformation patterns suggests a difference in identity of the individuals in the training video 102 and the facial expression video 104. On the other hand, a small difference (or a high verification score) suggests that the same individual is present in both the training video 102 and the facial expression video 104. Thus, the individual in the facial expression video 104 is given the identity of the individual in the training video 102. The verification score is discussed in greater detail below.
System Environment
[0026] FIG. 2 illustrates an embodiment of a computing device 200 used to implement the LDP generation module 106 and the LDP comparison module 108. In the embodiment shown in FIG. 2, the computing device 200 comprises a processor 210, a data store 220, an input device 230, an output device 240, a power supply 250 and a communication module 260. It should be understood, however, that not all of the above components are required for the computing device 200, and this is not an exhaustive list of components for all embodiments of the computing device 200 or of all possible variations of the above components. A computing device 200 may have any combination of fewer than all of the capabilities and components described herein.
[0027] The processor 210, data store 220, and the power supply 250 enable the computing device 200 to perform computing functionalities. The processor 210 is coupled to the input device 230 and the output device 240 enabling applications running on the computing device 200 to use these devices. In one embodiment, the data store 220 comprises a small amount of random access memory (RAM) and a larger amount of flash or other persistent memory, allowing applications or other computer executable code to be stored and executed by the processor 210. In one embodiment, the data store 220 includes instructions that when executed cause the processor to perform the actions described above in FIG. 1 in conjunction with the LDP generation module 106 and the LDP comparison module 108, allowing the computing device 200 to retrieve data from one or more data sources including the training video 102 and the facial expression video 104. The computing device 200 also executes an operating system or other software supporting one or more input modalities for receiving input from the input device 230 and/or one or more output modalities presenting data via the output device 240, such as audio playback or display of visual data.
[0028] The output device 240 may comprise any suitable display system for providing visual feedback, such as an organic light emitting diode (OLED) display. The output device 240 may also include a speaker or other audio playback device to provide auditory feedback. For example, the output device 240 may communicate audio feedback (e.g., prompts, commands, and system status) according to an application running on the computing device 200 using the speaker and also display word phrases, static or dynamic images, or prompts as directed by the application using the display. The input device 230 comprises any suitable device for receiving input from a user, such as a keyboard, touch-sensitive display or gesture capture system. In one instance, the input device 230 includes a device capable of providing a video, such as a video camera.
[0029] The communication module 260 comprises a wireless communication circuit allowing wireless communication with the network 120 (e.g., via Bluetooth, WiFi, RF, infrared, or ultrasonic). For example, the communication module 260 identifies and communicates with one or more wireless access points using WiFi or identifies and communicates with one or more cell towers using RF. In an embodiment, the communication module 260 also includes a jack for receiving a data cable (e.g., Mini-USB or Micro-USB).
LDP Generation and Comparison
[0030] The LDP generation module 106, as illustrated in FIG. 3, generates an LDP for a facial motion video clip, including the training video 102 and the facial expression video 104. The LDP generation module 106 includes a pixel detection module 310, a motion mitigation module 320, a resizing module 330, a pixel tracking module 340, a normalization module 350 and a computation module 360.
[0031] The pixel detection module 310 finds key points or pixels on a neutral face of a facial motion video clip. In one embodiment, the pixel detection module 310 identifies a video frame displaying a neutral facial expression. The neutral face frame may be the first frame of a video. In one embodiment, the pixel detection module 310 uses a face detection and localization algorithm to find a set of key points on the neutral face. In another embodiment, statistical models such as active shape models may be used to find key points on the neutral face. Additionally, software libraries such as STASM may be used to find the key points, as described by S. Milborrow and F. Nicolls in "Locating facial features with an extended active shape model," Proc. ECCV, 2008. In one embodiment, the pixel detection module 310 finds key points or pixels on the neutral face by computing the key points from several neutral faces in a database. In one instance, the pixel detection module 310 finds regions of interest enclosed by the key points. Regions of interest exclude areas that may produce distortion in computing the LDP. Some regions of the face, including the eyes and the chin, can be occluded or move out of the image during some facial expressions. For example, eyes may be occluded when blinking occurs, and the chin may move out of the image with a wide-open mouth during facial expressions of surprise. In one embodiment, the region of interest is computed as a convex hull of the key points on a neutral face, excluding the areas which may produce distortion in computing the LDP.
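A minimal sketch of this region-of-interest construction is shown below. It assumes the face landmarks have already been produced by some landmark detector (STASM, an active shape model, or any equivalent); the landmark coordinates, the excluded eye polygon, and the helper name build_roi_mask are illustrative choices, not details fixed by the patent.

```python
import numpy as np
import cv2  # OpenCV


def build_roi_mask(frame_shape, keypoints, excluded_polygons=()):
    """Region of interest = convex hull of the neutral-face key points,
    minus areas (eyes, chin, ...) that may distort the LDP.

    keypoints: (N, 2) array of (x, y) landmarks from any face detection and
    localization algorithm (assumed to be available).
    excluded_polygons: iterable of (M, 2) arrays outlining regions to drop.
    """
    mask = np.zeros(frame_shape[:2], dtype=np.uint8)
    hull = cv2.convexHull(keypoints.astype(np.int32))
    cv2.fillConvexPoly(mask, hull, 255)                 # convex hull of key points
    for poly in excluded_polygons:                      # carve out distortion-prone areas
        cv2.fillPoly(mask, [poly.astype(np.int32)], 0)
    return mask > 0


if __name__ == "__main__":
    # Synthetic landmarks stand in for detector output on a 160x128 frame.
    pts = np.array([[30, 40], [98, 40], [110, 90], [64, 140], [18, 90]], dtype=np.float32)
    eyes = np.array([[40, 50], [60, 50], [60, 62], [40, 62]], dtype=np.float32)
    roi = build_roi_mask((160, 128, 3), pts, [eyes])
    print("ROI pixels:", int(roi.sum()))
```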
[0032] The motion mitigation module 320 removes rigid motion from a facial movement video clip. A rigid head motion may be detected by a motion detection algorithm, wherein head motion such as turning or tilting may be detected within a video clip. Video frames containing rigid motions may be removed from the video clip. In other embodiments, other methods of removing head motion from a video clip may be applied by the motion mitigation module 320. In other embodiments, the motion mitigation module 320 measures persistent head motion in a facial movement video clip. For example, the motion mitigation module 320 may measure an individual's head tilt or gesture when performing an expression. The motion mitigation module 320 sends the measured head motion information to the computation module 360, wherein the individual's head motion is accounted for in the individual's LDP.
[0033] In one embodiment, the resizing module 330 resizes the video frames containing a frontal view of a face such that all the video frames are the same size. In one instance, the video frames are resized to 128 x 160 pixels for processing purposes. In other instances, the resizing module 330 resizes video frames such that the face occupies the same amount of space in every video frame.
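As an illustration of this resizing step, the cropped face frames can simply be resampled to a fixed resolution; the sketch below uses OpenCV, with the 128 x 160 target taken from the instance above and the interpolation choice being an assumption.

```python
import cv2


def resize_frames(frames, size=(128, 160)):
    """Resize every cropped face frame to the same size.

    frames: list of HxW or HxWx3 numpy arrays (cropped face images).
    size: (width, height) as expected by cv2.resize.
    """
    return [cv2.resize(f, size, interpolation=cv2.INTER_AREA) for f in frames]
```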
[0034] The pixel tracking module 340 tracks each pixel or key point on the neutral face throughout the image sequence to obtain its displacement in each frame. The pixel tracking module 340 may track pixels in a variety of ways, such as using the optical flow estimation described by B. D. Lucas and T. Kanade, "An iterative image registration technique with an application to stereo vision," in Proc. of the 1981 DARPA Image Understanding Workshop, pages 121-130, April 1981. The flow estimation method assumes that the flow is essentially constant in a local neighborhood of the pixel under consideration, and the method solves the basic optical flow equations for all the pixels in that neighborhood by the least squares criterion. In other embodiments, the pixel tracking module 340 may compute a point's location at two or more video frames within the image sequence.
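A sketch of this tracking step follows, using OpenCV's pyramidal Lucas-Kanade tracker as a stand-in for the cited method. The window size, pyramid depth, and the helper name track_displacements are illustrative choices, not values given in the patent.

```python
import numpy as np
import cv2


def track_displacements(gray_frames, neutral_points):
    """Track neutral-face points through the sequence with pyramidal Lucas-Kanade.

    gray_frames: list of grayscale frames; frame 0 is the neutral face.
    neutral_points: (N, 2) float32 array of point locations in frame 0.
    Returns a (T, N, 2) array of displacements relative to the neutral frame
    (the displacement for frame 0 is zero).
    """
    lk_params = dict(winSize=(21, 21), maxLevel=3,
                     criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01))
    pts = neutral_points.reshape(-1, 1, 2).astype(np.float32)
    prev = gray_frames[0]
    displacements = [np.zeros_like(neutral_points)]
    for frame in gray_frames[1:]:
        pts, status, _err = cv2.calcOpticalFlowPyrLK(prev, frame, pts, None, **lk_params)
        disp = pts.reshape(-1, 2) - neutral_points      # displacement w.r.t. the neutral face
        disp[status.ravel() == 0] = np.nan              # mark points that were lost
        displacements.append(disp)
        prev = frame
    return np.stack(displacements)
```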
[0035] The normalization module 350 warps the displacement fields defined on the neutral face using a transformation that normalizes the face shape to a given mean face shape. The normalization module 350 may use any transformation that normalizes the face shape. The normalization module 350 therefore may deform the face shape such that a location on one facial image corresponds to the same location in another facial image in a video frame. As such, the computation module 360 may compute an LDP based on the normalized face shapes.
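One way to realize this shape normalization is a piecewise affine warp driven by the key points, resampling each displacement field onto the mean-face coordinate frame. The sketch below uses scikit-image for the warp; the mean-shape landmarks and the function name are assumptions made for illustration, not details prescribed by the patent.

```python
import numpy as np
from skimage.transform import PiecewiseAffineTransform, warp


def normalize_displacement_field(disp_field, subject_landmarks, mean_landmarks, out_shape):
    """Resample a per-pixel displacement field onto the mean face shape.

    disp_field: (H, W, 2) displacement field defined on the subject's neutral face.
    subject_landmarks, mean_landmarks: (N, 2) arrays of corresponding (x, y) key points.
    out_shape: (rows, cols) of the shape-normalized output grid.
    """
    tform = PiecewiseAffineTransform()
    # warp() maps output coordinates to input coordinates, so the transform is
    # estimated from mean-face points (output) to subject points (input).
    tform.estimate(mean_landmarks, subject_landmarks)
    channels = [warp(disp_field[..., c], tform, output_shape=out_shape,
                     order=1, preserve_range=True)
                for c in range(disp_field.shape[-1])]
    return np.stack(channels, axis=-1)
```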
[0036] The computation module 360 computes an LDP for an expression in a facial motion video clip. In one instance, the neutral face is used as the initial state and u denotes the two-dimensional displacement field, which changes the neutral face to a specific deformed face. Then, the computation module 360 computes the deformation tensor C as

C = ∇u^T ∇u + ∇u + ∇u^T + I,    (1)

where I is an identity matrix and ∇ is the gradient operator. Although both u and C are generally supposed to be continuous in space, in the above implementation u is defined on each pixel in the neutral face and thus, so is C. The two orthogonal eigenvectors of C give the two principal deformation directions, and the square root of the corresponding eigenvalue measures the deformation magnitude. If the eigenvalue is smaller than one, a compression is observed; if the eigenvalue is larger than one, a stretch is observed. Such a deformation pattern can be well represented by an ellipse, as shown in FIG. 6C. The directions and lengths of the major/minor axes of the ellipse are determined by the eigenvectors and the square roots of the eigenvalues of C, respectively.
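The per-pixel tensor of equation (1) can be computed directly from the displacement field with finite differences; below is a small numpy sketch. The use of np.gradient and the eigen-decomposition helper are implementation choices for illustration, not steps prescribed by the patent.

```python
import numpy as np


def deformation_tensors(u):
    """Per-pixel deformation tensor C = (I + grad u)^T (I + grad u), i.e. equation (1).

    u: (H, W, 2) displacement field; u[..., 0] is the x (column) component,
       u[..., 1] is the y (row) component.
    Returns C of shape (H, W, 2, 2), the principal stretches, and the eigenvectors.
    """
    ux, uy = u[..., 0], u[..., 1]
    dux_dy, dux_dx = np.gradient(ux)        # axis 0 = rows (y), axis 1 = cols (x)
    duy_dy, duy_dx = np.gradient(uy)

    F = np.empty(u.shape[:2] + (2, 2))      # deformation gradient F = I + grad u
    F[..., 0, 0] = 1.0 + dux_dx
    F[..., 0, 1] = dux_dy
    F[..., 1, 0] = duy_dx
    F[..., 1, 1] = 1.0 + duy_dy

    C = np.einsum('...ki,...kj->...ij', F, F)   # C = F^T F
    evals, evecs = np.linalg.eigh(C)            # C is symmetric
    # Square roots of the eigenvalues are the principal deformation magnitudes
    # (ellipse half-axes); < 1 means compression, > 1 means stretch.
    stretches = np.sqrt(np.clip(evals, 0.0, None))
    return C, stretches, evecs


if __name__ == "__main__":
    u = np.zeros((4, 4, 2))
    u[..., 0] = 0.1 * np.arange(4)[None, :]     # uniform 10% stretch along x
    _, s, _ = deformation_tensors(u)
    print(s[2, 2])                               # approx [1.0, 1.1]
```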
[0037] Additionally, the computation module 360 computes the LDP as a set of deformation-displacement pairs. The LDP may be represented mathematically as
[0038] D = {(C_{x,t}, u_{x,t})},    (2)
where x denotes a pixel in the shape-normalized neutral face image of the subject and t is an index. An order for the elements in the set D may not be imposed in one embodiment. The index t is used to refer to different deformed states of the face, and u_{x,t} excludes rigid head motion. In one embodiment, the LDP generation module 106 generates an LDP for the training video 102 and the facial expression video 104. In one instance, an individual's LDP includes the individual's head motion information, including but not limited to a head tilt or a gesture.
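As a data structure, an LDP of this form can be held as parallel arrays indexed by deformed state t and pixel x. The layout below is one convenient choice for illustration; the class and field names are not from the patent.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class LocalDeformationProfile:
    """Set of deformation-displacement pairs, D = {(C_{x,t}, u_{x,t})}.

    C: (T, H, W, 2, 2) deformation tensors per deformed state t and pixel x.
    u: (T, H, W, 2) displacement fields (rigid head motion already removed).
    roi: (H, W) boolean mask of the region of interest.
    """
    C: np.ndarray
    u: np.ndarray
    roi: np.ndarray

    def pairs_at(self, y, x):
        """All (C, u) pairs observed at pixel (y, x) across the deformed states."""
        return list(zip(self.C[:, y, x], self.u[:, y, x]))
```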
[0039] FIG. 4 illustrates an LDP comparison module 108. The LDP comparison module 108 compares the LDPs of the training video 102 and the facial expression video 104 at each pixel in the image. The LDP comparison module 108 includes a deformation computation module 410, a motion computation module 420 and a verification module 430.
[0040] In one embodiment, the LDP comparison module 108 compares two LDPs at a particular pixel. If the training video LDP (TV-LDP) is matched with a facial expression video LDP (FE-LDP), for example, for each TV-LDP displacement u a closest FE-LDP displacement is found. The LDP comparison module 108 then measures the similarity between their corresponding deformation tensors C. The deformation similarity s_d and the motion similarity s_m are computed as weighted averages of the local deformation similarity and the local motion similarity, which are measured on each pixel respectively:

s_d = Σ_{x∈Ω} w(x) s_d(x),    s_m = Σ_{x∈Ω} w(x) s_m(x),    (3)

w(x) = s_m(x) / Σ_{x'∈Ω} s_m(x'),    (4)

where x denotes a pixel in the region of interest Ω (illustrated in FIG. 6B). The normalized local motion similarity serves as the weight.
[0041] The deformation computation module 410 computes the deformation similarity, s_d, representing the difference in deformation patterns. In one embodiment, the deformation computation module 410 defines a function ψ for comparing two deformation patterns as the normalized overlap of the ellipses that represent them,

ψ(C_1, C_2) = 2A / (A_1 + A_2),    (5)

where A_1 and A_2 are the areas that represent the two deformation patterns C_1 and C_2, respectively, and A is the area of overlap of the two ellipses after being translated to be concentric. FIG. 6D shows an example of A_1 608, A_2 610 and A 612.
[0042] In one instance, to match LDP A, represented as D_A = {(C^A_{x,s}, u^A_{x,s})}, against LDP B, represented as D_B = {(C^B_{x,t}, u^B_{x,t})}, on a pixel x, the deformation computation module 410 finds, for each local motion in D_A, the most similar local motion in D_B (on the same pixel). Mathematically, for each index s the best match is

t*(s) = arg max_t φ(u^A_{x,s}, u^B_{x,t}),

with

φ(u_1, u_2) = (1 - r) exp(-r / σ_1),    (8)

r = |u_1 - u_2| / (|u_1| + |u_2|),    (9)

p(u_1, u_2) = 1 - exp(-(|u_1| + |u_2|) / σ_2),    (10)

where |·| denotes the l2-norm; r is a commonly used relative measurement of vector difference; σ_1 alters the value of r so that a larger difference is penalized more severely; and the factor in equation (10) is a penalty for small displacement vectors. As such, when the displacement is small, the deformation pattern is also slight and thus does not provide much personal characteristic. σ_1 and σ_2 are two parameters; in one embodiment, they may be set to 0.3 and 1.0 respectively.
φ is the motion similarity between two displacement vectors; 0 ≤ φ ≤ 1, and a bigger value indicates a higher motion similarity. In one instance, the motion similarity can be considered as the confidence of the deformation similarity measurement. Thus, in such an instance, the deformation computation module 410 converts the motion similarity scores to normalized weights,

ω(s) = φ(u^A_{x,s}, u^B_{x,t*(s)}) / Σ_{s'} φ(u^A_{x,s'}, u^B_{x,t*(s')}),

and the local deformation similarity at pixel x is computed as a weighted average of the matched deformation comparisons,

s_d(x) = Σ_s ω(s) ψ(C^A_{x,s}, C^B_{x,t*(s)}).
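The per-pixel comparison just described can be sketched as follows. Because the exact forms of ψ and φ are only partially given above, the rasterized overlap ratio, the exponential weighting, and the parameter handling below are assumptions for illustration; only the overall structure (ellipse overlap for deformation similarity, motion-similarity-weighted averaging of matched pairs) follows the description. The pairs_at output of the LDP data structure shown earlier is a natural input for this function.

```python
import numpy as np


def ellipse_overlap_similarity(C1, C2, extent=4.0, n=201):
    """Deformation similarity psi: overlap of the two concentric deformation
    ellipses {p : p^T C^{-1} p <= 1}, computed by rasterization (assumed form).
    The grid extent should exceed the largest expected stretch."""
    xs = np.linspace(-extent, extent, n)
    X, Y = np.meshgrid(xs, xs)
    P = np.stack([X, Y], axis=-1)                     # (n, n, 2) grid of points

    def inside(C):
        q = np.einsum('...i,ij,...j->...', P, np.linalg.inv(C), P)
        return q <= 1.0

    A1, A2 = inside(C1), inside(C2)
    A = A1 & A2
    return 2.0 * A.sum() / (A1.sum() + A2.sum())


def motion_similarity(u1, u2, sigma1=0.3, sigma2=1.0):
    """Motion similarity phi in [0, 1] (assumed form): penalizes the relative
    difference r and down-weights very small displacements."""
    u1, u2 = np.asarray(u1, float), np.asarray(u2, float)
    n1, n2 = np.linalg.norm(u1), np.linalg.norm(u2)
    r = np.linalg.norm(u1 - u2) / (n1 + n2 + 1e-12)
    return (1.0 - r) * np.exp(-r / sigma1) * (1.0 - np.exp(-(n1 + n2) / sigma2))


def local_similarities(pairs_a, pairs_b):
    """For one pixel, match each (C, u) pair of LDP A to the most similar motion
    in LDP B, then return the motion-weighted deformation similarity s_d(x)
    and an average motion similarity s_m(x)."""
    d_sims, m_sims = [], []
    for C_a, u_a in pairs_a:
        phis = [motion_similarity(u_a, u_b) for _, u_b in pairs_b]
        j = int(np.argmax(phis))                       # best-matching local motion in B
        m_sims.append(phis[j])
        d_sims.append(ellipse_overlap_similarity(C_a, pairs_b[j][0]))
    w = np.asarray(m_sims) / (np.sum(m_sims) + 1e-12)  # normalized motion similarity as weights
    return float(np.dot(w, d_sims)), float(np.mean(m_sims))
```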
[0043] The motion computation module 420 computes the motion similarity between two LDPs at each pixel or key point on the neutral face of a video frame. In one embodiment, the LDP comparison module 108 compares the head motion information in two LDPs to determine the identity of an individual in a facial expression video 104. As described in the specification, head motion may include a head tilt, a gesture, etc., that may be characteristic of an individual. Given two LDPs D_A = {(C^A_{x,s}, u^A_{x,s})} and D_B = {(C^B_{x,t}, u^B_{x,t})}, the motion computation module 420 may measure the local motion similarity s_m(x) on a pixel x from the pairwise motion similarities φ(u^A_{x,s}, u^B_{x,t}) of the displacement vectors observed at that pixel.
[0044] The verification module 430 identifies whether the same individual is in both the training video 102 and the facial expression video 104. In one embodiment, the verification module 430 computes a verification score by multiplying the overall motion similarity and deformation similarity scores:

s(D_A, D_B) = s_m(D_A, D_B) · s_d(D_A, D_B),    (15)

where s_m(D_A, D_B) denotes the motion similarity measured with D_A against D_B, and similarly s_d(D_A, D_B) denotes the deformation similarity measured with D_A against D_B. s(D_A, D_B) is the verification score of matching D_A against D_B, with 0 ≤ s(D_A, D_B) ≤ 1.0. A higher score denotes a higher similarity in identity of the individuals in each video represented by each LDP. In one embodiment, if s(D_A, D_B) ≥ θ, subject A is considered to be the same person as subject B. In one instance, the parameter θ is determined based on users' requirements. If s(D_A, D_B) < θ, subject A is considered to be a different person from subject B.
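A compact sketch of this final aggregation is given below, combining per-pixel similarity maps into the overall scores of equations (3)-(4) and the verification score of equation (15). The threshold value, map layout, and function name are illustrative assumptions.

```python
import numpy as np


def verification_score(sd_map, sm_map, roi_mask, theta=0.5):
    """Combine per-pixel similarities into a verification decision.

    sd_map, sm_map: (H, W) local deformation / motion similarity maps.
    roi_mask: (H, W) boolean region-of-interest mask.
    theta: acceptance threshold chosen to suit the application (assumed value).
    """
    sm = np.where(roi_mask, sm_map, 0.0)
    w = sm / (sm.sum() + 1e-12)                              # normalized motion similarity weights (eq. 4)
    s_d = float((w * np.where(roi_mask, sd_map, 0.0)).sum()) # weighted deformation similarity (eq. 3)
    s_m = float((w * sm).sum())                              # weighted motion similarity (eq. 3)
    s = s_m * s_d                                            # verification score (eq. 15)
    return s, s >= theta


if __name__ == "__main__":
    H, W = 160, 128
    roi = np.zeros((H, W), dtype=bool)
    roi[40:120, 30:100] = True
    sd = np.full((H, W), 0.9)                                # similar deformation patterns
    sm = np.full((H, W), 0.8)                                # confident, similar motions
    score, same_person = verification_score(sd, sm, roi)
    print(round(score, 3), same_person)                      # 0.72 True
```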
Method of LDP Generation and Comparison
[0045] FIG. 5 is a flow chart of a process for generating a local deformation profile in accordance with an embodiment of the invention. The process starts by receiving at least one facial motion video 510. As described in reference to FIG. 1, facial motion videos include a training video and a facial expression video. Each video may or may not include the same individual. Additionally, each video includes a neutral face wherein the individual in the video is not performing any expression. The neutral face may be flagged in a video clip or may be the first frame in the video clip. The process finds 520 key pixels on the neutral face by using a face detection and localization algorithm. FIG. 6A illustrates key pixels 604 on a neutral face of an individual in a video frame. In one embodiment, the pixels are in a region of interest on the individual's face. FIG. 6B illustrates regions of interest 606 on a face in accordance with an embodiment. As illustrated in the figure, the area of interest 606 excludes the eyes, chin and forehead. The region of interest excludes these regions because they may cause distortion in an LDP. The process removes 530 any head motion recorded in the video. As such, the process can remove video frames that may provide distorted LDPs. The process resizes 540 the video frames such that they are the same size. The images may also be cropped such that the head occupies a similar area within each video frame. The process also tracks 550 each pixel on the neutral face throughout the image sequence to obtain its displacement in each frame. In one embodiment, an optical flow estimation method may be used to track a pixel, including the Lucas-Kanade optical flow estimation with pyramidal refinement described by B. D. Lucas and T. Kanade, "An iterative image registration technique with an application to stereo vision," in Proc. of the 1981 DARPA Image Understanding Workshop, pages 121-130, April 1981. Other models may be used to track the pixels in other embodiments. The process normalizes 560 the displacement fields defined on the neutral face using a transformation to a mean face. The mean face may include an average of pixel locations throughout the image sequence. The process uses the normalized video frames to construct 570 an LDP for each video. As described in the specification above in reference to FIG. 3, the LDP may be represented as a set of deformation-displacement pairs. The process ends once the LDP for a received 510 video is constructed.
[0046] FIG. 7 is a flow chart of a process for matching local deformation profiles in accordance with an embodiment of the invention. The process starts by receiving 710 a plurality of LDPs. As described in the specification above, an LDP is a set of deformation-displacement pairs represented by {(C_{x,t}, u_{x,t})}, where x denotes a pixel in a shape-normalized neutral face image in a video clip. For each pixel, the process calculates 720 a local deformation similarity score. The local deformation similarity score may include a measure of motion similarity between the LDPs. Additionally, the process calculates 730 a local motion similarity score at each pixel. In one embodiment, the local motion similarity score is computed as a weighted average. The process calculates 740 a verification score indicating whether the same individual is in two or more videos represented by the LDPs. In one embodiment, a high verification score indicates a high likelihood that the same individual is in both facial motion video clips. Conversely, a low verification score may indicate a low likelihood that the same individual is in both facial motion videos represented by the plurality of LDPs received 710 by the process.
[0047] An advantage of the embodiments disclosed herein is that the system and the method provide a way to determine the identity of an individual in a video clip. The video clip may be a part of security surveillance video, an advertisement video, etc. As such, the embodiments may be applied to identify an individual in a variety of situations, including but not limited to providing surveillance security, providing individual specific advertising, etc. Additionally, the embodiments described herein may be used to identify an individual's sentiment or expression, allowing a system to perform sentiment analysis on a video feed.
Summary
[0048] The foregoing description of the embodiments of the invention has been presented for the purpose of illustration; it is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Persons skilled in the relevant art can appreciate that many modifications and variations are possible in light of the above disclosure.
[0049] Some portions of this description describe the embodiments of the invention in terms of algorithms and symbolic representations of operations on information. These algorithmic descriptions and representations are commonly used by those skilled in the data processing arts to convey the substance of their work effectively to others skilled in the art. These operations, while described functionally, computationally, or logically, are understood to be implemented by computer programs or equivalent electrical circuits, microcode, or the like. Furthermore, it has also proven convenient at times to refer to these arrangements of operations as modules, without loss of generality. The described operations and their associated modules may be embodied in software, firmware, hardware, or any combinations thereof.
[0050] Any of the steps, operations, or processes described herein may be performed or implemented with one or more hardware or software modules, alone or in combination with other devices. In one embodiment, a software module is implemented with a computer program product comprising a computer-readable medium containing computer program code, which can be executed by a computer processor for performing any or all of the steps, operations, or processes described.
[0051] Embodiments of the invention may also relate to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, and/or it may comprise a general-purpose computing device selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a tangible computer readable storage medium or any type of media suitable for storing electronic instructions, and coupled to a computer system bus. Furthermore, any computing systems referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability.
[0052] Finally, the language used in the specification has been principally selected for readability and instructional purposes, and it may not have been selected to delineate or circumscribe the inventive subject matter. It is therefore intended that the scope of the invention be limited not by this detailed description, but rather by any claims that issue on an application based hereon. Accordingly, the disclosure of the embodiments of the invention is intended to be illustrative, but not limiting, of the scope of the invention, which is set forth in the following claims.

Claims

What is claimed is:
1. A method for identifying an individual in a video clip based on the
individual's local deformation profile, the method comprising:
detecting a face in the video clip;
identifying a neutral facial expression on a video frame within the video clip;
identifying at least one pixel within the neutral face video frame;
tracking an identified pixel in at least two frames of the video clip to obtain its displacement in the frames;
generating a local deformation profile including a deformation and displacement pair based on the displacement of each pixel;
comparing the local deformation profile with an individual's local deformation profile by comparing differences in motion and deformation between the two local deformation profiles; and
identifying an individual based on the comparing between the local deformation profiles.
2. The method of claim 1, wherein detecting a face in the video clip comprises using a face detection and localization algorithm.
3. The method of claim 1, wherein the neutral facial expression is a first frame of the video clip.
4. The method of claim 1, further comprising removing rigid head motion from the video clip.
5. The method of claim 1, further comprising cropping the face region of video frames within the video clip.
6. The method of claim 1, wherein the pixel within the neutral face is computed from a mean of neutral faces in a dataset.
7. The method of claim 1, wherein the pixel is within a region of interest of a face, the region of interest excluding eyes, a forehead and a chin of a face in a video frame.
8. The method of claim 1, wherein the pixel is tracked using an optical flow estimation method.
9. The method of claim 1, wherein the pixel tracking method is a Lucas-Kanade optical flow estimation with pyramidal refinement.
10. The method of claim 1, further comprising warping displacement fields of each pixel on the neutral face using a transformation to normalize the face shape to a mean face shape.
11. The method of claim 1, wherein a large difference in displacement vectors associated with motion between the two local deformation profiles indicates a low confidence in the deformation similarity score.
12. The method of claim 1, wherein a large difference in deformation patterns suggests a difference in the identity of the individuals represented by each local deformation profile.
13. The method of claim 1, further comprising computing a verification score by combining the motion similarity score and the deformation similarity score to determine the identity of an individual in the video clip.
14. The method of claim 1, further comprising:
measuring a head motion in the video clip;
including the measured head motion in the local deformation profile; and
comparing the head motion in the local deformation profile with an individual's local deformation profile to identify the individual.
15. A computer program product for identifying an individual in a video clip based on the individual's local deformation profile, the computer program product comprising a computer-readable storage medium containing computer program code for:
detecting a face in the video clip;
identifying a neutral facial expression on a video frame within the video clip;
identifying at least one pixel within the neutral face video frame;
tracking an identified pixel in at least two frames of the video clip to obtain its displacement in the frames;
generating a local deformation profile including a deformation and displacement pair based on the displacement of each pixel;
comparing the local deformation profile with an individual's local deformation profile by comparing differences in motion and deformation between the two local deformation profiles; and
identifying an individual based on the comparing between the local deformation profiles.
16. The computer program product of claim 15, wherein detecting a face in the video clip comprises using a face detection and localization algorithm.
17. The computer program product of claim 15, wherein the neutral facial expression is a first frame of the video clip.
18. The computer program product of claim 15, further comprising a computer-readable storage medium containing computer program code for removing rigid head motion from the video clip.
19. The computer program product of claim 15, further comprising a computer-readable storage medium containing computer program code for cropping the face region of video frames within the video clip.
20. The computer program product of claim 15, wherein the pixel within the neutral face is computed from a mean of neutral faces in a dataset.
21. The computer program product of claim 15, wherein the pixel is within a region of interest of a face, the region of interest excluding eyes, a forehead and a chin of a face in a video frame.
22. The computer program product of claim 15, wherein the pixel is tracked using an optical flow estimation method.
23. The computer program product of claim 15, wherein the pixel tracking method is a Lucas-Kanade optical flow estimation with pyramidal refinement.
24. The computer program product of claim 15, further comprising a computer-readable storage medium containing computer program code for warping displacement fields of each pixel on the neutral face using a transformation to normalize the face shape to a mean face shape.
25. The computer program product of claim 15, wherein a large difference in displacement vectors associated with motion between the two local deformation profiles indicates a low confidence in the deformation similarity score.
26. The computer program product of claim 15, wherein a large difference in deformation patterns suggests a difference in the identity of the individuals represented by each local deformation profile.
27. The computer program product of claim 15, further comprising a computer-readable storage medium containing computer program code for computing a verification score by combining the motion similarity score and the deformation similarity score to determine the identity of an individual in the video clip.
28. The computer program product of claim 15, further comprising a computer-readable storage medium containing computer program code for:
measuring a head motion in the video clip;
including the measured head motion in the local deformation profile; and
comparing the head motion in the local deformation profile with an individual's local deformation profile to identify the individual.
PCT/SG2011/000208 2010-06-11 2011-06-10 General motion-based face recognition WO2011155902A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US35381210P 2010-06-11 2010-06-11
US61/353,812 2010-06-11

Publications (1)

Publication Number Publication Date
WO2011155902A1 true WO2011155902A1 (en) 2011-12-15

Family

ID=45098315

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SG2011/000208 WO2011155902A1 (en) 2010-06-11 2011-06-10 General motion-based face recognition

Country Status (1)

Country Link
WO (1) WO2011155902A1 (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5774591A (en) * 1995-12-15 1998-06-30 Xerox Corporation Apparatus and method for recognizing facial expressions and facial gestures in a sequence of images
US6492986B1 (en) * 1997-06-02 2002-12-10 The Trustees Of The University Of Pennsylvania Method for human face shape and motion estimation based on integrating optical flow and deformable models
US6879709B2 (en) * 2002-01-17 2005-04-12 International Business Machines Corporation System and method for automatically detecting neutral expressionless faces in digital images
US20090195545A1 (en) * 2008-01-31 2009-08-06 University of Southern California Facial Performance Synthesis Using Deformation Driven Polynomial Displacement Maps
WO2009128784A1 (en) * 2008-04-14 2009-10-22 Xid Technologies Pte Ltd Face expressions identification

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Face Detection", 21 December 2008 (2008-12-21), Retrieved from the Internet <URL:http://web.archive.or//web/20081221095054/http://en.wikipedia.ort/wiki/Face_detection> [retrieved on 20110808] *
"Facial Recognition System", 25 January 2010 (2010-01-25), Retrieved from the Internet <URL:http://web.archive.or//web/20100125071637/http://en.wikipedia.org_/wiki/Facial_recognition_system> [retrieved on 20110808] *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015115681A1 (en) * 2014-01-28 2015-08-06 영남대학교 산학협력단 Method and apparatus for recognising expression using expression-gesture dictionary
KR101549645B1 (en) 2014-01-28 2015-09-03 영남대학교 산학협력단 Method and apparatus of recognizing facial expression using motion dictionary
US10068131B2 (en) 2014-01-28 2018-09-04 Industry-Academic Cooperation Foundation, Yeungnam University Method and apparatus for recognising expression using expression-gesture dictionary
US10055821B2 (en) 2016-01-30 2018-08-21 John W. Glotzbach Device for and method of enhancing quality of an image
US10783617B2 (en) 2016-01-30 2020-09-22 Samsung Electronics Co., Ltd. Device for and method of enhancing quality of an image
CN106127139A (en) * 2016-06-21 2016-11-16 东北大学 A kind of dynamic identifying method of MOOC course middle school student's facial expression
CN106127139B (en) * 2016-06-21 2019-06-25 东北大学 A kind of dynamic identifying method of MOOC course middle school student's facial expression
CN108875633A (en) * 2018-06-19 2018-11-23 北京旷视科技有限公司 Expression detection and expression driving method, device and system and storage medium
CN108875633B (en) * 2018-06-19 2022-02-08 北京旷视科技有限公司 Expression detection and expression driving method, device and system and storage medium
WO2021139475A1 (en) * 2020-01-08 2021-07-15 上海商汤临港智能科技有限公司 Facial expression recognition method and apparatus, device, computer-readable storage medium and computer program product

Similar Documents

Publication Publication Date Title
Zhang et al. Fast and robust occluded face detection in ATM surveillance
Kumano et al. Pose-invariant facial expression recognition using variable-intensity templates
US9224060B1 (en) Object tracking using depth information
Sánchez et al. Differential optical flow applied to automatic facial expression recognition
Shreve et al. Automatic expression spotting in videos
Ahmad et al. Human action recognition using multi-view image sequences
Cherla et al. Towards fast, view-invariant human action recognition
Hernández-Vela et al. BoVDW: Bag-of-Visual-and-Depth-Words for gesture recognition
US20140056490A1 (en) Image recognition apparatus, an image recognition method, and a non-transitory computer readable medium thereof
Lee et al. Time-sliced averaged motion history image for gait recognition
Akakın et al. Robust classification of face and head gestures in video
Ouanan et al. Facial landmark localization: Past, present and future
WO2011155902A1 (en) General motion-based face recognition
Unzueta et al. Efficient generic face model fitting to images and videos
Hayat et al. Evaluation of spatiotemporal detectors and descriptors for facial expression recognition
US9349038B2 (en) Method and apparatus for estimating position of head, computer readable storage medium thereof
Koutras et al. Estimation of eye gaze direction angles based on active appearance models
Krisandria et al. Hog-based hand gesture recognition using Kinect
Patil et al. Features classification using support vector machine for a facial expression recognition system
Khademi et al. Relative facial action unit detection
EP2998928B1 (en) Apparatus and method for extracting high watermark image from continuously photographed images
Holt et al. Static pose estimation from depth images using random regression forests and hough voting
Zhang et al. Using multiple views for gait-based gender classification
Talha et al. Human action recognition from body-part directional velocity using hidden Markov models
Manolova et al. Facial expression classification using supervised descent method combined with PCA and SVM

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11792754

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11792754

Country of ref document: EP

Kind code of ref document: A1