WO2007114547A1 - Method for generating intuitive quasi-eigen faces - Google Patents

Method for generating intuitive quasi-eigen faces

Info

Publication number
WO2007114547A1
WO2007114547A1 (PCT/KR2006/004423)
Authority
WO
WIPO (PCT)
Prior art keywords
basis
expression
hand
quasi
generated
Prior art date
Application number
PCT/KR2006/004423
Other languages
French (fr)
Original Assignee
Seoul National University Industry Foundation
Kim, Ig-Jae
Ko, Hyeong-Seok
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Seoul National University Industry Foundation, Kim, Ig-Jae, Ko, Hyeong-Seok filed Critical Seoul National University Industry Foundation
Publication of WO2007114547A1 publication Critical patent/WO2007114547A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 - Animation
    • G06T13/20 - 3D [Three Dimensional] animation
    • G06T13/40 - 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings

References

  • VLASIC, D., BRAND, M., PFISTER, H., AND POPOVIC, J. 2005. Face transfer with multilinear models. In Proceedings of SIGGRAPH 2005, ACM Press, 426-433.

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Processing Or Creating Images (AREA)

Abstract

In blendshape-based facial animation, two main approaches are used to create the key expressions: manual sculpting and statistically-based techniques. Hand-generated expressions have the advantage of being intuitively recognizable, thus allowing animators to use conventional keyframe control. However, they may cover only a fraction of the expression space, resulting in large reproduction/animation errors. On the other hand, statistically-based techniques produce eigenfaces that give minimal reproduction errors but are visually non-intuitive. In the invention, the applicants propose a technique to convert a given set of hand-generated key expressions into another set of so-called quasi-eigen faces. The resulting expressions resemble the original hand-generated expressions, but have expression space coverages more like those of statistically generated expression bases. The effectiveness of the proposed technique is demonstrated by applying it to hand-generated expressions.

Description

METHOD FOR GENERATING INTUITIVE QUASI-EIGEN FACES
Technical Field
The present invention relates to a method for generating intuitive quasi-eigen faces.
Background Art
1 Introduction
The facial expressions of animated characters play a central role in delivering the story. For an animation studio, therefore, the ability to generate expressive and plausible facial expressions is a critical skill. Nevertheless, as yet no standard procedure for generating facial expressions has been established; when facial animations are required, a whole gamut of approaches is mobilized, ranging from labor-intensive production work to state-of-the-art technical support. The invention proposes a small but very useful innovation in the area of 3D facial animation, which can be adopted in a wide range of facial animation productions.
Probably the most popular approach currently used in facial animation productions is the so-called blendshape technique, which synthesizes expressions by taking a linear combination of a set of pre-modeled expressions. The applicants call this expression set the expression basis. Many commercial animation packages such as Maya and Softimage support blendshape-based facial animation. The technique the applicants develop in the present invention is for systems of this type. A fundamental question in developing a blendshape-based facial animation system is how to control the expressions.
One approach is to let the animators manually control the weights assigned to each member of the expression basis set in order to produce the desired expression sequences.
Another popular approach, which can be taken provided facial motion capture is available, is to set up the system so that the facial animation is driven by a human performance.
In this approach, if the basis is taken from the human subject, in principle the original performance can be reproduced. Although such reproduction may not be needed in an animation production, it has theoretical significance to developers because it can be utilized as a benchmark of a blendshape technique: if a method can accurately reproduce the original performance, it can produce other facial animations accurately. The present work assumes that the facial animation system is operated by performance-driven control, but also assumes that manual control can be added whenever the results need to be edited.
Another fundamental issue that must be resolved when developing a blendshape technique is how to form the expression basis. The present work is related to this issue. A casual approach practiced by many animation studios is to use an expression basis comprised of manually modeled, intuitively recognizable key expressions. The basis should contain sufficient elements to span the desired range of expressions. The term "basis" is usually reserved for an independent set of elements that spans the entire expression space. In the disclosure of the present application, however, the applicants use the term to loosely mean a set of expressions from which linear combinations are taken. An advantage of using a hand-generated basis is that the combinations of basis elements produce somewhat predictable results. A disadvantage of this approach is that the linear combinations may cover only a portion of the full range of facial expressions. When the system is used to reproduce a human performance, the lack of coverage manifests as reproduction errors.
In the context of blendshape-based reproduction of human performances, another well-established approach to obtaining the expression basis is to use principal component analysis (PCA). In this method, a set of mutually-orthogonal principal components that spans the expression space is generated by statistical analysis of performance data. Because this technique gives quantitative information on the coverage of each component, by selecting the dominant components the applicants can form an expression basis whose coverage is predictable and greater than that of manually generated bases, resulting in more accurate reproduction of the original performance. A drawback of this approach is that the expressions corresponding to the principal components are visually non-intuitive. Hence animators cannot predict the expression that will be produced by a particular linear combination. Here the applicants propose a new approach to basis generation that gives coverages comparable to those of statistically generated bases while at the same time having basis elements with meaningful shapes. This approach is based on the observation that a hand-generated expression can be modified such that the resulting expression remains visually close to the original one but its coverage over the expression space increases. It is also based on the relaxation that the basis elements need not be strictly orthogonal to each other; a non-orthogonal set can still span the expression space.
2 Related work
A large number of techniques for synthesizing human expressions have been proposed since the pioneering work of [Parke 1972]. Facial expression can be viewed as resulting from the coordination of (mechanical) components such as the jaw, muscles, and skin.
Various researchers have explored physically based techniques for synthesizing facial expressions [Waters 1987; Terzopoulos and Waters 1990; Terzopoulos and Waters 1993; Lee et al. 1995; Wu et al. 1995; Essa and Pentland 1997; Kahler et al. 2001; Choe et al. 2001; Sifakis et al. 2005]. The present work takes a different approach: expressions are synthesized by taking linear combinations of several key expressions. Thus, instead of looking into the physics of facial components, the applicants utilize facial capture data to obtain realistic results. In this section, the applicants review previous work, with a focus on blendshape techniques and performance-driven facial animation techniques.
The blendshape technique has been widely used for expression synthesis. To generate human expressions in realtime, [Kouadio et al. 1998] used linear combinations of a set of key expressions, where the weight assigned to each expression was determined from live capture data. [Pighin et al. 1998] created a set of photorealistic textured 3D expressions from photographs of a human subject, and used the blendshape technique to create smooth transitions between those expressions. [Blanz and Vetter 1999] introduced a morphable model that could generate a 3D face from a 2D photograph by taking a linear combination of faces in a 3D example database. To increase the covering range of the key expressions, [Choe and Ko 2001] let animators sculpt expressions corresponding to the isolated actuation of individual muscles and then synthesized new expressions by taking linear combinations of them.
A critical determinant of the quality of the expression generated by blendshape-based synthesis is the covering range of the key expressions being used. [Chuang 2002] used a PCA-based procedure to identify a set of key expressions that guarantees a certain coverage. However, the resulting principal components did not correspond to intuitively meaningful human expressions. [Chao et al. 2003] proposed another basis generation technique based on independent component analysis. In the key expression set produced using this approach, the differences among the elements were more recognizable than those generated by [Chuang 2002]; however, the individual elements in the set still did not accurately represent familiar/vivid human expressions. As a result, conventional keyframe control is not easy using this approach. To enable separate modifications of specific parts of the face in a blendshape-based system, [Joshi et al. 2003] proposed automatic segmentation of each key expression into meaningful blend regions. [Williams 1990] introduced a performance-driven approach to synthesize human expressions. This approach utilizes the human ability to make faces and has been shown to be quite effective for controlling high-DOF facial movements. The uses of this approach for blendshape-based reproduction of facial performances were introduced above.
[Noh and Neumann 2001; Pyun et al. 2003; Na and Jung 2004; Wang et al. 2004] proposed techniques to retarget performance data to synthesize the expressions of other characters. Recently, [Vlasic et al. 2005] developed a multilinear model that can transfer expressions/speech of one face to other faces.
Another class of performance-driven facial animation techniques is the speech-driven techniques. [Bregler et al. 1997; Brand 1999; Ezzat et al. 2002; Chao et al. 2004; Chang and Ezzat 2005; Deng et al. 2005] are several representative works exploring this research direction.
Accordingly, a need for a method for generating intuitive quasi-eigen faces has been present for a long time. This invention is directed to solve these problems and satisfy the long-felt need.
Disclosure of Invention
The present invention contrives to solve the disadvantages of the prior art.
An object of the invention is to provide a method for generating intuitive quasi-eigen faces to form the expression basis for blendshape-based facial animation systems. Another object of the invention is to provide a method for generating intuitive quasi-eigen faces in which the resulting expressions resemble the given expressions.
Still another object of the invention is to provide a method for generating intuitive quasi-eigen faces, which yields significantly reduced reconstruction errors compared to hand-generated bases.
3 Problem Description
Let $v = v(t) = [v_1^T, \dots, v_N^T]^T$ represent the dynamic shape of the 3D face model at time t. It is a triangular mesh consisting of N vertices, where $v_i$ represents the 3D position of the i-th vertex. The applicants assume that the geometry $v^0$ of the neutral face is given. The applicants also assume that motion capture data are given in a 3N×L matrix $\Xi = [v(1), \dots, v(L)]$, where L is the duration of the motion capture in number of frames. The applicants are interested in finding a set of facial expressions, linear combinations of which span Ξ. Let $E^H = \{e_1^H, \dots, e_n^H\}$ be the hand-generated expression basis that is given by the animator. Here, n is the number of elements and $e_i^H$ is the geometry of the i-th element. Let $\bar{e}_i^H$ represent the displacement of $e_i^H$ from the neutral face, i.e., $\bar{e}_i^H = e_i^H - v^0$. In the invention, the applicants also call the set of displacements $\bar{E}^H = \{\bar{e}_1^H, \dots, \bar{e}_n^H\}$ the (hand-generated) expression basis if it does not cause any confusion. When the weights $w_i^H$ are given, the applicants synthesize the expression v by

$v = v^0 + \sum_{i=1}^{n} w_i^H \bar{e}_i^H.$   (1)

A potential problem of the hand-generated expression basis $E^H$ is that linear combinations of the basis elements may not span Ξ. The goal of the invention is to develop a procedure to convert $E^H$ into another basis $E^{QE} = \{e_1^{QE}, \dots, e_n^{QE}\}$, such that the new basis spans Ξ and each element $e_i^{QE}$ visually resembles the corresponding element $e_i^H$ in $E^H$. The applicants call the elements in the new basis quasi-eigen faces.
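The following Python sketch illustrates the blendshape synthesis of Equation 1 under the definitions above. It is a minimal illustration only; the function name, the array layout (vertices stacked into a 3N-vector), and the toy data are assumptions, not part of the disclosure.

```python
import numpy as np

def synthesize_expression(v0, E_bar_H, w_H):
    """Blendshape synthesis v = v0 + sum_i w_i^H * e_bar_i^H (Equation 1).

    v0      : (3N,) neutral-face geometry, vertices stacked as [x1, y1, z1, x2, ...]
    E_bar_H : (n, 3N) hand-generated displacement basis; row i is e_bar_i^H = e_i^H - v0
    w_H     : (n,) blending weights
    """
    v0 = np.asarray(v0, dtype=float)
    E_bar_H = np.asarray(E_bar_H, dtype=float)
    w_H = np.asarray(w_H, dtype=float)
    # Linear combination of the displacement elements, added back to the neutral face.
    return v0 + w_H @ E_bar_H

# Toy usage on a 3-vertex face (N = 3, so 3N = 9) with two basis elements.
if __name__ == "__main__":
    v0 = np.zeros(9)
    E_bar_H = np.vstack([np.full(9, 0.1),            # element 1: uniform displacement
                         np.linspace(0.0, 1.0, 9)])  # element 2: graded displacement
    v = synthesize_expression(v0, E_bar_H, w_H=[0.5, 0.2])
    print(v.shape)  # (9,)
```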
According to the present invention, a method for generating intuitive quasi-eigen faces includes steps of: a) representing a dynamic shape of a three dimensional face model with a vector; b) making a hand-generated expression basis; c) converting the hand-generated expression basis into a new expression basis, wherein the new expression basis is a set of quasi-eigen faces; and d) synthesizing expressions with the quasi-eigen faces.
The new expression basis includes a plurality of quasi-eigen faces, wherein the linear combinations of the quasi-eigen faces cover the motion capture data, and wherein each of the quasi-eigen faces resembles a corresponding element of the hand-generated expression basis.
The dynamic shape of a three dimensional face model is represented by a vector $v = [v_1^T, \dots, v_N^T]^T$, where $v_i$ represents the 3D position of the i-th vertex and N is the number of vertices. The vector v corresponds to a triangular mesh of N vertices and forms the facial mesh data.
The expression v is synthesized by $v = v^0 + \sum_{i=1}^{n} w_i^H \bar{e}_i^H$, where the neutral face $v^0$ and the weights $w_i^H$ are given. The hand-generated expression basis is represented by $E^H = \{e_1^H, \dots, e_n^H\}$, where n is the number of elements of the hand-generated expression basis and $e_i^H$ is the geometry of the i-th element.
The corresponding set of displacements is represented by $\bar{E}^H = \{\bar{e}_1^H, \dots, \bar{e}_n^H\}$, where $\bar{e}_i^H = e_i^H - v^0$ represents the displacement of $e_i^H$ from the neutral face.
The step of converting the hand-generated expression basis into a new expression basis includes steps of: a) forming an approximate hyperplane out of the motion capture data or the facial mesh data; and b) identifying the orthogonal axes that span the hyperplane. The step of identifying the orthogonal axes that span the hyperplane includes a step of using a principal component analysis (PCA).
The motion capture data are given in a 3N×L matrix $\Xi = [v(1), \dots, v(L)]$, where N is the number of vertices of the mesh representing the motion capture data and L is the duration of the motion capture in number of frames. The hyperplane is formed by the cloud of points obtained by plotting each of the expressions in Ξ in the 3N-dimensional space.
The step of identifying the orthogonal axes that span the hyperplane includes steps of: a) taking the mean of v, $\mu = \frac{1}{L}\sum_{i=1}^{L} v(i)$, where the summation is taken over the entire motion capture data Ξ; b) obtaining a centered point cloud $D = [\tilde{v}(1), \dots, \tilde{v}(L)]$, where $\tilde{v}(i) = v(i) - \mu$; and c) constructing the covariance matrix C using $C = \frac{1}{L} D D^T$.
C is a symmetric positive-definite matrix with positive eigenvalues $\lambda_1, \dots, \lambda_{3N}$ in order of magnitude, with $\lambda_1$ being the largest.
The method may further include a step of obtaining the eigenfaces from the m eigenvectors $E^{PCA} = \{e_1^{PCA}, \dots, e_m^{PCA}\}$ corresponding to $\{\lambda_1, \dots, \lambda_m\}$, the principal axes.
The coverage of the principal axes is given by $\left(\sum_{i=1}^{m}\lambda_i\right) / \left(\sum_{i=1}^{3N}\lambda_i\right)$.
The method may further include a step of converting the hand-generated expression basis into the quasi-eigen basis, the set of quasi-eigen faces, with the eigenfaces ready.
The step of converting the hand-generated expression basis into the quasi-eigen basis includes steps of: a) computing $w_{ij}^{PCA\text{-}to\text{-}QE} = (e_i^H - \mu) \cdot e_j^{PCA}$, where i ranges over all the hand-generated elements and j ranges over all the principal axes; and b) obtaining the quasi-eigen faces by $e_i^{QE} = \mu + \sum_{j=1}^{m} w_{ij}^{PCA\text{-}to\text{-}QE}\, e_j^{PCA}$.
The method may further include a step of synthesizing a general expression by the linear combination $v = v^0 + \sum_{i=1}^{n} w_i^{QE}\,\bar{e}_i^{QE}$. The weights $w_i^{QE}$ take on both positive and negative values when the eigenfaces are used.
The method may further include a step of taking $e_i^H$ to represent the full actuation of a single expression muscle with other muscles left relaxed, for intrinsically ruling out the possibility of two hand-generated elements having almost identical shapes.
The method may further include steps of: a) looking at the matrix $W^{PCA\text{-}to\text{-}QE} = (w_{ij}^{PCA\text{-}to\text{-}QE})$; b) determining whether $e_j^{PCA}$ is missing in the quasi-eigen basis by testing whether $\sum_i |w_{ij}^{PCA\text{-}to\text{-}QE}|$ is less than a threshold ε; c) augmenting the basis with $e_j^{PCA}$; and d) notifying the animator regarding the missing eigenface $e_j^{PCA}$.
The method may further include a step of retargeting the facial expressions by feeding a predetermined expression weight vector to a deformation basis. The predetermined expression weight vector is obtained by minimizing $\|v^* - v\|^2 = \sum_{j=1}^{N} \| d_j^* - \sum_{i=1}^{n} w_i^{QE}\,\bar{e}_{ij}^{QE} \|^2$, where $d_j^*$ and $\bar{e}_{ij}^{QE}$ are the displacements of the j-th vertex of $v^*$ and $e_i^{QE}$, respectively, from $v^0$.
The advantages of the present invention are: (1) the method provides a method for generating intuitive quasi-eigen faces to form the expression basis for blendshape-based facial animation systems; (2) the resulting expressions resemble the given expressions; and (3) the method makes it possible to significantly reduce reconstruction errors compared to hand-generated bases.
Although the present invention is briefly summarized, a fuller understanding of the invention can be obtained from the following drawings, detailed description and appended claims.
Brief Description of the Drawings
These and other features, aspects and advantages of the present invention will become better understood with reference to the accompanying drawings, wherein: Fig. 1 shows an analogical drawing of the motion capture data and the bases;
Fig. 2 shows a side-by-side comparison of the hand-generated basis and the quasi-eigen basis in four selected elements; Fig. 3 shows a reconstruction of the original performance;
Fig. 4 shows a comparison of the reconstruction errors; Fig. 5 is a flow chart showing a method according to the present invention; Fig. 6 is a flow chart showing a step of converting the hand-generated expression basis into a new expression basis; Fig. 7 is a flow chart showing a step of identifying orthogonal axes that span the hyperplane; Fig. 8 is another flow chart showing a step of identifying orthogonal axes that span the hyperplane in detail;
Fig. 9 is a flow chart showing a step of converting into the quasi-eigen basis; and
Fig. 10 is a flow chart showing a step of converting into the quasi-eigen basis in detail.
Best Mode for Carrying Out the Invention
4 Obtaining Quasi-Eigen Faces
If the facial vertices $v_1, \dots, v_N$ are allowed to freely move in 3D space, then v will form a 3N-dimensional vector space. Let us call this space the mathematical expression space E. However, normal human expressions involve a narrower range of deformation. If the applicants plot each expression in Ξ as a point in 3N-dimensional space, the point cloud forms an approximate hyperplane. PCA is designed to identify the orthogonal axes that span the hyperplane. The analogical situation is shown in Fig. 1. The 3D coordinate system can be viewed as E, the dots as forming the Ξ-hyperplane, and the solid perpendicular axes as the principal components. The arrows 10 stand for the PCA basis, the arrows 20 for the hand-generated basis, and the arrows 30 for the quasi-eigen basis.
The procedure for obtaining the quasi-eigen faces is based on the principal components. Finding the principal components requires the point cloud to be centered at the origin. Let $\mu = \frac{1}{L}\sum_{i=1}^{L} v(i)$ be the mean of v, where the summation is taken over the entire motion capture data Ξ. Then, the applicants can obtain a centered point cloud $D = [\tilde{v}(1), \dots, \tilde{v}(L)]$, where $\tilde{v}(i) = v(i) - \mu$. Now the applicants construct the covariance matrix C using

$C = \frac{1}{L} \sum_{i=1}^{L} \tilde{v}(i)\,\tilde{v}(i)^T = \frac{1}{L} D D^T.$   (2)

C is a symmetric positive-definite matrix, and hence has positive eigenvalues. Let $\lambda_1, \dots, \lambda_{3N}$ be the eigenvalues of C in order of magnitude, with $\lambda_1$ being the largest. The m eigenvectors $E^{PCA} = \{e_1^{PCA}, \dots, e_m^{PCA}\}$ corresponding to $\{\lambda_1, \dots, \lambda_m\}$ are the principal axes the applicants are looking for, the coverage of which is given by $\left(\sum_{i=1}^{m}\lambda_i\right)/\left(\sum_{i=1}^{3N}\lambda_i\right)$. The facial expressions in $E^{PCA}$ are called the eigenfaces. Since the coverage is usually very close to unity even for small m (e.g., in the case of the motion capture data used in this disclosure, m = 18 covers 99.5% of Ξ), the above procedure provides a powerful means of generating an expression basis that covers a given set Ξ of expressions. A problem of this approach is that, even though the eigenfaces have mathematical significance, they do not represent recognizable human expressions.
With the eigenfaces thus generated, the applicants can now describe the method to convert the hand-generated expression basis into the quasi-eigen basis (i.e., the set of quasi-eigen faces). This method is based on the observation that the hand-generated elements may lie out of the hyperplane. In the analogical situation drawn in Fig. 1, consider two hand-generated expressions (two 3D vectors in the figure) that do not lie on the hyperplane. Although any linear combination of the two expressions should be allowed in principle, if the applicants stipulate that the result must lie on the hyperplane to form a valid expression, then the ratio between the weights assigned to the two expressions must be fixed. This means that the two expressions, rather than spanning a two-dimensional range of expressions, in fact only cover a one-dimensional range, resulting in significant coverage loss. Linear combinations that are formed disregarding this constraint will be positioned out of the hyperplane, which explains why reproductions of an original performance generated using a hand-generated basis usually contain large errors.
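A minimal sketch of the PCA step described above, assuming the capture frames are stacked column-wise in a 3N×L NumPy array and that the covariance uses the 1/L normalization reconstructed in Equation 2; the function name and the coverage threshold are illustrative assumptions.

```python
import numpy as np

def compute_eigenfaces(Xi, coverage_target=0.995):
    """PCA of the motion capture matrix Xi (3N x L, one frame per column).

    Returns the mean face mu, the principal axes E_PCA (3N x m, orthonormal
    columns), and their eigenvalues. m is chosen as the smallest number of
    components whose coverage (cumulative eigenvalue fraction) reaches the
    target.
    """
    Xi = np.asarray(Xi, dtype=float)
    L = Xi.shape[1]
    mu = Xi.mean(axis=1)                 # mean expression over all frames
    D = Xi - mu[:, None]                 # centered point cloud
    C = (D @ D.T) / L                    # covariance matrix (3N x 3N); for very
                                         # large meshes an SVD of D is preferable
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]    # sort eigenvalues in decreasing order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    coverage = np.cumsum(eigvals) / eigvals.sum()
    m = int(np.searchsorted(coverage, coverage_target)) + 1
    return mu, eigvecs[:, :m], eigvals[:m]
```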
A simple fix to the above problem would be to project the hand-generated elements onto the hyperplane; the quasi-eigen faces the applicants are looking for in this disclosure are, in fact, the projections of the hand-generated basis elements. To find the projection of a hand-generated element onto each principal axis, the applicants first compute

$w_{ij}^{PCA\text{-}to\text{-}QE} = (e_i^H - \mu) \cdot e_j^{PCA},$   (3)

where i ranges over all the hand-generated elements, and j ranges over all the principal axes. Now, the applicants can obtain the quasi-eigen faces by

$e_i^{QE} = \mu + \sum_{j=1}^{m} w_{ij}^{PCA\text{-}to\text{-}QE}\, e_j^{PCA}.$   (4)
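The projection of Equations 3 and 4 can be written compactly as follows. This is a sketch under the assumptions that the principal axes are stored as orthonormal columns and that the hand-generated faces are full geometries rather than displacements; the function name is illustrative.

```python
import numpy as np

def quasi_eigen_faces(E_H, mu, E_PCA):
    """Project hand-generated faces onto the PCA hyperplane.

    Implements Equations 3 and 4 above:
        w_ij   = (e_i^H - mu) . e_j^PCA
        e_i^QE = mu + sum_j w_ij * e_j^PCA

    E_H   : (n, 3N) hand-generated faces, one full geometry per row
    mu    : (3N,)   mean face of the motion capture data
    E_PCA : (3N, m) orthonormal principal axes (columns)
    Returns the (n, 3N) quasi-eigen faces and the (n, m) coefficient matrix W.
    """
    E_H = np.asarray(E_H, dtype=float)
    W = (E_H - mu) @ E_PCA           # Equation 3: coefficient on each principal axis
    E_QE = mu + W @ E_PCA.T          # Equation 4: recombination on the hyperplane
    return E_QE, W
```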
With the quasi-eigen basis, a general expression is synthesized by the linear combination $v = v^0 + \sum_{i=1}^{n} w_i^{QE}\,\bar{e}_i^{QE}$. The applicants would note that in most blendshape-based facial animation systems the weights are always positive and, in some cases, are further constrained to lie within the range [0, 1] in order to prevent extrapolations. When the eigenfaces are used, however, the weights $w_i^{QE}$ are supposed to take on both positive and negative values. The weights of the quasi-eigen basis should be treated like those of the eigenfaces: even though the quasi-eigen faces are not orthogonal, their ingredients come from an orthogonal basis. Allowing negative weights obviously increases the accuracy of the reproduction of a performance. Although keyframe animators may not be familiar with negative weights, allowing weights to take on negative values can significantly extend the range of allowed expressions.
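To illustrate the remark about negative weights, the following self-contained comparison on synthetic data (not from the disclosure; it requires SciPy) fits the same target displacement with unconstrained least squares and with non-negative least squares. The unconstrained fit, which permits negative weights, attains a residual no larger than the constrained one.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(1)
A = rng.normal(size=(30, 6))    # columns play the role of displacement basis elements
d = rng.normal(size=30)         # target displacement to reproduce

w_free, *_ = np.linalg.lstsq(A, d, rcond=None)   # weights may be negative
w_nonneg, _ = nnls(A, d)                          # weights constrained to be >= 0

print("residual with negative weights allowed:", np.linalg.norm(A @ w_free - d))
print("residual with non-negative weights only:", np.linalg.norm(A @ w_nonneg - d))
```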
The projection steps of Equations 3 and 4 will modify the hand-generated elements. The applicants need to assess whether the new expressions are visually close to the original ones. If a hand-generated expression lies on the hyperplane (or is contained in the motion capture data), then it will not be modified by the projection process. When a hand-generated expression is out of the hyperplane, however, the projection will introduce a minimal Euclidean modification to it. Although the scale for visual differences is not the same as that of Euclidean distance, small Euclidean distances usually correspond to small visual changes.
Another aspect that must be checked is the coverage of $E^{QE}$. In the analogical case shown in Fig. 1, when there are two 3D vectors that do not coincide, it is highly likely that the projections of those vectors span the hyperplane. Similarly, if the number of hand-generated expressions is equal to or larger than m, it is highly probable that the projections of those expressions will cover the hyperplane Ξ. Below the applicants introduce several measures that can help avoid potential (but very rare) degenerate cases.
Preventive Treatments: The applicants can guide the sculpting work of the animator so as to avoid overlap among the hand-generated expressions. For example, the applicants can take $e_i^H$ to represent the full actuation of a single expression muscle with all other muscles left relaxed, which intrinsically rules out the possibility of two hand-generated elements having almost identical shapes [Choe and Ko 2001]. For this purpose, animators can refer to reference books showing drawings of the expressions corresponding to isolated actuation of individual muscles [Faigin 1990]. The facial action coding system [Ekman and Friesen 1978] can also be of great assistance in constructing non-overlapping hand-generated expression bases.
Post-Treatments: In spite of the above preventive treatments, the quasi-eigen basis may leave out a PCA axis. Situations of this type can be identified by looking at the matrix $W^{PCA\text{-}to\text{-}QE} = (w_{ij}^{PCA\text{-}to\text{-}QE})$: if $\sum_i |w_{ij}^{PCA\text{-}to\text{-}QE}|$ is less than a threshold ε, the applicants conclude that $e_j^{PCA}$ is missing in the quasi-eigen basis. In such a case, the applicants can simply augment the basis with $e_j^{PCA}$, or can explicitly notify the animator regarding the missing eigenface $e_j^{PCA}$ and let him/her make a (minimal) modification to it so that its projection can be added to the keyframing basis as well as the quasi-eigen basis.
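A sketch of the post-treatment check described above, assuming the coefficient matrix from Equation 3 is available as an (n × m) array; the threshold value is an illustrative assumption.

```python
import numpy as np

def missing_eigenfaces(W, eps=1e-3):
    """Report principal axes left out of the quasi-eigen basis.

    W is the (n, m) matrix of projection coefficients from Equation 3; axis j
    is flagged as missing when sum_i |W[i, j]| < eps.
    """
    W = np.asarray(W, dtype=float)
    column_mass = np.abs(W).sum(axis=0)
    return np.flatnonzero(column_mass < eps)   # indices of missing PCA axes
```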
5 Experiments
To test the proposed method, the applicants obtained a set of facial capture data and modeled a hand-generated expression basis based on the actuation of the expression muscles. The applicants followed the procedure described in the previous section and produced the quasi-eigen basis from the hand-generated expression basis.
5.1 Capturing the Facial Model and Performance
The applicants captured the performance of an actress using a Vicon optical system. Eight cameras tracked 66 markers attached to her face, and an additional 7 markers that were attached to her head to track the gross motion, at a rate of 120 frames per second. The total duration of the motion capture was L = 35,000 frames. The applicants constructed the 3D facial model using a Cyberware 3D scanner. The applicants established the correspondence between the 3D marker positions and the geometrical model of the face using the technique that was introduced by Pighin et al. [1998].
5.2 Preparing the Training Data Ξ
The motion capture data Ξ is a sequence of facial geometries. The applicants convert the marker positions obtained for each frame into a facial mesh. For this, the applicants apply an interpolation technique that is based on the radial basis function. The technique gives the 3D displacements of the vertices that should be applied to the neutral face.
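A minimal sketch of radial-basis-function scattering of marker displacements onto the mesh, in the spirit of the interpolation step just described. The linear kernel, the absence of a polynomial term, and the small regularization are simplifying assumptions; the disclosure does not specify these details.

```python
import numpy as np

def rbf_vertex_displacements(markers_neutral, markers_frame, vertices_neutral,
                             kernel=lambda r: r):
    """Scatter captured marker displacements onto all mesh vertices with an RBF.

    markers_neutral  : (M, 3) marker positions on the neutral face
    markers_frame    : (M, 3) captured marker positions for one frame
    vertices_neutral : (N, 3) mesh vertex positions of the neutral face
    Returns (N, 3) per-vertex displacements to add to the neutral face.
    """
    d = markers_frame - markers_neutral                     # marker displacements
    # Pairwise marker distances define the RBF system matrix; the tiny ridge
    # term keeps the solve well behaved.
    A = kernel(np.linalg.norm(
        markers_neutral[:, None, :] - markers_neutral[None, :, :], axis=-1))
    weights = np.linalg.solve(A + 1e-9 * np.eye(len(A)), d)
    # Evaluate the interpolant at every mesh vertex.
    B = kernel(np.linalg.norm(
        vertices_neutral[:, None, :] - markers_neutral[None, :, :], axis=-1))
    return B @ weights
```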
5.3 Preparing the Hand-Generated Expression Basis
The applicants performed PCA on the data obtained in Section 5.2. Covering 99.5% of Ξ corresponded to taking the first m = 18 principal components. The applicants asked animators to sculpt a hand-generated expression basis $E^H = \{e_1^H, \dots, e_{18}^H\}$ with 18 elements. If the elements are clustered in E, then their projections will also be clustered in the hyperplane; this will result in poor coverage, requiring the hand-generation of additional basis elements. To reduce the hand-work of the animators, the applicants guided the sculpting work by considering the size and location of the expression muscles, so that each basis element corresponds to the facial shape when a single expression muscle is fully actuated and all other muscles relaxed.
In the experiment, the applicants made 18 hand-generated expressions: six elements are for the actuation of muscles in the upper region, and 12 are for muscles in the lower region.
5.4 Obtaining the Quasi-Eigen Faces
Starting from the given hand-generated basis, the applicants followed the steps described in Section 4 to obtain the quasi-eigen faces. A selection of the quasi-eigen expressions is shown in Fig. 2 along with the corresponding hand-generated expressions. Comparison of the quasi-eigen and hand-generated expressions verifies that although projecting a hand-generated expression onto the hyperplane may involve non-negligible geometry modifications, the original visual impression is preserved.
Running the preprocessing steps, which included the PCA on 6,000 frames of training data, took 158 minutes on a PC with an Intel Pentium 4 3.2 GHz CPU and an Nvidia GeForce 6800 GPU. After the training was complete, the applicants could create quasi-eigen faces in realtime.
5.5 Analysis
Now, the applicants approximate each frame of Ξ with a linear combination of the quasi-eigen faces. Let $v = v^0 + \sum_{i=1}^{n} w_i^{QE}\,\bar{e}_i^{QE}$ be the reconstruction of a frame, and let $v^* = v^0 + d^*$ be the original expression of Ξ, where $d^*$ is the 3N-dimensional displacement vector from the neutral expression. The applicants find the n-dimensional weight vector $w^{QE} = [w_1^{QE}, \dots, w_n^{QE}]^T$ by minimizing

$\|v^* - v\|^2 = \sum_{j=1}^{N} \Big\| d_j^* - \sum_{i=1}^{n} w_i^{QE}\,\bar{e}_{ij}^{QE} \Big\|^2,$   (5)

where $d_j^*$ and $\bar{e}_{ij}^{QE}$ are the displacements of the j-th vertex of $v^*$ and $e_i^{QE}$, respectively, from $v^0$. The applicants solve Equation 5 using quadratic programming, which required about 0.007 seconds per frame. To evaluate the accuracy of the reproduction, the applicants used a relative error metric α, reported below as a percentage.
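A sketch of the per-frame weight fit of Equation 5. The text solves this problem with quadratic programming; here plain unconstrained least squares is used as a simplification, and the residual norm is returned in place of the percentage error metric α, whose exact definition is not reproduced here.

```python
import numpy as np

def fit_frame_weights(d_star, E_bar_QE):
    """Per-frame weight fit for Equation 5: find w minimizing
    || d_star - E_bar_QE^T w ||^2.

    d_star   : (3N,)  displacement of the captured frame from the neutral face
    E_bar_QE : (n, 3N) quasi-eigen displacement basis (row i is e_bar_i^QE)
    Returns the weight vector and the residual norm of the reconstruction.
    """
    E_bar_QE = np.asarray(E_bar_QE, dtype=float)
    d_star = np.asarray(d_star, dtype=float)
    w, *_ = np.linalg.lstsq(E_bar_QE.T, d_star, rcond=None)
    residual = np.linalg.norm(d_star - E_bar_QE.T @ w)
    return w, residual
```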
For comparison, the above analysis was also performed using the bases $E^H$ and $E^{PCA}$. The α values obtained using the three bases were $\alpha^{QE} = 0.72\%$, $\alpha^H = 5.2\%$, and $\alpha^{PCA} = 0.62\%$.
The results thus indicate that, in terms of coverage, $E^{QE}$ is slightly inferior to $E^{PCA}$ and far better than $E^H$.
Qualitative comparison of the reconstructions can be made in the accompanying video. Fig. 3 shows still images extracted from the video. The three images in the left column are taken from the original motion capture. The middle and right columns show the reconstruction of those frames with $E^H$ and $E^{QE}$, respectively.
Fig. 4 visualizes the errors introduced during the reconstruction (left: for $E^H$, right: for $E^{QE}$) for the frame shown at the top of Fig. 3. The dots represent the captured marker positions and their positions in the reconstructed result. The applicants can clearly see that reconstruction with the quasi-eigen faces (shown on the right) produces smaller errors than reconstruction with the hand-generated basis (shown on the left). To verify the reconstruction quality in more dynamic facial movements, the applicants also experimented with the reconstruction of a speech sequence. The video shows that the reconstructed speech with $E^{QE}$ is more realistic than that with $E^H$.
6 Conclusion
In the present invention, the applicants have presented a new method for generating expression bases for blendshape-based facial animation systems. Animation studios commonly generate such bases by manually modeling a set of key expressions. However, hand-generated expressions may contain components that are not part of human expressions, and reconstruction/animation by taking linear combinations of these expressions may produce reconstruction errors or unrealistic results. On the other hand, statistically-based techniques can produce high-fidelity expression bases, but the basis elements are not intuitively recognizable. Here the applicants have proposed a method for generating so-called quasi-eigen faces, which have intuitively recognizable shapes but significantly reduced reconstruction errors compared to hand-generated bases.
In the present invention the applicants have focused on the reproduction of captured performances. This approach was taken based on the applicants' experience in facial animation that, in most cases, technically critical problems reside in the analysis part rather than in the synthesis part. If the analysis is performed accurately, then expression synthesis, whether it be reproduction or animation of other characters, will be accurate. The experiments performed in the present work showed that the proposed technique produces basis elements that are visually recognizable as typical human expressions and can significantly reduce the reconstruction error. Even though the applicants did not demonstrate it in the disclosure, the proposed technique can be effectively used for synthesizing expressions of characters other than the captured subject.
The proposed technique is an animator-in-the-loop method whose results are sensitive to the hand-generated expressions provided by the animator. If the animator provides inadequate expressions, the projection will not improve the result. The applicants have found that a muscle-based approach to the modeling of the hand-generated expressions, as used in Section 5, effectively extends the coverage of the basis. Application of the proposed projection to hand-generated elements of this type reduces the reconstruction error. The muscle-based approach is not, however, the only way to obtain non-overlapping hand-generated expressions. Better guidance may be developed in the future to help the animator sculpt intuitively meaningful but non-overlapping faces.
According to the present invention, a method for generating intuitive quasi-eigen faces includes steps of: a) representing a dynamic shape of a three dimensional face model with a vector (S1000); b) making a hand-generated expression basis (S2000); c) converting the hand-generated expression basis into a new expression basis (S3000), wherein the new expression basis is a set of quasi-eigen faces; and d) synthesizing expressions with the quasi-eigen faces (S4000), as shown in Fig. 5.
The new expression basis includes a plurality of quasi-eigen faces, wherein the linear combinations of the quasi-eigen faces cover the motion capture data, and wherein each of the quasi-eigen faces resembles a corresponding element of the hand-generated expression basis.
The dynamic shape of a three dimensional face model is represented by a vector v = [v_1^T, ..., v_N^T]^T, where v_i represents the 3D position of the i-th vertex and N is the number of vertices. The vector v is represented by a triangular mesh including N vertices, and the vector forms the facial mesh data. The expression v is synthesized by v = v^0 + Σ_{i=1}^{n} w_i^H ẽ_i^H, where the neutral face v^0 and the weights w_i^H are given. The hand-generated expression basis is represented by E^H = {e_1^H, ..., e_n^H}, where n is the number of elements of the hand-generated expression basis and e_i^H is the geometry of the i-th element. The set of displacements is represented by Ẽ^H = {ẽ_1^H, ..., ẽ_n^H}, where ẽ_i^H represents the displacement of e_i^H from the neutral face, ẽ_i^H = e_i^H - v^0.
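As a minimal illustration of the blendshape synthesis described above, the following Python/NumPy sketch forms v = v^0 + Σ_i w_i^H ẽ_i^H from a neutral face and a hand-generated basis; the function and variable names and the stacked-vector layout are assumptions for illustration, not part of the disclosure.

    import numpy as np

    def synthesize_blendshape(v0, E_H, w):
        # v0  : (3N,) neutral face, vertex coordinates stacked as [x1, y1, z1, x2, y2, z2, ...]
        # E_H : (n, 3N) hand-generated expression basis, absolute geometries e_i^H as rows
        # w   : (n,) blending weights w_i^H
        E_tilde = E_H - v0           # displacements from the neutral face, e~_i^H = e_i^H - v0
        return v0 + w @ E_tilde      # v = v0 + sum_i w_i^H * e~_i^H

    # Toy example with N = 2 vertices and n = 2 basis elements.
    v0 = np.zeros(6)
    E_H = np.array([[1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
                    [0.0, 0.0, 0.0, 0.0, 1.0, 0.0]])
    print(synthesize_blendshape(v0, E_H, np.array([0.5, 0.25])))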
The step (S3000) of converting the hand-generated expression basis into a new expression basis includes steps of: a) forming an approximate hyperplane out of the motion capture data or the facial mesh data (S3100); and b) identifying the orthogonal axes that span the hyperplane (S3200), as shown in Fig. 6. The step (S3200) of identifying the orthogonal axes that span the hyperplane includes a step (S3210) of using principal component analysis (PCA), as shown in Fig. 7.
The motion capture data are given in a 3N×L matrix Ξ = [v(1), ..., v(L)], where N is the number of vertices of the mesh representing the motion capture data and L is the duration of the motion capture in number of frames. The hyperplane is formed by the cloud of points obtained by plotting each of the expressions in Ξ in the 3N-dimensional space.
The step (S3200) of identifying the orthogonal axes that span the hyperplane includes steps of: a) taking the mean of v, μ = (1/L) Σ_{t=1}^{L} v(t), where the summation is taken over the entire motion capture data Ξ (S3220); b) obtaining a centered point cloud D = {ṽ(1), ..., ṽ(L)}, where ṽ(t) = v(t) - μ (S3230); and c) constructing the covariance matrix C using C = (1/L) Σ_{t=1}^{L} ṽ(t) ṽ(t)^T (S3240), as shown in Fig. 7.
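A minimal NumPy sketch of steps S3220 to S3240 follows; the variable names (Xi for the 3N×L data matrix, mu for the mean, C for the covariance) are illustrative assumptions.

    import numpy as np

    def mean_and_covariance(Xi):
        # Xi : (3N, L) motion capture matrix, one captured expression v(t) per column
        L = Xi.shape[1]
        mu = Xi.mean(axis=1)          # mean face (S3220)
        D = Xi - mu[:, None]          # centered point cloud, v~(t) = v(t) - mu (S3230)
        C = (D @ D.T) / L             # covariance matrix C (S3240)
        return mu, D, C

When 3N is large, the same principal axes are usually obtained more cheaply from a singular value decomposition of the centered data D than by forming the 3N×3N matrix C explicitly.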
C is a symmetric positive-definite matrix with positive eigenvalues λ_1, ..., λ_{3N}, listed in order of magnitude with λ_1 being the largest.
The method may further include a step (S3250) of obtaining the eigen faces from the m eigenvectors E^PCA = {e_1^PCA, ..., e_m^PCA} corresponding to {λ_1, ..., λ_m}, the principal axes, as shown in Fig. 8.
The coverage of the principal axes is given by (Σ_{i=1}^{m} λ_i) / (Σ_{i=1}^{3N} λ_i).
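The eigen faces and the coverage test can be sketched as follows; the covariance matrix C is assumed to have been built as above, and the coverage target of 0.99 is an illustrative choice, not a value taken from the disclosure.

    import numpy as np

    def eigenfaces(C, coverage=0.99):
        # C : (3N, 3N) covariance matrix
        lam, V = np.linalg.eigh(C)           # eigenvalues in ascending order, orthonormal columns
        lam, V = lam[::-1], V[:, ::-1]       # reorder so that lambda_1 is the largest
        ratio = np.cumsum(lam) / lam.sum()   # coverage of the first m principal axes
        m = int(np.searchsorted(ratio, coverage)) + 1
        E_pca = V[:, :m].T                   # (m, 3N) eigenfaces e_j^PCA as rows
        return E_pca, lam[:m]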
The method may further include a step (S3260) of converting the hand-generated expression basis into the quasi-eigen basis, the set of quasi-eigen faces, once the eigenfaces are ready, as shown in Fig. 8.
The step (S3260) of converting the hand-generated expression basis into the quasi-eigen basis includes steps of: a) computing w_{ij}^{PCA→QE} = (e_i^H - μ) · e_j^PCA, where i ranges over all the hand-generated elements and j ranges over all the principal axes (S3261); and b) obtaining the quasi-eigen faces by e_i^QE = μ + Σ_{j=1}^{m} w_{ij}^{PCA→QE} e_j^PCA (S3262), as shown in Fig. 9 and Fig. 10.
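A sketch of steps S3261 and S3262 under the same assumed conventions (E_H holds the hand-generated geometries e_i^H as rows, E_pca the orthonormal eigenfaces e_j^PCA as rows, mu the mean face):

    import numpy as np

    def quasi_eigen_faces(E_H, E_pca, mu):
        # W[i, j] = (e_i^H - mu) . e_j^PCA : projection weights (S3261)
        W = (E_H - mu) @ E_pca.T
        # e_i^QE = mu + sum_j W[i, j] * e_j^PCA : projection onto the PCA hyperplane (S3262)
        E_QE = mu + W @ E_pca
        return E_QE, W

Because the rows of E_pca are orthonormal, each quasi-eigen face is the orthogonal projection of the corresponding hand-generated expression onto the hyperplane spanned by the principal axes, so it stays close to the sculpted shape while lying inside the captured expression space.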
The method may further include a step (S3263) of synthesizing a general expression by the linear combination v = μ + Σ_{i=1}^{m} w_i^PCA e_i^PCA. The weights w_i^PCA take on both positive and negative values when the eigenfaces are used.
The method may further include a step (S3264) of taking e_i^H to represent the full actuation of a single expression muscle with the other muscles left relaxed, for intrinsically ruling out the possibility of two hand-generated elements having almost identical shapes, as shown in Fig. 10.
The method may further include steps of: a) looking at the matrix W^{PCA→QE} = (w_{ij}^{PCA→QE}) (S3265); b) determining whether e_j^PCA is missing in the quasi-eigen basis by testing whether Σ_i |w_{ij}^{PCA→QE}| is less than a threshold ε (S3266); c) augmenting the basis with e_j^PCA (S3267); and d) notifying the animator regarding the missing eigenface e_j^PCA (S3268), as shown in Fig. 10.
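Steps S3265 to S3268 can be sketched as follows; the threshold value, the simple printout used to notify the animator, and the choice to store the added eigenfaces as absolute geometries mu + e_j^PCA are assumptions for illustration.

    import numpy as np

    def augment_missing_eigenfaces(E_QE, W, E_pca, mu, eps=1e-3):
        # W[i, j] is the projection weight of the i-th hand-generated element on e_j^PCA (S3265).
        weak = np.abs(W).sum(axis=0) < eps        # S3266: principal axes the quasi-eigen basis barely covers
        for j in np.flatnonzero(weak):
            print(f"eigenface {j} is missing from the quasi-eigen basis")   # S3268: notify the animator
        extra = mu + E_pca[weak]                  # S3267: augment the basis with the missing eigenfaces
        return np.vstack([E_QE, extra])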
The method may further include a step (S3269) of retargeting the facial expressions by feeding a predetermined expression weight vector to a deformation basis, as shown in Fig. 10. The predetermined expression weight vector is obtained by minimizing |v* - v|^2 = Σ_j |d_j* - Σ_i w_i^QE ẽ_{ij}^QE|^2, where d_j* and ẽ_{ij}^QE are the displacements of the j-th vertex of v* and e_i^QE, respectively, from v^0.
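The expression weight vector of step S3269 can be estimated with an ordinary linear least-squares solve over the quasi-eigen displacements; the sketch below, with assumed names, minimizes |v* - v|^2.

    import numpy as np

    def estimate_weights(v_star, v0, E_QE):
        # v_star : (3N,) captured or target expression
        # v0     : (3N,) neutral face
        # E_QE   : (n, 3N) quasi-eigen faces, absolute geometries
        D = (E_QE - v0).T                            # (3N, n) displacement columns e~_i^QE
        d = v_star - v0                              # displacement of the target from the neutral face
        w, *_ = np.linalg.lstsq(D, d, rcond=None)    # minimizes |d - D w|^2 = |v* - v|^2
        return w

The recovered weight vector can then be fed to the deformation basis of another character to retarget the expression.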
While the invention has been shown and described with reference to different embodiments thereof, it will be appreciated by those skilled in the art that variations in form, detail, compositions and operation may be made without departing from the spirit and scope of the invention as defined by the accompanying claims.

Claims

1. A method for generating intuitive quasi-eigen faces comprising steps of: a) representing a dynamic shape of a three dimensional face model with a vector; b) making a hand-generated expression basis; c) converting the hand-generated expression basis into a new expression basis, wherein the new expression basis is a set of quasi-eigen faces; and d) synthesizing an expression with the quasi-eigen faces, wherein the new expression basis comprises a plurality of quasi-eigen faces, wherein the linear combinations of the quasi-eigen faces cover motion capture data, wherein each of the quasi-eigen faces resembles a corresponding element of the hand-generated expression basis.
2. The method of claim 1, wherein the dynamic shape of a three dimensional face model is represented by a vector v = [v_1^T, ..., v_N^T]^T, wherein v_i represents the 3D position of the i-th vertex and N is the number of vertices.
3. The method of claim 2, wherein the vector v is represented by a triangular mesh comprising N vertices, and wherein the vector forms facial mesh data.
4. The method of claim 3, wherein the expression v is synthesized by v = v^0 + Σ_{i=1}^{n} w_i^H ẽ_i^H, wherein the neutral face v^0 and the weights w_i^H are given.
5. The method of claim 4, wherein the hand-generated expression basis is represented by E^H = {e_1^H, ..., e_n^H}, wherein n is the number of elements of the hand-generated expression basis and e_i^H is the geometry of the i-th element.
6. The method of claim 5, wherein the set of displacements is represented by Ẽ^H = {ẽ_1^H, ..., ẽ_n^H}, wherein ẽ_i^H represents the displacement of e_i^H from the neutral face, ẽ_i^H = e_i^H - v^0.
7. The method of claim 3, wherein the step of converting the hand-generated expression basis into a new expression basis comprises steps of: a) forming an approximate hyperplane out of the motion capture data or the facial mesh data; and b) identifying the orthogonal axes that span the hyperplane.
8. The method of claim 7, wherein the step of identifying the orthogonal axes that span the hyperplane comprises a step of using principal component analysis (PCA).
9. The method of claim 8, wherein the motion capture data are given in a 3N×L matrix Ξ = [v(1), ..., v(L)], where N is the number of vertices of the mesh representing the motion capture data, and wherein L is the duration of the motion capture in number of frames.
10. The method of claim 7, wherein the hyperplane is formed by the cloud of points obtained by plotting each of the expressions in Ξ in the 3N-dimensional space.
11. The method of claim 7, wherein the step of identifying the orthogonal axes that span the hyperplane comprises steps of: a) taking the mean of v, μ = (1/L) Σ_{t=1}^{L} v(t), where the summation is taken over the entire motion capture data Ξ; b) obtaining a centered point cloud D = {ṽ(1), ..., ṽ(L)}, where ṽ(t) = v(t) - μ; and c) constructing the covariance matrix C using C = (1/L) Σ_{t=1}^{L} ṽ(t) ṽ(t)^T.
12. The method of claim 11, wherein C is a symmetric positive-definite matrix with positive eigenvalues λ_1, ..., λ_{3N} in order of magnitude, with λ_1 being the largest.
13. The method of claim 12, further comprising a step of obtaining the eigen faces from the m eigenvectors E^PCA = {e_1^PCA, ..., e_m^PCA} corresponding to {λ_1, ..., λ_m}, the principal axes.
14. The method of claim 13, wherein the coverage of the principal axes is given by (Σ_{i=1}^{m} λ_i) / (Σ_{i=1}^{3N} λ_i).
15. The method of claim 13, further comprising a step of converting the hand-generated expression basis into the quasi-eigen basis, the set of quasi-eigen faces, with the eigenfaces ready.
16. The method of claim 15, wherein the step of converting the hand-generated expression basis into the quasi-eigen basis comprises steps of: a) computing w_{ij}^{PCA→QE} = (e_i^H - μ) · e_j^PCA, where i ranges over all the hand-generated elements and j ranges over all the principal axes; and b) obtaining the quasi-eigen faces by e_i^QE = μ + Σ_{j=1}^{m} w_{ij}^{PCA→QE} e_j^PCA.
17. The method of claim 16, further comprising a step of synthesizing a general expression by the linear combination v = μ + Σ_{i=1}^{m} w_i^PCA e_i^PCA.
18. The method of claim 17, wherein the weights w_i^PCA take on both positive and negative values when the eigenfaces are used.
19. The method of claim 16, further comprising a step of taking e_i^H to represent the full actuation of a single expression muscle with the other muscles left relaxed, for intrinsically ruling out the possibility of two hand-generated elements having almost identical shapes.
20. The method of claim 16, further comprising steps of: a) looking at the matrix W^{PCA→QE} = (w_{ij}^{PCA→QE}); b) determining whether e_j^PCA is missing in the quasi-eigen basis by testing whether Σ_i |w_{ij}^{PCA→QE}| is less than a threshold ε; c) augmenting the basis with e_j^PCA; and d) notifying the animator regarding the missing eigenface e_j^PCA.
21. The method of claim 16, further comprising a step of retargeting the facial expressions by feeding a predetermined expression weight vector to a deformation basis.
22. The method of claim 21, wherein the predetermined expression weight vector is obtained by minimizing |v* - v|^2 = Σ_j |d_j* - Σ_i w_i^QE ẽ_{ij}^QE|^2, where d_j* and ẽ_{ij}^QE are the displacements of the j-th vertex of v* and e_i^QE, respectively, from v^0.
PCT/KR2006/004423 2006-04-05 2006-10-27 Method for generating intuitive quasi-eigen faces WO2007114547A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US11/398,987 US7535472B2 (en) 2006-04-05 2006-04-05 Method for generating intuitive quasi-eigen faces
US11/398,987 2006-04-05

Publications (1)

Publication Number Publication Date
WO2007114547A1 true WO2007114547A1 (en) 2007-10-11

Family

ID=38563800

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2006/004423 WO2007114547A1 (en) 2006-04-05 2006-10-27 Method for generating intuitive quasi-eigen faces

Country Status (2)

Country Link
US (1) US7535472B2 (en)
WO (1) WO2007114547A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112767453A (en) * 2021-01-29 2021-05-07 北京达佳互联信息技术有限公司 Face tracking method and device, electronic equipment and storage medium

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2006225115B2 (en) 2005-03-16 2011-10-06 Lucasfilm Entertainment Company Ltd. Three- dimensional motion capture
JP4842242B2 (en) * 2006-12-02 2011-12-21 韓國電子通信研究院 Method and apparatus for real-time expression of skin wrinkles during character animation
US8199152B2 (en) * 2007-01-16 2012-06-12 Lucasfilm Entertainment Company Ltd. Combining multiple session content for animation libraries
US8130225B2 (en) * 2007-01-16 2012-03-06 Lucasfilm Entertainment Company Ltd. Using animation libraries for object identification
US8542236B2 (en) * 2007-01-16 2013-09-24 Lucasfilm Entertainment Company Ltd. Generating animation libraries
US8144153B1 (en) 2007-11-20 2012-03-27 Lucasfilm Entertainment Company Ltd. Model production for animation libraries
KR101527408B1 (en) * 2008-11-04 2015-06-17 삼성전자주식회사 System and method for sensing facial gesture
WO2010074786A2 (en) * 2008-12-04 2010-07-01 Total Immersion Software, Inc. System and methods for dynamically injecting expression information into an animated facial mesh
US9142024B2 (en) * 2008-12-31 2015-09-22 Lucasfilm Entertainment Company Ltd. Visual and physical motion sensing for three-dimensional motion capture
US8614714B1 (en) * 2009-12-21 2013-12-24 Lucasfilm Entertainment Company Ltd. Combining shapes for animation
US9196074B1 (en) * 2010-10-29 2015-11-24 Lucasfilm Entertainment Company Ltd. Refining facial animation models
US9508176B2 (en) 2011-11-18 2016-11-29 Lucasfilm Entertainment Company Ltd. Path and speed based character control
WO2015042867A1 (en) * 2013-09-27 2015-04-02 中国科学院自动化研究所 Method for editing facial expression based on single camera and motion capture data
CN108154550B (en) * 2017-11-29 2021-07-06 奥比中光科技集团股份有限公司 RGBD camera-based real-time three-dimensional face reconstruction method
WO2019209431A1 (en) * 2018-04-23 2019-10-31 Magic Leap, Inc. Avatar facial expression representation in multidimensional space

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050063582A1 (en) * 2003-08-29 2005-03-24 Samsung Electronics Co., Ltd. Method and apparatus for image-based photorealistic 3D face modeling

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5844573A (en) * 1995-06-07 1998-12-01 Massachusetts Institute Of Technology Image compression by pointwise prototype correspondence using shape and texture information
US5880788A (en) * 1996-03-25 1999-03-09 Interval Research Corporation Automated synchronization of video image sequences to new soundtracks
US6188776B1 (en) * 1996-05-21 2001-02-13 Interval Research Corporation Principle component analysis of images for the automatic location of control points

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050063582A1 (en) * 2003-08-29 2005-03-24 Samsung Electronics Co., Ltd. Method and apparatus for image-based photorealistic 3D face modeling

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
KIM I.-J. ET AL.: "Intuitive quasi-eigenfaces for facial animation", SUMMER WORKSHOP KCGS (KOREAN COMPUTER GRAPHICS SOCIETY), 3 July 2006 (2006-07-03) - 4 July 2006 (2006-07-04), pages 185 - 191 *
KING S.A. ET AL.: "Creating speech-synchronized animation", IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, vol. 11, no. 3, May 2005 (2005-05-01) - June 2005 (2005-06-01), pages 341 - 352, XP011128391 *
YU ZHANG ET AL.: "Hierarchical modeling of a personalized face for realistic expression animation", MULTIMEDIA AND EXPO, 2002. ICME'02. PROCEEDINGS. 2002 IEEE INTERNATIONAL CONFERENCE, vol. 1, 26 August 2002 (2002-08-26) - 29 August 2002 (2002-08-29), pages 457 - 460, XP010604404 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112767453A (en) * 2021-01-29 2021-05-07 北京达佳互联信息技术有限公司 Face tracking method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
US20070236501A1 (en) 2007-10-11
US7535472B2 (en) 2009-05-19

Similar Documents

Publication Publication Date Title
US7535472B2 (en) Method for generating intuitive quasi-eigen faces
Sifakis et al. Simulating speech with a physics-based facial muscle model
Ezzat et al. Trainable videorealistic speech animation
Chai et al. Vision-based control of 3 D facial animation
Essa et al. Modeling, tracking and interactive animation of faces and heads using input from video
Zhang et al. Geometry-driven photorealistic facial expression synthesis
Chuang et al. Mood swings: expressive speech animation
Chuang et al. Performance driven facial animation using blendshape interpolation
Ersotelos et al. Building highly realistic facial modeling and animation: a survey
Choe et al. Analysis and synthesis of facial expressions with hand-generated muscle actuation basis
US9129434B2 (en) Method and system for 3D surface deformation fitting
KR20120137826A (en) Retargeting method for characteristic facial and recording medium for the same
Oreshkin et al. Motion In-Betweening via Deep $\Delta $-Interpolator
Wampler et al. Dynamic, expressive speech animation from a single mesh
Pei et al. Transferring of speech movements from video to 3D face space
Agianpuye et al. 3d facial expression synthesis: a survey
Gralewski et al. Statistical synthesis of facial expressions for the portrayal of emotion
Chuang Analysis, synthesis, and retargeting of facial expressions
Wu et al. Image Comes Dancing With Collaborative Parsing-Flow Video Synthesis
KR100544684B1 (en) A feature-based approach to facial expression cloning method
Kim et al. Intuitive quasi-eigen faces
Shin et al. Expression synthesis and transfer in parameter spaces
Tang et al. Mpeg4 performance-driven avatar via robust facial motion tracking
Chen et al. 3D Facial Priors Guided Local-Global Motion Collaboration Transforms for One-shot Talking-Head Video Synthesis
Jiang et al. Animating arbitrary topology 3D facial model using the MPEG-4 FaceDefTables

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 06843820

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 06843820

Country of ref document: EP

Kind code of ref document: A1