WO2007118919A1 - Method for generating synthetic-animation images - Google Patents

Method for generating synthetic-animation images

Info

Publication number
WO2007118919A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
model
images
dimensional
image
Prior art date
Application number
PCT/ES2007/000235
Other languages
Spanish (es)
French (fr)
Inventor
Juan Coll Soler
Álvaro UÑA RESA
Original Assignee
Emotique, S.L.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Emotique, S.L.
Publication of WO2007118919A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 Animation
    • G06T13/20 3D [Three Dimensional] animation
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects

Definitions

  • The present invention relates to a method for generating synthetic-animation images for the leisure sector in general, such as the film industry, video games and promotional advertising, among others, wherever the personalization of an image or cinematic animation is of interest.
  • The method for generating synthetic-animation images of this invention has technical features that make it possible to produce synthetic-image audiovisual media which one or more users can personalize, configuring one or more characters of the story or event described in the audiovisual medium with the face and/or body of those users.
  • The method seeks to provide a system that allows personalization of, for example, a film, with the participation of the user, whose image and likeness is easily incorporated into one of the characters.
  • The method also allows the creation of video games and interactive software in which the user can likewise configure one or more characters in his or her image and likeness, the hardware of the machine performing the image synthesis providing a virtual representation of the user in the graphic environment of the game.
  • The method has several phases comprising: acquisition of at least one image of the user; recognition of the user's facial and/or body features; generation of a personalized mathematical three-dimensional model; introduction of the personalized model into a virtual scenario; and synthesis of the modelling images, dumped via an output device.
  • The method is integrated so that the user can configure the final result very simply.
  • A number of important steps are provided in the method that allow considerable scope for interacting with and modifying the working parameters, depending on the result to be obtained by executing the method.
  • The acquisition of at least one image of the user is carried out with a video or stills camera or a similar device, such as a webcam or a mobile-phone camera.
  • These images comprise the user's face, preferably seen from the front, the image being configured as a two-dimensional digital bitmap.
  • The process of recognizing the user's facial and/or body features can be carried out by suitable software, already available on the market, that executes an artificial-intelligence algorithm applied to computer vision, working from the image in static form if its quality provides sufficient contrast.
  • If the camera is connected to the computer that performs the recognition, the recognition can be carried out automatically in real time. In the cases cited above, this recognition runs automatically without the user having to handle any part of the algorithm.
  • The objective of this algorithm is to automate the process of distinguishing between the various objects that may appear in the image, discarding those of no use and focusing on the only object of interest, which is the user's face.
  • Once the region of the image occupied by the face has been identified, its facial features are identified, such as the position and size of the eyes, mouth, nose and others.
  • During this recognition a contour pattern of the face is used, comprising, for example, the eyebrows, eyes, nose, lips, jaw and cheekbones, among others.
  • The software fits this pattern to the image automatically, yielding a personalized displacement of key coordinates with respect to a neutral pattern. This displacement corresponds to the physiognomic characteristics of the user.
  • With the pattern modified by the recognition and the image, it is possible to alter a model, for example of the head, to adapt it to the user's physiognomy and thus superimpose the image as a texture with complete accuracy. The model may also have a beard or moustache if these have been detected in the recognition of the image.
  • The modification can be made in real time.
  • The generated personalized model may be just one portion of the character, such as the head, so that generation of the complete three-dimensional model includes the use of portions of pre-established models in addition to the generated personalized models.
  • An example is the coupling of a personalized model of the user's head to an already designed comic body. This hybrid model is fully manageable in the synthesis of the modelling images.
  • This synthesis or "rendering" of the personalized models, whether complete or hybrid, is performed in a virtual scenario or background, together with whatever pre-established models are needed to obtain the sequence of images that gives rise to the animation.
  • The synthesis can be configured passively: the production software generates a film or sequence according to the script previously established for the animation, which can be dumped to a physical medium for later viewing or sent to a display device, such as a television screen or similar, as a cinematic medium, obtaining one or more film sequences with at least one personalized character.
  • This image synthesis can also occur in real time, allowing the user to intervene in the scene and control his or her character, a method especially suitable for video games.
  • The user can thus have suitable control means for interacting with and altering the characteristics of the personalized model in real time.
  • These interactions may correspond to gestures and actions representing facial emotions, such as laughing, crying or showing anger, or to actions such as speaking, opening and closing the eyes, or body movements.
  • The dump of the successive frames obtained in the image synthesis can be sent to various devices.
  • Digital files can be generated as cinematic sequences to be dumped to physical media (DVD or others) or sent to multimedia devices such as mobile phones.
  • The animation produced can be viewed in real time, for example on a television or cinema screen, or distributed over the Internet, interconnected computer networks or broadcast media.
  • Description of the figures:
  • Figure 1 shows a diagram of the process flow.
  • Figure 2 shows a diagram of the phases of capture and recognition of facial features from the image obtained.
  • Figure 3 shows a diagram of the phases of generation of the three-dimensional mathematical model corresponding to the recognized features of the user.
  • Figure 4 shows a scheme of the synthesis of a scene into which a character with the added personalized model has been introduced, together with a second character consisting of a pre-established model, with user control means for interactive management of the model representing his or her character.
  • The method comprises the following sequence of phases:
  • A software module comprises an artificial-intelligence module applied to computer vision, which performs a recognition (2) of the facial features from a neutral pattern (21) that is fitted into a personalized pattern (14) according to the characteristics of the captured image (12) of the user.
  • Another software module evaluates the personalized pattern (14) obtained and generates the three-dimensional mesh model (11) corresponding to the user's features by deforming a pre-established neutral three-dimensional mesh model (31). During this generation, the surface textures obtained from the acquired image or images (12) are superimposed on said model (11). External modifiers (32) are also included, such as the design of glasses, for example, or physiognomic modifications, such as pointed ears, among others.
  • The personalized head model (11) is coupled to a portion of a pre-established body model (42), constituting a mixed three-dimensional model (6), and is introduced into a virtual scenario (43) together with other pre-established models (41).
  • Control means (7), such as game controls, are provided for the indirect handling of points and curves of the personalized model (6, 11) for the representation of facial gestures and emotions.
  • A dump (5) of the consecutive frames generated by the synthesizing hardware is produced via an output device for display on a television screen. Said frames can also be recorded on physical media, such as DVD or others.

Abstract

Method comprising the phases of: acquisition (1) of at least one frontal image (12) of the user (face and/or full body), preferably by means of a video or stills camera (13); recognition (2) of the user's facial and/or body features on the basis of the acquired images (12); generation (3) of a three-dimensional mesh model (11) corresponding to the user's features, comprising the deformation of a pre-established neutral three-dimensional mesh model (31) according to the recognized features and the superimposition of the images (12) in accordance with the modified structure; introduction of the model (11) into a virtual scenario (43) with other personalized or pre-established models (41); synthesis (4) of the modelling images, with or without interaction of the personalized models (11), pre-established models (41) and scenario (43), for display; and dumping (5) of the successive frames obtained in the image synthesis (4) via an output device.

Description

DESCRIPTION

METHOD FOR GENERATING SYNTHETIC-ANIMATION IMAGES

Object of the invention
The present invention relates to a method for generating synthetic-animation images for the leisure sector in general, such as the film industry, video games and promotional advertising, among others, wherever the personalization of an image or cinematic animation is of interest.
Background of the invention
At present, the use of computers for generating synthetic-animation images has spread notably in the leisure sector.
In both cinematography and the world of video games, the generation of characters and visual scenery from complex mathematical models and virtual representations is already commonplace.
In cinematography, computer-generated characters and sets have been substituted for real ones using software tools that recreate three-dimensional spaces and objects, which, with suitable compositing, can be superimposed on previously recorded footage of conventional actors. Examples are the films "Star Wars" by Lucasfilm and "The Lord of the Rings" by Wingnut Films. The image quality obtained can be so high that the generated character cannot be distinguished from a real person except by its fantastic characteristics.
These techniques make it relatively easy to superimpose images of real people or generated characters onto previously recorded footage. Refinement of this technique has led to films made entirely of synthetic images, for example "Shrek" by DreamWorks Pictures, in which the sets and characters are pre-established three-dimensional models, voice dubbing being the only human participation. The main drawback is that these techniques require highly trained specialist staff and a large amount of computing resources. A further drawback is that the result is a passive experience in which the user or spectator cannot intervene.
Attempts have been made to use these techniques in Internet advertising, inserting a suitably cropped photograph of a user into a pre-recorded sequence so as to give the impression that the user takes part in that sequence, thus achieving a strong promotional impact on the potential consumer. These techniques are limited by the fact that both the user's photograph and the pre-recorded sequence are two-dimensional images, so the "patched-in" effect is clearly visible: the personalized face cannot be rotated, and small modifications of the mouth articulation by image-deformation systems (known as "morphing") or layer substitution (Macromedia's "Flash" system) give somewhat abrupt results.
In the video-game sector, many titles based on three-dimensional environments have become widespread; these are synthesized or "rendered" by the hardware of the game console for display on television screens in real time. These games include, in the structure of their programming, characters or objects previously built as pre-established models, together with scenery that is likewise modelled three-dimensionally in a pre-established way for interaction. Using suitable game controls, the player can manipulate one or more main characters and navigate the generated scenery according to the game's script, with whatever freedoms have been assigned. Image synthesis from the available 3D models does not reach cinematographic quality, but comes close enough to be substantially credible and convincing on a television screen. However, personalization of the character by the user is limited to assigning a name and configuring pre-established options, so that the main character of the game is no more than a combination of those options, with no possibility of authentic customization.
Within the computer-graphics industry, several attempts have been made to build three-dimensional models of heads and busts that could represent a user's features. Their use is oriented more towards so-called virtual presenters. One example is the software "3DMeNow!" by BioVirtual, Inc., which, from two photographs of the user's head, one frontal and one in profile, reconstructs a virtual model of the head with considerable likeness. In this software the user must supply both images and, using point guides, indicate which areas of the images correspond to critical points, such as the eyebrows, eyes or mouth. Obtaining a relatively good result requires notable skill and a certain dedication on the part of the user.
Other programs allow a cropped image of the face to be superimposed directly on a three-dimensional model of a neutral head. This procedure, although simple, does not produce results of sufficient quality either: since the morphology of the head is pre-established, the mismatch between the deformed two-dimensional image and the original morphology is clearly visible, and the deformation can make the user unrecognizable in the displayed model. An example of a program that performs this function is "Poser" by Curious Labs, Inc.
Description of the invention
The method for generating synthetic-animation images of this invention has technical features that make it possible to produce synthetic-image audiovisual media which one or more users can personalize, configuring one or more characters of the story or event described in the audiovisual medium with the face and/or body of those users.
Indeed, the method seeks to provide a system that allows personalization of, for example, a film, with the participation of the user, whose image and likeness is easily incorporated into one of the characters. The method also allows the creation of video games and interactive software in which the user can likewise configure one or more characters in his or her image and likeness, the hardware of the machine performing the image synthesis providing a virtual representation of the user in the graphic environment of the game.
The method comprises several phases:
- Acquisition of at least one image of the user.
- Recognition of the user's facial and/or body features.
- Generation of a personalized mathematical three-dimensional model.
- Introduction of the personalized model into a virtual scenario.
- Synthesis of the modelling images, dumped via an output device.
The method is integrated so that the user can configure the final result very simply. Nevertheless, a number of important steps are provided in the method that allow considerable scope for interacting with and modifying the working parameters, depending on the result to be obtained by executing the method.
The acquisition of at least one image of the user is carried out with a video or stills camera or a similar device, such as a webcam or a mobile-phone camera. These images comprise the user's face, preferably seen from the front, the image being configured as a two-dimensional digital bitmap.
The process of recognizing the user's facial and/or body features can be carried out by suitable software, already available on the market, that executes an artificial-intelligence algorithm applied to computer vision, working from the image in static form if its quality provides sufficient contrast. If the camera is connected to the computer that performs the recognition, the recognition can be carried out automatically in real time. In the cases cited above, this recognition runs automatically without the user having to handle any part of the algorithm. The objective of the algorithm is to automate the process of distinguishing between the various objects that may appear in the image, discarding those of no use and focusing on the only object of interest, the user's face. Once the region of the image occupied by the face has been identified, its facial features are identified, such as the position and size of the eyes, mouth, nose and others. During this recognition a contour pattern of the face is used, comprising, for example, the eyebrows, eyes, nose, lips, jaw and cheekbones, among others. The software fits this pattern to the image automatically, yielding a personalized displacement of key coordinates with respect to a neutral pattern. This displacement corresponds to the physiognomic characteristics of the user. In this way, with the pattern modified by the recognition and the image, it is possible to alter a model, for example of the head, to adapt it to the user's physiognomy and thus superimpose the image as a texture with complete accuracy.
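The "personalized displacement of key coordinates" described above can be sketched as a simple subtraction of detected landmark positions from the neutral pattern. All landmark names and coordinate values below are hypothetical illustrations, not data from the patent; a real system would obtain `detected` from a computer-vision recognition step.

```python
# Illustrative sketch of the key-coordinate displacement step.
# Landmark names and all coordinates are invented for illustration.

# Neutral contour pattern: key facial coordinates (x, y) in image space.
NEUTRAL_PATTERN = {
    "left_eye":  (30.0, 40.0),
    "right_eye": (70.0, 40.0),
    "nose_tip":  (50.0, 60.0),
    "mouth":     (50.0, 80.0),
}

def displacement_field(detected):
    """Shift of each detected landmark relative to the neutral pattern."""
    return {
        name: (detected[name][0] - nx, detected[name][1] - ny)
        for name, (nx, ny) in NEUTRAL_PATTERN.items()
    }

# Landmarks as a recognition step might report them for one user.
detected = {
    "left_eye":  (28.0, 42.0),
    "right_eye": (72.0, 41.0),
    "nose_tip":  (50.0, 63.0),
    "mouth":     (51.0, 84.0),
}

shifts = displacement_field(detected)
# shifts["left_eye"] is (-2.0, 2.0): this user's left eye sits 2 px left of
# and 2 px below the neutral pattern's position.
```

The resulting displacement field is exactly the "physiognomic characteristics" the patent refers to: everything downstream (mesh deformation, texture fitting) is driven by these per-landmark shifts.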
The model may also have a beard or moustache if these have been detected in the recognition of the image.
With the data obtained through the computer-vision recognition, a three-dimensional mesh of a neutral head model is deformed so that the mesh ends up fitting the physiognomic characteristics of the user. The user's image is then applied on top of this personalized mesh model, on which it sits perfectly, accurately reflecting the user's appearance.
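The patent does not specify how the landmark shifts are propagated to the rest of the neutral mesh; the sketch below assumes inverse-distance weighting purely for illustration, with hypothetical 2D vertex data (a real mesh would be 3D and far denser).

```python
# Minimal sketch of the mesh-deformation step. Inverse-distance weighting is
# an assumption made here for illustration; the patent names no scheme.

def deform_mesh(vertices, landmarks, shifts, power=2.0, eps=1e-9):
    """Move each neutral-mesh vertex by a weighted blend of landmark shifts.

    vertices  - list of (x, y) neutral-mesh vertex positions
    landmarks - list of (x, y) neutral positions of the control landmarks
    shifts    - list of (dx, dy) displacements of those landmarks
    """
    deformed = []
    for (vx, vy) in vertices:
        wsum, dx, dy = 0.0, 0.0, 0.0
        for (lx, ly), (sx, sy) in zip(landmarks, shifts):
            d2 = (vx - lx) ** 2 + (vy - ly) ** 2
            if d2 < eps:                      # vertex coincides with a landmark:
                dx, dy, wsum = sx, sy, 1.0    # take its shift exactly
                break
            w = 1.0 / d2 ** (power / 2.0)
            wsum += w
            dx += w * sx
            dy += w * sy
        deformed.append((vx + dx / wsum, vy + dy / wsum))
    return deformed

landmarks = [(30.0, 40.0), (70.0, 40.0)]      # the two eyes of the neutral head
shifts    = [(-2.0, 0.0), (2.0, 0.0)]         # this user's eyes are further apart
mesh      = [(30.0, 40.0), (50.0, 40.0), (70.0, 40.0)]

deformed = deform_mesh(mesh, landmarks, shifts)
# The eye vertices move with their landmarks; the midpoint vertex is pulled
# equally by both opposite shifts and stays at (50.0, 40.0).
```

After this step the user's image can be applied as a texture, since the mesh now matches the landmark geometry the texture was measured against.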
On this model it is possible to make physiognomic modifications by means of external modifiers and to adapt it according to the user's wishes or the requirements of the environment or script in which the cinematic action takes place, altering its representation. These modifications can be varied, such as adding glasses or accessories or changing the hairstyle, among others.
As with the recognition, if the camera is connected directly to the computer or hardware executing the algorithms, the modification can be made in real time.
As mentioned, the generated personalized model may be just one portion of the character, such as the head, so that generation of the complete three-dimensional model includes the use of portions of pre-established models in addition to the generated personalized models. An example is the coupling of a personalized model of the user's head to an already designed comic body. This hybrid model is fully manageable in the synthesis of the modelling images.
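Coupling a personalized head to a pre-established body amounts to merging two indexed meshes. The sketch below is a hypothetical minimal version: it concatenates vertex lists and re-indexes the body's faces; the tiny triangles stand in for real character geometry.

```python
# Hypothetical sketch of hybrid-model assembly: a personalized head mesh is
# attached to a pre-established body mesh by concatenating vertices and
# offsetting the body's face indices. Geometry is placeholder data.

def couple_models(head, body):
    """Merge two indexed triangle meshes into one hybrid model."""
    vertices = head["vertices"] + body["vertices"]
    offset = len(head["vertices"])
    faces = head["faces"] + [
        tuple(i + offset for i in face) for face in body["faces"]
    ]
    return {"vertices": vertices, "faces": faces}

head = {"vertices": [(0, 0, 2), (1, 0, 2), (0, 1, 2)],
        "faces": [(0, 1, 2)]}
body = {"vertices": [(0, 0, 0), (1, 0, 0), (0, 1, 0)],
        "faces": [(0, 1, 2)]}

hybrid = couple_models(head, body)
# The body's face now references vertices 3, 4 and 5 of the merged list, so
# the hybrid can be handled as a single model during rendering.
```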
This synthesis or "rendering" of the personalised models, whether complete or hybrid, is performed in a virtual scenario or background, together with whatever pre-established models are needed to obtain the sequence of images that gives rise to the animation. The synthesis can be configured passively, that is, the production software generates a film or sequence according to a script previously established for the animation, which can be dumped onto a physical medium for later viewing or sent to a display device, such as a television screen or similar, as a cinematographic medium, obtaining one or more film sequences with at least one personalised character.
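The passive mode can be pictured as a fixed script driving the frame sequence with no user input. The sketch below is purely illustrative (the script entries and the stub renderer are invented; a real renderer would rasterise the 3D scene):

```python
# Illustrative sketch of "passive" synthesis: a pre-established script
# drives frame generation; frames are collected for a later dump.

script = [  # (frame number, character, action) — invented example data
    (0, "user_avatar", "wave"),
    (1, "user_avatar", "smile"),
    (2, "preset_villain", "frown"),
]

def render_frame(frame, character, action):
    # Stub standing in for the real rasteriser.
    return f"frame{frame:04d}:{character}:{action}"

movie = [render_frame(*step) for step in script]
print(movie[0])  # frame0000:user_avatar:wave
```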
This synthesis of images can take place in real time, allowing the user to intervene in the scene and control his or her character, a method especially suitable for video games. The user can thus be provided with control means suitable for interacting with and altering the characteristics of the personalised model in real time. This interaction may correspond to gestures and actions that represent facial emotions, such as laughing, crying or showing anger, or to body movements, such as speaking, opening and closing the eyes, or moving the body.
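A minimal sketch of such control means, with all button names and expression parameters invented for illustration: controller events update expression weights that drive the personalised model each frame.

```python
# Illustrative sketch: map control-pad events to expression parameters
# that alter the personalised model in real time.

GESTURES = {  # hypothetical button-to-expression mapping
    "BTN_A": ("smile", 1.0),
    "BTN_B": ("anger", 1.0),
    "BTN_X": ("blink", 1.0),
}

def apply_event(weights, event):
    """Return updated expression weights after one controller event."""
    if event in GESTURES:
        name, value = GESTURES[event]
        weights = dict(weights, **{name: value})
    return weights

weights = {"smile": 0.0, "anger": 0.0, "blink": 0.0}
weights = apply_event(weights, "BTN_A")
print(weights["smile"])  # 1.0
```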
As mentioned, the successive frames obtained in the synthesis of images can be dumped to various devices. For example, digital files can be generated as cinematographic sequences for dumping onto a physical medium (DVD or other) or for sending to multimedia devices such as mobile telephones. Alternatively, the animation produced can be viewed in real time, for example on a television or cinema screen, or distributed over the Internet, interconnected computer networks or broadcasting.

Description of the figures
To complement the description being made and to facilitate the understanding of the characteristics of the invention, a set of drawings is attached to the present specification in which, for illustrative and non-limiting purposes, the following is represented:
- Figure 1 shows a flow diagram of the process.
- Figure 2 shows a diagram of the phases of capture and recognition of the facial features from the acquired image.
- Figure 3 shows a diagram of the phases of generation of the mathematical three-dimensional model corresponding to the recognised features of the user.
- Figure 4 shows a diagram of the synthesis of a scene into which a character with the added personalised model and a second character formed from a pre-established model have been introduced, with control means available to the user for the interactive handling of the model representing his or her character.
Preferred embodiment of the invention
As can be seen in the referenced figures, for the generation of an animated sequence in which a three-dimensional model (6) of a character comprising a three-dimensional head model (11), modelled from the image (12) of the user, interacts with another character formed from a pre-established three-dimensional model (41) according to a script of variable resolution, the procedure comprises the following sequence of phases:
Acquisition (1), by means of a video camera (13), of a frontal image (12) of the user's face, this video camera (13) being connected to a computer that runs a program or software module for capturing said image. This software module comprises an artificial-intelligence module applied to computer vision, which performs a recognition (2) of the facial features from a neutral pattern (21) that it adapts into a personalised pattern (14) according to the characteristics of the captured image (12) of the user.
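The adaptation of the neutral pattern into a personalised pattern can be pictured as pulling each neutral landmark toward the position a detector found in the captured image. The sketch below is purely illustrative: the landmark names, coordinates and confidence weighting are invented, and the detector itself is assumed given.

```python
# Illustrative sketch: adapt a neutral 2D landmark pattern (21) to the
# landmarks detected in the captured image (12), giving the
# personalised pattern (14).

NEUTRAL = {"left_eye": (30.0, 40.0), "right_eye": (70.0, 40.0),
           "mouth": (50.0, 80.0)}

def adapt_pattern(neutral, detected, confidence=0.9):
    """Move each neutral landmark toward its detected position,
    weighted by the (hypothetical) detector confidence."""
    out = {}
    for name, (nx, ny) in neutral.items():
        dx, dy = detected[name]
        out[name] = (nx + confidence * (dx - nx),
                     ny + confidence * (dy - ny))
    return out

detected = {"left_eye": (32.0, 41.0), "right_eye": (68.0, 41.0),
            "mouth": (50.0, 84.0)}
custom = adapt_pattern(NEUTRAL, detected)
print(custom["mouth"])  # approximately (50.0, 83.6)
```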
Another software module evaluates the personalised pattern (14) obtained and generates the three-dimensional mesh model (11) corresponding to the user's features by deforming a pre-established neutral three-dimensional mesh model (31). In this generation, the surface textures obtained from the acquired image (12) or images are superimposed on said model (11). External modifiers (32) are also included, such as the design of glasses, for example, or physiognomic modifications, such as turning the ears into pointed ears, among others. In a following phase the personalised head model (11) is coupled to a pre-established body model portion (42), constituting a mixed three-dimensional model (6), which is introduced into a virtual scenario (43) together with other pre-established models (41).
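One simple way to picture the deformation of the neutral mesh (31) is to let each mesh vertex that is tied to a facial landmark follow that landmark's offset between the neutral and personalised patterns. This is an illustrative sketch only (the mesh, landmark bindings and coordinates are invented, and the depth coordinate is simply kept):

```python
# Illustrative sketch: deform a neutral 3D mesh so landmark-bound
# vertices follow the offsets of the personalised pattern.

NEUTRAL_MESH = {  # vertex id -> (position, controlling landmark or None)
    0: ((30.0, 40.0, 5.0), "left_eye"),
    1: ((50.0, 80.0, 2.0), "mouth"),
    2: ((50.0, 10.0, 8.0), None),  # forehead vertex, not landmark-driven
}

def deform(mesh, neutral_pattern, custom_pattern):
    out = {}
    for vid, ((x, y, z), lm) in mesh.items():
        if lm is None:
            out[vid] = (x, y, z)
        else:
            (nx, ny) = neutral_pattern[lm]
            (cx, cy) = custom_pattern[lm]
            out[vid] = (x + (cx - nx), y + (cy - ny), z)  # depth kept
    return out

neutral = {"left_eye": (30.0, 40.0), "mouth": (50.0, 80.0)}
custom = {"left_eye": (32.0, 41.0), "mouth": (50.0, 84.0)}
print(deform(NEUTRAL_MESH, neutral, custom)[1])  # (50.0, 84.0, 2.0)
```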
The synthesis (4) of the modelling images then proceeds with the interaction of the user through control means (7), such as game controllers, for the indirect handling of points and curves of the personalised model (6, 11) for the representation of facial gestures and emotions.
A dump (5) of the consecutive frames generated by the synthesis hardware is produced through an output device for display on a television screen. Said frames can also be recorded on a physical medium, such as a DVD or other.
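The frame dump itself can be sketched as writing each rendered frame to a numbered image file, ready for muxing into a film sequence or mastering onto physical media. This is an illustrative stub (binary PPM files and raw RGB buffers are chosen only because they need no external libraries):

```python
# Illustrative sketch: dump consecutive rendered frames to numbered
# binary PPM image files in an output directory.

import os
import tempfile

def dump_frames(frames, width, height, out_dir):
    """frames: list of raw RGB byte strings, one per frame."""
    paths = []
    for i, rgb in enumerate(frames):
        path = os.path.join(out_dir, f"frame{i:05d}.ppm")
        with open(path, "wb") as f:
            f.write(f"P6 {width} {height} 255\n".encode())
            f.write(rgb)
        paths.append(path)
    return paths

out = tempfile.mkdtemp()
black = bytes(2 * 2 * 3)  # one 2x2 all-black RGB frame
paths = dump_frames([black, black], 2, 2, out)
print(len(paths))  # 2
```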
The nature of the invention having been sufficiently described, as well as an example of a preferred embodiment, it is stated for the appropriate purposes that the materials, shape, size and arrangement of the elements described may be modified, provided that this does not entail an alteration of the essential characteristics of the invention that are claimed below.

Claims

1. Method for the generation of synthetic animation images in which pre-established models and scenarios are used, characterised in that it comprises the phases of:
- acquisition (1) of at least one frontal image (12) of the user, of his or her face and/or of the entire body,
- recognition (2) of the facial and/or body features of the user from the acquired image or images (12),
- generation (3) of a three-dimensional mesh model (11) corresponding to the user's features, comprising the deformation of a pre-established neutral three-dimensional mesh model (31) according to the recognised features and the superposition on said three-dimensional mesh model (11) of the textures obtained from the acquired images (12),
- introduction of the three-dimensional model (11) into a virtual scenario (43) with other personalised or pre-established models (41),
- synthesis (4) of modelling images, with or without interaction of the personalised three-dimensional models (11), pre-established models (41) and scenario (43), for their display,
- dump (5) through an output device of the successive frames obtained in the synthesis (4) of images.
2. Method according to claim 1, characterised in that the acquisition (1) of the image of the user (12) is by means of a video camera (13), photographic camera or similar device that allows digitisation into a two-dimensional point map.
3. Method according to claim 1, characterised in that the acquisition (1) and recognition (2) of the facial features is by means of an artificial-intelligence algorithm applied to computer vision.
4. Method according to claim 1, characterised in that in the generation (3) of the three-dimensional model, modifications of said model (11) are introduced by means of modifiers (32) to alter its representation.
5. Method according to claim 1, characterised in that the generation (3) of the three-dimensional model (6) comprises the use of pre-established model portions (42) and portions of generated personalised models (11) for the formation of a model usable in the virtual scenario (43).
6. Method according to claim 1, characterised in that the synthesis (4) of modelling images is in real time.
7. Method according to either of claims 1 and 6, characterised in that it comprises the interaction of the user by means of control means (7) suitable for the real-time alteration of the characteristics of the personalised model (6, 11) corresponding to gestures and actions that represent facial emotions or body movements.
PCT/ES2007/000235 2006-04-19 2007-04-19 Method for generating synthetic-animation images WO2007118919A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
ESP200600993 2006-04-19
ES200600993A ES2284391B1 (en) 2006-04-19 2006-04-19 PROCEDURE FOR THE GENERATION OF SYNTHETIC ANIMATION IMAGES.

Publications (1)

Publication Number Publication Date
WO2007118919A1 true WO2007118919A1 (en) 2007-10-25

Family

ID=38609077

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/ES2007/000235 WO2007118919A1 (en) 2006-04-19 2007-04-19 Method for generating synthetic-animation images

Country Status (2)

Country Link
ES (1) ES2284391B1 (en)
WO (1) WO2007118919A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101916456A (en) * 2010-08-11 2010-12-15 李浩民 Method for producing personalized three-dimensional cartoon
CN101930618A (en) * 2010-08-20 2010-12-29 李浩民 Method for producing individual two-dimensional anime
CN102087750A (en) * 2010-06-13 2011-06-08 湖南宏梦信息科技有限公司 Method for manufacturing cartoon special effect

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0675461A2 (en) * 1994-03-22 1995-10-04 Casio Computer Co., Ltd. Method and apparatus for generating image
JPH0973559A (en) * 1995-09-07 1997-03-18 Fujitsu Ltd Morphing editing device
WO2003017206A1 (en) * 2001-08-14 2003-02-27 Pulse Entertainment, Inc. Automatic 3d modeling system and method
US20030051255A1 (en) * 1993-10-15 2003-03-13 Bulman Richard L. Object customization and presentation system


Also Published As

Publication number Publication date
ES2284391A1 (en) 2007-11-01
ES2284391B1 (en) 2008-09-16

Similar Documents

Publication Publication Date Title
CN107154069B (en) Data processing method and system based on virtual roles
US10684467B2 (en) Image processing for head mounted display devices
CN109410298B (en) Virtual model manufacturing method and expression changing method
ES2237010T3 Procedure for the creation of 3D facial models from facial images
Magnenat-Thalmann et al. Handbook of virtual humans
CN107077755A (en) Virtually with real fusion method, system and virtual reality device
EP2580741A2 (en) Real-time animation of facial expressions
JP2004506276A (en) Three-dimensional face modeling system and modeling method
KR102215290B1 (en) Computer graphics synthesis system and method thereof
JP2012074878A (en) Image generation program, imaging device, imaging system, and image generation method
Gonzalez-Franco et al. Movebox: Democratizing mocap for the microsoft rocketbox avatar library
Hodgins et al. Computer animation
ES2284391B1 (en) PROCEDURE FOR THE GENERATION OF SYNTHETIC ANIMATION IMAGES.
JP2004171543A (en) Method and device for image processing
KR20140065762A (en) System for providing character video and method thereof
Aitken et al. The Lord of the Rings: the visual effects that brought middle earth to the screen
KR20200029968A (en) Modeling method of automatic character facial expression using deep learning technology
KR101780496B1 (en) Method for producing 3D digital actor image based on character modelling by computer graphic tool
CN110853147B (en) Three-dimensional face transformation method
Tiddeman et al. Transformation of dynamic facial image sequences using static 2D prototypes
Huang et al. A process for the semi-automated generation of life-sized, interactive 3D character models for holographic projection
Doroski Thoughts of Spirits in Madness: Virtual Production Animation and Digital Technologies for the Expansion of Independent Storytelling
JP2001319108A (en) Recreation, shopping, and business model
Morishima et al. Instant movie casting with personality: Dive into the movie system
US8896607B1 (en) Inverse kinematics for rigged deformable characters

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07765824

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 07765824

Country of ref document: EP

Kind code of ref document: A1