WO2010081225A1 - Digital content creation system - Google Patents

Digital content creation system

Info

Publication number
WO2010081225A1
Authority
WO
WIPO (PCT)
Prior art keywords
content
input
user
ambiguity
representation
Prior art date
Application number
PCT/CA2010/000046
Other languages
French (fr)
Inventor
Herve Lange
Paul Nightingale
Original Assignee
Xtranormal Technology Inc.
Priority date
Filing date
Publication date
Application filed by Xtranormal Technology Inc.
Publication of WO2010081225A1


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 13/00 - Animation
    • G - PHYSICS
    • G11 - INFORMATION STORAGE
    • G11B - INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B 27/00 - Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B 27/02 - Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B 27/031 - Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • G11B 27/034 - Electronic editing of digitised analogue information signals, e.g. audio or video signals on discs
    • G - PHYSICS
    • G11 - INFORMATION STORAGE
    • G11B - INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B 27/00 - Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B 27/10 - Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B 27/34 - Indicating arrangements

Definitions

  • the present invention relates to the process of creating and producing a presentation, film, audio-visual, multimedia or theatrical experience.
  • the production workflow of modern ads, television series, presentations, games, and movies typically uses a waterfall method; i.e., the result from one stage is needed to complete the next.
  • Content for a given production is outlined (created), then written and re-written until it is deemed safe to pass on to the next (pre-production) stage; 'safe' typically means that the content won't have to be substantially rewritten after the subsequent stages have been started.
  • the process is repeated for each stage until it is produced, edited, scored and the finished product is shown to an audience.
  • Each stage in the workflow expands the work from previous stages in terms of complexity, specialization and collaboration.
  • each stage might have multiple versions of the same production, as is illustrated in Figure 1.
  • one version might have a country setting as the background while another might take place in a city. It becomes difficult and costly to have all versions running simultaneously.
  • a result of this estrangement is a discrepancy between the 'vision' of the creator and the actual production. This discrepancy is either allowed to propagate through the stages leading to an unintended final production (with respect to the creator's original idea), or it is resolved after-the-fact in later stages, causing both a time and financial penalty. For this reason, the creators are being asked to better visualize their projects a priori as the budgets, required resources, and production risks increase.
  • a digital content creation system comprising: an interface device for receiving an input from a user, the input being descriptive of a storyline; a content generation device in communication with the interface device, for transforming the input into a content representation according to a data format, the content generation device analyzing the input and interacting with the user via the interface device when an ambiguity relating to the input is encountered, to correct the input prior to the transforming into the content representation; and a rendering device in communication with the content generation device, for receiving the content representation and rendering the content representation to generate at least one of audio and visual rendered content in real time, the rendered content for being displayed on the interface device to represent the storyline as intended by the user.
  • a method for creating digital content comprising: receiving an input descriptive of a storyline from a user; identifying an ambiguity in the input, the ambiguity relating to at least one of syntactic and logical aspect of the input; upon the identifying of the ambiguity, interacting with the user during a process of modifying the input to correct the ambiguity; once the input freed of the ambiguity, transforming the input into a content representation having a data format understandable by a rendering device; generating at least one of audio and visual rendered content in the rendering device based on the content representation; and playing the at least one of audio and visual rendered content to the user, to represent the storyline as intended by the user.
  • a computer readable media storing instructions for creating digital content, the instructions being readable by a processing device, for allowing the processing device to: receive an input descriptive of a storyline from a user; identify an ambiguity in the input, the ambiguity relating to at least one of syntactic and logical aspect of the input; upon the identifying of the ambiguity, interact with the user during a process of modifying the input to correct the ambiguity; once the input freed of the ambiguity, transform the input into a content representation having a data format understandable by a rendering device; generate at least one of audio and visual rendered content in the rendering device based on the content representation; and play the at least one of audio and visual rendered content to the user, to represent the storyline as intended by the user.
  • the term "content” is to be understood as including computer generated animation, filmed video, audio recordings, pictures, series of pictures, and any other type of content that may be generated as understood by a person skilled in the art.
  • the term “real-time” is not to be construed as being limiting with respect to a particular time frame, but rather to indicate that a result is obtained reasonably fast in the instants following a request.
  • Fig. 1 is a three-dimensional pyramid illustrating the process of creation according to the prior art
  • FIG. 2 is a block diagram of a digital content creation system implementing a multi-level validation loop in accordance with an embodiment
  • FIG. 3 is a detailed schematic flowchart illustrating a multi-level validation loop process in accordance with an embodiment
  • Fig. 4 is a detailed schematic flowchart illustrating a multi-level validation loop process, with advanced features, in accordance with an embodiment
  • FIG. 5a is a block diagram of an interactive version of the content generation device of Figures 3 and/or 4, in accordance with an embodiment
  • FIG. 5b is a block diagram of a non-interactive version of the content generation device of Figures 3 and/or 4, in accordance with an embodiment
  • FIG. 6a is a block diagram illustrating one version of the rendering device of Figures 3 and/or 4, using automatically defined camera angle and framing, in accordance with an embodiment
  • FIG. 6b is a block diagram illustrating another version of the rendering device of Figures 3 and/or 4, using user-defined camera angle and framing, in accordance with an embodiment
  • FIG. 6c is a block diagram illustrating yet another version of the rendering device of Figures 3 and/or 4, with an automatic compositing and editing device (ACED), in accordance with an embodiment
  • FIG. 7a is a block diagram of one version of the content recombination device of Fig. 4, with a user input generation device, in accordance with an embodiment
  • FIG. 7b is a block diagram of another version of the content recombination device of Fig. 4, with a modification preserving recombination device, in accordance with an embodiment
  • FIG. 8 is a flowchart of a method for creating digital content, in accordance with an embodiment.
  • FIG. 9 is a schematic illustration of a digital content creation system, in accordance with an embodiment.
  • the toolset is focused on creativity and idea/story development.
  • the point of entry is a character, sketch, idea or materials (i.e. an image or a drawing) and later develops into something deeper and more complicated where the details are revealed, finishing with an integrated presentation and/or moving picture audio experience whether a multimedia presentation, a web or television ad or a theatrical movie.
  • the digital toolset allows users to create ideas and then to move as far down the production process as they wish. It provides users with entry level and simplified tools to develop concepts and an increasingly more complicated set of tools to develop and build prototypes or create draft versions and eventually, if they wish, to finish presentations, ads or movies and to output them in the desired format.
  • the user may manipulate the whole breadth of the project.
  • the user has access to the whole process, from implementing the idea (be it through written text, videos, pictures, audio or programming languages) to seeing an edited final animation complete with sound, through an all encompassing, breadth-first approach.
  • the toolset is not shallow, but it prioritizes the functions that help the user create an overview of his project with ease, rather than forcing him to finesse details. Handshaking with other software packages is used when a greater focus on one aspect of the production is necessary.
  • This breadth-first approach makes it possible to visualize the content in a filmic form and make adjustments in that context.
  • the toolset uses two validation loops: one logical and the other audio/visual.
  • the idea is to have a semi-automatic process where the automatic processes within the digital toolset encourage and assist the user to validate his ideas. This involves the use of feedback loops where the user is needed to validate the work done by the digital toolset.
  • the intent of these loops is twofold: one, it provides simple corrective feedback to eliminate errors and ambiguities in the original input; and second, it provides creative feedback to help the user complete the project at hand. Going through the process of the logical validation loop helps the user realize the missing or unclear aspects of his project. In doing so, the toolset teaches the user to create in a more clear and succinct way.
  • the audio/visual validation loop is meant to validate the visual and/or audio interpretation of the content to be generated by the digital toolset. Not only does this feedback help the user rectify his original input with the digital toolset's interpretation, but it can also supply the user with new ideas enabling him to take the project in a new direction. Combined, these two validation loops give the user complete logical and artistic control over the final product.
  • the digital toolset will not be limited to text input (as will be detailed in the figure descriptions) and thus the toolset will also not be limited to using NLP.
  • Natural language processing is a branch of artificial intelligence that deals with analyzing, understanding and generating the languages that humans use naturally (as opposed to computer languages) in order to interface with computers in both written and spoken contexts. Natural language is one of many interface styles that can be used in the dialog between a human user and a computer.
  • Story understanding is a difficult problem in artificial intelligence and computational linguistics.
  • the main difficulty arises from the computer program making inferences about things, states and events (the world) not explicitly described in the text, in order to understand a story in text. This is typically because the text does not contain all the information needed to fully describe the scene's elements and their animations.
  • a fully autonomous text-to-animation conversion system - generation oriented - may make assumptions about the story that are not coherent with the writer's assumptions (i.e. the system does not understand the user's intentions).
  • the present digital toolset provides a "partially autonomous" process that uses user input to resolve ambiguities.
  • a partially autonomous process involving a dialog oriented interactive process helps not only the user, but also the conversion of the text into animation.
  • by "partially autonomous", it is meant a combination of an automatic module (comprising NLP) and an interactive module enabling user interaction.
  • the tool teaches the user to develop a more precise writing style by avoiding ambiguities, and thus helps the user in becoming a better writer.
  • a question might be: to specify a gender of a new name, to specify an interlocutor during a dialog, to specify a character reference for a pronoun in a sentence such as 'he walks to the door'; i.e. which character is 'he', etc.
  • Audio/Visual validation of the story structure involves, in one embodiment, automatic generation of shots (actions, audio, cameras and/or lights) to help the user refine and/or change the action sequence by synchronizing what he/she is thinking and what has been written (for example, by showing the user missing actions or information in the text).
  • the audio/visual validation helps the user control time and rhythm through an action's duration. For example, the system shows ambiguous, missing and/or non-useful actions between actors; shows performances that are too slow/fast, etc. This helps the user to understand, optimize and determine the complexity and costs for scenes and shots.
  • Figure 2 illustrates an embodiment of a system 100 implementing multilevel validation loop, as it applies to storytelling or scriptwriting.
  • An iterative process using several devices with feedback is provided to help a writer create better stories that can be quickly and easily produced into final products (TV shows, movies, presentations, plays, etc.).
  • the process starts when a user (writer/creator) enters a story on an Interface Device (101) which passes the input to a Content Generation Device (102).
  • the Content Generation Device (102) analyzes and understands the input (story) and transforms it into a data format (referred to as content representation) that can be rendered by the Rendering Device (103).
  • the Content Generation Device (102) can also interact directly with the Interface Device (101), providing feedback to the user.
  • the content representation outputted by the CGD (102) is treated in the Rendering Device (103), an output is released from the Rendering Device (103) and sent to a Visualization Device (104). It is this device (104) that provides visual and/or audio feedback to the user through the Interface Device (101) (and can also provide feedback directly to the Content Generation Device 102). This second interaction with the user is defined as the Audio-visual Validation Loop (AWL) 108.
  • the final rendered story can be further validated, and/or output to an optional editing software, or as a final document illustrated as element (105).
  • the story will consist of text.
  • the story is or contains other media such as video, audio, image (pictures) and/or computer programming language data.
  • the Interface Device (101) itself is implemented on a computer (or any other processing device), but it could also be a video camera, audio recorder or other similar device. Further details of the Content Generation Device (102) and Rendering Device (103) will be shown and described below in Figures 5a, 5b and Figures 6a, 6b and 6c respectively.
  • a self- contained software application which runs on a single desktop computer implements both the system and the method.
  • each device, or combination of devices herein described are implemented as distinct devices, at separate physical locations, and communicate with each other using any connectivity link or dedicated channel, such as via a network (i.e. internet, a WAN, LAN or any other similar link/network).
  • a device or a set of devices is implementable as a client (an installed application on a desktop computer, for example), as a web application (running through a web browser, for example), or simply runs on a server.
  • the Interface Device, the Visualization Device and the Validation device run as either a client or a web application, while the Content Generation Device and the Rendering Device run on a server (or network of servers) since they are the two most computationally expensive Devices.
  • the application can run completely on a single computer, such as in the case of a desktop application or Rich Internet Applications (RIA).
  • FIG. 3 is a flow chart for a simplified multilevel validation loop.
  • a logical validation loop (LVL) 211, and an Audio/Visual validation loop (AWL) 212
  • the purpose of these loops is to ensure coherence between the intent of the user and the output of the system.
  • a user (201) inputs data (202) through the use of text, speech, computer programming language, videos or pictures into a Content Generation Device (102).
  • the data is usually in the form of a story, but need not be limited as such.
  • the Content Generation Device is a device that converts the input data into a data format called content representation (204).
  • Content representation is a data structure. This content representation then undergoes a logical validation (205) through an interaction with the user and/or predefined default settings (LVL loop 211). If there is incomplete information and/or ambiguity in the content representation, then the user will be questioned for clarification. The corrected input will then be returned to the Content Generation Device (102) to be converted again into content representation 204; thus closing the logical loop.
  • the content representation data (204, 204') is also sent to the Rendering Device (103) where it is converted into a data format called Rendered Content (207).
  • An example of rendered content is a 3D rendered image. Further details of the Rendering Device (103) will be outlined with reference to Figures 6a, 6b and 6c.
  • the Rendered Content 207 undergoes an Audio/Visual Validation (208) through another interaction with the user. This is to say that in the AWL loop 212, the Rendered Content 207 is visually inspected by the user to ensure that it conforms to a 'vision' the user initially had.
  • the user has the option to change the initial input and proceed again to generate a new or updated content representation 204 and a new or updated rendered content 207 via a second iteration within the Content Generation Device 102 and the Rendering Device 103 respectively.
  • the closing of the Audio/Visual Validation loop 212 occurs once the user is satisfied with the presented rendered content 207.
  • the data is communicated to an Optional Validation Device (105), which is implemented, in one embodiment, as an editing and/or compositing device(s). Following that, the data is outputted as a final product.
  • the rendered content data proceeds to another external device(s) represented as the End element (210).
  • the final product can be formatted according to various data formats.
  • FIG 4 shows a more complex embodiment of the flowchart illustrated in Figure 3.
  • the Audio/visual validation loop AWL 212 has a secondary path that allows the user more control.
  • the user has access to a Content User Interface (CUI) (309) adapted to directly modify the Content Representation (204) through the Content Recombination Device (310).
  • the CUI is a graphical user interface that allows the user to modify the content; for example it could be an interface that allows the user to reposition characters/objects in a scene, or change the camera positions, or change the properties of objects, etc.
  • the Content Recombination Device 310 may be provided in different ways.
  • in one embodiment, the Content Recombination Device 310 reads and updates the Content Representation 204 directly and notifies the user of any change without actually changing the user's original input (202).
  • it may change the original input (202) to reflect any number of changes made through the CUI.
  • These changes to the input, when run through the Content Generation Device (102), will reflect the changes made via the Content User Interface 309.
  • the following is a specific example based on textual input with a Content Generation Device 102 based on natural language programming.
  • let the input data (202) be a text describing a conversation between two people that takes place in a living room.
  • the Content Representation (204) would be a data structure describing the scene, characters and dialogue while the Rendered Content (207) would be a 3D view of the living room containing the characters as they dialogue. The user might then want to change the location of one of the characters, such as have him stand by a window during the conversation.
  • the user would have two ways to make that change: one, change the text (202) directly through the use of a User Input Device (201) by writing something like "Bob is standing next to the window as he talks to Mary"; or, he uses the Content User Interface (309) which might be an interactive view of the living room to physically drag Bob (with the computer mouse) to the window.
  • the Content Recombination Device (310) can itself automatically change the original text input (202) to reflect the character being dragged to the window (for example by automatically inserting a sentence such as "Bob is standing next to the window as he talks to Mary"), or it can leave the text unaltered and simply change the Content Representation to reflect the fact that Bob is standing by the window, which will remain in memory even though the text does not show the change.
  • Figures 5a and 5b show two different embodiments of the Content Generation Device: one is an interactive correction version whereby the user is involved (figure 5a) in the correction process, and another is a non-interactive version whereby the user is not involved in the correction process but still involves the user assessing the correction made ( Figure 5b).
  • the input to the Content Generation Device is either the data from the user (202) or data from the Content Recombination Device (310).
  • the output is the Content Representation data structure.
  • the Content Understanding Device or CUD ((401) and (413)) attempts to 'understand' the input and convert it to a symbolic representation ((405) and (418)).
  • Symbolic Representation of data could be a graph of goals and descriptions in the form of models ordered in time and space (see international publication WO 2008/148211 entitled: "Time-ordered templates for text-to-animation system").
  • the CUD uses information from a knowledge database ((402) and (414)) or information from the Content Recombination Device ((403) and (415)) to interpret the input. If the input is clear, then the CUD can proceed to convert the data into the symbolic representation.
  • This Symbolic Representation can then be converted into the Content Representation data format by one or multiple Content Enrichment Devices (CED's) ((406), (407) in Fig. 5a and (419), (420) in Fig. 5b).
  • CED's also use a knowledge database ((408) and (410) in Fig. 5a and (421), (423) in Fig. 5b) and/or data from the Content Recombination Device ((409), (411) and/or (412) in Fig. 5a and (422), (424) and/or (425) in Fig. 5b) in order to enrich or enhance the content.
  • a few examples of CED's are: 1) Automatic placement of objects/entities, 2) Automatic cinematographer, and 3) Automatic music generation. For more on the automatic cinematographer, refer to international publication WO 2009/055929 entitled: "Automated cinematography editing tool".
  • the CUD will stop processing the input when it encounters an error and it will question the user using the dialogue interface (404) in Fig. 5a, in an attempt to resolve the error.
  • Errors are usually caused by inconsistencies or ambiguities in the input. For example language ambiguities are often a problem within written text.
  • the CUD will continue to process the remaining data.
  • the user has the option of aborting if he/she feels the issue is irresolvable (e.g. abort element (430) in Fig. 5a). This process of error resolution involving the user is a logical validation loop.
  • Non-interactive version: in this embodiment, the CUD will attempt to correct any errors automatically using default settings as provided in memory element (416) in Fig. 5b, and data retrieved from a knowledge database (432). It does not look for validation from the user; rather, it highlights the changed content as provided to archiving element (417), still in Fig. 5b, to alert the user that a change was made.
  • Interactive solution: ask the user "Who does 'He' refer to?".
  • Non-interactive solution: default settings could be used to dictate that an unreferenced pronoun like 'He' is always associated with the last person named (of the same gender as the pronoun) in the text before the word 'He'; i.e., it would assume that 'He' was Paul. A minimal sketch of both resolution strategies is given after this list.
  • the user input is plain language text or textual commands.
  • This input is converted to a symbolic representation by the Content Understanding Device ((401) and (413)) which is based on a text-to-speech conversion technology.
  • the Symbolic Representation ((405) and (418)) is a data structure of phonetic events, such as English language phonemes, prosodic information such as emphasis on syllable and accentuation, and timings.
  • the knowledge database ((402) or (414)) to accomplish this consists of (but is not limited to) phonetic rules, intonation rules and audio files containing audio/voice data. This symbolic representation is then the input to several CED's.
  • a CED is a Dialog Generation System and another is an Automatic Cinematographer.
  • a Dialog Generation System is a device that can automatically determine animations from phonetic events.
  • the knowledge database for this contains rules for the transition between animations to select the most appropriate (statistically speaking) next animation from the existing dialog status.
  • the output of the CED is Content Representation in the form of a goal graph data structure.
  • Figures 6a, 6b, and 6c show three different embodiments for the Rendering Device 103. All three embodiments take the Content Representation (204) and convert it into Rendered Content (207).
  • the first embodiment (Figure 6a) uses an Automatic Framing Device (501) that automatically chooses camera angles and framing. Once the content is framed, a standard 2D or 3D Rendering Device (502) renders the content (refer to international patent publication WO 2008/124941 entitled: "Digital representation and animation of physical objects"; international patent publication WO 2009/006727 entitled: "Modeling the motion of articulated objects"; and international patent publication WO 2009/033290 entitled: "Character animation of legged figures").
  • a second embodiment employs user-defined camera and framing input (503) for rendering the content in rendering device (504).
  • a third embodiment uses an Automatic Compositing and Editing Device (ACED) (505) linked to a picture, video and/or audio database (506).
  • the ACED (505) chooses appropriate images, video and/or audio data from the database in order to accurately convert the Content Representation into the Rendered Content.
  • An example of an ACED is a device that searches a database for images of a living room if the user input describes a scene in a living room; a minimal sketch of this kind of tag-based retrieval is given after this list.
  • the ACED could be based on audio: for example a CED ((406), (407) in Fig. 5a and/or (419), (420) in Fig.
  • Figures 7a and 7b each illustrate a different embodiment of an expanded view of the Content Recombination Device (CRD) (310) from Figure 4. Both embodiments illustrated take user input from the Content User Interface (309) to directly change the content representation (601). This can be accomplished by using the computer mouse to directly alter the content representation (for example by dragging a character across the scene).
  • the change passes through a User Input Generation Device (602) to alter the original user input (202) to reflect the changes made (for example: this could be an automatic rewriting of the input text).
  • the change passes through a Recombination Device (604) that preserves the modifications to the content without changing the original input (202).
  • This Recombination Device (310) of Fig. 7b would both read and update the Content Representation (204) directly.
  • Fig. 8 is a flowchart of a method 700 for creating digital content, in accordance with an embodiment. An end-to-end sketch of these steps is given after this list.
  • the method is implemented as a computer readable media storing coding forming part of a software application, for example, which, when run on a processing device, implements the steps of the method 700 as herein described.
  • step 702 involves first receiving a user input as described hereinabove, which is descriptive of a storyline for example. Then, as illustrated by step 704, the user input is analyzed to determine any presence of an ambiguity (or an error) in the input. In this step, at least one ambiguity relating to a syntactic and/or a logical aspect of the input is identified.
  • step 706 involves interacting with the user via a user interface device for example, during a process of modifying the input to correct the ambiguity.
  • the input is thereby freed of the ambiguity (i.e. by removing the ambiguity, or correcting the input accordingly).
  • Such interacting involves, in one embodiment, querying the user to obtain user feedback associated with the error in question, as described extensively hereinabove.
  • the interacting involves correcting the error automatically using predefined correction settings and knowledge data, as extensively described hereinabove.
  • steps 704 and 706 are re-iterated according to a loop process, until all errors are identified and corrected based on user interaction.
  • in step 708, the user input is transformed into a content representation taking on a pre-defined data format which is understandable by a rendering device capable of rendering digital media from the data of the content representation.
  • in step 710, the content representation is rendered by a rendering device that generates rendered content as described hereinabove.
  • once the rendered content is available, at step 712, it is presented to the user via an output device.
  • Step 712 involves playing the rendered content to the user.
  • the rendered content has either or both audio and/or a visual content and can be played on either or a combined audio device and/or visual display output device.
  • referring to Fig. 9, there is illustrated a schematic of a digital content creation system 800, in accordance with an embodiment, where the system is implemented as a combination of hardware components, including a processor 802, a memory device 804, a database (or a set of databases) 806, a user interface 808 and an audio device and/or visual display device 810.
  • the combination of the processor 802, memory 804 and database 806 is implemented as a general computer device 812 for example.
  • the memory 804 stores coding which, when run by the processor 802, functions to implement the steps of a method such as the one described in relation to Fig. 8, for example.
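The interactive and non-interactive error-resolution strategies described above (Figures 5a and 5b) might look roughly like the following sketch; the "last same-gender name" default and the highlight flag follow the text, but the function names, signatures and the toy user callback are illustrative assumptions rather than the patent's implementation.

```python
def resolve_pronoun_interactive(pronoun, candidates, ask):
    """Interactive version (Fig. 5a): put the question to the user."""
    if len(candidates) == 1:
        return candidates[0], False           # unambiguous, no highlight needed
    answer = ask(f"Who does '{pronoun}' refer to? Candidates: {', '.join(candidates)}")
    return answer, False

def resolve_pronoun_automatic(pronoun, candidates):
    """Non-interactive version (Fig. 5b): default to the last same-gender name
    mentioned before the pronoun, and highlight the change for the user."""
    return candidates[-1], True               # True = alert the user a change was made

def fake_user(question):
    print("System asks:", question)
    return "Bob"                              # the user's answer in this toy run

# Same-gender names appearing before the pronoun 'He', in order of appearance.
candidates = ["Bob", "Paul"]
print(resolve_pronoun_automatic("He", candidates))               # ('Paul', True)
print(resolve_pronoun_interactive("He", candidates, ask=fake_user))
```

Similarly, the asset selection performed by the Automatic Compositing and Editing Device could be sketched as a simple tag match against a picture/video/audio database; the file names, tag sets and dictionary layout below are hypothetical.

```python
def aced_select_assets(content_representation, asset_database):
    """Pick images, video and/or audio whose tags match the described setting and props."""
    wanted = {content_representation["setting"], *content_representation.get("props", [])}
    return [asset for asset in asset_database if wanted & set(asset["tags"])]

assets = [
    {"file": "living_room_01.jpg", "tags": {"living room", "sofa"}},
    {"file": "street_02.jpg",      "tags": {"street", "city"}},
    {"file": "rain_loop.wav",      "tags": {"rain", "ambience"}},
]
rep = {"setting": "living room", "props": ["sofa"]}
print(aced_select_assets(rep, assets))        # only the living-room image matches
```

Finally, steps 702 to 712 of method 700 can be strung together as a single loop. The hook functions passed in below are assumed stand-ins for the devices described above, not part of the patent; they are only there to make the sketch runnable.

```python
def create_digital_content(read_input, find_ambiguities, ask_user,
                           to_representation, render, play):
    """End-to-end flow of steps 702-712 (the hook arguments are assumptions)."""
    text = read_input()                                   # step 702
    while True:
        ambiguities = find_ambiguities(text)              # step 704
        if not ambiguities:
            break
        text = ask_user(text, ambiguities)                # step 706, looped with 704
    representation = to_representation(text)              # step 708
    rendered = render(representation)                     # step 710
    play(rendered)                                        # step 712
    return rendered

# A toy run with trivial stand-ins for each hook.
create_digital_content(
    read_input=lambda: "Bob enters. He sits.",
    find_ambiguities=lambda t: ["Which character is 'He'?"] if "He" in t else [],
    ask_user=lambda t, qs: t.replace("He", "Bob"),
    to_representation=lambda t: {"text": t},
    render=lambda rep: f"<rendered: {rep['text']}>",
    play=print)
```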

Abstract

There is described a digital content creation system comprising: an interface device for receiving an input from a user, the input being descriptive of a storyline; a content generation device in communication with the interface device, for transforming the input into a content representation according to a data format, the content generation device analyzing the input and interacting with the user via the interface device when an ambiguity relating to the input is encountered, to correct the input prior to the transforming into the content representation; and a rendering device in communication with the content generation device, for receiving the content representation and rendering the content representation to generate at least one of audio and visual rendered content in real time, the rendered content for being displayed on the interface device to represent the storyline as intended by the user.

Description

DIGITAL CONTENT CREATION SYSTEM
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] The present application claims priority from US provisional patent application 61/144,286 filed January 13, 2009 and entitled "DIGITAL CONTENT CREATION SYSTEM".
TECHNICAL FIELD
[0002] The present invention relates to the process of creating and producing a presentation, film, audio-visual, multimedia or theatrical experience.
BACKGROUND
[0003] The overall process of creating and producing a presentation, film, audiovisual, multimedia or theatrical experience usually involves a number of people with varying degrees of knowledge and expertise using specialized tools. The complex and sometimes confused interaction of all these people results in final productions that are often over-budget and that scarcely resemble the original idea of the creator/writer.
[0004] The production workflow of modern ads, television series, presentations, games, and movies typically uses a waterfall method; i.e., the result from one stage is needed to complete the next. There are four general stages to a typical production workflow: creation, pre-production, production and post-production. Content for a given production is outlined (created), then written and re-written until it is deemed safe to pass on to the next (pre-production) stage; 'safe' typically means that the content won't have to be substantially rewritten after the subsequent stages have been started. The process is repeated for each stage until it is produced, edited, scored and the finished product is shown to an audience. Each stage in the workflow expands the work from previous stages in terms of complexity, specialization and collaboration. For this reason, it is usually represented by a pyramid. Furthermore, the pyramid can become 3D when we consider that each stage might have multiple versions of the same production, as is illustrated in Figure 1. For example, one version might have a country setting as the background while another might take place in a city. It becomes difficult and costly to have all versions running simultaneously.
[0005] One problem is that creators of the ideas and stories are becoming further and further removed from the production process. This is primarily due to the specialization required in later stages. Once the ideas are conceived, developed and approved they are removed from the creator's control and passed on to technicians for enhancement and alteration using digital tools. Unfortunately in this process it is the creators (the very people trying to communicate with customers or audiences) that are relegated to the background.
[0006] A result of this estrangement is a discrepancy between the 'vision' of the creator and the actual production. This discrepancy is either allowed to propagate through the stages leading to an unintended final production (with respect to the creator's original idea), or it is resolved after-the-fact in later stages, causing both a time and financial penalty. For this reason, the creators are being asked to better visualize their projects a priori as the budgets, required resources, and production risks increase.
[0007] Another problem is that the relationship between the production stages is poorly addressed by existing digital tool packages even though flexibility and interaction between the stages is key if a production is to reach the final stage. To complicate things further, many of the existing software packages come from different companies, which results in incompatibility, which in turn means a loss of production time.
[0008] There is therefore a need for an integrated solution that addresses each production level to make life easier for users and save owners money.
SUMMARY
[0009] There is described herein a digital toolset for use by creators and all members of the pre-production, production, and post-production worlds to help visualize and produce high-quality content in a simplified and natural way. The tools can be used early in the production process, such as to help foster creativity, or can be used at any other stage throughout the production process to generate alternate versions of a production.
[0010] In accordance with an embodiment, there is provided a digital content creation system comprising: an interface device for receiving an input from a user, the input being descriptive of a storyline; a content generation device in communication with the interface device, for transforming the input into a content representation according to a data format, the content generation device analyzing the input and interacting with the user via the interface device when an ambiguity relating to the input is encountered, to correct the input prior to the transforming into the content representation; and a rendering device in communication with the content generation device, for receiving the content representation and rendering the content representation to generate at least one of audio and visual rendered content in real time, the rendered content for being displayed on the interface device to represent the storyline as intended by the user.
[0011] According to an embodiment, there is provided a method for creating digital content, the method comprising: receiving an input descriptive of a storyline from a user; identifying an ambiguity in the input, the ambiguity relating to at least one of syntactic and logical aspect of the input; upon the identifying of the ambiguity, interacting with the user during a process of modifying the input to correct the ambiguity; once the input freed of the ambiguity, transforming the input into a content representation having a data format understandable by a rendering device; generating at least one of audio and visual rendered content in the rendering device based on the content representation; and playing the at least one of audio and visual rendered content to the user, to represent the storyline as intended by the user.
[0012] According to an embodiment there is provided a computer readable media storing instructions for creating digital content, the instructions being readable by a processing device, for allowing the processing device to: receive an input descriptive of a storyline from a user; identify an ambiguity in the input, the ambiguity relating to at least one of syntactic and logical aspect of the input; upon the identifying of the ambiguity, interact with the user during a process of modifying the input to correct the ambiguity; once the input freed of the ambiguity, transform the input into a content representation having a data format understandable by a rendering device; generate at least one of audio and visual rendered content in the rendering device based on the content representation; and play the at least one of audio and visual rendered content to the user, to represent the storyline as intended by the user.
[0013] In this specification, the term "content" is to be understood as including computer generated animation, filmed video, audio recordings, pictures, series of pictures, and any other type of content that may be generated as understood by a person skilled in the art. The term "real-time" is not to be construed as being limiting with respect to a particular time frame, but rather to indicate that a result is obtained reasonably fast in the instants following a request.
BRIEF DESCRIPTION OF THE DRAWINGS
[0014] Further features and advantages of the present disclosure will become apparent from the following detailed description, taken in combination with the appended drawings, in which:
[0015] Fig. 1 is a three-dimensional pyramid illustrating the process of creation according to the prior art;
[0016] Fig. 2 is a block diagram of a digital content creation system implementing a multi-level validation loop in accordance with an embodiment;
[0017] Fig. 3 is a detailed schematic flowchart illustrating a multi-level validation loop process in accordance with an embodiment;
[0018] Fig. 4 is a detailed schematic flowchart illustrating a multi-level validation loop process, with advanced features, in accordance with an embodiment;
[0019] Fig. 5a is a block diagram of an interactive version of the content generation device of Figures 3 and/or 4, in accordance with an embodiment;
[0020] Fig. 5b is a block diagram of a non-interactive version of the content generation device of Figures 3 and/or 4, in accordance with an embodiment;
[0021] Fig. 6a is a block diagram illustrating one version of the rendering device of Figures 3 and/or 4, using automatically defined camera angle and framing, in accordance with an embodiment;
[0022] Fig. 6b is a block diagram illustrating another version of the rendering device of Figures 3 and/or 4, using user-defined camera angle and framing, in accordance with an embodiment;
[0023] Fig. 6c is a block diagram illustrating yet another version of the rendering device of Figures 3 and/or 4, with an automatic compositing and editing device (ACED), in accordance with an embodiment;
[0024] Fig. 7a is a block diagram of one version of the content recombination device of Fig. 4, with a user input generation device, in accordance with an embodiment;
[0025] Fig. 7b is a block diagram of another version of the content recombination device of Fig. 4, with a modification preserving recombination device, in accordance with an embodiment;
[0026] Fig. 8 is a flowchart of a method for creating digital content, in accordance with an embodiment; and
[0027] Fig. 9 is a schematic illustration of a digital content creation system, in accordance with an embodiment.
[0028] It will be noted that throughout the appended drawings, like features are identified by like reference numerals.
DETAILED DESCRIPTION
[0029] The toolset is focused on creativity and idea/story development. The point of entry is a character, sketch, idea or materials (i.e. an image or a drawing) and later develops into something deeper and more complicated where the details are revealed, finishing with an integrated presentation and/or moving picture audio experience whether a multimedia presentation, a web or television ad or a theatrical movie. The digital toolset allows users to create ideas and then to move as far down the production process as they wish. It provides users with entry level and simplified tools to develop concepts and an increasingly more complicated set of tools to develop and build prototypes or create draft versions and eventually, if they wish, to finish presentations, ads or movies and to output them in the desired format.
[0030] Using the described toolset, the user may manipulate the whole breadth of the project. The user has access to the whole process, from implementing the idea (be it through written text, videos, pictures, audio or programming languages) to seeing an edited final animation complete with sound, through an all encompassing, breadth-first approach. The toolset is not shallow, but it prioritizes the functions that help the user create an overview of his project with ease, rather than forcing him to finesse details. Handshaking with other software packages is used when a greater focus on one aspect of the production is necessary. This breadth-first approach makes it possible to visualize the content in a filmic form and make adjustments in that context.
[0031] An example of a breadth first approach to adding a table in a scene: by scaling a white cylinder and moving it up above the ground, a user can simply and quickly suggest the shape of a round table, which is enough to express an idea. Conversely, in a focused approach, a complete modeling software would help the user show a more detailed table, but the added level of detail would not necessarily add to the idea.
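By way of illustration only, the sketch below shows how such a breadth-first placeholder might be recorded in a simple scene structure, a scaled white cylinder standing in for a round table; the class and field names are hypothetical and not taken from the patent.

```python
from dataclasses import dataclass, field

@dataclass
class Primitive:
    """A deliberately coarse stand-in for a scene object."""
    shape: str
    position: tuple                      # (x, y, z) in scene units
    scale: tuple = (1.0, 1.0, 1.0)
    color: str = "white"

@dataclass
class Scene:
    objects: list = field(default_factory=list)

    def suggest_round_table(self, x: float, z: float, height: float = 0.75):
        # Breadth-first: a flattened white cylinder lifted off the ground is
        # enough to communicate "round table" without detailed modelling.
        table = Primitive(shape="cylinder",
                          position=(x, height, z),
                          scale=(1.2, 0.05, 1.2))
        self.objects.append(table)
        return table

scene = Scene()
scene.suggest_round_table(x=0.0, z=2.0)
print(scene.objects)
```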
[0032] The toolset uses two validation loops: one logical and the other audio/visual. The idea is to have a semi-automatic process where the automatic processes within the digital toolset encourage and assist the user to validate his ideas. This involves the use of feedback loops where the user is needed to validate the work done by the digital toolset. The intent of these loops is twofold: one, it provides simple corrective feedback to eliminate errors and ambiguities in the original input; and second, it provides creative feedback to help the user complete the project at hand. Going through the process of the logical validation loop helps the user realize the missing or unclear aspects of his project. In doing so, the toolset teaches the user to create in a more clear and succinct way. Similarly, the audio/visual validation loop is meant to validate the visual and/or audio interpretation of the content to be generated by the digital toolset. Not only does this feedback help the user rectify his original input with the digital toolset's interpretation, but it can also supply the user with new ideas enabling him to take the project in a new direction. Combined, these two validation loops give the user complete logical and artistic control over the final product.
[0033] The following is an example of an embodiment of the digital toolset based on text input and natural language processing (NLP); however, it should be noted that the digital toolset will not be limited to text input (as will be detailed in the figure descriptions) and thus the toolset will also not be limited to using NLP. Natural language processing is a branch of artificial intelligence that deals with analyzing, understanding and generating the languages that humans use naturally (as opposed to computer languages) in order to interface with computers in both written and spoken contexts. Natural language is one of many interface styles that can be used in the dialog between a human user and a computer.
[0034] If, for example, a user wants to animate a story/idea, he could start by writing text. The automatic generation of animations from this text is called text-to-animation conversion (see for example United States patent publication No. US 2008/0215310 entitled "Method and system for mapping a natural language text into animation").
[0035] The conceptual bridge between a story (such as in text) and animation is story understanding (SU). Story understanding is a difficult problem in artificial intelligence and computational linguistics. The main difficulty arises from the computer program making inferences about things, states and events (the world) not explicitly described in the text, in order to understand a story in text. This is typically because the text does not contain all the information needed to fully describe the scene's elements and their animations. A fully autonomous text-to-animation conversion system - generation oriented - may make assumptions about the story that are not coherent with the writer's assumptions (i.e. the system does not understand the user's intentions). The present digital toolset provides a "partially autonomous" process that uses user input to resolve ambiguities. Instead of making arbitrary assumptions about the story, a partially autonomous process involving a dialog oriented interactive process, for example, helps not only the user, but also the conversion of the text into animation. By "partially autonomous", it is meant a combination of an automatic module (comprising NLP) and an interactive module enabling user interaction. By integrating NLP into such a process, language ambiguity, logical ambiguity, idiosyncrasy and dependence on world knowledge are eliminated, or at the very least mitigated. The tool teaches the user to develop a more precise writing style by avoiding ambiguities, and thus helps the user in becoming a better writer.
[0036] In writing a story through dialoging with the machine, the user/writer is taken through a multilevel validation loop process as schematically illustrated in Figure 2, Figure 3 and Figure 4, in accordance with various embodiments explained herein below. Logical validation of the story structure is done through syntactic correction. For example, the syntactic correction performed by the system shows in the text, ambiguous writing, missing details, subjects, targets, destinations, etc. Such a correction may also be done through interactive questioning of the user (for example, asking the user to specify and/or add descriptions / details). For example, a question might be: to specify a gender of a new name, to specify an interlocutor during a dialog, to specify a character reference for a pronoun in a sentence such as 'he walks to the door'; i.e. which character is 'he', etc.
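The following is a minimal sketch of how such clarification questions could be generated from the input text during logical validation; the pronoun set, the regular expressions and the function name are illustrative assumptions, not the patent's implementation.

```python
import re

# Illustrative only: a real content understanding device would rely on full NLP.
PRONOUNS = {"He", "She", "It", "They"}

def logical_validation_questions(text: str, known_genders: dict) -> list:
    """Return clarification questions for ambiguities found in the input text.

    known_genders maps character names to "male"/"female" as declared so far.
    """
    questions = []
    names = [w for w in re.findall(r"\b[A-Z][a-z]+\b", text) if w not in PRONOUNS]

    # 1) New names whose gender has not been specified yet.
    for name in dict.fromkeys(names):            # preserve order, drop repeats
        if name not in known_genders:
            questions.append(f"'{name}' is a new name. Is {name} male or female?")

    # 2) A masculine pronoun with zero or several possible male referents.
    if re.search(r"\bhe\b", text, flags=re.IGNORECASE):
        males = [n for n in names if known_genders.get(n) == "male"]
        if len(males) != 1:
            questions.append("Which character does 'he' refer to?")
    return questions

print(logical_validation_questions(
    "Bob and Paul enter. He walks to the door.",
    known_genders={"Bob": "male", "Paul": "male"}))
```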
[0037] Audio/Visual validation of the story structure involves, in one embodiment, automatic generation of shots (actions, audio, cameras and/or lights) to help the user refine and/or change the action sequence by synchronizing what he/she is thinking and what has been written (for example, by showing the user missing actions or information in the text). The audio/visual validation helps the user control time and rhythm through an action's duration. For example, the system shows ambiguous, missing and/or non-useful actions between actors; shows performances that are too slow/fast, etc. This helps the user to understand, optimize and determine the complexity and costs for scenes and shots.
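As a rough sketch of the rhythm feedback described above, the code below flags actions whose estimated duration would read as too fast or too slow on screen; the thresholds and data fields are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass

@dataclass
class Action:
    actor: str
    verb: str
    duration: float                     # seconds, as estimated by the rendering side

def rhythm_warnings(actions, too_fast=0.3, too_slow=12.0):
    """Flag performances that would look too fast or too slow in the rendered shot."""
    warnings = []
    for a in actions:
        if a.duration < too_fast:
            warnings.append(f"{a.actor} {a.verb}: {a.duration:.1f}s looks too fast")
        elif a.duration > too_slow:
            warnings.append(f"{a.actor} {a.verb}: {a.duration:.1f}s looks too slow")
    return warnings

shots = [Action("Bob", "walks to the door", 0.2),
         Action("Mary", "pours coffee", 25.0)]
for w in rhythm_warnings(shots):
    print(w)
```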
[0038] Figure 2 illustrates an embodiment of a system 100 implementing multilevel validation loop, as it applies to storytelling or scriptwriting. An iterative process using several devices with feedback is provided to help a writer create better stories that can be quickly and easily produced into final products (TV shows, movies, presentations, plays, etc.). The process starts when a user (writer/creator) enters a story on an Interface Device (101) which passes the input to a Content Generation Device (102). The Content Generation Device (102) analyzes and understands the input (story) and transforms it into a data format (referred to as content representation) that can be rendered by the Rendering Device (103). The Content Generation Device (102) can also interact directly with the Interface Device (101), providing feedback to the user. This interaction between the user and the Content Generation Device (102), via the Interface (101) is defined as a Logical Validation Loop (LVL) (106). It is defined as a Logical Validation because it is an iterative process of communication with the user designed to clarify any illogical and/or ambiguous elements in the story (user input).
[0039] Once the content representation outputted by the CGD (102) is treated in the Rendering Device (103), an output is released from the Rendering Device (103) and sent to a Visualization Device (104). It is this device (104) that provides visual and/or audio feedback to the user through the Interface Device (101) (and can also provide feedback directly to the Content Generation Device 102). This second interaction with the user is defined as the Audio-visual Validation Loop (AWL) 108.
[0040] Still referring to Fig. 2, once both validations (i.e. LVL 106 and AWL 108) are carried out, the final rendered story can be further validated, and/or output to an optional editing software, or as a final document illustrated as element (105).
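A minimal sketch of how the devices of Figure 2 could be wired into the two loops is given below. The class names mirror the reference numerals above, but the method names, the toy ambiguity test and the hard-wired clarification are assumptions made only so the sketch runs end to end.

```python
class InterfaceDevice:
    """Stand-in for the Interface Device (101): collects input, relays questions."""
    def __init__(self, story):
        self.story = story

    def read_story(self):
        return self.story

    def ask(self, questions):
        # A real user would answer here; for the sketch we hard-wire a fix.
        print("Clarification requested:", questions)
        self.story = self.story.replace("He", "Bob")
        return self.story

class ContentGenerationDevice:
    """Stand-in for the Content Generation Device (102)."""
    def to_content_representation(self, text):
        # Toy ambiguity test: an unresolved masculine pronoun triggers a question.
        questions = ["Which character does 'He' refer to?"] if "He" in text else []
        return {"text": text}, questions

class RenderingDevice:
    """Stand-in for the Rendering Device (103)."""
    def render(self, representation):
        return f"<audio/visual preview of: {representation['text']}>"

class VisualizationDevice:
    """Stand-in for the Visualization Device (104)."""
    def user_accepts(self, rendered):
        print("Showing:", rendered)
        return True                     # pretend the user approves the preview

def create_content(interface, cgd, renderer, visualizer):
    text = interface.read_story()
    while True:
        representation, questions = cgd.to_content_representation(text)
        if questions:                                 # Logical Validation Loop (106)
            text = interface.ask(questions)
            continue
        rendered = renderer.render(representation)
        if visualizer.user_accepts(rendered):         # Audio-visual Validation Loop (108)
            return rendered
        text = interface.read_story()

print(create_content(InterfaceDevice("Bob and Paul enter. He sits."),
                     ContentGenerationDevice(), RenderingDevice(),
                     VisualizationDevice()))
```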
[0041] Usually, the story will consist of text. However, in some embodiments, the story is or contains other media such as video, audio, image (pictures) and/or computer programming language data.
[0042] In one embodiment, the Interface Device (101) itself is implemented on a computer (or any other processing device), but it could also be a video camera, audio recorder or other similar device. Further details of the Content Generation Device (102) and Rendering Device (103) will be shown and described below in Figures 5a, 5b and Figures 6a, 6b and 6c respectively.
[0043] The physical implementation of the various embodiments of the system and method herein described can take multiple forms. In one embodiment, a self-contained software application which runs on a single desktop computer implements both the system and the method. Alternatively, each device, or combination of devices herein described, are implemented as distinct devices, at separate physical locations, and communicate with each other using any connectivity link or dedicated channel, such as via a network (i.e. internet, a WAN, LAN or any other similar link/network). A device (or a set of devices) is implementable as a client (an installed application on a desktop computer, for example), as a web application (running through a web browser, for example), or can simply run on a server.
[0044] Still with reference to Figure 2, in one embodiment, the Interface Device, the Visualization Device and the Validation device run as either a client or a web application, while the Content Generation Device and the Rendering Device run on a server (or network of servers) since they are the two most computationally expensive Devices. Alternatively, the application can run completely on a single computer, such as in the case of a desktop application or Rich Internet Applications (RIA).
[0045] Figure 3 is a flow chart for a simplified multilevel validation loop. There are two validation loops shown in this figure: a logical validation loop (LVL) 211, and an Audio/Visual validation loop (AWL) 212, both in accordance with an embodiment. The purpose of these loops is to ensure coherence between the intent of the user and the output of the system.
[0046] A user (201) inputs data (202) through the use of text, speech, computer programming language, videos or pictures into a Content Generation Device (102). The data is usually in the form of a story, but need not be limited as such. The Content Generation Device is a device that converts the input data into a data format called content representation (204). Content representation is a data structure. This content representation then undergoes a logical validation (205) through an interaction with the user and/or predefined default settings (LVL loop 211). If there is incomplete information and/or ambiguity in the content representation, then the user will be questioned for clarification. The corrected input will then be returned to the Content Generation Device (102) to be converted again into content representation 204; thus closing the logical loop.
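The sketch below suggests one possible shape for such a content representation data structure, together with the kind of open questions the logical validation loop might derive from it; the fields, class names and example values are assumptions, not the patent's actual format.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Character:
    name: str
    gender: Optional[str] = None          # None until clarified by the user

@dataclass
class DialogueLine:
    speaker: str
    text: str
    addressee: Optional[str] = None       # None = ambiguous interlocutor

@dataclass
class ContentRepresentation:
    setting: str
    characters: List[Character] = field(default_factory=list)
    dialogue: List[DialogueLine] = field(default_factory=list)

    def open_ambiguities(self) -> List[str]:
        """Questions the logical validation loop would put to the user."""
        issues = []
        for c in self.characters:
            if c.gender is None:
                issues.append(f"Specify a gender for '{c.name}'.")
        for line in self.dialogue:
            if line.addressee is None:
                issues.append(f"Who is {line.speaker} speaking to?")
        return issues

rep = ContentRepresentation(
    setting="living room",
    characters=[Character("Bob", "male"), Character("Mary")],
    dialogue=[DialogueLine("Bob", "Nice weather today.")])
print(rep.open_ambiguities())
```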
[0047] In one embodiment, the content representation data (204, 204') is also sent to the Rendering Device (103) where it is converted into a data format called Rendered Content (207). An example of rendered content is a 3D rendered image. Further details of the Rendering Device (103) will be outlined with reference to Figures 6a, 6b and 6c. From there, the Rendered Content 207 undergoes an Audio/Visual Validation (208) through another interaction with the user. This is to say that in the AWL loop 212, the Rendered Content 207 is visually inspected by the user to ensure that it conforms to a 'vision' the user initially had. If it does not conform, the user has the option to change the initial input and proceed again to generate a new or updated content representation 204 and a new or updated rendered content 207 via a second iteration within the Content Generation Device 102 and the Rendering Device 103 respectively. The closing of the Audio/Visual Validation loop 212 occurs once the user is satisfied with the presented rendered content 207. Once both validation loops 211 , 212 are closed, then the data is communicated to an Optional Validation Device (105), which is implemented, in one embodiment, as an editing and/or compositing device(s). Following that, the data is outputted as a final product. Alternatively, the rendered content data proceeds to another external device(s) represented as the End element (210). The final product can be formatted according to various data formats.
[0048] Figure 4 shows a more complex embodiment of the flowchart illustrated in Figure 3. In this embodiment, the Audio/Visual validation loop AVVL 212 has a secondary path that gives the user more control. In this extra path, the user has access to a Content User Interface (CUI) (309) adapted to directly modify the Content Representation (204) through the Content Recombination Device (310). The CUI is a graphical user interface that allows the user to modify the content; for example, it could be an interface that allows the user to reposition characters/objects in a scene, change the camera positions, change the properties of objects, etc. To accomplish this, the Content Recombination Device 310 may be provided in different ways. In one embodiment, it reads and updates the Content Representation 204 directly and notifies the user of any change without actually changing the user's original input (202). Alternatively, it may change the original input (202) to reflect any number of changes made through the CUI; these changes to the input, when run through the Content Generation Device (102), reproduce the changes made via the Content User Interface 309. The following is a specific example based on textual input, with a Content Generation Device 102 based on natural language programming.
[0049] Let the input data (202) be a text describing a conversation between two people that takes place in a living room. The Content Representation (204) would be a data structure describing the scene, characters and dialogue, while the Rendered Content (207) would be a 3D view of the living room containing the characters as they converse. The user might then want to change the location of one of the characters, for example have him stand by a window during the conversation. In the embodiment illustrated by Figure 4, the user has two ways to make that change: he can change the text (202) directly through the User Input Device (201), by writing something like "Bob is standing next to the window as he talks to Mary"; or he can use the Content User Interface (309), which might be an interactive view of the living room, to drag Bob (with the computer mouse) to the window. There are two ways to carry out the second option, as shown in Figure 4 and in the sketch below: the Content Recombination Device (310) can itself automatically change the original text input (202) to reflect the character being dragged to the window (for example, by automatically inserting a sentence such as "Bob is standing next to the window as he talks to Mary"), or it can leave the text unaltered and simply change the Content Representation to reflect the fact that Bob is standing by the window; the change then remains in memory even though the text does not show it.
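The two options can be sketched as follows, with hypothetical function names; only the behaviour described above is intended:

    # Two illustrative ways a Content Recombination Device (310) could
    # propagate a drag-and-drop edit made in the Content User Interface (309).

    def recombine_by_rewriting_input(input_text, character, target):
        """Option 1 (Fig. 7a style): alter the original input (202) so that
        re-running the Content Generation Device reproduces the edit."""
        return f"{input_text} {character} is standing next to the {target}."

    def recombine_in_representation_only(content_representation, character, target):
        """Option 2 (Fig. 7b style): leave the text untouched and update the
        Content Representation (204) directly; the change lives only in memory."""
        content_representation.setdefault("placements", {})[character] = target
        return content_representation

    text = "Bob talks to Mary in the living room."
    rep = {"characters": ["Bob", "Mary"], "location": "living room"}
    print(recombine_by_rewriting_input(text, "Bob", "window"))
    print(recombine_in_representation_only(rep, "Bob", "window"))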
[0050] Figures 5a and 5b show two different embodiments of the Content Generation Device: an interactive correction version, in which the user is involved in the correction process (Figure 5a), and a non-interactive version, in which the user is not involved in the correction process but still assesses the corrections made (Figure 5b). In both embodiments, the input to the Content Generation Device is either the data from the user (202) or data from the Content Recombination Device (310); the output is the Content Representation data structure. The Content Understanding Device, or CUD ((401) and (413)), attempts to 'understand' the input and convert it to a symbolic representation ((405) and (418)). An example of a symbolic representation of data is a graph of goals and descriptions in the form of models ordered in time and space (see international publication WO 2008/148211, entitled "Time-ordered templates for text-to-animation system"). To complete the task of 'understanding', the CUD uses information from a knowledge database ((402) and (414)) or information from the Content Recombination Device ((403) and (415)) to interpret the input. If the input is clear, the CUD can proceed to convert the data into the symbolic representation. This symbolic representation can then be converted into the Content Representation data format by one or multiple Content Enrichment Devices (CEDs) ((406), (407) in Fig. 5a and (419), (420) in Fig. 5b). These CEDs also use a knowledge database ((408) and (410) in Fig. 5a and (421), (423) in Fig. 5b) and/or data from the Content Recombination Device ((409), (411) and/or (412) in Fig. 5a and (422), (424) and/or (425) in Fig. 5b) in order to enrich or enhance the content. A few examples of CEDs are: 1) automatic placement of objects/entities, 2) an automatic cinematographer, and 3) automatic music generation. For more on the automatic cinematographer, refer to international publication WO 2009/055929, entitled "Automated cinematography editing tool".
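The chain of Content Enrichment Devices can be pictured as successive passes over the symbolic representation, each adding one kind of information; the sketch below is a hypothetical pipeline written for illustration, not the patented implementation:

    # Hypothetical CED pipeline: each device enriches the symbolic
    # representation until it becomes a fuller content representation.

    def place_objects(rep):                     # cf. CED example 1
        rep["placements"] = {c: "default position" for c in rep.get("characters", [])}
        return rep

    def auto_cinematographer(rep):              # cf. CED example 2
        rep["shots"] = [{"camera": "medium shot", "subject": c}
                        for c in rep.get("characters", [])]
        return rep

    def auto_music(rep):                        # cf. CED example 3
        rep["music"] = "neutral underscore"
        return rep

    def enrich(symbolic_representation,
               ceds=(place_objects, auto_cinematographer, auto_music)):
        rep = dict(symbolic_representation)     # keep the original untouched
        for ced in ceds:
            rep = ced(rep)
        return rep

    print(enrich({"characters": ["Bob", "Mary"]}))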
[0051] However, if the CUD fails to understand the input, there are two ways of rectifying the situation: one involving the user's feedback to correct the noted ambiguity (the interactive version) and the other without the user's feedback (the non-interactive version).
[0052] Interactive correction version: In this embodiment, the CUD stops processing the input when it encounters an error and questions the user, using the dialogue interface (404) in Fig. 5a, in an attempt to resolve the error. Errors are usually caused by inconsistencies or ambiguities in the input; for example, language ambiguities are often a problem in written text. Once the user has resolved the problem, the CUD continues to process the remaining data. The user has the option of aborting if he/she feels the issue is irresolvable (e.g. abort element (430) in Fig. 5a). This process of error resolution involving the user is a logical validation loop.
[0053] Non-interactive version: In this embodiment, the CUD attempts to correct any errors automatically using default settings, as provided in memory element (416) in Fig. 5b, and data retrieved from a knowledge database (432). It does not seek validation from the user; rather, it highlights the changed content, as provided to archiving element (417), also in Fig. 5b, to alert the user that a change was made.
[0054] Specific example using textual input: Let the user input text be: "Bob is standing by the door. Paul is sitting on the chair. He walks to the door." In this example, the pronoun 'He' is ambiguous; to whom does it refer: Bob or Paul?
[0055] Interactive solution: Ask the user "Who does 'He' refer to?". Non-interactive solution: Default settings could dictate that an unreferenced pronoun like 'He' is always associated with the last person of the same gender named in the text before the word 'He'; i.e. it would assume that 'He' is Paul.
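A minimal sketch of that default rule follows; the small gender lexicon and the function name are assumptions made for the example:

    # Sketch of the default pronoun-resolution rule used by the
    # non-interactive correction path. The tiny gender lexicon is assumed.

    GENDER = {"bob": "male", "paul": "male", "mary": "female"}

    def resolve_pronoun(text, pronoun="he"):
        """Return the last same-gender name appearing before the pronoun."""
        wanted = "male" if pronoun.lower() in ("he", "him") else "female"
        words = text.lower().replace(".", " ").split()
        position = words.index(pronoun.lower())
        for word in reversed(words[:position]):
            if GENDER.get(word) == wanted:
                return word.capitalize()
        return None

    print(resolve_pronoun(
        "Bob is standing by the door. Paul is sitting on the chair. "
        "He walks to the door."))
    # -> 'Paul', matching the default behaviour described above.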
[0056] In one embodiment of the Content Generation Device based on text-to-speech conversion, the user input is plain language text or textual commands. This input is converted into a symbolic representation by the Content Understanding Device ((401) and (413)), which is based on a text-to-speech conversion technology. Within this particular embodiment, the symbolic representation ((405) and (418)) is a data structure of phonetic events, such as English language phonemes, prosodic information such as syllable emphasis and accentuation, and timings. The knowledge database ((402) or (414)) used to accomplish this consists of (but is not limited to) phonetic rules, intonation rules and audio files containing audio/voice data. This symbolic representation is then the input to several CEDs; for example, one CED is a Dialog Generation System and another is an Automatic Cinematographer. A Dialog Generation System is a device that can automatically determine animations from phonetic events. Its knowledge database contains rules for the transitions between animations, used to select the most appropriate (statistically speaking) next animation from the existing dialog state. The output of the CED is a Content Representation in the form of a goal graph data structure.
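For this embodiment, the symbolic representation of phonetic events might look like the illustrative data structure below; the field names, timings and the viseme mapping are assumptions made only to show the shape of the data:

    # Illustrative symbolic representation for a text-to-speech based CUD:
    # a time-ordered list of phonetic events with prosodic annotations.

    symbolic_representation = {
        "utterance": "Hello",
        "events": [
            {"phoneme": "HH", "start": 0.00, "duration": 0.08, "stress": 0},
            {"phoneme": "AH", "start": 0.08, "duration": 0.06, "stress": 0},
            {"phoneme": "L",  "start": 0.14, "duration": 0.07, "stress": 0},
            {"phoneme": "OW", "start": 0.21, "duration": 0.12, "stress": 1},  # emphasised syllable
        ],
    }

    def lip_sync_animations(events, phoneme_to_viseme):
        """Toy Dialog Generation System: map each phonetic event to a mouth
        animation (viseme) keyed to its timing."""
        return [{"time": e["start"], "viseme": phoneme_to_viseme.get(e["phoneme"], "rest")}
                for e in events]

    print(lip_sync_animations(symbolic_representation["events"],
                              {"HH": "open", "AH": "open", "L": "tongue-up", "OW": "round"}))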
[0057] Figures 6a, 6b and 6c show three different embodiments of the Rendering Device 103. All three embodiments take the Content Representation (204) and convert it into Rendered Content (207). The first embodiment (Figure 6a) uses an Automatic Framing Device (501) that automatically chooses camera angles and framing (a simplified sketch is given below). Once the content is framed, a standard 2D or 3D rendering device (502) renders the content (refer to international patent publication WO 2008/124941, entitled "Digital representation and animation of physical objects"; international patent publication WO 2009/006727, entitled "Modeling the motion of articulated objects"; and international patent publication WO 2009/033290, entitled "Character animation of legged figures").
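Before turning to the second embodiment, here is a highly simplified sketch of what such a framing device might compute; the geometry and shot vocabulary are assumed purely for illustration:

    # Toy Automatic Framing Device: place the camera on the midpoint between
    # two subjects, pulled back far enough for a medium two-shot.

    def frame_two_shot(pos_a, pos_b, pullback=3.0, eye_height=1.6):
        mid = [(a + b) / 2.0 for a, b in zip(pos_a, pos_b)]
        # Pull the camera back along one axis (here +y) and raise it to eye height.
        camera = [mid[0], mid[1] + pullback, eye_height]
        return {"camera_position": camera, "look_at": mid, "shot": "two-shot"}

    print(frame_two_shot([0.0, 0.0, 0.0], [1.5, 0.0, 0.0]))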
[0058] A second embodiment (Figure 6b) employs user-defined camera and framing input (503) for rendering the content in rendering device (504).
[0059] A third embodiment (Figure 6c) uses an Automatic Compositing and Editing Device (ACED) (505) linked to a picture, video and/or audio database (506). The ACED (505) chooses appropriate images, video and/or audio data from the database in order to accurately convert the Content Representation into the Rendered Content. An example of an ACED is a device that searches a database for images of a living room if the user input describes a scene in a living room (a toy sketch of such a lookup is given after the next paragraph). Alternatively, the ACED could be based on audio: for example, a CED ((406), (407) in Fig. 5a and/or (419), (420) in Fig. 5b) within the Content Generation Device (102) enriches the video content by assigning certain audio files to different parts of the video; the ACED then automatically orders and mixes the audio files into the soundtrack.

[0060] Figures 7a and 7b each illustrate a different embodiment of an expanded view of the Content Recombination Device (CRD) (310) from Figure 4. Both illustrated embodiments take user input from the Content User Interface (309) to directly change the content representation (601). This can be accomplished by using the computer mouse to directly alter the content representation (for example, by dragging a character across the scene). There are two ways to proceed after the user has made the change. As per Figure 7a, the change passes through a User Input Generation Device (602) that alters the original user input (202) to reflect the changes made (for example, by automatically rewriting the input text). As per Figure 7b, the change passes through a Recombination Device (604) that preserves the modifications to the content without changing the original input (202); the Content Recombination Device (310) of Fig. 7b both reads and updates the Content Representation (204) directly.
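The database-driven ACED mentioned above can be pictured as a simple lookup over a tagged media collection; the database layout and tags below are assumptions made for the example:

    # Toy ACED: pick images and audio from a tagged database that match the
    # scene described by the content representation.

    MEDIA_DB = [
        {"kind": "image", "tags": {"living room", "interior"}, "file": "living_room_01.jpg"},
        {"kind": "audio", "tags": {"conversation", "ambience"}, "file": "room_tone.wav"},
        {"kind": "image", "tags": {"kitchen", "interior"},      "file": "kitchen_02.jpg"},
    ]

    def select_media(content_representation):
        """Return every database item whose tags overlap the scene keywords."""
        wanted = set(content_representation.get("keywords", []))
        return [item for item in MEDIA_DB if item["tags"] & wanted]

    print(select_media({"keywords": ["living room", "conversation"]}))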
[0061] While illustrated in the block diagrams as groups of discrete components communicating with each other via distinct data signal connections, it will be understood by those skilled in the art that the present disclosure also includes embodiments involving a combination of hardware and software components, with some components being implemented by a given function or operation of a hardware or software system. In addition, many of the data paths illustrated are implementable using data communications within a computer application or operating system.
[0062] To this end, Fig. 8 is a flowchart of a method 700 for creating digital content, in accordance with an embodiment. In one embodiment, the method is implemented as a computer readable medium storing code forming part of a software application, for example, which, when run on a processing device, implements the steps of the method 700 as herein described.
[0063] In the method 700, a step 702 involves first receiving a user input as described hereinabove, which is descriptive of a storyline, for example.

[0064] Then, as illustrated by step 704, the user input is analyzed to determine the presence of any ambiguity (or error) in the input. In this step, at least one ambiguity relating to a syntactic and/or a logical aspect of the input, for example, is identified.
[0065] Upon identification of the ambiguity, step 706 involves interacting with the user, via a user interface device for example, during a process of modifying the input to correct the ambiguity. The input is thereby freed of the ambiguity (i.e. the ambiguity is removed, or the input is corrected accordingly). Such interacting involves, in one embodiment, querying the user to obtain feedback associated with the error in question, as described extensively hereinabove. In another embodiment, the interacting involves correcting the error automatically using predefined correction settings and knowledge data, as also described hereinabove.
[0066] In one embodiment where multiple ambiguities are identified in the input, steps 704 and 706 are reiterated in a loop until all errors are identified and corrected based on the user interaction.
[0067] Next, in step 708, the user input is transformed into a content representation taking on a pre-defined data format which is understandable by a rendering device capable of rendering digital media from the data of the content representation.
[0068] In step 710, the content representation is rendered by a rendering device that generates rendered content as described hereinabove. Once the rendered content is available, at step 712 it is presented to the user via an output device; that is, step 712 involves playing the rendered content to the user. The rendered content comprises audio content, visual content, or both, and can be played on an audio device, a visual display output device, or a combined audio/visual output device.
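Pulling steps 702 to 712 together, the overall flow of method 700 might be expressed as the sketch below; the device objects and their methods are placeholders used only to show the ordering of the steps, not an interface defined herein:

    # Sketch of method 700: the ordering of steps 702-712. The devices and
    # their methods are placeholders, not an API defined by this disclosure.

    def create_digital_content(user_input, interface, generator, renderer, output):
        text = user_input                                  # step 702: receive the input

        while True:                                        # steps 704/706: validation loop
            ambiguity = generator.find_ambiguity(text)
            if ambiguity is None:
                break
            text = interface.resolve(ambiguity, text)      # interactive or default-based

        representation = generator.to_content_representation(text)   # step 708

        rendered = renderer.render(representation)         # step 710

        output.play(rendered)                              # step 712
        return rendered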
[0069] It is noted that the method can be adapted as per the above described embodiments.

[0070] Now with reference to Fig. 9, there is illustrated a schematic of a digital content creation system 800, in accordance with an embodiment, where the system is implemented as a combination of hardware components, including a processor 802, a memory device 804, a database (or a set of databases) 806, a user interface 808 and an audio device and/or visual display device 810. In one embodiment, the combination of the processor 802, memory 804 and database 806 is implemented as a general computer device 812, for example. The memory 804 stores code which, when run by the processor 802, functions to implement the steps of a method such as the one described in relation to Fig. 8, for example.
[0071] While preferred embodiments have been described above and illustrated in the accompanying drawings, it will be evident to those skilled in the art that modifications may be made therein without departing from the essence of this disclosure. Such modifications are considered as possible variants comprised in the scope of the disclosure.

Claims

1. A digital content creation system comprising: an interface device for receiving an input from a user, the input being descriptive of a storyline; a content generation device in communication with the interface device, for transforming the input into a content representation according to a data format, the content generation device analyzing the input and interacting with the user via the interface device when an ambiguity relating to the input is encountered, to correct the input prior to the transforming into the content representation; and a rendering device in communication with the content generation device, for receiving the content representation and rendering the content representation to generate at least one of audio and visual rendered content in real time, the rendered content for being displayed on the interface device to represent the storyline as intended by the user.
2. The system of claim 1, wherein the content generation device comprises a content understanding device for identifying the ambiguity from the input based on data retrieved from a knowledge database.
3. The system of claim 1, wherein the content generation device comprises a dialogue interface for questioning the user in order to obtain feedback on the ambiguity; and for using the feedback to correct the input and generate the content representation accordingly.
4. The system of claim 1, wherein the content generation device comprises an ambiguity correction device for performing a change to the input to correct the ambiguity in accordance with data retrieved from a knowledge database and a set of pre-defined correction settings, the input once corrected being used to generate the content representation, and the change being notified to the user.
5. The system of claim 1, wherein the interface device comprises a visualization device for allowing the user to assess the at least one of audio and visual rendered content.
6. The system of claim 5, wherein the interface device comprises a content user interface for receiving a user-inputted change to be made to the content representation.
7. The system of claim 6, comprising a content recombination device in communication with the content generation device and the interface device, for modifying the content representation in accordance with the user-inputted change, which in turn modifies the at least one of the audio and visual rendered content generated from the content representation once modified.
8. The system of claim 7, wherein the content recombination device comprises a user input generation device for modifying the input to reflect the user-inputted change.
9. The system of claim 1, wherein each one of the interface device, the content generation device and the rendering device are distinct and located at different physical locations, while in communication with one another via a network.
10. The system of claim 1, wherein at least one of the interface device, the content generation device and the rendering device comprises a computing device running a client application to access a remote server over a communication network.
11. The system of claim 1, wherein the content generation device comprises at least one of: a text-to-speech conversion engine; and a text-to-animation conversion engine.
12. A method for creating digital content, the method comprising: receiving an input descriptive of a storyline from a user; identifying an ambiguity in the input, the ambiguity relating to at least one of syntactic and logical aspect of the input; upon the identifying of the ambiguity, interacting with the user during a process of modifying the input to correct the ambiguity; once the input freed of the ambiguity, transforming the input into a content representation having a data format understandable by a rendering device; generating at least one of audio and visual rendered content in the rendering device based on the content representation; and playing the at least one of audio and visual rendered content to the user, to represent the storyline as intended by the user.
13. The method of claim 12, wherein the identifying the ambiguity in the input comprises retrieving data from a knowledge database to analyze the at least one of syntactic and logical aspect relating to a content of the input.
14. The method of claim 12, wherein the interacting with the user comprises: querying the user via an interface device, to gather feedback relating to the ambiguity; and changing the input using the feedback to remove the ambiguity.
15. The method of claim 12, wherein the interacting with the user comprises: modifying the input to remove the ambiguity based on data retrieved from a knowledge database and a set of pre-defined correction settings; and notifying the user, via a user interface device, of modifications made to the input in the modifying.
16. The method of claim 12, wherein the receiving the input comprises receiving a user-inputted change to the content representation; and comprising modifying the content representation in accordance with the user-inputted change, to in turn modify the at least one of the audio and visual rendered content generated from the content representation once modified.
17. The method of claim 16, wherein the modifying the content representation comprises modifying the input to reflect the user-inputted change, from which the content representation is in turn modified.
18. The method of claim 12, comprising receiving a final validation from at least one of the user and another application, upon the playing the at least one of audio and visual rendered content.
19. The method of claim 12, wherein the receiving the input comprises receiving at least one of: a textual data file, an audio data file, an image data file, and a video data file.
20. The method of claim 12, wherein the transforming the input into the content representation comprises at least one of: converting textual data into speech data; and converting textual data into animation data.
21. The method of claim 12, wherein the generating the at least one of audio and visual rendered content comprises selecting a camera angle and a framing to be applied in the generating.
22. A computer readable media storing instructions for creating digital content, the instructions being readable by a processing device, for allowing the processing device to: receive an input descriptive of a storyline from a user; identify an ambiguity in the input, the ambiguity relating to at least one of syntactic and logical aspect of the input; upon the identifying of the ambiguity, interact with the user during a process of modifying the input to correct the ambiguity; once the input freed of the ambiguity, transform the input into a content representation having a data format understandable by a rendering device; generate at least one of audio and visual rendered content in the rendering device based on the content representation; and play the at least one of audio and visual rendered content to the user, to represent the storyline as intended by the user.
PCT/CA2010/000046 2009-01-13 2010-01-13 Digital content creation system WO2010081225A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US14428609P 2009-01-13 2009-01-13
US61/144,286 2009-01-13

Publications (1)

Publication Number Publication Date
WO2010081225A1 WO2010081225A1 (en)

Family

ID=42339363

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CA2010/000046 WO2010081225A1 (en) 2009-01-13 2010-01-13 Digital content creation system

Country Status (1)

Country Link
WO (1) WO2010081225A1 (en)



Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5493677A (en) * 1994-06-08 1996-02-20 Systems Research & Applications Corporation Generation, archiving, and retrieval of digital images with evoked suggestion-set captions and natural language interface
US20010049596A1 (en) * 2000-05-30 2001-12-06 Adam Lavine Text to animation process
US7016828B1 (en) * 2000-10-23 2006-03-21 At&T Corp. Text-to-scene conversion
US20040091848A1 (en) * 2002-11-13 2004-05-13 Nemitz Keith Gerard Interactive narrative operated by introducing encounter events
US20080215310A1 (en) * 2005-10-28 2008-09-04 Pascal Audant Method and system for mapping a natural language text into animation
US20080027726A1 (en) * 2006-07-28 2008-01-31 Eric Louis Hansen Text to audio mapping, and animation of the text
WO2008148211A1 (en) * 2007-06-06 2008-12-11 Xtranormal Technologie Inc. Time-ordered templates for text-to-animation system

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10402637B2 (en) 2012-01-20 2019-09-03 Elwha Llc Autogenerating video from text
US9036950B2 (en) 2012-01-20 2015-05-19 Elwha Llc Autogenerating video from text
US9189698B2 (en) 2012-01-20 2015-11-17 Elwha Llc Autogenerating video from text
US9552515B2 (en) 2012-01-20 2017-01-24 Elwha Llc Autogenerating video from text
US8731339B2 (en) * 2012-01-20 2014-05-20 Elwha Llc Autogenerating video from text
EP3239857A1 (en) * 2016-04-28 2017-11-01 Wipro Limited A method and system for dynamically generating multimedia content file
US10140259B2 (en) 2016-04-28 2018-11-27 Wipro Limited Method and system for dynamically generating multimedia content file
WO2019140120A1 (en) * 2018-01-11 2019-07-18 End Cue, Llc Script writing and content generation tools and improved operation of same
US10896294B2 (en) 2018-01-11 2021-01-19 End Cue, Llc Script writing and content generation tools and improved operation of same
US10922489B2 (en) 2018-01-11 2021-02-16 RivetAI, Inc. Script writing and content generation tools and improved operation of same
US10805665B1 (en) 2019-12-13 2020-10-13 Bank Of America Corporation Synchronizing text-to-audio with interactive videos in the video framework
US11064244B2 (en) 2019-12-13 2021-07-13 Bank Of America Corporation Synchronizing text-to-audio with interactive videos in the video framework
US11350185B2 (en) 2019-12-13 2022-05-31 Bank Of America Corporation Text-to-audio for interactive videos using a markup language


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10730999

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 10730999

Country of ref document: EP

Kind code of ref document: A1