WO2008097051A1 - Method for searching specific person included in digital data, and method and apparatus for producing copyright report for the specific person
- Publication number: WO2008097051A1 (PCT/KR2008/000757)
- Authority
- WO
- WIPO (PCT)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/18—Legal services; Handling legal documents
- G06Q50/184—Intellectual property management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/7834—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using audio features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
- G06V40/166—Detection; Localisation; Normalisation using acquisition arrangements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
- G06V40/173—Classification, e.g. identification face re-identification, e.g. recognising unknown faces across different face tracks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/179—Human faces, e.g. facial parts, sketches or expressions metadata assisted face recognition
Definitions
- character strings associated with the specific person are retrieved from the moving picture (S810), as shown in the embodiments of Figs. 6 and 7. Referring to Fig. 8, however, the retrieved character strings are applied to determine the face_search_candidate_sections, unlike the embodiments of Figs. 6 and 7.
- first temporal sections including the character strings or the specific person's voice are determined as the face_search_candidate_sections (S840). This is because the first temporal sections including the character strings associated with the specific person or the specific person's voice are considered as time slots in which the specific person is highly likely to appear.
- the embodiments of the present invention described with reference to Figs. 4 to 8 may be embodied by using metadata such as Electronic Program Guide (EPG).
- the name of the specific person may first be retrieved from the EPG, which may include information on a plurality of performers; in case the specific person is listed in the EPG, the attempt to retrieve the specific person from the corresponding moving picture may be made efficiently, resulting in a highly accurate retrieval.
- the EPG is available for moving pictures provided by broadcasting stations such as KBS, MBC, and the like.
- the EPG may be unavailable for illegally distributed moving pictures, which naturally lack corresponding EPG data.
Abstract
A specific person can be rapidly retrieved from a moving picture by an automated system. A report indicating whether or not a copyright infringement is committed is automatically produced by the automated system, thus allowing a copyright holder to easily check whether his or her copyright is infringed. A method for automatically searching for the specific person in the moving picture includes the steps of: determining face_search_candidate_sections of the moving picture based on a voice recognition technique; and retrieving sections including the specific person's face from the face_search_candidate_sections.
Description
METHOD FOR SEARCHING SPECIFIC PERSON INCLUDED IN DIGITAL DATA, AND METHOD AND APPARATUS FOR PRODUCING COPYRIGHT REPORT FOR THE SPECIFIC PERSON

Technical Field
[1] The present invention relates to a method for searching for a specific person included in digital data, and a method and an apparatus for producing a copyright report for the specific person.

Background Art
[2] In recent years, User Created Content ("UCC") has been skyrocketing in popularity. The number of websites providing UCC to users is also increasing.
[3] The UCC refers to various kinds of media content produced by ordinary people rather than companies. In detail, the UCC includes a variety of content, such as music, photographs, flash animations, moving pictures, and the like.

Disclosure of Invention

Technical Problem
[4] The increasing popularity of UCC has diversified the range of people who create it. In former days, when content was produced by only a few parties, a copyright holder could protect his or her copyright without difficulty. Nowadays, however, this diversification often brings about copyright infringement issues.
[5] Checking whether copyright infringement has been committed may require much time and cost. For example, time and cost may be required to determine whether and when a specific person appears without permission in UCC such as a moving picture. Accordingly, there is a need for a scheme capable of easily determining whether the specific person appears in the moving picture, thereby effectively protecting copyrights.
[6] Further, the scheme may be used to produce a copyright report without requiring much time and cost.

Technical Solution
[7] It is, therefore, one object of the present invention to provide a method for searching a specific person in a moving picture.
[8] It is another object of the present invention to provide a method for producing a copyright report for the specific person included in the moving picture, in order to easily check whether a copyright infringement against the specific person is committed.
[9] It is yet another object of the present invention to provide an apparatus for producing the copyright report for the specific person included in the moving picture without requiring much time and cost.
Advantageous Effects
[10] In accordance with the present invention, a specific person can be rapidly searched for in the moving picture without a vexatious manual search.

[11] Further, in accordance with the present invention, a copyright report for the specific person is automatically produced so that a copyright holder can easily check whether infringement is committed against his or her copyright.

[12] Furthermore, in accordance with the present invention, since the face of the specific person is rapidly retrieved from the moving picture, the copyright holder can easily check whether the copyright infringement is committed, thereby protecting his or her copyright effectively.
Brief Description of the Drawings

[13] The above and other objects and features of the present invention will become apparent from the following description of preferred embodiments given in conjunction with the accompanying drawings, in which:

[14] Fig. 1 shows a block diagram illustrating an apparatus for producing a copyright report for a specific person who appears in a moving picture in accordance with the present invention;

[15] Fig. 2 provides a block diagram showing a person search unit in accordance with the present invention;

[16] Fig. 3 depicts a flowchart illustrating a process of producing a copyright report for the specific person included in the moving picture in accordance with the present invention;

[17] Fig. 4 offers a flowchart illustrating a method for searching the specific person in the moving picture in accordance with an example embodiment of the present invention;

[18] Fig. 5 illustrates a method for searching the specific person in the moving picture in accordance with another example embodiment of the present invention;

[19] Fig. 6 illustrates a method for searching the specific person in the moving picture in accordance with yet another example embodiment of the present invention;

[20] Fig. 7 shows a method for searching the specific person in the moving picture in accordance with yet another example embodiment of the present invention; and

[21] Fig. 8 provides a method for searching the specific person in the moving picture in accordance with yet another example embodiment of the present invention.
Best Mode for Carrying Out the Invention
[22] In accordance with one aspect of the present invention, there is provided a method for searching temporal sections, in which a specific person appears, of a moving picture, the moving picture including audio components and video components, the method including the steps of: (a) extracting the audio components from the moving picture and determining first sections as face_search_candidate_sections, the first sections including temporal sections, in which person's voices among the extracted audio components are included, by using voice recognition technique; and (b) comparing faces of one or more persons appearing in a part of the video components included in the face_search_candidate_sections with the specific person's face by using face recognition technique, and determining second sections as results, the second sections including temporal sections, in which a certain individual among the persons appearing in the part of the video components is included, there being a degree of similarity over a predetermined threshold value between face of the certain individual and the specific person's face.
[23] In accordance with another aspect of the present invention, there is provided a method for searching temporal sections, in which a specific person appears, of a moving picture, the moving picture including audio components and video components, the method including the steps of: (a) extracting the video components from the moving picture and determining first sections as voice_search_candidate_sections, the first sections including temporal sections, in which person's faces are included among the extracted video components, by face detection technique; and (b) determining second sections as results, the second sections including temporal sections, in which the specific person's voice is included among the audio components in the voice_search_candidate_sections, by voice recognition technique.
[24] In accordance with yet another aspect of the present invention, there is provided an apparatus for producing a copyright report for a specific person appearing in a moving picture, the apparatus including: a moving picture acquiring unit for acquiring the moving picture; a component extracting unit for extracting video components and audio components from the acquired moving picture; a character string search unit for determining first sections so as to include temporal sections in which one or more character strings associated with the specific person are included, in case the character strings are retrieved from the extracted video components by character recognition technique; a voice search unit for determining second sections so as to include temporal sections in which the specific person's voice is included, in case the specific person's voice is retrieved from the extracted audio components by voice recognition
technique; an image search unit for comparing faces of one or more persons appearing in the extracted video components with the specific person's face by using face recognition technique, and determining third sections so as to include temporal sections in which a certain individual among the persons appearing in the extracted video components is included, there being a degree of similarity over a predetermined threshold value between face of the certain individual and the specific person's face; and a copyright report producing unit for automatically producing a copyright report including information on time slots corresponding to at least one of the first sections, the second sections, and the third sections and the name of the specific person appearing in at least one of the first sections, the second sections, and the third sections.

Mode for the Invention
[25] In the following detailed description, reference is made to the accompanying drawings that show, by way of illustration, specific embodiments in which the present invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the present invention. It is to be understood that the various embodiments of the present invention, although different, are not necessarily mutually exclusive. For example, a particular feature, structure, or characteristic described herein in connection with one embodiment may be implemented within other embodiments without departing from the spirit and scope of the present invention. In addition, it is to be understood that the location or arrangement of individual elements within each disclosed embodiment may be modified without departing from the spirit and scope of the present invention. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present invention is defined only by the appended claims, appropriately interpreted, along with the full range of equivalents to which the claims are entitled. In the drawings, like numerals refer to the same or similar functionality throughout the several views.
[26] The present invention will now be described in more detail, with reference to the accompanying drawings.
[27] Fig. 1 shows a block diagram illustrating an apparatus 100 for producing a copyright report for a specific person included in digital data, e.g., a moving picture, in accordance with the present invention.
[28] In detail, the apparatus 100 includes a moving picture acquiring unit 110 for acquiring a moving picture, a person search unit 120 for checking whether a specific person appears in the moving picture provided by the moving picture acquiring unit 110, and a copyright report producing unit 130 for producing a copyright report based on the information provided by the person search unit 120.
[29] The moving picture acquiring unit 110 may acquire the moving picture through wired or wireless networks or from a broadcast.
[30] The person search unit 120 checks rapidly whether an image, e.g., a facial image, of the specific person is included in the moving picture without permission.
[31] The copyright report producing unit 130 generates a copyright report based on the result obtained by the person search unit 120.
[32] Fig. 2 provides a block diagram showing the person search unit 120 in accordance with the present invention.
[33] The person search unit 120 includes a voice search unit 220 for searching voice included in the moving picture, and an image search unit 230 for retrieving the facial image of the specific person from the moving picture. Moreover, the person search unit 120 may further include a character string search unit 210 for retrieving character strings, such as a name, a nickname, and the like, associated with the specific person from the moving picture.
[34] In case the moving picture is obtained from a digital broadcast, the character string search unit 210 may retrieve the character strings associated with the specific person from additional data, e.g., a caption, included in the moving picture. The character string search unit 210 may be embodied by means of one or more character recognition techniques well known in the art.
[35] The voice search unit 220 may retrieve one or more temporal sections, which may be considered as so-called face_search_candidate_sections, from the moving picture, where the specific person's voice is included. By determining the face_search_candidate_sections, temporal range of the moving picture to be searched for the specific person may shrink substantially, thereby reducing the time and the cost needed for the search process.
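How the candidate sections shrink the range to be searched can be illustrated with a small interval-merging sketch; the merging helper and all numbers below are illustrative assumptions, not part of the patent:

```python
# Sketch of how face_search_candidate_sections reduce the range the
# image search unit must scan: overlapping or nearly adjacent voiced
# sections are merged, and the remaining coverage is compared with the
# full running time.

def merge_sections(sections, gap=0.0):
    """Merge temporal sections that overlap or sit within `gap` seconds."""
    merged = []
    for start, end in sorted(sections):
        if merged and start <= merged[-1][1] + gap:
            merged[-1][1] = max(merged[-1][1], end)   # extend the last section
        else:
            merged.append([start, end])               # open a new section
    return [tuple(s) for s in merged]

candidates = [(10, 40), (35, 60), (300, 320)]
merged = merge_sections(candidates)
covered = sum(end - start for start, end in merged)
print(merged)                # [(10, 60), (300, 320)]
print(covered, "of 3600 s")  # only 70 s of a one-hour picture remain
```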
[36] However, in case the specific person's voice is retrieved from the moving picture by the voice search unit 220, a part of the temporal sections is likely to be skipped during the search process although the specific person appears therein, due to a failure in recognizing the specific person's voice.
[37] Accordingly, the voice search unit 220 may be embodied so as to search extended temporal sections including any person's voice (instead of only the temporal sections including the specific person's voice). A person's voice may be detected by referring to the periodicity of vocal cord vibration, periodic acceleration resulting from the presence of Glottal Closure Instants (GCI), the peculiar shape of the spectral envelope of a person's voice, and the like. The voice search unit 220 may be embodied by means of one or more voice recognition techniques well known in the art.
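The periodicity cue mentioned above can be illustrated with a minimal autocorrelation-based voiced-frame detector; this is a rough sketch on synthetic signals, not the patent's method, and the 0.5 threshold and 60-400 Hz pitch range are assumed values:

```python
# Minimal illustration of the periodicity cue: voiced speech shows a
# strong normalized autocorrelation peak at the pitch period, while
# aperiodic noise does not. Pure Python on synthetic signals; a real
# detector would also use spectral-envelope and GCI cues.
import math
import random

def is_voiced(frame, rate, fmin=60.0, fmax=400.0, threshold=0.5):
    """True if the frame autocorrelates strongly at some pitch lag."""
    mean = sum(frame) / len(frame)
    x = [s - mean for s in frame]
    energy = sum(s * s for s in x) or 1e-12
    lo, hi = int(rate / fmax), int(rate / fmin)
    best = 0.0
    for lag in range(lo, min(hi, len(x) - 1)):
        r = sum(x[i] * x[i + lag] for i in range(len(x) - lag)) / energy
        best = max(best, r)
    return best >= threshold

rate = 8000
tone = [math.sin(2 * math.pi * 150 * n / rate) for n in range(400)]  # 150 Hz "voice"
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(400)]                 # aperiodic
print(is_voiced(tone, rate), is_voiced(noise, rate))  # True False
```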
[38] After the face_search_candidate_sections are determined, the image search unit 230 retrieves the specific person's face from the face_search_candidate_sections to check
whether the specific person's facial image is included in the moving picture without permission or not. The image search unit 230 may be embodied by means of one or more face detection techniques and/or face recognition techniques well known in the art.
[39] In accordance with another example embodiment of the present invention, so-called voice_search_candidate_sections may be retrieved from the moving picture in the first place, by checking whether at least one candidate who is likely to be recognized as the specific person by the image search unit 230 appears in the moving picture, and then finalized temporal sections where the specific person's voice is included may be retrieved from the voice_search_candidate_sections by the voice search unit 220.
[40] Moreover, in accordance with yet another example embodiment of the present invention, both the retrieval of first sections where the specific person's face is included by the image search unit 230 and the retrieval of second sections where the specific person's voice is included by the voice search unit 220 may be performed at the same time.
[41] It is to be noted that the above-mentioned various embodiments may also be applied to the other embodiments expounded hereinafter, even when no specific description thereof is presented there.
[42] Fig. 3 depicts a flowchart illustrating a process of producing a copyright report for the specific person who appears in the moving picture in accordance with the present invention.
[43] Referring to Fig. 3, the method for producing the copyright report for the specific person includes the steps of acquiring the moving picture (S310), retrieving temporal sections of the moving picture in which the specific person appears (S320), and producing the copyright report based on the retrieved sections (S330).
[44] In detail, at step S310, the moving picture may be acquired through wired or wireless networks or digital broadcasts.

[45] After the temporal sections where the facial image of the specific person is included without permission are retrieved from the moving picture at step S320, the copyright report may be produced at step S330, the copyright report being used as supporting evidence of copyright infringement.
[46] The copyright report may contain information on the specific person. For example, the copyright report may say that the specific person "Peter" appears at "three scenes" of the moving picture "A", a total time of the sections where "Peter" appears is "10 minutes", and time slots at which "Peter" appears are "from xx seconds to xxx seconds" and the like.
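Such a report could be assembled mechanically from the retrieved sections; the function name, report wording, and section lists below are illustrative assumptions, and overlapping sections from different searches are not merged in this sketch:

```python
# Sketch of assembling the copyright report text from the retrieved
# temporal sections found by the character string, voice, and image
# search units.

def produce_report(name, title, first, second, third):
    """first/second/third: (start_sec, end_sec) sections found by the
    character string, voice, and image search units respectively."""
    slots = sorted(set(first) | set(second) | set(third))
    total = sum(end - start for start, end in slots)
    lines = [f'Copyright report for "{name}" in moving picture "{title}":']
    for start, end in slots:
        lines.append(f"  appears from {start} s to {end} s")
    lines.append(f"  total appearance time: {total} s over {len(slots)} section(s)")
    return "\n".join(lines)

print(produce_report("Peter", "A", [(5, 15)], [(5, 15), (40, 60)], [(90, 100)]))
```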
[47] By referring to the copyright report, a copyright holder can determine whether to cope with the copyright infringement.
[48] Fig. 4 offers a flowchart illustrating a method for searching the specific person in the moving picture in accordance with an example embodiment of the present invention.
[49] First, voices which have high probabilities of being determined as the specific person's voice are retrieved from the moving picture (S410).
[50] A determination is then made on whether the specific person's voice is included in the moving picture (S420).
[51] In case the specific person's voice is considered to be included in the moving picture, first temporal sections including the specific person's voice (e.g., scenes at which the specific person's voice is inserted) are determined as the face_search_candidate_sections (S430).
[52] After the face_search_candidate_sections are determined, second temporal sections including persons who have high probabilities of being determined as the specific person are retrieved from the face_search_candidate_sections (S440).
[53] Fig. 5 illustrates a method for searching the specific person in the moving picture in accordance with another example embodiment of the present invention.
[54] First, persons' voices are searched for in the moving picture (S510). Unlike the embodiment of Fig. 4, voices of unspecified persons are searched for instead of the specific person's voice, because a part of the temporal sections including the specific person is likely to be skipped due to the inaccuracy of the technique for recognizing a certain voice.
[55] A determination is then made on whether the person's voices are included in the moving picture (S520).
[56] In case the person's voices are considered to be included in the moving picture, first temporal sections including the person's voices (e.g., scenes where the person's voices are inserted) are determined as the face_search_candidate_sections (S530).
[57] After the face_search_candidate_sections are determined, second temporal sections including persons who have high probabilities of being determined as the specific person are retrieved from the face_search_candidate_sections (S540).
[58] Fig. 6 illustrates a method for searching the specific person in the moving picture in accordance with yet another example embodiment of the present invention.
[59] Referring to Fig. 6, character strings associated with the specific person are first retrieved from the moving picture (S610), unlike the embodiments of Figs. 4 and 5. In case the moving picture is obtained from a digital broadcast, the character strings may include data, e.g., a caption inserted into the moving picture, as previously mentioned.

[60] A determination is then made on whether the character strings associated with the specific person are included in the moving picture (S620). In case the character strings associated with the specific person are not included in the moving picture, it is determined that the specific person does not appear in the moving picture.
[61] However, in case the character strings associated with the specific person are included in the moving picture, voices which have high probabilities of being determined as the specific person's voice are retrieved from the moving picture (S630).
[62] A determination is then made on whether the specific person's voice is included in the moving picture (S640).
[63] In case the specific person's voice is considered to be included in the moving picture, first temporal sections including the specific person's voice (e.g., scenes at which the specific person's voice is inserted) are determined as the face_search_candidate_sections (S650).
[64] Then, second temporal sections including persons who have high probabilities of being determined as the specific person are retrieved from the face_search_candidate_sections (S660).
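The determination at steps S630 and S640 (claim 3 below characterizes it as comparing the shape of a unique spectral envelope of the specific person's voice) might be illustrated with a toy sketch. The feature vectors and the cosine-similarity decision rule here are illustrative assumptions, not the patented method; a real system would derive envelopes from the audio signal itself.

```python
# Toy speaker check: accept a segment as the specific person's voice if its
# "spectral envelope" (here a precomputed feature vector) is close enough
# to the enrolled one.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def is_target_voice(segment_envelope, target_envelope, threshold=0.95):
    """S630/S640: compare envelope shapes against a similarity threshold."""
    return cosine_similarity(segment_envelope, target_envelope) >= threshold

target = [0.9, 0.4, 0.1, 0.05]           # enrolled envelope of the person
same_speaker = [0.85, 0.45, 0.12, 0.04]  # similar shape
other_speaker = [0.1, 0.3, 0.9, 0.6]     # different shape

print(is_target_voice(same_speaker, target))   # -> True
print(is_target_voice(other_speaker, target))  # -> False
```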
[65] Fig. 7 shows a method for searching the specific person in the moving picture in accordance with yet another example embodiment of the present invention.
[66] Referring to Fig. 7, character strings associated with the specific person are first retrieved from the moving picture (S710) like the embodiment of Fig. 6 (unlike the embodiments of Figs. 4 and 5). Since the examples of the character strings were previously mentioned, a detailed description thereabout will be omitted.
[67] A determination is then made on whether the character strings associated with the specific person are included in the moving picture (S720). In case the character strings associated with the specific person are not included in the moving picture, it is determined that the specific person does not appear in the moving picture.
[68] However, in case the character strings associated with the specific person are considered to be included in the moving picture, persons' voices are retrieved from the moving picture (S730). Unlike the embodiment of Fig. 6, unspecified persons' voices are retrieved instead of the specific person's voice because a part of the temporal sections including the specific person is likely to be skipped due to the inaccuracy of the technique for recognizing a certain voice. Since this was mentioned above, a detailed description thereabout will be omitted.
[69] A determination is then made on whether persons' voices are included in the moving picture (S740).
[70] In case persons' voices are included in the moving picture, first temporal sections including the persons' voices (e.g., scenes in which the persons' voices are inserted) are determined as the face_search_candidate_sections (S750).
[71] Thereafter, second temporal sections including the specific person's facial image are retrieved from the face_search_candidate_sections (S760).
[72] Fig. 8 provides a method for searching the specific person in the moving picture in accordance with yet another example embodiment of the present invention.
[73] First, character strings associated with the specific person are retrieved from the moving picture (S810), as shown in the embodiments of Figs. 6 and 7. Referring to Fig. 8, however, the retrieved character strings are applied to determine the face_search_candidate_sections, unlike the embodiments of Figs. 6 and 7.
[74] Thereafter, voices which have high probabilities of being determined as the specific person's voice are searched for in the moving picture (S820). Herein, it should be noted that steps S810 and S820 can be performed in reverse order or at the same time.
[75] After the retrieval of the character strings and the voices is completed, a determination is made on whether the character strings associated with the specific person or the specific person's voice are included in the moving picture (S830).
[76] When the character strings associated with the specific person or the specific person's voice are included in the moving picture, first temporal sections including the character strings or the specific person's voice are determined as the face_search_candidate_sections (S840). This is because the first temporal sections including the character strings associated with the specific person or the specific person's voice are considered as time slots in which the specific person is highly likely to appear.
[77] After the face_search_candidate_sections are determined, second temporal sections including the specific person's facial image are retrieved from the face_search_candidate_sections (S850).
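The candidate-section construction of Fig. 8 (S830/S840) takes the union of the sections containing a matching character string and the sections containing the specific person's voice. A minimal interval-merging sketch, on hypothetical section lists:

```python
# S840 sketch: merge caption-match sections and voice-match sections into
# a single set of face_search_candidate_sections.

def merge_sections(sections):
    """Union possibly-overlapping (start, end) intervals into disjoint ones."""
    merged = []
    for start, end in sorted(sections):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

caption_sections = [(10, 20), (40, 50)]  # character strings found (S810)
voice_sections = [(15, 30), (60, 70)]    # voice matches found (S820)

candidates = merge_sections(caption_sections + voice_sections)
print(candidates)  # -> [(10, 30), (40, 50), (60, 70)]
```

Face recognition (S850) then runs only inside `candidates`, the time slots in which the specific person is highly likely to appear.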
[78] As described above, the embodiments of the present invention described with reference to Figs. 4 to 8 may be embodied by using metadata such as an Electronic Program Guide (EPG). For example, the name of the specific person may first be retrieved from the EPG, which may include information on a plurality of performers; in case the specific person is included in the EPG, the attempt to retrieve the specific person from the corresponding moving picture may be made efficiently, resulting in a highly accurate retrieval.
[79] However, since the EPG generally includes only key performers, it is undesirable, in terms of accuracy, to skip checking the content of the moving picture simply because the name of the specific person has not been found in the EPG.
[80] Meanwhile, the EPG is available for moving pictures provided by broadcasting stations such as KBS, MBC, and the like. However, the EPG may be unavailable for illegally distributed moving pictures, which naturally lack corresponding EPG data.
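The EPG reasoning above (a name hit raises confidence, but a miss must not end the search, since the EPG lists only key performers) can be sketched as a simple decision helper; the function and its return values are hypothetical, not part of the claimed apparatus.

```python
# Sketch of the EPG pre-check: the EPG only adjusts the prior that the
# person appears; content analysis proceeds either way.

def plan_search(epg_performers, person):
    """Return (search_content, prior): whether to analyse the moving
    picture and the prior likelihood that the person appears."""
    if person in epg_performers:
        return True, "high"  # name confirmed by EPG metadata
    # The EPG generally lists only key performers, so absence from it
    # must not terminate the search.
    return True, "low"

print(plan_search({"Alice", "Bob"}, "Alice"))  # -> (True, 'high')
print(plan_search({"Alice", "Bob"}, "Carol"))  # -> (True, 'low')
```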
[81] While the present invention has been shown and described with respect to the preferred embodiments, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and the scope of the present invention as defined in the following claims.
Claims
[1] A method for searching temporal sections, in which a specific person appears, of a moving picture, the moving picture including audio components and video components, the method comprising the steps of:
(a) extracting the audio components from the moving picture and determining first sections as face_search_candidate_sections, the first sections including temporal sections, in which persons' voices among the extracted audio components are included, by using a voice recognition technique; and
(b) comparing faces of one or more persons appearing in a part of the video components included in the face_search_candidate_sections with the specific person's face by using a face recognition technique, and determining second sections as results, the second sections including temporal sections, in which a certain individual among the persons appearing in the part of the video components is included, there being a degree of similarity over a predetermined threshold value between the face of the certain individual and the specific person's face.
[2] The method of claim 1, wherein, at the step (a), the first sections are determined by detecting temporal sections of the moving picture in which the specific person's voice among the extracted audio components is included.
[3] The method of claim 2, wherein a shape of a unique spectral envelope of the specific person's voice is compared with that of the extracted audio components in order to determine whether the specific person's voice is included in the extracted audio components.
[4] The method of claim 1, wherein the step (a) includes the steps of: extracting the video components and the audio components from the moving picture; extracting character strings from the video components; and determining the first sections as the face_search_candidate_sections in case one or more character strings associated with the specific person are retrieved from the extracted character strings.
[5] The method of claim 4, wherein the first sections are determined by detecting temporal sections of the moving picture in which the specific person's voice among the extracted audio components is included.
[6] The method of claim 1, wherein the step (a) includes the steps of: extracting the video components and the audio components from the moving picture; determining third sections including temporal sections of the moving picture in
which character strings associated with the specific person are included in the extracted video components; determining the first sections including temporal sections of the moving picture in which the specific person's voice among the extracted audio components is included; and determining the third sections and the first sections as the face_search_candidate_sections.
[7] The method of any one of claims 4 to 6, wherein the character strings include captions of the moving picture.
[8] The method of claim 1, wherein the step (a) includes the step of determining whether a name of the specific person is included in an Electronic Program
Guide (EPG) associated with the moving picture; and determining the first sections as the face_search_candidate_sections in case the name of the specific person is included in the EPG.
[9] A method for searching temporal sections, in which a specific person appears, of a moving picture, the moving picture including audio components and video components, the method comprising the steps of:
(a) extracting the video components from the moving picture and determining first sections as voice_search_candidate_sections, the first sections including temporal sections, in which person's faces are included among the extracted video components, by face detection technique; and
(b) determining second sections as results, the second sections including temporal sections, in which the specific person's voice is included among the audio components in the voice_search_candidate_sections, by voice recognition technique.
[10] The method of claim 9, wherein the step (a) includes the steps of: extracting the video components from the moving picture; and comparing faces of one or more persons appearing in the video components with the specific person's face by using face recognition technique, and determining the first sections so as to include temporal sections, in which a certain individual among the persons appearing in the video components is included, there being a degree of similarity over a predetermined threshold value between face of the certain individual and the specific person's face.
[11] The method of claim 9, wherein the step (a) includes the steps of: extracting the video components and the audio components from the moving picture; extracting character strings from the video components; and determining the first sections as the voice_search_candidate_sections in case one
or more character strings associated with the specific person are retrieved from the extracted character strings.
[12] The method of claim 9, wherein the step (a) includes the step of determining whether a name of the specific person is included in an Electronic Program Guide (EPG) associated with the moving picture; and determining the first sections as the voice_search_candidate_sections in case the name of the specific person is included in the EPG.
[13] The method of any one of claims 1 to 6 and 8 to 12, further comprising the step of: automatically producing a copyright report including information on time slots corresponding to the second sections and the name of the specific person appearing in the second sections, in case the second sections exist.
[14] A medium recording a computer readable program to execute the method of any one of claims 1 to 6 and 8 to 12.
[15] An apparatus for producing a copyright report for a specific person appearing in a moving picture, the apparatus comprising: a moving picture acquiring unit for acquiring the moving picture; a component extracting unit for extracting video components and audio components from the acquired moving picture; a character string search unit for determining first sections so as to include temporal sections in which one or more character strings associated with the specific person are included, in case the character strings are retrieved from the extracted video components by character recognition technique; a voice search unit for determining second sections so as to include temporal sections in which the specific person's voice is included, in case the specific person's voice is retrieved from the extracted audio components by voice recognition technique; an image search unit for comparing faces of one or more persons appearing in the extracted video components with the specific person's face by using face recognition technique, and determining third sections so as to include temporal sections in which a certain individual among the persons appearing in the extracted video components is included, there being a degree of similarity over a predetermined threshold value between face of the certain individual and the specific person's face; and a copyright report producing unit for automatically producing a copyright report including information on time slots corresponding to at least one of the first sections, the second sections, and the third sections and the name of the specific person appearing in at least one of the first sections, the second sections, and the third sections.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR10-2007-0013040 | 2007-02-08 | ||
KR1020070013040A KR100865973B1 (en) | 2007-02-08 | 2007-02-08 | Method for searching certain person and method and system for generating copyright report for the certain person |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2008097051A1 true WO2008097051A1 (en) | 2008-08-14 |
Family
ID=39681894
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/KR2008/000757 WO2008097051A1 (en) | 2007-02-08 | 2008-02-05 | Method for searching specific person included in digital data, and method and apparatus for producing copyright report for the specific person |
Country Status (2)
Country | Link |
---|---|
KR (1) | KR100865973B1 (en) |
WO (1) | WO2008097051A1 (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101079180B1 (en) * | 2010-01-22 | 2011-11-02 | 주식회사 상상커뮤니케이션 | Query Based Image Search System for Searching Specific Person |
CN106569946B (en) * | 2016-10-31 | 2021-04-13 | 惠州Tcl移动通信有限公司 | Mobile terminal performance test method and system |
KR101684273B1 (en) * | 2016-11-17 | 2016-12-08 | 주식회사 엘지유플러스 | Movie managing server, movie playing apparatus, and method for providing charater information using thereof |
KR101689195B1 (en) * | 2016-11-17 | 2016-12-23 | 주식회사 엘지유플러스 | Movie managing server, movie playing apparatus, and method for providing charater information using thereof |
KR101686425B1 (en) * | 2016-11-17 | 2016-12-14 | 주식회사 엘지유플러스 | Movie managing server, movie playing apparatus, and method for providing charater information using thereof |
KR102433393B1 (en) | 2017-12-12 | 2022-08-17 | 한국전자통신연구원 | Apparatus and method for recognizing character in video contents |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6546185B1 (en) * | 1998-07-28 | 2003-04-08 | Lg Electronics Inc. | System for searching a particular character in a motion picture |
KR20040071369A (en) * | 2003-02-05 | 2004-08-12 | (주)에어스파이더 | Digital Image Data Search System |
KR20050051857A (en) * | 2003-11-28 | 2005-06-02 | 삼성전자주식회사 | Device and method for searching for image by using audio data |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100474848B1 (en) * | 2002-07-19 | 2005-03-10 | 삼성전자주식회사 | System and method for detecting and tracking a plurality of faces in real-time by integrating the visual ques |
- 2007-02-08: KR application KR1020070013040A, published as KR100865973B1 (not active: IP Right Cessation)
- 2008-02-05: WO application PCT/KR2008/000757, published as WO2008097051A1 (active: Application Filing)
Cited By (31)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8065290B2 (en) | 2005-03-31 | 2011-11-22 | Google Inc. | User interface for facts query engine with snippets from information sources that include query terms and answer terms |
US8650175B2 (en) | 2005-03-31 | 2014-02-11 | Google Inc. | User interface for facts query engine with snippets from information sources that include query terms and answer terms |
US8224802B2 (en) | 2005-03-31 | 2012-07-17 | Google Inc. | User interface for facts query engine with snippets from information sources that include query terms and answer terms |
US7953720B1 (en) | 2005-03-31 | 2011-05-31 | Google Inc. | Selecting the best answer to a fact query from among a set of potential answers |
US9530229B2 (en) | 2006-01-27 | 2016-12-27 | Google Inc. | Data object visualization using graphs |
US7925676B2 (en) | 2006-01-27 | 2011-04-12 | Google Inc. | Data object visualization using maps |
US8954426B2 (en) | 2006-02-17 | 2015-02-10 | Google Inc. | Query language |
US8055674B2 (en) | 2006-02-17 | 2011-11-08 | Google Inc. | Annotation framework |
US9892132B2 (en) | 2007-03-14 | 2018-02-13 | Google Llc | Determining geographic locations for place names in a fact repository |
WO2011001002A1 (en) * | 2009-06-30 | 2011-01-06 | Nokia Corporation | A method, devices and a service for searching |
US10031927B2 (en) | 2009-08-07 | 2018-07-24 | Google Llc | Facial recognition with social network aiding |
US9208177B2 (en) | 2009-08-07 | 2015-12-08 | Google Inc. | Facial recognition with social network aiding |
US10515114B2 (en) | 2009-08-07 | 2019-12-24 | Google Llc | Facial recognition with social network aiding |
US10534808B2 (en) | 2009-08-07 | 2020-01-14 | Google Llc | Architecture for responding to visual query |
US9087059B2 (en) | 2009-08-07 | 2015-07-21 | Google Inc. | User interface for presenting search results for multiple regions of a visual query |
US8670597B2 (en) | 2009-08-07 | 2014-03-11 | Google Inc. | Facial recognition with social network aiding |
US9135277B2 (en) | 2009-08-07 | 2015-09-15 | Google Inc. | Architecture for responding to a visual query |
WO2011017557A1 (en) * | 2009-08-07 | 2011-02-10 | Google Inc. | Architecture for responding to a visual query |
US9183224B2 (en) | 2009-12-02 | 2015-11-10 | Google Inc. | Identifying matching canonical documents in response to a visual query |
US9405772B2 (en) | 2009-12-02 | 2016-08-02 | Google Inc. | Actionable search results for street view visual queries |
US9087235B2 (en) | 2009-12-02 | 2015-07-21 | Google Inc. | Identifying matching canonical documents consistent with visual query structural information |
US8977639B2 (en) | 2009-12-02 | 2015-03-10 | Google Inc. | Actionable search results for visual queries |
US8811742B2 (en) | 2009-12-02 | 2014-08-19 | Google Inc. | Identifying matching canonical documents consistent with visual query structural information |
US8805079B2 (en) | 2009-12-02 | 2014-08-12 | Google Inc. | Identifying matching canonical documents in response to a visual query and in accordance with geographic information |
US9852156B2 (en) | 2009-12-03 | 2017-12-26 | Google Inc. | Hybrid use of location sensor data and visual query to return local listings for visual query |
US10346463B2 (en) | 2009-12-03 | 2019-07-09 | Google Llc | Hybrid use of location sensor data and visual query to return local listings for visual query |
US9372920B2 (en) | 2012-08-08 | 2016-06-21 | Google Inc. | Identifying textual terms in response to a visual query |
US8935246B2 (en) | 2012-08-08 | 2015-01-13 | Google Inc. | Identifying textual terms in response to a visual query |
WO2019095221A1 (en) * | 2017-11-16 | 2019-05-23 | 深圳前海达闼云端智能科技有限公司 | Method for searching for person, apparatus, terminal and cloud server |
WO2019240434A1 (en) * | 2018-06-15 | 2019-12-19 | Samsung Electronics Co., Ltd. | Electronic device and method of controlling thereof |
US11561760B2 (en) | 2018-06-15 | 2023-01-24 | Samsung Electronics Co., Ltd. | Electronic device and method of controlling thereof |
Also Published As
Publication number | Publication date |
---|---|
KR20080074266A (en) | 2008-08-13 |
KR100865973B1 (en) | 2008-10-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2008097051A1 (en) | Method for searching specific person included in digital data, and method and apparatus for producing copyright report for the specific person | |
CN1774717B (en) | Method and apparatus for summarizing a music video using content analysis | |
KR100915847B1 (en) | Streaming video bookmarks | |
US7949207B2 (en) | Video structuring device and method | |
JP5029030B2 (en) | Information grant program, information grant device, and information grant method | |
Huang et al. | Automated generation of news content hierarchy by integrating audio, video, and text information | |
US20030123712A1 (en) | Method and system for name-face/voice-role association | |
US20140245463A1 (en) | System and method for accessing multimedia content | |
US20050228665A1 (en) | Metadata preparing device, preparing method therefor and retrieving device | |
US20030131362A1 (en) | Method and apparatus for multimodal story segmentation for linking multimedia content | |
JP5218766B2 (en) | Rights information extraction device, rights information extraction method and program | |
JP2004526373A (en) | Parental control system for video programs based on multimedia content information | |
US8453179B2 (en) | Linking real time media context to related applications and services | |
WO2007004110A2 (en) | System and method for the alignment of intrinsic and extrinsic audio-visual information | |
CN101137986A (en) | Summarization of audio and/or visual data | |
JP2005512233A (en) | System and method for retrieving information about a person in a video program | |
RU2413990C2 (en) | Method and apparatus for detecting content item boundaries | |
JP2009027428A (en) | Recording/reproduction system and recording/reproduction method | |
JP4192703B2 (en) | Content processing apparatus, content processing method, and program | |
JP2004520756A (en) | Method for segmenting and indexing TV programs using multimedia cues | |
US7349477B2 (en) | Audio-assisted video segmentation and summarization | |
JP2002354391A (en) | Method for recording program signal, and method for transmitting record program control signal | |
JP2004514350A (en) | Program summarization and indexing | |
JP2007060606A (en) | Computer program comprised of automatic video structure extraction/provision scheme | |
EP2811416A1 (en) | An identification method |
Legal Events
Date | Code | Title | Description
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 08712408; Country of ref document: EP; Kind code of ref document: A1
| NENP | Non-entry into the national phase | Ref country code: DE
| 122 | Ep: pct application non-entry in european phase | Ref document number: 08712408; Country of ref document: EP; Kind code of ref document: A1