WO2000045307A1 - Multimedia archive description scheme - Google Patents

Multimedia archive description scheme

Info

Publication number
WO2000045307A1
WO2000045307A1 PCT/US2000/002488 US0002488W WO0045307A1 WO 2000045307 A1 WO2000045307 A1 WO 2000045307A1 US 0002488 W US0002488 W US 0002488W WO 0045307 A1 WO0045307 A1 WO 0045307A1
Authority
WO
WIPO (PCT)
Prior art keywords
attributes
relationships
archive
cluster
multimedia
Prior art date
Application number
PCT/US2000/002488
Other languages
French (fr)
Other versions
WO2000045307A9 (en)
Inventor
Ana B. Benitez
Alejandro Jaimes
Seungyup Paek
Shih-Fu Chang
Chung-Sheng Li
John R. Smith
Original Assignee
The Trustees Of Columbia University In The City Of New York
IBM
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by The Trustees Of Columbia University In The City Of New York, IBM
Priority to EP00915716A (EP1151398A4)
Priority to MXPA01007725A (MXPA01007725A)
Priority to JP2000596495A (JP2002537591A)
Priority to US09/889,859 (US6941325B1)
Priority to AU36943/00A (AU3694300A)
Publication of WO2000045307A1
Publication of WO2000045307A9
Priority to HK03100981.8A (HK1048866A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/45 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/51 Indexing; Data structures therefor; Storage structures

Definitions

  • the present invention relates generally to multimedia content description and more particularly relates to a description scheme for a collection of multimedia records, such as a multimedia archive.
  • As multimedia content becomes increasingly pervasive, many applications will benefit from an interoperable archive description standard.
  • One such application is the exchange of multimedia documents among heterogeneous audiovisual databases.
  • multimedia records can include images, image segments, videos, video segments, audio content, documents, and the like.
  • the existence of an archive and audio-visual document description standard would allow the purchasing company to take advantage not only of previously extracted features and annotations for each multimedia document, but also of the previous indexing of the whole multimedia collection. The media company could therefore minimize the cost of integrating the purchased content with existing content.
  • Metasearch engines, which are gateways linking users to multiple and distributed search engines, would also benefit from a multimedia archive description scheme.
  • the operation of current metasearch engines is significantly restrained by the interface limitations of current search engines (e.g. query by example or by sketch and results are a flat list of documents).
  • Archive descriptions will provide significant advantages for the metasearcher in its interaction with multiple search engines. Such queries can involve large collections of multimedia documents.
  • Queries by archive description will allow efficient matching of a multimedia collection in selected feature spaces without the need of exchanging the description of each multimedia document.
  • the interests addressed by the archive content are also more likely to become evident when viewing a description of a collection of multimedia documents rather than an individual multimedia document.
  • Multimedia standards such as MPEG-7 include description standards for individual multimedia documents, but do not extend these standards to the description of collections or archives of multimedia documents. If archive description standards are adopted and search engines make them available to the metasearcher, a more efficient search solution of a multimedia archive can be obtained.
  • a system for generating a multimedia archive description has a digital storage subsystem for storing multimedia records and descriptions of the records in accordance with a media description scheme.
  • a computer processor is operatively coupled to the digital storage subsystem.
  • the computer processor can access the record descriptions in the digital storage subsystem and generate an archive description record.
  • the archive description record has at least one cluster, which is a data structure relating at least two records, or at least two portions of a single record in the digital storage subsystem.
  • the clusters are formed based on attributes of the record descriptions indicative of a similarity measure in the corresponding records.
  • Cluster attributes are generated from the record descriptions and can include feature space attributes, semantic attributes, media attributes and meta attributes.
  • the computer processor also generates, as part of the archive description record, a multimedia archive index, or collection structure, for the archive.
  • the system also includes archive description record storage which is operatively coupled to the computer processor for storing the archive description record.
  • the archive description storage can be a separate data storage device or a section of computer readable media associated with the digital storage subsystem.
  • the clusters in the archive description record can further relate records in accordance with at least one cluster relationship.
  • the cluster relationships can include feature space relationships, semantic relationships, media relationships and meta relationships.
  • Cluster relationships can define relationships between a cluster and a record, between two or more records, or between two or more clusters.
  • a method in accordance with the present invention generates a description of the content of a multimedia collection, or archive, which has one or more media records and record descriptions associated with the records.
  • the method includes evaluating the record descriptions to determine measures of similarity in at least two records in the archive based on the properties of the records and grouping elements based on these properties. The groupings can be performed automatically, semi-automatically or even manually.
  • a cluster description record can then be generated reflecting the grouping of elements, in accordance with cluster attributes and cluster relationships.
  • a multimedia archive index, or collection structure description scheme, is generated to relate the multimedia archive description to the records in the archive.
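  • The evaluation and grouping steps of the method can be sketched as follows. This is only a minimal illustration, assuming each record description has already been reduced to a numeric feature vector; the function name `group_records`, the greedy threshold rule and the sample vectors are hypothetical and are not taken from the specification, which leaves the similarity measure open.

```python
import math


def group_records(descriptions, threshold):
    """Greedy grouping: a record joins the first cluster whose seed vector
    lies within `threshold` (Euclidean distance); otherwise it starts a new
    cluster. The rule is an assumption chosen for brevity."""
    clusters = []  # each cluster: {"seed": vector, "members": [record ids]}
    for record_id, vector in descriptions.items():
        for cluster in clusters:
            if math.dist(cluster["seed"], vector) <= threshold:
                cluster["members"].append(record_id)
                break
        else:
            clusters.append({"seed": vector, "members": [record_id]})
    return clusters


# Hypothetical record descriptions: record id -> feature vector.
descriptions = {
    "img1": (0.10, 0.20),
    "img2": (0.12, 0.22),
    "vid1": (0.90, 0.80),
    "img3": (0.88, 0.79),
}
clusters = group_records(descriptions, threshold=0.1)
for cluster in clusters:
    print(cluster["members"])
```

  • A semi-automatic variant could let a user adjust `threshold` or move records between the resulting groups before the cluster description record is written.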
  • an archive description file for describing the content of a multimedia archive, having records and record descriptions associated with the records.
  • the principal description element in the archive description file is a cluster.
  • the cluster includes at least one cluster attribute which relates the records in accordance with a similarity measure found in the record descriptions.
  • the clusters can be further defined with the inclusion of at least one cluster relationship.
  • the cluster attributes can include descriptors related to feature space attributes, semantic attributes, media attributes and meta attributes.
  • the cluster relationships can include feature space relationships, semantic relationships, media relationships and meta relationships. Cluster relationships can define relationships between records, between records and clusters or between two or more clusters.
  • the cluster attributes can be indexed in accordance with an information-based hierarchy wherein feature space attributes are indexed above semantic attributes in the information-based hierarchy.
  • the feature space attributes can be selected from the group including type/technique attributes, global distribution attributes, local structure attributes and global composition attributes.
  • the semantic attributes can be selected from the group consisting of generic object attributes, generic scene attributes, specific object attributes, specific scene attributes, abstract object attributes and abstract scene attributes.
  • the information-based hierarchy is a ten level index structure having a plurality of feature space levels and a plurality of semantic attribute levels, wherein the feature space attribute levels include a type/technique attribute level, a global distribution attribute level, a local structure attribute level and a global composition attribute level and wherein the semantic attribute levels include a generic object attribute level, a generic scene attribute level, a specific object attribute level, a specific scene attribute level, an abstract object attribute level and an abstract scene attribute level.
  • the cluster relationships can be selected from the group including feature space relationships, semantic relationships, media relationships and meta relationships.
  • the feature space relationships can be further selected from the group including spatial relationships, temporal relationships, and visual relationships.
  • the semantic relationships can be further selected from the group including lexical relationships and predictive relationships.
  • the systems, methods and archive description scheme described above can be used in a media archive system in accordance with the present invention.
  • the media archive system includes a computer readable storage system for storing a plurality of media records. At least one media description engine is also provided for accessing media records stored in the computer readable storage system and for generating a media description record corresponding thereto.
  • a cluster processor is operatively coupled to the computer readable storage system. The cluster processor accesses the media description records and generates an archive description record which includes at least one cluster relating at least two records stored in the storage system.
  • a query processor can be provided and operatively coupled to the cluster processor to receive archive search parameters from a user and provide an archive search query to the cluster processor.
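  • One way a query could be matched against an archive description, rather than against every record description, is to compare the query with a per-cluster summary. The sketch below assumes each cluster exposes a centroid in some feature space; `rank_clusters`, the cluster names and the coordinates are hypothetical illustrations, not part of the described system.

```python
import math


def rank_clusters(query_vector, centroids):
    """Return cluster ids ordered from nearest to farthest centroid."""
    return sorted(centroids, key=lambda cid: math.dist(query_vector, centroids[cid]))


# Hypothetical archive description: cluster id -> centroid in a 2-D feature space.
centroids = {
    "sunsets": (0.9, 0.4),
    "forests": (0.2, 0.7),
    "portraits": (0.5, 0.5),
}
ranking = rank_clusters((0.85, 0.45), centroids)
print(ranking)
```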
  • the cluster processor can generate the archive description record automatically or semi-automatically with varying degrees of user input.
  • the archive description record can also be generated or modified by manually creating clusters based on cluster attributes and cluster relationships.
  • FIG. 1 is a simplified block diagram of a multimedia archive description scheme in accordance with the present invention.
  • Figure 2A is a Venn diagram illustrating exemplary multimedia archive content and potential clusters, in a feature space;
  • Figure 2B is a hierarchical diagram illustrating an exemplary cluster decomposition relationship for the clusters identified in Figure 2A;
  • Figure 3A is a Venn diagram illustrating exemplary multimedia archive content and potential clusters, in a subject space;
  • Figure 3B is a hierarchical diagram illustrating an exemplary cluster decomposition relationship for the clusters identified in Figure 3A;
  • Figure 4 is a block diagram illustrating a first embodiment of the present multimedia archive description scheme architecture;
  • Figure 5 is a block diagram illustrating an exemplary cluster descriptor arrangement, for use in connection with the description scheme architecture of Figure 4;
  • Figure 6 is a block diagram illustrating an exemplary cluster relationship arrangement, for use in connection with the description scheme architecture of Figure 4;
  • Figure 7 is a table illustrating an index structure for semantic relationships for an exemplary multimedia archive description scheme;
  • Figure 8 is a table illustrating an index structure for syntactic relationships for an exemplary multimedia archive description scheme;
  • Figure 9 is a pictorial representation of a cluster attribute index structure and cluster relationship index structure for an exemplary multimedia archive description scheme;
  • Figure 10 is a block diagram illustrating an exemplary entity-relationship model for a multimedia archive description scheme;
  • Figure 11 is a block diagram illustrating an alternate embodiment of the present multimedia archive description scheme architecture.
  • Figure 12 is a block diagram of a multimedia archive system employing a multimedia archive description scheme in accordance with the present invention.
  • multimedia document and multimedia record are synonymous and refer generically to any item of multimedia content, such as an image, an image object (e.g., a portion of an image), a video segment, an Internet web page including text and graphics (e.g., HTML and XML based multimedia content), and the like.
  • the present multimedia archive description schemes are generally applicable to systems and methods for describing collections of multimedia records where the individual records in the collection can be, or have been, described by a media description scheme for individual records, referred to herein generally as single record multimedia description schemes.
  • Exemplary single record multimedia description schemes for various multimedia content, such as generic audio-video content, images and video are described in co-pending International Applications: PCT/US99/26125 entitled “Systems and Methods for Interoperable Multimedia Content Descriptions;” PCT/US99/26127 entitled “Image Description System and Method;” and PCT/US99/26126 entitled “Video Description System and Method;” respectively, which are incorporated herein by reference.
  • multimedia records can be described as a set of elements, such as syntactic, semantic, meta and media elements.
  • an image can be described as a group of objects, with features and relationships associated with those objects.
  • This description philosophy is extensible to a collection of multimedia documents, which can also be viewed as a set of elements that are further described by attributes of the elements and relationships to other elements.
  • the simplified block diagram of Figure 1 illustrates the concept of the present multimedia archive description scheme.
  • the primary unit of description in the present multimedia archive description scheme is a cluster 100.
  • a cluster 100 represents a grouping of records in the archive based on the properties of the records which are set forth in descriptions of the records.
  • a cluster 100 includes at least one cluster attribute 105 which describes a property of the cluster.
  • Clusters can also include cluster relationships 110.
  • Cluster relationships can be simple, one to one relationships such as between clusters, records, or record segments, or can be complex, m to n relationships between multiple clusters, records, or record segments.
  • the cluster relationships 110 can be further defined by cluster relationship attributes 115.
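  • The structure of Figure 1 (a cluster with at least one attribute, optional relationships, and optional relationship attributes) might be modelled as below. The class and field names are hypothetical and the attribute keys are illustrative; only the reference numerals in the comments come from the description.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterRelationship:
    kind: str      # e.g. "decomposition"
    targets: list  # ids of related clusters, records or record segments
    attributes: dict = field(default_factory=dict)  # relationship attributes 115


@dataclass
class Cluster:
    cluster_id: str
    attributes: dict  # cluster attributes 105
    relationships: list = field(default_factory=list)  # cluster relationships 110

    def __post_init__(self):
        # The description requires at least one cluster attribute per cluster.
        if not self.attributes:
            raise ValueError("a cluster needs at least one cluster attribute")


art = Cluster("art", {"semantic.subject": "Art"})
art.relationships.append(
    ClusterRelationship("decomposition", ["expressionist", "impressionist", "modern"])
)
print(art)
```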
  • a cluster 100 can be characterized in numerous ways, such as by statistics and distributions of its elements in different feature spaces, semantics associated with the cluster, media information, and meta information.
  • Feature-space and semantic attributes can be used to describe the multimedia content in the cluster.
  • Media and meta attributes generally refer to information that is closely related to the cluster, but not explicitly given by its content. Examples of this type of cluster attribute can include the identification of the cluster in a database or the method used to create a cluster.
  • a cluster 100 can be created based on a vast number of combinations of attributes and/or relationships. Clusters can also be created based on human perception, which may not correspond to any combination of the attributes and relationships describing these elements. Clusters 100 can be grouped to form other clusters, so composition and/or decomposition are possible relationships among clusters.
  • More complex relationships involve relating the clusters based on their attributes 105, e.g., "color more random than." Relationships 110 among clusters can be defined for each cluster attribute type, e.g., syntactic feature space, semantic features, media, and meta. Probabilities can also be assigned to relationships as a confidence factor of a stated association (e.g., "Object 1 is contained in Cluster A with probability 0.9"). Relationships can also relate clusters to elements of multimedia documents (e.g., "Cluster A is composed of Object 1 and Object 2").
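  • A relationship that carries a probability as a confidence factor could be represented as a small record, as in this sketch. The class `WeightedRelationship`, its field names and the second sample relationship (probability 0.4) are hypothetical; the first sample mirrors the "Object 1 is contained in Cluster A with probability 0.9" example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WeightedRelationship:
    subject: str
    predicate: str
    obj: str
    probability: float = 1.0  # confidence factor of the stated association

    def __post_init__(self):
        if not 0.0 <= self.probability <= 1.0:
            raise ValueError("probability must lie in [0, 1]")


relations = [
    WeightedRelationship("Object 1", "is contained in", "Cluster A", 0.9),
    WeightedRelationship("Object 2", "is contained in", "Cluster A", 0.4),  # made-up value
]
# Keep only associations stated with at least 50% confidence (arbitrary cut-off).
confident = [r.subject for r in relations if r.probability >= 0.5]
print(confident)
```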
  • Figure 2A graphically represents elements of an archive and possible clusters which are defined in a generic feature space, designated X.
  • the elements 220 of this archive, which are illustrated as stars distributed in the feature space, can be still regions (e.g., image objects), moving regions (e.g., video objects), and video segments, among others.
  • Clusters 0, 1, 2, 2.1, 2.2, 2.3 and 3 are described in terms of attributes color 205, shape 210 and file size 215.
  • Figure 2B illustrates a simple decomposition relationship among the clusters in Figure 2A. In this case, cluster 0 includes clusters 1, 2, and 3 and cluster 2 includes clusters 2.1, 2.2 and 2.3.
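  • The decomposition relationship of Figure 2B can be pictured as a parent-to-children mapping, with the sub-clusters of cluster 2 labelled 2.1, 2.2 and 2.3. The dictionary layout and the helper `descendants` are hypothetical; they only illustrate how such a hierarchy could be traversed.

```python
# Parent cluster id -> directly contained cluster ids (decomposition relationship).
decomposition = {
    "0": ["1", "2", "3"],
    "2": ["2.1", "2.2", "2.3"],
}


def descendants(cluster_id, tree):
    """Return every cluster contained in `cluster_id`, depth first."""
    result = []
    for child in tree.get(cluster_id, []):
        result.append(child)
        result.extend(descendants(child, tree))
    return result


print(descendants("0", decomposition))
```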
  • Figures 3A and 3B also represent exemplary archive content defined as clusters, and a cluster decomposition relationship, respectively.
  • the elements of this archive can be objects and events which are related by the exemplary clusters: Subject 300, History 305, Art 310 and Science 320.
  • the clusters Expressionist 325, Impressionist 330, and Modern 335 are defined.
  • Figure 3B illustrates a simple cluster decomposition relationship between the clusters of Figure 3A.
  • Figure 4 is a block diagram illustrating an embodiment of the present multimedia archive description scheme. This embodiment is one which extends the definition of a single record multimedia description scheme to represent multimedia archives having one or more records.
  • the various blocks represent fields of information in the multimedia archive description scheme.
  • the various blocks in the description scheme are linked by composition relationships, which are illustrated by diamond shaped arrows, or by inheritance relationships, which are illustrated by triangular shaped arrows.
  • Clusters 100 are defined by at least one attribute 105 and may include one or more relationships 110. Referring to Figure 4, these parameters are stored in a cluster description scheme block 405 which includes cluster relations 415 and cluster descriptors 410 by way of composition relationships.
  • the cluster descriptor block 410 includes the attributes of the cluster.
  • the cluster description scheme block 405 also includes a cluster node block 400.
  • the cluster node block 400 is an element used for grouping a set of elements in a relationship.
  • the cluster description scheme block 405 is also linked to a collection structure description scheme block 420, which is an index bridging the single document multimedia archive description scheme, such as a description scheme for generic Audio-Visual content (Generic AV DS) 425, to the cluster description scheme block 405 and cluster relations block 415.
  • the cluster node 400 includes references to various components which further describe the content of the individual records in the archive.
  • the cluster node 400 can include, via a composition link, references to segments 430 (such as video), references to events 435 and references to objects 440 (such as image objects) which are defined in an existing multimedia description scheme.
  • the cluster node 400 can also include references to other clusters 445 and references to the multimedia description scheme 450.
  • the references operate as pointers which convey the location of a source of information.
  • the cluster descriptor block 410 is further defined in the relational block diagram of Figure 5.
  • the cluster descriptor block 410 can include feature space descriptors 500, semantic descriptors 505, media descriptors 510 and meta descriptors 515, which are used in connection with single document media description schemes.
  • the feature space descriptors are a set of properties which describe the feature space attributes of records in a description scheme. Such attributes are generally syntactic and refer to the way the content of records are arranged without considering the meaning conveyed by such arrangements.
  • Feature space attributes generally describe the cluster by its appearance in a given feature space. Feature space attributes can also describe statistical attributes (e.g., size and higher order moments) of distribution of records in a cluster.
  • Feature space attributes can include information such as feature space point 520, feature space orientation 522, feature space bounding box parameters 524, feature space contour definitions 526 and feature space quantization 528.
  • the feature space descriptor 500 can also inherit the properties of the feature space 530 and feature space distribution 532.
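  • Several of the feature space attributes named above (a representative point, a bounding box, and statistical attributes such as size and a second-order moment) can be computed directly from the positions of a cluster's records. The sketch assumes records are points in a numeric feature space; `describe_cluster` and the dictionary keys are hypothetical names.

```python
from statistics import fmean, pvariance


def describe_cluster(points):
    """Summarise a cluster of equally sized feature vectors."""
    dims = list(zip(*points))  # one tuple of values per feature dimension
    return {
        "size": len(points),                                     # number of records
        "centroid": tuple(fmean(d) for d in dims),               # feature space point
        "bounding_box": tuple((min(d), max(d)) for d in dims),   # per-dimension extent
        "variance": tuple(pvariance(d) for d in dims),           # second central moment
    }


summary = describe_cluster([(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)])
print(summary)
```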
  • the semantic descriptor block 505 can inherit free form annotations 534 as well as conventional 6-w parameters 536, which are also used in connection with the single record description schemes for multimedia records.
  • Semantic attributes generally refer to the meaning conveyed by the arrangement of records.
  • the 6-w's include who 538, where 540, what object 542, what action 544, why 546 and when 548.
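  • The six "w" parameters lend themselves to a simple record with optional fields. The class name `SixW`, the field names and the sample values are hypothetical; the numerals in the comments are those used in the description.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class SixW:
    who: Optional[str] = None          # 538
    where: Optional[str] = None        # 540
    what_object: Optional[str] = None  # 542
    what_action: Optional[str] = None  # 544
    why: Optional[str] = None          # 546
    when: Optional[str] = None         # 548

    def filled(self):
        """Return only the parameters that have been annotated."""
        return {k: v for k, v in asdict(self).items() if v is not None}


# Made-up annotation for a cluster of paintings.
annotation = SixW(who="painter", what_object="landscape", where="studio")
print(annotation.filled())
```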
  • the media descriptor block 510 includes information describing the media attributes of a cluster.
  • the media descriptor block 510 may inherit format information 550, storage requirements 552, file identification parameters 556, and file location information 558 of the clusters.
  • the meta descriptor block 515 includes author-generated information which is input by an author of the document or the creator of the cluster.
  • the meta attributes of a cluster can include information related to the creation of the cluster, such as the method, constraints or rules followed to create the cluster based on the elements' attributes.
  • the meta descriptor block 515 can inherit information such as representative icons 560, intellectual property rights attribution 562 and creation information 564, such as method of creation 566, date/time of creation 568, and organization 570.
  • Figure 6 further illustrates the construction and content of the cluster relationship block 415.
  • Cluster relationships 415 can be broadly classified as feature space relationships 605, semantic relationships 610, media relationships 670 and meta relationships 680.
  • Semantic relationships 610, which are used to relate semantic interpretations of clusters, further include lexical relationships 615, action relationships 620 and state relationships 625, all of which can be inherited from the description of the individual records in the archive collection.
  • the action relationships 620 and state relationships 625 are examples of predicative relationships.
  • the relationship "Impressionism is a part of cluster Art" is a semantic relationship between the clusters "impressionism" and "art".
  • the lexical relationships 615 correspond to the semantic relationships among nouns, such as those described in the article "WordNet: A Lexical Database for English", by G.A. Miller, Communications of the ACM, Vol. 38, No. 11, pp. 39-41, November 1995.
  • These relationships can include synonymy (e.g., "pipe is similar to tube"), antonymy (e.g., "happiness is opposite to sadness"), hyponymy/hypernymy (e.g., "a dog is an animal" and "an animal is a generalization of a dog") and meronymy/holonymy (e.g., "a musician is a member of a musical band" and "a musical band is composed of musicians").
  • Predicative semantic relationships can include action relationships 620 (e.g., "to throw" and "to hit") and state relationships 625 (e.g., "to own" and "to control") among two or more clusters. These relationships are further set forth in the table of Figure 7, which provides an indexing structure for semantic relationships.
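  • The paired lexical relationships (hyponymy/hypernymy, meronymy/holonymy) are inverses of one another, while synonymy and antonymy are symmetric. A sketch of that pairing follows; the relation labels and the triple layout are hypothetical conventions, not terms defined in the specification.

```python
# Each lexical relation mapped to the relation that holds in the other direction.
INVERSE = {
    "hyponym_of": "hypernym_of",
    "hypernym_of": "hyponym_of",
    "meronym_of": "holonym_of",
    "holonym_of": "meronym_of",
    "synonym_of": "synonym_of",   # symmetric
    "antonym_of": "antonym_of",   # symmetric
}


def invert(triple):
    """Turn (subject, relation, object) into the equivalent reversed statement."""
    subject, relation, obj = triple
    return (obj, INVERSE[relation], subject)


print(invert(("dog", "hyponym_of", "animal")))
print(invert(("musician", "meronym_of", "musical band")))
```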
  • Feature space relationships 605 can include relationships such as cluster union 630, cluster intersection 635, cluster decomposition 640, Rtheta relationships 645 and cluster elements 650. These relationships are syntactic in nature.
  • the Rtheta relationships generally include orientation information 655 and feature space distance information 660. This listing of feature space relationships is a non-exhaustive, representative list.
  • the actual set of feature space relationships for a given archive description scheme instantiation can include subsets of these relationships and can also include other relationships not shown in Figure 6, yet are relevant to a particular feature space.
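  • Some of the feature space relationships can be illustrated with ordinary set and vector operations. The sketch assumes clusters are sets of record identifiers and that the Rtheta relationship is read as a distance and an orientation angle between two cluster centroids in a two-dimensional feature space; that reading, and all function names, are assumptions made for illustration.

```python
import math


def cluster_union(a, b):
    return a | b


def cluster_intersection(a, b):
    return a & b


def r_theta(centroid_a, centroid_b):
    """Distance and orientation (degrees) of centroid_b relative to centroid_a."""
    dx = centroid_b[0] - centroid_a[0]
    dy = centroid_b[1] - centroid_a[1]
    return math.hypot(dx, dy), math.degrees(math.atan2(dy, dx))


cluster_a = {"rec1", "rec2", "rec3"}
cluster_b = {"rec3", "rec4"}
shared = cluster_intersection(cluster_a, cluster_b)
combined = cluster_union(cluster_a, cluster_b)
distance, angle = r_theta((0.0, 0.0), (3.0, 4.0))
print(shared, sorted(combined), distance, round(angle, 2))
```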
  • the table of Figure 8 provides an exemplary indexing structure for syntactic relationships for the multimedia archive description scheme.
  • FIG. 9 is a pictorial diagram which provides a ten-level index structure for cluster attributes and a correspondence of this index structure to cluster relationships for an exemplary archive of records described by a generic audio-visual description scheme.
  • records are generally images, image segments or video segments, and are described in terms of objects, regions and events.
  • the cluster attribute index structure 910 is visually represented by a ten level pyramid representation.
  • Each level in the index structure 910 represents attributes which require more information to define them than the layers above.
  • cluster attributes can be syntactic, semantic, media and meta type attributes.
  • clusters are only characterized by syntactic and semantic attributes.
  • syntactic attributes such as type/technique 912, global distribution 914, local structure 916 and global composition 918 make up the first four layers of the index structure 910.
  • the lower six layers of the index structure describe semantic cluster attributes, such as generic objects 922, generic scene 924, specific objects 926, specific scene 928, abstract object 930 and abstract scene 932.
  • the dividing line 920 between layers 918 and 922 graphically illustrates the transition from syntactic attributes to semantic attributes.
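  • The ten-level index structure and its syntactic/semantic dividing line can be written down as an ordered enumeration. The level names follow the description; the numbering 1 to 10 and the names `AttributeLevel` and `is_semantic` are hypothetical choices.

```python
from enum import IntEnum


class AttributeLevel(IntEnum):
    # Syntactic (feature space) levels, above the dividing line 920.
    TYPE_TECHNIQUE = 1
    GLOBAL_DISTRIBUTION = 2
    LOCAL_STRUCTURE = 3
    GLOBAL_COMPOSITION = 4
    # Semantic levels, below the dividing line 920.
    GENERIC_OBJECT = 5
    GENERIC_SCENE = 6
    SPECIFIC_OBJECT = 7
    SPECIFIC_SCENE = 8
    ABSTRACT_OBJECT = 9
    ABSTRACT_SCENE = 10

    @property
    def is_semantic(self):
        return self >= AttributeLevel.GENERIC_OBJECT


for level in AttributeLevel:
    print(level.value, level.name, "semantic" if level.is_semantic else "syntactic")
```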
  • the type/technique level 912 provides general information about the visual characteristics of a cluster, which can include descriptions of the features used to create the cluster (e.g., color, texture, etc.), the types of elements in the cluster groups (e.g., objects, animated regions, etc.) and the like.
  • the global distribution level 914 classifies clusters based on attributes of global content, which are generally measured in terms of low-level perceptual features of the records.
  • Global distribution features can include global color characteristics (e.g., dominant color, average color, histogram, etc.), global texture (e.g., coarseness, directionality, contrast), global shape (e.g., aspect ratio), global motion parameters (e.g., speed, acceleration, direction), global deformation (e.g., expanding speed), temporal/spatial dimensions, feature space dimensions, and the like.
  • the global distribution level 914 can also include statistical attributes of the cluster, such as size (the number of records in a cluster), and higher order moments of distribution of records in a cluster.
  • the local structure level 916 is for attributes related to the extraction and characterization of local components of the cluster.
  • Local components generally exhibit elements (e.g., regions, objects) with a homogeneous distribution in the given feature space.
  • Local structure attributes include distribution masks, centroids, first and second moments, local distribution functions and the like.
  • the global composition level 918 is the last syntactic, or feature space, attribute level of the index structure 910.
  • global composition refers to the arrangement or spatial layout of the clusters in the feature space.
  • attributes which relate to the specific arrangement or composition of the elements set forth in the local structure level 916. This can include concepts such as the number of sub-clusters, boundaries of the cluster, symmetry and the like.
  • the next level down in the index structure 910 is the generic objects level 922, which is the first semantic attribute layer in the hierarchy.
  • Generic objects are those which are described at a fundamental level using only commonly available knowledge. Generic objects can include such things as "person” and "sky”. Such objects are defined in terms of generic object attributes in the generic object level 922.
  • the generic scenes level 924 indexes clusters on both the generic objects and their arrangement. Generic scene classes can include city, landscape, indoor, outdoor, still life, portrait and the like. As with generic objects, only generally available knowledge is required to classify records as generic scenes. Unlike generic objects, specific objects refer to those objects which are identified and grouped using specific information.
  • Abstract objects are defined in terms of very specialized knowledge which is highly subjective in nature. Abstract objects can include emotions, such as "anger" or "happy," as well as concepts such as "hardworking," "decisive" and the like.
  • attributes which refer to what the cluster as a whole represents are referenced. Thus the semantic object "New York City” could be described by abstract-scene attributes such as “fun,” “hip,” “chaotic” and the like. As such attributes require the most specialized knowledge of what the cluster is and represents, the abstract scene level 932 forms the base of the index hierarchy.
  • Cluster relationships can be defined at the different levels of the attribute index structure 910.
  • Syntactic relationships can be defined at the syntactic and semantic levels. This is represented by the syntactic relationship table 940 extending above and below the dividing line 920 in the index structure 910.
  • Semantic relationships 950 can only be defined in terms of semantic levels. Thus the table of semantic relationships 950 is shown only extending below the dividing line 920.
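  • The rule that syntactic relationships may be defined at any level while semantic relationships may only be defined at semantic levels can be expressed as a small check. The sketch numbers the ten levels 1 to 10 from the top of the pyramid, which is an assumed convention, and `relationship_allowed` is a hypothetical helper.

```python
FIRST_SEMANTIC_LEVEL = 5  # levels 1-4 are syntactic, levels 5-10 are semantic


def relationship_allowed(kind, level):
    """Return True if a relationship of `kind` may be defined at `level`."""
    if not 1 <= level <= 10:
        raise ValueError("level must be between 1 and 10")
    if kind == "syntactic":
        return True  # allowed above and below the dividing line
    if kind == "semantic":
        return level >= FIRST_SEMANTIC_LEVEL  # only below the dividing line
    raise ValueError("kind must be 'syntactic' or 'semantic'")


print(relationship_allowed("syntactic", 2), relationship_allowed("semantic", 2))
```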
  • relationships can be defined at different levels. Semantic relationships can be defined at the generic, specific and abstract levels. For example, "to own stock" is a generic semantic (action) relationship; "to own 80% of the stock" is a specific semantic (action) relationship; and "to control the company" is an abstract semantic relationship. Syntactic relationships, such as those illustrated in Figure 8, can only be formed on the generic level (e.g., "similar distribution") or the specific level (e.g., "the difference in the variance is x") of the index structure 910. Referring to Figure 8, syntactic relationships include feature spatial relationships such as topological and directional relationships.
  • the topological relationships can further be defined on the generic level (e.g., near to, far from, adjacent to, etc.) and on the specific level (e.g., the union of, the intersection of, distance of centroids, etc.).
  • directional relationships can be defined on the generic level (e.g., to the direction of increasing feature A, to the direction of decreasing feature B) and the specific level (e.g., angle between centroids of clusters in feature space x).
  • Figure 10 is a relational flow diagram that illustrates an entity-relationship model for an archive description scheme for a generic audio-visual description scheme, as discussed in connection with Figure 9.
  • clusters are classified in accordance with their attributes and relationships, as syntactic clusters 1002 and semantic clusters 1004.
  • Clusters can also be defined as media clusters and meta clusters, which are not shown in Figure 10.
  • the syntactic clusters can be derived from regions 1006, animated regions 1008 and segments 1010 which can be specific record types defined in the visual description scheme.
  • the syntactic cluster 1002 can also include the syntactic relationships 1012, as illustrated in Figure 8.
  • the semantic cluster 1004 is derived from objects 1014, animated objects 1016, events 1018 and syntactic clusters 1002, by inheritance relationships.
  • the entity-relationship model also includes syntactic elements 1020 which are derived from the regions 1006, animated regions 1008, segments 1010 as well as visual feature relationships 1022.
  • the syntactic elements 1020 are defined in terms of syntactic attributes 1024.
  • semantic elements 1026 are derived from objects 1014, animated objects 1016, events 1018 and semantic relationships 1028.
  • the semantic elements are defined by semantic attributes 1030.
  • the embodiment of Figure 4 illustrates an extension of a single record multimedia description scheme for use as a multimedia archive description scheme. This embodiment is appropriate if the basic single record description scheme(s) can readily be modified.
  • Figure 11 is a block diagram of an alternate embodiment of a multimedia archive description scheme which is suitable for use in conjunction with a non-modified single record description scheme.
  • the single record description scheme is represented by an Audio-Video description scheme (AV DS) block 1105, which can include a syntactic description scheme 1110, a syntactic/semantic link description scheme 1115 and a semantic description scheme 1120.
  • a multimedia index description scheme 1125 provides an index reference to the single document description scheme 1105 for a multimedia archive description scheme block 1130.
  • the multimedia archive description scheme block 1130 can include a number of multimedia index description scheme blocks 1125, each of which relates to a corresponding single document description scheme 1105.
  • the multimedia archive description scheme block 1130 references standard single document description schemes via multiple multimedia index description scheme blocks 1125, which essentially provide an index of the individual records for the multimedia archive description scheme block 1130.
  • Figure 12 is a block diagram of a multimedia archive system in accordance with the present multimedia archive description scheme, systems and methods.
  • the system includes archive storage 1200 wherein multimedia records 1205 and associated multimedia record descriptions 1210 are stored in computer readable media, such as optical disk storage, magnetic disk storage and the like.
  • the multimedia records 1205 can take the form of digital images, image segments, digitized video segments, Internet web pages, hyper-text based documents (HTML, XML and the like), digital audio files and the like.
  • the record descriptions 1210 correspond to the records and provide descriptions in accordance with a description scheme defined for the particular record type. While the archive storage will generally reside locally in storage on a host computer, the archive storage can be remotely located, such as through a collection of hypertext links, or pointers, which reference remote locations for the records 1205 and descriptions 1210.
  • the multimedia archive system can also include various description engines which characterize individual multimedia records in accordance with an appropriate description scheme.
  • the system of Figure 12 includes a video description engine 1215, an audio description engine 1220 and an image description engine 1225.
  • the description engines can access the records 1205 in the archive storage 1200 and generate the record descriptions 1210 in accordance with appropriate record description schemes for individual records.
  • the multimedia archive system includes a cluster processing subsystem 1230.
  • the cluster processing subsystem accesses the record descriptions 1210 and generates clusters, as described in connection with Figures 1 and 4-6.
  • clusters can be defined manually by a user or semi-automatically by a combination of user input and cluster processing subsystem 1230 operations.
  • the cluster processing subsystem 1230 generates a multimedia archive description 1235, including the cluster definitions and multimedia index or collection structure, in accordance with the description scheme described in connection with Figures 4-7.
  • the multimedia archive description 1235 can reside in the archive storage 1200.
  • the multimedia archive description can be stored in an archive description database 1240, which is also accessible by the cluster processing subsystem 1230.
  • the system of Figure 12 also includes a query processing subsystem 1237.
  • the query processing subsystem 1237 can access the archive description 1235 via the cluster processing subsystem 1230. Alternatively, the query processing subsystem 1237 can directly access the archive description in the archive storage 1200 or archive description database 1240.
  • the query processing subsystem 1237 can receive a user query through applicable input/output (I/O) circuitry 1245, which can include communication ports, search engines, keyboards, digitizers and the like (not shown).
  • the archive system can be accessed by remote client computer systems 1250 via a dedicated physical I/O connection, such as a dedicated terminal, or via a network connection, such as the Internet.
  • Various media-based application tools and software 1255 can also be used in connection with the present archive description scheme and system. Such applications can work in conjunction with the cluster processing subsystem 1230 and I/O circuitry 1245 to provide various functions.
  • the various processing systems and subsystems of Figure 12 can be implemented on a dedicated computer such as a mainframe computer or a single personal computer.
  • the various subsystems can be implemented using a number of computer stations which are interconnected via a network, such as a local area network or the Internet.
  • the present multimedia archive description scheme provides data structures that describe collections of multimedia documents.
  • the data structures of the present multimedia archive description scheme are based on clusters, which are description units that interrelate the records in the collection by one or more similarity measures based on attributes and relationships.
  • the present description schemes can be realized as an extension of an existing single document description scheme or as a data structure which works in concert with unmodified description schemes via multiple multimedia index descriptions.
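As a rough illustration of how clusters can interrelate records by a similarity measure, the following Python sketch groups record descriptions that lie close together in a feature space. The greedy threshold grouping, the function name `group_by_similarity`, the record identifiers and the feature values are all assumptions made for this example; the source does not prescribe any particular clustering algorithm.

```python
import math
from typing import Dict, List, Sequence


def group_by_similarity(
    descriptions: Dict[str, Sequence[float]], threshold: float
) -> List[List[str]]:
    """Greedy grouping: a record joins the first cluster whose seed record
    lies within `threshold` (Euclidean distance) in the feature space."""
    clusters: List[List[str]] = []
    for record_id, features in descriptions.items():
        for members in clusters:
            seed = descriptions[members[0]]  # first record acts as the seed
            if math.dist(seed, features) <= threshold:
                members.append(record_id)
                break
        else:
            # No existing cluster is close enough, so start a new one.
            clusters.append([record_id])
    return clusters


# Hypothetical record descriptions: record id -> point in a 2-D feature space.
records = {
    "img-1": (0.10, 0.20),
    "img-2": (0.12, 0.22),
    "vid-1": (0.90, 0.80),
}
clusters = group_by_similarity(records, threshold=0.1)
```

Under these assumptions the two nearby image records fall into one cluster and the distant video record forms its own. A real cluster processing subsystem could substitute any similarity measure derived from the record descriptions.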

Abstract

A multimedia archive description scheme is provided for characterizing a multimedia archive having records and associated record descriptions. The multimedia archive description scheme provides a data structure which relates records by similarity measures. The principal data structure in the multimedia archive description scheme is a cluster (100). A cluster includes one or more attributes of the records in the archive and can include one or more cluster relationships (110). Cluster attributes (105) can include feature space attributes, semantic attributes, media attributes and meta attributes of the records in the archive. The cluster relationships (110) can relate records to clusters or clusters to clusters. Cluster relationships can include feature space (syntactic) relationships, semantic relationships, media relationships and meta relationships. The multimedia archive description scheme provides an efficient form for describing a collection of records.

Description

MULTIMEDIA ARCHIVE DESCRIPTION SCHEME
Related Applications
This application claims the benefit of United States Provisional Applications, Serial No. 60/118,026 entitled MPEG-7 ARCHIVE DESCRIPTION SCHEME, filed on February 1, 1999, Serial No. 60/142,327 entitled FUNDAMENTAL ENTITY-RELATIONSHIP MODELS FOR A MULTIMEDIA ARCHIVE DESCRIPTION SCHEME, filed on July 3, 1999, Serial No. 60/118,020 entitled PROPOSAL FOR MPEG-7 IMAGE DESCRIPTION SCHEME, filed on February 1, 1999, and Serial No. 60/118,027 entitled PROPOSAL FOR MPEG-7 VIDEO DESCRIPTION SCHEME, filed on February 1, 1999.
Field of the Invention
The present invention relates generally to multimedia content description and more particularly relates to a description scheme for a collection of multimedia records, such as a multimedia archive.
Background of the Invention
As multimedia content becomes increasingly pervasive, many applications will benefit from an interoperable archive description standard. One such application is the exchange of multimedia documents among heterogeneous audiovisual databases. For example, when a media company purchases multimedia information from a TV broadcaster, the purchaser usually acquires a large collection of multimedia records, which can include images, image segments, videos, video segments, audio content, documents, and the like. The existence of an archive and audio-visual document description standard would allow the purchasing company to take advantage not only of previously extracted features and annotations for each multimedia document, but also of the previous indexing of the whole multimedia collection. Therefore, the media company could minimize the cost of integrating the purchased content with existing content.
Media companies also have the problem of evaluating the content of a multimedia archive to determine what set of multimedia material is best suited for their purposes. One way to evaluate content is to browse the multimedia documents one by one and manually choose the most appropriate material. This solution is very time consuming and tedious. An appropriate description scheme for multimedia archives would enable applications to more efficiently browse the content of multimedia collections, even with content from different sources. Metasearch engines, which are gateways linking users to multiple and distributed search engines, would also benefit from a multimedia archive description scheme. The operation of current metasearch engines is significantly restrained by the interface limitations of current search engines (e.g. query by example or by sketch and results are a flat list of documents). Archive descriptions will provide significant advantages for the metasearcher in its interaction with multiple search engines. Such queries can involve large collections of multimedia documents. Queries by archive description will allow efficient matching of a multimedia collection in selected feature spaces without the need of exchanging the description of each multimedia document. The interests addressed by the archive content are also more likely to become evident when viewing a description of a collection of multimedia documents rather than an individual multimedia document.
Currently, multimedia standards, such as MPEG-7, include description standards for individual multimedia documents, but do not extend these standards to the description of collections or archives of multimedia documents. If such standards are adopted and search engines make them available to the metasearcher, a more efficient search solution of a multimedia archive can be obtained.
In view of the foregoing, there remains a need for a multimedia archive description scheme suitable for use with media standards, such as MPEG-7.
Objects and Summary of the Invention
It is an object to provide an archive description scheme with indexing structures and elements to describe collections of multimedia content.
It is a further object to provide an archive description scheme which relates records in an archive in accordance with properties and descriptions of the records.
It is another object to provide an archive description scheme which relates records in accordance with attributes of the records and relationships between or among the records.
In accordance with the present invention, a system for generating a multimedia archive description has a digital storage subsystem for storing multimedia records and descriptions of the records in accordance with a media description scheme. A computer processor is operatively coupled to the digital storage subsystem. The computer processor can access the record descriptions in the digital storage subsystem and generate an archive description record. The archive description record has at least one cluster, which is a data structure relating at least two records, or at least two portions of a single record in the digital storage subsystem. The clusters are formed based on attributes of the record descriptions indicative of a similarity measure in the corresponding records. Cluster attributes are generated from the record descriptions and can include feature space attributes, semantic attributes, media attributes and meta attributes. The computer processor also generates, as part of the archive description record, a multimedia archive index, or collection structure, for the archive.
The system also includes archive description record storage which is operatively coupled to the computer processor for storing the archive description record. The archive description storage can be a separate data storage device or a section of computer readable media associated with the digital storage subsystem.
The clusters in the archive description record can further relate records in accordance with at least one cluster relationship. The cluster relationships can include feature space relationships, semantic relationships, media relationships and meta relationships. Cluster relationships can define relationships between a cluster and a record, between two or more records, or between two or more clusters.
A method in accordance with the present invention generates a description of the content of a multimedia collection, or archive, which has one or more media records and record descriptions associated with the records. The method includes evaluating the record descriptions to determine measures of similarity in at least two records in the archive based on the properties of the records and grouping elements based on these properties. The groupings can be performed automatically, semi-automatically or even manually. A cluster description record can then be generated reflecting the grouping of elements, in accordance with cluster attributes and cluster relationships. A multimedia archive index, or collection structure description scheme, is generated to relate the multimedia archive description to the records in the archive.
Also in accordance with the present multimedia archive description scheme is an archive description file for describing the content of a multimedia archive, having records and record descriptions associated with the records. The principal description element in the archive description file is a cluster. The cluster includes at least one cluster attribute which relates the records in accordance with a similarity measure found in the record descriptions. The clusters can be further defined with the inclusion of at least one cluster relationship.
In each of the embodiments above, the cluster attributes can include descriptors related to feature space attributes, semantic attributes, media attributes and meta attributes. The cluster relationships can include feature space relationships, semantic relationships, media relationships and meta relationships. Cluster relationships can define relationships between records, between records and clusters or between two or more clusters. The cluster attributes can be indexed in accordance with an information-based hierarchy wherein feature space attributes are indexed above semantic attributes in the information-based hierarchy. The feature space attributes can be selected from the group including type/technique attributes, global distribution attributes, local structure attributes and global composition attributes. The semantic attributes can be selected from the group consisting of generic object attributes, generic scene attributes, specific object attributes, specific scene attributes, abstract object attributes and abstract scene attributes. In a specific embodiment, the information-based hierarchy is a ten level index structure having a plurality of feature space levels and a plurality of semantic attribute levels, wherein the feature space attribute levels include a type/technique attribute level, a global distribution attribute level, a local structure attribute level and a global composition attribute level and wherein the semantic attribute levels include a generic object attribute level, a generic scene attribute level, a specific object attribute level, a specific scene attribute level, an abstract object attribute level and an abstract scene attribute level.
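The ten-level information-based hierarchy can be pictured as an ordered enumeration. The sketch below is only illustrative: the names `IndexLevel`, `is_feature_space` and `is_semantic` are hypothetical, and the numeric values are an assumption that simply follows the order in which the levels are listed above, with feature space levels indexed above semantic levels.

```python
from enum import IntEnum


class IndexLevel(IntEnum):
    """Ten index levels; numeric values are assumed for illustration only."""
    # Feature space (syntactic) attribute levels
    TYPE_TECHNIQUE = 1
    GLOBAL_DISTRIBUTION = 2
    LOCAL_STRUCTURE = 3
    GLOBAL_COMPOSITION = 4
    # Semantic attribute levels
    GENERIC_OBJECT = 5
    GENERIC_SCENE = 6
    SPECIFIC_OBJECT = 7
    SPECIFIC_SCENE = 8
    ABSTRACT_OBJECT = 9
    ABSTRACT_SCENE = 10


def is_feature_space(level: IndexLevel) -> bool:
    """True for the four feature space levels at the top of the hierarchy."""
    return level <= IndexLevel.GLOBAL_COMPOSITION


def is_semantic(level: IndexLevel) -> bool:
    """True for the six semantic levels below the feature space levels."""
    return not is_feature_space(level)
```

An ordered enumeration keeps the comparison between levels explicit, which is convenient when an attribute must be checked against the boundary between feature space and semantic levels.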
The cluster relationships can be selected from the group including feature space relationships, semantic relationships, media relationships and meta relationships. The feature space relationships can be further selected from the group including spatial relationships, temporal relationships, and visual relationships. The semantic relationships can be further selected from the group including lexical relationships and predictive relationships.
The systems, methods and archive description scheme described above can be used in a media archive system in accordance with the present invention. The media archive system includes a computer readable storage system for storing a plurality of media records. At least one media description engine is also provided for accessing media records stored in the computer readable storage system and for generating a media description record corresponding thereto. A cluster processor is operatively coupled to the computer readable storage system. The cluster processor accesses the media description records and generates an archive description record which includes at least one cluster relating at least two records stored in the storage system. A query processor can be provided and operatively coupled to the cluster processor to receive archive search parameters from a user and provide an archive search query to the cluster processor.
The cluster processor can generate the archive description record automatically or semi-automatically with varying degrees of user input. The archive description record can also be generated or modified by manually creating clusters based on cluster attributes and cluster relationships.
BRIEF DESCRIPTION OF THE DRAWING
Further objects, features and advantages of the invention will become apparent from the following detailed description taken in conjunction with the accompanying figures showing illustrative embodiments of the invention, in which
Figure 1 is a simplified block diagram of a multimedia archive description scheme in accordance with the present invention;
Figure 2A is a Venn diagram illustrating exemplary multimedia archive content and potential clusters, in a feature space;
Figure 2B is a hierarchal diagram illustrating an exemplary cluster decomposition relationship for the clusters identified in Figure 2A;
Figure 3A is a Venn diagram illustrating exemplary multimedia archive content and potential clusters, in a subject space;
Figure 3B is a hierarchal diagram illustrating an exemplary cluster decomposition relationship for the clusters identified in Figure 3A;
Figure 4 is a block diagram illustrating a first embodiment of the present multimedia archive description scheme architecture;
Figure 5 is a block diagram illustrating an exemplary cluster descriptor arrangement, for use in connection with the description scheme architecture of Figure 4;
Figure 6 is a block diagram illustrating an exemplary cluster relationship arrangement, for use in connection with the description scheme architecture of Figure 4;
Figure 7 is a table illustrating an index structure for semantic relationships for an exemplary multimedia archive description scheme;
Figure 8 is a table illustrating an index structure for syntactic relationships for an exemplary multimedia archive description scheme;
Figure 9 is a pictorial representation of a cluster attribute index structure and cluster relationship index structure for an exemplary multimedia archive description scheme;
Figure 10 is a block diagram illustrating an exemplary entity-relationship model for a multimedia archive description scheme;
Figure 11 is a block diagram illustrating an alternate embodiment of the present multimedia archive description scheme architecture; and
Figure 12 is a block diagram of a multimedia archive system employing a multimedia archive description scheme in accordance with the present invention.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
Throughout the figures, the same reference numerals and characters, unless otherwise stated, are used to denote like features, elements, components or portions of the illustrated embodiments. Moreover, while the subject invention will now be described in detail with reference to the figures, it is done so in connection with the illustrative embodiments. It is intended that changes and modifications can be made to the described embodiments without departing from the true scope and spirit of the subject invention as defined by the appended claims.
For the present multimedia archive description scheme, the nature of the multimedia records is not critical. As used herein, the terms multimedia document and multimedia record are synonymous and refer generically to any item of multimedia content, such as an image, an image object (e.g., a portion of an image), a video segment, an Internet web page including text and graphics (e.g., HTML and XML based multimedia content), and the like.
The present multimedia archive description schemes are generally applicable to systems and methods for describing collections of multimedia records where the individual records in the collection can be, or have been, described by a media description scheme for individual records, referred to herein generally as single record multimedia description schemes. Exemplary single record multimedia description schemes for various multimedia content, such as generic audio-video content, images and video are described in co-pending International Applications: PCT/US99/26125 entitled "Systems and Methods for Interoperable Multimedia Content Descriptions;" PCT/US99/26127 entitled "Image Description System and Method;" and PCT/US99/26126 entitled "Video Description System and Method;" respectively, which are incorporated herein by reference.
In general, multimedia records can be described as a set of elements, such as syntactic, semantic, meta and media elements. For example, an image can be described as a group of objects, with features and relationships associated with those objects. This description philosophy is extensible to a collection of multimedia documents, which can also be viewed as a set of elements that are further described by attributes of the elements and relationships to other elements.
The simplified block diagram of Figure 1 illustrates the concept of the present multimedia archive description scheme. The primary unit of description in the present multimedia archive description scheme is a cluster 100. A cluster 100 represents a grouping of records in the archive based on the properties of the records which are set forth in descriptions of the records. A cluster 100 includes at least one cluster attribute 105 which describes a property of the cluster. Clusters can also include cluster relationships 110. Cluster relationships can be simple, one to one relationships such as between clusters, records, or record segments, or can be complex, m to n relationships between multiple clusters, records, or record segments. The cluster relationships 110 can be further defined by cluster relationship attributes 115.
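The structure of Figure 1 can be sketched as a pair of small record types. This is a minimal illustration rather than a definition from the source: the class and field names (`Cluster`, `ClusterRelationship`, `sources`, `targets`) and the sample identifiers are hypothetical, chosen to mirror the cluster 100, cluster attributes 105, cluster relationships 110 and relationship attributes 115 described above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ClusterRelationship:
    """A cluster relationship 110, optionally refined by relationship attributes 115."""
    name: str                 # hypothetical label, e.g. "contains"
    sources: List[str]        # m participants: cluster, record or segment ids
    targets: List[str]        # n participants
    attributes: Dict[str, object] = field(default_factory=dict)


@dataclass
class Cluster:
    """A cluster 100: a grouping of records with at least one cluster attribute 105."""
    cluster_id: str
    attributes: Dict[str, object]
    relationships: List[ClusterRelationship] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The description requires at least one attribute per cluster.
        if not self.attributes:
            raise ValueError("a cluster needs at least one cluster attribute")


# Hypothetical usage: one cluster related to two records (a 1-to-2 relationship).
art = Cluster("art", {"subject": "Art"})
art.relationships.append(
    ClusterRelationship("contains", ["art"], ["record-17", "record-42"],
                        {"probability": 0.9})
)
```

Keeping the participants as lists lets the same type express both simple one-to-one relationships and m-to-n relationships.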
A cluster 100 can be characterized in numerous ways, such as by statistics and distributions of its elements in different feature spaces, semantics associated with the cluster, media information, and meta information. Feature-space and semantic attributes can be used to describe the multimedia content in the cluster. Media and meta attributes generally convey information that is closely related to the cluster, but not explicitly given by its content. Examples of this type of cluster attribute can include the identification of the cluster in a database or the method used to create a cluster. A cluster 100 can be created based on a vast number of combinations of attributes and/or relationships. Clusters can also be created based on human perception, which may not correspond to any combination of the attributes and relationships describing these elements. Clusters 100 can be grouped to form other clusters, so composition and/or decomposition are possible relationships among clusters. More complex relationships involve relating the clusters based on their attributes 105, e.g., "color more random than." Relationships 110 among clusters can be defined for each cluster attribute type, e.g., syntactic feature space, semantic features, media, and meta. Probabilities can also be assigned to relationships as a confidence factor of a stated association (e.g., "Object 1 is contained in Cluster A with probability 0.9"). Relationships can also relate clusters to elements of multimedia documents (e.g., "Cluster A is composed of Object 1 and Object 2").
Examples of the present multimedia archive description scheme, in accordance with Figure 1, are illustrated in Figures 2 and 3. Figure 2A graphically represents elements of an archive and possible clusters which are defined in a generic feature space, designated X. The elements 220 of this archive, which are illustrated as stars distributed in the feature space, can be still regions (e.g., image objects), moving regions (e.g., video objects), and video segments, among others. Clusters 0, 1, 2, 2.1, 2.2, 2.3 and 3 are described in terms of attributes color 205, shape 210 and file size 215. Figure 2B illustrates a simple decomposition relationship among the clusters in Figure 2A. In this case, cluster 0 includes clusters 1, 2, and 3 and cluster 2 includes clusters 2.1, 2.2 and 2.3. Clearly, other relationships can be defined for these clusters, such as cluster 2 intersects cluster 2; cluster 2.2 has a higher average color value than cluster 2.1; etc.
Figures 3A and 3B also represent exemplary archive content defined as clusters, and a cluster decomposition relationship, respectively. Referring to Figure 3A, the elements of this archive can be objects and events which are related by the exemplary clusters: Subject 300, History 305, Art 310 and Science 320. Within the Art cluster 310, the clusters Expressionist 325, Impressionist 330, and Modern 335 are defined. Figure 3B illustrates a simple cluster decomposition relationship between the clusters of Figure 3A.
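The decomposition relationship of Figure 2B can be written down as a simple parent-to-children mapping. The sketch below assumes the sub-clusters of cluster 2 are labeled 2.1, 2.2 and 2.3; the names `DECOMPOSITION` and `descendants` are hypothetical and used only for this illustration.

```python
from typing import Dict, List

# Decomposition relationship: parent cluster -> sub-clusters it includes.
DECOMPOSITION: Dict[str, List[str]] = {
    "0": ["1", "2", "3"],
    "2": ["2.1", "2.2", "2.3"],
}


def descendants(cluster_id: str, tree: Dict[str, List[str]]) -> List[str]:
    """Return every cluster included in cluster_id, directly or indirectly,
    in depth-first order."""
    found: List[str] = []
    for child in tree.get(cluster_id, []):
        found.append(child)
        found.extend(descendants(child, tree))
    return found
```

Walking the mapping from cluster 0 visits all six other clusters, while a leaf cluster such as 3 has no descendants.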
Figure 4 is a block diagram illustrating an embodiment of the present multimedia archive description scheme. This embodiment is one which extends the definition of a single record multimedia description scheme to represent multimedia archives having one or more records. In Figure 4, the various blocks represent fields of information in the multimedia archive description scheme. The various blocks in the description scheme are linked by composition relationships, which are illustrated by diamond shaped arrows, or by inheritance relationships, which are illustrated by triangular shaped arrows. Clusters 100 are defined by at least one attribute 105 and may include one or more relationships 110. Referring to Figure 4, these parameters are stored in a cluster description scheme block 405 which includes cluster relations 415 and cluster descriptors 410 by way of composition relationships. The cluster descriptor block 410 includes the attributes of the cluster. The cluster description scheme block 405 also includes a cluster node block 400. The cluster node block 400 is an element used for grouping a set of elements in a relationship. The cluster description scheme block 405 is also linked to a collection structure description scheme block 420, which is an index bridging the single document multimedia archive description scheme, such as a description scheme for generic Audio-Visual content (Generic AV DS) 425, to the cluster description scheme block 405 and cluster relations block 415.
The cluster node 400 includes references to various components which further describe the content of the individual records in the archive. For example, the cluster node 400 can include, via a composition link, references to segments 430 (such as video), references to events 435 and references to objects 440 (such as image objects) which are defined in an existing multimedia description scheme. The cluster node 400 can also include references to other clusters 445 and references to the multimedia description scheme 450. The references operate as pointers which convey the location of a source of information.
The cluster descriptor block 410 is further defined in the relational block diagram of Figure 5. The cluster descriptor block 410 can include feature space descriptors 500, semantic descriptors 505, media descriptors 510 and meta descriptors 515, which are used in connection with single document media description schemes. The feature space descriptors are a set of properties which describe the feature space attributes of records in a description scheme. Such attributes are generally syntactic and refer to the way the content of records is arranged without considering the meaning conveyed by such arrangements. Feature space attributes generally describe the cluster by its appearance in a given feature space. Feature space attributes can also describe statistical attributes (e.g., size and higher order moments) of the distribution of records in a cluster. Feature space attributes can include information such as feature space point 520, feature space orientation 522, feature space bounding box parameters 524, feature space contour definitions 526 and feature space quantization 528. The feature space descriptor 500 can also inherit the properties of the feature space 530 and feature space distribution 532.
The semantic descriptor block 505 can inherit free form annotations 534 as well as conventional 6-w parameters 536, which are also used in connection with the single record description schemes for multimedia records. Semantic attributes generally refer to the meaning conveyed by the arrangement of records. The 6-w's include who 538, where 540, what object 542, what action 544, why 546 and when 548.
The media descriptor block 510 includes information describing the media attributes of a cluster. For example, the media descriptor block 510 may inherit format information 550, storage requirements 552, file identification parameters 556, and file location information 558 of the clusters.
The meta descriptor block 515 includes author-generated information which is input by an author of the document or the creator of the cluster. The meta attributes of a cluster can include information related to the creation of the cluster, such as the method, constraints or rules followed to create the cluster based on the elements' attributes. As illustrated in Figure 5, the meta descriptor block 515 can inherit information such as representative icons 560, intellectual property rights attribution 562 and creation information 564, such as method of creation 566, date/time of creation 568, and organization 570.
Figure 6 further illustrates the construction and content of the cluster relationship block 415. Cluster relationships 415 can be broadly classified as feature space relationships 605, semantic relationships 610, media relationships 670 and meta relationships 680. Semantic relationships 610, which are used to relate semantic interpretations of clusters, further include lexical relationships 615, action relationships 620 and state relationships 625, all of which can be inherited from the description of the individual records in the archive collection. For example, in Figure 3, the relationship "Impressionism is a part of cluster Art" is a semantic relationship between the clusters "impressionism" and "art". The action relationships 620 and state relationships 625 are examples of predicative relationships.
The lexical relationships 615 correspond to the semantic relationships among nouns, such as those described in the article "WordNet: A Lexical Database for English", by G.A. Miller, Communications of the ACM, Vol. 38, No. 11, pp. 39-41, November 1995. These relationships can include synonymy (e.g., "pipe is similar to tube"), antonymy (e.g., "happiness is opposite to sadness"), hyponymy/hypernymy (e.g., "a dog is an animal" and "animal is the type of a dog") and meronymy/holonymy (e.g., "a musician is a member of a musical band" and "a musical band is composed of musicians").
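As a minimal sketch, assuming that a lexical relationship is held as a typed, directed link between two named clusters, the paired relationships above (hyponymy/hypernymy, meronymy/holonymy) could be represented as follows. The class names `Lexical` and `LexicalRelation` and the `INVERSE` table are illustrative assumptions and are not defined by the description scheme.

```python
from dataclasses import dataclass
from enum import Enum


class Lexical(Enum):
    SYNONYMY = "synonymy"
    ANTONYMY = "antonymy"
    HYPONYMY = "hyponymy"
    HYPERNYMY = "hypernymy"
    MERONYMY = "meronymy"
    HOLONYMY = "holonymy"


# Synonymy and antonymy are symmetric; the other kinds come in inverse pairs.
INVERSE = {
    Lexical.SYNONYMY: Lexical.SYNONYMY,
    Lexical.ANTONYMY: Lexical.ANTONYMY,
    Lexical.HYPONYMY: Lexical.HYPERNYMY,
    Lexical.HYPERNYMY: Lexical.HYPONYMY,
    Lexical.MERONYMY: Lexical.HOLONYMY,
    Lexical.HOLONYMY: Lexical.MERONYMY,
}


@dataclass(frozen=True)
class LexicalRelation:
    kind: Lexical
    source: str  # cluster the statement is about
    target: str  # cluster it is related to

    def inverse(self) -> "LexicalRelation":
        """Return the same link read in the opposite direction."""
        return LexicalRelation(INVERSE[self.kind], self.target, self.source)


# "Impressionism is a part of cluster Art" read as a meronymy link.
part_of = LexicalRelation(Lexical.MERONYMY, "impressionism", "art")
whole_of = part_of.inverse()
```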
Predicative semantic relationships can include action relationships 620 (e.g., "to throw" and "to hit") and state relationships 625 (e.g., "to own" and "to control") among two or more clusters. These relationships are further set forth in the table of Figure 7, which provides an indexing structure for semantic relationships.
Feature space relationships 605 can include relationships such as cluster union 630, cluster intersection 635, cluster decomposition 640, Rtheta relationships 645 and cluster elements 650. These relationships are syntactic in nature. The Rtheta relationships generally include orientation information 655 and feature space distance information 660. This listing of feature space relationships is a non-exhaustive, representative list. The actual set of feature space relationships for a given archive description scheme instantiation can include subsets of these relationships and can also include other relationships which are not shown in Figure 6 but are relevant to a particular feature space. The table of Figure 8 provides an exemplary indexing structure for syntactic relationships for the multimedia archive description scheme.
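The feature space relationships listed above can be sketched in Python under two assumptions that are not stated in the description: a cluster is treated as a set of record identifiers, and an Rtheta relationship is taken between the centroids of two clusters in a two-dimensional feature space. The function names are hypothetical.

```python
import math
from typing import FrozenSet, Sequence, Tuple


def cluster_union(a: FrozenSet[str], b: FrozenSet[str]) -> FrozenSet[str]:
    """Records that belong to either cluster."""
    return a | b


def cluster_intersection(a: FrozenSet[str], b: FrozenSet[str]) -> FrozenSet[str]:
    """Records that belong to both clusters."""
    return a & b


def r_theta(centroid_a: Sequence[float],
            centroid_b: Sequence[float]) -> Tuple[float, float]:
    """Distance and orientation (in degrees) from centroid_a to centroid_b."""
    dx = centroid_b[0] - centroid_a[0]
    dy = centroid_b[1] - centroid_a[1]
    return math.hypot(dx, dy), math.degrees(math.atan2(dy, dx))


# Made-up clusters and centroids for illustration.
landscapes = frozenset({"rec1", "rec2", "rec3"})
sunsets = frozenset({"rec3", "rec4"})
distance, angle = r_theta((0.0, 0.0), (3.0, 4.0))
```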
An exemplary embodiment of the multimedia archive description scheme illustrated in Figures 4-6 is set forth in Annex A. This embodiment defines the present description scheme in XML (extensible mark-up language) syntax. Annex B illustrates instantiations of the present archive description scheme set forth in Annex A for the exemplary archives of Figures 2 and 3, respectively.
Figure 9 is a pictorial diagram which provides a ten-level index structure for cluster attributes and a correspondence of this index structure to cluster relationships for an exemplary archive of records described by a generic audio-visual description scheme. In the exemplary generic audio-visual description scheme, records are generally images, image segments or video segments, and are described in terms of objects, regions and events. In Figure 9, the cluster attribute index structure 910 is visually represented by a ten-level pyramid. Each level in the index structure 910 represents attributes which require more information to define them than the levels above. As noted above, cluster attributes can be syntactic, semantic, media and meta type attributes. However, in the exemplary index structure 910 of Figure 9, clusters are only characterized by syntactic and semantic attributes. In the exemplary index structure 910, syntactic attributes, such as type/technique 912, global distribution 914, local structure 916 and global composition 918, make up the first four layers of the index structure 910. The lower six layers of the index structure describe semantic cluster attributes, such as generic objects 922, generic scene 924, specific objects 926, specific scene 928, abstract object 930 and abstract scene 932. The dividing line 920 between layers 918 and 922 graphically illustrates the transition from syntactic attributes to semantic attributes.
The type/technique level 912 provides general information about the visual characteristics of a cluster, which can include descriptions of the features used to create the cluster (e.g., color, texture, etc.), the types of elements in the cluster groups (e.g., objects, animated regions, etc.) and the like. The global distribution level 914 classifies clusters based on attributes of global content, which are generally measured in terms of low-level perceptual features of the records. Global distribution features can include global color characteristics (e.g., dominant color, average color, histogram, etc.), global texture (e.g., coarseness, directionality, contrast), global shape (e.g., aspect ratio), global motion parameters (e.g., speed, acceleration, direction), global deformation (e.g., expanding speed), temporal/spatial dimensions, feature space dimensions, and the like. The global distribution level 914 can also include statistical attributes of the cluster, such as size (the number of records in a cluster), and higher order moments of the distribution of records in a cluster.
The local structure level 916 is for attributes related to the extraction and characterization of local components of the cluster. Local components generally exhibit elements (e.g., regions, objects) with a homogeneous distribution in the given feature space. Local structure attributes include distribution masks, centroids, first and second moments, local distribution functions and the like.
The global composition level 918 is the last syntactic, or feature space, attribute level of the index structure 910. In this context, global composition refers to the arrangement or spatial layout of the clusters in the feature space. The global composition level 918 includes attributes which relate to the specific arrangement or composition of the elements set forth in the local structure level 916. This can include concepts such as the number of sub-clusters, boundaries of the cluster, symmetry and the like.
The next level down in the index structure 910 (level of increasing knowledge) is the generic objects level 922, which is the first semantic attribute layer in the hierarchy. Generic objects are those which are described at a fundamental level using only commonly available knowledge. Generic objects can include such things as "person" and "sky". Such objects are defined in terms of generic object attributes in the generic object level 922. Below the generic objects level 922 is the generic scenes level 924, which indexes clusters both on the generic objects and their arrangement. Generic scene classes can include city, landscape, indoor, outdoor, still life, portrait and the like. As with generic objects, only generally available knowledge is required to classify records as generic scenes. Unlike generic objects, specific objects refer to those objects which are identified and grouped using specific information. For example "George Washington" is a specific-object attribute. Specific object attributes are indexed in the specific object level 926 of the index structure 910. Similarly, the specific scene level 928 is analogous to the generic scene level 924 except that the attributes are further defined in terms of specific knowledge related to the records. "New York City" is an example of a specific scene attribute.
Moving down the index structure 910, more knowledge is required. Beneath the specific scene level 928 is the abstract object level 930. Abstract objects are defined in terms of very specialized knowledge which is highly subjective in nature. Abstract objects can include emotions, such as "anger" or "happy," as well as concepts such as "hardworking," "decisive" and the like. Similarly, the abstract scene level 932 references attributes which refer to what the cluster as a whole represents. Thus the semantic object "New York City" could be described by abstract-scene attributes such as "fun," "hip," "chaotic" and the like. As such attributes require the most specialized knowledge of what the cluster is and represents, the abstract scene level 932 forms the base of the index hierarchy.
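The ten levels of the index structure 910 and the dividing line 920 can be summarized in a short Python sketch. The numbering 1 through 10 (top of the pyramid to base) and the helper names are assumptions made for illustration; only the level names and their order come from the description of Figure 9.

```python
from enum import IntEnum


class IndexLevel(IntEnum):
    # Syntactic (feature space) levels, above dividing line 920.
    TYPE_TECHNIQUE = 1
    GLOBAL_DISTRIBUTION = 2
    LOCAL_STRUCTURE = 3
    GLOBAL_COMPOSITION = 4
    # Semantic levels, below dividing line 920.
    GENERIC_OBJECT = 5
    GENERIC_SCENE = 6
    SPECIFIC_OBJECT = 7
    SPECIFIC_SCENE = 8
    ABSTRACT_OBJECT = 9
    ABSTRACT_SCENE = 10


def is_syntactic(level: IndexLevel) -> bool:
    """True for the four levels above the dividing line."""
    return level <= IndexLevel.GLOBAL_COMPOSITION


def is_semantic(level: IndexLevel) -> bool:
    """True for the six levels below the dividing line."""
    return not is_syntactic(level)


def requires_more_knowledge(a: IndexLevel, b: IndexLevel) -> bool:
    """True if level a sits lower in the pyramid than level b."""
    return a > b
```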
Cluster relationships can be defined at the different levels of the attribute index structure 910. Syntactic relationships can be defined at the syntactic and semantic levels. This is represented by the syntactic relationship table 940 extending above and below the dividing line 920 in the index structure 910. Semantic relationships 950 can only be defined in terms of semantic levels. Thus the table of semantic relationships 950 is shown only extending below the dividing line 920.
As set forth in Figures 7-9, relationships can be defined at different levels of the ten-level index structure 910 of Figure 9. Semantic relationships can be defined at the generic, specific and abstract levels. For example, "to own stock" is a generic semantic (action) relationship; "to own 80% of the stock" is a specific semantic (action) relationship; and "to control the company" is an abstract semantic relationship. Syntactic relationships, such as illustrated in Figure 8, can only be formed on the generic level (e.g., "similar distribution") or the specific level (e.g., "the difference in the variance is x") of the index structure 910. Referring to Figure 8, syntactic relationships include feature spatial relationships such as topological and directional relationships. The topological relationships can further be defined on the generic level (e.g., near, far from, adjacent to, etc.) and on the specific level (e.g., the union of, the intersection of, distance of centroids, etc.). Similarly, directional relationships can be defined on the generic level (e.g., to the direction of increasing feature A, to the direction of decreasing feature B) and the specific level (e.g., angle between centroids of clusters in feature space x).
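The distinction between a specific and a generic topological relationship can be illustrated with a short Python sketch: the specific relationship is a measured value (distance of centroids), while the generic relationship is a coarse label. Deriving the label from the measurement with a threshold is an assumption made here for illustration and is not prescribed by the description; the function names are likewise hypothetical.

```python
import math
from typing import Sequence


def centroid_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Specific-level relationship: Euclidean distance of two centroids."""
    return math.dist(a, b)  # available in Python 3.8 and later


def generic_proximity(a: Sequence[float], b: Sequence[float],
                      threshold: float) -> str:
    """Generic-level relationship: a label chosen by an assumed threshold."""
    return "near" if centroid_distance(a, b) <= threshold else "far from"


specific = centroid_distance((0.0, 0.0), (3.0, 4.0))
generic = generic_proximity((0.0, 0.0), (3.0, 4.0), threshold=10.0)
```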
Figure 10 is a relational flow diagram that illustrates an entity-relationship model for an archive description scheme for a generic audio-visual description scheme, as discussed in connection with Figure 9. In this exemplary entity-relationship model, clusters are classified in accordance with their attributes and relationships as syntactic clusters 1002 and semantic clusters 1004. Clusters can also be defined as media clusters and meta clusters, which are not shown in Figure 10. In the case of an archive for a visual description scheme, the syntactic clusters can be derived from regions 1006, animated regions 1008 and segments 1010, which can be specific record types defined in the visual description scheme. The syntactic cluster 1002 can also include the syntactic relationships 1012, as illustrated in Figure 8. The semantic cluster 1004 is derived from objects 1014, animated objects 1016, events 1018 and syntactic clusters 1002, by inheritance relationships. The entity-relationship model also includes syntactic elements 1020 which are derived from the regions 1006, animated regions 1008 and segments 1010, as well as visual feature relationships 1022. The syntactic elements 1020 are defined in terms of syntactic attributes 1024. Similarly, semantic elements 1026 are derived from objects 1014, animated objects 1016, events 1018 and semantic relationships 1028. The semantic elements are defined by semantic attributes 1030.
The embodiment of Figure 4 illustrates an extension of a single record multimedia description scheme for use as a multimedia archive description scheme. This embodiment is appropriate if the basic single record description scheme(s) can readily be modified. Figure 11 is a block diagram of an alternate embodiment of a multimedia archive description scheme which is suitable for use in conjunction with a non-modified single record description scheme.
Referring to Figure 11, the single record description scheme is represented by an Audio-Video description scheme (AV DS) block 1105, which can include a syntactic description scheme 1110, a syntactic/semantic link description scheme 1115 and a semantic description scheme 1120. A multimedia index description scheme 1125 provides an index reference to the single document description scheme 1105 for a multimedia archive description scheme block 1130. The multimedia archive description scheme block 1130 can include a number of multimedia index description scheme blocks 1125, each of which relates to a corresponding single document description scheme 1105. Thus, rather than altering and expanding a single document description scheme to be suitable for multiple record description, the multimedia archive description scheme block 1130 references standard single document description schemes via multiple multimedia index description scheme blocks 1125, which essentially provide an index of the individual records for the multimedia archive description scheme block 1130.
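The indirection used by this alternate embodiment can be sketched as follows: the archive description holds index entries that point at unmodified single-record descriptions instead of embedding or extending them. The class names `MMIndex` and `ArchiveDescription`, the use of a URI string as the reference, and the file paths are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class MMIndex:
    """Stands in for one multimedia index block 1125."""
    description_uri: str  # reference to an unmodified single-record description
    cluster_ids: List[str] = field(default_factory=list)


@dataclass
class ArchiveDescription:
    """Stands in for the archive description scheme block 1130."""
    indexes: List[MMIndex] = field(default_factory=list)

    def add_record(self, description_uri: str, cluster_ids: Iterable[str]) -> None:
        self.indexes.append(MMIndex(description_uri, list(cluster_ids)))

    def records_in(self, cluster_id: str) -> List[str]:
        """URIs of the record descriptions indexed under a cluster."""
        return [ix.description_uri for ix in self.indexes
                if cluster_id in ix.cluster_ids]


archive = ArchiveDescription()
archive.add_record("descriptions/monet.xml", ["art", "impressionism"])
archive.add_record("descriptions/picasso.xml", ["art"])
```

Because only references are stored, the single-record descriptions can remain in whatever standard form they already have.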
Figure 12 is a block diagram of a multimedia archive system in accordance with the present multimedia archive description scheme, systems and methods. The system includes archive storage 1200 wherein multimedia records 1205 and associated multimedia record descriptions 1210 are stored in computer readable media, such as optical disk storage, magnetic disk storage and the like. The multimedia records 1205 can take the form of digital images, image segments, digitized video segments, Internet web pages, hyper-text based documents (HTML, XML and the like), digital audio files and the like. The record descriptions 1210 correspond to the records and provide descriptions in accordance with a description scheme defined for the particular record type. While the archive storage will generally reside locally in storage on a host computer, the archive storage can be remotely located, such as through a collection of hypertext links, or pointers, which reference remote locations for the records 1205 and descriptions 1210.
The multimedia archive system can also include various description engines which characterize individual multimedia records in accordance with an appropriate description scheme. For example, the system of Figure 12 includes a video description engine 1215, an audio description engine 1220 and an image description engine 1225. The description engines can access the records 1205 in the archive storage 1200 and generate the record descriptions 1210 in accordance with appropriate record description schemes for individual records.
The multimedia archive system includes a cluster processing subsystem 1230. The cluster processing subsystem accesses the record descriptions 1210 and generates clusters, as described in connection with Figures 1 and 4-6. In addition, clusters can be defined manually by a user or semi-automatically by a combination of user input and cluster processing subsystem 1230 operations. The cluster processing subsystem 1230 generates a multimedia archive description 1235, including the cluster definitions and multimedia index or collection structure, in accordance with the description scheme of Figures 4-7. The multimedia archive description 1235 can reside in the archive storage 1200. Alternatively, the multimedia archive description can be stored in an archive description database 1240, which is also accessible by the cluster processing subsystem 1230.
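One very simple way a cluster processing subsystem might form clusters is to group record descriptions that share a value for a chosen attribute. The following Python sketch assumes that record descriptions are available as plain dictionaries and that equality of one attribute is the similarity measure; both are illustrative assumptions, and the function name `build_clusters` is hypothetical. Groups of fewer than two records are dropped, since a cluster relates at least two records.

```python
from collections import defaultdict
from typing import Dict, List


def build_clusters(descriptions: Dict[str, Dict[str, str]],
                   attribute: str) -> Dict[str, List[str]]:
    """Group record ids by the value of one attribute of their descriptions."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for record_id, attrs in descriptions.items():
        if attribute in attrs:
            groups[attrs[attribute]].append(record_id)
    # Keep only groups that relate at least two records.
    return {value: ids for value, ids in groups.items() if len(ids) >= 2}


# Made-up record descriptions for illustration.
record_descriptions = {
    "r1": {"subject": "art", "format": "jpeg"},
    "r2": {"subject": "art", "format": "mpeg"},
    "r3": {"subject": "history", "format": "jpeg"},
}
clusters_by_subject = build_clusters(record_descriptions, "subject")
```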
The system of Figure 12 also includes a query processing subsystem 1237. The query processing subsystem 1237 can access the archive description 1235 via the cluster processing subsystem 1230. Alternatively, the query processing subsystem 1237 can directly access the archive description in the archive storage 1200 or archive description database 1240. The query processing subsystem 1237 can receive a user query through applicable input/output (I/O) circuitry 1245, which can include communication ports, search engines, keyboards, digitizers and the like (not shown). The archive system can be accessed by remote client computer systems 1250 via a dedicated physical I/O connection, such as a dedicated terminal, or via a network connection, such as the Internet. Various media-based application tools and software 1255 can also be used in connection with the present archive description scheme and system. Such applications can work in conjunction with the cluster processing subsystem 1230 and I/O circuitry 1245 to effect various functionality.
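A query against an archive description can be sketched, under simplifying assumptions, as an exact match on cluster attributes. The dictionary layout, the attribute keys (loosely modelled on the 6-W parameters) and the function name `query_clusters` are hypothetical; an actual query processing subsystem could equally use similarity measures instead of exact matching.

```python
from typing import Dict, List


def query_clusters(archive_description: Dict[str, Dict[str, str]],
                   **criteria: str) -> List[str]:
    """Return the ids of clusters whose attributes match every criterion."""
    return sorted(
        cluster_id
        for cluster_id, attributes in archive_description.items()
        if all(attributes.get(key) == value for key, value in criteria.items())
    )


# Made-up archive description: cluster id -> semantic attributes.
archive_description = {
    "c1": {"what_object": "person", "where": "New York City"},
    "c2": {"what_object": "sky", "where": "New York City"},
    "c3": {"what_object": "person", "where": "indoor"},
}
matches = query_clusters(archive_description, where="New York City")
```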
The various processing systems and subsystems of Figure 12 can be implemented on a dedicated computer such as a mainframe computer or a single personal computer. Alternatively, the various subsystems can be implemented using a number of computer stations which are interconnected via a network, such as a local area network or the Internet.
The present multimedia archive description scheme provides data structures that describe collections of multimedia documents. The data structures of the present multimedia archive description scheme are based on clusters, which are description units that interrelate the records in the collection by one or more similarity measures based on attributes and relationships. The present description schemes can be realized as an extension of an existing single document description scheme or as a data structure which works in concert with unmodified description schemes via multiple multimedia index descriptions.
Although the present invention has been described in connection with specific exemplary embodiments, it should be understood that various changes, substitutions and alterations can be made to the disclosed embodiments without departing from the spirit and scope of the invention as set forth in the appended claims.
Annex A: MPEG-7 DDL Representation of the Archive DS
<DSType name="GenericAVDS">
  <attrDecl name="id"> <datatypeRef name="ID"/> </attrDecl>
  <attrDecl name="href"> <datatypeRef name="uri"/> </attrDecl>
  <DSTypeRef name="SyntacticDS" minOccur="0" maxOccur="1"/>
  <DSTypeRef name="SemanticDS" minOccur="0" maxOccur="1"/>
  <DSTypeRef name="SyntacticSemanticLinkDS" minOccur="0" maxOccur="1"/>
  <DSTypeRef name="ModelDS" minOccur="0" maxOccur="1"/>
  <DSTypeRef name="MetaDS" minOccur="0" maxOccur="1"/>
  <DSTypeRef name="MediaDS" minOccur="0" maxOccur="1"/>
  <DSTypeRef name="SummaryDS" minOccur="0" maxOccur="1"/>
  <DSTypeRef name="MMIndexDS" minOccur="0" maxOccur="1"/>
</DSType>
<DSType name="MMIndexDS">
  <DSTypeRef name="ClusterDS" minOccur="0" maxOccur="*"/>
  <DSTypeRef name="ClusterRelation" minOccur="0" maxOccur="*"/>
</DSType>
<DSType name="ClusterDS">
  <DescTypeRef name="ClusterDescriptor" minOccur="0" maxOccur="*"/>
  <DSTypeRef name="ClusterRelation" minOccur="0" maxOccur="*"/>
</DSType>
<DSType name="ClusterRelation">
  <subDSOf name="Relation"/>
  <choice minOccur="1" maxOccur="*">
    <DSTypeRef name="Cluster"/>
    <DSTypeRef name="ClusterNode"/>
    <DSTypeRef name="ClusterRelation"/>
  </choice>
</DSType>
<DSType name="ClusterNode">
  <subDSOf name="EntityNode"/>
  <choice minOccur="0" maxOccur="*">
    <DescTypeRef name="ReferenceToSegment"/>
    <DescTypeRef name="ReferenceToObject"/>
    <DescTypeRef name="ReferenceToEvent"/>
    <DescTypeRef name="ReferenceToCluster"/>
    <DSTypeRef name="Cluster"/>
    <DSTypeRef name="ClusterNode"/>
  </choice>
  <DSTypeRef name="ClusterRelation" minOccur="0" maxOccur="*"/>
</DSType>
<DescType name="ClusterDescriptor">
  <attrDecl name="type"> <datatypeRef name="string"/> </attrDecl>
  <attrDecl name="level"> <datatypeRef name="string"/> </attrDecl>
</DescType>
<!-- Specialized cluster descriptors -->
<DescType name="FeatureSpaceDescriptor">
  <subDescOf name="ClusterDescriptor"/>
</DescType>
<DescType name="FeatureSpace" minOccur="0" maxOccur="1">
  <subDescOf name="FeatureSpaceDescriptor"/>
  <attrDecl name="NumberDimensions"> <datatypeRef name="integer"/> </attrDecl>
  <DescType name="FeatureDimension" minOccur="1" maxOccur="*">
    <attrDecl name="name"> <datatypeRef name="string"/> </attrDecl>
    <attrDecl name="id"> <datatypeRef name="ID"/> </attrDecl>
  </DescType>
</DescType>
<DescType name="FeatureSpacePoint" minOccur="1" maxOccur="*">
  <subDescOf name="FeatureSpaceDescriptor"/>
  <attrDecl name="unit"> <datatypeRef name="double"/> </attrDecl>
  <DescType name="DimensionOrdinate" minOccur="1" maxOccur="*">
    <attrDecl name="dimension"> <datatypeRef name="IDREF"/> </attrDecl>
    <datatypeRef name="double"/>
  </DescType>
</DescType>
<DescType name="FeatureSpaceOrientation" minOccur="1" maxOccur="1">
  <subDescOf name="FeatureSpaceDescriptor"/>
  <attrDecl name="unit"> <datatypeRef name="double"/> </attrDecl>
  <DescType name="DimensionAngle" minOccur="1" maxOccur="*">
    <attrDecl name="dimension" required="true"> <datatypeRef name="IDREF"/> </attrDecl>
    <datatypeRef name="double"/>
  </DescType>
</DescType>
<DescType name="FeatureSpaceBoundingBox">
  <subDescOf name="FeatureSpaceDescriptor"/>
  <DescTypeRef name="FeatureSpace" minOccur="0" maxOccur="1"/>
  <choice minOccur="1" maxOccur="1">
    <all>
      <DescType name="FeatureSpaceCenter" minOccur="1" maxOccur="1">
        <subDescOf name="FeatureSpacePoint"/>
      </DescType>
      <DescTypeRef name="FeatureSpaceOrientation" minOccur="1" maxOccur="1"/>
    </all>
    <all>
      <DescTypeRef name="FeatureSpacePoint" minOccur="1" maxOccur="*"/>
    </all>
  </choice>
</DescType>
<DescType name="FeatureSpaceContour">
  <subDescOf name="FeatureSpaceDescriptor"/>
  <DescTypeRef name="FeatureSpace" minOccur="0" maxOccur="1"/>
  <DescTypeRef name="FeatureSpacePoint" minOccur="1" maxOccur="*"/>
</DescType>
<DescType name="FeatureSpaceQuantization">
  <subDescOf name="FeatureSpaceDescriptor"/>
  <choice minOccur="1" maxOccur="1">
    <DescTypeRef name="LinearQuantization" minOccur="1" maxOccur="1"/>
    <DescTypeRef name="NonLinearQuantization" minOccur="1" maxOccur="1"/>
    <DescTypeRef name="LookupTable" minOccur="1" maxOccur="1"/>
  </choice>
</DescType>
<DescType name="FeatureSpaceDistribution">
  <subDescOf name="FeatureSpaceDescriptor"/>
  <attrDecl name="NumberOfBins"> <datatypeRef name="integer"/> </attrDecl>
  <choice minOccur="1" maxOccur="1">
    <DescType name="Distribution" minOccur="1" maxOccur="1">
      <subDescOf name="FeatureSpaceDescriptor"/>
      <attrDecl name="name"> <datatypeRef name="string"/> </attrDecl>
      <DescType name="Moment">
        <attrDecl name="order"> <datatypeRef name="integer"/> </attrDecl>
        <datatypeRef name="double"/>
      </DescType>
    </DescType>
    <DescType name="DistributionFunction" minOccur="1" maxOccur="1">
      <subDescOf name="FeatureSpaceDescriptor"/>
      <DescTypeRef name="FeatureSpace" minOccur="0" maxOccur="1"/>
      <DescTypeRef name="FeatureSpaceQuantization" minOccur="0" maxOccur="1"/>
      <DescTypeRef name="DistributionFunctionValues"/>
    </DescType>
  </choice>
</DescType>
<DescType name="FeatureSpaceElementTypes">
  <subDescOf name="FeatureSpaceDescriptor"/>
  <DescType name="ElementType" minOccur="1" maxOccur="*">
    <attrDecl name="name"> <datatypeRef name="string"/> </attrDecl>
    <DescType name="Percentage">
      <datatypeRef name="percentage"/>
    </DescType>
  </DescType>
</DescType>
<DescType name="SemanticDescriptor">
  <subDescOf name="ClusterDescriptor"/>
</DescType>
<DescType name="Annotation">
  <subDescOf name="SemanticDescriptor"/>
  <datatypeRef name="string"/>
</DescType>
<DescType name="6-WDS">
  <subDescOf name="SemanticDescriptor"/>
  <DescTypeRef name="Who" minOccur="0" maxOccur="1"/>
  <DescTypeRef name="WhatObject" minOccur="0" maxOccur="1"/>
  <DescTypeRef name="WhatAction" minOccur="0" maxOccur="1"/>
  <DescTypeRef name="When" minOccur="0" maxOccur="1"/>
  <DescTypeRef name="Where" minOccur="0" maxOccur="1"/>
  <DescTypeRef name="Why" minOccur="0" maxOccur="1"/>
</DescType>
<DescType name="Who">
  <subDescOf name="SemanticDescriptor"/>
  <DescTypeRef name="Annotation" minOccur="1" maxOccur="*"/>
</DescType>
<DescType name="WhatObject">
  <subDescOf name="SemanticDescriptor"/>
  <DescTypeRef name="Annotation" minOccur="1" maxOccur="*"/>
</DescType>
<DescType name="WhatAction">
  <subDescOf name="SemanticDescriptor"/>
  <DescTypeRef name="Annotation" minOccur="1" maxOccur="*"/>
</DescType>
<DescType name="When">
  <subDescOf name="SemanticDescriptor"/>
  <DescTypeRef name="Annotation" minOccur="1" maxOccur="*"/>
</DescType>
<DescType name="Where">
  <subDescOf name="SemanticDescriptor"/>
  <DescTypeRef name="Annotation" minOccur="1" maxOccur="*"/>
</DescType>
<DescType name="Why">
  <subDescOf name="SemanticDescriptor"/>
  <DescTypeRef name="Annotation" minOccur="1" maxOccur="*"/>
</DescType>
<DescType name="MediaDescriptor">
  <subDescOf name="ClusterDescriptor"/>
</DescType>
<DescType name="Location">
  <subDescOf name="MediaDescriptor"/>
  <attrDecl name="href"> <datatypeRef name="uri"/> </attrDecl>
</DescType>
<DescType name="Identification">
  <subDescOf name="MediaDescriptor"/>
  <datatypeRef name="string"/> <!-- or lexical expression -->
</DescType>
<DescType name="StorageRequirements">
  <subDescOf name="MediaDescriptor"/>
  <datatypeRef name="string"/>
</DescType>
<DescType name="Format">
  <subDescOf name="MediaDescriptor"/>
  <datatypeRef name="string"/>
</DescType>
<DescType name="Medium">
  <subDescOf name="MediaDescriptor"/>
  <datatypeRef name="string"/>
</DescType>
<DescType name="MetaDescriptor">
  <subDescOf name="ClusterDescriptor"/>
</DescType>
<DescType name="Creation">
  <subDescOf name="MetaDescriptor"/>
  <DescType name="Method" minOccur="0" maxOccur="1">
    <attrDecl name="mode">
      <enumeration>
        <literal> Automatic </literal>
        <literal> Manual </literal>
      </enumeration>
    </attrDecl>
    <DescTypeRef name="Rules"/>
    <DescTypeRef name="RepresentativeExamples"/>
    <DescTypeRef name="Classifier"/>
    <DescTypeRef name="ManualClassification"/>
  </DescType>
  <DescType name="DateTime"> <datatypeRef name="dateTime"/> </DescType>
  <DescType name="Organization"> <datatypeRef name="string"/> </DescType>
</DescType>
<DescType name="Rights">
  <subDescOf name="MetaDescriptor"/>
  <datatypeRef name="string"/>
</DescType>
<DescType name="RepresentativeIcons">
  <subDescOf name="MetaDescriptor"/>
  <DescTypeRef name="Location" minOccur="1" maxOccur="*"/>
</DescType>
<!-- Specialized cluster relationships -->
<DSType name="FeatureSpaceRelation">
  <subDSOf name="ClusterRelation"/>
</DSType>
<DSType name="ClusterDecomposition">
  <subDSOf name="FeatureSpaceRelation"/>
  <attrDecl name="type"> <fixed>FeatureSpace Topological</fixed> </attrDecl>
  <attrDecl name="name"> <fixed>ClusterDecomposition</fixed> </attrDecl>
  <attrDecl name="degree"> <fixed>2</fixed> </attrDecl>
  <attrDecl name="DecompositionType">
    <enumeration>
      <literal> Temporal </literal>
      <literal> Spatial </literal>
      <literal> Spatial Temporal </literal>
      <literal> Media </literal>
      <literal> FeatureSpace </literal>
    </enumeration>
  </attrDecl>
  <attrDecl name="overlaps"> <datatypeRef name="boolean"/> </attrDecl>
  <attrDecl name="gaps"> <datatypeRef name="boolean"/> </attrDecl>
</DSType>
<DSType name="ClusterUnion">
  <subDSOf name="FeatureSpaceRelation"/>
  <attrDecl name="type"> <fixed>FeatureSpace Topological</fixed> </attrDecl>
  <attrDecl name="name"> <fixed>Union</fixed> </attrDecl>
  <attrDecl name="degree"> <fixed>2</fixed> </attrDecl>
</DSType>
<DSType name="ClusterIntersection">
  <subDSOf name="FeatureSpaceRelation"/>
  <attrDecl name="type"> <fixed>FeatureSpace Topological</fixed> </attrDecl>
  <attrDecl name="name"> <fixed>Intersection</fixed> </attrDecl>
  <attrDecl name="degree"> <fixed>2</fixed> </attrDecl>
</DSType>
<DSType name="ClusterNegation">
  <subDSOf name="FeatureSpaceRelation"/>
  <attrDecl name="type"> <fixed>FeatureSpace Topological</fixed> </attrDecl>
  <attrDecl name="name"> <fixed>Negation</fixed> </attrDecl>
  <attrDecl name="degree"> <fixed>1</fixed> </attrDecl>
</DSType>
<DSType name="ClusterElements">
  <subDSOf name="FeatureSpaceRelation"/>
  <attrDecl name="type"> <fixed>FeatureSpace</fixed> </attrDecl>
  <attrDecl name="name"> <fixed>Elements</fixed> </attrDecl>
  <attrDecl name="degree"> <fixed>2</fixed> </attrDecl>
</DSType>
<DSType name="RThetaRelation">
  <subDSOf name="FeatureSpaceRelation"/>
  <attrDecl name="type"> <fixed>FeatureSpace Directional</fixed> </attrDecl>
  <attrDecl name="name"> <fixed>Elements</fixed> </attrDecl>
  <attrDecl name="degree"> <fixed>2</fixed> </attrDecl>
  <DescTypeRef name="FeatureSpaceOrientation" minOccur="1" maxOccur="1"/>
  <DescType name="FeatureSpaceDistance" minOccur="1" maxOccur="1">
    <datatypeRef name="double"/>
  </DescType>
</DSType>
<DSType name="SemanticRelation">
  <subDSOf name="ClusterRelation"/>
</DSType>
<DSType name="LexicalRelation">
  <subDSOf name="SemanticRelation"/>
  <attrDecl name="type">
    <enumeration>
      <literal> Synonymy </literal>
      <literal> Antonymy </literal>
      <literal> Hyponymy </literal>
      <literal> Meronymy </literal>
    </enumeration>
  </attrDecl>
  <attrDecl name="degree"> <fixed>2</fixed> </attrDecl>
</DSType>
<DSType name="ActionRelation">
  <subDSOf name="SemanticRelation"/>
  <attrDecl name="type"> <fixed>Semantic Action</fixed> </attrDecl>
</DSType>
<DSType name="StateRelation">
  <subDSOf name="SemanticRelation"/>
  <attrDecl name="type"> <fixed>Semantic State</fixed> </attrDecl>
</DSType>
Annex B: Instantiation of the Archive DS
<!-- Examples in Figure 2 -->
<GenericAVDS>
  <MMIndexDS>
    <Cluster id="0">
      <!-- Descriptors -->
      <FeatureSpace NumberDimensions="3">
        <FeatureDimension id="color"> Color </FeatureDimension>
        <FeatureDimension id="shape"> Shape </FeatureDimension>
        <FeatureDimension id="file_size"> FileSize </FeatureDimension>
      </FeatureSpace>
      <FeatureBoundingBox>
        <FeatureSpacePoint>
          <DimensionOrdinate dimension="color"/>
          <DimensionOrdinate dimension="shape"/>
          <DimensionOrdinate dimension="file_size"/>
        </FeatureSpacePoint>
      </FeatureBoundingBox>
      <FeatureSpaceElementTypes>
        <ElementType name="StillRegion">
          <Percentage> 80 % </Percentage>
        </ElementType>
        <ElementType name="MovingRegion">
          <Percentage> 20 % </Percentage>
        </ElementType>
      </FeatureSpaceElementTypes>
      <!-- Relationships -->
      <ClusterElements>
        <ClusterNode>
          <StillRegionDS id="reg1"/>
          <StillRegionDS id="reg2"/>
          <MovingRegionDS id="reg3"/>
        </ClusterNode>
      </ClusterElements>
      <ClusterDecomposition>
        <ClusterNode>
          <Cluster id="1"> <!-- Cluster 1 description --> </Cluster>
          <Cluster id="2">
            <!-- Cluster 2 description -->
            <ClusterDecomposition>
              <ClusterNode>
                <Cluster id="2.1"> </Cluster>
                <Cluster id="2.2"> </Cluster>
                <Cluster id="2.3"> </Cluster>
              </ClusterNode>
            </ClusterDecomposition>
          </Cluster>
          <Cluster id="3"> <!-- Cluster 3 description --> </Cluster>
        </ClusterNode>
      </ClusterDecomposition>
    </Cluster>
  </MMIndexDS>
</GenericAVDS>
<!-- Examples in Figure 3 -->
<GenericAVDS>
  <MMIndexDS>
    <Cluster id="Subject">
      <!-- Descriptors -->
      <FeatureSpace NumberDimensions="1">
        <FeatureDimension id="subject"> Subject </FeatureDimension>
      </FeatureSpace>
      <FeatureSpaceElementTypes>
        <ElementType name="StillRegion">
          <Percentage> 80 % </Percentage>
        </ElementType>
        <ElementType name="MovingRegion">
          <Percentage> 20 % </Percentage>
        </ElementType>
      </FeatureSpaceElementTypes>
      <!-- Relationships -->
      <ClusterElements>
        <ClusterNode>
          <StillRegionDS id="reg1"/>
          <StillRegionDS id="reg2"/>
          <MovingRegionDS id="reg3"/>
        </ClusterNode>
      </ClusterElements>
      <LexicalRelation type="Meronymy" name="To be the Whole of">
        <ClusterNode>
          <Cluster id="Science"> </Cluster>
          <Cluster id="History"> </Cluster>
          <Cluster id="Art">
            <!-- Cluster "Art" description -->
            <LexicalRelation type="Meronymy" name="To be the Whole of">
              <ClusterNode>
                <Cluster id="Moder"> </Cluster>
                <Cluster id="Expres"> </Cluster>
                <Cluster id="Impres"> </Cluster>
              </ClusterNode>
            </LexicalRelation>
          </Cluster>
        </ClusterNode>
      </LexicalRelation>
    </Cluster>
  </MMIndexDS>
</GenericAVDS>
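
As an illustration only, and not as part of the description scheme itself, the nested cluster decomposition in the examples above can be traversed with a short script. The following Python sketch uses the standard `xml.etree.ElementTree` module on a trimmed, hypothetical sample modeled on the first example. The sample string, the `root` cluster id, and the function names `cluster_tree` and `flatten` are assumptions introduced here.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical sample modeled on the appendix example
# (descriptors and cluster elements omitted for brevity).
SAMPLE = """
<Cluster id="root">
  <ClusterDecomposition>
    <ClusterNode>
      <Cluster id="1"/>
      <Cluster id="2">
        <ClusterDecomposition>
          <ClusterNode>
            <Cluster id="2.1"/>
            <Cluster id="2.2"/>
            <Cluster id="2.3"/>
          </ClusterNode>
        </ClusterDecomposition>
      </Cluster>
      <Cluster id="3"/>
    </ClusterNode>
  </ClusterDecomposition>
</Cluster>
"""

# Path from a cluster to its direct sub-clusters in this sample.
SUBCLUSTER_PATH = "./ClusterDecomposition/ClusterNode/Cluster"


def cluster_tree(cluster):
    """Return (id, [child trees]) for a Cluster element, recursing into sub-clusters."""
    children = [cluster_tree(child) for child in cluster.findall(SUBCLUSTER_PATH)]
    return (cluster.get("id"), children)


def flatten(tree, depth=0):
    """Return a depth-first list of (depth, cluster id) pairs."""
    cluster_id, children = tree
    rows = [(depth, cluster_id)]
    for child in children:
        rows.extend(flatten(child, depth + 1))
    return rows


root = ET.fromstring(SAMPLE.strip())
tree = cluster_tree(root)
for depth, cluster_id in flatten(tree):
    print("  " * depth + cluster_id)
```

The second example nests its sub-clusters under `LexicalRelation` rather than `ClusterDecomposition`, so the path constant would need to change accordingly to walk that structure.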

Claims

1. A system for generating a multimedia archive description comprising: a digital storage subsystem for storing multimedia records and descriptions of the records in accordance with a media description scheme; a computer processor operatively coupled to the digital storage subsystem, the computer processor accessing the record descriptions and generating an archive description record having at least one cluster relating at least two records in the digital storage subsystem in accordance with attributes of the record descriptions, the archive description record having a collection structure description scheme providing an index for the at least one cluster; and archive description storage operatively coupled to the computer processor for storing the archive description record.
2. The system for generating a multimedia archive description of claim 1, wherein the at least one cluster further relates records in accordance with at least one cluster relationship.
3. The system for generating a multimedia archive description of claim 2, wherein the cluster attributes are selected from the group including feature space attributes, semantic attributes, media attributes and meta attributes.
4. The system for generating a multimedia archive description of claim 3, wherein the cluster attributes are indexed in accordance with an information-based hierarchy.
5. The system for generating a multimedia archive description of claim 4, wherein feature space attributes are indexed above semantic attributes in the information-based hierarchy.
6. The system for generating a multimedia archive description of claim 5, wherein the feature space attributes are selected from the group including type/technique attributes, global distribution attributes, local structure attributes and global composition attributes.
7. The system for generating a multimedia archive description of claim 5, wherein the semantic attributes are selected from the group consisting of generic object attributes, generic scene attributes, specific object attributes, specific scene attributes, abstract object attributes and abstract scene attributes.
8. The system for generating a multimedia archive description of claim 5, wherein the information-based hierarchy is a ten level index structure having a plurality of feature space levels and a plurality of semantic attribute levels, wherein the feature space attribute levels include a type/technique attribute level, a global distribution attribute level, a local structure attribute level and a global composition attribute level and wherein the semantic attribute levels include generic object attribute level, a generic scene attribute level, a specific object attribute level, a specific scene attribute level, an abstract object attribute level and an abstract scene attribute level.
9. The system for generating a multimedia archive description of claim 2, wherein the cluster relationships are selected from the group consisting of feature space relationships, semantic relationships, media relationships and meta relationships.
10. The system for generating a multimedia archive description of claim 9, wherein the feature space relationships are selected from the group consisting of spatial relationships, temporal relationships, and visual relationships.
11. The system for generating a multimedia archive description of claim 9, wherein the semantic relationships are selected from the group consisting of lexical relationships and predictive relationships.
12. The system for generating a multimedia archive of claim 1, wherein the digital storage system includes local computer readable storage for multimedia records, multimedia record descriptions and the archive description record.
13. The system of claim 1, wherein the digital storage system includes a plurality of storage devices interconnected by a computer network.
14. A method of describing the content of a multimedia archive, having records and record descriptions associated with the records, comprising: evaluating the record descriptions to determine similarity measures in at least two records in the archive; and generating an archive description record including at least one cluster describing at least one attribute similarity measure in the at least two records and a collection structure for indexing the cluster.
15. The method of describing the content of a multimedia archive of claim 14, wherein the cluster attributes are selected from the group including feature space attributes, semantic attributes, media attributes and meta attributes.
16. The method of describing the content of a multimedia archive of claim 15, wherein the cluster attributes are indexed in accordance with an information-based hierarchy.
17. The method of describing the content of a multimedia archive of claim 16, wherein feature space attributes are indexed above semantic attributes in the information-based hierarchy.
18. The method of describing the content of a multimedia archive of claim 16, wherein the feature space attributes are selected from the group consisting of type/technique attributes, global distribution attributes, local structure attributes and global composition attributes.
19. The method of describing the content of a multimedia archive of claim 16, wherein the semantic attributes are selected from the group consisting of generic object attributes, generic scene attributes, specific object attributes, specific scene attributes, abstract object attributes and abstract scene attributes.
20. The method of describing the content of a multimedia archive of claim 16, wherein the information-based hierarchy is a ten level index structure having a plurality of feature space levels and a plurality of semantic attribute levels, wherein the feature space attribute levels include a type/technique attribute level, a global distribution attribute level, a local structure attribute level and a global composition attribute level and wherein the semantic attribute levels include generic object attribute level, a generic scene attribute level, a specific object attribute level, a specific scene attribute level, an abstract object attribute level and an abstract scene attribute level.
21. The method of describing the content of a multimedia archive of claim 14, wherein the cluster includes at least one cluster relationship.
22. The method of describing the content of a multimedia archive of claim 21 wherein the cluster relationships are selected from the group consisting of feature space relationships, semantic relationships, media relationships and meta relationships.
23. The method of describing the content of a multimedia archive of claim 22 wherein the feature space relationships are selected from the group consisting of spatial relationships, temporal relationships and visual relationships.
24. The method of describing the content of a multimedia archive of claim 22 wherein the semantic relationships are selected from the group consisting of lexical relationships and predictive relationships.
25. An archive description file for describing the content of a multimedia archive, having records and record descriptions associated with the records, comprising: a cluster, the cluster including at least one cluster attribute describing at least one similarity measure in the record descriptions; and a collection index structure relating the clusters to the records.
26. The archive description file for describing the content of a multimedia archive as defined by claim 25, wherein the cluster attributes are selected from the group including feature space attributes, semantic attributes, media attributes and meta attributes.
27. The archive description file for describing the content of a multimedia archive of claim 26, wherein the cluster attributes are indexed in accordance with an information-based hierarchy.
28. The archive description file for describing the content of a multimedia archive of claim 27, wherein feature space attributes are indexed above semantic attributes in the information-based hierarchy.
29. The archive description file for describing the content of a multimedia archive of claim 27, wherein the feature space attributes are selected from the group including type/technique attributes, global distribution attributes, local structure attributes and global composition attributes.
30. The archive description file for describing the content of a multimedia archive of claim 27, wherein the semantic attributes are selected from the group consisting of generic object attributes, generic scene attributes, specific object attributes, specific scene attributes, abstract object attributes and abstract scene attributes.
31. The archive description file for describing the content of a multimedia archive of claim 27, wherein the information-based hierarchy is a ten level index structure having a plurality of feature space levels and a plurality of semantic attribute levels, wherein the feature space attribute levels include a type/technique attribute level, a global distribution attribute level, a local structure attribute level and a global composition attribute level and wherein the semantic attribute levels include generic object attribute level, a generic scene attribute level, a specific object attribute level, a specific scene attribute level, an abstract object attribute level and an abstract scene attribute level.
32. The archive description file for describing the content of a multimedia archive as defined by claim 25, wherein the clusters further include at least one cluster relationship.
33. The archive description file for describing the content of a multimedia archive of claim 32, wherein the cluster relationships are selected from the group including feature space relationships, semantic relationships, media relationships and meta relationships.
34. The archive description file for describing the content of a multimedia archive of claim 33, wherein the feature space relationships are selected from the group including spatial relationships, temporal relationships, and visual relationships.
35. The archive description file for describing the content of a multimedia archive of claim 33, wherein the semantic relationships are selected from the group including lexical relationships and predictive relationships.
36. A media archive system comprising: a computer readable storage system for storing a plurality of media records; at least one media description engine, said media description engine accessing a media record stored in the computer readable storage system and generating a media description record corresponding thereto; a cluster processor operatively coupled to the computer readable storage system, the cluster processor accessing the media description records and generating an archive description record, the archive description record including at least one cluster which relates at least two records in the storage system; and a query processor operatively coupled to the cluster processor, the query processor receiving archive search parameters from a user and providing a search query to the cluster processor.
37. The media archive system of claim 36, wherein the clusters of the archive description record relate records in accordance with attributes and relationships generated from the records.
38. The media archive system of claim 37, wherein the cluster attributes are selected from the group consisting of feature space attributes, semantic attributes, media attributes and meta attributes.
39. The media archive system of claim 38, wherein the cluster attributes are indexed in accordance with an information-based hierarchy.
40. The media archive system of claim 39, wherein feature space attributes are indexed above semantic attributes in the information-based hierarchy.
41. The media archive system of claim 40, wherein the feature space attributes are selected from the group consisting of type/technique attributes, global distribution attributes, local structure attributes and global composition attributes.
42. The media archive system of claim 40, wherein the semantic attributes are selected from the group consisting of generic object attributes, generic scene attributes, specific object attributes, specific scene attributes, abstract object attributes and abstract scene attributes.
43. The media archive system of claim 40, wherein the information-based hierarchy is a ten level index structure having a plurality of feature space levels and a plurality of semantic attribute levels, wherein the feature space attribute levels include a type/technique attribute level, a global distribution attribute level, a local structure attribute level and a global composition attribute level and wherein the semantic attribute levels include generic object attribute level, a generic scene attribute level, a specific object attribute level, a specific scene attribute level, an abstract object attribute level and an abstract scene attribute level.
44. The media archive system of claim 37, wherein the cluster relationships are selected from the group consisting of feature space relationships, semantic relationships, media relationships and meta relationships.
45. The media archive system of claim 44, wherein the feature space relationships are selected from the group including spatial relationships, temporal relationships, and visual relationships.
46. The media archive system of claim 44, wherein the semantic relationships are selected from the group consisting of lexical relationships and predictive relationships.
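
For illustration only, the ten level index structure recited in claims 8, 20, 31 and 43 can be modeled as an ordered enumeration. The Python sketch below is not part of the claims. The numeric values 1 to 10, the identifier names, and the helper functions are assumptions introduced here, and the ordering within each group simply follows the order in which the claims enumerate the levels. The claims themselves state only that feature space attributes are indexed above semantic attributes.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical numbering of the ten levels; lower values are indexed first."""
    # Feature space attribute levels
    TYPE_TECHNIQUE = 1
    GLOBAL_DISTRIBUTION = 2
    LOCAL_STRUCTURE = 3
    GLOBAL_COMPOSITION = 4
    # Semantic attribute levels
    GENERIC_OBJECT = 5
    GENERIC_SCENE = 6
    SPECIFIC_OBJECT = 7
    SPECIFIC_SCENE = 8
    ABSTRACT_OBJECT = 9
    ABSTRACT_SCENE = 10


def is_feature_space(level):
    """True for the four feature space levels, False for the six semantic levels."""
    return level <= Level.GLOBAL_COMPOSITION


def index_attributes(attributes):
    """Order (level, value) pairs so feature space levels precede semantic levels."""
    return sorted(attributes, key=lambda pair: pair[0])


# Hypothetical cluster attributes, given in arbitrary order.
attrs = [
    (Level.GENERIC_SCENE, "outdoor"),
    (Level.TYPE_TECHNIQUE, "color photograph"),
    (Level.ABSTRACT_OBJECT, "freedom"),
    (Level.GLOBAL_DISTRIBUTION, "mostly blue"),
]
for level, value in index_attributes(attrs):
    kind = "feature space" if is_feature_space(level) else "semantic"
    print(f"{int(level):2d} {level.name:<20} {kind:<14} {value}")
```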
PCT/US2000/002488 1999-02-01 2000-02-01 Multimedia archive description scheme WO2000045307A1 (en)

Priority Applications (6)

Application Number Priority Date Filing Date Title
EP00915716A EP1151398A4 (en) 1999-02-01 2000-02-01 Multimedia archive description scheme
MXPA01007725A MXPA01007725A (en) 1999-02-01 2000-02-01 Multimedia archive description scheme.
JP2000596495A JP2002537591A (en) 1999-02-01 2000-02-01 Description scheme for multimedia archives
US09/889,859 US6941325B1 (en) 1999-02-01 2000-02-01 Multimedia archive description scheme
AU36943/00A AU3694300A (en) 1999-02-01 2000-02-01 Multimedia archive description scheme
HK03100981.8A HK1048866A1 (en) 1999-02-01 2003-02-11 Multimedia archive description scheme

Applications Claiming Priority (8)

Application Number Priority Date Filing Date Title
US11802799P 1999-02-01 1999-02-01
US11802099P 1999-02-01 1999-02-01
US11802699P 1999-02-01 1999-02-01
US60/118,026 1999-02-01
US60/118,027 1999-02-01
US60/118,020 1999-02-01
US14232799P 1999-07-03 1999-07-03
US60/142,327 1999-07-03

Publications (2)

Publication Number Publication Date
WO2000045307A1 (en) 2000-08-03
WO2000045307A9 (en) 2001-12-27

Family

ID=27494169

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2000/002488 WO2000045307A1 (en) 1999-02-01 2000-02-01 Multimedia archive description scheme

Country Status (8)

Country Link
EP (1) EP1151398A4 (en)
JP (1) JP2002537591A (en)
KR (1) KR100706820B1 (en)
CN (1) CN1241140C (en)
AU (1) AU3694300A (en)
HK (1) HK1048866A1 (en)
MX (1) MXPA01007725A (en)
WO (1) WO2000045307A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004017228A2 (en) * 2002-08-09 2004-02-26 Agency Multimedia Software-type platform dedicated to internet site referencing
US7120626B2 (en) 2002-11-15 2006-10-10 Koninklijke Philips Electronics N.V. Content retrieval based on semantic association
EP2159720A1 (en) * 2008-08-28 2010-03-03 Bach Technology AS Apparatus and method for generating a collection profile and for communicating based on the collection profile
EP2849096A1 (en) * 2013-09-13 2015-03-18 Kabushiki Kaisha Toshiba Electronic apparatus, program recommendation system, program recommendation method, and program recommendation program
US10311867B2 (en) 2015-03-20 2019-06-04 Kabushiki Kaisha Toshiba Tagging support apparatus and method

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100336061C (en) * 2003-08-08 2007-09-05 富士通株式会社 Multimedia object searching device and method
CN1301479C (en) * 2004-05-12 2007-02-21 威盛电子股份有限公司 Organizational architecture establishing method and authority control and management method thereof
US8612643B2 (en) * 2007-06-30 2013-12-17 Microsoft Corporation Interfaces for digital media processing
CN111159434A (en) * 2019-12-29 2020-05-15 赵娜 Method and system for storing multimedia file in Internet storage cluster
CN113239202B (en) * 2021-05-25 2024-03-05 北京达佳互联信息技术有限公司 Data processing method, device, server and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5546571A (en) * 1988-12-19 1996-08-13 Hewlett-Packard Company Method of recursively deriving and storing data in, and retrieving recursively-derived data from, a computer database system
US5664177A (en) * 1988-04-13 1997-09-02 Digital Equipment Corporation Data processing system having a data structure with a single, simple primitive
US5794242A (en) * 1995-02-07 1998-08-11 Digital Equipment Corporation Temporally and spatially organized database
US5852435A (en) * 1996-04-12 1998-12-22 Avid Technology, Inc. Digital multimedia editing and data management system
US5884298A (en) * 1996-03-29 1999-03-16 Cygnet Storage Solutions, Inc. Method for accessing and updating a library of optical discs
US5983218A (en) * 1997-06-30 1999-11-09 Xerox Corporation Multimedia database for use over networks

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE19535537A1 (en) * 1995-09-25 1997-03-27 Profil Verbindungstechnik Gmbh Bolt element, method for inserting the same, assembly part and rivet die

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5664177A (en) * 1988-04-13 1997-09-02 Digital Equipment Corporation Data processing system having a data structure with a single, simple primitive
US5546571A (en) * 1988-12-19 1996-08-13 Hewlett-Packard Company Method of recursively deriving and storing data in, and retrieving recursively-derived data from, a computer database system
US5794242A (en) * 1995-02-07 1998-08-11 Digital Equipment Corporation Temporally and spatially organized database
US5884298A (en) * 1996-03-29 1999-03-16 Cygnet Storage Solutions, Inc. Method for accessing and updating a library of optical discs
US5852435A (en) * 1996-04-12 1998-12-22 Avid Technology, Inc. Digital multimedia editing and data management system
US5983218A (en) * 1997-06-30 1999-11-09 Xerox Corporation Multimedia database for use over networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP1151398A4 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004017228A2 (en) * 2002-08-09 2004-02-26 Agency Multimedia Software-type platform dedicated to internet site referencing
WO2004017228A3 (en) * 2002-08-09 2004-05-13 Agency Multimedia Software-type platform dedicated to internet site referencing
US7707196B2 (en) 2002-08-09 2010-04-27 Agency Multimedia Software-type platform dedicated to internet site referencing
US7120626B2 (en) 2002-11-15 2006-10-10 Koninklijke Philips Electronics N.V. Content retrieval based on semantic association
EP2159720A1 (en) * 2008-08-28 2010-03-03 Bach Technology AS Apparatus and method for generating a collection profile and for communicating based on the collection profile
WO2010022890A1 (en) * 2008-08-28 2010-03-04 Bach Technology As Apparatus and method for generating a collection profile and for communicating based on the collection profile
US8407224B2 (en) 2008-08-28 2013-03-26 Bach Technology As Apparatus and method for generating a collection profile and for communicating based on the collection profile
EP2849096A1 (en) * 2013-09-13 2015-03-18 Kabushiki Kaisha Toshiba Electronic apparatus, program recommendation system, program recommendation method, and program recommendation program
US10311867B2 (en) 2015-03-20 2019-06-04 Kabushiki Kaisha Toshiba Tagging support apparatus and method

Also Published As

Publication number Publication date
AU3694300A (en) 2000-08-18
MXPA01007725A (en) 2003-06-24
EP1151398A1 (en) 2001-11-07
KR20020006663A (en) 2002-01-24
WO2000045307A9 (en) 2001-12-27
CN1241140C (en) 2006-02-08
HK1048866A1 (en) 2003-04-17
CN1364267A (en) 2002-08-14
JP2002537591A (en) 2002-11-05
EP1151398A4 (en) 2004-04-14
KR100706820B1 (en) 2007-04-11

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 00806016.9

Country of ref document: CN

AK Designated states

Kind code of ref document: A1

Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
WWE Wipo information: entry into national phase

Ref document number: 2000915716

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: PA/a/2001/007725

Country of ref document: MX

Ref document number: 1020017009668

Country of ref document: KR

ENP Entry into the national phase

Ref document number: 2000 596495

Country of ref document: JP

Kind code of ref document: A

WWP Wipo information: published in national office

Ref document number: 2000915716

Country of ref document: EP

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

AK Designated states

Kind code of ref document: C2

Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: C2

Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

COP Corrected version of pamphlet

Free format text: PAGES 1/10-10/10, DRAWINGS, REPLACED BY NEW PAGES 1/12-12/12; DUE TO LATE TRANSMITTAL BY THE RECEIVING OFFICE

WWE Wipo information: entry into national phase

Ref document number: 09889859

Country of ref document: US

WWP Wipo information: published in national office

Ref document number: 1020017009668

Country of ref document: KR

WWG Wipo information: grant in national office

Ref document number: 1020017009668

Country of ref document: KR