WO2008092018A3 - Cross-lingual information retrieval - Google Patents

Cross-lingual information retrieval

Info

Publication number
WO2008092018A3
WO2008092018A3 PCT/US2008/051934 US2008051934W WO2008092018A3 WO 2008092018 A3 WO2008092018 A3 WO 2008092018A3 US 2008051934 W US2008051934 W US 2008051934W WO 2008092018 A3 WO2008092018 A3 WO 2008092018A3
Authority
WO
WIPO (PCT)
Prior art keywords
english
terms
search
multiple categories
language
Prior art date
Application number
PCT/US2008/051934
Other languages
French (fr)
Other versions
WO2008092018A2 (en)
Inventor
Joel Summerlin
Jarett Funnell
Heike Uhlig
Wayne Yerigan
Original Assignee
Corbis Corp
Joel Summerlin
Jarett Funnell
Heike Uhlig
Wayne Yerigan
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2007-01-25
Filing date
2008-01-24
Publication date
2008-10-02
Application filed by Corbis Corp, Joel Summerlin, Jarett Funnell, Heike Uhlig, Wayne Yerigan
Publication of WO2008092018A2
Publication of WO2008092018A3

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/237: Lexical tools
    • G06F40/242: Dictionaries
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/3331: Query processing
    • G06F16/3332: Query translation
    • G06F16/3337: Translation of the query language, e.g. Chinese to English
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/40: Processing or translation of natural language

Abstract

Multi-lingual search and retrieval of digital content. Embodiments are generally directed to methods and systems for creating an English language database that associates non-English terms with English terms in multiple categories of metadata. Language experts use an interface to create equivalencies between non-English terms and English terms, Boolean expressions, synonyms, and other forms of search terms. Language dictionaries and other sources also create equivalencies. The database is used to evaluate non-English search terms submitted by a user, and to determine English search terms that can be used to perform a search for content. The multiple categories of metadata may comprise structured data, such as keywords of a structured vocabulary, and/or unstructured data, such as captions, titles, descriptions, etc. Weighting and/or prioritization can be applied to the search terms, to the process of searching the multiple categories, and/or to the search results, to rank the search results.
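The flow described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the equivalence table `EQUIVALENTS`, the per-category weights in `CATEGORY_WEIGHTS`, the toy records in `CONTENT`, and the scoring rule. Each non-English term is mapped to a group of English equivalents, a group is treated as a simple OR over its synonyms, and a record's score is the sum of the weights of the metadata categories in which a group matches.

```python
from collections import defaultdict

# Hypothetical equivalence table: non-English term -> English search terms.
# In the abstract, such entries come from language experts and dictionaries.
EQUIVALENTS = {
    "hund": ["dog", "canine"],
    "strand": ["beach", "seashore"],
}

# Hypothetical weights: structured keywords count more than free-text fields.
CATEGORY_WEIGHTS = {"keywords": 3.0, "title": 2.0, "caption": 1.0}

# Toy content records with structured (keywords) and unstructured metadata.
CONTENT = [
    {"id": "img1", "keywords": ["dog", "beach"],
     "title": "Dog on the beach", "caption": "A dog runs along the seashore"},
    {"id": "img2", "keywords": ["cat"],
     "title": "Sleeping cat", "caption": "A cat ignores a dog"},
    {"id": "img3", "keywords": ["beach"],
     "title": "Empty beach", "caption": "Waves at dawn"},
]


def translate_query(query):
    """Map each query term to a group of English equivalents."""
    groups = []
    for term in query.lower().split():
        # Unknown terms pass through unchanged so they can still match.
        groups.append(EQUIVALENTS.get(term, [term]))
    return groups


def tokens(value):
    """Return lowercase tokens for a keyword list or a free-text string."""
    if isinstance(value, list):
        return {v.lower() for v in value}
    return set(value.lower().split())


def search(query):
    """Rank records by the summed weights of categories matching each group."""
    groups = translate_query(query)
    scores = defaultdict(float)
    for record in CONTENT:
        for category, weight in CATEGORY_WEIGHTS.items():
            found = tokens(record[category])
            for group in groups:
                # A group matches when any of its English synonyms is present.
                if any(term in found for term in group):
                    scores[record["id"]] += weight
    # Highest score first; ties broken by record id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Calling `search("Hund Strand")` on this toy data ranks `img1` first, because both term groups match in all three categories, followed by `img3` and then `img2`. A real system would replace the whitespace tokenizer and the additive score with whatever matching and weighting scheme the embodiment actually uses.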
PCT/US2008/051934 2007-01-25 2008-01-24 Cross-lingual information retrieval WO2008092018A2 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US88664907P 2007-01-25 2007-01-25
US60/886,649 2007-01-25
US11/692,777 2007-03-28
US11/692,777 US7933765B2 (en) 2007-01-25 2007-03-28 Cross-lingual information retrieval

Publications (2)

Publication Number Publication Date
WO2008092018A2 WO2008092018A2 (en) 2008-07-31
WO2008092018A3 WO2008092018A3 (en) 2008-10-02

Family

ID=39645169

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2008/051934 WO2008092018A2 (en) 2007-01-25 2008-01-24 Cross-lingual information retrieval

Country Status (2)

Country Link
US (1) US7933765B2 (en)
WO (1) WO2008092018A2 (en)

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7146358B1 (en) 2001-08-28 2006-12-05 Google Inc. Systems and methods for using anchor text as parallel corpora for cross-language information retrieval
US7720856B2 (en) * 2007-04-09 2010-05-18 Sap Ag Cross-language searching
US8135580B1 (en) 2008-08-20 2012-03-13 Amazon Technologies, Inc. Multi-language relevance-based indexing and search
CA2639438A1 (en) * 2008-09-08 2010-03-08 Semanti Inc. Semantically associated computer search index, and uses therefore
US8533051B2 (en) * 2010-10-27 2013-09-10 Nir Platek Multi-language multi-platform E-commerce management system
US8442982B2 (en) * 2010-11-05 2013-05-14 Apple Inc. Extended database search
US8498972B2 (en) * 2010-12-16 2013-07-30 Sap Ag String and sub-string searching using inverted indexes
US9684653B1 (en) 2012-03-06 2017-06-20 Amazon Technologies, Inc. Foreign language translation using product information
US8751486B1 (en) 2013-07-31 2014-06-10 Splunk Inc. Executing structured queries on unstructured data
US9881006B2 (en) * 2014-02-28 2018-01-30 Paypal, Inc. Methods for automatic generation of parallel corpora
US10769184B2 (en) 2015-06-05 2020-09-08 Apple Inc. Systems and methods for providing improved search functionality on a client device
US11423023B2 (en) 2015-06-05 2022-08-23 Apple Inc. Systems and methods for providing improved search functionality on a client device
US11048779B2 (en) 2015-08-17 2021-06-29 Adobe Inc. Content creation, fingerprints, and watermarks
US10592548B2 (en) 2015-08-17 2020-03-17 Adobe Inc. Image search persona techniques and systems
US10366433B2 (en) 2015-08-17 2019-07-30 Adobe Inc. Methods and systems for usage based content search results
US10878021B2 (en) 2015-08-17 2020-12-29 Adobe Inc. Content search and geographical considerations
US10475098B2 (en) 2015-08-17 2019-11-12 Adobe Inc. Content creation suggestions using keywords, similarity, and social networks
US9715714B2 (en) 2015-08-17 2017-07-25 Adobe Systems Incorporated Content creation and licensing control
CN111368565B (en) * 2018-09-05 2022-03-18 腾讯科技(深圳)有限公司 Text translation method, text translation device, storage medium and computer equipment
US11328007B2 (en) * 2019-02-04 2022-05-10 International Business Machines Corporation Generating a domain-specific phrasal dictionary
US10853983B2 (en) 2019-04-22 2020-12-01 Adobe Inc. Suggestions to enrich digital artwork

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050177358A1 (en) * 2004-02-10 2005-08-11 Edward Melomed Multilingual database interaction system and method
US20050203931A1 (en) * 2004-03-13 2005-09-15 Robert Pingree Metadata management convergence platforms, systems and methods
US20060059192A1 (en) * 2004-09-15 2006-03-16 Samsung Electronics Co., Ltd. Information storage medium for storing metadata supporting multiple languages, and systems and methods of processing metadata

Family Cites Families (74)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2448824A1 (en) * 1979-02-06 1980-09-05 Telediffusion Fse VIDEOTEX SYSTEM PROVIDED WITH INFORMATION ACCESS CONTROL MEANS
US5241671C1 (en) * 1989-10-26 2002-07-02 Encyclopaedia Britannica Educa Multimedia search system using a plurality of entry path means which indicate interrelatedness of information
US5201047A (en) * 1989-12-21 1993-04-06 International Business Machines Corporation Attribute-based classification and retrieval system
US5263158A (en) * 1990-02-15 1993-11-16 International Business Machines Corporation Method and system for variable authority level user access control in a distributed data processing system having multiple resource manager
US5317507A (en) * 1990-11-07 1994-05-31 Gallant Stephen I Method for document retrieval and for word sense disambiguation using neural networks
US5325298A (en) * 1990-11-07 1994-06-28 Hnc, Inc. Methods for generating or revising context vectors for a plurality of word stems
US5438508A (en) * 1991-06-28 1995-08-01 Digital Equipment Corporation License document interchange format for license management system
US5260999A (en) * 1991-06-28 1993-11-09 Digital Equipment Corporation Filters in license management system
US5251316A (en) * 1991-06-28 1993-10-05 Digital Equipment Corporation Method and apparatus for integrating a dynamic lexicon into a full-text information retrieval system
US5442778A (en) * 1991-11-12 1995-08-15 Xerox Corporation Scatter-gather: a cluster-based method and apparatus for browsing large document collections
US6947959B1 (en) * 1992-10-01 2005-09-20 Quark, Inc. Digital media asset management system and process
US5319705A (en) * 1992-10-21 1994-06-07 International Business Machines Corporation Method and system for multimedia access control enablement
EP0622930A3 (en) * 1993-03-19 1996-06-05 At & T Global Inf Solution Application sharing for computer collaboration system.
US5519608A (en) * 1993-06-24 1996-05-21 Xerox Corporation Method for extracting from a text corpus answers to questions stated in natural language by using linguistic analysis and hypothesis generation
US5553143A (en) * 1994-02-04 1996-09-03 Novell, Inc. Method and apparatus for electronic licensing
US5493677A (en) * 1994-06-08 1996-02-20 Systems Research & Applications Corporation Generation, archiving, and retrieval of digital images with evoked suggestion-set captions and natural language interface
US5682487A (en) * 1994-06-10 1997-10-28 Bay Networks, Inc. Method and apparatus providing resizable views
US5675819A (en) * 1994-06-16 1997-10-07 Xerox Corporation Document information retrieval using global word co-occurrence patterns
US5706497A (en) * 1994-08-15 1998-01-06 Nec Research Institute, Inc. Document retrieval using fuzzy-logic inference
US5600775A (en) * 1994-08-26 1997-02-04 Emotion, Inc. Method and apparatus for annotating full motion video and other indexed data structures
US5850561A (en) * 1994-09-23 1998-12-15 Lucent Technologies Inc. Glossary construction tool
US5532839A (en) * 1994-10-07 1996-07-02 Xerox Corporation Simplified document handler job recovery system with reduced memory duplicate scanned image detection
US5629980A (en) * 1994-11-23 1997-05-13 Xerox Corporation System for controlling the distribution and use of digital works
US5758257A (en) * 1994-11-29 1998-05-26 Herz; Frederick System and method for scheduling broadcast of and access to video programs and other data using customer profiles
US5642502A (en) * 1994-12-06 1997-06-24 University Of Central Florida Method and system for searching for relevant documents from a text database collection, using statistical ranking, relevancy feedback and small pieces of text
WO1997008604A2 (en) * 1995-08-16 1997-03-06 Syracuse University Multilingual document retrieval system and method using semantic vector matching
US5721902A (en) * 1995-09-15 1998-02-24 Infonautics Corporation Restricted expansion of query terms using part of speech tagging
US5765152A (en) * 1995-10-13 1998-06-09 Trustees Of Dartmouth College System and method for managing copyrighted electronic media
US6125236A (en) * 1995-12-05 2000-09-26 Intel Corporation Method and apparatus for providing user control of multimedia parameters
US5794249A (en) * 1995-12-21 1998-08-11 Hewlett-Packard Company Audio/video retrieval system that uses keyword indexing of digital recordings to display a list of the recorded text files, keywords and time stamps associated with the system
US5987459A (en) * 1996-03-15 1999-11-16 Regents Of The University Of Minnesota Image and document management system for content-based retrieval
US5991876A (en) * 1996-04-01 1999-11-23 Copyright Clearance Center, Inc. Electronic rights management and authorization system
US5903892A (en) * 1996-05-24 1999-05-11 Magnifi, Inc. Indexing of media content on a network
US5778362A (en) * 1996-06-21 1998-07-07 Kdl Technologies Limted Method and system for revealing information structures in collections of data items
US5864845A (en) * 1996-06-28 1999-01-26 Siemens Corporate Research, Inc. Facilitating world wide web searches utilizing a multiple search engine query clustering fusion strategy
US5832499A (en) * 1996-07-10 1998-11-03 Survivors Of The Shoah Visual History Foundation Digital library system
US5832495A (en) * 1996-07-08 1998-11-03 Survivors Of The Shoah Visual History Foundation Method and apparatus for cataloguing multimedia data
US5813014A (en) * 1996-07-10 1998-09-22 Survivors Of The Shoah Visual History Foundation Method and apparatus for management of multimedia assets
US6545687B2 (en) * 1997-01-09 2003-04-08 Canon Kabushiki Kaisha Thumbnail manipulation using fast and aspect ratio zooming, compressing and scaling
US6006241A (en) * 1997-03-14 1999-12-21 Microsoft Corporation Production of a video stream with synchronized annotations over a computer network
US5875446A (en) * 1997-02-24 1999-02-23 International Business Machines Corporation System and method for hierarchically grouping and ranking a set of objects in a query context based on one or more relationships
US5920861A (en) * 1997-02-25 1999-07-06 Intertrust Technologies Corp. Techniques for defining using and manipulating rights management data structures
US6012068A (en) * 1997-06-30 2000-01-04 International Business Machines Corporation Media manager for access to multiple media types
US6546405B2 (en) * 1997-10-23 2003-04-08 Microsoft Corporation Annotating temporally-dimensioned multimedia content
US6072904A (en) * 1997-12-31 2000-06-06 Philips Electronics North America Corp. Fast image retrieval using multi-scale edge representation of images
JPH11203359A (en) * 1998-01-14 1999-07-30 Fuji Photo Film Co Ltd Network photo service system
US6385596B1 (en) * 1998-02-06 2002-05-07 Liquid Audio, Inc. Secure online music distribution system
US6834130B1 (en) * 1998-02-18 2004-12-21 Minolta Co., Ltd. Image retrieval system for retrieving a plurality of images which are recorded in a recording medium, and a method thereof
US6349373B2 (en) * 1998-02-20 2002-02-19 Eastman Kodak Company Digital image management system having method for managing images according to image groups
US6038333A (en) * 1998-03-16 2000-03-14 Hewlett-Packard Company Person identifier and management system
US6578073B1 (en) * 1998-05-13 2003-06-10 Hewlett-Packard Development Company, L.P. Accelerated content delivery over a network using reduced size objects
US6226618B1 (en) * 1998-08-13 2001-05-01 International Business Machines Corporation Electronic content delivery system
US6243713B1 (en) * 1998-08-24 2001-06-05 Excalibur Technologies Corp. Multimedia document retrieval by application of multimedia queries to a unified index of multimedia data for a plurality of multimedia data types
JP3915267B2 (en) * 1998-09-07 2007-05-16 富士ゼロックス株式会社 Document search apparatus and document search method
US6523028B1 (en) * 1998-12-03 2003-02-18 Lockhead Martin Corporation Method and system for universal querying of distributed databases
US6920610B1 (en) * 1999-03-02 2005-07-19 Microsoft Corporation Method and system for browsing a low-resolution image
JP2000312334A (en) * 1999-04-27 2000-11-07 Canon Inc Image storage device
US6404441B1 (en) * 1999-07-16 2002-06-11 Jet Software, Inc. System for creating media presentations of computer software application programs
JP2001186334A (en) * 1999-12-27 2001-07-06 Canon Inc Device, system and method for picture processing, and storage medium
BR0105580A (en) * 2000-04-10 2002-06-11 Sony Corp Value management system and method for managing an essence, system and production method for creating a project and program from an essence, filing system and method for filing an essence, distribution system and method for allocating an essence, authoring system and method to create a packaging medium from an essence, production system to create an essence and control method of a production system to create an essence
AU7593601A (en) * 2000-07-14 2002-01-30 Atabok Inc Controlling and managing digital assets
US6944340B1 (en) * 2000-08-07 2005-09-13 Canon Kabushiki Kaisha Method and apparatus for efficient determination of recognition parameters
WO2002019147A1 (en) 2000-08-28 2002-03-07 Emotion, Inc. Method and apparatus for digital media management, retrieval, and collaboration
US6581055B1 (en) * 2000-09-11 2003-06-17 Oracle International Corporation Query optimization with switch predicates
US6735583B1 (en) * 2000-11-01 2004-05-11 Getty Images, Inc. Method and system for classifying and locating media content
US6931408B2 (en) * 2001-08-17 2005-08-16 E.C. Outlook, Inc. Method of storing, maintaining and distributing computer intelligible electronic data
US7110937B1 (en) * 2002-06-20 2006-09-19 Siebel Systems, Inc. Translation leveraging
US20040205333A1 (en) * 2003-04-14 2004-10-14 Telefonaktiebolaget Lm Ericsson (Publ) Method and system for digital rights management
US7814127B2 (en) * 2003-11-20 2010-10-12 International Business Machines Corporation Natural language support for database applications
US7277884B2 (en) * 2004-02-17 2007-10-02 Microsoft Corporation Method and system for generating help files based on user queries
US7603353B2 (en) * 2004-10-27 2009-10-13 Harris Corporation Method for re-ranking documents retrieved from a multi-lingual document database
US8732175B2 (en) * 2005-04-21 2014-05-20 Yahoo! Inc. Interestingness ranking of media objects
US20060277189A1 (en) * 2005-06-02 2006-12-07 Microsoft Corporation Translation of search result display elements
US7454413B2 (en) * 2005-08-19 2008-11-18 Microsoft Corporation Query expressions and interactions with metadata

Also Published As

Publication number Publication date
WO2008092018A2 (en) 2008-07-31
US20080275691A1 (en) 2008-11-06
US7933765B2 (en) 2011-04-26

Similar Documents

Publication Publication Date Title
WO2008092018A3 (en) Cross-lingual information retrieval
Carpineto et al. A survey of automatic query expansion in information retrieval
Andrenucci et al. Automated question answering: Review of the main approaches
US8892550B2 (en) Source expansion for information retrieval and information extraction
CN104885081B (en) Search system and corresponding method
KR101732342B1 (en) Trusted query system and method
US9483557B2 (en) Keyword generation for media content
WO2006113597A3 (en) Method for information retrieval
WO2006074324A8 (en) Systems, methods, software, and interfaces for multilingual information retrieval
CN101763402A (en) Integrated retrieval method for multi-language information retrieval
Khan et al. Development of Arabic evaluations in information retrieval
Blake et al. UNC-CH at DUC 2007: Query expansion, lexical simplification and sentence selection strategies for Multi-Document summarization
Balasubramanian et al. Topic pages: An alternative to the ten blue links
Lu et al. Web-based query translation for English-Chinese CLIR
Klang et al. Linking, searching, and visualizing entities in wikipedia
Ma et al. Combining n-gram and dependency word pair for multi-document summarization
Souza et al. Extraction of keywords from texts: an exploratory study using Noun Phrases
Haggag et al. Keyword Extraction using Clustering and Semantic Analysis
Klyuev et al. A query expansion technique using the ewc semantic relatedness measure
Duc et al. Cross-language latent relational search between japanese and english languages using a web corpus
D'hondt Lexical issues of a syntactic approach to interactive patent retrieval
Pasca The role of queries in ranking labeled instances extracted from text
Rashidi et al. Comment-enriched index terms improve the relevance and novelty of the ranking of the commented medical articles retrieved by an NLP system
Wang Web-based verification on the representativeness of terms extracted from single short documents
Agirre et al. The Sheffield and Basque Country Universities Entry to CHiC: using random walks and similarity to access cultural heritage

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 08713966

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 08713966

Country of ref document: EP

Kind code of ref document: A2