CN101305370B - Information classification paradigm - Google Patents
- Publication number
- CN101305370B
- Authority
- CN
- China
- Prior art keywords
- document
- source document
- group
- segment
- categorized
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99933—Query processing, i.e. searching
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99937—Sorting
Abstract
Description
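Feature No. | Description |
---|---|
1 | Segment contains an image |
2 | Segment contains a link |
3 | Segment contains an image that has a link |
4 | Segment contains an image whose link is the same as another link |
5 | Segment contains a jpg image |
6 | Segment contains an "input" or "submit" tag |
7 | Segment contains a price tag whose text has enough characters to indicate an exact price (e.g., less than 10) |
8 | Segment has a currency symbol in other free text, not counting tags that contain a price identifier |
9 | Segment contains a price tag that has a link attribute |
10 | Segment contains two tags with the same link |
11 | Segment contains a hidden input tag |
12 | Segment contains an image tag whose alt text matches all of the free text in any other tag |
13 | Segment contains a jpg image with alt text |
14 | Segment contains a tag that has an image, a link, and text |
15 | Ratio of image tags to total number of tags |
16 | Ratio of tags with free text to total number of tags |
17 | Ratio of tags with links to total number of tags |
18 | Ratio of tags with an image and a link to total number of tags |
19 | Ratio of tags with text and a link to total number of tags |
20 | Ratio of tags with an image, text, and a link to total number of tags |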
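A few of the segment features in the table above can be sketched in Python. This is only an illustration under stated assumptions, not the patent's implementation: the class `SegmentFeatureParser`, the function `segment_features`, and the choice to treat every start tag as one "tag" are all hypothetical, and only features 1, 2, 5, 11, 15, and 17 are covered.

```python
from html.parser import HTMLParser


class SegmentFeatureParser(HTMLParser):
    """Counts tags in one HTML segment (hypothetical helper, not from the patent)."""

    def __init__(self):
        super().__init__()
        self.total_tags = 0
        self.image_tags = 0
        self.link_tags = 0
        self.jpg_images = 0
        self.hidden_inputs = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        attrs = dict(attrs)
        self.total_tags += 1
        if tag == "img":
            self.image_tags += 1
            src = (attrs.get("src") or "").lower()
            if src.endswith((".jpg", ".jpeg")):
                self.jpg_images += 1
        if tag == "a" and attrs.get("href"):
            self.link_tags += 1
        if tag == "input" and (attrs.get("type") or "").lower() == "hidden":
            self.hidden_inputs += 1


def segment_features(html):
    """Return a handful of the tabulated features for one segment."""
    parser = SegmentFeatureParser()
    parser.feed(html)
    parser.close()
    total = parser.total_tags or 1  # avoid division by zero on empty segments
    return {
        "has_image": parser.image_tags > 0,            # feature 1
        "has_link": parser.link_tags > 0,              # feature 2
        "has_jpg": parser.jpg_images > 0,              # feature 5
        "has_hidden_input": parser.hidden_inputs > 0,  # feature 11
        "image_tag_ratio": parser.image_tags / total,  # feature 15
        "link_tag_ratio": parser.link_tags / total,    # feature 17
    }


if __name__ == "__main__":
    segment = (
        '<div><a href="/p/1"><img src="shoe.jpg" alt="Shoe"></a>'
        '<input type="hidden" name="id" value="1"></div>'
    )
    print(segment_features(segment))
```

The boolean features map directly onto "segment contains ..." rows, while the ratio features divide a tag count by the total number of start tags seen in the segment.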
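Feature No. | Feature description |
---|---|
1 | Average word length, in characters |
2 | Computed average word length, in characters |
3 | Average paragraph length, in characters |
4 | Average paragraph length, in words |
5 | Average paragraph length, in sentences |
6 | Average section length, in characters |
7 | Average section length, in paragraphs |
8 | Average section length, in sentences |
9 | Average section length, in words |
10 | Average sentence length, in characters |
11 | Average sentence length, in words |
12 | Computed document length, in characters |
13 | Computed document length, in words |
14 | Number of words of length N |
15 | Standard deviation of word lengths in the document |
16 | Variance of word lengths in the document |
17 | Number of non-space characters |
18 | Number of total characters |
19 | Number of words |
20 | Square root of the word count |
21 | Fourth root of the word count |
22 | Number of spelling errors (total) |
23 | Number of likely typos (see autocorrect) |
24 | Number of likely non-typo spelling errors |
25 | Number of sentences (delimited by punctuation) |
26 | Number of passive sentences |
27 | Number of active sentences |
28 | Number of grammar errors |
29 | Number of paragraphs |
30 | Number of sections |
31 | Number of pages |
32 | Ratio of spelling error count to character count |
33 | Ratio of spelling error count to total character count |
34 | Ratio of spelling error count to word count |
35 | Ratio of spelling error count to sentence count |
36 | Ratio of spelling error count to paragraph count |
37 | Ratio of likely typos to character count |
38 | Ratio of likely typos to total character count |
39 | Ratio of likely typos to word count |
40 | Ratio of likely typos to spelling error count |
41 | Ratio of likely typos to sentence count |
42 | Ratio of likely typos to grammar error count |
43 | Ratio of likely typos to paragraph count |
44 | Ratio of non-space character count to total character count |
45 | Ratio of grammar error count to non-space (ns) character count |
46 | Ratio of grammar error count to total character count |
47 | Ratio of grammar error count to word count |
48 | Ratio of grammar error count to sentence count |
49 | Ratio of grammar error count to paragraph count |
50 | Ratio of passive sentences to active sentences |
51 | Ratio of passive sentences to all sentences |
52 | Flesch-Kincaid reading ease statistic |
53 | Flesch-Kincaid grade level |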
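Several of the document statistics in the table above can be computed with a short Python sketch. The tokenization rules here are assumptions made for illustration (words are runs of letters, digits, and apostrophes; sentences end at `.`, `!`, or `?`), and the function names are hypothetical. The two readability functions use the standard published Flesch Reading Ease and Flesch-Kincaid Grade Level formulas and take precomputed counts, since syllable counting needs a separate heuristic or dictionary.

```python
import math
import re
import statistics


def document_features(text):
    """Compute a subset of the tabulated statistics (assumed tokenization)."""
    words = re.findall(r"[A-Za-z0-9']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(w) for w in words]
    n_words = len(words)
    non_space = sum(1 for c in text if not c.isspace())
    return {
        "avg_word_length": sum(lengths) / n_words if n_words else 0.0,  # feature 1
        "word_length_stdev": statistics.pstdev(lengths) if lengths else 0.0,  # 15
        "word_length_variance": statistics.pvariance(lengths) if lengths else 0.0,  # 16
        "non_space_chars": non_space,                 # feature 17
        "total_chars": len(text),                     # feature 18
        "word_count": n_words,                        # feature 19
        "sqrt_word_count": math.sqrt(n_words),        # feature 20
        "fourth_root_word_count": n_words ** 0.25,    # feature 21
        "sentence_count": len(sentences),             # feature 25
        "non_space_ratio": non_space / len(text) if text else 0.0,  # feature 44
    }


def flesch_reading_ease(words, sentences, syllables):
    """Standard Flesch Reading Ease score from raw counts (feature 52)."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Standard Flesch-Kincaid Grade Level from raw counts (feature 53)."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


if __name__ == "__main__":
    print(document_features("The cat sat. The dog ran away!"))
    print(flesch_reading_ease(100, 5, 150), flesch_kincaid_grade(100, 5, 150))
```

Population variance and standard deviation are used here because the statistics describe the whole document rather than a sample; that choice is also an assumption.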
Claims (14)
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US73667605P | 2005-11-15 | 2005-11-15 | |
US60/736,676 | 2005-11-15 | ||
US11/276,818 | 2006-03-15 | ||
US11/276,818 US7529748B2 (en) | 2005-11-15 | 2006-03-15 | Information classification paradigm |
PCT/US2006/044476 WO2007059272A1 (en) | 2005-11-15 | 2006-11-15 | Information classification paradigm |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101305370A (zh) | 2008-11-12 |
CN101305370B (zh) | 2013-03-06 |
Family
ID=38042114
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN200680042170XA (CN101305370B, Expired - Fee Related) | Information classification paradigm | 2005-11-15 | 2006-11-15 |
Country Status (5)
Country | Link |
---|---|
US (1) | US7529748B2 (zh) |
EP (1) | EP1955220A4 (zh) |
KR (1) | KR101312770B1 (zh) |
CN (1) | CN101305370B (zh) |
WO (1) | WO2007059272A1 (zh) |
Families Citing this family (101)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7581077B2 (en) | 1997-10-30 | 2009-08-25 | Commvault Systems, Inc. | Method and system for transferring data in a storage operation |
US6418478B1 (en) * | 1997-10-30 | 2002-07-09 | Commvault Systems, Inc. | Pipelined high speed data transfer mechanism |
US7035880B1 (en) | 1999-07-14 | 2006-04-25 | Commvault Systems, Inc. | Modular backup and retrieval system used in conjunction with a storage area network |
US7389311B1 (en) | 1999-07-15 | 2008-06-17 | Commvault Systems, Inc. | Modular backup and retrieval system |
US7395282B1 (en) | 1999-07-15 | 2008-07-01 | Commvault Systems, Inc. | Hierarchical backup and retrieval system |
US6658436B2 (en) | 2000-01-31 | 2003-12-02 | Commvault Systems, Inc. | Logical view and access to data managed by a modular data and storage management system |
US7155481B2 (en) | 2000-01-31 | 2006-12-26 | Commvault Systems, Inc. | Email attachment management in a computer system |
US7003641B2 (en) | 2000-01-31 | 2006-02-21 | Commvault Systems, Inc. | Logical view with granular access to exchange data managed by a modular data and storage management system |
AU2003270482A1 (en) | 2002-09-09 | 2004-03-29 | Commvault Systems, Inc. | Dynamic storage device pooling in a computer system |
US8370542B2 (en) | 2002-09-16 | 2013-02-05 | Commvault Systems, Inc. | Combined stream auxiliary copy system and method |
US7246207B2 (en) | 2003-04-03 | 2007-07-17 | Commvault Systems, Inc. | System and method for dynamically performing storage operations in a computer network |
US7454569B2 (en) | 2003-06-25 | 2008-11-18 | Commvault Systems, Inc. | Hierarchical system and method for performing storage operations in a computer network |
WO2005065084A2 (en) | 2003-11-13 | 2005-07-21 | Commvault Systems, Inc. | System and method for providing encryption in pipelined storage operations in a storage network |
WO2005050381A2 (en) | 2003-11-13 | 2005-06-02 | Commvault Systems, Inc. | Systems and methods for performing storage operations using network attached storage |
WO2005048085A2 (en) | 2003-11-13 | 2005-05-26 | Commvault Systems, Inc. | System and method for performing an image level snapshot and for restoring partial volume data |
WO2006052872A2 (en) | 2004-11-05 | 2006-05-18 | Commvault Systems, Inc. | System and method to support single instance storage operations |
US7490207B2 (en) * | 2004-11-08 | 2009-02-10 | Commvault Systems, Inc. | System and method for performing auxillary storage operations |
US8271548B2 (en) * | 2005-11-28 | 2012-09-18 | Commvault Systems, Inc. | Systems and methods for using metadata to enhance storage operations |
US20070185926A1 (en) * | 2005-11-28 | 2007-08-09 | Anand Prahlad | Systems and methods for classifying and transferring information in a storage network |
US8930496B2 (en) | 2005-12-19 | 2015-01-06 | Commvault Systems, Inc. | Systems and methods of unified reconstruction in storage systems |
EP1974296B8 (en) | 2005-12-19 | 2016-09-21 | Commvault Systems, Inc. | Systems and methods for performing data replication |
US7962709B2 (en) | 2005-12-19 | 2011-06-14 | Commvault Systems, Inc. | Network redirector systems and methods for performing data replication |
US7617262B2 (en) | 2005-12-19 | 2009-11-10 | Commvault Systems, Inc. | Systems and methods for monitoring application data in a data replication system |
US20200257596A1 (en) | 2005-12-19 | 2020-08-13 | Commvault Systems, Inc. | Systems and methods of unified reconstruction in storage systems |
US7636743B2 (en) | 2005-12-19 | 2009-12-22 | Commvault Systems, Inc. | Pathname translation in a data replication system |
US8655850B2 (en) | 2005-12-19 | 2014-02-18 | Commvault Systems, Inc. | Systems and methods for resynchronizing information |
US7651593B2 (en) | 2005-12-19 | 2010-01-26 | Commvault Systems, Inc. | Systems and methods for performing data replication |
US7606844B2 (en) | 2005-12-19 | 2009-10-20 | Commvault Systems, Inc. | System and method for performing replication copy storage operations |
US8725711B2 (en) * | 2006-06-09 | 2014-05-13 | Advent Software, Inc. | Systems and methods for information categorization |
US8726242B2 (en) | 2006-07-27 | 2014-05-13 | Commvault Systems, Inc. | Systems and methods for continuous data replication |
US7882077B2 (en) | 2006-10-17 | 2011-02-01 | Commvault Systems, Inc. | Method and system for offline indexing of content and classifying stored data |
US8370442B2 (en) | 2008-08-29 | 2013-02-05 | Commvault Systems, Inc. | Method and system for leveraging identified changes to a mail server |
US20080228771A1 (en) | 2006-12-22 | 2008-09-18 | Commvault Systems, Inc. | Method and system for searching stored data |
US8312323B2 (en) | 2006-12-22 | 2012-11-13 | Commvault Systems, Inc. | Systems and methods for remote monitoring in a computer network and reporting a failed migration operation without accessing the data being moved |
US8719809B2 (en) | 2006-12-22 | 2014-05-06 | Commvault Systems, Inc. | Point in time rollback and un-installation of software |
US8290808B2 (en) | 2007-03-09 | 2012-10-16 | Commvault Systems, Inc. | System and method for automating customer-validated statement of work for a data storage environment |
US7836174B2 (en) | 2008-01-30 | 2010-11-16 | Commvault Systems, Inc. | Systems and methods for grid-based data scanning |
US8296301B2 (en) | 2008-01-30 | 2012-10-23 | Commvault Systems, Inc. | Systems and methods for probabilistic data classification |
US8204859B2 (en) | 2008-12-10 | 2012-06-19 | Commvault Systems, Inc. | Systems and methods for managing replicated database data |
US9495382B2 (en) | 2008-12-10 | 2016-11-15 | Commvault Systems, Inc. | Systems and methods for performing discrete data replication |
US8713007B1 (en) | 2009-03-13 | 2014-04-29 | Google Inc. | Classifying documents using multiple classifiers |
CN102656553B (zh) | 2009-09-09 | 2016-02-10 | 瓦欧尼斯系统有限公司 | Enterprise level data management |
US20110061093A1 (en) * | 2009-09-09 | 2011-03-10 | Ohad Korkus | Time dependent access permissions |
US10229191B2 (en) | 2009-09-09 | 2019-03-12 | Varonis Systems Ltd. | Enterprise level data management |
US8442983B2 (en) | 2009-12-31 | 2013-05-14 | Commvault Systems, Inc. | Asynchronous methods of data classification using change journals and other data structures |
US8504517B2 (en) | 2010-03-29 | 2013-08-06 | Commvault Systems, Inc. | Systems and methods for selective data replication |
US8725698B2 (en) | 2010-03-30 | 2014-05-13 | Commvault Systems, Inc. | Stub file prioritization in a data replication system |
US8352422B2 (en) | 2010-03-30 | 2013-01-08 | Commvault Systems, Inc. | Data restore systems and methods in a replication environment |
US8504515B2 (en) | 2010-03-30 | 2013-08-06 | Commvault Systems, Inc. | Stubbing systems and methods in a data replication environment |
US10296596B2 (en) | 2010-05-27 | 2019-05-21 | Varonis Systems, Inc. | Data tagging |
EP2577444A4 (en) | 2010-05-27 | 2014-04-02 | Varonis Systems Inc | DATA CLASSIFICATION |
US8589347B2 (en) | 2010-05-28 | 2013-11-19 | Commvault Systems, Inc. | Systems and methods for performing data replication |
CN102033965A (zh) * | 2011-01-17 | 2011-04-27 | 安徽海汇金融投资集团有限公司 | Data classification method and system based on a classification model |
US9021198B1 (en) | 2011-01-20 | 2015-04-28 | Commvault Systems, Inc. | System and method for sharing SAN storage |
US9680839B2 (en) | 2011-01-27 | 2017-06-13 | Varonis Systems, Inc. | Access permissions management system and method |
EP2668563A4 (en) | 2011-01-27 | 2015-06-10 | Varonis Systems Inc | METHOD AND SYSTEM FOR MANAGING ACCESS AUTHORIZATIONS |
US8719264B2 (en) | 2011-03-31 | 2014-05-06 | Commvault Systems, Inc. | Creating secondary copies of data based on searches for content |
US9298715B2 (en) | 2012-03-07 | 2016-03-29 | Commvault Systems, Inc. | Data storage system utilizing proxy device for storage operations |
US9471578B2 (en) | 2012-03-07 | 2016-10-18 | Commvault Systems, Inc. | Data storage system utilizing proxy device for storage operations |
US9342537B2 (en) | 2012-04-23 | 2016-05-17 | Commvault Systems, Inc. | Integrated snapshot interface for a data storage system |
US9069798B2 (en) * | 2012-05-24 | 2015-06-30 | Mitsubishi Electric Research Laboratories, Inc. | Method of text classification using discriminative topic transformation |
US8892523B2 (en) | 2012-06-08 | 2014-11-18 | Commvault Systems, Inc. | Auto summarization of content |
US20150178563A1 (en) * | 2012-07-23 | 2015-06-25 | Hewlett-Packard Development Company, L.P. | Document classification |
KR101374900B1 (ko) * | 2012-12-13 | 2014-03-13 | 포항공과대학교 산학협력단 | Grammatical error correction system and grammatical error correction method using the same |
US10379988B2 (en) | 2012-12-21 | 2019-08-13 | Commvault Systems, Inc. | Systems and methods for performance monitoring |
US9262435B2 (en) | 2013-01-11 | 2016-02-16 | Commvault Systems, Inc. | Location-based data synchronization management |
US9886346B2 (en) | 2013-01-11 | 2018-02-06 | Commvault Systems, Inc. | Single snapshot for multiple agents |
US10354187B2 (en) * | 2013-01-17 | 2019-07-16 | Hewlett Packard Enterprise Development Lp | Confidentiality of files using file vectorization and machine learning |
US9251363B2 (en) | 2013-02-20 | 2016-02-02 | Varonis Systems, Inc. | Systems and methodologies for controlling access to a file system |
CN104281603B (zh) * | 2013-07-05 | 2018-01-19 | 北大方正集团有限公司 | Method and system for graded character-frequency statistics |
US8886671B1 (en) | 2013-08-14 | 2014-11-11 | Advent Software, Inc. | Multi-tenant in-memory database (MUTED) system and method |
US9753812B2 (en) | 2014-01-24 | 2017-09-05 | Commvault Systems, Inc. | Generating mapping information for single snapshot for multiple applications |
US9639426B2 (en) | 2014-01-24 | 2017-05-02 | Commvault Systems, Inc. | Single snapshot for multiple applications |
US9495251B2 (en) | 2014-01-24 | 2016-11-15 | Commvault Systems, Inc. | Snapshot readiness checking and reporting |
US9632874B2 (en) | 2014-01-24 | 2017-04-25 | Commvault Systems, Inc. | Database application backup in single snapshot for multiple applications |
US10042716B2 (en) | 2014-09-03 | 2018-08-07 | Commvault Systems, Inc. | Consolidated processing of storage-array commands using a forwarder media agent in conjunction with a snapshot-control media agent |
US9774672B2 (en) | 2014-09-03 | 2017-09-26 | Commvault Systems, Inc. | Consolidated processing of storage-array commands by a snapshot-control media agent |
US9448731B2 (en) | 2014-11-14 | 2016-09-20 | Commvault Systems, Inc. | Unified snapshot storage management |
US9648105B2 (en) | 2014-11-14 | 2017-05-09 | Commvault Systems, Inc. | Unified snapshot storage management, using an enhanced storage manager and enhanced media agents |
WO2016103519A1 (ja) * | 2014-12-26 | 2016-06-30 | 株式会社Ubic | Data analysis system, data analysis method, and data analysis program |
US9898213B2 (en) | 2015-01-23 | 2018-02-20 | Commvault Systems, Inc. | Scalable auxiliary copy processing using media agent resources |
US9904481B2 (en) | 2015-01-23 | 2018-02-27 | Commvault Systems, Inc. | Scalable auxiliary copy processing in a storage management system using media agent resources |
US10354188B2 (en) | 2016-08-02 | 2019-07-16 | Microsoft Technology Licensing, Llc | Extracting facts from unstructured information |
US10318564B2 (en) | 2015-09-28 | 2019-06-11 | Microsoft Technology Licensing, Llc | Domain-specific unstructured text retrieval |
CN105706088A (zh) * | 2016-01-31 | 2016-06-22 | 深圳市博信诺达经贸咨询有限公司 | Application method and system for big data |
US20170249594A1 (en) * | 2016-02-26 | 2017-08-31 | LinkedIn Corporation | Job search engine for recent college graduates |
US10503753B2 (en) | 2016-03-10 | 2019-12-10 | Commvault Systems, Inc. | Snapshot replication operations based on incremental block change tracking |
JP6235082B1 (ja) * | 2016-07-13 | 2017-11-22 | ヤフー株式会社 | Data classification device, data classification method, and program |
US10540516B2 (en) | 2016-10-13 | 2020-01-21 | Commvault Systems, Inc. | Data protection within an unsecured storage environment |
US10922189B2 (en) | 2016-11-02 | 2021-02-16 | Commvault Systems, Inc. | Historical network data-based scanning thread generation |
US10389810B2 (en) | 2016-11-02 | 2019-08-20 | Commvault Systems, Inc. | Multi-threaded scanning of distributed file systems |
US11010261B2 (en) | 2017-03-31 | 2021-05-18 | Commvault Systems, Inc. | Dynamically allocating streams during restoration of data |
US10984041B2 (en) | 2017-05-11 | 2021-04-20 | Commvault Systems, Inc. | Natural language processing integrated with database and data storage management |
US10732885B2 (en) | 2018-02-14 | 2020-08-04 | Commvault Systems, Inc. | Block-level live browsing and private writable snapshots using an ISCSI server |
US10642886B2 (en) | 2018-02-14 | 2020-05-05 | Commvault Systems, Inc. | Targeted search of backup data using facial recognition |
US11159469B2 (en) | 2018-09-12 | 2021-10-26 | Commvault Systems, Inc. | Using machine learning to modify presentation of mailbox objects |
US11042318B2 (en) | 2019-07-29 | 2021-06-22 | Commvault Systems, Inc. | Block-level data replication |
US11494417B2 (en) | 2020-08-07 | 2022-11-08 | Commvault Systems, Inc. | Automated email classification in an information management system |
US11348617B1 (en) | 2021-03-08 | 2022-05-31 | Bank Of America Corporation | System for implementing content retrofitting using information vectorization |
US11593223B1 (en) | 2021-09-02 | 2023-02-28 | Commvault Systems, Inc. | Using resource pool administrative entities in a data storage management system to provide shared infrastructure to tenants |
US11809285B2 (en) | 2022-02-09 | 2023-11-07 | Commvault Systems, Inc. | Protecting a management database of a data storage management system to meet a recovery point objective (RPO) |
Family Cites Families (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6092059A (en) * | 1996-12-27 | 2000-07-18 | Cognex Corporation | Automatic classifier for real time inspection and classification |
JPH10228486A (ja) * | 1997-02-14 | 1998-08-25 | Nec Corp | Distributed document classification system and machine-readable recording medium recording a program |
US6266664B1 (en) * | 1997-10-01 | 2001-07-24 | Rulespace, Inc. | Method for scanning, analyzing and rating digital information content |
US6484149B1 (en) * | 1997-10-10 | 2002-11-19 | Microsoft Corporation | Systems and methods for viewing product information, and methods for generating web pages |
JP2000348041A (ja) * | 1999-06-03 | 2000-12-15 | Nec Corp | Document retrieval method and apparatus, and machine-readable recording medium recording a program |
WO2001027712A2 (en) | 1999-10-12 | 2001-04-19 | The Shopper Inc. | A method and system for automatically structuring content from universal marked-up documents |
US6892191B1 (en) * | 2000-02-07 | 2005-05-10 | Koninklijke Philips Electronics N.V. | Multi-feature combination generation and classification effectiveness evaluation using genetic algorithms |
US6920609B1 (en) * | 2000-08-24 | 2005-07-19 | Yahoo! Inc. | Systems and methods for identifying and extracting data from HTML pages |
JP4552296B2 (ja) * | 2000-09-08 | 2010-09-29 | ソニー株式会社 | Information processing apparatus, information processing method, and recording medium |
US6751614B1 (en) * | 2000-11-09 | 2004-06-15 | Satyam Computer Services Limited Of Mayfair Centre | System and method for topic-based document analysis for information filtering |
US20040138946A1 (en) * | 2001-05-04 | 2004-07-15 | Markus Stolze | Web page annotation systems |
US7778872B2 (en) * | 2001-09-06 | 2010-08-17 | Google, Inc. | Methods and apparatus for ordering advertisements based on performance information and price information |
US7062498B2 (en) * | 2001-11-02 | 2006-06-13 | Thomson Legal Regulatory Global Ag | Systems, methods, and software for classifying text from judicial opinions and other documents |
US20030225763A1 (en) * | 2002-04-15 | 2003-12-04 | Microsoft Corporation | Self-improving system and method for classifying pages on the world wide web |
US7165068B2 (en) | 2002-06-12 | 2007-01-16 | Zycus Infotech Pvt Ltd. | System and method for electronic catalog classification using a hybrid of rule based and statistical method |
US7016895B2 (en) * | 2002-07-05 | 2006-03-21 | Word Data Corp. | Text-classification system and method |
US7035841B2 (en) * | 2002-07-18 | 2006-04-25 | Xerox Corporation | Method for automatic wrapper repair |
US7349917B2 (en) * | 2002-10-01 | 2008-03-25 | Hewlett-Packard Development Company, L.P. | Hierarchical categorization method and system with automatic local selection of classifiers |
US7386527B2 (en) * | 2002-12-06 | 2008-06-10 | Kofax, Inc. | Effective multi-class support vector machine classification |
WO2004088479A2 (en) * | 2003-03-26 | 2004-10-14 | Victor Hsieh | Online intelligent multilingual comparison-shop agents for wireless networks |
US20050066269A1 (en) * | 2003-09-18 | 2005-03-24 | Fujitsu Limited | Information block extraction apparatus and method for Web pages |
US7836038B2 (en) * | 2003-12-10 | 2010-11-16 | Google Inc. | Methods and systems for information extraction |
US7519621B2 (en) * | 2004-05-04 | 2009-04-14 | Pagebites, Inc. | Extracting information from Web pages |
US7516397B2 (en) * | 2004-07-28 | 2009-04-07 | International Business Machines Corporation | Methods, apparatus and computer programs for characterizing web resources |
US20060149710A1 (en) * | 2004-12-30 | 2006-07-06 | Ross Koningstein | Associating features with entities, such as categories of web page documents, and/or weighting such features |
-
2006
- 2006-03-15 US US11/276,818 patent/US7529748B2/en not_active Expired - Fee Related
- 2006-11-15 CN CN200680042170XA patent/CN101305370B/zh not_active Expired - Fee Related
- 2006-11-15 EP EP06837761A patent/EP1955220A4/en not_active Ceased
- 2006-11-15 WO PCT/US2006/044476 patent/WO2007059272A1/en active Application Filing
- 2006-11-15 KR KR1020087011666A patent/KR101312770B1/ko active IP Right Grant
Also Published As
Publication number | Publication date |
---|---|
EP1955220A1 (en) | 2008-08-13 |
US20070112756A1 (en) | 2007-05-17 |
WO2007059272A1 (en) | 2007-05-24 |
EP1955220A4 (en) | 2009-08-26 |
US7529748B2 (en) | 2009-05-05 |
CN101305370A (zh) | 2008-11-12 |
KR20080075501A (ko) | 2008-08-18 |
KR101312770B1 (ko) | 2013-09-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN101305370B (zh) | Information classification paradigm | |
Stein et al. | Intrinsic plagiarism analysis | |
CN102402584B (zh) | Language identification in multilingual text | |
CN104881458A (zh) | Method and device for labeling web page topics | |
Faruque et al. | Ascertaining polarity of public opinions on Bangladesh cricket using machine learning techniques | |
Patel et al. | Dynamic lexicon generation for natural scene images | |
CN103605690A (zh) | Device and method for identifying advertising messages in instant messaging | |
Schofield et al. | Identifying hate speech in social media | |
Wenliang et al. | Automatic word clustering for text categorization using global information | |
Hossari et al. | TEST: A terminology extraction system for technology related terms | |
Körner et al. | Evaluating reference string extraction using line-based conditional random fields: A case study with german language publications | |
Sara-Meshkizadeh et al. | Webpage classification based on compound of using HTML features & URL features and features of sibling pages | |
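CN111460808B (zh) | Synonymous text recognition and content recommendation method, device, and electronic equipment | |
CN113642320A (zh) | Method, apparatus, device, and medium for extracting a document's table-of-contents structure | |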
EP2461255A1 (en) | Document data processing device | |
Souza et al. | ARCTIC: metadata extraction from scientific papers in pdf using two-layer CRF | |
Algamdi et al. | Twitter accounts suggestion: Pipeline technique spacy entity recognition | |
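CN114067343A (zh) | Dataset construction method, model training method, and corresponding apparatus | |
CN109344254B (zh) | Address information classification method and device | |
CN112270189A (zh) | Question-style analysis node generation method, system, and storage medium | |
CN101295320B (zh) | Method and system for determining the noise level of anchor text | |
CN102722489B (zh) | System and method for extracting object identifiers from web pages | |
CN111914868A (zh) | Model training method, abnormal data detection method, apparatus, and electronic device | |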
Vitman et al. | Evaluating the Impact of OCR Quality on Short Texts Classification Task | |
Xu et al. | Contextualized latent semantic indexing: A new approach to automated Chinese essay scoring |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
ASS | Succession or assignment of patent right | Owner name: MICROSOFT TECHNOLOGY LICENSING LLC; Free format text: FORMER OWNER: MICROSOFT CORP.; Effective date: 20150505 |
C41 | Transfer of patent application or patent right or utility model | ||
TR01 | Transfer of patent right | Effective date of registration: 20150505; Address after: Washington State; Patentee after: MICROSOFT TECHNOLOGY LICENSING, LLC; Address before: Washington State; Patentee before: Microsoft Corp. |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20130306; Termination date: 20211115 |