CN104216979A - Chinese technology patent automatic classification system and method for patent classification by using system - Google Patents

Info

Publication number
CN104216979A
Authority
CN
China
Prior art keywords
classification
technique
module
function
title
Prior art date
Legal status
Granted
Application number
CN201410441093.1A
Other languages
Chinese (zh)
Other versions
CN104216979B (en)
Inventor
耿俊浩
刘永刚
王刚锋
Current Assignee
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date
2014-09-01
Filing date
2014-09-01
Publication date
2014-12-17
Application filed by Northwestern Polytechnical University
Priority to CN201410441093.1A
Publication of CN104216979A
Application granted
Publication of CN104216979B
Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes

Abstract

The invention discloses a Chinese technology patent automatic classification system and a method for patent classification by using the system, in order to solve the problem that the classification efficiency of the existing patent classification system is low. The technical scheme is that the system comprises a client computer, an application server and a database, wherein the client computer is used for setting classification parameters, setting category markers and viewing classification results; the application server comprises a patent acquisition module, a static matching module and a dynamic clustering module, wherein the patent acquisition module is used for acquiring a title and an abstract of a patent document, the static matching module is used for matching and searching the title or the abstract of the patent in a classification word bank to conduct preliminary classification, and the dynamic clustering module is used for conducting classification processing to remaining patent sets after static matching; the database is used for storing patent information and classification results. Since static matching classification and dynamic clustering are adopted in a combined manner to process Chinese technology patent classification, the efficiency of the patent classification system is improved.

Description

Chinese technology patent automatic classification system and method for patent classification using the system
Technical field
The present invention relates to a patent classification system, in particular to an automatic classification system for Chinese technology patents. It also relates to a method of classifying patents using this system.
Background art
Process research and development (R&D) is a complex, manufacturing-oriented activity that draws on a large body of process knowledge to innovate. Its outcome is the creative application of a specific process method to realize a specific manufacturing object and its manufacturing features. If process R&D personnel can quickly draw on a large amount of high-quality, multidisciplinary process knowledge involving similar process methods, manufacturing objects or manufacturing features, the efficiency of process R&D can be improved effectively.
A technology patent, that is, a patent on a manufacturing process, generally proposes a new process method or solution to resolve a technical contradiction in an existing process problem, and so embodies multidisciplinary principles for solving process problems. The three features that characterize the process field, namely the process method, the manufacturing object and the manufacturing feature, are generally contained in the title or abstract of a technology patent. Because of their novelty and practicality, technology patents are therefore an important knowledge source for process R&D. If technology patents were classified by process method, manufacturing object and manufacturing feature, they could supply similar knowledge for reference and effectively improve the efficiency of process R&D. However, no such technology patent classification method currently exists. Process R&D personnel mainly sort patents manually in order to use patent knowledge, which reduces the efficiency of process R&D.
Current research on the automatic classification of Chinese patents is mainly based on the International Patent Classification (IPC), which divides patents by the engineering field of the object they describe. The document "Automatic classification of Chinese patents according to TRIZ inventive principles, Journal of Harbin University of Science and Technology, 2013, Vol. 18, No. 3, Jun. 2013, pp. 1-5" addresses the patent retrieval needs of innovation based on TRIZ theory and proposes using text mining to classify Chinese patents automatically by TRIZ inventive principle. The method first analyzes and regroups the 40 basic TRIZ inventive principles, then segments the patent text into words, reduces the feature dimension with a feature selection algorithm, and finally runs classification tests on Chinese patents. The results show that text classification techniques can classify Chinese patents automatically by TRIZ inventive principle. However, the method in that document is not aimed at technology patents, and it does not classify a technology patent collection by the three features that process R&D requires: process method, manufacturing object and manufacturing feature. Its classification scheme is therefore unsuited to the needs of process R&D and cannot effectively support process R&D activities.
Summary of the invention
To overcome the low classification efficiency of existing patent classification systems, the present invention provides an automatic classification system for Chinese technology patents. The system comprises client computers, an application server and a database. The client computers are each connected to the application server through a network, and the application server is connected to the database through a data line. The client computers are used to set classification parameters, set category labels and view classification results. The application server comprises a patent acquisition module, a static matching module and a dynamic clustering module. The patent acquisition module obtains the title and abstract of a patent document. The static matching module searches the patent title or abstract for matches against a classification lexicon to perform preliminary classification. The dynamic clustering module comprises a Chinese word segmentation function, a part-of-speech tagging function, a stop-word removal function, a word frequency statistics function, a feature word extraction function, a clustering function and a category labeling function, and it classifies the patents left over after static matching. The database stores the patent information and the classification results. Because static matching classification is combined with dynamic clustering to classify Chinese technology patents, the efficiency of the patent classification system can be improved.
The present invention also provides a method of classifying patents using this automatic classification system for Chinese technology patents.
The technical solution adopted by the present invention to solve its technical problem is an automatic classification system for Chinese technology patents, characterized in that it comprises client computers, an application server and a database. There are multiple client computers, each connected to the application server through a network, and the application server is connected to the database through a data line. The client computers are used to set classification parameters, set category labels and view classification results. The application server comprises a patent acquisition module, a static matching module and a dynamic clustering module. The patent acquisition module obtains the title and abstract of a patent document. The static matching module searches the patent title or abstract for matches against a classification lexicon to perform preliminary classification. The dynamic clustering module comprises a Chinese word segmentation function, a part-of-speech tagging function, a stop-word removal function, a word frequency statistics function, a feature word extraction function, a clustering function and a category labeling function, and it classifies the patents left over after static matching. The database stores the patent information and the classification results.
A method of classifying patents using the above automatic classification system for Chinese technology patents is characterized by the following steps:
Step 1: Taking the process method as the center and combining it with the manufacturing object and the manufacturing feature, classify technology patents in two modes: one by process method and manufacturing object, the other by process method and manufacturing feature.
Step 2: Search the technology patent collection by static matching against the process-field classification lexicon.
1) Domain experts collectively compile and build the process-field classification lexicon.
2) Match the titles or abstracts of the technology patent collection against the classification lexicon; patents that directly match a classification word are assigned to that category.
Step 3: Perform dynamic clustering on the technology patents that were not matched, then label the resulting categories and add the labels to the classification lexicon.
1) Obtain the titles and abstracts of the patents remaining after static matching.
2) Preprocess the technology patents by word segmentation, part-of-speech tagging and stop-word removal.
3) Perform word frequency statistics and feature word extraction on the title and abstract of each technology patent. The feature words cover three parts, the manufacturing object, the process method and the manufacturing feature, which serve as the features of the patent; for each part, keywords are extracted from the patent to represent its category.
4) Perform clustering. Cluster the patent collection separately on each of the three keyword groups, then label and count the categories to which each patent is assigned; patents that simultaneously match a feature combination form the required target category.
5) Label the categories in the clustering result and add the labels to the classification lexicon.
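Sub-steps 2) and 3) of Step 3 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: it assumes the text has already been segmented and tagged by some Chinese word segmenter, that the tag names `n`, `v` and `vn` stand for nouns, verbs and nominal verbs, and that the function names and the list of generic nouns are hypothetical. Following the embodiment, object keywords are taken from title nouns and method keywords from abstract verbs; the manufacturing feature is left out because the text does not say where its keywords are drawn from.

```python
from collections import Counter

NOUN_TAGS = {"n"}           # assumed tag for nouns
VERB_TAGS = {"v", "vn"}     # assumed tags for verbs and nominal verbs
# Nouns with little bearing on the technical subject of a patent.
GENERIC_NOUNS = {"system", "method", "device", "program"}


def top_words(tagged_tokens, wanted_tags, stop_words, limit=3):
    """Count the retained words and return at most `limit` of the most frequent."""
    counts = Counter(
        word
        for word, tag in tagged_tokens
        if tag in wanted_tags
        and word not in stop_words
        and word not in GENERIC_NOUNS
    )
    return [word for word, _ in counts.most_common(limit)]


def extract_features(title_tokens, abstract_tokens, stop_words=frozenset()):
    """Return 0 to 3 keywords per part for one patent."""
    return {
        "object": top_words(title_tokens, NOUN_TAGS, stop_words),
        "method": top_words(abstract_tokens, VERB_TAGS, stop_words),
    }
```

Counting with `Counter` also deduplicates the words, so each keyword appears once in the result, ordered by frequency.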
The beneficial effects of the present invention are as follows. The system comprises client computers, an application server and a database. The client computers are each connected to the application server through a network, and the application server is connected to the database through a data line. The client computers are used to set classification parameters, set category labels and view classification results. The application server comprises a patent acquisition module, a static matching module and a dynamic clustering module. The patent acquisition module obtains the title and abstract of a patent document. The static matching module searches the patent title or abstract for matches against a classification lexicon to perform preliminary classification. The dynamic clustering module comprises a Chinese word segmentation function, a part-of-speech tagging function, a stop-word removal function, a word frequency statistics function, a feature word extraction function, a clustering function and a category labeling function, and it classifies the patents left over after static matching. The database stores the patent information and the classification results. Because static matching classification is combined with dynamic clustering to classify Chinese technology patents, the efficiency of the patent classification system is improved.
The present invention is described in detail below with reference to the drawings and specific embodiments.
Brief description of the drawings
Fig. 1 is an architecture diagram of the automatic classification system for Chinese technology patents of the present invention.
Fig. 2 is a block diagram of the automatic classification system for Chinese technology patents of the present invention.
Fig. 3 is a flow chart of the method of classifying patents using the automatic classification system for Chinese technology patents.
Fig. 4 is a flow chart of the static matching classification operation in the method of classifying patents using the above system.
Fig. 5 is a flow chart of the dynamic clustering operation in the method of classifying patents using the above system.
Detailed description of the embodiments
Embodiment 1. Refer to Figs. 1-5. The automatic classification system for Chinese technology patents of the present invention comprises a client computer 1, an application server 3 and a database 4. The client computer 1 is connected to the application server 3 through a network 2, and the application server 3 is connected to the database 4 through a data line. The application server 3 classifies the technology patents. The patents in this embodiment belong to a collection of technology patents from one particular process field. The client computer 1 is used by operators to configure patent classification and to display the classification results. The database 4 stores patent information and patent classification results. Patent information here means all the information of a published or announced patent, including its patent number, title, abstract, technical field, background art, summary of the invention, description of the drawings, embodiments, patent file and so on.
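The role of database 4 can be illustrated with a small storage sketch. The patent does not specify a database product or schema, so everything here is an assumption made for illustration: SQLite as the engine, the table names `patent` and `classification`, the column names, and the helper function names. Only a few of the patent information fields listed above are included.

```python
import sqlite3


def create_store(path=":memory:"):
    """Create hypothetical tables for patent information and classification results."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE patent (
               number   TEXT PRIMARY KEY,
               title    TEXT NOT NULL,
               abstract TEXT NOT NULL,
               keywords TEXT            -- classification keywords, filled in later
           )"""
    )
    conn.execute(
        """CREATE TABLE classification (
               number   TEXT NOT NULL REFERENCES patent(number),
               category TEXT NOT NULL,
               PRIMARY KEY (number, category)
           )"""
    )
    return conn


def save_result(conn, number, category):
    """Record that a patent belongs to a category; repeated calls are harmless."""
    conn.execute(
        "INSERT OR IGNORE INTO classification VALUES (?, ?)", (number, category)
    )


def patents_in(conn, category):
    """List the patent numbers filed under a category."""
    rows = conn.execute(
        "SELECT number FROM classification WHERE category = ? ORDER BY number",
        (category,),
    )
    return [row[0] for row in rows]
```

A separate result table lets one patent belong to several categories, which fits the two classification modes (process method with manufacturing object, and process method with manufacturing feature).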
The functional modules of the system as a whole are as follows. The client computer provides a function for setting classification parameters, a function for setting category labels and a function for displaying classification results. The application server comprises a patent acquisition module, a static matching module and a dynamic clustering module. The patent acquisition module obtains the title and abstract of a patent document. The static matching module searches the patent title or abstract for matches against the classification lexicon to perform preliminary classification. The dynamic clustering module classifies the patents left over after static matching; it comprises a Chinese word segmentation function, a part-of-speech tagging function, a stop-word removal function, a word frequency statistics function, a feature word extraction function, a clustering function and a category labeling function. The database stores the patent information and the classification results.
The technology patent collection is first processed by the static matching module, and patents that match the process-field classification lexicon are filed under the corresponding categories. The remaining unmatched technology patents are then processed by the dynamic clustering module. Finally, the dynamic clustering results are labeled with categories, and the labeled category words are added to the classification lexicon.
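The overall flow in the preceding paragraph can be sketched as a small driver function. This is an illustration under stated assumptions, not the system's actual code: the function name `classify_collection` is hypothetical, patents are reduced to plain strings, and the matcher, clusterer and labeler are passed in as callables because the text treats them as separate modules (labeling is manual in the embodiment).

```python
def classify_collection(patents, lexicon, matcher, clusterer, labeler):
    """Static matching first, dynamic clustering on the remainder, then lexicon update.

    patents   -- iterable of patent texts
    lexicon   -- mutable set of category words
    matcher   -- callable(patent, lexicon) -> category word, or None if unmatched
    clusterer -- callable(list of patents) -> list of clusters (lists of patents)
    labeler   -- callable(cluster) -> category word for that cluster
    """
    categories = {}
    remaining = []
    for patent in patents:
        category = matcher(patent, lexicon)
        if category is None:
            remaining.append(patent)      # left over for dynamic clustering
        else:
            categories.setdefault(category, []).append(patent)

    for cluster in clusterer(remaining):
        label = labeler(cluster)
        lexicon.add(label)                # future runs can match this word statically
        categories.setdefault(label, []).extend(cluster)
    return categories
```

Adding each new label to the lexicon is what lets the cheaper static matching step absorb more patents over time, which is the source of the efficiency gain the patent claims.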
Domain experts collectively compile the classification lexicon for technology patents according to the two classification modes, process method + manufacturing object and process method + manufacturing feature. A manufacturing enterprise can build its own process classification system to suit its specific processes. For example, an aero-engine manufacturer might divide manufacturing objects into turbine, blade, diffuser and so on; process methods into milling, grinding, electrical discharge machining and so on; and manufacturing features into cylindrical surface, hole, root bevel and so on. The domain lexicon is then statically matched against the titles of patent collection D0, and patents that match are classified directly. The abstracts of the remaining collection D1 are matched next, and the matching patents are classified. The patents that are finally left unmatched form collection D2; the lexicon contains no classification word for them, so they require dynamic clustering.
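The two-pass matching over D0, D1 and D2 can be sketched as follows. The names `Patent`, `match_terms`, `is_match` and `static_match` are hypothetical, and the example uses English terms and plain substring search for readability. The rule for what counts as a match is also an assumption: here a patent is matched when it contains a process method word plus a manufacturing object or manufacturing feature word, mirroring the two classification modes.

```python
from dataclasses import dataclass


@dataclass
class Patent:
    number: str
    title: str
    abstract: str


def match_terms(text, lexicon):
    """Return {dimension: first lexicon term found in text} for each matching dimension."""
    lowered = text.lower()
    hits = {}
    for dimension, terms in lexicon.items():
        for term in terms:
            if term in lowered:
                hits[dimension] = term
                break
    return hits


def is_match(hits):
    # Assumed rule: process method + manufacturing object, or process method + feature.
    return "method" in hits and ("object" in hits or "feature" in hits)


def static_match(d0, lexicon):
    """Match titles of D0, then abstracts of the remainder D1; return (classified, D2)."""
    classified, d1, d2 = {}, [], []
    for patent in d0:
        hits = match_terms(patent.title, lexicon)
        if is_match(hits):
            classified[patent.number] = hits
        else:
            d1.append(patent)
    for patent in d1:
        hits = match_terms(patent.abstract, lexicon)
        if is_match(hits):
            classified[patent.number] = hits
        else:
            d2.append(patent)
    return classified, d2
```

Matching the short title before the longer abstract keeps the common case cheap; only patents whose titles are uninformative pay for the abstract scan.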
Dynamic clustering of technology patents comprises the following steps:
1) Acquire the content of patent collection D2, selecting the title and abstract as the input to dynamic clustering. The user does this on the client computer 1.
2) Perform word segmentation and part-of-speech tagging on the Chinese technology patents. A segmentation dictionary for the process field is built and used to segment the patent titles and abstracts into Chinese words. This is done by the application server 3 accessing the database 4.
3) On the basis of step 2), determine the part of speech of each word and remove stop words: function words with no practical meaning, descriptive (state) words, conjunctions, neutral words, and words with no distinctive character that contribute little to classification. Nouns, nominal verbs, verbs and the like are kept. This is done by the application server 3 accessing the database 4.
4) Perform word frequency statistics and deduplication on the retained nouns, nominal verbs, verbs and the like. Sort the keywords of each patent's title and abstract by word frequency, placing the keyword with the highest frequency across the whole patent collection first and listing the other keywords in order. In addition, remove nouns that have little bearing on the technical subject of a patent, such as "system", "method", "device" and "program". This is done by the application server 3 accessing the database 4.
5) Extract feature words from each patent document in three parts: manufacturing object, process method and manufacturing feature, which serve as the features of the patent. Manufacturing-object nouns are extracted from the title and process-method verbs from the abstract; 0 to 3 keywords are chosen for each part to represent the patent and are stored in the patent database in an added "classification keyword" column. This is done by the application server 3 accessing the database 4.
6) Cluster the technology patents with a density-based method similar to hierarchical agglomerative clustering. Compute the semantic similarity between documents, where the semantic similarity of two patent documents is computed from the similarity of their feature words. For each of the manufacturing object, process method and manufacturing feature keyword groups in turn, traverse and cluster the whole patent document collection: merge the patents with the highest semantic similarity into one cluster, and repeat until the clustering result reaches a specified threshold. This is done by the application server 3 accessing the database 4.
7) Associate the clustering result with each patent number and tally the classified results. Patents that are assigned to the same process method cluster and also to the same manufacturing object cluster, or to the same process method cluster and the same manufacturing feature cluster, are grouped into one class and form a target-category patent collection. Finally, the technology patent categories obtained by dynamic clustering are labeled manually and added to the process classification lexicon. The user processes and views the classification results on the client computer 1.
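Steps 6) and 7) can be sketched as below. The patent does not define how feature word similarity is computed or what the threshold is, so this sketch substitutes simple choices that are assumptions: Jaccard similarity between keyword sets, a cluster represented by the union of its members' keywords, and a stop threshold of 0.5. The function names are hypothetical. It implements plain agglomerative merging rather than any specific density-based algorithm.

```python
from itertools import combinations


def jaccard(a, b):
    """Similarity of two keyword sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def agglomerate(keyword_sets, threshold=0.5):
    """Merge the most similar clusters until the best similarity drops below threshold.

    keyword_sets -- {patent number: set of keywords for one feature part}
    Returns a list of sets of patent numbers.
    """
    clusters = [({number}, set(words)) for number, words in keyword_sets.items()]
    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), jaccard(clusters[i][1], clusters[j][1]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        merged = (clusters[i][0] | clusters[j][0], clusters[i][1] | clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [members for members, _ in clusters]


def target_classes(method_clusters, other_clusters):
    """Patents sharing a method cluster and an object (or feature) cluster form one class."""
    classes = []
    for method_members in method_clusters:
        for other_members in other_clusters:
            common = method_members & other_members
            if common:
                classes.append(common)
    return classes
```

Clustering each keyword group separately and intersecting the results keeps the two classification modes independent: the same `target_classes` call works for method with object and for method with feature.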

Claims (2)

1. An automatic classification system for Chinese technology patents, characterized in that it comprises client computers, an application server and a database; there are multiple client computers, each connected to the application server through a network, and the application server is connected to the database through a data line; the client computers are used to set classification parameters, set category labels and view classification results; the application server comprises a patent acquisition module, a static matching module and a dynamic clustering module; the patent acquisition module obtains the title and abstract of a patent document; the static matching module searches the patent title or abstract for matches against a classification lexicon to perform preliminary classification; the dynamic clustering module comprises a Chinese word segmentation function, a part-of-speech tagging function, a stop-word removal function, a word frequency statistics function, a feature word extraction function, a clustering function and a category labeling function; the dynamic clustering module classifies the patents left over after static matching; the database stores the patent information and the classification results.
2. A method of classifying patents using the automatic classification system for Chinese technology patents according to claim 1, characterized by comprising the following steps:
Step 1: taking the process method as the center and combining it with the manufacturing object and the manufacturing feature, classify technology patents in two modes: one by process method and manufacturing object, the other by process method and manufacturing feature;
Step 2: search the technology patent collection by static matching against the process-field classification lexicon,
1) domain experts collectively compile and build the process-field classification lexicon;
2) match the titles or abstracts of the technology patent collection against the classification lexicon; patents that directly match a classification word are assigned to that category;
Step 3: perform dynamic clustering on the technology patents that were not matched, then label the resulting categories and add the labels to the classification lexicon,
1) obtain the titles and abstracts of the patents remaining after static matching;
2) preprocess the technology patents by word segmentation, part-of-speech tagging and stop-word removal;
3) perform word frequency statistics and feature word extraction on the title and abstract of each technology patent; the feature words cover three parts, the manufacturing object, the process method and the manufacturing feature, which serve as the features of the patent, and for each part keywords are extracted from the patent to represent its category;
4) perform clustering: cluster the patent collection separately on each of the three keyword groups, then label and count the categories to which each patent is assigned; patents that simultaneously match a feature combination form the required target category;
5) label the categories in the clustering result and add the labels to the classification lexicon.
CN201410441093.1A 2014-09-01 2014-09-01 Chinese technology patent automatic classification system and method for patent classification by using system Expired - Fee Related CN104216979B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410441093.1A CN104216979B (en) 2014-09-01 2014-09-01 Chinese technology patent automatic classification system and method for patent classification by using system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410441093.1A CN104216979B (en) 2014-09-01 2014-09-01 Chinese technology patent automatic classification system and method for patent classification by using system

Publications (2)

Publication Number Publication Date
CN104216979A (en) 2014-12-17
CN104216979B (en) 2017-12-05

Family

ID=52098469

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410441093.1A Expired - Fee Related CN104216979B (en) 2014-09-01 2014-09-01 Chinese technology patent automatic classification system and method for patent classification by using system

Country Status (1)

Country Link
CN (1) CN104216979B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104809106A (en) * 2015-05-15 2015-07-29 合肥汇众知识产权管理有限公司 System and method for excavating patent schemes
CN104881401A (en) * 2015-05-27 2015-09-02 大连理工大学 Patent literature clustering method
CN107609169A (en) * 2017-09-27 2018-01-19 合肥博力生产力促进中心有限公司 A kind of patent name back-stage management analysis system based on database
CN108133009A (en) * 2017-12-22 2018-06-08 新奥(中国)燃气投资有限公司 A kind of information storage means and device
CN108710706A (en) * 2018-05-28 2018-10-26 江苏中安环能新能源科技有限公司 A kind of searching method, system and device
CN111400498A (en) * 2020-03-20 2020-07-10 广州需你计算机服务有限公司 Short message clustering method based on dimension reduction
US11507850B2 (en) * 2018-08-15 2022-11-22 Royal Bank Of Canada System and method for call centre management

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050120006A1 (en) * 2003-05-30 2005-06-02 Geosign Corporation Systems and methods for enhancing web-based searching
CN101853250A (en) * 2009-04-03 2010-10-06 华为技术有限公司 Method and device for classifying documents
CN102542061A (en) * 2011-12-30 2012-07-04 互动在线(北京)科技有限公司 Intelligent product classification method
CN102752644A (en) * 2012-06-15 2012-10-24 四川长虹电器股份有限公司 Automatic program classification method of set top box
CN103631887A (en) * 2013-11-15 2014-03-12 北京奇虎科技有限公司 Method for network search at browser side and browser

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
CONGLE ZHANG ET AL.: ""Keyword-Labeled Classification with Auxiliary Unlabeled Documents"", 《2008 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE AND INTELLIGENT AGENT TECHNOLOGY》 *
卢雪燕: "《基于关键词的文献分类》", 《广西大学梧州分校学报》 *
蒋健安 等: ""一种面向专利文献数据的文本自动分类方法"", 《计算机应用》 *
陈爽 等: "《一种启发式网络信息采集系统设计与实现》", 《北京石油化工学院学报》 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104809106A (en) * 2015-05-15 2015-07-29 合肥汇众知识产权管理有限公司 System and method for excavating patent schemes
CN104881401A (en) * 2015-05-27 2015-09-02 大连理工大学 Patent literature clustering method
CN104881401B (en) * 2015-05-27 2017-10-17 大连理工大学 A kind of patent document clustering method
CN107609169A (en) * 2017-09-27 2018-01-19 合肥博力生产力促进中心有限公司 A kind of patent name back-stage management analysis system based on database
CN108133009A (en) * 2017-12-22 2018-06-08 新奥(中国)燃气投资有限公司 A kind of information storage means and device
CN108710706A (en) * 2018-05-28 2018-10-26 江苏中安环能新能源科技有限公司 A kind of searching method, system and device
US11507850B2 (en) * 2018-08-15 2022-11-22 Royal Bank Of Canada System and method for call centre management
CN111400498A (en) * 2020-03-20 2020-07-10 广州需你计算机服务有限公司 Short message clustering method based on dimension reduction

Also Published As

Publication number Publication date
CN104216979B (en) 2017-12-05

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20171205

Termination date: 20190901