CN102541857A - Webpage sorting method and device - Google Patents

Webpage sorting method and device

Info

Publication number
CN102541857A
Authority
CN
China
Prior art keywords
webpage
user
classification
value
weight
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2010105849844A
Other languages
Chinese (zh)
Inventor
刘致远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN2010105849844A
Publication of CN102541857A

Abstract

The invention discloses a webpage sorting method and device. The method comprises: determining a webpage category set A; creating an N-dimensional webpage category vector for each pre-saved webpage, where N equals the number of categories in the webpage category set A, each vector records the weights of its corresponding webpage in the different categories, and every weight is initially 0; determining the category to which each user recorded in a user behavior log database belongs; for each webpage browsed by a user, adding M to the weight corresponding to that user's category in the webpage's category vector; upon receiving a search request from any user X, obtaining the webpages that satisfy the request and pre-sorting them; determining the category of user X's search; re-sorting the pre-sorted webpages so that a larger weight in the search category yields a higher rank; and displaying the webpages. The scheme provided by the invention can enhance user experience.

Description

Webpage sorting method and device
Technical field
The present invention relates to search engine technology, and in particular to a webpage sorting method and device.
Background
Search engines are currently a fiercely competitive field. After a user enters a keyword, thousands of search results or more are usually returned, and the user hopes to find the desired result on the first page of results, or even among the first few webpages on that page. How the retrieved webpages are sorted therefore directly affects user experience.
In the prior art, a search engine usually combines multiple algorithms to sort the retrieved webpages. One such algorithm ranks a webpage higher the more times users have browsed it. In practice this approach has a problem: if a webpage is tied to the interests of a vendor, the vendor may find some way to click the webpage maliciously and repeatedly so that it ranks higher. If the top-ranked results are all webpages of this kind, user experience undoubtedly suffers.
Summary of the invention
In view of this, the main object of the present invention is to provide a webpage sorting method that can improve user experience.
Another object of the present invention is to provide a webpage sorting device that can improve user experience.
To achieve the above objects, the technical scheme of the present invention is realized as follows:
A webpage sorting method comprises:
determining a webpage category set A, and creating an N-dimensional webpage category vector for each pre-saved webpage, wherein the value of said N equals the number of categories in said webpage category set A, each webpage category vector records the weights of its corresponding webpage in the different categories, and in the initial state the value of each weight is 0;
determining the category to which each user recorded in a user behavior log database belongs, and, in the webpage category vector of each webpage browsed by a user, adding M to the value of the weight corresponding to the category to which that user belongs, wherein said M is a positive integer;
upon receiving a search request from any user X, obtaining the qualifying webpages and pre-sorting them; determining the category of said user X's search; re-sorting the pre-sorted webpages according to the principle that the larger the value of the weight corresponding to the category of said search, the higher the rank; and displaying them.
A webpage sorting device comprises:
a first processing unit, configured to determine a webpage category set A and create an N-dimensional webpage category vector for each pre-saved webpage, wherein the value of said N equals the number of categories in said webpage category set A, each webpage category vector records the weights of its corresponding webpage in the different categories, and in the initial state the value of each weight is 0; and further configured to determine the category to which each user recorded in a user behavior log database belongs, and, in the webpage category vector of each webpage browsed by a user, add M to the value of the weight corresponding to the category to which that user belongs, wherein said M is a positive integer;
a second processing unit, configured to, upon receiving a search request from any user X, obtain the qualifying webpages and pre-sort them; determine the category of said user X's search; re-sort the pre-sorted webpages according to the principle that the larger the value of the weight corresponding to the category of said search, the higher the rank; and display them.
It can be seen that, with the technical scheme of the present invention, the weight of each pre-saved webpage in each category is determined from the categories of the users who have performed searches and from what those users browsed, and the retrieved webpages are re-sorted according to the category of the current user's search and each retrieved webpage's weight in that category. Problems of the prior art such as malicious clicking are thereby avoided as far as possible, and user experience is improved.
Description of drawings
Fig. 1 is a flowchart of a method embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a device embodiment of the present invention.
Detailed description of the embodiments
To address the problems in the prior art, the present invention proposes a new webpage sorting scheme that can improve user experience.
To make the technical scheme of the present invention clearer, the scheme is described in further detail below with reference to the accompanying drawings and embodiments.
Fig. 1 is a flowchart of a method embodiment of the present invention. As shown in Fig. 1, the method comprises the following steps:
Step 11: determine a webpage category set A.
In existing search engines, tools such as a crawler are used to download webpages from the internet in advance and save them in a webpage database. The number of webpages on the internet currently exceeds 10 billion. The crawler assigns each webpage a unique identifier and builds an index to support subsequent searches. After a search request is received from any user, qualifying webpages are looked up among the saved webpages, sorted by a combination of algorithms, and displayed to the user. In addition, information about each search a user performs, such as which keyword the user entered and which webpages in the search results the user browsed, is recorded in a user behavior log database.
The scheme of the present invention is implemented on the basis of the above webpage database and user behavior log database.
In this step, a webpage category set A is determined manually according to the content of different webpages. It may include categories such as history, military, sports, news, humanities, travel, automobile, and computer, that is, A = {history, military, sports, news, humanities, ...}.
Step 12: create an N-dimensional webpage category vector for each pre-saved webpage, where N equals the number of categories in the webpage category set A. Each webpage category vector records the weights of its corresponding webpage in the different categories; in the initial state the value of each weight is 0.
In the scheme of the present invention, a webpage is not simply assigned to a single category. Instead, an N-dimensional webpage category vector is created for each webpage and used to record the weights of that webpage in the different categories, such as its weight in the history category, its weight in the military category, and its weight in the sports category.
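The category set of step 11 and the per-page vectors of step 12 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described by the patent: the category names follow the examples in the text, while the identifiers CATEGORIES, CATEGORY_INDEX, new_category_vector, build_vectors, and the page ids are hypothetical names introduced here.

```python
# Hypothetical sketch of steps 11-12: a fixed category set A and one
# N-dimensional weight vector per stored page, all weights starting at 0.
CATEGORIES = ["history", "military", "sports", "news",
              "humanities", "travel", "automobile", "computer"]

# Position of each category inside a vector (assumed layout).
CATEGORY_INDEX = {name: i for i, name in enumerate(CATEGORIES)}


def new_category_vector():
    """Return a zero-initialised vector with one weight per category."""
    return [0] * len(CATEGORIES)


def build_vectors(page_ids):
    """Create one independent category vector for every stored page."""
    return {page_id: new_category_vector() for page_id in page_ids}


# Illustrative page identifiers; a real crawler would assign its own.
vectors = build_vectors(["page-1", "page-2", "page-3"])
```

Each page gets its own list, so changing one page's weights leaves every other page's vector untouched.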
Step 13: determine the category to which each user recorded in the user behavior log database belongs, and, in the webpage category vector of each webpage browsed by a user, add M to the value of the weight corresponding to the category to which that user belongs, where M is a positive integer.
In this step, the category of each user recorded in the user behavior log database is first determined. Specifically, for each user Y, the category of each of the user's searches is determined, and the category searched most often is taken as the category to which user Y belongs.
For example, user Y has performed 5 searches in total, entering the keywords "Santana", "AK47", "Buick", "schumacher", and "Jetta". "Santana", "Buick", and "Jetta" correspond to the "automobile" category, "AK47" corresponds to the "military" category, and "schumacher" corresponds to the "sports" category. The category user Y searched most often is therefore automobile, so user Y is determined to belong to the automobile category. As a special case, if several categories are tied for the most searches, user Y may be assigned to any one of them. How to determine the category corresponding to a keyword is prior art.
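The majority rule for a user's category can be illustrated with the user Y example. This is a sketch only: the keyword-to-category table KEYWORD_CATEGORY and the function name user_category are assumptions made for illustration, since the text treats keyword classification as prior art and does not specify it.

```python
from collections import Counter

# Hypothetical keyword-to-category lookup; the patent treats this mapping
# as prior art and does not say how it is obtained.
KEYWORD_CATEGORY = {
    "Santana": "automobile",
    "AK47": "military",
    "Buick": "automobile",
    "schumacher": "sports",
    "Jetta": "automobile",
}


def user_category(search_keywords):
    """Return the category the user searched most often, or None if unknown."""
    counts = Counter(
        KEYWORD_CATEGORY[k] for k in search_keywords if k in KEYWORD_CATEGORY
    )
    if not counts:
        return None
    # most_common(1) yields one of the top categories; on a tie the text
    # allows any of the tied categories to be chosen.
    return counts.most_common(1)[0][0]


category_of_y = user_category(["Santana", "AK47", "Buick", "schumacher", "Jetta"])
```

For the five searches above, three fall in the automobile category, so category_of_y is "automobile".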
If user Y often searches in a certain category, user Y can be regarded as an expert in that category, and the webpages user Y browses can accordingly be regarded as more relevant to that category. Therefore, for each user Y, M is added to the value of the weight corresponding to user Y's category in the webpage category vector of each webpage user Y browsed. The value of M can be chosen according to actual needs and is usually 1.
For example, if user Y has browsed 10 webpages in total, then for each of these 10 webpages the value of the weight corresponding to the automobile category in its webpage category vector is increased by 1.
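The weight update for the 10 webpages browsed by user Y might look like the sketch below. The shortened category list, the page ids, and the function name apply_user_log are hypothetical. The sketch also assumes each entry in the browsed list counts once; whether repeat views of the same page by one user should be collapsed is not stated in the text.

```python
from collections import defaultdict

CATEGORIES = ["automobile", "military", "sports"]  # shortened set, for illustration
CATEGORY_INDEX = {name: i for i, name in enumerate(CATEGORIES)}
M = 1  # increment per browsing user; the text says M is usually 1

# page id -> category vector, created lazily with all weights at 0
vectors = defaultdict(lambda: [0] * len(CATEGORIES))


def apply_user_log(user_cat, browsed_pages, vectors, m=M):
    """Add m to the user's category weight in every page the user browsed.

    Assumption: every entry in browsed_pages counts once; pass a set
    if repeat views by the same user should be collapsed.
    """
    idx = CATEGORY_INDEX[user_cat]
    for page_id in browsed_pages:
        vectors[page_id][idx] += m


# User Y belongs to the automobile category and browsed 10 pages.
pages_of_y = [f"page-{i}" for i in range(10)]
apply_user_log("automobile", pages_of_y, vectors)
```

After the call, each of the 10 pages has an automobile weight of 1 and all other weights remain 0.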
Step 14: upon receiving a search request from any user X, obtain the qualifying webpages and pre-sort them.
The specific implementation of this step is prior art.
In the existing approach, the pre-sorted webpages would be displayed to user X directly as the search results. The scheme of the present invention, however, processes them further as described in step 15.
Step 15: determine the category of user X's search, re-sort the pre-sorted webpages according to the principle that the larger the value of the weight corresponding to the search category, the higher the rank, and display them.
In this step, the category of user X's search is first determined. For example, if the keyword entered by user X is "T43", the category of the search is determined to be the computer category. Next, the value of the weight corresponding to the computer category is looked up in the webpage category vector of each obtained webpage, and the pre-sorted webpages are re-sorted so that a larger value ranks higher, which further optimizes the pre-sort result. Finally, the optimized webpages are displayed to user X in order.
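The re-sorting of step 15 can be sketched as a sort on a single vector component. The names rerank, presorted, and the sample weights are hypothetical. Keeping the pre-sort order among pages with equal weights is a design choice made in this sketch, which the text does not prescribe; it relies on Python's sorted being stable.

```python
CATEGORIES = ["automobile", "military", "sports", "computer"]
CATEGORY_INDEX = {name: i for i, name in enumerate(CATEGORIES)}


def rerank(presorted_pages, vectors, search_category):
    """Re-sort pre-sorted pages by their weight in the searched category.

    sorted() is stable, also with reverse=True, so pages with equal
    weights keep their pre-sort order (an assumption of this sketch).
    """
    idx = CATEGORY_INDEX[search_category]
    zero = [0] * len(CATEGORIES)  # pages without a vector count as weight 0
    return sorted(
        presorted_pages,
        key=lambda page_id: vectors.get(page_id, zero)[idx],
        reverse=True,
    )


# Illustrative data: weights in the computer category are 2, 7, 2, and 0.
vectors = {"a": [0, 0, 0, 2], "b": [5, 0, 0, 7], "c": [0, 0, 0, 2]}
presorted = ["a", "b", "c", "d"]
result = rerank(presorted, vectors, "computer")
```

With this data, result is ["b", "a", "c", "d"]: page b has the largest computer weight, a and c tie and keep their pre-sort order, and d has no recorded weight.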
Note that in practice user X may enter several keywords, for example "schumacher" and "Ferrari". In that case the category of user X's search may be determined to be either the sports category or the automobile category; alternatively, it may be determined to be the sports category, that is, the first keyword prevails. Other ways of determining the category may also be adopted as needed; the present invention does not limit this.
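The first-keyword rule mentioned above could be sketched as follows. The lookup table and the function name search_category are hypothetical, and skipping keywords with no known category is an extension assumed here rather than something stated in the text.

```python
# Hypothetical keyword-to-category lookup, as in the examples in the text.
KEYWORD_CATEGORY = {
    "schumacher": "sports",
    "Ferrari": "automobile",
    "T43": "computer",
}


def search_category(keywords):
    """Return the category of the first keyword that has a known category.

    Assumption: keywords missing from the lookup are skipped; the text
    only says the first keyword prevails.
    """
    for keyword in keywords:
        category = KEYWORD_CATEGORY.get(keyword)
        if category is not None:
            return category
    return None


category = search_category(["schumacher", "Ferrari"])
```

Here category is "sports", because "schumacher" comes first in the query.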
Subsequently, when it is detected that user X has browsed any displayed webpage Z, the category to which user X belongs is first determined. Specifically, if user X has searched before, the categories of those earlier searches are combined with the category of the current search, and the category searched most often is taken as user X's category; if user X has not searched before, the category of the current search is taken as user X's category. Then, in the webpage category vector of webpage Z, the value of the weight corresponding to user X's category is increased by 1. It can be seen that, after the weights of each webpage in the different categories have been determined as described in step 13, they can be continuously refined according to what users subsequently browse.
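The follow-up update after user X browses webpage Z can be sketched in two small functions. The function names, the shortened category list, and the sample history are assumptions for illustration; the increment defaults to 1 as in the paragraph above.

```python
from collections import Counter

CATEGORIES = ["automobile", "military", "sports", "computer"]
CATEGORY_INDEX = {name: i for i, name in enumerate(CATEGORIES)}


def category_after_search(previous_search_categories, current_search_category):
    """Combine earlier searches with the current one and take the majority.

    With no earlier searches, the current search category is returned.
    """
    history = list(previous_search_categories) + [current_search_category]
    return Counter(history).most_common(1)[0][0]


def record_browse(page_id, user_cat, vectors, m=1):
    """Add m to the user's category weight in the browsed page's vector."""
    vector = vectors.setdefault(page_id, [0] * len(CATEGORIES))
    vector[CATEGORY_INDEX[user_cat]] += m


# User X searched automobile twice and sports once before, and now
# searches in the computer category, then browses page Z.
vectors = {}
cat_x = category_after_search(["automobile", "automobile", "sports"], "computer")
record_browse("Z", cat_x, vectors)
```

In this example user X still belongs to the automobile category, so the automobile weight of page Z is the one that increases, even though the current search is in the computer category.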
In the existing approach, after user X finishes this search, information such as which keyword user X entered and which webpages in the search results user X browsed is also recorded in the user behavior log database.
Based on the above, Fig. 2 is a schematic structural diagram of a device embodiment of the present invention. As shown in Fig. 2, the device comprises:
a first processing unit 21, configured to determine a webpage category set A and create an N-dimensional webpage category vector for each pre-saved webpage, where N equals the number of categories in the webpage category set A, each webpage category vector records the weights of its corresponding webpage in the different categories, and in the initial state the value of each weight is 0; and further configured to determine the category to which each user recorded in the user behavior log database belongs, and, in the webpage category vector of each webpage browsed by a user, add M to the value of the weight corresponding to the category to which that user belongs, where M is a positive integer;
a second processing unit 22, configured to, upon receiving a search request from any user X, obtain the qualifying webpages and pre-sort them; determine the category of user X's search; re-sort the pre-sorted webpages according to the principle that the larger the value of the weight corresponding to the search category, the higher the rank; and display them.
The first processing unit 21 may specifically comprise:
a first processing subunit 211, configured to determine a webpage category set A and create an N-dimensional webpage category vector for each pre-saved webpage, where N equals the number of categories in the webpage category set A, each webpage category vector records the weights of its corresponding webpage in the different categories, and in the initial state the value of each weight is 0;
a second processing subunit 212, configured to, for each user Y recorded in the user behavior log database, determine the category of each of the user's searches, take the category searched most often as the category to which user Y belongs, and, in the webpage category vector of each webpage user Y browsed, add M to the value of the weight corresponding to the category to which user Y belongs.
In addition, the second processing unit 22 may be further configured to, when it is detected that user X has browsed any displayed webpage Z, determine the category to which user X belongs, and, in the webpage category vector of webpage Z, add M to the value of the weight corresponding to the category to which user X belongs.
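The division of work between the two processing units can be sketched as two small classes. This is a structural illustration under stated assumptions, not the device itself: the class and method names, the shape of the log records, and the presort callable standing in for the prior-art pre-sort are all hypothetical.

```python
from collections import Counter


class FirstProcessingUnit:
    """Builds category vectors from stored pages and the behavior log."""

    def __init__(self, categories, m=1):
        self.index = {name: i for i, name in enumerate(categories)}
        self.m = m
        self.vectors = {}

    def create_vectors(self, page_ids):
        """Role of subunit 211: one zero vector per stored page."""
        for page_id in page_ids:
            self.vectors[page_id] = [0] * len(self.index)

    def apply_log(self, log):
        """Role of subunit 212.

        log: iterable of (search_categories, browsed_page_ids), one per user
        (an assumed record shape).
        """
        for search_categories, browsed in log:
            if not search_categories:
                continue
            user_cat = Counter(search_categories).most_common(1)[0][0]
            for page_id in browsed:
                if page_id in self.vectors:
                    self.vectors[page_id][self.index[user_cat]] += self.m


class SecondProcessingUnit:
    """Pre-sorts via a supplied callable, then re-sorts by category weight."""

    def __init__(self, first_unit, presort):
        self.first = first_unit
        self.presort = presort  # callable: query -> list of page ids (prior art)

    def search(self, query, search_category):
        idx = self.first.index[search_category]

        def weight(page_id):
            vector = self.first.vectors.get(page_id)
            return vector[idx] if vector else 0

        return sorted(self.presort(query), key=weight, reverse=True)


unit1 = FirstProcessingUnit(["automobile", "sports"])
unit1.create_vectors(["p1", "p2", "p3"])
unit1.apply_log([
    (["automobile", "automobile", "sports"], ["p2"]),
    (["sports"], ["p1", "p3"]),
])
unit2 = SecondProcessingUnit(unit1, presort=lambda query: ["p1", "p2", "p3"])
results = unit2.search("Buick", "automobile")
```

With this log, only p2 was browsed by an automobile user, so an automobile search returns p2 first, followed by p1 and p3 in their pre-sort order.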
For the specific workflow of the device embodiment shown in Fig. 2, refer to the corresponding description of the method embodiment shown in Fig. 1; it is not repeated here.
In short, the technical scheme of the present invention can improve user experience.
The above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (7)

1. A webpage sorting method, characterized by comprising:
determining a webpage category set A, and creating an N-dimensional webpage category vector for each pre-saved webpage, wherein the value of said N equals the number of categories in said webpage category set A, each webpage category vector records the weights of its corresponding webpage in the different categories, and in the initial state the value of each weight is 0;
determining the category to which each user recorded in a user behavior log database belongs, and, in the webpage category vector of each webpage browsed by a user, adding M to the value of the weight corresponding to the category to which that user belongs, wherein said M is a positive integer;
upon receiving a search request from any user X, obtaining the qualifying webpages and pre-sorting them; determining the category of said user X's search; re-sorting the pre-sorted webpages according to the principle that the larger the value of the weight corresponding to the category of said search, the higher the rank; and displaying them.
2. The method according to claim 1, characterized in that said determining the category to which each user recorded in the user behavior log database belongs comprises:
for each user Y, determining the category of each of the user's searches, and taking the category searched most often as the category to which said user Y belongs.
3. The method according to claim 1, characterized in that the method further comprises:
when it is detected that said user X has browsed any displayed webpage Z, determining the category to which said user X belongs, and, in the webpage category vector of said webpage Z, adding M to the value of the weight corresponding to the category to which said user X belongs.
4. The method according to claim 1, 2 or 3, characterized in that the value of said M is 1.
5. A webpage sorting device, characterized by comprising:
a first processing unit, configured to determine a webpage category set A and create an N-dimensional webpage category vector for each pre-saved webpage, wherein the value of said N equals the number of categories in said webpage category set A, each webpage category vector records the weights of its corresponding webpage in the different categories, and in the initial state the value of each weight is 0; and further configured to determine the category to which each user recorded in a user behavior log database belongs, and, in the webpage category vector of each webpage browsed by a user, add M to the value of the weight corresponding to the category to which that user belongs, wherein said M is a positive integer;
a second processing unit, configured to, upon receiving a search request from any user X, obtain the qualifying webpages and pre-sort them; determine the category of said user X's search; re-sort the pre-sorted webpages according to the principle that the larger the value of the weight corresponding to the category of said search, the higher the rank; and display them.
6. The device according to claim 5, characterized in that said first processing unit comprises:
a first processing subunit, configured to determine a webpage category set A and create an N-dimensional webpage category vector for each pre-saved webpage, wherein the value of said N equals the number of categories in said webpage category set A, each webpage category vector records the weights of its corresponding webpage in the different categories, and in the initial state the value of each weight is 0;
a second processing subunit, configured to, for each user Y recorded in said user behavior log database, determine the category of each of the user's searches, take the category searched most often as the category to which said user Y belongs, and, in the webpage category vector of each webpage said user Y browsed, add M to the value of the weight corresponding to the category to which said user Y belongs.
7. The device according to claim 5 or 6, characterized in that said second processing unit is further configured to, when it is detected that said user X has browsed any displayed webpage Z, determine the category to which said user X belongs, and, in the webpage category vector of said webpage Z, add M to the value of the weight corresponding to the category to which said user X belongs.
CN2010105849844A 2010-12-08 2010-12-08 Webpage sorting method and device Pending CN102541857A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010105849844A CN102541857A (en) 2010-12-08 2010-12-08 Webpage sorting method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2010105849844A CN102541857A (en) 2010-12-08 2010-12-08 Webpage sorting method and device

Publications (1)

Publication Number Publication Date
CN102541857A true CN102541857A (en) 2012-07-04

Family

ID=46348780

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010105849844A Pending CN102541857A (en) 2010-12-08 2010-12-08 Webpage sorting method and device

Country Status (1)

Country Link
CN (1) CN102541857A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6963867B2 (en) * 1999-12-08 2005-11-08 A9.Com, Inc. Search query processing to provide category-ranked presentation of search results
CN1996316A (en) * 2007-01-09 2007-07-11 天津大学 Search engine searching method based on web page correlation
CN101079064A (en) * 2007-06-25 2007-11-28 腾讯科技(深圳)有限公司 Web page sequencing method and device

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103246677A (en) * 2012-02-13 2013-08-14 广州淘信互联网科技有限公司 Search method and search system on basis of social intercourse
CN104636366A (en) * 2013-11-11 2015-05-20 腾讯科技(深圳)有限公司 Method and device for aquiring search result queue
CN104636366B (en) * 2013-11-11 2020-06-02 腾讯科技(深圳)有限公司 Method and device for acquiring search result queue
CN107885783A (en) * 2017-10-17 2018-04-06 北京京东尚科信息技术有限公司 The method and apparatus for obtaining the high relevant classification of search term
CN107885783B (en) * 2017-10-17 2020-11-03 北京京东尚科信息技术有限公司 Method and device for obtaining high-correlation classification of search terms

Similar Documents

Publication Publication Date Title
CN102760138B (en) Classification method and device for user network behaviors and search method and device for user network behaviors
CN109885773B (en) Personalized article recommendation method, system, medium and equipment
US8775471B1 (en) Representing user behavior information
KR101315554B1 (en) Keyword assignment to a web page
CN104142940A (en) Information recommendation processing method and information recommendation processing device
US20100030768A1 (en) Classifying documents using implicit feedback and query patterns
CN103838798B (en) Page classifications system and page classifications method
CN105701216A (en) Information pushing method and device
CN1781100A (en) System and method for generating refinement categories for a set of search results
CN102799647A (en) Method and device for webpage reduplication deletion
CN104021161A (en) Cluster storage method and device
CN104750754A (en) Website industry classification method and server
WO2009000174A1 (en) Method and device of web page rank
US8156073B1 (en) Item attribute generation using query and item data
CN104699751A (en) Search recommending method and device based on search terms
CN101963965A (en) Document indexing method, data query method and server based on search engine
CN105512104A (en) Dictionary dimension reducing method and device and information classifying method and device
CN102710795A (en) Hotspot collecting method and device
WO2018137420A1 (en) Generating method and device for information recommendation list
CN111325030A (en) Text label construction method and device, computer equipment and storage medium
CN104834736A (en) Method and device for establishing index database and retrieval method, device and system
CN105302807A (en) Method and apparatus for obtaining information category
CN117150050A (en) Knowledge graph construction method and system based on large language model
CN110516164B (en) Information recommendation method, device, equipment and storage medium
CN115130601A (en) Two-stage academic data webpage classification method and system based on multi-dimensional feature fusion

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20120704
