CN103116580A - Providing method, system and device of website content information - Google Patents


Info

Publication number
CN103116580A
Authority
CN
China
Prior art keywords
information
site
linked object
link information
view
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2011103626460A
Other languages
Chinese (zh)
Inventor
王寓辰
倪伟
毕娅娜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Abstract

The invention discloses a method, system and device for providing website content information. The method comprises the following steps: crawling from the obtained initial link information of an introduced website to obtain the link information that the introduced website contains, and obtaining the linked objects of that link information and their attribute information; building, from the obtained linked objects and their attribute information, a linked object index corresponding to the link information; building a website resource view for each introduced website according to the association relations among the linked object indexes of the link information, the website resource view containing the linked object indexes corresponding to the link information of each introduced website, arranged according to a set rule; and supplying website content information to a website information requester according to the built website resource view. The content information of introduced websites can thus be obtained accurately and in a timely manner, which supports the website information requester in scheduling content accurately.

Description

Method, system and device for providing website content information
Technical field
The present invention relates to the field of data services, and in particular to a method, system and device for providing website content information.
Background art
An Internet data center (IDC) can introduce website content in order to serve users. The IDC can provide the website content service to users either directly or indirectly through a content delivery network (CDN).
At present, an IDC generally introduces a website content source in a non-full manner, that is, it introduces only some channels or some content of a website. To obtain, update and manage information about the hosted website content, the IDC usually relies on one of two approaches. In the first, the content provider manually declares its site resources, from which the IDC learns the introduced site resource information. In the second, an IDC administrator manually obtains and configures the introduced site resource information.
Of these approaches, the first depends heavily on active operation by the content provider and cannot guarantee that the website content index meets accuracy and timeliness requirements. The second consumes a large amount of IDC management labor, is inefficient, and cannot guarantee that the index of introduced content is updated in time. In other words, control over the content introduced into the IDC has so far been achieved only at the device level; content-level control remains coarse, and it is difficult to obtain accurate, timely content index information for IDC-introduced websites at low cost, which hinders precise management of IDC content.
When the existing IDC content introduction and control mechanism is used in a CDN, the situation is as follows. Besides the website content service that mainly draws on content introduced by the IDC, the CDN also contains a cache control (WebCache) system that caches popular website content. The CDN resource scheduling center can uniformly coordinate user access scheduling between the website content introduced by the IDC and the website content cached by the WebCache system, so that users access both kinds of content in a reasonable manner.
Generally, because the IDC introduces website content directly from the content provider, the content it introduces is updated more promptly than the content cached by the WebCache system. It is therefore usually desirable to schedule users preferentially to the website content introduced by the IDC. However, because the WebCache system also exists in the CDN, content requested by a user may already have been introduced into the IDC and yet the request may be scheduled to the WebCache system, which wastes both IDC system resources and WebCache cache resources. To avoid such conflicts, detailed knowledge of the website content introduced by the IDC is required.
Under the existing IDC content introduction and control mechanism, information about the website content introduced by the IDC is updated mainly by hand, by the provider or the IDC administrator; updates are slow, the operation is complex, accuracy is low, and real-time performance is poor. After an IDC website content source is updated, the CDN bus system cannot learn of the change in IDC content in time. Consequently, when a user accesses content, an access conflict may arise between the website content introduced by the IDC and the website content cached by WebCache, and the CDN bus cannot tell whether it should preferentially schedule the user to the site resource introduced by the IDC or to the one cached by WebCache.
Summary of the invention
Embodiments of the invention provide a method, system and device for providing website content information, in order to solve the problem in the prior art that the website content introduced by an IDC cannot be known accurately, so that user content access requests cannot be scheduled accurately and system resources are wasted.
A method for providing website content information comprises:
Crawling from the obtained initial link information of an introduced website to obtain the link information that the introduced website contains, and obtaining the linked objects of the link information and their attribute information;
Building, according to the obtained linked objects of the link information and their attribute information, a linked object index corresponding to the link information;
Building a site resource view for each introduced website according to the association relations among the linked object indexes of the link information, wherein the site resource view contains the linked object indexes corresponding to the link information of each introduced website, arranged according to a set rule;
Providing website content information to a site information requester according to the built site resource view.
A website content information provider unit comprises:
A search module, configured to crawl from the obtained initial link information of an introduced website to obtain the link information that the introduced website contains, and to obtain the linked objects of the link information and their attribute information;
An index module, configured to build a linked object index corresponding to the link information according to the obtained linked objects and their attribute information;
A view resource generation module, configured to build a site resource view for each introduced website according to the association relations among the linked object indexes of the link information, wherein the site resource view contains the linked object indexes corresponding to the link information of each introduced website, arranged according to a set rule;
An access retrieval module, configured to provide website content information to a site information requester according to the built site resource view.
A website content information providing system comprises the above website content information provider unit and at least one site information requesting service.
The beneficial effects of the present invention are as follows:
With the method, system and device for providing website content information provided by the embodiments of the invention, the initial link information of a website, all link information associated with it level by level, and the corresponding linked objects are indexed in association with one another, and a site resource view is built. The website content information introduced by the IDC can therefore be known accurately and provided to users, while query time is saved and system resource usage is reduced. Even when a site resource introduced by the IDC and a site resource cached by WebCache exist at the same time, the user can be scheduled preferentially to the site resource introduced by the IDC, which avoids conflicts during content access scheduling and saves system resources.
Brief description of the drawings
The drawings described here provide a further understanding of the invention and form a part of it. The illustrative embodiments of the invention and their descriptions are used to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of the method for providing website content information in an embodiment of the invention;
Fig. 2 is a schematic structural diagram of the website content information providing system in an embodiment of the invention;
Fig. 3 is a schematic structural diagram of the website content information provider unit in an embodiment of the invention;
Fig. 4 is a detailed structural diagram of the website content information providing system in an embodiment of the invention;
Fig. 5 is a flowchart of the website content information provider unit generating a resource view in an embodiment of the invention;
Fig. 6 is a flowchart of the method for providing website content information in Embodiment 1 of the invention;
Fig. 7 is a flowchart of the method for providing website content information in Embodiment 2 of the invention.
Detailed description of the embodiments
To make the technical problem to be solved, the technical solution and the beneficial effects of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.
An embodiment of the invention provides a method for providing website content information that obtains an up-to-date website content index by building a site resource view. The initial link information of a website, all link information associated with it, and the corresponding linked objects are indexed in association with one another; a site resource view is built; and website content information is provided on the basis of that view. The flow of the method, shown in Fig. 1, comprises the following steps:
Step S11: crawl from the obtained initial link information of the introduced website to obtain the link information that the introduced website contains, and obtain the linked objects of that link information and their attribute information.
Specifically, the crawl is performed according to both the obtained initial link information of the introduced website and a preconfigured search strategy. The search strategy comprises one or a combination of the following: a depth-first strategy, a breadth-first strategy and a focused search strategy.
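The three strategies differ mainly in how the next link is taken from the crawl frontier. The following is a minimal sketch, not the patent's implementation: `GRAPH`, `visit_order` and the `score` function are hypothetical names, the link graph is a small in-memory stand-in for a real site, and the focused strategy is approximated with a priority queue ordered by an assumed relevance score.

```python
import heapq
from collections import deque

# Hypothetical link graph standing in for an introduced website: page -> links on it.
GRAPH = {"/": ["/a", "/b"], "/a": ["/a/1"], "/b": ["/b/1"], "/a/1": [], "/b/1": []}

def visit_order(start, strategy, score=None):
    """Return the order in which pages are visited under the given strategy."""
    seen = {start}
    order = []
    if strategy == "focused":
        # Focused search (assumption): always expand the highest-scoring known link.
        frontier = [(-score(start), start)]
        while frontier:
            _, url = heapq.heappop(frontier)
            order.append(url)
            for nxt in GRAPH[url]:
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(frontier, (-score(nxt), nxt))
        return order
    frontier = deque([start])
    while frontier:
        # Breadth-first takes the oldest link (queue); depth-first the newest (stack).
        url = frontier.popleft() if strategy == "breadth" else frontier.pop()
        order.append(url)
        for nxt in GRAPH[url]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return order

bfs = visit_order("/", "breadth")
dfs = visit_order("/", "depth")
focused = visit_order("/", "focused", score=lambda u: u.count("1"))
```

A combination of strategies, as the text allows, could for instance use breadth-first ordering down to a depth limit and a relevance score beyond it; that choice is left open here.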
Starting from the obtained initial link information, the crawl can find all associated link information. That is, the initial linked object and its attributes are obtained from the initial link information; the crawl then continues from the current linked object to obtain new link information, and the objects and attributes corresponding to each new link are obtained in turn. The initial link information may be the top-level domain name of the initial page, for example a Uniform Resource Locator (URL), and the associated link information may be each URL found by crawling the web pages. The linked object of each item of link information is then obtained by crawling over all the link information found, that is, the initial link information together with the associated link information.
A linked object comprises the web page and/or file corresponding to the link information. The attribute information of a linked object comprises one or a combination of the following: link value, link type, page title, number of times crawled, crawl time, crawl depth, whether this is the first crawl, default encoding, page snapshot, file object name and object type.
For example, the crawl can be performed by a website content information provider unit. Starting from one or more top-level domain name links of introduced websites supplied by the IDC service equipment, the unit crawls the URLs on the initial page. For each URL, the crawler saves the attribute information of the linked object (a web page, a file, and so on) corresponding to that link, including but not limited to the link value, link type, page title, number of times crawled, crawl time, crawl depth, whether this is the first crawl, default encoding, page snapshot, file object name and object type. At the same time, the crawler keeps extracting new URLs from the current page and placing them in a queue. Once the current page has been analyzed, it takes a new URL from the queue and continues crawling page or object information until a preset stop condition is met.
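The queue-driven crawl loop described above can be sketched as follows. This is an illustration under stated assumptions rather than the patent's code: `FAKE_SITE` replaces real HTTP fetching with an in-memory table, `LinkRecord` keeps only a few of the listed attributes, and the stop condition is assumed to be a page limit (`max_pages`).

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical stand-in for fetching pages: URL -> (page title, links found on the page).
FAKE_SITE = {
    "http://example.com/": ("Home", ["http://example.com/news/", "http://example.com/a.pdf"]),
    "http://example.com/news/": ("News", ["http://example.com/", "http://example.com/news/1.html"]),
    "http://example.com/news/1.html": ("Story 1", []),
    "http://example.com/a.pdf": ("", []),
}

@dataclass
class LinkRecord:
    url: str          # link value
    title: str        # page title (empty for non-HTML objects)
    depth: int        # crawl depth relative to the seed
    object_type: str  # "file" or "page", guessed from the URL suffix

def crawl(seeds, max_pages=100):
    """Breadth-first crawl from the seed URLs until the queue is empty or max_pages is reached."""
    queue = deque((url, 0) for url in seeds)
    records = {}
    while queue and len(records) < max_pages:
        url, depth = queue.popleft()
        if url in records or url not in FAKE_SITE:
            continue  # already crawled, or not reachable in this toy site
        title, links = FAKE_SITE[url]
        kind = "file" if url.endswith(".pdf") else "page"
        records[url] = LinkRecord(url, title, depth, kind)
        # New URLs found on the current page go to the back of the queue.
        queue.extend((link, depth + 1) for link in links)
    return records

records = crawl(["http://example.com/"])
```

In a real deployment the table lookup would be replaced by an HTTP fetch and an HTML link extractor, and further attributes such as crawl time or page snapshot would be added to the record.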
Step S12: build, according to the obtained linked objects of the link information and their attribute information, the linked object index corresponding to the obtained link information.
A content index is built for each linked object from the linked objects contained in the obtained link information and their attribute information, and the association relations among the items of link information are determined from the path information of the obtained link information. After analysis and filtering, a linked object index is built that comprises both the association relations among the items of link information and the content indexes of the linked objects they contain.
Processing the crawled link information, together with the linked objects and their attribute information, comprises building the content index of each linked object and associating the data of the linked objects. The content index of each linked object is built by indexing the information obtained by the crawler, such as the link value, link type, page title, number of times crawled, crawl time, crawl depth, whether this is the first crawl, default encoding, page snapshot, file object name and object type. The URL paths recorded during crawling are used to determine the parent-child relations between different URLs, which form the association relations among the content indexes. The result is the linked object index of each item of link information, which provides the data for generating the global site resource view.
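One way to derive parent-child relations from URL paths is to walk each path upward until a crawled URL is found. The sketch below assumes this nearest-ancestor rule, which the text does not spell out; `build_parent_map` and `_key` are hypothetical helper names.

```python
import posixpath
from urllib.parse import urlsplit

def _key(url):
    """Normalize a URL to (host, path without trailing slash) for comparison."""
    parts = urlsplit(url)
    return (parts.netloc, parts.path.rstrip("/"))

def build_parent_map(urls):
    """Map each crawled URL to its nearest crawled ancestor by path, or None for a root."""
    by_key = {_key(u): u for u in urls}
    parents = {}
    for url in urls:
        host, path = _key(url)
        parent = None
        while path:
            # Step one directory up; the site root normalizes to the empty path.
            path = posixpath.dirname(path).rstrip("/")
            if (host, path) in by_key:
                parent = by_key[(host, path)]
                break
        parents[url] = parent
    return parents

parents = build_parent_map([
    "http://example.com/",
    "http://example.com/news/",
    "http://example.com/news/1.html",
    "http://example.com/docs/x/a.pdf",
])
```

Here `a.pdf` is attached to the site root because neither `/docs/x` nor `/docs` was crawled; an implementation could instead record the referring page as the parent.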
Preferably, before the linked object index corresponding to the link information is built, the method further comprises deduplicating the crawled link information and the linked objects and their attribute information. An MD5 (Message-Digest Algorithm 5) digest can be computed over the crawled link information and the linked objects and their attribute information; the computed MD5 value is used to judge whether the link information is identical to link information for which a linked object index has already been built, and if so, no further linked object index is built. Other means of making this judgment can of course also be used. For example, once a URL has been crawled successfully it does not need to be crawled again within the update period, but other web pages may contain the same URL, so URLs must be deduplicated. The system computes the MD5 digest of each URL already crawled and compares MD5 values to guarantee URL uniqueness: a URL whose MD5 value matches one already crawled is not crawled again.
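The MD5-based URL deduplication can be sketched with Python's standard `hashlib` module. The class name `UrlDeduplicator` is hypothetical, and handling of the update period (after which a URL may be crawled again) is omitted.

```python
import hashlib

class UrlDeduplicator:
    """Remember the MD5 digest of every crawled URL and reject repeats."""

    def __init__(self):
        self._seen = set()

    def is_new(self, url):
        # Compare digests rather than raw URLs, as the text describes.
        digest = hashlib.md5(url.encode("utf-8")).hexdigest()
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True

dedup = UrlDeduplicator()
first = dedup.is_new("http://example.com/news/")
second = dedup.is_new("http://example.com/news/")
other = dedup.is_new("http://example.com/sports/")
```

Storing fixed-length digests keeps the seen-set compact regardless of URL length; MD5 is used here only as a fingerprint, not for security.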
Through the indexing, association, cleaning, deduplication and other processing of the data related to the crawled link information described above, standard IDC content index data is generated and the linked object index of each item of link information is obtained.
Step S13: build a site resource view for each introduced website according to the association relations among the linked object indexes of the link information. The site resource view contains the linked object indexes corresponding to the link information of each introduced website, arranged according to a set rule.
The site resource view of each introduced website can be built from the association relations among the linked object indexes established during indexing, for example the parent-child relations among the items of link information.
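As an illustration, a site resource view could be represented as a tree built from the parent-child relations. The sketch below assumes that the "set rule" is alphabetical ordering of siblings and that each index entry is reduced to its URL; both are assumptions, and `build_view` is a hypothetical name.

```python
def build_view(parents):
    """Turn a {url: parent_url_or_None} map into a list of nested tree nodes."""
    children = {}
    for url, parent in parents.items():
        children.setdefault(parent, []).append(url)

    def node(url):
        # Assumed arrangement rule: siblings sorted alphabetically by URL.
        return {"url": url, "children": [node(c) for c in sorted(children.get(url, []))]}

    # Roots are the entries without a parent, one per introduced website.
    return [node(root) for root in sorted(children.get(None, []))]

view = build_view({"/": None, "/b": "/", "/a": "/", "/a/1": "/a"})
```

Other arrangement rules, such as ordering by crawl depth or by update time, would only change the sort key.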
Step S14: provide website content information to the site information requester according to the built site resource view.
Website content information is generally provided to the site information requester through a site information requesting service. Either the resource view is delivered to the site information requesting service, or a query interface is opened so that the site information requesting service can query the resource view; in both cases the website content information query facility then provides website content information to the site information requester according to the site resource view. The IDC site information requesting service may be a CDN resource bus, an IDC business platform, or another IDC site information requesting service.
Optionally, website content information can also be provided to the site information requester directly according to the built site resource view, without going through a website content information query facility.
Delivering the resource view to the site information requesting service specifically comprises: in response to a view acquisition request sent by the site information requesting service, either delivering the built site resource view to the service as is, or first adjusting the configuration of the built view according to the configuration requirements in the request and then delivering the adjusted view. The site information requesting service then provides the requested website content information to the site information requester according to the view it obtained, which includes providing a website access scheduling service or an IDC website management service according to the delivered site resource view.
Opening a query interface for the site information requesting service to query the resource view specifically comprises: in response to a view resource query request sent by the site information requesting service, opening a query interface to the service, and providing through that interface either the built site resource view or a view whose configuration has been adjusted according to the configuration requirements in the request. The site information requesting service then provides the requested website content information to the site information requester according to the view it queried, which includes providing a website access scheduling service or an IDC website management service according to the provided site resource view.
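The two delivery modes can be illustrated with a small sketch. It assumes a tree-shaped view and a single hypothetical configuration requirement, a maximum depth; `trim_view` and `view_contains` are illustrative names, not interfaces defined by the patent.

```python
# Hypothetical site resource view: nested nodes with "url" and "children".
VIEW = [{"url": "/", "children": [
    {"url": "/a", "children": [{"url": "/a/1", "children": []}]},
    {"url": "/b", "children": []},
]}]

def trim_view(view, max_depth):
    """Return a copy of the view adjusted to an assumed 'max_depth' requirement."""
    if max_depth < 0:
        return []
    return [{"url": n["url"], "children": trim_view(n["children"], max_depth - 1)}
            for n in view]

def view_contains(view, url):
    """Query-interface style lookup: is this URL part of the introduced content?"""
    return any(n["url"] == url or view_contains(n["children"], url) for n in view)

full_hit = view_contains(VIEW, "/a/1")
top_level_only = trim_view(VIEW, 0)
trimmed_hit = view_contains(top_level_only, "/a/1")
```

A requesting service such as a CDN resource bus could use a lookup like `view_contains` to decide whether a requested URL should be scheduled to the IDC-introduced content rather than to the WebCache copy.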
Based on the method for providing website content information described above, an embodiment of the invention also provides a website content information providing system. Its structure, shown in Fig. 2, comprises the website content information provider unit described above and at least one site information requesting service, which can be connected through interfaces such as IF1 and IF2. For example, the website content information provider unit may be an IDC content information synchronization device, and the site information requesting service may be an IDC business platform, a CDN resource bus, and so on. The website content information provider unit exchanges information with IDC site information requesting services such as the IDC business platform and the CDN resource bus, and makes the resource views built for introduced websites available to them for download or query.
The CDN resource bus can implement functions such as resource management, content management and user scheduling. After obtaining, through the website content information provider unit, the resource view of the website content introduced by the IDC, it schedules user access requests reasonably and distributes IDC website content to suitable CDN content nodes and service nodes according to the content distribution policy. The IDC business platform can implement hardware management and software management. Its software management can provide website content providers with basic content configuration and management functions, and can supply basic domain name information and the like to the IDC content information synchronization device.
Based on the method for providing website content information described above, an embodiment of the invention also provides a website content information provider unit whose structure, shown in Fig. 3, comprises a search module 10, an index module 20, a view resource generation module 30 and an access retrieval module 40.
The search module 10 is configured to crawl from the obtained initial link information of the introduced website to obtain the link information that the introduced website contains, and to obtain the linked objects of the link information and their attribute information.
The index module 20 is configured to build, according to the attribute information of the obtained linked objects, the linked object index corresponding to the obtained link information.
The view resource generation module 30 is configured to build a site resource view for each introduced website according to the association relations among the linked object indexes of the obtained link information, wherein the site resource view contains the linked object indexes corresponding to the link information of each introduced website, arranged according to a set rule.
The access retrieval module 40 is configured to provide website content information to the site information requester according to the built site resource view.
Preferably, the website content information provider unit further comprises a search strategy management module 50, wherein:
The search strategy management module 50 is configured to configure the search strategy, which comprises one or a combination of the following: a depth-first strategy, a breadth-first strategy and a focused search strategy.
Correspondingly, the search module 10 is specifically configured to crawl according to the obtained initial link information of the introduced website and the preconfigured search strategy.
Preferably, the index module 20 is specifically configured to build the content index of each linked object according to the linked objects of the obtained link information and their attribute information, to determine the association relations among the items of link information according to the path information of the obtained link information, and to build a linked object index that comprises the association relations among the items of link information and the content indexes of the linked objects they contain.
Preferably, the index module 20 is further configured to deduplicate the crawled link information and the linked objects and their attribute information before building the linked object index corresponding to the crawled link information.
Preferably, the access retrieval module 40 is specifically configured to act in one of two ways. In response to a view acquisition request sent by the site information requesting service, it either delivers the built site resource view to the service as is, or adjusts the configuration of the built view according to the configuration requirements in the request and delivers the adjusted view; the site information requesting service then provides the requested website content information to the site information requester according to the delivered view. Alternatively, in response to a view resource query request sent by the site information requesting service, it opens a query interface to the service and provides through that interface either the built site resource view or a view whose configuration has been adjusted according to the configuration requirements in the view resource query request; the site information requesting service then provides the requested website content information to the site information requester according to the queried view.
Preferably, the website content information provider unit further comprises a local-domain control module 60, configured to control the range over which the search module crawls.
The detailed structure of the website content information providing system is shown in Fig. 4. The website content information provider unit comprises the search module 10, the index module 20, the view resource generation module 30, the access retrieval module 40, the search strategy management module 50, the local-domain control module 60, a system management module 70 and a portal (Portal) module 80, wherein:
The access retrieval module 40 of the website content information provider unit handles communication with the site information requesting service, so that website content information can be provided to the site information requester according to the built site resource view. Based on interface protocols, the access retrieval module 40 exchanges data with external devices such as the CDN resource bus and the IDC business platform. Acting as the server, it provides authentication, verifying the account and password of each external device; it imports the initial link information of IDC-introduced websites; and it sends out IDC website content resource information data. It is the technical layer that shields the upper layers from differences in underlying access methods.
The Portal module 80 provides the portal through which administrators manage, maintain and access the website content information provider unit. The system is based on a B/S (browser/server) architecture and provides the web pages needed for functions such as user login, log query, statistical reports and management. It is the technical layer for user interaction.
The search module 10 crawls the introduced website. Using the standard HTTP protocol, and following the initial link information of the introduced website supplied by the site information requesting service and the Portal module 80 together with the search strategy set by the search strategy management module, it retrieves the content of the IDC-introduced website within the local domain, traversing all link information of the IDC website in the local domain and the corresponding linked objects and their attribute information.
The index module 20 indexes the data related to the crawled link information and builds the linked object index. It parses the link information retrieved by the search module 10 and the data about linked objects such as web pages and files, and, after extraction, association, cleaning, deduplication and other processing steps, generates standard IDC content index data and obtains the linked object index corresponding to each item of link information.
The view resource generation module 30 generates the site resource view of the introduced website from the IDC content index data produced by the index module 20, for use when scheduling users.
On the basis of the IDC content index data built by the index module 20, the access retrieval module 40 can also provide a search function with which the Portal module and the site information requesting service query the content of IDC-introduced websites.
The search strategy management module 50 allows an administrator to configure and manage search strategies, such as depth-first search, breadth-first search and focused search rules, which are invoked by the search module 10.
The local-domain control module 60 configures and manages the local-domain search policy and controls the search range of the search module 10, restricting search operations to the inside of the introduced website within the local domain rather than following links to linked objects on IDC machine rooms or servers in other domains.
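A simple form of this range control is a host check applied to every candidate URL before it enters the crawl queue. The sketch below assumes that the local domain is defined by a set of allowed host names and that subdomains count as local; `in_local_domain` is a hypothetical helper.

```python
from urllib.parse import urlsplit

def in_local_domain(url, allowed_hosts):
    """Return True if the URL's host is an allowed host or one of its subdomains."""
    host = urlsplit(url).hostname or ""
    return any(host == h or host.endswith("." + h) for h in allowed_hosts)

allowed = {"example.com"}
keep = in_local_domain("http://news.example.com/today.html", allowed)
drop_other = in_local_domain("http://example.org/", allowed)
drop_lookalike = in_local_domain("http://badexample.com/", allowed)
```

Matching on `"." + h` rather than a bare suffix prevents look-alike hosts such as `badexample.com` from passing the check.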
System management module 70 provides the local network management function and is an optional module. It monitors and manages system availability, device performance, and network indicators in real time. For example, it obtains the resource usage and health status of the website content information providing system in real time; collects the alarm information generated in the system in a unified way and invokes the corresponding handling policy according to the alarm level; interfaces with the upper-level network management system through the network management interface for data collection and forwarding; records and analyzes the data produced by monitoring, records the operation logs of users of the system, and provides statistics on the query records of upper- and lower-level systems; automatically generates standard reports and various customized reports to support the analysis of different management needs; and configures the configuration information of external network elements, presented through the Portal. It also provides tiered permission management, ensuring that users in different roles can use only the functions, and view and maintain only the data, for which they are authorized.
The website content information providing device described above supports configuring, through the IDC business platform, the Portal interface, or the CDN resource bus, the IDC website content resource subscription requirements of different site information requesting devices, and can generate different resource view files for different site information requesting devices. To ensure the security and independence of the files, the website content information providing device should store the website content resource view files for different site information requesting devices under different paths and control access to them through different access usernames and permissions.
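As a non-limiting illustration, keeping view files for different requesting devices under separate paths with separate access accounts might look like the sketch below. The directory layout, the `.view` file suffix, the account names, and the functions `view_file_path` and `can_access` are all hypothetical.

```python
import re
from pathlib import PurePosixPath


def _safe(name):
    """Reduce a name to characters that are safe in a single path segment."""
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "_", name).strip(".")
    return cleaned or "_"


def view_file_path(base_dir, requester_id, site_domain):
    """One directory per requesting device, one view file per introduced website."""
    return PurePosixPath(base_dir) / _safe(requester_id) / f"{_safe(site_domain)}.view"


def can_access(username, path, acl):
    """acl maps an access username to the only directory it may read from."""
    root = acl.get(username)
    return root is not None and PurePosixPath(root) in PurePosixPath(path).parents


# Hypothetical accounts for two requesting devices.
acl = {"cdn_user": "/views/cdn-bus", "idc_user": "/views/idc-platform"}
path = view_file_path("/views", "cdn-bus", "example.com")
```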
The interaction flow between the modules of the website content information providing device described above is shown in Figure 5 and comprises the following steps:
Step S21: The search module requests the search strategy from the search strategy management module.
The search module in the website content information providing device requests the crawler's search strategy from the search strategy management module.
Step S22: The search strategy management module returns the configured search strategy to the search module.
For example, the search strategy management module returns the crawler search strategy to the search module.
Step S23: The search module requests the local-domain control strategy from the local-domain control module.
Step S24: The local-domain control module returns the configured local-domain control strategy to the search module.
As described in the method section above, the scope of the search module's crawl search can be determined from the local-domain control strategy.
Step S25: The search module performs the crawl search according to the configured search strategy and local-domain control strategy.
Following the configured search strategy, the search module's crawler obtains, within the scope specified by the local-domain control strategy, the link information of the specified website, the corresponding linked objects, and the attribute information of those linked objects.
For the specific implementation, see step S11.
Step S26: The search module sends the data found for each piece of link information, such as the linked objects and their attribute information, to the index module.
Step S27: The index module processes the data found by the search module and generates the linked object index corresponding to the obtained link information.
For the specific implementation, see step S12.
Step S28: The index module sends the generated index information, such as the linked object index, to the resource view generation module.
Step S29: The resource view generation module processes the index data and generates the site resource view.
For the specific implementation, see step S13.
The above describes how the modules of the website content information providing device interact to generate the site resource view.
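As a non-limiting illustration, the ordering of steps S21 to S29 can be expressed as a single coordinating function in which each module is passed in as a callable. The function `generate_site_resource_view` and the stub modules in the usage lines are hypothetical and stand in for the real modules.

```python
def generate_site_resource_view(seed, strategy_manager, domain_control,
                                crawl, build_index, build_view):
    """Run the module interactions in the order of steps S21 to S29."""
    strategy = strategy_manager()               # S21-S22: request and receive the strategy
    policy = domain_control()                   # S23-S24: request and receive the scope policy
    crawl_data = crawl(seed, strategy, policy)  # S25: crawl within the scope
    index = build_index(crawl_data)             # S26-S27: hand over data, build the index
    return build_view(index)                    # S28-S29: hand over index, build the view


# Usage with stub modules (placeholders for the real ones).
view = generate_site_resource_view(
    "http://example.com/",
    strategy_manager=lambda: "breadth_first",
    domain_control=lambda: {"example.com"},
    crawl=lambda seed, strategy, policy: [seed],
    build_index=lambda data: {url: {"url": url} for url in data},
    build_view=lambda index: sorted(index),
)
```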
The website content information providing device described above mainly supports two ways of delivering the content of websites introduced by the IDC. The first is a File Transfer Protocol (FTP) service: the site information requesting device first initiates an acquisition request for the site resource view of a specific introduced website, and the website content information providing device parses the request, generates the resource view information for the corresponding scope, and makes it available for the business platform to download. The second is interaction with the site information requesting device through HyperText Transfer Protocol (HTTP) plus web service (WebService): the site information requesting device initiates a query request for the site resource view of a specific introduced website, and the website content information providing device returns the corresponding resource view information to it. The implementation of the website content information providing method under these two data delivery modes is described below through specific embodiments:
Embodiment one
The website content information providing method provided by Embodiment One of the present invention offers download of the site resource view through a file interface. Its flow is shown in Figure 6 and comprises the following steps:
Step S101: The site information requesting device transmits the initial link information of the website to the website content information providing device.
For example, an operator transmits the original information of the IDC website, including the domain name and the initial crawl link, to the website content information providing device through the IDC business platform or the Portal interface.
Step S102: The website content information providing device performs a crawl search on the IDC website server through the IDC business platform interface.
Step S103: Each piece of link information corresponding to the initial link information is obtained from the IDC website server, together with the linked object corresponding to each piece of link information and the attribute information of the linked object.
Step S104: The website content information providing device builds the linked object index from the linked objects and linked object attribute information, obtained by the crawl, that correspond to each piece of link information.
Through data processing operations, the website content information providing device builds the linked object index for each piece of link information contained in the introduced website.
Step S105: The website content information providing device generates the standard IDC site resource view.
Step S106: The site information requesting device sends a view acquisition request to the website content information providing device.
Through its interface with the website content information providing device, the site information requesting device uploads the configuration requirements for the view resource and downloads the site resource view. As shown in Figure 6, the site information requesting device can be the IDC business platform or the CDN resource bus.
Step S107: The website content information providing device adjusts the configuration of the established site resource view according to the configuration requirements in the view acquisition request.
This step is optional; it is skipped when the view acquisition request carries no configuration requirements. When the view acquisition request carries configuration requirements, the website content information providing device outputs, according to the configuration requirements of the site information requesting device, an IDC site resource view file that meets those requirements and stores it under the corresponding path for the site information requesting device to download.
Step S108: The site information requesting device downloads the requested site resource view from the website content information providing device.
The site information requesting device establishes a connection with the website content information providing device as needed and downloads the site resource view file from it.
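As a non-limiting illustration, the optional adjustment of step S107 and the production of the downloadable file content might be sketched as follows. The configuration key `types`, the JSON file format, and the functions `adjust_view` and `render_view_file` are assumptions made for this sketch; the original text does not specify the form of the configuration requirements or of the view file.

```python
import json


def adjust_view(view_entries, config=None):
    """Step S107: apply configuration requirements, or skip when none are carried."""
    if not config:
        return list(view_entries)
    allowed_types = config.get("types")  # hypothetical requirement: object types to keep
    return [
        entry for entry in view_entries
        if allowed_types is None or entry["type"] in allowed_types
    ]


def render_view_file(view_entries, config=None):
    """Serialize the (possibly adjusted) view as the content of a downloadable file."""
    return json.dumps(adjust_view(view_entries, config), indent=2, sort_keys=True)


# Hypothetical view entries and a request that only wants file objects.
entries = [
    {"url": "http://example.com/a/b.html", "type": "page"},
    {"url": "http://example.com/a/c.zip", "type": "file"},
]
file_content = render_view_file(entries, {"types": ["file"]})
```

The rendered content would then be stored under the path reserved for the requesting device and fetched by it in step S108.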
Embodiment two
The website content information providing method provided by Embodiment Two of the present invention offers site resource view queries through a real-time query interface. Its flow is shown in Figure 7 and comprises the following steps:
Step S201: The site information requesting device transmits the initial link information of the website to the website content information providing device.
The website content information providing device exposes a WebService or other real-time message interface. An operator transmits the original information of the IDC website, including the domain name and the initial crawl link, to the website content information providing device through the IDC business platform or the Portal interface.
Step S202: The website content information providing device performs a crawl search on the IDC website server through the IDC business platform interface.
Step S203: Each piece of link information corresponding to the initial link information is obtained from the IDC website server, together with the linked object corresponding to each piece of link information and the attribute information of the linked object.
Step S204: The website content information providing device builds the linked object index from the linked objects and linked object attribute information, obtained by the crawl, that correspond to each piece of link information.
Through data processing operations, the website content information providing device builds the linked object index for each piece of link information contained in the introduced website.
Step S205: The website content information providing device generates the standard IDC site resource view.
Step S206: The site information requesting device requests to log in to the website content information providing device.
When the site information requesting device needs to obtain the site resource view, it sends a login request to the website content information providing device.
Step S207: The website content information providing device responds to the login request of the site information requesting device.
Optionally, the website content information providing device may allow the business platform to log in only after authenticating the site information requesting device.
Step S208: The site information requesting device sends a view resource query request to the website content information providing device.
Through its interface with the website content information providing device, the site information requesting device uploads the configuration requirements for the view resource and queries the site resource view. As shown in Figure 7, the site information requesting device can be the IDC business platform or the CDN resource bus.
Step S209: The website content information providing device adjusts the configuration of the established site resource view according to the configuration requirements in the view resource query request.
This step is optional; it is skipped when the view resource query request carries no configuration requirements. When the view resource query request carries configuration requirements, the website content information providing device outputs, according to the configuration requirements of the site information requesting device, an IDC site resource view that meets those requirements and stores it under the corresponding path for the site information requesting device to query.
Step S210: The website content information providing device responds to the view resource query request of the site information requesting device.
The site information requesting device establishes a connection with the website content information providing device as needed and queries the site resource view from it.
Step S211: The site information requesting device sends a logout request to the website content information providing device.
When the site information requesting device no longer needs to obtain the site resource view, it sends a logout request to the website content information providing device.
Step S212: The website content information providing device responds to the logout request of the site information requesting device.
The website content information providing device invalidates the login information of the site information requesting device.
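As a non-limiting illustration, the login, query, and logout exchange of steps S206 to S212 could be modeled with a session token held by the providing device. The class `ViewQueryService`, its method names, and the plain-text credential table are hypothetical; the original text does not specify the authentication scheme, and a real system would not store passwords in plain text.

```python
import secrets


class ViewQueryService:
    """In-memory stand-in for the real-time query interface."""

    def __init__(self, views, credentials):
        self._views = views              # site -> site resource view
        self._credentials = credentials  # username -> password (illustration only)
        self._sessions = {}              # token -> username

    def login(self, username, password):
        """S206-S207: authenticate the requesting device and open a session."""
        if username not in self._credentials or self._credentials[username] != password:
            raise PermissionError("authentication failed")
        token = secrets.token_hex(16)
        self._sessions[token] = username
        return token

    def query(self, token, site):
        """S208-S210: return the site resource view to a logged-in device."""
        if token not in self._sessions:
            raise PermissionError("not logged in")
        return self._views.get(site, [])

    def logout(self, token):
        """S211-S212: invalidate the login information."""
        self._sessions.pop(token, None)


# Usage with hypothetical data.
service = ViewQueryService({"example.com": ["http://example.com/"]}, {"cdn": "secret"})
token = service.login("cdn", "secret")
result = service.query(token, "example.com")
service.logout(token)
```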
The website content information providing method and device provided by the embodiments of the present invention can automatically access the IDC website server over HTTP and collect website content information, and can control the range of URLs the crawler obtains so that only information on site resources introduced in the specific IDC domain is collected. For the URL information obtained, URL association, deduplication, and similar processing are supported, producing index information down to the linked object level. According to the needs of different site information requesting devices, different site resource view information can be generated flexibly and provided to them. A file mode is supported, in which site resource view files are generated on demand and provided to the site information requesting device. A message-based real-time query mode is also supported, in which the site information requesting device interacts with the website content information providing device through an interface and actively initiates an IDC site resource view query request, and the website content information providing device returns the queried site resource view to the site information requesting device.
The above method effectively addresses the current drawbacks of configuring or collecting IDC site information manually, namely inefficient information synchronization, poor accuracy, low speed, and untimely synchronization. It offers automatic collection, integration, and processing with high efficiency and good real-time performance, and can further improve the timeliness and accuracy with which IDC site information is provided, strengthening the CDN network's ability to schedule site resources intelligently.
The above method concentrates the processing of IDC site resource information in the newly added website content information providing device, so that site information requesting devices do not each have to integrate and process IDC site information themselves. This effectively reduces the complexity and functional requirements of implementing IDC website content management, lowers the construction and investment cost of business-side equipment, and gives site information requesting devices a fast and efficient way to obtain IDC site resource information.
The above description illustrates and describes the preferred embodiments of the present invention. As stated earlier, however, it should be understood that the present invention is not limited to the forms disclosed herein, and this description should not be regarded as excluding other embodiments. The invention can be used in various other combinations, modifications, and environments, and can be changed within the scope of the inventive concept described herein through the above teachings or the technology or knowledge of related fields. Changes and variations made by those skilled in the art that do not depart from the spirit and scope of the present invention shall all fall within the protection scope of the appended claims of the present invention.

Claims (11)

1. A website content information providing method, characterized in that it comprises:
performing a crawl search according to the obtained initial link information of an introduced website to obtain the link information contained in the introduced website, and obtaining the linked object of the link information and the attribute information thereof;
building the linked object index corresponding to the link information according to the obtained linked object of the link information and the attribute information thereof;
building the site resource view of each introduced website according to the association relationships between the linked object indexes of the pieces of link information, wherein the site resource view contains the linked object indexes, arranged according to a set rule, that correspond to the link information of each introduced website;
providing website content information to the site information requesting party according to the established site resource view.
2. The method of claim 1, characterized in that performing the crawl search according to the obtained initial link information of the introduced website specifically comprises:
performing the crawl search according to the obtained initial link information of the introduced website and a pre-configured search strategy, wherein the search strategy comprises one or a combination of the following strategies: a depth-first strategy, a breadth-first strategy, and a focused search strategy.
3. The method of claim 1, characterized in that building the linked object index corresponding to the link information according to the obtained linked object of the link information and the attribute information thereof specifically comprises:
building the content index of each linked object according to the linked object corresponding to the obtained link information and the attribute information thereof, and determining the association relationships between the pieces of link information according to the path information of the link information; and building the linked object index, which comprises the association relationships of the pieces of link information and the content index of the linked object corresponding to each piece of link information.
4. The method of claim 1, characterized in that providing website content information to the site information requesting party according to the established site resource view specifically comprises:
according to a view acquisition request sent by a site information requesting device, providing the established site resource view to the site information requesting device, or providing it to the site information requesting device after adjusting the configuration of the established site resource view according to the configuration requirements in the view acquisition request, whereupon the site information requesting device provides the requested website content information to the site information requesting party according to the provided site resource view; or
according to a view resource query request sent by a site information requesting device, opening a query interface to the site information requesting device, and providing to the site information requesting device through the query interface either the established site resource view or the site resource view obtained by adjusting the configuration of the established site resource view according to the configuration requirements in the view resource query request, whereupon the site information requesting device provides the requested website content information to the site information requesting party according to the queried site resource view.
5. A website content information providing device, characterized in that it comprises:
a search module, configured to perform a crawl search according to the obtained initial link information of an introduced website to obtain the link information contained in the introduced website, and to obtain the linked object of the link information and the attribute information thereof;
an index module, configured to build the linked object index corresponding to the link information according to the obtained linked object and the attribute information thereof;
a view resource generation module, configured to build the site resource view of each introduced website according to the association relationships between the linked object indexes of the pieces of link information, wherein the site resource view contains the linked object indexes, arranged according to a set rule, that correspond to the link information of each introduced website;
an access retrieval module, configured to provide website content information to the site information requesting party according to the established site resource view.
6. The device of claim 5, characterized in that it further comprises a search strategy management module;
the search strategy management module is configured to configure the search strategy, the search strategy including but not limited to one or a combination of the following strategies: a depth-first strategy, a breadth-first strategy, and a focused search strategy;
the search module is specifically configured to perform the crawl search according to the obtained initial link information of the introduced website and the pre-configured search strategy.
7. The device of claim 5, characterized in that the index module is specifically configured to:
build the content index of each linked object according to the linked object contained in the obtained link information and the attribute information thereof, and determine the association relationships between the pieces of link information according to the path information of the link information; and build the linked object index, which comprises the association relationships of the pieces of link information and the content index of the linked object corresponding to each piece of link information.
8. The device of claim 5, characterized in that the access retrieval module is specifically configured to:
according to a view acquisition request sent by a site information requesting device, provide the established site resource view to the site information requesting device, or provide it to the site information requesting device after adjusting the configuration of the established site resource view according to the configuration requirements in the view acquisition request, whereupon the site information requesting device provides the requested website content information to the site information requesting party according to the provided site resource view; or
according to a view resource query request sent by a site information requesting device, open a query interface to the site information requesting device, and provide to the site information requesting device through the query interface either the established site resource view or the site resource view obtained by adjusting the configuration of the established site resource view according to the configuration requirements in the view resource query request, whereupon the site information requesting device provides the requested website content information to the site information requesting party according to the queried site resource view.
9. The device of any one of claims 5 to 8, characterized in that the index module is further configured to:
before building the linked object index corresponding to the link information found by the crawl search, deduplicate the link information found by the crawl search and the linked objects of the link information and the attribute information thereof.
10. The device of any one of claims 5 to 8, characterized in that it further comprises:
a local-domain control module, configured to control the scope of the search module's crawl search.
11. A website content information providing system, characterized in that it comprises the website content information providing device of any one of claims 5 to 10 and at least one site information requesting device.
CN2011103626460A 2011-11-16 2011-11-16 Providing method, system and device of website content information Pending CN103116580A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011103626460A CN103116580A (en) 2011-11-16 2011-11-16 Providing method, system and device of website content information

Publications (1)

Publication Number Publication Date
CN103116580A true CN103116580A (en) 2013-05-22

Family

ID=48414957

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011103626460A Pending CN103116580A (en) 2011-11-16 2011-11-16 Providing method, system and device of website content information

Country Status (1)

Country Link
CN (1) CN103116580A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103532944A (en) * 2013-10-08 2014-01-22 百度在线网络技术(北京)有限公司 Method and device for capturing unknown attack
CN104484424A (en) * 2014-12-19 2015-04-01 浪潮通用软件有限公司 Establishing method for resource price information base of construction enterprise based on internet
CN105183919A (en) * 2015-10-13 2015-12-23 郑州悉知信息科技股份有限公司 Deployment method and device for internal links of website
CN106168977A (en) * 2016-07-15 2016-11-30 河南山谷网安科技股份有限公司 A kind of column recognition methods for web portal security monitoring
CN109542402A (en) * 2018-10-12 2019-03-29 杭州工跃机械制造有限公司 A method of adaptive is used for more portal website's seamless switchings
CN110008390A (en) * 2019-02-27 2019-07-12 深圳壹账通智能科技有限公司 Appraisal procedure, device, computer equipment and the storage medium of application program
CN113660178A (en) * 2021-06-30 2021-11-16 新浪网技术(中国)有限公司 CDN content management system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6038610A (en) * 1996-07-17 2000-03-14 Microsoft Corporation Storage of sitemaps at server sites for holding information regarding content
US7296222B1 (en) * 1999-04-16 2007-11-13 International Business Machines Corporation Method and system for preparing and displaying page structures for web sites
US7599920B1 (en) * 2006-10-12 2009-10-06 Google Inc. System and method for enabling website owners to manage crawl rate in a website indexing system
CN102124460A (en) * 2008-04-04 2011-07-13 微软公司 Standard schema and user interface for website maps

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103532944A (en) * 2013-10-08 2014-01-22 百度在线网络技术(北京)有限公司 Method and device for capturing unknown attack
CN103532944B (en) * 2013-10-08 2016-09-07 百度在线网络技术(北京)有限公司 A kind of method and apparatus capturing unknown attack
CN104484424A (en) * 2014-12-19 2015-04-01 浪潮通用软件有限公司 Establishing method for resource price information base of construction enterprise based on internet
CN105183919A (en) * 2015-10-13 2015-12-23 郑州悉知信息科技股份有限公司 Deployment method and device for internal links of website
CN105183919B (en) * 2015-10-13 2018-10-12 郑州悉知信息科技股份有限公司 The dispositions method and device of chain in a kind of website
CN106168977A (en) * 2016-07-15 2016-11-30 河南山谷网安科技股份有限公司 A kind of column recognition methods for web portal security monitoring
CN106168977B (en) * 2016-07-15 2019-07-02 山谷网安科技股份有限公司 A kind of column recognition methods for web portal security monitoring
CN109542402A (en) * 2018-10-12 2019-03-29 杭州工跃机械制造有限公司 A method of adaptive is used for more portal website's seamless switchings
CN110008390A (en) * 2019-02-27 2019-07-12 深圳壹账通智能科技有限公司 Appraisal procedure, device, computer equipment and the storage medium of application program
CN113660178A (en) * 2021-06-30 2021-11-16 新浪网技术(中国)有限公司 CDN content management system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20130522
