WO2002029590A1 - Method and apparatus for transforming contents on the web - Google Patents

Publication number
WO2002029590A1
Authority
WO
WIPO (PCT)
Prior art keywords
contents
web contents
semantic analysis
web
information
Application number
PCT/US2001/030691
Other languages
French (fr)
Other versions
WO2002029590A8 (en)
Inventor
Akio Yamamoto
Original Assignee
Hewlett-Packard Company
Application filed by Hewlett-Packard Company
Priority to US10/381,507 (US20040054973A1)
Priority to KR10-2003-7004677A (KR20030079919A)
Priority to EP01981345A (EP1323051A1)
Publication of WO2002029590A1
Publication of WO2002029590A8

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/04 Protocols specially adapted for terminals or networks with limited capabilities; specially adapted for terminal portability
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/957 Browsing optimisation, e.g. caching or content distillation
    • G06F16/9577 Optimising the visualization of content, e.g. distillation of HTML documents
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/2866 Architectures; Arrangements
    • H04L67/30 Profiles
    • H04L67/303 Terminal profiles
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/565 Conversion or adaptation of application format or content
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/565 Conversion or adaptation of application format or content
    • H04L67/5651 Reducing the amount or size of exchanged application data

Abstract

Web contents (40) requested by a user (client device) (20), and the results of the semantic analysis (50) of the Web contents, are retrieved. A contents transformation system (10) appropriately transforms the requested Web contents (40) on the basis of the information items of the Web contents and the semantic analysis results, and in accordance with the user's requests or the attributes of the client device, whereupon the transformed Web contents are transmitted to the client device (20). Thus, even the user of a palmtop computer, a handheld computer or a portable telephone whose display panel is small in size can access the Web contents conveniently and efficiently.

Description

METHOD AND APPARATUS FOR TRANSFORMING CONTENTS ON THE WEB
BACKGROUND OF THE INVENTION:
FIELD OF THE INVENTION:
The present invention relates to a method for providing document contents by a Web server. More particularly, it relates to a method and an apparatus in which, in providing Web contents to a client (or browser), a document is appropriately transformed on the basis of the results of the semantic analysis of the contents.
DESCRIPTION OF THE RELATED ART:
The Internet, a network of computers distributed all over the world, is widely recognized as an important and effective medium through which computers communicate with one another. The World Wide Web, which consists of server computers (Web servers) connected to the Internet and storing contents information (Web pages), and of a multiplicity of clients that access the information, is the information providing service on the Internet that has attracted the most attention in recent years. The service can provide and exchange not only text information but also graphics and image information, audio and video information, etc. Intranets, which are the private computer networks of enterprises, likewise make it easy to provide and share information within an enterprise and are in widespread use. A Web browser having a graphical user interface, such as Netscape Navigator or Internet Explorer, running on a computer has usually been employed to access the information provided by the Internet and the intranets.
Owing to the recent rapid progress of mobile computing technology, the number of clients who use not only conventional desktop computers but also palmtop or handheld computers has increased. In addition, more people have come to access the Internet using portable telephones adapted to be connected with networks. In general, in a mobile device such as a palmtop/handheld computer or a portable telephone, the display panel is smaller than that of a desktop computer and is often inferior in capabilities such as color display. As a result, unless Web contents are transformed in some way, part of the Web contents displayable on the display panel of the desktop computer cannot be displayed on that of the mobile device in some cases. Moreover, the Web contents might fail to be displayed correctly because of the performance limits of the mobile terminal device, such as the size of the installed memory and the bandwidth of the network connection.
A prior-art example for coping with these problems is schematically shown in Fig. 1. The method mainly adopted, as shown in the figure, transforms Web contents in conformity with the properties of the device used for access. By way of example, a large color image is reduced in size and transformed into a low-resolution black-and-white image, as stated in Japanese Patents Laid-Open No. 345178/1999, No. 122958/2000, No. 222275/2000 and No. 222276/2000. In addition, document contents are subjected to processing such as altering the font or font size of a text, or dividing the contents into smaller parts, each of which can be displayed on the display panel of the mobile device. Nevertheless, this method has the drawbacks described below.
With a transformation that conforms to the properties of the mobile terminal used by a client, the Web contents remain essentially the same, and the transformation merely makes the contents easier to display on, for example, a small display panel. Moreover, when the method for dividing the document contents is not appropriate, access to the contents may become complicated and inconvenience the client.
SUMMARY OF THE INVENTION:
In view of the above drawbacks, an object of the present invention is to transform Web contents so that, in addition to making the contents easier to display on the display panel of a mobile device, a more efficient access facility can be provided to the user of the mobile terminal device.
Another object of the present invention is to transform Web contents so that a navigation mechanism can be realized whose hyperlinks permit a client to readily judge whether or not the contents are necessary for him/her without going through all the contents, and to move immediately to a place within the contents that seems to be important.
Still another object of the present invention is to transform Web contents so that a facility which permits a client to browse information with the least access (communication), as above, can be provided not only for contents composed of a single document but also for enormous contents composed of a plurality of documents.
According to the present invention, when a request for Web contents is received from a terminal device, the requested Web contents are analyzed, and editorial information as well as formal paragraph information is extracted. This information, together with the requested contents, is linked to the corresponding semantic analysis results. In the absence of corresponding semantic analysis results, a semantic analysis program is executed for the requested Web contents so as to extract keywords, key sentences and/or key paragraphs from the Web contents. A summary of the contents is also created. The semantic information items thus obtained are saved as the semantic analysis results. Subsequently, the requested document contents are appropriately transformed on the basis of the semantic information contained in the retrieved semantic analysis results, and in accordance with the requests of a client or the attributes of the terminal device. Here, the transformation processing includes the creation of a top page formed of the title and other editorial information of the document together with menu information; the creation of a summary page; the creation of lists of keywords, key sentences, etc., with links to the places where they appear; and the creation of hyperlinks among the created pages. The Web contents are displayed on the terminal device interactively in compliance with the requests of the client.
BRIEF DESCRIPTION OF THE DRAWINGS:
Fig. 1 is a block diagram showing an information access system in the prior art;
Fig. 2 is a block diagram showing the architecture of an apparatus according to the present invention;
Fig. 3 is a flow chart showing an embodiment of the present invention;
Fig. 4 is a diagram showing an example of the list of keywords in the present invention;
Fig. 5 is a diagram showing the logical structure of a transformation description object; and
Fig. 6 is a diagram for explaining user operations in the present invention.
PREFERRED EMBODIMENTS OF THE INVENTION:
A block diagram of an information access system for performing the present invention is shown in Fig. 2. A contents transformation system 10 physically lies between a client device or terminal device 20 and Web contents 40 which a client searches for, and it functions as the interface between them. The contents transformation system 10 may also reside within a server computer 30. When the server computer 30 receives a request for access to Web contents 40 desired by the client from the terminal device 20, which is connected through a communication network such as the Internet, the transformation system 10 accesses the Web contents 40 and the semantic analysis results 50 corresponding to the Web contents 40.
The "semantic analysis results 50" signify results which are obtained by extracting and analyzing semantic information contained in the Web contents 40 and are stored, and which can be generated beforehand by executing a semantic analysis program for the Web contents 40. In the absence of such semantic analysis results when the server computer 30 has received a request for access to Web contents 40, the semantic analysis program is executed to generate the semantic analysis results 50. Using a Web contents analyzer 120 and a semantic analysis results analyzer 130, the transformation system 10 generates a transformation description object 110 by employing the elements of the Web contents 40 requested by the client and the elements of the corresponding semantic analysis results 50. The transformation description object 110 contains information on the links between the lists of the elements contained in the Web contents 40 and the semantic analysis results 50, and Web contents corresponding to the elements. While the client and the contents transformation system 10 are communicating interactively, the transformation system 10 searches for information desired by the client, in conformity with the properties of the terminal device 20 possessed by the client or in compliance with a request made by the client, and it transmits the desired information to the terminal device 20 through the server computer 30 so as to indicate the information on the display thereof.
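As an illustration only, the transformation description object 110 described above could be modeled as a small data structure that ties lists of elements to the places in the Web contents where they occur. The sketch below is in Python; the class names, field names and the shape of the input dictionaries are all assumptions made for this example and are not taken from the specification.

```python
from dataclasses import dataclass, field


@dataclass
class ElementLink:
    """Link from one listed element (e.g. a keyword) to places in the contents."""
    label: str                    # text shown in the list, e.g. the keyword itself
    paragraph_indexes: list[int]  # paragraphs of the Web contents where it appears


@dataclass
class TransformationDescription:
    """Hypothetical stand-in for the transformation description object 110."""
    editorial: dict[str, str]     # elements taken from the Web contents 40
    paragraphs: list[str]         # body of the Web contents 40
    keywords: list[ElementLink] = field(default_factory=list)  # from the results 50

    def targets(self, keyword: str) -> list[str]:
        """Return the paragraphs that a keyword entry links to."""
        for link in self.keywords:
            if link.label == keyword:
                return [self.paragraphs[i] for i in link.paragraph_indexes]
        return []


def build_description(contents: dict, analysis: dict) -> TransformationDescription:
    """Combine elements of the contents with elements of the analysis results."""
    links = [ElementLink(word, places) for word, places in analysis["keywords"].items()]
    return TransformationDescription(contents["editorial"], contents["paragraphs"], links)
```

Under these assumptions, selecting a keyword on the terminal would amount to calling `targets()` and sending back only the linked paragraphs rather than the whole document.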
Numeral 140 designates a transformation engine which will be explained later.
Now, an embodiment of the present invention will be described. The flow chart of the embodiment is illustrated in Fig. 3.
Step 210: A terminal device makes a request for access to Web contents.
Step 220: The results of a semantic analysis concerning the requested Web contents are retrieved.
Step 230: It is checked if the semantic analysis results are found.
Step 240: Unless the semantic analysis results are found, a semantic analysis program is executed.
Step 250: A transformation description object is generated by analyzing the Web contents and the semantic analysis results.
Step 260: Each element of the Web contents is transformed in accordance with the request of a user and the attributes of the terminal device.
Step 270: The transformed elements are transmitted, and are displayed on the terminal device.
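The flow of steps 210 to 270 can be sketched roughly as follows. This is a minimal illustration in Python and not the implementation of the embodiment; the function names, the in-memory cache, the stand-in analysis routine and the fixed-size paging are all assumptions made only to show the order of the steps.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory store of semantic analysis results, keyed by URL.
analysis_cache: Dict[str, dict] = {}


def run_semantic_analysis(contents: str) -> dict:
    """Stand-in for the semantic analysis program (step 240).

    Assumption: long words are treated as keywords, purely for illustration.
    """
    words = contents.split()
    return {"keywords": sorted({w.lower() for w in words if len(w) > 6})}


def handle_request(url: str, fetch: Callable[[str], str], max_chars: int) -> List[str]:
    contents = fetch(url)                          # step 210: requested contents
    results = analysis_cache.get(url)              # step 220: retrieve results
    if results is None:                            # step 230: were they found?
        results = run_semantic_analysis(contents)  # step 240: analyze on demand
        analysis_cache[url] = results
    # Step 250: transformation description object (here just a plain dict).
    tdo = {"url": url, "keywords": results["keywords"], "body": contents}
    # Step 260: transform to fit the terminal (here: fixed-size pages).
    body = tdo["body"]
    pages = [body[i:i + max_chars] for i in range(0, len(body), max_chars)]
    return pages                                   # step 270: transmit pages
```

A second request for the same URL skips step 240 because the results are already cached.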
The embodiment will be described in detail below. A request for access to certain Web contents is transmitted from the client device 20 (in Fig. 2), connected through the communication network such as the Internet, to the server computer 30 by using the HyperText Transfer Protocol (HTTP) over a Transmission Control Protocol/Internet Protocol (TCP/IP) connection. The Web contents are formatted in a standard page description language such as the Extensible Markup Language (XML).
The operation of contents transformation which proceeds in the contents transformation system 10 is broadly made up of two processing stages.
At the first stage, the contents transformation system 10 analyzes the corresponding Web contents by means of the Web contents analyzer 120 so as to extract the elements contained in the Web contents. Extracted are, for example, editorial information such as the title, author and date of a document, and the body of the document, as well as the formal paragraph information constituting them. At the same time, the contents transformation system 10 links the extracted information to the semantic analysis results 50 corresponding to the Web contents 40. Using the link, the system 10 can retrieve the semantic analysis results 50 as required.
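As an illustration of the extraction performed by the Web contents analyzer 120, the following sketch parses a small XML document with Python's standard `xml.etree.ElementTree` module. The element names (`document`, `title`, `author`, `date`, `body`, `p`) and the function name are hypothetical; the source does not specify the schema of the Web contents.

```python
import xml.etree.ElementTree as ET

# Hypothetical Web contents; the tag names are assumptions for illustration.
SAMPLE = """<document>
  <title>Sample Report</title>
  <author>A. Writer</author>
  <date>2000-10-02</date>
  <body>
    <p>First paragraph.</p>
    <p>Second paragraph.</p>
  </body>
</document>"""


def analyze_web_contents(xml_text: str) -> dict:
    """Extract editorial information and body paragraphs from the contents."""
    root = ET.fromstring(xml_text)
    editorial = {
        tag: (root.findtext(tag) or "").strip()
        for tag in ("title", "author", "date")
    }
    # Each <p> under <body> is treated as one formal paragraph.
    paragraphs = [(p.text or "").strip() for p in root.iterfind("./body/p")]
    return {"editorial": editorial, "paragraphs": paragraphs}
```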
The semantic analysis results 50 hold the semantic information of the Web contents 40 in the XML format. The semantic information contains the information of extracted keywords, key sentences or key paragraphs, positions where they appear in the document, and so forth. Also contained is information on a text structure which indicates the semantic consistency of the document as obtained by analyzing the contexts between sentences. The semantic information, however, is not restricted to such exemplary information. An example of parts relevant to keywords, extracted from the semantic analysis results 50, is shown in Fig. 4.
In a case where the semantic analysis results 50 have not been created beforehand, or where they are unavailable for any reason, the semantic analysis program is executed for the requested Web contents 40 so as to extract the semantic information of the contents 40. The semantic information obtained is saved as the semantic analysis results 50 in the XML format. Regarding the extraction of the keyword, a word (noun) with a high frequency of appearance is set as the keyword on the assumption that a word appearing often in the document tends to indicate the theme of the document. A technique for weighting a word in accordance with its rate of appearance is detailed in "Automatic Text Processing" written by G. Salton, published by Addison-Wesley Publishing Company in 1989. Besides, the key sentence is extracted in such a way that the respective words are weighted in consideration of the frequencies of appearance of the words and the number of texts in which the words appear, and that the summation of the weights of the words which appear in a sentence is deemed the level of importance of that sentence. This method has been proposed by K. Zechner, and is stated in "Fast Generation of Abstracts from General Domain Text Corpora by Extracting Relevant Sentences" in the Proceedings of the 16th International Conference on Computational Linguistics, pp. 986-989, 1996. Results obtained by the method are used also in this embodiment.
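The two ideas above, frequency-based keywords and sentence importance as a sum of word weights, can be sketched as follows. This is a simplified illustration under stated assumptions, not the method of the cited works: a small stopword list stands in for noun selection, and the weight is a common smoothed term-frequency/inverse-document-frequency variant chosen for this example. All names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

# Assumption: a tiny stopword list replaces part-of-speech (noun) filtering.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "on"}


def tokenize(text: str) -> List[str]:
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def top_keywords(document: str, n: int) -> List[str]:
    """Return the n most frequent words; ties are broken alphabetically."""
    counts = Counter(tokenize(document))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n]]


def rank_sentences(document: str, corpus: List[str]) -> List[Tuple[float, str]]:
    """Score each sentence by the summed weights of the words it contains."""
    tf = Counter(tokenize(document))
    doc_sets = [set(tokenize(text)) for text in corpus]
    n_docs = len(corpus)

    def weight(word: str) -> float:
        # Frequency in this document, damped by how many texts contain the word.
        df = sum(1 for words in doc_sets if word in words)
        return tf[word] * math.log((1 + n_docs) / (1 + df))

    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    scored = [(sum(weight(w) for w in tokenize(s)), s) for s in sentences]
    return sorted(scored, key=lambda pair: -pair[0])
```

The highest-scoring sentences returned by `rank_sentences` would play the role of key sentences, and the output of `top_keywords` the role of keywords.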
Regarding the semantic structuring of the document, the document is analyzed on the basis of a rhetorical structure analysis advocated by William C. Mann and Sandra A. Thompson. Details concerning this method are stated in "Rhetorical Structure Theory and Text Analysis" which is contained in "Discourse Description: Diverse Linguistic Analyses of a Fund-Raising Text" written by W. C. Mann & S. A. Thompson, published by John Benjamins Publishing Company in 1992.
Subsequently, the contents transformation system 10 analyzes the semantic analysis results 50 by means of the semantic analysis results analyzer 130 so as to extract, for example, the list of keywords, words or word groups deeply relevant to the respective keywords, and information on places where they appear in the document. Similarly extracted are information on the key sentences, key paragraphs and summary of the document.
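A rough sketch of what the semantic analysis results analyzer 130 might do is given below. The XML layout shown here is invented for illustration (it is not the format of Fig. 4), as are the element and attribute names and the function name.

```python
import xml.etree.ElementTree as ET

# Hypothetical semantic analysis results; the schema is an assumption.
RESULTS = """<analysis>
  <keyword text="server">
    <position paragraph="1" offset="8"/>
    <position paragraph="3" offset="0"/>
    <phrase text="Web server"/>
  </keyword>
  <keyword text="client">
    <position paragraph="2" offset="4"/>
  </keyword>
</analysis>"""


def analyze_results(xml_text: str) -> dict:
    """Collect, per keyword, where it appears and which phrases relate to it."""
    root = ET.fromstring(xml_text)
    keywords = {}
    for kw in root.iterfind("keyword"):
        keywords[kw.get("text")] = {
            "positions": [
                (int(p.get("paragraph")), int(p.get("offset")))
                for p in kw.iterfind("position")
            ],
            "phrases": [ph.get("text") for ph in kw.iterfind("phrase")],
        }
    return keywords
```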
Next, using the results of the Web contents analyzer 120 and the semantic analysis results analyzer 130, the contents transformation system 10 creates the transformation description object 110. The transformation description object 110 contains link information for the Web contents 40 in which the lists of the keywords, key sentences etc. and information on the elements thereof are stored. When a client designates a desired one of the elements within the lists of the keywords, key sentences etc., the contents transformation system 10 retrieves relevant information and provides the retrieved information to the client. In this embodiment, the transformation description object 110 has a structure as shown in Fig. 5 and is expressed as an XML document object. The object 110 holds a logical structure which expresses the creation of the following elements:
(a) Top page information
Top page which is formed of the editorial information of the document, such as the title, author and date thereof, menu information having links to the respective information items, and so forth
(b) Summary
Page which contains only the summary of the document
(c) Keyword page information
Keyword page which contains the list of the extracted keywords, and links to places where the keywords appear in the document
(d) Key phrase page information
Key phrase page which contains the list of key phrases relevant to the keywords, and links to places where the key phrases appear in the document
(e) Key sentence page information
Key sentence page which contains the list of the extracted key sentences, and links to places where the key sentences appear in the document
(f) Key paragraph page information
Key paragraph page which contains the list of the extracted key paragraphs, and links to places where the key paragraphs appear in the document
(g) Hyperlinks among the elements
Hyperlinks which indicate the relevance among the created pages
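The logical structure listed above could be held in an XML document object along the following lines. This sketch builds only items (a), (b), (c) and part of (g); the remaining pages would be built analogously. The tag names, attribute names and link format are assumptions, since the actual structure of Fig. 5 is not reproduced here.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def build_tdo(editorial: Dict[str, str], summary: str,
              keywords: Dict[str, List[str]]) -> ET.Element:
    """Build a hypothetical transformation description object as an XML tree."""
    tdo = ET.Element("transformation")
    # (a) Top page carrying the editorial information as attributes.
    top = ET.SubElement(tdo, "top-page", editorial)
    # (b) Summary page.
    ET.SubElement(tdo, "summary-page").text = summary
    # (c) Keyword page: each keyword links to the places where it appears.
    kw_page = ET.SubElement(tdo, "keyword-page")
    for word, places in keywords.items():
        entry = ET.SubElement(kw_page, "keyword", {"text": word})
        for place in places:
            ET.SubElement(entry, "link", {"href": place})
    # (g) Menu hyperlinks from the top page to the other pages.
    for target in ("summary-page", "keyword-page"):
        ET.SubElement(top, "link", {"href": "#" + target})
    return tdo
```

Calling `ET.tostring(tdo, encoding="unicode")` on the result would serialize the object for inspection.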
A method for generating the transformation description object 110 will be explained. First, the transformation engine 140 defines transformation rules, namely, a series of rules for the display aspects, on the client device 20, of the elements included in the Web contents 40 and the semantic analysis results 50; for the information of link destinations in the case where the elements are linked; and so forth. The transformation engine 140 transforms the respective elements included in the Web contents 40 and the semantic analysis results 50, on the basis of the transformation rules defined for all the elements. At this stage, however, the transformation engine 140 does not yet execute the final transformation processing of the contents; it merely builds the logical structure of the transformed contents, that is, generates the object which describes transforming methods for the elements.
The transformed document can have the structure as shown in Fig. 5, as its logical structure. In this embodiment, the logical structure is formed of the top page which contains the editorial information of the document and the links to the summary, keywords and key sentences, the pages which contain the lists of the keywords, key sentences etc. and the links to the places where the keywords, key sentences etc. appear in the document, respectively, and document fragments which are obtained by dividing the body of the document into parts of appropriate size.
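Dividing the body into document fragments of appropriate size might look like the sketch below. It is a simplification: the embodiment divides the document with reference to its semantic structure, whereas this example only respects paragraph boundaries and a character budget. The function name and the `max_chars` parameter are hypothetical.

```python
from typing import List


def split_into_fragments(paragraphs: List[str], max_chars: int) -> List[List[str]]:
    """Group whole paragraphs into fragments of at most max_chars characters.

    A paragraph longer than max_chars is kept intact in its own fragment.
    """
    fragments: List[List[str]] = []
    current: List[str] = []
    used = 0
    for para in paragraphs:
        # Start a new fragment when adding this paragraph would overflow.
        if current and used + len(para) > max_chars:
            fragments.append(current)
            current, used = [], 0
        current.append(para)
        used += len(para)
    if current:
        fragments.append(current)
    return fragments
```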
Further, at the second stage, the transformation processing of the contents is actually executed by the transformation engine 140. An access request from the client device 20 is transmitted to the Web server 30 by using the HTTP protocol. Herein, information items on a communication facility, a display facility, etc. incorporated in the terminal 20 can be contained as parts of an HTTP header. The transformation processing is executed for the respective elements in accordance with the information items on the terminal attributes, and the transformation description object 110 created at the first stage. Thus, the pages of the body of the document are created, while at the same time, the pages and hyperlinks (a) - (g) mentioned above are created.
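Reading terminal attributes from the request headers could be sketched as follows. The header names `X-Display-Columns` and `X-Display-Rows` are hypothetical and chosen only for this example; the source says only that such information can be carried as part of an HTTP header. The default of 80 columns by 24 rows is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass
class TerminalAttributes:
    columns: int
    rows: int

    @property
    def chars_per_page(self) -> int:
        # Rough capacity of one screen, usable as a fragment size budget.
        return self.columns * self.rows


def terminal_from_headers(headers: Mapping[str, str]) -> TerminalAttributes:
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    columns = int(lowered.get("x-display-columns", 80))  # hypothetical header
    rows = int(lowered.get("x-display-rows", 24))        # hypothetical header
    return TerminalAttributes(columns, rows)
```

The resulting `chars_per_page` value could then drive how finely the body of the document is divided for a given terminal.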
An example of the communications between a client or user and the contents transformation system 10 will now be explained with reference to Fig. 6. When the client device 20 displays a top page (a) and the client wants to know information about a "keyword" or a "key phrase relevant to a keyword", he/she selects "keywords" to open a "keyword page" (b). An anchor to a page which contains the list of keywords and key phrases relevant to the respective keywords is indicated on the keyword page (b). When any of the keywords, for example, "keyword 1" is selected on the keyword page (b), the part of the "keyword 1" in the body of a document is displayed. In a case where a plurality of parts exist for the "keyword 1" within the identical document, these parts of the "keyword 1" are displayed in succession. Besides, when the client wants to know information about the "key phrase relevant to the keyword", he/she designates, for example, a "key phrase relevant to the keyword 1" corresponding to the pertinent keyword (keyword 1) on the keyword page (b), thereby to open a "key phrase page" (d). Likewise, when the client selects a "key phrase 1 relevant to the keyword 1", the part of the "key phrase 1 relevant to the keyword 1" in the body of the document is displayed. In a case where a plurality of parts exist for the "key phrase 1 relevant to the keyword 1" within the identical document, these parts of the "key phrase 1 relevant to the keyword 1" are displayed in succession.
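Displaying the parts for a selected keyword "in succession" presupposes an ordered list of the places where it appears. A minimal sketch of producing such a list is shown below; the function name and the (paragraph index, character offset) representation are assumptions, and matching is a plain case-insensitive substring search rather than anything taken from the embodiment.

```python
from typing import List, Tuple


def find_occurrences(paragraphs: List[str], term: str) -> List[Tuple[int, int]]:
    """Return (paragraph index, character offset) for every match, in order."""
    needle = term.lower()
    if not needle:
        return []
    hits: List[Tuple[int, int]] = []
    for index, para in enumerate(paragraphs):
        haystack = para.lower()
        start = haystack.find(needle)
        while start != -1:
            hits.append((index, start))
            # Continue searching after the end of the current match.
            start = haystack.find(needle, start + len(needle))
    return hits
```

Stepping through the returned list one entry at a time would correspond to showing each part of the document for the keyword in turn.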
In this manner, the client can readily grasp the whole document without going through all the document contents. Further, it is possible to cope with even the presence of such a limitation that the display screen of the client device 20 is small.
Accordingly, not only is the display of Web contents on the display panel of a mobile device facilitated, but a more efficient access facility can also be provided to the user of the mobile terminal device. It is also possible to realize a navigation mechanism which has hyperlinks permitting the client to readily judge whether or not the contents are necessary for him/her, without going through all the contents, and permitting the client to immediately move to a place that seems to be important within the contents. Further, a facility which permits the client to browse information by the least access (communication), similarly to the above, can be provided not only for contents composed of a single document, but also for enormous contents composed of a plurality of documents.
Computer program codes for executing the operation of the present invention should desirably be created with an object-oriented programming language such as Java or C++. However, they can also be created with a conventional procedure-oriented programming language such as C, or a functional programming language.
In this embodiment, the contents transformation processing is implemented as a Java Servlet by using the Java programming language and is executed in the Web server 30. Alternatively, the processing can also be implemented as a common gateway interface (CGI) application or as logic contained in an active server page (ASP).
Besides, in this embodiment, all the program codes are executed on the Web server 30. It is also possible, however, to execute some of the program codes on the Web server 30 and the others on a Web proxy.
According to the present invention, not only is the display of document contents on the display panel of a mobile terminal device facilitated, but more efficient access to the contents can also be realized. This is owing to a dynamic contents transformation method in which new hyperlinks based on the key information of a document, such as keywords and key sentences, are generated with reference to the results of the semantic analysis of the document contents, and in which the document contents are appropriately divided on the basis of results obtained by semantically structuring the whole document and of terminal attributes indicating the communication and display facilities incorporated in the terminal device making access. Besides, information providing/browsing can be realized by the least access (communication) even for enormous Web contents, owing to a navigation mechanism which has hyperlinks permitting a client to readily judge, from the summary, key elements, correlated keywords, etc. of at least one pertinent document and without going through all the contents, whether or not the contents are necessary for him/her, and permitting the client to immediately move to a place that seems important within the contents. These functions are very effective for access to the Web contents not only from the mobile terminal device but also from a conventional desktop computer.

Claims

WHAT IS CLAIMED IS:
1. A method for transforming Web contents that contain one or more elements, in order to display the contents on a terminal device connected to a server computer with a communication network, comprising:
(a) the step of allowing said server computer to receive a request for access to said Web contents, from said terminal device;
(b) the step of retrieving semantic analysis results which concern the requested Web contents;
(c) the step of generating a transformation description object which associates at least one of the elements included in said Web contents with said semantic analysis results; and
(d) the step of transforming said at least one element so as to fit attributes of said terminal device, by using said transformation description object.
2. A method as defined in Claim 1, wherein said step of retrieving said semantic analysis results which concern said Web contents includes the step of executing a semantic analysis for said Web contents.
3. A method as defined in Claim 1, wherein said transformation description object is an extensible markup language (XML) document object.
4. A method as defined in Claim 1, wherein said transformation description object contains either of link information for places where said at least one element associated appears within said Web contents, and link information for another of said elements as is relevant to the associated element.
5. A method as defined in Claim 1, wherein said step of generating said transformation description object includes either of the step of dividing said at least one element into a plurality of elements, and the step of integrating the plurality of elements into at least one element.
6. A method as defined in Claim 1, wherein said step of generating said transformation description object includes the step of generating at least one new relevant element by employing at least one of elements included in said Web contents and said semantic analysis results.
7. A method as defined in Claim 1, wherein said step of transforming said at least one element includes the step of transforming said element so as to comply with a request made by a user of said terminal device.
8. An apparatus for transforming Web contents that contain one or more elements, in order to display the contents on a terminal device connected to a server computer with a communication network, comprising:
(a) means for allowing said server computer to receive a request for access to said Web contents, from said terminal device;
(b) means for retrieving semantic analysis results which concern the requested Web contents;
(c) means for generating a transformation description object which associates at least one of the elements included in said Web contents with said semantic analysis results; and
(d) means for transforming said at least one element so as to fit attributes of said terminal device, by using said transformation description object.
PCT/US2001/030691 2000-10-02 2001-10-02 Method and apparatus for transforming contents on the web WO2002029590A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US10/381,507 US20040054973A1 (en) 2000-10-02 2001-10-02 Method and apparatus for transforming contents on the web
KR10-2003-7004677A KR20030079919A (en) 2000-10-02 2001-10-02 Method and apparatus for transforming contents on the web
EP01981345A EP1323051A1 (en) 2000-10-02 2001-10-02 Method and apparatus for transforming contents on the web

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2000302728A JP2002116983A (en) 2000-10-02 2000-10-02 Method and system for converting web contents
JP2000-302728 2000-10-02

Publications (2)

Publication Number Publication Date
WO2002029590A1 (en) 2002-04-11
WO2002029590A8 (en) 2002-07-11

Family

ID=18784035

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2001/030691 WO2002029590A1 (en) 2000-10-02 2001-10-02 Method and apparatus for transforming contents on the web

Country Status (5)

Country Link
EP (1) EP1323051A1 (en)
JP (1) JP2002116983A (en)
KR (1) KR20030079919A (en)
CN (1) CN1254751C (en)
WO (1) WO2002029590A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002095594A1 (en) * 2001-05-18 2002-11-28 Sharp Kabushiki Kaisha Content delivery system, content server, and content delivery method
FR2849308A1 (en) * 2002-12-18 2004-06-25 France Telecom Document summaries providing method, involves transmitting summary document to user terminal as replacement for intercepted document displayed on terminal by navigation software
WO2005022411A1 (en) * 2003-09-01 2005-03-10 Koninklijke Philips Electronics N.V. Interface for transcoding system
CN100351832C (en) * 2003-03-28 2007-11-28 联想(北京)有限公司 Moving browse equipment and method of data self-adapting

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7373347B2 (en) 2002-07-22 2008-05-13 Ricoh Company, Ltd. Information processing apparatus and information processing method
JP2004078655A (en) * 2002-08-20 2004-03-11 Ntt Advanced Technology Corp Information management apparatus, its method, and information management program
CN101258494B (en) * 2005-09-08 2010-10-27 国际商业机器公司 Method and system for improving client-servlet communication
KR100870146B1 (en) * 2006-08-22 2008-11-24 주식회사 미디어워크 A system of learning using mobile unit and a method thereof
FR2935855B1 (en) * 2008-09-11 2010-09-17 Alcatel Lucent METHOD AND COMMUNICATION SYSTEM FOR DETERMINING A SERVICE SEQUENCE RELATED TO A CONVERSATION.
KR101134267B1 (en) 2010-04-14 2012-04-12 한국과학기술원 System and method for converting content
KR101667199B1 (en) * 2015-01-26 2016-10-18 (주)해나소프트 Relative quality index estimation apparatus of the web page using keyword search
CN108733635B (en) * 2017-04-24 2021-12-03 珠海金山办公软件有限公司 Text information display method and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5727159A (en) * 1996-04-10 1998-03-10 Kikinis; Dan System in which a Proxy-Server translates information received from the Internet into a form/format readily usable by low power portable computers
US5991713A (en) * 1997-11-26 1999-11-23 International Business Machines Corp. Efficient method for compressing, storing, searching and transmitting natural language text

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Spyglass Prims 1.0", SPYGLASS INC., 1997, pages 1 - 2, XP002907214 *
"Spyglass prism concepts and applications", SPYGLASS INC., 1997, pages 1 - 8, XP002907213 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002095594A1 (en) * 2001-05-18 2002-11-28 Sharp Kabushiki Kaisha Content delivery system, content server, and content delivery method
US7319470B2 (en) 2001-05-18 2008-01-15 Sharp Kabushiki Kaisha Content delivery system, content server, and content delivery method
FR2849308A1 (en) * 2002-12-18 2004-06-25 France Telecom Document summaries providing method, involves transmitting summary document to user terminal as replacement for intercepted document displayed on terminal by navigation software
CN100351832C (en) * 2003-03-28 2007-11-28 联想(北京)有限公司 Moving browse equipment and method of data self-adapting
WO2005022411A1 (en) * 2003-09-01 2005-03-10 Koninklijke Philips Electronics N.V. Interface for transcoding system

Also Published As

Publication number Publication date
EP1323051A1 (en) 2003-07-02
CN1473297A (en) 2004-02-04
WO2002029590A8 (en) 2002-07-11
CN1254751C (en) 2006-05-03
KR20030079919A (en) 2003-10-10
JP2002116983A (en) 2002-04-19

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): CN KR US

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR

AK Designated states

Kind code of ref document: C1

Designated state(s): CN KR US

AL Designated countries for regional patents

Kind code of ref document: C1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR

CFP Corrected version of a pamphlet front page
CR1 Correction of entry in section i
121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 2001981345

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 1020037004677

Country of ref document: KR

WWE Wipo information: entry into national phase

Ref document number: 018183565

Country of ref document: CN

WWP Wipo information: published in national office

Ref document number: 2001981345

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 10381507

Country of ref document: US

WWP Wipo information: published in national office

Ref document number: 1020037004677

Country of ref document: KR

WWW Wipo information: withdrawn in national office

Ref document number: 2001981345

Country of ref document: EP