WO2008058423A1

WO2008058423A1 - Method for step-type downloading and displaying pdf file

Info

Publication number: WO2008058423A1
Application number: PCT/CN2006/003061
Authority: WO
Inventors: Yuqian Xiong
Original assignee: Yuqian Xiong
Priority date: 2006-11-14
Filing date: 2006-11-14
Publication date: 2008-05-22

Abstract

A method for step-type downloading and displaying a PDF file uses a PDF presentation client and a content server, together with a preprocessor, to achieve true step-type downloading without requiring a different file format.

Description

Step-by-step download method for displaying PDF files

TECHNICAL FIELD

The present invention relates to a method of displaying any PDF document over the Internet.

BACKGROUND OF THE INVENTION

Currently, because of the non-linear organization of the PDF (Portable Document Format) format, PDF files are not optimized for step-by-step downloading over a network: each page in a PDF document depends on various resources, and these resources may appear in different places in the document, so the user must spend a long time downloading before the required document information can be read.

SUMMARY OF THE INVENTION

It is an object of the present invention to provide a method for step-by-step downloading and displaying of a PDF file which uses a PDF presentation client and a content server and, by means of a preprocessor, achieves true step-by-step downloading without having to use a different file format.

The method for step-by-step downloading and displaying a PDF file of the present invention is characterized in that it uses a PDF presentation client program, a content server program and a preprocessor program, and comprises the following steps:

1. When a client wants to display a document, it sends a request to the content server containing the identifier of the document (the document name or the document ID);

2. When the content server receives a document request, it loads the index data corresponding to the document. It then sends the basic information of the document to the client, including but not limited to:

(1) Basic information about the document, such as its author and title;

(2) The total number of pages of the document;

(3) The location and size of each indirect object in the flattened document;

(4) Basic properties of each page, including: page width, page height, indirect ID of the page object;

3. When the client requests a page, it simply sends the page number to the content server;

4. When the content server receives a page request, it sends the following data to the client:

(1) an indirect object representing the requested page;

(2) all remaining indirect objects in the page's dependency list, except those that have already been sent. Each indirect object is sent in its original PDF syntax.
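The messages of steps 1 to 3 can be sketched as small records. The patent does not specify a wire encoding, so this minimal sketch assumes one JSON object per message; the class names, field names and the `type` tag are all hypothetical. The step 4 response consists of raw indirect objects and is not modeled here.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List, Tuple


@dataclass
class DocumentRequest:          # step 1: client -> server
    doc_id: str                 # document name or document ID
    type: str = "document"


@dataclass
class PageInfo:
    width: float                # page width in PDF units
    height: float               # page height in PDF units
    page_obj_id: int            # object number of the page's indirect object


@dataclass
class DocumentInfo:             # step 2: server -> client
    author: str
    title: str
    page_count: int
    locations: Dict[int, Tuple[int, int]]  # object ID -> (offset, size) in the flattened file
    pages: List[PageInfo]
    type: str = "document_info"


@dataclass
class PageRequest:              # step 3: client -> server, page number only
    page_number: int
    type: str = "page"


def encode(message) -> bytes:
    """Serialize any message above as one UTF-8 JSON object."""
    return json.dumps(asdict(message)).encode("utf-8")


def decode(data: bytes) -> dict:
    return json.loads(data.decode("utf-8"))
```

Note that JSON turns the integer object IDs used as `locations` keys into strings, so a real client would convert them back.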

The significant advantages of the present invention are that it is easy to operate, document information is extracted quickly, and the user's work efficiency is effectively improved.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Specific embodiments of the present invention are as follows:

A flat PDF document is a standard PDF document that conforms to the PDF specification, but it does not use the object stream feature. If the initial PDF document contains object streams, the preprocessor will break up the object streams into separate indirect objects.

A standard PDF document contains a cross-reference table, which also appears in the flattened PDF. This table stores the location of every indirect object. If the initial document contains object streams, the preprocessor may need to modify the table. A flattened PDF document can be stored as a file or placed in another type of data store, such as a database.
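For reference, a classic (non-stream) cross-reference table, as it could appear in a flattened PDF, can be generated as follows. This is a minimal sketch assuming every object has generation number 0; `build_xref_table` is a hypothetical helper. Each entry is the fixed 20-byte form `oooooooooo ggggg n` plus a two-character end of line, and missing object numbers are chained into the free list headed by entry 0.

```python
from typing import Dict


def build_xref_table(offsets: Dict[int, int]) -> bytes:
    """Build an 'xref' section for objects 0..max; offsets maps object number -> byte offset."""
    size = max(offsets, default=0) + 1
    # Free entries form a linked list starting at object 0 and ending back at 0.
    free = [0] + [n for n in range(1, size) if n not in offsets]
    next_free = {n: free[(i + 1) % len(free)] for i, n in enumerate(free)}

    out = [b"xref\n", b"0 %d\n" % size]
    for n in range(size):
        if n in offsets:
            # In-use entry: 10-digit offset, generation 0 (assumed), keyword 'n'.
            out.append(b"%010d 00000 n\r\n" % offsets[n])
        else:
            # Free entry: next free object number; object 0 always has generation 65535.
            gen = 65535 if n == 0 else 0
            out.append(b"%010d %05d f\r\n" % (next_free[n], gen))
    return b"".join(out)
```

The trailer and `startxref` line that follow the table are omitted here.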

Step-by-step download is a way of downloading document content from a computer over a network. In this mode, the content is downloaded in an order that lets the document be displayed, and operated on by the user, as early as possible. For example, if the user wants to see the first page of the document, the step-by-step download process should display the first page as soon as its content has been downloaded. As another example, when the user wishes to go directly to the last page of a document, the download process should allow the last page to be loaded immediately, without first downloading the intermediate pages. The request and response steps are the same as those described above.

The flow of the program is as follows:

1. Preprocessing

Before a PDF file is loaded into the content server and presented at a client's request, it must be preprocessed by the preprocessor. The input to the preprocessor is the initial PDF file, and the output is a flattened PDF document and the corresponding index data.

(1) Flat PDF generation

Because the flattened PDF does not use object streams, the preprocessor first examines the original PDF document to detect whether it uses object streams. This is done by searching for cross-reference streams and checking the type flag of each indirect object entry in them. For details, see PDF Reference, Fifth Edition, Section 3.4.6.
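The detection step can be sketched by decoding the entries of a cross-reference stream and collecting those of type 2 (objects stored inside an object stream). This sketch assumes the stream data has already been fully decoded, including any predictor given in `/DecodeParms`; `decode_xref_stream` and `compressed_objects` are hypothetical helpers, and `widths`/`index` stand for the stream's `/W` and `/Index` entries.

```python
from typing import Dict, Iterator, List, Tuple


def decode_xref_stream(data: bytes, widths: List[int],
                       index: List[int]) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (object number, type, field 2, field 3) for each entry.

    `index` is the /Index array, which defaults to [0, Size] when absent.
    """
    w1, w2, w3 = widths
    entry_len = w1 + w2 + w3
    pos = 0

    def field(start: int, width: int, default: int) -> int:
        # A zero-width field is absent; present fields are big-endian integers.
        if width == 0:
            return default
        return int.from_bytes(data[start:start + width], "big")

    for first, count in zip(index[0::2], index[1::2]):
        for obj_num in range(first, first + count):
            entry_type = field(pos, w1, 1)   # type defaults to 1 when /W[0] is 0
            f2 = field(pos + w1, w2, 0)
            f3 = field(pos + w1 + w2, w3, 0)
            yield obj_num, entry_type, f2, f3
            pos += entry_len


def compressed_objects(data: bytes, widths: List[int],
                       index: List[int]) -> Dict[int, Tuple[int, int]]:
    """Object number -> (object stream number, index within that stream)."""
    return {num: (f2, f3)
            for num, t, f2, f3 in decode_xref_stream(data, widths, index)
            if t == 2}
```

A non-empty result from `compressed_objects` means the document uses object streams and must be split.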

If an object stream is used in the initial document, the preprocessor decompresses the object stream and writes all indirect objects in the stream as separate indirect objects. When each individual indirect object is written, the cross-reference list will also be modified accordingly. Finally, the modified cross-reference list is output to the flattened PDF result document.
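Splitting a decompressed object stream into separate indirect objects can be sketched as below. The header of an object stream holds `/N` pairs of integers (object number, offset relative to `/First`), and the objects themselves carry no `obj`/`endobj` wrapper. `split_object_stream` and `as_indirect_object` are hypothetical helpers, and the stream is assumed to be already decompressed.

```python
from typing import Dict


def split_object_stream(data: bytes, n: int, first: int) -> Dict[int, bytes]:
    """Split a decoded object stream into {object number: object body}."""
    header = data[:first].split()
    pairs = [(int(header[2 * i]), first + int(header[2 * i + 1])) for i in range(n)]
    # Each object ends where the next one (by offset) begins.
    starts = sorted(start for _, start in pairs)
    ends = dict(zip(starts, starts[1:] + [len(data)]))
    return {num: data[start:ends[start]].strip() for num, start in pairs}


def as_indirect_object(num: int, body: bytes) -> bytes:
    """Wrap a body as a separate indirect object. Objects stored in object
    streams always have generation number 0."""
    return b"%d 0 obj\n%s\nendobj\n" % (num, body)
```

For example, with `data = b"10 0 11 8 <</A 1>>[1 2 3]"`, `n = 2` and `first = 10`, object 10 is `<</A 1>>` and object 11 is `[1 2 3]`.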

Even if a PDF document does not use object streams, the preprocessor can still load all the indirect objects, write them to a new document, and modify the cross-reference table accordingly. In this process, the preprocessor may fix some common problems in the cross-reference table and create correct index data while rewriting the document.

(2) Generation of index data

The first part of the index data is document-level data, such as the author of the document, the title, the total number of pages, and so on. They can be generated when the initial document is loaded. The second part of the index data is the location and size information of all indirect objects. This data is calculated when each indirect object is written to a flat PDF document.
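The second part of the index data falls out naturally while writing: the offset of each object is the current output length, and its size is the length of the bytes written. A minimal sketch, assuming object bodies are already serialized and every generation number is 0; `write_flat_pdf` is hypothetical, and the `%PDF-1.4` header is only a placeholder (a real preprocessor would keep the original version line).

```python
from typing import Dict, Tuple


def write_flat_pdf(objects: Dict[int, bytes],
                   header: bytes = b"%PDF-1.4\n") -> Tuple[bytes, Dict[int, Tuple[int, int]]]:
    """Write each object as a separate indirect object and record its
    (offset, size) as it is written. The xref table and trailer would follow."""
    out = bytearray(header)
    locations: Dict[int, Tuple[int, int]] = {}
    for num in sorted(objects):
        chunk = b"%d 0 obj\n%s\nendobj\n" % (num, objects[num])
        locations[num] = (len(out), len(chunk))   # index data: offset and size
        out += chunk
    return bytes(out), locations
```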

The third part of the index data is the page level data, including page width, page height, indirect ID of the page object, and so on. This data can be generated after all data in the document has been loaded.

(3) Page dependency detection

The last part of the index data is the list of dependencies. This requires a dependency detection process that takes a page as input and then outputs a list of all indirect objects that the page depends on.

A page depends on an indirect object if the page cannot be displayed properly without that object's data. We use a tree-based traversal to determine the list of dependent objects:

• The process begins with an indirect object that represents the page;

• For each indirect object referenced by the current object, we determine if it satisfies one of the following conditions:

■ The referenced object is part of the page tree data structure (for example, a parent node of the page);

■ The referenced object is used for a feature that the client does not support, such as annotations or internal data structures;

■ The referenced object is already in the list of dependencies.

If a referenced object satisfies any of the above conditions, it is ignored.

• If the referenced indirect object satisfies none of the above conditions, it is added to the list of dependencies, and the same steps are then applied to the objects it references, until all referenced indirect objects have been processed.
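The traversal above can be sketched over pre-parsed objects, where dictionaries, arrays and a small `Ref` type stand in for parsed PDF values (parsing itself is out of scope). `Ref`, `references` and `page_dependencies` are hypothetical names. The unsupported-feature condition is approximated here by skipping dictionary keys such as `Annots`, and `page_tree_ids` is assumed to hold the IDs of all page tree nodes, including the page itself.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterator, List, Set


@dataclass(frozen=True)
class Ref:
    """An indirect reference such as '5 0 R' (generation ignored here)."""
    num: int


def references(value: Any, skip_keys: Set[str]) -> Iterator[Ref]:
    """Yield every Ref inside a parsed value, skipping dictionary entries
    whose key is in skip_keys (features the client does not support)."""
    if isinstance(value, Ref):
        yield value
    elif isinstance(value, dict):
        for key, item in value.items():
            if key not in skip_keys:
                yield from references(item, skip_keys)
    elif isinstance(value, list):
        for item in value:
            yield from references(item, skip_keys)


def page_dependencies(objects: Dict[int, Any], page_id: int,
                      page_tree_ids: Set[int], skip_keys: Set[str]) -> List[int]:
    deps: List[int] = []
    seen: Set[int] = set()
    stack = [page_id]                  # start from the page's indirect object
    while stack:
        current = stack.pop()
        for ref in references(objects[current], skip_keys):
            n = ref.num
            if n in page_tree_ids or n in seen:
                continue               # page tree node, or already listed
            seen.add(n)
            deps.append(n)
            stack.append(n)            # expand the newly added object later
    return deps
```

With a page whose `/Resources` and `/Contents` point to objects 3 and 4, and whose font is object 5, the result is objects 3, 4 and 5. The page's `/Parent` and its skipped `/Annots` entries do not appear in the list.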

In other cases, an object depends on other objects without referencing them directly. One example is the "name tree". In PDF, a target location (chapter, section, page, etc.) can sometimes be represented by a name, and the actual target location is stored in a name tree. In this case, when a page references a named target location, the page depends on the entire name tree.

We have two ways to solve the problem of indirect references:

1) Add the root node of the name tree (or of any other structure that is referenced indirectly in this way) to the list of dependencies and expand from it;

2) Replace indirect references with direct references. For example, we can replace a named target location with a "direct target location" containing data that points directly to the target. Such a replacement does not change the content or appearance of the PDF.
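The second approach can be sketched as a name tree lookup followed by a substitution in a GoTo action. This is an illustration under stated assumptions: objects are pre-parsed Python values, destination names are plain `str`, `names_root` is the root node of the destinations name tree, and `Ref`, `lookup_name_tree` and `make_direct` are hypothetical names. Leaf nodes carry a flat `/Names` array of key/value pairs; intermediate nodes carry `/Kids`, each child bounded by its `/Limits`.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class Ref:
    num: int  # object number of an indirect reference


def lookup_name_tree(objects: Dict[int, Any], node: dict, key: str) -> Optional[Any]:
    if "Names" in node:                       # leaf: [key1 value1 key2 value2 ...]
        names = node["Names"]
        for i in range(0, len(names) - 1, 2):
            if names[i] == key:
                return names[i + 1]
        return None
    for kid_ref in node.get("Kids", []):      # intermediate node: descend by /Limits
        kid = objects[kid_ref.num]
        low, high = kid["Limits"]
        if low <= key <= high:
            return lookup_name_tree(objects, kid, key)
    return None


def make_direct(objects: Dict[int, Any], names_root: dict, action: dict) -> dict:
    """Return a copy of the action with a named /D replaced by the explicit
    destination array; return the action unchanged if it cannot be resolved."""
    dest = action.get("D")
    if not isinstance(dest, str):
        return action                         # already a direct destination
    value = lookup_name_tree(objects, names_root, dest)
    if isinstance(value, Ref):
        value = objects[value.num]
    if isinstance(value, dict):
        value = value.get("D")                # value may be a dictionary with /D
    if value is None:
        return action
    return {**action, "D": value}
```

After this rewrite the page depends only on the objects named in the explicit destination (typically the target page), not on the whole name tree.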

The final output dependency list contains the IDs of all dependent indirect objects, possibly preceded by their total count or followed by an end flag.
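Both framings of the dependency list can be sketched with fixed-width integers. The 4-byte big-endian layout and the `0xFFFFFFFF` end flag are assumptions made for this illustration, and the function names are hypothetical.

```python
import struct
from typing import List

END_FLAG = 0xFFFFFFFF  # hypothetical end marker, assumed never to be a real object ID


def encode_with_count(ids: List[int]) -> bytes:
    # Total count first, then the IDs.
    return struct.pack(">I", len(ids)) + b"".join(struct.pack(">I", i) for i in ids)


def decode_with_count(data: bytes) -> List[int]:
    (n,) = struct.unpack_from(">I", data, 0)
    return list(struct.unpack_from(">%dI" % n, data, 4))


def encode_with_end_flag(ids: List[int]) -> bytes:
    # IDs first, then the end flag.
    return b"".join(struct.pack(">I", i) for i in ids) + struct.pack(">I", END_FLAG)


def decode_with_end_flag(data: bytes) -> List[int]:
    ids = []
    for (value,) in struct.iter_unpack(">I", data):
        if value == END_FLAG:
            break
        ids.append(value)
    return ids
```

A count prefix lets a reader allocate the list up front. An end flag lets the writer emit IDs as the traversal discovers them.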

2. Content server

When the content server starts, it sets up a communication port and waits for a connection to that port.

(1) Serving a document

When the client and the content server establish a connection, the server may receive a request to obtain a specific document.

The content server loads the flattened PDF document and the corresponding index data, and sends the document's basic information back to the requesting client over the network.

For better performance, the index data should remain in memory until the connection between the client and the server is closed.

The content server also maintains, for each document on each active connection, a list of all indirect objects that have already been transmitted.

(2) Serving a page

When the presentation client requests a specific page of the document, the content server queries the index data and finds the following data for the requested page:

• The ID of the indirect object representing the page

• The ID of all indirect objects that the page depends on.

The content server should send the indirect object representing the page, as well as every indirect object in the dependency list that has not been sent before (as recorded in the list of already sent objects). Each object that is sent should be added to the list of already sent objects.

For each indirect object to be transferred, the content server should query the index data to determine its location and size, and read out a specific portion of the flattened PDF document and send that portion to the client.

After all objects have been transmitted, the content server should send a flag to the client informing the client that all data used to display the page has been transferred.
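Serving a page can be sketched as a small per-connection object. In this sketch the flattened PDF is held in memory (a real server would seek and read from its data store), page numbers are 1-based, and messages are tuples passed to a `send` callback: `("object", id, bytes)` for each object and `("end",)` as the completion flag. `ConnectionState`, `serve_page` and this message shape are all hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple


class ConnectionState:
    """Per-connection state kept by the content server for one open document."""

    def __init__(self, flat_pdf: bytes, locations: Dict[int, Tuple[int, int]],
                 pages: List[dict]):
        self.flat_pdf = flat_pdf      # flattened PDF document
        self.locations = locations    # object ID -> (offset, size) from the index data
        self.pages = pages            # per page: {"page_obj_id": int, "dependencies": [int, ...]}
        self.sent: Set[int] = set()   # objects already transmitted on this connection

    def serve_page(self, page_number: int, send: Callable[[tuple], None]) -> None:
        page = self.pages[page_number - 1]
        for obj_id in [page["page_obj_id"], *page["dependencies"]]:
            if obj_id in self.sent:
                continue                            # client already has it
            offset, size = self.locations[obj_id]
            send(("object", obj_id, self.flat_pdf[offset:offset + size]))
            self.sent.add(obj_id)
        send(("end",))                              # all data for this page transferred
```

Requesting a second page that shares resources with the first sends only the objects that are new to this connection.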

3. Presentation client

The presentation client maintains a connection to the content server, sends a request based on the user's actions, and displays the page when the page is available.

While connected, the client maintains an internal representation of the PDF document containing the indirect objects it has received from the content server. This internal document may be incomplete, but it is sufficient to display the pages the user has requested.

(1) Requesting a document

When a user requests a document, the client creates a connection to a specific content server and sends a document request including the ID of the document. When the client receives the server's response, it uses the basic information of all pages to determine the display dimensions (the page layout) of every page. With this layout information, the client can correctly display blank placeholder pages, and it can also set the scroll bar accordingly.
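The page layout can be computed from the page sizes alone, before any page content arrives. This sketch assumes pages are stacked vertically with a fixed gap and a uniform zoom. `layout_pages` and `page_at` are hypothetical helpers: the first returns each page's top offset, the total scrollable height and the widest page, and the second maps a scroll position to the page under it.

```python
from bisect import bisect_right
from typing import List, Tuple


def layout_pages(sizes: List[Tuple[float, float]], gap: float = 10.0,
                 zoom: float = 1.0) -> Tuple[List[float], float, float]:
    """sizes: (width, height) per page. Returns (tops, total_height, max_width)."""
    tops: List[float] = []
    y = 0.0
    for _, height in sizes:
        tops.append(y)
        y += height * zoom + gap
    total_height = y - gap if sizes else 0.0      # no gap after the last page
    max_width = max((w * zoom for w, _ in sizes), default=0.0)
    return tops, total_height, max_width


def page_at(tops: List[float], y: float) -> int:
    """0-based index of the page whose top is at or above scroll offset y."""
    return max(bisect_right(tops, y) - 1, 0)
```

`total_height` and `max_width` are what the scroll bars need. `page_at` tells the client which page to request when the user scrolls.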

(2) Requesting a page

When the document's basic information has been received, or when a user action requires a new page to be displayed, the client sends a page request containing the number of the requested page.

The client then begins receiving the data of all indirect objects needed for that page. The indirect object data is written into the client's internal representation of the PDF document, assembling a temporary document that, although incomplete, can be used to display the requested page.

The client may also request pages that the user has not yet asked to view. This keeps the communication connection busy, so that when the user actually needs a new page (usually the page after the current one), the client can respond more quickly.
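A simple prefetch policy can be sketched as "request the next few pages that have not been requested yet". The look-ahead of two pages and the name `pages_to_prefetch` are assumptions for illustration only.

```python
from typing import List, Set


def pages_to_prefetch(current: int, page_count: int, already_requested: Set[int],
                      lookahead: int = 2) -> List[int]:
    """Pages after `current` (1-based), up to `lookahead` of them, not yet requested."""
    last = min(current + lookahead, page_count)
    return [p for p in range(current + 1, last + 1) if p not in already_requested]
```

Because the server skips objects it has already sent, prefetching a page that shares fonts or images with the current page costs little extra bandwidth.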

(3) Displaying a page

When the client receives the server's notification that all indirect objects required by the requested page have been sent, the page is displayed normally, just as in an ordinary PDF document, even though other pages of the document may still be empty at that point.
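The client side of receiving and displaying can be sketched as a partial document that stores incoming objects and marks a page ready when the end flag arrives. This sketch assumes the server answers requests in order over a single connection, so each end flag completes the oldest pending request. It also assumes the tuple messages `("page", n)`, `("object", id, bytes)` and `("end",)`. `PartialDocument` and its methods are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Dict, Optional, Set


class PartialDocument:
    """Client-side internal representation of a partially downloaded PDF."""

    def __init__(self) -> None:
        self.objects: Dict[int, bytes] = {}   # object ID -> raw indirect-object bytes
        self.pending: Deque[int] = deque()    # requested page numbers, in request order
        self.ready_pages: Set[int] = set()    # pages whose objects have all arrived

    def request_page(self, page_number: int, send: Callable[[tuple], None]) -> None:
        if page_number in self.ready_pages or page_number in self.pending:
            return                            # already available or already requested
        self.pending.append(page_number)
        send(("page", page_number))

    def on_message(self, msg: tuple) -> Optional[int]:
        """Handle one server message; return a page number once it can be displayed."""
        if msg[0] == "object":
            _, obj_id, data = msg
            self.objects[obj_id] = data
        elif msg[0] == "end":
            page = self.pending.popleft()     # oldest outstanding request is complete
            self.ready_pages.add(page)
            return page
        return None

    def can_display(self, page_number: int) -> bool:
        return page_number in self.ready_pages
```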

Claims


A method for step-by-step downloading and displaying a PDF file, using a PDF presentation client program, a content server program, and a preprocessor program, the method comprising the following steps:

A. When a client wants to display a document, it sends a request to the content server containing the identifier of the document (the document name or the document ID);

B. When the content server receives a document request, it loads the index data corresponding to the document. It then sends the basic information of the document to the client, including but not limited to:

(2) The total number of pages of the document;

(3) The location and size of each indirect object in the flattened document;

C. When the client requests a page, it simply sends the page number to the content server;

D. When the content server receives a page request, it sends the following data to the client:

(1) an indirect object representing the requested page;