WO2008058423A1 - Method for step-type downloading and displaying pdf file - Google Patents

Method for step-type downloading and displaying pdf file

Info

Publication number
WO2008058423A1
Authority
WO
WIPO (PCT)
Prior art keywords
document
page
client
indirect
content server
Application number
PCT/CN2006/003061
Other languages
French (fr)
Chinese (zh)
Inventor
Yuqian Xiong
Original Assignee
Yuqian Xiong
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2006-11-14
Filing date
2006-11-14
Publication date
2008-05-22
Application filed by Yuqian Xiong
Priority to PCT/CN2006/003061
Publication of WO2008058423A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

A method for step-by-step downloading and displaying of PDF files uses a PDF presentation client and a content server, together with a preprocessor, to achieve true step-by-step downloading without requiring a different file format.

Description

Method for Step-by-Step Downloading and Displaying PDF Files
TECHNICAL FIELD
The present invention relates to a method for displaying any PDF document on the Internet.
BACKGROUND OF THE INVENTION
Because the PDF (Portable Document Format) format organizes its content non-linearly, PDF files are not optimized for step-by-step downloading over a network: each page in a PDF document depends on various resources, and these resources may appear at different places in the document. As a result, users spend a long time downloading the document information they need to read.
SUMMARY OF THE INVENTION
An object of the present invention is to provide a method for step-by-step downloading and displaying of PDF files. The method uses a PDF presentation client and a content server, together with a preprocessor, to achieve true step-by-step downloading without requiring a different file format.
The method for step-by-step downloading and displaying a PDF file of the present invention is characterized in that it comprises a PDF presentation client program, a content server program, and a preprocessor program, which carry out the following steps:
1. When a client wants to display a document, it sends a request to the content server containing the document's identifier: the document name or document ID;
2. When the content server receives a document request, it loads the index data corresponding to that document. It then sends the document's basic information to the client, including (but not limited to):
(1) basic information about the document, such as its author and title;
(2) the total number of pages in the document;
(3) the location and size of each indirect object in the flattened document;
(4) the basic properties of each page, including page width, page height, and the indirect ID of the page object;
3. When the client requests a page, it simply sends the page number to the content server;
4. When the content server receives a page request, it sends the following data to the client:
(1) the indirect object representing the requested page;
(2) all indirect objects in the page's dependency list, except those that have already been sent. Each indirect object is sent in its original PDF syntax.
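The four steps above amount to a small request/response protocol. A minimal sketch of its messages in Python, assuming one dataclass per message; the class and field names (`DocumentRequest`, `DocumentInfo`, `PageRequest`, `ObjectData`) are hypothetical, and the wire encoding is left open.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class DocumentRequest:      # step 1: client -> server
    doc_id: str             # document name or document ID

@dataclass
class PageInfo:
    width: float
    height: float
    page_obj_id: int        # indirect ID of the page object

@dataclass
class DocumentInfo:         # step 2: server -> client
    author: str
    title: str
    page_count: int
    # object ID -> (byte offset, size) in the flattened document
    locations: Dict[int, Tuple[int, int]] = field(default_factory=dict)
    pages: List[PageInfo] = field(default_factory=list)

@dataclass
class PageRequest:          # step 3: client -> server
    page_number: int        # the page number is the whole payload

@dataclass
class ObjectData:           # step 4: server -> client, one per indirect object
    obj_id: int
    raw: bytes              # the object in its original PDF syntax
```

Because `DocumentInfo` carries every page's size up front, the client can lay out the whole document before any page content arrives; a page request, by contrast, carries only a page number.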
A significant advantage of the present invention is that it is easy to operate and extracts document information quickly, which can effectively improve users' work efficiency.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
Specific embodiments of the present invention are as follows:
A flattened PDF document is a standard PDF document that conforms to the PDF specification but does not use the object stream feature. If the original PDF document contains object streams, the preprocessor breaks them up into separate indirect objects.
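Breaking up an object stream follows from its layout in the PDF specification: after decoding, the stream begins with /N pairs of integers (object number, byte offset relative to /First), and the objects themselves start at byte /First. A minimal Python sketch, assuming the stream has already been located, decompressed, and its /N and /First values read; the function names are hypothetical.

```python
import zlib
from typing import Dict

def split_object_stream(data: bytes, n: int, first: int) -> Dict[int, bytes]:
    """Split a decoded object stream into {object number: object body}.

    `n` and `first` are the /N and /First entries of the stream dictionary.
    """
    header = data[:first].split()
    pairs = [(int(header[2 * i]), int(header[2 * i + 1])) for i in range(n)]
    objects = {}
    for i, (num, offset) in enumerate(pairs):
        start = first + offset
        # Each object ends where the next one begins (or at the end of the data).
        end = first + pairs[i + 1][1] if i + 1 < n else len(data)
        objects[num] = data[start:end].strip()
    return objects

def to_indirect_objects(objects: Dict[int, bytes]) -> bytes:
    """Rewrite each object as a standalone indirect object.

    Objects stored in an object stream always have generation number 0.
    """
    return b"".join(b"%d 0 obj\n%s\nendobj\n" % (num, body)
                    for num, body in objects.items())

if __name__ == "__main__":
    header = b"10 0 11 9\n"
    stored = zlib.compress(header + b"<</A 1>> [1 2 3]")  # as with /Filter /FlateDecode
    decoded = zlib.decompress(stored)
    print(to_indirect_objects(split_object_stream(decoded, 2, len(header))))
```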
A standard PDF document contains a cross-reference table, which also appears in the flattened PDF. This table stores the location of every indirect object. If the original document contains object streams, the preprocessor may need to modify the table.
A flattened PDF document can be stored as a file or in another type of data store, such as a database.
Step-by-step downloading is a way of downloading document content from a computer over a network. In this mode, the order in which document content is downloaded ensures that the document can be displayed and that the user can work with it as soon as possible. For example, if the user wants to see the first page of the document, the step-by-step download process should display the first page as soon as its content has been downloaded. As another example, when a user wants to jump directly to the last page of a document, the download process should allow the last page to be loaded immediately, without loading the intermediate pages first. The steps are as follows:
1. When a client wants to display a document, it sends a request to the content server containing the document's identifier: the document name or document ID;
2. When the content server receives a document request, it loads the index data corresponding to that document. It then sends the document's basic information to the client, including (but not limited to):
(1) basic information about the document, such as its author and title;
(2) the total number of pages in the document;
(3) the location and size of each indirect object in the flattened document;
(4) the basic properties of each page, including page width, page height, and the indirect ID of the page object;
3. When the client requests a page, it simply sends the page number to the content server;
4. When the content server receives a page request, it sends the following data to the client:
(1) the indirect object representing the requested page;
(2) all indirect objects in the page's dependency list, except those that have already been sent. Each indirect object is sent in its original PDF syntax.
The program flow is as follows:
1. Preprocessing
Before a PDF file is loaded by the content server and requested by the presentation client, it must be preprocessed by the preprocessor. The preprocessor's input is the original PDF file; its output is a flattened PDF document and the corresponding index data.
(1) Generating the flattened PDF
Because a flattened PDF does not use object streams, the preprocessor loads the original PDF document and checks whether it uses object streams. This check is done by searching for cross-reference streams and examining the type flag of each indirect object's entry in the cross-reference stream. For details, see PDF Reference, Fifth Edition, Section 3.4.6.
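The check can be sketched as follows, assuming the cross-reference stream has already been found and decoded and its /W and /Index arrays read (/Index defaults to [0 Size] when absent). Per the PDF specification, each entry is a fixed-width row of three big-endian fields; a type field of 2 marks an object stored inside an object stream, with field 2 giving the object stream's number and field 3 the object's index within it. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

def compressed_entries(xref_data: bytes, widths: List[int],
                       index: List[int]) -> Dict[int, Tuple[int, int]]:
    """Return {object number: (object stream number, index in stream)}
    for every type-2 entry of a decoded cross-reference stream."""
    w1, w2, w3 = widths
    row = w1 + w2 + w3
    # /Index holds [first_obj count first_obj count ...] subsection pairs.
    numbers = [num for start, count in zip(index[0::2], index[1::2])
               for num in range(start, start + count)]
    found = {}
    for i, num in enumerate(numbers):
        entry = xref_data[i * row:(i + 1) * row]
        # A zero-width type field means every entry defaults to type 1.
        kind = int.from_bytes(entry[:w1], "big") if w1 else 1
        field2 = int.from_bytes(entry[w1:w1 + w2], "big")
        field3 = int.from_bytes(entry[w1 + w2:row], "big")
        if kind == 2:
            found[num] = (field2, field3)
    return found

def uses_object_streams(xref_data: bytes, widths: List[int],
                        index: List[int]) -> bool:
    return bool(compressed_entries(xref_data, widths, index))
```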
If the original document uses object streams, the preprocessor decompresses each object stream and writes every indirect object in it as a separate indirect object. As each separate indirect object is written, the cross-reference table is updated accordingly. Finally, the modified cross-reference table is written to the resulting flattened PDF document.
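Writing the flattened document and its new cross-reference table can be sketched as below. It assumes object numbers start at 1 and every object is generation 0 (true for objects taken out of object streams, an assumption for the rest), and it writes only /Size and /Root into the trailer, whereas a real writer would carry over the rest of the original trailer. Numbers left unused, such as those of the removed object streams, become free entries linked into the free list. The offsets and sizes it records are also what the index data needs. The function name is hypothetical.

```python
from typing import Dict, Tuple

def write_flat_pdf(objects: Dict[int, bytes],
                   root_id: int) -> Tuple[bytes, Dict[int, Tuple[int, int]]]:
    """Serialize {object number: body} as a PDF with a classic xref table.

    Returns the file bytes and {object number: (offset, size)}.
    """
    out = bytearray(b"%PDF-1.4\n")
    locations = {}
    for num in sorted(objects):
        chunk = b"%d 0 obj\n%s\nendobj\n" % (num, objects[num])
        locations[num] = (len(out), len(chunk))   # where this object landed
        out += chunk

    size = max(objects) + 1
    # Unused numbers (including 0) form the free list; the last links back to 0.
    free = [n for n in range(size) if n not in locations]
    next_free = {n: free[(i + 1) % len(free)] for i, n in enumerate(free)}

    xref_offset = len(out)
    out += b"xref\n0 %d\n" % size
    for num in range(size):
        if num in locations:
            out += b"%010d 00000 n\r\n" % locations[num][0]   # 20-byte in-use entry
        else:
            gen = 65535 if num == 0 else 1
            out += b"%010d %05d f\r\n" % (next_free[num], gen)  # 20-byte free entry
    out += (b"trailer\n<< /Size %d /Root %d 0 R >>\nstartxref\n%d\n%%%%EOF\n"
            % (size, root_id, xref_offset))
    return bytes(out), locations
```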
Even if a PDF document does not use object streams, the preprocessor can still load all indirect objects, write them to a new document, and modify the cross-reference table accordingly. In this process, the preprocessor may fix some common problems in the cross-reference table and create correct index data while rewriting.
(2) Generating the index data
The first part of the index data is document-level data, such as the document's author, title, and total number of pages. It can be generated when the original document is loaded.
The second part of the index data is the location and size of every indirect object. This data is computed as each indirect object is written to the flattened PDF document.
The third part of the index data is page-level data, including page width, page height, the indirect ID of the page object, and so on. This data can be generated after all of the document's data has been loaded.
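Page-level data comes from the page tree. A minimal sketch, assuming the objects have already been parsed into Python dictionaries keyed by object number, with indirect references represented as plain integers (a hypothetical in-memory form). Page order follows a depth-first walk of /Kids, and because /MediaBox is inheritable in PDF, the size lookup walks up /Parent links; /Rotate and /CropBox are ignored here. All function names are hypothetical.

```python
from typing import Dict, List, Tuple

def pages_in_order(node_id: int, objects: Dict[int, dict]) -> List[int]:
    """Page object IDs in document order (depth-first walk of /Kids)."""
    node = objects[node_id]
    if node.get("Type") == "Page":
        return [node_id]
    return [page for kid in node.get("Kids", [])
            for page in pages_in_order(kid, objects)]

def page_size(page_id: int, objects: Dict[int, dict]) -> Tuple[float, float]:
    """Width and height from /MediaBox, inherited from ancestors if absent."""
    node = objects[page_id]
    while "MediaBox" not in node:
        if "Parent" not in node:
            raise ValueError(f"page {page_id} has no MediaBox")
        node = objects[node["Parent"]]
    llx, lly, urx, ury = node["MediaBox"]
    return urx - llx, ury - lly

def page_index(root_pages_id: int, objects: Dict[int, dict]) -> List[dict]:
    """Page-level index records, one per page, in page order."""
    records = []
    for page_id in pages_in_order(root_pages_id, objects):
        width, height = page_size(page_id, objects)
        records.append({"width": width, "height": height, "page_obj_id": page_id})
    return records
```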
(3) Page dependency detection
The last part of the index data is the dependency list. Producing it requires a dependency detection process that takes a page as input and outputs a list of all indirect objects that the page depends on.
A page depends on an indirect object if the page cannot be displayed correctly without that object's data. We use a tree-based index lookup procedure to determine the list of dependent objects:
• The process starts from the indirect object that represents the page;
• For each indirect object referenced by the current object, we check whether it meets any of the following conditions:
■ The referenced object is in a child list of the page tree data structure;
■ The referenced object is used for features that the client does not support, such as annotations and internal data structures;
■ The referenced object is already in the dependency list.
If the referenced object meets any of these conditions, it is ignored.
• If the referenced indirect object meets none of these conditions, it is added to the dependency list, and the steps above continue until every referenced indirect object has been processed.
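The procedure above can be sketched in Python as a depth-first walk; the text does not fix the visiting order, so depth-first is an arbitrary choice here. It assumes each object's references have already been extracted into a map, and that the preprocessor has already classified which objects are page-tree nodes and which serve features the client does not support. All names are hypothetical. The requested page is itself a page-tree node, so back-references to it (for example from an annotation) are skipped as well.

```python
from typing import Dict, List, Set

def page_dependencies(page_id: int,
                      refs: Dict[int, List[int]],
                      page_tree_nodes: Set[int],
                      unsupported: Set[int]) -> List[int]:
    """List every indirect object the page needs, excluding the page itself."""
    deps: List[int] = []
    in_deps: Set[int] = set()
    stack = [page_id]                       # start from the page object
    while stack:
        current = stack.pop()
        for ref in refs.get(current, []):
            # Ignore page-tree nodes, unsupported features, and known deps.
            if ref in page_tree_nodes or ref in unsupported or ref in in_deps:
                continue
            deps.append(ref)                # new dependency: record it...
            in_deps.add(ref)
            stack.append(ref)               # ...and examine its references too
    return deps

if __name__ == "__main__":
    # Page 10 -> parent 11, contents 20, resources 21, annotation 30 (unsupported);
    # the resources use a font (22) and a form XObject (23) that reuses the font.
    refs = {10: [11, 20, 21, 30], 11: [10, 12], 21: [22, 23],
            22: [24], 23: [22], 24: [25], 30: [10]}
    print(page_dependencies(10, refs, page_tree_nodes={10, 11, 12}, unsupported={30}))
    # prints [20, 21, 22, 23, 24, 25]
```

The parent node and the annotation are skipped, and the font shared by the page resources and the form XObject is listed only once.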
In other cases, an object depends on other objects without referencing them directly. One example is the "name tree". In PDF, a target location (chapter, section, page, etc.) can sometimes be represented by a name, while the actual target location is stored in the name tree. In this case, when a page references a named target location, the page depends on the entire name tree.
We have two ways to handle such indirect references:
1) Add the root node of the name tree, or of another kind of indirectly referenced object, to the dependency list and expand from it;
2) Replace the indirect reference with a direct reference. For example, we can replace a named target location with a "direct target location", which contains data pointing directly at the target. This process does not change the content or rendered appearance of the PDF.
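The second approach can be sketched as a name-tree lookup, assuming the tree has already been parsed into nested Python dictionaries (a hypothetical in-memory form) with keys as Python strings; Python's string ordering matches PDF's byte ordering only for plain ASCII keys. In a PDF name tree, leaf nodes hold a flat /Names array of key-value pairs, intermediate nodes hold /Kids, and each non-root node's /Limits gives the smallest and largest key beneath it. A named destination's value is either an explicit destination array or a dictionary whose /D entry holds one. The function names are hypothetical.

```python
from typing import Any, Optional

def lookup_name(node: dict, key: str) -> Optional[Any]:
    """Find `key` in a name tree, descending only into kids whose /Limits cover it."""
    if "Names" in node:
        names = node["Names"]                # [key1, value1, key2, value2, ...]
        for i in range(0, len(names), 2):
            if names[i] == key:
                return names[i + 1]
        return None
    for kid in node.get("Kids", []):
        low, high = kid["Limits"]
        if low <= key <= high:
            return lookup_name(kid, key)
    return None

def to_direct_destination(dest: Any, dests_root: dict) -> Optional[Any]:
    """Replace a named destination with the explicit destination it names."""
    if not isinstance(dest, str):
        return dest                          # already a direct destination
    target = lookup_name(dests_root, dest)
    if isinstance(target, dict):
        target = target.get("D")             # value may be a dictionary with /D
    return target
```

Once every named destination on a page has been rewritten this way, the page no longer needs the name tree at all.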
The final dependency list contains the IDs of all indirect objects the page depends on, possibly preceded by a count or followed by an end marker.
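The two framings can be sketched as follows, assuming object IDs are encoded as 32-bit big-endian unsigned integers and that 0xFFFFFFFF, far above any object number a real file uses, serves as the end marker; both the encoding and the constant are hypothetical choices for illustration.

```python
import struct
from typing import List

END_MARK = 0xFFFFFFFF  # hypothetical end-of-list flag

def encode_with_count(ids: List[int]) -> bytes:
    """Count first, then the IDs."""
    return struct.pack(f">I{len(ids)}I", len(ids), *ids)

def decode_with_count(data: bytes) -> List[int]:
    (count,) = struct.unpack_from(">I", data)
    return list(struct.unpack_from(f">{count}I", data, 4))

def encode_with_end_mark(ids: List[int]) -> bytes:
    """The IDs, then the end flag."""
    return struct.pack(f">{len(ids) + 1}I", *ids, END_MARK)

def decode_with_end_mark(data: bytes) -> List[int]:
    ids = []
    for (value,) in struct.iter_unpack(">I", data):
        if value == END_MARK:
            break
        ids.append(value)
    return ids
```

A count prefix lets the reader allocate the list up front; an end marker lets the writer stream IDs before it knows how many there are.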
2. Content server
When the content server starts, it opens a communication port and waits for connections on that port.
(1) Serving documents
When the presentation client establishes a connection with the content server, the server may receive a request for a specific document.
The content server loads the flattened PDF document and the corresponding index data and sends a network message in response to the request.
To improve performance, the index data should remain in memory until the connection between the client and the server is closed.
The content server also maintains, for each active connection, a list of all indirect objects of each document that have already been transmitted.
(2) Serving pages
When the presentation client requests a specific page of a document, the content server queries the index data and finds the following data for the requested page:
• The ID of the indirect object representing the page;
• The IDs of all indirect objects that the page depends on.
The content server should send the indirect object representing the page, plus every indirect object in the dependency list that has not been sent before (according to the list of already-sent objects). Each object sent should be added to the list of already-sent objects.
For each indirect object to be transmitted, the content server should query the index data for its location and size, read that portion of the flattened PDF document, and send it to the client.
After all objects have been transmitted, the content server should send a marker informing the client that all data needed to display the page has been transferred.
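The page-serving steps above can be sketched as a function that returns the chunks to send, assuming the index data (object locations and per-page dependency lists) is already in memory and the flattened PDF is available as bytes. `Session` stands in for the per-connection list of already-sent objects; it, the function name, and the `PAGE_DONE` marker value are hypothetical, and the network transport is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

PAGE_DONE = b"PAGE-DONE"  # hypothetical "page complete" flag

@dataclass
class Session:
    """Per-connection state: IDs of objects already sent, per document."""
    sent: Dict[str, Set[int]] = field(default_factory=dict)

def serve_page(session: Session, doc_id: str, flat_pdf: bytes,
               locations: Dict[int, Tuple[int, int]],
               page_obj_id: int, dependencies: List[int]) -> List[bytes]:
    """Chunks to send for one page request, ending with the PAGE_DONE flag."""
    already = session.sent.setdefault(doc_id, set())
    # The page object itself, plus every dependency not yet sent on this connection.
    to_send = [page_obj_id] + [d for d in dependencies if d not in already]
    chunks = []
    for obj_id in to_send:
        offset, size = locations[obj_id]                 # look up in the index data
        chunks.append(flat_pdf[offset:offset + size])    # slice of the flattened PDF
        already.add(obj_id)                              # record as already sent
    chunks.append(PAGE_DONE)
    return chunks
```

A second request whose page shares dependencies with an earlier one (a common font, say) sends only the page object and whatever is new.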
3. Presentation client
The presentation client maintains a connection to the content server, sends requests based on the user's actions, and displays pages as they become available.
While connected, the client maintains an internal PDF document containing the indirect objects it has received from the content server. This internal document may be incomplete, but it is sufficient to display the pages the user has requested.
(1) Requesting a document
When the user requests a document, the client opens a connection to the appropriate content server and sends a document request containing the document ID.
When the client receives the server's response, it uses the basic information about all pages to determine the display ruler (layout dimensions) of every page. With this information, the client can correctly display blank placeholder pages and set the scroll bar accordingly.
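With every page's width and height known up front, the client can compute each page's vertical position and the total scroll height before any content arrives. A minimal sketch, assuming pages are stacked vertically with a fixed gap at a single zoom factor; `zoom`, `gap`, and both function names are hypothetical.

```python
import bisect
from typing import List, Tuple

def page_layout(page_sizes: List[Tuple[float, float]],
                zoom: float = 1.0, gap: float = 10.0) -> Tuple[List[float], float]:
    """Return (top offset of each page, total scroll height)."""
    tops: List[float] = []
    y = 0.0
    for _width, height in page_sizes:
        tops.append(y)
        y += height * zoom + gap
    total = y - gap if page_sizes else 0.0   # no gap after the last page
    return tops, total

def page_at(tops: List[float], scroll_offset: float) -> int:
    """0-based index of the page at the given scroll position."""
    return max(bisect.bisect_right(tops, scroll_offset) - 1, 0)
```

`page_at` tells the client which page the user has scrolled to, and therefore which page number to put in the next page request.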
(2) Requesting pages
Once a document's information has been received, or whenever a user action requires a new page to be displayed, the client sends a page request containing the page number of the requested page.
The client then begins receiving the data for all indirect objects required by that page. This data can be written into the internal PDF document, assembling a temporary document that, although incomplete, can be used to display the requested page.
The client may also request pages before the user views them. This keeps the connection busy, so that when the user actually needs to view a new page (usually the one after the current page), the client can respond faster.
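One way to choose which pages to request ahead of time, assuming 1-based page numbers and a hypothetical look-ahead `window`: favour the pages after the current one, since the next page is the most likely target, then the pages before it, and skip anything already requested.

```python
from typing import List, Set

def prefetch_order(current: int, page_count: int,
                   requested: Set[int], window: int = 3) -> List[int]:
    """Pages to request in the background, most likely first."""
    ahead = [current + i for i in range(1, window + 1)]
    behind = [current - i for i in range(1, window + 1)]
    return [p for p in ahead + behind
            if 1 <= p <= page_count and p not in requested]
```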
(3) Displaying pages
When the client receives the server's notification that all indirect objects required by the requested page have been sent, the page is displayed normally, just like a page of an ordinary PDF document, even though other pages of the document may still be empty at that point.
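The client side can be sketched as a small state holder, assuming the server answers page requests in the order they were sent (so each end marker completes the oldest outstanding request) and that each received chunk is one indirect object beginning with its object number. `ClientDocument` and the `PAGE_DONE` value are hypothetical; a real client would hand the collected objects to a PDF renderer rather than just storing them.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, Set

PAGE_DONE = b"PAGE-DONE"  # hypothetical end-of-page marker sent by the server

@dataclass
class ClientDocument:
    page_count: int
    objects: Dict[int, bytes] = field(default_factory=dict)  # received indirect objects
    ready: Set[int] = field(default_factory=set)             # pages that can be drawn
    pending: Deque[int] = field(default_factory=deque)       # requests awaiting their marker

    def request_page(self, page_number: int) -> None:
        self.pending.append(page_number)

    def receive(self, chunk: bytes) -> None:
        if chunk == PAGE_DONE:
            self.ready.add(self.pending.popleft())   # oldest request is complete
        else:
            obj_id = int(chunk.split(None, 1)[0])    # chunk starts "N G obj"
            self.objects[obj_id] = chunk

    def is_ready(self, page_number: int) -> bool:
        """False means the page is still drawn as a blank placeholder."""
        return page_number in self.ready
```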

Claims

1. A method for step-by-step downloading and displaying a PDF file, characterized in that it comprises a PDF presentation client program, a content server program, and a preprocessor program, which carry out the following steps:
A. When a client wants to display a document, it sends a request to the content server containing the document's identifier: the document name or document ID;
B. When the content server receives a document request, it loads the index data corresponding to that document. It then sends the document's basic information to the client, including (but not limited to):
(1) basic information about the document, such as its author and title;
(2) the total number of pages in the document;
(3) the location and size of each indirect object in the flattened document;
(4) the basic properties of each page, including page width, page height, and the indirect ID of the page object;
C. When the client requests a page, it simply sends the page number to the content server;
D. When the content server receives a page request, it sends the following data to the client:
(1) the indirect object representing the requested page;
(2) all indirect objects in the page's dependency list, except those that have already been sent. Each indirect object is sent in its original PDF syntax.

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2006/003061 WO2008058423A1 (en) 2006-11-14 2006-11-14 Method for step-type downloading and displaying pdf file

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2006/003061 WO2008058423A1 (en) 2006-11-14 2006-11-14 Method for step-type downloading and displaying pdf file

Publications (1)

Publication Number Publication Date
WO2008058423A1 (en) 2008-05-22

Family

ID=39401301

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2006/003061 WO2008058423A1 (en) 2006-11-14 2006-11-14 Method for step-type downloading and displaying pdf file

Country Status (1)

Country Link
WO (1) WO2008058423A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2346215A1 (en) * 2000-05-05 2001-11-05 Xerox Corporation Fast reprint file format that utilizes private tags to provide reprintable jobs that can be viewed and edited using standard tools
US6538760B1 (en) * 1998-09-08 2003-03-25 International Business Machines Corp. Method and apparatus for generating a production print stream from files optimized for viewing
CN1479899A (en) * 2001-02-05 2004-03-03 Object transfor method with format adaptation

Similar Documents

Publication Publication Date Title
JP4716612B2 (en) Method for redirecting the source of a data object displayed in an HTML document
US6772144B2 (en) Method and apparatus for applying an adaptive layout process to a layout template
JP5787963B2 (en) Computer platform programming interface
US7318193B2 (en) Method and apparatus for automatic document generation based on annotation
US7953116B2 (en) Intelligent access within a document package
US7509477B2 (en) Aggregating data from difference sources
US20040203624A1 (en) Technique for sharing of files with minimal increase of storage space usage
US20020143523A1 (en) System and method for providing a file in multiple languages
US20060218492A1 (en) Copy and paste with citation attributes
JP2004265402A (en) Method and system for extending pasting function of computer software application
CN102063483A (en) Serving font files in varying formats based on user agent type
JP2006114045A (en) Mapping of schema data into data structure
CN102043764A (en) Reduced glyph font files
JP2006526837A (en) How to browse content using page save file
JP2006178951A (en) Method and system for exposing nested data in computer-generated document in transparent manner
JP2006178952A (en) Method and system for linking data range of computer-generated document with associated xml elements
US20110295936A1 (en) Web server providing access to documents having multiple versions
US20060230057A1 (en) Method and apparatus for mapping web services definition language files to application specific business objects in an integrated application environment
JP4965014B2 (en) Data object transfer method and transfer device, activation method and activation device in computer communication network
US8108768B2 (en) Improving efficiency of content rule checking in a content management system
US7793220B1 (en) Scalable derivative services
JP5964847B2 (en) Connecting dynamic image results
EP1345135A2 (en) Apparatus, system, method and computer program product for document management
US8037090B2 (en) Processing structured documents stored in a database
US20080222183A1 (en) Autonomic rule generation in a content management system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 06805241

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 06805241

Country of ref document: EP

Kind code of ref document: A1