US6262675B1 - Method of compressing data with an alphabet - Google Patents

Publication number
US6262675B1
Authority
US
United States
Prior art keywords
window
data stream
characters
match
input data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
US09/471,102
Inventor
Balakrishna Raghavendra Iyer
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp
Priority to US09/471,102
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION: assignment of assignors interest (see document for details). Assignor: IYER, BALAKRISHNA RAGHAVENDRA
Application granted
Publication of US6262675B1
Anticipated expiration
Expired - Fee Related

Classifications

    • H: ELECTRICITY
    • H03: ELECTRONIC CIRCUITRY
    • H03M: CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00: Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30: Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/3084: Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method
    • H03M7/3086: Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method employing a sliding window, e.g. LZ77

Abstract

An improved LZ77 data compression and decompression method, known as Le′Z99, uses an embedded alphabet to optimize code space and speed in the compressed data.

Description

BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates in general to data compression, and in particular, to a method for compressing and decompressing data with an alphabet.
2. Description of Related Art
The Lempel-Ziv 77 (LZ77) method is a well-known method of data compression and decompression. However, it is inefficient in terms of its code space usage. This can be illustrated by an encoding and decoding example using the prior art LZ77 algorithm.
The following terms are used in describing the prior art LZ77 method:
Input Stream: a sequence of characters to be compressed;
Character: a basic data element in the input stream;
Coding Position: a position of the character in the input stream that is currently being coded (the beginning of a lookahead buffer defined below);
Lookahead Buffer: a character sequence from the coding position to an end of the input stream;
Window: a “backward” window of size W that contains W characters from the coding position, i.e., the last W characters previously processed;
Pointer: a pointer to a match in the window W that also specifies the length of the match.
With regard to encoding, the prior art LZ77 method searches the window for the longest match with the beginning of the lookahead buffer and outputs a pointer to that match. Since it is possible that not even a one-character match can be found, the output cannot contain just pointers. The prior art LZ77 method solves this problem as follows: after each pointer, it outputs the first character in the lookahead buffer after the match; if there is no match, then it outputs a null-pointer and the character at the coding position. Then, the coding position is moved further by one.
Specifically, the steps of the prior art LZ77 encoding method comprise the following:
(i) Set the coding position to the beginning of the input stream.
(ii) Find a match in the backward window W for the lookahead buffer.
(iii) Output the triple (B,L)C with the following meanings:
(1) B is the number of characters to be traversed backwards in the backward window W in order to get to the starting location of the match. If there is no match, then B takes a null value (0) without loss of generality.
(2) L is the number of characters matched.
(3) C is the first character in the lookahead buffer that did not match.
(iv) If the lookahead buffer is not empty, then move the coding position (and the backward window W) L+1 characters forward and return to step (ii); otherwise, terminate.
This is best illustrated by providing an example of the prior art LZ77 encoding method. The following table describes the input data for the example, wherein the first row indicates the position and the second row indicates the corresponding character:
Pos   1  2  3  4  5  6  7  8  9
Char  A  A  B  C  B  B  A  B  C
The following table illustrates the prior art LZ77 encoding method performed on the above input data:
Step  Pos  W        Match   Char  Output
1.    1    (empty)  (none)  A     (0,0) A
2.    2    A        A       B     (1,1) B
3.    4    AAB      (none)  C     (0,0) C
4.    5    AABC     B       B     (2,1) B
5.    7    AABCBB   AB      C     (5,2) C
The following describes the columns in the above table:
The column Step indicates the number of the encoding step. It completes each time the prior art LZ77 encoding method makes an output. With the prior art LZ77 method, this happens in each step of the encoding method above at (iii).
The column Pos indicates the coding position. The first character in the input stream has the coding position 1.
The column W shows the backward window.
The column Match shows the longest match found in the window.
The column Char shows the first character in the lookahead buffer after the match.
The column Output presents the output in the format (B,L)C. (B,L) is the pointer to the Match, which provides the following instruction to the decoding method: “Go back B characters in the window and copy L characters to the output.” C is the next character.
With regard to the prior art LZ77 decoding method, the window is maintained the same way as during the encoding method. In each step, the decoding method reads a triple (B,L)C from the input. The decoding method outputs the sequence from the window specified by (B,L) and the character C.
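The prior art encoding steps (i)-(iv) and the decoding step above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names `lz77_encode` and `lz77_decode` are hypothetical, the backward window is assumed to be unbounded, positions are 0-based internally, and the match length is capped so that a non-matching character C always remains to be output.

```python
def lz77_encode(data):
    """Return a list of (B, L, C) triples for the string `data`.

    Assumption: the backward window W is unbounded (all prior input).
    """
    triples = []
    pos = 0  # 0-based coding position
    while pos < len(data):
        best_b, best_l = 0, 0  # (0, 0) is the null pointer: no match
        # Leave one character unmatched so that C always exists.
        max_len = len(data) - pos - 1
        for cand in range(pos):  # candidate match starts in the window
            length = 0
            while length < max_len and data[cand + length] == data[pos + length]:
                length += 1
            if length > best_l:
                best_b, best_l = pos - cand, length
        triples.append((best_b, best_l, data[pos + best_l]))
        pos += best_l + 1  # step (iv): advance L+1 characters
    return triples


def lz77_decode(triples):
    """Rebuild the original string from (B, L, C) triples."""
    out = []
    for b, l, c in triples:
        start = len(out) - b
        for i in range(l):  # go back B characters and copy L characters
            out.append(out[start + i])
        out.append(c)  # then emit the explicit character C
    return "".join(out)


codes = lz77_encode("AABCBBABC")
print(codes)
print(lz77_decode(codes))
```

On the example input, the sketch emits five triples, one per row of the table above, and each triple carries an explicit character in addition to the two integers.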
The compression ratio achieved by the prior art LZ77 method is very good for many types of data, but the encoding method can be quite time-consuming, since there are a lot of comparisons to perform between the lookahead buffer and the window. On the other hand, the decoding method is very simple and fast. Memory requirements are low both for the encoding and the decoding methods, since the only structure held in memory is the window, which is usually sized between 4 and 1 kilobyte.
However, the prior art LZ77 method suffers from the problem of non-optimal code space usage, because it uses two integers and one character for a code. The first integer is the starting position of the match, the second integer is the length of the match, and the character is the first non-matching character after the match. In practical terms, including the first non-matching character after the match leads to compression inefficiency.
Other prior art methods exist to code this character selectively, based on an efficiency criterion. However, each requires that the decoding method check whether it is to decode a character or a string from the window. In logic or instruction terms, the check requires a conditional branch, once for every compressed code, resulting in inefficient logic. For systems that are read intensive (such as database management systems where reads outnumber writes by 3-to-1 or more), it is necessary to speed up the decoding method, and removing conditional branches from the decoding method is one means of doing so. Thus, there is a need in the art for an improved LZ77 method that optimizes not only code space usage, but also the speed of decoding.
SUMMARY OF THE INVENTION
To overcome the limitations in the prior art described above, and to overcome other limitations that will become apparent upon reading and understanding the present specification, the present invention discloses a method, apparatus, and article of manufacture for compressing and decompressing data using an embedded alphabet to reduce code space in the compressed data.
BRIEF DESCRIPTION OF THE DRAWINGS
Referring now to the drawings in which like reference numbers represent corresponding parts throughout:
FIG. 1 illustrates the hardware and software environment of the present invention;
FIG. 2 is a flowchart that illustrates the logic of the Le′Z99 encoding method according to the preferred embodiment of the present invention; and
FIG. 3 is a flowchart that illustrates the logic of the Le′Z99 decoding method according to the preferred embodiment of the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
In the following description of the preferred embodiment, reference is made to the accompanying drawings which form a part hereof, and in which is shown by way of illustration a specific embodiment in which the invention may be practiced. It is to be understood that other embodiments may be utilized and structural and functional changes may be made without departing from the scope of the present invention.
OVERVIEW
The present invention describes an improved LZ77 method of data compression and decompression that optimizes code space usage. Throughout this specification, the improved LZ77 method is referred to as “the Le′Z99 method.”
HARDWARE AND SOFTWARE ENVIRONMENT
FIG. 1 illustrates an exemplary hardware and software environment that could be used with the preferred embodiment of the present invention. In the exemplary environment, the present invention is typically implemented using a computer 100, which may include, inter alia, a processor 102, random access memory (RAM) 104, data storage devices 106 (e.g., hard, floppy, and/or CD-ROM disk drives, etc.), data communications devices 108 (e.g., modems, network interfaces, etc.), etc. Of course, those skilled in the art will recognize that the present invention may be implemented in any number of other devices, without departing from the scope of the present invention.
In the preferred embodiment, the present invention is usually implemented in one or more computer programs 110 that comprise an encode and decode program, although different programs could be used to provide each of these functions. The encode and decode program 110 accepts input data 112 and generates output data 114, the contents of which depend upon whether the encode and decode program 110 is performing an encode method or a decode method.
Generally, the encode and decode program 110, input data 112, and output data 114 each comprises logic and/or data that is embodied in or retrievable from a device, medium, or carrier, e.g., a fixed or removable data storage device, a remote device coupled to the computer by a data communications device, etc. Moreover, this logic and/or data, when read, executed, and/or interpreted by the computer 100, cause the computer 100 to perform the steps necessary to implement and/or use the present invention.
Thus, the present invention may be implemented as a method, apparatus, or article of manufacture using standard programming and/or engineering techniques to produce software, firmware, hardware, or any combination thereof. The term “article of manufacture”, or alternatively, “computer program carrier”, as used herein is intended to encompass logic or instructions accessible from any computer-readable device, carrier, or media.
Of course, those skilled in the art will recognize many modifications may be made to this configuration without departing from the scope of the present invention. For example, those skilled in the art will recognize that any number of devices and/or programs may be used to implement the present invention, so long as similar functions are performed thereby.
OPERATION OF THE ENCODE AND DECODE PROGRAM
The encode and decode program 110 solves the problem of efficiency and speed by providing an Le′Z99 method with an embedded alphabet. In this method, an immutable, ordered list or window A of the alphabet is attached to a backward window W.
For example, let A be a window comprising the entire alphabet. W, as in the prior art LZ77 method, is still the backward window. However, the Le′Z99 method encodes the input data 112, not based on the backward window W, but based on a coding window CW, which is a concatenation of the backward window W (which need not be a fixed size) and the alphabet window A (which generally is a fixed size). Since the alphabet window A includes all the symbols in the alphabet, every character and thus every phrase in the input data 112 will be matched.
The Le′Z99 encoding method is described below:
(i) set the coding position to the beginning of the input data 112;
(ii) find a match in the coding window CW for the lookahead buffer (for example, the longest match);
(iii) output the pair (B,L) with the following meaning:
(1) B is the number of characters traversed backward in the coding window CW in order to get to the starting location of the match;
(2) L is the number of characters matched;
(iv) if the lookahead buffer is not empty, then move the coding position (and the backward window W) L characters forward and return to (ii); otherwise, terminate.
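Steps (i)-(iv) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the patent's implementation: the function name `lez99_encode` is hypothetical, the backward window W is taken to be unbounded, the coding window is formed as W followed by the alphabet window A, B is counted back from the end of the coding window, and ties between equally long matches go to the smaller B.

```python
def lez99_encode(data, alphabet):
    """Return a list of (B, L) pairs for the string `data`.

    Assumptions: W is unbounded, CW = W + alphabet, and B is the distance
    back from the end of CW to the start of the match.
    """
    pairs = []
    pos = 0  # 0-based coding position
    while pos < len(data):
        cw = data[:pos] + alphabet  # coding window: W followed by A
        best_b, best_l = 0, 0
        # Scan from the end of CW so the nearest of equally long matches wins.
        for start in range(len(cw) - 1, -1, -1):
            length = 0
            while (start + length < len(cw)
                   and pos + length < len(data)
                   and cw[start + length] == data[pos + length]):
                length += 1
            if length > best_l:
                best_b, best_l = len(cw) - start, length
        if best_l == 0:
            # Only possible if the input holds a symbol missing from A.
            raise ValueError("symbol not in alphabet: %r" % data[pos])
        pairs.append((best_b, best_l))
        pos += best_l  # step (iv): advance L characters
    return pairs


print(lez99_encode("AABCBBABC", "ABC"))
```

Because every symbol of the input appears in the alphabet window, each iteration finds a match of length at least one, so no explicit character needs to accompany the pair. A match may also run across the boundary between W and A.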
To compare the Le′Z99 method with the prior art LZ77 method described above, an example is provided. It can be seen that steps (iii) and (iii)(3) of the prior art LZ77 method have been modified and deleted, respectively. Also, note that step (ii) matches the lookahead buffer with the coding window CW in the Le′Z99 method, instead of just the backward window W as in the prior art LZ77 method.
This is best illustrated by providing an example of the Le′Z99 encoding method. The following table describes the input data 112 for the example, wherein the first row indicates the position and the second row indicates the corresponding character:
Pos   1  2  3  4  5  6  7  8  9
Char  A  A  B  C  B  B  A  B  C
The following table illustrates the Le′Z99 encoding method performed on the above input data 112:
Step  Pos  W        CW        Match  Output
1.    1    (empty)  ABC       A      (3,1)
2.    2    A        AABC      ABC    (3,3)
3.    5    AABC     AABCABC   B      (2,1)
4.    6    AABCB    AABCBABC  BABC   (4,4)
The following describes the columns in the above table:
The column Step indicates the number of the encoding step. Each encoding step makes an output. As in the prior art LZ77 method, so too for the Le′Z99 method, this occurs at line (iii) of the encoding method above.
The column Pos indicates the coding position. The first character in the input has the coding position 1.
The column W stores the contents of the backward window.
The column CW stores the contents of the coding window.
The column Match shows the longest match found in the coding window CW.
The column Output presents the output in the format (B,L). (B,L) is the pointer to the Match. This gives the following instruction to the decoding method: “Go back B characters in the coding window CW and copy L characters to the output”, wherein B represents the displacement and L represents the length (in this embodiment, B>=1 and L>=1, although other embodiments could use a different base or coding scheme). The Le′Z99 method is assured of a match of at least length one; the prior art LZ77 method cannot be so assured.
For this example, the Le′Z99 method uses the same number of codes to compress the string “AABCBBABC” as the prior art LZ77 method. However, the Le′Z99 codes do not contain the extra character contained in every LZ77 code. For this example, therefore, the Le′Z99 method provides more compression than the prior art LZ77 method. In addition, realization of the Le′Z99 method in software and/or hardware is easier due to the simplification of the logic.
With regard to decoding in the Le′Z99 method, the coding window CW and backward window W are maintained in the same way as with the encoding method. In each step, the Le′Z99 method reads a pair of integers (B,L) from the input data 112. The Le′Z99 method then outputs a sequence from the coding window CW as specified by (B,L) to the output data 114.
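The decoding step can be sketched as follows. The function name `lez99_decode` is hypothetical, and the sketch assumes the same conventions as the encoding example in this section: an unbounded backward window W, a coding window formed as W followed by the alphabet window A, and B counted back from the end of the coding window.

```python
def lez99_decode(pairs, alphabet):
    """Rebuild the original string from (B, L) pairs.

    Assumptions: W is everything decoded so far, CW = W + alphabet, and B is
    the distance back from the end of CW to the start of the match.
    """
    out = ""
    for b, l in pairs:
        cw = out + alphabet          # rebuild the coding window as the encoder did
        start = len(cw) - b          # go back B characters in CW
        out += cw[start:start + l]   # copy L characters to the output
    return out


print(lez99_decode([(3, 1), (3, 3), (2, 1), (4, 4)], "ABC"))
```

Every code is handled by the same copy operation, so the loop body contains no test for whether a code denotes a literal character or a string.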
LOGIC OF THE Le′Z99 ENCODING METHOD
FIG. 2 is a flowchart that illustrates the logic of encoding in the Le′Z99 method according to the preferred embodiment of the present invention.
Block 200 represents the encode and decode program 110 setting the coding position to the beginning of the input data 112.
Block 202 represents the encode and decode program 110 finding a match in the coding window CW for the lookahead buffer, wherein the coding window CW comprises a concatenation of a backward window W and an alphabet window A.
Block 204 represents the encode and decode program 110 outputting the pair (B,L) as the output data 114 with the following meaning: (1) B is the pointer to the match in the coding window CW and (2) L is the number of characters matched.
Block 206 is a decision block that represents the encode and decode program 110 determining whether the lookahead buffer is empty. If not, control transfers to Block 208; otherwise, the logic terminates.
Block 208 represents the encode and decode program 110 moving the coding position (and the backward window W) L characters forward. Thereafter, control returns to Block 202.
LOGIC OF THE Le′Z99 DECODING METHOD
FIG. 3 is a flowchart that illustrates the logic of decoding in the Le′Z99 method according to the preferred embodiment of the present invention.
Block 300 represents the encode and decode program 110 setting the decoding position to the beginning of the input data 112.
Block 302 represents the encode and decode program 110 inputting the pair (B,L) with the following meaning: (1) B is the pointer to the match in the coding window CW and (2) L is the number of characters matched.
Block 304 represents the encode and decode program 110 decoding the pair (B,L) using the coding window CW to generate a character sequence as the output data 114. The pair (B,L) indicates that the encode and decode program 110 should go back B characters in the coding window CW and copy a sequence of L characters to the output data 114.
Block 306 is a decision block that represents the encode and decode program 110 determining whether the end of the input data 112 has been reached. If not, control transfers to Block 308; otherwise, the logic terminates.
Block 308 represents the encode and decode program 110 moving the decoding position to the next (B,L) pair in the input data 112 and moving the backward window W forward to encompass the generated character sequence. Thereafter, control returns to Block 302.
CONCLUSION
This concludes the description of the preferred embodiment of the invention. The following describes some alternative embodiments for accomplishing the present invention. For example, any type of device, such as a computer, integrated circuit, or other electronic device could be used to implement the present invention. Moreover, any software program performing compression and/or decompression could benefit from the present invention.
In summary, the present invention discloses a method, apparatus, and article of manufacture for compressing and decompressing data using an embedded alphabet to reduce code space in the compressed data.
The foregoing description of the preferred embodiment of the invention has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. It is intended that the scope of the invention be limited not by this detailed description, but rather by the claims that follow.

Claims (6)

What is claimed is:
1. A method for compressing data, comprising:
(i) setting an encoding position to a beginning of an input data stream;
(ii) finding a match in a coding window CW for a lookahead buffer, wherein the coding window CW is comprised of a concatenation of a backward window W that contains W characters from the encoding position and an alphabet window that contains symbols in an alphabet, and the lookahead buffer comprises a character sequence from the encoding position to an end of the input data stream;
(iii) outputting a pair (B,L), wherein B is a pointer to the match in the coding window CW and L represents a number of characters in the match;
(iv) if the lookahead buffer is not empty, then moving the encoding position and the backward window W forward L characters in the input data stream and repeating steps (ii)-(iv); and
(v) if the lookahead buffer is empty, then terminating the method.
2. A method for decompressing data, comprising:
(i) setting a decoding position to a beginning of an input data stream;
(ii) inputting a pair (B,L), wherein B is a pointer to a match in a coding window CW comprising a concatenation of a backward window W that contains W characters generated thus far in an output data stream and an alphabet window that contains symbols in an alphabet, and L represents a number of characters in the match;
(iii) decoding the inputted pair (B,L) using the coding window CW to generate a character sequence for the output data stream, wherein inputted pair (B,L) indicates that L characters from a position B characters in the coding window CW are copied to the output data stream;
(iv) if the decoding position is not at an end of the input data stream, then moving the decoding position one pair (B,L) forward in the input data stream, moving the backward window W forward to encompass the generated character sequence, and repeating steps (ii)-(iv); and
(v) if the decoding position is at an end of the input data stream, then terminating the method.
3. An apparatus for compressing data, comprising:
(i) means for setting an encoding position to a beginning of an input data stream;
(ii) means for finding a match in a coding window CW for a lookahead buffer, wherein the coding window CW is comprised of a concatenation of a backward window W that contains W characters from the encoding position and an alphabet window that contains symbols in an alphabet, and the lookahead buffer comprises a character sequence from the encoding position to an end of the input data stream;
(iii) means for outputting a pair (B,L), wherein B is a pointer to the match in the coding window CW and L represents a number of characters in the match;
(iv) means for moving the encoding position and the backward window W forward L characters in the input data stream, if the lookahead buffer is not empty, and means for repeating the means (ii)-(iv); and
(v) means for terminating, if the lookahead buffer is empty.
4. An apparatus for decompressing data, comprising:
(i) means for setting a decoding position to a beginning of an input data stream;
(ii) means for inputting a pair (B,L), wherein B is a pointer to a match in a coding window CW comprising a concatenation of a backward window W that contains W characters generated thus far in an output data stream and an alphabet window that contains symbols in an alphabet, and L represents a number of characters in the match;
(iii) means for decoding the inputted pair (B,L) using the coding window CW to generate a character sequence for the output data stream, wherein inputted pair (B,L) indicates that L characters from a position B characters in the coding window CW are copied to the output data stream;
(iv) means for moving the decoding position one pair (B,L) forward in the input data stream, moving the backward window W forward to encompass the generated character sequence, if the decoding position is not at an end of the input data stream, and for repeating the means (ii)-(iv); and
(v) means for terminating, if the decoding position is at an end of the input data stream.
5. An article of manufacture embodying logic for compressing data, the logic comprising:
(i) setting an encoding position to a beginning of an input data stream;
(ii) finding a match in a coding window CW for a lookahead buffer, wherein the coding window CW is comprised of a concatenation of a backward window W that contains W characters from the encoding position and an alphabet window that contains symbols in an alphabet, and the lookahead buffer comprises a character sequence from the encoding position to an end of the input data stream;
(iii) outputting a pair (B,L), wherein B is a pointer to the match in the coding window CW and L represents a number of characters in the match;
(iv) if the lookahead buffer is not empty, then moving the encoding position and the backward window W forward L characters in the input data stream and repeating steps (ii)-(iv); and
(v) if the lookahead buffer is empty, then terminating the method.
6. An article of manufacture embodying logic for decompressing data, the logic comprising:
(i) setting a decoding position to a beginning of an input data stream;
(ii) inputting a pair (B,L), wherein B is a pointer to a match in a coding window CW comprising a concatenation of a backward window W that contains W characters generated thus far in an output data stream and an alphabet window that contains symbols in an alphabet, and L represents a number of characters in the match;
(iii) decoding the inputted pair (B,L) using the coding window CW to generate a character sequence for the output data stream, wherein inputted pair (B,L) indicates that L characters from a position B characters in the coding window CW are copied to the output data stream;
(iv) if the decoding position is not at an end of the input data stream, then moving the decoding position one pair (B,L) forward in the input data stream, moving the backward window W forward to encompass the generated character sequence, and repeating steps (ii)-(iv); and
(v) if the decoding position is at an end of the input data stream, then terminating the method.
US09/471,102 1999-12-21 1999-12-21 Method of compressing data with an alphabet Expired - Fee Related US6262675B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US09/471,102 US6262675B1 (en) 1999-12-21 1999-12-21 Method of compressing data with an alphabet

Publications (1)

Publication Number Publication Date
US6262675B1 (en) 2001-07-17

Family

ID=23870263

Family Applications (1)

Application Number Title Priority Date Filing Date
US09/471,102 Expired - Fee Related US6262675B1 (en) 1999-12-21 1999-12-21 Method of compressing data with an alphabet

Country Status (1)

Country Link
US (1) US6262675B1 (en)

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4814746A (en) 1983-06-01 1989-03-21 International Business Machines Corporation Data compression method
US4748638A (en) * 1985-10-30 1988-05-31 Microcom, Inc. Data telecommunications system and method for transmitting compressed data
US4876541A (en) * 1987-10-15 1989-10-24 Data Compression Corporation Stem for dynamically compressing and decompressing electronic data
US5406279A (en) 1992-09-02 1995-04-11 Cirrus Logic, Inc. General purpose, hash-based technique for single-pass lossless data compression
US5424732A (en) * 1992-12-04 1995-06-13 International Business Machines Corporation Transmission compatibility using custom compression method and hardware
US5455576A (en) * 1992-12-23 1995-10-03 Hewlett Packard Corporation Apparatus and methods for Lempel Ziv data compression with improved management of multiple dictionaries in content addressable memory
US5412384A (en) 1993-04-16 1995-05-02 International Business Machines Corporation Method and system for adaptively building a static Ziv-Lempel dictionary for database compression
US5448733A (en) 1993-07-16 1995-09-05 International Business Machines Corp. Data search and compression device and method for searching and compressing repeating data
US5521597A (en) 1993-08-02 1996-05-28 Mircosoft Corporation Data compression for network transport
US5532693A (en) 1994-06-13 1996-07-02 Advanced Hardware Architectures Adaptive data compression system with systolic string matching logic
US5572206A (en) * 1994-07-06 1996-11-05 Microsoft Corporation Data compression method and system
US5561421A (en) 1994-07-28 1996-10-01 International Business Machines Corporation Access method data compression with system-built generic dictionaries
US5608396A (en) 1995-02-28 1997-03-04 International Business Machines Corporation Efficient Ziv-Lempel LZI data compression system using variable code fields
US5694125A (en) 1995-08-02 1997-12-02 Advance Hardware Architecture Sliding window with big gap data compression system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wan et al. (1994) IEEE International Conference on Neural Networks, pp. 4384-4389.

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030020639A1 (en) * 1998-08-13 2003-01-30 Fujitsu Limited Encoding and decoding apparatus using context
US6563438B2 (en) * 1998-08-13 2003-05-13 Fujitsu Limited Encoding and decoding apparatus with matching length means for symbol strings
US20030102989A1 (en) * 1998-08-13 2003-06-05 Fujitsu Limited Coding apparatus and decoding apparatus
US6778103B2 (en) 1998-08-13 2004-08-17 Fujitsu Limited Encoding and decoding apparatus using context
US6906644B2 (en) 1998-08-13 2005-06-14 Fujitsu Limited Encoding and decoding apparatus with matching length means for symbol strings
US6501395B1 (en) * 2002-04-10 2002-12-31 Hewlett-Packard Company System, method and computer readable medium for compressing a data sequence
US20060106870A1 (en) * 2004-11-16 2006-05-18 International Business Machines Corporation Data compression using a nested hierarchy of fixed phrase length dictionaries
US7053803B1 (en) * 2005-01-31 2006-05-30 Hewlett Packard Development Company, L.P. Data compression
US9287893B1 (en) 2015-05-01 2016-03-15 Google Inc. ASIC block for high bandwidth LZ77 decompression

Legal Events

Date Code Title Description
AS Assignment

Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:IYER, BALAKRISHNA RAGHAVENDRA;REEL/FRAME:010481/0576

Effective date: 19991203

FPAY Fee payment

Year of fee payment: 4

REMI Maintenance fee reminder mailed
LAPS Lapse for failure to pay maintenance fees
STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20090717