WO1995001677A1

WO1995001677A1 - Method and apparatus for encoding and decoding compressed data in data communication

Info

Publication number: WO1995001677A1
Application number: PCT/US1994/006310
Authority: WO
Inventors: James A. Pasco-Anderson; Jeff Klayman; Frank Fulling; Mark Miner
Original assignee: Codex, Inc.
Priority date: 1993-06-30
Filing date: 1994-06-06
Publication date: 1995-01-12
Also published as: JPH08502397A; CN1111467A; EP0667064A1

Abstract

A method and apparatus for encoding and decoding compressed data in data communications includes construction of a tree structure by the decoder using children counters (62) in non-first level nodes (72, 78, 79). A digital communication device utilizes the tree structure for transmission of signals.

Description

METHOD AND APPARATUS FOR ENCODING AND DECODING COMPRESSED DATA IN DATA COMMUNICATION

Field of the Invention

This application relates to digital communication equipment, and more particularly to data compression systems and methods which improve the efficiency and speed of data communication.

Background of the Invention

Data communication is the movement of computer-encoded information from one point to another by means of a transmission system. Data communication results in nearly instantaneous information exchange over long distances.

Data communication links data terminal equipment (DTE) such as a terminal, printer or computer that transmits or receives data. Data communication equipment (DCE) is a device attached between a DTE and the communication channel that manipulates the transmitted signal or data. The DCE usually comprises a microprocessor and random access memory (RAM). The communication channel is often a telephone network, although it could be a cellular network, a digital communication network, or a satellite network.

The information sent by a transmitter DTE (TXDTE) to a receiver DTE (RXDTE) consists of a sequence of characters. The information generally contains a significant amount of redundancy. The information, therefore, may be compressed so that it can be transmitted in less time over a communication channel.

Among known data compression methods is the Ziv-Lempel '78 algorithm ("ZL78"). In the ZL78 algorithm, the transmit DCE (TXDCE) records the history of recently transmitted data by storing the strings in a vocabulary (also referred to as the "vocabulary tree") stored in the TXDCE RAM. By comparing successive elements of the current data with the vocabulary, redundant data is found. The TXDCE, instead of sending the entire redundant sequence, sends a codeword which points to the location of the earlier occurrence of the redundant data in the vocabulary tree. Data compression occurs whenever the number of bits required to send the codeword is less than the number of bits in the redundant data sequence. Another data compression method is discussed in a pending application 07/976,298 by Brian Ta-Cheng Hou, Craig D. Cohen, James A. Pasco-Anderson, and Michael Gutman, and assigned to the assignee herein. The information contained therein is incorporated into this application.
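The dictionary-building idea behind ZL78 described above can be illustrated with a minimal sketch. This is the textbook LZ78 scheme of (prefix index, next character) pairs, not the V.42bis procedure and not the method of this application; the function names and the use of a Python dictionary for the vocabulary are assumptions made for illustration only.

```python
def lz78_encode(data):
    """Textbook LZ78: emit (index of longest known prefix, next character)."""
    vocabulary = {"": 0}      # string -> index; index 0 is the empty string
    output = []
    current = ""
    for ch in data:
        if current + ch in vocabulary:
            current += ch     # keep extending the match
        else:
            output.append((vocabulary[current], ch))
            vocabulary[current + ch] = len(vocabulary)
            current = ""
    if current:
        # Flush a trailing match that was never followed by a new character.
        output.append((vocabulary[current[:-1]], current[-1]))
    return output


def lz78_decode(pairs):
    """Rebuild the same vocabulary on the receiving side and expand each pair."""
    vocabulary = [""]
    out = []
    for index, ch in pairs:
        entry = vocabulary[index] + ch
        vocabulary.append(entry)
        out.append(entry)
    return "".join(out)
```

In this sketch, a repeated string costs one pair regardless of its length, which is where the saving comes from when the pair needs fewer bits than the string it stands for.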

At the other end of the channel, the receiver DCE (RXDCE) maintains a vocabulary in the RXDCE RAM similar to that maintained by the TXDCE. Upon receipt of the codeword from the TXDCE, the RXDCE uses the codeword to find the redundant data sequence in the vocabulary. The RXDCE then transmits the data sequence to the RXDTE. As noted previously, data compression occurs whenever the number of bits required to send the codeword is less than the number of bits in the redundant data sequence. In some instances, such as when the information is close to a random sequence of characters, the codewords are actually longer than the original data, in which case data expansion (as opposed to data compression) may occur. When data expansion may occur, the TXDCE signals the RXDCE to operate without data compression. This method of communication without data compression is called "transparent mode" (TM). The TXDCE then monitors the TXD to determine if compression would be beneficial. If compression would be beneficial, the TXDCE signals the RXDCE to begin compression, and operate in "compressed mode" (CM). In a normal communication session, the TXDCE and the RXDCE may switch back and forth between CM and TM several times. Other methods may be found in Clark, U.S. Patent No. 5,177,480.

As the information is transmitted, the TXDCE builds a vocabulary according to a set of rules. The vocabulary is a tree structure data base with various levels of interconnected nodes. A full description of a procedure for building the tree, updating the tree, deleting nodes from the tree and adding nodes to the tree may be found in Clark, U.S. Patent No. 5,153,591 and Welch, U.S. 4,558,302. Such a tree structure has been implemented in V.42bis applications for the CCITT (Comite Consultatif International de Telegraphie et Telephonie).

Each node in the decoder vocabulary (other than the first level nodes) requires nine bytes: one byte for the character represented by the node, two bytes for the down pointer, two bytes for the left pointer, two bytes for the right pointer and two bytes for the up pointer. Thus, the memory required by the decoder tree structure is significant. This is especially true when one DCE contains multiple vocabulary instances, such as is described in the application "Dynamic Vocabulary Storage for Adaptive Data Compression of Frame-Multiplexed Traffic", 07/976,298 by Brian Ta-Cheng Hou, Craig D. Cohen, James A. Pasco-Anderson, and Michael Gutman, and assigned to the assignee herein.

Further, whenever a leaf node is added to the tree, the pointers for adjacent leaf nodes must be updated, requiring that the pointer be read and written for each leaf. Such operations consume the time of the microprocessor, and thus reduce the throughput of the DCE.

When the RAM for the vocabulary is filled, nodes are deleted. Deletion of the nodes requires the modification of all the pointers associated with the deleted node, which results in time consumption and slower throughput by the RXDCE.

Prior methods of encoding and decoding allow the string matching procedure to terminate prior to a longest string match, at which time the codeword for any partially matched string is sent. In some V.42bis implementations, this may occur due to a mode switch or flush. In some extensions to V.42bis for sync data compression, this can occur due to an end-of-frame or sync error. The first character following a premature termination of the string matching procedure is treated as an unmatched character by the encoder. The encoder normally adds an unmatched character to the matched string and begins a new string with the character. However, when the string matching procedure has terminated prior to a longest string match, the next character may already be in the vocabulary. Therefore, the encoder searches for the character in the vocabulary and adds only the character if the character is not already in the vocabulary.

When the decoder receives the codeword for the new string, it treats the innovation character as the unmatched character from the previous string. The decoder will normally add the innovation character to the previous string and begin a new string with the innovation character. However, because the encoder can terminate the string matching procedure prior to finding the longest string match, the innovation character may already be in the vocabulary. Therefore, the decoder searches for the innovation character in the vocabulary and only adds the innovation character if it is not already in the vocabulary.

In a typical implementation, the decoder requires nine bytes for each non-root node: one byte for character, two bytes for down pointer, two bytes for left pointer, two bytes for right pointer, and two bytes for up pointer. This data structure allows the decoder to move down, across, and up the tree structure. In those implementations, a vocabulary built during transparent mode is also used in compressed mode. The decoder must therefore maintain the vocabulary during transparent mode. Vocabulary maintenance during transparent mode requires the decoder to perform a search operation, which involves moving down and across the tree structure. The decoder must check for a duplicate string in the vocabulary before adding a node during compressed mode, and the decoder must move down and across the tree structure during transparent mode. Therefore, the decoder must maintain the full tree structure (i.e. the ability to move down and across the tree) during compressed mode.

The search operation in transparent mode and the check for a duplicate string in compressed mode require that the decoder move down and across the tree. Adding and deleting nodes requires the updating of all of the pointers. Thus, the ability to move down and across the tree structure is expensive in terms of microprocessor usage.

Thus, both the encoder and decoder maintain the escape character in both transparent mode and compressed mode, even though the escape character is only used in transparent mode. To maintain the escape character, both the encoder and decoder must check each character in the data for occurrences of the escape character and must update the escape character when one is detected in the data. These methods for encoding and decoding compressed data therefore require significant amounts of memory and microprocessor usage. A method using less memory and microprocessor usage would be valuable.

Brief Description of the Drawings

Fig. 1 is a block diagram of a DCE attached to a DTE.

Fig. 2 is a functional block diagram of the DCE in both transmit and receive modes, thereby forming a data communication system.

Fig. 3 shows an anti-expansion control.

Fig. 4 is a representation of a vocabulary node of the preferred embodiment.

Fig. 5 is a representation of a tree structure of the preferred embodiment.

Fig. 6 shows a method used in a DCE.

Fig. 7 shows a method of processing of a character.

Fig. 8 shows a method for processing a command.

Fig. 9 shows a method for testing compression.

Fig. 10 shows a method for escape character procedure.

Fig. 11 shows a method of exception processing of the next character.

Fig. 12 shows a method of operation of data compression decoder.

Fig. 13 shows a method of decoder operation during transparent mode.

Description of the Preferred Embodiment

(As is commonly used in data communication, an "RX" prefix indicates "receiver", while a "TX" prefix indicates "transmitter".)

Fig. 1 shows a block diagram of a data communication system. DTE 10 is coupled to DCE 12. DTE 10 sends information for transmission (TXD) to DCE 12. Similarly, DTE 10 obtains received information (RXD) from DCE 12. DCE 12 consists of microprocessor 14. Microprocessor 14 performs the functions of a data compression encoder 16, a transmit data pump 18, a data compression decoder 20, and a receive data pump 22. Data compression encoder 16 takes TXD and compresses the TXD into codewords, if possible. Transmit data pump 18 sends the compressed TXD via communication channel 30 to a DCE/DTE pair at some other location.

Similarly, receive data pump 22 obtains compressed RXD from communication channel 24. Data compression decoder 20 then decompresses the compressed RXD into RXD for use by DTE 10.

RAM 26 is coupled to microprocessor 14. RAM 26 contains, among other things, the vocabulary and the program controlling the microprocessor.

Fig. 2 shows a functional block diagram of DCE 12 of Fig. 1 in both transmit and receive modes. TXDCE 26 communicates with RXDCE 28 by way of communication channel 30. (In most cases, a DCE contains both a TXDCE and a RXDCE.) TXDCE 26 receives TXD via the transmit DTE interface (TXDTE) 32. TXD then goes to data compression encoder 34 and escape character handler 36. Escape character handler 36 processes escape characters that are commands to the DCEs rather than information to be transmitted between the DTEs. Encoder vocabulary 35 is read and written by data compression encoder 34. If TXDCE 26 is operating in TM, anti-expansion control 38 receives characters from escape handler 36. If TXDCE 26 is operating in CM, anti-expansion control 38 receives codewords from data compression encoder 34. TX error correction 40 receives data from anti-expansion control 38, and sends the data to TX data pump 18 for transmission via communication channel 30 to RXDCE 28. The transmit anti-expansion control 38 may reset the encoder vocabulary 35 via the reset line.

RX data pump 22 receives data from communication channel 30. RX error correction 42 processes the data, and sends the data to the decoder anti-expansion control 44. In compressed mode (CM), the data is a codeword, and therefore is sent to the data compression decoder 46. The data compression decoder 46 then decodes the codeword by using decoder vocabulary 47, and sends the character string represented by the codeword to the RX DTE interface 50.

In transparent mode (TM), the data is sent from the decoder anti-expansion control 44 to decoder escape character handler 48. After processing by the escape character handler 48, the data is sent to the RX DTE interface 50.

In contrast to the method described in V.42bis, neither the encoder escape character handler 36 nor the decoder escape character handler 48 operates on CM data or when in CM. This results in a significant saving in processor cycles as compared to other decoding/encoding methods.

Fig. 3 shows an anti-expansion control 38 for TXDCE 26 in block form. Anti-expansion control 38 receives transparent mode data (TM data) and compressed mode data (CM data). Transparent mode data handler 54 interprets TM data. TM data handler 54 sends the TM data to the TX error correction 40. It also may send a reset memory (RM) to the encoder vocabulary 35 and an ENTER COMPRESSED MODE (ECM) control character to the RXDCE 28. (Other control characters may be sent to RXDCE 28 as are described in V.42bis.)

Compressed mode data handler 56 sends the CM data to the TX error correction 40 as well as an ENTER TRANSPARENT MODE (ETM) command codeword. Other command codewords could also be sent to RXDCE 28.

Fig. 4 shows the data structure for non-first level node 60 in the preferred embodiment. Node 60 has one character byte 66, two bytes for an up pointer 62, and two bytes for a children counter 64. A portion of the tree structure of an example decoder vocabulary is shown in Fig. 5.
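As an illustration of the node layout of Fig. 4, the sketch below packs one node into five bytes and compares it with the nine-byte prior layout described in the background. The field order, the little-endian byte order, and the names NODE_FORMAT, PRIOR_FORMAT, pack_node and unpack_node are assumptions for illustration; the text fixes only the field widths.

```python
import struct

# Assumed layout: 1-byte character, 2-byte up pointer, 2-byte children counter.
# "<" selects little-endian with no padding between fields.
NODE_FORMAT = "<BHH"
NODE_SIZE = struct.calcsize(NODE_FORMAT)

# Prior layout from the background: character plus down, left, right, up pointers.
PRIOR_FORMAT = "<BHHHH"
PRIOR_NODE_SIZE = struct.calcsize(PRIOR_FORMAT)


def pack_node(character, up_pointer, children):
    """Serialize one non-first-level node into NODE_SIZE bytes."""
    return struct.pack(NODE_FORMAT, character, up_pointer, children)


def unpack_node(raw):
    """Return (character, up_pointer, children) from a packed node."""
    return struct.unpack(NODE_FORMAT, raw)


# Fraction of RAM saved per node relative to the prior layout.
SAVING = 1 - NODE_SIZE / PRIOR_NODE_SIZE
```

For example, pack_node(ord("H"), ord("T"), 2) would describe a level-2 node for the string "TH" with two children, using the parent's character value as the up pointer.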

The tree structure of the vocabulary is shown in Fig. 5. The character strings represented by the tree are "T", "TH", "THE", "THI", "THIS", "TO", "TOI", "TU", and "TUG". In the preferred embodiment, each of the possible 256 single-character strings in the TXD data is always represented in the vocabulary tree. Since they all share a common parent which is the root of the tree, level-1 nodes do not require storage in the vocabulary. In this embodiment, the up pointer 62 of each level-2 node stores the character of its parent level-1 node. By representing the up pointers of all nodes other than level-1 and level-2 nodes by values greater than 255, possible ambiguity as to the meaning of the up pointer 62 is avoided. The string decoder procedure terminates when the up pointer value is less than 256.

The level-2 node representing the string "TH" 72 has an up pointer to level-1 node "T" that contains the character "T" as described, and has a children counter. It contains the character "H", which is the suffix character for the string "TH". It has two children, "THE" and "THI", and thus the children counter for node 72 is two. The children counter does not count offspring other than children, and so the children counter for node 72 does not include grandchild node 77. Level-3 node "THE" 74 contains the character "E". It has no children (associated level-4 nodes), and thus the children counter for node 74 is zero. Its parent is node 72 and thus the up pointer for node 74 contains the memory address of node 72. Level-3 node "THI" 76 contains the character "I". It has a single child which is the level-4 node "THIS" 77, and thus the children counter for node 76 is one. Its parent is node 72 and thus the up pointer for node 76 contains the memory address of node 72. Level-4 node "THIS" 77 contains the character "S". It has no children, and thus the children counter for node 77 is zero. Its parent node is node 76 and thus the up pointer for node 77 contains the memory address of node 76.

The level-2 node representing the string "TO" 78 contains the character "O". It has no children and thus the children counter for node 78 is zero. Its parent node is the level-1 node "T" and thus the up pointer for node 78 contains the character value "T". Level-2 node "TU" 79 contains the character "U". It has a single child which is the level-3 node "TUG" 80, and thus the children counter for node 79 is one. Its parent node is the level-1 node "T" and thus the up pointer for node 79 contains the character value "T". Level-3 node representing the string "TUG" 80 contains the character "G". It has no children and thus the children counter for node 80 is zero. Its parent node is node 79 and thus the up pointer for node 80 contains the memory address of node 79. To delete a node, the children counter for the upper node is merely decremented.

For example, if level 3 node 74 representing the string "THE" was to be deleted, then the children counter for the parent of level 3 node 74, in this case level 2 node 72 for the string "TH", would be reduced by one to zero.
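The example tree described above, and the way a string is recovered by following up pointers until a value below 256 is reached, can be sketched as follows. The addresses 256 through 262 are hypothetical; they only need to exceed 255 so that an up pointer below 256 can be read as a level-1 character. The dictionary representation and the name decode_string are likewise assumptions for illustration.

```python
# address -> [character, up pointer, children counter]
tree = {
    256: ["H", ord("T"), 2],   # "TH"   (node 72)
    257: ["E", 256, 0],        # "THE"  (node 74)
    258: ["I", 256, 1],        # "THI"  (node 76)
    259: ["S", 258, 0],        # "THIS" (node 77)
    260: ["O", ord("T"), 0],   # "TO"   (node 78)
    261: ["U", ord("T"), 1],   # "TU"   (node 79)
    262: ["G", 261, 0],        # "TUG"  (node 80)
}


def decode_string(address):
    """Walk up pointers from a node to its level-1 ancestor and return the string."""
    chars = []
    up = address
    while up >= 256:                 # values of 256 and above are node addresses
        character, up, _children = tree[up]
        chars.append(character)
    chars.append(chr(up))            # a value below 256 is the level-1 character
    return "".join(reversed(chars))
```

The walk collects characters from the leaf upward, so the result is reversed before being returned. No down, left, or right pointer is consulted at any point.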

Some prior references recognize that nodes that are to be deleted should have no children (leaf nodes). Such a node is recognized in the preferred embodiment as a node with children counter containing zero. This is the reason that a children counter is included in the storage of each node. Since level-1 nodes may not be deleted in this embodiment, an offspring counter is not required for level-1 nodes.

If a node were to be added, then the children counter for the upper node is incremented. For example, if the string "TUGZ" were to be added to the tree, it would follow leaf node 80 representing the string "TUG", and the children counter for leaf node 80 would be incremented to one.

Adding a node consists of incrementing a single memory location, while deleting a node consists of decrementing a single memory location. As is well known, decrements and increments of memory locations by microprocessors are two of the faster operations performed by microprocessors. Thus, the processing of adding and deleting nodes is very quick. Additionally, the memory overhead for the preferred embodiment is likewise small since each non-level 1 node requires only five bytes. With less memory per node, more nodes can be held within a given amount of RAM. Alternatively, for a given number of nodes, 44% less RAM is required.
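A minimal sketch of the counter bookkeeping described above follows. It assumes the same hypothetical representation as a dictionary mapping addresses of 256 or more to [character, up pointer, children counter]; the names add_node and delete_leaf are illustrative. Level-1 parents are skipped because, as stated earlier, they carry no counter.

```python
FIRST_NODE_ADDRESS = 256   # assumption: smaller up-pointer values are level-1 characters


def add_node(tree, address, character, parent):
    """Store a new leaf and increment the parent's children counter."""
    tree[address] = [character, parent, 0]
    if parent >= FIRST_NODE_ADDRESS:
        tree[parent][2] += 1


def delete_leaf(tree, address):
    """Remove a leaf and decrement the parent's children counter."""
    _character, parent, children = tree[address]
    if children != 0:
        raise ValueError("only leaf nodes may be deleted")
    if parent >= FIRST_NODE_ADDRESS:
        tree[parent][2] -= 1
    del tree[address]
```

Apart from writing or clearing the node itself, each operation touches a single counter in the parent, with no sibling pointers to repair.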

With less processing per character, increased DCE throughput can be achieved for a given number of processor cycles. Alternatively, for a given throughput, this saving in processor cycles can be used by the microprocessor for other needs. Simulations of the use of this data structure indicate that the throughput is increased by more than 20%.

Fig. 6 shows a method used in the above described device. The communication begins (block 200). The DCE waits for a character (block 202). If a character is received (block 204), then the character is processed (block 206; see Fig. 7). If a flush, EOF (end of frame), or SYNC_ERROR (synchronization error) command request is received (block 210), then the command is processed (block 210; see Fig. 8). If a test compression request is received (block 212), then a test is performed to determine whether compression should start or end, depending upon the current state (see Fig. 9).

Fig. 7 shows the processing of a character (block 206). The dictionary is searched for a string plus the next character (block 208). If the string plus the next character was not found (block 210), then the string is added to the dictionary (block 212).

If the string was found (block 210), then the string plus the next character is tested to determine whether that is the same as the previously sent string (block 214). If so, then that string is added to the dictionary (block 212). If not, the string is set to the string plus the next character (block 216).

If the string was not found, then the mode is tested (block 211). If the system is operating in compressed mode, the codeword is sent (block 213). The string is then added to the dictionary (block 212).

Whether or not the string was found (block 210), the string is initialized to the unmatched character (block 218). The DCEs are then checked to determine if they are operating in compressed or transparent mode (block 220). If the DCEs are operating in compressed mode, the processing of the character ends (block 226). On the other hand, if the DCEs are operating in the transparent mode, the character is sent (block 222) and the escape character procedure is applied (block 224, see Fig. 10, described below), and the processing exited (block 226).

Fig. 8 shows the method for processing a command (block 210). The DCEs are checked to determine the mode (block 230). If they are operating in transparent mode, the buffered characters are sent (block 232), and the command processing ends.

If the DCEs are operating in compressed mode, the data is checked to see if the string is empty (block 234). If so, then the character is sent (block 236). Otherwise, the codeword is sent (block 238). The command is then sent (block 240), and the next character is exception processed (block 242; see Fig. 11, described hereinafter). The processing of the command then ends (block 244).

Fig. 9 shows the method for testing compression (block 212). A compression test is employed to determine whether compression would result in faster transmission of the information (block 250). Next, it is determined whether the mode should be changed (block 252). If a change in mode is not required, then the test compression procedure is ended (block 254).

If the mode should be changed, then the present mode of the DCEs is checked (block 256).

If in compressed mode, the string is checked to determine if the string is empty (block 258). If not, the codeword is sent (block 260), and the command is sent to enter transparent mode (block 262). If the string is empty (block 258), the command to enter transparent mode is immediately sent (block 262). The escape character is then initialized (block 264), and transparent mode is entered (block 266). The next character is then exception processed (block 268), and the procedure terminated (block 254).

If in transparent mode, the escape character is sent (block 270), the decoder dictionary is re-initialized (block 272), the enter compressed mode command is sent (block 274), and compressed mode is entered (block 276).

The next character is exception processed (block 268), and the procedure exited (block 254).

Fig. 10 shows the process for escape character procedure (block 224, referred to in Fig. 7). The character is tested to determine whether the character is an escape character (block 280). If not, the procedure is exited. If so, an escape in data control character is sent (block 282), and the escape character is updated (block 284).

Fig. 11 shows the exception processing of the next character (block 242 referred to in Fig. 8). The DCE waits for the next character (block 290). If a FLUSH, EOF, or SYNC_ERROR command request is received (block 292), the DCE waits for the next character (block 290).

If a character is received (block 294), the string is initialized to that character (block 296). The mode is then checked (block 298). In compressed mode, nothing further happens, and the procedure is exited (block 300). In transparent mode, the character is sent (block 202), the escape character is sent (block 204), and the program exited (block 200).

Fig. 12 shows the operation of data compression decoder 46 in compressed mode (block 210). Decoder 46 waits for a codeword (block 212). On receipt of a codeword (block 214), the codeword is checked (block 216) to determine if it is a command codeword. If the codeword is a FLUSH, EOF or SYNC_ERROR command codeword, the command is processed. If the codeword is an ENTER TRANSPARENT MODE (ETM) command codeword generated by TXDCE 26 (see 274, Fig. 9), the escape character is initialized (block 220), and transparent mode is entered (block 222; see Fig. 13, described hereinafter). If a string codeword is received, the string is decoded from the codeword (block 224). The method of decoding from the codeword is described in Clark, U.S. 5,153,591 and Welch, U.S. 4,558,302. If the prior codeword is equal to the string (block 326), the vocabulary is updated (block 328), and the string is sent to TX data pump 18 (block 330).

Fig. 13 shows the decoder operation during transparent mode (block 322). RXDCE 28 waits for a character (block 340). When a character is received (block 342), the character is checked to determine if the character is an escape character (block 344).

If not, the character is put into the output buffer (block 346).

If so, the escape character is updated (block 348). RXDCE 28 waits for the next character (block 350), gets the command (block 352), then executes the command (block 354). If the command is an ENTER COMPRESSED MODE (ECM) command (see block 274, Fig. 9), decoder vocabulary 47 is re-initialized (block 356), and the procedure exited (block 358). Note that reinitialization of vocabularies 35, 47 could be entire deletion of the current vocabulary or could be resetting vocabulary 35, 47 to a tree structure either identified externally or identified by negotiation between the TXDCE 26 and RXDCE 28. Otherwise, the prior escape character is put into the output buffer (block 344), and the RXDCE 28 again waits for the next character (block 324).
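The transparent-mode decoder loop of Fig. 13 can be sketched as follows. As in V.42bis, the sketch assumes an ENTER COMPRESSED MODE code of 0 and an escape update of adding 51 modulo 256; neither value is given in this description, and the name receive_transparent is illustrative. Re-initialization of the decoder vocabulary is left out, and any command other than ECM is handled by the "otherwise" branch described above.

```python
ESCAPE_STEP = 51   # assumption: V.42bis-style cycling of the escape character
ECM = 0            # assumption: code of the ENTER COMPRESSED MODE command


def receive_transparent(received, escape=0):
    """Return (output buffer, True if an ECM command ended transparent mode)."""
    output = bytearray()
    stream = iter(received)
    for character in stream:
        if character != escape:
            output.append(character)            # block 346: ordinary data
            continue
        prior = escape
        escape = (escape + ESCAPE_STEP) % 256   # block 348: escape character updated
        command = next(stream, None)            # blocks 350-352: get the command
        if command is None:
            break                               # input ended before the command arrived
        if command == ECM:
            return bytes(output), True          # vocabulary reset (block 356) omitted
        output.append(prior)                    # otherwise the prior escape is data
    return bytes(output), False
```

For instance, the bytes b"A\x00\x01B" carry the data "A", an escape value that occurred in the data, and "B", while b"Hi\x00\x00" ends transparent mode after "Hi".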

Fig. 14 shows a method for updating the decoder vocabulary (block 400). A candidate free node is selected (block 402). The candidate node is examined to determine the number of children of the node (block 404). If the children counter is not zero, then a new candidate node is selected (block 402).

If the children counter is zero, then the first character of the current string is stored into the character byte of the free node (block 406), the children counter of the free node is zeroed (block 408), the memory location of the parent node is stored into the up pointer of the free node (block 410), and the children counter of the parent node is incremented (block 412). The procedure is then exited (block 414).

CONCLUSION

As shown in Figure 4, the decoder data structure requires minimally a character field, an up pointer field, and a children counter field. The decoder need not move down or across the tree and therefore there are no down, left, or right pointers. The down pointer is replaced by the children counter field, which is a count of the number of children of a node, and is also used to determine if a node is a leaf node. The improvements result in reduced RAM required by the decoder and reduced processor usage to maintain the decoder data structure. In the preferred embodiment, the character field is one byte, the up pointer field is two bytes, and the children counter field is two bytes. The children counter field is two bytes because a node may have from zero (0) to 256 children, which requires more than one byte for storage. Another solution uses a one byte children counter field and a one bit flag elsewhere, possibly in the up pointer field.

The decoder is able to use a data structure like the one shown in Figure 4 because it does not check for a duplicate string before adding an innovation character to the previous string. The decoder does not maintain the vocabulary during transparent mode. The vocabulary is reset by both the encoder and decoder when switching from transparent mode to compressed mode. The children counter field of a node is incremented when a "child" is added and decremented when a "child" is deleted, and the node is a leaf node when the children counter field is zero.
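The decoder vocabulary update of Fig. 14 can be sketched with the same hypothetical dictionary of [character, up pointer, children counter] entries. Two points are assumptions rather than steps stated for Fig. 14: an occupied leaf that is recycled first has its old parent's counter decremented (following the deletion rule given earlier), and the node being extended is never chosen as the candidate. The order in which candidates are offered is left to the caller.

```python
FIRST_NODE_ADDRESS = 256   # assumption: smaller up-pointer values are level-1 characters


def update_vocabulary(nodes, candidates, character, parent):
    """Store a new entry in the first usable candidate node and return its address."""
    for address in candidates:                    # block 402: select a candidate
        if address == parent:
            continue                              # assumption: do not recycle the parent
        old = nodes.get(address)
        if old is not None:
            if old[2] != 0:                       # block 404: has children, try another
                continue
            if old[1] >= FIRST_NODE_ADDRESS:
                nodes[old[1]][2] -= 1             # assumption: old leaf is deleted first
        nodes[address] = [character, parent, 0]   # blocks 406-410
        if parent >= FIRST_NODE_ADDRESS:
            nodes[parent][2] += 1                 # block 412
        return address
    raise RuntimeError("no free or leaf node among the candidates")
```

The only test applied to a candidate is whether its children counter is zero, which is why the decoder needs no down or sibling pointers to maintain the vocabulary.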

In the preferred embodiment, whenever the string matching procedure is terminated by the encoder prior to a longest string match, a new string is started with the next character, but the character is not added to the vocabulary. This will occur after a mode switch, flush, end-of-frame, or sync_error. The encoder explicitly signals the decoder of a mode switch, flush, end-of-frame, or sync_error (the FLUSH codeword is sent after every flush operation). The decoder adds the innovation character of a string to the previous string without checking for a duplicate string, except the first innovation character following a mode switch, flush, end-of-frame, or sync_error. In another implementation, the decoder adds the innovation character of a string to the previous string, even if a duplicate is added. The encoder may or may not add duplicate strings to its vocabulary. If the encoder does not add a duplicate string to its vocabulary, then the encoder reserves the codeword for the node that would have been added as if the node had been added. The encoder does not send the codeword for any duplicate strings in the vocabulary. Duplicate strings are not used to build longer strings. Nodes representing duplicate strings remain leaf nodes, and are deleted from the tree in the course of vocabulary maintenance.

The encoder and decoder need not maintain the escape character during compressed mode. Thus, the encoder and decoder do not check each character in the data for the escape character and update the escape character when one is found in the data. In the preferred embodiment, either the encoder or the decoder may continue to maintain the escape character during compressed mode, and the escape character is reset to its initial value (0) when switching from compressed mode to transparent mode. Another implementation requires that neither the encoder nor the decoder modify the escape character during compressed mode.

We claim:

Claims

1. A digital communication system coupling a first and second digital terminals for communication of signals, the signals comprising digital information, by way of a communication channel, the system comprising: a first digital communication device having: an interface coupling the digital communication device with the first digital terminal; a data compression encoder; an encoder vocabulary coupled to the data compression encoder, the encoder vocabulary using a first tree structure for storing entries in the encoder vocabulary; and a transmit data pump coupled to the communication channel; and a second digital communication device having: a receiver data pump coupled to the communication channel; an interface coupling the second digital terminal with the second digital communication device; a data compression decoder; a decoder vocabulary coupled to the data compression decoder, the decoder vocabulary having a second tree structure for storing entries in the decoder vocabulary.

2. The digital communication system of claim 1 where the first digital communication device includes an encoder anti-expansion control for controlling whether the data compression encoder is enabled.

3. The digital communication system of claim 2 where the encoder anti-expansion control further comprises means for disabling the data compression encoder.

4. The digital communication system of claim 3 where the encoder anti-expansion control further comprises means for enabling or disabling the data compression decoder of the second digital communication device.

5. A data compression decoder coupled to a communication channel, the decoder receiving signals comprising compressed information, the decoder coupled to a decoder vocabulary, the decoder creating a hierarchical tree structure for storing information into the decoder vocabulary, the tree structure having a plurality of nodes, the nodes characterized by a hierarchical structure such that at least a first plurality of nodes has a different hierarchical level than the other nodes, some of the first plurality of nodes containing a representation of the number of nodes having a lower hierarchical level associated with that node.

6. A method of updating a decoder vocabulary in a data compression decoder, comprising the steps of:

(a) selecting a candidate node;

(b) determining if the node has any offspring nodes;

(c) using the candidate node to store a new vocabulary entry if the candidate node has no offspring nodes.