WO2000004442A1 - Reduced overhead text messaging - Google Patents
- Publication number
- WO2000004442A1 (PCT/US1999/013293)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- token
- messaging system
- text messaging
- reduced overhead
- tokens
Classifications
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/3084—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method
Definitions
- This invention relates in general to selective call signaling systems and more particularly to a selective call signaling system that facilitates reduced overhead text messaging over a wireless network.
- a user or originator may send a message to a messaging unit (e.g., selective call receiver), the message comprising an address associated with the messaging unit, and data.
- the data may be in one or more forms such as numeric digits representing a phone number, alphanumeric characters representing a readable text message, or possibly a multimedia message comprising audio and graphical information.
- this type of messaging is sufficient to convey information between individuals or services relating to their business, special interests, whereabouts, general scheduling, or time critical appointments.
- a solution must be found that reduces the overall amount of data on a given channel, thus allowing higher data throughput, increased channel utilization, and reduced messaging latency.
- wireless messaging system that allows an originator to communicate a text message to a messaging unit in a form that reduces the total overhead associated with text messaging.
- a method and apparatus for sending data comprising compressed text messages using existing selective call signaling equipment operating with paging protocols such as FLEX™, a trademark of Motorola, Inc., POCSAG (Post Office Code Standardisation Advisory Group), or the like.
- a first aspect of the invention involves realizing hardware that implements a method for overlaying compressed text messaging on an existing paging infrastructure.
- the existing paging infrastructure comprises a paging terminal that includes a paging encoder for processing received messages containing original text messages and their corresponding destination requests, e.g., selective call Personal Identification Numbers (PINs), cap codes, selective call addresses, or the like.
- the paging terminal generates a messaging queue of compressed selective call messages comprising the received messages and their corresponding selective call address(es), as determined from the corresponding destination requests. Distribution of the selective call messages in the messaging queue is handled by the paging terminal which dispatches messages to at least one base station (e.g., transmitter, antenna, and receiver) for communication between the base station and the messaging unit(s) or pagers.
- a second aspect of the invention involves the inclusion of a lossless compression engine, preferably in the paging terminal, for selectively compressing messages received from the originator or a two-way messaging unit, e.g. portable data terminal, cellular telephone, or two-way pager.
- a third aspect of the invention involves the messaging unit or pager that is equipped with a lossless decompression engine that can decompress compressed messaging information contained in the compressed selective call messages to recover the original text.
- a fourth aspect of the invention involves the messaging unit or pager being equipped with a primary and possibly a secondary apparatus for communicating both inbound and outbound messages.
- the primary apparatus comprises a conventional radio frequency receiver and optionally a conventional radio frequency transmitter.
- the secondary apparatus comprises an optical receiver and optionally an optical transmitter.
- the secondary apparatus may further comprise one or more acoustic or other electromagnetic transducers and associated circuitry implementing a uni- or bi-directional communication link between the messaging unit or pager and the originator.
- a fifth aspect of the invention involves the compression engine associated with the paging terminal and the decompression engine associated with the messaging unit or pager accommodating a plurality of compression procedures. These compression procedures comprise both lossless and lossy compression schemes, as appropriate for the information being compressed.
- a sixth aspect of the invention involves the messaging unit or pager including a compression engine for compressing a message newly originating from the messaging unit or pager and destined for at least one messaging unit or pager having message decompression capability.
- FIG. 1 is an electrical block diagram of a data transmission system for use in accordance with the preferred embodiment of the present invention.
- FIG. 2 is an electrical block diagram of a terminal for processing and transmitting message information in accordance with the preferred embodiment of the present invention.
- FIGs. 3-5 are timing diagrams illustrating the transmission format of the signaling protocol utilized in accordance with the preferred embodiment of the present invention.
- FIGs. 6 and 7 are timing diagrams illustrating the synchronization signals utilized in accordance with the preferred embodiment of the present invention.
- FIG. 8 is an electrical block diagram of a messaging unit in accordance with the preferred embodiment of the present invention.
- FIG. 9 is a diagram of a compressed messaging system in accordance with the present invention.
- FIG. 10 is a high level block diagram of a messaging unit in accordance with the preferred embodiment of the present invention.
- FIG. 11 is a block diagram of the message composition and compression equipment that could be used to send compressed messages to subscriber messaging units via a paging channel.
- FIG. 12 is a functional diagram of a wireless selective call signaling system controller that implements a combined one-way and two-way compressed messaging system capable of communicating with the messaging units.
- FIG. 13 is an exemplary state diagram showing the stages for a string of length n.
- FIG. 14 is a state table depicting a typical parsing and compression operation of the compression engine to resolve the shortest path for optimal text compression in accordance with the preferred embodiment of the present invention.
- FIG. 15 is an exemplary state diagram showing a typical initial data type assignment in accordance with the preferred embodiment of the present invention.
- FIG. 16 is an example of shortest path encoding of a compressed message including details such as message segmentation, shortest path data encoding, shortest path data type switch encoding, and padding in accordance with the preferred embodiment of the present invention.
- an electrical block diagram illustrates a data transmission system 100, such as a paging system, for use in accordance with the preferred embodiment of the present invention.
- messages originating either from a phone, as in a system providing numeric data transmission, or from a message entry device, such as an alphanumeric data terminal are routed through the public switched telephone network (PSTN) to a paging terminal 102 which processes the numeric or alphanumeric message information for transmission by one or more transmitters 104 provided within the system.
- the transmitters 104 may use simulcast, multicast, or zone based transmitting schemes to broadcast the message information to messaging units 106. Processing of the numeric and alphanumeric information by the paging terminal 102, and the protocol utilized for the transmission of the messages is described in detail in the following text.
- an electrical block diagram illustrates the paging terminal 102 utilized for processing and controlling the transmission of the message information in accordance with the preferred embodiment of the present invention.
- Short messages such as tone-only and numeric messages which can be readily entered using a conventional dual tone multi-frequency telephone are coupled to the paging terminal 102 through a telephone interface 202 in a manner well known in the art.
- Longer messages such as alphanumeric messages which require the use of a data entry device are coupled to the paging terminal 102 through a modem 206 or the like using any of a number of well known transmission protocol standards.
- a controller 204 handles the processing of the message.
- the controller 204 is preferably a microcomputer, such as manufactured by Motorola Inc., and which runs various pre-programmed routines for controlling such terminal operations as voice prompts to direct the caller to enter the message, or the handshaking protocol to enable reception of messages from a data entry device.
- the controller 204 references information stored in the subscriber database 208 to determine how the message being received is to be processed.
- the subscriber database 208 includes, but is not limited to, such information as addresses assigned to the messaging unit, message type associated with the address, and information related to the status of the messaging unit, such as active or inactive for failure to pay the bill.
- a data entry terminal 240 is provided which couples to the controller 204, and which is used for such purposes as entry, updating and deleting of information stored in the subscriber database 208, for monitoring system performance, and for obtaining such information as billing information.
- the subscriber database 208 also includes such information as to what transmission frame and to what transmission phase the messaging unit is assigned, as will be described in further detail below.
- the received message is stored in an active page file 210 which stores the messages in queues according to the transmission phase assigned to the messaging unit.
- four phase queues are provided in the active page file 210.
- the active page file 210 is preferably a dual port, first in first out random access memory, although it will be appreciated that other random access memory devices, such as hard disk drives, can be utilized as well.
- Periodically the message information stored in each of the phase queues is recovered from the active page file 210 under control of controller 204 using timing information such as provided by a real time clock 214, or other suitable timing source.
- the recovered message information from each phase queue is sorted by frame number and is then organized by address, message information, and any other information required for transmission (all of which is referred to as message related information), and then batched into frames based upon message size by frame batching controller 212.
- the batched frame information for each phase queue is coupled to frame message buffers 216 which temporarily store the batched frame information until a time for further processing and transmission. Frames are batched in numeric sequence, so that while a current frame is being transmitted, the next frame to be transmitted is in the frame message buffer 216, and the next frame thereafter is being retrieved and batched. At the appropriate time, the batched frame information stored in the frame message buffer 216 is transferred to the frame encoder 218, again maintaining the phase queue relationship.
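- The queueing described above can be sketched as follows. This is a minimal illustration, assuming a hypothetical `ActivePageFile` class and a simple `(frame, address, message)` tuple layout; neither name comes from the patent, and the frame batching by message size is not modeled.

```python
from collections import deque

NUM_PHASES = 4  # the text describes four phase queues


class ActivePageFile:
    """Hypothetical stand-in for the active page file 210."""

    def __init__(self):
        # One first-in first-out queue per transmission phase.
        self.queues = [deque() for _ in range(NUM_PHASES)]

    def store(self, phase, frame, address, message):
        # Messages are queued by the phase assigned to the messaging unit.
        self.queues[phase].append((frame, address, message))

    def recover(self, phase):
        """Drain one phase queue, sorted by frame number.

        sorted() is stable, so arrival order is kept within a frame.
        """
        drained = sorted(self.queues[phase], key=lambda entry: entry[0])
        self.queues[phase].clear()
        return drained
```

- Sorting at recovery time rather than at insertion keeps `store` cheap; a real terminal would presumably also group the recovered entries into frames by message size, which this sketch leaves out.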
- the frame encoder 218 encodes the address and message information into address and message codewords required for transmission, as will be described below.
- the encoded address and message codewords are ordered into blocks and then coupled to a block interleaver 220 which interleaves preferably eight codewords at a time to form interleaved information blocks for transmission in a manner well known in the art.
- the interleaved codewords contained in the interleaved information blocks produced by each block interleaver 220 are then serially transferred to a phase multiplexer 221, which multiplexes the message information on a bit by bit basis into a serial data stream by transmission phase.
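- A bit by bit phase multiplexer of the kind described here might look like the sketch below. The function names are hypothetical, and the fixed round-robin order across phases is an assumption; the text does not specify how the multiplexing changes with the transmission bit rate.

```python
def phase_multiplex(phase_streams):
    """Merge equal-length bit lists into one serial stream.

    One bit is taken from each phase in turn (assumed round-robin order).
    """
    length = len(phase_streams[0])
    if any(len(stream) != length for stream in phase_streams):
        raise ValueError("all phase streams must have the same length")
    return [stream[i] for i in range(length) for stream in phase_streams]


def phase_demultiplex(serial, num_phases, phase):
    """Recover the bits belonging to one phase from the serial stream."""
    return serial[phase::num_phases]
```

- For example, four three-bit phase streams produce a twelve-bit serial stream, and a receiver assigned to phase 2 keeps every fourth bit starting at index 2.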
- the controller 204 next enables a frame sync generator 222 which generates the synchronization code which is transmitted at the start of each frame transmission.
- the synchronization code is multiplexed with address and message information under the control of controller 204 by serial data splicer 224, and generates therefrom a message stream which is properly formatted for transmission.
- the message stream is next coupled to a transmitter controller 226, which under the control of controller 204 transmits the message stream over a distribution channel 228.
- the distribution channel 228 may be any of a number of well known distribution channel types, such as wire line, an RF or microwave distribution channel, or a satellite distribution link.
- the distributed message stream is transferred to one or more transmitter stations 104, depending upon the size of the communication system.
- the message stream is first transferred into a dual port buffer 230 which temporarily stores the message stream prior to transmission.
- the message stream is recovered from the dual port buffer 230 and coupled to the input of preferably a 4-level FSK modulator 234.
- the modulated message stream is then coupled to the transmitter 236 for transmission via antenna 238.
- the timing diagrams illustrate the transmission format of the signaling protocol utilized in accordance with the preferred embodiment of the present invention.
- This signaling protocol is commonly referred to as Motorola's FLEX™ selective call signaling protocol.
- the signaling protocol enables message transmission to messaging units, such as pagers, assigned to one or more of 128 frames which are labeled frame 0 through frame 127.
- the actual number of frames provided within the signaling protocol can be greater or less than described above.
- the greater the number of frames utilized the greater the battery life that may be provided to the messaging units operating within the system.
- the fewer the number of frames utilized the more often messages can be queued and delivered to the messaging units assigned to any particular frame, thereby reducing the latency, or time required to deliver messages.
- the frames comprise a synchronization codeword (sync) followed preferably by eleven blocks of message information (information blocks) which are labeled block 0 through block 10.
- each block of message information comprises preferably eight address, control or data codewords which are labeled word 0 through word 7 for each phase. Consequently, each phase in a frame allows the transmission of up to eighty-eight address, control and data codewords.
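- The per-phase capacity quoted above follows directly from the block and word counts. The short calculation below restates it; the constant names are illustrative only.

```python
BLOCKS_PER_FRAME = 11  # block 0 through block 10
WORDS_PER_BLOCK = 8    # word 0 through word 7, per phase

# Codewords available to one phase in one frame.
codewords_per_phase = BLOCKS_PER_FRAME * WORDS_PER_BLOCK
print(codewords_per_phase)  # 88
```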
- the address, control and data codewords preferably comprise two sets, a first set relating to a vector field comprising a short address vector, a long address vector, a first message word, and a null word, and a second set relating to a message field comprising a message word and a null word.
- the address, control, and data or message codewords are preferably (31,21) BCH codewords with an added thirty-second even parity bit which provides an extra bit of distance to the codeword set. It will be appreciated that other codewords, such as a (23,12) Golay codeword, could be utilized as well. Unlike the well known POCSAG signaling protocol, which provides address and data codewords that use the first codeword bit to define the codeword type as either address or data, no such distinction is provided for the address and data codewords in the FLEX™ signaling protocol utilized with the preferred embodiment of the present invention. Rather, address and data codewords are defined by their position within the individual frames.
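- The added even parity bit can be illustrated with the sketch below. It takes an already-encoded 31-bit value and does not perform the BCH encoding itself; placing the parity bit in the least significant position is an assumption made for illustration, and both function names are hypothetical.

```python
def add_even_parity(codeword31):
    """Append one bit so the 32-bit result has an even number of ones."""
    if not 0 <= codeword31 < (1 << 31):
        raise ValueError("expected a 31-bit value")
    parity = bin(codeword31).count("1") & 1
    # Assumption: the parity bit is placed in the least significant position.
    return (codeword31 << 1) | parity


def parity_ok(codeword32):
    """Return True when the 32-bit codeword contains an even number of ones."""
    return bin(codeword32).count("1") % 2 == 0
```

- Any single bit error in the 32-bit word makes `parity_ok` return False, which is the sense in which the extra bit adds distance to the code.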
- FIGS. 6 and 7 are timing diagrams illustrating the synchronization code utilized in accordance with the preferred embodiment of the present invention.
- the synchronization code comprises preferably three parts, a first synchronization code (sync 1), a frame information codeword (frame info) and a second synchronization codeword (sync 2).
- the first synchronization codeword comprises first and third portions, labeled bit sync 1 and BS1, which are alternating 1,0 bit patterns that provide bit synchronization, and second and fourth portions, labeled "A" and its complement "A bar", which provide frame synchronization.
- the second and fourth portions are preferably single (32,21) BCH codewords which are predefined to provide high codeword correlation reliability, and which are also used to indicate the data bit rate at which addresses and messages are transmitted.
- Table 1 defines the data bit rates which are used in conjunction with the signaling protocol.
- the frame information codeword is preferably a single (32,21) BCH codeword which includes within the data portion a predetermined number of bits reserved to identify the frame number, such as 7 bits encoded to define frame number 0 to frame number 127.
- the structure of the second synchronization code is preferably similar to that of the first synchronization code described above.
- the second synchronization code is transmitted at the data symbol rate at which the address and messages are to be transmitted in any given frame. Consequently, the second synchronization code allows the messaging unit to obtain "fine" bit and frame synchronization at the frame transmission data bit rate.
- FIG. 8 is an electrical block diagram of the messaging unit 106 in accordance with the preferred embodiment of the present invention.
- the heart of the messaging unit 106 is a controller 816, which is preferably implemented using a low power microcomputer, such as manufactured by Motorola, Inc., or the like.
- the microcomputer controller, hereinafter called the controller 816, receives and processes inputs from a number of peripheral circuits, as shown in FIG. 8, and controls the operation and interaction of the peripheral circuits using software subroutines.
- the use of a microcomputer controller for processing and control functions (e.g., as a function controller) is well known to one of ordinary skill in the art.
- the messaging unit 106 is capable of receiving address, control and message information, hereafter called "data", which is modulated using preferably 2-level and 4-level frequency modulation techniques.
- the transmitted data is intercepted by an antenna 802 which couples to the input of a receiver section 804.
- Receiver section 804 processes the received data in a manner well known in the art, providing at the output an analog 4-level recovered data signal, hereafter called a recovered data signal.
- the recovered data signal is coupled to one input of a threshold level extraction circuit 808, and to an input of a 4-level decoder 810.
- the operation of the threshold level extraction circuit 808, 4-level decoder 810, symbol synchronizer 812, 4-level to binary converter 814, synchronization codeword correlator 818, and phase timing generator (data recovery timing circuit) 826 depicted in the messaging unit of FIG. 8 is best understood with reference to United States Patent No. 5,282,205 entitled "Data Communication Terminal Providing Variable Length Message Carry-On And Method Therefor," issued to Kuznicki et al., assigned to Motorola, Inc., the teachings of which are incorporated herein by reference thereto.
- the threshold level extraction circuit 808 comprises two clocked level detector circuits (not shown) which have as inputs the recovered data signal.
- Preferably, signal states of 17%, 50% and 83% are utilized to enable decoding the 4-level data signals presented to the threshold level extraction circuit 808.
- When power is initially applied to the receiver portion, as when the messaging unit is first turned on, a clock rate selector is preset through a control input (center sample) to select a 128X clock, i.e. a clock having a frequency equivalent to 128 times the slowest data bit rate, which as described above is 1600 bps.
- the 128X clock is generated by 128X clock generator 844, as shown in FIG. 8, which is preferably a crystal controlled oscillator operating at 204.8 kHz (kilohertz).
- the output of the 128X clock generator 844 couples to an input of frequency divider 846 which divides the output frequency by two to generate a 64X clock at 102.4 kHz.
- the 128X clock allows the level detectors to asynchronously detect in a very short period of time the peak and valley signal amplitude values, and to therefore generate the low (Lo), average (Avg) and high (Hi) threshold output signal values required for modulation decoding.
- After symbol synchronization is achieved with the synchronization signal, as will be described below, the controller 816 generates a second control signal (center sample) to enable selection of a 1X symbol clock which is generated by symbol synchronizer 812 as shown in FIG. 8.
- the 4-level decoder 810 preferably operates using three voltage comparators and a symbol decoder. The recovered data signal is coupled to an input of the three comparators having thresholds corresponding with normalized signal states of 17%, 50% and 83%.
- the resulting system effectively recovers the demodulated 2- or 4- level FSK information signal by coupling the recovered data signal to the second input of an 83% comparator, the second input of a 50% comparator, and the second input of a 17% comparator.
- the outputs of the three comparators corresponding with the low (Lo), average (Avg) and high (Hi) threshold output signal values are coupled to inputs of a symbol decoder.
- the symbol decoder then decodes the inputs according to Table 2.
- the MSB output from the 4-level decoder 810 is coupled to an input of the symbol synchronizer 812 and provides a recovered data input generated by detecting the zero crossings in the 4-level recovered data signal.
- the positive level of the recovered data input represents the two positive deviation excursions of the analog 4-level recovered data signal above the average threshold output signal, and the negative level represents the two negative deviation excursions of the analog 4-level recovered data signal below the average threshold output signal.
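- The three-comparator decoding can be sketched as below. Table 2 is not reproduced in this text, so the mapping is partly assumed: the MSB follows the average threshold as the text states, while the Gray-coded LSB (set for the two inner levels) is an assumption made for illustration. The function name and the normalized 0 to 1 input scale are also hypothetical.

```python
def decode_symbol(level, lo=0.17, avg=0.50, hi=0.83):
    """Decode one normalized sample (0.0 to 1.0) into (MSB, LSB)."""
    # Three comparators, one per threshold.
    above_lo = level > lo
    above_avg = level > avg
    above_hi = level > hi

    # Per the text: MSB is 1 for the two excursions above the average threshold.
    msb = int(above_avg)
    # Assumption (Table 2 not available): Gray coding, LSB set for inner levels.
    lsb = int(above_lo and not above_hi)
    return msb, lsb
```

- With this assumed mapping, adjacent signal levels differ in only one bit, so a sample that lands just across a threshold corrupts a single bit rather than two.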
- the symbol synchronizer 812 uses a 64X clock at 102.4 kHz which is generated by frequency divider 846, that is coupled to an input of a 32X rate selector (not shown).
- the 32X rate selector is preferably a divider which provides selective division by 1 or 2 to generate a sample clock which is thirty-two times the symbol transmission rate.
- a control signal (1600/3200) is coupled to a second input of the 32X rate selector, and is used to select the sample clock rate for symbol transmission rates of 1600 and 3200 symbols per second.
- the selected sample clock is coupled to an input of 32X data oversampler (not shown) which samples the recovered data signal (MSB) at thirty-two samples per symbol.
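- The clock relationships described above can be checked with a few lines of arithmetic. The constant and function names are illustrative; the frequencies and division ratios are taken from the text.

```python
BASE_BIT_RATE = 1600                 # slowest data bit rate, in bps
CLOCK_128X_HZ = 128 * BASE_BIT_RATE  # 204800 Hz crystal oscillator
CLOCK_64X_HZ = CLOCK_128X_HZ // 2    # 102400 Hz after the divide-by-two


def sample_clock_hz(symbol_rate):
    """Return the 32X sample clock for a symbol rate of 1600 or 3200."""
    # The 32X rate selector divides the 64X clock by 2 or by 1.
    divisor = {1600: 2, 3200: 1}[symbol_rate]
    return CLOCK_64X_HZ // divisor
```

- In both cases the result is thirty-two times the symbol rate: 51200 Hz for 1600 symbols per second and 102400 Hz for 3200 symbols per second.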
- the symbol samples are coupled to an input of a data edge detector (not shown) which generates an output pulse when a symbol edge is detected.
- the sample clock is also coupled to an input of a divide-by-16/32 circuit (not shown) which is utilized to generate 1X and 2X symbol clocks synchronized to the recovered data signal.
- the divide-by-16/32 circuit is preferably an up/down counter.
- a pulse is generated which is gated by an AND gate with the current count of the divide-by-16/32 circuit. Concurrently, a pulse is generated by the data edge detector which is also coupled to an input of the divide-by-16/32 circuit.
- the output generated by the AND gate causes the count of divide-by-16/32 circuit to be advanced by one count in response to the pulse which is coupled to the input of divide-by-16/32 circuit from the data edge detector, and when the pulse coupled to the input of the AND gate arrives after the generation of a count of thirty-two by the divide-by-16/32 circuit, the output generated by the AND gate causes the count of divide-by-16/32 circuit to be retarded by one count in response to the pulse which is coupled to the input of divide-by-16/32 circuit from the data edge detector, thereby enabling the synchronization of the 1X and 2X symbol clocks with the recovered data signal.
- the symbol clock rates generated are best understood from Table 3 below.
- the 1X and 2X symbol clocks are generated 1600, 3200 and
- the 4-level to binary converter 814 couples the 1X symbol clock to a first clock input of a clock rate selector (not shown).
- a 2X symbol clock is coupled to a second clock input of the clock rate selector.
- the symbol output signals (MSB, LSB) are coupled to inputs of an input data selector (not shown).
- a selector signal (2L/4L) is coupled to a selector input of the clock rate selector and the selector input of the input data selector, and provides control of the conversion of the symbol output signals as either 2-level FSK data, or 4-level FSK data.
- when 2-level FSK data conversion (2L) is selected, only the MSB output is selected, which is coupled to the input of a conventional parallel to serial converter (not shown).
- the 1X clock input is selected by the clock rate selector, which results in a single bit binary data stream being generated at the output of the parallel to serial converter.
- when the 4-level FSK data conversion (4L) is selected, both the LSB and MSB outputs are selected and coupled to the inputs of the parallel to serial converter.
- the 2X clock input is selected by the clock rate selector, which results in a serial two bit binary data stream being generated at 2X the symbol rate, which is provided at the output of the parallel to serial converter.
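- The 2L/4L selection can be modeled as a small parallel to serial conversion. This is a sketch only: the function name is hypothetical, and emitting the MSB before the LSB in 4-level mode is an assumption, since the text does not state the bit order.

```python
def symbols_to_bits(symbols, four_level):
    """Serialize decoded (MSB, LSB) symbol pairs into a bit list.

    2-level mode keeps only the MSB (one bit per symbol, 1X clock).
    4-level mode keeps both bits (two bits per symbol, 2X clock).
    """
    bits = []
    for msb, lsb in symbols:
        bits.append(msb)
        if four_level:
            # Assumption: MSB is shifted out first, then LSB.
            bits.append(lsb)
    return bits
```

- For the same symbol sequence, the 4-level output is twice as long as the 2-level output, matching the 2X clock rate described above.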
- the serial binary data stream generated by the 4-level to binary converter 814 is coupled to inputs of a synchronization codeword correlator 818 and a demultiplexer 820.
- Predetermined "A" codeword synchronization patterns are recovered by the controller 816 from a code memory 822 and are coupled to an "A" codeword correlator (not shown).
- an "A" or "A-bar" output is generated and is coupled to controller 816.
- the particular "A" or "A-bar" codeword synchronization pattern correlated provides frame synchronization to the start of the frame ID codeword, and also defines the data bit rate of the message to follow, as was previously described.
- the serial binary data stream is also coupled to an input of the frame codeword decoder (not shown), which decodes the frame codeword and provides the controller 816 with an indication of the frame number currently being received.
- power is supplied to the receiver portion by battery saver circuit 848, shown in FIG. 8, which enabled the reception of the "A" synchronization codeword, as described above, and which continues to be supplied to enable processing of the remainder of the synchronization code.
- the controller 816 compares the frame number currently being received with a list of assigned frame numbers stored in code memory 822. Should the currently received frame number differ from the assigned frame numbers, the controller 816 generates a battery saving signal which is coupled to an input of battery saver circuit 848, suspending the supply of power to the receiver portion. The supply of power is suspended until the next frame assigned to the receiver, at which time the controller 816 generates a battery saver signal which is coupled to the battery saver circuit 848 to restore power to the receiver portion and enable reception of the assigned frame.
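- The frame comparison that drives the battery saver can be sketched as follows. Both function names are hypothetical, and the 128-frame cycle is taken from the frame numbering described earlier in the text; the actual timing of the power switching is not modeled.

```python
def receiver_power_on(current_frame, assigned_frames):
    """Keep the receiver powered only during an assigned frame."""
    return current_frame in assigned_frames


def frames_until_next(current_frame, assigned_frames, num_frames=128):
    """Count frames from the current one to the next assigned frame.

    Frame numbers wrap around after num_frames - 1. If the current frame
    is the only assigned frame, the answer is one full cycle.
    """
    return min((f - current_frame - 1) % num_frames + 1 for f in assigned_frames)
```

- A unit assigned to a single frame out of 128 can therefore leave its receiver unpowered for most of each cycle, which is the battery life benefit the text attributes to a larger number of frames.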
- a predetermined "C" codeword synchronization pattern is recovered by the controller 816 from a code memory 822 and is coupled to a "C" codeword correlator (not shown).
- a "C" or "C-bar" output is generated which is coupled to controller 816.
- the particular "C" or "C-bar" synchronization codeword correlated provides "fine" frame synchronization to the start of the data portion of the frame.
- the start of the actual data portion is established by the controller 816 generating a block start signal (Blk Start) which is coupled to inputs of a codeword de-interleaver 824 and a data recovery timing circuit 826.
- a control signal (2L/4L) is coupled to an input of clock rate selector (not shown) which selects either 1X or 2X symbol clock inputs.
- the selected symbol clock is coupled to the input of a phase generator (not shown) which is preferably a clocked ring counter which is clocked to generate four phase output signals (Ø1-Ø4).
- a block start signal is also coupled to an input of the phase generator, and is used to hold the ring counter in a predetermined phase until the actual decoding of the message information is to begin. When the block start signal releases the phase generator, it begins generating clocked phase signals which are synchronized with the incoming message symbols.
- The clocked phase signal outputs are then coupled to inputs of a phase selector 828.
- the controller 816 recovers from the code memory 822 the transmission phase number to which the messaging unit is assigned. The phase number is transferred to the phase select output (Ø Select) of the controller 816 and is coupled to an input of phase selector 828.
- a phase clock, corresponding to the transmission phase assigned, is provided at the output of the phase selector 828 and is coupled to clock inputs of the demultiplexer 820, block de-interleaver 824, and address and data decoders 830 and 832, respectively.
- the demultiplexer 820 is used to select the binary bits associated with the assigned transmission phase which are then coupled to the input of block de-interleaver 824, and clocked into the de-interleaver array on each corresponding phase clock.
- the de-interleaver uses an 8 x 32 bit array which de-interleaves eight 32 bit interleaved address, control or message codewords, corresponding to one transmitted information block.
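- The 8 x 32 bit array can be illustrated with a round-trip sketch. The function names are hypothetical, and the transmission order (bit 0 of all eight codewords, then bit 1 of all eight, and so on) is an assumption; the text only states that eight 32 bit codewords are interleaved per block.

```python
WORDS = 8   # codewords per information block
BITS = 32   # bits per codeword


def interleave(codewords):
    """Flatten eight 32-bit codewords (lists of bits) column by column."""
    return [codewords[w][b] for b in range(BITS) for w in range(WORDS)]


def deinterleave(stream):
    """Rebuild the eight codewords from a 256-bit interleaved block."""
    if len(stream) != WORDS * BITS:
        raise ValueError("expected one 256-bit information block")
    return [[stream[b * WORDS + w] for b in range(BITS)] for w in range(WORDS)]
```

- Under this assumed ordering, a burst of eight consecutive channel errors touches each codeword only once, which is what lets a per-codeword error correcting code cope with the burst.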
- the de-interleaved address codewords are coupled to the input of address correlator 830.
- the controller 816 recovers the address patterns assigned to the messaging unit, and couples the patterns to a second input of the address correlator.
- the message information and corresponding information associated with the address is then decoded by the data decoder 832 and stored in a message memory 850.
- the message information is coupled to the input of data decoder 832 which decodes the encoded message information into preferably a BCD or ASCII format suitable for storage and subsequent display.
- the software based signal processor may be replaced with a hardware equivalent signal processor that recovers the address patterns assigned to the messaging unit, and the message related information.
- the message information and corresponding information associated with the address may be stored directly in the message memory 850. Operation in this manner allows later decoding of the actual message information, e.g., that encoded message information that decodes into a BCD, ASCII, or multimedia format suitable for subsequent presentation.
- the memory in performing direct storage, the memory must be structured in a manner that allows efficient, high speed placement of the message information and corresponding information associated with the address.
- a codeword identifier 852 examines the received codeword to assign a type identifier to the codeword in response to the codeword belonging to one of a set comprising a vector field and a set comprising a message field. After determining the type identifier, a memory controller 854 operates to store the type identifier in a second memory region within the memory corresponding with the codeword.
- a sensible alert signal is generated by the controller 816.
- the sensible alert signal is preferably an audible alert signal, although it will be appreciated that other sensible alert signals, such as tactile alert signals, and visual alert signals can be generated as well.
- the audible alert signal is coupled by the controller 816 to an alert driver 834 which is used to drive an audible alerting device, such as a speaker or a transducer 836. The user can override the alert signal generation through the use of user input controls 838 in a manner well known in the art.
- the stored message information can be recalled by the user using the user input controls 838 whereupon the controller 816 recovers the message information from memory, and provides the message information to a display driver 840 for presentation on a display 842, such as an LCD display.
- Referring to FIG. 9, a diagram shows a selective call messaging system 900 capable of communicating compressed messages in accordance with the present invention.
- the paging terminal 102 or wireless selective call signaling system controller receives information comprising a selective call message request including original text and a destination identifier.
- the information is typically coupled to the paging terminal 102 via a Public Switched Telephone Network (PSTN) 912 which serves to transport the information from an originator 914.
- the PSTN 912 may be coupled to the paging terminal 102 and the originator 914 using conventional phone lines 910 or possibly a high speed digital network, depending on the information bandwidth required for communicating messages between the originator 914 and a plurality of messaging units 906.
- the information is compressed and formatted as one or more selective call messages that are transferred 922 to at least one radio frequency transmitter 904 for broadcast to at least one messaging unit 906 located in any one of a number of communication zones 902.
- the messaging unit 906 operates to decompress the compressed message and store the decompressed message in message memory. Two-way capability may be provided for the messaging unit 906 using either a wired or a wireless return path.
- the message is received by the messaging unit 906 which decompresses a content of the message. This message content is then stored by the messaging unit 906 pending presentation.
- the messaging system allows wireless return or origination of messages received by distributed receiver sites 908.
- each messaging unit 906 is kept to a minimum, yielding a more ergonomic portable device with the value added function of not requiring a physical connection to a wired network when effecting messaging.
- the secure messaging system is adapted to accommodate lower power messaging unit 906 devices that might include additional means for implementing the return or origination of messages using a reverse or inbound channel that is accessed at specific points in the network.
- the lower power messaging unit 906 could include an infrared or laser optical port, low power proximate magnetic inductive or electric capacitive port, or possibly an ultrasonic or audio band acoustic transducer port, all of which could couple signals between the lower power messaging unit 906 and a distributed communication device or the like.
- this short-range localized type of transmission might be desirable because of privacy or security concerns.
- the illustration shows a high level block diagram of a messaging unit 906 in accordance with the preferred embodiment of the present invention.
- the preferred embodiment of the messaging unit 906 is a conventional paging device as shown in FIG. 10 modified to include a message decompression engine 1014 and associated memory 1016 containing compression token table(s) and other data elements necessary for message decompression.
- the electronics required to implement the decompression engine 1014 may be integrated with the paging device.
- the message decompression engine may be implemented as an application in software or firmware that is executed using the central processing unit (CPU) 1006, random access memory (RAM) 1008 and read-only memory (ROM) 1010.
- the incoming signal is captured by the antenna 802 coupled to the receiver 804 which detects and demodulates the signal, recovering any information as previously discussed with reference to FIG. 8.
- the messaging unit 906 may contain a low power reverse channel transmitter 1034, power switch 1032, and transmit antenna 1030 for either responding to an outbound channel query or generating an inbound channel request.
- the portable transmitter 1034 e.g., a low power radio frequency device
- the alternative transmission block 1036 may additionally contain either uni- or bi-directional communication transducers. Examples of such transducers are optical devices like lasers or light emitting diodes (LED), extremely low power magnetic field inductive or electric field capacitive structures (e.g., coils, transmission lines), or possibly acoustic transducers in the audio or ultrasonic range.
- An optional input/output (I/O) switch 1002 serves to direct the incoming or outgoing radio frequency (RF) energy between the RF receiver 804, RF transmitter 1034 and a selective call decoder 1004.
- the selective call decoder 1004 comprises the CPU 1006, and its associated RAM 1008, ROM 1010, and universal input/output (I/O) module 1012.
- the primary function of the selective call decoder 1004 is to detect and decode information contained in signaling intended for receipt by the messaging unit 906.
- the selective call decoder 1004 can function as an encoder to generate and deliver requests or messages to the originator 914, a user, or other on-line system (not shown).
- the components within the selective call decoder 1004 may be used to implement the message compression/decompression engine.
- the messaging unit 906 can further operate as a message generator to create an outgoing selective call message and its corresponding system transmission request.
- An optional message entry device 1018 allows a user to initiate a selective call message or the like. Typically, a user might enter a request using a keyboard, a voice activated recognition device, a touch-sensitive device (e.g., screen or pad), or other convenient data entry device.
- a portable transmitter 1034 coupled to the message generator operates to broadcast the selective call message request for receipt by one of the distributed receiver sites 908 and coupling to a paging terminal for decoding, forwarding, or any other function requested by the originator.
- a user may return a message in response to prior messages stored in the messaging unit 906 or information recently communicated with the messaging unit 906.
- the messaging unit 906 can initiate messaging transactions without requiring a physical connection to a land-line hard wired network or PSTN.
- the invention preferably operates using the Motorola
- the block diagram illustrates message composition and compression equipment that could be used to send compressed messages to subscriber messaging units via a paging channel or the like.
- both direct branch and customer calls are received by a selective call processor 1100 comprising a message processing computer 1102, a message compression computer 1104, a subscriber database 1106, and a compression token database 1108.
- the message processing computer 1102 receives messaging requests and communicates with the message compression computer 1104 to generate and compress the original text (numeric or alphanumeric) message based on information (e.g., tokens) contained in the compression token database 1108.
- the message processing computer 1102 also determines a destination identifier from information contained in the subscriber database 1106, which allows a selective call message distributor (an intrinsic function of the selective call processor) to communicate the destination identifier, typically as a selective call address, and its corresponding compressed message, to a selective call transmission service 904.
- the destination identifier may correspond with a conventional paging or selective call address, a cellular telephone address, or any other address that uniquely identifies a destination associated with the compressed message.
- the subscriber database 1106 may include a compression version identifier that corresponds with a specific destination identifier.
- the purpose of the compression version identifier is to allow each messaging unit to operate with either the same compression and decompression procedure, or with compression and decompression procedures that are specifically adapted to the text, e.g., characters or language, being communicated between messaging units or alternate devices of like capability.
- the compression version identifier would indicate the version of the compression and decompression procedure used to compress and/or decompress the English text communicated with the messaging unit.
- the compression version identifier would indicate the version of the compression and decompression procedure used to compress and/or decompress the Mexican Spanish text communicated with the messaging unit.
- the message composition and compression equipment illustrated in FIGs. 11 and 12 would typically be used on the premises of a radio common carrier or other messaging service provider to send compressed messages to messaging units 906 via a conventional paging channel or the like.
- the message entry, receipt, capture, generation, and compression may be distributed in any number of ways.
- a software application program executing on a conventional personal computer (PC) may implement the data entry function using an associated keyboard or other data entry device.
- the same PC may have access to a local or remote subscriber database with selective call address information.
- the software application program may include the procedures and associated token tables needed to compress the text associated with the selective call message.
- the message or messages delivered to a paging terminal may already be pre-compressed and addressed, so that the paging terminal need only forward the compressed messages to the particular messaging units.
- the illustration shows a functional diagram of a wireless selective call signaling system controller that implements a combined one-way and two-way compressed messaging system capable of signaling the messaging units.
- the wireless selective call signaling system controller comprises the paging terminal 102 along with a transmitter 104 and associated antenna 904, and in two-way RF systems, at least one receiver 1202 system comprising a received signal processor 1204 and at least one receive antenna 908. Preferably, several of at least one receiver 1202 systems may be distributed over a wide geographical area to receive the low power transmissions broadcast by two-way messaging units 906. The number of receiver 1202 systems in any given geographical area is selected to insure adequate coverage for all inbound transmissions. As one of ordinary skill in the art will appreciate, this number may vary greatly depending on terrain, buildings, foliage, and other environmental factors.
- the wireless selective call signaling system controller represents a closely coupled implementation of the overall compressed messaging system.
- an originator is not the party responsible for maintaining the RF infrastructure, i.e., the transmitter 104 and associated antenna 904, and the at least one receiver 1202 system. Consequently, a conventional wireless messaging service provider or radio common carrier would provide and maintain the RF infrastructure, and the originator would utilize that RF infrastructure in a conventional manner to communicate compressed messages between the originator and the messaging units 906.
- the selective call signaling system controller may operate to compress, encode, and transmit compressed messages received from an originator, where the selective call processor 1100 has generated the compressed message. Subsequently, the messaging unit 906 decodes and decompresses the compressed message, revealing the original text message. Similarly, the selective call signaling system controller may receive messages originating from the messaging unit 906, decode and decompress the message, then pass the decompressed message to the originator as a reply to an original message sent to the messaging unit 906.
- the selective call signaling system controller may operate to encode and transmit compressed messages communicated between the originator and the messaging unit 906.
- the originator may pre-compress the original text message and forward it along with a destination identifier to the selective call processor 1100.
- the selective call signaling system controller then operates to associate a selective call address with the pre-compressed message based on a received destination identifier, and transmit a resulting compressed selective call message for receipt by the messaging unit 906.
- the messaging unit 906 decodes and decompresses the selective call message, revealing the original text message.
- the selective call signaling system controller may operate to receive messages originating from the messaging unit 906 and pass the received message in its compressed or decompressed state to the originator as a reply to an original message sent to the messaging unit 906.
- the messaging unit 906 may operate to originate a compressed or uncompressed message targeted for either another messaging unit or any device capable of processing the message for presentation.
- the preferred embodiment of the present invention implements a text compression procedure that is described in detail in the following text. Since most messages in conventional selective call messaging systems are relatively short, e.g., from tens of characters to several hundred characters, a static token table design has been selected that gives good efficiency in both coding size and execution (compression/decompression) speed. Several test cases were performed on statistical data sets using the instant procedure, yielding compression ratios on English and Spanish test data sets of 1.534 and 1.878, respectively. These compression ratios correspond with throughput increases of at least fifty percent for English text messaging and at least eighty percent for Spanish text messaging. To maintain flexibility in the compression system, the defined data types are re-definable, as is the data switch. To further insure compatibility, an application version number may be specified in the message control segment of a compressed message.
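The relation between the quoted figures and the throughput claims can be checked with a few lines of arithmetic. The sketch assumes the figures are ratios of original size to compressed size, so a channel of fixed capacity carries that many times more text; the helper name `throughput_gain_percent` is hypothetical.

```python
def throughput_gain_percent(compression_ratio):
    """Extra text carried on a fixed-capacity channel, as a percentage.

    Assumes compression_ratio = original size / compressed size.
    """
    return (compression_ratio - 1.0) * 100.0


english_gain = throughput_gain_percent(1.534)  # about 53.4 percent
spanish_gain = throughput_gain_percent(1.878)  # about 87.8 percent
```

Both values clear the "at least fifty percent" and "at least eighty percent" thresholds stated in the text.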
- Length: type 4 or 5; Coding: 4 bits fixed; Table size: 16; Leading space: none
- <mini-4> token table: There are preferably sixteen elements in the <mini-4> token table.
- a 4-bit length field is defined for the <mini-4> covering lengths of one to sixteen.
- another bit is added to specify the continuity of the mini-tokens.
- the length field is '1111' with a continuity bit of '0'. If the length of the <mini-4> string is longer than 16, the continuity bit is set to '1' and another length field (4 or 5 bits) is expected after the 16 <mini-4> tokens are encoded.
- the preferred <mini-4> token table is shown in Table 10.
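A run of <mini-4> tokens with its length and continuity bits might be laid out as in the sketch below. Several details are assumptions rather than statements from the text: the 4-bit length field is taken to hold the run length minus one, the continuity bit is taken to be present only when the length field is '1111', and each group of tokens is taken to follow its own length field. The function name `encode_mini4_run` is hypothetical.

```python
def encode_mini4_run(token_indices):
    """Return a bit string for a run of <mini-4> table indices (each 0..15)."""
    if not token_indices:
        raise ValueError("a <mini-4> run needs at least one token")
    if any(not 0 <= t < 16 for t in token_indices):
        raise ValueError("<mini-4> indices must fit in 4 bits")
    bits = []
    remaining = list(token_indices)
    while remaining:
        chunk, remaining = remaining[:16], remaining[16:]
        bits.append(format(len(chunk) - 1, "04b"))    # assumed: field = length - 1
        if len(chunk) == 16:
            bits.append("1" if remaining else "0")    # continuity bit after '1111'
        bits.extend(format(t, "04b") for t in chunk)  # 4 bits per token
    return "".join(bits)
```

With this layout a run of sixteen tokens occupies 4 + 1 + 64 = 69 bits, and a seventeenth token adds a fresh 4-bit length field plus its own 4 bits.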
- a <mini-5> token is defined as follows:
- <mini-5> token table: There are preferably thirty-two elements in the <mini-5> token table. Up to two bits are used for specifying the length of this data type. One bit is used for the length field. A '0' means a length of one and a '1' means a length of two. If the length field is '0', no second bit is needed. If the bit is '1', a continuation bit is required ('0' means finish and '1' means continue). Two bits are sufficient because most of the consecutive <mini-5>s are short. The most significant bit (MSB) of <mini-5> is the leading space indicator; therefore, six bits are used for <mini-5> tokens as indicated in the preceding definition.
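The one- or two-bit length prefix for <mini-5> runs can be illustrated with a short sketch. It assumes that '0' introduces one token, '10' introduces two tokens and ends the run, and '11' introduces two tokens followed by another length prefix; it also assumes that a leading-space bit of '1' means "space present". The function name `encode_mini5_run` and the tuple layout are hypothetical.

```python
def encode_mini5_run(tokens):
    """tokens: list of (leading_space, index) pairs with index in 0..31."""
    if not tokens:
        raise ValueError("a <mini-5> run needs at least one token")
    bits = []
    remaining = list(tokens)
    while remaining:
        if len(remaining) == 1:
            chunk, remaining = remaining, []
            bits.append("0")                              # length of one
        else:
            chunk, remaining = remaining[:2], remaining[2:]
            bits.append("1" + ("1" if remaining else "0"))  # length two + continuation
        for leading_space, index in chunk:
            # 6 bits per token: leading-space MSB followed by a 5-bit table index
            bits.append(("1" if leading_space else "0") + format(index, "05b"))
    return "".join(bits)
```

A single token therefore costs 1 + 6 = 7 bits and a pair costs 2 + 12 = 14 bits under these assumptions.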
- the preferred <mini-5> token table is shown in Table 11. Table 5 shows a hypothetical <mini-5> token table for English as follows:
- the compression procedure is designed to support multiple versions of compression and decompression.
- two types of Spanish language messaging units may be defined.
- Type A messaging units would display English characters while Type B messaging units would display Spanish characters. This variation is necessary because paging terminals in different countries, such as the United States and Spain, typically use different character sets.
- a token table is implemented that includes tokens comprising all ASCII characters except upper case and the most frequently used phrases, words and fragments.
- the selected token table size is 542 elements with each element having a fixed length of 10 bits.
- the most significant bit (except for command tokens) is defined as a leading space indicator. The following relation describes the general structure and contents of the token table.
- Token table command tokens
- Length: arbitrary (0); Coding: 10 bit fixed; Table size: 542; Leading space: partial; the <Token Table> (<T>) is the default data type.
- command tokens consisting of capital control tokens and data type switch tokens. This number may be increased or decreased depending on the size of the data set being considered and the compression desired.
- address fields 0 to 29 and 512 to 541 are reserved for command tokens.
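One way to read the 10-bit token layout is sketched below. It assumes that a non-command token's low nine bits select the table entry while its most significant bit flags a leading space, and that the reserved addresses 0 to 29 and 512 to 541 carry no leading-space meaning. The function name `classify_token` and the returned tuple shape are hypothetical.

```python
COMMAND_RANGES = (range(0, 30), range(512, 542))  # reserved addresses per the text


def classify_token(value):
    """Split a 10-bit token value into (kind, index, leading_space)."""
    if not 0 <= value < 1024:
        raise ValueError("token must fit in 10 bits")
    if any(value in r for r in COMMAND_RANGES):
        return ("command", value, False)   # MSB is not a space flag here
    leading_space = bool(value >> 9)       # most significant of the 10 bits
    index = value & 0x1FF                  # assumed: low 9 bits select the entry
    return ("text", index, leading_space)
```

Under this reading, the 60 reserved addresses plus the 482 entries at indices 30 to 511 account for the stated table size of 542.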
- Capital control tokens are used to control the capitalization of characters, words, or phrases.
- <all cap> indicates that all following alphabetical characters in a string are to be capitalized. This token has no effect on characters such as numerals, '0' to '9', and symbols, ',', '$', etc.
- <all cap> is the default capitalization token, i.e., if no capital control token is shown in the encoded message, the decoded message will be all capitalized, or the decoded message will be capitalized until one of the other capital control tokens is encountered.
- <cap first> indicates that the first alphabetical character encountered is capitalized and the following characters are in lower case.
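The effect of the two capital control tokens described above can be sketched on the decoder side. The function name `apply_capital_control` and the string mode labels are hypothetical; only the <all cap> and <cap first> behaviors stated in the text are modeled.

```python
def apply_capital_control(mode, text):
    """Apply a capital control mode to decoded text."""
    if mode == "all cap":
        # Numerals and symbols are unaffected by upper-casing.
        return text.upper()
    if mode == "cap first":
        out, capitalized = [], False
        for ch in text:
            if ch.isalpha() and not capitalized:
                out.append(ch.upper())   # first alphabetical character only
                capitalized = True
            else:
                out.append(ch.lower())
        return "".join(out)
    raise ValueError("unknown capital control mode: " + mode)
```

For example, `apply_capital_control("cap first", " please call")` skips the leading space and capitalizes only the 'p'.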
- <cc> <mini-4> <capital control> (<cc>) and <character string> (<cs>).
- the hypothetical coding of ⁇ cc> is defined as:
- the preferred capitalization control token table is shown in Table 13.
- a set of data type switch control tokens is defined in the token table. Since the token table serves as the default data type in the compression process, only the data type switches among <mini-4>, <mini-5>, <capital control> and <character string> are included in the data type switch control token set.
- data type switch tokens where:
- Length field 0 coding: 10 bit fixed
- the preferred command tokens are listed in the command token Table 13, Table 20 and Table 21.
- the phrase and fragment token table contains a number of language phrases and fragments depending on the statistics of the information being compressed.
- the phrase and fragment token table comprises 406 phrase and fragment tokens and 76 ASCII tokens in the token table.
- the phrase and fragment token table is as follows.
- an efficient compression (encoding) process is accomplished as follows. First, long messages are fragmented to improve the performance of the encoding process. According to the encoding process, a fragment is generated before a <mini-4> token or when a <capital control> token is encountered. <cc> and <mini-4> tokens are selected to fragment a message because (1) they occur frequently, (2) there are a very limited number of data types to represent the elements in these tables, and (3) generally, they are the most efficient representation of the information. This inherently yields an efficient code which results in optimal compression of the text data.
- there is only one way to represent all the elements in the <cc> table (that is, <cc>), but there are up to three ways to represent an element in the <mini-4> table (<mini-4>, <cs> and <T>).
- a fragment can be generated before a token of a special character not covered by any of the <character string>, <mini-5> or <mini-4> (e.g., '!') or before any token in the token table if none of the above mentioned criteria is satisfied.
- a shortest path procedure was designed for the encoding process.
- the major difference between the instant shortest path procedure and a conventional shortest path procedure is that the present procedure is based on a context-dependent cost matrix while the prior art is based on a context-independent cost matrix.
- the cost (number of bits used) is dependent on the history of data type switches and the length of each individual data type. For example, if the current pending encoding data type is <mini-4> and the previous data type is <T>, ten bits are needed for the data type switch. On the other hand, if this is the second data type in the data type switching chain beginning from a <T> at the second previous node and its previous data type is <cs>, and there is a data type switch token in the command token table with value 'c4', the incremental cost for the current data type switch is zero (no extra bit is needed for introducing another data type switch <mini-4> after <cs> at this particular node).
- the cost within a particular data type is also a function of the length of the data type. For example, if the current data is <mini-4> and it is the third consecutive token of the same data type, the incremental cost of encoding this data is 4 bits. On the other hand, if every element is the same except that this is the 16th consecutive <mini-4> data token, the cost will be 5 instead of 4. The extra bit is for the continuation bit in the length field.
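The length-dependent part of the cost can be made concrete with a small function. It reproduces the two cases given above (4 bits for the third consecutive token, 5 bits for the 16th) and the ten-bit switch from <T>; the treatment of the 17th token, which is assumed to open a new 4-bit length field, is an extrapolation. The name `mini4_incremental_cost` and the constants are hypothetical.

```python
SWITCH_FROM_T_BITS = 10  # ten-bit data type switch token
LENGTH_FIELD_BITS = 4    # 4-bit <mini-4> length field
TOKEN_BITS = 4           # each <mini-4> token


def mini4_incremental_cost(n, switched_from_t=False):
    """Bits added by the n-th consecutive <mini-4> token (n counts from 1)."""
    cost = TOKEN_BITS
    if (n - 1) % 16 == 0:
        cost += LENGTH_FIELD_BITS     # assumed: each group of 16 opens a length field
    if n % 16 == 0:
        cost += 1                     # continuation bit on the 16th token of a group
    if n == 1 and switched_from_t:
        cost += SWITCH_FROM_T_BITS    # switch out of the default <T> data type
    return cost
```

Because the cost of a token depends on its position in the run and on how the run was entered, a single fixed edge weight per token cannot represent it, which is why the procedure uses a context-dependent cost matrix.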
- <T> serves as the default data type
- data types involved in the data type switch tokens. They are <mini-4>, <mini-5>, <cs> (character string) and <cc> (capital control) tokens.
- L is assigned a value of '0' when the data type is <T>. Whenever the data type is switched out from <T>, L is assigned a value of '1'. So long as the data type is not changed, the value of L is not changed. If a new data type switch is added, the value of L of the current node is incremented; the exception is a switch back to <T>, which resets L to '0'.
- the parameter L is the L value of the current node.
- the incremental cost function C_i( ) is defined based on a simulated result.
- C_i( ) is an estimation of the incremental data type switching cost.
- from C_i(L) the accumulative data type switching cost C_a( ) is estimated.
- the S at a node with data type <T> is then defined as empty.
- the S should be updated whenever a new data type switch occurs. If the ending data type is any data type other than <T>, the S of the ending data type node should be concatenated to the S of its previous node with the ending data type.
- the <capital control> ('0' for short) code is then attached after the <mini-5> code ('5' for short) in the associated S of the current node, i.e., '50'.
- (S) is defined as a measurement of the cost (number of bits) of encoding data type switching sequence S.
- the partial string '45054' costs 20 bits while the full string '45054c' costs 10 bits.
- an 'S prefix' concept is introduced.
- the 'S prefix' sequence satisfies the following three conditions:
- the string '4c4' is the prefix sequence satisfying all the conditions mentioned above. If no such sequence exists for a given S, the prefix string is defined as empty with length zero.
- parameter L_pre is defined as the maximum length of the prefixes associated with a given S of length L. Accordingly, the estimated switching cost of S with length L is defined as:
- C(S) C t (S).
- C(S) is cumulative and is based on the statistics of the data type switching cost of the system.
- C(S) will have a low cost if it could be encoded by small number of data type switch tokens or it has a long prefix sequence. Therefore C(S) is a desirable data type switching cost estimator.
- the parameters L, the string S and C(S) are updated for all the non- ⁇ T> stages in all nodes. This in turn contributes to the cost calculation of each node.
- the stage (data type) with the minimum cost value is selected.
- the first 10 is the ten-bit data type switch from <T> and the next four bits are the length field of <mini-4>.
- the last four bits are the <mini-4> token itself.
- the length for the <mini-4> sequence may be longer than 15. Thus more than 4 bits may be needed for the length fields.
- the additional bits for the length fields are considered in the cost of C_44.
- the first ten bits are for the data type switching.
- the next two bits are for the length field.
- the last six bits are for the character itself.
- the overhead (2 bits for the length field) of the switching from <T> to <character string> is distributed over several characters, say two characters in this case.
- Special attention is paid when there is only one character in the <character string> and when the data type switches to another data type.
- C_44 is a function of n, which is the sequence number of the current <mini-4> token in the <mini-4> sequence.
- the cost of C_44 is calculated first and then the value of n is updated.
- C_45 = 1 + 6 + C(S_4 + '5'); one bit is used for the length field for <mini-5> and six bits are used for the <mini-5> token.
- the data type switching cost C(S) is also considered. Due to the fact that C_45 is incremental in nature while C(S) is accumulative, special attention is paid not to duplicate the C(S) in the cost estimation.
- C_5T = 10.
- the cost of C_cc is then defined as:
- l (equivalent to n_c) is the length of the character string before the current updating. Initially it is zero. Whenever a switch is made to <cs> from any of the <mini-4>, <cc>, <mini-5> or <T>, l is assigned a value of one. Any time a new character is added into the character string, l is incremented by one modulo 4.
- an exemplary state diagram shows the stages for a string of length n.
- For a length n string there are n + 2 stages in its state diagram.
- There is a link connecting two different stages in the state diagram if and only if there is a token in any of the above defined token tables which is identical to the string denoted by these two stages in the diagram.
- the link connecting stage '0' and '11' in FIG. 13 is an indication that there is a token of "please call".
- each character in the message may have two ways to represent it. One is <cs>, another is <T>. This corresponds to having two links between any adjacent stages in the diagram. Note that only one link is shown in FIG. 13.
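The stage-and-link structure can be illustrated with a simplified shortest-path search. This sketch is deliberately the conventional, context-independent variant: every link has a fixed cost, whereas the procedure described here uses costs that depend on the data type history. It also uses only the positions 0 to n as stages and omits the extra ending stage. The function name `shortest_encoding`, the token table, and the bit costs are all hypothetical.

```python
def shortest_encoding(text, token_costs, char_cost):
    """Return (total_bits, pieces) for the cheapest tokenization of text.

    token_costs maps a token string to a fixed cost in bits; any single
    character may also be sent on its own for char_cost bits.
    """
    n = len(text)
    best = [float("inf")] * (n + 1)   # best[i] = cheapest cost to reach stage i
    back = [None] * (n + 1)           # back[i] = previous stage on that path
    best[0] = 0
    for end in range(1, n + 1):
        for start in range(end):
            piece = text[start:end]
            if piece in token_costs:
                cost = token_costs[piece]     # link backed by a table token
            elif len(piece) == 1:
                cost = char_cost              # link between adjacent stages
            else:
                continue                      # no link between these stages
            if best[start] + cost < best[end]:
                best[end] = best[start] + cost
                back[end] = start
    pieces, pos = [], n
    while pos > 0:                            # walk the back-pointers
        pieces.append(text[back[pos]:pos])
        pos = back[pos]
    return best[n], pieces[::-1]
```

With a hypothetical table containing "please call", the link from stage 0 to stage 11 wins outright; without it, the search falls back to shorter tokens and single characters.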
- the illustration shows a state table depicting a typical parsing and compression operation of the compression engine to resolve the shortest path for optimal text compression in accordance with the preferred embodiment of the present invention.
- the shortest path is identified with the help of the shortest path spread sheet.
- An example of the spread sheet associated with the state diagram of FIG. 13 is shown in FIG. 14.
- the first column of the spread sheet is the stage number. Each printable character has a unique stage number.
- the second column is the previous stage number of an associated link. There is a link for each character string which could be represented by a token in any of the token tables. There is a node for each of the links in the stage diagram.
- the third column is the character string represented by the link specified by the first two columns.
- the next three variables S_4, S_5, and S_c are the data type switch sequences associated with <mini-4>, <mini-5> and <cs> of a node, respectively.
- the data type switch sequence S has been defined before.
- n_4 is the parameter n in the definition of C_44 and n_c is the parameter l in the definition of C_cc.
- the ⁇ 4 is the node minimum cost when the associated character string is treated as a <mini-4>. The node minimum cost is calculated by considering all the possible data types of its previous stage. Mathematically it is calculated as follows
- the C_ij's are defined above. The following three fields are calculated in a similar way.
- the previous stage of the node is stage zero. If there are multiple links ending at a stage, there are multiple nodes under the stage. For example, there are two links ending at stage two, therefore there are two nodes under stage two, one for each link. Define m as the link index of a stage.
- the stage minimum cost Sj (4, 5, T, c) is defined as the minimum of all the associated node minimum costs of that stage. This is mathematically represented as follows.
- stage minimum cost T 10 for the stage 6 in FIG. 14, that is the minimum of the three ⁇ T's with value 40, 30 and 10.
- the field n_c for the character links is updated in the spread sheet.
- the segment ending data type is selected at the end of the segment.
- the segment ending data type is selected as a <mini-5> in the example shown in FIG. 14.
- the selection of the segment ending data type depends not only on cost evaluation within its segment but also on the initial stage data type of its following segment.
- the data type of the ending stage of a segment is actually the data type of the initial stage of the following segment.
- Data type <T> is assumed in the example as the initial stage data type of the following segment. Therefore, <T> is assigned as the unique data type of the ending stage 12.
- the <mini-5> is selected as the segment ending stage data type only when the following relation is evaluated:
- the encoding can be easily identified in this example to be a <mini-5> token "please call".
- the segment ending data type selection will follow the same rule as that specified in this section for other ending stage data types.
- the ending stage data type assignment becomes a don't care if the associated fragment is the last one of a message.
- an exemplary state diagram shows a typical initial data type assignment in accordance with the preferred embodiment of the present invention.
- the segment ending data type of the fragment serves as the initial stage data type of next fragment.
- the default data type (e.g., <T>) serves as the initial stage data type as shown in FIG. 14.
- the data type switch patterns among <mini-4>, <mini-5>, <cs> are collected as ten-bit tokens to improve the compression rate. For example, ten bits are used for a data type switch token '4545c'. That is equivalent to a sequence of five data type switch tokens, namely, '4', '5', '4', '5' and 'c'. Thus forty bits are saved by using the data type switch token in the example.
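The saving from combined switch tokens can be checked with a short cost function. It assumes a greedy longest-match over the switch sequence, which is a simplification chosen for illustration and not something the text specifies; the name `switch_cost_bits` and the token set are hypothetical.

```python
def switch_cost_bits(sequence, combined_tokens, bits_per_token=10):
    """Bits needed to encode a switch sequence such as '4545c'.

    combined_tokens is a set of multi-switch patterns that each fit in
    one ten-bit token; single switches always cost one token.
    """
    cost, pos = 0, 0
    while pos < len(sequence):
        match = 1
        # Try the longest remaining pattern first (assumed greedy strategy).
        for length in range(len(sequence) - pos, 1, -1):
            if sequence[pos:pos + length] in combined_tokens:
                match = length
                break
        cost += bits_per_token
        pos += match
    return cost


saved = switch_cost_bits("4545c", set()) - switch_cost_bits("4545c", {"4545c"})
```

Here five separate switch tokens cost 50 bits and the single combined token costs 10, giving the forty-bit saving quoted above.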
- a string of <mini-4> is treated as a single <mini-4> as far as the data type is concerned.
- the same treatment is performed in the encoding process for <mini-5> and <cs> strings. It is irrelevant to have a string of <cc>s.
- the selection of data type switch tokens is based on the merit of each token.
- the top M data type switch tokens with the highest PFVs are selected for the data type switch token table, where the M is the size of the table. To improve the error recovery capability of the system a bit padding technique is implemented.
- the technique requires bit padding in returning from the non-ten tokens to ⁇ T> so that the system can be recovered from any detectable or un-detectable errors as long as all the ⁇ T> tokens are beginning on the boundary of multiple of 10 bits. It is also noted that the error recovery is most desirable for long messages. The cost introduced by the error recovery technology is typically not justified in short message applications. Thus an adaptive error recovery technique is implemented in the present system.
- a run length parameter l is calculated in the encoding-decoding process.
- the value of l is equal to the number of bits of the compressed message from the current point at which a data type (any data type other than <T>) switches back to <T> to the end of the message.
- the system can find this value in its backward encoding process, and the subscriber can find this value by keeping track of the current decoding position (the total number of bits of the information should be easily obtained by the subscriber).
- a length threshold TL is defined. Bit padding should be implemented if l > TL. Otherwise, no padding is necessary.
- the TL was defined as 100 bits in performing the benchmark calculations for the compression rate.
- it follows that (a) if bit padding is implemented at the current stage, bit padding should be implemented in all the previous occurrences of the data type switching back to <T>; and (b) if bit padding is not implemented at the current stage, no bit padding should be performed in the following stages.
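The threshold rule and its two consequences can be illustrated with a minimal sketch. It assumes the run length at each switch back to the default data type is already known and is supplied in message order; the function names and sample values are hypothetical.

```python
TL_BITS = 100  # threshold value quoted for the benchmark calculations


def padding_decisions(run_lengths, tl=TL_BITS):
    """Return True for each switch-back point whose run length exceeds TL.

    run_lengths holds, in message order, the number of bits from each
    switch-back point to the end of the compressed message.
    """
    return [run_length > tl for run_length in run_lengths]


def is_consistent(decisions):
    """Check that no padded point follows an unpadded one."""
    seen_unpadded = False
    for padded in decisions:
        if padded and seen_unpadded:
            return False
        seen_unpadded = seen_unpadded or not padded
    return True


decisions = padding_decisions([350, 180, 101, 100, 40])
print(decisions, is_consistent(decisions))
```

Because the run length only shrinks toward the end of the message, the decisions form a block of padded points followed by a block of unpadded points, which is what the two consequences state.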
- the illustration shows an example of shortest path encoding of a compressed message including details such as message segmentation, shortest path data encoding, shortest path data type switch encoding, and padding in accordance with the preferred embodiment of the present invention.
- the first level 1602 shows the test message "Please call BOB at 123-4567 asap. It is important.”
- the second level 1604 shows fragmentation of the original text message into fragments separated by ⁇ cc> and/or ⁇ mini-4> tokens.
- the third level 1606 shows the test message encoded using the shortest path procedure.
- the fourth level 1608 shows the further encoding process as data type switch tokens are identified and applied to the message data.
- the fifth level 1610 shows the resulting coded (compressed) information in a decimal representation, e.g., ASCII characters, phrases or the like have been replaced with their equivalent tokens and the token string has been optimized for continuations, run length occurrences, capital control, etc.
- the sixth level 1612 shows the padding procedure as applied to 10-bit and non-10-bit tokens such that all blocks in the compressed message are filled upon completion of the compression process. Since all optimizations are performed on encoding (compression) and not decoding (decompression), decompression of the compressed message is much simpler than compression. Accordingly, to decompress the compressed message, one only needs to follow the top-level encoding rules in reverse.
- Another feature of the present invention is its ability to mix samples for the same or dissimilar languages in order to derive an optimal token table relative to the statistical occurrence of characters, words, phrases, and other common groups of characters, numbers and symbols.
- the English word list disclosed is a mixture of word frequencies from different word frequency tables collected in different geographical regions.
- a first table was generated by collecting paging messages from Ottawa, Toronto, Montreal and Hamilton, Canada. The table comprised two data sets collected three years apart. The paging messages were collected over a total period of 14 days from two radio common carriers.
- the resultant table included 264,956 messages with 23.38% of messages in mixed case.
- the total number of characters in the table was 13,569,243.
- a second table, of written English, was ported from the "Word Frequency Book" by John B. Carroll et al.
- the tables were mixed as follows. First, items in the paging message word list are added to the mixed word list with a weight of 0.6 (60%). Second, items with a frequency greater than 0.018% (more than 500) in the written English word list are added to the mixed word list with a weight of 0.25 (25%). Last, items with frequency greater than 0.016% (more than 500) in the spoken English word list are added to the mixed word list with a weight of 0.15 (15%). From the resulting mixed table, a normalized word frequency is calculated as follows.
- f n is the updated frequency
- f 0 is the old frequency
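The weighted mixing step can be sketched as follows. The weights and frequency cut-offs are the ones quoted above; the sample word lists are invented for illustration, and the final normalization (dividing by the total) is only an assumption, since the normalization formula itself is not reproduced in this text.

```python
def mix_word_lists(sources):
    """Combine word lists given as (frequencies, weight, min_frequency).

    frequencies maps a word to its relative frequency (0.0 to 1.0). Only
    items whose frequency exceeds min_frequency contribute to the mix.
    """
    mixed = {}
    for frequencies, weight, min_frequency in sources:
        for word, frequency in frequencies.items():
            if frequency > min_frequency:
                mixed[word] = mixed.get(word, 0.0) + weight * frequency
    return mixed


def normalize(mixed):
    """Assumed normalization: scale the mixed frequencies to sum to one."""
    total = sum(mixed.values())
    return {word: value / total for word, value in mixed.items()}


# Hypothetical sample data, not taken from the source tables.
paging = {"call": 0.02, "please": 0.01}
written = {"the": 0.05, "call": 0.001, "zymurgy": 0.00001}
spoken = {"the": 0.04}

mixed = mix_word_lists([
    (paging, 0.60, 0.0),       # all paging items are added
    (written, 0.25, 0.00018),  # frequency greater than 0.018%
    (spoken, 0.15, 0.00016),   # frequency greater than 0.016%
])
print(normalize(mixed))
```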
- a Spanish word list is mixed.
- two tables were used for Spanish word list mixing. These tables contained paging messages and written Spanish.
- the first table was collected from paging messages sent in Mexico City, Mexico during a six day period from two radio common carriers. The paging messages collected were all in upper case. The total number of messages collected was 206,543 containing 12,378,243 characters.
- Written Spanish was ported from two different sources, the INFOSEL and EFE corpus.
- the first corpus was collected from general domain Mexican news articles of about 100 MB text size.
- the resulting file size was 3.4 MB for the full item list, with 2 MB of items having a frequency greater than one, 1.2 MB of items having a frequency greater than or equal to five, and 62K of items having a frequency greater than or equal to ten.
- the second corpus is a compilation of news articles of size 500 MB of widely divergent domains from the EFE satellite newswire service of Madrid, Spain.
- the full list is 16.8 MB, with 6.4 MB of items having a frequency greater than one, 3 MB of items having a frequency greater than or equal to five, and 1.9 MB of items having a frequency greater than or equal to ten. Only word count information was available for these statistics.
- the tables were mixed as follows. First, items in the paging message word list are added to the mixed word list with a weight of 0.6 (60%). Second, items in the INFOSEL corpus are added to the mixed word list with a weight of 0.20 (20%). Last, items in the EFE corpus are added to the mixed word list with a weight of 0.20 (20%). No spoken Spanish was added to the mixed word list.
- the Token Text Compression protocol is an end-application protocol used to send compressed messages over the air to wireless applications. It uses a token-based data compression technique for lossless message transmission (i.e., the decompressed message is identical to the pre-compressed message). It supports 7-bit printable ASCII characters and some control characters for languages such as English and Spanish.
- Expandable Integer is defined in the Expandable Integer Appendix
- the Data field contains fields of one or more data types. Each data type has its own termination rules.
- ASCII, Printable ASCII characters from 0x20 to 0x7E.
- ASCII, Control ASCII characters from 0x00 to 0x1F.
- Token A character or a string of characters that are grouped together.
- Token Table A set of tokens that share similar characteristics. Each token in the table has a Token ID, a unique identification number.
- Data Type Type of data in an element in a Token Table. All the tokens in a token table are defined to be of the same data type. Data Types are the building blocks of the compression.
- Mini-4 A data type that encodes numerals, space, and symbols that are closely associated with telephone numbers.
- Mini-5 A data type that encodes words.
- a Mini-5 entry may only appear next to (i.e., prior to or after) a Mini-4 element.
- Character String A data type that is comprised of a set of individual printable characters.
- Capitalization Control A data type that is comprised of a set of tokens that control the capitalization of elements in a message. Two types of Capitalization Control are defined: Stand Alone and Embedded. The effect of a capitalization control remains until another capitalization control token is encountered. Stand Alone tokens are 10 bits long, while Embedded tokens are 2 bits long.
- Fragment and Phrase A data type that is comprised of frequently used fragments of words, complete words, phrases, and all printable ASCII.
- Data Type Switch A type of token that is used to control the switch from one data type to another.
- Default Data Type Data types that the compression algorithm utilizes when it is either at the beginning of the message or at the end of certain stages of the compression. These data types are defined to be 10-bit tokens and could be either Data Type Switch tokens or Fragment and Phrase tokens.
- Non-Default Data Type Data types that have data bit length other than 10 bits long (such as Mini-4, Mini-5, Character String, and Embedded Capitalization Control).
- Leading Space The space before a word or a character that is associated with the word or character. If multiple space characters exist, then only the last space before a word or a character is treated as a leading space.
- 10-bit Tokens Data Type Switch tokens and Fragment and Phrase tokens for which the data bit length is 10 bits long.
- the Status Information Field contains the value '0x6C'.
- Dictionary ID defines the dictionary tables to be used for the decompression of the messages. Every Dictionary ID covers a set of tables: the Mini-4, Mini-5, Character String, Data Type Switch, and Fragment and Phrase tables. In some cases, different IDs share the same tables. For example, the Mini-4 table is shared by both the default English and the default Spanish dictionaries.
- the Data field provides the data and parameters needed to decode the message. This field is comprised of tokens that are interpreted sequentially according to the compression algorithm to decompress the message.
- the data types are Mini-4, Mini-5, Character String, Capitalization Control, Fragment and Phrase, and Data Type Switch. Several Capitalization Controls are also defined.
- the following table summarizes important data type information for the English and Spanish compression. Each will be described in later sections.
- the preferred tokens (which are a collection of symbols) for the default English and the default Spanish dictionaries are listed in .
- Fragment and Phrase data type and Data Type Switch data type share the same range of token identification numbers: from 0 to 1023. Fragment and Phrase tokens from 30 to 511 and from 542 to 1023 are identical except that the latter set has a leading space. The following map shows the locations of each:
- Mini-4 token table There are 16 elements in the Mini-4 token table.
- One or more Mini-4 token entries are specified using the following format:
- the Data field consists of one or more Mini-4 tokens, as specified by the Length Field.
- the Length Field has a (4+1) format.
- the 4 is used to indicate the data count in the Data field (i.e., the number of Mini-4 tokens) immediately following the Length Field, and the 1 is an optional continuation bit.
- the Length Field and Data pattern repeats if the continuation bit is present and set.
- the possible Length Fields are listed in Table 10,
- Each Mini-4 token is identified by a 4-bit identification number in the range of 0 to 15.
- the Mini-4 tokens for the default English and the default Spanish dictionaries are listed in Table 14.
- Mini-5 token table There are 32 elements in the Mini-5 token table.
- One or more Mini-5 token entries are specified using the following format:
- the Data field consists of one or more Mini-5 tokens, as specified by the Length Field.
- the Length Field has a (1+1) format.
- the first 1 is used to indicate the data count in the Data field (i.e., the number of Mini-5 tokens) immediately following the Length Field, and the second 1 is an optional continuation bit.
- the Length Field and Data pattern repeats if the continuation bit is present and set.
- the possible Length Fields are listed in Table 11.
- Each Mini-5 token is identified by a 6-bit value: a 5-bit identification number in the range of 0 to 31, and a 1-bit leading space indicator.
- the leading space indicator bit is the most significant bit of the 6 bits. If the leading indicator bit is set, it indicates that a Leading Space is present.
- the Mini-5 tokens for the default English and the default Spanish dictionaries are listed in Tables 17 and 18.
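As an illustration of the 6-bit Mini-5 layout just described, the following sketch splits a value into its leading space indicator (the most significant bit) and its 5-bit identification number. The function name is hypothetical, and the lookup of the identification number in the token tables is left out.

```python
def decode_mini5(value):
    """Split a 6-bit Mini-5 value into (leading_space, token_id)."""
    if not 0 <= value <= 0b111111:
        raise ValueError("a Mini-5 value must fit in 6 bits")
    leading_space = bool(value & 0b100000)  # most significant of the 6 bits
    token_id = value & 0b011111             # 5-bit ID in the range 0 to 31
    return leading_space, token_id


print(decode_mini5(0b100011))  # leading space present, ID 3
print(decode_mini5(0b011111))  # no leading space, ID 31
```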
- Character String token table There are 64 elements in the Character String token table.
- One or more Character String token entries are specified using the following format:
- the Data field consists of one or more Character String tokens, as specified by the Length Field.
- the Length Field has a (2+1) format.
- the 2 is used to indicate the data count in the Data field (i.e., the number of Character String tokens) immediately following the Length Field, and the 1 is an optional continuation bit.
- the Length Field and Data pattern repeats if the continuation bit is present and set.
- the possible Length Fields are listed in Table 12.
- Each Character String token is identified by a 6-bit identification number in the range of 0 to 63.
- the Character String tokens for the default English and the default Spanish dictionaries are listed in Table 19.
- the Fragment and Phrase token table contains the most commonly used fragments, words and phrases for the dictionary. This data type (together with Data Type Switch) defines the default data type.
- a Fragment and Phrase token is identified by a 10-bit value: a 9-bit identification number in the range of 30 to 511 (the range of 0 to 29 is used for data type switching), and a 1-bit leading space indicator. The leading space indicator bit is the most significant bit of the 10 bits. If the leading space indicator bit is set, it indicates that a Leading Space is present.
- the Fragment and Phrase tokens for the default English and the default Spanish dictionaries are listed in .
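The shared 10-bit identification space can be sketched as a small classifier. The Fragment and Phrase ranges (30 to 511, and 542 to 1023 with a leading space) come from the text; treating the remaining IDs (0 to 29 and 512 to 541) as Data Type Switch tokens is an inference from the count of 60 such tokens and from the ID 523 example, not a quoted rule. The function name and return shape are hypothetical.

```python
def classify_token(token):
    """Classify a 10-bit token as (kind, table_id, leading_space)."""
    if not 0 <= token <= 1023:
        raise ValueError("a default data type token must fit in 10 bits")
    leading_space = bool(token & 0x200)  # most significant of the 10 bits
    base_id = token & 0x1FF              # lower 9 bits
    if base_id < 30:
        # Inferred: these IDs are Data Type Switch tokens, looked up by
        # their full 10-bit ID, with no leading space meaning attached.
        return ("data_type_switch", token, False)
    return ("fragment_and_phrase", base_id, leading_space)


print(classify_token(523))
print(classify_token(542))
```

Under this reading, tokens 30 and 542 refer to the same Fragment and Phrase entry, with and without a leading space.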
- Stand Alone and Embedded Two types of capitalization controls are defined: Stand Alone and Embedded. They are identical in function. Stand Alone capitalization control tokens are part of the Data Type Switch tokens and are 10 bits long. The Embedded capitalization control tokens are 2 bits long. The Embedded Capitalization Control tokens are represented by the symbol '0' in the Data Type Switch token strings. It identifies the relative location of the actual 2-bit data of the Capitalization Control token. The default case of a message is upper case. Thus, if a message is all in upper case, there is no need for a capitalization control.
- Cap First token The next letter encountered is capitalized.
- the capitalization flag is changed to lower case after the capitalization. If the character immediately after the Cap First token is not a letter, Cap First token remains effective until it encounters a letter and capitalizes it.
- the IDs for the capitalization control tokens are listed in Table 13.
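The Cap First behavior described above can be sketched on a plain string. This is only an illustration of the stated rule (capitalize the next letter encountered, then continue in lower case); the function name is hypothetical and no token decoding is involved.

```python
def apply_cap_first(text):
    """Capitalize the first letter found; emit everything else in lower case."""
    output = []
    pending = True  # Cap First stays in effect until a letter is reached
    for character in text:
        if pending and character.isalpha():
            output.append(character.upper())
            pending = False  # the flag reverts to lower case afterwards
        else:
            output.append(character.lower())
    return "".join(output)


print(apply_cap_first("123 bob at home"))
```

Digits and spaces ahead of the first letter pass through unchanged, so the control remains effective until a letter is reached.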
- Data Type Switch tokens are used to group non-default data types. There are 60 Data Type Switch tokens. The possible non-default data types for a Data Type Switch are Mini-4 (4), Mini-5 (5), Character String (c), and Embedded Capitalization Control (0). Data Type Switch tokens are 10 bits long. The Data Type Switch tokens for the default English and the default Spanish dictionaries are listed in Tables 20 and 21.
- when decompressing the message using the default English dictionary, after the identification of the 10-bit token with ID 523 as a Data Type Switch token, the Data Type Switch Token table is used.
- the table entry for ID 523 is 40c5, i.e., a Mini-4 field, an Embedded Capitalization Control, a Character String field, and a Mini-5 field, in that order.
- Only the Embedded Capitalization Control has a finite bit length, i.e., 2 bits. The lengths for the other types vary depending on the Length Field for each type. The type of the Capitalization Control is unknown until the 2 bits are read. Note that each field has its own termination rule.
- after the processing of the last data type (in this example, Mini-5) is completed, the next 10 bits of data are interpreted to continue the process of decompressing the message.
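The way a decoder might read a Data Type Switch table entry such as 40c5 can be sketched as a symbol lookup. The symbol meanings follow the text; the dictionary and function names are hypothetical, and reading the fields themselves (each with its own termination rule) is not shown.

```python
# Symbols used in Data Type Switch token strings, as described in the text.
SWITCH_SYMBOLS = {
    "4": "Mini-4",
    "5": "Mini-5",
    "c": "Character String",
    "0": "Embedded Capitalization Control",
}


def expand_switch_entry(entry):
    """Return the ordered list of data types named by a table entry."""
    return [SWITCH_SYMBOLS[symbol] for symbol in entry]


print(expand_switch_entry("40c5"))
```

Once the last listed data type has been processed, a decoder would go back to interpreting the next 10 bits as a default data type token.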
- All data types are classified into two types: Default Data Type and Non-Default Data Type.
- the default Data Types are Data Type Switch, and Fragment and Phrase.
- the non-Default Data Types are Mini-4, Mini-5, and Character String.
- the data of Default Data Type are 10 bits long. All the Non-Default Data Types are padded with '0's to the next multiple of 10-bit boundary if not already on the boundary, except for the last group. The following is a graphical description of the padding for the Token Text Compression.
- the output is padded to be an integral number of octets.
- the data is padded with '0's to the next multiple of 8-bit boundary if not already on the boundary.
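Both padding rules reduce to the same operation on a bit string, sketched below with bits held as a Python string of '0' and '1' characters. The function name is hypothetical, and the adaptive threshold that decides whether 10-bit padding applies is not modeled here.

```python
def pad_to_boundary(bits, boundary):
    """Append '0' bits until len(bits) is a multiple of boundary."""
    return bits + "0" * (-len(bits) % boundary)


group = "1" * 14                      # a non-default group of 14 bits
aligned = pad_to_boundary(group, 10)  # padded to the next 10-bit boundary
output = pad_to_boundary(aligned, 8)  # final output padded to whole octets
print(len(aligned), len(output))
```

Data already on a boundary is returned unchanged, which matches the "if not already on the boundary" wording.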
- Token Text Compression may be used to specify portions of the data which are not compressed. To specify such data, a special case mode is used. The following rules are used to specify the data:
Field | <so> | Length Field | Data | P |
---|---|---|---|---|
Bits | 10 | 1 | 8*n | 0 or more |
- <so> a 10-bit indicator for the start of the Special Data Handling section of the Token Text Compression.
- Length Field a length indication.
- Data Data octets.
- the Length Field specifies the number of octets, n, in the Data field.
- P Compression padding, 0 or more bits to the next multiple of 10-bit boundary.
- the first 1 is used to indicate the number of octets immediately following the Length Field, and the second 1 is an optional continuation bit.
- the Length Field and Data pattern repeats if the continuation bit is present and set.
- the possible Length Fields are listed in Table 15.
- the message is to be compressed using the default English dictionary.
- the message is broken into (NOTE: one of many possibilities)
- only one bit of information is required on the presentation layer to specify whether the message is compressed, while m bits are required for the specification of the version number.
- the selection of the number m depends on whether it is a system specific or a customer specific version.
- a system specific version number requires uniqueness within the whole system while the customer specific version number requires the uniqueness in regards to that customer only. The former requires more bits while the latter less.
- customer compression version mapping should be part of the customer's profile at a radio common carrier or the like.
- the messaging information is composed using conventional computers and data structures, and the message is compressed using the unique procedure disclosed herein. After each message is compressed, it may be sent like a normal paging message to the paging system via the public switched telephone network or the like.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA002303357A CA2303357A1 (en) | 1998-07-14 | 1999-06-11 | Reduced overhead text messaging |
AU44392/99A AU4439299A (en) | 1998-07-14 | 1999-06-11 | Reduced overhead text messaging |
EP99927500A EP1046098A1 (en) | 1998-07-14 | 1999-06-11 | Reduced overhead text messaging |
BR9906586-0A BR9906586A (en) | 1998-07-14 | 1999-06-11 | Reduced header text messaging system |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11544598A | 1998-07-14 | 1998-07-14 | |
US09/115,445 | 1998-07-14 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2000004442A1 true WO2000004442A1 (en) | 2000-01-27 |
Family
ID=22361445
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US1999/013293 WO2000004442A1 (en) | 1998-07-14 | 1999-06-11 | Reduced overhead text messaging |
Country Status (6)
Country | Link |
---|---|
EP (1) | EP1046098A1 (en) |
AR (1) | AR019767A1 (en) |
AU (1) | AU4439299A (en) |
BR (1) | BR9906586A (en) |
CA (1) | CA2303357A1 (en) |
WO (1) | WO2000004442A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2002058415A1 (en) * | 2001-01-18 | 2002-07-25 | Siemens Aktiengesellschaft | Method for transferring texts in a communication system in addition to a corresponding encoding and decoding device |
US7567586B2 (en) | 2005-10-31 | 2009-07-28 | Microsoft Corporation | Above-transport layer message partial compression |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4597057A (en) * | 1981-12-31 | 1986-06-24 | System Development Corporation | System for compressed storage of 8-bit ASCII bytes using coded strings of 4 bit nibbles |
US5325091A (en) * | 1992-08-13 | 1994-06-28 | Xerox Corporation | Text-compression technique using frequency-ordered array of word-number mappers |
US5561421A (en) * | 1994-07-28 | 1996-10-01 | International Business Machines Corporation | Access method data compression with system-built generic dictionaries |
US5585793A (en) * | 1994-06-10 | 1996-12-17 | Digital Equipment Corporation | Order preserving data translation |
US5838963A (en) * | 1995-10-25 | 1998-11-17 | Microsoft Corporation | Apparatus and method for compressing a data file based on a dictionary file which matches segment lengths |
-
1999
- 1999-06-11 CA CA002303357A patent/CA2303357A1/en not_active Abandoned
- 1999-06-11 BR BR9906586-0A patent/BR9906586A/en not_active Application Discontinuation
- 1999-06-11 AU AU44392/99A patent/AU4439299A/en not_active Abandoned
- 1999-06-11 EP EP99927500A patent/EP1046098A1/en not_active Withdrawn
- 1999-06-11 WO PCT/US1999/013293 patent/WO2000004442A1/en not_active Application Discontinuation
- 1999-07-13 AR ARP990103403 patent/AR019767A1/en unknown
Also Published As
Publication number | Publication date |
---|---|
AR019767A1 (en) | 2002-03-13 |
CA2303357A1 (en) | 2000-01-27 |
AU4439299A (en) | 2000-02-07 |
EP1046098A1 (en) | 2000-10-25 |
BR9906586A (en) | 2000-09-26 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): AU BR CA MX ZA |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE |
|
WWE | Wipo information: entry into national phase |
Ref document number: 44392/99 Country of ref document: AU |
|
WWE | Wipo information: entry into national phase |
Ref document number: 1999927500 Country of ref document: EP |
|
ENP | Entry into the national phase |
Ref document number: 2303357 Country of ref document: CA Ref country code: CA Ref document number: 2303357 Kind code of ref document: A Format of ref document f/p: F |
|
WWE | Wipo information: entry into national phase |
Ref document number: PA/a/2000/002526 Country of ref document: MX |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
WWP | Wipo information: published in national office |
Ref document number: 1999927500 Country of ref document: EP |
|
WWW | Wipo information: withdrawn in national office |
Ref document number: 1999927500 Country of ref document: EP |