SYSTEM FOR HOT STANDBY OF A TELEPHONE SWITCHING MATRIX
FIELD AND BACKGROUND OF THE INVENTION
The present invention relates to a system for hot standby of a telephone switching matrix, and in particular, to a hot standby system in which only those components necessary for fault tolerance are duplicated, including the memory and the CPU.
Many different systems built from a collection of electronic components include redundant elements, so that the failure of one or more components does not cause the whole system to fail. Such systems are described as "fault tolerant", since the failure of one component does not result in a total failure of the system. Such redundancy is typically achieved by duplicating entire components, such as the computers within the system, so that each computer can immediately take over the functions of a failed computer. Examples of systems for which fault tolerance is desirable include, but are not limited to, data processing systems such as banking systems; real-time electrical control systems such as those employed in factories, aircraft and utility power systems; and telecommunications systems such as telephone exchanges and satellite relay stations. For example, in a telephone system, and in particular in a PBX (private branch exchange), fault tolerance is desirable in order to avoid loss of telephone functionality if a single element of the system fails. One solution to the problem of fault tolerance is disclosed in U.S. Patent No. 4,466,098, which features a circuit for updating the memory of a standby computer that is maintained in "hot standby" mode. This solution is typical of those available in the background art, in that the complete computer is duplicated and placed in "hot standby" mode. The term "hot standby" refers to the state of the standby computer, which is constantly kept ready to take over the functions of the active computer.
Every action taken by the CPU of the active computer is reported to the CPU of the standby computer. Furthermore, the entire memory of the active computer is duplicated in the standby computer. Thus, the standby computer is a passive mirror image of the active computer, and so is able to take over the functions of the active computer at any time.
Unfortunately, the system disclosed in U.S. Patent No. 4,466,098 has a number of drawbacks. First, duplicating the entire computer is costly in terms of hardware. Second, specially adapted hardware is required, such as the special circuit of U.S. Patent No. 4,466,098. These drawbacks are typical of currently available systems which are known in the background art.
A more useful solution would duplicate only those components which are essential for maintaining "hot standby" mode, yet would still enable the standby component to take over the functions of the failed component immediately, without losing the status of any calls in the switching matrix. Such a solution would be even more useful if it required mainly "off-the-shelf" components rather than specialized, expensive hardware. Unfortunately, no such solution is currently available.
Therefore, there is an unmet need for, and it would be highly useful to have, a "hot standby" system in which only those components required for the maintenance of the processor in "hot standby" mode are duplicated, including the CPU, the memory and the switching matrix, such that hardware overhead is minimized and such that highly specialized equipment is not required.
SUMMARY OF THE INVENTION
The system of the present invention features at least one, but preferably two, microcomputer buses, such as a compact PCI bus. The system also features two identical memory cards, both of which are connected to the at least one bus, and two CPU boards or components. If two buses are present, each bus is connected to a particular CPU; otherwise, both CPUs are connected to the single bus. A first CPU is the active CPU, and its associated memory, switching matrix and other components are the active components. The active CPU controls the functions of the system, for example a telephone system. The second CPU and the second associated set of components are in "hot standby" mode. The standby memory "snoops" for write commands that the active CPU writes to a particular segment of the active memory, and mimics these commands. Thus, the memory is duplicated only as necessary, enabling the standby CPU and its associated set of standby components to be kept ready to take over from the active CPU if one or more active components should fail.
According to the present invention, there is provided a hot standby system, comprising: (a) an active CPU (central processing unit) for controlling the system; (b) an active memory card associated with the active CPU, the active memory card featuring a local bus and containing at least the memory to be duplicated; (c) a standby CPU; (d) a standby memory card for duplicating data written according to a write command to the active memory card from the active CPU if the write command is to an address within a particular range of addresses, the standby memory card featuring a local bus and at least the duplicate memory; (e) at least one bus for connecting the active CPU to the active memory card and to the standby memory card, and for connecting the standby CPU to the active memory card and to the standby memory card; and (f) a bridge for connecting the standby memory card and the active memory card to the at least one bus, the bridge of the standby memory card detecting the write command to the active memory card and comparing the address of the write command to the particular range of addresses, such that the bridge sends the write command to the standby memory card if the address is within the particular range of addresses, without notifying the at least one bus.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention is herein described, by way of example only, with reference to the accompanying drawings, wherein:
FIG. 1 is a diagram of an exemplary hot standby system according to the present invention;
FIGS. 2A and 2B illustrate features of the system of Figure 1 in more detail according to a preferred embodiment of the present invention;
FIG. 3 is a map of the bus translation from the PCI bus memory space to the active memory space and the standby memory space of Figures 1 and 2A; and
FIG. 4 is a pinout of a PCI bridge core suitable for use with the present invention.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
The system of the present invention features at least one, but preferably two, microcomputer buses, such as a compact PCI bus. The system also features two identical cards containing (among other items) the memory to be duplicated, both of which are connected to the at least one bus, and two CPU boards or components. If two buses are present, each bus is connected to a particular CPU; otherwise, both CPU boards or components are connected to the single bus. A first CPU is the active CPU, and its associated memory card contains the active memory, together with other active components. The term "memory card" refers to a card or set of components containing at least the memory to be duplicated (for the active card) or the duplicate memory (for the standby card). As described in greater detail below, other components may also be on this card, and the "card" may also be a set of components on a board rather than a separate card.
The active CPU controls the functions of the system, for example a telephone system. The second CPU, the second associated memory and the other associated components are in "hot standby" mode. The standby memory snoops for write commands that the active CPU writes to a particular segment of the active memory, and mimics these commands. Thus, the memory is duplicated only as necessary, enabling the standby CPU to be kept ready to take over from the active CPU if the latter should fail.
The principles and operation of the system according to the present invention may be better understood with reference to the drawings and the accompanying description. Referring now to the drawings, Figure 1 is an illustration of an exemplary system according to the present invention. As shown in Figure 1, a hot standby system 10 features at least one standard microcomputer bus, such as a first PCI bus 12. For the sake of clarity, the discussion below centers upon a hot standby system 10 with a two-bus configuration, it being understood that this is only for descriptive purposes and is not meant to be limiting in any way.
First PCI bus 12 is connected to two separate and preferably identical memory cards, a first memory card 14 and a second memory card 16. Both first memory card 14 and second memory card 16 contain memory and other components such as a switching matrix. Although only first memory card 14 and second memory card 16 are shown for the purposes of illustration, other such cards could also be connected to first PCI bus 12 or to second PCI bus 26.
First memory card 14 is connected to first PCI bus 12 through a first bridge 18, and second memory card 16 is connected to first PCI bus 12 through a second bridge 20. First PCI bus 12 is also connected to a first CPU (central processing unit) 22. Optionally, first PCI bus 12 is also connected to a second CPU 24, if a single-bus structure is used and if first CPU 22 and second CPU 24 support such a configuration. However, a single bus does not protect system 10 from certain bus-related failures, such as a locked-up bus. Therefore, in the embodiment discussed throughout the remainder of this description (for the sake of clarity only, and without intending to be limiting), first PCI bus 12 is connected only to first CPU 22, and second CPU 24 is connected to a second PCI bus 26. Second PCI bus 26 is also connected to both first memory card 14 and second memory card 16, through bridges as discussed in greater detail below.
For the purposes of description only and without intending to be limiting in any way, first CPU 22 is described as the active CPU, while second CPU 24 is described as the standby CPU. Similarly, first memory card 14 is described as the active memory card, while second memory card 16 is described as the standby memory card. However, these designations can be changed at any time, such that second CPU 24 would become the active CPU and first CPU 22 would become the standby CPU, and so forth, as described in greater detail below.
First memory card 14 and second memory card 16 are connected to second PCI bus 26 through bridges 28 and 30, respectively. Bridges 28 and 30 are directly addressable by the CPU on that bus, second CPU 24, while bridges 18 and 20 are directly addressable by first CPU 22. The structures of bridges 18, 20, 28 and 30, first memory card 14 and second memory card 16 are shown in greater detail below with regard to Figure 2A.
The function of system 10 is as follows. First CPU 22 controls the functions for which system 10 is required, for example to control a telephone system. In order to perform these functions, first CPU 22 sends write commands to first memory card 14, which contains a number of components including the telephone switching matrix (see Figure 2A). Second memory card 16 operates in snooping mode, in which it mimics the write commands sent to first memory card 14, preferably only within certain memory address ranges in order to avoid unnecessary duplication. Bridge 20, which is associated with second memory card 16, detects each write command and preferably compares its memory address with the range or ranges of memory addresses that must be duplicated. If the write command is to an address within that range or ranges, bridge 20 causes the write command to be written to second memory card 16.
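The range comparison performed by bridge 20 can be sketched in Python as a purely behavioral model, not as bridge logic. The `AddressRange` type and `should_mirror` function are hypothetical names, and the single 14 MB to 18 MB window is borrowed from the working space of Figure 3 only as an example configuration.

```python
from dataclasses import dataclass
from typing import Sequence

MB = 1 << 20


@dataclass(frozen=True)
class AddressRange:
    """Half-open range [base, base + size) of addresses that must be duplicated."""
    base: int
    size: int

    def contains(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size


def should_mirror(write_addr: int, ranges: Sequence[AddressRange]) -> bool:
    """Return True if a snooped write falls inside any duplicated range."""
    return any(r.contains(write_addr) for r in ranges)


# Hypothetical configuration: one duplicated window, 14 MB up to (not including) 18 MB.
DUPLICATED = [AddressRange(14 * MB, 4 * MB)]

print(should_mirror(15 * MB, DUPLICATED))  # True: copied to the standby card
print(should_mirror(20 * MB, DUPLICATED))  # False: stays on the active card only
```

Several disjoint ranges (for example, phone configurations and switching-matrix settings stored in separate areas) can simply be listed in `DUPLICATED`.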
Preferably, those ranges of memory addresses which are duplicated are determined according to the particular information stored therein, which is implementation dependent. For example, for a telephone system, preferably the information which is duplicated includes, but is not limited to, the configuration of each phone, the dynamic state table of the components of the system, any necessary definitions and the settings/configurations of the switching matrix. The switching matrix is optionally directly addressed by first CPU 22.
Alternatively, the switching matrix is controlled by a separate CPU on first memory card 14, although the actions of this separate CPU would be at least partially controlled according to information stored by first CPU 22 on first memory card 14. Also preferably, the information which must be duplicated is stored on first memory card 14, which is duplicated on second memory card 16. Information which does not need to be duplicated can therefore be stored on a memory which is only associated with first CPU 22 or second CPU 24, or in an on-board memory of either first CPU 22 or second CPU 24, for faster access.
First CPU 22 and second CPU 24 can communicate through control words or messages stored at particular addresses of first memory card 14 and/or second memory card 16. For example, by changing a stored control word, first CPU 22 could tell second CPU 24 to take over. This word would also inform first memory card 14 and second memory card 16 of the identity of the active CPU, thereby determining which memory card is in "active mode" and which is in "snooping mode". In addition, second CPU 24 can preferably communicate with first CPU 22 by leaving a message in a specific memory location of either first memory card 14 or second memory card 16. Thus, this structure enables first CPU 22 and second CPU 24 to communicate in order to determine which of them is the active CPU.
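The control-word handover can be illustrated with a small Python model. Everything concrete here is an assumption for illustration: the offsets `CONTROL_WORD` and `MAILBOX`, the encoding of the active CPU by its reference numeral, and the rule that a card is in active mode when its own CPU is the active CPU.

```python
# Hypothetical encodings and offsets: the description does not specify them.
CPU_22, CPU_24 = 22, 24
CONTROL_WORD = 0x00  # hypothetical offset of the "active CPU" control word
MAILBOX = 0x04       # hypothetical offset for CPU-to-CPU messages


class MemoryCard:
    def __init__(self, owner_cpu: int) -> None:
        # Assumption: a card is in active mode when its own CPU is the active CPU.
        self.owner_cpu = owner_cpu
        self.words = {}

    def mode(self) -> str:
        active_cpu = self.words.get(CONTROL_WORD)
        return "active" if active_cpu == self.owner_cpu else "snooping"


def write_word(cards, offset: int, value: int) -> None:
    """Store the same word on every card so that both cards see it."""
    for card in cards:
        card.words[offset] = value


card14, card16 = MemoryCard(CPU_22), MemoryCard(CPU_24)
cards = (card14, card16)

write_word(cards, CONTROL_WORD, CPU_22)   # CPU 22 is active
print(card14.mode(), card16.mode())       # active snooping

write_word(cards, MAILBOX, 1)             # e.g. CPU 24 posts a hypothetical "ready" message
write_word(cards, CONTROL_WORD, CPU_24)   # CPU 22 hands control to CPU 24
print(card14.mode(), card16.mode())       # snooping active
```

A single shared word keeps both cards consistent about the roles: changing it switches which card mirrors and which card is mirrored in one step.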
Figure 2A shows a portion of Figure 1 in more detail, including the structure of bridge 20, which could be any of bridges 18, 20, 28 or 30, and of a memory card 32, which could be either first memory card 14 or second memory card 16. In addition, a PCI bus 33 is shown, which could be either first bus 12 or second bus 26. Memory card 32 features a local bus 34, a switching matrix 35, a local CPU 37 and a control arbiter 36. Control arbiter 36 determines which device on local bus 34 may access local bus 34 at any one time, for example one of the two bridges 18 and 28 (on first memory card 14) or one of the two bridges 20 and 30 (on second memory card 16); only one bridge is shown, for the purposes of clarity. Bridge 20 performs a write command by first requesting permission from control arbiter 36 and then, upon receiving permission, writing the data to memory card 32 through local bus 34. Since control of local bus 34 may not be immediately available to bridge 20, bridge 20 preferably features a FIFO memory 38. The data to be written and its address are latched inside bridge 20 by being stored in FIFO memory 38, and the data is then written to the proper address in memory card 32 when bridge 20 has control of local bus 34.
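The latch-then-write behavior of FIFO memory 38 can be sketched as follows. This is a behavioral Python model under simplifying assumptions: `LocalBusArbiter` and `Bridge` are hypothetical names, the toy arbiter grants the bus first-come without modeling priority, and memory is a dictionary from address to data.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple


class LocalBusArbiter:
    """Toy model of control arbiter 36: one master owns local bus 34 at a time."""

    def __init__(self) -> None:
        self.owner: Optional[object] = None

    def request(self, master: object) -> bool:
        if self.owner is None:
            self.owner = master
        return self.owner is master

    def release(self, master: object) -> None:
        if self.owner is master:
            self.owner = None


class Bridge:
    """Latches PCI writes in a FIFO and replays them once the local bus is granted."""

    def __init__(self, arbiter: LocalBusArbiter, memory: Dict[int, int]) -> None:
        self.fifo: Deque[Tuple[int, int]] = deque()
        self.arbiter = arbiter
        self.memory = memory

    def latch_write(self, addr: int, data: int) -> None:
        # Address and data are captured at once, whether or not the bus is free.
        self.fifo.append((addr, data))

    def try_drain(self) -> int:
        """Return the number of entries written (0 if the bus was not granted)."""
        if not self.fifo or not self.arbiter.request(self):
            return 0
        written = 0
        while self.fifo:
            addr, data = self.fifo.popleft()
            self.memory[addr] = data
            written += 1
        self.arbiter.release(self)
        return written


card_memory: Dict[int, int] = {}
arbiter = LocalBusArbiter()
bridge = Bridge(arbiter, card_memory)
local_cpu = object()              # stands in for local CPU 37

arbiter.request(local_cpu)        # the local CPU currently holds the bus
bridge.latch_write(0x40, 7)
print(bridge.try_drain())         # 0: data stays latched in the FIFO
arbiter.release(local_cpu)
print(bridge.try_drain())         # 1: written once the bus is granted
```

The FIFO decouples the PCI side, which cannot be stalled, from the local side, which may be busy.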
As stated previously, bridge 20 "snoops" for write commands to certain memory addresses, described in greater detail below with regard to Figure 3. When snooping is enabled, bridge 20 snoops for write commands from first CPU 22 to first memory card 14 that are targeted to a specific address range. When bridge 20 detects a write command to an address within this range, it performs a local write transaction through FIFO memory 38, without asserting any signal on first PCI bus 12. Control arbiter 36 preferably gives bridge 20 the highest priority for access to local bus 34. This is strongly preferred in order to ensure that all write commands within the specific address range are written accurately and completely to memory card 32. The address range to be snooped is defined in a snooping space base register 40, which is one of three base registers of bridge 20. As described in greater detail below with regard to Figure 3, the other two base registers are a working space base register 42, which defines a range of target addresses, and an I/O base register 44, which is used for control. When snooping is enabled, snooping space base register 40 holds the base address of the range of addresses to be snooped; otherwise, it defines a regular target address range, similar to working space base register 42.
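The preference that control arbiter 36 always serve the snooping bridge first can be expressed as fixed-priority arbitration. In this Python sketch the master names are hypothetical, and the relative order of the local CPU and the other bridge is an assumption; only the top position of the snooping bridge comes from the text.

```python
from typing import Iterable, Optional

# Hypothetical names for the masters on local bus 34, highest priority first.
# Only the first position (the snooping bridge) is taken from the description.
PRIORITY = ("snooping_bridge", "local_cpu_37", "other_bridge")


def grant(requests: Iterable[str]) -> Optional[str]:
    """Fixed-priority arbitration: grant the bus to the highest-priority requester."""
    pending = set(requests)
    for master in PRIORITY:
        if master in pending:
            return master
    return None


print(grant(["local_cpu_37", "snooping_bridge"]))  # snooping_bridge
print(grant(["other_bridge", "local_cpu_37"]))     # local_cpu_37
```

Fixed priority is chosen here because a snooped write cannot be retried on the PCI bus, so it must never lose arbitration to a master that can wait.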
To indicate to local bus 34 which of these three registers has matched the PCI address, bridge 20 has three output pins. Each pin is associated with one of the three base address registers, and is asserted whenever there is a transaction on PCI bus 33 in the address range of that register (see Figure 4 below for a more detailed description of bridge 20).
In addition, bridge 20 preferably features a local-to-PCI offset register 46, also shown in Figure 2B. Such an offset register is necessary because local bus 34 has a 25-bit address (32 megabytes of address space), while first PCI bus 12 and second PCI bus 26 each have a 32-bit address. Therefore, whenever local bus 34 accesses first PCI bus 12 or second PCI bus 26, the 7 missing upper address bits are supplied by local-to-PCI offset register 46 and combined with the 25-bit local address.
Local-to-PCI offset register 46 should be a programmable register which is programmed and accessed from the bridge PCI configuration space.
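The address widening can be shown with a few lines of Python. The sketch assumes that offset register 46 supplies the upper 7 bits of the 32-bit PCI address and that the 25-bit local address forms the lower bits unchanged; `local_to_pci` is a hypothetical helper name.

```python
LOCAL_ADDR_BITS = 25                            # local bus 34: 2**25 bytes = 32 MB
PCI_ADDR_BITS = 32                              # PCI buses 12 and 26
OFFSET_BITS = PCI_ADDR_BITS - LOCAL_ADDR_BITS   # the 7 bits held in register 46
LOCAL_MASK = (1 << LOCAL_ADDR_BITS) - 1


def local_to_pci(local_addr: int, offset_reg: int) -> int:
    """Form a 32-bit PCI address from a 25-bit local address.

    Assumption: the offset register supplies the 7 upper address bits.
    """
    if not 0 <= local_addr <= LOCAL_MASK:
        raise ValueError("local address does not fit in 25 bits")
    if not 0 <= offset_reg < (1 << OFFSET_BITS):
        raise ValueError("offset does not fit in 7 bits")
    return (offset_reg << LOCAL_ADDR_BITS) | local_addr


print(hex(local_to_pci(0x10, 1)))           # 0x2000010
print(hex(local_to_pci(LOCAL_MASK, 0x7F)))  # 0xffffffff
```

Each value of the offset register therefore selects one of 128 windows of 32 MB within the 4 GB PCI memory space.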
In another preferred embodiment, bridge 20 additionally features a second FIFO memory 48, which is used when bridge 20 is in regular target mode. Second FIFO memory 48 is optional, however, because most transactions on the PCI bus are single transactions rather than burst transactions; it is required only if there is no other support for burst transactions issued by a PCI master on the PCI bus.
As shown in Figure 3, a PCI memory space 50 is translated for each memory card according to a bus translation layout 52. PCI memory space 50 spans addresses from 0 megabytes to 4 gigabytes, and the contents of certain of these address ranges are shown. Preferably, the range from 14 megabytes to 18 megabytes is a working space 54. Working space 54 is mapped to an active memory space 57 of active memory 56 and, through snooping, to a standby memory space 60 of standby memory 58. Thus, write commands to working space 54 are preferably stored both in active memory space 57 of active memory 56 and in standby memory space 60 of standby memory 58.
In addition, information written to PCI memory space 50 from 22 megabytes to 26 megabytes (space 55) is mapped directly to the same area (space 57) of the active memory card, without being snooped by the standby card. This allows the components to be tested, and permits communication between the active and standby memory cards without mirroring that communication. Similarly, information written to PCI memory space 50 from 32 megabytes to 36 megabytes (space 59) is written directly to the standby memory card (space 60); this is the direct access mode.
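The three windows of Figure 3 can be summarized as a routing table. In this Python sketch the window boundaries come from the description, while the `route` function, the card labels and the assumption that each window maps onto the start of the same card-local area are illustrative only.

```python
from typing import FrozenSet, Optional, Tuple

MB = 1 << 20

# PCI windows from Figure 3 (start inclusive, end exclusive) and the cards they reach.
WINDOWS = (
    (14 * MB, 18 * MB, frozenset({"active", "standby"})),  # working space 54, snooped
    (22 * MB, 26 * MB, frozenset({"active"})),             # space 55, not snooped
    (32 * MB, 36 * MB, frozenset({"standby"})),            # space 59, direct access
)


def route(pci_addr: int) -> Tuple[FrozenSet[str], Optional[int]]:
    """Return the cards a write reaches and the offset within the card's memory space.

    Assumption: every window maps onto the start of the same card-local area
    (space 57 on the active card, space 60 on the standby card).
    """
    for start, end, cards in WINDOWS:
        if start <= pci_addr < end:
            return cards, pci_addr - start
    return frozenset(), None


for addr in (15 * MB, 23 * MB, 33 * MB, 20 * MB):
    cards, offset = route(addr)
    print(addr // MB, sorted(cards), offset)
```

Because windows 54 and 55 reach the same active-card offset, software can choose per write whether the data is mirrored simply by choosing which window it uses.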
As mentioned previously, the CPU can write to the bridge to enable or disable snooping, so that the active CPU can turn snooping off. This is a second method of allowing the active CPU to write to the memory without duplicating the written data.
Figure 4 is a pinout diagram 62 of a suitably designed core of PCI bridge 20. The function of each pin will be clear to one of ordinary skill in the art, so only those pins required for the operation of the present invention are described herein. PCI bridge 20 is a bus interface unit that interfaces between local bus 34 and a PCI bus, such as PCI bus 33. One example of such a bridge core is the EC210 PCI bus master core (Eureka Technology Inc., Los Altos, California, USA), with the modifications described below.
When functioning as a target, PCI bridge 20 supports three base address registers, BAR0, BAR1 and BAR2. Each base address register is memory mapped. BAR2 provides both normal target support and snooping support: in snoop target mode, BAR2 does not respond to PCI bus transactions but only snoops on the PCI bus. To support this added functionality, the pinout of PCI bridge core 62 is modified as follows. First, PCI bridge core 62 features three additional chip select output pins (H_CS[0]#, H_CS[1]# and H_CS[2]#), each of which is asserted when the address range of the corresponding base address register is the target of a PCI transaction. One of these chip select signals is asserted at the same time as H_ADSM#, the standard bridge signal indicating a target read/write access. Next, PCI bridge core 62 features H_SNP, an input signal that determines the snoop mode of BAR2: when H_SNP is high, BAR2 functions in snoop mode, and when H_SNP is low, BAR2 functions in normal target mode. Finally, H_WREN#, the write enable output signal, indicates to the memory card that write data is available on the H_ADOUT bus. As previously mentioned, priority is preferably given to snooped data. When H_SNP is low, BAR2 acts as a regular target address. Thus, the same address can be loaded into both the active memory card bridge and the standby memory card bridge, and H_SNP can be used to control the behavior of the bridge on each card.
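Loading the same BAR2 value into both bridges means that a role change only requires flipping H_SNP. The Python sketch below models this under stated assumptions: the functions `h_snp_levels` and `bar2_mode` are hypothetical, `True` stands for a high H_SNP level, and the bridge considered on each card is the one attached to the active CPU's bus.

```python
from typing import Dict

CARDS = (14, 16)


def h_snp_levels(active_card: int) -> Dict[int, bool]:
    """H_SNP input level (True = high) for the bridge on each memory card.

    Assumption: the bridge considered is the one on the active CPU's bus, and
    both bridges hold the same BAR2 value.
    """
    if active_card not in CARDS:
        raise ValueError("active_card must be 14 or 16")
    return {card: card != active_card for card in CARDS}


def bar2_mode(h_snp: bool) -> str:
    """BAR2 snoops when H_SNP is high and acts as a normal target when low."""
    return "snoop" if h_snp else "target"


for active in CARDS:  # normal operation, then after a switchover
    levels = h_snp_levels(active)
    print(active, {card: bar2_mode(level) for card, level in levels.items()})
```

No base address register has to be reprogrammed during a switchover in this model; only the H_SNP levels change.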
As noted previously, PCI bridge 20 has three base address registers, BAR0, BAR1 and BAR2. When accessed as a target, PCI bridge 20 compares the incoming address with all three base address registers. When one of them matches the incoming address, PCI bridge core 62 asserts H_ADSM#, and also asserts one of the H_CS# signals to indicate which base address register matched: H_CS[0]# for BAR0, H_CS[1]# for BAR1 and H_CS[2]# for BAR2. As previously mentioned, BAR2 functions in snoop mode when the H_SNP input is high. When the incoming address matches BAR2 in snoop mode, PCI bridge 20 does not assert any signal on the PCI bus, but only listens to it. If the access is a read access, PCI bridge 20 ignores it. If the access is a write access, PCI bridge 20 copies the write data to first FIFO memory 38, from which the data is copied to the standby memory card.
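The comparison and signalling rules above can be captured as a small decoding function. This is a behavioral Python model, not the bridge core: `Outputs`, `decode` and the example BAR values are hypothetical, active-low pins are represented by `True` when asserted, and the choice to assert nothing locally for an ignored snoop read is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Outputs:
    h_adsm: bool         # True = H_ADSM# asserted (the pin itself is active-low)
    h_cs: Optional[int]  # n such that H_CS[n]# is asserted, else None
    claims_pci: bool     # bridge responds on the PCI bus as a target
    snoop_copy: bool     # write data copied into FIFO memory 38 for the standby card


IDLE = Outputs(False, None, False, False)


def decode(addr: int, is_write: bool,
           bars: Sequence[Tuple[int, int]], h_snp: bool) -> Outputs:
    """Compare addr with (base, size) pairs for BAR0..BAR2 and derive the outputs."""
    for n, (base, size) in enumerate(bars):
        if base <= addr < base + size:
            if n == 2 and h_snp:
                # Snoop mode: never drive the PCI bus; ignore reads, copy writes.
                # Assumption: nothing is asserted locally for an ignored read.
                return Outputs(True, 2, False, True) if is_write else IDLE
            return Outputs(True, n, True, False)
    return IDLE


MB = 1 << 20
BARS = ((0x1000, 0x100), (0x2000, 0x100), (14 * MB, 4 * MB))  # hypothetical values
print(decode(15 * MB, True, BARS, h_snp=True))
```

The key asymmetry is `claims_pci`: a normal target match claims the PCI transaction, while a snoop match never does, so the active card remains the only responder on the bus.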
PCI bridge core 62 asserts H_ADSM# together with H_CS[2]# and H_WREN# to request permission from control arbiter 36 to write the snooped data from FIFO memory 38 to memory card 32 without delay. Control arbiter 36 must grant bridge 20 prompt access for this data, and must allow the data from FIFO memory 38 to be written whenever H_WREN# is asserted, because bridge 20 cannot insert a wait state on local bus 34 during snoop mode. Multiple H_WREN# signals may be asserted if the write access is a burst write transaction. Thus, in snoop mode, bridge 20 writes data to FIFO memory 38, from which it is written to the standby memory card, thereby keeping the standby CPU on "hot standby".
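The relationship between a snooped burst and the resulting H_WREN# strobes can be sketched in Python. The model assumes 4-byte data phases on a 32-bit PCI bus and counts one strobe per FIFO entry written to the card; `snoop_burst` and `drain` are hypothetical names.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Tuple


def snoop_burst(fifo: Deque[Tuple[int, int]], start_addr: int,
                words: Iterable[int], word_size: int = 4) -> None:
    """Latch one FIFO entry per data phase of a snooped burst write."""
    for i, word in enumerate(words):
        fifo.append((start_addr + i * word_size, word))


def drain(fifo: Deque[Tuple[int, int]], memory: Dict[int, int]) -> int:
    """Write every FIFO entry to the card; each write models one H_WREN# strobe."""
    strobes = 0
    while fifo:
        addr, data = fifo.popleft()
        memory[addr] = data
        strobes += 1
    return strobes


fifo: Deque[Tuple[int, int]] = deque()
standby_memory: Dict[int, int] = {}
snoop_burst(fifo, 0x100, [0xA, 0xB, 0xC])   # a 3-word burst on the PCI bus
print(drain(fifo, standby_memory), standby_memory)
```

Since the bridge cannot insert wait states, the drain loop has no way to pause; in hardware this is why the arbiter must accept every strobe as it arrives.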
It should be noted that the pinout diagram for the PCI bridge core, and the description of the signals for snoop mode, are intended as examples only, and that variations are possible which could easily be determined by one of ordinary skill in the art. The important functions of the PCI bridge core are to enable the bridge to snoop on transactions on the PCI bus, to determine if the transaction is a write access and if the write access is to a memory address which lies within a predefined range of addresses, and then to duplicate the write command to the address at the standby memory card.
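The essential functions listed above can be combined into one end-to-end toy model: every write from the active CPU lands on the active side, writes inside the predefined range are duplicated on the standby side, and reads are never duplicated. The class `HotStandbyModel` and the example addresses are hypothetical; the duplicated range is taken from the working space of Figure 3 only as an example.

```python
from typing import Dict

MB = 1 << 20
DUPLICATED = (14 * MB, 18 * MB)  # example range (working space 54 of Figure 3)


class HotStandbyModel:
    """Toy model: the active CPU writes; the standby bridge mirrors in-range writes."""

    def __init__(self) -> None:
        self.active: Dict[int, int] = {}
        self.standby: Dict[int, int] = {}

    def cpu_write(self, addr: int, data: int) -> None:
        self.active[addr] = data            # the active side always accepts the write
        lo, hi = DUPLICATED
        if lo <= addr < hi:                 # snooped write inside the predefined range
            self.standby[addr] = data       # duplicated on the standby card

    def cpu_read(self, addr: int) -> int:
        return self.active[addr]            # reads are never duplicated

    def standby_is_current(self) -> bool:
        """True if the standby copy equals the active data in the duplicated range."""
        lo, hi = DUPLICATED
        in_range = {a: d for a, d in self.active.items() if lo <= a < hi}
        return in_range == self.standby


m = HotStandbyModel()
m.cpu_write(15 * MB, 1)   # e.g. a switching-matrix setting: mirrored
m.cpu_write(30 * MB, 2)   # scratch data: not mirrored
m.cpu_write(15 * MB, 3)   # overwrite: the mirror follows
m.cpu_read(30 * MB)
print(m.standby, m.standby_is_current())
```

The invariant checked by `standby_is_current` is what allows the standby CPU to take over at any moment without losing call state held in the duplicated range.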
While the invention has been described with respect to a limited number of embodiments, it will be appreciated that many variations, modifications and other applications of the invention may be made.