WO2014029755A1 - A method for exchanging a set of replicas - Google Patents

Info

Publication number
WO2014029755A1
Authority
WO
WIPO (PCT)
Prior art keywords
replicas
initial
additional
software
state
Prior art date
Application number
PCT/EP2013/067279
Other languages
French (fr)
Inventor
Patrick FROESE
Stuart Goose
Jonathan Kirsch
Nico STRAUB
Original Assignee
Siemens Aktiengesellschaft
Priority date
Filing date
Publication date
Application filed by Siemens Aktiengesellschaft
Publication of WO2014029755A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/60 Software deployment
    • G06F8/65 Updates
    • G06F8/656 Updates while running
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1415 Saving, restoring, recovering or retrying at system level
    • G06F11/1433 Saving, restoring, recovering or retrying at system level during software upgrading
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1479 Generic software techniques for error detection or fault masking
    • G06F11/1492 Generic software techniques for error detection or fault masking by run-time replication performed by the application software
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/1658 Data re-synchronization of a redundant component, or initial sync of replacement, additional or spare unit
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2041 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant with more than one idle spare processing component
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/82 Solving problems relating to consistency

Definitions

  • The invention relates to a method for exchanging a set of replicas in a state machine replication system such that the replicated service remains available and resilient during the update.
  • a state machine replication is an established technique for improving the availability of software applications in a distributed system.
  • one or several services can be provided by servers.
  • Each service can be provided by one or more servers when the clients invoke this service by making corresponding requests for the
  • replicas of a single server are executed on separate processors of the distributed information system and protocols can be used to coordinate client interactions with these replicas.
  • a state machine approach implements fault-tolerant services by replicating the server software and coordinating client interactions with the server replicas.
  • a state machine consists of state variables which encode its state and commands which transform its state.
  • Each command can be implemented by a deterministic program wherein the execution of a command can be atomic with respect to the other commands and modifies state variables which can produce data output.
  • the client of the state machine can make a request to execute the respective command.
  • a request can indicate to a state machine a command to be performed.
  • a request can contain any relevant information to be used by the respective command.
  • in a state machine replication system, it is possible to run several copies or replicas of a software, wherein the replicas are logically synchronized with one another via a replication protocol.
  • a replication protocol orders all events in the system that might cause the software to change its current state.
  • the replicas execute the events according to an agreed order.
  • a component in a system is considered as faulty once its behaviour is no longer consistent with its specification.
  • a component suffers a so-called Byzantine failure when it exhibits arbitrary and malicious behaviour which possibly involves collusion with other faulty components. If the component has a so-called fail-stop failure, the component changes to a state that permits other components to detect that a failure has occurred and then stops.
  • the system guarantees safety in all executions in which replicas fail only by crashing, and it guarantees liveness during executions in which a majority of said replicas can communicate with one another in a timely manner.
  • the system guarantees safety in all executions in which no more than a predetermined number f out of 3f+1 replicas are Byzantine, and the system further guarantees liveness in executions in which at least 2f+1 correct, i.e. non-Byzantine, replicas can communicate with one another over links whose delay is eventually bounded.
  • In a benign fault-tolerant replication system, the clients of the system can act on a reply when it is received from one or more replicas. In contrast, in a Byzantine fault-tolerant replication system, a client must wait until it collects at least f+1 identical responses before acting on them, since at most f replicas are faulty. This implies that the content was sent by at least one correct replica.
  • production software often needs to be updated after being initially deployed. For instance, it may be necessary to apply security patches to address discovered vulnerabilities within the system. Another example is that it may become necessary to apply software patches that implement an
  • in a Primary/Hot Standby approach to fault tolerance, two similar but slightly different copies or replicas of a software, i.e. the Primary and the Hot Standby (HSB), run in parallel.
  • the Primary controls the system and interacts with the clients, whereas the Hot Standby, HSB, receives the same inputs as the Primary and executes them but its output is typically suppressed.
  • the HSB monitors the Primary and performs a takeover operation, if it determines that the Primary has failed.
  • This Primary/HSB approach provides a conventional mechanism for performing updates without suffering downtime. To perform an update one proceeds as follows. First, the HSB is taken offline and all software patches are applied. In a second step, the HSB is brought back online and its state is
  • the Primary is taken offline and all patches are applied.
  • the takeover mechanism causes the HSB to immediately assume responsibility and begin acting as the Primary.
  • Performing a software update without suffering downtime is more complicated in a state machine replication system because of the dual requirements that the replicas must remain consistent with one another and at the same time the system must remain available and performing.
  • a conventional simple approach can be to take all replicas offline, apply the patches, and then bring them back online. This keeps the replicas consistent with one another, however, the service is unavailable while the replicas are down, since a threshold number of replicas must be running in order to ensure
  • a method for exchanging a set of initial replicas comprising the steps of:
  • step of phasing-in comprises
  • the combined or substituted set of replicas is defined as a new set of initial replicas for following exchanges of replicas within the system.
  • the combined or substituted set of replicas is defined as the new set of initial replicas for following exchanges of replicas within the system in that the combined or substituted set of replicas adopts
  • identifiers such as addresses of the previous set of initial replicas.
  • the triggering event comprises a request for a state transfer from at least one initial replica or from at least one additional replica.
  • the triggering event comprises a request for a state transfer from a coordinating entity of the state machine replication system.
  • a number of state changes of said initial replicas caused by events is counted when the transfer of the current state of at least one initial replica has started.
  • the transfer of the current state of at least one initial replica is repeated, if the counted number of state changes exceeds a predetermined threshold value.
  • the replicas communicate with each other by means of messages of a state machine replication protocol.
  • a set of updated replicas running the updated application software is instantiated as a set of additional replicas and an update message is sent by the set of updated replicas to the set of initial replicas as a triggering event indicating that a software update is requested.
  • the set of initial replicas in response to the update message received from the updated replicas, the set of initial replicas generates a snapshot of their current state which is transferred by the initial replicas to the updated replicas to synchronize both groups of replicas.
  • individual replicas or a subset of replicas of the set of initial replicas are replaced by replicas of the combined set of replicas.
  • the number of replicas of the combined set of replicas deviates from the number of replicas in the initial set of replicas to adjust a level of redundancy in the respective system.
  • the instantiated set of additional replicas runs on the same or a different software application or software version as the set of initial replicas.
  • the initial replicas and/or the additional replicas run on real or virtual machines of a real-time system.
  • a state machine replication system is provided.
  • the state machine replication system comprises a plurality of real or virtual machines each being adapted to run one or several replicas of a software.
  • Fig. 1 shows a flowchart of an exemplary embodiment of a method for exchanging a set of initial replicas according to the present invention
  • Fig. 2 shows a diagram for illustrating the operation of a method for exchanging a set of initial replicas according to the present invention.
  • the method for exchanging a set of initial replicas according to the first aspect of the present invention comprises in an exemplary embodiment three main steps.
  • a set of additional replicas is
  • Initial replicas and additional replicas can run on real or virtual machines of a system, e.g. a real-time system.
  • a real-time system is a power grid comprising a plurality of components.
  • Another exemplary realtime system is a traffic control system.
  • a real-time system can be performed by a control system having one or several servers which control programmable objects running on control entities. In such a real-time system, it is essential that the tasks performed by the components of the real-time system are executed correctly and fault-free with sufficient
  • the current state of at least one initial replica is transferred to at least one additional replica in response to a triggering event in step S2.
  • This triggering event can comprise in a possible embodiment a request for a state transfer. In a possible embodiment, this request is launched by at least one initial replica. In another embodiment, the request for state transfer can be also launched by one of the generated additional replicas. In a still further possible embodiment, the triggering event which triggers the transfer of the current state of at least one initial replica to at least one additional replica can be provided by a
  • the set of additional replicas being now in the transferred state is phased-in by adding the set or a subset of additional replicas to the set or a subset of the initial replicas to form a combined set of replicas and/or by substituting the initial set of replicas or a subset of the initial set of replicas by the set or a subset of the
  • the number of the new set of replicas can deviate from the number of the previous set of initial replicas.
  • individual replicas or a subset of replicas within the set of initial replicas are replaced. This can be done, for instance, if one or a subset of individual replicas in the initial set of replicas has been compromised by an attack on the system and must be replaced.
  • a level of redundancy in the system can be adjusted.
  • the number of replicas of the combined set of replicas deviates from the number of replicas in the initial set of replicas. If the number of replicas in the combined set of replicas is higher than the number of replicas in the initial set of replicas, the level of redundancy in the system is increased so that the safety and security of the system is also enhanced. On the other hand, if the number of replicas of the combined set is lower than the number of replicas in the initial set of replicas, the level of redundancy in the system is diminished.
  • a software running on a set of initial replicas is updated.
  • a set of updated replicas running the updated software is instantiated as a set of additional replicas.
  • an update message is sent by the set of updated replicas to the set of initial replicas as a triggering event indicating that a software update is requested.
  • the set of initial replicas generates a snapshot of their current state which is transferred by the initial replicas to the updated replicas to synchronize both groups of replicas with each other.
  • the initial set of replicas can be substituted by a set of additional replicas.
  • a combined and/or substituted set of replicas is defined as a new set of initial replicas for future exchanges of replicas within the system.
  • the combined and/or substituted set of replicas can be defined in a possible embodiment as the new set of initial replicas for following exchanges of replicas in that the combined and/or substituted set of replicas adopts
  • identifiers of the previous set of initial replicas can be, for example, addresses used by the previous set of initial replicas. Accordingly, an update of a software of a logical Primary group of replicas which can be implemented by N replicas may be performed in a possible embodiment by the following steps. First, the application software is updated by applying all desired patches. Then, a set of N new replicas is
  • HSG Hot Standby group
  • the logical HSB does perform a takeover operation for the logical Primary.
  • the logical HSB first synchronizes its state with that of the logical Primary. Therefore, as a first action, the HSB replicas can send a message to the primary replicas informing them that a software update is pending and that a state synchronization shall begin.
  • the Primary or initial replicas take a snapshot of their respective state. Since the Primary replicas order incoming messages using the state machine replication protocol, all correct Primary replicas execute the HSB message at the same logical point in their execution. Thus, all correct, i.e. fault-free Primary initial replicas produce identical snapshots of their current state.
  • the Primary replicas transfer their state to the HSB replicas, i.e. the additional replicas. This can be performed in different variations depending on the system requirements and fault tolerances.
  • the system can be designed to tolerate benign but not Byzantine faults.
  • a single Primary replica can take responsibility for
  • the HSB additional replicas generate a message indicating that the initial synchronization is complete. Since the Primary replicas may be executing new events while the synchronization is going on, the HSB
  • replicas may still be missing some events that were recently executed.
  • the logical Primary can compute a number of events that still need to be transferred. If this number is greater than a predetermined threshold, then the
  • the Primary initial replicas will eventually determine that the HSB additional replicas are close enough. At this point, no new events are executed at the Primary replicas and the last remaining events are passed to the HSB replicas. Finally, the Primary replicas terminate after getting an acknowledgement message from the HSB replicas.
  • the HSB replicas cannot trust that a state message from a Primary replica is
  • (f+1) Primary replicas can send the messages to (f+1) HSB replicas, or an erasure coding scheme can be used.
  • HSB replicas can send a message to the Primary replicas upon executing the last state message. Since the Primary replicas may be executing new events while the synchronization is still going on one proceeds in the Byzantine fault-tolerant system in the same manner as in a benign fault-tolerant system.
  • each HSB replica can take over an address, in particular a network address, used by its corresponding Primary replica.
  • an HSB replica A can adopt the IP address of a Primary replica A
  • an HSB replica B can adopt the IP address of a Primary replica B, etc. This can be accomplished by sending gratuitous ARP messages to re-establish MAC addresses used by the HSB replicas with appropriate IP addresses.
  • Fig. 2 shows a diagram for illustrating a method for
  • Fig. 2 there is an existing set of n replicas, so-called local Primaries LP.
  • the local Primaries have different software versions and have different addresses.
  • a set of additional replicas is
  • These logical Hot Standby replicas have, in the shown example, different software versions C, D and also have different addresses.
  • the software versions C, D illustrated in Fig. 2 can be different release states or randomly compiled singular versions.
  • step S2a an update trigger between the logical Hot Standby replicas and the local Primary replicas is recognized.
  • This triggering event can be a request for a state transfer from at least one initial or additional replica. Further, this triggering event can be formed by a request for a state transfer from a coordinating entity of the state machine replication system.
  • a current state of at least one initial replica is transferred in step S2b to at least one additional replica as illustrated in Fig. 2.
  • Fig. 2 illustrates three use cases UC, where the method according to the present invention can be implemented.
  • the initial set of replicas i.e. the local Primaries
  • the additional set of replicas is substituted by the additional
  • replicas In this use case one can, for example, perform a software update.
  • a second use case UC2 selected replicas are replaced.
  • individual replicas or a subset of replicas of the set of initial replicas are replaced by replicas of the combined set of replicas. This can be performed, for
  • the number of replicas can be adjusted to increase or diminish a level of redundancy in the system.
  • the method for exchanging a set of initial replicas can be used for a wide range of use cases in a system.
  • a state machine replication system is provided, wherein a set of initial replicas or local Primaries run by the system are exchanged by other replicas by using the method according to the first aspect of the present
  • This state machine replication system can comprise a plurality of real or virtual machines or nodes each being adapted to run one or several replicas of a software.
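
The items above state that the Primary replicas order the update message from the HSB replicas through the replication protocol, so that every correct Primary replica takes its snapshot at the same logical point. The following is a minimal sketch of that idea only. The integer state, the marker string `UPDATE_REQUEST` and the function `run_primary` are hypothetical names chosen for illustration and do not come from the patent.

```python
UPDATE_REQUEST = "UPDATE_REQUEST"  # hypothetical marker for the ordered update message


def run_primary(ordered_log):
    """Execute an agreed event order; snapshot when the update message is executed."""
    state = 0
    snapshot = None
    for event in ordered_log:
        if event == UPDATE_REQUEST:
            # Every correct replica reaches this point after the same prefix of events.
            snapshot = state
        else:
            state += event  # toy deterministic state change
    return snapshot, state


# The same ordered log is delivered to three Primary replicas.
log = [3, 4, UPDATE_REQUEST, 5]
results = [run_primary(log) for _ in range(3)]
```

Because the update message occupies a fixed position in the agreed order, all three simulated replicas record the same snapshot even though they keep executing later events.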

Abstract

A method for exchanging a set of initial replicas comprising the steps of instantiating a set of additional replicas; transferring the current state of at least one initial replica to at least one additional replica in response to a triggering event; and phasing-in the set of additional replicas being in the transferred state, wherein the step of phasing-in comprises adding the set or a subset of additional replicas to the set or a subset of the initial replicas to form a combined set of replicas and/or substituting the initial set of replicas or a subset of the initial set of replicas by the set or a subset of additional replicas.

Description

A method for exchanging a set of replicas
The invention relates to a method for exchanging a set of replicas in a state machine replication system such that the replicated service remains available and resilient during the update.
State machine replication is an established technique for improving the availability of software applications in a distributed system. In a distributed IT system, one or several services can be provided by servers. Each service can be provided by one or more servers, and clients invoke the service by making corresponding requests for it. Although using a single, centralized server is the simplest way to implement a service, the resulting service can only be as fault-tolerant as the processor executing the server. If this level of fault tolerance is unacceptable, multiple servers that fail independently are used.
Accordingly, replicas of a single server are executed on separate processors of the distributed information system, and protocols can be used to coordinate client interactions with these replicas.
A state machine approach implements fault-tolerant services by replicating the server software and coordinating client interactions with the server replicas.
Generally, a state machine consists of state variables, which encode its state, and commands, which transform its state. Each command can be implemented by a deterministic program, wherein the execution of a command can be atomic with respect to the other commands and modifies state variables and can produce data output. The client of the state machine can make a request to execute the respective command. A request can indicate to the state machine a command to be performed. Moreover, a request can contain any relevant information to be used by the respective command.
In a state machine replication system, it is possible to run several copies or replicas of a software, wherein the replicas are logically synchronized with one another via a replication protocol. Such a replication protocol orders all events in the system that might cause the software to change its current state. The replicas execute the events according to an agreed order.
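
As an illustration of the two preceding paragraphs, the following minimal sketch models a deterministic state machine and applies one agreed order of events to several replicas. The class `CounterMachine`, its `add` and `mul` commands, and the function `run_replicas` are hypothetical and are not taken from the patent; a real replication protocol would establish the order over a network rather than in a local loop.

```python
from dataclasses import dataclass


@dataclass
class CounterMachine:
    """Toy deterministic state machine with a single state variable."""
    value: int = 0

    def apply(self, command: str, arg: int) -> int:
        # Each command is deterministic and runs to completion before the next one.
        if command == "add":
            self.value += arg
        elif command == "mul":
            self.value *= arg
        else:
            raise ValueError(f"unknown command: {command}")
        return self.value  # data output produced by the command


def run_replicas(ordered_events, replica_count=3):
    """Apply the same agreed order of events to every replica."""
    replicas = [CounterMachine() for _ in range(replica_count)]
    for command, arg in ordered_events:
        for replica in replicas:
            replica.apply(command, arg)
    return [replica.value for replica in replicas]


agreed_order = [("add", 2), ("mul", 5), ("add", 1)]
states = run_replicas(agreed_order)
```

Since every replica starts in the same state and executes the same deterministic commands in the same order, all entries of `states` are equal.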
A component in a system is considered faulty once its behaviour is no longer consistent with its specification. A component suffers a so-called Byzantine failure when it exhibits arbitrary and malicious behaviour which possibly involves collusion with other faulty components. If the component has a so-called fail-stop failure, the component changes to a state that permits other components to detect that a failure has occurred and then stops. Naturally, Byzantine failures of components in the system can be most disruptive for the respective system.
Different state machine replication protocols are designed to ensure safety, i.e. replica consistency, and liveness, i.e. eventual progress, under different fault and synchronization assumptions. For example, in a benign fault-tolerant replication system, the system guarantees safety in all executions in which replicas fail only by crashing, and it guarantees liveness during executions in which a majority of said replicas can communicate with one another in a timely manner.
In a Byzantine fault-tolerant replication system, the system guarantees safety in all executions in which no more than a predetermined number f out of 3f+1 replicas are Byzantine, and the system further guarantees liveness in executions in which at least 2f+1 correct, i.e. non-Byzantine, replicas can communicate with one another over links whose delay is eventually bounded.
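
The replica counts named above can be worked out for a given f with a few lines of arithmetic. The sketch below is only a convenience for checking the numbers; the function names `bft_sizes` and `benign_majority` are hypothetical.

```python
def bft_sizes(f: int) -> dict:
    """Replica counts for tolerating f Byzantine replicas, per the 3f+1 / 2f+1 rule."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return {
        "total_replicas": 3 * f + 1,      # system size
        "max_byzantine": f,               # faults under which safety holds
        "needed_for_liveness": 2 * f + 1, # correct replicas that must communicate
    }


def benign_majority(n: int) -> int:
    """Smallest majority of n replicas, as needed for liveness in the benign case."""
    if n < 1:
        raise ValueError("n must be positive")
    return n // 2 + 1


sizes_f1 = bft_sizes(1)
sizes_f2 = bft_sizes(2)
```

For example, tolerating one Byzantine replica requires four replicas in total, three of which must be correct and able to communicate for the system to make progress.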
In a benign fault-tolerant replication system, the clients of the system can act on a reply when it is received from one or more replicas. In contrast, in a Byzantine fault-tolerant replication system, a client must wait until it collects at least f+1 identical responses before acting on them, since at most f replicas are faulty. This implies that the content was sent by at least one correct replica.
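
The client-side rule described above can be sketched as a small vote over incoming replies. This is a minimal sketch under the assumption that replies are hashable values and arrive one at a time; the function name `accept_reply` is hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable, Optional


def accept_reply(replies: Iterable[Hashable], f: int) -> Optional[Hashable]:
    """Return the first reply value seen at least f+1 times, or None if there is none yet."""
    counts = Counter()
    for reply in replies:
        counts[reply] += 1
        # With at most f faulty replicas, f+1 identical replies include a correct one.
        if counts[reply] >= f + 1:
            return reply
    return None


byzantine_case = accept_reply(["ok", "bad", "ok"], f=1)
still_waiting = accept_reply(["ok", "bad"], f=1)
benign_case = accept_reply(["ok"], f=0)
```

Setting `f=0` reproduces the benign case, in which the client can act on the first reply it receives.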
In many applications, it is necessary to exchange a set of replicas by another set of replicas. For example, production software often needs to be updated after being initially deployed. For instance, it may be necessary to apply security patches to address discovered vulnerabilities within the system. Another example is that it may become necessary to apply software patches that implement an additional desired functionality.
In a Primary/Hot Standby approach to fault tolerance, two similar but slightly different copies or replicas of a software, i.e. the Primary and the Hot Standby (HSB), run in parallel. The Primary controls the system and interacts with the clients, whereas the HSB receives the same inputs as the Primary and executes them, but its output is typically suppressed. The HSB monitors the Primary and performs a takeover operation if it determines that the Primary has failed.
This Primary/HSB approach provides a conventional mechanism for performing updates without suffering downtime. To perform an update, one proceeds as follows. First, the HSB is taken offline and all software patches are applied. In a second step, the HSB is brought back online and its state is synchronized with the Primary. When the synchronization is complete, the Primary is taken offline and all patches are applied. The takeover mechanism causes the HSB to immediately assume responsibility and begin acting as the Primary.
Performing a software update without suffering downtime is more complicated in a state machine replication system because of the dual requirements that the replicas must remain consistent with one another and, at the same time, the system must remain available and performing. A conventional simple approach can be to take all replicas offline, apply the patches, and then bring them back online. This keeps the replicas consistent with one another; however, the service is unavailable while the replicas are down, since a threshold number of replicas must be running in order to ensure liveness of the system. Another conventional approach is to take one replica offline, update it and then bring it back online, and then to repeat this process for every other replica until all replicas are updated. While this approach ensures that a sufficient number of replicas are always running, updated replicas may not take the same actions in response to inputs as those replicas that have not yet been updated, and thus it is possible for the state of the replicas to diverge.
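
The divergence risk of the one-replica-at-a-time approach can be illustrated with a toy example. The two transition functions `apply_v1` and `apply_v2` below are hypothetical stand-ins for the old and the patched software; the only assumption is that the patch changes how an input affects the state.

```python
def apply_v1(state: int, x: int) -> int:
    """Behaviour of the software before the update."""
    return state + x


def apply_v2(state: int, x: int) -> int:
    """Hypothetical behaviour after the update: the same input has a different effect."""
    return state + 2 * x


def rolling_update_states(inputs, update_after):
    """Replica 0 switches to v2 after `update_after` inputs; replica 1 stays on v1."""
    updated, old = 0, 0
    for i, x in enumerate(inputs):
        if i >= update_after:
            updated = apply_v2(updated, x)
        else:
            updated = apply_v1(updated, x)
        old = apply_v1(old, x)
    return updated, old


diverged = rolling_update_states([1, 2, 3], update_after=1)
consistent = rolling_update_states([1, 2, 3], update_after=3)
```

As long as both replicas run the same version they stay consistent; once one of them is updated while inputs keep arriving, their states no longer match.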
Accordingly, it is an object of the present invention to provide a method for exchanging a set of replicas which overcomes the above-mentioned drawbacks and which preserves replica consistency while avoiding downtime in the respective system.
This object is achieved by a method for exchanging a set of initial replicas comprising the steps of claim 1.
According to a first aspect of the present invention, a method for exchanging a set of initial replicas is provided, wherein the method comprises the steps of:
instantiating a set of additional replicas,
transferring the current state of at least one initial replica to at least one additional replica in response to a triggering event, and
phasing-in the set of additional replicas being in the transferred state,
wherein the step of phasing-in comprises adding the set or a subset of additional replicas to the set or a subset of the initial replicas to form a combined set of replicas and/or substituting the initial set of replicas or a subset of the initial set of replicas by the set or a subset of additional replicas.
In a possible embodiment of the method according to the first aspect of the present invention, the combined or substituted set of replicas is defined as a new set of initial replicas for following exchanges of replicas within the system.
In a possible embodiment of the method according to the first aspect of the present invention, the combined or substituted set of replicas is defined as the new set of initial replicas for following exchanges of replicas within the system in that the combined or substituted set of replicas adopts identifiers, such as addresses, of the previous set of initial replicas.
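
The three claimed steps can be sketched in a few lines. This is a minimal in-memory illustration, not the patented implementation: the `Replica` class, the dictionary state, the `mode` parameter and all function names are hypothetical, and the state is copied locally instead of being sent through a replication protocol.

```python
import copy
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    name: str
    version: str
    state: Optional[dict] = None  # None until a state has been transferred


def instantiate(names: List[str], version: str) -> List[Replica]:
    """Step 1: instantiate a set of additional replicas."""
    return [Replica(name, version) for name in names]


def transfer_state(initial: List[Replica], additional: List[Replica]) -> None:
    """Step 2: transfer the current state of one initial replica on a triggering event."""
    snapshot = copy.deepcopy(initial[0].state)
    for replica in additional:
        replica.state = copy.deepcopy(snapshot)


def phase_in(initial: List[Replica], additional: List[Replica], mode: str) -> List[Replica]:
    """Step 3: add the synchronized replicas to, or substitute them for, the initial set."""
    ready = [r for r in additional if r.state is not None]
    if mode == "add":
        return initial + ready
    if mode == "substitute":
        return ready
    raise ValueError(f"unknown mode: {mode}")


initial_set = [Replica("A", "v1", {"x": 1}), Replica("B", "v1", {"x": 1})]
additional_set = instantiate(["C", "D"], "v2")
transfer_state(initial_set, additional_set)
substituted = phase_in(initial_set, additional_set, "substitute")
combined = phase_in(initial_set, additional_set, "add")
```

The list returned by `phase_in` plays the role of the new set of initial replicas for a following exchange.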
In a still further possible embodiment of the method according to the first aspect of the present invention, the triggering event comprises a request for a state transfer from at least one initial replica or from at least one additional replica.
In a still further possible embodiment of the method according to the first aspect of the present invention, the triggering event comprises a request for a state transfer from a coordinating entity of the state machine replication system.
In a still further possible embodiment of the method according to the first aspect of the present invention, a number of state changes of said initial replicas caused by events is counted when the transfer of the current state of at least one initial replica has started.
In a still further possible embodiment of the method according to the first aspect of the present invention, the transfer of the current state of at least one initial replica is repeated if the counted number of state changes exceeds a predetermined threshold value.
In a still further possible embodiment of the method according to the first aspect of the present invention, the replicas communicate with each other by means of messages of a state machine replication protocol.
In a further possible embodiment of the method according to the first aspect of the present invention, for updating an application software running on a set of initial replicas after updating a source code of the software, a set of updated replicas running the updated application software is instantiated as a set of additional replicas, and an update message is sent by the set of updated replicas to the set of initial replicas as a triggering event indicating that a software update is requested.
In a further possible embodiment of the method according to the first aspect of the present invention, in response to the update message received from the updated replicas, the set of initial replicas generates a snapshot of their current state which is transferred by the initial replicas to the updated replicas to synchronize both groups of replicas.
In a still further possible embodiment of the method according to the first aspect of the present invention, individual replicas or a subset of replicas of the set of initial replicas are replaced by replicas of the combined set of replicas.
In a still further possible embodiment of the method according to the first aspect of the present invention, the number of replicas of the combined set of replicas deviates from the number of replicas in the initial set of replicas to adjust a level of redundancy in the respective system.
In a further possible embodiment of the method according to the first aspect of the present invention, the instantiated set of additional replicas runs on the same or a different software application or software version as the set of initial replicas.
In a still further possible embodiment of the method according to the first aspect of the present invention, the initial replicas and/or the additional replicas run on real or virtual machines of a real-time system.
According to a second aspect of the present invention, a state machine replication system is provided, wherein a set of initial replicas of a software run by said system is exchanged by other replicas using the method according to the first aspect of the present invention.
In a possible embodiment of the state machine replication system according to the second aspect, the state machine replication system comprises a plurality of real or virtual machines each being adapted to run one or several replicas of a software.
In the following, possible embodiments of different aspects of the present invention are described in more detail with reference to the enclosed figures.
Fig. 1 shows a flowchart of an exemplary embodiment of a method for exchanging a set of initial replicas according to the present invention; and
Fig. 2 shows a diagram for illustrating the operation of a method for exchanging a set of initial replicas according to the present invention.
As can be seen in Fig. 1, the method for exchanging a set of initial replicas according to the first aspect of the present invention comprises in an exemplary embodiment three main steps.
In a first step S1, a set of additional replicas is instantiated. Accordingly, in this step additional replicas are created or generated. The instantiated set of additional replicas can run on the same or a different software application or software version as the set of initial replicas. Initial replicas and additional replicas can run on real or virtual machines of a system, e.g. a real-time system. An example of a real-time system is a power grid comprising a plurality of components. Another exemplary real-time system is a traffic control system. A real-time system can be implemented by a control system having one or several servers which control programmable objects running on control entities. In such a real-time system, it is essential that the tasks performed by the components of the real-time system are executed correctly and fault-free with sufficient performance. Moreover, such a real-time system must be resilient to component failures and/or manipulations or deliberate attacks on the real-time system.
After having instantiated the set of additional replicas in step S1, the current state of at least one initial replica is transferred to at least one additional replica in response to a triggering event in step S2. This triggering event can comprise in a possible embodiment a request for a state transfer. In a possible embodiment, this request is launched by at least one initial replica. In another embodiment, the request for state transfer can also be launched by one of the generated additional replicas. In a still further possible embodiment, the triggering event which triggers the transfer of the current state of at least one initial replica to at least one additional replica can be provided by a coordinating entity of the state machine replication system.
In a third step S3, the set of additional replicas, being now in the transferred state, is phased in by adding the set or a subset of additional replicas to the set or a subset of the initial replicas to form a combined set of replicas and/or by substituting the initial set of replicas or a subset of the initial set of replicas by the set or a subset of the additional replicas.
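The three steps S1 to S3 described above can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: the `Replica` class, the function names and the dictionary-based state are hypothetical and do not appear in the application, and a real system would exchange state by means of protocol messages rather than by in-process copies.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Replica:
    # Hypothetical replica model: a name, a software version and a state.
    name: str
    version: str
    state: Dict[str, int] = field(default_factory=dict)


def instantiate(names: List[str], version: str) -> List[Replica]:
    # Step S1: create the additional replicas (initially without state).
    return [Replica(name=n, version=version) for n in names]


def transfer_state(source: Replica, targets: List[Replica]) -> None:
    # Step S2: copy the current state of one initial replica
    # to each additional replica.
    for target in targets:
        target.state = dict(source.state)


def phase_in(initial: List[Replica], additional: List[Replica],
             substitute: bool) -> List[Replica]:
    # Step S3: either substitute the initial set by the additional set,
    # or add the additional replicas to form a combined set.
    if substitute:
        return list(additional)
    return list(initial) + list(additional)


initial = [Replica("P1", "v1", {"x": 3}), Replica("P2", "v1", {"x": 3})]
additional = instantiate(["H1", "H2"], "v2")
transfer_state(initial[0], additional)
new_set = phase_in(initial, additional, substitute=True)
print([(r.name, r.version, r.state) for r in new_set])
```

Calling `phase_in` with `substitute=False` instead yields the combined set of four replicas.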
Depending on the use case, the number of replicas in the new set can deviate from the number of replicas in the previous set of initial replicas.
In a first possible use case, individual replicas or a subset of replicas within the set of initial replicas are replaced. This can be done, for instance, if one or a subset of individual replicas in the initial set of replicas has been compromised by an attack on the system and must be replaced.
In a further use case, a level of redundancy in the system can be adjusted. In this use case, the number of replicas of the combined set of replicas deviates from the number of replicas in the initial set of replicas. If the number of replicas in the combined set of replicas is higher than the number of replicas in the initial set of replicas, the level of redundancy in the system is increased so that the safety and security of the system are also enhanced. On the other hand, if the number of replicas of the combined set is lower than the number of replicas in the initial set of replicas, the level of redundancy in the system is diminished. Consequently, the security and safety of the system are reduced.
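How the number of replicas relates to the level of redundancy can be illustrated with the bounds commonly used in the replication literature. These bounds are an assumption brought in for illustration and are not stated in the application: a typical majority-based protocol tolerates f benign faults with 2f+1 replicas, and a typical Byzantine fault-tolerant protocol tolerates f faults with 3f+1 replicas. The function name below is hypothetical.

```python
def tolerated_faults(n: int, byzantine: bool) -> int:
    # Assumed textbook bounds, not taken from the application:
    # n >= 3f + 1 for Byzantine faults, n >= 2f + 1 for benign faults.
    if n < 1:
        raise ValueError("at least one replica is required")
    return (n - 1) // 3 if byzantine else (n - 1) // 2


# Growing a Byzantine fault-tolerant set from 4 to 7 replicas
# raises the number of tolerated faults from 1 to 2.
print(tolerated_faults(4, byzantine=True), tolerated_faults(7, byzantine=True))
```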
In a still further possible use case, a software running on a set of initial replicas is updated. In this case, after updating a source code of the software, a set of updated replicas running the updated software is instantiated as a set of additional replicas. Then, an update message is sent by the set of updated replicas to the set of initial replicas as a triggering event indicating that a software update is requested. In response to the update message received from the updated replicas, the set of initial replicas generates a snapshot of their current state which is transferred by the initial replicas to the updated replicas to synchronize both groups of replicas with each other. In this use case, the initial set of replicas can be substituted by a set of additional replicas.
After steps S1, S2, S3 have been performed, a combined and/or substituted set of replicas is defined as a new set of initial replicas for future exchanges of replicas within the system. The combined and/or substituted set of replicas can be defined in a possible embodiment as the new set of initial replicas for following exchanges of replicas in that the combined and/or substituted set of replicas adopts identifiers of the previous set of initial replicas. These identifiers can be, for example, addresses used by the previous set of initial replicas.
Accordingly, an update of a software of a logical Primary group of replicas, which can be implemented by N replicas, may be performed in a possible embodiment by the following steps. First, the application software is updated by applying all desired patches. Then, a set of N new replicas is instantiated which run the updated software. These replicas can be referred to as the Hot Standby group, HSG, of replicas, which collectively implement a logical Hot Standby, HSB. The logical HSB performs a takeover operation for the logical Primary. In order to prepare for the transfer or takeover, the logical HSB first synchronizes its state with that of the logical Primary. Therefore, as a first action, the HSB replicas can send a message to the Primary replicas informing them that a software update is pending and that a state synchronization shall begin. Upon executing the message from the logical HSB, the Primary or initial replicas take a snapshot of their respective state. Since the Primary replicas order incoming messages using the state machine replication protocol, all correct Primary replicas execute the HSB message at the same logical point in their execution. Thus, all correct, i.e. fault-free, Primary initial replicas produce identical snapshots of their current state.
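The argument that all correct Primary replicas produce identical snapshots can be illustrated with a small simulation. It is a sketch under assumptions: the message format, the `update_request` marker and the use of a SHA-256 digest over a canonical JSON encoding are hypothetical choices made for the example and are not prescribed by the application.

```python
import hashlib
import json
from typing import Dict, List, Tuple

Message = Tuple[str, str, int]  # hypothetical format: (kind, key, value)


def run_replica(ordered_log: List[Message]) -> Tuple[str, Dict[str, int]]:
    # A deterministic state machine that executes messages in the order
    # fixed by the replication protocol.
    state: Dict[str, int] = {}
    snapshot_digest = ""
    for kind, key, value in ordered_log:
        if kind == "set":
            state[key] = value
        elif kind == "update_request":
            # The snapshot is taken at the logical point at which the
            # message from the Hot Standby is executed.
            encoded = json.dumps(state, sort_keys=True).encode()
            snapshot_digest = hashlib.sha256(encoded).hexdigest()
    return snapshot_digest, state


log: List[Message] = [
    ("set", "a", 1),
    ("set", "b", 2),
    ("update_request", "", 0),
    ("set", "a", 5),  # executed after the snapshot, not part of it
]

# Three correct replicas executing the same ordered log.
digests = {run_replica(log)[0] for _ in range(3)}
print(len(digests))
```

Because every replica sees the update request at the same position in the ordered log, the set of digests contains a single value, even though the replicas continue to execute later events.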
Next, the Primary replicas transfer their state to the HSB replicas, i.e. the additional replicas. This can be performed in different variations depending on the system requirements and fault tolerances.
In a possible embodiment, the system can be designed to tolerate benign but not Byzantine faults. In this case, a single Primary replica can take responsibility for transferring the snapshot of its state to a single HSB additional replica. Other replicas can monitor the transfer and take over if it appears that the transfer has not been completed due to faults. Upon receiving the last synchronization message, the HSB additional replicas generate a message indicating that the initial synchronization is complete. Since the Primary replicas may be executing new events while the synchronization is going on, the HSB replicas may still be missing some events that were recently executed. Upon executing a synchronization-done message from the HSB replicas, the logical Primary can compute a number of events that still need to be transferred. If this number is greater than a predetermined threshold, then the synchronization process can be repeated to transfer the additional events. As long as the synchronization happens faster than the rate at which new events are generated, the Primary initial replicas will eventually determine that the HSB additional replicas are close enough. At this point, no new events are executed at the Primary replicas and the last remaining events are passed to the HSB replicas. Finally, the Primary replicas terminate after getting an acknowledgement message from the HSB replicas.
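The repeated synchronization with a threshold can be modelled as a simple loop. The following sketch only counts events; the function name, the per-round arrival figures and the threshold value are assumptions chosen for illustration.

```python
from typing import List, Tuple


def synchronize(initial_events: int, arrivals_per_round: List[int],
                threshold: int) -> Tuple[int, int]:
    # Returns (number of transfer rounds, events finally held by the
    # Hot Standby). Counts stand in for the events themselves.
    primary_count = initial_events  # events executed by the Primary
    standby_count = 0               # events transferred to the Hot Standby
    rounds = 0
    for arrivals in arrivals_per_round:
        target = primary_count      # transfer everything executed so far
        primary_count += arrivals   # the Primary keeps executing meanwhile
        standby_count = target
        rounds += 1
        backlog = primary_count - standby_count
        if backlog <= threshold:
            # Close enough: the Primary stops executing new events and
            # passes the last remaining events to the Hot Standby.
            standby_count = primary_count
            return rounds, standby_count
    raise RuntimeError("synchronization did not converge in the simulated rounds")


print(synchronize(1000, [200, 40, 8, 1], threshold=10))
```

In this run the backlog shrinks from 200 to 40 to 8 events, so the third round falls below the threshold of 10 and the hand-over completes. If new events arrived faster than they could be transferred, the backlog would not shrink and the loop would not terminate, which matches the condition stated in the text.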
In a Byzantine fault-tolerant system, the HSB replicas cannot trust that a state message from a Primary replica is legitimate, since the Primary replica may be faulty. Therefore, in such a Byzantine fault-tolerant system multiple Primary replicas are responsible for passing state messages to the HSB replicas. There are several ways in which this can be achieved. In a possible embodiment, (f+1) Primary replicas can send the messages to (f+1) HSB replicas, or an erasure coding scheme can be used. HSB replicas can send a message to the Primary replicas upon executing the last state message. Since the Primary replicas may be executing new events while the synchronization is still going on, one proceeds in the Byzantine fault-tolerant system in the same manner as in a benign fault-tolerant system.
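One common way for a receiving replica to guard against faulty senders is to accept a state only once f+1 identical copies have arrived, since at most f of them can come from faulty replicas. The application does not specify this acceptance rule; the sketch below is an assumption for illustration, with hypothetical names, and it compares digests of the state messages rather than the messages themselves. Reaching f+1 matching copies may require hearing from more than f+1 senders.

```python
from collections import Counter
from typing import Iterable, Optional


def accept_state(digests: Iterable[str], f: int) -> Optional[str]:
    # Return a digest reported by at least f + 1 senders, or None if no
    # digest has reached that count yet.
    counts = Counter(digests)
    for digest, count in counts.most_common():
        if count >= f + 1:
            return digest
    return None


# With f = 1, two matching copies outweigh a single deviating copy.
print(accept_state(["abc", "abc", "xyz"], f=1))
```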
After the HSB replicas have completed synchronization and the Primary replicas have terminated, the last remaining step for each HSB replica is to adopt identifiers of the previous set of initial Primary replicas. In a possible embodiment, each HSB replica can take over an address, in particular a network address, used by its corresponding Primary replica. For example, an HSB replica A can adopt the IP address of a Primary replica A, an HSB replica B can adopt the IP address of a Primary replica B, etc. This can be accomplished by sending gratuitous ARP messages that associate the MAC addresses used by the HSB replicas with the appropriate IP addresses.
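The layout of a gratuitous ARP message can be shown by assembling the frame bytes. The sketch below builds an ARP request in which sender and target IP address are both the adopted address; the MAC and IP values are placeholders, the function name is hypothetical, and actually sending the frame (which needs a raw socket and suitable privileges) is deliberately left out.

```python
import socket
import struct


def gratuitous_arp(mac: str, ip: str) -> bytes:
    # Build an Ethernet frame carrying an ARP announcement for `ip`,
    # sent from the hardware address `mac`.
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    ip_bytes = socket.inet_aton(ip)
    broadcast = b"\xff" * 6
    # Ethernet header: destination, source, EtherType 0x0806 (ARP).
    ethernet = broadcast + mac_bytes + struct.pack("!H", 0x0806)
    # ARP header: hardware type 1 (Ethernet), protocol type 0x0800 (IPv4),
    # address lengths 6 and 4, operation 1 (request).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    # Sender MAC and IP, zeroed target MAC, target IP equal to sender IP.
    arp += mac_bytes + ip_bytes + b"\x00" * 6 + ip_bytes
    return ethernet + arp


# Placeholder addresses: a locally administered MAC and a documentation IP.
frame = gratuitous_arp("02:00:00:00:00:0a", "192.0.2.10")
print(len(frame))
```

Hosts and switches that receive such a frame update their tables so that traffic for the adopted IP address reaches the HSB replica.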
Once the exchanged replicas, i.e. HSB replicas, have assumed control of the system, the desired goal in exchanging the set of replicas has been achieved. After the exchange of the replicas, there is a set of N replicas running the updated software, the replicas are consistent with one another, and the service has remained available throughout the whole exchange process.
Fig. 2 shows a diagram for illustrating a method for exchanging a set of initial replicas according to the present invention in more detail.
As can be seen in Fig. 2, there is an existing set of n replicas, so-called local Primaries LP. In the example shown in Fig. 2, the local Primaries have different software versions and have different addresses.
In a first step S1, a set of additional replicas is instantiated by generating m Hot Standby replicas HS. These logical Hot Standby replicas have in the shown example different software versions C, D and also comprise different addresses. The software versions C, D illustrated in Fig. 2 can be different release states or randomly compiled singular versions.
In a further step S2a, an update trigger between the logical Hot Standby replicas and the local Primary replicas is recognized. This triggering event can be a request for a state transfer from at least one initial or additional replica. Further, this triggering event can be formed by a request for a state transfer from a coordinating entity of the state machine replication system. In response to the triggering event, a current state of at least one initial replica is transferred in step S2b to at least one additional replica as illustrated in Fig. 2.
Fig. 2 illustrates three use cases UC, where the method according to the present invention can be implemented.
In the first use case UC1, the initial set of replicas, i.e. the local Primaries, is substituted by the additional replicas. In this use case one can, for example, perform a software update.
In a second use case UC2, selected replicas are replaced. In this use case, individual replicas or a subset of replicas of the set of initial replicas are replaced by replicas of the combined set of replicas. This can be performed, for instance, if a single initial replica or a subset of initial replicas, i.e. local Primaries, has been compromised.
In a third use case UC3, the number of replicas can be adjusted to increase or diminish a level of redundancy in the system.
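The three use cases differ only in how the new set of replicas is composed. The sketch below treats replicas as plain names and shows one possible composition per use case; the function names and the rule of filling up to a target size are assumptions for illustration and are not taken from the application.

```python
from typing import List, Set


def substitute_all(initial: List[str], additional: List[str]) -> List[str]:
    # UC1: the whole initial set is substituted by the additional replicas.
    return list(additional)


def replace_selected(initial: List[str], additional: List[str],
                     compromised: Set[str]) -> List[str]:
    # UC2: only selected (e.g. compromised) replicas are replaced.
    if len(compromised) > len(additional):
        raise ValueError("not enough additional replicas")
    replacements = iter(additional)
    return [next(replacements) if r in compromised else r for r in initial]


def adjust_redundancy(initial: List[str], additional: List[str],
                      target_size: int) -> List[str]:
    # UC3: grow or shrink the set to a target number of replicas.
    return (list(initial) + list(additional))[:target_size]


primaries = ["LP1", "LP2", "LP3"]
standbys = ["HS1", "HS2", "HS3"]
print(substitute_all(primaries, standbys))
print(replace_selected(primaries, standbys, {"LP2"}))
print(adjust_redundancy(primaries, standbys, 5))
```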
As can be seen from Fig. 2, the method for exchanging a set of initial replicas can be used for a wide range of use cases in a system.
According to the second aspect of the present invention, a state machine replication system is provided, wherein a set of initial replicas or local Primaries run by the system are exchanged by other replicas by using the method according to the first aspect of the present invention. This state machine replication system can comprise a plurality of real or virtual machines or nodes each being adapted to run one or several replicas of a software.

Claims

1. A method for exchanging a set of initial replicas comprising the steps of:
(a) instantiating (S1) a set of additional replicas;
(b) transferring (S2) the current state of at least one initial replica to at least one additional replica in response to a triggering event; and
(c) phasing-in (S3) the set of additional replicas being in the transferred state,
wherein the step of phasing-in comprises
adding the set or a subset of additional replicas to the set or a subset of the initial replicas to form a combined set of replicas and/or substituting the initial set of replicas or a subset of the initial set of replicas by the set or a subset of additional replicas.
2. The method according to claim 1,
wherein the combined and/or substituted set of replicas is defined as a new set of initial replicas for following exchanges of replicas.
3. The method according to claim 2,
wherein the combined and/or substituted set of replicas is defined as the new set of initial replicas for following exchanges of replicas in that the combined or substituted set of replicas adopts identifiers of the previous set of initial replicas.
4. The method according to one of the preceding claims 1 to 3,
wherein the triggering event comprises a request for a state transfer from at least one initial or additional replica.
5. The method according to one of the preceding claims 1 to 3,
wherein the triggering event comprises a request for a state transfer from a coordinating entity of a state machine replication system.
6. The method according to one of the preceding claims 1 to 5,
wherein a number of state changes of the initial replicas caused by events is counted when the transfer of the current state of the at least one initial replica has begun.
7. The method according to claim 6,
wherein the transfer of current state of the at least one initial replica is repeated, if the counted number of state changes exceeds a predetermined threshold value.
8. The method according to one of the preceding claims 1 to 7,
wherein the replicas communicate with each other by means of messages of a state machine replication protocol.
9. The method according to one of the preceding claims 1 to 8,
wherein for updating a software running on a set of initial replicas after updating a source code of the software a set of updated replicas running the updated software is instantiated as a set of additional replicas and an update message is sent by the set of updated replicas to the set of initial replicas as a triggering event indicating that a software update is requested.
10. The method according to claim 9,
wherein in response to the update message received from the updated replicas, the set of initial replicas generates a snapshot of their current state which is transferred by the initial replicas to the updated replicas to synchronize both groups of replicas.
11. The method according to one of the preceding claims 1 to 10,
wherein individual replicas or a subset of replicas of the set of initial replicas are replaced by replicas of the combined set of replicas.
12. The method according to one of the preceding claims 1 to 10,
wherein the number of replicas of the combined set of replicas deviates from the number of replicas in the initial set of replicas to adjust a level of redundancy in the respective system.
13. The method according to one of the preceding claims 1 to 12,
wherein the instantiated set of additional replicas runs on the same or a different software or a software version as the set of initial replicas.
14. The method according to one of the preceding claims 1 to 13,
wherein the initial and/or additional replicas run on real or virtual machines of a real-time system.
15. A state machine replication system,
wherein a set of initial replicas of an application software run by said system is exchanged by other replicas using a method according to one of the preceding claims 1 to 14.
16. The state machine replication system according to claim 15,
wherein the state machine replication system comprises a plurality of real or virtual machines each being adapted to run one or several replicas of a software.
PCT/EP2013/067279 2012-08-20 2013-08-20 A method for exchanging a set of replicas WO2014029755A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201261684866P 2012-08-20 2012-08-20
US61/684,866 2012-08-20

Publications (1)

Publication Number Publication Date
WO2014029755A1 true WO2014029755A1 (en) 2014-02-27

Family

ID=49123822

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2013/067279 WO2014029755A1 (en) 2012-08-20 2013-08-20 A method for exchanging a set of replicas

Country Status (1)

Country Link
WO (1) WO2014029755A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114124707A (en) * 2021-11-22 2022-03-01 中国电子科技集团公司第五十四研究所 Network control center multipoint hot standby method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5956489A (en) * 1995-06-07 1999-09-21 Microsoft Corporation Transaction replication system and method for supporting replicated transaction-based services
WO2002091179A2 (en) * 2001-04-30 2002-11-14 Sun Microsystems, Inc. Method and apparatus for migration of managed application state for a java based application

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13759455

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13759455

Country of ref document: EP

Kind code of ref document: A1