WO2014029755A1

WO2014029755A1 - A method for exchanging a set of replicas

Info

Publication number: WO2014029755A1
Application number: PCT/EP2013/067279
Authority: WO
Inventors: Patrick FROESE; Stuart Goose; Jonathan Kirsch; Nico STRAUB
Original assignee: Siemens Aktiengesellschaft
Priority date: 2012-08-20
Filing date: 2013-08-20
Publication date: 2014-02-27

Abstract

A method for exchanging a set of initial replicas comprising the steps of instantiating a set of additional replicas; transferring the current state of at least one initial replica to at least one additional replica in response to a triggering event; and phasing-in the set of additional replicas being in the transferred state, wherein the step of phasing-in comprises adding the set or a subset of additional replicas to the set or a subset of the initial replicas to form a combined set of replicas and/or substituting the initial set of replicas or a subset of the initial set of replicas by the set or a subset of additional replicas.

Description

The invention relates to a method for exchanging a set of replicas in a state machine replication system such that the replicated service remains available and resilient during the exchange. State machine replication is an established technique for improving the availability of software applications in a distributed system. In a distributed IT system, one or several services can be provided by servers. Clients invoke a service by making corresponding requests for it, and each service can be provided by one or more servers. Although using a single, centralized server is the simplest way to implement a service, the resulting service can only be as fault-tolerant as the processor executing the server. If this level of fault tolerance is unacceptable, multiple servers that fail independently are used.

Accordingly, replicas of a single server are executed on separate processors of the distributed information system and protocols can be used to coordinate client interactions with these replicas.

A state machine approach implements fault-tolerant services by replicating the server software and coordinating client interactions with the server replicas.

Generally, a state machine consists of state variables, which encode its state, and commands, which transform its state. Each command is implemented by a deterministic program; its execution is atomic with respect to the other commands, and it can modify state variables and/or produce output. A client of the state machine makes a request to execute a command. The request names the command the state machine is to perform and can contain any information needed by that command.
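A minimal sketch of such a state machine, assuming a hypothetical counter service: the state variables are a dictionary of named counters, and a request names a command plus its arguments. The names `Request`, `CounterStateMachine`, `add` and `read` are illustrative only and do not come from the application.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class Request:
    """A client request: names a command and carries its arguments."""
    command: str
    args: Tuple[Any, ...] = ()


@dataclass
class CounterStateMachine:
    """Hypothetical state machine whose state variables are named counters."""
    counters: Dict[str, int] = field(default_factory=dict)

    def apply(self, request: Request) -> Any:
        # Each command is a deterministic function of (state, arguments):
        # the same request applied to the same state always yields the same
        # new state and the same output. In this single-threaded sketch,
        # apply() runs each command to completion before the next one starts,
        # so commands are atomic with respect to each other.
        if request.command == "add":
            name, amount = request.args
            self.counters[name] = self.counters.get(name, 0) + amount
            return self.counters[name]
        if request.command == "read":
            (name,) = request.args
            return self.counters.get(name, 0)
        raise ValueError(f"unknown command: {request.command}")


sm = CounterStateMachine()
sm.apply(Request("add", ("x", 5)))
print(sm.apply(Request("read", ("x",))))  # prints 5
```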

In a state machine replication system, several copies, or replicas, of a piece of software run, and the replicas are logically synchronized with one another via a replication protocol. The replication protocol orders all events in the system that might cause the software to change its current state, and the replicas execute the events in this agreed order.
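The consistency argument can be illustrated with a small sketch, assuming the replication protocol is abstracted as a single, already agreed, totally ordered event log. The event format and the `Replica` and `run_in_agreed_order` names are hypothetical; a real protocol reaches this order through message exchange rather than a shared list.

```python
from typing import List, Tuple

# Hypothetical event format: (command, key, value)
Event = Tuple[str, str, int]


class Replica:
    """A replica holds its own copy of the application state."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state: dict = {}
        self.executed = 0  # number of events executed so far

    def execute(self, event: Event) -> None:
        command, key, value = event
        if command == "set":
            self.state[key] = value
        elif command == "add":
            self.state[key] = self.state.get(key, 0) + value
        self.executed += 1


def run_in_agreed_order(replicas: List[Replica], log: List[Event]) -> None:
    # Every replica executes exactly the same events in exactly the same order.
    for event in log:
        for replica in replicas:
            replica.execute(event)


log = [("set", "a", 1), ("add", "a", 4), ("set", "b", 7)]
group = [Replica("r1"), Replica("r2"), Replica("r3")]
run_in_agreed_order(group, log)
assert all(r.state == group[0].state for r in group)
print(group[0].state)  # {'a': 5, 'b': 7}
```

Because execution is deterministic, identical input order is sufficient for identical states; no replica needs to inspect another replica's state.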

A component in a system is considered faulty once its behaviour is no longer consistent with its specification. A component exhibits a so-called Byzantine failure when its behaviour is arbitrary and malicious, possibly involving collusion with other faulty components. A component with a so-called fail-stop failure changes to a state that permits other components to detect that a failure has occurred, and then stops. Naturally, Byzantine failures of components can be the most disruptive for the system.

Different state machine replication protocols are designed to ensure safety, i.e. replica consistency, and liveness, i.e. eventual progress, under different fault and synchronization assumptions. For example, a benign fault-tolerant replication system guarantees safety in all executions in which replicas fail only by crashing, and it guarantees liveness during executions in which a majority of the replicas can communicate with one another in a timely manner.

In a Byzantine fault-tolerant replication system, the system guarantees safety in all executions in which no more than a predetermined number f out of 3f+1 replicas are Byzantine, and it further guarantees liveness in executions in which at least 2f+1 correct, i.e. non-Byzantine, replicas can communicate with one another over links whose delay is eventually bounded.
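The replica counts above follow directly from f. A minimal sketch (the function name is illustrative): with n = 3f+1 replicas, the 2f+1 correct replicas required for liveness are exactly those that remain when f replicas are Byzantine.

```python
def bft_replica_counts(f: int) -> dict:
    """Replica counts for a Byzantine fault-tolerant group tolerating f faults."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return {
        "total": 3 * f + 1,                 # n = 3f + 1 replicas in the group
        "max_byzantine": f,                 # safety holds with at most f Byzantine replicas
        "correct_for_liveness": 2 * f + 1,  # correct replicas that must be able to communicate
    }


for f in (1, 2):
    print(f, bft_replica_counts(f))
```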

In a benign fault-tolerant replication system, the clients of the system can act on a reply as soon as it is received from one or more replicas. In contrast, in a Byzantine fault-tolerant replication system, a client must wait until it collects at least f+1 identical responses before acting on them. Since at most f replicas are faulty, f+1 identical responses imply that the content was sent by at least one correct replica.
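A minimal client-side sketch of this rule, assuming each reply comes from a distinct replica (a real client would also authenticate senders and discard duplicates). The function name `accept_reply` is hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable, Optional


def accept_reply(replies: Iterable[Hashable], f: int) -> Optional[Hashable]:
    """Return a reply once at least f + 1 identical copies have arrived.

    Assumes one reply per replica. With at most f faulty replicas, f + 1
    identical replies must include one from a correct replica.
    Returns None if no reply value has reached f + 1 copies yet.
    """
    counts: Counter = Counter()
    for reply in replies:
        counts[reply] += 1
        if counts[reply] >= f + 1:
            return reply
    return None


# f = 1: one faulty replica answers "bad", two correct replicas answer "ok".
print(accept_reply(["bad", "ok", "ok"], f=1))  # prints ok
```

With f = 0 the function accepts the first reply it sees, which matches the benign case described above.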

In many applications, it is necessary to exchange a set of replicas for another set of replicas. For example, production software often needs to be updated after its initial deployment. It may be necessary to apply security patches to address vulnerabilities discovered within the system, or to apply software patches that implement additional desired functionality.

In a Primary/Hot Standby approach to fault tolerance, two similar but slightly different copies, or replicas, of a piece of software, the Primary and the Hot Standby (HSB), run in parallel. The Primary controls the system and interacts with the clients, whereas the HSB receives the same inputs as the Primary and executes them, but its output is typically suppressed. The HSB monitors the Primary and performs a takeover if it determines that the Primary has failed.

This Primary/HSB approach provides a conventional mechanism for performing updates without suffering downtime. An update proceeds as follows. First, the HSB is taken offline and all software patches are applied. Second, the HSB is brought back online and its state is synchronized with the Primary. When the synchronization is complete, the Primary is taken offline and all patches are applied to it. The takeover mechanism causes the HSB to immediately assume responsibility and begin acting as the Primary.
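The following sketch walks through this conventional sequence and checks that at least one copy is online after every step, which is why the service experiences no downtime. The `Node` type and the `primary_hsb_update` function are hypothetical; state synchronization is reduced to copying an event list.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    version: str
    online: bool = True
    state: List[str] = field(default_factory=list)


def primary_hsb_update(primary: Node, hsb: Node, new_version: str) -> Node:
    """Return the node acting as Primary after the update sequence."""

    def check_available() -> None:
        # At every step at least one copy is online, so there is no downtime.
        assert primary.online or hsb.online

    # Step 1: take the HSB offline and apply all patches.
    hsb.online = False
    hsb.version = new_version
    check_available()
    # Step 2: bring the HSB back online and synchronize its state.
    hsb.online = True
    hsb.state = list(primary.state)
    check_available()
    # Step 3: take the Primary offline and patch it; the takeover mechanism
    # makes the HSB act as the Primary immediately.
    primary.online = False
    primary.version = new_version
    check_available()
    return hsb


p = Node("A", "1.0", state=["e1", "e2"])
h = Node("B", "1.0")
active = primary_hsb_update(p, h, "1.1")
print(active.name, active.version)  # prints B 1.1
```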

Performing a software update without suffering downtime is more complicated in a state machine replication system because of the dual requirements that the replicas must remain consistent with one another and that the system must remain available and performant. A simple conventional approach is to take all replicas offline, apply the patches, and then bring them back online. This keeps the replicas consistent with one another; however, the service is unavailable while the replicas are down, since a threshold number of replicas must be running to ensure liveness of the system. Another conventional approach is to take one replica offline, update it, bring it back online, and repeat this process for each remaining replica until all replicas are updated. While this approach ensures that a sufficient number of replicas are always running, updated replicas may not take the same actions in response to inputs as replicas that have not yet been updated, so the states of the replicas can diverge.
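The divergence problem of the replica-by-replica approach can be reproduced in a few lines. In this hypothetical example the patch adds input validation (negative amounts are ignored); both handler versions are deterministic, and all replicas see the same events in the same order, yet the updated replica ends up in a different state.

```python
from typing import Dict, List


# Hypothetical handlers: version 1 and the patched version 2 of one command.
def handle_v1(state: Dict[str, int], amount: int) -> None:
    state["balance"] = state.get("balance", 0) + amount


def handle_v2(state: Dict[str, int], amount: int) -> None:
    # The patch adds input validation: negative amounts are now ignored.
    if amount >= 0:
        state["balance"] = state.get("balance", 0) + amount


replicas: List[Dict] = [{"state": {}, "handler": handle_v1} for _ in range(3)]
# Rolling update in progress: replica 0 is already patched, 1 and 2 are not.
replicas[0]["handler"] = handle_v2

for amount in [10, -3]:  # same events, same agreed order at every replica
    for r in replicas:
        r["handler"](r["state"], amount)

print([r["state"]["balance"] for r in replicas])  # [10, 7, 7]
```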

Accordingly, it is an object of the present invention to provide a method for exchanging a set of replicas which overcomes the above-mentioned drawbacks and which preserves replica consistency while avoiding downtime in the respective system.

This object is achieved by a method for exchanging a set of initial replicas comprising the steps of claim 1.

According to a first aspect of the present invention, a method for exchanging a set of initial replicas is provided, wherein the method comprises the steps of:

instantiating a set of additional replicas;

transferring the current state of at least one initial replica to at least one additional replica in response to a triggering event; and

phasing-in the set of additional replicas in the transferred state,

wherein the step of phasing-in comprises adding the set or a subset of additional replicas to the set or a subset of the initial replicas to form a combined set of replicas and/or substituting the initial set of replicas or a subset of the initial set of replicas by the set or a subset of additional replicas.

In a possible embodiment of the method according to the first aspect of the present invention, the combined or substituted set of replicas is defined as a new set of initial replicas for subsequent exchanges of replicas within the system. In a possible embodiment of the method according to the first aspect of the present invention, this is achieved in that the combined or substituted set of replicas adopts identifiers, such as addresses, of the previous set of initial replicas.

In a still further possible embodiment of the method according to the first aspect of the present invention, the triggering event comprises a request for a state transfer from at least one initial replica or from at least one additional replica.

In a still further possible embodiment of the method according to the first aspect of the present invention, the triggering event comprises a request for a state transfer from a coordinating entity of the state machine replication system.

In a still further possible embodiment of the method according to the first aspect of the present invention, the number of state changes of the initial replicas caused by events is counted once the transfer of the current state of at least one initial replica has started.

In a still further possible embodiment of the method according to the first aspect of the present invention, the transfer of the current state of at least one initial replica is repeated if the counted number of state changes exceeds a predetermined threshold value.

In a still further possible embodiment of the method according to the first aspect of the present invention, the replicas communicate with each other by means of messages of a state machine replication protocol.

In a further possible embodiment of the method according to the first aspect of the present invention, for updating an application software running on a set of initial replicas after its source code has been updated, a set of updated replicas running the updated application software is instantiated as the set of additional replicas, and the updated replicas send an update message to the set of initial replicas as a triggering event indicating that a software update is requested.

In a further possible embodiment of the method according to the first aspect of the present invention, in response to the update message received from the updated replicas, the initial replicas generate a snapshot of their current state and transfer it to the updated replicas to synchronize both groups of replicas.

In a still further possible embodiment of the method according to the first aspect of the present invention, individual replicas or a subset of replicas of the set of initial replicas are replaced by replicas of the combined set of replicas.

In a still further possible embodiment of the method according to the first aspect of the present invention, the number of replicas in the combined set of replicas differs from the number of replicas in the initial set of replicas, in order to adjust the level of redundancy in the respective system.

In a further possible embodiment of the method according to the first aspect of the present invention, the instantiated set of additional replicas runs the same or a different software application or software version as the set of initial replicas.

In a still further possible embodiment of the method according to the first aspect of the present invention, the initial replicas and/or the additional replicas run on real or virtual machines of a real-time system.

According to a second aspect of the present invention, a state machine replication system is provided, wherein a set of initial replicas of software run by said system is exchanged for other replicas using the method according to the first aspect of the present invention.

In a possible embodiment of the state machine replication system according to the second aspect, the state machine replication system comprises a plurality of real or virtual machines, each being adapted to run one or several replicas of the software.

In the following, possible embodiments of different aspects of the present invention are described in more detail with reference to the enclosed figures.

Fig. 1 shows a flowchart of an exemplary embodiment of a method for exchanging a set of initial replicas according to the present invention; and Fig. 2 shows a diagram illustrating the operation of a method for exchanging a set of initial replicas according to the present invention.

As can be seen in Fig. 1, in an exemplary embodiment the method for exchanging a set of initial replicas according to the first aspect of the present invention comprises three main steps. In a first step S1, a set of additional replicas is instantiated, i.e. additional replicas are created or generated. The instantiated set of additional replicas can run the same or a different software application or software version as the set of initial replicas. Initial replicas and additional replicas can run on real or virtual machines of a system, e.g. a real-time system. An example of a real-time system is a power grid comprising a plurality of components. Another exemplary real-time system is a traffic control system. A real-time system can be implemented by a control system having one or several servers which control programmable objects running on control entities. In such a real-time system, it is essential that the tasks performed by its components are executed correctly, fault-free, and with sufficient performance. Moreover, such a real-time system must be resilient to component failures and/or to manipulations or deliberate attacks.

After the set of additional replicas has been instantiated in step S1, the current state of at least one initial replica is transferred to at least one additional replica in response to a triggering event in step S2. In a possible embodiment, this triggering event comprises a request for a state transfer. In a possible embodiment, this request is issued by at least one initial replica. In another embodiment, the request for state transfer can also be issued by one of the generated additional replicas. In a still further possible embodiment, the triggering event which triggers the transfer of the current state of at least one initial replica to at least one additional replica can be provided by a coordinating entity of the state machine replication system. In a third step S3, the set of additional replicas, now in the transferred state, is phased in by adding the set or a subset of additional replicas to the set or a subset of the initial replicas to form a combined set of replicas and/or by substituting the initial set of replicas or a subset of the initial set of replicas by the set or a subset of the additional replicas.
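The three steps S1 to S3 can be sketched as follows, under simplifying assumptions: the triggering event is modeled as the function call itself, the state is copied from the first initial replica, and only whole-set combining or substitution is shown (subsets are omitted). `Replica`, `exchange_replicas` and the `mode` values are hypothetical names, not taken from the application.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Replica:
    address: str
    version: str
    state: Dict[str, int] = field(default_factory=dict)


def exchange_replicas(initial: List[Replica], new_addresses: List[str],
                      new_version: str, mode: str = "substitute") -> List[Replica]:
    # S1: instantiate the set of additional replicas.
    additional = [Replica(addr, new_version) for addr in new_addresses]
    # S2: in response to the triggering event (here: this call), transfer the
    #     current state of one initial replica to every additional replica.
    #     A deep copy keeps the two groups from sharing mutable state.
    snapshot = initial[0].state
    for replica in additional:
        replica.state = copy.deepcopy(snapshot)
    # S3: phase in the additional replicas, either alongside the initial ones
    #     (combined set) or in place of them (substituted set).
    if mode == "combine":
        return initial + additional
    if mode == "substitute":
        return additional
    raise ValueError(f"unknown mode: {mode}")


initial = [Replica("10.0.0.1", "A", {"x": 1}), Replica("10.0.0.2", "B", {"x": 1})]
combined = exchange_replicas(initial, ["10.0.0.3"], "C", mode="combine")
print(len(combined))  # prints 3
```

The returned list plays the role of the new set of initial replicas for any later exchange.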

Depending on the use case, the number of replicas in the new set can differ from the number of replicas in the previous set of initial replicas.

In a first possible use case, individual replicas or a subset of replicas within the set of initial replicas are replaced. This can be done, for instance, if one or a subset of individual replicas in the initial set of replicas has been compromised by an attack on the system and must be replaced.

In a further use case, the level of redundancy in the system can be adjusted. In this use case, the number of replicas in the combined set of replicas differs from the number of replicas in the initial set of replicas. If the combined set contains more replicas than the initial set, the level of redundancy in the system is increased, so that the safety and security of the system are also enhanced. If, on the other hand, the combined set contains fewer replicas than the initial set, the level of redundancy in the system is diminished, and consequently the safety and security of the system are reduced.
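How far redundancy actually changes depends on the fault model. A minimal sketch, assuming the bounds stated earlier in this description: n ≥ 3f+1 for Byzantine faults, and a majority of correct replicas (n ≥ 2f+1) for benign crash faults. The function name is illustrative.

```python
def tolerated_faults(n: int, byzantine: bool) -> int:
    """Number of faulty replicas a group of n replicas can tolerate.

    Byzantine:              n >= 3f + 1  ->  f = (n - 1) // 3
    Benign, majority-based: n >= 2f + 1  ->  f = (n - 1) // 2
    """
    if n < 1:
        raise ValueError("a group needs at least one replica")
    return (n - 1) // 3 if byzantine else (n - 1) // 2


for n_initial, n_combined in [(4, 7), (7, 4), (4, 5)]:
    before = tolerated_faults(n_initial, byzantine=True)
    after = tolerated_faults(n_combined, byzantine=True)
    print(f"{n_initial} -> {n_combined} replicas: tolerated f {before} -> {after}")
```

In the Byzantine case, growing from 4 to 5 or 6 replicas does not raise f; the tolerated number of faults only increases at 7 replicas, so redundancy adjustments are most effective in steps of three.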

In a still further possible use case, software running on a set of initial replicas is updated. In this case, after the source code of the software has been updated, a set of updated replicas running the updated software is instantiated as the set of additional replicas. Then, the updated replicas send an update message to the set of initial replicas as a triggering event indicating that a software update is requested. In response to this update message, the initial replicas generate a snapshot of their current state and transfer it to the updated replicas to synchronize both groups of replicas with each other. In this use case, the initial set of replicas can be substituted by the set of additional replicas.

After steps S1, S2 and S3 have been performed, the combined and/or substituted set of replicas is defined as the new set of initial replicas for future exchanges of replicas within the system. In a possible embodiment, this is done in that the combined and/or substituted set of replicas adopts identifiers of the previous set of initial replicas. These identifiers can be, for example, addresses used by the previous set of initial replicas.

Accordingly, in a possible embodiment, the software of a logical Primary, implemented by a group of N replicas, may be updated by the following steps. First, the application software is updated by applying all desired patches. Then, a set of N new replicas running the updated software is instantiated. These replicas can be referred to as the Hot Standby group (HSG) of replicas, which collectively implement a logical Hot Standby (HSB). The logical HSB will take over for the logical Primary. To prepare for this takeover, the logical HSB first synchronizes its state with that of the logical Primary. Therefore, as a first action, the HSB replicas can send a message to the Primary replicas informing them that a software update is pending and that a state synchronization shall begin. Upon executing this message from the logical HSB, the Primary, i.e. initial, replicas each take a snapshot of their state. Since the Primary replicas order incoming messages using the state machine replication protocol, all correct Primary replicas execute the HSB message at the same logical point in their execution. Thus, all correct, i.e. fault-free, Primary replicas produce identical snapshots of their current state.
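The reason the snapshots are identical can be sketched as follows, assuming a hypothetical event format in which the HSB's update announcement appears as a `SYNC_REQUEST` entry in the ordered log. Each correct Primary replica snapshots its state when it executes that entry; a canonical serialization makes equal states produce equal digests, which is convenient for comparing snapshots later.

```python
import hashlib
import json
from typing import Dict, List, Tuple

# Hypothetical event format: ("add", key, amount) changes the state;
# ("SYNC_REQUEST", "", 0) is the HSB message announcing a pending update.
Event = Tuple[str, str, int]


def execute_log(log: List[Event]) -> Tuple[Dict[str, int], str]:
    """Execute events in order; return the snapshot and its SHA-256 digest.

    The snapshot is taken at the moment the SYNC_REQUEST event is executed.
    """
    state: Dict[str, int] = {}
    snapshot: Dict[str, int] = {}
    digest = ""
    for kind, key, amount in log:
        if kind == "add":
            state[key] = state.get(key, 0) + amount
        elif kind == "SYNC_REQUEST":
            snapshot = dict(state)  # copy: later events must not change it
            # Canonical serialization so equal states give equal digests.
            blob = json.dumps(snapshot, sort_keys=True).encode()
            digest = hashlib.sha256(blob).hexdigest()
    return snapshot, digest


agreed_order = [("add", "x", 1), ("SYNC_REQUEST", "", 0), ("add", "x", 2)]
results = [execute_log(agreed_order) for _ in range(3)]  # three correct Primaries
assert all(r == results[0] for r in results)
print(results[0][0])  # {'x': 1}
```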

Next, the Primary replicas transfer their state to the HSB replicas, i.e. the additional replicas. This can be performed in different variations depending on the system requirements and fault tolerances.

In a possible embodiment, the system is designed to tolerate benign but not Byzantine faults. In this case, a single Primary replica can take responsibility for transferring the snapshot of its state to a single HSB replica. Other replicas can monitor the transfer and take over if it appears that the transfer has not been completed due to faults. Upon receiving the last synchronization message, the HSB replicas generate a message indicating that the initial synchronization is complete. Since the Primary replicas may be executing new events while the synchronization is going on, the HSB replicas may still be missing some recently executed events. Upon executing the synchronization-done message from the HSB replicas, the logical Primary can compute the number of events that still need to be transferred. If this number is greater than a predetermined threshold, the synchronization process can be repeated to transfer the additional events. As long as the synchronization proceeds faster than the rate at which new events are generated, the Primary replicas will eventually determine that the HSB replicas are close enough. At this point, the Primary replicas stop executing new events and pass the last remaining events to the HSB replicas. Finally, the Primary replicas terminate after receiving an acknowledgement message from the HSB replicas.
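The repeated catch-up can be simulated under an assumed timing model: while k missing events are being transferred, the Primary executes int(k × event_rate_ratio) new events. A ratio below 1 means synchronization is faster than event generation, so the lag shrinks every round until it falls below the threshold. The function name, parameters and timing model are illustrative assumptions.

```python
from typing import List


def synchronize(primary_log: List[str], hsb_log: List[str],
                event_rate_ratio: float, threshold: int,
                max_rounds: int = 100) -> int:
    """Repeat state transfer until the HSB lags by at most `threshold` events.

    Assumed model: transferring k missing events takes long enough for the
    Primary to execute int(k * event_rate_ratio) new events meanwhile.
    Returns the number of transfer rounds used.
    """
    for rounds in range(1, max_rounds + 1):
        missing = primary_log[len(hsb_log):]
        hsb_log.extend(missing)
        # State changes executed at the Primary while this round was running.
        for _ in range(int(len(missing) * event_rate_ratio)):
            primary_log.append(f"e{len(primary_log)}")
        lag = len(primary_log) - len(hsb_log)
        if lag <= threshold:
            # Close enough: the Primary stops executing new events and
            # passes the last remaining events before terminating.
            hsb_log.extend(primary_log[len(hsb_log):])
            return rounds
    raise RuntimeError("HSB cannot catch up: events arrive faster than transfer")


primary = [f"e{i}" for i in range(1000)]
hsb: List[str] = []
print(synchronize(primary, hsb, event_rate_ratio=0.5, threshold=10))  # prints 7
```

With a ratio of 0.5, the lag halves each round (1000, 500, 250, 125, 62, 31, 15, 7); with a ratio of 1.0 or more it never shrinks, and the sketch gives up after `max_rounds`.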

In a Byzantine fault-tolerant system, the HSB replicas cannot trust that a state message from a single Primary replica is legitimate, since that Primary replica may be faulty.

Therefore, in such a Byzantine fault-tolerant system, multiple Primary replicas are responsible for passing state messages to the HSB replicas. This can be achieved in several ways. In a possible embodiment, (f+1) Primary replicas send the messages to (f+1) HSB replicas; alternatively, an erasure coding scheme can be used. The HSB replicas can send a message to the Primary replicas upon executing the last state message. Since the Primary replicas may continue executing new events while the synchronization is still going on, the remaining events are handled in the same manner as in a benign fault-tolerant system.
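A minimal HSB-side sketch of the (f+1) variant (erasure coding is not shown), assuming the snapshot is split into indexed chunks and every Primary replica is identified by a sender id. A chunk is accepted only once f+1 distinct Primary replicas have sent byte-identical content for that index, so at least one of them is correct. The class and method names are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Optional, Set


class HsbStateReceiver:
    """Accepts state chunks only when f + 1 distinct Primaries agree."""

    def __init__(self, f: int) -> None:
        self.f = f
        # chunk index -> content digest -> ids of Primaries that sent it
        self.votes: Dict[int, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
        self.accepted: Dict[int, bytes] = {}

    def on_state_message(self, sender: str, index: int, data: bytes) -> Optional[bytes]:
        """Record one state message; return the chunk once it is accepted."""
        if index in self.accepted:
            return self.accepted[index]
        digest = hashlib.sha256(data).hexdigest()
        senders = self.votes[index][digest]
        senders.add(sender)  # a set, so one replica cannot vote twice
        if len(senders) >= self.f + 1:
            self.accepted[index] = data
            return data
        return None


receiver = HsbStateReceiver(f=1)
receiver.on_state_message("P1", 0, b"forged")  # faulty Primary
receiver.on_state_message("P2", 0, b"good")
print(receiver.on_state_message("P3", 0, b"good"))  # prints b'good'
```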

After the HSB replicas have completed synchronization and the Primary replicas have terminated, the last remaining step is for each HSB replica to adopt the identifiers of the previous set of initial Primary replicas. In a possible embodiment, each HSB replica can take over an address, in particular a network address, used by its corresponding Primary replica. For example, HSB replica A can adopt the IP address of Primary replica A, HSB replica B can adopt the IP address of Primary replica B, and so on. This can be accomplished by sending gratuitous ARP messages that associate the MAC addresses of the HSB replicas with the adopted IP addresses.
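For illustration, the following sketch builds the bytes of a gratuitous ARP request in an Ethernet frame, following the standard ARP packet layout: the sender and target protocol addresses are both the adopted IP, so neighbours update their ARP caches to map that IP to the HSB replica's MAC address. The MAC and IP values are hypothetical examples (a locally administered MAC and a documentation-range IP). Sending the frame, e.g. through a raw socket with administrative privileges, is not shown.

```python
import ipaddress
import struct


def gratuitous_arp_frame(mac: str, ip: str) -> bytes:
    """Build an Ethernet frame carrying a gratuitous ARP request."""
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    ip_bytes = ipaddress.IPv4Address(ip).packed
    broadcast = b"\xff" * 6
    # Ethernet header: destination, source, EtherType 0x0806 (ARP).
    ethernet_header = broadcast + mac_bytes + struct.pack("!H", 0x0806)
    arp_payload = struct.pack(
        "!HHBBH6s4s6s4s",
        1,             # hardware type: Ethernet
        0x0800,        # protocol type: IPv4
        6,             # hardware address length
        4,             # protocol address length
        1,             # operation: request
        mac_bytes,     # sender hardware address: the HSB replica's MAC
        ip_bytes,      # sender protocol address: the adopted IP
        b"\x00" * 6,   # target hardware address: unused, zero
        ip_bytes,      # target protocol address: the same adopted IP
    )
    return ethernet_header + arp_payload


frame = gratuitous_arp_frame("02:00:00:00:00:0a", "192.0.2.10")
print(len(frame))  # prints 42
```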

Once the exchanged replicas, i.e. the HSB replicas, have assumed control of the system, the goal of exchanging the set of replicas has been achieved: there is a set of N replicas running the updated software, the replicas are consistent with one another, and the service has remained available throughout the whole exchange process.

Fig. 2 shows a diagram illustrating a method for exchanging a set of initial replicas according to the present invention in more detail.

As can be seen in Fig. 2, there is an existing set of n replicas, so-called local Primaries LP. In the example shown in Fig. 2, the local Primaries have different software versions and have different addresses.

In a first step S1, a set of additional replicas is instantiated by generating m Hot Standby replicas HS. In the example shown, these logical Hot Standby replicas have different software versions C, D and also different addresses. The software versions C, D illustrated in Fig. 2 can be different release states or randomly compiled singular versions.

In a further step S2a, an update trigger between the logical Hot Standby replicas and the local Primary replicas is recognized. This triggering event can be a request for a state transfer from at least one initial or additional replica. Further, this triggering event can be formed by a request for a state transfer from a coordinating entity of the state machine replication system. In response to the triggering event, a current state of at least one initial replica is transferred in step S2b to at least one additional replica as illustrated in Fig. 2.

Fig. 2 illustrates three use cases UC in which the method according to the present invention can be implemented. In the first use case UC1, the initial set of replicas, i.e. the local Primaries, is substituted by the additional replicas. In this use case one can, for example, perform a software update. In a second use case UC2, selected replicas are replaced: individual replicas or a subset of replicas of the set of initial replicas are replaced by replicas of the combined set of replicas. This can be performed, for instance, if a single initial replica or a subset of initial replicas, i.e. local Primaries, has been compromised.

In a third use case UC3, the number of replicas can be adjusted to increase or diminish the level of redundancy in the system.

As can be seen from Fig. 2, the method for exchanging a set of initial replicas can be used for a wide range of use cases in a system. According to the second aspect of the present invention, a state machine replication system is provided, wherein a set of initial replicas or local Primaries run by the system is exchanged for other replicas using the method according to the first aspect of the present invention. This state machine replication system can comprise a plurality of real or virtual machines or nodes, each being adapted to run one or several replicas of a software.

Claims


1. A method for exchanging a set of initial replicas comprising the steps of:

(a) instantiating (S1) a set of additional replicas;

(b) transferring (S2) the current state of at least one initial replica to at least one additional replica in response to a triggering event; and

(c) phasing-in (S3) the set of additional replicas being in the transferred state,

wherein the step of phasing-in comprises

adding the set or a subset of additional replicas to the set or a subset of the initial replicas to form a combined set of replicas and/or substituting the initial set of replicas or a subset of the initial set of replicas by the set or a subset of additional replicas.

2. The method according to claim 1,

wherein the combined and/or substituted set of replicas is defined as a new set of initial replicas for following exchanges of replicas.

3. The method according to claim 2,

wherein the combined and/or substituted set of replicas is defined as the new set of initial replicas for following exchanges of replicas in that the combined or substituted set of replicas adopts identifiers of the previous set of initial replicas.

4. The method according to one of the preceding claims 1 to 3,

wherein the triggering event comprises a request for a state transfer from at least one initial or additional replica.

5. The method according to one of the preceding claims 1 to 3,

wherein the triggering event comprises a request for a state transfer from a coordinating entity of a state machine replication system.

6. The method according to one of the preceding claims 1 to 5,

wherein a number of state changes of the initial replicas caused by events is counted when the transfer of the current state of the at least one initial replica has begun.

7. The method according to claim 6,

wherein the transfer of the current state of the at least one initial replica is repeated, if the counted number of state changes exceeds a predetermined threshold value.

8. The method according to one of the preceding claims 1 to 7,

wherein the replicas communicate with each other by means of messages of a state machine replication protocol.

9. The method according to one of the preceding claims 1 to 8,

wherein for updating a software running on a set of initial replicas after updating a source code of the software a set of updated replicas running the updated software is instantiated as a set of additional replicas and an update message is sent by the set of updated replicas to the set of initial replicas as a triggering event indicating that a software update is requested.

10. The method according to claim 9,

wherein in response to the update message received from the updated replicas, the set of initial replicas generates a snapshot of their current state which is transferred by the initial replicas to the updated replicas to synchronize both groups of replicas.

11. The method according to one of the preceding claims 1 to 10,

wherein individual replicas or a subset of replicas of the set of initial replicas are replaced by replicas of the combined set of replicas.

12. The method according to one of the preceding claims 1 to 10,

wherein the number of replicas of the combined set of replicas deviates from the number of replicas in the initial set of replicas to adjust a level of redundancy in the respective system.

13. The method according to one of the preceding claims 1 to 12,

wherein the instantiated set of additional replicas runs the same or a different software or software version as the set of initial replicas.

14. The method according to one of the preceding claims 1 to 13,

wherein the initial and/or additional replicas run on real or virtual machines of a real-time system.

15. A state machine replication system,

wherein a set of initial replicas of an application software run by said system is exchanged by other replicas using a method according to one of the preceding claims 1 to 14.

16. The state machine replication system according to claim 15,

wherein the state machine replication system comprises a plurality of real or virtual machines each being adapted to run one or several replicas of a software.