WO2004059484A1

WO2004059484A1 - A method of standby and controlling load in distributed data processing system

Info

Publication number: WO2004059484A1
Application number: PCT/CN2002/000939
Authority: WO
Inventors: Haipeng Li; Cunjun Dai
Original assignee: Zte Corporation
Priority date: 2002-12-31
Filing date: 2002-12-31
Publication date: 2004-07-15
Also published as: CN1695120A; CN100334554C; AU2002357568A1

Abstract

A distributed processing system is disclosed. It includes a plurality of load groups that issue service requests, an interface processing module that receives the service requests from the load groups and distributes them to the corresponding processors, and a plurality of processors. Each processor is responsible for the service requests of several load groups. For each load group, one processor acts as the main processor and another acts as the standby processor. When the main processor fails, the standby processor takes its place and processes the services of that load group.

Description

Backup and load control method in a fully distributed processing system

Technical field

The present invention relates to a fully distributed processing system, and particularly to a backup and load control method in a fully distributed system.

Background art

In a distributed system, the backup and control method usually used is primary-backup allocation, that is, a primary processing system plus a backup processing system. Under normal circumstances the backup system does not process any services; only when the primary system fails is the entire load transferred to the backup system.

Another commonly used method is load sharing. That is, two systems each carry half of the load, with no distinction between active and standby. If one of the systems fails, the entire load is handled by the other system. One such approach is disclosed in the document EP 1 139 235 A2, "Distributed data processing system and method of processing data in distributed data processing system".

However, both of the above methods must reserve half of the processing power for backup operation, so the utilization of system resources cannot exceed 50%. Such a control method can reasonably be applied to a small-scale distributed processing system. In a large-capacity fully distributed processing system, however, existing technology has already reduced the probability of a processor failure to a minimum, and a utilization of this level is undoubtedly a waste.

In order to improve the utilization of the system, the document "Distributed data access system including a plurality of database access processors with one-for-n redundancy" proposes an N + 1 backup method. That is, among multiple processing systems, one processing system is held in a backup position, and when one of the normally running processing systems fails and is withdrawn from service, the backup processing system is started to work in place of the failed system. In such a system of N + 1 processing systems, it cannot be known in advance which of the N working systems will need to be backed up on the standby system, so none of the N working systems is backed up on the standby system in real time. When a working system fails, the data that should have been backed up is already lost, and after the backup system is started, processing has to be restarted from the beginning.

This way of working is clearly unacceptable for a distributed processing system centered on data processing, especially for a system that requires real-time backup of the processed data and processes, that is, for a communication system that requires high real-time performance.

Summary of the invention

The purpose of the present invention is to provide a backup and load control method and system for use in a large-capacity fully distributed processing system. The method and system not only meet the requirement of real-time backup, thereby improving the reliability of the system, but also effectively improve system utilization.

A distributed processing system provided according to the present invention includes: a plurality of load groups that generate service requests; an interface processing module for receiving service requests from each load group and distributing the service requests to the corresponding processors; and a plurality of processors. Each processor is responsible for processing the services of multiple load groups. For each load group, one of the processors serves as the main processor responsible for its service processing and another of the processors serves as the backup processor; when the main processor fails, the backup processor is responsible for processing the services of the load group.

A processing method executed in a distributed processing system provided according to the present invention includes the steps of: receiving service requests from multiple load groups; allocating the service requests to the corresponding processors among a plurality of processors for processing, wherein each processor is capable of processing the services of multiple load groups; and, when the processor serving as the main processor for a service request fails, having another of the plurality of processors, serving as the backup processor for that service request, perform the processing.

In the foregoing distributed processing system and processing method provided by the present invention, the plurality of processors should include at least three processors.

Detailed description

A detailed description is given below with reference to the schematic diagrams:

Figure 1 describes the specific processing relationship between the processors and the load groups;

Figure 2 shows the processing of the load when one of the processors fails;

Figure 3 shows the processing of the load when two adjacent processors fail;

Figure 4 depicts a block diagram of a process for implementing such a load sharing method;

Figure 5 describes the complex type of load sharing mode included in the invention.

As shown in FIG. 1, there are four processors in the distributed processing system. Under normal operating conditions, processor 1 is responsible for processing the services of load groups 1 and 2, processor 2 for load groups 3 and 4, processor 3 for load groups 5 and 6, and processor 4 for load groups 7 and 8.

Processing the services of a load group requires a certain surrounding environment, such as data conditions and memory conditions, and is therefore limited by physical constraints: a single processor cannot hold the conditions for processing the services of every load group. It is therefore necessary to fix the correspondence between processors and load groups and to prepare the environmental conditions at startup.

In the example shown in FIG. 1, processor 1 is also able to process the services of load group 8 and load group 3. That is, if processor 4, the main processor responsible for load group 8, fails, load group 8 can be transferred to processor 1 to continue processing. Moreover, if processor 4 synchronously backs up its intermediate information to processor 1 while processing the services of load group 8 normally, then when processor 4 fails and processor 1 takes over the services of load group 8, the switch between processors can be made without interrupting the service.

Similarly, every other load group has backup conditions on an adjacent processor, and each processor also backs up, and can process, two adjacent load groups. A load backup chain is thus formed, and the processors linked in it complete the service processing together.
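The ring described above can be written down as a small table. The following is only a sketch under stated assumptions: the names `Assignment` and `build_ring_table` are hypothetical, and the rule "odd-numbered group backs up on the previous processor, even-numbered group on the next one" is inferred from the examples given for FIG. 1 (load groups 8 and 3 backed up on processor 1), not stated as a formula in the text.

```python
from typing import Dict, NamedTuple


class Assignment(NamedTuple):
    main: int     # processor that normally handles the load group
    standby: int  # adjacent processor that holds the backup conditions


def build_ring_table(num_processors: int) -> Dict[int, Assignment]:
    """Map each load group to its main and standby processor.

    Assumption: processor p is main for load groups 2p-1 and 2p; the first
    is backed up on the previous processor in the ring, the second on the
    next one. This reproduces the FIG. 1 examples quoted in the text.
    """
    if num_processors < 3:
        # The description requires at least three processors.
        raise ValueError("the ring needs at least three processors")
    table: Dict[int, Assignment] = {}
    for p in range(1, num_processors + 1):
        prev_p = num_processors if p == 1 else p - 1
        next_p = 1 if p == num_processors else p + 1
        table[2 * p - 1] = Assignment(main=p, standby=prev_p)
        table[2 * p] = Assignment(main=p, standby=next_p)
    return table


RING = build_ring_table(4)
```

With four processors, `RING[8]` is `Assignment(main=4, standby=1)` and `RING[3]` is `Assignment(main=2, standby=1)`, so processor 1 is the standby for load groups 8 and 3 as in the FIG. 1 example.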

If one of the processors fails, as in the situation shown in FIG. 2 where processor 2 fails, the load groups originally handled by the failed processor are redistributed: load group 3 is transferred to processor 1, and load group 4 is handled by processor 3. Distributing the load in this way avoids the impact of transferring too much load onto a single processor and reduces the occurrence of cascading failures. In this case each affected processor gains only one load group on top of its original two, so under normal circumstances a processor can run at up to two thirds of its capacity and still absorb the takeover.

In a traditional pairwise backup system, if load groups 1 to 4 are carried by processors 1 and 2 backing each other up, then in the extreme case where both processors 1 and 2 fail, all services of load groups 1 to 4 are inevitably interrupted.

FIG. 3 shows the load distribution when processors 1 and 2 fail at the same time in the backup working mode proposed by the present invention. Load group 1 is adjacent to processor 4, and processor 4 holds the conditions for processing that load group, so load group 1 is transferred to processor 4 for processing; likewise, load group 4 is transferred to processor 3. Only two load groups are interrupted.

Generally speaking, a multiprocessor distributed processing system has only one interface processing module 10 for receiving external service requests. After a service request enters the interface module, it is distributed to the processors in a certain manner. The present invention also adopts this approach; its implementation scheme and principle block diagram are shown in FIG. 4.
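The two failure scenarios of FIG. 2 and FIG. 3 can be replayed with a short simulation. This is a sketch, not the patented implementation: the table `RING` is transcribed from the FIG. 1 relationships as read from the text, and `redistribute` is a hypothetical helper name.

```python
from typing import Dict, List, Set, Tuple

# load group -> (main processor, standby processor), as read from FIG. 1
RING: Dict[int, Tuple[int, int]] = {
    1: (1, 4), 2: (1, 2), 3: (2, 1), 4: (2, 3),
    5: (3, 2), 6: (3, 4), 7: (4, 3), 8: (4, 1),
}


def redistribute(
    table: Dict[int, Tuple[int, int]], failed: Set[int]
) -> Tuple[Dict[int, List[int]], List[int]]:
    """Return (processor -> load groups it serves, interrupted load groups)."""
    serving: Dict[int, List[int]] = {}
    interrupted: List[int] = []
    for group, (main, standby) in sorted(table.items()):
        if main not in failed:
            target = main
        elif standby not in failed:
            target = standby
        else:
            # Both the main and the standby processor are down.
            interrupted.append(group)
            continue
        serving.setdefault(target, []).append(group)
    return serving, interrupted


# FIG. 2: processor 2 fails; FIG. 3: processors 1 and 2 fail together.
one_failure = redistribute(RING, {2})
two_failures = redistribute(RING, {1, 2})
```

Under these assumptions, the single failure leaves no load group interrupted and gives processors 1 and 3 three load groups each, while the double failure interrupts two load groups (2 and 3), compared with all four load groups in the pairwise backup scheme.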

In the distributed processing system of the present invention shown in FIG. 4, all data and parameter groups required by the load groups are stored in a centralized large-capacity database, in which these data and units are distinguished by logical load group. …
- Units with similar service capabilities are placed in the same load group.
- The total number of load groups is twice the number of processors.

When a processor starts, it loads, according to the load groups assigned to it, the data and other conditions for those load groups as well as for the load groups it must handle as backup, so that it is ready to process services.

When the interface processing module receives a service request, it first determines the load unit from which the request comes and then, according to the grouping of load units described above, the load group to which the request belongs. It then consults the service distribution table in service query module 30. Each entry of this table consists of a load group, its main processor, and its standby processor, and from this information the main processor and standby processor responsible for the request are determined. The service query module may be placed in the interface processing module described above, or in each processor. The service distribution table in the service query module is statically configured in advance; the table shown in FIG. 4 is based on the ring backup chain of FIG. 1. Once the main and standby processors for a service request have been obtained from the table, the status of the processors determines whether the request is actually handled by the main processor or by the standby processor.

The interface processing module contains a system management module 20, which is responsible for recording the status information of each processor. By keeping in contact with the processors, the system management module maintains this status information: when one of the processors in the system fails, the system management program responds immediately and changes the status record of that processor.
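The table lookup and the status record can be combined into a routing decision: use the main processor when its status is normal, otherwise the standby processor. The sketch below is illustrative only. `SystemManagement`, `dispatch`, and `DISTRIBUTION_TABLE` are hypothetical names, only two table rows are shown, and returning `None` when both processors are down is an assumption made here to model an interrupted load group.

```python
from typing import Dict, Iterable, Optional, Tuple

# load group -> (main processor, standby processor); two rows of the FIG. 1 ring
DISTRIBUTION_TABLE: Dict[int, Tuple[int, int]] = {3: (2, 1), 4: (2, 3)}


class SystemManagement:
    """Hypothetical stand-in for system management module 20."""

    def __init__(self, processors: Iterable[int]) -> None:
        # Every processor starts in the normal state.
        self._normal = {p: True for p in processors}

    def mark_failed(self, processor: int) -> None:
        self._normal[processor] = False

    def is_normal(self, processor: int) -> bool:
        return self._normal[processor]


def dispatch(
    load_group: int,
    table: Dict[int, Tuple[int, int]],
    status: SystemManagement,
) -> Optional[int]:
    """Pick the processor for a request from the given load group."""
    main, standby = table[load_group]
    if status.is_normal(main):
        return main
    if status.is_normal(standby):
        return standby
    return None  # assumption: the load group is interrupted
```

For example, after `status.mark_failed(2)`, a request from load group 3 is routed to processor 1 and a request from load group 4 to processor 3, which matches the FIG. 2 scenario.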

If the information in the system management module indicates that the main processor corresponding to a service request is in a normal state, the service request is distributed to that main processor; if it indicates that the main processor has failed, the service request is assigned to the standby processor for processing.

Inside a processor, each service process has its own memory area, which records the intermediate state of the service process and determines how the service proceeds. Each processor contains a synchronization trigger module that is triggered externally, and the synchronization program in this module has a fast synchronization channel to the adjacent processor. Whenever something needs to be saved and synchronized to the standby processor during service processing, the synchronization program is triggered and copies the contents of the memory area corresponding to the service request from the main processor to the standby processor. Therefore, even if the main processor fails in the course of service processing, subsequent messages of the service that are passed to the standby processor can still find the contents that were recorded in the main processor's memory area, and service processing continues from the recorded contents. Continuous processing of services is thereby achieved when a processor switchover occurs.

The core idea of the present invention is the formation of a circular backup chain: backup and load control of a large-capacity distributed system can be realized by organizing the correspondence between load groups and processors.
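The synchronization of a service's memory area to the standby processor can be illustrated with a toy model. This is a sketch under assumptions: the `Processor` class, its `handle` method, and the dictionary layout of a memory area are all hypothetical, and an in-process deep copy stands in for the fast synchronization channel.

```python
import copy
from typing import Any, Dict, Optional


class Processor:
    """Toy processor holding one memory area per service in progress."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.memory: Dict[str, Dict[str, Any]] = {}
        self.standby: Optional["Processor"] = None

    def handle(self, service_id: str, message: str, sync: bool = True) -> Dict[str, Any]:
        # Record the intermediate state of the service in its memory area.
        area = self.memory.setdefault(service_id, {"steps": []})
        area["steps"].append(message)
        # Synchronization trigger: copy the memory area to the standby.
        if sync and self.standby is not None:
            self.standby.memory[service_id] = copy.deepcopy(area)
        return area


main = Processor("processor 4")
backup = Processor("processor 1")
main.standby = backup

main.handle("service-A", "request received")
# Processor 4 is now assumed to fail; the next message goes to processor 1,
# which continues from the synchronized memory area.
resumed = backup.handle("service-A", "request completed")
```

In this sketch `resumed["steps"]` contains both messages, so the standby continues the service instead of starting a new one. Synchronizing on every message is a simplification; the text only says synchronization is triggered when something needs to be saved.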

FIG. 1 is a typical form of a circular backup chain constructed according to the idea of the present invention, and the content of the present invention should not be limited to it. According to the idea of the present invention, other, more complex forms of backup chain can also be constructed by defining more complex relationships between load groups and processors, to achieve better backup and load control in a large-capacity distributed system. FIG. 5 shows a complex form of backup chain. Complex backup chains have better performance and load control effects, especially for large-capacity and ultra-large-capacity load systems and for systems with more processors for distributed processing, where they can improve overall system utilization and stability.

Beneficial effect

According to the distributed processing system and method of the present invention, each processor is responsible for processing the services of multiple load groups. For each load group, one of the processors acts as the main processor and handles its services, and another of the processors acts as the backup processor, which takes over the load group's services when the main processor fails. When one of the processor nodes fails, the load groups it was handling are shared among its adjacent processors. That is, provided that no processor is overloaded when one processor fails, the load a processor can carry in normal operation is N / (N + 1) * 100%, where N is the number of adjacent backup processors, which equals the ratio of the number of load groups to the number of processors. Thus when N = 2 the processing load is 66.7%, and when N = 3 the processing load is 75%. The larger N is, the higher the everyday processing load can be, which improves the utilization of system resources.
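The N / (N + 1) figure follows from a simple capacity argument: a processor that normally carries N load groups must be able to carry N + 1 after a neighbor fails, so in normal operation it can be loaded to at most N of N + 1 equal shares. A minimal sketch of the arithmetic, with `max_normal_load` as a hypothetical helper name and equal-sized load groups assumed:

```python
def max_normal_load(n: int) -> float:
    """Highest normal-operation load fraction that still leaves room
    for one extra load group after a neighboring processor fails.

    Assumes every load group contributes an equal share of load.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return n / (n + 1)


# Percentages for N = 1, 2, 3, rounded to one decimal place.
percentages = {n: round(100 * max_normal_load(n), 1) for n in (1, 2, 3)}
```

The case N = 1 gives 50%, the ceiling of the primary-backup and load sharing schemes discussed in the background; N = 2 and N = 3 give the 66.7% and 75% quoted above.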

In addition, if two adjacent processors fail at the same time, then under the distributed processing system and method of the present invention the load groups concerned are taken over by their respective adjacent processors, so fewer load groups are paralyzed, which improves the reliability of the system.

At the same time, because there is a synchronization trigger module in each processor, the content of the memory area corresponding to the business request in the main processor can be synchronized to the standby processor, thereby realizing the continuity of the system's processing business and improving the system's reliability.

Claims

1. A distributed processing system, including:

Multiple load groups for generating service requests;

An interface processing module, configured to receive service requests from each load group, and distribute the service requests to corresponding processors;

A plurality of the processors, wherein each processor is responsible for processing the services of a plurality of the load groups; for each of the load groups, one of the plurality of processors serves as a main processor responsible for its service processing and another of the plurality of processors serves as a backup processor; and when the main processor fails, the backup processor is responsible for processing the load of the load group.

2. The distributed processing system according to claim 1, wherein the plurality of processors should include at least three processors.

3. The distributed processing system according to claim 1 or 2, further comprising: a service query module, configured to provide mapping information between the service request and the multiple processors.

4. The distributed processing system according to claim 1 or 2, wherein the interface further comprises: a system management module, configured to provide status information of the processor.

5. The distributed processing system according to claim 3, wherein the interface further comprises: a system management module, configured to provide status information of the processor.

6. The distributed processing system according to claim 1 or 2, wherein the processor includes: a synchronization trigger module, configured to synchronize information in the main processor to the standby processor when the main processor fails.

7. The distributed processing system according to claim 3, wherein the processor includes: a synchronization trigger module, configured to synchronize the information in the main processor to the standby processor when the main processor fails.

8. The distributed processing system according to claim 4, wherein the processor includes: a synchronization trigger module, configured to synchronize information in the main processor to a standby processor when the main processor fails.

9. The distributed processing system according to claim 5, wherein the processor includes: a synchronization trigger module, configured to synchronize information in the main processor to a standby processor when the main processor fails.

10. The distributed processing system according to claim 3, wherein the processor includes: a synchronization trigger module, configured to synchronize information in the main processor to a standby processor when the main processor fails.

11. A processing method executed in a distributed processing system, comprising the steps of:

Receiving service requests from multiple load groups;

Allocating the service requests to corresponding processors among a plurality of processors for processing, each of which can process the services of a plurality of the load groups;

When the processor serving as the main processor for a service request fails, processing the service request by another of the plurality of processors serving as the backup processor for the service request.

12. The processing method executed in a distributed processing system according to claim 11, wherein the plurality of processors include at least three processors.

13. The processing method executed in a distributed processing system according to claim 11 or 12, further comprising the step of: providing mapping information between the service request and the plurality of processors.

14. The processing method executed in a distributed processing system according to claim 11 or 12, further comprising the step of: providing status information of the processor.

15. The processing method executed in a distributed processing system according to claim 13, further comprising the step of: providing status information of the processor.

16. The processing method executed in a distributed processing system according to claim 11 or 12, further comprising the step of: when the main processor fails, synchronizing information in the main processor to a standby processor.

17. The processing method executed in a distributed processing system according to claim 13, further comprising the step of: when the main processor fails, synchronizing information in the main processor to a backup processor.

18. The processing method executed in a distributed processing system according to claim 14, further comprising the step of: when the main processor fails, synchronizing information in the main processor to a backup processor.

19. The processing method executed in a distributed processing system according to claim 15, further comprising the step of: when the main processor fails, synchronizing information in the main processor to a backup processor.

20. The processing method executed in a distributed processing system according to claim 13, further comprising the step of: when the main processor fails, synchronizing information in the main processor to a backup processor.