WO2004059484A1 - A method of standby and controlling load in distributed data processing system - Google Patents


Info

Publication number
WO2004059484A1
Authority
WO
WIPO (PCT)
Prior art keywords
processor
processing system
main processor
processors
distributed processing
Application number
PCT/CN2002/000939
Other languages
French (fr)
Chinese (zh)
Inventor
Haipeng Li
Cunjun Dai
Original Assignee
Zte Corporation
Application filed by Zte Corporation
Priority to CNB028299396A (CN100334554C)
Priority to PCT/CN2002/000939 (WO2004059484A1)
Priority to AU2002357568A (AU2002357568A1)
Publication of WO2004059484A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant


Abstract

A distributed processing system comprises a plurality of load groups that generate service requests; an interface processing module that receives the service requests from the load groups and distributes each request to the corresponding processor; and a plurality of processors. Each processor handles the service requests of several load groups. For each load group, one processor serves as the main processor and another as the standby processor; when the main processor fails, the standby processor takes over and processes that load group's services.

Description

Backup and Load Control Method in a Fully Distributed Processing System

Technical Field:
The present invention relates to a fully distributed processing system, and in particular to a backup and load control method for a fully distributed system.

Background Art:
In distributed systems, the backup and control scheme usually adopted is active/standby allocation: one active processing system plus one standby processing system. Under normal conditions the standby system handles no services; only when the active system fails is the entire load transferred to the standby system.
Another commonly used scheme is load sharing: two systems each carry half of the load, with no distinction between active and standby, and if one system fails, the other handles the entire load. A distributed data processing system of this kind is disclosed in EP1 139 235 A2, "Distributed data processing system and method of processing data in distributed data processing system".
However, both of these schemes must reserve half of the processing capacity for backup, so the utilization of system resources cannot exceed 50%. Such a control scheme is acceptable in a small distributed processing system, but in a large-capacity, fully distributed processing system, where advances in technology have made processor failures extremely rare, this level of utilization is clearly wasteful.
To improve utilization, US5408649, "Distributed data access system including a plurality of database access processors with one-for-n redundancy", proposes an N+1 backup scheme: among several processing systems, one is held in reserve, and when one of the normally operating systems fails and is taken out of service, the backup system is started to work in place of the failed one. In that system, however, none of the N working systems can know in advance which of them will need a timely backup on the standby system, so none of them is backed up in real time. When a working system fails, the data that should have been backed up is already lost, and the backup system must restart processing from the beginning.
This mode of operation is clearly unacceptable for distributed processing systems built around data processing, and especially for systems that require real-time backup of both the data being processed and the processing state, i.e., communication systems with very strict real-time requirements.

Summary of the Invention:
The object of the present invention is to provide a backup and load control method and system for a large-capacity, fully distributed processing system that both meets the requirement for real-time backup, thereby improving system reliability, and effectively raises system utilization.

A distributed processing system according to the present invention comprises: a plurality of load groups that generate service requests; an interface processing module that receives service requests from the load groups and distributes them to the corresponding processors; and a plurality of processors. Each processor is responsible for the services of several load groups, and each load group has one of the processors as its main processor, which handles its services, and another of the processors as its standby processor, which takes over that load group's services when the main processor fails.
A processing method performed in a distributed processing system according to the present invention comprises the steps of: receiving service requests from a plurality of load groups; and allocating each service request to the corresponding processor among a plurality of processors for processing, each processor being able to process the services of several load groups. When the processor acting as the main processor for a service request fails, the request is processed by another of the processors, which serves as the standby processor for that request.
In the distributed processing system and processing method provided by the present invention, the plurality of processors should include at least three processors.

Detailed Description:

The invention is described in detail below with reference to the drawings:
Figure 1 shows the specific processing relationship between the processors and the load groups;
Figure 2 shows how the load is handled when one of the processors fails;
Figure 3 shows how the load is handled when two adjacent processors fail;
Figure 4 is a block diagram of an implementation of this load sharing scheme;
Figure 5 shows a complex form of load sharing covered by the invention.

As shown in Figure 1, the distributed processing system has four processors. Under normal operating conditions, processor 1 handles the services of load groups 1 and 2, processor 2 handles those of load groups 3 and 4, processor 3 handles load groups 5 and 6, and processor 4 handles load groups 7 and 8.
Processing the services of a load group requires a supporting environment, such as data and memory, and is therefore subject to physical limits, so a single processor cannot be equipped to process the services of every load group. The correspondence between processors and load groups must therefore be determined in advance, and the required environment prepared at startup.
In the example shown in Figure 1, processor 1 is also able to process the services of load groups 8 and 3: if processor 4, the main processor for load group 8, fails, load group 8 can be transferred to processor 1 for continued processing. Furthermore, if processor 4 synchronously backs up the intermediate state to processor 1 while handling load group 8 normally, then when processor 4 fails and processor 1 takes over load group 8, the switchover between processors happens without interrupting service.
Similarly, every other load group has backup conditions on one adjacent processor, and each processor can in turn back up two adjacent load groups. This forms a load backup chain whose links interlock to complete service processing together.
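The ring of Figure 1 can be written down as a small table mapping each load group to its main and standby processor. The following is a minimal Python sketch, assuming that the four-processor example generalizes as follows (the text gives only the Figure 1 case): processor p is main for load groups 2p-1 and 2p, the first of these is backed up by the left neighbour and the second by the right neighbour, wrapping around the ring. The function name `ring_table` is hypothetical.

```python
def ring_table(n_processors: int) -> dict:
    """Return {load_group: (main_processor, standby_processor)}, numbered from 1.

    Assumed generalisation of Figure 1: processor p is main for groups 2p-1
    and 2p; group 2p-1 is backed up by the left neighbour and group 2p by the
    right neighbour, so every processor also backs up exactly two groups.
    """
    if n_processors < 3:
        raise ValueError("the scheme calls for at least three processors")
    table = {}
    for p in range(1, n_processors + 1):
        left = (p - 2) % n_processors + 1   # previous processor in the ring
        right = p % n_processors + 1        # next processor in the ring
        table[2 * p - 1] = (p, left)
        table[2 * p] = (p, right)
    return table


for group, (main, standby) in sorted(ring_table(4).items()):
    print(f"load group {group}: main P{main}, standby P{standby}")
```

With four processors this gives, for example, group 3 as (P2, P1) and group 8 as (P4, P1), so processor 1 backs up exactly groups 3 and 8, as described for Figure 1.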
If one processor fails, as in Figure 2 where processor 2 is assumed to fail, then of the load groups it was handling, load group 3 is taken over by processor 1 and load group 4 by processor 3. Spreading the load in this way avoids the load surge that would occur if too much load were shifted onto a single processor, and reduces cascading failures. In this case a surviving processor gains only one load group on top of the two it originally carried, so it can be seen that under normal conditions the load rate of a processor can reach …

Under the traditional pairwise mutual-backup arrangement, if load groups 1 to 4 were carried by processors 1 and 2 acting as each other's backup, then in the extreme case where both processor 1 and processor 2 fail, all services of load groups 1 to 4 would inevitably be interrupted.
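The failover behaviour of Figures 2 and 3 follows directly from the main/standby table. The sketch below assumes the same ring layout as Figure 1 (generalized as processor p being main for groups 2p-1 and 2p) and a hypothetical pairwise 1+1 layout for comparison; the helper names `assign`, `interrupted` and `pairwise_table` are illustrative only. Each group is served by its main processor if alive, otherwise by its standby, otherwise it is counted as interrupted.

```python
from collections import Counter


def ring_table(n_processors):
    # Assumed Figure 1 layout: processor p is main for groups 2p-1 and 2p;
    # the left neighbour backs up group 2p-1, the right neighbour group 2p.
    table = {}
    for p in range(1, n_processors + 1):
        table[2 * p - 1] = (p, (p - 2) % n_processors + 1)
        table[2 * p] = (p, p % n_processors + 1)
    return table


def pairwise_table():
    # Hypothetical traditional 1+1 layout: processors 1 and 2 carry groups 1-4
    # (two each) and act as each other's backup.
    return {1: (1, 2), 2: (1, 2), 3: (2, 1), 4: (2, 1)}


def assign(table, failed):
    """Serving processor per load group after `failed` go down (None = interrupted)."""
    serving = {}
    for group, (main, standby) in table.items():
        if main not in failed:
            serving[group] = main
        elif standby not in failed:
            serving[group] = standby
        else:
            serving[group] = None
    return serving


def interrupted(table, failed):
    return sorted(g for g, p in assign(table, failed).items() if p is None)


ring = ring_table(4)
after = assign(ring, failed={2})                 # Figure 2: processor 2 fails
print(after[3], after[4])                        # 1 3
print(sorted(Counter(p for p in after.values() if p is not None).items()))
# [(1, 3), (3, 3), (4, 2)]: no survivor carries more than three groups
print(interrupted(ring, {1, 2}))                 # [2, 3]  (Figure 3)
print(interrupted(pairwise_table(), {1, 2}))     # [1, 2, 3, 4]
```

The comparison reflects the point made in the text: with the ring, a double failure of processors 1 and 2 interrupts only the two groups whose main and standby both sit on the failed pair.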
Figure 3 shows how the load is distributed under the backup mode proposed by the present invention when processors 1 and 2 fail at the same time. Load group 1 is adjacent to processor 4, which holds the conditions for processing it, so load group 1 is transferred to processor 4; likewise, load group 4 is transferred to processor 3. Only two load groups have their services interrupted.

In general, a multiprocessor distributed processing system has only one external interface processing module 10, which receives service requests from outside; once a service request enters the interface module, it is distributed to one of the processors in a defined way. The method described in the present invention also adopts this approach; its implementation and principle block diagram are shown in Figure 4.
In the distributed processing system of the present invention shown in Figure 4, all data in the system and the parameters required by the load groups are stored in a centralized, large-capacity database, where the data and units are organized by logical load group: units with similar service capabilities are placed in the same load group, and the total number of load groups is twice the number of processors.
When a processor starts, it loads the data related to the load groups assigned to it, together with the data and other conditions of the load groups it must handle as a backup, so that it is ready to process services.
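Which load groups a processor must prepare at startup follows from the main/standby table: the groups it serves as main plus the groups it backs up. A minimal sketch, again assuming the Figure 1 ring generalized as processor p being main for groups 2p-1 and 2p; `groups_to_preload` is a hypothetical name, and the actual loading of data from the central database is not modelled.

```python
def ring_table(n_processors):
    # Assumed Figure 1 layout: main p for groups 2p-1 and 2p;
    # left neighbour backs up 2p-1, right neighbour backs up 2p.
    table = {}
    for p in range(1, n_processors + 1):
        table[2 * p - 1] = (p, (p - 2) % n_processors + 1)
        table[2 * p] = (p, p % n_processors + 1)
    return table


def groups_to_preload(table):
    """Load groups whose data each processor loads at startup:
    the groups it serves as main plus the groups it backs up."""
    preload = {}
    for group, (main, standby) in table.items():
        preload.setdefault(main, set()).add(group)
        preload.setdefault(standby, set()).add(group)
    return preload


for proc, groups in sorted(groups_to_preload(ring_table(4)).items()):
    print(f"P{proc} loads groups {sorted(groups)}")
```

In the four-processor case each processor prepares four of the eight load groups (two as main, two as standby) rather than all of them, which is consistent with the physical limits noted earlier for Figure 1.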
In the interface processing module, when a service request is received, the load unit it comes from is determined first; then, from the grouping of load units described above, the load group to which the request belongs is determined. The module then consults the service distribution table in service query module 30, whose entries each consist of a load group, a main processor and a standby processor, and from it determines the main and standby processors responsible for the request. The service query module may be located in the interface processing module or in each processor. The service distribution table is configured statically in advance; the table shown in Figure 4 is preset according to the structure of the ring backup chain in Figure 1.

Once the main and standby processors for the request have been obtained from the service distribution table, the status of the processors must still be checked to decide whether the request should actually be handled by the main processor or by the standby processor.
The interface processing module contains a system management module 20, which records the status of each processor. Through its communication with the processors, the system management module keeps this status information up to date: when a processor in the system fails, the system management program reacts immediately and updates that processor's status record.
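The lookup and status check described above, combined with the rule that a request goes to its main processor while that processor is healthy and to the standby otherwise, might be sketched as follows. All class names, the load-unit names `"unit-A"`/`"unit-B"` and the ring layout (processor p main for groups 2p-1 and 2p, as generalized from Figure 1) are assumptions for illustration; returning `None` when both processors are down mirrors the interrupted groups of Figure 3.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


def ring_table(n_processors: int) -> Dict[int, Tuple[int, int]]:
    # Assumed Figure 1 ring: load group -> (main, standby).
    table = {}
    for p in range(1, n_processors + 1):
        table[2 * p - 1] = (p, (p - 2) % n_processors + 1)
        table[2 * p] = (p, p % n_processors + 1)
    return table


@dataclass
class SystemManager:
    """Stand-in for system management module 20: tracks failed processors."""
    failed: Set[int] = field(default_factory=set)

    def mark_failed(self, proc: int) -> None:
        self.failed.add(proc)

    def is_healthy(self, proc: int) -> bool:
        return proc not in self.failed


@dataclass
class InterfaceModule:
    """Stand-in for interface module 10 with service query module 30."""
    unit_to_group: Dict[str, int]              # load unit -> load group
    distribution: Dict[int, Tuple[int, int]]   # load group -> (main, standby)
    manager: SystemManager

    def dispatch(self, load_unit: str) -> Optional[int]:
        group = self.unit_to_group[load_unit]      # 1. find the load group
        main, standby = self.distribution[group]   # 2. static table lookup
        if self.manager.is_healthy(main):          # 3. check processor status
            return main
        if self.manager.is_healthy(standby):
            return standby
        return None  # main and standby both down: service interrupted


manager = SystemManager()
iface = InterfaceModule({"unit-A": 3, "unit-B": 8}, ring_table(4), manager)
print(iface.dispatch("unit-A"))   # 2 (main processor of load group 3)
manager.mark_failed(2)
print(iface.dispatch("unit-A"))   # 1 (standby takes over)
```

Keeping the table static and the status dynamic means a failure only changes one status record; no table entries have to be rewritten at failover time.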
If the information in the system management module shows that the main processor for the request is operating normally, the request is dispatched to the main processor; if it shows that the main processor has failed, the request is assigned to the standby processor.

Inside each processor, the processing of every service has a corresponding memory area that records the intermediate stages of that service's processing and determines how the service proceeds. Each processor has a synchronization trigger module that requires an external trigger; the synchronization program in this module has a fast synchronization channel to the adjacent processor. Whenever service processing reaches a point at which state must be saved and synchronized to the standby processor, the synchronization program is triggered and copies the contents of the main processor's memory area for that service request to the standby processor.
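The per-service memory area and its synchronization to the standby can be illustrated with a small in-process model. This is a sketch only: the `Processor` class, the service id `"call-1"` and the message names are hypothetical, the "fast synchronization channel" is modelled as a direct deep copy, and because the text does not specify when a sync point occurs, the caller simply passes `sync=True`.

```python
import copy


class Processor:
    """Hypothetical processor holding one memory area per service (e.g. per call)."""

    def __init__(self, pid: int):
        self.pid = pid
        self.memory = {}  # service id -> memory area with intermediate state

    def handle(self, service_id, message, standby=None, sync=False):
        area = self.memory.setdefault(service_id, {"steps": []})
        area["steps"].append(message)          # record this intermediate stage
        if sync and standby is not None:       # synchronization trigger fired
            # Copy the whole memory area to the standby; deepcopy so later
            # changes on the main processor do not alter the standby's copy.
            standby.memory[service_id] = copy.deepcopy(area)


p4, p1 = Processor(4), Processor(1)       # main and standby for load group 8
p4.handle("call-1", "setup", standby=p1, sync=True)
# Processor 4 fails here; later messages of call-1 are routed to processor 1,
# which continues from the synchronized memory area.
p1.handle("call-1", "answer")
print(p1.memory["call-1"]["steps"])       # ['setup', 'answer']
```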
Therefore, even if the main processor fails in the middle of processing a service, subsequent messages of that service, once passed to the standby processor, can still find the content recorded in the main processor's memory area, and processing continues from that content. Services thus continue without interruption when a failure causes a processor switchover.

The core idea of the present invention is the formation of a ring backup chain: by organizing a suitable correspondence between load groups and processors, backup and load control are achieved for a large-capacity distributed system.
Figure 1 is one typical form of ring backup chain constructed according to the idea of the present invention, but the invention is not limited to it. Following the same idea, more complex relationships between load groups and processors can be defined to build other, more complex forms of backup chain and achieve better backup and load control in large-capacity distributed systems; Figure 5 shows one such complex backup chain. Complex backup chains give better performance and load control and can further improve overall system utilization and stability, especially in systems with large or very large loads and in systems where more processors share the distributed processing.

Beneficial Effects:
In the distributed processing system and method of the present invention described above, each processor is responsible for the services of several load groups, and each load group has one processor as its main processor and another as its standby processor, which takes over that load group's services when the main processor fails. Therefore, when one processor node fails, the load groups it was handling are shared among its several adjacent processors. Under the condition that no processor becomes overloaded when one processor fails, the load a processor can carry in normal operation is N/(N+1) × 100%, where N is the number of adjacent backup processors, i.e., the ratio of load groups to processors. Thus for N = 2 the load that can be handled is 66.7%, and for N = 3 it is 75%; the larger N is, the higher the everyday load that can be handled, and the better the utilization of system resources.
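The N/(N+1) bound can be checked with exact arithmetic. The reasoning assumed here: a processor carrying N load groups at utilization u spends u/N on each group; when one neighbour fails, each survivor picks up one extra group, so it must satisfy u + u/N ≤ 1, which gives u ≤ N/(N+1). The function name `max_normal_load` is hypothetical.

```python
from fractions import Fraction


def max_normal_load(n: int) -> Fraction:
    """Largest normal utilisation u that still survives one processor failure.

    A processor carries n load groups, each using u/n of its capacity. When a
    neighbour fails, each survivor takes over one extra group and gains u/n,
    so u + u/n <= 1, i.e. u <= n / (n + 1).
    """
    return Fraction(n, n + 1)


for n in (1, 2, 3, 4):
    print(n, f"{float(max_normal_load(n)):.1%}")
# 1 50.0%   (same ceiling as the active/standby and load-sharing schemes)
# 2 66.7%
# 3 75.0%
# 4 80.0%
```

N = 1 reproduces the 50% ceiling of the background schemes, while N = 2 and N = 3 give the 66.7% and 75% figures stated above.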
In addition, if two adjacent processors fail at the same time, the distributed processing system and method of the present invention reduce the number of paralyzed load groups, because the load groups on the failed processors are taken over by their respective adjacent processors. This improves the reliability of the system.
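To illustrate the reliability point, the sketch below counts the load groups that are lost (main and backup processor both down) when two adjacent processors fail, comparing an assumed ring layout with a conventional 1:1 arrangement in which processors are paired and back each other up. Both layouts and all names are hypothetical; the patent's own backup chains are shown only in its figures, which are not reproduced in this section.

```python
from typing import Dict, Set, Tuple

# A load group is identified by (main processor, index of the group on it).
LoadGroup = Tuple[int, int]


def ring_backups(num_processors: int, n: int) -> Dict[LoadGroup, int]:
    """Assumed ring layout: group k (k = 0..n-1) of processor i is backed up
    by processor (i + k + 1) % num_processors, so each processor's groups
    are spread over its n successors."""
    if num_processors <= n:
        raise ValueError("need more processors than backups per processor")
    return {(i, k): (i + k + 1) % num_processors
            for i in range(num_processors) for k in range(n)}


def paired_backups(num_processors: int, n: int) -> Dict[LoadGroup, int]:
    """Conventional 1:1 layout for comparison: processors 2j and 2j+1 back
    each other up for all of their groups."""
    if num_processors % 2:
        raise ValueError("pairing needs an even number of processors")
    return {(i, k): i ^ 1 for i in range(num_processors) for k in range(n)}


def paralysed(backup_of: Dict[LoadGroup, int], failed: Set[int]) -> int:
    """Count groups whose main and backup processors have both failed."""
    return sum(1 for (main, _), backup in backup_of.items()
               if main in failed and backup in failed)


if __name__ == "__main__":
    down = {2, 3}  # two adjacent processors fail together
    print(paralysed(ring_backups(6, 2), down))    # 1 of their 4 groups lost
    print(paralysed(paired_backups(6, 2), down))  # 4 of 4 lost (2 and 3 are a pair)
```

Under the ring layout only the one group whose main and backup are exactly the two failed processors is lost; under pairing, losing both members of a pair loses every group they carried.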
At the same time, because each processor contains a synchronization trigger module, the contents of the memory area corresponding to a service request in the main processor can be synchronized to the backup processor. This ensures the continuity of the system's service processing and further improves its reliability.
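A minimal sketch of a synchronization trigger, assuming that the main processor pushes a copy of a request's memory area to its backup every time that area changes. This is one possible reading; the patent does not specify when or how synchronization is triggered, and the `Processor` class and its methods are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Processor:
    """Hypothetical processor node holding one memory area per service request."""
    name: str
    backup: Optional["Processor"] = None
    alive: bool = True
    memory: Dict[str, dict] = field(default_factory=dict)

    def update(self, request_id: str, **changes) -> None:
        """Change one request's memory area, then fire the sync trigger."""
        if not self.alive:
            raise RuntimeError(f"{self.name} is down")
        self.memory.setdefault(request_id, {}).update(changes)
        self._sync_trigger(request_id)

    def _sync_trigger(self, request_id: str) -> None:
        # Send the backup a shallow copy of the area, so that it holds an
        # independent snapshot rather than a reference to the main's dict.
        if self.backup is not None and self.backup.alive:
            self.backup.memory[request_id] = dict(self.memory[request_id])


if __name__ == "__main__":
    standby = Processor("P2")
    main = Processor("P1", backup=standby)
    main.update("call-42", state="connected", seconds=0)
    main.update("call-42", seconds=30)
    main.alive = False  # main processor fails
    # The backup already holds the latest state and can continue the request.
    print(standby.memory["call-42"])  # {'state': 'connected', 'seconds': 30}
```

Synchronizing on every change trades extra traffic for a backup that is never behind; a real system might batch or checkpoint instead.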

Claims

1. A distributed processing system, comprising:
a plurality of load groups, for generating service requests;
an interface processing module, for receiving the service requests from the load groups and distributing them to the corresponding processors; and
a plurality of said processors, wherein each processor is responsible for processing the services of a plurality of said load groups, and for each load group one of the plurality of processors acts as the main processor that undertakes its service processing while another of the plurality of processors acts as the backup processor, the backup processor being responsible for processing the services of that load group when the main processor fails.
2. The distributed processing system according to claim 1, wherein the plurality of processors comprises at least three processors.
3. The distributed processing system according to claim 1 or 2, further comprising: a service query module, for providing mapping information between the service requests and the plurality of processors.
4. The distributed processing system according to claim 1 or 2, wherein the interface further comprises: a system management module, for providing status information of the processors.
5. The distributed processing system according to claim 3, wherein the interface further comprises: a system management module, for providing status information of the processors.
6. The distributed processing system according to claim 1 or 2, wherein each processor comprises: a synchronization trigger module, for synchronizing the information in the main processor to the backup processor when the main processor fails.
7. The distributed processing system according to claim 3, wherein each processor comprises: a synchronization trigger module, for synchronizing the information in the main processor to the backup processor when the main processor fails.
8. The distributed processing system according to claim 4, wherein each processor comprises: a synchronization trigger module, for synchronizing the information in the main processor to the backup processor when the main processor fails.
9. The distributed processing system according to claim 5, wherein each processor comprises: a synchronization trigger module, for synchronizing the information in the main processor to the backup processor when the main processor fails.
10. The distributed processing system according to claim 3, wherein each processor comprises: a synchronization trigger module, for synchronizing the information in the main processor to the backup processor when the main processor fails.
11. A processing method executed in a distributed processing system, comprising the steps of: receiving service requests from a plurality of load groups;
allocating each service request to the corresponding one of a plurality of processors for processing, wherein each processor is able to process the services of a plurality of said load groups; and
when the processor among the plurality of processors that acts as the main processor for the service request fails, processing the service request by another of the plurality of processors that acts as the backup processor for that service request.
12. The processing method executed in a distributed processing system according to claim 11, wherein the plurality of processors comprises at least three processors.
13. The processing method executed in a distributed processing system according to claim 11 or 12, further comprising the step of: providing mapping information between the service requests and the plurality of processors.
14. The processing method executed in a distributed processing system according to claim 11 or 12, further comprising the step of: providing status information of the processors.
15. The processing method executed in a distributed processing system according to claim 13, further comprising the step of: providing status information of the processors.
16. The processing method executed in a distributed processing system according to claim 11 or 12, further comprising the step of: when the main processor fails, synchronizing the information in the main processor to the backup processor.
17. The processing method executed in a distributed processing system according to claim 13, further comprising the step of: when the main processor fails, synchronizing the information in the main processor to the backup processor.
18. The processing method executed in a distributed processing system according to claim 14, further comprising the step of: when the main processor fails, synchronizing the information in the main processor to the backup processor.
19. The processing method executed in a distributed processing system according to claim 15, further comprising the step of: when the main processor fails, synchronizing the information in the main processor to the backup processor.
20. The processing method executed in a distributed processing system according to claim 13, further comprising the step of: when the main processor fails, synchronizing the information in the main processor to the backup processor.
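The request routing described in claims 1, 3, 4, 11, 13 and 14 can be sketched as follows, assuming each service request is identified by its load group, the service query module's mapping information is a table from load group to (main, backup) processor, and the system management module's status information is a set of failed processors. `InterfaceModule` and its methods are hypothetical names, not an interface defined by the patent.

```python
from typing import Dict, Set, Tuple


class InterfaceModule:
    """Hypothetical interface processing module.

    group_map stands in for the service query module's mapping information
    (load group -> (main processor, backup processor)); `failed` stands in
    for the status information supplied by the system management module.
    """

    def __init__(self, group_map: Dict[str, Tuple[str, str]]) -> None:
        self.group_map = dict(group_map)
        self.failed: Set[str] = set()

    def report_status(self, processor: str, alive: bool) -> None:
        """Record a processor status change."""
        if alive:
            self.failed.discard(processor)
        else:
            self.failed.add(processor)

    def dispatch(self, load_group: str) -> str:
        """Name the processor that should handle a request from load_group:
        its main processor if that is up, otherwise its backup."""
        main, backup = self.group_map[load_group]
        if main not in self.failed:
            return main
        if backup not in self.failed:
            return backup
        raise RuntimeError(f"load group {load_group} has no working processor")


if __name__ == "__main__":
    # Three processors, each the main processor for two load groups whose
    # backups sit on different processors.
    iface = InterfaceModule({
        "G1": ("P1", "P2"), "G2": ("P1", "P3"),
        "G3": ("P2", "P3"), "G4": ("P2", "P1"),
        "G5": ("P3", "P1"), "G6": ("P3", "P2"),
    })
    iface.report_status("P1", alive=False)
    print(iface.dispatch("G1"), iface.dispatch("G2"))  # P2 P3
```

With three processors (the minimum in claims 2 and 12) and the two groups of each processor backed up on different processors, a failure of P1 splits its load between P2 and P3 instead of moving all of it onto a single standby.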
PCT/CN2002/000939 2002-12-31 2002-12-31 A method of standby and controlling load in distributed data processing system WO2004059484A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CNB028299396A CN100334554C (en) 2002-12-31 2002-12-31 A method of standby and controlling load in distributed data processing system
PCT/CN2002/000939 WO2004059484A1 (en) 2002-12-31 2002-12-31 A method of standby and controlling load in distributed data processing system
AU2002357568A AU2002357568A1 (en) 2002-12-31 2002-12-31 A method of standby and controlling load in distributed data processing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2002/000939 WO2004059484A1 (en) 2002-12-31 2002-12-31 A method of standby and controlling load in distributed data processing system

Publications (1)

Publication Number Publication Date
WO2004059484A1 (en) 2004-07-15

Family

ID=32661066

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2002/000939 WO2004059484A1 (en) 2002-12-31 2002-12-31 A method of standby and controlling load in distributed data processing system

Country Status (3)

Country Link
CN (1) CN100334554C (en)
AU (1) AU2002357568A1 (en)
WO (1) WO2004059484A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100466534C (en) * 2004-11-12 2009-03-04 Huawei Technologies Co., Ltd. Method for processing fault of multimedia sub-system equipment

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1889699B (en) * 2006-07-27 2010-05-12 Huawei Technologies Co., Ltd. Service distribution method and system for a distributed system
KR102082282B1 (en) * 2016-01-14 2020-02-27 Huawei Technologies Co., Ltd. Method and system for managing resource objects

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1092886A (en) * 1992-12-08 1994-09-28 Telefonaktiebolaget LM Ericsson System for backing up a database
US5408649A (en) * 1993-04-30 1995-04-18 Quotron Systems, Inc. Distributed data access system including a plurality of database access processors with one-for-N redundancy

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5307481A (en) * 1990-02-28 1994-04-26 Hitachi, Ltd. Highly reliable online system
EP0645702B1 (en) * 1993-09-24 2000-08-02 Siemens Aktiengesellschaft Load balancing method in a multiprocessor system
SE9404295D0 (en) * 1994-12-09 1994-12-09 Ellemtel Utvecklings Ab Methods and apparatus for telecommunications

Also Published As

Publication number Publication date
CN1695120A (en) 2005-11-09
CN100334554C (en) 2007-08-29
AU2002357568A1 (en) 2004-07-22

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SC SD SE SG SK SL TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR IE IT LU MC NL PT SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 EP: the EPO has been informed by WIPO that EP was designated in this application
WWE WIPO information: entry into national phase

Ref document number: 20028299396

Country of ref document: CN

122 EP: PCT application non-entry in European phase
NENP Non-entry into the national phase

Ref country code: JP

WWW WIPO information: withdrawn in national office

Country of ref document: JP