CN100432940C - Method for distributing shared resource lock in computer cluster system and cluster system - Google Patents


Info

Publication number
CN100432940C
CN100432940C · CNB2006101409834A · CN200610140983A
Authority
CN
China
Prior art keywords
lock
node
shared resource
application
request
Prior art date
Legal status
Active
Application number
CNB2006101409834A
Other languages
Chinese (zh)
Other versions
CN1945539A (en)
Inventor
吴俊敏
张少林
Current Assignee
University of Science and Technology of China USTC
Huawei Technologies Co Ltd
Original Assignee
University of Science and Technology of China USTC
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by University of Science and Technology of China USTC, Huawei Technologies Co Ltd filed Critical University of Science and Technology of China USTC
Priority to CNB2006101409834A priority Critical patent/CN100432940C/en
Publication of CN1945539A publication Critical patent/CN1945539A/en
Application granted granted Critical
Publication of CN100432940C publication Critical patent/CN100432940C/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

This invention discloses a method for allocating shared resource locks in a computer cluster system. More than one member node of the system serves as a lock management node for the shared resources in the system, and each shared resource corresponds to exactly one lock management node. When an application requests or releases a shared resource lock, the request is sent to the lock management node corresponding to that shared resource, and that lock management node completes the allocation or release of the lock. The invention also discloses a corresponding computer and cluster system. The invention can be used to provide a highly available lock service in the cluster, offers good scalability, and achieves load balancing.

Description

Shared resource lock allocation method in a computer cluster, and corresponding computer and cluster system
Technical field
The present invention relates to computer systems, and in particular to a method for allocating shared resource locks in a computer cluster, and to a corresponding computer and cluster system.
Background art
A lock is a key concept in computer systems. In a single-machine system, locks are used to ensure mutually exclusive access by multiple processes or threads to the same shared resource. The simplest lock types are shared locks and exclusive locks: at any given time a shared resource may hold several shared locks, or a single exclusive lock, but not both. In a computer cluster, a distributed lock guarantees mutually exclusive access to the same shared resource by processes distributed across the member nodes (i.e. the physical machines in the cluster). Cluster-based applications make heavy use of such a distributed lock service.
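For illustration only (this sketch is not part of the claimed method; the names LockMode and can_grant are hypothetical), the shared/exclusive compatibility rule described above can be expressed as follows:

```python
# Minimal sketch of the shared/exclusive compatibility rule described above.
# All identifiers (LockMode, can_grant) are illustrative, not taken from the patent.
from enum import Enum

class LockMode(Enum):
    SHARED = "shared"
    EXCLUSIVE = "exclusive"

def can_grant(requested: LockMode, granted: list[LockMode]) -> bool:
    """A resource may hold many shared locks or one exclusive lock."""
    if not granted:
        return True                      # nothing held yet
    if requested is LockMode.EXCLUSIVE:
        return False                     # exclusive needs the resource to be free
    return all(m is LockMode.SHARED for m in granted)  # shared mixes only with shared
```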
Cluster-based applications can, on the one hand, provide greater service capacity and, on the other, higher service availability. High availability is usually measured by the fraction of a year during which the software or hardware operates normally; 99.999% availability, for example, means no more than about five minutes of service outage per year, counting both planned outages (such as system upgrades) and unplanned ones (such as hardware or software failures or sudden power loss). High availability requires that when a member node or process in the cluster fails, tasks can be switched over quickly and smoothly without losing run-time state. The distributed lock service in the cluster must itself be highly available: a node failure must not corrupt or lose lock allocation information and thereby cause a service fault.
The key technologies for a highly available cluster lock service include distributed lock management and lock recovery.
One prior-art implementation of a highly available cluster lock service uses a master/backup architecture: a single master node acts as the lock management node for all resources in the cluster, while one or more other nodes serve as backups (secondary nodes) for that lock management node. A heartbeat mechanism between the master and backup nodes monitors their state, and the current lock allocation information is replicated synchronously or asynchronously. When the master node is found to have failed, a backup node takes over its work and continues to maintain the lock allocations for all resources in the cluster.
This prior-art scheme is simple and clear to implement; with a single management node it performs well in small clusters or under light load, and its high availability is guaranteed. Its drawback is poor scalability: as the cluster grows or the load increases, the master node becomes a potential performance bottleneck of the cluster service.
Summary of the invention
The invention provides a method for allocating shared resource locks in a computer cluster, in order to solve the prior-art problem that a single node, acting as the lock management node for all resources in the cluster, cannot satisfy service requests when the cluster grows or the load becomes heavy.
Based on this lock allocation method, the invention further provides a corresponding computer and cluster system.
The method of the invention comprises:
using more than one member node of the system as lock management nodes for the shared resources in the system, each shared resource corresponding to exactly one lock management node;
when an application requests or releases a shared resource lock, sending the request to the lock management node corresponding to that shared resource, which completes the allocation or release of the requested lock.
According to the method of the invention, the member nodes of the system may take the member node that first requests a shared resource's lock as the lock management node of that shared resource.
According to the method of the invention, each member node keeps a global resource directory table that stores the correspondence between each shared resource in the system and its lock management node;
when an application requests or releases a shared resource lock, the local node looks up the global resource directory table it keeps and sends the lock request or release request to the lock management node found there.
According to the method of the invention, when the global resource directory table contains no record of the lock management node for the requested shared resource, the local node sends a reliable totally-ordered multicast message to every member node in the system, asking that the lock management node for this shared resource be determined; each member node determines that lock management node from the logical order of the multicast messages it receives; and the lock management node so determined allocates the lock.
Alternatively, when the global resource directory table contains no record of the lock management node for the requested shared resource, the local node sends a multicast message to every member node in the system, asking that the lock management node for this shared resource be determined; on receiving the multicast message, each member node determines that lock management node with the same hash algorithm, and the lock management node so determined allocates the lock.
After the lock management node receives a lock request, it judges according to the lock allocation rule of the corresponding shared resource whether a lock can be allocated; if so, it allows the current request to lock the shared resource.
According to the method of the invention, each lock management node keeps a lock allocation table that stores a granted-lock queue and a waiting-lock queue;
when the lock management node receives a lock request and judges, according to the lock allocation rule of the corresponding shared resource, that a lock can currently be allocated, it allocates the requested shared resource lock and records the allocated lock together with the requesting node in the granted-lock queue;
when it judges that no lock can currently be allocated, the request is placed in the waiting-lock queue; when a lock on this shared resource is released and the request has not timed out, the released lock is allocated to this request.
Each member node of the system keeps a lock request information table that stores the shared resource lock requests initiated by that node;
after the node has finished using a requested shared resource, it asks the corresponding lock management node to release the shared resource lock and deletes the corresponding lock request record from its local lock request information table.
According to the method of the invention, when a member node of the system fails, the following steps are performed:
a new lock management node is determined for the orphaned resources managed by the failed node, an orphaned resource being a shared resource that was managed by the failed node;
each member node checks whether its locally kept lock allocation table holds shared resource locks requested by the failed node; if so, it releases those locks and updates its locally kept lock allocation table;
each member node updates its locally kept global resource directory table and lock request information table according to the newly determined lock management nodes.
The new lock management node for an orphaned resource may be determined as follows:
each member node that has requested a lock on an orphaned resource multicasts, using reliable totally-ordered multicast, a status message describing its lock requests on that orphaned resource;
according to the message order of these status messages, each member node takes the node that first sent a status message for a given orphaned resource as the new lock management node of that resource.
According to the method of the invention, the new lock management node of an orphaned resource rebuilds the lock allocation information of that resource locally from the lock request status messages sent by the member nodes.
Alternatively, the new lock management node for the orphaned resources managed by the failed node may be determined as follows:
the member node that detects the failure of another member node multicasts a node-failure notification, using reliable totally-ordered multicast, throughout the cluster system; on receiving the notification, each member node uses the same hash algorithm to redetermine the management nodes of the shared resources that the failed node managed.
When a new member node joins the system, the following step is performed:
the new member node obtains the global resource directory table from another valid member node and stores a copy locally.
The invention provides a computer for use in a computer cluster as a member node of the system, the computer comprising:
a first storage module, which stores the global resource directory table recording each shared resource in the system and its lock management node;
a first functional module which, when an application requests a lock on a shared resource, looks up the global resource directory table kept in the first storage module; if the table does not record the lock management node of that shared resource, it sends a reliable totally-ordered multicast message to all member nodes of the system, asking that the corresponding lock management node be determined; if the table does record the lock management node, it sends a lock request to that lock management node;
a lock management node determination module, which receives the multicast message, determines the lock management node of the requested shared resource, and judges whether the local node is that lock management node; if so, it notifies the shared resource lock management module of the local node;
a shared resource lock management module, which receives lock requests and release requests and performs allocation and release of the one or more shared resource locks it manages.
The computer may further comprise:
a second storage module, which keeps a lock request information table storing the shared resource lock requests initiated by this node;
a second functional module which, after this node has finished using a requested shared resource, asks the corresponding lock management node to release the shared resource lock and deletes the corresponding record from the lock request information table stored in the second storage module.
It may also comprise:
a third storage module, which stores the granted-lock queue and the waiting-lock queue; after the shared resource lock management module receives a lock request, if it judges according to the lock allocation rule of the corresponding shared resource that a lock can currently be allocated, it allocates the requested shared resource lock and records the allocated lock together with the requesting node in the granted-lock queue; if it judges that no lock can currently be allocated, the request is placed in the waiting-lock queue until a lock on this shared resource is released and the request has not timed out, whereupon the released lock is allocated to this request.
The invention also provides a computer cluster comprising a plurality of the computers provided by the invention.
The beneficial effects of the invention are as follows:
(1) With the invention, at least two member nodes of the computer cluster act as lock management nodes, each completing the lock allocation/release operations of the one or more shared resources it manages. This overcomes the performance bottleneck of the prior art, in which a single cluster node performs the lock allocation/release of all resources in the cluster.
(2) The invention can establish the mapping between shared resources and member nodes by means such as a hash algorithm to determine the lock management node of each shared resource, thereby distributing the lock management nodes evenly and achieving load balancing.
(3) The invention can also determine the correspondence between shared resources and lock management nodes by taking the member node that first requests a shared resource's lock as that resource's lock management node. Because lock service requests from applications in the cluster arrive randomly, the lock management nodes of the shared resources are distributed randomly across the member nodes, which likewise achieves load balancing.
(4) The invention uses a single node as the lock management node of each shared resource and sets up no backup node; the lock request information is copied only locally while the request and response messages pass through the local service process, so the system responds quickly. This overcomes the slow response of the prior-art backup-management scheme, in which the lock management node (master) must synchronize with the backup node (secondary) for every lock operation before it can return a response.
Description of drawings
Figures 1A and 1B are flow charts of the shared resource lock request procedure of the invention;
Figure 2 is a schematic diagram of the structure of a computer in the cluster system of the invention.
Detailed description of the embodiments
The method for allocating shared resource locks in a computer cluster provided by the invention comprises:
using more than one member node of the system as lock management nodes for all the shared resources in the system, each shared resource corresponding to exactly one lock management node;
when an application requests or releases a shared resource lock, sending the request to the lock management node corresponding to that shared resource, which completes the allocation or release of the requested lock.
Two embodiments are described below to explain how the lock management node corresponding to a shared resource is determined.
Embodiment one: the member node that first requests a shared resource's lock becomes the lock management node of that shared resource.
Suppose the computer cluster has five member nodes: node 1, node 2, node 3, node 4 and node 5;
and ten shared resources: shared resources A, B, C, D, E, F, G, H, I and J.
Every node in the system (nodes 1-5) keeps a global resource directory table with at least two fields: the shared resource name and the corresponding lock management node. At system initialization this table contains no records on any node. After the system starts running, application processes on one or more nodes begin to issue shared resource lock requests. Suppose an application on node 1 issues a lock request for shared resource A; node 1 then looks up its locally kept global resource directory table and proceeds according to the result:
1) If the lookup shows that the table currently records no lock management node for shared resource A, node 1 sends a reliable totally-ordered multicast message to every node in the system (including itself), announcing "node 1 requests allocation of a lock on shared resource A". From the logical order of the multicast messages it receives, each node in the system determines the lock management node of shared resource A. If no node has previously requested a lock on shared resource A, node 1 is judged to be the first member node to request this lock; node 1 is therefore confirmed as the lock management node of shared resource A and allocates its locks. At the same time, every node in the system records in its locally kept global resource directory table that the lock management node of shared resource A is node 1.
Because a reliable totally-ordered multicast is used to announce "a node requests allocation of a lock on a shared resource" to every node, even if application processes on several nodes issue lock requests for the same shared resource at the same moment or at nearly the same moment, every node in the system receives the notification messages in the same order. The member node determined to be the first requester is therefore the same everywhere, and so is the lock management node determined for the shared resource.
2) If the lookup shows that the table already records the lock management node of shared resource A, node 1 sends a unicast message directly to that lock management node, requesting allocation of the shared resource lock. For example, suppose that some time later an application process on node 2 issues a request for shared resource A; node 2 looks up its local global resource directory table, learns that the corresponding management node is node 1, and sends the lock request directly to node 1, which allocates the lock.
Since no further multicast is needed once the lock management node of a shared resource can be found in the table, and a unicast message is sent directly to that node, system efficiency is greatly improved.
While the cluster system runs, applications issue lock requests for shared resources one after another. Following the method above, after the system has been running for a while, the global resource directory table stored on every node records the lock management node of every shared resource that has been requested. Suppose node 1 was the first member node to request locks on shared resources A and B, node 2 the first for C and D, node 3 the first for E and F, node 4 the first for G and H, and node 5 the first for I and J; the global resource directory table stored on each node is then as shown in Table 1 below:
Table 1
Shared resource name        Corresponding lock management node
Shared resources A, B       Member node 1
Shared resources C, D       Member node 2
Shared resources E, F       Member node 3
Shared resources G, H       Member node 4
Shared resources I, J       Member node 5
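The lookup-then-multicast logic of embodiment one can be sketched roughly as follows. This is an illustrative outline only; the names (GlobalDirectory, send_unicast, send_ordered_multicast) are hypothetical and not taken from the patent:

```python
# Illustrative sketch of embodiment one: consult the local global resource
# directory table first; fall back to a reliable totally-ordered multicast
# when the resource's lock management node is not yet recorded.

class GlobalDirectory:
    def __init__(self):
        self.manager_of = {}          # shared resource name -> lock management node id

    def lookup(self, resource):
        return self.manager_of.get(resource)

    def record(self, resource, node_id):
        self.manager_of[resource] = node_id

def request_lock(local_node, resource, directory, send_unicast, send_ordered_multicast):
    manager = directory.lookup(resource)
    if manager is not None:
        # Case 2): manager already known -- unicast the lock request to it.
        send_unicast(manager, ("LOCK_REQUEST", resource, local_node))
    else:
        # Case 1): manager unknown -- announce the request to every node.
        # Because the multicast is totally ordered, every node sees the same
        # first requester and records the same lock management node.
        send_ordered_multicast(("FIRST_REQUEST", resource, local_node))

def on_first_request(directory, resource, requester):
    # Executed by every node on delivery of the ordered multicast:
    # the first announcement delivered for this resource wins.
    if directory.lookup(resource) is None:
        directory.record(resource, requester)
```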
Embodiment two: the lock management node of each shared resource is established by a hash algorithm.
Many hash algorithms exist; the invention only requires that every shared resource in the system be mapped to a unique member node of the system, the member node mapped to a shared resource serving as that resource's lock management node. The invention does not restrict the specific hash algorithm used.
The procedure for determining the lock management node by hashing is essentially the same as in embodiment one; the difference is that, after receiving the multicast message requesting allocation of a shared resource lock, each member node in the system uses the same hash algorithm to determine the lock management node of the corresponding resource, and the lock management node so determined allocates the requested lock.
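A minimal sketch of such a hash-based mapping, assuming the list of member nodes is known to every node (the identifiers are illustrative, and the patent does not prescribe a particular hash function):

```python
import hashlib

def manager_for(resource_name: str, member_nodes: list[int]) -> int:
    """Map a shared resource to exactly one member node by hashing its name.
    Every node that evaluates this with the same membership list obtains the
    same lock management node, so no extra coordination is needed."""
    digest = hashlib.sha1(resource_name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(member_nodes)
    return member_nodes[index]

# Example: with member nodes 1..5, every node agrees on the same manager.
nodes = [1, 2, 3, 4, 5]
print(manager_for("shared resource A", nodes))
```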
The flow of the method of the invention is described in detail below with reference to the drawings. Referring to Figures 1A and 1B, it comprises the following steps:
Step S11: an application requests a lock on a shared resource; the global resource directory table kept in the local node is looked up, and the local service process sends a lock request to the lock management node corresponding to that shared resource;
Step S12: the lock management node receives the lock request;
Step S13: the lock management node judges whether a lock on this shared resource can currently be allocated; if so, step S14 is executed; otherwise, step S15;
In this step, after receiving the lock request, the lock management node judges according to the lock allocation rule of the corresponding shared resource whether a lock can be allocated;
For example, suppose this shared resource allows at most five locks to be held simultaneously by the same or different applications. After receiving a lock request for this resource, the lock management node checks whether five locks have already been allocated; if not, a lock can currently be allocated; otherwise no lock can currently be allocated;
Step S14: a shared resource lock is allocated to the current request, i.e. the current request is allowed to lock the shared resource it asked for;
In this step, after allocating the shared resource lock to the current request, the lock management node also returns a response to the requesting member node indicating that the lock request has been accepted;
Each lock management node keeps a lock allocation table that stores a granted-lock queue and a waiting-lock queue;
When the lock management node has allocated a shared resource lock to the current request, it records the allocated lock and the requesting node in the granted-lock queue; this lock request procedure is then complete;
Step S15: the lock management node places the received shared resource lock request in the waiting-lock queue, and the flow continues as shown in Figure 1B.
As shown in Figure 1B, when one or more requests are stored in a lock management node's waiting-lock queue, waiting for an occupied shared resource lock to be released, the following steps are performed:
Step S21: the lock management node judges whether any request in its locally kept waiting-lock queue has timed out (the system may set a timeout for requests; a request that times out is cancelled). If no request has timed out, step S22 follows; if a request has timed out, go to step S24;
Step S22: judge whether a lock on the requested shared resource has been released. If not, return to step S21 (if the waiting-lock queue holds only one request, continue checking whether that request has timed out; if it holds several, check the next request in turn for timeout);
If a lock has been released, step S23 is executed;
Step S23: the released lock is allocated to this request in the queue, a response indicating acceptance of the lock request is returned to the member node that issued it, and the allocated lock and requesting node are recorded in the granted-lock queue; go to step S25;
Step S24: the timed-out lock request is deleted from the waiting-lock queue and a response rejecting the lock request is returned to the member node that issued it; continue with step S25;
Step S25: judge whether the waiting-lock queue is empty. If it is, the flow ends; otherwise, return to step S21 and continue processing the waiting requests until all of them have been handled (either the requested shared resource lock is allocated, or the request times out and is cancelled).
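The allocation logic of steps S13-S15 and S21-S25 can be sketched roughly as follows. This is a simplified, single-threaded outline under the assumption of a fixed per-resource lock limit; the class and method names are illustrative, not the patent's:

```python
import time
from collections import deque

class LockManager:
    """Rough sketch of the per-resource allocation logic (steps S13-S15, S21-S25).
    max_locks models the resource's allocation rule, e.g. at most 5 concurrent locks."""

    def __init__(self, max_locks=5, timeout_s=30.0):
        self.max_locks = max_locks
        self.timeout_s = timeout_s
        self.granted = []                      # granted-lock queue: (lock_id, requester)
        self.waiting = deque()                 # waiting-lock queue: (requester, deadline)
        self.next_lock_id = 1

    def on_lock_request(self, requester):
        if len(self.granted) < self.max_locks:             # S13/S14: a lock can be allocated
            return self._grant(requester)
        self.waiting.append((requester, time.time() + self.timeout_s))  # S15: queue it
        return None

    def on_lock_release(self, lock_id):
        self.granted = [(l, n) for (l, n) in self.granted if l != lock_id]
        # S21-S25: hand the freed lock to the first waiter that has not timed out.
        while self.waiting:
            requester, deadline = self.waiting.popleft()
            if time.time() > deadline:                     # S24: timed out, reject
                continue
            return self._grant(requester)                  # S23: allocate the released lock
        return None

    def _grant(self, requester):
        lock_id = self.next_lock_id
        self.next_lock_id += 1
        self.granted.append((lock_id, requester))          # record in granted-lock queue
        return lock_id
```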
Lock release is relatively simple: after the requested shared resource has been used, the node asks the corresponding lock management node to release the shared resource lock. When a lock management node has released a shared resource lock, it deletes the corresponding allocation record from the granted-lock queue in its local lock allocation table.
So that the data of the whole service can be restored correctly as soon as possible when a member node fails, each member node also keeps its own lock request information table, storing the shared resource lock requests initiated by that node. After this node has finished using a requested shared resource, it asks the corresponding lock management node to release the shared resource lock and deletes the corresponding record from its locally stored lock request information table.
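A requester-side sketch of this bookkeeping (hypothetical names; the lock request information table is modelled as a simple dict):

```python
class MemberNode:
    """Requester-side bookkeeping sketched from the description above."""

    def __init__(self, node_id, send_unicast):
        self.node_id = node_id
        self.send_unicast = send_unicast
        self.lock_requests = {}      # lock request information table: resource -> manager node

    def record_granted(self, resource, manager):
        self.lock_requests[resource] = manager

    def release(self, resource):
        # Ask the resource's lock management node to release the lock,
        # then drop the local record of the request.
        manager = self.lock_requests.pop(resource)
        self.send_unicast(manager, ("LOCK_RELEASE", resource, self.node_id))
```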
The above flow describes in detail the shared resource lock allocation/release steps of the cluster system of the invention. The following describes how the system recovers its data when a member node fails or a new member node joins.
When a member node of the system fails, a new lock management node can be determined for the shared resources managed by the failed node (called orphaned resources below for convenience), and the data recovered, by either of the following two methods.
Method one: each member node that has requested a lock on an orphaned resource multicasts, using reliable totally-ordered multicast, a status message describing its lock requests on that orphaned resource; according to the message order, the node whose status message for a given orphaned resource is delivered first becomes the new lock management node of that resource.
Taking again the cluster system with five member nodes and ten shared resources, suppose member node 1 fails. From Table 1 above, member node 1 is the lock management node of shared resources A and B; shared resources A and B are therefore orphaned resources, and the system must determine new lock management nodes for them. To this end, every member node in the system that has requested a lock on orphaned resource A or B multicasts, using reliable totally-ordered multicast throughout the cluster, its request status for orphaned resource A or B. Suppose:
node 2 has requested locks on shared resources C, D and A;
node 3 has requested locks on shared resources E, F and B;
node 4 has requested locks on shared resources G, H, A and B;
Nodes 2, 3 and 4 all send their multicast messages using reliable totally-ordered multicast, and from the message order each member node determines that:
node 2's multicast, announcing that it has requested a lock on shared resource A, was sent first;
node 3's multicast, announcing that it has requested a lock on shared resource B, was sent after node 2's;
node 4's multicast, announcing that it has requested locks on shared resources A and B, was sent after node 3's.
Then, according to the method of the invention, the new lock management node of shared resource A is node 2, and the new lock management node of shared resource B is node 3.
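Sketched in code, the rule of method one is simply "the first delivered status message for an orphaned resource wins" (illustrative names; deliveries are assumed to arrive in the same total order at every node):

```python
def elect_new_managers(delivered_status_messages):
    """delivered_status_messages: status messages in total-order delivery order,
    each of the form (sender_node, [orphaned resources it holds locks on]).
    Returns {orphaned resource: new lock management node}."""
    new_manager = {}
    for sender, resources in delivered_status_messages:
        for resource in resources:
            # The first node whose status message mentions this resource wins.
            new_manager.setdefault(resource, sender)
    return new_manager

# Worked example from the text: node 2 first (A), then node 3 (B), then node 4 (A, B).
deliveries = [(2, ["A"]), (3, ["B"]), (4, ["A", "B"])]
print(elect_new_managers(deliveries))   # {'A': 2, 'B': 3}
```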
Method two: when a member node detects that another member node has failed, it multicasts a node-failure notification, using reliable totally-ordered multicast, throughout the cluster; on receiving the node-failure notification, each member node uses the same hash algorithm to redetermine the management nodes of the shared resources that the failed node managed.
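Under method two, each node can recompute the assignment by hashing over the surviving membership, for example reusing the manager_for sketch above with the failed node removed (again an illustration, not the patent's prescribed algorithm):

```python
def reassign_orphaned_resources(orphaned, surviving_nodes, manager_for):
    """Every surviving node runs this on receiving the node-failure notification.
    Because all nodes hash over the same surviving membership, they agree on
    the new lock management node of each orphaned resource."""
    return {resource: manager_for(resource, surviving_nodes) for resource in orphaned}
```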
Once the new lock management nodes of the orphaned resources have been determined, each valid member node performs the following data updates:
1. Update the global resource directory table.
As above, suppose the new lock management node of shared resource A is node 2 and that of shared resource B is node 3; the global resource directory table of Table 1 is then updated as shown in Table 2 below:
Table 2
Shared resource name          Corresponding lock management node
Shared resources C, D, A      Member node 2
Shared resources E, F, B      Member node 3
Shared resources G, H         Member node 4
Shared resources I, J         Member node 5
As Table 2 shows, member node 1 has failed and is no longer a valid member of the system; shared resources A and B, which it originally managed, are now managed by nodes 2 and 3 respectively.
2. Update the lock request information table.
As above, node 2 has requested locks on shared resources C, D and A, so before node 1 failed its locally stored lock request information table was as shown in Table 3 below:
Table 3
Requested shared resource     Corresponding lock management node
Shared resource C             Member node 2
Shared resource D             Member node 2
Shared resource A             Member node 1
After node 1 fails, node 2 updates its locally stored lock request information table as shown in Table 4 below:
Table 4
Requested shared resource     Corresponding lock management node
Shared resource C             Member node 2
Shared resource D             Member node 2
Shared resource A             Member node 2
Likewise, node 3 has requested locks on shared resources E, F and B, so before node 1 failed its locally stored lock request information table was as shown in Table 5 below:
Table 5
Requested shared resource     Corresponding lock management node
Shared resource E             Member node 3
Shared resource F             Member node 3
Shared resource B             Member node 1
After node 1 fails, node 3 updates its locally stored lock request information table as shown in Table 6 below:
Table 6
Requested shared resource     Corresponding lock management node
Shared resource E             Member node 3
Shared resource F             Member node 3
Shared resource B             Member node 3
Likewise, node 4 has requested locks on shared resources G, H, A and B, so before node 1 failed its locally stored lock request information table was as shown in Table 7 below:
Table 7
Requested shared resource     Corresponding lock management node
Shared resource G             Member node 4
Shared resource H             Member node 4
Shared resource A             Member node 1
Shared resource B             Member node 1
After node 1 fails, node 4 updates its locally stored lock request information table as shown in Table 8 below:
Table 8
Requested shared resource     Corresponding lock management node
Shared resource G             Member node 4
Shared resource H             Member node 4
Shared resource A             Member node 2
Shared resource B             Member node 3
3. Update the local lock allocation table.
Suppose that before it failed, member node 1 had requested locks on shared resources A, B, C and G.
Following the example above, the local lock allocation table stored by node 2 before node 1 failed (showing only the granted-lock queue) was as shown in Table 9 below:
Table 9
Granted lock                        Member node that issued the lock request
1st lock on shared resource C       Member node 2
1st lock on shared resource D       Member node 2
2nd lock on shared resource C       Member node 1
After node 1 fails, node 2 must release the lock requested by the failed node 1 (i.e. release the 2nd lock on shared resource C). At the same time, node 2 has become the new lock management node of shared resource A, and in the example above both node 2 and node 4 had requested locks on orphaned resource A. The local lock allocation table of node 2 after the update is therefore as shown in Table 10 below:
Table 10
Granted lock                        Member node that issued the lock request
1st lock on shared resource C       Member node 2
1st lock on shared resource D       Member node 2
2nd lock on shared resource A       Member node 2
3rd lock on shared resource A       Member node 4
For node 3, the local lock allocation table stored before node 1 failed (showing only the granted-lock queue) was as shown in Table 11 below:
Table 11
Granted lock                        Member node that issued the lock request
1st lock on shared resource E       Member node 3
1st lock on shared resource F       Member node 3
After node 1 fails, node 3 has become the new lock management node of shared resource B, and in the example above both node 3 and node 4 had requested locks on orphaned resource B. The local lock allocation table of node 3 after the update is therefore as shown in Table 12 below:
Table 12
Granted lock                        Member node that issued the lock request
1st lock on shared resource E       Member node 3
1st lock on shared resource F       Member node 3
2nd lock on shared resource B       Member node 3
3rd lock on shared resource B       Member node 4
Likewise, for node 4, the local lock allocation table stored before node 1 failed was as shown in Table 13 below:
Table 13
Granted lock                        Member node that issued the lock request
1st lock on shared resource G       Member node 4
1st lock on shared resource H       Member node 4
2nd lock on shared resource G       Member node 1
After node 1 fails, node 4 must release the lock requested by the failed node 1 (i.e. release the 2nd lock on shared resource G). The local lock allocation table of node 4 after the update is therefore as shown in Table 14 below:
Table 14
Granted lock                        Member node that issued the lock request
1st lock on shared resource G       Member node 4
1st lock on shared resource H       Member node 4
In this way, by updating the above three tables, each member node completes the recovery of the system-wide data and ensures the consistency of the related data on every node.
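The three-table update performed by each surviving node might be outlined as follows (illustrative only; the tables are modelled as dicts, reusing names from the earlier sketches):

```python
def recover_from_failure(failed_node, new_manager, directory, lock_requests, granted):
    """Run on every surviving member node after the new lock management nodes
    of the orphaned resources (new_manager: resource -> node) are known.
    directory: resource -> manager, lock_requests: resource -> manager (this node's requests),
    granted: list of (lock_id, resource, requester) held by this node as a manager."""
    # 1. Update the global resource directory table.
    for resource, manager in list(directory.items()):
        if manager == failed_node:
            directory[resource] = new_manager[resource]
    # 2. Update the lock request information table.
    for resource, manager in list(lock_requests.items()):
        if manager == failed_node:
            lock_requests[resource] = new_manager[resource]
    # 3. Update the local lock allocation table: drop locks held by the failed node.
    granted[:] = [entry for entry in granted if entry[2] != failed_node]
```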
When a new member node joins the system, it can obtain the system's current global resource directory table from any valid member node that is running normally. For example, the new member node sends a multicast message requesting the global resource directory table; the valid member nodes that are already running send their global resource directory tables to it, and the new node thereby obtains the system's current table. If the new node receives several copies, it may store any one of them. Alternatively, the new member node may multicast a request for the status of the other member nodes and, based on the status information obtained, fetch the global resource directory table from the node that has been alive longest.
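A sketch of this join procedure (hypothetical names; the "alive longest" variant is indicated as a comment):

```python
def join_cluster(new_node_id, send_multicast, wait_for_replies):
    """Illustrative join procedure for a new member node."""
    # Ask every valid member node for its global resource directory table.
    send_multicast(("DIRECTORY_REQUEST", new_node_id))
    replies = wait_for_replies()          # list of (sender, directory_copy)
    if not replies:
        return {}                         # first node in the cluster: empty directory
    # Any copy will do; alternatively, query node status first and take the
    # copy from the member node with the longest uptime.
    _, directory_copy = replies[0]
    return dict(directory_copy)           # store one copy locally
```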
In accordance with the shared resource lock allocation method described above, the invention provides a corresponding computer. Each computer serves as a member node of the computer cluster and, as shown in Figure 2, comprises:
a first storage module 11, which stores the global resource directory table recording each shared resource in the system and its lock management node;
a first functional module 12 which, when an application requests a lock on a shared resource, looks up the global resource directory table kept in the first storage module 11; if the table does not record the lock management node of that shared resource, it sends a reliable totally-ordered multicast message to all member nodes of the system; if the table does record the lock management node, it sends a lock request to that lock management node;
a lock management node determination module 13, which receives the multicast message, determines the lock management node of the requested shared resource, and judges whether the local node is that lock management node; if so, it notifies the shared resource lock management module 16 of the local node;
a shared resource lock management module 16, which receives lock requests and release requests and performs allocation and release of the one or more shared resource locks it manages.
The computer further comprises:
a second storage module 14, which stores the shared resource lock requests initiated by this node;
a second functional module 15 which, after this node has finished using a requested shared resource, asks the corresponding lock management node to release the shared resource lock and deletes the corresponding record from the lock request information table stored in the second storage module 14;
and further comprises:
a third storage module 17, which stores the granted-lock queue and the waiting-lock queue. After the shared resource lock management module 16 receives a lock request, if it judges according to the lock allocation rule of the corresponding shared resource that a lock can currently be allocated, it allocates the requested shared resource lock and records the allocated lock together with the requesting node in the granted-lock queue; if it judges that no lock can currently be allocated, the request is placed in the waiting-lock queue; when a lock on this shared resource is released and the request has not timed out, the released lock is allocated to this request.
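The module structure of Figure 2 might be captured by a class skeleton like the following (an organisational sketch only; module and method names are invented for illustration, and the transport hooks are deliberately left abstract):

```python
class ClusterMemberComputer:
    """Skeleton of the member-node computer of Figure 2 (modules 11-17)."""

    def __init__(self):
        self.global_directory = {}     # first storage module 11
        self.lock_request_table = {}   # second storage module 14
        self.granted_queue = []        # third storage module 17: granted-lock queue
        self.waiting_queue = []        # third storage module 17: waiting-lock queue

    def request_lock(self, resource):                       # first functional module 12
        manager = self.global_directory.get(resource)
        if manager is None:
            self.multicast_determine_manager(resource)
        else:
            self.send_lock_request(manager, resource)

    def on_manager_multicast(self, resource, requester):    # determination module 13
        manager = self.determine_manager(resource, requester)
        self.global_directory[resource] = manager
        if manager == self.node_id():
            self.lock_manager_handle(resource, requester)   # notify module 16

    def release_lock(self, resource):                       # second functional module 15
        manager = self.lock_request_table.pop(resource)
        self.send_lock_release(manager, resource)

    # Transport and policy hooks, left abstract in this sketch.
    def multicast_determine_manager(self, resource): raise NotImplementedError
    def send_lock_request(self, manager, resource): raise NotImplementedError
    def determine_manager(self, resource, requester): raise NotImplementedError
    def lock_manager_handle(self, resource, requester): raise NotImplementedError
    def send_lock_release(self, manager, resource): raise NotImplementedError
    def node_id(self): raise NotImplementedError
```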
A plurality of the computers provided by the invention described above form a computer cluster, thereby implementing the method provided by the invention.
In summary, by providing two or more member nodes in the computer cluster as lock management nodes, each completing the lock allocation/release operations of the one or more shared resources it manages, the invention overcomes the performance bottleneck of the prior art, in which a single node performs the lock allocation/release of all resources in the cluster.
The invention can establish the mapping between shared resources and member nodes by means such as a hash algorithm, determining the lock management node of each shared resource and thereby distributing the lock management nodes evenly and achieving load balancing. Alternatively, the correspondence between shared resources and lock management nodes can be determined by taking the member node that first requests a shared resource's lock as that resource's lock management node; because lock service requests from applications in the cluster arrive randomly, the lock management nodes of the shared resources are distributed randomly across the member nodes, which likewise achieves load balancing.
The invention uses a single node as the lock management node of each shared resource and sets up no backup node; the lock request information is copied only locally while the request and response messages pass through the local service process, so the system responds quickly. This overcomes the slow response of the prior-art backup-management scheme, in which the lock management node (master) must synchronize with the backup node (secondary) for every lock operation before it can return a response.
Obviously, those skilled in the art can make various changes and modifications to the invention without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of the invention and their technical equivalents, the invention is intended to cover them as well.

Claims (17)

1, shared resource lock distribution method in a kind of computer cluster is characterized in that, comprising:
With the lock management node of more than one member node in the system as shared resource in the system, the only corresponding lock management node of each shared resource;
When a certain application request application or release shared resource lock, ask to the lock management node transmission of this shared resource correspondence, finish to lock by the lock management node of correspondence and apply for distributing or release.
2, the method for claim 1 is characterized in that, each member node in the system is with the member node of the first application shared resource lock lock management node as this shared resource.
3, method as claimed in claim 2 is characterized in that, preserves the global resource directory information table in each member node, the corresponding relation of the lock management node of shared resource in the storage system and correspondence thereof;
When a certain application request application or when discharging a certain shared resource lock, the described global resource directory information table that the local node inquiry self is preserved is initiated the lock application or is discharged request to the corresponding lock management node that inquires.
4, method as claimed in claim 3, it is characterized in that, when in inquiring described global resource directory information table, not having the lock management nodal information of shared resource correspondence of record request, described local node each member node in system sends reliable orderly multicast message, and the lock management node of this shared resource correspondence is determined in request; Each member node is determined the lock management node of this shared resource correspondence according to the logic preface of the multicast message that receives; And lock distribution by the corresponding lock management node of determining.
5, method as claimed in claim 3, it is characterized in that, when the lock management nodal information of the shared resource correspondence that does not have record request in the described global resource directory information table, described local node each member node in system sends multicast message, and the lock management node of this shared resource correspondence is determined in request; After each member node receives described multicast message, adopt the lock management node of determining this shared resource correspondence with a kind of hash algorithm, and lock distribution by the corresponding lock management node of determining.
6, as the described method of the arbitrary claim of claim 1-5, it is characterized in that, after described lock management node is received lock application request, judge according to the lock distribution principle of corresponding shared resource and can lock the branch timing, allow current request that this shared resource is locked.
7, method as claimed in claim 6 is characterized in that, preserves the lock allocation information table in each lock management node, the lock formation that storage has distributed and etc. lock formation to be allocated;
When the lock management node is received lock application request and judged the current lock branch timing that has according to the lock distribution principle of corresponding shared resource, the shared resource lock of request for allocation, and will distribute the nodal information of lock and initiation lock application request to be deposited in the lock formation that has distributed;
Divide a timing when judging current lock, this request is inserted etc. in the lock formation to be allocated; Up to this shared resource have lock to discharge and request not overtime, the lock that will discharge is distributed to this request again.
8, as the described method of the arbitrary claim of claim 1-5, it is characterized in that, in system's member node, preserve lock application information table, the shared resource lock application information that storage is initiated by this node;
After the shared resource of application was finished using, the corresponding lock management node of request discharged the shared resource lock of application, and corresponding lock application record in the local lock of the deletion application information table.
9, method as claimed in claim 8 is characterized in that, when having member node to lose efficacy in the system, carries out the following step:
The stale resource will of managing for failure node in the system redefines new lock management node; Described stale resource will is the shared resource that failure node is managed;
The shared resource lock whether the failure node application is arranged in the local lock allocation information table of preserving of each member node inspection if having, then discharges the shared resource lock of failure node application, and upgrades local lock allocation information table of preserving;
Each member node is according to local global dictionary information table of preserving of the new lock management node updates of determining and lock application information table.
10, method as claimed in claim 9 is characterized in that, described for stale resource will redefines new lock management node, concrete grammar comprises:
Take reliable orderly multicast mode multicast separately to the application status message of described stale resource will lock by the member node of having applied for the stale resource will lock;
Each member node sends the new lock management node of the node of a certain stale resource will lock application status message as this stale resource will according to the message preface of described application status message with first.
11, method as claimed in claim 10 is characterized in that, the application status message that the stale resource will that the new lock management node of described stale resource will sends according to described each member node is locked is in the lock assignment information of this stale resource will of local recovery.
12, method as claimed in claim 9 is characterized in that, described stale resource will of managing for failure node in the system redefines new lock management node, and concrete grammar comprises:
The member node that detects a certain member node inefficacy is taked reliable orderly multicast mode multicast node failure notification in whole group system, after each member node receives described node failure notice, adopt the new management node that redefines out the shared resource that failure node manages with a kind of hash algorithm.
13, method as claimed in claim 8 is characterized in that, there have newcomer's node to add in system to be fashionable, carries out the following step:
Described newcomer's node obtains described global resource directory information table from other effective member node and storage is a in this locality.
14, a kind of computing machine is applied to computer cluster, and described computing machine is as a member node of described system; It is characterized in that, in described computing machine, comprise:
First memory module, the global resource directory information table of the lock management node of shared resource in the storage system and correspondence thereof;
First functional module, be used for when a certain shared resource of a certain application request application is locked, inquire about the global resource directory information table of preserving in described first memory module, if do not write down the lock management nodal information of this shared resource correspondence in the described global resource directory information table, then send the reliable multicast message that transmits in order to the whole member node in the system, corresponding lock management node is determined in request; If record the lock management nodal information of this shared resource correspondence in the described global resource directory information table, then initiate lock application request to corresponding lock management node;
Lock management node determination module receives described multicast message, determines the lock management node of the shared resource correspondence of request, and judges whether this node is corresponding lock management node; When judging this node, send a notification message to the shared resource lock management module of this node for corresponding lock management node;
Shared resource lock management module receives the lock application or discharges request, and its one or more shared resource locks of managing are realized distributing and releasing operation.
15, computing machine as claimed in claim 14 is characterized in that, also comprises:
Second memory module is preserved lock application information table, the shared resource lock application information that storage is initiated by this node;
Second functional module, after the shared resource of this node application was finished using, the corresponding lock management node of request discharged the shared resource lock of application, and deletes shared resource application record corresponding in the lock application information table of storing in second memory module.
16, as claim 14 or 15 described computing machines, it is characterized in that, also comprise:
The 3rd memory module, the lock formation that distributed of storage and etc. lock formation to be allocated; After shared resource lock management module receives lock application request, if judge the current lock branch timing that has according to the lock distribution principle of corresponding shared resource, then the shared resource of request for allocation is locked, and the nodal information that will distribute lock and the application of initiation lock to ask is deposited in the lock formation that has distributed; Divide timing if judge current lock, then this request is inserted etc. in the lock formation to be allocated, have lock to discharge and request when not overtime up to this shared resource, the lock that will discharge is distributed to this request again.
17. A computer cluster system, characterized in that it comprises a plurality of computers as claimed in claim 14.
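The hash-based failover described in the method claims above, and the directory copy a newly joined member performs in claim 13, can be illustrated with a small Python sketch. This is a minimal sketch only: the function names (pick_lock_manager, handle_node_failure, handle_node_join) are hypothetical, and the MD5-modulo mapping stands in for whichever hash algorithm an implementation actually adopts.

```python
import hashlib

def pick_lock_manager(resource_id: str, members: list[str]) -> str:
    """Deterministically map a shared resource to a lock management node.

    Every member runs the same computation over the same sorted member
    list, so all nodes arrive at the same answer without extra messages.
    """
    members = sorted(members)
    digest = hashlib.md5(resource_id.encode()).hexdigest()
    return members[int(digest, 16) % len(members)]

def handle_node_failure(directory: dict[str, str],
                        failed_node: str,
                        surviving_members: list[str]) -> None:
    """Reassign every resource previously managed by the failed node."""
    for resource_id, manager in directory.items():
        if manager == failed_node:
            directory[resource_id] = pick_lock_manager(resource_id,
                                                       surviving_members)

def handle_node_join(local_copy: dict[str, str],
                     directory_from_peer: dict[str, str]) -> None:
    """A newly joined member copies the global resource directory (claim 13)."""
    local_copy.clear()
    local_copy.update(directory_from_peer)
```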
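The member-node side of claims 14 and 15 (the global resource directory lookup, the multicast fallback when no entry exists, and the local lock application table that is cleaned up on release) might look roughly like the sketch below. The transport object and its send_reliable_multicast / send_lock_request methods are assumptions introduced for illustration, not an API defined by the patent.

```python
class LockClient:
    """Member-node side of the lock service: a sketch of the first/second
    memory modules and first/second functional modules of claims 14-15."""

    def __init__(self, node_id, transport):
        self.node_id = node_id
        self.transport = transport       # assumed messaging layer
        self.global_directory = {}       # resource_id -> lock management node
        self.lock_applications = {}      # resource_id -> mode applied for by this node

    def request_lock(self, resource_id, mode="exclusive"):
        manager = self.global_directory.get(resource_id)
        if manager is None:
            # No directory entry: ask the whole cluster, via reliable ordered
            # multicast, to determine this resource's lock management node.
            self.transport.send_reliable_multicast(
                {"type": "DETERMINE_MANAGER",
                 "resource": resource_id,
                 "requester": self.node_id})
            return
        # Directory entry found: apply for the lock at the known manager and
        # record the application locally (lock application information table).
        self.transport.send_lock_request(
            manager,
            {"type": "LOCK_REQUEST", "resource": resource_id,
             "mode": mode, "requester": self.node_id})
        self.lock_applications[resource_id] = mode

    def release_lock(self, resource_id):
        # After the shared resource is no longer needed, ask its manager to
        # release the lock and delete the local application record (claim 15).
        manager = self.global_directory.get(resource_id)
        if manager is not None and resource_id in self.lock_applications:
            self.transport.send_lock_request(
                manager,
                {"type": "LOCK_RELEASE", "resource": resource_id,
                 "requester": self.node_id})
            del self.lock_applications[resource_id]
```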
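Claim 16's allocated-lock queue and to-be-allocated queue amount to a small state machine on the lock management node: grant a lock when the distribution principle allows it, otherwise park the request until a release arrives and the request has not timed out. The sketch below assumes a simple shared/exclusive compatibility rule and a per-request wall-clock deadline; it is one possible reading of the claim, not the patented implementation.

```python
import time
from collections import deque

class LockManager:
    """Lock management node state for one shared resource (sketch of claim 16)."""

    def __init__(self):
        self.allocated = []     # (node_id, mode) pairs currently holding the lock
        self.pending = deque()  # (node_id, mode, deadline) waiting for allocation

    def _can_allocate(self, mode):
        # Lock distribution principle: many shared holders or one exclusive holder.
        if not self.allocated:
            return True
        if mode == "shared":
            return all(m == "shared" for _, m in self.allocated)
        return False

    def request(self, node_id, mode, timeout=30.0):
        if self._can_allocate(mode):
            self.allocated.append((node_id, mode))   # allocated-lock queue
            return True
        # No lock available: park the request in the to-be-allocated queue.
        self.pending.append((node_id, mode, time.monotonic() + timeout))
        return False

    def release(self, node_id):
        self.allocated = [(n, m) for n, m in self.allocated if n != node_id]
        # Hand the freed lock to the first waiting request that has not timed out.
        while self.pending:
            waiter, mode, deadline = self.pending[0]
            if time.monotonic() > deadline:
                self.pending.popleft()               # expired request is dropped
                continue
            if self._can_allocate(mode):
                self.pending.popleft()
                self.allocated.append((waiter, mode))
            break
```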
CNB2006101409834A 2006-10-19 2006-10-19 Method for distributing shared resource lock in computer cluster system and cluster system Active CN100432940C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2006101409834A CN100432940C (en) 2006-10-19 2006-10-19 Method for distributing shared resource lock in computer cluster system and cluster system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2006101409834A CN100432940C (en) 2006-10-19 2006-10-19 Method for distributing shared resource lock in computer cluster system and cluster system

Publications (2)

Publication Number Publication Date
CN1945539A CN1945539A (en) 2007-04-11
CN100432940C true CN100432940C (en) 2008-11-12

Family

ID=38044954

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2006101409834A Active CN100432940C (en) 2006-10-19 2006-10-19 Method for distributing shared resource lock in computer cluster system and cluster system

Country Status (1)

Country Link
CN (1) CN100432940C (en)

Families Citing this family (47)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101256510B (en) * 2008-04-11 2010-06-16 中兴通讯股份有限公司 Cluster system and method for implementing centralized management thereof
CN101650646B (en) * 2009-09-22 2012-02-08 杭州华三通信技术有限公司 Method and device for realizing shared data consistency
CN102262559B (en) * 2010-05-24 2013-08-14 腾讯科技(深圳)有限公司 Resource sharing method and system
CN102339283A (en) * 2010-07-20 2012-02-01 中兴通讯股份有限公司 Access control method for cluster file system and cluster node
US8561080B2 (en) * 2011-04-26 2013-10-15 Sap Ag High-load business process scalability
CN102355473B (en) * 2011-06-28 2013-12-25 用友软件股份有限公司 Locking control system in distributed computing environment and method
CN102388367B (en) * 2011-08-17 2014-11-05 华为技术有限公司 Processor management method, lock competitive management device and computer system
CN102426540B (en) * 2011-11-14 2013-06-05 苏州阔地网络科技有限公司 Global session backup switching method and device in distributed instant communication software
CN103248667B (en) * 2012-02-14 2016-03-30 阿里巴巴集团控股有限公司 A kind of resource access method of distributed system and system
CN103297456B (en) * 2012-02-24 2016-09-28 阿里巴巴集团控股有限公司 Access method and the distributed system of resource is shared under a kind of distributed system
CN103544189A (en) * 2012-07-17 2014-01-29 珠海金山办公软件有限公司 Method and system for locking currently-edited file
CN103488526A (en) * 2013-09-02 2014-01-01 用友软件股份有限公司 System and method for locking business resource in distributed system
US9742685B2 (en) * 2013-09-26 2017-08-22 International Business Machines Corporation Enhanced mechanisms for granting access to shared resources
CN104657260B (en) * 2013-11-25 2018-05-15 航天信息股份有限公司 The implementation method of the distributed lock of shared resource is accessed between control distributed node
CN103731485A (en) * 2013-12-26 2014-04-16 华为技术有限公司 Network equipment, cluster storage system and distributed lock management method
CN104753987B (en) * 2013-12-26 2019-03-01 北京东方通科技股份有限公司 A kind of distributed conversation management method and system
CN104702655B (en) * 2014-03-21 2018-04-27 杭州海康威视系统技术有限公司 Cloud storage resource allocation methods and its system
CN104102547B (en) * 2014-07-25 2017-12-26 珠海全志科技股份有限公司 The synchronous method and its sychronisation of multicomputer system
CN104239418B (en) * 2014-08-19 2018-01-19 天津南大通用数据技术股份有限公司 Support the distribution locking method and distributed data base system of distributed data base
HUE042424T2 (en) * 2014-11-12 2019-07-29 Huawei Tech Co Ltd Lock server malfunction processing method and system thereof in distribution system
CN104461705B (en) * 2014-11-17 2019-02-19 华为技术有限公司 A kind of method and storage control, cluster storage system of business access
CN104461707B (en) * 2014-11-28 2018-09-28 华为技术有限公司 a kind of lock request processing method and device
CN104780613B (en) * 2015-04-23 2018-03-02 河北远东通信系统工程有限公司 Resource-sharing and synchronous method between Digital Clustering base station and switching centre
CN105069008A (en) * 2015-07-03 2015-11-18 曙光信息产业股份有限公司 Distributed system data processing method and apparatus
CN106712981B (en) * 2015-07-23 2020-03-06 阿里巴巴集团控股有限公司 Node change notification method and device
CN105069081B (en) * 2015-07-31 2019-05-07 北京金山安全软件有限公司 Shared resource access method and device
CN105426469A (en) * 2015-11-16 2016-03-23 天津南大通用数据技术股份有限公司 Database cluster metadata management method and system
CN106713398A (en) * 2015-11-18 2017-05-24 中兴通讯股份有限公司 Communication monitoring method and monitoring node of shared storage type cluster file system node
CN106991008B (en) * 2016-01-20 2020-12-18 华为技术有限公司 Resource lock management method, related equipment and system
CN105760519B (en) * 2016-02-26 2020-08-28 北京鲸鲨软件科技有限公司 Cluster file system and file lock distribution method thereof
CN107301091A (en) * 2016-04-14 2017-10-27 北京京东尚科信息技术有限公司 Resource allocation methods and device
CN105721617B (en) * 2016-04-28 2019-05-14 安徽四创电子股份有限公司 A kind of rolling update method of cloud service system
CN106293934B (en) * 2016-07-19 2019-02-01 浪潮(北京)电子信息产业有限公司 A kind of cluster system management optimization method and platform
CN107066324A (en) 2017-03-08 2017-08-18 广东欧珀移动通信有限公司 A kind of control method and equipment of finger prints processing resource
CN106851123B (en) 2017-03-09 2020-12-22 Oppo广东移动通信有限公司 Exposure control method, exposure control device and electronic device
WO2018176397A1 (en) 2017-03-31 2018-10-04 华为技术有限公司 Lock allocation method, device and computing apparatus
CN107402821B (en) * 2017-07-03 2020-06-30 阿里巴巴集团控股有限公司 Access control method, device and equipment for shared resources
CN107515935A (en) * 2017-08-29 2017-12-26 郑州云海信息技术有限公司 A kind of method and system for releasing file lock failure
CN109753540A (en) * 2018-12-03 2019-05-14 新华三云计算技术有限公司 Shared resource access method, device and computer-readable storage medium
CN109344136A (en) * 2018-12-13 2019-02-15 浪潮(北京)电子信息产业有限公司 A kind of access method of shared-file system, device and equipment
CN110430258B (en) * 2019-08-01 2021-12-24 赵志强 Distributed lock management method and device
CN110708187A (en) * 2019-09-11 2020-01-17 上海爱数信息技术股份有限公司 Tape library management device and method supporting cluster
US11321300B2 (en) * 2019-10-08 2022-05-03 Huawei Technologies Co., Ltd. Method and system for fast processing of locks requested to access a shared resource
CN110990161A (en) * 2019-11-15 2020-04-10 北京浪潮数据技术有限公司 Shared resource access method, device, equipment and computer readable storage medium
CN113032407A (en) * 2019-12-24 2021-06-25 顺丰科技有限公司 Processing method and device of mutual exclusion function, storage medium and computer equipment
CN112835982B (en) * 2021-02-26 2023-03-24 浪潮云信息技术股份公司 Table lock implementation method based on distributed database
CN117519945A (en) * 2023-12-07 2024-02-06 北京优炫软件股份有限公司 Database resource scheduling method, device and system

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6542926B2 (en) * 1998-06-10 2003-04-01 Compaq Information Technologies Group, L.P. Software partitioned multi-processor system with flexible resource sharing levels
CN1703891A (en) * 2002-11-29 2005-11-30 国际商业机器公司 High-performance lock management for flash copy in N-way shared storage systems
CN1808389A (en) * 2006-02-20 2006-07-26 南京联创科技股份有限公司 Autonomous locking method based on shared memory for account background memory database

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A cluster architecture based on a server active-scheduling strategy. Zhao Wenlai, Yu Dongmei, Yang Junxiu, Zhu Chaoqun. Journal of Lanzhou University of Technology, Vol. 30, No. 2. 2004 *
A server/load-balancer cooperative cluster architecture. Wu Huanhuan, Wan Xiaodong. Science Mosaic. 2005 *

Also Published As

Publication number Publication date
CN1945539A (en) 2007-04-11

Similar Documents

Publication Publication Date Title
CN100432940C (en) Method for distributing shared resource lock in computer cluster system and cluster system
CN107888657B (en) Low latency distributed storage system
CN101751415B Metadata service system, metadata synchronization method and write server updating method
JP2003022209A (en) Distributed server system
CN112463366B (en) Cloud-native-oriented micro-service automatic expansion and contraction capacity and automatic fusing method and system
US8818942B2 (en) Database system with multiple layer distribution
CN102831156A (en) Distributed transaction processing method on cloud computing platform
US9201747B2 (en) Real time database system
CN101960427A Balanced consistent hashing for distributed resource management
CN102025550A (en) System and method for managing data in distributed cluster
CN102355473A (en) Locking control system in distributed computing environment and method
US20100161897A1 (en) Metadata server and disk volume selecting method thereof
CN103744719A (en) Lock management method, lock management system, lock management system configuration method and lock management system configuration device
Glade et al. Light-weight process groups in the ISIS system
CN102474531A (en) Address server
CN102281332A (en) Distributed cache array and data updating method thereof
CN109639773A (en) A kind of the distributed data cluster control system and its method of dynamic construction
CN105069152A (en) Data processing method and apparatus
US20090100436A1 (en) Partitioning system including a generic partitioning manager for partitioning resources
CN101344882B (en) Data query method, insertion method and deletion method
CN116055563A (en) Task scheduling method, system, electronic equipment and medium based on Raft protocol
CN101800763A (en) hybrid locking using network and on-disk based schemes
US8266634B2 (en) Resource assignment system with recovery notification
JP2740105B2 (en) Distributed database control method
CN100416542C (en) Load distribution system and method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant