CN101714152B - Method for dividing database ownership among different database servers to control access to databases - Google Patents

Info

Publication number
CN101714152B
Authority
CN
China
Prior art keywords
entitlement
group
database
data
data item
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
CN 200910146448
Other languages
Chinese (zh)
Other versions
CN101714152A (en
Inventor
詹弗兰科·普措卢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Oracle International Corp
Original Assignee
Oracle International Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oracle International Corp filed Critical Oracle International Corp
Priority to CN 200910146448 priority Critical patent/CN101714152B/en
Publication of CN101714152A publication Critical patent/CN101714152A/en
Application granted granted Critical
Publication of CN101714152B publication Critical patent/CN101714152B/en
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current

Abstract

The invention relates to a method for dividing ownership of a database among different database servers to control access to the database. At least part of a database (250) is divided into ownership groups (230, 232, 234, 236), and one or more database servers (208, 210, 212) are assigned to each ownership group as its owners. The database servers (208, 210, 212) assigned as owners of an ownership group are treated as the owners of all data items in that group; that is, they are allowed to directly access the data items in the ownership group, while other database servers are not. A database system comprises one or more persistent storage devices (214, 216) on which the database (250) is stored, and a plurality of database servers (208, 210, 212) running on nodes (202, 204, 206) that can directly access the persistent storage devices (214, 216). At least part of the database (250) is divided into a plurality of ownership groups (230, 232, 234, 236), and each ownership group is assigned an owner set. Only processes running on database servers that belong to the owner set of an ownership group are allowed to directly access the data in that group.

Description

Dividing ownership of a database among different database servers to control access to the database
This application is a divisional application of Chinese patent application No. 01822844.5, filed on June 28, 2001.
Technical field
The present invention relates to database systems and, more particularly, to dividing ownership of a database among different database servers to control access to the database.
Background
Multi-processing computer systems are systems that include multiple processing units able to execute instructions in parallel relative to each other. To take advantage of parallel processing capabilities, different aspects of a task may be assigned to different processing units. The different aspects of a task are referred to herein as work granules, and the process responsible for distributing the work granules among the available processing units is referred to as a coordinator process.
Multi-processing computer systems typically fall into three categories: shared everything systems, shared disk systems, and shared nothing systems. The constraints placed on the distribution of work to the processes that perform work granules vary with the type of multi-processing system involved.
In shared everything systems, processes on all processors have direct access to all dynamic memory devices (hereinafter generally referred to as "memory") and to all static memory devices (hereinafter generally referred to as "disks") in the system.
Consequently, in a shared everything system there are few constraints on how work granules may be assigned. However, a high degree of wiring between the various computer components is required to provide shared everything functionality. In addition, there are scalability limits to shared everything architectures.
In shared disk systems, processors and memories are grouped into nodes. Each node in a shared disk system may itself constitute a shared everything system that includes multiple processors and multiple memories. Processes on all processors can access all disks in the system, but only the processes on processors that belong to a particular node can directly access the memory within that node. Shared disk systems generally require less wiring than shared everything systems. However, shared disk systems are more susceptible to unbalanced workload conditions. For example, if a node has a process that is working on a work granule that requires large amounts of dynamic memory, the memory that belongs to the node may not be large enough to hold all of the required data simultaneously. Consequently, the process may have to swap data into and out of its node's local memory even though large amounts of memory remain available and unused in other nodes.
Shared disk systems provide compartmentalization of software failures that result in memory corruption. The only exceptions are the control blocks used by the inter-node lock manager, which are in effect replicated on all nodes.
In shared nothing systems, all processors, memories and disks are grouped into nodes. In shared nothing systems, as in shared disk systems, each node may itself constitute a shared everything system or a shared disk system. Only the processes running on a particular node can directly access the memories and disks within that node. Of the three general types of multi-processing systems, shared nothing systems typically require the least amount of wiring between the various system components. However, shared nothing systems are the most susceptible to unbalanced workload conditions. For example, all of the data to be accessed during a particular work granule may reside on the disks of a particular node. Consequently, only processes running within that node can be used to perform the work granule, even though processes on other nodes remain idle.
Shared nothing systems provide compartmentalization of software failures that result in memory and/or disk corruption. The only exceptions are the control blocks that control "ownership" of data subsets by different nodes. Ownership is modified much more rarely than shared disk lock management information. Hence, ownership techniques are simpler and more reliable than shared disk lock management techniques, because they do not have high performance requirements.
Databases that run on multi-processing systems typically fall into two categories: shared disk databases and shared nothing databases. In a shared disk database system, multiple database servers (typically running on different nodes) are capable of reading and writing to any part of the database. Data access in the shared disk architecture is coordinated by a distributed lock manager. Shared disk databases may be run on both shared nothing and shared disk computer systems. To run a shared disk database on a shared nothing computer system, software support may be added to the operating system, or additional hardware may be provided, to allow processes to access remote disks directly.
A shared nothing database assumes that a process can directly access data only if the data resides on a disk that belongs to the same node as the process. Specifically, the database data is subdivided among the available database servers. Each database server can directly read and write only the portion of the data that it owns. If a first server seeks to access data owned by a second server, the first database server must send messages to the second database server so that the second database server performs the data access on its behalf.
Shared nothing databases may be run on both shared disk and shared nothing multi-processing systems. To run a shared nothing database on a shared disk machine, a software mechanism may be provided for logically partitioning the database and assigning ownership of each partition to a particular node.
Shared nothing and shared disk systems each have advantages associated with their particular architectures. For example, shared nothing databases provide better performance if there are frequent write accesses (write hot spots) to the data. Shared disk databases provide better performance if there are frequent read accesses (read hot spots). Also, as mentioned above, shared nothing systems provide better fault containment in the presence of software failures.
In light of the foregoing, it would be desirable to provide a single database system that offers the performance advantages of both types of database architecture. Typically, however, these two architectures are mutually exclusive.
Summary of the invention
A database system is provided in which a database, or some portion of it, is partitioned into ownership groups. Each ownership group is assigned one or more database servers as owners of the ownership group. The database servers assigned as owners of an ownership group are treated as the owners of all data items that belong to the ownership group. That is, they are allowed to directly access the data items within the ownership group, while other database servers are not allowed to directly access those data items.
According to one aspect of the invention, a database system is provided that includes one or more persistent storage devices on which a database is stored, and a plurality of database servers executing on a plurality of nodes. Each node has direct access to the persistent storage devices. At least a portion of the database is partitioned into a plurality of ownership groups. Each ownership group is assigned an owner set. Only processes executing on database servers that are members of the owner set of an ownership group are allowed to directly access data within that ownership group.
Each ownership group is designated as either a shared nothing ownership group or a shared disk ownership group. Each shared nothing ownership group is assigned one owner from among the database servers. Only the owner of a shared nothing ownership group is allowed to directly access data within that group. Every database server is allowed to directly access data within ownership groups that are designated as shared disk ownership groups.
Brief description of the drawings
The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which like reference numerals refer to similar elements and in which:
Fig. 1 is a block diagram of a computer system on which an embodiment of the invention may be implemented;
Fig. 2 is a block diagram of a distributed database system that uses ownership groups according to an embodiment of the invention;
Fig. 3 is a flowchart illustrating steps for performing an operation on a data item in a system that supports ownership groups;
Fig. 4 is a flowchart illustrating steps for changing the owner set of an ownership group according to an embodiment of the invention; and
Fig. 5 is a block diagram illustrating a technique for making an atomic change according to an embodiment of the invention.
Detailed description
A method is described herein for dividing ownership of a database among different database servers to control access to the database. In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to avoid unnecessarily obscuring the present invention.
Hardware overview
Fig. 1 is a block diagram that illustrates a computer system 100 on which an embodiment of the invention may be implemented. Computer system 100 includes a bus 102 or other communication mechanism for communicating information, and a processor 104 coupled with bus 102 for processing information. Computer system 100 also includes a main memory 106, such as a random access memory (RAM) or other dynamic storage device, coupled to bus 102 for storing information and instructions to be executed by processor 104. Main memory 106 may also be used for storing temporary variables or other intermediate information during execution of instructions by processor 104. Computer system 100 further includes a read-only memory (ROM) 108 or other static storage device coupled to bus 102 for storing static information and instructions for processor 104. A storage device 110, such as a magnetic disk or optical disk, is provided and coupled to bus 102 for storing information and instructions.
Computer system 100 may be coupled via bus 102 to a display 112, such as a cathode ray tube (CRT), for displaying information to a computer user. An input device 114, including alphanumeric and other keys, is coupled to bus 102 for communicating information and command selections to processor 104. Another type of user input device is cursor control 116, such as a mouse, a trackball, or cursor direction keys, for communicating direction information and command selections to processor 104 and for controlling cursor movement on display 112. This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), which allow the device to specify positions in a plane.
The invention is related to the use of computer system 100 to provide a hybrid shared disk/shared nothing database system. According to one embodiment of the invention, such a database system is provided by computer system 100 in response to processor 104 executing one or more sequences of one or more instructions contained in main memory 106. Such instructions may be read into main memory 106 from another computer-readable medium, such as storage device 110. Execution of the sequences of instructions contained in main memory 106 causes processor 104 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions to implement the invention. Thus, embodiments of the invention are not limited to any specific combination of hardware circuitry and software.
The term "computer-readable medium" as used herein refers to any medium that participates in providing instructions to processor 104 for execution. Such a medium may take many forms, including but not limited to non-volatile media, volatile media, and transmission media. Non-volatile media include, for example, optical or magnetic disks, such as storage device 110. Volatile media include dynamic memory, such as main memory 106. Transmission media include coaxial cables, copper wire and fiber optics, including the wires that make up bus 102. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infrared data communications.
Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, a hard disk, magnetic tape, or any other magnetic medium, a CD-ROM or any other optical medium, punch cards, paper tape, or any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave as described hereinafter, or any other medium from which a computer can read.
Various forms of computer-readable media may be involved in carrying one or more sequences of one or more instructions to processor 104 for execution. For example, the instructions may initially be carried on a magnetic disk of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to computer system 100 can receive the data on the telephone line and use an infrared transmitter to convert the data to an infrared signal. An infrared detector can receive the data carried in the infrared signal, and appropriate circuitry can place the data on bus 102. Bus 102 carries the data to main memory 106, from which processor 104 retrieves and executes the instructions. The instructions received by main memory 106 may optionally be stored on storage device 110 either before or after execution by processor 104.
Computer system 100 also includes a communication interface 118 coupled to bus 102. Communication interface 118 provides a two-way data communication coupling to a network link 120 that is connected to a local network 122. For example, communication interface 118 may be an integrated services digital network (ISDN) card or a modem that provides a data communication connection to a corresponding type of telephone line. As another example, communication interface 118 may be a local area network (LAN) card that provides a data communication connection to a compatible LAN. Wireless links may also be implemented. In any such implementation, communication interface 118 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
Network link 120 typically provides data communication through one or more networks to other data devices. For example, network link 120 may provide a connection through local network 122 to a host computer 124 or to data equipment operated by an Internet Service Provider (ISP) 126. ISP 126 in turn provides data communication services through the worldwide packet data communication network now commonly referred to as the "Internet" 128. Local network 122 and Internet 128 both use electrical, electromagnetic or optical signals that carry digital data streams. The signals through the various networks, and the signals on network link 120 and through communication interface 118, which carry the digital data to and from computer system 100, are exemplary forms of carrier waves transporting the information.
Computer system 100 can send messages and receive data, including program code, through the network(s), network link 120 and communication interface 118. In the Internet example, a server 130 might transmit requested code for an application program through Internet 128, ISP 126, local network 122 and communication interface 118. In accordance with the invention, one such downloaded application provides the hybrid shared disk/shared nothing database system described herein.
The received code may be executed by processor 104 as it is received, and/or stored in storage device 110 or other non-volatile storage for later execution. In this manner, computer system 100 may obtain application code in the form of a carrier wave.
The method described herein for dividing ownership of a database among different database servers to control access to the database is implemented on a computer system in which shared disk access to all disks can be provided from all nodes, that is, a system that could be used for strictly shared disk access, although, according to one aspect of the invention, access to some "shared nothing" disk data is restricted by software.
Ownership groups
According to an embodiment of the invention, a database (or some portion of it) is partitioned into ownership groups. Each ownership group is assigned one or more database servers as owners of the ownership group. The database servers assigned as owners of an ownership group are treated as the owners of all data items that belong to the ownership group. That is, they are allowed to directly access the data items within the ownership group, while other database servers are not allowed to directly access those data items.
According to one embodiment, data items that are frequently accessed together are grouped into the same ownership group, which ensures that they are owned by the same database servers. Ownership groups allow operations to be performed on a group of related data items by treating the group as an atomic unit. For example, ownership of all data items within an ownership group may be transferred from a first database server to a second database server by transferring ownership of the ownership group from the first database server to the second database server.
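The atomic-unit idea can be sketched in a few lines of Python. This is a minimal illustration only, not the patented implementation: the class `OwnershipGroup`, its fields, and the server names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class OwnershipGroup:
    """Hypothetical model: data items that always share the same owners."""
    name: str
    owners: set                                  # servers allowed direct access
    items: set = field(default_factory=set)      # data items in the group

    def transfer(self, old_owner: str, new_owner: str) -> None:
        """Move ownership of every item by updating the group once."""
        if old_owner not in self.owners:
            raise ValueError(f"{old_owner} does not own {self.name}")
        self.owners = (self.owners - {old_owner}) | {new_owner}

    def owners_of(self, item: str) -> set:
        """An item has no owners of its own; it inherits the group's."""
        if item not in self.items:
            raise KeyError(item)
        return self.owners


group = OwnershipGroup("g1", owners={"server_a"}, items={"row1", "row2", "row3"})
group.transfer("server_a", "server_b")
```

Because ownership is recorded once per group rather than once per data item, a single update changes the owner of every item in the group.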
Hybrid database system
Fig. 2 is a block diagram that depicts a hybrid database system architecture according to an embodiment of the invention. Fig. 2 includes three nodes 202, 204 and 206 on which database servers 208, 210 and 212 are executing, respectively. Database servers 208, 210 and 212 are associated with buffer caches 220, 222 and 224, respectively. Each of nodes 202, 204 and 206 is connected to a system bus 218 that allows database servers 208, 210 and 212 to directly access data within a database 250 that resides on two disks 214 and 216.
The data contained on disks 214 and 216 is logically partitioned into ownership groups 230, 232, 234 and 236. According to an embodiment of the invention, each ownership group includes one or more tablespaces. A tablespace is a collection of one or more datafiles. However, the invention is not limited to any particular granularity of partitioning, and may be used with ownership groups of greater or lesser scope.
According to one embodiment, each ownership group is designated as either a shared disk ownership group or a shared nothing ownership group. Each ownership group designated as a shared nothing ownership group is assigned one of the available database servers as its owner. In the system depicted in Fig. 2, ownership group 230 is a shared nothing ownership group owned by server 210, ownership group 232 is a shared disk ownership group, ownership group 234 is a shared nothing ownership group owned by server 212, and ownership group 236 is a shared nothing ownership group owned by server 208.
Because ownership group 230 is a shared nothing ownership group owned by server 210, only server 210 is allowed to directly access the data (D1) within ownership group 230. Any other server that seeks to access data in ownership group 230 is normally required to send requests to server 210, asking server 210 to perform the desired data access on the requesting server's behalf. Likewise, ownership groups 234 and 236 are shared nothing ownership groups and may be directly accessed only by their respective owners.
Because ownership group 232 is a shared disk ownership group, any database server may directly access the data it contains. As shown in Fig. 2, each database server may hold a copy of this data (D2) in its buffer cache. A distributed lock manager is employed to coordinate access to the shared data.
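The Fig. 2 configuration can be written down as a small lookup table. The following Python sketch uses the reference numerals from the text as identifiers; the table layout and the function names are assumptions made for illustration.

```python
SERVERS = {208, 210, 212}

# Ownership group -> owning server, with None marking a shared disk group.
# The assignments follow the description of Fig. 2.
FIG2_GROUPS = {230: 210, 232: None, 234: 212, 236: 208}


def may_access_directly(server: int, group: int) -> bool:
    """True if `server` may read or write `group` without going through an owner."""
    owner = FIG2_GROUPS[group]
    return owner is None or owner == server


def direct_accessors(group: int) -> list:
    """All servers allowed direct access to `group`, in sorted order."""
    return sorted(s for s in SERVERS if may_access_directly(s, group))
```

For group 230 only server 210 qualifies, while for the shared disk group 232 every server does.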
According to one embodiment, the database system includes a mechanism for dynamically changing a particular ownership group from shared disk to shared nothing, and vice versa. For example, if a particular set of shared nothing data is subject to frequent read accesses (read hot spots), that data can be converted to shared disk by converting the ownership group to which it belongs from shared nothing to shared disk. Likewise, if a particular set of shared disk data is subject to frequent write accesses (write hot spots), that data can be converted to shared nothing by changing the ownership group that contains the data into a shared nothing ownership group and assigning ownership of the group to a database server.
According to one aspect of the invention, the database system also includes a mechanism for reassigning ownership of a shared nothing ownership group from one node to another. An operator may request this to improve load balancing, or it may happen automatically so that the data of a shared nothing ownership group owned by a node N1 remains accessible after N1 fails.
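A possible conversion policy is sketched below in Python. The text does not say how hot spots are detected, so the access counters, the `hot` threshold and the names `Group`, `retune` and `reassign` are all invented for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Group:
    mode: str               # "shared_nothing" or "shared_disk"
    owner: Optional[str]    # set only for shared nothing groups


def retune(group: Group, reads: int, writes: int,
           candidate_owner: str, hot: int = 1000) -> Group:
    """Return the designation the group should have after a hot-spot check.

    `hot` is an assumed threshold, not a value taken from the patent.
    """
    if group.mode == "shared_nothing" and reads >= hot and reads > writes:
        return Group("shared_disk", None)                 # read hot spot
    if group.mode == "shared_disk" and writes >= hot and writes > reads:
        return Group("shared_nothing", candidate_owner)   # write hot spot
    return group


def reassign(group: Group, new_owner: str) -> Group:
    """Move a shared nothing group to another owner (load balancing or failover)."""
    if group.mode != "shared_nothing":
        raise ValueError("only shared nothing groups have a single owner")
    return Group("shared_nothing", new_owner)
```

The two functions mirror the two mechanisms described above: `retune` switches the designation, and `reassign` keeps the designation but changes the owner.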
Ownership
As described above, a database system is provided in which some ownership groups are designated as shared nothing ownership groups and some are designated as shared disk ownership groups. An owner is assigned to every shared nothing ownership group. The ownership of a shared nothing ownership group is made known to all database servers, so that they can send requests to the owner of the ownership group when they need tasks performed on data within that group.
According to one embodiment of the invention, ownership information for the various ownership groups is maintained in a control file, and all database servers that have access to the database are allowed to access the control file. Each database server may store a copy of the control file in its cache. With a copy of the control file in its cache, a database server can determine the ownership of ownership groups without always incurring the overhead of reading the ownership information from disk.
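The benefit of the cached copy can be shown with a toy model in Python. The classes `ControlFile` and `DatabaseServer` are hypothetical, and the `invalidate` method is an assumption: the text above does not describe how a cached copy is refreshed when ownership changes.

```python
class ControlFile:
    """Stand-in for the on-disk control file; counts how often it is read."""

    def __init__(self, owners):
        self._owners = dict(owners)   # ownership group -> owner (None = shared disk)
        self.disk_reads = 0

    def read(self):
        self.disk_reads += 1
        return dict(self._owners)


class DatabaseServer:
    def __init__(self, control_file):
        self._control_file = control_file
        self._cached = None           # cached copy of the control file

    def owner_of(self, group):
        if self._cached is None:      # first lookup pays for the disk read
            self._cached = self._control_file.read()
        return self._cached[group]

    def invalidate(self):
        """Assumed hook: drop the cached copy so the next lookup rereads it."""
        self._cached = None


control_file = ControlFile({230: "s210", 232: None})
server = DatabaseServer(control_file)
lookups = [server.owner_of(230), server.owner_of(232), server.owner_of(230)]
```

Three ownership lookups here cost a single read of the control file.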
Fig. 3 is a flowchart illustrating the steps performed by a database server that needs data in a system that employs both shared disk and shared nothing ownership groups. In step 300, the database server determines the ownership group to which the desired data belongs. In step 302, the database server determines the owner of the ownership group that contains the desired data. As explained above, step 302 may be performed by accessing a control file, a copy of which may be stored in a cache associated with the database server. If the ownership group is a shared disk ownership group, all database servers are considered to be owners of the ownership group. If the ownership group is a shared nothing ownership group, a specific database server is specified in the control file as the owner of the ownership group.
In step 304, the database server determines whether it is the owner of the ownership group that holds the desired data. The database server is the owner of the ownership group if either (1) the ownership group is a shared disk ownership group, or (2) the ownership group is a shared nothing ownership group and the database server is designated in the control file as its owner. If the database server is the owner of the ownership group that holds the desired data, control passes to step 310, where the database server retrieves the desired data directly.
If the database server is not the owner of the ownership group that holds the data, control passes to step 306. In step 306, the database server sends a request to the owner of the ownership group, asking the owner to access the desired data on behalf of the requestor. In step 308, the database server receives the desired data from the owner of the ownership group.
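The steps of Fig. 3 can be traced in a short Python simulation. This is a sketch under stated assumptions: `Cluster` and `Server` are hypothetical classes, the dictionaries stand in for the disks and the control file, and a direct method call replaces the request and reply messages of steps 306 and 308.

```python
class Cluster:
    """Hypothetical shared state: the disks plus the control file contents."""

    def __init__(self, disk, group_of, owner_of):
        self.disk = disk            # data item -> value
        self.group_of = group_of    # data item -> ownership group
        self.owner_of = owner_of    # group -> owner name, None = shared disk
        self.servers = {}


class Server:
    def __init__(self, name, cluster):
        self.name = name
        self.cluster = cluster
        cluster.servers[name] = self

    def read_direct(self, item):
        return self.cluster.disk[item]

    def get(self, item):
        group = self.cluster.group_of[item]            # step 300
        owner = self.cluster.owner_of[group]           # step 302
        if owner is None or owner == self.name:        # step 304
            return self.read_direct(item), "direct"    # step 310
        # Steps 306 and 308: ask the owner and receive the data from it.
        value = self.cluster.servers[owner].read_direct(item)
        return value, "via " + owner


cluster = Cluster(
    disk={"D1": "payload-1", "D2": "payload-2"},
    group_of={"D1": 230, "D2": 232},
    owner_of={230: "s210", 232: None},
)
s208 = Server("s208", cluster)
s210 = Server("s210", cluster)
```

With this setup, `s208.get("D1")` is routed through `s210`, while `s208.get("D2")` is served directly because group 232 is shared disk.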
Owner sets
According to an alternative embodiment, an ownership group is not limited to being either (1) owned by only one database server (shared nothing) or (2) owned by all database servers (shared disk). Rather, an ownership group may be owned by any specified subset of the available database servers. The set of database servers that own a particular ownership group is referred to herein as the owner set of the ownership group. Thus, a shared nothing ownership group is equivalent to an ownership group whose owner set contains only one database server, and a shared disk ownership group is equivalent to an ownership group whose owner set contains all available database servers.
When owner sets are used, a database server that does not belong to the owner set of an ownership group performs a task on data in that group by sending a request to one of the database servers that do belong to the owner set. In response to the request, the recipient directly accesses the data in the ownership group and performs the requested task. Contention caused by write hot spots within the ownership group occurs only among the database servers that belong to the owner set of the ownership group.
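A minimal Python sketch of owner sets follows. The function names are hypothetical, and picking the smallest member of the owner set as the delegate is an arbitrary choice made only to keep the example deterministic; the text does not say which member receives the request.

```python
def classify(owner_set, all_servers):
    """Name the classical designation that an owner set corresponds to."""
    if len(owner_set) == 1:
        return "shared nothing"
    if set(owner_set) == set(all_servers):
        return "shared disk"
    return "partial owner set"


def route(requester, owner_set):
    """Return the server that will actually touch the data."""
    if requester in owner_set:
        return requester        # member of the owner set: direct access
    return min(owner_set)       # assumed policy: delegate to any one member


ALL_SERVERS = {"s208", "s210", "s212"}
```

For example, `route("s212", {"s208", "s210"})` delegates to a member of the owner set, whereas `route("s210", {"s208", "s210"})` lets the requester act directly.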
Changing the ownership of an ownership group
As mentioned above, it may be desirable to change an ownership group from shared nothing to shared disk, or from shared disk to shared nothing. Such changes may be initiated automatically in response to the detection of read or write hot spots, or manually (for example, in response to a command issued by a database administrator).
Various techniques may be used to move an ownership group from one owner set (the "source owner set") to another owner set (the "destination owner set"). Fig. 4 is a flowchart illustrating the steps performed to change the owner set of an ownership group according to an embodiment of the invention.
Referring to Fig. 4, in step 400 a "disable change" message is broadcast to all available database servers. The disable change message instructs the database servers to stop making forward changes to data within the ownership group whose owner set is about to change (the "transitioning ownership group"). Forward changes are changes that create a version that has not previously existed (that is, they create a new "current" version of a data item). Backward changes, on the other hand, are changes that re-create a previously existing version of a data item.
In step 402, the portion of the database system responsible for changing the owner sets of ownership groups (the "owner changing mechanism") waits until all transactions that have made changes to the transitioning ownership group have either committed or rolled back.
Transactions that performed some, but not all, of their updates to data within the transitioning ownership group before step 400 will roll back, because forward changes to the ownership group are no longer allowed. Because step 400 prevents only forward changes to the transitioning ownership group, database servers are not prevented from rolling back the changes they have already made to it.
Unfortunately, may need a large amount of expenses to judge which affairs upgraded the entitlement group in shifting.Therefore one embodiment of the present of invention are provided, and wherein Database Systems are not attempted following the trail of those and have been upgraded entitlement group in shifting with the affairs of interior data.Yet, in the situation that this information is not followed the trail of, must suppose any one permit access in shifting entitlement group data and start from the data that affairs before step 400 have all changed the entitlement group inside in shifting.
Suppose based on this, the owner that step 402 requirement changes mechanism waits for always, until (1) may access the entitlement group data in shifting, and (2) start from step 400 all affairs submission or rollbacks before.Usually, the data of only having the entitlement group of those affairs of moving in just might accessing transfer in the database server of the source owner set that belongs to conversion entitlement group.Therefore, if the entitlement group in shifting is shared disk, the owner who changes so mechanism must wait for always, until all affairs that start from all database servers before step 400 have all been submitted to or rollback.If the entitlement group in shifting is without sharing, the owner who changes so mechanism must wait for always, has all submitted to or rollback until have all affairs of the database server of the entitlement group in transfer.It is noted that wherein having comprised those has initiated and created the user's business of the local subtransaction of the entitlement group in shifting at other nodes.
The whole affairs that might upgrade the entitlement group internal data in shifting all submitted to or rollback in, control will advance to step 404.In step 404, the owner who changes mechanism changes owner's set of the entitlement group in transfer by the control documents in the renewal atomic operation.For instance, this indication changes the entitlement group that can make in transfer from transfer to the entitlement group of shared disk without the tenant in common group, and vice versa.As selection, indication changes and can a change have one without the database server of tenant in common group, and does not change the type of this entitlement group.
Changing control documents, when the new owner who makes it to reflect the entitlement group in transfer gathered, control will advance to step 406.In step 406, the message of one " flush buffers " will send to all availability database servers.In case receive the message of flush buffers, each database server will deactivate the control documents copy that comprises in buffer memory.Therefore, when database server needed subsequently to check that control documents is determined the entitlement of entitlement group, they can retrieve the control documents that upgrades version from long-time memory.These database servers will be appreciated that new owner's set of the entitlement group in transfer thus.
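The four steps of Fig. 4 can be summarized in a short sketch. Everything here is illustrative rather than the patented implementation: `Server`, `change_owner_set`, and the `wait_for_drain` callback are hypothetical names, the control file is modeled as a plain dictionary, and re-enabling forward changes at the end is an assumption, since the description does not say when changes resume.

```python
class Server:
    """Hypothetical database server holding a cached copy of the control file."""

    def __init__(self, name):
        self.name = name
        self.frozen_groups = set()     # groups for which forward changes are disabled
        self.open_transactions = set()
        self.cached_control = None     # None means "re-read from persistent storage"

    def owners_of(self, group, control_file):
        if self.cached_control is None:
            self.cached_control = dict(control_file)
        return self.cached_control[group]


def change_owner_set(group, new_owners, servers, control_file, wait_for_drain):
    # Step 400: broadcast "disable change" so no server makes forward changes.
    for s in servers:
        s.frozen_groups.add(group)
    # Step 402: wait until transactions in the source owner set commit or roll back.
    source = control_file[group]
    wait_for_drain([s for s in servers if s.name in source])
    # Step 404: switch the owner set in the control file (assumed atomic here).
    control_file[group] = frozenset(new_owners)
    # Step 406: broadcast "refresh cache" so stale control-file copies are dropped.
    for s in servers:
        s.cached_control = None
        # Assumption: forward changes resume once the new owners are visible.
        s.frozen_groups.discard(group)
```

The `wait_for_drain` callback receives only the servers in the source owner set, which mirrors the observation that only those servers can hold transactions that touched the group.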
Adjusting to ownership changes
When a particular query is used frequently, the query is typically stored within the database. Most database systems generate an execution plan for a stored query when the stored query is initially submitted to the database system, rather than computing a new plan each time the stored query is used. The execution plan of a query must take into account the ownership of the ownership groups that contain the data accessed by the query. For example, if the query specifies an update to a data item in an ownership group that is owned exclusively by a particular database server, the execution plan of the query must include shipping the update operation to that database server.
As mentioned above, however, a mechanism is provided for changing the ownership of ownership groups. Such an ownership change may occur after an execution plan has been generated for a stored query. Consequently, an execution plan may require a database server to perform operations on data within an ownership group that it no longer owns. According to one embodiment of the invention, a database server that is asked to perform an operation on data in an ownership group it does not own returns an "ownership error" message to the process that requested the operation. In response to receiving an ownership error message, a new execution plan is generated for the query that caused the error. The new execution plan takes into account the current ownership of ownership groups, as indicated by the current version of the control file.
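The recovery path for a stale execution plan might look like the following sketch. The names `OwnershipError`, `make_plan`, and `run_stored_query` are hypothetical, and a "plan" is reduced to the single fact that matters here, namely which server the operation is shipped to.

```python
class OwnershipError(Exception):
    """Raised by a server asked to operate on data it does not own."""


def make_plan(group, control_file):
    """Build a (hypothetical) plan: just the server to ship the operation to."""
    return {"group": group, "target": min(control_file[group])}


def run_stored_query(plan, control_file, execute):
    """Run a cached plan; rebuild it once if the target reports an ownership error."""
    try:
        return plan, execute(plan["target"], plan["group"])
    except OwnershipError:
        # The new plan reflects the current ownership in the control file.
        fresh = make_plan(plan["group"], control_file)
        return fresh, execute(fresh["target"], fresh["group"])
```

Returning the plan alongside the result lets the caller replace the stored plan, so later executions do not hit the same error again.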
Control file management
As mentioned above, an atomic operation is used to update the control file in order to change the owner set of an ownership group (step 404). Various mechanisms may be used to ensure that this operation is atomic. For example, as shown in Fig. 5, according to one embodiment of the invention the control file includes a bitmap and a series of block pairs. Each bit in bitmap 512 corresponds to one block pair.
At any given time, only one block of a block pair contains current data. The value of the bit associated with a block pair indicates which of the two blocks in that pair holds the current data. For example, bit 502 is associated with block pair 504, which includes blocks 506 and 508. One value of bit 502 (e.g. "0") indicates that block 506 is the current block within block pair 504. The value of bit 502 may be changed to "1" to indicate that the data in block 508 is current (and consequently that the data in block 506 is no longer valid).
Because the data in the non-current block of a block pair is considered invalid, data may be written to the non-current block without changing the effective contents of the control file. The effective contents of the control file change only when a bit value in bitmap 512 changes. Thus, as preliminary steps to an atomic change, the contents of the current block 506 of block pair 504 may be loaded into memory, modified, and stored into the non-current block 508 of block pair 504. After these preliminary steps have been performed, the change is made atomically by changing the value of bit 502, the bit within bitmap 512 that corresponds to block pair 504.
This is merely one example of a technique for performing changes atomically. Other techniques may also be used. Consequently, the present invention is not limited to any particular technique for performing changes atomically.
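The block-pair technique can be modeled in a few lines. This is an in-memory sketch under stated assumptions: the class name `ControlFile` and its methods are hypothetical, and a real implementation would have to make the bit flip a single atomic write to persistent storage, which this sketch does not attempt.

```python
class ControlFile:
    """In-memory stand-in for the bitmap-plus-block-pairs layout of Fig. 5."""

    def __init__(self, initial_blocks):
        # Each pair holds two slots; bit i says which slot of pair i is current.
        self.pairs = [[block, None] for block in initial_blocks]
        self.bitmap = [0] * len(initial_blocks)

    def read(self, i):
        """Return the current block of pair i."""
        return self.pairs[i][self.bitmap[i]]

    def stage(self, i, new_block):
        """Preliminary step: write the non-current slot; readers see no change."""
        self.pairs[i][1 - self.bitmap[i]] = new_block

    def flip(self, i):
        """The single change that makes the staged block current."""
        self.bitmap[i] = 1 - self.bitmap[i]
```

Until `flip` is called, `read` keeps returning the old block, so a failure between `stage` and `flip` leaves the effective contents untouched.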
Moving data items between ownership groups
One way to change the ownership of a data item, such as a tablespace, is to change the owner of the ownership group to which the data item belongs. A second way is to reassign the data item to a different ownership group. For example, the owner of tablespace A can be changed from server A to server B by removing tablespace A from an ownership group assigned to server A and placing it in an ownership group assigned to server B.
According to one embodiment of the invention, the membership of ownership groups is maintained in a data dictionary within the database. Consequently, to move a data item from a first ownership group to a second ownership group, the membership information for both ownership groups must be updated in the data dictionary. The steps involved in changing the ownership group of a data item are similar to those described above for changing the owner set of an ownership group. Specifically, access to the tablespace being transferred (the "tablespace in transition") is disabled. The ownership change mechanism then waits for all transactions (or portions thereof) that hold locks on the data item to roll back or commit.
Once all transactions that hold locks on the data item have committed or rolled back, the data dictionary is modified to indicate the new ownership group of the data item. The control file is then modified to indicate that the owner set of the ownership group to which the data item was moved is the owner set of the data item. This change atomically enables the destination owners to access the data item. If the ownership group is itself in the middle of an ownership change, the control file is updated to indicate that the data item is in a "move delayed" state.
Changing the ownership group to which a data item belongs may or may not change the owner of the data item. If the owner set of the source ownership group is the same as the owner set of the destination ownership group, the owner of the data item does not change when the data item is moved from the source ownership group to the destination ownership group. If, on the other hand, the owner set of the source ownership group differs from the owner set of the destination ownership group, the owner of the data item changes when the data item is moved.
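The distinction between changing a data item's group and changing its owner can be made concrete with a small sketch. The function name `move_data_item` is hypothetical, and both the data dictionary and the control file are modeled as plain dictionaries; the locking and waiting steps described above are omitted.

```python
def move_data_item(item, dest_group, dictionary, control_file):
    """Reassign `item` to `dest_group`; return True if its owners changed.

    `dictionary` maps data item -> ownership group (the data dictionary);
    `control_file` maps ownership group -> owner set. Both are hypothetical
    stand-ins for the persistent structures.
    """
    source_group = dictionary[item]
    old_owners = control_file[source_group]
    dictionary[item] = dest_group  # membership change only
    return control_file[dest_group] != old_owners
```

Moving a tablespace between two groups owned by the same server reports no owner change, while moving it to a group owned by a different server does.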
Special conditions for ownership changes
According to one embodiment, techniques are provided to handle situations in which (1) an attempt is made to change the owner set of an ownership group while a data item belonging to that ownership group is being transferred to another ownership group, and (2) an attempt is made to transfer a data item to another ownership group while the destination ownership group is changing its owner set.
To detect these conditions, one embodiment of the invention provides, within the control file, one or more status flags for each data item (e.g. tablespace) that belongs to an ownership group. For example, one flag may indicate whether the ownership group to which the data item belongs is in the process of being assigned a new owner. Similarly, a flag may indicate that the data item is in the process of being transferred to a different ownership group.
When an attempt is made to change the owner set of an ownership group, the ownership change mechanism inspects the status flags of the data items that belong to the ownership group to determine whether any of them is in the process of being transferred to a different ownership group. If any data item that belongs to the ownership group is being transferred to another ownership group, the attempt to change the owner set of the ownership group is aborted. If no data item that belongs to the ownership group is being transferred to a different ownership group, the status flags of the data items that belong to the ownership group are set to indicate that ownership of the ownership group to which they belong is in transition. A message is also sent to the database servers to invalidate their cached versions of the control file, which ensures that the database servers see the new status flag values.
When an attempt is made to move a data item to a different ownership group, the status flags of the data item are inspected to determine whether the destination ownership group is in the process of changing its owner set. According to one embodiment, this check is performed after the data dictionary has been modified to reflect the new ownership group of the data item, but before the control file is updated to enable the owners of the new ownership group to access the data item. If the ownership group to which the data item belongs is in the process of changing its owner set, the status flag for the data item in the control file is set to indicate the "move delayed" state. In addition, a database-wide "move delayed" flag is set to indicate that the database contains data items that are in the move delayed state.
When the operation that transfers ownership of the ownership group completes, the process performing the transfer updates the status flags to indicate that the ownership group is no longer in the process of an ownership transfer. In addition, the process clears the "move delayed" flag of any data item that was moved into the ownership group during the ownership transfer.
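The interaction between the two kinds of status flag can be sketched as follows. This is a simplified model, not the patented mechanism: the flag constants and function names are hypothetical, flags are kept in ordinary sets rather than in a control file, the destination group's state is inferred from the flags of its current members, and the database-wide "move delayed" flag is omitted.

```python
IN_TRANSFER = "item moving to another group"
OWNER_CHANGING = "group changing owner set"
MOVE_DELAYED = "move delayed"


def begin_owner_change(group, members, flags):
    """Refuse if any member item is mid-move; otherwise flag every member."""
    if any(IN_TRANSFER in flags[item] for item in members[group]):
        return False
    for item in members[group]:
        flags[item].add(OWNER_CHANGING)
    return True


def finish_move(item, source_group, dest_group, members, flags):
    """Complete a move; mark it delayed if the destination is changing owners."""
    dest_busy = any(OWNER_CHANGING in flags[i] for i in members[dest_group])
    members[source_group].discard(item)
    members[dest_group].add(item)
    flags[item].discard(IN_TRANSFER)
    if dest_busy:
        flags[item].add(MOVE_DELAYED)
    return dest_busy


def end_owner_change(group, members, flags):
    """Clear the group's transition flag and any delayed-move flags."""
    for item in members[group]:
        flags[item] -= {OWNER_CHANGING, MOVE_DELAYED}
```

The two checks are symmetric: an owner change is refused while a member is moving out, and a move into a group that is changing owners completes but stays marked as delayed until the owner change ends.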
Failure recovery
A failure may occur while an ownership change is in progress. The failure may be the result of "process death" or "server death". Process death occurs when a process involved in the ownership change fails. Server death occurs when an entire database server fails. With both types of failure, changes that have not yet been stored on persistent storage may be lost. After such a failure, it is necessary to return the database to a consistent state.
According to one embodiment of the invention, recovery from process death is performed through the use of state objects. A state object is a data structure allocated in a memory region associated with the database server to which a process belongs. Before performing an action, the process updates the state object to indicate the action it is about to perform. If the process dies, another process within the server (e.g. a "process monitor") invokes a method of the state object (a "cleanup routine") to return the database to a consistent state.
The particular cleanup actions performed after a process fails depend on the operation the failed process was performing and on how far the process had progressed before it failed. According to one embodiment, process failure during a change in the ownership of an ownership group is handled as follows:
If the process performing the ownership change fails before making the final control file change, the original owner is restored as the owner of the ownership group.
If the process performing the ownership change fails after making the final control file change but before deleting the state object, the new owner remains the owner and the state object is deleted.
Process failure while a data item is being transferred from one ownership group to another is handled as follows:
If the process performing the transfer fails before changing the data dictionary, the original owner of the data item is restored as the owner of the data item.
If the process performing the transfer fails after the dictionary change has been committed but before the final control file change, the process monitor completes the move and makes the appropriate change to the control file. If the ownership group is undergoing an ownership change, the data item is placed in the "move delayed" state.
If the process performing the transfer fails after the final control file change but before deleting the state object, the process monitor deletes the state object.
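The two ownership-change cases listed above can be expressed as a cleanup routine on a state object. This is a hedged sketch only: the class `StateObject`, its fields, and the idea of recording progress in a single boolean are assumptions, and the data-item transfer cases are not modeled.

```python
class StateObject:
    """Hypothetical record of the step a process is about to perform."""

    def __init__(self, group, old_owners, new_owners):
        self.group, self.old, self.new = group, old_owners, new_owners
        self.control_file_changed = False

    def cleanup(self, control_file):
        """Called by the process monitor after the owning process dies."""
        if not self.control_file_changed:
            control_file[self.group] = self.old  # restore the original owner
        # Otherwise the new owner stays; only the state object is discarded.
        return control_file[self.group]
```

The process sets `control_file_changed` immediately after the final control file change, so the cleanup routine can tell on which side of that change the failure occurred.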
Server death
When a database server dies, access can no longer be provided to the data in ownership groups that were owned exclusively by the dead server. Therefore, according to one embodiment of the invention, server death is an event that triggers an automatic ownership change, in which all ownership groups owned exclusively by the failed server are assigned to new owners.
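The automatic reassignment triggered by server death could be sketched as below. The function name is hypothetical, the control file is modeled as a dictionary from group to owner set, and distributing the orphaned groups round-robin over the surviving servers is purely an assumption; the description does not say how new owners are chosen.

```python
def reassign_after_server_death(dead, survivors, control_file):
    """Give every group owned only by `dead` to a surviving server.

    Choosing the survivor round-robin is an assumption of this sketch.
    """
    orphaned = sorted(
        g for g, owners in control_file.items() if owners == frozenset({dead})
    )
    for i, group in enumerate(orphaned):
        control_file[group] = frozenset({survivors[i % len(survivors)]})
    return orphaned
```

Groups that the dead server shared with other owners are left alone, because their data remains reachable through the remaining members of the owner set.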
The particular cleanup actions performed after a server failure depend on the operation the database server was performing and on how far the ownership transfer had progressed before the server died. According to one embodiment, server failure during a change in the ownership of an ownership group is handled as follows:
If the source database server dies before the final control file change, the ownership group is assigned to another thread, and the status information in the control file is updated to indicate that the ownership group is no longer in transition.
If the destination database server dies, then either (1) the process performing the transfer detects the instance failure and aborts the transfer, or (2) during recovery of the dead server, the ownership group is reassigned from the dead server to another server.
Server failure that occurs while a data item is being transferred from one ownership group to another is handled as follows:
If the source server dies before the dictionary change, then during recovery a new owner is assigned to the source ownership group and the transfer flag of the data item is cleared.
If the source server dies after the dictionary change but before the final control file change, then during recovery of the source server the data item is either assigned the appropriate owner or marked as move delayed, thereby completing the move operation.
If the destination server dies and the final control file change has been made, the data item is marked "move delayed". During recovery of the dead server, ownership of the ownership group in transition is reassigned and the move delayed flag is cleared.
Reducing downtime during ownership changes
As mentioned above, the steps shown in Fig. 4 represent one technique for changing the ownership of an ownership group. In that technique, step 402 requires the ownership change mechanism to wait until all transactions that have changed data belonging to the ownership group in transition have either committed or rolled back. During this wait, all data in the ownership group in transition is unavailable. It is therefore important to minimize the waiting time.
As mentioned above, it may be impractical to track which transactions have actually changed data belonging to the ownership group in transition. The ownership change mechanism therefore waits for all transactions executing on all database servers that belong to the source owner set of the ownership group in transition to commit or roll back. Because of the number of transactions for which the ownership change mechanism must wait, many of which may not have changed any data in the ownership group in transition, the delay may be significant.
According to an alternative embodiment, a mechanism is provided that allows data being transferred between owners to remain available during this delay. Specifically, the disable change message is not sent to all database servers. Instead, a "new owner" message, which indicates the destination owner set of the ownership group, is sent to all database servers. The new owner message may be broadcast, for example, by sending a refresh cache message to all database servers after updating the control file to indicate (1) the source owner set, (2) the destination owner set, and (3) that the ownership group is in transition.
After a server receives the new owner message, all transactions started by that server act as though the destination owner set owns the ownership group. Transactions that were started in a server before the server received the new owner message continue to proceed as though the source owner set owns the ownership group. Thus, during the waiting period, ownership of the ownership group in transition is effectively shared between the members of the source owner set and the members of the destination owner set. In other words, the data in the ownership group in transition is temporarily shared between two database servers, and a shared-disk locking mechanism is temporarily activated for access to that data.
When all transactions in the source owner set that began before the new owner message was broadcast have committed or rolled back, the control file is updated again. In this second update, the control file is updated to indicate that the destination owner set is the exclusive owner of the ownership group and that the ownership group is no longer in transition.
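The temporary dual ownership used to reduce downtime can be sketched with a control-file entry that records both owner sets. The field names `source`, `dest`, and `in_transition` and both function names are hypothetical, and the shared-disk locking that would protect the data during the overlap is not modeled.

```python
def effective_owners(entry, txn_started_after_new_owner_msg):
    """Owner set a transaction should assume for a group.

    `entry` is a hypothetical control-file record with keys
    'source', 'dest' and 'in_transition'.
    """
    if not entry["in_transition"]:
        return entry["dest"]
    return entry["dest"] if txn_started_after_new_owner_msg else entry["source"]


def complete_transition(entry, old_transactions_open):
    """Second control-file update, allowed once old transactions have drained."""
    if old_transactions_open:
        return False
    entry["source"] = entry["dest"]
    entry["in_transition"] = False
    return True
```

While the entry is in transition, older transactions keep seeing the source owner set and newer ones see the destination owner set; after the second update every transaction sees the destination owner set.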
In the foregoing description, the invention has been described with reference to specific embodiments. It will be evident, however, that various modifications and changes may be made to it without departing from its spirit and scope. The specification and drawings are accordingly to be regarded in an illustrative rather than a restrictive sense.

Claims (4)

1. A method for managing data, the method comprising the steps of:
maintaining a plurality of data items on persistent storage that is accessible to a plurality of nodes;
assigning ownership of a group of data items from the plurality of data items to a first node of the plurality of nodes;
shipping an operation that involves a particular data item in the group to the first node, to cause the first node to perform the operation on the particular data item, wherein the particular data item resides at a particular location on the persistent storage;
while the first node continues to operate, reassigning ownership of the particular data item from the first node to a second node without moving the particular data item from the particular location on the persistent storage;
after the reassignment, when a node wishes to perform an operation that involves the particular data item, the node that wishes to perform the operation ships the operation to the second node and, if the particular data item resides at the particular location, the second node performs the operation on the particular data item.
2. The method according to claim 1, wherein the plurality of nodes are nodes of a multi-node database system.
3. The method according to claim 1, wherein the step of reassigning ownership of the particular data item from the first node to the second node is performed as part of a process of transferring ownership of the group from the first node to the second node.
4. The method according to claim 3, wherein the transfer process is initiated in response to detecting that the first node has a higher workload than the second node.
CN 200910146448 2001-06-28 2001-06-28 Method for dividing database ownership among different database servers to control access to databases Expired - Lifetime CN101714152B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200910146448 CN101714152B (en) 2001-06-28 2001-06-28 Method for dividing database ownership among different database servers to control access to databases

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 200910146448 CN101714152B (en) 2001-06-28 2001-06-28 Method for dividing database ownership among different database servers to control access to databases

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CNB018228445A Division CN100517303C (en) 2001-06-28 2001-06-28 Partitioning ownership of a database among different database servers to control access to the database

Publications (2)

Publication Number Publication Date
CN101714152A CN101714152A (en) 2010-05-26
CN101714152B true CN101714152B (en) 2013-06-26

Family

ID=42417802

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200910146448 Expired - Lifetime CN101714152B (en) 2001-06-28 2001-06-28 Method for dividing database ownership among different database servers to control access to databases

Country Status (1)

Country Link
CN (1) CN101714152B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9152669B2 (en) 2013-03-13 2015-10-06 Futurewei Technologies, Inc. System and method for distributed SQL join processing in shared-nothing relational database clusters using stationary tables
US20230409602A1 (en) * 2022-06-21 2023-12-21 International Business Machines Corporation Data management

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1145489A (en) * 1995-06-06 1997-03-19 美国电报电话公司 System and method for database access administration
US5625811A (en) * 1994-10-31 1997-04-29 International Business Machines Corporation Method and system for database load balancing

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CX01 Expiry of patent term

Granted publication date: 20130626
