WO2014008652A1 - Metadata management method and device - Google Patents

Metadata management method and device

Info

Publication number
WO2014008652A1
Authority
WO
WIPO (PCT)
Prior art keywords
metadata
node
current
basic shape
cluster
Prior art date
Application number
PCT/CN2012/078563
Other languages
French (fr)
Chinese (zh)
Inventor
李熠斌
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to CN201280003170.4A priority Critical patent/CN104054294B/en
Priority to PCT/CN2012/078563 priority patent/WO2014008652A1/en
Publication of WO2014008652A1 publication Critical patent/WO2014008652A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/907 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually

Definitions

  • The present invention relates to storage technologies, and in particular to a metadata management method and apparatus.
  • Background
  • Metadata is the description of the resources managed by the current device or system. It plays an important role in the self-maintenance and management of the device itself.
  • Metadata is often scattered among the nodes of a cluster.
  • A cluster usually includes a primary node and standby nodes. The primary node maintains the complete set of metadata, covering the metadata of all nodes of the cluster, whereas a standby node maintains only the subset of metadata that it may use itself. This is a subset relative to the complete set maintained by the primary node; it is the metadata necessary to keep the standby node operating normally.
  • The present invention provides a metadata management method and apparatus for quickly performing active/standby node switching in a distributed cluster.
  • A first aspect of the present invention provides a metadata management method. The method is applied to a cluster including a plurality of nodes and includes:
  • storing metadata in the cluster as a metadata heap, where the metadata includes the metadata set currently run by the current node itself and the metadata sets currently run by all other nodes in the cluster, and the metadata heap includes at least two basic shapes, each of which is a binary tree shape composed of a vertex, a first leaf node, and a second leaf node;
  • where storing the metadata in the cluster as a metadata heap includes:
  • the second leaf node of a current basic shape is used to connect a degenerate basic shape of the node corresponding to that current basic shape, the degenerate basic shape storing the metadata set that the node ran at a first time point,
  • the first time point being earlier than the current time point;
  • the second leaf node of the degenerate basic shape is used to connect another degenerate basic shape of the same node, which stores the metadata set that the node ran at a second time point, the second time point being earlier than the first time point, and so on; and
  • managing the metadata according to the metadata heap.
  • Managing the metadata according to the metadata heap includes: acquiring an updated metadata set currently run by one of the other nodes in the cluster; and storing the updated metadata set in the current basic shape that corresponds to that node in the metadata heap.
  • In a second possible implementation, after the updated metadata set is stored in the current basic shape corresponding to that node, the method further includes: storing, in the metadata heap, the degraded data corresponding to that node in a degenerate basic shape, where the degraded data is the data that was stored in the current basic shape before the updated metadata set; and connecting the degenerate basic shape to the second leaf node of the current basic shape.
  • Managing the metadata according to the metadata heap includes: when another node, other than the current node itself, is split out of the cluster, disconnecting the current basic shape corresponding to that other node from the first leaf node of the current basic shape of the current node, and disconnecting it from the further current basic shape (corresponding to a further node) that is connected at its own first leaf node; and connecting the further current basic shape to the first leaf node of the current basic shape of the current node.
  • Managing the metadata according to the metadata heap includes: obtaining the currently running metadata set of a new node from the metadata heap stored in the new node that is to join the cluster; establishing, in the metadata heap of the current node, a current basic shape corresponding to the new node; storing the new node's currently running metadata set in that current basic shape; and connecting the new node's current basic shape to the current basic shape in the metadata heap whose first leaf node is idle.
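  • The split and join operations above amount to unlinking and appending entries in the chain of current basic shapes. The following is a minimal Python sketch under that reading, not the patented implementation; the names `BasicShape`, `remove_node`, and `add_node` and the sample metadata are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BasicShape:
    node_id: int
    vertex: dict                                  # the metadata set (hypothetical layout)
    first_leaf: Optional["BasicShape"] = None     # next node's current basic shape
    second_leaf: Optional["BasicShape"] = None    # this node's degenerate basic shape


def remove_node(top: BasicShape, node_id: int) -> Optional[BasicShape]:
    """Unlink node_id's current shape and reconnect its successor to its predecessor."""
    prev = top
    while prev.first_leaf is not None:
        gone = prev.first_leaf
        if gone.node_id == node_id:
            prev.first_leaf = gone.first_leaf   # connect the further current shape
            gone.first_leaf = None              # fully detach the removed shape
            return gone
        prev = gone
    return None                                  # node not found; heap unchanged


def add_node(top: BasicShape, node_id: int, metadata: dict) -> BasicShape:
    """Attach a new current shape to the current shape whose first leaf is idle."""
    shape = top
    while shape.first_leaf is not None:
        shape = shape.first_leaf
    shape.first_leaf = BasicShape(node_id, metadata)
    return shape.first_leaf


# Heap inside node 1: nodes 2 and 3 join, node 2 is split out, node 4 joins.
top = BasicShape(1, {"name": "node1"})
add_node(top, 2, {"name": "node2"})
add_node(top, 3, {"name": "node3"})
removed = remove_node(top, 2)
add_node(top, 4, {"name": "node4"})

order = []
shape = top
while shape is not None:
    order.append(shape.node_id)
    shape = shape.first_leaf
```

  • In this sketch the removed shape takes its own degenerate chain (its second leaf) with it, and the top shape, which belongs to the current node itself, is never removed.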
  • Managing the metadata according to the metadata heap includes: separately storing the metadata set run at each time point, where each such metadata set includes the metadata set of every node in the cluster and the time points include the current time point, the first time point, and the second time point, so that the metadata set corresponding to any one of those time points can be run.
  • Managing the metadata according to the metadata heap includes: when metadata to be repaired exists in the current basic shape corresponding to the current node in the metadata heap, obtaining repair data corresponding to the metadata to be repaired from the degenerate basic shape corresponding to the current node in the metadata heap, or from the current basic shape corresponding to the current node in the metadata heap of another node in the cluster; and replacing the metadata to be repaired in the current basic shape corresponding to the current node with the acquired repair data.
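  • The repair step can be sketched as a lookup over the two sources named above. This is a hedged illustration only: the text does not specify how damage is detected or which source is tried first, so modelling a damaged entry as `None`, trying the local degenerate chain before peers, and the names `BasicShape` and `repair` are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class BasicShape:
    node_id: int
    vertex: dict                                  # the metadata set (hypothetical layout)
    first_leaf: Optional["BasicShape"] = None     # next node's current basic shape
    second_leaf: Optional["BasicShape"] = None    # this node's degenerate basic shape


def repair(local_top: BasicShape, key: str,
           peer_heaps: Sequence[BasicShape] = ()) -> bool:
    """Replace a damaged entry in the local node's current basic shape.

    Assumption: a damaged entry is modelled as a value of None.
    Returns True if a replacement value was found.
    """
    # Source 1: the node's own degenerate basic shapes, newest first.
    older = local_top.second_leaf
    while older is not None:
        if older.vertex.get(key) is not None:
            local_top.vertex[key] = older.vertex[key]
            return True
        older = older.second_leaf
    # Source 2: this node's current basic shape as stored in other nodes' heaps.
    for peer_top in peer_heaps:
        shape = peer_top
        while shape is not None and shape.node_id != local_top.node_id:
            shape = shape.first_leaf
        if shape is not None and shape.vertex.get(key) is not None:
            local_top.vertex[key] = shape.vertex[key]
            return True
    return False


# Node 1 has a damaged "ip" entry and no degenerate shapes; node 2's heap has a copy.
local = BasicShape(1, {"ip": None, "role": "standby"})
peer = BasicShape(2, {"ip": "10.0.0.2"},
                  first_leaf=BasicShape(1, {"ip": "10.0.0.1", "role": "standby"}))
repaired = repair(local, "ip", [peer])
```

  • Data taken from a degenerate basic shape reflects an earlier time point, so a real system would need to decide when a stale value is acceptable; the sketch does not address that.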
  • A second aspect of the present invention provides a metadata management apparatus, including:
  • a storage unit configured to store metadata in the cluster as a metadata heap, where the metadata includes the metadata set currently run by the current node itself and the metadata sets currently run by all other nodes in the cluster,
  • and the metadata heap includes at least two basic shapes, each of which is a binary tree shape composed of a vertex, a first leaf node, and a second leaf node; where storing the metadata as a metadata heap includes:
  • the first leaf node of a current basic shape is used to connect another current basic shape storing the metadata set currently run by another node, the first leaf node of that other current basic shape is used to connect a further current basic shape storing the metadata set currently run by a further node, and so on until all nodes in the cluster are connected;
  • among the current basic shapes that store the metadata sets currently run by the nodes of the cluster, the second leaf node of each current basic shape is used to connect a degenerate basic shape storing the metadata set that the corresponding node ran at a first time point,
  • the first time point being earlier than the current time point;
  • the second leaf node of the degenerate basic shape is used to connect another degenerate basic shape storing the metadata set that the node ran at a second time point, the second time point being earlier than the first time point, and so on; and
  • a management unit configured to manage the metadata according to the metadata heap.
  • The management unit includes a synchronization subunit configured to acquire an updated metadata set currently run by one of the other nodes in the cluster, and to store the updated metadata set in the current basic shape corresponding to that node in the metadata heap.
  • The management unit includes a storage subunit configured to store, in the metadata heap, the degraded data corresponding to that node in a degenerate basic shape, the degraded data being the data that was stored in the current basic shape before the updated metadata set, and to connect the degenerate basic shape to the second leaf node of the current basic shape.
  • The management unit includes a morphological control subunit configured to: when another node, other than the current node itself, is split out of the cluster, disconnect the current basic shape corresponding to that other node from the first leaf node of the current basic shape of the current node, disconnect it from the further current basic shape (corresponding to a further node) that is connected at its own first leaf node, and connect the further current basic shape to the first leaf node of the current basic shape of the current node; and, when a new node joins the cluster, obtain the new node's currently running metadata set from the metadata heap stored in the new node and establish, in the metadata heap of the current node, a current basic shape corresponding to the new node.
  • The management unit includes a snapshot subunit configured to separately store the metadata set run at each time point, where each such metadata set includes the metadata set of every node in the cluster and the time points include the current time point, the first time point, and the second time point, so that the metadata set corresponding to any one of those time points can be run.
  • The management unit includes a repair subunit configured to: when metadata to be repaired exists in the current basic shape corresponding to the current node in the metadata heap, obtain repair data corresponding to the metadata to be repaired from the degenerate basic shape corresponding to the current node in the metadata heap, or from the current basic shape corresponding to the current node in the metadata heap of another node in the cluster; and replace the metadata to be repaired in the current basic shape corresponding to the current node with the acquired repair data.
  • When the primary node in the cluster is powered off, the management unit is further configured to: determine, according to a predetermined rule, that the current node is the primary node; and modify the primary/standby identifier of the current node to primary.
  • A third aspect of the present invention provides a metadata management apparatus including a memory and a processor. The memory is used to store metadata in the cluster as a metadata heap, where the metadata includes the metadata set currently run by the current node itself and the metadata sets currently run by all other nodes in the cluster, and the metadata heap includes at least two basic shapes, each of which is a binary tree shape composed of a vertex, a first leaf node, and a second leaf node;
  • where storing the metadata in the cluster as a metadata heap includes:
  • the second leaf node of a current basic shape is used to connect a degenerate basic shape of the node corresponding to that current basic shape, the degenerate basic shape storing the metadata set that the node ran at a first time point,
  • the first time point being earlier than the current time point;
  • the second leaf node of the degenerate basic shape is used to connect another degenerate basic shape of the same node, which stores the metadata set that the node ran at a second time point, the second time point being earlier than the first time point, and so on.
  • The processor is configured to manage the metadata according to the metadata heap.
  • With the metadata management method and apparatus provided by the present invention, a metadata heap is saved in each node, and each metadata heap stores the metadata set currently run by the node itself and the metadata sets currently run by all other nodes in the cluster.
  • Because all nodes in the cluster hold the same structure, any node can serve as the primary node. An active/standby switchover therefore only requires modifying the primary/standby identifier inside the node to primary.
  • No data migration between the primary and standby nodes is needed. This avoids the large amount of data migration that the prior art requires during an active/standby switchover and allows the switchover to be performed quickly in the cluster.
  • FIG. 1 is a schematic diagram showing the configuration of a metadata heap in a node in an embodiment of a metadata management method according to the present invention.
  • FIG. 2 is a schematic diagram of a metadata heap in a node in an embodiment of a metadata management method according to the present invention.
  • FIG. 3 is a schematic flowchart of an embodiment of a metadata management method according to the present invention.
  • FIG. 4 is a schematic diagram of the principle of another embodiment of a metadata management method according to the present invention.
  • FIG. 5 is a schematic diagram of the principle of still another embodiment of a metadata management method according to the present invention.
  • FIG. 6 is a schematic diagram of the principle of still another embodiment of a metadata management method according to the present invention.
  • FIG. 7 is a schematic diagram of the principle of still another embodiment of a metadata management method according to the present invention.
  • FIG. 8 is a first schematic diagram of another embodiment of a metadata management method according to the present invention.
  • FIG. 9 is a second schematic diagram of another embodiment of a metadata management method according to the present invention.
  • FIG. 10 is a schematic diagram of the principle of still another embodiment of a metadata management method according to the present invention.
  • FIG. 11 is a schematic structural diagram of an embodiment of a metadata management apparatus according to the present invention.
  • FIG. 12 is a schematic structural diagram of another embodiment of a metadata management apparatus according to the present invention.
  • Embodiments of the present invention implement metadata management based on fractal theory.
  • A fractal is generally understood as "a rough or fragmented geometric shape that can be divided into parts, each of which is (at least approximately) a reduced-size copy of the whole". This property is called self-similarity. A mathematical fractal is based on an equation that is iterated, that is, a feedback system based on recursion.
  • There are several types of fractals, which can be defined according to exact self-similarity, quasi-self-similarity, and statistical self-similarity respectively. Fractals generally have the following characteristics: they have fine structure at arbitrarily small scales; they are too irregular, both as a whole and in part, to be described easily in the language of traditional Euclidean geometry; and they have a self-similar form (at least approximately or statistically).
  • The organization and management of metadata described here is applicable to clusters including multiple nodes, for example metadata management for a cluster of computer hosts, cache metadata management for cluster devices, metadata management for a distributed file system, and all other scenarios that require discrete management of cluster data.
  • The following embodiments use a distributed cluster system as an example to describe the method of the embodiments of the present invention.
  • A distributed cluster system includes a plurality of nodes.
  • The metadata sets of all nodes of the entire cluster are stored inside each node of the cluster; at least the metadata sets currently run by all nodes are stored.
  • The metadata is stored as a metadata heap in each node, and the composition of the metadata heap is based on fractal theory.
  • Metadata heap: within each node, multiple pieces of metadata are stored, such as the metadata of the node itself and the metadata of other nodes. These pieces of metadata are stored in association with one another, and the whole of all the metadata is called a metadata heap.
  • Basic shape: a unit that stores metadata. Each basic shape stores one kind of metadata. For example, one basic shape is used to store the metadata of the node itself, and another basic shape is used to store the metadata of another node.
  • Basic shapes are divided into current basic shapes and degenerate basic shapes. A current basic shape stores the metadata currently run by a node; a degenerate basic shape stores the metadata that the node ran at a time point before the current time point.
  • FIG. 1 is a schematic diagram showing the configuration of a metadata heap in a node in an embodiment of a metadata management method according to the present invention. The diagram takes a cluster including two nodes, node 1 and node 2, as an example.
  • FIG. 1 shows the principle of the metadata heap inside node 1 of the cluster.
  • The metadata heap includes at least two basic shapes, for example a basic shape 11 for storing the metadata set currently run by node 1, another basic shape 12 for storing the metadata set currently run by node 2, and so on.
  • Because the basic shape 11 and the basic shape 12 store the metadata sets currently run by nodes, each may be referred to as a "current basic shape". The "current basic shape" mentioned in subsequent embodiments of the present invention likewise refers to a basic shape used to store the metadata set currently run by a node, where the node may be any node in the cluster.
  • Each node in the cluster stores internally the current basic shapes holding the currently running metadata set of every node of the cluster. For example, if the cluster is a five-node cluster, each node internally includes at least five current basic shapes, one for the currently running metadata set of each node.
  • Each basic shape includes a vertex a, a first leaf node b, and a second leaf node c. The basic shape is triangle-like and may also be referred to as a binary tree shape. For example, the basic shape 12 also has the above-mentioned vertex a, first leaf node b, and second leaf node c, and the basic shape 12 is connected to the first leaf node b of the basic shape 11. Similarly, if the cluster also includes a node 3, the current basic shape corresponding to node 3 is connected to the first leaf node b of the basic shape 12, and so on until all nodes in the cluster are traversed.
  • The basic shape is introduced in the metadata heap in order to explain the connection relationships between the stored metadata sets more clearly. Specifically, in the basic shape 11 for example, the vertex a can be understood as representing the metadata set currently run by node 1 (the data structure inside the set is not limited); that is, the metadata set stored in each basic shape can be represented by its vertex a.
  • The first leaf node b and the second leaf node c of a basic shape can each be understood as a "connection interface" for connecting another metadata set, indicating that the metadata sets are related to each other.
  • For example, the first leaf node b of the basic shape 11 is used to connect the metadata set of node 2; that is, the first leaf node b is the connection interface that links together the metadata sets currently run by the individual nodes.
  • The second leaf node c in each basic shape is used to connect the degraded data of the node, where the degraded data refers to a metadata set run by the node at a time point before the current time point.
  • A basic shape that stores degraded data may be referred to as a "degenerate basic shape". The "degenerate basic shape" mentioned in subsequent embodiments of the present invention likewise refers to a basic shape used to store the metadata set that a node ran at a time point before the current time point.
  • As shown in FIG. 1, the second leaf node c of the basic shape 11 (which may also be referred to as the current basic shape 11) is connected to the degenerate basic shape 13 corresponding to node 1. The degenerate basic shape 13 stores, for example, the metadata set that node 1 ran at a first time point, the first time point being earlier than the current time point.
  • The second leaf node c of the degenerate basic shape 13 is connected to another degenerate basic shape 15, which stores the metadata set that node 1 ran at a second time point, the second time point being earlier than the first time point.
  • The second leaf node c of the basic shape 12 is connected to the degenerate basic shape 14 corresponding to node 2.
  • For a single node, the second leaf node c in a basic shape is thus the connection interface that links the metadata sets the node ran at the individual time points, where the time points include the current time point and time points before it.
  • Which degraded data is kept is determined according to actual needs.
  • For example, suppose the metadata currently run by node 1 is modified at time point t1 and modified again at time points t2 and t3, where t3 is later than t2.
  • According to actual needs, the data as modified at time point t1 and the data as modified at time point t2 may both be saved, each stored in its own degenerate basic shape and connected according to the connection rule of FIG. 1 above; alternatively, only the data as modified at time point t1 may be saved.
  • As shown in FIG. 1, A represents the metadata sets of node 1, including the currently running metadata set of node 1 and the metadata sets of node 1 at two time points before the current time point.
  • B represents the metadata sets of node 2, including the currently running metadata set of node 2 and the metadata set of node 2 at one earlier time point.
  • A and B are connected through the current basic shapes corresponding to node 1 and node 2: the vertex a of the current basic shape 12 of node 2 (representing the metadata set currently run by node 2) is connected to the first leaf node b (the connection interface) of the current basic shape 11 of node 1.
  • The other nodes in the cluster also organize metadata according to the above connection method.
  • In this embodiment, the connection between current basic shapes is made at the first leaf node on the bottom edge of the basic shape; alternatively, the connection may be made at the second leaf node on the bottom edge.
  • Degraded data is also a complete set of metadata inside a node. "Degraded" means only that the data is not currently running but was run at a time point before the current time point. For example, after the currently running metadata has changed, the data before the change can be stored in a degenerate basic shape.
  • In FIG. 1, node 1 has two levels of degradation (that is, two degenerate basic shapes). In practice, more levels of degenerate basic shapes can be configured as required.
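  • The degradation step described above (moving the pre-change data into a degenerate basic shape hung from the second leaf node) can be sketched as follows. This is an illustrative Python sketch, not the patented implementation; `BasicShape`, `update`, the `max_levels` parameter, and the `rev` field are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BasicShape:
    node_id: int
    vertex: dict                                  # the metadata set (hypothetical layout)
    first_leaf: Optional["BasicShape"] = None     # next node's current basic shape
    second_leaf: Optional["BasicShape"] = None    # this node's degenerate basic shape


def update(current: BasicShape, new_metadata: dict, max_levels: int = 2) -> None:
    """Store new_metadata in `current`, degrading the old data by one level."""
    # The old data moves into a degenerate shape that inherits the older chain.
    degraded = BasicShape(current.node_id, current.vertex,
                          second_leaf=current.second_leaf)
    current.vertex = new_metadata
    current.second_leaf = degraded
    # Trim the chain so that at most max_levels degenerate shapes remain.
    shape = current
    for _ in range(max_levels):
        if shape.second_leaf is None:
            return
        shape = shape.second_leaf
    shape.second_leaf = None


# Node 1's metadata is modified three times; only two degradation levels are kept.
node1 = BasicShape(1, {"rev": 1})
for rev in (2, 3, 4):
    update(node1, {"rev": rev})

revs = []
shape = node1
while shape is not None:
    revs.append(shape.vertex["rev"])
    shape = shape.second_leaf
```

  • With `max_levels=2` the chain holds the current data plus the two most recent earlier versions, matching the two-level case of FIG. 1; a larger value keeps more levels.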
  • In the metadata heap, node 2 may be referred to as a "logical neighbor node" of node 1. "Logical neighbor nodes" are two nodes that are connected to each other in the metadata heap.
  • Logical neighbors are mutual: for example, node 1 in FIG. 1 is a logical neighbor of node 2, and node 2 is also a logical neighbor of node 1. If the cluster also includes a node 3 and node 3 is connected to node 2,
  • then node 3 is referred to as a logical neighbor node of node 2. That is, "logical neighbor" is a term used in the metadata heap to represent a connection relationship between nodes, regardless of the actual physical connections of the nodes.
  • The basic shapes in the metadata heap may include current basic shapes and degenerate basic shapes, and each basic shape includes a vertex, a first leaf node, and a second leaf node; however, the connection relationships of these three nodes differ between a current basic shape and a degenerate basic shape.
  • The vertex a of the current basic shape 11 represents the metadata set currently run by node 1.
  • The first leaf node b of the current basic shape 11 is used to connect the current basic shape of the logical neighbor node of node 1 (node 2).
  • The second leaf node c of the current basic shape 11 is used to connect the degenerate basic shape 13, which corresponds to the metadata set of node 1 itself at the previous time point. For the degenerate basic shape 13, the vertex a represents the metadata set that node 1 ran at that previous time point, and the second leaf node c is used to connect the metadata set that node 1 ran at a still earlier time point; the first leaf node b of the degenerate basic shape 13, however, has no data attached.
  • Inside each node, the current basic shape of the node itself is the top layer of the metadata heap. Degenerate basic shapes do not necessarily exist in the metadata heap; for example, a newly established cluster, or a cluster whose metadata has never been modified, may have no degenerate basic shapes.
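  • The structure described for FIG. 1 can be modelled as a small linked structure. The following Python sketch is one possible reading rather than the patented implementation: the class name `BasicShape`, the helper names, and the `version` values are hypothetical, and the variable names `shape11` to `shape15` merely mirror the reference numerals of FIG. 1.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class BasicShape:
    """One basic shape: a vertex plus two 'connection interface' leaves."""
    node_id: int
    vertex: dict                                  # vertex a: the metadata set
    first_leaf: Optional["BasicShape"] = None     # leaf b: next node's current shape
    second_leaf: Optional["BasicShape"] = None    # leaf c: older (degenerate) shape


def current_shapes(top: BasicShape) -> Iterator[BasicShape]:
    """Walk the first-leaf chain: the current shape of every node in the cluster."""
    shape: Optional[BasicShape] = top
    while shape is not None:
        yield shape
        shape = shape.first_leaf


def history(current: BasicShape) -> Iterator[BasicShape]:
    """Walk the second-leaf chain: the current shape, then ever older shapes."""
    shape: Optional[BasicShape] = current
    while shape is not None:
        yield shape
        shape = shape.second_leaf


# Heap inside node 1 for the two-node cluster of FIG. 1 (metadata values are made up).
shape15 = BasicShape(1, {"version": 1})                       # node 1, second time point
shape13 = BasicShape(1, {"version": 2}, second_leaf=shape15)  # node 1, first time point
shape14 = BasicShape(2, {"version": 1})                       # node 2, earlier time point
shape12 = BasicShape(2, {"version": 2}, second_leaf=shape14)  # node 2, current
shape11 = BasicShape(1, {"version": 3}, first_leaf=shape12, second_leaf=shape13)

node_ids = [s.node_id for s in current_shapes(shape11)]
node1_versions = [s.vertex["version"] for s in history(shape11)]
```

  • In this sketch the first leaf of a degenerate shape is simply left as `None`, reflecting the statement that the first leaf node b of a degenerate basic shape has no data attached.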
  • FIG. 2 is a schematic diagram of a metadata heap in a node in an embodiment of a metadata management method according to the present invention.
  • FIG. 2 takes a cluster including three nodes (node 1, node 2, and node 3) as an example and shows the metadata heap inside node 1.
  • It should be noted that FIG. 2 shows only two levels of degraded data for node 1, namely 1' and 1", where 1' is the metadata set that node 1 ran at the first time point, the first time point being earlier than the current time point, and 1" is the metadata set that node 1 ran at the second time point, the second time point being earlier than the first time point.
  • FIG. 2 shows one level of degraded data for node 2, namely 2'.
  • FIG. 2 also shows the metadata set currently run by node 3, but does not show any degraded data for node 3.
  • FIG. 2 is merely an example, and the number of degradation levels of each node is not limited.
  • For example, node 2 may also have second-level degraded data 2", and node 3 may also have degraded data.
  • Comparing FIG. 2 with FIG. 1: in substance, if the current basic shape of a node 3 were connected at the first leaf node b of the current basic shape 12 in FIG. 1, FIG. 2 and FIG. 1 would be the same.
  • FIG. 2 merely combines the basic shapes that are shown dispersed in FIG. 1.
  • For example, for the degenerate basic shape 13 in FIG. 1, that is, the basic shape storing the data 1', the first leaf node b is not connected to any data in FIG. 1, whereas in FIG. 2 the first leaf node b of the degenerate basic shape 13 is drawn overlapping the second leaf node c of the current basic shape 12 of node 2. This is done only to make the representation of the metadata heap more concise.
  • Similarly, the first leaf node b of the degenerate basic shape 15 in FIG. 1 is drawn overlapping the second leaf node c of the degenerate basic shape 14 of node 2.
  • The other overlaps follow the same principle and are not described again.
  • The shape of the metadata heap shown in FIG. 2 is described further as follows.
  • At the top level of the metadata heap is the current basic shape storing the metadata set currently run by node 1. Starting from the top level, the rightmost column of the metadata heap (indicated by C) holds the current metadata of all nodes in the cluster (node 1, node 2, and node 3), that is, the metadata currently run on all nodes.
  • The column indicated by D in FIG. 2 includes the degenerate basic shape storing the degraded data 1' of node 1 and the degenerate basic shape storing the degraded data of node 2; this is the degraded data of the first level.
  • The degenerate basic shape 15 in FIG. 2, which stores the degraded data 1" of node 1, is the degraded data of the second level.
  • The metadata heaps described in the subsequent embodiments have the same nature.
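  • Columns C and D of FIG. 2 can be read as "all nodes at the same degradation depth". The sketch below, which is only an illustration under that reading, collects one such level from a heap; `BasicShape` and `level` are hypothetical names, and the string labels stand in for real metadata sets.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BasicShape:
    node_id: int
    vertex: object                                # the metadata set (a label here)
    first_leaf: Optional["BasicShape"] = None     # next node's current basic shape
    second_leaf: Optional["BasicShape"] = None    # this node's degenerate basic shape


def level(top: BasicShape, depth: int) -> dict:
    """Return {node_id: metadata} at the given degradation depth (0 = current)."""
    result = {}
    current = top
    while current is not None:          # walk column C along the first leaves
        shape = current
        for _ in range(depth):          # step down `depth` second leaves
            if shape is None:
                break
            shape = shape.second_leaf
        if shape is not None:
            result[current.node_id] = shape.vertex
        current = current.first_leaf
    return result


# Heap inside node 1 for the three-node example of FIG. 2.
n3 = BasicShape(3, "3")
n2 = BasicShape(2, "2", first_leaf=n3, second_leaf=BasicShape(2, "2'"))
n1 = BasicShape(1, "1", first_leaf=n2,
                second_leaf=BasicShape(1, "1'", second_leaf=BasicShape(1, '1"')))

column_c = level(n1, 0)   # current metadata of all nodes
column_d = level(n1, 1)   # first-level degraded data
second = level(n1, 2)     # second-level degraded data
```

  • A node that has no degenerate basic shape at the requested depth is simply absent from the result, which is one way to represent the uneven degradation levels shown in FIG. 2.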
  • The above details the structure of the metadata heap within a node and describes how metadata is organized through the metadata heap.
  • The following describes specifically how the metadata is managed according to the metadata heap: for example, how the metadata stored in the metadata heap is kept consistent across the nodes of the cluster, how clusters are split and combined, and how clusters implement metadata redundancy and fault tolerance.
  • FIG. 3 is a schematic flowchart of a metadata management method according to an embodiment of the present invention.
  • The method may be performed by a node in a cluster.
  • This embodiment describes the method only briefly; for the specific principle, refer to the composition of the intra-node metadata heap described above. As shown in FIG. 3, the method may include:
  • Metadata Store metadata as a metadata heap, where the metadata includes a metadata set currently running by the node and a metadata set currently running by all nodes except the cluster itself;
  • the metadata heap includes at least two basic shapes, each of which is a binary shape formed by a vertex, a first leaf node, and a second leaf node.
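  • Storing the metadata as a metadata heap includes:
  • storing the metadata set currently running on the node itself in the current basic shape at the top level of the metadata heap; the first leaf node of that current basic shape connects to another current basic shape, which stores the metadata set currently running on another node;
  • the first leaf node of that other current basic shape in turn connects to a further current basic shape, which stores the metadata set currently running on a further node, and so on until all the nodes in the cluster have been traversed;
  • among the current basic shapes that store the metadata sets currently running on the nodes of the cluster, the second leaf node of each current basic shape connects to a degenerate basic shape, which stores the metadata set that the corresponding node ran at a first time point, the first time point being earlier than the current time point;
  • the second leaf node of that degenerate basic shape connects to another degenerate basic shape, which stores the metadata set that the node ran at a second time point, the second time point being earlier than the first time point, and so on.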
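  • As a minimal illustrative sketch only, the heap layout described above could be modeled in Python as follows. The names `BasicShape`, `build_heap`, `push_degraded`, and `chain_ids`, the dictionary used for a metadata set, and the rotation order of the first-leaf chain are assumptions introduced here for illustration; they are not taken from the original text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class BasicShape:
    """One basic shape: a vertex plus two leaf links (illustrative names)."""
    node_id: int
    metadata: Dict[str, str]                     # vertex: a metadata set
    first_leaf: Optional["BasicShape"] = None    # current shape of the next node
    second_leaf: Optional["BasicShape"] = None   # older (degenerate) shape of this node


def build_heap(own_id: int, current_sets: Dict[int, Dict[str, str]]) -> BasicShape:
    """Chain the current shapes of all nodes, with the local node on top."""
    ids = sorted(current_sets)
    start = ids.index(own_id)
    order = ids[start:] + ids[:start]            # assumed order: own node first, then rotate
    top = BasicShape(order[0], current_sets[order[0]])
    tail = top
    for node_id in order[1:]:
        tail.first_leaf = BasicShape(node_id, current_sets[node_id])
        tail = tail.first_leaf
    return top


def push_degraded(current: BasicShape, older_set: Dict[str, str]) -> None:
    """Attach an even older metadata set at the end of the second-leaf chain."""
    shape = current
    while shape.second_leaf is not None:
        shape = shape.second_leaf
    shape.second_leaf = BasicShape(current.node_id, older_set)


def chain_ids(top: BasicShape) -> List[int]:
    """Walk the first-leaf chain and list the node ids in order."""
    ids, shape = [], top
    while shape is not None:
        ids.append(shape.node_id)
        shape = shape.first_leaf
    return ids


sets = {1: {"io": "a"}, 2: {"io": "b"}, 3: {"io": "c"}}
heap = build_heap(1, sets)
push_degraded(heap, {"io": "a-old"})       # set run at the first time point
push_degraded(heap, {"io": "a-older"})     # set run at the second time point
```

  • In this sketch, following `first_leaf` visits the current metadata set of every node in the cluster, while following `second_leaf` from any current shape walks backwards in time for that one node.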
  • Managing the metadata includes, for example, metadata management when the nodes of the cluster synchronize for consistency, metadata management during cluster splitting and combining, and redundancy and fault-tolerance management of the cluster metadata.
  • The method may further include: determining, according to a predetermined rule, that the current node is the master node; and setting the active/standby identifier of the current node to active.
  • The active/standby identifier may be a flag inside the storage node. Once the identifier is set to active, the node becomes the master node and can manage the metadata in the cluster. Because this node, like every other node, holds the metadata heap structure, which stores the metadata sets of all the nodes in the cluster, the data migration required in the prior art is avoided.
  • Because every node stores the metadata sets currently running on all the nodes in the cluster, the metadata management method of this embodiment needs no data migration between the active and standby nodes during an active/standby switchover, so the switchover in the cluster executes quickly.
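  • The switchover described above may be sketched as follows. This is only an illustration under stated assumptions: the source says the master is chosen "according to a predetermined rule" without naming the rule, so the rule used here (the live node with the lowest id) is hypothetical, as are the names `Node` and `elect_master`.

```python
from typing import Dict, List


class Node:
    def __init__(self, node_id: int, cluster_sets: Dict[int, dict]) -> None:
        self.node_id = node_id
        self.is_master = False        # the active/standby flag
        self.alive = True
        # Every node already holds the current metadata set of every node.
        self.heap = {nid: dict(s) for nid, s in cluster_sets.items()}


def elect_master(nodes: List[Node]) -> Node:
    """Assumed rule (hypothetical): the live node with the lowest id becomes master."""
    candidates = [n for n in nodes if n.alive]
    if not candidates:
        raise RuntimeError("no live node available")
    new_master = min(candidates, key=lambda n: n.node_id)
    for n in nodes:
        n.is_master = n is new_master  # only a flag changes; no metadata is copied
    return new_master


cluster_sets = {1: {"io": "a"}, 2: {"io": "b"}, 3: {"io": "c"}}
nodes = [Node(i, cluster_sets) for i in cluster_sets]
first = elect_master(nodes)
nodes[0].alive = False                 # the old master fails
second = elect_master(nodes)
```

  • The new master needs no migration step in this sketch because its own heap already contains an entry for every node of the cluster.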
  • In addition, the metadata of any node of the cluster can be retrieved from the metadata heap at each node; if degraded data is stored, the degraded data of all the nodes of the entire cluster can be retrieved as well.
  • The metadata organization of this embodiment therefore makes metadata retrieval very convenient.
  • FIG. 4 is a schematic diagram of the principle of another embodiment of the metadata management method of the present invention. This embodiment explains how the nodes in the cluster maintain data consistency, that is, how configuration data is synchronized across the nodes.
  • As shown in FIG. 4, the cluster includes four nodes, namely node 1, node 2, node 3, and node 4, and the figure shows the metadata heap inside each node; the identifier at the top layer of each metadata heap denotes the metadata set currently running on that node.
  • Both layers of node 1's degraded data, 1' and 1", are stored in the metadata heap of node 1; only one layer of node 1's degraded data, 1', is stored in the metadata heap of node 3; and no degraded data of node 1 is stored in the metadata heap of node 2. FIG. 4 is only an example: as explained above, both layers of node 1's degraded data, 1' and 1", could also be stored in node 2 and node 3.
  • This embodiment may be configured so that each node must save its own degraded data.
  • For example, the metadata heap of node 1 must store 1' and 1", whereas the degraded data of other nodes may be saved optionally. Node 1 can, for instance, save the degraded data of node 3 selectively: node 3 actually has two layers of degraded data, 3' and 3", but node 1 saves only one layer, 3'. Even if the degraded data of other nodes is not saved, it can still be obtained from the metadata heaps of those nodes themselves.
  • In summary, this embodiment may be configured so that each node must save its own currently running metadata and its own degraded data, must also save the currently running metadata of the other nodes, and may optionally save the degraded data of the other nodes.
  • The degraded data is stored in chronological order. For example:
  • The vertex of the metadata heap in node 1 stores the currently running metadata set, 1' stores the metadata set that ran at time point T1, and 1" stores the metadata set that ran at time point T2. The current time point (for example, 10:00), time point T1 (for example, 9:00), and time point T2 (for example, 8:00) lie successively further in the past. If the current metadata set then changes and must be preserved, the metadata set that was running at the current time point (10:00) is stored in 1', the metadata set of time point T1 (9:00) originally stored in 1' moves back to 1", and likewise the metadata set of time point T2 (8:00) originally stored in 1" moves back to a newly created degenerate basic shape 1"'.
  • The number of layers of degraded data to keep can also be configured.
  • If the preset is to keep two layers of degraded data, that is, 1' and 1", no new degenerate basic shape 1"' is created, and the metadata set of time point T2 (8:00) originally stored in 1" is simply discarded.
  • When the degraded data is saved, the running time points corresponding to it are saved as well; for example, time point T1 (9:00) and time point T2 (8:00) above must both be stored.
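  • The backward shift of degraded data with a configurable number of layers can be sketched with a bounded double-ended queue. This is a simplified illustration rather than the stored structure itself: the class name `NodeHistory`, the `(time point, metadata set)` tuple, and the use of `collections.deque` in place of linked degenerate basic shapes are assumptions made here.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple

Entry = Tuple[str, Dict[str, str]]      # (running time point, metadata set)


class NodeHistory:
    """Current metadata set of one node plus its degraded layers, newest first."""

    def __init__(self, current: Entry, max_layers: Optional[int] = 2) -> None:
        self.current = current
        # A full deque with maxlen drops the entry at the opposite end on appendleft,
        # which models discarding the oldest layer. max_layers=None keeps every layer.
        self.degraded: Deque[Entry] = deque(maxlen=max_layers)

    def update(self, time_point: str, new_set: Dict[str, str]) -> None:
        """Demote the current set to the first degraded layer, then install the new set."""
        self.degraded.appendleft(self.current)
        self.current = (time_point, new_set)


history = NodeHistory(("08:00", {"v": "1"}))
history.update("09:00", {"v": "2"})
history.update("10:00", {"v": "3"})
history.update("11:00", {"v": "4"})     # with two layers kept, the 08:00 set is discarded
```

  • Each entry carries its running time point, matching the requirement that time points be saved together with the degraded data.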
  • A cluster usually includes a primary node and standby nodes; if some part of the metadata needs to be changed, the change usually starts from the primary node, that is, the primary node changes first.
  • For example, node 1 is the master node, and a change to I/O-related metadata is initiated from node 1 to every node. The nodes in the cluster can communicate with each other; node 1 may change the corresponding I/O metadata in the other nodes sequentially in some logical order, or the changes on the other nodes may be performed concurrently.
  • This is the first-layer synchronization: the metadata at the top layer of the metadata heap in each node is changed.
  • Next, the metadata heap of each node is updated internally. For example, node 1 synchronizes the other nodes' metadata in its metadata heap in the order 1->2->3->4, node 2 synchronizes the other nodes' metadata in its metadata heap in the order 2->3->4->1, and so on.
  • The internal update of the metadata heap means the following. After the first-layer synchronization, the metadata stored at the top layer of the metadata heaps of node 2, node 3, and node 4 has changed, and their currently running metadata sets are no longer the sets that existed before the first-layer synchronization. Therefore, the current basic shapes corresponding to node 2, node 3, and node 4 in the metadata heap inside node 1 (see FIG. 4) must be updated as well.
  • Specifically, node 1 can acquire the updated, currently running metadata set of a first node other than itself in the cluster (the first node being node 2, node 3, or node 4); the updated metadata set can be obtained from the current basic shape at the top of the metadata heap in the first node.
  • The obtained updated metadata set is stored in the current basic shape corresponding to the first node in the metadata heap of node 1. For example, node 1 obtains the updated data from the top of the metadata heap of node 2 and stores it in the current basic shape 20, identified as "2", in its own metadata heap; the other nodes are handled similarly.
  • Each node can also save the degraded data of other nodes. For example, after the top level of the metadata heap of node 2 is updated to a new metadata set, the metadata set stored at the top level before the first-layer synchronization becomes degraded data of a time point earlier than the current time point, and node 2 saves it in a degenerate basic shape in its own metadata heap, for example in the degenerate basic shape 21 identified as 2'.
  • Node 1 can acquire the degraded data from the degenerate basic shape 21 of node 2 and store it in its own metadata heap, specifically in the degenerate basic shape 22 corresponding to node 2. The degenerate basic shape 22 is connected to the second leaf node of the current basic shape 20 corresponding to node 2 (the left node as seen in FIG. 4). If node 1 previously had no degenerate basic shape for node 2, node 1 creates the degenerate basic shape 22, stores the degraded data in it, and connects it to the second leaf node of the current basic shape 20.
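  • The two synchronization steps above may be sketched as follows, assuming for illustration that each node's heap is flattened into a dictionary mapping a node id to a list whose first element is the current metadata set and whose later elements are degraded layers. The names `local_update` and `pull_from_peer` and the `layers` parameter are hypothetical.

```python
from typing import Dict, List

Heap = Dict[int, List[dict]]   # node id -> [current set, 1st degraded layer, 2nd, ...]


def local_update(heap: Heap, own_id: int, new_set: dict) -> None:
    """First-layer synchronization on one node: the old top set becomes degraded data."""
    heap[own_id].insert(0, dict(new_set))


def pull_from_peer(local: Heap, peer_heap: Heap, peer_id: int, layers: int = 1) -> None:
    """Internal update: copy the peer's own top set, plus optionally some degraded layers."""
    peer_chain = peer_heap[peer_id]
    local[peer_id] = [dict(s) for s in peer_chain[: 1 + layers]]


# Heaps of node 1 and node 2 before the change; every set is at version v1.
heaps: Dict[int, Heap] = {
    1: {1: [{"io": "v1"}], 2: [{"io": "v1"}]},
    2: {2: [{"io": "v1"}], 1: [{"io": "v1"}]},
}

# First-layer synchronization: each node changes the top of its own heap.
local_update(heaps[1], 1, {"io": "v2"})
local_update(heaps[2], 2, {"io": "v2"})

# Internal update: node 1 also keeps one degraded layer of node 2; node 2 keeps none of node 1.
pull_from_peer(heaps[1], heaps[2], 2, layers=1)
pull_from_peer(heaps[2], heaps[1], 1, layers=0)
```

  • Setting `layers=0` corresponds to a node that chooses not to save the degraded data of another node, which the embodiment allows.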
  • FIG. 5 is a schematic diagram of another embodiment of the metadata management method of the present invention. This embodiment illustrates how the metadata heap implements cluster splitting.
  • The cluster again includes four nodes: node 1, node 2, node 3, and node 4.
  • The cluster is split into two two-node clusters: a new cluster including node 1 and node 2, and a new cluster including node 3 and node 4.
  • The metadata heap inside each node of a new cluster must change accordingly.
  • Take the new cluster consisting of node 1 and node 2 as an example. The new cluster does not include node 3 and node 4, so in the metadata heap inside node 1 the current basic shape corresponding to node 3 should no longer be connected to the current basic shape of node 2, because the nodes corresponding to two connected current basic shapes in a metadata heap belong to the same cluster. The connection between the current basic shapes of node 3 and node 2 must therefore be disconnected.
  • In other words, the metadata sets corresponding to node 3 and node 4 must be split off, because node 3 and node 4 no longer belong to the new cluster consisting of node 1 and node 2; and the metadata sets corresponding to node 1 and node 2 must be connected, because the two nodes communicate with each other in the new cluster and must also be connected in the metadata heap.
  • In node 1, the connection between the current basic shape 31 of node 3 and the current basic shape 32 of node 2 is disconnected. The current basic shape 31 is connected to the first leaf node b of the current basic shape 32, and on splitting this connection is disconnected (marked by the slash line in the figure).
  • In node 2, the connection between the current basic shape 31 of node 3 and the current basic shape 32 of node 2 (which may be referred to as the second node) at the first leaf node b must likewise be disconnected. In addition, the connection between the current basic shape 35 of node 4 and the current basic shape of node 1 (which may be referred to as the third node) must also be disconnected, so that the current basic shape of node 1 can then be connected to the first leaf node b of the current basic shape 32.
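  • The disconnect-and-reconnect step can be sketched on a first-leaf chain as follows. The sketch assumes the local node stays in the new cluster and that its current basic shape is on top of its own heap; `Shape`, `link`, `split_off`, and `chain_ids` are illustrative names, and the second-leaf (degraded) chain is omitted for brevity.

```python
from typing import Iterable, List, Optional, Set


class Shape:
    """Current basic shape reduced to its node id and first-leaf link."""

    def __init__(self, node_id: int) -> None:
        self.node_id = node_id
        self.first_leaf: Optional["Shape"] = None


def link(ids: Iterable[int]) -> Shape:
    """Build a first-leaf chain in the given order and return its top shape."""
    shapes = [Shape(i) for i in ids]
    for upper, lower in zip(shapes, shapes[1:]):
        upper.first_leaf = lower
    return shapes[0]


def split_off(top: Shape, leaving: Set[int]) -> Shape:
    """Disconnect the shapes of leaving nodes and reconnect the remaining ones."""
    if top.node_id in leaving:
        raise ValueError("the local node must remain in its own heap")
    kept = top
    nxt = top.first_leaf
    top.first_leaf = None
    while nxt is not None:
        following = nxt.first_leaf
        nxt.first_leaf = None              # disconnect this shape from the chain
        if nxt.node_id not in leaving:
            kept.first_leaf = nxt          # reconnect it to the last shape that stays
            kept = nxt
        nxt = following
    return top


def chain_ids(top: Shape) -> List[int]:
    ids, shape = [], top
    while shape is not None:
        ids.append(shape.node_id)
        shape = shape.first_leaf
    return ids


heap_of_node_1 = split_off(link([1, 2, 3, 4]), {3, 4})
heap_of_node_2 = split_off(link([2, 3, 4, 1]), {3, 4})
```

  • In node 2's heap the shape of node 1 sits behind those of node 3 and node 4, so it has to be reconnected to node 2's first leaf after the split, which is the case the loop handles.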
  • FIG. 6 is a schematic diagram of another embodiment of the metadata management method of the present invention. This embodiment explains how the metadata heap implements cluster combination.
  • In this embodiment, two two-node clusters are combined into one four-node cluster.
  • The cluster consisting of node 1 and node 2 and the cluster consisting of node 3 and node 4 are combined into a new cluster that includes node 1, node 2, node 3, and node 4, which is the reverse of the splitting process described in the preceding embodiment.
  • The metadata sets of node 3 and node 4 must be added to the metadata heaps of node 1 and node 2, and the metadata sets of node 1 and node 2 must be added to the metadata heaps of node 3 and node 4, so that the metadata heap inside each node of the new cluster includes at least the currently running metadata set of every node.
  • For example, node 1 can obtain the metadata set currently running on node 3 from the top layer of the metadata heap inside node 3, and the metadata set currently running on node 4 from the top layer of the metadata heap inside node 4. Node 1 can then establish in its own metadata heap a current basic shape 41 corresponding to node 3 and a current basic shape 42 corresponding to node 4, storing the currently running metadata set of node 3 in the current basic shape 41 and that of node 4 in the current basic shape 42.
  • Node 2, node 3, and node 4 process their metadata in the same way during the cluster combination, so the details are not repeated here; see FIG. 6.
  • The degraded data of each node is saved optionally. For example, node 1 may choose to save only the first-level degraded data 3' of node 3, while node 2 may choose to save both layers of node 3's degraded data, 3' and 3". Of course, every node in the cluster can also save the currently running metadata sets and the degraded metadata sets of all nodes, so that the data stored in the metadata heap of each node is identical.
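  • Adding the nodes of another cluster to a local heap can be sketched as follows. The sketch assumes a new current basic shape is attached to the shape whose first leaf is still idle; the names `Shape`, `add_node`, and `chain_ids` are illustrative, and the optional copying of degraded data is left out.

```python
from typing import Dict, List, Optional


class Shape:
    """Current basic shape: a metadata set plus a first-leaf link."""

    def __init__(self, node_id: int, metadata: Dict[str, str]) -> None:
        self.node_id = node_id
        self.metadata = dict(metadata)
        self.first_leaf: Optional["Shape"] = None


def add_node(top: Shape, new_id: int, peer_top_set: Dict[str, str]) -> Shape:
    """Create a current shape for a joining node and hang it on the idle first leaf."""
    shape = top
    while shape.first_leaf is not None:    # find the shape whose first leaf is idle
        shape = shape.first_leaf
    shape.first_leaf = Shape(new_id, peer_top_set)
    return shape.first_leaf


def chain_ids(top: Shape) -> List[int]:
    ids, shape = [], top
    while shape is not None:
        ids.append(shape.node_id)
        shape = shape.first_leaf
    return ids


# Heap of node 1 in the two-node cluster {1, 2}.
heap_1 = Shape(1, {"io": "a"})
add_node(heap_1, 2, {"io": "b"})

# Sets read from the top layer of the heaps inside node 3 and node 4.
joining = {3: {"io": "c"}, 4: {"io": "d"}}
for peer_id, peer_set in joining.items():
    add_node(heap_1, peer_id, peer_set)
```

  • Running the same steps on node 2, node 3, and node 4 with their respective peers gives every heap in the combined cluster a current basic shape for each of the four nodes.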
  • The metadata heap thus supports changes in cluster form, such as splitting and combining. Because the metadata heap is designed on the basis of fractal theory, its basic shape lends itself to combination and division: a cluster can be divided at any number of nodes, and the metadata heap can be split or combined according to the above principle, so cluster splitting and combining are very simple to implement.
  • Metadata management based on fractal theory places no limit on the number of nodes. Because the metadata in each node is stored fractally (that is, in basic shapes), in theory the number of cluster nodes has no upper limit as long as a node has enough memory and disk space to store the metadata heap; and because cluster splitting and combining can be carried out quickly and easily, the impact on cluster performance remains small as the number of nodes grows.
  • FIG. 7 is a schematic diagram of another embodiment of the metadata management method of the present invention. This embodiment describes how snapshots are implemented with the metadata heap.
  • The second-level degraded data 51 includes the second-level degraded data of every node of the cluster (which may be referred to as the metadata sets running at the second time point), for example the degraded data 1" of node 1, the degraded data 2" of node 2, and so on.
  • Node 1 can store the second-level degraded data 51 as a whole, which is called a "snapshot".
  • Similarly, the first-level degraded data 53 (which may be referred to as the metadata sets running at the first time point) and the current running data 52 (the currently running metadata sets) may each also be stored as a whole.
  • The second-level degraded data 51 stored in an earlier snapshot may be moved to the position of the current running data 52, that is, stored in the current basic shapes of the metadata heap; this is called "rolling forward" (the data moves to the position of a later time point). Alternatively, the second-level degraded data 51 may simply be run, which amounts to selecting it for temporary use without moving it; it remains stored at the position shown in the figure.
  • When the second-level degraded data 51 is rolled forward, the position of the current running data 52 must move as well, for example to the previous position of the second-level degraded data 51; that is, the whole of the metadata sets running in the current basic shapes of the metadata heap is stored in the second-level degenerate basic shapes of the metadata heap.
  • In effect, the current running data 52 and the second-level degraded data 51 exchange positions, and the movement of the current running data 52 may be referred to as "rollback".
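  • Treating each level as one whole makes the exchange a single swap, as the following sketch shows. The level names `"current"`, `"first"`, and `"second"` and the functions `select` and `roll_forward` are assumptions introduced for illustration.

```python
from typing import Dict

Level = Dict[int, Dict[str, str]]   # node id -> metadata set at one time point


def select(levels: Dict[str, Level], name: str) -> Level:
    """Use a snapshot temporarily without moving it."""
    return levels[name]


def roll_forward(levels: Dict[str, Level], name: str) -> None:
    """Swap a degraded level with the current level as a whole.

    The degraded level rolls forward into the current position and the
    previous current level rolls back into the vacated position.
    """
    if name == "current":
        raise ValueError("choose a degraded level to roll forward")
    levels["current"], levels[name] = levels[name], levels["current"]


levels: Dict[str, Level] = {
    "current": {1: {"v": "3"}, 2: {"v": "3"}},
    "first": {1: {"v": "2"}, 2: {"v": "2"}},
    "second": {1: {"v": "1"}, 2: {"v": "1"}},
}
preview = select(levels, "second")   # run the snapshot in place
roll_forward(levels, "second")       # or exchange it with the current level
```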
  • FIG. 8 is a first schematic diagram of another embodiment of the metadata management method of the present invention, and FIG. 9 is a second schematic diagram of that embodiment. This embodiment illustrates how the metadata heap implements redundancy and fault tolerance.
  • When metadata in a current basic shape of node 1 is lost or damaged and must be repaired, node 1 can first check its own degenerate basic shapes, such as the degenerate basic shape 62, for repair data corresponding to the metadata to be repaired (that is, the data as it was before the loss or damage). If such data exists, node 1 can obtain it immediately and use it to replace the metadata to be repaired in the current basic shape, so that node 1 repairs its metadata by itself.
  • Alternatively, node 1 can obtain the repair data directly from the metadata heaps of other nodes. For example, referring to FIG. 9, node 1 can obtain the repair data from the current basic shape 63 that corresponds to node 1 in the metadata heap of node 2.
  • Each node's metadata is thus stored redundantly in two ways: the metadata set the node is currently running is also stored in the node's own degenerate basic shapes, and it is stored in the other nodes of the cluster as well; node 2, for example, stores the currently running metadata set of node 1. The metadata of node 1 therefore effectively has multiple backups, and node 1 can repair data in several ways, which raises the level of fault tolerance and redundancy and gives stronger security assurance.
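  • The two repair paths can be sketched as a lookup that tries the node's own degenerate shapes first and then the copies held by peers. The function name `repair`, the per-key granularity, and the flat dictionaries are assumptions made for this illustration.

```python
from typing import Dict, List


def repair(
    key: str,
    own_current: Dict[str, str],
    own_degraded: List[Dict[str, str]],
    peer_copies: Dict[int, Dict[str, str]],
) -> str:
    """Restore one metadata entry and report where the repair data came from."""
    for layer in own_degraded:               # 1. the node's own degenerate shapes
        if key in layer:
            own_current[key] = layer[key]
            return "self"
    for peer_id in sorted(peer_copies):      # 2. this node's current shape held by peers
        copy = peer_copies[peer_id]
        if key in copy:
            own_current[key] = copy[key]
            return f"node {peer_id}"
    raise KeyError(key)


current = {"io": "v3"}                       # the "lun" entry has been lost
degraded = [{"io": "v2"}]                    # node 1's own degenerate shapes
peers = {2: {"io": "v3", "lun": "L3"}}       # node 1's current set as stored in node 2
source_lun = repair("lun", current, degraded, peers)
source_io = repair("io", current, degraded, peers)
```

  • Note that data taken from a degenerate shape belongs to an earlier time point, so in this sketch the self-repaired `"io"` entry reverts to its older value, whereas the peer copy restores the latest known value.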
  • This embodiment mainly illustrates that managing metadata with the metadata heap makes data changes more flexible and enables a more flexible cluster read/write lock.
  • For example, suppose that part of the data in the currently running metadata set of node 1, held in the metadata heap inside node 1, is to be modified.
  • According to the prior art, the currently running metadata set of node 1 would be locked, which amounts to suspending the current operation of node 1.
  • Other nodes would also be unable to read or write the current running data of node 1, and normal reading and writing could resume only after the modification was complete.
  • In this embodiment, however, the current running data of node 1 is already stored redundantly in the degenerate basic shapes. If, for example, the first-level degenerate basic shape of node 1 contains the metadata to be modified, node 1 can lock and modify only that degenerate basic shape.
  • Other nodes then cannot read or write the data in that degenerate basic shape, but the data stored in the current basic shape of node 1 is unaffected.
  • After the modification is complete, the modified metadata simply replaces the corresponding data in the current basic shape.
  • Designed in this way, the metadata heap makes data changes more flexible without affecting the operation of the cluster.
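  • A minimal sketch of this locking scheme follows, assuming one lock per basic shape. The names `Shape` and `modify_via_degenerate` and the `observe` callback (used only to show that the current shape stays readable while the degenerate shape is locked) are hypothetical, and the final replacement step is shown without any further synchronization.

```python
import threading
from typing import Callable, Dict, Optional


class Shape:
    """A basic shape with its own lock."""

    def __init__(self, metadata: Dict[str, str]) -> None:
        self.metadata = dict(metadata)
        self.lock = threading.Lock()


def modify_via_degenerate(
    current: Shape,
    degenerate: Shape,
    key: str,
    value: str,
    observe: Optional[Callable[[str], None]] = None,
) -> None:
    """Edit the redundant copy under its lock, then replace the entry in the current shape."""
    with degenerate.lock:                    # only the degenerate shape is locked
        degenerate.metadata[key] = value
        if observe is not None:
            observe(current.metadata[key])   # the current shape is still readable here
        staged = degenerate.metadata[key]
    current.metadata[key] = staged           # replace the corresponding data afterwards


current = Shape({"io": "old"})
degenerate = Shape({"io": "old"})
seen = []
modify_via_degenerate(current, degenerate, "io", "new", observe=seen.append)
```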
  • FIG. 10 is a schematic diagram of another embodiment of the metadata management method of the present invention.
  • In this embodiment, the metadata sets stored in the current basic shapes corresponding to the nodes of the cluster may be kept in memory, while the metadata sets stored in the degenerate basic shapes are kept on storage media other than memory, in order to reduce memory usage.
  • For example, the metadata heap inside node 1 can keep the metadata of each current basic shape in memory (Mem in the figure denotes memory). The other, degraded data, that is, the first-level degraded data 71 and the second-level degraded data 72 in FIG. 10, exists for redundancy and fault tolerance and can therefore be placed on slower storage media: for example, the first-level degraded data 71 is placed in a cache and the second-level degraded data 72 on a solid state disk (SSD) or a disk (DISK), which saves memory.
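  • The placement rule can be sketched as a mapping from heap level to storage tier. The tier labels, the name `plan_placement`, and the representation of a node by its number of stored layers are assumptions made for illustration; no real storage API is involved.

```python
from typing import Dict, List, Tuple

# Level 0 is the current basic shape; level 1 and above are degenerate shapes.
TIER_BY_LEVEL = {0: "memory", 1: "cache"}
DEFAULT_TIER = "ssd_or_disk"                 # second-level degraded data and older


def plan_placement(layers_per_node: Dict[int, int]) -> Dict[str, List[Tuple[int, int]]]:
    """Return, for each tier, the (node id, level) pairs that would be stored on it."""
    plan: Dict[str, List[Tuple[int, int]]] = {}
    for node_id, layers in sorted(layers_per_node.items()):
        for level in range(layers):
            tier = TIER_BY_LEVEL.get(level, DEFAULT_TIER)
            plan.setdefault(tier, []).append((node_id, level))
    return plan


# Node 1 stores three layers in this heap, node 2 only its current set, node 3 two layers.
plan = plan_placement({1: 3, 2: 1, 3: 2})
```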
  • Embodiment 9. FIG. 11 is a schematic structural diagram of an embodiment of a metadata management apparatus according to the present invention.
  • The metadata management apparatus can execute the metadata management method of any embodiment of the present invention. The apparatus is equivalent to, for example, a control module provided in each node of a cluster; it can be used to store the metadata of the cluster and also to perform metadata management such as cluster splitting, cluster combining, or metadata repair.
  • This embodiment describes the structure of the apparatus only briefly; for the specific working principle of each functional unit, refer to any of the method embodiments of the present invention.
  • The metadata management apparatus of this embodiment may include a storage unit 91 and a management unit 92.
  • The storage unit 91 is configured to store the metadata in the cluster as a metadata heap, where the metadata includes the metadata set currently running on the current node itself and the metadata sets currently running on all other nodes in the cluster.
  • The metadata heap includes at least two basic shapes, each of which is a binary tree shape composed of a vertex, a first leaf node, and a second leaf node. Storing the metadata as a metadata heap includes:
  • storing the metadata set currently running on the current node itself in the current basic shape at the top level of the metadata heap, where the first leaf node of each current basic shape connects to another current basic shape storing the metadata set currently running on another node, and so on until all the nodes in the cluster are connected;
  • among the current basic shapes that store the metadata sets currently running on the nodes of the cluster, the second leaf node of each current basic shape connects to a degenerate basic shape storing the metadata set that the corresponding node ran at a first time point, the first time point being earlier than the current time point;
  • the second leaf node of the degenerate basic shape connects to another degenerate basic shape storing the metadata set that the node ran at a second time point, the second time point being earlier than the first time point, and so on.
  • The management unit 92 is configured to manage the metadata according to the metadata heap.
  • Optionally, the management unit 92 may include a synchronization subunit 921, configured to, when the metadata in the metadata heap is updated, acquire the updated metadata set currently running on a node of the cluster other than the current node itself, and store the updated metadata set in the current basic shape corresponding to that node in the metadata heap.
  • The management unit 92 may also include a storage subunit 922, configured to store the degraded data corresponding to that node in the metadata heap in a degenerate basic shape, the degraded data being the data that was stored in the current basic shape before the updated metadata set, and to connect the degenerate basic shape to the second leaf node of the current basic shape.
  • The storage unit 91 is specifically configured to connect the current basic shape of a first node to the first leaf node of the current basic shape of a second node, and to connect the first leaf node of the current basic shape of the first node to the current basic shape of a third node.
  • Correspondingly, the management unit 92 may include a morphological control subunit 923, configured to: when another node, other than the current node itself, is split from the cluster, disconnect the current basic shape corresponding to that node from the first leaf node of the current basic shape of the current node, and disconnect it from the further current basic shape, corresponding to a further node, that is connected to its own first leaf node; and connect the further current basic shape to the first leaf node of the current basic shape of the current node. The subunit is also configured to, when a new node is added to the cluster, acquire the currently running metadata set of the new node from the metadata heap stored in the new node, establish a current basic shape corresponding to the new node in the metadata heap of the current node, store the acquired metadata set in that current basic shape, and connect it to a current basic shape in the metadata heap whose first leaf node is idle.
  • The management unit 92 may include a snapshot subunit 924, a selection subunit 925, and a migration subunit 926.
  • The snapshot subunit 924 is configured to store separately, for each time point, the whole of the metadata sets run by the nodes of the cluster at that time point, where the time points include the current time point, the first time point, and the second time point.
  • The selection subunit 925 is configured to run the whole of the metadata sets corresponding to a certain one of the time points, where that time point is other than the current time point and the whole of the metadata sets is stored in the degenerate basic shapes of the metadata heap.
  • The migration subunit 926 is configured to store the whole of the metadata sets corresponding to that time point in the current basic shapes of the metadata heap, and to store the whole of the metadata sets running in the current basic shapes of the metadata heap in the degenerate basic shapes of the metadata heap.
  • The management unit 92 may include a repair subunit 927, configured to: when metadata to be repaired exists in the current basic shape corresponding to the current node itself in the metadata heap, acquire the repair data corresponding to the metadata to be repaired from the degenerate basic shape corresponding to the current node in the metadata heap, or acquire it from the corresponding current basic shape in the metadata heap of another node in the cluster; and replace the metadata to be repaired in the current basic shape corresponding to the current node itself with the repair data.
  • The management unit 92 may include a storage control subunit 928, configured to store in memory the metadata sets held in the current basic shapes corresponding to the nodes of the cluster in the metadata heap, and to store the metadata sets held in the degenerate basic shapes of the metadata heap on a storage medium other than memory.
  • The management unit 92 may include a read/write control subunit 929, configured to acquire the metadata that needs to be changed from a degenerate basic shape in the metadata heap, lock and modify that metadata, and replace the metadata to be changed that is stored in the corresponding current basic shape with the modified metadata.
  • The management unit 92 is further configured to determine, according to a predetermined rule, that the current node is the master node, and to set the active/standby identifier of the current node to active.
  • The active/standby identifier may be a flag inside the storage node. Once the identifier is set to active, the node becomes the master node and can manage the metadata in the cluster. Because this node also stores the metadata heap structure, which holds the metadata sets of all the nodes in the cluster, the data migration required in the prior art is avoided.
  • Because the metadata management apparatus provided by the present invention saves the currently running metadata sets of all the nodes of the cluster in every node, no data migration between the active and standby nodes is required during an active/standby switchover, so the switchover in the cluster executes quickly.
  • FIG. 12 is a schematic structural diagram of another embodiment of a metadata management apparatus according to the present invention.
  • The metadata management apparatus may perform the metadata management method of any embodiment of the present invention.
  • The apparatus may include a memory 1201 and a processor 1202, wherein
  • the memory 1201 is configured to store the metadata in the cluster as a metadata heap, where the metadata includes the metadata set currently running on the current node itself and the metadata sets currently running on all other nodes in the cluster;
  • the metadata heap includes at least two basic shapes, each of which is a binary tree shape composed of a vertex, a first leaf node, and a second leaf node. Storing the metadata as a metadata heap includes: storing the metadata set currently running on the current node itself in the current basic shape at the top level of the metadata heap, where the first leaf node of that current basic shape connects to another current basic shape storing the metadata set currently running on another node, the first leaf node of that other current basic shape connects to a further current basic shape storing the metadata set currently running on a further node, and so on until all the nodes in the cluster are connected;
  • among the current basic shapes that store the metadata sets currently running on the nodes of the cluster, the second leaf node of each current basic shape connects to a degenerate basic shape storing the metadata set that the corresponding node ran at a first time point, the first time point being earlier than the current time point;
  • the second leaf node of the degenerate basic shape connects to another degenerate basic shape storing the metadata set that the node ran at a second time point, the second time point being earlier than the first time point, and so on;
  • the processor 1202 is configured to manage the metadata according to the metadata heap.
  • The processor 1202 may be configured to: when the metadata in the metadata heap is updated, acquire the updated metadata set currently running on one of the nodes of the cluster other than the current node itself, and store the updated metadata set in the current basic shape corresponding to that node in the metadata heap.
  • The processor 1202 is further configured to store the degraded data corresponding to that node in the metadata heap in a degenerate basic shape, the degraded data being the data that was stored in the current basic shape before the updated metadata set, and to connect the degenerate basic shape to the second leaf node of the current basic shape.
  • The processor 1202 is further configured to: when another node, other than the current node itself, is split from the cluster, disconnect the current basic shape corresponding to that node from the first leaf node of the current basic shape of the current node, and disconnect it from the further current basic shape, corresponding to a further node, that is connected to its own first leaf node; and connect the further current basic shape to the first leaf node of the current basic shape of the current node. The processor 1202 is also configured to, when a new node is added to the cluster, acquire the currently running metadata set of the new node from the metadata heap stored in the new node, establish a current basic shape corresponding to the new node in the metadata heap of the current node, store the currently running metadata set in that current basic shape, and connect the current basic shape of the new node to a current basic shape in the metadata heap whose first leaf node is idle.
  • The processor 1202 is further configured to store separately the metadata sets run at each time point, each including the metadata set of every node in the cluster, where the time points include the current time point, the first time point, and the second time point; and to run the metadata sets corresponding to a certain one of the time points.
  • The processor 1202 is further configured to: when metadata to be repaired exists in the current basic shape corresponding to the current node itself in the metadata heap, acquire the repair data corresponding to the metadata to be repaired from the degenerate basic shape corresponding to the current node in the metadata heap, or acquire it from the corresponding current basic shape in the metadata heap of another node in the cluster; and replace the metadata to be repaired in the current basic shape corresponding to the current node itself with the repair data.

Abstract

The present invention provides a metadata management method and device. The method comprises: storing in a current basic shape on a top layer a currently running metadata set of a node, a first leaf node of the current basic shape being used to be connected to another current basic shape storing a currently running metadata set of another node, a second leaf node of the current basic shape being used to be connected to a degradation basic shape of a metadata set, running at a first time point, of a node corresponding to the current basic shape, and the first time point being prior to the current time; and managing the metadata according to a metadata heap. The present invention implements quick execution of active/standby switch in a cluster.

Description

Metadata management method and device

Technical field

The present invention relates to storage technologies, and in particular to a metadata management method and apparatus.

Background

Metadata is the information with which a device or system describes the resources it manages, and it plays an important role in the device's self-maintenance and self-management. Functional operations performed while the device runs, such as I/O access and resource allocation, all require metadata to be added, deleted or modified. In a cluster system with a distributed structure, metadata is usually scattered across the nodes of the cluster. A cluster typically includes a primary node and standby nodes: the primary node maintains a complete metadata set containing the metadata of all nodes in the cluster, whereas each standby node maintains only the subset of metadata that it may itself use (a subset relative to the full set maintained by the primary node), namely the metadata needed to keep that node's services running normally.
A cluster may need to perform an active/standby switchover: for some reason a standby node becomes the new primary node and the original primary node becomes a standby node. Data migration then occurs, because the full metadata set maintained on the original primary node has to be moved to the new primary node; for example, all of the metadata can be copied to the new node from a disk that holds a mirror of the original primary node's metadata set. The more nodes the cluster maintains, the larger the full metadata set and the longer the copy takes. However, the time allowed for an active/standby switchover in a cluster is strictly limited, and exceeding it leads to serious consequences such as blocked cluster I/O services.

Summary of the invention
The present invention provides a metadata management method and apparatus for quickly switching between primary and standby nodes in a distributed cluster.
A first aspect of the present invention provides a metadata management method applied to a cluster that contains multiple nodes, the method including:
storing the metadata in the cluster as a metadata heap, where the metadata includes the metadata set currently run by the current node itself and the metadata sets currently run by all other nodes in the cluster, the metadata heap includes at least two basic shapes, and each basic shape is a binary tree consisting of a vertex, a first leaf node and a second leaf node;
where storing the metadata in the cluster as a metadata heap specifically includes:
storing the metadata set currently run by the current node itself in a current basic shape located at the top level of the metadata heap, where the first leaf node of the current basic shape is used to connect another current basic shape that stores the metadata set currently run by another node, the first leaf node of the other current basic shape is used to connect a further current basic shape that stores the metadata set currently run by a further node, and so on until all nodes in the cluster are connected;
where the second leaf node of the current basic shape is used to connect a degenerate basic shape of the node corresponding to the current basic shape, the degenerate basic shape holding the metadata set that the node ran at a first time point, the first time point being earlier than the current time point; the second leaf node of the degenerate basic shape is used to connect another degenerate basic shape of the same node, the other degenerate basic shape holding the metadata set that the node ran at a second time point, the second time point being earlier than the first time point, and so on; and
managing the metadata according to the metadata heap.
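The structure described in the first aspect can be pictured with a small sketch. The following Python is only an illustration under stated assumptions: the class name `BasicShape`, the field names `first_leaf` and `second_leaf`, the helper functions, and the choice to chain the other nodes in ascending node-ID order are all hypothetical and are not specified by the source.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class BasicShape:
    """One basic shape: a vertex plus two leaf links (names are illustrative)."""
    node_id: int
    metadata: Dict[str, str]                    # vertex: the stored metadata set
    first_leaf: Optional["BasicShape"] = None   # next node's current basic shape
    second_leaf: Optional["BasicShape"] = None  # same node's older (degenerate) shape


def build_heap(own_id: int, current_sets: Dict[int, Dict[str, str]]) -> BasicShape:
    """Chain one current basic shape per node, with the local node at the top.

    Ordering the remaining nodes by ID is an assumption made for this sketch.
    """
    order = [own_id] + sorted(n for n in current_sets if n != own_id)
    shapes = [BasicShape(n, dict(current_sets[n])) for n in order]
    for upper, lower in zip(shapes, shapes[1:]):
        upper.first_leaf = lower
    return shapes[0]


def node_chain(top: BasicShape) -> List[int]:
    """Follow first-leaf links to list every node represented in the heap."""
    ids = []
    shape: Optional[BasicShape] = top
    while shape is not None:
        ids.append(shape.node_id)
        shape = shape.first_leaf
    return ids


# Heap as it might be held inside node 2 of a three-node cluster.
heap = build_heap(2, {1: {"lun0": "online"}, 2: {"lun1": "online"}, 3: {"lun2": "online"}})
print(node_chain(heap))  # [2, 1, 3]
```

Each node would hold a heap of this kind with its own current basic shape at the top, which is why every node ends up with the currently running metadata sets of the whole cluster.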
In a first possible implementation of the first aspect, when metadata in the metadata heap is updated, managing the metadata according to the metadata heap includes: obtaining the updated metadata set currently run by one of the nodes in the cluster other than the current node; and storing the updated metadata set in the current basic shape that corresponds to that node in the metadata heap.
With reference to the first possible implementation of the first aspect, in a second possible implementation, after the updated metadata set is stored in the current basic shape that corresponds to that node in the metadata heap, the method further includes: storing the degraded data corresponding to that node in a degenerate basic shape, where the degraded data is the data that was stored in the current basic shape before the updated metadata set; and connecting the degenerate basic shape to the second leaf node of the current basic shape.
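The update-then-demote step in the first and second implementations can be sketched as follows. This is a minimal illustration, not the patented implementation: `BasicShape`, `apply_update` and `history` are hypothetical names, and hanging the previously stored degenerate shapes beneath the newly created one is an assumption that follows the chaining rule of the first aspect.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class BasicShape:
    node_id: int
    metadata: Dict[str, str]
    first_leaf: Optional["BasicShape"] = None   # next node's current basic shape
    second_leaf: Optional["BasicShape"] = None  # this node's degenerate basic shape


def apply_update(current: BasicShape, updated: Dict[str, str]) -> None:
    """Store the updated set in the current shape and demote the old set."""
    # The data that was current becomes degraded data in a new degenerate shape;
    # any older degenerate shapes stay linked below it (assumption).
    demoted = BasicShape(current.node_id, current.metadata,
                         second_leaf=current.second_leaf)
    current.metadata = dict(updated)
    current.second_leaf = demoted


def history(current: BasicShape) -> List[Dict[str, str]]:
    """Walk second-leaf links from the newest metadata set to the oldest."""
    sets = []
    shape: Optional[BasicShape] = current
    while shape is not None:
        sets.append(shape.metadata)
        shape = shape.second_leaf
    return sets


shape = BasicShape(2, {"lun1": "v1"})
apply_update(shape, {"lun1": "v2"})
apply_update(shape, {"lun1": "v3"})
print([m["lun1"] for m in history(shape)])  # ['v3', 'v2', 'v1']
```

In practice the number of degenerate levels kept would be bounded, since each level costs storage; this sketch keeps all of them for simplicity.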
In a third possible implementation of the first aspect, when the other node, that is, a node other than the current node, is split from the cluster, managing the metadata according to the metadata heap includes: disconnecting the current basic shape corresponding to the other node from the first leaf node of the current node's current basic shape, and disconnecting the further current basic shape, which corresponds to the further node, from the first leaf node of the other current basic shape; and connecting the further current basic shape to the first leaf node of the current node's current basic shape.
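Removing a node amounts to unlinking one element from the first-leaf chain. The sketch below is illustrative only; `BasicShape` and `split_node` are hypothetical names, and searching the whole chain for the leaving node (rather than only the shape directly below the top) is a generalization assumed here.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class BasicShape:
    node_id: int
    metadata: Dict[str, str]
    first_leaf: Optional["BasicShape"] = None   # next node's current basic shape
    second_leaf: Optional["BasicShape"] = None  # this node's degenerate basic shape


def split_node(top: BasicShape, leaving_id: int) -> Optional[BasicShape]:
    """Unlink the current basic shape of `leaving_id` and splice in its successor."""
    prev = top
    while prev.first_leaf is not None and prev.first_leaf.node_id != leaving_id:
        prev = prev.first_leaf
    leaving = prev.first_leaf
    if leaving is None:
        return None                       # node is not linked below the top shape
    prev.first_leaf = leaving.first_leaf  # reconnect the further current basic shape
    leaving.first_leaf = None             # the leaving shape keeps no link to the heap
    return leaving


# Heap inside node 1 of a three-node cluster: 1 -> 2 -> 3.
n3 = BasicShape(3, {"lun2": "online"})
n2 = BasicShape(2, {"lun1": "online"}, first_leaf=n3)
n1 = BasicShape(1, {"lun0": "online"}, first_leaf=n2)

removed = split_node(n1, 2)
print(n1.first_leaf.node_id)  # 3
```

The leaving node's degenerate shapes stay attached to its own current basic shape, so they leave the heap together with it.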
In a fourth possible implementation of the first aspect, when a new node is added to the cluster, managing the metadata according to the metadata heap includes: obtaining the new node's currently running metadata set from the metadata heap stored in the new node that is about to join the cluster; creating, in the current node's metadata heap, a current basic shape corresponding to the new node and storing the new node's currently running metadata set in it; and connecting the new node's current basic shape to a current basic shape in the metadata heap whose first leaf node is free.
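Adding a node can be sketched as appending to the first-leaf chain. The code below is a hedged illustration: `BasicShape` and `join_node` are hypothetical names, and the new node's metadata set is passed in directly rather than fetched over a network, which the source does not describe.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class BasicShape:
    node_id: int
    metadata: Dict[str, str]
    first_leaf: Optional["BasicShape"] = None   # next node's current basic shape
    second_leaf: Optional["BasicShape"] = None  # this node's degenerate basic shape


def join_node(top: BasicShape, new_id: int, new_metadata: Dict[str, str]) -> BasicShape:
    """Create a current basic shape for the new node and attach it to the heap."""
    # Find the current basic shape whose first leaf node is still free.
    tail = top
    while tail.first_leaf is not None:
        tail = tail.first_leaf
    new_shape = BasicShape(new_id, dict(new_metadata))
    tail.first_leaf = new_shape
    return new_shape


# Heap inside node 1 of a two-node cluster: 1 -> 2.
n2 = BasicShape(2, {"lun1": "online"})
n1 = BasicShape(1, {"lun0": "online"}, first_leaf=n2)

added = join_node(n1, 3, {"lun2": "online"})
print(n2.first_leaf.node_id)  # 3
```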
In a fifth possible implementation of the first aspect, managing the metadata according to the metadata heap includes: separately storing the metadata sets run at each time point, where the metadata sets include the metadata set of each node in the cluster and the time points include the current time point, the first time point and the second time point, so that the metadata set corresponding to any one of those time points can be run.
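One way to read the per-time-point storage is as a snapshot lookup across the heap: level 0 is the current time point, level 1 the first time point, and so on. The following sketch makes that reading concrete; `BasicShape`, `snapshot` and the convention of returning `None` for a node with no data at the requested level are assumptions, not details from the source.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class BasicShape:
    node_id: int
    metadata: Dict[str, str]
    first_leaf: Optional["BasicShape"] = None   # next node's current basic shape
    second_leaf: Optional["BasicShape"] = None  # this node's degenerate basic shape


def snapshot(top: BasicShape, level: int) -> Dict[int, Optional[Dict[str, str]]]:
    """Collect each node's metadata set at the given level (0 = current)."""
    result: Dict[int, Optional[Dict[str, str]]] = {}
    shape: Optional[BasicShape] = top
    while shape is not None:
        version: Optional[BasicShape] = shape
        for _ in range(level):
            if version is None:
                break
            version = version.second_leaf   # step back one time point
        result[shape.node_id] = version.metadata if version is not None else None
        shape = shape.first_leaf            # move on to the next node
    return result


# Node 2 has one degenerate level; node 1 has none.
n2 = BasicShape(2, {"lun1": "v2"}, second_leaf=BasicShape(2, {"lun1": "v1"}))
n1 = BasicShape(1, {"lun0": "v5"}, first_leaf=n2)

print(snapshot(n1, 0))  # {1: {'lun0': 'v5'}, 2: {'lun1': 'v2'}}
print(snapshot(n1, 1))  # {1: None, 2: {'lun1': 'v1'}}
```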
In a sixth possible implementation of the first aspect, managing the metadata according to the metadata heap includes: when metadata to be repaired exists in the current basic shape corresponding to the current node in the metadata heap, obtaining repair data corresponding to the metadata to be repaired from the degenerate basic shape corresponding to the current node in the metadata heap, or from the current basic shape corresponding to the current node in the metadata heap of another node in the cluster; and replacing the metadata to be repaired in the current basic shape corresponding to the current node with the obtained repair data.
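The two repair sources can be illustrated with a short sketch. Everything here is hypothetical: the names `BasicShape` and `repair`, the per-key granularity of repair, and the order of preference (local degenerate shape first, then a peer's copy) are assumptions, since the source only states that either source may be used.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class BasicShape:
    node_id: int
    metadata: Dict[str, str]
    first_leaf: Optional["BasicShape"] = None   # next node's current basic shape
    second_leaf: Optional["BasicShape"] = None  # this node's degenerate basic shape


def repair(own: BasicShape, bad_keys: List[str],
           peer_copy: Optional[BasicShape] = None) -> List[str]:
    """Replace damaged entries in `own`; return the keys that could not be repaired."""
    local = own.second_leaf.metadata if own.second_leaf is not None else {}
    remote = peer_copy.metadata if peer_copy is not None else {}
    unrepaired = []
    for key in bad_keys:
        if key in local:            # repair data from the node's own degenerate shape
            own.metadata[key] = local[key]
        elif key in remote:         # repair data from a peer's copy of this node's shape
            own.metadata[key] = remote[key]
        else:
            unrepaired.append(key)
    return unrepaired


own = BasicShape(1, {"lun0": "??", "lun9": "??"},
                 second_leaf=BasicShape(1, {"lun0": "online"}))
peer_copy = BasicShape(1, {"lun0": "online", "lun9": "offline"})

left = repair(own, ["lun0", "lun9"], peer_copy)
print(own.metadata, left)  # {'lun0': 'online', 'lun9': 'offline'} []
```

A degenerate shape holds older data, so a real system might prefer the peer's copy when freshness matters; the order used here simply follows the order in which the two sources are listed.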
A second aspect of the present invention provides a metadata management apparatus, including:
a storage unit, configured to store the metadata in a cluster as a metadata heap, where the metadata includes the metadata set currently run by the current node itself and the metadata sets currently run by all other nodes in the cluster, the metadata heap includes at least two basic shapes, and each basic shape has a binary tree shape consisting of a vertex, a first leaf node and a second leaf node; where storing the metadata as a metadata heap includes:
storing the metadata set currently run by the current node itself in a current basic shape located at the top level of the metadata heap, where the first leaf node of the current basic shape is used to connect another current basic shape that stores the metadata set currently run by another node, the first leaf node of the other current basic shape is used to connect a further current basic shape that stores the metadata set currently run by a further node, and so on until all nodes in the cluster are connected;
where, among the current basic shapes that store the metadata sets currently run by the nodes in the cluster, the second leaf node of each current basic shape is used to connect a degenerate basic shape holding the metadata set that the corresponding node ran at a first time point, the first time point being earlier than the current time point; the second leaf node of the degenerate basic shape is used to connect another degenerate basic shape holding the metadata set that the node ran at a second time point, the second time point being earlier than the first time point, and so on; and
a management unit, configured to manage the metadata according to the metadata heap.
In a first possible implementation of the second aspect, the management unit includes a synchronization subunit, configured to: when metadata in the metadata heap is updated, obtain the updated metadata set currently run by one of the nodes in the cluster other than the current node; and store the updated metadata set in the current basic shape that corresponds to that node in the metadata heap.
With reference to the first possible implementation of the second aspect, in a second possible implementation, the management unit includes a storage subunit, configured to: store the degraded data corresponding to that node in a degenerate basic shape, where the degraded data is the data that was stored in the current basic shape before the updated metadata set; and connect the degenerate basic shape to the second leaf node of the current basic shape.
In a third possible implementation of the second aspect, the management unit includes a shape control subunit, configured to: when the other node, that is, a node other than the current node, is split from the cluster, disconnect the current basic shape corresponding to the other node from the first leaf node of the current node's current basic shape, and disconnect the further current basic shape, which corresponds to the further node, from the first leaf node of the other current basic shape; and connect the further current basic shape to the first leaf node of the current node's current basic shape. The shape control subunit is further configured to: when a new node is added to the cluster, obtain the new node's currently running metadata set from the metadata heap stored in the new node that is about to join the cluster; create, in the current node's metadata heap, a current basic shape corresponding to the new node and store the new node's currently running metadata set in it; and connect the new node's current basic shape to a current basic shape in the metadata heap whose first leaf node is free.
In a fourth possible implementation of the second aspect, the management unit includes a snapshot subunit, configured to separately store the metadata sets run at each time point, where the metadata sets include the metadata set of each node in the cluster and the time points include the current time point, the first time point and the second time point, so that the metadata set corresponding to any one of those time points can be run.
In a fifth possible implementation of the second aspect, the management unit includes a repair subunit, configured to: when metadata to be repaired exists in the current basic shape corresponding to the current node in the metadata heap, obtain repair data corresponding to the metadata to be repaired from the degenerate basic shape corresponding to the current node in the metadata heap, or from the current basic shape corresponding to the current node in the metadata heap of another node in the cluster; and replace the metadata to be repaired in the current basic shape corresponding to the current node with the obtained repair data.
In a sixth possible implementation of the second aspect, when the primary node in the cluster is powered off, the management unit is further configured to: determine, according to a predetermined rule, that the current node is the primary node; and change the active/standby identifier of the current node to active.
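Because every node already holds the full metadata heap, the switchover in the sixth implementation reduces to changing a role flag. The sketch below illustrates that idea only; the `Node` class, the `role` strings, and the rule "the powered-on node with the lowest ID becomes primary" are hypothetical stand-ins for the unspecified "predetermined rule".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    node_id: int
    role: str = "standby"      # hypothetical active/standby identifier
    powered_on: bool = True


def fail_over(nodes: List[Node]) -> Optional[Node]:
    """Pick a new primary after a power-off; no metadata is copied between nodes."""
    for node in nodes:
        if not node.powered_on:
            node.role = "standby"          # a powered-off node can no longer be active
    survivors = [n for n in nodes if n.powered_on]
    for node in survivors:
        if node.role == "active":
            return node                    # the primary is still running
    if not survivors:
        return None
    # Assumed predetermined rule: the lowest node ID wins.
    new_primary = min(survivors, key=lambda n: n.node_id)
    new_primary.role = "active"            # the only state change a switchover needs
    return new_primary


nodes = [Node(1, role="active"), Node(2), Node(3)]
nodes[0].powered_on = False                # the primary node is powered off

new_primary = fail_over(nodes)
print(new_primary.node_id, [n.role for n in nodes])  # 2 ['standby', 'active', 'standby']
```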
A third aspect of the present invention provides a metadata management apparatus, including a memory and a processor;
the memory is configured to store the metadata in a cluster as a metadata heap, where the metadata includes the metadata set currently run by the current node itself and the metadata sets currently run by all other nodes in the cluster, the metadata heap includes at least two basic shapes, and each basic shape is a binary tree consisting of a vertex, a first leaf node and a second leaf node;
where storing the metadata in the cluster as a metadata heap specifically includes:
storing the metadata set currently run by the current node itself in a current basic shape located at the top level of the metadata heap, where the first leaf node of the current basic shape is used to connect another current basic shape that stores the metadata set currently run by another node, the first leaf node of the other current basic shape is used to connect a further current basic shape that stores the metadata set currently run by a further node, and so on until all nodes in the cluster are connected;
where the second leaf node of the current basic shape is used to connect a degenerate basic shape of the node corresponding to the current basic shape, the degenerate basic shape holding the metadata set that the node ran at a first time point, the first time point being earlier than the current time point; the second leaf node of the degenerate basic shape is used to connect another degenerate basic shape of the same node, the other degenerate basic shape holding the metadata set that the node ran at a second time point, the second time point being earlier than the first time point, and so on; and
the processor is configured to manage the metadata according to the metadata heap.
With the metadata management method and apparatus provided by the present invention, every node holds a metadata heap, and every metadata heap stores the metadata set currently run by the node itself together with the metadata sets currently run by all other nodes in the cluster. Because all nodes in the cluster share the same architecture, any of them can act as the primary node. An active/standby switchover therefore only requires changing the node's internal active/standby identifier to active; no data has to be migrated between the primary and standby nodes. This avoids the large-scale data migration that the prior art requires during a switchover and allows the switchover in the cluster to be performed quickly.

Brief description of the drawings

To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of how the metadata heap inside a node is composed in an embodiment of the metadata management method of the present invention;
FIG. 2 is a schematic diagram of the metadata heap inside a node in an embodiment of the metadata management method of the present invention;
FIG. 3 is a schematic flowchart of an embodiment of the metadata management method of the present invention;
FIG. 4 is a schematic diagram of the principle of another embodiment of the metadata management method of the present invention;
FIG. 5 is a schematic diagram of the principle of still another embodiment of the metadata management method of the present invention;
FIG. 6 is a schematic diagram of the principle of yet another embodiment of the metadata management method of the present invention;
FIG. 7 is a schematic diagram of the principle of yet another embodiment of the metadata management method of the present invention;
FIG. 8 is a first schematic diagram of the principle of yet another embodiment of the metadata management method of the present invention;
FIG. 9 is a second schematic diagram of the principle of yet another embodiment of the metadata management method of the present invention;
FIG. 10 is a schematic diagram of the principle of yet another embodiment of the metadata management method of the present invention;
FIG. 11 is a schematic structural diagram of an embodiment of the metadata management apparatus of the present invention;
FIG. 12 is a schematic structural diagram of another embodiment of the metadata management apparatus of the present invention.

Detailed description

The embodiments of the present invention manage metadata on the basis of fractal theory. A fractal is commonly understood as "a rough or fragmented geometric shape that can be split into parts, each of which is (at least approximately) a reduced-size copy of the whole", a property called self-similarity. In the mathematical sense, a fractal is based on an equation that is iterated repeatedly, that is, a recursion-based feedback system. There are several types of fractal, defined according to whether they exhibit exact self-similarity, quasi-self-similarity or statistical self-similarity. Fractals generally have the following characteristics: they have fine structure at arbitrarily small scales; they are too irregular, both globally and locally, to be described in the language of traditional Euclidean geometry; and they have a self-similar form (at least approximately or statistically).
The following describes how the embodiments of the present invention apply fractal theory to the organization and management of metadata. This organization and management is applicable to any cluster that includes multiple nodes, for example: metadata management of a computer host cluster, cluster cache metadata management of clustered devices, metadata management of a distributed file system, and any other scenario in which cluster data must be managed in a discrete manner. The following embodiments use a distributed cluster system as an example to describe the method. A distributed cluster system includes multiple nodes; in the embodiments, every node in the cluster stores the metadata sets of all nodes in the whole cluster, or at least the currently running metadata sets of all nodes. Within each node the metadata is stored as a metadata heap, whose composition is based on fractal theory.
Several concepts used in the embodiments of the present invention are first explained briefly:
Metadata heap: each node stores several kinds of metadata, for example the node's own metadata and the metadata of other nodes. These metadata are stored in association with one another, and the whole formed by all of them is called the metadata heap.
Basic shape: the unit in which metadata is stored. Each basic shape stores one type of metadata; for example, one basic shape stores the node's own metadata and another basic shape stores the metadata of another node.
Basic shapes include current basic shapes and degenerate basic shapes. A current basic shape stores the metadata that a node is currently running; a degenerate basic shape stores the metadata that the node ran at the previous time point, before the current time point.
The specific meaning of these concepts and the relationships between them are described in detail in the following embodiments.
FIG. 1 is a schematic diagram of how the metadata heap inside a node is composed in an embodiment of the metadata management method of the present invention. Taking a cluster that includes two nodes, node 1 and node 2, as an example, FIG. 1 shows the composition of the metadata heap inside node 1 of that cluster. The metadata heap includes at least two basic shapes, for example basic shape 11, which stores the metadata set currently run by node 1, and another basic shape 12, which stores the metadata set currently run by node 2. Because basic shapes 11 and 12 store the metadata sets that the nodes are currently running, they can be called "current basic shapes"; wherever "current basic shape" appears in the later embodiments, it likewise means a basic shape that stores the metadata set currently run by a node, where the node is any node in the cluster. Every node in the cluster contains at least the current basic shapes that store the currently running metadata sets of all nodes in the cluster. For example, in a five-node cluster every node contains at least five current basic shapes, one for the currently running metadata set of each node.
As shown in FIG. 1, each basic shape includes a vertex a, a first leaf node b and a second leaf node c. The basic shape resembles a triangle and can also be described as a binary tree shape. Basic shape 12, for example, has the same vertex a, first leaf node b and second leaf node c, and it is connected to first leaf node b of basic shape 11. Similarly, if the cluster also included a node 3, the current basic shape corresponding to node 3 would be connected to first leaf node b of basic shape 12, and so on until all nodes in the cluster have been traversed.
In fact, basic shapes are introduced into the metadata heap to make the connections between the stored metadata sets easier to explain. In basic shape 11, for example, vertex a can be understood as representing the metadata set currently run by node 1 (this embodiment places no restriction on the data structure inside that set); that is, the metadata set stored by each basic shape is represented by its vertex a. The first leaf node b and second leaf node c of a basic shape can be understood as "connection interfaces" used to connect another metadata set, indicating that the two metadata sets are related. As shown in FIG. 1, for example, first leaf node b of basic shape 11 connects the metadata set of node 2; in other words, first leaf node b is the connection interface that links together the metadata sets currently run by the different nodes.
通过在集群每个节点内都存储整个集群各节点当前运行的元数据集合, 使得即使发生主备切换, 由于各节点内存储的元数据是相同的, 所以也不需 要在原主节点和新主节点之间进行数据迁移, 从而可以快速实现主备切换。  By storing the metadata set currently running by each node of the entire cluster in each node of the cluster, even if the active/standby switchover occurs, since the metadata stored in each node is the same, the original primary node and the new primary node are not needed. Data migration between them enables fast active/standby switchover.
可选的, 每个基础形中的第二叶子节点 c是用于连接各节点的退化数据 的,该退化数据指的是在当前时间点之前的时间点该节点运行的元数据集合; 存储该退化数据的基础形可以称为 "退化基础形" , 本发明后续的实施例中 提到的 "退化基础形" 也均是指的用于存储节点之前时间点运行的元数据集 合的基础形。 如图 1所示, 基础形 11 (也可以称为当前基础形 11 )的第二叶 子节点 c连接节点 1对应的退化基础形 13 ,该退化基础形 13存储的例如是节 点 1在第一时间点运行的元数据集合, 该第一时间点早于当前时间点; 退化 基础形 13的第二叶子节点 c连接存储节点 1在第二时间点运行的元数据集合 的另一退化基础形 15 , 该第二时间点早于第一时间点; 基础形 12的第二叶 子节点 c连接节点 2对应的退化基础形 14。 由上可知, 基础形中的第二叶子 节点 c是针对单个的节点, 将该节点各时间点运行的元数据集合联系起来的 连接接口, 该各时间点包括当前时间点、 以及当前时间点之前的时间点。 Optionally, the second leaf node c in each basic shape is used to connect the degraded data of each node, where the degraded data refers to a metadata set run by the node at a time point before the current time point; The basic shape of the degraded data may be referred to as a "degenerate basic shape", in a subsequent embodiment of the present invention The "degenerate basic shape" mentioned also refers to the basic form of the metadata set used to store the point in time before the node is stored. As shown in FIG. 1, the second leaf node c of the base shape 11 (which may also be referred to as the current base shape 11) is connected to the degenerate base shape 13 corresponding to the node 1, and the degenerate base shape 13 stores, for example, the node 1 at the first time. a set of metadata running, the first time point is earlier than the current time point; the second leaf node c of the degraded basic shape 13 is connected to another degenerate basic shape 15 of the metadata set of the storage node 1 running at the second time point, The second time point is earlier than the first time point; the second leaf node c of the base 12 is connected to the degenerate base shape 14 corresponding to the node 2. As can be seen from the above, the second leaf node c in the basic shape is a connection interface that links the metadata set running at each time point of the node for a single node, and the time points include the current time point and before the current time point. Time point.
It should be noted that which degraded data is saved is decided according to actual needs. For example, suppose the metadata currently running on node 1 was modified at time point t1 and again at time points t2 and t3, where t3 is later than t2 and t2 is later than t1. Depending on actual needs, both the data as modified at t1 and the data as modified at t2 may be saved, each stored in its own degraded basic shape and connected according to the connection rule of FIG. 1; alternatively, only the data as modified at t1 may be saved. As shown in FIG. 1, A represents the metadata sets of node 1, including the metadata set node 1 is currently running and the metadata sets of node 1 at two time points before the current time point. B represents the metadata sets of node 2, including the metadata set node 2 is currently running and the metadata set of node 2 at one earlier time point. A and B are connected through the current basic shapes corresponding to node 1 and node 2: vertex a of current basic shape 12 of node 2 (which represents the metadata set node 2 is currently running) is connected to the first leaf node b (the connection interface) of current basic shape 11 of node 1. The other nodes in the cluster organize their metadata in the same way.
In addition, in FIG. 1 the current basic shapes are connected at the first leaf node on the base of the basic shape; optionally, they may instead be connected at the second leaf node on the base. Degraded data is also a complete metadata set internal to a node; "degraded" only means that the data is not currently running but was running at a time point before the current one. For example, after the currently running metadata is changed, the data as it was before the change can be stored in a degraded basic shape. FIG. 1 takes node 1 with two levels of degradation (that is, two degraded basic shapes) as an example; in practice, more levels of degraded basic shapes can be configured as required. The more levels there are, the more time points are covered, but the more storage space is occupied.
As shown in FIG. 1, in this metadata heap node 2 may be called a "logical adjacent node" of node 1. Two nodes that are connected to each other in the metadata heap are logical adjacent nodes of each other: in FIG. 1, node 1 is a logical adjacent node of node 2, and node 2 is a logical adjacent node of node 1. If the cluster also includes a node 3 whose current basic shape is connected at the first leaf node of current basic shape 12 of node 2, then node 3 is a logical adjacent node of node 2. That is, "logical adjacent node" is a definition used to express the connection relationship between nodes in the metadata heap, and it is independent of the actual physical connections between the nodes.
As can also be seen from FIG. 1, the basic shapes in the metadata heap include current basic shapes and degraded basic shapes, and every basic shape includes a vertex, a first leaf node, and a second leaf node. However, these three nodes are connected differently in a current basic shape and in a degraded basic shape. For example, vertex a of current basic shape 11 represents the metadata set that node 1 is currently running; the first leaf node b of current basic shape 11 connects to current basic shape 12 of the logical adjacent node of node 1 (node 2); and the second leaf node c of current basic shape 11 connects to degraded basic shape 13, which corresponds to node 1's own metadata set at the previous time point. In degraded basic shape 13, vertex a represents the metadata set node 1 ran at that previous time point, and the second leaf node c connects to the metadata set node 1 ran at a still earlier time point, but the first leaf node b of degraded basic shape 13 has no data attached. These characteristics are also the composition rules of the metadata heap inside each node. In addition, inside each node the node's own current basic shape is the top layer of the metadata heap. Degraded basic shapes do not necessarily exist in the metadata heap; for example, a newly established cluster, or a cluster whose metadata has never been modified, may have none.
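The composition rules above can be illustrated with a minimal Python sketch. It is not taken from the specification: the class name `BasicShape`, the fields `leaf_b`, `leaf_c`, and `degraded`, the function `follows_rules`, and the metadata values are all assumptions chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class BasicShape:
    """One basic shape: vertex a plus the two leaf-node connection points."""
    node_id: int                           # cluster node the shape belongs to
    metadata: dict                         # vertex a: one metadata set
    degraded: bool = False                 # True for a degraded basic shape
    leaf_b: Optional[BasicShape] = None    # first leaf: adjacent node's current shape
    leaf_c: Optional[BasicShape] = None    # second leaf: same node, earlier time point


def follows_rules(top: BasicShape) -> bool:
    """Check one metadata heap against the composition rules described above."""
    current = top
    while current is not None:
        if current.degraded:
            return False                   # the leaf-b chain holds only current shapes
        older = current.leaf_c
        while older is not None:
            # A degraded shape belongs to the same node and leaves leaf b empty.
            if not older.degraded or older.leaf_b is not None:
                return False
            if older.node_id != current.node_id:
                return False
            older = older.leaf_c
        current = current.leaf_b
    return True


# Layout of FIG. 1; the metadata values are invented placeholders.
shape15 = BasicShape(1, {"at": "second time point"}, degraded=True)
shape13 = BasicShape(1, {"at": "first time point"}, degraded=True, leaf_c=shape15)
shape14 = BasicShape(2, {"at": "first time point"}, degraded=True)
shape12 = BasicShape(2, {"at": "current"}, leaf_c=shape14)
shape11 = BasicShape(1, {"at": "current"}, leaf_b=shape12, leaf_c=shape13)

print(follows_rules(shape11))
```

In this sketch, attaching anything to the first leaf of a degraded shape makes the check fail, which mirrors the rule that the first leaf node b of a degraded basic shape carries no data.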
FIG. 2 is a schematic diagram of the intra-node metadata heap in an embodiment of the metadata management method of the present invention. FIG. 2 takes a cluster of three nodes (node 1, node 2, and node 3) as an example and shows the metadata heap inside node 1. It should be noted that FIG. 2 shows only two levels of degraded data for node 1, namely 1' and 1'', where 1' is the metadata set node 1 ran at a first time point earlier than the current time point, and 1'' is the metadata set node 1 ran at a second time point earlier than the first time point. FIG. 2 shows one level of degraded data for node 2, namely 2', and also shows the metadata set node 3 is currently running, but no degraded data for node 3. As stated above, FIG. 2 is only an example, and the number of degradation levels of each node is not limited; for example, node 2 may also have second-level degraded data 2'', and node 3 may also have degraded data.
Comparing FIG. 2 with FIG. 1: in essence, if the current basic shape of a node 3 were connected at the first leaf node b of current basic shape 12 in FIG. 1, FIG. 2 and FIG. 1 would be the same. FIG. 2 merely combines the separate basic shapes of FIG. 1 for a more compact representation. For example, degraded basic shape 13 in FIG. 1 (the basic shape storing the data 1') has nothing connected to its first leaf node b in FIG. 1, whereas in FIG. 2 the first leaf node b of degraded basic shape 13 is drawn overlapping the second leaf node c of current basic shape 12 of node 2. As stated, this is done only to make the representation of the metadata heap more compact. Similarly, the first leaf node b of degraded basic shape 15 in FIG. 1 is drawn overlapping the second leaf node c of degraded basic shape 14 of node 2. The other overlaps follow the same principle and are not described again. The subsequent embodiments are also described using the metadata heap shape shown in FIG. 2.
As shown in FIG. 2, at the top layer of the metadata heap is current basic shape 11, which stores the metadata set node 1 is currently running. The rightmost column starting from the top layer (denoted C) holds the metadata of all nodes in the current cluster (node 1, node 2, and node 3), that is, the metadata all nodes are currently running. The column denoted D in FIG. 2 contains the degraded basic shape storing degraded data 1' of node 1 and the degraded basic shape storing degraded data 2' of node 2; this is the first level of degraded data. Degraded basic shape 15 in FIG. 2, which stores degraded data 1'' of node 1, is the second level of degraded data. The metadata heaps described in the subsequent embodiments have similar properties.
The above describes in detail how the metadata heap inside a node is composed and how metadata is organized through the metadata heap. On this basis, the following embodiments describe how metadata is managed according to the metadata heap, including, for example, how metadata stored in the form of the metadata heap is kept consistent across the nodes of the cluster, how a cluster is split and combined, and how the cluster achieves metadata redundancy and fault tolerance.
Embodiment 1
FIG. 3 is a schematic flowchart of an embodiment of the metadata management method of the present invention. The method may be executed by any node in the cluster. The method is described only briefly in this embodiment; for the underlying principle, refer to the composition of the intra-node metadata heap described above. As shown in FIG. 3, the method may include:
301. Store metadata as a metadata heap, where the metadata includes the metadata set currently running on the node itself and the metadata sets currently running on all other nodes in the cluster.
The metadata heap includes at least two basic shapes, and each basic shape is a binary-tree shape formed by a vertex, a first leaf node, and a second leaf node. Storing the metadata as a metadata heap includes:
storing the metadata set currently running on the node itself in the current basic shape located at the top layer of the metadata heap, where the first leaf node of that current basic shape is used to connect another current basic shape storing the metadata set currently running on another node, the first leaf node of that other current basic shape is used to connect a further current basic shape storing the metadata set currently running on a further node, and so on until all nodes in the cluster have been traversed;
among the current basic shapes that store the metadata sets currently running on the nodes of the cluster, the second leaf node of each current basic shape is used to connect a degraded basic shape storing the metadata set that the corresponding node ran at a first time point, the first time point being earlier than the current time point; the second leaf node of that degraded basic shape is used to connect another degraded basic shape storing the metadata set the node ran at a second time point, the second time point being earlier than the first time point, and so on.
302. Manage the metadata according to the metadata heap.
Managing the metadata includes, for example, metadata management when the nodes of the cluster synchronize for consistency, metadata management when the cluster is split or combined, and redundancy and fault-tolerance management of cluster metadata.
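Step 301 can be sketched as follows, purely for illustration. The names `BasicShape`, `build_heap`, and `current_sets`, the input dictionaries, and the metadata values are assumptions and do not appear in the specification; the sketch only chains current basic shapes through the first leaf and degraded basic shapes through the second leaf.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class BasicShape:
    node_id: int
    metadata: dict
    leaf_b: Optional[BasicShape] = None   # next node's current basic shape
    leaf_c: Optional[BasicShape] = None   # this node's earlier metadata set


def build_heap(order: list[int], current: dict[int, dict],
               degraded: dict[int, list[dict]]) -> BasicShape:
    """Chain current shapes via leaf b and degraded shapes via leaf c.

    order[0] is the node that owns the heap, so its shape becomes the top layer.
    degraded[n] lists node n's earlier metadata sets, most recent first.
    """
    top = None
    for node_id in reversed(order):
        older = None
        for meta in reversed(degraded.get(node_id, [])):
            older = BasicShape(node_id, meta, leaf_c=older)
        top = BasicShape(node_id, current[node_id], leaf_b=top, leaf_c=older)
    return top


def current_sets(top: BasicShape) -> dict[int, dict]:
    """Walk leaf b from the top layer: every node's currently running metadata."""
    found = {}
    shape = top
    while shape is not None:
        found[shape.node_id] = shape.metadata
        shape = shape.leaf_b
    return found


# Heap inside node 1 of a three-node cluster, as in FIG. 2 (placeholder values).
top = build_heap(
    order=[1, 2, 3],
    current={1: {"v": "1"}, 2: {"v": "2"}, 3: {"v": "3"}},
    degraded={1: [{"v": "1'"}, {"v": "1''"}], 2: [{"v": "2'"}]},
)
print(list(current_sets(top)))
```

Walking the first leaf from the top layer corresponds to column C of FIG. 2, and walking the second leaf of any current shape yields that node's degraded data in order of increasing age.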
In this embodiment of the present invention, when the primary node in the cluster is powered off, the method may further include: determining, according to a predetermined rule, that the current node is the primary node; and changing the active/standby identifier of the current node to active. Specifically, the active/standby identifier may be a Flag identifier inside the storage node. After the identifier is changed to active, the node becomes the primary node and can perform management operations on the metadata in the cluster. Because this node also holds the metadata heap structure, and the metadata heap stores the metadata sets of all nodes in the cluster, the data migration required in the prior art is avoided.
In the metadata management method of this embodiment, each node internally stores the metadata sets currently running on all nodes in the cluster, so that no data migration between the active and standby nodes is needed during an active/standby switchover, and the switchover in the cluster can be performed quickly.
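The switchover described above can be sketched as a flag change with no copying of metadata. The sketch below is only an illustration under stated assumptions: the specification does not define the "predetermined rule", so the lowest surviving node id is used here as a hypothetical rule, and the heap is simplified to a mapping from node id to current metadata set. The names `Node` and `switch_over` are likewise invented.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    active: bool = False                       # the active/standby identifier (Flag)
    heap: dict = field(default_factory=dict)   # node id -> current metadata set


def switch_over(nodes, failed_id):
    """Promote a surviving node after the primary with id failed_id powers off."""
    survivors = [n for n in nodes if n.node_id != failed_id]
    # Assumed election rule: the surviving node with the lowest id wins.
    new_primary = min(survivors, key=lambda n: n.node_id)
    new_primary.active = True                  # only the flag changes
    return new_primary


cluster_meta = {1: {"vol": "a"}, 2: {"vol": "b"}, 3: {"vol": "c"}}
nodes = [Node(i, active=(i == 1), heap=dict(cluster_meta)) for i in (1, 2, 3)]

primary = switch_over(nodes, failed_id=1)
print(primary.node_id, sorted(primary.heap))
```

The new primary already holds the metadata of every node, including that of the failed primary, so nothing is transferred during the switchover.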
In addition, because each node internally stores the metadata sets of all nodes in the cluster, including the currently running metadata sets and the metadata sets run at earlier time points (that is, degraded data), the metadata of any node in the cluster can be retrieved from the metadata heap of any node. If degraded data is stored, the degraded data of the cluster, again covering all nodes of the whole cluster, can also be retrieved. The metadata organization of this embodiment therefore makes metadata retrieval very convenient.
Embodiment 2
FIG. 4 is a schematic diagram of the principle of another embodiment of the metadata management method of the present invention. This embodiment describes how the nodes in a cluster keep their data consistent, that is, how configuration data is synchronized across the nodes.
As shown in FIG. 4, the cluster includes four nodes: node 1, node 2, node 3, and node 4. The figure shows the form of the metadata heap inside each node, and the top layer of each metadata heap identifies the metadata set the node itself is currently running. Note that in FIG. 4, for example, the metadata heap of node 1 stores two levels of degraded data of node 1, 1' and 1'', the metadata heap of node 3 stores only one level of degraded data of node 1, 1', and the metadata heap of node 2 stores no degraded data of node 1. FIG. 4 is only an example; as already explained, node 2 and node 3 may also store both levels of degraded data of node 1, 1' and 1''.
In general, this embodiment can be configured so that a node must save its own degraded data (for example, the metadata heap of node 1 necessarily stores 1' and 1''), whereas saving the degraded data of other nodes is optional. For example, node 1 may selectively save the degraded data of node 3: node 3 actually has two levels of degraded data, 3' and 3'', but node 1 saves only one level, 3'. This is acceptable because, even if a node does not save the degraded data of another node, that data can still be obtained from the other node's own metadata heap. In summary, this embodiment can be configured so that each node must save its own currently running metadata and its own degraded data, must also save the currently running metadata of the other nodes, and may optionally save the degraded data of the other nodes.
In addition, degraded data is stored in chronological order. For example, in the metadata heap of node 1, the vertex stores the currently running metadata set, 1' stores the metadata set that ran at time point T1, and 1'' stores the metadata set that ran at time point T2, where the current time point (for example, 10 o'clock), T1 (for example, 9 o'clock), and T2 (for example, 8 o'clock) are successively earlier. If the current metadata set changes and the old set needs to be saved, the metadata set that was running at the current time point (10 o'clock) is stored in 1'; the metadata set of time point T1 (9 o'clock) previously stored in 1' moves back to 1''; and likewise the metadata set of time point T2 (8 o'clock) previously stored in 1'' moves back, for example into a newly created degraded basic shape 1'''.
In addition, the number of levels of degraded data to keep can be configured. Continuing the example above, if the system is preset to keep two levels of degraded data, 1' and 1'', no new degraded basic shape 1''' needs to be created, and the metadata set of time point T2 (8 o'clock) previously stored in 1'' is simply discarded. Furthermore, to make degraded data easier to look up later, the time point at which each degraded data set ran can be saved together with it; for example, time points T1 (9 o'clock) and T2 (8 o'clock) above need to be stored.
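The shifting of degraded data with a configured number of levels can be sketched with a bounded double-ended queue. This is only an illustration: the class `NodeHistory`, its attributes, and the metadata values are assumptions, and the hours 8 to 11 stand in for the time points of the example above.

```python
from collections import deque


class NodeHistory:
    """Current metadata set of one node plus a bounded list of degraded sets."""

    def __init__(self, metadata, started_at, max_levels=2):
        self.current = (started_at, metadata)
        # Index 0 plays the role of 1', index 1 of 1'', and so on.
        # With maxlen set, appendleft() drops the oldest entry on the right.
        self.degraded = deque(maxlen=max_levels)

    def change(self, new_metadata, at):
        """Replace the running set; the old set becomes the newest degraded set."""
        self.degraded.appendleft(self.current)   # time point is kept with the data
        self.current = (at, new_metadata)


history = NodeHistory({"lun": "a"}, started_at=8)
history.change({"lun": "b"}, at=9)
history.change({"lun": "c"}, at=10)    # degraded now holds the sets from 9 and 8
history.change({"lun": "d"}, at=11)    # the set from 8 o'clock is discarded

print(history.current[0], [t for t, _ in history.degraded])
```

Storing each set together with its time point follows the suggestion above that the running time of degraded data be saved for later lookup.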
The following describes how the four nodes synchronize metadata. A cluster usually includes a primary node and standby nodes, and if some part of the metadata needs to be changed, the change generally starts at the primary node, that is, the primary node is changed first.
Assume that in the cluster shown in FIG. 4 node 1 is the primary node and a change to I/O-related metadata is initiated by node 1 and propagated to each node. All nodes in the cluster can communicate with one another, so node 1 can change the corresponding I/O metadata in the other nodes one after another in a certain logical order, or it can change the other nodes concurrently. This step is called "first-layer synchronization": the metadata configuration change is applied to the top layer of the metadata heap in each node.
After first-layer synchronization is complete, the metadata heap of each node performs an internal update. For example, node 1 synchronizes the metadata of the other nodes in its metadata heap in the order 1->2->3->4, node 2 does so in the order 2->3->4->1, and so on. The internal update of the metadata heap means the following. After first-layer synchronization, the metadata stored at the top layer of the metadata heaps of node 2, node 3, and node 4 has changed, and their currently running metadata sets are no longer those from before first-layer synchronization. Therefore, the current basic shapes in the metadata heap of node 1 that correspond to node 2, node 3, and node 4 (represented in FIG. 4 by the basic shapes labeled "1", "2", "3", and "4") must be updated accordingly, so that the data they hold remains consistent with the data currently running in the corresponding nodes.
Take the internal update of node 1 as an example. Because the nodes in the cluster are communicatively connected to one another, node 1 can obtain the updated metadata set currently running on a first node other than itself in the cluster (the first node being node 2, node 3, or node 4), for example from the current basic shape at the top layer of the metadata heap in the first node. Node 1 then stores the obtained updated metadata set in the current basic shape corresponding to the first node in its own metadata heap. For example, after obtaining the updated data from the top layer of the metadata heap of node 2, node 1 stores it in current basic shape 20, labeled "2", in its own metadata heap; the other nodes are handled similarly.
Optionally, each node may also save the degraded data of other nodes. For example, after the top layer of the metadata heap of node 2 is updated to the new metadata set, the metadata set that was stored at the top layer before first-layer synchronization becomes degraded data of an earlier time point, and node 2 saves it in a degraded basic shape in its own metadata heap, for example in degraded basic shape 21, labeled 2'. Node 1 can obtain this degraded data from degraded basic shape 21 of node 2 and store it in its own metadata heap, specifically in degraded basic shape 22 corresponding to node 2, which is connected to the second leaf node (the left node in FIG. 4) of current basic shape 20 corresponding to node 2. If node 1 previously had no degraded basic shape for node 2, node 1 creates degraded basic shape 22, stores the degraded data in it, and connects it to the second leaf node of current basic shape 20.
Through the above processing, the data of all nodes in the cluster is synchronized and consistent, and this data consistency is simple to implement in this embodiment.
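The two phases of this embodiment, first-layer synchronization followed by the internal update of each heap, can be sketched as follows. The sketch is an assumption-laden simplification: each heap is reduced to a mapping from node id to current metadata set, degraded data is omitted, and the function names and the `io_policy` key are invented for illustration.

```python
def ring_order(own_id, ids):
    """Order used inside one node: node 1 gives 1,2,3,4; node 2 gives 2,3,4,1."""
    start = ids.index(own_id)
    return ids[start:] + ids[:start]


def first_layer_sync(cluster, change):
    """Apply the change to the top layer (the node's own entry) of every heap."""
    for node_id, heap in cluster.items():
        heap[node_id].update(change)


def internal_update(cluster):
    """Refresh each heap's entries for the other nodes from those nodes' top layers."""
    ids = sorted(cluster)
    for node_id in ids:
        for peer in ring_order(node_id, ids)[1:]:
            cluster[node_id][peer] = dict(cluster[peer][peer])


ids = [1, 2, 3, 4]
# cluster[n][m] is the copy, held inside node n, of node m's current metadata.
cluster = {n: {m: {"io_policy": "old"} for m in ids} for n in ids}

first_layer_sync(cluster, {"io_policy": "new"})
stale = cluster[1][2]["io_policy"]     # node 1's copy of node 2 is not yet updated
internal_update(cluster)
print(stale, cluster[1][2]["io_policy"])
```

After both phases every heap holds the same current metadata for every node, which is the consistency property this embodiment aims for.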
Embodiment 3
FIG. 5 is a schematic diagram of the principle of yet another embodiment of the metadata management method of the present invention. This embodiment describes how the metadata heap supports cluster splitting.
As shown in FIG. 5, a cluster of four nodes is again taken as an example. The cluster includes node 1, node 2, node 3, and node 4 and is to be split into two two-node clusters: a new cluster formed by node 1 and node 2, and a new cluster formed by node 3 and node 4. After the split, because the membership of each new cluster has changed, the metadata heap inside each node of a new cluster must also change. For example, the new cluster formed by node 1 and node 2 does not include node 3 and node 4. In the metadata heap inside node 1, the current basic shape corresponding to node 3 should therefore no longer be connected to the current basic shape corresponding to node 2, because two current basic shapes that are connected in the metadata heap correspond to nodes belonging to the same cluster. The connection between the current basic shapes of node 3 and node 2 therefore needs to be disconnected.
Specifically, referring to FIG. 5, when the cluster is split, the metadata sets corresponding to node 3 and node 4 need to be cut out of the metadata heaps inside node 1 and node 2, because node 3 and node 4 no longer belong to the new cluster formed by node 1 and node 2. In addition, the metadata sets corresponding to node 1 and node 2 need to be connected, because in the new cluster these two nodes communicate with each other and must also be connected in the metadata heap.
For example, in node 1, the connection between current basic shape 31 of node 3 and current basic shape 32 of node 2 is disconnected. Current basic shape 31 was previously connected through the first leaf node b of current basic shape 32, and on splitting it is sufficient to disconnect the connection there (the cut line shown in FIG. 5 indicates the disconnection). As already explained, there is actually no connection between degraded basic shape 33, which holds the degraded data of node 3, and degraded basic shape 34 of node 2; the figure merely draws the second leaf node of degraded basic shape 33 overlapping the first leaf node of degraded basic shape 34. Once the connection at the first leaf node b of current basic shape 32 is disconnected, all basic shapes corresponding to node 3 and node 4 on the right side of the cut line are removed from the metadata heap, that is, they no longer belong to it, because node 3 and node 4 no longer belong to the new cluster in which node 1 is located.
As a further example, in node 2 the connection between current basic shape 31 of node 3 and current basic shape 32 of node 2 (which may be called the second node) at the first leaf node b likewise needs to be disconnected. In addition, the connection between current basic shape 35 of node 4 and current basic shape 36 of node 1 (which may be called the third node) at the first leaf node b needs to be disconnected, because node 4 also no longer belongs to the new cluster to which node 1 belongs, and there is no longer a communication connection between node 4 and node 1. Then, referring to FIG. 5, current basic shape 36 of node 1 needs to be connected at the first leaf node b of current basic shape 32 of node 2, because node 1 and node 2 now form the new cluster and the current basic shapes of these two nodes must be connected.
The metadata heaps in the new cluster formed by node 3 and node 4 are split on the same principle as for node 1 and node 2, which is not described again; see FIG. 5. It can be seen that the metadata heap can be cut and partitioned with the basic shape as the minimum granularity.
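The cutting and reconnecting of current basic shapes during a split can be sketched on the first-leaf chain alone. This is a simplified illustration, not the specification's procedure: degraded basic shapes are left out, and the names `Shape`, `make_chain`, `chain_ids`, and `split` are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Shape:
    node_id: int
    leaf_b: Optional[Shape] = None     # next current basic shape in the chain


def make_chain(ids):
    """Build a chain of current basic shapes; ids[0] is the top layer."""
    top = None
    for node_id in reversed(ids):
        top = Shape(node_id, leaf_b=top)
    return top


def chain_ids(top):
    ids = []
    while top is not None:
        ids.append(top.node_id)
        top = top.leaf_b
    return ids


def split(top, keep):
    """Disconnect shapes of nodes outside keep and reconnect the remaining ones."""
    if top.node_id not in keep:
        raise ValueError("the owner of the heap must stay in its own cluster")
    tail, shape = top, top.leaf_b
    while shape is not None:
        following = shape.leaf_b
        if shape.node_id in keep:
            tail.leaf_b = shape        # reconnect at the free first leaf
            tail = shape
        shape = following
    tail.leaf_b = None                 # cut off everything after the last kept shape
    return top


heap_in_node1 = split(make_chain([1, 2, 3, 4]), keep={1, 2})
heap_in_node2 = split(make_chain([2, 3, 4, 1]), keep={1, 2})
print(chain_ids(heap_in_node1), chain_ids(heap_in_node2))
```

In node 1 only a cut is needed, whereas in node 2 the shape of node 1 must also be reattached behind the shape of node 2, matching the two examples given above.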
Embodiment 4
FIG. 6 is a schematic diagram of the principle of still another embodiment of the metadata management method of the present invention. This embodiment describes how the metadata heap supports cluster combination.
As shown in FIG. 6, two two-node clusters being combined into one four-node cluster is taken as an example: the cluster formed by node 1 and node 2 and the cluster formed by node 3 and node 4 are combined into a new cluster that includes node 1, node 2, node 3, and node 4. This is the reverse of the process in the embodiment shown in FIG. 5.
When the clusters are combined, referring to FIG. 6, the metadata sets of node 3 and node 4 need to be added to the metadata heaps of node 1 and node 2, and the metadata sets of node 1 and node 2 need to be added to the metadata heaps of node 3 and node 4, so that the metadata heap inside each node of the new cluster includes at least the currently running metadata set of every node.
For example, node 1 can obtain the metadata set node 3 is currently running from the top layer of the metadata heap inside node 3, and the metadata set node 4 is currently running from the top layer of the metadata heap inside node 4. Node 1 can then create, in its own metadata heap, current basic shape 41 corresponding to node 3 and current basic shape 42 corresponding to node 4. It stores the metadata set node 3 is currently running in current basic shape 41 and connects current basic shape 41 at the first leaf node b of current basic shape 43 of node 2 (this first leaf node b is currently free and not yet connected); it stores the metadata set node 4 is currently running in current basic shape 42 and connects current basic shape 42 at the first leaf node b of current basic shape 41 of node 3.
The metadata processing performed by node 2, node 3, and node 4 during cluster combination is similar to the above and is not repeated here; see FIG. 6. Each node may optionally store the degraded data of other nodes. For example, node 1 may choose to store the first-level degraded data 3' of node 3, and node 2 may choose to store both levels of degraded data of node 3, namely 3' and 3". Alternatively, every node in the cluster may store the currently running metadata sets of all nodes as well as the metadata sets of their degraded data, so that the data stored in the metadata heaps of all nodes is consistent.
Through the above processing, the metadata heap supports changes of cluster form (such as splitting or combining). Moreover, because the metadata heap is designed on the basis of fractal theory, its basic shapes lend themselves to combination and division: a cluster can be divided into any number of nodes, and the metadata heap can be split or combined according to the principle above, so cluster splitting and combination can be implemented very simply. In addition, managing metadata on the basis of fractal theory imposes no limit on the number of nodes, because the metadata in each node is stored fractally (that is, in basic shapes). In theory, as long as a node has the memory and disk space to store the metadata heap, there is no upper limit on the number of cluster nodes; and because cluster splitting and combination can be done quickly and simply, an increase in the number of nodes has little impact on cluster performance.
Embodiment 5
FIG. 7 is a schematic diagram of the principle of yet another embodiment of the metadata management method of the present invention. This embodiment describes how snapshots of the metadata heap are implemented.
As shown in FIG. 7, consider for example the second-level degraded data 51 inside node 1. It includes the second-level degraded data of every node in the cluster (this may be called the metadata set running at the second time point), such as the degraded data 1" of node 1 and the degraded data 2" of node 2. Node 1 can store the second-level degraded data 51 as a whole; this is called a "snapshot". Similarly, the first-level degraded data 53 (which may be called the metadata set running at the first time point) and the current running data 52, that is, the whole of the currently running metadata sets, can also be stored. The three wholes are stored separately.
When the second-level degraded data 51 needs to be run, that is, when the current running data 52 (including the metadata set currently running on each node) is to be replaced as a whole by the second-level degraded data 51, the previously snapshotted second-level degraded data 51 can be moved to the position of the current running data 52. In other words, the second-level degraded data 51 is stored as a whole into the current basic shapes of the metadata heap and run there. This is called "rolling forward" (the data moves to the position of a more recent time point). Alternatively, the second-level degraded data 51 can simply be run, which amounts to temporarily selecting it for use without moving it; it remains stored at the position shown in FIG. 7.
When the second-level degraded data 51 replaces the current running data 52 as a whole and moves to the position of the current running data 52, the movement of the second-level degraded data 51 can be called "rolling forward". Correspondingly, the current running data 52 must also move, for example to the former position of the second-level degraded data 51: the whole of the metadata sets running in the current basic shapes of the metadata heap is stored into the degenerate basic shapes that corresponded to the second-level degraded data 51. This amounts to swapping positions with the second-level degraded data 51, and the movement of the current running data 52 can be called "rollback".
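The position swap described above can be illustrated with a small sketch. It assumes, purely for illustration, that each snapshot is held as one entry of a dictionary keyed by level; the names `heap`, `swap_with_current`, and `select_snapshot`, and the level labels, are hypothetical.

```python
def swap_with_current(heap: dict, level: str) -> None:
    """Roll the snapshot at `level` forward and roll the running data back.

    The two wholes simply exchange positions, as described for data 51 and 52.
    """
    heap["current"], heap[level] = heap[level], heap["current"]


def select_snapshot(heap: dict, level: str) -> dict:
    """Temporarily use a snapshot without moving it."""
    return heap[level]


# Each whole covers every node of the cluster (node ID -> metadata set).
heap = {
    "current": {1: "1", 2: "2"},
    "level1": {1: "1'", 2: "2'"},
    "level2": {1: "1''", 2: "2''"},
}

in_use = select_snapshot(heap, "level1")   # no data moves
swap_with_current(heap, "level2")          # level-2 snapshot now runs
```

Because whole snapshots are exchanged rather than rebuilt entry by entry, the operation in this sketch is a constant-time reference swap.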
With the snapshot implementation described above, operations such as data rollback and roll forward can be performed more quickly.
Embodiment 6
FIG. 8 is the first schematic diagram of the principle of yet another embodiment of the metadata management method of the present invention, and FIG. 9 is the second schematic diagram of the same embodiment. This embodiment describes how the metadata heap achieves redundancy and fault tolerance.
As shown in FIG. 8, suppose that part of the metadata set currently running on node 1 (that is, the data stored in the current basic shape 61 at the top level of node 1's internal metadata heap) is lost or damaged and needs to be repaired. This part of the data may be called the "metadata to be repaired". Node 1 can first check its own degenerate basic shapes, for example degenerate basic shape 62, to see whether repair data corresponding to the metadata to be repaired (that is, the data as it was before the loss or damage) is stored there. If so, node 1 can obtain the repair data immediately and use it to replace the metadata to be repaired in its own current basic shape, thereby repairing its metadata by itself.
Alternatively, if node 1 cannot find the repair data in its own degenerate basic shapes, it can obtain the data directly from the metadata heap of another node. For example, as shown in FIG. 9, node 1 can obtain the repair data from the current basic shape 63 in node 2 that corresponds to node 1.
In this embodiment, each node's metadata is stored redundantly: the metadata set currently running on a node is also stored in the node's own degenerate basic shapes and on other nodes. Other nodes in the cluster, such as node 2, all store node 1's currently running metadata set, so node 1's metadata effectively has multiple backups. Node 1 can repair data through several paths, which improves the fault tolerance and redundancy level of the data and provides better protection.
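The two repair paths of this embodiment can be sketched as a lookup with a fallback. The function `find_repair_data`, the key format, and the sample values are hypothetical; the sketch assumes each basic shape can be treated as a dictionary of metadata entries.

```python
def find_repair_data(key, own_degenerate_shapes, peer_current_copies):
    """Return (value, source) for a damaged metadata entry.

    Search order follows the embodiment: the node's own degenerate basic
    shapes first, then the copies of its current basic shape held by peers.
    """
    for level, shape in enumerate(own_degenerate_shapes, start=1):
        if key in shape:
            return shape[key], f"own level {level}"
    for peer_id, copy in peer_current_copies.items():
        if key in copy:
            return copy[key], f"node {peer_id}"
    raise KeyError(key)


# Node 1's current basic shape; None marks the damaged entry (an assumption).
current_shape = {"inode:7": None, "inode:3": "owner=bob"}
own_degenerate = [{"inode:3": "owner=bob"}]     # level 1 lacks inode:7
peer_copies = {2: {"inode:7": "owner=alice"}}   # node 2's copy of node 1's set

value, source = find_repair_data("inode:7", own_degenerate, peer_copies)
current_shape["inode:7"] = value                # replace the damaged entry
```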
Embodiment 7
This embodiment mainly describes how managing metadata with the metadata heap makes data modification more flexible and enables more flexible cluster read/write locks.
For example, suppose that part of the data in node 1's currently running metadata set, stored in the metadata heap inside node 1, is to be modified. In the prior art, node 1's currently running metadata set would be locked, which amounts to suspending node 1's current operation; other nodes could not read or write node 1's current running data, and normal reads and writes could resume only after the modification was complete. In this embodiment, however, node 1's current running data is also stored redundantly in the degenerate basic shapes. For example, if node 1's first-level degenerate basic shape contains the metadata to be modified, node 1 can lock and modify only that degenerate basic shape. Other nodes cannot read or write the data in the degenerate basic shape while it is locked, but the data stored in node 1's current basic shape is not affected. After the modification is complete, the modified metadata replaces the corresponding data in the current basic shape.
As can be seen from the above, the design of the metadata heap in this embodiment makes data modification more flexible and does not affect the operation of the cluster.
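The locking scheme of this embodiment can be sketched as follows. The class `NodeMetadata`, its methods, and the `during_edit` hook are hypothetical and exist only to show that the current basic shape stays readable while the degenerate copy is locked; the sketch assumes the degenerate shape holds the entry being modified.

```python
import threading


class NodeMetadata:
    """Current shape plus one redundant degenerate copy guarded by a lock."""

    def __init__(self, data: dict):
        self.current = dict(data)       # stays readable during an edit
        self.degenerate = dict(data)    # the copy that is locked and edited
        self._degenerate_lock = threading.Lock()

    def read_current(self, key):
        # Reads never take the degenerate-shape lock.
        return self.current[key]

    def modify(self, key, value, during_edit=None):
        with self._degenerate_lock:
            self.degenerate[key] = value
            if during_edit is not None:
                during_edit()           # simulate a reader arriving mid-edit
        # After the edit, the new value replaces the entry in the current shape.
        self.current[key] = self.degenerate[key]


node1 = NodeMetadata({"quota": 10})
seen_during_edit = []
node1.modify(
    "quota", 20,
    during_edit=lambda: seen_during_edit.append(node1.read_current("quota")),
)
```

A reader that arrives during the edit still sees the old running value, and the new value becomes visible only after the final replacement step.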
Embodiment 8
FIG. 10 is a schematic diagram of the principle of yet another embodiment of the metadata management method of the present invention. As shown in FIG. 10, the metadata sets stored in the current basic shapes corresponding to the nodes of the cluster can be stored in memory, and the metadata sets stored in the degenerate basic shapes can be stored in a storage medium other than the memory, to reduce memory usage.
For example, as shown in FIG. 10, in the metadata heap inside node 1, the metadata in the current basic shapes can be stored in memory (Mem in the figure denotes memory). The other, degraded data, namely the first-level degraded data 71 and the second-level degraded data 72 in FIG. 10, exists for redundancy and fault tolerance and can therefore be placed on other, slower storage media. For example, the first-level degraded data 71 is placed in a cache, and the second-level degraded data 72 is placed on a solid state disk (SSD) or a disk (DISK), which reduces memory usage. In this storage scheme, data of different degradation levels is kept on different storage media.
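The placement rule can be written as a simple lookup. This is only a sketch: the function `storage_tier` is hypothetical, level 0 stands for the current basic shapes, and sending levels deeper than the second to DISK is an assumption that the text does not state.

```python
def storage_tier(level: int) -> str:
    """Map a degradation level to a storage medium (0 = current basic shapes)."""
    tiers = {0: "Mem", 1: "cache", 2: "SSD"}
    # Assumption: anything older than the second level goes to disk.
    return tiers.get(level, "DISK")


placement = {level: storage_tier(level) for level in range(4)}
```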
Embodiment 9
FIG. 11 is a schematic structural diagram of an embodiment of the metadata management apparatus of the present invention. The apparatus can perform the metadata management method of any embodiment of the present invention. It corresponds, for example, to a control module provided in each node of the cluster; the control module can be used to store the cluster's metadata and to manage the metadata, for example during cluster splitting, combination, or repair. This embodiment briefly describes the structure of the apparatus; for the specific working principle of each functional unit, refer to any of the method embodiments of the present invention.
As shown in FIG. 11, the metadata management apparatus of this embodiment may include a storage unit 91 and a management unit 92, where:
The storage unit 91 is configured to store the metadata in the cluster as a metadata heap. The metadata includes the metadata set currently running on the current node itself and the metadata sets currently running on all other nodes in the cluster. The metadata heap includes at least two basic shapes, each of which is a binary-tree shape formed by a vertex, a first leaf node, and a second leaf node. Storing the metadata as a metadata heap includes:
storing the metadata set currently running on the current node itself in a current basic shape at the top level of the metadata heap, where the first leaf node of that current basic shape is used to connect another current basic shape that stores the metadata set currently running on another node, the first leaf node of that other current basic shape is used to connect yet another current basic shape that stores the metadata set currently running on yet another node, and so on until all nodes in the cluster are connected;
among the current basic shapes that store the metadata sets currently running on the nodes of the cluster, the second leaf node of each current basic shape is used to connect a degenerate basic shape that stores the metadata set that the node corresponding to the current basic shape ran at a first time point, the first time point being earlier than the current time point; the second leaf node of the degenerate basic shape is used to connect another degenerate basic shape that stores the metadata set that the node ran at a second time point, the second time point being earlier than the first time point, and so on.
The management unit 92 is configured to manage the metadata according to the metadata heap.
Further, the management unit 92 may include a synchronization subunit 921, configured to: when the metadata in the metadata heap is updated, obtain the updated metadata set currently running on a node in the cluster other than the current node; and store the updated metadata set in the current basic shape corresponding to that node in the metadata heap.
Further, the management unit 92 may include a storage subunit 922, configured to: store the degraded data corresponding to the node in the metadata heap in a degenerate basic shape, the degraded data being the data that was stored in the current basic shape before the updated metadata set; and connect the degenerate basic shape to the second leaf node of the current basic shape.
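The combined behaviour of the synchronization and storage subunits can be sketched as follows: on each update, the previous running set is demoted to a degenerate basic shape hanging off the second leaf. The names `Shape`, `update_current`, and `history` and the `rev` field are hypothetical, and the first-leaf chain is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Shape:
    metadata: dict
    second_leaf: Optional["Shape"] = None  # next older version of the same node


def update_current(current: Shape, new_metadata: dict) -> None:
    """Demote the old running set to a degenerate shape, then store the update."""
    degenerate = Shape(current.metadata, current.second_leaf)
    current.metadata = dict(new_metadata)
    current.second_leaf = degenerate


def history(current: Shape) -> list:
    """List versions from the running set down to the oldest degraded data."""
    versions = []
    shape = current
    while shape is not None:
        versions.append(shape.metadata)
        shape = shape.second_leaf
    return versions


shape_for_node2 = Shape({"rev": 1})
update_current(shape_for_node2, {"rev": 2})
update_current(shape_for_node2, {"rev": 3})
versions = history(shape_for_node2)
```

Each call pushes the existing degraded data one level deeper, so the second-leaf chain is ordered from the most recent time point to the earliest.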
Further, the storage unit 91 is specifically configured to connect the current basic shape of a first node to the first leaf node of the current basic shape of a second node, and to connect the first leaf node of the current basic shape of the first node to the current basic shape of a third node. Correspondingly, the management unit 92 may include a form control subunit 923. The form control subunit 923 is configured to: when another node other than the current node is split from the cluster, disconnect the current basic shape corresponding to that other node from the first leaf node of the current node's current basic shape, and disconnect the first leaf node of that other current basic shape from the yet another current basic shape, corresponding to yet another node, to which it is connected; and connect the yet another current basic shape to the first leaf node of the current node's current basic shape. The form control subunit 923 is further configured to: when a new node is added to the cluster, obtain the new node's currently running metadata set from the metadata heap stored in the new node that is to join the cluster; create, in the current node's metadata heap, a current basic shape corresponding to the new node; store the new node's currently running metadata set in that current basic shape; and connect the new node's current basic shape to a current basic shape in the metadata heap whose first leaf node is idle.
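The split operation handled by the form control subunit resembles removing an element from a singly linked list. The sketch below assumes that view; `CurrentShape` and `detach_node` are hypothetical names, and metadata and second-leaf chains are omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CurrentShape:
    node_id: int
    first_leaf: Optional["CurrentShape"] = None


def detach_node(top: CurrentShape, leaving_id: int) -> Optional[CurrentShape]:
    """Unlink the shape of a departing node and reconnect its successor.

    Only nodes other than the one owning `top` can be detached here.
    """
    prev = top
    while prev.first_leaf is not None:
        leaving = prev.first_leaf
        if leaving.node_id == leaving_id:
            prev.first_leaf = leaving.first_leaf  # reconnect the successor
            leaving.first_leaf = None             # fully disconnect the leaver
            return leaving
        prev = leaving
    return None


top = CurrentShape(1, CurrentShape(2, CurrentShape(3, CurrentShape(4))))
removed = detach_node(top, 2)

remaining = []
shape = top
while shape is not None:
    remaining.append(shape.node_id)
    shape = shape.first_leaf
```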
Further, the management unit 92 may include a snapshot subunit 924, a selection subunit 925, and a migration subunit 926, where:
The snapshot subunit 924 is configured to store separately, as a whole, the metadata sets that the nodes of the cluster ran at each time point, the time points including the current time point, the first time point, and the second time point, so that the metadata set corresponding to any one of those time points can be run.
The selection subunit 925 is configured to run the whole of the metadata sets corresponding to one of the time points, where that time point is a time point other than the current time point and the whole of the metadata sets is stored in the degenerate basic shapes of the metadata heap.
The migration subunit 926 is configured to store the whole of the metadata sets corresponding to that time point into the current basic shapes of the metadata heap and run it there, and is further configured to store the whole of the metadata sets running in the current basic shapes of the metadata heap into the degenerate basic shapes of the metadata heap.
Further, the management unit 92 may include a repair subunit 927, configured to: when metadata to be repaired exists in the current basic shape corresponding to the current node itself in the metadata heap, obtain the repair data corresponding to the metadata to be repaired from the degenerate basic shape corresponding to the current node itself in the metadata heap, or from the current basic shape corresponding to the current node in the metadata heap of another node in the cluster; and replace the metadata to be repaired in the current basic shape corresponding to the current node itself with the obtained repair data.
Further, the management unit 92 may include a storage control subunit 928, configured to store in memory the metadata sets stored in the current basic shapes corresponding to the nodes of the cluster in the metadata heap, and to store the metadata sets stored in the degenerate basic shapes of the metadata heap in a storage medium other than the memory.
Further, the management unit 92 may include a read/write control subunit 929, configured to: obtain the metadata that needs to be changed from a degenerate basic shape in the metadata heap, and lock and modify that metadata; and replace the metadata that needs to be changed, as stored in the corresponding current basic shape, with the modified metadata.
Further, when the master node in the cluster is powered off, the management unit 92 is further configured to: determine, according to a predetermined rule, that the current node is the master node; and change the current node's active/standby identifier to active. Specifically, the active/standby identifier may be a Flag identifier inside the storage node. After the identifier is changed to active, the node becomes the master node and can manage the metadata in the cluster. Because this node also holds the metadata heap structure, which stores the metadata sets of all nodes in the cluster, the data migration required in the prior art is avoided.
In the metadata management apparatus provided by the present invention, every node stores the currently running metadata sets of all nodes in the cluster, so no data needs to be migrated between the active and standby nodes during an active/standby switchover, and the switchover in the cluster can be performed quickly.
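The active/standby switchover can be sketched as a flag change with no data transfer. The text does not specify the "predetermined rule", so the rule used here (the powered-on node with the smallest ID takes over) is purely an assumption, as are the names `Node`, `flag`, and `elect_master`.

```python
from dataclasses import dataclass


@dataclass
class Node:
    node_id: int
    flag: str = "standby"     # active/standby identifier
    powered_on: bool = True


def elect_master(nodes: list) -> Node:
    """Hypothetical rule: the powered-on node with the smallest ID becomes master."""
    survivors = [n for n in nodes if n.powered_on]
    new_master = min(survivors, key=lambda n: n.node_id)
    for n in nodes:
        n.flag = "active" if n is new_master else "standby"
    return new_master


cluster = [Node(1, flag="active"), Node(2), Node(3)]
cluster[0].powered_on = False     # the master node powers off
master = elect_master(cluster)    # only a flag changes; no metadata is copied
```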
Embodiment 10
FIG. 12 is a schematic structural diagram of another embodiment of the metadata management apparatus of the present invention. The apparatus can perform the metadata management method of any embodiment of the present invention. As shown in FIG. 12, the apparatus may include a memory 1201 and a processor 1202, where:
The memory 1201 is configured to store the metadata in the cluster as a metadata heap. The metadata includes the metadata set currently running on the current node itself and the metadata sets currently running on all other nodes in the cluster. The metadata heap includes at least two basic shapes, each of which is a binary-tree shape formed by a vertex, a first leaf node, and a second leaf node. Storing the metadata as a metadata heap includes:
storing the metadata set currently running on the current node itself in a current basic shape at the top level of the metadata heap, where the first leaf node of that current basic shape is used to connect another current basic shape that stores the metadata set currently running on another node, the first leaf node of that other current basic shape is used to connect yet another current basic shape that stores the metadata set currently running on yet another node, and so on until all nodes in the cluster are connected;
among the current basic shapes that store the metadata sets currently running on the nodes of the cluster, the second leaf node of each current basic shape is used to connect a degenerate basic shape that stores the metadata set that the node corresponding to the current basic shape ran at a first time point, the first time point being earlier than the current time point; the second leaf node of the degenerate basic shape is used to connect another degenerate basic shape that stores the metadata set that the node ran at a second time point, the second time point being earlier than the first time point, and so on.
The processor 1202 is configured to manage the metadata according to the metadata heap.
Specifically, the processor 1202 may be configured to: when the metadata in the metadata heap is updated, obtain the updated metadata set currently running on one of the nodes in the cluster other than the current node; and store the updated metadata set in the current basic shape corresponding to that node in the metadata heap.
The processor 1202 is further configured to: store the degraded data corresponding to that node in the metadata heap in a degenerate basic shape, the degraded data being the data that was stored in the current basic shape before the updated metadata set; and connect the degenerate basic shape to the second leaf node of the current basic shape.
The processor 1202 is further configured to: when another node other than the current node is split from the cluster, disconnect the current basic shape corresponding to that other node from the first leaf node of the current node's current basic shape, and disconnect the first leaf node of that other current basic shape from the yet another current basic shape, corresponding to yet another node, to which it is connected; and connect the yet another current basic shape to the first leaf node of the current node's current basic shape. The processor 1202 is further configured to: when a new node is added to the cluster, obtain the new node's currently running metadata set from the metadata heap stored in the new node that is to join the cluster; create, in the current node's metadata heap, a current basic shape corresponding to the new node; store the new node's currently running metadata set in that current basic shape; and connect the new node's current basic shape to a current basic shape in the metadata heap whose first leaf node is idle.
The processor 1202 is further configured to store separately the metadata sets run at each time point, each metadata set including the metadata sets of the nodes in the cluster, and the time points including the current time point, the first time point, and the second time point, so that the metadata set corresponding to any one of those time points can be run.
The processor 1202 is further configured to: when metadata to be repaired exists in the current basic shape corresponding to the current node itself in the metadata heap, obtain the repair data corresponding to the metadata to be repaired from the degenerate basic shape corresponding to the current node itself in the metadata heap, or from the current basic shape corresponding to the current node in the metadata heap of another node in the cluster; and replace the metadata to be repaired in the current basic shape corresponding to the current node itself with the obtained repair data.
A person of ordinary skill in the art will understand that all or some of the steps of the foregoing method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the foregoing method embodiments. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims
1. A metadata management method, characterized in that the method is applied in a cluster comprising a plurality of nodes, and the method comprises:
storing the metadata in the cluster as a metadata heap, wherein the metadata comprises the metadata set currently running on the current node itself and the metadata sets currently running on all nodes in the cluster other than itself, the metadata heap comprises at least two basic shapes, and each basic shape is a binary tree consisting of a vertex, a first leaf node, and a second leaf node;
wherein storing the metadata in the cluster as a metadata heap specifically comprises:
storing the metadata set currently running on the current node itself in a current basic shape located at the top level of the metadata heap, wherein the first leaf node of the current basic shape is used to connect another current basic shape that stores the metadata set currently running on another node, the first leaf node of the other current basic shape is used to connect a further current basic shape that stores the metadata set currently running on a further node, and so on until all nodes in the cluster are connected,
wherein the second leaf node of the current basic shape is used to connect a degraded basic shape of the node corresponding to the current basic shape, the degraded basic shape being the metadata set that the node corresponding to the current basic shape ran at a first time point, the first time point being earlier than the current time point; the second leaf node of the degraded basic shape is used to connect another degraded basic shape of the node corresponding to the current basic shape, the other degraded basic shape being the metadata set that the node corresponding to the current basic shape ran at a second time point, the second time point being earlier than the first time point, and so on; and
managing the metadata according to the metadata heap.
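As an illustration only, the heap layout recited in claim 1 might be sketched in Python as follows. The class name `BaseShape`, its field names, the integer timestamps, and the helper functions are assumptions made for this sketch; none of them appear in the claims.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BaseShape:
    """One basic shape: a vertex plus two leaf slots (hypothetical names)."""
    node_id: str
    timestamp: int
    metadata: dict
    first_leaf: Optional["BaseShape"] = None   # next node's current basic shape
    second_leaf: Optional["BaseShape"] = None  # this node's next-older degraded shape


def build_heap(own_id, current_sets, now):
    """Build a heap whose top-level shape holds own_id's current metadata.

    current_sets maps node_id -> currently running metadata set.
    The other nodes are chained through the first-leaf links.
    """
    top = BaseShape(own_id, now, dict(current_sets[own_id]))
    tail = top
    for node_id, metadata in current_sets.items():
        if node_id == own_id:
            continue
        tail.first_leaf = BaseShape(node_id, now, dict(metadata))
        tail = tail.first_leaf
    return top


def current_chain(top):
    """Follow the first-leaf links and list node ids in chain order."""
    ids = []
    shape = top
    while shape is not None:
        ids.append(shape.node_id)
        shape = shape.first_leaf
    return ids
```

In this sketch every node would call `build_heap` with its own identifier, so each node's heap has that node's own current basic shape at the top and the remaining nodes linked behind it.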
2. The metadata management method according to claim 1, characterized in that, after the metadata in the metadata heap is updated, managing the metadata according to the metadata heap comprises:
obtaining the updated metadata set currently running on one of the nodes in the cluster other than the current node itself; and
storing the updated metadata set in the current basic shape that corresponds to that node in the metadata heap.
3. The metadata management method according to claim 2, characterized in that, after the updated metadata set is stored in the current basic shape that corresponds to that node in the metadata heap, the method further comprises:
storing the degraded data that corresponds to that node in the metadata heap in a degraded basic shape, the degraded data being the data that was stored in the current basic shape before the updated metadata set; and
connecting the degraded basic shape to the second leaf node of the current basic shape.
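The update and degradation steps of claims 2 and 3 could be sketched as below. This is a minimal Python illustration under assumed names (`BaseShape`, `find_current`, `apply_update`) and assumed integer timestamps, not the applicant's implementation. The previous contents of the current basic shape are moved into a new degraded shape, which is then hung on the second leaf so that older versions remain reachable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BaseShape:
    """One basic shape: a vertex plus two leaf slots (hypothetical names)."""
    node_id: str
    timestamp: int
    metadata: dict
    first_leaf: Optional["BaseShape"] = None   # next node's current basic shape
    second_leaf: Optional["BaseShape"] = None  # this node's next-older degraded shape


def find_current(top, node_id):
    """Follow the first-leaf links to the current shape of node_id."""
    shape = top
    while shape is not None and shape.node_id != node_id:
        shape = shape.first_leaf
    return shape


def apply_update(top, node_id, new_metadata, now):
    """Store an updated metadata set and push the old one down as degraded."""
    current = find_current(top, node_id)
    if current is None:
        raise KeyError(node_id)
    # The data held before the update becomes the newest degraded shape;
    # it inherits the existing chain of older degraded shapes.
    degraded = BaseShape(node_id, current.timestamp, current.metadata,
                         second_leaf=current.second_leaf)
    current.metadata = dict(new_metadata)
    current.timestamp = now
    current.second_leaf = degraded
    return current
```

Repeated calls build a version chain along the second-leaf links, newest first, which matches the ordering of the first and second time points in claim 1.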
4. The metadata management method according to claim 1, characterized in that, when the other node, which is a node in the cluster other than the current node itself, is split off from the cluster, managing the metadata according to the metadata heap comprises:
disconnecting the current basic shape corresponding to the other node from the first leaf node of the current basic shape of the current node, and disconnecting the further current basic shape, which corresponds to the further node, from the first leaf node of the other current basic shape corresponding to the other node; and
connecting the further current basic shape to the first leaf node of the current basic shape of the current node.
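The relinking described in claim 4 resembles removing an element from a singly linked list. The Python sketch below assumes the hypothetical `BaseShape` class and function name shown, and it generalizes slightly beyond the claim by allowing the departing node to sit anywhere behind the top-level shape rather than directly behind it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BaseShape:
    """One basic shape: a vertex plus two leaf slots (hypothetical names)."""
    node_id: str
    timestamp: int
    metadata: dict
    first_leaf: Optional["BaseShape"] = None   # next node's current basic shape
    second_leaf: Optional["BaseShape"] = None  # this node's next-older degraded shape


def remove_node(top, node_id):
    """Unlink node_id's current shape and reconnect its successor."""
    prev = top
    while prev.first_leaf is not None and prev.first_leaf.node_id != node_id:
        prev = prev.first_leaf
    departing = prev.first_leaf
    if departing is None:
        raise KeyError(node_id)
    # Connect the successor of the departing shape to the predecessor,
    # then detach the departing shape from the chain.
    prev.first_leaf = departing.first_leaf
    departing.first_leaf = None
    return departing
```

The departing shape keeps its second-leaf history in this sketch, so the caller may discard or archive it as a whole.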
5. The metadata management method according to claim 1, characterized in that, when a new node is added to the cluster, managing the metadata according to the metadata heap comprises:
obtaining the currently running metadata set of the new node from the metadata heap stored in the new node that is to join the cluster; and
creating, in the metadata heap of the current node, a current basic shape corresponding to the new node, storing the currently running metadata set of the new node in that current basic shape, and connecting the current basic shape of the new node to a current basic shape in the metadata heap whose first leaf node is free.
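The join procedure of claim 5 might look like the following Python sketch. It assumes, for illustration, that the new node's own heap is available as a `BaseShape` whose top-level shape holds its currently running metadata set; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BaseShape:
    """One basic shape: a vertex plus two leaf slots (hypothetical names)."""
    node_id: str
    timestamp: int
    metadata: dict
    first_leaf: Optional["BaseShape"] = None   # next node's current basic shape
    second_leaf: Optional["BaseShape"] = None  # this node's next-older degraded shape


def add_node(top, new_node_heap, now):
    """Copy the new node's current metadata into the local heap."""
    # Find the current basic shape whose first leaf is still free.
    tail = top
    while tail.first_leaf is not None:
        tail = tail.first_leaf
    # Create a local current basic shape for the new node and attach it.
    joined = BaseShape(new_node_heap.node_id, now, dict(new_node_heap.metadata))
    tail.first_leaf = joined
    return joined
```

Only the top-level shape of the new node's heap is read here, because the claim asks for its currently running metadata set and not for its history.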
6. The metadata management method according to claim 1, characterized in that managing the metadata according to the metadata heap comprises:
separately storing the metadata sets run at each time point, wherein the metadata sets comprise the metadata set of each node in the cluster and the time points comprise the current time point, the first time point, and the second time point, so that the metadata set corresponding to any one of the time points can be run.
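One way to read claim 6 is that the heap doubles as a snapshot store: for a chosen time point, each node's version is found by descending the second-leaf links. The Python sketch below assumes integer timestamps and picks, for every node, the newest version that is not later than the requested time; that selection rule and all names are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BaseShape:
    """One basic shape: a vertex plus two leaf slots (hypothetical names)."""
    node_id: str
    timestamp: int
    metadata: dict
    first_leaf: Optional["BaseShape"] = None   # next node's current basic shape
    second_leaf: Optional["BaseShape"] = None  # this node's next-older degraded shape


def snapshot_at(top, when):
    """Return node_id -> metadata set as it stood at time `when`."""
    snapshot = {}
    current = top
    while current is not None:
        # Descend through degraded shapes until one is old enough.
        version = current
        while version is not None and version.timestamp > when:
            version = version.second_leaf
        if version is not None:
            snapshot[current.node_id] = version.metadata
        current = current.first_leaf
    return snapshot
```

Nodes that have no version at or before the requested time are simply absent from the result.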
7. The metadata management method according to claim 1, characterized in that managing the metadata according to the metadata heap comprises:
when the current basic shape corresponding to the current node itself in the metadata heap contains metadata that needs to be repaired, obtaining the degraded data corresponding to that metadata from the degraded basic shape corresponding to the current node itself in the metadata heap, or obtaining the degraded data corresponding to that metadata from the current basic shape that corresponds to the current node in the metadata heap of another node in the cluster; and
replacing the metadata that needs to be repaired, in the current basic shape corresponding to the current node itself in the metadata heap, with the obtained degraded data.
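The two repair sources named in claim 7 could be combined as in the Python sketch below. The claim presents them as alternatives; trying the local degraded shapes first and falling back to peer heaps is an ordering chosen for this sketch. Treating metadata as a dictionary and repairing a single key are likewise assumptions, as are all names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BaseShape:
    """One basic shape: a vertex plus two leaf slots (hypothetical names)."""
    node_id: str
    timestamp: int
    metadata: dict
    first_leaf: Optional["BaseShape"] = None   # next node's current basic shape
    second_leaf: Optional["BaseShape"] = None  # this node's next-older degraded shape


def repair(top, key, peer_heaps=()):
    """Replace a damaged entry in the top-level shape; report the source."""
    # Source 1: this node's own degraded shapes, newest first.
    version = top.second_leaf
    while version is not None:
        if key in version.metadata:
            top.metadata[key] = version.metadata[key]
            return "local"
        version = version.second_leaf
    # Source 2: the shape kept for this node in another node's heap.
    for peer_top in peer_heaps:
        shape = peer_top
        while shape is not None and shape.node_id != top.node_id:
            shape = shape.first_leaf
        if shape is not None and key in shape.metadata:
            top.metadata[key] = shape.metadata[key]
            return "peer"
    raise LookupError(key)
```

How a node detects that an entry needs repair is outside this sketch; the caller is assumed to supply the affected key.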
8. A metadata management device, characterized by comprising:
a storage unit, configured to store the metadata in a cluster as a metadata heap, wherein the metadata comprises the metadata set currently running on the current node itself and the metadata sets currently running on all nodes in the cluster other than itself, the metadata heap comprises at least two basic shapes, and each basic shape has the shape of a binary tree consisting of a vertex, a first leaf node, and a second leaf node; wherein storing the metadata as a metadata heap comprises:
storing the metadata set currently running on the current node itself in a current basic shape located at the top level of the metadata heap, wherein the first leaf node of the current basic shape is used to connect another current basic shape that stores the metadata set currently running on another node, the first leaf node of the other current basic shape is used to connect a further current basic shape that stores the metadata set currently running on a further node, and so on until all nodes in the cluster are connected;
wherein, among the current basic shapes that store the metadata sets currently running on the nodes in the cluster, the second leaf node of each current basic shape is used to connect a degraded basic shape holding the metadata set that the node corresponding to that current basic shape ran at a first time point, the first time point being earlier than the current time point; the second leaf node of the degraded basic shape is used to connect another degraded basic shape holding the metadata set that the node ran at a second time point, the second time point being earlier than the first time point, and so on; and
a management unit, configured to manage the metadata according to the metadata heap.
9. The metadata management device according to claim 8, characterized in that the management unit comprises:
a synchronization subunit, configured to, after the metadata in the metadata heap is updated, obtain the updated metadata set currently running on one of the nodes in the cluster other than the current node itself, and store the updated metadata set in the current basic shape that corresponds to that node in the metadata heap.
10. The metadata management device according to claim 9, characterized in that the management unit comprises:
a storage subunit, configured to store the degraded data that corresponds to that node in the metadata heap in a degraded basic shape, the degraded data being the data that was stored in the current basic shape before the updated metadata set, and to connect the degraded basic shape to the second leaf node of the current basic shape.
11. The metadata management device according to claim 8, characterized in that the management unit comprises:
a shape control subunit, configured to, when the other node, which is a node in the cluster other than the current node itself, is split off from the cluster, disconnect the current basic shape corresponding to the other node from the first leaf node of the current basic shape of the current node, disconnect the further current basic shape, which corresponds to the further node, from the first leaf node of the other current basic shape corresponding to the other node, and connect the further current basic shape to the first leaf node of the current basic shape of the current node;
and further configured to, when a new node is added to the cluster, obtain the currently running metadata set of the new node from the metadata heap stored in the new node that is to join the cluster, create, in the metadata heap of the current node, a current basic shape corresponding to the new node, store the currently running metadata set of the new node in that current basic shape, and connect the current basic shape of the new node to a current basic shape in the metadata heap whose first leaf node is free.
12. The metadata management device according to claim 8, characterized in that the management unit comprises:
a snapshot subunit, configured to separately store the metadata sets run at each time point, wherein the metadata sets comprise the metadata set of each node in the cluster and the time points comprise the current time point, the first time point, and the second time point, so that the metadata set corresponding to any one of the time points can be run.
13. The metadata management device according to claim 8, characterized in that the management unit comprises:
a repair subunit, configured to, when the current basic shape corresponding to the current node itself in the metadata heap contains metadata that needs to be repaired, obtain the degraded data corresponding to that metadata from the degraded basic shape corresponding to the current node itself in the metadata heap, or obtain the degraded data corresponding to that metadata from the current basic shape that corresponds to the current node in the metadata heap of another node in the cluster, and replace the metadata that needs to be repaired, in the current basic shape corresponding to the current node itself in the metadata heap, with the obtained degraded data.
14. The metadata management device according to claim 8, characterized in that, when the master node in the cluster is powered off, the management unit is further configured to:
determine, according to a predetermined rule, that the current node is the master node; and
change the active/standby identifier of the current node to active.
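Claim 14 leaves the "predetermined rule" open. Purely as an illustration, the Python sketch below assumes the rule is "the surviving node with the smallest identifier becomes master"; the `NodeState` class, the `role` field that stands in for the active/standby identifier, and the function name are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    """Per-node state holding the active/standby identifier (hypothetical)."""
    node_id: str
    role: str = "standby"


def take_over_if_elected(self_state, surviving_ids):
    """Apply the assumed rule after the master node has powered off.

    surviving_ids lists the nodes still in the cluster, including this one.
    """
    # Assumed rule: the smallest surviving identifier wins.
    if self_state.node_id == min(surviving_ids):
        self_state.role = "active"
    return self_state.role
```

Because every surviving node evaluates the same deterministic rule over the same membership list, at most one node in this sketch switches its identifier to active.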
15. A metadata management device, characterized by comprising a memory and a processor;
wherein the memory is configured to store the metadata in the cluster as a metadata heap, the metadata comprises the metadata set currently running on the current node itself and the metadata sets currently running on all nodes in the cluster other than itself, the metadata heap comprises at least two basic shapes, and each basic shape is a binary tree consisting of a vertex, a first leaf node, and a second leaf node;
wherein storing the metadata in the cluster as a metadata heap specifically comprises:
storing the metadata set currently running on the current node itself in a current basic shape located at the top level of the metadata heap, wherein the first leaf node of the current basic shape is used to connect another current basic shape that stores the metadata set currently running on another node, the first leaf node of the other current basic shape is used to connect a further current basic shape that stores the metadata set currently running on a further node, and so on until all nodes in the cluster are connected,
wherein the second leaf node of the current basic shape is used to connect a degraded basic shape of the node corresponding to the current basic shape, the degraded basic shape being the metadata set that the node corresponding to the current basic shape ran at a first time point, the first time point being earlier than the current time point; the second leaf node of the degraded basic shape is used to connect another degraded basic shape of the node corresponding to the current basic shape, the other degraded basic shape being the metadata set that the node corresponding to the current basic shape ran at a second time point, the second time point being earlier than the first time point, and so on; and
the processor is configured to manage the metadata according to the metadata heap.
PCT/CN2012/078563 2012-07-12 2012-07-12 Metadata management method and device WO2014008652A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201280003170.4A CN104054294B (en) 2012-07-12 2012-07-12 Metadata management method and device
PCT/CN2012/078563 WO2014008652A1 (en) 2012-07-12 2012-07-12 Metadata management method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2012/078563 WO2014008652A1 (en) 2012-07-12 2012-07-12 Metadata management method and device

Publications (1)

Publication Number Publication Date
WO2014008652A1 true WO2014008652A1 (en) 2014-01-16

Family

ID=49915326

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2012/078563 WO2014008652A1 (en) 2012-07-12 2012-07-12 Metadata management method and device

Country Status (2)

Country Link
CN (1) CN104054294B (en)
WO (1) WO2014008652A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101577735A (en) * 2009-06-24 2009-11-11 成都市华为赛门铁克科技有限公司 Method, device and system for taking over fault metadata server
CN101697526A (en) * 2009-10-10 2010-04-21 中国科学技术大学 Method and system for load balancing of metadata management in distributed file system
US7788303B2 (en) * 2005-10-21 2010-08-31 Isilon Systems, Inc. Systems and methods for distributed system scanning
US7873601B1 (en) * 2006-06-29 2011-01-18 Emc Corporation Backup of incremental metadata in block based backup systems

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101692239B (en) * 2009-10-19 2012-10-03 浙江大学 Method for distributing metadata of distributed type file system

Also Published As

Publication number Publication date
CN104054294A (en) 2014-09-17
CN104054294B (en) 2017-04-26

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12881041

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12881041

Country of ref document: EP

Kind code of ref document: A1