WO1998003914A2 - Method and apparatus for coordinated management of a shared object - Google Patents

Info

Publication number
WO1998003914A2
WO1998003914A2 PCT/US1997/012689 US9712689W WO9803914A2 WO 1998003914 A2 WO1998003914 A2 WO 1998003914A2 US 9712689 W US9712689 W US 9712689W WO 9803914 A2 WO9803914 A2 WO 9803914A2
Authority
WO
WIPO (PCT)
Prior art keywords
local
owner
processes
shared
message
Prior art date
Application number
PCT/US1997/012689
Other languages
French (fr)
Other versions
WO1998003914A3 (en)
Inventor
Jason Jeffords
Todd A Crowley
Donald Sexton
Thomas Hazel
Original Assignee
Cabletron Systems Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US08/681,040 external-priority patent/US6041383A/en
Application filed by Cabletron Systems Inc filed Critical Cabletron Systems Inc
Priority to AU38059/97A priority Critical patent/AU3805997A/en
Publication of WO1998003914A2 publication Critical patent/WO1998003914A2/en
Publication of WO1998003914A3 publication Critical patent/WO1998003914A3/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/465 Distributed object oriented systems

Abstract

A method and apparatus coordinate creation and destruction of a shared object in a distributed system. An owner process creates the shared object and each other process in the system makes a local copy of the object. When a local process has a reference to the shared object, each other process is informed of the local reference. Only the owner process can destroy the object. When the owner process determines that there are no other processes, including itself, which have a reference to the shared object, the owner process decides to destroy the object. Prior to destroying the object, however, the owner process inquires of each other process as to whether it is permissible to destroy the object. If a process does not have a local reference to the object, the process will give the owner process permission to destroy the object. When, on the other hand, the process has a local reference to the object, the process will tell the owner process not to destroy the object. This confirmation of destruction prevents the invalid destruction of shared objects in the distributed system.

Description


  
 



   METHOD AND APPARATUS FOR COORDINATED
 MANAGEMENT OF A SHARED OBJECT
 Related Cases
 This case is related to copending and commonly owned:
 a) U.S. Serial No. 08/681,040, filed July 22, 1996 by Jeffords et al., entitled "Method For Synchronizing Process In a Distributed System" (Docket C0441/7071);
 b) U.S. Serial No. 08/______, filed concurrently herewith by Jeffords et al., entitled "Method and Apparatus For Synchronizing Transactions In a Distributed System" (Docket C0441/7098); and
 c) U.S. Serial No. 08/______, filed concurrently herewith by Jeffords et al., entitled "Method and Apparatus for Coordination of a Shared Object In a Distributed System" (Docket C0441/7109);
 which are hereby incorporated by reference and from which priority is claimed.



   Background of the Invention
 1. Field of the Invention
 The present invention is directed to the management and coordination of a shared object amongst a plurality of processes in a distributed system. The creation and destruction of the object in the distributed system are controlled so that the object is created and destroyed efficiently.



  2. Discussion of the Related Art
 In a distributed connection-oriented switched network, a plurality of distributed processes are employed to provide command, control and connectivity in the network; see, for example, the Virtual Network Services (VNS) described in co-pending and commonly owned U.S. Serial No. 08/622,866 filed March 29, 1996, by S. Thebaut et al. (Docket C0441/7072), hereby incorporated by reference in its entirety. Due to constant inter-process interaction, some method is required to coordinate the functions of these processes and the objects they share. One such method is to coordinate the instantiation of the shared objects; see, for example, the Replicated Resource Management (RRM) tools described in co-pending and commonly owned U.S. Serial No. 08/585,054 filed January 11, 1996, by J. Jeffords et al. (Docket C0441/7029), hereby incorporated by reference in its entirety. In applications utilizing RRM, objects can be created or destroyed at any time and are replicated across the processes at the time of creation. However, methods are still needed to manage the lifecycles of these replicated objects, e.g., to determine when an object is no longer needed and may be destroyed.



   Coordination of processes which share an object is generally accomplished by sending and receiving messages between processes regarding the replicated object. These messages must be coordinated so that one process does not, for example, believe it is modifying the object while another process is attempting to remove the object. Such "crossing" of messages is common in distributed systems and can interfere with efforts to consistently control the state of a shared object.



   Thus, a system is needed to coordinate the actions of a plurality of processes which share an object so that a shared object can be created and destroyed in a reliable manner.



   Summary of the Invention
 According to the present invention, the lifecycle of an object, shared by processes in a distributed system, is managed so that replicated objects need not exist forever. An object is assigned one owner process which is responsible for the creation and destruction of the object. When the owner process determines that, at a given time, none of the processes are interested in the shared object, the owner process designates the object as a candidate for being destroyed. Prior to destruction, however, the owner process will still confirm with each other process that it is acceptable to destroy the object. When it is thus confirmed that no processes are interested in the object, the object will be destroyed.



   According to a method embodiment, the destruction of a shared object includes the steps of: the owner process sending a destruction request message to the other active processes asking if the shared object can be destroyed; each of the other active processes sending an approval message or a denial message to the owner process; and, when no process has responded with a denial message, the owner process destroying the shared object. The other processes are then notified that the object has been destroyed. Alternatively, when at least one process responds with a denial message, the owner process does not destroy the object and sends a message to the other processes notifying them that the object has not been destroyed.



   In a particular embodiment, two reference counts are maintained in each copy of the shared object -- a local reference count which tracks the number of references a respective local process has to the object, and a global reference count which tracks the number of processes in the distributed system having at least one reference to the object. The information provided by the reference counting is used to determine whether an object can be destroyed. The global reference count is derived from a list of processes each having a reference to the object.



   Another embodiment of the invention provides a level of fault tolerance to the object destruction process. If an owner process fails during the destruction of the object, the remaining processes determine which process will become the new owner process. The new owner process then determines the state of the shared object and destroys the object if no other process is interested in the shared object. By way of example, a resource pool as discussed in the '054 application may be maintained with a list of all peer processes interested in the shared object. This list is used when an owner process fails and a new owner process must be selected.



   The invention is further described with respect to the following detailed description and drawings, which are given by way of example only and are in no way restrictive.



   Brief Description of the Drawings
 Fig. 1 is a state diagram showing six possible states of a shared object;
 Fig. 2 is a schematic diagram of distributed processes;
 Fig. 3 is a block flow diagram of steps executed within a process;
 Fig. 4 is a block flow diagram of steps executed within the owner process;
 Figs. 5, 5a and 5b are block flow diagrams of steps with regard to referencing of objects;
 Fig. 6 is a timeline of events occurring in the distributed processes; and
 Fig. 7 is a schematic illustration of a computer apparatus for implementing the present invention.



   Detailed Description
 A specific embodiment of the invention will now be described wherein an object is shared amongst a plurality of processes in a distributed network. The object can be in one of six possible states, illustrated in the state transition diagram of Fig. 1:
 - an instantiate state (101),
 - a locally_unreferenced state (103),
 - a locally_referenced state (105),
 - an inaccessible state (109),
 - a request_destroy state (107),
 - and a destroyed state (111).



  The shared object, whose state is being coordinated, includes the original object in an owner process and a local copy of the object in each other nonowner process. An object in each of the owner and nonowner processes can be in the instantiate, locally_unreferenced, locally_referenced or destroyed states. Only an object in the owner process can be in the request_destroy state, and only an object in a nonowner process can be in the inaccessible state.



   As shown in Fig. 1, the object moves in and out of the various states. When first created by the owner process, the object is in the instantiate state 101 -- a default state. Once the owner process has created the object, it notifies all other (nonowner) processes that the object has been created. Each nonowner process then makes a local copy (local object) of the shared object; the original object may be referred to as the local copy of the owner process. Each local copy goes to the locally_unreferenced state 103. If a local process becomes interested in, i.e., operates upon, the object, the local object will enter the locally_referenced state 105. The local object will remain in the locally_referenced state 105 until the local process is no longer interested in the local object, at which point the local object will go back to the locally_unreferenced state 103.



   When it is determined that all copies of an object are in the locally_unreferenced state 103, the owner process will cause the owner's object to enter the request_destroy state 107. At this point, each other process will be asked (by the owner process) if it is interested in the object. As each process receives the inquiry, the process' local object will either: 1) enter the inaccessible state 109, if the local process is not interested in the local object, and the local process will send an approval to the owner process; or 2) remain in the locally_referenced state 105, if the local process is interested in the local object, and the local process will send a denial message to the owner process. If any single process is interested in the local object, the shared object will not be destroyed. If none of the processes are interested in the shared object and the owner process receives confirmation thereof, then the owner object will enter the destroyed state 111 (the object will be destroyed by the owner process). The owner process will then notify all other processes, which in turn will cause the local copies to enter the destroyed state 111.
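   By way of illustration only, the state transitions described above may be sketched as a small table-driven state machine. The names State, TRANSITIONS and can_transition are hypothetical and do not appear in the specification. The return of the owner's object from the request_destroy state to the locally_unreferenced state after a denial is an assumption of this sketch, as Fig. 1 itself is not reproduced here.

```python
from enum import Enum


class State(Enum):
    # Values mirror the reference numerals used for the states of Fig. 1.
    INSTANTIATE = 101
    LOCALLY_UNREFERENCED = 103
    LOCALLY_REFERENCED = 105
    REQUEST_DESTROY = 107
    INACCESSIBLE = 109
    DESTROYED = 111


# Transitions as read from the description (an interpretation, not Fig. 1 itself).
TRANSITIONS = {
    State.INSTANTIATE: {State.LOCALLY_UNREFERENCED},
    State.LOCALLY_UNREFERENCED: {
        State.LOCALLY_REFERENCED,
        State.REQUEST_DESTROY,   # owner starts a destruction attempt
        State.INACCESSIBLE,      # nonowner approves a destruction request
    },
    State.LOCALLY_REFERENCED: {State.LOCALLY_UNREFERENCED},
    # Assumption: after a denial the owner's object becomes usable again.
    State.REQUEST_DESTROY: {State.DESTROYED, State.LOCALLY_UNREFERENCED},
    State.INACCESSIBLE: {State.DESTROYED, State.LOCALLY_UNREFERENCED},
    State.DESTROYED: set(),
}

OWNER_ONLY = {State.REQUEST_DESTROY}
NONOWNER_ONLY = {State.INACCESSIBLE}


def can_transition(current, target, is_owner):
    """Return True if a copy in `current` may move to `target` in this sketch."""
    if target in OWNER_ONLY and not is_owner:
        return False
    if target in NONOWNER_ONLY and is_owner:
        return False
    return target in TRANSITIONS[current]
```

   In this sketch a referenced copy can never move directly to the inaccessible state, which corresponds to a nonowner process answering a destruction request with a denial.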



   Coordination of the object's state is provided by two reference counting functions. To implement these functions, each copy of the object (including the original object in the owner process) is given: a global_reference_count, which indicates a total number of processes in the system that are interested in the object; and a local_reference_count, which indicates how many references the local process has to the local object. Every time an object's local_reference_count increases from 0 to 1, the object's global_reference_count is increased by 1 to indicate that an additional process is interested in the shared object. Similarly, every time an object's local_reference_count decreases from 1 to 0, the object's global_reference_count is decreased by 1, indicating that a process is no longer interested in the shared object.



   Preferably, global referencing includes more than just a number. For example, the global_reference_count may be determined from a list of addresses (a global_reference_list) of those processes interested in the object, i.e., the number of entries in the address list being the global_reference_count. When a process is interested in an object for the first time (i.e., the object's local_reference_count increases from 0 to 1), the interested process' address is added, in each process, to the local object's global_reference_list and the global_reference_count of the local object is incremented. As a result, each local object includes a list of all processes with at least one reference to the object. When a process releases its final reference to the shared object (i.e., the object's local_reference_count decreases from 1 to 0), the no longer interested process' address is removed, in each process, from the local object's global_reference_list, and the global_reference_count of the local object is decremented.
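   The two reference counting functions may be sketched as follows, by way of example only. The class LocalObject and its method names are hypothetical; message passing between processes is omitted, and the caller is assumed to forward the first-reference and last-reference notifications to the peers.

```python
class LocalObject:
    """One process's copy of a shared object (illustrative sketch only)."""

    def __init__(self, process_address):
        self.process_address = process_address
        self.local_reference_count = 0
        self.global_reference_list = []  # addresses of interested processes

    @property
    def global_reference_count(self):
        # The count is derived from the list, as the text suggests.
        return len(self.global_reference_list)

    def add_local_reference(self):
        """Return True on the 0 -> 1 transition, when peers must be notified."""
        self.local_reference_count += 1
        first = self.local_reference_count == 1
        if first:
            self.record_global_reference(self.process_address)
        return first

    def release_local_reference(self):
        """Return True on the 1 -> 0 transition, when peers must be notified."""
        if self.local_reference_count == 0:
            raise ValueError("no local reference to release")
        self.local_reference_count -= 1
        last = self.local_reference_count == 0
        if last:
            self.remove_global_reference(self.process_address)
        return last

    def record_global_reference(self, address):
        """Apply a first-reference notification for the process at `address`."""
        if address not in self.global_reference_list:
            self.global_reference_list.append(address)

    def remove_global_reference(self, address):
        """Apply a last-reference, failure or removal notification."""
        if address in self.global_reference_list:
            self.global_reference_list.remove(address)
```

   Because only the first and last local references produce notifications, repeated references within one process generate no additional message traffic.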



   In the event of a process failure or removal, the other processes are notified and the failed or removed process' address is removed, in each process, from the local object's global_reference_list if the failed/removed process had a reference to the object. In this manner, the other processes will know if a failed process was interested in the object, which provides a level of fault tolerance in determining whether or not to continue the destruction process (described below in more detail).



   Reference counting is used to decide when to attempt to destroy an object. When the last remaining process interested in an object releases its reference to the object, as above, each global_reference_count of each local object is decreased by 1 to a value of 0. The owner process will be one of those processes so notified. If the owner's local_reference_count is also 0, the owner will then attempt to destroy the shared object. First, the owner's object will enter the request_destroy state. Next, the owner process will send a request_for_object_destruction message to each of the other processes. Upon receipt of this message, each other process will check its local object's local_reference_count to see if the local process is interested in the local object. If a local object's local_reference_count is 0, the local process will respond to the owner process that the shared object can be destroyed and the local object will enter the inaccessible state 109. In this manner, the local_reference_count of each local object is used to confirm destruction of the object. This process will be further described below with regard to the flowcharts of Figs. 3-6.



   Fig. 2 shows a distributed network system 200 including a plurality of processes 202i, where i = 1 to N. Although the network topology is shown as a ring, any topology can be used. The processes can run on various hardware platforms, including but not limited to: Sun Sparc 5, 10 or 20 processors running SunOS or Solaris operating systems; and Intel x86 (or i960) processors running Windows 3.1, Windows '95 or Windows NT (or an embedded operating system in the case of the i960). In the present invention, and in regard to the shared object, one of the processes 202 will be the owner process. The other processes, with regard to the shared object, will be coordinated by the owner process.



   In this embodiment, each of the processes 202 includes at least nine different functions 302-318 relating to the shared object, as shown in Fig. 3. Function 302 either creates the shared object or creates a local copy of the shared object. The first process to create an object is the owner process; all other processes make a local copy of the shared object. Function 304 creates a local reference to the object, i.e., from the particular process 202 to the shared object. Function 306 deletes a local reference to the object. In both functions 304 and 306, all other processes are notified only when a first/last local reference is created or deleted, i.e., only on the transition from zero to one, or one to zero. When each process receives such notification, function 308 records the reference (incrementing the global_reference_count) and function 310 removes the reference (decrementing the global_reference_count).



   As described above, each process includes a function 312 which responds to a request to destroy the object (received from the owner process). Each nonowner process implements function 314 to destroy the local object, and the owner process destroys the original object. Function 316 implements the fault tolerance capability described above, i.e., if it receives a report of the owner process having failed, a new owner process is selected (described in greater detail below). Finally, function 318 operates in each existing process when it receives a report that a new process has been added, as described in more detail below.



   An overview of the operations in the owner process is shown in Fig. 4. In step 402, the owner process creates the original object and notifies the other processes of such creation. In step 404, each of the other processes which receives notification creates a local copy of the shared object, i.e., a local object. This local object, similar to the original object in the owner process, includes a local_reference_count, a global_reference_list, and a global_reference_count. The functions of these attributes have been discussed above.



   In step 406, each process that has a reference to (needs access to) the shared object will notify the other processes and the owner process that it is referencing the object. When the local process removes its last local reference to the local object, the local process will notify each of the other processes that the shared object is not being referenced. While the other processes are operating, either referencing the shared object or not referencing the shared object, the owner process is monitoring, in step 408, whether any of the processes references the object. If there are processes referencing the shared object, then the system returns to step 406. If there are no processes referencing the shared object, as determined by the owner process in step 408, control passes to step 410.



   When the owner process determines that no process, including itself, has a reference to the shared object, there is no need for the shared object to remain in existence. This determination may have a certain time limit, e.g., no process has accessed the object for X seconds before the owner process decides to delete the object. Alternatively, the owner process may look at the maximum number of references to the object up to that point in time and decide that the object will be kept, e.g., if it was heavily referenced. An analysis of probabilistic access patterns for each shared object could be made to determine such parameters. Additionally, a GC (Garbage Collection) thread that runs at a low priority, to 'clean up' unreferenced objects, could be used.
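   One possible form of the time-limit policy mentioned above is sketched below, by way of example only. The function name and its parameters are hypothetical; the idle threshold stands in for the "X seconds" of the text, and how the time of last access is tracked is left open.

```python
def is_destruction_candidate(global_reference_count, local_reference_count,
                             seconds_since_last_access, idle_threshold_seconds):
    """Return True if the owner may start a destruction attempt.

    The object must be unreferenced everywhere the owner knows of, and it
    must have been idle for at least the configured threshold (assumed policy).
    """
    unreferenced = global_reference_count == 0 and local_reference_count == 0
    return unreferenced and seconds_since_last_access >= idle_threshold_seconds
```

   A result of True only nominates the object for destruction; the confirmation exchange with the other processes is still required before the object is actually destroyed.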



   If the owner process were to simply destroy a shared object when it sees that there are no references at a particular point in time, the problem of "crossing" messages could occur (i.e., another process might create a reference to the object while the owner process was trying to destroy it). Therefore, in step 410, the owner process sends a message to each of the other processes asking for permission to delete the shared object. In step 412, the owner process determines whether any other process has a reference to the object. If the owner process receives notice that at least one other process has a reference to its local object, there is no need to hear from the other processes; control passes to step 415 where the owner process notifies the other processes that the object is not being destroyed. If the owner process does not receive such notice, then control passes to step 414, where the owner process confirms that all of the other processes have responded. If not all have responded, then the owner process waits until a response has been received from each of the other processes, or a process responds that a reference exists. If one or more of the other processes were to fail while the owner process was waiting for a response, the owner process would be notified of such a failure and would, of course, not expect a response. Once all responses (from active processes) have been received, control passes to step 416, and the owner process destroys the object. Then, the owner process notifies the other processes that the object has been destroyed (step 418) and each of the other processes destroys its local copy of the object (step 420).
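   The owner's handling of responses in steps 412 through 416 may be sketched as below. This is a simplified, single-threaded illustration: the replies are supplied as a list of events in arrival order, and the names APPROVE, DENY, FAILED and owner_destruction_round are hypothetical.

```python
APPROVE, DENY, FAILED = "approve", "deny", "failed"


def owner_destruction_round(active_peers, replies):
    """Decide the outcome of one destruction request (illustrative sketch).

    active_peers: addresses of the processes the owner is waiting on.
    replies: (peer, reply) events in arrival order; FAILED models the
             failure notification the owner receives for a dead peer.
    Returns "destroyed", "not_destroyed" or "waiting".
    """
    pending = set(active_peers)
    for peer, reply in replies:
        if peer not in pending:
            continue  # ignore duplicates and unknown senders
        if reply == DENY:
            # Step 415: one denial is enough; no need to hear from the rest.
            return "not_destroyed"
        # An approval, or a failure notice, means no response is still owed.
        pending.discard(peer)
    # Step 416 is reached only when every active peer has been accounted for.
    return "destroyed" if not pending else "waiting"
```

   Returning "waiting" models the blocked owner of step 414, which takes no further action on the object until the outstanding responses arrive.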



   Figs. 5a and 5b describe in greater detail the steps performed in regard to managing an object. In step 500, each nonowner process receives notice of the creation of the original object. Each nonowner process then makes a local copy (step 501); the original object is considered to be the local object of the owner process. In each process, at step 502, a determination is made as to whether a local reference to the local object has been created by the local process. If so, control passes to step 504 where the local_reference_count of the local object is incremented. If the local_reference_count equals 1 (step 506), i.e., this is the first reference to this local object by this process, control passes to step 508. Each of the other processes is notified (of this new local reference) by the process sending a message to the other processes (step 508).



   Returning to step 502, if this is not the first local reference to the local object, control passes to step 510 where it is determined whether an existing local reference to the local object has been terminated. If so, control passes to step 512 where the local_reference_count is decremented by 1. If the local_reference_count equals 0 (step 514), i.e., this process has no more references to the local object (last reference terminated), then control passes to step 516, and the other processes are notified that the process is no longer referencing the object.



   If an existing local reference has not been terminated at step 510, then control passes to step 518 where it is determined whether notification has been received that another process has added a first reference to the object. If such notification has been received, control passes to step 520 where the global_reference_count of the local object is incremented by 1. Further, at step 522, the memory address associated with the new global reference is added to the global_reference_list of the local object.



   If a new reference to the object is not received at step 518, control passes to step 524 where it is determined whether notification has been received as to the removal of a last reference to the object by another process. If such notification has been received, then at step 526 the global_reference_count of the local object is decremented by 1. Further, in step 528 the address associated with the removed process is also removed from the global_reference_list of the local object.



   The following describes the steps performed in response to the destroy_object_request message (see step 414 in Fig. 4). When, at step 524, a reference has not been removed or, at step 528, after removing a reference, control passes to step 602 where each process determines whether it has received a request from an owner process as to whether it is possible to destroy an object. If such a request has been received, control then passes to step 604 where it is determined by each process whether the local_reference_count of its local object is equal to 0. If not equal to zero, control passes to step 606 where the process returns a message to the owner process indicating that the shared object should not be destroyed. When the local_reference_count is equal to 0, control passes to step 608 where the process sets the local object to the inaccessible state. A process cannot access the local object when it is in this state. Next, at step 610, the process returns a message to the owner process indicating that, as far as this particular process is concerned, it is permissible to destroy the shared object.



   Returning to step 602, when there is not a pending request from an owner to destroy the object, it is determined (step 612) whether an 'object destroyed' message has been received. If such a message has been received, control passes to step 614 where the local copy of the object is destroyed; otherwise control passes back to step 502.
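   Steps 602 through 614, as executed in a nonowner process, may be sketched as follows. The local object is represented here as a plain dictionary, and the message strings and the function name handle_owner_message are hypothetical stand-ins for whatever message format an implementation would use.

```python
def handle_owner_message(local_object, message):
    """Process one message from the owner (illustrative sketch).

    local_object: dict with keys "local_reference_count" and "state".
    Returns the reply to send to the owner, or None if no reply is due.
    """
    if message == "destroy_object_request":              # step 602
        if local_object["local_reference_count"] != 0:   # step 604
            return "deny"                                # step 606
        local_object["state"] = "inaccessible"           # step 608
        return "approve"                                 # step 610
    if message == "object_destroyed":                    # step 612
        local_object["state"] = "destroyed"              # step 614
    return None
```

   Setting the copy to the inaccessible state before the approval is sent is what prevents the local process from creating a new reference after it has given permission.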



   When the owner process sends a message to all other processes asking if the shared object can be destroyed, the owner is blocked from doing anything further on the object until all other active processes respond to the destruction request. It is clear that this prevents the "crossing" of messages in the distributed system. Specifically, even though the owner process sees that there are no references to the object, it still sends around a request for permission to destroy the object. In this manner, any process which might create a reference to the object, prior to its destruction but after the owner process has determined that there are no references, can prevent the destruction of the object.



   The method of the present invention is also fault tolerant. When, for example, the destroying owner process fails during the destruction process, a new owner from the remaining processes is determined. When the owner of an object dies, the remaining processes decide who the new owner is going to be. The object's new owner is found from a common algorithm run within each process. This algorithm is deterministic, i.e., there is only one possible new owner for an object that has just lost its owner. A new owner is found from the list of processes that are interested in the object. The algorithm guarantees that each process will "agree" on the object's new owner from the list even though the processes need never communicate with one another with regard to choosing the new owner.
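   The specification does not state which deterministic algorithm is used. Purely as an assumed example, each process could select the smallest surviving address in its copy of the list of interested processes; the function elect_new_owner below is hypothetical and shows only that identical lists yield an identical choice without any exchange of messages.

```python
def elect_new_owner(global_reference_list, failed_owner):
    """Pick a new owner from the interested processes (assumed rule: minimum).

    Every process runs this on its own copy of the list. Because the rule
    depends only on the set of addresses, not on their order, all processes
    holding the same entries reach the same answer independently.
    Returns None if no interested process survives.
    """
    candidates = sorted(
        address for address in set(global_reference_list)
        if address != failed_owner
    )
    return candidates[0] if candidates else None
```

   The guarantee relies on every process holding the same list contents at the time of the failure; how that consistency is ensured is outside the scope of this sketch.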



   The new owner will then implement the destruction process. If the new owner's local object is in the inaccessible state, it will send a message to all other processes asking if the object can be destroyed. The normal destruction process proceeds from there. If the new owner's local object is not in the inaccessible state, the new owner process will send a message to all other processes that the object can be accessed again. The receiving processes' local objects will then leave the inaccessible state and go to the locally_unreferenced state.
 If the new owner process did not receive a request for destruction from the previous owner, the new owner will check its local object's global_reference_count. Note that this would now be considered the original shared object. If the global_reference_count is greater than 0, i.e., a process has a reference to the object, the new owner will notify all other processes that the object is still accessible. If the global_reference_count is equal to 0, the new owner will attempt to destroy the object in the manner described above.
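   The recovery decision made by a new owner may be summarized in the following sketch. It reflects one reading of the passage above, in which the check of the inaccessible state applies when a destruction request had been received from the previous owner; the function name, its parameters and the returned action strings are hypothetical.

```python
def new_owner_action(received_destroy_request, local_state, global_reference_count):
    """Choose what a newly selected owner does first (one interpretation).

    Returns "request_destruction" or "announce_accessible".
    """
    if received_destroy_request:
        if local_state == "inaccessible":
            # This process had approved; restart the confirmation round.
            return "request_destruction"
        # This process still uses the object; release the other copies.
        return "announce_accessible"
    # No destruction was seen in progress: fall back to the reference count.
    if global_reference_count > 0:
        return "announce_accessible"
    return "request_destruction"
```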



   Additionally, when a process is added to the distributed system, all other processes are notified. Upon instantiation of a new process, all objects from each of the other processes in the system are synchronized to the new process, i.e., copies of each object owned by the other processes are created in the new process. There are two rules which must be followed in order to preserve the reference counts of the objects: (1) all objects must be received by the new process before any process can attempt to destroy an object; and (2) after all the objects are received, the local_reference_count of each of the local objects needs to be checked and adjusted appropriately. After the local_reference_counts are determined for each of the local objects, the process is synchronized.



   If a new process is added in the middle of the destruction process, the new process will also be asked if the object can be destroyed. Since the object's owner was notified of the addition of the new process, it will include the new process in the list of processes from which it is waiting for a response. The new process will not respond, however, until it has completed instantiation and synchronization. The destroying owner process will thus be blocked from continuing until all processes are finished synchronizing to the new process and the instantiation process has finished adjusting the local_reference_count of each of its local objects.



   This synchronization and instantiation process is needed in the case where an object itself has a reference to another object in the system. For example, during synchronization of processes in the system, all of the objects are streamed to the new process. The global_reference_list is included in the streamed information, but the local_reference_count is not included because a local object's local_reference_count is specific to the process. If one of the objects, however, has a reference to another object, then the referenced object's local_reference_count will have to be updated appropriately. Since the objects can come from different processes and in a random order, the local_reference_count can only be updated after all of the objects have been received. After all objects have been received and the local_reference_count updated, the new process is fully synchronized and the owner process can be unblocked from performing its destruction process.



  Example
 In one example of coordinating the local_reference_count, a table is used to organize the objects. The table entries match an object's name to a list of reference objects that point to the object. The object's name, as well as any attributes of the object, are streamed during synchronization. If any attributes of one object reference another object, then a reference object will be added to the list (corresponding to the object's name in the table). After all objects are received by a new process, the lists of reference objects in the table are each examined. An object's local_reference_count is incremented when a reference object in the list points to the actual object. The object's local_reference_count is increased by 1 for every reference object in the list. After all of the lists of reference objects have been updated, the process is completely synchronized and normal operation can proceed.

 



   Assume a distributed system has 10 objects: A, B, C, D, E, F, G, H, I, and J. The corresponding object names are a, b, c, d, e, f, g, h, i, and j. Then assume: object A has a reference to object B; object B has a reference to both objects C and D; object D has a reference to object E; object E has a reference to objects D and A; objects F and G each have references to object H; object H has two references to object I; and object J has no references to it and none to any other object.



   The diagram below shows the relationship between the objects:
[Diagram EMI12.1: directed graph of the object references listed above]

 Given that the objects can arrive at a new process in any order (and references can be circular), it is impossible to have each local object's references (and reference count) determined when the object arrives. This problem is solved by using a table that maps an object name to the actual object and a list of reference objects (that refer to the named object). Each table entry can be represented as follows:
 object name -> (object value, [object reference 1, 2, ..., n])
Once all objects have arrived, i.e., synchronization is complete, the table is traversed and all reference counts and references are then determined.
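The table entry and the handling of one arriving object might be sketched as follows. This is a minimal illustration, not the implementation of the described system: the names `Entry`, `Ref`, and `receive` are hypothetical, a Python dict stands in for the hashed table, and each streamed object is assumed to carry its own name plus the names of the objects it references.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Ref:
    """Reference object: records that `source` refers to the object named `target`."""
    source: str
    target: str


@dataclass
class Entry:
    """One table row: object name -> (object value, [reference objects])."""
    value: Optional[str] = None              # None plays the role of NULL
    refs: list = field(default_factory=list)


def receive(table, name, value, references):
    """Record an arriving object and one reference object per outgoing reference."""
    # Fill in the value; the row may already exist with a NULL value.
    table.setdefault(name, Entry()).value = value
    for target in references:
        # Create a NULL-valued row if the referenced object has not arrived yet.
        table.setdefault(target, Entry()).refs.append(Ref(name, target))


table = {}
receive(table, "b", "B", ["c", "d"])   # an object B arrives and refers to c and d
```

Because a row is created on first mention, whether by arrival or by reference, the order of arrival does not matter and circular references need no special handling.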



   Assume a synchronization session begins. During this session the objects are received in the following order: B, D, A, C, E, I, J, G, F, H. Before any object arrives the table is empty. Object B is now received. The table is hashed (by object name) and it is determined that b is not in the table. B is added to the table as follows:
Object Name   Object Value   Object Reference List
b             B              []
B has two references, one to object c and one to object d. The table is hashed again (once for c and once for d), and neither c nor d is found. Therefore, row entries for object names c and d are added to the table, each row containing a list with one reference object in it. The object value for c and d will be NULL since C and D have not yet arrived.

   The resulting table is given below:
Object Name   Object Value   Object Reference List
b             B              []
c             NULL           [Ref(b->c)]
d             NULL           [Ref(b->d)]
Object D is received. The table is hashed and d is found to already exist as an Object Name entry; the object D is thus added to Object Value under the d key. D also has a reference to the object named e. The table is hashed and e is not found (under Object Name). The object name e is placed in the table with a NULL object value and an object reference list containing one reference object. The resulting table is given below:
Object Name   Object Value   Object Reference List
b             B              []
c             NULL           [Ref(b->c)]
d             D              [Ref(b->d)]
e             NULL           [Ref(d->e)]
Object A is received. The table is hashed and a is not found. Both the name a and the object A are added to the table.

   The object reference list for a is empty. A has one reference to another object, the object with name b. Object name b is found in the table, so a reference object is added to b's object reference list. The resulting table is given below:  
Object Name   Object Value   Object Reference List
b             B              [Ref(a->b)]
c             NULL           [Ref(b->c)]
d             D              [Ref(b->d)]
e             NULL           [Ref(d->e)]
a             A              []
Object C is received. The table is hashed and c is found. The object C is placed in the table under c's key. C has no references so no reference objects are created. The resulting table is given below:
Object Name   Object Value   Object Reference List
b             B              [Ref(a->b)]
c             C              [Ref(b->c)]
d             D              [Ref(b->d)]
e             NULL           [Ref(d->e)]
a             A              []
Object E is received. The table is hashed and e is found. The object E is placed in the table under e's key. E has one reference to A and one reference to D. Reference objects are placed into both A's and D's reference lists. The table is now:
Object Name   Object Value   Object Reference List
b             B              [Ref(a->b)]
c             C              [Ref(b->c)]
d             D              [Ref(b->d), Ref(e->d)]
e             E              [Ref(d->e)]
a             A              [Ref(e->a)]
Object I is received and the table is now:
Object Name   Object Value   Object Reference List
b             B              [Ref(a->b)]
c             C              [Ref(b->c)]
d             D              [Ref(b->d), Ref(e->d)]
e             E              [Ref(d->e)]
a             A              [Ref(e->a)]
i             I              []
Object J is received and the table is now:

  
Object Name   Object Value   Object Reference List
b             B              [Ref(a->b)]
c             C              [Ref(b->c)]
d             D              [Ref(b->d), Ref(e->d)]
e             E              [Ref(d->e)]
a             A              [Ref(e->a)]
i             I              []
j             J              []
Object G is received and the table is now:
Object Name   Object Value   Object Reference List
b             B              [Ref(a->b)]
c             C              [Ref(b->c)]
d             D              [Ref(b->d), Ref(e->d)]
e             E              [Ref(d->e)]
a             A              [Ref(e->a)]
i             I              []
j             J              []
g             G              []
h             NULL           [Ref(g->h)]
Object F is received and the table is now:
Object Name   Object Value   Object Reference List
b             B              [Ref(a->b)]
c             C              [Ref(b->c)]
d             D              [Ref(b->d), Ref(e->d)]
e             E              [Ref(d->e)]
a             A              [Ref(e->a)]
i             I              []
j             J              []
g             G              []
h             NULL           [Ref(g->h), Ref(f->h)]
f             F              []
Object H is received and the table is now:

    
Object Name   Object Value   Object Reference List
b             B              [Ref(a->b)]
c             C              [Ref(b->c)]
d             D              [Ref(b->d), Ref(e->d)]
e             E              [Ref(d->e)]
a             A              [Ref(e->a)]
i             I              [Ref(h->i), Ref(h->i)]
j             J              []
g             G              []
h             H              [Ref(g->h), Ref(f->h)]
f             F              []
 The process has finished synchronization. Now that all local objects are received, the reference counts and reference objects are determined by walking the table and, for each entry: 1) setting the object's local_reference_count to the number of reference objects in its Object Reference List (third column in table above); and 2) setting the pointer in each reference object to the object value (second column in table above).

   After the table has been walked, it can be cleared.
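The walk over the final table of the example can be sketched as below. The sketch makes several assumptions: `SharedObject`, `Ref`, and `walk` are hypothetical names, the table is written out as a literal rather than built from a stream, and a reference object is reduced to the name of its source plus a pointer that starts out unset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SharedObject:
    name: str
    local_reference_count: int = 0


@dataclass
class Ref:
    source: str
    pointer: Optional[SharedObject] = None   # resolved during the walk


# Final table of the example: object name -> names of the objects referring to it.
referrers = {
    "b": ["a"], "c": ["b"], "d": ["b", "e"], "e": ["d"], "a": ["e"],
    "i": ["h", "h"], "j": [], "g": [], "h": ["g", "f"], "f": [],
}
table = {
    name: (SharedObject(name), [Ref(src) for src in sources])
    for name, sources in referrers.items()
}


def walk(table):
    """Set each count to the length of its reference list and resolve pointers."""
    objects = {}
    for name, (obj, refs) in table.items():
        obj.local_reference_count = len(refs)   # step 1: count the reference objects
        for ref in refs:
            ref.pointer = obj                   # step 2: point each reference at the object
        objects[name] = obj
    table.clear()                               # the table can be cleared after the walk
    return objects


refs_to_d = table["d"][1]                       # kept only to inspect the pointers later
objects = walk(table)
counts = {name: obj.local_reference_count for name, obj in sorted(objects.items())}
```

Under these assumptions, objects D, H, and I end with a local_reference_count of 2, objects A, B, C, and E with 1, and objects F, G, and J with 0.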



   Finally, a detailed example will be given of the operations that occur, within a distributed system having four processes 1-4, when an owner process fails during the destruction process. As illustrated by the timeline/flowchart of Fig. 6, at step 702, owner process 1 creates a shared object and notifies the remaining processes 2-4 of the creation. At steps 704-708, non-owner processes 2-4 each make a local copy of the shared object (local object). At step 710, owner process 1 increments its local_reference_count from 0 to 1 (process 1 now has a reference to the shared object). In steps 712, 714 and 716, process 1 notifies each of processes 2-4, respectively, as to its initial reference to the shared object. In response, processes 2-4 in steps 718, 720, 722, respectively, increment the global_reference_count of the respective local object.

   Not shown, but as discussed above, this step also includes an addition of the address of the reference to the local copy's global_reference_list. At step 724, when process 1 no longer has a reference to the object, its local_reference_count is decremented, in this case, from 1 to 0. At steps 726, 728 and 730, process 1 notifies processes 2-4 as to the removal of the reference. Accordingly, at steps 732, 734 and 736, processes 2-4 decrement the global_reference_count of the respective local object, as well as remove the address (of process 1) from the global_reference_list of the local object. At step 738, owner process 1 determines that the global_reference_count is equal to 0 since, in this example, there are no references to the object from processes 2-4.

   In steps 740, 742 and 744, owner process 1 sends a request to destroy object message to each of processes 2-4. Each of processes 2-4, respectively, in steps 746, 748 and 750, checks the local_reference_count of its local object. Each of processes 2-4 has a local_reference_count equal to 0, and each, at steps 752, 754 and 756, respectively, returns an indication to owner process 1 that it is okay to destroy the shared object.
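The request and reply exchange of steps 738-756 might be sketched as follows. This is a simplified illustration: `LocalObject`, `reply_to_destroy_request`, and `owner_may_destroy` are hypothetical names, and the messages between processes are replaced by direct function calls.

```python
from dataclasses import dataclass


@dataclass
class LocalObject:
    local_reference_count: int = 0
    global_reference_count: int = 0


def reply_to_destroy_request(local: LocalObject) -> str:
    """Non-owner side: approve only if this process holds no local reference."""
    return "approve" if local.local_reference_count == 0 else "deny"


def owner_may_destroy(owner: LocalObject, others: dict) -> bool:
    """Owner side: ask every other process; destroy only on unanimous approval."""
    if owner.local_reference_count != 0 or owner.global_reference_count != 0:
        return False   # the owner does not even send a destruction request
    replies = {pid: reply_to_destroy_request(obj) for pid, obj in others.items()}
    return all(reply == "approve" for reply in replies.values())
```

With three unreferenced local copies, as in the example, every reply is an approval. A single process whose local_reference_count is non-zero is enough to veto the destruction, even though the owner's global_reference_count had reached 0.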



   It is at this point 757, however, that process 1 dies. As a result, owner process 1 cannot complete the operation of destroying the shared object. At steps 758, 760 and 762, each of processes 2-4 loses contact with process 1. At steps 764, 766 and 768, processes 2-4 implement the algorithm as to which process will become the new owner of the object. In this example, process 3 becomes the new owner. Process 3 then determines (step 770) that the global_reference_count of the object, i.e., its local object, is equal to 0. Process 3, at steps 772 and 774, sends a request to destroy object message to processes 2 and 4. Each of these processes then, at steps 776 and 778, checks the local_reference_count of its local object. As above, the local_reference_count is equal to 0 and each of processes 2 and 4, at steps 780 and 782, returns a message that it is okay to destroy the object. Process 3, at step 784, then destroys the original object.

   Further, at steps 786 and 788, process 3 notifies processes 2 and 4 that the object has been destroyed. Each of these processes, at steps 790 and 792, respectively, destroys its local object.
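The recovery sequence of steps 764-792 can be sketched as a small simulation. The function `take_over_and_destroy` and the dictionary layout are hypothetical, and the ownership rule is passed in as a parameter because this passage does not spell it out; the stub used here simply picks process 3 to match the example.

```python
def take_over_and_destroy(copies, elect):
    """copies: process id -> {"local": int, "global": int} for each surviving local copy.
    elect: caller-supplied rule that picks the new owner from the surviving ids."""
    log = []
    new_owner = elect(sorted(copies))                      # steps 764-768
    mine = copies[new_owner]
    if mine["local"] != 0 or mine["global"] != 0:          # step 770
        return new_owner, False, log
    others = [pid for pid in sorted(copies) if pid != new_owner]
    replies = {}
    for pid in others:
        log.append(("request_destroy", new_owner, pid))    # steps 772, 774
        replies[pid] = copies[pid]["local"] == 0           # steps 776-782
    if not all(replies.values()):
        return new_owner, False, log
    del copies[new_owner]                                  # step 784
    for pid in others:
        log.append(("destroyed", new_owner, pid))          # steps 786, 788
        del copies[pid]                                    # steps 790, 792
    return new_owner, True, log


copies = {pid: {"local": 0, "global": 0} for pid in (2, 3, 4)}   # process 1 has died
result = take_over_and_destroy(copies, elect=lambda ids: 3)      # stub: process 3 wins
```

The point of the sketch is that the new owner repeats the whole confirmation round instead of relying on approvals that were sent to the failed owner.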



   Any of the above embodiments may be implemented in a processor such as a general purpose computer 190 as shown in Fig. 7. The computer may include a computer processing unit (CPU) 191, memory 192, a processing bus 193 by which the CPU can access the memory 192, and access a network 194. Alternatively, the invention may be implemented as a memory, such as a floppy disk, compact disc, or hard drive, which contains a computer program or data structure, for providing general purpose computer instructions and data for carrying out the functions of the previous embodiments.



   Throughout the foregoing discussion, the processes have been described as communicating with each other. Interprocess communications (IPC) may take several forms based on the underlying system services, but all are message based (i.e., use message passing). Some system services that may be used for IPCs include:
 shared memory
 pipes
 sockets
Each of these communication mechanisms exhibits slightly different behaviors and uses different semantics. Thus, there must be a message-based isolation layer that sits above the underlying system services. This isolation layer provides a consistent IPC interface to the higher layer functions.
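One way to picture such an isolation layer is an abstract message interface with interchangeable transports. The sketch below is an assumption for illustration only: `MessageChannel`, `InMemoryChannel`, `notify_reference_added`, and the message text are hypothetical, and an in-memory queue stands in for shared memory, pipes, or sockets.

```python
from abc import ABC, abstractmethod
from collections import deque


class MessageChannel(ABC):
    """Hypothetical isolation-layer interface: whole messages in, whole messages out."""

    @abstractmethod
    def send(self, message: bytes) -> None: ...

    @abstractmethod
    def receive(self) -> bytes: ...


class InMemoryChannel(MessageChannel):
    """Stand-in transport; a pipe, socket, or shared-memory version would
    implement the same two methods."""

    def __init__(self):
        self._queue = deque()

    def send(self, message: bytes) -> None:
        self._queue.append(message)

    def receive(self) -> bytes:
        return self._queue.popleft()


def notify_reference_added(channel: MessageChannel, process_id: int) -> None:
    """Higher-layer code depends only on the MessageChannel interface."""
    channel.send(f"REF_ADDED {process_id}".encode())
```

Higher-layer functions written against `MessageChannel` would not need to change when the underlying system service changes.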



   By way of example, TCP sockets (integrated by the TCP Server module) can be used for guaranteed data delivery. UDP sockets can be used for non-critical data delivery and messaging. See D. Comer, "Internetworking with TCP/IP," Vol. 1, 2nd ed. (1991), for a discussion of the TCP and UDP protocols. A TCP Server is a communications entity that uses TCP sockets (stream based, connection oriented, guaranteed data delivery) to emulate UDP sockets (packet based, connectionless, nonguaranteed data delivery). The resulting service is a packet based, connectionless, guaranteed data delivery service that provides several higher layer functions as well.



   The TCP Server delivers data as packets through a connectionless interface. It also provides notification (through callbacks) of server status (contact established/lost), and data received to registered clients.



   Thus, a TCP Server can provide three major services:
 guaranteed data/message delivery (IPCs);
 notification when data is received; and
 notification of server status changes (contact established/lost).
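Packet delivery over a stream connection is commonly achieved by framing each message with a length prefix; the sketch below shows that idea. It is not the TCP Server module itself: the 4-byte header is an assumed wire format, the function names are hypothetical, and `socket.socketpair()` stands in for an established TCP connection so the example runs locally.

```python
import socket
import struct

HEADER = struct.Struct("!I")   # 4-byte big-endian length prefix (assumed wire format)


def send_packet(sock: socket.socket, payload: bytes) -> None:
    """Write one length-prefixed packet to the stream."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def _read_exact(sock: socket.socket, count: int) -> bytes:
    """Read exactly `count` bytes; raise if the peer closes first (contact lost)."""
    chunks = []
    while count > 0:
        chunk = sock.recv(count)
        if not chunk:
            raise ConnectionError("contact lost")
        chunks.append(chunk)
        count -= len(chunk)
    return b"".join(chunks)


def receive_packet(sock: socket.socket, on_data) -> None:
    """Read one whole packet and hand it to the registered callback."""
    (length,) = HEADER.unpack(_read_exact(sock, HEADER.size))
    on_data(_read_exact(sock, length))


# Two connected stream sockets for a local demonstration.
left, right = socket.socketpair()
received = []
send_packet(left, b"destroy-request")
send_packet(left, b"approve")
receive_packet(right, received.append)
receive_packet(right, received.append)
left.close()
right.close()
```

The callback passed to `receive_packet` plays the role of the data-received notification, and the `ConnectionError` raised on a closed stream is where a contact-lost notification could be issued.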



  Alternatively, these functions may be provided by another protocol.



   Having thus described various embodiments of the invention, numerous modifications within the scope of the present invention will occur to those skilled in the art.



  Thus, this description and accompanying drawings are provided by way of example only and are not intended to be limiting. 

Claims

1. A computer-implemented method for coordinating destruction of an object shared amongst a plurality of processes in a distributed system, one process being an owner of the object and each process having a local copy of the object, the method comprising the steps of: (a) the owner process sending a destruction request message to the other processes asking if the object can be destroyed; (b) each of the other processes sending to the owner process one of an approval message approving destruction of the object and a denial message denying destruction of the object; (c) the owner process waiting for an approval message from each other process or a denial message from at least one other process; (d) when all other processes have responded with an approval message, the owner process destroying its local copy of the object and sending a destroyed message reporting said destruction to each other process; and (e) each other process destroying its local copy of the object.
2. The computer-implemented method as recited in claim 1 wherein: when at least one of the other processes responds with the denial message, the owner process sending a message to each other process reporting that the object will not be destroyed.
3. The computer-implemented method as recited in claim 1, wherein each local copy of the object includes a local reference count indicating a number of references the local process has to the object and wherein the step of each other process sending a reply to the owner process includes steps of: checking the respective local reference count of the object; and when the local reference count is not zero, sending the denial message, otherwise sending the approval message.
4. The computer-implemented method as recited in claim 1, wherein each local copy of the object includes a local reference count indicating a number of references the local process has to the object and a global reference count indicating a number of the processes which have at least one reference to the object, wherein the step of the owner process sending the destruction request message includes the steps of: determining the global reference count of the object in the owner process; determining the local reference count of the object in the owner process; and when both the global reference count and the local reference count of the object in the owner process are zero, sending the destruction request message.
5. The computer-implemented method as recited in claim 4, wherein the step of each process sending a reply to the owner process includes steps of: checking the respective local reference count of the object; and when the local reference count is not zero, sending the denial message, otherwise sending the approval message.
6. The computer-implemented method as recited in claim 1, wherein step (c) comprises the step of: when one of the other processes from which the owner process is expecting a response fails, notifying the owner process so the owner process no longer waits for a response from the failed process.
7. The computer-implemented method as recited in claim 1, wherein: when a new process is added to the distributed system, the new process synchronizing with the other processes by creating local copies of all objects owned by the other processes.
8. The computer-implemented method as recited in claim 7, wherein: when the new process is added to the distributed system while the owner process is executing step (c), step (c) comprising the steps of: waiting until the new process has finished the synchronization process; sending the destruction request message to the new process; and waiting for a response from the new process.
9. A computer-implemented method for coordinating a plurality of processes in a distributed system with regard to a shared object, one process being an owner of the object, the method comprising the steps of: the owner process creating the shared object; after the owner process creates the shared object, the owner process sending a shared-object-created message to each other process indicating the shared object has been created; and when each other process receives the shared-object-created message, each other process creating a local copy of the shared object.
10. The computer-implemented method as recited in claim 9, wherein: when the owner process fails, the remaining other processes arbitrating amongst themselves as to which remaining other process will become a new owner process with regard to the shared object.
11. The computer-implemented method as recited in claim 9, wherein the step of the owner process creating the shared object includes the steps of: associating a global reference count with the shared object, the global reference count indicating a total number of processes which have at least one reference to the shared object; and associating a local reference count with the shared object, the local reference count indicating a number of references from the owner process to the shared object.
12. The computer-implemented method as recited in claim 11, wherein the step of each other process creating a local copy of the shared object includes the steps of: associating a global reference count with the local copy of the shared object, the global reference count indicating a total number of processes which have at least one reference to the shared object; and associating a local reference count with the local copy of the shared object, the local reference count indicating a number of references from the other process to the shared object.
13. The computer-implemented method as recited in claim 12, wherein: when the global reference count and the local reference count associated with the shared object in the owner process are each equal to zero, the owner process sending a request-to-destroy message to each of the other processes requesting permission to destroy the shared object.
14. The computer-implemented method as recited in claim 13, further comprising the steps of: when each other process receives the request-to-destroy message, each other process determining a value of the local reference count associated with the respective local copy of the object; and when the respective determined value is equal to zero, the respective other process sending an approval message approving destruction of the object; and when the respective determined value is not equal to zero, the respective other process sending a denial message denying destruction of the object.
15. The computer-implemented method as recited in claim 13, wherein: each other process sending an approval message approving destruction of the object when the other process' local reference count is equal to zero; each other process sending a denial message denying destruction of the object when the other process' local reference count is not equal to zero; and when all other processes have responded with an approval message, the owner process destroying the shared object.
16. A computer-implemented method for managing a shared object amongst a plurality of processes in a distributed system, the method comprising the steps of: an owner process creating a first local copy of an object shared amongst the plurality of processes; and each other process in the plurality of processes creating a respective local copy of the shared object, each local copy of the shared object being initially in a locally unreferenced state.
17. The computer-implemented method as recited in claim 16, wherein: when a process accesses the shared object, the accessing process setting the state of the local copy of the shared object to a locally referenced state; the accessing process sending a locally referenced message to each other process indicating the setting of the local copy to the locally referenced state; and when each other process receives the locally referenced message, the other process adding the accessing process to a reference list.
18. A system comprising: a network interconnecting a plurality of processors, each processor including: means for creating a shared object; means for creating a copy of a shared object created by another processor; and means for determining when another processor has a reference to the shared object.
19. The system as recited in claim 18, wherein the means for creating a shared object includes means for destroying the shared object.
20. The system as recited in claim 19, wherein the means for destroying includes means for confirming that no other process has a reference to the shared object before destroying the object.
PCT/US1997/012689 1996-07-22 1997-07-18 Method and apparatus for coordinated management of a shared object WO1998003914A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU38059/97A AU3805997A (en) 1996-07-22 1997-07-18 Method and apparatus for coordinated management of a shared object

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US08/681,040 1996-07-22
US08/681,040 US6041383A (en) 1996-07-22 1996-07-22 Establishing control of lock token for shared objects upon approval messages from all other processes
US87354997A 1997-06-12 1997-06-12
US08/873,549 1997-06-12

Publications (2)

Publication Number Publication Date
WO1998003914A2 true WO1998003914A2 (en) 1998-01-29
WO1998003914A3 WO1998003914A3 (en) 2002-09-26

Family

ID=27102578

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1997/012689 WO1998003914A2 (en) 1996-07-22 1997-07-18 Method and apparatus for coordinated management of a shared object

Country Status (2)

Country Link
AU (3) AU3805997A (en)
WO (1) WO1998003914A2 (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4809168A (en) * 1986-10-17 1989-02-28 International Business Machines Corporation Passive serialization in a multitasking environment
US5187790A (en) * 1989-06-29 1993-02-16 Digital Equipment Corporation Server impersonation of client processes in an object based computer operating system
GB2263797B (en) * 1992-01-31 1996-04-03 Plessey Telecomm Object orientated system
US5721919A (en) * 1993-06-30 1998-02-24 Microsoft Corporation Method and system for the link tracking of objects

Also Published As

Publication number Publication date
AU3733997A (en) 1998-02-10
WO1998003914A3 (en) 2002-09-26
AU3733897A (en) 1998-02-10
AU3805997A (en) 1998-02-10

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH HU IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG UZ VN YU ZW AM AZ BY KG KZ MD RU TJ TM

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH KE LS MW SD SZ UG ZW AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT

121 Ep: the epo has been informed by wipo that ep was designated in this application
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

NENP Non-entry into the national phase

Ref country code: JP

Ref document number: 98507142

Format of ref document f/p: F

NENP Non-entry into the national phase

Ref country code: CA

122 Ep: pct application non-entry in european phase
AK Designated states

Kind code of ref document: A3

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH HU IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: A3

Designated state(s): GH KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN ML MR NE SN TD TG