WO2004077280A2 - System and method for communications between servers in a cluster - Google Patents

Publication number
WO2004077280A2
WO2004077280A2 PCT/US2004/006215 US2004006215W WO2004077280A2 WO 2004077280 A2 WO2004077280 A2 WO 2004077280A2 US 2004006215 W US2004006215 W US 2004006215W WO 2004077280 A2 WO2004077280 A2 WO 2004077280A2
Authority
WO
WIPO (PCT)
Prior art keywords
server
cluster
advertisement
request
point
Prior art date
Application number
PCT/US2004/006215
Other languages
French (fr)
Other versions
WO2004077280A3 (en
Inventor
Prasad Peddada
Original Assignee
Bea Systems, Inc.
Application filed by Bea Systems, Inc.
Publication of WO2004077280A2
Publication of WO2004077280A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5055 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering software capabilities, i.e. software resources associated or available to the machine
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1095 Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/51 Discovery or management thereof, e.g. service location protocol [SLP] or web services
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99951 File or database maintenance
    • Y10S707/99952 Coherency, e.g. same view to multiple users

Definitions

  • the invention relates generally to application servers and clusters of application servers, and particularly to a system and method for communications between servers in a cluster.
  • a cluster (such as a WebLogic Server cluster) comprises multiple server instances running simultaneously and working together to provide increased scalability and reliability.
  • a cluster appears to clients to be a single server.
  • the server instances that constitute a cluster can run on the same machine, or be located on different machines.
  • a cluster's capacity can be increased by adding additional server instances to the cluster on an existing machine, or by adding machines to the cluster to host the incremental server instances.
  • Each server instance in a cluster must typically run the same version of the server product.
  • a cluster is usually part of a particular server (e.g. WebLogic Server) domain.
  • a domain is an interrelated set of resources that are managed as a unit.
  • a domain includes one or more server instances, which can be clustered, non-clustered, or a combination of clustered and non-clustered instances.
  • a domain can include multiple clusters.
  • a domain also contains the application components deployed in the domain, and the resources and services required by those application components and the server instances in the domain. Examples of the resources and services used by applications and server instances include machine definitions, optional network channels, connectors, startup classes, EJB's, JSPs, etc.
  • An administrator can use a variety of criteria for organizing server instances into domains. For instance, they might choose to allocate resources to multiple domains based on logical divisions of the hosted application, geographical considerations, or the number or complexity of the resources under management.
  • one WebLogic Server instance typically acts as the Administration Server — the server instance which configures, manages, and monitors all other server instances and resources in the domain. If a domain contains multiple clusters, each server in the domain has the same Administration Server.
  • Clustered server instances behave similarly to non-clustered instances, except that they provide failover and load balancing.
  • the process and tools used to configure clustered server instances are the same as those used to configure non-clustered instances.
  • a server cluster provides the following benefits and features:
  • Scalability - The capacity of an application deployed to a cluster can be increased dynamically to meet demand. Server instances can be added to a cluster without interruption of service; the application continues to run without impact to clients and end users.
  • High-Availability - In a cluster, application processing can continue when a server instance fails.
  • Application components are "clustered" by deploying them on multiple server instances in the cluster, so that if a server instance on which a component is running fails, another server instance on which that component is deployed can continue application processing.
  • Failover - Failover means that when an application component (typically referred to as a "service" in the following sections) doing a particular "job" (some set of processing tasks) becomes unavailable for any reason, a copy of the failed service finishes the job. For the new service to be able to take over for the failed service there must be a copy of the failed service available to take over the job. There must also be information, available to other services and the program that manages failover, defining the location and operational status of all services, so that it can be determined that the first service failed before finishing its job.
  • WebLogic Server uses standards-based communication techniques and facilities (multicast, IP sockets, and the Java Naming and Directory Interface, JNDI) to share and maintain information about the availability of services in a cluster. These techniques allow the server to determine that a service stopped before finishing its job, and where there is a copy of the service to complete the job that was interrupted. Information about what has been done on a job is called state. WebLogic Server maintains information about state using techniques called session replication and replica-aware stubs. When a particular service unexpectedly stops doing its job, replication techniques enable a copy of the service to pick up where the failed service stopped, and finish the job.
  • JNDI Java Naming and Directory Interface
  • Load Balancing is the even distribution of jobs and associated communications across the computing and networking resources in the application server environment. For load balancing to occur there must be multiple copies of a service that can do a particular job. Information about the location and operational status of all services must also be available.
  • WebLogic Server allows services to be clustered — deployed on multiple server instances — so that there are alternative services to do the same job. WebLogic Server shares and maintains the availability and location of deployed services using multicast, IP sockets, and JNDI.
  • Cluster members must typically keep in touch with one another to ensure consistency throughout the cluster. This is particularly relevant in keeping track of the various resources, provided by the cluster, including the fact that some resources may be provided by certain cluster members, while other cluster members provide a different set of resources, services, etc.
  • Many application server cluster products, including for example BEA's WebLogic Server product, maintain a cluster-wide JNDI tree or naming service that keeps track of all of the available resources and services in the cluster. Each cluster member in the cluster maintains its own naming service whose view mimics that of the global tree. In this manner, when a client (or server, or any other process) accesses a server in the cluster they get the same set of available resources, which attempts to provide consistency throughout the cluster.
  • each server within the cluster binds its resources to its internal naming service, which is then replicated (advertised) to all of the other cluster members. For example, if a server A is providing a particular service, then information about this server is first bound to server A's naming service (for example its JNDI tree), and from there is replicated to the other servers.
  • the approach used to replicate information from one cluster member or server to another server within the cluster is to multicast the information. Using multicast, information about all of a server's resources, services, etc. is multicast to each other member of the cluster.
  • multicast is an unreliable transport mechanism.
  • the packet of information could be intercepted or dropped along the way, resulting in one server having a different view of the naming service from the view at another server. As such, this impinges on the consistency throughout the cluster.
  • a first approach is for the second (receiving) server to issue a request (for example a NAK request) to the first (sending) server, saying "I missed an update packet - please resend it". In return the second server will be sent the missing update.
  • Another approach is for the first server to send an aggregated view (a statedump) of all of its resources and services to the second server.
  • the statedump describes the aggregate view of the services provided by a server.
  • the invention provides a system and method for communications between servers in a cluster.
  • the system allows for point-to-point messaging to be used in a clustered environment to provide communication of services provided by each server or member of that cluster.
  • Each server or member within the cluster sends out a single advertisement as before. If one of the receiving servers misses an advertisement, i.e. it becomes out-of-sync with the sending server, then the second (receiving) server makes a reliable point-to-point request to the first (sending) server asking for everything it missed.
  • this request is in the form of an http request from the receiving server to the sending server. This process ensures the message buffers are not overflowed, which in turn improves the stability of the cluster. The result is enhanced overall cluster stability and scalability.
  • Figure 1 shows an illustration of an update mechanism between servers in a cluster, which uses multicast messaging.
  • Figure 2 shows an illustration of a cluster-join mechanism between a new server and existing servers in a cluster, which uses multicast messaging.
  • Figure 3 shows an illustration of an update mechanism between servers in a cluster, which uses point-to-point messaging, in accordance with an embodiment of the invention.
  • Figure 4 shows an illustration of a cluster-join mechanism between a new server and existing servers in a cluster, which uses point-to-point messaging, in accordance with an embodiment of the invention.
  • Figure 5 shows a flowchart of a server communication process between a first and second server in a cluster, which uses point-to-point messaging, in accordance with an embodiment of the invention.
  • the invention provides a system and method for communications between servers in a cluster.
  • the invention provides a system for point-to-point messaging that can be used with or within application servers in a clustered environment to provide communication of services provided by each server or member of that cluster.
  • Each server or member within the cluster sends out a single advertisement as before. If one of the receiving servers misses an advertisement, i.e. it becomes out-of-sync with the sending server, then the second (receiving) server makes a reliable point-to-point request to the first (sending) server asking for everything it missed.
  • this request is in the form of an http request from the receiving server to the sending server.
  • the invention provides a useful mechanism for adding new servers as members into the cluster.
  • N requests
  • when a new server joined the cluster, it would have to issue requests (NAKs) to each of the servers already in the cluster, requesting information from each server as to the services it provides. If there are N servers already in the cluster, and each update is M bytes in size, then this would require N x M bytes in data transfer, and it would take a long time for the new server to initialize.
  • a new server need only wait for a few seconds to see who else is in the cluster. The new server can then make an http request to one node or member to retrieve a copy of the services provided by that server. The result is a reduction in the total number and size of messages that need to be transferred to and from the new server.
  • FIG. 1 shows an illustration of an update mechanism 102 between servers in a cluster, which uses multicast messaging.
  • server 1 (104) must update all of the other servers 106, 108 in the cluster (in this example server 2 and server 3) on a regular basis.
  • Each update 110 is typically of the order of 500 bytes to 2k bytes.
  • Updates are sent by multicast to each server in the cluster, and the total number of multicast messages increases proportionally with an increase in cluster size. Since multicast is not a reliable protocol, if an update is missed, each server (in this case server 2 and server 3) which misses the latest update must send a request to server 1 asking it to resend the update.
  • a server may choose to send an aggregated view of the services listed in the naming service or JNDI tree at that server (a statedump).
  • these aggregated views or statedumps are typically of the order of 10k-200k bytes. Communicating such large packets of data also diminishes cluster performance.
  • FIG. 2 shows an illustration of a cluster-join mechanism 120 between a new server and existing servers in a cluster, which uses multicast messaging.
  • the joining server (server 4 (122)) must issue requests 124, 126, 128 to each other member in the cluster (server 1, server 2, and server 3), and then receive information as to the services offered by those servers. This results in a large transfer of data 130, 132, 134 to the joining server and lengthens its initialization time, impacting the cluster stability and performance.
  • FIG. 3 shows an illustration of an update mechanism 140 between servers in a cluster, which uses point-to-point messaging, in accordance with an embodiment of the invention.
  • Figure 3 illustrates how a first server in the cluster i.e. a first cluster member, in this example shown as server 3 (148), can issue a point-to-point request 150 to a second server in the cluster, i.e. a second cluster member, in this example shown as server 1 (144), requesting that the second server communicate an update of its naming service or JNDI tree, and the services defined thereby, to the first server.
  • the first server can then update its own naming service accordingly. Fewer messages need be sent between the cluster members, and of those messages that do need to be sent fewer are of the larger statedump variety. The result is that message buffers are not likely to be overflowed, and the cluster is both more stable and more scalable.
  • the point-to-point request is made using an hypertext transfer protocol (http) request.
  • http hypertext transfer protocol
  • Http is useful because it does not require a communication socket be kept open between the servers - instead, the http socket can be opened, the message sent, and the socket closed. This eliminates the need to maintain additional sockets on the servers, together with the additional overhead and reduced performance that would entail.
  • Each server in the cluster acts independently in this regard, i.e. each server makes its own determination as to whether its naming service is out-of-sync with a particular server. If it determines that it is out of sync, then the server makes its own point-to-point request to the particular server to receive an update and remedy the discrepancy.
  • FIG. 4 shows an illustration of a cluster-join mechanism 160 between a new server and existing servers in a cluster, which uses point-to-point messaging, in accordance with an embodiment of the invention.
  • the joining server server 4 (162)
  • the new server can then make a point-to-point or http request 164 to one node or member to receive 166 a copy or statedump of its naming service.
  • the overall result is a reduction in the number of messages, together with corresponding better cluster stability, and shorter times for new members to be added into the cluster.
  • FIG. 5 shows a flowchart 170 of a server communication process between a first and second server in a cluster, which uses point-to-point messaging, in accordance with an embodiment of the invention.
  • a first step 172 in the process is for a first server in the cluster (i.e. a first cluster member) to determine that its copy of the naming service or JNDI tree is out-of-sync with the services provided by a second server in the cluster (i.e. a second cluster member).
  • the first server issues a point-to-point request (in one embodiment an http request) to the second server, seeking a naming service update from that server.
  • in step 176, the second server packages an update of all of its services and, in step 178, communicates the update to the first server.
  • in step 180, the first server receives the update package and uses it to synchronize its naming service with the services available at the second server.
  • the server should be able to handle http requests for updated information.
  • the server should know how to process the http request/response.
  • the server should recognize an out-of-sync condition, and act accordingly.
  • ExecuteThread import weblogic.kernel.Kernel; import weblogic.protocol.Protocol; import weblogic.protocol.ServerChannel; import weblogic.rmi.spi.HostID; import weblogic.security.acl.internal.AuthenticatedSubject; import weblogic.security.service.PrivilegedActions; import weblogic.security.service.SecurityServiceManager; import weblogic.server.Server; import weblogic.utils.Debug; import weblogic.utils.StringUtils; import weblogic.utils.UnsyncStringBuffer; import weblogic.utils.io.DataIO;
  • senderNum = senderNum; this.
  • srvrAddress = srvrAddress; this.
  • URL url = new URL("http", srvrAddress.getAddress(), srvrAddress.getPort(
  • Server.getDebug().getDebugClusterAnnouncements()) { ClusterLogger.logFetchServerStateDump(srvrAddress.getAddress());
  • HostID finalid = (HostID) srvrAddress; SecurityServiceManager.runAs(kernelId, kernelId, new PrivilegedAction() { public Object run() { finalmsg.execute(finalid); return null; }
  • Server.getDebug().getDebugClusterAnnouncements()) { ClusterLogger.logFailedWhileReceivingStateDump(srvrAddress.toString(), ce);
  • HybridMulticastReceiver package weblogic.cluster; import java.io.DataInputStream; import java.io.IOException; import java.io.OutputStream; import java.net.ProtocolException; import java.net.Socket; import java.security.AccessController; import java.security.PrivilegedAction; import weblogic.common.internal.WLObjectInputStream; import weblogic.kernel.ExecuteRequest; import weblogic.kernel.ExecuteThread; import weblogic.kernel.Kernel; import weblogic.protocol.Protocol; import weblogic.protocol.ServerChannel; import weblogic.rmi.spi.HostID; import weblogic.security.acl.internal.AuthenticatedSubject; import weblogic.security.service.PrivilegedActions; import weblogic.security.service.SecurityServiceManager; import weblogic.server.Server; import weblogic.utils.Debug; import weblogic.utils.StringUtils; import weblogic.utils.UnsyncStringBuffer; import weblogic.utils.io.DataIO;
  • a MulticastReceiver assembles in-coming GroupMessages from a
  • the MulticastReceiver sends NAKs
  • a MulticastSender can be configured to provide "pretty-reliable"
  • HybridMulticastReceiver HostID memberID, int senderNum
  • this memberID, senderNum, Kernel.getDispatchPolicyIndex(Kernel.SYSTEM_DISPATCH)
  • HTTPExecuteRequest request = new HTTPExecuteRequest(srvrAddress, lastSeqNum, senderNum, memberID); Kernel.execute(request, queueIndex); }
  • MulticastSessionDataRecoveryServlet package weblogic.cluster; import java.io.IOException; import java.io.OutputStream; import java.util.ArrayList; import java.util.HashMap; import java.util.Iterator; import javax.servlet.ServletException; import javax.servlet.ServletInputStream; import javax.servlet.http.HttpServlet; import javax.servlet.http.HttpServletRequest; import javax.servlet.http.HttpServletResponse;
  • ClusterDebug.log("SENDER" + sender + " CURRENT SEQ NUM " + sender.getCurrentSeqNum());
  • ClusterDebug.log("WRITING BYTES OF SIZE" + baos.size());
  • ClusterDebug.log("Sending statedump for " + (list.size() + 1) + " servers");
  • the present invention may be conveniently implemented using a conventional general purpose or a specialized digital computer or microprocessor programmed according to the teachings of the present disclosure.
  • Appropriate software coding can readily be prepared by skilled programmers based on the teachings of the present disclosure, as will be apparent to those skilled in the software art.
  • the present invention includes a computer program product which is a storage medium (media) having instructions stored thereon/in which can be used to program a computer to perform any of the processes of the present invention.
  • the storage medium can include, but is not limited to, any type of disk including floppy disks, optical discs, DVD, CD-ROMs, microdrive, and magneto-optical disks, ROMs, RAMs, EPROMs, EEPROMs, DRAMs, VRAMs, flash memory devices, magnetic or optical cards, nanosystems (including molecular memory ICs), or any type of media or device suitable for storing instructions and/or data.
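The final step of the FIG. 5 process listed above (step 180, where the first server synchronizes its naming service from the update it received) can be sketched in a few lines of Java. This is only an illustrative model under stated assumptions: the class `NamingViewSync`, its methods, and the use of plain string service names are hypothetical and are not taken from the WebLogic source; a real naming service would hold JNDI bindings rather than strings.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model of one server's view of the services each member offers. */
final class NamingViewSync {
    private final Map<String, Set<String>> servicesByMember = new HashMap<>();

    /**
     * Applies a statedump received from one member. The statedump is the
     * aggregate view of that member's services, so it replaces the local
     * entry outright instead of being merged into it.
     */
    void applyStateDump(String member, Set<String> stateDump) {
        servicesByMember.put(member, new TreeSet<>(stateDump));
    }

    /** Returns the services currently recorded for a member (empty if unknown). */
    Set<String> servicesOf(String member) {
        return servicesByMember.getOrDefault(member, Set.of());
    }

    public static void main(String[] args) {
        NamingViewSync view = new NamingViewSync();
        // Stale local view of server 1 (names are made up for illustration).
        view.applyStateDump("server1", Set.of("jms/OrderQueue", "ejb/Accounts"));
        // A fresh statedump arrives: one service was removed, one was added.
        view.applyStateDump("server1", Set.of("ejb/Accounts", "ejb/Billing"));
        System.out.println(view.servicesOf("server1"));
    }
}
```

Replacing the entry, instead of merging, is the design choice that lets a single statedump repair any number of missed advertisements, including ones that withdrew a service.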

Abstract

A system and method for communications between servers in a cluster. The system allows for point-to-point messaging to be used in a clustered environment to provide communication of services provided by each server or member of that cluster. Each server or member within the cluster advertises its services as before. If one of the receiving servers misses an advertisement, i.e. it becomes out-of-sync with the sending server, then the second (receiving) server (148) makes a reliable point-to-point request (150) to the first (sending) server (144) asking for the missed services.

Description

SYSTEM AND METHOD FOR COMMUNICATIONS BETWEEN SERVERS IN A CLUSTER
COPYRIGHT NOTICE
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
Claim of Priority:
[0001] This application claims priority to U.S. Provisional Patent Application 60/450,294, filed February 27, 2003, entitled "SYSTEM AND METHOD FOR COMMUNICATIONS BETWEEN SERVERS IN A CLUSTER" (Atty. Docket No. BEAS-01324US0), which is incorporated herein by reference.
Field of the Invention:
[0002] The invention relates generally to application servers and clusters of application servers, and particularly to a system and method for communications between servers in a cluster.
Background:
[0003] In the field of application servers and distributed systems, clusters of servers are often used to provide highly available and scalable resources. One example of an application server is the WebLogic Server from BEA Systems, Inc. A cluster (such as a WebLogic Server cluster) comprises multiple server instances running simultaneously and working together to provide increased scalability and reliability. A cluster appears to clients to be a single server. The server instances that constitute a cluster can run on the same machine, or be located on different machines. A cluster's capacity can be increased by adding additional server instances to the cluster on an existing machine, or by adding machines to the cluster to host the incremental server instances. Each server instance in a cluster must typically run the same version of the server product.
[0004] In terms of how a cluster relates to the environment in which the application server exists, a cluster is usually part of a particular server (e.g. WebLogic Server) domain. A domain is an interrelated set of resources that are managed as a unit. A domain includes one or more server instances, which can be clustered, non-clustered, or a combination of clustered and non-clustered instances. A domain can include multiple clusters. A domain also contains the application components deployed in the domain, and the resources and services required by those application components and the server instances in the domain. Examples of the resources and services used by applications and server instances include machine definitions, optional network channels, connectors, startup classes, EJB's, JSPs, etc. An administrator can use a variety of criteria for organizing server instances into domains. For instance, they might choose to allocate resources to multiple domains based on logical divisions of the hosted application, geographical considerations, or the number or complexity of the resources under management.
[0005] In a WebLogic domain, one WebLogic Server instance typically acts as the Administration Server — the server instance which configures, manages, and monitors all other server instances and resources in the domain. If a domain contains multiple clusters, each server in the domain has the same Administration Server.
[0006] Clustered server instances behave similarly to non-clustered instances, except that they provide failover and load balancing. The process and tools used to configure clustered server instances are the same as those used to configure non-clustered instances. A server cluster provides the following benefits and features:
Scalability - The capacity of an application deployed to a cluster can be increased dynamically to meet demand. Server instances can be added to a cluster without interruption of service — the application continues to run without impact to clients and end users.
High-Availability- In a cluster, application processing can continue when a server instance fails. Application components are "clustered" by deploying them on multiple server instances in the cluster — so, if a server instance on which a component is running fails, another server instance on which that component is deployed can continue application processing.
Failover - Failover means that when an application component (typically referred to as a "service" in the following sections) doing a particular "job" (some set of processing tasks) becomes unavailable for any reason, a copy of the failed service finishes the job. For the new service to be able to take over for the failed service there must be a copy of the failed service available to take over the job. There must also be information, available to other services and the program that manages failover, defining the location and operational status of all services, so that it can be determined that the first service failed before finishing its job. There must also be information, available to other services and the program that manages failover, about the progress of jobs in process, so that a service taking over an interrupted job knows how much of the job was completed before the first service failed, for example, what data has been changed, and what steps in the process were completed. Many application servers, including WebLogic Server, use standards-based communication techniques and facilities (multicast, IP sockets, and the Java Naming and Directory Interface, JNDI) to share and maintain information about the availability of services in a cluster. These techniques allow the server to determine that a service stopped before finishing its job, and where there is a copy of the service to complete the job that was interrupted. Information about what has been done on a job is called state. WebLogic Server maintains information about state using techniques called session replication and replica-aware stubs. When a particular service unexpectedly stops doing its job, replication techniques enable a copy of the service to pick up where the failed service stopped, and finish the job.
Load Balancing - Load balancing is the even distribution of jobs and associated communications across the computing and networking resources in the application server environment. For load balancing to occur there must be multiple copies of a service that can do a particular job. Information about the location and operational status of all services must also be available. In addition, WebLogic Server allows services to be clustered (deployed on multiple server instances) so that there are alternative services to do the same job. WebLogic Server shares and maintains the availability and location of deployed services using multicast, IP sockets, and JNDI.
[0007] Cluster members must typically keep in touch with one another to ensure consistency throughout the cluster. This is particularly relevant in keeping track of the various resources provided by the cluster, including the fact that some resources may be provided by certain cluster members, while other cluster members provide a different set of resources, services, etc. Many application server cluster products, including for example BEA's WebLogic Server product, maintain a cluster-wide JNDI tree or naming service that keeps track of all of the available resources and services in the cluster. Each cluster member in the cluster maintains its own naming service whose view mimics that of the global tree. In this manner, when a client (or server, or any other process) accesses a server in the cluster they get the same set of available resources, which attempts to provide consistency throughout the cluster. During normal use each server within the cluster binds its resources to its internal naming service, which is then replicated (advertised) to all of the other cluster members. For example, if a server A is providing a particular service, then information about this server is first bound to server A's naming service (for example its JNDI tree), and from there is replicated to the other servers.
[0008] Typically, the approach used to replicate information from one cluster member or server to another server within the cluster is to multicast the information. Using multicast, information about all of a server's resources, services, etc. is multicast to each other member of the cluster. However, multicast is an unreliable transport mechanism. The packet of information could be intercepted or dropped along the way, resulting in one server having a different view of the naming service from the view at another server. As such, this impinges on the consistency throughout the cluster.
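One common way for a receiver to notice that a multicast advertisement was dropped is to have each sender number its advertisements and to watch for gaps. The sketch below illustrates that idea only; it assumes per-sender sequence numbers, and the class `AdvertisementTracker` and its method are hypothetical names, not the mechanism disclosed in this document.

```java
import java.util.HashMap;
import java.util.Map;

/** Tracks the last advertisement sequence number seen from each sender. */
final class AdvertisementTracker {
    private final Map<String, Long> lastSeqBySender = new HashMap<>();

    /**
     * Records an incoming advertisement and reports whether one or more
     * earlier advertisements from the same sender were never received.
     */
    boolean onAdvertisement(String sender, long seqNum) {
        Long last = lastSeqBySender.get(sender);
        if (last != null && seqNum <= last) {
            return false; // duplicate or out-of-order packet: nothing new
        }
        lastSeqBySender.put(sender, seqNum);
        // The first packet from a sender is treated as a baseline, not a gap.
        return last != null && seqNum > last + 1;
    }

    public static void main(String[] args) {
        AdvertisementTracker tracker = new AdvertisementTracker();
        long[] received = {1, 2, 5, 6}; // 3 and 4 were dropped in transit
        for (long seq : received) {
            boolean missed = tracker.onAdvertisement("serverA", seq);
            System.out.println("seq " + seq + (missed ? ": gap detected" : ": in order"));
        }
    }
}
```

A `true` result is the point at which the receiver knows its view of the sender's naming service may be out of date and some recovery action is needed.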
[0009] Traditionally, there are two primary methods to make the multicast process more reliable at a higher level. A first approach is for the second (receiving) server to issue a request (for example a NAK request) to the first (sending) server, saying "I missed an update packet - please resend it". In return the second server is sent the missing update. Another approach is for the first server to send an aggregated view (a statedump) of all of its resources and services to the second server. The statedump describes the aggregate view of the services provided by a server. Large packets of multicast messages exchanged between servers in a cluster can potentially destabilize the cluster. Frequent resend requests for service advertisements can quickly overflow the operating system message buffers, causing stability problems. As the number of services provided by a server increases, so does the size of the statedump. Coupled with the increasing size of a cluster, this could lead to longer startup times and a longer time for each server to stabilize in the cluster. The need to send frequent large multicast messages also impacts cluster scalability and performance.
Summary:
[0010] The invention provides a system and method for communications between servers in a cluster. The system allows for point-to-point messaging to be used in a clustered environment to provide communication of services provided by each server or member of that cluster. Each server or member within the cluster sends out a single advertisement as before. If one of the receiving servers misses an advertisement, i.e. it becomes out-of-sync with the sending server, then the second (receiving) server makes a reliable point-to-point request to the first (sending) server asking for everything it missed. In accordance with one embodiment this request is in the form of an http request from the receiving server to the sending server. This process ensures the message buffers are not overflowed, which in turn improves the stability of the cluster. The result is enhanced overall cluster stability and scalability.
Brief Description of the Figures:
[0011] Figure 1 shows an illustration of an update mechanism between servers in a cluster, which uses multicast messaging.
[0012] Figure 2 shows an illustration of a cluster-join mechanism between a new server and existing servers in a cluster, which uses multicast messaging.
[0013] Figure 3 shows an illustration of an update mechanism between servers in a cluster, which uses point-to-point messaging, in accordance with an embodiment of the invention.
[0014] Figure 4 shows an illustration of a cluster-join mechanism between a new server and existing servers in a cluster, which uses point-to-point messaging, in accordance with an embodiment of the invention.
[0015] Figure 5 shows a flowchart of a server communication process between a first and second server in a cluster, which uses point-to-point messaging, in accordance with an embodiment of the invention.
Detailed Description:
[0016] As disclosed herein, an embodiment of the present invention provides a system and method for communications between servers in a cluster. Generally described, the invention provides a system for point-to-point messaging that can be used with or within application servers in a clustered environment to provide communication of services provided by each server or member of that cluster. Each server or member within the cluster sends out a single advertisement as before. If one of the receiving servers misses an advertisement, i.e. it becomes out-of-sync with the sending server, then the second (receiving) server makes a reliable point-to-point request to the first (sending) server asking for everything it missed. In accordance with one embodiment this request is in the form of an http request from the receiving server to the sending server. This process ensures the message buffers are not overflowed. The result is enhanced overall cluster stability and scalability.
[0017] In addition, the invention provides a useful mechanism for adding new servers as members into the cluster. In the past, when a new server joined the cluster, it would have to issue requests (NAKs) to each of the servers already in the cluster requesting information from each server as to the services it provides. If there are N servers already in the cluster, and each update is M bytes in size, then this would require N x M bytes in data transfer, and the new server would take a long time to initialize. Using the present invention, a new server need only wait for a few seconds to see who else is in the cluster. The new server can then make an http request to one node or member to retrieve a copy of the services provided by that server. The result is a reduction in the total number and size of messages that need to be transferred to and from the new server.
[0018] Figure 1 shows an illustration of an update mechanism 102 between servers in a cluster, which uses multicast messaging. As shown in Figure 1, server 1 (104) must update all of the other servers 106, 108 in the cluster (in this example server 2 and server 3) on a regular basis. Each update 110 is typically of the order of 500 bytes to 2k bytes. Updates are sent by multicast to each server in the cluster, and the total number of multicast messages increases proportionally with an increase in cluster size. Since multicast is not a reliable protocol, if an update is missed, each server (in this case server 2 and server 3) which misses the latest update must send a request to server 1 asking it to resend the update. A problem arises when the servers' message buffers begin to overflow, with a resulting loss in cluster performance. A server may choose to send an aggregated view of the services listed in the naming service or JNDI tree at that server (a statedump). However, these aggregated views or statedumps are typically of the order of 10k-200k bytes. Communicating such large packets of data also diminishes cluster performance.
[0019] Figure 2 shows an illustration of a cluster-join mechanism 120 between a new server and existing servers in a cluster, which uses multicast messaging. As shown in Figure 2, the joining server (server 4 (122)) must issue requests 124, 126, 128 to each other member in the cluster (server 1, server 2, and server 3), and then receive information as to the services offered by those servers. This results in a large transfer of data 130, 132, 134 to the joining server, and lengthens the initialization time, impacting the cluster stability and performance.
[0020] Figure 3 shows an illustration of an update mechanism 140 between servers in a cluster, which uses point-to-point messaging, in accordance with an embodiment of the invention. Figure 3 illustrates how a first server in the cluster i.e. a first cluster member, in this example shown as server 3 (148), can issue a point-to-point request 150 to a second server in the cluster, i.e. a second cluster member, in this example shown as server 1 (144), requesting that the second server communicate an update of its naming service or JNDI tree, and the services defined thereby, to the first server. The first server can then update its own naming service accordingly. Fewer messages need be sent between the cluster members, and of those messages that do need to be sent fewer are of the larger statedump variety. The result is that message buffers are not likely to be overflowed, and the cluster is both more stable and more scalable.
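As a rough illustration of the receiving side of Figure 3, the following Java sketch shows one way a cluster member might notice that it has missed an advertisement and decide to issue a single point-to-point request. It assumes that advertisements carry increasing sequence numbers, as the code listings later in this document suggest. The class and method names (`SyncTracker`, `onAdvertisement`, `onFetchComplete`) are hypothetical and are not taken from the WebLogic code.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: tracks the last advertisement applied from one sender
// and decides when a point-to-point catch-up request is needed.
final class SyncTracker {
  private long lastApplied;   // sequence number of the last applied advertisement
  private final AtomicBoolean fetchInFlight = new AtomicBoolean(false);

  SyncTracker(long lastApplied) {
    this.lastApplied = lastApplied;
  }

  // Called for each multicast advertisement. Returns true when the caller
  // should issue a point-to-point request to the sender.
  synchronized boolean onAdvertisement(long seqNum) {
    if (seqNum <= lastApplied) {
      return false;                       // duplicate or stale: ignore
    }
    if (seqNum == lastApplied + 1) {
      lastApplied = seqNum;               // in order: apply directly
      return false;
    }
    // Gap detected: ask for an update unless a request is already outstanding.
    return fetchInFlight.compareAndSet(false, true);
  }

  // Called once the point-to-point response has been applied.
  synchronized void onFetchComplete(long senderSeqNum) {
    lastApplied = Math.max(lastApplied, senderSeqNum);
    fetchInFlight.set(false);
  }

  synchronized long lastApplied() {
    return lastApplied;
  }
}

final class SyncTrackerDemo {
  public static void main(String[] args) {
    SyncTracker tracker = new SyncTracker(0);
    System.out.println(tracker.onAdvertisement(1)); // false: arrived in order
    System.out.println(tracker.onAdvertisement(4)); // true: 2 and 3 were missed
    System.out.println(tracker.onAdvertisement(5)); // false: request already pending
    tracker.onFetchComplete(5);
    System.out.println(tracker.lastApplied());      // 5
  }
}
```

Suppressing a second request while one is pending is what keeps a burst of missed advertisements from turning into a burst of resend requests.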
[0021] In one embodiment the point-to-point request is made using a hypertext transfer protocol (http) request. Http is useful because it does not require a communication socket be kept open between the servers - instead, the http socket can be opened, the message sent, and the socket closed. This eliminates the need to maintain additional sockets on the servers, together with the additional overhead and reduced performance that would entail.
[0022] Each server in the cluster acts independently in this regard, i.e. each server makes its own determination as to whether its naming service is out-of-sync with a particular server. If it determines that it is out of sync, then the server makes its own point-to-point request to the particular server to receive an update and remedy the discrepancy.
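The open, send, close pattern described in paragraph [0021] can be sketched with the standard `java.net.HttpURLConnection` class. This is a minimal sketch under stated assumptions, not the WebLogic implementation: the `/update` path, the `lastSeqNum` value, and the response body are hypothetical, and the demo uses the JDK's built-in `com.sun.net.httpserver.HttpServer` only as a throwaway stand-in for the sending server.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

final class OneShotFetch {

  // Opens a connection, reads the whole response, and disconnects again,
  // so no socket is held open between requests.
  static byte[] fetch(String url) {
    HttpURLConnection con = null;
    try {
      con = (HttpURLConnection) URI.create(url).toURL().openConnection();
      con.setRequestMethod("GET");
      con.setConnectTimeout(5000);
      con.setReadTimeout(5000);
      if (con.getResponseCode() != HttpURLConnection.HTTP_OK) {
        throw new IOException("unexpected status " + con.getResponseCode());
      }
      try (InputStream in = con.getInputStream()) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
          buf.write(chunk, 0, n);
        }
        return buf.toByteArray();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } finally {
      if (con != null) {
        con.disconnect();
      }
    }
  }

  // Demo: serve a made-up update from a local server and fetch it once.
  static String demo() {
    try {
      HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
      byte[] body = "services=jms,ejb".getBytes(StandardCharsets.UTF_8);
      server.createContext("/update", exchange -> {
        exchange.sendResponseHeaders(200, body.length);
        try (OutputStream os = exchange.getResponseBody()) {
          os.write(body);
        }
      });
      server.start();
      try {
        int port = server.getAddress().getPort();
        byte[] response = fetch("http://127.0.0.1:" + port + "/update?lastSeqNum=41");
        return new String(response, StandardCharsets.UTF_8);
      } finally {
        server.stop(0);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```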
[0023] Figure 4 shows an illustration of a cluster-join mechanism 160 between a new server and existing servers in a cluster, which uses point-to-point messaging, in accordance with an embodiment of the invention. As shown in Figure 4, the joining server (server 4 (162)) need only wait for a few seconds to see who else is in the cluster. The new server can then make a point-to-point or http request 164 to one node or member to receive 166 a copy or statedump of its naming service. There is no need to communicate directly with any of the other servers. The overall result is a reduction in the number of messages, together with correspondingly better cluster stability, and shorter times for new members to be added into the cluster.
[0024] Figure 5 shows a flowchart 170 of a server communication process between a first and second server in a cluster, which uses point-to-point messaging, in accordance with an embodiment of the invention. As listed in Figure 5, a first step 172 in the process is for a first server in the cluster (i.e. a first cluster member) to determine that its copy of the naming service or JNDI tree is out-of-sync with the services provided by a second server in the cluster (i.e. a second cluster member). In step 174, the first server issues a point-to-point request (in one embodiment an http request) to the second server, seeking a naming service update from that server. In step 176, the second server packages an update of all of its services and, in step 178, communicates the update to the first server. In step 180, the first server receives the update package and uses it to synchronize its naming service with the services available at the second server.
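The five steps of Figure 5 can be sketched as a small in-memory model. This is an illustrative sketch only: a direct method call stands in for the http request, a sorted map stands in for the naming service or JNDI tree, and the `Member` and `Update` classes, their fields, and the service names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical snapshot of everything one server offers (the update package).
final class Update {
  final String sender;
  final long seqNum;
  final List<String> services;

  Update(String sender, long seqNum, List<String> services) {
    this.sender = sender;
    this.seqNum = seqNum;
    this.services = services;
  }
}

// Hypothetical in-memory cluster member.
final class Member {
  private final String name;
  private final Map<String, String> naming = new TreeMap<>();  // service -> providing server
  private final Map<String, Long> lastSeen = new HashMap<>();  // peer -> last sequence number applied
  private long seqNum;                                         // bumped on every local bind

  Member(String name) {
    this.name = name;
  }

  void bind(String service) {
    naming.put(service, name);
    seqNum++;
  }

  // Step 172: compare what was last applied with what the peer now advertises.
  boolean outOfSyncWith(Member peer) {
    return lastSeen.getOrDefault(peer.name, 0L) < peer.seqNum;
  }

  // Steps 176 and 178: package all locally provided services.
  Update packageUpdate() {
    List<String> own = new ArrayList<>();
    for (Map.Entry<String, String> e : naming.entrySet()) {
      if (e.getValue().equals(name)) {
        own.add(e.getKey());
      }
    }
    return new Update(name, seqNum, own);
  }

  // Step 180: replace the local view of the sender's services with the update.
  void apply(Update u) {
    naming.values().removeIf(u.sender::equals);
    for (String service : u.services) {
      naming.put(service, u.sender);
    }
    lastSeen.put(u.sender, u.seqNum);
  }

  // Steps 172 to 180 end to end; returns true if an update was fetched.
  boolean syncWith(Member peer) {
    if (!outOfSyncWith(peer)) {
      return false;
    }
    apply(peer.packageUpdate());   // step 174: the point-to-point request
    return true;
  }

  Map<String, String> view() {
    return Collections.unmodifiableMap(naming);
  }
}

final class Figure5Demo {
  public static void main(String[] args) {
    Member server1 = new Member("server1");
    server1.bind("jms/OrderQueue");
    server1.bind("ejb/Inventory");
    Member server3 = new Member("server3");
    System.out.println(server3.syncWith(server1));  // true: an update was fetched
    System.out.println(server3.view());
    System.out.println(server3.syncWith(server1));  // false: already in sync
  }
}
```

A joining server as in Figure 4 would follow the same path, since a new member has applied nothing yet and is therefore out of sync with every existing member.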
Code Implementation
[0025] The following code segments illustrate, by way of example, how the system can be provided to communicate information between cluster members in a point-to-point fashion, in accordance with one embodiment of the invention. It will be evident that additional implementations may be developed within the spirit and scope of the invention, and that the invention is not limited to the examples shown. The key points to note are that any system, application server, or server that incorporates or utilizes the invention should support the following features:
The server should be able to handle http requests for updated information.
The server should know how to process the http request/response.
The server should recognize an out-of-sync condition, and act accordingly.
[0026] HTTPExecuteRequest.java

package weblogic.cluster;

import java.io.DataInputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.ConnectException;
import java.net.HttpURLConnection;
import java.net.ProtocolException;
import java.net.URL;
import java.security.AccessController;
import java.security.PrivilegedAction;

import weblogic.common.internal.WLObjectInputStream;
import weblogic.kernel.ExecuteRequest;
import weblogic.kernel.ExecuteThread;
import weblogic.kernel.Kernel;
import weblogic.protocol.Protocol;
import weblogic.protocol.ServerChannel;
import weblogic.rmi.spi.HostID;
import weblogic.security.acl.internal.AuthenticatedSubject;
import weblogic.security.service.PrivilegedActions;
import weblogic.security.service.SecurityServiceManager;
import weblogic.server.Server;
import weblogic.utils.Debug;
import weblogic.utils.StringUtils;
import weblogic.utils.UnsyncStringBuffer;
import weblogic.utils.io.DataIO;

/* package */ final class HTTPExecuteRequest implements ExecuteRequest {

  private HttpURLConnection con;
  private DataInputStream in;
  private final String request;
  private final ServerChannel srvrAddress;
  private final int senderNum;
  private final HostID memberID;

  private static AuthenticatedSubject kernelId = (AuthenticatedSubject)
    AccessController.doPrivileged(PrivilegedActions.getKernelIdentityAction());

  private static final boolean DEBUG = false;

  public HTTPExecuteRequest(
    ServerChannel srvrAddress,
    long lastSeqNum,
    int senderNum,
    HostID memberID
  ) {
    this.senderNum = senderNum;
    this.srvrAddress = srvrAddress;
    this.request = getHeader(srvrAddress, lastSeqNum);
    this.memberID = memberID;
  }

  // Opens a short-lived http connection to the sending server.
  private void connect() throws ConnectException, IOException {
    URL url = new URL(
      "http",
      srvrAddress.getAddress(),
      srvrAddress.getPort(Protocol.PROTOCOL_HTTP),
      request
    );
    con = (HttpURLConnection) url.openConnection();
    con.setDoInput(true);
    con.connect();
    in = new DataInputStream(con.getInputStream());
  }

  public void execute(ExecuteThread thread) {
    if (DEBUG) {
      ClusterDebug.log("Request " + request + " to " + srvrAddress);
    }
    try {
      if (ClusterDebug.DEBUG &&
          Server.getDebug().getDebugClusterAnnouncements()) {
        ClusterLogger.logFetchServerStateDump(srvrAddress.getAddress());
      }
      connect();
      if (con.getResponseCode() != 200)
        throw new IOException("Failed to get OK response");
      if (DEBUG) {
        ClusterDebug.log("GOT CONTENT LENGTH " + con.getContentLength());
      }
      // The response carries, in order: the sender's attributes,
      // a recovery message, and the sender's current sequence number.
      byte[] b = readHttpResponse(in, con.getContentLength());
      WLObjectInputStream ois = MulticastManager.getInputStream(b);
      final MemberAttributes attributes =
        (MemberAttributes) ois.readObject();
      processAttributes(attributes);
      final GroupMessage finalmsg = (GroupMessage) ois.readObject();
      long currentSeqNum = ois.readLong();
      // FIXME andyp 1-Aug-02 -- identity and addressing are different
      final HostID finalid = (HostID) srvrAddress;
      SecurityServiceManager.runAs(kernelId, kernelId,
        new PrivilegedAction() {
          public Object run() {
            finalmsg.execute(finalid);
            return null;
          }
        });
    } catch (ConnectException ce) {
      if (ClusterDebug.DEBUG &&
          Server.getDebug().getDebugClusterAnnouncements()) {
        ClusterLogger.logFailedWhileReceivingStateDump(
          srvrAddress.toString(), ce);
      }
    } catch (IOException ioe) {
      ClusterLogger.logFailedWhileReceivingStateDump(
        srvrAddress.toString(), ioe);
    } catch (ClassNotFoundException cnfe) {
      ClusterLogger.logFailedToDeserializeStateDump(
        srvrAddress.toString(), cnfe);
    } finally {
      try {
        if (in != null) in.close();
      } catch (IOException ioe) { /* ignore */ }
      if (con != null) con.disconnect();
      // Allow the receiver to dispatch another request if it is still out of sync.
      resetHTTPRequestDispatchFlag();
    }
  }

  private void resetHTTPRequestDispatchFlag() {
    RemoteMemberInfo info = MemberManager.theOne().findOrCreate(memberID);
    HybridMulticastReceiver receiver = (HybridMulticastReceiver)
      info.findOrCreateReceiver(senderNum, true);
    receiver.setHttpRequestDispatched(false);
    MemberManager.theOne().done(info);
  }

  // Builds the request path and query string sent to the remote server.
  private String getHeader(ServerChannel address, long lastSeqNum) {
    UnsyncStringBuffer sb = new UnsyncStringBuffer();
    sb.append("/bea_wls_internal/psquare/p2.jsp?senderNum=");
    sb.append(senderNum);
    sb.append("&lastSeqNum=");
    sb.append(lastSeqNum);
    sb.append(" ");
    return sb.toString();
  }

  private byte[] readHttpResponse(DataInputStream is, int contentLength)
    throws IOException, ProtocolException
  {
    byte[] b = new byte[contentLength];
    DataIO.readFully(is, b);
    return b;
  }

  private void processAttributes(MemberAttributes attributes) {
    RemoteMemberInfo info = MemberManager.theOne().findOrCreate(
      attributes.identity());
    info.processAttributes(attributes);
    MemberManager.theOne().done(info);
  }
}
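The listing above reads exactly the advertised number of content bytes before deserializing them. `DataIO.readFully` is a WebLogic-internal helper whose implementation is not shown in this document, so the sketch below uses the standard `java.io.DataInputStream.readFully` as a stand-in to show why a single `read` call is not enough: a stream may hand back fewer bytes than requested. The `FixedLengthRead` class and its `trickle` helper are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class FixedLengthRead {

  // A stream that returns at most 3 bytes per read call, imitating a
  // network stream that delivers a response in several small pieces.
  static ByteArrayInputStream trickle(byte[] data) {
    return new ByteArrayInputStream(data) {
      @Override
      public synchronized int read(byte[] b, int off, int len) {
        return super.read(b, off, Math.min(len, 3));
      }
    };
  }

  // Reads exactly contentLength bytes, looping over short reads;
  // fails if the stream ends early.
  static byte[] readBody(InputStream in, int contentLength) {
    if (contentLength < 0) {
      throw new IllegalArgumentException("content length unknown");
    }
    byte[] body = new byte[contentLength];
    try {
      new DataInputStream(in).readFully(body);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return body;
  }
}
```

A single `trickle(data).read(buffer, 0, 10)` call returns only 3 bytes here, while `readBody(trickle(data), 10)` returns all 10.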
[0027] HybridMulticastReceiver

package weblogic.cluster;

import java.io.DataInputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.ProtocolException;
import java.net.Socket;
import java.security.AccessController;
import java.security.PrivilegedAction;

import weblogic.common.internal.WLObjectInputStream;
import weblogic.kernel.ExecuteRequest;
import weblogic.kernel.ExecuteThread;
import weblogic.kernel.Kernel;
import weblogic.protocol.Protocol;
import weblogic.protocol.ServerChannel;
import weblogic.rmi.spi.HostID;
import weblogic.security.acl.internal.AuthenticatedSubject;
import weblogic.security.service.PrivilegedActions;
import weblogic.security.service.SecurityServiceManager;
import weblogic.server.Server;
import weblogic.utils.Debug;
import weblogic.utils.StringUtils;
import weblogic.utils.UnsyncStringBuffer;
import weblogic.utils.io.DataIO;

/**
 * A MulticastReceiver assembles in-coming GroupMessages from a
 * MulticastSender and executes them in order. At any point in time,
 * there is a current message that it is assembling. Fragments for
 * this message are assumed to be lost whenever either a heartbeat or
 * a fragment arrives with a sequence number that is beyond this
 * message's sequence number. The MulticastReceiver sends NAKs
 * only with respect to this message. Fragments for future messages
 * are kept in a fixed-size cache and are dealt with as each becomes
 * current.
 *
 * A MulticastSender can be configured to provide "pretty-reliable"
 * delivery or best-effort delivery. It communicates this to the
 * MulticastReceiver by sending the retryEnabled flag (true means
 * pretty-reliable delivery). If it is false, the pair does not
 * engage in the Heartbeat retry protocol. The MulticastReceiver
 * still uses the cache, so that mis-ordered fragments can be handled,
 * however it freely drops the current message as needed to make
 * progress.
 *
 * SYNCHRONIZATION NOTES: There are three ways into a MulticastReceiver.
 *   - dispatch() to handle an incoming fragment
 *   - processLastSeqNum() to handle an incoming LastSeqNum from a Heartbeat
 *   - shutdown() to shut things down.
 * All are synchronized to protect the local variables.
 *
 * @author Copyright (c) 1996-98 by WebLogic, Inc. All Rights Reserved.
 * @author Copyright (c) 1999-2000 by BEA WebXpress. All Rights Reserved.
 */
public class HybridMulticastReceiver extends MulticastReceiver {

  private final static boolean DEBUG = ClusterDebug.DEBUG &&
    Server.getDebug().getDebugClusterAnnouncements();

  private boolean httpReqDispatched; // HTTP request dispatched to get statedump
  private ServerChannel srvrAddress;
  private int senderNum;
  private int queueIndex;
  private final HostID memberID;

  private static AuthenticatedSubject kernelId = (AuthenticatedSubject)
    AccessController.doPrivileged(PrivilegedActions.getKernelIdentityAction());

  /* package */ HybridMulticastReceiver(HostID memberID, int senderNum) {
    this(memberID, senderNum,
      Kernel.getDispatchPolicyIndex(Kernel.SYSTEM_DISPATCH));
  }

  // The following constructor should be used if you want request processed
  // by specific queue in the kernel.
  /* package */ HybridMulticastReceiver(
    HostID memberID,
    int senderNum,
    int queueIndex
  ) {
    super(memberID, senderNum, queueIndex);
    srvrAddress = (ServerChannel) memberID;
    this.senderNum = senderNum;
    this.queueIndex = queueIndex;
    this.memberID = memberID;
  }

  // A heartbeat reports the sender's last sequence number; if it is at or
  // beyond the message currently expected, fetch a statedump over http.
  /* package */ void processLastSeqNum(long lastSeqNum) {
    if (lastSeqNum >= currentSeqNum) {
      fetchStateDumpOverHttp(lastSeqNum);
    }
  }

  /* package */ void setInSync(int lastSeqNum) {
    synchronized (this) {
      httpReqDispatched = false;
      super.setInSync(lastSeqNum);
    }
  }

  /* package */ void setHttpRequestDispatched(boolean b) {
    synchronized (this) {
      httpReqDispatched = b;
    }
  }

  private void fetchStateDumpOverHttp(long lastSeqNum) {
    // Check and set the flag under the same lock so that at most one
    // request is outstanding at a time.
    synchronized (this) {
      if (httpReqDispatched) return;
      httpReqDispatched = true;
    }
    HTTPExecuteRequest request = new HTTPExecuteRequest(
      srvrAddress, lastSeqNum, senderNum, memberID);
    Kernel.execute(request, queueIndex);
  }
}

[0028] MulticastSessionDataRecoveryServlet

package weblogic.cluster;

import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import javax.servlet.ServletException;
import javax.servlet.ServletInputStream;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import weblogic.common.internal.WLObjectOutputStream;
import weblogic.rmi.spi.HostID;
import weblogic.rmi.spi.RMIRuntime;
import weblogic.rmi.utils.io.RemoteObjectReplacer;
import weblogic.utils.Debug;
import weblogic.utils.io.UnsyncByteArrayOutputStream;

/**
 * @author Copyright (c) 2002 by BEA WebXpress. All Rights Reserved.
 */
public final class MulticastSessionDataRecoveryServlet extends HttpServlet {

  private final static boolean DEBUG = false;
  private final static int DEFAULT_BUF_SIZE = 10 * 1024;

  public void doGet(HttpServletRequest req, HttpServletResponse res)
    throws ServletException, IOException
  {
    String senderNumAsString = req.getParameter("senderNum");
    String lastSeqNumAsString = req.getParameter("lastSeqNum");
    if (DEBUG) {
      ClusterDebug.log("Nak request for senderNum " + senderNumAsString);
      ClusterDebug.log("Last seq num " + lastSeqNumAsString);
    }
    int senderNum = Integer.valueOf(senderNumAsString).intValue();
    int lastSeqNum = Integer.valueOf(lastSeqNumAsString).intValue();

    UnsyncByteArrayOutputStream baos = null;
    WLObjectOutputStream oos = null;
    OutputStream out = null;
    try {
      baos = new UnsyncByteArrayOutputStream(DEFAULT_BUF_SIZE);
      oos = new WLObjectOutputStream(baos);
      oos.setReplacer(new MulticastReplacer(RMIRuntime.getLocalHostID()));
      MulticastSender sender =
        MulticastManager.theOne().findSender(senderNum);
      if (DEBUG) {
        ClusterDebug.log("SENDER " + sender + " CURRENT SEQ NUM " +
          sender.getCurrentSeqNum());
      }
      // Write the local attributes, the recovery message, and the current
      // sequence number, in the order the requesting server reads them.
      GroupMessage msg = sender.createRecoverMessage();
      oos.writeObject(AttributeManager.theOne().getLocalAttributes());
      oos.writeObject(msg);
      oos.writeLong(sender.getCurrentSeqNum());
      oos.flush();
      res.setContentType("application/unknown");
      out = res.getOutputStream();
      res.setContentLength(baos.size());
      if (DEBUG) {
        ClusterDebug.log("WRITING BYTES OF SIZE " + baos.size());
      }
      baos.writeTo(out);
      out.flush();
    } finally {
      try { if (baos != null) { baos.close(); } } catch (IOException ioe) {}
      try { if (out != null) { out.close(); } } catch (IOException ioe) {}
      try { if (oos != null) { oos.close(); } } catch (IOException ioe) {}
    }
  }
}
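The servlet above writes three items in a fixed order (the sender's attributes, a recovery message, and the current sequence number), and HTTPExecuteRequest reads them back in the same order. The following round-trip sketch illustrates that ordering contract using the standard `java.io.ObjectOutputStream` and `ObjectInputStream` in place of the WebLogic stream classes. The `RecoveryPayload` class is hypothetical, and plain strings stand in for `MemberAttributes` and `GroupMessage`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

final class RecoveryPayload {
  final String attributes;    // stand-in for MemberAttributes
  final String message;       // stand-in for GroupMessage
  final long currentSeqNum;

  RecoveryPayload(String attributes, String message, long currentSeqNum) {
    this.attributes = attributes;
    this.message = message;
    this.currentSeqNum = currentSeqNum;
  }

  // Sender side: attributes, then message, then sequence number.
  byte[] encode() {
    try {
      ByteArrayOutputStream baos = new ByteArrayOutputStream();
      try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
        oos.writeObject(attributes);
        oos.writeObject(message);
        oos.writeLong(currentSeqNum);
      }
      return baos.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Receiver side: must read in exactly the order the sender wrote.
  static RecoveryPayload decode(byte[] bytes) {
    try (ObjectInputStream ois =
           new ObjectInputStream(new ByteArrayInputStream(bytes))) {
      String attributes = (String) ois.readObject();
      String message = (String) ois.readObject();
      long seqNum = ois.readLong();
      return new RecoveryPayload(attributes, message, seqNum);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

Because the format carries no field names, any change to the write order on the sending side has to be mirrored on the reading side.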
[0029] StateDumpServlet

package weblogic.cluster;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.security.AccessController;
import java.security.PrivilegedAction;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import javax.servlet.ServletException;
import javax.servlet.ServletInputStream;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import weblogic.common.internal.WLObjectInputStream;
import weblogic.common.internal.WLObjectOutputStream;
import weblogic.rmi.spi.HostID;
import weblogic.rmi.spi.RMIRuntime;
import weblogic.rmi.utils.io.RemoteObjectReplacer;
import weblogic.security.acl.internal.AuthenticatedSubject;
import weblogic.security.service.PrivilegedActions;
import weblogic.security.service.SecurityServiceManager;
import weblogic.server.Server;
import weblogic.utils.Debug;
import weblogic.utils.io.DataIO;

/**
 * @author Copyright (c) 2002 by BEA WebXpress. All Rights Reserved.
 */
public final class StateDumpServlet extends HttpServlet
  implements MulticastSessionIDConstants
{
  private final static boolean DEBUG = ClusterDebug.DEBUG &&
    Server.getDebug().getDebugClusterAnnouncements();

  private final static int DEFAULT_BUF_SIZE = 10 * 1024;

  private static AuthenticatedSubject kernelId = (AuthenticatedSubject)
    AccessController.doPrivileged(PrivilegedActions.getKernelIdentityAction());

  public void doGet(HttpServletRequest req, HttpServletResponse res)
    throws ServletException, IOException
  {
    ByteArrayOutputStream baos = null;
    WLObjectOutputStream oos = null;
    OutputStream out = null;
    try {
      baos = new ByteArrayOutputStream(DEFAULT_BUF_SIZE);
      oos = new WLObjectOutputStream(baos);
      ArrayList list = (ArrayList) MemberManager.theOne().getRemoteMembers();
      if (DEBUG) {
        ClusterDebug.log("Sending statedump for " + (list.size() + 1) +
          " servers");
      }
      // First the number of remote members, then one attributes/statedump
      // pair for each of them.
      oos.writeInt(list.size());
      for (int i = 0; i < list.size(); i++) {
        MemberAttributes attr = (MemberAttributes) list.get(i);
        RemoteMemberInfo memInfo = MemberManager.theOne().findOrCreate(
          attr.identity());
        HostID hostID = memInfo.getAttributes().identity();
        oos.setReplacer(new MulticastReplacer(hostID));
        oos.writeObjectWL(memInfo.getAttributes());
        oos.writeObject(new StateDumpMessage(
          memInfo.getMemberServices().getAllOffers(),
          ANNOUNCEMENT_MANAGER_ID,
          memInfo.findOrCreateReceiver(ANNOUNCEMENT_MANAGER_ID, true)
            .getCurrentSeqNum()));
        if (DEBUG) {
          ClusterDebug.log("Sending offers of size " +
            memInfo.getMemberServices().getAllOffers().size() +
            " of " + hostID);
        }
        MemberManager.theOne().done(memInfo);
      }
      // Finally the local server's own attributes and recovery message.
      oos.setReplacer(new MulticastReplacer(RMIRuntime.getLocalHostID()));
      oos.writeObject(AttributeManager.theOne().getLocalAttributes());
      oos.writeObject(AnnouncementManager.theOne().createRecoverMessage());
      oos.flush();
      res.setContentType("application/unknown");
      out = res.getOutputStream();
      if (DEBUG) {
        ClusterDebug.log("WRITING DATA OF SIZE " + baos.size());
      }
      res.setContentLength(baos.size());
      baos.writeTo(out);
      out.flush();
    } finally {
      try { if (baos != null) baos.close(); } catch (IOException ioe) {}
      try { if (oos != null) oos.close(); } catch (IOException ioe) {}
      try { if (out != null) out.close(); } catch (IOException ioe) {}
    }
  }
}
[0030] The present invention may be conveniently implemented using a conventional general purpose or a specialized digital computer or microprocessor programmed according to the teachings of the present disclosure. Appropriate software coding can readily be prepared by skilled programmers based on the teachings of the present disclosure, as will be apparent to those skilled in the software art.
[0031] In some embodiments, the present invention includes a computer program product which is a storage medium (media) having instructions stored thereon/in which can be used to program a computer to perform any of the processes of the present invention. The storage medium can include, but is not limited to, any type of disk including floppy disks, optical discs, DVD, CD-ROMs, microdrive, and magneto-optical disks, ROMs, RAMs, EPROMs, EEPROMs, DRAMs, VRAMs, flash memory devices, magnetic or optical cards, nanosystems (including molecular memory ICs), or any type of media or device suitable for storing instructions and/or data.
[0032] The foregoing description of the present invention has been provided for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations will be apparent to the practitioner skilled in the art. Particularly, it will be evident that while the examples described herein illustrate how the invention may be used in a WebLogic environment, other application servers, servers, server clusters, and computing environments, may use and benefit from the invention. The code examples given are presented for purposes of illustration. It will be evident that the techniques described herein may be applied using other code languages, and with different code.
[0033] The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, thereby enabling others skilled in the art to understand the invention for various embodiments and with various modifications that are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the following claims and their equivalence.

Claims

What is claimed is:
1. A system for communicating information about server resources between servers in a cluster, comprising: a cluster having a plurality of servers, including a first server and a second server; a set of resources or services on said first server that may be used by other servers in the cluster; and, wherein said first server sends an advertisement of its services to other servers in the cluster, wherein if said second server determines it is out of synchronization with said first server, or missed an advertisement, said second server makes a point-to-point request to said first server requesting any advertisements missed, and, wherein said first server responds to said point-to-point request by sending updated information to said second server.
2. The system of claim 1 wherein said request is in the form of an http request.
3. The system of claim 1 wherein each member of the cluster receives the advertisement, but those members who do not need to be updated ignore the advertisement.
4. The system of claim 1 wherein a third server may be newly added to the cluster, and wherein said third server waits for advertisements and then makes point-to-point requests to each server requesting advertisements it missed from that particular server.
5. The system of claim 1 wherein the request is a request to retrieve an update to or a copy of the sending server's JNDI tree.
6. The system of claim 5 wherein the determination as to whether the first server is out of synchronization with said first server, or missed an advertisement, is made by determining that the first server's JNDI tree is out of synchronization with the second server's JNDI tree.
7. The system of claim 6 wherein the receipt of updated information at said second server is used to synchronize its internal JNDI tree with the resources provided at first server.
8. The system of claim 5 wherein as part of the advertisement the first server packages a JNDI update of all of its services and multicasts the package to all cluster members.
9. A method of communicating information about server resources between servers in a cluster, comprising the steps of: providing a cluster including a first server and a second server, and resources operating thereon; sending an advertisement from said first server to other servers in the cluster announcing the resources or services on said first server; subsequently, if said second server determines it is out of synchronization with said first server, or missed an advertisement, making a point-to-point request from said second server to said first server requesting any advertisements missed; and, receiving updated information from said first server at said second server and updating said second server accordingly.
10. The method of claim 9 wherein said request is in the form of an http request.
11. The method of claim 9 wherein each member of the cluster receives the advertisement, but those members who do not need to be updated ignore the advertisement.
12. The method of claim 9 wherein a third server may be newly added to the cluster, and wherein said third server waits for advertisements and then makes point-to-point requests to each server requesting advertisements it missed from that particular server.
13. The method of claim 9 wherein the request is a request to retrieve an update to or a copy of the sending server's JNDI tree.
14. The method of claim 13 wherein the determination as to whether the first server is out of synchronization with said first server, or missed an advertisement, is made by determining that the first server's JNDI tree is out of synchronization with the second server's JNDI tree.
15. The method of claim 14 wherein the receipt of updated information at said second server is used to synchronize its internal JNDI tree with the resources provided at first server.
16. The method of claim 13 wherein as part of the advertisement the first server packages a JNDI update of all of its services and multicasts the package to all cluster members.
17. A computer readable medium including instructions stored thereon which when executed cause the computer or computers to perform the steps of: providing a cluster including a first server and a second server, and resources operating thereon; sending an advertisement from said first server to other servers in the cluster announcing the resources on said first server; subsequently, if said second server determines it is out of synchronization with said first server, or missed an advertisement, making a point-to-point request from said second server to said first server requesting any advertisements missed; and, receiving updated information from said first server at said second server and updating said second server accordingly.
18. The computer readable medium of claim 17 wherein said request is in the form of an http request.
19. The computer readable medium of claim 17 wherein each member of the cluster receives the advertisement, but those members who do not need to be updated ignore the advertisement.
20. The computer readable medium of claim 17 wherein a third server may be newly added to the cluster, and wherein said third server waits for advertisements and then makes point-to-point requests to each server requesting advertisements it missed from that particular server.
21. The computer readable medium of claim 17 wherein the request is a request to retrieve an update to or a copy of the sending server's JNDI tree.
22. The computer readable medium of claim 21 wherein the determination as to whether the first server is out of synchronization with said first server, or missed an advertisement, is made by determining that the first server's JNDI tree is out of synchronization with the second server's JNDI tree.
23. The computer readable medium of claim 22 wherein the receipt of updated information at said second server is used to synchronize its internal JNDI tree with the resources provided at first server.
24. The computer readable medium of claim 21 wherein as part of the advertisement the first server packages a JNDI update of all of its services and multicasts the package to all cluster members.
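
The advertise-then-catch-up flow recited in claims 9 to 12 can be illustrated with a small in-memory sketch. It is not the claimed implementation. The class and method names (ClusterMember, Advertisement, missedSince) are hypothetical, and the per-sender sequence number used to detect a missed advertisement is an assumption, since the claims do not say how the second server determines that it is out of synchronization. A plain map stands in for the JNDI tree, and a direct method call stands in for the point-to-point request, which claim 10 describes as an http request.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory sketch of one cluster member; every name here is hypothetical. */
class ClusterMember {

    /** One advertisement; the sequence number is an assumption, not from the claims. */
    static final class Advertisement {
        final String sender;
        final long seq;
        final String serviceName;

        Advertisement(String sender, long seq, String serviceName) {
            this.sender = sender;
            this.seq = seq;
            this.serviceName = serviceName;
        }
    }

    final String name;
    private long nextSeq = 1;                                   // next sequence number to send
    private final List<Advertisement> sent = new ArrayList<>(); // history kept for catch-up
    private final Map<String, Long> lastSeen = new HashMap<>(); // highest seq applied, per sender
    final Map<String, String> serviceView = new HashMap<>();    // stand-in for a local JNDI tree

    ClusterMember(String name) {
        this.name = name;
    }

    /** Records a local service and returns the advertisement the caller would multicast. */
    Advertisement advertise(String serviceName) {
        Advertisement ad = new Advertisement(name, nextSeq++, serviceName);
        sent.add(ad);
        serviceView.put(serviceName, name);
        return ad;
    }

    /** Answers a point-to-point request: every advertisement sent after the given number. */
    List<Advertisement> missedSince(long seq) {
        List<Advertisement> missed = new ArrayList<>();
        for (Advertisement ad : sent) {
            if (ad.seq > seq) {
                missed.add(ad);
            }
        }
        return missed;
    }

    /** Handles a received advertisement; on a gap, asks the sender directly for what was missed. */
    void receive(Advertisement ad, ClusterMember sender) {
        long seen = lastSeen.getOrDefault(ad.sender, 0L);
        if (ad.seq <= seen) {
            return;                                   // already up to date: ignore it
        }
        if (ad.seq > seen + 1) {                      // gap: at least one advertisement was missed
            for (Advertisement missed : sender.missedSince(seen)) {
                apply(missed);                        // the reply includes the current one too
            }
        } else {
            apply(ad);
        }
    }

    private void apply(Advertisement ad) {
        serviceView.put(ad.serviceName, ad.sender);
        lastSeen.put(ad.sender, ad.seq);
    }
}
```

In this sketch a newly added member starts with no entry for any sender, so the first advertisement it hears from a server with a sequence number above 1 triggers a catch-up request to that server, which loosely mirrors claim 12.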
PCT/US2004/006215 2003-02-27 2004-02-27 System and method for communications between servers in a cluster WO2004077280A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US45029403P 2003-02-27 2003-02-27
US60/450,294 2003-02-27

Publications (2)

Publication Number Publication Date
WO2004077280A2 (en) 2004-09-10
WO2004077280A3 (en) 2006-06-29

Family

ID=32927631

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2004/006215 WO2004077280A2 (en) 2003-02-27 2004-02-27 System and method for communications between servers in a cluster

Country Status (2)

Country Link
US (2) US7376754B2 (en)
WO (1) WO2004077280A2 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007057284A1 (en) 2005-11-17 2007-05-24 International Business Machines Corporation Sending routing data based on times that servers joined a cluster
US7571255B2 (en) 2003-02-27 2009-08-04 Bea Systems, Inc. System and method for communication between servers in a cluster

Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8108429B2 (en) * 2004-05-07 2012-01-31 Quest Software, Inc. System for moving real-time data events across a plurality of devices in a network for simultaneous data protection, replication, and access services
US7565661B2 (en) 2004-05-10 2009-07-21 Siew Yong Sim-Tang Method and system for real-time event journaling to provide enterprise data services
US9122686B2 (en) * 2004-05-27 2015-09-01 Sap Se Naming service in a clustered environment
US7680834B1 (en) 2004-06-08 2010-03-16 Bakbone Software, Inc. Method and system for no downtime resychronization for real-time, continuous data protection
US7979404B2 (en) 2004-09-17 2011-07-12 Quest Software, Inc. Extracting data changes and storing data history to allow for instantaneous access to and reconstruction of any point-in-time data
US7904913B2 (en) 2004-11-02 2011-03-08 Bakbone Software, Inc. Management interface for a system that provides automated, real-time, continuous data protection
US7788521B1 (en) 2005-07-20 2010-08-31 Bakbone Software, Inc. Method and system for virtual on-demand recovery for real-time, continuous data protection
US7689602B1 (en) * 2005-07-20 2010-03-30 Bakbone Software, Inc. Method of creating hierarchical indices for a distributed object system
US20070094343A1 (en) * 2005-10-26 2007-04-26 International Business Machines Corporation System and method of implementing selective session replication utilizing request-based service level agreements
US8131723B2 (en) 2007-03-30 2012-03-06 Quest Software, Inc. Recovering a file system to any point-in-time in the past with guaranteed structure, content consistency and integrity
US8364648B1 (en) 2007-04-09 2013-01-29 Quest Software, Inc. Recovering a database to any point-in-time in the past with guaranteed data consistency
JP5056504B2 (en) * 2008-03-13 2012-10-24 富士通株式会社 Control apparatus, information processing system, control method for information processing system, and control program for information processing system
US8892689B1 (en) * 2008-04-30 2014-11-18 Netapp, Inc. Method and apparatus for a storage server to automatically discover and join a network storage cluster
US9262229B2 (en) * 2011-01-28 2016-02-16 Oracle International Corporation System and method for supporting service level quorum in a data grid cluster
US10706021B2 (en) 2012-01-17 2020-07-07 Oracle International Corporation System and method for supporting persistence partition discovery in a distributed data grid
US10320892B2 (en) * 2015-01-02 2019-06-11 Microsoft Technology Licensing, Llc Rolling capacity upgrade control
US10021008B1 (en) 2015-06-29 2018-07-10 Amazon Technologies, Inc. Policy-based scaling of computing resource groups
US10148592B1 (en) * 2015-06-29 2018-12-04 Amazon Technologies, Inc. Prioritization-based scaling of computing resources
US11550820B2 (en) 2017-04-28 2023-01-10 Oracle International Corporation System and method for partition-scoped snapshot creation in a distributed data computing environment
US10769019B2 (en) 2017-07-19 2020-09-08 Oracle International Corporation System and method for data recovery in a distributed data computing environment implementing active persistence
US10862965B2 (en) 2017-10-01 2020-12-08 Oracle International Corporation System and method for topics implementation in a distributed data computing environment
US20220092481A1 (en) * 2020-09-18 2022-03-24 Dell Products L.P. Integration optimization using machine learning algorithms

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5452448A (en) * 1992-03-16 1995-09-19 Hitachi, Ltd. Method of replicate file updating by monitoring file accesses and system therefor
US20030110172A1 (en) * 2001-10-24 2003-06-12 Daniel Selman Data synchronization

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6006254A (en) * 1997-08-29 1999-12-21 Mitsubishi Electric Information Technology Center America, Inc. System for the reliable, fast, low-latency communication of object state updates over a computer network by combining lossy and lossless communications
US6256634B1 (en) * 1998-06-30 2001-07-03 Microsoft Corporation Method and system for purging tombstones for deleted data items in a replicated database
US6236999B1 (en) * 1998-11-05 2001-05-22 Bea Systems, Inc. Duplicated naming service in a distributed processing system
US6577599B1 (en) * 1999-06-30 2003-06-10 Sun Microsystems, Inc. Small-scale reliable multicasting
US6349091B1 (en) * 1999-11-12 2002-02-19 Itt Manufacturing Enterprises, Inc. Method and apparatus for controlling communication links between network nodes to reduce communication protocol overhead traffic
US6385174B1 (en) * 1999-11-12 2002-05-07 Itt Manufacturing Enterprises, Inc. Method and apparatus for transmission of node link status messages throughout a network with reduced communication protocol overhead traffic
US6782398B1 (en) * 2000-06-14 2004-08-24 Microsoft Corporation Method for executing commands on multiple computers of a network
US6965938B1 (en) * 2000-09-07 2005-11-15 International Business Machines Corporation System and method for clustering servers for performance and load balancing
US6912569B1 (en) * 2001-04-30 2005-06-28 Sun Microsystems, Inc. Method and apparatus for migration of managed application state for a Java based application
US7571215B2 (en) * 2001-07-16 2009-08-04 Bea Systems, Inc. Data replication protocol
US7028030B2 (en) * 2001-08-30 2006-04-11 Bea Systems, Inc. Cluster caching with concurrency checking
US7136879B2 (en) * 2002-01-18 2006-11-14 Bea Systems, Inc. System and method for read-only entity bean caching
US9167036B2 (en) * 2002-02-14 2015-10-20 Level 3 Communications, Llc Managed object replication and delivery
US7617289B2 (en) * 2002-02-22 2009-11-10 Bea Systems, Inc. System and method for using a data replication service to manage a configuration repository
US7546364B2 (en) * 2002-05-16 2009-06-09 Emc Corporation Replication of remote copy data for internet protocol (IP) transmission
US7376754B2 (en) * 2003-02-27 2008-05-20 Bea Systems, Inc. System and method for communications between servers in a cluster
US8028002B2 (en) * 2004-05-27 2011-09-27 Sap Ag Naming service implementation in a clustered environment

Also Published As

Publication number Publication date
US7571255B2 (en) 2009-08-04
US20050021690A1 (en) 2005-01-27
WO2004077280A3 (en) 2006-06-29
US7376754B2 (en) 2008-05-20
US20080126546A1 (en) 2008-05-29

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): BW GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
122 Ep: pct application non-entry in european phase