WO2001016760A1 - Switchable shared-memory cluster - Google Patents

Switchable shared-memory cluster

Info

Publication number
WO2001016760A1
Authority
WO
WIPO (PCT)
Prior art keywords
memory
shared
state
private
page
Prior art date
Application number
PCT/US2000/024039
Other languages
French (fr)
Inventor
Lynn Parker West
Karlon K. West
Original Assignee
Times N Systems, Inc.
Priority date
Filing date
Publication date
Application filed by Times N Systems, Inc. filed Critical Times N Systems, Inc.
Priority to AU71007/00A (AU7100700A)
Publication of WO2001016760A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 - Multiprogramming arrangements
    • G06F9/52 - Program synchronisation; Mutual exclusion, e.g. by means of semaphores
    • G06F9/526 - Mutual exclusion algorithms
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 - Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 - Addressing or allocation; Relocation
    • G06F12/0223 - User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/0284 - Multiple user address space allocation, e.g. using different base addresses
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 - Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 - Addressing or allocation; Relocation
    • G06F12/08 - Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806 - Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0815 - Cache consistency protocols
    • G06F12/0817 - Cache consistency protocols using directory methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 - Arrangements for software engineering
    • G06F8/40 - Transformation of program code
    • G06F8/41 - Compilation
    • G06F8/45 - Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
    • G06F8/457 - Communication
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 - Multiprogramming arrangements
    • G06F9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 - Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011 - Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016 - Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 - Multiprogramming arrangements
    • G06F9/54 - Interprogram communication
    • G06F9/544 - Buffers; Shared memory; Pipes
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 - Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 - Addressing or allocation; Relocation
    • G06F12/08 - Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806 - Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0815 - Cache consistency protocols
    • G06F12/0837 - Cache consistency protocols with software control, e.g. non-cacheable data
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 - Indexing scheme relating to G06F9/00
    • G06F2209/52 - Indexing scheme relating to G06F9/52
    • G06F2209/523 - Mode

Abstract

Methods, systems and devices are described for a switchable shared-memory cluster. An apparatus includes a shared memory unit, a first processing node coupled to the shared memory unit, and a second processing node coupled to the shared memory unit. A memory page can be switched from a shared state to a private state or from the private state back to the shared state. The methods, systems and devices provide advantages because the speed and scalability of parallel processor systems are enhanced.

Description

SWITCHABLE SHARED-MEMORY CLUSTER
BACKGROUND OF THE INVENTION
1. Field of the Invention
The invention relates generally to the field of computer systems which have multiple processing nodes and in which each processing node is provided with private, local memory and also in which each processing node has access to a range of memory which is shared with other processing nodes. More particularly, the invention relates to computer science techniques that utilize a switchable shared-memory cluster.
2. Discussion of the Related Art
The clustering of workstations is a well-known art. In the most common cases, the clustering involves workstations that operate almost totally independently, utilizing the network only to share such services as a printer, license-limited applications, or shared files.
In more-closely-coupled environments, some software packages (such as NQS) allow a cluster of workstations to share work. In such cases the work arrives, typically as batch jobs, at an entry point to the cluster where it is queued and dispatched to the workstations on the basis of load.
In both of these cases, and all other known cases of clustering, the operating system and cluster subsystem are built around the concept of message-passing. The term message-passing means that a given workstation operates on some portion of a job until communication with another workstation (typically to send or receive data) becomes necessary. The first workstation then prepares a message and communicates with the other workstation.
Another well-known art is that of clustering processors within a machine, usually called a Massively Parallel Processor or MPP, in which the techniques are essentially identical to those of clustered workstations. Usually, the bandwidth and latency of the interconnect network of an MPP are more highly optimized, but the system operation is the same. In the general case, the passing of a message is an extremely expensive operation; expensive in the sense that many CPU cycles in the sender and receiver are consumed by the process of sending, receiving, bracketing, verifying, and routing the message, CPU cycles that are therefore not available for other operations. A highly streamlined message-passing subsystem can typically require 10,000 to 20,000 CPU cycles or more.
There are specific cases wherein the passing of a message requires significantly less overhead. However, none of these specific cases is adaptable to a general-purpose computer system. Message-passing parallel processor systems have been offered commercially for years but have failed to capture significant market share because of poor performance and difficulty of programming for typical parallel applications. Message-passing parallel processor systems do have some advantages. In particular, because they share no resources, message-passing parallel processor systems are easier to provide with high-availability features.
What is needed is a better approach to parallel processor systems.
There are alternatives to the passing of messages for closely-coupled cluster work. One such alternative is the use of shared memory for inter-processor communication. Shared-memory systems have been much more successful at capturing market share than message-passing systems because of the dramatically superior performance of shared-memory systems, at least up to about four-processor systems. In Search of Clusters, Gregory F. Pfister, 2nd ed. (January 1998), Prentice Hall Computer Books, ISBN 0138997098, describes a computing system with multiple processing nodes in which each processing node is provided with private, local memory and also has access to a range of memory which is shared with other processing nodes. The disclosure of this publication in its entirety is hereby expressly incorporated herein by reference for the purpose of indicating the background of the invention and illustrating the state of the art.
However, providing high availability for traditional shared-memory systems has proved to be an elusive goal. The nature of these systems, which share all code and all data, including that data which controls the shared operating systems, is incompatible with the separation normally required for high availability. What is needed is an approach to shared-memory systems that improves availability. Although the use of shared memory for inter-processor communication is a well-known art, prior to the teachings of U.S. Ser. No. 09/273,430, filed March 19, 1999, entitled Shared Memory Apparatus and Method for Multiprocessing Systems, the processors shared a single copy of the operating system. The problem with such systems is that they cannot be efficiently scaled beyond four- to eight-way systems except in unusual circumstances. All known cases of said unusual circumstances are such that the systems are not good price-performance systems for general-purpose computing.
The entire contents of U.S. Patent Applications 09/273,430, filed March 19, 1999 and PCT/US00/01262, filed January 18, 2000 are hereby expressly incorporated by reference herein for all purposes. U.S. Ser. No. 09/273,430 improved upon the concept of shared memory by teaching the concept which will herein be referred to as a tight cluster. The concept of a tight cluster is that of individual computers, each with its own CPU(s), memory, I/O, and operating system, but for which collection of computers there is a portion of memory which is shared by all the computers and via which they can exchange information. U.S. Ser. No. 09/273,430 describes a system in which each processing node is provided with its own private copy of an operating system and in which the connection to shared memory is via a standard bus. The advantage of a tight cluster in comparison to an SMP is "scalability," which means that a much larger number of computers can be attached together via a tight cluster than in an SMP, with little loss of processing efficiency.
What is needed are improvements to the concept of the tight cluster. What is also needed is an expansion of the concept of the tight cluster.
Another well-known art is the use of memory caches to improve performance. Caches provide such a significant performance boost that most modern computers use them. At the very top of the performance (and price) range, all of memory is constructed using cache-memory technologies. However, this is such an expensive approach that few manufacturers use it. All manufacturers of personal computers (PCs) and workstations use caches, except at the very low end of the PC business where caches are omitted for price reasons and performance is, therefore, poor. Caches, however, present a problem for shared-memory computing systems: the problem of coherence. As a particular processor reads or writes a word of shared memory, that word and usually a number of surrounding words are transferred to that particular processor's cache memory transparently by cache-memory hardware. That word and the surrounding words (if any) are transferred into a portion of the particular processor's cache memory that is called a cache line or cache block.
If the transferred cache line is modified by the particular processor, the representation in the cache memory will become different from the value in shared memory. That cache line within that particular processor's cache memory is, at that point, called a "dirty" line. The particular processor with the dirty line, when accessing that memory address, will see the new (modified) value. Other processors accessing that memory address will see the old (unmodified) value in shared memory. This lack of coherence between such accesses will lead to incorrect results. Modern computers, workstations, and PCs which provide for multiple processors and shared memory therefore also provide high-speed, transparent cache-coherence hardware to assure that if a line in one cache changes and another processor subsequently accesses a value which is in that address range, the new values will be transferred back to memory or at least to the requesting processor.
Caches can be maintained coherent by software provided that sufficient cache-management instructions are provided by the manufacturer. However, in many cases an adequate arsenal of such instructions is not provided. Moreover, even in cases where the instruction set is adequate, the software overhead is so great that no examples are known of commercially successful machines which use software-managed coherence.
SUMMARY OF THE INVENTION
A goal of the invention is to simultaneously satisfy the above-discussed requirements of improving and expanding the tight cluster concept which, in the case of the prior art, are not satisfied. One embodiment of the invention is based on an apparatus, comprising: a shared memory unit; a first processing node coupled to said shared memory unit; and a second processing node coupled to said shared memory unit, wherein a memory page can be switched from a shared state to a private state or from said private state to a shared state. Another embodiment of the invention is based on an apparatus, comprising: a shared memory unit; a first processing node coupled to said shared memory unit; and a second processing node coupled to said shared memory unit, wherein a memory page can be switched from a shared state to a private state or from said private state to a shared state. Another embodiment of the invention is based on an electronic media, comprising: a computer program adapted to switch a page of memory from a shared state to a private state or from a private state to a shared state. Another embodiment of the invention is based on a computer program comprising computer program means adapted to perform the steps of switching a page of memory from a shared state to a private state or from a private state to a shared state when said computer program is run on a computer. Another embodiment of the invention is based on a system, comprising a multiplicity of workstations and a shared memory unit (SMU) interconnected and arranged such that memory accesses by a given workstation in a first set of address ranges will be to its local, private memory and that memory accesses to a second set of address ranges will be to shared memory, and arranged such that said accesses to shared memory are recorded and signaled by a Cache Emulation Adapter (CEA) capable of recognizing and responding to cache-coherence signals within the workstation. Another embodiment of the invention is based on a parallel computing system which has multiple PRNs, each with private memory, and which also is provided with some shared memory, together with means to switch some portion of memory on a dynamic basis from shared memory to private memory. These, and other goals and embodiments of the invention will be better appreciated and understood when considered in conjunction with the following description and the accompanying drawings. It should be understood, however, that the following description, while indicating preferred embodiments of the invention and numerous specific details thereof, is given by way of illustration and not of limitation. Many changes and modifications may be made within the scope of the invention without departing from the spirit thereof, and the invention includes all such modifications.
BRIEF DESCRIPTION OF THE DRAWINGS
A clear conception of the advantages and features constituting the invention, and of the components and operation of model systems provided with the invention, will become more readily apparent by referring to the exemplary, and therefore nonlimiting, embodiments illustrated in the drawings accompanying and forming a part of this specification, wherein like reference characters (if they occur in more than one view) designate the same parts. It should be noted that the features illustrated in the drawings are not necessarily drawn to scale.
FIG. 1 illustrates a block schematic view of a systems structure showing private memories plus shared memory, representing an embodiment of the invention.
FIG. 2 illustrates a block schematic view of the details of a private memory in one processing node, showing regions which can be dynamically allocated to hold shared pages, representing an embodiment of the invention.
FIG. 3 illustrates a block schematic view of a shared page being moved to one of the private memories, representing an embodiment of the invention.
FIG. 4 illustrates a block schematic view of a request from an alternative processing node to release a lock on a switched-shared page, representing an embodiment of the invention.
FIG. 5 illustrates a block schematic view of a shared page being moved back to a shared-memory node, representing an embodiment of the invention.
DESCRIPTION OF PREFERRED EMBODIMENTS
The invention and the various features and advantageous details thereof are explained more fully with reference to the nonlimiting embodiments that are illustrated in the accompanying drawings and detailed in the following description of preferred embodiments. Descriptions of well-known components and processing techniques are omitted so as not to unnecessarily obscure the invention in detail.
The teachings of U.S. Ser. No. 09/273,430 include a system which is a single entity; one large supercomputer. The invention is also applicable to a cluster of workstations, or even a network.
The invention is applicable to systems of the type of Pfister or the type of U.S. Ser. No. 09/273,430 in which each processing node has its own copy of an operating system. The invention is also applicable to other types of multiple processing node systems. The context of the invention can include a tight cluster as described in U.S. Ser. No. 09/273,430. A tight cluster is defined as a cluster of workstations or an arrangement within a single, multiple-processor machine in which the processors are connected by a high-speed, low-latency interconnection, and in which some but not all memory is shared among the processors. Within the scope of a given processor, accesses to a first set of ranges of memory addresses will be to local, private memory but accesses to a second set of memory address ranges will be to shared memory. The significant advantage of a tight cluster in comparison to a message-passing cluster is that, assuming the environment has been appropriately established, the exchange of information involves a single STORE instruction by the sending processor and a subsequent single LOAD instruction by the receiving processor.
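For illustration only, the following minimal C sketch shows the kind of exchange described above: one STORE by the sender and one LOAD by the receiver, with an ordinary variable standing in for a word of shared memory. The names and the single-process framing are assumptions; the patent does not prescribe any particular code, and on a real tight cluster the two routines would run on different processing nodes with the shared word in a mapped shared-memory range.

```c
#include <stdio.h>

/* Minimal sketch of the tight-cluster exchange described above: once the
 * shared environment has been established, passing a value needs only one
 * STORE by the sender and one LOAD by the receiver.  The companion
 * disclosures address how the receiver knows when the value is ready. */

static volatile unsigned long shared_word;   /* stand-in for a shared-memory word */

static void send_value(unsigned long value)
{
    shared_word = value;        /* the single STORE by the sending processor */
}

static unsigned long receive_value(void)
{
    return shared_word;         /* the single LOAD by the receiving processor */
}

int main(void)
{
    send_value(42);
    printf("received %lu\n", receive_value());
    return 0;
}
```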
The establishment of the environment, taught by U.S. Ser. No. 09/273,430 and more fully by companion disclosures (U.S. Provisional Application Ser. No. 60/220,794, filed July 26, 2000; U.S. Provisional Application Ser. No. 60/220,748, filed July 26, 2000; WSGR 15245-711; WSGR 15245-712; WSGR 15245-713; WSGR 15245-715; WSGR 15245-716; WSGR 15245-717; WSGR 15245-718; and WSGR 15245-719, the entire contents of all of which are hereby expressly incorporated herein by reference for all purposes), can be performed in such a way as to require relatively little system overhead, and to be done once for many, many information exchanges. Therefore, a comparison of 10,000 instructions for message-passing to a pair of instructions for tight-clustering is valid.
The invention can be embodied in the environment described in U.S. Ser. No. 09/273,430 with the additional feature that multiple computers can selectively switch, on a range-by-range basis, certain portions of memory from shared to private status. Thus, the invention can be embodied in a tight cluster system in which several (or many, e.g., 100, 1,000, or 10,000) ranges in the second set of address ranges are to memory that can be dynamically switched between shared and private. When switched to private, a given range will be private with regard to a particular processor and will be inaccessible to other processors. In a system with some local, private memory and shared, globally visible memory, the invention can include dynamically allocating portions of shared memory to individual processors for temporary private use, and subsequently returning those portions to the common, shared pool at a later time. The invention contemplates such an environment in which each computer is provided with very high-speed, low-latency communication means to shared memory, and in which the shared-memory interface provides means for switching each of several banks of memory from shared to private status. Each computer in the system can also be provided with a specific interconnection to the shared memory.
In a system with some memory local and private to each of a multiplicity of processing nodes (PRNs), in which each PRN is provided with its own copy of the operating system, and in which some memory is globally visible to all PRNs and shared by all, certain advantages accrue relative to alternative computing systems. The most significant of these advantages is that only data which is shared is placed into shared memory, thus eliminating all operating-system contention and most memory-access contention among the PRNs. There can be, of course, contention between the PRNs for access to shared data which is in shared memory. The invention can further reduce contention in such a system for applications which exhibit certain data-access characteristics. Specifically, the systems of the type of Pfister or U.S. Ser. No. 09/273,430 work best if data which is shared is shared on a fine-grained basis.
That is, they work best if the accesses to that shared data by any particular PRN are few in number before accesses to that shared data by other PRNs. For large-grained data sharing (one PRN uses some particular set of data many times before any other PRN uses that data at all) these share-as-needed systems still in general outperform other kinds of parallel systems, but a further improvement is possible. This invention describes such an improvement, called switched-memory systems.
In a switched-memory system, some of memory can be switched to be shared under certain circumstances but private to one particular PRN under other circumstances. In a preferred embodiment, the memory is not physically switched; rather, its characteristics are switched. Operating-system services and APIs can be used to control access to shared memory.
Referring to FIGS. 1 and 2, a tight cluster system is shown. A shared memory 100 is coupled to a first processor 0 via a first private memory 10. The shared memory 100 is coupled to a second processor 1 via a second private memory 11. The shared memory 100 is coupled to a third processor 2 via a third private memory 12. The dashed line in FIG. 1 indicates that there may be more processors than are shown. Referring to FIG. 2, the first private memory 10 includes two switchable regions 51 and 61. These regions can be termed pages.
If an application needs access to a block of memory (the word "page" will be used hereafter, and this term means a page in the well-understood virtual-memory operating-system sense) and if the memory space needed is in shared memory, the application will use an API which will here be called tnsl_shared_memory_get(). The API tnsl_shared_memory_get() returns to the application a pointer to the memory (if available) and information relative to one or more pages of shared memory. In contrast, if an application needs access to a block of memory and the memory space needed is in private, local memory, the application will use a different API which will here be called local_memory_get(). In such a case, the operating system returns a pointer to the memory (if available) and returns other information indicating whether some other process or thread is already utilizing that memory.
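The patent names these interfaces but does not give their signatures. The following C sketch is therefore a hypothetical rendering: the mem_grant_t structure, the parameter lists, and the malloc-backed stub bodies are assumptions made only to illustrate the two allocation paths.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical prototypes for the two allocation paths described above.
 * The patent only names the interfaces and the information they return;
 * everything concrete below is an illustrative assumption. */

#define PAGE_SIZE 4096u

typedef struct {
    void   *ptr;             /* pointer to the granted page(s), NULL if unavailable */
    size_t  pages;           /* number of pages granted                             */
    int     in_use_by_other; /* nonzero if another process/thread already uses it   */
} mem_grant_t;

/* Request pages of globally visible shared memory. */
static mem_grant_t tnsl_shared_memory_get(size_t bytes)
{
    mem_grant_t g = { malloc(bytes), (bytes + PAGE_SIZE - 1) / PAGE_SIZE, 0 };
    return g;                /* stub: a real system would hand out shared pages */
}

/* Request private, local memory from this node's own operating system. */
static mem_grant_t local_memory_get(size_t bytes)
{
    mem_grant_t g = { malloc(bytes), (bytes + PAGE_SIZE - 1) / PAGE_SIZE, 0 };
    return g;                /* stub: a real system would also report sharing status */
}

int main(void)
{
    mem_grant_t shared = tnsl_shared_memory_get(2 * PAGE_SIZE);
    mem_grant_t local  = local_memory_get(PAGE_SIZE);
    printf("shared: %zu page(s), local: %zu page(s)\n", shared.pages, local.pages);
    free(shared.ptr);
    free(local.ptr);
    return 0;
}
```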
Different kinds of locks can be obtained on shared memory, including Shared locks and Exclusive locks. If a PRN has an exclusive lock on a page and some other PRN needs that page, the second PRN will request that the first PRN release its lock. However, if a first PRN has a shared lock on some page and a second PRN requires shared access to that page, the first PRN does not need to release its shared lock on that page. For a third case, however, if a first PRN has a shared lock on some page and a second PRN requires exclusive access to that page, then the second PRN will request that the first PRN release its shared lock on the page.
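The three cases above amount to a simple compatibility rule: a release request is needed only when the holder's lock and the requester's need cannot coexist. A minimal C sketch of that rule, with assumed names, follows.

```c
#include <stdio.h>

/* Sketch of the lock-compatibility rule described above.  Only the
 * SHARED/SHARED combination coexists without a release request; every
 * other combination forces the holding PRN to release its lock. */

typedef enum { LOCK_SHARED, LOCK_EXCLUSIVE } lock_mode_t;

static int release_required(lock_mode_t held, lock_mode_t wanted)
{
    /* Two shared locks can coexist; every other combination conflicts. */
    return !(held == LOCK_SHARED && wanted == LOCK_SHARED);
}

int main(void)
{
    printf("shared holder, shared requester    -> release? %d\n",
           release_required(LOCK_SHARED, LOCK_SHARED));      /* 0 */
    printf("shared holder, exclusive requester -> release? %d\n",
           release_required(LOCK_SHARED, LOCK_EXCLUSIVE));   /* 1 */
    printf("exclusive holder, any requester    -> release? %d\n",
           release_required(LOCK_EXCLUSIVE, LOCK_SHARED));   /* 1 */
    return 0;
}
```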
Referring to FIGS. 3-5, dynamic switching and lock release is shown. FIG. 3 shows a page 300 being switched from a shared state to a private state. The shared memory 100 includes a region 110 that contains a locked pointer to the shared page. FIG. 4 shows a Processor N passing a release request to the private memory 10 through the region 110. FIG. 5 shows the page 300 being switched from a private state to a shared state.
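The sequence of FIGS. 3-5 can be outlined in code as follows. This is a narrative sketch with assumed names (lock_region_t standing in for region 110, a node index of -1 meaning unlocked); it is not the disclosed implementation.

```c
#include <stdio.h>

/* Sketch of FIGS. 3-5: a page is switched to the private state of one
 * node, another node posts a release request through the lock region in
 * shared memory, and the page is then switched back to the shared state. */

typedef struct { int locked_by; void *page; } lock_region_t;   /* region 110 */

/* FIG. 3: a node acquires the page; region 110 records a locked pointer. */
static void switch_to_private(lock_region_t *r, void *page, int node)
{
    r->page = page;
    r->locked_by = node;        /* the page may now be cached privately by 'node' */
}

/* FIG. 4: another node determines it must post a release request via region 110. */
static int release_requested(const lock_region_t *r, int requesting_node)
{
    return r->locked_by >= 0 && r->locked_by != requesting_node;
}

/* FIG. 5: the holder gives the page up and it returns to the shared state. */
static void switch_to_shared(lock_region_t *r)
{
    r->locked_by = -1;          /* write-back/invalidation would happen here */
}

int main(void)
{
    char page[4096];
    lock_region_t region110 = { -1, NULL };

    switch_to_private(&region110, page, 0);                    /* FIG. 3 */
    printf("processor N must request release: %d\n",
           release_requested(&region110, 3));                  /* FIG. 4 */
    switch_to_shared(&region110);                              /* FIG. 5 */
    printf("page back in shared state: %d\n", region110.locked_by == -1);
    return 0;
}
```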
In a preferred embodiment, tnsl_shared_memory_get() is augmented either with a second, similar function which has added call and return parameters, or by adding those call and return parameters to the original function. The new call parameters include but are not limited to: (1) exclusive private access requested; (2) shared private access requested; (3) exclusive private access required; and (4) shared private access required. The return parameters include an indicator as to whether the request was successfully fulfilled, whether private access is granted, and, in the case of shared private access, whether some other PRN is also sharing the page.
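A possible shape for the augmented interface is sketched below. The flag names, the result structure, the _ex suffix, and the stub body are assumptions, since the patent lists only the kinds of call and return parameters involved.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical augmented interface for the call and return parameters
 * listed above; everything concrete here is an illustrative assumption. */

enum {
    EXCLUSIVE_PRIVATE_REQUESTED = 1 << 0,   /* (1) */
    SHARED_PRIVATE_REQUESTED    = 1 << 1,   /* (2) */
    EXCLUSIVE_PRIVATE_REQUIRED  = 1 << 2,   /* (3) */
    SHARED_PRIVATE_REQUIRED     = 1 << 3    /* (4) */
};

typedef struct {
    void *ptr;              /* pointer to the page(s), NULL on failure       */
    int   fulfilled;        /* was the request successfully fulfilled?       */
    int   private_granted;  /* was private access granted?                   */
    int   shared_by_other;  /* shared private access: another PRN shares it  */
} shared_get_result_t;

static shared_get_result_t tnsl_shared_memory_get_ex(size_t bytes, unsigned flags)
{
    /* Stub: always grants what was asked for, with no other sharer. */
    shared_get_result_t r;
    r.ptr = malloc(bytes);
    r.fulfilled = (r.ptr != NULL);
    r.private_granted = (flags != 0);
    r.shared_by_other = 0;
    return r;
}

int main(void)
{
    shared_get_result_t r = tnsl_shared_memory_get_ex(4096, EXCLUSIVE_PRIVATE_REQUESTED);
    printf("fulfilled=%d private=%d shared_by_other=%d\n",
           r.fulfilled, r.private_granted, r.shared_by_other);
    free(r.ptr);
    return 0;
}
```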
In a preferred embodiment, if a PRN acquires private access, the most significant result of the acquisition is that the PRN may then cache the data in its local, level-1 cache. Since this cache can operate at much higher speed than other caches and main memory, such access can result in very significant performance speedups. This performance improvement will only be observed if the application does in fact access the data a number of times before releasing the page, but if the application accesses the data many times before releasing it, the speedup can be dramatic.
In the case of exclusive private access, the corresponding release function call by the application to the operating system extensions will cause the operating system extension to remove or invalidate the data in the private cache system of that particular PRN and write any modified data back to shared memory. Therefore, for exclusive private access, the system and hardware will assure correctness of operation.
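The release path for exclusive private access can be pictured as follows; the function name and the memcpy stand-in for the cache write-back and invalidation are assumptions sketched from the prose, not the disclosed operating-system extension.

```c
#include <stdio.h>
#include <string.h>

/* Sketch of the release path for exclusive private access described above:
 * modified data is written back to shared memory and the node's privately
 * cached copy is invalidated.  Names and the memcpy stand-in are assumed. */

#define PAGE_SIZE 4096u

static char shared_page[PAGE_SIZE];    /* the page as held in shared memory       */
static char private_copy[PAGE_SIZE];   /* this PRN's privately cached copy         */
static int  copy_valid;                /* nonzero while the private copy is valid  */

static void release_exclusive_private(void)
{
    if (copy_valid) {
        memcpy(shared_page, private_copy, PAGE_SIZE);  /* write back modifications */
        copy_valid = 0;                                /* invalidate private copy  */
    }
}

int main(void)
{
    copy_valid = 1;
    strcpy(private_copy, "modified while held exclusively");
    release_exclusive_private();
    printf("shared memory now holds: %s\n", shared_page);
    return 0;
}
```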
In the case of shared private access, the application can assure correctness of operation if it never issues a Store to any data address in such a page; or, if it does Store into the page, the application can assure correctness of operation if any data it modifies during said shared access period cannot affect other applications which may access the page while or after it has been in the shared-private state. In general, shared private access is for Read-Only access. Another embodiment of the invention is to reconfigure the memory access switch so that a particular range of addresses may be in shared memory for one memory configuration, but may be addressable by a single PRN for other configurations.
While not being limited to any particular performance indicator or diagnostic identifier, preferred embodiments of the invention can be identified one at a time by testing for the substantially highest performance. The test for the substantially highest performance can be carried out without undue experimentation by the use of a simple and conventional benchmark (speed) experiment.
The term substantially, as used herein, is defined as at least approaching a given state (e.g., preferably within 10% of, more preferably within 1% of, and most preferably within 0.1% of). The term coupled, as used herein, is defined as connected, although not necessarily directly, and not necessarily mechanically. The term means, as used herein, is defined as hardware, firmware and/or software for achieving a result. The term program or phrase computer program, as used herein, is defined as a sequence of instructions designed for execution on a computer system. A program may include a subroutine, a function, a procedure, an object method, an object implementation, an executable application, an applet, a servlet, a source code, an object code, and/or other sequence of instructions designed for execution on a computer system.
Practical Applications of the Invention
A practical application of the invention that has value within the technological arts is waveform transformation. Further, the invention is useful in conjunction with data input and transformation (such as are used for the purpose of speech recognition), or in conjunction with transforming the appearance of a display (such as are used for the purpose of video games), or the like. There are virtually innumerable uses for the invention, all of which need not be detailed here.
Advantages of the Invention
A system, representing an embodiment of the invention, can be cost effective and advantageous for at least the following reasons. The invention improves the speed of parallel computing systems. The invention improves the scalability of parallel computing systems.
All the disclosed embodiments of the invention described herein can be realized and practiced without undue experimentation. Although the best mode of carrying out the invention contemplated by the inventors is disclosed above, practice of the invention is not limited thereto. Accordingly, it will be appreciated by those skilled in the art that the invention may be practiced otherwise than as specifically described herein.
For example, although the switchable shared-memory cluster described herein can be a separate module, it will be manifest that the switchable shared-memory cluster may be integrated into the system with which it is associated. Furthermore, all the disclosed elements and features of each disclosed embodiment can be combined with, or substituted for, the disclosed elements and features of every other disclosed embodiment except where such elements or features are mutually exclusive.
It will be manifest that various additions, modifications and rearrangements of the features of the invention may be made without deviating from the spirit and scope of the underlying inventive concept. It is intended that the scope of the invention as defined by the appended claims and their equivalents cover all such additions, modifications, and rearrangements.
The appended claims are not to be interpreted as including means-plus-function limitations, unless such a limitation is explicitly recited in a given claim using the phrase "means for." Expedient embodiments of the invention are differentiated by the appended subclaims.

Claims

CLAIMS
What is claimed is:
1. An apparatus, comprising: a shared memory unit; a first processing node coupled to said shared memory unit; and a second processing node coupled to said shared memory unit, wherein a memory page can be switched from a shared state to a private state or from said private state to a shared state.
2. The apparatus of claim 1, wherein said memory page is dynamically switchable between said shared state and said private state.
3. The apparatus of claim 1, wherein said memory page is switched by changing characteristics of said memory page.
4. The apparatus of claim 1, wherein said memory page composes one of a plurality of banks of memory, each of said banks separately switchable from said shared state to said private state or from said private state to a shared state.
5. A computer system comprising the apparatus of claim 1.
6. A method, comprising switching a page of memory from a shared state to a private state or from a private state to a shared state.
7. The method of claim 6, wherein said page of memory is switched from the shared state to the private state; and then further comprising switching said page of memory from the private state to a shared state.
8. The method of claim 6, wherein said page of memory is switched from the private state to the shared state; and then further comprising switching said page of memory from the shared state to a private state.
9. The method of claim 6, wherein said page of memory composes one of a plurality of ranges of memory.
10. The method of claim 6, further comprising releasing a lock on said page of memory before switching said page of memory from said shared state to said private state or from said private state to said shared state.
11. The method of claim 10, further comprising requesting the release of the lock on said page of memory before releasing the lock.
12. The method of claim 11, wherein requesting the release includes providing an augmentation selected from the group consisting of exclusive private access requested; shared private access requested; exclusive private access required; and shared private access required.
13. An electronic media, comprising: a computer program adapted to switch a page of memory from a shared state to a private state or from a private state to a shared state.
14. A computer program comprising computer program means adapted to perform the steps of switching a page of memory from a shared state to a private state or from a private state to a shared state when said computer program is run on a computer.
15. A computer program as claimed in claim 14, embodied on a computer-readable medium.
16. A system, comprising a multiplicity of workstations and a shared memory unit (SMU) interconnected and arranged such that memory accesses by a given workstation in a first set of address ranges will be to its local, private memory and that memory accesses to a second set of address ranges will be to shared memory, and arranged such that said accesses to shared memory are recorded and signaled by a Cache Emulation Adapter (CEA) capable of recognizing and responding to cache-coherence signals within the workstation.
17. The system of claim 16, further comprising means for signaling between the CEA and the SMU to LOAD or STORE shared-memory values from or to the SMU.
18. The system of claim 17, further comprising means for signaling between the CEA units and the SMU sufficient to satisfy cache coherence operations across the system when said cache coherence operations involve shared memory.
19. The system of claim 16, wherein the SMU of said system includes a directory for keeping track of which workstation has which cache line.
20. The system of claim 16, wherein said SMU cache directory includes means for keeping track of the ownership status of each workstation-owned cache line (READ SHARED, READ EXCLUSIVE, WRITE).
21. The system of claim 16, wherein each CEA includes a directory of which workstation has which cache line.
22. The system of claim 16, wherein said CEA cache directory includes means for keeping track of the ownership status of each workstation-owned cache line (READ SHARED, READ EXCLUSIVE, WRITE).
23. The system of claim 16, wherein said CEA includes caching of shared-memory accesses.
24. A parallel computing system which has multiple PRNs, each with private memory, and which also is provided with some shared memory, and means to switch some portion of memory on a dynamic basis from shared memory to private memory.
25. A system of claim 24, in which the memory which is switched may be switched to one particular PRN at one time and then may be switched to another PRN at another time, and to fully shared at yet other times.
26. A system of claim 25, in which memory which is switched to private memory may be cached by one PRN for exclusive (Read and Write) use for a period of time.
PCT/US2000/024039 1999-08-31 2000-08-31 Switchable shared-memory cluster WO2001016760A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU71007/00A AU7100700A (en) 1999-08-31 2000-08-31 Switchable shared-memory cluster

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US15215199P 1999-08-31 1999-08-31
US60/152,151 1999-08-31
US60/220,794 2000-07-25
US22097400P 2000-07-26 2000-07-26
US22074800P 2000-07-26 2000-07-26
US60/220,748 2000-07-26

Publications (1)

Publication Number Publication Date
WO2001016760A1 true WO2001016760A1 (en) 2001-03-08

Family

ID=27387201

Family Applications (9)

Application Number Title Priority Date Filing Date
PCT/US2000/024150 WO2001016738A2 (en) 1999-08-31 2000-08-31 Efficient page ownership control
PCT/US2000/024298 WO2001016743A2 (en) 1999-08-31 2000-08-31 Shared memory disk
PCT/US2000/024329 WO2001016750A2 (en) 1999-08-31 2000-08-31 High-availability, shared-memory cluster
PCT/US2000/024217 WO2001016741A2 (en) 1999-08-31 2000-08-31 Semaphore control of shared-memory
PCT/US2000/024147 WO2001016737A2 (en) 1999-08-31 2000-08-31 Cache-coherent shared-memory cluster
PCT/US2000/024216 WO2001016761A2 (en) 1999-08-31 2000-08-31 Efficient page allocation
PCT/US2000/024039 WO2001016760A1 (en) 1999-08-31 2000-08-31 Switchable shared-memory cluster
PCT/US2000/024248 WO2001016742A2 (en) 1999-08-31 2000-08-31 Network shared memory
PCT/US2000/024210 WO2001016740A2 (en) 1999-08-31 2000-08-31 Efficient event waiting

Family Applications Before (6)

Application Number Title Priority Date Filing Date
PCT/US2000/024150 WO2001016738A2 (en) 1999-08-31 2000-08-31 Efficient page ownership control
PCT/US2000/024298 WO2001016743A2 (en) 1999-08-31 2000-08-31 Shared memory disk
PCT/US2000/024329 WO2001016750A2 (en) 1999-08-31 2000-08-31 High-availability, shared-memory cluster
PCT/US2000/024217 WO2001016741A2 (en) 1999-08-31 2000-08-31 Semaphore control of shared-memory
PCT/US2000/024147 WO2001016737A2 (en) 1999-08-31 2000-08-31 Cache-coherent shared-memory cluster
PCT/US2000/024216 WO2001016761A2 (en) 1999-08-31 2000-08-31 Efficient page allocation

Family Applications After (2)

Application Number Title Priority Date Filing Date
PCT/US2000/024248 WO2001016742A2 (en) 1999-08-31 2000-08-31 Network shared memory
PCT/US2000/024210 WO2001016740A2 (en) 1999-08-31 2000-08-31 Efficient event waiting

Country Status (4)

Country Link
EP (3) EP1214651A2 (en)
AU (9) AU6949700A (en)
CA (3) CA2382929A1 (en)
WO (9) WO2001016738A2 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1895413A3 (en) * 2006-08-18 2009-09-30 Fujitsu Limited Access monitoring method and device for shared memory
EP2851807A1 (en) * 2013-05-28 2015-03-25 Huawei Technologies Co., Ltd. Method and system for supporting resource isolation under multi-core architecture

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040205217A1 (en) * 2001-07-13 2004-10-14 Maria Gabrani Method of running a media application and a media system with job control
US6999998B2 (en) 2001-10-04 2006-02-14 Hewlett-Packard Development Company, L.P. Shared memory coupling of network infrastructure devices
US6920485B2 (en) 2001-10-04 2005-07-19 Hewlett-Packard Development Company, L.P. Packet processing in shared memory multi-computer systems
US7254745B2 (en) 2002-10-03 2007-08-07 International Business Machines Corporation Diagnostic probe management in data processing systems
US7685381B2 (en) 2007-03-01 2010-03-23 International Business Machines Corporation Employing a data structure of readily accessible units of memory to facilitate memory access
US7899663B2 (en) 2007-03-30 2011-03-01 International Business Machines Corporation Providing memory consistency in an emulated processing environment
US9442780B2 (en) 2011-07-19 2016-09-13 Qualcomm Incorporated Synchronization of shader operation
US9064437B2 (en) * 2012-12-07 2015-06-23 Intel Corporation Memory based semaphores

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE4238593A1 (en) * 1992-11-16 1994-05-19 Ibm Multiprocessor computer system
EP0908825A1 (en) * 1997-10-10 1999-04-14 BULL HN INFORMATION SYSTEMS ITALIA S.p.A. A data-processing system with cc-NUMA (cache coherent, non-uniform memory access) architecture and remote access cache incorporated in local memory
US5940870A (en) * 1996-05-21 1999-08-17 Industrial Technology Research Institute Address translation for shared-memory multiprocessor clustering

Family Cites Families (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3668644A (en) * 1970-02-09 1972-06-06 Burroughs Corp Failsafe memory system
US4484262A (en) * 1979-01-09 1984-11-20 Sullivan Herbert W Shared memory computer method and apparatus
US4403283A (en) * 1980-07-28 1983-09-06 Ncr Corporation Extended memory system and method
US4414624A (en) * 1980-11-19 1983-11-08 The United States Of America As Represented By The Secretary Of The Navy Multiple-microcomputer processing
US4725946A (en) * 1985-06-27 1988-02-16 Honeywell Information Systems Inc. P and V instructions for semaphore architecture in a multiprogramming/multiprocessing environment
JPH063589B2 (en) * 1987-10-29 1994-01-12 インターナシヨナル・ビジネス・マシーンズ・コーポレーシヨン Address replacement device
US5175839A (en) * 1987-12-24 1992-12-29 Fujitsu Limited Storage control system in a computer system for double-writing
DE68925064T2 (en) * 1988-05-26 1996-08-08 Hitachi Ltd Task execution control method for a multiprocessor system with post / wait procedure
US4992935A (en) * 1988-07-12 1991-02-12 International Business Machines Corporation Bit map search by competitive processors
US4965717A (en) * 1988-12-09 1990-10-23 Tandem Computers Incorporated Multiple processor system having shared memory with private-write capability
DE69124285T2 (en) * 1990-05-18 1997-08-14 Fujitsu Ltd Data processing system with an input / output path separation mechanism and method for controlling the data processing system
US5206952A (en) * 1990-09-12 1993-04-27 Cray Research, Inc. Fault tolerant networking architecture
US5434970A (en) * 1991-02-14 1995-07-18 Cray Research, Inc. System for distributed multiprocessor communication
JPH04271453A (en) * 1991-02-27 1992-09-28 Toshiba Corp Composite electronic computer
EP0528538B1 (en) * 1991-07-18 1998-12-23 Tandem Computers Incorporated Mirrored memory multi processor system
US5315707A (en) * 1992-01-10 1994-05-24 Digital Equipment Corporation Multiprocessor buffer system
US5398331A (en) * 1992-07-08 1995-03-14 International Business Machines Corporation Shared storage controller for dual copy shared data
US5434975A (en) * 1992-09-24 1995-07-18 At&T Corp. System for interconnecting a synchronous path having semaphores and an asynchronous path having message queuing for interprocess communications
JP2963298B2 (en) * 1993-03-26 1999-10-18 富士通株式会社 Recovery method of exclusive control instruction in duplicated shared memory and computer system
US5590308A (en) * 1993-09-01 1996-12-31 International Business Machines Corporation Method and apparatus for reducing false invalidations in distributed systems
US5664089A (en) * 1994-04-26 1997-09-02 Unisys Corporation Multiple power domain power loss detection and interface disable
US5636359A (en) * 1994-06-20 1997-06-03 International Business Machines Corporation Performance enhancement system and method for a hierarchical data cache using a RAID parity scheme
US6587889B1 (en) * 1995-10-17 2003-07-01 International Business Machines Corporation Junction manager program object interconnection and method
US5784699A (en) * 1996-05-24 1998-07-21 Oracle Corporation Dynamic memory allocation in a computer using a bit map index
JPH10142298A (en) * 1996-11-15 1998-05-29 Advantest Corp Testing device for ic device
US5829029A (en) * 1996-12-18 1998-10-27 Bull Hn Information Systems Inc. Private cache miss and access management in a multiprocessor system with shared memory
US5918248A (en) * 1996-12-30 1999-06-29 Northern Telecom Limited Shared memory control algorithm for mutual exclusion and rollback
US6360303B1 (en) * 1997-09-30 2002-03-19 Compaq Computer Corporation Partitioning memory shared by multiple processors of a distributed processing system

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE4238593A1 (en) * 1992-11-16 1994-05-19 Ibm Multiprocessor computer system
US5940870A (en) * 1996-05-21 1999-08-17 Industrial Technology Research Institute Address translation for shared-memory multiprocessor clustering
EP0908825A1 (en) * 1997-10-10 1999-04-14 BULL HN INFORMATION SYSTEMS ITALIA S.p.A. A data-processing system with cc-NUMA (cache coherent, non-uniform memory access) architecture and remote access cache incorporated in local memory

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"DISTRIBUTED PROCESS BULLETIN BOARD", IBM TECHNICAL DISCLOSURE BULLETIN,US,IBM CORP. NEW YORK, vol. 33, no. 10A, 1 March 1991 (1991-03-01), pages 1 - 5, XP000109937, ISSN: 0018-8689 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1895413A3 (en) * 2006-08-18 2009-09-30 Fujitsu Limited Access monitoring method and device for shared memory
EP2851807A1 (en) * 2013-05-28 2015-03-25 Huawei Technologies Co., Ltd. Method and system for supporting resource isolation under multi-core architecture
EP2851807A4 (en) * 2013-05-28 2015-04-22 Huawei Tech Co Ltd Method and system for supporting resource isolation under multi-core architecture
US9411646B2 (en) 2013-05-28 2016-08-09 Huawei Technologies Co., Ltd. Booting secondary processors in multicore system using kernel images stored in private memory segments

Also Published As

Publication number Publication date
WO2001016738A8 (en) 2001-05-03
WO2001016740A2 (en) 2001-03-08
WO2001016738A9 (en) 2002-09-12
WO2001016750A2 (en) 2001-03-08
WO2001016743A3 (en) 2001-08-09
WO2001016737A3 (en) 2001-11-08
AU6949600A (en) 2001-03-26
CA2382927A1 (en) 2001-03-08
EP1214652A2 (en) 2002-06-19
WO2001016743A8 (en) 2001-10-18
WO2001016742A2 (en) 2001-03-08
CA2382929A1 (en) 2001-03-08
AU7110000A (en) 2001-03-26
WO2001016743A2 (en) 2001-03-08
WO2001016761A2 (en) 2001-03-08
WO2001016750A3 (en) 2002-01-17
AU7108300A (en) 2001-03-26
AU6949700A (en) 2001-03-26
AU7108500A (en) 2001-03-26
WO2001016742A3 (en) 2001-09-20
WO2001016761A3 (en) 2001-12-27
WO2001016738A3 (en) 2001-10-04
AU7113600A (en) 2001-03-26
WO2001016741A2 (en) 2001-03-08
WO2001016740A3 (en) 2001-12-27
WO2001016741A3 (en) 2001-09-20
EP1214651A2 (en) 2002-06-19
AU7100700A (en) 2001-03-26
WO2001016737A2 (en) 2001-03-08
CA2382728A1 (en) 2001-03-08
EP1214653A2 (en) 2002-06-19
AU7112100A (en) 2001-03-26
AU7474200A (en) 2001-03-26
WO2001016738A2 (en) 2001-03-08

Similar Documents

Publication Publication Date Title
US6345352B1 (en) Method and system for supporting multiprocessor TLB-purge instructions using directed write transactions
US5787480A (en) Lock-up free data sharing
US5802585A (en) Batched checking of shared memory accesses
US5933598A (en) Method for sharing variable-grained memory of workstations by sending particular block including line and size of the block to exchange shared data structures
Scales et al. Fine-grain software distributed shared memory on SMP clusters
Vaidyanathan et al. Improving concurrency and asynchrony in multithreaded MPI applications using software offloading
US8902915B2 (en) Dataport and methods thereof
EP0619898A1 (en) Computer system with two levels of guests
US7769962B2 (en) System and method for thread creation and memory management in an object-oriented programming environment
WO2001016760A1 (en) Switchable shared-memory cluster
US6457107B1 (en) Method and apparatus for reducing false sharing in a distributed computing environment
EP0480858A2 (en) Hardware primary directory lock
Blumrich et al. Two virtual memory mapped network interface designs
US7073004B2 (en) Method and data processing system for microprocessor communication in a cluster-based multi-processor network
KR100978083B1 (en) Procedure calling method in shared memory multiprocessor and computer-redable recording medium recorded procedure calling program
US6810464B1 (en) Multiprocessor computer system for processing communal locks employing mid-level caches
Heinlein et al. Integrating multiple communication paradigms in high performance multiprocessors
Aude et al. The MULTIPLUS/MULPLIX parallel processing environment
US7194585B2 (en) Coherency controller management of transactions
CN117311833A (en) Storage control method and device, electronic equipment and readable storage medium
JPH03296159A (en) Memory access system for dma device
Dwarkadas et al. CASHMERE-2L: Software Coherent Shared Memory on a Clustered Remote-Write Network
Gupta Multiple Protocol Engines for a Directory based Cache Coherent DSM Multiprocessor
KR20020063365A (en) Real time memory management method of multi processor system
Cordsen et al. Improving Communication Support for Parallel Applications⋆

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US US US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP

DPE2 Request for preliminary examination filed before expiration of 19th month from priority date (pct application filed from 20040101)