WO1997022054A3 - Processor redundancy in a distributed system - Google Patents

Processor redundancy in a distributed system

Info

Publication number
WO1997022054A3
Authority
WO
WIPO (PCT)
Prior art keywords
processor
software
catastrophe
plan
creation
Prior art date
Application number
PCT/SE1996/001609
Other languages
French (fr)
Other versions
WO1997022054A2 (en)
Inventor
Lars Ulrik Jensen
Original Assignee
Lars Ulrik Jensen
Ericsson Telefon Ab L M
Priority date
Filing date
Publication date
Application filed by Lars Ulrik Jensen, Ericsson Telefon Ab L M
Priority to AU10488/97A (AU1048897A)
Publication of WO1997022054A2
Publication of WO1997022054A3

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/22 Arrangements for supervision, monitoring or testing
    • H04M3/24 Arrangements for supervision, monitoring or testing with provision for checking the normal operation
    • H04M3/241 Arrangements for supervision, monitoring or testing with provision for checking the normal operation for stored program controlled exchanges
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2023 Failover techniques
    • G06F11/203 Failover techniques using migration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2035 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant without idle spare hardware
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04Q SELECTING
    • H04Q3/00 Selecting arrangements
    • H04Q3/42 Circuit arrangements for indirect selecting controlled by common circuits, e.g. register controller, marker
    • H04Q3/54 Circuit arrangements for indirect selecting controlled by common circuits, e.g. register controller, marker in which the logic circuitry controlling the exchange is centralised
    • H04Q3/545 Circuit arrangements for indirect selecting controlled by common circuits, e.g. register controller, marker in which the logic circuitry controlling the exchange is centralised using a stored programme
    • H04Q3/54575 Software application
    • H04Q3/54591 Supervision, e.g. fault localisation, traffic measurements, avoiding errors, failure recovery, monitoring, statistical analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2048 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant where the redundant components share neither address space nor persistent storage

Abstract

A method of automatically recovering from multiple permanent failures of processors in a distributed processor system, in particular a software-driven telecommunication system. The method involves the creation of an initial configuration describing each processor and the software objects executing on it and, for each processor, the creation of a catastrophe plan to be followed if that processor fails. A catastrophe plan contains information on how to redistribute the software objects executing on the faulty processor to the operating processors of the processor system. If a processor goes down, its software objects are transferred to operating processors following the catastrophe plan for the faulty processor. A hardware model and a software model of the processor system and its software are presented. A software object that has a hardware dependency is handled by the model.
PCT/SE1996/001609 1995-12-08 1996-12-06 Processor redundancy in a distributed system WO1997022054A2 (en)
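
The scheme summarised in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the patent text: the dictionary layout of the configuration, the function names `build_catastrophe_plans` and `apply_catastrophe_plan`, the round-robin redistribution policy, and the choice to rebuild the plans after each failure so that further failures can be handled. Hardware dependencies of software objects are not modelled here.

```python
# Illustrative sketch only: names, data layout and the round-robin policy
# are assumptions, not the method claimed in the patent.

def build_catastrophe_plans(config):
    """For every processor, plan where each of its objects goes if it fails.

    config maps a processor name to the set of software objects it runs.
    The result maps a processor name to {object: target processor}.
    """
    plans = {}
    for proc, objs in config.items():
        survivors = sorted(p for p in config if p != proc)
        if not survivors:
            # Last remaining processor: nowhere to move its objects.
            plans[proc] = {}
            continue
        # Assumed policy: spread the objects round-robin over the survivors.
        plans[proc] = {
            obj: survivors[i % len(survivors)]
            for i, obj in enumerate(sorted(objs))
        }
    return plans


def apply_catastrophe_plan(config, plans, failed):
    """Move the failed processor's objects as its plan says.

    Returns the new configuration (without the failed processor) and a
    fresh set of plans for it, so a later failure can also be handled.
    """
    new_config = {p: set(objs) for p, objs in config.items() if p != failed}
    for obj, target in plans[failed].items():
        new_config[target].add(obj)
    return new_config, build_catastrophe_plans(new_config)


# Initial configuration: three processors and the objects they execute.
initial = {"P1": {"a", "b"}, "P2": {"c"}, "P3": {"d"}}
plans = build_catastrophe_plans(initial)

# Two successive permanent failures.
config, plans = apply_catastrophe_plan(initial, plans, "P1")
config, plans = apply_catastrophe_plan(config, plans, "P2")
print(config)
```

Because the plans are computed before any failure occurs, the recovery step itself is only a table lookup; recomputing the plans afterwards is one possible way to cover multiple failures.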

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU10488/97A AU1048897A (en) 1995-12-08 1996-12-06 Processor redundancy in a distributed system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
SE9504396A SE515348C2 (en) 1995-12-08 1995-12-08 Processor redundancy in a distributed system
SE9504396-4 1995-12-08

Publications (2)

Publication Number Publication Date
WO1997022054A2 (en) 1997-06-19
WO1997022054A3 (en) 1997-09-04

Family

ID=20400521

Family Applications (1)

Application Number Priority Date Filing Date Title
PCT/SE1996/001609 1995-12-08 1996-12-06 Processor redundancy in a distributed system (WO1997022054A2)

Country Status (3)

Country Link
AU (1) AU1048897A (en)
SE (2) SE515348C2 (en)
WO (1) WO1997022054A2 (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6029168A (en) 1998-01-23 2000-02-22 Tricord Systems, Inc. Decentralized file mapping in a striped network file system in a distributed computing environment
US6055227A (en) * 1998-04-02 2000-04-25 Lucent Technologies, Inc. Method for creating and modifying similar and dissimilar databases for use in network configurations for telecommunication systems
DE19836347C2 (en) 1998-08-11 2001-11-15 Ericsson Telefon Ab L M Fault-tolerant computer system
US6530036B1 (en) * 1999-08-17 2003-03-04 Tricord Systems, Inc. Self-healing computer system storage
US6449731B1 (en) 1999-03-03 2002-09-10 Tricord Systems, Inc. Self-healing computer system storage
US6725392B1 (en) 1999-03-03 2004-04-20 Adaptec, Inc. Controller fault recovery system for a distributed file system
FI108599B (en) 1999-04-14 2002-02-15 Ericsson Telefon Ab L M Recovery in Mobile Systems
GB2359384B (en) * 2000-02-16 2004-06-16 Data Connection Ltd Automatic reconnection of partner software processes in a fault-tolerant computer system
US7715837B2 (en) * 2000-02-18 2010-05-11 Telefonaktiebolaget Lm Ericsson (Publ) Method and apparatus for releasing connections in an access network
US7058847B1 (en) * 2002-12-30 2006-06-06 At&T Corporation Concept of zero network element mirroring and disaster restoration process
US7287179B2 (en) 2003-05-15 2007-10-23 International Business Machines Corporation Autonomic failover of grid-based services
DE10328661A1 (en) 2003-06-26 2005-01-13 Deutsche Telekom Ag Telecommunication network organizing method e.g. for exceptional situations, involves central server having telecommunications network and software for organization and or execution of switching of telecommunications connections

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4371754A (en) * 1980-11-19 1983-02-01 Rockwell International Corporation Automatic fault recovery system for a multiple processor telecommunications switching control
US4710926A (en) * 1985-12-27 1987-12-01 American Telephone And Telegraph Company, At&T Bell Laboratories Fault recovery in a distributed processing system

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
DISTRIBUTED PROCESSING - PROCEEDINGS OF THE IFIP WW6 10:3...., October 1987, A-M. DEPLANCHE et al., "Task Redistribution with Allocation Constraints in a Fault-Tolerant Real-Time Multiprocessor System", pages 136-143. *
IEEE TRANS. ON PARALLEL AND DISTRIBUTED SYSTEMS, Volume 4, No. 8, August 1993, N-F. TZENG, "Reconfiguration and Analysis of a Fault-Tolerant Circular Butterfly Parallel System", pages 855-863. *
IEEE TRANS. ON RELIABILITY, Volume 38, No. 1, April 1989, C-M. CHEN et al., "Reliability Issues with Multiprocessor Distributed Database Systems: A Case Study", pages 153-155. *
PATENT ABSTRACTS OF JAPAN, Vol. 96, No. 01; & JP,A,07 234 849 (HITACHI LTD), 5 Sept. 1995. *
SPECIAL INTEREST GROUP ON MANAGEMENT OF DATA, No. 2, 1995, L.D. MOLESKY et al., "Recovery Protocols for Shared Memory Database Systems", pages 11-22. *

Also Published As

Publication number Publication date
SE9703132L (en)
AU1048897A (en) 1997-07-03
SE515348C2 (en) 2001-07-16
SE9703132A0 (en) 1997-08-29
SE9504396L (en) 1997-06-09
WO1997022054A2 (en) 1997-06-19
SE9504396D0 (en) 1995-12-08
SE9703132D0 (en) 1997-08-29

Similar Documents

Publication Title
CA2150059A1 (en) Progressive Retry Method and Apparatus Having Reusable Software Modules for Software Failure Recovery in Multi-Process Message-Passing Applications
DE69311797D1 (en) FAULT-TOLERANT COMPUTER SYSTEM WITH DEVICE FOR PROCESSING EXTERNAL EVENTS
WO1997022054A3 (en) Processor redundancy in a distributed system
CA2294654C (en) Fault-tolerant java virtual machine
US7058957B1 (en) Cluster event notification system
CA2265158A1 (en) Method and apparatus for correct and complete transactions in a fault tolerant distributed database system
CA2240347A1 (en) Methods and systems for reconstructing the state of a computation
CA2224689A1 (en) Remote monitoring of computer programs
DE69635669D1 (en) LOSER COUPLED COMPUTER ASSEMBLY WITH MASS MEMORY
CA2151254A1 (en) A system for taking backup in a data base
WO1999026133A3 (en) Method for maintaining the synchronized execution in fault resilient/fault tolerant computer systems
CA2270462A1 (en) Regeneration agent for back-up software
GB0410972D0 (en) Dynamic RDF groups
DE69122713D1 (en) FAULT-TOLERANT COMPUTER SYSTEM
GB2301464B (en) Multi-server fault tolerance using in-band signalling
EP0315303A3 (en) Duplicated fault-tolerant computer system with error checking
WO2003081430A3 (en) Improvements relating to fault-tolerant computers
WO2003090082A3 (en) Method and system for disaster recovery
WO2001040944A3 (en) Method and system for recovery infrastructure for computer systems
WO2004061665A3 (en) Service continuity data backup of network elements
WO2004079513A3 (en) System and method for determining when an ejb compiler needs to be executed
US5835698A (en) Unilaterally-controlled, time-insensitive, data-link recovery apparatus and method
EP0784274A3 (en) Data processing system with a plurality of storage units and a backup storage unit
DE69430649D1 (en) Fault-tolerant computer systems
WO2002099642A8 (en) A computer with fault-tolerant booting

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE HU IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK TJ TM TR TT UA UG US UZ VN AM AZ BY KG KZ MD RU TJ TM

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): KE LS MW SD SZ UG AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 2004-01-01)
121 EP: the EPO has been informed by WIPO that EP was designated in this application
NENP Non-entry into the national phase

Ref country code: JP

Ref document number: 97521980

Format of ref document f/p: F

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 EP: PCT application non-entry in European phase