Implementation of a Fault Tolerant Computing Testbed

Author: David C. Summers
Publisher: Unknown
Total Pages: 185
Release: 2000-06-01
Genre: Electronic Book
ISBN: 1423536614

With spacecraft designs placing more emphasis on reduced cost, faster design time, and higher performance, it is easy to understand why more commercial-off-the-shelf (COTS) devices are being used in space-based applications. COTS devices offer spacecraft designers shorter design-to-orbit times, lower system costs, orders of magnitude better performance, and much better software availability than their radiation-hardened (radhard) counterparts. The major drawback to using COTS devices in space is their increased susceptibility to the effects of radiation, single event upsets (SEUs) in particular. This thesis will focus on the implementation of a fault tolerant computer system. The hardware design presented here has two benefits. First, the system can act as a software testbed, which allows testing of software fault tolerance techniques in the presence of radiation-induced SEUs. This allows the software algorithms to be tested in the environment they were designed to operate in without the expense of being placed in orbit. Second, the design can be used as a hybrid fault tolerant computer system. By combining the masking ability of the hardware with supporting software, the system can mask out and reset processor errors in real time. The design layout will be presented using OrCAD schematics.

Fault Tolerance Techniques for High Performance Computing

Author: Thomas Herault, Yves Robert
Publisher: Springer
Total Pages: 320
Release: 2015-07-01
Genre: Computers
ISBN: 9783319209432

This timely text presents a comprehensive overview of fault tolerance techniques for high-performance computing (HPC). The text opens with a detailed introduction to the concepts of checkpoint protocols and scheduling algorithms, prediction, replication, silent error detection and correction, together with some application-specific techniques such as ABFT. Emphasis is placed on analytical performance models. This is then followed by a review of general-purpose techniques, including several checkpoint and rollback recovery protocols. Relevant execution scenarios are also evaluated and compared through quantitative models. Features: provides a survey of resilience methods and performance models; examines the various sources for errors and faults in large-scale systems; reviews the spectrum of techniques that can be applied to design a fault-tolerant MPI; investigates different approaches to replication; discusses the challenge of energy consumption of fault-tolerance methods in extreme-scale systems.
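As a rough illustration of the kind of analytical checkpoint model the description mentions, the sketch below uses Young's well-known first-order approximation for the checkpoint interval, T = sqrt(2 * C * MTBF). This is an assumption chosen for illustration, not necessarily the model the book develops; the function names and the sample figures (60 s checkpoint cost, one failure per day) are hypothetical.

```python
import math


def young_interval(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """First-order (Young) estimate of the checkpoint interval in seconds.

    checkpoint_cost_s: time to write one checkpoint (C).
    mtbf_s: mean time between failures of the platform (MTBF).
    """
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


def first_order_waste(interval_s: float, checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Approximate fraction of time lost to checkpointing and re-execution.

    C / T is the checkpoint overhead; T / (2 * MTBF) is the expected
    work lost per failure, amortized over time. Restart and downtime
    costs are ignored in this simplified model.
    """
    return checkpoint_cost_s / interval_s + interval_s / (2.0 * mtbf_s)


if __name__ == "__main__":
    cost, mtbf = 60.0, 86_400.0  # hypothetical: 60 s checkpoint, one failure per day
    t_opt = young_interval(cost, mtbf)
    print(f"interval ~ {t_opt:.0f} s, waste ~ {first_order_waste(t_opt, cost, mtbf):.3f}")
```

Checkpointing much more or much less often than this interval increases the modeled waste, which is the trade-off such performance models are meant to quantify.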

Fault Tolerant Computer Architecture

Author: Daniel Sorin
Publisher: Morgan & Claypool Publishers
Total Pages: 116
Release: 2009-07-08
Genre: Technology & Engineering
ISBN: 9781598299540

For many years, most computer architects have pursued one primary goal: performance. Architects have translated the ever-increasing abundance of ever-faster transistors provided by Moore's law into remarkable increases in performance. Recently, however, the bounty provided by Moore's law has been accompanied by several challenges that have arisen as devices have become smaller, including a decrease in dependability due to physical faults. In this book, we focus on the dependability challenge and the fault tolerance solutions that architects are developing to overcome it. The two main purposes of this book are to explore the key ideas in fault-tolerant computer architecture and to present the current state of the art (over approximately the past 10 years) in academia and industry. Table of Contents: Introduction / Error Detection / Error Recovery / Diagnosis / Self-Repair / The Future

Fault Tolerance

Author: Peter A. Lee, Thomas Anderson
Publisher: Springer Science & Business Media
Total Pages: 326
Release: 2012-12-06
Genre: Computers
ISBN: 9783709189900

The production of a new version of any book is a daunting task, as many authors will recognise. In the field of computer science, the task is made even more daunting by the speed with which the subject and its supporting technology move forward. Since the publication of the first edition of this book in 1981 much research has been conducted, and many papers have been written, on the subject of fault tolerance. Our aim then was to present for the first time the principles of fault tolerance together with current practice to illustrate those principles. We believe that the principles have (so far) stood the test of time and are as appropriate today as they were in 1981. Much work on the practical applications of fault tolerance has been undertaken, and techniques have been developed for ever more complex situations, such as those required for distributed systems. Nevertheless, the basic principles remain the same.

Software Fault Tolerance Techniques and Implementation

Author: Laura L. Pullum
Publisher: Artech House
Total Pages: 343
Release: 2001
Genre: Computers
ISBN: 9781580531375

This innovative resource provides the most comprehensive coverage of software fault tolerance techniques as it guides professionals through their design, operation, and performance. It features an in-depth discussion of the advantages and disadvantages of specific techniques, so practitioners can decide which ones are best suited for their work.

Hardware and Software Architectures for Fault Tolerance

Author: Michel Banatre
Publisher: Springer Science & Business Media
Total Pages: 332
Release: 1994-02-28
Genre: Computers
ISBN: 354057767X

Fault tolerance has been an active research area for many years. This volume presents papers from a workshop held in 1993 where a small number of key researchers and practitioners in the area met to discuss the experiences of industrial practitioners, to provide a perspective on the state of the art of fault tolerance research, to determine whether the subject is becoming mature, and to learn from the experiences so far in order to identify what might be important research topics for the coming years. The workshop provided a more intimate environment for discussions and presentations than usual at conferences. The papers in the volume were presented at the workshop, then updated and revised to reflect what was learned at the workshop.

Measurement, Modelling, and Evaluation of Computing Systems and Dependability and Fault Tolerance

Author: Erwin Rathgeb, Klaus Echtle, Bruno Müller-Clostermann
Publisher: Springer
Total Pages: 323
Release: 2010-05-28
Genre: Computers
ISBN: 9783642121043

This book constitutes the refereed proceedings of the 15th International GI/ITG Conference on "Measurement, Modelling and Evaluation of Computing Systems" and "Dependability and Fault Tolerance", held in Essen, Germany, in March 2010. The 19 revised full papers presented together with 5 tool papers and 2 invited lectures were carefully reviewed and selected from 42 initial submissions. The papers cover all aspects of performance and dependability evaluation of systems including networks, computer architectures, distributed systems, software, fault-tolerant and secure systems.

Completion and Testing of a TMR Computing Testbed and Recommendations for a Flight Ready Follow on Design

Author: Damen O. Hofheinz
Publisher: Unknown
Total Pages: 172
Release: 2000-12-01
Genre: Electronic Book
ISBN: 1423532554

This thesis focuses on the completion and hardware testing of a fault tolerant computer system utilizing Triple Modular Redundancy (TMR). Due to the radiation environment in space, electronics in space applications must be designed to accommodate single event phenomena. While radiation-hardened processors are available, they offer lower performance and higher cost than commercial off-the-shelf processors. In order to utilize non-hardened devices, a fault tolerance scheme such as TMR may be implemented to increase reliability in a radiation environment. The design that was completed in this effort is one such implementation. The completion of the hardware design consisted of programming logic devices, implementing hardware design corrections, and designing an overall system controller. The testing effort ranged from basic power and ground verification checks to programming, executing, and evaluating programs in read-only memory. During this phase, additional design changes were implemented to correct design flaws. This thesis also evaluated the preliminary design changes required for a space implementation of this TMR design. These included design changes due to size, power, and weight restrictions. Additionally, a detailed analysis of component survivability was performed based on past radiation testing.
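The core idea of TMR can be sketched in software as a bitwise majority vote over three redundant copies of a word. This is a generic illustration only, not the voter logic of the thesis hardware; the function names `tmr_vote` and `disagreeing_modules` are hypothetical.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Return the bitwise majority of three redundant words.

    Each output bit takes the value held by at least two of the three
    inputs, so a single upset bit in any one copy is masked.
    """
    return (a & b) | (a & c) | (b & c)


def disagreeing_modules(a: int, b: int, c: int) -> list[int]:
    """Return the indices (0, 1, 2) of copies that differ from the voted word.

    A system controller could use this to decide which module to reset
    or resynchronize (an assumed policy, shown for illustration).
    """
    voted = tmr_vote(a, b, c)
    return [i for i, word in enumerate((a, b, c)) if word != voted]


if __name__ == "__main__":
    good = 0b1010
    upset = good ^ 0b0100  # simulate a single event upset in one bit of copy 2
    print(bin(tmr_vote(good, good, upset)), disagreeing_modules(good, good, upset))
```

A vote of this kind masks an error confined to one copy; if two copies are corrupted in the same bit position, the majority itself is wrong, which is why detected disagreements are usually followed by a reset or resynchronization of the faulty module.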