Analyzing and Evaluating the Resilience of Scheduling Scientific Applications on High Performance Computing Systems Using a Simulation-based Methodology

Author: Nitin Sukhija
Publisher:
Total Pages: 172
Release: 2015
Genre:
ISBN:

Large scale systems provide a powerful computing platform for solving large and complex scientific applications. However, the inherent complexity, heterogeneity, wide distribution, and dynamism of the computing environments can lead to performance degradation of the scientific applications executing on these computing systems. Load imbalance arising from a variety of sources such as application, algorithmic, and systemic variations is one of the major contributors to their performance degradation. In general, load balancing is achieved via scheduling. Moreover, frequently occurring resource failures drastically affect the execution of applications running on high performance computing systems. Therefore, the study of deploying support for integrated scheduling and fault-tolerance mechanisms for guaranteeing that applications deployed on computing systems are resilient to failures becomes of paramount importance. Recently, several research initiatives have started to address the issue of resilience. However, the major focus of these efforts was geared more toward achieving system level resilience with less emphasis on achieving resilience at the application level. Therefore, it is increasingly important to extend the concept of resilience to the scheduling techniques at the application level for establishing a holistic approach that addresses the performability of these applications on high performance computing systems. This can be achieved by developing a comprehensive modeling framework that can be used to evaluate the resiliency of such techniques on heterogeneous computing systems for assessing the impact of failures as well as workloads in an integrated way. This dissertation presents an experimental methodology based on discrete event simulation for the analysis and the evaluation of the resilience of scheduling scientific applications on high performance computing systems. 
With the aid of the methodology, a wide class of dependencies between the application and the computing system is captured within a deterministic model that quantifies the performance impact expected from changes in application and system characteristics. The results obtained with the proposed simulation-based performance prediction framework enable an introspective design and investigation of scheduling heuristics, supporting reasoning about how best to optimize various, often antagonistic, objectives such as minimizing application makespan and maximizing reliability.

Resilience Assessment and Evaluation of Computing Systems

Author: Katinka Wolter
Publisher: Springer Science & Business Media
Total Pages: 485
Release: 2012-11-02
Genre: Computers
ISBN: 3642290329

The resilience of computing systems includes their dependability as well as their fault tolerance and security. It defines the ability of a computing system to perform properly in the presence of various kinds of disturbances and to recover from any service degradation. These properties are immensely important in a world where many aspects of our daily life depend on the correct, reliable and secure operation of often large-scale distributed computing systems. Wolter and her co-editors grouped the 20 chapters from leading researchers into seven parts: an introduction and motivating examples, modeling techniques, model-driven prediction, measurement and metrics, testing techniques, case studies, and conclusions. The core is formed by 12 technical papers, which are framed by motivating real-world examples and case studies, thus illustrating the necessity and the application of the presented methods. While the technical chapters are independent of each other and can be read in any order, the reader will benefit more from the case studies if he or she reads them together with the related techniques. The papers combine topics like modeling, benchmarking, testing, performance evaluation, and dependability, and aim at academic and industrial researchers in these areas as well as graduate students and lecturers in related fields. In this volume, they will find a comprehensive overview of the state of the art in a field of continuously growing practical importance.

Scheduling Problems

Author: Rodrigo Righi
Publisher: BoD – Books on Demand
Total Pages: 156
Release: 2020-07-08
Genre: Computers
ISBN: 1789850533

Scheduling is defined as the process of assigning operations to resources over time to optimize a criterion. Scheduling problems comprise both a set of resources and a set of consumers. As such, managing scheduling problems involves managing the use of resources by several consumers. This book presents some new applications and trends related to task and data scheduling. In particular, chapters focus on data science, big data, high-performance computing, and cloud computing environments. In addition, this book presents novel algorithms and literature reviews that will guide current and new researchers who work with load balancing, scheduling, and allocation problems.
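
As a rough illustration of the definition above (assigning operations to resources to optimize a criterion), the following sketch applies the well-known longest-processing-time-first greedy rule to minimize makespan on identical machines. It is not taken from the book; the function name `lpt_schedule` and the sample job durations are assumptions made for this example.

```python
import heapq

def lpt_schedule(durations, n_machines):
    """Greedy longest-processing-time-first assignment (illustrative only).

    Returns (makespan, assignment), where assignment[m] lists the job
    indices placed on machine m.
    """
    # Min-heap of (current load, machine index): least-loaded machine on top.
    heap = [(0, m) for m in range(n_machines)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(n_machines)]
    # Consider jobs from longest to shortest.
    for job in sorted(range(len(durations)), key=lambda j: -durations[j]):
        load, m = heapq.heappop(heap)
        assignment[m].append(job)
        heapq.heappush(heap, (load + durations[job], m))
    # The criterion being optimized: the largest load on any machine.
    makespan = max(load for load, _ in heap)
    return makespan, assignment

if __name__ == "__main__":
    # Hypothetical job durations, chosen only for the example.
    print(lpt_schedule([7, 5, 4, 3, 3, 2], n_machines=2))
```

The greedy rule is a heuristic: it does not guarantee an optimal schedule in general, although it happens to balance the two machines exactly for the sample input.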

Scheduling in Parallel Computing Systems

Author: Shaharuddin Salleh
Publisher: Springer Science & Business Media
Total Pages: 177
Release: 2012-12-06
Genre: Computers
ISBN: 1461550653

Scheduling in Parallel Computing Systems: Fuzzy and Annealing Techniques advocates the viability of using fuzzy and annealing methods in solving scheduling problems for parallel computing systems. The book proposes new techniques for both static and dynamic scheduling, using emerging paradigms that are inspired by natural phenomena such as fuzzy logic, mean-field annealing, and simulated annealing. Systems that are designed using such techniques are often referred to in the literature as 'intelligent' because of their capability to adapt to sudden changes in their environments. Moreover, most of these changes cannot be anticipated in advance or included in the original design of the system. Scheduling in Parallel Computing Systems: Fuzzy and Annealing Techniques provides results that prove such approaches can become viable alternatives to orthodox solutions to the scheduling problem, which are mostly based on heuristics. Although heuristics are robust and reliable when solving certain instances of the scheduling problem, they do not perform well when one needs to obtain solutions to general forms of the scheduling problem. On the other hand, techniques inspired by natural phenomena have been successfully applied for solving a wide range of combinatorial optimization problems (e.g., traveling salesman, graph partitioning). The success of these methods motivated their use in this book to solve scheduling problems that are known to be formidable combinatorial problems. Scheduling in Parallel Computing Systems: Fuzzy and Annealing Techniques is an excellent reference and may be used for advanced courses on the topic.
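
To give a flavor of how simulated annealing can be applied to a scheduling problem, here is a minimal generic sketch that assigns tasks to processors so as to reduce the makespan. It is not the book's algorithm; the names `makespan` and `anneal_schedule`, the cooling schedule, and all parameter defaults are assumptions chosen for illustration.

```python
import math
import random

def makespan(assign, durations, n_proc):
    """Largest total load on any processor for a task-to-processor mapping."""
    loads = [0.0] * n_proc
    for task, proc in enumerate(assign):
        loads[proc] += durations[task]
    return max(loads)

def anneal_schedule(durations, n_proc, steps=5000, t0=10.0, cooling=0.999, seed=0):
    """Generic simulated annealing over task assignments (illustrative only)."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    current = [rng.randrange(n_proc) for _ in durations]
    cur_cost = makespan(current, durations, n_proc)
    best, best_cost = list(current), cur_cost
    temp = t0
    for _ in range(steps):
        # Neighbor move: reassign one randomly chosen task.
        task = rng.randrange(len(durations))
        old = current[task]
        current[task] = rng.randrange(n_proc)
        new_cost = makespan(current, durations, n_proc)
        delta = new_cost - cur_cost
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature falls.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cur_cost = new_cost
            if cur_cost < best_cost:
                best, best_cost = list(current), cur_cost
        else:
            current[task] = old  # reject the move
        temp *= cooling  # geometric cooling
    return best, best_cost

if __name__ == "__main__":
    # Hypothetical task durations, chosen only for the example.
    print(anneal_schedule([4, 3, 3, 2, 2, 2, 1, 1], n_proc=3))
```

Accepting occasional worse moves early on is what lets the search escape poor local assignments; as the temperature drops, the procedure behaves more and more like plain hill climbing.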

Design and Analysis of Scheduling Techniques for Throughput Processors

Author: Adwait Jog
Publisher:
Total Pages:
Release: 2015
Genre:
ISBN:

Throughput Processors such as Graphics Processing Units (GPUs) are becoming an inevitable part of every computing system because of their ability to accelerate applications consisting of abundant parallelism. They are not only used to accelerate big data analytics in cloud data centers or high-performance computing (HPC) systems, but are also employed in mobile and wearable devices for efficient execution of multimedia rich applications and smooth rendering of display. In spite of the highly parallel structure of GPUs and their ability to execute multiple threads concurrently, they are far from achieving their theoretically achievable peak performance. This is attributed to several reasons such as contention for limited shared resources (e.g., caches and memory), high control-flow divergence, and limited off-chip memory bandwidth. Another reason for the low utilization and subpar performance is that the current GPUs are not well-equipped to efficiently and fairly execute multiple applications concurrently, potentially originating from different users. This dissertation is focused on managing contention in GPUs for shared cache and memory resources caused by concurrently executing threads. This contention causes severe loss in performance, fairness, locality, and parallelism. To manage this contention, this dissertation proposes techniques that are employed at two different places: core and memory. First, this dissertation shows that by intelligently scheduling the threads at the core, the generated memory request patterns can be more amenable for existing resource management techniques such as cache replacement and memory scheduling as well as performance enhancement techniques such as data prefetching. Second, this dissertation shows that considering criticality and other application characteristics to schedule memory requests at the memory controller is an effective way to manage contention at the memory.

Foundations of Real-Time Computing: Scheduling and Resource Management

Author: André M. van Tilborg
Publisher: Springer Science & Business Media
Total Pages: 346
Release: 1991-07-31
Genre: Computers
ISBN: 9780792391661

This volume contains a selection of papers that focus on the state-of-the-art in real-time scheduling and resource management. Preliminary versions of these papers were presented at a workshop on the foundations of real-time computing sponsored by the Office of Naval Research in October, 1990 in Washington, D.C. A companion volume by the title Foundations of Real-Time Computing: Formal Specifications and Methods complements this book by addressing many of the most advanced approaches currently being investigated in the arena of formal specification and verification of real-time systems. Together, these two texts provide a comprehensive snapshot of current insights into the process of designing and building real-time computing systems on a scientific basis. Many of the papers in this book take care to define the notion of real-time system precisely, because it is often easy to misunderstand what is meant by that term. Different communities of researchers variously use the term real-time to refer to either very fast computing, or immediate on-line data acquisition, or deadline-driven computing. This text is concerned with the very difficult problems of scheduling tasks and resource management in computer systems whose performance is inextricably fused with the achievement of deadlines. Such systems have been enabled for a rapidly increasing set of diverse end-uses by the unremitting advances in computing power per constant-dollar cost and per constant-unit-volume of space. End-use applications of deadline-driven real-time computers span a spectrum that includes transportation systems, robotics and manufacturing, aerospace and defense, industrial process control, and telecommunications.

Handbook of Scheduling

Author: Joseph Y-T. Leung
Publisher: CRC Press
Total Pages: 1215
Release: 2004-04-27
Genre: Business & Economics
ISBN: 0203489802

This handbook provides full coverage of the most recent and advanced topics in scheduling, assembling researchers from all relevant disciplines to facilitate new insights. The handbook is presented in six parts. The experts first provide introductory material, complete with tutorials and algorithms, then examine classical scheduling problems. Part 3 explores scheduling models that originate in areas such as computer science and operations research. The following section examines scheduling problems that arise in real-time systems. Part 5 discusses stochastic scheduling and queueing networks, and the final section discusses a range of applications in a variety of areas, from airlines to hospitals.

Fair Scheduling in High Performance Computing Environments

Author: Art Sedighi
Publisher:
Total Pages:
Release: 2019
Genre: Computer scheduling
ISBN: 9783030145699

This book introduces a new scheduler that fairly and efficiently distributes system resources among many users with varying usage patterns who compete for them in large shared computing environments. The Rawlsian Fair scheduler developed for this effort is shown to boost performance while reducing delay in high performance computing workloads of certain types, including the following four types examined in this book: i. Class A - similar but complementary workloads; ii. Class B - similar but steady vs intermittent workloads; iii. Class C - large vs small workloads; iv. Class D - large vs noise-like workloads. This new scheduler achieves short-term fairness at small timescales that demand rapid response to varying workloads and usage profiles. The Rawlsian Fair scheduler is shown to consistently benefit workload Classes C and D, whereas it benefits Class A and B workloads only when they become disproportionate as the number of users increases. A simulation framework, dSim, simulates the new Rawlsian Fair scheduling mechanism. dSim helps achieve instantaneous fairness in High Performance Computing environments, effective utilization of computing resources, and user satisfaction through the Rawlsian Fair scheduler.
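
Rawlsian fairness is commonly associated with the maximin principle (improve the position of the worst-off party first). As a loosely related illustration, the sketch below computes a classic max-min fair split of a single divisible resource by water-filling. It is not the Rawlsian Fair scheduler or dSim described in the book; the function name `max_min_fair` and the sample demands are assumptions made for this example.

```python
def max_min_fair(capacity, demands):
    """Water-filling max-min fair allocation of one divisible resource.

    Illustrative only: users with small demands are fully satisfied first,
    and what they leave unused is shared equally among the rest.
    """
    alloc = [0.0] * len(demands)
    remaining = float(capacity)
    # Users not yet decided, ordered by increasing demand.
    active = sorted(range(len(demands)), key=lambda u: demands[u])
    while active and remaining > 1e-12:
        share = remaining / len(active)  # equal split of what is left
        u = active[0]                    # user with the smallest unmet demand
        if demands[u] <= share:
            # The demand fits within the equal share: satisfy it fully and
            # leave the surplus for the remaining users.
            alloc[u] = float(demands[u])
            remaining -= demands[u]
            active.pop(0)
        else:
            # Every remaining demand exceeds the equal share: split evenly.
            for v in active:
                alloc[v] = share
            remaining = 0.0
    return alloc

if __name__ == "__main__":
    # Hypothetical capacity and per-user demands, chosen only for the example.
    print(max_min_fair(10, [2, 2.6, 4, 5]))
```

Under this rule no user can receive more without reducing the allocation of a user who already has less, which is the sense in which the split favors the worst-off participant.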

On the User-scheduler Relationship in High-performance Computing

Author: Cynthia Bailey Lee
Publisher:
Total Pages: 111
Release: 2009
Genre:
ISBN:

To effectively manage High-Performance Computing (HPC) resources, it is essential to maximize return on the substantial infrastructure investment they entail. One prerequisite to success is the ability of the scheduler and user to productively interact. This work develops criteria for measuring productivity, analyzes several aspects of the user-scheduler relationship via user studies, and develops solutions to some vexing barriers between users and schedulers. The five main contributions of this work are as follows. First, this work quantifies the desires of the user population and represents them as a utility function. This contribution is in four parts: a survey-based study collecting utility data from users of a supercomputer system, augmentation of the Standard Workload Format to enable scheduler research using utility functions, and a model for synthetically generating utility function-augmented workloads. Second, a number of the classic scheduling disciplines are evaluated by their ability to maximize aggregate utility of all users, using the synthetic utility functions. These evaluations show the performance impact of inaccurate runtime estimates, contradicting an oft quoted prior result [55] that inaccuracy of estimates leads to better scheduling. Third, a scheduler optimizing the aggregate utility of all users, using a genetic algorithm heuristic, is demonstrated. This contribution includes two software artifacts: an implementation of the genetic algorithm (GA) scheduler, and a modular, extensible scheduler simulation framework that simulates several classic scheduling disciplines and is interoperable with the Standard Workload Format. Fourth, the ability of users to productively interact with this scheduler by providing an accurate estimate of their resource (run time) needs is examined. 
This contribution consists of formalizing a frequent casual assertion from the scheduling literature, that users typically "pad" runtime estimates, into an explicit Padding Hypothesis, and then falsifying the hypothesis via a survey-based study of users of a supercomputer system. Specifically, absent an incentive to pad (and with incentives to be accurate), the inaccuracy of runtime estimates improved only from an average of 61% inaccurate to an average of 57% inaccurate. This contribution has implications not only for the proposed genetic algorithm scheduler, but for any scheduler that asks users for an estimate, which currently includes virtually all parallel job schedulers both in production use and proposed in the literature. Fifth, a survey of users of a supercomputer system and associated simulations explore the feasibility of removing one of the defining constraints of the parallel job scheduling problem: the non-preemptability of running jobs. An investigation of users' current checkpointing habits produced a workload labeled with per-job checkpoint information, enabling simulation of a checkpoint-aware GA scheduler that may preempt running jobs as it optimizes aggregate utility. Lifting the non-preemptability constraint improves performance of the GA scheduler by 16% (and by 23% compared to the classic EASY algorithm), including overhead penalties for job termination and restart.

Hard Real-Time Computing Systems

Author: Giorgio C Buttazzo
Publisher: Springer
Total Pages: 524
Release: 2013-11-27
Genre: Computers
ISBN: 9781461430193

This updated edition offers an indispensable exposition on real-time computing, with particular emphasis on predictable scheduling algorithms. It introduces the fundamental concepts of real-time computing, demonstrates the most significant results in the field, and provides the essential methodologies for designing predictable computing systems used to support time-critical control applications. Along with an in-depth guide to the available approaches for the implementation and analysis of real-time applications, this revised edition contains a close examination of recent developments in real-time systems, including limited preemptive scheduling, resource reservation techniques, overload handling algorithms, and adaptive scheduling techniques. This volume serves as a fundamental advanced-level textbook. Each chapter provides basic concepts, which are followed by algorithms, illustrated with concrete examples, figures and tables. Exercises and solutions are provided to enhance self-study, making this an excellent reference for those interested in real-time computing for designing and/or developing predictable control applications.