Analysis of Application Power and Schedule Composition in a High Performance Computing Environment

Release: 2016

As the capacity of high performance computing (HPC) systems continues to grow, small changes in energy management have the potential to produce significant energy savings. In this paper, we employ an extensive informatics system for aggregating and analyzing real-time performance and power use data to evaluate the energy footprints of jobs running in an HPC data center. We look at the effects of algorithmic choices for a given job on the resulting energy footprint, analyze application-specific power consumption, and summarize average power use in the aggregate. All of these views reveal meaningful power variance between classes of applications as well as between the methods chosen for a given job. Using these data, we discuss energy-aware cost-saving strategies based on reordering the HPC job schedule. Using historical job and power data, we present a hypothetical job schedule reordering that (1) reduces the facility's peak power draw and (2) manages power in conjunction with a large-scale photovoltaic array. Lastly, we leverage these data to understand the practical limits on predicting key power use metrics at the time of submission.
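
The idea of reordering a schedule to lower peak power draw can be illustrated with a minimal sketch. This is not the paper's method: the greedy placement rule, the hourly time slots, the `Job` tuple layout, and the function names are all assumptions made for illustration. Each job is assumed to draw a constant average power for its whole duration.

```python
from typing import Dict, List, Tuple

# Hypothetical job record: (name, duration in hours, average power in kW).
Job = Tuple[str, int, float]


def naive_profile(jobs: List[Job], horizon: int) -> List[float]:
    """Hourly power profile when every job starts at hour 0."""
    profile = [0.0] * horizon
    for _, duration, kw in jobs:
        for hour in range(duration):
            profile[hour] += kw
    return profile


def reorder_for_peak(jobs: List[Job], horizon: int) -> Tuple[Dict[str, int], List[float]]:
    """Greedy heuristic: place jobs, largest energy first, at the start
    hour that gives the lowest peak over the hours the job occupies."""
    profile = [0.0] * horizon
    starts: Dict[str, int] = {}
    by_energy = sorted(jobs, key=lambda job: job[1] * job[2], reverse=True)
    for name, duration, kw in by_energy:
        if duration > horizon:
            raise ValueError(f"job {name} does not fit in the horizon")
        best_start, best_peak = 0, float("inf")
        for start in range(horizon - duration + 1):
            window_peak = max(profile[start:start + duration]) + kw
            if window_peak < best_peak:  # ties keep the earliest start
                best_start, best_peak = start, window_peak
        for hour in range(best_start, best_start + duration):
            profile[hour] += kw
        starts[name] = best_start
    return starts, profile
```

With four example jobs `[("a", 2, 100.0), ("b", 2, 80.0), ("c", 1, 50.0), ("d", 1, 50.0)]` and a four-hour horizon, starting everything at hour 0 peaks at 280 kW, while the greedy placement peaks at 130 kW with the same total energy. A real scheduler would also have to respect queue priorities, node availability, and deadlines, which this sketch ignores.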

Fair Scheduling in High Performance Computing Environments

Author: Art Sedighi
Publisher: Springer
Total Pages: 132
Release: 2019-04-17
Genre: Computers
ISBN: 3030145689

This book introduces a new scheduler that fairly and efficiently distributes system resources among many users with varying usage patterns who compete for them in large shared computing environments. The Rawlsian Fair scheduler developed for this effort is shown to boost performance while reducing delay in high performance computing workloads of certain types, including the following four types examined in this book: (i) Class A: similar but complementary workloads; (ii) Class B: similar but steady vs. intermittent workloads; (iii) Class C: large vs. small workloads; (iv) Class D: large vs. noise-like workloads. This new scheduler achieves short-term fairness at small timescales that demand rapid response to varying workloads and usage profiles. The Rawlsian Fair scheduler is shown to consistently benefit workload Classes C and D, while it benefits Class A and B workloads only where they become disproportionate as the number of users increases. A simulation framework, dSim, simulates the new Rawlsian Fair scheduling mechanism. dSim helps achieve instantaneous fairness in high performance computing environments, effective utilization of computing resources, and user satisfaction through the Rawlsian Fair scheduler.
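
The book's scheduler is not specified in this summary, so the sketch below does not reproduce it. As a stand-in for the Rawlsian idea of favoring the worst-off user, it shows classic max-min fair allocation by progressive filling. The function name `max_min_fair`, the user labels, and the demand figures are assumptions chosen to resemble a "large vs. small vs. noise-like" mix.

```python
from typing import Dict


def max_min_fair(demands: Dict[str, float], capacity: float) -> Dict[str, float]:
    """Max-min fair split of `capacity` among users with the given demands.

    Users whose remaining demand fits in an equal share are fully served;
    the capacity they leave unused is redistributed among the rest.
    """
    alloc = {user: 0.0 for user in demands}
    remaining = float(capacity)
    active = {user for user, demand in demands.items() if demand > 0}
    while active and remaining > 1e-12:
        share = remaining / len(active)
        satisfied = {u for u in active if demands[u] - alloc[u] <= share}
        if not satisfied:
            # Nobody can be fully served: split what is left equally.
            for user in active:
                alloc[user] += share
            break
        for user in satisfied:
            remaining -= demands[user] - alloc[user]
            alloc[user] = float(demands[user])
        active -= satisfied
    return alloc
```

For demands of 80, 10, and 2 units against a capacity of 60, the small and noise-like users receive their full 10 and 2 units, and the large user receives the remaining 48. Small users are never starved by a large one, which is the property the Rawlsian framing emphasizes.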

Analyzing and Evaluating the Resilience of Scheduling Scientific Applications on High Performance Computing Systems Using a Simulation-based Methodology

Author: Nitin Sukhija
Total Pages: 172
Release: 2015

Large scale systems provide a powerful computing platform for solving large and complex scientific applications. However, the inherent complexity, heterogeneity, wide distribution, and dynamism of the computing environments can lead to performance degradation of the scientific applications executing on these computing systems. Load imbalance arising from a variety of sources such as application, algorithmic, and systemic variations is one of the major contributors to their performance degradation. In general, load balancing is achieved via scheduling. Moreover, frequently occurring resource failures drastically affect the execution of applications running on high performance computing systems. Therefore, the study of deploying support for integrated scheduling and fault-tolerance mechanisms for guaranteeing that applications deployed on computing systems are resilient to failures becomes of paramount importance. Recently, several research initiatives have started to address the issue of resilience. However, the major focus of these efforts was geared more toward achieving system level resilience with less emphasis on achieving resilience at the application level. Therefore, it is increasingly important to extend the concept of resilience to the scheduling techniques at the application level for establishing a holistic approach that addresses the performability of these applications on high performance computing systems. This can be achieved by developing a comprehensive modeling framework that can be used to evaluate the resiliency of such techniques on heterogeneous computing systems for assessing the impact of failures as well as workloads in an integrated way. This dissertation presents an experimental methodology based on discrete event simulation for the analysis and the evaluation of the resilience of scheduling scientific applications on high performance computing systems. 
With the aid of the methodology, a wide class of dependencies between the application and the computing system is captured within a deterministic model for quantifying the performance impact expected from changes in application and system characteristics. The results obtained by employing the proposed simulation-based performance prediction framework enabled an introspective design and investigation of scheduling heuristics, in order to reason about how best to optimize various, often antagonistic, objectives, such as minimizing application makespan and maximizing reliability.

Study of Application Aware Techniques for System and Runtime Power Management

Author: Sharat Chandra Doddaghatta Shashidhar
Total Pages: 69
Release: 2012
Genre: High performance computing

Energy efficiency of large-scale data centers is becoming a major concern, not only for reasons of energy conservation, failures, and cost reduction, but also because such systems are approaching the limits of the power available to them. Like High Performance Computing (HPC) systems, large-scale cluster-based data centers can consume megawatts of power, and of all the power consumed by such a system, only a fraction is used for actual computation. In this paper, we study the potential of application-centric aggressive power management of data center resources for HPC workloads. Specifically, we consider power management mechanisms and controls (currently or soon to be) available at different levels and for different subsystems, and examine whether several innovative approaches that have been taken to tackle this problem in the last few years can be used effectively in an application-aware manner for HPC workloads. To do this, we first profile standard HPC benchmarks with respect to their behavior, resource usage, and power impact on individual computing nodes. Based on a power and latency model and the workload profiles, we develop an algorithm that can improve energy efficiency with little or no performance loss. We then evaluate our proposed algorithm through simulations using empirical power characterization and quantification. Finally, we validate the simulation results with actual executions on real hardware. The results show that by using application-aware power management, we can reduce the average energy consumption without a significant penalty in performance. This motivates us to investigate autonomic approaches for application-aware aggressive power management and cross-layer, cross-function predictive subsystem-level power management for large-scale data centers.

Scheduling, Characterization and Prediction of HPC Workloads for Distributed Computing Environments

Author: Mina Naghshnejad
Total Pages: 308
Release: 2019

As High Performance Computing (HPC) has grown considerably and is expected to grow even more, effective resource management for distributed computing systems is needed more than ever. As computational workloads grow in quantity, it is becoming more crucial to apply efficient resource management and workload scheduling to use resources efficiently while keeping computational performance reasonably good. The problem of efficiently scheduling workloads on resources while meeting performance standards is hard. Additionally, non-clairvoyance of job dimensions makes resource management even harder in real-world scenarios. Our research methodology investigates the scheduling problem for HPC and researches the challenges of deploying the scheduling in real-world scenarios using state-of-the-art machine learning and data science techniques. To this end, this Ph.D. dissertation makes the following core contributions: a) We perform a theoretical analysis of space-sharing, non-preemptive scheduling: we studied this scheduling problem and proposed scheduling algorithms with polynomial computation time. We also proved constant upper bounds for the performance of these algorithms. b) We studied the sensitivity of scheduling algorithms to the accuracy of runtime predictions and devised a meta-learning approach to estimate prediction accuracy for jobs newly submitted to the HPC system. c) We studied the runtime prediction problem for HPC applications. For this purpose, we studied the distribution of available public workloads and proposed two different solutions that can predict multi-modal distributions: switching state-space models and Mixture Density Networks. d) We studied the effectiveness of recent recurrent neural network models for CPU usage trace prediction for individual VM traces as well as aggregate CPU usage traces. In this dissertation, we explore solutions to improve the performance of scheduling workloads on distributed systems.
We begin by looking at the problem from the theoretical perspective. Modeling the problem mathematically, we first propose a scheduling algorithm that finds a constant approximation of the optimal solution for the problem in polynomial time. We prove that the performance of the algorithm (average completion time) is a constant approximation of the performance of the optimal schedule. We next look at the problem in real-world scenarios. Considering High-Performance Computing (HPC) workload computing environments as the closest real-world equivalent of our mathematical model, we explore the problem of predicting application runtime. We propose an algorithm to handle the uncertainties that exist in the real world and showcase its effectiveness in terms of response time and resource utilization. After looking at the uncertainty problem, we focus on improving the accuracy of existing prediction approaches for HPC application runtime. We propose two solutions, one based on Kalman filters and one based on deep mixture density networks. We showcase the effectiveness of our prediction approaches by comparing them with previous approaches in terms of prediction accuracy and impact on scheduling performance. In the end, we focus on predicting resource usage for individual applications during their execution. We explore the application of recurrent neural networks for predicting the resource usage of applications deployed on individual virtual machines. To validate our proposed models and solutions, we performed extensive trace-driven simulation and measured the effectiveness of our approaches.
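
To make the link between runtime prediction accuracy and average completion time concrete, here is a minimal sketch under strong simplifying assumptions: a single machine, all jobs available at time 0, and no preemption. It is not the dissertation's algorithm, which addresses space-sharing scheduling. The function names and the runtime figures are hypothetical.

```python
from typing import List, Sequence


def average_completion_time(runtimes: Sequence[float], order: Sequence[int]) -> float:
    """Mean completion time when jobs run back to back in `order`."""
    if not order:
        return 0.0
    clock = 0.0
    total = 0.0
    for index in order:
        clock += runtimes[index]  # job finishes after its actual runtime
        total += clock
    return total / len(order)


def shortest_predicted_first(predicted: Sequence[float]) -> List[int]:
    """Order job indices by predicted runtime, shortest first."""
    return sorted(range(len(predicted)), key=lambda i: predicted[i])


# Hypothetical actual runtimes (minutes) and two sets of predictions.
actual = [30.0, 5.0, 10.0, 60.0]
good_guess = [28.0, 6.0, 12.0, 50.0]  # preserves the true ordering
bad_guess = [5.0, 40.0, 10.0, 8.0]    # misranks the long jobs

fcfs = average_completion_time(actual, list(range(len(actual))))
with_good = average_completion_time(actual, shortest_predicted_first(good_guess))
with_bad = average_completion_time(actual, shortest_predicted_first(bad_guess))
```

In this example, first-come-first-served gives an average completion time of 53.75 minutes, ordering by the good predictions gives 42.5, and ordering by the bad predictions gives 81.25. Predictions only need to rank jobs correctly to help in this simplified setting, while badly misranked predictions can be worse than using no predictions at all.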

High Performance Computing

Author: Amanda Bienz
Publisher: Springer Nature
Total Pages: 677
Release: 2023-09-25
Genre: Computers
ISBN: 3031408438

This volume constitutes the papers of several workshops which were held in conjunction with the 38th International Conference on High Performance Computing, ISC High Performance 2023, held in Hamburg, Germany, during May 21–25, 2023. The 49 revised full papers presented in this book were carefully reviewed and selected from 70 submissions. ISC High Performance 2023 presents the following workshops: 2nd International Workshop on Malleability Techniques Applications in High-Performance Computing (HPCMALL); 18th Workshop on Virtualization in High-Performance Cloud Computing (VHPC 23); HPC I/O in the Data Center (HPC IODC); Workshop on Converged Computing of Cloud, HPC, and Edge (WOCC’23); 7th International Workshop on In Situ Visualization (WOIV’23); Workshop on Monitoring and Operational Data Analytics (MODA23); 2nd Workshop on Communication, I/O, and Storage at Scale on Next-Generation Platforms: Scalable Infrastructures; First International Workshop on RISC-V for HPC; Second Combined Workshop on Interactive and Urgent Supercomputing (CWIUS); HPC on Heterogeneous Hardware (H3).

Introduction to High Performance Computing for Scientists and Engineers

Author: Georg Hager
Publisher: CRC Press
Total Pages: 350
Release: 2010-07-02
Genre: Computers
ISBN: 1439811938

Written by high performance computing (HPC) experts, Introduction to High Performance Computing for Scientists and Engineers provides a solid introduction to current mainstream computer architecture, dominant parallel programming models, and useful optimization strategies for scientific HPC. From working in a scientific computing center, the author

Applications, Tools and Techniques on the Road to Exascale Computing

Author: Koen de Bosschere
Publisher: IOS Press
Total Pages: 688
Release: 2012
Genre: Computers
ISBN: 1614990409

Single processing units have now reached a point where further major improvements in their performance are restricted by their physical limitations. This is slowing advances at the same time as new scientific challenges are demanding exascale speed, which has made parallel processing key to High Performance Computing (HPC). This book contains the proceedings of the 14th biennial ParCo conference, ParCo2011, held in Ghent, Belgium. The ParCo conferences have traditionally concentrated on three main themes: Algorithms, Architectures and Applications. Nowadays, though, the focus has shifted from traditional multiprocessor topologies to heterogeneous and manycore platforms, incorporating standard CPUs, GPUs (Graphics Processing Units) and FPGAs (Field Programmable Gate Arrays). These platforms are, at a higher abstraction level, integrated in clusters, grids and clouds. The papers presented here reflect this change of focus. New architectures, programming tools and techniques are also explored, and the need for exascale hardware and software was also discussed in the industrial session of the conference. This book will be of interest to all those interested in parallel computing today, and in progress towards the exascale computing of tomorrow.

High Performance Computing

Author: Ian Foster
Publisher: IOS Press Inc.
Total Pages: 309
Release: 2011
Genre: Computers
ISBN: 9781607508021

In the last decade, parallel computing technologies have transformed high-performance computing. Two trends have emerged: massively parallel computing leading to exascale on the one hand and moderately parallel applications, which have opened up high-perf