Dynamic Load Balancing and Autoscaling in Distributed Stream Processing Systems

Author: Xing Wu
Publisher:
Total Pages: 95
Release: 2015
Genre:
ISBN:

In the big data world, Hadoop and other batch-processing tools are widely used to analyze data and get results in minutes. However, minutes of latency still cannot satisfy the growing need for real-time decisions in many fields, such as live stock and trading feeds in financial services, telecommunications, sensor networks, and online advertising. Distributed stream processing (DSP) systems aim to process, analyze, and make decisions on the fly based on immense quantities of data streams that are dynamically generated at high rates. As the rates of data streams may vary over time, DSP systems require an elastic architecture to handle dynamic load. Although many dynamic load balancing and autoscaling techniques for general pull-based distributed systems have been well studied, these solutions cannot be directly applied to DSP systems because DSP systems are push-based: they process data streams with different types of operators, each running on a cluster node. One research problem is to allocate data processing operators to cluster nodes and balance the workload dynamically. Since the data volume and rate can be unpredictable, a static mapping between operators and cluster resources often results in an unbalanced operator load distribution. Furthermore, making a DSP system scalable requires autoscaling at runtime. In this context, the operators need to be relocated among newly provisioned nodes. The contribution of this thesis is threefold. First, we propose a load-adaptive software layer between a DSP engine and clusters. The architecture allows an operator to be transferred dynamically to different cluster nodes at runtime and keeps the process transparent to developers. Second, an optimization method that combines the correlation of resource utilization of nodes and the capacity of clusters is proposed to balance load dynamically. Lastly, we design an autoscaling mechanism and algorithm to detect overload and provision nodes at runtime.
We implement our design on S4, an open-source DSP engine first developed by Yahoo!. The implementation is evaluated with a top-N topic list application on Twitter streams using clusters on Amazon Web Services. The results demonstrate a 75.79% improvement in stream processing throughput and a 294.47% improvement in cluster resource utilization.
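
To make the idea of relocating operators and provisioning nodes at runtime more concrete, the following is a minimal sketch in Python. It is not the thesis's algorithm and does not use the S4 API: the `Node` class, the `rebalance` function, the 0.8 utilization threshold, and the `auto-N` node names are all hypothetical, chosen only to illustrate threshold-based overload detection.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical cluster node holding operators and their measured load."""
    name: str
    capacity: float = 1.0
    operators: dict = field(default_factory=dict)  # operator name -> load

    def load(self):
        return sum(self.operators.values())

    def overloaded(self, threshold):
        return self.load() > threshold * self.capacity


def rebalance(nodes, threshold=0.8):
    """Move operators off overloaded nodes; add a node when nothing fits.

    The threshold is an assumed value, not one taken from the thesis.
    Returns a list of (operator, source node, destination node) moves.
    """
    moves = []
    for src in list(nodes):  # snapshot: nodes provisioned below are not revisited
        while src.overloaded(threshold) and len(src.operators) > 1:
            # Relocate the heaviest operator first.
            op = max(src.operators, key=src.operators.get)
            load = src.operators[op]
            fits = [n for n in nodes
                    if n is not src and n.load() + load <= threshold * n.capacity]
            if fits:
                # Prefer the least utilized node that can absorb the operator.
                dst = min(fits, key=lambda n: n.load() / n.capacity)
            else:
                # Autoscale: no existing node can take it, so provision one.
                dst = Node(name=f"auto-{len(nodes)}")
                nodes.append(dst)
            dst.operators[op] = src.operators.pop(op)
            moves.append((op, src.name, dst.name))
    return moves


if __name__ == "__main__":
    cluster = [
        Node("A", operators={"parse": 0.5, "count": 0.4, "rank": 0.2}),
        Node("B", operators={"emit": 0.2}),
    ]
    print(rebalance(cluster))
```

A real system would also have to migrate operator state and reroute the streams feeding a moved operator, which this sketch leaves out.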

Load Balance For Distributed Real-time Computing Systems

Author: Junhua Fang
Publisher: World Scientific
Total Pages: 259
Release: 2020-05-19
Genre: Computers
ISBN: 9811216169

This illustrative compendium analyzes the load balancing problem in distributed stream processing systems and explores a set of high-performance real-time processing schemes based on a key-based balancing strategy, a join-matrix model, and fault tolerance mechanisms. The volume succinctly provides the theoretical support for the proposed techniques. Through a rich set of experiments and comparisons with other state-of-the-art techniques, using both standard benchmarks and real data sets, the book comprehensively verifies the correctness and effectiveness of the proposed methods. This unique title is an excellent reference text for researchers in the fields of distributed stream processing, parallel systems, cloud computing, etc.

A FRAMEWORK FOR SCALABLE DISTRIBUTED JOB PROCESSING WITH DYNAMIC LOAD BALANCING USING DECENTRALIZED APPROACH

Author: Dr P. SrinivasaRao
Publisher: Lulu.com
Total Pages: 97
Release: 2017-12-30
Genre: Education
ISBN: 1387388762

A distributed system consists of many heterogeneous processors with different processing power, all interconnected by a communication channel. In such a system, if some processors are lightly loaded or idle while others are heavily loaded, system performance is reduced drastically. System performance can be improved by proper load balancing [1, 4]. The aim of load balancing is to improve the performance measures and reduce the overall completion time and cost.
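
As a rough illustration of why balancing matters on heterogeneous processors, the sketch below compares a speed-unaware round-robin assignment with a simple greedy assignment that sends each job to the processor expected to finish it earliest. This is a generic textbook-style heuristic, not the framework proposed in the book. The function names, job sizes, and processor speeds are assumptions made for the example.

```python
def round_robin_makespan(job_sizes, speeds):
    """Completion time when jobs are dealt out in turn, ignoring speed."""
    finish = [0.0] * len(speeds)
    for i, size in enumerate(job_sizes):
        p = i % len(speeds)
        finish[p] += size / speeds[p]
    return max(finish)


def greedy_makespan(job_sizes, speeds):
    """Largest job first, each placed on the processor that finishes it earliest."""
    finish = [0.0] * len(speeds)
    assignment = [[] for _ in speeds]
    for size in sorted(job_sizes, reverse=True):
        best = min(range(len(speeds)),
                   key=lambda p: finish[p] + size / speeds[p])
        finish[best] += size / speeds[best]
        assignment[best].append(size)
    return assignment, max(finish)


if __name__ == "__main__":
    jobs = [8, 6, 4, 4, 2]   # hypothetical job sizes (work units)
    speeds = [2.0, 1.0]      # hypothetical speeds (work units per second)
    print("round robin:", round_robin_makespan(jobs, speeds))
    print("greedy:", greedy_makespan(jobs, speeds))
```

With these assumed inputs the greedy assignment finishes sooner because the faster processor receives a proportionally larger share of the work.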

Scheduling in Distributed Computing Environment Using Dynamic Load Balancing

Author: Priyesh Kanungo
Publisher: Anchor Academic Publishing
Total Pages: 153
Release: 2016-08
Genre: Computers
ISBN: 396067046X

This book illustrates the various components of a distributed computing environment and the importance of distributed scheduling using dynamic load balancing. It describes load balancing algorithms for better resource utilization, higher throughput, and improved user response time. Various theoretical concepts, experiments, and examples enable students to understand the process of load balancing in computing clusters and server clusters. The book is suitable for students of Advanced Operating Systems, High Performance Computing, and Distributed Computing in B.E., M.C.A., M.Tech., and Ph.D. courses.

Advanced Data Mining and Applications

Author: Xiaochun Yang
Publisher: Springer Nature
Total Pages: 848
Release: 2023-12-06
Genre: Computers
ISBN: 3031466616

This book constitutes the refereed proceedings of the 19th International Conference on Advanced Data Mining and Applications, ADMA 2023, held in Shenyang, China, during August 21–23, 2023. The 216 full papers included in this book were carefully reviewed and selected from 503 submissions. They were organized in topical sections as follows: Data mining foundations, Grand challenges of data mining, Parallel and distributed data mining algorithms, Mining on data streams, Graph mining and Spatial data mining.

Performance Studies of Dynamic Load Balancing in Distributed Systems

Author: University of California, Berkeley. Computer Science Division
Publisher:
Total Pages: 336
Release: 1987
Genre:
ISBN:

Distributed systems are often characterized by uneven loads on hosts and other resources. In this thesis, the problems concerning dynamic load balancing in loosely-coupled distributed systems are studied using trace-driven simulation, implementation, and measurement. Information about job CPU and I/O demands is collected from three production systems and used as input to a simulator that includes a representative CPU scheduling policy and considers the message exchange and job transfer costs explicitly. A prototype load balancer is implemented in the Berkeley UNIX and Sun/UNIX environments, and the results of a large number of measurement experiments performed on six workstations are presented.

Dynamic Load Balancing for Parallel and Distributed Systems

Author: Zhiling Lan
Publisher:
Total Pages:
Release: 2002
Genre:
ISBN:

There are many scientific applications for which the computational load varies throughout the execution and causes uneven distribution of workload during run-time. One such class of applications is Adaptive Mesh Refinement (AMR) applications. AMR is a type of multiscale algorithm that achieves high resolution in localized regions of dynamic, multidimensional numerical simulations. A typical AMR application may require enormous computing resources, which usually cannot be satisfied by a single-processor machine, thereby requiring parallel and distributed systems. One of the key issues related to AMR is dynamic load balancing (DLB), which allows large-scale adaptive applications to run efficiently on parallel and distributed systems. In investigating DLB schemes, we first complete a detailed analysis of structured AMR (SAMR) applications, identifying the unique characteristics that impose severe challenges on DLB schemes. The results indicate that most of the available DLB schemes are not appropriate for SAMR applications due to their unique adaptive characteristics. Thus, we propose a novel dynamic load balancing scheme for SAMR applications on parallel systems (denoted as parallel DLB). It integrates a grid-splitting technique with direct grid movements, for which the objective is to reduce the parallel execution time. Further, our experiment shows that simply moving a DLB scheme designed for parallel systems to distributed systems will introduce significant overhead. Therefore, we propose a framework for dynamic load balancing on distributed systems (denoted as distributed DLB). It takes into consideration: (1) heterogeneity of processors, (2) heterogeneity of networks, (3) shared nature of networks, and (4) adaptive characteristics of the applications. For SAMR applications, the distributed DLB incorporates the proposed parallel DLB during the load balancing process. 
Both parallel DLB and distributed DLB were implemented in the ENZO code, a parallel implementation of SAMR in astrophysics and cosmology. Experiments show that the proposed DLB schemes can significantly improve the performance of SAMR applications on both parallel and distributed systems in terms of the total execution time and the quality of load balancing.
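
The abstract notes that redistribution overhead matters on distributed systems and that processors and networks are heterogeneous. One common way to reason about this, shown here only as a hedged sketch and not as the scheme implemented in ENZO, is to compute speed-proportional target loads and rebalance only when the estimated time saved exceeds the estimated transfer cost. All names, the single `bandwidth` parameter, and the cost model are simplifying assumptions.

```python
def target_loads(total_work, speeds):
    """Split work in proportion to processor speed."""
    total_speed = sum(speeds)
    return [total_work * s / total_speed for s in speeds]


def step_time(loads, speeds):
    """Time for one step: the slowest processor determines the step."""
    return max(load / speed for load, speed in zip(loads, speeds))


def should_rebalance(loads, speeds, bandwidth, steps_remaining):
    """Compare estimated gain with estimated redistribution cost.

    Assumed model: cost = work moved / network bandwidth, and the gain
    is the per-step saving multiplied by the remaining steps.
    """
    targets = target_loads(sum(loads), speeds)
    saving = step_time(loads, speeds) - step_time(targets, speeds)
    gain = saving * steps_remaining
    moved = sum(max(load - t, 0.0) for load, t in zip(loads, targets))
    cost = moved / bandwidth
    return gain > cost, targets


if __name__ == "__main__":
    loads = [90.0, 30.0]   # hypothetical current work per processor
    speeds = [2.0, 1.0]    # hypothetical relative processor speeds
    print(should_rebalance(loads, speeds, bandwidth=1.0, steps_remaining=1))
    print(should_rebalance(loads, speeds, bandwidth=1.0, steps_remaining=4))
```

Under this assumed model, the same imbalance is worth correcting only when enough steps remain to recover the transfer cost.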