Data-Intensive Workflow Management

Author: Daniel C. M. de Oliveira
Publisher: Morgan & Claypool Publishers
Total Pages: 181
Release: 2019-05-13
Genre: Computers
ISBN: 168173558X

Workflows may be defined as abstractions used to model the coherent flow of activities in the context of an in silico scientific experiment. They are employed in many domains of science such as bioinformatics, astronomy, and engineering. Such workflows usually present a considerable number of activities and activations (i.e., tasks associated with activities) and may need a long time for execution. Due to the continuous need to store and process data efficiently (making them data-intensive workflows), high-performance computing environments allied to parallelization techniques are used to run these workflows. At the beginning of the 2010s, cloud technologies emerged as a promising environment to run scientific workflows. By using clouds, scientists have expanded beyond single parallel computers to hundreds or even thousands of virtual machines. More recently, Data-Intensive Scalable Computing (DISC) frameworks (e.g., Apache Spark and Hadoop) and environments emerged and are being used to execute data-intensive workflows. DISC environments are composed of processors and disks in large commodity computing clusters connected using high-speed communications switches and networks. The main advantage of DISC frameworks is that they support efficient in-memory data management for large-scale applications, such as data-intensive workflows. However, the execution of workflows in cloud and DISC environments raises many challenges, such as scheduling workflow activities and activations, managing produced data, and collecting provenance data. Several existing approaches deal with these challenges. Thus, there is a real need to understand how to manage these workflows and the various big data platforms that have been developed and introduced. As such, this book can help researchers understand how linking workflow management with Data-Intensive Scalable Computing can help in understanding and analyzing scientific big data.
In this book, we aim to identify and distill the body of work on workflow management in clouds and DISC environments. We start by discussing the basic principles of data-intensive scientific workflows. Next, we present two workflows that are executed in single-site and multi-site clouds, taking advantage of provenance. Afterward, we turn to workflow management in DISC environments, and we present, in detail, solutions that enable the optimized execution of the workflow using frameworks such as Apache Spark and its extensions.

Data-Intensive Workflow Management

Author: Daniel Oliveira
Publisher: Springer Nature
Total Pages: 161
Release: 2022-06-01
Genre: Computers
ISBN: 3031018729

Automated Optimization Methods for Scientific Workflows in e-Science Infrastructures

Author: Sonja Holl
Publisher: Forschungszentrum Jülich
Total Pages: 207
Release: 2014
Genre:
ISBN: 389336949X

Scientific workflows have emerged as a key technology that assists scientists with the design, management, execution, sharing and reuse of in silico experiments. Workflow management systems simplify the management of scientific workflows by providing graphical interfaces for their development, monitoring and analysis. Nowadays, e-Science combines such workflow management systems with large-scale data and computing resources into complex research infrastructures. For instance, e-Science allows the conveyance of best practice research in collaborations by providing workflow repositories, which facilitate the sharing and reuse of scientific workflows. However, scientists still face limitations when reusing workflows. One of the most common challenges they meet is the need to select appropriate applications and their individual execution parameters. If scientists do not want to rely on default or experience-based parameters, the best-effort option is to test different workflow set-ups using either trial-and-error approaches or parameter sweeps. Both methods may be inefficient or time-consuming, respectively, especially when tuning a large number of parameters. Therefore, scientists require an effective and efficient mechanism that automatically tests different workflow set-ups in an intelligent way and helps them improve their scientific results. This thesis addresses the limitation described above by defining and implementing an approach for the optimization of scientific workflows. In the course of this work, scientists’ needs are investigated and requirements are formulated, resulting in an appropriate optimization concept. In a subsequent step, this concept is prototypically implemented by extending a workflow management system with an optimization framework, including general mechanisms required to conduct workflow optimization.
As optimization is an ongoing research topic, different algorithms are provided by pluggable extensions (plugins) that can be loosely coupled with the framework, resulting in a generic and quickly extendable system. In this thesis, an exemplary plugin is introduced which applies a Genetic Algorithm for parameter optimization. In order to accelerate and therefore make workflow optimization feasible at all, e-Science infrastructures are utilized for the parallel execution of scientific workflows. This is empowered by additional extensions enabling the execution of applications and workflows on distributed computing resources. The actual implementation and therewith the general approach of workflow optimization is experimentally verified by four use cases in the life science domain. All workflows were significantly improved, which demonstrates the advantage of the proposed workflow optimization. Finally, a new collaboration-based approach is introduced that harnesses optimization provenance to make optimization faster and more robust in the future.

Workflows for e-Science

Author: Ian J. Taylor
Publisher: Springer Science & Business Media
Total Pages: 532
Release: 2007-12-31
Genre: Computers
ISBN: 184628757X

This is a timely book presenting an overview of the current state of the art within established projects, covering many different aspects of workflow from users to tool builders. It provides an overview of active research from a number of different perspectives. It includes theoretical aspects of workflow and deals with workflow for e-Science as opposed to e-Commerce. The topics covered will be of interest to a wide range of practitioners.

Data-Intensive Science

Author: Terence Critchlow
Publisher: CRC Press
Total Pages: 449
Release: 2016-04-19
Genre: Computers
ISBN: 100075569X

Data-intensive science has the potential to transform scientific research and quickly translate scientific progress into complete solutions, policies, and economic success. But this collaborative science is still lacking the effective access and exchange of knowledge among scientists, researchers, and policy makers across a range of disciplines. Bringing together leaders from multiple scientific disciplines, Data-Intensive Science shows how a comprehensive integration of various techniques and technological advances can effectively harness the vast amount of data being generated and significantly accelerate scientific progress to address some of the world's most challenging problems. In the book, a diverse cross-section of application, computer, and data scientists explores the impact of data-intensive science on current research and describes emerging technologies that will enable future scientific breakthroughs. The book identifies best practices used to tackle challenges facing data-intensive science as well as gaps in these approaches. It also focuses on the integration of data-intensive science into standard research practice, explaining how components in the data-intensive science environment need to work together to provide the necessary infrastructure for community-scale scientific collaborations. Organizing the material based on a high-level, data-intensive science workflow, this book provides an understanding of the scientific problems that would benefit from collaborative research, the current capabilities of data-intensive science, and the solutions to enable the next round of scientific advancements.

Handbook of Whale Optimization Algorithm

Author: Seyedali Mirjalili
Publisher: Elsevier
Total Pages: 688
Release: 2023-11-24
Genre: Computers
ISBN: 0323953646

Handbook of Whale Optimization Algorithm: Variants, Hybrids, Improvements, and Applications provides the most in-depth look at an emerging meta-heuristic that has been widely used in both science and industry. Whale Optimization Algorithm has been cited more than 5000 times in Google Scholar. Solving optimization problems using this algorithm requires addressing a number of challenges, including multiple objectives, constraints, binary decision variables, large-scale search space, dynamic objective function, and noisy parameters, to name a few. This handbook provides readers with in-depth analysis of this algorithm and existing methods in the literature to cope with such challenges. The authors and editors also propose several improvements, variants and hybrids of this algorithm. Several applications are also covered to demonstrate the applicability of methods in this book.
Provides in-depth analysis of equations, mathematical models and mechanisms of the Whale Optimization Algorithm.
Proposes different variants of the Whale Optimization Algorithm to solve binary, multiobjective, noisy, dynamic and combinatorial optimization problems.
Demonstrates how to design, develop and test different hybrids of Whale Optimization Algorithm.
Introduces several application areas of the Whale Optimization Algorithm, focusing on sustainability.
Includes source code from applications and algorithms that is available online.

Scientific Workflows

Author: Jun Qin
Publisher: Springer Science & Business Media
Total Pages: 228
Release: 2012-08-15
Genre: Computers
ISBN: 3642307159

Creating scientific workflow applications is a very challenging task due to the complexity of the distributed computing environments involved, the complex control and data flow requirements of scientific applications, and the lack of high-level languages and tools support. Particularly, sophisticated expertise in distributed computing is commonly required to determine the software entities to perform computations of workflow tasks, the computers on which workflow tasks are to be executed, the actual execution order of workflow tasks, and the data transfer between them. Qin and Fahringer present a novel workflow language called Abstract Workflow Description Language (AWDL) and the corresponding standards-based, knowledge-enabled tool support, which simplifies the development of scientific workflow applications. AWDL is an XML-based language for describing scientific workflow applications at a high level of abstraction. It is designed in a way that allows users to concentrate on specifying such workflow applications without dealing with either the complexity of distributed computing environments or any specific implementation technology. This research monograph is organized into five parts: overview, programming, optimization, synthesis, and conclusion, and is complemented by an appendix and an extensive reference list. The topics covered in this book will be of interest to both computer science researchers (e.g. in distributed programming, grid computing, or large-scale scientific applications) and domain scientists who need to apply workflow technologies in their work, as well as engineers who want to develop distributed and high-throughput workflow applications, languages and tools.

Computational Science – ICCS 2021

Author: Maciej Paszynski
Publisher: Springer Nature
Total Pages: 815
Release: 2021-06-10
Genre: Computers
ISBN: 3030779610

The six-volume set LNCS 12742, 12743, 12744, 12745, 12746, and 12747 constitutes the proceedings of the 21st International Conference on Computational Science, ICCS 2021, held in Krakow, Poland, in June 2021.* The total of 260 full papers and 57 short papers presented in this book set were carefully reviewed and selected from 635 submissions. 48 full and 14 short papers were accepted to the main track from 156 submissions; 212 full and 43 short papers were accepted to the workshops/thematic tracks from 479 submissions. The papers were organized in topical sections named:
Part I: ICCS Main Track
Part II: Advances in High-Performance Computational Earth Sciences: Applications and Frameworks; Applications of Computational Methods in Artificial Intelligence and Machine Learning; Artificial Intelligence and High-Performance Computing for Advanced Simulations; Biomedical and Bioinformatics Challenges for Computer Science
Part III: Classifier Learning from Difficult Data; Computational Analysis of Complex Social Systems; Computational Collective Intelligence; Computational Health
Part IV: Computational Methods for Emerging Problems in (dis-)Information Analysis; Computational Methods in Smart Agriculture; Computational Optimization, Modelling and Simulation; Computational Science in IoT and Smart Systems
Part V: Computer Graphics, Image Processing and Artificial Intelligence; Data-Driven Computational Sciences; Machine Learning and Data Assimilation for Dynamical Systems; MeshFree Methods and Radial Basis Functions in Computational Sciences; Multiscale Modelling and Simulation
Part VI: Quantum Computing Workshop; Simulations of Flow and Transport: Modeling, Algorithms and Computation; Smart Systems: Bringing Together Computer Vision, Sensor Networks and Machine Learning; Software Engineering for Computational Science; Solving Problems with Uncertainty; Teaching Computational Science; Uncertainty Quantification for Computational Models
*The conference was held virtually.
Chapter “Deep Learning Driven Self-adaptive hp Finite Element Method” is available open access under a Creative Commons Attribution 4.0 International License via link.springer.com.

Supercomputing

Author: Vladimir Voevodin
Publisher: Springer Nature
Total Pages: 346
Release: 2024-01-04
Genre: Computers
ISBN: 3031494350

The two-volume set LNCS 14388 and 14389 constitutes the refereed proceedings of the 9th Russian Supercomputing Days International Conference (RuSCDays 2023) held in Moscow, Russia, during September 25-26, 2023. The 44 full papers and 1 short paper presented in these proceedings were carefully reviewed and selected from 104 submissions. The papers have been organized in the following topical sections: supercomputer simulation; distributed computing; and HPC, BigData, AI: algorithms, technologies, evaluation.