Data Intensive Workflow Management

Author: Daniel Oliveira, Ji Liu, Esther Pacitti
Publisher: Springer Nature
Total Pages: 161
Release: 2022-06-01
Genre: Computers
ISBN: 9783031018725

Workflows may be defined as abstractions used to model the coherent flow of activities in the context of an in silico scientific experiment. They are employed in many domains of science such as bioinformatics, astronomy, and engineering. Such workflows usually comprise a considerable number of activities and activations (i.e., tasks associated with activities) and may take a long time to execute. Because of the continuous need to store and process data efficiently (which makes them data-intensive workflows), high-performance computing environments combined with parallelization techniques are used to run these workflows. At the beginning of the 2010s, cloud technologies emerged as a promising environment to run scientific workflows. By using clouds, scientists have expanded beyond single parallel computers to hundreds or even thousands of virtual machines. More recently, Data-Intensive Scalable Computing (DISC) frameworks (e.g., Apache Spark and Hadoop) and environments have emerged and are being used to execute data-intensive workflows. DISC environments are composed of processors and disks in large commodity computing clusters connected by high-speed communication switches and networks. The main advantage of DISC frameworks is that they support efficient in-memory data management for large-scale applications, such as data-intensive workflows. However, executing workflows in cloud and DISC environments raises many challenges, such as scheduling workflow activities and activations, managing produced data, and collecting provenance data. Several existing approaches address these challenges, so there is a real need to understand how to manage these workflows and the various big data platforms that have been developed and introduced. As such, this book can help researchers understand how linking workflow management with Data-Intensive Scalable Computing can help in understanding and analyzing scientific big data.
In this book, we aim to identify and distill the body of work on workflow management in clouds and DISC environments. We start by discussing the basic principles of data-intensive scientific workflows. Next, we present two workflows that are executed in single-site and multi-site clouds, taking advantage of provenance. Afterward, we move on to workflow management in DISC environments, and we present, in detail, solutions that enable the optimized execution of workflows using frameworks such as Apache Spark and its extensions.

Data Intensive Workflow Management

Author: Daniel C. M. de Oliveira, Ji Liu, Esther Pacitti
Publisher: Morgan & Claypool Publishers
Total Pages: 181
Release: 2019-05-13
Genre: Computers
ISBN: 9781681735580

Knowledge Management in the Development of Data Intensive Systems

Author: Ivan Mistrik, Matthias Galster, Bruce R. Maxim, Bedir Tekinerdogan
Publisher: CRC Press
Total Pages: 342
Release: 2021-06-15
Genre: Computers
ISBN: 9781000387414

Data-intensive systems are software applications that process and generate Big Data. Data-intensive systems support the use of large amounts of data strategically and efficiently to provide intelligence. For example, examining industrial sensor data or business process data can enhance production, guide proactive improvements of development processes, or optimize supply chain systems. Designing data-intensive software systems is difficult because the distribution of knowledge across stakeholders creates a symmetry of ignorance, and because a shared vision of the future requires the development of new knowledge that extends and synthesizes existing knowledge. Knowledge Management in the Development of Data-Intensive Systems addresses new challenges arising from knowledge management in the development of data-intensive software systems. These challenges concern requirements, architectural design, detailed design, implementation, and maintenance. The book covers the current state and future directions of knowledge management in the development of data-intensive software systems. The book features both academic and industrial contributions that discuss the role software engineering can play in addressing the challenges of developing, maintaining, and evolving systems; data-intensive software systems of cloud and mobile services; and the scalability requirements they imply. The book features software engineering approaches that can efficiently deal with data-intensive systems as well as applications and use cases benefiting from data-intensive systems. Providing a comprehensive reference on the notion of data-intensive systems from a technical and non-technical perspective, the book focuses uniquely on software engineering and knowledge management in the design and maintenance of data-intensive systems. The book covers constructing, deploying, and maintaining high-quality software products and software engineering in and for dynamic and flexible environments.
This book provides a holistic guide for those who need to understand the impact of variability on all aspects of the software life cycle. It leverages practical experience and evidence to look ahead at the challenges faced by organizations in a fast-moving world with increasingly fast-changing customer requirements and expectations.

Data Intensive Distributed Computing: Challenges and Solutions for Large-scale Information Management

Author: Kosar, Tevfik
Publisher: IGI Global
Total Pages: 353
Release: 2012-01-31
Genre: Computers
ISBN: 9781615209729

"This book focuses on the challenges of distributed systems imposed by the data intensive applications, and on the different state-of-the-art solutions proposed to overcome these challenges"--Provided by publisher.

Enterprise Resource Planning: Concepts, Methodologies, Tools, and Applications

Author: Management Association, Information Resources
Publisher: IGI Global
Total Pages: 1629
Release: 2013-06-30
Genre: Business & Economics
ISBN: 9781466641549

The design, development, and use of suitable enterprise resource planning systems continue to play a significant role in ever-evolving business needs and environments. Enterprise Resource Planning: Concepts, Methodologies, Tools, and Applications presents research on the progress of ERP systems and their impact on changing business needs and evolving technology. This collection of research highlights a simple framework for identifying the critical factors of ERP implementation and statistical analysis to adopt its various concepts. It is useful for industry leaders, practitioners, and researchers in the field.

Scientific Data Management

Author: Arie Shoshani, Doron Rotem
Publisher: CRC Press
Total Pages: 592
Release: 2009-12-16
Genre: Computers
ISBN: 9781420069815

Dealing with the volume, complexity, and diversity of data currently being generated by scientific experiments and simulations often causes scientists to waste productive time. Scientific Data Management: Challenges, Technology, and Deployment describes cutting-edge technologies and solutions for managing and analyzing vast amounts of data, helping

Designing Data Intensive Applications

Author: Martin Kleppmann
Publisher: O'Reilly Media, Inc.
Total Pages: 658
Release: 2017-03-16
Genre: Computers
ISBN: 9781491903100

Data is at the center of many challenges in system design today. Difficult issues need to be figured out, such as scalability, consistency, reliability, efficiency, and maintainability. In addition, we have an overwhelming variety of tools, including relational databases, NoSQL datastores, stream or batch processors, and message brokers. What are the right choices for your application? How do you make sense of all these buzzwords? In this practical and comprehensive guide, author Martin Kleppmann helps you navigate this diverse landscape by examining the pros and cons of various technologies for processing and storing data. Software keeps changing, but the fundamental principles remain the same. With this book, software engineers and architects will learn how to apply those ideas in practice, and how to make full use of data in modern applications. Peer under the hood of the systems you already use, and learn how to use and operate them more effectively. Make informed decisions by identifying the strengths and weaknesses of different tools. Navigate the trade-offs around consistency, scalability, fault tolerance, and complexity. Understand the distributed systems research upon which modern databases are built. Peek behind the scenes of major online services, and learn from their architectures.

Grid and Cloud Database Management

Author: Sandro Fiore, Giovanni Aloisio
Publisher: Springer Science & Business Media
Total Pages: 353
Release: 2011-07-28
Genre: Computers
ISBN: 9783642200458

Since the 1990s Grid Computing has emerged as a paradigm for accessing and managing distributed, heterogeneous and geographically spread resources, promising that we will be able to access computer power as easily as we can access the electric power grid. Later on, Cloud Computing brought the promise of providing easy and inexpensive access to remote hardware and storage resources. Exploiting pay-per-use models and virtualization for resource provisioning, cloud computing has been rapidly accepted and used by researchers, scientists and industries. In this volume, contributions from internationally recognized experts describe the latest findings on challenging topics related to grid and cloud database management. By exploring current and future developments, they provide a thorough understanding of the principles and techniques involved in these fields. The presented topics are well balanced and complementary, and they range from well-known research projects and real case studies to standards and specifications, and non-functional aspects such as security, performance and scalability. Following an initial introduction by the editors, the contributions are organized into four sections: Open Standards and Specifications, Research Efforts in Grid Database Management, Cloud Data Management, and Scientific Case Studies. With this presentation, the book serves mostly researchers and graduate students, both as an introduction to and as a technical reference for grid and cloud database management. The detailed descriptions of research prototypes dealing with spatiotemporal or genomic data will also be useful for application engineers in these fields.