Hortonworks Data Platform with IBM Spectrum Scale: Reference Guide for Building an Integrated Solution

Author: Sandeep R. Patil
Publisher: IBM Redbooks
Total Pages: 30
Release: 2018-06-26
Genre: Computers
ISBN: 0738456969

This IBM® Redpaper™ publication provides guidance on building an enterprise-grade data lake by using IBM Spectrum™ Scale and Hortonworks Data Platform for performing in-place Hadoop or Spark-based analytics. It covers the benefits of the integrated solution and gives guidance about the types of deployment models and the considerations for implementing them. Hortonworks Data Platform (HDP) is a leading Hadoop and Spark distribution. HDP addresses the complete needs of data-at-rest, powers real-time customer applications, and delivers robust analytics that accelerate decision making and innovation. IBM Spectrum Scale™ is flexible and scalable software-defined file storage for analytics workloads. Enterprises around the globe have deployed IBM Spectrum Scale to form large data lakes and content repositories to perform high-performance computing (HPC) and analytics workloads. It can scale both performance and capacity without bottlenecks.

IBM Spectrum Scale: Big Data and Analytics Solution Brief

Author: Wei G. Gong
Publisher: IBM Redbooks
Total Pages: 14
Release: 2018-01-23
Genre: Computers
ISBN: 0738456632

This IBM® Redguide™ publication describes big data and analytics deployments that are built on IBM Spectrum Scale™. IBM Spectrum Scale is a proven enterprise-level distributed file system that is a high-performance and cost-effective alternative to Hadoop Distributed File System (HDFS) for Hadoop analytics services. IBM Spectrum Scale includes NFS, SMB, and Object services and meets the performance that is required by many industry workloads, such as technical computing, big data, analytics, and content management. IBM Spectrum Scale provides world-class, web-based storage management with extreme scalability, flash-accelerated performance, and automatic policy-based storage tiering from flash through disk to the cloud, which reduces storage costs by up to 90% while improving security and management efficiency in cloud, big data, and analytics environments. This Redguide publication is intended for technical professionals (analytics consultants, technical support staff, IT Architects, and IT Specialists) who are responsible for providing Hadoop analytics services and are interested in learning about the benefits of using IBM Spectrum Scale as an alternative to HDFS.

Cloudera Data Platform Private Cloud Base with IBM Spectrum Scale

Author: Wei Gong
Publisher: IBM Redbooks
Total Pages: 42
Release: 2021-08-27
Genre: Computers
ISBN: 0738459380

This IBM® Redpaper publication provides guidance on building an enterprise-grade data lake by using IBM Spectrum® Scale and Cloudera Data Platform (CDP) Private Cloud Base for performing in-place Cloudera Hadoop or Cloudera Spark-based analytics. It also covers the benefits of the integrated solution and gives guidance about the types of deployment models and the considerations for implementing them. The August 2021 update added CES protocol support in the Hadoop environment.

IBM Software-Defined Storage Guide

Author: Larry Coyne
Publisher: IBM Redbooks
Total Pages: 158
Release: 2018-07-21
Genre: Computers
ISBN: 0738457051

Today, new business models in the marketplace coexist with traditional ones and their well-established IT architectures. They generate new business needs and new IT requirements that can only be satisfied by new service models and new technological approaches. These changes are reshaping traditional IT concepts. Cloud in its three main variants (Public, Hybrid, and Private) represents the major and most viable answer to those IT requirements, and software-defined infrastructure (SDI) is its major technological enabler. IBM® technology, with its rich and complete set of storage hardware and software products, supports SDI both in an open standard framework and in other vendors' environments. IBM services are able to deliver solutions to the customers with their extensive knowledge of the topic and the experiences gained in partnership with clients. This IBM Redpaper™ publication focuses on software-defined storage (SDS) and IBM Storage Systems product offerings for software-defined environments (SDEs). It also provides use case examples across various industries that cover different client needs, proposed solutions, and results. This paper can help you to understand current organizational capabilities and challenges, and to identify specific business objectives to be achieved by implementing an SDS solution in your enterprise.

Enterprise Data Warehouse Optimization with Hadoop on IBM Power Systems Servers

Author: Scott Vetter
Publisher: IBM Redbooks
Total Pages: 82
Release: 2018-01-31
Genre: Computers
ISBN: 0738456608

Data warehouses were developed for many good reasons, such as providing quick query and reporting for business operations and business performance. However, over the years, due to the explosion of applications and data volume, many existing data warehouses have become difficult to manage. Extract, Transform, and Load (ETL) processes are taking longer and missing their allocated batch windows. In addition, data types that are required for business analysis have expanded from structured data to unstructured data. The Apache open source Hadoop platform provides a great alternative for solving these problems. IBM® has committed to open source since the early years of open Linux. IBM and Hortonworks together are committed to Apache open source software more than any other company. IBM Power Systems™ servers are built with open technologies and are designed for mission-critical data applications. Power Systems servers use technology from the OpenPOWER Foundation, an open technology infrastructure that uses the IBM POWER® architecture to help meet the evolving needs of big data applications. The combination of Power Systems with Hortonworks Data Platform (HDP) provides users with a highly efficient platform that provides leadership performance for big data workloads such as Hadoop and Spark. This IBM Redpaper™ publication provides details about Enterprise Data Warehouse (EDW) optimization with Hadoop on Power Systems. Many people know Power Systems from the IBM AIX® platform, but might not be familiar with IBM PowerLinux™, so part of this paper provides a Power Systems overview. A quick introduction to Hadoop is provided for those not familiar with the topic. Details of the HDP on Power reference architecture are included to help both software architects and infrastructure architects understand the design.
In the optimization chapter, we describe various topics: traditional EDW offload, sizing guidelines, performance tuning, IBM Elastic Storage™ Server (ESS) for data-intensive workloads, IBM Big SQL as the common structured query language (SQL) engine for the Hadoop platform, and tools that are available on Power Systems that are related to EDW optimization. We also dedicate some pages to the analytics components (IBM Data Science Experience (IBM DSX) and IBM Spectrum™ Conductor for Spark workload) for the Hadoop infrastructure.

IBM Spectrum Scale Security

Author: Felipe Knop
Publisher: IBM Redbooks
Total Pages: 116
Release: 2018-09-18
Genre: Computers
ISBN: 0738457167

Storage systems must provide reliable and convenient data access to all authorized users while simultaneously preventing threats coming from outside or even inside the enterprise. Security threats come in many forms, including unauthorized access to data, data tampering, denial of service, and obtaining privileged access to systems. According to the Storage Networking Industry Association (SNIA), data security in the context of storage systems is responsible for safeguarding data against theft, preventing unauthorized disclosure of data, and preventing data tampering and accidental corruption. This process ensures accountability, authenticity, business continuity, and regulatory compliance. Security for storage systems can be classified into four areas: data storage (data at rest, which includes data durability and immutability); access to data; movement of data (data in flight); and management of data. IBM® Spectrum Scale is a software-defined storage system for high performance, large-scale workloads on-premises or in the cloud. IBM Spectrum™ Scale addresses all four aspects of security by securing data at rest (protecting data at rest with snapshots, backups, and immutability features) and securing data in flight (providing secure management of data, and secure access to data by using authentication and authorization across multiple supported access protocols). These protocols include POSIX, NFS, SMB, Hadoop, and Object (REST). For automated data management, it is equipped with powerful information lifecycle management (ILM) tools that can help administer unstructured data by providing the correct security for the correct data.
This IBM Redpaper™ publication details the various aspects of security in IBM Spectrum Scale™, including the following items: security of data in transit; security of data at rest; authentication; authorization; Hadoop security; immutability; secure administration; audit logging; security for transparent cloud tiering (TCT); and security for OpenStack drivers. Unless stated otherwise, the functions that are mentioned in this paper are available in IBM Spectrum Scale V4.2.1 or later releases.

IBM Platform Computing Solutions for High Performance and Technical Computing Workloads

Author: Dino Quintero
Publisher: IBM Redbooks
Total Pages: 176
Release: 2015-06-19
Genre: Computers
ISBN: 0738440752

This IBM® Redbooks® publication is a refresh of IBM Technical Computing Clouds, SG24-8144; Enhance Inbound and Outbound Marketing with a Trusted Single View of the Customer, SG24-8173; and IBM Platform Computing Integration Solutions, SG24-8081, with a focus on High Performance and Technical Computing on IBM Power Systems™. This book describes synergies across the IBM product portfolio by using case scenarios and showing solutions such as IBM Spectrum™ Scale (formerly GPFS™). This book also reflects and documents the IBM Platform Computing Cloud Services as part of IBM Platform Symphony® for analytics workloads and IBM Platform LSF® (with new features, such as a Hadoop connector, a MapReduce accelerator, and dynamic cluster) for job scheduling. Both products are used to help customers schedule and analyze large amounts of data for business productivity and competitive advantages. This book is targeted at technical professionals (consultants, technical support staff, IT Architects, and IT Specialists) who are responsible for delivering cost-effective cloud services and big data solutions on IBM Power Systems to uncover insights among client data so that they can take actions to optimize business results, product development, and scientific discoveries.

IBM Reference Architecture for High Performance Data and AI in Healthcare and Life Sciences

Author: Dino Quintero
Publisher: IBM Redbooks
Total Pages: 88
Release: 2019-09-08
Genre: Computers
ISBN: 073845690X

This IBM® Redpaper publication provides an update to the original description of IBM Reference Architecture for Genomics. This paper expands the reference architecture to cover all of the major vertical areas of healthcare and life sciences industries, such as genomics, imaging, and clinical and translational research. The architecture was renamed IBM Reference Architecture for High Performance Data and AI in Healthcare and Life Sciences to reflect the fact that it incorporates key building blocks for high-performance computing (HPC) and software-defined storage, and that it supports an expanding infrastructure of leading industry partners, platforms, and frameworks. The reference architecture defines a highly flexible, scalable, and cost-effective platform for accessing, managing, storing, sharing, integrating, and analyzing big data, which can be deployed on-premises, in the cloud, or as a hybrid of the two. IT organizations can use the reference architecture as a high-level guide for overcoming data management challenges and processing bottlenecks that are frequently encountered in personalized healthcare initiatives, and in compute-intensive and data-intensive biomedical workloads. This reference architecture also provides a framework and context for modern healthcare and life sciences institutions to adopt cutting-edge technologies, such as cognitive life sciences solutions, machine learning and deep learning, Spark for analytics, and cloud computing. To illustrate these points, this paper includes case studies describing how clients and IBM Business Partners alike used the reference architecture in the deployments of demanding infrastructures for precision medicine. This publication targets technical professionals (consultants, technical support staff, IT Architects, and IT Specialists) who are responsible for providing life sciences solutions and support.

IBM Data Engine for Hadoop and Spark

Author: Dino Quintero
Publisher: IBM Redbooks
Total Pages: 126
Release: 2016-08-24
Genre: Computers
ISBN: 0738441937

This IBM® Redbooks® publication provides topics to help the technical community take advantage of the resilience, scalability, and performance of the IBM Power Systems™ platform to implement or integrate an IBM Data Engine for Hadoop and Spark solution that can access, manage, and analyze data sets to improve business outcomes. This book documents topics that demonstrate and take advantage of the analytics strengths of the IBM POWER8® platform, the IBM analytics software portfolio, and selected third-party tools to help meet customers' data analytics workload requirements. This book describes how to plan, prepare, install, integrate, manage, and use the IBM Data Engine for Hadoop and Spark solution to run analytics workloads on IBM POWER8. In addition, this publication delivers documentation that complements available IBM analytics solutions to help meet your data analytics needs. This publication strengthens the position of IBM analytics and big data solutions with a well-defined and documented deployment model within an IBM POWER8 virtualized environment so that customers have a planned foundation for security, scaling, capacity, resilience, and optimization for analytics workloads. This book is targeted at technical professionals (analytics consultants, technical support staff, IT Architects, and IT Specialists) who are responsible for delivering analytics solutions and support on IBM Power Systems.