Cloudera Training for Data Analysts: Using Pig, Hive, and Impala with Hadoop

$2,995.00

Classroom
Onsite

Duration: 3 Days

In this hands-on course, you will learn how Apache Pig, Apache Hive, and Cloudera Impala enable data transformations and analyses via filters, joins, and user-defined functions familiar from other technologies. You will learn how to apply traditional data analytics and business intelligence skills to big data, and how to access, manipulate, and analyze complex data sets using SQL and familiar scripting languages.

Apache Hive makes multi-structured data accessible to analysts, database administrators, and others without Java programming expertise. Apache Pig applies the fundamentals of familiar scripting languages to the Hadoop cluster. Cloudera Impala enables real-time interactive analysis of the data stored in Hadoop via a native SQL environment.

What You Will Learn

Fundamentals of Apache Hadoop, and of data ingestion, ETL (extract, transform, load), and processing with Hadoop tools
Joining multiple data sets and analyzing disparate data with Pig
Organizing data into tables, performing transformations, and simplifying complex queries with Hive
Performing real-time interactive analyses on massive data sets stored in HDFS or HBase using SQL with Impala
How to pick the best analysis tool for a given task in Hadoop

Audience

Data analysts, business analysts, developers, and administrators

Prerequisites

Familiarity with SQL and basic UNIX or Linux commands
Prior knowledge of Java and Apache Hadoop is not required

Course Outline

1. Hadoop Fundamentals

The Motivation for Hadoop
Hadoop Overview
HDFS
MapReduce
The Hadoop Ecosystem
Hands-On Exercise: Data Ingest with Hadoop Tools

2. Introduction to Pig

What Is Pig?
Pig's Features
Pig Use Cases
Interacting with Pig

3. Basic Data Analysis with Pig

Pig Latin Syntax
Loading Data
Simple Data Types
Field Definitions
Data Output
Viewing the Schema
Filtering and Sorting Data
Commonly Used Functions
Hands-On Exercise: Using Pig for ETL Processing

4. Processing Complex Data with Pig

Storage Formats
Complex/Nested Data Types
Grouping
Built-In Functions for Complex Data
Iterating Grouped Data
Hands-On Exercise: Analyzing Ad Campaign Data with Pig

5. Multi-Dataset Operations with Pig

Techniques for Combining Data Sets
Joining Data Sets in Pig
Set Operations
Splitting Data Sets
Hands-On Exercise: Analyzing Disparate Data Sets with Pig

6. Extending Pig

Adding Flexibility with Parameters
Macros and Imports
UDFs
Contributed Functions
Using Other Languages to Process Data with Pig
Hands-On Exercise: Extending Pig with Streaming and UDFs

7. Pig Troubleshooting and Optimization

Troubleshooting Pig
Logging
Using Hadoop's Web UI
Optional Demo: Troubleshooting a Failed Job with the Web UI
Data Sampling and Debugging
Performance Overview
The Execution Plan
Tips for Improving the Performance of Your Pig Jobs

8. Introduction to Hive

What Is Hive?
Hive Schema and Data Storage
Comparing Hive to Traditional Databases
Hive vs. Pig
Hive Use Cases
Interacting with Hive

9. Relational Data Analysis with Hive

Hive Databases and Tables
Basic HiveQL Syntax
Data Types
Joining Data Sets
Common Built-In Functions
Hands-On Exercise: Running Hive Queries on the Shell, Scripts, and Hue

10. Hive Data Management

Hive Data Formats
Creating Databases and Hive-Managed Tables
Loading Data into Hive
Altering Databases and Tables
Self-Managed Tables
Simplifying Queries with Views
Storing Query Results
Controlling Access to Data
Hands-On Exercise: Data Management with Hive

11. Text Processing with Hive

Overview of Text Processing
Important String Functions
Using Regular Expressions in Hive
Sentiment Analysis and N-Grams
Hands-On Exercise (Optional): Gaining Insight with Sentiment Analysis

12. Hive Optimization

Query Performance
Controlling Job Execution Plan
Partitioning
Bucketing
Indexing Data

13. Extending Hive

SerDes
Data Transformation with Custom Scripts
User-Defined Functions
Parameterized Queries
Hands-On Exercise: Data Transformation with Hive

14. Introduction to Impala

What Is Impala?
How Impala Differs from Hive and Pig
How Impala Differs from Relational Databases
Limitations and Future Directions
Using the Impala Shell

15. Analyzing Data with Impala

Basic Syntax
Data Types
Filtering, Sorting, and Limiting Results
Joining and Grouping Data
Improving Impala Performance
Hands-On Exercise: Interactive Analysis with Impala

16. Choosing the Best Tool for the Job

Comparing MapReduce, Pig, Hive, Impala, and Relational Databases
Which to Choose?

Course Labs

You will participate in hands-on exercises throughout the course.
