Big Data Hadoop Course

Introduction

 BIG DATA – HADOOP (Development & Basic Administration)

What You Will Get From This Course?

  • In-depth Understanding of the Entire Big Data Hadoop Framework and the Hadoop Ecosystem
  • Real-Time Exposure to Hadoop Development
  • Detailed Course Materials
  • Free Core Java and UNIX Fundamentals
  • Interview Oriented Discussions
  • Get Ready for Hadoop & Spark Developer (CCA175) Certification Exam

 Overall Course Structure:

  • UNIX/LINUX Basic Commands
  • Basic UNIX Shell Scripting
  • Basic Java Programming – Core Java OOP Concepts
  • Introduction to Big Data and Hadoop
  • Working With HDFS
  • Hadoop Map Reduce Concepts & Features
  • Developing Map Reduce Applications
  • Hadoop Eco System Components:
    • HIVE
    • PIG
    • HBASE
    • FLUME
    • SQOOP
    • OOZIE
  • Introduction to SPARK & SCALA
  • Real-Time Tools such as PuTTY, WinSCP, Eclipse, Hue and Cloudera Manager

Pre-Requisite:

  • Basic SQL Knowledge
  • Computer with Minimum 4 GB RAM (8 GB RAM Preferred)
  • Basic UNIX & Java Programming Knowledge is an Added Advantage

 Detailed Course Structure:

 Introduction to Big Data & Hadoop

  • The Big Data Problem
  • What is Big Data?
  • Challenges in processing Big Data
  • What is Hadoop?
  • Why Hadoop?
  • History of Hadoop
  • Hadoop Components Overview
    • HDFS
    • Map Reduce
  • Hadoop Eco System Introduction
  • NoSQL Database Introduction

 Understanding Hadoop Architecture

  • Hadoop 2.x Architecture
  • Introduction to YARN
  • Hadoop Daemons
  • YARN Architecture
  • Resource Manager
  • Application Master
  • Node Manager

Introduction to HDFS (Hadoop Distributed File System)

  • Rack Awareness
  • HDFS Daemons
  • Writing Files to HDFS
    • Blocks & Splits
    • Input Splits
    • Data Replication
  • Reading Files from HDFS
  • Introduction to HDFS Configuration Files
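The block mechanics above can be pictured with a short calculation. The sketch below is plain Python, not Hadoop code; it only illustrates how a file is divided into fixed-size blocks (128 MB is the default block size in Hadoop 2.x) which are then replicated across DataNodes:

```python
BLOCK_SIZE = 128 * 1024 * 1024  # default HDFS block size in Hadoop 2.x (128 MB)

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) pairs for each block a file would occupy."""
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks

# A 300 MB file occupies three blocks: 128 MB + 128 MB + 44 MB.
# With the default replication factor of 3, each block is stored on 3 DataNodes.
blocks = split_into_blocks(300 * 1024 * 1024)
print(len(blocks))  # 3
```

Note that input splits used by Map Reduce are a logical division and usually, but not always, align with these physical blocks.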

Working with HDFS

  • HDFS Commands
  • Accessing HDFS
    • CLI Approach
    • Java Approach [Introducing the HDFS Java API]

Introduction to Map Reduce Paradigm

  • What is Map Reduce?
  • Detailed Map Reduce Flow
  • Introduction to Key/Value Approach
  • Detailed Mapper Functionality
  • Detailed Reducer Functionality
  • Details of Partitioner
  • Shuffle & Sort Process
  • Understanding Map Reduce Flow with Word Count Example
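The word-count flow above (map → shuffle & sort → reduce) can be simulated in plain Python. This is a conceptual sketch of the paradigm, not actual Hadoop code — in the course the real version is written against the Java Map Reduce API:

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    # Map phase: emit a (word, 1) key/value pair for every word in the line
    for word in line.split():
        yield (word.lower(), 1)

def shuffle_and_sort(pairs):
    # Shuffle & sort phase: sort by key and group all values per key,
    # which the Hadoop framework performs between map and reduce
    for key, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
        yield key, [value for _, value in group]

def reducer(key, values):
    # Reduce phase: sum the counts collected for each word
    return key, sum(values)

lines = ["the quick brown fox", "the lazy dog"]
intermediate = [pair for line in lines for pair in mapper(line)]
counts = dict(reducer(k, vs) for k, vs in shuffle_and_sort(intermediate))
print(counts["the"])  # 2
```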

Map Reduce Programming

  • Introduction to Map Reduce API [New Map Reduce API]
  • Map Reduce Data Types
  • File Formats
  • Input Formats – Input Splits & Records, text input, binary input
  • Output Formats – Text Output, Binary Output
  • Configuring Development Environment – Eclipse
  • Developing a Map Reduce Application using Default Functionality
    • Identity Mapper
    • Identity Reducer
  • ToolRunner API Introduction
  • Developing a Word Count Application
    • Writing Mapper, Reducer & Driver Code
    • Building the Application
    • Deploying the Application
  • Running the Map Reduce Application
    • Local Mode of Execution
    • Cluster Mode of Execution
  • Monitoring Map Reduce Applications
  • Map Reduce Combiner
  • Map Reduce Counters
  • Map Reduce Partitioner
  • File Merge Utility
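The partitioner listed above decides which reducer receives each intermediate key. The sketch below is a Python analogue of Hadoop's default HashPartitioner (whose Java logic is `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`) — illustrative only, not the actual API:

```python
def hash_partition(key, num_reducers):
    # Analogue of Hadoop's default HashPartitioner:
    # partition = (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks
    return (hash(key) & 0x7FFFFFFF) % num_reducers

keys = ["apple", "banana", "cherry", "apple"]
partitions = [hash_partition(k, 3) for k in keys]

# A given key always maps to the same partition within a run,
# so all values for that key reach the same reducer
assert partitions[0] == partitions[3]
assert all(0 <= p < 3 for p in partitions)
```

A custom partitioner uses the same contract: map each key to an integer in `[0, numReduceTasks)`.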

Programming with HIVE

  • Introduction to HIVE
  • Hive Architecture
  • Types of Metastore
  • Introduction to Hive Configuration Files
  • Hive Data Types
    • Simple Data Types
    • Collection Data Types
  • Types of Hive Tables
    • Managed Tables
    • External Tables
  • Hive Query Language (HQL or HIVE QL)
  • Creating Databases
  • Creating Tables
  • Joins in Hive
  • Group BY and Distinct operations
  • Partitioning
    • Static Partitioning
    • Dynamic Partitioning
  • Bucketing
  • Lateral View & Explode
  • Introduction to Hive UDFs: UDF, UDAF & UDTF
  • XML Processing in HIVE
  • JSON processing in HIVE
  • URL Processing in HIVE
  • Hive File Formats [Introduction to Hive SerDe]
    • Parquet
    • ORC
    • AVRO
  • Introduction to HIVE Query Optimizations
  • Developing Hive UDFs in JAVA

Programming with PIG

  • Introduction to PIG
  • PIG Architecture
  • Introduction to PIG Configuration Files
  • PIG vs. HIVE vs. Map Reduce
  • Introduction to Data Flow Language
  • Pig Data Types
  • Pig Programming Modes
  • Pig Access Modes
  • Detailed PIG Latin Programming
  • PIG UDFs & UDF Development in JAVA
  • Hive – PIG Integration
  • Introduction to HCATALOG
  • Introduction to PIG Optimization

NoSQL & HBASE

  • Introduction to NoSQL Databases
  • Types of NoSQL Databases
  • Introduction To HBASE
  • HBASE Architecture
  • HBASE Shell Interface
  • Creating Databases and Tables
  • Inserting Data in tables
  • Accessing data from Tables
  • HBase Filters
  • Hive & HBASE Integration
  • PIG & HBASE Integration
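HBase's data model — rows addressed by a row key, holding cells named by column family and qualifier — can be pictured with nested Python dicts. This is purely illustrative of the shell's `put`/`get` semantics, not an HBase client:

```python
from collections import defaultdict

# Illustrative model of one HBase table: {row_key: {"family:qualifier": value}}
table = defaultdict(dict)

def put(row_key, column, value):
    # Analogue of the HBase shell command: put 'table', row, column, value
    table[row_key][column] = value

def get(row_key, column=None):
    # Analogue of: get 'table', row [, column]
    row = table.get(row_key, {})
    return row if column is None else row.get(column)

put("user1", "info:name", "Alice")
put("user1", "info:city", "Pune")
print(get("user1", "info:name"))  # Alice
```

Real HBase additionally versions each cell by timestamp and keeps rows sorted by row key, which is what makes range scans and filters efficient.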

Introduction to Streaming & FLUME

  • Introduction to Streaming
  • Introduction to FLUME
  • FLUME Architecture
  • Flume Agent Setup
  • Types of Source, Channel & Sinks
  • Developing Sample Flume Applications
  • Introduction to KAFKA

SQOOP

  • Introduction to SQOOP
  • Connecting to RDBMS Using SQOOP
  • SQOOP Import
    • Import to HDFS
    • Import to Hive
    • Import to HBASE
    • Bulk Import
      • Full Table
      • Subset of a Table
      • All Tables in DB
    • Incremental Import
  • SQOOP Export
    • Export from HDFS
    • Export from Hive

SPARK & SCALA

  • Scala Programming Basics
  • Apache Spark Basics
  • Using Spark Shell
  • Spark RDD
    • RDD Overview
    • RDD Data Sources
    • Creating and Saving RDDs
    • RDD Operations
    • Pair RDDs and Pair RDD Operations
    • Concept of Persistence
  • Spark Data Frames
    • What is a Data Frame?
    • Creating Data Frames from Data Sources (including converting RDDs to Data Frames)
    • Data Frame Operations
      • Using Column Expressions
      • Grouping and Aggregation
      • Joining Data Frames
    • Concept of Persistence

  • Spark SQL
  • Querying Tables in Spark using Spark SQL
  • Querying Files and Views
  • Spark Streaming
  • Integrating Spark Streaming with Flume & Kafka
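The pair RDD operations covered above, such as reduceByKey, can be pictured in plain Python. This is a conceptual sketch only; real Spark code would use the Scala or pyspark API, where the work is distributed across partitions:

```python
from collections import defaultdict
from functools import reduce

def reduce_by_key(pairs, fn):
    # Conceptual analogue of Spark's pair-RDD reduceByKey(fn):
    # values sharing a key are merged pairwise with the given function
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: reduce(fn, values) for key, values in grouped.items()}

pairs = [("spark", 1), ("hadoop", 1), ("spark", 1)]
print(reduce_by_key(pairs, lambda a, b: a + b))  # {'spark': 2, 'hadoop': 1}
```

In Spark the merge function must be associative and commutative, because partial results are combined per partition before the final merge.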