Hadoop – Course Contents

Hadoop online training in Hyderabad

Course Duration: 30 sessions (each session is 1 hour)

Objectives

At the end of the Hadoop training course, the student will be able to:

  • Understand the basic concepts of Hadoop
  • Use the built-in functions available for Hadoop development
  • Develop and debug MapReduce programs
  • Tune MapReduce jobs for performance

Benefits to you:

  • 100% student satisfaction guaranteed; if not, the fee will be refunded to the student
  • Hands-on practice
  • Teaching with real-time examples
  • Introduction of various tools to be used by Hadoop developers/administrators

Course Contents

BASICS

The Motivation for Hadoop

  • Problems with traditional large-scale systems
  • Data storage literature survey
  • Data processing literature survey
  • Network Constraints
  • Requirements for a new approach

Hadoop: Basic Concepts

  • What is Hadoop?
  • The Hadoop Distributed File System
  • How Hadoop MapReduce works
  • Anatomy of a Hadoop Cluster

Hadoop Daemons

Master Daemons

  • NameNode
  • JobTracker
  • Secondary NameNode

Slave Daemons

  • DataNode
  • TaskTracker

HDFS (Hadoop Distributed File System)

Blocks and Splits

  • Input Splits
  • HDFS Blocks
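
To make the difference between storage blocks and input splits concrete, here is a minimal sketch in plain Python (no Hadoop libraries). It assumes a 128 MB block size, which is a common default but is configurable per cluster, and the function names `hdfs_block_count` and `split_offsets` are hypothetical helpers invented for illustration. The real input format classes apply additional rules when computing splits, which this sketch ignores.

```python
MB = 1024 * 1024


def hdfs_block_count(file_size_bytes, block_size_bytes=128 * MB):
    """Number of fixed-size blocks needed to store the file (assumed block size)."""
    # Ceiling division using integers only, to avoid float rounding.
    return -(-file_size_bytes // block_size_bytes)


def split_offsets(file_size_bytes, split_size_bytes):
    """Logical (start, length) ranges that map tasks would read, one per split."""
    offsets = []
    start = 0
    while start < file_size_bytes:
        length = min(split_size_bytes, file_size_bytes - start)
        offsets.append((start, length))
        start += length
    return offsets


if __name__ == "__main__":
    size = 300 * MB
    print(hdfs_block_count(size))
    print(split_offsets(size, 128 * MB))
```

A block is a physical unit of storage, while a split is a logical range handed to one map task; when the split size equals the block size, as here, the two line up one to one.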

Data Replication

  • Hadoop rack awareness
  • Data high availability
  • Cluster architecture and block placement

CASE STUDIES

Programming Practices & Performance Tuning

  • Pseudo-distributed Mode
  • Fully distributed mode
  • Running daemons on dedicated nodes

Hadoop Administration

Setting up Apache and Cloudera Hadoop clusters

  • Make a fully distributed Hadoop cluster on a single laptop/desktop
  • Install and configure Apache Hadoop on a multi-node cluster in the lab
  • Install and configure the Cloudera Hadoop distribution in fully distributed mode
  • Monitoring the cluster
  • NameNode in safe mode
  • Metadata backup

CASE STUDIES

Hadoop Development

Writing a MapReduce Program

  • Examining a Sample MapReduce Program with several examples
  • Basic API Concepts
  • The Driver Code
  • The Mapper
  • The Reducer
  • Hadoop’s Streaming API
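
As a rough illustration of how the mapper, the reducer, and the driver fit together, the following is a minimal word-count sketch in plain Python. It simulates the map, shuffle, and reduce phases in a single process and does not use the Hadoop API; `mapper`, `reducer`, and `run_job` are hypothetical names chosen for this example, with `run_job` standing in for the work the framework does between the two phases.

```python
from collections import defaultdict


def mapper(line):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.split():
        yield word.lower(), 1


def reducer(key, values):
    """Sum all counts that were grouped under the same word."""
    yield key, sum(values)


def run_job(lines):
    """Stand-in for the framework: map, group by key (shuffle), then reduce."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)

    results = {}
    for key in sorted(groups):  # keys reach the reducer in sorted order
        for out_key, out_value in reducer(key, groups[key]):
            results[out_key] = out_value
    return results


if __name__ == "__main__":
    print(run_job(["Hadoop stores data", "hadoop processes data"]))
```

In a real job the mapper and reducer run on different machines and the shuffle moves data across the network, which is why the later tuning topics focus on reducing what the map phase emits.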

Performing several Hadoop jobs

  • The configure and close Methods
  • Sequence Files
  • Record Reader
  • Record Writer
  • Role of Reporter
  • Output Collector
  • Counters
  • Directly Accessing HDFS
  • ToolRunner
  • Using The Distributed Cache

Several MapReduce jobs (in detail)

MOST EFFECTIVE SEARCH USING MAPREDUCE

GENERATING THE RECOMMENDATIONS USING MAPREDUCE

PROCESSING THE LOG FILES USING MAPREDUCE

  • Identity Mapper
  • Identity Reducer
  • Exploring well known problems using MapReduce applications

Debugging MapReduce Programs

  • Testing with MRUnit
  • Logging
  • Other Debugging Strategies

Advanced MapReduce Programming

  • The Secondary Sort
  • Customized Input Formats and Output Formats
  • Joins in MapReduce
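
One common way to join two datasets in MapReduce is a reduce-side join: records from both inputs are tagged with their source, grouped by the join key, and combined in the reducer. The sketch below simulates that idea in plain Python under simplifying assumptions (everything fits in memory, inner join only). The function name `reduce_side_join` and the sample `users`/`orders` data are hypothetical.

```python
from collections import defaultdict


def reduce_side_join(users, orders):
    """Inner join of (user_id, name) with (user_id, item) on user_id."""
    # "Map" phase: tag every record with the dataset it came from,
    # and group by join key as the shuffle would.
    groups = defaultdict(lambda: {"users": [], "orders": []})
    for user_id, name in users:
        groups[user_id]["users"].append(name)
    for user_id, item in orders:
        groups[user_id]["orders"].append(item)

    # "Reduce" phase: for each key, pair every user record with every order.
    joined = []
    for user_id in sorted(groups):
        group = groups[user_id]
        for name in group["users"]:
            for item in group["orders"]:
                joined.append((user_id, name, item))
    return joined


if __name__ == "__main__":
    users = [(1, "asha"), (2, "ravi")]
    orders = [(1, "book"), (1, "pen"), (3, "lamp")]
    print(reduce_side_join(users, orders))
```

Keys that appear in only one input produce no output here, which is the inner-join behaviour; an outer join would emit those records with a placeholder for the missing side.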

Monitoring and debugging on a Production Cluster

  • Counters
  • Skipping Bad Records
  • Running in local mode

Tuning for Performance in MapReduce

  • Reducing network traffic with combiner
  • Partitions
  • Reducing the amount of input data
  • Using Compression
  • Reusing the JVM
  • Running with speculative execution
  • Other Performance Aspects
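
To show why a combiner reduces network traffic, here is a small plain-Python sketch that counts how many intermediate records would leave the map side with and without local pre-aggregation. It is a simulation only and does not use Hadoop; `map_words`, `combine`, and `shuffled_record_count` are hypothetical names, and each inner list of lines stands for the input of one map task.

```python
from collections import Counter


def map_words(line):
    """Emit a (word, 1) pair per word, as a word-count mapper would."""
    return [(word, 1) for word in line.split()]


def combine(pairs):
    """Pre-aggregate one map task's output so each key is sent only once."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())


def shuffled_record_count(splits, use_combiner):
    """Total number of records that map tasks would send to reducers."""
    total = 0
    for split in splits:
        pairs = [pair for line in split for pair in map_words(line)]
        if use_combiner:
            pairs = combine(pairs)
        total += len(pairs)
    return total


if __name__ == "__main__":
    splits = [["a a b", "a b"], ["b b b"]]
    print(shuffled_record_count(splits, use_combiner=False))
    print(shuffled_record_count(splits, use_combiner=True))
```

This works for word count because addition is associative and commutative, so partial sums can be merged safely; operations without that property (a plain average, for example) need a different intermediate representation before a combiner can be used.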

CASE STUDIES

CDH4 & CDH5 Enhancements

  • NameNode High Availability
  • NameNode Federation
  • Fencing
  • MapReduce Version 2

HADOOP ANALYST

Hive

Hive concepts

Hive architecture

Install and configure Hive on a cluster

Different types of tables in Hive

Hive library functions

Buckets

Partitions

Joins in Hive

  • Inner joins
  • Outer Joins

PIG

Pig basics

Install and configure PIG on a cluster

PIG Library functions

Pig Vs Hive

Write sample Pig Latin scripts

Modes of running PIG

  • Running in Grunt shell
  • Running as Java program

Sqoop

  • Install and configure Sqoop on a cluster
  • Connecting to RDBMS
  • Installing MySQL
  • Import data from Oracle/MySQL to Hive
  • Export data to Oracle/MySQL
  • Internal mechanism of import/export

INTEGRATIONS

  • MongoDB with HIVE
  • MongoDB with PIG
  • HIVE with HBASE
  • HIVE with ORACLE, TERADATA

Contact Us

5th Floor, T-Hub Building
IIIT Campus, Gachibowli Circle,
Hyderabad, Telangana 500032.
Phone: +91-9966465202 (India)
+1-415-935-5884 (USA)
Email: rajuonlinetraining@gmail.com
www.rajutechnologies.com
Skype: rajutechnologies

Raju Technologies

Raju Technologies is founded and operated by real-time experts. We assure that training will be delivered with 100% quality. We are always reachable after the training as well.