Big-Data Introduction and Hadoop Fundamentals
- Data Storage and Analysis
- Comparison with RDBMS
Hadoop - A Brief History
MapReduce - Part 1
- Map and Reduce
- Sample Program
- Combiner
- Partitioners and Custom Partitioner
Hadoop Streaming & Pipes
HDFS
- Blocks
- NN & DN
- HDFS Federation & High Availability
HDFS Clients
- HDFS Command Line
- HDFS CLI - File System Operations Lab
- HDFS Web UI
- HDFS Java Client
- HDFS Java Client - File System Operations Lab
- CRUD Operations using Java Client
- Anatomy of File Read and File Write
- DistCp
- Cluster balancing
YARN - Cluster Management (Hadoop 2.x)
- How YARN Applications Run
- YARN vs MapReduce
- YARN Scheduling
  - Capacity Scheduler
  - Fair Scheduler
  - FIFO Scheduler
- MapReduce - Part 2
  - Env Setup
  - Tool and ToolRunner
  - Mapper
  - Reducer
  - Driver program
  - How to package the job
  - MapReduce WebUI
  - How a MapReduce Job Runs
  - Shuffle & Sort
  - Speculative Execution
- InputFormats
  - Input Splits and Record Reader
  - Default Input Formats
  - Implement Custom Input Format
- OutputFormats
  - Default Output formats
  - Record Writer
- Compression
  - Map Output
  - Final Output
  - Splittable vs Non Splittable
  - Compression Codecs
- Serialization
  - Data types - default
  - Writable vs WritableComparable
  - Custom Data types - Custom Writable/Comparable
- File Based Data structures
  - Sequence file
  - Reading and Writing into Sequence file
  - Map File
- Tuning MapReduce Jobs
- Advanced MapReduce
  - Counters
    - Built-In Counters Classification
    - User Defined Counters
  - Sorting
    - Partial Sort
    - Total Sort
    - Secondary Sort
  - Joins
    - Map-side joins
    - Reduce-side joins
    - Distributed Cache
  - Hive
    - Comparison with RDBMS
    - HQL
    - Data types
    - Tables
    - Importing and Exporting
    - Partitioning and Bucketing - Advanced
    - Joins and Join Optimization
    - Functions - Built-in and User Defined
    - Advanced Optimization of HQL
    - Storage File Formats - Advanced
    - Loading and Storing Data
    - SerDes - Advanced
  - Sqoop
    - Important basics
    - Import - Deep dive
    - Export - Deep dive
    - Sqoop Optimization - Incremental Load
    - Many more
  - Pig
    - Important basics
    - Pig Latin
    - Data types
    - Functions - Built-in, User Defined
    - Loading and Storing Data
  - Flume
    - Configure Flume and Import data
    - Architecture and LAB
  - Oozie
    - Different workflow jobs
    - Oozie scheduler
    - LAB - covers advanced topics
  - HBase
    - NoSQL databases Introduction
    - CAP theorem
    - HBase Architecture
    - HBase Clients - Java Client
    - Loading Data
    - UDF, UDAF, UDTFs
  - Zookeeper
    - Zookeeper in HBase
    - How Zookeeper is used in Production
  - Ambari
    - Real time Cluster deployment Using Ambari
    - Monitoring the Cluster
  - REST API
    - Introduction
    - Real time Use cases of How REST is used with Hadoop
  - Labs:
    - Real Time use cases and Data sets covered (10+ Real Time datasets)
    - Word count, weather sensor dataset, social media datasets such as YouTube and Twitter data analysis
    - Java and Unix Basics Lab
    - Hadoop, Hive, Sqoop, Oozie, HBase, Flume Installations - Pseudo & Cluster

COURSE FEE - RS 35,000/- (Negotiable)

(FLEXIBLE MONTHLY PAYMENT OPTIONS ARE AVAILABLE ON A 0% EMI BASIS)

100% Job Guarantee Support, including on-the-job support for up to 1 year after placement.

Master Project:

- Real-time Data Warehouse migration
- Real-time concepts covered:
  - Hive - Advanced topics
  - Sqoop import/export
  - Oozie Scheduling
  - How Hadoop MR is used in DW
  - RDBMS concepts
  - ETL tool concepts
  - Integration with Reporting tools

About the Trainer

5 Avg Rating

1 Review

6 Students

3 Courses

Srinivas Kalyan Kamalapuram

Cloudera certified

Mr. Ravi, Big Data Architect @ MNC, 12+ years of Real Time Experience

He is a Cloudera Certified Enterprise Architect with extensive experience in private and public cloud technologies. The trainers are advisors to and members of larger cloud computing forums, and seasoned integrators of cloud computing technologies with more than 12 years at large global enterprises.