The course focuses on the basics of Big Data analytics. It is designed for students who want to pursue a career in Big Data. The following topics will be covered during the course:
1. UNDERSTANDING BIG DATA: What is big data – why big data – Data! – data storage and analysis – comparison with other systems – relational database management systems – introduction to Hadoop – open source technologies
2. NOSQL DATA MANAGEMENT: Introduction to NoSQL – aggregate data models – aggregates – key-value and document data models – relationships – graph databases – schemaless databases – materialized views – distribution models – sharding – version stamps – MapReduce – partitioning and combining – composing MapReduce calculations
3. BASICS OF HADOOP: Data format – analyzing data with Hadoop – scaling out – Hadoop streaming – Hadoop pipes – design of Hadoop distributed file system (HDFS) – HDFS concepts – data flow – Hadoop I/O – data integrity – compression – serialization – Avro – file-based data structures.
4. MAPREDUCE APPLICATIONS: MapReduce workflows – unit tests with MRUnit – test data and local tests – anatomy of a MapReduce job run – classic MapReduce – YARN – failures in classic MapReduce and YARN – job scheduling – shuffle and sort – task execution – MapReduce types – input formats – output formats
5. HADOOP RELATED TOOLS: HBase – data model and implementations – HBase clients – HBase examples – praxis. Cassandra – Cassandra data model – Cassandra examples – Cassandra clients – Hadoop integration. Pig – Grunt – Pig data model – Pig Latin – developing and testing Pig Latin scripts. Hive – data types and file formats – HiveQL data definition – HiveQL data manipulation – HiveQL queries.