Big Data

Kranthi Kumar Kandula

13/02/2017

Big Data

Big data refers to large volumes of data, which may be structured, semi-structured, or unstructured, that traditional database applications cannot process adequately. The challenges include storage, processing, transfer, search, analysis, and querying.

Characteristics of Big Data

Volume (size)

Volume is the amount of data generated every second. Totals are now measured in zettabytes or even brontobytes. Airplanes generate approximately 2.5 billion terabytes of data each year from the sensors installed in their engines. Self-driving cars will generate 2 petabytes of data every year. The agricultural industry also generates massive amounts of data from sensors installed in tractors. Shell uses super-sensitive sensors to find additional oil in wells, and if it installs these sensors at all 10,000 wells it will collect approximately 10 exabytes of data annually. Even that is small compared with the Square Kilometre Array telescope, which will generate 1 exabyte of data per day.
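To put these units side by side, here is a minimal Python sketch that converts some of the figures quoted above into terabytes per year. It assumes decimal (SI) units, where each step up is a factor of 1,000. The names UNITS_IN_TB and to_terabytes are illustrative only and do not come from any library.

```python
# Decimal (SI) storage units expressed in terabytes. This is an assumption:
# the text does not say whether decimal or binary units are meant.
UNITS_IN_TB = {
    "terabyte": 1,
    "petabyte": 1_000,
    "exabyte": 1_000_000,
    "zettabyte": 1_000_000_000,
}


def to_terabytes(amount, unit):
    """Convert an amount in the given unit to terabytes."""
    return amount * UNITS_IN_TB[unit]


# Figures quoted in the text, normalised to terabytes per year.
self_driving_cars = to_terabytes(2, "petabyte")    # 2 PB per year
shell_wells = to_terabytes(10, "exabyte")          # 10 EB per year
ska_telescope = to_terabytes(1, "exabyte") * 365   # 1 EB per day

print(f"Self-driving cars: {self_driving_cars:,} TB/year")
print(f"Shell wells:       {shell_wells:,} TB/year")
print(f"SKA telescope:     {ska_telescope:,} TB/year")
print(f"SKA vs Shell:      {ska_telescope / shell_wells:.1f}x")
```

Under these assumptions, a year of telescope output is 36.5 times the annual Shell estimate.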

Velocity (speed)

Velocity refers to how fast data is generated and how fast it must be processed to meet demand.

The speed at which data is currently created is almost unimaginable. Every minute, 100 hours of video are uploaded to YouTube. In addition, every minute over 200 million emails are sent, around 20 million photos are viewed and 30,000 uploaded on Flickr, almost 300,000 tweets are sent, and almost 2.5 million Google queries are performed.

Variety

Variety refers to the different types of data. Data is classified into three types:

Structured: data that fits into relational database tables, such as financial details or population census records.

Semi-structured: data that does not conform to the formal structure of the data models associated with relational databases or other data tables, but that nonetheless contains tags or other markers to separate semantic elements and enforce hierarchies of records and fields. It is therefore also known as self-describing structure. Examples include XML, HTML, and JSON data.

Unstructured: data such as images, videos, and sound. About 90% of the data in the world is unstructured.
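The three types can be illustrated with a short Python sketch that uses only the standard library. The sample values (names, balances, tags) are made up for illustration and are not from the original lesson.

```python
import csv
import io
import json

# Structured: fixed columns, as in a relational table (sample values are made up).
table_text = "id,name,balance\n1,Asha,2500\n2,Ravi,1200\n"
rows = list(csv.DictReader(io.StringIO(table_text)))

# Semi-structured: keys describe the data, and records need not share the same fields.
json_text = '[{"id": 1, "tags": ["new"]}, {"id": 2, "address": {"city": "Pune"}}]'
records = json.loads(json_text)

# Unstructured: raw bytes with no schema. These are the first four bytes of a PNG file.
image_bytes = bytes([0x89, 0x50, 0x4E, 0x47])

print(rows[0]["name"])            # every row has the same columns
print(sorted(records[1].keys()))  # the second record has no "tags" field
print(len(image_bytes))           # only the size is known without decoding
```

The structured rows can be queried by column name straight away, the JSON records must be inspected to learn which fields each one carries, and the image bytes mean nothing until a decoder interprets them.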

Real-life Examples of Big Data

Search engines

E-Commerce

Social media

Fraud detection

Crime Prevention

Solutions for Big Data

NoSQL: NoSQL is a database environment designed for real-time, interactive processing on commodity hardware. It handles large amounts of data with fast reads and writes, and it supports incremental, horizontal scaling and changing data formats. NoSQL stores can hold user transactions, sensor data, and customer profiles, and they provide cluster support.

Hadoop: Hadoop provides batch and large-scale processing on commodity hardware and handles large amounts of data. It supports incremental, horizontal scaling and changing data formats. Hadoop is used for predictive analytics, fraud detection, and recommendations, and it provides cluster support.

Cassandra: Cassandra handles high volumes of data and scales from gigabytes to petabytes. It offers quick installation and configuration of multi-node clusters and is designed for continuous availability. It is open source and costs 80% to 90% less than an RDBMS. Cassandra handles high-velocity data with ease and uses schemas that support a broad variety of data.
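Horizontal scaling, mentioned above for both NoSQL and Hadoop, means spreading data across more machines instead of buying a bigger one. The following is a minimal sketch of that idea using simple hash-modulo partitioning. It is an illustration only and not how any particular product places data; Cassandra, for example, uses consistent hashing over a token ring. The names node_for and distribute and the "user-N" keys are hypothetical.

```python
import hashlib


def node_for(key, node_count):
    """Pick a node index for a key using a stable hash (hash-modulo partitioning)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % node_count


def distribute(keys, node_count):
    """Group keys by the node that would store them."""
    placement = {node: [] for node in range(node_count)}
    for key in keys:
        placement[node_for(key, node_count)].append(key)
    return placement


keys = [f"user-{i}" for i in range(1000)]  # made-up record keys
three_nodes = distribute(keys, 3)
four_nodes = distribute(keys, 4)           # "scale out" by adding one node

for node, stored in four_nodes.items():
    print(f"node {node}: {len(stored)} keys")
```

With plain modulo partitioning, adding a node changes the placement of many existing keys. Production systems typically use consistent hashing to limit how much data has to move when the cluster grows.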

HADOOP

Hadoop is an open-source framework that allows the distributed storage and processing of large data sets across clusters of computers using a simple programming model. It is written in Java and developed by the Apache Software Foundation. It has two core components: HDFS and MapReduce.
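The "simple programming model" is MapReduce. The sketch below simulates its three stages (map, shuffle, reduce) for a word count in a single Python process. It does not use the Hadoop API. In a real cluster the input would be read from HDFS and the map and reduce tasks would run in parallel on different nodes. The function names and sample lines are illustrative assumptions.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


lines = ["big data needs big storage", "hadoop stores big data"]  # sample input

mapped = [pair for line in lines for pair in map_phase(line)]
grouped = shuffle_phase(mapped)
counts = dict(reduce_phase(key, values) for key, values in grouped.items())

print(counts)
```

Because each map call sees only one line and each reduce call sees only one key, the work can be split across many machines without the tasks needing to coordinate with each other.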

History of Hadoop

Hadoop originated from the Google File System (GFS) paper, published in 2003, and from Google's MapReduce paper, which described data processing over large clusters. Hadoop was spun out of Apache Nutch as its own subproject in 2006. It was created by Doug Cutting, who was working at Yahoo at the time. The name Hadoop comes from a toy elephant that belonged to Doug Cutting's son.

2003	October	Google File System paper released
2004	December	MapReduce: Simplified Data Processing on Large Clusters
2006	January	Hadoop subproject created with mailing lists, jira, and wiki
2006	January	Hadoop is born from Nutch 197
2006	February	NDFS + MapReduce moved out of Apache Nutch to create Hadoop
2006	February	Owen O'Malley's first patch goes into Hadoop
2006	February	Hadoop is named after Cutting's son's yellow plush toy
2006	April	Hadoop 0.1.0 released
2006	April	Hadoop sorts 1.8TB on 188 nodes in 47.9 hours
2006	May	Yahoo deploys 300 machine Hadoop cluster
2006	October	Yahoo Hadoop cluster reaches 600 machines
2007	April	Yahoo runs 2 clusters of 1,000 machines
2007	June	Only 3 companies on "Powered by Hadoop Page"
2007	October	First release of Hadoop that includes HBase
2007	October	Yahoo Labs creates Pig, and donates it to the ASF
2008	January	YARN JIRA opened
2008	January	20 companies on "Powered by Hadoop Page"
2008	February	Yahoo moves its web index onto Hadoop
2008	February	Yahoo! production search index generated by a 10,000-core Hadoop cluster
2008	March	First Hadoop Summit
2008	April	Hadoop world record fastest system to sort a terabyte of data. Running on a 910-node cluster, Hadoop sorted one terabyte in 209 seconds
2008	May	Hadoop wins TeraByte Sort (World Record sortbenchmark.org)
2008	July	Hadoop wins Terabyte Sort Benchmark
2008	October	Loading 10TB/day in Yahoo clusters
2008	October	Cloudera, Hadoop distributor is founded
2008	November	Google MapReduce implementation sorted one terabyte in 68 seconds
2009	March	Yahoo runs 17 clusters with 24,000 machines
2009	April	Hadoop sorts a petabyte
2009	May	Yahoo! used Hadoop to sort one terabyte in 62 seconds
2009	June	Second Hadoop Summit
2009	July	Hadoop Core is renamed Hadoop Common
2009	July	MapR, Hadoop distributor founded
2009	July	HDFS now a separate subproject
2009	July	MapReduce now a separate subproject
2010	January	Kerberos support added to Hadoop
2010	May	Apache HBase Graduates
2010	June	Third Hadoop Summit
2010	June	Yahoo 4,000 nodes/70 petabytes
2010	June	Facebook 2,300 clusters/40 petabytes
2010	September	Apache Hive Graduates
2010	September	Apache Pig Graduates
2011	January	Apache Zookeeper Graduates
2011	January	Facebook, LinkedIn, eBay and IBM collectively contribute 200,000 lines of code
2011	March	Apache Hadoop takes top prize at Media Guardian Innovation Awards
2011	June	Rob Bearden and Eric Baldeschwieler spin Hortonworks out of Yahoo.
2011	June	Yahoo has 42K Hadoop nodes and hundreds of petabytes of storage
2011	June	Third Annual Hadoop Summit (1,700 attendees)
2011	October	Debate over which company had contributed more to Hadoop.
2012	January	Hadoop community moves to separate from MapReduce and replace with YARN
2012	June	San Jose Hadoop Summit (2,100 attendees)
2012	November	Apache Hadoop 1.0 Available
2013	March	Hadoop Summit - Amsterdam (500 attendees)
2013	March	YARN deployed in production at Yahoo
2013	June	San Jose Hadoop Summit (2,700 attendees)
2013	October	Apache Hadoop 2.2 Available
2014	February	Apache Hadoop 2.3 Available
2014	February	Apache Spark top Level Apache Project
2014	April	Hadoop summit Amsterdam (750 attendees)
2014	June	Apache Hadoop 2.4 Available
2014	June	San Jose Hadoop Summit (3,200 attendees)
2014	August	Apache Hadoop 2.5 Available
2014	November	Apache Hadoop 2.6 Available
2015	April	Hadoop Summit Europe
2015	June	Apache Hadoop 2.7 Available

Highlights of Hadoop

Yahoo runs the world's largest cluster, with over 42,000 nodes in 3 data centers.

Facebook follows, with 2,000 nodes in 2010.

There are more than 1,000 users.

COMPANIES WORKING WITH HADOOP

Company	Business	Technical Specs	Uses
Facebook	Social site	8 cores and 12 TB of storage	Used as a source for reporting and machine learning
Twitter	Social site	-	Hadoop has been used since 2010 to store and process tweets and log files, using the LZO compression technique because it is fast and frees up CPU for other tasks.
LinkedIn	Social site	2X4 and 2X6 cores – 6X2TB SATA; 4100 nodes	LinkedIn's data flows through Hadoop clusters. User activity, server metrics, images, and transaction logs stored in HDFS are used by data analysts for business analytics such as discovering people you may know.
Yahoo!	Online portal	4500 nodes – 1TB storage, 16 GB RAM	Used for scaling tests
AOL	Online portal	Targets machines and dual processors	ETL style processing and statistics generation
eBay	Ecommerce	4K+ nodes cluster	With 300+ million users browsing more than 350 million products listed on their website, eBay has one of the largest Hadoop clu
