Big Data Training

As the name suggests, big data is data that is enormous in size and cannot be handled using traditional file handling systems.

Index

Big Data vs. Small Data
Hadoop and MapReduce
- MapReduce Programs
Apache Pig
- Pig Programs

Big Data vs. Small Data

Big Data	Small Data
Mostly unstructured	Mostly structured
Measured in petabytes, exabytes, zettabytes, etc.	Measured in MB, GB, and TB
Increases exponentially	Increases gradually
Globally present and distributed	Locally present
Multi-node clusters are used	Single-node systems are used

Hadoop

Hadoop is an open-source framework used to store and process big data efficiently. Two components of Hadoop are:

HDFS: HDFS stands for Hadoop Distributed File System. It is the primary storage system of Hadoop. HDFS creates multiple replicas of each data block and distributes them across the computers in a cluster to enable reliable and rapid access.
MapReduce: Hadoop MapReduce is the processing unit of Hadoop. In the MapReduce approach, processing is done on the worker (slave) nodes, and the final result is sent to the master node.
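
The MapReduce flow described above can be sketched in plain Python. This is only a single-process illustration of the map, shuffle, and reduce phases, not code that runs on a Hadoop cluster; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are assumptions made up for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in order over an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input standing in for lines of a file in HDFS.
    sample = ["big data is big", "small data is small"]
    print(word_count(sample))
```

On a real cluster, each worker node would run the map step on the blocks it holds, and the framework would handle the shuffle between nodes; here everything happens in one process so that the three phases are easy to follow.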

More information on HDFS and YARN is available here.

MAPREDUCE PROGRAMMING USING PYTHON
MAPREDUCE PROGRAMMING USING JAVA

APACHE PIG

Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs.

PIG LATIN LANGUAGE

Pig Latin is a data flow language used by Apache Pig to analyze data in Hadoop. It is a textual language that abstracts programming from the Java MapReduce idiom into a higher-level notation.
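
To give a feel for what "data flow" means here, the following Python sketch mimics the shape of a typical Pig Latin script: filter a relation, group it by a key, then produce one summary row per group. It is not Pig Latin and does not use Apache Pig; the `employees` data, the column layout, and the function names are all hypothetical, and the comments only point to the Pig Latin operators (`FILTER`, `GROUP`, `FOREACH ... GENERATE`) that each step loosely corresponds to.

```python
from collections import defaultdict

# Hypothetical input: (name, department, salary) tuples standing in for a
# relation that a Pig script would LOAD from HDFS.
employees = [
    ("asha", "sales", 52000),
    ("ben", "sales", 38000),
    ("chen", "engineering", 71000),
    ("dina", "engineering", 64000),
    ("eli", "support", 35000),
]


def filter_rows(rows, min_salary):
    """Roughly FILTER ... BY: keep only rows that meet a condition."""
    return [row for row in rows if row[2] >= min_salary]


def group_by_department(rows):
    """Roughly GROUP ... BY: collect rows under each department key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[1]].append(row)
    return groups


def summarize(groups):
    """Roughly FOREACH ... GENERATE: one (count, average salary) per group."""
    return {
        dept: (len(rows), sum(r[2] for r in rows) / len(rows))
        for dept, rows in groups.items()
    }


def run_flow(rows, min_salary=40000):
    """Chain the steps so data flows from one stage into the next."""
    return summarize(group_by_department(filter_rows(rows, min_salary)))


if __name__ == "__main__":
    print(run_flow(employees))
```

Each function takes the output of the previous one, which is the same step-by-step style a Pig Latin script uses; Pig then translates such a flow into MapReduce jobs so that the Java code does not have to be written by hand.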
