Hadoop is an open source project from Apache that has evolved rapidly into a major technology movement. It has become a widely used way to handle massive amounts of data, both structured data and complex, unstructured data. Its popularity is due in part to its ability to store, analyze and access large amounts of data quickly and cost-effectively across clusters of commodity hardware.
Apache Hadoop is not actually a single product but instead a collection of several components including the following:
MapReduce – A framework for writing applications that process large amounts of structured and unstructured data in parallel across large clusters of machines in a reliable and fault-tolerant manner.
Hadoop Distributed File System (HDFS) – A reliable and distributed Java-based file system that allows large volumes of data to be stored and rapidly accessed across large clusters of commodity servers.
Hive – Built on the MapReduce framework, Hive is a data warehouse that enables easy data summarization and ad-hoc queries via an SQL-like interface for large datasets stored in HDFS.
Pig – A platform for processing and analyzing large data sets. Pig consists of a high-level language (Pig Latin) for expressing data analysis programs, paired with the MapReduce framework for processing these programs.
HBase – A column-oriented NoSQL data storage system that provides random real-time read/write access to big data for user applications.
ZooKeeper – A highly available system for coordinating distributed processes. Distributed applications use ZooKeeper to store and mediate updates to important configuration information.
Ambari – An open source installation lifecycle management, administration and monitoring system for Apache Hadoop clusters.
HCatalog – A table and metadata management service that provides a centralized way for data processing systems to understand the structure and location of the data stored within Apache Hadoop.
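To make the MapReduce model in the list above more concrete, here is a minimal sketch of a word count in plain Python. It only imitates the map, shuffle and reduce steps on a single machine and does not use the Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are illustrative assumptions, not part of any Hadoop component.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key.

    On a real cluster the framework does this between map and reduce;
    here it is simulated with an in-memory dictionary.
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, counts):
    """Reduce step: combine all counts for one word into a single total."""
    return word, sum(counts)


def word_count(lines):
    """Run the three steps in sequence over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(word, counts)
                for word, counts in shuffle(pairs).items())
```

For example, word_count(["big data", "big clusters"]) returns {'big': 2, 'data': 1, 'clusters': 1}. On a Hadoop cluster the map and reduce steps would run in parallel on many machines, with the input lines read from HDFS rather than from a Python list.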
Learn From Experts!
Key Features:
* Practice on production level Cloud Servers
* Primary focus - hands-on sessions
* Real time Task Assignment - POC / our Big Data Analytics Platform
044 - 42645495
#67, 2nd Floor, Gandhi Nagar 1st Main Road,
Adyar, Chennai - 20
[Opp to Adyar Lifestyle Super Market]