The buzz about Big Data has finally reached large enterprises, and adoption of the tools has begun. In many enterprises, however, most of the IT staff is unfamiliar with the buzzwords and concepts of Big Data, and especially with Hadoop.
Apache Hadoop is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than relying on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, delivering a highly available service on top of a cluster of computers, each of which may be prone to failure.
In this seminar, we will introduce Hadoop and Big Data. We will review the Hadoop architecture, learn about the different Hadoop node types and their roles in the cluster, provide a brief introduction to MapReduce (illustrated by the word-count sketch below), learn about Pig and Hive for writing and executing SQL-like queries on top of Hadoop, and look at the Hadoop Distributed File System (HDFS).
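To give a flavor of the MapReduce programming model, here is a minimal word-count sketch in the Hadoop Streaming style. It is an illustration for this abstract rather than part of the seminar materials, and the file names mapper.py and reducer.py are just conventions: Hadoop Streaming pipes each input split through the mapper, and the sorted mapper output through the reducer, over standard input and output.

    # mapper.py -- emits a "word<TAB>1" pair for every word on stdin.
    # Hadoop Streaming feeds the input split to this script line by line.
    import sys

    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

    # reducer.py -- Hadoop Streaming sorts the mapper output by key,
    # so all counts for a given word arrive consecutively and can be
    # summed before moving on to the next word.
    import sys

    current_word, count = None, 0
    for line in sys.stdin:
        word, value = line.rstrip("\n").split("\t", 1)
        if word != current_word:
            if current_word is not None:
                print(f"{current_word}\t{count}")
            current_word, count = word, 0
        count += int(value)
    if current_word is not None:
        print(f"{current_word}\t{count}")

The same two scripts can be tested locally without a cluster by simulating the shuffle phase with a pipeline such as: cat input.txt | python3 mapper.py | sort | python3 reducer.py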
We will also touch on other Big Data technologies, but the main goal is to make sense of the terminology surrounding Hadoop.
This seminar is ideal for IT professionals looking to take their first steps with Hadoop, as well as for developers interested in learning about it.
Most of the code examples in this seminar are written in Python; however, the seminar only touches briefly on Python itself, so it is not intended for experienced Python developers looking to deepen their Python skills.