Hadoop Ecosystem Fundamentals for DBAs and Data Architects
Apache Hadoop is a framework that enables distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale from single servers to thousands of machines, each offering local computation and storage. Rather than relying on hardware to deliver high availability, the framework itself is designed to detect and handle failures at the application layer, thus delivering a highly available service on top of a cluster of computers, each of which may be prone to failure.
In this seminar we’ll emphasize the need for Hadoop and Big Data. We will review the Hadoop architecture, learn about the different Hadoop node types and their roles in the cluster, provide a brief introduction to MapReduce, learn about Impala and Hive for writing and executing SQL queries on top of Hadoop, and look at the Hadoop Distributed File System (HDFS).
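To give a flavor of the MapReduce programming model covered in the seminar, here is a toy single-machine word count in Python. This is only an illustrative sketch of the map/shuffle/reduce idea, not Hadoop code; the names map_phase and reduce_phase are invented for this example and are not part of any Hadoop API.

```python
from collections import defaultdict

def map_phase(documents):
    # Map step: emit a (word, 1) pair for every word in every document.
    # In real Hadoop, mappers run in parallel across HDFS blocks.
    for doc in documents:
        for word in doc.lower().split():
            yield (word, 1)

def reduce_phase(pairs):
    # Shuffle + Reduce step: group pairs by key and sum the counts.
    # In real Hadoop, the framework groups by key before reducers run.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["Hadoop stores data in HDFS",
        "MapReduce processes data stored in HDFS"]
print(reduce_phase(map_phase(docs)))
```

Frameworks such as Hive and Impala build on the same idea: a SQL query like a GROUP BY count is translated into distributed work of this shape, so users can stay at the SQL level.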
We will also cover real life use cases that will allow us to better understand what this amazing framework can do.
This seminar is ideal for Oracle DBAs, Big Data and system architects, IT administrators, and database professionals looking to take their first steps in working with Hadoop, as well as developers interested in learning about programming with SQL on top of a Hadoop cluster.
Note that this seminar is not intended for experienced Java professionals, as it only touches briefly on Java and MapReduce.
- What is Big Data?
- Review the Hadoop Ecosystem and architecture
- Review the Cloudera Hadoop distribution's different tools and latest buzzwords
- Exploring MapReduce and HDFS
- Going over Hadoop’s different node types and their functions (including YARN)
- Hadoop big data tools – Hive, Spark, Impala (and more…)
- Oracle Big Data Appliance
- Oracle Big Data Cloud