Data Lake Workshop

Building Your Foundation in Modern Data Management

Course ID: 52023

Date: 22-07-2025

Time: Daily seminar, 9:00-16:30

Location: Daniel Hotel, 60 Ramat Yam St., Herzliya

Overview

Data lakes are at the forefront of modern data management, and the shift toward them is already underway. In this workshop you will cover the fundamentals of data lakes and the reasons for their widespread adoption, with a focus on data lake architecture and how it compares to traditional databases and other big data solutions. In the hands-on portion you will build your own data lake from scratch, assembling its essential components: an object store, a metastore, and a query engine. Using the query engine, you will manipulate the data in your lake and gain insight into its structure and how data flows through the ecosystem. The workshop also covers advanced topics, including Apache Iceberg tables, which support CRUD operations, and key aspects of data lake management such as security and cost controls.

Who Should Attend

  1. Data Engineers: Focused on data ingestion, transformation, and management.
  2. Developers: Integrating applications with data lakes via APIs and SDKs.
  3. Database Administrators (DBAs): Looking to add a new technology stack or migrate existing databases to a data lake.

Prerequisites

Course Contents

Data Lakes
  • What is a data lake
  • Comparison between a data lake and a database
  • Why should you use data lakes?
  • Data lake architecture
  • Data structure and columnar format example – Apache Parquet (see the sketch after this list)
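To make the columnar idea concrete, here is a minimal Python sketch using the pyarrow library; the library choice, column names, and file name are illustrative assumptions, not necessarily the workshop's tooling:

    # Write a small table to Parquet and read back only selected columns.
    import pyarrow as pa
    import pyarrow.parquet as pq

    # Each column is stored contiguously, which is what makes column pruning
    # and compression so effective in a columnar format like Parquet.
    table = pa.table({
        "user_id": [1, 2, 3],
        "country": ["IL", "US", "DE"],
        "purchase_amount": [19.99, 5.00, 42.50],
    })
    pq.write_table(table, "purchases.parquet")

    # Read back only the columns a query actually needs.
    subset = pq.read_table("purchases.parquet",
                           columns=["country", "purchase_amount"])
    print(subset)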
Hands-on workshop
  • Introduction to Docker
    • Docker basic terms: Containers, Images and Tags
    • Basic commands used during the workshop
  • Object Store
    • Bring your own object store – MinIO
    • Load data and browse the object store
  • Query Engine and Metastore
    • Introduction
    • Bring your own metastore – Hive metastore
    • Bring your own query engine – Trino
    • Create a table and query it using the Trino CLI
  • Data transformation (see the end-to-end sketch after this list)
    • Use Trino to transform data from CSV to Parquet (Tier1 → Tier2)
    • Use Trino to create a dataset for a specific use case (Tier2 → Tier3)
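To give a flavor of this flow outside the live session, here is a minimal end-to-end Python sketch: upload a raw CSV to MinIO, expose it to Trino through the Hive metastore, and rewrite it as Parquet (Tier1 → Tier2). The endpoints, credentials, catalog, schema, and table names are illustrative assumptions; the workshop itself drives these steps through the MinIO console and the Trino CLI:

    from minio import Minio
    import trino

    # 1. Object store: upload the raw CSV into a "tier1" bucket.
    store = Minio("localhost:9000", access_key="minioadmin",
                  secret_key="minioadmin", secure=False)
    if not store.bucket_exists("tier1"):
        store.make_bucket("tier1")
    store.fput_object("tier1", "purchases/purchases.csv", "purchases.csv")

    # 2. Query engine: connect to Trino over a Hive metastore-backed catalog
    #    (assumes the hive catalog points at the MinIO S3 endpoint and that
    #    the tier1 and tier2 schemas already exist in the metastore).
    conn = trino.dbapi.connect(host="localhost", port=8080,
                               user="workshop", catalog="hive", schema="tier1")
    cur = conn.cursor()

    # 3. Tier1: expose the raw CSV as an external table (CSV columns are varchar).
    cur.execute("""
        CREATE TABLE IF NOT EXISTS purchases_csv (
            user_id varchar, country varchar, purchase_amount varchar
        )
        WITH (format = 'CSV', external_location = 's3a://tier1/purchases/')
    """)
    cur.fetchall()

    # 4. Tier1 -> Tier2: rewrite the data as typed, columnar Parquet.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS hive.tier2.purchases
        WITH (format = 'PARQUET')
        AS SELECT CAST(user_id AS bigint) AS user_id,
                  country,
                  CAST(purchase_amount AS double) AS purchase_amount
        FROM purchases_csv
    """)
    cur.fetchall()

    # 5. Query the new Parquet table.
    cur.execute("SELECT country, sum(purchase_amount) "
                "FROM hive.tier2.purchases GROUP BY country")
    print(cur.fetchall())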
Advanced topics
  • A table format that supports CRUD – Apache Iceberg (see the sketch after this list)
  • Data Lake management – Security and cost controls
  • Summary
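As a preview of the Iceberg portion, here is a minimal Python sketch of the row-level operations an Iceberg table format makes possible, again sent through the Trino Python client. It assumes a Trino catalog named iceberg with an existing tier3 schema; all names are illustrative:

    import trino

    conn = trino.dbapi.connect(host="localhost", port=8080, user="workshop",
                               catalog="iceberg", schema="tier3")
    cur = conn.cursor()

    # Create an Iceberg table; the connector stores data files as Parquet by default.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS customers (
            customer_id bigint,
            country varchar,
            lifetime_value double
        )
    """)
    cur.fetchall()

    # Row-level INSERT, UPDATE, and DELETE work because Iceberg tracks table
    # snapshots and metadata, unlike plain Hive tables over raw files.
    cur.execute("INSERT INTO customers VALUES (1, 'IL', 100.0), (2, 'US', 250.0)")
    cur.fetchall()
    cur.execute("UPDATE customers SET lifetime_value = 300.0 WHERE customer_id = 2")
    cur.fetchall()
    cur.execute("DELETE FROM customers WHERE customer_id = 1")
    cur.fetchall()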
