Data lakes power some of the world’s most data-driven organizations.
In this hands-on workshop, you won’t just learn the theory – you’ll build a fully functional data lake with the same open-source tools used in production: MinIO, Trino, and the Hive Metastore, all driven through SQL.
By the end of the day, you’ll walk away with:
A working data lake you built yourself
The ability to design and deploy a data lake architecture end to end
Hands-on experience with distributed SQL, columnar storage and ACID-compliant table formats
Practical knowledge of security, cost optimization, and production best practices
Who Should Attend
Data Engineers
Developers
DBAs
Prerequisites
No prior experience with data lakes is required.
Participants should be comfortable working with SQL and using the command line.
Course Contents
Module 1 – Foundations of Data Lakes
Understand the “why” before the “how.”
What is a data lake — and what isn’t
Data lakes vs. databases vs. data warehouses
Real-world use cases and adoption patterns
Architecture deep dive: storage, compute, and metadata layers
Why Apache Parquet is the lingua franca of analytics
Module 2 – Environment Setup
Spin up your lab in minutes.
Docker essentials: containers, images, and orchestration
Launch the full workshop stack with a single command
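As a taste of what “a single command” means here, a minimal Docker Compose sketch of such a stack might look like the following. The image tags, ports, and service names are illustrative assumptions, not the workshop’s actual files:

```yaml
# Hypothetical sketch of a MinIO + Hive Metastore + Trino stack.
services:
  minio:
    image: minio/minio
    command: server /data --console-address ":9001"
    ports: ["9000:9000", "9001:9001"]
  hive-metastore:
    image: apache/hive:4.0.0
    environment:
      SERVICE_NAME: metastore   # run the image in metastore mode
    ports: ["9083:9083"]
  trino:
    image: trinodb/trino
    ports: ["8080:8080"]
```

With a file like this in place, `docker compose up -d` brings up the whole lab.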
Module 3 – Object Storage
Build the foundation layer.
Deploy MinIO as an S3-compatible object store
Organize, load, and browse datasets
Understand buckets, prefixes, and access patterns
Module 4 – Query Engine & Metastore
Make your data queryable.
Deploy Hive Metastore for centralized schema management
Deploy Trino as a high-performance distributed SQL engine
Create external tables and run your first queries
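An external table in Trino is, roughly, a sketch like the one below. It assumes a Trino catalog named `hive` backed by the Hive Metastore and CSV files already uploaded to a MinIO bucket called `lake`; all schema, table, and path names are hypothetical. (Note that Trino’s Hive connector requires every column of a CSV-format table to be `varchar`.)

```sql
-- Assumes catalog "hive" and bucket s3a://lake/ (illustrative names).
CREATE SCHEMA IF NOT EXISTS hive.raw
WITH (location = 's3a://lake/raw/');

-- External table: Trino reads the files in place, nothing is copied.
CREATE TABLE hive.raw.trips (
    trip_id  varchar,
    duration varchar,
    city     varchar
)
WITH (
    format = 'CSV',
    external_location = 's3a://lake/raw/trips/'
);

SELECT count(*) FROM hive.raw.trips;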
Module 5 – Data Transformation
Turn raw data into analytics-ready assets.
Transform raw CSV into optimized Parquet (Tier 1 → Tier 2)
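In Trino, a Tier 1 → Tier 2 transformation of this kind can be sketched as a single CREATE TABLE AS SELECT: read the raw CSV table, cast columns to proper types, and write Parquet. The schema, table, and path names below are illustrative assumptions:

```sql
-- Tier 1 (raw CSV) -> Tier 2 (analytics-ready Parquet) via CTAS.
CREATE TABLE hive.curated.trips
WITH (
    format = 'PARQUET',
    external_location = 's3a://lake/curated/trips/'
)
AS
SELECT
    trip_id,
    CAST(duration AS integer) AS duration_seconds,
    city
FROM hive.raw.trips;
```

Downstream queries then hit the compact, typed Parquet table instead of scanning raw CSV.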