Modern Data Lake Workshop

Build and Deploy a Production-Ready Data Lake Architecture

Main Speaker

Learning Tracks

Course ID

42929

Date

29-06-2026

Time

Daily seminar
9:00-16:30

Location

John Bryce ECO Tower, Homa Umigdal 29, Tel Aviv

Overview

Data lakes power some of the world’s most data-driven organizations. In this hands-on workshop, you won’t just learn the theory – you’ll build a fully functional data lake using the same open-source tools used in production: MinIO, Trino, Hive Metastore, and SQL.  

By the end of the day, you’ll walk away with:  
  • A working data lake you built yourself
  • The ability to design and deploy a data lake architecture end to end
  • Hands-on experience with distributed SQL, columnar storage and ACID-compliant table formats
  • Practical knowledge of security, cost optimization, and production best practices

Who Should Attend

  • Data Engineers
  • Developers
  • DBAs

Prerequisites

No prior experience with data lakes is required. Participants should be comfortable working with SQL and using the command line.

Course Contents

Module 1 – Foundations of Data Lakes

Understand the “why” before the “how.”  
  • What is a data lake — and what isn’t
  • Data lakes vs. databases vs. data warehouses
  • Real-world use cases and adoption patterns
  • Architecture deep dive: storage, compute, and metadata layers
  • Why Apache Parquet is the lingua franca of analytics

Module 2 – Environment Setup

Spin up your lab in minutes.  
  • Docker essentials: containers, images, and orchestration
  • Launch the full workshop stack with a single command
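The “single command” above is typically `docker compose up -d` against a Compose file that defines the whole stack. The fragment below is only an illustrative sketch: the image tags, ports, and credentials are assumptions, not the workshop’s actual configuration.

```yaml
# docker-compose.yml (illustrative sketch; images, ports, and credentials are assumed)
services:
  minio:
    image: minio/minio
    command: server /data --console-address ":9001"
    environment:
      MINIO_ROOT_USER: admin
      MINIO_ROOT_PASSWORD: password
    ports:
      - "9000:9000"   # S3-compatible API
      - "9001:9001"   # web console
  hive-metastore:
    image: apache/hive:4.0.0
    environment:
      SERVICE_NAME: metastore
    ports:
      - "9083:9083"   # Thrift metastore endpoint
  trino:
    image: trinodb/trino
    ports:
      - "8080:8080"   # Trino coordinator and web UI
```

With a file like this in place, `docker compose up -d` brings the stack online and `docker compose ps` shows the running services.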

Module 3 – Object Storage

Build the foundation layer.  
  • Deploy MinIO as an S3-compatible object store
  • Organize, load, and browse datasets
  • Understand buckets, prefixes, and access patterns

Module 4 – Query Engine & Metastore

Make your data queryable.  
  • Deploy Hive Metastore for centralized schema management
  • Deploy Trino as a high-performance distributed SQL engine
  • Create external tables and run your first queries
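To give a flavor of what “create external tables” looks like in Trino’s Hive connector, here is a hedged sketch. The catalog name (`hive`), bucket (`datalake`), and column layout are illustrative assumptions; note that CSV-backed Hive tables in Trino expose all columns as VARCHAR.

```sql
-- Illustrative only: catalog, schema, bucket, and columns are assumed names.
CREATE SCHEMA IF NOT EXISTS hive.raw
WITH (location = 's3a://datalake/raw/');

-- CSV tables in the Hive connector are read as VARCHAR columns.
CREATE TABLE hive.raw.events (
    event_id   VARCHAR,
    user_id    VARCHAR,
    event_time VARCHAR,
    payload    VARCHAR
)
WITH (
    external_location = 's3a://datalake/raw/events/',
    format = 'CSV'
);

-- Your first query against the lake:
SELECT count(*) FROM hive.raw.events;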

Module 5 – Data Transformation

Turn raw data into analytics-ready assets.  
  • Transform raw CSV into optimized Parquet (Tier 1 → Tier 2)
  • Build curated analytical datasets (Tier 2 → Tier 3)
  • Partitioning strategies and performance tuning
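The Tier 1 → Tier 2 step is often a single CREATE TABLE AS statement in Trino. The sketch below assumes a raw CSV table named `hive.raw.events` (with VARCHAR columns) is already registered; all names are hypothetical placeholders.

```sql
-- Illustrative sketch: schema, table, and bucket names are assumptions.
CREATE SCHEMA IF NOT EXISTS hive.curated
WITH (location = 's3a://datalake/curated/');

-- Rewrite raw CSV as partitioned Parquet (Tier 1 -> Tier 2).
CREATE TABLE hive.curated.events
WITH (
    format = 'PARQUET',
    partitioned_by = ARRAY['event_date']
) AS
SELECT
    event_id,
    user_id,
    CAST(event_time AS timestamp) AS event_time,
    CAST(CAST(event_time AS timestamp) AS date) AS event_date  -- partition column must come last
FROM hive.raw.events;
```

Partitioning by date means queries that filter on `event_date` only scan the matching Parquet files, which is the core of the performance tuning covered in this module.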

Module 6 – Analytics & Visualization

See your data come to life.  
  • Connect Apache Zeppelin (or your preferred BI tool) to Trino
  • Run interactive queries and build visualizations
  • Explore real-world analytics scenarios on your own data
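Zeppelin reaches Trino through its generic JDBC interpreter. The settings below are a sketch with placeholder host and user; adjust them to wherever your Trino coordinator is running.

```
default.driver   io.trino.jdbc.TrinoDriver
default.url      jdbc:trino://localhost:8080/hive
default.user     workshop
```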

Module 7 – Apache Iceberg & Advanced Topics

Go beyond the basics.  
  • Apache Iceberg: ACID transactions on a data lake
  • Time-travel queries and snapshot isolation
  • Schema evolution, updates, and deletes at scale
  • Security, governance, and cost management
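In Trino, Iceberg’s ACID updates and time travel read as ordinary SQL. A minimal sketch, assuming an `iceberg` catalog; the table, values, and timestamp are purely illustrative.

```sql
-- Illustrative: catalog and table names are assumptions.
CREATE TABLE iceberg.curated.orders (
    order_id BIGINT,
    status   VARCHAR,
    amount   DOUBLE
);

INSERT INTO iceberg.curated.orders VALUES (1, 'open', 99.90);

-- ACID row-level change: the file rewrite is handled by Iceberg metadata.
UPDATE iceberg.curated.orders SET status = 'shipped' WHERE order_id = 1;

-- Time travel: query the table as of an earlier snapshot.
SELECT * FROM iceberg.curated.orders
FOR TIMESTAMP AS OF TIMESTAMP '2026-06-29 12:00:00 UTC';
```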

Module 8 – Wrap-Up

Take it with you.  
  • Key architectural patterns and decision frameworks
  • From workshop to production: deployment strategies
  • Curated resources for your continued learning
 

Ready to build your first data lake? Bring your laptop, your curiosity, and optionally your own dataset — we’ll handle the rest.  
