This seminar provides a structured, end-to-end introduction to Transformers, guiding participants from foundational concepts to building a minimal working model. Through a combination of conceptual explanations and hands-on live coding, participants will trace the evolution from earlier sequence models to attention-based architectures, develop an intuitive and practical understanding of embeddings, attention mechanisms, and transformer components, and see how these elements come together in modern language models.
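As a flavor of the hands-on portion, the sketch below shows one way the core attention mechanism might be implemented. It is an illustrative example only, not the seminar's actual code: the function name `scaled_dot_product_attention`, the use of NumPy, and the toy shapes are assumptions made here for demonstration.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal sketch of attention: weight each value by query-key similarity."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # similarity of every query to every key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax over each row
    return weights @ V                                    # each output mixes the values

# Toy example (assumed shapes): 3 tokens, embedding dimension 4
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
print(scaled_dot_product_attention(Q, K, V).shape)        # (3, 4)
```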
Data Scientists and Software Engineers who want to gain a foundational, practical understanding of Transformers. Basic familiarity with tensors and programming is assumed, along with an interest in reading, implementing, and experimenting with modern deep learning architectures.