Decoding the Transformer – A Zero to One Guide for Data Scientists & Software Engineers


Course ID: 42930

Date: 29-06-2026

Time: Daily seminar, 9:00-16:30

Location: John Bryce ECO Tower, Homa Umigdal 29, Tel-Aviv

Overview

This seminar provides a structured, end-to-end introduction to Transformers, guiding participants from foundational concepts to building a minimal working model. Combining conceptual explanations with hands-on live coding, it traces the evolution from earlier sequence models to attention-based architectures, builds an intuitive, practical understanding of embeddings, attention mechanisms, and transformer components, and shows how these elements come together in modern language models.

Who Should Attend

Data Scientists and Software Engineers who want a foundational, practical understanding of Transformers. The seminar suits anyone with basic familiarity with tensors and programming who aims to read, implement, and experiment with modern deep learning architectures.

Prerequisites

Basic familiarity with tensors and programming.
Course Contents

  • Evolution of sequence models (N-grams, RNNs/LSTMs, CNNs) and their limitations
  • Embeddings and token representation
  • The bottleneck problem in sequence modeling
  • Attention mechanism: intuition, concepts, and components (Q, K, V)
  • Implementation and analysis of attention (including debugging and experiments)
  • Self-attention and contextual representation
  • Positional encoding and handling sequence order
  • Transformer architecture and its core components
  • Masking in autoregressive models (padding and causal masking)
  • Building a minimal GPT model end-to-end
  • Practical considerations and limitations of transformers
  • Core mental models of transformers and attention
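To give a taste of the attention and masking topics listed above, here is a minimal NumPy sketch of scaled dot-product attention with an optional causal mask. This is an illustrative example only, not seminar material; the function and variable names are our own.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, causal=False):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # (seq, seq) similarity scores
    if causal:
        # Causal mask: position i may only attend to positions <= i
        mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    # Row-wise softmax turns scores into attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Toy self-attention: 4 tokens, 8-dimensional embeddings, Q = K = V = x
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out, w = scaled_dot_product_attention(x, x, x, causal=True)
```

With the causal mask applied, each row of the weight matrix sums to 1 and its upper triangle is zero, so no token attends to a later position, which is exactly the property autoregressive models like GPT rely on.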
