This seminar provides a structured, end-to-end introduction to Transformers, guiding participants from foundational concepts to building a minimal working model. Through a combination of conceptual explanations and hands-on live coding, participants will trace the evolution from earlier sequence models to attention-based architectures, develop an intuitive and practical understanding of embeddings, attention mechanisms, and transformer components, and see how these elements come together in modern language models.
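As a flavor of the hands-on portion, the sketch below shows one way the core attention mechanism might be implemented. It is an illustrative example only, not the seminar's actual code: the function name `scaled_dot_product_attention`, the use of NumPy, and the toy shapes are assumptions made here for demonstration.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal sketch of attention: weight each value by query-key similarity."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # similarity of every query to every key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax over each row
    return weights @ V                                    # each output mixes the values

# Toy example (assumed shapes): 3 tokens, embedding dimension 4
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
print(scaled_dot_product_attention(Q, K, V).shape)        # (3, 4)
```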
Data Scientists and Software Engineers who want to gain a foundational, practical understanding of Transformers. Basic familiarity with tensors and programming is assumed, along with an interest in reading, implementing, and experimenting with modern deep learning architectures.