Data Streaming with Spark

Main Speaker:

Tracks:

Data

Seminar Categories:

Data
NoSQL

Course ID:

43686

Date:

26.6.2019

Time:

Daily seminar
9:00-16:30

Overview

Apache Spark™ is a fast and general engine for large-scale data processing, and arguably the first open source software that makes distributed programming truly accessible to data scientists.

Using Apache Spark™, you can write applications quickly in Java, Scala, Python, and R. Spark Streaming makes it easy to build scalable, fault-tolerant streaming applications.

Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data streams.

This seminar covers the fundamentals of Apache Spark and guides you through everything you need to know about Spark data streaming.
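
As a small taste of the material, the sketch below shows the classic Spark Streaming word count in Scala, counting words that arrive on a TCP socket in ten-second micro-batches. The host, port, batch interval, and local master are illustrative placeholders, not course requirements.

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  object StreamingWordCount {
    def main(args: Array[String]): Unit = {
      // Run locally with two threads; the batch interval defines the micro-batch size.
      val conf = new SparkConf().setMaster("local[2]").setAppName("StreamingWordCount")
      val ssc  = new StreamingContext(conf, Seconds(10))

      // Lines arriving on a TCP socket (hostname and port are placeholders).
      val lines = ssc.socketTextStream("localhost", 9999)

      // Count the words within each micro-batch.
      val counts = lines.flatMap(_.split(" "))
                        .map(word => (word, 1))
                        .reduceByKey(_ + _)

      counts.print()

      ssc.start()             // start receiving and processing data
      ssc.awaitTermination()  // block until the streaming job is stopped
    }
  }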

Who Should Attend

  • Data scientists who want to cache and query large amounts of in-memory data
  • Developers who want to create execution pipelines and general graph processing
  • Any Spark user who wants to extend their knowledge and learn about data streaming

Prerequisites

  • Familiarity with a modern programming language such as Java, Python, or Scala

Course Contents

  • Introduction to Spark
  • The idea behind streaming
  • Pipeline coding for streams
  • Basic concepts
  • Spark's language-integrated API
  • Recovery and fault tolerance (see the sketch after this list)
  • Integration with interactive queries
  • Deployment options (standalone / ZooKeeper & HDFS)
  • Performance tuning
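
The recovery and fault tolerance topic above is commonly demonstrated with driver checkpointing. The sketch below, again in Scala, writes DStream metadata to a checkpoint directory so that a restarted driver can rebuild its streaming context. The local socket source and the /tmp checkpoint path are illustrative only; in a real deployment the checkpoint directory would normally live on HDFS, in line with the deployment options listed above.

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  object CheckpointedStream {
    // Placeholder checkpoint location; in production this would typically be an HDFS path.
    val checkpointDir = "/tmp/spark-streaming-checkpoint"

    def createContext(): StreamingContext = {
      val conf = new SparkConf().setMaster("local[2]").setAppName("CheckpointedStream")
      val ssc  = new StreamingContext(conf, Seconds(10))
      ssc.checkpoint(checkpointDir)   // persist DStream lineage and metadata

      // Simple pipeline: count the records arriving in each micro-batch.
      ssc.socketTextStream("localhost", 9999).count().print()
      ssc
    }

    def main(args: Array[String]): Unit = {
      // On restart, rebuild the context from checkpoint data if present;
      // otherwise create a fresh one with createContext().
      val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
      ssc.start()
      ssc.awaitTermination()
    }
  }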

