Data Streaming with Spark
Apache Spark™ is a fast and general engine for large-scale data processing, and arguably the first open source software that makes distributed programming truly accessible to data scientists.
Using Apache Spark™ you can write applications quickly in Java, Scala, Python, and R. Spark Streaming makes it easy to build scalable, fault-tolerant streaming applications.
Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data streams.
This seminar covers the fundamentals of Apache Spark and guides you through everything you need to know about Spark data streaming.
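At its core, Spark Streaming divides a live data stream into small micro-batches and processes each one with Spark's ordinary batch engine. The idea can be sketched in plain Python (a toy illustration only, with no Spark dependency; the function name and data are invented for the example):

```python
from collections import Counter

def word_count_batches(stream, batch_size):
    """Micro-batch word count: collect batch_size lines,
    then process each batch like an ordinary batch job."""
    totals = Counter()
    batch, per_batch = [], []
    for line in stream:
        batch.append(line)
        if len(batch) == batch_size:
            # each micro-batch is processed with normal batch operations
            batch_counts = Counter(w for l in batch for w in l.split())
            totals.update(batch_counts)
            per_batch.append(dict(batch_counts))
            batch = []
    return per_batch, dict(totals)

lines = ["spark streaming", "spark is fast", "streaming data", "data at scale"]
per_batch, totals = word_count_batches(lines, batch_size=2)
```

In real Spark Streaming the micro-batches are distributed RDDs and the batch interval is set on the StreamingContext, but the processing model is the same.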
Who Should Attend
- Data scientists who want to cache and query large amounts of in-memory data
- Developers who want to build execution pipelines and do general graph processing
- Any Spark user who wants to extend their knowledge and learn about data streaming
Prerequisites
- Familiarity with a modern programming language such as Java, Python, or Scala
Agenda
- Introduction to Spark
- The idea behind streaming
- Pipeline coding for streams
- Basic concepts
- Spark Language integrated API
- Recovery and fault tolerance
- Integration with interactive queries
- Deployment options (standalone/ZooKeeper & HDFS)
- Performance tuning
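Among the basic concepts above, windowed operations are a recurring theme: aggregating over the last N micro-batches, sliding forward as new batches arrive (the idea behind Spark Streaming's reduceByKeyAndWindow). A toy sketch in plain Python, with illustrative names and data of our own choosing:

```python
from collections import Counter, deque

def windowed_counts(batches, window_length, slide_interval):
    """Count words over a sliding window of micro-batches.
    window_length: how many batches the window spans.
    slide_interval: emit a result every this many batches."""
    window = deque(maxlen=window_length)  # oldest batch falls off automatically
    results = []
    for i, batch in enumerate(batches, start=1):
        window.append(batch)
        if i % slide_interval == 0:
            # aggregate over every batch currently inside the window
            counts = Counter()
            for b in window:
                counts.update(w for line in b for w in line.split())
            results.append(dict(counts))
    return results

batches = [["a b"], ["b c"], ["c d"], ["d e"]]
results = windowed_counts(batches, window_length=2, slide_interval=1)
```

In Spark itself both the window length and the slide interval are expressed as multiples of the batch interval, and the aggregation runs distributed across the cluster.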