Introductory Spark workshop - IPYNB notebook and data
Introductory workshops for beginners on Apache Spark with Python (PySpark) and SQL (Spark SQL). The repository includes the IPYNB notebooks and data.
Note: file paths in the notebooks will need to be updated to match your environment.
Covers some core concepts of using Spark for data analysis, including:
Demonstrates the concept of “Tidy Data” using example code in Apache Spark, and tidies five common types of untidy data: