A collection of tutorials on Hadoop, MapReduce, Spark, and Docker
Explore tutorials and interactive demonstrations on Big Data technologies such as Hadoop and Spark, presented mainly as Jupyter notebooks. Most notebooks are self-contained and include instructions for installing all required services. They can be run on Google Colab or in an Ubuntu virtual machine or container.
Most executable Jupyter notebooks are tested on an Ubuntu virtual machine through an automated GitHub workflow. The log file for successful executions is named action_log.txt (see also: Google Colab vs. GitHub Ubuntu Runner).
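Because the same notebook may run on Google Colab or on a GitHub Ubuntu runner, it can help to check at the top of a notebook which environment it is in. The following is a minimal sketch, not code taken from the notebooks: the function name `detect_runtime` and its return labels are hypothetical. It assumes that Colab kernels have the `google.colab` package loaded and relies on the `GITHUB_ACTIONS` environment variable that GitHub Actions sets on its runners.

```python
import os
import sys


def detect_runtime(env=None, modules=None):
    """Return 'colab', 'github-actions', or 'other' for the current runtime.

    `env` and `modules` default to the real process environment and the
    loaded-module table; they are parameters only so the check can be tested.
    """
    env = os.environ if env is None else env
    modules = sys.modules if modules is None else modules

    # Assumption: a Colab kernel has the google.colab package already loaded.
    if "google.colab" in modules:
        return "colab"

    # GitHub Actions sets GITHUB_ACTIONS=true on its runners.
    if env.get("GITHUB_ACTIONS") == "true":
        return "github-actions"

    return "other"


if __name__ == "__main__":
    print(detect_runtime())
```

A notebook could branch on the result, for example to skip steps that only make sense in one of the two environments.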
Current status:
The GitHub workflow is a starting point for Continuous Integration (CI), as the practice is known in DevOps and Platform Engineering circles.
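To illustrate the idea, here is a rough sketch of what a CI step for notebooks could look like. It is not the repository's actual workflow: the function names `run_notebook` and `run_all` are hypothetical, and the one-path-per-line log format is an assumption. The sketch finds notebooks, executes each with `jupyter nbconvert --execute`, and records the ones that succeed in a log file.

```python
import subprocess
from pathlib import Path


def run_notebook(path):
    """Execute one notebook with nbconvert; return True if it ran cleanly."""
    cmd = ["jupyter", "nbconvert", "--to", "notebook", "--execute",
           "--stdout", str(path)]
    # The executed notebook goes to stdout, which is discarded here;
    # only the exit code matters for this check.
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL)
    return result.returncode == 0


def run_all(root=".", log_name="action_log.txt", runner=run_notebook):
    """Run every notebook under `root` and log the successful ones.

    Assumed log format: one relative notebook path per line.
    """
    root = Path(root)
    passed = []
    for nb in sorted(root.rglob("*.ipynb")):
        # Skip Jupyter's autosave copies.
        if ".ipynb_checkpoints" in nb.parts:
            continue
        if runner(nb):
            passed.append(nb.relative_to(root).as_posix())
    (root / log_name).write_text("".join(f"{p}\n" for p in passed))
    return passed
```

Calling `run_all(".")` from the repository root would require Jupyter and nbconvert to be installed; the `runner` parameter exists so the loop can be exercised without actually executing any notebook.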