Project author: groda

Project description:
A collection of tutorials on Hadoop, MapReduce, Spark, Docker
Primary language: Jupyter Notebook
Project URL: git://github.com/groda/big_data.git
Created: 2019-08-27T20:29:46Z
Project community: https://github.com/groda/big_data

License: MIT License



big_data

Big Data for beginners

Explore a variety of tutorials and interactive demonstrations focused on Big Data technologies like Hadoop, Spark, and more, primarily presented in the format of Jupyter notebooks. Most notebooks are self-contained, with instructions for installing all required services. They can be run on Google Colab or in a virtual Ubuntu machine/container.

Setting Up Hadoop: Single-Node Configuration

Running Apache Spark in Standalone Mode

MapReduce Tutorials

PySpark Tutorials

Miscellaneous Tutorials

Virtualization and Cloud Automation

Big Data Learning Pathways

About this repository

Notebooks Testing and CI

Most executable Jupyter notebooks are tested on an Ubuntu virtual machine through an automated GitHub workflow. Successful executions are recorded in the log file action_log.txt (see also: Google Colab vs. GitHub Ubuntu Runner).

Current status:

  • Run Notebooks on Ubuntu
  • Run One Notebook on Ubuntu

The GitHub workflow is a starting point for what is known as Continuous Integration (CI) in DevOps/Platform Engineering circles.
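The repository's actual workflow definition is not reproduced here, so the following is only a minimal sketch of what such a notebook-testing step might look like. It assumes that `jupyter nbconvert` is installed on the runner; the `executed` output folder, the function names, and the log line format are all hypothetical and chosen purely for illustration.

```python
import subprocess
from datetime import datetime, timezone


def build_command(notebook):
    """Return an nbconvert command line that executes one notebook headlessly."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--output-dir", "executed",  # hypothetical folder for executed copies
        str(notebook),
    ]


def run_notebooks(notebooks, log_path="action_log.txt", make_command=build_command):
    """Execute each notebook, append successes to the log, return the failures.

    The log line format (timestamp, "OK", notebook name) is an assumption,
    not the format used by the repository's real action_log.txt.
    """
    failures = []
    with open(log_path, "a", encoding="utf-8") as log:
        for nb in notebooks:
            # A non-zero exit code means at least one cell raised an error.
            result = subprocess.run(make_command(nb), capture_output=True, text=True)
            if result.returncode == 0:
                stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
                log.write(f"{stamp} OK {nb}\n")
            else:
                failures.append(nb)
    return failures
```

A CI job could call `run_notebooks(sorted(Path(".").glob("*.ipynb")))` (with `pathlib.Path` imported) and exit with a non-zero status when the returned list is not empty, which is what makes the workflow fail visibly. Passing the command builder as a parameter keeps the loop testable without a Jupyter installation.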