Project author: Annielytix

Project description:
A Beginner's Guide to Azure Databricks
Primary language:
Project URL: git://github.com/Annielytix/Ready2019_AA_AI_200.git
Created: 2018-08-24T19:40:36Z
Project community: https://github.com/Annielytix/Ready2019_AA_AI_200

License: MIT License

Updated by Laura Edell, Sr. Data Scientist | Microsoft MSUS CTO CSU Organization

Date: 2/11/2019

A Beginner’s Guide to Azure Databricks

Use the labs in this repo to get started using Spark in Azure Databricks.

0a. Start by following the Setup Guide to prepare your Azure environment.

0b. Download the lab files used in the lab exercises from this repo, or fork this repository to your own account.

After you have completed steps 0a and 0b, work through the following labs in order:

  1. Lab 1 - Getting Started with Spark. In this lab, you’ll learn how to provision a Spark cluster in an Azure Databricks workspace and then interact with data using Python or Scala.

  2. Lab 2 - Running a Spark Job. In this lab, you’ll learn how to configure a Spark job for unattended execution so that you can schedule your batch processing workloads.

  3. Lab 3 - Using Structured Streaming. In this lab, you’ll learn how to use Spark to process streams of real-time data, in line with common IoT (sensor) scenarios.

  4. Lab 4 - Introduction to Machine Learning. In this lab, you’ll train and then evaluate a classification model.
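
To give a rough idea of what scheduling a job for unattended execution (Lab 2) can look like outside the workspace UI, here is a minimal sketch that builds a request body in the shape used by the Databricks Jobs API 2.0 `jobs/create` endpoint. It is not taken from the labs: the helper name `build_job_payload`, the job name, the notebook path, and the cluster settings are all hypothetical placeholders, and the sketch only builds and prints the JSON without sending it anywhere.

```python
import json


def build_job_payload(name, notebook_path, cron, timezone="UTC", workers=2):
    """Return a dict shaped like a Jobs API 2.0 'jobs/create' request body.

    Hypothetical helper for illustration; it is not part of the labs.
    """
    return {
        "name": name,
        "new_cluster": {
            # Placeholder cluster settings (assumptions); choose values
            # that are actually available in your own workspace.
            "spark_version": "5.2.x-scala2.11",
            "node_type_id": "Standard_DS3_v2",
            "num_workers": workers,
        },
        # The notebook that the job runs without user interaction.
        "notebook_task": {"notebook_path": notebook_path},
        # The schedule uses Quartz cron syntax:
        # seconds, minutes, hours, day-of-month, month, day-of-week.
        "schedule": {
            "quartz_cron_expression": cron,
            "timezone_id": timezone,
        },
    }


payload = build_job_payload(
    name="nightly-batch",                              # hypothetical job name
    notebook_path="/Users/someone@example.com/Lab2",   # hypothetical path
    cron="0 0 2 * * ?",                                # every day at 02:00
)
print(json.dumps(payload, indent=2))
```

The lab itself walks through the equivalent settings in the Azure Databricks workspace, so treat this only as a picture of which pieces a scheduled job needs: a cluster definition, a task to run, and a schedule.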

The Advanced Databricks workshop covers MMLSpark and several types of supervised and unsupervised machine learning use cases (https://aka.ms/Ready-DA-AAAI-TS319).