Project author: UnoSD

Project description:
C# Livy client to submit Spark jobs to HDInsight and other Spark clusters
Primary language: C#
Project URL: git://github.com/UnoSD/SparkSharp.git
Created: 2017-07-07T21:45:40Z
Project community: https://github.com/UnoSD/SparkSharp

License: GNU General Public License v2.0

SparkSharp

C# Livy client to submit Spark jobs to HDInsight and other Spark clusters

It also includes a snippet to run Spark SQL on Cosmos DB and return the results.

Example usage:

Simple

    using (var client = new LivyClient("http://url-to-livy", "username", "password"))
    using (var session = await client.CreateSessionAsync(SimpleExampleSessionConfiguration.GetConfiguration()))
    {
        var sum = await session.ExecuteStatementAsync<int>("val res = 1 + 1\nprintln(res)");

        // Prints 2
        Console.WriteLine(sum);
    }
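For context, Livy itself is driven over REST: a session is created with `POST /sessions` and code is submitted with `POST /sessions/{sessionId}/statements`. The sketch below shows the JSON bodies such calls carry and how a finished statement's text output can be read back. It assumes .NET with `System.Text.Json`; `LivyPayloads` and its methods are hypothetical names used for illustration and are not part of SparkSharp, and whether SparkSharp builds its requests this way is an assumption.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper (not part of SparkSharp): builds and parses Livy REST payloads.
public static class LivyPayloads
{
    // Body for POST /sessions; kind "spark" asks Livy for a Scala interpreter.
    public static string CreateSession(string kind = "spark") =>
        JsonSerializer.Serialize(new { kind });

    // Body for POST /sessions/{sessionId}/statements.
    public static string CreateStatement(string code) =>
        JsonSerializer.Serialize(new { code });

    // Reads output.data["text/plain"] from a statement response.
    // Returns false while the statement is still running or if it failed.
    public static bool TryReadPlainTextOutput(string statementJson, out string text)
    {
        text = string.Empty;

        using var doc = JsonDocument.Parse(statementJson);
        var root = doc.RootElement;

        // Livy reports "available" once the statement has finished.
        if (root.GetProperty("state").GetString() != "available")
            return false;

        var output = root.GetProperty("output");
        if (output.GetProperty("status").GetString() != "ok")
            return false;

        // Failed statements carry ename/evalue instead of data, so probe defensively.
        if (!output.TryGetProperty("data", out var data) ||
            !data.TryGetProperty("text/plain", out var plain))
            return false;

        text = plain.GetString() ?? string.Empty;
        return true;
    }
}
```

A caller would poll `GET /sessions/{sessionId}/statements/{statementId}` and pass each response body to `TryReadPlainTextOutput` until it returns true; a typed call such as `ExecuteStatementAsync<int>` would then still need to convert that text to the requested type.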

Cosmos DB Spark SQL

    var cosmosSettings = new CosmosCollectionSettings
    {
        Name = "CosmosName",
        Key = "CosmosKey",
        Database = "CosmosDatabase",
        Collection = "CosmosCollection",
        PreferredRegions = "CosmosPreferredRegions"
    };

    using (var client = new HdInsightClient("clusterName", "admin", "password"))
    using (var cosmos = new CosmosDbLivySession(client, cosmosSettings, CosmosExampleSessionConfiguration.GetConfiguration()))
    {
        // Group by on Cosmos, yeah!
        const string sql = "SELECT id, SUM(json.total) AS total FROM cosmos GROUP BY id";

        var results = await cosmos.QuerySparkSqlAsync<Result>(sql);

        // Prints all the records resulting from the query and mapped to Result
        results.ToList().ForEach(t => Console.WriteLine($"{t.ContactIdentifier}:{t.Count}"));
    }

The Cosmos DB connector jars for Spark are available here (the wiki includes a guide on setting them up in HDInsight): https://github.com/Azure/azure-cosmosdb-spark/tree/master/releases

If an exception leaves a session dangling, kill it from here: https://\.azurehdinsight.net/yarnui/hn/cluster/apps/RUNNING