SparkSharp

C# Livy client to submit Spark jobs to HDInsight and other Spark clusters

It also includes a snippet to run Spark SQL on Cosmos DB and return the results.

Usage examples:

Simple

using (var client = new LivyClient("http://url-to-livy", "username", "password"))
using (var session = await client.CreateSessionAsync(SimpleExampleSessionConfiguration.GetConfiguration()))
{
    var sum = await session.ExecuteStatementAsync<int>("val res = 1 + 1\nprintln(res)");
    
    // Prints 2
    Console.WriteLine(sum);
}
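
For context, the sketch below shows roughly what submitting a statement looks like at the Livy REST level: a `POST` to `/sessions/{sessionId}/statements` with a JSON body carrying the code to run. The `LivyRequests` class and its method names are hypothetical helpers written for illustration; they are not part of SparkSharp, and the library may build its requests differently.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper (not part of SparkSharp): builds the pieces of a
// Livy "run statement" request for an existing interactive session.
public static class LivyRequests
{
    // Relative path of the Livy endpoint that accepts new statements.
    public static string StatementsPath(int sessionId) =>
        $"/sessions/{sessionId}/statements";

    // JSON body Livy expects: an object with a "code" field holding the
    // source to execute in the session (Scala in the example above).
    public static string StatementBody(string code) =>
        JsonSerializer.Serialize(new { code });
}
```

Calling `LivyRequests.StatementBody("val res = 1 + 1\nprintln(res)")` yields the payload for the statement used in the simple example; Livy then reports the statement's output when the statement is polled for completion.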

Cosmos DB Spark SQL

var cosmosSettings = new CosmosCollectionSettings
{
    Name = "CosmosName",
    Key = "CosmosKey",
    Database = "CosmosDatabase",
    Collection = "CosmosCollection",
    PreferredRegions = "CosmosPreferredRegions"
};

using (var client = new HdInsightClient("clusterName", "admin", "password"))
using (var cosmos = new CosmosDbLivySession(client, cosmosSettings, CosmosExampleSessionConfiguration.GetConfiguration()))
{
    // Group by on Cosmos, yeah!
    const string sql = "SELECT id, SUM(json.total) AS total FROM cosmos GROUP BY id";

    var results = await cosmos.QuerySparkSqlAsync<Result>(sql);
                
    // Prints every record returned by the query, mapped to Result
    results.ToList().ForEach(t => Console.WriteLine($"{t.ContactIdentifier}:{t.Count}"));
}

The Cosmos DB connector JARs for Spark are available here (the wiki includes a guide on setting the connector up in HDInsight): https://github.com/Azure/azure-cosmosdb-spark/tree/master/releases

If an exception leaves a dangling session behind, kill it from the YARN UI: https://<yourHdInsightClusterName>.azurehdinsight.net/yarnui/hn/cluster/apps/RUNNING
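
As an alternative to the YARN UI, a leftover session can also be removed through Livy's own REST API, which exposes `DELETE /sessions/{sessionId}`. The following is a minimal sketch under stated assumptions: `LivySessionCleanup` is a hypothetical helper that is not part of SparkSharp, the Livy base URL and session id are supplied by the caller, and the cluster is assumed to accept HTTP Basic authentication. The `X-Requested-By` header is included because Livy rejects modifying requests without it when CSRF protection is enabled.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;

// Hypothetical helper (not part of SparkSharp): builds the HTTP request
// that asks Livy to delete a session by id.
public static class LivySessionCleanup
{
    public static HttpRequestMessage BuildDeleteSessionRequest(
        string livyBaseUrl, int sessionId, string username, string password)
    {
        // Join by hand so a base URL with a path segment is preserved.
        var url = $"{livyBaseUrl.TrimEnd('/')}/sessions/{sessionId}";
        var request = new HttpRequestMessage(HttpMethod.Delete, url);

        // Assumption: the Livy endpoint uses HTTP Basic authentication.
        var token = Convert.ToBase64String(
            Encoding.UTF8.GetBytes($"{username}:{password}"));
        request.Headers.Authorization =
            new AuthenticationHeaderValue("Basic", token);

        // Needed when Livy's CSRF protection is turned on.
        request.Headers.Add("X-Requested-By", username);
        return request;
    }
}
```

The request can then be sent with an `HttpClient`, for example `await httpClient.SendAsync(request)`. The session id has to be known in advance, for instance from the Livy session list or from application logs.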
