Upgrading Neo4j and Spark
Dependency upgrades:

* Neo4j 2.2.0 — Auth disabled
* Apache Spark 1.3.0

I’ve upgraded the dependencies for Spark and Neo4j to their current
stable releases. Various fixes to support the most recent version of
the Neo4j kernel. I’ve disabled Neo4j’s new basic auth feature by
default.
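
For reference, basic auth in Neo4j 2.2 is controlled by a single server setting. A minimal sketch of the change, assuming the image keeps the stock `conf/neo4j-server.properties` (the path is an assumption; adjust it to wherever the image stores its config):

    # conf/neo4j-server.properties
    # disable the basic auth introduced in Neo4j 2.2
    dbms.security.auth_enabled=false
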
kbastani committed Mar 26, 2015
1 parent d5c535b commit f71b090
Showing 12 changed files with 56 additions and 58 deletions.
34 changes: 17 additions & 17 deletions README.md
@@ -23,33 +23,33 @@ The Neo4j Mazerunner service in this image is an [unmanaged extension](http://neo
Installation requires 3 docker image deployments, each containing a separate linked component.

* *Hadoop HDFS* (sequenceiq/hadoop-docker:2.4.1)
* *Neo4j Graph Database* (kbastani/docker-neo4j:latest)
* *Apache Spark Service* (kbastani/neo4j-graph-analytics:latest)
* *Neo4j Graph Database* (kbastani/docker-neo4j:2.2.0)
* *Apache Spark Service* (kbastani/neo4j-graph-analytics:1.1.0)

Pull the following docker images:

docker pull sequenceiq/hadoop-docker:2.4.1
docker pull kbastani/docker-neo4j:latest
docker pull kbastani/neo4j-graph-analytics:latest
docker pull kbastani/docker-neo4j:2.2.0
docker pull kbastani/neo4j-graph-analytics:1.1.0

After each image has been downloaded to your Docker server, run the following commands in order to create the linked containers.

# Create HDFS
docker run -i -t --name hdfs sequenceiq/hadoop-docker:2.4.1 /etc/bootstrap.sh -bash

# Create Mazerunner Apache Spark Service
docker run -i -t --name mazerunner --link hdfs:hdfs kbastani/neo4j-graph-analytics
docker run -i -t --name mazerunner --link hdfs:hdfs kbastani/neo4j-graph-analytics:1.1.0

# Create Neo4j database with links to HDFS and Mazerunner
# Replace <user> and <neo4j-path>
# with the location to your existing Neo4j database store directory
docker run -d -P -v /Users/<user>/<neo4j-path>/data:/opt/data --name graphdb --link mazerunner:mazerunner --link hdfs:hdfs kbastani/docker-neo4j
docker run -d -P -v /Users/<user>/<neo4j-path>/data:/opt/data --name graphdb --link mazerunner:mazerunner --link hdfs:hdfs kbastani/docker-neo4j:2.2.0
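
After the containers are running, it helps to confirm they are up and to find the host port that Neo4j's port 7474 was published to (the `-P` flag maps it to a random host port). A minimal sketch using standard Docker commands:

    # list the running hdfs, mazerunner, and graphdb containers
    docker ps

    # show the host port mapped to port 7474 of the graphdb container
    docker port graphdb 7474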

### Use Existing Neo4j Database

To use an existing Neo4j database, make sure that the database store directory, typically `data/graph.db`, is available on your host OS. Read the [setup guide](https://github.com/kbastani/docker-neo4j#start-neo4j-container) for *kbastani/docker-neo4j* for additional details.

> Note: The kbastani/docker-neo4j image is running Neo4j 2.1.7. If you point it at an older database store, you may no longer be able to attach that store to a previous version of Neo4j. Make sure you back up your store files before proceeding.
> Note: The kbastani/docker-neo4j:2.2.0 image is running Neo4j 2.2.0. If you point it at an older database store, you may no longer be able to attach that store to a previous version of Neo4j. Make sure you back up your store files before proceeding.
### Use New Neo4j Database

@@ -89,9 +89,9 @@ The result of the analysis will set the property with `{analysis}` as the key on
To begin graph analysis jobs on a particular metric, make an HTTP GET request to the following Neo4j server endpoints:

### PageRank

http://172.17.0.21:7474/service/mazerunner/analysis/pagerank/FOLLOWS

* Gets all nodes connected by the `FOLLOWS` relationship and updates each node with the property key `pagerank`.

* The value of the `pagerank` property is a float data type, ex. `pagerank: 3.14159265359`.
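
A quick illustration of starting the job (the address below is the example container IP used throughout this README; substitute the host and port of your own Neo4j container):

    # replace 172.17.0.21:7474 with your own Neo4j host and port
    curl http://172.17.0.21:7474/service/mazerunner/analysis/pagerank/FOLLOWS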
@@ -101,15 +101,15 @@ To begin graph analysis jobs on a particular metric, HTTP GET request on the fol
### Closeness Centrality (New)

http://172.17.0.21:7474/service/mazerunner/analysis/closeness_centrality/FOLLOWS

* Gets all nodes connected by the `FOLLOWS` relationship and updates each node with the property key `closeness_centrality`.

* The value of the `closeness_centrality` property is a float data type, ex. `closeness_centrality: 0.1337`.

* A key node centrality measure in networks is closeness centrality (Freeman, 1978; Opsahl et al., 2010; Wasserman and Faust, 1994). It is defined as the inverse of farness, which in turn is the sum of distances to all other nodes.
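
* As a worked illustration of that definition (illustrative arithmetic only, not output from the service): in a three-node path graph `A-B-C`, node `B` is one hop from both `A` and `C`, so its farness is 1 + 1 = 2 and its closeness centrality is 1/2 = 0.5, while `A` has farness 1 + 2 = 3 and closeness centrality 1/3 ≈ 0.33.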

### Triangle Counting

http://172.17.0.21:7474/service/mazerunner/analysis/triangle_count/FOLLOWS

* Gets all nodes connected by the `FOLLOWS` relationship and updates each node with the property key `triangle_count`.
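
* As a hedged illustration of what the number means: if `A`, `B`, and `C` are mutually connected by `FOLLOWS` relationships and form exactly one triangle, each of the three nodes would end up with `triangle_count: 1`, since the count reflects how many triangles each node participates in.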
@@ -123,7 +123,7 @@ To begin graph analysis jobs on a particular metric, HTTP GET request on the fol
### Connected Components

http://172.17.0.21:7474/service/mazerunner/analysis/connected_components/FOLLOWS

* Gets all nodes connected by the `FOLLOWS` relationship and updates each node with the property key `connected_components`.

* The value of the `connected_components` property is an integer data type, ex. `connected_components: 181`.
@@ -132,10 +132,10 @@ To begin graph analysis jobs on a particular metric, HTTP GET request on the fol

* Connected components are used to find isolated clusters, that is, groups of nodes in which each node can reach every other node in the group through a *bidirectional* traversal.
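
Once a job has finished and the `connected_components` property has been written back, the resulting clusters can be inspected with a plain Cypher query. A sketch, assuming the bundled `neo4j-shell` is available (any Cypher client works):

    # group nodes by component id and count the size of each cluster
    neo4j-shell -c "MATCH (n) WHERE has(n.connected_components) RETURN n.connected_components AS component, count(n) AS size;"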

### Strongly Connected Components

http://172.17.0.21:7474/service/mazerunner/analysis/strongly_connected_components/FOLLOWS

* Gets all nodes connected by the `FOLLOWS` relationship and updates each node with the property key `strongly_connected_components`.

* The value of the `strongly_connected_components` property is an integer data type, ex. `strongly_connected_components: 26`.
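
* As an illustrative contrast with connected components: in a directed cycle `A→B→C→A`, all three nodes share one strongly connected component, whereas in `A→B→C` without the closing edge each node is its own strongly connected component, even though all three still form a single connected component.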
Binary file not shown.
18 changes: 10 additions & 8 deletions src/extension/pom.xml
@@ -6,10 +6,10 @@

<groupId>org.mazerunner</groupId>
<artifactId>extension</artifactId>
<version>1.0</version>
<version>1.1.0-RELEASE</version>

<properties>
<neo4j.version>2.1.7</neo4j.version>
<neo4j.version>2.2.0</neo4j.version>
<joda.version>2.3</joda.version>
<guava.version>17.0</guava.version>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
@@ -30,14 +30,16 @@
<dependency>
<groupId>org.neo4j</groupId>
<artifactId>neo4j-kernel</artifactId>
<version>2.2.0</version>
<type>test-jar</type>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.neo4j</groupId>
<artifactId>neo4j-io</artifactId>
<version>${neo4j.version}</version>
<exclusions>
<exclusion>
<groupId>ch.qos.logback</groupId>
<artifactId>logback-classic</artifactId>
</exclusion>
</exclusions>
<type>test-jar</type>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.neo4j</groupId>
@@ -85,7 +85,7 @@ public void initializeTest()
{
hadoopSitePath = "/etc/hadoop/core-site.xml";
hadoopHdfsPath = "/etc/hadoop/hdfs-site.xml";
hadoopHdfsUri = "hdfs://0.0.0.0:8020";
hadoopHdfsUri = "hdfs://0.0.0.0:9000";
mazerunnerRelationshipType = "CONNECTED_TO";
rabbitmqNodename = "localhost";
}
@@ -5,7 +5,6 @@
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.kernel.GraphDatabaseAPI;

import java.io.IOException;
import java.util.Spliterator;
@@ -87,7 +86,7 @@ protected void compute() {
*/
protected void computeDirectly() throws IOException {

Transaction tx = ((GraphDatabaseAPI)db).tx().unforced().begin();
Transaction tx = db.beginTx();

Node partitionNode = null;

5 changes: 2 additions & 3 deletions src/extension/src/main/java/translation/Writer.java
@@ -12,7 +12,6 @@
import org.neo4j.graphdb.*;
import org.neo4j.graphdb.traversal.Evaluators;
import org.neo4j.helpers.collection.IteratorUtil;
import org.neo4j.kernel.GraphDatabaseAPI;
import org.neo4j.tooling.GlobalGraphOperations;

import java.io.BufferedReader;
@@ -197,9 +196,9 @@ public static Path exportSubgraphToHDFSParallel(GraphDatabaseService db) throws
}

public static void writeBlockForNode(Node n, GraphDatabaseService db, BufferedWriter bufferedWriter, int reportBlockSize, String relationshipType) throws IOException {
Transaction tx = ((GraphDatabaseAPI)db).tx().unforced().begin();
Transaction tx = db.beginTx();
Iterator<Relationship> rels = n.getRelationships(withName(relationshipType), Direction.OUTGOING).iterator();
// Stream<Relationship> relStream = StreamSupport.stream(Spliterators.spliteratorUnknownSize(rels.iterator(), Spliterator.NONNULL), true);

while(rels.hasNext()) {
try {
Relationship rel = rels.next();
5 changes: 2 additions & 3 deletions src/extension/src/test/java/translation/WriterTest.java
@@ -16,7 +16,6 @@
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.kernel.GraphDatabaseAPI;
import org.neo4j.test.TestGraphDatabaseFactory;

import java.io.*;
@@ -109,7 +108,7 @@ public void testParallelUpdate() throws Exception {

GraphDatabaseService db = setUpDb();

Transaction tx = ((GraphDatabaseAPI)db).tx().unforced().begin();
Transaction tx = db.beginTx();


// Use test configurations
@@ -173,7 +172,7 @@ public static void writeListFile(String path, String nodeList) throws IOExceptio

private static GraphDatabaseService setUpDb()
{
return new TestGraphDatabaseFactory().newImpermanentDatabaseBuilder().newGraphDatabase();
return new TestGraphDatabaseFactory().newImpermanentDatabase();
}

public void testSendProcessorMessage() throws Exception {
14 changes: 3 additions & 11 deletions src/spark/pom.xml
@@ -4,17 +4,9 @@
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>

<repositories>
<repository>
<id>orgapachespark-1043</id>
<name>Apache Spark 1.1.1</name>
<url>https://repository.apache.org/content/repositories/orgapachespark-1043/</url>
</repository>
</repositories>

<groupId>org.mazerunner</groupId>
<artifactId>spark</artifactId>
<version>1.0</version>
<version>1.1.0-RELEASE</version>

<properties>
<jetty.version>7.6.9.v20130131</jetty.version>
@@ -83,7 +75,7 @@
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.1.0</version>
<version>1.3.0</version>
<exclusions>
<exclusion>
<groupId>ch.qos.logback</groupId>
@@ -94,7 +86,7 @@
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-graphx_2.10</artifactId>
<version>1.1.0</version>
<version>1.3.0</version>
<exclusions>
<exclusion>
<groupId>ch.qos.logback</groupId>
@@ -110,9 +110,9 @@ public void initializeTest()
{
hadoopSitePath = "/Users/kennybastani/hadoop-1.0.4/conf/core-site.xml";
hadoopHdfsPath = "/Users/kennybastani/hadoop-1.0.4/conf/hdfs-site.xml";
hadoopHdfsUri = "hdfs://ip-172-31-5-251.us-west-1.compute.internal:9000";
hadoopHdfsUri = "hdfs://0.0.0.0:9000";
mazerunnerRelationshipType = "CONNECTED_TO";
rabbitmqNodename = "ec2-54-183-26-46.us-west-1.compute.amazonaws.com";
rabbitmqNodename = "localhost";
sparkHost = "local";
appName = "mazerunner";
executorMemory = "13g";
@@ -46,7 +46,7 @@ public class Worker {
private String sparkMaster = "local";

@Option(name="--hadoop.hdfs",usage="The HDFS URL (e.g. hdfs://0.0.0.0:9000).", metaVar = "<url>")
private String hadoopHdfs = "hdfs://10.0.0.4:8020";
private String hadoopHdfs = "hdfs://0.0.0.0:9000";

@Option(name="--spark.driver.host",usage="The host name of the Spark driver (eg. ec2-54-67-91-4.us-west-1.compute.amazonaws.com)", metaVar = "<url>")
private String driverHost = "mazerunner";
@@ -2,6 +2,8 @@

import junit.framework.TestCase;
import org.apache.spark.api.java.JavaSparkContext;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import org.mazerunner.core.algorithms;
import org.mazerunner.core.config.ConfigurationLoader;
@@ -14,6 +16,18 @@
import java.util.List;

public class GraphProcessorTest extends TestCase {
JavaSparkContext javaSparkContext;

@Before
public void setUp() {
ConfigurationLoader.testPropertyAccess = true;
javaSparkContext = GraphProcessor.initializeSparkContext();
}

@After
public void tearDown() {
GraphProcessor.javaSparkContext.close();
}

@Test
public void testProcessEdgeList() throws Exception {
@@ -30,7 +44,7 @@ public void testProcessEdgeList() throws Exception {
"3 0"
)).iterator());

GraphProcessor.processEdgeList(new ProcessorMessage(path, GraphProcessor.TRIANGLE_COUNT, ProcessorMode.Partitioned));
GraphProcessor.processEdgeList(new ProcessorMessage(path, GraphProcessor.TRIANGLE_COUNT, ProcessorMode.Unpartitioned));
}

@Test
@@ -58,8 +72,8 @@ public void testShortestPaths() throws Exception {

// Test writing the PageRank result to HDFS path
FileUtil.writeListFile(path, nodeList.iterator());
JavaSparkContext javaSparkContext = GraphProcessor.initializeSparkContext();
Iterable<String> results = algorithms.closenessCentrality(javaSparkContext.sc(), path);

Iterable<String> results = algorithms.closenessCentrality(GraphProcessor.javaSparkContext.sc(), path);
results.iterator().forEachRemaining(System.out::print);
}
}
7 changes: 0 additions & 7 deletions src/spark/src/test/scala/hdfs/FileUtilTest.java

This file was deleted.
