radiocosmology/padloper

HIRAX Layout DB

A repository containing a Python interface for accessing the HIRAX Layout DB, as well as the code for the web interface.

How to start everything up

Assuming that JanusGraph, Flask, and React are installed and configured (see sections below), we can start the JanusGraph server, followed by the Flask server, and then the React server for testing.

To start the JanusGraph server, run:

cd janusgraph-full-0.x.xx/
bin/janusgraph.sh start

Once the JanusGraph server is started, we start the Flask server (using port 4300, which is currently the default port that the React server communicates with):

cd flask-interface/
flask run --no-debugger -p 4300

Finally, we start the React server:

cd web-interface/
npm start
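Because each server depends on the previous one, it can help to confirm that a port is accepting connections before starting the next step. The following is a minimal sketch, not part of this repository: the helper names `port_is_open` and `report` are hypothetical, port 4300 is the Flask port used above, and 8182 is the default JanusGraph server port mentioned later in this README.

```python
import socket
from typing import Dict

# Ports taken from this README: 8182 (JanusGraph default) and 4300 (Flask).
SERVICES = {"JanusGraph": 8182, "Flask": 4300}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def report(host: str = "localhost") -> Dict[str, bool]:
    """Map each service name to whether its port currently accepts connections."""
    return {name: port_is_open(host, port) for name, port in SERVICES.items()}
```

Calling `report()` after each start command shows which servers are reachable. An open port only means something is listening there, not that the service has finished initializing.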

JanusGraph Installation Instructions

The following steps outline how to install JanusGraph along with the Apache Cassandra storage backend and the Elasticsearch indexing backend, as well as how to set up Gremlin-Python, which is used for querying the JanusGraph backend from a Python interface. The operating system used is Windows 10 running Windows Subsystem for Linux with Ubuntu 20.04.2 LTS; the following instructions should also work on native Ubuntu 20.04.2 LTS.

Installing Java

JanusGraph is built on top of Apache TinkerPop, which, in turn, is built on top of Java and hence requires Java SE 8. The implementation of Java that we will install is OpenJDK 1.8. First, refresh the list of available packages:

sudo apt update

Next, install OpenJDK 1.8:

sudo apt install openjdk-8-jdk

To verify that the correct version has been installed, run java -version. A version similar to openjdk version "1.8.0_292" should be displayed.
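The same check can be scripted. This is a small sketch with hypothetical helper names; it relies on the facts that `java -version` writes its report to stderr and that Java 8 identifies itself with a version string beginning with `1.8.`.

```python
import re
import subprocess
from typing import Optional


def parse_java_version(output: str) -> Optional[str]:
    """Extract the quoted version string from `java -version` output."""
    match = re.search(r'version "([^"]+)"', output)
    return match.group(1) if match else None


def is_java_8(version: str) -> bool:
    """Java 8 reports itself as 1.8.x (for example 1.8.0_292)."""
    return version.startswith("1.8.")


def installed_java_version() -> Optional[str]:
    """Run `java -version` and return the version string, or None if unavailable."""
    try:
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return None  # No `java` executable on the PATH.
    # The version report goes to stderr; stdout is included as a fallback.
    return parse_java_version(result.stderr + result.stdout)
```

For instance, `is_java_8(installed_java_version() or "")` is true only when a Java 8 runtime is the one found on the PATH.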

Setting the $JAVA_HOME environment variable

Head to /usr/lib/jvm/ and locate the installation of the JDK. It should look similar to /usr/lib/jvm/java-8-openjdk-amd64. Set the $JAVA_HOME environment variable to point to the installation of the JDK:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
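If the exact directory name is unclear, a short script can list the candidates. This is only a sketch: the function name `find_jdk8_homes` is hypothetical, and the `java-8-openjdk` prefix is assumed from the example path above.

```python
from pathlib import Path
from typing import List


def find_jdk8_homes(jvm_root: str = "/usr/lib/jvm") -> List[str]:
    """List directories under jvm_root whose names start with java-8-openjdk."""
    root = Path(jvm_root)
    if not root.is_dir():
        return []
    return sorted(
        str(path)
        for path in root.iterdir()
        if path.is_dir() and path.name.startswith("java-8-openjdk")
    )


def export_line(jdk_home: str) -> str:
    """Build the shell command that sets JAVA_HOME to jdk_home."""
    return f"export JAVA_HOME={jdk_home}"
```

Note that `export` only affects the current shell session, so the variable has to be set again in each new shell unless it is added to a shell startup file.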

Setting up JanusGraph

From the JanusGraph Releases, download the .zip of the "full" installation of the latest JanusGraph version (the file name should resemble janusgraph-full-X.X.X.zip, where X.X.X is the version number), and extract the contents. This "full" installation includes pre-configured JanusGraph, Apache Cassandra and Elasticsearch.

Important: You should follow all the steps below for proper functioning of padloper.

  • Add schema.default=none to the graph properties, i.e., to conf/janusgraph-cql-es.properties. This will only allow vertices/edges of the right type to be added and will throw an exception if you break the schema.
  • The default memory settings can cause JanusGraph to use quite a bit of RAM. Here are the parameters you can/should tune:
    • With the default settings, the biggest consumer is the Cassandra backend (which uses 25% of the RAM or 8 GB, whichever is less). For something more reasonable, edit the MAX_HEAP_SIZE setting in cassandra/conf/cassandra-env.sh. You can set it to something like 512M or 1024M. So far, 512 MB has proven perfectly adequate.
    • You can also set the JVM heap size for JanusGraph and the Elasticsearch backend. The relevant parameters are -Xms and -Xmx in each of elasticsearch/config/jvm.options, conf/jvm-8.options and conf/jvm-11.options. These represent the initial and maximum heap size allowed, respectively, and you should set them to the same value. For instance, for a heap size of 1 GB, set them to -Xms1g and -Xmx1g; for 512 MB, -Xms512m and -Xmx512m. If you put these settings in each of the configuration files listed above, then a total of 2 GB would be used.
  • From here, start the JanusGraph server by running
    bin/janusgraph.sh start
    
  • Finally, you need to define the schema. Open the Gremlin console as described in the next section, and execute the commands in the index_setup.txt file. This tells JanusGraph which vertex/edge properties are allowed and what their types are, and also creates indices for faster searching.
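The schema.default=none step from the list above can be applied by hand in a text editor. For repeatable setups, a sketch such as the following could do the same thing on the text of a properties file. The function `set_property` is hypothetical and not part of padloper, and the sample file content in the usage note is illustrative only.

```python
def set_property(text: str, key: str, value: str) -> str:
    """Return properties-file text with key=value set.

    An existing, uncommented entry for key is replaced in place;
    otherwise the entry is appended at the end.
    """
    lines = text.splitlines()
    new_line = f"{key}={value}"
    replaced = False
    for index, line in enumerate(lines):
        stripped = line.strip()
        # Leave comments and lines without an assignment untouched.
        if stripped.startswith("#") or "=" not in stripped:
            continue
        if stripped.split("=", 1)[0].strip() == key:
            lines[index] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    return "\n".join(lines) + "\n"
```

For example, `set_property("storage.backend=cql\n", "schema.default", "none")` returns the original line followed by `schema.default=none`. Reading conf/janusgraph-cql-es.properties, passing its contents through this function, and writing the result back would apply the setting.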

Connecting to JanusGraph

Gremlin Console

Once the JanusGraph server is running, we can open the Gremlin console by running

bin/gremlin.sh

Next, we may create a remote connection to the JanusGraph server. To keep variables available between commands when accessing the Gremlin server remotely from the Gremlin console, connect to the server with a session:

:remote connect tinkerpop.server conf/remote.yaml session

From here, we can send commands to the JanusGraph server by preceding them with :>. We can avoid this by running

:remote console

which sends all queries directly to the JanusGraph server and removes the need for :>.

Gremlin-Python

We can also access the JanusGraph server from a Python interface. First, we install the gremlinpython Python module by running

pip install gremlinpython

Important: make sure the version of gremlinpython is supported by your version of JanusGraph. For instance, JanusGraph v0.6.2 does not support more recent versions of gremlinpython, so you have to run:

pip install gremlinpython==3.5.3
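To catch a mismatched install early, the installed version can be checked from Python. This is a hedged sketch with hypothetical helper names. Treating 3.5.3 as the newest supported version is an assumption based on the pin above for JanusGraph v0.6.2; other JanusGraph versions will have different limits.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Convert '3.5.3' to (3, 5, 3), keeping only the leading digits of each part."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # Stop at the first non-numeric component.
        parts.append(int(match.group()))
    return tuple(parts)


def installed_version(package: str = "gremlinpython") -> Optional[str]:
    """Return the installed version of package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_supported(version: str, max_supported: str = "3.5.3") -> bool:
    """Assumption: versions up to and including max_supported are acceptable."""
    return version_tuple(version) <= version_tuple(max_supported)
```

A setup script could call `installed_version()` and stop with a clear message when `is_supported` returns False, rather than failing later with a less obvious connection error.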

Now, we may create a Python file to connect to and query the graph:

# Import full gremlinpython functionality
from gremlin_python import statics
from gremlin_python.structure.graph import Graph
from gremlin_python.process.graph_traversal import __
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

# Instantiate a Gremlin Graph clientside
graph = Graph()

# Connect to the server and create a traversal source. Note that the server listens on port 8182 by default.
connection = DriverRemoteConnection('ws://localhost:8182/gremlin', 'g')
g = graph.traversal().withRemote(connection)

# Get the vertices of the graph as a list, and print them.
print(g.V().toList())

# Close the connection once we are done with it.
connection.close()

This code will print a list of the vertices of the graph.

Recommendation (deprecated): Update Netty version

As of at least JanusGraph 0.6.2, this recommendation is deprecated, but it is left here in case a similar issue arises in the future.

JanusGraph 0.5.3 and later versions may ship with a Netty version between 4.1.44 and 4.1.46. To check, navigate to lib/ and check the version numbers of the netty-all, netty-common, netty-buffer, netty-codec, netty-handler, netty-resolver, and netty-transport files. These versions have a bug that may cause intermittent Apache Cassandra exceptions that become more frequent as the number of vertices and edges grows. Read this GitHub issue for more details.

If your Netty version is newer than 4.1.46, this section may be skipped.

Otherwise, head to the Maven repository for Netty and download the latest (stable) versions for netty-all, netty-common, netty-buffer, netty-codec, netty-handler, netty-resolver, and netty-transport, and replace the old .jar files in lib/ with these new files.
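Checking the version numbers in lib/ by eye is error-prone, so the check can be scripted. The sketch below uses a hypothetical function name and assumes the jars follow the naming pattern netty-NAME-X.Y.Z.Final.jar; files that do not match that pattern are ignored.

```python
import re
from typing import Iterable, List

# Assumed jar naming: netty-<module>-<major>.<minor>.<patch>.Final.jar
NETTY_JAR = re.compile(r"^(netty-[a-z0-9-]+?)-(\d+)\.(\d+)\.(\d+)\.Final\.jar$")

# Affected range stated in this README: 4.1.44 through 4.1.46.
AFFECTED_MIN = (4, 1, 44)
AFFECTED_MAX = (4, 1, 46)


def affected_netty_jars(filenames: Iterable[str]) -> List[str]:
    """Return the Netty jar file names whose version is in the affected range."""
    hits = []
    for name in filenames:
        match = NETTY_JAR.match(name)
        if match is None:
            continue  # Not a Netty jar, or not in the assumed naming pattern.
        version = tuple(int(part) for part in match.groups()[1:])
        if AFFECTED_MIN <= version <= AFFECTED_MAX:
            hits.append(name)
    return sorted(hits)
```

Passing `os.listdir("lib")` from the JanusGraph directory gives the list of jars to replace; an empty result suggests that this section can be skipped.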

Installing Flask

To install Flask, run the following commands:

pip install -Iv Flask==2.0.1
pip install -Iv python-dotenv==0.19.0

This installs Flask version 2.0.1 and python-dotenv version 0.19.0. python-dotenv reads the .flaskenv file in the flask-interface folder to configure the Flask server. Updating either package currently stops Flask from working (see the TODOs).

Setting up React

In web-interface, run npm install to install all dependencies. However, react-scripts must be set to version 4.0.3 (see the TODOs).
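One way to confirm the pin is to read the react-scripts entry from package.json. This is a sketch under assumptions: the helper names are hypothetical, and it only checks the declared version string, not what npm actually installed.

```python
import json
from typing import Optional


def react_scripts_version(package_json_text: str) -> Optional[str]:
    """Return the declared react-scripts version, or None if it is not listed."""
    data = json.loads(package_json_text)
    for section in ("dependencies", "devDependencies"):
        version = data.get(section, {}).get("react-scripts")
        if version is not None:
            return version
    return None


def is_pinned(version: Optional[str], expected: str = "4.0.3") -> bool:
    """True only for an exact pin; a range such as ^4.0.3 does not count."""
    return version == expected
```

A range specifier such as `^4.0.3` is reported as not pinned because npm may resolve it to a newer 4.x release.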
