Repository containing the scripts used during the development of the prototype for my Master's Project. The dissertation explaining all the research behind it can be found here.
To run these scripts, you will need to have installed:
Once those are installed, create a new virtual environment for the project using the terminal:
mkvirtualenv dblp
Now, you'll need to install the dependencies by running the following command inside the src directory:
make deps
- In load_dblp.py:
  - Specify your SQLite connection string.
  - Set up your Neo4J connection information.
  - Remember to download the DBLP dataset from here: http://dblp.uni-trier.de/xml/
- Run load_dblp.py to parse all the entries in dblp.xml into a SQLite DB and then load them into Neo4J.
  - Parsing the data from the XML file may take a while 😢. Go get a drink or go to sleep (that's what I did 😴).
  - You only need to parse dblp.xml once... you can comment out line 19 once your SQLite DB is populated.
  - In this script you can tweak what data you actually want to have in Neo4J. For instance, I created a view to select just a subset of the original DBLP data.
- Run graph_aggregator.py once your Neo4J DB is populated. This script generates the Aggregated Graph that allows us to perform OLAP queries in the graph DB. Make sure to change the dimensions list accordingly.
- Now that your Aggregated Graph is up and running, you can submit your OLAP queries to it 💁.
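
To give a feel for the parsing step, here is a minimal sketch of streaming DBLP-style records into SQLite and then defining a view over a subset. It is not the code in load_dblp.py: the table names, the view name, the year cutoff, and the tiny sample XML are all assumptions made up for illustration. The real dblp.xml also declares character entities in a separate DTD, which the standard-library parser used here would likely reject, so treat this as an illustration only.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

# Tiny stand-in for dblp.xml (hypothetical records, not real DBLP data).
SAMPLE_XML = b"""<dblp>
<article key="journals/x/A1"><author>Ann</author><author>Bob</author><title>Paper one</title><year>2009</year></article>
<inproceedings key="conf/y/B2"><author>Ann</author><title>Paper two</title><year>2014</year></inproceedings>
</dblp>"""

# Record types to keep; an assumption, the real dataset has more.
RECORD_TAGS = {"article", "inproceedings"}


def load_records(source, conn):
    """Stream records from an XML source into two SQLite tables."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS publication "
        "(pub_key TEXT PRIMARY KEY, kind TEXT, title TEXT, year INTEGER)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS authorship (pub_key TEXT, author TEXT)"
    )
    # iterparse yields each element once it is fully read, so the whole
    # file never has to be parsed into one tree up front.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag not in RECORD_TAGS:
            continue
        pub_key = elem.get("key")
        year = elem.findtext("year")
        conn.execute(
            "INSERT OR REPLACE INTO publication VALUES (?, ?, ?, ?)",
            (pub_key, elem.tag, elem.findtext("title"),
             int(year) if year else None),
        )
        conn.executemany(
            "INSERT INTO authorship VALUES (?, ?)",
            [(pub_key, author.text) for author in elem.findall("author")],
        )
        elem.clear()  # drop the record's children to limit memory use
    conn.commit()


def create_subset_view(conn, min_year):
    """Define a view exposing only publications from min_year onwards."""
    conn.execute(
        "CREATE VIEW IF NOT EXISTS recent_publication AS "
        f"SELECT * FROM publication WHERE year >= {int(min_year)}"
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_records(io.BytesIO(SAMPLE_XML), conn)
    create_subset_view(conn, 2010)
    print(conn.execute("SELECT pub_key, year FROM recent_publication").fetchall())
```

Loading Neo4J from a view like this, rather than from the full table, is one way to keep the graph small while experimenting.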
In order to run the graph_aggregator.py script, you'll need two instances of Neo4J running at the same time. I was able to accomplish that by using a Neo4J Instance Manager called iNeo.
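
For intuition about what an aggregated graph built from a dimensions list can look like, here is a small in-memory sketch. It does not talk to Neo4J and is not the algorithm in graph_aggregator.py: the sample publications, the authorship edges, and the aggregate function are hypothetical, and plain Python dictionaries stand in for the source graph and the aggregated graph.

```python
from collections import Counter

# Hypothetical source graph: publication nodes with properties,
# plus (author, publication) edges. Made-up data for illustration.
PUBLICATIONS = {
    "p1": {"venue": "VLDB", "year": 2012},
    "p2": {"venue": "VLDB", "year": 2012},
    "p3": {"venue": "SIGMOD", "year": 2013},
}
AUTHORED = [("Ann", "p1"), ("Bob", "p1"), ("Ann", "p2"), ("Ann", "p3")]


def aggregate(publications, authored, dimensions):
    """Group publications by the chosen dimensions.

    Returns two counters:
    - node_counts: publications per group (one aggregate node per group)
    - edge_weights: publications per (author, group) pair
    """
    node_counts = Counter()
    edge_weights = Counter()
    group_of = {}
    for pub_id, props in publications.items():
        # A group is identified by the tuple of its dimension values.
        group = tuple(props[d] for d in dimensions)
        group_of[pub_id] = group
        node_counts[group] += 1
    for author, pub_id in authored:
        edge_weights[(author, group_of[pub_id])] += 1
    return node_counts, edge_weights


if __name__ == "__main__":
    # Fine-grained aggregation by venue and year.
    nodes, edges = aggregate(PUBLICATIONS, AUTHORED, ["venue", "year"])
    print(dict(nodes))
    print(dict(edges))
    # Roll-up: dropping a dimension gives a coarser aggregated graph.
    rolled_nodes, _ = aggregate(PUBLICATIONS, AUTHORED, ["venue"])
    print(dict(rolled_nodes))
```

Changing the dimensions list changes the grain of the aggregate nodes, which is why it needs to match the OLAP queries you plan to run.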