Fix timeout issue (#29)
* Update documentation

* Throw on too many tries; add docs

* Add executor troubleshooting documentation
j6k4m8 committed Feb 26, 2019
1 parent d5c9513 commit 8b41bb1
Showing 4 changed files with 129 additions and 15 deletions.
4 changes: 3 additions & 1 deletion CHANGELOG.md
@@ -4,7 +4,9 @@
     - Support for node and edge attributes in non-template contexts
     - Executors:
         - `NetworkxExecutor`
-            - Support for attribute filtering
+            - Support for node and edge constraints
+        - `Neo4jExecutor`
+            - Support for node and edge constraints
 - **0.3.0**
     - Overhaul of DotMotif Parsers
         - Implement EBNF spec of language, and complete parser (`lark`)
4 changes: 2 additions & 2 deletions README.md
@@ -13,9 +13,9 @@ You can currently write motifs in dotmotif form, which is a DSL that specializes
 `threecycle.motif`
 ```
 # A excites B
-A -+ B
+A -> B [type = "excitatory"]
 # B inhibits C
-B -| C
+B -> C [type = "inhibitory"]
 ```

 ## Ingesting the motif into dotmotif
76 changes: 76 additions & 0 deletions docs/Troubleshooting-Neo4jExecutor.md
@@ -0,0 +1,76 @@
# Troubleshooting `dotmotif.Neo4jExecutor`

The Neo4j executor has a lot of moving parts, so here are some frequently encountered issues and how to solve them.

## Best-Practices

### Memory Management

Memory variables should almost always be set explicitly; a `Neo4jExecutor` cannot currently "guess" how much memory you want it to use.

- `max_memory`: How much RAM to use (maximum). Suggested value is "XG", where X is the integer number of gigabytes of RAM your machine has, minus a bit. (So for a 128GB machine, `120G` seems like a good start.)
- `initial_memory`: How big the JVM heap should be to start. (You can generally ignore this if you don't know what it means.)

<details>
<summary>Important Warnings</summary>

When creating a new Executor, specify `max_memory` and `initial_memory` so that the Executor can expand to fill the available space. But a few warnings! **If you set `max_memory` too high, your container will fail silently** (the container will try to allocate too much memory, and then do whatever it is that the JVM does instead of segfaulting noisily).

`initial_memory` should be set high enough that the JVM doesn't have to reprovision memory too many times; but be aware that this amount of memory will be _unavailable_ to the rest of your system while the executor is alive.
</details>
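
As a rough illustration of the sizing rule above, the sketch below derives a `max_memory` string from a machine's total RAM. `suggest_max_memory` and its 8 GB default headroom are hypothetical, chosen only to reproduce the "128GB machine, `120G`" suggestion; only the `max_memory` and `initial_memory` keyword arguments come from the executor itself.

```python
def suggest_max_memory(total_gb: int, headroom_gb: int = 8) -> str:
    """Return a max_memory string such as "120G" (hypothetical helper).

    total_gb:    total RAM on the machine, in whole gigabytes.
    headroom_gb: RAM left for the OS and other processes (an assumption).
    """
    if total_gb <= headroom_gb:
        raise ValueError("Not enough RAM to leave the requested headroom.")
    return f"{total_gb - headroom_gb}G"


if __name__ == "__main__":
    max_memory = suggest_max_memory(128)
    print(max_memory)
    # The value could then be passed to the executor, for example:
    # E = Neo4jExecutor(graph=my_graph, max_memory=max_memory, initial_memory="2G")
```

The headroom is a guess; if the container still fails silently, try a smaller `max_memory` first.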

## Errors when starting a new Executor

<ul>
<li>
<details>
<summary><b>I get a warning that "host port 7474 is already in use."</b></summary>

You may already have a running Neo4jExecutor or Neo4j database container which is using the 7474 port. Check with `docker ps`.
</details>
</li>
<li>
<details>
<summary><b>The executor waits for a long time, and then tells me it failed to reach the Neo4j server.</b></summary>

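This means that the executor tried to create a new Docker container, but could not get a response from the Neo4j HTTP endpoint at `http://localhost:7474`. Use `docker ps` to check whether the container is still running. If the database simply needs longer to start, consider passing a larger `max_retries` (default 20) when creating the executor.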
</details>
</li>
</ul>
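
If you would rather check the port from Python than with `docker ps`, a minimal standard-library sketch might look like the following. `port_in_use` is a hypothetical helper, not part of dotmotif, and it assumes the container publishes port 7474 on localhost.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    if port_in_use(7474):
        print("Port 7474 is busy; another Neo4j container may be running.")
    else:
        print("Port 7474 looks free.")
```

This only tells you that something is listening, not what it is; `docker ps` remains the way to identify the container.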

## Errors with executor responses

<ul>
<li>
<details>
<summary><b>After I run <code>Executor.find</code>, the result list is empty! But I know that my graph contains that motif!</b></summary>

`.find()` returns a _cursor_ to your results, not the results themselves. Consider the following:

```python

E = Neo4jExecutor(...)

results = E.find(motif)
A = results.to_table()
B = results.to_table()
```

`A` will contain all of your results; `B` will be EMPTY, since you already "used up" your results in the first call to `to_table`. You can learn more about cursors [here](https://py2neo.org/v4/database.html#cursor-objects). Once you have assigned these values to `A`, you can reuse them as many times as you like.

Common Follow-up Question: _WHYYY DO YOU DO THIS_

In some cases, you may receive too many results to easily process (many gigabytes of results). In these cases, you will want to 'stream' the results instead of getting a list of all of them. Here, `next(results)` is your friend!

If you prefer that the executor return a nicely parsed table of results, you can pass `cursor=False` to `find`. You will then get back a Python data structure instead of a cursor.


</details>
</li>
</ul>
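
The single-pass behaviour of a cursor can be reproduced without a database. In the sketch below, a plain Python generator stands in for the cursor; `fake_cursor` and the sample rows are made up for illustration and are not the py2neo API.

```python
from typing import Dict, Iterator, List

Row = Dict[str, int]


def fake_cursor(rows: List[Row]) -> Iterator[Row]:
    """Yield rows one at a time, like a cursor that can be read only once."""
    for row in rows:
        yield row


rows = [{"A": 1, "B": 2, "C": 3}, {"A": 2, "B": 3, "C": 1}]

# Consuming the cursor twice: the second pass finds nothing left.
results = fake_cursor(rows)
first_pass = list(results)
second_pass = list(results)

# Streaming: take one result at a time instead of materializing everything.
stream = fake_cursor(rows)
first_row = next(stream)
remaining = list(stream)
```

Here `first_pass` holds both rows and `second_pass` is empty, which mirrors `A` and `B` in the example above.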
60 changes: 48 additions & 12 deletions dotmotif/executors/Neo4jExecutor.py
@@ -69,20 +69,55 @@ def __init__(self, **kwargs) -> None:
         If there is no existing database and you do not pass in a graph, you
         must pass an `import_directory`, which the container will mount as an
         importable CSV resource.
+
+        Arguments:
+            db_bolt_uri (str): If connecting to an existing server, the URI
+                of the server (including the port, probably 7687 for Bolt).
+            username (str: "neo4j"): The username to use to attach to an
+                existing server.
+            password (str): The password to use to attach to an existing server.
+            graph (nx.Graph): If provisioning a new database, the networkx
+                graph to import into the database.
+            import_directory (str): If provisioning a new database, the local
+                directory to crawl for CSVs to import into the Neo4j database.
+                Commonly used when you want to quickly and easily start a new
+                Executor that uses the export from a previous graph.
+            autoremove_container (bool: True): Whether to delete the container
+                when the executor is deconstructed. Set to False if you'd like
+                to be able to connect with other executors after the first one
+                has closed.
+            max_memory (str: "4G"): The maximum amount of memory to provision.
+            initial_memory (str: "2G"): The starting heap-size for the Neo4j
+                container's JVM.
+            max_retries (int: 20): The number of times DotMotif should try to
+                connect to the neo4j container before giving up.
         """

         db_bolt_uri: str = kwargs.get("db_bolt_uri", None)
         username: str = kwargs.get("username", "neo4j")
         password: str = kwargs.get("password", None)
-        self._max_memory_size: str = kwargs.get("max_memory", "8G")
+        self._autoremove_container: bool = kwargs.get(
+            "autoremove_container", True
+        )
+        self._max_memory_size: str = kwargs.get("max_memory", "4G")
+        self._initial_heap_size: str = kwargs.get("initial_memory", "2G")
+        self.max_retries: int = kwargs.get("max_retries", 20)
-        self._initial_heap_size: str = kwargs.get("initial_memory", "4G")

         graph: nx.Graph = kwargs.get("graph", None)
         import_directory: str = kwargs.get("import_directory", None)

         self._created_container = False

+        if (
+            (db_bolt_uri and graph) or
+            (db_bolt_uri and import_directory) or
+            (import_directory and graph)
+        ):
+            raise ValueError(
+                "Specify EXACTLY ONE of db_bolt_uri/graph/import_directory."
+            )

         if db_bolt_uri and password:
             # Authentication information was provided. Use this to log in and
             # connect to the existing database.
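
The pairwise check in this hunk rejects any two options given together. One alternative way to express "exactly one" is to count the options that were supplied, sketched below. `validate_source` is a hypothetical standalone function, not code from this commit, and it differs from the committed check in two ways: it also rejects the case where nothing is supplied, and it tests `is not None` rather than truthiness, so an empty `nx.Graph` (which is falsy) still counts as supplied.

```python
from typing import Any, Optional


def validate_source(
    db_bolt_uri: Optional[str] = None,
    graph: Optional[Any] = None,
    import_directory: Optional[str] = None,
) -> None:
    """Raise ValueError unless exactly one data source was supplied."""
    supplied = sum(
        option is not None
        for option in (db_bolt_uri, graph, import_directory)
    )
    if supplied != 1:
        raise ValueError(
            "Specify EXACTLY ONE of db_bolt_uri/graph/import_directory."
        )


if __name__ == "__main__":
    validate_source(import_directory="./export")  # passes silently
    try:
        validate_source(db_bolt_uri="bolt://localhost:7687", graph=object())
    except ValueError as err:
        print(err)
```

Whether rejecting the "nothing supplied" case is desirable depends on how the executor is meant to handle defaults, which this hunk does not show.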
@@ -132,7 +167,7 @@ def _create_container(self, import_dir: str):
                     ./bin/neo4j-admin set-initial-password neo4jpw &&
                     ./bin/neo4j start &&
                     tail -f /dev/null'""",
-            # auto_remove=True,
+            auto_remove=self._autoremove_container,
             detach=True,
             environment={
                 "NEO4J_dbms_memory_heap_initial__size": self._initial_heap_size,
@@ -150,15 +185,16 @@ def _create_container(self, import_dir: str):
                 res = requests.get("http://localhost:7474")
                 if res.status_code == 200:
                     container_is_ready = True
-            except:
-                pass
-            else:
-                tries += 1
-                time.sleep(2)
-                if tries > self.max_retries:
-                    raise IOError(
-                        f"Could not connect to neo4j container {self._running_container}."
-                    )
+                else:
+                    tries += 1
+                    time.sleep(2)
+                    if tries > self.max_retries:
+                        raise IOError(
+                            f"Could not connect to neo4j container {self._running_container}. "
+                            "For more information, see Troubleshooting-Neo4jExecutor.md in the docs."
+                        )
+            except requests.RequestException as e:
+                raise requests.RequestException("Failed to reach Neo4j HTTP server.") from e
         self.G = Graph(password="neo4jpw")

     def _teardown_container(self):
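
The bounded-retry idea behind `max_retries` can be sketched in isolation. `wait_until_ready` below is a hypothetical helper, not the executor's code: it takes any zero-argument readiness probe, so it runs without Docker or `requests`. In practice the probe might wrap an HTTP GET against `http://localhost:7474` and return `False` on connection errors, which is an assumption about usage rather than what this commit does.

```python
import time
from typing import Callable


def wait_until_ready(
    is_ready: Callable[[], bool],
    max_retries: int = 20,
    delay_seconds: float = 2.0,
) -> int:
    """Poll is_ready until it returns True; return the number of failed tries.

    Raises IOError once more than max_retries probes have failed.
    """
    tries = 0
    while not is_ready():
        tries += 1
        if tries > max_retries:
            raise IOError(f"Service not ready after {max_retries} retries.")
        time.sleep(delay_seconds)
    return tries


if __name__ == "__main__":
    answers = iter([False, False, True])  # ready on the third probe
    failed = wait_until_ready(lambda: next(answers), delay_seconds=0.0)
    print(f"Ready after {failed} failed probes.")
```

Separating the probe from the loop makes it easy to test the retry logic with a fake probe, as in the usage lines above.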
