Fix timeout issue (#29)

* Update documentation
* Throw on too many tries; add docs
* Add executor troubleshooting documentation
aplbrain · Feb 26, 2019 · 8b41bb1
1 parent d5c9513
commit 8b41bb1

Showing 4 changed files with 129 additions and 15 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -4,7 +4,9 @@
         - Support for node and edge attributes in non-template contexts
     - Executors:
         - `NetworkxExecutor`
-            - Support for attribute filtering
+            - Support for node and edge constraints
+        - `Neo4jExecutor`
+            - Support for node and edge constraints
 - **0.3.0**
     - Overhaul of DotMotif Parsers
         - Implement EBNF spec of language, and complete parser (`lark`)

diff --git a/README.md b/README.md
@@ -13,9 +13,9 @@ You can currently write motifs in dotmotif form, which is a DSL that specializes
 `threecycle.motif`
 ```
 # A excites B
-A -+ B
+A -> B [type = "excitatory"]
 # B inhibits C
-B -| C
+B -> C [type = "inhibitory"]
 ```
 
 ## Ingesting the motif into dotmotif

diff --git a/docs/Troubleshooting-Neo4jExecutor.md b/docs/Troubleshooting-Neo4jExecutor.md
@@ -0,0 +1,76 @@
+# Troubleshooting `dotmotif.Neo4jExecutor`
+
+The Neo4j executor has a lot of moving parts, so here are some frequently encountered issues and how to solve them.
+
+## Best-Practices
+
+### Memory Management
+
+Memory variables should almost always be set explicitly; a `Neo4jExecutor` cannot currently "guess" how much memory you want it to use.
+
+- `max_memory`: How much RAM to use (maximum). Suggested value is "XG", where X is the integer number of gigabytes of RAM your machine has, minus a bit. (So for a 128GB machine, `120G` seems like a good start.)
+- `initial_memory`: How big the JVM heap should be to start. (You can generally ignore this if you don't know what it means.)
+
+<details>
+<summary>Important Warnings</summary>
+When creating a new Executor, specify `max_memory` and `initial_memory` so that the Executor can expand to fill the available space. A few warnings, though: **if you set `max_memory` too high, your container will silently fail** (the container will try to allocate too much memory, and then do whatever it is that the JVM does instead of segfaulting noisily).
+
+`initial_memory` should be set high enough that the JVM doesn't have to reprovision memory too many times; but be aware that this amount of memory will be _unavailable_ to the rest of your system while the executor is alive.
+</details>
+
+## Errors when starting a new Executor
+
+<ul>
+<li>
+<details>
+<summary><b>I get a warning that "host port 7474 is already in use."</b></summary>
+
+You may already have a running Neo4jExecutor or Neo4j database container which is using the 7474 port. Check with `docker ps`.
+</details>
+</li>
+<li>
+<details>
+<summary><b>The executor waits for a long time, and then tells me it failed to reach the Neo4j server.</b></summary>
+
+This means that the executor tried to create a new docker container, but was unable to reach it.
+</details>
+</li>
+</ul>
+
+## Errors with executor responses
+
+<ul>
+<li>
+<details>
+<summary><b>After I run <code>Executor.find</code>, the result list is empty! But I know that my graph contains that motif!</b></summary>
+
+`.find()` returns a _cursor_ to your results, not the results themselves. Consider the following:
+
+```python
+
+E = Neo4jExecutor(...)
+
+results = E.find(motif)
+A = results.to_table()
+B = results.to_table()
+```
+
+`A` will contain all of your results; `B` will be EMPTY, since you already "used up" your results in the first call to `to_table`. You can learn more about cursors [here](https://py2neo.org/v4/database.html#cursor-objects). Once you have assigned these values to `A`, you can reuse them as many times as you like.
+
+Common Follow-up Question: _WHYYY DO YOU DO THIS_
+
+In some cases, you may receive too many results to easily process (many gigabytes of results). In these cases, you will want to 'stream' the results instead of getting a list of all of them. Here, `next(results)` is your friend!
+
+If you prefer that the executor return a nicely parsed table of results, you can pass `cursor=False` to `find`. You will then get back a Python data structure instead of a cursor.
+
+
+</details>
+</li>
+<li>
+<details>
+<summary><b>The executor waits for a long time, and then tells me it failed to reach the Neo4j server.</b></summary>
+
+This means that the executor tried to create a new docker container, but was unable to reach it.
+</details>
+</li>
+</ul>
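
Note on the cursor behavior described in the troubleshooting document above: the single-use semantics can be sketched with a plain Python generator. `fake_cursor` and the sample rows are hypothetical stand-ins, not part of dotmotif or py2neo; the sketch only assumes that a cursor, like a generator, hands out each record once.

```python
from typing import Dict, Iterator, List


def fake_cursor(rows: List[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Hypothetical stand-in for a result cursor: yields each row exactly once."""
    for row in rows:
        yield row


# Illustrative motif matches (made-up node names).
rows = [{"A": "n1", "B": "n2"}, {"A": "n2", "B": "n3"}]

# Consuming the same cursor twice: the second pass finds nothing left.
cursor = fake_cursor(rows)
first_pass = list(cursor)
second_pass = list(cursor)

# Streaming one result at a time, in the spirit of `next(results)`.
stream = fake_cursor(rows)
first_row = next(stream)

# Materializing once gives a structure that can be read repeatedly.
materialized = list(fake_cursor(rows))
reused = [len(materialized), len(materialized)]
```

With a real executor, the counterpart of `materialized` would be the value kept from the first `to_table()` call, or the structure returned when `cursor=False` is passed to `find`.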
diff --git a/dotmotif/executors/Neo4jExecutor.py b/dotmotif/executors/Neo4jExecutor.py
@@ -69,20 +69,55 @@ def __init__(self, **kwargs) -> None:
         If there is no existing database and you do not pass in a graph, you
         must pass an `import_directory`, which the container will mount as an
         importable CSV resource.
+
+        Arguments:
+            db_bolt_uri (str): If connecting to an existing server, the URI
+                of the server (including the port, probably 7687).
+            username (str: "neo4j"): The username to use to attach to an
+                existing server.
+            password (str): The password to use to attach to an existing server.
+            graph (nx.Graph): If provisioning a new database, the networkx
+                graph to import into the database.
+            import_directory (str): If provisioning a new database, the local
+                directory to crawl for CSVs to import into the Neo4j database.
+                Commonly used when you want to quickly and easily start a new
+                Executor that uses the export from a previous graph.
+            autoremove_container (bool: True): Whether to delete the container
+                when the executor is destroyed. Set to False if you'd like
+                to be able to connect with other executors after the first one
+                has closed.
+            max_memory (str: "4G"): The maximum amount of memory to provision.
+            initial_memory (str: "2G"): The starting heap-size for the Neo4j
+                container's JVM.
+            max_retries (int: 20): The number of times DotMotif should try to
+                connect to the neo4j container before giving up.
+
         """
 
         db_bolt_uri: str = kwargs.get("db_bolt_uri", None)
         username: str = kwargs.get("username", "neo4j")
         password: str = kwargs.get("password", None)
-        self._max_memory_size: str = kwargs.get("max_memory", "8G")
+        self._autoremove_container: bool = kwargs.get(
+            "autoremove_container", True
+        )
+        self._max_memory_size: str = kwargs.get("max_memory", "4G")
+        self._initial_heap_size: str = kwargs.get("initial_memory", "2G")
         self.max_retries: int = kwargs.get("max_retries", 20)
-        self._initial_heap_size: str = kwargs.get("initial_memory", "4G")
 
         graph: nx.Graph = kwargs.get("graph", None)
         import_directory: str = kwargs.get("import_directory", None)
 
         self._created_container = False
 
+        if (
+            (db_bolt_uri and graph) or
+            (db_bolt_uri and import_directory) or
+            (import_directory and graph)
+        ):
+            raise ValueError(
+                "Specify EXACTLY ONE of db_bolt_uri/graph/import_directory."
+            )
+
         if db_bolt_uri and password:
             # Authentication information was provided. Use this to log in and
             # connect to the existing database.
@@ -132,7 +167,7 @@ def _create_container(self, import_dir: str):
             ./bin/neo4j-admin set-initial-password neo4jpw &&
             ./bin/neo4j start &&
             tail -f /dev/null'""",
-            # auto_remove=True,
+            auto_remove=self._autoremove_container,
             detach=True,
             environment={
                 "NEO4J_dbms_memory_heap_initial__size": self._initial_heap_size,
@@ -150,15 +185,16 @@ def _create_container(self, import_dir: str):
                 res = requests.get("http://localhost:7474")
                 if res.status_code == 200:
                     container_is_ready = True
-            except:
-                pass
-            else:
-                tries += 1
-                time.sleep(2)
-                if tries > self.max_retries:
-                    raise IOError(
-                        f"Could not connect to neo4j container {self._running_container}."
-                    )
+                else:
+                    tries += 1
+                    time.sleep(2)
+                    if tries > self.max_retries:
+                        raise IOError(
+                            f"Could not connect to neo4j container {self._running_container}."
+                            " For more information, see Troubleshooting-Neo4jExecutor.md in the docs."
+                        )
+            except requests.RequestException as e:
+                raise requests.RequestException("Failed to reach Neo4j HTTP server.") from e
         self.G = Graph(password="neo4jpw")
 
     def _teardown_container(self):
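
Note on the readiness loop in the last hunk: as written, a `requests.RequestException` appears to abort the wait on the first failed request, even though a refused connection is expected while the container is still booting. A minimal sketch of a loop that counts connection errors as retries is shown below. `wait_until_ready`, `probe`, `delay_seconds`, and the injectable `sleep` are hypothetical names for illustration, not dotmotif API. The sketch relies on `requests.RequestException` being a subclass of `IOError`/`OSError`, so catching `OSError` covers it.

```python
import time
from typing import Callable, Optional


def wait_until_ready(
    probe: Callable[[], bool],
    max_retries: int = 20,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `probe` until it returns True; return the number of failed attempts.

    A probe that returns False or raises OSError (for example, a refused
    connection) counts as one failed attempt. Once the failures exceed
    `max_retries`, IOError is raised, chained to the last connection error.
    """
    tries = 0
    last_error: Optional[BaseException] = None
    while True:
        try:
            if probe():
                return tries
        except OSError as e:
            # Expected while the server is still starting; remember and retry.
            last_error = e
        tries += 1
        if tries > max_retries:
            raise IOError(
                f"Service not ready after {max_retries} retries."
            ) from last_error
        sleep(delay_seconds)
```

In the executor, the probe could be something like `lambda: requests.get("http://localhost:7474").status_code == 200`, with `max_retries` taken from `self.max_retries`. Injecting `sleep` keeps the loop testable without real delays.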