Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resource contention on Windows using the Python API #3399

Open
ZmeiGorynych opened this issue Apr 28, 2024 · 10 comments
Open

Resource contention on Windows using the Python API #3399

ZmeiGorynych opened this issue Apr 28, 2024 · 10 comments
Assignees
Labels
bug Something isn't working high-priority transactions Transaction management related issues

Comments

@ZmeiGorynych
Copy link

When I try to insert nodes and relations in a loop, like below, sooner or later I get the following exception:
(<class 'RuntimeError'>, RuntimeError('remove: The process cannot access the file because it is being used by another process.: "C:\\Users\\Egor\\Dropbox\\Code\\motleycrew\\examples\\bug_db\\nodes.statistics_and_deleted.ids.wal"'), <traceback object at 0x000001E21B6AB280>)
This only happens on Windows (at least not on a Mac), but on Windows it always does :(.
Sleeping between inserts makes the problem less likely, but it still appears sometimes.

Running kuzu==0.3.2 installed with pip, on Python 3.11 on Window.s

import shutil
from pathlib import Path
from time import sleep
import os
import shutil

import kuzu

WORKING_DIR = Path(__file__).parent
DB_PATH = WORKING_DIR / "bug_db"
node_table_name = "entity"
rel_table_name = "links"

if os.path.isdir(DB_PATH):
    shutil.rmtree(DB_PATH)


database = kuzu.Database(DB_PATH)
connection = kuzu.Connection(database)
node_tables = connection._get_node_table_names()

if node_table_name not in node_tables:
    connection.execute("CREATE NODE TABLE %s (ID STRING, PRIMARY KEY(ID))" % node_table_name)
rel_tables = connection._get_rel_table_names()
rel_tables = [rel_table["name"] for rel_table in rel_tables]
if rel_table_name not in rel_tables:
    connection.execute(
        "CREATE REL TABLE {} (FROM {} TO {}, predicate STRING)".format(
            rel_table_name, node_table_name, node_table_name
        )
    )


def create_relation(subj: str, obj: str, rel: str) -> None:
    connection.execute(
        (
            "MATCH (n1:{}), (n2:{}) WHERE n1.ID = $subj AND n2.ID = $obj "
            "CREATE (n1)-[r:{} {{predicate: $pred}}]->(n2)"
        ).format(node_table_name, node_table_name, rel_table_name),
        {"subj": subj, "obj": obj, "pred": rel},
    )


def create_entity(entity: str) -> None:
    connection.execute(
        "CREATE (n:%s {ID: $entity})" % node_table_name,
        {"entity": entity},
    )


create_entity("entity0")

sleep_time = 0
for i in range(10):
    print(i + 1)
    create_entity(f"entity{i+1}")
    sleep(sleep_time)
    create_relation("entity1", f"entity{i+1}", "rel1")
    sleep(sleep_time)
@ZmeiGorynych ZmeiGorynych changed the title Resource contention on Windows Resource contention on Windows using the Python API Apr 28, 2024
@ZmeiGorynych
Copy link
Author

Just noticed that this doesn't happen if I make the relation name unique, eg f"rel{i+1}" instead of "rel1".
Do the relation names have to be unique even when they connect different nodes?

@prrao87
Copy link
Member

prrao87 commented Apr 28, 2024

Hi @ZmeiGorynych, relationship tables reference the underlying node table's primary keys in the FROM and TO columns (you can only have one relationship between any given pair of nodes based on their primary keys).

Just clarifying the intent behind this snippet, which doesn't seem right (or safe) to me:

sleep_time = 0
for i in range(10):
    print(i + 1)
    create_entity(f"entity{i+1}")
    sleep(sleep_time)
    create_relation("entity1", f"entity{i+1}", "rel1")
    sleep(sleep_time)

When you just create entity1 within the for loop and then immediately reference it to create the relationship, the transaction may not have been committed yet (nor the primary key index created/updated safely for the relationship table to reference). From an I/O perspective, you're much better off creating the nodes beforehand, so that the node table's hash index can get up to date and all the transactions are committed safely. Then you would be able to safely reference these primary keys in a subsequent loop to update the relationships. Maybe @ray6080 might be able to offer more advice if I've missed anything here.

@ZmeiGorynych
Copy link
Author

ZmeiGorynych commented Apr 29, 2024

Thanks a lot for the quick answer!
However, I would challenge it on several counts:
firstly, as I said the exception vanishes when I replace the line
create_relation("entity1", f"entity{i+1}", f"rel")
with
create_relation("entity1", f"entity{i+1}", f"rel{i+1}") ,
and it doesn't seem like that would happen if your explanation was correct.

Secondly, the above is a very natural calling pattern, think of it as having a function add_child that is called several times in a row, and each time it first creates a child node and then a relationship between it and entity1.

Finally, I find it very alarming that you appear to say that a collision between subsequent execute methods if they are not called in a certain magic order is reasonable behavior for kuzu. Is there no way to make sure each call of the execute method returns only after all the commits have been made, all the indices updated, etc? I don't mind waiting a bit longer on each call, but having random errors like this pop up on a logically sound sequence of calls might be a deal breaker for us using kuzu if there is no fix or workaround :(

@semihsalihoglu-uw
Copy link
Contributor

semihsalihoglu-uw commented Apr 29, 2024

Thanks for reporting the bug. Just to weigh in: I don't see anything wrong with the query. Each of your calls to connection.execute should auto-commit before returning, so you should not even need to sleep in between these. This looks like a bug with our management of the nodes.statistics_and_deleted.ids.wal file. Let's have @ray6080 look into this bug and fix it.

@semihsalihoglu-uw semihsalihoglu-uw added bug Something isn't working transactions Transaction management related issues labels Apr 29, 2024
@ZmeiGorynych
Copy link
Author

Thanks @semihsalihoglu-uw! Relieved to have you agree that this is indeed a bug :)
Probably not caught yet because your tests are likely on a Linux, and Windows can be a lot more anal about locking files.
Hopefully the fact that the bug disappears if the relation names are all distinct will make it easier to locate.

@ZmeiGorynych
Copy link
Author

ZmeiGorynych commented May 8, 2024

Just got a very similar error in a different context:
(<class 'RuntimeError'>, RuntimeError('remove: The process cannot access the file because it is being used by another process.: "C:\\Users\\Egor\\Dropbox\\Code\\motleycrew\\examples\\research_agent\\research_db\\catalog.kz.wal"'), <traceback object at 0x00000A8CDAEA8D80>)
while executing a very basic query
'CREATE REL TABLE is_subquestion (FROM Question TO Question)'

Is intermittent, sometimes it arises when I call the above query and sometimes not in exactly the same code.

Is there any ETA on resolving the above and hopefully also this one? :)

Thanks a lot!

@ray6080
Copy link
Contributor

ray6080 commented May 8, 2024

Hey @ZmeiGorynych , we will take a look into it today and get back to u on the ETA. prioritizing this now.

@benjaminwinger
Copy link
Collaborator

I haven't been able to reproduce this, but it could just be from another program accessing the wal file when it is being removed after committing. Is it possible that it's caused by Dropbox accessing the files? Have you tried disabling anti-virus scanning for the database directory?

@ZmeiGorynych
Copy link
Author

Ah, nice catch! After moving the disk location to a non-Dropbox directory, the error disappeared (or at least hasn't appeared after a dozen or so runs).
Still, this sounds like something others might encounter - would it be worth waiting for (a second?) and retrying if this error occurs, instead of failing right away?

@benjaminwinger
Copy link
Collaborator

That might work, though I think a more reliable alternative would be to avoid removing the WAL files at all and to just empty them instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working high-priority transactions Transaction management related issues
Projects
None yet
Development

No branches or pull requests

5 participants