Missing documentation on filtering rows on LOAD CSV when using CALL {} IN TRANSACTIONS #13408

Open
Thijss opened this issue Feb 29, 2024 · 2 comments

Thijss commented Feb 29, 2024

Hi there,

I would like to suggest adding some extra documentation on using LOAD CSV on large datasets where filtering is required.

I ran into this while upgrading old Neo4j Cypher code (v3.5.x) to the new syntax (v5.17.0).

I was using the following in the old syntax:

USING PERIODIC COMMIT
LOAD CSV ...
WITH row
WHERE ...
MERGE...(etc.)

When converting this to the new syntax I first tried this:

LOAD CSV ...
CALL {
    WITH row
    WHERE ...
    MERGE...(etc.)
} IN TRANSACTIONS

However, this results in the following error:

neo4j.exceptions.CypherSyntaxError: Importing WITH should consist only of simple references to outside variables. WHERE is not allowed.

After some tinkering I came up with the following:

LOAD CSV ...
WITH row
WHERE ...
CALL {
    WITH row
    MERGE...(etc.)
} IN TRANSACTIONS

which seems to work fine.

If this is a viable solution in your view, this could perhaps be added as an example, to save time for future users.
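
To make the working pattern concrete, here is a minimal sketch of the filter-before-subquery form. The file name `people.csv`, its columns (`id`, `name`, `country`), the `Person` label and the batch size are all hypothetical and only stand in for the elided parts above.

```cypher
// Hypothetical input: people.csv with header columns id, name, country
LOAD CSV WITH HEADERS FROM 'file:///people.csv' AS row
WITH row
WHERE row.country = 'NL'           // filter in the outer query, before the subquery
CALL {
    WITH row                       // importing WITH: plain variable references only
    MERGE (p:Person {id: row.id})  // hypothetical label and key
    SET p.name = row.name
} IN TRANSACTIONS OF 1000 ROWS     // batch size chosen arbitrarily for the sketch
```

Because the filter runs before the CALL, only matching rows reach the subquery, so each batch of 1000 should consist entirely of rows that are actually written. Note that CALL {} IN TRANSACTIONS has to run in an implicit (auto-commit) transaction, for example with the `:auto` prefix in Neo4j Browser.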

@Thijss Thijss added the feature label Feb 29, 2024
@InverseFalcon

Your initial approach did not work because of restrictions on the importing WITH clause, that is, the WITH at the very start of the subquery. The only purpose of an importing WITH is to define which outer variables are in scope for the subquery, so you cannot filter on it.

You can add a second WITH immediately after it and filter there:

LOAD CSV ...
CALL {
    WITH row   // importing WITH, can't filter with a WHERE
    
    WITH row  // regular WITH, we can filter this one
    WHERE ...
    MERGE...(etc.)
} IN TRANSACTIONS
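
As a filled-in sketch of that double-WITH form (the file `people.csv`, its columns `id`, `name`, `country`, the `Person` label and the batch size are hypothetical placeholders, not taken from the original query):

```cypher
// Hypothetical input: people.csv with header columns id, name, country
LOAD CSV WITH HEADERS FROM 'file:///people.csv' AS row
CALL {
    WITH row                       // importing WITH: only brings row into scope
    WITH row                       // regular WITH: filtering is allowed here
    WHERE row.country = 'NL'
    MERGE (p:Person {id: row.id})  // hypothetical label and key
    SET p.name = row.name
} IN TRANSACTIONS OF 1000 ROWS     // batch size chosen arbitrarily for the sketch
```

One difference from filtering before the CALL: here every CSV row enters the subquery, so the 1000-row batches count filtered-out rows as well, and a batch may write fewer than 1000 rows. Both forms should produce the same data.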

Thijss commented Mar 1, 2024

Thank you for clarifying!

Aside from the error message itself, I did not find any documentation on this restriction on importing WITH clauses.

Also, this restriction seems to apply specifically to importing WITH clauses within CALL {} IN TRANSACTIONS: I ran the same query without transactions and it did not throw this error (it ran out of memory though).

Anyways, perhaps the restriction info and solution example could be added to the documentation here:
https://neo4j.com/docs/cypher-manual/current/clauses/load-csv/#load-csv-importing-large-amounts-of-data

I also found references to the deprecated USING PERIODIC COMMIT here; I think these should be updated:
https://neo4j.com/developer/guide-import-csv/#_optimizing_load_csv_for_performance
