Issues importing from CSVs that contain strings with quotes #3461

Open
oysterlanguage opened this issue May 8, 2024 · 0 comments
Labels
bug Something isn't working data-import-export Issues related to data importing or exporting, such as copy to/from statements

I'm having issues importing from CSVs that contain strings with quotes in them. When I load the same CSV through pandas, it works as expected.

import kuzu
import csv
import pandas as pd

target_vertices_field_names = ["ID", "Name", "Quote"]
data = [
    {"ID": 1, "Name": "John Doe", "Quote": 'This is a "quote"'},
    {"ID": 2, "Name": "Jane Smith", "Quote": 'Another, "example" here'}
]
with open('/tmp/output.csv', 'w', newline='') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=data[0].keys(), quotechar='"', escapechar='\\')
    writer.writeheader()
    for row in data:
        writer.writerow(row)

with open('/tmp/output.csv', 'r') as file:
    print(file.read())

db = kuzu.Database("kuzu_test_db")
conn = kuzu.Connection(db)
try:
    conn.execute(
        "CREATE NODE TABLE Test("
        "ID INT64,"
        "Name STRING, "
        "Quote STRING, "
        "PRIMARY KEY (ID))"
    )
except Exception:
    # Most likely the table already exists from a previous run; ignore and continue.
    pass

response = conn.execute(f'COPY Test FROM (LOAD WITH HEADERS (ID INT64, Name STRING, Quote STRING) FROM "/tmp/output.csv" (HEADER=true) WHERE NOT EXISTS {{MATCH (t:Test) WHERE t.ID = ID}} RETURN *)')
while response.has_next():
    print(f"Inserted Test {response.get_next()}")

# df = pd.read_csv('/tmp/output.csv')
# response = conn.execute(f'COPY Test FROM (LOAD WITH HEADERS (ID INT64, Name STRING, Quote STRING) FROM df WHERE NOT EXISTS {{MATCH (t:Test) WHERE t.ID = ID}} RETURN *)')
# while response.has_next():
#     print(f"Inserted Test {response.get_next()}")

response = conn.execute('MATCH (t:Test) RETURN *')
while response.has_next():
    print(response.get_next())

Depending on how I format the CSV, I get different errors:

writer = csv.DictWriter(csvfile, fieldnames=data[0].keys(), quotechar='"', escapechar='\\', quoting=csv.QUOTE_ALL)

"ID","Name","Quote"
"1","John Doe","This is a ""quote"""
"2","Jane Smith","Another, ""example"" here"

RuntimeError: Copy exception: Error in file /tmp/output.csv on line 2: quote should be followed by end of file, end of value, end of row or another quote.

writer = csv.DictWriter(csvfile, fieldnames=data[0].keys(), quotechar='"', escapechar='\\', quoting=csv.QUOTE_NONE)

ID,Name,Quote
1,John Doe,This is a \"quote\"
2,Jane Smith,Another\, \"example\" here

RuntimeError: Copy exception: Error in file /tmp/output.csv, on line 3: expected 3 values per row, but got more.

Tested on Kuzu version 0.4.1.
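
For comparison, here is a minimal sketch using only the Python standard library (no Kuzu involved). It suggests that both layouts above are readable by Python's own csv reader when it is given settings that match the writer. The helper names `write_csv` and `read_csv` are made up for this illustration, and the data is written to an in-memory buffer rather than /tmp/output.csv.

```python
import csv
import io

rows = [
    {"ID": 1, "Name": "John Doe", "Quote": 'This is a "quote"'},
    {"ID": 2, "Name": "Jane Smith", "Quote": 'Another, "example" here'},
]


def write_csv(quoting):
    """Write the sample rows to a string using the same writer settings as above."""
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(
        buf,
        fieldnames=list(rows[0].keys()),
        quotechar='"',
        escapechar="\\",
        quoting=quoting,
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def read_csv(text, **fmt):
    """Parse CSV text back into a list of dicts (all values come back as strings)."""
    return list(csv.DictReader(io.StringIO(text, newline=""), **fmt))


# Layout 1: every field quoted, embedded quotes doubled ("").
quote_all_text = write_csv(csv.QUOTE_ALL)
quote_all_rows = read_csv(quote_all_text)

# Layout 2: nothing quoted, quotes and commas escaped with a backslash.
quote_none_text = write_csv(csv.QUOTE_NONE)
quote_none_rows = read_csv(quote_none_text, quoting=csv.QUOTE_NONE, escapechar="\\")

for parsed in (quote_all_rows, quote_none_rows):
    for original, row in zip(rows, parsed):
        print(row["ID"], row["Name"], row["Quote"] == original["Quote"])
```

If the round trip holds for both layouts, the files themselves are probably well formed, and the difference would lie in how the Kuzu CSV reader interprets doubled quotes and backslash escapes.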

@prrao87 prrao87 added bug Something isn't working data-import-export Issues related to data importing or exporting, such as copy to/from statements labels May 15, 2024