New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(Data Import): custom csv delimiters, UTF-8 BOM handling #26183
base: develop
Are you sure you want to change the base?
Conversation
Example files for the UTF8-BOM fix: You can compare the two files using the xxd tool: $ xxd utf8.csv
00000000: 2263 6f6c 3122 3b22 636f 6c32 220d 0a31 "col1";"col2"..1
00000010: 3b32 ;2 Note the first 3 bytes $ xxd utf8bom.csv
00000000: efbb bf22 636f 6c31 223b 2263 6f6c 3222 ..."col1";"col2"
00000010: 0d0a 313b 32 ..1;2 |
23e1336
to
e5f2878
Compare
@akhilnarang, fixed commit messages |
@akhilnarang, Docs linked fixed.
|
@vmatt it's fine, semgrep picks up unchanged parts in modified files as well. After merge it won't bother again until the file is edited again, since it's intentional in some places. |
@akhilnarang I just realised that i forgot to push the commit where I added Linter will still fail on frappe-modifying-but-not-comitting-other-method, not sure if it will go through the rest of the steps, if that step still failing. But beside this, I'd say it's complete now, and safe to merge, but let me know, if I have to make any adjustments. |
Great, will test it and let you know. |
98bd595
to
45eabd3
Compare
Hey @akhilnarang, sorry, I messed up my branches late yesterday and accidentally combined two different pull request changes together (#26200). All fixed now, won't happen again. Please trigger the tests now again. |
Should I add semgrep exceptions for File.py to ignore these errors, so it would pass the last test, as these were already part of the code? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor things
Co-authored-by: Akhil Narang <me@akhilnarang.dev>
Avoid semgrep issue with translated string Signed-off-by: Akhil Narang <me@akhilnarang.dev>
Signed-off-by: Akhil Narang <me@akhilnarang.dev>
4ca4372
to
0f4e916
Compare
Hello,
Closes #26182
Closes #22151
The current Frappe Data Import only supports standard CSV files, leading to issues when importing data from other *sv file formats (e.g., TSV, semicolon-separated values).
This pull request introduces the following enhancements:
csv.Sniffer
to automatically detect delimiters like tabs (\t
), semicolons (;
), and commas (,
) in the uploaded file.delimiter_options
is added to the Data Import doctype, allowing users to manually specify the delimiter used in their file, ensuring successful import even when auto-detection fails. (default value is ",;\t|" - most common delimiter types)These changes expand the capabilities of the Data Import Doctype, making it more flexible and user-friendly for a broader range of file formats and business contexts.
Without fixing the UTF8 with BOM parsing, if i'm loading my semicolon separated, utf-8 BOM file, it will still fail to recognize the first column, as Frappe fails to strip the doublequotes properly, because the first row will start as \ufeff"col1", instead "col1":
Tested and validated, added two new test cases in the test_importer.py file.
docs: https://docs.erpnext.com/docs/user/manual/en/data-import
I look forward to your feedback and comments!