enhancement request - allow user specified field wrapper #46

Open
seanpowell75 opened this issue Oct 9, 2023 · 3 comments

Comments

@seanpowell75

Settings (and Manual Parse) allow the user to specify the delimiter, but not the text/field wrapper applied to the file. For example, a text file might use | as the column separator, with each field enclosed in its own field wrapper of ^, so the file reads:
^field1^|^field2^|^field3^
The ability to specify this would be very useful in some scenarios (like mine), and others can disregard it if it does not apply to them.
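To make the request concrete, here is a minimal sketch in Python (not this project's code) of what parsing with a user-specified delimiter and field wrapper could look like. The helper name `parse_wrapped` and its parameters are hypothetical; the sketch relies only on the standard `csv` module, whose `quotechar` option plays the role of the field wrapper.

```python
import csv
import io

def parse_wrapped(text, delimiter="|", wrapper="^"):
    """Parse delimited text whose fields are enclosed in a custom wrapper.

    `parse_wrapped`, `delimiter` and `wrapper` are illustrative names only.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=wrapper)
    return [row for row in reader]

rows = parse_wrapped("^field1^|^field2^|^field3^\n")
print(rows)  # [['field1', 'field2', 'field3']]
```

With both characters configurable, the wrapper is stripped and only the three real fields remain.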

@jokedst
Owner

jokedst commented Oct 9, 2023

Fun fact, this is how it worked initially, but I removed it because of bugs. Those bugs are fixed now though...

Currently the code mimics the built-in CSV parser in the .NET library, i.e. a string that contains the quote character itself escapes it by doubling it. So e.g. 4" beam must be written "4"" beam".
Is this the behavior you expect? Some files use an escape character instead, typically a backslash, so it would be "4\" beam".
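The two escaping conventions described above can be compared side by side. This is a sketch using Python's standard `csv` module rather than the .NET parser the plugin mimics; `doublequote` and `escapechar` are that module's options, not settings of this tool.

```python
import csv

# Convention 1: a quote character inside a field is doubled
# (this is the csv module's default behaviour).
doubled = next(csv.reader(['"4"" beam"'], doublequote=True))

# Convention 2: a quote character inside a field is preceded by an
# escape character, here a backslash.
escaped = next(csv.reader(['"4\\" beam"'], doublequote=False, escapechar="\\"))

print(doubled)  # ['4" beam']
print(escaped)  # ['4" beam']
```

Both inputs decode to the same value, so the choice only matters for which files the parser can read correctly.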

@seanpowell75
Author

I don't think that will work for the files I work with (or the example above). As you say, it seems to be implemented to mimic the functionality of the .NET TextFieldParser (I just looked it up), so it has a yes/no option, "HasFieldsEnclosedInQuotes", which is the field wrapper, but it's set as either quotes or nothing. The CSV files I work with (technically not comma-separated values, but a similar flat-file structure) use a non-alphanumeric character as the delimiter, and a different non-alphanumeric character as the field wrapper. The rationale is that it improves the robustness of the process if your input file happens to have commas or quotes in some of the fields, hence the user needs to be able to specify both the delimiter and the field wrapper character. Is adding this back in something you'd consider?
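As an illustration of that rationale, the sketch below (Python's standard `csv` module, with made-up sample data) parses a row whose fields contain both a comma and a double quote. Because neither character is the delimiter or the wrapper, no escaping is needed.

```python
import csv

# Hypothetical sample row: | is the delimiter, ^ is the field wrapper.
line = '^Smith, John^|^4" beam^|^plain^'

fields = next(csv.reader([line], delimiter="|", quotechar="^"))
print(fields)  # ['Smith, John', '4" beam', 'plain']
```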

@ps2goat

ps2goat commented Dec 28, 2023

I have the same situation; however, I'm using a really unique character instead of ^. This tool picks up that character as the separator instead of the non-printing whitespace character that is the actual separator. That allows the different columns and values to be parsed without the wrapper characters, but it adds extra columns.

This is a nice "bug" (misused feature/quirk) as long as you don't have the field wrapping char in your actual values. The only downside is that you have extra columns in the output, but if you aren't doing select *, they are filtered out of the result set. So if you set your manual parse setting to use ^ as the separator, that could be a quick workaround.
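The effect of this workaround can be reproduced with a plain string split. This is only a sketch of the idea in Python, not how the tool parses internally: splitting on the wrapper character yields the real values at every second position, with empty or separator-only columns in between.

```python
line = "^field1^|^field2^|^field3^"

# Treat the wrapper ^ as if it were the separator.
columns = line.split("^")
print(columns)  # ['', 'field1', '|', 'field2', '|', 'field3', '']

# Selecting only the value columns (instead of "select *") drops the extras.
values = columns[1::2]
print(values)  # ['field1', 'field2', 'field3']
```

As noted above, this breaks down as soon as a real value contains the wrapper character.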


But +1 to adding support for custom delimiters!
