Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Track IFS with file-format flags at CLI parse (was: file format flags are not overriding mlrrc) #1340

Open
holmescharles opened this issue Jul 14, 2023 · 8 comments

Comments

@holmescharles
Copy link

holmescharles commented Jul 14, 2023

I am using miller 6.8.0 on OSX. Here is my mlrrc:

pprint

When I run the following command, I get the following error:

❯ mlr --csv cat << EOF
a,b c
1,2
3,4
EOF

mlr: mlr: CSV header/data length mismatch 2 != 1 at filename (stdin) row 2.

As you can see, miller is defaulting to pprint even though I passed "--csv". It is interpreting spaces as the delimiter, not the commas.

Next, I change my mlrrc to:

#pprint

Now I can run the previous command with no issue:

❯ mlr --csv cat << EOF
a,b c
1,2
3,4
EOF

a,b c
1,2
3,4

According to the documentation, miller is supposed to override any file formats in the mlrrc when options are passed in the terminal, but this is not happening.

@aborruso
Copy link
Contributor

If I run

mlr --csv cat << EOF
a,b c
1,2
3,4
EOF

I have

a,b c
1,2
3,4

But I'm using mlr 6.7.0

@holmescharles
Copy link
Author

What is your mlrrc?

@aborruso
Copy link
Contributor

And I have the same using 6.8.

What is your mlrrc?

I do not use it

@holmescharles
Copy link
Author

I do not use it

Ok, then that explains why you don't have the same error. Now try setting your mlrrc to match mine.

@aborruso
Copy link
Contributor

Ok, then that explains why you don't have the same error. Now try setting your mlrrc to match mine.

the same error

@johnkerl johnkerl self-assigned this Jul 17, 2023
@johnkerl johnkerl added the bug label Jul 17, 2023
@johnkerl
Copy link
Owner

johnkerl commented Aug 19, 2023

Hi @holmescharles and thanks for the issue!

I looked into this and it's not quite what any of us thought.

And unfortunately I think you may not like the answer! :) And neither do I ... 😉

When pprint is in the .mlrrc file and --csv is on the command line, the input-file format is correctly found to be PPRINT from the .mlrrc, and then is correctly changed to be CSV. That isn't broken.

But what happens is --pprint, when first encountered, sets IFS from not-specified-yet, to one or more spaces. Then when --csv is encountered, the IFS is already set, and so doesn't get changed to be comma.

This isn't a .mlrrc issue as it first appeared -- rather, you get the same result with

$ mlr --pprint --csv cat << EOF
a,b c
1,2
3,4
EOF
mlr: mlr: CSV header/data length mismatch 2 != 1 at filename (stdin) row 2.

So: a fix could be that --csv should always re-write IFS after --pprint has set it.

This is a bigger issue and it goes back all the way to Miller 1 and involves the fact that these flags are overlapping. So --pprint and --csv set the input file format and IFS; and so on. And from the beginning I wanted people to do either

mlr --ifs ';' --csv ...

or

mlr --csv --ifs ';' ...

-- and Miller has always had that flexibility, and I don't want to take it away. I think it would break a lot of things for a lot of people, to change it now.

So ...... I think the best that can be done at present is this suggestion:

mlr --pprint --csv --ifs comma cat << EOF
a,b c
1,2
3,4
EOF

with the explicit --ifs comma.

I wish I had a better answer, but, at this point in time, and given the long-standing usage pattern above, I don't see a better option.

Later thought: I thought there may be a way to accommodate this but I'm still unsure. Namely:

  • Currently, if IFS is not set, --pprint sets input file format to PPRINT and IFS to one or more spaces.
  • Currently, if IFS is not set, --csv sets input file format to CSV and IFS to comma.
  • The flag for if IFS was set or not is here: https://github.com/johnkerl/miller/blob/6.8.0/internal/pkg/cli/option_types.go#L48-L54
  • What I could do is change it from "if IFS set" to "is IFS set for the last-specified input format"
  • This would still need handling for mlr --ifs ';' --csv ... since there is no last-specified input format when --ifs comes first on the command line ... 🤔

@johnkerl johnkerl changed the title File format flags are not overriding mlrrc Track IFS with file-format flags at CLI parse (was: file format flags are not overriding mlrrc) Aug 19, 2023
@holmescharles
Copy link
Author

Maybe I'm thinking about this incorrectly, but could you keep track of the give format flag and wait until the end to use that to set things like IFS etc? That way, if the user specifies something else, it waits until the last minute to assign unassigned variables? I'm not sure how things are implemented, but this could just be tracked in some new variable? Then, if you specify --pprint before --csv, it will completely ignore pprint.

Alternatively, pprint is just csv with "contiguous whitespace" as delimiters. For my own curiosity, how would you specify contiguous whitespace as IFS?

@johnkerl
Copy link
Owner

johnkerl commented Aug 21, 2023

Maybe I'm thinking about this incorrectly, but could you keep track of the give format flag and wait until the end to use that to set things like IFS etc?

Yes, you & I are talking about the same thing -- what you just wrote, and my bullet list above.

So --csv --ifs semicolon and --ifs semicolon --csv would do the same, but, if --pprint (or --xtab or whatever) appears after that, it would "reset".

pprint is just csv with "contiguous whitespace" as delimiters

Not quite. --ipprint is the same as --icsvlite --ifs space --repifs. And --opprint is not the same as --o anything else. This is because there's significant code reuse between the csv-lite and pprint readers, whereas the pprint writer has very much its own logic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants