Header not CSV but data is CSV with -o <fmt> #449

Open
mkgvt opened this issue Jun 27, 2023 · 4 comments

@mkgvt

mkgvt commented Jun 27, 2023

Specifying the output format using the -o <fmt> option results in the body being CSV but the header is not. This makes further processing with CSV tools (such as xsv) more difficult than it should be as the header line is seen as a single field rather than a field per formatted item.

Example: I would have expected a comma immediately after (raw); its absence leads to an error from xsv:

$ nfdump -o 'fmt:%tsr,%bpp' -r /nfcapd.202306230000
Date first seen (raw)        Bpp
1687492697.088,    40
1687492696.832,    44
1687492697.856,    40
1687387348.992,   216
1687492696.320,    40
1687492699.648,    40
1687492698.368,    40
1687492648.960,   380
1687492799.488,   134
...
$ nfdump -o 'fmt:%tsr,%bpp' -r /nfcapd.202306230000 | xsv table
Date first seen (raw)        Bpp
CSV error: record 1 (line: 2, byte: 33): found record with 2 fields, but the previous record has 1 fields
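
The field-count mismatch that xsv reports can be reproduced with any CSV parser. The following is a minimal sketch using Python's standard csv module on two lines copied from the output above; the helper name field_counts is only illustrative and is not part of nfdump or xsv.

```python
import csv

# Two lines copied from the nfdump output shown above.
sample = [
    "Date first seen (raw)        Bpp",
    "1687492697.088,    40",
]


def field_counts(lines):
    """Return the number of comma-separated fields found on each line."""
    return [len(row) for row in csv.reader(lines)]


# The header has no comma, so it parses as one field; the record has two.
print(field_counts(sample))  # [1, 2]
```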

I believe the issue occurs as the format is parsed (in ParseOutputFormat) and header_string is created. It looks like commas between fields should be inserted at that time.
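
As a rough illustration of that idea (not nfdump's actual C code), the sketch below builds a header by substituting each format token with its column title while leaving the literal characters between tokens untouched. The TITLES table and the build_header function are hypothetical; the two titles are the ones visible in the output above.

```python
import re

# Hypothetical stand-in for nfdump's internal token table.
# The titles are taken from the output shown in this issue.
TITLES = {
    "%tsr": "Date first seen (raw)",
    "%bpp": "Bpp",
}


def build_header(fmt, titles=TITLES):
    """Replace each %token with its title, keeping literal separators."""

    def substitute(match):
        token = match.group(0)
        # Unknown tokens are left unchanged rather than guessed at.
        return titles.get(token, token)

    return re.sub(r"%[A-Za-z0-9]+", substitute, fmt)


print(build_header("%tsr,%bpp"))  # Date first seen (raw),Bpp
```

Because the separators come straight from the format string, the header and the records would share the same delimiter, whether it is a comma or something else.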

@phaag
Owner

phaag commented Jul 1, 2023

Yes, true, but it was not meant to create CSV output :) I will check whether the change breaks other things.

@thezoggy
Contributor

For CSV, I just use -o csv

@phaag
Owner

phaag commented Oct 14, 2023

In order to be more flexible, I propose replacing the old csv code with a user-defined format such as
nfdump -o 'csv:%tsr,%bpp'

It needs some work to implement.

phaag self-assigned this Oct 14, 2023
phaag added the Feature request label Oct 14, 2023
@MrdUkk

MrdUkk commented Jan 21, 2024

I completely agree. An all-columns CSV export is bloat for most uses; only a limited number of columns is needed for practical jobs.
Having to dump all columns in every situation produces very large files, which results in lots of CO2 emissions because hardware eats energy, and wasting tons of unneeded bytes in CSV results in high CPU, RAM, and disk storage utilization :)

Currently I'm adding the header line with a simple bash script, but that approach is not very robust.
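
A post-processing step of this kind could also be written with a real CSV parser instead of string manipulation. The sketch below is one possible approach, not the script mentioned above: it assumes the input consists only of the space-padded header line followed by data records like those in the first comment, and that the caller supplies the column names. The function name normalize is hypothetical.

```python
import csv
import io


def normalize(text, names):
    """Return CSV text with a real header row and whitespace-trimmed fields.

    Assumes the first line of `text` is the space-padded header and every
    following non-empty line is a comma-separated data record.
    """
    lines = text.splitlines()
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(names)
    # Skip the original header line and re-emit each record without padding.
    for row in csv.reader(lines[1:]):
        if not row:
            continue  # ignore blank lines
        writer.writerow([field.strip() for field in row])
    return out.getvalue()


raw = (
    "Date first seen (raw)        Bpp\n"
    "1687492697.088,    40\n"
    "1687492696.832,    44\n"
)
print(normalize(raw, ["Date first seen (raw)", "Bpp"]), end="")
```

The column names still have to be kept in sync with the format string by hand, which is the fragile part that a native csv format in nfdump would remove.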

A second point: CSV export should not be dropped (it is marked as obsolete in the current version)! JSON export has very large overhead and is bloated for this type of data: more overhead, more disk utilization, more I/O, and more RAM- and CPU-intensive operations. So CSV should be made more flexible, as the original poster suggested, and not dropped.

Example workflow: nfcapd -> nfdump -> csv -> ClickHouse time-series DB import from the CSV file... and the job is done flawlessly!
