Header not CSV but data is CSV with -o <fmt> #449

mkgvt · 2023-06-27T13:10:46Z

Specifying the output format with the -o <fmt> option produces a CSV body but a non-CSV header. This makes further processing with CSV tools (such as xsv) more difficult than it should be, as the header line is seen as a single field rather than one field per formatted item.

Example: I would have expected a comma immediately after (raw); its absence leads to an error from xsv:

$ nfdump -o 'fmt:%tsr,%bpp' -r /nfcapd.202306230000
Date first seen (raw)        Bpp
1687492697.088,    40
1687492696.832,    44
1687492697.856,    40
1687387348.992,   216
1687492696.320,    40
1687492699.648,    40
1687492698.368,    40
1687492648.960,   380
1687492799.488,   134
...
$ nfdump -o 'fmt:%tsr,%bpp' -r /nfcapd.202306230000 | xsv table
Date first seen (raw)        Bpp
CSV error: record 1 (line: 2, byte: 33): found record with 2 fields, but the previous record has 1 fields

I believe the issue occurs when the format is parsed (in ParseOutputFormat) and header_string is created. It looks like commas between fields should be inserted at that time.
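Until the header is generated with separators, one possible workaround is to rewrite it in the pipeline. The sketch below is only a heuristic and assumes the header labels are separated by runs of two or more spaces, as in the output above; `fix_header` is a hypothetical helper name, not part of nfdump.

```shell
# Hypothetical helper: make the header line of 'fmt:' output CSV-compatible.
# Assumption: header labels are separated by two or more spaces, while the
# labels themselves contain only single spaces.
fix_header() {
  awk '
    # trim leading and trailing blanks on every line
    { sub(/^ +/, ""); sub(/ +$/, "") }
    # header: turn each run of two or more spaces into a comma
    NR == 1 { gsub(/  +/, ","); print; next }
    # body: drop the padding that follows each comma
    { gsub(/, +/, ","); print }
  '
}

# Demo on two lines copied from the output above
printf '%s\n' 'Date first seen (raw)        Bpp' '1687492697.088,    40' | fix_header
```

With real data this would be used as `nfdump -o 'fmt:%tsr,%bpp' -r <file> | fix_header | xsv table`. A label that itself contains two consecutive spaces would be split incorrectly, so this is a stopgap rather than a fix.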


phaag · 2023-07-01T08:20:36Z

Yes, true, but it was not meant to create CSV output :) I will check whether the change breaks other things.

thezoggy · 2023-07-10T18:48:53Z

for csv, i just use -o csv

phaag · 2023-10-14T15:05:24Z

To be more flexible, I propose replacing the old csv code with a user-defined format such as
nfdump -o 'csv:%tsr,%bpp'

It needs some work to implement.

MrdUkk · 2024-01-21T10:26:08Z

I agree completely here, because an ALL COLUMNS csv export is bloat for everyone's use; only a limited number of columns is needed for practical jobs.
Having to dump all columns in every situation produces very large files, resulting in lots of CO2 emissions because hardware eats energy, and wasting tons of unneeded bytes in CSV results in high CPU, RAM, and disk storage utilization :)

Currently I'm adding a 'header line' with a simple bash script, but that approach is not very robust.
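A header-prepending script of this kind might look roughly like the sketch below. It is not the script referred to above; `csv_header` is a hypothetical helper that simply reuses the format tokens as column names, and it assumes the format string contains only `%token` items separated by commas.

```shell
# Hypothetical helper: derive a CSV header from an nfdump 'fmt:' string
# by dropping the "fmt:" prefix and the % in front of each token.
csv_header() {
  printf '%s\n' "${1#fmt:}" | sed 's/%//g'
}

fmt='fmt:%tsr,%bpp'
csv_header "$fmt"

# Assumed usage (not run here): print the derived header, then the records
# with nfdump's own header suppressed; -q is assumed to be the quiet option.
# { csv_header "$fmt"; nfdump -q -o "$fmt" -r <file>; } > flows.csv
```

The fragility mentioned above shows up quickly: the header has to be kept in sync with the format string by hand, and the column padding in the body is left as is.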

A second opinion: CSV export should not be dropped (it is marked as obsolete in the current version)! JSON export has very large overhead and is bloated for this type of data: more overhead, more disk utilization, more I/O, and more RAM- and CPU-intensive operations. So CSV should be made more flexible, as mentioned by the topic starter, and not dropped.

An example workflow: nfcapd -> nfdump -> csv -> ClickHouse time-series DB import from the csv file ... and the job is done flawlessly!

phaag self-assigned this Oct 14, 2023

phaag added the Feature request label Oct 14, 2023
