Add some way to split a field #24

Llammissar · 2017-02-01T21:02:21Z

Another feature request that came to mind as I was working. Consider the following single column of data:

file
5core_05thread
5core_06thread
5core_07thread
5core_08thread

I ended up doing it in post-process, but I think it'd be handy to have some way to split fields so that it comes out like this:

cores  threads
5      5
5      6
5      7
5      8


jondegenhardt · 2017-02-01T21:51:22Z

Nice use case. My first thought is to wonder if there is enough commonality in these patterns to develop a tool around. More examples would shed light on this. But if it turns out that the flexibility of awk or sed is needed, then it might be best to leave these tasks to those tools and custom scripts.

Llammissar · 2017-02-02T17:21:37Z

That's a good point, and I'm not unsympathetic to it at all. If I hit more examples, I'll try to remember to outline them here.

I'll note up front that I really don't like sed/awk for this sort of thing because they're general-purpose, line-oriented tools. It's fine if there's something like "cores" to anchor on for extracting numbers and splitting them (and I think you rightly surmise that I wasn't necessarily looking to extract the column name in the same operation), but for the more general case? They're clunky; awareness of columns is extremely powerful and useful.

Just doodling here, but something like:
tsv-filter --split 1:_:cores,threads
...could be helpful. Or maybe something like regex substitution via capture groups:
tsv-filter --split 1:'([0-9]+)core_([0-9]+)thread':cores,threads
...if we continue looking at my original example. (The column selector is necessary for the more general case that you have multiple columns with the delimiter of interest -- colon, for example -- but you only want to split one of them and the other is something like a timestamp.)
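To make the capture-group idea concrete, here is a rough Python sketch of the behaviour I have in mind. It is not an existing tsv-utils feature: `split_field` and its parameters are hypothetical names, and stripping leading zeros (so `05` becomes `5`, as in the desired output above) is an assumption on my part.

```python
import re

def split_field(lines, field, pattern, names, delim="\t"):
    """Replace the 1-based column `field` with the regex capture groups.

    `lines` includes the header line; the header of the split column is
    replaced by `names`. Hypothetical helper, not part of tsv-utils.
    """
    regex = re.compile(pattern)
    if regex.groups != len(names):
        raise ValueError("need exactly one name per capture group")
    idx = field - 1
    out = []
    for lineno, line in enumerate(lines):
        cols = line.rstrip("\n").split(delim)
        if lineno == 0:
            new = list(names)  # header row: substitute the new column names
        else:
            m = regex.fullmatch(cols[idx])
            if m is None:
                raise ValueError(f"line {lineno + 1}: {cols[idx]!r} does not match")
            # Assumption: purely numeric groups are normalised, so "05" -> "5".
            new = [str(int(g)) if g is not None and g.isdecimal() else (g or "")
                   for g in m.groups()]
        out.append(delim.join(cols[:idx] + new + cols[idx + 1:]))
    return out

rows = ["file", "5core_05thread", "5core_06thread"]
for row in split_field(rows, 1, r"([0-9]+)core_([0-9]+)thread", ["cores", "threads"]):
    print(row)
```

Other columns pass through untouched, which is the point of having a column selector rather than a line-wide substitution.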

Broadly, I think I'd characterise this class of problem as "normalisation", which also includes other transformations on columns. (For example, some existing tools produce measures in whole seconds, so I want to multiply those by 1000, or divide the millisecond metrics by the same, so they can be compared properly. ...This might be a separate ER?)
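For the unit-conversion case, this is roughly the post-processing I mean, sketched in Python. Again, `scale_field` is a hypothetical helper and not something tsv-utils provides; it assumes the first line is a header and that every value in the chosen column parses as a decimal number.

```python
from decimal import Decimal

def scale_field(lines, field, factor, delim="\t"):
    """Multiply the 1-based column `field` by `factor`, leaving the header as-is.

    Hypothetical helper for illustration. Decimal avoids float rounding noise
    when converting, e.g., whole seconds to milliseconds.
    """
    idx = field - 1
    scale = Decimal(str(factor))
    out = [lines[0].rstrip("\n")]  # header row passes through unchanged
    for line in lines[1:]:
        cols = line.rstrip("\n").split(delim)
        cols[idx] = str(Decimal(cols[idx]) * scale)
        out.append(delim.join(cols))
    return out

rows = ["tool\telapsed", "a\t3", "b\t12"]
for row in scale_field(rows, 2, 1000):
    print(row)
```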

jondegenhardt added the enhancement label Feb 1, 2017
