fixlengths -- insert extra commas not at end #323

Open
ggrothendieck opened this issue Jun 9, 2023 · 0 comments


ggrothendieck commented Jun 9, 2023

One reason to need fixlengths is that a field holds several comma-separated subfields and is not quoted. If that field is the last one, fixlengths can be used, but not if it is, say, the second-to-last. It would be nice if the insertion point for the extra commas could be specified. For example, -1 would mean that the extra comma(s) are inserted at the last comma. If a record has no commas, the extra commas are still added at the end.

Here is sample input taken from https://stackoverflow.com/questions/76423878/reading-a-csv-file-into-r-which-contains-comma-separated-values-in-single-obser/76427295#76427295

clothes,colours,size
shirt,blue,green,grey,small
shirt,yellow,black,small
shorts,blue,medium
shorts,black,large

The corresponding output would be:

clothes,colour1,colour2,colour3,size
shirt,blue,green,grey,small
shirt,yellow,black,,small
shorts,blue,,,medium
shorts,black,,,large

although I think it would be sufficient if it did not deal with the header, since that can always be skipped by whatever program reads the file in.

To be clear, this gawk program from the same source accepts that input and produces that output for this particular example:

# To run: gawk -f process.awk myfile.csv > myfile2.csv
# To configure: edit header= line as needed
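BEGIN {
    # Desired output header; its commas define the target field count
    header = "clothes,colour1,colour2,colour3,size"

    # Keep only the commas of header, e.g. ",,,," (4 commas = 5 fields)
    commas = gensub(/[^,]/, "", "g", header)
    ncommas = length(commas)
    FS = OFS = ","
}
NR == 1 { print header; next }  # skip input header & use header variable instead
{
    # Replace the last comma (occurrence NF-1) with enough commas to pad the
    # record to ncommas+1 fields, so the padding lands before the last field
    if (NF > 1) print gensub(",", substr(commas, 1, ncommas - NF + 2), NF - 1)
    else print $0 commas  # single-field record: pad at the end
}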
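The proposed insertion-point option could work roughly as in the minimal Python sketch below. It is not fixlengths' implementation; the names `pad_record` and `fix_lengths` and the `insert_at` parameter are hypothetical. Assumptions: `insert_at` is a 0-based comma index where negative values count from the end (so -1 is the last comma, as in the request), the target width is the length of the longest record, and records that are already long enough are left unchanged.

```python
import csv
import io


def pad_record(fields: list[str], width: int, insert_at: int = -1) -> list[str]:
    """Pad fields to `width` by inserting empty fields at comma `insert_at`.

    Hypothetical semantics: comma i sits between fields[i] and fields[i + 1];
    negative indices count from the last comma (-1 = last comma).
    """
    missing = width - len(fields)
    if missing <= 0:
        return list(fields)  # already long enough: leave as-is (assumption)
    ncommas = len(fields) - 1
    if ncommas <= 0:
        # No commas in the record: add the extra fields at the end
        return list(fields) + [""] * missing
    comma = insert_at if insert_at >= 0 else ncommas + insert_at
    if not 0 <= comma < ncommas:
        raise ValueError(f"insert_at={insert_at} out of range for {ncommas} commas")
    pos = comma + 1  # inserting at comma i means inserting before fields[i + 1]
    return fields[:pos] + [""] * missing + fields[pos:]


def fix_lengths(text: str, insert_at: int = -1) -> str:
    """Pad every record to the longest record's length (illustrative only)."""
    rows = list(csv.reader(io.StringIO(text)))
    width = max(len(row) for row in rows)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        writer.writerow(pad_record(row, width, insert_at))
    return out.getvalue()
```

Applied to the four data rows above (header omitted), `fix_lengths(data)` with the default `insert_at=-1` reproduces the expected data rows, for example `shorts,blue,,,medium`. Passing `insert_at=-2` would instead place the padding at the second-to-last comma.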