Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CSV files, variable number of fields #2188

Open
PSLLSP opened this issue Mar 29, 2024 · 0 comments
Open

CSV files, variable number of fields #2188

PSLLSP opened this issue Mar 29, 2024 · 0 comments
Labels
A-WISH Some kind of improvement request, hare-brained proposal, or plea. csv The csv file format, csv output format, or generally CSV-related.

Comments

@PSLLSP
Copy link

PSLLSP commented Mar 29, 2024

hledger 1.32.3

Current hledger can process only well formed CSV files. It can skip several lines at the start of the file. And a rule end can be used to force skip to the rest of the current file.

I would like to have an option to process CSV files with variable number of fields, like that when field is missing, hledger will not end with an error because field is missing.

  • there could be an option to define minimal number of mandatory fields
  • new variable will be defined, that will keep number of fields in the current record; awk has variable NF, something like that

There is rule skip in if block statement, this could be used to skip records that doesn't have correct number of fields or take an action on these records, like adding missing information.


I have CSV records, those have several header lines followed by CSV records those have data I want to process. I can design hledger rule file that process one such file but I cannot design rule file that process those files when joined with cat *.csv > import.csv.

Example:

$ cat import.csv
statement: 2023/01
account: 123456789/4321
owner: John Smith
start date: 1.1.2023
end date: 31.1.2023
opening balance:  100.00 EUR
closing balance:  100.50 EUR

"ID";"Date";"Volume";"Currency";"Description";"Note"
"54321";"31.01.2023";"0.5";"EUR";"Interest";""

statement: 2023/02
account: 123456789/4321
owner: John Smith
start date: 1.2.2023
end date: 28.2.2023
opening balance:  100.50 EUR
closing balance:  101.00 EUR

"ID";"Date";"Volume";"Currency";"Description";"Note"
"56324";"28.02.2023";"0.5";"EUR";"Interest";""

I would like to define that no fields are mandatory (to be able to skip empty lines)
I would like to use if block rule to:

  • skip records/lines those do not have 6 fields
  • skip records where field1 (%1) has value ID - to ignore headers
  • process the rest like regular CSV records

I tried to design import rules for simplified case (all records have 6 fields) and it can be done. I only miss an option to compare number of CSV fields, so I have to use a workarround to find valid CSV records

CSV file for test:

$ cat test-import1.csv
"ID";"Date";"Volume";"Currency";"Description";"Note"
"54321";"31.01.2023";"0.5";"EUR";"Interest";""

"ID";"Date";"Volume";"Currency";"Description";"Note"
"56324";"28.02.2023";"0.6";"EUR";"Interest";""

"ID";"Date";"Volume";"Currency";"Description";"Note"
"56327";"31.03.2023";"0.7";"";"Interest";"CURRENCY is missing"

import rules:

$ cat test-import1.csv.rules
# hledger import rules

skip 0
separator ;

fields f1,f2,f3,f4,f5,f6

# debug; it prints only records those were accepted and included in the journal...
comment \n%f1;%f2;%f3;%f4;%f5;%f6

# define date format
date-format %d.%m.%Y

# skip header record
if %f1 ID
  skip

# skip records where "volume" field is empty.
# just a test that record can be ignored
if %f3 ^$
  skip

# other test for empty field.
#if ! %f3 .
#  skip

code  %f1
date  %f2
amount  %f3 %f4
description %f5 - %f6

# Currency is missing, define default value
# variable cannot be redefined :-( (%f4 USD)
# workarround - define currency or amount, like "amount %f3 USD"
if %f4 ^$
  amount %f3 USD
$ hledger -f test-import1.csv print
2023-01-31 (54321) Interest -
    ; 54321;31.01.2023;0.5;EUR;Interest;
    expenses:unknown         0.5 EUR
    income:unknown          -0.5 EUR

2023-02-28 (56324) Interest -
    ; 56324;28.02.2023;0.6;EUR;Interest;
    expenses:unknown         0.6 EUR
    income:unknown          -0.6 EUR

2023-03-31 (56327) Interest - CURRENCY is missing
    ; 56327;31.03.2023;0.7;;Interest;CURRENCY is missing
    expenses:unknown         0.7 USD
    income:unknown          -0.7 USD

It can even process file where some lines are not valid CSV records:

$ cat test-import2.csv
statement: 2023/01
account: 123456789/4321
owner: John Smith
start date: 1.1.2023
end date: 31.1.2023
opening balance:  100.00 EUR
closing balance:  100.50 EUR

"ID";"Date";"Volume";"Currency";"Description";"Note"
"54321";"31.01.2023";"0.5";"EUR";"Interest";""

statement: 2023/02
account: 123456789/4321
owner: John Smith
start date: 1.2.2023
end date: 28.2.2023
opening balance:  100.50 EUR
closing balance:  101.00 EUR

"ID";"Date";"Volume";"Currency";"Description";"Note"
"56324";"28.02.2023";"0.5";"EUR";"Interest";""
$ ln -sf test-import1.csv.rules test-import2.csv.rules
$ hledger -f test-import2.csv print
2023-01-31 (54321) Interest -
    ; 54321;31.01.2023;0.5;EUR;Interest;
    expenses:unknown         0.5 EUR
    income:unknown          -0.5 EUR

2023-02-28 (56324) Interest -
    ; 56324;28.02.2023;0.5;EUR;Interest;
    expenses:unknown         0.5 EUR
    income:unknown          -0.5 EUR
@simonmichael simonmichael added A-WISH Some kind of improvement request, hare-brained proposal, or plea. csv The csv file format, csv output format, or generally CSV-related. labels Mar 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-WISH Some kind of improvement request, hare-brained proposal, or plea. csv The csv file format, csv output format, or generally CSV-related.
Projects
None yet
Development

No branches or pull requests

2 participants