Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replication key on gl_journal_entries causes incomplete ledger #19

Open
robindebois opened this issue Oct 12, 2020 · 4 comments
Open

Replication key on gl_journal_entries causes incomplete ledger #19

robindebois opened this issue Oct 12, 2020 · 4 comments

Comments

@robindebois
Copy link

robindebois commented Oct 12, 2020

I am not 100% sure if the behaviour displayed by the tap is intentional or accidental; please feel free to discard issue is this was by design.

Expected behaviour

When loading journal entries with the tap I expected to (eventually) end up with all journal entries that are available in Mambu.

Actual behaviour

The tap does an incremental load with the booking_date as a replication key. However, there are multiple reasons why the choice of key leads to an incomplete ledger:

  • reversal operations on transactions have backdated booking_dates
  • actual transactions in Mambu can be backdated, and this is common practice in finance: internal payment processes typically lag behind the physical execution and therefore will need to be backdated all the time

This also means for accounting purposes this data becomes useless, as the sum of all general ledger accounts doesn't sum to 0 anymore.

Steps to reproduce

  • create a transaction in Mambu with an entry date (booking date) before the last run of the tap
  • re-run the tap: the entry will not be loaded.

How to fix:

As far as I can tell changing the replication key to creation_date will resolve it.

@tomekzbrozek
Copy link

@robindebois I just came across the same issue when working with Mambu GL Journal Entries endpoint and it seems to me that creation_date can't be used as a replication key because value of that replication key is eventually passed to API request parameters which are:

from | Start date - The first booking date you want to retrieve journal entries from. Required
-- | --
to | End date - The last booking date you want to retrieve journal entries from. Required

I'm not entirely sure, but I guess that booking date == entry_date which forces us to use this column as a replication key, otherwise we'd be comparing apples to oranges (i.e. passing creation_date to API request which is then translated into booking date by mambu API).

I came across this because I realized that every day, a couple of journal entries were not loaded into my database when using INCREMENTAL replication, exactly because the difference between their creation_date and entry_date is bigger than the width of my incremental replication strategy.

@tomekzbrozek
Copy link

@robindebois I can see that this tap uses neither creation_date or entry_date but booking_date: https://github.com/singer-io/tap-mambu/blob/master/tap_mambu/schema.py#L95-L99 which seems to be most relevant. Seems to me that the issue can be closed.

@tomekzbrozek
Copy link

tomekzbrozek commented Jan 13, 2021

I tried replicating gl_journal_entries data incrementally with the up-to-date mambu tap but it misses all journal entries which have booking_date more than 1 day earlier than creation_date. So the ledger is still incomplete.

The only solution I can think of for now is swapping replication method from INCREMENTAL to FULL.

@robindebois
Copy link
Author

robindebois commented Jan 13, 2021

Hi Tom,
The tap uses the v1 POST /gljournalentries:search endpoint (not the GET one, which you seem to suggest).. which in its body has the following filter criterium (see https://github.com/singer-io/tap-mambu/blob/master/tap_mambu/sync.py#L680):

{
                "filterConstraints": [
                    {
                        "filterSelection": "CREATION_DATE",
                        "filterElement": "BETWEEN",
                        "value": '{gl_journal_entries_from_dt_str}',
                        "secondValue": "{now_date_str}"
                    }
                ]
            }

As far as my ticket was concerned -- let's step over the intense confusion in which the Mambu documentation leaves me on value/booking and entry dates -- I think both booking_date and entry_date or value_date might not be the fields to replicate on incrementally.

Basically every search iteration, it should always get all the entries made between the most recently captured creation_date and now; and the reason is that all the other dates can be in the past, so occurring before the bookmark in the state file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants