fix(#2562): Prevent transaction deduplication for imported transactions #2618

ttlgeek · 2024-04-16T23:00:35Z

Hello,

After looking into the issue, it seems that the transaction deduplication is due to the two transactions having the same amount and account and the second transaction happened within 7 days of the first one.

After reading the import transaction code, the fuzzy matching matching uses the 7 days method to try to match similar transactions and merge them.

Proposed solution: prevent the fuzzy matching from matching imported transactions (where import_id is not null) to prevent this behavior.

@MatissJanis what do you think ?

…d transactions

netlify · 2024-04-16T23:00:53Z

✅ Deploy Preview for actualbudget ready!

Name	Link
🔨 Latest commit	`b54fab0`
🔍 Latest deploy log	https://app.netlify.com/sites/actualbudget/deploys/663b56fa5374fd000809d798
😎 Deploy Preview	https://deploy-preview-2618.demo.actualbudget.org
📱 Preview on mobile	Toggle QR Code... Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

github-actions · 2024-04-16T23:01:49Z

Bundle Stats — desktop-client

Hey there, this message comes from a GitHub action that helps you and reviewers to understand how these changes affect the size of this project's bundle.

As this PR is updated, I'll keep you updated on how the bundle size is impacted.

Total

Files count	Total bundle size	% Changed
9	4.71 MB	0%

Changeset

No files were changed

View detailed bundle breakdown

Added

No assets were added

Removed

No assets were removed

Bigger

No assets were bigger

Smaller

No assets were smaller

Unchanged

Asset	File Size	% Changed
static/js/indexeddb-main-thread-worker-e59fee74.js	13.5 kB	0%
static/js/resize-observer.js	18.37 kB	0%
static/js/usePreviewTransactions.js	790 B	0%
static/js/BackgroundImage.js	122.29 kB	0%
static/js/AppliedFilters.js	21.16 kB	0%
static/js/narrow.js	59.81 kB	0%
static/js/wide.js	262.38 kB	0%
static/js/ReportRouter.js	1.23 MB	0%
static/js/index.js	3 MB	0%

github-actions · 2024-04-16T23:01:50Z

Bundle Stats — loot-core

Hey there, this message comes from a GitHub action that helps you and reviewers to understand how these changes affect the size of this project's bundle.

As this PR is updated, I'll keep you updated on how the bundle size is impacted.

Total

Files count	Total bundle size	% Changed
1	1.2 MB → 1.2 MB (+24 B)	+0.00%

Changeset

File	Δ	Size
`packages/loot-core/src/server/accounts/sync.ts`	📈 +24 B (+0.11%)	22.13 kB → 22.16 kB

View detailed bundle breakdown

Added

No assets were added

Removed

No assets were removed

Bigger

Asset	File Size	% Changed
kcab.worker.js	1.2 MB → 1.2 MB (+24 B)	+0.00%

Smaller

No assets were smaller

Unchanged

No assets were unchanged

MatissJanis · 2024-04-17T07:04:45Z

packages/loot-core/src/server/accounts/sync.ts

@@ -492,7 +492,7 @@ export async function reconcileTransactions(
      // fields.
      fuzzyDataset = await db.all(
        `SELECT id, is_parent, date, imported_id, payee, category, notes, reconciled FROM v_transactions
-           WHERE date >= ? AND date <= ? AND amount = ? AND account = ?`,
+           WHERE date >= ? AND date <= ? AND amount = ? AND account = ? AND imported_id IS NULL`,


💭 thought: ‏this is an interesting proposal, but I think it would break an existing use-case.

There are folks that do both bank-sync + manual imports (don't ask me why.. I couldn't explain that. It's just my observation from the questions on discord). So for these folks the bank-sync imports would work fine with your patch. However, then the subsequent manual csv imports would break as they would create duplicates.

(also semi-related note - lately I've been considering enabling the "import" button even for accounts that have bank-sync enabled)

There is a new addition to the import modal to skip the reconciliation logic entirely. It's a checkbox you can turn on. Would using that solve your problem?

I might be able to shed some light on the reasons:

not everyone will be interacting with actual every single month & several banks heavily restrict how far syncs go back, so there might be gaps with just sync (me, sometimes :) )

some people might not trust the syncing (from either Actual or their bank's side) enough and want to manually import their year overview (f.e.)

(the checkbox would work perfect for my use-case, as I rely on it to avoid duplicates)

There is a new addition to the import modal to skip the reconciliation logic entirely. It's a checkbox you can turn on.

Disabling all reconciliation / dedup is too aggressive of a fix for #2562 .

In cases where people trust the data they're importing, and they trust that the imported IDs will correctly match, they just want to avoid magic heuristics from silently deleting their transactions.

jacopo-j · 2024-04-20T13:46:29Z

Hi, I took a look at this PR (context: I'm the author of #2562). I understand MatissJanis' point, and I believe the issue could be approached in a different way to avoid breaking existing use cases.

Do not completely exclude transactions that have a imported_id != NULL from the deduplication logic. Instead, skip the deduplication of two transactions only if both transactions have an imported_id set and the IDs are different.

(On a side note, I believe reconciled transactions should be excluded from deduplication altogether, instead.)

I think this would both solve the issue I reported and avoid breaking the other use case. Hopefully it makes sense.

MatissJanis · 2024-04-20T17:54:29Z

Do not completely exclude transactions that have a imported_id != NULL from the deduplication logic. Instead, skip the deduplication of two transactions only if both transactions have an imported_id set and the IDs are different.

I think this would work. Can't come up with any counter arguments for this at the moment.

strazto · 2024-05-07T07:16:05Z

Instead, skip the deduplication of two transactions only if both transactions have an imported_id set and the IDs are different.

This sounds good to me.
When I import from an OFX file, what I'm expecting is an option to disable deduplication heuristics, not to disable deduplication entirely.

pmoon00 · 2024-05-10T04:48:48Z

@ttlgeek Just wondering if you were going to continue this work as suggested by the comments above. If not I can give it a go. It would be my first contribution so would probably take longer, but got to start somewhere. Let me know what your plans are.

strazto · 2024-05-16T03:34:16Z

@pmoon00

@ttlgeek Just wondering if you were going to continue this work as suggested by the comments above. If not I can give it a go.

Looks like for whatever reason he's not coming back to it anytime soon. Possibly he's deployed his forked branch to his own instance and is happy with its' behaviour.

It would be my first contribution so would probably take longer, but got to start somewhere.

This PR / branch is probably still good basis for your work btw - Take the work here and then apply @MatissJanis + @jacopo-j 's comments (Should mainly be a change to the SQL query that Matiss commented on) and you shouldn't have to do too much to get approved.

strazto · 2024-05-16T03:34:30Z

In fact, I may also have a go at this over my lunch break now

strazto · 2024-05-16T04:05:30Z

@jacopo-j re-reading your comment

Do not completely exclude transactions that have a imported_id != NULL from the deduplication logic.

Instead, skip the deduplication of two transactions only if both transactions have an imported_id set and the IDs are different.

I think @ttlgeek 's PR already does what you're suggesting, it's just not obvious from the default diff.
Expanding the context of the diff yields the following:

actual/packages/loot-core/src/server/accounts/sync.ts

Lines 380 to 402 in e5933ad

    
               // First, match with an existing transaction's imported_id. This 
        
               // is the highest fidelity match and should always be attempted 
        
               // first. 
        
               if (trans.imported_id) { 
        
                 match = await db.first( 
        
                   'SELECT * FROM v_transactions WHERE imported_id = ? AND account = ?', 
        
                   [trans.imported_id, acctId], 
        
                 ); 
        
                 if (match) { 
        
                   hasMatched.add(match.id); 
        
                 } 
        
               } 
        
               // If it didn't match, query data needed for fuzzy matching 
        
               if (!match) { 
        
                 // Look 7 days ahead and 7 days back when fuzzy matching. This 
        
                 // needs to select all fields that need to be read from the 
        
                 // matched transaction. See the final pass below for the needed 
        
                 // fields. 
        
                 fuzzyDataset = await db.all( 
        
                   `SELECT id, is_parent, date, imported_id, payee, category, notes, reconciled FROM v_transactions 
        
                      WHERE date >= ? AND date <= ? AND amount = ? AND account = ?`,

To rephrase your specification:

Do not completely exclude transactions that have a imported_id != NULL from the deduplication logic.
Instead, skip the deduplication of two transactions only if both transactions have an imported_id set and the IDs are different.

if tx_in has imported_id:
  if tx_in.imported_id corresponds to existing_tx:
    dedup using IDs
    exit
    
# In the above, both transactions have imported ID, but the IDs match, so no skip

# The following logic must hold for the the rest of this flow:
# tx_in has no imported_id
# OR
# tx_in's imported_id does not match (ids are different)

if an existing_tx has no imported_id:
  dedup using heuristics
  exit

# (
# tx_in has no imported_id
# OR
# tx_in.imported_id does not match (ids are different)
#  )
# AND existing_tx has no imported_id

This behaviour is consistent with @ttlgeek 's PR.

MatissJanis

I was trying to poke holes in the logic, but couldn't find any issues. Thanks for correcting me!

I'm happy to merge this, but I'd like another maintainer to review it since it's a risky change to the reconciliation logic.

strazto · 2024-05-17T07:55:46Z

I was trying to poke holes in the logic, but couldn't find any issues. Thanks for correcting me!

This morning I thought about it and had a feeling I may be overlooking something.
It's definitely tricky inverting the logic between what the SQL statements are doing and the idea of "when to skip".

I think possibly there's an issue with this being the skip case:

# (
# tx_in has no imported_id
# OR
# tx_in.imported_id does not match (ids are different)
#  )
# AND existing_tx has no imported_id

Because that includes tx_in.imported_id == NULL && existing_tx.imported_id == NULL which is NOT a skip case.
Your instinct to double check is valid.

I think the change is small, however, we can just add another condition to ttlgeek's original SQL modification.

In terms of style, I think we should have linebreaks that delineate the logic more clearly in the SQL statements, and comments,

`SELECT id, is_parent, date, imported_id, payee, category, notes, reconciled FROM v_transactions
   WHERE date >= ? AND date <= ? AND amount = ? AND account = ? 
   -- Only perform it if either ID is null
   AND ( imported_id IS NULL OR ? IS NULL)`, //  tx_in.imported_id should be last parameter

pmoon00 · 2024-05-17T08:26:30Z

I was trying to poke holes in the logic, but couldn't find any issues. Thanks for correcting me!

This morning I thought about it and had a feeling I may be overlooking something.

It's definitely tricky inverting the logic between what the SQL statements are doing and the idea of "when to skip".

I think possibly there's an issue with this being the skip case:
# (

# tx_in has no imported_id

# OR

# tx_in.imported_id does not match (ids are different)

#  )

# AND existing_tx has no imported_id 
Because that includes tx_in.imported_id == NULL && existing_tx.imported_id == NULL which is NOT a skip case.

Your instinct to double check is valid.

I think the change is small, however, we can just add another condition to ttlgeek's original SQL modification.

In terms of style, I think we should have linebreaks that delineate the logic more clearly in the SQL statements, and comments,
`SELECT id, is_parent, date, imported_id, payee, category, notes, reconciled FROM v_transactions

   WHERE date >= ? AND date <= ? AND amount = ? AND account = ? 

   -- Only perform it if either ID is null

   AND ( imported_id IS NULL OR ? IS NULL)`, //  tx_in.imported_id should be last parameter

Would it be clearer if the AND imported_id IS NULL predicate is only included if the tx_in.imported_id is set/exists?

I'm not sure what the conventions around dynamic SQL predicates are in this project though.

strazto · 2024-05-17T08:33:08Z

Would it be clearer if the AND imported_id IS NULL predicate is only included if the tx_in.imported_id is set/exists?

I'm not sure what the conventions around dynamic SQL predicates are in this project though.

Different branching logic overall might be preferable and more readable, however my assumption is that dynamic SQL should be avoided.
It's more likely to lead to bugs and could potentially become a vector for injection vulns (not very likely in this case).

It's true that this statement slightly violates DRY with the code branch that checks that tx_in.imported_id is null, but I don't think it's a big problem.

AND ( imported_id IS NULL OR ? IS NULL)

strazto · 2024-05-17T10:20:38Z

Hi @MatissJanis @pmoon00 I've taken a stab at this in my own PR #2770 which bases off of @ttlgeek 's branch (though without all the master -> feature merge commits because I think those are ugly).

The only difference is the logic of the SQL statement, which now allows the dedup to occur if an existing transaction OR the input transaction are null.
There's also a formatting difference in the SQL statement (I've re-ordered it and added linebreaks + comments to logically separate the fuzzy logic from the null conditioning )

fix(actualbudget#2562): Prevent transaction deduplication for importe…

a5ebcc6

…d transactions

trafico-bot bot added the 🔍 Ready for Review Pull Request is not reviewed yet label Apr 16, 2024

github-actions bot changed the title ~~fix(#2562): Prevent transaction deduplication for imported transactions~~ [WIP] fix(#2562): Prevent transaction deduplication for imported transactions Apr 16, 2024

trafico-bot bot added 🚧 WIP Still work-in-progress, please don't review and don't merge and removed 🔍 Ready for Review Pull Request is not reviewed yet labels Apr 16, 2024

ttlgeek added 2 commits April 17, 2024 00:09

chore(): eslint fixes

17159e8

chore(): Add release note file

9905c47

MatissJanis reviewed Apr 17, 2024

View reviewed changes

ttlgeek mentioned this pull request Apr 17, 2024

[Bug]: Transaction deduplication happens even if transactions have different imported_ids #2562

Open

2 tasks

ttlgeek added 3 commits April 17, 2024 23:53

Merge branch 'master' into fix-2526

3fd1321

Merge branch 'master' into fix-2526

372d38c

Merge branch 'master' into fix-2526

351bccf

ttlgeek added 5 commits April 21, 2024 21:18

Merge branch 'master' into fix-2526

a4f1203

Merge branch 'master' into fix-2526

5449df3

Merge branch 'master' into fix-2526

fe33ef0

Merge branch 'master' into fix-2526

3139f79

Merge branch 'master' into fix-2526

54c94f8

strazto mentioned this pull request May 7, 2024

Add option to disable reconciliation when importing OFX files. #2564

Merged

Merge branch 'master' into fix-2526

b54fab0

MatissJanis approved these changes May 16, 2024

View reviewed changes

MatissJanis changed the title ~~[WIP] fix(#2562): Prevent transaction deduplication for imported transactions~~ fix(#2562): Prevent transaction deduplication for imported transactions May 16, 2024

trafico-bot bot added ✅ Approved Pull Request has been approved and can be merged and removed 🚧 WIP Still work-in-progress, please don't review and don't merge labels May 16, 2024

strazto mentioned this pull request May 17, 2024

fix(#2562): Prevent transaction deduplication for imported transactions #2770

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(#2562): Prevent transaction deduplication for imported transactions #2618

fix(#2562): Prevent transaction deduplication for imported transactions #2618

ttlgeek commented Apr 16, 2024 •

edited

netlify bot commented Apr 16, 2024 •

edited

github-actions bot commented Apr 16, 2024 •

edited

github-actions bot commented Apr 16, 2024 •

edited

MatissJanis Apr 17, 2024

jlsjonas Apr 29, 2024 •

edited

strazto May 7, 2024

jacopo-j commented Apr 20, 2024 •

edited

MatissJanis commented Apr 20, 2024

strazto commented May 7, 2024

pmoon00 commented May 10, 2024

strazto commented May 16, 2024

strazto commented May 16, 2024

strazto commented May 16, 2024 •

edited

MatissJanis left a comment

strazto commented May 17, 2024

pmoon00 commented May 17, 2024

strazto commented May 17, 2024

strazto commented May 17, 2024

fix(#2562): Prevent transaction deduplication for imported transactions #2618

Are you sure you want to change the base?

fix(#2562): Prevent transaction deduplication for imported transactions #2618

Conversation

ttlgeek commented Apr 16, 2024 • edited

netlify bot commented Apr 16, 2024 • edited

✅ Deploy Preview for actualbudget ready!

github-actions bot commented Apr 16, 2024 • edited

Bundle Stats — desktop-client

github-actions bot commented Apr 16, 2024 • edited

Bundle Stats — loot-core

MatissJanis Apr 17, 2024

Choose a reason for hiding this comment

jlsjonas Apr 29, 2024 • edited

Choose a reason for hiding this comment

strazto May 7, 2024

Choose a reason for hiding this comment

jacopo-j commented Apr 20, 2024 • edited

MatissJanis commented Apr 20, 2024

strazto commented May 7, 2024

pmoon00 commented May 10, 2024

strazto commented May 16, 2024

strazto commented May 16, 2024

strazto commented May 16, 2024 • edited

MatissJanis left a comment

Choose a reason for hiding this comment

strazto commented May 17, 2024

pmoon00 commented May 17, 2024

strazto commented May 17, 2024

strazto commented May 17, 2024

ttlgeek commented Apr 16, 2024 •

edited

netlify bot commented Apr 16, 2024 •

edited

github-actions bot commented Apr 16, 2024 •

edited

github-actions bot commented Apr 16, 2024 •

edited

jlsjonas Apr 29, 2024 •

edited

jacopo-j commented Apr 20, 2024 •

edited

strazto commented May 16, 2024 •

edited