Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

client/asset/{dcr,btc}: RPC wallets TX History #2693

Merged
merged 6 commits into from May 3, 2024

Conversation

martonp
Copy link
Contributor

@martonp martonp commented Feb 5, 2024

This implements the transaction history functionality for DCR and BTC RPC wallets. A separate DB is created for each wallet. The DBs are identified by one of the addresses that the wallet controls. On connect, the wallet will loop through each of the available DBs and see if the wallet controls the address used to identify DB.

client/asset/btc/btc.go Outdated Show resolved Hide resolved
@JoeGruffins
Copy link
Member

JoeGruffins commented Feb 7, 2024

I was able to hit a panic with loadbots:

./loadbot -p sidestacker -mkt dcr_btc -trace -randomosc > ~/ocean/loadbot1.txt 
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0xa689dd]

goroutine 16402 [running]:
github.com/dgraph-io/badger/skl.(*Skiplist).IncrRef(...)
        /home/joe/go/pkg/mod/github.com/dgraph-io/badger@v1.6.2/skl/skl.go:86
github.com/dgraph-io/badger.(*DB).getMemTables(0xc000a14000)
        /home/joe/go/pkg/mod/github.com/dgraph-io/badger@v1.6.2/db.go:493 +0xdd
github.com/dgraph-io/badger.(*DB).get(0xc000a14000, {0xc003071cc0, 0x19, 0x19})
        /home/joe/go/pkg/mod/github.com/dgraph-io/badger@v1.6.2/db.go:525 +0x6c
github.com/dgraph-io/badger.(*Txn).Get(0xc003472480, {0xc003486198, 0x11, 0x11})
        /home/joe/go/pkg/mod/github.com/dgraph-io/badger@v1.6.2/txn.go:407 +0x27e
decred.org/dcrdex/client/asset/btc.(*BadgerTxDB).findFreeBlockKey(0x1f29240?, 0x2aa67e0?, 0x506)
        /home/joe/git/dcrdex/client/asset/btc/txdb.go:123 +0x196
decred.org/dcrdex/client/asset/dcr.(*ExchangeWallet).checkPendingTxs.func2.(*BadgerTxDB).StoreTx.func2(0xc000a14000?)
        /home/joe/git/dcrdex/client/asset/btc/txdb.go:149 +0xd7
github.com/dgraph-io/badger.(*DB).Update(0xc000b2b480?, 0xc00253daa8)
        /home/joe/go/pkg/mod/github.com/dgraph-io/badger@v1.6.2/txn.go:696 +0x7a
decred.org/dcrdex/client/asset/btc.(*BadgerTxDB).StoreTx(...)
        /home/joe/git/dcrdex/client/asset/btc/txdb.go:142
decred.org/dcrdex/client/asset/dcr.(*ExchangeWallet).checkPendingTxs.func2({0x73, 0x3, 0x65, 0xa8, 0xd3, 0x12, 0xbe, 0xce, 0xbf, 0x1c, ...}, ...)
        /home/joe/git/dcrdex/client/asset/dcr/dcr.go:5791 +0x412
decred.org/dcrdex/client/asset/dcr.(*ExchangeWallet).checkPendingTxs(0xc00050b0e0, {0x1f32aa8?, 0xc000592730}, 0x51b)
        /home/joe/git/dcrdex/client/asset/dcr/dcr.go:5810 +0x97c
created by decred.org/dcrdex/client/asset/dcr.(*ExchangeWallet).handleTipChange in goroutine 697
        /home/joe/git/dcrdex/client/asset/dcr/dcr.go:6071 +0x331

Also see in logs a lot of this:

2024-02-07 15:45:28.593 [ERR] CORE:STACKER:1[dcr]: failed to store tx in tx history db: Transaction Conflict. Please retry
2024-02-07 15:45:28.593 [ERR] CORE:STACKER:1[dcr]: failed to store tx in tx history db: Transaction Conflict. Please retry
2024-02-07 15:45:28.782 [ERR] CORE:STACKER:1[dcr]: failed to store tx in tx history db: Transaction Conflict. Please retry
2024-02-07 15:45:29.074 [ERR] CORE:STACKER:1[dcr]: failed to store tx in tx history db: Transaction Conflict. Please retry
2024-02-07 15:45:29.074 [ERR] CORE:STACKER:1[dcr]: failed to store tx in tx history db: Transaction Conflict. Please retry
2024-02-07 15:45:29.341 [ERR] CORE:STACKER:1[dcr]: failed to store tx in tx history db: Transaction Conflict. Please retry
2024-02-07 15:45:29.590 [ERR] CORE:STACKER:1[dcr]: failed to store tx in tx history db: Transaction Conflict. Please retry
2024-02-07 15:45:29.590 [ERR] CORE:STACKER:1[dcr]: failed to store tx in tx history db: Transaction Conflict. Please retry
2024-02-07 15:45:30.125 [ERR] CORE:STACKER:1[dcr]: failed to store tx in tx history db: Transaction Conflict. Please retry

and

2024-02-07 16:00:41.903 [ERR] CORE:STACKER:0: blocking notification channel
2024-02-07 16:00:41.903 [ERR] CORE:STACKER:0: blocking notification channel
2024-02-07 16:00:41.904 [ERR] CORE:STACKER:0: blocking notification channel
2024-02-07 16:00:41.905 [ERR] CORE:STACKER:0: blocking notification channel
2024-02-07 16:00:41.906 [ERR] CORE:STACKER:0: blocking notification channel
2024-02-07 16:00:41.907 [ERR] CORE:STACKER:0: blocking notification channel
2024-02-07 16:00:41.909 [ERR] CORE:STACKER:0: blocking notification channel
2024-02-07 16:00:41.912 [ERR] CORE:STACKER:0: blocking notification channel
2024-02-07 16:00:41.915 [ERR] CORE:STACKER:0: blocking notification channel

@martonp
Copy link
Contributor Author

martonp commented Feb 10, 2024

I fixed the concurrency issues. The DCR wallet was loading all previous transactions from the genesis block, which included a bunch of ticket votes and receiving transactions. It's good that this happened because it uncovered some other problems. Now, the DCR wallet will not load old transactions, but once #2694 is merged and we are loading the transaction history, I think we should use a separate trading wallet for the loadbot that is not buying/voting on tickets.

client/asset/btc/btc.go Outdated Show resolved Hide resolved
client/asset/btc/btc.go Show resolved Hide resolved
client/asset/btc/btc.go Outdated Show resolved Hide resolved
client/asset/btc/btc.go Outdated Show resolved Hide resolved
wg, err := btc.connect(ctx)
// A separate baseCtx is used to ensure that all subprocesses are complete
// before the node is shut down.
baseCtx, killBase := context.WithCancel(context.Background())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern is a little scary. Lemme see if I follow the reasoning. Basically, we want any process that might write to the DB to complete before we Close the DB. If we didn't do this, we might potentially see errors in the logs about trying to write to a closed db. Is that right? How likely are those errors? Do they matter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not just for the DB, but also for the client. There were errors in which the client was already shut down, but it was being called. I guess the errors may not matter, but imo it's more scary to be running code that calls things which are already shut down.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The more I look at this the more I don't like it. If the context is canceled, we shouldn't be trying to use the wallet anyway. Are the various goroutines checking the context at the appropriate places? A stray context canceled error is ok, but if it happens often, we can probably resolve it without this background-derived context.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I've removed the double wg from all wallets. There should only be some context cancelled/db already closed errors.

client/asset/btc/txdb.go Outdated Show resolved Hide resolved
Comment on lines 121 to 141
ctx, die := context.WithCancel(context.Background())

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we doing this instead of creating a connect function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a connect function.

if err != badger.ErrConflict {
return err
}
time.Sleep(10 * time.Millisecond)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're not using sleepTime and time.Sleep is not great. Can we pass in a context and use select?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean by passing in a context. I only want it to sleep if there is a conflict.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't use time.Sleep unmonitored like this. If they cancel the app context, this function should also quit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quitting in the middle of this function is kind of like quitting in the middle of a database transaction. The only reason this is sleeping is because the transaction previously conflicted with another transaction. I think we should let it retry instead of just dropping the transaction due to a conflict. This is not really "unmonitored ". In the new Connect function, when the context is cancelled, no new transaction will be allowed. Then it will wait for all transactions to complete before closing the database.

}

func (db *BadgerTxDB) findFreeBlockKey(txn *badger.Txn, blockNumber uint64) ([]byte, error) {
func (db *BadgerTxDB) newBlockKey(txn *badger.Txn, blockNumber uint64) ([]byte, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

txn is unused.

Comment on lines 174 to 184
getKey := func(i uint64) []byte {
if blockNumber == 0 {
return pendingKey(i)
}
return blockKey(blockNumber, i)
}

for i := uint64(0); ; i++ {
key := getKey(i)
_, err := txn.Get(key)
if errors.Is(err, badger.ErrKeyNotFound) {
return key, nil
}
if err != nil {
return nil, err
}
seq, err := db.seq.Next()
if err != nil {
return nil, err
}
return getKey(seq), nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The helper function seems pointless.

func (db *BadgerTxDB) newBlockKey(txn *badger.Txn, blockNumber uint64) ([]byte, error) {
	seq, err := db.seq.Next()
	if err != nil {
		return nil, err
	}
	if blockNumber == 0 {
		return pendingKey(seq), nil
	}
	return blockKey(blockNumber, seq), nil
}

return false, errors.New("cannot reconfigure to different wallet while there are active trades")
}
return true, nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we not calling newWallel.Disconnect() on any of the error paths?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Comment on lines 5814 to 5794

if lastQuery != 0 && lastQuery < tip-blockQueryBuffer {
blockToQuery = lastQuery - blockQueryBuffer
} else if lastQuery >= tip-blockQueryBuffer {
} else {
blockToQuery = tip - blockQueryBuffer
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I connect the beta wallet, I don't see any transactions. I dug in a little bit, and I see that the first lastQuery is 0, which because of the lastQuery != 0 condition gives us only the tip - blockQueryBuffer blocks to scan. Shouldn't it scan from 0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is updated in #2748. This PR just deals with transactions that are made after the first time the wallet is connected to dexc.

Comment on lines 4102 to 4075
if rebroadcast && tx != nil {
btc.subprocessWg.Add(1)
go func() {
defer btc.subprocessWg.Done()
if hashSent, err := btc.node.sendRawTransaction(tx); err != nil {
btc.log.Debugf("Rebroadcasting counterparty contract %v (THIS MAY BE NORMAL): %v", txHash, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This uses the subprocesWg for synchronization, but the call to sendRawTransaction ultimately uses the baseCtx, not the passed ctx. I don't see how that might cause a deadlock or anything, but it's fundamentally incorrect to do that with this 2-context pattern, and another reason that I think this pattern is probably not the right solution.

wg, err := btc.connect(ctx)
// A separate baseCtx is used to ensure that all subprocesses are complete
// before the node is shut down.
baseCtx, killBase := context.WithCancel(context.Background())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The more I look at this the more I don't like it. If the context is canceled, we shouldn't be trying to use the wallet anyway. Are the various goroutines checking the context at the appropriate places? A stray context canceled error is ok, but if it happens often, we can probably resolve it without this background-derived context.

This implements the transaction history functionality for DCR and BTC RPC
wallets. A separate DB is created for each wallet. The DBs are identified
by one of the addresses that the wallet controls. On connect, the wallet
will loop through each of the available DBs and see if the wallet controls
the address used to identify DB.
This diff fixes multiple concurrency issues related to the transaction
databases in DCR, BTC, and ETH.

- The wallets are updated with a `subprocessWg` that tracks each of the
  goroutines started by the wallet. On shutdown, all goroutines must
  complete before the wallet clients and the transaction DBs are closed.
- BaderDB returns an ErrConflict when an update transaction encounters a
  stale read. These are now handled by retrying with an exponential backoff.
- The transaction databases now contain a WaitGroup that keeps track of
  the queries and garbage collection jobs. When `Close` is called, it
  waits for all jobs to finish before closing.
@buck54321 buck54321 merged commit 2a419f3 into decred:master May 3, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants