New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
client/asset/{dcr,btc}: RPC wallets TX History #2693
Conversation
I was able to hit a panic with loadbots:
Also see in logs a lot of this:
and
|
I fixed the concurrency issues. The DCR wallet was loading all previous transactions from the genesis block, which included a bunch of ticket votes and receiving transactions. It's good that this happened because it uncovered some other problems. Now, the DCR wallet will not load old transactions, but once #2694 is merged and we are loading the transaction history, I think we should use a separate trading wallet for the loadbot that is not buying/voting on tickets. |
client/asset/btc/btc.go
Outdated
wg, err := btc.connect(ctx) | ||
// A separate baseCtx is used to ensure that all subprocesses are complete | ||
// before the node is shut down. | ||
baseCtx, killBase := context.WithCancel(context.Background()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pattern is a little scary. Lemme see if I follow the reasoning. Basically, we want any process that might write to the DB to complete before we Close
the DB. If we didn't do this, we might potentially see errors in the logs about trying to write to a closed db. Is that right? How likely are those errors? Do they matter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not just for the DB, but also for the client. There were errors in which the client was already shut down, but it was being called. I guess the errors may not matter, but imo it's more scary to be running code that calls things which are already shut down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The more I look at this the more I don't like it. If the context is canceled, we shouldn't be trying to use the wallet anyway. Are the various goroutines checking the context at the appropriate places? A stray context canceled error is ok, but if it happens often, we can probably resolve it without this background-derived context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I've removed the double wg from all wallets. There should only be some context cancelled/db already closed errors.
8f9e4ba
to
ba94e7b
Compare
client/asset/btc/txdb.go
Outdated
ctx, die := context.WithCancel(context.Background()) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we doing this instead of creating a connect function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a connect function.
client/asset/btc/txdb.go
Outdated
if err != badger.ErrConflict { | ||
return err | ||
} | ||
time.Sleep(10 * time.Millisecond) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're not using sleepTime
and time.Sleep
is not great. Can we pass in a context and use select
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what you mean by passing in a context. I only want it to sleep if there is a conflict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't use time.Sleep
unmonitored like this. If they cancel the app context, this function should also quit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quitting in the middle of this function is kind of like quitting in the middle of a database transaction. The only reason this is sleeping is because the transaction previously conflicted with another transaction. I think we should let it retry instead of just dropping the transaction due to a conflict. This is not really "unmonitored ". In the new Connect
function, when the context is cancelled, no new transaction will be allowed. Then it will wait for all transactions to complete before closing the database.
client/asset/btc/txdb.go
Outdated
} | ||
|
||
func (db *BadgerTxDB) findFreeBlockKey(txn *badger.Txn, blockNumber uint64) ([]byte, error) { | ||
func (db *BadgerTxDB) newBlockKey(txn *badger.Txn, blockNumber uint64) ([]byte, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
txn
is unused.
client/asset/btc/txdb.go
Outdated
getKey := func(i uint64) []byte { | ||
if blockNumber == 0 { | ||
return pendingKey(i) | ||
} | ||
return blockKey(blockNumber, i) | ||
} | ||
|
||
for i := uint64(0); ; i++ { | ||
key := getKey(i) | ||
_, err := txn.Get(key) | ||
if errors.Is(err, badger.ErrKeyNotFound) { | ||
return key, nil | ||
} | ||
if err != nil { | ||
return nil, err | ||
} | ||
seq, err := db.seq.Next() | ||
if err != nil { | ||
return nil, err | ||
} | ||
return getKey(seq), nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The helper function seems pointless.
func (db *BadgerTxDB) newBlockKey(txn *badger.Txn, blockNumber uint64) ([]byte, error) {
seq, err := db.seq.Next()
if err != nil {
return nil, err
}
if blockNumber == 0 {
return pendingKey(seq), nil
}
return blockKey(blockNumber, seq), nil
}
return false, errors.New("cannot reconfigure to different wallet while there are active trades") | ||
} | ||
return true, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we not calling newWallel.Disconnect()
on any of the error paths?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
|
||
if lastQuery != 0 && lastQuery < tip-blockQueryBuffer { | ||
blockToQuery = lastQuery - blockQueryBuffer | ||
} else if lastQuery >= tip-blockQueryBuffer { | ||
} else { | ||
blockToQuery = tip - blockQueryBuffer | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I connect the beta wallet, I don't see any transactions. I dug in a little bit, and I see that the first lastQuery
is 0
, which because of the lastQuery != 0
condition gives us only the tip - blockQueryBuffer
blocks to scan. Shouldn't it scan from 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is updated in #2748. This PR just deals with transactions that are made after the first time the wallet is connected to dexc
.
client/asset/btc/btc.go
Outdated
if rebroadcast && tx != nil { | ||
btc.subprocessWg.Add(1) | ||
go func() { | ||
defer btc.subprocessWg.Done() | ||
if hashSent, err := btc.node.sendRawTransaction(tx); err != nil { | ||
btc.log.Debugf("Rebroadcasting counterparty contract %v (THIS MAY BE NORMAL): %v", txHash, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This uses the subprocesWg
for synchronization, but the call to sendRawTransaction
ultimately uses the baseCtx
, not the passed ctx
. I don't see how that might cause a deadlock or anything, but it's fundamentally incorrect to do that with this 2-context pattern, and another reason that I think this pattern is probably not the right solution.
client/asset/btc/btc.go
Outdated
wg, err := btc.connect(ctx) | ||
// A separate baseCtx is used to ensure that all subprocesses are complete | ||
// before the node is shut down. | ||
baseCtx, killBase := context.WithCancel(context.Background()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The more I look at this the more I don't like it. If the context is canceled, we shouldn't be trying to use the wallet anyway. Are the various goroutines checking the context at the appropriate places? A stray context canceled error is ok, but if it happens often, we can probably resolve it without this background-derived context.
This implements the transaction history functionality for DCR and BTC RPC wallets. A separate DB is created for each wallet. The DBs are identified by one of the addresses that the wallet controls. On connect, the wallet will loop through each of the available DBs and see if the wallet controls the address used to identify DB.
This diff fixes multiple concurrency issues related to the transaction databases in DCR, BTC, and ETH. - The wallets are updated with a `subprocessWg` that tracks each of the goroutines started by the wallet. On shutdown, all goroutines must complete before the wallet clients and the transaction DBs are closed. - BaderDB returns an ErrConflict when an update transaction encounters a stale read. These are now handled by retrying with an exponential backoff. - The transaction databases now contain a WaitGroup that keeps track of the queries and garbage collection jobs. When `Close` is called, it waits for all jobs to finish before closing.
This implements the transaction history functionality for DCR and BTC RPC wallets. A separate DB is created for each wallet. The DBs are identified by one of the addresses that the wallet controls. On connect, the wallet will loop through each of the available DBs and see if the wallet controls the address used to identify DB.