Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

canonicalizing txn pointer in NetworkOPsImp::processTransaction leads to race condition/incorrect txn result (Version: 2.1.0) #4957

Open
ubuddemeier opened this issue Mar 19, 2024 · 3 comments

Comments

@ubuddemeier
Copy link

Issue Description

The call to canonicalize here: https://github.com/XRPLF/rippled/blob/master/src/ripple/app/misc/NetworkOPs.cpp#L1232
can lead to a race condition, if a transaction is processed synchronously (i.e. "local"), while the same transaction is processed again asynchronously (e.g. received from a peer). See the following log excerpt pertaining to a particular transaction. I added additional logging not present in repo, ID1 stands for the txn ID. Note how the transaction result changes while doSubmit is still busy converting the results. This ultimately leads to an incorrect result returned by doSubmit.

2024-Mar-19 15:28:02.615291761 UTC RPCHandler:NFO doSubmit calling_processTransaction TX ID1 ptr=0x7f1bc017a320
2024-Mar-19 15:28:02.615395205 UTC NetworkOPs:NFO NetworkOPsImp::processTransaction before_canonicalize TX ID1 ptr=0x7f1bc017a320
2024-Mar-19 15:28:02.616145354 UTC NetworkOPs:NFO NetworkOPsImp::processTransaction after_canonicalize TX ID1 ptr=0x7f1bc017a320
2024-Mar-19 15:28:02.616840113 UTC NetworkOPs:NFO NetworkOPsImp::doTransactionSync setApplying TX ID1 ptr=0x7f1bc017a320
2024-Mar-19 15:28:02.617390617 UTC NetworkOPs:NFO NetworkOPsImp::apply TX ID1 result=tesSUCCESS ptr=0x7f1bc017a320 local=true
2024-Mar-19 15:28:02.617481900 UTC NetworkOPs:NFO NetworkOPsImp::apply clearApplying TX ID1 ptr=0x7f1bc017a320 local=true
2024-Mar-19 15:28:02.638734806 UTC Protocol:NFO [003] PeerImp::handleTransaction TX ID1
2024-Mar-19 15:28:02.641427701 UTC NetworkOPs:NFO NetworkOPsImp::processTransaction before_canonicalize TX ID1 ptr=0x7f1bc80c06d0
2024-Mar-19 15:28:02.641445326 UTC NetworkOPs:NFO NetworkOPsImp::processTransaction after_canonicalize TX ID1 ptr=0x7f1bc017a320
2024-Mar-19 15:28:02.641599727 UTC NetworkOPs:NFO NetworkOPsImp::doTransactionAsync setApplying TX ID1 ptr=0x7f1bc017a320
2024-Mar-19 15:28:02.641772745 UTC NetworkOPs:NFO NetworkOPsImp::apply TX ID1 result=tefPAST_SEQ ptr=0x7f1bc017a320 local=false
2024-Mar-19 15:28:02.642194166 UTC NetworkOPs:NFO NetworkOPsImp::apply clearApplying TX ID1 ptr=0x7f1bc017a320 local=false
2024-Mar-19 15:28:02.638380244 UTC RPCHandler:NFO doSubmit returned_from_processTransaction TX ID1 res=tesSUCCESS ptr=0x7f1bc017a320
2024-Mar-19 15:28:02.643990745 UTC RPCHandler:NFO doSubmit done_with TX ID1 res=tefPAST_SEQ ptr=0x7f1bc017a320

Steps to Reproduce

Python script to generate requests:

import hashlib, time, random, asyncio
from xrpl.wallet import Wallet
from xrpl.models.transactions import Payment
from xrpl.models.response import ResponseStatus
from xrpl.transaction import sign
from xrpl.asyncio.transaction import submit
from xrpl.asyncio.ledger import get_latest_validated_ledger_sequence
from xrpl.asyncio.account import get_next_valid_seq_number, get_balance
from xrpl.asyncio.clients import AsyncWebsocketClient
from xrpl.core.addresscodec import encode_seed
from xrpl import CryptoAlgorithm
from xrpl.asyncio.clients.exceptions import XRPLRequestFailureException

Note: edit xrpl/asyncio/clients/websocket_base.py as follows:

self._websocket = await websocket_client.connect(self.url, extra_headers={"X-User:" : "anon"})

url = "ws://IP_ADDRESS_GOES_HERE:60516"
fee = 10
nWallets = 100
nTasks = 0
targetThroughput = 0
runTime = 60
fundAmount = 1000000000

rnd = random.Random(1)

def seed_from_pass(passphrase):
return encode_seed(hashlib.sha512(passphrase).digest()[:16], CryptoAlgorithm.SECP256K1)

def wallet_from_seed(passphrase):
return Wallet.from_seed(seed_from_pass(passphrase), algorithm=CryptoAlgorithm.SECP256K1)

nTxns = 0
tStart = 0
async def generate_load(client2, wallets):
global nTxns, rnd
nWallets = len(wallets)
if nWallets == 0:
return
async with AsyncWebsocketClient(url) as client:
while (t := time.perf_counter()) < tStart + runTime:
nTxns += 1
# rate control
if targetThroughput > 0:
tTarget = tStart + nTxns / targetThroughput
if tTarget > tStart + runTime:
nTxns -= 1
break
if (t < tTarget):
await asyncio.sleep(tTarget - t)
# generate the txn
wRecv = rnd.choice(wallets)
iSend = rnd.randrange(nWallets)
iOffset = 0
while True:
wSend = wallets[(iSend + iOffset) % nWallets]
if not wSend.busy and wSend.address != wRecv.address:
break
iOffset += 1
if iOffset == nWallets:
iOffset = 0
await asyncio.sleep(0) # all wallets are busy, need to wait for one to become available
tx = sign(Payment(account=wSend.address, amount="1", destination=wRecv.address, sequence=wSend.seq, fee=str(fee)), wSend)
# dispatch the txn
wSend.busy = True
response = await submit(tx, client)
failed = True
if response.status == ResponseStatus.SUCCESS:
if response.result['engine_result'] == 'tesSUCCESS':
wSend.seq += 1
failed = False
else:
if 'account_sequence_next' in response.result:
wSend.seq = int(response.result['account_sequence_next'])
else:
wSend.seq = await get_next_valid_seq_number(wSend.address, client)
wSend.busy = False
if failed:
print(f"wallet:{wSend.id} FAILED TX {tx.get_hash()} {response}")

async def main():

make some wallets

print(f"Generating wallets...")
master = wallet_from_seed(b"masterpassphrase")
wallets = [wallet_from_seed(b"test-%d" % i) for i in range(nWallets)]
for i in range(len(wallets)):
wallets[i].id = i
wallets[i].busy = False

async with AsyncWebsocketClient(url) as client:
if fundAmount > 0:
# fund wallets
print(f"Funding wallets...")
masterSeq = await get_next_valid_seq_number(master.address, client)
for w in wallets:
try:
bal = await get_balance(w.address, client)
except XRPLRequestFailureException:
bal = 0
if (bal < fundAmount):
tx = Payment(account=master.address, amount=str(fundAmount - bal), destination=w.address, sequence=masterSeq, fee=str(fee))
masterSeq += 1
txSigned = sign(tx, master)
elif (bal > fundAmount):
seq = await get_next_valid_seq_number(w.address, client)
tx = Payment(account=w.address, amount=str(bal - fundAmount), destination=master.address, sequence=seq, fee=str(fee))
txSigned = sign(tx, w)
else:
continue
response = await submit(txSigned, client)
if response.status != ResponseStatus.SUCCESS or response.result['engine_result'] != 'tesSUCCESS':
print(f"txn failed: {response.result}")
exit(1)

print(f"Fetching sequence numbers...")
for w in wallets:
  w.seq = await get_next_valid_seq_number(w.address, client)

print(f"Running load test...")    
global nTasks
if nTasks == 0 or nTasks > nWallets:
  nTasks = nWallets
tasks = [generate_load(client, wallets) for _ in range(nTasks)]
global tStart
tStart = time.perf_counter() 
await asyncio.gather(*tasks)

print(f"Result: {nTxns} txns {nTxns / runTime} txns/s")

if name == "main":
asyncio.run(main())

Expected Result

All transactions succeed.

Actual Result

Some transactions return tefPAST_SEQ although the txns have never been seen before, e.g.: (ID1 same as above)

wallet:63 FAILED TX ID1 Response(status=<ResponseStatus.SUCCESS: 'success'>, result={'accepted': False, 'account_sequence_available': 10, 'account_sequence_next': 10, 'applied': False, 'broadcast': False, 'engine_result': 'tefPAST_SEQ', 'engine_result_code': -190, 'engine_result_message': 'This sequence number has already passed.', 'kept': False, 'open_ledger_cost': '10', 'queued': False, 'tx_blob': '1200002200000000240000000961400000000000000168400000000000000A732102D6A4BE0770363D5815C5AE550C11A530E9DCEC2AAE62DFCF800583A47C44B6CD7446304402200D467F5443159BE926B3DF24E509A7F3C710BB21AB41498B16F1E36BB8CBB187022064049CCBA5B6C8ABC3057B7749733459ECF0B27C1D50099ED43EF6D9A7B0F5248114BDEDD0D17B55022C7F4F6D94752D889C7CB7929E83141AF85539F475E61DDC7BE0D454C67187FDC42A3B', 'tx_json': {'Account': 'rJKEbyiTLskpBMjHmVGqWCQ8628HK2XUQ3', 'Amount': '1', 'Destination': 'rsTchTjz2PtXyTMvHgz8Gvw65ojuUkmMvB', 'Fee': '10', 'Flags': 0, 'Sequence': 9, 'SigningPubKey': '02D6A4BE0770363D5815C5AE550C11A530E9DCEC2AAE62DFCF800583A47C44B6CD', 'TransactionType': 'Payment', 'TxnSignature': '304402200D467F5443159BE926B3DF24E509A7F3C710BB21AB41498B16F1E36BB8CBB187022064049CCBA5B6C8ABC3057B7749733459ECF0B27C1D50099ED43EF6D9A7B0F524', 'hash': 'ID1'}, 'validated_ledger_index': 13}, id='submit_267', type=<ResponseType.RESPONSE: 'response'>)

Environment

Linux ip-xx-xx-xx-xx 6.1.66-93.164.amzn2023.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Jan 2 23:50:53 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

rippled version 2.1.0, built from branch 'master'

Supporting Files

This is on a testnet with 3 nodes. Contents of rippled.cfg:

[server]
port_rpc_admin_local
port_peer
port_https_rpc_public
port_http_rpc_public
port_wss_public
port_ws_public

[port_rpc_admin_local]
port = 5005
ip = 127.0.0.1
admin = 127.0.0.1
protocol = https

[port_peer]
port = PEER_PORT
ip = 0.0.0.0
protocol = peer

[port_https_rpc_public]
port = 60505
ip = 0.0.0.0
protocol = https
secure_gateway = 0.0.0.0

[port_http_rpc_public]
port = 60515
ip = 0.0.0.0
protocol = http
secure_gateway = 0.0.0.0

[port_wss_public]
port = 60506
ip = 0.0.0.0
protocol = wss
secure_gateway = 0.0.0.0

[port_ws_public]
port = 60516
ip = 0.0.0.0
protocol = ws
secure_gateway = 0.0.0.0

#-------------------------------------------------------------------------------

load testing only

[transaction_queue]
minimum_txn_in_ledger = 4294967295
target_txn_in_ledger = 4294967295
normal_consensus_increase_percent = 4294967295

[node_size]
small

[ledger_history]
864000

[fetch_depth]
full

[node_db]
type=NuDB
path=SERVER_HOME/db/nudb
earliest_seq=3
online_delete=864000

[database_path]
SERVER_HOME/db

[debug_logfile]
SERVER_HOME/logs/debug.log

[perf]
perf_log=SERVER_HOME/logs/perf.log
log_interval=10

[sntp_servers]
time.windows.com
time.apple.com
time.nist.gov
pool.ntp.org

[peer_private]
1

[validators_file]
validators.txt

[voting]
reference_fee = 10
account_reserve = 1000
owner_reserve = 10

[ssl_verify]
0

[rpc_startup]
{"command" : "log_level", "severity" : "info" }

@ximinez
Copy link
Collaborator

ximinez commented Mar 19, 2024

Cool edge case.

This is why we emphasize over and over again that the result returned by RPC submit is always preliminary, and could change before the result is final. See also: https://xrpl.org/docs/concepts/transactions/reliable-transaction-submission/

As far as I can tell, this same result would be observed if the transaction came in from the network well before the transaction was submitted locally, and therefore, I don't think the result is incorrect. The user doesn't know what's happening internally, and if they submit a transaction to more than one server, they need to understand that they may get unintuitive results.

I'll leave this ticket open to give other folks a chance to weigh in, but I don't think this is a problem.

@ubuddemeier
Copy link
Author

ubuddemeier commented Mar 19, 2024 via email

@ximinez
Copy link
Collaborator

ximinez commented Mar 19, 2024

But it isn't novel. It's was submitted to another node before the result was returned on the first node.

What else could happen?

So it does look like Transaction doesn't use any locking, though I may have missed something. Given that, the absolute worst case I can think of is that the result fields returned by submit have a mix of the original results and modified results. That's not great, but not that big a deal - that result is preliminary anyway.

I suppose a (reader/writer) lock could be added to Transaction to ensure it doesn't get modified while the results are being read, but this situation could still very easily happen even with that mechanism in place. All it takes is for the network transaction to acquire the writer lock before doSubmit gets the reader lock. The problem is that locking and unlocking could cause a major performance hit when traffic is high, and you're multiplying that small delay by hundreds or thousands of transactions.

I still don't see this as a significant problem, because as I said above, the preliminary transaction results are not incorrect.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants