This repository has been archived by the owner on Jun 10, 2019. It is now read-only.

Revisit locking design for State object #17

Open
karlfloersch opened this issue Jan 9, 2019 · 29 comments
Labels
enhancement New feature or request help wanted Extra attention is needed question Further information is requested

Comments

@karlfloersch
Contributor

karlfloersch commented Jan 9, 2019

Is your feature request related to a problem? Please describe.
The state object requires locking logic when starting a new block or ingesting transactions & deposits. An example of a race condition without locking is as follows:

  1. Transaction A & B are ingested at the same time.
  2. Transaction A transfers range to Alice in block 5, transaction B transfers the same range to Bob in block 5. Both transactions are valid, but only one can be included as they have the same block number & transfer the same range twice.
  3. Because there is no lock, both transactions make it past the validity checks concurrently.
  4. Both transactions are applied to the state and saved in the tx_log... uh oh!
  5. Consumers of the tx_log now have two transactions affecting the same range in the same block! Oh no--that's invalid!
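The failure mode above boils down to a check-then-act gap across an await. A minimal sketch (the state shape and function names here are illustrative, not the repo's actual API):

```javascript
// Minimal sketch of the check-then-act race described above; the state
// shape and function names are illustrative, not the repo's actual API.
const state = { ownerOfRange: 'carol', txLog: [] }

async function applyTransaction (tx) {
  // Validity check passes for both txs before either applies
  const valid = state.ownerOfRange === tx.from
  // Simulated async work between check and apply (e.g. a DB write)
  await new Promise((resolve) => setImmediate(resolve))
  if (!valid) return false
  state.ownerOfRange = tx.to
  state.txLog.push(tx)
  return true
}

async function demo () {
  // Both txs spend the same range in the same block, concurrently
  return Promise.all([
    applyTransaction({ from: 'carol', to: 'alice', block: 5 }),
    applyTransaction({ from: 'carol', to: 'bob', block: 5 })
  ])
}
```

Without a lock, both calls pass the validity check and both land in the tx_log.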

I am currently solving this by locking the sending account when a transaction is added. That means that each of these txs would attempt to acquire a lock for the same sender & one would fail.

How do locks currently get acquired and released?

Right now, locking is very primitive. Lock acquiring looks like this:

state.js

  attemptAcquireLocks (keywords) {
    // Also check the special 'all' lock, which blocks everything
    keywords.push('all')
    if (keywords.some((val) => { return this.lock.hasOwnProperty(val) })) {
      // Failed to acquire locks
      return false
    }
    // Acquire locks (length - 1 skips the 'all' keyword pushed above)
    for (let i = 0; i < keywords.length - 1; i++) {
      this.lock[keywords[i]] = true
    }
    return true
  }

Here keywords is every sender address in every transfer record of the transaction being added. If a lock fails to be acquired, we wait some amount of time & try again with:

    while (!this.attemptAcquireLocks([type])) {
      // Wait before attempting again
      await timeout(timeoutAmt())
    }

Problems with this approach

I used this as a quick and dirty solution, but I believe it is a bit dangerous in that it may fill up the event queue if there are a large number of callbacks competing for the same lock. I don't actually know how bad this is, but I imagine it could be a problem if someone were to intentionally fill up the event queue with tons of txs from the same sender.

Solutions?

I don't know the best way to approach this; I just intuit that there's a solution I haven't yet explored. Maybe something to do with queues and promises, but my thoughts in this area are not developed enough to describe at the moment.
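One shape the queues-and-promises idea could take: instead of polling with timeouts, chain each waiter on a per-keyword promise, so every caller waits exactly once and wakes in FIFO order when the holder releases. A hedged sketch (KeyedLock and all names below are hypothetical, not existing code):

```javascript
// Sketch: per-keyword promise chaining instead of poll-and-retry.
// All names here are hypothetical. Note: released tails are never
// cleaned out of the map in this sketch.
class KeyedLock {
  constructor () {
    this.tails = new Map() // keyword -> promise that resolves on release
  }

  // Resolves once all keys are free; returns a release function.
  async acquire (keys) {
    const prev = keys.map((k) => this.tails.get(k) || Promise.resolve())
    let release
    const tail = new Promise((resolve) => { release = resolve })
    for (const k of keys) this.tails.set(k, tail)
    await Promise.all(prev)
    return release
  }
}

// Usage: waiters queue up implicitly, with no timeout polling
async function withLocks (lock, keys, fn) {
  const release = await lock.acquire(keys)
  try {
    return await fn()
  } finally {
    release()
  }
}
```

Because the map updates happen synchronously before any await, two overlapping acquire calls can't interleave their setup, which avoids the classic deadlock ordering problem.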

I'd love feedback!

Some resources I've taken a look at...

@karlfloersch karlfloersch added enhancement New feature or request help wanted Extra attention is needed question Further information is requested labels Jan 9, 2019
@ben-chain
Contributor

For what it's worth, I will note that the race condition you describe does not produce an "invalid" block in the same way that would be used in an invalid history challenge. At the leaf parsing step of the merkle sum tree, all but one of the transactions would be given a sum of 0 and so those branches would not be able to be used in an exit game at all. So, the race condition doesn't actually invalidate anything in the sense of having to exit or lose money.

That said, it obviously seems like horrible practice to not solve this issue--especially if the operator is signing off on inclusion promise attestations to get the "0-conf"-like property.

Another note--locking based on account does have the issue that it restricts the user from sending more than one transaction per block. Ideally, we would want to lock by range... not sure how much overhead this would end up adding.

It does seem like a proper queue module could be useful here depending on how quick locking/unlocking is--from queue.js, "it can be significantly faster than using arrays." I'm not sure whether the DOS issue you described above is preventable without proper anti-DOS measures, though.

On receiving a transaction, you'd add it to the queue with a callback to include/log the transaction, and then you would just repeatedly dequeue in a loop:

  1. check if any parts of the ranges are locked, if so, reject
  2. lock all the ranges,
  3. accept transaction (run callback)
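The dequeue steps above, with range-based rather than account-based locks, could look something like this sketch (the range representation and all names are assumptions):

```javascript
// Sketch of the three dequeue steps above with range locks.
// Ranges are { start, end } objects; all names are illustrative.
const lockedRanges = []

function overlaps (a, b) {
  return a.start <= b.end && b.start <= a.end
}

function tryAcceptTransaction (tx, accept, reject) {
  // 1. check if any part of the tx's ranges is locked; if so, reject
  const collision = tx.ranges.some((r) =>
    lockedRanges.some((l) => overlaps(r, l)))
  if (collision) return reject(tx)
  // 2. lock all the ranges
  lockedRanges.push(...tx.ranges)
  // 3. accept transaction (run callback)
  accept(tx)
}

function releaseRanges (tx) {
  for (const r of tx.ranges) {
    const i = lockedRanges.indexOf(r)
    if (i !== -1) lockedRanges.splice(i, 1)
  }
}
```

A linear scan over locked ranges is O(n) per transaction; an interval tree would bring the overlap check down to O(log n) if this ever became a bottleneck.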

@karlfloersch
Contributor Author

locking based on account does have the issue that it restricts the user from sending more than one transaction per block

These account locks are released upon completion of the addTransaction() function, so there should be no problem with having multiple transactions for the same account in the same block.


It may be worth mentioning that I've implemented a simple promise queue & simply processed transactions in sequence, but this approach seemed to have even worse memory limit issues & was slightly slower.

@gitcoinbot

Issue Status: 1. Open 2. Started 3. Submitted 4. Done


This issue now has a funding of 300.0 DAI (300.0 USD @ $1.0/DAI) attached to it.

@mvayngrib
Contributor

@karlfloersch how are you measuring the memory limit / speed issues?

@gitcoinbot

gitcoinbot commented Feb 1, 2019

Issue Status: 1. Open 2. Started 3. Submitted 4. Done


Work has been started.

These users each claimed they can complete the work by 1 week, 5 days ago.
Please review their action plans below:

1) famiiz has applied to start work (Funders only: approve worker | reject worker).


Learn more on the Gitcoin Issue Details page.

2) lsaether has applied to start work (Funders only: approve worker | reject worker).

Would love to take a stab at this and see what kind of solution I can implement. I'm proficient in JS/ TS but can't describe a solution to this off the top of my head, would need to play around with the codebase first.

Learn more on the Gitcoin Issue Details page.

3) mvayngrib has been approved to start work.

I can factor out the queueing/locking logic into a separate module. You'll probably be able to get rid of your async-lock dependency too.

Learn more on the Gitcoin Issue Details page.

@mvayngrib
Contributor

Is execution in order of invocation important? Currently, if the attempted lock acquisitions are something like 1. ['a', 'b'], 2. ['b', 'c'], 3. ['a', 'c'], 4. ['a', 'b'], then 4 might end up executing in parallel with 2, and before 3.

If execution in order of invocation is not important, then at every lock release you can check the enqueued operations and choose the maximum number that can be run in parallel, as opposed to leaving it to the luck of the draw.
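One way to sketch that selection step is a greedy scan, which is simple but not guaranteed to find the true maximum set (all names are illustrative):

```javascript
// Greedy sketch of "choose the pending operations that can run in
// parallel": take each queued op whose lock keys don't intersect the
// keys already held or already selected in this pass.
// Names are illustrative, not existing code.
function selectRunnable (queue, heldKeys) {
  const taken = new Set(heldKeys)
  const runnable = []
  for (const op of queue) {
    if (op.keys.every((k) => !taken.has(k))) {
      runnable.push(op)
      op.keys.forEach((k) => taken.add(k))
    }
  }
  return runnable
}
```

Finding the genuinely maximum compatible set is an independent-set problem, so a greedy first-fit scan is the pragmatic choice here.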

@frndxyz

frndxyz commented Feb 3, 2019

Isn't the solution to this to add a timestamp, to solve the concurrency and queue jam issues?

@karlfloersch
Contributor Author

karlfloersch commented Feb 4, 2019

@mvayngrib

If execution in order of invocation is not important, then at every lock release you can check the enqueued operations and choose the maximum number that can be run in parallel, as opposed to leaving it to the luck of the draw.

This seems reasonable. It makes sense to have a datastructure which makes it easy to "choose the maximum number of parallelizable transactions" and then begin processing those transactions. The tricky part is managing this complex queue.

@frndxyz

Isn't the solution to this to add a timestamp, to solve the concurrency and queue jam issues?

How does this work? I'm slightly confused about exactly how the timestamp would be used.

@mvayngrib
Contributor

@karlfloersch after playing with it on paper, and discussing with a friend, I'm no longer sure it matters whether parallelizable tasks run earlier or later. I think your locking design is actually fine efficiency-wise, the main issues being:

  • it's embedded in state.js (lots of lock/unlock)
  • it uses timeouts
  • this kind of intersecting-lock-set queue might be useful on its own, in other parts of the project

I see @Sprimage has been approved, but if he changes his mind, I'd like to give this a try.

@Sprimage

Sprimage commented Feb 5, 2019

Yeah @mvayngrib, I figured the polling loop was an issue too and decided that using https://www.npmjs.com/package/promise-queue should wrap it up. If you're really interested, we can work together to come up with the best solution as soon as possible.

@Sprimage

Sprimage commented Feb 5, 2019

Ok, so queuing promises can take a lot of memory. What if we move the job queue out of program memory and use something like fs? Each file represents a job (the JSON of the function's args), and we loop through the dir to execute each job in the queue, deleting the file once executed.

@mvayngrib
Contributor

@Sprimage fs sounds like a big compromise speed-wise. Blocks are processed one at a time; what's the max # of txs in a block? Also, as far as I understand, queueing is the exception rather than the rule: txs can be processed in parallel and only queue up when they collide on a lock (as they do currently).

@karlfloersch
Contributor Author

@mvayngrib

what's the max # of txs in a block?

This depends on the block time & tps, as there is no hard upper limit on txs in a block. Right now the block time is set to 60 seconds, and I've benchmarked the state object alone at around 3k tps on my laptop. So if we were to make the express API handle that tps, with 1-minute block times we could theoretically get blocks with 180,000 transactions, which would be approx 18 MB.
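Spelling out that arithmetic (the ~100 bytes/tx figure is inferred from the 18 MB estimate, not stated anywhere in the thread):

```javascript
// Back-of-envelope check of the block size estimate above
const tps = 3000                             // benchmarked state throughput
const blockTimeSec = 60                      // current block time
const txsPerBlock = tps * blockTimeSec       // 180,000 txs
const bytesPerTx = 100                       // assumption implied by 18 MB
const blockBytes = txsPerBlock * bytesPerTx  // 18,000,000 bytes ≈ 18 MB
```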

@mvayngrib
Contributor

@karlfloersch are you considering using a persistent buffer of some sort, e.g. redis or fs, or is that out of the scope?

@Sprimage

Sprimage commented Feb 6, 2019

@mvayngrib

Also, as far as I understand, queueing is an exception rather than a rule. Txs can be processed in parallel and only queue up when they collide on a lock (as they are currently)

I know queuing is the exception. But that exception is the root of this issue, isn't it?

@Sprimage

Sprimage commented Feb 6, 2019

Also, avoiding external and platform-specific dependencies would make fs the better option compared to redis. @karlfloersch do you think fs will work?

@Sprimage

Sprimage commented Feb 6, 2019

If you have Slack or something similar, it would be great to discuss this in real time and work out the solution together.

@gitcoinbot

@Sprimage Hello from Gitcoin Core - are you still working on this issue? Please submit a WIP PR or comment back within the next 3 days or you will be removed from this ticket and it will be returned to an ‘Open’ status. Please let us know if you have questions!

  • reminder (3 days)
  • escalation to mods (6 days)

Funders only: Snooze warnings for 1 day | 3 days | 5 days | 10 days | 100 days

4 similar comments

@mvayngrib
Contributor

this still alive?

@spm32

spm32 commented Feb 28, 2019

Hey @mvayngrib are you interested in taking this on? I think we should be passing it back to the crowd at this point.

@mvayngrib
Contributor

@ceresstation sure, I can give it a try. I already applied through Gitcoin.

@spm32

spm32 commented Mar 1, 2019

Just accepted :)

@gitcoinbot

Issue Status: 1. Open 2. Started 3. Submitted 4. Done


Work for 300.0 DAI (300.0 USD @ $1.0/DAI) has been submitted by:

  1. @mvayngrib

@ceresstation please take a look at the submitted work:


@mvayngrib
Contributor

poke @ceresstation

@gitcoinbot

Issue Status: 1. Open 2. Started 3. Submitted 4. Done


The funding of 300.0 DAI (300.0 USD @ $1.0/DAI) attached to this issue has been approved & issued to @mvayngrib.
