Race condition causing data inconsistency when nodes are coming up #148

indrekj · 2020-07-28T08:48:26Z

I think I found a race condition that is causing invalid data.

Version: latest master (7893228).

Some background: We use a slightly modified version of phoenix_pubsub which has some performance optimizations (one PR up here as well, + we added tag lookup ets table to speed up delta merges). We encountered a shard crash that seemed to happen because of our modifications. After I was able to create a failing test, I also noticed that the bug exists also on the original branch (though, in the original branch, it does not cause a shard crash but data inconsistency instead).

For us, this seems to happen when there's a network partition or Kubernetes thinks it's a good idea to move/add some pods around.

This is really hard to replicate in the real world. It usually happens for us maybe once a month.

Scenario (same as in the test but in the words):

Node A and B are connected
Alice joins node A (meta = initial)
Node A syncs with Node B
Alice is changed (meta = update1) - this update has been sent out but has not reached Node B yet
Node C joins - connects with A & B
Alice is changed (meta = update2)
Node C sends out transfer reqs (no response yet)
Node C receives delta update from A (with latest alice)
Node C receives transfer ack from Node B (which does not have the latest alice)
problem: the latest alice is overwritten by the old alice from Node B

Failing test:

test "delta before transfer from a different node", config do
  a = new(:a, config)
  b = new(:b, config)
  {a, _, _} = State.replica_up(a, b.replica)
  {b, _, _} = State.replica_up(b, a.replica)

  alice = new_pid()

  # Alice joins Node A
  a = State.join(a, alice, "lobby", :alice, "initial")

  # Node A sends updates to node B
  assert {b, [{{_, _, :alice}, _, _}], _} = State.merge(b, State.extract(a, b.replica, b.context))
  assert [:alice] = b |> State.online_list() |> keys()
  a = State.reset_delta(a)

  # Alice is updated first time
  a = State.leave(a, alice, "lobby", :alice)
  a = State.join(a, alice, "lobby", :alice, "update1")

  # update1 is not received by Node B (because of network delay or network
  # partition) or is received a lot later
  a = State.reset_delta(a)

  # Node C comes up
  c = new(:c, config)
  {b, _, _} = State.replica_up(b, c.replica)
  {a, _, _} = State.replica_up(a, c.replica)
  {c, _, _} = State.replica_up(c, a.replica)
  {c, _, _} = State.replica_up(c, b.replica)

  # Alice is updated second time
  a = State.leave(a, alice, "lobby", :alice)
  a = State.join(a, alice, "lobby", :alice, "second")

  # Lets assume Node C also sent out transfer_req to Node B here, but Node C
  # receives delta heartbeat from Node A first.
  assert {c, [{{_, _, :alice}, "second", _}], []} = State.merge(c, a.delta)

  # Here everything is fine. Node C sees the latest alice.
  assert [
    {{"lobby", _, :alice}, "second", _}
  ] = c |> State.online_list()

  # Now Node C receives transfer ack from B (who has alice with one missed update)
  assert {c, _, _} = State.merge(c, State.extract(b, c.replica, c.context))
  assert [
    {{"lobby", _, :alice}, "second", {{:a, 1}, 2}}
  ] = c |> State.online_list()
  # ^ This fails because the most recent alice is overwritten with the old
  # alice (who has "initial" now in the meta")

  # Lets say we ignore the previous inconsistency and wait for transfer ack
  # from the node A as well
  assert {c, _, _} = State.merge(c, State.extract(a, c.replica, c.context))
  assert [
    {{"lobby", _, :alice}, "second", _}
  ] = c |> State.online_list()
  # ^ This still fails - now there is no alice online at all
end

Also link: salemove@fdfe57c

Note: As this is quite complex to replicate in the real world, I cannot be 100% sure that my test is exactly what is happening. I'm fairly certain there's "values" overwriting happening because I was able to change this line to use true = :ets.insert_new and this threw an error when there were new pods coming up (it took 2 weeks to catch that though).

In case my assumptions and the test case are correct - I still don't have a good idea how to fix it...

The text was updated successfully, but these errors were encountered:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Race condition causing data inconsistency when nodes are coming up #148

Race condition causing data inconsistency when nodes are coming up #148

indrekj commented Jul 28, 2020 •

edited

Race condition causing data inconsistency when nodes are coming up #148

Race condition causing data inconsistency when nodes are coming up #148

Comments

indrekj commented Jul 28, 2020 • edited

indrekj commented Jul 28, 2020 •

edited