Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"Two-phase commit is locked" after prepare phase timeout #2119

Open
palage4a opened this issue Jul 19, 2023 · 1 comment · May be fixed by #2126 or #2137
Open

"Two-phase commit is locked" after prepare phase timeout #2119

palage4a opened this issue Jul 19, 2023 · 1 comment · May be fixed by #2126 or #2137
Labels
bug Something isn't working

Comments

@palage4a
Copy link
Contributor

palage4a commented Jul 19, 2023

Description

If we have timeout during the prepare phase on a tarantool instance,
then the next update processes of config will fail with "Two phase commit is locked".

Repro

create cartridge app

cartridge create --name myapp .;
cd myapp;

add sleep to validate_config function in custom role (app/roles/custom.lua),

it should look like:

local function validate_config()
    require('fiber').sleep(1)
    return true
end

start cluster

cartridge build;
cartridge start -d;
cartridge replicasets setup;

connect to an instance

cartridge enter s1-master;

decrease validate_config_timeout to get timeout on prepare phase

(function()
    local vars = require('cartridge.vars').new('cartridge.twophase')
    vars.options.validate_config_timeout = 0.1
end)()

try to apply CW config and get error "timed out" from instance with app.roles.custom role*

(function()
    local twophase = require('cartridge.twophase')
    return twophase.patch_clusterwide({})
end)()

after that return validate_config_timeout to its default value

(function()
    local vars = require('cartridge.vars').new('cartridge.twophase')
    vars.options.validate_config_timeout = 10
end)()

or just restart instance s1-master;

try to apply CW-config again

(function()
    local twophase = require('cartridge.twophase')
    return twophase.patch_clusterwide({})
end)()

Output

➜  myapp git:(main) ✗ cartridge enter s1-master;

connected to myapp.s1-master
myapp.s1-master> (function()
    local vars = require('cartridge.vars').new('cartridge.twophase')
    vars.options.validate_config_timeout = 0.1
end)()
---
...

myapp.s1-master> (function()
    local twophase = require('cartridge.twophase')
    return twophase.patch_clusterwide({})
end)()
---
- null
- line: 482
  class_name: NetboxCallError
  err: '"localhost:3301": timed out'
  file: '...ha/code/palage4a/myapp/.rocks/share/tarantool/errors.lua'
  stack: "stack traceback:\n\t...ha/code/palage4a/myapp/.rocks/share/tarantool/errors.lua:482:
    in function 'netbox_wait_async'\n\t...palage4a/myapp/.rocks/share/tarantool/cartridge/pool.lua:248:
    in function 'map_call'\n\t...ge4a/myapp/.rocks/share/tarantool/cartridge/twophase.lua:388:
    in function 'twophase_commit'\n\t...ge4a/myapp/.rocks/share/tarantool/cartridge/twophase.lua:563:
    in function <...ge4a/myapp/.rocks/share/tarantool/cartridge/twophase.lua:489>\n\t[C]:
    in function 'xpcall'\n\t...ha/code/palage4a/myapp/.rocks/share/tarantool/errors.lua:145:
    in function 'pcall'\n\t...ge4a/myapp/.rocks/share/tarantool/cartridge/twophase.lua:589:
    in function <...ge4a/myapp/.rocks/share/tarantool/cartridge/twophase.lua:581>\n\t[C]:
    in function 'pcall'\n\tbuiltin/box/console.lua:412: in function <builtin/box/console.lua:385>\n\t[string
    \"local function func(...) return require('cons...\"]:1: in main chunk\n\t[C]:
    in function 'pcall'\n\tbuiltin/box/console.lua:412: in function 'eval'\n\tbuiltin/box/console.lua:723:
    in function 'repl'\n\tbuiltin/box/console.lua:895: in function <builtin/box/console.lua:881>\n\t[C]:
    in function 'pcall'\n\tbuiltin/socket.lua:1084: in function <builtin/socket.lua:1082>"
  str: 'NetboxCallError: "localhost:3301": timed out'
...

myapp.s1-master> (function()
    local vars = require('cartridge.vars').new('cartridge.twophase')
    vars.options.validate_config_timeout = 10
end)()
---
...

myapp.s1-master> (function()
    local twophase = require('cartridge.twophase')
    return twophase.patch_clusterwide({})
end)()
---
- null
- line: 99
  class_name: Prepare2pcError
  err: '"localhost:3301": Two-phase commit is locked'
  file: '...ge4a/myapp/.rocks/share/tarantool/cartridge/twophase.lua'
  str: 'Prepare2pcError: "localhost:3301": Two-phase commit is locked'
  stack: "stack traceback:\n\t...ge4a/myapp/.rocks/share/tarantool/cartridge/twophase.lua:99:
    in function <...ge4a/myapp/.rocks/share/tarantool/cartridge/twophase.lua:59>\n\t[C]:
    in function 'xpcall'\n\t...ha/code/palage4a/myapp/.rocks/share/tarantool/errors.lua:145:
    in function <...ha/code/palage4a/myapp/.rocks/share/tarantool/errors.lua:139>\n\t[C]:
    at 0x0108d37de0\nduring async net.box call to localhost:3301, function \"_G.__cartridge_clusterwide_config_prepare_2pc\"\nstack
    traceback:\n\t...palage4a/myapp/.rocks/share/tarantool/cartridge/pool.lua:248:
    in function 'map_call'\n\t...ge4a/myapp/.rocks/share/tarantool/cartridge/twophase.lua:388:
    in function 'twophase_commit'\n\t...ge4a/myapp/.rocks/share/tarantool/cartridge/twophase.lua:563:
    in function <...ge4a/myapp/.rocks/share/tarantool/cartridge/twophase.lua:489>\n\t[C]:
    in function 'xpcall'\n\t...ha/code/palage4a/myapp/.rocks/share/tarantool/errors.lua:145:
    in function 'pcall'\n\t...ge4a/myapp/.rocks/share/tarantool/cartridge/twophase.lua:589:
    in function <...ge4a/myapp/.rocks/share/tarantool/cartridge/twophase.lua:581>\n\t[C]:
    in function 'pcall'\n\tbuiltin/box/console.lua:412: in function <builtin/box/console.lua:385>\n\t[string
    \"local function func(...) return require('cons...\"]:1: in main chunk\n\t[C]:
    in function 'pcall'\n\tbuiltin/box/console.lua:412: in function 'eval'\n\tbuiltin/box/console.lua:723:
    in function 'repl'\n\tbuiltin/box/console.lua:895: in function <builtin/box/console.lua:881>\n\t[C]:
    in function 'pcall'\n\tbuiltin/socket.lua:1084: in function <builtin/socket.lua:1082>"
...

myapp.s1-master>
@palage4a palage4a added the bug Something isn't working label Jul 19, 2023
@palage4a
Copy link
Contributor Author

palage4a commented Jul 19, 2023

Second repro

Create cartridge app

cartridge create --name myapp .
cd myapp

Add sleep to validate_config function in custom role (app/roles/custom.lua),

it should look like that:

diff --git a/app/roles/custom.lua b/app/roles/custom.lua
index 509353d..43bdf4c 100644
--- a/app/roles/custom.lua
+++ b/app/roles/custom.lua
@@ -17,6 +17,11 @@ local function stop()
 end

 local function validate_config(conf_new, conf_old) -- luacheck: no unused args
+    if math.random(5) == 1 then
+        local vars = require('cartridge.vars').new('cartridge.twophase')
+        require('fiber').sleep(vars.options.validate_config_timeout + 1)
+    end
+
     return true
 end

Start cluster

cartridge build;
cartridge start -d;
cartridge replicasets setup;

Reproduce

Call this command multiple times until "two-phase commit is locked" error

curl -X "POST" "http://localhost:8081/admin/api" -d '{"query": "mutation{cluster{failover(enabled:false)}}", "variables": {} }'

Output

➜  myapp git:(main) ✗ curl -X "POST" "http://localhost:8081/admin/api" -d '{"query": "mutation{cluster{failover(enabled:false)}}", "variables": {} }'
{"data":{"cluster":{"failover":false}}}%
➜  myapp git:(main) ✗ curl -X "POST" "http://localhost:8081/admin/api" -d '{"query": "mutation{cluster{failover(enabled:false)}}", "variables": {} }'
{"data":{"cluster":{"failover":false}}}%
➜  myapp git:(main) ✗ curl -X "POST" "http://localhost:8081/admin/api" -d '{"query": "mutation{cluster{failover(enabled:false)}}", "variables": {} }'
{"data":{"cluster":{"failover":false}}}%
➜  myapp git:(main) ✗ curl -X "POST" "http://localhost:8081/admin/api" -d '{"query": "mutation{cluster{failover(enabled:false)}}", "variables": {} }'
{"data":{"cluster":{"failover":false}}}%
➜  myapp git:(main) ✗ curl -X "POST" "http://localhost:8081/admin/api" -d '{"query": "mutation{cluster{failover(enabled:false)}}", "variables": {} }'
{"data":{"cluster":{"failover":false}}}%
➜  myapp git:(main) ✗ curl -X "POST" "http://localhost:8081/admin/api" -d '{"query": "mutation{cluster{failover(enabled:false)}}", "variables": {} }'
{"errors":[{"message":"\"localhost:3301\": timed out","extensions":{"io.tarantool.errors.stack":"stack traceback:\n\t...ha\/code\/palage4a\/myapp\/.rocks\/share\/tarantool\/errors.lua:482: in function 'netbox_wait_async'\n\t...palage4a\/myapp\/.rocks\/share\/tarantool\/cartridge\/pool.lua:248: in function 'map_call'\n\t...ge4a\/myapp\/.rocks\/share\/tarantool\/cartridge\/twophase.lua:388: in function 'twophase_commit'\n\t...ge4a\/myapp\/.rocks\/share\/tarantool\/cartridge\/twophase.lua:563: in function <...ge4a\/myapp\/.rocks\/share\/tarantool\/cartridge\/twophase.lua:489>\n\t[C]: in function 'xpcall'\n\t...ha\/code\/palage4a\/myapp\/.rocks\/share\/tarantool\/errors.lua:145: in function 'pcall'\n\t...ge4a\/myapp\/.rocks\/share\/tarantool\/cartridge\/twophase.lua:589: in function 'patch_clusterwide'\n\t...pp\/.rocks\/share\/tarantool\/cartridge\/lua-api\/failover.lua:149: in function 'set_params'\n\t...pp\/.rocks\/share\/tarantool\/cartridge\/lua-api\/failover.lua:175: in function 'call'\n\t...age4a\/myapp\/.rocks\/share\/tarantool\/cartridge\/graphql.lua:38: in function 'defaultResolver'\n\t...\n\t...age4a\/myapp\/.rocks\/share\/tarantool\/cartridge\/graphql.lua:300: in function <...age4a\/myapp\/.rocks\/share\/tarantool\/cartridge\/graphql.lua:226>\n\t[C]: in function 'xpcall'\n\t...ha\/code\/palage4a\/myapp\/.rocks\/share\/tarantool\/errors.lua:145: in function 'pcall'\n\t...age4a\/myapp\/.rocks\/share\/tarantool\/cartridge\/graphql.lua:335: in function 'sub'\n\t...de\/palage4a\/myapp\/.rocks\/share\/tarantool\/http\/server.lua:658: in function <...de\/palage4a\/myapp\/.rocks\/share\/tarantool\/http\/server.lua:636>\n\t[C]: in function 'pcall'\n\t...de\/palage4a\/myapp\/.rocks\/share\/tarantool\/http\/server.lua:736: in function 'process_client'\n\t...de\/palage4a\/myapp\/.rocks\/share\/tarantool\/http\/server.lua:1199: in function <...de\/palage4a\/myapp\/.rocks\/share\/tarantool\/http\/server.lua:1198>\n\t[C]: in function 'pcall'\n\tbuiltin\/socket.lua:1084: in function <builtin\/socket.lua:1082>","io.tarantool.errors.class_name":"NetboxCallError"}}]}%
➜  myapp git:(main) ✗ curl -X "POST" "http://localhost:8081/admin/api" -d '{"query": "mutation{cluster{failover(enabled:false)}}", "variables": {} }'
{"errors":[{"message":"\"localhost:3301\": Two-phase commit is locked","extensions":{"io.tarantool.errors.stack":"stack traceback:\n\t...ge4a\/myapp\/.rocks\/share\/tarantool\/cartridge\/twophase.lua:99: in function <...ge4a\/myapp\/.rocks\/share\/tarantool\/cartridge\/twophase.lua:59>\n\t[C]: in function 'xpcall'\n\t...ha\/code\/palage4a\/myapp\/.rocks\/share\/tarantool\/errors.lua:145: in function <...ha\/code\/palage4a\/myapp\/.rocks\/share\/tarantool\/errors.lua:139>\n\t[C]: at 0x010f1fdde0\nduring async net.box call to localhost:3301, function \"_G.__cartridge_clusterwide_config_prepare_2pc\"\nstack traceback:\n\t...palage4a\/myapp\/.rocks\/share\/tarantool\/cartridge\/pool.lua:248: in function 'map_call'\n\t...ge4a\/myapp\/.rocks\/share\/tarantool\/cartridge\/twophase.lua:388: in function 'twophase_commit'\n\t...ge4a\/myapp\/.rocks\/share\/tarantool\/cartridge\/twophase.lua:563: in function <...ge4a\/myapp\/.rocks\/share\/tarantool\/cartridge\/twophase.lua:489>\n\t[C]: in function 'xpcall'\n\t...ha\/code\/palage4a\/myapp\/.rocks\/share\/tarantool\/errors.lua:145: in function 'pcall'\n\t...ge4a\/myapp\/.rocks\/share\/tarantool\/cartridge\/twophase.lua:589: in function 'patch_clusterwide'\n\t...pp\/.rocks\/share\/tarantool\/cartridge\/lua-api\/failover.lua:149: in function 'set_params'\n\t...pp\/.rocks\/share\/tarantool\/cartridge\/lua-api\/failover.lua:175: in function 'call'\n\t...age4a\/myapp\/.rocks\/share\/tarantool\/cartridge\/graphql.lua:38: in function 'defaultResolver'\n\t...alage4a\/myapp\/.rocks\/share\/tarantool\/graphql\/execute.lua:303: in function 'getFieldEntry'\n\t...\n\t...age4a\/myapp\/.rocks\/share\/tarantool\/cartridge\/graphql.lua:300: in function <...age4a\/myapp\/.rocks\/share\/tarantool\/cartridge\/graphql.lua:226>\n\t[C]: in function 'xpcall'\n\t...ha\/code\/palage4a\/myapp\/.rocks\/share\/tarantool\/errors.lua:145: in function 'pcall'\n\t...age4a\/myapp\/.rocks\/share\/tarantool\/cartridge\/graphql.lua:335: in function 'sub'\n\t...de\/palage4a\/myapp\/.rocks\/share\/tarantool\/http\/server.lua:658: in function <...de\/palage4a\/myapp\/.rocks\/share\/tarantool\/http\/server.lua:636>\n\t[C]: in function 'pcall'\n\t...de\/palage4a\/myapp\/.rocks\/share\/tarantool\/http\/server.lua:736: in function 'process_client'\n\t...de\/palage4a\/myapp\/.rocks\/share\/tarantool\/http\/server.lua:1199: in function <...de\/palage4a\/myapp\/.rocks\/share\/tarantool\/http\/server.lua:1198>\n\t[C]: in function 'pcall'\n\tbuiltin\/socket.lua:1084: in function <builtin\/socket.lua:1082>","io.tarantool.errors.class_name":"Prepare2pcError"}}]}%
➜  myapp git:(main) ✗ curl -X "POST" "http://localhost:8081/admin/api" -d '{"query": "mutation{cluster{failover(enabled:false)}}", "variables": {} }'
{"errors":[{"message":"\"localhost:3301\": Two-phase commit is locked","extensions":{"io.tarantool.errors.stack":"stack traceback:\n\t...ge4a\/myapp\/.rocks\/share\/tarantool\/cartridge\/twophase.lua:99: in function <...ge4a\/myapp\/.rocks\/share\/tarantool\/cartridge\/twophase.lua:59>\n\t[C]: in function 'xpcall'\n\t...ha\/code\/palage4a\/myapp\/.rocks\/share\/tarantool\/errors.lua:145: in function <...ha\/code\/palage4a\/myapp\/.rocks\/share\/tarantool\/errors.lua:139>\n\t[C]: at 0x010f1fdde0\nduring async net.box call to localhost:3301, function \"_G.__cartridge_clusterwide_config_prepare_2pc\"\nstack traceback:\n\t...palage4a\/myapp\/.rocks\/share\/tarantool\/cartridge\/pool.lua:248: in function 'map_call'\n\t...ge4a\/myapp\/.rocks\/share\/tarantool\/cartridge\/twophase.lua:388: in function 'twophase_commit'\n\t...ge4a\/myapp\/.rocks\/share\/tarantool\/cartridge\/twophase.lua:563: in function <...ge4a\/myapp\/.rocks\/share\/tarantool\/cartridge\/twophase.lua:489>\n\t[C]: in function 'xpcall'\n\t...ha\/code\/palage4a\/myapp\/.rocks\/share\/tarantool\/errors.lua:145: in function 'pcall'\n\t...ge4a\/myapp\/.rocks\/share\/tarantool\/cartridge\/twophase.lua:589: in function 'patch_clusterwide'\n\t...pp\/.rocks\/share\/tarantool\/cartridge\/lua-api\/failover.lua:149: in function 'set_params'\n\t...pp\/.rocks\/share\/tarantool\/cartridge\/lua-api\/failover.lua:175: in function 'call'\n\t...age4a\/myapp\/.rocks\/share\/tarantool\/cartridge\/graphql.lua:38: in function 'defaultResolver'\n\t...alage4a\/myapp\/.rocks\/share\/tarantool\/graphql\/execute.lua:303: in function 'getFieldEntry'\n\t...\n\t...age4a\/myapp\/.rocks\/share\/tarantool\/cartridge\/graphql.lua:300: in function <...age4a\/myapp\/.rocks\/share\/tarantool\/cartridge\/graphql.lua:226>\n\t[C]: in function 'xpcall'\n\t...ha\/code\/palage4a\/myapp\/.rocks\/share\/tarantool\/errors.lua:145: in function 'pcall'\n\t...age4a\/myapp\/.rocks\/share\/tarantool\/cartridge\/graphql.lua:335: in function 'sub'\n\t...de\/palage4a\/myapp\/.rocks\/share\/tarantool\/http\/server.lua:658: in function <...de\/palage4a\/myapp\/.rocks\/share\/tarantool\/http\/server.lua:636>\n\t[C]: in function 'pcall'\n\t...de\/palage4a\/myapp\/.rocks\/share\/tarantool\/http\/server.lua:736: in function 'process_client'\n\t...de\/palage4a\/myapp\/.rocks\/share\/tarantool\/http\/server.lua:1199: in function <...de\/palage4a\/myapp\/.rocks\/share\/tarantool\/http\/server.lua:1198>\n\t[C]: in function 'pcall'\n\tbuiltin\/socket.lua:1084: in function <builtin\/socket.lua:1082>","io.tarantool.errors.class_name":"Prepare2pcError"}}]}%
➜  myapp git:(main) ✗

@palage4a palage4a linked a pull request Jul 26, 2023 that will close this issue
3 tasks
@palage4a palage4a linked a pull request Nov 28, 2023 that will close this issue
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
1 participant