Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thread conflict in Resource::Type for regex node definitions #9329

Open
jstange opened this issue Apr 24, 2024 · 0 comments
Open

Thread conflict in Resource::Type for regex node definitions #9329

jstange opened this issue Apr 24, 2024 · 0 comments
Labels
bug Something isn't working help wanted Issue has been reviewed & PRs are welcome

Comments

@jstange
Copy link

jstange commented Apr 24, 2024

Describe the Bug

We have node manifests of the following form:

node /^service-(server|client)(?:-test)?-(\d+)/ {
  $myservice_role=$1
  $myservice_id=$2
  
  case $myservice_role {
    'server': {
      # do one set of things
    }
   'client': {
     # do some different things
   }
<...etc...>
}

We have observed these nodes randomly swapping server and client behavior, as though their pattern match was for the wrong hostname. This only occurs with threading enabled.

Expected Behavior

A host's sense of its hostname should not be transposed with that of another host. This may in fact be a problem with regexp backreferences generally.

Steps to Reproduce

We're able to quickly reproduce the issue for debugging using ~20 hosts, all matching the node pattern, kicking off agent runs continuously (e.g. while /bin/true;do puppet agent --test;done). We use the following manifest to capture the misbehavior:

node /^thing-(foo|bar)(?:-test)?-(\d+)/ {
  $first_blob=$1
  $second_blob=$2

  if $::hostname =~ /^thing-(foo|bar)(?:-test)?-(\d+)/ {
    if $1==$first_blob and $2==$second_blob {
      # all is good, just notify
      notify { 'regex_test_debug':
        message => "hostname=${::hostname} first_blob=${first_blob} second_blob=${second_blob}"
      }
    } else {
      # fail loudly
      fail ("regex_test_debug: \$1=$1 \$2=$2 but \$first_blob=$first_blob and \$second_blob=$second_blob")
    }
  } else {
    fail ("regex_test_debug: $::hostname does not match node regex")
  }
}

This gives us appropriate fail log entries whenever the node pattern match results do not correspond to the hostname:

2024-04-24T16:17:53.402-04:00 ERROR [qtp943675003-391] [puppetserver] Puppet Evaluation Error: Error while evaluating a Function Call, regex_test_debug: $1=bar $2=8 but $first_blob=foo and $second_blob=5 (file: /export/home/puppet/environments/production/manifests/regex_test.pp, line: 22, column: 7) on node thing-bar-8.example

Environment

  • Puppet Server 7.x (various, up to and including 7.15.0)
  • CentOS 7.9.2009, AlmaLinux 8.9

Additional Context

I believe I've identified where these nodes step on one another during concurrent agent runs, however the fix isn't as trivial as adding a sempahore[1].

In Resource::Type (lib/puppet/resource/type.rb) it seems that all nodes which match a regexp node pattern share a common instance of this class. In my test case above we find it running around with the name __node_regexp__thing-foobar:-test-d, and concurrent agent runs report the same Ruby object_id.

This object is slightly stateful, in that the @match attribute seems to contain the most recent string matched against the name pattern. We observe a 1:1 correlation between the state of this variable and what the agent run associated with multiple relevant threads thinks the hostname pattern match object was supposed to contain. I'm not quite sure how it propagates back to the DSL, but it seems likely that this is where it's coming from.

The correct fix isn't immediately obvious, but is probably of one of the following forms:

  • Guard the value of @match between the time it's set and the time a node is done with it. I am not sure how to definitively tell when it would be safe to unlock.
  • "Slot" the value of @match based on the source string. This would mean that calling objects would need to be able to identify what pattern they mean by the string that matched the pattern, and I haven't found an obvious attribute to use for the purpose.
  • Force distinct Resource::Type objects for individual agents

Any approach I've thought of is nontrivial and should have some input from someone more expert in this codebase.

Our situation manifesting this bug is peculiar to the use of backreferences in node blocks in the DSL, but I can see it impacting anything that uses a pattern and results in a Resource::Type object being created.

...oh, and lest I forget, calls to #evaluate_code can run concurrently on the same Resource::Type object. This may (I am not sure and cannot prove it) cause recycled MatchScope objects downstream of both scope.ephemeral_from and code.safeevaluate to get overwritten across threads. Comments imply that those objects do get reused, but I haven't actually seen this to be the case so I'm not sure this is an issue. This aspect would be trivial to solve with a semaphore, if indeed it is a problem.

[1] Technically if I wrap the whole of Parser::Compiler#evaluate_ast_node in an environment.lock.synchronize block it makes the problem go away. However this seems like too high-level a bottleneck. The correct fix is probably to Resource::Type.

@jstange jstange added the bug Something isn't working label Apr 24, 2024
@mhashizume mhashizume added the help wanted Issue has been reviewed & PRs are welcome label Apr 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help wanted Issue has been reviewed & PRs are welcome
Projects
None yet
Development

No branches or pull requests

2 participants