Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make DNS resolver configurable #1612

Merged
merged 6 commits into from Oct 20, 2020
Merged

Make DNS resolver configurable #1612

merged 6 commits into from Oct 20, 2020

Conversation

imiric
Copy link
Contributor

@imiric imiric commented Aug 31, 2020

This introduces a new composite DNS option (--dns, K6_DNS, and dns in JS options) to allow finer-grained control of k6's DNS resolution.

The sub-options are:

  • ttl: the cache expiration period per hostname. Note that currently k6 doesn't respect the TTL in the DNS record, as that involves a deeper refactoring of the resolver that's currently in progress.
    Possible values are:

    • inf: hosts are cached indefinitely for the duration of k6 run (previous default).
    • 0: the cache is disabled and lookups will be done on each new connection.
    • 3000, 30s, 1m, etc.: cache DNS lookups for the specified period. If no unit is provided, milliseconds are assumed.

    Default value: 5m.

  • select: strategy to use when selecting an IP if more than one is returned in the response.
    Possible values are:

    • first: return the first IP from the response (previous default).
    • random: return a random IP from the response.
    • roundRobin: rotate the IP returned for each new connection. The current implementation is not stable and could repeat or skip IPs, but this should be good enough in most cases.

    Default value: random.

  • policy: IP version preference.
    Possible values are:

    • preferIPv4: return an IPv4 address if available, otherwise fall back to IPv6.
    • preferIPv6: return an IPv6 address if available, otherwise fall back to IPv4.
    • onlyIPv4: only use IPv4 addresses even if an IPv6 address is returned.
    • onlyIPv6: only use IPv6 addresses even if an IPv4 address is returned.
    • any: no preference, return any address (previous default).

    Default value: preferIPv4.

Usage examples

  • CLI: k6 run --dns='ttl=0,select=roundRobin,policy=onlyIPv4' ...
  • Environment variable: K6_DNS='ttl=10m,select=first,policy=preferIPv6' k6 run ...
  • JS options:
    export let options = {
      dns: {
        ttl: '1m',
        select: 'random',
        policy: 'any'
      }
    }

Breaking changes

As discussed:

  • The default ttl if unspecified is 5m instead of the previous inf behavior.
  • The default select if unspecified is random instead of the previous first behavior.

Important note

These resolver settings will only be applied on new HTTP/TCP connections. Since DNS lookups are only done before the connection is established, if the test script and server enable HTTP keep-alives, the script might continue to use the established connection to the "old" host even though the DNS record had changed or the TTL expired. I.e. k6 will not actively close the connection or keep track of the TTL in the background, which it arguably shouldn't anyway.

To circumvent this the script will either have to let the connection close by itself after the idle timeout (should we make http.Transport.IdleConnTimeout configurable?), or enable the noConnectionReuse option to disable keep-alives altogether, which is not ideal...

This PR closes #726 and #738 if used correctly, but is more of a workaround than a proper fix.

@imiric imiric requested review from mstoykov and na-- August 31, 2020 12:06
lib/options.go Outdated Show resolved Hide resolved
imiric pushed a commit that referenced this pull request Sep 3, 2020
@imiric imiric requested a review from na-- September 3, 2020 09:45
@imiric
Copy link
Contributor Author

imiric commented Sep 3, 2020

@na-- @mstoykov please take a look at the latest changes.

From manual functional tests both TTL and strategy seem to work OK, but I can't resolve the config consolidation failure. From digging into it it seems that cliConf is modified by fileConf on line 232 here:

https://github.com/loadimpact/k6/blob/5ef753777c5231f4f08d8955ce7432b95db34205/cmd/config.go#L232-L236

So when it's applied again on line 236 it actually applies the modified values, not the original ones that were passed to getConsolidatedConfig().

If I'm not mistaken, to make this work we need a deep copy of the original cliConf which can then override everything else, but I didn't want to open that can of worms here as it can impact potentially everything depending on the current behavior.

Anyway, please test this manually yourselves. The easiest way I've found is changing /etc/hosts since there's no way to mock this currently (and also why the tests I mentioned in the description are useless).

lib/options.go Outdated
Comment on lines 56 to 110
type NullDNSStrategy struct {
DNSStrategy
Valid bool
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awful, but I needed a nullable type to work with the Apply() mechanism... It seems any config item needs to have this logic for the overriding to work. :-/

core/local/local_test.go Outdated Show resolved Hide resolved
js/runner_test.go Outdated Show resolved Hide resolved
lib/netext/dialer.go Outdated Show resolved Hide resolved
lib/netext/dialer.go Outdated Show resolved Hide resolved
lib/options.go Outdated Show resolved Hide resolved
lib/options.go Outdated Show resolved Hide resolved
lib/options.go Outdated Show resolved Hide resolved
lib/options.go Outdated Show resolved Hide resolved
cmd/options.go Outdated Show resolved Hide resolved
imiric pushed a commit that referenced this pull request Sep 8, 2020
lib/testutils/resolver.go Outdated Show resolved Hide resolved
js/runner.go Outdated Show resolved Hide resolved
lib/netext/dialer.go Outdated Show resolved Hide resolved
@imiric imiric mentioned this pull request Sep 9, 2020
imiric pushed a commit that referenced this pull request Sep 9, 2020
@imiric imiric force-pushed the fix/726-738-dns-cache-refresh branch from 5275d5f to f280b60 Compare September 9, 2020 17:27
@imiric
Copy link
Contributor Author

imiric commented Sep 9, 2020

All major issues with the configuration and tests should now be resolved, and the current test failure is because of the global netext.LookupIP override I'm doing. This was the simplest way to mock the stdlib call in tests, as all other attempts I tried involved changing the netext.Resolver interface. I don't like this approach as it changes the implementation too much simply for testing purposes, while mocking Runner.Resolver is messy, so suggestions are welcome. :) A possible approach could be to pass the LookupIP function to NewResolver() or another constructor, but that could also be problematic.

And apparently TestDNSResolver is flaky, judging by the AppVeyor failure. 😞

@imiric imiric changed the title WIP Make DNS cache configurable Make DNS resolver configurable Sep 9, 2020
lib/options.go Outdated Show resolved Hide resolved
lib/options.go Outdated Show resolved Hide resolved
js/runner_test.go Outdated Show resolved Hide resolved
Copy link
Member

@na-- na-- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't completely finished looking through the code, will mention anything else as individual comments

lib/options.go Outdated Show resolved Hide resolved
lib/options.go Outdated Show resolved Hide resolved
lib/options.go Outdated
Comment on lines 102 to 247
if string(text) == DefaultDNSConfigText {
*c = DefaultDNSConfig()
return nil
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very premature and unnecessary optimization 😅 And what's stopping me from reversing the order of params, strategy=first,ttl=inf? 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wasn't meant as an optimization, but a way to avoid more complex logic for setting default values. I'd argue this shouldn't even be in UnmarshalText(), but it was one of those quick workarounds for when the exact default string is used, so the order didn't matter. I'll see if I can get rid of it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, so I wasn't able to remove this easily, as it causes TestConfigConsolidation to fail 😭 Try it for yourself, comment out this block, run the test and see all the issues... (I have an intense hatred for that specific test 😆, having seen it fail so many times. It's huge and makes it difficult to parse the errors and go to the source.)

To fix it I'd need to manually handle the default values for both fields plus their Valid values, which is tricky to get right without messing up some fallback logic. So I'd chalk this one up as another TODO for #883. Let me know if you want a comment for it or have another suggestion.

lib/options.go Outdated Show resolved Hide resolved
lib/options.go Outdated Show resolved Hide resolved
lib/netext/resolver.go Outdated Show resolved Hide resolved
lib/netext/resolver.go Outdated Show resolved Hide resolved
imiric pushed a commit that referenced this pull request Sep 10, 2020
imiric pushed a commit that referenced this pull request Sep 10, 2020
imiric pushed a commit that referenced this pull request Sep 10, 2020
imiric pushed a commit that referenced this pull request Sep 10, 2020
@imiric imiric force-pushed the fix/726-738-dns-cache-refresh branch from ed65b6f to 60797be Compare September 10, 2020 16:10
imiric pushed a commit that referenced this pull request Oct 15, 2020
imiric pushed a commit that referenced this pull request Oct 15, 2020
imiric pushed a commit that referenced this pull request Oct 15, 2020
imiric pushed a commit that referenced this pull request Oct 15, 2020
na--
na-- previously approved these changes Oct 16, 2020
Copy link
Member

@na-- na-- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 🎉 I am not sure if local.TestDNSResolver wouldn't prove to be flaky in the future, but 🤞, we'll address it then

"for a persistent cache, '0' to disable the cache,\nor a positive duration, e.g. '1s', '1m', etc. "+
"Milliseconds are assumed if no unit is provided.\n"+
"Possible select values to return a single IP are: 'first', 'random' or 'round-robin'.\n"+
"Possible policy values are: 'preferIPv4', 'preferIPv6', 'onlyIPv4', 'onlyIPv6' or 'any'.\n")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not simply rename round-robin to roundRobin to be in line with the other options? And if we make the identifiers case-insensitive, then it'd be even better. I see the new enumer version seems to support that.

mstoykov
mstoykov previously approved these changes Oct 16, 2020
Copy link
Collaborator

@mstoykov mstoykov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, apart from my worries that the change to ParseExtentedDuration will break scripts.

I haven't tested it, but will try to get to it today.

lib/netext/resolver.go Outdated Show resolved Hide resolved
lib/netext/resolver.go Outdated Show resolved Hide resolved
@@ -95,8 +95,12 @@ func (d Duration) String() string {
}

// ParseExtendedDuration is a helper function that allows for string duration
// values containing days.
// values containing days. If no units are provided, milliseconds are assumed.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also applies to other configurations such as:

  • setup/teardownTimeout
  • gracefulStop/rampdown
  • timeunit
  • duration
  • starttime
    ... more
    I am not saying this is bad idea, I am just pointing out this is breaking change even if it's likely a welcome one
    Also ... as far as I can see ... this is not used consistantly so some options will be unmarshalled from JSON to this but not from the cli flags for example min-iteration-duration will use time.ParseDuration as this is what pflag says or maybe .. not, not certain on whether there is somethign that will check for UnmarshalText before that.

But I would argue this should've been another function in this PR and have this merging be another PR in which this is more tested and looked into
cc @na--

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm I think it's not going to be a big issue and we have to address #1305 eventually... @mstoykov keep in mind that this won't actually be used for integer JS/JSON values because of https://github.com/loadimpact/k6/blob/caf82ab98ec3b5dc581519a5269c276ca05a2306/lib/types/types.go#L154-L160
So I'm not sure how much effect this will actually have

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but yeah, maybe we shouldn't make half-measures... so I'm fine if we use a separate function this time, with a TODO to switch completely to it and address #1305 in another PR

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking in the code that you just linked ... I am even more of the opinion that this should be a separate functio now ... as this seems to make things ... worse :D

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair enough - I can't think of an example how to break it, but it'd definitely be cleaner, so 👍 from me

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, thanks for the heads up, I agree that it's safer to use a different function. I changed it in the recent push, along with some squashing since I had to fix the merge conflicts anyway.

imiric pushed a commit that referenced this pull request Oct 19, 2020
imiric pushed a commit that referenced this pull request Oct 19, 2020
@imiric imiric dismissed stale reviews from mstoykov and na-- via 37de2a1 October 19, 2020 14:37
@imiric imiric force-pushed the fix/726-738-dns-cache-refresh branch from caf82ab to 37de2a1 Compare October 19, 2020 14:37
@imiric imiric force-pushed the fix/726-738-dns-cache-refresh branch from 37de2a1 to 6cb1a23 Compare October 19, 2020 15:20
@imiric imiric requested review from mstoykov and na-- October 19, 2020 15:25
Copy link
Member

@na-- na-- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unexpected DNS caching
5 participants