Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Panic: "attempt to calculate the remainder with a divisor of zero" #161

Open
jeromegn opened this issue Nov 20, 2022 · 12 comments
Open

Panic: "attempt to calculate the remainder with a divisor of zero" #161

jeromegn opened this issue Nov 20, 2022 · 12 comments

Comments

@jeromegn
Copy link

I got the following panic:

thread 'main' panicked at 'attempt to calculate the remainder with a divisor of zero', /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/domain-0.7.0/src/resolv/stub/mod.rs:709:15

Looks like this line is problematic:

pub fn info<'a>(&self, list: &'a ServerList) -> &'a ServerInfo {
&list[self.cur % list.servers.len()]
}

I haven't had time to dig into it, but I can take a look next week.

@partim
Copy link
Member

partim commented Nov 21, 2022

It seems you are iterating over an empty server list. Did you perchance forget to call ResolvConf::finalize before using the config?

@jeromegn
Copy link
Author

jeromegn commented Nov 21, 2022

I did! Well, I used ResolvConf::default() which finalizes. Then I added my servers and changed a few options. That's a common pattern in Rust.

Perhaps it would be better to use a "builder" pattern since finalization is required?

What's odd is I explicitly put servers in there. Could this condition happen under different circumstances?

            let mut conf = ResolvConf::default();
            conf.servers = vec![ServerConf::new(config.fwd_addr, TransportKind::Tcp)];
            conf.options.timeout = Duration::from_secs(5);

            StubResolver::from_conf(conf)

Edit: I now see the docs says this is like a builder pattern. I think it's less error-prone to make a ResolvConfBuilder that can only return a ResolvConf on finalize (or build).

@partim
Copy link
Member

partim commented Nov 21, 2022

I agree this should be changed into having a separate builder type.

But in any case, looking at the code properly now, it should be able to deal with empty server lists. Could you perchance get a stack trace to see what it’s been doing before it crashed? (Just the domain-related parts of the stack trace are enough.)

@jeromegn
Copy link
Author

Yes. Initially I thought I wasn't running with the proper RUST_BACKTRACE env vars and didn't bother to look for it, but we actually are!

thread 'main' panicked at 'attempt to calculate the remainder with a divisor of zero', /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/domain-0.7.0/src/re
stack backtrace:
   0: rust_begin_unwind
             at ./rustc/432abd86f231c908f6df3cdd779e83f35084be90/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at ./rustc/432abd86f231c908f6df3cdd779e83f35084be90/library/core/src/panicking.rs:142:14
   2: core::panicking::panic
             at ./rustc/432abd86f231c908f6df3cdd779e83f35084be90/library/core/src/panicking.rs:48:5
   3: <domain::resolv::stub::ServerListCounter>::info
             at ./usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/domain-0.7.0/src/resolv/stub/mod.rs:709:15
   4: <domain::resolv::stub::Query>::current_server
             at ./usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/domain-0.7.0/src/resolv/stub/mod.rs:349:9
   5: <domain::resolv::stub::Query>::run_query::{closure#0}
             at ./usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/domain-0.7.0/src/resolv/stub/mod.rs:338:22
   6: <core::future::from_generator::GenFuture<<domain::resolv::stub::Query>::run_query::{closure#0}> as core::future::future::Future>::poll
             at ./rustc/432abd86f231c908f6df3cdd779e83f35084be90/library/core/src/future/mod.rs:91:19
   7: <domain::resolv::stub::Query>::run::{closure#0}
             at ./usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/domain-0.7.0/src/resolv/stub/mod.rs:287:47
   8: <core::future::from_generator::GenFuture<<domain::resolv::stub::Query>::run::{closure#0}> as core::future::future::Future>::poll
             at ./rustc/432abd86f231c908f6df3cdd779e83f35084be90/library/core/src/future/mod.rs:91:19
   9: <domain::resolv::stub::StubResolver>::query::<domain::base::name::parsed::ParsedDname<&bytes::bytes::Bytes>, domain::base::question::Question<domain::base::name::parsed::ParsedDname<&bytes::bytes::Bytes>>>::{closure#0}
             at ./usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/domain-0.7.0/src/resolv/stub/mod.rs:111:13

@jeromegn
Copy link
Author

This appears to happen repeatedly during the same execution. I'm assuming it can get in a bad state and stay that way.

@partim
Copy link
Member

partim commented Nov 21, 2022

Using a stub resolver in the way you build it above works fine here – indeed in this case you don’t need to explicitly finalize t. It also looks in the code like it should handle an empty UDP list correctly by using the TCP list.

Can you try and create a minimalist breaking example from your code? I don’t think I’ll get anywhere from just staring at the code longer ;)

@jeromegn
Copy link
Author

jeromegn commented Nov 21, 2022

Ah! Maybe this is related them: I have 2 stub resolvers, one for UDP and one for TCP. I'm forwarding queries and I want to keep the same protocol as the original query.

        let tcp_recursor = {
            let mut conf = ResolvConf::default();
            conf.servers = vec![ServerConf::new(config.fwd_addr, TransportKind::Tcp)];
            conf.options.timeout = Duration::from_secs(5);

            StubResolver::from_conf(conf)
        };

        let udp_recursor = {
            let mut conf = ResolvConf::default();
            conf.servers = vec![ServerConf::new(config.fwd_addr, TransportKind::Udp)];
            conf.options.timeout = Duration::from_secs(5);

            StubResolver::from_conf(conf)
        };

Later on, I use them like so:

                let res = match w.kind() {
                    TransportKind::Udp => self.udp_recursor.query(q).await,
                    TransportKind::Tcp => self.tcp_recursor.query(q).await,
                };

(where w is the "transport" used for the original query)

Maybe there's a better way to achieve this? My current reasoning is: if the query is UDP, then that's all I can respond with. Often that'll end up being a truncate response and the client will make a second query over TCP, in which case I'll use the TCP resolver.

Can you try and create a minimalist breaking example from your code? I don’t think I’ll get anywhere from just staring at the code longer ;)

I don't quite know how this happens :( The server we forward to is a local unbound on the host. I've only seen happen on 4 hosts out of 50+ running this software for over a week now. I'm not logging nearly enough information to know when it's first triggered.

@partim
Copy link
Member

partim commented Nov 22, 2022

We have a plan to factor our the actual networking from the stub resolver to make it easier to build proxies and such. For the time being, you approach looks good and should work.

I guess I’ll stare a bit more at the code to try and figure out why it tries using the empty list.

@jeromegn
Copy link
Author

Thank you!

Anything I can do in the meantime? I deployed our server to more hosts and this seems to happen quite a bit. Restarting the server seems to be the only fix.

I do not mind refactoring our own code to circumvent this issue. I could manually reproduce what the stub resolver does without the server logic. We have a single server, so we don't need that whole thing.

@partim
Copy link
Member

partim commented Nov 22, 2022

The actual querying code is rather simple. All of it is in src/resolv/stub/mod.rs.

The question is turned into a query in Query::create_message and ServerInfo::prepare_message (the latter adds an OPT record if the server supports EDNS – Unbound does, so you want to do that), and the ServerInfo::query wraps the future for the actual query (ServerInto::query_tcp or ServerInfo::query_udp) into a timeout and fires it off. The query futures basically send the request and then wait for a response that matches the request. That’s really it.

@jeromegn
Copy link
Author

Thanks! For now I've forked this repo and make Query::run_query public. Using that instead of run shouldn't use the next_server function.

@partim
Copy link
Member

partim commented Nov 23, 2022

Cheeky ;)

For what it’s worth: Factoring out the transports and making them available directly is probably the next thing we are going to do once I have finished #160.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants