Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding/Deleting multiple Custom DNS Records in a short amount of time triggers systemd service StartLimitBurst leading to disable pihole-FTL service #2606

Open
mnlhfr opened this issue Jun 9, 2023 · 11 comments · May be fixed by #3019

Comments

@mnlhfr
Copy link

mnlhfr commented Jun 9, 2023

Versions

  • Pi-hole: v5.17.1
  • AdminLTE: v5.20.1
  • FTL: v5.23

Platform

  • OS and version: Ubuntu 22.04.2 LTS (5.15.0-73-generic)
  • Platform: Proxmox-VM

Expected behavior

Adding or Deleting custom DNS records through webadmin GUI or respectively directly through POST requests to the API endpoint at /admin/scripts/pi-hole/php/customdns.php causes pihole-FTL to just be reloaded.

Actual behavior / bug

repeatedly adding or deleting DNS records through the web interface (/admin/scripts/pi-hole/php/customdns.php) causes pihole-FTL to be restarted (not reloaded) with each and every DNS record added. This leads to systemd hitting the StartLimitBurst=5 configured in /etc/systemd/system/pihole-FTL.service leading to consecutive restarts of the service to fail.

systemctl status pihole-FTL:

× pihole-FTL.service - Pi-hole FTL
     Loaded: loaded (/etc/systemd/system/pihole-FTL.service; enabled; vendor preset: enabled)
     Active: failed (Result: start-limit-hit) since Fri 2023-06-09 19:26:53 UTC; 17s ago
    Process: 3209 ExecStartPre=/opt/pihole/pihole-FTL-prestart.sh (code=exited, status=0/SUCCESS)
    Process: 3222 ExecStart=/usr/bin/pihole-FTL -f (code=exited, status=0/SUCCESS)
    Process: 3245 ExecStopPost=/opt/pihole/pihole-FTL-poststop.sh (code=exited, status=0/SUCCESS)
   Main PID: 3222 (code=exited, status=0/SUCCESS)
        CPU: 165ms

Jun 09 19:26:53 pihole01 pihole-FTL[3222]: [2023-06-09 19:26:52.992 3222M] Resizing "FTL-queries" from 4587520 to (86016 * 56) == 4816896 (/dev/shm: 5.1MB used, 4.2GB total, FTL uses 5.1MB)
Jun 09 19:26:53 pihole01 pihole-FTL[3222]: [2023-06-09 19:26:52.997 3222M] Resizing "FTL-queries" from 4816896 to (90112 * 56) == 5046272 (/dev/shm: 5.4MB used, 4.2GB total, FTL uses 5.3MB)
Jun 09 19:26:53 pihole01 pihole-FTL[3222]: [2023-06-09 19:26:52.001 3222M] Resizing "FTL-queries" from 5046272 to (94208 * 56) == 5275648 (/dev/shm: 5.6MB used, 4.2GB total, FTL uses 5.6MB)
Jun 09 19:26:53 pihole01 pihole-FTL[3222]: [2023-06-09 19:26:53.007 3222M] Resizing "FTL-queries" from 5275648 to (98304 * 56) == 5505024 (/dev/shm: 5.8MB used, 4.2GB total, FTL u
Jun 09 19:26:53 pihole01 systemd[1]: Stopping Pi-hole FTL...
Jun 09 19:26:53 pihole01 systemd[1]: pihole-FTL.service: Deactivated successfully.
Jun 09 19:26:53 pihole01 systemd[1]: Stopped Pi-hole FTL.
Jun 09 19:26:53 pihole01 systemd[1]: pihole-FTL.service: Start request repeated too quickly.
Jun 09 19:26:53 pihole01 systemd[1]: pihole-FTL.service: Failed with result 'start-limit-hit'.
Jun 09 19:26:53 pihole01 systemd[1]: Failed to start Pi-hole FTL.

Steps to reproduce

Steps to reproduce the behavior:

Shell

  1. run this shell script
/usr/local/bin/pihole -a addcustomdns 127.0.0.1 test1.local
/usr/local/bin/pihole -a addcustomdns 127.0.0.1 test2.local
/usr/local/bin/pihole -a addcustomdns 127.0.0.1 test3.local
/usr/local/bin/pihole -a addcustomdns 127.0.0.1 test4.local
/usr/local/bin/pihole -a addcustomdns 127.0.0.1 test5.local
/usr/local/bin/pihole -a addcustomdns 127.0.0.1 test6.local
  1. systemctl status pihole-FTL will now be killed and fail to start with Failed with result 'start-limit-hit'

Webadmin

  1. Add or delete 6 DNS records in under 60 seconds.
  2. systemctl status pihole-FTL will now be killed and fail to start with Failed with result 'start-limit-hit'

Debug Token

I don't think this should be necessary here.

Additional context

I am using the pihole provider from kubernetes-sigs/external-dns in combination with borchero/switchboard to automatically add DNS records for services in my cluster when they get exposed. This worked fine while setting everything up.

However, after exposing more services for external-dns to manage DNS for, pihole-FTL started to behave oddly/refused to start.
The way kubernetes-sigs/external-dns adds the dns records, is by just simply sending POST requests to the same endpoint the webadmin GUI uses (/admin/scripts/pi-hole/php/customdns.php). Due to this API endpoint not offering any update functionality and also the fact that no TXT records are possible, this results in a little bit of a "spammy" behaviour from external-dns. As far as i understand the implementation of the pihole provider on external-dns, there are a couple workarounds in place, due to the API limitations of pihole-FTL. There might also be another bug in the pihole provider for external-dns, but I have not yet spent the time to dig into that side any deeper and I believe that even if this was the case, it would not change validity of this bug report.

For a quick and dirty workaround I adjusted StartLimitBurst in /etc/systemd/system/pihole-FTL.service.

When digging through the source code of pihole I noticed a couple things and I am not exactly sure where this should be fixed.

  1. https://github.com/pi-hole/AdminLTE/blob/3a11976ee8ecc50e2dd9efd76caad1ad41894dd5/scripts/pi-hole/php/func.php#L223
    empty string as default value for the reload argument.

  2. https://github.com/pi-hole/AdminLTE/blob/3a11976ee8ecc50e2dd9efd76caad1ad41894dd5/scripts/pi-hole/php/func.php#L280
    effectively results in pihole -a addcustomdns 127.0.0.1 test1.local (notice neither true nor false are part of the argument for restart)

  3. https://github.com/pi-hole/pi-hole/blob/6a45c6a8e027e1ac30d4556a88f31684bc80ccf1/pihole#L579
    pihole shell script defers to AddCustomDNSAddress in webpage.sh

  4. https://github.com/pi-hole/pi-hole/blob/6a45c6a8e027e1ac30d4556a88f31684bc80ccf1/advanced/Scripts/webpage.sh#L719-L743
    RestartDNS command will be issued due to missing empty reload argument

  5. https://github.com/pi-hole/AdminLTE/blob/3a11976ee8ecc50e2dd9efd76caad1ad41894dd5/scripts/pi-hole/php/func.php#L284
    RestartDNS triggered again. Should this be restartdns reload-lists instead of just restartdns here?

Conclusion

The issue here seems to affect both the AdminLTE as well as the pihole repository, so I am not entirely sure how the best or preferred way of fixing this would look like.

@yubiuser
Copy link
Member

yubiuser commented Jun 9, 2023

Thanks for your detailed analysis, you included all relevant aspects. However, I'm not sure if this is really a bug or not. Pivot is the

StartLimitBurst in /etc/systemd/system/pihole-FTL.service

This was added to prevent endless-start-stop cycles in case of an error. Adding a custom DNS record needs a full FTL restart to take effect. However, if you know you're going to add a lot in a row, $reload can help to postpone the restart until everything is added. This is what we do when we import a (teleporter) backup (see #2519). I'm not familiar with the tools you use to add the records, but as the access the API endpoints directly they always trigger the restart immediately.
I have no good idea how to solve (as in balance) between preventing to-many-restarts and being able to (manually) adding a lot of custom DNS records (other than increasing the limit). I think no one of us thought about adding >5 records manually within 60 seconds.

@DL6ER
I guess the same issue will happen in v6?

@mnlhfr
Copy link
Author

mnlhfr commented Jun 9, 2023

thanks for your quick reply!

I initially thought that a pihole restartdns reload-lists would be enough, but I seem to have missed that during my testing earlier.

However, as far as I can tell it is enough to issue a systemctl reload, which does not cause any issues with the service, even if triggered many times in quick succession.

  1. try to resolv custom DNS record that does not exist
# nslookup testing-01.local localhost
Server:         localhost
Address:        127.0.0.1#53

** server can't find testing-01.local: NXDOMAIN
  1. add the record using cmd utility with reload=false
# pihole -a addcustomdns 127.0.0.1 testing-01.local false
  [✓] Adding custom DNS entry...
  1. record still not in effect
# nslookup testing-01.local localhost
Server:         localhost
Address:        127.0.0.1#53

** server can't find testing-01.local: NXDOMAIN
  1. i initially thought that restartdns reload-lists would do the job, but:
# pihole restartdns reload-lists
  [✓] Reloading DNS lists
# nslookup testing-01.local localhost
Server:         localhost
Address:        127.0.0.1#53

** server can't find testing-01.local: NXDOMAIN
  1. However, if we reload the service using systemctl:
# systemctl reload pihole-FTL
  1. the record is active now
# nslookup testing-01.local localhost
Server:         localhost
Address:        127.0.0.1#53

Name:   testing-01.local
Address: 127.0.0.1

So, just reloading the service with systemctl seems to be enough and doesn't seem to trigger any issues with the service:

  1. StartLimitBurst is set to the default value of 5
# systemctl cat pihole-FTL | grep StartLimitBurst
StartLimitBurst=5
  1. reload the service 100 times
# for i in $(seq 1 100); do systemctl reload pihole-FTL; done
  1. service still in active state
# systemctl status pihole-FTL
● pihole-FTL.service - Pi-hole FTL
     Loaded: loaded (/etc/systemd/system/pihole-FTL.service; enabled; vendor preset: enabled)
     Active: active (running) since Fri 2023-06-09 21:19:01 UTC; 1min 43s ago
    Process: 7317 ExecStartPre=/opt/pihole/pihole-FTL-prestart.sh (code=exited, status=0/SUCCESS)
    Process: 7450 ExecReload=/bin/kill -HUP $MAINPID (code=exited, status=0/SUCCESS)
   Main PID: 7330 (pihole-FTL)
      Tasks: 19 (limit: 9401)
     Memory: 10.4M
        CPU: 279ms
     CGroup: /system.slice/pihole-FTL.service
             └─7330 /usr/bin/pihole-FTL -f

Jun 09 21:20:39 pihole01 systemd[1]: Reloading Pi-hole FTL...
Jun 09 21:20:39 pihole01 systemd[1]: Reloaded Pi-hole FTL.
Jun 09 21:20:39 pihole01 systemd[1]: Reloading Pi-hole FTL...
Jun 09 21:20:39 pihole01 systemd[1]: Reloaded Pi-hole FTL.
Jun 09 21:20:39 pihole01 systemd[1]: Reloading Pi-hole FTL...
Jun 09 21:20:39 pihole01 systemd[1]: Reloaded Pi-hole FTL.
Jun 09 21:20:39 pihole01 systemd[1]: Reloading Pi-hole FTL...
Jun 09 21:20:39 pihole01 systemd[1]: Reloaded Pi-hole FTL.
Jun 09 21:20:39 pihole01 systemd[1]: Reloading Pi-hole FTL...
Jun 09 21:20:39 pihole01 systemd[1]: Reloaded Pi-hole FTL.

@rdwebdesign
Copy link
Member

rdwebdesign commented Jun 9, 2023

Your analysis in not correct here:

https://github.com/pi-hole/AdminLTE/blob/3a11976ee8ecc50e2dd9efd76caad1ad41894dd5/scripts/pi-hole/php/func.php#L280
effectively results in pihole -a addcustomdns 127.0.0.1 test1.local (notice neither true nor false are part of the argument for restart)

When needed, $reload will be set to false on the lines just above the code you posted:

https://github.com/pi-hole/AdminLTE/blob/3a11976ee8ecc50e2dd9efd76caad1ad41894dd5/scripts/pi-hole/php/func.php#L274-L281

Resulting in: pihole -a addcustomdns 127.0.0.1 test1.local false

@mnlhfr
Copy link
Author

mnlhfr commented Jun 9, 2023

Your analysis in not correct here:

https://github.com/pi-hole/AdminLTE/blob/3a11976ee8ecc50e2dd9efd76caad1ad41894dd5/scripts/pi-hole/php/func.php#L280

effectively results in pihole -a addcustomdns 127.0.0.1 test1.local (notice neither true nor false are part of the argument for restart)

When needed, $reload will be set to false on the lines just above the code you posted:

https://github.com/pi-hole/AdminLTE/blob/3a11976ee8ecc50e2dd9efd76caad1ad41894dd5/scripts/pi-hole/php/func.php#L274-L281

Resulting in: pihole -a addcustomdns 127.0.0.1 test1.local false

you are correct, thanks for pointing that out! and sorry for the confusing!

@yubiuser
Copy link
Member

yubiuser commented Jun 9, 2023

However, as far as I can tell it is enough to issue a systemctl reload, which does not cause any issues with the service, even if triggered many times in quick succession.

I think you are correct. The reload triggers the read of custom.list by dnsmasq

Jun  9 23:35:39 dnsmasq[3578109]: read /etc/hosts - 7 names
Jun  9 23:35:39 dnsmasq[3578109]: read /etc/pihole/custom.list - 22 names
Jun  9 23:35:39 dnsmasq[3578109]: read /etc/pihole/local.list - 0 names

Note: this will only work for the custom DNS records, not custom CNAME records (as files in /etc/dnsmasq.d/ are not read by reload)

@yubiuser
Copy link
Member

yubiuser commented Jun 9, 2023

Please try if

pihole checkout core no_reload fixes the issue for you.

@mnlhfr
Copy link
Author

mnlhfr commented Jun 9, 2023

Please try if

pihole checkout core no_reload fixes the issue for you.

not quite, but when I additionally add the reload to the func.php as well, it seems to be working:
https://github.com/pi-hole/AdminLTE/blob/3a11976ee8ecc50e2dd9efd76caad1ad41894dd5/scripts/pi-hole/php/func.php#L284

thanks!

@yubiuser
Copy link
Member

yubiuser commented Jun 9, 2023

Good catch.

We did not plan to release any new v5 version and focus on v6 - however the changes necessary here are trivial after the bug was dissected. We'll discuss internally how to proceed.

@DL6ER
Copy link
Member

DL6ER commented Jun 10, 2023

I guess the same issue will happen in v6?

No. v6 tries to be "more clever" than many of the elements we have in v5. This involves trying to minimize restarting of FTL to as seldom as possible. Custom DNS records are a prime example. When I coded this part of the v6 interface, I very much disliked that the DNS cache is completely flushed by a restart/reload so I changed two things: Firstly, this file is now in a watched directors (hostsdir) so FTL finds itself out when the file is changed (no need to send a signal at all). Secondly, I submitted a patch upstream into dnsmasq (it is already included in the current release of dnsmasq) that ensures the DNS cache is not completely flushed but only those elements from the updated list are removed (and then repopulated).

@yubiuser
Copy link
Member

After internal discussion we decided to not release a new v5 version. Reasons are

  • issue exists a really long time - this is the first time someone noticed it
  • it's fixed for the affected user
  • it's fixed in v6 already.

@BoKKeR
Copy link

BoKKeR commented Feb 1, 2024

@PromoFaux PromoFaux linked a pull request May 8, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants