Pi-hole 5 suddenly stops resolving DNS requests


Actual Behaviour:

pihole-FTL stops resolving DNS requests for unknown reasons, sometimes after some hours, sometimes after a few seconds. I cannot see anything in the logs. I tried updating to the latest development branch, without success. After restarting the DNS resolver using 'pihole restartdns' or 'service pihole-FTL restart', it works again, but after some time it stops resolving DNS requests again.

'pihole -t' shows the following output when the issue occurs:

root@raspi1:/home/pi# pihole -t
  [i] Press Ctrl-C to exit
11:27:00: forwarded 71.178.168.192.in-addr.arpa to 192.168.178.1
11:27:00: reply 192.168.178.71 is kasa71-hs100.fritz.box
11:27:00: reply 192.168.178.71 is HS100.fritz.box
11:27:00: query[PTR] 5.6.6.0.8.c.e.c.0.b.f.8.4.f.1.2.0.0.0.0.0.0.0.0.0.0.0.0.0.0.d.f.ip6.arpa from 127.0.0.1
11:27:00: config fd00::21f4:8fb0:cec8:665 is NXDOMAIN
11:27:00: query[PTR] c.5.e.1.3.e.d.3.1.7.f.2.5.5.8.c.0.0.0.0.0.0.0.0.0.0.0.0.0.0.d.f.ip6.arpa from 127.0.0.1
11:27:00: config fd00::c855:2f71:3de3:1e5c is NXDOMAIN
11:27:00: query[PTR] 1.3.0.5.5.e.5.7.3.8.d.8.9.8.4.5.0.0.0.0.0.0.0.0.0.0.0.0.0.0.d.f.ip6.arpa from 127.0.0.1
11:27:00: config fd00::5489:8d83:75e5:5031 is NXDOMAIN
11:27:01: query[A] login.microsoftonline.com from fd00::c855:2f71:3de3:1e5c

That's it, no more output, and nothing happens in the log. When I restart the FTL service from a different shell, the log continues for some time before it stops again on a query line.

Expected Behaviour:

pihole.FTL should work without interruptions :slight_smile:

Debug Token:

https://tricorder.pi-hole.net/znv6s3r2dd

If you are looking for stability, the dev branch is not what you want to be running. It is developmental, and not a stable release.

I recommend going back to the stable master with pihole checkout master, then we can troubleshoot from there.

I tried several versions without success. Even "master" versions sometimes stop working. Reinstalling or reconfiguring the whole system did not help either. I am back on master now, I recreated pihole-FTL.db, and resolving still stops sometimes for unknown reasons. Nothing special in pihole-FTL.log, nothing in the pihole -t output...

  Current Pi-hole version is v5.1.2.
  Current AdminLTE version is v5.1.1.
  Current FTL version is v5.2.

I have also been seeing this issue on the dev branches.
Please add

DEBUG_LOCKS=true
DEBUG_DATABASE=true

to /etc/pihole/pihole-FTL.conf and run pihole restartdns.

The log file (likely the entire one) is of interest when the outage happens. Once you have restored DNS, you can upload it to tricorder and then we can take a look.
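For anyone who prefers to script this, here is a minimal sketch of adding the two debug settings without creating duplicate lines. The helper name set_conf_option and the CONF variable are made up for this example and are not part of Pi-hole. By default the sketch writes to a temporary file, so it can be tried safely; it only assumes the file uses plain KEY=VALUE lines.

```shell
#!/bin/sh
# set_conf_option FILE KEY VALUE (hypothetical helper, not a Pi-hole command):
# set KEY=VALUE in FILE, replacing an existing KEY= line instead of duplicating it.
set_conf_option() {
    file=$1
    key=$2
    value=$3
    touch "$file"
    if grep -q "^${key}=" "$file"; then
        # Rewrite the existing line through a temporary copy.
        tmp=$(mktemp)
        sed "s|^${key}=.*|${key}=${value}|" "$file" > "$tmp" && cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Assumption: CONF defaults to a scratch file so nothing real is modified.
CONF=${CONF:-$(mktemp)}
set_conf_option "$CONF" DEBUG_LOCKS true
set_conf_option "$CONF" DEBUG_DATABASE true
```

To apply it for real, run the script as root with CONF=/etc/pihole/pihole-FTL.conf and follow it with pihole restartdns.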

Just to complement technicalpyro's hint for uploading files into Pi-hole's tricorder: You can execute

cat /var/log/pihole-FTL.log | pihole tricorder

or substitute with any relevant other log, and then post the token(s) here.
(see also How do I debug my Pi-hole installation? for other ways)
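Since the outage leaves no trace in the log, it can help to record when resolution actually stalled, so the relevant part of pihole-FTL.log is easier to find. The following is only a rough sketch: the function names probe and check_once and the log path in the comment are made up for this example, and it assumes dig is installed and that Pi-hole listens on 127.0.0.1.

```shell
#!/bin/sh
# probe NAME: succeed only if the local resolver replies for NAME within about 2 seconds.
# dig exits non-zero when it gets no reply from the server.
probe() {
    dig +time=2 +tries=1 +short @127.0.0.1 "$1" >/dev/null 2>&1
}

# check_once NAME LOGFILE: run one probe and append a timestamped line on failure.
check_once() {
    if probe "$1"; then
        return 0
    fi
    printf '%s no answer for %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$1" >> "$2"
    return 1
}

# Example loop, left commented out so that sourcing this file has no side effects:
# while :; do check_once pi.hole /tmp/dns-stall.log; sleep 30; done
```

The timestamps written by check_once can then be compared with the last lines in the FTL log before the restart.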


Yeah, development contains the change that Pi-hole stores one hostname per address instead of one hostname per device (so multiple hostnames per device are possible). This is important in case of ISP-provided IPv6 hostnames. It required a non-backwards-compatible extension to the database.

Just so I get this right: there was nothing in the log between 04:59:00 and 08:32:21? Was there also no activity at all during that time?


It was exactly what we fixed for clients in pi-hole/FTL#889, but this time the bug showed up while resolving the host names of upstream destinations that had not been active for some time. This has never happened on our test systems, so we didn't see it before. Sorry.

It would be awesome if you could check whether

pihole checkout ftl fix/development_locked_old_upstreams

fixes the issue you are seeing.

Yeah, this sounds plausible given the bug that the mentioned PR fixes. BTW, it has already been merged into development, so there is no need to switch to the custom branch.

The issue was that when FTL saw an upstream destination which had not been used for two hours, it could end up in a deadlock. This never happened to me because my VoIP phone sends about two queries every hour (24/7), so there is activity even when everything else is offline. This should now be fixed.

Hmm, interesting. How old was the gravity database you restored from the backup?

The error is not critical, the trigger is added thereafter. If it wasn't there, then it didn't need to be removed.

It should have been created in the upgrade step from version 2 to 3:

Apparently, your backup was somehow broken, but I still don't expect any issues from that.
