Pi-hole DNS resolver service stopped unexpectedly

FabledGumbo · August 25, 2021, 2:15am

Expected Behaviour:

[Uninterrupted DNS resolution]

Actual Behaviour:

[Pi-hole's DNS resolver service was stopped, according to the device's web interface. Rebooting the pi-hole did not restart the DNS resolver service, but using the 'Restart DNS resolver' button in the web interface did. The pi-hole was still accessible via SSH and the web interface while the DNS resolver was not running.]

Debug Token:

[https://tricorder.pi-hole.net/0UryRv8f/]

jfb · August 25, 2021, 3:16am

There is nothing out of the ordinary in your debug log. Look in the following logs for any errors or unusual activity around the time the Pi-hole stopped working.

/var/log/pihole-FTL.log

/var/log/syslog

FabledGumbo · August 25, 2021, 5:21am

Unfortunately, I'm not sure of the exact time the problem occurred, but I don't think the logs are going to be very helpful anyway.

There's no syslog in the directory you mentioned, and my pihole-FTL.log is empty.

MichaIng · August 25, 2021, 2:01pm

You can use journalctl to see all system logs. To check Pi-hole FTL service state changes explicitly:

journalctl -u pihole-FTL

FabledGumbo · August 25, 2021, 8:33pm

The issue reoccurred today and my pi-hole gave me similar advice, actually. I wasn't able to restart the pihole-FTL.service this time and received the following error message:

[✗] Job for pihole-FTL.service failed because a timeout was exceeded.
See "systemctl status pihole-FTL.service" and "journalctl -xe" for details.

Here's the output of systemctl status pihole-FTL.service:

● pihole-FTL.service - LSB: pihole-FTL daemon
   Loaded: loaded (/etc/init.d/pihole-FTL; generated)
   Active: failed (Result: timeout) since Wed 2021-08-25 21:16:37 BST; 1min 37s ago
     Docs: man:systemd-sysv-generator(8)
  Process: 17715 ExecStart=/etc/init.d/pihole-FTL start (code=killed, signal=TERM)

Aug 25 21:11:37 DietPi systemd[1]: Starting LSB: pihole-FTL daemon...
Aug 25 21:11:37 DietPi pihole-FTL[17545]: Terminated
Aug 25 21:11:38 DietPi pihole-FTL[17715]: .
Aug 25 21:11:38 DietPi pihole-FTL[17715]: Stopped
Aug 25 21:11:38 DietPi su[17738]: (to pihole) root on none
Aug 25 21:11:38 DietPi su[17738]: pam_unix(su:session): session opened for user pihole by (uid=0)
Aug 25 21:16:37 DietPi systemd[1]: pihole-FTL.service: Start operation timed out. Terminating.
Aug 25 21:16:37 DietPi systemd[1]: pihole-FTL.service: Control process exited, code=killed, status=15/TERM
Aug 25 21:16:37 DietPi systemd[1]: pihole-FTL.service: Failed with result 'timeout'.
Aug 25 21:16:37 DietPi systemd[1]: Failed to start LSB: pihole-FTL daemon.
root@DietPi:~# Job for pihole-FTL.service failed because a timeout was exceeded.

And here's the output of journalctl -xe (posted on Pastebin.com): -- Logs begin at Wed 2021-08-25 17:34:40 BST, end at Wed 2021-08-25 21:20:02 BST

MichaIng · August 25, 2021, 9:30pm

The service start times out.

In the journal logs I see a lot of entries from "dnsmasq". I'm not sure whether FTL's embedded dnsmasq logs under that identifier, or whether a dedicated dnsmasq is installed on your system.

I also see that Unbound is running. Did that work before, i.e. did you configure Unbound to listen on a different port such as 5335?

If neither of these is a conflict, please try the following (as root or via sudo):

systemctl restart pihole-FTL
# It may take a while until the start attempt times out and fails
cat /var/log/pihole-FTL.log
# If the log file does not contain anything relevant or anything at all:
bash -x /etc/init.d/pihole-FTL start

The last command calls the service script in debug mode so that every single command and condition within is printed to console. That way we can see where exactly it hangs.

jfb · August 25, 2021, 9:37pm

On none of my various Pis (Raspbian, Armbian or DietPi) do I see any dnsmasq or pihole-FTL entries with journalctl -xe.

From the OP's debug log:

*** [ DIAGNOSING ]: Ports in use
127.0.0.1:5053 cloudflare (IPv4)
127.0.0.1:39531 cloudflare (IPv4)
[::1]:3350 xrdp-sesma (IPv6)
*:22 dropbear (IPv4)
*:22 dropbear (IPv6)
*:3389 xrdp (IPv6)
*:5335 unbound (IPv4)
*:5335 unbound (IPv6)
*:5335 unbound (IPv4)
*:5335 unbound (IPv6)
*:5335 unbound (IPv4)
*:5335 unbound (IPv6)
*:5335 unbound (IPv4)
*:5335 unbound (IPv6)
[::1]:8953 unbound (IPv6)
127.0.0.1:8953 unbound (IPv4)
*:5901 Xtigervnc (IPv4)
*:5901 Xtigervnc (IPv6)
[53] is in use by pihole-FTL
[53] is in use by pihole-FTL
[4711] is in use by pihole-FTL
[4711] is in use by pihole-FTL
[80] is in use by lighttpd
[80] is in use by lighttpd

FabledGumbo · August 25, 2021, 9:51pm

I'm not sure if there's a dedicated dnsmasq installed or not. For context, I originally configured the pi-hole to be a recursive DNS resolver using Unbound, but I dropped it because the queries on uncached pages were considerably slower than not using a pi-hole at all. Since then, I converted the pi-hole to a DoH resolver by using Cloudflared, which is how it's configured currently.

I don't think this is actually a pi-hole issue, to be honest. I recently discovered that random wireless clients on my LAN are unintentionally DOS'ing my router via DHCP renewal requests, and I recently made changes at my router to try and mitigate the bad behavior. Since then, the traffic on my pi-hole has ballooned dramatically, which I assume is what caused pihole-FTL to crash the last two days.

Sorry to cross-post, but here's the OpenWrt Forum thread where I'm trying to troubleshoot the LAN DOS'ing issue: "Limiting active connections per client using rate limiting and conntrack"

At this point I'm not sure if it's better to close this thread since I don't think pi-hole is to blame, or if anyone here might happen to have any suggestions for configuration changes I can make to my pi-hole to make it more resilient against being flooded.

jfb · August 25, 2021, 10:00pm

11.8 million queries in 24 hours likely led to memory exhaustion on the Pi. That's a lot of queries.

From your debug log I see you have conditional forwarding enabled. Try disabling that - this can frequently be a source of looping DNS traffic.

    REV_SERVER=true
    REV_SERVER_CIDR=192.168.1.0/24
    REV_SERVER_TARGET=192.168.1.1
    REV_SERVER_DOMAIN=

We can check for loops with the output of these commands from the Pi terminal:

echo ">stats >quit" | nc localhost 4711

echo ">top-clients >quit" | nc localhost 4711

echo ">top-domains >quit" | nc localhost 4711

MichaIng · August 25, 2021, 10:09pm

This shouldn't be the issue, but since you no longer use Unbound, and to make sure no standalone dnsmasq is installed, you can remove both:

apt purge dnsmasq
apt purge --autoremove unbound

With that many requests, the timeout probably happens because Pi-hole takes too long to load the last 24 hours of queries into memory for the dashboard. That's not an issue on a capable server, but it can be on an RPi.
EDIT: Ah, jfb is thinking along the same lines.

What is the actual size of the database?

ls -l /etc/pihole/pihole-FTL.db

FabledGumbo · August 25, 2021, 10:48pm

I just tried following the advice from both of you.

I tried disabling conditional forwarding from the web interface GUI and I think the change took, but it knocked out pihole-FTL.service again.

Here's the output of those commands in the same order that you asked me to run them:

echo ">stats >quit" | nc localhost 4711
domains_being_blocked 1621809
dns_queries_today 11822612
ads_blocked_today 9701
ads_percentage_today 0.082055
unique_domains 1980
queries_forwarded 11353742
queries_cached 7732
clients_ever_seen 40
unique_clients 40
dns_queries_all_types 11822612
reply_NODATA 10
reply_NXDOMAIN 44
reply_CNAME 68
reply_IP 71
privacy_level 0
status enabled

echo ">top-clients >quit" | nc localhost 4711
0 5885044 192.168.1.1 OpenWrt.lan
1 5885033 fd1a:25d6:5a17::1 DietPi

echo ">top-domains >quit" | nc localhost 4711
0 11769535 lb._dns-sd._udp.0.1.168.192.in-addr.arpa

apt purge dnsmasq did not remove anything because there was nothing to remove, but apt purge --autoremove unbound successfully removed Unbound.

Here's the output of ls -l /etc/pihole/pihole-FTL.db: -rw-r--r-- 1 pihole pihole 2540281856 Aug 25 23:37 /etc/pihole/pihole-FTL.db

I'm guessing that means the database is about 2.5 GB.
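
As a quick check on that figure, the byte count printed by ls -l can be converted by hand. The sketch below is only an illustration: bytes_to_gib is a hypothetical helper name, not a Pi-hole command, and it assumes awk is available on the system. It reports binary gibibytes, so the result is a little lower than the decimal 2.5 GB.

```shell
#!/bin/sh
# bytes_to_gib BYTES: print BYTES expressed in GiB (2^30 bytes), two decimals.
# Hypothetical helper for illustration; not part of Pi-hole.
bytes_to_gib() {
    awk -v bytes="$1" 'BEGIN { printf "%.2f\n", bytes / (1024 * 1024 * 1024) }'
}

# Byte count copied from the ls -l output above.
bytes_to_gib 2540281856
```

On a live system, ls -lh or du -h on the same path gives a human-readable size without any arithmetic.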

jfb · August 25, 2021, 11:19pm

FabledGumbo:

echo ">top-clients >quit" | nc localhost 4711
0 5885044 192.168.1.1 OpenWrt.lan
1 5885033 fd1a:25d6:5a17::1 DietPi

echo ">top-domains >quit" | nc localhost 4711
0 11769535 lb._dns-sd._udp.0.1.168.192.in-addr.arpa

That's your problem. The arpa query is a DNS-SD (DNS-based Service Discovery) browse query, the kind sent by mDNS/Bonjour-capable clients. Query volume is likely exacerbated by the conditional forwarding.
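
To put a number on how dominant that one domain is, the counters from the >stats and >top-domains output can be compared directly. This is a minimal sketch, assuming awk is installed; share_pct is a hypothetical helper name, and the figures are copied from the output quoted in this thread.

```shell
#!/bin/sh
# share_pct PART TOTAL: print PART as a percentage of TOTAL, two decimals.
# Hypothetical helper for illustration; not part of Pi-hole.
share_pct() {
    awk -v part="$1" -v total="$2" 'BEGIN {
        if (total <= 0) { print "n/a"; exit }
        printf "%.2f\n", 100 * part / total
    }'
}

# Figures copied from the >stats and >top-domains output in this thread.
total=11822612        # dns_queries_today
top_domain=11769535   # lb._dns-sd._udp.0.1.168.192.in-addr.arpa
forwarded=11353742    # queries_forwarded

echo "top domain share: $(share_pct "$top_domain" "$total")%"
echo "forwarded share:  $(share_pct "$forwarded" "$total")%"
```

With these inputs the single arpa name accounts for roughly 99.55% of the day's queries and about 96.03% of all queries were forwarded upstream, which is consistent with a forwarding loop rather than ordinary client traffic.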

The database is 2.5 GB, which is pretty large. It stores all your queries, and at 11.8 million a day it grows quickly. If you don't mind losing your database (which is so large it has limited usefulness), run the following commands to delete it and start a new one. With conditional forwarding disabled, your queries should be under control, the database won't grow so quickly, and Pi-hole will be able to load the previous 24 hours of data on a restart without failing.

Note the alternative option below if you want to keep your existing database for future reference.

Remove the database and start fresh:

sudo service pihole-FTL stop

sudo rm /etc/pihole/pihole-FTL.db

sudo service pihole-FTL start

Save the old database and start fresh:

sudo service pihole-FTL stop

sudo mv /etc/pihole/pihole-FTL.db /etc/pihole/pihole-FTL-old.db

sudo service pihole-FTL start
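
If the database might need to be set aside more than once, a timestamped name avoids overwriting an earlier copy. The following is only a sketch of the "save old database" variant: archive_db is a hypothetical helper, the .bak naming scheme is an assumption, and the service stop and start steps shown above still need to be run around it.

```shell
#!/bin/sh
# archive_db PATH: rename PATH to PATH.<YYYYmmdd-HHMMSS>.bak and print the new name.
# Hypothetical helper for illustration; refuses to overwrite an existing file.
archive_db() {
    db="$1"
    if [ ! -f "$db" ]; then
        echo "no such file: $db" >&2
        return 1
    fi
    dest="${db}.$(date +%Y%m%d-%H%M%S).bak"
    if [ -e "$dest" ]; then
        echo "refusing to overwrite $dest" >&2
        return 1
    fi
    mv -- "$db" "$dest" && echo "$dest"
}

# Intended use, run as root between the stop and start commands above:
#   archive_db /etc/pihole/pihole-FTL.db
```

The function only renames the file, so it needs about as much free space as the plain mv shown above.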

FabledGumbo · August 26, 2021, 12:38am

I went ahead and dumped the database. I'll leave the pi-hole running without conditional forwarding and see if it breaks the next time the wireless clients on my LAN freak out.

FabledGumbo · August 31, 2021, 8:26am

My pi-hole hasn't crashed since I disabled conditional forwarding and dumped the database, so I marked jfb's suggestion to disable conditional forwarding as the solution.
