Expected Behaviour:
[Uninterrupted DNS resolution]
Actual Behaviour:
[Pi-hole's DNS resolver service was stopped according to device's web interface. Rebooting the pi-hole did not restart the DNS resolver service, but using the 'Restart DNS resolver' button in the web interface did. The pi-hole was still accessible via SSH and web interface while the DNS resolver was not running.]
There is nothing out of the ordinary in your debug log. Look in the following logs for any errors or unusual activity around the time the Pi-hole stopped working.
The issue reoccurred today and my pi-hole gave me similar advice, actually. I wasn't able to restart the pihole-FTL.service this time and received the following error message:
[✗] Job for pihole-FTL.service failed because a timeout was exceeded.
See "systemctl status pihole-FTL.service" and "journalctl -xe" for details.
Here's the output of systemctl status pihole-FTL.service:
● pihole-FTL.service - LSB: pihole-FTL daemon
Loaded: loaded (/etc/init.d/pihole-FTL; generated)
Active: failed (Result: timeout) since Wed 2021-08-25 21:16:37 BST; 1min 37s ago
Docs: man:systemd-sysv-generator(8)
Process: 17715 ExecStart=/etc/init.d/pihole-FTL start (code=killed, signal=TERM)
Aug 25 21:11:37 DietPi systemd[1]: Starting LSB: pihole-FTL daemon...
Aug 25 21:11:37 DietPi pihole-FTL[17545]: Terminated
Aug 25 21:11:38 DietPi pihole-FTL[17715]: .
Aug 25 21:11:38 DietPi pihole-FTL[17715]: Stopped
Aug 25 21:11:38 DietPi su[17738]: (to pihole) root on none
Aug 25 21:11:38 DietPi su[17738]: pam_unix(su:session): session opened for user pihole by (uid=0)
Aug 25 21:16:37 DietPi systemd[1]: pihole-FTL.service: Start operation timed out. Terminating.
Aug 25 21:16:37 DietPi systemd[1]: pihole-FTL.service: Control process exited, code=killed, status=15/TERM
Aug 25 21:16:37 DietPi systemd[1]: pihole-FTL.service: Failed with result 'timeout'.
Aug 25 21:16:37 DietPi systemd[1]: Failed to start LSB: pihole-FTL daemon.
root@DietPi:~# Job for pihole-FTL.service failed because a timeout was exceeded.
In the journal logs I see a lot of entries from "dnsmasq". I'm not sure whether FTL's embedded dnsmasq logs under that identifier, or whether a dedicated dnsmasq is installed on your system.
I also see that Unbound is running. Did that work before, i.e. did you configure Unbound to listen on a different port such as 5335?
If neither of these is a conflict, please try the following (as root or via sudo):
systemctl restart pihole-FTL
# It may take a while until the start attempt times out and fails
cat /var/log/pihole-FTL.log
# If the log file does not contain anything relevant or anything at all:
bash -x /etc/init.d/pihole-FTL start
The last command runs the service script in debug mode, so every command and condition in it is printed to the console. That way we can see exactly where it hangs.
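Since the trace can get long, it may help to save it to a file so it can be posted here. A minimal sketch, assuming bash is available; the function name trace_script and the log path in the example are hypothetical, not part of Pi-hole. bash -x writes its trace to stderr, so both streams are merged before being piped to tee.

```shell
# trace_script SCRIPT LOGFILE [ARGS...]
# Runs SCRIPT under `bash -x` and saves stdout plus the xtrace output
# (which bash writes to stderr) to LOGFILE, while still showing it on screen.
# Hypothetical helper for illustration only.
trace_script() {
    local script="$1"
    local logfile="$2"
    shift 2
    bash -x "$script" "$@" 2>&1 | tee "$logfile"
}

# Example (as root or via sudo):
#   trace_script /etc/init.d/pihole-FTL /tmp/pihole-FTL-trace.log start
```

The last lines of the saved file should show the command the init script was running when it stalled.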
On none of my various Pi's (Raspbian, Armbian or DietPi) do I see any dnsmasq entries or pihole-FTL entries with journalctl -xe
From the OP's debug log:
*** [ DIAGNOSING ]: Ports in use
127.0.0.1:5053 cloudflare (IPv4)
127.0.0.1:39531 cloudflare (IPv4)
[::1]:3350 xrdp-sesma (IPv6)
*:22 dropbear (IPv4)
*:22 dropbear (IPv6)
*:3389 xrdp (IPv6)
*:5335 unbound (IPv4)
*:5335 unbound (IPv6)
*:5335 unbound (IPv4)
*:5335 unbound (IPv6)
*:5335 unbound (IPv4)
*:5335 unbound (IPv6)
*:5335 unbound (IPv4)
*:5335 unbound (IPv6)
[::1]:8953 unbound (IPv6)
127.0.0.1:8953 unbound (IPv4)
*:5901 Xtigervnc (IPv4)
*:5901 Xtigervnc (IPv6)
[53] is in use by pihole-FTL
[53] is in use by pihole-FTL
[4711] is in use by pihole-FTL
[4711] is in use by pihole-FTL
[80] is in use by lighttpd
[80] is in use by lighttpd
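In the listing above, Unbound is bound to 5335 and port 53 is held by pihole-FTL. To re-check that outside the debug log, here is a small sketch that filters ss output by local port. It assumes the usual `ss -H -tuln` column layout (local address:port in the fifth column); the function name listeners_on_port is hypothetical.

```shell
# listeners_on_port PORT
# Reads `ss -H -tuln` style output on stdin and prints only the sockets
# whose local port equals PORT. Hypothetical helper for illustration.
listeners_on_port() {
    awk -v port="$1" '{
        n = split($5, parts, ":")   # 5th column is local address:port
        if (parts[n] == port) print # last ":"-separated piece is the port
    }'
}

# Example (as root; -p appends the owning process name):
#   ss -H -tulnp | listeners_on_port 53
```

If anything other than pihole-FTL shows up on port 53, that would point to a conflicting resolver.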
I'm not sure if there's a dedicated dnsmasq installed or not. For context, I originally configured the pi-hole to be a recursive DNS resolver using Unbound, but I dropped it because the queries on uncached pages were considerably slower than not using a pi-hole at all. Since then, I converted the pi-hole to a DoH resolver by using Cloudflared, which is how it's configured currently.
I don't think this is actually a pi-hole issue, to be honest. I recently discovered that random wireless clients on my LAN are unintentionally DOS'ing my router via DHCP renewal requests, and I recently made changes at my router to try and mitigate the bad behavior. Since then, the traffic on my pi-hole has ballooned dramatically, which I assume is what caused pihole-FTL to crash the last two days.
At this point I'm not sure if it's better to close this thread since I don't think pi-hole is to blame, or if anyone here might happen to have any suggestions for configuration changes I can make to my pi-hole to make it more resilient against being flooded.
While this shouldn't be the issue, since you no longer use Unbound, and to make sure no standalone dnsmasq is installed:
apt purge dnsmasq
apt purge --autoremove unbound
With that number of requests, the timeout probably happens because Pi-hole takes too long to load all those queries (the last 24h) into memory for the dashboard. Not an issue on a capable server, but on an RPi? EDIT: Ah, jfb is thinking in the same direction.
That's your problem. The arpa query is an mDNS query (multicast DNS). Query volume is likely exacerbated by the conditional forwarding.
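To see which clients generate the bulk of the queries, the query log can be tallied per client. A rough sketch, assuming query logging is enabled, the log lives at /var/log/pihole.log (adjust if yours differs), and query lines follow the dnsmasq form "query[TYPE] name from client". The function name top_clients is hypothetical.

```shell
# top_clients LOGFILE [N]
# Prints "count client" for the N busiest clients (default 10),
# based on dnsmasq-style "query[TYPE] name from client" log lines.
# Hypothetical helper for illustration only.
top_clients() {
    local logfile="$1"
    local n="${2:-10}"
    awk '$(NF-1) == "from" && $(NF-3) ~ /^query\[/ { print $NF }' "$logfile" |
        sort | uniq -c | sort -rn | head -n "$n" | awk '{ print $1, $2 }'
}

# Example:
#   top_clients /var/log/pihole.log 5
```

A client that dwarfs the others here is a likely source of the flood.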
That is 2.5 GB, pretty large. The database is storing all your queries, and with 11.8 million a day it grows pretty quickly. If you don't mind losing your database (which is so large it has limited usefulness), run the following commands to delete it and start a new one. With conditional forwarding disabled, your queries should be under control and the database won't grow so quickly, and Pi-hole will be able to load the previous 24 hours of data on a restart without failing.
Note the alternate option below if you want to keep your existing database for future reference.
Remove the database and start fresh:
sudo service pihole-FTL stop
sudo rm /etc/pihole/pihole-FTL.db
sudo service pihole-FTL start
Save the old database and start fresh:
sudo service pihole-FTL stop
sudo mv /etc/pihole/pihole-FTL.db /etc/pihole/pihole-FTL-old.db
sudo service pihole-FTL start
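Before deleting or moving the database, it can be worth checking how big it is and how many queries were logged in the last 24 hours. A sketch, assuming the sqlite3 command-line tool is installed and that the long-term database exposes a queries table with a Unix timestamp column; the helper names are hypothetical.

```shell
# file_size_mib FILE
# Prints the size of FILE in whole MiB (rounded down).
file_size_mib() {
    local bytes
    bytes=$(wc -c < "$1")
    echo $(( bytes / 1048576 ))
}

# queries_last_24h DB
# Counts rows logged in the last 24 hours.
# Assumes a `queries` table with an integer `timestamp` column.
queries_last_24h() {
    sqlite3 "$1" "SELECT COUNT(*) FROM queries WHERE timestamp >= CAST(strftime('%s','now','-1 day') AS INTEGER);"
}

# Example (ideally while pihole-FTL is stopped, to avoid lock contention):
#   file_size_mib /etc/pihole/pihole-FTL.db
#   queries_last_24h /etc/pihole/pihole-FTL.db
```

Running the same two checks a few days after starting fresh shows whether the query volume is really under control.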
I went ahead and dumped the database. I'll leave the pi-hole running without conditional forwarding and see if it breaks the next time the wireless clients on my LAN freak out.
My pi-hole hasn't crashed since I disabled conditional forwarding and dumped the database, so I marked jfb's suggestion to disable conditional forwarding as the solution.