A TP-Link Deco X55 mesh system is running as modem + DHCP server, with a single Pi-hole as its DNS server. This configuration functions as expected 95% of the time. The Deco system has a scheduled daily reboot at 3AM.
Raspberry Pi 5/8GB
Pi-hole [v5.18.3] FTL [v5.25.2] Web Interface [v5.21]
The Pi is also running Unbound.
Actual Behavior:
I have had several instances (one this morning) where the Pi becomes unresponsive (web portal down, no DNS, etc.) and takes several reboots to recover. I haven't figured out a time, schedule, or pattern. As part of that recovery I typically have to remove it as the DNS server on the DHCP scope - I suspect it's unable to recover while getting hammered with requests. The issue this morning appears to have occurred when the 3AM reboot happened, and I see the number of DNS requests shoot through the roof. The Pi also failed last night around 8PM.
As part of troubleshooting I've disabled the use of Unbound and also noted that I could be hitting the rate limit for a 60s period during morning reboots for whatever reason.
I increased the rate limit from 1000 to 2000 as well.
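To see which clients actually come close to that limit, a rough per-client count from pihole.log can help. The sketch below is only an approximation: it assumes the usual dnsmasq `query[TYPE] name from client` log lines, it buckets by calendar minute rather than by FTL's own 60-second window, and the function name `queries_per_minute` is made up for illustration.

```shell
# Count DNS queries per client per minute from pihole.log-style lines on stdin.
# Output format: "<count> <HH:MM> <client>", busiest first.
# Assumes the syslog-style layout "Mon DD HH:MM:SS dnsmasq[pid]: query[A] name from client".
queries_per_minute() {
  awk '$5 ~ /^query\[/ && $(NF-1) == "from" {
         minute = substr($3, 1, 5)        # HH:MM taken from HH:MM:SS
         count[minute " " $NF]++          # key: minute plus client address
       }
       END { for (k in count) print count[k], k }' |
    sort -k1,1nr -k2,2 -k3,3
}

# Example (path as used in this thread):
#   queries_per_minute < /var/log/pihole/pihole.log | head
```

Any client showing counts near the configured limit around the 3AM reboot would be worth a closer look.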
Do you have Pi-hole "Conditional Forwarding" configured and pointed at your router IP?
And do you also have the Pi-hole IP configured in the router WAN DNS settings?
If so, this closes a partial DNS loop that can trigger rate-limiting.
No Conditional Forwarding configured, I suppose I could add that in?
Sadly the TP-Link unit does not allow you to configure the WAN DNS - it can only be configured on the DHCP scope it's providing. There is also no way to disable DHCP services on the Deco.
Yes.
Just as long as you don't close a DNS loop by configuring the Pi-hole IP in the WAN/Internet DNS settings on the router.
Don't need to (see also above).
Entering the Pi-hole IP in the LAN DHCP server DNS settings is sufficient.
Usually no other router settings need changing from factory defaults.
Could you post the output of the four commands below?
Might want to redact some bits!
nc localhost 4711 <<< '>stats >quit'
nc localhost 4711 <<< '>top-domains >quit'
nc localhost 4711 <<< '>top-clients >quit'
sudo pihole-FTL dhcp-discover
Did you check the logs in the folder below around the time of the crashes?
/var/log/pihole/
The .gz archived ones can be browsed with the zless command.
Yup! I'm familiar with that guide and this is how it is setup. Appreciate the link to it.
Will work on grabbing output from above.
Regarding undervoltage/brownouts, this was something that crossed my mind. While I am using an official RPi USB-C brick, I do not have it plugged into a battery backup - just a surge protector. So that could very well be an issue here.
pihole.log quite literally stops recording until it reboots (crash noted at 17:10):
Dec 12 17:08:06 dnsmasq[4334]: forwarded dns.msftncsi.com to 1.0.0.1
Dec 12 17:08:06 dnsmasq[4334]: reply dns.msftncsi.com is 131.107.255.255
Dec 12 17:11:44 dnsmasq[1408]: started, version pi-hole-v2.90+1 cachesize 10000
Dec 12 17:11:44 dnsmasq[1408]: compile time options: IPv6 GNU-getopt no-DBus no-UBus no-i18n IDN DHCP DHCPv6 Lua TFTP no-conntrack ipset no-nftset auth cryptohash DNSSEC loop-detect inotify dumpfile
Dec 12 17:11:44 dnsmasq[1408]: using nameserver 1.1.1.1#53
I also see this from running
journalctl -b -1
Dec 12 16:39:01 raspberrypi systemd[1]: Finished phpsessionclean.service - Clean php session files.
Dec 12 17:07:02 raspberrypi sshd[10011]: Connection reset by 192.168.0.58 port 56829 [preauth]
Dec 12 17:07:47 raspberrypi kernel: macb 1f00100000.ethernet eth0: Link is Down
Dec 12 17:07:50 raspberrypi NetworkManager[810]: [1734041270.6632] device (eth0): carrier: link connected
Dec 12 17:07:50 raspberrypi kernel: macb 1f00100000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off
Dec 12 17:07:51 raspberrypi kernel: macb 1f00100000.ethernet eth0: Link is Down
Dec 12 17:07:54 raspberrypi NetworkManager[810]: [1734041274.7592] device (eth0): carrier: link connected
Dec 12 17:07:54 raspberrypi kernel: macb 1f00100000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off
Dec 12 17:07:57 raspberrypi kernel: macb 1f00100000.ethernet eth0: Link is Down
Dec 12 17:08:00 raspberrypi NetworkManager[810]: [1734041280.9032] device (eth0): carrier: link connected
Dec 12 17:08:00 raspberrypi kernel: macb 1f00100000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off
Dec 12 17:09:01 raspberrypi CRON[10216]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Dec 12 17:09:01 raspberrypi CRON[10217]: (root) CMD ( [ -x /usr/lib/php/sessionclean ] && if [ ! -d /run/systemd/system ]; then /usr/lib/php/sessionclean; fi)
Dec 12 17:09:01 raspberrypi CRON[10216]: pam_unix(cron:session): session closed for user root
Dec 12 17:09:01 raspberrypi systemd[1]: Starting phpsessionclean.service - Clean php session files...
Dec 12 17:09:01 raspberrypi systemd[1]: phpsessionclean.service: Deactivated successfully.
Dec 12 17:09:01 raspberrypi systemd[1]: Finished phpsessionclean.service - Clean php session files.
Looking at the most recent boot (no -1), the journal simply starts at the next reboot/power on at 17:11.
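To check how often the link dropped across a whole boot rather than paging through the journal by eye, the kernel's "Link is Down" messages can be counted per minute. This is a minimal sketch, assuming the default short journal format shown above (time of day in the third field); `link_flaps_per_minute` is a hypothetical helper name, not an existing tool.

```shell
# Count kernel "Link is Down" events per minute from journal lines on stdin.
# Output format: "<HH:MM> <count>", in chronological order within a day.
link_flaps_per_minute() {
  awk '/Link is Down/ { count[substr($3, 1, 5)]++ }   # HH:MM from HH:MM:SS
       END { for (m in count) print m, count[m] }' |
    sort
}

# Example: kernel messages from the previous boot
#   journalctl -b -1 -k --no-pager | link_flaps_per_minute
```

For the excerpt above this would report three drops within the 17:07 minute, shortly before logging stops.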
The fact that you can ping it but not SSH in indicates you don't have a Pi-hole issue, as these have nothing to do with DNS. A poor or under-rated power source could cause this. For the Pi 5, are you using the official power brick or something else?
What are you using for a boot device? SD card, USB, NVMe? An SD card going bad could be a cause or, if you're using an NVMe HAT on the Pi 5, it might not play nice with PCIe 3.0 etc.
It might be good to know what OS / version you're running on the Pi as well. What's the output of uname -a and cat /etc/os-release?
Yeah I also suspect some HW failure like an inadequate power supply.
Or poor USB cable for power.
Or maybe a faulty ethernet cable/switch port.
The eth0 link shouldn't go down three times over a period of 10 seconds.
Or were you fiddling with the connection at that time?
Did the below return anything when searching the whole journal?
Yep, it's looking more like a hardware/power issue.
It's the official power brick, but as noted earlier I do not have it on a UPS or anything. I've ordered one, to be delivered tomorrow, and I'm going to see if it improves things.
It's an SD card - I think SanDisk Extreme C10, U3, V30 etc. Need to validate.
Edit: Ethernet is a Monoprice Cat6A cable with no physical damage, but you never know...
The Deco unit has 3 ethernet ports that show as fine in settings, but they could also be an issue - maybe power is fluctuating on both of the units (they're on the same surge strip).
Also interesting: I set up a cheap monitor on the RPi, and I'm able to sign in to it locally, open Firefox, log in to Pi-hole, etc.
For some reason I cannot remotely access the RPi (PiHole interface, SSH, etc) from my Desktop1 and Laptop2. I am successfully able to access it from Laptop1.
I've completely reimaged the SD Card (Samsung Pro btw) and done a clean install of PiHole + Unbound and super verified all configurations without any question.
I've also plugged both the router and RPi into a battery UPS to rule out any wonky power fluctuations on the system. Just configured the Deco to use the PiHole a minute ago so we will see how everything behaves now.
In the event I still have issues the ethernet cable is going to be next on the list for swapping out.
Have you ever been able to connect to the Pi from those machines? What error messages are you getting when you try to connect? This could be a problem with keys or even an SSH client not being installed on the machine etc. Additional info would be helpful.
So specifically no error at all - just refused to connect.
As of this morning there have been no crashes as seen before - I'm going to chalk it up to probably a power issue that putting the UPS in place has solved.
I also improved the configuration with the Conditional Forwarding that @deHakkelaar suggested.
EDIT: The command below shows whether a link is currently down.
Note the NO-CARRIER flag for eth0, which is not physically connected in the example below:
$ ip -br l
lo UNKNOWN 00:00:00:00:00:00 <LOOPBACK,UP,LOWER_UP>
eth0 DOWN b8:27:eb:xx:xx:xx <NO-CARRIER,BROADCAST,MULTICAST,UP>
wlan0 UP e8:94:f6:xx:xx:xx <BROADCAST,MULTICAST,UP,LOWER_UP>
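If you want to check this from a script instead of reading the flags by eye, the brief output can be filtered for that flag. A small sketch, assuming the `ip -br link` layout shown above; `no_carrier_ifaces` is a made-up helper name.

```shell
# Print the names of interfaces whose flags include NO-CARRIER.
# Reads "ip -br link" style lines from stdin (interface name in the first field).
no_carrier_ifaces() {
  awk '/NO-CARRIER/ { print $1 }'
}

# Example:
#   ip -br link | no_carrier_ifaces
```

An empty result means no interface currently reports a missing carrier; it says nothing about drops that happened earlier.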
If so, you'll profit from that CF setting because names instead of IPs will be shown in the web GUI.
Also, you can now resolve by name instead of connecting via IP.
E.g. on a client of mine:
$ dig hakpc
[..]
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 39282
[..]
;; ANSWER SECTION:
hakpc. 2 IN A 10.0.0.11
;; Query time: 3 msec
;; SERVER: 10.0.0.2#53(10.0.0.2) (UDP)
;; WHEN: Sun Dec 15 18:04:50 CET 2024
;; MSG SIZE rcvd: 51