Expected Behaviour:
PiHole should consistently answer DNS queries for the whole network.
O/S:
- Raspbian GNU/Linux 10 (buster)
Hardware:
- RasPi 3B+, ethernet connection.
- Hard wired into gateway TP-Link AC1200 / VR400.
Conditions:
- PiHole running as sole network-wide DHCP server and DNS sinkhole.
- Router and other networking equipment set up with DHCP disabled.
- Even though DHCP is disabled there, PiHole's MAC and IP are bound on all network infrastructure (I did this in an attempt to fix the issue).
- Upstream DNS is Google.
Actual Behaviour:
Everything works perfectly for a varying amount of time after a reboot, anywhere from 30 minutes to more than 24 hours. Then, at seemingly random intervals, DNS resolution for the whole network starts failing.
While in this failed state, the web interface is sluggish to respond. Devices on the network receive no DNS results, yet the query log implies their queries are being received and served normally.
This issue can be immediately resolved by either SSH'ing in to the Pi and rebooting it, or following the steps to reboot through the PiHole's web interface (by navigating directly to its IP - pi.hole doesn't resolve in this situation).
I ran a diagnostic log while in this failed state. The log failed to upload to tricorder.pi-hole.net.
To reiterate, once in this state the only 'fix' is to reboot the hardware, at which point after loading, everything network-wide immediately works fine again and receives DNS resolution - until at some unspecified time later, it stops working. Repeat ad infinitum. I cannot identify any pattern to it entering this failed state.
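Since no pattern is visible yet, one option is to log a timestamped probe every minute so the moment of failure can be lined up with other events afterwards. The following is only a rough sketch, not something from my setup: it assumes `dig` is installed, and the function names (`probe`, `diagnose`, `probe_loop`), the variables and the log path are all made up for illustration. `8.8.8.8` stands in for the Google upstream mentioned above.

```shell
#!/bin/sh
# Hypothetical DNS probe: records whether Pi-hole and the upstream resolver answer.
PIHOLE_IP="${PIHOLE_IP:-127.0.0.1}"    # assumption: run on the Pi; use its LAN IP from a client
UPSTREAM_IP="${UPSTREAM_IP:-8.8.8.8}"  # assumption: Google public DNS as upstream
TEST_NAME="${TEST_NAME:-example.com}"  # any name that should always resolve
LOG_FILE="${LOG_FILE:-$HOME/dns-probe.log}"

# Print "ok" if the server at $1 returns an answer for name $2, else "fail".
probe() {
    answer=$(dig +short +time=2 +tries=1 "@$1" "$2" A 2>/dev/null) || { echo fail; return; }
    case "$answer" in
        ''|';'*) echo fail ;;   # empty reply or a dig diagnostic line
        *)       echo ok ;;
    esac
}

# Map the two probe results (Pi-hole, upstream) to a short hint.
diagnose() {
    case "$1/$2" in
        ok/ok)   echo "healthy" ;;
        fail/ok) echo "pihole-not-answering" ;;
        ok/fail) echo "upstream-unreachable" ;;
        *)       echo "both-failing" ;;
    esac
}

# Append one line per minute to the log until interrupted.
probe_loop() {
    while :; do
        p=$(probe "$PIHOLE_IP" "$TEST_NAME")
        u=$(probe "$UPSTREAM_IP" "$TEST_NAME")
        echo "$(date '+%Y-%m-%d %H:%M:%S') pihole=$p upstream=$u $(diagnose "$p" "$u")" >> "$LOG_FILE"
        sleep 60
    done
}

# Only start looping when asked to, e.g. `sh dns-probe.sh --loop`.
if [ "${1:-}" = "--loop" ]; then
    probe_loop
fi
```

Running this from a second, always-on device with `PIHOLE_IP` set to the Pi's address would probably be more telling than running it on the Pi itself, since the failure is what the clients see.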
Steps taken:
In an effort to resolve the issue I have so far:
- Verified with `date` that the date and time are correct. With that in mind and having read a similar thread, I do also have an RTC on its way in the post, lest that turn out to be the root cause.
- Tried flashing a known-good (working) disk image of Raspbian 10 with PiHole installed, set up for my network configuration.
- Tried the above on an entirely new and different RasPi 3B+.
- Running `pihole -r` and selecting repair.
- Running `pihole -r` and selecting reconfigure.
- Been through all configuration options on networking equipment with a fine-tooth comb to make sure nothing (obvious to me, at least) is configured wrongly.
- Tried physically moving the RasPi to a different location and connecting it to a second router acting as a switch - again all ethernet connected back to the gateway. Same story, works fine until it stops.
- LOTS of Googling and trying different things that also didn't work.
None of these steps have resolved my issue.
Theories:
- Something in my Raspbian configuration is wrong.
- Something in my pihole configuration is wrong.
- Something in my router/network configuration is wrong.
- A particular client device within my network is causing problems.
For all my googling and reading of this and other forums, I've not been able to narrow down my issue beyond these delightfully broad theories. The intermittency is what's baffling me. It can be fine for an hour, for 12 hours or for a whole day, but it still ends up in this failed state with seemingly no explanation.
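To test the last theory (a single misbehaving client), a per-client query count over the most recent log lines might show whether one device is flooding the resolver just before things fall over. This is a sketch under assumptions: it expects dnsmasq-style query lines of the form `... query[A] name from 192.168.1.5`, and the helper name `top_clients` is invented here.

```shell
#!/bin/sh
# Count DNS queries per client in a dnsmasq-style log read from stdin.
# Assumed line format:
#   <timestamp> dnsmasq[PID]: query[TYPE] <name> from <client-ip>
# Output: "<count> <client-ip>", busiest client first.
top_clients() {
    awk '$0 ~ / query\[/ && $(NF-1) == "from" { count[$NF]++ }
         END { for (c in count) print count[c], c }' | sort -rn
}
```

It could be used along the lines of `tail -n 5000 /var/log/pihole.log | top_clients | head`, where the log path is also an assumption and may differ between Pi-hole versions.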
Debug Token:
As mentioned earlier, PiHole was unable to upload the debug log to Tricorder. Please find the pasted debug log here.
Any help would be enormously appreciated!