Expected Behaviour:
Pi-hole should resolve DNS queries and the internet connection should work smoothly.
Actual Behaviour:
Pi-hole randomly stops resolving DNS queries, and the browser shows DNS_PROBE_FINISHED_BAD_CONFIG. To recover, I have to reboot the Raspberry Pi, after which things return to normal.
Look in the pihole log at /var/log/pihole.log and see if the queries that are being forwarded to unbound are being resolved.
As a troubleshooting step, change your upstream DNS resolver in Pi-hole to something other than your local instance of unbound. This will help narrow down the source of the problem.
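Another way to narrow things down from the Pi itself is to send the same query directly to unbound and to a third-party resolver, then compare the response codes. The sketch below builds a minimal DNS query by hand using only the Python standard library. The helper names (build_query, rcode_of, probe) are hypothetical. The unbound address 127.0.0.1 on port 5335 is an assumption, since it is a common choice in Pi-hole's unbound guide. Use whatever port your unbound instance actually listens on.

```python
import socket
import struct

# DNS response codes 0-5 as defined in RFC 1035, section 4.1.1
RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL", 3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def build_query(name: str, qtype: int = 1, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query packet (qtype 1 = A record) with recursion desired."""
    # Header: ID, flags (only the RD bit set), QDCOUNT=1, ANCOUNT/NSCOUNT/ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE, then QCLASS 1 (IN)
    return header + qname + struct.pack("!HH", qtype, 1)


def rcode_of(response: bytes) -> str:
    """Extract the response code from the low 4 bits of the header flags."""
    flags = struct.unpack("!H", response[2:4])[0]
    code = flags & 0x000F
    return RCODES.get(code, f"RCODE{code}")


def probe(server: str, port: int, name: str, timeout: float = 3.0) -> str:
    """Send one UDP query and return its response code, or NO RESPONSE on failure."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name), (server, port))
            data, _ = sock.recvfrom(4096)
        except OSError:  # covers timeouts and "connection refused" on a closed port
            return "NO RESPONSE"
    return rcode_of(data)
```

For example, compare probe("127.0.0.1", 5335, "play.googleapis.com") with the same call against a third-party resolver's IP on port 53. If unbound answers SERVFAIL while the other resolver answers NOERROR, the problem is in unbound rather than in Pi-hole.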
Here are the entries from /var/log/pihole.log. If you look closely, you will see that the reply is SERVFAIL, and then at Aug 9 05:01:59 it starts answering queries again. The same pattern repeats until I changed the DNS resolver from unbound to an upstream provider. That fixed this issue. However, in the past there were incidents where a legitimate website ended up with SERVFAIL, which is why I switched from the upstream resolver to unbound in the first place, after reading a couple of articles.
Aug 9 04:58:06 dnsmasq[663]: forwarded hb-a.services.vnc.com to 127.0.0.1
Aug 9 04:58:06 dnsmasq[663]: reply error is SERVFAIL
Aug 9 05:01:59 dnsmasq[663]: query[A] play.googleapis.com from 200.200.200.7
Aug 9 05:01:59 dnsmasq[663]: forwarded play.googleapis.com to 127.0.0.1
Aug 9 05:01:59 dnsmasq[663]: reply play.googleapis.com is 216.58.197.42
Aug 9 10:08:02 dnsmasq[663]: forwarded www.sony.net.root-servers.net to 127.0.0.1
Aug 9 10:08:02 dnsmasq[663]: reply error is SERVFAIL
Aug 9 10:08:02 dnsmasq[663]: reply error is SERVFAIL
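The pattern described above (SERVFAIL replies followed by normal answers again) can also be checked with a script instead of by eye. This is a minimal sketch that assumes the exact dnsmasq line format shown in the log excerpt. It ignores other line types (cached, blocked, and so on). The "reply error is SERVFAIL" line does not name the domain, so the script attributes each error to the most recently forwarded name. That attribution is only approximate when queries overlap. The function name summarize is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as
# "Aug 9 04:58:06 dnsmasq[663]: forwarded hb-a.services.vnc.com to 127.0.0.1"
LINE = re.compile(r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) dnsmasq\[\d+\]: (.*)$")


def summarize(lines, upstream="127.0.0.1"):
    """Tally forwards to `upstream`, SERVFAIL replies and other replies.

    Returns (counts, failures), where failures lists (timestamp, last forwarded name).
    """
    counts = Counter()
    failures = []
    last_forwarded = None
    suffix = " to " + upstream
    for line in lines:
        m = LINE.match(line.strip())
        if not m:
            continue  # not a dnsmasq line in the expected format
        stamp, msg = m.groups()
        if msg.startswith("forwarded ") and msg.endswith(suffix):
            last_forwarded = msg[len("forwarded "):-len(suffix)]
            counts["forwarded"] += 1
        elif msg == "reply error is SERVFAIL":
            counts["servfail"] += 1
            failures.append((stamp, last_forwarded))
        elif msg.startswith("reply "):
            counts["answered"] += 1  # any non-SERVFAIL reply, including NXDOMAIN
    return counts, failures
```

On the Pi, this could be run as: with open("/var/log/pihole.log") as f: counts, failures = summarize(f). A run of SERVFAIL entries with no answered replies in between marks the window during which unbound was failing.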
Well, I do see queries for NTP servers in the log, but how does that show that the Pi has the wrong time? The Raspberry Pi has been running fine, so what could throw off the system time, especially when the time is correct for the time zone it's in?
Aug 9 08:08:06 dnsmasq[663]: forwarded 1.debian.pool.ntp.org to 127.0.0.1
Aug 9 08:08:06 dnsmasq[663]: query[A] 2.debian.pool.ntp.org from 127.0.0.1
Aug 9 08:08:06 dnsmasq[663]: forwarded 2.debian.pool.ntp.org to 127.0.0.1
Aug 9 08:08:06 dnsmasq[663]: query[AAAA] 2.debian.pool.ntp.org from 127.0.0.1
Aug 9 08:08:06 dnsmasq[663]: forwarded 2.debian.pool.ntp.org to 127.0.0.1
And if it is bound to behave like that (unable to keep the correct time for its zone), that puts a question mark over the Pi's reliability. I'm just trying to understand where it all went wrong, and why.
Pis do not have onboard real-time clocks. They get their time from an NTP server. If the time on the Pi is sufficiently inaccurate, it can have difficulty syncing with an NTP server. If a Pi suffers a sudden power loss, it can lose the time.
Run the date command on the Pi and verify that its time is the correct local time. If it is, we'll continue investigating the problem.
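Besides the date command, you can compare the Pi's clock directly against an NTP server. The following is a minimal SNTP client sketch that uses only the Python standard library. It sends a standard 48-byte client request to UDP port 123 and reports the difference between the server's transmit timestamp and the local clock. The function names are hypothetical. The sketch takes an IP address rather than a pool hostname, on the assumption that DNS may be the thing that is broken. It only roughly corrects for network delay, so treat the offset as accurate to within about half the round-trip time.

```python
import socket
import struct
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01)
NTP_TO_UNIX = 2208988800


def ntp_request() -> bytes:
    """48-byte SNTP client request: leap indicator 0, version 3, mode 3 (client)."""
    return b"\x1b" + 47 * b"\x00"


def transmit_time(packet: bytes) -> float:
    """Return the server's transmit timestamp (bytes 40-47) as Unix time."""
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_TO_UNIX + fraction / 2**32


def clock_offset(server_ip: str, timeout: float = 3.0) -> float:
    """Server time minus local time, in seconds (positive means the Pi is behind)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sent = time.time()
        sock.sendto(ntp_request(), (server_ip, 123))
        data, _ = sock.recvfrom(48)
        received = time.time()
    # Compare against the midpoint of send/receive to roughly cancel network delay
    return transmit_time(data) - (sent + received) / 2
```

Calling clock_offset with the IP of an NTP server you trust should return a value close to zero. An offset of minutes or more would point to the clock problem described above.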
Thanks, jfb.
That explains it, and it's quite helpful. There have been incidents where the Pi had a sudden power loss, which could explain why it lost track of time and couldn't sync with NTP. As I mentioned earlier, for now I've switched to an upstream DNS resolver instead of unbound. The time on the Pi is correct for the local time zone.
How should I proceed with troubleshooting? Should I revert to the unbound resolver and wait for the issue to resurface?
There are a few things you can do to increase the reliability of the Pi.
Put it on a UPS or battery backup so it doesn't lose power suddenly.
Add an inexpensive RTC (real-time clock). This attaches to the GPIO pins on the Pi and provides an onboard clock.
I would add both unbound and a third-party upstream resolver in your Pi-hole settings. Pi-hole has an algorithm that favors the fastest responder, but having two also lets Pi-hole shift to one if the other fails to respond.
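To illustrate the idea behind listing two resolvers, here is a toy model of "prefer the fastest, fall back when one fails". It is not Pi-hole's or dnsmasq's actual selection code. It is a simplified sketch with hypothetical names (Upstream, pick_upstream) that keeps a smoothed response time per server and skips servers whose last query failed.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Upstream:
    """Hypothetical per-resolver state for a simplified selection model."""
    name: str
    avg_ms: float = 0.0   # smoothed response time; 0.0 means "not measured yet"
    failed: bool = False  # True if the most recent query to this server failed

    def record(self, elapsed_ms: float | None, alpha: float = 0.3) -> None:
        """Update state after a query; None stands for a timeout or SERVFAIL."""
        if elapsed_ms is None:
            self.failed = True
            return
        self.failed = False
        # Exponentially weighted moving average of the response time
        if self.avg_ms == 0.0:
            self.avg_ms = elapsed_ms
        else:
            self.avg_ms = alpha * elapsed_ms + (1 - alpha) * self.avg_ms


def pick_upstream(servers: list[Upstream]) -> Upstream:
    """Prefer the fastest healthy server; if all have failed, consider all again.

    Unmeasured servers (avg_ms == 0.0) sort first, so they get tried early.
    """
    healthy = [s for s in servers if not s.failed] or servers
    return min(healthy, key=lambda s: s.avg_ms)
```

With unbound answering in 20 ms and the third-party resolver in 35 ms, unbound is picked. After unbound fails once, the third-party resolver takes over until unbound answers again. With only a single resolver configured there is nothing to fall back to, which is the point of listing both.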
You can also increase unbound's verbosity in its configuration file (the verbosity: option). The verbosity levels are discussed in the link below. A higher level logs more detail about what unbound is doing.