DNS requests go unanswered after bootup, then DNS starts working for no reason

Newbie to Linux here. Been using FreeBSD for almost 10 years, so not new to CLI or anything, just new to Debian/Raspbian/Linux. Thankfully it's relatively easy to move from BSD to Linux and figure things out, minus stuff for systemd, which is still quite different from FreeBSD.

I have an opnsense machine that I previously used for my home network. It handled all DNS and DHCP requests. This is all still true, except I have chosen to have all of my machines that connect via DHCP go to my raspberry pi for filtering with pihole. I set it up about 4 or 5 months ago and it worked fine, until I had to reboot the raspberry pi.

I’ve now noticed that when I reboot the Pi, it seems to selectively decide to not listen to some DNS requests, for anywhere from a few minutes to a few hours. I have been unable to determine the cause or how to fix it, and I haven’t figured out how to make it fix itself, except to simply stop using the DNS on my pi until it starts working. Tonight, I did some package updates followed by a reboot, and then spent 2 hours trying to figure out what was wrong with Minecraft, only to figure out it was a DNS issue. G/F hates me right now since Minecraft wouldn’t run (the first thing the launcher does is go out to the internet to validate files; the DNS lookup failed, so it would crash out).

I’ve used tcpdump and wireshark to validate that my dns requests are leaving my desktop and also verified they are received by the pi, but there is no response packet being sent back. (See screenshot)

There is a possibility that I’ve screwed up the Pi’s networking, and this is some artifact of that. When I first set up this pi, I tried to give it a static IP and static DNS entries. However, since I have to use Raspbian Buster, and I couldn’t figure out how to get it to work properly, I instead opted to give it a static IP via DHCP. So this could be a remnant of that.

Expected Behaviour:

I expect that on reboot, pihole should come up within a minute or two and start working, serving DNS requests.

Actual Behaviour:

pihole seems to be up (accessible via WebGUI, no indication of problems I can see), and all seems to be completely fine and normal, but DNS requests are ignored. After some indeterminate period of time pihole will randomly start working. I’ve had it start working after 5 minutes, sometimes after a few hours. There’s nothing I’ve been able to figure out as to how to make it do whatever it does to start working, it just randomly starts working.

Debug Token:

https://tricorder.pi-hole.net/r22yi981ic

Thanks to whoever took the time to read this far. :wink:

Your debug log shows that the Pi is not receiving traffic on its outward facing IP. Pi-Hole checks blocking from the loopback IP (the Pi itself), the outward facing IP, and DNS resolution using an external DNS server. The first and last tests passed, but the Pi did not answer the request on the 106 IP. I would check the connectivity on the Pi to the router and also check your ethernet cable connection, etc.

*** [ DIAGNOSING ]: Name resolution (IPv4) using a random blocked domain and a known ad-serving domain
[✓] sudanikuwaity.com is 0.0.0.0 via localhost (127.0.0.1)
[✗] Failed to resolve sudanikuwaity.com via Pi-hole (192.168.2.106)
[✓] doubleclick.com is 172.217.14.78 via a remote, public DNS server (8.8.8.8)

Okay, so I got some time to dig into this more.

I rebooted my Pi, and it didn’t work for about 3 minutes, then started working before I could do my testing, so I rebooted it again.

After reboot, I found DNS was not working again, so I did some tests.

From an SSH session on my laptop I connected to the Pi, and did a tcpdump on port 53 and monitored the specific IP address I was using so I could see when I would get a response.

First I verified that when I tried to load a webpage on my laptop on Chrome, it didn’t work. I saw the DNS requests come into the pi, but no response back.
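For anyone wanting to repeat the capture, here is a rough sketch of how the tcpdump output could be tallied. The function name `count_dns_directions` and the interface name `eth0` are assumptions for illustration, not something from my actual session:

```shell
#!/bin/sh
# Read `tcpdump -n port 53` text on stdin and count packets going to
# and coming from port 53 of the given server IP.
# Usage (eth0 is an assumed interface name):
#   sudo tcpdump -n -l -i eth0 -c 50 port 53 | count_dns_directions 192.168.2.106
count_dns_directions() {
    # tcpdump -n prints endpoints as "address.port", so the server side is "<ip>.53"
    awk -v srv="$1.53" '
        index($0, " > " srv ":") { queries++ }   # client -> server
        index($0, " " srv " > ") { replies++ }   # server -> client
        END { printf "queries=%d replies=%d\n", queries, replies }'
}
```

A result with queries above zero and replies at zero would match what I saw: packets reach the Pi, nothing goes back.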

Then on the pi itself via SSH I did the following:

ping 8.8.8.8 (it succeeded)
nslookup www.google.com 8.8.8.8 (it succeeded)
nslookup www.google.com 127.0.0.1 (it succeeded)
nslookup www.google.com 192.168.2.106 (it failed and said connection timed out; no servers could be reached).

I did try using dig as well, and it was able to do requests on 127.0.0.1 without problems.
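The same checks can be wrapped in a small loop so they are quick to repeat after each reboot. This is only a sketch: it assumes `dig` (from the dnsutils package) is installed, and `probe_dns` is a name I made up.

```shell
#!/bin/sh
# Ask each listed server for the same name and report which ones reply.
# dig exits non-zero when no reply arrives from the server.
probe_dns() {
    name="$1"; shift
    for server in "$@"; do
        # +time=2 +tries=1 keeps a silent server from stalling the loop
        if dig +short +time=2 +tries=1 "@$server" "$name" A >/dev/null 2>&1; then
            echo "$server answered"
        else
            echo "$server no answer"
        fi
    done
}
```

For example, `probe_dns www.google.com 8.8.8.8 127.0.0.1 192.168.2.106` covers the three servers I queried by hand.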

I was connected via SSH to the Pi for the whole process, and watching via tcpdump, so I tend to think that the cabling and such is proper, and my opnsense router doesn’t seem to be blocking anything from the nslookup I did.

One thing I have noticed though, is that if I do a “pihole status”, everything seems to immediately start working. I’ve done 3 tests to make sure it wasn’t a coincidence.

In all cases, pihole status output shows a checkmark that DNS service is running and pi-hole blocking is Enabled.

So I’m kind of back to thinking something with the software configuration is making pihole angry on bootup. But I’m not sure where to start.

Any ideas where to go to troubleshoot this down further? Is there any kind of debug mode I can use?

On one hand, I really do want to troubleshoot this down further, on the other hand I also feel like adding a cron job to run every minute and have it run “pihole status”, but I’ll be honest, that feels so janky to me. LOL.
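For the record, the janky version would look roughly like this. It is a stopgap sketch and not a fix: the function name and the script path in the comment are hypothetical, and it relies only on my observation that “pihole status” wakes things up.

```shell
#!/bin/sh
# Poke Pi-hole only when the LAN address is not answering DNS.
# Hypothetical /etc/cron.d entry calling a script that wraps this function:
#   * * * * * root /usr/local/bin/dns-watchdog.sh
dns_watchdog() {
    lan_ip="$1"
    # pi.hole is answered by Pi-hole itself, so no upstream server is involved
    if dig +short +time=2 +tries=1 "@$lan_ip" pi.hole >/dev/null 2>&1; then
        echo "ok"
    else
        echo "no answer from $lan_ip, running pihole status"
        pihole status >/dev/null 2>&1
    fi
}
```

It at least avoids running “pihole status” every minute when DNS is already answering.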

One other thing I want to mention: my browser shows DNS_PROBE_FINISHED_BAD_CONFIG. That typically means something along the lines of ‘I didn’t get a DNS response, so the config must be invalid’, so I don’t think it is diagnostically useful, but I figured I’d add it for completeness.

What did you do to resolve this problem?

Sorry, I didn’t answer it very well, so let me try to answer again.

I did nothing to “fix it”. On every reboot, pihole ends up in this broken state. I can ping out, I can do DNS queries to the internet, everything seems to work fine, except for the pihole software.

Internally, I’ve figured out that if I do a “pihole status”, that pihole immediately starts working properly, so I’m convinced it’s a software problem, software misconfiguration, or a race condition. The only thing I have to wonder is if my bootups are so fast they’re causing a race condition with a less than desirable outcome. I’m booting from an SSD via USB3.

If you have specific commands you’d like me to run on the Pi when it’s in the broken state, I can easily reboot it and run those commands, but since “pihole status” corrects the anomaly, I tend to think it’s something with pihole directly, or a race condition that pihole needs to wait for.

I did an upgrade of packages and a reboot, and yet again my DNS wasn’t working.

I did another “pihole -d” and it can be read at https://tricorder.pi-hole.net/42x7a94fk3

I did this over an SSH session, and I was able to ping 8.8.8.8 from the pi itself as well as query DNS entries from 1.1.1.1 and 8.8.8.8, so I have no doubt that the networking is working just fine, otherwise I wouldn’t have even been able to SSH in.

Just like I stated above though, DNS requests were ignored by pihole (this was from the same machine I used to SSH into the pi):

C:\>nslookup www.google.com 192.168.2.106
DNS request timed out.
timeout was 2 seconds.
Server: UnKnown
Address: 192.168.2.106

DNS request timed out.
timeout was 2 seconds.
DNS request timed out.
timeout was 2 seconds.
DNS request timed out.
timeout was 2 seconds.
DNS request timed out.
timeout was 2 seconds.
*** Request to UnKnown timed-out

I then did a “pihole status” from an SSH session…

C:\>nslookup www.google.com 192.168.2.106
Server: raspberrypi4
Address: 192.168.2.106

Non-authoritative answer:
Name: www.google.com
Addresses: 2607:f8b0:4007:800::2004
172.217.4.132

And like magic, pihole started working. I’ve also noticed that if I do “pihole -g” that also seems to fix it.

In any case, the issue seems to be straight up with pihole software and not a hardware or networking misconfiguration.

How can I help debug this to identify the problem?

Thanks.

What are your upstream configured DNS servers ?

nslookup -class=chaos -type=txt servers.bind 192.168.2.106

Might have created a DNS loop somewhere.
Check if load is unusually high; that might be an indicator of a loop:

uptime

top

Check if date/time is correct:

timedatectl

And while doing the nslookup tests from the client, tail the logs on Pi-hole:

pihole -t

Or filter for queries coming from one particular <CLIENT_IP>:

tail -f /var/log/pihole.log | grep -A3 <CLIENT_IP>

Maybe experiencing Under-voltage ?

dmesg | grep -i under-voltage
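If dmesg comes back empty, the firmware's throttle flags are another place to look. A rough sketch, assuming `vcgencmd` is available (it ships with Raspbian); the function name `decode_throttled` is made up for illustration, and only the under-voltage and throttling bits are decoded:

```shell
#!/bin/sh
# Decode the value printed by `vcgencmd get_throttled`, e.g. "throttled=0x50005".
# Usage: decode_throttled "$(vcgencmd get_throttled)"
decode_throttled() {
    value="${1#throttled=}"    # accept "throttled=0x..." or a bare "0x..."
    flags=$(( $value ))
    if [ $(( flags & 0x1 )) -ne 0 ]; then echo "under-voltage detected now"; fi
    if [ $(( flags & 0x4 )) -ne 0 ]; then echo "currently throttled"; fi
    if [ $(( flags & 0x10000 )) -ne 0 ]; then echo "under-voltage has occurred since boot"; fi
    if [ $(( flags & 0x40000 )) -ne 0 ]; then echo "throttling has occurred since boot"; fi
    return 0
}
```

No output at all means none of those four bits are set.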

These are the results with the system freshly rebooted, so the problem was occurring. I verified the issue occurred before I started and after I finished the first set of tests from a workstation that uses the pihole exclusively:

root@raspberrypi4:~# nslookup -class=chaos -type=txt servers.bind 192.168.2.106
;; connection timed out; no servers could be reached

root@raspberrypi4:~# uptime
12:07:19 up 6 min, 4 users, load average: 0.47, 0.49, 0.28
root@raspberrypi4:~# timedatectl
Local time: Wed 2020-01-22 12:07:24 MST
Universal time: Wed 2020-01-22 19:07:24 UTC
RTC time: n/a
Time zone: America/Phoenix (MST, -0700)
System clock synchronized: yes
NTP service: active
RTC in local TZ: no
root@raspberrypi4:~# dmesg | grep -i under-voltage
(I deliberately bought a PSU that is 5.25v to avoid undervoltage problems)

top shows CPU usage at 0.09 (I do run librenms on this, but the pihole issue began before I installed librenms).

pihole -t showed no additional log entries when doing the nslookups.

I did see that dnsmasq shows a query for one of my servers from 127.0.0.1, and it responded with a cached request:

Jan 22 12:10:02 dnsmasq[1117]: query[A] server1.my.domainxxx.com from 127.0.0.1
Jan 22 12:10:02 dnsmasq[1117]: cached server1.my.domainxxx.com is 192.168.2.109
Jan 22 12:10:02 dnsmasq[1117]: query[A] localhost.my.domainxxx.com from 127.0.0.1
Jan 22 12:10:02 dnsmasq[1117]: cached localhost.my.domainxxx.com is 127.0.0.1
Jan 22 12:10:02 dnsmasq[1117]: query[A] server2.my.domainxxx.com from 127.0.0.1
Jan 22 12:10:02 dnsmasq[1117]: cached server2.my.domainxxx.com is 192.168.2.100

If I watch it, It happens every 5 minutes like clockwork, but only those entries appear in the logs, nothing else.

I then verified it was still broken from a windows desktop, which it was.

After running a “pihole status”, which instantly makes everything work fine:

root@raspberrypi4:~# nslookup -class=chaos -type=txt servers.bind 192.168.2.106
Server: 192.168.2.106
Address: 192.168.2.106#53

servers.bind text = "192.168.2.1#53 141 1"

root@raspberrypi4:~# uptime
12:13:25 up 12 min, 4 users, load average: 0.01, 0.16, 0.19
root@raspberrypi4:~# timedatectl
Local time: Wed 2020-01-22 12:13:36 MST
Universal time: Wed 2020-01-22 19:13:36 UTC
RTC time: n/a
Time zone: America/Phoenix (MST, -0700)
System clock synchronized: yes
NTP service: active
RTC in local TZ: no

pihole -t instantly started showing lots of log entries before ‘pihole status’ even finished.

I did just learn something though. If I look at the /var/log/pihole.log file and do a “nslookup www.google.com 127.0.0.1” when nothing else on my network can resolve anything, queries return from pihole.

Jan 22 12:31:50 dnsmasq[1116]: query[A] www.google.com from 127.0.0.1
Jan 22 12:31:50 dnsmasq[1116]: cached www.google.com is 216.58.217.196
Jan 22 12:31:50 dnsmasq[1116]: query[AAAA] www.google.com from 127.0.0.1
Jan 22 12:31:50 dnsmasq[1116]: cached www.google.com is 2607:f8b0:4007:802::2004

So it appears that 127.0.0.1 will work, and is verified in the logs, but 192.168.2.106 does not. Very strange!
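One way to narrow this down might be to check which local addresses actually have a UDP listener on port 53 while things are broken. A minimal sketch, assuming `ss` from iproute2 is installed; `listeners_on_53` is just an illustrative name:

```shell
#!/bin/sh
# Read `ss -H -lun` output on stdin and print each local address
# that has a UDP socket listening on port 53.
# Usage: sudo ss -H -lun | listeners_on_53
listeners_on_53() {
    awk '{
        # the local address column comes before the peer column,
        # so the first field ending in ":53" is the listener
        for (i = 1; i <= NF; i++) {
            if ($i ~ /:53$/) { sub(/:53$/, "", $i); print $i; break }
        }
    }'
}
```

Running it once in the broken state and once after “pihole status” would show whether the set of bound addresses changes. If the two outputs are identical, the socket binding is probably not what changes.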

And, of course, once I do a “pihole status” the logs show all the requests from my network devices.

Thanks.

Is your router 192.168.2.1 configured to use Pi-hole for DNS upstream ?
Usually it's under WAN or Internet settings.
That would be the loop.

My router (opnsense) is set up to tell DHCP clients on the network to use the Pi, but the router itself uses my ISP’s DNS. Since I prefer to go with the assumption that it is my fault this isn’t working, I changed pihole to use Google (8.8.8.8) exclusively. I rebooted the Pi and the issue persists. When I do an nslookup I get the same connection timed out, no servers could be reached.

Just to check my work, I did check the /etc/resolv.conf file on my router, and it does not list 192.168.2.106. It lists four IPs: 127.0.0.1 (it does have unbound DNS set up since I was using it before Pihole, and I haven’t disabled it as I cannot rely on the Pi exclusively because of this issue) and three IPs from my ISP. To be honest, I’m not sure if I can disable it in the future since my opnsense router is acting as my DHCP server. I just haven’t dug down deep enough to investigate that as I’ve hit this snag first.

In conclusion, I don’t see any way I could have a loop with my current configuration, and choosing to have pihole use Google’s DNS exclusively didn’t fix the issue either.

Yeah if you have Google DNS exclusively upstream, this would not loop back to Pi-hole.
Other than checking the logs, I have no idea.

dmesg | less

less /var/log/pihole-FTL.log

less /var/log/pihole.log

less /var/log/syslog

journalctl -ax

EDIT: You could attach the debugger once you get Pi-hole responding again and wait for it to lockup:

https://docs.pi-hole.net/ftldns/debugging/

Oh, PS: you can tail most of the logs live:

dmesg -w

tail -f /var/log/{pihole*.log,syslog,daemon.log,kern.log}

Oh, and check that nothing is running full:

df -h

So Pi-hole is operational for itself (i.e. resolving hostnames via its very own 127.0.0.1 just fine), you can access Pi-hole’s Web UI, you’ve verified that DNS traffic is arriving at Pi-hole’s network interface, yet Pi-hole does not see any DNS requests - until you do a pihole status, when DNS resolution instantly kicks in.

To me, this sounds very much like a firewall is blocking your connections up to the moment you successfully initiate a connection from the inside, allowing all further DNS traffic until your next reboot.

You wouldn’t run any elaborate firewall package or iptables rulesets where you accidentally blocked DNS packets from travelling freely on your Pi-hole?

Yeah good one.
Could be some rate limiter.
What rules are active now ?

sudo iptables -nL
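`-nL` prints every chain, which can be a lot to scan. A small filter over `iptables -S` output, as a sketch only: the function name is made up, and it ignores multiport matches and anything configured through nftables directly.

```shell
#!/bin/sh
# Read `iptables -S` output on stdin and print the lines worth a closer look:
# a DROP default policy on INPUT/OUTPUT, and any port 53 rule that is not an ACCEPT.
# Usage: sudo iptables -S | suspect_dns_rules
suspect_dns_rules() {
    awk '
        /^-P (INPUT|OUTPUT) DROP/ { print; next }
        /--(dport|sport) 53( |$)/ && !/-j ACCEPT/ { print }'
}
```

Empty output would make a local packet filter on the Pi less likely, though it would not rule one out.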
