DNS resolution is currently unavailable -- Pi-hole stopped working

walrusbutterfly · March 19, 2023, 6:56am

Hello,

I'm currently running Pihole in a Kubernetes cluster with the following setup:

Pihole is running as a DNS server and DHCP server where the container uses the host network
DHCP services are disabled on my router so that Pihole is the only DHCP server on the network
Router IP is 192.168.1.1
Pihole DHCP server advertises 192.168.1.201 (load balancer IP) as the DNS server to clients

Everything works fine with this configuration initially, but around 24 hours later, my network goes completely offline. No device can connect to the internet: laptop, phone, cluster, etc. I'm able to fix this by re-enabling DHCP services on my router, and all my devices come back online within a few minutes (including Pihole).

I think the root cause is the Pihole. My guess is that the network in the cluster somehow breaks because the Pihole DNS servers are not able to resolve any host. Since the Pihole DNS can't resolve hosts, all devices on my network get disconnected, as they are also using the Pihole server. This has happened to me every day so far, and I've been constantly re-enabling and disabling the DHCP services on my router to work around it.

I checked my Pihole instance during the outage and it runs into a "DNS resolution is currently unavailable" error on startup. What's really interesting is that the cluster is not able to identify my network router during this time period or even connect to it. The routing table looks off to me: my router is 192.168.1.1 and my subnet is 192.168.1.X, yet no route mentions either of them.

*** [ DIAGNOSING ]: Network routing table
   default dev eth0 scope link src 169.254.94.23 metric 202 
   10.42.0.0/24 dev cni0 proto kernel scope link src 10.42.0.1 
   10.42.1.0/24 via 10.42.1.0 dev flannel.1 onlink 
   169.254.0.0/16 dev eth0 scope link src 169.254.94.23 metric 202 
   169.254.0.0/16 dev flannel.1 scope link src 169.254.205.225 metric 204 
   169.254.0.0/16 dev vetha862da1b scope link src 169.254.93.74 metric 206 
   169.254.0.0/16 dev veth45f08aca scope link src 169.254.4.77 metric 207 
   169.254.0.0/16 dev veth8e7f3d7a scope link src 169.254.49.147 metric 208 
   169.254.0.0/16 dev vetha93639c3 scope link src 169.254.110.127 metric 209 
   169.254.0.0/16 dev vethbb57e142 scope link src 169.254.115.117 metric 210 
   169.254.0.0/16 dev veth3e4c3ddd scope link src 169.254.158.58 metric 211 
   169.254.0.0/16 dev vethfc75c21b scope link src 169.254.160.56 metric 212 
   169.254.0.0/16 dev vethe2c8e953 scope link src 169.254.110.164 metric 213 
   169.254.0.0/16 dev vethc7a1cbf8 scope link src 169.254.10.124 metric 214 
   169.254.0.0/16 dev vethb2cdf2af scope link src 169.254.39.205 metric 215 
   169.254.0.0/16 dev vethdf62c2e0 scope link src 169.254.168.122 metric 216 
   169.254.0.0/16 dev veth919866d8 scope link src 169.254.211.80 metric 217 

*** [ DIAGNOSING ]: Networking
[✓] IPv4 address(es) bound to the eth0 interface:
    169.254.94.23/16

[✓] IPv6 address(es) bound to the eth0 interface:
    fdc8:6195:2176:46be:d6b4:e96:9dc2:5971/64
    fe80::b38:5f8a:6c1f:775d/64

[i] Default IPv4 gateway(s):
     eth0
   * Pinging first gateway eth0...
[✗] Gateway did not respond. (https://discourse.pi-hole.net/t/why-is-a-default-gateway-important-for-pi-hole/3546)
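
The symptom in the output above (a default route that names only a device and no gateway) can be checked mechanically. The following is a minimal Python sketch, not part of the Pi-hole debug tooling: it assumes the input is `ip route`-style text like the table shown, the function name `default_routes` is made up for illustration, and the "healthy" line is a hypothetical example rather than output captured from this network.

```python
def default_routes(route_lines):
    """Return (device, gateway) for every default route.

    gateway is None when the route has no 'via' hop, i.e. the kernel
    treats every destination as directly reachable on that link.
    """
    routes = []
    for line in route_lines:
        fields = line.split()
        if not fields or fields[0] != "default":
            continue
        gateway = fields[fields.index("via") + 1] if "via" in fields else None
        device = fields[fields.index("dev") + 1] if "dev" in fields else None
        routes.append((device, gateway))
    return routes


# Lines copied from the routing table captured during the outage.
broken = [
    "default dev eth0 scope link src 169.254.94.23 metric 202",
    "10.42.0.0/24 dev cni0 proto kernel scope link src 10.42.0.1",
]
# Hypothetical line for a host that holds a normal lease.
healthy = ["default via 192.168.1.1 dev eth0 src 192.168.1.50 metric 202"]

print(default_routes(broken))   # [('eth0', None)]
print(default_routes(healthy))  # [('eth0', '192.168.1.1')]
```

A `None` gateway matches the "Default IPv4 gateway(s): eth0" line in the debug output, where the diagnostic ends up pinging an interface name instead of a router address.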

After re-enabling DHCP on my router, my network goes back online and my Pi-hole starts working again. I've attached the debug logs that were emitted during the outage and after the outage. You can see that during the outage, the Pi-hole is trying to interface with 169.254.94.23/16, but after the outage, it's using the correct 192.168.1.X network. I'm not sure if this is merely a side-effect of the issue or the actual root cause.

My hunch is that I'm facing a chicken-and-egg problem, but I don't know how to verify it or even fix it. Could it be that the cluster (physical nodes) depends on the Pi-hole for DNS queries? If the Pi-hole goes down, then the cluster network goes down since it's no longer able to resolve any DNS queries. However, the Pi-hole automatically restarts, so why doesn't the system self-heal after the Pi-hole reboots and offers new DHCP leases to the cluster nodes? Again, this is just a hunch and I have no idea if I'm on the right track or not.

Bucking_Horn · March 19, 2023, 7:34am

(For your own privacy and security, do not post full, unsanitised debug log output here.
Instead, upload the logs and just post the token, please.)

This looks like a networking issue, rather than a Pi-hole one.

*** [ DIAGNOSING ]: Networking
[✓] IPv4 address(es) bound to the eth0 interface:
    169.254.94.23/16

*** [ DIAGNOSING ]: Setup variables
    DHCP_LEASETIME=24

Your Pi-hole host machine has fallen back to an IPv4 link-local address.

Usually, that happens when no other means of IPv4 address assignment are available, i.e. your host cannot contact any DHCP server, and it hasn't been manually configured for a static address.
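
As a quick way to recognise that fallback in a script, here is a small Python sketch using the standard `ipaddress` module. The helper name `is_ipv4_link_local` is made up for illustration; the addresses are the ones quoted in this thread.

```python
import ipaddress


def is_ipv4_link_local(addr):
    """True if addr (with or without a /prefix) lies in 169.254.0.0/16."""
    ip = ipaddress.ip_address(addr.split("/")[0])
    return ip.version == 4 and ip.is_link_local


print(is_ipv4_link_local("169.254.94.23/16"))  # True
print(is_ipv4_link_local("192.168.1.201"))     # False
```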

I'd recommend configuring such a static IP address for your DHCP server, if you haven't already done so.

Perhaps your Kubernetes cluster is dynamically assigning IP addresses to virtual hosts/containers?

A DHCP server should have a fixed address (as should a DNS server) - otherwise, on DHCP lease expiry, DHCP clients will routinely fail to renew their leases via a unicast to the IP of the DHCP server that issued the lease. They will eventually fall back to broadcasting for DHCP servers on the link and try to negotiate a new lease, but that would take considerably longer than just renewing it.
It would also put unnecessary load on your network, as it's quite likely that your clients would flood your network with DHCP broadcasts at the same time.
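
To make that renewal timeline concrete, here is a minimal Python sketch. It assumes the RFC 2131 default timers (T1 at 50% and T2 at 87.5% of the lease duration), which a DHCP server is allowed to override, and it assumes that DHCP_LEASETIME=24 is expressed in hours. The function name `dhcp_client_phase` is made up for illustration.

```python
def dhcp_client_phase(elapsed_hours, lease_hours=24.0):
    """Describe what a DHCP client does at a given point in its lease.

    Assumes RFC 2131 default timers: T1 = 0.5 * lease, T2 = 0.875 * lease.
    """
    t1 = 0.5 * lease_hours    # start renewing with the original server
    t2 = 0.875 * lease_hours  # give up on that server, ask anyone
    if elapsed_hours < t1:
        return "BOUND: no DHCP traffic"
    if elapsed_hours < t2:
        return "RENEWING: unicast DHCPREQUEST to the server that issued the lease"
    if elapsed_hours < lease_hours:
        return "REBINDING: broadcast DHCPREQUEST to any server on the link"
    return "EXPIRED: address dropped, broadcast DHCPDISCOVER from scratch"


for hour in (6, 12, 21, 24):
    print(hour, dhcp_client_phase(hour))
```

Under these assumptions, a client that cannot reach any DHCP server from hour 12 onwards loses its address at hour 24, which would be consistent with an outage appearing about a day after the lease was issued.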

walrusbutterfly · March 19, 2023, 10:23am

Thanks for the pointers! I agree that something with the network configuration is incorrect and I knew there was something off about the 169.254.94.23/16 subnet.

Usually, that happens when no other means of IPv4 address assignment are available, i.e. your host cannot contact any DHCP server, and it hasn't been manually configured for a static address.

This sounds plausible. The Pi-hole DHCP server goes down and the physical nodes do not have a static IP address to fall back on. Is there an easy way to set a static IP for the Raspberry Pi?

However, one part I don't fully understand is this: both the Pihole DHCP and DNS servers have a static IP address that's provided by the cluster load balancer (192.168.1.201). I've also configured the Pihole DHCP server to include this address as the DNS server in the DHCP offer packet, so all the clients should know about it. For example, I checked each of my devices' resolv.conf, and they each seem to contain the correct DNS server.

$ cat /etc/resolv.conf
#
# macOS Notice
#
# This file is not consulted for DNS hostname resolution, address
# resolution, or the DNS query routing mechanism used by most
# processes on this system.
#
# To view the DNS configuration used by this system, use:
#   scutil --dns
#
# SEE ALSO
#   dns-sd(1), scutil(8)
#
# This file is automatically generated.
#
search lan
nameserver 192.168.1.201

So if the Pihole DHCP server has a static IP, then how come on lease expiry, the DHCP clients continuously fail to contact it?

Bucking_Horn · March 19, 2023, 10:50am

As I understand your configuration, the machine hosting the DHCP server itself has no static IP - your load balancer has?

I wouldn't recommend load-balancing your DHCP server.
A DHCP server would only be able to answer your clients' DHCP broadcasts from the same link anyway.

I'd presume your load-balancer to completely ignore DHCP broadcasts.

If your Pi-hole machine were on the same link as your client, it would be Pi-hole that answers those DHCP broadcasts directly(!) (and if it isn't, your clients would use another DHCP server, if any).

EDIT:
If you are absolutely determined to load-balance DHCP, you may want to consider deploying a DHCP relay on your load-balancer, though most relays would require fixed target DHCP server IPs as well.
Even if you'd find one that supports dynamically allocating DHCP servers: As your actual DHCP servers have no fixed IP, you will likely incur the additional time for a complete DHCP cycle instead of just a unicast when renewing an existing lease, as it would then be the relay that would lose the known DHCP IP address.

walrusbutterfly · March 19, 2023, 11:27am

OK, that makes sense. I've mainly been using the load balancer to provide a static IP for the Pihole DNS server. It just happens that the DHCP server also runs in the same container. What you said makes sense because I've never seen a DHCP response get sent back from the load balancer. But I would imagine that clients are agnostic of the DHCP server's IP anyway if the DHCP packets are sent in a broadcast fashion? Running Pihole in Kubernetes means that the Pihole instance might be deployed to any of the nodes at any given time. So the Pihole instance can be running on any host.

Somewhat related to the previous topic: this post seems to suggest that it's bad for the Pihole host to depend on the Pihole DNS.

Is my train of thought correct here?

Cluster nodes get IP assignment from Pihole DHCP along with the Pihole DNS server IP.
Things are happy and everything works as expected.
Some time passes and cluster nodes need to renew DHCP lease.
- Maybe some of the cluster nodes are not able to contact Pihole at that moment (Kubernetes does like to cycle pods), so the renewal fails. Also, I have no other DHCP servers on the network.
From what you said earlier, this means that the cluster node isn't able to get an IP assignment so it falls back onto the IPv4 link-local address. Does this mean it's essentially disconnected from my LAN and from my router?
- Why am I still able to SSH into these nodes? And at the same time, why can't these nodes see my router (ping 192.168.1.1)?
Assuming the master node (I only have 1) is now on the link-local network, then all new Pihole instances that spin up will run under the IPv4 link-local network with the master node.
- Why can't the Pihole instances running on the disconnected node handle DHCP requests from that node if they're on the same link-local network?

I really would like to understand the underlying problem more since I don't have much expertise in the area. However, I think updating my hosts with a static IP and the router IP is a promising solution. That seemed to be the issue when I ssh'd onto these nodes and saw that each of their IP routes lacked the router IP.
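
For reference, a static configuration along those lines could be sketched as follows. This is only an illustration under several assumptions: that the nodes' interfaces are managed by dhcpcd (common on Raspberry Pi OS, but not confirmed anywhere in this thread), that 192.168.1.10 is a free address outside the Pi-hole DHCP pool (a hypothetical value), and that the router at 192.168.1.1 can act as the node's own DNS resolver so the host does not depend on Pi-hole. The helper `dhcpcd_static_stanza` is a made-up name; it only renders text for /etc/dhcpcd.conf and refuses a router that is not in the node's subnet.

```python
import ipaddress


def dhcpcd_static_stanza(interface, address_cidr, router, dns):
    """Render a static-address stanza for /etc/dhcpcd.conf (assumes dhcpcd)."""
    iface = ipaddress.ip_interface(address_cidr)
    if ipaddress.ip_address(router) not in iface.network:
        raise ValueError(f"router {router} is not inside {iface.network}")
    return "\n".join([
        f"interface {interface}",
        f"static ip_address={iface.with_prefixlen}",
        f"static routers={router}",
        f"static domain_name_servers={dns}",
    ])


# 192.168.1.10 is a hypothetical address chosen for illustration.
print(dhcpcd_static_stanza("eth0", "192.168.1.10/24", "192.168.1.1", "192.168.1.1"))
```

With a fixed address and an explicit router entry, the node keeps its 192.168.1.X address and its default gateway even while no DHCP server is reachable.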

Bucking_Horn · March 19, 2023, 11:32am

Not entirely.
It is rather limited to link-local networking only, i.e. talk is only possible among clients on the same link (in lieu of a gateway).

No.
Clients only use a broadcast to discover a DHCP server.
Once they've acquired a lease through one, they will unicast to that DHCP server's IP.

But all of this is related to networking rather than Pi-hole.
I can't really help you much further, other than repeating that I wouldn't recommend load-balancing your DHCP server.

The decisive question seems to be why your Pi-hole host would fall back to using an IPv4 LLA.
If it wouldn't, clients should be able to find the new DHCP server's IP - but you'd still cause unnecessary DHCP traffic that could be avoided by using a fixed IP DHCP server.

system · April 9, 2023, 11:32am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.