Alternate Highly-Available Pi-hole Setup using CoreDNS

Hey All,

I ran into this great thread in the Feature Request section that describes using lsyncd for a redundant Pi-hole deployment, but I can't seem to reply to it with my own deployment details:

In deploying that setup, I realized it was messing up my statistics because some devices were using the two DNS resolvers in a round-robin fashion. So I started looking at how to run a DNS forwarder on my second instance that would use the Pi-hole if it was available and fall back to Cloudflare if not.

This assumes that Pi-hole outages are rare, and that I'm OK with losing ad blocking for a few minutes until the Pi-hole recovers. I have thoughts on how to improve this at the end of the post. :smile:

I originally considered using Dnsmasq, but after reading the documentation on how it handles upstream selection and recovery from failure, I realized I would need to write a daemon to health-check the upstreams and rewrite the Dnsmasq configuration. Its upstream selection isn't sequentially deterministic: it can send the query to multiple resolvers and use whichever responds first. In most cases that would probably do what I want, but I wanted a bit more of a guarantee.
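To illustrate the difference, here's a toy sketch (not real Dnsmasq or CoreDNS code; the upstream names and latencies are made up) comparing deterministic sequential selection with race-everyone-and-take-the-fastest:

```python
# Toy model of two upstream-selection strategies.
# Each upstream is (name, healthy, latency_ms); values are hypothetical.
UPSTREAMS = [
    ("pihole", True, 30),      # primary: preferred, but slower
    ("cloudflare", True, 10),  # fallback: faster
]

def pick_sequential(upstreams):
    """CoreDNS-style 'policy sequential': try upstreams in listed
    order and use the first healthy one. Deterministic preference."""
    for name, healthy, _latency in upstreams:
        if healthy:
            return name
    return None

def pick_fastest(upstreams):
    """Race-style selection (roughly Dnsmasq's behavior): query
    everyone and keep whichever healthy upstream answers first."""
    candidates = [(lat, name) for name, ok, lat in upstreams if ok]
    return min(candidates)[1] if candidates else None

if __name__ == "__main__":
    print(pick_sequential(UPSTREAMS))  # pihole: order wins
    print(pick_fastest(UPSTREAMS))     # cloudflare: speed wins
```

With a sequential policy the Pi-hole is always preferred whenever it's healthy; with the race-style approach a faster Cloudflare could beat a healthy-but-slower Pi-hole, which is exactly the nondeterminism I wanted to avoid.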

So instead I took the lsyncd configuration file from the post above and edited the associated shell script so it continues rsyncing the data and rewriting the configurations, but no longer restarts the Pi-hole services. After stopping the Pi-hole services and disabling them so they don't start on boot, I deployed CoreDNS with this configuration file:

.:5353 {
  errors
  cache
  bind 127.0.0.1
  forward . tls://1.1.1.1 tls://1.0.0.1 {
    expire 60s
    max_fails 2
    policy sequential
    tls_servername cloudflare-dns.com
  }
}

. {
  log
  errors
  bind 10.0.0.22
  prometheus 0.0.0.0:9153
  hosts /etc/pihole/custom.list {
    ttl 2
    reload 2s
    fallthrough
  }
  forward . 10.0.0.25:53 127.0.0.1:5353 {
    max_fails 2
    policy sequential
  }
}

The result is a CoreDNS configuration with two listeners, one on the node's external IP (10.0.0.22:53) and the other on 127.0.0.1:5353.
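For completeness, the service swap on the secondary might look roughly like this (this assumes systemd, the pihole-FTL unit name, and a Corefile at /etc/coredns/Corefile; adjust for your setup):

```
# Stop Pi-hole's resolver on the secondary and keep it from starting at boot
sudo systemctl stop pihole-FTL
sudo systemctl disable pihole-FTL

# Start CoreDNS with the Corefile shown above
sudo coredns -conf /etc/coredns/Corefile
```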

For the :53 listener, CoreDNS uses /etc/pihole/custom.list to properly handle queries for any Local DNS entries specified in the Pi-hole UI. Note: I moved my pi.hole DNS entry from /etc/pihole/local.list to /etc/pihole/custom.list since I wanted it to always resolve to my primary Pi-hole instance and not the secondary.
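For reference, custom.list is a plain hosts-style file that the CoreDNS hosts plugin can read directly, one `IP hostname` pair per line (these entries are examples, not my actual records):

```
10.0.0.25  pi.hole
10.0.0.40  nas.lan
```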

If none of the custom.list entries match the query, it falls through and forwards the query to the Pi-hole at 10.0.0.25:53. If that instance is unhealthy, CoreDNS instead forwards queries to the other listener on 127.0.0.1:5353, which forwards them directly to Cloudflare and has caching enabled so I don't hammer them.

As I mentioned above, I do lose blocking in this case, but CoreDNS switches back to the Pi-hole within ~500ms of it becoming healthy again, and the failover is generally invisible to clients.
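If you want the failure detection itself to be snappier, the forward plugin also supports an active `health_check` interval in addition to the passive `max_fails` counting; something like this (the interval value is just an example I haven't tuned):

```
forward . 10.0.0.25:53 127.0.0.1:5353 {
  max_fails 2
  policy sequential
  health_check 0.25s
}
```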

One upside of this deployment is that the full Pi-hole configuration is still being synchronized to the second instance, even though it's mostly unused. If my primary Pi-hole instance were to become permanently unhealthy, I could shut down CoreDNS, launch Pi-hole on the secondary, and get all the blocking back. This would only take me a few moments to execute.
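That manual promotion is roughly the reverse of the original swap (again assuming systemd; the coredns unit name is an assumption):

```
sudo systemctl stop coredns
sudo systemctl enable --now pihole-FTL
```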

Potential Improvement #1: I am only including the custom.list hosts file in my Corefile. I could perhaps also include /etc/pihole/local.list, but for me there is nothing of value in it; the hostname for my Pi-hole instance is already duplicated in custom.list.

Potential Improvement #2: This is a bit more of a pie-in-the-sky improvement, but I want to look at the possibility of consuming the gravity.db file from a CoreDNS plugin. I'd still forward requests to the Pi-hole by default (to keep the metrics accurate), but use the gravity plugin when falling back to Cloudflare (while the Pi-hole is down).
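The actual plugin would be written in Go, but the core of the lookup is just a SQLite query. Here's a minimal sketch of the check I have in mind, in Python for brevity, assuming gravity.db's `gravity` table with a `domain` column (the demo builds an in-memory table mimicking that layout rather than touching a real gravity.db):

```python
import sqlite3

def is_blocked(conn, qname):
    """Return True if the query name appears in the gravity blocklist
    table. Normalizes the name the way a resolver would see it."""
    row = conn.execute(
        "SELECT 1 FROM gravity WHERE domain = ? LIMIT 1",
        (qname.rstrip(".").lower(),),
    ).fetchone()
    return row is not None

if __name__ == "__main__":
    # Demo against an in-memory DB mimicking the gravity table layout
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE gravity (domain TEXT, adlist_id INTEGER)")
    conn.execute("INSERT INTO gravity VALUES ('ads.example.com', 1)")
    print(is_blocked(conn, "ads.example.com."))  # True
    print(is_blocked(conn, "good.example.com"))  # False
```

A real plugin would also need to handle exact-match domainlist entries and regex rules, but the gravity table membership test above is the bulk of the work.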

As a bit of an update, I've started to push ahead on writing a CoreDNS plugin that blocks queries by reading the Pi-hole's gravity.db file. I managed to get an initial POC working, so I'm going to push forward with actually integrating it into a plugin.

It seems like the only limitation is that I won't be able to handle MAC-address-based client definitions, so you'd still need to define clients by IP. This is because CoreDNS only exposes the client's network address to plugins, and teaching CoreDNS to probe the network for MAC addresses seems a bit intense.

I'm very interested in your efforts to make use of gravity.db from a different recursive DNS resolver; I'd like to make use of Pi-hole's data, but using my own resolver (PowerDNS Recursor, which supports Lua scripting and can easily access SQLite3 databases). If your PoC plugin is published somewhere, please post a link for the curious out here :slight_smile: