Hey All,
I ran into this great thread in the Feature Request section about using `lsyncd` for a redundant Pi-hole deployment, but I can't seem to reply to it with my own deployment details:
In deploying this setup, I realized it was skewing the statistics because some of my devices were using the two DNS resolvers in a round-robin fashion. So I started looking at how to run a DNS forwarder on my second instance that would use the Pi-hole when it was available and fall back to CloudFlare when it wasn't.
This assumes that Pi-hole outages are rare, and I'm OK losing blocks for a few minutes until the Pi-hole recovers. I have thoughts on how to improve this at the end of the post.
I originally considered using Dnsmasq, but after reading the documentation on how it handles upstream selection and recovery from failure, I realized I would need to write a daemon to handle health checking and rewriting of the Dnsmasq configuration. That's because its upstream selection isn't sequentially deterministic: it will send the query to multiple resolvers and use whichever responds first. In most cases that would probably do what I want, but I wanted a bit more of a guarantee.
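For reference, the Dnsmasq configuration I was considering would have looked roughly like this (a sketch only, never deployed): `strict-order` asks Dnsmasq to try upstreams in the order listed, but as noted above, its failure handling still isn't sequentially deterministic, which is why I'd have needed the external health-check daemon.

```
# /etc/dnsmasq.conf — sketch of the rejected approach
no-resolv              # don't read upstreams from /etc/resolv.conf
strict-order           # prefer upstreams in the order listed below
server=10.0.0.25       # primary Pi-hole
server=1.1.1.1         # CloudFlare fallback
```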
So instead I took the `lsyncd` configuration file from the post above and edited the associated shell script so that it continues rsyncing the data and rewriting the configurations, but no longer restarts the Pi-hole services. After shutting down the Pi-hole services and disabling them so they don't start on boot, I deployed CoreDNS with this configuration file:
```
.:5353 {
    errors
    cache
    bind 127.0.0.1
    forward . tls://1.1.1.1 tls://1.0.0.1 {
        expire 60s
        max_fails 2
        policy sequential
        tls_servername cloudflare-dns.com
    }
}

. {
    log
    errors
    bind 10.0.0.22
    prometheus 0.0.0.0:9153
    hosts /etc/pihole/custom.list {
        ttl 2
        reload 2s
        fallthrough
    }
    forward . 10.0.0.25:53 127.0.0.1:5353 {
        max_fails 2
        policy sequential
    }
}
```
The result is a CoreDNS configuration with two listeners: one on the node's external IP (`10.0.0.22:53`) and the other on `127.0.0.1:5353`.
For the `:53` listener, CoreDNS uses `/etc/pihole/custom.list` to properly handle queries for any Local DNS entries specified in the Pi-hole UI. Note: I moved my `pi.hole` DNS entry from `/etc/pihole/local.list` to `/etc/pihole/custom.list`, since I wanted it to always resolve to my primary Pi-hole instance and not the secondary.
If none of the `custom.list` entries match the query, it falls through and forwards the query to the Pi-hole at `10.0.0.25:53`. If that instance is unhealthy, it instead forwards queries to the other CoreDNS listener on `127.0.0.1:5353`. That listener forwards queries directly to CloudFlare over TLS, with caching enabled so I don't hammer them.
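The sequential failover behavior described above can be sketched in a few lines. This is a toy model of the `policy sequential` idea, not real DNS resolution; the upstream functions are stand-ins for the Pi-hole and the local CloudFlare forwarder.

```python
def resolve(query, upstreams):
    """Try each upstream in order; fall back only when an earlier one fails."""
    for upstream in upstreams:
        try:
            return upstream(query)
        except ConnectionError:
            continue  # this upstream is down; try the next in order
    raise RuntimeError("all upstreams failed")

# Stand-ins for the real resolvers:
def pihole(query):
    raise ConnectionError("Pi-hole is down")

def cloudflare(query):
    return "1.2.3.4"

print(resolve("example.com", [pihole, cloudflare]))  # prints 1.2.3.4
```

While the Pi-hole is healthy, every query goes to it (keeping the statistics intact); only when it errors does the fallback see any traffic.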
As I mentioned above, I do lose blocks in this case, but CoreDNS switches back to the Pi-hole within ~500ms of it becoming healthy again, and the whole thing is generally invisible to clients.
One upside of this deployment is that the full Pi-hole configuration is still being synchronized to the second instance, just mostly unused. If my primary Pi-hole instance were to become permanently unhealthy, I could shut down CoreDNS and launch Pi-hole on the secondary to get all the blocks back. That would only take a few moments to execute.
Potential Improvement #1: I am only including the `custom.list` hosts file in my Corefile. I could also add a `hosts` entry for `/etc/pihole/local.list`; however, for me there is nothing of value in there, since the hostname for my Pi-hole instance is already duplicated in the `custom.list` file.
Potential Improvement #2: This is more of a pie-in-the-sky improvement, but I want to look at the possibility of consuming the `gravity.db` file from a CoreDNS plugin. I'd still forward requests to the Pi-hole by default (for proper metrics), but use the gravity plugin when falling back to CloudFlare (i.e., when the Pi-hole is down).
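The core of that plugin would be a lookup against `gravity.db`. Here's a rough sketch of the query I have in mind, assuming the Pi-hole v5 schema where blocked domains live in a `gravity` table with a `domain` column (verify against your own database before relying on this); the demo uses an in-memory database standing in for `/etc/pihole/gravity.db`.

```python
import sqlite3

def is_blocked(conn, name):
    """Return True if `name` appears in the gravity blocklist table."""
    row = conn.execute(
        "SELECT 1 FROM gravity WHERE domain = ? LIMIT 1", (name,)
    ).fetchone()
    return row is not None

# Demo: an in-memory stand-in for /etc/pihole/gravity.db with the
# assumed schema and a single blocked domain.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gravity (domain TEXT PRIMARY KEY)")
conn.execute("INSERT INTO gravity VALUES ('ads.example.com')")

print(is_blocked(conn, "ads.example.com"))  # True
print(is_blocked(conn, "example.com"))      # False
```

A real plugin would also need to handle wildcard/regex entries and reload the database as `lsyncd` updates it, but the exact-match case above is the bulk of gravity's blocking.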