Here is what I just achieved in my setup...
Needs:
--Highest availability level possible for DNS
--Highest security level possible for DNS
--Solution requiring as little manual intervention as possible
--Solution ensuring data integrity
While trying to deploy an HA instance of PI-Hole in my Kubernetes cluster, I ran into the same problems many others faced before me. After searching all over the place, I did not find an easy solution, so I accepted the challenge of doing it myself.
Problem No 1-Avoiding the GUI (not possible as of now)
Since PI-Hole v5, it is no longer possible to add your blocklists without manually clicking in the GUI. That pure Microsoft / Windows philosophy is not good at all for automation-driven platforms like Kubernetes, so I had to work around it.
Problem No 2-SQLite (unavoidable as of now)
That's another one that becomes problematic when you need concurrent access to the same resource. Because PI-Hole cannot be pointed at my existing MariaDB-based HA cluster instead, I again had to deal with it.
Problem No 3-Losing the client's source IP (lose some availability / flexibility or lose the source IP. Pick your poison).
When routing requests through the Kubernetes layers of load balancers, ingresses and services, you often lose the client's source IP. The option that avoids this forces traffic through the local node's network stack, which is itself bad for HA because you end up tied to a specific node / IP associated with a single container.
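For reference, the knob involved on the Kubernetes side is externalTrafficPolicy on the Service. Below is a minimal sketch of that trade-off, with placeholder names, under the assumption that the DNS Service gets its external IP from MetalLB:

# Hypothetical Service illustrating the trade-off described above.
apiVersion: v1
kind: Service
metadata:
  name: pihole-dns             # placeholder name
spec:
  type: LoadBalancer           # MetalLB assigns the external IP
  selector:
    app: pihole                # placeholder label
  ports:
    - name: dns-udp
      port: 53
      protocol: UDP
  # "Cluster" (the default) may SNAT traffic, so the pod sees a node IP
  # instead of the real client; "Local" preserves the client's source IP
  # but only delivers traffic to pods running on the node that received it.
  externalTrafficPolicy: Local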
So how did I solve all of that?
The environment is:
--A cluster of 2 pfSense firewalls around which the entire network is built
--A Kubernetes cluster of 9 nodes (3 control-plane and 6 workers)
--That K8S cluster is using Longhorn for storage, MetalLB for load balancing and both Nginx and HAProxy for ingress.
About point No 1:
I created two deployments in Kubernetes. Both use the same two ReadWriteMany volumes from Longhorn: one for /etc/pihole and the other for /etc/dnsmasq.d.
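For illustration, one of those Longhorn RWX claims might look like the sketch below (names, sizes and the StorageClass name are placeholders / assumptions, not my exact manifests):

# Hypothetical PersistentVolumeClaim backing /etc/pihole; a second,
# similar claim backs /etc/dnsmasq.d.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pihole-etc             # placeholder name
spec:
  accessModes:
    - ReadWriteMany            # lets several pods mount the same volume
  storageClassName: longhorn   # assumed Longhorn StorageClass name
  resources:
    requests:
      storage: 1Gi             # placeholder size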
The first deployment is limited to a single replica and mounts these volumes as RW. It also exposes port 53 (TCP and UDP) as well as port 80. That deployment has to start first so it can create the files needed by the others.
The second deployment can have many replicas (3 here) and forces both volumeMounts to read-only. It exposes only port 53, not port 80.
As many options as possible are configured through environment variables (DNSSEC, upstream DNS, conditional forwarding, etc.).
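Here is a condensed sketch of the two deployments, reusing the claim names from above. All names are placeholders, and the exact environment variable names depend on the PI-Hole image version, so treat this as an outline rather than my exact manifests:

# Writer: single replica, RW mounts, DNS + GUI.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pihole-rw
spec:
  replicas: 1                              # only one pod ever writes to SQLite
  selector:
    matchLabels: { app: pihole, role: rw }
  template:
    metadata:
      labels: { app: pihole, role: rw }
    spec:
      containers:
        - name: pihole
          image: pihole/pihole:latest
          ports:
            - { containerPort: 53, protocol: TCP }
            - { containerPort: 53, protocol: UDP }
            - { containerPort: 80, protocol: TCP }   # GUI only on the writer
          env:
            - { name: DNSSEC, value: "true" }        # example settings only
            - { name: PIHOLE_DNS_, value: "9.9.9.9;149.112.112.112" }
          volumeMounts:
            - { name: etc-pihole,  mountPath: /etc/pihole }
            - { name: etc-dnsmasq, mountPath: /etc/dnsmasq.d }
      volumes:
        - name: etc-pihole
          persistentVolumeClaim: { claimName: pihole-etc }
        - name: etc-dnsmasq
          persistentVolumeClaim: { claimName: pihole-dnsmasq }
---
# Readers: several replicas, RO mounts, DNS only.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pihole-ro
spec:
  replicas: 3
  selector:
    matchLabels: { app: pihole, role: ro }
  template:
    metadata:
      labels: { app: pihole, role: ro }
    spec:
      containers:
        - name: pihole
          image: pihole/pihole:latest
          ports:
            - { containerPort: 53, protocol: TCP }
            - { containerPort: 53, protocol: UDP }
          volumeMounts:
            - { name: etc-pihole,  mountPath: /etc/pihole,    readOnly: true }
            - { name: etc-dnsmasq, mountPath: /etc/dnsmasq.d, readOnly: true }
      volumes:
        - name: etc-pihole
          persistentVolumeClaim: { claimName: pihole-etc }
        - name: etc-dnsmasq
          persistentVolumeClaim: { claimName: pihole-dnsmasq }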
The Kubernetes Service points to all of those pods (RW and RO). Because only the RW one has port 80 open, only that one serves the GUI; that way, when connecting to the GUI, I am sure it is served by the right container. Should that single pod go down for any reason, the DNS service survives and management comes back within a minute.
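The Service side is nothing special. A sketch with the same placeholder labels, assuming pfSense reaches PI-Hole through a MetalLB-provided IP:

# One Service selecting both deployments; only the RW pod listens on
# port 80, so only it ever serves the GUI.
apiVersion: v1
kind: Service
metadata:
  name: pihole
spec:
  type: LoadBalancer           # external IP handed out by MetalLB
  selector:
    app: pihole                # matches role: rw and role: ro pods alike
  ports:
    - { name: dns-tcp, port: 53, protocol: TCP }
    - { name: dns-udp, port: 53, protocol: UDP }
    - { name: http,    port: 80, protocol: TCP }

Note that mixing TCP and UDP on a single LoadBalancer Service is only allowed on recent Kubernetes versions; on older clusters the usual workaround is two Services sharing one IP through MetalLB's allow-shared-ip annotation.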
About point No 2:
Even though the volumes are ReadWriteMany, only a single container actually mounts them as RW; the others mount them as RO. Thanks to that, data integrity is ensured and SQLite is protected from concurrent writers.
About point No 3:
--Network devices point to the pfSense cluster for their DNS service
--pfSense resolves local names itself and queries PI-Hole for everything else
--Should PI-Hole not be available despite the HA (e.g. ESXi itself is rebooted, taking everything down), pfSense falls back to a public DNS server as a last resort.
Thanks to that:
--pfSense can log everything and provide me with the client's source IP and its associated queries.
--DNS will always be available (if pfSense is down, the entire network is down...)
--PI-Hole filtering is enforced as long as it is running (and its availability is now very high)
--Management that requires the GUI can be done safely
--Changes are propagated to every PI-Hole instance thanks to the shared volume
--As many PI-Hole instances as desired can run together to share the load and ensure the service
Still, it would be good for PI-Hole to take the realities of Kubernetes into account, but in the meantime I managed to work my way around every problem.