PI-Hole high availability with Kubernetes

Here is what I just achieved in my setup...

Needs:
--The highest availability level possible for DNS
--The highest security level possible for DNS
--A solution requiring as little manual intervention as possible
--A solution ensuring data integrity

While trying to deploy an HA instance of PI-Hole in my Kubernetes cluster, I ran into the same problems many others have faced before me. After searching all over the place, I did not find an easy solution, so I accepted the challenge of doing it myself :slight_smile:

Problem No 1 - Avoiding the GUI (not possible as of now)

Since PI-Hole v5, it is no longer possible to add your blocklists without manually clicking through the GUI. That pure point-and-click, Windows-style philosophy is not good at all for automation platforms like Kubernetes, so I had to work around it.

Problem No 2 - SQLite (unavoidable as of now)

SQLite is problematic when multiple instances need access to the same database. Because PI-Hole cannot be pointed to my existing MariaDB HA cluster instead, I again had to work around it.

Problem No 3 - Losing the client's source IP (lose some availability / flexibility or lose the source IP; pick your poison)

When routing requests through the Kubernetes stack of load balancers, ingresses and services, you often lose the client's source IP. The options that avoid this force you onto the local network stack, which is itself bad for HA because you end up tied to a specific node / IP associated with a single container.

So how did I solve all of that?

The environment is:
--A cluster of 2 pfSense firewalls around which the entire network is built
--A Kubernetes cluster of 9 nodes (3 control plane and 6 workers)
--That K8S cluster is using Longhorn for storage, MetalLB for load balancing and both Nginx and HAProxy for ingress.

About point No 1:

I created two deployments in Kubernetes. Both use the same two ReadWriteMany Longhorn volumes: one for /etc/pihole and the other for /etc/dnsmasq.d.

The first deployment is limited to a single replica and mounts these volumes read-write (RW). It also exposes port 53 (TCP and UDP) as well as port 80. That deployment must be started first so it can create the files needed by the others.

The second deployment can have many replicas (3 here) and mounts both volumes read-only (RO). It exposes only port 53, not port 80.

As many options as possible are configured through environment variables (DNSSEC, upstream DNS, conditional forwarding, etc.).

The Kubernetes service points to both sets of pods (RW and RO). Because only the RW pod has port 80 open, only it serves the GUI; that way, when I connect to the GUI, I know it is served from the right container. Should that single pod go down for any reason, the DNS service survives and management comes back within a minute.

About point No 2:

Although the volumes are ReadWriteMany, only a single container actually mounts them as RW; the others mount them as RO. Thanks to that, data integrity is ensured and SQLite is protected.

About point No 3:
--Network devices point to the pfSense cluster for their DNS service
--pfSense resolves local names itself and queries Pi-Hole for the rest
--Should Pi-Hole be unavailable despite the HA (e.g. ESXi itself is rebooted, taking everything down), pfSense falls back to a public DNS as a last resort.

Thanks to that:
--pfSense can log everything and provide me with the client's source IP and its associated queries.
--DNS will always be available (if pfSense is down, the entire network is down...)
--PI-Hole will be enforced as long as it is running (and with this setup, its availability is very high)
--Management required from the GUI can be done in a safe way
--Changes are propagated to every PI-Hole instance thanks to the shared volume
--As many PI-Holes as desired can run together and share the load / ensure the service

Still, it would be good for PI-Hole to consider the realities of Kubernetes, but in the meantime I managed to work around every problem.

Some extra details...

To get reliable access to the management GUI, I had to pin the IP address of this pod in my CNI (Calico). I also have BGP routing between the Kubernetes cluster and the LAN, so when needed I can open the required access in the firewall and reach a pod directly. I had to do this for my mail server (Poste.IO) as well, so it could see clients' real IPs.

The second point is that the read-only pods need to be restarted whenever I make a change through the RW pod. For that, I just delete the RO pods: either all at once, with DNS served by the RW pod in the meantime, or progressively so that some remain online to answer queries, albeit with the previous config.

Hi,

I'm interested in your setup. Can you share it?

Is it a Helm chart or plain manifests?

Thanks
Cedric

Hi,

I'm using manifests here. I avoid Helm templates as much as possible; I do not feel in control of my setup when using Helm charts.

Most of the solution lives outside of PI-Hole itself because, natively, PI-Hole cannot do what I need. For that reason, it is difficult to share the entire solution...

Before using the manifests, you already need to have:
Step 1
--An HA solution for your gateway that provides DNS to everyone
--Logging done from that solution
--That solution configured to forward non-local queries to the VIP used for PI-Hole

I use pfSense with its built-in HA (CARP failover) and DNS Forwarder. I run the Forwarder in sequential mode, starting with PI-Hole. That way, nothing bypasses PI-Hole unless it is down itself. In that case, pfSense forwards to a public DNS outside, bypassing the filtering but leaving DNS functional and still logging everything.

Step 2
--A functional Kubernetes cluster
--A CNI that can pin IP addresses
--Routing between the cluster and the outside world
--MetalLB as the load balancer (a pool sketch follows this list)
--IPv6 / IPv4 dual-stack in my case; remove IPv6 if you do not use it...
--HA storage with ReadWriteMany support
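
The Services further down request their IP from a MetalLB pool named pihole-pool (via the metallb.universe.tf/address-pool annotation). For reference, here is a minimal sketch of what that pool could look like with the MetalLB v0.13+ CRDs; the address ranges and the L2 advertisement are placeholders for illustration, not my actual values:

```yaml
# Example MetalLB pool backing the Pi-hole LoadBalancer Services.
# The addresses below are placeholders; use ranges from your own network.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: pihole-pool
  namespace: metallb-system
spec:
  addresses:
  - 172.16.0.11/32
  - 2001:db8:1234:100::11/128
---
# Announce the pool on the LAN; swap in a BGPAdvertisement if MetalLB
# speaks BGP in your environment instead of L2.
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: pihole-pool-adv
  namespace: metallb-system
spec:
  ipAddressPools:
  - pihole-pool
```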

I built mine with kubeadm under Ubuntu 22.04, running the latest version (upgraded to 1.29.1 yesterday). I use Calico as the CNI, with BGP configured between Calico and pfSense. Longhorn is my storage engine: I gave 100G of space to Longhorn on each of my 6 workers and everything exists in 3 replicas, so 600G raw gives me 200G of usable HA storage.
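
For context, the Calico-to-pfSense peering is defined with Calico's BGP resources. A minimal sketch (the peer address and AS numbers are placeholders, not my actual values):

```yaml
# Peer the cluster nodes with the pfSense gateway over BGP.
# peerIP and AS numbers are placeholders; match them to your router config.
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: pfsense
spec:
  peerIP: 172.16.0.1
  asNumber: 64512
---
# Cluster-side BGP settings (own ASN, node-to-node mesh).
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  asNumber: 64513
  nodeToNodeMeshEnabled: true
```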

So again, this goes way beyond PI-Hole itself and there is a ton of work, debugging, trial and error and more... Most of that is built with manifests and is required for the solution to work, but I cannot provide all of my details because your situation will surely not match mine closely enough to re-use them as-is.

Now the fun stuff :slight_smile:

Create the volumes for PI-Hole:
+-+-+-+-+-+-+
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: lh-pvc-pihole-etc
  namespace: prod-services
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: longhorn
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: lh-pvc-pihole-dnsm
  namespace: prod-services
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: longhorn
+-+-+-+-+-+-+
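
Both deployments below read the admin password from a Secret named pihole-web-password, which is not shown in this thread. It would look something like this sketch (the password value is a placeholder):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: pihole-web-password
  namespace: prod-services
type: Opaque
stringData:
  password: change-me   # placeholder; set your own Pi-hole admin password
```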

The first deployment is for the PI-Hole RW instance:
+-+-+-+-+-+-+
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: pihole
  name: pihole-rw
  namespace: prod-services
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pihole
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: pihole
      annotations:
        cni.projectcalico.org/ipAddrs: '["2001:db8:1234:100::1", "10.244.0.11"]'
    spec:
      containers:
      - name: pihole-rw
        image: pihole/pihole:2024.01.0
        imagePullPolicy: IfNotPresent
        env:
        - name: TZ
          value: America/Toronto
        - name: VIRTUAL_HOST
          value: 'pihole.example.org'
        - name: DNSSEC
          value: 'true'
        - name: DNSMASQ_LISTENING
          value: all
        - name: PIHOLE_DNS_
          value: 2606:4700:4700::1111;2620:119:53::53
        - name: FTLCONF_RATE_LIMIT
          value: 0/0
        - name: FTLCONF_LOCAL_IPV4
          value: 172.16.0.11
        - name: FTLCONF_MAXDBDAYS
          value: '90'
        - name: REV_SERVER
          value: 'true'
        - name: REV_SERVER_DOMAIN
          value: local.lan
        - name: REV_SERVER_TARGET
          value: 172.16.0.1
        - name: REV_SERVER_CIDR
          value: 172.16.0.0/12
        - name: WEBPASSWORD
          valueFrom:
            secretKeyRef:
              name: pihole-web-password
              key: password
        volumeMounts:
        - name: pihole-etc
          mountPath: /etc/pihole
        - name: pihole-dnsm
          mountPath: /etc/dnsmasq.d
        ports:
        - name: dns-tcp
          containerPort: 53
          protocol: TCP
        - name: dns-udp
          containerPort: 53
          protocol: UDP
        - name: web
          containerPort: 80
          protocol: TCP
        resources:
          requests:
            cpu: "20m"
            memory: "512Mi"
          limits:
            cpu: "250m"
            memory: "896Mi"
        readinessProbe:
          exec:
            command: ['dig', '@127.0.0.1', 'cloudflare.com']
          timeoutSeconds: 20
          initialDelaySeconds: 5
          periodSeconds: 60
        livenessProbe:
          tcpSocket:
            port: dns-tcp
          initialDelaySeconds: 15
          periodSeconds: 30
      volumes:
      - name: pihole-etc
        persistentVolumeClaim:
          claimName: lh-pvc-pihole-etc
      - name: pihole-dnsm
        persistentVolumeClaim:
          claimName: lh-pvc-pihole-dnsm
+-+-+-+-+-+-+

Two Services must be created because (at least with this setup) you cannot mix TCP and UDP in a single LoadBalancer Service; the MetalLB allow-shared-ip annotation lets them share one IP.
+-+-+-+-+-+-+
apiVersion: v1
kind: Service
metadata:
  name: pihole-udp
  namespace: prod-services
  annotations:
    metallb.universe.tf/allow-shared-ip: dns
    metallb.universe.tf/address-pool: pihole-pool
spec:
  selector:
    app: pihole
  ipFamilyPolicy: PreferDualStack
  ipFamilies:
    - IPv6
    - IPv4
  ports:
  - protocol: UDP
    port: 53
    name: dnsudp
    targetPort: 53
  type: LoadBalancer
---
apiVersion: v1
kind: Service
metadata:
  name: pihole-tcp
  namespace: prod-services
  annotations:
    metallb.universe.tf/allow-shared-ip: dns
    metallb.universe.tf/address-pool: pihole-pool
spec:
  selector:
    app: pihole
  ipFamilyPolicy: PreferDualStack
  ipFamilies:
    - IPv6
    - IPv4
  ports:
  - protocol: TCP
    port: 53
    name: dnstcp
    targetPort: 53
  - protocol: TCP
    port: 80
    name: web
    targetPort: 80
  type: LoadBalancer
+-+-+-+-+-+-+

Now I add the ReadOnly pods
+-+-+-+-+-+-+
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: pihole
  name: pihole-ro
  namespace: prod-services
spec:
  replicas: 3
  selector:
    matchLabels:
      app: pihole
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: pihole
      annotations:
        cni.projectcalico.org/ipv6pools: '["pihole-pool6"]'
    spec:
      containers:
      - name: pihole-ro
        image: pihole/pihole:2024.01.0
        imagePullPolicy: IfNotPresent
        env:
        - name: TZ
          value: America/Toronto
        - name: DNSSEC
          value: 'true'
        - name: DNSMASQ_LISTENING
          value: all
        - name: PIHOLE_DNS_
          value: 2606:4700:4700::1111;2620:119:53::53
        - name: FTLCONF_RATE_LIMIT
          value: 0/0
        - name: FTLCONF_MAXDBDAYS
          value: '90'
        - name: REV_SERVER
          value: 'true'
        - name: REV_SERVER_DOMAIN
          value: local.lan
        - name: REV_SERVER_TARGET
          value: 172.16.0.1
        - name: REV_SERVER_CIDR
          value: 172.16.0.0/12
        - name: WEBPASSWORD
          valueFrom:
            secretKeyRef:
              name: pihole-web-password
              key: password
        volumeMounts:
        - name: pihole-etc
          mountPath: /etc/pihole
          readOnly: true
        - name: pihole-dnsm
          mountPath: /etc/dnsmasq.d
          readOnly: true
        ports:
        - name: dns-tcp
          containerPort: 53
          protocol: TCP
        - name: dns-udp
          containerPort: 53
          protocol: UDP
        resources:
          requests:
            cpu: "20m"
            memory: "512Mi"
          limits:
            cpu: "250m"
            memory: "896Mi"
        readinessProbe:
          exec:
            command: ['dig', '@127.0.0.1', 'cloudflare.com']
          timeoutSeconds: 20
          initialDelaySeconds: 5
          periodSeconds: 60
        livenessProbe:
          tcpSocket:
            port: dns-tcp
          initialDelaySeconds: 15
          periodSeconds: 30
      volumes:
      - name: pihole-etc
        persistentVolumeClaim:
          claimName: lh-pvc-pihole-etc
      - name: pihole-dnsm
        persistentVolumeClaim:
          claimName: lh-pvc-pihole-dnsm
+-+-+-+-+-+-+

The last step is to add an Ingress (and secure it) to reach the management GUI:
+-+-+-+-+-+-+
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: pihole-ingress
  namespace: prod-services
  labels:
    auth1: mtls
    auth2: app-password
    auth3: none
    security: restricted
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/auth-tls-secret: ingress-nginx/ca-secret
    nginx.ingress.kubernetes.io/auth-tls-verify-client: "on"
    nginx.ingress.kubernetes.io/auth-tls-verify-depth: "1"
    nginx.ingress.kubernetes.io/auth-tls-error-page: "https://static.example.org/mtls_error.html"
    nginx.ingress.kubernetes.io/backend-protocol: "HTTP"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
      - pihole-dc.example.com
    secretName: pihole-tls-secret
  rules:
  - host: pihole-dc.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: pihole-tcp
            port:
              number: 80
+-+-+-+-+-+-+

But again, for these manifests to work there is a ton of prerequisites: cert-manager, ingress-nginx, MetalLB and more.
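
One of those prerequisites worth spelling out: the auth-tls-secret annotation above expects a Secret in the ingress-nginx namespace holding the CA certificate that signs your client certificates. A sketch with a placeholder certificate:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: ca-secret
  namespace: ingress-nginx
type: Opaque
stringData:
  ca.crt: |
    -----BEGIN CERTIFICATE-----
    (placeholder: paste the PEM of the CA that issues your client certificates)
    -----END CERTIFICATE-----
```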

Good luck with your own deployment,

Oops; the indentation was stripped when I first posted the reply... Be sure to check it when you try it.

I also noticed that I copy-pasted from a previous version of my files. You need to distinguish the two deployments (app: pihole-w and app: pihole-r) to avoid confusion between the services: pihole-w is the web deployment (RW) and pihole-r is the resolver deployment (RO). One possible label split is sketched below.
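
One way that split could look (a sketch, not the original manifests; the shared dns: pihole label is my own naming): give each deployment its own app label plus a common label, point the port-53 Services at the common label, and point the web Service / Ingress backend only at the RW pods.

```yaml
# Illustrative only; adapt names and labels to your own manifests.
# RW pod template labels:  { app: pihole-w, dns: pihole }
# RO pod template labels:  { app: pihole-r, dns: pihole }
---
# DNS Services (port 53) match every Pi-hole pod via the shared label
apiVersion: v1
kind: Service
metadata:
  name: pihole-udp
  namespace: prod-services
spec:
  type: LoadBalancer
  selector:
    dns: pihole
  ports:
  - name: dnsudp
    protocol: UDP
    port: 53
    targetPort: 53
---
# Web Service (Ingress backend) matches only the RW pod
apiVersion: v1
kind: Service
metadata:
  name: pihole-web
  namespace: prod-services
spec:
  selector:
    app: pihole-w
  ports:
  - name: web
    protocol: TCP
    port: 80
    targetPort: 80
```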

Thank you very much for the information :smiley:

Thanks for all this information, it almost deserves a push to a git repo.

If you want to keep indentation, you need to use "Preformatted text".

You have 3 options:

  • type your text or paste the file contents, select the text and click the preformatted text button in the editor window;

  • type your text or paste file contents, select the text and press CTRL + E;

  • use "fences" (3 backticks ```) before and after your text, like this:

```
Text
```

The result will look like this:

Text

For anyone looking for the formatted YAML:


# Persistent Volume Claims for Pi-hole
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: lh-pvc-pihole-etc
  namespace: prod-services
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: longhorn

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: lh-pvc-pihole-dnsm
  namespace: prod-services
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: longhorn

# Deployment for Pi-hole RW
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: pihole
  name: pihole-rw
  namespace: prod-services
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pihole
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: pihole
      annotations:
        cni.projectcalico.org/ipAddrs: '["2001:db8:1234:100::1", "10.244.0.11"]'
    spec:
      containers:
      - name: pihole-rw
        image: pihole/pihole:2024.01.0
        imagePullPolicy: IfNotPresent
        env:
        - name: TZ
          value: America/Toronto
        - name: VIRTUAL_HOST
          value: 'pihole.example.org'
        - name: DNSSEC
          value: 'true'
        - name: DNSMASQ_LISTENING
          value: all
        - name: PIHOLE_DNS_
          value: 2606:4700:4700::1111;2620:119:53::53
        - name: FTLCONF_RATE_LIMIT
          value: 0/0
        - name: FTLCONF_LOCAL_IPV4
          value: 172.16.0.11
        - name: FTLCONF_MAXDBDAYS
          value: '90'
        - name: REV_SERVER
          value: 'true'
        - name: REV_SERVER_DOMAIN
          value: local.lan
        - name: REV_SERVER_TARGET
          value: 172.16.0.1
        - name: REV_SERVER_CIDR
          value: 172.16.0.0/12
        - name: WEBPASSWORD
          valueFrom:
            secretKeyRef:
              name: pihole-web-password
              key: password
        volumeMounts:
        - name: pihole-etc
          mountPath: /etc/pihole
        - name: pihole-dnsm
          mountPath: /etc/dnsmasq.d
        ports:
        - name: dns-tcp
          containerPort: 53
          protocol: TCP
        - name: dns-udp
          containerPort: 53
          protocol: UDP
        - name: web
          containerPort: 80
          protocol: TCP
        resources:
          requests:
            cpu: "20m"
            memory: "512Mi"
          limits:
            cpu: "250m"
            memory: "896Mi"
        readinessProbe:
          exec:
            command: ['dig', '@127.0.0.1', 'cloudflare.com']
          timeoutSeconds: 20
          initialDelaySeconds: 5
          periodSeconds: 60
        livenessProbe:
          tcpSocket:
            port: dns-tcp
          initialDelaySeconds: 15
          periodSeconds: 30
      volumes:
      - name: pihole-etc
        persistentVolumeClaim:
          claimName: lh-pvc-pihole-etc
      - name: pihole-dnsm
        persistentVolumeClaim:
          claimName: lh-pvc-pihole-dnsm

# Services for Pi-hole
---
apiVersion: v1
kind: Service
metadata:
  name: pihole-udp
  namespace: prod-services
  annotations:
    metallb.universe.tf/allow-shared-ip: dns
    metallb.universe.tf/address-pool: pihole-pool
spec:
  selector:
    app: pihole
  ipFamilyPolicy: PreferDualStack
  ipFamilies:
    - IPv6
    - IPv4
  ports:
  - protocol: UDP
    port: 53
    name: dnsudp
    targetPort: 53
  type: LoadBalancer

---
apiVersion: v1
kind: Service
metadata:
  name: pihole-tcp
  namespace: prod-services
  annotations:
    metallb.universe.tf/allow-shared-ip: dns
    metallb.universe.tf/address-pool: pihole-pool
spec:
  selector:
    app: pihole
  ipFamilyPolicy: PreferDualStack
  ipFamilies:
    - IPv6
    - IPv4
  ports:
  - protocol: TCP
    port: 53
    name: dnstcp
    targetPort: 53
  - protocol: TCP
    port: 80
    name: web
    targetPort: 80
  type: LoadBalancer

# Deployment for Pi-hole RO
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: pihole
  name: pihole-ro
  namespace: prod-services
spec:
  replicas: 3
  selector:
    matchLabels:
      app: pihole
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: pihole
      annotations:
        cni.projectcalico.org/ipv6pools: '["pihole-pool6"]'
    spec:
      containers:
      - name: pihole-ro
        image: pihole/pihole:2024.01.0
        imagePullPolicy: IfNotPresent
        env:
        - name: TZ
          value: America/Toronto
        - name: DNSSEC
          value: 'true'
        - name: DNSMASQ_LISTENING
          value: all
        - name: PIHOLE_DNS_
          value: 2606:4700:4700::1111;2620:119:53::53
        - name: FTLCONF_RATE_LIMIT
          value: 0/0
        - name: FTLCONF_MAXDBDAYS
          value: '90'
        - name: REV_SERVER
          value: 'true'
        - name: REV_SERVER_DOMAIN
          value: local.lan
        - name: REV_SERVER_TARGET
          value: 172.16.0.1
        - name: REV_SERVER_CIDR
          value: 172.16.0.0/12
        - name: WEBPASSWORD
          valueFrom:
            secretKeyRef:
              name: pihole-web-password
              key: password
        volumeMounts:
        - name: pihole-etc
          mountPath: /etc/pihole
          readOnly: true
        - name: pihole-dnsm
          mountPath: /etc/dnsmasq.d
          readOnly: true
        ports:
        - name: dns-tcp
          containerPort: 53
          protocol: TCP
        - name: dns-udp
          containerPort: 53
          protocol: UDP
        resources:
          requests:
            cpu: "20m"
            memory: "512Mi"
          limits:
            cpu: "250m"
            memory: "896Mi"
        readinessProbe:
          exec:
            command: ['dig', '@127.0.0.1', 'cloudflare.com']
          timeoutSeconds: 20
          initialDelaySeconds: 5
          periodSeconds: 60
        livenessProbe:
          tcpSocket:
            port: dns-tcp
          initialDelaySeconds: 15
          periodSeconds: 30
      volumes:
      - name: pihole-etc
        persistentVolumeClaim:
          claimName: lh-pvc-pihole-etc
      - name: pihole-dnsm
        persistentVolumeClaim:
          claimName: lh-pvc-pihole-dnsm

# Ingress for Pi-hole
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: pihole-ingress
  namespace: prod-services
  labels:
    auth1: mtls
    auth2: app-password
    auth3: none
    security: restricted
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/auth-tls-secret: ingress-nginx/ca-secret
    nginx.ingress.kubernetes.io/auth-tls-verify-client: "on"
    nginx.ingress.kubernetes.io/auth-tls-verify-depth: "1"
    nginx.ingress.kubernetes.io/auth-tls-error-page: "https://static.example.org/mtls_error.html"
    nginx.ingress.kubernetes.io/backend-protocol: "HTTP"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
      - pihole-dc.example.com
    secretName: pihole-tls-secret
  rules:
  - host: pihole-dc.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: pihole-tcp
            port:
              number: 80

Note: I haven't tested this yet and will update if necessary.

Hi all. This was such a fun project, and I appreciate everyone else's work here (especially @Heracles31). In that light, I would like to contribute back:

I made a Kustomized version of this, and have fully tested it in my cluster. I documented it pretty thoroughly (at least I think so). Feel free to fork it and edit/change/play/deploy as you like.

Update (2024-09-07 #1): It seems there are severe drawbacks to this approach. While it is load balancing across all nodes, it is not actually fully functional. At least, not without severe workarounds:

  1. Weirdness with DNSMASQ_LISTENING setting. If you choose local, you can't perform DNS lookups against it. If you set it to single, it works, but you can't upload a backup (seriously at my wits' end here trying to explain that one). It looks like it works, but nothing actually happens. If you set it to all, everything works with regards to restoring a backup, but it doesn't filter anything.
  2. Adding domains doesn't "automatically" work. For some reason, the database may take time to replicate, even though all pods share the same PVC. Regardless, adding a domain to the blacklist does not auto-propagate to the read-only nodes. Ran many ad hoc tests to prove this. In some cases, running pihole -g would help, even though the database is "read-only" on those systems. It was too unpredictable to make adding/removing domains tenable.
  3. The problem in 1 with regards to not being able to upload a backup can be worked around. It involves manually editing the /etc/dnsmasq.d/01-pihole.conf file to add or remove interface=eth0 depending on whether you wanted to have the backup loading feature work. However, any change to this file needs to be propagated. If you start with it missing, all of the pods will do DNS without blocking. If you add it later, the read-only pods won't block without logging in to each pod with exec and running /usr/local/bin/pihole restartdns.

Given these limitations, I may not continue working on this. Possible ways forward:

  1. All nodes are read-write, but somehow the UI points to a reverse proxy that talks to each of them ALL the time (instead of load balancing). Any changes made to one would then be made to all.
  2. Leader/follower model. Any changes to the leader are pushed to the follower(s) with automated reload/restart of pihole/dns/etc.
  3. Run multiple, normal instances with their own PVCs, fully read-write. Manage add/remove of white- and black- lists via exec pihole using -b, -w, --regex, --white-regex, --wild, --white-wild flags to add to the lists.