High Availability Pi-hole via Docker Swarm and GlusterFS

I came across a few posts that deal with issues surrounding multiple Pi-hole instances that my high availability implementation seems to resolve. Instead of hijacking those posts, which I already tried to do, I figured I'd create a new one for my solution.

I use Docker Swarm to replicate the Pi-hole service and a MACVLAN network to ensure proper communication between clients and the Pi-hole instances. To keep settings, lists, etc. in sync, I use bind mounts backed by GlusterFS.
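To give a rough idea of the shape of it (service name, image tag, network name, and GlusterFS mount paths below are illustrative rather than my exact values; the playbook is the real source of truth), the Pi-hole service ends up looking something like this:

# Illustrative sketch: paths assume the GlusterFS volume is mounted at /mnt/gluster/pihole on every node
docker service create --name pihole \
  --replicas 2 \
  --network pihole_macvlan \
  --mount type=bind,source=/mnt/gluster/pihole/etc-pihole,target=/etc/pihole \
  --mount type=bind,source=/mnt/gluster/pihole/etc-dnsmasq.d,target=/etc/dnsmasq.d \
  pihole/pihole:latest

Because the bind-mount sources live on the GlusterFS mount point shared by every node, each replica sees the same /etc/pihole and /etc/dnsmasq.d no matter where it is scheduled.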

The solution is working well for me, even handling the upgrade to 5.0 without issue. The playbook I created might not be the most flexible, but it should serve as a good example and get you most of the way there, assuming you have a bit of technical chops.

You may need to smash together the following two resources I created:

Feedback and suggestions welcome!


Curious how this works with PiHole 5, and the move from files to database for whitelist/blacklist, etc.

Does each of your nodes maintain its own DB, or do you have one that all instances read/write to?

The solution has been working without issue for me with Pi-hole 5. The DB is stored in the GlusterFS volume, which is replicated to each node. Each node mounts this volume and is able to read/write to the DB, and updates are instantly replicated on all nodes. Basically, each node sees a local DB, but since it's backed by GlusterFS only one actually exists, and you don't need to run any scripts to sync changes.
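For anyone curious about the GlusterFS side, it boils down to something like this (hostnames, volume name, brick paths, and mount point are examples, not my exact values):

# Create a 3-way replicated volume with one brick per node (run once, from any node):
gluster volume create pihole-data replica 3 \
  node1:/data/glusterfs/pihole/brick \
  node2:/data/glusterfs/pihole/brick \
  node3:/data/glusterfs/pihole/brick
gluster volume start pihole-data

# Mount it at the same path on every node; the Pi-hole bind mounts point here:
mount -t glusterfs localhost:/pihole-data /mnt/gluster/pihole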

I think I might give this a try. I currently use Portainer to manage my Docker swarm, so I might take a stab at building out a Docker Compose file to create a stack, since I'm not as familiar with Ansible.

So does this mean that, no matter which web instance you connect to, it shows the stats for all instances combined? For example, if you had two nodes and each had served 100 queries, the web interface would show 200 no matter which node you connect to?

Also, if I do a disable for 5 minutes from the web interface, will that disable command replicate to all nodes, or would I have to execute it on each node individually? I'm guessing it would need to be executed on each node, but figured I'd ask.

@_popeofmope one other question: did you get your setup working with IPv6, or just IPv4?

I only used IPv4 and disabled IPv6.

Stats appear to be in sync on my instances, so I'd imagine they're stored in the DB; there's a slight discrepancy, but that could be browser caching. Disabling on one instance disables all, so that must be stored in the DB as well.

You may need to set up the MACVLAN network manually; Portainer didn't handle this as expected, IIRC.
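If it helps, the manual setup looks roughly like this (the subnet, IP range, and parent interface are examples and need to match your LAN):

# On each node: a config-only network describing that node's local interface
docker network create --config-only \
  --subnet 192.168.1.0/24 \
  --gateway 192.168.1.1 \
  --ip-range 192.168.1.224/28 \
  -o parent=eth0 \
  pihole_macvlan_conf

# Once, on a manager node: the swarm-scoped MACVLAN network the service attaches to
docker network create -d macvlan --scope swarm \
  --config-from pihole_macvlan_conf \
  pihole_macvlan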

I think I’ve got it built out and working. I have IPv4 and IPv6. I’ve got Unbound set up as a recursive DNS resolver for PiHole.

My only “issue” is that right now my main setup is using one of the PiHole feature branches (MAC clients). Once that gets rolled in, I’ll give this another go and do further testing.

Thank you for sharing your setup! Nice work!

I have a similar setup - I am running 3 PiHoles on 3 RPi 3B+ boards and using a modified rsync-over-SSH script, run via cron, that I pulled from Reddit to keep settings synchronized. On the front end I have 2 NGINX containers running via Docker on two separate machines (a Synology NAS and an RPi 3 that is part of the cluster). I then have my UniFi USG specify the two NGINX IPs as DNS servers in my DHCP setup for my LAN network. The NGINX containers listen for port 53 UDP requests and load balance across the PiHole instances.
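The cron job is along these lines (illustrative only, not the exact script; the target host and file list are examples):

# Push the gravity DB and local settings from the primary to a secondary,
# then restart its DNS so the changes take effect (repeated per secondary):
rsync -av /etc/pihole/gravity.db /etc/pihole/custom.list pi@10.0.21.110:/etc/pihole/
ssh pi@10.0.21.110 'pihole restartdns'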

This has all worked swimmingly for over a year now and through multiple upgrades. The only piece I haven't been able to get working is transparent proxying (passing the client IP) through NGINX to PiHole and back for per-client tracking; it is supposed to work, but I've been unsuccessful to date.

Anyway, I've come across MACVLAN a few times but never dug in - does it have the ability to pass the client IP transparently and load balance? (I'll review the link you shared and research further.)

Happy my solution has helped you a bit. I don't load balance my Pi-hole instances; instead I list their IPs manually in my EdgeRouter config. This means clients access the Pi-holes directly and client stats work without issue. Do you have the following in your nginx config:

proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;

With regards to load balancing MACVLAN services, I haven't set this up, but I'd start here: Use swarm mode routing mesh | Docker Documentation
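I haven't run it that way myself, but a rough sketch of publishing DNS through the ingress routing mesh would look something like this (note that the mesh source-NATs traffic, so Pi-hole would see an ingress IP rather than the real client IP, which hurts per-client stats):

# Untested sketch: any node's IP will answer on 53 and spread queries across replicas
docker service create --name pihole \
  --replicas 3 \
  --publish published=53,target=53,protocol=udp \
  --publish published=53,target=53,protocol=tcp \
  pihole/pihole:latest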

Thank you for the prompt response. The reason I don't list the PiHoles directly in DHCP is that it isn't balanced: all 108K queries end up going to the first one listed unless it's down (HA only), whereas my current setup spreads the 108K queries almost identically across the 3 instances at ~36K each.

I may be mistaken, but I believe those headers are for HTTP, whereas I am just using a STREAM block.

Below is my NGINX config:

user root;

error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;

events {
    worker_connections 1024;
}

stream {
    upstream dns_servers {
        server 10.0.21.100:53;
        server 10.0.21.110:53;
        server 10.0.21.120:53;
        server 1.1.1.1:53 backup;   # public resolver, only used if all PiHoles are down
    }

    server {
        listen [::]:53 udp;
        listen *:53 udp;
        proxy_pass dns_servers;
        #proxy_bind $remote_addr:$remote_port transparent;   # transparent proxy attempt, see below
        #proxy_responses 0;
        #proxy_timeout 1s;
        error_log /var/log/nginx/dns.log info;
    }
}

The following line is what should enable the transparent proxy, but what I suspect is happening is that the response cannot be routed back to the client, so the request just hangs:

proxy_bind $remote_addr:$remote_port transparent;

This is where I started: https://docs.nginx.com/nginx/admin-guide/load-balancer/tcp-udp-load-balancer/

Is $remote_port needed? Also, have you tried hard-coding the proxy_bind value? I found this blog post that goes into IP transparency in a little more detail: https://www.nginx.com/blog/ip-transparency-direct-server-return-nginx-plus-transparent-proxy/

Past that I've got nothing for now; maybe someone else can chime in.

What about write locking? Are you CERTAIN that the clients cannot mess up the "SPOF" DB?

Wouldn't it make sense to put the nodes into each other's upstream config and use the routing mesh or HAProxy in front?

regards

I haven't looked into that, but I also don't think I've experienced any issues of that sort.