[Solved] Timeouts (Traefik + Docker Swarm + replicated Pi-hole)

I intend to use Pi-hole for DNS requests and for its admin interface. I have a partially working configuration and a malfunctioning one:

Working configuration

  • Deploy Pi-hole on Docker Swarm over a custom overlay network
  • Create a custom route in Traefik through the dynamic configuration, pointing to host pihole_pihole:80
  • The frontend is reachable at http://pihole.traefik/admin

Drawback

  • Load balancing is done by Docker Swarm and is not sticky: consecutive requests end up at different Pi-hole nodes, which makes the admin interface impractical to use
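The dynamic configuration for the working setup isn't shown, but a minimal sketch of what it could look like follows, assuming Traefik v2 with the file provider and an entrypoint named web (router, service and entrypoint names are assumptions). It also shows why sticky sessions can't help here: pihole_pihole is the Swarm service name, which resolves to the service's single virtual IP, so Traefik only ever sees one server and a sticky cookie would pin requests to that virtual IP rather than to a replica.

```yaml
# Hypothetical Traefik v2 dynamic configuration (file provider).
# Router, service and entrypoint names are assumptions, not the original config.
http:
  routers:
    pihole:
      rule: "Host(`pihole.traefik`)"
      entryPoints:
        - web
      service: pihole
  services:
    pihole:
      loadBalancer:
        servers:
          # The Swarm service name resolves to one virtual IP, so Swarm
          # (not Traefik) decides which replica handles each request.
          - url: "http://pihole_pihole:80"
```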

Malfunctioning configuration

  • Create the Traefik integration (swarm mode) through Docker labels
  • Traefik picks this up correctly and creates the router and service
  • With Traefik doing the load balancing, I can use sticky sessions to make sure I stay connected to the same node

Problem
Requests to http://pihole.traefik/admin time out. Only once in a blue moon do I get a response, and then the page stays responsive for 2 seconds before becoming unresponsive again. I don't see any relevant errors in the logs of the Pi-hole containers or of Traefik itself, so I'm not sure where I'm going wrong.

System

  • Docker version 19.03.12-ce, build 48a66213fe
  • Swarm enabled, three nodes

The working Traefik configuration is very basic, so I'll leave it out. Below is my Pi-hole compose file, including (commented out) the labels I use in the malfunctioning scenario. Any ideas about why my setup is failing are greatly appreciated; I've been staring at this problem for a while now and I'm not sure where to go next.

```yaml
version: "3.5"
services:
  pihole:
    image: pihole/pihole:latest
    hostname: '{{.Node.Hostname}}'
    ports:
      - "53:53/tcp"
      - "53:53/udp"
    environment:
      TZ: 'Europe/Brussels'
      WEBPASSWORD: 'dddd'
      # swarm does not support cap_add, this reverts to previous behaviour
      FTL_CMD: 'debug'
      DNSMASQ_LISTENING: 'all'
    volumes:
      - '/srv/pihole/etc-pihole/:/etc/pihole/'
      - '/srv/pihole/etc-dnsmasq.d/:/etc/dnsmasq.d/'
    dns:
      - 127.0.0.1
      - 8.8.8.8
    deploy:
      replicas: 2
      restart_policy:
        condition: on-failure
      # using these labels when trying the malfunctioning scenario (route through traefik docker swarm integration)
      #labels:
        #traefik.enable: 'true'
        #traefik.http.services.pihole.loadbalancer.server.port: 80
        #traefik.http.services.pihole.loadbalancer.sticky.cookie: "true"
        #traefik.http.routers.pihole.rule: "Host(`pihole.traefik`)"
        #traefik.http.routers.pihole.service: "pihole"
        #traefik.http.routers.pihole.entrypoints: "web"
      placement:
        constraints:
          - "node.labels.dns==true"

networks:
  default:
    external: true
    name: frontend
```

As usual, the answer came to me right after asking. Because I'm using a custom Docker network, Traefik needs to be told about it as well. Add the following to the Traefik static configuration:

```toml
[providers.docker]
  network = "frontend" # this is mine
```

I'm slightly confused as to why my other whoami test did work without this adjustment, but hey, can't complain.
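Traefik v2 also lets you set the network per service with the traefik.docker.network label instead of globally in the provider configuration. A minimal sketch of the compose labels with that label added, assuming Traefik v2 in swarm mode and the same frontend network. As for the whoami puzzle, a likely (unverified) explanation is that the pihole service publishes port 53: in swarm mode that attaches its tasks to the ingress network in addition to frontend, and when a container is on several networks Traefik may pick its address on ingress, which isn't meant for direct container-to-container traffic and would match the timeouts. A whoami service without published ports would only be on frontend, leaving Traefik nothing to choose between.

```yaml
# Sketch: deploy labels for the pihole service (Traefik v2, swarm mode assumed).
# Only traefik.docker.network is new compared to the labels in the question.
deploy:
  labels:
    traefik.enable: "true"
    # Network Traefik should use to reach the tasks; needed when the service
    # is attached to more than one network (presumably frontend plus the
    # swarm ingress network created by publishing port 53).
    traefik.docker.network: "frontend"
    traefik.http.services.pihole.loadbalancer.server.port: "80"
    traefik.http.services.pihole.loadbalancer.sticky.cookie: "true"
    traefik.http.routers.pihole.rule: "Host(`pihole.traefik`)"
    traefik.http.routers.pihole.service: "pihole"
    traefik.http.routers.pihole.entrypoints: "web"
```

The per-service label only affects this one service, whereas the provider-level network setting becomes the default for every service Traefik discovers.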