No DNS replies over ingress

Hello!
I've recently set up a Raspi 4B cluster (4 nodes) and deployed PiHole as a service. While doing that I ran into a few problems. I currently have a hacky solution running successfully, but who knows how long it will keep working.

The problem

version: "3"

services:
  pihole:
    container_name: pihole  # ignored by docker stack deploy (swarm mode)
    image: pihole/pihole:latest
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints: [node.role == manager]
    volumes:
      - "/home/ubuntu/docker/pihole/etc-pihole/:/etc/pihole/"
      - "/home/ubuntu/docker/pihole/etc-dnsmasq.d/:/etc/dnsmasq.d/"
    ports:
      - 53:53/tcp
      - 53:53/udp
    restart: unless-stopped  # ignored in swarm mode; deploy.restart_policy applies instead
    dns:
      - 127.0.0.1
      - 84.200.69.80
    environment:
      - "TZ=Europe/Vienna"  # "Austria/Vienna" is not a valid tz database name
      - "WEBPASSWORD=<redacted>"
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.pihole.entrypoints=web"
      - "traefik.http.routers.pihole.rule=Host(`pihole.backbone.localdomain`)"
      - "traefik.http.services.pihole-svc.loadbalancer.server.port=80"
      - "traefik.http.routers.pihole.service=pihole-svc"
    networks:
      - traefik-backend
      - bridge

networks:
  traefik-backend:
    external:
      name: traefik-backend
  bridge: # hacky workaround to stop PiHole from publishing DNS on the overlay net

As you can see, it's a pretty standard PiHole docker-compose.yml with a few differences:

  • deployment with only 1 replica on the manager node (because, hell, why not?)
  • a load of labels for Traefik
  • 2 networks

The relevant part here is the networks. traefik-backend used to be a bridge network connecting every container with a web frontend to Traefik. For swarm mode I changed it to an overlay network (hell yeah, I want to move services to other hosts and still be able to proxy them). And that's where the problems started.
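For reference, here is a minimal sketch of how such an overlay network could be declared from within a stack file instead of being created externally. It assumes compose file format 3.5 or later (needed for the name key); the poster's actual setup creates traefik-backend outside the stack and references it as external, so this is illustrative only.

```yaml
# Sketch only: traefik-backend declared as a swarm overlay network
# inside a stack file (assumes compose file format 3.5+ for "name").
networks:
  traefik-backend:
    name: traefik-backend  # fixed name instead of <stack>_traefik-backend
    driver: overlay        # multi-host network spanning the swarm nodes
    attachable: true       # also lets standalone containers join it
```

Other stacks, like the PiHole one above, would keep referencing it as an external network.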

PiHole is configured to expose dnsmasq on eth0, which had worked fine so far, but after I restarted the container none of my DNS requests could be resolved anymore. I suspect it's due to the new network type: PiHole now listened on traefik-backend, and not a single DNS request went through, even though the ports were forwarded to the host.

Fun fact: the web interface stayed reachable the whole time: before the change, while only traefik-backend and ingress were attached, and after I applied my "solution".

The hacky solution

My solution was to add the bridge net. It was placed before traefik-backend and became eth0, and now I can resolve hosts again.

Debug token

0b1k0yve6n

My question

I haven't pinned down the exact question yet. I want to know why PiHole can't resolve DNS requests when connected to an overlay network. I mean, I can open the dashboard, so I should also be able to reach the DNS server, right?

Don't use latest.
It's fine for getting started on the most recent version, but you should use a versioned image for production deployments. Using latest is the equivalent of running unattended updates.

In your case, an unintended update triggered by a container restart may have caused an error, likely the same one as in the thread "Docker pihole/pihole:latest -> v5.4 not starting".

See if using v5.5 solves your issue.
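Pinning the tag in the stack file might look like the sketch below. The tag shown is the one the original poster later switched to (v5.5.1); adjust it to whichever release you actually want to test.

```yaml
services:
  pihole:
    # Pin an explicit release instead of :latest, so a redeploy
    # can't silently pull a newer image.
    image: pihole/pihole:v5.5.1
```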


PiHole is currently working. It can resolve requests and I can access the interface, so no problem there. I restarted the container a few times to confirm my hacky solution, and the behavior was always the same: DNS requests get resolved over bridge but not over traefik-backend.

Thanks for your suggestion. I'll implement it.

Since we've had a few security releases during the last few days, it'll be difficult to attribute your observation to a specific version when using latest.

Once you've confirmed which version you're on, please post a new debug token.

I can't promise to be of much help, since you are running a specialised setup, and debug logs are less expressive/reliable for Docker Pi-holes anyway, but I'll take a look.
That may take some time, though, since I'm just about to go offline. ;)

Thanks for your prompt reply! I've downgraded the image to v5.5.1 and generated a new debug log:

sjfn399afh

I haven't confirmed the bug yet... I'll send in my results in another post.

Test results (web interface / DNS resolution) with Traefik disabled (no reverse proxy):

  • with workaround (bridge net): success/success
  • w/o workaround (overlay net): success/failure (timeout)
  • w/o any net (ingress net): success/success

Well, that's weird. So it must be the overlay net? Any idea where I f'd up? Then again, ingress is an overlay net too. I am so confused.

Update:
Suspecting that ingress might not be eth0, I checked, and these are my results:

root@384267f5cef9:/# ip ad
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
299: eth0@if300: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default 
    link/ether 02:42:0a:00:00:0b brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.0.0.11/24 brd 10.0.0.255 scope global eth0
       valid_lft forever preferred_lft forever
301: eth2@if302: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 02:42:ac:15:00:03 brd ff:ff:ff:ff:ff:ff link-netnsid 2
    inet 172.21.0.3/16 brd 172.21.255.255 scope global eth2
       valid_lft forever preferred_lft forever
303: eth1@if304: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default 
    link/ether 02:42:0a:00:02:03 brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet 10.0.2.3/24 brd 10.0.2.255 scope global eth1
       valid_lft forever preferred_lft forever
root@384267f5cef9:/#

Fun, so I now have 3 networks instead of 2. After a quick investigation:
ingress:

"Name": "pihole_pihole.1.2skkd211qptxzgavrt5wnmf4v",
"EndpointID": "7f7485560d605427003766b8d6e6ceb16a00d4a3c11e16694520a303d794e48d",
"MacAddress": "02:42:0a:00:00:0b",
"IPv4Address": "10.0.0.11/24",
"IPv6Address": ""

docker_gwbridge:

"Name": "gateway_049bee6c09b5",
"EndpointID": "68c592e04635a3c783ed3dbdd79969f7f5ab5ee9d94db050be359c93ef04f4b5",
"MacAddress": "02:42:ac:15:00:03",
"IPv4Address": "172.21.0.3/16",
"IPv6Address": ""

And the third network (eth1, 10.0.2.3) is pihole_default, so nothing to verify there...

So PiHole still exposes its DNS server on the ingress network, and I just can't explain why it works over that net but not over traefik-backend...

Do you have any idea so far?

Your problem does seem to be network-related, likely caused by the way your Docker network configuration handles client isolation.

By default, Pi-hole only listens to requests local to the interface it has been configured for (eth0 in your case).

If you intend it to serve DNS requests from different subnets and interfaces, consider switching the Interface listening behavior under Settings|DNS to one of the "Listen on all" options, or set the corresponding DNSMASQ_LISTENING: <local|all|NIC> environment variable in your Docker compose file.
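In the stack file, the environment-variable route could look like the sketch below. It assumes the variable name and values quoted above; "all" also accepts queries from non-local origins, so it only makes sense when port 53 isn't reachable from the internet.

```yaml
services:
  pihole:
    environment:
      # Existing TZ / WEBPASSWORD entries stay as they are.
      # Accept queries from all interfaces and origins,
      # not only from eth0's local subnet.
      - "DNSMASQ_LISTENING=all"
```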

You're right, but I'm surely not the only person running PiHole on a swarm. I guess I'll have to ask the Docker community to find out where I f'd up.

My PiHole is LAN-only, so your suggestion should be safe. Thanks for your time! I'll mark your answer as the solution once I've confirmed PiHole is working.

Workaround confirmed. Thanks for your time!