DNS/UI hanging + unable to generate debug logs

Expected Behaviour:

The UI should render and, obviously, DNS resolution should be highly available.
- Ubuntu 20.04 (installed via apt) and 24.04 (as a container via Docker Compose)
- Both are VMs under VMware 6.5 running on Dell 520/530 series servers

Actual Behaviour:

After some period of time, DNS resolution stops and the UI will not render. The other Pi-hole instance keeps working and will resolve/render, at least until it also stops. In all cases, a VM/container reboot or pihole restartdns appears to correct the behavior.

Debug Token:

No token is generated, since "pihole -d" won't complete; it stalls after "Dashboard headers". The tricorder server is reachable.

1/29/2025 - The debug log now completes: https://tricorder.pi-hole.net/7wDiHT1S/

The network configuration uses Windows AD/DNS servers as the first-level DNS resolvers, which then use the Pi-hole instances as forwarders. Conditional forwarding is configured to point back to them, just in case. The average query rate is 1000-1500/hr. In this case, the second instance (pihole-2) is actually very lightly loaded, since it is currently only backing up the primary in case of a DNS failure.

This has been going on for over a year, which led me to build the secondary Pi-hole instance on 24.04 and use a container. I did not expect the container deployment to show the same issue, so I felt my last resort was to seek help here. The primary Pi-hole has been set to reboot daily in an attempt to auto-recover and potentially alleviate the DNS/UI problems.

Without a debug log, we need more information.

From a terminal window, in Pi-hole server:

  • What is the output of pihole -v?
  • What is the output of curl -I http://localhost/admin/?
  • What is the output of tail -n 20 /var/log/lighttpd/error-pihole.log?

Do you see any error messages in your browser Console (devtools)?

root@pihole-2:/var/log/pihole# sudo pihole -v
Pi-hole version is v5.18.3 (Latest: v5.18.4)
web version is v5.21 (Latest: v5.21)
FTL version is v5.25.2 (Latest: v5.25.2)
root@pihole-2:/var/log/pihole# sudo curl -I http://localhost/admin/
^C
root@pihole-2:/var/log/pihole# tail -n 20 /var/log/lighttpd/error-pihole.log
root@pihole-2:/var/log/pihole#
root@pihole-2:/var/log/pihole#

I cancelled curl after about 2 minutes.
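Since the curl call had to be cancelled by hand, it may help to probe the admin page with a hard time limit, so that a hang shows up as a result instead of a stuck terminal (curl's own --max-time option does the same thing). This is a minimal Python sketch of that idea; the function name head_status and the 5-second default are illustrative choices, not part of Pi-hole.

```python
import urllib.error
import urllib.request
from typing import Optional


def head_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Send an HTTP HEAD request and return the status code.

    Returns None if the server refuses the connection, is unreachable, or
    does not answer within `timeout` seconds (the "hang" case).
    """
    # Bypass any configured HTTP proxy: we want to reach the local server directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, just with an error status
    except OSError:
        return None  # refused, unreachable, or timed out (URLError is an OSError)
```

For example, head_status("http://localhost/admin/") returning None after roughly the timeout would correspond to the hang seen above, while any number means lighttpd answered.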

I'm not very proficient with devtools, but the only discernible difference between the working Pi-hole instance and the non-working one is this message:

Apparently the web server is not working correctly.

What is the output of sudo systemctl status lighttpd.service?

Since this is a container, systemctl is not available. I'm trying to run this as close to the Pi-hole documentation as possible, so I haven't made any configuration changes beyond what the GitHub documentation describes.

Here is some "ps" information instead:

root 28 0.0 0.0 212 0 ? S Jan23 0:00 s6-supervise lighttpd
root 29 0.0 0.0 212 0 ? S Jan23 0:00 s6-supervise lighttpd-access-log
root 30 0.0 0.0 212 0 ? S Jan23 0:00 s6-supervise lighttpd-error-log
www-data 235 0.0 0.0 5008 3712 ? S Jan23 0:30 lighttpd -D -f /etc/lighttpd/lighttpd.conf

I thought this was running in an Ubuntu VM.

Please post the compose file used to start the container.

I have two instances, both have the same problem over time. Current issue is with the container instance.

services:
  pihole:
    container_name: pihole
    hostname: pihole-2
    image: pihole/pihole:latest
    # For DHCP it is recommended to remove these ports and instead add: network_mode: "host"
    ports:
      - "53:53/tcp"
      - "53:53/udp"
      - "80:80/tcp"
    environment:
      TZ: 'America/New_York'
      WEBPASSWORD: '********'
    # Volumes store your data between container upgrades
    volumes:
      - './etc-pihole:/etc/pihole'
      - './etc-dnsmasq.d:/etc/dnsmasq.d'
    #   https://github.com/pi-hole/docker-pi-hole#note-on-capabilities
    restart: unless-stopped

I checked the UI again and didn't see the error reported earlier. After the login page, the result is the same, but no errors are shown.

I thought this was worth reporting. The login page does report issues, but that seems to be expected behavior.

Also, the curl command works today, unlike yesterday's hang. I ran it both inside and outside the container.

root@pihole-2:/# curl -I http://localhost/admin
HTTP/1.1 301 Moved Permanently
Location: /admin/
Date: Wed, 29 Jan 2025 18:00:08 GMT
Server: lighttpd/1.4.59

I've now found that "pihole -d" completes, although the UI still won't render and DNS queries sent to it don't resolve. I updated the first post to include the token.

Your debug log doesn't show web server (lighttpd) issues and the web server logs are completely empty.

This could mean the web interface is working as expected inside the container but you are not able to access it, or that something is wrong but the error messages are not being logged.

Let's first check if lighttpd is running and if it is writing to the log.

Please run this command inside the container:

service lighttpd status # this should return something like:
                        #  > up (pid 3704) 369 seconds

service lighttpd restart
tail /var/log/lighttpd/error-pihole.log

If the restart is successful, there will be a new line in the log, containing server started (lighttpd/1.4.59). Also, service lighttpd status will return a new PID.
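The restart check described above (a new PID in the status line) can also be scripted. This is a minimal sketch that assumes the status line format shown in this thread, up (pid N) S seconds; the helper names parse_status and restarted are hypothetical.

```python
import re
from typing import Optional, Tuple

# Format assumed from the output in this thread, e.g. "up (pid 217) 507097 seconds".
STATUS_RE = re.compile(r"^up \(pid (\d+)\) (\d+) seconds?$")


def parse_status(line: str) -> Optional[Tuple[int, int]]:
    """Return (pid, uptime_seconds) for an 'up' status line, or None otherwise."""
    match = STATUS_RE.match(line.strip())
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def restarted(before: str, after: str) -> bool:
    """True if the service is up after the restart under a different PID."""
    old, new = parse_status(before), parse_status(after)
    return new is not None and (old is None or new[0] != old[0])
```

With the two status lines posted in the reply that follows, restarted("up (pid 217) 507097 seconds", "up (pid 222575) 4 seconds") returns True; 507097 seconds is roughly 5.9 days of uptime before the restart.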

Do keep in mind this problem occurs with both container and non-container deployments.

root@pihole-2:~# service lighttpd status
up (pid 217) 507097 seconds
root@pihole-2:~# service lighttpd restart
root@pihole-2:~# service lighttpd status
up (pid 222575) 4 seconds
root@pihole-2:~# tail /var/log/lighttpd/error-pihole.log
2025-01-29 15:54:40: server.c.1513) server started (lighttpd/1.4.59)

No change in UI behavior:

DNS requests are still timing out.

Your curl command and the last commands inside the container show the web server is working. Also, the debug log shows Pi-hole is working. There are no obvious errors.

Keep in mind that the current Pi-hole v5 code hasn't changed in more than 7 months and no one else has reported a similar issue.

This is a local issue, but with no explicit error messages in the logs and none in the browser, the only suggestions I have are:

  • disable all browser extensions and try again;
  • try to access from a different browser, in another machine;
  • check if your VMs are up and running and if they are reachable all the time;
  • check if there are firewall rules blocking something.

Besides that, I'm out of ideas.

DNS resolution is the major problem, and that is independent of browser extensions. Obviously, the VMs are up, since we can interrogate them. If I run pihole restartdns, it starts working again, so firewall rules don't make sense. Also, my primary instance is working fine.

I find it hard to believe this is a local issue, given that it works just fine, then doesn't, and is then fixed by a magical reboot. Without diving further, the basic issue appears to be the DNS service, with the UI hanging because that service isn't functioning properly. And the container is the recommended deployment environment.

I do have DNSSEC and IPv6 enabled, although IPv6 is non-functional due to routing issues. I'll try with those disabled, but that should not hang the DNS services.

In response to this, I just went to double-check the UI. It now gets further, but the Dashboard complains about losing the connection to the API, and the Query Log shows this:

I'll wait for your next response before pursuing a restart.

Edit/correction: I interpret "local issue" as something outside the container/application. I accept there is something unique to the environment, since that is why I'm posting and hoping to contribute to the improvement of the product. Are there commands for us to evaluate what is happening with the DNS components?

Docker is reporting the container as unhealthy too.

CONTAINER ID   IMAGE                       COMMAND      CREATED       STATUS                  PORTS                                                                                                             NAMES
baa55c13994e   pihole/pihole:latest        "/s6-init"   2 weeks ago   Up 6 days (unhealthy)   0.0.0.0:53->53/tcp, :::53->53/tcp, 0.0.0.0:80->80/tcp, 0.0.0.0:53->53/udp, :::80->80/tcp, :::53->53/udp, 67/udp   pihole
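The (unhealthy) status only says that the image's health check is failing. Docker keeps the recent health-check results, including their output, under State.Health in the JSON printed by docker inspect. A minimal sketch that extracts the status and the output of the last failing check from that JSON; the function name health_summary is hypothetical, and you would feed it the output of docker inspect pihole (the container name from the compose file).

```python
import json
from typing import Optional, Tuple


def health_summary(inspect_json: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (health status, output of the most recent failed check).

    `inspect_json` is the text printed by `docker inspect <container>`,
    which is a JSON array with one object per container.
    """
    container = json.loads(inspect_json)[0]
    health = container.get("State", {}).get("Health")
    if health is None:
        return None, None  # no HEALTHCHECK defined for this container
    failed = [entry for entry in health.get("Log", []) if entry.get("ExitCode") != 0]
    last_output = failed[-1].get("Output", "").strip() if failed else None
    return health.get("Status"), last_output
```

The failing check's output may say more about which component stopped answering than the status column alone.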

Yes. I was trying to say this is only happening to you: we don't have other open issues with a similar description, and it doesn't happen in my container or in my tests. It is very weird, but let's try to debug it.

This seems an intermittent network issue, but I'm just guessing.

You said you will try to disable IPv6. It doesn't hurt trying that. If it still doesn't work, at least it will be one less variable.

Then restart the container. Do you see any errors in docker logs?

It's almost certainly not an intermittent network issue. nslookup shows that the container simply isn't answering queries sent to its local DNS service. That takes the network out of the picture entirely and isolates the container. nslookup from inside the container to the other Pi-hole instance works flawlessly, which does depend on the network, and nslookup from inside the container to 8.8.8.8 also works flawlessly, demonstrating reachability to the Internet.
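To record that same comparison automatically (for example every few minutes from cron, so the moment the local resolver stops answering gets captured), a bounded probe helps. This is a minimal stdlib-only sketch, not a replacement for nslookup or dig: it assumes IPv4, UDP, and plain ASCII names, and the function names build_query and probe are hypothetical.

```python
import secrets
import socket
import struct
from typing import Optional, Tuple


def build_query(name: str, qtype: int = 1) -> Tuple[int, bytes]:
    """Build a minimal RFC 1035 query for `name` (qtype 1 = A). Returns (id, packet)."""
    txid = secrets.randbits(16)
    # Header: ID, flags (only RD = recursion desired), QDCOUNT=1, AN/NS/ARCOUNT=0.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return txid, header + qname + struct.pack("!HH", qtype, 1)  # QCLASS 1 = IN


def probe(server: str, name: str, port: int = 53, timeout: float = 2.0) -> Optional[int]:
    """Return the RCODE of the server's reply, or None if no valid reply arrived in time."""
    txid, packet = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect((server, port))
            sock.send(packet)
            reply = sock.recv(512)
        except OSError:  # timeout, ICMP port unreachable, etc.
            return None
    if len(reply) < 4:
        return None
    reply_id, flags = struct.unpack("!HH", reply[:4])
    if reply_id != txid or not flags & 0x8000:  # wrong ID, or QR bit not set
        return None
    return flags & 0x000F  # 0 = NOERROR, 2 = SERVFAIL, 3 = NXDOMAIN
```

Run inside the container, probe("127.0.0.1", "pi-hole.net") exercises the container's own resolver, while probe("8.8.8.8", "pi-hole.net") exercises the path to the Internet; logging both with a timestamp shows which one fails first.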

The current UI error messages report that DNS and FTL aren't happy. I'll restart, but this is pixie dust ;)

Do you recommend we try anything differently when this happens again? This happens about 1-2 times a month.

I restarted the container and all is well now. However, this appeared in /var/log/lighttpd/error-pihole.log:

root@pihole-2:/var/log/lighttpd# cat error-pihole.log
2025-01-29 15:54:40: server.c.1513) server started (lighttpd/1.4.59)
2025-01-29 22:32:41: server.c.1513) server started (lighttpd/1.4.59)
2025-01-29 22:32:41: gw_backend.c.475) unlink /run/lighttpd/pihole-php-fastcgi.socket-0 after connect failed: Connection refused
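Once that log holds more history, it may help to filter out the routine startup notices and keep only the lines that could explain a failure. A small sketch assuming the "timestamp: source) message" layout seen above; the function name non_startup_entries is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Layout assumed from the log above: "YYYY-MM-DD HH:MM:SS: file.c.LINE) message"
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): (\S+)\) (.*)$")


def non_startup_entries(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (timestamp, source, message) for every entry that is not a 'server started' notice."""
    entries = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match and not match.group(3).startswith("server started"):
            entries.append(match.groups())
    return entries
```

Applied to the three lines above, only the gw_backend.c.475 entry (the failed FastCGI socket connect) is returned.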

I should note that this was captured before I changed the configuration to disable DNSSEC and IPv6, which are now both turned off.

Please share that nslookup along with its output and the IP of the client that nslookup was run from.

Since Pi-hole's DNS service is reported as not running at times, you could inspect /var/log/pihole/FTL.log for related messages.
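A quick way to scan FTL.log for the relevant kinds of messages is a keyword count. The patterns here are assumptions (the exact wording of FTL's messages may differ between versions), so adjust them to what your log actually contains; the function name count_matches is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed keywords; check them against the wording in your own FTL.log.
PATTERNS = {
    "rate_limit": re.compile(r"rate-limit", re.IGNORECASE),
    "max_concurrent": re.compile(r"concurrent (dns )?queries", re.IGNORECASE),
    "warning": re.compile(r"\bWARNING\b"),
}


def count_matches(lines: Iterable[str]) -> Counter:
    """Count how many log lines match each pattern (a line may match several)."""
    counts: Counter = Counter()
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
    return counts
```

Usage: with open("/var/log/pihole/FTL.log", errors="replace") as log: print(count_matches(log)). Non-zero counts around the time DNS stopped answering would be worth posting here.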

Your setup (Windows AD DNS forwarding to Pi-hole, with Pi-hole's conditional forwarding pointing back to AD) may close a DNS loop, at least a partial one for local names, specifically unknown ones. If that happens, I'd expect your Pi-hole to register either rate-limit or maximum-concurrent-DNS-queries warnings, which could render Pi-hole's DNS service unavailable, at least for very short periods.
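Whether such a loop actually forms depends on which zones each server answers authoritatively. This sketch traces a query through a forwarding table and reports when it returns to a resolver it has already visited. The table is entirely hypothetical (in particular, it assumes AD hosts commandpost.local but not the matching reverse zone, and "ad-dns" is a made-up name); fill it in from your real AD and Pi-hole settings.

```python
from typing import Dict, List, Optional, Tuple

# resolver -> [(domain suffix, next hop)]; "" is the default route.
# A next hop that is not itself a resolver in the table is terminal
# (e.g. "AUTHORITATIVE" = answered locally, "PUBLIC" = public upstream).
Table = Dict[str, List[Tuple[str, str]]]

RULES: Table = {  # hypothetical example, not taken from the debug log
    "pihole-2": [("", "PUBLIC"),
                 ("commandpost.local", "ad-dns"),
                 ("0.1.10.in-addr.arpa", "ad-dns")],
    "ad-dns": [("commandpost.local", "AUTHORITATIVE"),
               ("", "pihole-2")],
}


def route(rules: List[Tuple[str, str]], qname: str) -> Optional[str]:
    """Pick the next hop using the longest matching domain suffix."""
    qname = qname.rstrip(".").lower()
    best: Optional[Tuple[str, str]] = None
    for suffix, hop in rules:
        if suffix == "" or qname == suffix or qname.endswith("." + suffix):
            if best is None or len(suffix) > len(best[0]):
                best = (suffix, hop)
    return best[1] if best else None


def trace(table: Table, start: str, qname: str) -> Tuple[List[str], bool]:
    """Follow forwarding for `qname` from `start`; return (path, looped)."""
    path = [start]
    while True:
        hop = route(table[path[-1]], qname)
        if hop not in table:  # terminal: answered, sent upstream, or no route (None)
            return path + [str(hop)], False
        if hop in path:
            return path + [hop], True
        path.append(hop)
```

With this table, a reverse lookup such as 3.0.1.10.in-addr.arpa starting at pihole-2 goes to ad-dns and straight back to pihole-2, while names inside commandpost.local terminate at AD.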

From your debug log, I've also noticed that you've configured local DNS records that are both prone to conflicts and incomplete.

To start with, you have created a CNAME record for a public domain, pointing to a local domain that Pi-hole isn't authoritative for:

-rw-r--r-- 1 root root 46 Jan 16 09:02 /etc/dnsmasq.d/05-pihole-custom-cname.conf
   cname=mail.<redacted>.com,atlas.commandpost.local

That won't work reliably, as your Pi-hole doesn't hold any DNS record definitions for atlas.commandpost.local. If it sometimes resolves, that's only because Pi-hole may have cached a DNS reply for atlas.commandpost.local from a separate lookup.

This is by design; note the description on Pi-hole's CNAME panel:

The target of a CNAME must be a domain that the Pi-hole already has in its cache or is authoritative for. This is a universal limitation of CNAME records.

To address this, you could consider creating matching local DNS records for atlas.commandpost.local within Pi-hole, provided you first resolve the conflicting definition for mail.<redacted>.com.
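A quick consistency check along those lines: collect the names defined in custom.list and flag every CNAME target that isn't among them. This sketch assumes the hosts-style custom.list layout (IP followed by names) and dnsmasq's cname=<alias>[,<alias>...],<target>[,<TTL>] syntax; the function names are hypothetical, and example.com stands in for the redacted domain.

```python
from typing import Iterable, List, Set


def local_hostnames(custom_list: Iterable[str]) -> Set[str]:
    """Names defined in a hosts-style file such as /etc/pihole/custom.list ("IP name ...")."""
    names: Set[str] = set()
    for line in custom_list:
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        names.update(name.lower() for name in fields[1:])
    return names


def dangling_cnames(cname_lines: Iterable[str], local_names: Set[str]) -> List[str]:
    """CNAME targets with no matching local record."""
    dangling = []
    for line in cname_lines:
        line = line.strip()
        if not line.startswith("cname="):
            continue
        fields = [field.strip().lower() for field in line[len("cname="):].split(",")]
        if fields[-1].isdigit():
            fields = fields[:-1]  # drop the optional TTL
        if fields[-1] not in local_names:
            dangling.append(fields[-1])
    return dangling
```

For the records in the debug log, the only target flagged is atlas.commandpost.local.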

You are also shadowing public domains with local DNS A records:

-rw-r--r-- 1 root root 49 Jan 16 09:01 /etc/pihole/custom.list
   10.1.0.3 mail.<redacted>.com
   10.1.0.3 www.<redacted>.com

Note that you've only created A records here.
Consequently, any client requests for other record types will still be forwarded upstream to public DNS servers.
In particular, an nslookup for www.<redacted>.com would return public IPv6 addresses along with that private IPv4 address, and IPv6-capable clients may then prefer to talk to an IPv6 address.

In addition, mail.<redacted>.com would conflict with your CNAME definition for that same domain.
You should decide whether to keep the CNAME or the A record.
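Both findings (a name defined as a CNAME and as an A record at the same time, and A records without a matching local AAAA record) can be checked mechanically as well. A sketch assuming the hosts-style custom.list layout and dnsmasq's cname=<alias>[,<alias>...],<target>[,<TTL>] syntax; the function name audit is hypothetical, and example.com stands in for the redacted domain.

```python
import ipaddress
from typing import Iterable, List, Set


def audit(custom_list: Iterable[str], cname_lines: Iterable[str]) -> List[str]:
    """Report CNAME/address conflicts and names with an A but no AAAA record."""
    v4: Set[str] = set()
    v6: Set[str] = set()
    for line in custom_list:
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue
        try:
            version = ipaddress.ip_address(fields[0]).version
        except ValueError:
            continue  # not an address line
        (v4 if version == 4 else v6).update(name.lower() for name in fields[1:])

    aliases: Set[str] = set()
    for line in cname_lines:
        line = line.strip()
        if not line.startswith("cname="):
            continue
        parts = [part.strip().lower() for part in line[len("cname="):].split(",")]
        if parts[-1].isdigit():
            parts = parts[:-1]  # drop the optional TTL
        aliases.update(parts[:-1])  # every field before the target is an alias

    report = [f"{name}: defined as both a CNAME and an address record"
              for name in sorted((v4 | v6) & aliases)]
    report += [f"{name}: local A record but no local AAAA record"
               for name in sorted(v4 - v6)]
    return report
```

For the records in the debug log, this reports the mail.<redacted>.com conflict plus A-only entries for both shadowed names.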

And finally, you are using commandpost.local as your local/search domain.
You should note that .local is the TLD reserved for the mDNS protocol and should not be used with plain DNS. Doing so may create unnecessary additional network traffic and potentially confuse you when DNS and mDNS yield different resolution results.
Granted, that potential is lower in your case, as your search domain is longer than just .local, but you should still consider using a domain intended for home network usage, like .home.arpa or .internal (.lan is widely used, but not formally reserved).