DNS resolution unavailable - unless local.list is deleted

Hi,

I'm running pi-hole on a QNAP NAS inside a docker container (official image).
When I "first" start the container, everything works smoothly and as a byproduct an /etc/pihole/local.list file is created that maps the container IP to the pihole and pi.hole host names.

When I restart the container, the start process eventually times out and I get this on the console:

::: Testing pihole-FTL DNS: FTL started!
::: Testing lighttpd config: Syntax OK
::: All config checks passed, cleared for startup ...
 ::: Docker start setup complete
  [✗] DNS resolution is currently unavailable
  [✗] DNS resolution is not available
[cont-init.d] 20-start.sh: exited 1.
[cont-finish.d] executing container finish scripts...
[cont-finish.d] done.
[s6-finish] syncing disks.
[s6-finish] sending all processes the TERM signal.

After removing the local.list file and restarting the container, pi-hole comes up again without any issues.

I am providing two local DNS servers for the container using the DNS1 and DNS2 environment variables.
The container's resolv.conf strangely lists only nameserver 127.0.0.11, although I'm explicitly specifying two nameservers in the container config.


Expected Behaviour:

Smooth startup.

Actual Behaviour:

Timeout causing a container restart loop.

Debug Token:

[✓] Your debug token is: 66s36mve5t

Apparently the 127.0.0.11 nameserver in the container's resolv.conf is how Docker handles name resolution internally - so that should be ok.
Running dig while the start script claims that DNS resolution is unavailable shows that this is not true:

[~] # docker exec -ti pihole dig a google.com

; <<>> DiG 9.10.3-P4-Debian <<>> a google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 32842
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;google.com.			IN	A

;; ANSWER SECTION:
google.com.		196	IN	A	172.217.18.14

;; Query time: 2 msec
;; SERVER: 127.0.0.11#53(127.0.0.11)
;; WHEN: Mon Nov 12 16:21:43 UTC 2018
;; MSG SIZE  rcvd: 55

I am providing two local DNS servers for the container using the DNS1 and DNS2 environment variables.

Environment variables are not what you use to modify resolv.conf; Docker controls that file via the --dns argument. Please try docker run with --dns 127.0.0.1 --dns 1.1.1.1 (or whichever you prefer for the second one).

I.e., the DNS environment variables set the upstream servers for the Pi-hole service (port 53). The --dns arguments configure the Linux container itself (resolv.conf), which the Pi-hole services and scripts use, if that makes sense.
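
To make the distinction concrete, here is a minimal dry-run sketch that only prints a docker run command instead of executing it. The helper name print_pihole_run_cmd and the upstream addresses passed to it are illustrative assumptions; --dns, -e, DNS1 and DNS2 are the options discussed above.

```shell
# Dry run: print (do not execute) a docker run command that sets both
# kinds of DNS configuration. print_pihole_run_cmd is a hypothetical helper.
#   --dns      resolvers for the container itself (Docker's resolv.conf handling)
#   DNS1/DNS2  upstream servers for the Pi-hole service listening on port 53
print_pihole_run_cmd() {
  # $1, $2: upstream servers for the Pi-hole service (placeholders)
  printf 'docker run -d --name pihole --dns 127.0.0.1 --dns 1.1.1.1 -e DNS1=%s -e DNS2=%s pihole/pihole:latest\n' "$1" "$2"
}

print_pihole_run_cmd 192.168.10.3 1.1.1.1
```

Printing the command first makes it easy to see which flag feeds which layer before anything is actually started.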

I'm aware of that, sorry if I expressed that misleadingly.
I'm also supplying 127.0.0.1 and 192.168.10.3 (one of the mentioned local DNS servers) to the container via the --dns option (I'm actually using docker-compose to create the container, so it's dns:, but that shouldn't matter).

Sebastian

Does the container DNS match your host server's resolv.conf settings? I'm wondering if your docker-compose format is slightly off and it isn't alerting you to the fact you have a typo.

Double check your docker-compose version reference documentation for DNS settings and confirm your yaml is the right format.

The host server's resolv.conf is different from the container's, as it contains one of my local DNS servers and my WiFi router's IP (which is as it is supposed to be).
The container's resolv.conf only contains the Docker internal DNS 127.0.0.11, which in turn uses the servers supplied via the --dns option.
There's nothing wrong with my docker-compose config (docker-compose usually complains loudly if there is).
Here's the relevant excerpt:

version: "3"
services:
  app:
    image: pihole/pihole:latest
    container_name: pihole
    hostname: pihole
    domainname: example.com
    dns:
      - 127.0.0.1
      - 192.168.10.3
    ports:
      - [...]

I ran a tcpdump at the 192.168.10.3 DNS server while the container is running the start script with the local.list file in place.
I see dozens of queries for the AAAA record of "pi.hole.", repeating every 2 seconds - which the server knows nothing about and thus replies with NXDOMAIN each time.

About two minutes in, I'm also starting to see queries for the A and AAAA records of pi.hole.example.com. (with example.com being a placeholder for my local domain), and occasionally also for the A record of pi.hole. - all of those get a NXDOMAIN reply...

When I remove the local.list file and restart the container, I don't see any of those queries hitting my DNS server.
I see the query for raw.githubusercontent.com. and some others - so that looks fine to me.

Sebastian

I now (temporarily) added a "hole" zone to the 192.168.10.3 DNS server and a "pi.hole." record inside it.
With this, the container starts without any issues even with the local.list file present.

So why is it not querying my DNS server for "pi.hole." when the local.list file is missing and why is it not using the local.list file (that contains the pi.hole host entry) when it is present?

Sebastian

According to the debug token you provided, it did use the local.list for pi.hole back when you ran debug:

read /etc/pihole/local.list - 4 addresses

Perhaps a Null IPv4 address would work better since that's the default mode anyway. Try ServerIP=0.0.0.0 in your docker-compose environment variables and see if that makes it happier.

Unfortunately, that doesn't change anything - other than the contents of the local.list file now being:

0.0.0.0 pihole
0.0.0.0 pi.hole

Sebastian

Could you add the environment var PH_VERBOSE: 1, destroy your container to clear out the logs, start a new container, and then provide a full docker log of the container and your docker-compose file? Please remove any passwords if necessary.

So the issue apparently is the way that DNS resolution is checked within the gravity.sh script, specifically the gravity_CheckDNSResolutionAvailable() function.

It only checks for resolvability of "pi.hole" if the local.list file exists - so that explains that:

  local lookupDomain="pi.hole"

  # Determine if $localList does not exist
  if [[ ! -e "${localList}" ]]; then
    lookupDomain="raw.githubusercontent.com"
  fi
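
Isolated from the rest of the function, that selection can be reproduced with a small sketch. The name pick_lookup_domain is hypothetical and not part of gravity.sh; it only mirrors the file-existence test quoted above.

```shell
# Sketch of the domain selection only; pick_lookup_domain is a hypothetical
# name, not a function from gravity.sh.
pick_lookup_domain() {
  # $1: path to local.list
  if [ -e "$1" ]; then
    # local.list exists: the check resolves the Pi-hole's own name
    echo "pi.hole"
  else
    # local.list missing: the check resolves an external name instead
    echo "raw.githubusercontent.com"
  fi
}
```

Calling pick_lookup_domain /etc/pihole/local.list inside the container shows which name the start script will try to resolve on the next start.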

And then we have that part:

  # Determine if $lookupDomain is resolvable
  if timeout 1 getent hosts "${lookupDomain}" &> /dev/null; then
    # Print confirmation of resolvability if it had previously failed
    if [[ -n "${secs:-}" ]]; then
      echo -e "${OVER}  ${TICK} DNS resolution is now available\\n"
    fi
    return 0
  elif [[ -n "${secs:-}" ]]; then
    echo -e "${OVER}  ${CROSS} DNS resolution is not available"
    exit 1
  fi

  # If the /etc/resolv.conf contains resolvers other than 127.0.0.1 then the local dnsmasq will not be queried and pi.hole is NXDOMAIN.
  # This means that even though name resolution is working, the getent hosts check fails and the holddown timer keeps ticking and eventualy fails
  # So we check the output of the last command and if it failed, attempt to use dig +short as a fallback
  if timeout 1 dig +short "${lookupDomain}" &> /dev/null; then
    if [[ -n "${secs:-}" ]]; then
      echo -e "${OVER}  ${TICK} DNS resolution is now available\\n"
    fi
    return 0
  elif [[ -n "${secs:-}" ]]; then
    echo -e "${OVER}  ${CROSS} DNS resolution is not available"
    exit 1
  fi

So first this tries to look up the pi.hole name using "getent hosts pi.hole" and if that fails, it uses "dig +short pi.hole".
I don't really understand what the comment wants to say.
If the resolv.conf doesn't have a "nameserver 127.0.0.1" entry, why would the dig command try to resolve the name using 127.0.0.1?

I'm not really sure which cases the script is trying to cover in that function, but I think it should either not rely solely on resolving the "pi.hole" name, or explicitly query the dnsmasq on 127.0.0.1 via "dig +short @127.0.0.1 pi.hole".
But this may break other things, assuming there's a reason why it was coded this way.
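
A rough sketch of the second option, assuming dnsmasq listens on 127.0.0.1 inside the container. The helper check_local_dnsmasq is hypothetical and not a proposed patch to gravity.sh. It uses dig's own +time and +tries options instead of timeout, and it checks for a non-empty answer because dig exits with status 0 even when the reply is NXDOMAIN.

```shell
# Sketch only: ask the dnsmasq instance on 127.0.0.1 directly instead of
# relying on resolv.conf. check_local_dnsmasq is a hypothetical helper,
# not part of gravity.sh.
check_local_dnsmasq() {
  # $1: name to look up (defaults to pi.hole)
  # +time=1 +tries=1 keep the wait short if nothing answers.
  # A non-zero exit status (e.g. server unreachable) counts as failure.
  answer="$(dig +short +time=1 +tries=1 @127.0.0.1 "${1:-pi.hole}" 2>/dev/null)" || return 1
  # dig exits 0 even for NXDOMAIN, so also require a non-empty answer.
  [ -n "$answer" ]
}
```

Used as "if check_local_dnsmasq pi.hole; then ...", this would not depend on which nameserver Docker writes into resolv.conf.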

Anyway, in my case, as said before, the container's resolv.conf only contains a "nameserver 127.0.0.11" (yes, that's "dot eleven") line, so the resolver is not querying the 127.0.0.1 dnsmasq daemon with either command.
Apparently, as I understand it from the Docker documentation [1], there's no way to have a resolv.conf containing "nameserver 127.0.0.1" inside the container (which makes sense under Docker's "one process - one container" doctrine).
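
For anyone who wants to check this on their own container, here is a small sketch that lists the nameserver entries of a resolv.conf-style file and tests whether 127.0.0.1 is among them. Both function names are hypothetical.

```shell
# Sketch: list the nameserver entries of a resolv.conf-style file and check
# whether 127.0.0.1 is among them. Both function names are hypothetical.
list_nameservers() {
  # $1: path to the file (defaults to /etc/resolv.conf)
  awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

uses_local_dnsmasq() {
  # -x matches whole lines, so 127.0.0.11 does not count as 127.0.0.1
  list_nameservers "$1" | grep -qx '127\.0\.0\.1'
}
```

With a file that only contains "nameserver 127.0.0.11", list_nameservers prints 127.0.0.11 and uses_local_dnsmasq fails, which matches the situation described here.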

I collected the debug output with PH_VERBOSE=1, but it's not really enlightening.
It just shows that it hangs in the gravity_CheckDNSResolutionAvailable() function until the timeout hits.

Sebastian

Strangely, I have another pi-hole container on a different system, where "dig pi.hole" returns the container's IP even though it also only has the 127.0.0.11 nameserver in its resolv.conf.
DNS-wise, both containers are configured identically.
The only difference is that the container causing the issue is attached to a user-defined network.

The container's DNS settings are definitely identical

# docker inspect -f '{{ .HostConfig.Dns }}' pihole
[127.0.0.1 192.168.10.3]

and

# docker inspect -f '{{ .HostConfig.Dns }}' pihole-backup
[127.0.0.1 192.168.10.3]

Weird... :-?
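
Since the network attachment is the one known difference, a comparison that also prints the network mode may be more telling than the DNS list alone. This is only a sketch: the helper names are hypothetical, and the values reported for .HostConfig.NetworkMode depend on how each container was created.

```shell
# Sketch: print the DNS servers and the network mode of a container.
# show_dns_and_network is a hypothetical helper.
show_dns_and_network() {
  # $1: container name
  docker inspect -f '{{ .HostConfig.Dns }} {{ .HostConfig.NetworkMode }}' "$1"
}

# Succeeds only if both containers report identical values.
same_dns_and_network() {
  # $1, $2: container names to compare
  [ "$(show_dns_and_network "$1")" = "$(show_dns_and_network "$2")" ]
}
```

Running same_dns_and_network pihole pihole-backup would fail if the two containers differ in either setting, even when the DNS lists match.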

Sebastian

I don't have an explanation for that hostname being set if you're trying to override it with the dns argument.

In my testing the priority for DNS resolv.conf servers is as follows, from high to low:

  1. Use whatever the --dns argument says (or the dns: list in docker-compose v3)
  2. Use the dns key list in /etc/docker/daemon.json
  3. Inherit the Docker host's DNS settings
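
That precedence amounts to "the first non-empty source wins", which can be sketched as follows. The helper effective_dns is hypothetical and does not read any Docker state; the three values are passed in by hand.

```shell
# Sketch of the precedence described above: the first non-empty source wins.
# effective_dns is a hypothetical helper; it does not read any Docker state.
effective_dns() {
  # $1: --dns / dns: value, $2: daemon.json value, $3: host's setting
  for candidate in "$1" "$2" "$3"; do
    if [ -n "$candidate" ]; then
      echo "$candidate"
      return 0
    fi
  done
  # nothing configured anywhere
  return 1
}
```

For example, effective_dns "" "" "192.168.10.3" falls through to the host's setting because the two higher-priority sources are empty.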

Apparently that's the expected behavior - look at the Docker documentation I linked:

"Docker daemon runs an embedded DNS server which provides DNS resolution among containers connected to the same user-defined network, so that these containers can resolve container names to IP addresses. If the embedded DNS server is unable to resolve the request, it will be forwarded to any external DNS servers configured for the container. To facilitate this when the container is created, only the embedded DNS server reachable at 127.0.0.11 will be listed in the container’s resolv.conf file."

That should be fine, though, as the container's .HostConfig.Dns setting refers to the DNS servers supplied via --dns, i.e. 127.0.0.1 as seen from inside the container.

The strange thing is that this apparently does not work as expected on the QNAP NAS.
It may even be a bug on the QNAP side of things.

Sebastian
