Intermittent 'config error is REFUSED'

So the REFUSED comes from the TCP query to android.clients.google.com. Maybe your configured DNS server (192.168.1.60) does not answer DNS requests over TCP, or something else is broken.

Please test whether this works from your Pi-hole:

dig +tcp download.docker.com @192.168.1.60

and also please generate a fresh debug log (your old one has already expired, so I cannot look into the config any more):

pihole -d

I'm sure we'll be able to find out what is going on in the end. And if it takes a few changes to FTL to make the error message clearer about what actually failed, then we can surely do that as well.

Edit:

It would also be helpful if you could add DEBUG_FLAGS=true to pihole-FTL.conf and restart FTL. You should (maybe) be able to trigger the issue using

dig +tcp download.docker.com @127.0.0.1

on your Pi-hole.
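
A minimal sketch of those two steps, assuming the default config path /etc/pihole/pihole-FTL.conf (adjust if yours differs):

# append the debug flag to FTL's config
echo "DEBUG_FLAGS=true" | sudo tee -a /etc/pihole/pihole-FTL.conf
# restart the resolver so the new setting takes effect
pihole restartdns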

Here's the suggested query as well as a traditional nslookup of the same name to the same target.

;; Connection to 192.168.1.60#53(192.168.1.60) for download.docker.com failed: connection refused.
root@lurker:~# nslookup download.docker.com 192.168.1.60
Server:         192.168.1.60
Address:        192.168.1.60#53

Non-authoritative answer:
download.docker.com     canonical name = d2h67oheeuigaw.cloudfront.net.
Name:   d2h67oheeuigaw.cloudfront.net
Address: 13.226.193.49
Name:   d2h67oheeuigaw.cloudfront.net
Address: 13.226.193.65
Name:   d2h67oheeuigaw.cloudfront.net
Address: 13.226.193.113
Name:   d2h67oheeuigaw.cloudfront.net
Address: 13.226.193.25

My pihole-FTL.conf now looks like this:

PRIVACYLEVEL=0
DEBUG_QUERIES=true
DEBUG_FLAGS=true

After restarting FTL, the dig command produced the following output:

; <<>> DiG 9.10.3-P4-Debian <<>> +tcp download.docker.com @127.0.0.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 62022
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;download.docker.com.           IN      A

;; Query time: 2 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Sun Aug 23 14:37:44 CDT 2020
;; MSG SIZE  rcvd: 48

The associated pihole-FTL.log entry for this query is here:

[2020-08-23 14:37:44.122 15047/F15035] **** new TCP query[A] "download.docker.com" from 127.0.0.1 (ID 16, FTL 107002, /root/project/src/dnsmasq/forward.c:1947)
[2020-08-23 14:37:44.122 15047/F15035] Resizing "/FTL-strings" from 57344 to 61440
[2020-08-23 14:37:44.122 15047/F15035] download.docker.com is not known
[2020-08-23 14:37:44.124 15047/F15035] **** got reply error is ::500:a505:dcc1:1656:0 (ID 16, /root/project/src/dnsmasq/rfc1035.c:1041)
[2020-08-23 14:37:44.124 15047/F15035]      Flags:
[2020-08-23 14:37:44.124 15047/F15035] TCP worker terminating (client disconnected)

New debug token is: https://tricorder.pi-hole.net/1r7gllaspl

I was able to go through my system and clean up the networking a little bit. I trimmed out two of the 192.168.1 addresses and pruned some docker interfaces as well. The ones that remain all relate to the bare minimum needed for docker to function, as best I can tell.

So nothing extraordinary in your debug log, except for

[quoted debug-log lines about the 'enp0s31f6' interface]

which should, however, not have any immediate consequences for the current situation (Pi-hole does not strictly need to know these details to work properly).

That "connection refused" is the error: the server at 192.168.1.60 is not accepting TCP queries. The problem is on that server, not in your Pi-hole. I will, nevertheless, improve the FTL error message to make this clearer in the future.

The 'enp0s31f6' messages are likely due to the struggles I've been having with Debian over interface naming conventions. The classic Linux 'ethX' names got deprecated, so after a dist-upgrade I rearranged a bunch of things to use the new name for the same device (eth0 -> enp0s31f6). I had originally tried to keep the classic names for ease of use, to avoid the headache of reflecting the change in everything on my system that might use them. Then I did a kernel upgrade the other day, and it appears my earlier effort to revert the naming convention somehow miraculously (and retroactively) took effect... months later? Damnit, linux >.<
And of course now pihole is complaining about it... but setupVars.conf (the only file under /etc/pihole I can find that references the interface) still has the newer name in it. I can fix that easily enough, at least.

Regarding the 192.168.1.60 lancache DNS service not accepting TCP queries on port 53: this does appear to be confirmed by the netstat output:

udp        0      0 192.168.1.60:53         0.0.0.0:*                           4484/docker-proxy
udp        0      0 192.168.1.60:123        0.0.0.0:*                           1472/ntpd
udp        0      0 192.168.1.60:1900       0.0.0.0:*                           2708/dleyna-rendere
udp        0      0 192.168.1.60:54770      0.0.0.0:*                           2708/dleyna-rendere
udp        0      0 192.168.1.60:56246      0.0.0.0:*                           1009/java
udp        0      0 192.168.1.60:43765      0.0.0.0:*                           1531/java
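
For cross-checking, a quick way to look specifically for a TCP listener on port 53 (just a suggestion; the netstat invocation above works equally well) is:

# list listening TCP sockets and filter for port 53
ss -lntp | grep ':53 '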

But the question becomes: this is the upstream provider for all of pihole-FTL's DNS queries, so why do only a select few names out of everything get kicked over to TCP rather than UDP and cause these errors? Why am I not experiencing a complete DNS blackout instead of just irritating intermittent problems for a handful of names (download.docker.com and android.clients.google.com being the chief ones)?

There are multiple reasons.

Reason 1: The client does perform a TCP request because it is asked to

The reply to a query was too large to fit into a single UDP response (a DNS reply cannot be scattered across multiple UDP packets).

This was your initial problem:

Querying this domain myself shows how much information is contained in the reply:

Answer records
android.clients.google.com. 300	IN	CNAME	android.l.google.com.
android.l.google.com.	299	IN	A	216.58.207.46
android.l.google.com.	299	IN	A	172.217.18.14
android.l.google.com.	299	IN	A	172.217.18.174
android.l.google.com.	299	IN	A	216.58.206.14
android.l.google.com.	299	IN	A	172.217.21.206
android.l.google.com.	299	IN	A	172.217.23.142
android.l.google.com.	299	IN	A	216.58.205.238
android.l.google.com.	299	IN	A	172.217.21.238
android.l.google.com.	299	IN	A	172.217.16.206
android.l.google.com.	299	IN	A	172.217.23.174
android.l.google.com.	299	IN	A	172.217.22.110
android.l.google.com.	299	IN	A	172.217.22.46
android.l.google.com.	299	IN	A	172.217.18.110
android.l.google.com.	299	IN	A	172.217.16.142
android.l.google.com.	299	IN	A	216.58.212.174
android.l.google.com.	299	IN	A	216.58.212.142

It is not very surprising that such an answer gets truncated: once the response also contains additional records, such as the name servers, you are over the limit of what a UDP datagram can carry.

Check out RFC 5966 (DNS Transport over TCP - Implementation Requirements):

In the absence of EDNS0 (Extension Mechanisms for DNS 0), the normal behaviour of any DNS server needing to send a UDP response that would exceed the 512-byte limit is for the server to truncate the response so that it fits within that limit and then set the TC flag in the response header. When the client receives such a response, it takes the TC flag as an indication that it should retry over TCP instead.

So your client is asked to perform the query over TCP, and it behaves accordingly. There is nothing wrong with this. The fact that your 192.168.1.60 does not accept TCP queries violates the quoted RFC in at least two places.
Section 1:

This document therefore updates the core DNS protocol specifications such that support for TCP is henceforth a REQUIRED part of a full DNS protocol implementation.

Section 4:

All general-purpose DNS implementations MUST support both UDP and TCP transport.
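
If you want to see this truncation signal yourself, a small experiment (only a sketch; the domain is the one from this thread and @192.168.1.4 is this Pi-hole's address, so substitute your own resolver) is to query without EDNS and tell dig not to retry over TCP:

# no EDNS (classic 512-byte limit) and no automatic TCP retry on truncation
dig +noedns +ignore android.clients.google.com @192.168.1.4

If the reply is larger than 512 bytes at that moment, the flags line will contain tc - exactly the signal that tells a normal client to repeat the query over TCP.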

Reason 2: The client does perform a TCP request because it wants to

This is the case, e.g., here:

The first query (A) still works because FTL knows the answer from cache. The second query (AAAA), however, cannot be answered as FTL does not know this from cache and the upstream server refused to give a response over TCP.

It only makes sense to forward a client's TCP query to the upstream server over TCP as well, since the client may have switched to TCP precisely because its original UDP query was truncated. Pi-hole behaves exactly as expected and designed here.
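
As an illustration of that cache effect (only a sketch; whether the first query is actually still cached depends on your recent traffic), the difference can be seen with a pair of TCP queries like:

# the A record is likely still cached from earlier UDP lookups, so FTL can answer it locally
dig +tcp -t A download.docker.com @127.0.0.1
# the AAAA record is probably not cached, so FTL has to forward it upstream over TCP and, before the lancache fix below, gets REFUSED
dig +tcp -t AAAA download.docker.com @127.0.0.1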

Cool, thank you for the detailed response :slight_smile: I'll head over to the lancache folks and see if there's any way to enable TCP protocol DNS services for their container.

lancache-dns is nothing other than a local bind container. As much as I dislike bind (due to its unnecessary complexity and its history of epic security holes), it should offer TCP by default. I suspect there is just a missing firewall rule / docker port exposure.

I don't believe that to be the case.

1 - I have no firewall rules of any sort in place on this system, as it is not public facing.
2 - It's visible in the docker ps output here that the container is explicitly only listening on UDP:

f6e660c4fdc3        lancachenet/lancache-dns:latest   "/bin/bash -e /init/…"   25 hours ago        Up 25 hours         192.168.1.60:53->53/udp            lancache_dns_1

How about this? :wink:

To expose on both TCP and UDP, include two lines:

EXPOSE 53/tcp
EXPOSE 53/udp
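
Note that EXPOSE on its own is only metadata; the ports still have to be published when the container is started. A publish-both-protocols sketch (reusing the addresses from this thread and omitting the environment settings the compose file passes in via env_file) would be:

docker run -d \
  -p 192.168.1.60:53:53/udp \
  -p 192.168.1.60:53:53/tcp \
  lancachenet/lancache-dns:latest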

LOL - sorry, when you said 'docker port exposure' I interpreted that as 'the port is in service but is not exposed to clients, i.e. via firewall blocking or whatnot'. I'm a beginner at docker and did not realize that was specific language pertaining to containers. TIL! Thank you :slight_smile:

One moment while I try that out and restart the container.

Hmm - not sure where that Dockerfile would live in my environment, but I do see this entry in the lancache/docker-compose.yml file that was used to set them all up.

root@lurker:~/lancache# cat docker-compose.yml
version: '2'
services:
  dns:
    image: lancachenet/lancache-dns:latest
    env_file: .env
#    restart: unless-stopped
    ports:
      - ${DNS_BIND_IP}:53:53/udp
  sniproxy:
    image: lancachenet/sniproxy:latest
    env_file: .env
#    restart: unless-stopped
    ports:
      - 443:443/tcp
  monolithic:
    image: lancachenet/monolithic:latest
    env_file: .env
#    restart: unless-stopped
    ports:
      - 192.168.1.61:80:80/tcp
    volumes:
      - ${CACHE_ROOT}/cache:/data/cache
      - ${CACHE_ROOT}/logs:/data/logs

Would it just be a matter of adding a second line to that DNS set of arguments and running the 'update_containers.sh' script? Sample change below:

  dns:
    image: lancachenet/lancache-dns:latest
    env_file: .env
#    restart: unless-stopped
    ports:
      - ${DNS_BIND_IP}:53:53/udp
      - ${DNS_BIND_IP}:53:53/tcp
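
If the helper script turns out to do more than wanted, the same change can probably also be applied with docker-compose alone (assuming the compose file shown above, living in ~/lancache):

cd ~/lancache
# recreate only the dns service so it picks up the added tcp port mapping
docker-compose up -d dns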

I was quoting the Dockerfile, which is used for building the container. You likely don't have it offline. However, it is the proper place to change this upstream.

My entire knowledge about lancache comes from one quick GitHub search :wink: I don't know what this script is doing, but I'd say nothing much can go wrong: if something stops working, just revert it to what it was before. And please still submit a request to get these changes upstream. What lancache-dns is doing violates the standards and should get fixed for everyone.

I don't have a copy of the git repo offline, no. I have relied on just grabbing docker images and running them.

I added the line to the docker-compose.yml file and ran the script; it worked as desired.

9c4b0b5c293c        lancachenet/lancache-dns:latest   "/bin/bash -e /init/…"   57 seconds ago      Up 55 seconds       192.168.1.60:53->53/tcp, 192.168.1.60:53->53/udp   lancache_dns_1
root@lurker:~/lancache# netstat -anp|grep ":53 "
tcp        0      0 192.168.1.60:53         0.0.0.0:*               LISTEN      10355/docker-proxy
tcp        0      0 127.0.0.1:53            0.0.0.0:*               LISTEN      3921/pihole-FTL
tcp        0      0 192.168.1.4:53          0.0.0.0:*               LISTEN      3921/pihole-FTL
udp        0      0 192.168.1.60:53         0.0.0.0:*                           10384/docker-proxy
udp        0      0 127.0.0.1:53            0.0.0.0:*                           3921/pihole-FTL
udp        0      0 192.168.1.4:53          0.0.0.0:*                           3921/pihole-FTL

I also re-ran the dig command provided earlier; it works fine now.

root@lurker:~/lancache# !dig
dig +tcp download.docker.com @192.168.1.60

; <<>> DiG 9.10.3-P4-Debian <<>> +tcp download.docker.com @192.168.1.60
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 50467
;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 13, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;download.docker.com.           IN      A

;; ANSWER SECTION:
download.docker.com.    14      IN      CNAME   d2h67oheeuigaw.cloudfront.net.
d2h67oheeuigaw.cloudfront.net. 59 IN    A       143.204.160.58
d2h67oheeuigaw.cloudfront.net. 59 IN    A       143.204.160.90
d2h67oheeuigaw.cloudfront.net. 59 IN    A       143.204.160.115
d2h67oheeuigaw.cloudfront.net. 59 IN    A       143.204.160.18

;; AUTHORITY SECTION:
.                       34788   IN      NS      k.root-servers.net.
.                       34788   IN      NS      l.root-servers.net.
.                       34788   IN      NS      m.root-servers.net.
.                       34788   IN      NS      a.root-servers.net.
.                       34788   IN      NS      b.root-servers.net.
.                       34788   IN      NS      c.root-servers.net.
.                       34788   IN      NS      d.root-servers.net.
.                       34788   IN      NS      e.root-servers.net.
.                       34788   IN      NS      f.root-servers.net.
.                       34788   IN      NS      g.root-servers.net.
.                       34788   IN      NS      h.root-servers.net.
.                       34788   IN      NS      i.root-servers.net.
.                       34788   IN      NS      j.root-servers.net.

;; Query time: 146 msec
;; SERVER: 192.168.1.60#53(192.168.1.60)
;; WHEN: Mon Aug 24 16:12:26 CDT 2020
;; MSG SIZE  rcvd: 363

Thank you @Bucking_Horn and @DL6ER for all the feedback and help :slight_smile:

I had posted my question on the lancache Discord and shared my solution above as an update before there was a response, and a user on the Discord server said they submitted it as a PR to the project. Hoorah for progress :slight_smile:

https://github.com/lancachenet/lancache-dns/pull/116

https://github.com/lancachenet/docker-compose/pull/20

How does this work? How can you give advice and bugfix other projects on top of all you do for Pi-hole? Do you ever sleep? Once again I have to say that Pi-hole and the people behind it are the most amazing online project I have ever encountered in my entire life. And that goes back to the roots of the Internet, so consider it an absolute statement.

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.