So the REFUSED comes from the TCP query to android.clients.google.com. Maybe your configured DNS server (192.168.1.60) does not answer DNS requests or something else is broken.
Please test whether this works from your Pi-hole:
dig +tcp download.docker.com @192.168.1.60
and also please generate a recent debug log (your old one has already expired, so I cannot look into the config any more):
pihole -d
I'm sure we'll be able to find out what is going on in the end. And if this takes a few changes to FTL to make the error message more explicit about what actually failed, then we can surely do this as well.
edit
It would be helpful if you can also add DEBUG_FLAGS=true in pihole-FTL.conf and restart FTL. You should (maybe) be able to trigger the issue using
I was able to go through my system and clean up the networking a little bit. I trimmed out two of the 192.168.1 addresses and pruned some docker interfaces as well. The ones that remain all relate to the bare bones needed for docker to function, as best I can tell.
which should, however, not have any immediate consequences for the current situation (Pi-hole does not strictly need to know these details to work properly).
This is the error. The server on 192.168.1.60 is not accepting TCP queries. The problem is on this server, not in your Pi-hole. I will, nevertheless, improve the FTL error message to make this more clear in the future.
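For anyone who wants to double-check this without dig, here is a minimal sketch in Python that only tests whether a TCP connection to the DNS port can be opened at all. The helper name `accepts_tcp` is made up for illustration, and a successful connect does not prove the server answers DNS queries correctly, only that something is listening on TCP.

```python
import socket


def accepts_tcp(host: str, port: int = 53, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established.

    This only checks the TCP handshake. It does not send a DNS query.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Calling `accepts_tcp("192.168.1.60")` from the Pi-hole host would be expected to return False as long as the upstream server only listens on UDP.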
The 'enp0s31f6' messages are likely due to the struggles I've been having with Debian regarding the naming conventions of interfaces. The classic Linux naming convention of 'ethX' got deprecated, so I rearranged a bunch of things to use the new names for the same devices after a dist upgrade (eth0 -> enp0s31f6). I had originally tried to keep those classic names for ease of use and fewer headaches trying to reflect those changes in anything that may use them on my system. Then I did a kernel upgrade the other day and it appears that my previous efforts to revert the naming convention somehow miraculously (and retroactively) took... months later? Damnit, linux >.<
And of course now pihole is complaining about it, but setupVars.conf (the only file under /etc/pihole I can find that references it) still has that newer name in it. I can fix that easily enough, at least.
Regarding the 192.168.1.60 lancache DNS service not accepting tcp queries on port 53, this does appear confirmed by netstat output:
But the question becomes one of 'this is the upstream provider for all of pihole-FTL DNS queries, why is it a very selective few out of everything that kicks over to TCP rather than UDP and cause these errors?' Why am I not experiencing a complete DNS blackout instead of just irritating intermittent problems for a handful of selective names (download.docker.com and android.clients.google.com being the chief ones)?
Reason 1: The client does perform a TCP request because it is asked to
The reply to a query was too large to fit into a UDP response (which cannot be scattered into multiple packets).
This was your initial problem:
Querying this domain myself and seeing how much information is contained in there
Answer records
android.clients.google.com. 300 IN CNAME android.l.google.com.
android.l.google.com. 299 IN A 216.58.207.46
android.l.google.com. 299 IN A 172.217.18.14
android.l.google.com. 299 IN A 172.217.18.174
android.l.google.com. 299 IN A 216.58.206.14
android.l.google.com. 299 IN A 172.217.21.206
android.l.google.com. 299 IN A 172.217.23.142
android.l.google.com. 299 IN A 216.58.205.238
android.l.google.com. 299 IN A 172.217.21.238
android.l.google.com. 299 IN A 172.217.16.206
android.l.google.com. 299 IN A 172.217.23.174
android.l.google.com. 299 IN A 172.217.22.110
android.l.google.com. 299 IN A 172.217.22.46
android.l.google.com. 299 IN A 172.217.18.110
android.l.google.com. 299 IN A 172.217.16.142
android.l.google.com. 299 IN A 216.58.212.174
android.l.google.com. 299 IN A 216.58.212.142
it's not very surprising. If the response also contains additional records, such as the name servers, you are quickly over the limit of what a UDP datagram can carry.
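As a rough back-of-the-envelope check, the size of the answer listed above can be estimated in a few lines of Python. This is a simplified sketch: it assumes the classic DNS wire format (12-byte header, 10 fixed bytes per record, 4 bytes per A record address), models name compression as a 2-byte pointer for every owner name, and leaves the CNAME target uncompressed. The function names are made up for illustration, and real servers may compress more or less aggressively.

```python
HEADER_LEN = 12      # fixed DNS header
RR_FIXED_LEN = 10    # type (2) + class (2) + TTL (4) + rdlength (2)
POINTER_LEN = 2      # a compression pointer replacing a full name
A_RDATA_LEN = 4      # one IPv4 address


def wire_name_len(name: str) -> int:
    """Length of a domain name in uncompressed DNS wire format."""
    labels = name.rstrip(".").split(".")
    # One length byte per label, plus the terminating root label.
    return sum(1 + len(label) for label in labels) + 1


def answer_size(qname: str, target: str, a_records: int, compressed: bool) -> int:
    """Estimate header + question + one CNAME + a_records A records, in bytes."""
    question = wire_name_len(qname) + 4  # qtype (2) + qclass (2)
    owner_q = POINTER_LEN if compressed else wire_name_len(qname)
    owner_t = POINTER_LEN if compressed else wire_name_len(target)
    cname_rr = owner_q + RR_FIXED_LEN + wire_name_len(target)
    a_rrs = a_records * (owner_t + RR_FIXED_LEN + A_RDATA_LEN)
    return HEADER_LEN + question + cname_rr + a_rrs


qname = "android.clients.google.com."
target = "android.l.google.com."
print(answer_size(qname, target, 16, compressed=False))  # 680
print(answer_size(qname, target, 16, compressed=True))   # 334
```

Under these assumptions the 16 A records alone already exceed 512 bytes without compression, and even with compression only about 178 bytes remain for authority and additional records before the response has to be truncated.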
In the absence of EDNS0 (Extension Mechanisms for DNS 0), the normal behaviour of any DNS server needing to send a UDP response that would exceed the 512-byte limit is for the server to truncate the response so that it fits within that limit and then set the TC flag in the response header. When the client receives such a response, it takes the TC flag as an indication that it should retry over TCP instead.
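The retry behaviour described in this quote can be sketched from the client side roughly as follows. This is an illustrative sketch, not how FTL or any particular resolver implements it: the function names are hypothetical, the query carries no EDNS0 option, and error handling is kept to a minimum.

```python
import socket
import struct

TC_BIT = 0x0200  # "truncated" flag in the 16-bit DNS flags word


def build_query(name: str, qtype: int = 1, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query without EDNS0 (qtype 1 is an A record)."""
    # Flags 0x0100 = recursion desired; one entry in the question section.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # class IN


def is_truncated(response: bytes) -> bool:
    """Return True if the TC flag is set in the response header."""
    (flags,) = struct.unpack("!H", response[2:4])
    return bool(flags & TC_BIT)


def recv_exact(sock: socket.socket, count: int) -> bytes:
    """Read exactly count bytes from a stream socket."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("connection closed before full message arrived")
        data += chunk
    return data


def resolve(name: str, server: str, port: int = 53, timeout: float = 2.0) -> bytes:
    """Query over UDP first and retry over TCP if the reply is truncated."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as udp:
        udp.settimeout(timeout)
        udp.sendto(query, (server, port))
        response, _ = udp.recvfrom(512)
    if not is_truncated(response):
        return response
    # DNS over TCP prefixes every message with a 2-byte length field.
    with socket.create_connection((server, port), timeout=timeout) as tcp:
        tcp.sendall(struct.pack("!H", len(query)) + query)
        (length,) = struct.unpack("!H", recv_exact(tcp, 2))
        return recv_exact(tcp, length)
```

If the upstream server does not listen on TCP, the `create_connection` call in the retry step is where such a client would fail, which matches the errors seen for the large responses in this thread.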
So your client is asked to perform a query over TCP, and it behaves accordingly. There is nothing wrong with this. The fact that your 192.168.1.60 does not accept TCP queries violates the quoted RFC in at least two places.
Section 1:
This document therefore updates the core DNS protocol specifications such that support for TCP is henceforth a REQUIRED part of a full DNS protocol implementation.
Section 4:
All general-purpose DNS implementations MUST support both UDP and TCP transport.
Reason 2: The client does perform a TCP request because it wants to
This is the case, e.g., here:
The first query (A) still works because FTL knows the answer from cache. The second query (AAAA), however, cannot be answered as FTL does not know this from cache and the upstream server refused to give a response over TCP.
It only makes sense to forward TCP queries from a client also over TCP to the upstream server as this may have happened because the original UDP query was truncated in the first place. Pi-hole behaves just as expected and designed here.
Cool, thank you for the detailed response. I'll head over to the lancache folks and see if there's any way to enable TCP protocol DNS services for their container.
lancache-dns is nothing else than a local bind container. As much as I dislike bind (due to its unnecessary complexity and history of epic security holes), it should offer TCP by default. I suspect there is just a missing firewall rule / docker port exposure.
1 - I have no firewall rules in place of any sort on this system, as it is not public facing
2 - It's visible here in the docker ps output that it is explicitly only listening on udp.
f6e660c4fdc3 lancachenet/lancache-dns:latest "/bin/bash -e /init/…" 25 hours ago Up 25 hours 192.168.1.60:53->53/udp lancache_dns_1
LOL - sorry, when you said 'docker port exposure' I was interpreting that as 'the port is in service but is not exposed to clients, IE via firewall blocking or whatnot'. I'm a beginner at docker and did not realize that was specific language pertaining to containers. TIL! Thank you
One moment while I try that out and restart the container.
Hmm - not sure where that blob path would be in my environment, but I do see this entry in the lancache/docker-compose.yml file that was relevant to setting them all up.
I was quoting the Dockerfile, this is used for building the container. You likely don't have it offline. However, it is the proper place to change this upstream.
My entire knowledge about lancache comes from one quick GitHub search, so I don't know what this script is doing. I'd say nothing can go wrong. If something stops working, just revert it to what it was before. And still submit a request to get this change upstream. What lancache-dns is doing still violates the standards and should get fixed for everyone.
I had posted my question on the lancache discord and shared my solution above as an update before there was a response, and a user on the discord server said they submitted it as a PR to the project. Hoorah for progress
How does this work? How can you give advice and fix bugs in other projects in addition to all you do for Pi-hole? Do you ever sleep? Once again I can't help but say that Pi-hole and the people behind it are the most amazing online project I have ever encountered in my entire life. And this goes back to the roots of the Internet, so consider this an absolute statement.