Instead of normal search results, the system will occasionally error out and say REFUSED. I checked the query log and saw the following:
This coincides with the error from pihole.log:
;; Truncated, retrying in TCP mode.
Aug 17 14:28:57 dnsmasq[28833]: query[A] android.clients.google.com from 192.168.1.4
Aug 17 14:28:57 dnsmasq[28833]: config error is REFUSED
The problem is intermittent and reproducible on any DNS client (windows, linux, android, etc) when it happens.
I run a lancache system for our internet connection to lighten the load on big downloads like games from various stores (xbox, steam, epic, etc).
root@lurker:~# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
d48bc2fc0e91 lancachenet/lancache-dns:latest "/bin/bash -e /init/…" 5 weeks ago Up 6 days 192.168.1.60:53->53/udp lancache_dns_1
40999908168f lancachenet/monolithic:latest "/bin/bash -e /init/…" 5 weeks ago Up 6 days 192.168.1.61:80->80/tcp, 443/tcp lancache_monolithic_1
12b13d441f95 lancachenet/sniproxy:latest "/scripts/bootstrap.…" 5 weeks ago Up 6 days 0.0.0.0:443->443/tcp lancache_sniproxy_1
Trying to see what logs for that I can dig through, although it is interesting that it would be 'not responding' for only this DNS query when every other query was working just fine. It is literally ONLY this host lookup that ever fails in this way.
Yes, the lancache container is running on the same physical system. This is why I had to force pihole-FTL to bind only to the 192.168.1.4 and 127.0.0.1 IP's, because if one came up first, it would block the other from working.
My general setup of request/upstream forwarding is:
client DNS request -> 192.168.1.4 (pihole) for filtering junk -> 192.168.1.60 (lancache for heavy data download caching) -> 8.8.8.8
It's currently behaving well, but if it chokes on that hostname again I'll see about querying each spot along the way to find a definitive problem spot.
I don't know if this is significant, but your log lists a java process to be active on 192.168.1.60, but not on port 53, as shown in the following excerpt:
*** [ DIAGNOSING ]: Ports in use
192.168.1.60:9081 java (IPv4)
The same address does not show up for port 53. In fact, Pi-hole is the only process listening on port 53.
Also, your Pi-hole machine seems to be accumulating an awful lot of IP addresses (I am counting 11 altogether, loopbacks excluded).
Is this by intention?
Port 9081 is Aircontrol2, a Ubiquiti management platform for their nano-beam devices that give internet to different parts of my property. I haven't looked at that in a while and it definitely does not need to run on all IP's, so I just reconfigured that to only exist on 192.168.1.4:9081.
The only IP's that I personally configured/maintain are 192.168.1.0/24 and 10.98.200.0/24. I believe the rest are from the original steamcache docker package (before I transitioned to lancache) that docker did not clean up after all the previous containers/images were taken down. I will definitely need to go back and prune/clean them up again, since lancache-monolithic covers all the various services under one container and IP instead of one per service (steam, windows update, epic, etc).
and I can do a manual nslookup query to 192.168.1.4, 196.168.1.60, and 8.8.8.8 that are all responding normally. This is getting stupidly frustrating.
When I edit my resolv.conf to skip straight to 8.8.8.8 or 192.168.1.60, download.docker.com resolves fine. When I put it back to 192.168.1.4 the docker hostname resolve fails.
Aug 18 02:34:38 dnsmasq[13242]: forwarded download.docker.com to 192.168.1.60
Aug 18 02:34:38 dnsmasq[13242]: query[AAAA] download.docker.com from 192.168.1.4
Aug 18 02:34:38 dnsmasq[13242]: forwarded download.docker.com to 192.168.1.60
and then another from when I just spammed the nslookup briefly.
Aug 18 02:34:38 dnsmasq[13242]: reply download.docker.com is <CNAME>
Aug 18 02:34:38 dnsmasq[13242]: reply d2h67oheeuigaw.cloudfront.net is 2600:9000:2031:8c00:3:db06:4200:93a1
Aug 18 02:34:38 dnsmasq[13242]: reply d2h67oheeuigaw.cloudfront.net is 2600:9000:2031:ba00:3:db06:4200:93a1
Aug 18 02:34:38 dnsmasq[13242]: reply d2h67oheeuigaw.cloudfront.net is 2600:9000:2031:c600:3:db06:4200:93a1
Aug 18 02:34:38 dnsmasq[13242]: reply d2h67oheeuigaw.cloudfront.net is 2600:9000:2031:ce00:3:db06:4200:93a1
Aug 18 02:34:38 dnsmasq[13242]: reply d2h67oheeuigaw.cloudfront.net is 2600:9000:2031:d600:3:db06:4200:93a1
Aug 18 02:34:38 dnsmasq[13242]: reply d2h67oheeuigaw.cloudfront.net is 2600:9000:2031:e200:3:db06:4200:93a1
Aug 18 02:34:38 dnsmasq[13242]: reply d2h67oheeuigaw.cloudfront.net is 2600:9000:2031:5800:3:db06:4200:93a1
Aug 18 02:34:38 dnsmasq[13242]: reply d2h67oheeuigaw.cloudfront.net is 2600:9000:2031:8400:3:db06:4200:93a1
Aug 18 02:34:38 dnsmasq[6611]: query[A] download.docker.com from 192.168.1.4
Aug 18 02:34:38 dnsmasq[6611]: config error is REFUSED
Aug 18 02:34:38 dnsmasq[6611]: query[AAAA] download.docker.com from 192.168.1.4
Aug 18 02:34:38 dnsmasq[6611]: config error is REFUSED
Aug 18 02:34:38 dnsmasq[13242]: query[A] download.docker.com from 192.168.1.4
Aug 18 02:34:38 dnsmasq[13242]: forwarded download.docker.com to 192.168.1.60
Aug 18 02:34:38 dnsmasq[13242]: query[AAAA] download.docker.com from 192.168.1.4
Aug 18 02:34:38 dnsmasq[13242]: forwarded download.docker.com to 192.168.1.60
Aug 18 02:34:38 dnsmasq[13242]: reply download.docker.com is <CNAME>
Aug 18 02:34:38 dnsmasq[13242]: reply d2h67oheeuigaw.cloudfront.net is 2600:9000:2031:ba00:3:db06:4200:93a1
Aug 18 02:34:38 dnsmasq[13242]: reply d2h67oheeuigaw.cloudfront.net is 2600:9000:2031:c600:3:db06:4200:93a1
Aug 18 02:34:38 dnsmasq[13242]: reply d2h67oheeuigaw.cloudfront.net is 2600:9000:2031:ce00:3:db06:4200:93a1
Aug 18 02:34:38 dnsmasq[13242]: reply d2h67oheeuigaw.cloudfront.net is 2600:9000:2031:d600:3:db06:4200:93a1
Aug 18 02:34:38 dnsmasq[13242]: reply d2h67oheeuigaw.cloudfront.net is 2600:9000:2031:e200:3:db06:4200:93a1
Aug 18 02:34:38 dnsmasq[13242]: reply d2h67oheeuigaw.cloudfront.net is 2600:9000:2031:5800:3:db06:4200:93a1
Aug 18 02:34:38 dnsmasq[13242]: reply d2h67oheeuigaw.cloudfront.net is 2600:9000:2031:8400:3:db06:4200:93a1
Aug 18 02:34:38 dnsmasq[13242]: reply d2h67oheeuigaw.cloudfront.net is 2600:9000:2031:8c00:3:db06:4200:93a1
Aug 18 02:34:38 dnsmasq[13242]: reply download.docker.com is <CNAME>
Aug 18 02:34:38 dnsmasq[13242]: reply d2h67oheeuigaw.cloudfront.net is 13.226.193.113
Aug 18 02:34:38 dnsmasq[13242]: reply d2h67oheeuigaw.cloudfront.net is 13.226.193.25
Aug 18 02:34:38 dnsmasq[13242]: reply d2h67oheeuigaw.cloudfront.net is 13.226.193.49
Aug 18 02:34:38 dnsmasq[13242]: reply d2h67oheeuigaw.cloudfront.net is 13.226.193.65
Aug 18 02:34:38 dnsmasq[13242]: reply download.docker.com is <CNAME>
Aug 18 02:34:38 dnsmasq[13242]: reply d2h67oheeuigaw.cloudfront.net is 13.226.193.113
Aug 18 02:34:38 dnsmasq[13242]: reply d2h67oheeuigaw.cloudfront.net is 13.226.193.25
Aug 18 02:34:38 dnsmasq[13242]: reply d2h67oheeuigaw.cloudfront.net is 13.226.193.49
Aug 18 02:34:38 dnsmasq[13242]: reply d2h67oheeuigaw.cloudfront.net is 13.226.193.65
Aug 18 02:34:38 dnsmasq[6612]: query[A] download.docker.com from 192.168.1.4
Aug 18 02:34:38 dnsmasq[6612]: cached download.docker.com is <CNAME>
Aug 18 02:34:38 dnsmasq[6612]: cached d2h67oheeuigaw.cloudfront.net is 13.226.193.65
Aug 18 02:34:38 dnsmasq[6612]: cached d2h67oheeuigaw.cloudfront.net is 13.226.193.49
Aug 18 02:34:38 dnsmasq[6612]: cached d2h67oheeuigaw.cloudfront.net is 13.226.193.25
Aug 18 02:34:38 dnsmasq[6612]: cached d2h67oheeuigaw.cloudfront.net is 13.226.193.113
Aug 18 02:34:38 dnsmasq[6612]: query[AAAA] download.docker.com from 192.168.1.4
Aug 18 02:34:38 dnsmasq[6612]: cached download.docker.com is <CNAME>
Aug 18 02:34:38 dnsmasq[6612]: config error is REFUSED
One thing I notice is that on all the 'REFUSED' results, the PID is not the same as the rest. Currently the pihole-FTL process is 13242, and when DNS results come in cleanly, they all have log entries on that PID. literally every unknown/REFUSED result spawns some other PID - 6611 and 6612 in the log above, apparently it's a much larger problem, as my pihole.log has some 580+ entries of this style that I had not known about because of it's intermittent nature.
config error is REFUSED may hint a a firewalled or otherwise inaccessible or completely lacking upstream server, or a respective error as returned by an upstream.
Yet there is no doubt your single .60 upstream is responding to Pi-hole, otherwise you would have no DNS resolution at all.
Regarding the second process id: dnsmasq may fork for TCP resolution, and that has been subject of a recent patch by dnsmasq.
But just to make sure and preclude the obvious: You wouldn't run a second dnsmasq instance?
In fiddling more I just had a situation where the host-resolve failed against 192.168.1.60 when I had that in my resolv.conf - seems I may need to do some digging in the lancache logs, not that I've been able to find any so far to start with >.<
No second dnsmasq service running that I'm aware of, pihole is it.
Yep, I'm familiar with how to send a dns request to a specific host. As I noted earlier anytime I run the commands manually - no matter which server I pointed it at - seemed to work fine, even though I could run 'apt-get update' (which calls on dns) would reliably fail for the exact same hostname I had just looked up by hand again and again and again.
The fact that it's not readily reproducible bears some similarity to the TCP forking crash that the patch I mentioned was intended to address, but I'm just guessing here.
I'll see if I can get a developer involved, but it's already late my side of the pond.
Meanwhile, you could try if we get some more details as to why the query fails in the log by adding the following line to /etc/pihole/pihole-FTL.conf:
Development support has looked over your situation again, and agree that using DEBUG_QUERIES should at least reveal the process id of new forks for TCP resolution, so you should consider this if you've not already done so.
An additional recommendation would be to check your firewalls if they would allow TCP traffic for port 53 (your successful DNS queries prove that port 53 UDP is not the problem).
I might add to this that cutting down on the number of IPs on your Pi-hole machine also lowers the risk of IP specific firewall rules intercepting traffic originating from an IP that would be considered restricted - just in case such rules would exist.
You should review your firewall rules on both your Pi-hole installation and your lancache.net instances to allow for ports 53 TCP as well as UDP.
To explain why that would fit your observation:
Normal DNS happens over UDP by default and is used for the vast majority of queries. Using TCP is a necessity once the DNS answer's size exceeds maximum UDP datagram size.
If TCP 53 is blocked, DNS queries would only fail if a DNS servers answers get too big (or if a client explcitly demands TCP resolution upfront).
I do not have firewall rules on the system running these services, the system is not a gateway device to the internet and therefore has nothing set to block or filter packets.
Aug 22 15:54:26 dnsmasq[14116]: config error is REFUSED
pihole-FTL.log
[2020-08-22 15:54:26.054 27899M] CNAME android.clients.google.com
[2020-08-22 15:54:26.054 27899M] **** got reply android.clients.google.com is (CNAME) (ID 6102, /root/project/src/dnsmasq/cache.c:494)
[2020-08-22 15:54:26.054 27899M] android.l.google.com is not known
[2020-08-22 15:54:26.054 27899M] Query is permitted as at least one whitelist entry matched
[2020-08-22 15:54:26.054 27899M] CNAME android.clients.google.com ---> android.l.google.com
[2020-08-22 15:54:26.054 27899M] **** got reply android.l.google.com is 172.217.9.142 (ID 6102, /root/project/src/dnsmasq/cache.c:494)
[2020-08-22 15:54:26.054 27899M] android.l.google.com is not known
[2020-08-22 15:54:26.054 27899M] Query is permitted as at least one whitelist entry matched
[2020-08-22 15:54:26.054 27899M] CNAME android.l.google.com
[2020-08-22 15:54:26.054 27899M] **** got reply android.l.google.com is 172.217.9.174 (ID 6102, /root/project/src/dnsmasq/cache.c:494)
[2020-08-22 15:54:26.054 27899M] android.l.google.com is not known
[2020-08-22 15:54:26.054 27899M] Query is permitted as at least one whitelist entry matched
[2020-08-22 15:54:26.054 27899M] CNAME android.l.google.com
[2020-08-22 15:54:26.054 27899M] **** got reply android.l.google.com is 172.217.12.46 (ID 6102, /root/project/src/dnsmasq/cache.c:494)
[2020-08-22 15:54:26.054 27899M] android.l.google.com is not known
[2020-08-22 15:54:26.055 27899M] Query is permitted as at least one whitelist entry matched
[2020-08-22 15:54:26.055 27899M] CNAME android.l.google.com
[2020-08-22 15:54:26.055 27899M] **** got reply android.l.google.com is 172.217.12.78 (ID 6102, /root/project/src/dnsmasq/cache.c:494)
[2020-08-22 15:54:26.055 27899M] android.l.google.com is not known
[2020-08-22 15:54:26.055 27899M] Query is permitted as at least one whitelist entry matched
[2020-08-22 15:54:26.055 27899M] CNAME android.l.google.com
[2020-08-22 15:54:26.055 27899M] **** got reply android.l.google.com is 172.217.14.174 (ID 6102, /root/project/src/dnsmasq/cache.c:494)
[2020-08-22 15:54:26.055 27899M] android.l.google.com is not known
[2020-08-22 15:54:26.055 27899M] Query is permitted as at least one whitelist entry matched
[2020-08-22 15:54:26.055 27899M] CNAME android.l.google.com
[2020-08-22 15:54:26.055 27899M] **** got reply android.l.google.com is 216.58.193.142 (ID 6102, /root/project/src/dnsmasq/cache.c:494)
[2020-08-22 15:54:26.055 27899M] android.l.google.com is not known
[2020-08-22 15:54:26.055 27899M] Query is permitted as at least one whitelist entry matched
[2020-08-22 15:54:26.055 27899M] CNAME android.l.google.com
[2020-08-22 15:54:26.055 27899M] **** got reply android.l.google.com is 216.58.194.46 (ID 6102, /root/project/src/dnsmasq/cache.c:494)
[2020-08-22 15:54:26.055 27899M] android.l.google.com is not known
[2020-08-22 15:54:26.055 27899M] Query is permitted as at least one whitelist entry matched
[2020-08-22 15:54:26.055 27899M] CNAME android.l.google.com
[2020-08-22 15:54:26.055 27899M] **** got reply android.l.google.com is 216.58.194.78 (ID 6102, /root/project/src/dnsmasq/cache.c:494)
[2020-08-22 15:54:26.056 27899M] android.l.google.com is not known
[2020-08-22 15:54:26.056 27899M] Query is permitted as at least one whitelist entry matched
[2020-08-22 15:54:26.056 27899M] CNAME android.l.google.com
[2020-08-22 15:54:26.056 27899M] **** got reply android.l.google.com is 172.217.1.142 (ID 6102, /root/project/src/dnsmasq/cache.c:494)
[2020-08-22 15:54:26.056 27899M] android.l.google.com is not known
[2020-08-22 15:54:26.056 27899M] Query is permitted as at least one whitelist entry matched
[2020-08-22 15:54:26.056 27899M] CNAME android.l.google.com
[2020-08-22 15:54:26.056 27899M] **** got reply android.l.google.com is 172.217.1.238 (ID 6102, /root/project/src/dnsmasq/cache.c:494)
[2020-08-22 15:54:26.056 27899M] android.l.google.com is not known
[2020-08-22 15:54:26.056 27899M] Query is permitted as at least one whitelist entry matched
[2020-08-22 15:54:26.056 27899M] CNAME android.l.google.com
[2020-08-22 15:54:26.056 27899M] **** got reply android.l.google.com is 172.217.2.238 (ID 6102, /root/project/src/dnsmasq/cache.c:494)
[2020-08-22 15:54:26.056 27899M] android.l.google.com is not known
[2020-08-22 15:54:26.056 27899M] Query is permitted as at least one whitelist entry matched
[2020-08-22 15:54:26.056 27899M] CNAME android.l.google.com
[2020-08-22 15:54:26.056 27899M] **** got reply android.l.google.com is 172.217.6.142 (ID 6102, /root/project/src/dnsmasq/cache.c:494)
[2020-08-22 15:54:26.056 27899M] android.l.google.com is not known
[2020-08-22 15:54:26.056 27899M] Query is permitted as at least one whitelist entry matched
[2020-08-22 15:54:26.056 27899M] CNAME android.l.google.com
[2020-08-22 15:54:26.057 27899M] **** got reply android.l.google.com is 172.217.6.174 (ID 6102, /root/project/src/dnsmasq/cache.c:494)
[2020-08-22 15:54:26.057 27899M] android.l.google.com is not known
[2020-08-22 15:54:26.057 27899M] Query is permitted as at least one whitelist entry matched
[2020-08-22 15:54:26.057 27899M] CNAME android.l.google.com
[2020-08-22 15:54:26.057 27899M] **** got reply android.l.google.com is 172.217.9.14 (ID 6102, /root/project/src/dnsmasq/cache.c:494)
[2020-08-22 15:54:26.062 14116/F27899] TCP worker forked for client 192.168.1.190 on interface eth0 (192.168.1.4)
[2020-08-22 15:54:26.068 14116/F27899] **** new TCP query[A] "android.clients.google.com" from 192.168.1.190 (ID 6105, FTL 111772, /root/project/src/dnsmasq/forward.c:1947)
[2020-08-22 15:54:26.068 14116/F27899] android.clients.google.com is known as not to be blocked (whitelisted)
[2020-08-22 15:54:26.068 14116/F27899] **** got reply error is ::500:a515:ed0b:ba55:0 (ID 6105, /root/project/src/dnsmasq/rfc1035.c:1041)
[2020-08-22 15:54:26.072 14116/F27899] TCP worker terminating (client disconnected)```