I've had a read and I'm confident that that's not the problem. It's a new pihole install but it's definitely been running properly otherwise.
TCP seems to work
I managed to catch a single bad query earlier today that did get a positive result when retried with TCP. I had it scripted up like so, the problem had gone when I tried to run it a second time immediately after the failure.
# On the pihole host
echo "dig - UDP - against pihole"
dig @127.0.0.1 google.com
echo "dig - TCP - against pihole"
dig +tcp @127.0.0.1 google.com
And the output (trimmed):
# UDP
; <<>> DiG 9.16.1-Ubuntu <<>> @127.0.0.1 google.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 62843 <---------
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; OPT=15: 00 17 ("..")
;; Query time: 3 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Thu Jan 13 18:52:20 AEDT 2022
#TCP
; <<>> DiG 9.16.1-Ubuntu <<>> +tcp @127.0.0.1 google.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 16022
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; Query time: 119 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Thu Jan 13 18:52:21 AEDT 2022
And the corresponding FTL logs:
[2022-01-13 18:52:20.794 120256M] Processing FTL hook from src/dnsmasq/forward.c:1599...
[2022-01-13 18:52:20.794 120256M] Flags: F_FORWARD F_IPV4 F_QUERY
[2022-01-13 18:52:20.794 120256M] **** new UDP IPv4 query[A] query "google.com" from lo/127.0.0.1#50340 (ID 18337, FTL 20209, src/dnsmasq/forward.c:1601)
[2022-01-13 18:52:20.795 120256M] google.com is not known
[2022-01-13 18:52:20.795 120256M] DNS cache: 127.0.0.1/google.com is not blocked
[2022-01-13 18:52:20.796 120256M] Processing FTL hook from src/dnsmasq/rfc1035.c:1110...
[2022-01-13 18:52:20.796 120256M] Flags: F_CONFIG F_RCODE
[2022-01-13 18:52:20.796 120256M] ***** Unknown cache query
[2022-01-13 18:52:20.796 120256M] **** got cache reply: error is REFUSED (nowhere to forward to) (ID 18337, src/dnsmasq/rfc1035.c:1110)
[2022-01-13 18:52:20.796 120256M] EDE: network error (23)
[2022-01-13 18:52:20.796 120256M] Set reply to REFUSED (8) in src/dnsmasq_interface.c:2071
[2022-01-13 18:52:20.905 129730/F120256] TCP worker forked for client 127.0.0.1 on interface lo with IP 127.0.0.1
[2022-01-13 18:52:20.905 129730/F120256] Reopening Gravity database for this fork
[2022-01-13 18:52:20.912 129730/F120256] Closing Telnet socket for this fork
[2022-01-13 18:52:20.912 129730/F120256] Closing Unix socket for this fork
[2022-01-13 18:52:20.913 129730/F120256] Processing FTL hook from src/dnsmasq/forward.c:2077...
[2022-01-13 18:52:20.913 129730/F120256] Flags: F_FORWARD F_IPV4 F_QUERY
[2022-01-13 18:52:20.913 129730/F120256] **** new TCP IPv4 query[A] query "google.com" from lo/127.0.0.1#43519 (ID 18338, FTL 20210, src/dnsmasq/forward.c:2080)
[2022-01-13 18:52:20.913 129730/F120256] google.com is known as not to be blocked
[2022-01-13 18:52:21.017 129730/F120256] Processing FTL hook from src/dnsmasq/forward.c:2245...
[2022-01-13 18:52:21.018 129730/F120256] Flags: F_FORWARD F_IPV4 F_SERVER
[2022-01-13 18:52:21.018 129730/F120256] **** forwarded google.com to 208.67.222.222#53 (ID 18338, src/dnsmasq/forward.c:2245)
[2022-01-13 18:52:21.018 129730/F120256] FTL_CNAME called with: src = (null), dst = google.com, id = 18338
[2022-01-13 18:52:21.019 129730/F120256] google.com is known as not to be blocked
[2022-01-13 18:52:21.019 129730/F120256] Query 18338: CNAME google.com
[2022-01-13 18:52:21.020 129730/F120256] Processing FTL hook from src/dnsmasq/rfc1035.c:893...
[2022-01-13 18:52:21.020 129730/F120256] Flags: F_FORWARD F_IPV4 F_UPSTREAM
[2022-01-13 18:52:21.020 129730/F120256] **** got upstream reply from 208.67.222.222#53: google.com is 172.217.24.46 (ID 18338, src/dnsmasq/rfc1035.c:893)
[2022-01-13 18:52:21.020 129730/F120256] Set reply to IP (4) in src/dnsmasq_interface.c:2144
[2022-01-13 18:52:21.025 129730/F120256] TCP worker terminating (client disconnected)
I did notice that query volumes go up when the error occurs, but no signs of rate-limiting, so I suspect it's just Chrome retrying aggressively. I didn't analyse the logs properly but a lot of the REFUSED results are for Google domains.
Special version
Thanks, I've just run that now. Looks like it worked fine and FTL is running so I'll wait for the next time it fails.