Pi-hole 6.0.3 on a Pi 4:
Linux pi4 6.1.21-v8+ #1642 SMP PREEMPT Mon Apr 3 17:24:16 BST 2023 aarch64 GNU/Linux
Debian GNU/Linux 11 (bullseye)
Actual Behaviour:
I did the upgrade from v5 yesterday and ran into this problem:
[✓] Deleting existing list cache
[✗] DNS resolution is currently unavailable
[i] Waiting until DNS resolution is available...............................................................
[i] Waiting until DNS resolution is available...............................................................
[i] Waiting until DNS resolution is available...............................................................
[i] Waiting until DNS resolution is available...............................................................
[i] Waiting until DNS resolution is available...............................................................^C
[i] User-abort detected
[✓] Cleaning up stray matter
I ended up having to do a repair install, after which Pi-hole now basically works.
Now, I have a heartbeat service which pings a remote uptime-kuma server every 15 minutes, and since the upgrade to v6 I'm observing intermittent failures.
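For context, the heartbeat itself is trivial; it's roughly a crontab entry like the following (the uptime-kuma host and push token below are placeholders, not my actual values):

*/15 * * * * curl -fsS -m 10 "https://uptime-kuma.example.org/api/push/TOKEN?status=up" >/dev/null 2>&1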
Digging further through other logs, I found this from another cron job that regularly calls another server, not on ddns.net:
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='www.hnd.bayern.de', port=443): Max retries exceeded with url: xxxxxxxx?addhr=hr_gefahr&vhs_type=std&ensvhs=&member= (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f97ce9460>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution'))
Everything's back to normal 15 minutes later.
I also checked the load on the Pi-hole Raspberry Pi at those times; all normal.
No, it's not dynamic, and I observe the behavior with other domains as well. It actually fluctuates for these "every 15 minutes" cron jobs: they work for hours, then fail at some arbitrary point in time. In the case of Python scripts, the failure shows up as
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7f77a6ef70>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution
(in this case for a domain of a service I don't self-host)
Here's a full trace of a curl request last night at 05:20:00. The request was running on the same box as Pi-hole itself, both directly on the host, so timestamps from curl and Pi-hole are in sync:
# Hostname: pi4
# Date/Time: 2025/02/24 05:20:00 (GMT+1)
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0
...
curl: (28) Resolving timed out after 10000 milliseconds
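For reference, the failing call is just a plain curl from cron; a minimal sketch of the shape of that command (the URL path is a placeholder, and I'm assuming the 10-second connect timeout is what produced the 10000 ms resolver timeout above):

curl -sS --connect-timeout 10 -o /dev/null "https://yyyyyyyy.ddns.net/some/path"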
Trace from /var/log/pihole/pihole.log
I yyyyy-ed out a self-hosted service which resolved properly at 05:20:00 and xxxxxx-ed out the service which keeps failing, even though I have now added its IP address to the custom records to eliminate any upstream issues. Note that it apparently took 18 seconds until the curl request issued at 05:20:00 was "registered" at 05:20:18 with query[A] yyyyyyyy.ddns.net from 127.0.0.1, while another request from another cron job on a different box at 05:20:00 was resolved within the same second.
Feb 24 05:20:00 dnsmasq[615232]: query[A] influxdb.xxxxxxxx.ddns.net from 192.168.0.180
Feb 24 05:20:00 dnsmasq[615232]: cached influxdb.xxxxxxxx.ddns.net is x.x.x.x
Feb 24 05:20:00 dnsmasq[615232]: query[AAAA] influxdb.xxxxxxxx.ddns.net from 192.168.0.180
Feb 24 05:20:00 dnsmasq[615232]: cached influxdb.xxxxxxxx.ddns.net is NODATA-IPv6
Feb 24 05:20:00 dnsmasq[615232]: query[A] cs.appsmith.com from 192.168.0.180
Feb 24 05:20:00 dnsmasq[615232]: cached-stale cs.appsmith.com is 3.12.32.188
Feb 24 05:20:00 dnsmasq[615232]: cached-stale cs.appsmith.com is 3.16.134.105
Feb 24 05:20:00 dnsmasq[615232]: forwarded cs.appsmith.com to 8.8.8.8
Feb 24 05:20:00 dnsmasq[615232]: query[AAAA] cs.appsmith.com from 192.168.0.180
Feb 24 05:20:00 dnsmasq[615232]: cached cs.appsmith.com is NODATA-IPv6
Feb 24 05:20:18 dnsmasq[615232]: reply cs.appsmith.com is 3.16.134.105
Feb 24 05:20:18 dnsmasq[615232]: reply cs.appsmith.com is 3.12.32.188
Feb 24 05:20:18 dnsmasq[615232]: query[AAAA] imap.gmail.com from 192.168.0.180
Feb 24 05:20:18 dnsmasq[615232]: cached-stale imap.gmail.com is 2a00:1450:400c:c0a::6d
Feb 24 05:20:18 dnsmasq[615232]: cached-stale imap.gmail.com is 2a00:1450:400c:c0a::6c
Feb 24 05:20:18 dnsmasq[615232]: forwarded imap.gmail.com to 8.8.8.8
Feb 24 05:20:18 dnsmasq[615232]: query[A] imap.gmail.com from 192.168.0.180
Feb 24 05:20:18 dnsmasq[615232]: cached-stale imap.gmail.com is 66.102.1.108
Feb 24 05:20:18 dnsmasq[615232]: cached-stale imap.gmail.com is 66.102.1.109
Feb 24 05:20:18 dnsmasq[615232]: forwarded imap.gmail.com to 8.8.8.8
Feb 24 05:20:18 dnsmasq[615232]: query[A] yyyyyyyy.ddns.net from 127.0.0.1
Feb 24 05:20:18 dnsmasq[615232]: /etc/pihole/hosts/custom.list yyyyyyyy.ddns.net is y.y.y.y
Feb 24 05:20:18 dnsmasq[615232]: query[AAAA] yyyyyyyy.ddns.net from 127.0.0.1
Feb 24 05:20:18 dnsmasq[615232]: cached-stale yyyyyyyy.ddns.net is NODATA-IPv6
Feb 24 05:20:18 dnsmasq[615232]: forwarded yyyyyyyy.ddns.net to 8.8.8.8
Feb 24 05:20:18 dnsmasq[615232]: query[A] influxdb.xxxxxxxx.ddns.net from 192.168.0.180
Feb 24 05:20:18 dnsmasq[615232]: cached influxdb.xxxxxxxx.ddns.net is x.x.x.x
Feb 24 05:20:18 dnsmasq[615232]: query[PTR] 200.0.168.192.in-addr.arpa from 127.0.0.1
Feb 24 05:20:18 dnsmasq[615232]: cached-stale 192.168.0.200 is ccu.fritz.box
Feb 24 05:20:18 dnsmasq[615232]: forwarded 200.0.168.192.in-addr.arpa to 192.168.0.1
Feb 24 05:20:18 dnsmasq[615232]: query[AAAA] imap.gmail.com from 192.168.0.180
Feb 24 05:20:18 dnsmasq[615232]: cached-stale imap.gmail.com is 2a00:1450:400c:c0a::6c
Feb 24 05:20:18 dnsmasq[615232]: cached-stale imap.gmail.com is 2a00:1450:400c:c0a::6d
Feb 24 05:20:18 dnsmasq[615232]: reply 192.168.0.200 is ccu.fritz.box
Feb 24 05:20:18 dnsmasq[615232]: query[A] imap.gmail.com from 192.168.0.180
Feb 24 05:20:18 dnsmasq[615232]: cached-stale imap.gmail.com is 66.102.1.109
Feb 24 05:20:18 dnsmasq[615232]: cached-stale imap.gmail.com is 66.102.1.108
Feb 24 05:20:18 dnsmasq[615232]: query[A] yyyyyyyy.ddns.net from 127.0.0.1
Feb 24 05:20:18 dnsmasq[615232]: /etc/pihole/hosts/custom.list yyyyyyyy.ddns.net is y.y.y.y
Feb 24 05:20:18 dnsmasq[615232]: query[AAAA] yyyyyyyy.ddns.net from 127.0.0.1
Feb 24 05:20:18 dnsmasq[615232]: cached-stale yyyyyyyy.ddns.net is NODATA-IPv6
Feb 24 05:20:18 dnsmasq[615232]: forwarded yyyyyyyy.ddns.net to 127.0.0.1#5053
Feb 24 05:20:18 dnsmasq[615232]: forwarded yyyyyyyy.ddns.net to 8.8.8.8
Feb 24 05:20:18 dnsmasq[615232]: forwarded yyyyyyyy.ddns.net to 8.8.4.4
Feb 24 05:20:18 dnsmasq[615232]: forwarded yyyyyyyy.ddns.net to 208.67.222.222
Feb 24 05:20:18 dnsmasq[615232]: forwarded yyyyyyyy.ddns.net to 208.67.220.220
The Pi 4's beszel graph at that time confirms this was not caused by a maxed-out CPU either.
Your curl output doesn't contain a domain, so it's unclear whether it refers to yyyyy or xxxxx (or maybe some other domain).
In addition:
There seems to be no yyyyy query at 05:20, only xxxxx?
Note that queries for yyyyy as well as for xxxxx have been answered instantly, so what you observe is not a delay in resolution.
As Pi-hole registers requests instantly, it would seem that the process issuing the request did not communicate with Pi-hole any earlier than the logged time.
What's remarkable about your output is that there is a sudden jump from 05:20:00 to 05:20:18.
Just to be sure, you didn't perhaps strip some lines from that output to focus on your issue, did you?
Would there be any activities logged in FTL.log around that time?
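For example, something along these lines should show anything FTL logged around then (assuming the default v6 log location and timestamp format):

sudo grep '2025-02-24 05:2' /var/log/pihole/FTL.log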
To clarify, the request for influxdb.xxxxxxxx.ddns.net was resolved within the same second it was triggered by the respective cron job from another box at 05:20:00.
At the same time, at 05:20:00, a second cron job directly on the Pi-hole host called curl (the trace above); it gave up after 10 seconds, and the respective request was only logged by Pi-hole at 05:20:18.
And no, I did not cut any lines from the log between 05:20:00 and 05:20:18.
Empty, nothing logged at that time in FTL.log.
The two cron jobs I traced here run every 15 minutes; there's never an issue, except roughly once every night (there was a similar pattern around 3 am last night).
Feb 24 10:35:00 dnsmasq[615232]: query[A] influxdb.yyyyyyy.ddns.net from 192.168.0.180
Feb 24 10:35:00 dnsmasq[615232]: cached-stale influxdb.yyyyyyy.ddns.net is y.y.y.y
Feb 24 10:35:00 dnsmasq[615232]: forwarded influxdb.yyyyyyy.ddns.net to 8.8.8.8
Feb 24 10:35:00 dnsmasq[615232]: query[AAAA] influxdb.yyyyyyy.ddns.net from 192.168.0.180
Feb 24 10:35:00 dnsmasq[615232]: cached influxdb.yyyyyyy.ddns.net is NODATA-IPv6
Feb 24 10:35:00 dnsmasq[615232]: query[AAAA] cs.appsmith.com from 192.168.0.180
Feb 24 10:35:00 dnsmasq[615232]: cached cs.appsmith.com is NODATA-IPv6
Feb 24 10:35:00 dnsmasq[615232]: reply influxdb.yyyyyyy.ddns.net is y.y.y.y
Feb 24 10:35:00 dnsmasq[615232]: query[A] cs.appsmith.com from 192.168.0.180
Feb 24 10:35:00 dnsmasq[615232]: cached-stale cs.appsmith.com is 3.16.134.105
Feb 24 10:35:00 dnsmasq[615232]: cached-stale cs.appsmith.com is 3.12.32.188
Feb 24 10:35:00 dnsmasq[615232]: forwarded cs.appsmith.com to 8.8.8.8
Feb 24 10:35:00 dnsmasq[615232]: reply cs.appsmith.com is 3.16.134.105
Feb 24 10:35:00 dnsmasq[615232]: reply cs.appsmith.com is 3.12.32.188
Feb 24 10:35:00 dnsmasq[615232]: query[A] xxxxxx.ddns.net from 127.0.0.1
Feb 24 10:35:00 dnsmasq[615232]: /etc/pihole/hosts/custom.list xxxxxx.ddns.net is x.x.x.x
As explained, queries for yyyyy as well as for xxxxx have all been answered within the second they registered, so resolution speed isn't your issue.
The difference is that Pi-hole sees the request arriving 18 seconds after your cron job issued it.
That leaves us with no indication that Pi-hole is involved.
If Pi-hole had been restarting or updating its gravity database at that time, that would have registered in FTL.log.
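(If you want to rule out the nightly gravity run explicitly, its schedule is in the cron file the installer places, e.g.:

cat /etc/cron.d/pihole

but, as said, a gravity run at that time would also have left traces in FTL.log.)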
Then the sudden jump really is remarkable.
I'm not entirely sure, but I think pihole-FTL/dnsmasq defaults to a 10-second timeout, yet we see an upstream answer arriving 18 seconds later (the cs.appsmith.com reply at 05:20:18 after being forwarded at 05:20:00).
Another interesting detail is that queries for yyyyyyyy.ddns.net A and AAAA records from 127.0.0.1 appear twice in the same second:
Feb 24 05:20:18 dnsmasq[615232]: query[A] yyyyyyyy.ddns.net from 127.0.0.1
Feb 24 05:20:18 dnsmasq[615232]: /etc/pihole/hosts/custom.list yyyyyyyy.ddns.net is y.y.y.y
Feb 24 05:20:18 dnsmasq[615232]: query[AAAA] yyyyyyyy.ddns.net from 127.0.0.1
Feb 24 05:20:18 dnsmasq[615232]: cached-stale yyyyyyyy.ddns.net is NODATA-IPv6
(…)
Feb 24 05:20:18 dnsmasq[615232]: query[PTR] 200.0.168.192.in-addr.arpa from 127.0.0.1
Feb 24 05:20:18 dnsmasq[615232]: cached-stale 192.168.0.200 is ccu.fritz.box
(…)
Feb 24 05:20:18 dnsmasq[615232]: query[A] yyyyyyyy.ddns.net from 127.0.0.1
Feb 24 05:20:18 dnsmasq[615232]: /etc/pihole/hosts/custom.list yyyyyyyy.ddns.net is y.y.y.y
Feb 24 05:20:18 dnsmasq[615232]: query[AAAA] yyyyyyyy.ddns.net from 127.0.0.1
Does your cron job perhaps run twice?
Also, your .180 seems pretty active at that time, but it's also silent between 05:20:00 and 05:20:18, which makes me wonder whether your system may have adjusted its clock at that time?
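To check for a clock step, systemd-timesyncd normally logs adjustments to the journal; something like this (adjust the time window as needed) would show them:

sudo journalctl -u systemd-timesyncd --since '2025-02-24 05:15' --until '2025-02-24 05:25'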
I just added a second Pi-hole (v5, started on an older Pi Zero) as well as 8.8.8.8 as additional servers to /etc/systemd/resolved.conf on the .26 Pi-hole:
resolvectl status
Global
Protocols: +LLMNR +mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: foreign
DNS Servers: 127.0.0.1 192.168.0.111 8.8.8.8
Maybe that will make a difference if 127.0.0.1 doesn't respond in time.
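The change itself is just the DNS= line in /etc/systemd/resolved.conf, roughly like this:

[Resolve]
DNS=127.0.0.1 192.168.0.111 8.8.8.8

followed by sudo systemctl restart systemd-resolved to apply it.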
So .180 has been sending lookups, repeating them and finally expanding them with the search domain, but didn't get a reply, while the corresponding requests either failed to register in Pi-hole or did so only much later.
And as your cron job doesn't run twice, that suggests that curl also went into repeating its request.
This may suggest that DNS requests may have been queued.
Depending on dns.replyWhenBusy, Pi-hole may silently drop queries while writing to its database, but that write usually takes just milliseconds unless under heavy load, and you'd still see the queries registering, making it rather unlikely that's the case here.
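You can read that setting the same way as any other FTL config key:

sudo pihole-FTL --config dns.replyWhenBusy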
Nevertheless, let's see how many DNS requests Pi-hole had to handle at that time.
What's the output of:
sudo pihole-FTL sqlite3 /etc/pihole/pihole-FTL.db "SELECT count(*) FROM queries \
WHERE timestamp > strftime('%s','2025-02-24 05:21', '-3 minutes', 'utc') \
AND timestamp <= strftime('%s','2025-02-24 05:21', 'utc');"
And also:
sudo pihole-FTL --config database.DBinterval
I was just going to ask what your /etc/resolv.conf on your Pi-hole machine looks like: Perhaps curl (or rather the lookup initiated by it) has been trying to talk to another nameserver from resolv.conf before it opted to ask Pi-hole?
But I see that you've changed it in the meantime.
The comment at the top of resolv.conf suggests it's under the control of resolvconf.
So Pi-hole commits queries from its in-memory database to its long-term database once every 60 seconds (which is the default), and there were 12 queries in the 3 minutes around the time of your observation.
That's far from being enough to have Pi-hole drop requests.
Not with regard to your observation, at least as long as your resolv.conf doesn't point to 127.0.0.53.
But pihole-FTL may fail to start if port 53 is already bound (e.g. during a future reboot), so it's better to disable systemd-resolved's stub resolver.
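If it's not already in place (the installer sometimes drops one in when it detects the stub listener), a sketch of how to add such a drop-in and apply it:

sudo mkdir -p /etc/systemd/resolved.conf.d
printf '[Resolve]\nDNSStubListener=no\n' | sudo tee /etc/systemd/resolved.conf.d/90-pi-hole-disable-stub-listener.conf
sudo systemctl restart systemd-resolved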
/etc/systemd/resolved.conf.d/90-pi-hole-disable-stub-listener.conf does not exist. In fact, there's no /etc/systemd/resolved.conf.d directory at all; there's only /etc/systemd/resolved.conf.
All the changes I applied yesterday didn't make any difference; same problem with that ddns.net domain. I think I'll switch it to another top-level domain to see if that makes a difference.
However, I'm still not convinced my pihole v6 is fully stable:
Feb 25 10:19:10 dnsmasq[615232]: query[A] pool.ntp.org from 127.0.0.1
Feb 25 10:19:10 dnsmasq[615232]: forwarded pool.ntp.org to 8.8.8.8
Feb 25 10:19:10 dnsmasq[615232]: query[AAAA] pool.ntp.org from 127.0.0.1
Feb 25 10:19:10 dnsmasq[615232]: forwarded pool.ntp.org to 8.8.8.8
Feb 25 10:19:10 dnsmasq[615232]: query[A] pool.ntp.org from 127.0.0.1
Feb 25 10:19:10 dnsmasq[615232]: forwarded pool.ntp.org to 8.8.8.8
Feb 25 10:19:10 dnsmasq[615232]: forwarded pool.ntp.org to 8.8.4.4
Feb 25 10:19:10 dnsmasq[615232]: forwarded pool.ntp.org to 208.67.222.222
Feb 25 10:19:10 dnsmasq[615232]: forwarded pool.ntp.org to 208.67.220.220
Feb 25 10:19:10 dnsmasq[615232]: query[AAAA] pool.ntp.org from 127.0.0.1
Feb 25 10:19:10 dnsmasq[615232]: forwarded pool.ntp.org to 8.8.8.8
Feb 25 10:19:10 dnsmasq[615232]: forwarded pool.ntp.org to 8.8.4.4
Feb 25 10:19:10 dnsmasq[615232]: forwarded pool.ntp.org to 208.67.222.222
Feb 25 10:19:10 dnsmasq[615232]: forwarded pool.ntp.org to 208.67.220.220
Feb 25 10:19:10 dnsmasq[615232]: reply pool.ntp.org is 195.201.107.151
Feb 25 10:19:10 dnsmasq[615232]: reply pool.ntp.org is 141.144.246.224
Feb 25 10:19:10 dnsmasq[615232]: reply pool.ntp.org is 185.232.69.65
Feb 25 10:19:10 dnsmasq[615232]: reply pool.ntp.org is 188.245.97.96
Feb 25 10:19:10 dnsmasq[615232]: reply pool.ntp.org is NODATA-IPv6
Just an update/observation here: I've moved the 15-minute cron job that was previously running on the Pi-hole host itself to a different box, where it has now run for two nights without hiccups. So it seems the issue is related to lookups from a client on the Pi-hole host itself, which fail every once in a while.
Those NTP messages can be ignored, as your log demonstrates that resolution succeeded just 4 seconds later.
That may suggest that the cron job has been triggering the very issue it was observing. It's probably related to how curl handled the lookup.
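Since you've already pinned the IP in Pi-hole's custom records, one way to take resolution out of the picture on the curl side would be --resolve, e.g. (host and IP as placeholders):

curl -sS -o /dev/null --resolve yyyyyyyy.ddns.net:443:y.y.y.y "https://yyyyyyyy.ddns.net/"

If that never hangs while the plain call still does, the stall is in the lookup on the Pi-hole host rather than in the connection itself.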
There never would have been?
systemd-resolved's stub listener would usually default to 127.0.0.53:53, which your previous output has also shown.
Perhaps resolv.conf had nameserver 127.0.0.53 in it at some time, rather than nameserver 127.0.0.1?