Strange ntp issues with router/pihole and occassional failure of DNS

Please follow the below template, it will help us to help you!

Expected Behaviour:

ntp should resolve easily from both the router and pihole when a command like sudo ntpdate -d -q pool.ntp.org works from one of the local network clients. Local clients should get issued with IP addresses by the DHCP server of the pihole.

Actual Behaviour:

ntpdate fails with DNS issues; logs show that - for example - a lookup for 2.debian.pool.ntp.org works one minute but a few seconds later will fail (shouldn't it be cached) and localhost (in the pihole logs, so presumably the rpi running pihole itself) is making 12000 requests for DNS resolution of pool.ntp.org addresses in 24hours. It actual looks up the same address twice in 1s, despite the first lookup getting a "OK (forwarded)" response [shouldn't the 2nd be OK (cached)?].

When this is happening DHCP fails. It doesn't work even with a reboot. nmap reports that port 53 is open at this time, clients (who were connected and so have IP already) can ping the pihole successfully. DHCP on the router can be turned on again (and off on the pihole) in order to fix the problem.

To repeat: This is an intermittent failure, it all works for a couple of days before failing.

Debug Token:

Currently not working, formerly https://tricorder.pi-hole.net/gwq3fmkvtq

Router info:

My router is an "EE smart hub" with a MAC of "94:6A:B0:..." suggesting it is an arcadyan product; AFAICT it's a rebranded "BT home hub" of some sort(?). Nmap tells me it's running OpenWRT but it's locked down, at least I can't get a shell on it.

It logs output such as
00:12:32, 01 Jan. NTP synchronization failure
00:12:17, 01 Jan. NTP synchronization start

This happens when restarting the router, as I have now (looks like there's no battery, or datetime isn't saved?) and at random other times over which I have no control, the router config has no NTP settings exposed.

Pihole setup and narrative of issue appearance:

I have pihole set to do DHCP, which has been working fine.

What happened was the router failed to do ntp sync, which caused the pihole to then be unrecognised, I think, losing its IP address as seen from the router. So, then no DNS worked. I fixed this by temporarily enabling DHCP (made authoratitive) on the router again, the pihole appeared then to be routable and re-disabling the router's own DHCP then appeared to make things work again.

But both pihole and router don't appear to be able to do ntp properly.

When pihole -- on which I installed ntpdate for debugging -- fails to resolve the NTP servers a client on my network can resolve and communicates perfectly well with the same servers (presumably using the pihole for the DNS); but simultaneously the router is unable to sync NTP either ...

This post here appears to describe the same issue, but claims it was fixed by re-enabling the piholes settings for "never forward" (at /admin/settings.php?tab=dns), but I have those 2 settings enabled already and haven't disabled/changed them (and the pihole was working).

This post about a Synology router sounds like it could be the same issue too, but that post was unresolved too it seems.

Let's look in the pihole.log for both queries and replies for these domains. Run this command from the Pi terminal and post the output:

sudo grep pool.ntp.org /var/log/pihole.log | tail -n50

Currently:

pihole:~ $ sudo grep pool.ntp.org /var/log/pihole.log | tail -n50
Mar  5 01:07:20 dnsmasq[1343]: reply 0.debian.pool.ntp.org is 85.199.214.102
Mar  5 01:07:20 dnsmasq[1343]: reply 0.debian.pool.ntp.org is 139.162.219.252
Mar  5 01:07:20 dnsmasq[1343]: reply 0.debian.pool.ntp.org is 93.93.129.102
Mar  5 01:07:20 dnsmasq[1343]: query[A] 1.debian.pool.ntp.org from 127.0.0.1
Mar  5 01:07:20 dnsmasq[1343]: forwarded 1.debian.pool.ntp.org to 208.67.222.222
Mar  5 01:07:20 dnsmasq[1343]: reply 1.debian.pool.ntp.org is 185.144.157.134
Mar  5 01:07:20 dnsmasq[1343]: reply 1.debian.pool.ntp.org is 82.197.188.130
Mar  5 01:07:20 dnsmasq[1343]: reply 1.debian.pool.ntp.org is 195.141.190.190
Mar  5 01:07:20 dnsmasq[1343]: reply 1.debian.pool.ntp.org is 91.209.0.19
Mar  5 01:07:20 dnsmasq[1343]: query[A] 2.debian.pool.ntp.org from 127.0.0.1
Mar  5 01:07:20 dnsmasq[1343]: forwarded 2.debian.pool.ntp.org to 208.67.222.222
Mar  5 01:07:20 dnsmasq[1343]: reply 2.debian.pool.ntp.org is 162.213.35.121
Mar  5 01:07:20 dnsmasq[1343]: reply 2.debian.pool.ntp.org is 85.199.214.101
Mar  5 01:07:20 dnsmasq[1343]: reply 2.debian.pool.ntp.org is 178.79.152.182
Mar  5 01:07:20 dnsmasq[1343]: reply 2.debian.pool.ntp.org is 195.195.221.100
Mar  5 01:07:20 dnsmasq[1343]: query[A] 3.debian.pool.ntp.org from 127.0.0.1
Mar  5 01:07:20 dnsmasq[1343]: forwarded 3.debian.pool.ntp.org to 208.67.222.222
Mar  5 01:07:20 dnsmasq[1343]: reply 3.debian.pool.ntp.org is 193.150.34.2
Mar  5 01:07:20 dnsmasq[1343]: reply 3.debian.pool.ntp.org is 37.220.20.12
Mar  5 01:07:20 dnsmasq[1343]: reply 3.debian.pool.ntp.org is 85.199.214.98
Mar  5 01:07:20 dnsmasq[1343]: reply 3.debian.pool.ntp.org is 162.213.34.249
Mar  5 01:07:20 dnsmasq[1343]: query[A] 4.debian.pool.ntp.org from 127.0.0.1
Mar  5 01:07:20 dnsmasq[1343]: forwarded 4.debian.pool.ntp.org to 208.67.222.222
Mar  5 01:07:20 dnsmasq[1343]: reply 4.debian.pool.ntp.org is NXDOMAIN
Mar  5 01:07:20 dnsmasq[1343]: query[A] 4.debian.pool.ntp.org from 127.0.0.1
Mar  5 01:07:20 dnsmasq[1343]: cached 4.debian.pool.ntp.org is NXDOMAIN
Mar  5 01:52:15 dnsmasq[494]: query[A] 2.debian.pool.ntp.org from 127.0.0.1
Mar  5 01:52:15 dnsmasq[494]: forwarded 2.debian.pool.ntp.org to 208.67.220.220
Mar  5 01:52:15 dnsmasq[494]: query[AAAA] 2.debian.pool.ntp.org from 127.0.0.1
Mar  5 01:52:15 dnsmasq[494]: forwarded 2.debian.pool.ntp.org to 208.67.220.220
Mar  5 01:52:15 dnsmasq[494]: reply 2.debian.pool.ntp.org is 185.53.93.157
Mar  5 01:52:15 dnsmasq[494]: reply 2.debian.pool.ntp.org is 85.199.214.99
Mar  5 01:52:15 dnsmasq[494]: reply 2.debian.pool.ntp.org is 85.199.214.222
Mar  5 01:52:15 dnsmasq[494]: reply 2.debian.pool.ntp.org is 95.215.175.2
Mar  5 01:52:15 dnsmasq[494]: reply 2.debian.pool.ntp.org is 2001:738:0:860:1::2
Mar  5 01:52:15 dnsmasq[494]: reply 2.debian.pool.ntp.org is 2001:67c:2e74:1::20
Mar  5 01:52:15 dnsmasq[494]: reply 2.debian.pool.ntp.org is 2001:41d0:305:2100::dd3
Mar  5 01:52:15 dnsmasq[494]: reply 2.debian.pool.ntp.org is 2001:1640:3:4::dead:beef
Mar  5 02:41:04 dnsmasq[494]: query[A] 2.debian.pool.ntp.org from 127.0.0.1
Mar  5 02:41:04 dnsmasq[494]: forwarded 2.debian.pool.ntp.org to 208.67.220.220
Mar  5 02:41:04 dnsmasq[494]: query[AAAA] 2.debian.pool.ntp.org from 127.0.0.1
Mar  5 02:41:04 dnsmasq[494]: forwarded 2.debian.pool.ntp.org to 208.67.220.220
Mar  5 02:41:04 dnsmasq[494]: reply 2.debian.pool.ntp.org is 2604:a880:800:c1::10:5006
Mar  5 02:41:04 dnsmasq[494]: reply 2.debian.pool.ntp.org is 2a00:fd80:aaaa:ffff::eeee:ff1
Mar  5 02:41:04 dnsmasq[494]: reply 2.debian.pool.ntp.org is 2a04:92c7:e:537::96cb
Mar  5 02:41:04 dnsmasq[494]: reply 2.debian.pool.ntp.org is 2001:470:1f08:137f::2
Mar  5 02:41:04 dnsmasq[494]: reply 2.debian.pool.ntp.org is 195.154.223.198
Mar  5 02:41:04 dnsmasq[494]: reply 2.debian.pool.ntp.org is 178.79.152.182
Mar  5 02:41:04 dnsmasq[494]: reply 2.debian.pool.ntp.org is 195.219.205.9
Mar  5 02:41:04 dnsmasq[494]: reply 2.debian.pool.ntp.org is 193.150.34.2

This shows that every query was correctly resolved by the upstream resolvers. If this is an intermittent issue, it may be a problem with the upstream resolvers. My first troubleshooting step would be to switch upstream resolvers.

But notice in this screen capture that it resolves 2.debian.pool.ntp.org and 5s later (a dozen NTP resolving requests later) it reports the same tertiary subdomain as unknown.

This image is to show frequency of requests, .254 is my router:

I'm not sure why the query log is showing that. Do you see matching behavior at those times in /var/log/pihole.log and /var/log/pihole.log.1

240 requests to each of the 5 ntp servers in 24 hours isn't excessive.

Don't you think I need to find why pihole and router are hammering out requests for resolution of the same 5 NTP servers at a rate of 500 per hour? Why aren't they cached by the pihole? 5s seems too short for TTL, why would the pihole be going to the upstream server again after such a short time?

Can I force caching of the servers regardless of the TTL DNS reports? Can I make the pihole return the result of pool.ntp.org or uk.pool.ntp.org queries instead of the requested N.debian.pool.ntp.org subdomains?

Sorry, lots of questions, thanks for your help.

[Apologies too if stats aren't consistent with what you're seeing, I'm looking at different time ranges to what are in the specific images shown. I don't have full records.]

You should look at the software on your pi that is requesting ntp. Check to see if you have this file; if so you can edit it for your preferred servers.

/etc/ntp.conf

This command may help you determine what is making the requests - this will show running processes with ntp in the process name or file location

ps -aux | grep ntp

Another option is to map the ntp server names in your /etc/hosts file on the Pi and that will provide the IP address you want to be used.

If you dig one of the ntp domains listed using your upstream DNS server, you will get a TTL of 150 seconds, or 2.5 minutes. That means those domains will live in cache for 2.5 minutes, and will be out of cache by the next time they are requested.

The snippet of pihole.log you posted shows a burst of requests every 45 minutes or so.

The image above of the "Recent Queries" page where the status is given as "Unknown" for the query of the N.debian.pool.ntp.org domains corresponds to parts like this in the pihole.log:

Mar  4 23:24:15 dnsmasq[492]: query[AAAA] 2.debian.pool.ntp.org from 127.0.0.1
Mar  4 23:24:15 dnsmasq[492]: config error is REFUSED
Mar  4 23:24:15 dnsmasq[492]: query[A] 2.debian.pool.ntp.org from 127.0.0.1
Mar  4 23:24:15 dnsmasq[492]: config error is REFUSED
Mar  4 23:24:15 dnsmasq[492]: query[AAAA] 2.debian.pool.ntp.org from 127.0.0.1
Mar  4 23:24:15 dnsmasq[492]: config error is REFUSED
Mar  4 23:24:15 dnsmasq[492]: query[A] 2.debian.pool.ntp.org from 127.0.0.1
Mar  4 23:24:15 dnsmasq[492]: config error is REFUSED
Mar  4 23:24:15 dnsmasq[492]: query[AAAA] 2.debian.pool.ntp.org from 127.0.0.1
Mar  4 23:24:15 dnsmasq[492]: config error is REFUSED
Mar  4 23:24:15 dnsmasq[492]: query[A] 3.debian.pool.ntp.org from 127.0.0.1
Mar  4 23:24:15 dnsmasq[492]: config error is REFUSED
Mar  4 23:24:15 dnsmasq[492]: query[AAAA] 3.debian.pool.ntp.org from 127.0.0.1
Mar  4 23:24:15 dnsmasq[492]: config error is REFUSED
Mar  4 23:24:15 dnsmasq[492]: query[A] 3.debian.pool.ntp.org from 127.0.0.1
Mar  4 23:24:15 dnsmasq[492]: config error is REFUSED

If I do grep "23:24:15" /var/log/pihole.log.1 | wc -l then the output is 156, which I think means it's making 78 request in that one second, though half are IPv6, so it's only 39 NTP requests presumably .. in 1 second? Am I misinterpreting that?

FWIW I'm using the default pihole install over 2018-11-13-raspbian-stretch-lite so NTP AIUI is via the systemd-timesyncd which (as I found a couple of days ago) doesn't turn up in a pgrep ntp (what's that aphorism about naming things ...).

This is likely the problem. The Pi is requesting the domain name, and the query isn't being resolved. So, it receives no answer, then asks repeatedy until it gets an answer. I don't see an obvious cause for refusal.

I've done a few test queries on the N.debian.pool.ntp.org addresses, using ntpdate and with web (:80), and they seem pretty flaky to me, they're running other things too, I don't really see much documentation on them(?).

Checking the /etc/systemd/timesyncd.conf file I see that the debian pool are the default there, so that's good cause I can change them, so I've added:

NTP=uk.pool.ntp.org
FallbackNTP=pool.ntp.org

Which AIUI is enough, pool allocates already, that should be enough failover, reading the ntp.org docs they seem to recommend using pool instead of more specific subdomains.

It strikes me that the root issue could be rate-limiting by the upstream DNS (OpenDNS), which seems to make the rpi [and my router] go in to "apeshit" mode and make stupid numbers of requests.

I'll try this and see if it cures things, it will take a couple of weeks before I'm sure, like I said it has been an intermittent problem.

Sounds like a plan. Good luck and reply on this thread when you have a resolution or new info. There are likely others who will run into this problem at some point.

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.