Pi-Hole/Raspian stops responding overnight, every night

Please follow the below template, it will help us to help you!

Expected Behaviour:

[Pi-Hole should run without crashing overnight]

Actual Behaviour:

_[Every night my Raspberry Pi-based Pi-Hole stops responding. It cannot be pinged and I cannot SSH on to it until I pull the power and reboot it. Looking in pihole.log shows many "config error is REFUSED" entries, which seem to start being generated between 3 and 4am and continue until I reboot:

May 7 03:11:46 dnsmasq[24841]: reply d2ihp1a8eyb4of.cloudfront.net is 143.204.192.136
May 7 03:11:46 dnsmasq[24841]: reply d2ihp1a8eyb4of.cloudfront.net is 143.204.192.85
May 7 03:11:46 dnsmasq[24841]: reply d2ihp1a8eyb4of.cloudfront.net is 143.204.192.73
May 7 03:11:46 dnsmasq[24841]: reply d2ihp1a8eyb4of.cloudfront.net is 143.204.192.79
May 7 04:00:00 dnsmasq[24841]: query[PTR] 252.2.168.192.in-addr.arpa from 127.0.0.1
May 7 04:00:00 dnsmasq[24841]: config error is REFUSED
May 7 04:00:00 dnsmasq[24841]: query[PTR] 252.2.168.192.in-addr.arpa from 127.0.0.1
May 7 04:00:00 dnsmasq[24841]: config error is REFUSED
May 7 04:00:00 dnsmasq[24841]: query[PTR] 238.2.168.192.in-addr.arpa from 127.0.0.1
May 7 04:00:00 dnsmasq[24841]: config error is REFUSED
May 7 04:00:00 dnsmasq[24841]: query[PTR] 238.2.168.192.in-addr.arpa from 127.0.0.1
May 7 04:00:00 dnsmasq[24841]: config error is REFUSED
May 7 04:00:00 dnsmasq[24841]: query[PTR] 219.2.168.192.in-addr.arpa from 127.0.0.1
May 7 04:00:00 dnsmasq[24841]: config error is REFUSED
May 7 04:00:00 dnsmasq[24841]: query[PTR] 219.2.168.192.in-addr.arpa from 127.0.0.1
May 7 04:00:00 dnsmasq[24841]: config error is REFUSED
May 7 04:00:00 dnsmasq[24841]: query[PTR] 248.2.168.192.in-addr.arpa from 127.0.0.1
May 7 04:00:00 dnsmasq[24841]: config error is REFUSED
May 7 04:00:00 dnsmasq[24841]: query[PTR] 248.2.168.192.in-addr.arpa from 127.0.0.1

Otherwise it seems to be functioning fine.
]_

Debug Token:

[td3dwch9os]

Disable Conditional Forwarding temporarily and see if that fixes it. The DNS server you are forwarding 192.168.2.x PTR queries to is refusing to answer.

Will do, thanks. I'll report back tomorrow.

Thinking about this more, and not knowing a huge amount about DNS personally, it does appear that the PTR queries are being responded too earlier on:

May 7 01:00:00 dnsmasq[24841]: query[PTR] 203.2.168.192.in-addr.arpa from 127.0.0.1
May 7 01:00:00 dnsmasq[24841]: forwarded 203.2.168.192.in-addr.arpa to 192.168.2.1
May 7 01:00:00 dnsmasq[24841]: query[PTR] 152.2.168.192.in-addr.arpa from 127.0.0.1
May 7 01:00:00 dnsmasq[24841]: forwarded 152.2.168.192.in-addr.arpa to 192.168.2.1
May 7 01:00:00 dnsmasq[24841]: query[PTR] 151.2.168.192.in-addr.arpa from 127.0.0.1
May 7 01:00:00 dnsmasq[24841]: forwarded 151.2.168.192.in-addr.arpa to 192.168.2.1
May 7 01:00:00 dnsmasq[24841]: query[PTR] 211.2.168.192.in-addr.arpa from 127.0.0.1
May 7 01:00:00 dnsmasq[24841]: forwarded 211.2.168.192.in-addr.arpa to 192.168.2.1
May 7 01:00:00 dnsmasq[24841]: query[PTR] 217.2.168.192.in-addr.arpa from 127.0.0.1

but something is causing the error around 3-4am, so could it be an issue on the router side? I'm using an Asus AC68u running the latest Merlin firmware.

Just a wild idea, not knowing which type of internet connection you have and where you are located, but this is a typical time when DSL ISP preform the mandatory 24h forced disconnection (at last here in Germany)

I'm in the UK but I'm not aware of any such overnight disconnection here.

There is another type of error as well:

May 7 05:08:48 dnsmasq[24841]: query[A] uk.pool.ntp.org from 127.0.0.1
May 7 05:08:48 dnsmasq[24841]: config error is REFUSED
May 7 05:08:48 dnsmasq[24841]: query[AAAA] uk.pool.ntp.org from 127.0.0.1
May 7 05:08:48 dnsmasq[24841]: config error is REFUSED
May 7 05:08:48 dnsmasq[24841]: query[A] uk.pool.ntp.org from 127.0.0.1
May 7 05:08:48 dnsmasq[24841]: config error is REFUSED
May 7 05:08:48 dnsmasq[24841]: query[AAAA] uk.pool.ntp.org from 127.0.0.1
May 7 05:08:48 dnsmasq[24841]: config error is REFUSED
May 7 05:42:57 dnsmasq[24841]: query[A] uk.pool.ntp.org from 127.0.0.1
May 7 05:42:57 dnsmasq[24841]: config error is REFUSED
May 7 05:42:57 dnsmasq[24841]: query[AAAA] uk.pool.ntp.org from 127.0.0.1
May 7 05:42:57 dnsmasq[24841]: config error is REFUSED
May 7 05:42:57 dnsmasq[24841]: query[A] uk.pool.ntp.org from 127.0.0.1
May 7 05:42:57 dnsmasq[24841]: config error is REFUSED
May 7 05:42:57 dnsmasq[24841]: query[AAAA] uk.pool.ntp.org from 127.0.0.1

Not as frequent as the PTR errors, but you can see the errors carrying on right until pulling the power and rebooting:

May 7 08:00:00 dnsmasq[24841]: query[PTR] 1.1.1.1.in-addr.arpa from 127.0.0.1
May 7 08:00:00 dnsmasq[24841]: config error is REFUSED
May 7 08:00:00 dnsmasq[24841]: query[PTR] 1.1.1.1.in-addr.arpa from 127.0.0.1
May 7 08:00:00 dnsmasq[24841]: config error is REFUSED
May 7 07:17:17 dnsmasq[604]: started, version pi-hole-2.80 cachesize 10000
May 7 07:17:17 dnsmasq[604]: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth DNSSEC loop-detect inotify dumpfile

Your clock is off.

May 7 0800 REFUSED
Reboot (Pulling the power on a running Raspberry Pi device is a very bad idea and will kill SDcards quite readily.)
May 7 0700 dnsmasq start up.

Edit: Accurate time is very important for things like DNSSEC, time errors will result in loss of DNS capability.

I don't think it is? During the start sequence you can see the ntp time sync a few lines lower down:

May 7 07:17:35 dnsmasq[604]: forwarded uk.pool.ntp.org to 1.0.0.1
May 7 07:17:35 dnsmasq[604]: forwarded uk.pool.ntp.org to 1.1.1.1
May 7 07:17:35 dnsmasq[604]: query[AAAA] uk.pool.ntp.org from 127.0.0.1
May 7 07:17:35 dnsmasq[604]: forwarded uk.pool.ntp.org to 1.0.0.1
May 7 07:17:35 dnsmasq[604]: forwarded uk.pool.ntp.org to 1.1.1.1
May 7 07:17:35 dnsmasq[604]: reply uk.pool.ntp.org is 85.199.214.101
May 7 07:17:35 dnsmasq[604]: reply uk.pool.ntp.org is 85.199.214.100
May 7 07:17:35 dnsmasq[604]: reply uk.pool.ntp.org is 81.21.65.169
May 7 07:17:35 dnsmasq[604]: reply uk.pool.ntp.org is 176.58.109.199
May 7 07:17:35 dnsmasq[604]: reply uk.pool.ntp.org is NODATA-IPv6
May 7 08:05:54 dnsmasq[604]: query[A] api.github.com from 127.0.0.1
May 7 08:05:54 dnsmasq[604]: forwarded api.github.com to 1.0.0.1
May 7 08:05:54 dnsmasq[604]: forwarded api.github.com to 1.1.1.1

Just after 8am was when I rebooted.

I appreciate that pulling the power is a bad idea, but if the Pi won't respond to ping or SSH is there any alternative?

dnsmasq[24841] 08:00:00
POWER KILLED
dnsmasq[604] 07:14:17 (It rebooted to before it died.)

But with no real time battery backed clock, how can my Pi know what the time is until it syncs via ntp?

The fact that it received a reply for the DNS address of the time server does not establish that the time was set correctly. If the time is far off from actual, syncing with an ntp server may not restore the time.

Run the date command and see if the current time on the Pi matches your local time.

Take a look in /var/log/syslog for the last few nights and see if there is any related activity about the times where you start to see problems.

So the same thing happened again last night. The date command returns the correct time (and the same time as running the date on my router).

May 8 03:25:54 dnsmasq[20568]: query[A] dht.libtorrent.org from 192.168.2.248
May 8 03:25:54 dnsmasq[20568]: forwarded dht.libtorrent.org to 1.0.0.1
May 8 03:25:54 dnsmasq[20568]: query[A] www.googleapis.com from 192.168.2.248
May 8 03:25:54 dnsmasq[20568]: forwarded www.googleapis.com to 1.0.0.1
May 8 03:25:54 dnsmasq[20568]: query[A] play.googleapis.com from 192.168.2.248
May 8 03:25:54 dnsmasq[20568]: forwarded play.googleapis.com to 1.0.0.1
May 8 04:00:00 dnsmasq[20568]: query[PTR] 151.2.168.192.in-addr.arpa from 127.0.0.1
May 8 04:00:00 dnsmasq[20568]: config error is REFUSED
May 8 04:00:00 dnsmasq[20568]: query[PTR] 151.2.168.192.in-addr.arpa from 127.0.0.1
May 8 04:00:00 dnsmasq[20568]: config error is REFUSED
May 8 04:00:00 dnsmasq[20568]: query[PTR] 219.2.168.192.in-addr.arpa from 127.0.0.1
May 8 04:00:00 dnsmasq[20568]: config error is REFUSED
May 8 04:00:00 dnsmasq[20568]: query[PTR] 219.2.168.192.in-addr.arpa from 127.0.0.1
May 8 04:00:00 dnsmasq[20568]: config error is REFUSED

The syslog at this time shows the following:

May 8 03:39:04 pi-hole systemd[1]: phpsessionclean.service: Succeeded.
May 8 03:39:04 pi-hole systemd[1]: Started Clean php session files.
May 8 03:40:02 pi-hole CRON[9429]: (root) CMD ( PATH="$PATH:/usr/local/bin/" pihole updatechecker local)
May 8 03:50:01 pi-hole CRON[9471]: (root) CMD ( PATH="$PATH:/usr/local/bin/" pihole updatechecker local)
May 8 04:00:01 pi-hole CRON[9513]: (root) CMD ( PATH="$PATH:/usr/local/bin/" pihole updatechecker local)
May 8 04:05:23 pi-hole rngd[390]: stats: bits received from HRNG source: 660064
May 8 04:05:23 pi-hole rngd[390]: stats: bits sent to kernel pool: 604608
May 8 04:05:23 pi-hole rngd[390]: stats: entropy added to kernel pool: 604608
May 8 04:05:23 pi-hole rngd[390]: stats: FIPS 140-2 successes: 33

The syslog entries continue right up until 09:00 when I reboot the Pi:
May 8 08:39:01 pi-hole CRON[11385]: (root) CMD ( [ -x /usr/lib/php/sessionclean ] && if [ ! -d /run/systemd/system ]; then /usr/lib/php/sessionclean; fi)
May 8 08:39:04 pi-hole systemd[1]: Starting Clean php session files...
May 8 08:39:04 pi-hole systemd[1]: phpsessionclean.service: Succeeded.
May 8 08:39:04 pi-hole systemd[1]: Started Clean php session files.
May 8 08:40:01 pi-hole CRON[11438]: (root) CMD ( PATH="$PATH:/usr/local/bin/" pihole updatechecker local)
May 8 08:50:01 pi-hole CRON[11480]: (root) CMD ( PATH="$PATH:/usr/local/bin/" pihole updatechecker local)
May 8 09:00:01 pi-hole CRON[11527]: (root) CMD ( PATH="$PATH:/usr/local/bin/" pihole updatechecker local)
May 8 08:17:05 pi-hole fake-hwclock[106]: Fri 8 May 07:17:01 UTC 2020
May 8 08:17:05 pi-hole systemd[1]: Started udev Coldplug all Devices.
May 8 08:17:05 pi-hole systemd[1]: Starting Helper to synchronize boot up for ifupdown...

At the time of rebooting attempts to SSH on to or ping the Pi fail. As I see it there are two issues here. Firstly when are the connection requests refused at around 4am when they have been successful for 20 hours and then why does this cause the Pi to stop responding to SSH and pings?

There is again a leap in time.


What do you use as upstream DNS Server?
If

part of the problem might be there.

That leap in time is just lack of a battery backed clock, so the Pi does't know what time it is until NTP syncs it later in the boot process.

The problem is resolved now though, and I feel quite stupid. Looking through the syslog with a friend who knows more about Linux logs than I ever will he picked out the following:

May 8 03:25:55 pi-hole dhcpcd[557]: eth0: 10 second defence failed for 192.168.2.155
May 8 03:25:55 pi-hole dhcpcd[557]: eth0: deleting route to 192.168.2.0/24
May 8 03:25:55 pi-hole dhcpcd[557]: eth0: deleting default route via 192.168.2.1
May 8 03:25:55 pi-hole avahi-daemon[366]: Withdrawing address record for 192.168.2.155 on eth0.
May 8 03:25:55 pi-hole avahi-daemon[366]: Leaving mDNS multicast group on interface eth0.IPv4 with address 192.168.2.155.
May 8 03:25:55 pi-hole avahi-daemon[366]: Interface eth0.IPv4 no longer relevant for mDNS.

I have a Humax hard disk recorder running custom firmware that I set up years ago, and guess what? I chose the same static IP address for it that I used for my Pi, and around 3-4am every night the Humax box wakes up for maintenance purposes.....

All good now, and many thanks to people who suggested things above,.

Glad you solved it :slight_smile:

Properly shutting things down instead of just pulling the plug will solve that. Shutdown process writes the current time to the fakeclock.

Agreed, but if the network stack is down (which it was in my case), not sure how else I could have shut down?

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.