PiHole suddenly crashing/unreachable after OS update/upgrade


Expected Behaviour:

Raspbian Buster, PiZero W. PiHole running fine for about 1 year.

Actual Behaviour:

Today I ran update and full-upgrade on the Raspi, then ran pihole -up. Soon afterwards the PiHole crashed and became unreachable by ping. I have now installed a new PiHole onto a new SD card and finally swapped onto a spare PiZero W, but it is still exactly the same... crashing every hour or so. I have not changed anything else on my network.

Debug Token:

_https://tricorder.pi-hole.net/ugxep73j4v_

Your debug log is normal.

This problem may lie outside Pi-hole, either in OS-level software or in hardware. You were not able to ping the device, and that is an OS-level function. Were you pinging the domain name or the IP?

Do any of the following logs show any problems just prior to and around/after the time of one of the crashes?

/var/log/pihole.log

/var/log/pihole-FTL.log

/var/log/syslog

Or the command dmesg
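As a rough aid for the log check suggested above, here is a minimal Python sketch that keeps only the syslog lines within a few minutes of a known crash time. It assumes the classic `Mon D HH:MM:SS` timestamp prefix seen in Raspbian's /var/log/syslog and an English locale for month names. The function names and the year passed in are placeholders, since syslog lines carry no year.

```python
from datetime import datetime, timedelta


def parse_stamp(line, year):
    """Parse the leading 'Mon D HH:MM:SS' stamp; return None if the line has none."""
    parts = line.split(None, 3)  # split() copes with the padded day ('Mar  2')
    if len(parts) < 3:
        return None
    try:
        return datetime.strptime(
            f"{year} {parts[0]} {parts[1]} {parts[2]}", "%Y %b %d %H:%M:%S"
        )
    except ValueError:
        return None


def lines_near(lines, crash_time, minutes=5):
    """Return the lines stamped within +/- `minutes` of crash_time."""
    window = timedelta(minutes=minutes)
    hits = []
    for line in lines:
        stamp = parse_stamp(line, crash_time.year)
        if stamp is not None and abs(stamp - crash_time) <= window:
            hits.append(line.rstrip("\n"))
    return hits
```

To use it, open /var/log/syslog (usually readable only by root or the adm group), pass the file object as `lines`, and give `crash_time` as a `datetime` for the moment the ping stopped answering.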

Pinging PiHole directly on 192.168.1.193. To try to resolve it, I installed a fresh copy of PiHole onto a new Raspbian Buster image on a brand new SD card. Still the same, so I migrated over to a spare PiZero W. Coincidental that it only fell over after sudo apt update/full-upgrade and pihole -up.

Thanks jfb, I’ll look at the logs.

Colin

With a fresh install of Pi-hole, the Pi-hole update will do nothing. So, the failure appears to lie in the OS updates/upgrades.

/var/log/syslog: It’s referring to 192.168.1.170. PiHole is on 192.168.1.193.

Mar 1 00:07:50 raspberrypi dhcpcd[336]: wlan0: hardware address 00:00:00:00:00:00 claims 192.168.1.170
Mar 1 00:07:51 raspberrypi dhcpcd[336]: wlan0: hardware address 00:00:00:00:00:00 claims 192.168.1.170
Mar 1 00:07:51 raspberrypi dhcpcd[336]: wlan0: 10 second defence failed for 192.168.1.170
Mar 1 00:07:51 raspberrypi avahi-daemon[281]: Withdrawing address record for 192.168.1.170 on wlan0.
Mar 1 00:07:51 raspberrypi avahi-daemon[281]: Leaving mDNS multicast group on interface wlan0.IPv4 with address 192.168.1.170.
Mar 1 00:07:51 raspberrypi dhcpcd[336]: wlan0: deleting route to 192.168.1.0/24

I’ll poke into my router ip tables. I have many RasPi’s on my network. .170 was a new one that I set up yesterday, but no idea why it decided to clash with .193. My problem :frowning:

It’s all good fun.

So, I suppose I did change something on my network that introduced the .170, but I still don’t see how it’s messed up the PiHole on .193. Leave it with me and I’ll do some more poking. It’s still up for now.

It’ll be something stupid. Probably my Talktalk router… I’ll blame that.

.170?... that’s interesting though.

Hi jfb,

My PiHole was unreachable again when I came home from work tonight, so I rebooted it and started a continuous ping. It ran fine again until 22.54, then this happened: (copied from /var/log/syslog)

Mar 2 22:40:01 raspberrypi CRON[2096]: (root) CMD ( PATH="$PATH:/usr/sbin:/usr/local/bin/" pihole updatechecker local)
Mar 2 22:50:02 raspberrypi CRON[2269]: (root) CMD ( PATH="$PATH:/usr/sbin:/usr/local/bin/" pihole updatechecker local)
Mar 2 22:54:26 raspberrypi dhcpcd[346]: wlan0: hardware address 00:00:00:00:00:00 claims 192.168.1.193
Mar 2 22:54:27 raspberrypi dhcpcd[346]: wlan0: hardware address 00:00:00:00:00:00 claims 192.168.1.193
Mar 2 22:54:27 raspberrypi dhcpcd[346]: wlan0: 10 second defence failed for 192.168.1.193
Mar 2 22:54:27 raspberrypi avahi-daemon[306]: Withdrawing address record for 192.168.1.193 on wlan0.
Mar 2 22:54:27 raspberrypi avahi-daemon[306]: Leaving mDNS multicast group on interface wlan0.IPv4 with address 192.168.1.193.
Mar 2 22:54:27 raspberrypi dhcpcd[346]: wlan0: deleting route to 192.168.1.0/24
Mar 2 22:54:27 raspberrypi dhcpcd[346]: wlan0: deleting default route via 192.168.1.1
Mar 2 22:54:27 raspberrypi avahi-daemon[306]: Interface wlan0.IPv4 no longer relevant for mDNS.
Mar 2 22:54:28 raspberrypi systemd[1]: dhcpcd.service: Main process exited, code=killed, status=11/SEGV
Mar 2 22:54:28 raspberrypi avahi-daemon[306]: Interface wlan0.IPv6 no longer relevant for mDNS.
Mar 2 22:54:28 raspberrypi avahi-daemon[306]: Leaving mDNS multicast group on interface wlan0.IPv6 with address fdaa:bbcc:ddee:0:96b:cbbf:8e6c:be67.
Mar 2 22:54:28 raspberrypi avahi-daemon[306]: Withdrawing address record for fdaa:bbcc:ddee:0:96b:cbbf:8e6c:be67 on wlan0.
Mar 2 22:54:28 raspberrypi systemd[1]: dhcpcd.service: Failed with result 'signal'.

That was the last syslog report entry before it dropped off. What’s with it withdrawing IPv4 and IPv6 address records and deleting routes to network and router?

The Pi stayed up for about 55 minutes between rebooting and it becoming unreachable again.

I’ve never seen it going loopy like this before.

Colin

Colin,

Created an account to comment on your thread. My Pi Zero W recently had (and still has) the exact same problem that started after I updated it as well. My Pi Zero W is the secondary Pihole on my LAN. Primary Pihole is on a Pi 3B+. I chased this issue around for the better part of 2 weeks, trying a bunch of different fixes, including doing what you've done as well. I'm fairly certain this behavior is caused by the group rekeying happening with the router and the Pi, which happens every hour (or a configurable amount of time, depending on your router). I never solved the root issue, but I've implemented a bandaid to restart dhcpcd if the Pi can't ping the router by running a cronjob every 5 min. This is obviously less than ideal if this is your primary Pihole. I'd be curious to know why the Pi Zero W seems to be struggling with group rekeying - something I haven't had time to research in depth. Hope this helps.
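The band-aid described above (restart dhcpcd when the router stops answering pings) could look roughly like the Python sketch below. This is not the poster's actual script. The function names are made up for illustration, the gateway address is whatever your router uses, and the `-W 2` timeout flag assumes the Linux iputils version of ping. Restarting the service needs root.

```python
import subprocess


def router_reachable(gateway, run=subprocess.run):
    """Send a single ping with a 2 second timeout; True if the gateway replied."""
    result = run(
        ["ping", "-c", "1", "-W", "2", gateway],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def restart_dhcpcd(run=subprocess.run):
    """Ask systemd to restart the dhcpcd service (requires root)."""
    run(["systemctl", "restart", "dhcpcd"], check=False)


def check_once(gateway, run=subprocess.run):
    """Restart dhcpcd only if the gateway is unreachable; return True if restarted."""
    if router_reachable(gateway, run):
        return False
    restart_dhcpcd(run)
    return True
```

A root cron entry on a `*/5 * * * *` schedule that calls `check_once` with your router's address would match the 5 minute interval mentioned above. The `run` parameter exists only so the logic can be exercised without touching the network.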

This doesn't seem related to Pi-hole.

This is your RPi trying to reacquire its IP address (presumably after its WiFi connection dropped; I'd guess a "carrier lost" message precedes your log excerpt).
But 192.168.1.193 is already claimed by another h/w address (which would be your RPi, of course), so the check fails and the IP and its associated connection are dropped as well.

It's really odd that dhcpcd5 is failing with a segmentation fault after that (code=killed, status=11/SEGV). That shouldn't happen at all and may constitute a bug in dhcpcd5.
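To see how often this sequence occurs in a longer log, a small Python sketch like the one below can count the three events involved: the address claim, the failed defence, and the dhcpcd segfault. The regular expressions are modelled only on the syslog lines quoted earlier in this thread, so other dhcpcd versions may word these messages differently. The names `PATTERNS`, `classify` and `summarize` are illustrative.

```python
import re

# Patterns modelled on the syslog excerpts quoted in this thread.
PATTERNS = {
    "conflict": re.compile(
        r"dhcpcd\[\d+\]: \w+: hardware address \S+ claims \S+"
    ),
    "defence_failed": re.compile(
        r"dhcpcd\[\d+\]: \w+: \d+ second defence failed for \S+"
    ),
    "segfault": re.compile(
        r"dhcpcd\.service: Main process exited, code=killed, status=11/SEGV"
    ),
}


def classify(line):
    """Return the event name for a syslog line, or None if it matches nothing."""
    for name, pattern in PATTERNS.items():
        if pattern.search(line):
            return name
    return None


def summarize(lines):
    """Count how many lines fall into each event category."""
    counts = {name: 0 for name in PATTERNS}
    for line in lines:
        kind = classify(line)
        if kind is not None:
            counts[kind] += 1
    return counts
```

Running `summarize` over a whole syslog gives a quick picture of whether the conflicts recur at a regular interval or only occasionally.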

This is indeed a system error, so you should consider consulting RPi support as well.

Just a hunch:
Does the same error occur if you try to run from a fresh image without doing a full-upgrade?

This discussion on PiAware seems related, exhibiting the same symptoms:

The developer of dhcpcd5 was indirectly involved in at least fixing the seg fault, but the triggering condition seems to be caused by misbehaving routers.

On first glance, I couldn't find a backport for Buster, nor does a newer dhcpcd5 release seem to be planned for Debian's next release.

I was going to do as Bucking_Horn suggested... fresh image without doing a full-upgrade but before I got that far, I decided to reset my router to factory defaults, and PiHole has now been up and stable for about 20 hours which is 20 times better than it was. I haven't touched the Pi or its config so I can only assume that some conflict had developed between my router and the Pi OS. I hadn't changed any router config since I set up the primary DNS to the PiHole over a year ago. So, all good now. PiHole is back doing its stuff... sad faces in place of adverts, and I can get back to trying to build my network controlled tank robot with Arduino and RasPi, but that's getting way off topic now. (I can see many headaches coming with that one). If it's of any interest or value here, my router is a TalkTalk Wi-Fi Hub (Model: Sagem FAST 5364-X.TY). Thanks for all your suggestions and input guys... Very much appreciated.

Glad to hear your Pihole is back in business. An update on my Pi Zero W after reading through most of the thread Bucking_Horn linked to. It would certainly seem my router, a Zyxel c3000z, is guilty of sending bad ARP data to my Pis (both Zero and 3B+). Since I have the newest firmware, there isn't much I can do about the router unless I want to replace it. So, what I did was add the "noarp" entry to dhcpcd.conf and restart the service. I also removed the other bandaid approach I had been using, as explained previously. This noarp fix also appears to have resolved the issue: I haven't seen a single entry with the errors you posted above, and my Pi Zero W has remained online for 24 hours now. My basic understanding is that disabling the ARP check on the Pi is probably low risk and low impact for a home network, especially when using static IPs for the Pihole devices. Something to keep in mind if the problem returns. Anyway, good luck with the next project - also appreciate the advice provided in this thread.
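For anyone wanting to script the noarp change described above, here is a minimal Python sketch that adds the option to the text of a dhcpcd.conf only if it is not already there. The helper name `ensure_option` is made up for this example, and /etc/dhcpcd.conf as the file location is an assumption based on a stock Raspbian install. Note that an option appended to the end of the file falls inside the last `interface` block if one exists, so check where it lands before restarting the service.

```python
def ensure_option(conf_text, option="noarp"):
    """Return conf_text with `option` on its own line, adding it only if missing.

    A commented-out line such as '#noarp' does not count as present.
    """
    for line in conf_text.splitlines():
        if line.strip() == option:
            return conf_text  # already enabled, leave the file untouched
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"
    return conf_text + option + "\n"
```

To apply it, read /etc/dhcpcd.conf as root, write back the result of `ensure_option`, then restart dhcpcd so the change takes effect.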

That's good to know. The PiHole is still up so I'll leave things as they are just now, but I'll keep a note of the noarp fix. I've got another Pi Zero, a 3B and two Pi 4's permanently online doing various things but none of them have had the same problem. I'll maybe have a look at syslog on them and compare it to the PiHole Zero. I'm no expert at diagnosing problems like this, so any assistance is always welcome... but I don't make a habit of it! I have some experience with Cisco and many other fixed and radio network devices from 40 years in telecoms... I work with cables and fibres every day, then I come home and do similar stuff here. I really should get out more :slight_smile: Thanks again all.

Update: My PiHole kept dropping off after my last post. I tried swapping again from one PiZero W to another, but it still didn't stay up for more than an hour. I also tried the suggested 'noarp' fix in dhcpcd.conf, but that didn't improve things either. I've now installed it on a Pi 3B hard-wired to my router, and that seems to be holding up. The only common thing I can see here is that it doesn't like the wlan0 interface on any PiZero W.

I am running latest Pi-hole on latest Raspbian Buster on Wifi with no issues.

I'm not looking to lock horns with you on this. My PiHole ran faultlessly on WiFi for well over a year before going south. I'm now just trying to figure out what has caused it to fail. As previously intimated, I binned my previous setup and migrated it onto an all new PiZero board, new Micro SD card with new Buster image and new PiHole install, but it still failed after about one hour.

I’m not a software engineer by any measure, but I do know about swapping hardware to diagnose faults in telecoms networks. I’ve been doing it for 40 years.

The point I was making is that it’s now broken on both of my PiZero W’s, but is now running very happily on my hard-wired Pi 3B.

It seems to me that your original point was that no PiZero W is capable of running Pi-hole without crashing. JFB's response wasn't 'locking horns' with you but instead providing a data point that Pi-hole does in fact run fine on PiZeroW's.

The point that I was making is that this appears to be unique to your install, and does not appear related to any recent OS updates. I have latest OS on multiple Zero W wifi with Pi-hole, and all works. I don't think this is something that was done by an OS install.

Have you checked the sleep settings for the WiFi adapter on the Zero W?
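If "sleep settings" here means WiFi power saving, one way to check it is with the `iw` tool, as in the Python sketch below. It assumes `iw` is installed, that the interface is wlan0, and that the command prints a line of the form `Power save: on`. The function names are illustrative only.

```python
import subprocess


def parse_power_save(output):
    """Return True/False for a 'Power save: on/off' line, or None if it is absent."""
    for line in output.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "power save":
            return value.strip().lower() == "on"
    return None


def wifi_power_save(interface="wlan0"):
    """Query the interface's power-save state via `iw dev <iface> get power_save`."""
    result = subprocess.run(
        ["iw", "dev", interface, "get", "power_save"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_power_save(result.stdout)
```

A return value of `True` from `wifi_power_save()` would suggest power saving is active on the adapter, which is worth ruling out when a Zero W drops off WiFi at intervals.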

Further to that, I've changed nothing in my router, and I've got another PiZero W which has been running a webcam for about two years. All good there.

Sleep settings... good suggestion. I'll check that.