My primary Pi-hole randomly stopped passing traffic in the middle of the night for no apparent reason (although, after reviewing my logs further, it appears to have been triggered by the Daily apt download activities job at 03:27:05).
Pihole2 syslog
This little RPi3B+ has been absolutely incredible. I'm honestly surprised my micro SD card hasn't failed in the ~4 years or so that we've been running it. Thankfully the Pihole4 VM picked right up, so we didn't lose connectivity, but I'd be a lot more comfortable if I could get some help trying to big-brain the solution here; I'm obviously floundering.
OS: Raspbian GNU/Linux 10 (buster) armv7l
Host: Raspberry Pi 3 Model B Plus Rev 1.3
Kernel: 5.10.17-v7+
Uptime: 3 hours, 58 mins
CPU: BCM2835 (4) @ 1.400GHz
Memory: 120MiB / 973MiB
Actual Behaviour:
Pi-hole stopped passing traffic. This was the very last message in my pihole.log:
Jun 12 03:27:35 dnsmasq[1822]: query[AAAA] www.google.com from 192.168.107.216
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@ (the rest of the file is one long run of ^@ characters)
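In a terminal viewer, ^@ is how a NUL byte is displayed. A log that ends in a long run of NULs is a common symptom of an unclean shutdown: the file's size was recorded, but the data behind it typically never reached the card, so it reads back as zeros. A minimal Python 3 sketch for checking where that padding starts and what the last readable line was is shown here; the script and function names are hypothetical, and /var/log/pihole.log is assumed to be Pi-hole's query log on this setup.

```python
import sys
from typing import Tuple


def trailing_nul_run(data: bytes) -> Tuple[int, int]:
    """Return (offset, length) of the NUL padding at the end of data.

    offset is where the padding starts (the end of the last readable
    content); length is 0 when data does not end in NUL bytes.
    """
    content = data.rstrip(b"\x00")
    return len(content), len(data) - len(content)


def last_text_line(data: bytes) -> str:
    """Return the last non-blank line before any trailing NUL padding."""
    offset, _ = trailing_nul_run(data)
    text = data[:offset].decode("utf-8", errors="replace")
    for line in reversed(text.splitlines()):
        if line.strip():
            return line
    return ""


# Only runs when a path is passed on the command line, e.g.
#   python3 nul_tail.py /var/log/pihole.log   (script name is hypothetical)
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        data = f.read()
    offset, length = trailing_nul_run(data)
    print(f"{length} trailing NUL bytes starting at byte offset {offset}")
    print("last readable line:", last_text_line(data))
```

If the last readable line matches the final query shown above, everything written after that moment was lost rather than merely mangled.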
I believe it was triggered automatically by my unattended upgrades. What is the best practice for Pi-hole when it comes to keeping my various machines up to date? Thanks for your input, sir.
I have the following entry in my /etc/apt/apt.conf.d/20auto-upgrades:
● apt-daily.timer - Daily apt download activities
Loaded: loaded (/lib/systemd/system/apt-daily.timer; enabled; vendor preset: enabled)
Active: active (waiting) since Sat 2021-06-12 03:08:29 EDT; 16h ago
Trigger: Sun 2021-06-13 01:01:58 EDT; 5h 0min left
Jun 12 03:08:29 pihole2 systemd[1]: Started Daily apt download activities.
Sounds about right; I believe this confirms our original suspicion, my bad. It was triggered through the unattended-upgrades package, which I discovered applies a sixty-minute RandomizedDelaySec, so the trigger time varies from run to run. Apparently the reboot time is the ONLY time I selected manually (I think I typically get an email via the bsd-mailx service if anything goes awry).
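To make the effect of RandomizedDelaySec concrete: per the systemd.timer documentation, each scheduled elapse is delayed by a random, evenly distributed amount between 0 and that value, so the job fires somewhere in a window rather than at a fixed time. The Python sketch below only models that window; the 06:00 base time is a hypothetical example, not a value taken from this Pi's timer units.

```python
import random
from datetime import datetime, timedelta
from typing import Optional, Tuple


def firing_window(scheduled: datetime, randomized_delay: timedelta) -> Tuple[datetime, datetime]:
    """Earliest and latest time a timer can fire for one scheduled elapse.

    systemd delays each elapse by a random, evenly distributed amount
    between 0 and RandomizedDelaySec, so the job lands in this window.
    """
    return scheduled, scheduled + randomized_delay


def sample_firing_time(
    scheduled: datetime,
    randomized_delay: timedelta,
    rng: Optional[random.Random] = None,
) -> datetime:
    """Simulate one firing time, drawn uniformly from the window."""
    if rng is None:
        rng = random.Random()
    delay = rng.uniform(0, randomized_delay.total_seconds())
    return scheduled + timedelta(seconds=delay)


# Illustrative only: a hypothetical 06:00 schedule with a 60-minute delay.
base = datetime(2021, 6, 12, 6, 0)
earliest, latest = firing_window(base, timedelta(minutes=60))
print(earliest.time(), "to", latest.time())
print("one possible run:", sample_firing_time(base, timedelta(minutes=60)).time())
```

The actual OnCalendar and RandomizedDelaySec values can be read with `systemctl cat apt-daily.timer apt-daily-upgrade.timer`.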
Sorry, I'm bad. Lesson learned: RTFM before posting in a half-panicked state. I should have been able to work through this one on my own. I appreciate your diligence, sir. Have a wonderful rest of your weekend. Stay safe.
Okay, so the Daily apt download activities timer was triggered at 03:08:29 EDT and eventually mucked something up, but what? I can't really make sense of the syslog, but apparently something was corrupted and it couldn't recover:
Jun 12 03:08:29 pihole2 systemd-fsck[247]: 0x41: Dirty bit is set. Fs was not properly unmounted and some data may be corrupt.
Jun 12 03:09:01 pihole2 CRON[955]: (root) CMD ( [ -x /usr/lib/php/sessionclean ] && if [ ! -d /run/systemd/system ]; then /usr/lib/php/sessionclean; fi)
Jun 12 03:09:04 pihole2 systemd[1]: Starting Clean php session files...
Jun 12 03:09:04 pihole2 systemd[1]: phpsessionclean.service: Succeeded.
Jun 12 03:09:04 pihole2 systemd[1]: Started Clean php session files.
Jun 12 03:10:01 pihole2 CRON[1008]: (root) CMD ( PATH="$PATH:/usr/sbin:/usr/local/bin/" pihole updatechecker local)
Jun 12 03:17:01 pihole2 CRON[1054]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Jun 12 03:19:57 pihole2 rngd[323]: stats: bits received from HRNG source: 3380064
Jun 12 03:19:57 pihole2 rngd[323]: stats: bits sent to kernel pool: 3300512
Jun 12 03:19:57 pihole2 rngd[323]: stats: entropy added to kernel pool: 3300512
Jun 12 03:19:57 pihole2 rngd[323]: stats: FIPS 140-2 successes: 168
Jun 12 03:19:57 pihole2 rngd[323]: stats: FIPS 140-2 failures: 1
Jun 12 03:19:57 pihole2 rngd[323]: stats: FIPS 140-2(2001-10-10) Monobit: 0
Jun 12 03:19:57 pihole2 rngd[323]: stats: FIPS 140-2(2001-10-10) Poker: 1
Jun 12 03:19:57 pihole2 rngd[323]: stats: FIPS 140-2(2001-10-10) Runs: 0
Jun 12 03:19:57 pihole2 rngd[323]: stats: FIPS 140-2(2001-10-10) Long run: 0
Jun 12 03:19:57 pihole2 rngd[323]: stats: FIPS 140-2(2001-10-10) Continuous run: 0
Jun 12 03:19:57 pihole2 rngd[323]: stats: HRNG source speed: (min=424.805; avg=831.668; max=1065.128)Kibits/s
Jun 12 03:19:57 pihole2 rngd[323]: stats: FIPS tests speed: (min=4.255; avg=7.429; max=16.629)Mibits/s
Jun 12 03:19:57 pihole2 rngd[323]: stats: Lowest ready-buffers level: 2
Jun 12 03:19:57 pihole2 rngd[323]: stats: Entropy starvations: 0
Jun 12 03:19:57 pihole2 rngd[323]: stats: Time spent starving for entropy: (min=0; avg=0.000; max=0)us
Jun 12 03:20:01 pihole2 CRON[1079]: (root) CMD ( PATH="$PATH:/usr/sbin:/usr/local/bin/" pihole updatechecker local)
Jun 12 03:27:05 pihole2 systemd[1]: Starting Daily apt download activities...
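The one line in that excerpt that points at corruption is the systemd-fsck "Dirty bit is set" warning. systemd-fsck checks filesystems as they are mounted, which normally happens during boot, so each such line marks a startup after an unclean unmount. A minimal Python sketch for finding every occurrence in a syslog file is shown here; the function name is hypothetical, and /var/log/syslog is assumed to be where Raspbian's rsyslog writes.

```python
import re
import sys
from typing import Iterable, List

# Matches the systemd-fsck warning quoted in the syslog excerpt, e.g.
# "systemd-fsck[247]: 0x41: Dirty bit is set. Fs was not properly unmounted ..."
DIRTY_BIT = re.compile(r"systemd-fsck\[\d+\]: .*Dirty bit is set")


def dirty_bit_events(lines: Iterable[str]) -> List[str]:
    """Return the syslog lines reporting an uncleanly unmounted filesystem."""
    return [line.rstrip("\n") for line in lines if DIRTY_BIT.search(line)]


# Only runs when a path is given, e.g. /var/log/syslog (assumed location);
# rotated files such as syslog.2.gz would need gunzip first.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as f:
        for event in dirty_bit_events(f):
            print(event)
```

Several hits at different times would suggest repeated unclean shutdowns rather than a single incident.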
If your filesystem isn't properly unmounted, writes to open files may not have completed. Sooner or later, that is likely to randomly corrupt any program that relies on some kind of persistent state.
You should try to find out which of your unattended upgrades would require or enforce unmounting filesystems, and probably shouldn't perform those upgrades unattended.
If that's not possible, it may be advisable, as a precautionary measure, to explicitly stop certain processes before you start your unattended upgrades and to restart them afterwards.
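The stop, upgrade, restart sequence suggested above could look roughly like this minimal Python sketch. Assumptions: pihole-FTL is Pi-hole's DNS service, lighttpd in the usage example is only illustrative, and `unattended-upgrade` is the command shipped by Debian's unattended-upgrades package. It defaults to a dry run that only returns the planned commands.

```python
import subprocess
from typing import List, Sequence

# Assumed service names; adjust to whatever should be quiesced on your Pi.
DEFAULT_SERVICES = ["pihole-FTL"]
# Assumed upgrade command from Debian's unattended-upgrades package.
DEFAULT_UPGRADE = ["unattended-upgrade"]


def run_with_services_stopped(
    services: Sequence[str] = DEFAULT_SERVICES,
    upgrade_cmd: Sequence[str] = DEFAULT_UPGRADE,
    dry_run: bool = True,
) -> List[List[str]]:
    """Stop services, run the upgrade, then start them in reverse order.

    Services that were stopped are started again in a finally block, so
    a failed upgrade does not leave DNS down. With dry_run=True nothing
    is executed; the planned commands are only recorded and returned.
    """
    executed: List[List[str]] = []

    def run(cmd: List[str]) -> None:
        executed.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)

    stopped: List[str] = []
    try:
        for svc in services:
            run(["systemctl", "stop", svc])
            stopped.append(svc)
        run(list(upgrade_cmd))
    finally:
        for svc in reversed(stopped):
            run(["systemctl", "start", svc])
    return executed


for cmd in run_with_services_stopped(["pihole-FTL", "lighttpd"]):
    print(" ".join(cmd))
```

This only helps if the upgrade is actually launched through such a wrapper rather than directly by the apt timers, and DNS from this Pi-hole is unavailable while the services are stopped.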
Thank you kindly for the detailed explanation, sir. At this point I'm shifting gears and going to assume that the micro SD card in my RPi3B+ is finally going bad. Things have started acting weird: my device no longer passes traffic through the WireGuard tunnel (I see tx traffic, but rx is perpetually stuck at 0 B), and the only explanation that makes sense at this point is an impending drive failure.
That's my best uneducated guess right now. I should probably run a backup before I lose complete access to the device and its data. I've already used Teleporter and backed up my WireGuard configs. Thanks again @Bucking_Horn!