Expected Behaviour:
Pi-hole functioning normally. I've been running Pi-hole for about a year now. I started on a Pi-Zero and am now on a Pi4 B+ with the Argon SSD case, running from a 250GB SSD. Everything is updated to the latest versions.
Actual Behaviour:
I was watching TV (we stream everything) and browsing the web on my phone when I noticed random devices were losing access to the internet. I rebooted every device, but the issue continued. When I logged on to my laptop and tried to browse the web, it reported a DNS resolution issue. So I jumped on to Pi-hole Remote on my phone and, sure enough, it said it couldn't connect to Pi-hole. I was still able to SSH in to the Pi4 and reboot the machine, which brought Pi-hole back up.
The issue I'm bringing to the table is that, while doing a tail-ftl, I noticed A LOT of SQLite3 misuse and corruption messages flying by. You will see them in the debug token.
I'm not sure if there is a bigger issue at hand or if there is something that can be done to fix it, but I wanted to bring it to your attention before it maybe affected more people.
-----tail of pihole-FTL.log------
[2021-08-17 21:26:25.804 621M] SQLite3 message: database corruption at line 81887 of [5d4c65779d] (11)
[2021-08-17 21:26:25.804 621M] SQLite3 message: statement aborts at 10: [SELECT interface FROM network JOIN network_addresses ON network_addresses.network_id = network.id WHERE network_addresses.ip = ? AND interface != 'N/A' AND interface IS NOT NULL;] data (11)
Run these commands to move the existing database to a new file name and create a fresh database.
sudo service pihole-FTL stop
sudo mv /etc/pihole/pihole-FTL.db /etc/pihole/pihole-FTL-old.db
sudo service pihole-FTL start
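Before moving the file aside, it can be useful to confirm that SQLite itself considers the database damaged. A minimal sketch, assuming the sqlite3 command-line tool is installed; `check_db` is just a hypothetical helper name, and it is probably best run while pihole-FTL is stopped so the file is not being written to:

```shell
# Hypothetical helper: run SQLite's built-in consistency check on a database file.
# PRAGMA integrity_check prints the single word "ok" when no problems are found.
check_db() {
    db=$1
    # Capture both normal output and error messages; do not abort on failure.
    result=$(sqlite3 "$db" "PRAGMA integrity_check;" 2>&1) || true
    if [ "$result" = "ok" ]; then
        echo "ok"
    else
        echo "problems found:"
        echo "$result"
        return 1
    fi
}
```

For example, `sudo` may be needed to read the file: run the function as root and pass it /etc/pihole/pihole-FTL.db (or the renamed pihole-FTL-old.db).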
Thanks. Looking at your debug log, I don't see any hints that the two issues (database issue / DNS resolution problem) are connected. This is further supported by the observation that DNS was up again after a device reboot while the database was still similarly broken afterwards. In fact, the database is only a passive tool for reporting what is going on and is not required for DNS operation. It can even be disabled altogether and Pi-hole still works fine.
Could you check some older files (/var/log/pihole.log.1, etc.) to see whether the database issue was already present before you noticed the outage?
Could you also check /var/log/pihole.log.1 (or maybe another rotated file) to see whether Pi-hole was still receiving queries when you had the outage, and whether it replied to them (or refused to reply to them, etc.)?
I've seen what you observe (only the DNS issue) a few times when the upstream server was not available, so Pi-hole started to hand out REFUSED for all queries it couldn't serve locally from cache because it had no valid destination to forward the queries to. I'm not yet saying that's it, but it would match your DNS outage observation well.
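One quick way to test the REFUSED theory is to count such replies in the query log around the outage. A minimal sketch, assuming the replies appear as log lines containing the literal string REFUSED (the exact wording may differ between versions); `count_refused` is a hypothetical helper name:

```shell
# Hypothetical helper: count log lines that mention REFUSED in one log file.
count_refused() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # that number is 0, so "|| true" keeps the function from failing then.
    grep -c 'REFUSED' "$1" || true
}
```

Calling it on /var/log/pihole.log.1 (with sudo if needed) and comparing the number against a log from a normal day should show whether REFUSED replies spiked during the outage.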
I am currently experiencing the issue again. I do see a "client denied (at max capacity of 255)" in there. I have uploaded another debug log. I will look into the logs and rotated logs and let you know what I see shortly.
I also see the counters as blank and the percent blocked as "undefined".
I also ran the commands provided by @jfb and the services are back after restarting FTL. I am still going to check the rotated logs, but here's hoping the DB issue is resolved.
The important thing we want to check here is whether the issues really happened at the same time (= one caused the other) or whether the database issue might have existed for longer.
Also, what is the output of
date -d "@$(sqlite3 /etc/pihole/pihole-FTL-old.db "SELECT MAX(timestamp) FROM queries;")"
(already adapted to the renamed database)?
How does this correlate to the DNS resolving issue you reported?
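To line that timestamp up with the FTL log, the epoch value can be turned into the log's timestamp format and used as a search prefix. A rough sketch, assuming GNU date and assuming the FTL log timestamps are in the Pi's local time; `ftl_lines_at` is a hypothetical helper name:

```shell
# Hypothetical helper: print FTL log lines from the same minute as an epoch timestamp.
# FTL log lines start like "[2021-08-17 21:26:25.804 621M]".
ftl_lines_at() {
    epoch=$1
    log=$2
    # Convert the epoch value to "YYYY-MM-DD HH:MM" in the local time zone.
    minute=$(date -d "@$epoch" '+%Y-%m-%d %H:%M')
    # -F treats the pattern as a fixed string, so the "[" is matched literally.
    grep -F "[$minute" "$log" || true
}
```

The first argument would be the number returned by the SELECT MAX(timestamp) query, the second a log file such as /var/log/pihole-FTL.log.1.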
Here is a view of pihole-FTL.log.3 and 2, respectively. Seems there were issues for a few days. I'm thinking at this point the DB being corrupt was the issue. The FTL logs have calmed down dramatically.
Here is the output from the command you sent. I believe the DB issue correlates to causing the server to crash, thus not allowing the server to handle the DNS requests.
I just realized you were maybe asking about the output and correlation to the DNS issue I reported, not the DB comments. The time output by the command is absolutely around when DNS stopped working.
Could you check the log file at exactly the time when the Pi stopped working completely? The content of /var/log/pihole.log (or one of its rotated copies) will be interesting as well.
From what you describe, I gather that the issues were already there before the DNS outage happened, and your current assumption is that the database issues finally resulted in the DNS outage, right?
Correct. I am not sure when exactly it would have stopped working but you can see evidence of the issue in all 3 rotated FTL logs. I can upload them somewhere for you to check out, if you would like. I will dig, too, but I'm sure you'd find it faster.
Your second screenshot is interesting because time jumps forward and backwards again in the same log file. Maybe your clock is unstable and gets corrected very often...
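Backward time jumps like this can be located without reading the whole log by eye. A small sketch, assuming every relevant line starts with the "[YYYY-MM-DD HH:MM:SS.mmm" timestamp seen in the log excerpt above; `find_backward_jumps` is a hypothetical helper name:

```shell
# Hypothetical helper: print each log line whose timestamp is earlier than
# the timestamp of the previous timestamped line.
find_backward_jumps() {
    awk '
        /^\[[0-9]/ {
            # "[date time" sorts correctly as plain text, so a string
            # comparison is enough to detect a step backwards.
            ts = $1 " " $2
            if (prev != "" && ts < prev) {
                printf "%d: %s\n", NR, $0
            }
            prev = ts
        }
    ' "$1"
}
```

Each output line is prefixed with its line number in the file, which makes it easy to see how often the clock was stepped back.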
Hmm, yes, maybe. We just haven't seen this before but I set up a testing Pi-hole yesterday whose database I corrupted manually. We'll see if it stops working at some point too.
The database is locked (but not corrupted) and there are too many clients trying to access the API at once. Do you have any scripts accessing the Pi-hole API and maybe gathering data? Also, could you check if you have (zombie) PHP processes running? The PHP API processes should exit immediately after serving the content. If they don't, this could explain what you are seeing here.
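To look for leftover PHP processes, something along these lines should work. It is only a sketch, assuming the procps version of ps and that the PHP workers have "php" in their command name; `filter_php` and `list_php_processes` are hypothetical helper names:

```shell
# Hypothetical helper: keep the header line plus rows whose command name contains "php".
# Expects input with the columns PID, STAT, ELAPSED, COMMAND.
filter_php() {
    awk 'NR == 1 || $4 ~ /php/'
}

# Hypothetical helper: list running PHP processes with their state and age.
list_php_processes() {
    ps -eo pid,stat,etime,args | filter_php
}
```

In the output of `list_php_processes`, a STAT value starting with Z marks a zombie, and a large ELAPSED value on a PHP process would suggest it did not exit after serving its request.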
I did notice that the RPi4 had the wireless network connected, a leftover from when I moved to the M.2 SSD. I brought down its Wi-Fi and removed the saved network so it won't reconnect. Since then, there don't seem to have been any issues. I wonder if that's why there were too many clients connected? It was set to only listen on eth0 though.
I don't have any scripts running, at least none that I set up. Maybe all of this was a product of my microSD copy to SSD. /var/log/pihole-FTL.log.1 hasn't been updated since 22:55 on 8/19.