SQLite3 Misuse and Corruption causing crash

flesh420x · August 18, 2021, 1:47am


Expected Behaviour:

Pi-Hole functioning normally. I've been running Pi-Hole for about a year or so now. I started on a Pi-Zero and am now on a Pi4 B+ with the Argon SSD case, running on a 250GB SSD. Everything is updated to the latest versions.

Actual Behaviour:

I was watching TV (we stream everything) and browsing the web on my phone when I noticed random devices were losing access to the internet. I rebooted every device, only to continue having the issue. When I logged on to my laptop and tried to browse the web, it reported a DNS resolution issue. So I jumped on to my Pi-Hole Remote on my phone and, sure enough, it said it couldn't connect to Pi-Hole. I was still able to ssh in to the Pi4 and reboot the machine, which caused Pi-Hole to come back up.

The issue I'm bringing to the table is that, while doing a tail-ftl, I noticed A LOT of SQLite3 misuse and corruption messages flying by. You will see them in the debug token.

I'm not sure if there is a bigger issue at hand or if there is something that can be done to fix it, but I wanted to bring it to your attention in case it affects more people.

Debug Token:

https://tricorder.pi-hole.net/0slhkD80/

jfb · August 18, 2021, 1:54am

Your long term database is corrupted.

   -----tail of pihole-FTL.log------
   [2021-08-17 21:26:25.804 621M] SQLite3 message: database corruption at line 81887 of [5d4c65779d] (11)
   [2021-08-17 21:26:25.804 621M] SQLite3 message: statement aborts at 10: [SELECT interface FROM network JOIN network_addresses ON network_addresses.network_id = network.id WHERE network_addresses.ip = ? AND interface != 'N/A' AND interface IS NOT NULL;] data (11)

Run these commands to move the existing database to a new file name and create a fresh database.

sudo service pihole-FTL stop

sudo mv /etc/pihole/pihole-FTL.db /etc/pihole/pihole-FTL-old.db

sudo service pihole-FTL start
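
Before moving the database aside, it can be worth asking SQLite itself what it thinks of the file. The following is only a minimal sketch, assuming the `sqlite3` command-line tool is installed on the Pi; `check_db` is a hypothetical helper name, not a Pi-hole command. `PRAGMA integrity_check` is SQLite's built-in consistency check and prints `ok` for a healthy file.

```shell
#!/usr/bin/env bash
# Sketch: ask SQLite whether a database file is internally consistent.
# check_db is a hypothetical helper, not part of Pi-hole.
check_db() {
    local db="$1"
    if [ ! -f "$db" ]; then
        echo "missing: $db"
        return 2
    fi
    local result
    # PRAGMA integrity_check prints "ok" on a healthy database,
    # otherwise a list of the problems it found.
    result="$(sqlite3 "$db" "PRAGMA integrity_check;" 2>&1)"
    echo "$result"
    [ "$result" = "ok" ]
}

# Example (stop FTL first so nothing writes while checking):
#   sudo service pihole-FTL stop
#   check_db /etc/pihole/pihole-FTL.db
```

Anything other than `ok` suggests that moving the file away and letting FTL create a fresh database, as described above, is the simpler route.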

DL6ER · August 18, 2021, 6:01am

Thanks. Looking at your debug log, I don't see any hints that the two issues (database issue / DNS resolution problem) are connected. This is furthermore supported by the observation that DNS was up again after a device reboot while the database was still similarly broken afterwards. In fact, the database is only a passive tool for reporting what is going on and is not required for DNS operation. It can even be disabled altogether and Pi-hole still works fine.

Could you check some older files (/var/log/pihole.log.1, etc.) to see if the database issue was already present before you noticed the outage?
Could you also check /var/log/pihole.log.1 (maybe another rotated file) to see if Pi-hole was still receiving queries when you had the outage and whether it replied to them (or did it refuse to reply to them, etc.)?

I've seen what you observe (only the DNS issue) a few times when the upstream server was not available: Pi-hole then started to hand out REFUSED for all queries it couldn't serve locally from cache because it had no valid destination to forward the queries to. I'm not yet saying that's it, but it would match your DNS outage observation well.
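
One possible way to go through the current and rotated logs in a single pass is sketched below. It is not an official Pi-hole tool: `count_matches` is a hypothetical helper, and the exact text to search for in pihole.log (here simply `REFUSED`) is an assumption. The `SQLite3 message` pattern is taken from the FTL log excerpt quoted earlier in this topic.

```shell
#!/usr/bin/env bash
# Sketch: count lines matching a pattern in current and rotated logs.
# count_matches is a hypothetical helper; it also reads gzip-compressed
# rotations (*.gz), which logrotate commonly produces.
count_matches() {
    local pattern="$1"; shift
    local f n
    for f in "$@"; do
        [ -r "$f" ] || continue
        case "$f" in
            *.gz) n="$(gzip -dc "$f" | grep -c -- "$pattern" || true)" ;;
            *)    n="$(grep -c -- "$pattern" "$f" || true)" ;;
        esac
        # One row per file: match count, then file name.
        printf '%s\t%s\n' "$n" "$f"
    done
}

# Example:
#   count_matches 'SQLite3 message' /var/log/pihole-FTL.log*
#   count_matches 'REFUSED' /var/log/pihole.log*
```

If the database messages show up in files that predate the outage while REFUSED replies cluster around the outage itself, that would point towards two separate problems.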

flesh420x · August 18, 2021, 11:11am

I am currently experiencing the issue again. I do see a "client denied (at max capacity of 255)" in there. I have uploaded another debug log. I will look into the logs and rotated logs and let you know what I see shortly.

I also see the counters as blank and percent blocked as "undefined".

https://tricorder.pi-hole.net/7BvSuSci/

flesh420x · August 18, 2021, 11:23am

Thank you. I went ahead and moved the db to old and started again. I will monitor and let you know if it fixed the SQLite3 issue.

flesh420x · August 18, 2021, 11:25am

I also ran the commands provided by @jfb and the services are back up after restarting FTL. I am still going to check the rotated logs, but here's hoping the DB issue is resolved.

flesh420x · August 18, 2021, 11:52am

The tail is definitely better than it was. FAR less traffic. Thanks again.

DL6ER · August 18, 2021, 12:19pm

The important thing we want to check here is if the issues really happened at the same time (= one caused the other) or if the database issue might have existed for longer.

Also, what is the output of

date -d "@$(sqlite3 /etc/pihole/pihole-FTL-old.db "SELECT MAX(timestamp) FROM queries;")"

(already adapted to the renamed database)?

How does this correlate to the DNS resolving issue you reported?
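
For readers following along, the one-liner above nests two steps: `sqlite3` returns the largest `timestamp` in the `queries` table (Unix epoch seconds), and `date -d "@…"` turns that number into a readable time. The sketch below splits the steps and prints the result in an explicit timezone. It assumes GNU `date` and the `sqlite3` command-line tool; `epoch_to_time` and `last_query_time` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: the same lookup in two explicit steps (assumes GNU date).
# epoch_to_time and last_query_time are hypothetical helper names.

# Convert Unix epoch seconds to a readable time in a given timezone.
epoch_to_time() {
    local epoch="$1" tz="${2:-UTC}"
    TZ="$tz" date -d "@${epoch}" '+%Y-%m-%d %H:%M:%S %Z'
}

# Read the newest query timestamp from a Pi-hole long-term database.
last_query_time() {
    local db="$1" tz="${2:-UTC}"
    local epoch
    epoch="$(sqlite3 "$db" "SELECT MAX(timestamp) FROM queries;")"
    if [ -z "$epoch" ]; then
        echo "no queries found in $db" >&2
        return 1
    fi
    epoch_to_time "$epoch" "$tz"
}

# Example:
#   last_query_time /etc/pihole/pihole-FTL-old.db America/New_York
```

Printing the time in the same timezone as the log files makes it easier to line the last stored query up against the moment DNS stopped working.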

flesh420x · August 18, 2021, 12:22pm

Here is a view of pihole-FTL.log.3 and 2, respectively. Seems there were issues for a few days. I'm thinking at this point the DB being corrupt was the issue. The FTL logs have calmed down dramatically.

flesh420x · August 18, 2021, 12:25pm

Here is the output from the command you sent. I believe the DB issue correlates to causing the server to crash, thus not allowing the server to handle the DNS requests.

flesh420x · August 18, 2021, 12:35pm

I just realized you were maybe asking about the output and correlation to the DNS issue I reported, not the DB comments. The time output by the command is absolutely around when DNS stopped working.

All log times are in EST (America/New_York).

DL6ER · August 18, 2021, 7:22pm

Could you check the log file exactly at the time when the Pi stopped working completely? The content of /var/log/pihole.log (or one of its rotated copies) will be interesting as well.

From what you describe, I gather that the issues had been there before the DNS outage happened, and your current assumption is that the database issues finally ended up causing the DNS outage, right?

flesh420x · August 18, 2021, 9:44pm

Correct. I am not sure when exactly it would have stopped working but you can see evidence of the issue in all 3 rotated FTL logs. I can upload them somewhere for you to check out, if you would like. I will dig, too, but I'm sure you'd find it faster.

My assumption is that the DB having RW issues ultimately caused the DNS/FTL service to fail.

DL6ER · August 19, 2021, 6:59am

Your second screenshot is interesting because time jumps forward and backwards again in the same log file. Maybe your clock is unstable and gets corrected very often...

Hmm, yes, maybe. We just haven't seen this before but I set up a testing Pi-hole yesterday whose database I corrupted manually. We'll see if it stops working at some point too.
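
Backward time jumps like the one mentioned above can be located without reading the whole log by eye. This is a rough sketch only: `find_time_jumps` is a hypothetical helper, and it assumes every FTL log line starts with `[YYYY-MM-DD HH:MM:SS.mmm`, as in the excerpt quoted earlier in this topic. It reports only jumps backwards, not large jumps forwards.

```shell
#!/usr/bin/env bash
# Sketch: report log lines whose timestamp is earlier than the previous one.
# find_time_jumps is a hypothetical helper. It assumes each line starts with
# "[YYYY-MM-DD HH:MM:SS.mmm", so timestamps sort correctly as plain strings.
find_time_jumps() {
    awk '
        /^\[[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / {
            ts = substr($0, 2, 23)           # e.g. 2021-08-17 21:26:25.804
            if (prev != "" && ts < prev)
                printf "line %d: %s is before %s\n", NR, ts, prev
            prev = ts
        }
    ' "$@"
}

# Example:
#   find_time_jumps /var/log/pihole-FTL.log
```

Lines that do not start with a timestamp are skipped, so multi-line messages do not produce false reports.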

flesh420x · August 19, 2021, 1:12pm

It looks like FTL is having issues again. I haven't had a down scenario, but the logs show DB locked and issues again.

pihole-FTL.log.1.targz (24.8 KB)

DL6ER · August 20, 2021, 9:31am

The database is locked (but not corrupted) and there are too many clients trying to access the API at once. Do you have any scripts accessing the Pi-hole API and maybe gathering data? Also, could you check if you have (zombie) PHP processes running? The PHP API processes should exit immediately after serving the content. If they don't, this could explain what you are seeing here.
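
A quick way to look for lingering PHP processes is sketched below. It is an illustration under assumptions: `flag_php_procs` is a hypothetical helper, and matching any command name containing `php` (for example `php-cgi`) is a guess about how the web server runs PHP on a given install. In `ps` output, a state beginning with `Z` marks a zombie.

```shell
#!/usr/bin/env bash
# Sketch: list PHP processes and mark zombies (state starting with "Z").
# flag_php_procs is a hypothetical helper that reads "PID STAT ELAPSED COMMAND"
# rows on stdin, so it can be fed from ps or from a saved listing.
flag_php_procs() {
    awk '
        $4 ~ /php/ {
            label = ($2 ~ /^Z/) ? "ZOMBIE" : "alive"
            printf "%s\t%s\t%s\t%s\n", label, $1, $3, $4
        }
    '
}

# Example ("=" after each column name suppresses the header row):
#   ps -e -o pid= -o stat= -o etime= -o comm= | flag_php_procs
```

Besides zombies, PHP processes with a long elapsed time are worth a look, since the API processes are expected to exit right after serving the content.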

flesh420x · August 20, 2021, 6:29pm

I did notice that the RPi4 had the wireless network connected, a product of when I moved to the M.2 SSD. I brought down its Wi-Fi and removed the saved network so it won't reconnect. Since then, it seems there haven't been any issues. I wonder if that's why there were too many clients connected? It was set to only listen on eth0 though.

I don't have any scripts running, at least none that I set up. Maybe all of this was a product of my microSD copy to SSD. /var/log/pihole-FTL.log.1 hasn't been updated since 22:55 on 8/19.

DL6ER · August 20, 2021, 8:47pm

Did you copy the database while the device was running or did you shut down the system for this?

flesh420x · August 20, 2021, 9:28pm

Well, the DB would have been running and I didn’t even think about that until just now. I had to do a copy while the system was up. I suck haha
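
For anyone doing a similar migration, a database that is in use can be copied more safely than with a plain file copy. The sketch below is one possible approach, not an official procedure: `backup_db` is a hypothetical helper, the destination path in the example is made up, and it assumes the `sqlite3` command-line tool is installed. `.backup` is a built-in command of the `sqlite3` shell that produces a consistent snapshot.

```shell
#!/usr/bin/env bash
# Sketch: take a consistent copy of an SQLite database that may be in use.
# backup_db is a hypothetical helper; ".backup" is a built-in command of the
# sqlite3 shell and avoids copying a file in the middle of a write.
backup_db() {
    local src="$1" dest="$2"
    if [ ! -f "$src" ]; then
        echo "missing: $src" >&2
        return 2
    fi
    sqlite3 "$src" ".backup '$dest'" || return 1
    # Verify the copy before relying on it.
    [ "$(sqlite3 "$dest" "PRAGMA integrity_check;")" = "ok" ]
}

# Example (the destination path is made up for illustration):
#   sudo service pihole-FTL stop     # simplest option: no writers at all
#   backup_db /etc/pihole/pihole-FTL.db /mnt/ssd/pihole-FTL.db
#   sudo service pihole-FTL start
```

Stopping FTL for the duration of the copy remains the simplest option; the snapshot approach is mainly useful when the service has to stay up.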

jbbrenner90 · August 27, 2021, 7:02am

I am having this issue as well, with no luck fixing it.