Database locked (again)

Expected Behaviour:

New network devices show up, diagnosis page entries can be deleted

Actual Behaviour:

No new devices showing up, diagnosis page entries cannot be deleted

To keep it short: the database is locked once again, pretty similar to earlier incidents.

Found out accidentally because of:

/var/log/pihole/FTL.log:

[2023-07-01 15:00:00.193 29056/T29083] Notice: Database size is 3216.38 MB, deleted 9 rows
[2023-07-01 15:05:59.642 29056/T29084] WARNING: Long-term load (15min avg) larger than number of processors: 4.8 > 4
[2023-07-01 15:10:00.213 29056/T29083] Notice: Database size is 3216.42 MB, deleted 35 rows
[2023-07-01 15:11:00.024 29056/T29084] WARNING: Long-term load (15min avg) larger than number of processors: 7.2 > 4
[2023-07-01 15:11:00.052 29056/T29083] Encountered error while trying to store client in long-term database
[2023-07-01 15:11:00.053 29056/T29083] Statement finalization failed when trying to store queries to long-term database
[2023-07-01 15:11:00.053 29056/T29083] Error while trying to close database: database is locked
[2023-07-01 15:11:00.096 29056/T29083] SQLite3 message: database is locked in "DELETE FROM network_addresses WHERE lastSeen < 1593609060;" (5)
[2023-07-01 15:11:00.096 29056/T29083] ERROR: SQL query "DELETE FROM network_addresses WHERE lastSeen < 1593609060;" failed: database is locked (SQLITE_BUSY)
[2023-07-01 15:11:01.051 29056/T29084] add_message(type=6, message=excessive load) - SQL error step DELETE: database is locked
[2023-07-01 15:11:01.052 29056/T29084] Error while trying to close database: database is locked
[2023-07-01 15:12:01.214 29056/T29083] ERROR: SQL query "END TRANSACTION" failed: database is locked (SQLITE_BUSY)
[2023-07-01 15:12:01.215 29056/T29083] END TRANSACTION failed when trying to store queries to long-term database
[2023-07-01 15:12:01.215 29056/T29083] Keeping queries in memory for later new attempt
[2023-07-01 15:12:02.383 29056/T29083] ERROR: SQL query "END TRANSACTION" failed: database is locked (SQLITE_BUSY)
...
...
...

Matching that time:

sudo ll /etc/pihole/pihole-FTL.db
-rw-rw-r-- 1 pihole pihole 3,0G Jul  1 15:10 /etc/pihole/pihole-FTL.db

What I did to "fix" it (again):

  1. Confirmed it's really the well-known "Database locked" issue: sudo grep -c "database is locked" /var/log/pihole/FTL.log returned 617, which is > 0...
  2. Ran sqlite3 /etc/pihole/pihole-FTL.db "PRAGMA integrity_check;" giving ok
  3. Ran pihole -r and chose repair (initially held off, hoping a dev would read this first so we could test something; see the update below)
  4. Checked whether deleting entries from the diagnosis log works, whether new network devices show up in Tools/Network, and whether graphs are being updated. Yes to all of that.

(HOW TO) AVOID THIS IN THE FUTURE!

We don't know why this happens from time to time, but we know it happens regularly for plenty of users, which is very frustrating! How can we make progress here and regain stability? Is the database size getting critical (~3 GB)? Let's please shed some light on this. I don't want to keep running Pi-hole this way.

I'd like to repost the very same words as in Database locked (one symptom: deleting ONE entry in Pi-hole diagnosis triggers deleetion of 1616 (invisible) messages which fails) - #3 by bcutter half a year ago:

I'm just wondering how I could detect this issue in the future (promptly, instead of many days later by accident)... maybe

  1. grep-ing the pihole-FTL.log regularly for the appearance of database is locked?
  2. Can Pi-hole run some kind of self-check and announce this issue somehow? Possibly by replacing the "Active" with "Error!" or something like that?

I'd like to add another idea:
3. I could monitor if /etc/pihole/pihole-FTL.db did not update within the last x minutes and use that as alarm trigger. I think idea 1 is better than that.
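For what it's worth, ideas 1 and 3 could be combined into a single cron-able check. This is only a minimal sketch, assuming the default Pi-hole log and database paths; the notify() function is a placeholder you'd replace with your own alerting (mail, ntfy, ...):

```shell
#!/bin/sh
# Sketch of detection ideas 1 and 3: alert when FTL reports a locked
# database, or when the query database file has gone stale.
# Paths below are Pi-hole defaults; notify() is a hypothetical placeholder.

LOG=/var/log/pihole/FTL.log
DB=/etc/pihole/pihole-FTL.db
MAX_AGE_MIN=15

notify() { echo "ALERT: $1"; }   # replace with your real notification

# Idea 1: any "database is locked" lines in the FTL log?
locks=$(grep -c "database is locked" "$LOG" 2>/dev/null || true)
if [ "${locks:-0}" -gt 0 ]; then
    notify "$locks 'database is locked' messages in $LOG"
fi

# Idea 3: database file not modified within the last MAX_AGE_MIN minutes?
if [ -e "$DB" ] && [ -z "$(find "$DB" -mmin -"$MAX_AGE_MIN")" ]; then
    notify "$DB not updated in the last $MAX_AGE_MIN minutes"
fi
```

Running it every few minutes from cron would turn "found out by accident days later" into a near-immediate alarm.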

Not sure if there's a non-interactive way of doing a repair (pihole -r stops and asks whether a repair or a reinstall is desired), so I could automate it this way:

  1. detect if database is locked
  2. run a repair of pi-hole

That's my short-term target here to regain stability. Thanks for supporting this, as well as the long-term goal of preventing this from happening at all :white_check_mark:


Update: ran pihole -r meanwhile and updated this post. Could not wait any longer due to urgent tasks needing a fully working Pi-hole.

There are many possible causes for a Sqlite3 "database lock" message.

To find out what is causing this in your specific case, a debug log will be of great help.

You have not provided a debug token as specified in the help template.

I don't want to start the old your debug logs contain way too much sensitive information which is why I don't provide them (in full version, at best after I could manually remove sensitive parts on my own) discussion once again.

E.g., for diagnosing a database lock error it is completely irrelevant which domains I have on my domain list, and you also don't need a complete overview of my network infrastructure.

I'll provide debug logs (again) once you change the debug creation process. How about letting users a) choose which data to include instead of collecting just everything, or b) review and remove parts before upload? For a privacy-relevant FOSS project, this debug part has always been the dark part :slight_smile:

Getting off-topic here I think. You could maybe at least answer

Really? It's right there for you to look at.

We ask for the logs to help us determine how to spend our free time helping you. If you don't want to provide that information then be prepared to have to do all of the work for yourself.


Oh man. I'm not talking about "how it is generated" but "what it contains" (to put it bluntly: too much, for a privacy-relevant project - that's my point). I really hope this is a lost-in-translation thing and not an "I really don't get why you care about privacy that much" thing :wink:

I've done just about as much as possible to balance the need to get enough information against having to ask 100 questions to trace down something that could be solved with a quick look over the logs. The debug logs expire after 48 hours and really don't contain anything identifiable.

I'm happy you have high standards for your privacy. I have high standards for how I spend my free time; if you want to make the process difficult and time-consuming, then we'll just move on to help others. It's simply a matter of how 5 people can provide support for free on their own time.


The debug log is available for your inspection. Use that the same way the devs will, to help diagnose your problem.

As noted in the header of each debug log:

This process collects information from your Pi-hole, and optionally uploads it to a unique and random directory on tricorder.pi-hole.net.

The intent of this script is to allow users to self-diagnose their installations. This is accomplished by running tests against our software and providing the user with links to FAQ articles when a problem is detected. Since we are a small team and Pi-hole has been growing steadily, it is our hope that this will help us spend more time on development.

Yes, so let's look at all the network-specific stuff trying to figure out why the database locks every now and then - and does so for other users too, where, by the way, a few topics were never solved.

I ran pihole -d -c (while -c only seems to do a few regular integrity checks, which I had already done manually - but interestingly it found something :tada:).


How can I provide the reviewed log file without uploading the complete one to tricorder.pi-hole.net? In theory, could one

  • a) run pihole -d and wait until it finishes and asks for upload - not answering this yet
  • b) review and manually edit pihole_debug.log
  • c) return to a) and now upload it

and the edited version (step b) will be uploaded? In other words: will it upload /var/log/pihole/pihole_debug.log as it is at the time I say "yes, upload" - or is there a cached version or something?

Probably because the users didn't provide a full debug log and there wasn't enough information to find out what was causing the issue.


The debug process is a general overview of everything, it's not going to be specific to every different use case and ask 15 questions on what might or might not be seen as relevant to that specific run. Again, this is trying to cover the widest range of possibilities and cases in a single function. I've really only had 2 or 3 people take offense to the process over the many thousands of times the process has been used.

It's great that you want to keep your information private, but then you get the trade-off of doing the debug work yourself.

You can pipe the redacted file output to pihole tricorder and get a token url:

cat FILENAME | pihole tricorder

Now we're finally talking. Thanks, will provide it shortly.

Upload successful, your token is: https://tricorder.pi-hole.net/KvkiXDlO/


To minimize your time effort (I appreciate you guys taking the time to look at it at all, as I know you're busy constantly pushing this project forward), here is some information from my pre-screening:

Transparency: What I mainly discovered regarding privacy relevant information and marked with [***REMOVED***]:

  • local storage information
  • server name
  • local DNS custom list content
  • domain list names
  • DNS query log
  • admin email address

Findings/Results:

  • Originally 1,234 lines, now 806 lines.
  • A few false positives in the "tail of FTL.log" section regarding "database is locked", as the database is locked while running "pihole -d -c" (database integrity checks)
  • The interesting part is in the section "Integrity errors in /etc/pihole/pihole-FTL.db found.", I think. But it only tells us THAT THERE ARE integrity errors, not WHY or WHEN they occurred.

For now I'd try to fix the integrity errors somehow, which is more of a database job than a network/DNS task, I think.

Anyway, looking forward to your response and recommendations.

-rw-r--r-- 1 root root 1,3K Jul  1 18:27 /etc/pihole/pihole-FTL.conf
   MAXDBDAYS=1095

This is contributing to the very large size of your query database (3 GB), which may be contributing to the formation of errors shown in your debug log. You have redacted the available disk space, so I can't rule that out as a potential problem.

Try the following as a troubleshooting measure:

Run these commands to move the existing query database to a new file name and create a fresh database.

sudo service pihole-FTL stop

sudo mv /etc/pihole/pihole-FTL.db /etc/pihole/pihole-FTL-old.db

sudo service pihole-FTL start

This will clear any errors and reduce the size of the database immediately. From this starting point, see if your problem persists.
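Separately, the MAXDBDAYS=1095 setting noted above could be lowered so that FTL's periodic database maintenance prunes old rows and caps future growth. A sketch with an illustrative 90-day value (pick whatever matches your statistics needs), assuming the v5-style /etc/pihole/pihole-FTL.conf shown above:

```shell
# Reduce long-term query retention from 3 years to 90 days (example value).
# MAXDBDAYS is a documented FTL setting; old rows are deleted during FTL's
# regular database maintenance runs.
sudo sed -i 's/^MAXDBDAYS=.*/MAXDBDAYS=90/' /etc/pihole/pihole-FTL.conf
sudo service pihole-FTL restart
```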

  • Yes, that's my statistics demand. I didn't see an allowed maximum or a recommendation here, so I've been running it this way for a long time. Is 3 GB an issue for a 32-bit OS?
  • Storage is not an issue, plenty of it available.

That basically parks the old, slightly rusty car away without live access to it anymore (yes, I still have the keys, but once the new car is in the garage I'll likely never use the old one again), and instead I start with a new, clean one. That's not what I'd consider a "fix" but a swap, like today's mechanics do a lot :slight_smile:

Instead of throwing away all the history I more tend to

  • a. try to fix the integrity errors (while Pi-hole is offline, of course) - it's a simple SQLite database, AFAICS...
  • b. set up an individual alarm system to get notified immediately if database locks occur again

This way (especially with b) I can see over time whether there's a pattern (always when server load is high, when certain tasks run on the host, power outages (impossible), etc.). This year there have only been the incidents of January 7th and July 1st - too few to start pattern recognition with. But starting from scratch is not an attractive option in my eyes.
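For option a., one generic approach (not an official Pi-hole procedure) is SQLite's dump-and-reload: export everything that is still readable and load it into a fresh file. A sketch, assuming the sqlite3 shell bundled as pihole-FTL sqlite3, and accepting that rows sitting in corrupt pages may be lost:

```shell
# Sketch: rebuild the query database from a dump while FTL is stopped.
# Keep the original file as a backup until the rebuilt copy checks out.
sudo service pihole-FTL stop

# Dump everything still readable, then load it into a fresh database file.
sudo pihole-FTL sqlite3 /etc/pihole/pihole-FTL.db ".dump" > /tmp/ftl-dump.sql
sudo pihole-FTL sqlite3 /tmp/pihole-FTL-rebuilt.db < /tmp/ftl-dump.sql

# Verify the rebuilt copy before swapping it in; should print "ok".
sudo pihole-FTL sqlite3 /tmp/pihole-FTL-rebuilt.db "PRAGMA integrity_check;"

sudo mv /etc/pihole/pihole-FTL.db /etc/pihole/pihole-FTL-corrupt.db
sudo mv /tmp/pihole-FTL-rebuilt.db /etc/pihole/pihole-FTL.db
sudo chown pihole:pihole /etc/pihole/pihole-FTL.db
sudo service pihole-FTL start
```

Unlike moving the database aside, this keeps the history while dropping whatever SQLite can no longer read.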

How can I non-interactively repair (not re-configure!) a pi-hole installation? pihole -r is exclusively interactive.

The repair process won't fix a query database. It repairs the installation of the Pi-hole software.

The repair option is a subset of the installer script.

  1. That's still no answer to the simple question of whether a repair is possible non-interactively :slight_smile:
  2. Oh well, running pihole -r and choosing repair "fixed" all issues (symptoms of the root cause - a locked database) twice already, while at the same time keeping the query database. I have no idea how, but that's the outcome.

Do you use NFS mounts that would hold Pi-hole's database files?

Did the file jump to grow to that size only recently, or has it always been about that size?

Your debug log shows that you've enabled Pi-hole's Conditional Forwarding, which may result in a partial DNS loop, depending on your CF target's configuration.
Did you switch or change that only recently?

A partial DNS loop may trigger occasional short spikes in CPU load on certain requests and excessive logging of repeated requests for identical domains, which could have contributed to a growing log size, and also to the long-term load warnings in your debug log. The redacted parts of your debug log might have allowed us a glimpse at actual queries, which could have revealed such a loop.

However, those loops are also guaranteed to trigger a specific dnsmasq warning, which is completely absent from your debug log, while there are almost 30 long-term load warnings.

This would indeed suggest another cause for the load (and your debug log shows that there are quite a few other processes running besides Pi-hole).

Would you be able to attribute those load peaks to some other process's activity from the times of those respective log messages?

Those warnings can be reduced to two variants:

*** [ DIAGNOSING ]: Pi-hole FTL Query Database
(...)
    TEXT value in query_storage.domain
    TEXT value in query_storage.additional_info

Let's have a look - what's the output of:

pihole-FTL sqlite3 /etc/pihole/pihole-FTL.db "SELECT domain, max(timestamp), count(*) FROM query_storage WHERE typeof(domain) = 'text' GROUP BY 1 LIMIT 10;"
pihole-FTL sqlite3 /etc/pihole/pihole-FTL.db "SELECT additional_info, max(timestamp), count(*) FROM query_storage WHERE typeof(additional_info) = 'text' GROUP BY 1 LIMIT 10;"

But together with long term CPU load warnings and the database lock messages, that also makes me wonder whether you would run some third-party scripts that interact with Pi-hole's query database?

No, everything related to Pi-hole is local.

In January on the last incident (database is locked) it was roughly 2.7 GB.

No, nothing on Pi-hole's setup has changed recently, only regular maintenance (updates).

No need to speculate here: on the same host running Pi-hole there are other services, which from time to time produce a bit of load. Nothing too heavy or excessive (regularly via cron jobs, e.g. at midnight, which does not match Pi-hole's load warnings); Pi-hole's load warnings are way too aggressive imo (getting off-topic).
Interestingly, there's no pattern (database locks don't match a particular "high load situation"). But to be honest, there are too few incidents to give that last sentence much weight.

1st statement:
|1644755728|10826
(null)|1635882387|4
(null).home.net|1635882387|2
*google.com|1641749868|36
---n6n371pfsn2q0x.66v2aeydyw-ukw6--0hqg.com|1608811725|2
-15zn4p3r0mic800d.3yghrd1aih4k0.com|1633799527|2
-4yq85be2ijpp.8ehuh7apyat.com|1624043562|2
-5annkty.vtidh9mc7tud9g2p-k5n4.com|1619440260|2
-5tc8ofjjf-2.2x-do8pnas.com|1632427312|2
-acgvvpf05.bt5q1v5k87g1fh9.com|1620672117|2
  • replaced actual one with "home.net"
  • no idea what domains the others are - are those actual domains? They look a bit like darknet or malformed stuff :slight_smile:
2nd statement:
|1644756031|19965784
17|1644755728|10719
a1051.b.akamai.net|1622311064|30
aax-eu-retail-direct.amazon-adsystem.com|1644753708|6545
ads4.admatic.com.tr|1635729223|1
adserver-2084671375.us-east-1.elb.amazonaws.com|1644623103|5
adservingvpc-1239633584.eu-west-1.elb.amazonaws.com|1642971853|4
ams02-convex.dotomi.com|1601803741|1
ams1-ib.adnxs.com|1644751851|27
api.dropbox.com|1622331957|9
  • what to do with that output/this information?

I don't think there's anything accessing the query database directly. What I use around the Pi-hole ecosystem is:

Finally, as in a FAQ from 6 years ago.

It's still a very useful step because you have at least 100 errors in your query database, a warning for excessive load and ongoing locked database errors. Clearly something is going wrong with your database, and this is not being widely reported and appears to be unique to your deployment, since you make a point of saying that it's happened again.

That makes me want to pay close attention to what else you have running alongside the Pi-hole processes, and how they are behaving, especially in regard to resource usage and i/o spikes.

Resetting the database will allow the symptom to be reset to look for causal factors if it happens again, eg other processes' resource and i/o usage, file size growth rates, voltage problems, power outages, storage failure, inadequate storage performance, certain applications or certain times triggering the problem, etc etc. These will help home in on the cause.

You might find the atop package useful. It works like top or htop but also logs processes, CPU, memory and storage, and lets you explore usage historically. You can go back hours, days or weeks to explore usage in 10-minute chunks; various guides are available. If / when you next get database corruption, you can look back and analyse what was going on before it happened.

$ apt show atop
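A sketch of the typical workflow, assuming the Debian/Raspbian packaging (which, once installed and enabled, writes daily raw log files under /var/log/atop/):

```shell
# Install atop; the package ships a service that records samples to daily
# raw files (atop_YYYYMMDD) under /var/log/atop/.
sudo apt install atop

# Live view, like top/htop:
atop

# Replay a recorded day: -r selects the raw file, -b jumps to a begin time.
atop -r /var/log/atop/atop_20230701 -b 15:00
# Inside the viewer: 't' steps forward one sample, 'T' steps back,
# 'd' shows per-process disk I/O, 'm' shows memory details.
```

So after the next lock incident, you could replay the minutes around the timestamp in FTL.log and look for competing I/O.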