Database locked (again)

I've done just about as much as possible to balance the need to gather enough information against having to ask 100 questions to trace down something that could be solved with a quick look over the logs. The debug logs expire after 48 hours and really don't contain anything identifiable.

I'm happy you have high standards for your privacy. I have high standards for how I spend my free time; if you want to make the process difficult and time-consuming, then we'll just move on to help others. It's simply a matter of how five people can provide support for free on their own time.


The debug log is available for your inspection. Use that the same way the devs will, to help diagnose your problem.

As noted in the header of each debug log:

This process collects information from your Pi-hole, and optionally uploads it to a unique and random directory on tricorder.pi-hole.net.

The intent of this script is to allow users to self-diagnose their installations. This is accomplished by running tests against our software and providing the user with links to FAQ articles when a problem is detected. Since we are a small team and Pi-hole has been growing steadily, it is our hope that this will help us spend more time on development.

Yes, so let's look at all the network-specific stuff to try to figure out why the database locks every now and then - it does so for other users too, and by the way, a few of those topics were never solved.

I ran pihole -d -c (-c only seems to do a few regular integrity checks, which I had already done manually - but interestingly, it found something :tada:).


How to provide the reviewed log file without uploading the complete one to tricorder.pi-hole.net? In theory, could one

  • a) run pihole -d and wait until it finishes and asks for upload - not answering this yet
  • b) review and manually edit pihole_debug.log
  • c) turn back to a) and now upload it

and the edited version (step b) will be uploaded? In other words: will it upload /var/log/pihole/pihole_debug.log as it is at the time I say "yes, upload" - or is there a cached version or something?

Probably because the users didn't provide a full debug log and there wasn't enough information to find out what was causing the issue.


The debug process is a general overview of everything; it's not going to be specific to every different use case and ask 15 questions about what might or might not be relevant to that specific run. Again, this is trying to cover the widest range of possibilities and cases in a single function. I've really only had 2 or 3 people take offense at the process over the many thousands of times it has been used.

It's great that you want to keep your information private, but then you get the trade-off of doing the debug work yourself.

You can pipe the redacted file output to pihole tricorder and get a token url:

cat FILENAME | pihole tricorder
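A minimal sketch of that redact-then-upload workflow, assuming you work on a copy of the log. The file name, sample content and sed pattern below are illustrative only, not taken from a real debug log:

```shell
# Illustrative only: a stand-in debug log with one identifiable value.
printf 'Admin email: admin@example.com\nServer: myhost\n' > /tmp/pihole_debug_redacted.log
# Redact the value in place, mirroring the [***REMOVED***] convention used in this thread.
sed -i 's/admin@example\.com/[***REMOVED***]/' /tmp/pihole_debug_redacted.log
cat /tmp/pihole_debug_redacted.log
```

The redacted copy would then go up via `cat /tmp/pihole_debug_redacted.log | pihole tricorder`, returning a token URL as described above.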

Now we're finally talking. Thanks, will provide it shortly.

Upload successful, your token is: https://tricorder.pi-hole.net/KvkiXDlO/


To minimize your time effort (I appreciate you guys taking the time to look at it at all, as I know you're busy constantly pushing this project forward), here is some information from my pre-screening:

Transparency: What I mainly discovered regarding privacy relevant information and marked with [***REMOVED***]:

  • local storage information
  • server name
  • local DNS custom list content
  • domain list names
  • DNS query log
  • admin email address

Findings/Results:

  • Originally 1,234 lines, now 806 lines.
  • A few false positives in the "tail of FTL.log" section regarding "database is locked", as the database is locked while "pihole -d -c" runs its integrity checks
  • The interesting part is in the section "Integrity errors in /etc/pihole/pihole-FTL.db found.", I think. But it only tells us that there ARE integrity errors, not WHY or WHEN they occurred.

For now I'd try to fix the integrity errors somehow, which is more a database job than a network/DNS task, I think.
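As a sketch of what such a check looks like: SQLite ships a built-in `PRAGMA integrity_check`. On a Pi-hole host you would run it against `/etc/pihole/pihole-FTL.db` via `pihole-FTL sqlite3` (with FTL stopped); the throwaway database below merely stands in for illustration:

```shell
# Create a throwaway database standing in for pihole-FTL.db (illustrative only).
sqlite3 /tmp/integrity_demo.db 'CREATE TABLE IF NOT EXISTS queries(domain TEXT);'
# On a healthy database this prints a single line: ok
sqlite3 /tmp/integrity_demo.db 'PRAGMA integrity_check;'
```

On a corrupted file the pragma lists the damaged pages/indexes instead, which at least narrows down where the errors sit.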

Anyway, looking forward to your response and recommendations.

-rw-r--r-- 1 root root 1,3K Jul  1 18:27 /etc/pihole/pihole-FTL.conf
   MAXDBDAYS=1095

This is contributing to the very large size of your query database (3 GB), which may be contributing to the formation of errors shown in your debug log. You have redacted the available disk space, so I can't rule that out as a potential problem.

Try the following as a troubleshooting measure:

Run these commands to move the existing query database to a new file name and create a fresh database.

sudo service pihole-FTL stop

sudo mv /etc/pihole/pihole-FTL.db /etc/pihole/pihole-FTL-old.db

sudo service pihole-FTL start

This will clear any errors and reduce the size of the database immediately. From this starting point, see if your problem persists.

  • Yes, that's my statistics demand. I didn't see an allowed maximum or a recommendation here, so I've been running it this way for a long time. Is 3 GB an issue for a 32-bit OS?
  • Storage is not an issue, plenty of it available.

That will basically park the old, slightly rusty car away without live access to it anymore (yes, I still have the keys, but once the new car is in the garage, I'll likely never use the old one again); instead, I start with a new, clean one. That's not what I consider a "fix" but a swap, like today's mechanics do a lot :slight_smile:

Instead of throwing away all the history I more tend to

  • a. try to fix the integrity errors (while Pi-hole is offline, of course) - it's a simple SQLite database, afaik...
  • b. set up an individual alarm system to get notified immediately if database locks occur again

This way (especially with b) I can see over time if there's a pattern (always when server load is high, when certain tasks run on the host, power outages (impossible), etc.). This year, after January 7th and July 1st, there have not been many incidents to start pattern recognition with. But starting from scratch is not an attractive option in my eyes.
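A minimal sketch of such an alarm (b), assuming the standard FTL log location and the literal "database is locked" message. The sample log line and the echo are placeholders; a real version would run from cron and send a notification instead:

```shell
# Placeholder log standing in for /var/log/pihole/FTL.log (illustrative only).
LOG=/tmp/FTL_sample.log
printf '[2023-07-01 20:52:01.123] SQLite3 message: database is locked\n' > "$LOG"
# Alert if any lock message is present; a cron job could mail or push this instead.
if grep -q 'database is locked' "$LOG"; then
  echo "ALERT: Pi-hole database lock detected"
fi
```

Timestamping each hit into a separate file would also give the long-term record needed for the pattern recognition described above.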

How can I non-interactively repair (not re-configure!) a pi-hole installation? pihole -r is exclusively interactive.

The repair process won't fix a query database. It repairs the installation of the Pi-hole software.

The repair option is a subset of the installer script.

  1. That's still not an answer to the simple question of whether a repair is possible non-interactively :slight_smile:
  2. Oh well, running pihole -r and choosing repair "fixed" all issues (symptoms of the root cause - a locked database) twice already, while at the same time keeping the query database. I have no idea how, but that's the result.

Do you use NFS mounts that would hold Pi-hole's database files?

Did the file grow to that size only recently, or has it always been about that size?

Your debug log shows that you've enabled Pi-hole's Conditional Forwarding, which may result in a partial DNS loop, depending on your CF target's configuration.
Did you switch or change that only recently?

A partial DNS loop may trigger occasional short spikes in CPU load on certain requests, as well as excessive logging of repeated requests for identical domains, which could have contributed to a growing log size and also to the long-term load warnings in your debug log. The parts of your debug log that you redacted might have given us a glimpse at actual queries, which could have revealed such a loop.

However, those loops are also guaranteed to trigger a specific dnsmasq warning, which is completely absent from your debug log, while there are almost 30 long-term load warnings.

This would indeed suggest another cause for the load (and your debug log shows that there are quite a few other processes running besides Pi-hole).

Would you be able to attribute those load peaks to some other process's activity from the times of those respective log messages?

Those warnings can be reduced to two variants:

*** [ DIAGNOSING ]: Pi-hole FTL Query Database
(...)
    TEXT value in query_storage.domain
    TEXT value in query_storage.additional_info

Let's have a look - what's the output of:

pihole-FTL sqlite3 /etc/pihole/pihole-FTL.db "SELECT domain, max(timestamp), count(*) FROM query_storage WHERE typeof(domain) = 'text' GROUP BY 1 LIMIT 10;"
pihole-FTL sqlite3 /etc/pihole/pihole-FTL.db "SELECT additional_info, max(timestamp), count(*) FROM query_storage WHERE typeof(additional_info) = 'text' GROUP BY 1 LIMIT 10;"

But together with long term CPU load warnings and the database lock messages, that also makes me wonder whether you would run some third-party scripts that interact with Pi-hole's query database?

No, everything related to Pi-hole is local.

In January on the last incident (database is locked) it was roughly 2.7 GB.

No, nothing on Pi-hole's setup has changed recently, only regular maintenance (updates).

No need to speculate here: the same host running Pi-hole also runs other services, which from time to time produce a bit of load. Nothing too excessive (regular cron jobs, e.g. at midnight, which don't match Pi-hole's load warnings); Pi-hole's load warnings are way too aggressive imo (getting off-topic).
Interestingly, there's no pattern (database locks don't match a particular "high-load situation"). But to be honest, there are too few incidents to give that last sentence weight.

1st statement:
|1644755728|10826
(null)|1635882387|4
(null).home.net|1635882387|2
*google.com|1641749868|36
---n6n371pfsn2q0x.66v2aeydyw-ukw6--0hqg.com|1608811725|2
-15zn4p3r0mic800d.3yghrd1aih4k0.com|1633799527|2
-4yq85be2ijpp.8ehuh7apyat.com|1624043562|2
-5annkty.vtidh9mc7tud9g2p-k5n4.com|1619440260|2
-5tc8ofjjf-2.2x-do8pnas.com|1632427312|2
-acgvvpf05.bt5q1v5k87g1fh9.com|1620672117|2
  • replaced the actual domain with "home.net"
  • no idea what the other domains are - are those actual ones? They look a bit like darknet or malformed stuff :slight_smile:
2nd statement:
|1644756031|19965784
17|1644755728|10719
a1051.b.akamai.net|1622311064|30
aax-eu-retail-direct.amazon-adsystem.com|1644753708|6545
ads4.admatic.com.tr|1635729223|1
adserver-2084671375.us-east-1.elb.amazonaws.com|1644623103|5
adservingvpc-1239633584.eu-west-1.elb.amazonaws.com|1642971853|4
ams02-convex.dotomi.com|1601803741|1
ams1-ib.adnxs.com|1644751851|27
api.dropbox.com|1622331957|9
  • what to do with that output/this information?

I don't think there's anything accessing the query database directly. What I use around the Pi-hole eco-system is:

Finally, as in a FAQ from 6 years ago.

It's still a very useful step because you have at least 100 errors in your query database, a warning for excessive load and ongoing locked database errors. Clearly something is going wrong with your database, and this is not being widely reported and appears to be unique to your deployment, since you make a point of saying that it's happened again.

That makes me want to pay close attention to what else you have running alongside the Pi-hole processes, and how they are behaving, especially in regard to resource usage and i/o spikes.

Resetting the database will reset the symptom so that, if it happens again, you can look for causal factors, e.g. other processes' resource and I/O usage, file-size growth rates, voltage problems, power outages, storage failure, inadequate storage performance, certain applications or certain times triggering the problem, etc. These will help home in on the cause.

You might find the atop package useful. It works like top or htop but also logs processes, cpu, memory and storage and lets you explore usage historically. You can go back hours, days or weeks to explore usage in 10 minute chunks. There are various guides available, eg this and this. If / next time you get database corruption you can look back and analyse what was going on before it happened.

$ apt show atop

This is a user-configurable option. You can choose not to display these.

https://docs.pi-hole.net/ftldns/configfile/#check_load


OK that one made me lol, thx :laughing: How do I debug my Pi-hole installation? is on top of my Pi-hole bookmarks now.

Resetting the database will do absolutely nothing regarding the load situation. It seems like "load" is the only trail we're following currently.

There's load, I know where it comes from, I have long-term records, and I'm aware of and using tools like htop, iotop -aoP, bcmstat (Pi-specific), glances, etc. - I just can't correlate load with "database is locked" situations. No pattern (yet), not even a single match. Therefore atop might be of interest. But it can take months until the next locked-database situation (or even 3 years in case I start from scratch), with several reboots in between.

Hmm, your debug log shows long-term load to be high for prolonged periods, at least from ~01:05 to 01:40, from ~02:40 to 03:25 and from ~20:07 to 20:52 on 2023-07-01.
The log also shows that Pi-hole is repeatedly and successfully writing to the database during those times. It's only toward the end of the last time frame that writing to the database failed (which prompts Pi-hole to keep the requests in memory and retry later).

Note that Pi-hole is only the messenger here:
It does only report the load, as observed by the OS.
Pi-hole reporting those messages does not mean that Pi-hole is causing the load (and in the absence of a DNS loop or excessively large number of client requests, I consider it very unlikely that Pi-hole has caused them).

Whatever that other high-load process (or yet another additional one) is: it is either interacting with Pi-hole's database or doing large amounts of filesystem operations at that time (which are slow to complete on an RPi's SD card and may well contribute to your observation).

Since write access seems to fail, you should also try to rule out SD card issues.

chrislph's advice should be useful in further investigating this.

The database structure was changed to use a view instead of a table in the past.
As your database goes back for 3 years, the text values you see may be remnants of the old database structure (the view caters for both variants, so information is presented correctly).
Those records should grow out of the database eventually.

The SQL statements seem to confirm that, as they only returned records well over a year old.

To be sure, we should also take a look at the most recent timestamps by running:

pihole-FTL sqlite3 /etc/pihole/pihole-FTL.db "SELECT max(timestamp) FROM query_storage WHERE typeof(domain) = 'text';"
pihole-FTL sqlite3 /etc/pihole/pihole-FTL.db "SELECT max(timestamp) FROM query_storage WHERE typeof(additional_info) = 'text';"
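The `timestamp` column holds Unix epoch seconds; a quick way to make the returned values readable (GNU date assumed; output in UTC, so it will differ from a local-time conversion by your UTC offset):

```shell
# Convert an epoch timestamp (here the largest value from the earlier output) to UTC.
date -u -d @1644756031 '+%Y-%m-%d %H:%M:%S'
# prints 2022-02-13 12:40:31
```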

How is Home Assistant constantly interacting with Pi-hole?
What information is it trying to gather?
If you are not depending on that, could you try to disable that Pi-hole integration and see how that would affect your observation?

The "load situation" is not a Pi-hole problem. You're running lots of other things on your Pi-hole server and something appears to be periodically bottlenecking the system. To be blunt, that's now for you to diagnose using a process of elimination, tools like atop and the clues mentioned. Starting a new query database will give you a fresh baseline from which to start your diagnostics.


That last one was from Pi-hole's debug check/log creation as mentioned and can be ignored.

System is running on a SSD.

  • I'm not a dev, it's part of the listed integration documentation.
  • It gets states, almost everything provided by the API.
  • I depend on that, so disabling it is not an option. And disabling it shouldn't be needed, I guess, at least because this is not the part producing load on the system. It just reads data from the Pi-hole API - there should be absolutely no write attempts afaik.

Running
pihole-FTL sqlite3 /etc/pihole/pihole-FTL.db "SELECT max(timestamp) FROM query_storage WHERE typeof(domain) = 'text';"
gave: 1644756031 (equals 13.02.2022 - 13:40:31)

and pihole-FTL sqlite3 /etc/pihole/pihole-FTL.db "SELECT max(timestamp) FROM query_storage WHERE typeof(additional_info) = 'text';"
gave: 1644755728 (equals 13.02.2022 - 13:35:28)

Is that good?

I identified the high-load task and could reduce its quantity and intensity by reconfiguring that process. The long-term load records already show that improvement. Anyway, it won't go that low, so the load warning will never be triggered. The whole system is running fine and there are no negative effects - apart from rare hiccups (2 times this year) of Pi-hole's database, which is very likely due to its size.

I'll very likely

  • implement my custom database monitoring
  • consider reducing the MAXDBDAYS setting (stop Pi-hole/FTL, copy/back up the 3-year database, set MAXDBDAYS to 365, start Pi-hole/FTL - the database should shrink now - then set MAXDBDAYS back to 1095 (restarting FTL) and let it grow again)
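The MAXDBDAYS step of that plan can be scripted. The sketch below edits a stand-in copy of the config for illustration; on a real host you would stop pihole-FTL first and edit /etc/pihole/pihole-FTL.conf itself:

```shell
# Stand-in copy of pihole-FTL.conf (illustrative only).
CONF=/tmp/pihole-FTL.conf
printf 'MAXDBDAYS=1095\n' > "$CONF"
# Shrink the retention window; FTL purges rows older than this on its periodic cleanup.
sed -i 's/^MAXDBDAYS=.*/MAXDBDAYS=365/' "$CONF"
cat "$CONF"
```

Note that SQLite doesn't return freed pages to the filesystem on its own, so after the purge the file may only shrink once it is vacuumed.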

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.