True, but should this warrant an exception from rate-limiting? Rather not.
you're suggesting to
which means there will be no load on any configured resolver (server=). I wouldn't treat this as a rate limit violation, but the decission is yours.
I've disabled rate limiting on my system (RATE_LIMIT=0/0), so I'll find out soon enough if refusing the queries will cause problems.
I'm not yet sure what would be the best solution. If clients do not get a REFUSED
immediately, they should also not retry immediately - but only after some timeout. So it might be a better solution to silently drop queries during the busy database issue. Typical retry delays should be on the order of seconds until the first retry is triggered.
Drop the query but keep logging it, as before, so we can see it is happening.
If you're not sure, isn't it possible to code
- assume allowed (current situation)
- reply refused
- drop query
so we have a choice, and can test both options.
It is surely possible, just notably more work so it takes longer until feature realization.
Please try
pihole checkout ftl tweak/blocking_while_busy
pihole checkout web tweak/blocking_while_busy
These branches are based on the latest beta so anyone testing this should ensure you are checking out also the other the beta components before switching to this FTL variant.
The new config option is called
REPLY_WHEN_BUSY
with the accepted options
DROP
REFUSE
BLOCK
-
ALLOW
(the default)
You will not see anything special about ALLOW
(as before). However, you will see details in the Query Log for the other three options. They are shown in orange (in contrast to the others in red/green/gray) to highlight that something out of the ordinary has happened here.
ALLOW
BLOCK
REFUSE
DROP
The screenshot above reveals a retry timeout of five seconds for the DROP
method which is why it is still my favorite here.
I'll test this immediatlly, one quick question:
The database is busy problem only manifests during pihole -g and pihole -up. Is there a way to simulate a database busy condition, this to avoid running pihole -g all the time?
Thanks
Yes, run
BEGIN EXCLUSIVE;
on the /etc/pihole/gravity.db
database to start the busy simulation and
END;
to end it.
using drop...
something appears to be wrong, the web interface shows
the database shows:
the log also shows this, multiple identical queries at exactly the same time.
This very much looks like it hasn't worked. FTL should have logged
REPLY_WHEN_BUSY: Drop queries when the database is busy
during startup if it has read it correctly.
from the log:
[2021-08-18 11:57:25.030 26281M] ADDR2LINE: Enabled
[2021-08-18 11:57:25.031 26281M] REPLY_WHEN_BUSY: Drop queries when the database is busy
[2021-08-18 11:57:25.031 26281M] Finished config file parsing
[2021-08-18 11:57:25.033 26281M] Database version is 9
[2021-08-18 11:57:25.033 26281M] Resizing "FTL-strings" from 40960 to (81920 * 1) == 81920 (/dev/shm: 1.6MB used, 484.0MB total, FTL uses 1.6MB)
edit
lots of duplicate queries for various domains from various clients.
/edit
edit2
I assume the new status type 15 means dropped.
I switched back to dev, than to the new branches (tweak/blocking_while_busy)
With dev, I see:
[2021-08-18 12:55:17.212 11202M] domain_in_list("incoming.telemetry.mozilla.org", 0x73710670, whitelist): Database is busy, assuming domain is NOT on list
[2021-08-18 12:55:19.107 11202M] domain_in_list("inbox.google.com", 0x73713eb0, whitelist): Database is busy, assuming domain is NOT on list
[2021-08-18 12:56:21.068 11202M] domain_in_list("collector-hpn.ghostery.net", 0x1a198b8, blacklist): Database is busy, assuming domain is NOT on list
[2021-08-18 12:56:21.073 11202M] domain_in_list("collector-hpn.privacy.ghostery.net", 0x73713550, whitelist): Database is busy, assuming domain is NOT on list
With tweak/blocking_while_busy, I see (29 times the 3 messages)
[2021-08-18 13:04:36.644 14886M] Gravity database is busy (whitelist)
[2021-08-18 13:04:36.645 14886M] Gravity database is busy (blacklist)
[2021-08-18 13:04:34.628 14886M] Gravity database is busy (gravity)
and I find 29 status 15 entries:
the web interface shows:
so at least, drop appears to work.
I do however not understand where the duplicates suddenly come from, such as:
identical queries, identical time, both forwarded to unbound
-
Please add
DEBUG_QUERIES
andDEBUG_FLAGS
topihole-FTL.conf
to get more information inpihole.FTL.log
. We're looking for lines likeBlocking XXX as YYY is to be ZZZ (gravity database is not available)
where
ZZZ
should be one ofrefused
dropped
blocked
(there will be no message for the allow case, queries are just permitted in this case)
-
How do the corresponding lines in
/var/log/pihole.log
look like? This will help us identify if it was really the client which send two at the same time. -
Does this (duplicated queries) also happen for
REFUSE
BLOCK
andALLOW
? Does it only happen when the database is busy? -
Status 15 is (as you see on the web interface screenshot): "Database is busy". It does not imply that the query was dropped, just that it was handled specially (according to the user config). See my screenshots above - you should see how they were replied to in the
Reply
column. Drop =NONE
(because there was no reply),REFUSED
= (well)REFUSED
andBLOCK
= [depending on your blocking mode] (e.g.,IP
for the defaultNULL
blocking replying with0.0.0.0
).
Reduced the number of blocklists to allow the pihole -g time to be about 1 minute.
DEBUG_QUERIES
and DEBUG_FLAGS
added
created little script, ran script 4 times (DROP, REFUSE, BLOCK ALLOW):
#!/bin/bash
sudo service pihole-FTL stop
pihole -f
pihole -g
cat /var/log/pihole/pihole-FTL.log | grep 'is gravity blocked'
created archive with both logs for each:
drop.zip (14.0 KB)
refuse.zip (11.0 KB)
block.zip (10.6 KB)
allow.zip (13.3 KB)
Yes, I made sure there are duplicates in every zip/log
No, The duplicate requests are at the end of the log, after pihole -g has finished.
correct, but since you preffer drop, that's what I will be using, so I matched the grafana text accordingly.
We are not looking for "is gravity blocked" but rather "gravity database is not available". I haven't found this in any of your files.
I haven't found duplicates in drop.zip
. After gravity ran, the following domains are queried:
[2021-08-18 15:16:59.141 18321M] **** new UDP IPv4 query[A] query "ctldl.windowsupdate.com" from -:<client1>
[2021-08-18 15:16:59.150 18321M] **** new UDP IPv4 query[AAAA] query "ctldl.windowsupdate.com" from -:1<client1>
[2021-08-18 15:16:59.210 18321M] **** new UDP IPv4 query[A] query "crl.identrust.com" from -:<client1>
[2021-08-18 15:16:59.216 18321M] **** new UDP IPv4 query[AAAA] query "crl.identrust.com" from -:<client1>
[2021-08-18 15:17:10.769 18321M] **** new UDP IPv4 query[A] query "ssl.google-analytics.com" from -:<client2>
Neither does refuse.zip
contain any duplicates.
Next I checked allow.zip
and there we see some duplicates after the gravity run (the last three):
[2021-08-18 15:25:11.197 20626M] **** new UDP IPv4 query[A] query "ssl.google-analytics.com" from -:<client 2>#53530
[2021-08-18 15:25:36.200 20626M] **** new UDP IPv4 query[A] query "livetileedge.dsx.mp.microsoft.com" from -:<client 1>#57229
[2021-08-18 15:25:46.078 20626M] **** new UDP IPv4 query[A] query "userlocation.googleapis.com" from -:<client 2>#59625
[2021-08-18 15:25:51.898 20626M] **** new UDP IPv4 query[A] query "firefox.settings.services.mozilla.com" from -:<client 1>#57790
[2021-08-18 15:25:51.909 20626M] **** new UDP IPv4 query[A] query "firefox.settings.services.mozilla.com" from -:<client 1>#65422
[2021-08-18 15:25:51.918 20626M] **** new UDP IPv4 query[A] query "firefox.settings.services.mozilla.com" from -:<client 1>#65158
However, they are all correct. I left the port from which they sent the query at the end of the copied line - you can see that they are all different, i.e., those are independent queries from the clients themselves. As to why they are doing this ... I don't know. They arrived with about 10 milliseconds distance between them and, at first, I was confused why they aren't cached but you may have disabled caching and then everything behaves just as expected.
Similarly for block.zip
where the query to getgreenshot.org
was duplicated, however, these also were doubled queries from the clients themselves so nothing looks like a Pi-hole induced issue.
Your tests did not reveal anything unusual, however, it also doesn't seem you experienced the case we want to study here (query whilst database was busy).
changed that...
looks like the gravity run needs to be long enough to trigger this event, I ran all the tests again with all lists enabled.
I can't trigger this using BLOCK (didn't try ALLOW, you said there will be no messages) I even tried throwing a random list in DPT and run that during pihole -g, still no messages...
from DROP:
[2021-08-19 02:51:55.762 23632M] Blocking ssl.google-analytics.com as ssl.google-analytics.com is to be dropped (gravity database is not available)
[2021-08-19 02:51:55.764 23632M] Blocking ssl.google-analytics.com as ssl.google-analytics.com is to be dropped (gravity database is not available)
[2021-08-19 02:51:55.766 23632M] Blocking ssl.google-analytics.com as ssl.google-analytics.com is to be dropped (gravity database is not available)
[2021-08-19 02:51:55.769 23632M] Blocking ssl.google-analytics.com as ssl.google-analytics.com is to be dropped (gravity database is not available)
[2021-08-19 02:51:55.771 23632M] Blocking ssl.google-analytics.com as ssl.google-analytics.com is to be dropped (gravity database is not available)
[2021-08-19 02:51:55.773 23632M] Blocking ssl.google-analytics.com as ssl.google-analytics.com is to be dropped (gravity database is not available)
[2021-08-19 02:51:55.776 23632M] Blocking ssl.google-analytics.com as ssl.google-analytics.com is to be dropped (gravity database is not available)
[2021-08-19 02:51:55.779 23632M] Blocking ssl.google-analytics.com as ssl.google-analytics.com is to be dropped (gravity database is not available)
duplicates example in the DROP test: search for armmf
[2021-08-19 02:54:58.701 23632M] **** new UDP IPv4 query[A] query "armmf.adobe.com" from -:192.168.2.227#56094 (ID 194, FTL 193, src/dnsmasq/forward.c:1561)
[2021-08-19 02:55:03.008 23632M] **** new UDP IPv4 query[A] query "armmf.adobe.com" from -:192.168.2.227#56094 (ID 195, FTL 194, src/dnsmasq/forward.c:1561)
[2021-08-19 02:55:03.011 23632M] **** new UDP IPv4 query[A] query "armmf.adobe.com" from -:192.168.2.227#56094 (ID 196, FTL 195, src/dnsmasq/forward.c:1561)
[2021-08-19 02:55:03.016 23632M] **** new UDP IPv4 query[A] query "armmf.adobe.com" from -:192.168.2.227#56094 (ID 197, FTL 196, src/dnsmasq/forward.c:1561)
- armmf duplicate in REFUSE (search for 49157)
- ablink duplicate in BLOCK (search for 54907)
- jsdelivr duplicate in BLOCK (search for 59822)
- digicert duplicate in BLOCK (search for 60033)
BLOCK.zip (139.2 KB)
DROP.zip (27.7 KB)
REFUSE.zip (151.9 KB)
Just hoping I'm doing the correct tests and analysis...
Everything still looks fine - except the very fast retries when we want to drop. The duplicate queries are here client-side also.
I pushed another update (FTL version should be vDev-2edd512
) that should prevent the almost immediate replies (few ms delay) you've seen in DROP
mode for ssl.google-analytics.com
above.
edit 6c6188f1
would be okay as well. I fixed a small glitch with TCP queries not always being dropped.
installed
Current Pi-hole version is development v5.3.1-100-g98867d8d
Current AdminLTE version is tweak/blocking_while_busy v5.5.1-58-gb4a26d3
Current FTL version is tweak/blocking_while_busy vDev-2edd512
testing may take a while...
I've run pihole -g like twenty times now, the text 'gravity database is not available' doesn't appear in the log file anymore...
DROP.zip (40.1 KB)
to ensure something needed to be dropped, I ran
#!/bin/bash
domains=("userlocation.googleapis.com." "ssl.google-analytics.com." "incoming.telemetry.mozilla.org." "settings-win.data.microsoft.com.")
# initialize RANDOM only once, to avoid using the same entry evry time
RANDOM=$$$(date +%s)
while :; do
selecteddomain=${domains[ $RANDOM % ${#domains[@]} ]}
dig @192.168.2.57 A ${selecteddomain}
sleep 5
done
on another pi
Although I didn't get a notification from this topic, I ran pihole -up this morning and got an update.
Current Pi-hole version is development v5.3.1-100-g98867d8d
Current AdminLTE version is tweak/blocking_while_busy v5.5.1-58-gb4a26d3
Current FTL version is tweak/blocking_while_busy vDev-6c6188f
after running the update, I checked for status type 15, and yes, their back 69 entries after the new FTL became active, 15 unique domains
DEBUG_QUERIES
and DEBUG_FLAGS
aren't active anymore (space on TMPFS is limited)
In the pihole-FTL log, I found (41 times):
[2021-08-20 09:09:57.387 9764M] Gravity database is busy (whitelist)
[2021-08-20 09:09:57.388 9764M] Gravity database is busy (blacklist)
and (90 matches)
[2021-08-20 09:11:40.007 9764M] ERROR: Gravity database not available
[2021-08-20 09:11:40.007 9764M] get_client_groupids("192.168.2.228") - SQL error step: database is locked
I assume this is normal, some FTL thread is still trying to acces the database.
I'm NOT sure how you determine what you consider to be a client initiated resubmitted query, and thus a normal situation. Could you please explain what to look for, to identify duplicates (that shouldn't happen) and resubmitted (that is normal)?
Will you be inviting additional testers, possibly make this part of dev?
Thanks, again, for your time and effort.