Disabling Pi-Hole Breaks DNS for ~1 min

Please ensure that you are running the latest version of the beta code.
Check.

Problem with Beta 5.0:
Per the title, if I disable Pi-Hole for a set time, DNS is broken for at least 1 minute before operations resume as normal. What used to be a completely seamless transition now has an abrupt network disruption.

Raspberry Pi Zero

Debug Token:
https://tricorder.pi-hole.net/i4qqgfo8tf

To get a feel for the volume of activity and the number of domains on the blocklist, what is the output of the following commands from the Pi terminal:

echo ">stats" | nc localhost 4711

ls -lha /etc/pihole/*.db

domains_being_blocked 1629768
dns_queries_today 50958
ads_blocked_today 18527
ads_percentage_today 36.357391
unique_domains 2572
queries_forwarded 16482
queries_cached 15949
clients_ever_seen 29
unique_clients 29
dns_queries_all_types 50958
reply_NODATA 307
reply_NXDOMAIN 4
reply_CNAME 2865
reply_IP 15079
privacy_level 0
status enabled
---EOM---
-rw-rw-r-- 1 pihole pihole 122M Jan 26 12:02 /etc/pihole/gravity.db
-rw-r--r-- 1 pihole pihole 2.3M Jan 24 18:05 /etc/pihole/macvendor.db
-rw-r--r-- 1 pihole pihole 2.8G Jan 26 15:11 /etc/pihole/pihole-FTL.db

From your debug log

[2020-01-26 11:24:10.824 9059] Blocking status is disabled
[2020-01-26 11:24:10.834 9059] SQLite3 message: file unlinked while open: /etc/pihole/gravity.db (28)

Can you confirm that the file /etc/pihole/gravity.db is deleted when disabling Pi-hole? This should not happen...

It appears to remain unmodified through the disable event.

Before:
-rw-rw-r-- 1 pihole pihole 122M Jan 27 03:30 /etc/pihole/gravity.db

During/After:
-rw-rw-r-- 1 pihole pihole 122M Jan 27 03:30 /etc/pihole/gravity.db

Okay, please open two terminals and run

tail -f /var/log/pihole-FTL.log

in one of them and run

pihole disable

in the other.

Once you run the disabling command, the tail window should show you some lines incoming like

[2020-01-26 11:30:31.296 9059] Reloading DNS cache

and

[2020-01-26 11:32:43.692 9059] Compiled 0 whitelist and 2 blacklist regex filters in [...] msec

What is the delay in between these two messages? Does name resolution work after the Compiled ... message?

Only if the answer to the latter is no: Please check /var/log/pihole.log in addition. Do you see the new queries (which are not being answered) incoming there at all?

Disabling and re-enabling pi-hole:

[2020-01-27 12:44:48.649 9059] Reloading DNS cache
[2020-01-27 12:44:48.649 9059] Blocking status is disabled
[2020-01-27 12:46:12.284 9059] INFO: No regex whitelist entries found
[2020-01-27 12:46:12.389 9059] Compiled 0 whitelist and 2 blacklist regex filters in 110.5 msec
[2020-01-27 12:48:03.801 9059] Reloading DNS cache
[2020-01-27 12:48:03.801 9059] Blocking status is enabled
[2020-01-27 12:49:34.246 9059] INFO: No regex whitelist entries found
[2020-01-27 12:49:34.352 9059] Compiled 0 whitelist and 2 blacklist regex filters in 111.6 msec

The DNS downtime coincides almost exactly with the gap between those two events, with names resolving again as soon as the Compiled message appears.
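
For anyone who wants to measure the same gap in their own log, here is a minimal sketch in Python. It assumes the timestamp format shown in the excerpt above; the function names are made up for illustration and are not part of Pi-hole.

```python
from datetime import datetime

# Log lines copied from the excerpt above.
LOG = """\
[2020-01-27 12:44:48.649 9059] Reloading DNS cache
[2020-01-27 12:44:48.649 9059] Blocking status is disabled
[2020-01-27 12:46:12.284 9059] INFO: No regex whitelist entries found
[2020-01-27 12:46:12.389 9059] Compiled 0 whitelist and 2 blacklist regex filters in 110.5 msec
[2020-01-27 12:48:03.801 9059] Reloading DNS cache
[2020-01-27 12:48:03.801 9059] Blocking status is enabled
[2020-01-27 12:49:34.246 9059] INFO: No regex whitelist entries found
[2020-01-27 12:49:34.352 9059] Compiled 0 whitelist and 2 blacklist regex filters in 111.6 msec
"""


def parse_timestamp(line):
    """Extract the timestamp from a '[YYYY-MM-DD HH:MM:SS.mmm PID] ...' line."""
    stamp = line[1:line.index("]")]  # e.g. '2020-01-27 12:44:48.649 9059'
    date_part, time_part, _pid = stamp.split(" ")
    return datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S.%f")


def reload_gaps(log_text):
    """Seconds between each 'Reloading DNS cache' line and the next 'Compiled' line."""
    gaps, start = [], None
    for line in log_text.splitlines():
        if "Reloading DNS cache" in line:
            start = parse_timestamp(line)
        elif "Compiled" in line and start is not None:
            gaps.append((parse_timestamp(line) - start).total_seconds())
            start = None
    return gaps


gaps = reload_gaps(LOG)
for gap in gaps:
    print(f"{gap:.3f} s")
```

For the lines above this gives roughly 84 s for the disable and 91 s for the enable.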

Edit: the answer to your question is yes, so the below may or may not be useful - keeping it anyways since it's already been run.

from /var/log/pihole.log I get:

Jan 27 12:46:13 dnsmasq[9059]: read /etc/hosts - 4 addresses
Jan 27 12:46:13 dnsmasq[9059]: failed to load names from /etc/pihole/custom.list: No such file or directory
Jan 27 12:46:13 dnsmasq[9059]: read /etc/pihole/local.list - 4 addresses
Jan 27 12:49:35 dnsmasq[9059]: read /etc/hosts - 4 addresses
Jan 27 12:49:35 dnsmasq[9059]: failed to load names from /etc/pihole/custom.list: No such file or directory
Jan 27 12:49:35 dnsmasq[9059]: read /etc/pihole/local.list - 4 addresses
Jan 27 12:51:17 dnsmasq[9059]: read /etc/hosts - 4 addresses
Jan 27 12:51:17 dnsmasq[9059]: failed to load names from /etc/pihole/custom.list: No such file or directory
Jan 27 12:51:17 dnsmasq[9059]: read /etc/pihole/local.list - 4 addresses

There appear to be gaps in the log during the periods in between.

Yes, in between FTL is re-reading the lists. I have never seen this take more than (at most!) a very few seconds. I wonder if this has to do with the 30 clients your tiny device handles. Could you put

DEBUG_DATABASE=true

into your /etc/pihole/pihole-FTL.conf, run pihole restartdns and repeat your test?
I wonder what the log may reveal to us.

PM'd the results, as they were too long to share here.

On a side-note, given that I'm the only one who seems to have experienced these issues (I keep hoping for my sanity that someone else chimes in with a "me too!") I'm starting to wonder if it's specific to my setup (RPi-0, running DietPi).

I may go with a clean install of Raspbian Lite to see if the issue persists, as I'd hate to take up too much developer time if I'm all alone!

It is worth mentioning, though, that everything was working flawlessly in v4, so as far as I can tell this issue is in some way related to the beta.

Due to the per-client options, a lot of the internal machinery got more complex. The issue is that we have to reread all of this whenever something changes somewhere. I just never saw any notable delay in any of my tests.

Okay, so let's look at your log excerpt. The disable command initiates this instantaneously:

[2020-01-27 16:26:43.842 638] Reloading DNS cache

Querying the configuration for all the clients is finished here:

[2020-01-27 16:26:44.172 638] Querying gravity database table vw_gravity

That query finishes only about a minute and a half later:

[2020-01-27 16:28:13.311 638] Querying gravity database table vw_regex_blacklist
[2020-01-27 16:28:13.316 638] Querying gravity database table vw_regex_whitelist

This is very strange, as here we only COUNT the number of distinct domains. It almost looks like you are missing the index on the gravity table.

Could you please run

time sqlite3 /etc/pihole/gravity.db "SELECT count(DISTINCT domain) FROM gravity;"
time sqlite3 /etc/pihole/gravity.db "SELECT count(DISTINCT domain) FROM vw_gravity;"

and report the output?

Once done, does a run of pihole -g help resolve this issue (test again with the above two lines)?
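
To see whether an index changes how SQLite answers this query, the following Python sketch builds a small in-memory stand-in for the gravity table and prints the query plan and timing with and without an index. The two-column schema, the index name, and the domain names are assumptions made for illustration; the real gravity.db layout may differ, and timings on a desktop with a few thousand rows say little about a Pi with millions.

```python
import sqlite3
import time


def build_db(with_index, n_unique=20000, copies=2):
    """Create an in-memory stand-in for the gravity table (hypothetical schema)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE gravity (domain TEXT NOT NULL, adlist_id INTEGER NOT NULL)")
    # Every domain appears once per adlist, so the table contains duplicates.
    rows = ((f"ads{i}.example.com", adlist)
            for adlist in range(copies) for i in range(n_unique))
    con.executemany("INSERT INTO gravity VALUES (?, ?)", rows)
    if with_index:
        con.execute("CREATE INDEX idx_gravity_domain ON gravity (domain)")
    con.commit()
    return con


def timed_count(con):
    """Run the query from the thread; return (count, seconds, query plan details)."""
    query = "SELECT count(DISTINCT domain) FROM gravity;"
    plan = [row[3] for row in con.execute("EXPLAIN QUERY PLAN " + query)]
    started = time.perf_counter()
    (count,) = con.execute(query).fetchone()
    return count, time.perf_counter() - started, plan


results = {}
for with_index in (False, True):
    count, seconds, plan = timed_count(build_db(with_index))
    results[with_index] = count
    print(f"index={with_index}: {count} unique domains in {seconds * 1000:.1f} ms")
    print("  plan:", "; ".join(plan))
```

The count is the same either way; only the plan and the time taken can differ.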

I have the same problem. When enabling or disabling blocking, there is an approx. 1 minute delay before DNS resolution works.

Might be on the right track, as this is the result of the above 2 time commands:

time sqlite3 /etc/pihole/gravity.db "SELECT count(DISTINCT domain) FROM gravity;"
2378482

real    1m1.658s
user    0m32.470s
sys     0m3.541s

time sqlite3 /etc/pihole/gravity.db "SELECT count(DISTINCT domain) FROM vw_gravity;"
2378482

real    0m51.983s
user    0m43.606s
sys     0m3.199s

I'm guessing the Pi (I'm running a Pi 3B) just can't easily handle picking out the distinct records when the blocklist is so large and has so many duplicates:

[i] Number of gravity domains: 4645932 (2378482 unique domains)

I use a Raspberry Pi 2B and I have the same problem.
I am using Raspbian Lite.
The problem persists after running pihole -g.

time sqlite3 /etc/pihole/gravity.db "SELECT count(DISTINCT domain) FROM gravity;"
1085359

real 0m31.359s
user 0m26.293s
sys 0m4.994s

time sqlite3 /etc/pihole/gravity.db "SELECT count(DISTINCT domain) FROM vw_gravity;"
1085357

real 0m43.238s
user 0m38.431s
sys 0m4.676s

Number of gravity domains: 2272668 (1085357 unique domains)

The problem, as I see it, isn't limited to disabling pihole. I noticed the delay (DNS resolution unavailable) when trying to get acquainted with group management. The group management documentation says you need to run pihole restartdns reload-lists whenever you change something in group management. I never assumed this could affect other pihole features, such as disabling pihole, so I requested that code already available in the sources be added to display a warning (countdown) whenever this delay occurs.

You can read my request here. Unfortunately, nobody confirmed the delay. You still might want to add a comment in that topic to ensure that you get at least a warning (on the console and, if possible, in the web interface) in case the problem cannot be resolved entirely (I'm hoping for a performance improvement, but don't know if this is possible).

Off topic:
I haven't been using pihole disable for a long time, since the feature (disabling / enabling pihole) is applied to pihole and thus to ALL clients.
In pihole v4.3.2, disabling pihole resulted in simply commenting out the gravity list in /etc/dnsmasq.d/01-pihole.conf and informing pihole-FTL.
In pihole beta5, it looks like this is achieved (NOT TESTED) by changing the variable BLOCKING_ENABLED=true and informing pihole-FTL.
Both solutions result in a delay, practically unnoticeable in v4.3.2, apparently very much noticeable in beta5.

To overcome this problem (impact on ALL clients), I searched and found a solution for Windows devices.
My solution is explained here, it assumes using unbound, thus avoiding having to go outside the LAN for DNS resolutions.
Note that the solution (the Windows command script) can be used without unbound by simply using another resolver (208.67.222.222 - OpenDNS). I even implemented running the script without getting the UAC prompt every time (NOT explained in the topic, explained here).

The downside of this solution: you have to install the script + desktop shortcut on each Windows device you want to give the option.

My results for a Pi Zero. Reloading takes minutes. The Zero being single-core ain't helping.

time sqlite3 /etc/pihole/gravity.db "SELECT count(DISTINCT domain) FROM gravity;"
1672511

real    1m32.540s
user    0m54.671s
sys     0m11.784s

time sqlite3 /etc/pihole/gravity.db "SELECT count(DISTINCT domain) FROM vw_gravity;"
1672507

real    1m45.270s
user    1m15.627s
sys     0m13.049s

Thanks for confirming the delay. I will have pihole -g compute the numbers once and simply store them in the database as numeric values. Your devices will be able to read these numbers instantly.
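
The idea can be sketched roughly as follows in Python. This is only an illustration of caching the count at rebuild time, not the actual change in the branch: the info table, its property/value columns, the gravity_count key, and both function names are assumptions.

```python
import sqlite3


def rebuild_gravity(con, domains):
    """Stand-in for a gravity rebuild: load the domains, then cache the unique count."""
    con.execute("CREATE TABLE IF NOT EXISTS gravity "
                "(domain TEXT NOT NULL, adlist_id INTEGER NOT NULL)")
    # Hypothetical key/value table that holds precomputed numbers.
    con.execute("CREATE TABLE IF NOT EXISTS info "
                "(property TEXT PRIMARY KEY, value INTEGER NOT NULL)")
    con.execute("DELETE FROM gravity")
    con.executemany("INSERT INTO gravity VALUES (?, ?)", domains)
    # The expensive query runs exactly once, while the lists are being rebuilt.
    (count,) = con.execute("SELECT count(DISTINCT domain) FROM gravity").fetchone()
    con.execute("INSERT OR REPLACE INTO info (property, value) "
                "VALUES ('gravity_count', ?)", (count,))
    con.commit()


def cached_gravity_count(con):
    """Cheap single-row lookup that a reload could use instead of counting again."""
    row = con.execute("SELECT value FROM info WHERE property = 'gravity_count'").fetchone()
    return row[0] if row else None


con = sqlite3.connect(":memory:")
# Two adlists that share one domain: three rows, two unique domains.
rebuild_gravity(con, [("a.example", 1), ("b.example", 1), ("a.example", 2)])
print(cached_gravity_count(con))
```

The trade-off is that the stored number is only as fresh as the last rebuild, which matches the note in this thread about running pihole -g if the dashboard shows a strange number.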

WIP

Done. Please try

pihole checkout core tweak/gravity_count
pihole checkout ftl tweak/gravity_count

and check if this reduces the delay you've been observing here. If it does not work initially (strange number shown on the dashboard), make sure you run pihole -g.

Requested branch "tweak/gravity_count" is not available for the ftl checkout

Only having core checked out does not solve the problem.

I see the same.

Sorry, it seems my git push command got interrupted without me noticing. FTL should now be available.

After following your steps it works fine.

pihole disable

[2020-02-03 10:04:27.036 28138] Reloading DNS cache
[2020-02-03 10:04:27.036 28138] Blocking status is disabled
[2020-02-03 10:04:27.098 28138] INFO: No regex blacklist entries found
[2020-02-03 10:04:27.099 28138] INFO: No regex whitelist entries found
[2020-02-03 10:04:27.140 28138] Compiled 0 whitelist and 0 blacklist regex filters in 43.4 msec

pihole enable

[2020-02-03 10:06:02.006 28138] Reloading DNS cache
[2020-02-03 10:06:02.006 28138] Blocking status is enabled
[2020-02-03 10:06:02.069 28138] INFO: No regex blacklist entries found
[2020-02-03 10:06:02.071 28138] INFO: No regex whitelist entries found
[2020-02-03 10:06:02.100 28138] Compiled 0 whitelist and 0 blacklist regex filters in 32.4 msec
