Expected Behaviour:
The DNS server should start quickly.
I'm running it on a Raspbian VM under VMware ESXi.
Traffic is quite high, though.
Actual Behaviour:
Each time I update Pi-hole, the update itself goes quickly and without problems.
The problems start after restarting the DNS server again.
Everything looks fine, but the web interface keeps showing a "not working" sign and no stats appear.
This goes away after 30 minutes or so, though.
As the DNS seems to go down sometimes, I started looking around and found the FTL tail log.
There I see that while it is down, it's doing a lot of resizing and remapping of tables:
Jun 08 09:34:48 server-pihole pihole-FTL[4268]: [2023-06-08 09:34:35.400 4268M] Resizing "FTL-strings" from 22691840 to (22732800 * 1) == 22732800 (/dev/shm: 75.7MB used, 517.0MB total, FTL uses 75.7MB)
Jun 08 09:34:48 server-pihole pihole-FTL[4268]: [2023-06-08 09:34:36.391 4268M] Resizing "FTL-domains" from 13762560 to (689152 * 20) == 13783040 (/dev/shm: 75.7MB used, 517.0MB total, FTL uses 75.7MB)
Jun 08 09:34:48 server-pihole pihole-FTL[4268]: [2023-06-08 09:34:39.599 4268M] Resizing "FTL-strings" from 22732800 to (22773760 * 1) == 22773760 (/dev/shm: 75.7MB used, 517.0MB total, FTL uses 75.7MB)
Jun 08 09:34:48 server-pihole pihole-FTL[4268]: [2023-06-08 09:34:40.226 4268M] Resizing "FTL-domains" from 13783040 to (690176 * 20) == 13803520 (/dev/shm: 75.8MB used, 517.0MB total, FTL uses 75.8MB)
Jun 08 09:34:48 server-pihole pihole-FTL[4268]: [2023-06-08 09:34:43.795 4268M] Resizing "FTL-strings" from 22773760 to (22814720 * 1) == 22814720 (/dev/shm: 75.8MB used, 517.0MB total, FTL uses 75.8MB)
Jun 08 09:34:48 server-pihole pihole-FTL[4268]: [2023-06-08 09:34:44.076 4268M] Resizing "FTL-domains" from 13803520 to (691200 * 20) == 13824000 (/dev/shm: 75.8MB used, 517.0MB total, FTL uses 75.8MB)
Jun 08 09:34:48 server-pihole pihole-FTL[4268]: [2023-06-08 09:34:44.968 4268M] DB warn: TYPE should not be 100
Jun 08 09:34:48 server-pihole pihole-FTL[4268]: [2023-06-08 09:34:45.044 4268M] Resizing "FTL-queries" from 38207488 to (872448 * 44) == 38387712 (/dev/shm: 75.9MB used, 517.0MB total, FTL uses 75.8MB)
Jun 08 09:34:48 server-pihole pihole-FTL[4268]: [2023-06-08 09:34:47.942 4268M] Resizing "FTL-domains" from 13824000 to (692224 * 20) == 13844480 (/dev/shm: 76.0MB used, 517.0MB total, FTL uses 76.0MB)
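To see which shared-memory objects are resized most often, the log lines can be tallied with a small shell function. This is only a sketch: the function name is made up, and it assumes the object name is the first double-quoted string on each line, as in the excerpt above.

```shell
# Summarize FTL "Resizing" log lines read from stdin.
# Prints one "<count> <object>" line per shared-memory object, most frequent first.
# Assumes the object name is the first double-quoted string on the line.
count_ftl_resizes() {
    awk -F'"' '/Resizing "/ { n[$2]++ }
               END { for (k in n) print n[k], k }' |
        sort -k1,1nr -k2,2
}
```

It reads from stdin, so it can be fed from wherever the FTL log ends up, for example `journalctl -u pihole-FTL | count_ftl_resizes` (the unit name is an assumption based on the log prefix). For the nine lines above it counts 4 resizes of FTL-domains, 3 of FTL-strings and 1 of FTL-queries.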
I already lowered the number of days to keep to 7, but it still needs ages to get started.
Maybe I should lower it to 1, as I don't really need to look things up.
But would this speed up the resizing/remapping as well?
A quick fix was clearing the database, after which it starts right away, but I'd prefer that it keeps running like it did for months before a certain update.
Debug Token:
I was running the debugging diagnostics via the web interface, but it hangs at *** [ DIAGNOSING ]: Dashboard headers
and the entire GUI seems to be unresponsive now, so I'll get back to you with the token when I have it.
Can I restart only the web interface without disrupting the DNS service, now that it's running again?
[?] Would you like to upload the log? [y/N] y
* Using curl for transmission.
* curl failed, contact Pi-hole support for assistance.
* Error message: curl: (22) The requested URL returned error: 500
[✗] There was an error uploading your debug log.
Please try again or contact the Pi-hole team for assistance.
No, it would just reduce the required amount of disk space for the database. Pi-hole reimports the latest 24 hours of history during a restart, so I'd recommend trying to disable the database altogether, especially given that you said the web UI isn't working, either.
Because we need to import the DNS history so we can append new queries at the end. Making this parallel would require a lot of work and is usually not needed. What is causing the high delay is either:
Very many queries (multi millions per hour range),
Very slow processor, or
Very slow disk speed.
Having said that, the reason for any kind of slowness is typically that the hardware simply cannot process the amount of data. This would also fit your observation
It's easier for us to answer this if you can give us a rough estimate of:
What hardware is Pi-hole running on?
How many queries are there roughly (per hour or per day)?
How many clients is your Pi-hole serving approximately?
What do I need to do to disable the database, and what's the downside?
Just not being able to see which requests are blocked/passed?
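As a hedged sketch, assuming Pi-hole v5 (the thread does not state the version): FTL reads its settings from /etc/pihole/pihole-FTL.conf, where MAXDBDAYS=0 disables the long-term database and DBIMPORT=no skips importing history on startup. The helper name set_ftl_option below is hypothetical and only illustrates editing such a KEY=VALUE file safely.

```shell
# Hypothetical helper: set KEY=VALUE in a config file, replacing any existing KEY line.
set_ftl_option() {
    conf="$1" key="$2" value="$3"
    tmp="$(mktemp)" || return 1
    # Keep every line except an existing assignment of this key.
    if [ -f "$conf" ]; then
        grep -v "^${key}=" "$conf" > "$tmp" || true
    fi
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    # Write back in place so the file keeps its owner and permissions.
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}
```

Run as root, this would be `set_ftl_option /etc/pihole/pihole-FTL.conf MAXDBDAYS 0` followed by `pihole restartdns`. The trade-off is that no long-term statistics are kept and the dashboard history starts empty after each restart; live blocking itself should not depend on the database.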
It runs on a small Lenovo PC as a VMware virtual machine.
The host also runs another VM with a 3CX VoIP PABX.
All worked fine for months until a certain update, though.
Every 10 minutes I see around 8-12K requests in the stats.
55K since the restart an hour ago.
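For scale, these figures can be extrapolated with some quick shell arithmetic. This is a rough sketch that assumes a constant rate over the day, which real traffic will not have; the function names are made up for illustration.

```shell
# Extrapolate a query count observed over 10 minutes to hourly and daily totals.
# Assumes a constant rate, which real traffic will not have.
queries_per_hour() { echo $(( $1 * 6 )); }
queries_per_day()  { echo $(( $1 * 6 * 24 )); }

for per_10min in 8000 12000; do
    echo "$per_10min per 10 min: $(queries_per_hour "$per_10min") per hour, $(queries_per_day "$per_10min") per day"
done
```

That works out to roughly 48K to 72K queries per hour, in line with the 55K seen in the hour since the restart, and on the order of 1.2 to 1.7 million per day. It is well below the multi-million-per-hour range mentioned earlier, and the same order of magnitude as the 872448 in the FTL-queries resize line of the log, which looks like an entry count multiplied by an entry size.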
Hard to tell, with all those tablets and smartphones that pollute the network these days.
But it's 2 subnets, so < 500 in total and maybe 50+ concurrent.
You didn't mention any evictions so far?
Also, Pi-hole's embedded dnsmasq comes with a hard-coded maximum cache size of 10,000 entries.
How did you go about applying 50,000?
Probably not related, but your debug log shows that you use some *.local as your local/search domain name.
You should note that .local is reserved for use by the mDNS protocol and should NOT be used with DNS. While most modern OSs would come with mDNS support, Apple devices in particular would regularly employ mDNS for local name resolution and service discovery.
To get an idea how many different client IPs your Pi-hole has seen over time, please share the result of:
pihole-FTL sqlite3 /etc/pihole/pihole-FTL.db "SELECT count(*) FROM client_by_id;"
pihole-FTL sqlite3 /etc/pihole/pihole-FTL.db "SELECT count(*) FROM network_addresses;"
This can be caused by Pi-hole's upstream DNS servers not responding in time or not at all, or by a DNS loop of sorts.
Both may contribute to a (potentially vastly) exaggerated count of DNS requests, which may well have an impact on your observation of slow start-up times.
You should investigate this before you decide to disable the database.
Are Pi-hole's upstream DNS resolvers responding?
You should be able to tell by investigating the logs specifically for requests that didn't receive a reply.
Is 192.168.0.1 using Pi-hole as an upstream DNS resolver?
As you've enabled Pi-hole's Conditional Forwarding, this could have closed a DNS loop if your 192.168.0.1 would in turn use Pi-hole: Requests for unknown hostnames would then be bounced back and forth among Pi-hole and 192.168.0.1 ad infinitum, or until time-out or rate and concurrency limits would kick in.
And just to still my curiosity:
Given that your request count seems rather high, are you running your Pi-hole in some kind of campus or company environment?
192.168.0.1 is the Microsoft AD/DNS server on site A.
The 'problem' Pi-hole is on site B.
The Microsoft server forwards to a Pi-hole server on site A.
Requests from site B should not pass through site A's Microsoft DNS, as only xxx.ourdomain.local requests end up there. The requests on site A's Pi-hole are a lot fewer.
This is a (multi-site) institution for people with mental disabilities, which can be compared to a small campus/school, I guess.
Software on devices may generate requests for non-existent local or non-dot domains for various reasons (e.g. connectivity or captive portal checks from browsers, or mail address checks from mail clients).
Those would cause the DNS loop to close as described above.
In that case, you should have observed 150 requests for such an offending domain in short succession in your logs, i.e. one identical request after another.
Would that be the case for you?
EDIT:
Pi-hole has been reported to operate on larger sites like university campuses.
Chances are some secondary causes may impact or at least contribute to your observation, like that potential DNS loop we are trying to investigate.
You are probably referring to requests like this one:
wpad.ourdomain.local
which are proxy testing requests or something like that done by browsers.
3542 in the last few hours, which is a lot but not problematic, I guess.
What I noticed is that 85% of the requests are now coming from _Gateway.
On the other hand, this is the IP address of the 2 gateways (LAN & Wi-Fi), but not _Gateway.
But when I click on it, it shows " showing all queries for client 192.168.10.10", which is correct.
(The router does DHCP, DNS and gateway and uses Pi-hole for lookups. I need to test whether adding the Pi-hole IP as DNS in the DHCP settings works on the hotspot. You can't connect to other devices in hotspot mode; maybe DNS is an exception.)
Yes, that would be one example, and a count like that could indeed suggest a DNS loop, as clients usually don't request that very often (perhaps once per session, if at all).
I'd expect there to be additional similar domains as well, so the total count of looped queries may be much higher.
Since your debug log shows that your DHCP server at 192.168.10.10 is correctly distributing the Pi-hole host machine's IP address as local DNS resolver, I'd expect the vast majority of DNS requests to originate directly from clients.
Yet you are observing 85% to originate from your gateway.
This would again support my suspicion of an active DNS loop - unless your router/gateway at 192.168.10.10 would indeed aggregate DNS traffic of the majority of clients in your network.
However, that could still be the case in your scenario, as you mention two separate networks, and the bulk of your clients could well reside in your site A, which potentially aggregates DNS traffic on behalf of site A clients.
Would you expect the majority of DNS requests to originate from site A's 192.168.0.0/24(?) network?
Do you run one Pi-hole machine for each of your networks?
You could use the following nslookups to verify if an unknown domain lookup triggers a loop, run from a client in either site's network each:
Yes, I have a Pi-hole on site A (192.168.0.16) and one on site B (192.168.10.3).
Both forward ourdomain.local to 192.168.0.1, which is the Microsoft AD/DNS/DHCP server.
The forwarder on that server is site A's Pi-hole.
Wired clients use 0.1 as DNS.
Wireless clients use one of the gateway addresses of the router's DHCP ranges (varies between 40.10, 50.10 - 54.10).
The router has the Pi-hole (0.16) as its system DNS.
As mentioned before, if hotspot mode allows connecting to 0.16 from that range/VLAN as well, I could enter 0.16 in the DHCP settings. Then the stats would look more realistic instead of showing a bundled amount per gateway IP. It would also mean fewer hops and less stress on the router.
Here are some lookups, but I don't know how we can detect a possible loop with that info.
I didn't ask for the results.
I suggested to use those lookups for unknown domains to verify a loop - one of yours returns an IP.
You should monitor Pi-hole's logs for those 150 looped requests while issuing those lookups, e.g. by running pihole -t.
Also, you should see the max concurrent warning in Pi-hole's UI.
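Besides watching the live log, repeated requests can be tallied after the fact. The following is a minimal sketch, not an official Pi-hole tool: the function name is made up, and it assumes dnsmasq-style lines of the form `query[TYPE] domain from client`. It counts pairs over the whole input rather than strictly consecutive repeats, so a high count is a hint, not proof of a loop.

```shell
# Count identical (domain, client) query pairs in dnsmasq-style log lines from stdin.
# Prints "<count> <domain> <client>", most frequent first; $1 limits the rows (default 10).
top_repeated_queries() {
    awk '{
        for (i = 1; i <= NF - 3; i++)
            if ($i ~ /^query\[/ && $(i + 2) == "from") {
                n[$(i + 1) " " $(i + 3)]++
                break
            }
    }
    END { for (k in n) print n[k], k }' |
        sort -k1,1nr -k2 |
        head -n "${1:-10}"
}
```

For example, `tail -n 20000 /var/log/pihole/pihole.log | top_repeated_queries` lists the ten most frequent pairs in the recent log. A single unknown local domain showing up around 150 times from the same resolver address within a second or two would match the loop pattern described above.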
It froze again after running fine for over a week.
I had the logging running all that time and it shows this:
Jun 26 16:12:19: query[A] ocws.officeapps.live.com from 192.168.10.123
Jun 26 16:12:19: query[A] gopulse-evolve.lulululemon.com from 192.168.10.10
tail: /var/log/pihole/pihole.log: bestand is ingekort
The last line translates to "file truncated".
I'm not sure if that means a storage problem or just a log file resizing after something crashed/ended.