Occasionally Pi-hole stops responding to DNS queries

Elleni · April 25, 2021, 10:03pm

Expected Behaviour: Pi-hole answers DNS queries.

- Operating system: Pi-hole installed on Debian. The only other service I installed on this VM is coturn.
- Hardware: Proxmox VM

Actual Behaviour:

Occasionally Pi-hole stops replying to DNS queries. The web interface is still available, and after logging into the shell I can confirm that ping still works, so local DNS resolution is working. I first thought it might have something to do with resources, so I assigned more RAM (2 GB), but the Proxmox interface shows that only 891 MB are used, so that should not be it.

The VM has now been running for 17 days. Today at 4 o'clock in the morning I started getting warning mails from another server because it could not download something. Logging into that server (which has Pi-hole configured as its DNS server), I can confirm that pinging any address gets no reply.

/etc/init.d/pihole-FTL status -> running
/etc/init.d/pihole-FTL restart -> not stopped - may still be shutting down or shutdown failed - killing now

As soon as the service is started again, Pi-hole replies to DNS queries. How can this be fixed?

Debug Token:

https://tricorder.pi-hole.net/d3ehe51rnf

jfb · April 25, 2021, 10:43pm

This is about the time the weekly gravity update ran.

-rw-r--r-- 1 root root 1755 Feb 20 02:38 /etc/cron.d/pihole
   48 3   * * 7   root    PATH="$PATH:/usr/sbin:/usr/local/bin/" pihole updateGravity >/var/log/pihole_updateGravity.log || cat /var/log/pihole_updateGravity.log
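The schedule fields `48 3 * * 7` read as minute 48, hour 3, any day of month, any month, day of week 7 (cron accepts both 0 and 7 for Sunday), i.e. 03:48 every Sunday, and 25 April 2021 was a Sunday. As a minimal sketch, assuming GNU `date` (standard on Debian) and schedule fields that are plain numbers or `*`, the hypothetical helper below checks a timestamp against the minute, hour and day-of-week fields:

```shell
# Hypothetical helper: succeed if a timestamp matches the minute, hour and
# day-of-week fields of a cron schedule. Sketch only: assumes GNU date and
# fields that are plain numbers or "*"; day-of-month and month are ignored.
cron_matches() {
  sched=$1 ts=$2
  set -f              # stop "*" from being glob-expanded during word splitting
  set -- $sched       # $1=minute $2=hour $3=day-of-month $4=month $5=day-of-week
  set +f
  m=$(date -d "$ts" +%-M) || return 2   # minute without zero padding
  h=$(date -d "$ts" +%-H) || return 2   # hour without zero padding
  d=$(date -d "$ts" +%u)                # ISO weekday: Monday=1 ... Sunday=7
  dow=$5
  [ "$dow" = 0 ] && dow=7               # cron accepts both 0 and 7 for Sunday
  [ "$1" = "*" ] || [ "$1" -eq "$m" ] || return 1
  [ "$2" = "*" ] || [ "$2" -eq "$h" ] || return 1
  [ "$dow" = "*" ] || [ "$dow" -eq "$d" ] || return 1
}
```

For example, `cron_matches "48 3 * * 7" "2021-04-25 03:48" && echo "weekly gravity update time"` succeeds, while the same check against 2021-04-26 (a Monday) fails.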

What are the contents of /var/log/pihole_updateGravity.log?

Elleni · April 26, 2021, 12:15am

Thanks for your quick reply.

jfb · April 26, 2021, 1:15am

That looks normal. FTL restarted normally.

Any entries about that time in /var/log/pihole-FTL.log or /var/log/syslog?

Elleni · April 26, 2021, 7:19am

In /var/log/pihole.1.log I only see the pause: the last query was answered at 03:49:04, and queries were answered again only after the service restart.

In /var/log/pihole-FTL.log.1 I see something:

At 03:50 FTL reloads the DNS cache, and right after that db_set_FTL_property, db_update_counters, parse_neighbor_cache and getNameFromIP all report that the database is not available.

/var/log/syslog.log.1 does not show any error around that time, only the updateChecker and updateGravity entries.

Elleni · April 27, 2021, 7:06am

I hope there is something I can do to solve this, or at least a workaround such as a cron job that restarts the service, though I would prefer to find the source of the error rather than work around it.

Is there anything else I can provide to find out why the database suddenly becomes unavailable after 17 days of uptime? Could it have to do with the DNS cache size not being big enough? Thanks in advance for your help, it is much appreciated.
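The cron-job workaround mentioned above could look roughly like this minimal sketch. It assumes `dig` (Debian package dnsutils) is installed and restarts FTL with the same init script used earlier in this thread; the function names and the lookup name pi-hole.net are arbitrary, hypothetical choices. It only masks the symptom and does not address the database error.

```shell
# Hypothetical workaround, not a fix: restart pihole-FTL when the local
# resolver stops answering. Assumes dig (Debian package dnsutils) is installed;
# pi-hole.net is just an arbitrary name to look up.
dns_answers() {
  # dig exits with status 9 when no reply arrives; an NXDOMAIN answer still exits 0
  dig @"${2:-127.0.0.1}" "${1:-pi-hole.net}" +time=3 +tries=2 >/dev/null 2>&1
}

# Restart FTL the same way as in the first post
restart_ftl() {
  /etc/init.d/pihole-FTL restart
}

pihole_watchdog() {
  if ! dns_answers "$@"; then
    logger -t pihole-watchdog "no DNS reply from local Pi-hole, restarting pihole-FTL"
    restart_ftl
  fi
}
```

A script that defines these functions and ends with a call to `pihole_watchdog` could be scheduled from a file in /etc/cron.d, e.g. `*/5 * * * * root /usr/local/bin/pihole-watchdog.sh` (hypothetical path). The syslog entry it writes at least records when the hang happened.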

Bucking_Horn · April 27, 2021, 10:59am

As you already stated, your database seems to be inaccessible at times:

*** [ DIAGNOSING ]: contents of /var/log

-rw-r--r-- 1 pihole pihole 780 Apr 25 05:00 /var/log/pihole-FTL.log
   -----tail of pihole-FTL.log------
   [2021-04-25 03:00:00.333 484/T488] Notice: Database size is 65.25 MB, deleted 210 rows
   [2021-04-25 03:50:00.048 484M] Reloading DNS cache
   [2021-04-25 03:50:00.389 484/T488] db_set_FTL_property(1, 1619315342) called but database is not available!
   [2021-04-25 03:50:00.389 484/T488] db_update_counters(3, 0) called but database is not available!
   [2021-04-25 03:50:00.390 484/T488] parse_neighbor_cache() - Database is not available
   [2021-04-25 04:00:20.083 484/T490] getNameFromIP("10.0.57.20") - Database not available
   [2021-04-25 05:00:20.825 484/T490] getNameFromIP("10.0.57.20") - Database not available
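To see when the database first became unavailable and how often it was reported, a rotated FTL log can be scanned for these messages. A minimal sketch, assuming the line format shown in the excerpt above (a bracketed `date time pid/thread` prefix); the function name is hypothetical:

```shell
# Hypothetical helper: report the first timestamp and the number of
# "database (is) not available" messages in a pihole-FTL log file.
db_unavailable_summary() {
  pattern='database (is )?not available'
  # Take the first matching line and keep only "YYYY-MM-DD HH:MM:SS.mmm"
  first=$(grep -m1 -iE "$pattern" "$1" |
    sed -n 's/^\[\([0-9-]* [0-9:.]*\) .*/\1/p')
  # grep -c prints 0 but exits 1 when nothing matches; ignore that status
  count=$(grep -ciE "$pattern" "$1" || :)
  printf 'first: %s\ncount: %s\n' "${first:-none}" "${count:-0}"
}
```

Run as `db_unavailable_summary /var/log/pihole-FTL.log.1`, the first timestamp can then be compared directly with the 03:48 gravity run.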

While likely not related, your debug log also shows some potential connectivity problems:

*** [ DIAGNOSING ]: Name resolution (IPv4) using a random blocked domain and a known ad-serving domain
[✗] Failed to resolve id0720.icu via localhost (127.0.0.1)
[✗] Failed to resolve id0720.icu via Pi-hole (10.0.57.200)
[✓] doubleclick.com is 216.58.215.238 via a remote, public DNS server (8.8.8.8)

Running pihole -r and choosing Reconfigure would update your Pi-hole to use the correct IP address.
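To confirm such a mismatch before and after running `pihole -r`, the address Pi-hole was set up with can be compared with the addresses actually assigned to the VM. A sketch, assuming Pi-hole v5 keeps an `IPV4_ADDRESS=address/prefix` line in /etc/pihole/setupVars.conf; both function names are hypothetical:

```shell
# List the IPv4 addresses assigned to this host, without prefix length.
# "ip -4 -o addr show" prints one line per address; field 4 is address/prefix.
host_ipv4s() {
  ip -4 -o addr show | awk '{print $4}' | cut -d/ -f1
}

# Succeed if the IPV4_ADDRESS stored in setupVars.conf is assigned to this host;
# return 2 if the file has no such line (or cannot be read).
pihole_ip_matches() {
  conf=${1:-/etc/pihole/setupVars.conf}
  configured=$(sed -n 's/^IPV4_ADDRESS=//p' "$conf" | cut -d/ -f1)
  [ -n "$configured" ] || return 2
  host_ipv4s | grep -qxF "$configured"
}
```

For example, `pihole_ip_matches || echo "configured IP missing or not assigned to this host"`.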

I also noticed some Zabbix and turnserver processes in your log.
While the latter may be quite resource intensive if heavily used, the former has been observed to exhaust memory and disk space by issuing tens of thousands of queries per hour for some of our users.

You could try to evaluate whether those processes contribute to your issue, e.g. by hosting them on a different device for a while.
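One way to check whether any single client floods Pi-hole is to count queries per client in the query log. A minimal sketch, assuming the dnsmasq-style lines Pi-hole writes to /var/log/pihole.log, such as `Apr 25 03:49:04 dnsmasq[484]: query[A] example.com from 10.0.57.20`; the function name is hypothetical:

```shell
# Hypothetical helper: count DNS queries per client, busiest client first.
# Only "query[TYPE] name from CLIENT" lines are counted; the client is the last field.
# Reads the file given as $1, or standard input if no argument is given.
queries_per_client() {
  grep -E ': query\[[^]]*\] ' "${1:--}" | awk '{print $NF}' | sort | uniq -c | sort -rn
}
```

For a per-hour view, the log can be filtered first, e.g. `grep 'Apr 25 03:' /var/log/pihole.log | queries_per_client`. A client sending tens of thousands of queries per hour would show up at the top.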

Elleni · April 27, 2021, 1:46pm

Thanks for your observations. I ran pihole -r.

You are right about the two services you mentioned. Coturn should run on this machine, but it is hardly ever used. In fact, our Nextcloud instance is very small: we are a company with fewer than 10 employees. For the last few weeks I can even confirm that no Talk sessions (which is what I installed coturn for) were running at all. As for Zabbix, this is only the Zabbix client/agent, so that this server can be monitored. I did not install the Zabbix server on this Pi-hole VM.

How can I find out whether one of these services is causing this? How would you proceed? I would rather not remove the coturn service, and I would like to keep monitoring this VM with Zabbix.

Speaking of Zabbix: I could send you an HTML export of the Zabbix screens for the relevant time period. But I do not see any significant CPU, RAM or disk load. The only small peak is during the FTL update at 3:50.

Since I saw that an upgrade was available, I updated/upgraded Debian and Pi-hole itself, and rebooted.

If you have any hint on how to fix this, or if you would like to see more information, I'd be happy to provide it.

Otherwise I will reboot the VM, see whether this occurs again, and report back.

The resources assigned to this VM are as follows:

2 GB RAM - average usage not even ~50%, most of the time about 800 MB.
16 GB disk space - 12 GB available/free, ~17% used.

On the other hand, I have a second Pi-hole VM on another host with the same blacklists, and that one does not have these problems; it does not have coturn or the Zabbix agent installed (yet). So don't get me wrong: I do not deny that one of these services could be the culprit, I just don't know where to start digging to get a stable Pi-hole again while still being able to run those two additional services.

What I could do is wait a few weeks to see whether the problem recurs, then try disabling e.g. the Zabbix agent, check again whether the problem disappears, and so on.

What I found a bit strange is that restarting pihole-FTL did not work cleanly while the problem was present: instead I got the message about the process being killed (see my first post). Could this be a hint?

Elleni · April 28, 2021, 6:54am

Is it possible that there are too many lists configured in Pi-hole? Or that I have to increase some value like max open files, disk cache size or something similar? Or is there some debug logging I can enable to get more information?
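On the open-files question, whether FTL is anywhere near its descriptor limit can be read from /proc while the problem is present, before restarting. A Linux-only sketch (run as root; the function name is hypothetical) that compares the number of open descriptors with the soft limit:

```shell
# Hypothetical helper: print how many file descriptors a process has open
# and its soft "Max open files" limit, both read from /proc (Linux only).
fd_usage() {
  pid=$1
  open=$(ls "/proc/$pid/fd" | wc -l)
  # /proc/PID/limits line: "Max open files  <soft>  <hard>  files"
  limit=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits")
  echo "open=$open limit=$limit"
}
```

For example, `fd_usage "$(pidof pihole-FTL)"`. An `open` value close to `limit` would support the open-files theory; a low value would rule it out.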

Bucking_Horn · April 28, 2021, 7:16pm

The way you describe it, your TURN server likely isn't involved, since there was no activity at the time of the failure. And if you suffered from the high-volume DNS queries by Zabbix I mentioned, you would have noticed them in your query log.

The only possible lead is jfb's chronological correlation of the time the failure first registered (3:50) with the time of Pi-hole's weekly adlist update (3:48), and the only other issue from your log was the resolution failure due to a wrong IPv4 address configuration.

I currently have no working theory why your database would become inaccessible. That your system logs contain no errors, and that the failure has only occurred once so far, does not make that any easier.

Or did it reoccur yet?

Elleni · April 30, 2021, 6:59am

It had recurred occasionally before, maybe once a month. I will report back the next time it happens. Is there anything specific I should check as soon as it happens again?

system · May 21, 2021, 6:59am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.