PiHole-FTL Crashing in the middle of the night

Expected Behaviour:

Not Crashing

Actual Behaviour:

FTL crashes while the web interface stays up. Re-enabling it fails, and disabling and re-enabling fails too; recovery requires a full run of the command pihole restartdns on the command line.

Debug Token:

https://tricorder.pi-hole.net/0678r5pnw8

You have another DNS server running on port 53, and this is interfering with pihole-FTL on that port:

[53] is in use by named (https://docs.pi-hole.net/main/prerequisites/#ports)
[53] is in use by named (https://docs.pi-hole.net/main/prerequisites/#ports)

Restarting the DNS resolves the problem every time, which seems to point at a zombie process of sorts.
Since this issue, I've deleted the corrupt FTL database (from the errors this was the problem) and restarted it, the problem seems to have gone away.

It will come back when named (Bind) comes back and fights for port 53.

It won’t because the only other dns running in the pc is on a different eth and ip. The only thing using port 53 on the ip and eth0 for pihole is pihole.

Again, as I said previously, I have restarted pihole DNS from the UI successfully. The UI runs, pihole runs, but it seems the database is somehow getting corrupted: pihole does something between 11pm and 12am that crashes pihole-FTL and, I suspect, leaves a zombie process holding the port.

I haven’t restarted bind on the other ip because that is still running, my pihole is configured to be connected to a single ip and eth, it isn’t being stolen, otherwise pihole would not start at all.

If something else was taking the port it would not start at all. Hence I believe this to be a zombie process of pihole from the point it crashes due to a bad malformed database.

Deleting the database and restarting pihole resolved the issue and it hasn’t reoccurred, my other dns hasn’t been affected.

In short I believe I may have identified the problem.
I think pihole does some sort of update check, and a dump of long-term data into the database, at night; this is from the logs around the time of the issue. That does some kind of work on the database. In accessing the database, FTL crashes but leaves a zombie process locking port 53.
This takes the DNS element out, but leaves the UI intact. This is why I am able to use the interface on the screen, in order to restart the DNS successfully.

Restarting the DNS I believe uses the lock file to identify and kill the zombie process. It can then restart successfully.

Checking the logs, I can see the database is corrupt. A short-term query log examination through the UI succeeds. However, a long-term query throws exception errors on screen.
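For anyone hitting the same symptoms, one way to confirm corruption before deleting anything is to ask SQLite to verify the file. This is only a sketch under assumptions: the sqlite3 command-line tool is installed, the database sits at the default /etc/pihole/pihole-FTL.db, and db_integrity is a made-up helper name, not a Pi-hole command.

```shell
#!/bin/sh
# Assumption: default database location on a standard install.
FTL_DB="${FTL_DB:-/etc/pihole/pihole-FTL.db}"

# db_integrity is a hypothetical helper name. It opens the file read-only and
# prints SQLite's verdict: a single "ok" line when no damage is found.
db_integrity() {
    sqlite3 -readonly "$1" 'PRAGMA integrity_check;'
}

if [ -r "$FTL_DB" ]; then
    db_integrity "$FTL_DB"
else
    echo "cannot read $FTL_DB (try running as root)" >&2
fi
```

Anything other than a lone "ok" suggests the file is damaged. Opening it read-only is a deliberate choice so the check does not write to a database that FTL may still be using.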

Since deleting the FTL database all is okay, both short and long term queries return without exception being thrown.

Therefore, either there is an error in the update-checking script, or in the service dumping short-term data to long-term storage.

Well, a good way to check would be to check what process has port 53 before you restart the DNS and be absolutely sure.
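A rough sketch of that check, to be run while the problem is present and before restarting the DNS. It assumes a Linux system with ss (iproute2) and a procps-style ps; only_zombies is a hypothetical helper name used here for illustration.

```shell
#!/bin/sh
# only_zombies (hypothetical helper): keep the header line and any line whose
# third column (STAT) starts with Z, i.e. defunct processes.
only_zombies() {
    awk 'NR == 1 || $3 ~ /^Z/'
}

# Which process is listening on port 53 (TCP and UDP) right now?
# Run as root, otherwise process names of other users' sockets are hidden.
if command -v ss >/dev/null 2>&1; then
    ss -tulpn 'sport = :53' || true
fi

# Are there any defunct (zombie) processes at all?
ps -eo pid,ppid,stat,comm | only_zombies
```

The Local Address column of the ss output also shows whether each listener sits on one specific IP or on a wildcard address, which is the other point being argued in this thread.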

Dan, I suggest you read the messages above.

LesTHawk, I did.

If you want to find out if there is a zombie process to confirm your suspicions, then my advice is the way to go.

If you want to find out what sort of processes run automatically, then check /etc/cron.d/pihole. I'll quote that content here:

dan@raspberrypi:/etc/cron.d $ cat pihole
# Pi-hole: A black hole for Internet advertisements
# (c) 2017 Pi-hole, LLC (https://pi-hole.net)
# Network-wide ad blocking via your own hardware.
#
# Updates ad sources every week
#
# This file is copyright under the latest version of the EUPL.
# Please see LICENSE file for your rights under this license.
#
#
#
# This file is under source-control of the Pi-hole installation and update
# scripts, any changes made to this file will be overwritten when the software
# is updated or re-installed. Please make any changes to the appropriate crontab
# or other cron file snippets.

# Pi-hole: Update the ad sources once a week on Sunday at a random time in the
#          early morning. Download any updates from the adlists
#          Squash output to log, then splat the log to stdout on error to allow for
#          standard crontab job error handling.
53 4   * * 7   root    PATH="$PATH:/usr/sbin:/usr/local/bin/" pihole updateGravity >/var/log/pihole_updateGravity.log || cat /var/log/pihole_updateGravity.log

# Pi-hole: Flush the log daily at 00:00
#          The flush script will use logrotate if available
#          parameter "once": logrotate only once (default is twice)
#          parameter "quiet": don't print messages
00 00   * * *   root    PATH="$PATH:/usr/sbin:/usr/local/bin/" pihole flush once quiet

@reboot root /usr/sbin/logrotate /etc/pihole/logrotate

# Pi-hole: Grab local version and branch every 10 minutes
*/10 *  * * *   root    PATH="$PATH:/usr/sbin:/usr/local/bin/" pihole updatechecker local

# Pi-hole: Grab remote version every 24 hours
46 12  * * *   root    PATH="$PATH:/usr/sbin:/usr/local/bin/" pihole updatechecker remote
@reboot root    PATH="$PATH:/usr/sbin:/usr/local/bin/" pihole updatechecker remote reboot
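To see only the scheduled entries without the comments, something like the following should do. It is a small sketch; active_cron_entries is a hypothetical helper name, and the path is the one quoted above.

```shell
#!/bin/sh
# active_cron_entries (hypothetical helper): print the lines of a cron file
# that are neither blank nor comments.
active_cron_entries() {
    grep -Ev '^[[:space:]]*(#|$)' "$1"
}

if [ -r /etc/cron.d/pihole ]; then
    active_cron_entries /etc/cron.d/pihole
fi
```

In the file quoted above, the only job scheduled at 00:00 is pihole flush once quiet. That lines up with the reported time window, but by itself it does not show that the flush is what triggers the crash.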

Oh, it will.

By default, Pi-hole's dnsmasq binds to all interfaces, but listens only on the ones it is configured to.
This will still invite conflicts for different processes trying to claim port 53.

No, it won’t, because as I’ve intimated previously, several times now, I know what I’m doing. I’ve got pihole tied to eth0, I’ve got bind tied to eth1. I’m one of those weird people who runs a domain controller on a pi using samba-addc.
So to be absolutely clear.

  1. Bind9 is configured for ONLY eth1.
  2. Pihole is bound to ONLY eth0.
  3. I know what I’m doing and am not duplicating an adaptor over 2 DNS servers.
  4. I’ve resolved the issue THAT APPEARED IN THE LOGS, of a corrupted FTL database by deleting it and letting pihole recreate it.

This setup has been running for 2 years. Please don’t question the config. The config is fine. Because I’m not an idiot who configures pihole on an adaptor that is used by a separate DNS running on the same port.

My reason for logging this issue is not to correct my config, as I’ve said, MY CONFIG IS CORRECT. I’ve logged it so that someone can investigate why the FTL database spontaneously combusted and corrupted in the hope that it prevents it happening to anyone else as a result of recent updates.

So you had no intention of heeding any of our advice or guidance. Okay. Best of luck to you!

No Dan, I had every intention of taking advice but only from people who listen or in this case read what information I have provided and not just blast a response without consideration of the facts. I said several times that I had a separate DNS running non-conflicted but you ignored me. That’s not support, that’s belligerence.

I've moved this to Community Help, hopefully someone comes by to provide you the answers you already have decided are the only acceptable ones.


Are you always so full of yourself that only your answers are correct? I’m sure I’m correct because I’ve fully investigated the issue already.
Your attitude frankly sucks.

I believe that should be, "Your attitude, frankly, sucks".

Often enough crashes are assumed when other issues happen somewhere under the hood so we first have to find out what the issue really is.

Can you show these logs?

I'm not yet ready to believe that it is the database dumping that causes the issue, because data is dumped into the database once every minute (unless that is changed in /etc/pihole/pihole-FTL.conf). This is to reduce possible data loss on power loss which is, btw, a common thing on Raspberry Pis where users pull the plug for shutdown....
There is not a single special event at night where something happens in FTL.

For now, the last few lines of /var/log/pihole-FTL.log from when the issue happened will be of interest. Also the content of /etc/pihole/pihole-FTL.conf if it exists (it usually doesn't, unless it was created manually).
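A minimal sketch for collecting both in one go, using the two paths named above. show_ftl_state is a hypothetical helper name, and the choice of 30 lines is arbitrary.

```shell
#!/bin/sh
# show_ftl_state (hypothetical helper): print the tail of the FTL log and the
# FTL config file, noting when either one is missing.
show_ftl_state() {
    log="${1:-/var/log/pihole-FTL.log}"
    conf="${2:-/etc/pihole/pihole-FTL.conf}"

    echo "== last 30 lines of $log =="
    if [ -r "$log" ]; then
        tail -n 30 "$log"
    else
        echo "(not readable)"
    fi

    echo "== $conf =="
    if [ -f "$conf" ]; then
        cat "$conf"
    else
        echo "(not present)"
    fi
}

show_ftl_state
```

Running it right after the next crash, before restarting the DNS, keeps the relevant lines at the end of the log.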

Happy to help DL6ER, I’m not sure if the logs will still be available, but I’ll check.
I’m doing some additional work on the pi today adding a ups so I’ll check and come back.

We're not in a rush, so we'll just wait until it happens next time. No need for looking around and digging out old files that are likely gone.

Commas in the sentence weren’t necessary providing the message is clear. I wasn’t writing a treatise. I was replying to a rude individual who seems to think he’s always right, then tries to troll people who actually are.