Pihole-FTL / DNS failing every 12-24h requiring reboot


Expected Behaviour:

Pi-hole v5 has been unstable on both of my Raspberry Pis (a 4 and a 3B+), both running Buster. I am not using anything non-standard, so I would expect both the Pi-hole admin web interface and Pi-hole itself to run properly on these configurations.

Actual Behaviour:

On both devices, RPi 4 / Buster and RPi 3B+ / Buster, Pi-hole v5 (master branch) has been unstable. The symptoms are that the web interface slows to a crawl (page loads take more than a minute) and the device stops responding to DNS requests. The system/OS itself stays responsive, and I can SSH in without problems. The system is not under load: there is ample CPU, memory, and disk, with no swapping.

The only way to recover is to reboot the device, because both "pihole restartdns" and "sudo systemctl restart pihole-FTL" fail.

On both setups I run unbound and cloudflared. Both systems were very stable before v5.

Debug Token:

https://tricorder.pi-hole.net/i257cdevjr

Something is causing a whole lot of sockets to be open:

*** [ DIAGNOSING ]: Ports in use
127.0.0.1:5053 cloudflare (IPv4)
127.0.0.1:35369 cloudflare (IPv4)
*:22 sshd (IPv4)
*:22 sshd (IPv6)
127.0.0.1:5353 unbound (IPv4)
[80] is in use by lighttpd
[80] is in use by lighttpd
[53] is in use by pihole-FTL
[53] is in use by pihole-FTL
[4711] is in use by pihole-FTL
[4711] is in use by pihole-FTL
*:445 smbd (IPv6)
*:139 smbd (IPv6)
*:445 smbd (IPv4)
*:139 smbd (IPv4)
[53] is in use by pihole-FTL
[53] is in use by pihole-FTL
[4711] is in use by pihole-FTL
[4711] is in use by pihole-FTL
[53] is in use by pihole-FTL
[53] is in use by pihole-FTL
[4711] is in use by pihole-FTL
[4711] is in use by pihole-FTL
[53] is in use by pihole-FTL
[53] is in use by pihole-FTL
[4711] is in use by pihole-FTL
[4711] is in use by pihole-FTL
[53] is in use by pihole-FTL
[53] is in use by pihole-FTL
[4711] is in use by pihole-FTL
[4711] is in use by pihole-FTL
[53] is in use by pihole-FTL
[53] is in use by pihole-FTL
[4711] is in use by pihole-FTL
[4711] is in use by pihole-FTL
[53] is in use by pihole-FTL
[53] is in use by pihole-FTL
[4711] is in use by pihole-FTL
[4711] is in use by pihole-FTL
[53] is in use by pihole-FTL
[53] is in use by pihole-FTL
[4711] is in use by pihole-FTL
[4711] is in use by pihole-FTL
[53] is in use by pihole-FTL
[53] is in use by pihole-FTL
[4711] is in use by pihole-FTL
[4711] is in use by pihole-FTL
[53] is in use by pihole-FTL
[53] is in use by pihole-FTL
[4711] is in use by pihole-FTL
[4711] is in use by pihole-FTL
[53] is in use by pihole-FTL
[53] is in use by pihole-FTL
[4711] is in use by pihole-FTL
[4711] is in use by pihole-FTL
[53] is in use by pihole-FTL
[53] is in use by pihole-FTL
[4711] is in use by pihole-FTL
[4711] is in use by pihole-FTL
[53] is in use by pihole-FTL
[53] is in use by pihole-FTL
[4711] is in use by pihole-FTL
[4711] is in use by pihole-FTL
[53] is in use by pihole-FTL
[53] is in use by pihole-FTL
[4711] is in use by pihole-FTL
[4711] is in use by pihole-FTL
[53] is in use by pihole-FTL
[53] is in use by pihole-FTL
[4711] is in use by pihole-FTL
[4711] is in use by pihole-FTL
[53] is in use by pihole-FTL
[53] is in use by pihole-FTL
[4711] is in use by pihole-FTL
[4711] is in use by pihole-FTL
[53] is in use by pihole-FTL
[53] is in use by pihole-FTL
[4711] is in use by pihole-FTL
[4711] is in use by pihole-FTL
[53] is in use by pihole-FTL
[53] is in use by pihole-FTL
[4711] is in use by pihole-FTL
[4711] is in use by pihole-FTL
[53] is in use by pihole-FTL
[53] is in use by pihole-FTL
[4711] is in use by pihole-FTL
[4711] is in use by pihole-FTL
[53] is in use by pihole-FTL
[53] is in use by pihole-FTL
[4711] is in use by pihole-FTL
[4711] is in use by pihole-FTL
[53] is in use by pihole-FTL
[53] is in use by pihole-FTL
[4711] is in use by pihole-FTL
[4711] is in use by pihole-FTL
*:32400 Plex Me (IPv6)
127.0.0.1:32401 Plex Me (IPv4)
127.0.0.1:35007 Plex Sc (IPv4)
*:1656 Plex DL (IPv4)
*:32469 Plex DL (IPv4)
127.0.0.1:32600 Plex Tu (IPv4)
127.0.0.1:37871 Plex Tu (IPv4)
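The repeated [53] and [4711] entries suggest pihole-FTL is holding many sockets. One way to check whether that number keeps growing between failures is to count the listeners directly and compare the process's open descriptors with its limit. This is a minimal sketch assuming iproute2's `ss`, `pidof`, and a Linux `/proc`; the function names are hypothetical helpers, not part of Pi-hole. Run it with sudo so the process's `/proc` entries and socket owners are visible.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of Pi-hole) for watching pihole-FTL's sockets
# and file descriptors. Needs root to inspect another user's process.

# Count listening TCP/UDP sockets owned by a process named pihole-FTL.
count_ftl_listeners() {
    # -t/-u: TCP and UDP, -l: listening only, -n: numeric, -p: owning process
    ss -tulnp 2>/dev/null | grep -c '"pihole-FTL"'
}

# Count open file descriptors (files, sockets, pipes) of a PID.
count_open_fds() {
    find "/proc/$1/fd" -mindepth 1 -maxdepth 1 2>/dev/null | wc -l
}

# Print the soft "Max open files" limit of a PID (4th field of that row).
fd_soft_limit() {
    awk '/^Max open files/ {print $4}' "/proc/$1/limits"
}

pid="$(pidof -s pihole-FTL 2>/dev/null)"
if [[ -z "$pid" ]]; then
    echo "pihole-FTL is not running"
else
    echo "listening sockets: $(count_ftl_listeners)"
    echo "open fds: $(count_open_fds "$pid") (soft limit: $(fd_soft_limit "$pid"))"
fi
```

Running it once after boot and again when DNS stops answering shows whether the counts climb toward the limit; a steady count would point away from a descriptor leak.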

Probably unrelated, but 38 adlists and 325+ whitelist/regex modifications aren't going to help things.

   [2020-05-16 06:42:04.397 22351] SQLite3 message: API call with invalid database connection pointer (21)
   [2020-05-16 06:42:04.397 22351] SQLite3 message: misuse at line 157333 of [18db032d05] (21)
   [2020-05-16 06:45:25.968 22384] SQLite3 message: API call with invalid database connection pointer (21)
   [2020-05-16 06:45:25.968 22384] SQLite3 message: misuse at line 157333 of [18db032d05] (21)
   [2020-05-16 07:15:04.384 22599] SQLite3 message: API call with invalid database connection pointer (21)
   [2020-05-16 07:15:04.384 22599] SQLite3 message: misuse at line 157333 of [18db032d05] (21)
   [2020-05-16 07:17:38.183 22687] SQLite3 message: API call with invalid database connection pointer (21)
   [2020-05-16 07:17:38.183 22687] SQLite3 message: misuse at line 157333 of [18db032d05] (21)
   [2020-05-16 07:36:42.435 777] Resizing "/FTL-strings" from 126976 to 131072
   [2020-05-16 07:54:03.365 23126] SQLite3 message: API call with invalid database connection pointer (21)
   [2020-05-16 07:54:03.365 23126] SQLite3 message: misuse at line 157333 of [18db032d05] (21)
   [2020-05-16 08:14:22.786 23293] SQLite3 message: API call with invalid database connection pointer (21)
   [2020-05-16 08:14:22.786 23293] SQLite3 message: misuse at line 157333 of [18db032d05] (21)
   [2020-05-16 08:21:04.477 23352] SQLite3 message: API call with invalid database connection pointer (21)
   [2020-05-16 08:21:04.477 23352] SQLite3 message: misuse at line 157333 of [18db032d05] (21)
   [2020-05-16 08:35:46.808 23470] SQLite3 message: API call with invalid database connection pointer (21)
   [2020-05-16 08:35:46.809 23470] SQLite3 message: misuse at line 157333 of [18db032d05] (21)
   [2020-05-16 08:40:44.120 777] Resizing "/FTL-dns-cache" from 65536 to 69632
   [2020-05-16 08:40:47.326 23542] SQLite3 message: API call with invalid database connection pointer (21)
   [2020-05-16 08:40:47.331 23542] SQLite3 message: misuse at line 157333 of [18db032d05] (21)
   [2020-05-16 08:43:44.128 23723] SQLite3 message: API call with invalid database connection pointer (21)
   [2020-05-16 08:43:44.128 23723] SQLite3 message: misuse at line 157333 of [18db032d05] (21)
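The messages above carry SQLite result code 21, which is SQLITE_MISUSE: an SQLite API call made on a connection that is invalid or already closed. To see whether these messages cluster around the times DNS stops answering, they can be tallied per hour. A sketch, assuming the Pi-hole v5 default log path /var/log/pihole-FTL.log; `misuse_per_hour` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: tally SQLite "invalid database connection pointer"
# messages in the FTL log per hour. Output lines: "YYYY-MM-DD HH count".
misuse_per_hour() {
    local log="${1:-/var/log/pihole-FTL.log}"
    grep 'API call with invalid database connection pointer' "$log" \
        | sed -E 's/^[[:space:]]*\[([0-9-]+ [0-9]{2}):.*/\1/' \
        | sort | uniq -c \
        | awk '{print $2, $3, $1}'
}

if [[ -r /var/log/pihole-FTL.log ]]; then
    misuse_per_hour
else
    echo "cannot read /var/log/pihole-FTL.log (try sudo)" >&2
fi
```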

Does pihole -g complete without displaying any errors for malformed lists?

No issues running pihole -g. I run it regularly and remove bad lists.

Are there any scaling considerations I need to worry about with the number of blocklists and whitelists, for example the number of open file descriptors?

No concerns with the number you have. It does appear that your database might be corrupted. Let's try these steps which will move your existing database and then create a new one from your migration backup. If it doesn't produce the desired result, you can restore your old database.

sudo mv /etc/pihole/gravity.db /etc/pihole/gravity-old.db

pihole -g -r
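Before deciding between the rebuilt database and the old one, SQLite's built-in consistency check can confirm whether the moved-aside copy is actually damaged. This sketch assumes the sqlite3 command-line shell is installed (Debian package sqlite3); `check_db` is a hypothetical helper. `PRAGMA integrity_check` prints `ok` for a healthy file and a list of problems otherwise; it may take a while on a large pihole-FTL.db.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run SQLite's integrity check on a database file.
# Prints "ok" when no problems are found.
check_db() {
    sqlite3 "$1" 'PRAGMA integrity_check;'
}

for db in /etc/pihole/gravity-old.db /etc/pihole/gravity.db /etc/pihole/pihole-FTL.db; do
    # Only check files that exist, so sqlite3 does not create empty ones.
    if [[ -r "$db" ]] && command -v sqlite3 >/dev/null; then
        echo "$db: $(check_db "$db")"
    fi
done

# One way to restore the old database if the rebuild does not help:
#   sudo systemctl stop pihole-FTL
#   sudo mv /etc/pihole/gravity-old.db /etc/pihole/gravity.db
#   sudo systemctl start pihole-FTL
```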

Thanks, I will give it a try and let you know. Yesterday I flushed pihole-FTL.db, but I never flushed the gravity database.

So far, so good. This is the longest Pi-hole has stayed up since last Monday. I think something must have corrupted the gravity database, as you said, and Teleporter carried the problem over to my redundant server.

Thanks for your help!

OK, unfortunately the issue is back.

I went to check the admin portal, and immediately DNS stopped responding. The log entries from journalctl -xe show errors from around the same time I hit the admin page.

The front-end symptom is that the web GUI takes a very long time to load (30-60 seconds), and the status area at the top left shows DNS as not running.
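Since the web interface and DNS fail together, a scripted lookup against the Pi-hole itself gives a timestamp for when DNS stops answering, which can then be matched against the FTL log. A minimal sketch, assuming `dig` from the dnsutils package; `dns_ok` is a hypothetical helper, and pi-hole.net is only an example query name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the server answers with status NOERROR
# within a single 2-second attempt.
dns_ok() {
    local server="${1:-127.0.0.1}" name="${2:-pi-hole.net}"
    dig +time=2 +tries=1 @"$server" "$name" 2>/dev/null | grep -q 'status: NOERROR'
}

if dns_ok; then
    echo "$(date '+%F %T') DNS on 127.0.0.1 answered"
else
    echo "$(date '+%F %T') DNS on 127.0.0.1 did NOT answer"
fi
```

Scheduled from cron every few minutes (for example `*/5 * * * *`) with its output appended to a file, this narrows down when the failure starts.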

pi@plexpi:~ $ sudo pihole restartdns
[✗] Job for pihole-FTL.service failed because the control process exited with error code.
See "systemctl status pihole-FTL.service" and "journalctl -xe" for details.

pi@plexpi:~ $ sudo journalctl -xe | grep pihole-FTL
May 18 10:04:19 plexpi sudo[22124]: pi : TTY=pts/0 ; PWD=/home/pi ; USER=root ; COMMAND=/bin/systemctl pihole-FTL status
May 18 10:04:32 plexpi sudo[22173]: pi : TTY=pts/0 ; PWD=/home/pi ; USER=root ; COMMAND=/bin/systemctl status pihole-FTL
May 18 10:05:12 plexpi systemd[1]: Stopping LSB: pihole-FTL daemon...
-- Subject: A stop job for unit pihole-FTL.service has begun execution
-- A stop job for unit pihole-FTL.service has begun execution.
May 18 10:05:18 plexpi pihole-FTL[22279]: .....
May 18 10:05:18 plexpi pihole-FTL[22279]: Not stopped; may still be shutting down or shutdown may have failed, killing now
May 18 10:05:18 plexpi systemd[1]: pihole-FTL.service: Control process exited, code=exited, status=1/FAILURE
-- An ExecStop= process belonging to unit pihole-FTL.service has exited.
May 18 10:05:18 plexpi systemd[1]: pihole-FTL.service: Failed with result 'exit-code'.
-- The unit pihole-FTL.service has entered the 'failed' state with result 'exit-code'.
May 18 10:05:18 plexpi systemd[1]: Stopped LSB: pihole-FTL daemon.
-- Subject: A stop job for unit pihole-FTL.service has finished
-- A stop job for unit pihole-FTL.service has finished.
May 18 10:05:18 plexpi systemd[1]: Starting LSB: pihole-FTL daemon...
-- Subject: A start job for unit pihole-FTL.service has begun execution
-- A start job for unit pihole-FTL.service has begun execution.
May 18 10:05:23 plexpi pihole-FTL[22318]: .....
May 18 10:05:23 plexpi pihole-FTL[22318]: Not stopped; may still be shutting down or shutdown may have failed, killing now
May 18 10:05:23 plexpi systemd[1]: pihole-FTL.service: Control process exited, code=exited, status=1/FAILURE
-- An ExecStart= process belonging to unit pihole-FTL.service has exited.
May 18 10:05:23 plexpi systemd[1]: pihole-FTL.service: Failed with result 'exit-code'.
-- The unit pihole-FTL.service has entered the 'failed' state with result 'exit-code'.
May 18 10:05:23 plexpi systemd[1]: Failed to start LSB: pihole-FTL daemon.
-- Subject: A start job for unit pihole-FTL.service has failed
-- A start job for unit pihole-FTL.service has finished with a failure.
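The init script reports that the old process did not stop and that it is "killing now", yet the restart still fails. One possible explanation, not confirmed by these logs, is that the old pihole-FTL is stuck in uninterruptible sleep (state `D`); such a process generally does not act on signals, even SIGKILL, and would keep port 53 occupied until a reboot. A sketch for checking this, assuming procps `ps`; the helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for inspecting a pihole-FTL process that refuses to stop.

# Show PID, state, elapsed time and kernel wait channel for each process
# with the given name. A STAT starting with "D" means uninterruptible sleep.
show_procs() {
    ps -o pid,stat,etime,wchan:32,cmd -C "${1:-pihole-FTL}"
}

# Succeed if any process with that name is in uninterruptible sleep.
has_stuck_proc() {
    ps -o stat= -C "${1:-pihole-FTL}" | grep -q '^[[:space:]]*D'
}

show_procs
if has_stuck_proc; then
    echo "pihole-FTL is in state D; signals likely will not take effect until it wakes"
fi
# The unit's own log for the same window, without the grep filter:
#   journalctl -u pihole-FTL --since "15 min ago" --no-pager
```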

I pointed /etc/resolv.conf back to 1.1.1.1 and ran pihole -d:

[✓] Your debug token is: https://tricorder.pi-hole.net/wz4zk5pnn0

Stability has improved so far after I caught two potential issues that may contribute to this problem:

  1. I had DNSSEC enabled in the Pi-hole admin GUI. It should have been unticked, since I'm using unbound.

  2. I noticed that both unbound and avahi-daemon were using port 5353, so I moved unbound to port 5335 just in case. Technically avahi-daemon shouldn't hog the port, but it's best to resolve the conflict.
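Both changes can be confirmed from the shell. The sketch below, assuming iproute2's `ss`, lists which processes hold the old and new unbound ports and reads the DNSSEC flag from Pi-hole's setupVars.conf; the `DNSSEC=` key name is an assumption about Pi-hole v5's file layout, and the helper names are hypothetical. The unbound fragment in the comments uses the port from the post; the file name is only an example.

```bash
#!/usr/bin/env bash
# Hypothetical helpers to confirm the port move and the DNSSEC setting.

# List TCP/UDP listeners whose local port matches (ss filter expression).
listeners_on_port() {
    ss -tulnp "sport = :$1" 2>/dev/null
}

# Print the DNSSEC line from Pi-hole's setupVars.conf (key name assumed).
pihole_dnssec_setting() {
    grep -E '^DNSSEC=' "${1:-/etc/pihole/setupVars.conf}" 2>/dev/null
}

for port in 5353 5335; do
    echo "== port $port =="
    listeners_on_port "$port"
done
pihole_dnssec_setting || echo "no DNSSEC= line found"

# Matching unbound setting, e.g. in /etc/unbound/unbound.conf.d/pi-hole.conf:
#   server:
#       port: 5335
# with Pi-hole's custom upstream DNS server set to 127.0.0.1#5335.
```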

Again, I spoke too soon. The Pi-hole DNS service still fails periodically, and I have no idea what could be causing it.

'pihole restartdns' fails, as does 'sudo systemctl restart pihole-FTL', so a reboot is still required.
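Before rebooting, one more manual attempt can distinguish a hung process from a service-manager problem: stop the unit, force-kill any surviving pihole-FTL, clear the failed state, and start it again. A minimal sketch, assuming root and procps `pgrep`/`pkill`; `restart_ftl` and `proc_running` are hypothetical helpers. If the old process survives SIGKILL, that points to the uninterruptible-sleep case, where a reboot is likely the only way out.

```bash
#!/usr/bin/env bash
# Hypothetical recovery helper; only acts when called with --run, as root.

# Succeed if a process with exactly this name exists.
proc_running() {
    pgrep -x "$1" >/dev/null
}

restart_ftl() {
    systemctl stop pihole-FTL    # may report failure, as in the journal output
    sleep 2
    if proc_running pihole-FTL; then
        pkill -KILL -x pihole-FTL
        sleep 2
    fi
    if proc_running pihole-FTL; then
        echo "pihole-FTL survived SIGKILL; a reboot is probably unavoidable" >&2
        return 1
    fi
    # Clear the 'failed' state left by the earlier attempt, then start again.
    systemctl reset-failed pihole-FTL 2>/dev/null
    systemctl start pihole-FTL
}

if [[ "${1:-}" == "--run" ]]; then
    restart_ftl
else
    echo "usage: sudo $0 --run"
fi
```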
