PiHole-FTL service timeout at start

JulioQc · December 19, 2018, 11:14pm

Hi,

Out of the blue, my pihole-FTL died and now refuses to start. Here is the output of pihole -r (after first changing resolv.conf to 8.8.8.8):

[✓] Root user check

        .;;,.
        .ccccc:,.
         :cccclll:.      ..,,
          :ccccclll.   ;ooodc
           'ccll:;ll .oooodc
             .;cll.;;looo:.
                 .. ','.
                .',,,,,,'.
              .',,,,,,,,,,.
            .',,,,,,,,,,,,....
          ....''',,,,,,,'.......
        .........  ....  .........
        ..........      ..........
        ..........      ..........
        .........  ....  .........
          ........,,,,,,,'......
            ....',,,,,,,,,,,,.
               .',,,,,,,,,'.
                .',,,,,,'.
                  ..'''.

  [i] Existing PHP installation detected : PHP version 7.0.33-0+deb9u1
  [i] Repair option selected
  [✓] Disk space check
  [✓] Update local cache of available packages

  [✓] Checking apt-get for upgraded packages... up to date!

  [i] Installer Dependency checks...
  [✓] Checking for apt-utils
  [✓] Checking for dialog
  [✓] Checking for debconf
  [✓] Checking for dhcpcd5
  [✓] Checking for git
  [✓] Checking for iproute2
  [✓] Checking for whiptail

  [i] Performing reconfiguration, skipping download of local repos
  [✓] Resetting repository within /etc/.pihole...
  [✓] Resetting repository within /var/www/html/admin...
  [i] Main Dependency checks...
  [✓] Checking for bc
  [✓] Checking for cron
  [✓] Checking for curl
  [✓] Checking for dnsutils
  [✓] Checking for iputils-ping
  [✓] Checking for lsof
  [✓] Checking for netcat
  [✓] Checking for psmisc
  [✓] Checking for sudo
  [✓] Checking for unzip
  [✓] Checking for wget
  [✓] Checking for idn2
  [✓] Checking for sqlite3
  [✓] Checking for libcap2-bin
  [✓] Checking for dns-root-data
  [✓] Checking for resolvconf
  [✓] Checking for lighttpd
  [✓] Checking for php7.0-common
  [✓] Checking for php7.0-cgi
  [✓] Checking for php7.0-sqlite3

  [✓] Enabling lighttpd service to start on reboot...
  [✓] Checking for user 'pihole'
  [✓] Installing scripts from /etc/.pihole

  [i] Installing configs from /etc/.pihole...
  [i] Existing dnsmasq.conf found... it is not a Pi-hole file, leaving alone!
  [✓] Copying 01-pihole.conf to /etc/dnsmasq.d/01-pihole.conf

  [i] Installing blocking page...
  [✓] Creating directory for blocking page, and copying files
  [✗] Backing up index.lighttpd.html
      No default index.lighttpd.html file found... not backing up

  [✓] Installing sudoer file

  [✓] Installing latest Cron script

  [✓] Installing latest logrotate script

  [i] FTL Checks...

  [✓] Detected x86_64 architecture
  [i] Checking for existing FTL binary...
  [i] Latest FTL Binary already installed (v4.1). Confirming Checksum...
  [i] Checksum correct. No need to download!
  [✓] man pages installed and database updated
  [i] Testing if systemd-resolved is enabled
  [i] Systemd-resolved is not enabled
  [✓] Starting lighttpd service...
  [✓] Enabling lighttpd service to start on reboot...
  [i] Restarting services...
  [✓] Enabling pihole-FTL service to start on reboot...
  [i] Starting pihole-FTL service...

And it just dies there. Manual service pihole start simply returns a timeout...

Job for pihole-FTL.service failed because a timeout was exceeded.
See "systemctl status pihole-FTL.service" and "journalctl -xe" for details.

Output of systemctl status pihole-FTL.service:

systemd[1]: Starting LSB: pihole-FTL daemon...
pihole-FTL[2304]: Terminated
pihole-FTL[2503]: Stopped
su[2595]: Successful su for pihole by root
su[2595]: + ??? root:pihole
su[2595]: pam_unix(su:session): session opened for user pihole by (uid=0)
systemd[1]: pihole-FTL.service: Start operation timed out. Terminating.
systemd[1]: Failed to start LSB: pihole-FTL daemon.
systemd[1]: pihole-FTL.service: Unit entered failed state.
systemd[1]: pihole-FTL.service: Failed with result 'timeout'.

And journalctl -xe output

Dec 19 17:06:17 PiHOLE systemd[1]: Failed to start LSB: pihole-FTL daemon.
-- Subject: Unit pihole-FTL.service has failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- Unit pihole-FTL.service has failed.
--
-- The result is failed.

Latest lines of /var/log/pihole-FTL.log, if it can help (these repeat a million times, always increasing):

[2018-12-19 17:08:52.756] Notice: Increasing domains struct size from 267000 to 268000
[2018-12-19 17:08:57.116] Notice: Increasing domains struct size from 268000 to 269000
[2018-12-19 17:09:00.784] Notice: Increasing domains struct size from 269000 to 270000
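
For a rough sense of how fast the struct is growing, the timestamps in these notices can be turned into a rate. This is only a minimal Python sketch, assuming the log line format shown above; growth_rate is a hypothetical helper for illustration, not part of Pi-hole.

```python
import re
from datetime import datetime

# Matches lines such as:
# [2018-12-19 17:08:52.756] Notice: Increasing domains struct size from 267000 to 268000
LINE_RE = re.compile(
    r"\[(?P<ts>[\d-]+ [\d:.]+)\] Notice: Increasing domains struct size "
    r"from (?P<old>\d+) to (?P<new>\d+)"
)


def growth_rate(log_lines):
    """Return domain slots added per second between the first and last matching line."""
    points = []
    for line in log_lines:
        match = LINE_RE.search(line)
        if match:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
            points.append((ts, int(match.group("new"))))
    if len(points) < 2:
        raise ValueError("need at least two matching log lines")
    (t0, n0), (t1, n1) = points[0], points[-1]
    elapsed = (t1 - t0).total_seconds()
    if elapsed <= 0:
        raise ValueError("log lines are not in increasing time order")
    return (n1 - n0) / elapsed


SAMPLE = [
    "[2018-12-19 17:08:52.756] Notice: Increasing domains struct size from 267000 to 268000",
    "[2018-12-19 17:08:57.116] Notice: Increasing domains struct size from 268000 to 269000",
    "[2018-12-19 17:09:00.784] Notice: Increasing domains struct size from 269000 to 270000",
]

if __name__ == "__main__":
    print(f"{growth_rate(SAMPLE):.1f} domain slots per second")
```

On the three lines above this works out to roughly 250 slots per second (2000 slots in 8.028 seconds), which is only a snapshot and may not hold for the whole startup.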

It's running on a Debian Stretch VM. Any help would be welcome before I scrap the VM and start over.

Thank you

jfb · December 20, 2018, 1:30am

How many domains were on blocklist?

JulioQc · December 20, 2018, 1:37pm

2,310,414 domains are on the blocklist. It has been working fine for over 6 months and across updates. Updated from FTL 4.0 to 4.1 about 1 week ago.

If it can help: Your debug token is: wjsnsxljlf

UPDATE: the 01 VM instance pihole-FTL service came back up overnight, now the 02 went down with the same symptoms....

UPDATE2: If I change /etc/resolv.conf to something other than 127.0.0.1, the service will start again on its own about 1h later.

jfb · December 21, 2018, 4:10am

What are the outputs of the following commands:

echo ">stats" | nc localhost 4711

dir -l -h /etc/pihole/pihole-FTL.db
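
If these checks need to be repeated across several instances, the same two questions can be asked from a script. The following is a minimal Python sketch, not an official client: it assumes FTL's API listens on localhost port 4711 as in the nc command above and ends its reply with a ---EOM--- line. The function names parse_stats, query_stats and db_size_gib are hypothetical.

```python
import os
import socket


def parse_stats(text):
    """Parse the 'key value' lines of a >stats reply into a dict of strings."""
    stats = {}
    for line in text.splitlines():
        line = line.strip()
        if line == "---EOM---":  # end-of-message marker, stop reading here
            break
        if not line:
            continue
        key, _, value = line.partition(" ")
        stats[key] = value
    return stats


def query_stats(host="localhost", port=4711, timeout=5.0):
    """Send '>stats' to the FTL API and return the parsed reply.

    Raises ConnectionRefusedError if nothing is listening on the port.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b">stats\n")
        reply = b""
        while b"---EOM---" not in reply:
            data = sock.recv(4096)
            if not data:  # server closed the connection
                break
            reply += data
    return parse_stats(reply.decode("utf-8", errors="replace"))


def db_size_gib(path="/etc/pihole/pihole-FTL.db"):
    """Return the size of the long-term database file in GiB."""
    return os.path.getsize(path) / 1024**3
```

A refused connection from query_stats means FTL is not listening on that port, which is the same information the nc command gives.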

JulioQc · December 21, 2018, 12:58pm

As of this morning, both are dead so I had to remove them from our pool....
Here is the output:

root@PiHOLE-01-M:/home/it# echo ">stats" | nc localhost 4711
localhost [127.0.0.1] 4711 (?) : Connection refused
root@PiHOLE-01-M:/home/it# dir -l -h /etc/pihole/pihole-FTL.db
-rw-r--r-- 1 pihole pihole 4.6G Dec 20 20:24 /etc/pihole/pihole-FTL.db

On the second one:

root@PiHOLE-02-M:/home/it# echo ">stats" | nc localhost 4711
localhost [127.0.0.1] 4711 (?) : Connection refused
root@PiHOLE-02-M:/home/it# dir -l -h /etc/pihole/pihole-FTL.db
-rw-r--r-- 1 pihole pihole 4.0G Dec 20 20:36 /etc/pihole/pihole-FTL.db

And for comparison, the output from the working one (3rd instance):

root@PiHOLE-01-L:/home/it# echo ">stats" | nc localhost 4711
domains_being_blocked 2309703
dns_queries_today 946972
ads_blocked_today 73189
ads_percentage_today 7.728740
unique_domains 755908
queries_forwarded 839437
queries_cached 34346
clients_ever_seen 5
unique_clients 5
dns_queries_all_types 946972
reply_NODATA 14171
reply_NXDOMAIN 28827
reply_CNAME 20369
reply_IP 796062
privacy_level 0
status enabled
---EOM---
^C
root@PiHOLE-01-L:/home/it# dir -l -h /etc/pihole/pihole-FTL.db
-rw-r--r-- 1 pihole pihole 2.0G Dec 21 07:57 /etc/pihole/pihole-FTL.db

UPDATE: after deleting /etc/pihole/pihole-FTL.db and a full repair using pihole -r the service started normally.

jfb · December 21, 2018, 2:55pm

I believe what is happening is this: you have a very large number of queries in 24 hours (almost a million), all of which are held in memory and written to the long-term database. When you restart FTL, it has to open that very large database and read the most recent 24 hours of data (the default) back into memory. The steadily increasing struct sizes in your log are that import allocating memory as it goes.

If you don't need the long term data (or don't need it for long), you can change the default behavior and limit the number of days retained in the database, disable import from the database on startup, or disable the database completely.

Note the following in the documentation: "The config parameter DBIMPORT controls whether FTL loads information from the database on startup. It need to do this to populate the internal datastructure with the most recent history. However, as importing from the database on disk can delay FTL on very large deploys, it can be disabled using this option."
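
As an illustration of what such a change could look like, here is a minimal Python sketch that sets options in the text of an FTL config file. DBIMPORT is the parameter named in the documentation quote above; MAXDBDAYS as the retention setting and /etc/pihole/pihole-FTL.conf as the file location are assumptions here and should be checked against the documentation for the installed FTL version. set_option is a hypothetical helper, and the values 30 and no are example values only.

```python
def set_option(conf_text, key, value):
    """Return conf_text with KEY=value set.

    An existing KEY=... line is replaced in place (duplicates are dropped);
    if the key is absent, a new line is appended. Commented-out lines such
    as '#KEY=...' are left untouched.
    """
    out = []
    found = False
    for line in conf_text.splitlines():
        if line.split("=", 1)[0].strip() == key:
            if not found:
                out.append(f"{key}={value}")
                found = True
        else:
            out.append(line)
    if not found:
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Sample starting content; a real file would be read from
    # /etc/pihole/pihole-FTL.conf (assumed path).
    conf = "PRIVACYLEVEL=0\n"
    conf = set_option(conf, "MAXDBDAYS", "30")  # assumed option name: keep 30 days
    conf = set_option(conf, "DBIMPORT", "no")   # skip the database import on startup
    print(conf, end="")
```

The sketch only builds the new file content; writing it back and restarting pihole-FTL so the settings are read would still have to be done separately.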

JulioQc · December 21, 2018, 3:11pm

Yes, that makes sense, knowing we had an issue a week or two ago where we had a DNS loop. That caused DNS queries to loop across the DNS hierarchy until the packet TTL expired, so the Pi-hole was being overloaded with +1000% queries for a few weeks. After we solved the issue it went back to normal operation, but I suppose the DB was on the edge of collapse until it died this week.

Oh well, I'll keep it running as it is right now, but if the issue persists I'll follow your advice. Thank you mate, much appreciated!
