DNS goes offline after a few hours

Expected Behaviour:

DNS Resolution should not crash after 2-3 hours of normal operation

Actual Behaviour:

After 2-3 hours of normal operation I get the message from padd.sh that DNS is off and the webinterface is not responding anymore or very slow. Logging in via SSH always works. When I try to restart the DNS service I get the message

Job for pihole-FTL.service failed because the control process exited with error code.

When I check systemctl status I get this message:

Only a hard reboot of my pi "fixes" this problem. But always reappears after 2-3 hours.

Debug Token:

https://tricorder.pi-hole.net/2i8631brhj

Can you post the output of

/var/log/pihole-FTL.log

thats awkward, I get Permission denied although I am logged in as root user. Sorry I am not that familiar with linux. Do I first need to permit myself?
Funny thing is I can cd into the directory.

EDIT: Opening it with nano worked. Please see the post below

[2020-05-26 10:45:31.415 629] Listening on port 4711 for incoming IPv4 telnet connections
[2020-05-26 10:45:31.415 629] Listening on port 4711 for incoming IPv6 telnet connections
[2020-05-26 10:45:31.415 629] Listening on Unix socket
[2020-05-26 10:45:31.416 629] Reloading DNS cache
[2020-05-26 10:45:31.416 629] Blocking status is enabled
[2020-05-26 10:45:31.435 629] INFO: No regex blacklist entries found
[2020-05-26 10:45:31.436 629] INFO: No regex whitelist entries found
[2020-05-26 10:45:31.438 629] Compiled 0 whitelist and 0 blacklist regex filters in 7.9 msec
[2020-05-26 11:11:00.011 629] Shutting down...
[2020-05-26 11:11:00.589 629] Finished final database update
[2020-05-26 11:11:00.591 629] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[2020-05-26 11:11:00.591 629] ---------------------------->  FTL crashed!  <----------------------------
[2020-05-26 11:11:00.592 629] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[2020-05-26 11:11:00.592 629] Please report a bug at https://github.com/pi-hole/FTL/issues
[2020-05-26 11:11:00.592 629] and include in your report already the following details:
[2020-05-26 11:11:00.592 629] FTL has been running for 1530 seconds
[2020-05-26 11:11:00.592 629] FTL branch: master
[2020-05-26 11:11:00.592 629] FTL version: v5.0
[2020-05-26 11:11:00.592 629] FTL commit: 3d7c095
[2020-05-26 11:11:00.592 629] FTL date: 2020-05-10 18:58:38 +0100
[2020-05-26 11:11:00.592 629] FTL user: started as pihole, ended as pihole
[2020-05-26 11:11:00.592 629] Compiled for armhf (compiled on CI) using arm-linux-gnueabihf-gcc (Debian 6.3.0-18) 6.3.0 20170516
[2020-05-26 11:11:00.592 629] Received signal: Segmentation fault
[2020-05-26 11:11:00.592 629]      at address: 0x76f6b000
[2020-05-26 11:11:00.592 629]      with code: SEGV_MAPERR (Address not mapped to object)
[2020-05-26 11:11:00.592 629] Backtrace:
[2020-05-26 11:11:00.592 629] B[0000]: 0x4888e4, /usr/bin/pihole-FTL(+0x238e4) [0x4888e4]
[2020-05-26 11:11:00.593 629] B[0001]: 0x76d9b130, /lib/arm-linux-gnueabihf/libc.so.6(__default_rt_sa_restorer+0) [0x76d9b130]
[2020-05-26 11:11:00.593 629] B[0002]: 0x4800ea, /usr/bin/pihole-FTL(DB_save_queries+0x129) [0x4800ea]
[2020-05-26 11:11:00.593 629] B[0003]: 0x483898, /usr/bin/pihole-FTL(DB_thread+0xbb) [0x483898]
[2020-05-26 11:11:00.593 629] ------ Listing content of directory /dev/shm ------
[2020-05-26 11:11:00.593 629] File Mode User:Group  Filesize Filename
[2020-05-26 11:11:00.593 629] rwxrwxrwx root:root 140 .
[2020-05-26 11:11:00.593 629] rwxr-xr-x root:root 4K ..
[2020-05-26 11:11:00.593 629] rw------- pihole:pihole 4K FTL-per-client-regex
[2020-05-26 11:11:00.593 629] rw------- pihole:pihole 4K FTL-dns-cache
[2020-05-26 11:11:00.593 629] rw------- pihole:pihole 29K FTL-overTime
[2020-05-26 11:11:00.593 629] rw------- pihole:pihole 20K FTL-upstreams
[2020-05-26 11:11:00.594 629] rw------- pihole:pihole 12 FTL-settings
[2020-05-26 11:11:00.594 629] ---------------------------------------------------
[2020-05-26 11:11:00.594 629] Thank you for helping us to improve our FTL engine!
[2020-05-26 11:11:00.594 629] FTL terminated!
[2020-05-26 11:11:00.594 629] 1.530234e+03 s! ##########
[2020-05-27 12:27:32.952 641] Using log file /var/log/pihole-FTL.log
[2020-05-27 12:27:32.952 641] ########## FTL started! ##########
[2020-05-27 12:27:32.952 641] FTL branch: master
[2020-05-27 12:27:32.952 641] FTL version: v5.0
[2020-05-27 12:27:32.952 641] FTL commit: 3d7c095
[2020-05-27 12:27:32.952 641] FTL date: 2020-05-10 18:58:38 +0100
[2020-05-27 12:27:32.952 641] FTL user: pihole
[2020-05-27 12:27:32.952 641] Compiled for armhf (compiled on CI) using arm-linux-gnueabihf-gcc (Debian 6.3.0-18) 6.3.0 20170516
[2020-05-27 12:27:32.952 641] Starting config file parsing (/etc/pihole/pihole-FTL.conf)
[2020-05-27 12:27:32.952 641]    SOCKET_LISTENING: only local
[2020-05-27 12:27:32.952 641]    AAAA_QUERY_ANALYSIS: Show AAAA queries
[2020-05-27 12:27:32.952 641]    MAXDBDAYS: max age for stored queries is 365 days
[2020-05-27 12:27:32.952 641]    RESOLVE_IPV6: Resolve IPv6 addresses
[2020-05-27 12:27:32.952 641]    RESOLVE_IPV4: Resolve IPv4 addresses
[2020-05-27 12:27:32.952 641]    DBINTERVAL: saving to DB file every minute
[2020-05-27 12:27:32.952 641]    DBFILE: Using /etc/pihole/pihole-FTL.db
[2020-05-27 12:27:32.952 641]    MAXLOGAGE: Importing up to 24.0 hours of log data
        [2020-05-27 12:27:32.952 641]    PRIVACYLEVEL: Set to 0
[2020-05-27 12:27:32.952 641]    IGNORE_LOCALHOST: Show queries from localhost
[2020-05-27 12:27:32.952 641]    BLOCKINGMODE: Null IPs for blocked domains
[2020-05-27 12:27:32.953 641]    ANALYZE_ONLY_A_AND_AAAA: Disabled. Analyzing all queries
[2020-05-27 12:27:32.953 641]    DBIMPORT: Importing history from database
[2020-05-27 12:27:32.953 641]    PIDFILE: Using /var/run/pihole-FTL.pid
[2020-05-27 12:27:32.953 641]    PORTFILE: Using /var/run/pihole-FTL.port
[2020-05-27 12:27:32.953 641]    SOCKETFILE: Using /var/run/pihole/FTL.sock
[2020-05-27 12:27:32.953 641]    SETUPVARSFILE: Using /etc/pihole/setupVars.conf
[2020-05-27 12:27:32.953 641]    MACVENDORDB: Using /etc/pihole/macvendor.db
[2020-05-27 12:27:32.953 641]    GRAVITYDB: Using /etc/pihole/gravity.db
[2020-05-27 12:27:32.953 641]    PARSE_ARP_CACHE: Active
[2020-05-27 12:27:32.953 641]    CNAME_DEEP_INSPECT: Active
[2020-05-27 12:27:32.953 641]    DELAY_STARTUP: No delay requested.
[2020-05-27 12:27:32.953 641]    BLOCK_ESNI: Enabled, blocking _esni.{blocked domain}
[2020-05-27 12:27:32.953 641] Finished config file parsing
[2020-05-27 12:27:32.999 641] Database version is 5
[2020-05-27 12:27:32.999 641] Database successfully initialized
[2020-05-27 12:27:33.069 641] New upstream server: 127.0.0.1 (0/1024)

The interesting part is around something like "FTL terminated" or so.

I found a part where FTL crashed. Is this enough?

Which upstream DNS-server are you using?

Can you provide the corresponding log from /var/log/pihole.log?
The developers will ask you for more logs and steps to proceed when the had a look on this.

I use Cloudflare as Upstream DNS provider in combination with DoH and DNSSEC. Will post the log in a sec

EDIT: this log is huge. basically all DNS querys. which part would be important?

This looks like a genuine crash. You could try if the following command would result in some additional information (though I doubt it would be more detailed or verbose than what your log has provided already):

sudo systemctl status --full --no-pager pihole-FTL.service

I have alerted the developers, but it may take some time before they have the chance to take a look at it (different time zones and daytime jobs interfering).

Will post the output in a while. Dont't worry guys, I am patient. I am already grateful someone even cares about the issue :slight_smile:

This is the output by now, but meanwhile I rebooted my pi again. I will post another output once it crashes again:

● pihole-FTL.service - LSB: pihole-FTL daemon
   Loaded: loaded (/etc/init.d/pihole-FTL; generated)
   Active: active (exited) since Wed 2020-05-27 12:27:34 CEST; 51min ago
     Docs: man:systemd-sysv-generator(8)
  Process: 399 ExecStart=/etc/init.d/pihole-FTL start (code=exited, status=0/SUCCESS)

May 27 12:27:15 raspberrypi systemd[1]: Starting LSB: pihole-FTL daemon...
May 27 12:27:16 raspberrypi pihole-FTL[399]: Not running
May 27 12:27:16 raspberrypi su[488]: (to pihole) root on none
May 27 12:27:16 raspberrypi su[488]: pam_unix(su:session): session opened for user pihole by (uid=0)
May 27 12:27:34 raspberrypi pihole-FTL[399]: FTL started!
May 27 12:27:34 raspberrypi systemd[1]: Started LSB: pihole-FTL daemon.

If you want to be super awesome you could attach a debugger to pihole-FTL as described here:

This would give additional information if it crashed again.

ok will do that later. Needa head to the next meeting. Thx alot for your help. Really appreciate it!

1 Like

I believe it is now "screening". As I said I am not too familiar with that kinda stuff. Does this look OK?

I'm not familiar with the debugger but at first sight is looks good. I'll think we just leave it now as it is and wait for the devs :slight_smile:

1 Like

You shouldn't have run the sudo ... line in the debugger ((gdb) prompt), however it should not have caused issues. We will find out when the crash happens and we either get or don't get meaningful data. Until then, we are eagerly expecting what will happen.

yea I noticed that too after entering the line :stuck_out_tongue:
so meanwhile pihole crashed again but I don't see any output in the terminal.
As far as I understood the documentation I should see an output in the terminal, right?

SNAG-0014 27.05.2020 20.29.12

Thats all I see. it is running since a few hours. Dunno when exactly the crash happened.

EDIT: When I "quit" it says "Debugging Session Active". Is the data being dumped to a file?

Hmm, so something went wrong. It may be that you had several forks of pihole-FTL running when attaching the debugger and, by chance, we picket the wrong one.

Next time, instead of

sudo gdb -p $(pidof pihole-FTL)

please use

sudo gdb -p 13813

where this 13813 is the number you get from the command

grep "PID of FTL process" /var/log/pihole-FTL.log | tail -n1 | sed "s/.*: //"

Sorry for that, however, it is not always as straightforward to debug a multi-parallel application (using threads and process-forks at the same time).

No, you terminated the debugging session by quitting.

just a little update. The last crash was this morning. Afterwards I rebooted and started the screening with gdb (thx DL6ER for your help). Funny thing is, since then it didn't crash. I will post the output of gdb once it crashes again.

ok it crashed a few minutes ago. but all I see is this:

[Thread 0x734ff460 (LWP 2701) exited]
[New Thread 0x734ff460 (LWP 2704)]
[Thread 0x734ff460 (LWP 2704) exited]
[Detaching after fork from child process 2765]
[Detaching after fork from child process 2766]
[Detaching after fork from child process 2767]
[Detaching after fork from child process 2768]
[Detaching after fork from child process 2769]
[Detaching after fork from child process 2770]
[Detaching after fork from child process 2771]
[Detaching after fork from child process 2772]
[Detaching after fork from child process 2773]
[Detaching after fork from child process 2774]
[Detaching after fork from child process 2775]
[Detaching after fork from child process 2776]
[Detaching after fork from child process 2777]
[Detaching after fork from child process 2778]
[Detaching after fork from child process 2779]
[Detaching after fork from child process 2780]
[Detaching after fork from child process 2781]
[Detaching after fork from child process 2782]
[Detaching after fork from child process 2783]
[Detaching after fork from child process 2784]
[Detaching after fork from child process 2785]
[Detaching after fork from child process 2786]
[Detaching after fork from child process 2787]
[Detaching after fork from child process 2788]
[Detaching after fork from child process 2789]
[Detaching after fork from child process 2790]
[Detaching after fork from child process 2791]
[Detaching after fork from child process 2792]
[Detaching after fork from child process 2800]
[New Thread 0x734ff460 (LWP 2826)]
[Thread 0x734ff460 (LWP 2826) exited]
[New Thread 0x734ff460 (LWP 2829)]
[Thread 0x734ff460 (LWP 2829) exited]
[New Thread 0x734ff460 (LWP 2860)]
[Thread 0x734ff460 (LWP 2860) exited]
[New Thread 0x734ff460 (LWP 2865)]
[Thread 0x734ff460 (LWP 2865) exited]
[New Thread 0x734ff460 (LWP 2870)]
[Thread 0x734ff460 (LWP 2870) exited]
[New Thread 0x734ff460 (LWP 2874)]
[Thread 0x734ff460 (LWP 2874) exited]
[Detaching after fork from child process 2941]
[New Thread 0x734ff460 (LWP 2966)]
[Thread 0x734ff460 (LWP 2966) exited]
[New Thread 0x734ff460 (LWP 2969)]
[Thread 0x734ff460 (LWP 2969) exited]
[New Thread 0x734ff460 (LWP 3000)]
[Thread 0x734ff460 (LWP 3000) exited]
[New Thread 0x734ff460 (LWP 3005)]
[Thread 0x734ff460 (LWP 3005) exited]
[New Thread 0x734ff460 (LWP 3010)]
[Thread 0x734ff460 (LWP 3010) exited]
[New Thread 0x734ff460 (LWP 3013)]
[Thread 0x734ff460 (LWP 3013) exited]
[Detaching after fork from child process 3079]
[New Thread 0x734ff460 (LWP 3104)]
[Thread 0x734ff460 (LWP 3104) exited]
[New Thread 0x734ff460 (LWP 3107)]
[Thread 0x734ff460 (LWP 3107) exited]
[New Thread 0x734ff460 (LWP 3138)]
[Thread 0x734ff460 (LWP 3138) exited]
[New Thread 0x734ff460 (LWP 3143)]
[Thread 0x734ff460 (LWP 3143) exited]
[New Thread 0x734ff460 (LWP 3148)]
[Thread 0x734ff460 (LWP 3148) exited]
[New Thread 0x734ff460 (LWP 3151)]
[Thread 0x734ff460 (LWP 3151) exited]
[Detaching after fork from child process 3217]