FTL randomly crashes after update to 5.0

Expected Behaviour:

Pi-hole operating normally
Ubuntu 18.04
Amazon VPS, 0.5 GB RAM
OpenVPN
Note: at some point I was using nginx, but I reverted those changes after updating to 5.0. I do not believe they are relevant to the issue.

Actual Behaviour:

FTL randomly crashes, minutes or seconds after restart or reboot

Debug Token:

removed

Could this part of nginx configuration cause the issue? If so, how do I revert it?

chown -R www-data:www-data /var/www/html
chmod -R 755 /var/www/html

Remove /var/www/html and run pihole -r to repair?
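
Before changing or removing anything, it may help to record the current ownership and mode of the directory, so a later change can be compared against it. A minimal sketch, assuming GNU stat (as shipped on Ubuntu); the helper name show_perms is made up for illustration and is not part of Pi-hole:

```shell
# Print "owner:group mode path" for each path given.
# Assumes GNU coreutils stat (the -c format option).
show_perms() {
    stat -c '%U:%G %a %n' "$@"
}
```

For example, show_perms /var/www/html prints one line for that directory. The thread does not say what the ownership was before the chown, so this only documents the current state.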

This is a check deep down in the dnsmasq core (receive_query -> iface_check subroutine). Can you follow the debugging instructions in our documentation so we can work together on a fix for this?

Thanks!

https://docs.pi-hole.net/ftldns/debugging/

@DL6ER thanks for your help
Either I'm doing something wrong, or FTL restarts by itself:

$ sudo gdb -p $(pidof pihole-FTL)
GNU gdb (Ubuntu 8.1-0ubuntu3.2) 8.1.0.20180409-git
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
17638: No such file or directory.
Attaching to process 17655
Reading symbols from /usr/bin/pihole-FTL...done.
Reading symbols from /lib/x86_64-linux-gnu/librt.so.1...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/librt-2.27.so...done.
done.
Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0...Reading symbols from /usr/lib/debug/.build-id/28/c6aade70b2d40d1f0f3d0a1a0cad1ab816448f.debug...done.
done.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.27.so...done.
done.
Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.27.so...done.
done.
Reading symbols from /lib/x86_64-linux-gnu/libnss_compat.so.2...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libnss_compat-2.27.so...done.
done.
Reading symbols from /lib/x86_64-linux-gnu/libnss_nis.so.2...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libnss_nis-2.27.so...done.
done.
Reading symbols from /lib/x86_64-linux-gnu/libnsl.so.1...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libnsl-2.27.so...done.
done.
Reading symbols from /lib/x86_64-linux-gnu/libnss_files.so.2...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libnss_files-2.27.so...done.
done.
0x00007f380e100384 in __libc_read (fd=fd@entry=18, buf=buf@entry=0x7ffc433f2b08, nbytes=nbytes@entry=1) at ../sysdeps/unix/sysv/linux/read.c:27
27      ../sysdeps/unix/sysv/linux/read.c: No such file or directory.
(gdb) continue
Continuing.
[Inferior 1 (process 17655) exited normally]
(gdb) backtrace
No stack.

Every time I refresh http://pi.hole/admin/settings.php, it shows that FTL started recently, about a minute ago. If it restarts that often, how do I catch a crash?

FTL stopped restarting every minute yesterday evening (without any intervention) and has been stable so far. I have attached gdb and am waiting for a crash.

There has to be a reason for the restarts. Most likely you were changing some configuration that requires a restart. Nothing should be restarting FTL automatically (someone would have to be responsible for doing this). But I'm glad it stopped :wink:

Is it normal for http://pi.hole/admin/settings.php to report different PIDs for FTL on refreshing the page?
I made a small video here:
https://nxmail.org/s/63RFJiHyWeR9Hsm
At the start, the PID is 1094 and "Time FTL started" is 12:42:48.
Then I refresh the page several times and these values change.
But sometimes they come back to the "original" values; I captured this at 1:07.

gdb stays attached (I think). This is the output:

[Thread 0x7f7c77fff700 (LWP 4810) exited]
[New Thread 0x7f7c77fff700 (LWP 4811)]
[Thread 0x7f7c77fff700 (LWP 4811) exited]
[New Thread 0x7f7c77fff700 (LWP 4813)]
[Thread 0x7f7c77fff700 (LWP 4813) exited]
[New Thread 0x7f7c77fff700 (LWP 4815)]
[Thread 0x7f7c77fff700 (LWP 4815) exited]
[New Thread 0x7f7c77fff700 (LWP 4818)]
[Thread 0x7f7c77fff700 (LWP 4818) exited]
[New Thread 0x7f7c77fff700 (LWP 4820)]
[Thread 0x7f7c77fff700 (LWP 4820) exited]
[New Thread 0x7f7c77fff700 (LWP 4822)]
[Thread 0x7f7c77fff700 (LWP 4822) exited]
[New Thread 0x7f7c77fff700 (LWP 4826)]
[Thread 0x7f7c77fff700 (LWP 4826) exited]

No, not necessarily. However, when TCP requests are being made in your network, the settings page may pick up the PIDs of the forks instead of the main process. PHP is too limited here, but it is nothing you should worry about. It's only a symptom; the underlying machinery is working correctly nonetheless.

That's expected. Threads are spawned to handle API requests so they are not blocking DNS resolution. It will go away when you close all web interfaces everywhere.
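
Because forks can be alive at the same time as the main process, pidof may return more than one PID. The earlier gdb session printed "17638: No such file or directory" and then attached to 17655, which suggests two PIDs were passed and gdb attached to the first one. A minimal sketch of picking a single PID before attaching; lowest_pid is a hypothetical helper, and "lowest PID is the main process" is only a heuristic that assumes PID numbering has not wrapped around:

```shell
# Print the numerically lowest PID among the arguments.
# Heuristic only: assumes the main process was started before its forks
# and that PID numbering did not wrap around in between.
lowest_pid() {
    printf '%s\n' "$@" | sort -n | head -n 1
}
```

It could be used as sudo gdb -p "$(lowest_pid $(pidof pihole-FTL))". As an alternative, pgrep -o pihole-FTL selects the oldest matching process by start time rather than by PID number.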

Got it!

(gdb) continue
Continuing.

Program received signal SIG34, Real-time event 34.
0x00007f890b358384 in __libc_read (fd=fd@entry=17, buf=buf@entry=0x7fff6affa4f8, nbytes=nbytes@entry=1)
    at ../sysdeps/unix/sysv/linux/read.c:27
27      in ../sysdeps/unix/sysv/linux/read.c
(gdb) backtrace
#0  0x00007f890b358384 in __libc_read (fd=fd@entry=17, buf=buf@entry=0x7fff6affa4f8, nbytes=nbytes@entry=1)
    at ../sysdeps/unix/sysv/linux/read.c:27
#1  0x0000565452c65fde in read (__nbytes=1, __buf=0x7fff6affa4f8, __fd=17) at /usr/include/x86_64-linux-gnu/bits/unistd.h:44
#2  read_write (fd=fd@entry=17, packet=packet@entry=0x7fff6affa4f8 "", size=size@entry=1, rw=rw@entry=1) at src/dnsmasq/util.c:696
#3  0x0000565452c3e482 in tcp_request (confd=confd@entry=17, now=now@entry=1589985445, local_addr=local_addr@entry=0x7fff6affa5f0, 
    netmask=..., netmask@entry=..., auth_dns=auth_dns@entry=0) at src/dnsmasq/forward.c:1911
#4  0x0000565452c57824 in check_dns_listeners (now=now@entry=1589985445) at src/dnsmasq/dnsmasq.c:1961
#5  0x0000565452c597bc in main_dnsmasq (argc=<optimized out>, argv=<optimized out>) at src/dnsmasq/dnsmasq.c:1203
#6  0x0000565452c158ac in main (argc=<optimized out>, argv=<optimized out>) at src/main.c:93
(gdb) 

This happened when I added a test record to the whitelist.
I tested this because earlier today I could not add anything to the whitelist on this instance due to a read-only database, so I ran pihole -g -r to fix it.
I already did it here in another instance of Pi-hole:


Actually, I also tried

sudo setfacl -Rbn /etc/pihole/
sudo setfacl -Rbn /etc/pihole/gravity.db

as discussed in "V5.0 Docker, Whitelist domain, writing to readonly database", but it did not have any effect. User www-data could not write to gravity.db.
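
A quick way to narrow down a read-only database error is to test write access to both the database file and its directory, since SQLite in its default journal mode also needs to create a journal file next to the database. A minimal sketch; can_write_db is a hypothetical helper name, and it checks access for whichever user runs it:

```shell
# Succeed only if both the database file and its parent directory
# are writable by the user running this function.
can_write_db() {
    db=$1
    [ -w "$db" ] && [ -w "$(dirname "$db")" ]
}
```

To check as the web server user rather than your own account, the same two tests can be run through sudo, for example sudo -u www-data sh -c '[ -w /etc/pihole/gravity.db ] && [ -w /etc/pihole ] && echo writable'. This assumes the web interface runs as www-data, as described above.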

So, did I cause that crash by doing something I should not have, or is it an issue you can look into?

The backtrace is a bit different if I attach gdb to the lowest available PID of FTL:

(gdb) continue
Continuing.
[New Thread 0x7f89073cd700 (LWP 3231)]
[Thread 0x7f89073cd700 (LWP 3231) exited]
[New Thread 0x7f89073cd700 (LWP 3234)]
[Thread 0x7f89073cd700 (LWP 3234) exited]
[New Thread 0x7f89073cd700 (LWP 3235)]
[Thread 0x7f89073cd700 (LWP 3235) exited]

Thread 1 "pihole-FTL" received signal SIG34, Real-time event 34.
0x00007f890b06abf9 in __GI___poll (fds=0x565454cbc810, nfds=7, timeout=timeout@entry=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
29      in ../sysdeps/unix/sysv/linux/poll.c
(gdb) backtrace
#0  0x00007f890b06abf9 in __GI___poll (fds=0x565454cbc810, nfds=7, timeout=timeout@entry=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x0000565452c3942a in poll (__timeout=-1, __nfds=<optimized out>, __fds=<optimized out>)
    at /usr/include/x86_64-linux-gnu/bits/poll2.h:46
#2  do_poll (timeout=timeout@entry=-1) at src/dnsmasq/poll.c:78
#3  0x0000565452c5972b in main_dnsmasq (argc=<optimized out>, argv=<optimized out>) at src/dnsmasq/dnsmasq.c:1125
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

The real-time events are fine. They are triggered when you manipulate lists to tell FTL to update its list cache. This is not the crash we're looking for.

Step 4 here should have set this signal to be ignored so that it does not influence debugging (as said, we expect this signal and it is perfectly fine!)

That's strange, I double-checked and it's configured as it should be:

$ sudo cat /root/.gdbinit
handle SIGHUP nostop SIGPIPE nostop SIGTERM nostop SIG32 nostop SIG34 nostop SIG35 nostop
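
To rule out a typo in the init file, the rule for one signal can be checked mechanically. A minimal sketch; gdbinit_has_nostop is a hypothetical helper, and it only confirms that the file contains the rule, not that gdb actually read this file:

```shell
# Succeed if FILE has a gdb "handle" line that marks SIGNAL as nostop.
gdbinit_has_nostop() {
    file=$1
    signal=$2
    grep -Eq "^[[:space:]]*handle([[:space:]]+[^[:space:]]+)*[[:space:]]+${signal}[[:space:]]+nostop" "$file"
}
```

One possible explanation (an assumption, not confirmed here) is that gdb looked for .gdbinit in a different home directory than /root when started through sudo. Passing the rule on the command line avoids that question entirely: sudo gdb -ex 'handle SIG34 nostop noprint pass' -p <PID>.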

Anyway, I never had a second crash. For the record: I fixed the permissions for /var/www/html and ran pihole -g -r; this may or may not be relevant.
Thanks for helping me with this issue, I very much appreciate it.