FTL crash after update v4.3.1

I receive almost the same report:

Thread 5 "database" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7522f460 (LWP 2335)]
0x76dfc428 in strchrnul () from /lib/arm-linux-gnueabihf/libc.so.6
(gdb) backtrace
#0  0x76dfc428 in strchrnul () from /lib/arm-linux-gnueabihf/libc.so.6
#1  0x76dc2174 in vfprintf () from /lib/arm-linux-gnueabihf/libc.so.6
#2  0x76e6a024 in __vasprintf_chk () from /lib/arm-linux-gnueabihf/libc.so.6
#3  0x76e69f30 in __asprintf_chk () from /lib/arm-linux-gnueabihf/libc.so.6
#4  0x004d5b2a in asprintf (__fmt=0x5c05d4 "SELECT id FROM network WHERE hwaddr = '%s';", __ptr=0x7522ec60)
    at /usr/arm-linux-gnueabihf/include/bits/stdio2.h:178
#5  parse_neighbor_cache () at src/database/network-table.c:382
#6  0x004d8ae0 in DB_thread (val=<optimized out>) at src/database/database-thread.c:68
#7  0x76ed5494 in start_thread () from /lib/arm-linux-gnueabihf/libpthread.so.0
#8  0x76e58578 in ?? () from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

(I hope pasting it a second time helps in some way. I didn't want to flood the thread with unnecessary information, but you asked "all" to put FTL into the debugger…)

Running Pi-hole on DietPi v6.28.0 on an RPi 2 Model B, connected via Ethernet, Wi-Fi disabled.
Adding MAXDBDAYS=0 to /etc/pihole/setupVars.conf seems to solve the issue as a workaround.
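
For anyone following along: the backtrace above ends in asprintf() formatting a SQL lookup for a hardware address. Below is a minimal sketch of building that query string defensively. It is not FTL's actual code: build_hwaddr_query is a hypothetical helper, and it uses the portable two-pass snprintf() idiom in place of asprintf().

```c
#include <stdio.h>
#include <stdlib.h>

/* Query text as it appears in the backtrace */
#define HWADDR_QUERY_FMT "SELECT id FROM network WHERE hwaddr = '%s';"

/* Hypothetical helper (not part of FTL): returns a heap-allocated query
 * string that the caller must free(), or NULL on any failure. */
char *build_hwaddr_query(const char *hwaddr)
{
    /* Passing NULL for a %s argument is undefined behaviour, so reject it */
    if (hwaddr == NULL)
        return NULL;

    /* First pass: ask snprintf() how many bytes the formatted query needs */
    int len = snprintf(NULL, 0, HWADDR_QUERY_FMT, hwaddr);
    if (len < 0)
        return NULL;

    char *query = malloc((size_t)len + 1);
    if (query == NULL)
        return NULL;

    /* Second pass: write the query, including the terminating NUL byte */
    snprintf(query, (size_t)len + 1, HWADDR_QUERY_FMT, hwaddr);
    return query;
}
```

Calling build_hwaddr_query("b8:8a:60:51:e7:d0") would yield the query with that address filled in. In production code, a prepared statement with a bound parameter is generally more robust than interpolating the value into the SQL text.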

Thanks so far, you're indeed all affected by the very same bug. We're working on further debugging on the GitHub issue and I will hopefully know how to code a fix for you to try when I return from work in a few hours.

… and I will hopefully know how to code a fix for you to try when I return from work in a few hours.

Now that's a cliffhanger! :joy:


If you need more tests, I can run them later tonight if it's not fixed by then.

I got the same issue on the latest update just now on a RPi 3B+

output from gdb:

Attaching to process 15834
[New LWP 15835]
[New LWP 15836]
[New LWP 15837]
[New LWP 15838]
[New LWP 15839]
[New LWP 15840]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
__GI___poll (timeout=-1, nfds=6, fds=0xede7b0) at ../sysdeps/unix/sysv/linux/poll.c:29
29      ../sysdeps/unix/sysv/linux/poll.c: No such file or directory.
(gdb) continue
Continuing.
[New Thread 0x735ff460 (LWP 15849)]
[Thread 0x735ff460 (LWP 15849) exited]
[New Thread 0x735ff460 (LWP 15850)]
[Detaching after fork from child process 15851]
[Thread 0x735ff460 (LWP 15850) exited]
[New Thread 0x735ff460 (LWP 15852)]

Thread 5 "database" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x74f3f460 (LWP 15838)]
__strchrnul (s=0x1 <error: Cannot access memory at address 0x1>, c_in=37) at strchrnul.c:50
50      strchrnul.c: No such file or directory.
(gdb) backtrace
#0  __strchrnul (s=0x1 <error: Cannot access memory at address 0x1>, c_in=37) at strchrnul.c:50
#1  0x76d5c174 in __find_specmb (format=0x1 <error: Cannot access memory at address 0x1>) at printf-parse.h:108
#2  _IO_vfprintf_internal (s=s@entry=0x74f3eac0, format=format@entry=0x1 <error: Cannot access memory at address 0x1>, ap=..., ap@entry=...) at vfprintf.c:1315
#3  0x76e04024 in __GI___vasprintf_chk (result_ptr=result_ptr@entry=0x74f3ec60, flags=flags@entry=1, format=0x1 <error: Cannot access memory at address 0x1>, format@entry=0x0, args=...,
args@entry=...) at vasprintf_chk.c:66
#4  0x76e03f30 in __asprintf_chk (result_ptr=result_ptr@entry=0x74f3ec60, flags=flags@entry=1, format=0x1 <error: Cannot access memory at address 0x1>) at asprintf_chk.c:32
#5  0x00484b2a in asprintf (__fmt=0x56f5d4 "SELECT id FROM network WHERE hwaddr = '%s';", __ptr=0x74f3ec60) at /usr/arm-linux-gnueabihf/include/bits/stdio2.h:178
#6  parse_neighbor_cache () at src/database/network-table.c:382
#7  0x00487ae0 in DB_thread (val=<optimized out>) at src/database/database-thread.c:68
#8  0x76e6f494 in start_thread (arg=0x74f3f460) at pthread_create.c:486
#9  0x76df2578 in ?? () at ../sysdeps/unix/sysv/linux/arm/clone.S:73 from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) continue
Continuing.

Thread 5 "database" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) backtrace
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x76d32230 in __GI_abort () at abort.c:79
#2  0x0048cb9c in SIGSEGV_handler (sig=<optimized out>, si=<optimized out>, unused=<optimized out>) at src/signals.c:80
#3  <signal handler called>
#4  __strchrnul (s=0x1 <error: Cannot access memory at address 0x1>, c_in=37) at strchrnul.c:50
#5  0x76d5c174 in __find_specmb (format=0x1 <error: Cannot access memory at address 0x1>) at printf-parse.h:108
#6  _IO_vfprintf_internal (s=s@entry=0x74f3eac0, format=format@entry=0x1 <error: Cannot access memory at address 0x1>, ap=..., ap@entry=...) at vfprintf.c:1315
#7  0x76e04024 in __GI___vasprintf_chk (result_ptr=result_ptr@entry=0x74f3ec60, flags=flags@entry=1, format=0x1 <error: Cannot access memory at address 0x1>, format@entry=0x0, args=...,
args@entry=...) at vasprintf_chk.c:66
#8  0x76e03f30 in __asprintf_chk (result_ptr=result_ptr@entry=0x74f3ec60, flags=flags@entry=1, format=0x1 <error: Cannot access memory at address 0x1>) at asprintf_chk.c:32
#9  0x00484b2a in asprintf (__fmt=0x56f5d4 "SELECT id FROM network WHERE hwaddr = '%s';", __ptr=0x74f3ec60) at /usr/arm-linux-gnueabihf/include/bits/stdio2.h:178
#10 parse_neighbor_cache () at src/database/network-table.c:382
#11 0x00487ae0 in DB_thread (val=<optimized out>) at src/database/database-thread.c:68
#12 0x76e6f494 in start_thread (arg=0x74f3f460) at pthread_create.c:486
#13 0x76df2578 in ?? () at ../sysdeps/unix/sysv/linux/arm/clone.S:73 from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) continue
Continuing.
Unable to fetch general registers.: No such process.
Unable to fetch general registers.: No such process.
(gdb) [Thread 0x735ff460 (LWP 15852) exited]
[Thread 0x73f3d460 (LWP 15840) exited]
[Thread 0x7473e460 (LWP 15839) exited]
[Thread 0x75740460 (LWP 15837) exited]
[Thread 0x75f41460 (LWP 15836) exited]
[Thread 0x76742460 (LWP 15835) exited]
[Thread 0x76d19010 (LWP 15834) exited]

Program terminated with signal SIGABRT, Aborted.
The program no longer exists.
# pihole -v
Pi-hole version is v4.3.5-455-gfff7adf (Latest: v4.4)
AdminLTE version is v4.3.2-457-g0a81dadf (Latest: v4.3.3)
FTL version is vDev-81c4eac (Latest: v4.3.1)

@ArTourter (or anyone else) Can you reproduce the crash in the debugger and run the commands I posted in the GitHub issue ticket?

First you might have to switch context, see:

Then run

p linebuffer
p num
p ip
p iface
p hwaddr

This will hopefully give some hint as to what is failing...
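
To make clear what these variables are expected to hold, here is a rough sketch of how one neighbor-cache line could be split into ip, iface and hwaddr with sscanf(). This is an illustration under assumptions, not the code in network-table.c: the function name parse_neighbor_line, the buffer sizes and the format string are all hypothetical.

```c
#include <stdio.h>

/* Hypothetical buffer sizes, each including the terminating NUL byte */
#define IP_LEN     46  /* long enough for a textual IPv6 address */
#define IFACE_LEN  16  /* long enough for a Linux interface name */
#define HWADDR_LEN 18  /* "xx:xx:xx:xx:xx:xx" plus NUL */

/* Hypothetical helper (not FTL's code): extracts the three fields from a
 * line such as "192.168.72.116 dev eth0 lladdr b8:8a:60:51:e7:d0 REACHABLE".
 * Returns 1 if all three fields were found, 0 otherwise. */
int parse_neighbor_line(const char *linebuffer,
                        char ip[IP_LEN],
                        char iface[IFACE_LEN],
                        char hwaddr[HWADDR_LEN])
{
    /* A NULL line must never reach sscanf() */
    if (linebuffer == NULL)
        return 0;

    /* The field widths keep sscanf() from writing past the buffers */
    int num = sscanf(linebuffer, "%45s dev %15s lladdr %17s",
                     ip, iface, hwaddr);

    /* num counts successful conversions; fewer than 3 means the line
     * lacks a hardware address and should be skipped */
    return num == 3;
}
```

With a well-formed line, ip, iface and hwaddr end up holding the address, the interface name and the MAC address. Checking the conversion count before using the buffers avoids passing uninitialised strings on to later formatting calls.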

Not sure I got it right, but …

Thread 5 "database" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7520f460 (LWP 1771)]
0x76ddc428 in strchrnul () from /lib/arm-linux-gnueabihf/libc.so.6
(gdb) backtrace
#0  0x76ddc428 in strchrnul () from /lib/arm-linux-gnueabihf/libc.so.6
#1  0x76da2174 in vfprintf () from /lib/arm-linux-gnueabihf/libc.so.6
#2  0x76e4a024 in __vasprintf_chk () from /lib/arm-linux-gnueabihf/libc.so.6
#3  0x76e49f30 in __asprintf_chk () from /lib/arm-linux-gnueabihf/libc.so.6
#4  0x004fdb2a in asprintf (__fmt=0x5e85d4 "SELECT id FROM network WHERE hwaddr = '%s';", __ptr=0x7520ec60)
    at /usr/arm-linux-gnueabihf/include/bits/stdio2.h:178
#5  parse_neighbor_cache () at src/database/network-table.c:382
#6  0x00500ae0 in DB_thread (val=<optimized out>) at src/database/database-thread.c:68
#7  0x76eb5494 in start_thread () from /lib/arm-linux-gnueabihf/libpthread.so.0
#8  0x76e38578 in ?? () from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) thread 5
[Switching to thread 5 (Thread 0x7520f460 (LWP 1771))]
#0  0x76ddc428 in strchrnul () from /lib/arm-linux-gnueabihf/libc.so.6
(gdb) p linebuffer
$1 = 0x0
(gdb) p num
No symbol "num" in current context.
(gdb) p ip
No symbol "ip" in current context.
(gdb) p iface
No symbol "iface" in current context.
(gdb) p hwaddr
No symbol "hwaddr" in current context.
(gdb)

Everyone in here, please check if

pihole checkout ftl fix/neighcrash

fixes the crash for you. This is a first attempt.

Nope, not for me, crashed again after less than a minute.

Attaching to process 17443
[New LWP 17444]
[New LWP 17445]
[New LWP 17446]
[New LWP 17447]
[New LWP 17448]
[New LWP 17449]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
__GI___poll (timeout=-1, nfds=6, fds=0x7cc7b0) at ../sysdeps/unix/sysv/linux/poll.c:29
29      ../sysdeps/unix/sysv/linux/poll.c: No such file or directory.
(gdb) handle SIGHUP nostop SIGPIPE nostop
Signal        Stop      Print   Pass to program Description
SIGHUP        No        Yes     Yes             Hangup
SIGPIPE       No        Yes     Yes             Broken pipe
(gdb) continue
Continuing.
[New Thread 0x733ff460 (LWP 17483)]
[Thread 0x733ff460 (LWP 17483) exited]
[New Thread 0x733ff460 (LWP 17484)]
[Thread 0x733ff460 (LWP 17484) exited]
[Detaching after fork from child process 17485]

Thread 5 "database" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x74f4f460 (LWP 17447)]
0x00504d20 in parse_neighbor_cache () at src/database/network-table.c:531
531     src/database/network-table.c: No such file or directory.
(gdb) thread 5
[Switching to thread 5 (Thread 0x74f4f460 (LWP 17447))]
#0  0x00504d20 in parse_neighbor_cache () at src/database/network-table.c:531
531     in src/database/network-table.c
(gdb) backtrace
#0  0x00504d20 in parse_neighbor_cache () at src/database/network-table.c:531
#1  0x00507b1c in DB_thread (val=<optimized out>) at src/database/database-thread.c:68
#2  0x76e7f494 in start_thread (arg=0x74f4f460) at pthread_create.c:486
#3  0x76e02578 in ?? () at ../sysdeps/unix/sysv/linux/arm/clone.S:73 from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) p linebuffer
$1 = 0x735348b8 "192.168.72.116 dev eth0 lladdr b8:8a:60:51:e7:d0 REACHABLE\n"
(gdb) p num
$2 = num
(gdb) p ip
No symbol "ip" in current context.
(gdb) p iface
No symbol "iface" in current context.
(gdb) p hwaddr
No symbol "hwaddr" in current context.
(gdb) continue
Continuing.
[New Thread 0x733ff460 (LWP 17487)]

Thread 5 "database" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) continue
Continuing.
Unable to fetch general registers.: No such process.
Unable to fetch general registers.: No such process.
(gdb) [Thread 0x733ff460 (LWP 17487) exited]
[Thread 0x73f4d460 (LWP 17449) exited]
[Thread 0x7474e460 (LWP 17448) exited]
[Thread 0x74f4f460 (LWP 17447) exited]
[Thread 0x75750460 (LWP 17446) exited]
[Thread 0x75f51460 (LWP 17445) exited]
[Thread 0x76752460 (LWP 17444) exited]

Program terminated with signal SIGABRT, Aborted.
The program no longer exists.

Not the same situation as mentioned here in the topic.
Most of us here do not run DHCP on PiHole.

It didn't fix my crash...
DBINTERVAL is set to 1 for testing.
As soon as I set MAXDBDAYS to 1 and restarted the DNS resolver, it crashed within one minute.

I cannot run the debugger now (I'm at work, and my wife is working from home, where Pi-hole is installed), but I can try later if necessary.

An alternative (short term fix) is (hopefully)

pihole checkout ftl 71e8498

This will reset Pi-hole to the version from four days ago, which was known to be stable. The tough guys can still try to identify the bug with me. We should use GitHub for this to avoid spreading information all over the place (and maybe losing important bits).

Looks fine to me. I will keep this one until the regular beta channel is updated, or switch to a different build if you have something new to test.

This one works for me too so far.

Please try again with fix/neighcrash, I fixed a small possible heap memory glitch which might actually have caused something like this. However, please try only in roughly one hour from now (I'll already be at work then), as the CI is currently very busy so building the binaries will take longer than usual.

You can check whether you got the most recent version by issuing

pihole-FTL -v

which should return

v4.3.1-630-g ---> 203057c <---

after checkout/update.

Thanks!
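
Since the prefix of the version string can differ between builds, the part to compare is the commit hash at the end. A small sketch of that check follows; version_has_commit is a hypothetical helper written for illustration and is not part of Pi-hole.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true if `version` (for example the text
 * printed by `pihole-FTL -v`) ends with the commit hash `commit`. */
bool version_has_commit(const char *version, const char *commit)
{
    if (version == NULL || commit == NULL)
        return false;

    /* Length of the version text up to, but excluding, any line ending */
    size_t vlen = strcspn(version, "\r\n");
    size_t clen = strlen(commit);

    if (clen == 0 || clen > vlen)
        return false;

    /* Compare the last clen characters of the version with the hash */
    return strncmp(version + vlen - clen, commit, clen) == 0;
}
```

For example, both "v4.3.1-630-g203057c" and "vDev-203057c" would match the hash "203057c", while "vDev-81c4eac" would not.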

Working good so far!

version check provides this:

vDev-203057c

vDev-203057c crashes for me. However the status on the web interface for DNS says active even when it's offline. Will post debug info later. I'm trying to figure out which settings are causing the FTL crash for me.

On one machine (x86) it's working fine for now, but on a Raspberry Pi 4 with Raspbian it crashes again, this time with a different trace.

Thread 5 "database" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb5006460 (LWP 21824)]
0xb6e1f9f4 in _IO_vfscanf_internal (s=s@entry=0xb5005a58, format=format@entry=0x2 <error: Cannot access memory at address 0x2>, argptr=..., argptr@entry=..., errp=errp@entry=0x0) at vfscanf.c:381
381     vfscanf.c: No such file or directory.
(gdb) backtrace
#0  0xb6e1f9f4 in _IO_vfscanf_internal (s=s@entry=0xb5005a58, format=format@entry=0x2 <error: Cannot access memory at address 0x2>, argptr=..., argptr@entry=..., errp=errp@entry=0x0) at vfscanf.c:381
#1  0xb6e31300 in _IO_vsscanf (string=0xb2d34888 "192.168.100.61 dev eth0 lladdr 00:1e:a0:00:0a:b1 STALE\n", format=0x2 <error: Cannot access memory at address 0x2>, format@entry=0x0, args=..., args@entry=...) at iovsscanf.c:41
#2  0xb6e2a6fc in __sscanf (s=<optimized out>, format=0x2 <error: Cannot access memory at address 0x2>) at sscanf.c:32
#3  0x004cdb9e in parse_neighbor_cache () at src/database/network-table.c:368
#4  0x004d0bec in DB_thread (val=<optimized out>) at src/database/database-thread.c:68
#5  0xb6f26494 in start_thread (arg=0xb5006460) at pthread_create.c:486
#6  0xb6ea9578 in ?? () at ../sysdeps/unix/sysv/linux/arm/clone.S:73 from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) 

Still working here. Based on input from @bigpcjunky, I checked whether everything is running using pihole status, and it seems it still is: [✓] DNS service is running [✓] Pi-hole blocking is Enabled

Edited to add: running Raspbian on Pi-Hole Zero W with unbound, using usb-to-ethernet dongle (WiFi is disabled). Only IPv4 enabled on Pi-Hole.