FTL-queries outgrowing /dev/shm (continued)

foolishtacos · May 31, 2025, 3:31pm

Continuation of FTL-queries outgrowing /dev/shm

Expected Behaviour:

Query log should not be allowed to outgrow /dev/shm.

Actual Behaviour:

Query log is allowed to outgrow /dev/shm, leading to an FTL crash loop.

Debug Token:

Not able to capture relevant info due to the crash loop; I have to delete pihole-FTL.db to get FTL to start, which also clears the relevant info.

Happened again -- internet was out for a day, and devices blasted out DNS queries during the outage and filled up /dev/shm, leading to a crash loop. I wasn't able to grab >stats and >top-clients in the midst of the restarts, but I was able to get a screen grab. It shows there was no single device or class of devices at fault; a broad set of devices (TVs, Chromecasts, network devices, robo vac, solar assistant) all sent far more queries than usual.

2025-05-31 09:45:34.590 CDT [49M] INFO: Imported 904906 queries from the long-term database
2025-05-31 09:45:34.591 CDT [49M] INFO:  -> Total DNS queries: 904906
...
2025-05-31 09:48:18.162 CDT [49M] WARNING: Shared memory shortage (/dev/shm) ahead: 99% is used (67.1MB used, 67.1MB total, FTL uses 67.1MB)
2025-05-31 09:48:18.166 CDT [49M] WARNING: Could not fallocate() in realloc_shm() (/app/src/shmem.c:838): No space left on device
2025-05-31 09:48:18.166 CDT [49M] CRIT: realloc_shm(): Failed to resize "/FTL-49-queries" (10) to 65470464: No space left on device (28)

In light of this -- WAN outage due to external factors, general device behavior during the outage, no single device to punish for bad behavior -- would it make sense for FTL to manage /dev/shm proactively so that it cannot end up in a crash loop?
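
As a rough illustration of what a proactive check could look like from outside FTL, here is a minimal Python sketch that reports how full /dev/shm is. The function names and the 90% threshold are made up for illustration; this is not something Pi-hole provides, and it only warns, it does not free anything.

```python
import shutil


def shm_usage_percent(path: str = "/dev/shm") -> float:
    """Return the percentage of the filesystem at `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


def shm_is_low(path: str = "/dev/shm", threshold_percent: float = 90.0) -> bool:
    """True once usage reaches the (hypothetical) warning threshold."""
    return shm_usage_percent(path) >= threshold_percent
```

Run from cron or a container health check, `shm_is_low()` could trigger an alert well before the 99% warning seen in the log above.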

Thank you for the awesome software.

Doug_Laidlaw · June 1, 2025, 1:27pm

You aren't the only user to have this happen. I haven't got mine working yet, but I was thinking of using a 64 GB card as extra insurance.

foolishtacos · June 2, 2025, 2:19am

Thanks for confirmation. 64GB -- that ought to do it, though I'd worry about endurance of the card and failure there taking down DNS.

Edit: From this post it seems that moving the location of those files is not possible, and you wouldn't want to anyway, for performance reasons as well as endurance reasons in the case of a card.

Also from the post linked above, the suggestions given were to shorten the period for the stats to <24h or to increase the size of shm. Roughly speaking, in my case it looks like adjusting either one by a factor of 3 would get me through an outage without DNS going into a crash loop. That's very doable -- increasing container shm to 256m or 512m is no problem, and is what I'll try. Though for a critical use case (not a homelab) it seems like there ought to be some proactive management around it -- DNS getting borked due to external factors is no bueno.
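
A back-of-the-envelope check of that factor of 3, as a small Python sketch. It assumes the "67.1MB total" in the log is a 64 MiB /dev/shm (67,108,864 bytes) and that shared memory use grows roughly linearly with query volume; both are assumptions, and the helper name is made up for illustration.

```python
import math

MIB = 1024 * 1024


def next_power_of_two_mib(required_bytes: int) -> int:
    """Smallest power-of-two size, in MiB, that can hold required_bytes."""
    mib = max(1, math.ceil(required_bytes / MIB))
    return 1 << (mib - 1).bit_length()


current_shm_bytes = 64 * MIB  # assumed: matches "67.1MB total" in the log
growth_factor = 3             # rough estimate from the post, not a measurement

required_bytes = current_shm_bytes * growth_factor
print(required_bytes // MIB)                  # 192
print(next_power_of_two_mib(required_bytes))  # 256
```

Under those assumptions 3 x 64 MiB is 192 MiB, so 256m leaves a small margin and 512m a comfortable one.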

Doug_Laidlaw · June 2, 2025, 3:46am

It was said earlier that 64 GB cards need to be formatted differently, but that is not quite true, apparently. This will be my first, and there is no guide on the packaging. Maybe someone more knowledgeable can help.

system · June 23, 2025, 3:47am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.