Pi-hole crash - reason: shared memory tmpfs /dev/shm at 100%

Hello.

We have a very high-traffic internal system that generates many DNS requests to Pi-hole. We know that, and it is that way by design.

The problem we have with Pi-hole is that, although it otherwise works fine, once the tmpfs partition /dev/shm fills up, the DNS service crashes with the message "DNS service not running".

I have found many threads on this, and what we did was set the variable DBIMPORT=no, so that when the crash occurs, the Pi-hole DNS service restarts without the latest 24 hours of statistics.

The question I want to ask: is there a way to move FTL-queries, or all FTL-* files that live in the RAM-backed shared memory partition tmpfs /dev/shm, to a different location on a normal disk partition? For example, under /var/pihole-stats?

I could not find a variable for this. Is there perhaps another workaround?

Thank you.

/dev/shm is nothing but Linux's implementation of the shared memory concept, and it is currently a hard-coded location. It is designed to live in RAM to guarantee fast access and appears as a mounted temporary file system (tmpfs), using virtual memory instead of a persistent storage device.

How large is the memory of your server and how much of this is allocated to /dev/shm? If you have plenty of space but your shm-space simply tends to be too small, you could try something like

sudo mount -o remount,size=8G /dev/shm

to increase the size of your shared memory region to 8 GB.

If this solves your issues, you may want to make this permanent by editing /etc/fstab (or similar), e.g.

none      /dev/shm        tmpfs   defaults,size=8G        0 0

followed by a

mount -o remount /dev/shm
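To check whether the resize took effect, and how full the region currently is, a quick look at the mount helps:

```shell
# Show size and current usage of the shared-memory mount
df -h /dev/shm
```

The Size column should match the value set in /etc/fstab, and the Use% column tells you how close FTL's files are to filling it.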

It would also be possible to keep this data on disk, but FTL relies heavily on unaligned access to it, which makes the on-disk alternative a heavy performance burden that should not be chosen lightly (plus, it is currently not possible without recompiling FTL from source).


Alternative: If memory is really the limiting factor here, the real solution would be reducing the default 24-hour window of history kept in memory down to, say, 6 hours or maybe even less.

Use MAXLOGAGE=6 or similar in a file /etc/pihole/pihole-FTL.conf for Pi-hole pre-v6 or use sudo pihole-FTL --config webserver.api.maxHistory 6 for Pi-hole v6.0+.
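For the pre-v6 route, the whole change is one line in FTL's config file; a sketch (the 6-hour value mirrors the suggestion above):

```
# /etc/pihole/pihole-FTL.conf  (Pi-hole pre-v6)
# Keep only the last 6 hours of queries in shared memory instead of 24
MAXLOGAGE=6.0
```

A restart of the DNS service (e.g. pihole restartdns) is needed for the change to take effect.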

Maybe one further question: How is the memory distributed ( ls -lh /dev/shm/FTL* ) ?

Thanks for the reply.

We have a Pi-hole VM with 2 GB of RAM, and the /dev/shm partition is 1 GB.

When we set this Pi-hole VM as the primary DNS for the systems on our network, they constantly spam it with DNS requests. This is by design, unfortunately. The problem is that the /dev/shm partition fills up before the 24 hours pass and the Pi-hole VM crashes. We are talking about millions of requests. Pi-hole can keep up with the requests just fine, BUT it crashes because this RAM partition fills up.

Increasing the RAM along with the /dev/shm partition size would just postpone the inevitable.

I was asking whether there was perhaps a variable, like the one in piholeDebug.sh,

SHM_DIRECTORY="/dev/shm"

that you could set to save the shm data to a different partition.

Sure, I understand, but shared memory is not like a "normal" filesystem; the objects merely appear as files to make it easier for applications to deal with them. When you use shm_open(), you cannot specify absolute paths.

See also man 7 shm_overview:

On Linux, shared memory objects are created in a (tmpfs(5)) virtual filesystem, normally mounted under /dev/shm.

Changing this would require recompiling your kernel - but simply changing the path under which this in-memory region appears would not change anything about your particular problem.

Please try what I suggested last in my previous message: Reducing the memory that is needed to begin with.

Thank you, I will try the value MAXLOGAGE=6 to begin with.

Sorry, I forgot to reply to

No, it shouldn't. FTL's memory usage will increase notably during the initial 24 hours of priming the history, but should saturate with little to no further change after this initial period. On my local system, for instance, it saturated at around 5.5 MB (1% of my /dev/shm) and has stayed there for weeks. Of course, I am miles away from

Something last to ask.

Will this behavior (saving in /dev/shm for the daily stats) be changed in version 6?

No, probably not. I have been experimenting with this earlier today, but the performance penalty of putting the files into, e.g., /tmp (on-disk) rather than the kernel's default location for shared memory (technically speaking: using open() instead of shm_open()) was quite noticeable. Without any proper timings, I'd say it was at least 100x slower. /tmp was mounted on a reasonably fast SATA-attached SSD.

Sorry to bother you, but I cannot wrap my head around how to stop the FTL-queries file under /dev/shm from growing.

We do not need the live stats in the dashboard. Is there a way to disable this permanently and stop this file from getting bigger? What is the purpose of this file anyway? It is a little undocumented, as far as I can see.

If it cannot be disabled, is there a way to rotate or empty it without restarting the DNS service once it exceeds some maximum size in MB? It is really very troublesome in our environment, where we have a high load of DNS requests.

How many DNS requests do you see in those 6 hours?
If 1G of storage at /dev/shm for 6 hours of DNS traffic is not enough, that may imply some client's misbehaviour, e.g. not honoring TTLs.

Would blocked domains significantly contribute to those high counts?
Are you in control of the network that issues the requests, or of the most active clients?

To that end, what's the result of:

echo ">stats >quit" | nc localhost 4711
echo ">top-clients >quit" | nc localhost 4711
echo ">top-domains >quit" | nc localhost 4711
echo ">top-ads >quit" | nc localhost 4711
domains_being_blocked 121039
dns_queries_today 13348191
ads_blocked_today 0
ads_percentage_today 0.000000
unique_domains 13
queries_forwarded 11
queries_cached 13348180
clients_ever_seen 3
unique_clients 3
dns_queries_all_types 13348191
reply_UNKNOWN 0
reply_NODATA 0
reply_NXDOMAIN 8
reply_CNAME 2
reply_IP 13348179
reply_DOMAIN 2
reply_RRNAME 0
reply_SERVFAIL 0
reply_REFUSED 0
reply_NOTIMP 0
reply_OTHER 0
reply_DNSSEC 0
reply_NONE 0
reply_BLOB 0
dns_queries_all_replies 13348191
privacy_level 0
status enabled

I can give you the stats, but not the output of the other commands, because the data contains sensitive IPs etc. But there are really only a few IPs generating this traffic.

We have PHP applications on many servers that connect to databases via their FQDNs instead of the databases' IPs. We have the FQDN entries in Pi-hole's Local DNS.

We do not block anything or have anything else custom in PiHole.

We just have many calls to backend databases via their DNS names instead of IPs. The servers on which the PHP applications run do not cache DNS locally. For every database connection request, a DNS request is made to Pi-hole to resolve the FQDN.

Pi-hole can handle the traffic just fine; the only problem is that this FTL-queries file fills up /dev/shm and crashes the DNS service. I still do not understand what this file under /dev/shm does. Is it for stats? If so, we do not want it, and I am asking if there is a way to stop it from growing larger and larger.

I opened the FTL-queries file with a hex editor, and the only thing I see is repeated patterns.

Are these the FQDNs from Local DNS? If so, why do they constantly fill that file?

To add to my questions: I downloaded this file, and it was 750 MB.

In the hex editor it shows many empty lines.

I edited the file and saw that these empty lines account for about 250 MB once you remove the lines with data.

Very strange

13,348,191 DNS requests for 13 unique domains in 6 hours, originating from 3 clients - this is indeed client misbehaviour, as they are apparently not respecting TTLs, and their excessive and unnecessary DNS requests also contribute towards network congestion.
Especially for the latter, fiddling with in-memory storage won't help that.

This should be addressed by installing a stub resolver on the requesting clients, or by editing their hosts files.
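As a sketch of the stub-resolver route on a systemd-based client: a drop-in for systemd-resolved that forwards to Pi-hole and caches replies according to their TTLs. The Pi-hole address 192.168.1.2 is a placeholder, and the demo output directory stands in for /etc/systemd/resolved.conf.d/ so the sketch can run without root:

```shell
# Write a systemd-resolved drop-in that caches DNS replies locally.
# conf_dir is a local demo directory; use /etc/systemd/resolved.conf.d in practice.
conf_dir=demo-resolved.conf.d
mkdir -p "$conf_dir"
cat > "$conf_dir/pihole-upstream.conf" <<'EOF'
[Resolve]
DNS=192.168.1.2
Cache=yes
EOF
# Then apply it on the client (requires root):
#   systemctl restart systemd-resolved
```

With caching active, repeated lookups of the same FQDN within its TTL are answered locally and never reach Pi-hole.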

I also note that you have virtually zero blocked domains:

ads_blocked_today 0

If you do not block any DNS traffic, and you do not need the insights that Pi-hole dashboard may provide, this makes me wonder what you would need Pi-hole for?

I have updated my previous question with more info about FTL-queries file.

The main purpose of using Pi-hole in our case is that we want our PHP applications to connect to a different back-end database, or to manually fail over to another database, simply by editing the DNS entries in Pi-hole's Local DNS.

And I am repeating myself: Pi-hole can handle the traffic just fine. It is the FTL-queries file that fills up /dev/shm and crashes the DNS service.

Perhaps /dev/shm usage may be limited further, but that won't address your issue.

The root cause here is client misbehaviour, and that should be addressed client-side.
One advantage of using Pi-hole is that its dashboard and query logs would allow you to visualise, detect and attribute that kind of misbehaviour to specific clients.

Even if you could tune Pi-hole to disable /dev/shm usage, your clients would still flood your network with well over 50 million unnecessary DNS requests and replies a day.

If you are happy with that, and you do not need Pi-hole's filtering and dashboard core features at the same time, you probably should use another tool.

That may not even have to be a DNS server - a simple script pushing additional hosts file lines to three clients may suffice.
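Such a script could be as small as the sketch below; the hostnames and addresses are made-up placeholders, and the actual push over ssh is left commented out:

```shell
#!/bin/sh
# Hosts-file entries to distribute (placeholder names and addresses)
entries='10.0.0.10 db-primary.internal
10.0.0.11 db-replica.internal'

for client in app1 app2 app3; do
    # Dry run: print what would be appended on each client
    echo "entries for $client:"
    printf '%s\n' "$entries"
    # In practice, append them remotely (requires key-based ssh and root):
    # printf '%s\n' "$entries" | ssh "$client" 'cat >> /etc/hosts'
done
```

With the entries in each client's hosts file, those lookups never leave the machine at all.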

You are right from your point of view.

In our case, we do not have any network problems. We just wanted a simple and fast way to change on demand how our application services talk to our databases, and to guide the traffic however we see fit.

Please take a look at this FTL-queries file behaviour, especially the empty lines that take up so much space, and the repeated patterns. Maybe some things can be refactored to optimise it further. I am just adding some feedback.

Thank you.

As DL6ER has explained, it's unlikely that Pi-hole would cease using shm.
Limiting shm to a certain size (if possible) could perhaps be an option that would benefit other users as well, but that is for development to decide.

Meanwhile, I've tried to provide you with a solution that would work immediately and also lower DNS requests, independently of Pi-hole.

It may help to understand that DNS addresses the balance between most up-to-date information for clients and server load by providing a TTL in DNS replies.

Pi-hole allows tuning TTLs for records it's authoritative for, by setting pihole-FTL's BLOCK_TTL or dnsmasq's local-ttl.
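For the dnsmasq route, this is a one-line drop-in; the file path and the 300-second value are only illustrative:

```
# /etc/dnsmasq.d/02-local-ttl.conf  (illustrative path)
# Tell clients to cache locally-defined records for 5 minutes
local-ttl=300
```

By default dnsmasq serves local records with a TTL of zero, so a non-zero value is what gives caching clients something to work with.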

Most OSs would come with a caching DNS resolver, which would mitigate clients' excessive requests by honoring those TTLs.
Your client OSs apparently don't, which is why you observe those excessive DNS requests.

Installing a stub resolver on the clients would address your issue, and substantially lower DNS requests at the same time.
That solution is not only available immediately, it is also superior to only having Pi-hole somehow hide or ignore the requests.

EDIT:
DNS may just be papering over client application behaviour here:

That seems like an unfavourable implementation choice by the application software that makes those calls.
If you have control over those applications, there are likely ways to improve the code.
For one, the domain could be resolved to an IP once and held in a variable prior to database access, so subsequent calls can operate on that IP variable. If multiple calls are made in one transaction, that alone would lower the number of DNS requests. Something similar could potentially be applied to longer-lived sessions as well.
But an even bigger benefit would be to make use of database connection pooling, allowing calls to reuse existing connections, even across sessions. This would avoid the overhead of authenticating against a database and negotiating a connection for every single database call.
While those improvements would significantly lower the number of DNS requests, they'd likely also reduce resource usage (cpu, ram, network) and thus boost application performance - perhaps considerably, given the large number of your requests.
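The "resolve once, reuse the IP" idea can be illustrated even at the shell level; db_host stands in for the real database FQDN (localhost is used here only so the lookup succeeds):

```shell
# Resolve the database host once, then reuse the IP for every call.
db_host=localhost                                    # placeholder for the real FQDN
db_ip=$(getent hosts "$db_host" | awk 'NR==1 {print $1}')
echo "all subsequent connections reuse $db_ip"       # no further DNS traffic needed
```

The same pattern in the PHP applications (one lookup per transaction or session, cached in a variable) is what cuts the request count.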

If you're running into issues with Pi-hole crashing because the tmpfs partition (/dev/shm) fills up due to high traffic, you can move Pi-hole's FTL files to a regular disk partition to prevent this from happening. This can be done by creating a new directory on your disk (e.g., /var/pihole-stats) and moving the FTL files from tmpfs to this new location.

Start by creating the directory where you want the FTL files to be stored and setting the correct permissions. Next, stop Pi-hole to ensure you're not moving files while the service is running. Move the FTL files from /dev/shm to your new directory, then create symbolic links to make Pi-hole think the files are still in their original location in tmpfs, even though they are now on your disk.

Finally, restart Pi-hole, and the service should run without issues related to the tmpfs partition filling up. Just make sure that the disk partition you're using has enough space, as high traffic can generate a lot of data, and be mindful that accessing disk storage is slower than using RAM.

Where did you get the impression that your above suggestion would work?
Did you try that yourself?

EDIT:
I did try your proposed steps now, and they don't work.

If you'd do that, you'd find that there are no FTL files under /dev/shm to copy once you stop the service.
And if you create the symlinks manually, creating the shm files during Pi-hole startup will fail:

[2024-12-04 16:09] FATAL: create_shm(): Failed to create shared memory object "FTL-queries": File exists
[2024-12-04 16:09] Initialization of shared memory failed.

Did you get the bit below?

Where did you get that?