Please follow the template below; it will help us to help you!
Expected Behaviour:
I am running a Threadripper 3960X server with 128 GB of RAM. It runs basically two main services: Plex and Pi-hole. I've tried running Pi-hole in Docker, but as I am using the ZFS file system, it's just too hard and causes too many disk corruptions. Anyway, that's not what this post is about. I run Pi-hole natively on the box using port 8080 (I have Apache running on port 80 for some Plex automation stuff). I am using Ubuntu 20.04 LTS.
I forward my Pi-hole queries to my firewall, which runs another DNS server for my own internal domain. That all works fine.
I have 2.6 million domains blocked, based on the six or so lists I have loaded.
I would expect Pi-hole to use a maximum of around 8 GB of RAM at any given point.
Actual Behaviour:
In the 24 hours it has been running, Pi-hole has taken up 64 GB of my 128 GB of RAM. I have a lot of RAM for ZFS caching, but I certainly didn't expect Pi-hole to take 64 GB of it. I don't know what the next 24 hours will bring in terms of RAM usage, but that RAM is not there for Pi-hole; it's there for ZFS caching, as I have many terabytes of data moving around and I need the RAM for the caching.
Thanks for your response. Greatly appreciate you looking into it.
A couple of things I'd note before I post the results of that command:
Regardless of what my results for that command show, it is clear that the RAM usage has changed significantly since I started running Pi-hole. The memory may be consumed by a Linux process rather than showing up in a Pi-hole command, but logic says to me, "what has changed in the past 24 hours to cause such a significant change in my RAM usage?", and the answer is simple: the installation of Pi-hole.
On your comment about RAM wastage: the way my RAM works is that when files start moving around, that RAM is used heavily at that point. When it's dormant, sure, the RAM is available, but when file movement starts again there is far less RAM available to draw from for caching, making everything much slower.
Thanks again for your consideration. Here are the results of that command:
You are right Dan. I just checked with this command:
arcstat:
time      read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz  c
12:06:56     0     0      0     0    0     0    0     0    0    58G  62G
arcsz shows the 58 GB of RAM in use. I am learning some stuff here. I've learned that the ARC, ZFS's disk cache, will shrink automatically when that RAM is needed for other things.
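For reference, the same figures can be read straight from the kernel's ZFS stats, assuming OpenZFS's /proc/spl/kstat/zfs/arcstats interface ("size" is the current ARC size, "c" its target):

awk '/^(size|c|c_min|c_max) / {printf "%-8s %s bytes\n", $1, $3}' /proc/spl/kstat/zfs/arcstats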
What confused me is that on other filesystems, the disk cache is managed by the kernel's boring old regular page cache, and so doesn't appear as "used" in the memory usage metrics (it normally gets labelled "cached").
So I guess it's gone way up because Pi-hole is doing a huge amount of reads/writes with 2.6 million domains. I am also guessing that ZFS will re-allocate that RAM as required if the system is doing other things. That is a question: is that right, Dan? And what do you mean by tuning my ARC to something sane?
Thanks again for teaching me this stuff.
PS: I have a 1 TB SSD L2ARC, but it doesn't get used much.
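A quick way to check that, assuming the standard l2_* counters in the same kstat file:

awk '/^l2_(size|hits|misses) / {printf "%-10s %s\n", $1, $3}' /proc/spl/kstat/zfs/arcstats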
This is not the case. The domains are read into memory, and there are no more read/writes with this many domains than with 50K domains.
Pi-hole writes to the dnsmasq log at /var/log/pihole.log, the Pi-hole log at /var/log/pihole-FTL.log, and to the long term database at /etc/pihole/pihole-FTL.db. The amount of writing is generally proportional to the number of DNS queries the Pi-hole processes.
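A quick way to gauge how much is actually being written is to check those files directly, e.g.:

ls -lh /var/log/pihole.log /var/log/pihole-FTL.log /etc/pihole/pihole-FTL.db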
Your previous output showed that Pi-hole was using 463 MB of RAM, which sounds about right.
ARC tuning for ZFS is a dark art. There's a ton of different opinions on what to set for the minimum, what to set for the maximum, and how to decide how much of your memory should be consumed. ZFS is notorious for eating memory, especially if you have dedupe enabled for your pool(s).
And I'm not all that surprised the L2ARC isn't being touched if there's no constraint on the amount of RAM you've allowed ZFS to take over.
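As a rough illustration of what that tuning looks like on Linux (OpenZFS), the ARC ceiling is the zfs_arc_max module parameter; the 32 GiB below is only a placeholder, not a recommendation:

# Cap the ARC for the running system (the ARC shrinks gradually, not instantly)
echo $((32 * 1024**3)) | sudo tee /sys/module/zfs/parameters/zfs_arc_max
# Make the cap persist across reboots (Ubuntu: rebuild the initramfs afterwards)
echo "options zfs zfs_arc_max=$((32 * 1024**3))" | sudo tee /etc/modprobe.d/zfs.conf
sudo update-initramfs -u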
Yeah, understood. But what is happening is that ZFS is caching a huge amount of the transaction activity Pi-hole is undertaking. It might not be reads/writes to disk, but reads/writes into the RAM cache. And that makes a lot of sense, because DNS is by definition a very transactional system: micro-transactions, but a huge number over a 24-hour period. So ZFS is caching all of that to make it run more efficiently, or at least it thinks that is the best outcome here.
I have also learned that ZFS will decide when file transfers or other things need the RAM, and will automatically shrink its cache to make room.
BTW, I have only learned this since my original post, just an hour ago.
Well, I do agree with you in part; the bit that I would define as the issue is actually me. I misunderstood how things fit together. Pi-hole is working normally - fantastically, in fact. Love it. I don't think it's ZFS either; I think it's doing exactly what it is designed to do: cache. And to be fair, if nothing else is using the RAM at that point in time, it should give it to Pi-hole to run faster. If something else comes along that could use it better, ZFS apparently uses its dark art to work out what gets which RAM allocation. I'll just take a wait-and-see approach, watch how performance goes, and decide whether I need to tweak the ZFS ARC - something I will need to learn about before I go near touching it :).
Thanks for your support in this thread. I really appreciate it!
Well, ZFS is only caching disk I/O, right? The entire DNS traffic happens over sockets (typically UDP), so it is entirely out of scope for the file system. What could indeed be interesting here is how ZFS handles shared memory. Shared-memory objects are pseudo-files on disk that actually exist in memory only. They live in /dev/shm, and this should typically be a RAM disk so that it does not cause any writing to your disks.
Please provide the output of
df -h /dev/shm
on your system.
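On a standard setup that mount is a tmpfs; findmnt will confirm it:

findmnt /dev/shm    # FSTYPE should read tmpfs, i.e. a RAM disk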
The only other places where data is regularly written are the log files and the long-term database.
However, we're not actually reading anything back from the log files, and only very few bytes from the long-term database, so these files never need to be cached. In fact, caching them may be the issue here (check the database file size!): if your caching is configured to place a dedicated copy of the same (say, 500 MB) file in the cache on every update, it is absolutely clear that the cache will fill within hours, at most.
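Should the database turn out to dominate the cache, ZFS can be told to keep only metadata in the ARC for the dataset holding it - a sketch, where rpool/ROOT/ubuntu is a placeholder for whatever dataset actually contains /etc/pihole:

zfs list -o name,mountpoint    # find the dataset that holds /etc/pihole
sudo zfs set primarycache=metadata rpool/ROOT/ubuntu    # placeholder dataset name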
From whatever direction we look at it: yes, the caching needs tuning. Hopefully my comment gives you some more info to play with and check when doing further tweaking.