What is the optimal cache size?

Thanks for this article, it explains things very clearly. I am using the beta with unbound as described and it works very well for me. I am happy to accept a small performance penalty for the privacy benefits, but in actual fact we have no perceivable performance hit - it's not noticeable at all.

I did experiment with setting a larger cache (up from the default 10,000 to 250,000) but found I could only make this work by editing 01-pihole.conf. When I edited /etc/dnsmasq.conf, Pi-hole's DNS service wouldn't run. The same happened if I created a new file (e.g. cache-increase.conf) in the /etc/dnsmasq.d directory.

Question: Is it only possible to do this in the 01-pihole.conf file because it somehow works around the native dnsmasq 10,000 cache limit?

Thanks for a great product!

1 Like

We've had some discussion about cache values lately. I think the native 10,000 limit is patched out, but the config may still be in force. @DL6ER, was the patch for the total cache, the TTL, or none of the above?

The modification (removal of the upper limit) is for the total cache. You'll have to change it in 01-pihole.conf - not because this file is special, but because the option can only be set in one config file at a time. If you configured it in another file, it would be defined both there and in 01-pihole.conf; the resolver wouldn't know which value you actually want and hence fails.
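
For reference, a minimal sketch of the change (assuming the default file location /etc/dnsmasq.d/01-pihole.conf; the 250,000 is just the figure from the post above):

# in /etc/dnsmasq.d/01-pihole.conf, adjust the existing line, e.g.
cache-size=250000

# then restart Pi-hole's DNS service so the new value takes effect
pihole restartdns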

1 Like

Thanks, that solves the mystery. It's certainly working fine changing it in 01-pihole.conf. I'll just need to remember to change it again after future upgrades.

This looks really cool - just installed it and so far looking good.

I followed the tutorial as listed. Once all those steps are done, will the unbound service start automatically when the Pi reboots? Or is there another command we need to run to get it running on startup?

1 Like

Yes, unbound starts automatically afterwards.
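
If you want to double-check on a systemd-based install (this assumes the service unit is simply called unbound, as with the standard Debian/Raspbian package):

systemctl is-enabled unbound    # should print "enabled"
systemctl status unbound        # should show "active (running)"

# if it is not enabled for some reason:
sudo systemctl enable unbound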

What number do you recommend for the cache limit? I am trying this:

cache-size=100000

Cache sizes of more than maybe 1,000 are only useful in very specific environments (very heterogeneous clients querying a large number of different domains). I added a cache efficiency measure to FTLDNS (similar to what dnsmasq offers). Although it is not (yet) exposed nicely in the GUI, you can query it manually.

Run

echo ">cacheinfo" | nc pi.hole 4711

on your Pi-hole.

You should get something like

cache-size: 1000
cache-live-freed: 0
cache-inserted: 12620

The individual numbers mean the following:

  • cache-size - the (maximum) cache size. With Pi-hole, you typically specify this number directly in 01-pihole.conf. It is the number of entries that can be actively cached at the same time
  • cache-live-freed - the number of cache entries that had to be removed although they had not yet expired. Entries get removed when the cache is full and older entries have to make room for newer queries. The cache size should only be increased when this number is larger than zero
  • cache-inserted - the total number of insertions into the cache. This number may be substantially higher than cache-size because it is a global sum and cache entries naturally make room for new insertions over time as they expire

TL;DR: As long as cache-live-freed is really low (or even zero), your cache size is sufficient. It may even be too large.
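
If you want to keep an eye on cache-live-freed over time, something along these lines works (assuming nc is available and FTL is listening on its default port 4711):

watch -n 60 'echo ">cacheinfo" | nc 127.0.0.1 4711'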

I see that there is a certain lack of clarity when it comes to DNS caching and will consider creating a description for our documentation pages before we release Pi-hole v4.0.

4 Likes

Thanks, this is helpful and educational.

I had to use echo ">cacheinfo" | nc 127.0.0.1 4711 instead (maybe because I had edited my hosts file to show "unbound", or because I'm on the beta - not sure).

Anyway, I get the result below, which suggests 250,000 is OVERKILL and not necessary at all.

Could you explain how the dnsmasq cache and the unbound cache work together?
Output of "unbound-control stats_noreset" is also below. Many thanks.

cache-size: 250000
cache-live-freed: 0
cache-inserted: 143
---EOM---

thread0.num.queries=8721
thread0.num.cachehits=640
thread0.num.cachemiss=8081
thread0.num.prefetch=167
thread0.num.zero_ttl=0
thread0.num.recursivereplies=8081
thread0.requestlist.avg=0.677861
thread0.requestlist.max=17
thread0.requestlist.overwritten=0
thread0.requestlist.exceeded=0
thread0.requestlist.current.all=0
thread0.requestlist.current.user=0
thread0.recursion.time.avg=0.298242
thread0.recursion.time.median=0.237763
thread0.tcpusage=0
total.num.queries=8721
total.num.cachehits=640
total.num.cachemiss=8081
total.num.prefetch=167
total.num.zero_ttl=0
total.num.recursivereplies=8081
total.requestlist.avg=0.677861
total.requestlist.max=17
total.requestlist.overwritten=0
total.requestlist.exceeded=0
total.requestlist.current.all=0
total.requestlist.current.user=0
total.recursion.time.avg=0.298242
total.recursion.time.median=0.237763
total.tcpusage=0
time.now=1528747140.238917
time.up=252015.844783
time.elapsed=252015.844783

Agreed.

They are separate. However, while dnsmasq will only cache results of actual requests (e.g. some.domain.de), unbound will also cache the intermediate steps along the DNS resolution path, e.g.

;rrset 86392 6 0 2 0
de.     172792  IN      NS      n.de.net.
de.     172792  IN      NS      l.de.net.
de.     172792  IN      NS      z.nic.de.
de.     172792  IN      NS      a.nic.de.
de.     172792  IN      NS      s.de.net.
de.     172792  IN      NS      f.nic.de.
;rrset 86392 1 0 1 0
n.de.net.       172792  IN      AAAA    2001:67c:1011:1::53
;rrset 86392 1 0 1 0
l.de.net.       172792  IN      A       77.67.63.105
;rrset 86392 1 0 1 0
s.de.net.       172792  IN      AAAA    2003:8:14::53
;rrset 86392 1 0 1 0
f.nic.de.       172792  IN      AAAA    2a02:568:0:2::53
;rrset 86392 1 0 1 0
f.nic.de.       172792  IN      A       81.91.164.5
;rrset 86392 4 0 2 0
google.de.      86392   IN      NS      ns2.google.com.
google.de.      86392   IN      NS      ns4.google.com.
google.de.      86392   IN      NS      ns3.google.com.
google.de.      86392   IN      NS      ns1.google.com.
;rrset 86392 1 0 8 0
ns1.google.com. 345592  IN      AAAA    2001:4860:4802:32::a
;rrset 86392 1 0 8 0
ns1.google.com. 345592  IN      A       216.239.32.10
;rrset 3592 1 0 8 3
google.de.      3592    IN      A       172.217.16.195

Knowledge about how to resolve .de or google domains may come in handy for subsequent queries and can make them notably faster.
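
If you want to inspect this on your own resolver, a listing like the one above can be obtained from unbound's cache dump (this assumes unbound-control / remote-control is set up on your system):

sudo unbound-control dump_cache | less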

1 Like

Great read, thanks to all for posting.

What is the "lifespan" (length of time until expiration) of an insertion?

What specifically is the difference between an insertion and an entry?

It's the TTL ("time-to-live"). See how it decreased from 206 seconds to 201 seconds in my example:

nanopi@nanopi:~$ dig google.com

; <<>> DiG 9.11.5-P4-5.1+deb10u2-Debian <<>> google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 8595
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1472
;; QUESTION SECTION:
;google.com.			IN	A

;; ANSWER SECTION:
google.com.		206	IN	A	172.217.22.206

;; Query time: 1 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Mi Nov 11 06:12:33 CET 2020
;; MSG SIZE  rcvd: 55

nanopi@nanopi:~$ dig google.com

; <<>> DiG 9.11.5-P4-5.1+deb10u2-Debian <<>> google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 53469
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;google.com.			IN	A

;; ANSWER SECTION:
google.com.		201	IN	A	172.217.22.206

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Mi Nov 11 06:12:38 CET 2020
;; MSG SIZE  rcvd: 55

1 Like

(please see edit at end)
Thank you, and can you please explain.

That seems counterintuitive. TTL is, in some sense, the number of hops a packet can take (sure, "time to live").

I interpret "expiration" to be chronological in nature, applying to an entry in a database - perhaps something like "elapsed time since last access of a given entry". That entry is written in the database/cache, and not going anywhere... not sure I understand how hops through a network matches this.

TIA for your explanation.

EDIT/AFTERTHOUGHT: OK, I need to read more before posting... just read this, and need to read more, I guess...

1 Like

You are describing the IPv4 TTL header field (equivalent to IPv6 hop limit).

There are multiple definitions of TTL applicable in different contexts.

In DNS, an authoritative DNS server sets the TTL for a given DNS record to tell a client like a recursive or local resolver how long it should cache such a record for.
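
One way to see the difference for yourself is to compare the full TTL published by an authoritative name server with the counting-down TTL returned by your local resolver (ns1.google.com is just used as an example authoritative server here):

# full TTL straight from an authoritative name server
dig +noall +answer google.com @ns1.google.com

# cached, decreasing TTL from the local resolver
dig +noall +answer google.com @127.0.0.1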

1 Like

Sorry, this turned out a much bigger reply than I first thought.

After reading this thread I worried: how could I quantify 'too large'? I looked at the FTLDNS docs:

"the DNSSEC validation process uses the cache."

Now, just looking at my main Pi-hole's dashboard, 'Query Types' shows DS and DNSKEY requests making up ~3% of queries. Does this mean that ~3% of my cache is from DNSSEC?

To evaluate this: since TTLs seem to be short (broadly 150-250 seconds), we can look at spikes in requests to the Pi-hole. Again, looking at the dashboard, I'm seeing ~600 total permitted and blocked queries in the highest 10-minute period of just today.

This ^ makes more sense now, and building on top of that, from the FTLDNS docs:

"The number of cache entries that had to be removed although the corresponding entries were NOT EXPIRED. Old cache entries get removed if the cache is full to make space for more recent domains. The cache size should be increased when this number is larger than zero."

Again, making sure to differentiate: the number of cache insertions does not mean you have that much stuff cached.

Going through all this, my basic takeaway is that with a bigger cache you're allocating more memory to a cache that may only be 10% utilized most of the time, peaking at maybe 50% - memory that could be allocated to other services on your Pi-hole and your server. Is analyzing the peak the right way to go? Is there a better approach? If I'm heading in the right direction, I believe the default cache size should have an asterisk telling you when to change it.

The default cache size is sufficient for most at-home network scenarios and still yields high performance. Performance degrades as the cache grows - a larger cache also means you have to search through more entries before finding that something is not in your cache.

The takeaway message builds on this and on the quote above:

If the latter is not happening for you, we recommend leaving it at the default value. The default value will neither eat an awful lot of memory nor require you to come back at regular intervals to check whether your lowered value is still sufficient.


Other topic:

Open your Pi-hole, switch to a page that requires authentication and enter the password. Once you have done that, open http://pi.hole/admin/api.php?getCacheInfo in the same browser. This will give you a breakdown of how the cache is filled at this point in time.
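
If you prefer the command line over a logged-in browser session, the same information should be retrievable with the API token from Settings > API (a sketch; YOUR_API_TOKEN is a placeholder and details may differ between versions):

curl "http://pi.hole/admin/api.php?getCacheInfo&auth=YOUR_API_TOKEN"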

1 Like

Okay I have never seen this, and this is awesome:

{
  "cacheinfo": {
    "cache-size": 10000,
    "cache-live-freed": 0,
    "cache-inserted": 100031,
    "ipv4": 76,
    "ipv6": 31,
    "srv": 0,
    "cname": 165,
    "ds": 223,
    "dnskey": 25,
    "other": 1,
    "expired": 1229,
    "immortal": 20
  }
}

First of all, immortal is the coolest name for them, and secondly, looking up 'immortal memory' or 'immortal cache' only gets me obituaries. Are these the root servers from named.root for unbound? Going into the GitHub repo for FTLDNS, I see the definition of the getCacheInformation() method gives more info:

// <immortal> cache records never expire (e.g. from /etc/hosts)

So it could be from my hosts file, but knowing my hosts file has no more than 10 entries, maybe it is from unbound. The FTL API's getCacheInformation is much more verbose; I would like to see this not behind a finicky GET request. Thank you for your help!

When you run

sudo killall -USR1 pihole-FTL

your Pi-hole will dump the content of the cache into its log file /var/log/pihole.log.

When you look under Flags, you'll notice an I next to each immortal cache entry. H means it comes from a HOSTS file; other sources of immortal cache records could, for instance, be definitions in config files.
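
For example, to trigger the dump and then skim the end of the log for the flags column (the number of lines to tail depends on how large your cache is):

sudo killall -USR1 pihole-FTL
tail -n 200 /var/log/pihole.log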

Not sure if you have looked into the cache yet; I just want to give a heads-up on the documentation we just added about how to read the cache dump report.

https://docs.pi-hole.net/ftldns/cache_dump/