DNS cache questions

It appears that FTL does not respect the TTL in the DNS record.
This is a bug and should be fixed.

In the meantime, the documentation says that you can turn off caching by setting cache-size to zero.
If I do this, FTL refuses to start.

Pi-hole Version v4.3.2 Web Interface Version v4.3.2 FTL Version v4.3.1

The config file has a comment insisting that you don't modify it, but instead override values in your own config file. However, trying to override cache-size gives this result:

Nov 12 22:52:09 raspberrypi systemd[1]: Started User Manager for UID 999.

Nov 12 22:52:09 raspberrypi pihole-FTL[1171]: dnsmasq: illegal repeated keyword at line 33 of /etc/dnsmasq.d/01-pihole.conf

Nov 12 22:52:09 raspberrypi dnsmasq[1212]: illegal repeated keyword at line 33 of /etc/dnsmasq.d/01-pihole.conf

Nov 12 22:52:09 raspberrypi dnsmasq[1212]: FAILED to start up

Can you show some information about what records are not respecting the TTL?

The actual DNS record is admin.solutus-lab.net:

$ dig +nocmd +noall +answer @10.0.0.21 admin.solutus-lab.net

admin.solutus-lab.net. 181 IN A 54.202.6.183

I see a TTL value returned (the configured value is 300 seconds).

This is an AWS instance. When it is shut down and re-started, the public IP changes and the DNS record is updated automatically.

Using Pi-Hole, if I try to access this server, I get the old address returned, even though there may have been no accesses for many hours, so any cached value should be invalid.

If I use one of the upstream servers that Pi-Hole uses, I get the right IP returned.

Re-start Pi-Hole (or just FTL) and I get the new address.

Seems like the cached value gets returned for the lifetime of the cached entry, and that the cached entry does not time out based upon its TTL and the time it was added into the Pi-Hole cache.

The documentation notes that you should do this in /etc/dnsmasq.d/01-pihole.conf

https://docs.pi-hole.net/ftldns/dns-cache/

There is a note in that file that any changes will be overwritten, so if you update or repair, you will have to make the change again. If you specify this setting in multiple files, that causes a problem.
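To see whether the setting is defined more than once, something like the following could help. It is only a sketch: the function name is made up for illustration, and the default directory is taken from the path in the error message above.

```shell
#!/usr/bin/env bash
# List every active (non-commented) cache-size line under a dnsmasq config
# directory, with file name and line number.
# find_cache_size_settings is a hypothetical helper name, not a Pi-hole command.
find_cache_size_settings() {
    local dir="${1:-/etc/dnsmasq.d}"
    # -R: recurse, -n: show line numbers, -E: extended regular expressions
    grep -RnE '^[[:space:]]*cache-size[[:space:]]*=' "$dir"
}
```

If `find_cache_size_settings` prints more than one line, the keyword is set in more than one place, which matches the situation described here.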

Make the change in the designated file, restart FTL and then run the command to check the cache statistics. You should then see a zero cache size.

What were the contents of that file when this error was received?

I added a second file (99-pihole.conf) to map some local addresses, so added “cache-size=0” to that, thinking that it would override the previous definition.
It doesn’t. It gives an error.

I could change it in the 01 file, but don’t want to have to do that each time a new version is loaded.

I believe changing the 01 file is your only option.

Well, I suppose I could create a cron job to check every now and again and if it finds it changed, edit the file and re-start…
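A rough sketch of what such a check might look like is below. The function name and exit codes are invented for illustration, and it assumes the file contains a single plain `cache-size=<number>` line; nothing here is an official Pi-hole mechanism.

```shell
#!/usr/bin/env bash
# Re-apply cache-size=0 to a dnsmasq config file if an update has reset it.
# ensure_cache_size_zero is a hypothetical helper name.
# Exit status: 0 = file was changed, 1 = nothing to do, 2 = rewrite failed.
ensure_cache_size_zero() {
    local conf="$1" tmp
    # Already zero: leave the file alone.
    if grep -qE '^cache-size=0[[:space:]]*$' "$conf"; then
        return 1
    fi
    # No cache-size line at all: nothing to rewrite.
    grep -qE '^cache-size=' "$conf" || return 1
    tmp="$(mktemp)" || return 2
    # Write the edited copy first, then overwrite in place so the original
    # file keeps its owner and permissions.
    if sed -E 's/^cache-size=.*/cache-size=0/' "$conf" > "$tmp" && cat "$tmp" > "$conf"; then
        rm -f "$tmp"
        return 0
    fi
    rm -f "$tmp"
    return 2
}
```

A cron entry could then run `ensure_cache_size_zero /etc/dnsmasq.d/01-pihole.conf` as root and restart FTL only when it returns 0, so the resolver is not bounced on every run.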

Getting the cached entries to honor the source TTL would be the real answer, but I suspect that is likely non-trivial.

Given the benefits, I can live with a few little problems like this :)

I don't see the issue here and I think there is some confusion.

The TTL of the domain is 300 seconds. We don't modify that. What you are seeing is the cache timer stepping down. Once it hits 0 it will expire out of the cache and the next time queried it will start back at 300. The only TTL values we have any control over are local values for items like hosts file entries or DHCP client IPs.

Mariner-10% dig +nocmd +noall +answer admin.solutus-lab.net
admin.solutus-lab.net.  300     IN      A       54.202.6.183
Mariner-10% dig +nocmd +noall +answer admin.solutus-lab.net
admin.solutus-lab.net.  289     IN      A       54.202.6.183
Mariner-10% dig +nocmd +noall +answer admin.solutus-lab.net
admin.solutus-lab.net.  285     IN      A       54.202.6.183
Mariner-10% dig +nocmd +noall +answer admin.solutus-lab.net @192.168.10.2
admin.solutus-lab.net.  161     IN      A       54.202.6.183
Mariner-10% dig +nocmd +noall +answer admin.solutus-lab.net @192.168.10.2
admin.solutus-lab.net.  160     IN      A       54.202.6.183
Mariner-10% dig +nocmd +noall +answer admin.solutus-lab.net @192.168.10.2
admin.solutus-lab.net.  159     IN      A       54.202.6.183
Mariner-10% dig +nocmd +noall +answer admin.solutus-lab.net @192.168.10.2
admin.solutus-lab.net.  158     IN      A       54.202.6.183
Mariner-10% dig +nocmd +noall +answer admin.solutus-lab.net @192.168.10.2
admin.solutus-lab.net.  157     IN      A       54.202.6.183
Mariner-10% dig +nocmd +noall +answer admin.solutus-lab.net @8.8.8.8
admin.solutus-lab.net.  299     IN      A       54.202.6.183
Mariner-10% dig +nocmd +noall +answer admin.solutus-lab.net @8.8.8.8
admin.solutus-lab.net.  299     IN      A       54.202.6.183
Mariner-10% dig +nocmd +noall +answer admin.solutus-lab.net @8.8.8.8
admin.solutus-lab.net.  299     IN      A       54.202.6.183
Mariner-10% dig +nocmd +noall +answer admin.solutus-lab.net @192.168.10.2
admin.solutus-lab.net.  73      IN      A       54.202.6.183
Mariner-10% dig +nocmd +noall +answer admin.solutus-lab.net @192.168.10.2
admin.solutus-lab.net.  300     IN      A       54.202.6.183
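To watch only the countdown, the TTL column can be pulled out of the answer lines. This is a small sketch; `answer_ttl` is a made-up helper name, and it assumes the usual five-field answer format shown above (name, TTL, class, type, data).

```shell
#!/usr/bin/env bash
# Print the TTL (second field) of each answer line read from stdin.
# Lines with fewer than five fields are skipped.
answer_ttl() {
    awk 'NF >= 5 { print $2 }'
}
```

For example, `dig +nocmd +noall +answer admin.solutus-lab.net @192.168.10.2 | answer_ttl` would print just the number, which should shrink on each repeat until the entry expires.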

The way it works is:

The result of a DNS query contains a Time To Live (TTL) field. That number is set by the definition of the entry in an authoritative server.

When it hands out the entry in response to a query, the TTL is included. It's just a number of seconds.

When your local DNS gets the entry and passes it on to you, it will (or can) cache that entry in case you ask for it again.

It is, in this case, good for 300 seconds.

When it passes it on to you, it tells you how long this result is good for. First time, when the entry is fresh from the source, 300 seconds.

Wait 10 seconds and ask your server again, and it will give you the same data, but this time it will tell you it's only good for 290 seconds.

When you ask after (say) 10 minutes, your local server may find this still in the cache, but comparing the time now to the time when it acquired the data, it finds that more than 300 seconds have elapsed.

It goes back to the source and asks again.

This means that if the IP changes, the longest that anyone should be holding on to an old, out of date record is 300 seconds.

On average, it will be 150 seconds.

Once Pi-Hole caches an entry, it SHOULD look at the time it acquired the entry, look at how long it is valid for (TTL) and only hold on to it for that length of time before marking it as dead or just flushing it, and next time someone asks, it will make a new request to its upstream DNS. What it appears to do is just hang on to the entry for the max time specified for its cache.
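The expected behaviour described above boils down to a little arithmetic. The sketch below is only an illustration of that rule, not how dnsmasq or FTL actually implements its cache; `remaining_ttl` and its arguments are invented names.

```shell
#!/usr/bin/env bash
# Remaining lifetime of a cached record, or "expired" once its TTL has elapsed.
# Arguments: original TTL in seconds, time the record was cached (epoch seconds),
# current time (epoch seconds).
remaining_ttl() {
    local ttl="$1" cached_at="$2" now="$3" age
    age=$(( now - cached_at ))
    if [ "$age" -ge "$ttl" ]; then
        echo "expired"          # the resolver should ask upstream again
    else
        echo $(( ttl - age ))   # the value handed back to the client
    fi
}
```

With the numbers from this thread, `remaining_ttl 300 1000 1010` prints `290` (ten seconds after caching), and `remaining_ttl 300 1000 1600` prints `expired` (ten minutes after caching).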

What is the max time specified for the cache?

sudo grep ttl= -R /etc

@DL6ER Maybe you can explain how dnsmasq and thus pihole-FTL works for cache evictions and TTL.

We don't define a cache hold time*, only a number of entries. Cache lifetime is purely based on TTL.

*local-ttl is set to 2 seconds but that's not in play here or nothing would be in cache for more than 2 seconds.

Looks like something weird was going on when I had this problem.
I tried a small controlled experiment: I created a script which did a lookup every 60 seconds against:

Level 3 209.244.0.3

Local DNS 10.0.0.4

Pi-Hole 10.0.0.21

While that was running, restarted the AWS instance and noted the time at which the IP was updated.
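The script itself was not posted. A guess at something equivalent is sketched below: the server addresses and host name come from this thread, while the function names and output padding are assumptions.

```shell
#!/usr/bin/env bash
# Poll several DNS servers for one name and print each answer with a timestamp.
# NAME and SERVERS come from the thread; poll_once and poll_loop are made-up names.
NAME="admin.solutus-lab.net"
SERVERS="209.244.0.3 10.0.0.4 10.0.0.21"

# Query every server once and print "server  answer-line" for each.
poll_once() {
    local server
    date
    for server in $SERVERS; do
        printf '%-11s %s\n' "$server" "$(dig +nocmd +noall +answer "@$server" "$NAME")"
    done
    echo "------------------"
}

# Run poll_once a fixed number of times, sleeping between rounds.
poll_loop() {
    local rounds="$1" interval="${2:-60}" i=0
    while [ "$i" -lt "$rounds" ]; do
        poll_once
        i=$(( i + 1 ))
        if [ "$i" -lt "$rounds" ]; then
            sleep "$interval"
        fi
    done
}
```

Calling `poll_loop 10 60` would run ten rounds a minute apart, which is roughly the length of the experiment.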

This is the result:

> Wed Nov 13 12:03:30 PST 2019

> 209.244.0.3 admin.solutus-lab.net.	300	IN	A	54.202.6.183

> 10.0.0.4    admin.solutus-lab.net.	188	IN	A	54.202.6.183

> 10.0.0.21   admin.solutus-lab.net.	188	IN	A	54.202.6.183

> ------------------

> Wed Nov 13 12:04:30 PST 2019

> 209.244.0.3 admin.solutus-lab.net.	217	IN	A	54.202.6.183

> 10.0.0.4    admin.solutus-lab.net.	128	IN	A	54.202.6.183

> 10.0.0.21   admin.solutus-lab.net.	128	IN	A	54.202.6.183

> ------------------

> Wed Nov 13 12:05:30 PST 2019

> 209.244.0.3 admin.solutus-lab.net.	157	IN	A	54.202.6.183

> 10.0.0.4    admin.solutus-lab.net.	68	IN	A	54.202.6.183

> 10.0.0.21   admin.solutus-lab.net.	68	IN	A	54.202.6.183

> ------------------ IP Updated at Wed Nov 13 12:06:28 PST 2019

> Wed Nov 13 12:06:30 PST 2019

> 209.244.0.3 admin.solutus-lab.net.	97	IN	A	54.202.6.183

> 10.0.0.4    admin.solutus-lab.net.	8	IN	A	54.202.6.183

> 10.0.0.21   admin.solutus-lab.net.	8	IN	A	54.202.6.183

> ------------------

> Wed Nov 13 12:07:30 PST 2019

> 209.244.0.3 admin.solutus-lab.net.	37	IN	A	54.202.6.183

> 10.0.0.4    admin.solutus-lab.net.	300	IN	A	18.236.186.115

> 10.0.0.21   admin.solutus-lab.net.	300	IN	A	18.236.186.115

> ------------------

> Wed Nov 13 12:08:30 PST 2019

> 209.244.0.3 admin.solutus-lab.net.	300	IN	A	18.236.186.115

> 10.0.0.4    admin.solutus-lab.net.	240	IN	A	18.236.186.115

> 10.0.0.21   admin.solutus-lab.net.	240	IN	A	18.236.186.115

Which all looks exactly as it should be.
It has to be a client problem, I am thinking… I know Windows used to cache DNS lookups and ignore TTL at one time; I wonder if it still does?

(I use Mac/Linux mostly, so not really up to date on Microsoft topics…)

Philip

This is precisely what FTL is doing.

Yes, this is more likely.
