GitHub A records not cached

The issue I am facing:
Certain Home Assistant add-ons require access to GitHub (which works as expected).
It looks like these requests to GitHub are not cached, while others are cached as expected. The TTL is set to one hour.

See also the log entries:

2023-03-28 13:33:24 A api.github.com hass.itv.lan OK (answered by 1.1.1.2#53) IP (6.6ms)
2023-03-28 13:32:24 A api.github.com hass.itv.lan OK (answered by 1.1.1.2#53) IP (7.6ms)
2023-03-28 13:32:21 A inv-1.iot.itv.lan hass.itv.lan OK (cache) IP (0.0ms)
2023-03-28 13:32:18 A inv-2.iot.itv.lan hass.itv.lan OK (cache) IP (0.0ms)
2023-03-28 13:31:23 A api.github.com hass.itv.lan OK (answered by 1.1.1.2#53) IP (8.3ms)

Details about my system:
Ubuntu 22.04.2 LTS
Pihole docker 2023.03.1
Pi-hole v5.16.2 | FTL v5.22 | Web Interface v5.19

What I have changed since installing Pi-hole:
Added a VM with Home Assistant

=====

Any idea what might be causing this, and how to improve it?

From my location, TTL for api.github.com is 60 seconds - one minute:

~$ dig api.github.com

; <<>> DiG 9.11.5-P4-5.1+deb10u8-Debian <<>> api.github.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 57780
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;api.github.com.                        IN      A

;; ANSWER SECTION:
api.github.com.         60      IN      A       140.82.121.5

;; Query time: 24 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Thu Mar 30 16:37:52 CEST 2023
;; MSG SIZE  rcvd: 59

EDIT: The same TTL is apparently in use when querying the same upstream as Pi-hole:

~$ dig api.github.com @1.1.1.1

; <<>> DiG 9.11.5-P4-5.1+deb10u8-Debian <<>> api.github.com @1.1.1.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 60728
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;api.github.com.                        IN      A

;; ANSWER SECTION:
api.github.com.         51      IN      A       140.82.121.6

;; Query time: 12 msec
;; SERVER: 1.1.1.1#53(1.1.1.1)
;; WHEN: Fri Mar 31 00:20:17 CEST 2023
;; MSG SIZE  rcvd: 59

Maybe you misread the TTL?

Thank you for the quick response.

Not sure if we are on the same page here.
But isn't the caching TTL a client-side parameter?

On my side (tested with 2 clients and the Ubuntu server with Pi-hole):

; <<>> DiG 9.18.12-0ubuntu0.22.04.1-Ubuntu <<>> api.github.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 10471
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;api.github.com.                        IN      A

;; ANSWER SECTION:
api.github.com.         3557    IN      A       140.82.121.6

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1) (UDP)
;; WHEN: Thu Mar 30 16:58:04 CEST 2023
;; MSG SIZE  rcvd: 59

Are you implying that you have changed TTLs for domains you are not authoritative for in your installation?

TTLs are controlled by the respective DNS server authoritative for the domain.
The domain's owner sets them to manage load and according to administrative requirements.
Artificially manipulating TTL values within non-authoritative DNS resolvers is generally bad practice.

Like I said - not sure if we are on the same page here.

To my knowledge there is a registrar DNS config and a client-side config.

Typically the registrar-side DNS config consists of an A record with a TTL.
However, this TTL has nothing to do with the TTL that is part of client-side caching of DNS names and IP addresses, or the TTL for the caching done by Pi-hole or systemd-resolved.

Agree?

My initial post was about the latter: Pi-hole's caching of the GitHub API calls.
This doesn't seem to happen, while it does work for other domains.

I'm curious why this is happening.

No, registrars may provide sane defaults and manage them on behalf of a respective domain owner, but it's the domain owner that controls the TTL.

By the defining RFCs, TTLs in DNS specify "the time interval that the resource record may be cached before the source of the information should again be consulted".

No again.
In general, a DNS resolver would cache DNS records at most for the TTL value as provided with the reply before discarding them.
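
To illustrate that rule, here is a minimal sketch of a TTL-bound cache in Python. It is a toy model only, not how pihole-FTL/dnsmasq is implemented; the class name `TtlCache` and its methods are made up for this example.

```python
import time


class TtlCache:
    """Toy DNS-style cache: an entry is usable only until its TTL runs out."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable clock, so expiry can be simulated
        self._entries = {}       # name -> (value, expiry timestamp)

    def put(self, name, value, ttl):
        # Store the record together with the moment it stops being valid.
        self._entries[name] = (value, self._clock() + ttl)

    def get(self, name):
        # Return (value, remaining TTL), or None if missing or expired.
        entry = self._entries.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        remaining = expires_at - self._clock()
        if remaining <= 0:
            del self._entries[name]   # expired: upstream has to be asked again
            return None
        return value, remaining
```

With a 60 second TTL, a lookup after 59 seconds would still be answered from this cache, while a lookup at 60 seconds or later would miss and need a fresh upstream query.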

Your log output above shows that Pi-hole has forwarded DNS requests upstream (to 1.1.1.2, according to the log) at 13:31:23, 13:32:24 and 13:33:24. That observation would be in line with the 60-second TTL as provided by the authoritative DNS servers for api.github.com (and that is also the TTL as provided by 1.1.1.1).
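
As a quick check of that spacing, the gaps between the three forwarded queries can be computed from the log timestamps. This is just a small Python sketch; the helper name `gaps_in_seconds` is made up for illustration.

```python
from datetime import datetime

# Timestamps of the three forwarded api.github.com queries from the log above.
forwarded = ["2023-03-28 13:31:23", "2023-03-28 13:32:24", "2023-03-28 13:33:24"]


def gaps_in_seconds(stamps):
    """Return the gaps, in seconds, between consecutive timestamps."""
    times = sorted(datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in stamps)
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


print(gaps_in_seconds(forwarded))  # [61, 60]
```

Both gaps are about one minute, which is what one would expect if the record were dropped after roughly 60 seconds and then requested again by the client.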

Your dig shows a longer TTL, and your mention of client-side configuration of TTLs makes me wonder whether you have perhaps tampered with TTLs on your machines.
If so, how did you go about configuring deviating TTLs?

That is done with dnsmasq (as part of pihole):
local-ttl=3600
max-ttl=3600
max-cache-ttl=86400
min-cache-ttl=3600
use-stale-cache=3600

Which is then sent to clients as part of the DHCP request (assumption).
And the Home Assistant server is one of those "clients".

I don't know any other way of playing with these settings.

No.
Depending on the option, Pi-hole will alter either the TTL for a cache record or in the DNS replies it supplies to a client, or both.
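
A rough way to picture this is the simplified Python model below. It is based on the option descriptions in the dnsmasq man page as I read them (min-cache-ttl and max-cache-ttl affect what is stored in the cache, max-ttl affects what is handed to clients), and it is an assumption-laden sketch, not Pi-hole's actual code. The function names are invented for this example.

```python
def cached_ttl(upstream_ttl, min_cache_ttl=0, max_cache_ttl=None):
    """TTL an upstream record is stored with (models min-cache-ttl / max-cache-ttl)."""
    ttl = max(upstream_ttl, min_cache_ttl)   # short TTLs are extended
    if max_cache_ttl is not None:
        ttl = min(ttl, max_cache_ttl)        # long TTLs are capped
    return ttl


def reply_ttl(remaining_ttl, max_ttl=None):
    """TTL handed to the client for a cached record (models max-ttl)."""
    return remaining_ttl if max_ttl is None else min(remaining_ttl, max_ttl)


# Values from the configuration quoted above, applied to the 60 s upstream TTL.
stored = cached_ttl(60, min_cache_ttl=3600, max_cache_ttl=86400)
print(stored)                            # 3600
print(reply_ttl(stored, max_ttl=3600))   # 3600
print(cached_ttl(60))                    # 60 (no overrides configured)
```

Under this model, the 60 second upstream TTL would be stretched to an hour, which would fit the 3557 seconds seen in your dig output.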

I wonder what it is you are trying to achieve by your config?

It seems problematic at first glance, as some of the options may contradict others: you are extending TTLs for local records while capping upstream TTLs at a maximum, and you are forcing Pi-hole to serve stale records when Pi-hole is already deprived of knowing whether a cached record is stale for all upstream TTLs smaller than your min-cache-ttl.

It seems counter-intuitive to limit max-ttl to an hour, but force Pi-hole to serve stale data for another hour at the same time. In conjunction with your min-cache-ttl, this is telling clients to come back for fresh data in an hour, but serving them old data for another hour instead.

Furthermore, it would take up to two hours before a change of a local DNS record for a host in your network actually is seen by clients.
You are also making pihole-FTL/dnsmasq caching less efficient, increasing memory usage.
And you run the risk of using incorrect data for public domains, e.g. your clients may fail to contact a stale IP that you forced Pi-hole to serve when the domain owner has already taken down the respective machine for maintenance, potentially up to two hours ago.
Also, in the unfortunate event of manipulated upstream DNS responses (e.g. from a DNS cache poison attempt), you may use forged DNS records even longer.
And you may also thwart DNSSEC validation from succeeding when working with stale data, which would result in a loss of resolution for affected domains - they'd become inaccessible for clients.

Your custom cache and TTL manipulations explicitly force Pi-hole to serve stale data for a full hour. The TTL for stale data is always zero, to encourage clients to come back for accurate current data at their earliest convenience.

I'd strongly recommend reverting all of those configuration options and letting Pi-hole handle cache and TTL by its defaults, in order to have DNS TTLs play out as intended.

You may only want to keep a reasonably low value for local-ttl, for local changes to propagate quickly. Something between a few seconds and a few minutes, perhaps.

Why such low TTL-values?

If IP addresses were to be changed that quickly and frequently, the availability and performance of applications and web services would be horrible, because on every change users would need to reconnect and log on, part of which is (potentially) underlying route (re)discovery.

As far as memory usage goes: currently the system uses less than 2 GB of the available memory. And this is for the OS and 5 Docker containers (one of which is Pi-hole). So I guess this is not an issue.

As explained:

This is not about TTLs from public resolution.
local-ttl controls TTL of the values that Pi-hole is administering, e.g. Local DNS records or DHCP lease hostnames (if Pi-hole is acting as DHCP server).

Likely, you wouldn't want clients to continue with outdated information and to wait for an hour or two before clients would re-request resolution of a local DNS record that has been changed by you in the meantime.

In the past, for example, this would also have affected domains that you had blocked manually in Pi-hole (since Pi-hole's September 2021 release, Pi-hole uses a separate TTL for blocked domains).

Note that pihole-FTL/dnsmasq defaults local-ttl to zero:

local-ttl=<time>
When replying with information from /etc/hosts or configuration or the DHCP leases file dnsmasq by default sets the time-to-live field to zero, meaning that the requester should not itself cache the information. This is the correct thing to do in almost all situations. This option allows a time-to-live (in seconds) to be given for these replies. This will reduce the load on the server at the expense of clients using stale data under some circumstances.

My suggestion of a few seconds to a few minutes would try to strike a compromise here.

My recommendation was to keep only local-ttl, and remove the other four cache and ttl related options from your configuration.

You keep reasoning from a DNS perspective. And with the (hidden?) assumption that lots of IP addresses are changing every few minutes (seconds?).

However, if this were the case IRL, one could expect a horrible user experience working in such an application environment, which can never be compensated for by low TTL values.

The same applies to failover delays because of high(er) TTL values: if this is really a pain, then it is probably because failovers are happening a lot. If so, then most likely, something else is not working as expected. In which case even 5 seconds is too long.

Perhaps this is one of those cases where "let's agree to disagree" is a good fit? :)
