With a validating recursive resolver, lookups can be expected to take longer on average than with a caching DNS server.
This is for two reasons: recursion itself takes longer, as it involves querying multiple authoritative DNS servers, and DNSSEC validation adds to that.
Of course, unbound caches DNS replies for as long as a domain's TTL allows, serving cached records almost instantaneously.
Sporadic long resolution times may (re)occur when a cached reply's TTL expires, as unbound has to partially, or sometimes completely, rewalk the recursion chain.
The time required for that refresh depends on how many of the domains along the recursion chain are still cached.
The refresh takes longer the more domains have to be requested again. If any of the involved authoritative servers is slow to respond, or even unresponsive due to heavy load, reply times may increase further, as unbound may have to wait for that server and potentially repeat its request.
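To illustrate the caching behaviour described above, here is a minimal Python sketch of a TTL cache. It is a toy model, not how unbound is implemented: `TtlCache`, `fake_recursion` and the injected clock are hypothetical names, a single dictionary stands in for the whole cache (unbound caches the records and delegations along the chain separately), and the resolve function stands in for a full recursive lookup.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class CacheEntry:
    answer: str
    expires_at: float  # absolute time at which the cached TTL runs out


class TtlCache:
    """Toy resolver cache (hypothetical): lookups within the TTL are served
    from memory, anything else triggers a new lookup via `resolve`."""

    def __init__(self, resolve: Callable[[str], Tuple[str, int]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._resolve = resolve  # stand-in for full recursion; returns (answer, ttl_seconds)
        self._clock = clock
        self._entries: Dict[str, CacheEntry] = {}

    def lookup(self, name: str) -> Tuple[str, bool]:
        """Return (answer, served_from_cache)."""
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and now < entry.expires_at:
            return entry.answer, True  # fresh cache hit: near-instant
        answer, ttl = self._resolve(name)  # miss or expired: recurse again
        self._entries[name] = CacheEntry(answer, now + ttl)
        return answer, False


# Usage with a fake clock, so no real DNS traffic is involved.
fake_now = [0.0]
recursions: List[str] = []


def fake_recursion(name: str) -> Tuple[str, int]:
    recursions.append(name)
    return "192.0.2.1", 300  # documentation address, 300 s TTL


cache = TtlCache(fake_recursion, clock=lambda: fake_now[0])
print(cache.lookup("example.com"))  # ('192.0.2.1', False)  first lookup recurses
fake_now[0] = 100.0
print(cache.lookup("example.com"))  # ('192.0.2.1', True)   still within TTL
fake_now[0] = 301.0
print(cache.lookup("example.com"))  # ('192.0.2.1', False)  TTL expired, recurses again
print(len(recursions))              # 2
```

Only the first lookup and the one after expiry pay the recursion cost; in a real resolver, that cost is what varies with how much of the chain is still cached and how quickly the authoritative servers respond.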
The above should explain why a recursive resolver may take longer to resolve domains, and may sometimes fail to respond in time.
With your unbound 1.20.0, you should be able to mitigate this.
Starting with version 1.11.0, unbound supports serving expired records according to RFC 8767: if recursion does not complete within a certain time, it may answer with an expired record instead, in an attempt to avoid client-side timeouts (see also Serving Stale Data in the Unbound 1.21.0 documentation).
As the expired reply may be incorrect (e.g. a domain's IP address may have changed in the meantime), it is served with a short TTL of 30 seconds. This in turn should prompt the client to send a new request after 30 seconds, by which time unbound should have completed its recursion and can serve a current reply.
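The serve-expired decision can be sketched in Python roughly as follows. This is a simplified, hypothetical model of the behaviour described above, not unbound's actual code: `choose_reply`, `CachedRecord` and `Reply` are illustrative names, the constants mirror the configuration suggested below (86400 seconds, 1500 milliseconds) and the 30 second TTL on stale replies, and the background refresh that unbound continues after serving a stale reply is not modelled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed values, mirroring the configuration suggested in this post.
SERVE_EXPIRED_TTL = 86_400   # serve-expired-ttl: seconds past expiry an answer may still be served
CLIENT_TIMEOUT = 1.5         # serve-expired-client-timeout: 1500 ms, expressed in seconds
STALE_REPLY_TTL = 30         # TTL attached to a stale answer


@dataclass
class CachedRecord:
    answer: str
    ttl: int          # TTL the record had when it was cached, in seconds
    cached_at: float  # time at which the record entered the cache


@dataclass
class Reply:
    answer: str
    ttl: int
    stale: bool
    sent_after: float  # seconds the client waited for this reply


def choose_reply(cached: Optional[CachedRecord], now: float,
                 recursion_seconds: Optional[float],
                 fresh: Tuple[str, int]) -> Optional[Reply]:
    """Hypothetical serve-expired decision for a single query.

    recursion_seconds: how long a full recursive lookup would take (None if it never completes).
    fresh: the (answer, ttl) pair that lookup would return.
    """
    usable_stale: Optional[CachedRecord] = None
    if cached is not None:
        expires_at = cached.cached_at + cached.ttl
        if now < expires_at:
            # Still within TTL: answer straight from the cache with the remaining TTL.
            return Reply(cached.answer, int(expires_at - now), stale=False, sent_after=0.0)
        if now < expires_at + SERVE_EXPIRED_TTL:
            usable_stale = cached  # expired, but still within serve-expired-ttl

    if recursion_seconds is not None and recursion_seconds <= CLIENT_TIMEOUT:
        # Recursion finished in time: the client gets the current answer.
        return Reply(fresh[0], fresh[1], stale=False, sent_after=recursion_seconds)
    if usable_stale is not None:
        # Recursion is too slow: fall back to the expired answer with a short TTL.
        return Reply(usable_stale.answer, STALE_REPLY_TTL, stale=True, sent_after=CLIENT_TIMEOUT)
    if recursion_seconds is not None:
        # Nothing usable in the cache: the client has to wait for the slow recursion.
        return Reply(fresh[0], fresh[1], stale=False, sent_after=recursion_seconds)
    return None  # no answer at all; the client will eventually time out


old = CachedRecord("192.0.2.1", ttl=300, cached_at=0.0)
print(choose_reply(old, now=100.0, recursion_seconds=4.0, fresh=("192.0.2.7", 300)))
# Reply(answer='192.0.2.1', ttl=200, stale=False, sent_after=0.0)
print(choose_reply(old, now=1000.0, recursion_seconds=4.0, fresh=("192.0.2.7", 300)))
# Reply(answer='192.0.2.1', ttl=30, stale=True, sent_after=1.5)
print(choose_reply(old, now=1000.0, recursion_seconds=0.2, fresh=("192.0.2.7", 300)))
# Reply(answer='192.0.2.7', ttl=300, stale=False, sent_after=0.2)
```

The second call shows the case this feature targets: a slow recursion no longer keeps the client waiting beyond the client timeout, at the price of a possibly outdated answer that expires quickly.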
You should be able to add the following lines to your unbound configuration:
server:
    serve-expired: yes
    serve-expired-ttl: 86400  # serve expired replies for at most one day past their expiry, in seconds
    serve-expired-client-timeout: 1500  # consider serving expired replies when resolution takes longer than 1.5 seconds, in milliseconds
You only need to add the three serve-expired* options; the initial server: line above is included just to help you find the section where those options belong. It should already be present in your configuration.
Afterwards, you'd need to restart unbound (sudo service unbound restart).
This would not avoid longer response times altogether, but it would have unbound provide a reply within about 1.5 seconds, as long as an expired record for the requested domain is still cached.
You may want to tune that value further if your clients still time out before that.