Prefetch Popular Domains to Improve Cached Response Frequency

tl;dr: prefetching could help increase Pi-hole cache utilization, and improve overall network speed at the cost of increased outbound DNS queries

Prefetching is a common DNS practice designed to increase cached response rate for DNS servers, as cached responses are typically faster than forwarded responses - especially for users who utilize a recursive DNS resolver like Unbound.

For the uninitiated, prefetching is the act of preemptively querying domains prior to their TTL expiring.

Functionally, from what I've found, there are two common methods for prefetching domains at the DNS server level:

Top X Prefetching - where the top X domains are prefetched as long as they have been queried within the last Y minutes. In the paper linked, the researchers found a "sweet spot" somewhere around the top 20 domains for a home network. For personal reference, looking at my network the top 20 non-blocked domains represent roughly 50% of all non-blocked queries. This style of prefetching is best for smaller networks where the number of unique devices accessing the network is relatively low, but ultimately results in significantly more outbound queries. In the study linked, the number of outbound queries for popular domains on home networks increased by a factor of ~5-150. When properly tuned with a good cutoff algorithm, this factor was capped at <10. Since Pi-hole actively tracks top domains (assuming your privacy settings are set to allow this), it's not unreasonable to imagine a system where this data is applied for prefetching.

TTL EOL Prefetching - as used by Unbound, where a domain is prefetched in response to any query for it occurring within the last 10% of the cached TTL. This approach does not require a top list per se, but commonly prefetches domains queried more often than their respective TTL time. The upper limit on additional outbound queries is 10%, but only reaches this limit on networks where popular domains are queried frequently enough by unique users. The downside of this approach is that if you have a limited number of devices, or different browsing habits amongst members of your household, devices may preserve a DNS response in their cache until the TTL expires, so prefetching may never occur.

The purpose of this post is to request implementation of a prefetching system on Pi-hole to improve network performance, and to start a dialog around my personal assertion that the optimal route would be to allow users to toggle both Top X Prefetching and/or TTL EOL Prefetching depending on their personal network configuration and added load tolerance.

Implementation of prefetching, especially TTL EOL Prefetching designed to mimic Unbound prefetching, could resolve the main issue driving users to modify cache size on Pi-hole to 0, against developer recommendation, as Unbound would now be receiving those near-EOL requests typically absorbed by Pi-hole. Everyone wins.

Additional thoughts on potential configuration parameters:

Top X Prefetching

  • # of domains - how many of the top are maintained prefetched (default ~20)
  • cutoff time - how long between organic queries before top domain is removed (temporarily) from prefetching queue (default ~600s)
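To make the two Top X parameters concrete, here is a minimal sketch in Python. It is not Pi-hole code: the function name, the `(timestamp, domain)` log format and the parameter names are all assumptions made for illustration.

```python
from collections import Counter


def select_prefetch_domains(query_log, now, top_n=20, cutoff_seconds=600):
    """Pick the domains a Top X prefetcher would keep warm.

    query_log is an iterable of (timestamp, domain) pairs for organic
    (client-initiated) queries; this format is a hypothetical stand-in
    for whatever statistics the DNS server already keeps.
    """
    counts = Counter()
    last_seen = {}
    for timestamp, domain in query_log:
        counts[domain] += 1
        last_seen[domain] = max(timestamp, last_seen.get(domain, timestamp))

    # Rank by popularity first, then drop (temporarily) any top domain
    # that has not been queried organically within the cutoff time.
    ranked = [domain for domain, _ in counts.most_common(top_n)]
    return [d for d in ranked if now - last_seen[d] <= cutoff_seconds]
```

A domain that falls out because of the cutoff comes back on its own as soon as a client queries it again, since the selection is recomputed from the log each time.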

TTL EOL Prefetching

  • % of TTL Cutoff - queries in the final % of TTL trigger prefetching (default ~10% to match Unbound)
  • global TTL response scale - a global scaling factor (0-1) applied to all received TTLs forwarded to clients (default ~0.95-1). Scaling all TTLs by something like 0.95 could further increase the effectiveness of this prefetching approach, as device caches would expire just in time to trigger a cached response and signal Pi-hole to prefetch the domain. This scaling would be especially beneficial on smaller networks with fewer unique client requests.
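The two TTL EOL parameters could be sketched roughly as follows, in Python. This is only an illustration of the idea described above, not Unbound's or Pi-hole's actual code; both function names and their arguments are hypothetical.

```python
def should_prefetch(ttl_original, ttl_remaining, cutoff_fraction=0.10):
    """Return True when a cache hit lands in the final part of the TTL.

    ttl_original is the TTL received from upstream, ttl_remaining is what
    is left of it when the client query arrives (both in seconds).
    """
    return 0 < ttl_remaining <= ttl_original * cutoff_fraction


def scaled_client_ttl(ttl_remaining, scale=0.95):
    """TTL handed to the client after applying the global scaling factor.

    Clients then expire their copy slightly early, so their next query
    is more likely to arrive while the server-side entry is near its end.
    """
    return max(1, round(ttl_remaining * scale))
```

With a 300 s record, a query arriving with 25 s left would trigger a prefetch under the 10% default, while one arriving with 31 s left would simply be answered from cache.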

Thoughts?

Use unbound.

Second the unbound recommendation. Prefetches nicely and has an efficient cache as well.

We should change Pi-hole so people won't do things they probably shouldn't be doing in the first place?

Per DL6ER: Caching in Pi-hole inhibits Unbound's prefetching algorithm.

Prefetching in Pi-hole would both support DNSSEC validation (because you wouldn't be disabling the cache) and fix the issue (caching for full TTL duration) that causes prefetching limitations.

Additionally, as proposed, there's more than one way to prefetch, and the first algorithm mentioned is likely favorable in smaller home networks where queries to popular sites aren't as statistically likely to hit during the final 10% of TTL.

Plus, pre-fetching in Pi-hole would benefit everyone, not just those who use Unbound.

Quite the jump to go from "might not have an effect" to changing his words to make it an absolute.

Give us some actual numbers on what kind of benefit this gives and you'll stand a better chance of getting it. Anecdotes and "It feels faster" won't do much.

Happy to get some numbers comparing prefetching stats

As for twisting DL6ER's words, that was not my intent, which is why I directly quoted him - can't get any less twisted than verbatim. But the specific phrasing of his statement was a bit hard to follow - when he says "prefetching might not have an effect", what he's saying is "if the algorithm follows what is stated in the mail thread, prefetching may not happen at all if Pi-hole caches queries". Maybe inhibit was the wrong word choice on my part, as it is vague and could be interpreted as completely prevents, but my intent was to say that it hinders the algorithm stated by the Unbound developers. There may be more going on behind the scenes, so it's not totally blocked, but you will get fewer <10% TTL queries if Pi-hole holds the query for the entire TTL duration.

Maybe ask him to clarify instead of assuming you know what he meant?

@DL6ER could you clarify what you meant please?

I guess you have read the whole topic you linked above? I also thought that unbound might not prefetch if I leave Pi-hole's cache enabled (as I understood the explanation in the mail thread), so I just went and looked at the unbound stats: a lot of prefetching happened despite Pi-hole's cache being enabled. My conclusion was that the description in the mail is not accurate (any more?).

I have been following a long time in read-only mode before deciding to create an account here to gain read-write access. I also read the linked article about the addition of CACHE_SIZE to setupVars.conf and I agree with @yubiuser that @DL6ER's message was quite clear. It was just quoted out of context (by quoting only parts of it).

When I first read this other discussion two months ago, I was confused. I am, still.

  • Does unbound do prefetching? This is what @yubiuser suggests
  • What would be the benefit of adding (another layer of) prefetching to Pi-hole?
    (I write "another layer" but this is obviously only meant for the unbound users)

Pi-hole already caches domains. There is no upstream query performed for a typically long time. When a domain is requested after the TTL expired, the domain is requested upstream once and then again cached for the entire TTL.

I see only two effects of implementing prefetching in Pi-hole:

  1. PRO: The delay for the queries (every hour?) is decreased by some 10-100 milliseconds.
  2. CON: Pi-hole has to implement its own algorithm to decide what is a "hot" domain and what is a "cold" domain. It may be how often the domain is queried in total (> 2 per minute) or relative to others (> 2% of total number of queries), or any number of other criteria...
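To illustrate how small such a hot/cold decision could be, here is a sketch in Python using the two example thresholds from the list above. The function and its parameters are hypothetical, and the thresholds are only the ones floated here, not tuned values.

```python
def is_hot(domain_count, total_count, window_minutes,
           min_per_minute=2.0, min_share=0.02):
    """Classify a domain as "hot" by absolute rate or by share of queries.

    domain_count: queries for this domain in the observation window
    total_count: all queries in the same window
    window_minutes: length of the window in minutes
    """
    per_minute = domain_count / window_minutes
    share = domain_count / total_count if total_count else 0.0
    return per_minute > min_per_minute or share > min_share
```

Either criterion alone marks a domain as hot here; requiring both would be the stricter variant.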

If you add it, can we have an option to control the algorithm used here? :slight_smile: Maybe use the same option to provide a way to disable it.

As I see it, the terms used to distinguish prefetching (Top X and TTL EOL) do not define different approaches of prefetching at all.

The authors of the quoted study "Accelerating Last-Mile Web Performance with Popularity-Based Prefetching" seem to use a time-based cache eviction strategy similar to unbound's; they may just choose a different set of input variables for their parameters, i.e. an arbitrary threshold instead of TTL. Likewise, unbound has to decide which entries get evicted from its cache once it hits the cache size limit, which again is an arbitrary value of 20 in the study, based on heuristics from some small data sets.

More importantly, the study does combine DNS prefetching with TCP connection caching for HTTP, collocating them on the same router, calling this combination "popularity-based prefetching".
It does so in order to "mitigate latency bottlenecks in the last mile", but falls short in providing actual latency numbers. Instead, it solely relies on separate figures for DNS and TCP cache hit ratio improvements.
This makes it difficult to assess both its overall benefit and the contribution of DNS, TCP and the effect of same-device collocation towards that total.
Furthermore, the study doesn't detail the traffic structure (remote vs. local), which would have an impact on the latency incurred on cache misses.

So I can only guess here: Based on the fact that DNS makes up a very small fraction of a network's data traffic, I'd expect the major benefit of the study's proposal to be attributable to reuse of TCP connections, with a significantly smaller contribution by DNS, and maybe some effect of collocating those on the same device.

As Pi-hole is not involved in HTTP traffic (or any traffic other than DNS), the benefit of DNS prefetching, according to the study, would be to raise cache hit ratio from 15% to 50% while increasing the number of DNS requests tenfold (if optimised).

If you had to pay for each lookup, this would be an ineffective cost driver.
For every 1,000 DNS requests, you'd pay for:
1,000 x (100% - 15%) = 850 lookups without prefetching
1,000 x 50% x 10 = 5,000 lookups with prefetching
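The same lookup count, written out as a small Python sketch. The function name and its arguments are made up for illustration; the formula is just the one used in the two lines above, with the tenfold increase passed in as an amplification factor.

```python
def upstream_lookups(requests, cache_hit_percent, amplification=1):
    """Upstream lookups paid for, using integer percentages.

    requests: client DNS requests
    cache_hit_percent: share answered from cache, in percent
    amplification: factor by which prefetching multiplies upstream queries
    """
    return requests * (100 - cache_hit_percent) * amplification // 100


without_prefetch = upstream_lookups(1000, 15)                 # 850
with_prefetch = upstream_lookups(1000, 50, amplification=10)  # 5000
print(without_prefetch, with_prefetch)
```

Under these assumptions, prefetching costs a bit under six times as many paid lookups per 1,000 client requests.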

The benefit would be that you incur a higher latency less often with prefetching enabled, affecting your average "latency" as follows (assuming 1 ms for cache and 50ms for forwards) :
(15% x 1) + (85% x 50) = 42ms
(50% x 1) + (50% x 50) = 26ms
That's an advantage of 16ms on average, occurring once every 43 seconds or so (based on the study's average number of DNS lookups per day).
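As a quick check of these averages, here is a minimal Python sketch. The function names are hypothetical, the 1 ms and 50 ms latencies are the same assumed values as above, and the 2,000 queries per day is the approximate daily average attributed to the study elsewhere in this thread.

```python
def mean_latency_ms(cache_hit_ratio, cache_ms=1.0, forward_ms=50.0):
    """Average per-query latency for a given cache hit ratio (0..1)."""
    return cache_hit_ratio * cache_ms + (1 - cache_hit_ratio) * forward_ms


def seconds_between_queries(queries_per_day):
    """Average spacing between queries over a 24-hour day."""
    return 24 * 60 * 60 / queries_per_day


baseline = mean_latency_ms(0.15)    # 15% cached, 85% forwarded
prefetched = mean_latency_ms(0.50)  # 50% cached, 50% forwarded
print(baseline, prefetched, baseline - prefetched)
print(seconds_between_queries(2000))
```

Unrounded, this gives about 42.65 ms and 25.5 ms, a gap of roughly 17 ms, so the figures quoted above are rounded.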

(Note that different metrics would apply to unbound, as a full recursion may take significantly longer than a straight DNS lookup. Prefetching therefore would seem more beneficial to unbound, so it's not surprising it actually can be configured for it.)

Of course, this ignores the unavoidable penalty of the first lookup for a non-cached entry, applicable in both scenarios, and any average contemplation displaces minimum and maximum observed values as well as their frequencies.
I acknowledge there is the occasional DNS resolution that takes a rather long time. I do not have any long-term data here; my own last 24 hours show 0.5% of queries taking a second and more.
It would take a more detailed study to verify why those occur, and whether they would typically be queried repeatedly beyond their cache expiration time to benefit from prefetching.

By and large, I doubt that a user will note a difference at all.

In case latency gains of that order are really important for someone, you should start by optimising it for the bulk of your network traffic, not the tiny fraction that DNS accounts for.

Your math is spot on, but it wouldn't be every three minutes, it would apply to every query. By default (anecdotally, this roughly matches my statistics with Pi-hole), ~15% of all requests are cached and ~85% are forwarded. Prefetching could make this closer to a 50/50 split for ALL QUERIES, not just unique ones. Based on your estimates, for normal users (not using Unbound) that's a 38% decrease in average DNS latency, not just unique DNS latency - a non-trivial improvement.

"Every 3 minutes" bit would be more applicable to Unbound, because forwarded domains could still hit Unbound's cache, but unique domains would be more likely to trigger a recursive lookup which would, as stated, have a much longer response time.

I'm gathering data now, which will take some time. Preliminarily, Unbound still appears to do some prefetching, regardless of Pi-hole's caching, as others have stated, but time will tell if it is done to the same extent when Pi-hole caching affects its internal statistic gathering.

Can you point me to where response time data is stored? I'd also like to leverage this for a comparison.

Thanks, you are right, I picked the wrong numbers as baseline there.
I'll rework that bit. :wink:
EDIT: The study lists just below 2,000 DNS queries a day on average - one query every 43 seconds.

That's more around the 20% mark for me. I suspect that to be highly individual - and quite volatile, too.

AFAIAAO, response times are an in-memory only issue.
You can query those values over Pi-hole's Telnet API. You are looking for the last column of the getallqueries result set. Note that these values lose the decimal fraction by the query returning only the integer part.

EDIT: I just realised Pi-hole 5 adds another two columns to that output.

I've added an example to clarify output structure, with response times of 14.2 ms:

timestamp | query type | domain | client | status type | DNSSEC | reply type | delay | CNAME domain | RegEx#
1593693424 | AAAA | fonts.gstatic.com | 192.168.1.42 | 1 | 0 | 3 | 142 | N/A | -1
1593695428 | A | flurry.com | smartphone.lan | 1 | 0 | 4 | 142 | N/A | -1
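For a comparison, the delay column could be pulled out of such rows with something like the Python sketch below. It assumes each raw row arrives as whitespace-separated fields in the column order of the example, and that the delay is stored in tenths of a millisecond (inferred from 142 corresponding to 14.2 ms). The function name and the returned keys are made up, and fetching the rows from the API is left out.

```python
def parse_query_line(line):
    """Split one getallqueries-style row and convert the delay to ms.

    Assumed column order: timestamp, query type, domain, client,
    status type, DNSSEC, reply type, delay, CNAME domain, RegEx#.
    """
    fields = line.split()
    return {
        "timestamp": int(fields[0]),
        "query_type": fields[1],
        "domain": fields[2],
        "client": fields[3],
        # Assumption: delay is reported in units of 0.1 ms.
        "delay_ms": int(fields[7]) / 10,
    }


row = parse_query_line(
    "1593693424 AAAA fonts.gstatic.com 192.168.1.42 1 0 3 142 N/A -1"
)
print(row["domain"], row["delay_ms"])
```

Collecting `delay_ms` over a day, split by status type, would give the cached-versus-forwarded response time comparison discussed above.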

Yes.

I'm working to gather data to indicate whether or not this is the case, but prefetching at the Pi-hole level could actually trigger additional prefetching (in a good way) at the Unbound level - based on the publicly shared Unbound prefetching algorithm. This assumes Pi-hole's algorithm functionally mimics Unbound's, and caching is done at the Pi-hole. If Pi-hole's algorithm differs from Unbound's, in a complementary fashion, it's possible that it could further increase the overall cached response rate.

Additionally, you highlighted the other obvious benefit - non-Unbound users would be able to capitalize on a DNS prefetching on their Pi-hole.

  1. Does your network only see one query per hour? If not, better cache return ratios are likely to be more frequent than hourly... See Bucking_Horn's math below your comment for an estimate on the impact to the average user. I'm still gathering my own statistics, but using an existing post from sawsanders as a reference, he had 13137 recursive look-ups in a 6-day period, or ~90 recursive look-ups per hour (assuming they were evenly distributed over a 24 hour period, and not preferentially happening during hours where people are actually using the network). For me, at least, an unbound recursive look-up takes somewhere between 100 and 1000 ms. Anything done to reduce the number of recursive lookup responses in a day is beneficial. A 1s lookup has a noticeable impact on page loading.

  2. Yes, the "con" to every feature request is that it will require development. That's the basic premise of a feature request. The beautiful thing is that once the basic prefetching code is in place, algorithms can be tuned to optimize prefetching over time. The system can start with a basic pre-fetching system like either or both of the algorithms I suggested in my first post, but over time can evolve. This is very much not the scope of the request today, but if a feature like this were to be implemented and allowed to evolve over time, prefetching would be a perfect candidate for a machine learning algorithm - they're designed for pattern recognition and prediction.

Abstract—An increasingly popular technique for decreasing user-perceived latency while browsing the Web is to optimistically pre-resolve (or prefetch) domain name resolutions. In this paper, we present a large-scale evaluation of this practice using data collected over the span of several months, and show that it leads to noticeable increases in load on name servers—with questionable caching benefits. Furthermore, to assess the impact that prefetching can have on the deployment of security extensions to DNS (DNSSEC), we use a custom-built cache simulator to perform trace-based simulations using millions of DNS requests and responses collected campus-wide. We also show that the adoption of domain name prefetching raises privacy issues. Specifically, we examine how prefetching amplifies information disclosure attacks to the point where it is possible to infer the context of searches issued by clients.

See section V.A (Results)

@DanSchaper Speaking of taking things out of context - this article is about browser hyperlink prefetching:

The soup du jour for decreasing user perceived latency is to optimize the use of the domain name system by pre-resolving (or prefetching) names in hyperlinks. Since DNS is responsible for translating human-readable names into IP addresses, nearly every initial visit to a website involves a name resolution. Thus, by proactively resolving hyperlinks in pages a user visits, the sites being referred to can be immediately contacted if, and when, the user decides to click on one of the links.

Pi-hole does not have exposure to hyperlinks, and thus is not impacted by the privacy concerns discussed in this article. No additional data would be exposed, and the prefetched queries at the Pi-hole level would be for domains that you actually want to connect to, not just every domain linked on a given search page.

Recall that our main goal is to study the effects of browser-based DNS pre-resolution.

So you didn't read it?

To summarize - Pi-hole code should be changed to do pre-fetching because it may offer a very small speed benefit, regardless of the impact on any of the nameservers?

If you are using a third-party DNS, query times are typically on the order of tens of milliseconds. Saving a few milliseconds or tens of milliseconds will have zero impact on the performance of any apps. I don't think I see the benefit here, other than it may look cool to have "better" cache usage stats shown on your dashboard. What am I missing?

I don't think anyone said regardless of the impact on the nameservers - that would naturally be a factor in determining the optimal prefetching algorithm. It is true by the very nature of prefetching that there will be an increase in DNS queries in order to optimize network performance. You may have just made a good point, though, for limiting the amount of user control over their personal settings. Without some amount of testing and QA, people could wind up over-aggressively prefetching.

So prefetch once the TTL expires

My point is that it's not worth implementing this at all if there are no significant benefits. A few msec here and there...