Last night my internet was going up and down for hours. All of my Google devices started to go berserk trying to reach DNS, which caused them to be rate limited. This caused them to try even harder, which made things worse. Even after my internet settled down, the devices continued to be rate limited due to their continued pounding on the Pi-hole. I wonder if it would be better if the rate limiter allowed a small percentage of the requests made by a rate-limited device to be resolved? This might help break this kind of vicious cycle. Allowing a certain number of consecutive requests to go through might be enough to let the device start operating normally again.
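To make the idea concrete, here is a minimal Python sketch of a per-client limiter that still answers a small share of over-limit queries. It is not how Pi-hole's rate limiter is implemented: the class name `PassThroughRateLimiter`, the fixed-window scheme, and the `limit`, `window_s`, and `pass_every` parameters are all hypothetical and only illustrate the suggestion.

```python
from collections import defaultdict


class PassThroughRateLimiter:
    """Hypothetical fixed-window limiter that lets every Nth over-limit query through."""

    def __init__(self, limit=1000, window_s=60.0, pass_every=20):
        self.limit = limit            # queries allowed per client per window
        self.window_s = window_s      # window length in seconds
        self.pass_every = pass_every  # answer 1 in N queries once over the limit
        self._window_start = {}       # client -> start time of its current window
        self._count = defaultdict(int)  # client -> queries seen in current window

    def allow(self, client, now):
        """Return True if this query from `client` at time `now` should be answered."""
        start = self._window_start.get(client)
        if start is None or now - start >= self.window_s:
            # First query from this client, or its window expired: start afresh.
            self._window_start[client] = now
            self._count[client] = 0
        self._count[client] += 1
        n = self._count[client]
        if n <= self.limit:
            return True
        # Over the limit: still answer every pass_every-th query so the
        # client has a chance to recover instead of retrying forever.
        return (n - self.limit) % self.pass_every == 0


# Small numbers so the behaviour is easy to follow.
limiter = PassThroughRateLimiter(limit=5, window_s=60.0, pass_every=4)
results = [limiter.allow("192.168.1.50", now=0.0) for _ in range(13)]
print(results)
```

With these toy settings the first five queries are answered, and after that one in four still gets through until the window resets. Whether that trickle is enough to calm a misbehaving device is an open question.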
Thanks for your report. We are discussing this internally as we see the limitations of the current rate limit design.
Do you know if it was always the same domain they requested, or were different domains involved?
The biggest domain by far was connectivitycheck.gstatic.com. Over 13,000 hits in 15 minutes. There were a few others, but they were outnumbered by 12 to 1.
I'm not observing that behaviour in my network, but very likely, that's because I completely mock connectivity checks in my setup.
Still, I wonder if Google's developers wouldn't do better than having devices hammer a DNS server for replies in such short succession.
In searching for a possible explanation (and for potential means of mitigation), could you please check your Pi-hole's dnsmasq configuration for your current local-ttl?
Since Pi-hole FTL v5.10.1, that defaults to zero if it hasn't been set explicitly.
I'm not entirely sure whether that would also apply to REFUSED replies you are likely seeing during outages, when Pi-hole isn't able to contact any upstreams.
But a client that expects a potentially different reply for the same domain once its TTL has expired may well decide to repeat its request immediately when it sees a zero TTL value.
You could try to set an explicit local-ttl value by creating a custom configuration with the following content (e.g. /etc/dnsmasq.d/99-local-ttl.conf):
local-ttl=2
Of course, you'd need to wait for clients to start requesting resolution excessively again before you could verify whether this has any effect on what you observed.
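As a rough illustration of why the TTL might matter, the following Python sketch estimates an upper bound on repeat queries for a single name from a single client. It rests on assumptions that are not confirmed here: that the client caches a reply for its TTL, asks again as soon as the TTL expires, and otherwise retries at some fixed minimum interval. The function name and the 0.07 s retry interval are hypothetical; the interval is only chosen to be in the region of the roughly 14 queries per second reported above.

```python
def queries_in_window(window_s, ttl_s, min_interval_s):
    """Upper bound on repeat queries for one name by one client in window_s seconds.

    Assumes the client re-queries as soon as the cached reply expires, but
    never faster than once per min_interval_s (its own retry pacing).
    """
    if min_interval_s <= 0:
        raise ValueError("min_interval_s must be positive")
    interval = max(ttl_s, min_interval_s)
    return int(window_s / interval)


window = 15 * 60  # the 15-minute period mentioned in the report

zero_ttl = queries_in_window(window, ttl_s=0, min_interval_s=0.07)
two_second_ttl = queries_in_window(window, ttl_s=2, min_interval_s=0.07)
print(zero_ttl, two_second_ttl)
```

Under these assumptions, a TTL of 2 seconds caps a well-behaved client at a few hundred queries in 15 minutes rather than many thousands. A client that ignores the TTL, or a reply that carries no TTL the client honours, would not be slowed down this way.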
When everything went crazy, my modem signal levels were dropping so low that data was only making it through occasionally. It seems like temperature and moisture were screwing with the signal. This made for a really messy set of dropouts over several hours that morning.
I checked for a local-ttl setting and found none, so I created the 99 file and restarted FTL. Murphy's law says that if I need the network to drop out, it'll be golden instead. I'll keep an eye out for a recurrence.