Consider making NODATA the default blocking strategy

Pi-hole supports multiple blocking modes, of which NULL is the current default. I would like to ask that the default be changed to NODATA, given some recently discovered issues with the NULL strategy in browsers on macOS:

It appears that Chrome, its derivatives (Edge, Brave, ...) and possibly Firefox on macOS Big Sur have an issue where connections to 0.0.0.0 don't fail immediately, but run into timeouts. This can make Chrome extremely sluggish and cause other odd symptoms (video conferences start to lag and desync, typing into input fields has a noticeable one-second lag, ...).

Chrome is looking into improving things on their end (see above linked Chromium issue report) and more advanced users can mitigate this by manually configuring their Pi-hole to use the NODATA strategy, but I figured that maybe this is a good time to switch to the NODATA strategy by default. Even as an IT professional it took me weeks to finally make the connection between "My Chrome browser has a sluggish UI and bad lagging in video conferences" and "Pi-hole serves 0.0.0.0 for blocked domains, which triggers some network code issue". So even if Pi-hole is not at fault, maybe we can improve the experience for users by changing something on our side here.
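For anyone who wants to try the workaround, here is a minimal sketch of the config change. It assumes the v5-style `/etc/pihole/pihole-FTL.conf` file with a `BLOCKINGMODE` key as described in the Pi-hole docs; the helper name `set_blocking_mode` is made up for illustration, and it only transforms the file's text rather than writing to disk.

```python
import re

# Mode names as listed in the Pi-hole blocking-mode docs (assumed spelling).
VALID_MODES = {"NULL", "IP-NODATA-AAAA", "IP", "NXDOMAIN", "NODATA"}

# Matches an existing BLOCKINGMODE line, one line at a time.
_MODE_LINE = re.compile(r"^[ \t]*BLOCKINGMODE[ \t]*=.*$", re.MULTILINE)


def set_blocking_mode(conf_text: str, mode: str) -> str:
    """Return conf_text with BLOCKINGMODE set to `mode` (hypothetical helper)."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown blocking mode: {mode!r}")
    new_line = f"BLOCKINGMODE={mode}"
    if _MODE_LINE.search(conf_text):
        # Replace the existing setting in place.
        return _MODE_LINE.sub(new_line, conf_text)
    # No setting yet: append one, keeping the file newline-terminated.
    sep = "" if conf_text == "" or conf_text.endswith("\n") else "\n"
    return f"{conf_text}{sep}{new_line}\n"
```

After writing the result back to the file, FTL has to re-read its configuration before the new mode takes effect.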

I'm copying some additional thoughts from a Chromium engineer here from the issue linked above: "My suggestion for ideal blocking strategy would be NODATA with a TTL provided via SOA record in the Authority section and a relevant EDE (see RFC 8914). Creates a cacheable response while being mostly honest with the client about the situation rather than completely forging false results (REFUSED would be more honest than NODATA but it's not cacheable and it sounds like they're sensitive to that). But the caveat to my suggestion is that I'm biased due to knowing that Chrome can handle and cache such a response, but maybe the pi-hole peeps have experience with DNS clients that handle things poorly."

The Pi-hole docs about the blocking modes already state that NXDOMAIN and NODATA are roughly equivalent in their pros and cons compared with the NULL strategy (with NODATA having a slight edge in terms of compatibility). Maybe this is enough to tip it over the edge and make it the default?
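To make the difference between the three modes discussed here concrete, this is a small model of what a client sees for a blocked domain in each mode, as I understand the blocking-mode docs. It is a simplified sketch, not FTL's implementation: the `Reply` type and `blocked_reply` function are invented for illustration, and only A and AAAA queries are covered.

```python
from dataclasses import dataclass, field


@dataclass
class Reply:
    rcode: str                                    # DNS response code
    answers: list = field(default_factory=list)   # addresses in the answer section


def blocked_reply(mode: str, qtype: str = "A") -> Reply:
    """Model the reply for a blocked domain (illustrative, not FTL code)."""
    if mode == "NULL":
        # Unspecified address: the client gets an IP it then tries to connect to.
        return Reply("NOERROR", ["0.0.0.0" if qtype == "A" else "::"])
    if mode == "NODATA":
        # The name "exists", but there is no record of the requested type.
        return Reply("NOERROR", [])
    if mode == "NXDOMAIN":
        # The name is reported as non-existent.
        return Reply("NXDOMAIN", [])
    raise ValueError(f"mode not modelled here: {mode!r}")
```

The relevant point for the macOS issue is that only NULL hands the browser an address to connect to; with NODATA and NXDOMAIN there is nothing to connect to, so no connection attempt to 0.0.0.0 is ever made.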

We did some very extensive testing when we introduced the various methods. Nothing was set in stone. Our research showed that some IoT devices started hammering the Pi-hole like crazy (think in terms of 100 queries per second) because they knew the NODATA answer was wrong and they desperately wanted to send their telemetry home. In the end, we decided to use what had the best compatibility, and given that the vast majority of IoT devices would never receive any update, it made sense to focus on them.

We send a TTL of 2 seconds with the IP blocked replies. This is to avoid replies being cached on the client side and whitelisting in Pi-hole becoming ineffective because the clients do not re-ask the Pi-hole. So far we do not mock SOA records at all to enter the data into client caches and this is entirely intended. Once Pi-hole knows that a domain is blocked, it has this knowledge in a fast cache. When the same query comes in multiple times, it can be answered in microseconds, even on Raspberry Pi hardware. I'm not sure creating overhead here is necessary as traffic in your local network should be cheap.
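The effect of the short TTL can be illustrated with a toy client-side cache. This is only a sketch under assumptions: `ClientCache` and `FakePihole` are invented stand-ins, `ads.example` and `192.0.2.10` are placeholder values, the 300-second TTL for the unblocked answer is arbitrary, and the clock is injected so the example does not need to sleep.

```python
class ClientCache:
    """Toy stub-resolver cache that honours the TTL of each reply."""

    def __init__(self, upstream, clock):
        self.upstream = upstream      # callable(name) -> (answer, ttl_seconds)
        self.clock = clock            # callable() -> current time in seconds
        self.entries = {}             # name -> (answer, expiry_time)
        self.upstream_queries = 0

    def resolve(self, name):
        now = self.clock()
        hit = self.entries.get(name)
        if hit and now < hit[1]:
            return hit[0]             # still fresh: served from the client cache
        self.upstream_queries += 1
        answer, ttl = self.upstream(name)
        self.entries[name] = (answer, now + ttl)
        return answer


class FakePihole:
    """Stand-in for Pi-hole: blocked names get 0.0.0.0 with a 2 second TTL."""

    def __init__(self):
        self.blocked = {"ads.example"}

    def __call__(self, name):
        if name in self.blocked:
            return "0.0.0.0", 2
        return "192.0.2.10", 300      # placeholder address and TTL


now = [0.0]
pihole = FakePihole()
cache = ClientCache(pihole, clock=lambda: now[0])

first = cache.resolve("ads.example")      # blocked reply, cached for 2 seconds
pihole.blocked.discard("ads.example")     # user whitelists the domain
now[0] = 1.0
second = cache.resolve("ads.example")     # stale blocked reply from the cache
now[0] = 2.5
third = cache.resolve("ads.example")      # TTL expired, client asks again
```

With a 2 second TTL the whitelisting becomes visible to the client almost immediately; a longer TTL on blocked replies would keep the stale answer around for correspondingly longer.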

@DanSchaper @PromoFaux What do you think about switching the default blocking behavior?

I'd like to get some concrete proof that NULL is causing this issue.

I cannot reproduce this on a MacBook Air running Big Sur latest and Chrome or Firefox. Pi-hole is using NULL blocking.

I think this is a badly behaved app or combination of apps. We should keep the blocking mode that best meets the needs of the vast majority of our users. In the edge cases where NULL is not the best mode, we already have options for selecting other block modes.

I will note that I do Zoom meetings regularly on Chrome with macOS Mojave and Pi-hole NULL blocking, with zero problems. If a specific build of an app on a different OS is causing problems, then that app is what should be fixed.

I found this additional post and now everything makes sense to me:

"This is a bug in Big Sur. When the IP address is accessed, an additional delay of ca. 1 second is inserted when a Network Extension is loaded." -

Indeed, I can only reproduce this on my work-provided machines, which have macOS Network Extensions installed (firewall and network auditing software). Similar software that triggers this would be VPN software or personal firewalls like Little Snitch.

This also means that it's definitely the NULL blocking mode that causes the problem in these environments, as it triggers this bug. My own tests clearly show this and the various user complaints all point to the same issue and workaround. And the network extension framework bug is a plausible technical cause for this observation. You are of course right that fixing the root cause is the right thing to do and hopefully Apple will do so at some point in the future.
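For anyone who wants to check their own machine, a rough way to measure this is to time how long a TCP connection attempt takes to succeed or fail. This is only a sketch: `time_connect` is a name I made up, port 443 in the example is just a typical HTTPS port, and the one-second figure comes from the quoted post, not from this code.

```python
import socket
import time


def time_connect(host: str, port: int, timeout: float = 5.0):
    """Attempt a TCP connection and return (outcome, elapsed_seconds)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    start = time.monotonic()
    try:
        sock.connect((host, port))
        outcome = "connected"
    except socket.timeout:
        outcome = "timeout"
    except ConnectionRefusedError:
        outcome = "refused"
    except OSError as exc:
        outcome = f"error: {exc}"
    finally:
        # Stop the timer before cleanup so only the connect attempt is measured.
        elapsed = time.monotonic() - start
        sock.close()
    return outcome, elapsed


# Example (run by hand): print(time_connect("0.0.0.0", 443))
```

On an unaffected machine I would expect the attempt to fail within a few milliseconds; if the attempt consistently takes around a second with a network extension loaded and returns quickly without it, that would match the behaviour described above.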

Advanced users can pick whatever blocking mode they prefer, but I think the important question here is what the right default choice for the average user is. What's the bigger problem for these people:

  • DNS request spamming IoT devices,
  • or mysteriously sluggish browser performance up to the point of making video conferences impossible?

I don't think it's a clear call and I don't know which group of affected users is bigger, but I at least wanted to raise this for discussion. If you think that NULL is still the better strategy (it would be nice to share your IoT research data, I'm just curious regarding the amount / brand / kind of affected devices), maybe we could add this to the Pi-hole documentation to give people a chance to notice it and switch their blocking mode to NODATA? Happy to send a PR.

Found this additional user comment:

I switched pihole to NXDOMAIN and browsing was fast again on the Mac, but i noticed that the knock on effect is that now Chrome/Firefox on my android devices (phone and tablet) hangs in the same way OSX did previously. Switching blockingmode back to null, swaps the problem back to the Mac but Android is cured.

Seems like we can't win with just a single strategy. :pensive: Maybe a warning in the documentation for the NULL blocking mode that it can cause issues if you use macOS Big Sur and network extensions would be the best way. WDYT?

I'll disagree here. The problem is the bugs in the Apple software. NULL blocking simply reveals the bugs. As the Little Snitch developers noted in their replies to a user in the Reddit post you linked:

This is a bug in Big Sur
The bug has been reported to Apple, but they currently work on fixing kernel panics first....
Because it's a bug in macOS, in the Network Extension framework. This framework hooks into networking only if an app like Little Snitch is installed.

Very few people are going to read this part of the documentation, and I don't think we should expand our documentation to address bugs or common problems with operating systems.

Avast causes DNS problems with Windows, browsers can use private DNS, etc. We don't address any of them in our documentation, nor should we.

There are plenty of posts online about this Big Sur behavior. A simple web search will find them in little time.

But users first have to know what their problem is before they can search for it. Knowing how to phrase a search so that it returns what you want is a skill in itself. Modern Google etc. make it easier, but you still have to ask the right question to get any meaningful results out of them.

Can Pi-hole detect via the MAC address if there are Apple devices using the Pi-hole and, if so, warn the user? Is this too much? It seems people need help with this.

The combination of "using a non-Safari browser", "using macOS Big Sur", "having a network extension installed" and "using Pi-hole's default blocking mode" triggers this issue. Remove one of the components and everything works fine.

From a technical point of view, the bug in macOS is the root cause, but from a user's point of view, Pi-hole is the culprit, because removing this additional piece makes their system work fine again. I don't think it helps if we try to talk this problem away by assigning blame to someone else - in the end Pi-hole has to work for non-technical users out of the box and I don't see why broken IoT devices are somehow more important to cater for than Mac users. Who knows if / when Apple will fix this issue - until then people are impacted by this in their home offices and have no way of figuring out what's going on (see below for why a simple web search won't help).

Yes! The difficult part is not fixing the issue in the config once you figured out what's going on, but making the connection between "my browser UI is really sluggish" and "Pi-hole's default blocking mode triggers a bug in my operating system". No one I talked to even considered this to be a possibility - instead people figured it must be caused by an overheating computer, a Chrome performance bug, a too slow CPU, GPU driver bugs, broken hardware, ...

Without already knowing this connection, people won't search for the magic "pi-hole chrome slow" phrase and thus never find these posts online.

How did you resolve the problem you were experiencing? Through a web search.

Perhaps a well written FAQ here would help.

The web search was at the end of a multi-week process of trying to figure out why my web browser was slow. It involved reformatting / reinstalling my computer, replacing it with a new one, and trying to fight the perceived overheating with a fan, which didn't help. That was followed by a systematic elimination process where I temporarily replaced my external screen, my webcam, my internet connection, my network switch and my router. During that last step, when I accidentally didn't use my Pi-hole as my DNS server while using the replacement router, I figured: Wait. It can't be DNS, right? The symptoms absolutely do not point to typical DNS issues, but... yes, it's caused by Pi-hole.

Only then was I able to search the web and find the explanation that the blocking mode triggers this issue.

I think you could save many people a lot of money, time and frustration by ideally just not triggering the issue or alternatively by alerting them to the fact that they have to change their config.
