Wildcard and regex support for whitelisting

ah ok, that's nice. Care to share? Or even better submit to the project?

I would (still) welcome wildcard whitelisting incorporated into pi-hole.

We are a Microsoft 365 Business shop (for better or worse), and literally just last week I had to whitelist yet another Microsoft subdomain after my users couldn't log in to Skype for Business because some blocklist decided it needed blocking. Grrrr....

@Rick_v,

For what it's worth, Microsoft publishes its domains and IP addresses at https://docs.microsoft.com/en-us/office365/enterprise/urls-and-ip-address-ranges including machine-readable JSON linked from there.
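
A minimal sketch of how that JSON could be turned into whitelist candidates. The sample payload is made up for illustration; it only assumes that each record may carry a `urls` list, so check the field names against the real file before relying on this. Wildcard entries are collected separately because whitelist.txt takes exact domains.

```python
import json

# Hypothetical excerpt shaped like the published endpoint data;
# the records below are illustrative only, not real Microsoft data.
SAMPLE = """
[
  {"id": 1, "serviceArea": "Exchange", "urls": ["outlook.office365.com", "*.outlook.com"]},
  {"id": 2, "serviceArea": "Skype", "urls": ["*.lync.com", "*.skype.com"]},
  {"id": 3, "serviceArea": "Common", "ips": ["192.0.2.0/24"]}
]
"""

def split_hostnames(records):
    """Return (exact, wildcard) hostname lists from endpoint records."""
    exact, wildcard = set(), set()
    for record in records:
        for url in record.get("urls", []):  # some records only list IPs
            (wildcard if "*" in url else exact).add(url.lower())
    return sorted(exact), sorted(wildcard)

exact, wildcard = split_hostnames(json.loads(SAMPLE))
print("exact:", exact)
print("wildcard:", wildcard)
```

The exact names could go straight into the whitelist; the wildcard ones are the entries this feature request is about.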

@smoser,

Wow, this is tremendously helpful! Thank you so much!

Me too. I was very surprised it wasn't there, especially since pihole let me put
*.somedomain.sometld
in whitelist.txt without erroring out or anything.

Is there an easy way to reload dnsmasq from this script from outside of a docker container? I mean, I would have to reload the list manually every time the list updates, and that's not going to work.
Also, it would be nice if pi-hole had an 'execute after' option for such scripts, like you write, a hook during import or something post all updates, just before reload.
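
One possible approach, sketched under assumptions: run `pihole restartdns` inside the container through `docker exec` from the host. The container name `pihole` is a guess (check `docker ps`), and the helper names are made up for this example. It defaults to a dry run so nothing is executed by accident.

```python
import subprocess

def reload_command(container="pihole"):
    """Build the docker command that restarts Pi-hole's DNS service."""
    # "pihole" as the container name is an assumption, not a given.
    return ["docker", "exec", container, "pihole", "restartdns"]

def reload_pihole(container="pihole", dry_run=True):
    """Run the reload command, or just print it when dry_run is set."""
    cmd = reload_command(container)
    if dry_run:
        print("would run:", " ".join(cmd))
        return 0
    return subprocess.run(cmd, check=False).returncode

reload_pihole()  # dry run by default
```

Calling `reload_pihole(dry_run=False)` at the end of the update script would then take the place of the manual reload.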

2 posts were split to a new topic: Entering multiple regex at one time

Revisiting the original request: Whitelist regex support.

It is technically possible, however, I will tell you why I don't think it is a good idea:

Regex filter evaluation is - always - a sequential (and hence slow) task. You have to try all of them until you know that none of them matched. This is exactly why we split the blacklist into an "exact" and a "regex" component. The "exact" component is loaded into cache and can be replied to with close to no delay at all. Walking the chain of regex filters is, however, much slower.
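
To illustrate the difference, here is a small sketch. It is not FTL's code, and the domains and patterns are invented: an exact entry is a single hash lookup, while a regex miss has to walk every filter before it can answer.

```python
import re

exact_blacklist = {"ads.example.com", "tracker.example.net"}
regex_blacklist = [re.compile(p) for p in (
    r"^ad[sx]?\d*\.",
    r"(^|\.)doubleclick\.net$",
    r"^telemetry\.",
)]

def is_blocked(domain):
    """Return (blocked, number of regex filters tried)."""
    if domain in exact_blacklist:   # one hash lookup, independent of list size
        return True, 0
    tried = 0
    for rx in regex_blacklist:      # sequential: can only stop early on a match
        tried += 1
        if rx.search(domain):
            return True, tried
    return False, tried             # a miss costs len(regex_blacklist) evaluations

print(is_blocked("ads.example.com"))
print(is_blocked("stats.doubleclick.net"))
print(is_blocked("pi-hole.net"))
```

The last call is the expensive case: the domain is not blocked, and every regex had to be tried to find that out.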

The implementation could be made in two ways:

  1. Only use regex-based whitelist - very bad performance if you have many whitelisted domains
    This is to be avoided as Pi-hole v5.0 is just about to introduce support for massive whitelists, using an implementation strategy that will still give the result of a query with a typical delay of < 4 msec even if you have millions of domains on the whitelist.
  2. Add a regex-based whitelist next to the already existing whitelist - increase in complexity for the users.
    This is to be avoided as well, as it would introduce a severe slowdown of the blocked domains preparation (AKA "gravity"). Instead of only excluding the whitelisted domains (which is very efficient), we'd need to evaluate all whitelist regex filters against every one of the (possibly up to millions of) domains on the blocking lists. This would result in a catastrophic slowdown, maybe causing gravity to take hours instead of tens of seconds on Raspberry Pi devices. This is unacceptable.
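
A rough sketch of why the second option scales badly. This is not the gravity script, and the domain names and patterns are invented. Exact exclusion is a set difference, while regex exclusion needs up to (number of domains) x (number of filters) evaluations.

```python
import re

gravity = {f"host{i}.example.com" for i in range(1000)} | {"cdn.pipa.example"}
exact_whitelist = {"host1.example.com", "host2.example.com"}
regex_whitelist = [re.compile(r"(^|\.)pipa\.example$"), re.compile(r"^host99\d\.")]

# Exact exclusion: a set difference, roughly one hash lookup per whitelist entry.
after_exact = gravity - exact_whitelist

# Regex exclusion: every remaining domain is tested against every filter
# until one of them matches.
evaluations = 0
after_regex = set()
for domain in after_exact:
    matched = False
    for rx in regex_whitelist:
        evaluations += 1
        if rx.search(domain):
            matched = True
            break
    if not matched:
        after_regex.add(domain)

print(len(gravity), len(after_exact), len(after_regex), evaluations)
```

With only two filters the evaluation count is already about twice the number of domains; with millions of domains and dozens of filters, that product is what would dominate the gravity run.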

As you know, there is almost nothing I will not discuss. If I can be convinced of the contrary, I have no problem accepting that I may have been wrong.

Can you give some more details about this so we can understand the performance impact?

  1. What kind of device are you running Pi-hole on?
  2. How many blocked domains do you source without whitelist regex filtering?
  3. How many whitelist regex entries do you use?
  4. How complex are they? (This is asking for a subjective feeling: .* is not very complex, however, if you use () or | rather often, then the regex is much more complex when compiled to byte code.)
  5. How many blocked domains are left after your whitelist procedure is done with them?
  6. How long does the script take only for the whitelist processing?

Hmm, okay I will continue to think about this, however, we made recent optimizations for v5.0 which make implementing it in the way you did (a bit) harder.

As far as I see, your implementation also only cleans the gravity list, right? If a user (intentionally or not) blocked something in addition on the blacklist, you don't delete the line for him, right? If so, this would be rather unexpected behavior I'd say.

Instead of one big gravity run, that needs to be repeated for each whitelist modification, we instead use table views now. By this it is sufficient to send SIGHUP to pihole-FTL for the modified whitelist to become active. There is no call to pihole -g any more. However, this also means that edits to such a regex-based whitelist would again require us to run gravity each time as live filtering when loading the lists may be too slow.
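
The reload-on-SIGHUP pattern described here can be sketched in a few lines. This is an illustration only, not how pihole-FTL is implemented: `load_whitelist` is a hypothetical stand-in for re-reading the table view, and the process signals itself just to keep the example self-contained (Unix only).

```python
import os
import signal

whitelist = set()
reloads = 0

def load_whitelist():
    """Stand-in for re-reading the whitelist from the database."""
    global whitelist, reloads
    reloads += 1
    # Pretend an entry was added between the first and the second load.
    whitelist = {"example.com"} if reloads == 1 else {"example.com", "pi-hole.net"}

def on_sighup(signum, frame):
    load_whitelist()  # cheap reload, no rebuild of the whole block list

signal.signal(signal.SIGHUP, on_sighup)
load_whitelist()                     # initial load at startup
os.kill(os.getpid(), signal.SIGHUP)  # what an admin tool would send after an edit
print(reloads, sorted(whitelist))
```

The point of the design is that an edit only triggers the small reload inside the handler rather than a full gravity run.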

Can you send me the regex filters you're using so I can have a realistic set of filters for performance measurements when I come around doing a testing implementation?
Please do not take this as a guarantee that I will do it anytime soon, nor that this will become part of Pi-hole if it turns out to be either too slow or too complex.

Quicker than expected, we now already have a development draft that is handling regex whitelists consistently and completely inside of FTL and are currently working out the performance impact.

I've seen the blacklist regex domains on @mmotti's GitHub project but I'm not sure I have seen the mentioned whitelist regex before.
Note that this feature request is about whitelisting, not trimming down the number of gravity domains because they are already partially covered by regex filters.

A post was split to a new topic: DNS configuration for wildcard whitelisting

How about the Pi-hole Teleporter feature? Note that it is not only available on the dashboard but also through the CLI ( pihole -a -t creates a pi-hole-teleporter file in the current working directory ).

I agree.

I'm also looking to blacklist everything (*.) and only allow a few sites like *.apple.com and *.virtualdesk.net. This is for my kids' network and I don't want them on anything but a few sites, but it seems these sites have hundreds of subdomain names. Is your script located on your GitHub? Can you point me to it?

Thanks
Nick

Hello!

Maybe you have an idea how to whitelist domains together with their subdomains?
I need a "deny all except allowed" policy.
I can deny everything with address=/#/127.0.0.1
BUT I can't allow by mask.

For example i would like to allow

  1. pipa.com
  2. 2.pipa.com
  3. *.pipa.com

If I use server=/google.com/208.67.222.222
this config allows anything.google.com, and that is not good for my work.
Maybe you have an idea? I have a solution using Squid, but I would like to use DNS for this instead.
Thank you!
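
For the "domain plus all of its subdomains" case, a regex of the following shape is one way to express it. This is a sketch of the matching logic only, not dnsmasq or Pi-hole configuration; `ALLOW` and `is_allowed` are names made up for the example.

```python
import re

# Hypothetical allowlist: pipa.com itself plus any subdomain of it.
ALLOW = [re.compile(r"(^|\.)pipa\.com$")]

def is_allowed(domain):
    """Deny by default; allow only names matching an allowlist pattern."""
    name = domain.lower().rstrip(".")  # normalize case and a trailing root dot
    return any(rx.search(name) for rx in ALLOW)

for name in ("pipa.com", "2.pipa.com", "a.b.pipa.com", "notpipa.com", "pipa.com.evil.net"):
    print(name, is_allowed(name))
```

The `(^|\.)` prefix keeps look-alikes such as notpipa.com out, and the `$` anchor rejects names that merely contain pipa.com somewhere in the middle.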

I'd also appreciate being able to use wildcard entries for whitelisting.

I'm very surprised that it isn't already here.
I'd count this among the very basic things when working with filtering/blocklists.

This is included in the upcoming Pi-hole V5 (currently in beta).

That's great to hear. Very nice.
Is it already known when this version will be released on stable branch?

Or could I switch to beta branch meanwhile and then back to stable after it is released?
Or isn't it as easy as that? Will this lead to problems?

Soon(TM).

Yes. The release announcement for the 5.0 beta has the upgrade commands. When the master 5.0 is released, you would shift from beta 5.0 to master with pihole checkout master

Please note the large disclaimer on the release notice - the upgrade is one-way only. Make a backup of your SD card so you can revert to 4.X if desired.

https://pi-hole.net/2020/01/19/announcing-a-beta-test-of-pi-hole-5-0/#page-content
