Wildcard and regex support for whitelisting

#1

Hello developers!

I’d like the ability to use wildcards (and regex) on the whitelist.

The reason is that our office is pretty much “all in” on Microsoft 365, SharePoint, etc. Basically, “all Microsoft, all the time”. Yet several of the blacklists I subscribe to block various Microsoft servers, causing random issues such as login failures. While I understand this may not be everyone’s cup of tea, it is what it is for us, and it would be great if we could wildcard-whitelist .microsoft.com once and for all.

Conversely, we’re mainly Mac users at home, and I see entries like ocsp.apple.com, as well as other Apple servers, on some blocklists. I have no clue why anyone would think it is a good idea to block an OCSP server! But same thing: at home I’d like to whitelist everything under .apple.com.

-Rick

#2

In the short term, you could reduce your false positives with fewer block lists. One of the problems with public block lists is that you have no control over the content, and what the list maintainer wants to block may not be what you want to block (as in the case of ocsp.apple.com, which I have also had to whitelist).

You might try just a few lists (or no lists) and set up some regex filters that you control. Here are some examples that will knock down a lot of the adware, metrics, etc.

^(.+[-_.])??ad[sxv]?[0-9]*[-_.]
^adim(age|g)s?[0-9]*[-_.]
^adse?rv(e(rs?)?|ices?)?[0-9]*[-.]
^adtrack(er|ing)?[0-9]*[-.]
^advert(s|is(ing|ements?))?[0-9]*[-_.]
^aff(iliat(es?|ion))?[-.]
^analytics?[-.]
^banners?[-.]
^beacons?[0-9]*[-.]
^clicks?[-.]
^count(ers?)?[0-9]*[-.]
^pixels?[-.]
^stat(s|istics)?[0-9]*[-.]
^telemetry[-.]
^track(ers?|ing)?[0-9]*[-.]
^traff(ic)?[-.]
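
To get a feel for what such filters catch before adding them, a few of them can be tried against sample names with grep -E. This is only a rough sketch: the domains below are made up for illustration, and grep's extended regex syntax is assumed to be close enough to Pi-hole's for these simple patterns.

```shell
#!/bin/bash
# Hypothetical sample domains, for illustration only.
domains="analytics.example.com
tracking01.example.net
telemetry.example.org
www.example.com
mytracker.example.com"

# Keep only the names that at least one of three filters from the list matches.
# The filters are anchored with ^, so "mytracker..." is not caught by the track filter.
blocked=$(printf '%s\n' "$domains" | grep -E \
    -e '^analytics?[-.]' \
    -e '^track(ers?|ing)?[0-9]*[-.]' \
    -e '^telemetry[-.]')

printf '%s\n' "$blocked"
```

Names that are not printed (here www.example.com and mytracker.example.com) would pass through these three filters.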
#3

Hey jfb,

Just a quick shout-out thanks for this list of regex filters! I’ve never really been able to wrap my head around regex, so this list is great! Thanks much!

Someone should consider adding this list as “examples” on the regex section of the documentation! :wink:
https://docs.pi-hole.net/ftldns/regex/tutorial/

-Rick

#4

I won’t take credit for these. They are on GitHub, but for the life of me I can’t recall where. When I find the link, I’ll post it.

Edit - thanks @msatter, that was the link.

#5
#6

Regexp whitelisting would be a really nice feature. Right now it is possible to set .* as a blacklist regexp and deny everything, but it is really cumbersome to add every subdomain that should be allowed instead of just adding *.microsoft.com, for example.

#7

I have been using regex whitelisting for months. I scripted it myself, and I move domains that have been in the whitelist for a long time to the regex whitelist.

It removes whitelisted domains from the gravity list. It does not override blacklist, regex, or wildcard entries; for that, use the normal whitelist.

#8

I’m not sure I’m following what you’re saying, but the web GUI has ‘exact’, ‘wildcard’ (which is also a regexp), and ‘regexp’ for blacklisting, but only ‘exact’ for whitelisting. What I would like, and what I’m guessing this feature request is about, is to have regexp for whitelisting as well.
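
To illustrate why a wildcard is just a special case of a regexp, here is a rough sketch of how an entry like *.microsoft.com could be written as one. The helper name wildcard_to_regex is made up, and the exact pattern Pi-hole generates for its wildcard entries is an assumption here, not something taken from its source.

```shell
#!/bin/bash
# Hypothetical helper: build a regex that matches a domain and all its subdomains.
wildcard_to_regex() {
    # Escape the dots so they match literally, then anchor the pattern:
    # the name must start at the beginning or after a dot, and end at the end.
    printf '(^|\\.)%s$\n' "$(printf '%s' "$1" | sed 's/\./\\./g')"
}

regex=$(wildcard_to_regex "microsoft.com")
echo "$regex"

# Try the regex against a few names (the last two are look-alikes that should not match).
for d in login.microsoft.com microsoft.com notmicrosoft.com microsoft.com.example.net; do
    if printf '%s\n' "$d" | grep -Eq "$regex"; then
        echo "match:    $d"
    else
        echo "no match: $d"
    fi
done
```

The anchoring matters: without the (^|\.) prefix and the trailing $, names like notmicrosoft.com or microsoft.com.example.net would slip through as well.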

Thanks for the fantastic software, to any developer who’s reading.

#9

I scripted it myself; it is not part of the official Pi-hole development.

#10

Ah OK, that’s nice. Care to share? Or, even better, submit it to the project?

#11

We tried to submit it to the project, but there was not much interest. I am fine with that, and Pi-hole now consumes half the memory it did when I wrote the script. Motti also made a different implementation, as you can read earlier in the thread.

It would be nice if a hook were implemented to run it during the import.

#!/bin/bash

# Use /tmp for the temporary file if it exists; otherwise fall back to /etc/pihole
tmpdirPiHole="/etc/pihole"
[[ -d "/tmp" ]] && tmpdirPiHole="/tmp"

# Build a cleaning file from regex.white by dropping blank lines and comment lines,
# so that it can be used to filter whitelisted domains out of gravity.list
awk 'NF && !/^#/' /etc/pihole/regex.white > "$tmpdirPiHole/gravity.regex.clean"

# Filter whitelisted domains: drop every gravity.list line that matches a regex in
# gravity.regex.clean (-v inverts the match, -E is extended regex, -f reads one pattern per line)
grep -vEf "$tmpdirPiHole/gravity.regex.clean" /etc/pihole/gravity.list > /etc/pihole/gravity.list.tmp

mv -f /etc/pihole/gravity.list.tmp /etc/pihole/gravity.list
rm -f "$tmpdirPiHole/gravity.regex.clean"

# automatic reload
#pihole restartdns reload

So in /etc/pihole/regex.white you put the regex lines for domains that should not appear in gravity.list, and you can reload gravity.list by running: pihole restartdns reload
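
For anyone who wants to see the effect before touching the real files, here is a rough dry run of the same idea on throwaway copies. Everything in it is made up for illustration (the temporary directory, the sample regex.white, the sample gravity.list), and grep -vEf is used as a simple stand-in for the filtering step.

```shell
#!/bin/bash
# Work in a throwaway directory so nothing under /etc/pihole is touched.
workdir=$(mktemp -d)

# Hypothetical regex.white: one regex per line; blank lines and # comments are ignored.
cat > "$workdir/regex.white" <<'EOF'
# keep everything under apple.com and microsoft.com
(^|\.)apple\.com$

(^|\.)microsoft\.com$
EOF

# Hypothetical gravity.list with one domain per line.
printf '%s\n' ocsp.apple.com ads.example.com login.microsoft.com tracker.example.net \
    > "$workdir/gravity.list"

# Drop blank and comment lines, then remove every domain that matches a whitelist regex.
awk 'NF && !/^#/' "$workdir/regex.white" > "$workdir/patterns"
filtered=$(grep -vEf "$workdir/patterns" "$workdir/gravity.list")

printf '%s\n' "$filtered"
rm -rf "$workdir"
```

Only the two example.* entries should remain in the output; the Apple and Microsoft names are taken out of the list and so are no longer blocked by it.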

#12

I would (still) welcome wildcard whitelisting incorporated into Pi-hole.

We are a Microsoft 365 Business shop (for better or worse), and literally just last week I had to whitelist yet another Microsoft subdomain after my users couldn’t log in to Skype for Business because some blocklist decided it needed blocking. Grrrr…

#13

@Rick_v,

For what it’s worth, Microsoft publishes its domains and IP addresses at https://docs.microsoft.com/en-us/office365/enterprise/urls-and-ip-address-ranges, including machine-readable JSON linked from there.

#14

@smoser,

Wow, this is tremendously helpful! Thank you so much!

#15

Me too. It very much surprised me that it wasn’t there. Pi-hole allowed me to put
*.somedomain.sometld
in whitelist.txt, and it didn’t error out or anything.

#16

Is there an easy way to reload dnsmasq from this script from outside of a Docker container? Otherwise I would have to reload the list manually every time the list updates, and that’s not going to work.
Also, it would be nice if Pi-hole had an ‘execute after’ option for such scripts: as you write, a hook during import, or something that runs after all updates, just before the reload.

#17

I can’t help you with Docker. That hook would indeed be welcome; however, it won’t happen soon, or ever, because of the risk that inexperienced users could make a mess. Support would then be difficult: the results are changed, and if support does not know this, troubleshooting becomes time-consuming.

I brought forward a possible solution, but it is still a no-go.

split this topic #18

2 posts were split to a new topic: Entering multiple regex at one time