Is this a good regex, or does it go too far (covering domains you don't want to blacklist)? All comments are welcome...
I'm using a regex here as opposed to a blocklist (the better solution), because it's impossible to keep up with new Facebook domains.
edit
just realized I could remove the wildcard for 'fb', thus using:
Your opening match clause ^(.+\.)?? seems to be more efficient than both my approach ^.*\.? and the one Pi-hole inserts for wildcard entries (^|\.) - more efficient in the sense that regex101.com shows it needs some 8 or 38 fewer steps to evaluate yours (using other-sub.graph.fb.com as test string).
Of course, that's not a hard assessment criterion, as Pi-hole's runtime behaviour might differ, depending on the actual regex implementation used.
In the middle, mine would still match fbcdn or tfbnw parts, as I apply the leading and trailing wildcard matches in the domain part that you cut away from yours.
Towards EOL, mine would also catch country-specific TLDs like .co.uk or .nl, but may overblock, e.g. by also matching facebook.someblacklistingdomain.com.
I am going to adopt your opening match to my regex.
And just out of curiosity, maybe a developer could comment whether it actually would be beneficial to replace (^|\.) by ^(.+\.)?? in simple wildcard matching - absolutely no priority though
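For anyone who wants to check that the two openings really accept the same domains, here is a minimal sketch in Python. It assumes a hypothetical tail fb\.com$ purely for illustration, and Python's re engine is not the one Pi-hole uses, so it only says something about which strings match, not about step counts.

```python
import re

# Hypothetical tail "fb\.com$", used only to give both openings something to anchor to.
PIHOLE_STYLE = re.compile(r"(^|\.)fb\.com$")      # opening Pi-hole inserts for wildcards
LAZY_PREFIX = re.compile(r"^(.+\.)??fb\.com$")    # opening suggested in this thread

SAMPLES = [
    "fb.com",
    "graph.fb.com",
    "other-sub.graph.fb.com",
    "notfb.com",            # must not match: "fb" is not a whole label here
    "fb.com.example.org",   # must not match: fb.com is not at the end
]

def verdicts(pattern):
    """Return True/False per sample, using a partial (search) match."""
    return [pattern.search(domain) is not None for domain in SAMPLES]

if __name__ == "__main__":
    for domain, a, b in zip(SAMPLES, verdicts(PIHOLE_STYLE), verdicts(LAZY_PREFIX)):
        print(f"{domain:24} pihole_style={a} lazy_prefix={b}")
```

On these samples both openings give identical verdicts, so any difference would come down to evaluation cost rather than to what gets blocked.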
@Bucking_Horn I believe my wildcard syntax is more efficient, but I think the devs are using the current wildcard regexp because it is visually easier to understand.
I would personally keep the tfbnw separate, as it will likely never appear as anything but the full tfbnw, and it could introduce extra steps checking for those optional characters every time.
You can leave it as a wildcard at the end if you like, but bear in mind this opens it up to matching stuff like fbcdn.test.com etc. Maybe not a huge issue, but sometimes you have to be careful. If the list of TLDs is small, I would explicitly state them in an "or" group at the end.
Or look at whether extended regexps would support something like \.[^.]+$ (dot, not dot, to end of string)
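To illustrate the difference between an open tail and the \.[^.]+$ tail, here is a small sketch in Python. The stem (^|\.)fbcdn is a hypothetical stand-in chosen only for this example, and whether Pi-hole's own regex engine handles [^.] the same way is exactly the open question above, so treat this as a behaviour check in Python only.

```python
import re

# Hypothetical stem "(^|\.)fbcdn" for illustration; only the tail differs.
OPEN_TAIL = re.compile(r"(^|\.)fbcdn\.")             # anything may follow the dot
ONE_LABEL_TAIL = re.compile(r"(^|\.)fbcdn\.[^.]+$")  # dot, not dot, to end of string

def blocked(pattern, domain):
    """True if the pattern finds a partial match anywhere in the domain."""
    return pattern.search(domain) is not None

if __name__ == "__main__":
    for domain in ("static.fbcdn.net", "fbcdn.test.com", "fbcdn.co.uk"):
        print(f"{domain:18} open={blocked(OPEN_TAIL, domain)} "
              f"one_label={blocked(ONE_LABEL_TAIL, domain)}")
```

Note that the one-label tail also rejects two-label endings such as .co.uk, so those would need to be listed explicitly.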
Although almost all of the matches (all except one) require fewer steps, a NO match requires more steps. Since every domain NOT in gravity is always evaluated by regex, I'm NOT so sure this is a better solution than mine. You always need to look at the number of steps required in case of a NO match. This may result in a less efficient-looking regex, but speed is all that counts.
You absolutely do want to know the number of steps required for every individual match to come to the best (fastest) solution, so look at them one at a time.
Yes, well, it may well be more specific, but ultimately they achieve the same goal. Sadly, the truth is that the more efficient version looks uglier to most people and may confuse people just starting out with regex.
So, if I were to personally use regexps to block Facebook, my preferences would be as follows:
1. If I wanted to be very specific: ^(.+\.)?(facebook|fb(cdn|sbx)?|tfbnw)\.(com|me|net)$
2. If I didn't care about issues that may come up under other subdomains (e.g. facebook.test.com): ^(.+\.)?(facebook|fb(cdn|sbx)?|tfbnw)\.
3. If I wanted to accommodate known and possibly unknown future TLDs: ^(.+\.)?(facebook|fb(cdn|sbx)?|tfbnw)\.([^.]+|co\.uk)$
Regarding #2 - You do not need to include a .+$ at the end - You only need a partial match. The only reason you would need to specify the full pattern is if there may be some exceptions you would like to make.
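As a rough way to compare the three variants above, the following Python sketch reports which of them match a given domain. The dictionary keys and the helper name are made up for this example, and re.search is used to mimic a partial match; Pi-hole's engine may differ from Python's in details.

```python
import re

# The three variants from this post; the key names are labels invented for this sketch.
PATTERNS = {
    "specific": re.compile(r"^(.+\.)?(facebook|fb(cdn|sbx)?|tfbnw)\.(com|me|net)$"),
    "open_tail": re.compile(r"^(.+\.)?(facebook|fb(cdn|sbx)?|tfbnw)\."),
    "any_tld": re.compile(r"^(.+\.)?(facebook|fb(cdn|sbx)?|tfbnw)\.([^.]+|co\.uk)$"),
}

def matching_variants(domain):
    """Return the names of all variants that (partially) match the domain."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(domain)]

if __name__ == "__main__":
    for domain in (
        "other-sub.graph.fb.com",
        "static.fbcdn.net",
        "facebook.nl",
        "facebook.co.uk",
        "facebook.test.com",
        "www.fbnw.com",
        "example.com",
    ):
        print(f"{domain:24} {matching_variants(domain)}")
```

Only the open-tail variant catches facebook.test.com, which is the overblocking risk mentioned earlier, while the country TLDs are caught by both the open-tail and the any-TLD variants.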
This regex has the smallest number of steps for a NO match, but doesn't cover www.fbnw.com (it does exist). Looks like almost a winner...
I believe I originally referenced the list here, although it may have been updated since I looked at it to form the main regexp last year.
Bear in mind that this is not strictly a fair test of the steps for a no match, as it would have partially matched the string before reaching the subdomain and determining it didn't fit the criteria. If you were to use test.com or match.test.com, it would likely take fewer steps.
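If anyone wants to compare match and no-match cost outside regex101, a rough timing sketch in Python could look like the one below. It measures wall-clock time rather than steps, uses the very specific variant from earlier in the thread, and runs on Python's regex engine rather than Pi-hole's, so the numbers are only indicative. The helper name and the repeat count are arbitrary choices for this example.

```python
import re
import timeit

PATTERN = re.compile(r"^(.+\.)?(facebook|fb(cdn|sbx)?|tfbnw)\.(com|me|net)$")

def seconds_per_search(domain, repeats=20000):
    """Average wall-clock seconds for one partial-match search of the domain."""
    total = timeit.timeit(lambda: PATTERN.search(domain), number=repeats)
    return total / repeats

if __name__ == "__main__":
    for domain in ("other-sub.graph.fb.com", "www.fbnw.com", "test.com", "match.test.com"):
        hit = PATTERN.search(domain) is not None
        nanos = seconds_per_search(domain) * 1e9
        print(f"{domain:24} match={hit!s:5} {nanos:8.0f} ns")
```

Timings vary between runs and machines, so it is worth repeating the measurement a few times before drawing conclusions.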
Where have you got this domain from? It isn't one that currently looks to be owned by Facebook. It redirects to a parked page saying that the domain is for sale.
I would suggest that, for now, the following is enough
What about .nl or .de?
I think it's safe to assume Facebook has registered its name under the TLD of every country they are operating or planning to operate in, so I'd stick to my more general end match.