Regex help / facebook

Facebook appears to be at it again; see this Reddit article.
I’ve been using the following regex (from the Reddit Regex Megathread):


But this doesn’t block the domain also mentioned in the Reddit article, so I modified the regex to:


Still no success.

After playing with the online regex tester and regexper, I came up with the following:


This covers all the domains (checked with the online regex tester) mentioned in the Reddit article and in the article it refers to. Examples:


Regexper shows the following:

Is this a good regex, or does it go too far (covering domains you don’t want to blacklist)? All comments are welcome…
I’m using a regex here as opposed to a blocklist (the better solution), because it’s impossible to keep up with new Facebook domains.

Just realized I could remove the wildcard for ‘fb’, thus using:


I would still like to know which one you recommend…

A dump of my Facebook hatred:

### Facebook block ############################################################
instagram # sliding match

For example, there is also a domain whose TLD is covered by \.[a-z.]{2,7}$

Mine looks pretty similar:

Your opening match clause ^(.+\.)?? seems to be more efficient than both my approach ^.*\.? and the one Pi-hole inserts for wildcard entries (^|\.). More efficient in the sense that the regex tester shows some 8 or 38 fewer steps needed to evaluate yours against my test string.
Of course, that’s no hard assessment criterion, as Pi-hole’s runtime behaviour might differ, depending on the actual regex implementation used.

In the middle part, mine would still match fbcdn or tfbnw, as I apply the leading and trailing wildcard matches in the domain part that you cut away from yours.

Towards the end of the line, mine would also catch country-specific TLDs like .nl, but may overblock.

I am going to adopt your opening match to my regex. :wink:

And just out of curiosity, maybe a developer could comment whether it actually would be beneficial to replace (^|\.) by ^(.+\.)?? in simple wildcard matching - absolutely no priority though :wink:

The current wildcard is the most efficient one, and I can’t follow you two in using a less specific approach.

@jpgpi250 this was mine some time ago:


@Bucking_Horn I believe my wildcard syntax to be more efficient but I believe the devs are using the current wildcard regexp as visually it is easier to understand.

Compiled from your suggestions, how about this:





I would personally keep tfbnw separate, as it will likely never appear as the full tfbnw, and it could introduce extra steps checking for those optional characters every time.

You can leave it as a wildcard at the end if you like, but bear in mind that this opens it up to matching other, unintended names. Maybe not a huge issue, but sometimes you have to be careful. If the list of TLDs is small, I would explicitly state them in an or statement at the end :slight_smile:

Or look at whether extended regexps would support something like \.[^.]+$ (dot, not dot, to end of string)
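To make the difference between a wildcard ending and the "dot, not dot, to end of string" ending concrete, here is a minimal Python sketch. The two patterns and the test names are my own illustrative assumptions, not regexes posted in this thread, and Python's `re` engine is not the one Pi-hole uses, so it only shows which names each ending accepts.

```python
import re

# Hypothetical patterns for illustration only: same prefix and name,
# different endings.
WILDCARD_TAIL = re.compile(r"^(.+\.)?facebook\..+$")         # anything after the dot
SINGLE_LABEL_TAIL = re.compile(r"^(.+\.)?facebook\.[^.]+$")  # dot, not-dot, end of string

# Made-up test names, chosen only to show the difference.
SAMPLES = ["facebook.com", "facebook.nl", "facebook.co.uk", "facebook.example.org"]


def matches(pattern: re.Pattern, domain: str) -> bool:
    """Return True if the pattern matches somewhere in the domain."""
    return pattern.search(domain) is not None


if __name__ == "__main__":
    for domain in SAMPLES:
        print(f"{domain:22} wildcard={matches(WILDCARD_TAIL, domain)} "
              f"single_label={matches(SINGLE_LABEL_TAIL, domain)}")
```

In this sketch the single-label ending rejects anything with a second dot after the name, which includes two-part suffixes such as `.co.uk`, so those would need to be listed explicitly.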

And your proposed regex would then be?

I will do some experiments when I can get to my laptop :slight_smile:

^(.+\.)??(facebook|(t)?fb(nw)?(cdn|sbx)?)(\.[^\.]+|\.co\.uk)$

-> match, 63 steps
-> match, 41 steps
-> match, 45 steps
-> match, 46 steps
-> match, 47 steps (only one step more for tfbnw)
-> NO match, 64 steps
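For anyone who wants to try this regex outside an online tester, a small Python sketch follows. The domain names in it are my own illustrative picks, not the test strings used for the step counts, and Python's `re` module is only a stand-in for whatever engine pihole-FTL uses, so it shows match/no-match behaviour only.

```python
import re

# The regex discussed here; Python's `re` accepts the lazy `??` quantifier.
PATTERN = re.compile(
    r"^(.+\.)??(facebook|(t)?fb(nw)?(cdn|sbx)?)(\.[^\.]+|\.co\.uk)$"
)

# Illustrative names (assumptions for this sketch).
SHOULD_MATCH = [
    "facebook.com",
    "www.facebook.com",
    "fb.com",
    "fbcdn.net",
    "fbsbx.com",
    "tfbnw.net",
    "facebook.co.uk",
]
SHOULD_NOT_MATCH = [
    "facebook.com.au",   # a second dot after the name, and not .co.uk
    "notfacebook.com",   # name must start at the beginning or after a dot
    "example.com",
]


def is_blocked(domain: str) -> bool:
    """Return True if the regex matches the given domain."""
    return PATTERN.search(domain) is not None


if __name__ == "__main__":
    for domain in SHOULD_MATCH + SHOULD_NOT_MATCH:
        print(f"{domain:20} -> {'match' if is_blocked(domain) else 'NO match'}")
```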


pihole-FTL log:

[2020-02-20 15:01:49.737 2720] Regex blacklist (DB ID 54) >> MATCH: "" vs. "^(.+\.)??(facebook|(t)?fb(nw)?(cdn|sbx)?)(\.[^\.]+|\.co\.uk)$"
[2020-02-20 15:02:59.087 2720] Regex blacklist (DB ID 54) >> MATCH: "" vs. "^(.+\.)??(facebook|(t)?fb(nw)?(cdn|sbx)?)(\.[^\.]+|\.co\.uk)$"
[2020-02-20 15:03:16.154 2720] Regex blacklist (DB ID 54) >> MATCH: "" vs. "^(.+\.)??(facebook|(t)?fb(nw)?(cdn|sbx)?)(\.[^\.]+|\.co\.uk)$"
[2020-02-20 15:04:13.793 2720] Regex blacklist (DB ID 54) >> MATCH: "" vs. "^(.+\.)??(facebook|(t)?fb(nw)?(cdn|sbx)?)(\.[^\.]+|\.co\.uk)$"

Wouldn’t ^(.+\.)?? be actually more specific than (^|\.) ?

That would miss out on (too many dots :wink: ).

But combining your more specific mid match with my ending doesn’t look too bad:
(@jpgpi250, note you can add multiple lines to regex101)



Because the RegEx is anchored to the right by a $:



If you also want to block the replace \. by (^|\.)

-> match, 37 steps (my regex 63 steps)
-> match, 48 steps (my regex 41 steps)
-> match, 27 steps (my regex 45 steps)
-> match, 34 steps (my regex 46 steps)
-> match, 35 steps (my regex 47 steps)
-> NO match, 71 steps (my regex 64 steps)

Although all but one of the matches require fewer steps, a NO match requires more steps. Since every domain NOT in gravity is always evaluated by regex, I’m NOT so sure this is a better solution than mine. You always need to look at the number of steps required in case of a NO match. This may result in a less efficient-looking regex, but speed is all that counts.

You do absolutely want to know the number of steps required for every individual match to arrive at the best (fastest) solution, thus looking at them one at a time.
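As a rough way to compare the NO match cost away from an online tester, here is a Python timing sketch. It measures wall-clock time rather than steps, uses Python's `re` engine rather than Pi-hole's, and the non-matching domain is a made-up example, so the numbers are only a loose proxy for the step counts quoted in this thread.

```python
import re
import timeit

# Two regexes quoted in this thread, keyed by labels invented for this sketch.
CANDIDATES = {
    "lazy_prefix": r"^(.+\.)??(facebook|(t)?fb(nw)?(cdn|sbx)?)(\.[^\.]+|\.co\.uk)$",
    "greedy_prefix": r"^(.+\.)?(facebook|fb(cdn|sbx|nw)?|tfbnw)\.([^.]+|co\.uk)$",
}

# Hypothetical domain that neither regex should match.
NON_MATCHING = "static.cdn.example.org"


def time_no_match(pattern: str, domain: str, repeats: int = 20000) -> float:
    """Time `repeats` searches of a domain the pattern must NOT match."""
    compiled = re.compile(pattern)
    if compiled.search(domain) is not None:
        raise ValueError(f"{domain!r} unexpectedly matches {pattern!r}")
    return timeit.timeit(lambda: compiled.search(domain), number=repeats)


if __name__ == "__main__":
    for name, pattern in CANDIDATES.items():
        print(f"{name}: {time_no_match(pattern, NON_MATCHING):.4f}s")
```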

Yes, well, it may well be more specific, but ultimately they achieve the same goal. Sadly, the truth is that the more efficient version looks uglier to most people and may confuse people just starting out with regex.
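A quick way to check the "same goal" claim is to run both opening clauses against the same names. The sketch below is in Python; the `facebook\.com$` suffix and the sample names are assumptions made for illustration, and only the two opening clauses come from this thread.

```python
import re

# Same illustrative suffix in both patterns; only the opening clause differs.
PIHOLE_STYLE = re.compile(r"(^|\.)facebook\.com$")
LAZY_PREFIX = re.compile(r"^(.+\.)??facebook\.com$")

# Made-up sample names for this sketch.
SAMPLES = [
    "facebook.com",
    "www.facebook.com",
    "a.b.facebook.com",
    "notfacebook.com",
    "facebook.com.example.org",
]


def verdicts(domain: str) -> tuple:
    """Return (pihole_style_matches, lazy_prefix_matches) for a domain."""
    return (
        PIHOLE_STYLE.search(domain) is not None,
        LAZY_PREFIX.search(domain) is not None,
    )


if __name__ == "__main__":
    for domain in SAMPLES:
        print(f"{domain:26} {verdicts(domain)}")
```

On these samples both clauses give the same verdict, so any difference between them is in how much work the engine does, not in what gets blocked.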

Ah, yes. I see @jpgpi250 caught this above.

So, if I were to personally use regexps to block Facebook, my preferences would be as follows:

  1. If I wanted to be very specific:
  2. If I didn’t care for issues that may come up under other subdomains (e.g.
  3. If I wanted to accommodate for known and possibly unknown future tlds:

Regarding #2: you do not need to include a .+$ at the end, as you only need a partial match. The only reason you would need to specify the full pattern is if there are exceptions you would like to make.

-> match, 55 steps (msatter 37 steps) (my regex 63 steps)
-> match, 31 steps (msatter 48 steps) (my regex 41 steps)
-> match, 39 steps (msatter 27 steps) (my regex 45 steps)
-> NO match, 36 steps (msatter match, 34 steps) (my regex match, 46 steps)
-> match, 35 steps (msatter 35 steps) (my regex 47 steps)
-> NO match, 55 steps (msatter 71 steps) (my regex 64 steps)

This regex has the fewest steps for a NO match, but doesn’t cover (does it even exist? It does exist). Looks like almost a winner…

I believe I originally referenced here although this list may have been updated since I looked at it to form the main regexp last year.

Bear in mind that this is not strictly a fair test for the steps of a NO match, as it would have partially matched the string before reaching the subdomain and determining that it didn’t fit the criteria. If you were to use or, it would likely have fewer steps.

By simply adding nw to the expression, the NO match count only goes up by one.

^(.+\.)?(facebook|fb(cdn|sbx|nw)?|tfbnw)\.([^.]+|co\.uk)$

-> match, 37 steps (msatter 34 steps) (my regex 46 steps)
-> NO match, 56 steps (msatter 71 steps) (my regex 64 steps)
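To see what listing tfbnw separately changes compared with the earlier `(t)?fb(nw)?(cdn|sbx)?` form, here is a small Python sketch that runs both regexes over the same names and reports where they disagree. The sample names are made up for this sketch (some are deliberately artificial combinations), and Python's `re` is only a stand-in for Pi-hole's engine.

```python
import re

# Both regexes are quoted from this thread; the variable names are mine.
EARLIER = re.compile(
    r"^(.+\.)??(facebook|(t)?fb(nw)?(cdn|sbx)?)(\.[^\.]+|\.co\.uk)$"
)
LATEST = re.compile(
    r"^(.+\.)?(facebook|fb(cdn|sbx|nw)?|tfbnw)\.([^.]+|co\.uk)$"
)

# Illustrative names; the last few are artificial combinations.
SAMPLES = [
    "facebook.com",
    "fbcdn.net",
    "fbnw.net",
    "tfbnw.net",
    "facebook.co.uk",
    "tfbcdn.net",
    "fbnwcdn.net",
    "example.com",
]


def differences(domains: list) -> list:
    """Return the domains on which the two regexes give different verdicts."""
    return [
        d for d in domains
        if (EARLIER.search(d) is None) != (LATEST.search(d) is None)
    ]


if __name__ == "__main__":
    print(differences(SAMPLES))
```

In this sketch the two regexes only disagree on the artificial combinations, which the earlier form accepts because every optional part can be combined with every other.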

The final winner, or do we keep on going…

Where have you got this domain from? It isn’t one that currently looks to be owned by Facebook. It redirects to a parked page saying that the domain is for sale.

I would suggest that, for now, the following is enough :slight_smile:


dig, see below, but this says ‘for sale’…

pi@raspberrypi:~ $ dig @ -p 5552

; <<>> DiG 9.11.5-P4-5.1-Raspbian <<>> @ -p 5552
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 55098
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

; EDNS: version: 0, flags:; udp: 1472
;                  IN      A

;; ANSWER SECTION:           3600    IN      A

;; Query time: 293 msec
;; WHEN: Thu Feb 20 16:45:42 CET 2020
;; MSG SIZE  rcvd: 57