And your proposed regex would then be?
I will do some experiments when I can get to my laptop.
^(.+\.)??(facebook|(t)?fb(nw)?(cdn|sbx)?)(\.[^\.]+|\.co\.uk)$
www.facebook.co.uk -> match, 63 steps
graphs.fb.me -> match, 41 steps
b-api.facebook.com -> match, 45 steps
www.fbnw.com -> match, 46 steps
www.tfbnw.com -> match, 47 steps (only one step more for tfbnw)
fbcdn.test.com -> NO match, 64 steps
pihole-FTL log:
[2020-02-20 15:01:49.737 2720] Regex blacklist (DB ID 54) >> MATCH: "www.tfbnw.com" vs. "^(.+\.)??(facebook|(t)?fb(nw)?(cdn|sbx)?)(\.[^\.]+|\.co\.uk)$"
[2020-02-20 15:02:59.087 2720] Regex blacklist (DB ID 54) >> MATCH: "www.facebook.co.uk" vs. "^(.+\.)??(facebook|(t)?fb(nw)?(cdn|sbx)?)(\.[^\.]+|\.co\.uk)$"
[2020-02-20 15:03:16.154 2720] Regex blacklist (DB ID 54) >> MATCH: "graphs.fb.me" vs. "^(.+\.)??(facebook|(t)?fb(nw)?(cdn|sbx)?)(\.[^\.]+|\.co\.uk)$"
[2020-02-20 15:04:13.793 2720] Regex blacklist (DB ID 54) >> MATCH: "www.fbnw.com" vs. "^(.+\.)??(facebook|(t)?fb(nw)?(cdn|sbx)?)(\.[^\.]+|\.co\.uk)$"
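Here is a minimal sketch for re-running the list above outside regex101. It assumes Python's `re` module as a stand-in for FTL's regex engine. Python does not report step counts, and its engine is not the one pihole-FTL uses, but for this pattern (including the lazy `??`) the match/no-match results should be the same. The helper `check` is hypothetical.

```python
import re

# jpgpi250's pattern, copied from the post above.
PATTERN = r"^(.+\.)??(facebook|(t)?fb(nw)?(cdn|sbx)?)(\.[^\.]+|\.co\.uk)$"

# Test domains and the match results reported above.
EXPECTED = {
    "www.facebook.co.uk": True,
    "graphs.fb.me": True,
    "b-api.facebook.com": True,
    "www.fbnw.com": True,
    "www.tfbnw.com": True,
    "fbcdn.test.com": False,
}


def check(pattern, domains):
    """Return {domain: matched?} using re.search (hypothetical helper)."""
    regex = re.compile(pattern)
    return {d: regex.search(d) is not None for d in domains}


if __name__ == "__main__":
    for domain, matched in check(PATTERN, EXPECTED).items():
        print(f"{domain:20} {'match' if matched else 'NO match'}")
```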
Wouldn't `^(.+\.)??` actually be more specific than `(^|\.)`?
Or check whether extended regexps support something like `\.[^.]+$` (a dot, then non-dots up to the end of the string).
That would miss out on .co.uk (too many dots).
But combining your more specific middle match with my ending doesn't look too bad (@jpgpi250, note you can add multiple lines to regex101):
^(.+\.)?(facebook|fb(cdn|sbx)?|tfbnw)\..+$
Test strings:
www.facebook.co.uk
an.facebook.nl
b-api.facebook.com
graph.facebook.com
graphs.fb.me
other-sub.graph.fb.com
www.fbcdn.com
www.tfbnw.com
www.fbsbx.com
www.fbnw.com
fbcdn.test.com
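This sketch checks the combined pattern against the test strings above, using Python's `re` as an assumed stand-in for FTL's engine. Tracing it through, www.fbnw.com does not match because there is no `nw` alternative after `fb`. fbcdn.test.com does match, because `\..+$` accepts any remainder after the name.

```python
import re

# Combined pattern from the post above.
PATTERN = re.compile(r"^(.+\.)?(facebook|fb(cdn|sbx)?|tfbnw)\..+$")

TEST_STRINGS = [
    "www.facebook.co.uk", "an.facebook.nl", "b-api.facebook.com",
    "graph.facebook.com", "graphs.fb.me", "other-sub.graph.fb.com",
    "www.fbcdn.com", "www.tfbnw.com", "www.fbsbx.com",
    "www.fbnw.com", "fbcdn.test.com",
]


def matches(domain):
    """True if the pattern matches the domain."""
    return PATTERN.search(domain) is not None


for domain in TEST_STRINGS:
    print(f"{domain:24} {'match' if matches(domain) else 'NO match'}")
```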
(^|\.)(facebook|fb|tfb)(cdn|sbx|nw|).[a-z.]{2,7}$
www.facebook.co.uk -> match, 37 steps (my regex 63 steps)
graphs.fb.me -> match, 48 steps (my regex 41 steps)
b-api.facebook.com -> match, 27 steps (my regex match, 45 steps)
www.fbnw.com -> match, 34 steps (my regex 46 steps)
www.tfbnw.com -> match, 35 steps (my regex 47 steps)
fbcdn.test.com -> NO match, 71 steps (my regex 64 steps)
Although all matches but one require fewer steps, a NO match requires more steps. Since every domain that is NOT in gravity is always evaluated by the regexes, I'm NOT so sure this is a better solution than mine. You always need to look at the number of steps required for a NO match. This may result in a less efficient-looking regex, but speed is all that counts.
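regex101 step counts can't be reproduced outside regex101. As a rough sketch, though, the cost of the NO-match case can be timed with Python's `timeit`. The timings reflect Python's backtracking engine on your own machine, not FTL's, so treat them as a relative comparison at best. The two patterns are copied from this thread. The domain list and the helper `time_no_match` are illustrative assumptions.

```python
import re
import timeit

# Patterns copied from the thread: jpgpi250's original and the "adding nw" version.
PATTERNS = {
    "jpgpi250": r"^(.+\.)??(facebook|(t)?fb(nw)?(cdn|sbx)?)(\.[^\.]+|\.co\.uk)$",
    "with_nw": r"^(.+\.)?(facebook|fb(cdn|sbx|nw)?|tfbnw)\.([^.]+|co\.uk)$",
}

# Illustrative non-Facebook domains (assumed; most real lookups look like these).
NON_MATCHING = ["www.example.com", "fbcdn.test.com", "mail.google.com"]


def time_no_match(pattern, domains, number=20_000):
    """Seconds spent running `number` rounds of searches that all fail."""
    regex = re.compile(pattern)
    # Sanity check: the benchmark only measures NO matches if nothing matches.
    assert all(regex.search(d) is None for d in domains)
    return timeit.timeit(lambda: [regex.search(d) for d in domains], number=number)


if __name__ == "__main__":
    for name, pattern in PATTERNS.items():
        print(f"{name:10} {time_no_match(pattern, NON_MATCHING):.3f} s")
```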
note you can add multiple lines to regex101
You absolutely want to know the number of steps required for every individual match to arrive at the best (fastest) solution, so you need to look at them one at a time.
Wouldn’t `^(.+\.)??` actually be more specific than `(^|\.)`?
Yes, it may well be more specific, but ultimately they achieve the same goal. Sadly, the more efficient version looks uglier to most people and may confuse people just starting out with regex.
That would miss out on .co.uk (too many dots).
Ah, yes. I see @jpgpi250 caught this above.
So, if I were to personally use regexps to block Facebook, my preferences would be as follows:
- If I wanted to be very specific:
^(.+\.)?(facebook|fb(cdn|sbx)?|tfbnw)\.(com|me|net)$
- If I didn't care about issues that may come up under other domains' subdomains (e.g. facebook.test.com):
^(.+\.)?(facebook|fb(cdn|sbx)?|tfbnw)\.
- If I wanted to accommodate known and possibly unknown future TLDs:
^(.+\.)?(facebook|fb(cdn|sbx)?|tfbnw)\.([^.]+|co\.uk)$
Regarding #2: you do not need to include a `.+$` at the end, because a partial match is enough. The only reason to specify the full pattern is if there are exceptions you would like to make.
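This sketch compares the three options on a few domains, again assuming Python's `re` as a stand-in for FTL's engine. `re.search` is used so the unanchored second pattern behaves as the partial match described above. The function `classify` is hypothetical.

```python
import re

# The three patterns from the list above.
VERY_SPECIFIC = re.compile(r"^(.+\.)?(facebook|fb(cdn|sbx)?|tfbnw)\.(com|me|net)$")
PARTIAL = re.compile(r"^(.+\.)?(facebook|fb(cdn|sbx)?|tfbnw)\.")  # no end anchor
ANY_TLD = re.compile(r"^(.+\.)?(facebook|fb(cdn|sbx)?|tfbnw)\.([^.]+|co\.uk)$")


def classify(domain):
    """(very specific, partial, any TLD) match results for one domain."""
    # PARTIAL succeeds as soon as "<name>." is found, whatever follows it.
    return tuple(p.search(domain) is not None
                 for p in (VERY_SPECIFIC, PARTIAL, ANY_TLD))


for domain in ["graphs.fb.me", "www.facebook.nl", "facebook.test.com"]:
    print(domain, classify(domain))
```

facebook.test.com is only caught by the partial pattern, which is the trade-off option #2 accepts.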
If I wanted to accommodate known and possibly unknown future TLDs:
^(.+\.)?(facebook|fb(cdn|sbx)?|tfbnw)\.([^.]+|co\.uk)$
www.facebook.co.uk -> match, 55 steps (msatter 37 steps) (my regex 63 steps)
graphs.fb.me -> match, 31 steps (msatter 48 steps) (my regex 41 steps)
b-api.facebook.com -> match, 39 steps (msatter 27 steps) (my regex match, 45 steps)
www.fbnw.com -> NO match, 36 steps (msatter match, 34 steps) (my regex match, 46 steps)
www.tfbnw.com -> match, 35 steps (msatter 35 steps) (my regex 47 steps)
fbcdn.test.com -> NO match, 55 steps (msatter 71 steps) (my regex 64 steps)
This regex has the fewest steps for a NO match, but doesn't cover www.fbnw.com (it does exist). Looks like almost a winner...
This regex has the fewest steps for a NO match, but doesn’t cover www.fbnw.com (it does exist). Looks like a winner…
I believe I originally referenced here, although this list may have been updated since I looked at it to form the main regexp last year.
fbcdn.test.com -> NO match, 55 steps (msatter 71 steps) (my regex 64 steps)
Bear in mind that this is not strictly a fair test of the NO-match step count. The regex partially matches the string (fbcdn) before moving on to the rest of the domain and determining that it doesn't fit the criteria. With test.com or match.test.com, it would likely take fewer steps.
By simply adding nw to the expression, the NO-match step count only goes up by one:
^(.+\.)?(facebook|fb(cdn|sbx|nw)?|tfbnw)\.([^.]+|co\.uk)$
www.fbnw.com -> match, 37 steps (msatter 34 steps) (my regex 46 steps)
fbcdn.test.com -> NO match, 56 steps (msatter 71 steps) (my regex 64 steps)
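This sketch shows the effect of adding nw, assuming Python's `re` gives the same match/no-match results as FTL for these patterns. Only www.fbnw.com changes. fbcdn.test.com still fails because `([^.]+|co\.uk)$` allows only one label (or co.uk) after the name.

```python
import re

WITHOUT_NW = re.compile(r"^(.+\.)?(facebook|fb(cdn|sbx)?|tfbnw)\.([^.]+|co\.uk)$")
WITH_NW = re.compile(r"^(.+\.)?(facebook|fb(cdn|sbx|nw)?|tfbnw)\.([^.]+|co\.uk)$")

for domain in ["www.fbnw.com", "www.tfbnw.com", "fbcdn.test.com"]:
    # Print whether each version of the pattern matches.
    print(domain, bool(WITHOUT_NW.search(domain)), bool(WITH_NW.search(domain)))
```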
The final winner, or do we keep on going...
www.fbnw.com -> match, 37 steps (msatter 34 steps) (my regex 46 steps)
Where have you got this domain from? It isn't one that currently looks to be owned by Facebook. It redirects to a parked page saying that the domain is for sale.
I would suggest that, for now, the following is enough:
^(.+\.)?(facebook|fb(cdn|sbx)?|tfbnw)\.([^.]+|co\.uk)$
Where have you got this domain from?
From dig (see below), but the site says 'for sale'...
pi@raspberrypi:~ $ dig @127.10.10.2 -p 5552 www.fbnw.com
; <<>> DiG 9.11.5-P4-5.1-Raspbian <<>> @127.10.10.2 -p 5552 www.fbnw.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 55098
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1472
;; QUESTION SECTION:
;www.fbnw.com. IN A
;; ANSWER SECTION:
www.fbnw.com. 3600 IN A 69.172.201.153
;; Query time: 293 msec
;; SERVER: 127.10.10.2#5552(127.10.10.2)
;; WHEN: Thu Feb 20 16:45:42 CET 2020
;; MSG SIZE rcvd: 57
I would suggest that, for now, the following is enough
^(.+\.)?(facebook|fb(cdn|sbx)?|tfbnw)\.([^.]+|co\.uk)$
Agree...
Yeah, so I would suggest that this is not a Facebook domain, and therefore not one that the regexp needs to accommodate at this time.
Edit: Yay. Looks like a resolution
- If I wanted to accommodate known and possibly unknown future TLDs:
^(.+\.)?(facebook|fb(cdn|sbx)?|tfbnw)\.([^.]+|co\.uk)$
What about .nl or .de?
I think it's safe to assume Facebook has registered its name under the TLD of every country it operates or plans to operate in, so I'd stick to my more general ending:
^(.+\.)?(facebook|fb(cdn|sbx|nw)?|tfbnw)\..+$
domain | match | steps
---|---|---
www.facebook.co.uk | YES | 44
graphs.fb.me | YES | 30
b-api.facebook.com | YES | 37
www.fbnw.com | YES | 35
www.tfbnw.com | YES | 33
fbcdn.test.com | YES | 43
www.example.com | NO | 32
What about .nl or .de?
I think it’s safe to assume facebook has registered its name with a TLD of any country they are operating or planning to operate in
don't see the problem
What about .nl or .de?
These are covered.
\.([^.]+|co\.uk)$
essentially means that any single TLD (e.g. .com, .net, .nl, .de, .org) is covered, as is the one known two-part suffix in use (.co.uk).
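This sketch checks that claim, assuming Python's `re` as a stand-in for FTL's engine. It applies the `\.([^.]+|co\.uk)$` ending to www.facebook.<tld> for a handful of suffixes. Every single-label TLD matches, and so does co.uk. Other two-part suffixes such as co.au do not.

```python
import re

ANY_TLD = re.compile(r"^(.+\.)?(facebook|fb(cdn|sbx)?|tfbnw)\.([^.]+|co\.uk)$")

for tld in ["com", "net", "nl", "de", "org", "co.uk", "co.au"]:
    domain = f"www.facebook.{tld}"
    print(domain, ANY_TLD.search(domain) is not None)
```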
^(.+\.)?(facebook|fb(cdn|sbx|nw)?|tfbnw)\..+$
will match things like fb.fbi.gov
or fbsbx.entirelyunrelatedwebsite.com
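This sketch demonstrates the false positives, assuming Python's `re` as a stand-in for FTL's engine. It compares the `\..+$` ending with the `\.([^.]+|co\.uk)$` ending, both with nw included. The three domains are the ones named in this thread.

```python
import re

LOOSE = re.compile(r"^(.+\.)?(facebook|fb(cdn|sbx|nw)?|tfbnw)\..+$")
STRICT = re.compile(r"^(.+\.)?(facebook|fb(cdn|sbx|nw)?|tfbnw)\.([^.]+|co\.uk)$")

FALSE_POSITIVES = ["fb.fbi.gov", "fbsbx.entirelyunrelatedwebsite.com", "fbcdn.test.com"]

for domain in FALSE_POSITIVES:
    # LOOSE accepts anything after "<name>."; STRICT allows only one label or co.uk.
    print(domain, bool(LOOSE.search(domain)), bool(STRICT.search(domain)))
```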
don’t see the problem
Any TLD would, e.g., also include .co.au and .co.nz.
Any TLD would, e.g., also include .co.au and .co.nz.
This is true; however, going by the current Facebook "blacklists", the necessary TLDs are covered.
However, if you really wanted to be sure that all your bases are covered, including .co.something:
^(.+\.)?(facebook|fb(cdn|sbx)?|tfbnw)\.(co\.)?[^.]+$
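This sketch shows what the optional `(co\.)?` buys, again with Python's `re` as an assumed stand-in for FTL. Any co.<something> suffix is accepted. facebook.test.com is still rejected, because only co. may precede the final label.

```python
import re

CO_ANY = re.compile(r"^(.+\.)?(facebook|fb(cdn|sbx)?|tfbnw)\.(co\.)?[^.]+$")

for domain in ["www.facebook.com", "www.facebook.co.uk", "www.facebook.co.au",
               "www.facebook.co.nz", "facebook.test.com"]:
    print(domain, CO_ANY.search(domain) is not None)
```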
Edit:
However, I would still say that:
^(.+\.)?(facebook|fb(cdn|sbx)?|tfbnw)\.([^.]+|co\.uk)$
is sufficient for all use cases at the moment. I don't even think the .co.uk part is entirely necessary.
Final Edit:
My recommendation for this regexp going by the blacklists available to use at this time would be:
^(.+\.)?(facebook|fb(cdn|sbx)?|tfbnw)\.[^.]+$
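This sketch shows the trade-off of the final recommendation, assuming Python's `re` as a stand-in for FTL's engine. Single-label TLDs still match, and fbcdn.test.com is still rejected. With the co.uk alternative gone, www.facebook.co.uk no longer matches.

```python
import re

FINAL = re.compile(r"^(.+\.)?(facebook|fb(cdn|sbx)?|tfbnw)\.[^.]+$")

for domain in ["graphs.fb.me", "www.facebook.nl", "www.facebook.co.uk", "fbcdn.test.com"]:
    print(domain, FINAL.search(domain) is not None)
```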
^(.+\.)?(facebook|fb(cdn|sbx|nw)?|tfbnw)\..+$
will match things like fb.fbi.gov
or fbsbx.entirelyunrelatedwebsite.com
I plead guilty; I admitted as much in my first post.