Discussion about optimal wildcard syntax

you replied:

This afternoon, we had a discussion about a Facebook regex.

The following came up:

and

do we use (scripted entry)

(.|^)wizaly\.com$

regex101.com says match -> 18 steps
OR

^(.+\.)??wizaly\.com$

regex101.com says match -> 14 steps

your opinion and comment would be highly appreciated.

edit typos

The first one you mention is slightly different from the one we're using: (\.|^)
Please repeat your benchmark with the correct syntax, as a case-sensitive literal match \. is a few steps faster than an any-character match. Also, you only specified the regex. When you state the number of steps it takes, you should also provide the test vector you ran your test against.
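
Beyond the step count, the missing backslash also changes what the pattern matches. Here is a minimal sketch of that difference, assuming Python's `re` module as a stand-in (it is not the engine FTL uses, so this only illustrates the semantics, not Pi-hole's behavior); the `notwizaly.com` test vector and the `matches` helper are made up for illustration:

```python
import re

# Pattern as it appeared in the opening post (backslash lost when quoting).
garbled = re.compile(r"(.|^)wizaly\.com$")
# Pattern quoted in this thread as the one actually in use.
intended = re.compile(r"(\.|^)wizaly\.com$")

def matches(pattern, domain):
    """Return True if the pattern matches anywhere in the domain."""
    return pattern.search(domain) is not None

# "notwizaly.com" is a hypothetical unrelated domain used as a test vector.
for domain in ["wizaly.com", "abc.wizaly.com", "notwizaly.com"]:
    print(domain, matches(garbled, domain), matches(intended, domain))
```

With the unescaped dot, any character may precede `wizaly.com`, so the unrelated domain matches as well; the escaped variant only accepts a literal dot or the start of the string.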

I can only assume you tested against wizaly.com, for which I see 16 steps for ours and 14 steps for the proposed other wildcard regex. However, this is not a definite answer, as you also need to consider subdomain matches. Given abc.wizaly.com, our regex needs 26 steps whereas the other one needs 31 steps.

You see, regex magic is not necessarily as simple as it may sometimes seem, and we've already invested a lot of effort into the more subtle things of Pi-hole as well. Not everything is documented; however, that doesn't mean the things around FTL are not heavily optimized anyway.

that's not my fault, that's discourse, cutting chars when block quoting.
another attempt to list the regex

(\.|^)wizaly\.com$

versus

^(.+\.)??wizaly\.com$

and a screenshot of what I type
image

Seems okay.

repeating, you split while I was replying

so which one is the better one? The first one is the one you use, the second one appears to require fewer steps.

How so?

end of discussion.
when entering the domain wizaly.com in regex101.com, ^(.+\.)??wizaly\.com$ wins
when entering the domain abc.wizaly.com in regex101.com, (\.|^)wizaly\.com$ wins

(hoping discourse doesn't change the code again...)
so your regex wins if a subdomain is used, which is what will happen in real life.

Sorry for my mistake.

edit
@Bucking_Horn, you showed an interest in this, here is the answer, provided by the smarter (than us) developer.
/edit

Thanks, I'd been alerted as soon as you quoted me in your opening post :wink:

But while we're at it, I add my two cents :wink:

In the very post you quoted me from, I was also doubting whether the number of steps as calculated by regex101 would actually suffice as an assessment criterion, as Pi-hole would likely use a different variant (ERE vs. PCRE) and most certainly a different implementation than that website.
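
One way to get a second opinion besides step counts is to time the patterns directly. This is only a rough sketch, assuming Python's `re` and `timeit`; Python's engine is yet another implementation, so the numbers say nothing definite about FTL. The `time_pattern` helper and the repeat count are arbitrary choices for illustration:

```python
import re
import timeit

# Patterns exactly as quoted in this thread.
PATTERNS = {
    "alternation": r"(\.|^)wizaly\.com$",
    "lazy_optional": r"^(.+\.)??wizaly\.com$",
}
# Test vectors used in the discussion.
DOMAINS = ["wizaly.com", "abc.wizaly.com"]

def time_pattern(pattern, domain, number=20000):
    """Return the total seconds spent on `number` searches of `domain`."""
    compiled = re.compile(pattern)
    return timeit.timeit(lambda: compiled.search(domain), number=number)

for name, pattern in PATTERNS.items():
    for domain in DOMAINS:
        seconds = time_pattern(pattern, domain)
        print(f"{name:14s} {domain:16s} {seconds:.4f}s")
```

Results will vary between machines and runs, so repeated measurements are needed before drawing any conclusion.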

Anyhow, as @DL6ER's musings here also take those step counts into consideration, it seems that with increasing subdomain length, the two methods would break even at a subdomain length of 4.

|domain|(^\|\.)|^(.+\.)?|
|---|---|---|
|1.domain.com|20|29|
|12.domain.com|23|29|
|123.domain.com|26|29|
|1234.domain.com|29|29|
|12345.domain.com|32|29|
|123456.domain.com|35|29|
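
The break-even point can be read off the table with a few lines of arithmetic. A small sketch, using only the regex101 step counts listed above (copied, not re-measured); the `first_break_even` helper is a made-up name for illustration:

```python
# Step counts copied from the table above (regex101 figures, not re-measured).
label_lengths = [1, 2, 3, 4, 5, 6]
alternation_steps = [20, 23, 26, 29, 32, 35]     # (^|\.) column
optional_group_steps = [29, 29, 29, 29, 29, 29]  # ^(.+\.)? column

def first_break_even(lengths, steps_a, steps_b):
    """Return the first length at which steps_a is no longer below steps_b."""
    for length, a, b in zip(lengths, steps_a, steps_b):
        if a >= b:
            return length
    return None

# Each extra subdomain character adds this many steps to the alternation variant.
per_char = alternation_steps[1] - alternation_steps[0]
print(per_char)  # 3
print(first_break_even(label_lengths, alternation_steps, optional_group_steps))  # 4
```

In these figures, the alternation variant grows by three steps per subdomain character while the other stays constant, which is why the two meet at length 4.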

Matching additional subdomain levels would tip the balance even earlier:

|domain|(^\|\.)|^(.+\.)?|
|---|---|---|
|1.1.domain.com|28|29|
|1.12.domain.com|31|29|
|1.123.domain.com|34|29|
|www.tripod.lycos.com|48|27|

(I've included the last entry (lycos) as an example of a real world domain.)

So in searching for an answer to my question, we have to move away from simple hard facts into heuristics, where the best solution for @jpgpi250 might not be equally beneficial for me.

However, I think it is safe to assume that the vast majority of those matches would be executed against www., which clearly favors the solution that Pi-hole has chosen - gotta love them guys :wink: :+1: :smiling_face_with_three_hearts: