Collection of RegEx for blacklisting

No, I should correct this.

Regex and exact domains work very differently: gravity and exact blacklist domains are stored in a tree structure. Matching a single domain against (multiple!) millions of domains in that tree takes mere microseconds, even on Raspberry Pi hardware. This is because leaves can be found very efficiently in a balanced tree, and searching the tree only becomes slower on a logarithmic scale (doubling the number of domains means just one more step of work, not twice as much work). Read the Wikipedia article on B-trees if you want more details, but be prepared that it is not all that straightforward when you deal with tree data structures for the first time.

[INFO: calculations simplified here and below, full details would clearly exceed the format here]

Scanning a single domain against, e.g., 10 million domains in your blacklist or blocklist (gravity) takes on the order of ceil(log2(10000000)) = 24 computation steps. This is - on all platforms - on the order of a few micro- (yes, not milli-)seconds and really cannot be made more efficient. This is because we can quickly traverse the tree down to its leaf when the expression we're matching against is constant.
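To make the step count concrete, here is a minimal Python sketch. It is not Pi-hole's code: it uses a binary search over a sorted list as a stand-in for the balanced tree, and the names `lookup_steps` and `count_search_steps` as well as the generated `hostNNNNNN.example` domains are made up for illustration.

```python
import math


def lookup_steps(n_domains: int) -> int:
    """Simplified estimate from the text: ceil(log2(n)) steps for n entries."""
    return math.ceil(math.log2(n_domains))


def count_search_steps(sorted_domains, target):
    """Binary search over a sorted list; returns (found, steps taken)."""
    lo, hi = 0, len(sorted_domains)
    steps = 0
    while lo < hi:
        mid = (lo + hi) // 2
        steps += 1  # one comparison per halving of the search range
        if sorted_domains[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(sorted_domains) and sorted_domains[lo] == target
    return found, steps


# Hypothetical, zero-padded names so that the list is already sorted.
domains = [f"host{i:06d}.example" for i in range(100_000)]

print(lookup_steps(10_000_000))   # the estimate used in the text
print(lookup_steps(20_000_000))   # twice the domains: one step more
print(count_search_steps(domains, "host054321.example"))
```

Doubling the list from 10 to 20 million entries raises the estimate by exactly one step, which is the logarithmic scaling described above.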

Regex, on the other hand, is an entirely different beast. While we can use a tree for exact domains, we have to iterate through all the regular expressions because any of them could theoretically match. While an exact match is easily found (the memory is just identical), regular expressions are rules, and those rules have to be applied to every domain we want to check. Explaining how the matching takes place is also beyond what I can do here; however, I will just let you guess how much work evaluating the suggested whitelist regex

is (click on the image to see it in its entire glory, or click here):

We really do our best to deliver the maximum possible bare-metal performance for regular expressions as well; however, they are so much more involved that they will always be a lot more work than exact domains.
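The difference between the two kinds of lookup can be sketched roughly as follows. This is only an illustration, not how Pi-hole implements it: a Python `set` (a hash table, not a tree) stands in for the exact-domain store, and the domains, patterns, and function names are invented for the example.

```python
import re

# Hypothetical exact entries: one lookup, no matter how many there are.
exact_blacklist = {"ads.example.com", "tracker.example.net"}

# Hypothetical regex entries: each one is a rule that must be applied.
regex_blacklist = [
    re.compile(r"^ad[sx]?[0-9]*\."),
    re.compile(r"(^|\.)telemetry\."),
    re.compile(r"^pixel\."),
]


def match_exact(domain: str) -> bool:
    """Constant-expression lookup: the domain is either stored or not."""
    return domain in exact_blacklist


def match_regex(domain: str):
    """Apply the rules one by one; returns (matched, rules evaluated)."""
    evaluated = 0
    for rule in regex_blacklist:
        evaluated += 1
        if rule.search(domain):
            return True, evaluated
    return False, evaluated


print(match_exact("ads.example.com"))
print(match_regex("ads1.example.org"))   # stops at the first rule
print(match_regex("www.example.org"))    # no match: every rule was tried
```

A domain that matches nothing is the expensive case, because every single rule has to be evaluated before we know that.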

It all depends on your architecture (e.g., ARM is slower on regex compared to x86 because it is missing some important optimizations), but I'd always rather use a list with 100,000 domains than one complex regular expression. That being said, Pi-hole has become so efficient with regular expressions that you won't actually notice a difference in speed (I won't notice if loading a page takes some 10 milliseconds longer).

It's the other way around. As blocklist/gravity matching is much, much faster, we do this first, as it may save us from having to do regex at all. The worst case is a domain that is not blocked. For such domains, the entirety of whitelist exact, whitelist regex, blacklist exact, gravity (exact) and blacklist regex has to be searched through. Only if a domain wasn't found on any list do we forward it upstream.
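The worst case can be sketched like this. Note the assumptions: the sequence of checks simply follows the order in which the lists are named above (that this is the exact evaluation order is an assumption), all list contents are invented, and `classify` is a hypothetical helper, not a Pi-hole function.

```python
import re

# Hypothetical list contents, for illustration only.
whitelist_exact = {"good.example.com"}
whitelist_regex = [re.compile(r"^cdn[0-9]+\.example\.com$")]
blacklist_exact = {"bad.example.com"}
gravity = {"ads.example.net", "tracker.example.net"}
blacklist_regex = [re.compile(r"^ad[sx]?[0-9]*\.")]


def any_match(rules, domain):
    """True if at least one regex rule matches the domain."""
    return any(rule.search(domain) for rule in rules)


# Assumed order: the one in which the lists are named in the text.
CHECKS = [
    ("whitelist exact", "allow", lambda d: d in whitelist_exact),
    ("whitelist regex", "allow", lambda d: any_match(whitelist_regex, d)),
    ("blacklist exact", "block", lambda d: d in blacklist_exact),
    ("gravity (exact)", "block", lambda d: d in gravity),
    ("blacklist regex", "block", lambda d: any_match(blacklist_regex, d)),
]


def classify(domain: str):
    """Returns (verdict, names of the lists that had to be searched)."""
    searched = []
    for name, verdict, matches in CHECKS:
        searched.append(name)
        if matches(domain):
            return verdict, searched
    return "forward", searched  # found on no list: goes upstream


print(classify("good.example.com"))
print(classify("tracker.example.net"))
print(classify("www.example.org"))   # worst case: all five lists searched
```

Only the last call searches all five lists, which is why an unblocked domain is the most expensive one to handle.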