Gravity.list and regex

Informational only...
I was interested in the effect of using regular expressions.
For this test, I reconfigured pihole to use only the default lists:


I'm using the regex list from @mmotti

Running this script (it does take a while, and there is probably a better way to achieve this; as I said before, I'm NOT a Linux expert, nor a regex expert…):




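# Assumes $gravityfile and $regexfile were set earlier (gravity.list and regex.list).
list=0; regexcount=0; result=0
for domain in `cat $gravityfile`; do
   valid=true; list=$((list + 1))
   for regex in `cat $regexfile`; do
         # Stop at the first regex that matches this domain.
         if [[ $domain =~ $regex ]]; then valid=false; regexcount=$((regexcount + 1)); break; fi
   done
      if $valid; then result=$((result + 1)); fi
done
echo -e "listsize=$list - regex matches=$regexcount - unique domains=$result"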

The result:
number of entries in the list: 136702
number of regex matches: 25342 (18.5%)
number of domains not covered by regex: 111360 (81.5%)

There are several topics on improving the gravity.list, based on the content of regex.list (here, here and here for example...). These are ingenious but complex procedures, which take a lot of processing time and complicate things.
It should, however, given an improved and much faster version of the script above, be possible to integrate this in (= pihole -g). Is it worth the effort?
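As a rough idea of what a faster version of the counting script could look like, here is a sketch that hands the whole regex list to `grep -E -f` in one pass instead of looping in the shell. It assumes every entry in the regex list is a valid POSIX extended regular expression and that lines starting with `#` are comments. The function name `count_regex_coverage` and its two file arguments are made up for this example.

```shell
# Hypothetical helper: count how many domains in a gravity-style list
# are matched by at least one pattern in a regex list.
# Usage: count_regex_coverage <gravityfile> <regexfile>
count_regex_coverage() {
    gravityfile=$1
    regexfile=$2
    patterns=$(mktemp)
    # Drop comment and empty lines; an empty pattern would match every domain.
    grep -v -e '^#' -e '^[[:space:]]*$' "$regexfile" > "$patterns"

    # Total number of lines in the gravity list.
    list=$(grep -c '' "$gravityfile")
    if [ -s "$patterns" ]; then
        # Lines matched by at least one pattern (unanchored, like [[ =~ ]]).
        regexcount=$(grep -c -E -f "$patterns" "$gravityfile")
    else
        regexcount=0
    fi
    result=$((list - regexcount))
    rm -f "$patterns"
    echo "listsize=$list - regex matches=$regexcount - unique domains=$result"
}
```

Called as `count_regex_coverage "$gravityfile" "$regexfile"`, it prints the same summary line as the original script. Swapping `-c` for `-v` on the second `grep` would instead print the domains not covered by any regex.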


It is more efficient to block domains using the normal method instead of regex. The domains are kept in a structure which allows constant time lookup (very fast). When using regex, the domains have to be checked (in the worst case) against every regex you have loaded, and only after all of that does it know if it is blocked or not (this result is kept until the regex is modified).
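To make the difference concrete, here is a small sketch of the two lookup paths described above. It is not how FTL is implemented; it only mimics the idea with awk: exact domains sit in an associative array (one hash lookup), while a domain that is not in the array is tried against every regex in turn and the verdict is cached. The function name `classify_queries` and the three file arguments are assumptions for illustration.

```shell
# Hypothetical helper: classify queried domains as described in the post.
# Usage: classify_queries <gravityfile> <regexfile> <queryfile>
classify_queries() {
    awk '
        # First file: exact domains go into an associative array (hash lookup).
        FILENAME == ARGV[1] { exact[$0] = 1; next }
        # Second file: regex patterns are stored in order, skipping comments.
        FILENAME == ARGV[2] { if ($0 != "" && $0 !~ /^#/) regex[++n] = $0; next }
        # Third file: one queried domain per line.
        {
            if ($0 in exact) { print $0, "blocked (exact)"; next }
            if ($0 in cache) { print $0, cache[$0], "(cached)"; next }
            # Worst case: every regex is tried before the domain is allowed.
            verdict = "allowed"
            for (i = 1; i <= n; i++)
                if ($0 ~ regex[i]) { verdict = "blocked (regex)"; break }
            cache[$0] = verdict
            print $0, verdict
        }
    ' "$1" "$2" "$3"
}
```

The second query for the same domain is answered from the cache without touching the regex list again, which corresponds to the "result is kept until the regex is modified" behaviour.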

As with unbound, there will be a delay the first time the domain is queried; after this, the answer is readily available.
I understand there are pros and cons to this approach, but, for example, I currently have 13949 (no typo, it's really 13949, as counted by notepad++ / find / count) matches for in my gravity list. I'm not sure which method will be the fastest to block the domain (wildcard, regex or gravity).
Other examples: 8397 631

I have, with the default output from pihole -g and no tweaks, 1280858 gravity entries, 15 regex expressions and 106 wildcard entries (this will be 53, covering IPv4 and IPv6, as soon as dnsmasq 2.80 is integrated in pihole-FTL).

But many of these are things like

which rarely exist (admittedly, this is only a feeling).

The fastest will be to have everything in gravity, as the dnsmasq cache is optimized for fast lookups of domains in cache buckets. This is much faster than wildcard or regex. Wildcards are looped over, and for each requested domain it is checked whether this domain is part of the wildcard. With regex it is much more dramatic, as every regex evaluation is a complex process that, with all the possible rules, is obviously much slower than a simple sub-string comparison. The high regex performance that is seen in FTL is only possible because we pre-compile the regex and hence make them as fast as possible in their execution.
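The wildcard case can be pictured with the sketch below. It is only an illustration of "loop over every wildcard and do a simple string comparison", not dnsmasq's actual code. It assumes a file with one wildcard domain per line; the function name `matches_wildcard` is hypothetical.

```shell
# Hypothetical helper: succeed if the domain equals a wildcard entry
# or is a subdomain of one. Every entry is tried in turn.
# Usage: matches_wildcard <domain> <wildcardfile>
matches_wildcard() {
    domain=$1
    while IFS= read -r wildcard; do
        # Skip empty lines.
        [ -n "$wildcard" ] || continue
        case "$domain" in
            # Exact match, or anything ending in ".<wildcard>".
            "$wildcard" | *."$wildcard") return 0 ;;
        esac
    done < "$2"
    return 1
}
```

The cost grows with the number of wildcard entries, since a non-matching domain is compared against all of them, whereas an exact gravity hit does not depend on the list size in the same way.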

TL;DR: gravity.list can grow as large as fits into memory. The only slowdown you'll observe is the initial loading. Neither wildcards nor regex filters have any initial overhead; however, their evaluation is notably or even significantly slower than a direct domain cache hit from domains imported from gravity.list.

I still don't understand, given the above statement, why you converted the wildcards into regex. I'm still using wildcard, just changed the filename to avoid regexconverter="/opt/pihole/"

That's why I feel cleaning up the gravity list would be a benefit. Doesn't pihole-FTL need to load the list every time you give it a kick (SIGHUP)? Wouldn't a cleaned gravity list benefit the load time?

Not so sure about that. From what I understand from @DL6ER's explanation, if your regex list is longer, it will take more time to do the evaluation. The same logic would apply to wildcards: the more you have, the longer it takes to process them.

I assume wildcards are read from file, by pihole-FTL, once, and kept in memory, since the dnsmasq file clearly states you need to stop/start dnsmasq for changes to apply.
I assume the same goes for regex, @DL6ER: could you please confirm/deny this?


Yes. Wildcards need a restart. Regex can also be updated on-the-fly by sending the signal SIGHUP to pihole-FTL. The difference is that dnsmasq is constructed such that configuration can only be read at startup, while pihole-FTL can re-read its config whenever needed. Regex is an FTL implementation; there is no regex code at all in the contained dnsmasq.