Why not use more modern regex tools?

Hi,

I was browsing through the source code of FTL and realized we are using "tre" as a regex engine. Why are we not using a more performant engine? I am not sure whether "RE2" would be a better option, but why not run a benchmark?

In this benchmark, things do not look good for tre :slight_smile:

In the past, users asked for approximate matching like agrep offers. None of the "competitors" in the test you linked supports this, or am I mistaken?

Pi-hole implements a regex cache so domains are typically evaluated against expressions once. Later queries for the same domain (by the same client) use the cache and do not touch the regex engine at all.
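To make the idea concrete, here is a minimal sketch of a per-client, per-domain result cache. It is not FTL's actual code (FTL is written in C and uses TRE); the class name `RegexCache`, its methods, and the use of Python's `re` module are assumptions made purely for illustration.

```python
import re


class RegexCache:
    """Illustrative only: remembers the regex verdict per (client, domain)."""

    def __init__(self, patterns):
        # Compile once; Python's re stands in for the real regex engine.
        self.patterns = [re.compile(p) for p in patterns]
        self.cache = {}        # (client, domain) -> bool
        self.evaluations = 0   # how often the regex engine was actually used

    def is_blocked(self, client, domain):
        key = (client, domain)
        if key not in self.cache:
            # First query for this client/domain pair: run all expressions.
            self.evaluations += 1
            self.cache[key] = any(p.search(domain) for p in self.patterns)
        # Every later query is answered from the cache.
        return self.cache[key]
```

With this sketch, asking twice for the same domain from the same client only increments `evaluations` once; the second answer comes straight from the dictionary.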

Interesting. What is the cache strategy? Is there a timeout? What triggers the cache flush? When I looked it up online, I saw lots of posts saying Pi-hole and regex should not be used together since regex will slow down queries. So I removed all my regexes.

Did you have any performance problems while you had regex entries?

Regex is a bit slower than searching gravity, but you are dealing with nanoseconds (or at most milliseconds) and would typically see zero impact that you can perceive.

Here is some background reading:

Evaluate regexes and cache the result per client and domain.

No, there is no need for a timeout, as any change that could possibly alter a result flushes the cache.

Adding/removing domains from the lists, a gravity update or changing groups associated with either clients or domains - everything that could cause a domain's blocking status to change. Pi-hole implements selective flushing per client so small changes do not reset everything.
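Selective flushing can be pictured roughly like this. Again, this is only a sketch under assumptions: `PerClientCache` and its method names are hypothetical and do not mirror FTL's internal data structures.

```python
from collections import defaultdict


class PerClientCache:
    """Illustrative only: cached blocking verdicts, grouped by client."""

    def __init__(self):
        self._results = defaultdict(dict)  # client -> {domain: blocked?}

    def store(self, client, domain, blocked):
        self._results[client][domain] = blocked

    def lookup(self, client, domain):
        # Returns None when nothing is cached for this client/domain pair.
        return self._results.get(client, {}).get(domain)

    def flush_client(self, client):
        # Small change (e.g. one client's groups changed): drop only its entries.
        self._results.pop(client, None)

    def flush_all(self):
        # Global change (e.g. a gravity update): drop everything.
        self._results.clear()
```

A change affecting a single client would call `flush_client`, leaving the cached results of all other clients untouched, while something like a gravity update would call `flush_all`.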

See @jfb's answer.


Edit: To expand on this very incomplete statement:

is more about Pi-hole being so extremely fast with exact lookups, as it is using a B-tree algorithm for its lookups. This has all been explained very often on this forum, see e.g. here for details about the particular why:

Very short: one million regexes will take several seconds to evaluate, while one million domains will take less than a millisecond to evaluate. Having two million domains and 500 regexes is probably fine; the other way around surely is not. This general statement would not change much even if the regex engine were three times faster.
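If you want to get a feeling for the scaling yourself, a rough experiment could look like the sketch below. Note the assumptions: a Python `set` (hash-based, not a B-tree) stands in for the indexed exact lookup, Python's `re` stands in for the regex engine, and all function names and the generated domains are made up. Absolute numbers will differ from FTL; only the trend (one indexed lookup versus trying every expression in turn) is the point.

```python
import re
import time


def build_exact_list(n):
    # Hypothetical blocked domains; the set plays the role of an indexed list.
    return {f"ads{i}.example.com" for i in range(n)}


def build_regex_list(n):
    # Hypothetical expressions, compiled once up front.
    return [re.compile(rf"^ads{i}\.example\.com$") for i in range(n)]


def exact_match(domains, name):
    # A single indexed lookup, largely independent of the list size.
    return name in domains


def regex_match(patterns, name):
    # Expressions are tried one after another, so cost grows with their count.
    return any(p.search(name) for p in patterns)


def time_lookup(fn, haystack, name, repeats=100):
    # Average wall-clock seconds per lookup.
    start = time.perf_counter()
    for _ in range(repeats):
        fn(haystack, name)
    return (time.perf_counter() - start) / repeats
```

For example, compare `time_lookup(exact_match, build_exact_list(100000), "not-listed.example.org")` with `time_lookup(regex_match, build_regex_list(5000), "not-listed.example.org")` and then vary the list sizes. A domain that is not on any list is the worst case for the regex side, because every expression has to be tried.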

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.