If a domain/host would be blocked by both the default blocklists and a regex, which one does pihole use preferentially?
I'm asking because I'm not sure whether bulk importing a regex list that may overlap with domains already on the blocklist will cause pihole performance to suffer if regex is preferred over the blocklist.
Ideally, I would need to sort out the regex and only add the ones that are not covered already by the blocklist.
If a domain is found anywhere from top to bottom, FTL skips the rest of the tests.
Yes. The drawback of regex is that any regex filter could match any domain. Hence, we have to loop over all enabled regex filters for each domain that is not either whitelisted or exactly blocked before.
The performance impact of a very large number of regex filters can be quite hefty on low-end devices like a Pi Zero. In contrast, the exact lists use a B-tree for the lookups to ensure that the highest performance is reached even when you load millions of domains into these lists.
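To illustrate the cost difference described above, here is a minimal Python sketch, not FTL's actual code. The domain names and filters are made up, a Python set stands in for the B-tree, and the check order (whitelist, then exact blocks, then regex) simply follows the reply above rather than any particular Pi-hole version.

```python
import re

# Hypothetical data; a Python set stands in for the B-tree used by the exact lists.
whitelist = {"good.example"}
exact_blocked = {"ads.example", "tracker.example"}
regex_filters = [re.compile(p) for p in (r"(^|\.)jms\.bogus$", r"^ad[0-9]*\.")]


def classify(domain):
    """Return (verdict, number of regex filters that had to be tried)."""
    if domain in whitelist:
        return "allowed", 0
    if domain in exact_blocked:
        return "blocked (exact)", 0
    # Only domains that are neither whitelisted nor exactly blocked reach
    # this loop, and any filter could match, so each one is tried in turn.
    tried = 0
    for rx in regex_filters:
        tried += 1
        if rx.search(domain):
            return "blocked (regex)", tried
    return "allowed", tried


for name in ("ads.example", "ads.jms.bogus", "plain.example"):
    print(name, classify(name))
```

A domain that matches nothing is the worst case: every enabled filter is tried before it is allowed through.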
@DL6ER, thank you for your thoughts and insights. I started doing ad blocking with squid as described here before discovering pihole. In my use case, I actually use both now: pihole as a DNS blocker and squid to share a VPN connection.
As instructed, it uses Peter Lowe's list. I've tried adding the pgl list in hosts format to the pihole blocklist, but it really works better as a regex blocker. If the regex list is long (pgl is over 3k entries), am I better off letting squid handle the regex blocking rather than burden pihole with regex? Thanks again.
It depends on your device. The regex validation in Pi-hole is highly performant and is implemented "bare metal". I doubt any other software can implement regex filtering even faster than we do. You should just try it. If you're on a Pi Zero, anything will be slow, and the proxy will, for sure, be vastly slower than the Pi-hole. If you're on a VM on a sufficiently beefy machine, you can do almost anything you want with your Pi-hole without slowing things down too much.
We actually tested that FTL works in environments with several hundred clients, over 1 million domains on the blocking lists and over 5k regex filters. All that with around 23 million queries per day. Everything worked fine without any notable delay. This test setup ran on an 8 GB, 1 CPU VM. Things just look different on SBC hardware, so you should just try it.
Yep, definitely will give it a try. Last question (may be a FAQ), when a domain/host matches a regex (black or white list), is it cached in pihole for the TTL? Or, does the regex matching happen on every query for said domain/host? Or does it get added to Gravity?
Yes, FTL caches the result of the analysis for a given domain/host. If the same domain comes up a second time, FTL already knows how to handle it and does not need to go through all the regex filters.
Just for clarification, it is my understanding the regex rules, e.g. (^|\.)jms.bogus$, are not added to Gravity; I think @DL6ER explained as much in his initial reply to me.
What I was getting at in my last question about regex and Gravity is whether a specific domain, e.g. ads.jms.bogus, that matches a regex filter gets added to Gravity? I think what @jfb is saying is it does not.
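For reference, here is a small Python check of what the filter quoted above matches. This is only a sketch: Python's `re` module is used as a stand-in for the regex engine in FTL, and the test hostnames other than ads.jms.bogus are made up. The filter is copied exactly as written, so the dot in "jms.bogus" is unescaped and matches any character.

```python
import re

# The filter from the post, copied as written (unescaped dot in "jms.bogus").
rx = re.compile(r"(^|\.)jms.bogus$")


def matches(domain):
    """True if the filter matches anywhere in the domain."""
    return rx.search(domain) is not None


for domain in ("jms.bogus", "ads.jms.bogus", "notjms.bogus", "jmsxbogus"):
    print(domain, matches(domain))
```

The `(^|\.)` prefix is what restricts matches to the domain itself and its subdomains. Writing `jms\.bogus` would stop the filter from also matching a name like jmsxbogus.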
Does it make sense to add regex matching hosts to Gravity? It would in a sense make pihole "smart" and able to learn. Just thinking out loud.
What would happen when you remove the regex? All those domains would have to be searched and removed from gravity.
One possibility is for Gravity to simply be reset back to its enabled blocklists when regex is updated in any way. Obviously, it would have to re-learn hosts from any subsequent regex matches.
First: Yes, you are right, in Pi-hole v4.x (which is what you're using), regex comes before gravity. This is not the case for Pi-hole v5.x, which is what we have been working on for more than half a year. A lot has changed and the internal machinery has been optimized in many ways in the meantime.
Second: Adding to gravity is sort of what is happening. If you added such an enormous number of regex filters that you saw a notable delay on the first query, you would still see no delay on any subsequent query to the same domain. This is caused by FTL memorizing internally
I have seen this domain before and know how to handle it
This knowledge survives until you edit the lists, gravity updates, or FTL is restarted. So you're both kind of correct in your assumptions: Yes, a regex match does not get added to gravity; however, FTL does still memorize and reuse the outcome of its analysis.
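The memorizing behaviour described above can be sketched as a per-domain cache that is discarded whenever the filters change. This is an illustrative Python model only. The class and method names are hypothetical, and FTL's real data structures are not shown in this thread.

```python
import re


class RegexCache:
    """Toy model: remember the regex verdict per domain until filters change."""

    def __init__(self, patterns):
        self.evaluations = 0  # how often the full filter loop actually ran
        self.set_patterns(patterns)

    def set_patterns(self, patterns):
        # Editing the filters throws away everything learned so far,
        # so removed filters cannot leave stale verdicts behind.
        self.filters = [re.compile(p) for p in patterns]
        self.cache = {}

    def is_blocked(self, domain):
        if domain in self.cache:
            # "I have seen this domain before and know how to handle it"
            return self.cache[domain]
        self.evaluations += 1
        verdict = any(rx.search(domain) for rx in self.filters)
        self.cache[domain] = verdict
        return verdict


cache = RegexCache([r"(^|\.)jms\.bogus$"])
cache.is_blocked("ads.jms.bogus")  # first query: runs through the filters
cache.is_blocked("ads.jms.bogus")  # second query: answered from the cache
print(cache.evaluations)
```

Because the cache is rebuilt from scratch on every filter change, the "what happens when you remove the regex" problem raised earlier does not arise in this model.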
This forum also serves as an archive for information, so posting information that will soon be outdated is not helpful. Furthermore, I didn't recall that Pi-hole v4.x was doing it differently because - on my Pi - the order I described above has been in use for more than half a year.
Pi-hole v5.0 is in late-alpha phase with a few code changes still awaiting review and merge before we enter a full public beta-round, followed by the official release.
We develop everything fully in the open, so you can switch right now if you want to give it a go. It is pretty stable; there are only some minor glitches left. Note that when you decide to switch to v5.0, there will be no way of going back to v4.x without significant data loss (data is migrated into a new format). You should create a backup of /etc/pihole in case you want to be able to switch back.
Here's the pihole version I'm on now after checkout dev:
# pihole -v
Pi-hole version is v4.3.2-370-g8ae03b6 (Latest: v4.3.2)
AdminLTE version is v4.3.2-345-g7d1e5eb (Latest: v4.3.2)
FTL version is vDev-ecbeb95 (Latest: v4.3.1)
How do I bulk load a list of regex? The stable v4 uses /etc/pihole/regex.list. The dev version seems to store everything in gravity.db.