Allow to prioritize RegEx over domainlist

yubiuser · June 23, 2022, 2:56pm

Adlists take precedence over regex blacklist rules when blocking happens. Basically, there are some domains that I want to set the ";reply=" option on, but these domains are already being blocked by various adlists I use.

I really wish there was a priority setting or something we could set on regex blacklist rules to make them override our adlists when needed.

jpgpi250 · June 26, 2022, 9:28am

Same thing goes for the settings in the dnsmasq configuration files.

If I'm correct, the current order is (see the decision tree in another post):

Exact Whitelist
Regex Whitelist
Exact Blacklist
Blocklist domains (AKA gravity)
Regex Blacklist
dnsmasq configuration files (/etc/dnsmasq.d/)
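
To illustrate why this order matters for the original request, here is a minimal Python sketch of a first-match-wins lookup. It is not FTL's code (FTL is written in C); the set and list names are hypothetical stand-ins, and the sample entries are only for illustration.

```python
import re

# Hypothetical in-memory stand-ins for Pi-hole's lists (assumption:
# the real lists live in the gravity database and are handled in C).
EXACT_WHITELIST = set()
REGEX_WHITELIST = []
EXACT_BLACKLIST = set()
GRAVITY = {"adebis.hotstaff.co.jp"}
# The ";reply=" suffix is left out here; only the pattern part is matched.
REGEX_BLACKLIST = [r"^adebis\.hotstaff\.co\.jp$"]


def first_match(domain):
    """Return the name of the first step that matches, in the order listed above."""
    if domain in EXACT_WHITELIST:
        return "exact whitelist"
    if any(re.search(p, domain) for p in REGEX_WHITELIST):
        return "regex whitelist"
    if domain in EXACT_BLACKLIST:
        return "exact blacklist"
    if domain in GRAVITY:
        return "gravity"
    if any(re.search(p, domain) for p in REGEX_BLACKLIST):
        return "regex blacklist"
    # Nothing matched: the query is handed on to dnsmasq.
    return "dnsmasq"


print(first_match("adebis.hotstaff.co.jp"))  # -> gravity
print(first_match("example.com"))            # -> dnsmasq
```

Because the domain is found in gravity first, the regex blacklist rule (and any ";reply=" option attached to it) is never reached in this model.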

Personally, I consider the dnsmasq configuration files a sort of "whitelist" entry, meaning that entries in these files should be treated as the desired answer and should take precedence over blacklist entries.

The problem you're describing (set the ";reply=" option) could possibly / alternatively be solved by prioritizing (all? / specific?) dnsmasq configuration files. You would then be able to specify the desired response in a dnsmasq conf file.

Whatever option is chosen, I assume changing the order would impact pihole-FTL's response time to queries. DL6ER probably has pro / con arguments not to change the current order...

edit
Somewhat confused here. Thinking further about this, I added a regex whitelist entry, as this may (possibly - you haven't provided an example) solve your dilemma (add a whitelist entry with the desired reply as opposed to a blacklist entry):

^adebis.hotstaff.co.jp$;reply=192.168.2.78

The entry is on a list:

pihole -q adebis.hotstaff.co.jp
 Match found in https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts:
   adebis.hotstaff.co.jp
 Match found in https://dbl.oisd.nl:
   adebis.hotstaff.co.jp

I would assume, given the decision tree, that this would trigger the whitelist regex entry, and the regex test says it does:

 pihole-FTL regex-test adebis.hotstaff.co.jp ^adebis.hotstaff.co.jp$;reply=192.168.2.78
[i] Compiling regex filter...
    Compiled regex filter in 0.220 msec

Checking domain...
    ^adebis.hotstaff.co.jp$ matches
   Time: 0.073 msec

However, dig doesn't reply with the configured address (192.168.2.78); the real address is returned.

There are 2 possible causes (I think):

The dig reply indicates a CNAME; doesn't the regex look at these entries?
The reply types aren't operational for whitelist entries?

@DL6ER, could you take a look?
/edit

yubiuser · June 26, 2022, 11:56am

I think this is the case here. I tried it with a domain that does not return a CNAME, first with the rule on the regex blacklist, then with the same rule on the regex whitelist:

rockpi@rockpi-4b:~$ pihole-FTL regex-test test.co.jp
[i] Checking domain against blacklist...
    ^test.co.jp$;reply=1.1.1.1 matches (regex blacklist, DB ID 213)
    Hint: This regex forces reply type IP
    Time: 0.398 msec
[i] Checking domain against whitelist...
    Time: 0.018 msec
rockpi@rockpi-4b:~$ pihole-FTL regex-test test.co.jp
[i] Checking domain against blacklist...
    Time: 0.218 msec
[i] Checking domain against whitelist...
    ^test.co.jp$;reply=1.1.1.1 matches (regex whitelist, DB ID 214)
    Hint: This regex forces reply type IP
    Time: 0.189 msec

Corresponding dig output (blacklist rule first, then whitelist rule):

dig test.co.jp

;; ANSWER SECTION:
test.co.jp.		2	IN	A	1.1.1.1

dig test.co.jp
;; ANSWER SECTION:
test.co.jp.		1800	IN	A	157.7.44.240

jpgpi250 · June 26, 2022, 12:28pm

This looks like something that can be fixed (reply types for whitelist regex).

Would this solve your use case, without honoring your initial request (Allow to prioritize RegEx over domainlist)?

Since you have a shorter line to DL6ER (moderator -> developer), you could ask him...

Anova3 · June 27, 2022, 6:10pm

I was the one originally with the problem in the linked thread. I also tried the whitelist thing at one point and figured out it didn't work.

And yes, it would technically solve my use case if the ;reply= extension worked with whitelists, on two conditions:

It also has to block other kinds of replies from upstream servers. (i.e. I don't want it to reply with anything else in addition to my whitelist rule's reply. DNS servers can often do this when multiple results are available for a domain name, and I fear that would undermine the point.)
It needs to show the domain(s) as 'blocked' in the query log and other such stats. I know this is probably more of a nit-pick, but it's one of the reasons I didn't suggest a change to the whitelist in the first place. (It's a blocking rule, so it should be a blacklist entry - it just makes more sense that way.)

Ideally, I just need a way to override whatever is in the current adlists I'm using with a reply type other than the default or configured blocking type. (and if all my wishes came true, a TTL setting per domain would be awesome)
If this means catching it on the front end and supplying a list of domains I don't want imported into gravity, then that's fine too.

The core problem is that some devices and programs don't respond favorably to the 'nodata' reply type, so other types of blocking are required in very specific cases to make them behave. While this is an edge case in terms of % of devices affected, it seems to happen commonly enough to enough people to warrant some kind of solution. (i.e. most people seem to have at least one such device causing a ruckus)

I understand the reason the priority on these different lists is the way it is. Performance is important, and I know how hard that can be to optimize, and I know there are probably some other folks who like it where it is now because they have complex regex rules that would take a while to run if they had to be run against every blocked query.
That's why I suggested the option of being able to elevate individual rules in priority. Either that or just a reply type option on the normal blacklist, since that still happens before gravity.

jpgpi250 · June 27, 2022, 6:49pm

Found the following code in regex.c:

// Return redirection addresses for a given blacklist regex (if specified)
bool regex_get_redirect(const int dbID, struct in_addr *addr4, struct in6_addr *addr6)

I assume this code (the comment indicates this) is currently only executed for blacklist regexes. In order to achieve what we all want (solve our use case by using a whitelist regex with a reply specification), this code (or similar) needs to be executed for whitelist regexes. Not within my skillset to achieve this, developer help required...

Willing to test.

Bucking_Horn · June 28, 2022, 8:38am

I agree: Adlists are always exact domains exclusively.
Instead of elevating specific regex, it would seem to me that supporting ;reply= for such exactly blocked domains would address what is asked for most adequately.

And a probably not so helpful and unrelated personal comment ;)

The core problem is that companies expect to be paid for being able to pry into and monetise our data. Personally, I have little patience with devices that would misbehave in such a way - I just return them to the retailer (I understand that is a rather strong sentiment and may not be an option if you'd somehow depend on a specific device).
Apart from raising an issue with the manufacturer or hoping for legal regulation, that seems like an adequate way to address the issue.
In my opinion, at the very least, such devices should work out of the box without leaking data unauthorised, offering an opt-in to allow them to collect any metrics, if at all. That may probably spare us from taking up technical measures like DNS filters to contain them. But my bet would be that the majority would search for ways to circumvent those technical measures, rather than giving you any degree of control over your data.

Anova3 · June 28, 2022, 11:55am

Agreed on all counts. I can't replace my devices/apps at the moment, and the vendors will not respond to any complaints. Being a technophobe isn't going to save us here.

Back on topic: this feature request would also provide a more flexible and graphical method of implementing the blocks for iCloud Private Relay and the Mozilla canary domain.
Normally you have to open the FTLDNS config to insert those blocks. This could do the same thing, but within the web UI, and it would cover a handful of other similar/future use cases like this without having to implement more one-off FTLDNS config settings.

jfb · June 28, 2022, 1:53pm

How so? The specific response provided by Pi-hole to those domains is NXDOMAIN, not the default NULL for other blocked domains.

Anova3 · June 28, 2022, 8:33pm

If we can specify a ;reply=nxdomain for those domains in such a way that it overrides adlists, then isn't that the same thing as what those settings are doing?

Example blacklist entry for mozilla canary:
use-application-dns\.net;reply=nxdomain
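
For readers unfamiliar with the notation, here is a minimal Python sketch of how such an entry splits into a pattern and a reply value. It is only an illustration of the syntax used in this thread, not FTL's parser; the function name `split_reply` is hypothetical, and it assumes that ";reply=" is the only extension on the entry.

```python
import ipaddress


def split_reply(entry):
    """Split 'pattern;reply=value' into (pattern, reply).

    reply is None when no ';reply=' suffix is present, an ipaddress
    object when the value is an IP address, otherwise the lower-cased
    keyword (for example 'nxdomain' or 'refused').
    """
    pattern, sep, value = entry.partition(";reply=")
    if not sep:
        return entry, None
    try:
        return pattern, ipaddress.ip_address(value)
    except ValueError:
        # Not an IP address, so treat it as a reply-type keyword.
        return pattern, value.lower()


print(split_reply(r"use-application-dns\.net;reply=nxdomain"))
print(split_reply(r"^api2\.branch\.io$;reply=192.168.192.168"))
```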

jpgpi250 · June 28, 2022, 10:10pm

Never tested this, but does pihole-FTL block anything while pihole is disabled (menu or desktop shortcut)?

User example: a page doesn't load, so the user hits the desktop shortcut (or any other method) to disable pihole temporarily. The page can now be refreshed and loads as expected. Unfortunately, all other devices on the network (Network-wide Ad Blocking) are also no longer blocked (temporarily) and can possibly pick up the IP address(es) for the otherwise blocked domain(s).

To overcome (prevent) this (when using unbound), you can add a configuration file to unbound to provide the desired reply, even if pihole is temporarily disabled. Example (/etc/unbound/unbound.conf.d/blocked.conf):

# https://nlnetlabs.nl/documentation/unbound/unbound.conf/
server:
        local-zone: "use-application-dns.net." always_nxdomain
        local-zone: "mask.icloud.com." always_nxdomain
        local-zone: "mask-h2.icloud.com." always_nxdomain
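
If the list of domains grows, a file like the one above can be generated instead of typed. The following Python sketch shows one way to do that; the function name `unbound_local_zones` is hypothetical, and the output simply mirrors the layout of the example above.

```python
def unbound_local_zones(domains, zone_type="always_nxdomain"):
    """Build the text of an unbound 'server:' section with one local-zone per domain."""
    lines = ["server:"]
    for domain in domains:
        # unbound zone names are written fully qualified, with a trailing dot.
        fqdn = domain if domain.endswith(".") else domain + "."
        lines.append(f'        local-zone: "{fqdn}" {zone_type}')
    return "\n".join(lines)


print(unbound_local_zones([
    "use-application-dns.net",
    "mask.icloud.com",
    "mask-h2.icloud.com",
]))
```

The result still has to be written to the conf file and unbound reloaded by hand; the sketch does neither.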

If you are specifically targeting (o)DoH, read this unbound manual, explaining how response policy zones work. The (o)DoH rpz file contains the known domains from various sources.

jfb · June 28, 2022, 10:48pm

No.

pi@Pi-3B-DEV:~ $ pihole disable
  [i] Disabling blocking
  [✓] Reloading DNS lists
  [✓] Pi-hole Disabled
pi@Pi-3B-DEV:~ $ dig mask.icloud.com

; <<>> DiG 9.16.27-Raspbian <<>> mask.icloud.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 13021
;; flags: qr rd ra; QUERY: 1, ANSWER: 7, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;mask.icloud.com.		IN	A

;; ANSWER SECTION:
mask.icloud.com.	43200	IN	CNAME	mask.apple-dns.net.
mask.apple-dns.net.	3600	IN	A	17.248.139.229
mask.apple-dns.net.	3600	IN	A	17.248.139.231
mask.apple-dns.net.	3600	IN	A	17.248.139.230
mask.apple-dns.net.	3600	IN	A	17.248.139.232
mask.apple-dns.net.	3600	IN	A	17.248.139.228
mask.apple-dns.net.	3600	IN	A	17.248.139.233

;; Query time: 109 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Tue Jun 28 17:47:48 CDT 2022
;; MSG SIZE  rcvd: 172

pi@Pi-3B-DEV:~ $ pihole enable
  [i] Enabling blocking
  [✓] Reloading DNS lists
  [✓] Pi-hole Enabled
pi@Pi-3B-DEV:~ $ dig mask.icloud.com

; <<>> DiG 9.16.27-Raspbian <<>> mask.icloud.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 40088
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;mask.icloud.com.		IN	A

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Tue Jun 28 17:48:07 CDT 2022
;; MSG SIZE  rcvd: 44
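
The relevant difference between the two outputs is the status in the header line (NOERROR while disabled, NXDOMAIN while enabled). When repeating this test in a script, that field can be pulled out of captured dig output with a small helper like the Python sketch below; the function name `dig_status` is hypothetical and the helper only looks at the text, it does not run dig itself.

```python
import re


def dig_status(output):
    """Return the status field from dig's header line, or None if it is missing."""
    match = re.search(r"status: ([A-Z]+)", output)
    return match.group(1) if match else None


disabled = ";; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 13021"
enabled = ";; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 40088"
print(dig_status(disabled))  # -> NOERROR
print(dig_status(enabled))   # -> NXDOMAIN
```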

DanSchaper · June 29, 2022, 12:30am

This has been a long-time nit for me. We've modified the development code to change the default condition when FTL is offline in an error condition. I remember having a discussion about being able to disable per group instead of for the system as a whole; I don't remember how that chat ended.

RPZs are kind of interesting to me. I remember Paul Vixie talking about them a few years ago in relation to Pi-hole...

jpgpi250 · June 29, 2022, 7:14am

I've been using them for a while now, there are some interesting sources (other than mine - (o)DoH only), example: urlhaus, updated every 5 minutes.

After I made a request, the unbound team changed their code to allow pihole-FTL to (sort of) indicate that a domain was blocked due to an RPZ entry.

A long time ago, DL6ER added code to indicate a domain was blocked by external (status 8 - By upstream server (NXDOMAIN with RA bit unset)). This was originally intended to give a different status for domains, filtered by quad9 (if I remember correctly).
This status value can now be used, if configured in the unbound rpz config, to indicate a domain was blocked by an RPZ entry (unbound, but also possible with bind, knot-resolver, ...), details in my RPZ document.

edit

thanks for the test (blocking when pihole is disabled).

A user can add additional entries to the above blocked.conf example to ensure blocking is applied for domains that require blocking, regardless of pihole-FTL disabled/enabled status. Example:

local-zone: "userlocation.googleapis.com." always_null

Notice that the canary domains were blocked with always_nxdomain to trigger the desired response (example: the Firefox browser), but it is also possible to use always_null to deliver the same response as pihole-FTL (default blocking mode) would provide.

If you're going to use a large number of entries, response policy zones are a better option.
/edit

DanSchaper · June 29, 2022, 7:43pm

Yes, you remember correctly. And nice to know that it's useful in other places as well.

DL6ER · June 29, 2022, 8:16pm

Sorry, I have been swamped with other (real life) projects and didn't check back here for quite some time.

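Firstly, yes, this looks about right (I changed your itemization into an enumeration):

jpgpi250:

1. Exact Whitelist
2. Regex Whitelist
3. Exact Blacklist
4. Blocklist domains (AKA gravity)
5. Regex Blacklist
6. dnsmasq configuration files (/etc/dnsmasq.d/)

with some comments: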

There is a "zeroth" step you have to prepend which are the "special domains" (use-application-dns.net and mask.icloud.com + mask-h2.icloud.com). But their behavior can be changed easily via config options.
Executing regular expressions is a lot more computationally expensive than doing lookups in our exact domain. While the exact domains are organized in a B-tree (allowing logarithmic lookup times), regular expressions have to be iterated one after another for every domain. Running a single regular expression against a domain can easily be more work than comparing the domain against several millions of exact domains (not because regular expressions are so slow but because looking up in the tree is so blazingly fast).
Furthermore, FTL tries to stay out of your everyday browsing as much as possible. Hence, we look at whitelisted domains first (this is typically stuff you want to visit) while you usually don't care too much if it takes some extra milliseconds until ads are blocked.
You have to make an important distinction in the code-flow between 5. and 6. because this is where we leave FTL code and enter dnsmasq code. When nothing matches in steps 0-5, FTL hands the query to the embedded dnsmasq core as if nothing had happened. Hence, we have no influence on prioritizing dnsmasq config files.
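
The cost difference between exact lookups and regular expressions described in these comments can be sketched in Python. A sorted list with binary search stands in for the B-tree here (both give logarithmic lookups); the domain names and patterns are made up for illustration, and none of this is FTL code.

```python
import bisect
import re

# Assumption: a sorted list plus binary search models the B-tree lookup.
exact_domains = sorted(f"ads{i}.example.com" for i in range(100_000))


def in_exact(domain):
    """Logarithmic lookup: about 17 comparisons for 100,000 entries."""
    i = bisect.bisect_left(exact_domains, domain)
    return i < len(exact_domains) and exact_domains[i] == domain


# Hypothetical regex rules; each one is a separate pattern to run.
regexes = [re.compile(p) for p in (r"^ads[0-9]+\.example\.com$", r"(^|\.)tracker\.")]


def in_regex(domain):
    """Linear scan: every pattern may have to be tried for every domain."""
    return any(r.search(domain) for r in regexes)


print(in_exact("ads42.example.com"))    # -> True
print(in_regex("tracker.example.net"))  # -> True
print(in_regex("example.org"))          # -> False
```

The exact lookup cost grows with the logarithm of the list size, while the regex cost grows with the number of rules, which is the reason given above for checking the exact lists first.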

The second being

which is something worth considering. However, it also needs some (maybe non-trivial) code changes, as whitelist is currently simply pushing queries through as fast as possible to the embedded dnsmasq for further processing. Equipping whitelist with the reply option means we need to add another code path that basically handles these queries as if they were blacklisted (while they are not) so FTL steps in and crafts the DNS response with the requested reply type.

I'll think about how to achieve this, however, it will either be hacky or require some good amount of generalization in the code (basically widely remove the separate code paths for black and white domains). As it will be the latter, it is some work.
Feel free to ping me when you get the impression that I forgot about this.

DL6ER · June 29, 2022, 8:37pm

Adding to what I wrote (I just re-read some posts here):

Arbitrarily changing priority would be complicated to implement. Allowing users to specify that blacklist regex should be looked up before the adlists would be fairly simple; however, I'm not sure how easy it would be to explain what this config option is doing. When a feature isn't easily explainable, it seems wrong to have it (as there should be a better alternative).
We cannot add extensions like ;reply=... to the exact lists as this would completely destroy the extremely favorable fast lookup times.
If you whitelist, you can do everything you want with this domain in a dnsmasq config file right now, so this comment

jpgpi250:

You would then be able to specify the desired response in a dnsmasq conf file.

is directly applicable to whitelisted domains even without any code changes and with all versions of FTL.

jpgpi250 · June 29, 2022, 8:56pm

@yubiuser: you really need to provide the example for which you are trying to change the response.

If I understand DL6ER correctly, this would mean that the reply to a normal dig for an exactly whitelisted domain can be changed with a dnsmasq conf entry.

Just tested this:

whitelist entry for assets.adobedtm.com
dig assets.adobedtm.com -> 2.21.17.6
added dnsmasq conf file:

address=/assets.adobedtm.com/192.168.2.74

dig assets.adobedtm.com -> 192.168.2.74

Thus, confirmed, this clearly eliminates (partially??) the need for whitelist regex entries with the ;reply= option
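
For more than a handful of domains, the address= lines can be generated from a mapping. The Python sketch below shows the idea; the function name `dnsmasq_address_lines` is hypothetical, only the IP-address form of address= tested above is covered, and each domain is assumed to be whitelisted in Pi-hole as well so that the query reaches dnsmasq at all.

```python
import ipaddress


def dnsmasq_address_lines(overrides):
    """Turn {domain: ip} into dnsmasq 'address=/domain/ip' lines."""
    lines = []
    for domain, ip in sorted(overrides.items()):
        # Raises ValueError on a malformed address before anything is written.
        ipaddress.ip_address(ip)
        lines.append(f"address=/{domain}/{ip}")
    return lines


print("\n".join(dnsmasq_address_lines({"assets.adobedtm.com": "192.168.2.74"})))
# -> address=/assets.adobedtm.com/192.168.2.74
```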

Thank you DL6ER, for your valuable insight (as always)...

jfb · June 29, 2022, 9:08pm

Yubi is not the originator of the thread. He made the feature request in response to a separate post as linked at the top of this feature request. The author of the original post is the person who can provide the matching examples.

Anova3 · June 30, 2022, 2:30pm

I'm the originator of the help ticket.

Some examples of things I'm using, since you asked for them:
^scribe\.logs\.roku\.com$;reply=refused
^api2\.branch\.io$;reply=192.168.192.168
^cws\.conviva\.com$;reply=nxdomain

It's all well and good that folks want to be thorough with all of this, but I feel like we've over-complicated the problem a bit.

I have specific domains (not some kind of wildcard range of domains) that I want to give specific reply types to. The devices calling them get spammy when supplied with the default nodata reply type.

The catch is, these domains are already in the adlists I use (and not just one or two) and they deserve to stay there for folks who don't know enough yet to pursue alternative blocking modes like I'm doing now to optimize my network.

Regex blacklists have this feature, but they're overridden if the domain in question is in an adlist. That's the only reason I brought them into this.

I don't need full regex blocking. I just need the ability to specify a reply type for specific domains, and I need that to override what's in the adlists.
We've suggested a lot of options for implementing this feature, and regex lists have been a recurring thing mentioned, but regex itself is not needed.

So, if you tell me there's a way to accomplish this by putting some lines in the dnsmasq conf files, and whitelisting the domains in FTLDNS, then that's great! I don't know how to do that yet, but I can figure it out, I'm sure.
One question about this method: do domains whitelisted by FTLDNS and blocked/redirected by dnsmasq still show up in the logs as blocked, or are they shown as allowed?

Alternatively, is there a chance we could have a way to write custom "zeroth step" rules? In other words, is there a way we can specify our own "special domains"? This sounds close to what I'm trying to do. I'm guessing there's some reason we can't do this, but I figured it didn't hurt to ask.