Use @@ to whitelist in ABP style adlists

Indeed, the filter workflow does not appear to be documented to that level. Perhaps the code on GitHub will reveal how it behaves?

In terms of the use case presented in the opening post

the blocks would be treated as wildcards, as they are now

||c.d.e^ equivalent to regex blacklist (^|\.)c\.d\.e$

and exceptions would be treated as exact (representing the "it" that is being whitelisted mentioned in the use case). That also eliminates the uncertainty in the last couple of examples you gave.

@@||b.c.d.e^ equivalent to exact whitelist b.c.d.e (or regex whitelist ^b\.c\.d\.e$)
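
To make the two equivalences concrete, here is a minimal Python sketch that turns a bare domain into the corresponding anchored regex. The helper names `block_regex`, `exact_regex` and `matches` are made up for illustration and are not Pi-hole functions. The block pattern anchors on a label boundary, because a looser pattern such as ^.*c\.d\.e$ would also match an unrelated name like xc.d.e.

```python
import re

def block_regex(domain: str) -> str:
    """Regex for '||domain^': the domain itself plus every subdomain."""
    return r"(^|\.)" + re.escape(domain) + "$"

def exact_regex(domain: str) -> str:
    """Regex for an exact entry: the domain only, no subdomains."""
    return "^" + re.escape(domain) + "$"

def matches(pattern: str, query: str) -> bool:
    """True if the query domain matches the given regex."""
    return re.search(pattern, query) is not None
```

With these helpers, `matches(block_regex("c.d.e"), "x.c.d.e")` is true, while `matches(exact_regex("b.c.d.e"), "x.b.c.d.e")` is false.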

But I agree that's a single use-case interpretation. Perhaps the filter list maintainer intended the latter to whitelist all subdomains of b.c.d.e so that it is actually useful in their curated list, and it's not clear whether that's how ABP interprets that syntax.

Treating as above would result in:

||c.d.e^       rule 1
@@||b.c.d.e^   rule 2
||a.b.c.d.e^   rule 3
  • c.d.e blocked by rule 1
  • x.c.d.e blocked by rule 1
  • x.x.x.x.c.d.e blocked by rule 1
  • b.c.d.e permitted by rule 2 which is a whitelist and therefore takes priority
  • x.x.x.b.c.d.e blocked by rule 1 since rule 2 is exact
  • a.b.c.d.e blocked by rule 1 or rule 3 (whichever is found in gravity first) since rule 2 is exact
  • x.x.x.a.b.c.d.e blocked by rule 1 or rule 3 (whichever is found in gravity first) since rule 2 is exact

Yeah, it's wildcard following the link @jfb posted above:

(I'm referring to the "structured the same as for blocking rules" part)

Maybe then whitelist the entire thing and give it priority as with current whitelists.

||c.d.e^       rule 1
@@||b.c.d.e^   rule 2
||a.b.c.d.e^   rule 3
  • c.d.e blocked by rule 1
  • x.c.d.e blocked by rule 1
  • x.x.x.x.c.d.e blocked by rule 1
  • b.c.d.e permitted by rule 2
  • x.x.x.b.c.d.e permitted by rule 2 since whitelists have priority and this aligns with the intent of that ABP "domain plus all subdomains" expression
  • a.b.c.d.e permitted by rule 2 which takes priority over rule 1 or rule 3
  • x.x.x.a.b.c.d.e permitted by rule 2 which takes priority over rule 1 or rule 3
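
For anyone who wants to play with the two interpretations (exact exceptions versus wildcard exceptions), here is a small Python sketch. It is only a model of the tables in this thread, not how gravity or FTL actually work; `covers`, `verdict` and the `exact_exceptions` flag are hypothetical names.

```python
def covers(rule_domain: str, query: str) -> bool:
    """True if query is rule_domain itself or one of its subdomains."""
    return query == rule_domain or query.endswith("." + rule_domain)

def verdict(query, blocks, exceptions, exact_exceptions):
    """Return 'permitted', 'blocked' or 'not listed' for a query domain."""
    if exact_exceptions:
        excepted = query in exceptions          # first interpretation
    else:
        excepted = any(covers(e, query) for e in exceptions)  # second one
    if excepted:
        return "permitted"  # exceptions take priority over every block
    if any(covers(b, query) for b in blocks):
        return "blocked"
    return "not listed"

blocks = ["c.d.e", "a.b.c.d.e"]   # rule 1 and rule 3
exceptions = ["b.c.d.e"]          # rule 2

for q in ["c.d.e", "b.c.d.e", "x.x.x.b.c.d.e", "a.b.c.d.e"]:
    print(q, verdict(q, blocks, exceptions, True),
          verdict(q, blocks, exceptions, False))
```

The only queries where the two modes disagree are the subdomains of b.c.d.e, which is exactly the open question about what the list maintainer intended.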

Any unexpected behaviour can be sorted out using Tools > Search Adlists to work out what is blocking or allowing what.

Personally I'm not a fan of importing ABP wildcard blocks or importing any whitelists at all, even less so for whitelists with wildcard overrides, at the DNS level. It feels like giving too much trust to unknown third-parties at that layer of my network, since lists intended for Web usage on individual computers are now controlling all usage lower down on every computer and device.

How does that really differ from trusting unknown third-parties to create the lists of domains to block? You're doing that at the DNS level for all computers and devices?

What is the reasoning behind the current paradigm of allowing bulk lists of domains to block but requiring manual intervention to tune those lists? Is it our (Pi-hole's) need to tell users how they deploy Pi-hole and under what specific use cases we deem it acceptable? Or should Pi-hole be like every other Linux utility, truly just a tool that users deploy and use as they see fit?

Pi-hole should be a generally set-and-forget application. The more manual intervention is required, the less functional utility is provided.

That's not the case though. We are not designing or advocating that users download ABP lists. We are advocating and designing a way for list maintainers to create lists using the ABP format for specific entry types. Any format that is non-hosts will have its own DSL or specific syntax and will fall in the same category.

Any entry that does not conform to the specific and detailed examples of what is accepted is ignored. Any entry that has modifiers or URL only items is ignored. This is really no different than parsing a list that is hosts formatted but has items that are not hosts, we ignore those items as well.
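
As a rough illustration of "ignore anything that does not conform", here is a Python sketch of a strict per-line check. The regular expression is my own assumption about what a plain domain entry looks like; it is not taken from Pi-hole's gravity code, and `classify` is a hypothetical helper.

```python
import re

# Assumed shape of an accepted entry: optional '@@', then '||', a plain
# domain name, and a closing '^' with nothing after it.
RULE = re.compile(r"(@@)?\|\|([a-z0-9-]+(?:\.[a-z0-9-]+)+)\^")

def classify(line: str):
    """Return ('block' | 'allow', domain), or None if the entry is ignored."""
    m = RULE.fullmatch(line.strip().lower())
    if m is None:
        return None  # modifiers, paths, cosmetic filters etc. are skipped
    return ("allow" if m.group(1) else "block", m.group(2))
```

Under this sketch `classify("||ads.example.com^")` is accepted as a block, while `classify("||ads.example.com^$third-party")` and `classify("||example.com/banner.js")` both return `None` because of the modifier and the URL path.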

I think the important thing here is that if we were to go with something like this... anyone is free to use or not use the functionality as they so please.

We all have different threat models - none of them wrong (well, unless you totally YOLO it and put all of your devices in a DMZ with everything wide open to the world...)

Where I have concern is that we have never allowed third parties (list maintainers, public list posters, etc.) to introduce specific whitelist entries into Pi-hole.

List maintainers are free to not include specific domains into their blacklist, but this doesn't explicitly whitelist those domains in Pi-hole. It just stops their specific list from blocking them. If a user manually blocks the domain or the domain appears on a different list, it is blocked.

When we allow third parties to introduce whitelist entries (how we handle these is TBD), the user may no longer have the final say in what is whitelisted on their individual Pi-hole.

As it currently stands in Pi-hole, if a domain is whitelisted, a local (the only option as of now) whitelist entry will override all subsequent attempts to block that domain (whitelist trumps all). If a user says "domain xyz is always allowed on my network" and deliberately whitelists the domain, that is one thing. If a public list maintainer (with either good or bad intentions) makes this decision, that's another thing.

A public adlist (blocklist) as of now can only prevent things from working; it cannot do anything bad. To make websites or apps work may require Pi-hole user input (a conscious decision to allow the domain). With a public whitelist (or whitelisted domains or ranges of domains embedded in a blocklist) imported into Pi-hole, it is no longer within the control of the individual user. A third party is inserting whitelist entries into an individual Pi-hole, with the potential to do bad.

If whitelist entries are embedded into an adlist along with the blacklist entries, the user will likely not be aware of the details. Nobody is going to look through 50,000 lines of text before they subscribe to an adlist.

This is where we should tread carefully.

Would this include a user-configurable option to never download ABP-style whitelist entries?

In terms of their advertised functions they are aligned. They are presented as ad-blockers

I beg to differ, because Pi-hole has morphed into so much more, where ABP can't even begin to compete. In fact, the guys at Eyeo-Software messed up so badly that (almost) nobody is using the original ABP anymore. With their "Acceptable Ads" policy they made a perfect example of @jfb's fear that a bad player slips in a few @@ whitelistings and torpedoes other lists.

The only thing that survived from the original ABP is the formatting style, which we refer to as ABP-style. The reason for that formatting style is, as you already said, that a browser extension acting as an ad blocker can see the complete URL, so it can block access to a single file on a web server. That makes it the perfect tool if you just want to block a banner file.

But Pi-hole (loaded with external blocklists) can block access to malware files (on malware domains), phishing attacks and domain squatting. In conjunction with the user-group feature, it can keep minors from visiting porn sites or other radical content parents don't want them to see. It can protect a network from users visiting p2p streaming websites and getting sued...

You could say the power of Pi-hole derives from the blocklists provided.

Adopting the ABP style (even if only rule No. 2) is a valuable weapon in extending those protections, because attackers frequently change their subdomains to avoid blocking. Blocking every future subdomain will make it very expensive for these guys. Their domains get burned on first detection.

That's what we all have now. There is not a single ad-blocker extension that can even remotely do the same. And I haven't even started on the same possibilities for every IoT device, smartphone, tablet, Kindle and smart TV. Try installing an ad blocker on a China-chatting smart plug. Even if you could... what for? The user's problem is the built-in tracking features, which no ad blocker can catch.

So adopting a feature that ABP introduced has basically nothing to do with ABP. It enables list maintainers to fine-tune their lists. And users might be able to use other lists (from other products) that use the same syntax.

On the negative side: users who operate a simple network proxy cannot use our lists anymore, because the proxy software (at the moment) can't understand the new ABP-style format. For them it ends there unless the list provider offers both variants.

That's correct. A feature to introduce specific whitelist entries was never needed before, since every domain that wasn't blocked was "whitelisted" by default.

But with the introduction of the new ABP-style rule No. 2 you can block a million subdomains of a domain. This creates the need to unblock a certain subdomain to avoid overblocking.

For example, cloud hosters have a million servers out there just hosting malware (because they are cheap). These can all be blocked with one rule. But sooner or later you might discover a project on a cloud hoster that's worth unblocking. Of course the Pi-hole admin can unblock it by hand.

But it would be way easier for the majority of users to import a blocklist from maintainer ABC and have the same maintainer offer a whitelist to avoid overblocking.

The Pi-hole user obviously trusts maintainer ABC because they use that blocklist. And if they do not trust the maintainer... just don't import the whitelist from that maintainer.

This would be an argument for not having ABP-style whitelist commands within blocklists. If whitelists come in their own lists, to be imported by the user... there is no harm.

No, that's not correct. A user could see the whitelisting via the imported whitelist and, as has already been stated, a local block set by the user would trump an imported whitelisting.

Putting on my Moderator hat: I have put this thread into slow mode for the weekend. I'd ask that anyone with the ability to post outside of this restriction takes a step back and posts within the limits below:

1 Post per 4 hours (per person).

I don't think there's a lot more to be discussed that needs to be done rapidly (certainly the original title suggested this as a more urgent issue than it was)

Good points and suggestions from all angles here - let's take some time to reflect.

I can see both sides of the feature: adding some kind of allow-list (exact and ABP style) is an often-wanted feature. But "bad" list maintainers could hide whitelisted domains within their blacklists of more than 100,000 domains and nobody would notice.

So in order to balance this I suggest an extra layer of "security" (in addition to not allowing imported whitelist entries to override manually blocked domains):
We don't import whitelist entries from mixed-style adlists (allow/block), but only when a user manually sets the list type to "allow" (e.g. via the web interface or a command line argument). This way they make a conscious decision, understanding that all imported domains override blocking ones, and they have the ability to review the whitelist.
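
A minimal Python sketch of that gating idea, assuming the list type is a per-list setting chosen by the user. The function name `import_entries` and the `"block"` / `"allow"` values are hypothetical, and the domain check is deliberately crude.

```python
def import_entries(lines, list_type):
    """Import ABP-style domain lines according to the user-chosen list type.

    list_type is 'block' or 'allow' (hypothetical values set by the user).
    Returns a list of (kind, domain) tuples.
    """
    if list_type not in ("block", "allow"):
        raise ValueError(f"unknown list type: {list_type!r}")
    imported = []
    for raw in lines:
        line = raw.strip().lower()
        is_exception = line.startswith("@@")
        body = line[2:] if is_exception else line
        if not (body.startswith("||") and body.endswith("^")):
            continue  # not a plain domain rule, ignored
        domain = body[2:-1]
        if not domain.replace(".", "").replace("-", "").isalnum():
            continue  # crude sanity check: letters, digits, dots, hyphens
        if list_type == "block" and not is_exception:
            imported.append(("block", domain))
        elif list_type == "allow" and is_exception:
            imported.append(("allow", domain))
        # anything else (e.g. '@@' inside a blocklist) is dropped
    return imported
```

With this, an `@@` line hidden in a 100,000-line blocklist simply never makes it into the database unless the user subscribed to that list as an allow-list.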

I've spent a little time today looking at lists which contain ABP entries. From what I've seen they tend to be a mixture of all the available ABP filter types. Presumably these all work in tandem to achieve the curator's desired purpose at the browser layer.

Since DNS can't appreciate such variants, there may be limited value in extracting just the ||domain^ block/allow lines from such a file if the curator's intent is invalidated by not having the variants along for the ride, as these can only be realised at the Web layer in ABP.

That same list has some examples.

@@||redirect.appmetrica.yandex.net^
@@||redirect.appmetrica.yandex.com^
@@||vodafonelimited.tt.omtrdc.net^
@@||kiplinger.112.2o7.net^
@@||equifaxps.122.2o7.net^
@@||logging.apache.org^
@@||metrics.torproject.org^
@@||track2.royalmail.com^
@@||webtrack.dhlglobalmail.com^

Some of these are blacklisted here and would presumably become whitelisted if this list was imported with @@||domain^ awareness.

Match found in https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts: kiplinger.112.2o7.net

Match found in https://www.github.developerdan.com/hosts/lists/ads-and-tracking-extended.txt: vodafonelimited.tt.omtrdc.net

Match found in https://hostfiles.frogeye.fr/multiparty-trackers-hosts.txt: 10180.redirect.appmetrica.yandex.com
10267.redirect.appmetrica.yandex.com
1038135.redirect.appmetrica.yandex.com
(+18 more variants)

If a hosts file blocklist curator includes a domain their intent is to block that domain.

If an ABP filter list curator includes a ||domain^ their intent is not known, because there may be other resource/CSS/element/whitelist rules for that domain and their intent was for them to be taken together and applied in ABP at the Web layer. For example this list contains these entries, which may be an oversight or intentional (it is unclear what exact ABP behaviour these filters create in this order; Pi-hole's behaviour would be to whitelist the domain and subdomains, courtesy of the second entry).

||s1.wp.com/*/ie_block_redirect.js$important,domain=indianexpress.com
@@||s1.wp.com^
||s1.wp.com^$subdocument,third-party

Indeed, though I'm talking more about how valid it is to take filters for Web elements, CSS, scripts and HTML, designed to be used in individual Web browsers, and use them to activate wildcard domain blocks and whitelists at the DNS layer affecting all devices and services. This makes me feel very uneasy. If you did, would it be possible to add a switch in Gravity to say "Ignore ABP expressions" please?

My own understanding of our acceptance of ABP-style blocklists is that it lets list maintainers provide their intended-for-DNS-blocker lists in a format other than hosts format.

This is a pretty common practice recently - lots of the mega-list curators switched to this format to, in theory, cut down their list sizes while still providing the same blocking effectiveness.

It is not expected, for example, to take a list that is an intended-for-browser-blocker list, and throw it into Pi-hole. Of course there will be false positives in this case.

The intent is to support a very specific subset of ABP's syntax.

Paging @WaLLy3K and @lightswitch05 for their thoughts on this thread as blocklist maintainers/curators, if they have any!

I don't see why not, but in all honesty it is probably not me that will be writing the code for this if it happens, so I wouldn't like to make any promises on anything.

I have not switched to the ABP style lists…. But it has been on my mind. It’s funny that this is coming up since a major hurdle with making the switch is what to do about allow-listing subdomains. The only idea I have is to use ABP style for my lists until an allow-list entry has been added, and then the domain would have to switch from ABP style to standard hosts style to accommodate the allow-listed entry. I haven’t convinced myself the work to do all that is worth it.

From a Pi-hole point of view, it seems like it would also be really complicated. If a list used the @@ format, then Pi-hole would need to make sure to apply that allow-list only in the context of that blocklist and not at the global level. I wouldn't want a blocklist to allow-list things that are blocked in another list.

Which is exactly the approach that we are discussing.

I think it may be possible to reject a list entirely if it contains any selectors or lines that have extra elements beyond the specific documented ABP-style formats that are expected and will be processed.
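
Roughly what I have in mind, as a Python sketch. The accepted-line pattern and the tolerance for blank lines and `!` comment lines are my assumptions, and `load_or_reject` is a made-up name.

```python
import re

# Assumed shape of the only accepted entries: '||domain^' and '@@||domain^'.
RULE = re.compile(r"(@@)?\|\|[a-z0-9-]+(?:\.[a-z0-9-]+)+\^")

def load_or_reject(lines):
    """Return the accepted rules, or an empty list if any line is unexpected."""
    accepted = []
    for raw in lines:
        line = raw.strip().lower()
        if not line or line.startswith("!"):
            continue  # blank lines and ABP comments are tolerated (assumption)
        if RULE.fullmatch(line) is None:
            return []  # one unsupported line rejects the whole list
        accepted.append(line)
    return accepted
```

The difference from skipping bad lines one by one is that a browser-oriented list never gets partially imported: it either fits the documented subset completely or contributes nothing.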

That makes sense. I hadn't fully appreciated that. I had imagined this was heading towards ABP lists being casually added to collections of hosts lists and dissected for the wildcard domain elements during gravity import, and now possibly heading towards whitelists of the same type. Instead it sounds like we're talking about lists specifically curated for Pi-hole but which happen to be in ABP format. That sounds nicer.

I think I'd definitely want a switch to control

( ) Don't import ABP entries
(o) Import only exact blocked domains (similar function to hosts files)
( ) Import exact and wildcard blocked domains
( ) Import exact and wildcard blocked and whitelisted domains

Something like that.

[ In my earlier post I wrote "That same list has some examples" but didn't include the link and can't edit the post in slow mode. This is the list. ]

I might be lost here...

but reading through all your postings, there seems to be a consensus for creating the possibility to import whitelistings through a dedicated list (with no ABP-style, just hostnames).

There are different opinions about importing whitelisting within blocklists AND/OR using ABP-style formatting with the whitelistings.

So why not go with the consensus as a first step, and see what the community says about it. See if they adopt the new feature, if they "ask" for more...

It would seem to me that exemption list subscription is a topic separate from treating @@ rules in ABP style lists, mainly because the former would be applied globally across all blocklists.

This does not seem to be the case for those latter @@ rules:

Indeed, I think this warrants careful consideration.

With HOSTS format lists, a maintainer would include all known domains to block, including subdomains. It's easy to exempt one of those subdomains from blocking - just don't put exempted.example.org on the list.
Now, if any other list maintainer considers exempted.example.org still worth blocking, the result when using those two lists with Pi-hole would be that exempted.example.org would be blocked.

Now Pi-hole could treat ABP style lists defining @@ exemptions by adding respective whitelist entries.
This would result in the exemption being applied universally across all blocklists, regardless whether they would be more restrictive than the one defining the exemption.

Pi-hole's importer could possibly address that by disregarding any @@ rule in the absence of less specific/more general blocking rules in the same blocklist, and by removing or disabling its related whitelist entry if a more general blocking rule is found in any other blocklist.
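
To illustrate those two checks, here is a Python sketch that takes the wording literally. The data layout (a dict of list name to `block` / `except` sets) and the function names are assumptions for the example only.

```python
def is_more_general(block: str, domain: str) -> bool:
    """True if block is a parent domain of domain (strictly less specific)."""
    return domain.endswith("." + block)

def effective_exceptions(lists):
    """lists maps a list name to {'block': set, 'except': set} of domains.

    Returns the set of exception domains that survive both checks.
    """
    kept = set()
    for name, rules in lists.items():
        for exc in rules["except"]:
            # Check 1: the same list must contain a more general block rule.
            if not any(is_more_general(b, exc) for b in rules["block"]):
                continue
            # Check 2: no other list may contain a more general block rule.
            overridden = any(
                is_more_general(b, exc)
                for other, other_rules in lists.items() if other != name
                for b in other_rules["block"]
            )
            if not overridden:
                kept.add(exc)
    return kept
```

Note that this literal reading also drops an exemption when the other list blocks the parent domain but carries the very same exemption itself, so a real implementation would probably need to refine check 2.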

Also, the import process should probably handle exact domains only for @@, as this could potentially become more complicated when dealing with subdomains of subdomains if @@ rules would allow wildcard usage as well.

And I agree with Dan that a list should be rejected entirely...

All of that would imply that each @@ rule would have to be treated individually and may have to be inspected and assessed repeatedly during import (i.e. it would be rather expensive in terms of resource consumption).

This has an impact on how I perceive the following suggestions:

If Pi-hole's importer would be able to treat @@ as outlined above, I don't think it would be necessary to control anything at the list level.

Instead, a more fine-grained control would seem more appropriate here, supporting a user to decide on whether specific exemptions should be applied or not, probably explaining what list defines the exemption and what other list(s) prevent it from having been applied.

I have been working overtime hours and haven't even started looking at implementing this, but this may have actually been good. Let me try to summarize what we have so far (please feel free to correct/amend if I missed something):

  1. We add support for subscribed allowlists (whitelists)
  2. Domains on these lists can only whitelist domains from subscribed blocklists (adlists) - user-provided exact and regex blocked domains still have higher priority
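
Spelled out as a lookup order, the summary could look like the following Python sketch. This is not the FTL implementation; `decide` and its parameters are hypothetical, and keeping the local whitelist at the very top is my reading of the "whitelist trumps all" behaviour described earlier in the thread.

```python
import re

def decide(domain, user_allow, user_block, user_block_regex,
           subscribed_allow, subscribed_block):
    """Return 'allow' or 'block' for a query, highest priority first."""
    if domain in user_allow:
        return "allow"   # local whitelist keeps its current top priority
    if domain in user_block:
        return "block"   # user-provided exact block
    if any(re.search(p, domain) for p in user_block_regex):
        return "block"   # user-provided regex block
    if domain in subscribed_allow:
        return "allow"   # may only override subscribed blocklists
    if domain in subscribed_block:
        return "block"
    return "allow"       # not listed anywhere
```

The point of the ordering is that a subscribed allowlist entry is only ever consulted after every user-provided block has had its chance.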

Whether or not we support @@||abc.com^ on subscribed allowlists can still be discussed.


Programmatically, everything is possible. It just depends on the price we are ready to pay. Pi-hole could make sure that @@||...^ can only affect rules from the same blocklist, but this will cause Pi-hole to run multi-dimensional lookups against the tree (possibly several, depending on the query domain's depth). The implementation details are a bit tricky, as we might have the same block rule on two adlists with only one of them carrying an @@||...^ exception, in which case the domain still has to be blocked. Implementing this is surely possible, but handling the edge cases as well makes it non-trivial (and slow). If we go down this route, we will definitely have to make this opt-in as it will require a sufficiently powerful host.
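
To show where the cost comes from, here is a toy Python model of per-list scoping. It uses plain sets instead of the actual tree, treats exceptions as wildcards, and all names are invented for the example.

```python
def suffixes(domain: str):
    """'a.b.c' -> ['a.b.c', 'b.c', 'c'], one lookup candidate per label."""
    labels = domain.split(".")
    return [".".join(labels[i:]) for i in range(len(labels))]

def blocked_by_list(query, rules):
    """A list blocks the query unless one of its own exceptions covers it."""
    candidates = suffixes(query)
    if any(c in rules["except"] for c in candidates):
        return False
    return any(c in rules["block"] for c in candidates)

def is_blocked(query, lists):
    """Blocked if at least one list blocks it after its own exceptions."""
    return any(blocked_by_list(query, rules) for rules in lists.values())

lists = {
    "A": {"block": {"c.d.e"}, "except": {"b.c.d.e"}},
    "B": {"block": {"c.d.e"}, "except": set()},
}
print(is_blocked("b.c.d.e", lists))  # list B has no exception for it
```

Every query needs one lookup per suffix, per list, for both blocks and exceptions, which is the extra dimension compared to a single global lookup.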

The obvious alternative is to reject lists including @@||...^ altogether (we import zero domains). This is kind of the opposite of what this feature request asks for, but the present discussion seems to reveal that this is the better approach for network-wide DNS-based blockers.
