Apply Pi-hole blocking to CNAMEs

anon55913113 · November 20, 2019, 7:54pm

Try www.ziggo.nl and look where you land.

Tablet is out of battery so I am also out.

ajp2k17 · November 20, 2019, 7:57pm

Interesting...

DanSchaper · November 20, 2019, 8:02pm

So www.ziggo.nl is bad and does things behind your back, but you still want to be able to visit the site?

anon55913113 · November 20, 2019, 8:14pm

I don't know that it does bad things behind my back. All is hidden and I don't have a canary to indicate it.

If these are bad sites harming our privacy, then I'd rather cut that limb off than go there again.

We are fighting for a free and honest Internet, and CNAMEs are now being used to deceive us.

PS: I am now on an old tablet so I can keep posting.

DanSchaper · November 20, 2019, 8:26pm

What mechanism did you use to find out that it was a bad domain, since you've used it here as an example? That may be a better mechanism to explore, or something to use in addition to the requests made so far.

anon55913113 · November 20, 2019, 8:30pm

I know that game and I am not going to play along.

PromoFaux · November 20, 2019, 8:35pm

I think, @anon55913113, it was a genuine request, as opposed to a game. I wouldn't say this feature request is something we can fully rule out, but until we have explored everything, it's not something we can fully get behind, either.

The more information the better.

DanSchaper · November 20, 2019, 8:36pm

It's not a game. You may have an approach that is superior to what has been discussed so far.

drewski · November 20, 2019, 8:44pm

Not gonna lie it feels like we are barking up an empty tree here.

DanSchaper · November 20, 2019, 8:46pm

You're asking for something that fundamentally changes how the resolver works. I'm asking for all available data and options before adding that kind of complexity.

drewski · November 20, 2019, 9:00pm

Don't get me wrong it's a lot to take on and I understand your position.

DanSchaper · November 20, 2019, 9:03pm

I don't want to come across as shutting down the conversation either. I'm open to other views and arguments but I need the information behind the motivation to be able to talk to the rest of the team with a good view of all the options.

unixfox · November 20, 2019, 10:20pm

At first, when creating this feature request, I thought that resolving every CNAME and checking whether the corresponding domain is on the blacklist wouldn't hurt performance much, because I assumed a DNS server would be more efficient at this than a browser extension and could handle the small overhead.
And when I read in this issue on the AdGuard GitHub repository that they planned to do it the way I described above, I really thought that Pi-hole could do the same.

But after DanSchaper gave his feedback, I realized that the initial idea wouldn't be feasible due to the potential increase in memory consumption and the other issues described in his comments.

So I read the uBlock Origin GitHub issue again, and I think the best way to avoid hurting performance would be, instead of resolving every CNAME, to rely on a list (which could be compatible with uBlock Origin) that tells Pi-hole which CNAME domains need to be resolved, and then check whether the corresponding domain is on the blocklist. This idea is similar to what gorhill said in the GitHub issue:

The next step is for me to pick a cogent way for filter list maintainers to be able to tell uBO to uncloak specific hostnames, as doing this by default for all hostnames is not a good idea -- as this could cause a huge amount of network requests to be evaluated twice with no benefit for basic users (default settings/lists) while having to incur a pointless overhead -- for example when it concerned CDNs which are often aliased to the site using them.

The thing is, I currently have no idea how this could actually be implemented technically. Would we rely on a separate list with a new format, or add a way in the current list format to distinguish a normal domain from a (wildcard) CNAME domain that needs to be checked for its corresponding domain?

In the end, such a list would only be useful with wildcards (for example, *.liberation.fr), to specify that whenever a subdomain of liberation.fr is requested, it must be checked for a corresponding domain that is on the blacklist. A regular (non-wildcard) domain could simply be added as a normal domain to block.
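
To make the idea concrete, here is a minimal Python sketch of the list-driven check described above. It is not Pi-hole code and does not reflect how FTL works internally. The uncloak list, the blocklist entries, the in-memory CNAME table (standing in for upstream DNS answers), and every function name are hypothetical and chosen only for illustration.

```python
from typing import Dict, Set

# Hypothetical data, for illustration only (not real Pi-hole lists).
UNCLOAK_SUFFIXES: Set[str] = {"liberation.fr"}    # stands in for "*.liberation.fr"
BLOCKLIST: Set[str] = {"liberation.eulerian.net"}  # exact-match blocked domains
CNAME_RECORDS: Dict[str, str] = {                  # stand-in for upstream answers
    "f7ds.liberation.fr": "liberation.eulerian.net",
    "static.liberation.fr": "cdn.example.net",
    "t.example.org": "liberation.eulerian.net",
}


def normalize(name: str) -> str:
    """Lower-case a hostname and drop a trailing root dot."""
    return name.lower().rstrip(".")


def needs_uncloaking(name: str) -> bool:
    """True if name is a strict subdomain of a suffix on the uncloak list."""
    return any(name.endswith("." + suffix) for suffix in UNCLOAK_SUFFIXES)


def should_block(query: str, max_depth: int = 8) -> bool:
    """Decide whether a query should be blocked.

    Only hosts matching the uncloak list pay the cost of walking the
    CNAME chain; all other hosts get the plain blocklist lookup.
    """
    name = normalize(query)
    if name in BLOCKLIST:
        return True
    if not needs_uncloaking(name):
        return False
    # Follow the CNAME chain, bounded to guard against loops.
    for _ in range(max_depth):
        target = CNAME_RECORDS.get(name)
        if target is None:
            return False
        name = normalize(target)
        if name in BLOCKLIST:
            return True
    return False


if __name__ == "__main__":
    for host in ("f7ds.liberation.fr", "static.liberation.fr", "t.example.org"):
        print(host, should_block(host))
```

In this sketch, t.example.org points at the same tracker but is not blocked, because example.org is not on the uncloak list. That is the trade-off of the approach: less overhead, at the price of having to maintain the list.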

jfb · November 20, 2019, 10:28pm

In your example, you can quickly determine the subdomains that the site is using for ad serving. Can you block them with a regex?

Bucking_Horn · November 20, 2019, 10:31pm

@unixfox was inspired to bring this to Pi-hole's attention by a discussion on uBlock Origin's GitHub that started some ten days ago.

He posted the link, so I took the liberty of reading a bit into it.

In the example given there, it seemed that www.liberation.fr contained a link to f7ds.liberation.fr.
However, that harmless-looking name resolved to the CNAME liberation.eulerian.net, which is indeed hosted by a tracking company that prides itself on supreme-quality first-party tracking.

It was further elaborated that blocking could prove difficult as the subdomain part (f7ds) is most likely produced randomly.

In a quick browse of liberation.fr just now, I could no longer spot that kind of resolution in Pi-hole's logs, just as Raymond Hill of uBlock Origin stated.

Addressing this would seem worthwhile, but only if this technique already has a decent amount of market penetration.
The effort that goes into developing a countermeasure has to be justified by the amount of malevolent traffic that would get blocked for the average user.

unixfox · November 20, 2019, 10:34pm

If you are talking about using this simple regex: *.liberation.fr, then no.
That would block every subdomain of liberation.fr. Imagine they serve their static files (CSS, JS and images) from static.liberation.fr: this would render the site unusable.

jfb · November 20, 2019, 11:03pm

That was not the regex I referred to.

More like ^[a-z][0-9][a-z][a-z]\.liberation\.fr$ or something along those lines. Tailored to look for the pattern of the CNAME subdomains that lead to the undesired server.
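
As a rough illustration of that kind of pattern, here is a small Python sketch with the dots escaped and the end anchored. Python's re module is used only as a stand-in; the assumption is that Pi-hole's regex engine handles these basic character classes and anchors the same way. The helper name and the test hostnames other than f7ds.liberation.fr and static.liberation.fr are made up.

```python
import re

# Letter, digit, letter, letter, followed by the literal parent domain.
# The dots are escaped so they match only a literal ".".
PATTERN = re.compile(r"^[a-z][0-9][a-z][a-z]\.liberation\.fr$")


def matches_tracker_pattern(hostname: str) -> bool:
    """True if hostname fits the assumed random-subdomain shape."""
    return PATTERN.search(hostname.lower().rstrip(".")) is not None


if __name__ == "__main__":
    hosts = (
        "f7ds.liberation.fr",     # the cloaked subdomain from the example
        "static.liberation.fr",   # a legitimate asset host stays reachable
        "www.liberation.fr",
        "f7dsxliberation.fr",     # would match if the dots were left unescaped
    )
    for host in hosts:
        print(host, matches_tracker_pattern(host))
```

Only the first hostname matches, so ordinary subdomains of the site remain usable. The pattern does depend on the subdomains keeping this exact shape.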

unixfox · November 20, 2019, 11:16pm

The issue with this regex is that if the devs behind liberation.fr change the randomization method, we are screwed.
Whereas a wildcard on the domain liberation.fr, asking Pi-hole to check the CNAME of every subdomain of liberation.fr, is a fairly permanent solution: if the technique used to randomize the subdomain changes, Pi-hole will still catch the bad domains without any additional interaction. It's essentially telling Pi-hole that liberation.fr does shady things like tracking/displaying ads and that every subdomain needs to be checked before the request is served to the client.
So unless liberation.fr changes its main domain or starts serving its tracking scripts/ads from the main domain, it's game over for them.

drewski · November 20, 2019, 11:50pm

^This is exactly why it's needed

DanSchaper · November 21, 2019, 12:36am

Again, my point is "liberation.fr does shady things like tracking/displaying ads" should be a huge flag that the entire site needs to be nuked from orbit. Just because they aren't kicking YOUR puppy doesn't mean they are good people.