Apply Pi-Hole blocking to CNAMEs

Why do you think unbound should be relevant for this feature? The majority of Pi-hole users will likely run FTL with a distant upstream DNS. We are looking at implementing this somewhere but only inside Pi-hole itself.

CPU utilization is a difficult, maybe misleading, measure here because it is a momentary value. Looking at load or memory utilization would be better suited. However, the best metric for measuring the performance of Pi-hole is the delay of replies. This is obviously difficult to measure for queries that are sent upstream, but replies answered from cache go through (mostly) the same routines and can be used for the measurement.

Say your local delay is 2 msec and you have 4 active clients, each making 10 queries per second. Pi-hole will then be busy for 0.002 × 4 × 10 = 0.08 s, i.e. 80 msec per second. This is a "busyness" of 8%.
If, however, you use the same hardware and, due to your many regex filters, the delay per query is, say, 100 msec, then the busyness will be 0.100 × 4 × 10 = 4 s per second, i.e. 400%. This means the Pi-hole would only be able to reply to a quarter of the incoming queries in time, and a backlog will build up. Clients that retry queries because of the delay only make things worse.
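The arithmetic can be reproduced with a few lines of shell. This is only a sketch of the back-of-the-envelope estimate; the function name busyness_percent is made up for illustration and is not part of Pi-hole.

```shell
# busyness_percent DELAY_MS CLIENTS QUERIES_PER_CLIENT_PER_SECOND
# (hypothetical helper, not a Pi-hole command)
# Busy time per second in msec is delay * clients * rate; dividing by 10
# turns "msec out of 1000 msec" into a percentage.
busyness_percent() {
    echo $(( $1 * $2 * $3 / 10 ))
}

busyness_percent 2 4 10     # prints 8
busyness_percent 100 4 10   # prints 400
```

Anything above 100 means more work arrives per second than can be processed in that second.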

I hope this example makes it clear why we have to keep the delay (= the work per individual query) as low as possible. I should also say that I have never seen a delay coming even close to 100 msec even with regex filter lists going into the hundreds.

Having said all that, I spent a few hours writing and testing a suitable framework for keeping the majority of our current regex performance also with the newly proposed way of doing things. This could very well reduce the additional amount of work from N-times (where N may be a large number in unfortunate scenarios) to two-fold, which is obviously a much better compromise. Lots of implementation work still has to be done but we are, again, on a good track now.


Apparently, there is no solution to this problem, now or in the near future. The script below will at least tell you for which domains this is happening, provided you use unbound and have enabled unbound-control.
It's not pretty, but it does the job. The result is written to /etc/pihole/cnamematches.list in a format pihole-FTL can use (addn-hosts=/etc/pihole/cnamematches.list); a restart is required.
The script isn't smart, so CNAME entries that need to work should be added to the whitelist and removed from the result file.

'edit'
added comment to the result in /etc/pihole/cnamematches.list
result will now look like:

0.0.0.0 fonts.gstatic.com # CNAME gstaticadssl.l.google.com found in gravity list

'/edit'

#!/bin/bash

# Literal tab, used to select the record lines of the cache dump.
TAB=$'\t'
sudo /usr/sbin/unbound-control dump_cache | grep 'CNAME' | grep "$TAB" | while read -r line; do
   # Split the record on whitespace: field 1 is the queried name, field 5 the CNAME target.
   set -- $line
   # Strip the trailing dot from both names.
   domain=${1%.}
   cname=${5%.}
   #echo "domain: $domain"
   #echo "cname: $cname"
   if grep --quiet -- "$cname" /etc/pihole/gravity.list; then
      if ! grep --quiet -- "$domain" /etc/pihole/cnamematches.list; then
         if ! grep --quiet -- "$domain" /etc/pihole/whitelist.txt; then
            if ! grep --quiet -- "$domain" /etc/pihole/gravity.list; then
               #echo "cname found in gravity: $cname"
               #echo "domain to add to blocklist: $domain"
               printf '0.0.0.0 %s # CNAME %s found in gravity list\n' "$domain" "$cname" | sudo tee -a /etc/pihole/cnamematches.list
            fi
         fi
      fi
   fi
done

Since the unbound cache is very much alive, you'll need to schedule a cron job to execute the script regularly. I'm aware this only works for entries cached by unbound, so some CNAME entries could be missing.

The command dig @127.10.10.2 -p 5552 +dnssec f7ds.liberation.fr (modify the IP and port so that the query goes directly to unbound) ensures that the domain used in the original post is in the cache before the script runs. This assumes that liberation.eulerian.net is part of a blocklist:

pihole -q liberation.eulerian.net
 Match found in list.3.dbl.oisd.nl.domains:
   liberation.eulerian.net

liberation.eulerian.net is already in the gravity list, because it is in list.3.dbl.oisd.nl.domains.

It is f7ds.liberation.fr that will be added to /etc/pihole/cnamematches.list

This can never be a wildcard (cannot be liberation.fr) because, for example, www.liberation.fr may be a valid page, static.liberation.fr may contain some additional elements required. In order to effectively prevent the CNAME trick, we need to specify the domain exactly (f7ds.liberation.fr)

/etc/pihole/cnamematches.list will contain:

0.0.0.0 f7ds.liberation.fr

Another example to make things clear:

gravity.list contains gstaticadssl.l.google.com

pihole -q gstaticadssl.l.google.com
 Match found in list.3.dbl.oisd.nl.domains:
   gstaticadssl.l.google.com

gravity.list doesn't contain fonts.gstatic.com

pihole -q fonts.gstatic.com
  [i] No results found for fonts.gstatic.com within the block lists

but fonts.gstatic.com is a CNAME that points to gstaticadssl.l.google.com

dig fonts.gstatic.com

; <<>> DiG 9.11.5-P4-5.1-Raspbian <<>> fonts.gstatic.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 15334
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1472
;; QUESTION SECTION:
;fonts.gstatic.com.             IN      A

;; ANSWER SECTION:
fonts.gstatic.com.      0       IN      CNAME   gstaticadssl.l.google.com.
gstaticadssl.l.google.com. 0    IN      A       172.217.19.195

As soon as CNAME gstaticadssl.l.google.com is in the unbound cache AND the script runs, fonts.gstatic.com will be added to /etc/pihole/cnamematches.list

0.0.0.0 fonts.gstatic.com
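How the script gets from a CNAME record to the pair of names can be shown in isolation. The following is a minimal sketch, assuming records of the form shown in the dig answer above (owner, TTL, class, type, target); parse_cname_record is a hypothetical helper name.

```shell
# parse_cname_record LINE   (hypothetical helper)
# Prints "<owner> <target>" for a record of the form
#   owner. TTL IN CNAME target.
# and returns 1 for any other record type.
parse_cname_record() {
    # Unquoted on purpose: split the record on whitespace into $1..$5.
    set -- $1
    [ "$#" -ge 5 ] && [ "$4" = "CNAME" ] || return 1
    # Drop the trailing dot of the fully qualified names.
    printf '%s %s\n' "${1%.}" "${5%.}"
}

parse_cname_record 'fonts.gstatic.com.      0       IN      CNAME   gstaticadssl.l.google.com.'
# prints: fonts.gstatic.com gstaticadssl.l.google.com
```

The first name is the one that would go into cnamematches.list, the second is the one that has to be present in the gravity list.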

/etc/pihole/cnamematches.list will only grow (learn as time progresses) when using a cron job to execute the script. Example (my script is /home/pi/cron/cname.sh):

5,20,35,50 * * * *  root PATH="$PATH:/home/pi" /home/pi/cron/cname.sh >/dev/null 2>&1

You can add domain names to /etc/pihole/whitelist.txt to prevent blocking, but you need to manually remove the entry from /etc/pihole/cnamematches.list
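The manual removal could be done with a small helper like the one below. It is a sketch only: remove_cname_match is a made-up name, and it assumes every line of the list has the blocked name as its second field, as in the examples above.

```shell
# remove_cname_match DOMAIN [LIST_FILE]   (hypothetical helper)
# Deletes the entry for DOMAIN from the list; the list path defaults to
# the file used in this thread.
remove_cname_match() {
    domain=$1
    list=${2:-/etc/pihole/cnamematches.list}
    tmp=$(mktemp) || return 1
    # Keep every line whose second field (the blocked name) is not DOMAIN,
    # then write the result back without replacing the file itself.
    awk -v d="$domain" '$2 != d' "$list" > "$tmp" && cat "$tmp" > "$list"
    status=$?
    rm -f "$tmp"
    return $status
}
```

For example, remove_cname_match fonts.gstatic.com (run with sufficient permissions to write to the list) would drop that entry after the domain has been whitelisted.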

pihole-FTL will only use the new list if you add a dnsmasq configuration file with this content:

addn-hosts=/etc/pihole/cnamematches.list

AND pihole-FTL is restarted.
Worst case scenario is the default restart once a week (pihole -g)

I am a bit confused by your posts, @jpgpi250 jpg :thinking:

What makes you think that?
Pi-hole's developers already stated they are on track for developing a solution:

And with regard to your CNAME matching:

f7ds.liberation.fr is the domain name request that Pi-hole is passing to FTL-DNS/dnsmasq.
If you block that, you don't need to look at the DNS answer. In fact, blocking f7ds.liberation.fr is very similar to what @DanSchaper has proposed to tackle the problem, and you wouldn't need an extra config file to do that.

If the intention of writing to cnamematches.list is to gather information about CNAME-obfuscated third-party trackers, it would probably make sense to include the offending domain names from the answers as well. Otherwise you lose the information about why an entry goes on that list.

Also, I am unsure what parts of your explanation are a definite part of Pi-hole's coming version, how unbound is involved and which ones are your own.

Pi-Hole does not normally restart with a gravity update.

When I run pihole -g, the output says:

pihole -g
  [i] Pi-hole blocking is enabled
  [i] Neutrino emissions detected...
  [✓] Pulling blocklist source list into range

   [i] Target: adaway.org (hosts.txt)
  [✓] Status: No changes detected

….

  [✓] Cleaning up stray matter

  [✓] Force-reloading DNS service
  [✓] DNS service is running
  [✓] Pi-hole blocking is Enabled

What does Force-reloading DNS service mean, if it isn't a restart to make use of the new gravity.list?

I don't expect a new pihole-FTL binary (with CNAME detection) to be released this year. pihole 5.x has been in development for months now; gravity, regex, blacklist, whitelist, … are being moved into a database (/etc/pihole/gravity.db). Meanwhile, for unbound users only, this method lets you quantify the problem.

Both examples discussed have only the CNAME entry in the gravity list; the domain name isn't in the gravity list. How would you know you need to block the domain name if you don't detect that the CNAME is used to bypass pihole? An entry is only made when the CNAME is found in the gravity list AND the domain name isn't.

Adding an extra config file with the content addn-hosts=/etc/pihole/cnamematches.list is just one of the possibilities. If you don't want to add an extra dnsmasq configuration file, you could choose to add /etc/pihole/cnamematches.list as a block list, syntax (and yes, NOT a typo, really 3 forward slashes):

file:///etc/pihole/cnamematches.list

Example (this site does NOT use this method in real life): Let's assume you visit pcsupport.lenovo.com and they start using the CNAME method to bypass pihole. They would create a DNS entry recommend.lenovo.com, cname liberation.eulerian.net, and use the domain name to trigger a third party script. Even if liberation.eulerian.net is in the gravity list, the current version of pihole-FTL would NOT detect this. Running the script without the www limitation would add recommend.lenovo.com to the list. Running the script with the www limitation would never add recommend.lenovo.com to the list, thus the DNS query would be resolved to a valid IP.

As far as I know, whitelist.txt and gravity.list are replaced by a database (/etc/pihole/gravity.db) in pihole 5.x. This would actually be beneficial for the detection. The majority of the script's processing time is spent on the grep searches in gravity.list. As soon as this data is available in the database, a simple sqlite3 query would really speed things up. Of course, the script will then require a rewrite.

It does not mean restart, it means reload. This is not done by

but by sending a signal to FTL causing it to re-read the gravity list and clearing the DNS cache. However, this is done without restarting. And in the future (the PR I linked above), we might simply reload gravity even without clearing the DNS cache.

Well, true. We could actually do it; however, it would be useless for most, as we have not even started to think about what a web interface page for this could look like. Users who are fine with interacting directly with the database can already try it. I was able to resolve the mentioned performance penalty this morning, which brings this a lot further. However, true, I haven't had time to properly look at CNAME interventions. And I think I shouldn't do it as part of this PR but as a follow-up.

Found a possible problem with the used logic.

if the cname is nieuwsblad.be (assume this is in gravity.list)
and the domain is anything.nieuwsblad.be

a new entry would be created.

but since both the cname and the domain end with nieuwsblad.be, this is NOT something you want to catch (make a new entry).

This would require some additional logic in the above script, such as:

if [ -z "$(echo "$domain" | grep -- "$cname"'$')" ]; then
	# continue processing ...
fi
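A variant without grep is sketched below. It is an assumption about what the check should do, not tested against the full script: it treats the CNAME target as a literal string and only matches on label boundaries, so something like xnieuwsblad.be would not be mistaken for a subdomain of nieuwsblad.be. The name is_at_or_below is made up for illustration.

```shell
# is_at_or_below DOMAIN CNAME   (hypothetical helper)
# Succeeds when DOMAIN equals CNAME or is a subdomain of it.
is_at_or_below() {
    case "$1" in
        # Quoted "$2" is matched literally, so dots are not wildcards.
        "$2"|*."$2") return 0 ;;
        *) return 1 ;;
    esac
}

if ! is_at_or_below "anything.nieuwsblad.be" "nieuwsblad.be"; then
    echo "continue processing"
fi
```

With the names from the example, nothing is printed, because the domain lies below the CNAME target and would be skipped.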

Don't know if you need to consider this in your pihole-FTL code...

Here is the code for the script: pi-hole/gravity.sh at master · pi-hole/pi-hole · GitHub

As a blocklist maintainer, I 100% support this feature. I add all hosts as I find them, but it's certainly a losing battle to keep up with them all. Here is just one example that everyone should be familiar with:

4645336.fls.doubleclick.net. 21599 IN CNAME dart.l.doubleclick.net

There are millions of doubleclick hosts, but drastically fewer CNAME values for all those hosts. Being able to block just dart.l.doubleclick.net and have it take effect for all the doubleclick hosts would be amazing! And before anyone says it - yes, it's easy to block all of doubleclick.net using regex/wildcards - but as a blocklist maintainer, I do not have the ability to make use of that feature, not to mention the performance tradeoffs of regex.

Furthermore, I would be opposed to making this feature a separate list. I would recommend going the same path as uBlock Origin, where it's an on/off toggle for all lists that defaults to off.


Wouldn't a regex be a more sensible way to handle the doubleclick domains? One regex and you are done.

@jfb maybe you didn't read my full comment?

And before anyone says it - yes, it's easy to block all of doubleclick.net using regex/wildcards - but as a blocklist maintainer, I do not have the ability to make use of that feature

Yes, I can block things locally via regex easily - but that is not something that I can include in my public blocklist for others to use.

I did read it. The point is that blocklists with individual domain blocks are not the ideal solution for this specific problem. A more suitable feature of Pi-Hole (regex) should be applied in this case.

The regex feature was added to Pi-Hole so that it can be used.

Seems like NextDNS implemented exactly this already:
https://news.ycombinator.com/item?id=21610386

I'm sorry, I have been deeply buried in code and have not followed each and every message in this discussion 100%. It makes no sense to pretend otherwise; that would just not be honest.

My FTL implementation idea is/was (basic but hopefully clear description):

Assume my DNS data from above:

FTL would check:

  • f7ds.liberation.fr
  • liberation.eulerian.net
  • atc.eulerian.net

If at least one of these three is a bad guy, the entire query will be blocked. The check would happen, as usual, against all black- and whitelist entries (both exact and regex) as well as all gravity domains (the formerly mentioned performance penalty is mostly gone as of this morning).

So you might want to highlight why you say I'd need to consider this or that special part in FTL. Maybe the discussion has already moved on, and deeply into something I've missed. It would be great if this could be pointed out then.
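The "block if any name in the chain is bad" idea can be illustrated with a short shell sketch. This is not the FTL implementation (which lives in pihole-FTL itself and also covers regex and whitelist entries); it only checks exact matches against a flat file with one domain per line, and chain_is_blocked is a hypothetical name.

```shell
# chain_is_blocked BLOCKLIST NAME...   (hypothetical helper)
# Prints the first name of the chain found in BLOCKLIST and succeeds;
# fails if none of the names is listed.
chain_is_blocked() {
    blocklist=$1
    shift
    for name in "$@"; do
        # -F: literal string, -x: match the whole line, -q: no output.
        if grep -Fxq -- "$name" "$blocklist"; then
            printf '%s\n' "$name"
            return 0
        fi
    done
    return 1
}
```

Called with a list that contains liberation.eulerian.net and the three names above as arguments, it reports liberation.eulerian.net, so the whole query for f7ds.liberation.fr would count as blocked.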

seems it's time for a code exchange given that nextdns is, in fact, using pihole. :thinking:

also i've had a lot of stuff break via mobile today...android using nextdns as "private dns" solution. haven't had time to attribute it to (or rule out) their cname resolution upgrade though.

NextDNS is not using Pihole, they just say it's kind of like Pihole in the cloud. They use Unbound and their own software I think.

The example you provided actually uses the CNAME to disguise the third-party domain, so blocking the entire query is the correct decision. Some of the blocklists have already added f7ds.liberation.fr, so this is no longer a valid test case.

pihole -q f7ds.liberation.fr
 Match found in list.5.dbl.oisd.nl.domains:
   f7ds.liberation.fr
 Match found in list.18.pgl.yoyo.org.domains:
   f7ds.liberation.fr
 Match found in list.40.raw.githubusercontent.com.domains:
   f7ds.liberation.fr

I have been running my script (see earlier entry), and it appears that this (cname is in blocklist, domain isn't) is happening a lot more than you would expect. This implies a lot of things that should be blocked, aren't blocked at all, examples:

0.0.0.0 globalredir.akadns.net # CNAME a23.g.akamai.net found in gravity list
0.0.0.0 csw.lenovo.com.edgekey.net.globalredir.akadns.net # CNAME a23.g.akamai.net found in gravity list
0.0.0.0 com-c-3.edgekey.net.globalredir.akadns.net # CNAME a23.g.akamai.net found in gravity list
0.0.0.0 microsoft.com-c-3.edgekey.net.globalredir.akadns.net # CNAME a23.g.akamai.net found in gravity list

Will everything still work after pihole-FTL is released with CNAME detection? That will be the bigger question. The blocklist(s) do contain the CNAME, but blocking is currently never triggered.

However, I'm also getting some entries where the cname and the domain are in the same dns zone, example:

0.0.0.0 www.nieuwsblad.be # CNAME nieuwsblad.be found in gravity list

When this CNAME is encountered, the site would be blocked, using your logic.
The question is whether this isn't a valid use of a CNAME. Isn't this what it was initially supposed to do?

After careful consideration, it's probably NOT wise to make an exception in the pihole-FTL code, as it would increase processing time.
Because the CNAME is never detected in the current version of pihole-FTL, but would be in one of the next releases (hopefully), users will suddenly be confronted with false positives, which they will have to whitelist. You should definitely issue a warning, whenever pihole-FTL is going to block cnames.

If one of the domains that occur in the chain of things is in gravity, then I think it is the correct action to block the query. I don't see how a CNAME that contains something like badguy.com can have some valid content in the end (assuming badguy.com is in gravity). I'd say it is rather unlikely that the CNAME goes valid content -> tracking website -> valid content. It is rather likely that the chain ends in the tracker being served.

To your example: if nieuwsblad.be is in gravity, it is rather likely that you also don't want www.nieuwsblad.be. Or am I not seeing what you wanted to say here?


What does

sqlite3 /etc/pihole/gravity.db "SELECT * FROM vw_gravity WHERE domain = 'liberation.eulerian.net';"

return?