Which adlists do you really need? Here is the answer (Script)

You're collecting adlists? You're proud to have millions of domains on your blocklist? But there is this small doubt: "Do I really need all of them?"

You probably know that you won't visit even a fraction of the domains on your adlists in your lifetime. But do you want to figure out which domains you actually visited in the last months? And which adlists blocked them?

I wrote a script that answers these questions for me (and maybe for you). In the end you will know which adlists are valuable for you because they actually blocked something (your browsing habits are likely different from mine).

You can find the script here: https://github.com/yubiuser/pihole_adlist_tool

This script only works with Pi-hole v5.0 and later.


This script tries to give you the information you need to decide which adlists you actually need, based on your browsing behavior. It does that by matching your browsing history (FTL's query log) against your current adlist configuration (the gravity database), generating a list of domains that you visited in the past and that would have been blocked if your current adlist configuration had been in place back then. In a second step, the script takes this list and attributes each domain to the adlists it is on (similar to what pihole -q does). The final output is a table of all your adlists with the corresponding number of covered domains (domains that you have visited and that would have been blocked if only this particular adlist had been used).
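To make the two steps more concrete, here is a minimal sketch in Python with an in-memory SQLite database. It is not the script's actual code: the two tables are simplified stand-ins for the query log and the gravity database, and all domains and adlist ids are made up for illustration.

```python
import sqlite3

# Toy stand-ins (assumption): the real Pi-hole databases have more columns.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE queries (domain TEXT);                     -- one row per DNS query
    CREATE TABLE gravity (domain TEXT, adlist_id INTEGER);  -- one row per domain per adlist
""")
con.executemany("INSERT INTO queries VALUES (?)", [
    ("ads.example",), ("ads.example",), ("tracker.example",), ("news.example",),
])
con.executemany("INSERT INTO gravity VALUES (?, ?)", [
    ("ads.example", 1), ("ads.example", 2),
    ("tracker.example", 2), ("never-visited.example", 1),
])

# Step 1: domains that were queried AND are on at least one adlist, with hits.
covered = con.execute("""
    SELECT domain, COUNT(*) AS hits
    FROM queries
    WHERE domain IN (SELECT domain FROM gravity)
    GROUP BY domain
    ORDER BY domain
""").fetchall()

# Step 2: attribute the covered domains (and their hits) to each adlist.
per_adlist = con.execute("""
    SELECT g.adlist_id,
           COUNT(DISTINCT g.domain) AS domains_covered,
           COUNT(*)                 AS hits_covered
    FROM gravity AS g
    JOIN queries AS q ON q.domain = g.domain
    GROUP BY g.adlist_id
    ORDER BY g.adlist_id
""").fetchall()

print(covered)     # covered domains with hit counts
print(per_adlist)  # (adlist_id, domains_covered, hits_covered)
```

In this toy data two domains are covered, but adding up domains_covered over both adlists gives three, because ads.example is on both lists. That is the same overlap effect you will see in the real adlist table.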

Usage

pihole_adlist_tool [options]

options:
    -d [Num]                         Consider the last [Num] days (Default: 90). Enter 0 for all-time analysis.
    -t [Num]                         Show top blocked domains. [Num] defines the number to show.
    -s [total/domains/hits/unique]   Set sorting order to total domains, domains covered, hits covered or unique covered domains DESC. (Default sorting: id ASC)
    -u                               Show covered unique domains
    -h                               Show this help dialog

The script outputs

  • the number of adlists (and how many are enabled)
  • the number of unique domains in your gravity.db
  • the number of blocked domains as reported by pihole ('blocking status == blocked by gravity') and how often those domains have been blocked ('hits')
  • the number of covered domains and how often those would have been blocked ('hits')
  • special case: domains on your (personal) blacklist which are also on an adlist and have been visited in the past, including hits (run 'pihole -q' to see on which adlist those domains appear)
  • optional: top blocked domains and number of hits if your current adlist configuration would have been used
  • the adlist table: id, status, total domains on adlist, covered domains, hits, unique covered domains, address
  • the sum of unique covered domains
  • optional: list of unique covered domains with adlist_id, address

As domains usually appear on more than one adlist, I introduce the concept of unique covered domains. Those are domains that have been visited, would have been blocked, and appear on just one adlist. This might help you to judge your adlists not just by how many domains they cover but also by what you would lose if you disabled a particular adlist.
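The idea of unique covered domains can be sketched in a few lines of Python. This is only an illustration of the concept, not how the script computes it; the input mapping and the function name unique_covered are hypothetical.

```python
from collections import defaultdict

# Hypothetical input: each covered domain mapped to the adlist ids containing it.
covered = {
    "ads.example": {1, 2},
    "tracker.example": {2},
    "telemetry.example": {3},
}

def unique_covered(covered_map):
    """Return {adlist_id: [domains found on that adlist and on no other]}."""
    unique = defaultdict(list)
    for domain, adlist_ids in sorted(covered_map.items()):
        if len(adlist_ids) == 1:
            (only_id,) = adlist_ids  # unpack the single adlist id
            unique[only_id].append(domain)
    return dict(unique)

unique = unique_covered(covered)
for adlist_id in sorted(unique):
    print(adlist_id, unique[adlist_id])
```

Adlist 1 does not show up in the result: its only covered domain is also on adlist 2, so disabling adlist 1 would not change what gets blocked in this example.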

Sample output
*** Pihole Adlist Tool ***

Calculating.....

This might take some time - please be patient.

You have 25 adlists configured (23 enabled). Your gravity.db contains 347030 unique domains.

In the last 90 Days 1110 different domains from your adlists have been blocked 124962 times in total.

Using you current adlist configuration 1219 domains would have been blocked 128586 times.


Press enter to continue...


**You hit a spececial case***
Your personal blacklist contains at least one domain that is also on an adlist and has been requested in the selected time period.
If pihole blocked it (because blacklist was enabled at the time of the request) it got a special status ('blocked by blacklist' instead of 'blocked by gravity')
and is NOT counted on the above number of blocked domains/hits. As the domain is on an adlist the number of potentially blocked domains/hits is therefore is higher.


domain                      hits      
--------------------------  ----------
adeventtracker.spotify.com  130       
trace.svc.ui.com            250       

use 'pihole -q "domain"' to see on which adlist(s) it is

Press enter to continue...



*** Top blocked domains ***


Those would have been the 10 top blocked adlist domains in the last 90 Days using your current adlist configuration

domain                 hits      
---------------------  ----------
logger.foxitcloud.com  11934     
self.events.data.micr  8626      
beacons.gcp.gvt2.com   8403      
beacons.gvt2.com       8088      
script.ioam.de         4162      
app-measurement.com    3875      
googleads.g.doublecli  2872      
incoming.telemetry.mo  2843      
www.googletagmanager.  2754      
www.google-analytics.  2654      

Press enter to continue...


*** Adlist coverage ***


id          enabled     total_domains  domains_covered  hits_covered  unique_domains_covered  address                                                         
----------  ----------  -------------  ---------------  ------------  ----------------------  ----------------------------------------------------------------
1           1           55946          632              79772         20                      https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts
2           1           26857          1                2             1                       https://mirror1.malwaredomains.com/files/justdomains            
3           1           20567          340              41300         8                       http://sysctl.org/cameleon/hosts                                
4           1           34                                                                    https://s3.amazonaws.com/lists.disconnect.me/simple_tracking.txt
5           1           2701           8                270                                   https://s3.amazonaws.com/lists.disconnect.me/simple_ad.txt      
7           1           1104                                                                  http://www.malwaredomainlist.com/hostslist/hosts.txt            
8           0                                                                                 https://easylist.to/easylistgermany/easylistgermany.txt         
9           1           3343           45               6608                                  http://pgl.yoyo.org/adservers/serverlist.php?hostformat=hosts&sh
10          1           12151          392              60865                                 https://adaway.org/hosts.txt                                    
11          1           994            16               1090          1                       https://raw.githubusercontent.com/r-a-y/mobile-hosts/master/Adgu
12          1           310            11               531                                   https://raw.githubusercontent.com/r-a-y/mobile-hosts/master/Adgu
13          1           36442          201              7273          48                      https://raw.githubusercontent.com/r-a-y/mobile-hosts/master/Adgu
14          1           58832          2                4             2                       https://gitlab.com/ZeroDot1/CoinBlockerLists/raw/master/hosts   
15          0                                                                                 https://raw.githubusercontent.com/biroloter/Mobile-Ad-Hosts/mast
16          1           69             2                6             2                       https://raw.githubusercontent.com/HenningVanRaumle/pihole-ytadbl
17          1           8882           14               36            14                      https://raw.githubusercontent.com/anudeepND/youtubeadsblacklist/
18          1           365            18               15993         2                       https://raw.githubusercontent.com/crazy-max/WindowsSpyBlocker/ma
21          1           200            10               4324          2                       file:///home/nanopi/blacklist_spy-adware.txt                    
22          1           19                                                                    https://raw.githubusercontent.com/r-a-y/mobile-hosts/master/Adgu
23          1           1027           67               2342          2                       https://raw.githubusercontent.com/r-a-y/mobile-hosts/master/Easy
24          1           1048           33               740           5                       https://raw.githubusercontent.com/r-a-y/mobile-hosts/master/Easy
25          1           433            58               7226                                  https://raw.githubusercontent.com/vokins/yhosts/master/data/unio
26          1           125294         899              93848         333                     https://www.github.developerdan.com/hosts/lists/ads-and-tracking
27          1           13227                                                                 https://phishing.army/download/phishing_army_blocklist_extended.
28          1           40820          510              73935         43                      https://raw.githubusercontent.com/anudeepND/blacklist/master/ads

Domains from disabled adlists are not stored in gravity's database. If you want to include them, enable them and run 'pihole -g'
As the same domains usually appears on more than one adlist the sum of covered domains from this table is greater 
than the number of calculated blocked domains shown above

In total your adlists contain 483 visited (covered) unique domains - meaning those domains are contained only in a single adlist.


Please note the limitations described in the repository.

If you have ideas to improve the script, you're welcome to make suggestions (preferably with code) and/or open a PR on the repository.


Or, from a different perspective.

Which adlists do you really need?

Very few. The stock lists (even the four that are now installed with V5.0) block ads, and if a few ads pop up, just blacklist those domains locally.

This avoids having to constantly juggle lists in search of the perfect combination (which will never be determined).

I know :wink:
I wrote the script for myself, to get a feeling for how many different domains have been blocked by which adlist. It turns out I have visited only a tiny fraction of the domains on my blocklist. Some adlists contributed a lot, many nothing at all.
I polished the script a bit and wanted to share it here to cure others of the race for the highest numbers. If they see the numbers from their actual browsing habits, they might remove some litter from their lists.


Please note: discussions on this topic, along with code examples and updates, should take place in the Issues and Pull Requests at https://github.com/yubiuser/pihole_adlist_tool