You're collecting adlists? You're proud to have millions of domains on your blocklist? But there is this small doubt: "Do I really need all of them?"
You probably know that you will only ever visit a fraction of the domains on your adlists in your lifetime. But do you want to figure out which domains you actually visited in the last months, and which adlists blocked them?
I wrote a script that answers these questions for me (and maybe you). In the end you will know which adlists are valuable for you (your browsing habits are likely different than mine) because they blocked something.
You can find the script here:
This script only works with Pihole v5.0 (and onward).
This script tries to provide you with information that enables you to decide which adlists you need based on your browsing behavior. It does that by matching your browsing history (FTL's query log) against your current adlist configuration (gravity database), generating a list of domains that you have visited in the past and that would have been blocked if your current adlist configuration had been in place back then. In a second step, the script takes this list and attributes each domain to the adlists it is on (similar to what pihole -q does). The final output is a table of all your adlists with the corresponding number of covered domains (domains that you have visited and that would have been blocked if only this particular adlist had been used).
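The two steps described above can be illustrated with a small, self-contained sketch. It is not the script's actual code: the tables below are toy in-memory stand-ins, and their names and columns are simplified assumptions rather than the real FTL or gravity schema.

```python
import sqlite3

# Toy in-memory stand-ins for FTL's query log and the gravity database.
# Table and column names are simplified assumptions for illustration only.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE queries (domain TEXT);                     -- browsing history
    CREATE TABLE adlist  (id INTEGER PRIMARY KEY, address TEXT);
    CREATE TABLE gravity (domain TEXT, adlist_id INTEGER);  -- domain-to-adlist mapping

    INSERT INTO queries VALUES
        ('ads.example'), ('ads.example'), ('tracker.example'), ('news.example');
    INSERT INTO adlist VALUES (1, 'list-a'), (2, 'list-b');
    INSERT INTO gravity VALUES
        ('ads.example', 1), ('ads.example', 2),
        ('tracker.example', 2), ('never-visited.example', 1);
""")

# Step 1: visited domains that are on at least one adlist, with hit counts.
covered = con.execute("""
    SELECT domain, COUNT(*) AS hits
    FROM queries
    WHERE domain IN (SELECT domain FROM gravity)
    GROUP BY domain
    ORDER BY domain
""").fetchall()

# Step 2: attribute each covered domain to every adlist that contains it.
coverage = con.execute("""
    SELECT a.id, a.address,
           COUNT(DISTINCT q.domain) AS domains_covered,
           COUNT(q.domain)          AS hits_covered
    FROM adlist a
    LEFT JOIN gravity g ON g.adlist_id = a.id
    LEFT JOIN queries q ON q.domain = g.domain
    GROUP BY a.id
    ORDER BY a.id
""").fetchall()

print(covered)   # covered domains with hits
print(coverage)  # per-adlist coverage rows
```

In this toy data, ads.example is on both lists, so it counts as covered for each of them. That is why the per-adlist numbers can add up to more than the total number of covered domains.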
Usage
pihole_adlist_tool [options]
options:
-d [Num] Consider the last [Num] days (Default: 90). Enter 0 for all-time analysis.
-t [Num] Show top blocked domains. [Num] defines the number to show.
-s [total/domains/hits/unique] Set sorting order to total domains, domains covered, hits covered or unique covered domains DESC. (Default sorting: id ASC)
-u Show covered unique domains
-h Show this help dialog
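One way to read the -d option is as a time-window filter on the query log. The following is a minimal sketch of that idea, assuming query timestamps are Unix epoch seconds. The function names are hypothetical and not taken from the script.

```python
import time

SECONDS_PER_DAY = 86400

def cutoff_timestamp(days, now=None):
    """Return the earliest Unix timestamp to include, or None for all-time."""
    if days == 0:
        return None  # 0 means: do not restrict the time period
    now = int(time.time()) if now is None else now
    return now - days * SECONDS_PER_DAY

def in_window(query_ts, days, now=None):
    """True if a query with timestamp query_ts falls inside the window."""
    cutoff = cutoff_timestamp(days, now)
    return cutoff is None or query_ts >= cutoff
```

With the default of 90 days, only queries from the last 7,776,000 seconds would be considered. With 0, every query passes the filter.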
The script outputs
- the number of adlists (and how many are enabled)
- the number of unique domains in your gravity.db
- the number of blocked domains as reported by pihole ('blocking status == blocked by gravity') and how often those domains have been blocked ('hits')
- the number of covered domains and how often those would have been blocked ('hits')
- special case: domains on your (personal) blacklist which are also on an adlist and have been visited in the past, including hits (run 'pihole -q' to see on which adlist those domains appear)
- optional: top blocked domains and number of hits if your current adlist configuration would have been used
- adlist table: id, status, total domains on adlist, covered domains, hits, unique covered domains, address
- the sum of unique covered domains
- optional: list of unique covered domains with adlist_id, address
As domains usually appear on more than one adlist, I introduce the concept of unique covered domains. Those are domains that have been visited, would have been blocked, and appear on just one adlist. This might help you to value your adlists not just by how many domains they cover but also by what would happen if you disabled a given adlist.
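To make the idea concrete, here is a minimal sketch of how unique covered domains could be computed. The adlist contents and visited domains are invented sample data, and unique_covered is a hypothetical helper, not a function from the script.

```python
from collections import defaultdict

# Hypothetical sample data: adlist id -> domains on that list.
adlists = {
    1: {"ads.example", "tracker.example", "never-visited.example"},
    2: {"ads.example", "metrics.example"},
    3: {"ads.example"},
}
# Domains that show up in the browsing history.
visited = {"ads.example", "tracker.example", "metrics.example", "news.example"}

def unique_covered(adlists, visited):
    """Map each adlist id to the visited domains found on that list only."""
    owners = defaultdict(list)
    for adlist_id, domains in adlists.items():
        for domain in domains & visited:  # covered = visited and on the list
            owners[domain].append(adlist_id)
    result = {adlist_id: [] for adlist_id in adlists}
    for domain, ids in owners.items():
        if len(ids) == 1:  # on exactly one adlist, so unique to it
            result[ids[0]].append(domain)
    return {adlist_id: sorted(domains) for adlist_id, domains in result.items()}

unique = unique_covered(adlists, visited)
print(unique)
```

In this example, adlist 3 covers a visited domain but has no unique covered domains, so disabling it would not unblock anything. Disabling adlist 1 would unblock tracker.example.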
Sample output
*** Pihole Adlist Tool ***
Calculating.....
This might take some time - please be patient.
You have 25 adlists configured (23 enabled). Your gravity.db contains 347030 unique domains.
In the last 90 Days 1110 different domains from your adlists have been blocked 124962 times in total.
Using you current adlist configuration 1219 domains would have been blocked 128586 times.
Press enter to continue...
**You hit a spececial case***
Your personal blacklist contains at least one domain that is also on an adlist and has been requested in the selected time period.
If pihole blocked it (because blacklist was enabled at the time of the request) it got a special status ('blocked by blacklist' instead of 'blocked by gravity')
and is NOT counted on the above number of blocked domains/hits. As the domain is on an adlist the number of potentially blocked domains/hits is therefore is higher.
domain hits
-------------------------- ----------
adeventtracker.spotify.com 130
trace.svc.ui.com 250
use 'pihole -q "domain"' to see on which adlist(s) it is
Press enter to continue...
*** Top blocked domains ***
Those would have been the 10 top blocked adlist domains in the last 90 Days using your current adlist configuration
domain hits
--------------------- ----------
logger.foxitcloud.com 11934
self.events.data.micr 8626
beacons.gcp.gvt2.com 8403
beacons.gvt2.com 8088
script.ioam.de 4162
app-measurement.com 3875
googleads.g.doublecli 2872
incoming.telemetry.mo 2843
www.googletagmanager. 2754
www.google-analytics. 2654
Press enter to continue...
*** Adlist coverage ***
id          enabled     total_domains  domains_covered  hits_covered  unique_domains_covered  address
----------  ----------  -------------  ---------------  ------------  ----------------------  ----------------------------------------------------------------
1           1           55946          632              79772         20                      https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts
2           1           26857          1                2             1                       https://mirror1.malwaredomains.com/files/justdomains
3           1           20567          340              41300         8                       http://sysctl.org/cameleon/hosts
4           1           34                                                                    https://s3.amazonaws.com/lists.disconnect.me/simple_tracking.txt
5           1           2701           8                270                                   https://s3.amazonaws.com/lists.disconnect.me/simple_ad.txt
7           1           1104                                                                  http://www.malwaredomainlist.com/hostslist/hosts.txt
8           0                                                                                 https://easylist.to/easylistgermany/easylistgermany.txt
9           1           3343           45               6608                                  http://pgl.yoyo.org/adservers/serverlist.php?hostformat=hosts&sh
10          1           12151          392              60865                                 https://adaway.org/hosts.txt
11          1           994            16               1090          1                       https://raw.githubusercontent.com/r-a-y/mobile-hosts/master/Adgu
12          1           310            11               531                                   https://raw.githubusercontent.com/r-a-y/mobile-hosts/master/Adgu
13          1           36442          201              7273          48                      https://raw.githubusercontent.com/r-a-y/mobile-hosts/master/Adgu
14          1           58832          2                4             2                       https://gitlab.com/ZeroDot1/CoinBlockerLists/raw/master/hosts
15          0                                                                                 https://raw.githubusercontent.com/biroloter/Mobile-Ad-Hosts/mast
16          1           69             2                6             2                       https://raw.githubusercontent.com/HenningVanRaumle/pihole-ytadbl
17          1           8882           14               36            14                      https://raw.githubusercontent.com/anudeepND/youtubeadsblacklist/
18          1           365            18               15993         2                       https://raw.githubusercontent.com/crazy-max/WindowsSpyBlocker/ma
21          1           200            10               4324          2                       file:///home/nanopi/blacklist_spy-adware.txt
22          1           19                                                                    https://raw.githubusercontent.com/r-a-y/mobile-hosts/master/Adgu
23          1           1027           67               2342          2                       https://raw.githubusercontent.com/r-a-y/mobile-hosts/master/Easy
24          1           1048           33               740           5                       https://raw.githubusercontent.com/r-a-y/mobile-hosts/master/Easy
25          1           433            58               7226                                  https://raw.githubusercontent.com/vokins/yhosts/master/data/unio
26          1           125294         899              93848         333                     https://www.github.developerdan.com/hosts/lists/ads-and-tracking
27          1           13227                                                                 https://phishing.army/download/phishing_army_blocklist_extended.
28          1           40820          510              73935         43                      https://raw.githubusercontent.com/anudeepND/blacklist/master/ads
Domains from disabled adlists are not stored in gravity's database. If you want to include them, enable them and run 'pihole -g'
As the same domains usually appears on more than one adlist the sum of covered domains from this table is greater
than the number of calculated blocked domains shown above
In total your adlists contain 483 visited (covered) unique domains - meaning those domains are contained only in a single adlist.
Please note the limitations described in the repository.
If you have ideas to improve the script, you're welcome to make suggestions (preferably with code) and/or open a PR on the repository.