Quality of gravity list

Informational only…
I was interested in the effect of using regular expressions, and in the quality of the gravity.list.
For this test, I reconfigured pihole to use the default lists only:

StevenBlack  https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts
MalwareDom   https://mirror1.malwaredomains.com/files/justdomains
Cameleon     http://sysctl.org/cameleon/hosts
ZeusTracker  https://zeustracker.abuse.ch/blocklist.php?download=domainblocklist
DisconTrack  https://s3.amazonaws.com/lists.disconnect.me/simple_tracking.txt
DisconAd     https://s3.amazonaws.com/lists.disconnect.me/simple_ad.txt
HostsFile    https://hosts-file.net/ad_servers.txt

I'm using pihole + unbound, so I can check the domains in the gravity.list by querying unbound directly, which keeps pihole from blocking the queries.

I used the following script. It sends each DNS query to the IPv4 unbound address and, if that returns no answer, retries using the IPv6 unbound address. The script ran for approximately 48 hours (there is probably a better way to achieve this; as I said before, I'm NOT a Linux expert…)

#!/bin/bash
# Resolve every domain in gravity.list directly through unbound (bypassing pihole):
# first via the IPv4 unbound address, retrying via the IPv6 address on failure.

# reduce unbound logging during the test
sudo /usr/sbin/unbound-control verbosity 0

file=/etc/pihole/gravity.list
RED='\033[0;31m'
GREEN='\033[0;32m'
NC='\033[0m' # No Color

success=0
IPv4success=0
IPv6success=0
counter=1
# read the list one domain per line (safer than word-splitting `cat $file`)
while IFS= read -r i
do
   result=$success
   # a non-empty answer section means the domain resolved
   (( $(dig @127.10.10.2 -p 5552 +noall +answer "$i" | wc -c) > 0 )) && IPv4success=$((IPv4success+1))
   success=$((IPv4success + IPv6success))
   if [ "$success" -eq "$result" ]; then
      echo -e "${RED}IPv4${NC} - count:$counter, success:$success / $IPv4success / $IPv6success - $i"
   else
      echo -e "${GREEN}IPv4${NC} - count:$counter, success:$success / $IPv4success / $IPv6success - $i"
   fi
   # only retry via the IPv6 unbound address if the IPv4 query failed
   if [ "$success" -eq "$result" ]; then
      (( $(dig @fdaa:bbcc:ddee:2::5552 -p 5552 +noall +answer "$i" | wc -c) > 0 )) && IPv6success=$((IPv6success+1))
      success=$((IPv4success + IPv6success))
      if [ "$success" -eq "$result" ]; then
         echo -e "${RED}IPv6${NC} - count:$counter, success:$success / $IPv4success / $IPv6success - $i"
      else
         echo -e "${GREEN}IPv6${NC} - count:$counter, success:$success / $IPv4success / $IPv6success - $i"
      fi
   fi
   counter=$((counter+1))
   # sleep, I want to keep the pi responsive...
   sleep 0.6
done < "$file"
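
Since a run takes about two days, it may help to keep the failed domains in a file so they can be reviewed afterwards, instead of only counting them. Below is a minimal sketch, assuming the same unbound addresses and port as in the script above (127.10.10.2 and fdaa:bbcc:ddee:2::5552, port 5552). The function names `query_answer` and `classify_domains`, the `DELAY` variable and the output file names are all hypothetical.

```bash
#!/bin/bash
# Sketch: split gravity.list into resolved and failed domains by querying unbound directly.
# Assumption: unbound listens on 127.10.10.2 and fdaa:bbcc:ddee:2::5552, port 5552 (as above).

# Print the answer section for one domain: IPv4 unbound address first, IPv6 as a retry.
query_answer() {
   local domain=$1 answer
   answer=$(dig @127.10.10.2 -p 5552 +noall +answer "$domain")
   if [ -z "$answer" ]; then
      answer=$(dig @fdaa:bbcc:ddee:2::5552 -p 5552 +noall +answer "$domain")
   fi
   printf '%s' "$answer"
}

# Read domains from stdin (one per line) and write each one to the resolved or failed file.
# DELAY (hypothetical, default 0.6 s) is the pause between queries, as in the original script.
classify_domains() {
   local resolved_file=$1 failed_file=$2 domain
   : > "$resolved_file"
   : > "$failed_file"
   while IFS= read -r domain; do
      [ -z "$domain" ] && continue
      if [ -n "$(query_answer "$domain")" ]; then
         printf '%s\n' "$domain" >> "$resolved_file"
      else
         printf '%s\n' "$domain" >> "$failed_file"
      fi
      sleep "${DELAY:-0.6}"
   done
}
```

Usage would be something like `classify_domains resolved.txt failed.txt < /etc/pihole/gravity.list`, after which `wc -l resolved.txt failed.txt` gives the totals and `failed.txt` can be inspected directly.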

The result:
number of entries in the list: 136702
resolved (IPv4): 108215
resolved (IPv6): 7
resolved (total): 108222 (79.2%)
failed: 28480 (20.8%)
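
The totals above can be double-checked from the raw counts. The same numbers also show where a large part of the ~48 hour runtime went: one `sleep 0.6` per entry adds up to roughly 22.8 hours on its own. A small sketch using bash arithmetic for the counts and awk for the percentages (bash arithmetic is integer-only):

```bash
#!/bin/bash
# Recompute the summary above from the raw counts.
total=136702   # entries in gravity.list
ipv4=108215    # resolved via the IPv4 unbound address
ipv6=7         # resolved only on the IPv6 retry

resolved=$((ipv4 + ipv6))
failed=$((total - resolved))
# percentages need floating point, so awk does the division
resolved_pct=$(awk -v n="$resolved" -v t="$total" 'BEGIN { printf "%.1f", 100 * n / t }')
failed_pct=$(awk -v n="$failed" -v t="$total" 'BEGIN { printf "%.1f", 100 * n / t }')
# time spent in "sleep 0.6" alone, in hours
sleep_hours=$(awk -v t="$total" 'BEGIN { printf "%.1f", t * 0.6 / 3600 }')

echo "resolved: $resolved ($resolved_pct%)"
echo "failed:   $failed ($failed_pct%)"
echo "sleep:    $sleep_hours hours"
```

This prints 108222 (79.2%) resolved, 28480 (20.8%) failed and 22.8 hours of sleep, so roughly half of the runtime was the deliberate pause.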

To be fair, I'm using suricata on my pfsense firewall, and there were a total of 2058 blocked replies, for example:

This implies some of these domains might have returned a reply if suricata hadn't dropped it. It's also good news: using pihole in the normal way with these lists would have protected me, since the reply would simply have been 0.0.0.0 or :: (null blocking).
Other suricata messages:

ET TROJAN DNS Reply Sinkhole - Anubis
ET TROJAN DNS Reply Sinkhole - Microsoft
ET TROJAN DNS Reply Sinkhole Microsoft NO-IP Domain
ET TROJAN Injected WP Keylogger/Coinminer Domain Detected (cloudflare .solutions in DNS Lookup)
ET TROJAN Kaspersky Sinkhole DNS Reply
ET TROJAN JS.InfectedMikrotik Injects Domain Observed in DNS Lookup
ET TROJAN Possible WannaCry DNS Lookup
ET TROJAN ABUSE.CH Locky C2 Domain (lvanwwbyabcfevyi in DNS Lookup)
ET TROJAN Known Hostile Domain .ntkrnlpa.info Lookup
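
As mentioned above, with null blocking pihole answers a blocked domain with 0.0.0.0 or ::. If the same list were checked through pihole instead of unbound, such a reply would have to be counted as "blocked" rather than "resolved". A minimal sketch of that check, assuming the reply is taken from `dig +short` output (one record per line); the function name `is_null_blocked` is hypothetical:

```bash
#!/bin/bash
# Sketch: decide whether a DNS reply is a pihole "null blocking" answer.
# Input: dig +short output on stdin, one record per line.
# Returns 0 (true) only if there is at least one record and every record is 0.0.0.0 or ::.
is_null_blocked() {
   local line seen=0
   while IFS= read -r line; do
      [ -z "$line" ] && continue
      case $line in
         0.0.0.0|::) seen=1 ;;
         *) return 1 ;;   # any real address (or CNAME target) means not null-blocked
      esac
   done
   [ "$seen" -eq 1 ]
}
```

For example, `dig @<pihole address> +short example.com | is_null_blocked && echo blocked`, where `<pihole address>` is a placeholder for your own pihole. An empty reply is treated as "not null-blocked", so it would still end up in the failed count.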

In conclusion, I think we can agree that pihole is only as good as the lists we feed it. I assume the developers chose these default lists for a reason, yet even these are not really up to date: roughly one in five entries (20.8%) did not resolve in this test.