I was interested in the effect of using regular expressions.
For this test, I reconfigured pihole to use the default lists only:
StevenBlack https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts
MalwareDom https://mirror1.malwaredomains.com/files/justdomains
Cameleon http://sysctl.org/cameleon/hosts
ZeusTracker https://zeustracker.abuse.ch/blocklist.php?download=domainblocklist
DisconTrack https://s3.amazonaws.com/lists.disconnect.me/simple_tracking.txt
DisconAd https://s3.amazonaws.com/lists.disconnect.me/simple_ad.txt
HostsFile https://hosts-file.net/ad_servers.txt
I then ran this script (it does take a while, and there is probably a better way to achieve this; as I said before, I'm NOT a Linux expert, nor a regex expert…):
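#!/bin/bash
# Count how many entries in gravity.list are already matched by a regex in regex.list.
regexfile=/etc/pihole/regex.list
gravityfile=/etc/pihole/gravity.list
list=0        # total number of domains checked
regexcount=0  # domains matched by at least one regex
result=0      # domains not matched by any regex

for domain in $(cat "$gravityfile")
do
    list=$((list+1))
    valid=true
    # Test the domain against every regex; stop at the first match.
    for regex in $(cat "$regexfile")
    do
        if [[ $domain =~ $regex ]]; then
            regexcount=$((regexcount+1))
            valid=false
            break
        fi
    done
    if $valid; then
        result=$((result+1))
    fi
done

echo -e "listsize=$list - regex matches=$regexcount - unique domains=$result"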
number of entries in the list: 136702
number of regex matches: 25342 (18.5%)
number of domains not covered by regex: 111360 (81.5%)
There are several topics on improving gravity.list based on the content of regex.list (here, here and here, for example). These are ingenious but complex procedures, which take a lot of processing time.
It should, however, given an improved and much faster version of the script above, be possible to integrate this into pihole -g. Is it worth the effort?