Optimisation | Scripts for advanced users

#1

These are now all compatible with the Pi-hole DB (currently active on the dev branch)

Selection of regexps with easy one-command install:

Example Output:

pi@raspberrypi:~ $ curl -sSl https://raw.githubusercontent.com/mmotti/pihole-regex/master/install.sh | bash
[i] Fetching mmotti's regexps
[i] Fetching existing regexps
[i] No regexps currently exist in the database
[i] Generating CSV file
[i] Importing CSV to DB
[i] Regex import complete
[i] Refreshing Pi-hole


These are your current regexps:
  ^(.+[-_.])??adse?rv(er?|ice)?s?[0-9]*[-.]
  ^(.+[-_.])??m?ad[sxv]?[0-9]*[-_.]
  ^(.+[-_.])??xn--
  ^adim(age|g)s?[0-9]*[-_.]
  ^adtrack(er|ing)?[0-9]*[-.]
  ^advert(s|is(ing|ements?))?[0-9]*[-_.]
  ^aff(iliat(es?|ion))?[-.]
  ^analytics?[-.]
  ^banners?[-.]
  ^beacons?[0-9]*[-.]
  ^count(ers?)?[0-9]*[-.]
  ^pixels?[-.]
  ^stat(s|istics)?[0-9]*[-.]
  ^telemetry[-.]
  ^track(ers?|ing)?[0-9]*[-.]
  ^traff(ic)?[-.]

Optimise your gravity list/table by removing entries that are already covered by your custom DNSMASQ wildcards and regexps.

Example Output:

pi@raspberrypi:~ $ gravityOptimise.sh
[i] Pi-hole DB detected
[i] Updating gravity
[i] Parsing existing wildcard config (DNSMASQ)
[i] Removing wildcard matches from gravity
[i] Removing regex matches from gravity
[i] Updating gravity database
[i] Removing temp files
[i] Refreshing Pihole
[i] 55205 domains were removed from gravity
[i] 58103 domains remain in gravity
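The pruning idea can be sketched roughly as follows. This is not the actual gravityOptimise.sh, only a minimal illustration under stated assumptions: the function name `prune_gravity` and the three plain-text input files (one domain, regexp or wildcard base domain per line) are hypothetical, and the real script works against the Pi-hole DB rather than flat files.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the real script): print the domains from
# a gravity-style list ($1) that are NOT already covered by a file of
# regexps ($2) or a file of wildcard base domains ($3).
prune_gravity() {
    local gravity="$1" regex_file="$2" wildcard_file="$3"
    local patterns
    patterns="$(mktemp)"

    # Keep the regexps as they are, skipping blank lines
    grep -v '^$' "$regex_file" > "$patterns"

    # Turn each wildcard base domain into an anchored suffix pattern:
    # example.com -> (^|\.)example\.com$
    sed -e '/^$/d' -e 's/\./\\./g' -e 's/^/(^|\\.)/' -e 's/$/$/' \
        "$wildcard_file" >> "$patterns"

    # -v inverts the match, so only uncovered domains are printed
    grep -vE -f "$patterns" "$gravity"
    rm -f "$patterns"
}
```

Called as `prune_gravity gravity.txt regex.txt wildcards.txt > gravity.pruned`, it leaves only the entries that neither a regexp nor a wildcard would already block.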

Add wildcard domains from adguarddns, easylist, easyprivacy and nocoin (justdomains repo) without conflicting with your whitelist or other local dnsmasq wildcards. Best when complemented by gravityOptimise.

Example Output:

pi@raspberrypi:~ $ fetchFilterLists.sh
[i] Pi-hole DB detected
[i] Selected filter sources: adguarddns,easylist,easyprivacy,nocoin
[i] Fetching domains
[i] Parsing existing wildcard config (DNSMASQ)
[i] Cleaning domains
[i] Removing regex.list conflicts
[i] Checking for local wildcard conflicts
[i] Checking whitelist conflicts
[i] Determining output format
[i] Constructing output
[i] 31438 domains added to /etc/dnsmasq.d/filter_lists.conf
[i] Restarting FTL
[i] Done


Don't forget to run this script again if you make changes to your whitelist!
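To illustrate why the whitelist matters here, below is a rough sketch of a whitelist conflict check. It is an assumption-laden illustration, not the real fetchFilterLists.sh: the function name `build_wildcard_conf` and the flat input files are hypothetical. The `address=/domain/ip` form is standard dnsmasq syntax, but the real script chooses its output format itself (see "Determining output format" above), so the 0.0.0.0 target is also an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: emit dnsmasq wildcard lines for every domain in $1,
# skipping any domain that would also block a whitelisted entry in $2.
# $3 is the (assumed) sinkhole address.
build_wildcard_conf() {
    local domains="$1" whitelist="$2" sinkhole="${3:-0.0.0.0}"
    awk -v sink="$sinkhole" '
        # First file: remember every whitelist entry
        FILENAME == ARGV[1] { if ($0 != "") white[$0] = 1; next }
        $0 == "" { next }
        {
            conflict = 0
            for (w in white) {
                # A wildcard for $0 blocks $0 itself and every subdomain,
                # so it conflicts if w equals $0 or ends in "." $0
                if (w == $0 || (length(w) > length($0) &&
                    substr(w, length(w) - length($0)) == ("." $0))) {
                    conflict = 1
                    break
                }
            }
            if (!conflict) printf "address=/%s/%s\n", $0, sink
        }
    ' "$whitelist" "$domains"
}
```

Because the check depends on the whitelist at the time it runs, a later whitelist change is not picked up until the conf file is regenerated.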

Identify gravity domains with >=20 subdomains and add those as wildcards to a custom DNSMASQ conf file. Best when complemented by gravityOptimise.

Example Output:

pi@raspberrypi:~ $ generateGravityWildcards.sh
[i] Pi-hole DB detected
[i] Updating gravity
[i] Fetching domains from gravity table
[i] Identifying domains with >= 20 subdomains
[i] 139 domains found
[i] Fetching domains from whitelist table
[i] Checking for whitelist conflicts
[i] 139 domains remain after conflict resolution
[i] Determining blocking mode
[i] Outputting domains to: /etc/dnsmasq.d/mmotti_generated_wildcards.conf
[i] Restarting Pi-hole service
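
The "domains with >= 20 subdomains" step could look something like the sketch below. It is only an approximation of the idea, not the real generateGravityWildcards.sh: the function name `find_wildcard_candidates` and the flat input file are hypothetical, and it naively treats every parent suffix with at least two labels as a candidate, so suffixes such as co.uk would need separate handling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list parent domains that have at least $2
# (default 20) subdomain entries in the one-domain-per-line file $1.
find_wildcard_candidates() {
    local gravity="$1" threshold="${2:-20}"
    awk -F. -v min="$threshold" '
        NF >= 3 {
            # Walk every parent suffix that still has at least two labels,
            # e.g. a.b.example.com counts for example.com and b.example.com
            suffix = $NF
            for (i = NF - 1; i >= 2; i--) {
                suffix = $i "." suffix
                count[suffix]++
            }
        }
        END {
            for (d in count)
                if (count[d] >= min) print d
        }
    ' "$gravity" | sort
}
```

Duplicate lines in the input would be counted twice, so the list is assumed to be de-duplicated first.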
#2

As previously noted by the developers, the gravity list is quite fast and regex filters are relatively slow in comparison. What benefit does moving domains from the fast lane to the slow lane provide?

#3

An important bit of information is in the first line. Mmotti is implementing this in the development version of Pi-hole, which no longer has a gravity.list. It has been moved into a database, which gives endless possibilities for tagging lines/addresses.

Although all the data is in the database, the actual lookup is still done by dnsmasq because of its sheer speed.

I made my adjustments months ago to be able to adapt to the new DB system. I also added filtering of NXDOMAINs that would be dead weight in the gravity table.

In the current version of Pi-hole, regex can be extremely effective despite being slower. One regex can replace thousands of gravity lines. The break-even point is that one regex must replace twenty or more gravity lines.

#4

How have you determined that this is the break-even point?

Regardless of where the gravity list is located, it is still used as the master block list and is still searched for blocked domains.

#5

I did not have to; it was tested by the developers of AdBlocker, and I have written about it earlier.

I don’t know where the new DB version keeps its gravity list for searching. The database can be used to maintain the table, and could now even supply the option of searching whether a domain is blocked or not.

https://discourse.pi-hole.net/t/using-regex-for-filtering/10959/2

#7

With reference specifically to regex filters, there are several reasons why one may wish to use these, the main ones being coverage and flexibility. A single regular expression can capture many different variations of domains, some of which may not even be covered by your gravity list.

Example: ^(.+[-_.])??m?ad[sxv]?[0-9]*[-_.] accounts for 19,318 gravity entries using the default host files.

pi@raspberrypi:~ $ grep -E '^(.+[-_.])??m?ad[sxv]?[0-9]*[-_.]' /etc/pihole/gravity.list | wc -l
19318

There are currently only 16 regexps in my filter list, and I would agree that if this were expanded to hundreds or thousands, it could have the potential to move things to the “slow lane”.

On the subject of the slow lane, that’s why I created some of these scripts. The current implementation of Pi-hole now uses regexps for wildcards, which one could argue is vastly more “expensive” than a simple DNSMASQ wildcard. That being said, of course I completely understand and appreciate the reasoning behind moving in this direction. I personally try to reserve regexps for use only when absolutely necessary (i.e. for more complex tasks / catchment).

DNSMASQ wildcards are particularly effective:

pi@raspberrypi:~ $ grep -E '302br\.net$' /etc/pihole/gravity.list | wc -l
17944
pi@raspberrypi:~ $ grep -E '2o7\.net$' /etc/pihole/gravity.list | wc -l
1474
pi@raspberrypi:~ $ grep -E 'adnxs\.com$' /etc/pihole/gravity.list | wc -l
560
pi@raspberrypi:~ $ grep -E 'doubleclick\.net$' /etc/pihole/gravity.list | wc -l
246
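As a side note, the counts above can be made slightly stricter. A pattern such as 'adnxs\.com$' also matches a lookalike name like badnxs.com, whereas a DNSMASQ wildcard only covers the domain itself and its subdomains. The sketch below anchors the match on a label boundary; the function name `count_wildcard_coverage` is hypothetical and the list file is assumed to hold one domain per line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the entries in list file $2 that a wildcard
# for base domain $1 would cover (the domain itself plus any subdomain).
count_wildcard_coverage() {
    local domain="$1" list="$2" escaped
    # Escape dots so they match literally in the regexp
    escaped="$(printf '%s' "$domain" | sed 's/\./\\./g')"
    # (^|\.) requires the match to start at a label boundary
    grep -cE "(^|\.)${escaped}\$" "$list"
}
```

For example, `count_wildcard_coverage adnxs.com /etc/pihole/gravity.list` would count adnxs.com and ib.adnxs.com but not badnxs.com.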



The purpose of the “optimisation” script is to remove gravity entries that are matched by regexps and DNSMASQ wildcards. One could argue that this is unnecessary, as there is little impact on speed. This really comes down to personal preference again: I prefer to only have the absolutely necessary entries.