Improve filtering by adding the ability to use wildcard entries for certain host files

I have been having an in-depth look at host files recently and I think it could be possible to improve upon the domain blacklist.

Take filter lists, for example. They contain entries such as:
||something.com^ (filter domain)

We currently add only something.com, although the filter list syntax dictates that the rule matches the domain and all of its subdomains, so we should also be blocking *.something.com.

This could be achieved using existing dnsmasq syntax (which still works with FTL): server=/something.com/ (returns NXDOMAIN for something.com and for all of its subdomains).
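
As a rough illustration of the conversion (a sketch only, not how the Pi-hole collation scripts work today), the Python below turns plain domain filters into the dnsmasq lines mentioned above. The function name and the regular expression are assumptions made for this example, and anything more complex than a bare ||domain^ rule is skipped.

```python
import re

# Hypothetical pattern: matches only plain domain filters such as
# "||something.com^" (no paths, options, or wildcards inside the rule).
DOMAIN_FILTER = re.compile(r"^\|\|([a-z0-9-]+(?:\.[a-z0-9-]+)+)\^$")


def filter_to_dnsmasq(line):
    """Return a dnsmasq 'server=/domain/' line, or None for any other rule."""
    match = DOMAIN_FILTER.match(line.strip().lower())
    if match is None:
        return None
    return f"server=/{match.group(1)}/"


print(filter_to_dnsmasq("||something.com^"))
```

Rules with options (for example a trailing $third-party) return None here, on the assumption that they are not meant to block the whole domain.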

The Disconnect host files, for example, should also be considered wildcards. Inside, you have a single entry for 2o7.net, although we know from other host files that this domain has hundreds of subdomains. Pgl@yoyo is another example.
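
To make the wildcard reading concrete, here is a minimal Python sketch of suffix matching: a single entry such as 2o7.net covers every name beneath it. The function name and the subdomain used in the example are made up for illustration.

```python
def covered_by(domain, wildcards):
    """True if domain equals a wildcard entry or is a subdomain of one."""
    labels = domain.lower().rstrip(".").split(".")
    # Check every suffix of the name, e.g. for "a.b.2o7.net":
    # "a.b.2o7.net", "b.2o7.net", "2o7.net", "net".
    return any(".".join(labels[i:]) in wildcards for i in range(len(labels)))


wildcards = {"2o7.net"}
print(covered_by("a.b.2o7.net", wildcards))  # hypothetical subdomain
print(covered_by("not2o7.net", wildcards))
```

Matching is done on whole labels, so a name that merely ends in the same characters (not2o7.net) is not treated as a subdomain.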

The performance is awful for bulk wildcard blocking with the new regex.list, so it is better to keep that list small. By contrast, I am seeing little performance impact from the server=/something.com/ entries (provided we are not adding millions of entries at a time).

I appreciate that this request would likely involve a large amount of change in the collating process, but imo it would be well worth it for the improvement in Pi-hole filtering.

What do you mean by "awful"? We have tested this locally with a few thousand entries, and queries still resolve with perhaps a few tens of milliseconds of added delay. Also note that regex entries are only evaluated once for each domain, i.e. subsequent queries for the same domain are not affected by the regex checking delay!

Given the flexibility of regex, it seems to be the right thing to add. Because we didn't want to overwhelm users with blocking facilities, we intentionally decided to handle wildcard blocking solely through the regex list.

Any experienced user is invited to use any configuration they deem appropriate for their specific use case, so please go ahead and use the dnsmasq syntax you suggested, and wipe out the regex entries you do not need to speed things up.

@DL6ER Apologies - 'awful' may very well have been the wrong term to use.

I had tried to use ~35k entries extracted from EasyList / EasyPrivacy, and there was a significant delay in DNS resolution when browsing. The reply time went from ~15-30 ms to consistently above 100 ms.

After some continued browsing, things settled down (as you say, once each domain has been looked up), but pages with elements from several different domains took a considerable time to load, and even things like Netflix took a noticeable hit.

I agree, though, it's incredibly flexible and works very well with smaller lists and wildcards.

Perhaps it is easier in this case for me to write something that extracts the domains and adds them to a wildcard.conf file as a cron job. I just thought it might be useful for users to be able to apply wildcard filtering to lists that have been built specifically for wildcards.
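
A rough Python sketch of what such a script could produce, assuming the domains have already been extracted from the lists. The function name is hypothetical. It drops any domain already covered by a parent entry, so the file stays as small as possible.

```python
def build_wildcard_conf(domains):
    """Return wildcard.conf text: one 'server=/domain/' line per kept domain."""
    cleaned = {d.strip().lower() for d in domains if d.strip()}
    kept = set()
    # Fewest labels first, so a parent is always seen before its subdomains.
    for domain in sorted(cleaned, key=lambda d: (d.count("."), d)):
        labels = domain.split(".")
        parents = (".".join(labels[i:]) for i in range(1, len(labels)))
        if not any(parent in kept for parent in parents):
            kept.add(domain)
    return "".join(f"server=/{d}/\n" for d in sorted(kept))


# "a.2o7.net" is a made-up subdomain; it is dropped because 2o7.net covers it.
print(build_wildcard_conf(["a.2o7.net", "2o7.net", "something.com"]), end="")
```

The cron job would then write this text to the wildcard.conf file in whatever directory dnsmasq reads its configuration from, and the resolver would presumably need a restart to pick up the changes. Both of those are assumptions about the local setup.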