Add feature to import publicly maintained regex lists

cottton · January 20, 2020, 2:57pm

Background: Why is regex on blacklists not supported? · Issue #3088 · pi-hole/pi-hole · GitHub

Are you talking about adding a pre-compiled list from an external source?

Yes

You can add individual regex/wildcard blacklist entries from the web UI, however...

And that is a problem.

I understand that it's not supported for false-positive reasons.
But adding every line by hand is a pain.
And overwriting /etc/pihole/regex.list does not seem like a good idea.

Suggestion 1

I don't see a format in the current lists. I did not check the code, but it seems there are no hard format rules.

What about upgrading the import for external regex sources with strict rules?
First rules:
an external regex source list must

end with .regex (or what you would like to use as extension)
contain only (valid) regex rules

This way we could share lists that include regex.
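The two proposed rules could be checked with something like the following Python sketch. The function name `validate_regex_list` is hypothetical, Python's `re` module stands in for Pi-hole's own regex engine (whose dialect may differ), and tolerating blank lines is an assumption that the proposal does not address.

```python
import re


def validate_regex_list(filename: str, text: str) -> list:
    """Return a list of rule violations; an empty list means the source passes."""
    problems = []

    # Rule 1 from the post: the source must end with the agreed extension.
    if not filename.endswith(".regex"):
        problems.append(f"{filename}: name does not end with .regex")

    # Rule 2 from the post: the source may contain valid regex rules only.
    for number, line in enumerate(text.splitlines(), start=1):
        entry = line.strip()
        if not entry:
            continue  # assumption: blank lines are tolerated
        try:
            re.compile(entry)
        except re.error as exc:
            problems.append(f"line {number}: invalid regex ({exc})")

    return problems
```

An importer built on this idea would reject the whole source as soon as the returned list is non-empty, which keeps the rules strict.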

Suggestion 2
What about an import for regex sources?
Just a simple multiple file upload that gets added to the regex.list.

I read this one: External regex/wildcard blocklists
But that was only a question about whether it exists.
I am asking for it to be implemented.

jfb · January 20, 2020, 4:56pm

Please clarify what you want to do. Do you want to go to a list of regex filters on the web and then import them directly into your local regex list? Or, are you looking for a way to import adBlock-style blocklists into Pi-Hole? If so, please provide an example of such a file or a line entry from that file that you wish to import.

I don't understand this. What is not supported? Are you referring to the discontinued feature where Pi-Hole would parse adBlock style lists into HOSTS format?

What current lists are these - adBlock formatted lists on the internet?

If you have a file that contains a number of regex entries (one per line), you can just copy and paste those lines directly into file /etc/pihole/regex.list , save the file and restart FTL.

If you have a file that has a number of adBlock style entries, these will need to be manually converted to either domains or regex, then imported into your blacklist using the correct selection (exact or regex).

DISCLAIMER - the file locations for regex and blacklist and whitelist will not be the same in V5.0 and later. The discussion above is accurate for V4.x only.

cottton · January 21, 2020, 1:11am

I actually wanted a preview of my post, but we don't have this feature here ... so I submitted it and wanted to read it again. But then I could not see it because of this "needs to be approved" thing ... so ...

I'm asking for a way to get domains and their subdomains blocked.
See: Why is regex on blacklists not supported? · Issue #3088 · pi-hole/pi-hole · GitHub

I always thought Pi-hole works like this: "domain.com" blocks this domain and all its subdomains.

That does not seem to be the case, and regex doesn't work in blacklists.

So it would be nice to get this working somehow. Perhaps through shared regex blocklists.

I really have a hard time writing a reply in this kind of popup. Could you change this too?

jfb · January 21, 2020, 1:56am

On the web admin blacklist page, type domain.com and enter as wildcard. Pi-Hole will convert this to regex and this domain and all subdomains will be blocked.
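The wildcard-to-regex conversion described here can be illustrated with a small Python sketch. The function name `wildcard_to_regex` is hypothetical, and the output pattern is assumed to have the same shape as the clarity.ms filter quoted later in this thread; the exact string the web UI generates is not confirmed here. Python's `re` module is only a stand-in for Pi-hole's regex engine.

```python
import re


def wildcard_to_regex(domain: str) -> str:
    """Build a pattern that matches the domain itself and any subdomain."""
    # Escape the dots so they match a literal "." instead of any character.
    escaped = domain.replace(".", r"\.")
    # "(\.|^)" requires either a preceding dot (subdomain) or the string start.
    return r"(\.|^)" + escaped + "$"


pattern = wildcard_to_regex("domain.com")

# The domain and its subdomains match; a lookalike such as notdomain.com does not.
for name in ("domain.com", "ads.domain.com", "notdomain.com"):
    print(name, bool(re.search(pattern, name)))
```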

If you have a list of regex already prepared or found on the internet, copy them all and add these to file /etc/pihole/regex.list and restart FTL.

cottton · January 21, 2020, 3:57pm

Please read.

Also, your suggestion makes no sense since those files won't exist in 5.0 anymore.

I came here to post a feature request, as I was told to on GitHub.
Now it feels like talking to tech support that feeds me default answers.

I will try it in one sentence:
Please add a regex-blacklist import
so we can get domains and their subdomains blocked (as actually expected)
and people|projects can share those lists.

Thanks.

EDIT: fixed highlighting .. omg this twitter like posting is annoying.

jfb · January 21, 2020, 4:04pm

We understand the feature request now. The title has been changed to match the request.

cottton · January 21, 2020, 4:30pm

Note: I just ran across a related topic: Load regexps from "host file" - #9 by DL6ER

PromoFaux · January 21, 2020, 6:13pm

The preview is to the right of where you are typing.

That's just how it is. You can make the composer full screen by clicking the arrows to the right of the popup.

The markdown syntax is the same as on github...

Anyway. Let's keep discussion about this particular platform to the Meta category. Please help keep threads on the topic they were started for.

B_Trug · February 1, 2020, 11:20pm

I would love to see an import function for regexes that is as easy as importing blocklists.

B_Trug · February 10, 2021, 8:48pm

Yes please. We need the possibility to import and (more importantly) update regex in Pi-hole.

For example: in Germany there is currently a big phishing attack running on duckdns.org. Since it is dynamic DNS, it's impossible to block via hosts. So you edit your regex, but that has to be done locally by everyone... and that's not happening.

So Pi-hole is wasting a big opportunity.

timmy · February 13, 2021, 5:39pm

I like that proposal too.
That could lead to much smaller blocklists.

jfb · February 13, 2021, 7:59pm

But it won't. You will still use the same blocklists, and the blocklist maintainers will continue to put the same domains in their blocklists. Any regex you use will be in addition to those domains, not substituting for those domains.

timmy · February 17, 2021, 8:33am

If the blocklist maintainers would also offer a "RegEx-blocklist" that would be really nice.
Especially for blocking domains like the one @B_Trug mentioned above.

I am well aware that standard blocklists won't be substituted, but the number of listed domains can be reduced when both a regex blocklist and a standard blocklist are offered.
See for example: https://www.reddit.com/r/pihole/comments/lhajje/looks_like_its_time_to_add_some_new_trackers_to/

I checked my blocklists for "clarity.ms" and found five domains:

c.clarity.ms
clarity.ms
lean.clarity.ms
log.clarity.ms
www.clarity.ms

My RegEx filter (\.|^)clarity\.ms$ will block these five domains and any new subdomains before a blocklist maintainer updates a blocklist.

So I would like to see an option to add a RegEx blocklist like a standard blocklist, so that I don't have to add each regex filter manually.
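The coverage claimed for the clarity.ms filter can be checked with a short Python sketch. It assumes the filter was meant with escaped dots, `(\.|^)clarity\.ms$`, and uses Python's `re` module as a stand-in for Pi-hole's regex engine. The three extra domains in `hypothetical` are made up for illustration.

```python
import re

# Assumed form of the filter from the post, with the dots escaped.
CLARITY_FILTER = re.compile(r"(\.|^)clarity\.ms$")

# The five entries found in the existing blocklists.
listed = [
    "c.clarity.ms",
    "clarity.ms",
    "lean.clarity.ms",
    "log.clarity.ms",
    "www.clarity.ms",
]

# Made-up names: a future subdomain and two lookalikes that should stay unblocked.
hypothetical = ["new.clarity.ms", "notclarity.ms", "clarity.ms.example.com"]


def is_blocked(domain: str) -> bool:
    """True if the regex filter matches the queried domain."""
    return CLARITY_FILTER.search(domain) is not None


# Exact entries that the single regex makes redundant.
redundant = [d for d in listed if is_blocked(d)]
print(len(redundant), "of", len(listed), "exact entries are covered by one regex")

for domain in hypothetical:
    print(domain, is_blocked(domain))
```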

UseGithubDammit · February 3, 2022, 12:28pm

I've had my PiHole for many years now and had firmly entered the "I forget that it's there" phase of the lifecycle. But I've had to tinker with it a few times recently, and from those interactions I really think that this feature (and a couple of similar ones) are badly missing...

My scenarios were the following:

New TV entering my network. I maintain my personal blacklists on Github and updated my TV list to sort out Samsung (in addition to LG). No problem (I assumed), but when I came to look at the Interface, it told me that a number of my domains were "Invalid" because I had used RegEx syntax! The only way to use these is individually!!! That meant when adding the list to my "TV" Group, I had to separately attach each regular expression to the correct Group from the Domains page (i.e. remove from Default and add to TV) manually for each one. Slow and annoying and now my list has to be maintained in two places!
My mum came round and wanted to use WhatsApp, Instagram and Facebook (all firmly blocked on my network). No problem I thought to myself, I can create a new Group for her (Social Media Allowed), add another list to my Github that is a whitelist and then add the domains to that. This time, there's no whitelist option for Adlists (voted for that issue separately), so each and every one of those rules had to be added manually (and painfully assigned to the correct, single group). Many were also regexp whitelists. Very slow and annoying and I can't see at a glance what I'm doing from Github either to update them.
A friend came round and wanted to show me his new product and this meant needing access to various domains such as Google Analytics, Google Ads, Bugsnag and several others that were all naturally blocked on my network... Again began the painful process of manually adding these one at a time and assigning them to a new Group (Development). Again, managing such a list (or lists) on Github would be massively easier.

From these small tasks, I can certainly say that there is a good argument for supporting lists of the 4 main task types:

Exact Blacklist (exists)
Regex Blacklist (this issue)
Exact Whitelist (voted)
Regex Whitelist (didn't see an issue, but it should also exist)

The Adlists page could simply have an extra select box with these options. Appropriate warning labels could be added to ensure users know the risks.
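As a rough illustration of the four list types, the select box could map to something like the following Python sketch. The names `ListType` and `entry_matches` are hypothetical and are not taken from Pi-hole's code; the sketch only shows that one extra attribute per list (action plus exact/regex) is enough to tell the four cases apart.

```python
import re
from enum import Enum


class ListType(Enum):
    """The four list types from the post, as (action, is_regex) pairs."""
    EXACT_BLACKLIST = ("block", False)
    REGEX_BLACKLIST = ("block", True)
    EXACT_WHITELIST = ("allow", False)
    REGEX_WHITELIST = ("allow", True)

    @property
    def action(self) -> str:
        return self.value[0]

    @property
    def is_regex(self) -> bool:
        return self.value[1]


def entry_matches(list_type: ListType, entry: str, domain: str) -> bool:
    """Compare one list entry against a queried domain."""
    if list_type.is_regex:
        # Regex lists: the entry is a pattern searched within the domain.
        return re.search(entry, domain) is not None
    # Exact lists: the entry must equal the domain.
    return entry == domain
```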

I understand the issue of compromised lists for whitelisting, but there are those of us who maintain their own lists carefully and are denied such functionality for no really good reason. The software obviously supports all cases already anyway, the only difference is in the user irritation in having to manually enter the details one at a time!

These lists would also allow me to share my complete lists with friends and family without having to go on a copy and paste mission with detailed instructions and irritation for them to achieve the same goals.

Please consider adding them all!!

mam · April 15, 2022, 8:09am

I would also love to see this feature.
So far, I have made my own script. It loads a list from a web server (my local one, but that does not matter) and pipes the entries into pihole with a line like this:

xargs -L2 /usr/local/bin/pihole regex -nr -q < "$RX" 2>&1 >/dev/null

This is not really fast, but it is legal and working (overwriting the existing file was bad before and stopped working with V5).

I still overwrite the whitelist because I did not find a "legal" way for this yet.
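The batching that `xargs -L2` does in the line above can be sketched in Python. This is only an illustration: the helper name `build_commands` is hypothetical, the `pihole regex -nr -q` arguments are copied from the post, and nothing is executed here.

```python
def build_commands(lines, batch_size=2):
    """Group non-empty list entries into pihole command lines, batch_size per call."""
    # Drop blank lines and surrounding whitespace.
    entries = [line.strip() for line in lines if line.strip()]
    # Path and flags are taken verbatim from the shell line in the post.
    base = ["/usr/local/bin/pihole", "regex", "-nr", "-q"]
    return [
        base + entries[i:i + batch_size]
        for i in range(0, len(entries), batch_size)
    ]


# Example with three made-up entries: two commands are produced.
for command in build_commands(["first\n", "second\n", "\n", "third\n"]):
    print(command)
```

Each resulting argument list could then be passed to `subprocess.run`, which avoids shell quoting problems with regex characters.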

yubiuser · January 14, 2024, 11:59am

We support limited parsing of ABP-style lists, which can be used to block domains and all their subdomains. I would consider this limited regex support.

https://github.com/pi-hole/pi-hole/pull/5179
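For readers unfamiliar with ABP-style entries, the following Python sketch shows the general idea of the common `||domain^` form: one entry covers a domain and all of its subdomains. It is not Pi-hole's parser. The function names are hypothetical, and exactly which subset of the syntax Pi-hole accepts is defined by the linked pull request, not by this sketch.

```python
def parse_abp_line(line: str):
    """Return the domain from a plain '||domain^' entry, or None for anything else."""
    entry = line.strip()
    if entry.startswith("||") and entry.endswith("^"):
        domain = entry[2:-1]
        # Assumption: only plain host names are accepted, no options or wildcards.
        if domain and all(ch.isalnum() or ch in ".-" for ch in domain):
            return domain.lower()
    return None


def abp_blocks(rule_domain: str, query: str) -> bool:
    """True if the query is the rule's domain or one of its subdomains."""
    return query == rule_domain or query.endswith("." + rule_domain)


rule = parse_abp_line("||clarity.ms^")
for name in ("clarity.ms", "log.clarity.ms", "notclarity.ms"):
    print(name, abp_blocks(rule, name))
```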