Hello everyone, not only is this my first post, but I am also a first-time Raspberry Pi 5 owner as of 12/27/23. Thank you in advance for your consideration.
I wrote the following regex, which seems to work in Notepad++ but doesn't give me the desired results in Pi-hole Domain Management.
^(?:[a-zA-Z0-9-]+\.)*\bgoogle\.(?!com\b)\w+
The type is set to "Regex blacklist" and I gave it the comment: "google.com Permit / Deny All Others".
I want to block everything in this list except for the first two items, "google.com" and "www.google.com".
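For anyone wanting to reproduce the Notepad++ behavior outside the editor, here is a minimal Python sketch of what the regex above matches. Python's `re` module accepts the lookahead `(?!...)`, `\b`, and `\w`, much as Notepad++ does. Whether Pi-hole's regex engine accepts those constructs is an assumption worth checking against the Pi-hole regex documentation, since a different flavor could explain the different results. The sample hostnames are the ones named in this thread plus `www.google.ca`, which is assumed for illustration.

```python
import re

# The regex from the post, unchanged.
PATTERN = re.compile(r"^(?:[a-zA-Z0-9-]+\.)*\bgoogle\.(?!com\b)\w+")

# Hostnames mentioned in the thread, plus one assumed subdomain (www.google.ca).
SAMPLES = ["google.com", "www.google.com", "google.net", "google.ca", "www.google.ca"]


def is_blocked(domain):
    """Return True if the deny regex matches the queried domain."""
    return PATTERN.search(domain) is not None


for domain in SAMPLES:
    verdict = "blocked" if is_blocked(domain) else "allowed"
    print(f"{domain:16} {verdict}")
```

In Python this leaves google.com and www.google.com alone and matches the other three, which is the intended result. It only says something about a lookahead-capable engine, not about Pi-hole itself.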
Ok, now you've done it... a challenge! lol - Thanks for pointing out the potential issue with the regex. Here's the new and improved version. Hammer away at it, I think it's rock solid now...
Google - Deny All Sites ^(?:[a-zA-Z0-9-]+\.)?([a-zA-Z0-9-]+\.)?google\.[a-zA-Z]{2,}$
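A rough way to hammer on this version is to run it over a few hostnames in Python. This is only a sketch: Python's `re` is not Pi-hole's engine, although this pattern uses nothing exotic, and every hostname other than google.com and www.google.com is made up for illustration.

```python
import re

# The "Deny All Sites" regex from the post, unchanged.
PATTERN = re.compile(r"^(?:[a-zA-Z0-9-]+\.)?([a-zA-Z0-9-]+\.)?google\.[a-zA-Z]{2,}$")


def is_match(domain):
    """Return True if the regex matches the whole queried domain."""
    return PATTERN.search(domain) is not None


# Illustrative hostnames (assumed, not taken from the original list).
for domain in ["google.com", "www.google.com", "a.b.google.com",
               "a.b.c.google.com", "google.co.uk"]:
    print(f"{domain:18} {'match' if is_match(domain) else 'no match'}")
```

Two things stand out in this check: at most two labels are allowed in front of `google`, so a deeper subdomain slips through, and a two-part suffix such as `.co.uk` is not covered because only a single label is allowed after `google.`.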
Chris, thanks for pointing out that very cool use of wildcard matching a domain.
Well, rats. I was obviously trying to future-proof it so I wouldn't have to manually maintain a list. I was unable to find a list of search engines (for Google or anyone else) and their country-specific alternative sites. Fortunately, Google has a list of all their sites, and I can add them to my own private list, which is what I originally started doing before I realized, hey, maybe regex would be a good way to go.
Thanks for the insights and it's been a great learning experience.
The way I'd handle those would be a wildcard domain blacklist entry for google.ca to cover the bottom four, and an exact domain blacklist entry for google.net. That leaves the top two working by virtue of not being blocked; alternatively, you could explicitly add a wildcard domain whitelist entry for google.com.
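The difference between the two entry types can be modeled in a few lines of Python. This is a sketch under an assumption: that a wildcard entry for a domain behaves like the regex `(\.|^)domain$` with the dots escaped, while an exact entry matches only the literal name. The helper names here are hypothetical and are not part of Pi-hole.

```python
import re


def wildcard_regex(domain):
    # Assumed wildcard form: the domain itself or any subdomain of it.
    return re.compile(r"(\.|^)" + domain.replace(".", r"\.") + "$")


EXACT_DENY = {"google.net"}                 # exact entry: this name only
WILDCARD_DENY = [wildcard_regex("google.ca")]  # wildcard entry: name and subdomains


def is_blocked(name):
    """Return True if an exact or wildcard deny entry covers the name."""
    return name in EXACT_DENY or any(p.search(name) for p in WILDCARD_DENY)


for name in ["google.com", "www.google.com", "google.net",
             "google.ca", "www.google.ca"]:
    print(f"{name:16} {'blocked' if is_blocked(name) else 'allowed'}")
```

With these two entries, google.com and www.google.com stay reachable simply because nothing matches them, so the whitelist entry is optional.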
If you want to block every item currently on this list, you can add 3 or 4 regexes to cover the entire list without the need to whitelist google.com and its subdomains (and google.whatever.tld will not be blocked).
Nice solution and it looks future proofed pretty well without hindering sites like google.whatever.com. I'm going to implement the 3 rule set now. Thank you.
Taking the advice found here, this is the solution I'm running, which allows *.google.com but blocks dns.google.com along with other variants of google search engines found with alternate TLDs.
dns.google.com is what provides the ability to do DNS over HTTPS, which gets around port 53 blocking.
The online example concatenates the 5 entries below to demonstrate matching against the current Google domains list (list dated 1/4/2024).
Deny *.google.xx and *.google.xx.xx (^|\.)google(\...){1,2}$
Deny *.google.com.xx (^|\.)google\.com\...$
Deny *.google.cat (^|\.)google\.cat
Deny *.dns.google.com (^|\.)dns\.google\.com
Deny *.dns.google (\.|^)dns\.google$
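A quick offline way to sanity-check the five entries is to join them with `|`, as the online example does, and run a handful of hostnames through Python. Treat this as a sketch rather than a replica of Pi-hole's matching: Python's `re` is a different engine, the literal dots in the two dns.google entries are written escaped here (an assumption about the intent), and the sample hostnames are illustrative rather than the full Google domains list.

```python
import re

# The five deny entries, keyed by their comments.
RULES = {
    "*.google.xx and *.google.xx.xx": r"(^|\.)google(\...){1,2}$",
    "*.google.com.xx": r"(^|\.)google\.com\...$",
    "*.google.cat": r"(^|\.)google\.cat",
    "*.dns.google.com": r"(^|\.)dns\.google\.com",
    "*.dns.google": r"(\.|^)dns\.google$",
}

# Concatenate the entries into one alternation.
COMBINED = re.compile("|".join(f"(?:{pattern})" for pattern in RULES.values()))


def is_blocked(domain):
    """Return True if any of the five deny entries matches the domain."""
    return COMBINED.search(domain) is not None


# Illustrative hostnames (assumed samples, not the complete list).
SAMPLES = ["google.com", "www.google.com", "google.whatever.com",
           "dns.google.com", "dns.google", "google.ca", "www.google.ca",
           "google.co.uk", "google.com.au", "google.cat"]

for domain in SAMPLES:
    print(f"{domain:20} {'blocked' if is_blocked(domain) else 'allowed'}")
```

In this check the first three samples are allowed and the rest are blocked, which lines up with the goal of keeping *.google.com reachable while denying dns.google.com and the alternate-TLD search sites.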
This is my final version until one of these brainiacs comes up with yet another way of doing things better. Hope this rundown of what I'm doing helps someone else who wants to keep the kiddos away from adult thumbnails in Google and other search engines. Side note: I'm enforcing Google Safe Search via the CNAME method. This solution has really locked things down.