Collection of RegEx for blacklisting

Hi.
I have been lurking here for some time now and learned a lot.
In the last weeks, I have been primarily looking into RegExs and accumulated a small collection that I am using on my Pi-Hole. I created these RegExs by searching for the Top 100 tracking, data analysis, advertising and malware sites, as well as combing through some blacklists and the privacy/tracking agreements on sites I visit regularly.
Since I wanted to give something back to the Community I learned from, here they are.

Just one more thing, before anyone asks:
No, I have not found a way to block YouTube ads, sorry. (But I am open to suggestions :slight_smile: )

Please feel free to use, correct or add.
Enjoy :slight_smile:

Blacklists:

^(.*\.)*(sonobi|contentabc|exponential|bebi|3lift|leanplum|hotjar|m-pathy|parsely|exelator|touchcommerce|magnetic|exosrv|go2affise|mgid|clksite|tynt|inclk|bidswitch|optimizely|tiqcdn|maxymiser|9xiazaiqi|siquality|snwxn|8wix|tudown|psdzy|bankofamericaslpemr|cloudns|pfa\.levexis)\.([a-z]{2,3}\.)*[a-z]{2,6}$

^(.*\.)*(dotomi|urbanairship|criteo|yimg|footprint|sitescout|turn|w55c|openx|voluumtrk|videoplaza|inmobi|reporo|cnzz|skim(resources|links)|tribalfusion|getsocial|hoverr|mediaplex|auditude|tacoda|puserving|d(e|o)mdex)\.([a-z]{2,3}\.)*[a-z]{2,6}$

^(.*\.)*(buffpanel|redshell|treasuredata|unity(ads|3d)|evidon|flashtalking|mathtag|mediamath|scorecardresearch|yieldmanager|sharezips|liveadvert|mktoresp|online-metrix|mobileapptracking|webengage|mpstat|trafficjunky|alphonso|stickyadstv|marketo|tubemogul|kameleoon|24s|0x1f4b0)\.([a-z]{2,3}\.)*[a-z]{2,6}$

^(.*\.)*(petametrics|localytics|onthe|newrelic|casalemedia|tidaltv|newsinc|nr-data|tapad|crazyegg|vserv|solocpm|ojrq|getclicky|narrative|7eer|evyy|impactradius|redtrack|content-ad|sharethrough|getui|pro-market|albacross|evergage|araleg|districtm|singular|dynamicyield|smrtb|everesttech|snapchatprime)\.([a-z]{2,3}\.)*[a-z]{2,6}$

^(.*\.)*(perimeterx|coremetrics|veinteractive|netcoresmartech|vungle|batmobi|marinsm|igexin|keywordblocks|facebook-info|freecontent|hostingcloud|360safe|axiatraders|beginads|ero-advertising|inteleksys|joodfbnm*[0-9]*|nbrwer*[0-9]*|usergrid|trackersimulator|tapjoy|supercell|bluekai|chartboost|atues|duilawyeryork|bannerflow|bmmetrix|flagcounter)\.([a-z]{2,3}\.)*[a-z]{2,6}$

^(.*\.)*(segment|braze|amplitude|emetriq|buysellads|ioam|ensighten|eulerian|dnsdelegation|clicktale|samsungads|userlike|summerhamster|sourcepoint|adobetm|pcapredict|syscation|infosupports|arrowservice|bigdepression|earthsolution|firefoxupdata|infobusinessus|yahoodaily|newsonet|worthhummer|purpledaily|blackcake|msnhome)\.([a-z]{2,3}\.)*[a-z]{2,6}$

^(.*\.)*(inmobicdn|jizzads|leadbolt|nexage|milleniamedia|mobfox|mobilityware|newrelic|propellerads|revd(depo|sci)|segment|serving-sys|sharethis|startapp|steelhousemedia|tapjoy|viglink|webterren|adtaily|zedo|liadm|exposebox|jazdoxthxiv|jwmwtcmexc|thetradedesk|rocketfuel|ignitionone|komoot|squarelovin|cookiebot)\.([a-z]{2,3}\.)*[a-z]{2,6}$

^(.*\.)*(aatkit|abtasty|acce(ngage|sstrade)|adtec(h|jp|us)|anthill|atdmt|avazutracking|bfmino|bounceexchange|branch|clever(push|tap)|conviva|crittercism|cxense|deltadna|duapps|fractionalmedia|fyber|hyprmx|justpremium|lijit|mobileapptracking|ogury|omniture|presage|tapad|pubmatic|pushwoosh|qq|servedbyopenx|supersoni(c|cads)|swrve|webtrekk|aimatch|sas)\.([a-z]{2,3}\.)*[a-z]{2,6}$

^(.*\.)*(actioniq|hs-(analytics|banner)|hsleadflows|hubspot|agileone|bizible|crazyegg|acquia|wistia|engagio|lytics|am15|bannerbank|bbelements|bravenet|cedexis-radar|crypto-loot|economicoutlook|esomniture|estat|extreme-dm|ezcybersearch|fastclick|focalink|gemius|hyperbanner|iovation|kaffnet|yieldlove|histats|forsee)\.([a-z]{2,3}\.)*[a-z]{2,6}$

^(.*\.)*(indexww|yieldlove-ad-serving|xplosion|digitru|upapi|spotxchange|permutive|onesignal|usabilla|contextweb|ml314|afcpatrk|(klclick[0-9])|aidata|hybrid|deployads|perfectaudience|liveramp|didomi|etahub|giraff|hurra|sizmek|nativo|djaxadserver|webmasterplan|ggpht|zanox|audrte|agkn|dataxu|scarabresearch|ekomi)\.([a-z]{2,3}\.)*[a-z]{2,6}$

^(.*\.)*(apsalar|tune|qualtrics|neustar|webtrends|linksynergy|backtrace|doubleverify|ligatus|a4|rayjump|umeng|umengcloud|wootric|medialytics|herokuapp|yllix|mobpartner|combango|bugsense|burstly|count|crashlytics|do-not-tracker|eviltracker|getexceptional|jumptap|playtomic|stathat|163|206ads|2mdnsys|360in|4seeresults|accesstrade|ads[0-9]-adnow|pop(ads|cash)|popadscdn)\.([a-z]{2,3}\.)*[a-z]{2,6}$

^(.*\.)*(urekamedia|kochava|upsight|marfeel(cache|cdn|rev|tenmax)|blueseed|adx[0-9]|zucks|o-s|aralego|breaktime|mdotm|juicer|cnt|ematicsolutions|alcmpn|powerlinks|33across|1rx|bfmio|bnmla|doublepimp|weborama|specific(click|media)|acuityplatform|cnnx|scanscout|nxtck|socdm|simpleanalytics|gumgum|marketolive|demandbase|izooto|ibillboard)\.([a-z]{2,3}\.)*[a-z]{2,6}$

^(.*\.)*(invitemedia|microad|ardata|152media|exoclick|doublemax|wywy|navdmp|trafficstars|pubexchange|vdopia|lsosad|aaxads|mantisadnetwork|triplelift|trafficfactory|themoneytizer|insurads|proads|pxlad|flux-adserve|eclick|vcmedia|nova(net|onads|on|nox)|coccoc|polyad|cleverads|ambient-platform|mangoads|cityads|autoads|yoads)\.([a-z]{2,3}\.)*[a-z]{2,6}$

^(.*\.)*(oewabox|metriweb|caramb(o|ola)|refinedads|klaviyo|a-mo|fieldtest|justuno|smartredirect|ezoic|statscrop|optinmonster|nr-data|avocet|bttrack|eyeota|jetlore|1up|abmr|acxiom-online|mlnadvertising|amgdgt|askmen|bam-x|bidr|bluecava|brand-display|brilig|ccgateway|channelintelligence|choicestream|clickagy|cognitivlabs|collective-media)\.([a-z]{2,3}\.)*[a-z]{2,6}$

^(.*\.)*(company-target|crosspixel|crsspxl|decdna|decideinteractive|disqus|emxdgt|extremetech|eyeviewads|btbuckets|fetchback|fimserve|freeskreen|gmads|gwallet|hlserve|imiclk|innovid|insidecrm|intentmedia|interclick|ipr(edictive|omote)|iqm|kargo|knorex|lkqd|logicbuy|marchex|media6degrees|mediaforge|mediaiqdigital|mixpo|mmismm|numberly|1000mercis)\.([a-z]{2,3}\.)*[a-z]{2,6}$

^(.*\.)*(mxptint|ncaudienceexchange|net(mng|seer)|owneriq|parrable|pm14|postrelease|pro-market|raasnet|retargetly|revsci|rfihub|rkdms|ru4|samba|semasio|sojern|udmserve|unrulymedia|vindicosuite|vmmpxl|vmweb|wsod|xad|xgraph|yieldmo|yieldoptimizer|youknowbest|zemanta|skimlinks|ipromote|lotame|contentsquare|matomo|piwik|roq|emerse|widespace)\.([a-z]{2,3}\.)*[a-z]{2,6}$

^(.*\.)*(nrich|rtbhouse|sovrn|beeswax|indexexchange|c3metrics|contactimpact|1plusx|semrush|improvedigital|xandr|madvertise|neural|rhythmone|inskinmedia|jivox|gsi-one|celtra|tapfiliate|req|moburst|webmechanix|divisionoflabor|commcreative|metrictheory|gkv|periscope|yieldbranding|gumas|fivebyfiveglobal|williamswhittle|baycreative|milleradagency)\.([a-z]{2,3}\.)*[a-z]{2,6}$

^(.*\.)*(farinella|wearesculpt|neonambition|envision-creative|tronviggroup|databricks|gotomarketers|maxaudience|perfectsearchmedia|titangrowth|3mediaweb|directom|seobrand|disruptiveadvertising|bigleap|silverbackstrategies|ignitevisibility|thriveagency|webfx|seoinc|socialseo|frac|klientboost|avalaunchmedia|siegemedia|straightnorth|gulosolutions|grafik)\.([a-z]{2,3}\.)*[a-z]{2,6}$

^(.*\.)*(thestorywebs|directiveconsulting|pbjmarketing|befoundonline|walkersandsdigital|firebellymarketing|comradeweb|contentbureau|hanapinmarketing|fruition|digitalbrandexpressions|noblestudios|thoughtspot|collibra|erne|uimserv|belboon)\.([a-z]{2,3}\.)*[a-z]{2,6}$

^(.*\.)*(app(boy|next|adhoc|celerator|ier|odeal|lovin|nexus|sflyer|boy|brain|ier|lvn|spot|timize|see|ads|lifier|lift|logrocket|-measurement|topia))\.([a-z]{2,3}\.)*[a-z]{2,6}$

^(.*\.)*(ad(j|skeeper|dtoany|vertica|true|vertnative|pushup|media|master|ledge|kernel|kmob|future|bro|booth|ap|apt|4game|just|telligent|4mat|tng|alliance|link|marketplace|[0-9]|adapted|acado|alliance|alyser|brite|bureau|and|blade|brodealsnetwork|spirit|epom|lightning|plugg|glare|speed|butler|x|roll|port|river|srvr|advisor|symptotic|dthis|rtx|scale))\.([a-z]{2,3}\.)*[a-z]{2,6}$

^(.*\.)*(ad(smogo|target|infuse|conversantmedia|now|view|ups|tim(a|aserver|ming)|srv|spruce|see|same|sniper|smoloco|shot|skeeper|x1|tdp|hese|lightning|micro|entifi|ventori|worx|zerk|roll|safeprotected|this|colony|tech[a-z]{1,2}|dthisedge|looxtracking|vertising|nxs|(ds|s)wizz|omik|k2x|k2|ition|this|form|ocean|protect|science|tlgc|vertserve|thrive))\.([a-z]{2,3}\.)*[a-z]{2,6}$

^(.*\.)*(ad(netasia|clixx|hacker|xcorp|vergine|optimize|grx|sfactor|snative|op|high|xxx|schoom|recover|spsp|safety|marvel|brix|servme|flex|rcdn|maym|pone|xadserv|up-tech|timaserver|network|tima|360|pia|sota|splay|sparc|legend|chemy|haven|ready|viva|gear|ikteev|dapptr|defend|ventureppc))\.([a-z]{2,3}\.)*[a-z]{2,6}$

^(.*\.)*(ad-(balancer|brix|center|cloud|delivery|delivery|locus|maven|move|plus|score|srv|stir))\.([a-z]{2,3}\.)*[a-z]{2,6}$

Whitelist:

^(.*\.)*(dpd|ups|dhl|safelinking|githubusercontent|researchintel|intel|github|foldingathome|sysctl|apple-dns|apple|icloud|windowsupdate|samsung(electronics|otn|acr|cloudsolution|qbe)|plex|office|msftncsi|microsoftonline|dropbox|pr0gramm|solvemedia|microsoft|tnt-digital|fedex|creative)\.([a-z]{2,3}\.)*[a-z]{2,6}$

Sites I learned from and that inspired me:

For checking a RegEx:

https://regex101.com/
https://regexr.com/

Tested and Maintained Regexs:

https://github.com/mmotti/pihole-regex
https://github.com/cbuijs/accomplist/blob/8f3946d58667e6d09951a994e839f297cae95e0e/chris/regex.black
https://github.com/nocturnalarchives/BlockLists

But tell me, I am just curious: the devs here have said many times that a blocklist is faster, more effective and uses less memory (as Pi-hole runs in RAM, I guess). Even so, why do you need a regex blocklist? Are there any advantages over the standard and better method?

How do I incorporate this improved Regex version into my latest Pihole? A guide would be useful. Thanks

Regex allows you to tailor your blocking. Don't like domains with the word elephant in them? Add that to a regex and they are gone.
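To make the elephant example concrete, here is a minimal sketch in Python. The `re` module only stands in for Pi-hole's own regex engine, and the function name and example domains are made up for illustration.

```python
import re

# Hypothetical rule: block every domain that contains the word "elephant".
ELEPHANT = re.compile(r"elephant")

def is_blocked(domain: str) -> bool:
    """Return True if the rule matches anywhere in the domain name."""
    return ELEPHANT.search(domain) is not None

for d in ["elephant.example", "pink-elephants.example.org", "example.com"]:
    print(d, is_blocked(d))
```

A single short rule like this covers every domain with that word in it, which an exact-domain list cannot do.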

Each regex has to be evaluated when blocking, and this takes time. If a domain is in gravity, this is searched very efficiently via a B tree (quite fast). The same is not the case with regex.

Use them as you need them, but at some point if you add more your Pi-hole will get sluggish.

No, I should correct this.

Regex and exact domains work very differently: Gravity and exact blacklist domains are stored in a tree structure. Scanning the tree to match a single domain against (multiple!) millions of domains merely takes microseconds, even on Raspberry Pi hardware. This is because leaves can be found very efficiently in a balanced tree and searching the tree only becomes slower on a logarithmic scale (doubling the number of domains just means one more step of work, not twice as much work). You should read the Wikipedia article B-tree if you want more details, but be prepared that it is not all that straightforward when you deal with tree data structures for the first time.

[INFO: calculations simplified here and below, full details would clearly exceed the format here]

Scanning a single domain against, e.g., 10 million domains in your blacklist or blocklist (gravity) takes on the order of ceil(log2(10000000)) = 24 computation steps. This is - on all platforms - on the order of a few micro- (yes, not milli-)seconds and really cannot be made more efficient. This is because we can quickly traverse the tree down to its leaf when the expression we're matching against is constant.
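The step count above can be checked with a few lines of Python. This is only a sketch of the arithmetic: a sorted list with binary search stands in for the tree here, which is an assumption for illustration and not how Pi-hole actually stores gravity.

```python
import math
from bisect import bisect_left

def steps_needed(n: int) -> int:
    """Rough number of comparisons to find one entry among n sorted entries."""
    return math.ceil(math.log2(n))

def contains(sorted_domains: list, domain: str) -> bool:
    """Binary search for an exact domain in a sorted list (stand-in for the tree)."""
    i = bisect_left(sorted_domains, domain)
    return i < len(sorted_domains) and sorted_domains[i] == domain

print(steps_needed(10_000_000))  # 24
print(steps_needed(20_000_000))  # 25, doubling the list adds one step

# Hypothetical miniature blocklist
blocklist = sorted(["ads.example", "tracker.example", "metrics.example"])
print(contains(blocklist, "tracker.example"))  # True
print(contains(blocklist, "example.com"))      # False
```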

Regex, on the other hand, are an entirely different beast. While we can use a tree for exact domains, we have to iterate through all the regular expressions because they all could theoretically match. While an exact match is easily found (the memory is just identical), regular expressions are rules and those rules have to be applied to every domain we want to check. Explaining how the matching takes place is beyond what I can do here, too. However, I will just let you guess how much work evaluating the suggested whitelist regex is:

[Image: the suggested whitelist regex in its entire glory]

We really do our best to also serve the maximum possible bare-metal performance for regular expressions, however, they are just so much more involved that they will always be a lot more work than exact domains.
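As a rough sketch of why regex cost grows with the number of rules: every pattern has to be tried until one matches, and a domain that matches nothing pays for all of them. The three patterns are shortened versions of the ones in the first post; Python's `re` is used only for illustration and will not behave exactly like Pi-hole's regex engine.

```python
import re

# Shortened versions of three patterns from the first post
PATTERNS = [re.compile(p) for p in (
    r"^(.*\.)*(hotjar|optimizely)\.([a-z]{2,3}\.)*[a-z]{2,6}$",
    r"^(.*\.)*(criteo|openx)\.([a-z]{2,3}\.)*[a-z]{2,6}$",
    r"^(.*\.)*(newrelic|crazyegg)\.([a-z]{2,3}\.)*[a-z]{2,6}$",
)]

def match_any(domain: str):
    """Return (matched, number_of_patterns_tried) for a linear scan."""
    tried = 0
    for pattern in PATTERNS:
        tried += 1
        if pattern.search(domain):
            return True, tried
    return False, tried

print(match_any("static.criteo.com"))  # (True, 2)
print(match_any("example.com"))        # (False, 3), every rule was evaluated
```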

It all depends on your architecture (e.g., ARM is slower on regex compared to x86 because it is missing some important optimizations), but I'd always rather use a list with 100,000 domains than one complex regular expression. That being said, Pi-hole has become so efficient with regular expressions that you won't actually notice a difference in speed (I won't notice if loading a page takes some 10 milliseconds longer).

It's the other way around. As blocklist/gravity matching is much, much faster, we do this first, as it may save us from having to do regex at all. The worst case is a domain that is not blocked. For such a domain, the entirety of whitelist exact, whitelist regex, blacklist exact, gravity (exact) and blacklist regex has to be searched through. Only if a domain wasn't found on any list do we forward it upstream.
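The search order described above could be sketched like this. It is a simplified illustration and not FTL's actual code: the sets and pattern lists are hypothetical stand-ins, and it assumes that a whitelist hit lets the query through without looking at the remaining lists.

```python
import re

# Hypothetical list contents, for illustration only
WHITE_EXACT = {"github.com"}
WHITE_REGEX = [re.compile(r"^(.*\.)*(dhl|fedex)\.([a-z]{2,3}\.)*[a-z]{2,6}$")]
BLACK_EXACT = {"ads.example"}
GRAVITY = {"tracker.example"}
BLACK_REGEX = [re.compile(r"^(.*\.)*(criteo|footprint)\.([a-z]{2,3}\.)*[a-z]{2,6}$")]

def classify(domain: str) -> str:
    """Walk the lists in the order given in the post above."""
    if domain in WHITE_EXACT:
        return "allowed (whitelist exact)"
    if any(r.search(domain) for r in WHITE_REGEX):
        return "allowed (whitelist regex)"
    if domain in BLACK_EXACT:
        return "blocked (blacklist exact)"
    if domain in GRAVITY:
        return "blocked (gravity)"
    if any(r.search(domain) for r in BLACK_REGEX):
        return "blocked (blacklist regex)"
    # Worst case: every list was searched and nothing matched
    return "forwarded upstream"

for d in ["github.com", "www.dhl.de", "tracker.example",
          "static.criteo.com", "example.org"]:
    print(d, "->", classify(d))
```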


But it is also blocking tv2play.dk
Now I have deactivated all of them, and tv2play.dk works fine again.
Now I will try to activate them one by one and find the entry that stops tv2play.dk from playing programs.

Yes, it does, because tv2play uses footprint.net. You can either remove "footprint" from the corresponding RegEx, or do as @anon55913113 said and just whitelist it.
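A quick way to see the effect of removing "footprint" is to test the pattern before and after. This sketch uses a shortened version of the second RegEx from the first post; the hostname `cdn.footprint.net` is a made-up example, and Python's `re` only approximates what Pi-hole's engine does.

```python
import re

SUFFIX = r"\.([a-z]{2,3}\.)*[a-z]{2,6}$"

def build(names):
    """Assemble a pattern in the same shape as the ones in the first post."""
    return re.compile(r"^(.*\.)*(" + "|".join(names) + ")" + SUFFIX)

names = ["dotomi", "criteo", "yimg", "footprint", "sitescout"]
before = build(names)
after = build([n for n in names if n != "footprint"])

host = "cdn.footprint.net"  # hypothetical hostname under footprint.net
print(bool(before.search(host)))                # True, blocked
print(bool(after.search(host)))                 # False, no longer blocked
print(bool(after.search("static.criteo.com")))  # True, the rest still works
```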

I tried that with one RegEx and it did not work.

And that's exactly why I wrote the above RegExs. They are a collection of trackers and ad servers I found annoying.
As @DL6ER said, RegExs are not optimal for performance, but I am running my Pi-Hole on a NUC6CAY, which might be a little overkill, but it has enough performance for some RegExs.

One or two is what I had in mind. Not a thousand of them. I would only add regex for unwelcome domains that aren't blocked by your existing adlists and blacklist, and that is very few.

Yes, thank you, it helped when I removed footprint from the RegEx.

Thanks, works like a charm.


Microsoft won't allow you to block its servers using the hosts file. There are scripts used to get around it. In fact, the hosts file can be used to block only about 1 percent of Microsoft servers. Now here is what's odd: even if you block all Microsoft servers on a firewall, Microsoft still gets through. So how is this going on? And, better, why are there HTTP and FTP servers on each PC and server, and why are Remote Desktop and Terminal Server also added, and why can't they be removed?
What happens is that your computer sends out a ping and looks for any Microsoft server. In order to get network access it sets hidden routes and uses IP addresses that it should not use: 241.0.0.1, 255.255.255.255 and all the other broadcast addresses. I have worked with Microsoft servers since XP. It is very hard to block Microsoft, but there is a way using IPsec, though note that they can use it as well. I used to be able to get it to work perfectly; now I can't, as if something was changed. We need a program that binds to log files. Once these files are stolen, it does one thing: flash the hard drives in the whole data center to spin down for one second. Thanks to the NSA for this idea, but no one ever built it, as this program has to be smart, very smart, and able to learn. We take the fight back to them. Some say this is illegal. No, only deployment is illegal: they steal it, and in court they must admit to the theft of said files. But the key is that there is no way to turn it off and no way to restore their network.
It gets on all computers and it monitors. If the data center is rebuilt, it will just reinfect it. As it is in pieces, no virus detection will work. It is the same idea as using DNS to make a VPN. We must take the fight back to them. If I could write it I would, but I can't; I can't learn to code.

This list is not working

https://github.com/cbuijs/accomplist/blob/8f3946d58667e6d09951a994e839f297cae95e0e/chris/regex.black

Is there an actual link?

The link is dead.
Here is the link to his page:

But: this was not a List to begin with, just some Inspiration I got.
All the RegEx "Lists" mentioned in the first post are not Lists per se, meaning you can't import them directly.

How to get the content of this file?

Try the link above. If the maintainer of this git has not deleted the file, it should be still there.
Where exactly, I do not know.
