Collection of RegEx for blacklisting

Hi.
I have been lurking here for some time now and learned a lot.
In the last few weeks, I have been looking primarily into RegExs and have accumulated a small collection that I am using on my Pi-hole. I created these RegExs by searching for the Top 100 tracking, data analysis, advertising and malware sites, as well as combing through some blacklists and the privacy/tracking agreements on sites I visit regularly.
Since I wanted to give something back to the Community I learned from, here they are.

Just one more thing, before anyone asks:
No, I have not found a way to block YouTube ads, sorry. (But I am open to suggestions :slight_smile: )

Please feel free to use, correct or add.
Enjoy :slight_smile:

Blacklists:

^(.*\.)*(sonobi|contentabc|exponential|bebi|3lift|leanplum|hotjar|m-pathy|parsely|exelator|touchcommerce|magnetic|exosrv|go2affise|mgid|clksite|tynt|inclk|bidswitch|optimizely|tiqcdn|maxymiser|9xiazaiqi|siquality|snwxn|8wix|tudown|psdzy|bankofamericaslpemr|cloudns|pfa\.levexis)\.([a-z]{2,3}\.)*[a-z]{2,6}$

^(.*\.)*(dotomi|urbanairship|criteo|yimg|footprint|sitescout|turn|w55c|openx|voluumtrk|videoplaza|inmobi|reporo|cnzz|skim(resources|links)|tribalfusion|getsocial|hoverr|mediaplex|auditude|tacoda|puserving|d(e|o)mdex)\.([a-z]{2,3}\.)*[a-z]{2,6}$

^(.*\.)*(buffpanel|redshell|treasuredata|unity(ads|3d)|evidon|flashtalking|mathtag|mediamath|scorecardresearch|yieldmanager|sharezips|liveadvert|mktoresp|online-metrix|mobileapptracking|webengage|mpstat|trafficjunky|alphonso|stickyadstv|marketo|tubemogul|kameleoon|24s|0x1f4b0)\.([a-z]{2,3}\.)*[a-z]{2,6}$

^(.*\.)*(petametrics|localytics|onthe|newrelic|casalemedia|tidaltv|newsinc|nr-data|tapad|crazyegg|vserv|solocpm|ojrq|getclicky|narrative|7eer|evyy|impactradius|redtrack|content-ad|sharethrough|getui|pro-market|albacross|evergage|araleg|districtm|singular|dynamicyield|smrtb|everesttech|snapchatprime)\.([a-z]{2,3}\.)*[a-z]{2,6}$

^(.*\.)*(perimeterx|coremetrics|veinteractive|netcoresmartech|vungle|batmobi|marinsm|igexin|keywordblocks|facebook-info|freecontent|hostingcloud|360safe|axiatraders|beginads|ero-advertising|inteleksys|joodfbnm*[0-9]*|nbrwer*[0-9]*|usergrid|trackersimulator|tapjoy|supercell|bluekai|chartboost|atues|duilawyeryork|bannerflow|bmmetrix|flagcounter)\.([a-z]{2,3}\.)*[a-z]{2,6}$

^(.*\.)*(segment|braze|amplitude|emetriq|buysellads|ioam|ensighten|eulerian|dnsdelegation|clicktale|samsungads|userlike|summerhamster|sourcepoint|adobetm|pcapredict|syscation|infosupports|arrowservice|bigdepression|earthsolution|firefoxupdata|infobusinessus|yahoodaily|newsonet|worthhummer|purpledaily|blackcake|msnhome)\.([a-z]{2,3}\.)*[a-z]{2,6}$

^(.*\.)*(inmobicdn|jizzads|leadbolt|nexage|milleniamedia|mobfox|mobilityware|newrelic|propellerads|revd(depo|sci)|segment|serving-sys|sharethis|startapp|steelhousemedia|tapjoy|viglink|webterren|adtaily|zedo|liadm|exposebox|jazdoxthxiv|jwmwtcmexc|thetradedesk|rocketfuel|ignitionone|komoot|squarelovin|cookiebot)\.([a-z]{2,3}\.)*[a-z]{2,6}$

^(.*\.)*(aatkit|abtasty|acce(ngage|sstrade)|adtec(h|jp|us)|anthill|atdmt|avazutracking|bfmino|bounceexchange|branch|clever(push|tap)|conviva|crittercism|cxense|deltadna|duapps|fractionalmedia|fyber|hyprmx|justpremium|lijit|mobileapptracking|ogury|omniture|presage|tapad|pubmatic|pushwoosh|qq|servedbyopenx|supersoni(c|cads)|swrve|webtrekk|aimatch|sas)\.([a-z]{2,3}\.)*[a-z]{2,6}$

^(.*\.)*(actioniq|hs-(analytics|banner)|hsleadflows|hubspot|agileone|bizible|crazyegg|acquia|wistia|engagio|lytics|am15|bannerbank|bbelements|bravenet|cedexis-radar|crypto-loot|economicoutlook|esomniture|estat|extreme-dm|ezcybersearch|fastclick|focalink|gemius|hyperbanner|iovation|kaffnet|yieldlove|histats|forsee)\.([a-z]{2,3}\.)*[a-z]{2,6}$

^(.*\.)*(indexww|yieldlove-ad-serving|xplosion|digitru|upapi|spotxchange|permutive|onesignal|usabilla|contextweb|ml314|afcpatrk|(klclick[0-9])|aidata|hybrid|deployads|perfectaudience|liveramp|didomi|etahub|giraff|hurra|sizmek|nativo|djaxadserver|webmasterplan|ggpht|zanox|audrte|agkn|dataxu|scarabresearch|ekomi)\.([a-z]{2,3}\.)*[a-z]{2,6}$

^(.*\.)*(apsalar|tune|qualtrics|neustar|webtrends|linksynergy|backtrace|doubleverify|ligatus|a4|rayjump|umeng|umengcloud|wootric|medialytics|herokuapp|yllix|mobpartner|combango|bugsense|burstly|count|crashlytics|do-not-tracker|eviltracker|getexceptional|jumptap|playtomic|stathat|163|206ads|2mdnsys|360in|4seeresults|accesstrade|ads[0-9]-adnow|pop(ads|cash)|popadscdn)\.([a-z]{2,3}\.)*[a-z]{2,6}$

^(.*\.)*(urekamedia|kochava|upsight|marfeel(cache|cdn|rev|tenmax)|blueseed|adx[0-9]|zucks|o-s|aralego|breaktime|mdotm|juicer|cnt|ematicsolutions|alcmpn|powerlinks|33across|1rx|bfmio|bnmla|doublepimp|weborama|specific(click|media)|acuityplatform|cnnx|scanscout|nxtck|socdm|simpleanalytics|gumgum|marketolive|demandbase|izooto|ibillboard)\.([a-z]{2,3}\.)*[a-z]{2,6}$

^(.*\.)*(invitemedia|microad|ardata|152media|exoclick|doublemax|wywy|navdmp|trafficstars|pubexchange|vdopia|lsosad|aaxads|mantisadnetwork|triplelift|trafficfactory|themoneytizer|insurads|proads|pxlad|flux-adserve|eclick|vcmedia|nova(net|onads|on|nox)|coccoc|polyad|cleverads|ambient-platform|mangoads|cityads|autoads|yoads)\.([a-z]{2,3}\.)*[a-z]{2,6}$

^(.*\.)*(oewabox|metriweb|caramb(o|ola)|refinedads|klaviyo|a-mo|fieldtest|justuno|smartredirect|ezoic|statscrop|optinmonster|nr-data|avocet|bttrack|eyeota|jetlore|1up|abmr|acxiom-online|mlnadvertising|amgdgt|askmen|bam-x|bidr|bluecava|brand-display|brilig|ccgateway|channelintelligence|choicestream|clickagy|cognitivlabs|collective-media)\.([a-z]{2,3}\.)*[a-z]{2,6}$

^(.*\.)*(company-target|crosspixel|crsspxl|decdna|decideinteractive|disqus|emxdgt|extremetech|eyeviewads|btbuckets|fetchback|fimserve|freeskreen|gmads|gwallet|hlserve|imiclk|innovid|insidecrm|intentmedia|interclick|ipr(edictive|omote)|iqm|kargo|knorex|lkqd|logicbuy|marchex|media6degrees|mediaforge|mediaiqdigital|mixpo|mmismm|numberly|1000mercis)\.([a-z]{2,3}\.)*[a-z]{2,6}$

^(.*\.)*(mxptint|ncaudienceexchange|net(mng|seer)|owneriq|parrable|pm14|postrelease|pro-market|raasnet|retargetly|revsci|rfihub|rkdms|ru4|samba|semasio|sojern|udmserve|unrulymedia|vindicosuite|vmmpxl|vmweb|wsod|xad|xgraph|yieldmo|yieldoptimizer|youknowbest|zemanta|skimlinks|ipromote|lotame|contentsquare|matomo|piwik|roq|emerse|widespace)\.([a-z]{2,3}\.)*[a-z]{2,6}$

^(.*\.)*(nrich|rtbhouse|sovrn|beeswax|indexexchange|c3metrics|contactimpact|1plusx|semrush|improvedigital|xandr|madvertise|neural|rhythmone|inskinmedia|jivox|gsi-one|celtra|tapfiliate|req|moburst|webmechanix|divisionoflabor|commcreative|metrictheory|gkv|periscope|yieldbranding|gumas|fivebyfiveglobal|williamswhittle|baycreative|milleradagency)\.([a-z]{2,3}\.)*[a-z]{2,6}$

^(.*\.)*(farinella|wearesculpt|neonambition|envision-creative|tronviggroup|databricks|gotomarketers|maxaudience|perfectsearchmedia|titangrowth|3mediaweb|directom|seobrand|disruptiveadvertising|bigleap|silverbackstrategies|ignitevisibility|thriveagency|webfx|seoinc|socialseo|frac|klientboost|avalaunchmedia|siegemedia|straightnorth|gulosolutions|grafik)\.([a-z]{2,3}\.)*[a-z]{2,6}$

^(.*\.)*(thestorywebs|directiveconsulting|pbjmarketing|befoundonline|walkersandsdigital|firebellymarketing|comradeweb|contentbureau|hanapinmarketing|fruition|digitalbrandexpressions|noblestudios|thoughtspot|collibra|erne|uimserv|belboon)\.([a-z]{2,3}\.)*[a-z]{2,6}$

^(.*\.)*(app(boy|next|adhoc|celerator|ier|odeal|lovin|nexus|sflyer|boy|brain|ier|lvn|spot|timize|see|ads|lifier|lift|logrocket|-measurement|topia))\.([a-z]{2,3}\.)*[a-z]{2,6}$

^(.*\.)*(ad(j|skeeper|dtoany|vertica|true|vertnative|pushup|media|master|ledge|kernel|kmob|future|bro|booth|ap|apt|4game|just|telligent|4mat|tng|alliance|link|marketplace|[0-9]|adapted|acado|alliance|alyser|brite|bureau|and|blade|brodealsnetwork|spirit|epom|lightning|plugg|glare|speed|butler|x|roll|port|river|srvr|advisor|symptotic|dthis|rtx|scale))\.([a-z]{2,3}\.)*[a-z]{2,6}$

^(.*\.)*(ad(smogo|target|infuse|conversantmedia|now|view|ups|tim(a|aserver|ming)|srv|spruce|see|same|sniper|smoloco|shot|skeeper|x1|tdp|hese|lightning|micro|entifi|ventori|worx|zerk|roll|safeprotected|this|colony|tech[a-z]{1,2}|dthisedge|looxtracking|vertising|nxs|(ds|s)wizz|omik|k2x|k2|ition|this|form|ocean|protect|science|tlgc|vertserve|thrive))\.([a-z]{2,3}\.)*[a-z]{2,6}$

^(.*\.)*(ad(netasia|clixx|hacker|xcorp|vergine|optimize|grx|sfactor|snative|op|high|xxx|schoom|recover|spsp|safety|marvel|brix|servme|flex|rcdn|maym|pone|xadserv|up-tech|timaserver|network|tima|360|pia|sota|splay|sparc|legend|chemy|haven|ready|viva|gear|ikteev|dapptr|defend|ventureppc))\.([a-z]{2,3}\.)*[a-z]{2,6}$

^(.*\.)*(ad-(balancer|brix|center|cloud|delivery|locus|maven|move|plus|score|srv|stir))\.([a-z]{2,3}\.)*[a-z]{2,6}$

Whitelist:

^(.*\.)*(dpd|ups|dhl|safelinking|githubusercontent|researchintel|intel|github|foldingathome|sysctl|apple-dns|apple|icloud|windowsupdate|samsung(electronics|otn|acr|cloudsolution|qbe)|plex|office|msftncsi|microsoftonline|dropbox|pr0gramm|solvemedia|microsoft|tnt-digital|fedex|creative)\.([a-z]{2,3}\.)*[a-z]{2,6}$

Sites I learned from and that inspired me:

For checking a RegEx:

https://regex101.com/
https://regexr.com/

Tested and Maintained Regexs:

https://github.com/mmotti/pihole-regex
https://github.com/cbuijs/accomplist/blob/8f3946d58667e6d09951a994e839f297cae95e0e/chris/regex.black
https://github.com/nocturnalarchives/BlockLists

But tell me, I am just curious: the devs here have said many times that a blocklist is faster, more effective and less memory-consuming (as Pi-hole runs in RAM, I guess). Why, then, do you need a regex blocklist? Does it have any advantages over the standard and better method?

How do I incorporate this improved regex version into my latest Pi-hole? A guide would be useful. Thanks.

Regex allows you to tailor your blocking. Don't like domains with the word elephant in them? Add that to a regex and they are gone.
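To make the elephant example concrete, here is a minimal Python sketch. It assumes Python's `re` module behaves like Pi-hole's regex engine for these simple patterns, which I have not verified for every case. The helper name `is_blocked` and the test domains are made up for illustration. It contrasts an unanchored pattern (matches "elephant" anywhere) with the anchored template used in the collection above (matches "elephant" only as a whole label directly in front of the TLD part).

```python
import re

# Unanchored: matches any domain that contains "elephant" anywhere.
contains_elephant = re.compile(r"elephant")

# Anchored, in the style of the collection above: "elephant" must be a
# complete label that sits directly in front of the TLD part.
elephant_label = re.compile(r"^(.*\.)*(elephant)\.([a-z]{2,3}\.)*[a-z]{2,6}$")


def is_blocked(pattern, domain):
    """Hypothetical helper: True if the pattern matches the domain."""
    return pattern.search(domain) is not None


for domain in ("elephant.com", "cdn.elephant.co.uk",
               "pinkelephants.org", "example.com"):
    print(domain,
          is_blocked(contains_elephant, domain),
          is_blocked(elephant_label, domain))
```

The unanchored form also catches pinkelephants.org, while the anchored form does not, so the anchored template is the more conservative of the two.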

Each regex has to be evaluated when blocking, and this takes time. If a domain is in gravity, it is found very efficiently via a B-tree lookup (quite fast). The same is not the case with regex.

Use them as you need them, but at some point, if you keep adding more, your Pi-hole will get sluggish.

No, I should correct this.

Regex and exact domains work very differently: gravity and exact blacklist domains are stored in a tree structure. Matching a single domain against (multiple!) millions of domains by scanning the tree takes mere microseconds, even on Raspberry Pi hardware. This is because leaves can be found very efficiently in a balanced tree, and searching the tree only becomes slower on a logarithmic scale (doubling the number of domains just means one more step of work, not twice as much work). You should read the Wikipedia article on B-trees if you want more details, but be prepared that it is not all that straightforward when you deal with tree data structures for the first time.

[INFO: calculations simplified here and below, full details would clearly exceed the format here]

Scanning a single domain against, e.g., 10 million domains in your blacklist or blocklist (gravity) takes on the order of ceil(log2(10000000)) = 24 computation steps. On all platforms, this is on the order of a few micro- (yes, not milli-) seconds and really cannot be made more efficient. This is because we can quickly traverse the tree down to its leaf when the expression we're matching against is constant.
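The arithmetic can be checked with a few lines of Python. This is only a sketch: it uses a binary search over a sorted list as a stand-in for the tree, which is an assumption made for illustration and not how Pi-hole stores its domains. The function names and the generated `hostNNNNNN.example` domains are hypothetical.

```python
import math


def worst_case_steps(n):
    """Rough number of halving steps needed to find one entry among n."""
    return math.ceil(math.log2(n))


def count_lookup_steps(sorted_items, target):
    """Binary search that also reports how many comparisons it made."""
    lo, hi, steps = 0, len(sorted_items), 0
    while lo < hi:
        mid = (lo + hi) // 2
        steps += 1
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(sorted_items) and sorted_items[lo] == target
    return found, steps


print(worst_case_steps(10_000_000))  # 24
print(worst_case_steps(20_000_000))  # 25: double the domains, one more step

# Hypothetical list of 100,000 made-up domains, sorted for the search.
domains = sorted(f"host{i:06d}.example" for i in range(100_000))
found, steps = count_lookup_steps(domains, "host012345.example")
print(found, steps, worst_case_steps(len(domains)))
```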

Regex, on the other hand, is an entirely different beast. While we can use a tree for exact domains, we have to iterate through all the regular expressions because they all could theoretically match. While an exact match is easily found (the memory is just identical), regular expressions are rules, and those rules have to be applied to every domain we want to check. Explaining how the matching takes place is also beyond what I can do here; however, I will just let you guess how much work evaluating the suggested whitelist regex is.

We really do our best to also serve the maximum possible bare-metal performance for regular expressions, however, they are just so much more involved that they will always be a lot more work than exact domains.
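The difference between the two kinds of lookup can be sketched in Python as follows. Note the assumptions: a Python `set` (a hash table) stands in for the tree, the two patterns are shortened versions of lines from the first post, and the exact entries are made up. The point is only that the exact lookup is a single operation, while the regex path has to try each pattern in turn.

```python
import re

# Hypothetical exact entries; a set stands in for the tree here.
exact_blocked = {"ads.example.com", "tracker.example.net"}

# Shortened versions of two patterns from the collection above.
regex_blocked = [re.compile(p) for p in (
    r"^(.*\.)*(hotjar|criteo)\.([a-z]{2,3}\.)*[a-z]{2,6}$",
    r"^(.*\.)*(ad-(cloud|srv))\.([a-z]{2,3}\.)*[a-z]{2,6}$",
)]


def match_exact(domain):
    """One lookup, no matter how many entries are stored."""
    return domain in exact_blocked


def match_regex(domain):
    """Try every pattern in turn; report how many were evaluated."""
    evaluated = 0
    for pattern in regex_blocked:
        evaluated += 1
        if pattern.search(domain):
            return True, evaluated
    return False, evaluated


print(match_exact("ads.example.com"))
print(match_regex("static.hotjar.com"))
print(match_regex("example.org"))  # not blocked: every pattern was tried
```

A domain that is not blocked is the expensive case, because every pattern has to be evaluated before the answer is known.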

It all depends on your architecture (e.g., ARM is slower on regex compared to x86 because it is missing some important optimizations), but I'd always rather use a list with 100,000 domains than one complex regular expression. That being said, Pi-hole has become so efficient with regular expressions that you won't actually notice a difference in speed (I won't notice if loading a page takes some 10 milliseconds longer).

It's the other way around. As blocklist/gravity matching is much, much faster, we do this first, as it may save us from having to do regex at all. The worst case is a domain that is not blocked. For such domains, the entirety of whitelist exact, whitelist regex, blacklist exact, gravity (exact) and blacklist regex has to be searched. Only if a domain wasn't found on any list do we forward it upstream.
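As a rough model of that lookup chain, here is a Python sketch. It assumes the lists are consulted in the order they are named in the post and ignores everything else Pi-hole does (caching, CNAME handling and so on). The function `classify`, its return strings and all sample entries are hypothetical and not taken from Pi-hole's code.

```python
import re


def classify(domain, white_exact, white_regex, black_exact, gravity, black_regex):
    """Walk the lists in the order named above; stop at the first hit."""
    if domain in white_exact:
        return "allowed (exact whitelist)"
    if any(p.search(domain) for p in white_regex):
        return "allowed (regex whitelist)"
    if domain in black_exact:
        return "blocked (exact blacklist)"
    if domain in gravity:
        return "blocked (gravity)"
    if any(p.search(domain) for p in black_regex):
        return "blocked (regex blacklist)"
    # Worst case: every list was searched and nothing matched.
    return "forwarded upstream"


# Hypothetical sample data, shortened from the patterns in the first post.
lists = dict(
    white_exact={"footprint.net"},
    white_regex=[re.compile(r"^(.*\.)*(github|intel)\.([a-z]{2,3}\.)*[a-z]{2,6}$")],
    black_exact={"ads.example.com"},
    gravity={"tracker.example.net"},
    black_regex=[re.compile(r"^(.*\.)*(footprint|criteo)\.([a-z]{2,3}\.)*[a-z]{2,6}$")],
)

for name in ("footprint.net", "api.github.com", "tracker.example.net",
             "static.criteo.com", "example.org"):
    print(name, "->", classify(name, **lists))
```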


But it is also blocking tv2play.dk.
I have now deactivated all of them, and tv2play.dk works fine again.
Now I will try to activate them one by one and find the entry that stops tv2play.dk from playing programs.

Yes, it does, because tv2play uses footprint.net. You can either remove "footprint" from the corresponding RegEx, or do as @anon55913113 said and just whitelist it.
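Instead of toggling the RegExs one by one, the offending entry can be found by looking at which alternative matched. The Python sketch below uses the second blacklist line from the first post. The helper `matching_entry` is hypothetical, and Python's `re` is assumed to match these patterns the same way Pi-hole does.

```python
import re

# The second blacklist line from the first post, copied unchanged.
BLACKLIST_LINE = (
    r"^(.*\.)*(dotomi|urbanairship|criteo|yimg|footprint|sitescout|turn|w55c"
    r"|openx|voluumtrk|videoplaza|inmobi|reporo|cnzz|skim(resources|links)"
    r"|tribalfusion|getsocial|hoverr|mediaplex|auditude|tacoda|puserving"
    r"|d(e|o)mdex)\.([a-z]{2,3}\.)*[a-z]{2,6}$"
)


def matching_entry(pattern_text, domain):
    """Return the alternative (second capture group) that matched, or None."""
    match = re.search(pattern_text, domain)
    return match.group(2) if match else None


print(matching_entry(BLACKLIST_LINE, "footprint.net"))   # footprint

# Option 1 from the reply above: drop "footprint" from the RegEx.
trimmed = BLACKLIST_LINE.replace("|footprint", "")
print(matching_entry(trimmed, "footprint.net"))          # None
print(matching_entry(trimmed, "static.criteo.com"))      # criteo
```

The other entries keep working after the removal, so only the one alternative has to go.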

I tried that with one RegEx and it did not work.

And that's exactly why I wrote the above RegExs. They are a collection of trackers and ad servers I found annoying.
As @DL6ER said, RegExs are not optimal in terms of performance, but I am running my Pi-hole on a NUC6CAY, which might be a little overkill, but it has enough performance for some RegExs.

One or two is what I had in mind. Not a thousand of them. I would only add regex for unwelcome domains that aren't blocked by your existing adlists and blacklist, and that is very few.

Yes, thank you, it helped when I removed footprint from the RegEx.

Thanks, works like a charm.
