Use Artificial Intelligence to Perfectly Block Ads

Sean_Paul_Hendrix · October 23, 2021, 8:25pm

I've noticed a dramatic increase in the number of ads getting through in the last week, and a dramatic decrease in the percentage of ads being blocked in pi-hole.

I'm sure this is an expected shift that inevitably happens when advertisers periodically shift their paradigm to get their "pound of flesh". Also, lists are generally not updated frequently, and some are outright offline (e.g. the target https://mirror1.malwaredomains.com/files/justdomains is perpetually offline).

It seems that Pi-hole is perfectly poised to take advantage of AI as a "perceptual ad-blocker". When I searched this Discourse for "perceptual ad-blocker", nothing came up. When I searched online for articles on this process, I came across this: The Best Adblocker - Using Artificial Intelligence to Perfectly Block Ads | IoT For All

"The Best Adblocker - Using Artificial Intelligence to Perfectly Block Ads" - ultimately, the creators decided to "avoid taking sides on the ethics of ad-blocking". In other words: it was too good, too effective, and will never officially see the light of day.

As an active patron of this amazing product (pi-hole), can I please request that perceptual ad-blocker "options", powered by AI, be actively considered (moved up the list)?

I've got a house full of children, young adults, and parents all working and going to school online. Pi-hole has been a godsend (and life saver) in my household.

jfb · October 23, 2021, 11:14pm

That list (mirror1.malwaredomains.com) is no longer maintained. I recommend deleting it and any other lists that are no longer active.

yubiuser · October 24, 2021, 11:03am

I've moved your topic to the feature request area. But I'm not sure if any dev has experience with AI implementation or if it would be feasible to run AI algorithms on low-end devices like RPi Zeros...

DL6ER · October 24, 2021, 1:04pm

I do have experience with artificial intelligence in applications and, more specifically, even things we'd use here - like shape recognition on irregular backgrounds. Even for simple stuff you need sufficient horsepower if you want to recognize not only what you already know but to learn something new. And horsepower does not only mean computational but also fast memory (not "SD card" but rather "RAM" speed).

We are talking about special hardware here. For instance, say you want to be able to distinguish between an ad and a legit image of, say, a wristwatch. Given today's typical image resolutions, this already requires powerful graphics cards if not more specialized hardware (ASICs), so I fully share the concern raised by @yubiuser.

Companies like Tesla have good reasons for working on specialized AI hardware for their automated driving, etc. The techniques they mention ("computer vision techniques") are just not something you want to perform on normal or, even worse, low-end general-purpose computers.

This really looks like an entirely new concept that would not work well with how Pi-hole currently works: we block ad requests from known ad providers. What you are suggesting means exactly the contrary: download all ads, analyze them, and decide if we want to permit or block them. Even if we changed how Pi-hole works, we still couldn't do it. Sure, a browser plugin "sees" (like, literally) what is on the page and can analyze things. Your Pi-hole, however, only ever sees DNS requests and has no idea what content is sourced from these pages.
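To make the "known ad providers" point concrete, here is a minimal sketch of a list-based decision. It is not Pi-hole's actual code: the function name, the sample domains and the exact-match-only rule are assumptions made for illustration. The point is that the only input is the queried name, never the page content.

```python
# Hypothetical sketch, NOT Pi-hole's real implementation.
# BLOCKLIST and resolve_decision are illustrative names; the domains are made up.
BLOCKLIST = {"ads.example.com", "tracker.example.net"}


def resolve_decision(domain: str, blocklist: set) -> str:
    """Decide what to do with a DNS query using nothing but the queried name."""
    # DNS names are case-insensitive and may carry a trailing dot.
    name = domain.strip().rstrip(".").lower()
    # Assumption: exact match only. No wildcard, regex or content inspection.
    return "blocked" if name in blocklist else "forwarded"


if __name__ == "__main__":
    for query in ("ADS.example.com.", "example.com"):
        print(query, "->", resolve_decision(query, BLOCKLIST))
```

Nothing in this decision can tell an ad image from a legitimate one, because the image never passes through the resolver.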

Even if you forced your traffic through Pi-hole (VPN-style), it wouldn't work because the traffic between your browser and the remote page is encrypted. So Pi-hole would have to work in a man-in-the-middle way: decrypt, analyze and re-encrypt all content that is sent around to get access to it. Even without any AI, this is something that will easily overwhelm your Raspberry Pi on even moderately active networks.

I also don't think what they write on the page is as simple as they pretend. When they say things like

"This method is more effective since regulations exist to enforce that advertisements must be clearly labeled. Essentially, the FTC ruled that people must be able to recognize ads."

they neglect that the majority of users with Internet access live outside of the United States. Ads can be in different languages and don't have to be labeled as ads at all in other countries. Also, if they don't have to prove their statements, they can easily say that their product would be the best if it were ever released. I'd be ready to test whether it detects ads in languages other than English (and even English ones from other countries like the UK or India), but their plugin doesn't seem to be available anywhere.

Sorry to say this but a lot of products I've seen that had "AI-powered" in their title are bullshit. Maybe the only exception to this is the AI-powered image enhancement used in many cameras today, because this is a lot easier to do: you can recognize known scenarios and tweak some settings for that. "AI" is just a very broad buzzword for very different things.

Sean_Paul_Hendrix · October 26, 2021, 4:52pm

Great feedback -

I suppose the best way to implement AI would be focused as a separate project initiative to automate, build and maintain its own lists which every pi-hole instance could leverage as just one of the many existing options.

This would be a separate "proof of concept" project inside of the pi-hole github repository. The lists would then be living documents - maybe they even could be many separate list types that focus on specific categories like:

Gaming services
Email messaging services
File sharing
Social Media
Media streaming services
etc...

This way, the existing Pi-hole project would never be touched or compromised by any backlash with implementing AI into "ad-blocking" - since lists are just "external resources" that come and go.

Each list would be its own project and would need:

External users who want to participate, with sufficient hardware, security and bandwidth to host an AI service running unattended.
A centralized service, hosted by a "trusted resource", that could queue up sites for review and process the feedback from the external services responding.
As a thought: the participating external users could seed the AI system by passing in their existing Pi-hole logs.

Constant discovery and development would determine new and consistent ways of data mining and keeping everything relevant.

As a developer, I would LOVE to work with this project. If I can start figuring out and learning AI development, you can be assured I would pass it all on to this project... as your passion has become mine.

With Respect,

Sean

Coro · October 27, 2021, 6:22am

What you describe is a very manual process. Somebody will have to check whether the numerous ad-providing domains that were found are indeed legit. They would also have to inspect each one manually to sort it into categories like "porn", "gaming", etc.

To me, this looks like you need at least four, more likely ten, employees to do this. I really don't see hundreds of volunteers doing this in their spare time. Given typical IT salaries (it should probably happen in Pakistan where equally trained people are cheaper), this seems to require on the order of $40,000 per month. Where should this money come from? It is surely a lot more than you'd get from donations and not even something you could initiate with a one-time kickstart event. This sounds much more like a monthly subscription model, but you need enough subscribers from the very beginning for this to be an attractive idea. Ideally you need more than 40,000 users so they have to pay less than $1 per month, which may already be too much for some because the current Pi-hole is free altogether.
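The break-even arithmetic behind that estimate, as a small sketch. The $40,000 figure is the rough guess from above; the function name and the other subscriber counts are only illustrative.

```python
# Rough break-even sketch. The monthly cost is the estimate quoted in the post;
# fee_per_user and the subscriber counts are illustrative assumptions.
MONTHLY_COST_USD = 40_000


def fee_per_user(monthly_cost: float, subscribers: int) -> float:
    """Monthly fee each subscriber must pay to cover the cost exactly."""
    if subscribers <= 0:
        raise ValueError("need at least one subscriber")
    return monthly_cost / subscribers


if __name__ == "__main__":
    for subscribers in (4_000, 40_000, 80_000):
        fee = fee_per_user(MONTHLY_COST_USD, subscribers)
        print(f"{subscribers:>6} subscribers -> ${fee:.2f} per user per month")
```

At exactly 40,000 subscribers the fee is $1 per month, so it only drops below $1 once the subscriber count exceeds that.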

Sean_Paul_Hendrix · October 27, 2021, 2:51pm

Thank you for your considered response, because I really made a NOOB mistake.

The promise/curse of Artificial Intelligence over Human Intelligence is to be able to effectively process and act on information, be allowed to make and analyze mistakes, learn from them, make better decisions - and, in a fraction of the time humans could do it (even while sleep and coffee deprived). Obviously, that has historically been more science fiction than fact; however, I’ve personally seen that gap close considerably with the tools that we have at our disposal now. My co-workers are using it right now in OCR applications trying to preserve information from historical documents - old birth records and marriage certificates, etc. Their work has been producing really good results, albeit initially tedious - so, expectations managed.

From your response, I made a major misstep in my last post. Instead of “keep things simple, Sean” (KISS method), I threw in a heady concept of having different lists of category types - that’s like counting my AI chickens before they hatch (or even laid, in this case)… very NOOB of me. I know better - it leads to unreasonable expectations and scope creep. I muddied the conversation. So, let me pivot and correct.

Here’s a germ of a variation of the original idea that I am having right now (while still sitting in bed with no coffee yet) which may have the fewest moving parts (or just create more amazing conversation):

How about an AI pi-hole service (The AI - still a separate project) that a user can install on beefier hardware in their own network
The AI can point to a normal instance of pi-hole running somewhere on the user’s network.
The AI can gather the ‘allowed’ traffic from the logs created by pi-hole
The AI can see the existing pi-hole settings and mirror them internally
The AI can check the “allowed” traffic for any ADS that are not blocked by the current settings.
The AI can host its own internal list (a singular list, that is)
The user can point their pi-holes to the AI hosted list
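The steps above could be sketched roughly as follows. Big assumptions here: the log lines are dnsmasq-style samples I typed in by hand (your log format may differ), and `looks_like_ad` is a trivial stand-in for whatever the AI would actually do. None of these names come from Pi-hole itself.

```python
import re

# ASSUMPTION: dnsmasq-style lines; these samples are hand-written, not real logs.
SAMPLE_LOG = [
    "Oct 27 14:51:01 dnsmasq[123]: query[A] news.example.org from 192.168.1.20",
    "Oct 27 14:51:01 dnsmasq[123]: forwarded news.example.org to 192.0.2.53",
    "Oct 27 14:51:02 dnsmasq[123]: query[A] ads.example.com from 192.168.1.20",
    "Oct 27 14:51:02 dnsmasq[123]: gravity blocked ads.example.com is 0.0.0.0",
    "Oct 27 14:51:03 dnsmasq[123]: query[AAAA] promo.example.net from 192.168.1.21",
    "Oct 27 14:51:03 dnsmasq[123]: forwarded promo.example.net to 192.0.2.53",
]

QUERY_RE = re.compile(r"query\[[^\]]+\] (\S+) from \S+")
BLOCKED_RE = re.compile(r"gravity blocked (\S+) is \S+")


def allowed_domains(log_lines):
    """Domains that were queried but never reported as blocked."""
    queried, blocked = set(), set()
    for line in log_lines:
        match = QUERY_RE.search(line)
        if match:
            queried.add(match.group(1).lower())
            continue
        match = BLOCKED_RE.search(line)
        if match:
            blocked.add(match.group(1).lower())
    return queried - blocked


def looks_like_ad(domain):
    """Placeholder for the AI step: a hypothetical, deliberately naive rule."""
    return domain.startswith("promo.")


def build_list(log_lines, classifier):
    """The single hosted list: allowed domains the classifier flags as ads."""
    return sorted(d for d in allowed_domains(log_lines) if classifier(d))


if __name__ == "__main__":
    print("\n".join(build_list(SAMPLE_LOG, looks_like_ad)))
```

The output is one domain per line, so the result could be served as a plain text file that the pi-holes are pointed at, like any other list.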

The initial benefits:

The adopted AI user would become the most effective AI trainer, since any web services that would be blocked incorrectly would be immediately noticed by the household.
There would need to be some sort of dialog between the AI and trainer to keep things running smoothly
The time spent training an AI in specific households where people are working and going to school is invaluable - and now becoming more of the norm.
After a certain period of time where the AI has “lived”, the lessons learned - along with the actual lists - could then be uploaded and added into the “collective”.

Personally, I have a household with 2 adults working full time from home (wife and myself), 3 children in virtual school, 1 in college (virtual classes), and 1 who just lives in virtual la-la land (all gaming platforms). There are quite a few special needs in my house and pi-hole has kept things running very smoothly.

I’m just trying to think of ways to give back to the pi-hole community, since pi-hole has given so much to me, and my family (and friends).

Sean_Paul_Hendrix · October 27, 2021, 3:22pm

Also, quick thought - note:

Instead of having AI do image recognition - I’m more interested in letting it review all the links on a particular page and seeing where the “rabbit-hole” goes.

ALERT - PERFECT PROJECT NAME - RABBIT-HOLE

Since links are simple to pull from a page, I propose a simple brute force approach in a very safe space: “the briar patch” - if a link leads to a known bad location, it gets added to the list.
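A rough sketch of that brute force idea, with heavy assumptions: nothing is fetched from the network here, the `pages` dict is a made-up stand-in for "the briar patch" (host mapped to the HTML found there), and all domains and function names are hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects the hostnames of every href/src attribute on a page."""

    def __init__(self):
        super().__init__()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                host = urlparse(value).hostname  # None for relative links
                if host:
                    self.hosts.add(host.lower())


def linked_hosts(html):
    collector = LinkCollector()
    collector.feed(html)
    return collector.hosts


def reaches_bad(start, pages, known_bad, max_depth=3):
    """Breadth-first walk: does `start` lead to a known-bad host within max_depth hops?"""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        host, depth = queue.popleft()
        if host in known_bad:
            return True
        if depth == max_depth:
            continue
        # ASSUMPTION: pages maps host -> HTML; unknown hosts count as dead ends.
        for nxt in linked_hosts(pages.get(host, "")):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return False


def new_candidates(page_html, pages, known_bad, max_depth=3):
    """Hosts linked from the page that are not yet listed but lead somewhere bad."""
    return sorted(
        host
        for host in linked_hosts(page_html)
        if host not in known_bad and reaches_bad(host, pages, known_bad, max_depth)
    )


# Hypothetical sample data.
KNOWN_BAD = {"ads.example.com"}
PAGES = {
    "click.example.net": '<a href="https://ads.example.com/banner">x</a>',
    "blog.example.org": '<a href="https://docs.example.org/">docs</a>',
}
PAGE = (
    '<a href="https://click.example.net/r?id=1">deal</a>'
    '<a href="https://blog.example.org/post">post</a>'
    '<img src="https://ads.example.com/b.png">'
)

if __name__ == "__main__":
    print(new_candidates(PAGE, PAGES, KNOWN_BAD))
```

The depth limit is there because a real rabbit-hole has no bottom. This naive version would also flag any legitimate site that merely links to a bad host, so a human (or the AI) would still need to review the candidates.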

Simple,

Sean

P.S. Emphasis on oversimplification… I’m that guy that spent 7K+ on a whole new home network just to let pi-hole do its thing to the best of its ability. I’m not swimming in cash, but some things are worth doing right… no regrets.

P.S.S. Don’t tell my wife.

P.S.S.S. Kidding… she knows.

system · November 3, 2021, 6:18pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.