Need a host for a 75 million domain blocklist, please

I'm looking for someone to host a 3 GB YaCy blocklist online so I can share it with other Pi-hole users, please.

If anybody knows of a site I could use, that would be good too.

[Screenshot from 2024-03-16 17-26-47]

Out of interest, are you running on a Raspberry Pi? If so, how long does it take to run a Gravity update with that list?

It takes approx. 11 minutes on an HP DL360 Gen8 server.
I have loaded a 9 million domain blocklist on a Raspberry Pi 3B. From memory, it took on the order of 1 hour, so going by that, possibly 8 hours.
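The 8-hour figure is a linear extrapolation: 75 million is roughly 8.3 times 9 million. A minimal sketch of that arithmetic, assuming gravity time scales linearly with list size (real runs may not, for example if the Pi runs short of RAM); the function name is hypothetical:

```python
def estimate_gravity_minutes(ref_domains: int, ref_minutes: float, target_domains: int) -> float:
    """Scale a measured gravity run time linearly with list size (an assumption, not a Pi-hole rule)."""
    if ref_domains <= 0:
        raise ValueError("reference list size must be positive")
    return ref_minutes * target_domains / ref_domains


# Reference point from this thread: ~9 million domains took ~1 hour on a Pi 3B.
minutes = estimate_gravity_minutes(9_000_000, 60, 75_000_000)
print(f"{minutes / 60:.1f} hours")  # prints "8.3 hours"
```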

GitHub? Or is there a file size limit?
Never mind: a file that large isn't free to host there.

75 million valid domains?
Split it up into smaller files and put it on GitHub.

Yes, that is possible, but I would have to split it into 4 segments.
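Splitting is safest on line boundaries so that no entry is cut in half. A minimal sketch, assuming a plain one-entry-per-line file; the function name `split_blocklist` and the `blocklist-partN.txt` naming are hypothetical:

```python
import os
from pathlib import Path


def split_blocklist(src: str, parts: int, out_dir: str) -> list[Path]:
    """Split a one-entry-per-line file into at most `parts` files of roughly equal size.

    Splits only between lines, so no domain is cut in half.
    """
    if parts < 1:
        raise ValueError("parts must be at least 1")
    target = -(-os.path.getsize(src) // parts)  # ceiling division: bytes per part
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths: list[Path] = []
    dst = None
    written = 0
    try:
        with open(src, "rb") as f:
            for line in f:
                # Open the next part when none is open yet or the current one is full;
                # the last part absorbs any remainder.
                if dst is None or (written >= target and len(paths) < parts):
                    if dst is not None:
                        dst.close()
                    path = out / f"blocklist-part{len(paths) + 1}.txt"
                    paths.append(path)
                    dst = open(path, "wb")
                    written = 0
                dst.write(line)
                written += len(line)
    finally:
        if dst is not None:
            dst.close()
    return paths
```

Each part would then be added as a separate adlist; pick `parts` so every file fits under the host's per-file limit.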

About 8 million are invalid.
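The thread doesn't say what counts as invalid. A minimal sketch, assuming "invalid" means an entry that isn't a syntactically valid hostname; the helper names are hypothetical and this is not the validator Pi-hole itself uses:

```python
import re

# One DNS label: 1-63 letters, digits or hyphens, not starting or ending with a hyphen.
_LABEL = re.compile(r"^(?!-)[a-z0-9-]{1,63}(?<!-)$")


def is_valid_domain(entry: str) -> bool:
    """Rough hostname check: total length <= 253, at least one dot, every label well-formed."""
    name = entry.strip().lower().rstrip(".")
    if not name or len(name) > 253 or "." not in name:
        return False
    return all(_LABEL.match(label) for label in name.split("."))


def count_invalid(lines) -> tuple[int, int]:
    """Return (valid, invalid) counts, skipping blank lines and # comments."""
    valid = invalid = 0
    for line in lines:
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue
        if is_valid_domain(entry):
            valid += 1
        else:
            invalid += 1
    return valid, invalid


sample = ["example.com", "ads.example.net", "http://bad/path", "-x.com", "localhost", "# comment"]
print(count_invalid(sample))  # → (2, 3)
```

This check is deliberately rough: it rejects underscores and dotless names such as `localhost`, while IP addresses like `1.2.3.4` still pass.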

Would it not be simpler and quicker to do the following:

  • Add a blacklist regex .* to block everything
  • Let the network (attempt to) run for a short while
  • Use the Audit Log to whitelist the most-requested domains first, where those domains are not in the standard blocklist.
  • Iterate around this loop
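One pass of that loop can be sketched as ranking the domains the `.*` regex blocked and dropping those the standard list would block anyway. A minimal sketch, assuming you already have the blocked query names as a plain list; `whitelist_candidates` and its inputs are hypothetical, not a Pi-hole API, and actually whitelisting the results is left to Pi-hole's own tools:

```python
from collections import Counter


def whitelist_candidates(blocked_queries, standard_blocklist, top_n=20):
    """Rank blocked domains by request count, skipping ones the standard list already blocks.

    blocked_queries: iterable of domain names, one item per blocked query
    (hypothetical input, e.g. exported from the query log while `.*` is active).
    """
    counts = Counter(d.lower() for d in blocked_queries)
    ranked = [(d, n) for d, n in counts.most_common() if d not in standard_blocklist]
    return ranked[:top_n]


queries = ["cdn.example.com"] * 5 + ["ads.tracker.net"] * 9 + ["api.example.org"] * 3
standard = {"ads.tracker.net"}
print(whitelist_candidates(queries, standard, top_n=2))
# → [('cdn.example.com', 5), ('api.example.org', 3)]
```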

This whitelist process can be automated, to some degree, by querying the blocked domains endpoint and interacting with the audit database directly.

This way, only domains that are not already in the standard supplied list and that are requested above a threshold would be allowed; everything else would be blocked.
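Automating the threshold step against a database could look like the following. A minimal sketch: the `queries` table and its `blocked` column are a hypothetical schema created in memory for illustration, not Pi-hole's actual FTL or gravity database layout, so the SQL would need adapting:

```python
import sqlite3

# HYPOTHETICAL schema for illustration only; Pi-hole's real databases use their own tables.
SCHEMA = "CREATE TABLE queries (domain TEXT NOT NULL, blocked INTEGER NOT NULL)"


def domains_above_threshold(conn, standard_blocklist, threshold):
    """Return blocked domains requested at least `threshold` times and not in the standard list."""
    rows = conn.execute(
        "SELECT domain, COUNT(*) AS hits FROM queries "
        "WHERE blocked = 1 GROUP BY domain HAVING COUNT(*) >= ? "
        "ORDER BY hits DESC, domain",
        (threshold,),
    ).fetchall()
    return [(domain, hits) for domain, hits in rows if domain not in standard_blocklist]


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO queries (domain, blocked) VALUES (?, ?)",
    [("cdn.example.com", 1)] * 12 + [("ads.tracker.net", 1)] * 30 + [("rare.example.org", 1)] * 2,
)
print(domains_above_threshold(conn, {"ads.tracker.net"}, threshold=10))
# → [('cdn.example.com', 12)]
```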

Thanks for taking the time to comment, @chrislph.

Plenty gets through.
[Screenshot from 2024-03-17 18-05-44]

That chart isn't very informative, as it doesn't show any details on domains (e.g. it could always be the same domain).
Also, we don't know what's on your list of allowed domains (I recall you mentioning in another post that you had to whitelist a lot of domains).

I feel left out; I only have 11,200,000 :wink:

GitLab has a free tier with 5 GB.

That's the total, not per file.

Totally missed that. What is the per-file limit?

Would a free R2 bucket work? Its stated limit is "Maximum upload size 5 GiB".

It's in the link

But it's 2 GB for free on Large File Storage, and 100 MB for normal files.

Can you point out the file limit? I can't see it in the link

Let's see how this link holds up under traffic.
This is the list I have been talking about and using with YaCy.

http://gts.undo.it/index.php/s/55aEx6yESmCFS8N/download

Backup site: you have to download the list, store it in the Pi-hole web server folder (e.g. /var/www/html), and then point to it in the adlists.
https://u.pcloud.link/publink/show?code=XZdKIF0Z9CvvFQCEO3uVgQkevMRadFl7NuKX
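The download-and-serve steps for the backup copy can be sketched as follows. A minimal sketch, assuming the web server on the Pi-hole host serves `/var/www/html` at its document root and that the script runs with permission to write there; `publish_local_list` is a hypothetical helper, and the returned URL is what you would paste into the adlists:

```python
import shutil
from pathlib import Path


def publish_local_list(downloaded: str, webroot: str = "/var/www/html", host: str = "localhost") -> str:
    """Copy a downloaded list into the web server folder and return the URL to add as an adlist.

    Assumes the web server on `host` serves `webroot` at its document root.
    """
    src = Path(downloaded)
    dest = Path(webroot) / src.name
    shutil.copyfile(src, dest)  # overwrites any older copy with the same name
    return f"http://{host}/{src.name}"
```

For example, a downloaded `blocklist.txt` (hypothetical filename) would come back as `http://localhost/blocklist.txt`.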

Looking through the list, there are loads of valid, important domains on there, like GitHub's and Pi-hole's domains and Debian's and Ubuntu's domains, plus entries for localhost and pi.hole. There are also loads of non-domain entries that will just waste Pi-hole's time and energy and needlessly extend the update window.

Some of these, especially the valid domains, may impact functionality on the network, including for Pi-hole itself and its OS. For example, systems will be unable to access lists or repos on GitHub.
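Entries like these can be flagged before the list is ever loaded. A minimal sketch: the `PROTECTED` set is a hypothetical example built from the kinds of domains named above (the exact domains the list contains aren't given here), and an entry is flagged if it equals a protected domain or is a subdomain of one:

```python
# HYPOTHETICAL protection list based on the services named in this post; extend as needed.
PROTECTED = {
    "github.com", "githubusercontent.com", "pi-hole.net",
    "debian.org", "ubuntu.com", "localhost", "pi.hole",
}


def hits_protected(entry: str, protected=PROTECTED) -> bool:
    """True if the entry is a protected domain or a subdomain of one."""
    name = entry.strip().lower().rstrip(".")
    # The leading dot stops "notgithub.com" from matching "github.com".
    return any(name == p or name.endswith("." + p) for p in protected)


def split_protected(entries, protected=PROTECTED):
    """Separate entries that could break important services from the rest."""
    keep, flagged = [], []
    for e in entries:
        (flagged if hits_protected(e, protected) else keep).append(e)
    return keep, flagged


keep, flagged = split_protected(["raw.githubusercontent.com", "ads.example.net", "pi.hole", "deb.debian.org"])
print(flagged)  # → ['raw.githubusercontent.com', 'pi.hole', 'deb.debian.org']
```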

Using this list is likely to lead to unexpected instability and failure modes on the network, largely due to indiscriminate overblocking. It's not suitable to be reliably used as is.

I'd suggest a new project to prune the hell out of the list: remove the millions of non-domain entries, and collate the valid entries into genuine blocklists of domains that should be blocked. But even then, you're essentially repeating work that's already been done when someone finds a curated list for a given service (e.g. TikTok, Facebook) already out there.
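The first pruning pass could look something like this. A minimal sketch, assuming the list mixes plain domains and hosts-file lines such as `0.0.0.0 example.com`; the regex is a rough hostname check rather than Pi-hole's own validation, and `normalize`/`prune` are hypothetical names:

```python
import re

# Rough hostname pattern: dot-separated labels of letters, digits and inner hyphens,
# at most 253 characters, ending in a label of at least 2 characters.
_DOMAIN = re.compile(r"^(?=.{1,253}$)(?:(?!-)[a-z0-9-]{1,63}(?<!-)\.)+(?!-)[a-z0-9-]{2,63}(?<!-)$")
_SINK_ADDRESSES = {"0.0.0.0", "127.0.0.1", "::", "::1"}


def normalize(line: str):
    """Turn a plain or hosts-format line into a bare lowercase domain, or None if unusable."""
    line = line.split("#", 1)[0].strip().lower()  # drop comments
    if not line:
        return None
    parts = line.split()
    if len(parts) == 2 and parts[0] in _SINK_ADDRESSES:
        line = parts[1]  # hosts-file format: keep only the hostname
    elif len(parts) != 1:
        return None
    line = line.rstrip(".")
    return line if _DOMAIN.match(line) else None


def prune(lines):
    """Yield each valid domain once, in first-seen order."""
    seen = set()
    for raw in lines:
        domain = normalize(raw)
        if domain and domain not in seen:
            seen.add(domain)
            yield domain


sample = ["0.0.0.0 Ads.Example.com", "ads.example.com", "# header",
          "http://x.y/z", "tracker.net.", "127.0.0.1 localhost"]
print(list(prune(sample)))  # → ['ads.example.com', 'tracker.net']
```

With tens of millions of entries the in-memory `seen` set needs a lot of RAM; writing the normalized domains out and deduplicating with an external sort (e.g. `sort -u`) would be an alternative.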


or maybe not :sunglasses:

@smokingwheels I'd rather this list grow to maybe 100 million