Request: Support gzip content encoding on blocklist downloads

Ah yes, it was posted before. I overlooked that. Thanks.
Edit: That is Windows, though.

So, to stop the speculation, I actually cloned the curl repository and checked the source code.

Support for compression is only available when libz or brotli is available at compile time.

If --compressed is requested but that support is not available, curl will fail instead of continuing without compression.


Love people who have the knowledge to do this stuff, and actually do it. Respect for that!

So that would mean:

  • Use --compressed; if that fails, drop the flag and retry?
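A minimal bash sketch of that retry idea, under one loudly stated assumption: that curl reports a rejected command-line option (such as --compressed on a build without libz or brotli) with exit status 2, "failed to initialize". That is the status I would expect, but it is worth confirming against the curl man page for the versions you target. `fetch_list` is a hypothetical helper, not existing gravity.sh code. Retrying only on status 2 keeps a DNS failure or an unreachable host from triggering a pointless second download.

```bash
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper, not Pi-hole's gravity.sh code):
# download a list with --compressed first, and fall back to a plain
# download only if curl rejected the invocation itself.
#
# Assumption: curl exits with status 2 when it refuses a command-line
# option, e.g. --compressed on a build without libz/brotli. Network
# problems use other statuses (6 = could not resolve host,
# 7 = failed to connect), so those are returned as-is, not retried.
fetch_list() {
  local url="$1" dest="$2" rc=0

  curl -sSL --compressed -o "$dest" "$url" || rc=$?

  if [ "$rc" -eq 2 ]; then
    # Most likely --compressed is unsupported here; retry without it
    rc=0
    curl -sSL -o "$dest" "$url" || rc=$?
  fi

  return "$rc"
}

# Example:
# fetch_list "https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts" /tmp/hosts.txt
```

The trade-off versus checking curl's features up front is one extra process start on builds without compression support, in exchange for not having to parse curl's version output.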

Test proposal

```bash
# Only pass --compressed if curl was built with libz support
if curl -V | grep -q "libz"; then
  comp_flag="--compressed"
else
  comp_flag=""
fi
```
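The earlier post says compression works when curl was built with libz or brotli, while this proposal greps only for libz anywhere in the output of `curl -V`. A minimal bash variant that also accepts brotli and only looks at the `Features:` line might look like this. `has_compression` and `comp_opts` are hypothetical names, and the sketch assumes curl prints its build features on a line starting with `Features:`, as current curl releases do.

```bash
#!/usr/bin/env bash
# Minimal sketch: detect once whether this curl build can honour
# --compressed. has_compression is a hypothetical helper; it reads
# `curl -V` output on stdin so the check can be tested without curl.
has_compression() {
  # Only look at the "Features:" line; accept either libz or brotli
  grep '^Features:' | grep -qwE 'libz|brotli'
}

# An array expands to zero arguments when empty, even when quoted,
# so there is no need to remember to leave the variable unquoted.
comp_opts=()
if curl -V 2>/dev/null | has_compression; then
  comp_opts=(--compressed)
fi

# Example:
# curl -sSL "${comp_opts[@]}" -o /tmp/hosts.txt \
#   "https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts"
```

Passing `"${comp_opts[@]}"` to every list download keeps the check in one place and simply leaves the flag out where support is missing.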

:point_up::+1:

Stock lists:

```
dschaper@nanopineo:/etc/pihole$ cat /etc/pihole/adlists.list | while read line; do curl -so /dev/null --compressed -w '%{size_download}' $line; echo; done
313484
290149
638705
345
613
43642
235309
dschaper@nanopineo:/etc/pihole$ cat /etc/pihole/adlists.list | while read line; do curl -so /dev/null -w '%{size_download}' $line; echo; done
1248566
595746
638705
345
613
43642
1772405
dschaper@nanopineo:/etc/pihole$ cat /etc/pihole/adlists.list
https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts
https://mirror1.malwaredomains.com/files/justdomains
http://sysctl.org/cameleon/hosts
https://zeustracker.abuse.ch/blocklist.php?download=domainblocklist
https://s3.amazonaws.com/lists.disconnect.me/simple_tracking.txt
https://s3.amazonaws.com/lists.disconnect.me/simple_ad.txt
https://hosts-file.net/ad_servers.txt
```
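The transcript runs the same loop twice. A bash sketch that makes both requests per URL and prints the results side by side (tab-separated: compressed bytes, plain bytes, percent saved, URL) could look like the following; `savings_pct` and `compare_lists` are hypothetical helper names, while `%{size_download}` is curl's own `--write-out` variable, as used above. Unlike the one-liner, it quotes `"$url"`, uses `read -r`, and skips blank and `#` lines. Because it downloads every list twice, it is only meant for measuring, not for gravity itself.

```bash
#!/usr/bin/env bash
# Minimal sketch: for each URL in an adlist file, print the bytes curl
# downloaded with and without --compressed, and the share saved.
# savings_pct and compare_lists are hypothetical helper names;
# %{size_download} is curl's own --write-out variable.

# Integer percentage saved: (plain - compressed) * 100 / plain
savings_pct() {
  local compressed="$1" plain="$2"
  if [ "$plain" -le 0 ]; then
    echo 0
  else
    echo $(( (plain - compressed) * 100 / plain ))
  fi
}

compare_lists() {
  local list_file="$1" url with without
  # read -r keeps the line as-is; blank lines and comments are skipped
  while IFS= read -r url; do
    case "$url" in
      '' | '#'*) continue ;;
    esac
    with=$(curl -s -o /dev/null --compressed -w '%{size_download}' "$url")
    without=$(curl -s -o /dev/null -w '%{size_download}' "$url")
    # Treat an empty result (curl printing nothing) as 0 bytes
    with=${with:-0}
    without=${without:-0}
    printf '%s\t%s\t%s%%\t%s\n' "$with" "$without" \
      "$(savings_pct "$with" "$without")" "$url"
  done < "$list_file"
}

# Example: compare_lists /etc/pihole/adlists.list
```

For the first stock list above, `savings_pct 313484 1248566` gives 74, i.e. roughly three quarters of that download is saved.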

Note that this PR still needs to be reviewed and approved. It is likely, but not guaranteed (!), that this will eventually make it into the code.

Implementation is not solely vote-driven. If a contributor (this can very well be an external contributor, too!) decides that they want to work on something, they can submit a pull request.

We periodically look at feature requests ranked by votes and check whether we can implement them. This list looks somewhat imbalanced because there are many highly voted requests that we cannot implement right now. Be aware, however, that we close implemented feature requests (which frees their votes), so those do not show up on this list.

But me :wink:

The changes/additions in the pull request are minimal and don't increase code complexity in my opinion.


+1, since I can't vote or like.


Agreed
+1
I'm an AdGuard Home user, but still lol :sunny:


The feature did not exist - it was a PR only and was not implemented for the reasons noted in the pull request.


There's a reason it's a no. @sjhgvr, sending your minions to game the vote isn't going to change my mind.

People who register solely to vote, and have not made any comments or provided anything of value to the community, do not get votes. Whether you can vote, and how many votes you get, depends on how much you contribute to the Discourse community. This is specifically to counter things like someone telling people to register just to upvote and then leave.

I must say that I'm a bit worried about this. To know which file was downloaded that often, rather than just wildly guessing what is causing the total traffic on the server, they must have logging enabled.
While it is understandable that they want to know who is downloading what, and how often (humans are curious, after all), this is not very respectful. The privacy of their users is severely degraded when they run an HTTP server with logging, as it also records IP addresses, which are definitely sensitive personal data.

For instance, we do not know how many people are downloading Pi-hole, or from where. That's precisely why we have Pi-hole download everything from GitHub instead of from our own servers: it makes it perfectly obvious that we're not collecting your data. Trust is good, but control is better. If it is technically impossible for us to record your sensitive data, that is better than simply trusting us not to be the bad guys.

Just leaving this here to think about...

While I strongly disapprove of the way the votes on this feature request may have been pushed, I do see some value in it.

It might only be ~2 MB for the "default" lists, but users tend to have a lot of adlists, and some even modify their cron job to update them more than once a week. This adds up and consumes bandwidth. With this feature, Pi-hole could contribute a little to reducing the carbon footprint and power consumption of the internet :deciduous_tree: :earth_africa:
(I know it will only be a tiny, tiny fraction, but it sets a good example?)
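To put a rough number on "this adds up", here is the arithmetic for a single install, using only the stock-list transcript earlier in the thread: the seven `size_download` values sum to 4,300,022 bytes without `--compressed` and 1,522,247 bytes with it. That is one measurement of the default lists only, so treat it as an order-of-magnitude estimate.

```latex
\begin{aligned}
S_{\text{update}} &= 4\,300\,022 - 1\,522\,247 = 2\,777\,775\ \text{bytes} \quad (\approx 64.6\%\ \text{of}\ 4\,300\,022)\\
S_{\text{weekly, 1 year}} &= 52 \times 2\,777\,775 = 144\,444\,300\ \text{bytes} \approx 144\ \text{MB}\\
S_{\text{daily, 1 year}} &= 365 \times 2\,777\,775 = 1\,013\,887\,875\ \text{bytes} \approx 1.01\ \text{GB}
\end{aligned}
```

Installs with more lists, or a cron job that runs more often, scale this linearly.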

So, it turns out I missed this thread entirely when it was originally going on. I'm going to step in here in two capacities:

As a mod:

  1. Let's keep this discussion objective. Speculation only leads to assumptions, and assumptions (wrong or right) can damage reputations on both sides of a disagreement.

  2. I'm not going to remove anything that has already been said here, but please, let's just keep it civil.

  3. Brigading a post from an external site is not on. Please don't do that. It's petty. There is no need to throw your toys out of the pram every time someone disagrees with you; it's incredibly unbecoming and just makes people who already don't like working with you want to work with you even less. (This is not a point for discussion.)

As a developer:

  1. I agree that on a "default" install there is little to gain from this, but I do not want to give the impression that we see the "default" lists as the only way to run Pi-hole. As discussed many, many times before, in an ideal world we would not even have any default lists, BUT we do because they lower the barrier to entry. Removing the default lists is not going to happen, but if we shipped with 0 lists, we could then argue that the default experience gains nothing from utilising compression. That Pi-hole is customisable is part of its charm. We don't set any arbitrary limits on what users ultimately can and cannot block; it is up to them. Personally, I do not agree with those who change the cron job to download lists more often than once a week, but it's not something we can stop. If we can save low-bandwidth users some bandwidth, then that's good enough for me (knowing full well that at least one of our developers has very limited bandwidth available to them).

  2. Complexity-wise, I personally don't think that adding the check as proposed makes gravity any more complex. As written, it simply leaves out the --compressed flag if support is not available on that system. (Having checked on some droplets, at least the distros we support, it seems that most Linux distros have this support. That's almost moot, though: if the flag is not supported, we just don't use it. To me, there is no need to make this testable in the test suite.)

  3. Ultimately, if I had time to work on it, I would love to see the gravity logic implemented in pihole-FTL instead of a bash script. Or, working around the complexity of porting gravity.sh to C, maybe even "just" adding a pihole-FTL --download-list command that gravity.sh can call to grab the file for it to process. This would solve the issue of which flags are compiled in for the user's choice of operating system.

  4. Point 3 is just me thinking out loud without really thinking. Don't read anything into it.

  5. I've typed a lot more than I intended to. I'll stop now with a tl;dr: I agree with adding this "feature" in principle, but I also agree that the implementation should be such that it "just works".

  6. I've had another thought: could we not just attempt curl --compressed [other options] [url] and, if it succeeds, great; if not, just fall back to curl [other options] [url]? I'm not 100% sure how this would look/work in practice.

  7. I typed more. Sorry.

It may fail for various reasons. I don't think it would be worth adding checks for whether the failure is due to an unsupported feature, no Internet connectivity, the host being down, or ... I'd rather test for the availability of the feature than try the option, see whether it breaks things and, if it does, try again without it.

This may become reality at some point. It would likely be beneficial in a lot of ways (complexity, maintainability, speed, etc.).

Very fair.