Request: Support gzip content encoding on blocklist downloads

As the title suggests.

Less data usage for all clients using Pi-hole and for all servers serving blocklists to them.

Enforcing gzip compression server-side is possible.
But then a lot of clients downloading the blocklist from my server would not know how to decompress it and would fail.
My server should only gzip the content if the client indicates that it supports gzip via the "Accept-Encoding" request header.
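For example, this negotiation can be checked from the command line (just a sketch; the URL is a placeholder for any blocklist):

    # Without Accept-Encoding the server should answer in plain text;
    # with it the response may carry "Content-Encoding: gzip".
    curl -sI https://example.com/blocklist.txt | grep -i content-encoding
    curl -sI -H "Accept-Encoding: gzip" https://example.com/blocklist.txt | grep -i content-encoding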

Just an idea: how about detecting the ability to decompress gzip by interpreting the user agent string?

These are Pi-hole's request headers:

Host: oisd.nl
Accept: */*
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36

Not done. And even if it were... Pi-hole still doesn't know how to handle gzip content encoding.

Pi-hole uses curl for downloading the source files.

This could probably be changed to

curl --compressed ...

man curl:

       --compressed
              (HTTP) Request a compressed response using one of the algorithms
              curl  supports,  and  save  the  uncompressed document.  If this
              option is used and the server  sends  an  unsupported  encoding,
              curl will report an error.

We might need a fallback.

Now that's more like it!
Why would we need a fallback? Because of the following?

If this option is used and the server sends an unsupported encoding, curl will report an error.

Why would a server do that? I mean... that would be the same as if I were to respond to Pi-hole's request with gzipped content right now.

Well, it depends on what you support. There are two things to clarify before we can decide to skip a fallback:

  1. What happens if the server does not support compression at all?
    • Does curl error out or does it continue happily downloading the uncompressed file?
  2. What happens if the server responds with gzip but the local curl does not support it*?
    • Is there an automated negotiation to receive uncompressed data?
    • Will curl fail with something like "unknown option --compressed"?

I see that this might come across as pernickety; however, this is a critical component of Pi-hole, and we do not know what kinds of lists users use, so we cannot do all-inclusive tests on our own.

It'd also be interesting to see what the --compressed option does when invoked with file://... URLs.
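For the file:// case, a quick local check could look like this (just a sketch; any readable file will do):

    # --compressed is an HTTP option, so for file:// it should simply be ignored;
    # this only verifies that curl does not error out when both are combined.
    curl --compressed -sS -o /dev/null file:///etc/hosts && echo "file:// with --compressed: OK"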


*) Possibly because it is a minimal binary in a Docker image, or compiled from source with gzip support disabled.

The web server is by no means obligated to use any compression method – this depends on the internal settings of the web server and also may depend on the internal architecture of the website in question.

Source: HTTP compression - Wikipedia

Why would adding --compressed add an Accept-Encoding header value for an encoding it doesn't support?

No, not "by design", but only "by suitable build options" - this is what I meant. See for example

Even if this precise example does not apply here, it shows that there can be curl binaries that error out when this option is set.

No, this is not what I meant. I mean: what happens if the Accept-Encoding header only contains compression algorithms the server doesn't understand? Will it fail or will it happily send plain text (uncompressed)?

You see I'm totally not against your request, I actually support it. I just want to ensure that nothing breaks anywhere and that Pi-hole continues to work where it currently works.

I still think we should add a fallback to try again without the --compressed flag in case the first request fails for whatever reason.
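Roughly something like this (just a sketch, not the actual gravity.sh code; the URL and output path are placeholders):

    url="https://example.com/blocklist.txt"   # placeholder
    out="/tmp/list.txt"                       # placeholder

    # Try the compressed transfer first, fall back to a plain request if it
    # fails for whatever reason (old curl build, odd server behaviour, ...).
    if ! curl -sSL --compressed -o "${out}" "${url}"; then
        echo "Download with --compressed failed, retrying without it" >&2
        curl -sSL -o "${out}" "${url}"
    fi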

Yes... "by design" was wrong, I've removed that.

Anyway: I've edited gravity.sh on my Pi-hole and added --compressed to it.

The request headers changed to:

Host: oisd.nl
If-Modified-Since: Sat, 10 Aug 2019 09:47:13 GMT
Accept-Encoding: deflate, gzip
Accept: */*
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36

Which is perfect.
It also downloads the gzip-encoded blocklist from my server and successfully decompresses it on the Pi-hole, and the number of domains matches.

EDIT: So I have hereby confirmed that Pi-hole's curl does have gzip/deflate support.

All that's needed now is some more testing with other blocklists (to see if they still download and work without problems).
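A rough way to test that would be a small loop over list URLs (sketch; the URLs below are placeholders for whatever lists are configured):

    # Check that each configured list still downloads when --compressed is added.
    for url in "https://example.com/list-a.txt" "https://example.org/list-b.txt"; do
        if curl -sSL --compressed -o /dev/null "${url}"; then
            echo "OK:   ${url}"
        else
            echo "FAIL: ${url}"
        fi
    done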

No, this is not what I meant. I mean: what happens if the Accept-Encoding header only contains compression algorithms the server doesn't understand? Will it fail or will it happily send plain text (uncompressed)?

It will send uncompressed.
For example: modern browsers all send Accept-Encoding headers (with gzip/deflate/Brotli/whatever). They don't know whether a server supports any of those, and AFAIK I've never seen a browser fire a second request without those headers after the first one failed. But hey... where's my proof :stuck_out_tongue:
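One way to check this without a browser (sketch; should work against any HTTP server):

    # Advertise an encoding the server almost certainly doesn't know; it should
    # then fall back to an identity (uncompressed) response rather than failing.
    curl -sI -H "Accept-Encoding: made-up-codec" https://example.com | grep -i content-encoding \
        || echo "no Content-Encoding header, so the response was sent uncompressed"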

You have to be careful, we do not ship our own curl but use the one that is provided by the user's system. We have to expect a variety of different versions (and variants) of curl.

We can make curl send whatever we deem appropriate. Do you know a server (any, only for testing purposes) that does not support gzip?

Okay, I confirmed this myself already.

$ curl --compressed -I http://dl6er.de
HTTP/1.1 200 OK
Date: Sat, 10 Aug 2019 10:25:11 GMT
Server: Apache/2.4.39 (Unix)
Vary: User-Agent
Last-Modified: Wed, 27 Mar 2019 15:56:31 GMT
ETag: "9b4-585157b311b92"
Accept-Ranges: bytes
Content-Length: 2484
Content-Type: text/html

You have to be careful, we do not ship our own curl but use the one that is provided by the user's system. We have to expect a variety of different versions (and variants) of curl.

Hadn't thought of that, you are right.
There should be a way to find out if it is compiled with support for it, right?
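For example, something like this (just a sketch; it assumes curl --version lists libz among its Features, and as far as I know that output is not localized):

    # zlib-backed gzip/deflate support shows up as "libz" in the Features line
    # (and as "zlib/x.y.z" in the first line) of curl --version.
    if curl --version | grep -qiw "libz"; then
        echo "curl supports gzip/deflate, --compressed should be safe"
    else
        echo "no libz feature found, better skip --compressed"
    fi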

The logic needed for testing this (searching for a string in some version output, taking into account that on international devices the response might be non-English) will likely exceed the effort needed for a retry.

By now, I'm almost tempted to either just add --compressed without any other modifications and see if we get any user error reports during the upcoming Pi-hole v5.0 beta testing phase - or to add the fallback.

@sjhgvr Do you want to open a pull request for this?


What's that? lol.
I'd say: just fix it :smiley:

There is a lot of uncertainty and potential breakage here. Instead of throwing in a curl flag whose support may or may not be compiled in, and then adding more complexity to an already overly complex gravity process, how about mirroring this list on GitHub or GitLab and just CNAME'ing the existing domain to use their distribution? Or, if you want to keep using your own server, you could put a CDN in front of it, since this content is highly cacheable.

Sigh... let's just assume my blocklist has nothing to do with it.

Is it even possible to compile curl without gzip/deflate support?
And if so, will adding that flag do anything at all?

Then what is the basis of the feature request?

Edit: Oh, I see you've edited the original post to change the content.

Less data usage for all clients using Pi-hole and for all servers serving blocklists to them.

Yes, I have.