Difference between the update entries "No changes detected" and "List stayed unchanged"

I noticed that when you update gravity, there are two kinds of output blocks for a list, and I was wondering how that happens:

  [i] Target: https://block.energized.pro/extensions/regional/formats/hosts
  [✓] Status: No changes detected
  [i] Analyzed 41105 domains

on the one hand

  [i] Target: https://raw.githubusercontent.com/jerryn70/GoodbyeAds/master/Hosts/GoodbyeAds.txt
  [✓] Status: Retrieval successful
  [i] Analyzed 213268 domains
  [i] List stayed unchanged

on the other.

Does GitHub not tell you that the file is the same as the last time you tried? Are we missing some caching headers or an ETag?

Exactly.

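"Status: No changes detected" is based on the HTTP response headers: the server itself reports that the list has not changed.

"List stayed unchanged", by contrast, compares the checksum of the downloaded file with that of the previously saved one, after the HTTP return code was 200.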
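
To make the distinction concrete, here is a minimal sketch of that decision. It is not the actual gravity script code: the function name classify_update, its arguments, the "List changed" and "Retrieval failed" messages, and the use of cksum as the checksum are all assumptions made for illustration.

```shell
# Hypothetical helper: decide which status line applies to one list.
#   $1 = HTTP status code returned by the server
#   $2 = path of the previously saved copy of the list
#   $3 = path of the freshly downloaded copy (only present for a 200)
classify_update() {
    http_code=$1
    old_file=$2
    new_file=$3

    case "$http_code" in
        304)
            # The server says nothing changed; no body was transferred.
            echo "No changes detected"
            ;;
        200)
            # A full download happened; only now can contents be compared.
            if [ -f "$old_file" ] &&
               [ "$(cksum < "$old_file")" = "$(cksum < "$new_file")" ]; then
                echo "List stayed unchanged"
            else
                # Placeholder message, not taken from Pi-hole's output.
                echo "List changed"
            fi
            ;;
        *)
            # Placeholder message for any other status code.
            echo "Retrieval failed (HTTP $http_code)"
            ;;
    esac
}
```

The point of the sketch is that the second message can only appear after the whole file has been transferred, whereas the first one costs a single header exchange.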


Note the missing last-modified header in the GitHub response:

chrko@ThinkPad-X230:~$ curl -L -I https://raw.githubusercontent.com/jerryn70/GoodbyeAds/master/Hosts/GoodbyeAds.txt -o -
HTTP/2 200 
cache-control: max-age=300
content-security-policy: default-src 'none'; style-src 'unsafe-inline'; sandbox
content-type: text/plain; charset=utf-8
etag: "2fb1b9db6f93375408cb26ca6988eadb8a0bdd7e3b426e12a300be681466430d"
strict-transport-security: max-age=31536000
x-content-type-options: nosniff
x-frame-options: deny
x-xss-protection: 1; mode=block
x-github-request-id: C21C:4053:50A535:560C50:629E2C10
accept-ranges: bytes
date: Mon, 06 Jun 2022 16:39:09 GMT
via: 1.1 varnish
x-served-by: cache-hhn4031-HHN
x-cache: MISS
x-cache-hits: 0
x-timer: S1654533550.685185,VS0,VE83
vary: Authorization,Accept-Encoding,Origin
access-control-allow-origin: *
x-fastly-request-id: a1ef16c1df1e82a8c0ae4233de89d98c5bc60290
expires: Mon, 06 Jun 2022 16:44:09 GMT
source-age: 0
content-length: 5943985

chrko@ThinkPad-X230:~$ curl -L -I https://block.energized.pro/extensions/regional/formats/hosts -o -
HTTP/2 200 
date: Mon, 06 Jun 2022 16:39:37 GMT
content-type: application/octet-stream
content-length: 981172
last-modified: Mon, 06 Jun 2022 14:15:29 GMT
etag: "629e0c01-ef8b4"
accept-ranges: bytes
cf-cache-status: DYNAMIC
expect-ct: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
report-to: {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v3?s=Q4FO4qJAobrBKPN8n98N1pLMqr0R5VFkKrqysBuj%2B1u0Kvk085LXgTSvtQKD3WRE%2FPZrG8jJeLWN0605vrxvn1UwmTYigVniveB%2BaRr6oeAsr293kMAF82ILRsJEEC8iOZTmgnig%2F5OCpUecWIeD4Frm"}],"group":"cf-nel","max_age":604800}
nel: {"success_fraction":0,"report_to":"cf-nel","max_age":604800}
server: cloudflare
cf-ray: 717295cd0b8a6d73-MUC
alt-svc: h3=":443"; ma=86400, h3-29=":443"; ma=86400
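
If you want to pick the relevant headers out of such a dump, something like the following sketch could help. The function name print_validators is made up for this example; it only filters text on stdin and does not talk to any server itself.

```shell
# Hypothetical helper: read response headers on stdin and print the
# validators (Last-Modified and ETag) that a later conditional request
# could use. Header names are matched case-insensitively.
print_validators() {
    tr -d '\r' | awk '
        {
            # Everything before the first colon is the header name.
            name = tolower($0)
            sub(/:.*/, "", name)
        }
        name == "last-modified" || name == "etag" {
            value = $0
            sub(/^[^:]*:[ \t]*/, "", value)
            print name "=" value
        }
    '
}

# Example against a live server (not run here):
#   curl -sSL -I "$url" | print_validators
```

Fed the two header dumps above, this would print only an etag line for the GitHub URL, and both an etag and a last-modified line for the energized.pro URL.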

So an ETag won't be enough?

Maybe it could. But currently we don't check the ETag in the gravity script.

It would be nice if you did that eventually. I think it could speed up the update process by quite a margin.

Pi-hole already handles the resulting HTTP 304 Not Modified responses, but only the webserver would know why it decided to send that 304 - it may be based on any set of information from the client request headers, not just an ETag value.

ETags are a server-side feature. There won't be one to start with if the webserver that delivers a blocklist doesn't provide one; a webserver implementation may consider client or connection specifics when generating an ETag for a given entity; and it is also that webserver that handles the ETag comparison when deciding whether to return the requested resource or an HTTP 304 instead.

So it is something Pi-hole has to remember and send over with the next request, hoping that the server kept track. Ah, OK.

IMS (If-Modified-Since) only functions if the response included a Last-Modified header; If-None-Match and ETag are similarly paired.

Do we send the IMS, INM and ETag headers proactively to get the server to reply with a 304?

Note that the response you seem to quote, from some undisclosed source, would again be produced by the web server.

I cannot know whether you would send those.

As mentioned before by yubiuser, Pi-hole isn't currently sending an ETag value, nor would it store one in its database for that purpose. Pi-hole is considering the Last-Modified header option, though that doesn't work for GitHub - you would have found this referred to as heisenbergCompensator upon scrutinising the full code as linked by yubiuser.
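
For illustration only, this is roughly what a client would have to do if it did store validators and send them back. It is a sketch under assumptions, not Pi-hole code: the function names conditional_headers and conditional_status and the idea of keeping the two values somewhere between runs are hypothetical.

```shell
# Hypothetical helper: print one request header per line for the
# validators that were stored from a previous response.
#   $1 = stored Last-Modified value (may be empty)
#   $2 = stored ETag value, including its quotes (may be empty)
conditional_headers() {
    last_modified=$1
    etag=$2
    # If-Modified-Since pairs with Last-Modified.
    [ -n "$last_modified" ] && echo "If-Modified-Since: $last_modified"
    # If-None-Match pairs with ETag.
    [ -n "$etag" ] && echo "If-None-Match: $etag"
    return 0
}

# Hypothetical wrapper: request $1 with the stored validators $2 and $3
# and print only the HTTP status code of the final response.
conditional_status() {
    url=$1
    headers=$(conditional_headers "$2" "$3")
    set -- -sS -L -o /dev/null -w '%{http_code}\n'
    while IFS= read -r header; do
        [ -n "$header" ] && set -- "$@" -H "$header"
    done <<EOF
$headers
EOF
    curl "$@" "$url"
}
```

A 304 from conditional_status would mean the server accepted the validators; a 200 means it sent the full list again. Whether a given server honours either header is entirely up to that server.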

If you consider sending an ETag and are using curl to retrieve blocklist contents (as Pi-hole does), you should be aware that the ETag options may not be supported by the curl version installed on Pi-hole's host system, which would cause curl to fail (e.g. none of the curl versions on my Armbian or PiOS machines support --etag-save).
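
A script could guard against that with a version check along these lines. This is a sketch: the function names are made up, and the 7.68.0 threshold is, to my knowledge, the release that introduced --etag-save and --etag-compare, so verify it against the curl documentation before relying on it.

```shell
# Hypothetical helper: succeed if dotted version $1 is >= version $2.
# Fields are compared numerically; missing fields count as 0.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, ".")
        m = split(b, y, ".")
        max = (n > m) ? n : m
        for (i = 1; i <= max; i++) {
            xi = (i <= n) ? x[i] + 0 : 0
            yi = (i <= m) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# Hypothetical helper: succeed if the installed curl is new enough for
# the ETag options. Assumption: they first appeared in curl 7.68.0.
curl_supports_etag() {
    ver=$(curl --version 2>/dev/null | awk 'NR == 1 { print $2 }')
    [ -n "$ver" ] && version_ge "$ver" 7.68.0
}

# Example (not run here):
#   if curl_supports_etag; then echo "ETag options available"; fi
```

On a system whose curl is too old, a script could then fall back to the plain download-and-compare path instead of failing.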

Also, as mentioned before, server-side ETag calculations may consider non-resource-related elements, which would render them somewhat useless for Pi-hole's purposes.
To give an example: if a server hosting a blocklist calculates an ETag that reflects some client session state, or uses it as an identity match for the server's transient in-memory cache, then Pi-hole's weekly gravity update may never see a 304.

Dealing with this correctly may require testing for specific versions of curl, and it would significantly impede a uniform way of retrieving blocklists.
The code ultimately may have to be changed to treat each blocklist retrieval individually, posing the additional question of how to acquire the respective necessary sets of parameters.
