Do not base the "Status" of a gravity run on the HTTP response but on the actual file content

I've noticed that most of my adlists get the status "Retrieval successful" instead of "No changes detected" when running pihole -g, even when I run it twice within a minute.

Looking at the code, I see that the "Status" is based on the HTTP response.

The code even notes that some servers don't provide the necessary header.

Since Pi-hole saves all downloaded lists locally, I propose comparing the old local copy with the newly downloaded list to decide whether the "Status" should be "changed" or "not changed".
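A minimal Python sketch of the proposed content comparison, assuming both the saved list and the fresh download are on disk. The function name `list_changed` is hypothetical and this is not Pi-hole's actual code; it only illustrates the check being requested.

```python
import filecmp
import os


def list_changed(saved_path: str, downloaded_path: str) -> bool:
    """Return True if the new download differs from the saved copy.

    A missing saved copy (e.g. the very first run) counts as a change.
    """
    if not os.path.exists(saved_path):
        return True
    # shallow=False: compare the bytes instead of trusting matching
    # os.stat() signatures (file type, size, mtime).
    return not filecmp.cmp(saved_path, downloaded_path, shallow=False)
```

`filecmp.cmp` checks the file sizes first, so lists of different length are reported as changed without reading their contents.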

This could also be extended to locally stored adlists, which at the moment will never get "Status: No changes detected".

This would also allow gravity's output to be improved by splitting it into two lines per adlist:

Status: Retrieval successful
Status: (No) Changes detected

Furthermore, this would reliably determine whether the actual file content has changed, which is a prerequisite for feature requests like

and

Could do an md5sum.
Doesn't look too heavy for a Raspi 1B:

pi@noads:~ $ wc -l /etc/pihole/list.0.raw.githubusercontent.com.domains
66230 /etc/pihole/list.0.raw.githubusercontent.com.domains

pi@noads:~ $ time md5sum /etc/pihole/list.0.raw.githubusercontent.com.domains
3cf174431cd2b6498bb5b5548024ae6c  /etc/pihole/list.0.raw.githubusercontent.com.domains

real    0m0.217s
user    0m0.084s
sys     0m0.046s
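The same fingerprint can be computed without shelling out to md5sum. A sketch assuming Python's standard `hashlib`; the helper name `md5sum` is hypothetical, and MD5 serves only as a change fingerprint here, not as a security measure. The file is read in chunks so a large list never has to fit in memory at once.

```python
import hashlib


def md5sum(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex MD5 digest of a file (the value md5sum prints)."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Storing only the digest of the previous run would be enough to detect a change on the next run.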

We do it this way to help list providers: we use curl to check whether a list has changed and only download it if there is a newer file.
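The post only says that curl is used; one standard way to do this is an HTTP conditional request, which is what curl's `-z`/`--time-cond <file>` option sends (an `If-Modified-Since` header built from the file's modification time). The Python sketch below is an assumption about the mechanism, not Pi-hole's code, and the function names are hypothetical. A server that honours the header answers 304 Not Modified with no body; one that ignores it just sends the whole file with 200, which produces the "Retrieval successful" status reported above.

```python
import email.utils
import os
import urllib.error
import urllib.request
from typing import Optional, Tuple


def conditional_request(url: str, saved_path: str) -> urllib.request.Request:
    """Build a GET that asks for the list only if it changed since the
    local copy was last written (HTTP If-Modified-Since)."""
    headers = {}
    if os.path.exists(saved_path):
        mtime = os.path.getmtime(saved_path)
        # HTTP-date format, e.g. "Thu, 01 Jan 1970 00:00:00 GMT"
        headers["If-Modified-Since"] = email.utils.formatdate(mtime, usegmt=True)
    return urllib.request.Request(url, headers=headers)


def fetch(url: str, saved_path: str) -> Tuple[int, Optional[bytes]]:
    """Return (HTTP status, body). On 304 there is nothing to download."""
    try:
        with urllib.request.urlopen(conditional_request(url, saved_path)) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # urllib reports 304 as an HTTPError; treat it as "not modified".
        if err.code == 304:
            return 304, None
        raise
```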

With your proposal we'd download every list every time, which would increase traffic and load on the list providers.

I do see why you do it the way you do now. But 8 of my 9 lists are hosted on GitHub, so your server-friendly approach doesn't work for these, and the lists are downloaded anyway. That makes a "changes/no changes detected" status meaningless.

If you want to keep the current implementation to be nice to list providers, I'd like to change my request to:
Add a second layer of change detection after downloading, based on file content. If the HTTP response already indicates no change, skip the comparison.
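A sketch of the two-layer decision as described, producing the two status lines proposed earlier. Assumptions (hypothetical, not Pi-hole's code): `http_code` is 304 or a success code, with error handling omitted; `None` stands for a local list read without HTTP; the digests are hex strings such as those printed by md5sum, and `old_digest` is `None` when no saved copy exists.

```python
from typing import List, Optional

NOT_MODIFIED = 304


def gravity_status(http_code: Optional[int],
                   old_digest: Optional[str],
                   new_digest: Optional[str]) -> List[str]:
    """Return the status lines for one adlist."""
    if http_code == NOT_MODIFIED:
        # Layer 1: the server already reported no change, so nothing was
        # downloaded and the content comparison is skipped.
        return ["Status: No changes detected"]
    lines = ["Status: Retrieval successful"]
    # Layer 2: compare the content of the saved copy with the new one.
    # This also covers servers without the header and local lists.
    if old_digest is not None and old_digest == new_digest:
        lines.append("Status: No changes detected")
    else:
        lines.append("Status: Changes detected")
    return lines
```

Because the content check only runs after a full download, it adds no traffic for servers that already answer 304.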

We write Pi-hole to work for the majority of users, and for the majority the existing way is better.


Why is it better for the majority of users? I can see why it is better for the list providers, but what do users gain by keeping it instead of having a two-step change detection that reliably tells them whether the list content has changed?

I doubt that the majority of list servers used by Pi-hole users send the correct HTTP response.
Even one of the two remaining "default" lists doesn't. Here is the log of the second consecutive run of pihole -g:

  [✓] Preparing new gravity database
  [i] Target: https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts
  [✓] Status: Retrieval successful
  [i] Received 57337 domains

  [i] Target: https://mirror1.malwaredomains.com/files/justdomains
  [✓] Status: No changes detected
  [i] Received 26853 domains

Give me data, then. Poll users, compile which lists are most popular, and then ask the list providers which servers they use.

It's better for the users because it doesn't piss off providers and cause them to deny list use to users.

Why are you ignoring that I already agreed that Pi-hole should keep the change detection based on the HTTP response?
I'm not asking to replace it (anymore), but to extend it.

But you still insist that not extending is better for users?

You have to show why it would be better for users. So far you have shown why it's better for you.

If you've changed the scope of this feature request to something different, then make a new one.

Follow-up FR, as requested by @DanSchaper.