Use database entries to speed up gravity

gravity.sh uses the parseList function to extract valid domains.

There may be a way to speed up this process by skipping the parseList function, which isn't really necessary in the following cases:

  • no changes detected (httpcode 304)
  • cached list (if available)

Instead of using parseList to extract the domains, wouldn't it be more efficient to use sqlite3 to read them from the database, e.g.

pihole-FTL sqlite3 "/etc/pihole/gravity.db" "SELECT domain FROM gravity WHERE adlist_id = ${adlistID};"

adlistID is a variable used in gravity.sh; the sqlite3 query results could be written to the file "${target}", a variable used in the parseList function.
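A minimal sketch of that idea, assuming the Pi-hole v5 `gravity` table with columns `domain` and `adlist_id`. `dump_adlist_domains` and the `gravity_sqlite` wrapper are hypothetical names, not part of gravity.sh; the wrapper only exists so the sqlite3 call can be swapped out for testing.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the sqlite3 shell bundled with Pi-hole.
gravity_sqlite() {
  pihole-FTL sqlite3 "$@"
}

# Hypothetical helper: write the domains already stored in gravity.db for one
# adlist to the file that parseList would otherwise produce.
# Assumes the Pi-hole v5 schema: table "gravity" with columns domain, adlist_id.
dump_adlist_domains() {
  local db="$1" adlistID="$2" target="$3"
  # adlist_id is an integer column, so reject anything that is not a plain
  # number before interpolating it into the SQL statement.
  [[ "${adlistID}" =~ ^[0-9]+$ ]] || return 1
  gravity_sqlite "${db}" \
    "SELECT domain FROM gravity WHERE adlist_id = ${adlistID};" > "${target}"
}

# Example call (values are illustrative):
# dump_adlist_domains "/etc/pihole/gravity.db" 3 "/tmp/list.3.domains"
```

Validating adlistID as a plain integer before building the SQL string keeps a malformed ID from altering the query.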

Since "No changes detected" appears many times in my pihole_updateGravity.log, this may be a way to speed up the process...

Adlists are downloaded as text files.

Committing them to the database is an expensive operation.
AFAIAAO, the update script already tries to avoid that if the list content hasn't changed, which may also already be indicated by the HTTP reply.

Running SQL statements against a list would only be possible once the altered list has been transferred to the database.

Have you done any measurements? Dumping hundreds of thousands (possibly millions, for humongous lists) of domains from the database onto disk, only to reimport them a moment later, doesn't sound convincing at first glance either, TBH.

Looking at the code (gravity.sh is rather complex), what happens is:

  1. (attempt to) download the list
    a. HTTP response 304 -> parseList "${adlistID}" "${saveLocation}" "${target}" (line 714 in current master)
    b. download failed, cached list available -> parseList "${adlistID}" "${saveLocation}" "${target}" (line 742 in current master)
  2. the parseList function does several things (sed, invalid, duplicate, suppress unusable, list invalid, collect some stats; I may have missed something here)

None of these actions are needed: the list hasn't changed, so neither has the result.

At the time the parsing is done, the old database is still active, so submitting the above query and storing the result in "${target}" would eliminate all the work done in the parseList function.
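The flow described above, with the proposed shortcut added, could look roughly like this. `process_list` and its `usedCache` argument are hypothetical; `parseList` is only a placeholder standing in for the real function in gravity.sh, and the database path assumes the default /etc/pihole/gravity.db. As the thread found, writing the query results to disk was not faster in practice, so this only illustrates the idea.

```shell
#!/usr/bin/env bash
# Placeholders so the sketch runs on its own. In gravity.sh the real
# parseList function would be used; this stub only writes a marker line.
gravity_sqlite() { pihole-FTL sqlite3 "$@"; }
parseList() { echo "parsed ${2}" > "${3}"; }

# Hypothetical sketch: if the list is known to be unchanged (HTTP 304, or a
# failed download that falls back to the cached copy), read the domains from
# the still-active gravity.db instead of re-parsing the file.
process_list() {
  local httpCode="$1" usedCache="$2" adlistID="$3" saveLocation="$4" target="$5"
  local db="/etc/pihole/gravity.db"   # assumed default location
  if [[ "${httpCode}" == "304" || "${usedCache}" == "yes" ]]; then
    # adlist_id is an integer column; refuse anything else.
    [[ "${adlistID}" =~ ^[0-9]+$ ]] || return 1
    gravity_sqlite "${db}" \
      "SELECT domain FROM gravity WHERE adlist_id = ${adlistID};" > "${target}"
  else
    # List changed (or is new): parse it as before.
    parseList "${adlistID}" "${saveLocation}" "${target}"
  fi
}
```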

Correct, writing the sqlite3 query results to disk takes more time than expected -> close feature request, apologies for wasting your time...

I changed the code and ran "time pihole -g" to provide measurements. The results are, of course, highly dependent on the number of lists in use.

I had run pihole -g several times beforehand, so there are lots of "no changes detected" messages.
I've been using the gravity.sh version with ABP-style

original, using parselist:

[i] Number of gravity domains: 3037143 (2363762 unique domains)
real    5m30.617s
user    4m3.264s
sys     0m22.031s

new, replacing only the parseList call for httpCode 304 with the SQL statement:

[i] Number of gravity domains: 3037144 (2363762 unique domains)
real    5m42.636s
user    4m6.046s
sys     0m25.315s
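Comparing the two `real` times: 5m42.636s minus 5m30.617s is 12.019 s, so the SQL variant was roughly 3.6% slower. The helper below is a hypothetical sketch (not part of Pi-hole) that does this arithmetic for `time`-style values in the `XmY.Zs` format.

```shell
#!/usr/bin/env bash
# Convert a `time` value such as "5m30.617s" into seconds.
to_seconds() {
  awk -v t="$1" 'BEGIN {
    split(t, parts, "m")        # parts[1] = minutes, parts[2] = "30.617s"
    sub(/s$/, "", parts[2])     # drop the trailing "s"
    printf "%.3f\n", parts[1] * 60 + parts[2]
  }'
}

# Print the absolute and relative difference between two `real` times.
compare_runs() {
  local before after
  before="$(to_seconds "$1")"
  after="$(to_seconds "$2")"
  awk -v b="$before" -v a="$after" \
    'BEGIN { printf "%+.3f s (%+.1f%%)\n", a - b, (a - b) / b * 100 }'
}

# compare_runs 5m30.617s 5m42.636s   # prints "+12.019 s (+3.6%)"
```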

P.S. gravity.sh doesn't recognize an unchanged list from the local file system, e.g.

file:///home/pi/blocklistproject/blocklistproject

is always processed, even if it hasn't changed...
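One way such a check could work, sketched under assumptions: treat a file:// list as unchanged when its content is byte-identical to the copy kept from the previous run (the `saveLocation` file in gravity.sh is assumed to hold that copy). `url_to_path` and `local_list_unchanged` are hypothetical helpers; gravity.sh does not contain them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a file:// adlist URL to a local path.
url_to_path() {
  printf '%s\n' "${1#file://}"
}

# Hypothetical check: a local list counts as unchanged when it is
# byte-identical to the copy kept from the previous gravity run.
local_list_unchanged() {
  local source_file="$1" cached_copy="$2"
  # Without a cached copy the list has to be processed.
  [[ -f "${cached_copy}" ]] || return 1
  # cmp -s prints nothing and exits 0 only when both files are identical.
  cmp -s -- "${source_file}" "${cached_copy}"
}

# Possible use (saveLocation assumed to be the cached copy):
# if local_list_unchanged "$(url_to_path "${url}")" "${saveLocation}"; then
#   ...skip re-parsing...
# fi
```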
