Gravity database

Whenever gravity is run (manual, pihole -g OR automatic, sunday run) The new gravity database is build from the blocklists and, eventually swapped to become the active one. This method was found to be the most efficient one during the v5.0 beta tests.

My request: Don't simply erase the old database, keep it on disk, this to allow users to determine which domains have been added / removed since the last run. It often happens something suddenly doesn't work anymore on a sunday (due to added domains from a blocklist), it is possible to find the cause, using the query log, but it's impossible to determine if it's a new domain (recently added) or it already existed (last week), unless the user starts checking all the (some compressed) pihole logs.
By having the ability to determine the blocked domain is recently (last gravity run) added, the user can get insurance / gain confidence he's doing the right thing, whitelisting the domain.

There shouldn't be a disk space problem, the old (active) and new (to be activated) databases do exist on the disk until 'swapping databases' is executed. As opposed to deleting the old one, simply rename it, and erase it at the start of a new gravity run.

This hardly change anything to the inner working of pihole, only the moment the old database is removed changes.

Eventually, totally optional, an additional web interface page could be added, to indicate the added / removed domains, compared to the last gravity run.

I can see some value in keeping the old database for a diff around. There would still be a memory impact as we'd need three databases in parallel (in the worst case). Assume a gravity run fails for whatever reason. The current database is kept in this case. However, with your proposal we'd already have deleted the gravity_old.db file so the diff function would be broken until the next pihole -g run. My gravity.db is 4.5 MB, so keeping the database three times is okay for most users. Those with humongous lists should just ensure they have large enough disks. Doesn't sound like an issue to me.

How large is your gravity database?

How log does computing a diff take?" I'm aware that this is a very subjective question. Assume your pihole -g run will not change anything at all. Comparing the two indices will be fairly quick, how long does it run?

  1. disable a notable amount of lists,
  2. run pihole -g,
  3. copy the database sudo cp /etc/pihole/gravity.db /etc/pihole/gravity_old.db,
  4. re-enable all lists you use,
  5. run pihole -g,
  6. run the command
     time sqlite3 /etc/pihole/gravity.db "ATTACH \"/etc/pihole/gravity_old.db\" AS olddb; SELECT DISTINCT domain FROM gravity WHERE domain NOT IN (SELECT domain FROM olddb.gravity);" > delta
    

my database is currently (before disabling any lists) 258.788Kb (252.7Mb). Large, I know, the advantage is I can actually produce realistic time measurements, this to verify everything remains acceptable, even with a lot of lists.

executing the script I communicated to you, with time.
first test (new entries, but no new blocks):

real    0m14.776s
user    0m13.819s
sys     0m0.685s

second test (new entries, forced a new block entry, using dig):

real    0m14.434s
user    0m13.949s
sys     0m0.477s

third test (your scenario, my script, disabled 33% of the lists)
new gravity (everything enabled): 236.216Kb
old gravity (33% disabled): 89.632Kb
12 new blocks found by the script...

real    0m36.610s
user    0m29.579s
sys     0m0.818s

fourth test (your scenario, your sql statement, disabled 33% of the lists)
delta contains 1.543.114 entries, size 32.664Kb

real    0m22.354s
user    0m20.417s
sys     0m0.530s

all of the test performed on raspberry pi 3B, uname -a: Linux raspberrypi 5.10.17-v7+ #1403 SMP Mon Feb 22 11:29:51 GMT 2021 armv7l GNU/Linux

edit
for reference:

 time pihole -q doubleclick.net
  [i] Over 100 results found for doubleclick.net
        This can be overridden using the -all option

real    0m7.940s
user    0m7.307s
sys     0m0.816s

/edit