Pihole -g / List download / disappointing performance

doesn't work (failed to upload). You could previously find the log here, but I have removed the link to the log, as there is a solution; see the edit below.

edit
solved the debug upload problem; the relevant entries have been split off into a new topic. It was a firewall problem: port 9998 needs to be opened to bothouse.pi-hole.net.
/edit
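For anyone hitting the same upload failure: a quick way to check whether that port is reachable from your Pi is a plain TCP connect test. The helper below is just an illustrative sketch (not a Pi-hole tool); the hostname and port are the ones mentioned in the edit above.

```python
import socket

def tcp_port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. tcp_port_open("bothouse.pi-hole.net", 9998)  # requires network access
```

If this returns False from the Pi but True from outside your network, the firewall is the likely culprit.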

Here is another indicator of a problem with your install.

Edit - potential problem.

I, and many system administrators, have come to the conclusion that upgrades are best avoided, if possible.
So whenever there is a new version of:

  • Raspbian
  • unbound
  • pihole
    I prefer to set up a new system, as opposed to upgrading. With Pi-hole 4, this took me about 40 minutes, including adding my whitelist entries, regex entries and adlist entries + pihole -g. Since it is all scripted, it isn't that hard; I can do it over lunch. If I have to wait 1.5 hours for pihole -g to complete its first run, I will be tempted to skip checking the result (e.g. verifying it all works), and Pi-hole will get a bad name...

I know it's my choice to set up from scratch over and over again, but I'm sure I'm not the only one who does this. The big advantage is that you have an SD card with a fully operational system to go back to in case of problems (they did happen in the past...).

That may be so, but I've been active on this forum for a long time, and I've seen a lot of topics that report a failure to upload the debug log. I've never seen the requirements needed to be able to upload, nor a solution for the failure to do so, apart from simply copy-pasting the log into the topic. Furthermore, I don't see the connection between being unable to upload the log and this performance issue.

77 lists, less than 2 min for me.

You are not running the distributed version of V5.0 beta, so we're comparing your local code to the released code. The common frame of reference should be a stock install of V5.0 as distributed.

*** [ DIAGNOSING ]: Core version
[i] Core: v4.3.2 (https://discourse.pi-hole.net/t/how-do-i-update-pi-hole/249)
[i] Branch: release/v5.0 (https://discourse.pi-hole.net/t/the-pihole-command-with-examples/738#checkout)
[i] Commit: v4.3.2-386-g29f06a4-dirty
[i] Status:  M advanced/Scripts/database_migration/gravity-db.sh
            ?? advanced/Scripts/database_migration/gravity/9_to_10.sql
[i] Diff: diff --git a/advanced/Scripts/database_migration/gravity-db.sh b/advanced/Scripts/database_migration/gravity-db.sh
          old mode 100644
          new mode 100755
          index 184b3a4..5fe8c93
          --- a/advanced/Scripts/database_migration/gravity-db.sh
          +++ b/advanced/Scripts/database_migration/gravity-db.sh
          @@ -87,4 +87,11 @@ upgrade_gravityDB(){
           		sqlite3 "${database}" < "${scriptPath}/8_to_9.sql"
           		version=9
           	fi
          +		if [[ "$version" == "9" ]]; then
          +		# This migration fixes some issues that were introduced
          +		# in the previous migration script.
          +		echo -e "  ${INFO} Upgrading gravity database from version 9 to 10"
          +		sqlite3 "${database}" < "${scriptPath}/9_to_10.sql"
          +		version=10
          +	fi
           }

If you are having issues connecting to the log uploader then there may be other network issues that are causing list downloads to be slow. While there may be some ways to increase gravity performance (DL is literally a genius) we should look at the whole picture.

No. FTL v5.0 does not need any gravity runs for any list manipulations. We just send a suitable signal to pihole-FTL (automatically done for you) and FTL will find out for itself what changed.

pihole -g is really run only once weekly (unless invoked manually).


While we're at it, I should mention that FTL can be informed about list changes in three ways:

  • pihole restartdns: Full restart of pihole-FTL (slowest)
  • pihole restartdns reload: Looks for updates to the lists, flushes dnsmasq's DNS cache (faster than with v4.x)
  • pihole restartdns reload-lists: Looks for updates to the lists WITHOUT flushing dnsmasq's DNS cache (new, basically instant)

As I indicated in this topic to @DL6ER, I've added the proposed database upgrade to version 10 to my system. Somebody has to test it; I was expecting this would be appreciated...

Testing is appreciated, but for troubleshooting a beta 5.0 issue it would be best to be running the distributed software and eliminate a potential variable.

The list downloads work fine; the processing message appears almost immediately. Furthermore, if it were a network problem, this would also have been the case in Pi-hole v4.

edit
as you can see in the debug log, I'm also using local lists (file:///home/pi/BL/tracker/domains). Processing is also slow for this list.
/edit

I know you guys keep repeating that 30 seconds is fine etc for the standard lists, but the reality is that a large percentage of users have millions of domains spread across many blocklists. This is of course a matter of preference; I prefer a smaller list personally. I am not criticising either as I know you can't please everybody.

The problem you're going to have is that people are going to be used to the original pihole -g speed, before the added complexities of adlist IDs etc. It will be seen as a step backwards, even though we know that in actual fact it greatly improves functionality in many ways.

Might it be an idea to focus more on 'sync' than on truncating the table and starting afresh? This might also solve the issue of lists not being accessible during the gravity run.

I mainly work in Python now, so I'm not too familiar with the limitations of bash, but surely, for each URL provider, you could instead make comparisons and generate an array of additions/removals, then make only the necessary changes? Say, inserting a couple of thousand URLs instead of millions every time gravity is run? The time you'd lose on the extra checks would easily be gained back in DB updates?
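A minimal Python sketch of that delta idea, assuming a simplified `gravity(domain, adlist_id)` table (the real Pi-hole schema differs, and `sync_adlist` is an invented helper name):

```python
import sqlite3

def sync_adlist(conn, adlist_id, new_domains):
    """Apply only the delta for one adlist instead of truncate-and-refill.
    Returns (number added, number removed)."""
    rows = conn.execute("SELECT domain FROM gravity WHERE adlist_id = ?",
                        (adlist_id,))
    old = {r[0] for r in rows}
    new = set(new_domains)
    # insert only domains that are genuinely new for this adlist
    conn.executemany("INSERT INTO gravity (domain, adlist_id) VALUES (?, ?)",
                     ((d, adlist_id) for d in new - old))
    # delete only domains that disappeared from the downloaded list
    conn.executemany("DELETE FROM gravity WHERE domain = ? AND adlist_id = ?",
                     ((d, adlist_id) for d in old - new))
    conn.commit()
    return len(new - old), len(old - new)
```

Whether this beats a full rebuild depends on how fast the lookup of the old set is compared with a mass re-insert, which is exactly the trade-off discussed below.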

This is a fair assumption; ref this topic.

Using the default block lists only feels like driving a sports car round the block in a 20 mph zone and convincing yourself it's a powerful car.

The default block lists do NOT take regional domains into account, one must add at least some regional lists.

edit
A long time ago, I wrote a topic about testing the value of new lists. I still use this on Pi-hole v4; it works, and the result is useful for deciding whether or not to add the new list.
So whenever I add a list, I verify whether it is appropriate to add it (low new-entries value -> don't add)...
/edit
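The "value of a new list" check described above boils down to a simple metric. This is just an illustrative sketch (not the script from that topic; `novelty` is an invented name):

```python
def novelty(candidate, existing):
    """Fraction of a candidate list's domains not already blocked.
    A low value suggests the list adds little and can be skipped."""
    cand = set(candidate)
    if not cand:
        return 0.0
    return len(cand - set(existing)) / len(cand)
```

For example, a candidate list where only a quarter of the domains are new scores 0.25; you would then set your own threshold for "worth adding".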

No. Mass-injections are very fast. Looking up each domain and then going into an if is very slow. This is a severe limitation of bash, and it is similar in Python. One of the first things you learn in Python: avoid long loops! The second thing you learn: avoid long loops with many branches inside.
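The contrast can be seen with Python's stdlib `sqlite3` module: one `executemany` mass-injection versus a lookup-and-branch loop per domain. The one-column `gravity(domain)` table is assumed for illustration only:

```python
import sqlite3

def bulk_insert(conn, domains):
    # mass-injection: one prepared statement, many rows in one call
    conn.executemany("INSERT INTO gravity (domain) VALUES (?)",
                     ((d,) for d in domains))
    conn.commit()

def per_row_insert(conn, domains):
    # lookup-then-branch per domain: the slow pattern warned about above
    for d in domains:
        if conn.execute("SELECT 1 FROM gravity WHERE domain = ?",
                        (d,)).fetchone() is None:
            conn.execute("INSERT INTO gravity (domain) VALUES (?)", (d,))
    conn.commit()
```

With millions of domains, the per-row variant pays a query round-trip plus a branch per domain, while the bulk variant lets SQLite batch the work inside a single transaction.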


I'm working really hard on this topic; I'm not taking anything lightly here. When I'm not responding quickly, it's because I'm deeply inside the code, all mental abilities bound.

My first iteration of performance improvements is done, I will look at this again tomorrow. It's too late already over here in Europe.

See for yourself:

Switched to branch 'release/v5.0'

pi@munichpi:/etc/.pihole# time pihole -g
  [i] Neutrino emissions detected...
  [✓] Pulling blocklist source list into range

  [✓] Flushing gravity table
  [i] Target: https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts
  [✓] Status: Retrieval successful
  [✓] Adding adlist with ID 1 to database table

  [i] Target: https://mirror1.malwaredomains.com/files/justdomains
  [✓] Status: Retrieval successful
  [✓] Adding adlist with ID 2 to database table

  [i] Target: http://sysctl.org/cameleon/hosts
  [✓] Status: Retrieval successful
  [✓] Adding adlist with ID 3 to database table

  [i] Target: https://zeustracker.abuse.ch/blocklist.php?download=domainblocklist
  [✓] Status: Retrieval successful
  [✓] Adding adlist with ID 4 to database table

  [i] Target: https://s3.amazonaws.com/lists.disconnect.me/simple_tracking.txt
  [✓] Status: Retrieval successful
  [✓] Adding adlist with ID 5 to database table

  [i] Target: https://s3.amazonaws.com/lists.disconnect.me/simple_ad.txt
  [✓] Status: Retrieval successful
  [✓] Adding adlist with ID 6 to database table

  [i] Target: https://hosts-file.net/ad_servers.txt
  [✓] Status: Retrieval successful
  [✓] Adding adlist with ID 7 to database table

  [i] Number of gravity domains: 146591 (124042 unique domains)
  [i] Number of exact blacklisted domains: 0
  [i] Number of regex blacklist filters: 2
  [i] Number of exact whitelisted domains: 0
  [i] Number of regex whitelist filters: 0
  [✓] Cleaning up stray matter

  [✓] Flushing DNS cache
  [✓] DNS service is running
  [✓] Pi-hole blocking is Enabled

real    0m49,244s
user    0m10,144s
sys     0m2,527s

versus

Switched to branch 'tweak/gravity_performance'

pi@munichpi:/etc/.pihole# time pihole -g
  [i] Neutrino emissions detected...
  [✓] Pulling blocklist source list into range

  [✓] Preparing new gravity table
  [i] Target: https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts
  [✓] Status: Retrieval successful

  [i] Target: https://mirror1.malwaredomains.com/files/justdomains
  [✓] Status: Retrieval successful

  [i] Target: http://sysctl.org/cameleon/hosts
  [✓] Status: Retrieval successful

  [i] Target: https://zeustracker.abuse.ch/blocklist.php?download=domainblocklist
  [✓] Status: Retrieval successful

  [i] Target: https://s3.amazonaws.com/lists.disconnect.me/simple_tracking.txt
  [✓] Status: Retrieval successful

  [i] Target: https://s3.amazonaws.com/lists.disconnect.me/simple_ad.txt
  [✓] Status: Retrieval successful

  [i] Target: https://hosts-file.net/ad_servers.txt
  [✓] Status: Retrieval successful

  [✓] Storing downloaded domains in new gravity table
  [✓] Activating new gravity table
  [i] Number of gravity domains: 146591 (124042 unique domains)
  [i] Number of exact blacklisted domains: 0
  [i] Number of regex blacklist filters: 2
  [i] Number of exact whitelisted domains: 0
  [i] Number of regex whitelist filters: 0
  [✓] Cleaning up stray matter

  [✓] Flushing DNS cache
  [✓] DNS service is running
  [✓] Pi-hole blocking is Enabled

real    0m33,544s
user    0m9,753s
sys     0m2,026s

This is on a RPi 3B with the stock lists. You can try it yourself with

pihole checkout core tweak/gravity_performance

However, note that there are no guarantees that everything works as expected, although it should.
This also already features the idea of populating an alternative table called gravity_new and only swapping the tables at the end. I haven't had the time to check whether this leaves all triggers and views intact; I hope so.
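The swap idea can be sketched with stdlib `sqlite3` (simplified one-column table, illustration only). Note that in SQLite, dropping the old table also drops any triggers defined on it, which is exactly the open question about triggers and views mentioned above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gravity (domain TEXT)")
conn.execute("INSERT INTO gravity VALUES ('stale.example')")

# build the replacement off to the side while the old table stays live...
conn.execute("CREATE TABLE gravity_new (domain TEXT)")
conn.executemany("INSERT INTO gravity_new VALUES (?)",
                 [("fresh1.example",), ("fresh2.example",)])

# ...then swap it in at the very end
conn.execute("DROP TABLE gravity")
conn.execute("ALTER TABLE gravity_new RENAME TO gravity")
conn.commit()
```

The upside is that a failed or slow rebuild never leaves you with an empty live table; the old data stays queryable until the rename.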


Appreciate the time you're putting in @DL6ER. Please don't take my comments negatively; I just think overall the update from v4 to v5 has been such an improvement that it would just be a shame if it were to stumble a little on gravity updates. It's only meant in a constructive manner.

If it were entirely run in the background it would be fine, but if it runs after tweaks by the user in the UI etc., it could put them off a bit if it looks like it's just hanging.


No UI changes need gravity runs. See

@DL6ER your effort is much appreciated. I don't intend to offend any of the developers with my comments, just trying to prevent massive negative responses for this issue when the final v5 is released.

Thanks again to all the developers for your time and effort.

If you're going that way (download first, process later), could you please reinstate the code that uses the cached versions in case of download failure (part of the original question; see the first entry of the topic)?

I will install 2 new systems tomorrow, one with beta5 + database v10 and one with gravity_performance, using my adlist, regex and whitelist entries. This will take a while; I'll report back the processing times.

Good night from Europe (Belgium).


I don't feel offended at all. It will be interesting to see your comparison.

Yeah, sure, I'm currently working on the performance. The cached files are something I will look into as well at some point. But it is something different. The code to re-use the cached versions (if they are available) is still present. I suspect there is some misguided rm somewhere. Let's see tomorrow.

edit It turns out we keep them, but we moved them into the migration_backup directory. This is now fixed.
