pihole -g / list download / disappointing performance

I'm not questioning this here; you cannot know what we did during the (quite extended) investigations that led to this massive change. That's why I'm just summarizing it. New ideas are never bad; however, if they have already been tried, we can skip them.

Yes and no: an in-memory database won't work. However, a duplicated on-disk database might work.
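A rough sketch of what "duplicated on-disk" could mean (the staging file name and one-column table are my inventions, not the real v5.0 layout): build everything in a scratch database, validate it, then hand the finished file over in one step.

sqlite3 /etc/pihole/gravity_staging.db <<'EOF'
CREATE TABLE gravity (domain TEXT NOT NULL);
.import /tmp/downloaded_domains.txt gravity
EOF
# once validated, swap the file in a single step; FTL would still
# need a reload afterwards (see the restartdns options further down)
mv /etc/pihole/gravity_staging.db /etc/pihole/gravity.db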

I do recall experiencing good results with this approach (with a different DB, Oracle, however).

Alternating tables (or whole schemas, if need be) would also allow for validation of the data before activating it, while leaving the system fully operational (well, depending on CPU load) for the duration of both import and validation.
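A minimal sketch of that alternating-table pattern, assuming a simplified one-column schema (the real one differs):

sqlite3 gravity.db <<'EOF'
CREATE TABLE gravity_new (domain TEXT NOT NULL);
.import /tmp/all_domains.txt gravity_new
-- validate here (e.g. inspect the count); the live gravity table
-- stays fully operational up to this point
SELECT COUNT(*) FROM gravity_new;
-- activation is then just two cheap renames
ALTER TABLE gravity RENAME TO gravity_old;
ALTER TABLE gravity_new RENAME TO gravity;
DROP TABLE gravity_old;
EOF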

Would .importing into an existing table vs. .importing into a newly created one have any performance impact?

EDIT:
I am not sure whether any of the following applies to SQLite or Pi-hole, or whether @DL6ER has already employed equivalent tactics - but for what it's worth, take what is useful from my crumbled memories and just discard the rest:

I struggle to remember exactly how we tackled a similar problem over a decade ago (importing several hundred thousand structured records from a bunch of flat files into several database tables on a daily schedule).

I think we switched off indexes when merging imports into existing tables, re-enabling them only after load completion, and postponed index creation until after bulk loading into newly created tables. This yielded the most dramatic gain, almost an order of magnitude, even more so on tables with several indexes. Switching all indexes to upper case exclusively also had some impact on the larger merge tables, if minor.
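Translated into SQLite terms (we were on Oracle, and the index name here is hypothetical), the pattern was:

sqlite3 gravity.db <<'EOF'
-- drop (or postpone creating) indexes before the bulk load
DROP INDEX IF EXISTS idx_gravity_domain;
.import /tmp/all_domains.txt gravity
-- rebuild once after the load: one big sort instead of millions
-- of incremental B-tree insertions
CREATE INDEX idx_gravity_domain ON gravity (domain);
EOF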

As for the tables we chose to merge, diff'ing a file against its direct predecessor and working on that diff result proved faster than any in-process evaluation we tried.
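Roughly what that looked like, sketched with standard tools (file names invented; comm requires sorted input):

sort yesterday.list -o yesterday.sorted
sort today.list -o today.sorted
comm -13 yesterday.sorted today.sorted > additions.txt   # only in today's file
comm -23 yesterday.sorted today.sorted > removals.txt    # only in yesterday's file
# only additions.txt and removals.txt are then applied to the tables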

Doesn't pihole -g run every time you add a blocklist and click 'Save and Update'? There was already a topic about not being able to upload multiple blocklists using the web interface. Hit the 'Save and Update' button by accident instead of the 'Save' button and your day is over...

How often do you change things? It seems like a one-time cost for setting things up. Amortize the pain.

I hear the issue, I'm just trying to see what the overall impact is.

And if that takes out your entire day, well...

Doesn't work (failed to upload); you could find the log here (I have removed the link to the log, as there is a solution, see the edit below).

edit
Solved the debug upload problem; the entries have been split off into a new topic. It is a firewall problem: port 9998 needs to be opened to bothouse.pi-hole.net.
/edit
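For anyone hitting the same failure, a quick way to verify the firewall rule (assuming netcat is installed; host and port as above):

nc -zv bothouse.pi-hole.net 9998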

Here is another indicator of a problem with your install.

Edit - potential problem.

I, and many system administrators, have come to the conclusion that upgrades are to be avoided, if possible.
So whenever there is a new version of:

  • Raspbian
  • unbound
  • pihole

I prefer to set up a new system, as opposed to upgrading. With Pi-hole 4, this took me about 40 minutes, including adding my whitelist entries, regex entries, and adlist entries, plus pihole -g. Since it is all scripted, it isn't that hard; I can do it over lunch. If I have to wait 1.5 hours for pihole -g to complete its first run, I will be tempted to skip checking the result (i.e. whether it all works), and Pi-hole will get a bad name...

I know it's my choice to set up from scratch over and over again, but I'm sure I'm not the only one who does this. The big advantage is that you have an SD card with a fully operational system to go back to, in case of problems (they did happen in the past...).

That may be so, but I've been active on this forum for a long time, and I've seen a lot of topics that report a failure to upload the debug log. I've never seen the requirements needed to be able to upload, nor a solution for the failure to do so, apart from simply copy-pasting the log into the topic. Furthermore, I don't see the connection between being unable to upload the log and this performance issue.

77 lists, less than 2 minutes for me.

You are not running the distributed version of the v5.0 beta, so we're comparing your local code to the released code. The common frame of reference should be a stock install of v5.0 as distributed.

*** [ DIAGNOSING ]: Core version
[i] Core: v4.3.2 (https://discourse.pi-hole.net/t/how-do-i-update-pi-hole/249)
[i] Branch: release/v5.0 (https://discourse.pi-hole.net/t/the-pihole-command-with-examples/738#checkout)
[i] Commit: v4.3.2-386-g29f06a4-dirty
[i] Status:  M advanced/Scripts/database_migration/gravity-db.sh
            ?? advanced/Scripts/database_migration/gravity/9_to_10.sql
[i] Diff: diff --git a/advanced/Scripts/database_migration/gravity-db.sh b/advanced/Scripts/database_migration/gravity-db.sh
          old mode 100644
          new mode 100755
          index 184b3a4..5fe8c93
          --- a/advanced/Scripts/database_migration/gravity-db.sh
          +++ b/advanced/Scripts/database_migration/gravity-db.sh
          @@ -87,4 +87,11 @@ upgrade_gravityDB(){
           		sqlite3 "${database}" < "${scriptPath}/8_to_9.sql"
           		version=9
           	fi
          +		if [[ "$version" == "8" ]]; then
          +		# This migration fixes some issues that were introduced
          +		# in the previous migration script.
          +		echo -e "  ${INFO} Upgrading gravity database from version 9 to 10"
          +		sqlite3 "${database}" < "${scriptPath}/9_to_10.sql"
          +		version=10
          +	fi
           }

If you are having issues connecting to the log uploader, then there may be other network issues that are causing list downloads to be slow. While there may be some ways to increase gravity performance (DL is literally a genius), we should look at the whole picture.

No. FTL v5.0 does not need any gravity runs for any list manipulations. We just send a suitable signal to pihole-FTL (automatically done for you) and FTL will find out for itself what changed.

pihole -g is really run only once weekly (unless invoked manually).


While we're at it, I should mention that FTL can be informed about list changes in three ways:

  • pihole restartdns: Full restart of pihole-FTL (slowest)
  • pihole restartdns reload: Looks for updates to the lists, flushes dnsmasq's DNS cache (faster than with v4.x)
  • pihole restartdns reload-lists: Looks for updates to the lists WITHOUT flushing dnsmasq's DNS cache (new, basically instant)

As I indicated in this topic to @DL6ER, I've added the proposed database upgrade to version 10 to my system. Somebody has to test these things; I was expecting this would be appreciated...

Testing is appreciated, but for troubleshooting a v5.0 beta issue it would be best to be running the distributed software, to eliminate a potential variable.

The list downloads work fine; the processing message appears almost immediately. Furthermore, if it were a network problem, this would also be the case in Pi-hole v4.

edit
As you can see in the debug log, I'm also using local lists (file:///home/pi/BL/tracker/domains). Processing is also slow for this list.
/edit

I know you guys keep repeating that 30 seconds is fine for the standard lists, but the reality is that a large percentage of users have millions of domains spread across many blocklists. This is of course a matter of preference; I prefer a smaller list personally. I am not criticising either, as I know you can't please everybody.

The problem you're going to have is that people are used to the original pihole -g speed, before the added complexities of adlist IDs etc. It will be seen as a step backwards, when in actual fact we know it greatly improves functionality in many ways.

Might it be an idea to focus more on 'sync' than on truncating the table and starting afresh? This might also solve the issue of lists not being accessible during the gravity run.

I mainly work in Python now, so I'm not too familiar with the limitations of bash, but surely, for each URL provider, you could instead make comparisons and generate an array of additions/removals and then make only the necessary changes? Say, inserting a couple of thousand URLs instead of millions every time gravity is run? The time you'd lose in the extra checks would easily be gained back in DB updates?
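For illustration, applying such a delta could look like this in SQLite (table and file names are inventions, not actual Pi-hole code):

sqlite3 gravity.db <<'EOF'
.import additions.txt gravity
CREATE TEMP TABLE removals (domain TEXT);
.import removals.txt removals
DELETE FROM gravity WHERE domain IN (SELECT domain FROM removals);
EOF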

This is a fair assumption, ref. this topic.

Using only the default blocklists feels like driving a sports car round the block in a 20 mph zone and convincing yourself it's a powerful car.

The default blocklists do NOT take regional domains into account; one must add at least some regional lists.

edit
A long time ago, I wrote a topic about testing the value of new lists. I still use this on Pi-hole v4; it works, and the result is useful for deciding whether or not to add the new list.
So whenever I add a list, I verify whether adding it is appropriate (low new-entries value -> don't add)...
/edit
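The gist of that test, sketched against the v5.0 gravity database (paths are invented): count how many domains a candidate list would add on top of what is already blocked.

sort -u candidate.list > candidate.sorted
sqlite3 /etc/pihole/gravity.db "SELECT domain FROM gravity;" | sort -u > current.sorted
# number of net-new domains the candidate list would contribute
grep -cvxFf current.sorted candidate.sorted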

No. Mass-injections are very fast. Looking up each domain and then going into an if is very slow. This is a severe limitation of bash, and it is similar in Python. One of the first things you learn in Python: avoid long loops! The second thing you learn: avoid long loops with many branches inside.
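To illustrate the difference with a deliberately simplified schema (names are mine):

# slow: a long loop, one sqlite3 process and one INSERT per domain
while read -r domain; do
  sqlite3 gravity.db "INSERT INTO gravity (domain) VALUES ('${domain}');"
done < domains.txt

# fast: one process, one mass-injection
sqlite3 gravity.db ".import domains.txt gravity"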

I'm working really hard on this topic; I'm not taking anything lightly here. When I'm not responding quickly, it's because I'm deep inside the code, all mental capacity bound.

My first iteration of performance improvements is done; I will look at this again tomorrow. It's already late over here in Europe.

See for yourself:

Switched to branch 'release/v5.0'

pi@munichpi:/etc/.pihole# time pihole -g
  [i] Neutrino emissions detected...
  [✓] Pulling blocklist source list into range

  [✓] Flushing gravity table
  [i] Target: https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts
  [✓] Status: Retrieval successful
  [✓] Adding adlist with ID 1 to database table

  [i] Target: https://mirror1.malwaredomains.com/files/justdomains
  [✓] Status: Retrieval successful
  [✓] Adding adlist with ID 2 to database table

  [i] Target: http://sysctl.org/cameleon/hosts
  [✓] Status: Retrieval successful
  [✓] Adding adlist with ID 3 to database table

  [i] Target: https://zeustracker.abuse.ch/blocklist.php?download=domainblocklist
  [✓] Status: Retrieval successful
  [✓] Adding adlist with ID 4 to database table

  [i] Target: https://s3.amazonaws.com/lists.disconnect.me/simple_tracking.txt
  [✓] Status: Retrieval successful
  [✓] Adding adlist with ID 5 to database table

  [i] Target: https://s3.amazonaws.com/lists.disconnect.me/simple_ad.txt
  [✓] Status: Retrieval successful
  [✓] Adding adlist with ID 6 to database table

  [i] Target: https://hosts-file.net/ad_servers.txt
  [✓] Status: Retrieval successful
  [✓] Adding adlist with ID 7 to database table

  [i] Number of gravity domains: 146591 (124042 unique domains)
  [i] Number of exact blacklisted domains: 0
  [i] Number of regex blacklist filters: 2
  [i] Number of exact whitelisted domains: 0
  [i] Number of regex whitelist filters: 0
  [✓] Cleaning up stray matter

  [✓] Flushing DNS cache
  [✓] DNS service is running
  [✓] Pi-hole blocking is Enabled

real    0m49,244s
user    0m10,144s
sys     0m2,527s

versus

Switched to branch 'tweak/gravity_performance'

pi@munichpi:/etc/.pihole# time pihole -g
  [i] Neutrino emissions detected...           
  [✓] Pulling blocklist source list into range                             
                                                                                 
  [✓] Preparing new gravity table                   
  [i] Target: https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts
  [✓] Status: Retrieval successful     
                                                                              
  [i] Target: https://mirror1.malwaredomains.com/files/justdomains  
  [✓] Status: Retrieval successful                                      
                                                                         
  [i] Target: http://sysctl.org/cameleon/hosts                          
  [✓] Status: Retrieval successful                                          
                                               
  [i] Target: https://zeustracker.abuse.ch/blocklist.php?download=domainblocklist                                             
  [✓] Status: Retrieval successful                                                               
                                                                       
  [i] Target: https://s3.amazonaws.com/lists.disconnect.me/simple_tracking.txt
  [✓] Status: Retrieval successful                           
                                                                
  [i] Target: https://s3.amazonaws.com/lists.disconnect.me/simple_ad.txt
  [✓] Status: Retrieval successful                                                                          
                                            
  [i] Target: https://hosts-file.net/ad_servers.txt
  [✓] Status: Retrieval successful    
                                                                
  [✓] Storing downloaded domains in new gravity table
  [✓] Activating new gravity table                                       
  [i] Number of gravity domains: 146591 (124042 unique domains)
  [i] Number of exact blacklisted domains: 0                                 
  [i] Number of regex blacklist filters: 2                                               
  [i] Number of exact whitelisted domains: 0            
  [i] Number of regex whitelist filters: 0          
  [✓] Cleaning up stray matter                                          

  [✓] Flushing DNS cache
  [✓] DNS service is running
  [✓] Pi-hole blocking is Enabled

real    0m33,544s
user    0m9,753s
sys     0m2,026s

This is on an RPi 3B with the stock lists. You can try it yourself with

pihole checkout core tweak/gravity_performance

However, note that there are no guarantees that everything works as expected - although it should.
This also already features the idea of populating an alternative table called gravity_new and only swapping the two at the end. I haven't had the time to check whether this leaves all triggers and views intact; however, I hope so.
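For anyone who wants to verify this themselves, sqlite_master shows which triggers and views survived the swap:

sqlite3 /etc/pihole/gravity.db \
  "SELECT type, name, tbl_name FROM sqlite_master WHERE type IN ('trigger','view');"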
