Pihole -g / List download / disappointing performance

I isolated this by analyzing what the SQLite3 bytecode engine does when we DROP the table. I will not go into all the details (it actually generates quite complex code, 62 instructions), but it boils down to a long loop that has to walk through the database file and erase every record belonging to the table before the table itself can be dropped.
Uncircumventable.
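
For anyone who wants to reproduce this observation: the bytecode program SQLite generates for a statement can be inspected with EXPLAIN, which compiles the statement without executing it (database path and table name as used elsewhere in this thread):

sudo -u pihole sqlite3 /etc/pihole/gravity.db "EXPLAIN DROP TABLE gravity;"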

The only remaining questions I have:

  • For how long, and during which steps, does pihole-FTL not have access to the data?
  • Would it make sense to remove gravity_old as the last step in the pihole -g process? Everything would then be up and running long before the script has completed.

I also noticed pihole-FTL did not start fast enough according to pihole -g: a retry counter (120 seconds) ran on my screen. This appears to be a little excessive; I checked the status in another PuTTY session, and pihole-FTL was up and running long before the timer expired. Multiple short (30 second) retry test loops instead?

That's the max time to wait, the check is once per second.

I don't want to argue here, but I specifically noticed, using sudo service pihole-FTL status, that pihole-FTL was reporting a proper (green) state after less than 60 seconds; the counter nevertheless ran down to 0.
According to the code (if I'm smart enough to understand it - NOT), this would mean DNS resolution was unavailable for 120 seconds. Correct?

Green status where? On the service output?

That just says the daemon is up and running. Doesn't measure anything about being able to resolve.

getent hosts "${lookupDomain}", which translates to getent hosts pi.hole, is the test. The timer will continue until that condition becomes true, or until it reaches 0, at which point the assumption is that things are broken beyond our capability to repair.
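
A rough sketch of that wait logic as described above (this is a paraphrase for illustration, not the actual gravity.sh code; the variable name lookupDomain is taken from the snippet quoted above):

timeout=120
until getent hosts "${lookupDomain}" > /dev/null; do
    # poll once per second; give up once the counter reaches 0
    sleep 1
    timeout=$((timeout - 1))
    [ "${timeout}" -le 0 ] && break
done

So the 120 seconds is only an upper bound; the loop exits as soon as the lookup succeeds.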

Ah, if there is no locallist file then "raw.githubusercontent.com" is used. Are you blocking that?

It doesn't have access to the data between the time step 3 starts and step 5 ends (where 3-4 is basically not measurable and only 5 counts here). sqlite3 blocks all access to the database during this time.

Some pihole-FTL.log excerpt:

[2020-01-23 20:56:34.257 2746] **** new UDP query[A] "doubleclick.net" from 127.0.0.1 (ID 3242, FTL 3420, src/dnsmasq/forward.c:1571)                                                                                             
[2020-01-23 20:56:34.257 2746] doubleclick.net is not known                                                                           
[2020-01-23 20:56:34.258 2746] domain_in_list("doubleclick.net"): Database is busy, assuming domain is NOT on list                                
[2020-01-23 20:56:34.258 2746] domain_in_list("doubleclick.net"): Database is busy, assuming domain is NOT on list                     
[2020-01-23 20:56:34.258 2746] domain_in_list("doubleclick.net"): Database is busy, assuming domain is NOT on list                         
[2020-01-23 20:56:34.258 2746] **** forwarded doubleclick.net to 127.0.0.1 (ID 3242, src/dnsmasq/forward.c:558)
[2020-01-23 20:56:34.301 2746] doubleclick.net is known as not to be blocked
[2020-01-23 20:56:34.302 2746] **** got reply doubleclick.net is 172.217.20.238 (ID 3242, src/dnsmasq/cache.c:487)

I just tried it; it makes no difference. It just moves the point in time at which gravity cannot be read to later in the run. Access to the database is not available during the DROP command, not even read-only.

edit: Next iteration would be looking at two separate database files...

No, not blocking

 nslookup raw.githubusercontent.com
Server:         127.0.0.1
Address:        127.0.0.1#53

Non-authoritative answer:
raw.githubusercontent.com       canonical name = github.map.fastly.net.
Name:   github.map.fastly.net
Address: 151.101.36.133

I have a local list (/etc/pihole/local.list), content:

192.168.2.57 raspberrypi
2a02:1810:4d01:1902:4e0a:8fe6:95b0:e91a raspberrypi
192.168.2.57 pi.hole
2a02:1810:4d01:1902:4e0a:8fe6:95b0:e91a pi.hole

The installer always chooses the wrong IPv6 address, and there is no option to change this. I always change it later (in my post-install).

When I enter getent hosts pi.hole, I get:

getent hosts pi.hole
2a02:1810:4d01:1902:4e0a:8fe6:95b0:e91a pi.hole

Is it possible to let getent look for an IPv4 address, e.g. by using ahosts (feature request)? See the note after the output below.

getent ahosts pi.hole
2a02:1810:4d01:1902:4e0a:8fe6:95b0:e91a STREAM pi.hole
2a02:1810:4d01:1902:4e0a:8fe6:95b0:e91a DGRAM
2a02:1810:4d01:1902:4e0a:8fe6:95b0:e91a RAW
192.168.2.57    STREAM
192.168.2.57    DGRAM
192.168.2.57    RAW
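
(Side note: if an IPv4-only answer is what this feature request is after, glibc's getent also offers address-family-specific databases, ahostsv4 and ahostsv6, which restrict the lookup to one family; whether Pi-hole's scripts could rely on them everywhere is another question.)

getent ahostsv4 pi.hole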

a complete duplicate of gravity.db?

Why? All we care about is that there is a response; we don't check the value of the response.

Yes. We can completely independently build a second file. We can copy all the lists from left-to-right very efficiently by ATTACHing and directly copying them. The memory requirement will be the same as with a second table in the same file (2 x 200MB).

The benefit will be that we can immediately swap the two files. The old one can simply be removed using rm. There would be basically zero downtime for FTL with this approach.
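
A minimal sketch of that approach, assuming the new file already carries the schema and using the adlist table as an example (the gravity_temp.db name is taken from later in this thread; the real gravity.sh may differ):

sudo -u pihole sqlite3 /etc/pihole/gravity_temp.db "
  ATTACH DATABASE '/etc/pihole/gravity.db' AS old;
  INSERT INTO adlist SELECT * FROM old.adlist;
  DETACH DATABASE old;
"
# once the new database is fully built, the swap is a simple rename,
# which also gets rid of the old file in one step
sudo mv /etc/pihole/gravity_temp.db /etc/pihole/gravity.db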

I did run the same speed test on the newest build of the tweak gravity branch, same blocklists, same Zero W with a 32 GB Class 10 SanDisk Ultra uSD card.

  [i] Number of gravity domains: 3106983 (2137836 unique domains)
  [i] Number of exact blacklisted domains: 10
  [i] Number of regex blacklist filters: 29
  [i] Number of exact whitelisted domains: 138
  [i] Number of regex whitelist filters: 0
 
real	22m2.758s
user	9m2.923s
sys	2m12.282s

For comparison, the same run yesterday on the dev branch (without the gravity tweaks), took about 4 minutes longer.

  [i] Number of gravity domains: 3100840 (2137802 unique domains)

real	25m56.066s

I implemented this. It saves me another 2 seconds as the swapping action is instantaneous now. There is zero downtime for FTL's gravity blocking now.

Since I felt that a 2D (composite) index would make postponing index creation more worthwhile, I decided to showcase what could theoretically be gained by switching from an automatic primary key index to a manual composite index.

Edit: Note that this is a synthetic, isolated showcase, not a full-blown performance evaluation.

The latter would take more careful test-case rigging: aligning the cases better with your target design (I deliberately eliminated primary keys, totally ignoring whether that would meet your needs), running on a range of more relevant hardware, taking more side effects and inter-table issues into consideration (which I have eliminated completely), accounting better for the execution environment (especially SQLite3's runtime behaviour over time), providing deeper coverage of its additional performance-relevant parameters (of which, frankly, I am totally oblivious for SQLite), and above all an excessive number of iterations to raise the sample size to minimal statistical significance, or even to a desired level of confidence.


As such, take the resulting numbers as an indication only, not as hard evidence.

I first helped myself to a Beta 5.0 Pi-hole on a separate machine.
I also added a few blocklists to get a decent size, using Wally3K's non-crossed lists, currently 62 entries strong, resulting in just under 1.7 million blocked domains.

I then extracted the DDL for the gravity table from Pi-hole's database:
sqlite3 ".schema gravity"
CREATE TABLE gravity
(
        domain TEXT NOT NULL,
        adlist_id INTEGER NOT NULL REFERENCES adlist (id),
        PRIMARY KEY(domain, adlist_id)
);

For my isolated showcase, I decided to create two stripped down versions of the table:
  1. compgravity, defining a composite primary key, mimicking the actual table
CREATE TABLE compgravity
(
        domain TEXT NOT NULL,
        adlist_id NOT NULL,
        PRIMARY KEY(domain, adlist_id)
);
  2. indxgravity - omitting a primary key in favour of manual index creation
CREATE TABLE indxgravity
(
        domain TEXT NOT NULL,
        adlist_id NOT NULL
);
CREATE UNIQUE INDEX idx_indxgrav_domain_adlist ON indxgravity (domain, adlist_id);

Next, I generated a `gravimport` flat file for bulk loading, by doing a full export of Pi-hole's gravity table:
.mode csv gravity
.once /etc/pihole/gravimport
select * from gravity;

Finally, I prepared a bulk load `.import` for each table with the following snippets:
  1. comp-import for compgravity
DELETE FROM compgravity;
.mode csv
.import gravimport compgravity
SELECT count(*) from compgravity;
  2. indx-import for indxgravity
DROP TABLE indxgravity;
CREATE TABLE indxgravity( domain TEXT NOT NULL, adlist_id NOT NULL );
.mode csv
.import gravimport indxgravity
CREATE UNIQUE INDEX idx_indxgrav_domain_adlist ON indxgravity (domain, adlist_id);
SELECT count(*) from indxgravity;

I then measured the time it took to bulk load those roughly 1.7m blocklist entries into each table respectively:
  1. into compgravity
time ( sudo -u pihole sqlite3 /etc/pihole/gravity.db < comp-import )
1.692.085

real    3m43.597s
user    2m42.503s
sys     0m15.124s
  2. into indxgravity
time ( sudo -u pihole sqlite3 /etc/pihole/gravity.db < indx-import )
1.692.085

real    1m48.947s
user    1m37.152s
sys     0m5.072s

As both tables were empty on that initial run, I repeated the same commands for a second time.
I also ran a VACUUM before each run.

The resulting times are as follows:

For those just looking at the table without reading:
Times are purely for bulk loading gravity; they don't account for a complete blocklist update.


|      | initial comp | initial indx | decrease | repeated comp | repeated indx | decrease |
|------|--------------|--------------|----------|---------------|---------------|----------|
| real | 03:43,597    | 01:48,947    | -51,28%  | 04:16,282     | 02:17,995     | -46,16%  |
| user | 02:42,503    | 01:37,152    | -40,22%  | 02:42,670     | 01:38,488     | -39,46%  |
| sys  | 00:15,124    | 00:05,072    | -66,46%  | 00:21,126     | 00:10,318     | -51,16%  |

This shows that manually creating an index has some potential to decrease bulk loading times.

I ran my tests on an RPi Zero W. This is a single-core CPU; the achievable gain might turn out to be smaller on multi-core systems.

As populating the gravity table is only part of the blacklist import process, you have to decide whether the potential decrease in time offsets the work needed to go with a manual composite index instead of a composite primary key.

Also, you are the only ones to know if and how the rest of your DB design would cope with this shift.

That said, I wouldn't expect a manual index to be significantly slower than the automatic index enforced by a composite primary key.

If needed, you could expand the table with an additional integer id column as an artificial primary key; that column would also alias the implicit rowid column (which wouldn't be possible for the composite key).
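
For illustration only (this is not the schema Pi-hole actually uses), such a variant could look roughly like this; in SQLite, a column declared INTEGER PRIMARY KEY becomes an alias for the rowid, so no separate index is built for it:

CREATE TABLE gravity
(
        id INTEGER PRIMARY KEY,
        domain TEXT NOT NULL,
        adlist_id INTEGER NOT NULL REFERENCES adlist (id)
);
CREATE UNIQUE INDEX idx_gravity_domain_adlist ON gravity (domain, adlist_id);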


Note that I would expect the performance gain to grow disproportionately with the number of blacklist entries, as the insertion cost rises with each additional database record.

In other words, I would expect larger manually indexed bulk loads to be relatively faster compared to composite primary key bulk loads, e.g. 55% for 2m blacklist entries as opposed to 46% for the showcased 1.7m, or a mere 25% for just 150k.

I haven't got the statistics to prove this, as I am shying away from the effort to prepare another scenario (has taken a few hours so far), and I forgot to take exact numbers on my first test runs with only 130k blocked domains (recall 12 vs 15 secs though).


I hope this helps somehow :wink:


Thanks, it definitely looks interesting. We're moving to only inserting into a fresh database now, as it has a number of benefits.

I wonder whether keeping the foreign key (REFERENCES ...) makes any difference. It is not something I'd like to give up, as it is a safety measure (you cannot delete adlist entries as long as there are gravity domains still pointing at them as their source).
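
For reference, the effect of that clause (assuming foreign key enforcement is switched on for the connection, and taking adlist id 1 as an arbitrary example) is roughly:

PRAGMA foreign_keys = ON;
-- fails with "FOREIGN KEY constraint failed" as long as any gravity row still references adlist id 1
DELETE FROM adlist WHERE id = 1;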

I did that just for the purpose of the showcase, as I didn't want to recreate all of your tables; keep those references in production. The only thing I've been suggesting is: replace the composite primary key constraint with a manual composite index on the exact same fields, and maybe throw in an additional artificial id key if needed :wink:

P.S.: I am surprised how well SQLite's .import command performs on its own. As I said, I recall an order of magnitude in performance gains when we did something similar on an Oracle DB, but that was more than a decade ago :wink:

I tried your suggestion and hit an issue with it: poor-quality adlists contain multiple entries for the same domain, so I had to relax the UNIQUE constraint (it doesn't matter all that much, TBH), as SQLite is no longer able to check this on-the-fly during the import.
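
If "relaxing" means dropping the UNIQUE qualifier from the manual index, the difference is simply (index name illustrative, not necessarily what the commit uses):

-- instead of: CREATE UNIQUE INDEX idx_gravity_domain_adlist ON gravity (domain, adlist_id);
CREATE INDEX idx_gravity_domain_adlist ON gravity (domain, adlist_id);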

I can see an improvement from (no statistics on those numbers)

  • 4 seconds for an existing 2D index realized through a PRIMARY KEY
    to
  • 2 + 1 = 3 seconds for importing + index creation in two steps

Commit pushed.

Currently, https://dbl.oisd.nl/ is unavailable, so the gravity count has severely decreased; comparing the results would NOT be meaningful. I've sent a PM to @sjhgvr (the owner of the blocklist?) to look into that.

I can confirm a second database is created (gravity_temp.db), showing something like this. But since not all lists are available, the count is off and the times should be ignored. I'll rerun the test as soon as the list is available again.

  [i] Storing downloaded domains in new gravity database...
real    0m31.184s
user    0m21.222s
sys     0m0.410s
  [✓] Storing downloaded domains in new gravity database
  [i] Building tree...
real    0m29.755s
user    0m11.662s
sys     0m1.114s
  [✓] Building tree
  [i] Swapping databases...
real    0m3.465s
user    0m0.023s
sys     0m0.020s
  [✓] Swapping databases
  [✓] Flushing DNS cache
  [i] Number of gravity domains: 2350500 (1588393 unique domains)
  [i] Number of exact blacklisted domains: 0
  [i] Number of regex blacklist filters: 20
  [i] Number of exact whitelisted domains: 32
  [i] Number of regex whitelist filters: 0
  [✓] Cleaning up stray matter

  [✓] DNS service is running
  [✓] Pi-hole blocking is Enabled

I checked the database content; all the data appears to have been copied over to the new database (I don't use all the tables yet...)

edit
NOT sure, reporting anyway.
Yesterday, all lists downloaded perfectly, so a cached version of the list was available.
I was running pihole -up this morning and noticed the message:

  [i] Target: https://dbl.oisd.nl
  [✗] Status: Forbidden
  [✗] List download failed: no cached list available

Is it possible there is still a problem with using cached lists, if available?
/edit