Since I felt that a 2D index would make postponing index creation more favourable, I decided to showcase what could theoretically be gained by switching from an automatic primary key index to a manually created composite index.
Edit: Note that this is a synthetic, isolated showcase, not a full-blown performance evaluation.
The latter would take more careful test case rigging, aligned better with your target design (I deliberately eliminated primary keys, totally ignoring whether that would meet your needs); runs on a range of more relevant hardware; consideration of more side effects and inter-table issues (which I have eliminated completely); better accounting for the execution environment (especially the runtime behaviour of SQLite3 over time); deeper coverage of its additional performance-relevant parameters (of which, frankly, I am totally oblivious for SQLite); and above all, enough iterations to raise the lot size to minimal statistical significance, or even a desired level of confidence.
As such, take the resulting numbers as an indication only, not as hard evidence.
I first set up a Beta 5.0 Pi-hole on a separate machine.
I also added a few blocklists to get a decent size, using Wally3K's non-crossed lists (currently 62 entries strong), resulting in just under 1.7 million blocked domains.
I then extracted the DDL for the gravity table from Pi-hole's database:
```
sqlite3 /etc/pihole/gravity.db ".schema gravity"
CREATE TABLE gravity
(
    domain TEXT NOT NULL,
    adlist_id INTEGER NOT NULL REFERENCES adlist (id),
    PRIMARY KEY(domain, adlist_id)
);
```
For my isolated showcase, I decided to create two stripped down versions of the table:
- compgravity, defining a composite primary key, mimicking the actual table

```sql
CREATE TABLE compgravity
(
    domain TEXT NOT NULL,
    adlist_id INTEGER NOT NULL,
    PRIMARY KEY(domain, adlist_id)
);
```
- indxgravity, omitting a primary key in favour of manual index creation

```sql
CREATE TABLE indxgravity
(
    domain TEXT NOT NULL,
    adlist_id INTEGER NOT NULL
);
CREATE UNIQUE INDEX idx_indxgrav_domain_adlist ON indxgravity (domain, adlist_id);
```
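As a sanity check that nothing is lost by dropping the primary key, the unique index rejects duplicate (domain, adlist_id) pairs just as the composite primary key would. A minimal sketch on a throwaway database (paths and rows are made up for illustration):

```shell
#!/bin/sh
# Throwaway database to show that the UNIQUE index enforces the same
# constraint as the composite primary key would.
db=$(mktemp /tmp/indxgravity-demo.XXXXXX)
sqlite3 "$db" "CREATE TABLE indxgravity (domain TEXT NOT NULL, adlist_id INTEGER NOT NULL);
CREATE UNIQUE INDEX idx_indxgrav_domain_adlist ON indxgravity (domain, adlist_id);
INSERT INTO indxgravity VALUES ('ads.example.com', 1);"
# the second, identical insert must fail with a UNIQUE constraint violation
dup=$(sqlite3 "$db" "INSERT INTO indxgravity VALUES ('ads.example.com', 1);" 2>&1 || true)
echo "$dup"
rm -f "$db"
```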
Next, I generated a `gravimport` flat file for bulk loading by a full export of Pi-hole's gravity table:
```sql
.mode csv
.once /etc/pihole/gravimport
SELECT * FROM gravity;
```
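The `.mode csv`/`.once` mechanics can be sketched on a throwaway database (the table contents and file paths here are made up; on the real system the script above would be fed to sqlite3 against /etc/pihole/gravity.db):

```shell
#!/bin/sh
# Throwaway database demonstrating the CSV export: .mode csv switches the
# output format, .once redirects only the next query's output to a file.
db=$(mktemp /tmp/gravity-demo.XXXXXX)
out=$(mktemp /tmp/gravimport-demo.XXXXXX)
sqlite3 "$db" "CREATE TABLE gravity (domain TEXT NOT NULL, adlist_id INTEGER NOT NULL);
INSERT INTO gravity VALUES ('ads.example.com', 1), ('track.example.net', 2);"
sqlite3 "$db" <<EOF
.mode csv
.once $out
SELECT * FROM gravity;
EOF
csv=$(cat "$out")
echo "$csv"
rm -f "$db" "$out"
```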
Finally, I prepared a bulk load `.import` script for each table with the following snippets:
- comp-import for compgravity
```sql
DELETE FROM compgravity;
.mode csv
.import gravimport compgravity
SELECT count(*) FROM compgravity;
```
- indx-import for indxgravity
```sql
DROP TABLE indxgravity;
CREATE TABLE indxgravity (domain TEXT NOT NULL, adlist_id INTEGER NOT NULL);
.mode csv
.import gravimport indxgravity
CREATE UNIQUE INDEX idx_indxgrav_domain_adlist ON indxgravity (domain, adlist_id);
SELECT count(*) FROM indxgravity;
```
I then measured the time it took to bulk load my 1.7m blocked domains into each table respectively:
- into compgravity
```
time ( sudo -u pihole sqlite3 /etc/pihole/gravity.db < comp-import )
1,692,085

real    3m43.597s
user    2m42.503s
sys     0m15.124s
```
- into indxgravity
```
time ( sudo -u pihole sqlite3 /etc/pihole/gravity.db < indx-import )
1,692,085

real    1m48.947s
user    1m37.152s
sys     0m5.072s
```
As both tables were empty on that initial run, I repeated the same commands a second time.
I also ran a VACUUM before each run.
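The effect of that VACUUM can be sketched on a throwaway database: after bulk deletes, VACUUM rebuilds the file and returns free pages to the OS (paths here are illustrative; on the real system it would target /etc/pihole/gravity.db):

```shell
#!/bin/sh
# Throwaway database: fill a table, delete everything, then show that
# VACUUM shrinks the database file by reclaiming the freed pages.
db=$(mktemp /tmp/vacuum-demo.XXXXXX)
sqlite3 "$db" "CREATE TABLE t (x INTEGER);
WITH RECURSIVE cnt(i) AS (VALUES(1) UNION ALL SELECT i+1 FROM cnt WHERE i < 10000)
INSERT INTO t SELECT i FROM cnt;
DELETE FROM t;"
size_before=$(wc -c < "$db")
sqlite3 "$db" "VACUUM;"
size_after=$(wc -c < "$db")
echo "$size_before -> $size_after"
rm -f "$db"
```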
The resulting times are as follows:
For those just looking at the table without reading:
times are purely for bulk loading gravity - they don't account for a complete blocklist update.
|      | initial comp | initial indx | decrease | repeated comp | repeated indx | decrease |
| ---- | ------------ | ------------ | -------- | ------------- | ------------- | -------- |
| real | 03:43.597    | 01:48.947    | -51.28%  | 04:16.282     | 02:17.995     | -46.16%  |
| user | 02:42.503    | 01:37.152    | -40.22%  | 02:42.670     | 01:38.488     | -39.46%  |
| sys  | 00:15.124    | 00:05.072    | -66.46%  | 00:21.126     | 00:10.318     | -51.16%  |
This shows that manually creating an index has some potential to decrease bulk loading times.
I ran my tests on a Raspberry Pi Zero W, which has a single-core CPU - the achievable gain might turn out to be smaller on multi-core machines.
As populating the gravity table is only part of the blacklist import process, you have to decide whether the potential decrease in time offsets the work needed to go with a manual composite index instead of a composite primary key.
Also, you are the only ones to know if and how the rest of your DB design would cope with this shift.
That said, I wouldn't expect a manual index to be significantly slower than the automatic index enforced by a composite primary key.
If needed, you could expand the table with an additional integer id column as an artificial primary key; that column would also auto-alias with the implicit rowid column (which wouldn't be possible with the composite key).
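That rowid aliasing can be sketched as follows (throwaway database, made-up rows; the table layout just mirrors the showcase):

```shell
#!/bin/sh
# Throwaway database: an "id INTEGER PRIMARY KEY" column is an alias for
# SQLite's implicit rowid, so both return the same value.
db=$(mktemp /tmp/rowid-demo.XXXXXX)
res=$(sqlite3 "$db" "CREATE TABLE indxgravity (id INTEGER PRIMARY KEY, domain TEXT NOT NULL, adlist_id INTEGER NOT NULL);
INSERT INTO indxgravity (domain, adlist_id) VALUES ('ads.example.com', 1);
SELECT id, rowid FROM indxgravity;")
echo "$res"
rm -f "$db"
```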
Note that I would expect the performance gain to grow disproportionately with the number of blacklist entries, as insertion cost rises with each additional database record.
In other words, I would expect larger manually indexed bulk loads to be relatively faster when compared to composite primary key bulk loads, e.g. 55% for 2m blacklist entries as opposed to 46% for the showcased 1.7m, or a mere 25% for just 150k.
I haven't got the statistics to prove this, as I am shying away from the effort of preparing another scenario (this one has taken a few hours so far), and I forgot to take exact numbers on my first test runs with only 130k blocked domains (I recall 12 vs. 15 seconds, though).
I hope this helps somehow.