The database is slow and unoptimized

First I want to thank the authors for this useful application. I use it both at work and at home and it is very useful.
However, I have noticed that getting some data, especially analytics, is extremely slow.
As far as I know the SQLite3 database can process a million data per second (for single query and single user, while its extremely slow for concurent clients), and while I'm writing this pi-hole is still running/spinning for over 10 minutes. I am sure that this data can be obtained in a few seconds, if the database and queries are optimized (if MySQL were the database, the query would take less than a second with proper indexes, structure and queries).
I tested the database in my web application (php and SQLite3) and loaded some analytics data from a pi-hole database with over a million records. I received detailed analytical data with few seconds (7-15 seconds and for me its still slow).
Therefore, slowness mostly occurs due to unoptimized queries and database work.
SQLite3 is good and fast for small and simple database and queries, but as soon as the database grows and requires a little more complex queries, the database is simply extremely slow, whether you like it or not and if multiple competing clients send queries to the database, it becomes almost unreadable due to slowness.

There is no point in waiting for half an hour for the analytical data to be displayed.
This is a minor constructive criticism. I hope you won't mind.

Thanks and all the best.

I find SQLite3 to be quite responsive with Pi-hole's databases.

Please share your SQL statement that you observe to take so long exactly as you ran it.

i did that over a year ago at work (will see if i can find the script).
However, I just looked in api_db.php and I noticed at first glance a piece of code (the query) that potentially slows down the database drastically. For instance (line 319):

    $sqlcommand = "
        SELECT
            (timestamp / :interval) * :interval AS interval,
            SUM(CASE
                WHEN status !=0 THEN 1
                ELSE 0
            END) AS domains,
            SUM(CASE
                WHEN status IN (1,4,5,6,7,8,9,10,11,15,16) THEN 1
                ELSE 0
            END) AS blocked
        FROM queries
        WHERE $limit
        GROUP BY interval
        ORDER BY interval";

SUM, COUNT and CASE will run very slowly on large databases unless the indexes are well defined.
Also, mathematical operations like (timestamp / :interval) * :interval should be avoided within the query because they slow down the query and the database a lot when it comes to larger databases.

For example, my pi-hole database is currently over 5GB and only this query above takes over 2 minutes (for a period of a month or longer). If I want to get all the data (interval all time) it will takes 5-10 minutes or more. This is also tested and executed just from command line (without web app).

Sometimes it is better to make several queries than to combine everything within one (it all depends on the structure of the database and the logic used).

There are still plenty of examples where queries are not optimized and slow down the database (specially getAllQueries request)

The snippet you've quoted is a template, not a specific SQL statement, and you are a bit vague on the parameters you've used ('for a period of a month or longer').
And of course, individually observed times would depend on hardware as well as database sizes.

When running this for a 600 seconds interval for the last 30 days, it takes just over a second on my ARMv7 machine, serving the db from an sdcard, with ~150,000 queries for that interval.

But as you refer to api_db.php, I wonder whether your observations would be referring to responsiveness of Pi-hole's UI, rather then raw SQL statements?

The major impact on times observed when using Pi-hole's web UI would be the web UI itself, not the database.

Note that Pi-hole v5 uses PHP for rendering the results, which adds its own overhead, especially since v5 does not support server-side pagination. The most recent 30 days take about 13 seconds on my machine, so a good dozen times longer than direct SQL.
For large result sets, that may even fail completely, if data would exceed PHPs memory limits.

This will change in the upcoming Pi-hole v6 (currently in beta), which will support server-side pagination and won't use PHP any more.

If you'd like to propose specific, concrete SQL improvements, you are most welcome to join the current Pi-hole V6 Beta Testing and contribute your improvements.

1 Like

Thanks for reply.

I wonder whether your observations would be referring to responsiveness of Pi-hole's UI, rather then raw SQL statements?

It was specifically SQL statements, because I tested directly via the command line (no interface).
There is not a single delay of responsiveness of Pi-hole's UI.
The hardware is a server computer with 48 CPUs... we have over 300 devices/users on the network at the same time using pi-hole.
I already mentioned above what the problem is with queries. And specifically this query (you wrote template) is used with parameters api_db.php?getGraphData from, until and interval
I think it's not a problem with php because php can handle much more, but the way it is used.
You can always execute queries from command line (without UI) and see that there's problem with queries and large database. If your query executes for more than few seconds. there's problem with queries, logic and optimizations (and maybe database structure), not programming/scripting language itself.
For instance, you mentioned server-side load. It could be solved for example with lazy load in php while user is scrolling. Or even use pagination logic. I think SQLite supports LIMIT offset, rows. But queries need to be optimized (do not use mathematical operations directly in query, index SUMs, add field for CASE instead of use in query, do not use subqueries SELECT specially not with CASE, try with LEFT JOIN, index data so you do not need to have additional query etc.). This will boost performance at least 20 times. So there's not a problem with php/UI but the way it's used.

I will check Pi-hole V6 Beta Testing as you mentioned.

There is just no way to have specific indices prepared for each and every possible query. This would quickly create such an overhead that updating the indices themselves will be more work summed together than can possibly be saved at query time.
When users want some specific magic to happen, they are completely free to add indices themselves and FTL/sqlite3 will automagically keep updating those for them as new rows are added.
But this also means that some queries simply require a full database scan and this will always be slow when your database is in the gigabytes range.

I totally agree that this shouldn't be the case for queries done by PI-hole itself and that the query you mentioned above is a prime example for slowness. I will not argue about this. However, this particular query is already gone in v6 because it was always slow and there was no real gain from these graphs.

This is not very relevant because SQLite3 itself is single-threaded. Pi-hole v6 has made some efforts to get the database engine multi-threading but this is severely limited and only applies when the work can evenly be partitioned for workers, check out SQLITE_DEFAULT_WORKER_THREADS setting the maximum number of auxiliary threads that a single prepared statement will launch to assist it with a query. If not specified, the default maximum is 0 which is what most (maybe all) operating system provided sqlite3 binaries will give you. Pi-hole sets this to which - theoretically - up to four-fold the speed of queries invoked using pihole-FTL sqlite3 instead of the system-provided sqlite3 binary.

If you find any queries in the v6.0 beta that can be improved, I'd absolutely love to see details. I can tell you that I do not love adding more indices to the database but if there is good justification to do so, we will surely consider this as well. Thanks for looking into this matter!

2 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.