Add more statistics on per-client basis

Dear Pi-hole team,

I would like to request more advanced statistic in particular on per-client basis as an improvement to the implemented long-term data. I could think of the following:

  1. Graphics of all queries over selected time period for a selected client (cumulated blocked/permitted)
  2. Top list over selected time period for a selected client
  3. Graphics of a singel selected domains over selected time period for a selected client

Thanks for considering these improvements.

I could see this being integrated into the already existing long-term pages, however, as a free-text input rather than a dropdown menu.

Free-text instead of pre-populated menus because I measured that both

sqlite3 /etc/pihole/pihole-FTL.db "SELECT DISTINCT(client) FROM queries;"

and

sqlite3 /etc/pihole/pihole-FTL.db "SELECT DISTINCT(domain) FROM queries;"

took rather long (almost two minutes each) before returning a result we could use to populate such lists (Raspberry Pi 3, one year of long-term database, roughly 400 MB in size). This just doesn’t seem acceptable.

Proposal (this is already working):

See

pihole checkout web new/more-long-term-filtering
1 Like

Understandable, two minutes are too long. It must scale exponentially, on my NanoPi Neo Plus2 with 150 MB pihole-FTL.db (2.3 million queries) it takes 3.5 sec to query 33 clients and 9-10 sec for 40.000 distinct domains.

For domains I’m with you - nobody wants a pre-populated menu with >40.000 entries. But with clients I’m not sure. Usually that wouldn’t be so many (arrgh… maybe with IPv6?). One quick and dirty approach could be to use the clients from network table (mmh…fails if table has been flushed). Or create an index on clients - initial building took 17 sec but later queries were down to 1.2 sec (not sure how index queries scale with CPU power and number of clients). Or normalize queries in regard to clients by creating a new clients table (which is almost done already in the network table).

Thanks for implementing this so fast. Will try it in the evening.

With an index, expected frequency of increased lookup speed for queries has to be weighed against increased insertion cost for every single added row.

In a usage scenario where you constantly insert data into a table in numbers and rarely look up data that would benefit from an index, creating an index may not always be favourable.

I wouldn’t know how Pi-hole’s database behaves in that respect, but would understand if developers decide against an index.

I’ve tested the branch and it is working. However I found two bugs and have a long wishlist.

Bugs:

  1. Clicking on ‘custom range’ opens the extended selection view but doesn’t automatically select ‘custom range’. One has the make at least one click in the date/time selector before it switches to ‘custom range’ on the left side. This behavior is the same across all pages with this selection box. Usually users will select an date/time but I saw that the box automatically selects the time one clicks on the box (say 20:03) but in custom range it rounds to full hour (8:00 pm). I wanted to get this full hour (8:00 pm) but therefor I had to select first a different time and than switch back to what I really wanted.

  2. Make the bars even in width. Maybe this is something by design I don’t understand but I get some really big bars next to some really slim ones for almost the some time periode.

Tweaks

  1. Put an ‘apply button’ outside of the date/time selector (same for long term query log). If I now enter a client/domain or select/deselect a query status I have to go back to the date/time selector to trigger the database query. Of course I could first enter a client/domain or select the query status and then select the date/time but it feels counter-intuitive to first select something that is below and than walk your way upwards.

  2. Create a cancel button. If on accident clicking “All time” pihole is busy until all data is collected. It is not even possible to switch to another page in the WEB GUI until all database action is performed.

Wishlist

  1. Allow entering hostnames instead/next of IPs
  2. Allow entering more than one client or domain
  3. Improve ‘linking’ the graph to the query log. Clicking on the graphic preserves the selected date/time rage but it would be nice the have the client/domain also preserved. For clients with known hostnames this will break until wish #1 is working (as IPs are automatically substituted by hostnames in query log).

See pi-hole#3110 for some investigations on this (link to my comment, but you should probably read the entire topic). So far, it doesn’t look like a worthwhile idea concerning the possibly substantial costs in creating these entries + the extra space they will constantly consume.
Users are obviously free to create the indices anything they feel they’d be advantageous for them. Pi-hole will maintain the indices automatically when they are present.

They look independent to my changes, do you also see them on the regular beta?

Probably yes. Do you run your Pi-hole 24/7 and do you always have devices connected to it making queries? I do see larger next to smaller bars, but in my case the Pi-hole is often powered off for a week when I’m traveling. The issues comes from SQL not being able to handle empty time intervals when aggregating IIRC. We might want to look at this again, however, I’m not convinced yet that we can find an equivalent SQL statement easily (the current one is heavily optimized to make the queries as fast as possible).

Concerning the other points, we’ll come back to them later. What you see here was just a first try done withing 10 minutes. This feature is clearly not something yet to be added to v5.0 or we will never release it. I’m more thinking about v5.1 for this.

After reading the topic I’m with you - speed and memory costs esp. on low-end devices are to high for a feature that most might never use. If one needs it often one can create an index - all others have to wait a few seconds longer.

Whats about

Will need more new code than an index but memory should not be a problem and retrieving clients list will be almost instant.


Yes I do. Will open an issue on the first one.

For the second bug: I’m running my pi 24/7 and some network stuff make a few queries even at night. I wrapped my head around the code in question and as far as I understood it it’s because of the SQL not returning “0” for times without queries (deeper: count() as aggregate function will never return “0” if there is nothing to return by an GROUP BY). But: Mcat12 has foreseen this and added a function to fill empty intervals with zeros. But somehow this function lost its vital part, because the allocation "set empty interval =0] is not there anymore.

I’ll go and open an issue on that as well. Edit found and added the (possible) solution you wrote.


No need to rush. I’m happy that you picked it up at all.

Btw: Thanks for adding client/domain filtering for Top Lists