Long-term statistics without individual query data

Hey,
In the original thread about long-term statistics (Feb 2017), DL6ER posted this:

Looking at the current implementation, this has changed (or was never implemented in the way described). I can query the database for a specific date and time, and it will show me complete individual queries with all information on query type, domain, device, status, etc.
Because I don't want every single DNS query in my network to be logged and saved (in cleartext) on my Pi, I decided to deactivate long-term logging with this parameter.
Nevertheless, I would like to have long-term statistics, just less specific ones. Is there an option for this? If not, would it be possible to implement this feature?


Yes, you are right. I said that because I was afraid that storing everything would use far too much space. However, after actually implementing and testing it, we saw that a standard home setup with a few fairly active clients results in databases smaller than 500 MB, and we decided that this should always be possible. The benefits are clear: you can compute any statistic for any time interval you want, probably even statistics we will only implement in the future. For example, you can now also (fairly) easily query which domain a single client requests most often, or whether a suspicious domain you saw today has already been queried in the past and, if so, whether that query was blocked. There are basically endless opportunities for inspecting your data retrospectively if you keep all logs, and since the size of the database seems manageable (systems with many clients should, on average, also have larger disks), we decided early on to follow this path.
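To illustrate the two example questions above, here is a minimal sketch using Python's built-in sqlite3 module. It assumes a simplified, hypothetical `queries` table with `timestamp`, `domain`, `client`, and a `blocked` flag; the real FTL database schema may differ, so treat the table layout, the helper names, and the demo rows as illustrative only.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a simplified, hypothetical query table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE queries ("
        " timestamp INTEGER,"  # Unix time of the query
        " domain TEXT,"
        " client TEXT,"
        " blocked INTEGER)"  # hypothetical flag: 1 if blocked, 0 otherwise
    )
    rows = [
        (1500000000, "example.com", "192.168.0.10", 0),
        (1500000100, "example.com", "192.168.0.10", 0),
        (1500000200, "ads.example.net", "192.168.0.10", 1),
        (1500000300, "suspicious.example.org", "192.168.0.20", 1),
    ]
    conn.executemany("INSERT INTO queries VALUES (?, ?, ?, ?)", rows)
    return conn


def top_domain_for_client(conn, client):
    """Return (domain, count) for the domain this client requested most often, or None."""
    return conn.execute(
        "SELECT domain, COUNT(*) AS n FROM queries WHERE client = ? "
        "GROUP BY domain ORDER BY n DESC LIMIT 1",
        (client,),
    ).fetchone()


def domain_history(conn, domain):
    """Return every past (timestamp, client, blocked) row for a domain, oldest first."""
    return conn.execute(
        "SELECT timestamp, client, blocked FROM queries "
        "WHERE domain = ? ORDER BY timestamp",
        (domain,),
    ).fetchall()


if __name__ == "__main__":
    conn = make_demo_db()
    print(top_domain_for_client(conn, "192.168.0.10"))  # ('example.com', 2)
    print(domain_history(conn, "suspicious.example.org"))
```

Both questions become a single SQL statement once every query is kept; with only aggregated counters, neither could be answered.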

Yes, we could add an additional table that just contains ever-growing counters for total and blocked queries. Although a bit boring (they cannot be filtered by time interval), it would be straightforward to do. I'm moving this to the Feature Requests section so people can vote for it.
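A counters-only table could look roughly like the sketch below. The table name `counters`, the row IDs, and the helper functions are hypothetical choices for illustration, not necessarily what FTL uses. Only running sums are stored, so no domain or client information ever reaches the disk.

```python
import sqlite3

# Hypothetical row IDs for the two counters.
TOTAL_QUERIES = 0
BLOCKED_QUERIES = 1


def init_counters(conn):
    """Create the counters table and seed both counters with 0 (safe to call repeatedly)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS counters "
        "(id INTEGER PRIMARY KEY, value INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO counters (id, value) VALUES (?, 0)",
        [(TOTAL_QUERIES,), (BLOCKED_QUERIES,)],
    )
    conn.commit()


def add_to_counters(conn, total, blocked):
    """Add the queries seen since the last save; individual queries are never stored."""
    conn.executemany(
        "UPDATE counters SET value = value + ? WHERE id = ?",
        [(total, TOTAL_QUERIES), (blocked, BLOCKED_QUERIES)],
    )
    conn.commit()


def read_counters(conn):
    """Return {counter_id: value} for all counters."""
    return dict(conn.execute("SELECT id, value FROM counters").fetchall())
```

For example, saving 120 queries (30 blocked) and later 80 queries (10 blocked) leaves the table at 200 total and 40 blocked, i.e. 20 % blocked. As noted above, the trade-off is that these numbers cannot be broken down by time interval.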

If I find some time over carnival, I might add this right away, as it shouldn't take much work.

P.S.: You should re-activate your long-term database (even if only for the most recent two days), as future versions of FTL will load historic data (the most recent 24 hours) from the database. This significantly increases FTL's startup speed, since we no longer have to analyze the log in so much detail each time but can use the already "digested" and properly analyzed data from the database. With this, the file pihole.log.1 also becomes obsolete.
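The startup idea above boils down to one time-bounded query against the database instead of re-parsing the log. The sketch below assumes the same kind of hypothetical `queries` table (with at least `timestamp`, `domain`, and `client` columns); the function name and column list are illustrative, not FTL's actual code.

```python
import sqlite3
import time

DAY_SECONDS = 24 * 60 * 60  # "the most recent 24 hours"


def load_recent_queries(conn, now=None):
    """Return (timestamp, domain, client) rows from the last 24 hours, oldest first.

    Assumes a hypothetical `queries` table; `now` can be passed in for testing.
    """
    if now is None:
        now = int(time.time())
    return conn.execute(
        "SELECT timestamp, domain, client FROM queries "
        "WHERE timestamp >= ? ORDER BY timestamp",
        (now - DAY_SECONDS,),
    ).fetchall()
```

Because the rows are already parsed, the daemon can rebuild its in-memory state from them directly, which is where the startup speed-up comes from.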


Done, but it won't make it into the (already internally reviewed and tested) v3.0 release.

Thank you for your work and your help!
The general all-time statistics are good.
Wouldn't it be possible to compute much more detailed statistics (at least the graphs; top sites could be a bit more complicated) by "compressing" the information in the logs down to what is really needed? With "client A made an A/AAAA request that was blocked/forwarded", basic statistics on 1) query type, 2) clients, and 3) the percentage of blocked queries would be possible. An even more minimal version could be: "from 16:50 to 17:00 Pi-hole was asked for 786 DNS records". In this case, you could filter specific time intervals and have at least some statistics without any sensitive information about the clients.
I know this is of course a lot more complicated than the counter you made. Maybe this would be more of a long-term milestone (if someone else also wants it...)?
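The "even more minimized version" can be sketched as bucketing queries into fixed 10-minute intervals and keeping only counts. This assumes the log has already been parsed into hypothetical `(timestamp, query_type, blocked)` records; domains and clients are simply never passed in, so the summary cannot leak them.

```python
from collections import Counter

BUCKET_SECONDS = 600  # 10-minute intervals, like "from 16:50 to 17:00"


def summarize(queries):
    """Aggregate (timestamp, query_type, blocked) records into per-interval counts.

    Returns {interval_start: {"total": int, "blocked": int, "types": Counter}}.
    No domain or client information is read or stored.
    """
    buckets = {}
    for timestamp, query_type, blocked in queries:
        # Round the Unix timestamp down to the start of its 10-minute interval.
        start = timestamp - timestamp % BUCKET_SECONDS
        bucket = buckets.setdefault(
            start, {"total": 0, "blocked": 0, "types": Counter()}
        )
        bucket["total"] += 1
        bucket["types"][query_type] += 1
        if blocked:
            bucket["blocked"] += 1
    return buckets
```

Unlike a single ever-growing counter, these buckets can still be filtered by time interval (sum the buckets between two timestamps), at the cost of a few small rows per interval.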

You can open a new topic in the Feature Requests area, people would then be able to vote for your request and help us judge which features should have a higher priority.

We have an (unofficial) long-term roadmap that already includes the idea of different "privacy levels". Much of the code is already written but not yet tested; it will appear at some point for public testing.

The basic idea is to give users more fine-grained control over how much detail they are comfortable with. The currently defined levels are:

  1. Permit everything (no privacy filtering)
  2. Obfuscate domain names (replace domains with "hidden"; Top Domains and Top Ads not available)
  3. Obfuscate domains and clients (replace domains with "hidden" and clients with 0.0.0.0; all Top Lists unavailable)
  4. Maximum privacy (Top Lists, Query Log, and database aren't updated with any data)
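The levels listed above could be applied to each query before it is stored or displayed, roughly as in this sketch. The `PrivacyLevel` enum, the `Query` record, and `apply_privacy` are hypothetical names for illustration; the numbering follows the list above rather than any actual configuration value.

```python
from dataclasses import dataclass, replace
from enum import IntEnum
from typing import Optional


class PrivacyLevel(IntEnum):
    SHOW_EVERYTHING = 1       # no privacy filtering
    HIDE_DOMAINS = 2          # domains replaced by "hidden"
    HIDE_DOMAINS_CLIENTS = 3  # domains -> "hidden", clients -> 0.0.0.0
    MAXIMUM = 4               # nothing is recorded at all


@dataclass(frozen=True)
class Query:
    timestamp: int
    domain: str
    client: str
    blocked: bool


def apply_privacy(query: Query, level: PrivacyLevel) -> Optional[Query]:
    """Return the query as it may be recorded at this level, or None if nothing may be recorded."""
    if level >= PrivacyLevel.MAXIMUM:
        return None
    if level >= PrivacyLevel.HIDE_DOMAINS:
        query = replace(query, domain="hidden")
    if level >= PrivacyLevel.HIDE_DOMAINS_CLIENTS:
        query = replace(query, client="0.0.0.0")
    return query
```

Because the levels are ordered, each higher level simply adds one more restriction on top of the previous one.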

Thank you for the clarification. That sounds good to me.

I will open a new FR about the privacy levels. Thank you.