Long-Term statistics for Pi-hole


#21

Do you mean a “yesterday’s data” button that shows it instead of the current data? That might not be too complicated, but I’m not sure what that would bring us.

Here we have the problem that pihole.log.1 is not guaranteed to contain data that is contiguous with pihole.log. It should be, but it is not guaranteed. How do we deal with situations where there is a large gap between the two files? (I currently have such a situation: I moved pihole.log to another place via a symlink, and pihole.log.1 is about one week old.) Also, when you flush early (there should be no need to flush anything anymore, but who knows), the current data gets shifted into pihole.log.1. How do we deal with that? (I think we should remove the ability to flush the log altogether.)

FTL might also be able to read all .1, .2, etc. files when we turn off compression, and could periodically run some garbage collection that automatically removes data older than some time X (defaulting to 24h). This may help guarantee that you can always see a continuous 24h. But it would be a lot of work and add many more moving parts to the box.


#22

Yes, simply show the data and query log from yesterday. If your workday ends at 18h00, you currently have no clue what happened between 18h00 and midnight (unless you look at pihole.log.1 manually).

If a user willingly flushes his log, he should be aware the data isn’t going to be complete or accurate. My current system uptime is 36 days, and even the upgrade(s) to v2.13 and v2.13.1 did not affect the log(s). If a user willingly (re)installs the system, there simply won’t be a pihole.log.1, so don’t show the option.


#23

Such a check box would require a conceptual change to FTL's internal data structure, since the statistics (like overTime data, but also the total number of queries, etc.) are pre-calculated so that they can just be sent out whenever a request arrives. The main reason is that there will still be tons of slower devices (single-core ARMs like the Pi v1 or Zero) that would see significantly degraded performance if I had to compute the statistics on the fly.

In addition, this kind of check box would translate into something like "show content of pihole.log or content of pihole.log.1", which seems quite artificial to me and is likely to have to change again in the near future because different users want to see different results.

I’m thinking more in the direction of analyzing the multiple files that logrotate leaves behind and showing the data of the latest, e.g., 24 hours (i.e. no sharp cut at midnight, and maybe even an adjustable time interval). For that I’m thinking about something like this:

  1. Read all pihole.log.* log files that can be found in /var/log/
  2. Accept only those data that are within the set time interval (e.g. now minus 24 hours)
  3. Compute all the statistics
  4. Wait for clients to request something
  5. In some time interval, e.g. once per hour, throw away the data and go to point 1.

Doing it like this (instead of having to come up with some GC algorithm, which will always be error-prone) would be quite simple to implement and would be flexible enough to serve the needs of the majority of users. What do you think?
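A rough sketch of steps 1 to 3, written in Python purely for illustration (this is not FTL's actual code). It assumes dnsmasq-style query lines of the form "Mar  5 12:34:56 dnsmasq[123]: query[A] example.com from 10.0.0.2" and uncompressed rotated files; the names `parse_query`, `compute_stats` and `read_logs` are made up for this example.

```python
import glob
import re
from collections import Counter
from datetime import datetime, timedelta

# Assumed line format: "Mar  5 12:34:56 dnsmasq[123]: query[A] example.com from 10.0.0.2"
QUERY_RE = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) dnsmasq\[\d+\]: query\[\w+\] (\S+) from (\S+)"
)
STAMP_FORMAT = "%Y %b %d %H:%M:%S"


def parse_query(line, now):
    """Return (timestamp, domain, client) for a query line, else None."""
    m = QUERY_RE.match(line)
    if m is None:
        return None
    stamp = datetime.strptime(f"{now.year} {m.group(1)}", STAMP_FORMAT)
    if stamp > now:
        # The log carries no year, so a "future" stamp must be from last year.
        stamp = datetime.strptime(f"{now.year - 1} {m.group(1)}", STAMP_FORMAT)
    return stamp, m.group(2), m.group(3)


def read_logs(pattern="/var/log/pihole.log*"):
    """Step 1: yield lines from the current and all rotated (uncompressed) logs."""
    for path in sorted(glob.glob(pattern)):
        with open(path, errors="replace") as fh:
            yield from fh


def compute_stats(lines, now, window=timedelta(hours=24)):
    """Steps 2 and 3: keep only queries inside the window, then count them."""
    total = 0
    domains = Counter()
    clients = Counter()
    for line in lines:
        parsed = parse_query(line, now)
        if parsed is None:
            continue
        stamp, domain, client = parsed
        if now - window <= stamp <= now:
            total += 1
            domains[domain] += 1
            clients[client] += 1
    return {
        "total": total,
        "top_domains": domains.most_common(10),
        "top_clients": clients.most_common(10),
    }
```

Steps 4 and 5 would then amount to serving the returned dictionary to clients and calling `compute_stats(read_logs(), datetime.now())` again, e.g. once per hour, discarding the old result.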


#24

Just for the fun of it, I changed line 15 of /var/www/html/admin/scripts/pi-hole/php/data.php into:

$logListName = checkfile("/var/log/pihole.log.1");

and got everything that happened yesterday (both dashboard and querylog)

That’s all I need to know (yesterday).

It should be fairly easy to extract the first date entry from the log you’re parsing and show that as a reference, as opposed to a log name or “today” / “yesterday”.

I don’t like the idea of moving views. Suppose you find an entry (blocked or allowed) on page 15 of the query list. You get distracted (a phone call, mail, whatever). By the time you look back, the entry is no longer on page 15: an administrator’s nightmare. Looking at static logs (yesterday’s) implies you can find the entry all through the day, in the same place.

I don’t know what FTL is going to do, or how it will handle log entries, but the origin of the data will always be pihole.log (or wherever you let dnsmasq write to).
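To illustrate what I mean, here is a small sketch in Python (not the PHP the web interface uses). It assumes every log line starts with a syslog-style stamp such as "Mar  5 00:00:12" without a year, so the year has to be supplied by the caller; `first_timestamp` and `label_for` are names I made up.

```python
import re
from datetime import datetime

# Assumed: lines begin with a stamp like "Mar  5 00:00:12 " (no year in the log).
STAMP_RE = re.compile(r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) ")


def first_timestamp(lines, year):
    """Return the timestamp of the first line that carries one, else None."""
    for line in lines:
        m = STAMP_RE.match(line)
        if m:
            return datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
    return None


def label_for(lines, year):
    """Build a reference label to show instead of a log name or 'yesterday'."""
    stamp = first_timestamp(lines, year)
    if stamp is None:
        return "no data"
    return stamp.strftime("Data since %d %b %Y, %H:%M")
```

Passing an open file object as `lines` works too, since only the first stamped line is read.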


#25

You misunderstand what is happening; let me clarify. The query log page content is generated as a whole and then transferred into the cache of the requesting client (i.e. your browser). Hence, it will never change, whatever you do (except if you reload the entire page, of course). You can leave the query log page open for weeks or years and it will still show the very same data. This is true for the current version of the API (PHP based) and will still be true for the future version of the API (FTL based).

I’m thinking more about the statistics (incl. the over time data and the Top Lists) being a moving number (always showing the data within the most recent X hours).

One of the reasons for FTL’s speed is that it does not process the log data when it is asked for some quantity (like the total number of DNS queries), but is always running in the background and updating all the numbers live. Hence, I cannot simply change the source of the data, because that would require a complete reprocessing of the logs (which would still be fast, but would certainly add some hundred milliseconds to each request).
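The idea of live counters can be sketched like this, in Python and heavily simplified. This is not FTL's real data structure; the class name `LiveStats` and the 10-minute bucket size are assumptions for the example.

```python
from collections import Counter


class LiveStats:
    """Counters updated once per incoming log line, not once per request."""

    def __init__(self, bucket_seconds=600):
        self.bucket_seconds = bucket_seconds
        self.total = 0
        self.blocked = 0
        self.over_time = Counter()  # bucket start (epoch seconds) -> query count

    def on_query(self, epoch, blocked=False):
        """Called by the background log reader for every new query."""
        self.total += 1
        if blocked:
            self.blocked += 1
        self.over_time[epoch - epoch % self.bucket_seconds] += 1

    def summary(self):
        """Answer a request from the ready-made numbers; no log is touched."""
        percent = 100.0 * self.blocked / self.total if self.total else 0.0
        return {
            "total": self.total,
            "blocked": self.blocked,
            "percent_blocked": percent,
        }
```

Answering `summary()` costs the same whether the log holds a thousand lines or a million. Switching the source to another file, on the other hand, means replaying that whole file through `on_query` first.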


#26

You could just keep the data(base) from the previous day(s) and start the background collection again @ midnight in a new data set.

Static views are much more interesting. Some scenarios:

  • A “call home” is detected. Consulting the previous day(s) data will make it easy to determine if the call is at a random time or always at a set time.
  • Informing a co-worker of an interesting DNS entry (previous day). Since the data is static, you can just mention the day, page and line number.
  • Checking / evaluating the DNS calls. A static log is considered processed once the end is reached.

On every page, you just need to create a button “previous day” and “next day”, if appropriate.


#27

I see what you are aiming at. A database connection is in the making, but still out of scope for the soon™ to be released version of Pi-hole featuring the FTL engine. It is rather being worked on in the background (e.g. by @PromoFaux) and there are several ideas floating around, but we have not decided on what to do exactly.

The major reason for having FTL is that we want even low-performance devices like the first generation Raspberries or the Pi Zero to be usable without too much delay on the dashboard. I currently have a stress test running on a Pi B+ and it performs basically as if you had a log with fewer than 1000 entries with today’s API.

Assuming they are querying some.home.loc all the time, you might still use either the link on the Top Domains/Ads tables (if it shows up there, but it should if you notice it) or the search field on the query log. As the new way will always show (at least) 24 consecutive hours, your analysis capabilities are enhanced compared to the current version (Pi-hole v2.xx).

Instead of mentioning the page and line number, it seems to make much more sense to just provide either the link to a fixed time frame (click on the dashboard over time graph), provide a link from the Top Lists, or use the search function on the Query Log page. Providing a page / line on the full Query Log page is generally meaningless, since the log will continue to grow and the newest entries are added on page 1 (shifting everything to the back).


#28

Not sure of the current status of this, but buttons to choose 1 day, 3 days or 7 days of data would be cool. Or some other time increment w/o building giant log files.


#29

As far as I’ve heard so far, the long-term statistics front end is planned but …

Because

Looking through FTL's code, there is no obvious sign that there is active development going on in this direction. It looks like FTL has pretty much converged. It seems there are one or two sporadic crashes seen by a very limited subset of users. Since debugging them while only being able to communicate indirectly (via Discourse, GitHub or direct mails) is quite slow, this currently delays the advent of FTL.

However, you could still get some sort of long-term statistics, as has already been mentioned here: if you disable log rotation altogether, you will see your logs from this point on. But that might not be what you are looking for in the end…


#30

I confirm what @MrD said and have nothing to add.


#31

What does FTL mean?

Btw, for the graphs, has dygraphs been considered? For large datasets, it is the fastest data plotter out there (even NASA is using it).


#32

FTL - Faster Than Light

We are currently running into speed issues on the server side, not the client side, so I don’t think a different client graph renderer would be of much benefit (and it might require some rewriting to make it work).


#33

Hey, new to Pi-hole here and really loving it.
As far as I’ve seen, Pi-hole is made for a maximum of 24 hours of logging. I’ve tried to disable the daily flush in cron, but the results aren’t convincing. The graph for “Queries over time” isn’t accurate and shows (I guess) the data from the first day recorded. The “DNS Queries Blocked Today” and all the other values are counted up correctly.

I really like the Pi-hole project, but I’d like something that gives me better long-term feedback, like hourly, daily, weekly, monthly and even yearly feedback (incl. graph), as I’m used to from projects like Nagios. So far my query log grows by about 3MB per day, which isn’t too big. I think even a minimal Raspberry Pi 2/3 system could hold several weeks to months.

If this request is over the top for this tiny, damn cool and easy to use project, is there a better/more suitable solution for my wishes?


#34

You are too late, look here :wink:


#35

Well, actually, this has been requested way earlier as well :wink:

Well, yes and no. The current version of the Web Interface can reach its limits even within a single day. It all depends on your number of clients and their browsing activities.

However, with the upcoming FTL engine we are able to revisit this idea. I will open a new FR for this so we have a final (?) place to discuss it.


#36

I would like to see long-term statistics. I do like the simple and quick dashboard as it is, but can envision a new link to history/statistics. I also worry about performance (I run on a Pi Zero W) and ease of setup (I’m a Linux noob). I do have a 2nd Pi Zero W that I would be happy to test builds on (if the instructions can be dumbed down to Windows Server Admin level :smirk: )


#37

You could probably set up a script that writes the output of a daily API call to a CSV file.
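Something along these lines might do, as a minimal Python sketch. The endpoint `http://pi.hole/admin/api.php?summaryRaw` is an assumption and may differ on your install, and the function names `fetch_summary` and `append_row` are made up for this example.

```python
import csv
import json
import os
from datetime import date
from urllib.request import urlopen

# Assumed endpoint; adjust host and query string to match your install.
API_URL = "http://pi.hole/admin/api.php?summaryRaw"


def fetch_summary(url=API_URL):
    """Fetch the summary as a dict (assumes the API answers with a JSON object)."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def append_row(path, summary, day=None):
    """Append one dated row to the CSV, writing a header if the file is new."""
    row = {"date": (day or date.today()).isoformat(), **summary}
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(row))
        if is_new:
            writer.writeheader()
        writer.writerow(row)
```

A daily cron job could then call `append_row("/home/pi/pihole-stats.csv", fetch_summary())` (the path is just an example). Note that the header is written only once, so this relies on the API returning the same fields in the same order on every call.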


#38

BUUUUUTTT I’m LAZY…

And really, I’m a novice, so setting up scripts, doing outputs to CSVs and then graphing the data is well beyond my skills (at the moment).


#39

Proposed beta version of the long-term statistics has two functions (so far):


#40