Option to ignore domains from appearing in the Query Log

+1 to this request.

As a user
I would like to exclude specific domains from logging at all
So that I can enhance my privacy OR help to minimise the amount of logging for very noisy domains ensuring that my dashboard view is actually useful and not just a bunch of noise.

Ideally for me, this would just be a check box when creating allow or block list entries, where per entry you could choose whether logging takes place. Approached in the right way, wouldn't this actually reduce the overall load on the piHole? We are talking about removing the logging operation for most of the traffic in my house, and with it the requirement to keep, maintain and make those entries available in searches.

Specifically the top blocked domains would benefit the most, and this would be the primary information most users are keen to see. For me this is currently filled with device-metrics-us-2.amazon.com, audible.sc.omtrdc.net, app-measurement.com, c.amazon-adsystem.com, pagead2.googlesyndication.com, pubads.g.doubleclick.net, csi.gstatic.com, 2mdn.net, tlx.3lift.com.

Thank you to the piHole team for their work and dedication to helping us all live a more secure and less risky digital life. Hope this request raised by @bertoost can be considered at some point.

PS - I've been thinking about running two piHoles in series, using the first one to block without logging, and the second to operate as my normal piHole. This would achieve what is outlined above, but certainly not in the way that I'd like to approach it.

You have the option today to drop any or all of these from the top lists.

Thanks @jfb, that does cover some of the requirement for the dashboard. At some point a broader ability would be very welcome.

+1, would be useful for me too.

+1. I have a friend whose Roku devices make his Query Log largely unusable due to how noisy they are, an order of magnitude above the surrounding numbers. It would be useful to be able to not show selected domains in the Query Log. But I'm also mindful of @DL6ER's comment about how using filters, especially wildcards, across the databases will scale badly and impact performance, and his other comment about avoiding yet more filter lists.

Would it be possible to implement this just for rendering the static Query Log, and just up to 100 items?

  • Long-term query database logging – no change in behaviour
  • /var/log/pihole.log logging – no change in behaviour
  • API exporting – no change in behaviour
  • Query Log with show all (queries.php?all) – no change in behaviour
  • The entries in Settings > API / Web interface > Top Domains & Top Clients – continue to function as they do today, with a new additional interpretation below
  • Query Log with 10, 25, 50, 100, All selected – exclude unwanted domains when the page is manually loaded
  • Unwanted domains are the same ones listed in the aforementioned two sections

This would satisfy my friend's needs, since his disproportionately noisy devices would no longer dominate his Query Log when inspecting the current activity, and they already would be excluded from the Top lists in the Dashboard.

I suspect this is how most people would make use of being able to block noisy domains. It's not for system-wide filtering, just to silence awkward domains when viewing current activity. I am guessing that for most casual home users with the likes of Roku and Samsung devices, being able to see the 100-entry Query Log is good enough if they can do so without these devices showing.

Would this implementation allow domains to be excluded from the Query Log without the scaling, performance and management impacts?

It doesn't really feel like a Pi-hole base code issue, more like a web browser UI issue to me. I would build a web UI like the Pi-hole dashboard by connecting to a back-end database and using Ajax to feed data continuously to the webpage, while using the browser's JavaScript engine to filter the displayed items locally. This uses the local computing resources rather than the Pi-hole device and shouldn't impact DNS performance.

It's not that simple because the Query Log is not populated from the FTL database (except at startup). The FTL database is updated from the underlying Query Log source once per minute by default.

I've opened a little project here to try and create a colourised scrolling log with selected domains filtered out, so that the visual experience is similar to the Query Log, but it's coming from the dnsmasq live log and is not browser based at all.

If you filter locally, then when you select to display n items from the Query Log you will render fewer than n items. The Pi-hole UI isn't informed of local browser filtering, so the number of entries selected no longer matches the number of results shown.

Therefore I think any such filtering would a) have to be started and completed on a single instance by instance basis, b) populated from live log data, on the Pi-hole side, to reach the chosen number of selected entries for that instance, and c) then rendered in the UI for the browser to display post-processing.
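To make the difference concrete, here is a minimal sketch of the two orderings. The entry layout, the `.example` domains and both function names are made up for illustration and are not Pi-hole's actual data model or API.

```python
# Hypothetical log entries, newest first (not Pi-hole's real data model).
log = [
    {"domain": d}
    for d in [
        "noisy.example",
        "a.example",
        "noisy.example",
        "noisy.example",
        "b.example",
        "c.example",
    ]
]
excluded = {"noisy.example"}


def filter_after_fetch(entries, excluded_domains, n):
    """Browser-style: take the newest n entries first, then drop excluded ones."""
    return [q for q in entries[:n] if q["domain"] not in excluded_domains]


def filter_while_fetching(entries, excluded_domains, n):
    """Server-style: keep scanning until n visible entries have been collected."""
    visible = []
    for q in entries:
        if q["domain"] in excluded_domains:
            continue
        visible.append(q)
        if len(visible) == n:
            break
    return visible


short_page = filter_after_fetch(log, excluded, 3)    # fewer than 3 rows survive
full_page = filter_while_fetching(log, excluded, 3)  # exactly 3 rows, if available
```

With this sample data the first approach leaves a single row on a "3 entries" page, while the second keeps reading older entries until the page is full.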

I can see how it appears to need a full redo of the framework to allow something like this, which on the surface seems trivial. There are search engines and databases that do filtering of millions of items per second, but they are optimized for that activity and the FTL database wasn't built with that in mind. What about grabbing a very large local buffer of data and using local resources to filter that for display, while still appending current data to it every minute, sort of a FIFO thing? That means that even for the large number of items being filtered out of the UI display, the buffer will still be full of actual data that will be shown to the user. Just thinking out loud...
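A rough sketch of that FIFO idea, assuming a fixed-size in-memory buffer that receives a batch of new entries every minute. The class name, its methods and the `.example` domains are hypothetical and only illustrate the buffering, not anything in FTL.

```python
from collections import deque


class QueryBuffer:
    """Fixed-size FIFO of recent domains; the oldest entries fall off the front."""

    def __init__(self, capacity):
        # deque(maxlen=...) silently discards the oldest items once it is full.
        self._entries = deque(maxlen=capacity)

    def append_batch(self, batch):
        """Append the entries collected since the last update (oldest first)."""
        self._entries.extend(batch)

    def newest_visible(self, excluded, n):
        """Return up to n of the newest entries that are not excluded."""
        visible = []
        for domain in reversed(self._entries):
            if domain in excluded:
                continue
            visible.append(domain)
            if len(visible) == n:
                break
        return visible


buffer = QueryBuffer(capacity=5)
buffer.append_batch(["a.example", "noisy.example", "b.example"])
buffer.append_batch(["noisy.example", "c.example", "noisy.example"])
page = buffer.newest_visible({"noisy.example"}, n=2)
```

The buffer has to be large enough that, after the noisy entries are skipped, enough real entries remain to fill the requested page.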

My mesh router is unfortunately hard-coded to ping a domain many times a minute, and over the year it is going to pile up thousands of useless entries in the log. I currently cannot block the domain, because it's an essential one to be able to access. This feature would single-handedly solve that issue for me and possibly for others in the same situation, or for some similar reason.

This is a big yes vote from me.

The best would be if the audited Domains would optionally not show up in the log.
This would make the log so much more usable.

Hello! If I may chime in, would it be possible and of help if regex was allowed under API settings > Top Lists to exclude permutations of blocked (or allowed) domains appearing in the Query Log? See below my example:

|2023-07-26 18:37:05|A|icnlc9z65n6fhf74h1tkzqd1dlgs11690388057.uaid.nmrodam.com|firetv.fritz.box|Blocked (regex blacklist)|IP (0.0ms)||
|2023-07-26 18:37:04|A|lo22p9fjb6d6vsmx51a4lk38u8wm91690386585.uaid.nmrodam.com|firetv.fritz.box|Blocked (regex blacklist)|IP (0.1ms)||
|2023-07-26 18:37:03|A|2n9ssrky7au15gj6ky0fqc4p74hg71690384153.uaid.nmrodam.com|firetv.fritz.box|Blocked (regex blacklist)|IP (0.1ms)||
|2023-07-26 18:37:02|A|r7gh6lbc8i424btzgwc1eft9yvuvd1690380265.uaid.nmrodam.com|firetv.fritz.box|Blocked (regex blacklist)|IP (0.1ms)||
|2023-07-26 18:37:01|A|1urkletd6eki3f5p5e84jghva9dcl1690376354.uaid.nmrodam.com|firetv.fritz.box|Blocked (regex blacklist)|IP (0.1ms)||
|2023-07-26 18:37:00|A|vhldyspu5skn6s8ghb7t5ex7ujcik1690374396.uaid.nmrodam.com|firetv.fritz.box|Blocked (regex blacklist)|IP (0.1ms)||
|2023-07-26 18:36:59|A|oe4rc9ghb7di9b7opqms3lu847mgx1690370955.uaid.nmrodam.com|firetv.fritz.box|Blocked (regex blacklist)|IP (0.1ms)||
|2023-07-26 18:36:58|A|202i65zr7e016zc26ww8ghb7tz7ws1690368587.uaid.nmrodam.com|firetv.fritz.box|Blocked (regex blacklist)|IP (0.1ms)||

And:

Top Blocked Domains

Domain Hits Frequency
[secure-eu.nmrodam.com] 348
[emwzdatl1k4hg7dhxim8tv7m4ebl01689738405.uaid.nmrodam.com] 70
[nxzf7ghbxg448u7yu26y2wl9vlc081689752810.uaid.nmrodam.com] 70
[5r687k06f8ghbbucqccu1sps63tzi1689767214.uaid.nmrodam.com] 70
[j64ktybmzedv1fghjjwrafa5dg79rc1re31689781619.uaid.nmrodam.com] 70
[hcd0bheb0uml59i3kjoh6qmhlhiyz1689791317.uaid.nmrodam.com] 70
[0hwdhv8f1q41x2y1b1lhde8ggz7m6kd89801102.uaid.nmrodam.com] 70
[71spmo7hfd9ghbdngyxeecnoof2z31689803053.uaid.nmrodam.com] 70
[q7s6f9b95bqwobtm25uuajtu0p3v81689805007.uaid.nmrodam.com] 70
[msc4cy1r0y1p90ef9bheiq3vfieno1689806958.uaid.nmrodam.com] 70

These flood the Top List.
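As an illustration of what such a rule could look like, one pattern anchored on the shared suffix would cover every permutation above. This is only a sketch: Python's `re` module is used purely for demonstration, and whether and how the Top Lists setting would accept a regex is exactly what is being asked here.

```python
import re

# One rule for every random label under uaid.nmrodam.com (illustrative pattern).
PATTERN = re.compile(r"(^|\.)uaid\.nmrodam\.com$")

# Sample domains copied from the listings above.
domains = [
    "secure-eu.nmrodam.com",
    "emwzdatl1k4hg7dhxim8tv7m4ebl01689738405.uaid.nmrodam.com",
    "icnlc9z65n6fhf74h1tkzqd1dlgs11690388057.uaid.nmrodam.com",
]

hidden = [d for d in domains if PATTERN.search(d)]
shown = [d for d in domains if not PATTERN.search(d)]
```

Here `secure-eu.nmrodam.com` stays visible, while both `uaid` permutations are caught by the single rule.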

Thanks for sharing your thoughts! 🙂

I think we can discuss this once again. Let me summarize what we want so we are not talking at cross-purposes here:

Where should domains be filtered?

  1. On the main page
    1.1. Top Domains
    1.2. Top Blocked/Ads
  2. In the Query Log

Anywhere else?

And then we define a list of regular expressions that are used to do the exclusions. This is of course a very costly process, as every domain that is to be shown first needs to be checked. If we add this filter at display time, it will not have any effect on the performance-critical part of the DNS resolution.
The effort may be negligible for one regex on a Pi-hole installed on an x86_64 machine, but less acceptable for a Raspberry Pi 2B-based Pi-hole and a hundred regular expressions. Anyway, we can just try this out and see what happens. In the end, this will only affect those who manually (!) add regular expressions here, so it is nothing that would slow things down for every user.

I believe most posters here would want filtering of the [web UI's] Query Log, myself included.

I still want to see the Top Blocked / Allowed graphs. For example, when I fire up my gaming PC the PiHole gets thousands of NVIDIA telemetry events (all blocked), and I still want to know that it's doing that. BUT when inspecting the Query Log I want to be able to say "OK I know about these 17 biggest offenders but please show me everything else". Example: if I visit a website that mysteriously fails to render, I want to be able to quickly understand why by looking at the Query Log and ignoring the usual suspects that I know are not responsible.

I will opt in for that feature, should you implement it.

That aligns with my earlier post in this topic. If the filtering was just for the Query Log then the long-term database would presumably not be impacted and operate as it does now without a performance hit. And when viewing the Query Log, that's the point where a single round of filtering cleans it up. Would the points I made in that earlier post be useful?

As @dimitarvp says, it's just to allow viewing the Query Log without the noise of a device that floods it. A single Roku device or the Velop mesh can flood the Query Log and make it nearly impossible to pick out a web page domain that needs whitelisting, since the domain has already been carried off the page in the seconds it takes to view the Query Log.

I don't mind – and in fact, want – the long-term logging of the noise because I want that to accurately represent what Pi-hole saw. The long-term database isn't a problem though since it is used to search for things so filtering is applied by the user in normal operation.

Thanks for looking at this.

I implemented a very first draft where the effect is indeed limited to the Query Log only. This is what is left when adding the regex exclusion rule [a-z]:

If you pay close attention to the text below the Query Log, you'll see that this filtering is indirectly obvious from the UI.

Any number of regular expressions is supported. Note that the above-mentioned performance warning still applies: every domain to be displayed has to be matched against every one of your configured regular expressions. That being said, there will be no performance hit if the array of regular expressions is empty.
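The cost model described here can be sketched as follows. This is not the FTL implementation, only an illustration in Python with hypothetical function names: each row is checked against the rules until one matches, and an empty rule list skips the work entirely.

```python
import re


def compile_rules(patterns):
    """Compile the user's exclusion patterns once, not once per row."""
    return [re.compile(p) for p in patterns]


def is_hidden(domain, rules):
    """True if any rule matches; any() stops at the first match."""
    return any(rule.search(domain) for rule in rules)


def filter_for_display(domains, rules):
    """Drop hidden domains; with no rules there is nothing to match at all."""
    if not rules:
        return list(domains)  # fast path: empty rule list costs nothing extra
    return [d for d in domains if not is_hidden(d, rules)]


def worst_case_checks(num_domains, num_rules):
    """Upper bound on regex evaluations when no domain matches any rule."""
    return num_domains * num_rules


rules = compile_rules([r"\.test$"])
kept = filter_for_display(["a.example", "b.test"], rules)
unfiltered = filter_for_display(["a.example", "b.test"], [])
```

For a 100-row page and 100 rules the worst case is 10,000 regex evaluations per page load, which is why the cost depends so heavily on how many rules a user adds.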

This is implemented in the realm of the ongoing Pi-hole v6.0 development on branch new/queryLogRegex:

This feature has been merged into the v6.0 beta code. You will not only be able to hide domains on a regex basis but also clients (both by hostname and IP address). The same filters will apply both to the Query Log and the tables on the dashboard.
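Conceptually, the behaviour described could look like the sketch below. The `Query` type, the field names, the `.lan` hostnames and both functions are assumptions for illustration; they do not reflect the actual v6.0 code or its configuration keys.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Query:
    domain: str
    client_name: str
    client_ip: str


def is_excluded(query, domain_rules, client_rules):
    """Hide a query if its domain, client hostname or client IP matches a rule."""
    if any(r.search(query.domain) for r in domain_rules):
        return True
    return any(
        r.search(query.client_name) or r.search(query.client_ip)
        for r in client_rules
    )


def top_domains(queries, limit):
    """Dashboard-style table: most frequent domains among the given queries."""
    return Counter(q.domain for q in queries).most_common(limit)


queries = [
    Query("ads.example", "firetv.lan", "192.168.1.20"),
    Query("news.example", "laptop.lan", "192.168.1.30"),
    Query("news.example", "firetv.lan", "192.168.1.20"),
    Query("shop.example", "tablet.lan", "192.168.1.40"),
]
domain_rules = [re.compile(r"^ads\.")]
client_rules = [re.compile(r"^192\.168\.1\.40$")]

# The same filtered list feeds both the Query Log view and the dashboard table.
visible = [q for q in queries if not is_excluded(q, domain_rules, client_rules)]
table = top_domains(visible, limit=10)
```

Filtering once and reusing the result for both views keeps the Query Log and the dashboard tables consistent with each other.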

I think most of us want to have a clean query log to find the interesting domains more easily.

The best would be if the audited Domains would optionally not show up in the log.
Audited domains have already been checked and need no further attention, at least for a while.

With this implemented it should also be possible to browse the Audit log for easy changes.