Option to ignore domains from appearing in the Query Log

At least for localhost, there is already a feature for this in FTLDNS:


This only prevents hits TO localhost, right?

It will not ignore legitimate hits to other hosts originating from localhost.

As in, localhost querying GitHub will still show up, right?

No, see the description of this config option:

IGNORE_LOCALHOST=no|yes (Should FTL ignore queries coming from the local machine?)

It's a good feature; however, I would personally still like to see whether localhost is "dialing" out and where, EXCEPT for self-queries.

It would just be a cosmetic thing, I agree.

What I was thinking about is an approach like the API/Web Interface Exclude options.

Not only in the Top Lists but also in the Query Log.

So if a query is made for localhost and/or localhost.local, or for any domain specified in the Exclude boxes, it would be hidden from display...

I see, but unfortunately there is a (significant!) performance penalty connected to this feature, which is the sole reason why we have not added it yet. Assume we have 15 clients on a typical system with 20.000 queries within 24 hours. Then the filter would have to be applied 15*20.000 = 300.000 times, as we would have to check each and every query against each and every domain individually to decide whether to show it or not. Not to mention how wildcard comparisons would perform (they are much more costly in terms of computation time). Sure, there are ways to code around this, like using tree methods and similar techniques, but the code would still be slower (albeit not as dramatically) and also much harder to maintain.
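To make the worst case concrete, here is a minimal Python sketch (hypothetical names, not FTL's actual C code) of a naive linear-scan filter that compares every query against every exclusion entry. When no query matches, the number of comparisons is exactly queries × entries:

```python
# Hypothetical sketch: naive filtering that compares every query against
# every exclusion entry one by one (no hash set, no tree).
def naive_filter(queries, excluded):
    """Return (visible_queries, comparisons) for a linear-scan filter."""
    visible = []
    comparisons = 0
    for domain in queries:
        hidden = False
        for pattern in excluded:
            comparisons += 1
            if domain == pattern:
                hidden = True
                break  # stop at the first match
        if not hidden:
            visible.append(domain)
    return visible, comparisons

# 15 exclusion entries and 20,000 queries that match none of them,
# so every query is compared against every entry.
excluded = [f"excluded{i}.example" for i in range(15)]
queries = [f"host{i}.example" for i in range(20_000)]
visible, comparisons = naive_filter(queries, excluded)
print(comparisons)  # 300000
```

Exact string comparison is the cheap case here; wildcard or regex matching would make each of those comparisons more expensive.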


If you enable SSH on your Pi-hole, you can tail -f the log file and pipe it through whatever you like. I detail doing that here for a Roku, but it is more or less the same for any device.
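As an illustration of the tail -f approach, a small Python filter could sit at the end of the pipe. This is a sketch under assumptions: it expects dnsmasq-style "query[TYPE] domain from client" lines, and the HIDDEN set and function name are hypothetical:

```python
import re

# Assumed dnsmasq-style query line, for example:
# "Jan 10 12:00:00 dnsmasq[123]: query[A] github.com from 127.0.0.1"
QUERY_RE = re.compile(r"query\[[^\]]+\] (\S+) from (\S+)")

# Hypothetical list of domains you do not want to see.
HIDDEN = frozenset({"localhost", "localhost.local"})

def filter_lines(lines, hidden=HIDDEN):
    """Yield only query lines whose domain is not in `hidden`.

    Non-query lines (replies, forwards, ...) are dropped as well.
    """
    for line in lines:
        match = QUERY_RE.search(line)
        if match and match.group(1) not in hidden:
            yield line
```

To use it live, loop over sys.stdin, print each yielded line with flush=True, and run something like tail -f /var/log/pihole.log | python3 filter_log.py. The log path differs between Pi-hole versions, so check where yours is.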

@DL6ER, what about ignoring an entire client other than localhost? For example, would IGNORE_CLIENT=10.0.0.13 affect performance at all?


If we add it now, users will want an arbitrarily long list of clients to be ignored in the future. That would carry a notable performance penalty, as each entry in the entire Query Log would have to be matched against each of the clients to be ignored. Whilst possible, this would surely have a bad influence on performance.

We're (internally) working on a new API implementation. This API will provide the Query Log of the future and may be able to handle this better. As it will also support pagination, it would only have to check a significantly smaller number of queries on each individual request, notably reducing the performance penalty.
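Why pagination helps can be shown with a lazy filter: only as many queries as needed to fill the requested page get examined. This is a Python sketch of the idea with hypothetical names; the new API is a separate implementation and may work differently:

```python
from itertools import islice

def page_of_queries(queries, excluded, page, page_size):
    """Return one page of non-excluded queries, filtering lazily.

    The generator only examines queries until the requested page is full,
    so a request for the first page stops early instead of filtering the
    whole log. Later pages still have to walk past the earlier ones.
    """
    visible = (q for q in queries if q not in excluded)
    start = page * page_size
    return list(islice(visible, start, start + page_size))
```

With a page size of, say, 100 and the newest queries first, a typical request touches roughly a hundred entries rather than hundreds of thousands.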


I agree. I don't mind as much if they are still stored in the logs under the hood, but it would be nice to be able to filter items out on the Query Log in the web client. Bonus points if you can make the block stats do the same.

I see, but unfortunately there is a (significant!) performance penalty connected to this feature, which is the sole reason why we have not added it yet. Assume we have 15 clients on a typical system with 20.000 queries within 24 hours. Then the filter would have to be applied 15*20.000 = 300.000 times, as we would have to check each and every query against each and every domain individually to decide whether to show it or not.

There's no need to check each domain individually (excluding wildcard functionality). Searching a hash set of excluded domains would be trivially fast, assuming users don't put 50k domains in.
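A minimal Python sketch of the hash-set idea (Python's built-in set is a hash set; the query records with a "domain" key are a hypothetical shape, not FTL's data structures):

```python
def visible_queries(queries, excluded_domains):
    """Return the queries whose domain is not excluded.

    Building the set is O(m) once; each `in` check is an average O(1)
    hash lookup, so filtering n queries is O(n) no matter how many
    domains are excluded (exact matches only, no wildcards).
    """
    excluded = set(excluded_domains)
    return [q for q in queries if q["domain"] not in excluded]
```

The cost per query stays the same whether 2 or 2,000 domains are excluded, which is the point being made here.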

Another option would be to maintain a separate "filtered" bit and set it appropriately when the request is serviced. If the user changes the filters, reprocess all requests and reset the bits appropriately. This would remove the need to do the filtering on the web client.
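The "filtered bit" idea can be sketched like this in Python. QueryLog, Query, and their methods are hypothetical names used only for illustration: the decision is made once when a query is recorded, redone for every stored query only when the filters change, and the display side just reads the flag:

```python
from dataclasses import dataclass

@dataclass
class Query:
    domain: str
    client: str
    filtered: bool = False  # precomputed "hide in Query Log" flag

class QueryLog:
    """Hypothetical log that stores a filtered bit per query."""

    def __init__(self, excluded):
        self.excluded = set(excluded)
        self.queries = []

    def record(self, domain, client):
        # Decide once, when the request is serviced.
        q = Query(domain, client, domain in self.excluded)
        self.queries.append(q)
        return q

    def set_filters(self, excluded):
        # Filters changed: reprocess every stored query and reset its bit.
        self.excluded = set(excluded)
        for q in self.queries:
            q.filtered = q.domain in self.excluded

    def visible(self):
        # Display only reads the precomputed bit.
        return [q for q in self.queries if not q.filtered]
```

The trade-off is one extra flag per query and a full pass whenever the filter list changes, in exchange for cheap reads.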

The issue is not how long it takes to figure out whether one query should be filtered out or not, it's that we have to check every query. Checking a hash set takes O(1) (theoretically), but since you do that n times, it takes O(n).

As @DL6ER noted, we are solving these filtering questions in the API, where it is easier to use things like hash sets.

Yes, the additional filter check would add O(n) to request processing, but that process must already be at least O(n). You would be adding 1 to the multiplier.

Or is the problem that hash sets aren't available at the level where the web interface would be doing the filtering?

The web interface does not do the filtering; that currently happens in FTL. Because FTL is written in C, it does not have many nice things like hash maps or hash sets built in. The API is in Rust, which does have those features as part of the standard library. For FTL to do this level of filtering, it would either need to implement a hash set/map (complex, not fun) or use a slower approach with an array (the fastest reasonable approach would be binary search, O(n log m) to filter n queries against m domains). There can be hundreds of thousands of queries, so we want to keep filtering performant.
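The sorted-array alternative looks roughly like this. Python stands in for C here as a sketch: bisect_left plays the role that bsearch() from <stdlib.h> would play in C, after sorting once (qsort() in C). This is not FTL code:

```python
from bisect import bisect_left

def make_sorted_list(domains):
    """Sort once (O(m log m)) so lookups can use binary search."""
    return sorted(set(domains))

def is_excluded(sorted_domains, domain):
    """Binary search: O(log m) per lookup for m excluded domains."""
    i = bisect_left(sorted_domains, domain)
    return i < len(sorted_domains) and sorted_domains[i] == domain

def filter_queries(queries, sorted_domains):
    # n lookups of O(log m) each: O(n log m) overall.
    return [q for q in queries if not is_excluded(sorted_domains, q)]
```

Compared with a hash set, each lookup costs a few string comparisons more, but it needs nothing beyond a sorted array.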


Any news here?
Or is it already implemented? How can I exclude domains from the Query Log (to better see the other domains)?


There are two complementary requests regarding not logging certain resolutions to the queries table: this one about ignoring certain domains and another one about ignoring certain clients.

Pi-hole already knows how to very efficiently decide what to do with requests based on a set of rules. It would be amazing to also be able to define what it should do with logging based on a similar set of rules (e.g. the client's group, the domain, the decision to block or allow, etc.). That would cover both requests very nicely.
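One way such logging rules could be expressed, sketched in Python with hypothetical names (LogRule and should_log are not part of Pi-hole, and "first matching rule wins" is just one possible design choice):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class LogRule:
    """Hypothetical rule: every field that is set must match; None matches anything."""
    domain: Optional[str] = None
    group: Optional[str] = None
    blocked: Optional[bool] = None
    log: bool = True  # what to do when the rule matches

def should_log(rules, domain, group, blocked, default=True):
    """First matching rule wins; fall back to `default` if none match."""
    for rule in rules:
        if rule.domain is not None and rule.domain != domain:
            continue
        if rule.group is not None and rule.group != group:
            continue
        if rule.blocked is not None and rule.blocked != blocked:
            continue
        return rule.log
    return default
```

For the baidu.com case described below, a single rule such as LogRule(domain="baidu.com", group="IoT", blocked=True, log=False) would drop those blocked queries while still logging everything else from that device.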

In my particular situation, I have a Chinese IoT device that tries to access baidu.com every couple of seconds. It is always blocked, but that does not help: the FTL database grows by several hundred thousand records every week. Deleting them manually and truncating the statistics is very tedious, to say the least...


Would love to see this feature! When my Samsung TV is on, it hits its ad network, which I've blocked, but it would be nice not to see my query logs always full of the same thing.


New to the discussion, but it feels like the devs misunderstood the request. I believe OP wanted the ability to simply not see certain domains in the Query Log screen of the web interface. As I understand it, this would be in the Rust code, not in the C code. It would literally be a line of code such as [if the domain I'm about to write to the screen is not in this list, then ...], if that makes sense. At least this is what I want to be able to do.

There are some domains that I don't even care to log, e.g. doubleclick, googleadservices, or google-analytics... Yep, everybody and their dog uses these on their websites, and even my dog's fitness tracker tries to send analytics. Logging these queries just wastes space and causes unnecessary disk writes. There are other threads where this is made clearer.
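Domains like these are usually queried through many subdomains (for example, several different hosts under doubleclick.net), so an exact-match list would miss most of them. One common alternative to wildcard or regex matching, sketched here in Python and not necessarily what Pi-hole does, is to look up the domain and each of its parent domains in a hash set, which costs one lookup per label:

```python
def matches_suffix(domain, excluded):
    """True if `domain` or any parent domain is in the `excluded` set.

    "stats.g.doubleclick.net" is checked as:
    stats.g.doubleclick.net, g.doubleclick.net, doubleclick.net, net
    That is one hash lookup per label instead of a regex per entry.
    """
    labels = domain.lower().rstrip(".").split(".")
    return any(".".join(labels[i:]) in excluded for i in range(len(labels)))
```

Because whole labels are compared, "notdoubleclick.net" does not match an entry for "doubleclick.net", which a naive endswith() check would get wrong.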

Just adding my vote... some polling services query so often that the log is cluttered with useless items, which makes it difficult to find the actually useful info you're looking for :confused:

Perhaps a different approach would be to:

  1. add the ability to exclude domains/clients via the query log filter
  2. add an optional setting in the settings page to automatically exclude certain domains/clients, so when opening the query log the filter would automatically be set to exclude these domains
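The two steps above could fit together roughly like this. It is a Python sketch in which the settings keys and the query record shape are hypothetical: saved defaults seed the Query Log filter, and the filter then hides a query if either its domain or its client is excluded:

```python
def initial_exclusions(settings):
    """Hypothetical: seed the Query Log filter from saved settings.

    `settings` is assumed to look like
    {"querylog_exclude_domains": [...], "querylog_exclude_clients": [...]}.
    """
    return (
        set(settings.get("querylog_exclude_domains", [])),
        set(settings.get("querylog_exclude_clients", [])),
    )

def apply_filter(queries, domains, clients):
    # Hide a query if either its domain or its client is excluded.
    return [q for q in queries
            if q["domain"] not in domains and q["client"] not in clients]
```

Since this only affects what is displayed, the queries themselves would still be stored; the user could clear the prefilled filter to see everything again.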

This request is already 4 years old; I wonder why it gets so little attention :frowning:
Actually, this is a bug, as the current state affects the usability of the Query Log and wastes resources. It should be possible to completely drop domains from any logging.

Comparing an incoming domain against yet another list will have a performance impact on Pi-hole, as this obviously has to happen after the query is received but before anything is logged.
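Where such a check would sit can be sketched with an in-memory SQLite table in Python. The table is simplified and is not Pi-hole's actual database schema, and DO_NOT_LOG and log_query are hypothetical names. The extra work on the logging path is one hash-set lookup per query, and skipped queries never cost a disk write:

```python
import sqlite3

# Hypothetical: domains that should never be written to the long-term database.
DO_NOT_LOG = frozenset({"doubleclick.net", "google-analytics.com"})

def open_db():
    db = sqlite3.connect(":memory:")
    # Simplified table; not Pi-hole's actual database schema.
    db.execute("CREATE TABLE queries (timestamp INTEGER, domain TEXT, client TEXT)")
    return db

def log_query(db, timestamp, domain, client, skip=DO_NOT_LOG):
    """Runs after the query has been handled, before it is stored.

    Returns False (and writes nothing) if the domain is in `skip`.
    """
    if domain in skip:
        return False
    db.execute("INSERT INTO queries VALUES (?, ?, ?)", (timestamp, domain, client))
    return True
```

Whether that per-query lookup is acceptable on a busy resolver is exactly the trade-off being discussed here.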