Option to ignore domains from appearing in the Query Log

bertoost · January 2, 2018, 8:11pm

Some services poll several times a minute, and all of these entries show up in the Query Log. There should be an option to hide or exclude domains from the Query Log viewer.

DL6ER · January 2, 2018, 10:21pm

How about also hiding the domains that are in the exclude list (left input box) from the Query Log?

I assume that users put the domains that are queried most often in there, and that those should then also be hidden in the Query Log.

In contrast to an additional list, implementing it as I suggest will be less than ten minutes of work.

Anudeep · January 4, 2018, 9:02am

Nice, an option to exclude domains from the Query Log page would be great!

Anudeep · February 27, 2018, 9:59am

This feature would help hide commonly appearing domains and some blacklisted domains that are queried several thousand times a day (like Amazon metrics, some NTP domains, and littlefield.logs.roku).

It would also help when analysing a large number of queries. What do you think, @DL6ER?

RamSet · March 29, 2018, 10:14pm

I second this. Hiding them from the query list would be great!

I'm flooded by localhost queries every time I access the query log.

RamSet · March 31, 2018, 6:10pm

Is this something I can change within the code (reusing the same top-list exclusions from the API option) so that those entries are ignored and not displayed in the query log?

DL6ER · April 18, 2018, 4:32pm

At least for localhost, there is already a feature for this in FTLDNS:

github.com/pi-hole/FTL

Add IGNORE_LOCALHOST config option

FTLDNS ← FTLDNS-IGNORE_LOCALHOST

opened 04:56PM - 11 Apr 18 UTC

DL6ER

+39 -5

This solves https://github.com/pi-hole/AdminLTE/issues/719

Note that queries from `localhost` will not just be hidden but completely ignored. This will impact the generated statistics and might hide information!

RamSet · April 18, 2018, 4:37pm

This only prevents hits TO localhost, right?

It will not ignore legitimate hits to other hosts that originate from localhost.

As in, localhost querying github will still show, right?

DL6ER · April 18, 2018, 4:39pm

No, see the description of this config option:

IGNORE_LOCALHOST=no|yes (Should FTL ignore queries coming from the local machine?)

RamSet · April 18, 2018, 4:46pm

It's a good feature. However, I personally would still like to see whether localhost is "dialing" out, and where to, EXCEPT for self queries.

It would be just a cosmetic thing, I agree.

What I was thinking about is an approach like the API/Web Interface Exclude options, applied not only to the Top lists but also to the Query Log.

So if a query is placed to localhost and/or localhost.local, or to any domain specified in the Exclude boxes, it would be stripped from the display ...

DL6ER · April 18, 2018, 6:13pm

I see, but unfortunately there is a (significant!) performance penalty connected to this feature, which is the sole reason why we have not added it yet. Assume we have 15 clients on a typical system with 20.000 queries within 24 hours. Then the filter would have to be applied 15*20.000 = 300.000 times, as we would have to check each and every query against each and every domain individually to decide whether to show it or not. Not to mention how wildcard comparisons might perform (they are much more costly in terms of computation time). Sure, there are ways to code around this, like using tree methods and similar techniques, but this would still make the code slower (albeit not as dramatically) and also much harder to maintain.
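To make the estimate above concrete, here is a minimal Python sketch of the naive approach, in which every query is compared against every entry of an exclude list. It is not FTL code: the function name and the sample domains are hypothetical, and the 15 from the post is treated here as the number of entries each query is checked against.

```python
def naive_filter(queries, excluded):
    """Return (visible queries, number of comparisons) using a nested scan."""
    comparisons = 0
    visible = []
    for domain in queries:
        hidden = False
        for entry in excluded:
            comparisons += 1
            if domain == entry:
                hidden = True
                # No early exit, to model the worst case from the estimate.
        if not hidden:
            visible.append(domain)
    return visible, comparisons


# Hypothetical data: 15 entries to check against, 20,000 queries in 24 hours.
excluded = [f"noisy{i}.example" for i in range(15)]
queries = [f"host{i}.example" for i in range(20_000)]

visible, comparisons = naive_filter(queries, excluded)
print(comparisons)  # 15 * 20,000 = 300,000
```

The count grows with the product of both list sizes, which is the cost the post is worried about; wildcard entries would make each individual comparison more expensive still.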

reg · September 10, 2018, 12:54am

If you enable SSH on your Pi-hole, you can tail -f the log file and pipe it through whatever you like. I detail doing that here for a Roku, but it is more or less the same for any device.
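As a rough illustration of piping the log through a filter, here is a small Python sketch that drops query lines for excluded domains. It assumes dnsmasq-style lines such as `query[A] example.com from 10.0.0.13`; the exclude list, the function names, and the sample lines are all hypothetical.

```python
import sys

# Hypothetical exclude list; replace with the domains you want to hide.
EXCLUDED = {"localhost", "localhost.local"}


def keep_line(line, excluded=EXCLUDED):
    """Return False for query lines whose domain is in the exclude set.

    Assumes lines shaped like
    'Jan  2 20:11:00 dnsmasq[123]: query[A] example.com from 10.0.0.13'.
    Lines that do not contain a 'query[...]' field are kept unchanged.
    """
    fields = line.split()
    for i, field in enumerate(fields):
        if field.startswith("query[") and i + 1 < len(fields):
            return fields[i + 1].lower() not in excluded
    return True


def filter_lines(lines, out=sys.stdout, excluded=EXCLUDED):
    """Copy lines to out, skipping excluded queries; flush after each line."""
    for line in lines:
        if keep_line(line, excluded):
            out.write(line)
            out.flush()


# Hypothetical sample lines standing in for the live log.
sample = [
    "Jan  2 20:11:00 dnsmasq[123]: query[A] localhost from 127.0.0.1\n",
    "Jan  2 20:11:01 dnsmasq[123]: query[A] github.com from 127.0.0.1\n",
    "Jan  2 20:11:01 dnsmasq[123]: forwarded github.com to 8.8.8.8\n",
]
filter_lines(sample)
```

To use this on a live log, one could pass `sys.stdin` to `filter_lines` instead of the sample list and pipe the output of tail -f into the script. Note that this only hides the query lines themselves; related reply or forward lines for the same domain would need their own rule.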

Anudeep · October 5, 2018, 6:48am

@DL6ER, what about ignoring an entire client other than localhost? For example, IGNORE_CLIENT=10.0.0.13. Would that affect performance?

DL6ER · October 7, 2018, 4:36pm

If we add it now, users will want to have an arbitrarily long list of clients to be ignored in the future. It would have a notable performance penalty, as each entry in the entire Query Log would have to be matched against each of the clients to be ignored. Whilst possible, this would surely hurt performance.

We're (internally) working on a new API implementation. This API will provide the Query Log of the future and may be able to handle this better. As it will also support pagination, it would have to check against a significantly lower number of queries on each individual request reducing the performance penalty notably.
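The effect of pagination on filtering cost can be sketched in Python as follows. This is not the API under development, only an illustration under assumptions: the `query_page` function, the dictionary layout of a query, and the sample data are hypothetical.

```python
from itertools import islice


def query_page(queries, excluded, page, page_size):
    """Return one page of visible queries, filtering lazily.

    The generator only examines as many queries as are needed to fill
    the requested page, instead of filtering the whole log up front.
    """
    visible = (q for q in queries if q["domain"] not in excluded)
    start = page * page_size
    return list(islice(visible, start, start + page_size))


# Hypothetical log: every second query goes to an excluded domain.
queries = [
    {"id": i, "domain": "baidu.com" if i % 2 else f"host{i}.example"}
    for i in range(100_000)
]

first_page = query_page(queries, {"baidu.com"}, page=0, page_size=10)
print([q["id"] for q in first_page])  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

For the first page, only the first 19 of the 100,000 queries are examined. Later pages in this simple sketch still have to skip over earlier entries, so a real implementation would likely need a cursor or an index to keep deep pages cheap.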

aaronburro · March 26, 2019, 2:47pm

I agree. I don't mind as much if they are still stored in the logs under the hood, but it would be nice to be able to filter items out on the Query Log in the web client. Bonus points if you can make the block stats do the same.

DL6ER wrote: "I see, but unfortunately there is a (significant!) performance penalty connected to this feature, which is the sole reason why we have not added it yet. Assume we have 15 clients on a typical system with 20.000 queries within 24 hours. Then the filter would have to be applied 15*20.000 = 300.000 times, as we would have to check each and every query against each and every domain individually to decide whether to show it or not."

There's no need to check each domain individually (excluding wildcard functionality). Searching a Hashset of excluded domains would be trivially fast, assuming users don't put 50k domains in.

Another option would be to maintain a separate "filtered" bit and set it appropriately when the request is serviced. If the user changes the filters, reprocess all requests and reset the bits appropriately. This would remove the need to do the filtering on the web client.
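Both ideas above, the hash set lookup and the per-request "filtered" bit, can be sketched together in Python. This is only an illustration under assumptions and not how FTL stores queries; the `Query` and `QueryLog` classes and their methods are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Query:
    domain: str
    filtered: bool = False  # the "filtered" bit


class QueryLog:
    def __init__(self, excluded=()):
        self.excluded = set(excluded)  # hash set of excluded domains
        self.queries = []

    def record(self, domain):
        """Store a query and set its bit once, when the request is serviced."""
        self.queries.append(Query(domain, domain in self.excluded))

    def set_filters(self, excluded):
        """Replace the exclude set and reprocess all stored queries."""
        self.excluded = set(excluded)
        for q in self.queries:
            q.filtered = q.domain in self.excluded

    def visible(self):
        """Return the domains to display; only the stored bit is read."""
        return [q.domain for q in self.queries if not q.filtered]


log = QueryLog(excluded={"localhost"})
for d in ["localhost", "github.com", "baidu.com", "localhost"]:
    log.record(d)
print(log.visible())  # ['github.com', 'baidu.com']

log.set_filters({"baidu.com"})
print(log.visible())  # ['localhost', 'github.com', 'localhost']
```

The set lookup happens once per query at record time and once more per query whenever the filters change, so displaying the log only has to read a flag.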

Mcat12 · March 27, 2019, 1:01am

The issue is not how long it takes to figure out whether one query should be filtered out; it's that we have to check each query. Checking a hash set takes O(1) (theoretically), but since you do that N times, it takes O(N) overall.

As @DL6ER noted, we are solving these filtering questions in the API, where it is easier to use things like hash sets.

aaronburro · March 27, 2019, 3:17pm

Yes, the additional filter check would add O(n) to request processing, but that process must already be at least O(n). You would be adding 1 to the multiplier.

Or is the problem that Hashsets aren't available at the level where the web interface would be doing the filtering?

Mcat12 · March 28, 2019, 4:46am

The web interface does not do the filtering; that currently happens in FTL. Because FTL is written in C, it does not have many nice things like hash maps or hash sets built in. The API is written in Rust, which does have those features as part of its standard library. For FTL to do this level of filtering, it would either need to implement a hash set/map (complex, not fun) or use a slower approach with an array (the fastest reasonable approach would be binary search, O(n log n) to filter all of the queries). There can be hundreds of thousands of queries, so we want to keep filtering performant.
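The array-based alternative mentioned above can be sketched as a binary search over a sorted exclude list. Python is used here purely for illustration, with its standard `bisect` module; in C, the standard library's `bsearch` over a sorted array would play a similar role. The function names and sample domains are hypothetical.

```python
from bisect import bisect_left


def is_excluded(sorted_excluded, domain):
    """Binary-search a sorted list for an exact domain match."""
    i = bisect_left(sorted_excluded, domain)
    return i < len(sorted_excluded) and sorted_excluded[i] == domain


def filter_queries(queries, excluded):
    """Return the queries whose domain is not in the exclude list."""
    sorted_excluded = sorted(excluded)  # sort once, search many times
    return [q for q in queries if not is_excluded(sorted_excluded, q)]


print(filter_queries(
    ["localhost", "github.com", "baidu.com"],
    ["localhost", "baidu.com"],
))  # ['github.com']
```

Each lookup takes a number of steps that grows with the logarithm of the exclude list's length, and it is repeated once for every query in the log. Wildcard entries are not handled by this exact-match search.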

horst · May 13, 2021, 9:27am

Any news here?
Or is it already implemented? How can I exclude domains from the query log (to see the other domains better)?

idan · July 5, 2021, 12:48am

There are two complementary requests regarding not logging certain resolutions to the queries table: this one about ignoring certain domains and another about ignoring certain clients.

Pi-hole already knows how to decide very efficiently what to do with requests based on a set of rules. It would be amazing to be able to also define what it should do with logging based on a similar set of rules (e.g. client's group, domain, decision to block or allow, etc.). That would cover both requests very nicely.

In my particular situation, I have a Chinese IoT device that tries to access baidu.com every couple of seconds. It is always blocked, but that does not stop the requests, and the FTL database grows by several hundred thousand records every week. Deleting them manually and truncating statistics is very boring, to say the least...