It would be nice to have an option to change the aggregation of the Top Blocked Domains / Top Permitted Domains lists so that it ignores the subdomain component.
For example:
mask.icloud.com 1327
mask-h2.icloud.com 1298
metrics.icloud.com 739
Would just be aggregated as
icloud.com 3364
The actual breakdown of the subdomain counts could be shown in a hover window when mousing over the aggregated domain, or as a click drill-through if that's easier to implement.
In my opinion, while less specific, this would provide a more useful view of which services are actually being blocked or allowed. Currently 6 of my top 10 allowed entries belong to only two actual domains, and all 10 of my top 10 blocked entries belong to only three.
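To make the request concrete, here is a minimal sketch of the grouping using the three counts from the example. It is only an illustration, not how Pi-hole computes its lists: `parent_domain` is a hypothetical helper that naively keeps the last two labels, and `breakdown` holds the per-subdomain counts that a hover window or drill-through could display.

```python
from collections import Counter, defaultdict

# Counts copied from the example in this post.
queries = {
    "mask.icloud.com": 1327,
    "mask-h2.icloud.com": 1298,
    "metrics.icloud.com": 739,
}


def parent_domain(domain: str) -> str:
    """Hypothetical grouping key: naively keep only the last two labels."""
    return ".".join(domain.split(".")[-2:])


totals = Counter()             # aggregated count per parent domain
breakdown = defaultdict(dict)  # per-subdomain counts for the drill-down

for domain, count in queries.items():
    key = parent_domain(domain)
    totals[key] += count
    breakdown[key][domain] = count

print(totals.most_common(10))    # aggregated top list
print(breakdown["icloud.com"])   # data for the hover / drill-through view
```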
I understand your request, but there is no "breakdown" happening in the web interface.
From DNS queries point of view there is no relation between a domain and subdomains. They are different domains and these queries are individually stored in the database.
example.com is a domain. www.example.com is another domain. pi-hole.example.com is another domain. dns.pi-hole.example.com is another domain.
I assume there are libraries to parse a URL into its component pieces. I wonder if that's a viable option. Given the absurd complexity of some domains these days, it just might not be worth even trying.
What I'm saying is that either a column could be added to the database, so that each row has both an actual domain (www.example.com) and a primary domain (example.com),
or the grouping could be done at query time when fetching the top 10, using a DB function or regex (I'm not sure what SQLite supports).
The former would be much more performant. The latter easier to implement.
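A rough sketch of both options, using Python's built-in `sqlite3` module. The table `queries` and its columns are made up for illustration and are not Pi-hole's real schema; `to_primary` is a hypothetical SQL function registered from Python, and it naively keeps the last two labels.

```python
import sqlite3


def primary_domain(domain: str) -> str:
    """Naive assumption: the primary domain is the last two labels."""
    return ".".join(domain.split(".")[-2:])


conn = sqlite3.connect(":memory:")
# Hypothetical schema, for illustration only.
conn.execute("CREATE TABLE queries (domain TEXT NOT NULL, primary_domain TEXT)")
conn.executemany(
    "INSERT INTO queries (domain) VALUES (?)",
    [
        ("mask.icloud.com",),
        ("mask-h2.icloud.com",),
        ("metrics.icloud.com",),
        ("www.example.com",),
    ],
)

# Option 1: group at query time through a user-defined SQL function.
conn.create_function("to_primary", 1, primary_domain)
at_query_time = conn.execute(
    "SELECT to_primary(domain) AS pd, COUNT(*) AS n "
    "FROM queries GROUP BY pd ORDER BY n DESC, pd LIMIT 10"
).fetchall()

# Option 2: store the primary domain once, index it, and group on the column.
conn.execute("UPDATE queries SET primary_domain = to_primary(domain)")
conn.execute("CREATE INDEX idx_primary ON queries (primary_domain)")
stored = conn.execute(
    "SELECT primary_domain, COUNT(*) AS n "
    "FROM queries GROUP BY primary_domain ORDER BY n DESC, primary_domain LIMIT 10"
).fetchall()

print(at_query_time)
print(stored)
```

Option 1 calls the function for every row on every query, while option 2 pays that cost once and can then use the index, which is the trade-off described above.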
No, Pi-hole will definitely show ten distinct domains.
You are suggesting to aggregate DNS request counts by their second level domain parts.
I see very little benefit in this, with a considerable potential to make statistics much less significant or even meaningless.
As there are no inherent semantics attached to a domain beyond what the DNS hierarchy itself would imply, a second level aggregation would be a rather arbitrary criterion.
It may already fail to even meet your expectation for those *.icloud.com domains specifically, as aggregation may not be limited to those three domains you quote in your example.
There are literally hundreds of other domains that end in icloud.com.
Furthermore, it may produce unexpected or unwanted results for other domains as well, e.g. your suggestion would also aggregate all of *.co.uk domains as well as *.in-addr.arpa, which in turn may catapult otherwise unremarkable counts into the top 10s.
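A small sketch of that effect, with invented domains and counts: grouping on the last two labels merges unrelated *.co.uk sites and all reverse lookups into single entries, and the merged entries then outrank a genuine domain.

```python
from collections import Counter


def second_level(domain: str) -> str:
    """Naive second level aggregation key: the last two labels."""
    return ".".join(domain.rstrip(".").split(".")[-2:])


# Invented sample data, for illustration only.
sample = {
    "news.example.co.uk": 40,
    "shop.example-shop.co.uk": 35,
    "1.0.168.192.in-addr.arpa": 30,
    "5.1.168.192.in-addr.arpa": 25,
    "mask.icloud.com": 50,
}

totals = Counter()
for domain, count in sample.items():
    totals[second_level(domain)] += count

# Unrelated sites collapse into "co.uk", and all reverse lookups into
# "in-addr.arpa"; both merged entries now rank above icloud.com.
print(totals.most_common())
```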
Obviously, universally applying a second level aggregation would potentially result in a much distorted view of statistics.
Given the above considerations, I'm not convinced that implementing this would be beneficial.
Also, as picking an appropriate aggregation level would very much depend on the actual domain, and quite probably on personal preferences as well, that would require quite a bit more customisation than your suggestion anticipated.
Yes, I am suggesting second level domain aggregation. I am aware that this would provide significantly less detail than the entire domain.
The point, in my opinion, would be to know whose "house" most of your permitted / blocked traffic is going to, not which specific rooms in that house. I'm also only suggesting it as an option rather than a primary change, and even then I'm not suggesting dropping the full domain counts, but rather putting them in a hover window or behind a drill-through link on the second level domain.
As an example, about 30% of the blocked traffic in my network currently goes to logs.netflix.com ... I don't have a Netflix subscription. Being able to see all traffic going to *.netflix.com would be great. If these queries were spread across a number of subdomains, the individual counts would likely not be high enough to appear in the top 10, and I likely wouldn't have become aware of it.
In relation to the domain aggregation being complex, I agree and acknowledged this in my second post. That said, there are a finite number of TLDs; surely functions exist that can find the TLD in a domain and then return the TLD plus the next block to the left as the second level domain.
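A minimal sketch of such a function, under stated assumptions: `SUFFIXES` is a tiny hand-picked set for illustration, and `registrable_domain` is a hypothetical name. A real implementation would need a complete, maintained list such as the Public Suffix List, because suffixes like co.uk are longer than the TLD alone.

```python
# Hand-picked suffixes for illustration only; far from complete.
SUFFIXES = {"com", "net", "uk", "co.uk", "arpa", "in-addr.arpa"}


def registrable_domain(domain: str, suffixes=SUFFIXES) -> str:
    """Return the longest known suffix plus one label to its left."""
    labels = domain.lower().rstrip(".").split(".")
    # Candidates run from longest to shortest, so the first hit is the
    # longest matching suffix (co.uk is found before uk).
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            if i == 0:
                return candidate  # the name itself is a bare suffix
            return ".".join(labels[i - 1:])
    return domain  # unknown suffix: leave the name unchanged


print(registrable_domain("logs.netflix.com"))
print(registrable_domain("news.example.co.uk"))
print(registrable_domain("1.0.168.192.in-addr.arpa"))
```

With this set, *.co.uk names group per site rather than under co.uk, but reverse lookups are still only reduced to names like 192.in-addr.arpa, so the choice of suffix list decides how meaningful the grouping is.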