Log-queries=extra parsing

Hi,

I would very much like to include the correct parsing (and showing in "Recent Queries" WebGUI) of dnsmasq logs with the " log-queries = extra " option enabled

Below log fragment:
with "log-queries = extra"

Apr 19 09:44:42 dnsmasq[30501]: * 127.0.0.1/37601 reply . is DNSKEY keytag 14796, algo 8
Apr 19 09:44:42 dnsmasq[30501]: * 127.0.0.1/37601 reply . is DNSKEY keytag 19036, algo 8
Apr 19 09:44:42 dnsmasq[30501]: 1 127.0.0.1/37601 validation result is SECURE
Apr 19 09:44:56 dnsmasq[30501]: 2 127.0.0.1/54154 query[A] discourse.pi-hole.net from 127.0.0.1
Apr 19 09:44:56 dnsmasq[30501]: 2 127.0.0.1/54154 forwarded discourse.pi-hole.net to 8.8.4.4
Apr 19 09:44:56 dnsmasq[30501]: 2 127.0.0.1/54154 validation result is INSECURE
Apr 19 09:44:56 dnsmasq[30501]: 2 127.0.0.1/54154 reply discourse.pi-hole.net is 104.24.109.83
Apr 19 09:44:56 dnsmasq[30501]: 2 127.0.0.1/54154 reply discourse.pi-hole.net is 104.24.108.83

and without "log-queries = extra" just standart "log-queries"

Apr 19 09:42:51 dnsmasq[30244]: reply . is DNSKEY keytag 14796, algo 8
Apr 19 09:42:51 dnsmasq[30244]: reply . is DNSKEY keytag 19036, algo 8
Apr 19 09:42:51 dnsmasq[30244]: validation result is SECURE
Apr 19 09:42:59 dnsmasq[30244]: query[A] discourse.pi-hole.net from 127.0.0.1
Apr 19 09:42:59 dnsmasq[30244]: forwarded discourse.pi-hole.net to 8.8.4.4
Apr 19 09:42:59 dnsmasq[30244]: validation result is INSECURE
Apr 19 09:42:59 dnsmasq[30244]: reply discourse.pi-hole.net is 104.24.109.83
Apr 19 09:42:59 dnsmasq[30244]: reply discourse.pi-hole.net is 104.24.108.83

As you can see, this option adds some important fields such as the IP address ( this time local client 127.0.01 )and the query port, and most importantly lets you bind queries together with the responses through uniqe ID (intiger before client IP this time "2" or some times its " * ") this is necessary if you got lots of parallel queries.

Thank you very much and congratulations on the good work on this project !

We are aware of this issue, but I don't think the new FTL engine needs this, because of the way it interprets the log (which is very much different from how the current PHP API is doing it).

In addition, there is the problem of versions. dnsmasq has only recently been updated to 2.76 on Raspbian Jessie, but many users have still 2.72. Also, 2.76 might not be available on Debain VMs.
However, version 2.72 actually fails to start, throwing an error that the option log-queries=extra is unknown if it is specified.

Hence, I see no way it could be added to Pi-hole (at least not now and probably also not in 2017).

This not work on FTL engine version too.

We could always run a check for the installed version of dnsmasq in the install script, and if it exists, add the line. The issue would then be having two separate log-parsing methods. Not unfeasible, but may add unnecessary complexity.

As far as I see in the code of FTL, the file that contains the code that is responsible for the log parsing is > 800 lines of code in length. Most parts of it would now have to be duplicated in order to not group request and successive answer by the domain name, but by the ID.

I see no way to incorporate this is a way that does not drastically increase the complexity and hence the susceptibility to errors in this most critical part of the code. @DL6ER will surely agree.

Furthermore, I see other possible complications that is that on some (enterprise) systems, the size of the Pi-hole log can already now exceed millions of queries. Still, these systems can operate perfectly fine even on Pi hardware (where file size limitations are not to be ignored).

Note that we had a testing VM running (until yesterday) that was producing random queries at a rate of about 2'000 queries per second, summing up to more than 120 million queries a day. In a Monto-Carlo-fashion, I verified that the data parsed by FTL was correct and have not found a single false-positive entry in more than one week.

Before anyone of us should start thinking about how to implement this best, you should show us a practical situation where the log analysis of FTL fails to be accurate.

Pseudocode incoming:

IF(dnsmasqVersion > 2.76)
{
    ParseLogAndTakeIntoAccountTheExtraInformationIncluded();
}
else
{
   ParseLogInTheWayThatWeAlreadyDo();
}

Aside from running FTL on a daily basis, I've personally not looked too closely at the results, other than to note that they are FAR more accurate than the current PHP based parsing we do. But if it is accurate, then I agree that this is more of a low-priority "nice to have" rather than something that must be done.

So without that, how do you know which DNS answer is associated with which DNS query from logs

In my opinion it is impossible and practically useless without it.
This relationship is important for security reasons.

Isn't it trivial to implement this ? The only difference is extra two columns in log

Yes.

Have you even tried if it works or looked into the code before assuming that this is a task that is impossible to be done?

Since I assume the answer is No, I prepared an example:
Assume the following log excerpt. Note that the requests are made at the same time and the answer do not follow immediately. Even worse, the answers come in a different ordering that the requests have been made.

Apr 18 03:12:26 dnsmasq[22397]: query[A] 2.client-channel.google.com from 10.8.0.2
Apr 18 03:12:26 dnsmasq[22397]: forwarded 2.client-channel.google.com to 2001:1608:10:25::9249:d69b
Apr 18 03:12:26 dnsmasq[22397]: forwarded 2.client-channel.google.com to 2001:1608:10:25::1c04:b12f
Apr 18 03:12:26 dnsmasq[22397]: forwarded 2.client-channel.google.com to 2620:0:ccd::2
Apr 18 03:12:26 dnsmasq[22397]: forwarded 2.client-channel.google.com to 2620:0:ccc::2
Apr 18 03:12:26 dnsmasq[22397]: forwarded 2.client-channel.google.com to 2001:4860:4860::8844
Apr 18 03:12:26 dnsmasq[22397]: forwarded 2.client-channel.google.com to 2001:4860:4860::8888
Apr 18 03:12:26 dnsmasq[22397]: query[AAAA] play.googleapis.com from 192.168.2.229
Apr 18 03:12:26 dnsmasq[22397]: query[A] clients6.google.com from 10.8.0.2
Apr 18 03:12:26 dnsmasq[22397]: query[AAAA] play.googleapis.com from 192.168.2.229
Apr 18 03:12:26 dnsmasq[22397]: reply 2.client-channel.google.com is 64.233.181.189
Apr 18 03:12:26 dnsmasq[22397]: forwarded clients6.google.com to 2001:1608:10:25::1c04:b12f
Apr 18 03:12:26 dnsmasq[22397]: forwarded play.googleapis.com to 2001:1608:10:25::1c04:b12f
Apr 18 03:12:26 dnsmasq[22397]: reply clients6.google.com is <CNAME>
Apr 18 03:12:26 dnsmasq[22397]: reply clients.l.google.com is 64.233.181.102
Apr 18 03:12:26 dnsmasq[22397]: reply clients.l.google.com is 64.233.181.100
Apr 18 03:12:26 dnsmasq[22397]: reply clients.l.google.com is 64.233.181.139
Apr 18 03:12:26 dnsmasq[22397]: reply clients.l.google.com is 64.233.181.113
Apr 18 03:12:26 dnsmasq[22397]: reply clients.l.google.com is 64.233.181.101
Apr 18 03:12:26 dnsmasq[22397]: reply clients.l.google.com is 64.233.181.138
Apr 18 03:12:26 dnsmasq[22397]: reply play.googleapis.com is <CNAME>
Apr 18 03:12:26 dnsmasq[22397]: reply googleapis.l.google.com is 74.125.132.95
Apr 18 03:12:26 dnsmasq[22397]: reply googleapis.l.google.com is 173.194.74.95
Apr 18 03:12:26 dnsmasq[22397]: reply googleapis.l.google.com is 74.125.193.95
Apr 18 03:12:26 dnsmasq[22397]: reply googleapis.l.google.com is 74.125.202.95
Apr 18 03:12:26 dnsmasq[22397]: reply googleapis.l.google.com is 173.194.196.95
Apr 18 03:12:26 dnsmasq[22397]: reply googleapis.l.google.com is 173.194.197.95
Apr 18 03:12:26 dnsmasq[22397]: reply googleapis.l.google.com is 209.85.147.95
Apr 18 03:12:26 dnsmasq[22397]: reply googleapis.l.google.com is 209.85.200.95
Apr 18 03:12:26 dnsmasq[22397]: reply googleapis.l.google.com is 74.125.69.95
Apr 18 03:12:26 dnsmasq[22397]: reply googleapis.l.google.com is 74.125.70.95
Apr 18 03:12:26 dnsmasq[22397]: cached play.googleapis.com is <CNAME>
Apr 18 03:12:26 dnsmasq[22397]: reply play.googleapis.com is <CNAME>
Apr 18 03:12:26 dnsmasq[22397]: reply googleapis.l.google.com is 74.125.132.95
Apr 18 03:12:26 dnsmasq[22397]: reply googleapis.l.google.com is 173.194.74.95
Apr 18 03:12:26 dnsmasq[22397]: reply googleapis.l.google.com is 74.125.193.95
Apr 18 03:12:26 dnsmasq[22397]: reply googleapis.l.google.com is 74.125.202.95
Apr 18 03:12:26 dnsmasq[22397]: reply googleapis.l.google.com is 173.194.196.95
Apr 18 03:12:26 dnsmasq[22397]: reply googleapis.l.google.com is 173.194.197.95
Apr 18 03:12:26 dnsmasq[22397]: reply googleapis.l.google.com is 209.85.147.95
Apr 18 03:12:26 dnsmasq[22397]: reply googleapis.l.google.com is 209.85.200.95
Apr 18 03:12:26 dnsmasq[22397]: reply googleapis.l.google.com is 74.125.69.95
Apr 18 03:12:26 dnsmasq[22397]: reply googleapis.l.google.com is 74.125.70.95

When FTL analysis this portion of the log, it concludes that

  • 2.client-channel.google.com was a type A query originating from 10.8.0.2 which has been forwarded to 2001:1608:10:25::9249:d69b,2001:1608:10:25::1c04:b12f,2620:0:ccd::2,2620:0:ccc::2,2001:4860:4860::8844, and 2001:4860:4860::8888
  • play.googleapis.com was a type AAAA query originating from 192.168.2.229 which has been forwarded to only 2001:1608:10:25::1c04:b12f
  • clients6.google.com was a type A query originating from 10.8.0.2 which has been forwarded to only 2001:1608:10:25::1c04:b12f
  • the second play.googleapis.com query was also a type AAAA query originating from 192.168.2.229 but has not been forwarded, but was answered from the cache

Why should that be wrong or how could that be wrong when you have even more contemporaneous requests?

As you have seen in my example, FTL is designed such that is recognizes which request belongs to which action and/or answer by considering the domain. Changing that to rely on the ID is a major task, since a lot of code has to be re-written as suggested by @MrD.

Ok thx i understand this now for FTL but let me explain:

In more complicated env like my with proxy etc. when client ask proxy for www , proxy forward query to pihole without original client IP just with own proxy IP, without ID or UDP port its hard for me to know who really ask i must analyze (forwarded to existing external system) dns logs by "time" which isnt sure.

I usualy use PHP version. For this version someone could implement "2 coulmn shift" whit copy/paste existing code with trim/truncate or something :wink:

We accept pull requests :wink:

This will be phased out completely when FTL is launched soon™

I still don't really get what you mean. Could you show us an excerpt of your log that exemplifies your issue?