Thanks, I got all the files. Sadly, no immediate answer is apparent, however, while you were preparing the more verbose logs, I checked again on your pcap
file from above. Using the filter
dns && (dns.flags.response == 0) && ! dns.response_in
I reduced the shown packages to those who were sent to unbound
and were never replied to. Not many, but certainly there is an issue on unbound
's side here:
This matches the related log file perfectly because it also has the Connection prematurely closed by remote server
message for exactly these five queries. So the problem is really not with Pi-hole / dnsmasq
but with unbound
and the warning is actually correct.
I can confirm that in each case, unbound
terminated/closed the connection. In all working cases, however, dnsmasq
closed the connection on its side once the reply was received.
Sadly, me neither. The logs are very verbose (fair, this is what we asked for). Let me try to summarize one of the failed calls here:
Feb 04 11:49:59 unbound[1639:0] debug: startlistening 13 mode r
Feb 04 11:49:59 unbound[1639:0] debug: Reading tcp query of length 49
Feb 04 11:49:59 unbound[1639:0] debug: answer from the cache failed
Feb 04 11:49:59 unbound[1639:0] debug: tcp request from ip4 127.0.0.1 port 51536 (len 16)
Feb 04 11:49:59 unbound[1639:0] debug: mesh_run: start
Feb 04 11:49:59 unbound[1639:0] debug: subnetcache[module 0] operate: extstate:module_state_initial event:module_event_new
Feb 04 11:49:59 unbound[1639:0] info: subnetcache operate: query speeddials.opera.com. HTTPS IN
Feb 04 11:49:59 unbound[1639:0] debug: comm point stop listening 12
Feb 04 11:49:59 unbound[1639:0] debug: comm point start listening 12 (300 msec)
Feb 04 11:49:59 unbound[1639:0] debug: startlistening 12 mode r
Feb 04 11:49:59 unbound[1639:0] debug: Reading tcp query of length 49
Feb 04 11:49:59 unbound[1639:0] debug: answer from the cache failed
Feb 04 11:49:59 unbound[1639:0] debug: tcp request from ip4 127.0.0.1 port 51538 (len 16)
Feb 04 11:49:59 unbound[1639:0] debug: mesh_run: start
Feb 04 11:49:59 unbound[1639:0] debug: subnetcache[module 0] operate: extstate:module_state_initial event:module_event_new
Feb 04 11:49:59 unbound[1639:0] info: subnetcache operate: query speeddials.opera.com. A IN
... after a long time ...
Feb 04 11:50:00 unbound[1639:0] debug: serviced_delete
Feb 04 11:50:00 unbound[1639:0] debug: close of port 62962
Feb 04 11:50:00 unbound[1639:0] debug: comm_point_close of 12: event_del
Feb 04 11:50:00 unbound[1639:0] debug: close fd 12
and the A
lookup succeeded!
What happens to the lost son, the HTTPS
query for the same domain? Well, unbound
finishes to resolve this, too, but then:
Feb 04 11:50:00 unbound[1639:0] debug: comm point stop listening 13
Feb 04 11:50:00 unbound[1639:0] debug: comm point start listening 13 (300 msec)
Feb 04 11:50:00 unbound[1639:0] debug: startlistening 13 mode r
Feb 04 11:50:00 unbound[1639:0] debug: tcp channel read side closed 13
Feb 04 11:50:00 unbound[1639:0] debug: comm_point_close of 13: event_del
Feb 04 11:50:00 unbound[1639:0] debug: close fd 13
I am not sure what is happening here. Seems like unbound
thinks the connection was closed from dnsmasq
's end and, hence, never sends a reply.
To be fully honest with you: My lack of knowledge on how to read the (frankly speaking quite difficult to read) unbound
log will not bring us to determining why unbound
has closed the connection. I have not even been able to identify where/how unbound
logs having sent replies back to the original requestor (our dnsmasq
), I always ever see the requests going upstream and replies coming back from there.
Do you have
log-queries: yes
log-replies: yes
in your unbound.log
. If not, it may be worth to attempt a last try with this set, all the logs and - ideally - another PCAP recording. But if we cannot resolve it then, I'd we close on this as this is consuming quite some time on our both ends.