Two Pi-hole instances, One with "failed to send..."

Yes, cleaning the logs seems like a good idea.

sudo systemctl stop unbound
sudo rm /var/log/unbound/unbound.log
sudo touch /var/log/unbound/unbound.log
sudo chown unbound /var/log/unbound/unbound.log
sudo unbound-checkconf /etc/unbound/unbound.conf.d/pi-hole.conf
sudo systemctl start unbound
1 Like

Thanks :slight_smile: and done...

Hi again...

Had a go again :slight_smile:

Deleted and created a new Unbound.log file as Bucking_Horn suggested.

Three log files and one pcap file uploaded to same location as yesterday.

There are several incidents in pihole.log and FTL.log, I tried to use same time stamps in the unbound.log, but as before I am not sure what to look for. Each request in the unbound.log got many many entries with same time stamp like thousands...

Hope the pcap file is ok, got the message The capture file appears to have been cut short in the middle of a packet and for some reason the capturing is only between 15:19 and 16:25. However, there are incidents inside that timeframe so I hope it is OK.
Warm Regards - Steen

Okay, we got something now.

Feb  7 13:14:46 dnsmasq[3013]: query[A] sync.opera.com from 192.168.1.40
Feb  7 13:14:46 dnsmasq[3013]: cached-stale sync.opera.com is <CNAME>
Feb  7 13:14:46 dnsmasq[3014]: query[HTTPS] sync.opera.com from 192.168.1.40
Feb  7 13:14:46 dnsmasq[3013]: cached-stale nl.sync.opera.com is 185.26.182.111
Feb  7 13:14:46 dnsmasq[3014]: cached-stale sync.opera.com is <CNAME>
Feb  7 13:14:46 dnsmasq[3013]: cached-stale nl.sync.opera.com is 185.26.182.112
Feb  7 13:14:46 dnsmasq[3014]: cached nl.sync.opera.com is NODATA
Feb  7 13:14:47 dnsmasq[3014]: forwarded sync.opera.com to 127.0.0.1#5335
Feb  7 13:14:47 dnsmasq[3014]: dnssec-query[DS] opera.com to 127.0.0.1#5335
Feb  7 13:14:47 dnsmasq[3014]: reply opera.com is no DS
Feb  7 13:14:47 dnsmasq[3014]: validation result is INSECURE
Feb  7 13:14:47 dnsmasq[3014]: reply sync.opera.com is <CNAME>
Feb  7 13:14:47 dnsmasq[3014]: reply sync.geo.opera.com is <CNAME>
Feb  7 13:14:47 dnsmasq[3014]: reply nl.sync.opera.com is NODATA
Feb  7 13:14:47 dnsmasq[3013]: Failed at receiving length, data_sent: true
Feb  7 13:14:47 dnsmasq[3013]: failed to send TCP(read_write) packet: Connection prematurely closed by remote server

and, indeed unbound never sent the reply to the A query downstream:

Feb 07 13:14:46 unbound[1989:0] info: 127.0.0.1 sync.opera.com. HTTPS IN

...

Feb 07 13:14:46 unbound[1989:0] info: 127.0.0.1 sync.opera.com. A IN

...

Feb 07 13:14:47 unbound[1989:0] info: subnetcache operate: query sync.opera.com. HTTPS IN
Feb 07 13:14:47 unbound[1989:0] debug: reply has edns subnet (null)
Feb 07 13:14:47 unbound[1989:0] debug: mesh_run: subnetcache module exit state is module_finished
Feb 07 13:14:47 unbound[1989:0] debug: comm point stop listening 12
Feb 07 13:14:47 unbound[1989:0] debug: comm point start listening 12 (200 msec)
Feb 07 13:14:47 unbound[1989:0] debug: startlistening 12 mode w
Feb 07 13:14:47 unbound[1989:0] debug: query took 0.165177 sec
Feb 07 13:14:47 unbound[1989:0] info: 127.0.0.1 sync.opera.com. HTTPS IN NOERROR 0.165177 0 135
Feb 07 13:14:47 unbound[1989:0] info: mesh_run: end 1 recursion states (1 with reply, 0 detached), 1 waiting replies, 1240 recursion replies sent, 0 replies dropped, 0 states jostled out

...

Feb 07 13:14:47 unbound[1989:0] info: subnetcache operate: query sync.opera.com. A IN
Feb 07 13:14:47 unbound[1989:0] debug: reply has edns subnet (null)
Feb 07 13:14:47 unbound[1989:0] debug: mesh_run: subnetcache module exit state is module_finished
Feb 07 13:14:47 unbound[1989:0] info: mesh_run: end 0 recursion states (0 with reply, 0 detached), 0 waiting replies, 1241 recursion replies sent, 0 replies dropped, 0 states jostled out

just before the last four lines, we see that the cache is populated with

Feb 07 13:14:47 unbound[1989:0] info: chased extract ;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 0
;; flags: qr ra ; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0 
;; QUESTION SECTION:
sync.opera.com.	IN	A

;; ANSWER SECTION:
sync.opera.com.	60	IN	CNAME	sync.geo.opera.com.

;; AUTHORITY SECTION:

;; ADDITIONAL SECTION:
;; MSG SIZE  rcvd: 55

Feb 07 13:14:47 unbound[1989:0] info: chased extract ;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 0
;; flags: qr ra ; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0 
;; QUESTION SECTION:
sync.geo.opera.com.	IN	A

;; ANSWER SECTION:
sync.geo.opera.com.	120	IN	CNAME	nl.sync.opera.com.

;; AUTHORITY SECTION:

;; ADDITIONAL SECTION:
;; MSG SIZE  rcvd: 58

Feb 07 13:14:47 unbound[1989:0] info: chased extract ;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 0
;; flags: qr ra ; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0 
;; QUESTION SECTION:
nl.sync.opera.com.	IN	A

;; ANSWER SECTION:
nl.sync.opera.com.	60	IN	A	185.26.182.112
nl.sync.opera.com.	60	IN	A	185.26.182.111

;; AUTHORITY SECTION:

;; ADDITIONAL SECTION:
;; MSG SIZE  rcvd: 67

so unbound indeed has the necessary data to respond to your query.

This brings me to the final conclusion that is really an unbound bug. Despite being asked to resolve query sync.opera.com. A IN and having performed all the necessary steps and recursions, it falsely concludes the chain with "0 waiting replies", and, hence, does not think it has to reply.

Case closed, I'd say. This bug would need to be reported to unbound but I am not familiar with their forums/procedures. I guess everything they'd need can be found in this message I am just writing.

1 Like

Hi DL6ER,
Many thanks for all your efforts, investigations, very good explanations and suggestions!!!
I'll - just for the "fun" - start to use a couple of other browsers just to see if same would pop-up or I have tell the house not to use Opera :slight_smile:
I am also not sure how to report to Unbound, might poke a bit around their forum...
You have a great Friday and a lovely weekend!
Steen

1 Like

It looks like you open an Issue on their GitHub, and I guess include background and link back to DL6ER's post here for details.

1 Like

Hi,
I might not be the best to do, but I'll try.
Thanks for the hint!

1 Like

https://github.com/NLnetLabs/unbound/issues/1237

1 Like

please update anything I wrote as it is most likely not the best bug report.

Hi,
I got a long response at the Unbound forum. Unfortunately several of the questions - I can't with my limited knowledge - answer.
Perhaps if you or one of the others could assist me with answers?

I dont have the complete log to see what is happening but my initial thoughts is why does Unbound get TCP queries in the first place?

If this is not the environments configuration (everything over TCP), maybe only select queries are strictly over TCP, like the one from opera I see in the forum? But then the amount of said TCP traffic still pushes Unbound near its configured limits?

Also, I see that this is mostly dnsmasq talking to Unbound?

Maybe dnsmasq can be configured to send multiple queries over a single TCP connection?

They also suggest to bump incoming-number-tcp and see how it goes.

https://github.com/NLnetLabs/unbound/issues/1237#issuecomment-2648162871

Sorry, I haven't been able to answer before but I see you are already quite well discussing (and progressing?) with them.

1 Like