Communications error to 127.0.0.1#5335: timed out

Expected Behaviour:

I'm running Pi-hole in an Ubuntu 22.04 LXC on Proxmox, using it as DNS with unbound (installed via apt, as described in the official Pi-hole docs). It works mostly fine, except that some domains are not being resolved.

Actual Behaviour:

Trying to investigate, I ran

dig plex.tv @127.0.0.1 -p 5335

and the return was:

;; communications error to 127.0.0.1#5335: timed out
;; communications error to 127.0.0.1#5335: timed out
;; communications error to 127.0.0.1#5335: timed out

; <<>> DiG 9.18.18-0ubuntu0.22.04.2-Ubuntu <<>> plex.tv @127.0.0.1 -p 5335
;; global options: +cmd
;; no servers could be reached

I then checked the unbound service, and the result is:

* unbound.service - Unbound DNS server
     Loaded: loaded (/lib/systemd/system/unbound.service; enabled; vendor preset: enabled)
     Active: active (running) since Thu 2024-02-22 17:45:05 EET; 1 day 22h ago
       Docs: man:unbound(8)
    Process: 113 ExecStartPre=/usr/lib/unbound/package-helper chroot_setup (code=exited, status=0/SUCCESS)
    Process: 139 ExecStartPre=/usr/lib/unbound/package-helper root_trust_anchor_update (code=exited, status=0/SUCCESS)
    Process: 18587 ExecReload=/bin/kill -HUP $MAINPID (code=exited, status=0/SUCCESS)
   Main PID: 149 (unbound)
      Tasks: 1 (limit: 76799)
     Memory: 24.4M
        CPU: 27.769s
     CGroup: /system.slice/unbound.service
             `-149 /usr/sbin/unbound -d -p

Debug Token:

https://tricorder.pi-hole.net/Lq6MAzI1/

Any advice would help!

What machine was that dig command run from?

Run from your Pi-hole host machine, please share the output of:

sudo unbound-checkconf
sudo grep -v '#\|^$' -R /etc/unbound/unbound.conf.d

this outputs:

unbound-checkconf: no errors in /etc/unbound/unbound.conf

output of sudo grep -v '#\|^$' -R /etc/unbound/unbound.conf.d:

/etc/unbound/unbound.conf.d/root-auto-trust-anchor-file.conf:server:
/etc/unbound/unbound.conf.d/root-auto-trust-anchor-file.conf:    auto-trust-anchor-file: "/var/lib/unbound/root.key"
/etc/unbound/unbound.conf.d/pi-hole.conf:server:
/etc/unbound/unbound.conf.d/pi-hole.conf:    verbosity: 0
/etc/unbound/unbound.conf.d/pi-hole.conf:    interface: 127.0.0.1
/etc/unbound/unbound.conf.d/pi-hole.conf:    port: 5335
/etc/unbound/unbound.conf.d/pi-hole.conf:    do-ip4: yes
/etc/unbound/unbound.conf.d/pi-hole.conf:    do-udp: yes
/etc/unbound/unbound.conf.d/pi-hole.conf:    do-tcp: yes
/etc/unbound/unbound.conf.d/pi-hole.conf:    do-ip6: no
/etc/unbound/unbound.conf.d/pi-hole.conf:    prefer-ip6: no
/etc/unbound/unbound.conf.d/pi-hole.conf:    root-hints: "/var/lib/unbound/root.hints"
/etc/unbound/unbound.conf.d/pi-hole.conf:    harden-glue: yes
/etc/unbound/unbound.conf.d/pi-hole.conf:    harden-dnssec-stripped: yes
/etc/unbound/unbound.conf.d/pi-hole.conf:    use-caps-for-id: no
/etc/unbound/unbound.conf.d/pi-hole.conf:    edns-buffer-size: 1232
/etc/unbound/unbound.conf.d/pi-hole.conf:    prefetch: yes
/etc/unbound/unbound.conf.d/pi-hole.conf:    num-threads: 1
/etc/unbound/unbound.conf.d/pi-hole.conf:    so-rcvbuf: 1m
/etc/unbound/unbound.conf.d/pi-hole.conf:    private-address: 192.168.0.0/16
/etc/unbound/unbound.conf.d/pi-hole.conf:    private-address: 169.254.0.0/16
/etc/unbound/unbound.conf.d/pi-hole.conf:    private-address: 172.16.0.0/12
/etc/unbound/unbound.conf.d/pi-hole.conf:    private-address: 10.0.0.0/8
/etc/unbound/unbound.conf.d/pi-hole.conf:    private-address: fd00::/8
/etc/unbound/unbound.conf.d/pi-hole.conf:    private-address: fe80::/10

dig command was run on the same host as the pihole and unbound.

Those results look normal.

127.0.0.1 would only work on the same (virtual) machine.

Did you run it from within the VM that runs your Pi-hole and unbound?
And did you install unbound in the very same VM as Pi-hole?

The answer is YES to all of your questions above. And it does work, with some exceptions. So far I found 2 domain names that are not resolved, but I'm sure there could be more.

Your dig results show that the DNS requests do not make it to your unbound.

If Pi-hole works, that is likely because you've configured Pi-hole to use Google's DNS alongside unbound:

   server=8.8.8.8
   server=127.0.0.1#5335

I did that just as a temporary fix. If you think it would be useful, I could generate a new set of logs with Google removed.

As it's just a matter of changing Pi-hole's upstreams, you could simply confirm whether Pi-hole can reach your unbound. Based on your dig results from the same VM, I'd expect removing 8.8.8.8 to make DNS resolution fail.

Run from within your Pi-hole VM, please share the output of:

dig pi.hole @127.0.0.1
dig flurry.com @127.0.0.1
dig flurry.com @127.0.0.1 -p 5335

Which of those requests registers in Pi-hole's Query Log?

root@pihole:~# dig pi.hole @127.0.0.1

; <<>> DiG 9.18.18-0ubuntu0.22.04.2-Ubuntu <<>> pi.hole @127.0.0.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 25305
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;pi.hole.                       IN      A

;; ANSWER SECTION:
pi.hole.                0       IN      A       127.0.0.1

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1) (UDP)
;; WHEN: Sat Feb 24 22:00:45 EET 2024
;; MSG SIZE  rcvd: 52

root@pihole:~# dig flurry.com @127.0.0.1

; <<>> DiG 9.18.18-0ubuntu0.22.04.2-Ubuntu <<>> flurry.com @127.0.0.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 31364
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;flurry.com.                    IN      A

;; ANSWER SECTION:
flurry.com.             2       IN      A       0.0.0.0

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1) (UDP)
;; WHEN: Sat Feb 24 22:01:12 EET 2024
;; MSG SIZE  rcvd: 55

root@pihole:~# dig flurry.com @127.0.0.1 -p 5335

; <<>> DiG 9.18.18-0ubuntu0.22.04.2-Ubuntu <<>> flurry.com @127.0.0.1 -p 5335
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 27307
;; flags: qr rd ra; QUERY: 1, ANSWER: 8, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;flurry.com.                    IN      A

;; ANSWER SECTION:
flurry.com.             188     IN      A       44.228.206.170
flurry.com.             188     IN      A       13.50.184.192
flurry.com.             188     IN      A       34.225.127.72
flurry.com.             188     IN      A       13.49.212.207
flurry.com.             188     IN      A       13.251.69.97
flurry.com.             188     IN      A       18.136.37.69
flurry.com.             188     IN      A       34.213.101.254
flurry.com.             188     IN      A       54.161.105.65

;; Query time: 0 msec
;; SERVER: 127.0.0.1#5335(127.0.0.1) (UDP)
;; WHEN: Sat Feb 24 22:01:48 EET 2024
;; MSG SIZE  rcvd: 167

The only one registered in the Query Log was dig flurry.com @127.0.0.1, and it was blocked.

All those results look good, indicating that your setup works as intended.

Does dig plex.tv @127.0.0.1 also resolve successfully now?

When did you see those timeouts/'no servers could be reached' messages?
How often?

the result for plex.tv is now:

root@pihole:~# dig plex.tv @127.0.0.1

; <<>> DiG 9.18.18-0ubuntu0.22.04.2-Ubuntu <<>> plex.tv @127.0.0.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 29909
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;plex.tv.                       IN      A

;; Query time: 35 msec
;; SERVER: 127.0.0.1#53(127.0.0.1) (UDP)
;; WHEN: Sat Feb 24 22:12:45 EET 2024
;; MSG SIZE  rcvd: 36

The timeouts came in every now and then. I checked in about 6-7 times today.
Not sure if related or not, but I also often see this warning in diagnosis logs ( value varies, not always 1.5 ):

Long-term load (15min avg) larger than number of processors: 1.5 > 1
This may slow down DNS resolution and can cause bottlenecks.

LE: and I just got another timeout:

root@pihole:~# dig plex.tv @127.0.0.1 -p 5335
;; communications error to 127.0.0.1#5335: timed out
;; communications error to 127.0.0.1#5335: timed out
;; communications error to 127.0.0.1#5335: timed out

; <<>> DiG 9.18.18-0ubuntu0.22.04.2-Ubuntu <<>> plex.tv @127.0.0.1 -p 5335
;; global options: +cmd
;; no servers could be reached

So far, your results demonstrate that it is your unbound that is unreachable at times, making this more of a networking/unbound issue.

DNS is usually light on CPU, unless you are dealing with a high number of clients and/or DNS requests. By chance, your debug log shows a count of ~64,000 DNS requests during the last 24 hours, or ~2,700 per hour - not enough to break a sweat in Pi-hole, provided they did not all occur in the same split second.

It's unlikely that that happened, though, as there are no respective warnings in your debug log.

Nevertheless, if you see sudden spikes of DNS requests, it may be worth chasing down the client sending them, in order to change its behaviour (if possible).

Likely, the high load is caused by some other process on your machine, usually only temporary, e.g. during an OS update.

Do unbound's timeouts always correlate with times of high loads then?
If so, it may be worth tracking down the process that's causing the load.
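
One way to check that correlation is to log the load average next to the outcome of each probe. A minimal sketch in POSIX shell, assuming dig is available on the Pi-hole host; the function names dig_status and watch_unbound, the 60-second interval and the log layout are made up for illustration:

```shell
# Classify dig output read from stdin: prints "timeout", the DNS status
# from the header line (e.g. NOERROR, SERVFAIL), or "unknown".
dig_status() {
    out=$(cat)
    if printf '%s\n' "$out" | grep -q 'no servers could be reached'; then
        echo timeout
        return
    fi
    status=$(printf '%s\n' "$out" |
        sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1)
    echo "${status:-unknown}"
}

# Hypothetical helper: probe unbound once per interval and print
# "<time> load15=<value> <status>" so the two can be compared later.
watch_unbound() {
    domain=${1:-plex.tv}
    interval=${2:-60}
    while true; do
        # Third field of /proc/loadavg is the 15-minute average.
        read -r load1 load5 load15 rest < /proc/loadavg
        printf '%s load15=%s %s\n' "$(date +%T)" "$load15" \
            "$(dig "$domain" @127.0.0.1 -p 5335 2>&1 | dig_status)"
        sleep "$interval"
    done
}
```

Running something like watch_unbound plex.tv 60 >> unbound-watch.log for a few hours should show whether the timeout lines cluster around high load15 values or appear regardless of load.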

xyboox, I had a similar problem with a number of domains a while back. My ISP, Virgin Media, provides a router that breaks DNSSEC TCP replies. Until I could put their router into modem mode and use my own router, I used a file /etc/dnsmasq.d/99-my-settings.conf containing:
server=/*.tv/1.0.0.1
and then restarted DNS. This makes Pi-hole forward queries for any *.tv domain to 1.0.0.1 instead of to your unbound.
If this is the cause of your problem, then a query for twitch.tv should fail with the same error.

So far, your results demonstrate that it is your unbound that is unreachable at times, making this more of a networking/unbound issue.

This, I believe, lies at the core of the issue. In fact, I discovered that plex.tv is not the only domain failing; there are more: twitch.tv, debian.be, newpharma.be. All of these return ;; communications error to 127.0.0.1#5335: timed out regardless of the load on the VM. As for the load: the maximum number of DNS queries I have seen is indeed in the range of 60k per 24h.

Are there any parameters I could tweak in this regard? Could this be the cause of the communication errors? If so, why is it limited to some (but always the same) domain names? Something seems off here...

Thanks Moto. My ISP equipment is already in bridge mode, and I'm using pfSense for DHCP, VLANs and firewall. twitch.tv is indeed one of the unresolved domains, but it is not the only one; see my previous answer. However, I'd like to investigate the DNSSEC replies you mentioned further. How did you figure out that issue? How can I check on my side?

Is your unbound installed within pfSense? Hmm. It's been a while since I used pfSense. I think it has an automatic setting for EDNS Buffer Size. You could try setting that to 1232, as recommended in the Pi-hole unbound documentation. pfSense recommends a value of 1432 if you have fragmentation reassembly problems.
My money says you have the same problem resolving gov.uk which started my search for a solution.

I've set EDNS to 1232 in /etc/dnsmasq.d/99-edns.conf, but I'm not sure whether that file is actually being used. I just created it (as per the Pi-hole unbound docs), but there is no further instruction on where to add the file itself (is anything reading it?).
You're right, gov.uk is on the unresolved "side" too.
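
On Pi-hole v5, files in /etc/dnsmasq.d are read by the embedded dnsmasq when pihole-FTL starts (assuming the stock conf-dir=/etc/dnsmasq.d line in /etc/dnsmasq.conf is intact), so a newly created file only takes effect after a restart of the resolver. A quick sketch to confirm the setting is present; dnsmasq_setting is a hypothetical helper, and the default directory is an assumption about a standard install:

```shell
# Print every non-comment line that sets the given dnsmasq option,
# prefixed with the file it comes from.
dnsmasq_setting() {
    key=$1
    dir=${2:-/etc/dnsmasq.d}
    grep -R -H "^[[:space:]]*${key}=" "$dir" 2>/dev/null
}

# Usage on the Pi-hole host:
#   dnsmasq_setting edns-packet-max
#   sudo pihole restartdns      # restart so a new file is picked up
```

If the first command prints nothing, the option is not set in any file in that directory. Note that this setting only affects the Pi-hole side; unbound's own edns-buffer-size is already 1232 in the pi-hole.conf shown earlier.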

LE: Does it matter if in pfSense I'm not using DNS Resolver but DNS Forwarder?

There is no need to tweak anything here, especially since it would seem that your peak load observations are not tied to your lookup failures.

As mentioned, 60k per day is not an issue for Pi-hole.
A client would need to exceed 1,000 requests per minute to trigger Pi-hole's rate limit (which is a safety measure, not a performance one).
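
For a sense of scale, here is a back-of-the-envelope calculation with the figures from this thread. It assumes the ~64,000 queries were spread evenly over 24 hours, which real traffic never is, and it compares the total across all clients against a limit that applies per client:

```shell
queries_per_day=64000          # count taken from the debug log
rate_limit_per_minute=1000     # per-client threshold mentioned above

# Integer division is good enough for a rough comparison.
per_hour=$((queries_per_day / 24))
per_minute=$((queries_per_day / (24 * 60)))

echo "average: ${per_hour}/hour, ${per_minute}/minute"
echo "rate limit is roughly $((rate_limit_per_minute / per_minute))x the average"
```

Even if all of that traffic came from a single client, the average would sit well over an order of magnitude below the rate limit.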

I can provoke a similar output from unbound (for any domain) if I block its access to public DNS servers in my router's firewall, e.g.:

$ dig www.google.com @127.0.1.1 -p 5335

; <<>> DiG 9.11.5-P4-5.1+deb10u10-Raspbian <<>> www.google.com @127.0.1.1 -p 5335
;; global options: +cmd
;; connection timed out; no servers could be reached

As this only happens for some domains in your case, it may suggest that something upstream of your unbound interferes with access to some of the authoritative DNS servers unbound is recursing through.

I don't think those timeouts are DNSSEC related, but DNSSEC may play a role in the SERVFAIL results.
If dnsutils or bind9-dnsutils is installed, the Domain Entity Lookup and Validation tool delv may provide more insight into why DNSSEC validation fails, e.g. try something like:

delv @127.0.0.1 -p 5335 +rtrace plex.tv

this outputs:

root@pihole:~# delv @127.0.0.1 -p 5335 +rtrace plex.tv
;; fetch: plex.tv/A
;; resolution failed: SERVFAIL
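
A SERVFAIL from a validating resolver can mean either that DNSSEC validation failed or that the resolver could not obtain an answer at all. One common way to tell the two apart is to repeat the query with checking disabled (dig's +cd flag): if that returns an answer while the normal query fails, validation is the likely culprit. A rough sketch; dnssec_hint is a hypothetical helper, and its verdicts are only a rule of thumb:

```shell
# Interpret the outcome of a normal query versus the same query sent
# with +cd. Arguments are dig statuses such as NOERROR or SERVFAIL,
# or the word "timeout" if no reply arrived.
dnssec_hint() {
    normal=$1
    unchecked=$2
    case "$normal/$unchecked" in
        NOERROR/*)          echo "ok" ;;            # nothing to diagnose
        SERVFAIL/NOERROR)   echo "validation" ;;    # data exists, fails DNSSEC
        SERVFAIL/SERVFAIL)  echo "transport" ;;     # fails even unvalidated
        timeout/*)          echo "transport" ;;     # resolver gave no reply
        *)                  echo "inconclusive" ;;
    esac
}

# Usage on the Pi-hole host (the status is the value after "status:"
# in dig's header line):
#   dig plex.tv @127.0.0.1 -p 5335
#   dig plex.tv @127.0.0.1 -p 5335 +cd
#   dig plex.tv @127.0.0.1 -p 5335 +tcp    # same query forced over TCP
#   dnssec_hint SERVFAIL NOERROR
```

A "transport" verdict would point back at something between unbound and the authoritative servers rather than at DNSSEC itself.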

Here's a new development that might help the diagnosis: I've enabled the DNS Resolver in pfSense, which also uses unbound, and set DNS Resolution Behavior to use local DNS and then fall back to remote (set to the IP of the Pi-hole). Then I did a DNS lookup for plex.tv in the pfSense UI. It failed for both local and remote DNS. This suggests it is not a case of something upstream blocking unbound, at least not within my network. But it also doesn't seem to come from my ISP (since plex.tv does resolve when using Google's DNS servers).

Does this tell us anything?