Facebook, WhatsApp Web, Messenger Kids not working with unbound recursive resolver

After following the instructions for setting up unbound in the “Pi-hole as All-Around DNS Solution” guide, Facebook, WhatsApp Web, Messenger Kids, and Hulu don’t resolve reliably: requests for those domains resolve on some devices but not others.

Expected Behaviour:

Consistently resolve facebook.com, web.whatsapp.com, Messenger Kids, and Hulu domains through unbound for all devices connected to the local network. (Local network is set to use Pi-hole as DNS. Pi-hole is set to use unbound as a recursive resolver.)

Actual Behaviour:

Most domains resolve fine, but Hulu, Facebook, and the domains used by Messenger Kids, WhatsApp Web, and WhatsApp Desktop do not.

When those domains fail to resolve, folks on the network see “Can’t connect to this site” errors in web browsers and “Not connected to the Internet” errors in mobile and desktop apps that request the unresolved domains.

Per guidance in other threads, I’ve run dig from the Pi-hole to gather further information on how requests are resolving from the Pi-hole upstream.

“Vanilla” dig

> $ dig web.whatsapp.com
> 
> ; <<>> DiG 9.11.5-P4-5.1-Raspbian <<>> web.whatsapp.com
> ;; global options: +cmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 61257
> ;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
> 
> ;; OPT PSEUDOSECTION:
> ; EDNS: version: 0, flags:; udp: 4096
> ;; QUESTION SECTION:
> ;web.whatsapp.com.		IN	A
> 
> ;; ANSWER SECTION:
> web.whatsapp.com.	3390	IN	CNAME	mmx-ds.cdn.whatsapp.net.
> mmx-ds.cdn.whatsapp.net. 60	IN	A	31.13.71.49
> 
> ;; Query time: 12 msec
> ;; SERVER: 127.0.0.1#53(127.0.0.1)
> ;; WHEN: Tue Oct 08 23:51:01 EDT 2019
> ;; MSG SIZE  rcvd: 98

dig via port 5353

> $ dig web.whatsapp.com -p5353
> ;; Warning: Message parser reports malformed message packet.
> 
> ; <<>> DiG 9.11.5-P4-5.1-Raspbian <<>> web.whatsapp.com -p5353
> ;; global options: +cmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 21112
> ;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
> 
> ;; OPT PSEUDOSECTION:
> ; EDNS: version: 0, flags:; udp: 1472
> ;; QUESTION SECTION:
> ;web.whatsapp.com.		IN	A
> 
> ;; ANSWER SECTION:
> web.whatsapp.com.	53	IN	CNAME	mmx-ds.cdn.whatsapp.net.
> mmx-ds.cdn.whatsapp.net. 28731	RESERVED0 A	\# 4 1F0D4731
> 
> ;; Query time: 2 msec
> ;; SERVER: 127.0.0.1#5353(127.0.0.1)
> ;; WHEN: Tue Oct 08 23:51:08 EDT 2019
> ;; MSG SIZE  rcvd: 98

dig to @1.1.1.1

> $ dig web.whatsapp.com @1.1.1.1
> 
> ; <<>> DiG 9.11.5-P4-5.1-Raspbian <<>> web.whatsapp.com @1.1.1.1
> ;; global options: +cmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 49098
> ;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0
> 
> ;; QUESTION SECTION:
> ;web.whatsapp.com.		IN	A
> 
> ;; ANSWER SECTION:
> web.whatsapp.com.	49	IN	CNAME	mmx-ds.cdn.whatsapp.net.
> mmx-ds.cdn.whatsapp.net. 49	IN	A	31.13.71.49
> 
> ;; Query time: 1 msec
> ;; SERVER: 1.1.1.1#53(1.1.1.1)
> ;; WHEN: Tue Oct 08 23:51:12 EDT 2019
> ;; MSG SIZE  rcvd: 87

Current workaround (doesn’t solve WhatsApp problems)

Activating the two OpenDNS IPv4 servers in the settings appears to address the issues accessing Facebook, Messenger Kids, and Hulu. (I’m continuing to watch results for those sites to verify the fix sticks.) Issues accessing WhatsApp Web and WhatsApp Desktop persist.

I chose to add the OpenDNS servers because OpenDNS is marked as supporting ECS, though I assume that this setup doesn’t run the ECS requests through unbound, based on this reference to unbound & ECS on Reddit. While that’s how I’d prefer to set things up, I don’t understand the information on configuring ECS in the unbound man pages well enough to try that approach.

Debug Token:

https://tricorder.pi-hole.net/8a3hdqtyl2

Can you provide a copy of the configuration you are using for unbound?

Unbound asks the authoritative servers directly. That is different from what the other servers shown here do.

Try changing the EDNS buffer size to 512.

    # Reduce EDNS reassembly buffer size.
    # Suggested by the unbound man page to reduce fragmentation reassembly problems
    edns-buffer-size: 512

Also, you are trying to find the closest CDN servers for those companies, and your ISP knows that best. It may even host those CDN servers inside its own network.

You can tell unbound to ask only the DNS server of your ISP or VPN provider for the IP addresses of those CDN servers.

All other names are still looked up with QNAME minimisation.

You can add as many of these lines as you want to the unbound configuration:

forward-zone: name: "facebook.com"  forward-addr: 1.1.1.1
forward-zone: name: "facebook.net"  forward-addr: 1.1.1.1
forward-zone: name: "whatsapp.com"  forward-addr: 1.1.1.1

1.1.1.1 is an example and you should use the DNS server from your ISP or VPN provider.

According to the unbound configuration manual, a value of 512 may not be the best option.

edns-buffer-size: <number>
              Number of bytes size to advertise as the EDNS reassembly buffer
              size. This is the value put into datagrams over UDP towards
              peers. The actual buffer size is determined by msg-buffer-size
              (both for TCP and UDP). Do not set higher than that value.
              Default is 4096 which is RFC recommended. If you have
              fragmentation reassembly problems, usually seen as timeouts,
              then a value of 1472 can fix it. Setting to 512 bypasses even
              the most stringent path MTU problems, but is seen as extreme,
              since the amount of TCP fallback generated is excessive
              (probably also for this resolver, consider tuning the outgoing
              tcp number).

I posted earlier about using an EDNS buffer size of 512:

https://discourse.pi-hole.net/t/interesting-info-by-lets-encrypt-on-using-512-edns-size-in-unbound/14558

It could shed more light on this problem. I could find only one reference to RESERVED0, and that was due to fragmentation. I am in the dark on this, so every option has to be taken into account.
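One way to see how often the malformed answer shows up is to repeat the lookup and count the bad responses. The following is only a minimal sketch, assuming a POSIX shell and unbound listening on 127.0.0.1 port 5353 as in the dig output above. `count_malformed` is a hypothetical helper name, and the `DIG` variable exists only so that another command can be substituted for dig. Repeated queries may simply return the same cached record, so a steady count says little about how often the upstream lookup itself goes wrong.

```shell
#!/bin/sh
# Command used for lookups; override DIG to substitute another command.
DIG=${DIG:-dig}

# Hypothetical helper: repeat a lookup against the local unbound and count
# responses that carry the parser warning or the RESERVED0 class.
# Usage: count_malformed <name> [tries] [port]
count_malformed() {
    name=$1
    tries=${2:-20}
    port=${3:-5353}
    bad=0
    i=0
    while [ "$i" -lt "$tries" ]; do
        # Merge stderr into stdout so the warning is caught wherever it is printed.
        if "$DIG" -p "$port" @127.0.0.1 "$name" A 2>&1 |
            grep -Eq 'malformed|RESERVED0'; then
            bad=$((bad + 1))
        fi
        i=$((i + 1))
    done
    echo "$bad of $tries responses looked malformed"
}

# Example (not run here): count_malformed web.whatsapp.com 20
```

The function only counts matching responses; it does not flush unbound's cache between attempts.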

For comparison purposes, I ran the same command using Buster and unbound and did not have the same result. Ignore the TTL difference - I have a minimum TTL configured on this install of unbound.

dig web.whatsapp.com -p5353
; <<>> DiG 9.10.3-P4-Raspbian <<>> web.whatsapp.com -p5353
;; global options: +cmd

;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 52360
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1472

;; QUESTION SECTION:
;web.whatsapp.com. IN A

;; ANSWER SECTION:
web.whatsapp.com. 3600 IN CNAME mmx-ds.cdn.whatsapp.net.
mmx-ds.cdn.whatsapp.net. 3600 IN A 157.240.19.53

;; Query time: 288 msec
;; SERVER: 127.0.0.1#5353(127.0.0.1)
;; WHEN: Wed Oct 09 10:37:39 CDT 2019
;; MSG SIZE rcvd: 98

The difference is in this line from your dig result:

> mmx-ds.cdn.whatsapp.net. 28731 RESERVED0 A \# 4 1F0D4731
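The record data in that line can be decoded by hand. `\# 4 1F0D4731` is the generic notation for record data that could not be rendered in its usual form: a length of 4 bytes followed by the bytes in hex. RESERVED0 appears to be how dig labels a class field of 0 instead of IN. The sketch below, assuming a POSIX shell, converts the hex bytes to a dotted-quad address; `hex_to_ipv4` is a hypothetical helper name, not part of dig or unbound.

```shell
#!/bin/sh
# Hypothetical helper: turn the 8 hex digits that dig prints after "\# 4"
# into a dotted-quad IPv4 address.
hex_to_ipv4() {
    hex=$1
    # Reject input containing anything other than hex digits.
    case $hex in
        *[!0-9A-Fa-f]*) echo "not hex: $hex" >&2; return 1 ;;
    esac
    # An IPv4 address is 4 bytes, i.e. exactly 8 hex digits.
    if [ "${#hex}" -ne 8 ]; then
        echo "expected 8 hex digits, got ${#hex}" >&2
        return 1
    fi
    # Split into four bytes and let printf convert each from hex to decimal.
    b1=$(printf '%s' "$hex" | cut -c1-2)
    b2=$(printf '%s' "$hex" | cut -c3-4)
    b3=$(printf '%s' "$hex" | cut -c5-6)
    b4=$(printf '%s' "$hex" | cut -c7-8)
    printf '%d.%d.%d.%d\n' "0x$b1" "0x$b2" "0x$b3" "0x$b4"
}

hex_to_ipv4 1F0D4731   # prints 31.13.71.49
```

That is the same address the other two dig runs returned, which suggests the address bytes arrived intact and the damage is in the class and TTL fields of the record.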

As noted by @DanSchaper, we need to look at your unbound configuration. A quick way to see what’s in the configuration directories is with this command:

sudo grep -v '#\|^$' -R /etc/unbound/unbound.conf.d

@DanSchaper, thanks for such a quick reply. Here’s the unbound config file.

server:
    # If no logfile is specified, syslog is used
    # logfile: "/var/log/unbound/unbound.log"
    verbosity: 0
    # log-queries: yes

    port: 5353
    do-ip4: yes
    do-udp: yes
    do-tcp: yes

    # May be set to yes if you have IPv6 connectivity
    do-ip6: no

    # Use this only when you downloaded the list of primary root servers!
    root-hints: "/var/lib/unbound/root.hints"

    # Trust glue only if it is within the servers authority
    harden-glue: yes

    # Require DNSSEC data for trust-anchored zones, if such data is absent, the zone becomes BOGUS
    harden-dnssec-stripped: yes

    # Don't use Capitalization randomization as it known to cause DNSSEC issues sometimes
    # see https://discourse.pi-hole.net/t/unbound-stubby-or-dnscrypt-proxy/9378 for further details
    use-caps-for-id: no

    # Reduce EDNS reassembly buffer size.
    # Suggested by the unbound man page to reduce fragmentation reassembly problems
    edns-buffer-size: 1472

    # Perform prefetching of close to expired message cache entries
    # This only applies to domains that have been frequently queried
    prefetch: yes

    # One thread should be sufficient, can be increased on beefy machines. In reality for most users running on small networks or on a single machine it should be unnecessary to seek performance enhancement by increasing num-threads above 1.
    num-threads: 1

    # Ensure kernel buffer is large enough to not lose messages in traffic spikes
    so-rcvbuf: 1m

    # Ensure privacy of local IP ranges
    private-address: 192.168.0.0/16
    private-address: 169.254.0.0/16
    private-address: 172.16.0.0/12
    private-address: 10.0.0.0/8
    private-address: fd00::/8
    private-address: fe80::/10

@msatter and @jfb, thanks for the discussion on buffer size. Also, thanks @msatter for the example forward-zone and forward-addr directives and @jfb for the comparison dig.

Here’s the current edns-buffer-size:
edns-buffer-size: 1472

It appears that timezones are forcing some space into our conversation. Thanks very much to all of you for your patience with that.

Please let me know if there’s something in the config file that points to a way of getting the domains to resolve through unbound without forwarding to another DNS server. From what I understand of the thread so far, reducing edns-buffer-size may help but generates more TCP fallback than is recommended (excessive, per unbound’s man page).

I’d love to hear further if there’s a way I can better tune my unbound (or perhaps other pi-hole) settings so that all DNS requests run through unbound resolve properly.

Updated workaround

I’ve updated my workaround based on @msatter’s forward-zone recommendation, adding forwards to the zones for Facebook, WhatsApp, Hulu and their CDNs. Based on a couple days’ testing, this addresses the end-user issue of “why can’t I get to Facebook, WhatsApp, etc.?”

# Forward facebook, whatsapp, hulu domains
forward-zone: name: "facebook.com" forward-addr: 1.1.1.1
forward-zone: name: "facebook.net" forward-addr: 1.1.1.1
forward-zone: name: "fbcdn.net" forward-addr: 1.1.1.1
forward-zone: name: "akamaihd.net" forward-addr: 1.1.1.1
forward-zone: name: "whatsapp.com" forward-addr: 1.1.1.1
forward-zone: name: "whatsapp.net" forward-addr: 1.1.1.1
forward-zone: name: "hulu.com" forward-addr: 1.1.1.1
forward-zone: name: "hulustream.com" forward-addr: 1.1.1.1
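For reference, unbound config files more commonly spell each forward-zone out over several lines. The sketch below, assuming a POSIX shell, prints one stanza per zone in that layout so a longer list is easier to maintain. `print_forward_zones` is a hypothetical helper name, and 1.1.1.1 is still only the example upstream from the workaround above.

```shell
#!/bin/sh
# Hypothetical helper: print one forward-zone stanza per zone name.
# Usage: print_forward_zones <forward-addr> <zone> [<zone> ...]
print_forward_zones() {
    addr=$1
    shift
    for zone in "$@"; do
        printf 'forward-zone:\n'
        printf '    name: "%s"\n' "$zone"
        printf '    forward-addr: %s\n' "$addr"
    done
}

# The zones from the workaround above, forwarded to the example upstream.
print_forward_zones 1.1.1.1 \
    facebook.com facebook.net fbcdn.net akamaihd.net \
    whatsapp.com whatsapp.net hulu.com hulustream.com
```

The output could be redirected into a file under /etc/unbound/unbound.conf.d/ (the file name is up to you, and this assumes the main unbound.conf includes that directory). Running unbound-checkconf before restarting unbound should catch syntax errors.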

Privacy cost to workaround?

If I understand this approach properly, it comes at a privacy cost. While Cloudflare promises not to track or sell information on requests to 1.1.1.1, I assume that forwards from unbound to an upstream DNS server are sent in the clear like “normal” DNS requests.

Have I understood that properly?

You can also encrypt that traffic with unbound, as shown in this thread:

https://discourse.pi-hole.net/t/slow-cloudflare-response-times-when-using-unbound-dns-over-tls/24497

You also have to specify the location of the cert.pem file. See the example unbound config file on this page:
https://dnsprivacy.org/wiki/m/mobile.action#page/1278020

forward-zone: name: "facebook.com" forward-addr: 1.1.1.1@853#cloudflare-dns.com forward-tls-upstream: yes

Thanks for the update on encrypting traffic when forwarding.

I haven’t executed it successfully yet because I’m not sure what .pem file to reference. The cert bundles on my machine don’t match the example in https://dnsprivacy.org/wiki/m/mobile.action#page/1278020. Searching for info on what .pem to use for a secure connection with cloudflare-dns.com, I’ve come up empty. Where would you recommend I look for additional information on what .pem to use?

Thanks very much for all your help. As your previous response on forwarding requests addressed the original problem I raised, I’ve marked that response as the solution.

It is strange to use a .pem file here. What you need is the ca-certificates.crt file that ships with your Linux distribution:

# use ca-certificate
tls-cert-bundle: "/etc/ssl/certs/ca-certificates.crt"

Cloudflare’s certificate should then be recognized.
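Putting the two pieces together: tls-cert-bundle is a server: option, while forward-tls-upstream goes inside each forward-zone. The following is a minimal sketch, assuming a POSIX shell and the bundle path quoted above. `print_tls_server_block` and `print_dot_forward_zone` are hypothetical helper names, and 1.1.1.1@853#cloudflare-dns.com is the upstream from the earlier example.

```shell
#!/bin/sh
# Assumed location of the system CA bundle (the path quoted in the post above).
CA_BUNDLE=${CA_BUNDLE:-/etc/ssl/certs/ca-certificates.crt}

# Hypothetical helper: print the server-level option that tells unbound which
# CA certificates to trust when it checks the upstream's TLS certificate.
print_tls_server_block() {
    printf 'server:\n'
    printf '    tls-cert-bundle: "%s"\n' "$CA_BUNDLE"
}

# Hypothetical helper: print a forward-zone that uses DNS over TLS.
# Usage: print_dot_forward_zone <zone> <ip> <tls-auth-name>
print_dot_forward_zone() {
    printf 'forward-zone:\n'
    printf '    name: "%s"\n' "$1"
    printf '    forward-tls-upstream: yes\n'
    # Format is ip@port#name: 853 is the DNS over TLS port, and the name after
    # '#' is the one the upstream's certificate is expected to match.
    printf '    forward-addr: %s@853#%s\n' "$2" "$3"
}

print_tls_server_block
print_dot_forward_zone facebook.com 1.1.1.1 cloudflare-dns.com
```

As with the plain forward-zones, checking the resulting file with unbound-checkconf before restarting unbound is a cheap safeguard.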