Case mismatch in DNS reply

@DL6ER

Here is the log file snippet:

pi@pi5:~$ sudo cat /var/log/pihole/pihole.log | grep 11:53:52
Feb  8 11:53:52 dnsmasq[40582]: query[A] fw-snaps.prod.gws.ring.amazon.dev from 172.31.31.84
Feb  8 11:53:52 dnsmasq[40582]: forwarded fw-snaps.prod.gws.ring.amazon.dev to 127.0.0.1#5335
Feb  8 11:53:52 dnsmasq[40582]: query[A] fw-snaps.prod.gws.ring.amazon.dev from 172.31.31.84
Feb  8 11:53:52 dnsmasq[40582]: forwarded fw-snaps.prod.gws.ring.amazon.dev to 127.0.0.1#5335
Feb  8 11:53:52 dnsmasq[40582]: Case mismatch in DNS reply - check bit 0x20 encoding.
Feb  8 11:53:52 dnsmasq[40582]: dnssec-query[DS] dev to 127.0.0.1#5335
Feb  8 11:53:52 dnsmasq[40582]: reply dev is DS for keytag 60074, algo 8, digest 2
Feb  8 11:53:52 dnsmasq[40582]: dnssec-query[DS] amazon.dev to 127.0.0.1#5335
Feb  8 11:53:52 dnsmasq[40582]: dnssec-query[DNSKEY] dev to 127.0.0.1#5335
Feb  8 11:53:52 dnsmasq[40582]: reply dev is DNSKEY keytag 60074, algo 8
Feb  8 11:53:52 dnsmasq[40582]: reply dev is DNSKEY keytag 20076, algo 8
Feb  8 11:53:52 dnsmasq[40582]: reply amazon.dev is no DS
Feb  8 11:53:52 dnsmasq[40582]: dnssec-query[DS] ring.com to 127.0.0.1#5335
Feb  8 11:53:52 dnsmasq[40582]: reply ring.com is no DS
Feb  8 11:53:52 dnsmasq[40582]: validation result is INSECURE
Feb  8 11:53:52 dnsmasq[40582]: reply fw-snaps.prod.gws.ring.amazon.dev is <CNAME>
Feb  8 11:53:52 dnsmasq[40582]: reply fw-snaps.ring.com is <CNAME>
Feb  8 11:53:52 dnsmasq[40582]: reply snapshotsgw.prod.gws.ring.amazon.dev is <CNAME>
Feb  8 11:53:52 dnsmasq[40582]: reply snapshotsgw-geo.prod.gws.ring.amazon.dev is <CNAME>
Feb  8 11:53:52 dnsmasq[40582]: reply snapshotsgw.us-west-2.prod.gws.ring.amazon.dev is <CNAME>
Feb  8 11:53:52 dnsmasq[40582]: reply snapshotsgw.c0.us-west-2.prod.gws.ring.amazon.dev is <CNAME>
Feb  8 11:53:52 dnsmasq[40582]: reply snapshotsgw-alb.c0.us-west-2.prod.gws.ring.amazon.dev is 54.148.25.142
Feb  8 11:53:52 dnsmasq[40582]: reply snapshotsgw-alb.c0.us-west-2.prod.gws.ring.amazon.dev is 44.237.0.76
Feb  8 11:53:52 dnsmasq[40582]: reply snapshotsgw-alb.c0.us-west-2.prod.gws.ring.amazon.dev is 100.20.16.147
Feb  8 11:53:52 dnsmasq[40582]: reply snapshotsgw-alb.c0.us-west-2.prod.gws.ring.amazon.dev is 52.10.146.130
Feb  8 11:53:52 dnsmasq[40582]: reply snapshotsgw-alb.c0.us-west-2.prod.gws.ring.amazon.dev is 52.13.244.82
Feb  8 11:53:52 dnsmasq[40582]: reply snapshotsgw-alb.c0.us-west-2.prod.gws.ring.amazon.dev is 52.37.230.176
Feb  8 11:53:52 dnsmasq[40582]: reply snapshotsgw-alb.c0.us-west-2.prod.gws.ring.amazon.dev is 34.213.127.239
Feb  8 11:53:52 dnsmasq[40582]: reply snapshotsgw-alb.c0.us-west-2.prod.gws.ring.amazon.dev is 34.210.112.176
Feb  8 11:53:52 dnsmasq[40582]: query[HTTPS] self-events-data.trafficmanager.net from 192.168.10.112
Feb  8 11:53:52 dnsmasq[40582]: forwarded self-events-data.trafficmanager.net to 127.0.0.1#5335
Feb  8 11:53:52 dnsmasq[40582]: query[A] self-events-data.trafficmanager.net from 192.168.10.112
Feb  8 11:53:52 dnsmasq[40582]: forwarded self-events-data.trafficmanager.net to 127.0.0.1#5335

Larger section of log file here: https://tricorder.pi-hole.net/oOUizB7A/

and the pcap file:
dns.pcap (15.8 KB)

Note: I tried your original tcpdump command sudo tcpdump -i lo -w /tmp/dns.pcap -n -p tcp port 5335 and got nothing, so I tried udp: sudo tcpdump -i lo -w /tmp/dns.pcap -n -p udp port 5335 and it collected the above file.

@sawsanders Thank you for the pcap and the log. I am pretty sure we know what to look out for now. The issue is that you see two queries immediately following each other, with no time between them in which a reply could have been received from upstream. The second try triggers a "re-forwarding" as dnsmasq thinks the client retried because it didn't receive a response for the initial query:

Feb  8 11:53:52 dnsmasq[40582]: query[A] fw-snaps.prod.gws.ring.amazon.dev from 172.31.31.84
Feb  8 11:53:52 dnsmasq[40582]: forwarded fw-snaps.prod.gws.ring.amazon.dev to 127.0.0.1#5335

Feb  8 11:53:52 dnsmasq[40582]: query[A] fw-snaps.prod.gws.ring.amazon.dev from 172.31.31.84
Feb  8 11:53:52 dnsmasq[40582]: forwarded fw-snaps.prod.gws.ring.amazon.dev to 127.0.0.1#5335

Feb  8 11:53:52 dnsmasq[40582]: Case mismatch in DNS reply - check bit 0x20 encoding.

The pcap now reveals that the first query is sent in MiXeD CAse (to add some extra entropy to the query, complicating any cache spoofing attempts) but the second is all-lowercase. When the reply to the second query arrives first, the case of the (original mixed-case) query and the (all-lowercase) response differ and, hence, the warning is triggered:

Standard query 0x2d75 A Fw-SNaPs.pROD.gws.rINg.AmAZON.deV OPT <----- mixed case
Standard query 0x2d75 A fw-snaps.prod.gws.ring.amazon.dev OPT <----- lower case
Standard query response 0x2d75 A fw-snaps.prod.gws.ring.amazon.dev CNAME <----- lower case
Standard query response 0x2d75 A Fw-SNaPs.pROD.gws.rINg.AmAZON.deV CNAME <----- mixed case
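
To make the race above a bit more concrete, here is a minimal Python sketch. It is not dnsmasq's actual code: `encode_0x20` and `reply_case_ok` are hypothetical helper names, and the exact-match comparison is only an assumption about how a 0x20-aware forwarder checks the name in a reply against the name it remembered from the first query.

```python
import random


def encode_0x20(name: str, rng: random.Random) -> str:
    """Randomly flip the case of each letter (illustrative 0x20 encoding)."""
    return "".join(
        ch.upper() if ch.isalpha() and rng.random() < 0.5 else ch.lower()
        for ch in name
    )


def reply_case_ok(remembered: str, reply_name: str) -> bool:
    """Hypothetical check: accept a reply only if the case matches exactly."""
    return remembered == reply_name


# The encoding only changes letter case, never the name itself.
scrambled = encode_0x20("fw-snaps.prod.gws.ring.amazon.dev", random.Random())
assert scrambled.lower() == "fw-snaps.prod.gws.ring.amazon.dev"

# Names taken from the pcap lines above.
sent_first = "Fw-SNaPs.pROD.gws.rINg.AmAZON.deV"   # first query, mixed case
sent_retry = "fw-snaps.prod.gws.ring.amazon.dev"   # retry, all lowercase

# DNS treats both as the same name ...
assert sent_first.lower() == sent_retry

# ... but if the reply to the lowercase retry arrives first, it is compared
# against the remembered mixed-case name and the check fails.
retry_reply_accepted = reply_case_ok(sent_first, sent_retry)
first_reply_accepted = reply_case_ok(sent_first, sent_first)
print("reply to retry accepted:", retry_reply_accepted)
print("reply to first query accepted:", first_reply_accepted)
```

The point of the sketch is only that both replies are legitimate, yet one of them fails the case comparison because the retry did not reuse the original capitalization.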

@mwoolweaver1 It appears to be the same in your cloudflared-doh setup, the log is just a bit more crowded making it harder to see this.

@seh2000 Same for you, likely.


I'm already in contact with the maintainer of dnsmasq and sent them a summary of my bug description. We typically stay away from fixing these things on our own to avoid incompatibility with future updates of dnsmasq.

5 Likes

I did just update the branch update/dnsmasq with the most recent fix for this. It'd be awesome if you could try updating the branch and see if the issue comes up again or is now - once and for all - truly fixed. This complements the previous fix, which fixed the retry path when the query is retried from a different id/source address; the new fix is for retries from the same id/source address.

sudo pihole checkout ftl update/dnsmasq
4 Likes

So far so good! I’ve seen many double queries in very quick succession from the same client in the logs and no errors thus far!

I will continue to monitor and report back. I appreciate all the work that’s gone into making such an awesome piece of software!

EDIT: 4 hours later and no errors.

Looks like it's been merged into development already anyway, so I'm switching back to that branch.

1 Like

There is another (minor) security commit in update/dnsmasq which is not (yet) in development, but it really shouldn't make much of a difference. It is more to prevent specifically tailored malicious DNS clients from being able to cause malformed packets going upstream. But, as this was fixed before any dnsmasq release was officially done, there will be nobody trying to actively exploit this one. Even then, no real harm could be caused other than maybe the upstream server temporarily blocking you.

4 Likes

This has not shown up in at least 5 days. Thank you for the work you do!

2 Likes

Hello, I just updated to v6.0 yesterday, saw this error in my logs, and wanted to confirm whether the fix was included in the new v6.0 release.

thanks

I am using v6.0 using Docker and I am also experiencing this error.

Good day,
same here Case mismatch in DNS reply - check bit 0x20 encoding.
I am using Bookworm in a Proxmox/LXC (privileged)
Until I did the pihole update today, my life was good...now I have almost no internet connection, i.e. sometimes very slow but most of the time no access...
any help appreciated

This error means your upstream DNS provider is messing around with query capitalization and your Pi-hole is rejecting the replies as a measure against DNS cache poisoning (an attack vector on DNS servers).

Please try

sudo pihole-FTL --config misc.dnsmasq_lines '["no-0x20-encode"]'

and check if this resolves the issues you are seeing. If so (and even if not): What are your upstream servers?

1 Like

It'd also be awesome if you could record the traffic between you and your upstream so we can analyze what is really happening here. This is mainly to rule out a possible bug in dnsmasq/Pi-hole and to confirm that it is indeed the upstream which is misbehaving here.

1 Like

I noticed the error message came back, several hours after update to 6.0.1 so I ran the command sudo pihole-FTL --config misc.dnsmasq_lines '["no-0x20-encode"]'

btw, can you please tell what this command does?

will monitor for the next 24 hrs and see if it comes back.

Could you please tell how I can obtain the upstream servers for you, and even how to record the traffic?

thank you.

1 Like

It adds a line to dnsmasq configuration, containing no-0x20-encode.

2 Likes

Ah, yeah, sorry, I should have been more detailed. The upstream servers are simply the DNS servers you configured your queries to be sent to. Easiest would be if you could create a debug log using sudo pihole -d and agree to upload it. Once done, please provide the ID.

Recording the traffic should be as easy as setting

sudo pihole-FTL --config files.pcap /tmp/dns.pcap

which will create a new file /tmp/dns.pcap containing all the relevant traffic. You can then send us this file via direct/private message (or share it here if you don't mind) and I will start analyzing it.

The relevant log will be in /var/log/pihole/pihole.log, please share this one, too, after it happened.
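
If you want to pre-check that log yourself before sharing it, a rough Python sketch along the following lines could help. It is only an illustration, not part of Pi-hole: `find_mismatch_context` is a hypothetical helper, the 10-line look-back window is an arbitrary assumption, and the regular expression is derived solely from the log lines quoted in this thread.

```python
import re
from collections import deque

# Matches lines such as:
#   ... dnsmasq[40582]: query[A] example.com from 192.168.10.112
QUERY_RE = re.compile(r"query\[(?P<type>[^\]]+)\] (?P<name>\S+) from (?P<client>\S+)")
WARNING = "Case mismatch in DNS reply"


def find_mismatch_context(lines, window=10):
    """Yield (warning_line, repeated_queries) for each case-mismatch warning.

    repeated_queries lists the (type, name, client) tuples that occur more
    than once within the `window` log lines preceding the warning.
    """
    recent = deque(maxlen=window)
    for line in lines:
        line = line.rstrip("\n")
        if WARNING in line:
            seen, repeated = set(), []
            for prev in recent:
                match = QUERY_RE.search(prev)
                if not match:
                    continue
                key = (match["type"], match["name"], match["client"])
                if key in seen and key not in repeated:
                    repeated.append(key)
                seen.add(key)
            yield line, repeated
        recent.append(line)


# Sample taken from the log snippet at the top of this thread.
sample = """\
Feb  8 11:53:52 dnsmasq[40582]: query[A] fw-snaps.prod.gws.ring.amazon.dev from 172.31.31.84
Feb  8 11:53:52 dnsmasq[40582]: forwarded fw-snaps.prod.gws.ring.amazon.dev to 127.0.0.1#5335
Feb  8 11:53:52 dnsmasq[40582]: query[A] fw-snaps.prod.gws.ring.amazon.dev from 172.31.31.84
Feb  8 11:53:52 dnsmasq[40582]: forwarded fw-snaps.prod.gws.ring.amazon.dev to 127.0.0.1#5335
Feb  8 11:53:52 dnsmasq[40582]: Case mismatch in DNS reply - check bit 0x20 encoding.
""".splitlines()

for warning, repeated in find_mismatch_context(sample):
    print(warning)
    for qtype, name, client in repeated:
        print(f"  repeated query[{qtype}] {name} from {client}")
```

To run it on the real file, pass an open file handle for /var/log/pihole/pihole.log instead of `sample` (reading it may require root). A warning that is preceded by a repeated query from the same client would match the double-query pattern discussed earlier in this thread; a warning without one would be worth pointing out when you share the log.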

1 Like

EDIT: I believe the fix, if using Unbound, is to enable the "use-caps-for-id" configuration line item. This is the 0x20 DNS encoding that PiHole v6 has enabled. If you disable the feature as DL6ER describes, you can leave "use-caps-for-id" set to no in UnBound.
More details in my original post below that may clarify for more visitors here.

I just upgraded to v6.0.3 this week and noticed these errors in my log. I'm also using local Unbound on two PiHole Podman containers.

Unlike the v6 configuration states, I've completely disabled resolved on my Fedora hosts and simply reference three entries in my resolv.conf file and have been running this way for years without issue:
Default GW
PiHole1 (Unbound1)
PiHole2 (Unbound 2)

I found this config line item in my unbound.config which seems to contradict your quoted statement. If I'm understanding this correctly, having the following configured should mean requests are not being capitalized. Should an issue be raised with the unbound devs about this?

EDIT: PiHole is utilizing 0x20 but UnBound's default is to not utilize 0x20. DL6ER's statement is technically correct. UnBound is messing with the capitalization. PiHole is sending randomized capitalization (0x20 enabled) and UnBound is formatting it all to lowercase (0x20 disabled).

# Don't use Capitalization randomization as it known to cause DNSSEC issues sometimes
# see https://discourse.pi-hole.net/t/unbound-stubby-or-dnscrypt-proxy/9378 for further details
use-caps-for-id: no

I found another post with more information on this DNS feature for others that stumble here:

So what does this mean for those of us using Unbound with PiHole and how can we get our Unbound and PiHole instances in sync with 0x20 encoding?

It seems enabling this feature in UnBound can potentially resolve this error. I'll enable and report back over the weekend if it does.
EDIT: I'm still testing this but I believe enabling 0x20 in UnBound config will resolve this issue. Will report back after running this over the weekend.

3 Likes

Good day,
Maybe I screwed up because I also updated Bullseye to Bookworm prior to the Pi-Hole update.
As I was "under pressure by family", I simply trashed the LXC with the updated pi-hole and restored a backup. So my old version
Pi-hole v5.18.3, FTL v5.25.2, Web Interface v5.21
was up and running and working perfectly well right away.
Since I don't want to lose DNS again, I'll leave it as it is without updating it.

I just had the same error: https://tricorder.pi-hole.net/bCazAgXc/

@DL6ER

After enabling 0x20 in UnBound I am also still seeing this error.
PiHole1: https://tricorder.pi-hole.net/hDcKAxyL/
PiHole2: https://tricorder.pi-hole.net/NrrVTVJR/

While looking at some of these entries captured by the grep, for some reason it is forwarding some of my internal PiHole DNS records to my default GW IP which does not have the DNS entries (yet). The names still resolve internally and they are not external facing.

1 Like

same for me, seeing that after v6 upgrade:

Changing the unbound config doesn't solve it.

I would not expect enabling this option in unbound to resolve anything because, if I interpret the corresponding unbound documentation correctly, unbound only applies this to queries it forwards somewhere else itself. This does not seem to relate at all to queries coming from your Pi-hole to your unbound.

I am trying to reproduce this locally with my own unbound container (unbound 1.22.0) and failed so far - everything works fine here, I followed the same Pi-hole guide for unbound you probably followed.

Could anyone here follow my suggestion above? It'd be very helpful if I had some real data to look at instead of groping in the dark.

2 Likes