DNSSEC discussion - support for proxy-dnssec

No, INSECURE is wrong here. This seems to suggest that there is no EDE from upstream or it isn't read properly.

for some reason, there is a difference between:

cleared the FTL log, performed the requests as listed above, with debug flags enabled.

FTL.zip (9.6 KB)

I don't have any DNSSEC extensions in the browser.

edit
the messages start changing


BOGUS (DNSSEC bogus)
/edit

Okay, this is interesting... The first one behaves as expected (BOGUS) and is of type A. The two others are not. However, this isn't the important difference here. Your upstream apparently really didn't include the EDE codes in the two INSECURE cases.

To verify this, we need to dump the reply received from unbound. This can be enabled by adding the following to a file like /etc/dnsmasq.d/99-record.conf:

dumpfile=/etc/pihole/dump.pcap
dumpmask=0x00ff

This file should show us if my speculation is right and it is missing from upstream. Please also upgrade your FTL beforehand as I added a bit more debug logging to the EDNS0 code (no functional change) - the compilation is still running and should be done in 1-2 minutes (done and uploaded).

means you're having fun?

  • added the 99-record.conf file

  • installed new ftl
    Pi-hole version is v5.16.2 (Latest: v5.16.2)
    AdminLTE version is v5.19 (Latest: v5.19)
    FTL version is new/ede-dnssec vDev-538c6a0 (Latest: v5.22)

  • cleared ftl log

  • ran the tests in the given order (dig A, followed by dig AAAA, edge, firefox)

FTL.zip (35.5 KB)

Thanks, is there still something unexpected in the Query Log (dashboard) as in your screenshot earlier?

start of test:

already forwarded:

Yes, the attached PCAP file confirms the EDE information is missing in those cases where FTL showed you INSECURE status:

vs. one of the BOGUS answers:

I don't think we're seeing a dnsmasq/FTL bug here (well, unless dumpfile were broken, too).

implies I need to create an issue @ unbound to have this investigated / corrected.

please don't throw this branch away, if somebody is interested @ unbound, they might fix this fairly soon....

edit
another anomaly, dig from windows machine:

unbound issue created here.

unbound mailing list conversation here.
/edit

edit2
related to this unbound PR (not yet merged)? tested (mesh.c from features/eds-with-cd-bit), not a solution...
/edit2

@jpgpi250 one further detail from the last FTL log: You are using both your unbound's IPv4 and IPv6 addresses internally. This creates a lot of additional traffic in your internal network. Reason: when long-running queries aren't finished yet, FTL thinks the upstream may have stopped working and starts broadcasting all upstream packets to all configured servers. This causes a lot of unnecessary traffic, and unbound doesn't reply in order (it doesn't have to according to DNS specifications). I don't think this explains it but it is the only visible change from running dig (performing exactly one query) to running in the browser (running multiple queries simultaneously).

Please try what changes when you only use one of the internal IP addresses for your unbound upstream. And note that, for the reasons mentioned above, it is in general disadvantageous to add the same server multiple times (even if over different protocols). This might be the reason for the confusion here.


What is abnormal here? I see a domain that should be BOGUS + SERVFAIL and this is exactly what is being reported? The first BOGUS is Pi-hole's interpretation of the DNSSEC status, the text in parentheses is what was found as the reason (in this case, unbound told us "DNSSEC bogus"). There can be other texts there, e.g., "DNSKEY missing" or "... expired", etc. so it is meaningful to have this in addition.
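For anyone who wants to check a captured reply by hand, here is a minimal Python sketch of how the text in parentheses could be derived from an EDE option (RFC 8914, EDNS0 option code 15). It assumes you have already extracted the RDATA bytes of the OPT record from the packet. The function name `parse_ede` and the shortened code table are illustrative only and are not taken from FTL.

```python
import struct

EDE_OPTION_CODE = 15  # EDNS0 option code for Extended DNS Errors (RFC 8914)

# Subset of the RFC 8914 info-code registry (illustrative, not exhaustive)
EDE_TEXT = {
    6: "DNSSEC Bogus",
    7: "Signature Expired",
    8: "Signature Not Yet Valid",
    9: "DNSKEY Missing",
    10: "RRSIGs Missing",
}

def parse_ede(opt_rdata: bytes):
    """Return a list of (info_code, text, extra_text) found in OPT RDATA."""
    found = []
    pos = 0
    # OPT RDATA is a sequence of (option-code, option-length, option-data)
    while pos + 4 <= len(opt_rdata):
        code, length = struct.unpack_from("!HH", opt_rdata, pos)
        pos += 4
        data = opt_rdata[pos:pos + length]
        pos += length
        if code == EDE_OPTION_CODE and len(data) >= 2:
            (info_code,) = struct.unpack_from("!H", data, 0)
            extra = data[2:].decode("utf-8", errors="replace")
            found.append((info_code, EDE_TEXT.get(info_code, "unknown"), extra))
    return found

# Hand-built example: one EDE option with info-code 6 and no extra text
sample = struct.pack("!HHH", EDE_OPTION_CODE, 2, 6)
print(parse_ede(sample))  # [(6, 'DNSSEC Bogus', '')]
print(parse_ede(b""))     # [] (no EDE present at all)
```

An empty result corresponds to the replies where no EDE was attached by upstream.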

I tried running pihole-FTL with a single upstream (IPv4). The results are the same, correct with dig, incorrect with browser. Repeated the test with only IPv6 upstream, same result.

I also tried removing redis (unbound cache module) from the equation, same result, correct with dig, incorrect with browser.

what is the recommendation, use IPv4 or IPv6 as upstream to unbound? The dashboard indicates IPv4 (38.7%), IPv6 (12.4%), but this is probably caused by the order in the conf file.

Using IPv4 only is the recommendation. Whenever direct routing is available (e.g. in your local network), it doesn't make much of a difference but IPv6 packets are a little bit larger than IPv4. Whenever NAT is involved (e.g. when the packet goes out to the public Internet) IPv6 is generally the better choice because less rewriting of the packet and more obvious routing make IPv6 faster in most multi-hop scenarios.

Okay. I thought (hoped?) so, it would probably be more difficult to debug such an IPv4/IPv6 convolution scenario. Could you provide another log+pcap ZIP file in this simpler case? It will make the analysis a lot easier (especially for possible unbound developers coming here and not being familiar with the quite verbose FTL log output).


IPv4 only (upstream to unbound), only removed server=fdaa:bbcc:ddee:2::5552#5552, server=127.10.10.2#5552 is active.
test order:

  • dig A
  • dig AAAA
  • edge
  • firefox

FTL.zip (8.5 KB)

There are no debug flags enabled on your last run but this isn't important as the PCAP shows us pretty clearly what is going on.

Working

Request (packet 28)

Response (packet 29)

Not working

Request (packet 36)

Response (packet 37)


You may have noticed that the "working" request looks different (much longer, less white area) than the "not working" one. dig explicitly tells dnsmasq (which does nothing more than forward the request to unbound) that it supports EDNS data. The browser lookup doesn't do this (note the missing "EDNS0 version: 0" in the "not working" request in the Wireshark screenshot above) and, as a consequence, unbound doesn't attach the EDE at all. One may now argue whether this is expected (and intended) behavior or not, but I could understand if the unbound folks say that they don't want to provide information that has not been requested (or, rather, for which support has not been announced).
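To make the difference between the two requests concrete, here is a minimal Python sketch that builds a DNS query with and without the EDNS0 OPT pseudo-record. The helper name `build_query`, the fixed query ID and the advertised 1232-byte UDP size are assumptions for illustration. The first form is roughly what dig sends by default; the second corresponds to the browser lookups in the capture.

```python
import struct

def build_query(name: str, qtype: int = 1, edns: bool = True, do_bit: bool = False) -> bytes:
    """Build a minimal DNS query; optionally append an EDNS0 OPT record."""
    arcount = 1 if edns else 0
    # Header: ID, flags (RD set), QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT
    header = struct.pack("!HHHHHH", 0x1234, 0x0100, 1, 0, 0, arcount)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN
    if not edns:
        return header + question
    # OPT pseudo-RR: root name, TYPE=41, CLASS carries the UDP payload size,
    # TTL carries ext-rcode(8) | version(8) | flags(16, DO is the top bit),
    # RDLEN=0 because no EDNS options are attached to the query
    ttl = 0x8000 if do_bit else 0
    opt = b"\x00" + struct.pack("!HHIH", 41, 1232, ttl, 0)
    return header + question + opt

with_edns = build_query("www.dnssec-failed.org")
without_edns = build_query("www.dnssec-failed.org", edns=False)
print(len(with_edns) - len(without_edns))  # 11 (size of an empty OPT record)
```

The only difference on the wire is the ARCOUNT field and the trailing 11 bytes, which is what Wireshark renders as the additional record with "EDNS0 version: 0".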

Now that we know what is happening, we can reliably trigger it with

dig +noedns www.dnssec-failed.org

I see three possible ways forward:

  1. unbound sends the EDE data even if EDNS0 was not explicitly requested
  2. dnsmasq adds the signaling of EDNS0 support into forwarded packets that do not contain it
  3. FTL interprets all SERVFAIL as being BOGUS in proxy-dnssec mode when no EDE is contained

I would prefer option 1 as I don't see much reasoning for no. 2 to be implemented by Simon. I don't know if sending down EDNS0 data that was not requested has the potential to break things so we'd possibly even need to remember that we'd have to strip this. The third and last option would obviously be the simplest, however, it'd mark all SERVFAIL as BOGUS without any real justification for doing so. Say the unbound server has some other problem causing it to return SERVFAIL (e.g. it cannot resolve parts of a recursive path because the nameserver of example.com is currently down), the Pi-hole would falsely stamp BOGUS here even when this is completely wrong.
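The trade-off in option 3 can be sketched as follows. This is a simplified Python model, not FTL's actual code: the function and parameter names are hypothetical, the set of EDE codes treated as DNSSEC failures is an assumption, and the AD-flag handling is reduced to a single boolean.

```python
SERVFAIL = 2  # DNS RCODE for "server failure"

# RFC 8914 info-codes treated as DNSSEC failures in this sketch (assumption)
BOGUS_EDE_CODES = {6, 7, 8, 9, 10}

def dnssec_status(rcode, ad_flag, ede_codes, servfail_means_bogus=False):
    """Simplified, hypothetical model of a proxy-dnssec status decision."""
    if BOGUS_EDE_CODES & set(ede_codes):
        return "BOGUS"
    if rcode == SERVFAIL and servfail_means_bogus:
        # Option 3: no EDE available, so guess BOGUS for every SERVFAIL
        return "BOGUS"
    return "SECURE" if ad_flag else "INSECURE"

# Validation failure reported with EDE: BOGUS either way
print(dnssec_status(SERVFAIL, False, [6]))
# Unrelated outage (e.g. an authoritative server is down), no EDE attached
print(dnssec_status(SERVFAIL, False, []))                             # INSECURE
print(dnssec_status(SERVFAIL, False, [], servfail_means_bogus=True))  # BOGUS
```

The last call is the false positive: nothing in the reply says DNSSEC validation failed, yet the query would be stamped BOGUS.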

NOT your preferred option, yet I think that whenever proxy-dnssec is enabled, dnsmasq should signal that it "needs" EDE data for all queries. This way, people, not using proxy-dnssec, don't receive the additional EDE data, when it isn't required to make everything work.

I've already added an entry to the dnsmasq mailing list, here. Maybe you could add some valuable info into a mailing list entry, or convince Simon this would be a major improvement.

I don't think this will be possible because it isn't true - this would be no improvement as dnsmasq itself isn't using the EDE code for anything.


However, we found that dnsmasq already has option no. 2 above!

Just add something that makes dnsmasq add EDNS0 data on its own, e.g.

add-cpe-id=01234

(could also be add-mac or add-subnet) into a config file. The added EDNS0 data will be stripped away before sending the reply back to your client as I was suspecting earlier.

Please add this line and upgrade to the latest version of FTL. I needed to change something to ensure that EDNS0 data which is subsequently stripped will also be read. Binaries already built, tested and uploaded while I was typing this reply.

pihole -v
Pi-hole version is v5.16.2 (Latest: v5.16.2)
AdminLTE version is v5.19 (Latest: v5.19)
FTL version is new/ede-dnssec vDev-72b4bc5 (Latest: v5.22)

  • added add-cpe-id=01234

improvement, but still some confusion (screenshot is all from edge, not in pcap)

FTL.zip (16.4 KB)

latest results, the number of wrong entries has decreased - very few wrong entries, sometimes none (cache?)

FTL.zip (38.5 KB)

your latest commit failed to build?

The log only shows 11:11:00 - 11:11:43 so 11:15:00 from your screenshot is not included.

However, looking at the screenshot alone suggests that this is happening because so many queries are needlessly done in parallel. dnsmasq identifies them as being exact duplicates and simply ignores them altogether ("already forwarded"). 50 identical queries will receive only one reply in the end. You see the INSECURE here because of a glitch in the new FTL logic that says "if it is neither SECURE nor BOGUS, then it must be INSECURE". In the exact case you have found, this is not true because they are ignored so their DNSSEC status is actually none at all. I'm working on this as time allows, the failed build is actually right, I forgot to adjust the tests in case of failed duplicated replies.

already forwarded -> unknown status ?
means no info displayed in the query log ?

Yes, because they are ignored. Looking at your screenshot

all of the "already forwarded" are thrown away and not forwarded another time to unbound to (a) not overwhelm it and (b) because it is not necessary, dnsmasq knows it is already in progress. They are also never answered (hence reply N/A) so there is really nothing to see here. Already pushed - tests passed, binaries ready.
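A rough Python model of the behavior described above: identical in-flight queries are forwarded once, and the ignored duplicates get no DNSSEC status at all rather than INSECURE. All names here are hypothetical, and dnsmasq's real bookkeeping is more involved than a single set.

```python
def forward_queries(queries):
    """Forward each distinct (name, qtype) once; mark repeats as ignored."""
    in_flight = set()
    log = []
    for name, qtype in queries:
        key = (name.lower(), qtype)
        if key in in_flight:
            # Duplicate of a query still in progress: dropped, never answered,
            # so there is no DNSSEC status to report (None, not "INSECURE")
            log.append((name, qtype, "already forwarded", None))
        else:
            in_flight.add(key)
            log.append((name, qtype, "forwarded", "pending"))
    return log

burst = [("www.dnssec-failed.org", "A")] * 50
log = forward_queries(burst)
forwarded = [entry for entry in log if entry[2] == "forwarded"]
ignored = [entry for entry in log if entry[2] == "already forwarded"]
print(len(forwarded), len(ignored))  # 1 49
```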

Pi-hole version is v5.16.2 (Latest: v5.16.2)
AdminLTE version is v5.19 (Latest: v5.19)
FTL version is new/ede-dnssec vDev-d228fbf (Latest: v5.22)


(more entries - all OK - not in screenshot)

AMAZING!!!

FTL.zip (29.2 KB) (log and pcap)

All FTL tests (using NSD) complete successfully (json test data included)
testftl.zip (2.2 KB)

summary:

  • added to unbound:
	# required for proxy-dnssec (dnsmasq)
	# https://nlnetlabs.nl/news/2022/Jun/02/unbound-1.16.0-released/
	# https://blog.nlnetlabs.nl/extended-dns-error-support-for-unbound/
	ede: yes

  • added to dnsmasq (ensure DNSSEC isn't enabled in settings!):

# requires use of "ede: yes" in unbound.conf
proxy-dnssec
add-cpe-id=01234

grafana: (image)

THANKS for your time, effort, persistence !!!
