DNSSEC discussion - support for proxy-dnssec

Yes, the attached PCAP file confirms the EDE information is missing in those cases where FTL showed you the INSECURE status:

vs. one of the BOGUS answers:

I don't think we're seeing a dnsmasq/FTL bug here (well, unless the dump file is broken, too).

This implies I need to create an issue at unbound to have this investigated and corrected.

please don't throw this branch away, if somebody is interested @ unbound, they might fix this fairly soon....

edit
another anomaly, dig from windows machine:

unbound issue created here.

unbound mailing list conversation here.
/edit

edit2
related to this unbound PR (not yet merged)? tested (mesh.c from features/eds-with-cd-bit), not a solution...
/edit2

@jpgpi250 one further detail from the last FTL log: You are using both of your unbound's IPv4 and IPv6 addresses internally. This creates a lot of additional traffic in your internal network. Reason: when long-running queries aren't finished yet, FTL thinks the upstream may have stopped working and starts broadcasting all upstream packets to all configured servers. This causes a lot of unnecessary traffic and unbound doesn't reply in order (it doesn't have to according to the DNS specifications). I don't think this explains it, but it is the only visible change from running dig (performing exactly one query) to running in the browser (running multiple queries simultaneously).

Please try what changes when you use only one of the internal IP addresses for your unbound upstream. Note that, for the reasons mentioned above, it is in general disadvantageous to add the same server multiple times (even over different protocols). This might be the reason for the confusion here.


What is abnormal here? I see a domain that should be BOGUS + SERVFAIL and this is exactly what is being reported? The first BOGUS is Pi-hole's interpretation of the DNSSEC status, the text in parentheses is what was found as the reason (in this case, unbound told us "DNSSEC bogus"). There can be other texts there, e.g., "DNSKEY missing" or "... expired", etc., so it is meaningful to have this in addition.

I tried running pihole-FTL with a single upstream (IPv4). The results are the same, correct with dig, incorrect with browser. Repeated the test with only IPv6 upstream, same result.

I also tried removing redis (unbound cache module) from the equation, same result, correct with dig, incorrect with browser.

what is the recommendation, use IPv4 or IPv6 as upstream to unbound? The dashboard indicates IPv4 (38.7%), IPv6 (12.4%), but this is probably caused by the order in the conf file.

Using IPv4 only is the recommendation. Whenever direct routing is available (e.g. in your local network), it doesn't make much of a difference but IPv6 packets are a little bit larger than IPv4. Whenever NAT is involved (e.g. when the packet goes out to the public Internet) IPv6 is generally the better choice because less rewriting of the packet and more obvious routing makes IPv6 faster in most multi-hop scenarios.

Okay. I thought (hoped?) so, it would probably be more difficult to debug such an IPv4/IPv6 convolution scenario. Could you provide another log+pcap ZIP file in this simpler case? It will make the analysis a lot easier (especially for possible unbound developers coming here and not being familiar with the quite verbose FTL log output).


IPv4 only (upstream to unbound), only removed server=fdaa:bbcc:ddee:2::5552#5552, server=127.10.10.2#5552 is active.
test order:

  • dig A
  • dig AAAA
  • edge
  • firefox

FTL.zip (8.5 KB)

There are no debug flags enabled on your last run but this isn't important as the PCAP shows us pretty clearly what is going on.

Working

Request (packet 28)

Response (packet 29)

Not working

Request (packet 36)

Response (packet 37)


You may have noticed that the "working" request looks different (much longer, less white area) than the "not working" one. dig is explicitly telling dnsmasq (which does nothing more than forward the request to unbound) that it supports EDNS data. The browser lookup doesn't do this (missing "EDNS0 version: 0" in the "not working" request Wireshark screenshot above) and, as a consequence, unbound doesn't attach the EDE at all. One may now argue whether this is expected (and intended) behavior or not, but I could understand if the unbound folks say that they don't want to provide information that has not been requested (or, rather, for which support has not been announced).
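To make the difference between the two requests concrete, here is a minimal Python sketch that builds a DNS query with and without the EDNS0 OPT pseudo-record. The helper names (`encode_name`, `build_query`, `announces_edns`) are made up for illustration, and the UDP size of 1232 is simply the value dig shows elsewhere in this thread; this is not how dig or the browser actually builds its packets.

```python
import struct


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed DNS labels."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name: str, qtype: int = 1, edns: bool = True, qid: int = 0x1234) -> bytes:
    """Build a minimal DNS query, optionally announcing EDNS0 support."""
    arcount = 1 if edns else 0
    # Header: ID, flags (RD set), QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, arcount)
    question = encode_name(name) + struct.pack("!HH", qtype, 1)  # class IN
    if not edns:
        return header + question
    # OPT pseudo-RR: root name, TYPE 41, CLASS = UDP payload size,
    # TTL = extended RCODE / version 0 / flags, RDLENGTH 0
    opt = b"\x00" + struct.pack("!HHIH", 41, 1232, 0, 0)
    return header + question + opt


def announces_edns(query: bytes) -> bool:
    """Simplification: treat a non-zero ARCOUNT as 'OPT record present'.

    This holds for the minimal queries built above, not for arbitrary packets.
    """
    return struct.unpack("!H", query[10:12])[0] > 0


dig_like = build_query("www.dnssec-failed.org", edns=True)
browser_like = build_query("www.dnssec-failed.org", edns=False)
print(len(dig_like) - len(browser_like))  # size of the OPT record in bytes
print(announces_edns(dig_like), announces_edns(browser_like))
```

The only difference between the two packets is the additional record at the end, which is what shows up as the longer request in the Wireshark screenshot.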

Now that we know what is happening, we can reliably trigger it with

dig +noedns www.dnssec-failed.org

I see three possible ways forward:

  1. unbound sends the EDE data even if EDNS0 was not explicitly requested
  2. dnsmasq adds the signaling of EDNS0 support into forwarded packets that do not contain it
  3. FTL interprets all SERVFAIL as being BOGUS in proxy-dnssec mode when no EDE is contained

I would prefer option 1 as I don't see much reasoning for no. 2 to be implemented by Simon. I don't know if sending down EDNS0 data that was not requested has the potential to break things so we'd possibly even need to remember that we'd have to strip this. The third and last option would obviously be the simplest, however, it'd mark all SERVFAIL as BOGUS without any real justification for doing so. Say the unbound server has some other problem causing it to return SERVFAIL (e.g. it cannot resolve parts of a recursive path because the nameserver of example.com is currently down), the Pi-hole would falsely stamp BOGUS here even when this is completely wrong.
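The drawback of option 3 can be sketched in a few lines of Python. This is only an illustration of the reasoning above, not FTL's actual logic: the function names are hypothetical, and the set of EDE info-codes treated as DNSSEC failures is an assumption picked from the RFC 8914 registry.

```python
from typing import Optional

SERVFAIL = 2  # DNS RCODE "server failure"

# Assumed subset of RFC 8914 EDE info-codes that report a DNSSEC validation failure
DNSSEC_FAILURE_EDE = {
    6: "DNSSEC Bogus",
    7: "Signature Expired",
    8: "Signature Not Yet Valid",
    9: "DNSKEY Missing",
    10: "RRSIGs Missing",
    11: "No Zone Key Bit Set",
    12: "NSEC Missing",
}


def is_bogus_option_3(rcode: int, ede: Optional[int]) -> bool:
    """Option 3: a SERVFAIL without any EDE is stamped BOGUS as well."""
    return rcode == SERVFAIL and (ede is None or ede in DNSSEC_FAILURE_EDE)


def is_bogus_ede_only(rcode: int, ede: Optional[int]) -> bool:
    """Only call it BOGUS when the upstream gave a DNSSEC-related reason."""
    return rcode == SERVFAIL and ede in DNSSEC_FAILURE_EDE


# Upstream could not reach an authoritative server: SERVFAIL, no EDE attached
print(is_bogus_option_3(SERVFAIL, None))  # flagged although DNSSEC is not the cause
print(is_bogus_ede_only(SERVFAIL, None))
# Genuine validation failure: SERVFAIL with EDE 6 (DNSSEC Bogus)
print(is_bogus_option_3(SERVFAIL, 6), is_bogus_ede_only(SERVFAIL, 6))
```

The first call is the false positive described above: a SERVFAIL that has nothing to do with DNSSEC still ends up marked as BOGUS.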

NOT your preferred option, yet I think that whenever proxy-dnssec is enabled, dnsmasq should signal that it "needs" EDE data for all queries. This way, people not using proxy-dnssec don't receive the additional EDE data when it isn't required to make everything work.

I've already added an entry to the dnsmasq mailing list, here. Maybe you could add some valuable info into a mailing list entry, or convince Simon this would be a major improvement.

I don't think this will be possible because it isn't true - this would be no improvement as dnsmasq itself isn't using the EDE code for anything.


However, we found that dnsmasq already has option no. 2 above!

Just add something that makes dnsmasq add EDNS0 data on its own, e.g.

add-cpe-id=01234

(could also be add-mac or add-subnet) into a config file. The added EDNS0 data will be stripped away before sending the reply back to your client as I was suspecting earlier.
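Conceptually, the forwarder has to add an OPT record to queries that lack one and remember that it did so, so that the EDNS0 data can be removed again before the reply goes back to the client. The Python sketch below illustrates only the query side under simplifying assumptions (the query has no other additional records, and the helper name `ensure_edns` is hypothetical); it is not dnsmasq's implementation and does not add a CPE-ID option.

```python
import struct
from typing import Tuple

# Empty OPT pseudo-RR: root name, TYPE 41, UDP payload size 1232, TTL 0, RDLENGTH 0
OPT_RR = b"\x00" + struct.pack("!HHIH", 41, 1232, 0, 0)


def ensure_edns(query: bytes) -> Tuple[bytes, bool]:
    """Return (query_to_forward, added_opt).

    Assumes a non-zero ARCOUNT means the client already sent an OPT record.
    When added_opt is True, the caller should strip EDNS0 data from the reply.
    """
    qid, flags, qd, an, ns, ar = struct.unpack("!HHHHHH", query[:12])
    if ar > 0:
        return query, False
    header = struct.pack("!HHHHHH", qid, flags, qd, an, ns, 1)
    return header + query[12:] + OPT_RR, True


# A browser-style query for example.com without EDNS0
plain = (
    struct.pack("!HHHHHH", 1, 0x0100, 1, 0, 0, 0)
    + b"\x07example\x03com\x00"
    + struct.pack("!HH", 1, 1)
)
forwarded, added = ensure_edns(plain)
print(added, len(forwarded) - len(plain))
```

The `added` flag is the piece of state that has to be remembered per query: it decides whether the reply must be cleaned up before it is handed back to a client that never announced EDNS0 support.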

Please add this line and upgrade to the latest version of FTL. I needed to change something to ensure that EDNS0 data which is subsequently stripped will still be read. Binaries were already built, tested and uploaded while I was typing this reply.

pihole -v
Pi-hole version is v5.16.2 (Latest: v5.16.2)
AdminLTE version is v5.19 (Latest: v5.19)
FTL version is new/ede-dnssec vDev-72b4bc5 (Latest: v5.22)

  • added add-cpe-id=01234

improvement, but still some confusion (screenshot is all from edge, not in pcap)

FTL.zip (16.4 KB)

latest results, the number of wrong entries has decreased - very few wrong entries, sometimes none (cache?)

FTL.zip (38.5 KB)

your latest commit failed to build?

The log only shows 11:11:00 - 11:11:43 so 11:15:00 from your screenshot is not included.

However, looking at the screenshot alone suggests that this is happening because so many queries are needlessly done in parallel. dnsmasq identifies them as being exact duplicates and simply ignores them altogether ("already forwarded"). 50 identical queries will receive only one reply in the end. You see the INSECURE here because of a glitch in the new FTL logic that says "if it is neither SECURE nor BOGUS, then it must be INSECURE". In the exact case you have found, this is not true because they are ignored, so their DNSSEC status is actually none at all. I'm working on this as time allows. The failed build is actually right: I forgot to adjust the tests for the case of failed duplicated replies.

already forwarded -> unknown status ?
means no info displayed in the query log ?

Yes, because they are ignored. Looking at your screenshot

all of the "already forwarded" queries are thrown away and not forwarded another time to unbound (a) to not overwhelm it and (b) because it is not necessary: dnsmasq knows the query is already in progress. They are also never answered (hence reply N/A) so there is really nothing to see here. Already pushed, tests have now passed, binaries are ready.
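The behaviour described here can be modelled with a small Python sketch. It is a toy model of the description in this thread, not code from dnsmasq or FTL; the class name `Forwarder` and the status string `UNKNOWN` are assumptions chosen for illustration.

```python
from typing import Dict, Tuple


class Forwarder:
    """Toy model: identical in-flight queries are forwarded only once."""

    def __init__(self) -> None:
        # (name, qtype) -> ID of the one query that was actually forwarded
        self.in_flight: Dict[Tuple[str, str], int] = {}
        # query ID -> DNSSEC status shown in the query log
        self.status: Dict[int, str] = {}

    def on_query(self, query_id: int, name: str, qtype: str) -> str:
        key = (name.lower(), qtype)
        if key in self.in_flight:
            # Duplicate: ignored and never answered, so there is no DNSSEC
            # status at all (not INSECURE)
            self.status[query_id] = "UNKNOWN"
            return "already forwarded"
        self.in_flight[key] = query_id
        return "forwarded"

    def on_reply(self, name: str, qtype: str, dnssec_status: str) -> None:
        # Only the query that was really forwarded receives the reply's status
        query_id = self.in_flight.pop((name.lower(), qtype))
        self.status[query_id] = dnssec_status


fwd = Forwarder()
results = [fwd.on_query(i, "www.dnssec-failed.org", "A") for i in (1, 2, 3)]
fwd.on_reply("www.dnssec-failed.org", "A", "BOGUS")
print(results)
print(fwd.status)
```

Three identical queries produce one forwarded query and one reply; the two duplicates keep an unknown status instead of being reported as INSECURE.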

Pi-hole version is v5.16.2 (Latest: v5.16.2)
AdminLTE version is v5.19 (Latest: v5.19)
FTL version is new/ede-dnssec vDev-d228fbf (Latest: v5.22)


(more entries - all OK - not in screenshot)

AMAZING!!!

FTL.zip (29.2 KB) (log and pcap)

All FTL tests (using NSD) complete successfully (json test data included)
testftl.zip (2.2 KB)

summary:

  • added to unbound:
	# required for proxy-dnssec (dnsmasq)
	# https://nlnetlabs.nl/news/2022/Jun/02/unbound-1.16.0-released/
	# https://blog.nlnetlabs.nl/extended-dns-error-support-for-unbound/
	ede: yes

  • added to dnsmasq (ensure DNSSEC isn't enabled in settings!):

# requires use of "ede: yes" in unbound.conf
proxy-dnssec
add-cpe-id=01234

grafana:
image

THANKS for your time, effort, persistence !!!


Unrelated to the actual issue, only to expand on the previous:

You may want to consider privacy issues when deciding whether to opt for IPv4 or IPv6 for upstream connectivity.
Certain constellations of your Pi-hole host machine's IPv6 addresses would potentially make that address more susceptible to being tracked, e.g. if the host's OS would not use temporary IPv6 addresses (aka Privacy Extensions), but rather EUI-based or perhaps stable private addresses. To a lesser extent, that may also apply to IPv4.

As often, it would ultimately depend on your individual network's properties (including ISP behaviour with regard to public IP assignment) as well as your personal preferences what you'd ultimately opt for.

This also works on the Nightly docker image.

Just wondering: this presumably doesn't imply that domains with good DNSSEC signatures are now also identified as 'SECURE' in the Pi-Hole Query Log, is that correct?

It should show SECURE if a valid DNSSEC signature is present.

Be aware there are currently two unbound PRs that should ensure the EDE info is also cached. These PRs aren't merged into master yet, so even compiling from GitHub master doesn't enable caching (see issue, comment from gthess). I don't think ede-serve-expired works in the current version.

So I was playing around a bit (the below is with the standard Pi-Hole cache size of 10000).

Now, this happens when I dig for dnssec-deployment.org: the first time, the 'ad' flag is there; subsequent attempts show a response without the flag... I know these are served from the Pi-Hole cache as I checked the Pi-Hole web interface immediately after
(172.16.0.2 is the Pi-Hole address)

pi@Firewalla:~/.firewalla/run/docker/pi-hole/etc-dnsmasq (Purple 2.0) $ dig dnssec-deployment.org @172.16.0.2

; <<>> DiG 9.16.1-Ubuntu <<>> dnssec-deployment.org @172.16.0.2
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 12278
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; OPT=15: 00 03 ("..")
;; QUESTION SECTION:
;dnssec-deployment.org.         IN      A

;; ANSWER SECTION:
dnssec-deployment.org.  30      IN      A       104.18.25.160
dnssec-deployment.org.  30      IN      A       104.18.24.160

;; Query time: 4 msec
;; SERVER: 172.16.0.2#53(172.16.0.2)
;; WHEN: Thu Apr 13 14:15:02 CEST 2023
;; MSG SIZE  rcvd: 88

pi@Firewalla:~/.firewalla/run/docker/pi-hole/etc-dnsmasq (Purple 2.0) $ dig dnssec-deployment.org @172.16.0.2

; <<>> DiG 9.16.1-Ubuntu <<>> dnssec-deployment.org @172.16.0.2
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 9864
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;dnssec-deployment.org.         IN      A

;; ANSWER SECTION:
dnssec-deployment.org.  27      IN      A       104.18.25.160
dnssec-deployment.org.  27      IN      A       104.18.24.160

;; Query time: 4 msec
;; SERVER: 172.16.0.2#53(172.16.0.2)
;; WHEN: Thu Apr 13 14:15:05 CEST 2023
;; MSG SIZE  rcvd: 82

pi@Firewalla:~/.firewalla/run/docker/pi-hole/etc-dnsmasq (Purple 2.0) $ dig dnssec-deployment.org @172.16.0.2

; <<>> DiG 9.16.1-Ubuntu <<>> dnssec-deployment.org @172.16.0.2
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 46244
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;dnssec-deployment.org.         IN      A

;; ANSWER SECTION:
dnssec-deployment.org.  14      IN      A       104.18.24.160
dnssec-deployment.org.  14      IN      A       104.18.25.160

;; Query time: 0 msec
;; SERVER: 172.16.0.2#53(172.16.0.2)
;; WHEN: Thu Apr 13 14:15:18 CEST 2023
;; MSG SIZE  rcvd: 82

Trying to upload the Pi-Hole webinterface Query Log screenshot but it's not working.

2023-04-13 14:15:18	A	dnssec-deployment.org	172.16.0.1	OK (cache)	INSECURE	IP (0.2ms)
2023-04-13 14:15:05	A	dnssec-deployment.org	172.16.0.1	OK (cache)	INSECURE	IP (1.3ms)
2023-04-13 14:15:02	A	dnssec-deployment.org	172.16.0.1	OK (answered by unbound#53)	SECURE (stale answer)	IP (3.2ms)
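As a side note on reading the output above: the `OPT=15: 00 03` line in the first dig answer is EDNS0 option code 15, which RFC 8914 assigns to Extended DNS Errors, and the two payload bytes `00 03` are the info-code. A minimal Python sketch of decoding it follows; the function name `parse_ede` and the abbreviated name table are illustrative and not part of dig or FTL.

```python
import struct
from typing import Tuple

EDE_OPTION_CODE = 15  # EDNS0 option code for Extended DNS Errors (RFC 8914)

# Abbreviated excerpt of the RFC 8914 info-code registry
EDE_NAMES = {
    0: "Other",
    3: "Stale Answer",
    6: "DNSSEC Bogus",
    7: "Signature Expired",
    9: "DNSKEY Missing",
}


def parse_ede(option_data: bytes) -> Tuple[int, str, str]:
    """Split EDE option data into (info_code, name, extra_text)."""
    (info_code,) = struct.unpack("!H", option_data[:2])
    # Anything after the 2-byte info-code is optional UTF-8 extra text
    extra_text = option_data[2:].decode("utf-8", errors="replace")
    return info_code, EDE_NAMES.get(info_code, "code %d" % info_code), extra_text


code, name, text = parse_ede(bytes.fromhex("0003"))
print(code, name, repr(text))
```

Info-code 3 is "Stale Answer", which matches the "SECURE (stale answer)" entry in the query log for the one query that was answered by unbound.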

Now, if I change Pi-Hole cache size to 0, all dig results have the 'ad' flag.

Next test: dane.sys4.de

If I now use Firefox to go to the 'website', Pi-Hole shows 'INSECURE' in the log. If I perform dig for the same address afterwards, I get the 'ad' flag every time and on top of that, when looking into the Pi-Hole logs after, these indeed are identified as 'SECURE'.

can confirm, and query log always says SECURE.

I've been using cache-size=0 for years now, which is probably why my results were initially positive (and why I shouted victory too soon).

Tried to retrieve a webpage on the pi-hole terminal, using lynx (sudo apt-get -y install lynx):
lynx -dump dnssec-deployment.org -> not a single SECURE reported, all INSECURE.