Okay, this is interesting... The first one behaves as expected (BOGUS) and is of type A. The two others are not. However, this isn't the important difference here. Your upstream apparently really didn't include the EDE codes in the two INSECURE cases.
To verify this, we need to dump the reply received from unbound. This can be enabled by adding the following to a file like /etc/dnsmasq.d/99-record.conf:
dumpfile=/etc/pihole/dump.pcap
dumpmask=0x00ff
This file should show us whether my speculation is right and the EDE is missing from upstream. Please also upgrade your FTL beforehand, as I added a bit more debug logging to the EDNS0 code (no functional change). The build is done and uploaded.
installed new ftl
Pi-hole version is v5.16.2 (Latest: v5.16.2)
AdminLTE version is v5.19 (Latest: v5.19)
FTL version is new/ede-dnssec vDev-538c6a0 (Latest: v5.22)
cleared ftl log
ran the tests in the given order (dig A, followed by dig AAAA, edge, firefox)
@jpgpi250 one further detail from the last FTL log: You are using both your unbound's IPv4 and IPv6 addresses internally. This creates a lot of additional traffic in your internal network. Reason: when long-running queries aren't finished yet, FTL thinks the upstream may have stopped working and starts broadcasting all upstream packets to all configured servers. This causes a lot of unnecessary traffic, and unbound doesn't reply in order (it doesn't have to according to DNS specifications). I don't think this explains it, but it is the only visible change from running dig (performing exactly one query) to running in the browser (running multiple queries simultaneously).
Please try what changes when you only use one of the internal IP addresses for your unbound upstream. Also note that, for the reasons mentioned above, it is in general disadvantageous to add the same server multiple times (even if over different protocols). This might be the reason for the confusion here.
What is abnormal here? I see a domain that should be BOGUS + SERVFAIL, and this is exactly what is being reported. The first BOGUS is Pi-hole's interpretation of the DNSSEC status; the text in parentheses is what was found as the reason (in this case, unbound told us "DNSSEC bogus"). There can be other texts there, e.g., "DNSKEY missing" or "... expired", etc., so it is meaningful to have this in addition.
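For readers unfamiliar with where that reason text comes from: it travels as an Extended DNS Error (EDE) option inside the EDNS0 OPT record of the reply (RFC 8914). The following is only a minimal sketch of decoding such an option from raw OPT RDATA, not FTL's actual parser. The helper names `parse_ede` and `make_ede` are made up for illustration, and the table lists only the three info codes relevant to this thread.

```python
import struct

EDE_OPTION_CODE = 15  # EDNS0 option code for Extended DNS Errors (RFC 8914)

# Small subset of the RFC 8914 info codes
EDE_INFO_CODES = {
    6: "DNSSEC Bogus",
    7: "Signature Expired",
    9: "DNSKEY Missing",
}

def parse_ede(opt_rdata: bytes):
    """Return (info_code, name, extra_text) for every EDE option in OPT RDATA."""
    found = []
    pos = 0
    while pos + 4 <= len(opt_rdata):
        # Every EDNS0 option is: 2-byte code, 2-byte length, then the data
        code, length = struct.unpack_from("!HH", opt_rdata, pos)
        pos += 4
        data = opt_rdata[pos:pos + length]
        pos += length
        if code == EDE_OPTION_CODE and len(data) >= 2:
            (info,) = struct.unpack_from("!H", data)
            text = data[2:].decode("utf-8", errors="replace")
            found.append((info, EDE_INFO_CODES.get(info, "unknown"), text))
    return found

def make_ede(info: int, text: str = "") -> bytes:
    """Hypothetical helper: encode one EDE option (used here for testing only)."""
    body = struct.pack("!H", info) + text.encode("utf-8")
    return struct.pack("!HH", EDE_OPTION_CODE, len(body)) + body

print(parse_ede(make_ede(6)))
```

If the reply carries no OPT record at all, there is nothing to parse and the list stays empty, which is the situation in the two INSECURE cases.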
I tried running pihole-FTL with a single upstream (IPv4). The results are the same, correct with dig, incorrect with browser. Repeated the test with only IPv6 upstream, same result.
I also tried removing redis (unbound cache module) from the equation, same result, correct with dig, incorrect with browser.
what is the recommendation, use IPv4 or IPv6 as upstream to unbound? The dashboard indicates IPv4 (38.7%), IPv6 (12.4%), but this is probably caused by the order in the conf file.
Using IPv4 only is the recommendation. Whenever direct routing is available (e.g. in your local network), it doesn't make much of a difference, but IPv6 packets are a little bit larger than IPv4. Whenever NAT is involved (e.g. when the packet goes out to the public Internet), IPv6 is generally the better choice because less rewriting of the packet and more obvious routing make IPv6 faster in most multi-hop scenarios.
Okay. I thought (hoped?) so, it would probably be more difficult to debug such an IPv4/IPv6 convolution scenario. Could you provide another log+pcap ZIP file in this simpler case? It will make the analysis a lot easier (especially for possible unbound developers coming here and not being familiar with the quite verbose FTL log output).
You may have noticed that the "working" request looks different (much longer, less white area) than the "not working" one. dig is explicitly telling dnsmasq (which does nothing more than forward the request to unbound) that it supports EDNS data. The browser lookup doesn't do this (missing "EDNS0 version: 0" in the "not working request" Wireshark screenshot above) and, as a consequence, unbound doesn't attach the EDE at all. One may now argue whether this is expected (and intended) behavior or not, but I could understand if the unbound folks say that they don't want to provide information that has not been requested (or, rather, for which support has not been announced).
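To make the difference between the two requests concrete, here is a rough sketch of a DNS query built with and without the EDNS0 OPT pseudo-record. It is not what dig or a browser literally sends (dig sets further flags and options by default); the function names, the query ID and the 1232-byte UDP payload size are arbitrary choices for illustration.

```python
import struct

def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels (no compression)."""
    out = b""
    for label in name.rstrip(".").split("."):
        out += bytes([len(label)]) + label.encode("ascii")
    return out + b"\x00"

def build_query(name: str, qtype: int = 1, edns: bool = True, query_id: int = 0x1234) -> bytes:
    """Build a minimal query; with edns=True an OPT record announces EDNS0 support."""
    arcount = 1 if edns else 0
    # Header: ID, flags (RD set), QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, arcount)
    question = encode_name(name) + struct.pack("!HH", qtype, 1)  # class IN
    msg = header + question
    if edns:
        # OPT pseudo-RR: root name, TYPE=41, CLASS=UDP payload size,
        # TTL=extended RCODE/version 0/flags, RDLENGTH=0
        msg += b"\x00" + struct.pack("!HHIH", 41, 1232, 0, 0)
    return msg

def has_edns(query: bytes) -> bool:
    """Check for an OPT record. Only valid for simple queries as built above
    (one question, no name compression, OPT first in the additional section)."""
    arcount = struct.unpack_from("!H", query, 10)[0]
    if arcount == 0:
        return False
    pos = 12
    while query[pos] != 0:      # skip the labels of the question name
        pos += query[pos] + 1
    pos += 1 + 4                # terminating zero label, QTYPE, QCLASS
    return query[pos] == 0 and struct.unpack_from("!H", query, pos + 1)[0] == 41

with_edns = build_query("www.dnssec-failed.org")
without_edns = build_query("www.dnssec-failed.org", edns=False)
print(has_edns(with_edns), has_edns(without_edns))
```

The only difference between the two messages is the 11-byte OPT record and the ARCOUNT field in the header, which matches the shorter "not working" request in the capture.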
Now that we know what is happening, we can reliably trigger it with
dig +noedns www.dnssec-failed.org
I see three possible ways of going forward:
1. unbound sends the EDE data even if EDNS0 was not explicitly requested
2. dnsmasq adds the signaling of EDNS0 support into forwarded packets that do not contain it
3. FTL interprets all SERVFAIL as being BOGUS in proxy-dnssec mode when no EDE is contained
I would prefer option 1 as I don't see much reasoning for no. 2 to be implemented by Simon. I don't know if sending down EDNS0 data that was not requested has the potential to break things so we'd possibly even need to remember that we'd have to strip this. The third and last option would obviously be the simplest, however, it'd mark all SERVFAIL as BOGUS without any real justification for doing so. Say the unbound server has some other problem causing it to return SERVFAIL (e.g. it cannot resolve parts of a recursive path because the nameserver of example.com is currently down), the Pi-hole would falsely stamp BOGUS here even when this is completely wrong.
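The drawback of the third option can be sketched in a few lines. This is purely illustrative and not FTL's code: `classify`, the `servfail_means_bogus` switch and the "UNKNOWN" label are hypothetical names, and the set of EDE codes is just the small subset mentioned in this thread.

```python
SERVFAIL = 2  # DNS RCODE for "server failure"

# EDE info codes (RFC 8914) that indicate a DNSSEC validation failure;
# only a subset, chosen for illustration
DNSSEC_FAILURE_CODES = {6, 7, 9}  # DNSSEC Bogus, Signature Expired, DNSKEY Missing

def classify(rcode: int, ede_codes, servfail_means_bogus: bool = False) -> str:
    """Hypothetical status decision for one reply from the upstream resolver."""
    if any(code in DNSSEC_FAILURE_CODES for code in ede_codes):
        return "BOGUS"  # upstream told us explicitly why it failed
    if rcode == SERVFAIL and not ede_codes and servfail_means_bogus:
        return "BOGUS"  # option 3: a guess without any justification
    return "UNKNOWN"

# SERVFAIL with EDE 6: correctly BOGUS either way
print(classify(SERVFAIL, [6]))
# SERVFAIL without EDE, e.g. because a nameserver is down
print(classify(SERVFAIL, []), classify(SERVFAIL, [], servfail_means_bogus=True))
```

With the switch enabled, a plain SERVFAIL caused by an unreachable nameserver is indistinguishable from a real validation failure, which is exactly the false BOGUS described above.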
NOT your preferred option, yet I think that whenever proxy-dnssec is enabled, dnsmasq should signal that it "needs" EDE data for all queries. This way, people not using proxy-dnssec don't receive the additional EDE data when it isn't required to make everything work.
I've already added an entry to the dnsmasq mailing list, here. Maybe you could add some valuable info into a mailing list entry, or convince Simon this would be a major improvement.
I don't think this will be possible because it isn't true - this would be no improvement as dnsmasq itself isn't using the EDE code for anything.
However, we found that dnsmasq already has option no. 2 above!
Just add something that makes dnsmasq add EDNS0 data on its own, e.g.
add-cpe-id=01234
(could also be add-mac or add-subnet) into a config file. The added EDNS0 data will be stripped away before sending the reply back to your client as I was suspecting earlier.
Please add this line and upgrade to the latest version of FTL. I needed to change something to ensure that EDNS0 data which is subsequently stripped is still read. Binaries are already built, tested and uploaded while I was typing this reply.
pihole -v
Pi-hole version is v5.16.2 (Latest: v5.16.2)
AdminLTE version is v5.19 (Latest: v5.19)
FTL version is new/ede-dnssec vDev-72b4bc5 (Latest: v5.22)
added add-cpe-id=01234
improvement, but still some confusion (screenshot is all from edge, not in pcap)
The log only shows 11:11:00 - 11:11:43 so 11:15:00 from your screenshot is not included.
However, looking at the screenshot alone suggests that this is happening because so many queries are needlessly done in parallel. dnsmasq identifies them as being exact duplicates and simply ignores them altogether ("already forwarded"). 50 identical queries will receive only one reply in the end. You see the INSECURE here because of a glitch in the new FTL logic that says "if it is neither SECURE nor BOGUS, then it must be INSECURE". In the exact case you have found, this is not true because they are ignored, so their DNSSEC status is actually none at all. I'm working on this as time allows. The failed build is actually right: I forgot to adjust the tests for the case of failed duplicated replies.
Yes, because they are ignored. Looking at your screenshot, all of the "already forwarded" queries are thrown away and not forwarded another time to unbound, (a) to not overwhelm it and (b) because it is not necessary: dnsmasq knows the query is already in progress. They are also never answered (hence reply N/A), so there is really nothing to see here. Already pushed - tests passed, binaries ready.
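As a toy model of the behavior described here (not dnsmasq's or FTL's actual implementation), duplicate handling can be pictured like this. The function name `process` and the "PENDING" placeholder are invented for the sketch; the point is only that an ignored duplicate carries no DNSSEC status at all rather than INSECURE.

```python
def process(queries):
    """Mark the first occurrence of each (name, type) as forwarded and all
    later identical queries as 'already forwarded' with no DNSSEC status."""
    in_flight = set()
    results = []
    for name, qtype in queries:
        key = (name.lower(), qtype)
        if key in in_flight:
            # Duplicate of a query still in progress: not sent upstream again,
            # so there is no reply and no DNSSEC status to report
            results.append((name, qtype, "already forwarded", None))
        else:
            in_flight.add(key)
            # Status is filled in later, once the upstream reply arrives
            results.append((name, qtype, "forwarded", "PENDING"))
    return results

for row in process([("www.dnssec-failed.org", "A")] * 3):
    print(row)
```

Only the first of the three identical queries reaches the upstream; the other two end up with a status of `None`, which is what the adjusted FTL logic should show instead of INSECURE.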