Pihole+unbound+localzone - incorrectly NODATA-IPv4 until restarted (and then repoisoned...)

This is very odd behaviour, and I'm wondering whether IPv6 tickles this bug. OSX Ventura apparently does not help things. I've turned off DNSSEC on Pi-hole, leaving unbound to do the heavy lifting, so that's not the issue.

I'm going to upgrade, but I wanted to file the bug as this didn't immediately jump out on Google as a known bug. Hopefully we can pin this down and help others.

Update: confirmed issue still present on *latest ..

Please follow the below template, it will help us to help you

This is on:

Expected Behaviour

Pihole will query unbound, which returns the correct data for the A-record

Actual Behaviour

Pihole returns NODATA-IPV4

Debug Token


Aug 12 21:03:27 dnsmasq[240]: query[A] nodered.myhouse.privatedomain.com from 192.168.1.23
Aug 12 21:03:27 dnsmasq[240]: forwarded nodered.myhouse.privatedomain.com to 192.168.1.112#5335
Aug 12 21:03:27 dnsmasq[240]: reply nodered.myhouse.privatedomain.com is <CNAME>
Aug 12 21:03:27 dnsmasq[240]: reply cnamed.myhouse.privatedomain.com is NODATA-IPv4
Aug 12 21:03:31 dnsmasq[240]: query[A] cnamed.myhouse.privatedomain.com from 0001:0000:0000:4a00:8d65:869b:55ca:0001
Aug 12 21:03:31 dnsmasq[240]: cached terraDelta.myhouse.privatedomain.com is NODATA-IPv4


root@cnamed2:~# cat /etc/unbound/unbound.conf.d/localzone-myhouse.privatedomain.com.conf
server:
    ###########################################################################
    # LOCAL ZONE
    ###########################################################################


    local-zone: "myhouse.privatedomain.com." transparent
    ### <<< snip >>>

    local-data: "cnamed.myhouse.privatedomain.com. IN A 192.168.1104"
    local-data-ptr: "192.168.100.104 cnamed.myhouse.privatedomain.com."
    local-data: "canmed.myhouse.privatedomain.com. TXT 'Cheese' "
    local-data: "nodered.myhouse.privatedomain.com. CNAME cnamed.myhouse.privatedomain.com."

Your log output suggests that Pi-hole has received that reply from your unbound.

What's the result of:

dig cnamed.myhouse.privatedomain.com @127.0.0.1 -p 5335

the correct ip is returned from both unbound instances when asked directly.

dig cnamed.myhouse.privatedomain.com @192.168.1.112 -p 5335

; <<>> DiG 9.10.6 <<>> cnamed.myhouse.privatedomain.com @192.168.1.112 -p 5335
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 18155
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1472
;; QUESTION SECTION:
;cnamed.myhouse.privatedomain.com. IN A

;; ANSWER SECTION:
cnamed.myhouse.privatedomain.com. 3600 IN A 192.168.1.104

;; Query time: 0 msec
;; SERVER: 192.168.1.112#5335(192.168.1.112)
;; WHEN: Sun Aug 13 08:49:34 EDT 2023
;; MSG SIZE  rcvd: 80

dig cnamed.myhouse.privatedomain.com @192.168.1.110 -p 5335

; <<>> DiG 9.10.6 <<>> cnamed.myhouse.privatedomain.com @192.168.1.110 -p 5335
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 25927
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1472
;; QUESTION SECTION:
;cnamed.myhouse.privatedomain.com. IN A

;; ANSWER SECTION:
cnamed.myhouse.privatedomain.com. 3600 IN A 192.168.1.104

;; Query time: 4 msec
;; SERVER: 192.168.1.110#5335(192.168.1.110)
;; WHEN: Sun Aug 13 08:49:37 EDT 2023
;; MSG SIZE  rcvd: 80

After a bit more digging, I found threads indicating a change in behaviour in OSX Ventura 13.5. I wonder if this could be some weird race condition caused by OSX now querying A, then AAAA, with the NODATA reply to the AAAA query getting cached by dnsmasq?

12:32:08.269173 IP6 (flowlabel 0x60600, hlim 64, next-header UDP (17) payload length: 59)  0001:0000:0000:4a00:8d65:869b:55ca:0001.60470 >  0001:0000:0000:4a00:8d65:869b:55ca:0002.53: [udp sum ok] 46943+ A? supera.myhouse.private.com. (51)
12:32:08.269335 IP6 (flowlabel 0x70d00, hlim 64, next-header UDP (17) payload length: 59) 2 0001:0000:0000:4a00:8d65:869b:55ca:0001.54115 >  0001:0000:0000:4a00:8d65:869b:55ca:0002.53: [udp sum ok] 29260+ AAAA? supera.myhouse.private.com. (51)
12:32:08.274608 IP6 (flowlabel 0x8de5d, hlim 64, next-header UDP (17) payload length: 75) 0001:0000:0000:4a00:8d65:869b:55ca:0002.53 > 0001:0000:0000:4a00:8d65:869b:55ca:0001.60470: [udp sum ok] 46943 q: A? supera.myhouse.private.com. 1/0/0 supera.myhouse.private.com. A 192.168.1.110 (67)
12:32:08.278580 IP6 (flowlabel 0xc472b, hlim 64, next-header UDP (17) payload length: 59) 0001:0000:0000:4a00:8d65:869b:55ca:0002.53 > 0001:0000:0000:4a00:8d65:869b:55ca:0001.54115: [udp sum ok] 29260* q: AAAA? supera.myhouse.private.com. 0/0/0 (51)

I think I've narrowed the issue down to just those local-zone entries with CNAMEs. I've also filed an issue:

https://github.com/pi-hole/FTL/issues/1623

I'll disable CNAMEs and monitor for stability / see how it goes.

No.
That would be two separate requests, each for a different RR type, A and AAAA.

That doesn't preclude that Pi-hole could have cached an A record reply from a previous lookup that could somehow be involved here.
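To illustrate why a NODATA reply for AAAA should not by itself displace an A record, here is a toy cache keyed by name and record type. It is only a sketch of the general idea; the class `ToyDnsCache` is made up for this post and is not how dnsmasq or pihole-FTL actually implement their cache.

```python
class ToyDnsCache:
    """Toy cache keyed by (name, rrtype). Illustration only, not dnsmasq/FTL code."""

    def __init__(self):
        self._entries = {}

    def store(self, name, rrtype, answer):
        # An empty list stands for a cached NODATA reply.
        self._entries[(name.lower(), rrtype.upper())] = list(answer)

    def lookup(self, name, rrtype):
        # Returns None on a cache miss and [] for cached NODATA.
        return self._entries.get((name.lower(), rrtype.upper()))


cache = ToyDnsCache()
# The two queries from the tcpdump above: AAAA came back empty, A did not.
cache.store("supera.myhouse.private.com", "AAAA", [])
cache.store("supera.myhouse.private.com", "A", ["192.168.1.110"])

print(cache.lookup("supera.myhouse.private.com", "A"))
print(cache.lookup("supera.myhouse.private.com", "AAAA"))
```

With separate keys per record type, the empty AAAA entry and the A entry sit side by side, which is why a stale A entry from an earlier lookup is the more plausible suspect.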

Please upload a debug log and post just the token URL that is generated after the log is uploaded by running the following command from the Pi-hole host terminal:

pihole -d

or do it through the Web interface:

Tools > Generate Debug Log

I'm also surprised that you did get that reply from your unbound, as the unbound configuration snippet you've shared looks faulty:

(edit: I've underlined ( "----" ) potential issues in your above output.)

You didn't mention that you run two instances of unbound.
Do they run on the same machine as your Pi-hole, or on separate hosts each?
Is that the reason you did not run the digs against 127.0.0.1?

A possible explanation could have been one of your two unbounds running that faulty configuration, so Pi-hole may deliver the correct answer only when querying the correctly configured one.

However, your dig output suggests that both resolvers at 192.168.1.110#5335 and 192.168.1.112#5335 are returning correct results?

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

As mentioned in the GitHub issue ticket, it'd be great if you could prepare a PCAP recording of the packets being sent to unbound and the corresponding replies, as well as what your Pi-hole then passes on to your client. This will also be important for the dnsmasq-discuss mailing list, as reproduction doesn't seem reliable or straightforward, at least not without an additional locally running server such as unbound.

As mentioned over there, Pi-hole has its own embedded packet dumping. It can be enabled by adding the following to a file like /etc/dnsmasq.d/99-record.conf:

dumpfile=/etc/pihole/dump.pcap

(or any other location you prefer), in addition to

dumpmask=<mask>

where mask specifies which types of packets should be added to the dumpfile defined above. The argument should be the OR of the bitmasks for each type of packet to be dumped: it can be specified in hex by preceding the number with 0x in the normal way.
Each time a packet is written to the dumpfile, we log the packet sequence and the mask representing its type. The current types are:

  • 0x0001 - DNS queries from clients
  • 0x0002 - DNS replies to clients
  • 0x0004 - DNS queries to upstream
  • 0x0008 - DNS replies from upstream
  • 0x0010 - queries sent upstream for DNSSEC validation
  • 0x0020 - replies to queries for DNSSEC validation
  • 0x0040 - replies to client queries which fail DNSSEC validation
  • 0x0080 - replies to queries for DNSSEC validation which fail validation.
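As a small sketch of how the mask is built, the following ORs selected bits together and prints the resulting config line. The bit values are taken from the list above; the dictionary keys and the helper name `dumpmask` are made up here for illustration.

```python
# Packet-type bits as listed above (key names are invented for this example).
DUMP_BITS = {
    "client_query": 0x0001,
    "client_reply": 0x0002,
    "upstream_query": 0x0004,
    "upstream_reply": 0x0008,
    "dnssec_query": 0x0010,
    "dnssec_reply": 0x0020,
    "client_reply_dnssec_fail": 0x0040,
    "dnssec_reply_fail": 0x0080,
}


def dumpmask(*names):
    """OR the selected bits together and format them as a config line."""
    mask = 0
    for name in names:
        mask |= DUMP_BITS[name]
    return f"dumpmask=0x{mask:04x}"


# Only client and upstream queries/replies, without the DNSSEC packet types.
print(dumpmask("client_query", "client_reply", "upstream_query", "upstream_reply"))
# Every packet type in the list.
print(dumpmask(*DUMP_BITS))
```

The first call yields `dumpmask=0x000f`, the second `dumpmask=0x00ff`.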

If you just want to record everything and filter it later in Wireshark (I typically recommend this), you can just add the two lines

dumpfile=/etc/pihole/dump.pcap
dumpmask=0x00ff

After restarting pihole-FTL, it will record whatever you do.

Once you are done with testing, you can remove the config file created above and restart pihole-FTL. The shorter your capturing period is, the easier it will be to analyze the recording.

Next, you can use tcpdump to analyze your recording (adjust the path to wherever your dumpfile is), like

tcpdump -n -ttt -r /tmp/dns.pcap -vvv

(-n shows IP addresses, -ttt shows the time relative to the first recorded packet and -vvv shows as much information as possible).

Example
reading from file /tmp/dns.pcap, link-type EN10MB (Ethernet)
 00:00:00.000000 IP (tos 0x0, ttl 64, id 47075, offset 0, flags [DF], proto UDP (17), length 55)
    192.168.2.224.34661 > 192.168.2.10.53: [bad udp cksum 0x866f -> 0x4807!] 32+ A? google.de. (27)
 00:00:00.056041 IP (tos 0x0, ttl 64, id 32467, offset 0, flags [DF], proto UDP (17), length 71)
    192.168.2.10.53 > 192.168.2.224.34661: [udp sum ok] 32 q: A? google.de. 1/0/0 google.de. [5m] A 142.250.185.163 (43)

Here, you can see that I ran dig A google.de from 192.168.2.224 to 192.168.2.10 (my Pi-hole). The second line shows that the result came 56 milliseconds later and told my dig that the answer to my query is A 142.250.185.163.


You can also share the file with us for further analysis and/or use Wireshark to analyze your recording. If other things happened while you were testing (and you don't want to share them), you can also use Wireshark to clean your PCAP and save only a subset of the recorded packets. However, if you do this, please say so, so that we are prepared in case something is missing.

You can upload the file here or send it to us directly via PM - we can also do the reduction of the PCAP recording for you if you'd prefer this.


Thank you for responding - I appreciate that you are quite busy. I'll get this captured over the weekend.

It's worth noting that since I disabled CNAMEs (on the unbound side), things have been stable for 30+ days.

In trying to reproduce your issue, I've applied a config similar to yours (albeit with different domains for the CNAME and A records).

Interestingly enough, `unbound` produces the same answer as Pi-hole:

unbound has the following local definitions:

    local-data: "cnamed.myhouse.lan. IN A 192.168.1.104"
    local-data: "nodered.myhouse.lan. CNAME cnamed.myhouse.lan."

Querying Pi-hole (at the default port 53)

~ $ dig nodered.myhouse.lan

; <<>> DiG 9.11.5-P4-5.1+deb10u8-Raspbian <<>> nodered.myhouse.lan
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 57924
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;nodered.myhouse.lan.           IN      A

;; ANSWER SECTION:
nodered.myhouse.lan.    30      IN      CNAME   cnamed.myhouse.lan.

;; Query time: 1 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Thu Sep 14 17:34:03 CEST 2023
;; MSG SIZE  rcvd: 69

Querying unbound (at port 5335)

pi@pihole-5:~ $ dig nodered.myhouse.lan @127.0.0.1 -p5335

; <<>> DiG 9.11.5-P4-5.1+deb10u8-Raspbian <<>> nodered.myhouse.lan @127.0.0.1 -p5335
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 58722
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;nodered.myhouse.lan.           IN      A

;; ANSWER SECTION:
nodered.myhouse.lan.    3600    IN      CNAME   cnamed.myhouse.lan.

;; Query time: 0 msec
;; SERVER: 127.0.0.1#5335(127.0.0.1)
;; WHEN: Thu Sep 14 17:34:27 CEST 2023
;; MSG SIZE  rcvd: 69

It seems that unbound, being a recursive resolver rather than an authoritative one, intentionally does not expand CNAMEs from local definitions, which would prompt the behaviour you observe.

See e.g. "[Unbound-users] Can't get CNAME entries to resolve" and "Consider disabling CNAME scrubbing for forwarded queries" (Issue #132, NLnetLabs/unbound on GitHub).
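To spot this pattern in saved dig output, a minimal sketch like the following could flag answers that contain a CNAME but no record of the requested type. The function name `unresolved_cname` is made up for this post, and it assumes the usual dig answer-line layout of name, TTL, class, type, data.

```python
def unresolved_cname(answer_section, qtype="A"):
    """Return the CNAME target if the answer has a CNAME but no qtype record."""
    target = None
    has_final = False
    for line in answer_section.splitlines():
        fields = line.split()
        # Skip comment lines and anything that is not name/ttl/class/type/data.
        if len(fields) < 5 or fields[0].startswith(";"):
            continue
        rrtype, rdata = fields[3], fields[4]
        if rrtype == "CNAME":
            target = rdata
        elif rrtype == qtype:
            has_final = True
    return target if target and not has_final else None


# Answer section as returned by unbound in the dig output above.
answer = "nodered.myhouse.lan.    3600    IN      CNAME   cnamed.myhouse.lan."
print(unresolved_cname(answer))
```

For the answer above this reports `cnamed.myhouse.lan.` as a target that was left unexpanded; a client that only asked for an A record ends up with no address.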

You could consider moving your definitions to Pi-hole's CNAME and A records.
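As a rough sketch of such a move, the following turns unbound `local-data` A and CNAME lines into hosts-style lines and dnsmasq `cname=` lines. The helper `convert` is made up for this post, it ignores other record types such as TXT and PTR, and where Pi-hole stores these records depends on your version, so entering the results through the web interface's local DNS settings is probably the safer route.

```python
import re

# Matches local-data lines holding an A or CNAME record, with or without "IN".
LOCAL_DATA = re.compile(r'local-data:\s*"(\S+)\s+(?:IN\s+)?(A|CNAME)\s+(\S+?)\s*"')


def convert(unbound_conf):
    """Return (hosts_lines, cname_lines) derived from unbound local-data entries."""
    hosts, cnames = [], []
    for name, rrtype, value in LOCAL_DATA.findall(unbound_conf):
        name = name.rstrip(".")
        if rrtype == "A":
            # hosts format: address first, then the name
            hosts.append(f"{value} {name}")
        else:
            # dnsmasq syntax: cname=<alias>,<target>
            cnames.append(f"cname={name},{value.rstrip('.')}")
    return hosts, cnames


sample = '''
    local-data: "cnamed.myhouse.lan. IN A 192.168.1.104"
    local-data: "nodered.myhouse.lan. CNAME cnamed.myhouse.lan."
'''
hosts, cnames = convert(sample)
print(hosts)
print(cnames)
```

For the two definitions used in the test above, this gives `192.168.1.104 cnamed.myhouse.lan` and `cname=nodered.myhouse.lan,cnamed.myhouse.lan`.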

Thanks for spending some time here. GH Issue has been closed as resolved.

I do not think that I'm the only one who aspires to use CNAMEs in a split-horizon environment. Perhaps a "guide" is in order adjacent or under unbound here?

Happy to draft a PR for review/integration.

It seems there are two options:
1.) Move the definition "upstream" to Pi-hole
2.) Run a third, authoritative zone server for homelan behind unbound, just for homelan.

Let me play with both and I will report back.

2.) is what the linked discussions also seem to suggest.

As you do run Pi-hole, I'd still recommend defining your CNAMEs and As in Pi-hole (unless you require some additional features that neither Pi-hole nor unbound is able to provide). :wink:
It also would seem like an obvious choice, regardless of Pi-hole's upstream DNS configuration. In fact, upstreams commonly cannot be expected to resolve local DNS names.

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.