Seeking advice on running pihole along with an authoritative Bind9 DNS server

Apologies for not using the template, but I need some high-level advice on how to handle the DNS architecture of my ecosystem.

For many years, I have used a Bind9 recursive resolver (which is doing split-brain DNS) and it has worked well. My clients talk directly to bind on port 53 and there are multiple zones (work, personal, and kids) with different upstreams.

Much later in the evolution of the ecosystem, I deployed pihole as an upstream (on port 5353), primarily for ad blocking. I looked into dnsmasq taking over the functions of bind, but it didn't seem worth upending my existing, deeply configured ecosystem. The work and personal bind zones (which resolve different internal addresses, hence the separate zones) upstream to pihole, which happily blocks a ton of content. If my wife gets frustrated because something got blocked, I have to temporarily disable blocking for the whole house, as pihole sees ALL requests as coming from the bind instance and not from the client (remember this for later). The kids get OpenDNS family.
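For illustration, here is a heavily simplified sketch of that kind of split-brain layout in named.conf; the view names, subnets, zone file names, OpenDNS addresses, and the pihole address are placeholders and assumptions rather than my real config:

acl "kids"     { 10.2.3.0/24; };
acl "personal" { 10.2.2.0/24; };

view "kids" {
    match-clients { "kids"; };
    recursion yes;
    forward only;
    forwarders { 208.67.222.123; 208.67.220.123; };  // OpenDNS family filter (assumed addresses)
    zone "internal.domain.com" { type master; file "db.internal.kids"; };
};

view "personal" {
    match-clients { "personal"; };
    recursion yes;
    forward only;
    forwarders { 127.0.0.1 port 5353; };             // pihole upstream, assumed on the same host
    zone "internal.domain.com" { type master; file "db.internal.personal"; };
};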

This has been stable for a long time; however, my needs are changing. Due to several factors (not the least of which is requiring DNSSEC for SSHFP), I am considering putting pihole in front, having all my clients ultimately get the same set of internal resolution and the same upstream DNS, removing recursive resolving from bind (a requirement for it to be authoritative, per my understanding), and making bind the priority-1 upstream for pihole. This would allow me to do different levels of blocking per client/subnet, and would make my DNSSEC issue less tricky to troubleshoot.

I see a lot of pros/cons to doing this either way. As I understand it, per a post from several years ago (Allow different DNS for each group), pihole can't specify different upstreams per group because the cache is common; I can't tell if that's still the case. Additionally, if my bind server becomes fully authoritative (as I understand it, recursion has to be disabled for it to return the ad flag as a child domain whose parent domain has a matching DS record), I need it to be the first DNS resolver for pihole, with my upstream of choice as the second. I don't see a way to set that priority via the UI, which would make what I'm hoping to do impossible, unless specifying custom DNS servers in order (1, 2) works.
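If the group/priority angle doesn't pan out, one alternative I could try is forwarding only the internal zone from pihole to bind via a drop-in dnsmasq snippet, instead of relying on upstream ordering; this is just a sketch, assuming pihole's FTL still reads drop-in config from /etc/dnsmasq.d/, with placeholder addresses:

# /etc/dnsmasq.d/99-internal-forward.conf (hypothetical file name)
# Send only the internal zone (and its reverse range) to the authoritative bind,
# while everything else uses pihole's normal upstreams.
server=/internal.domain.com/10.2.2.213
server=/2.2.10.in-addr.arpa/10.2.2.213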

I know this is an over-complicated setup, but at this point I'm mostly committed to the bind setup and want to figure out how to integrate pihole into it; however, disabling recursion to make it return the ad flag seems to require me to re-architect how all my clients get DNS. I appreciate any ideas.

Mind that there is no priority with DNS: A client (Pi-hole, in this case) may pick any server from a given set. If you provide multiple upstreams, Pi-hole will always use the fastest responding one exclusively.

If your Bind9 server really were to become an authoritative DNS server, then you could pick any upstream for Pi-hole, as public DNS servers would already be aware of it and use it.

But why would you run an authoritative DNS server at home?

Also, your aspired goal and the motivation for it are not quite clear to me.
From a 2022 post of yours, it would seem that you've started with your Pi-hole in front of Bind9? If your current configuration works, what's the reason for going back to that older one?

I totally forgot about that post; that was still during implementation, and I never really went with it; I flipped them almost immediately. Previously, I had bind as the only DNS, then I put pihole in on a Pi. When I was sure I liked it (which overlapped with switching to actual VM infrastructure and moving things off Pis to VMs), I tried it that way, but it didn't end up working out.

The main change is requiring DNSSEC for SSHFP. We run Minecraft servers and have grown a lot, expanding horizontally into gaming services off of the major platforms (Mumble instead of Discord), and there's a need for more orchestration, spinning machines and VMs up and down, etc. My intended solution is SSHFP, but the issue is that SSH won't trust the keys it finds in DNS without DNSSEC. I've signed all my bind zones and have a DS record at my host: I have a hosted domain.com domain with DNSSEC; a bunch of subdomains off of that are hosted by the web host, and the rest point to the public IP of the internal network. One of those is our internal domain, say internal.domain.com, and that zone is signed and presents all the correct DNSSEC records.

I am still learning, but my understanding is this should be possible for a bind9 server serving an internal network only as a child of a parent domain, as long as the DS and DNSKEY records line up. This does validate in delv on a host inside the network.
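For anyone following along, these are the kinds of spot checks that should confirm the chain of trust (placeholder names as above, not my actual output):

$ dig +dnssec DS internal.domain.com.        # DS published in the parent zone
$ dig +dnssec DNSKEY internal.domain.com.    # DNSKEY set served by the child zone
$ delv @10.2.2.213 internal.domain.com. DNSKEY +rtrace   # full validation trace from inside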

When running dig against the domain, I still don't get the ad flag, and my research has taken me to the point of seeing that's because it's recursive, so that's the main issue I'm trying to solve: how can I maintain an internal DNS which validates DNSSEC, and still get the benefits of pihole for normal browsing? Additionally, while thinking about my options, I realized that moving pihole in front would mean I could do things like turn off ad blocking for my wife's clients, because she hates it.

I'm open to any ideas; this is a wonky setup, but I'm pretty committed to our bind, as it has, at this point, thousands of records we've curated as we've grown.

This sounds like you are running some business off your home?

In that case, my advice would be for a clean separation of business and private networks.
Then use Pi-hole as a filter for your private network with any upstream you like, and run at least one separate network for your business affairs, where you host your public services and your authoritative Bind9 DNS server.

It's more complicated than that. It started with me being a big self-hoster with a bunch of services, then grew into acting as a CDN node for a bunch of other family businesses, then running a Minecraft server for my kids, then expanding that into a bunch of off-cloud services for them to make the other parents comfortable with "online without being online." That grew and is now expanding horizontally, with multiple Minecraft servers spun up and down frequently, and my kids are now part of the administration and management of it. Functionally, it's one big network; it's just grown out horizontally with a tremendously blurry line between family and business.

At a high level, internal.domain.com holds stuff that shouldn't resolve externally (most of our internal services live on various ports which get forwarded); however, we're rolling out Ansible, and host key management was something I wanted to solve with SSHFP, since it seems like the most robust way to do it from a management perspective.
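On the record-generation side, ssh-keygen can emit the SSHFP records on each host, which is roughly what I'd template into the zone via Ansible (hostname is a placeholder):

$ ssh-keygen -r servername.internal.domain.com.
# prints zone-file-ready lines such as:
# servername.internal.domain.com. IN SSHFP 4 2 <sha256 fingerprint>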

This is a bit outside of a pihole-specific discussion, but ultimately I'm still learning this level of DNS. This is what I'm seeing (names changed): delv shows the whole thing as fully DNSSEC validated, but dig is not returning the ad flag, and this article says the ad flag is what matters: Use SSHFP Records to Verify SSH Host Keys | Vultr Docs

$ ssh servername.internal.domain.com
The authenticity of host 'servername.internal.domain.com (10.2.2.202)' can't be established.
ED25519 key fingerprint is SHA256:3IM+5WjzYyZHYGhpdQCVD3BKc4Lm334iE3rF7etcJyM.
Matching host key fingerprint found in DNS.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])?
$ delv @10.2.2.213 servername.internal.domain.com. A +rtrace
;; fetch: servername.internal.domain.com/A
;; fetch: internal.domain.com/DNSKEY
;; fetch: internal.domain.com/DS
;; fetch: surfrock66.com/DNSKEY
;; fetch: surfrock66.com/DS
;; fetch: com/DNSKEY
;; fetch: com/DS
;; fetch: ./DNSKEY
; fully validated
servername.internal.domain.com.	86400 IN A	10.2.2.202
servername.internal.domain.com.	86400 IN RRSIG	A 13 4 86400 20240917122101 20240903215102 18461 internal.domain.com. HTPid68T3hn2ccWNxmxH4XGknTr7TWEWEvjfEW+DfABG7aopGEdlgdIK q12yRkEYGJOZbQRhxy5Tw8o4H5jpkA==
$ dig @10.2.2.213 servername.internal.domain.com +dnssec +multi

; <<>> DiG 9.18.28-0ubuntu0.22.04.1-Ubuntu <<>> @10.2.2.213 servername.internal.domain.com +dnssec +multi
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 7853
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1232
; COOKIE: 513ef31438584dd50100000066db1d409c1c810ee8f25bf3 (good)
;; QUESTION SECTION:
;servername.internal.domain.com.	IN A

;; ANSWER SECTION:
servername.internal.domain.com.	86400 IN A 10.2.2.202
servername.internal.domain.com.	86400 IN RRSIG A 13 4 86400 (
				20240917122101 20240903215102 18461 internal.domain.com.
				HTPid68T3hn2ccWNxmxH4XGknTr7TWEWEvjfEW+DfABG
				7aopGEdlgdIKq12yRkEYGJOZbQRhxy5Tw8o4H5jpkA== )

;; Query time: 0 msec
;; SERVER: 10.2.2.213#53(10.2.2.213) (UDP)
;; WHEN: Fri Sep 06 08:18:24 PDT 2024
;; MSG SIZE  rcvd: 217

So your actual issue is how to get your SSHFP DNS records operational?

That would be out of scope for Pi-hole, but I can try to provide some hints.

As I understand from your description, your Bind9 is currently working both as an authoritative DNS server for certain (local and/or public?) zones and as a recursive resolver.

I'm guessing, but perhaps your issue is that Bind9 answers as an authoritative DNS server when serving requests from those zones: According to RFC 4035 - Protocol Modifications for the DNS Security Extensions, an authoritative DNS server must not set the AD bit in a response, unless it has been explicitly configured to do so.
Generally, it is the job of a validating recursive resolver, employing DNSSEC validation, to determine whether the Authentic Data (AD) bit can be set.

As the RFC alludes that an authoritative DNS server could be explicitly configured to set the AD bit in replies, you may want to consult Bind9's documentation on whether and how that could be achieved.
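As a quick sanity check, comparing the flags of the same query against your authoritative server and against a validating recursive resolver should make the difference visible (the second server is a placeholder):

$ dig @10.2.2.213 servername.internal.domain.com +dnssec | grep flags
;; flags: qr aa rd ra; QUERY: 1, ...          # authoritative answer (aa), no ad
$ dig @<validating-resolver> servername.internal.domain.com +dnssec | grep flags
;; flags: qr rd ra ad; QUERY: 1, ...          # expected from a validating resolver (ad set)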

Alternatively, if that fails, you could try your approach of running a separate authoritative DNS server, though that could also be a second dedicated instance of Bind9 (or some other DNS server like unbound).
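A minimal sketch of what such a dedicated authoritative instance could look like in named.conf; the listen address and file path are placeholders:

options {
    listen-on port 53 { 10.2.2.250; };    // a second address, separate from the resolver
    recursion no;                         // authoritative-only: never recurse
    allow-query { 10.0.0.0/8; };          // internal clients and resolvers only
    allow-transfer { none; };
};

zone "internal.domain.com" {
    type master;
    file "/etc/bind/zones/db.internal.domain.com.signed";   // the signed zone file
};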

Also, note that your dig/delv commands did not verify your SSHFP records, as they requested an A record instead. Furthermore, running ssh -v servername.internal.domain.com could provide some additional information about the fingerprint lookup.
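Concretely, something along these lines, using the same placeholder names as your output:

$ dig @10.2.2.213 servername.internal.domain.com. SSHFP +dnssec    # query the SSHFP RRset itself
$ delv @10.2.2.213 servername.internal.domain.com. SSHFP +rtrace   # validate that RRset
$ ssh -v -o VerifyHostKeyDNS=yes servername.internal.domain.com    # verbose fingerprint lookup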

I really appreciate your help; I know this has wandered from the original "can pihole be the solution" discussion into "I need fundamental DNS help," so thank you for responding given how far out of scope this is.

I've been battling the bind config to force it to send the ad flag; I've gone through so many config iterations, especially playing around with trust anchors for the parent zones, but it is still not sending it. SSH finds the key, but regards it as insecure:

debug1: Server host key: ssh-ed25519 SHA256:QUF+YUZXDIzBnJ1IAJK6EWsvmDj4fz6CMUs2/V6B4cs
debug1: found 6 insecure fingerprints in DNS
debug1: verify_host_key_dns: matched SSHFP type 4 fptype 2
debug1: verify_host_key_dns: matched SSHFP type 4 fptype 1
debug1: matching host key fingerprint found in DNS

I'm getting a bit fed up with the whole thing, and the default will be to turn SSH host fingerprint checking off altogether. This will mostly be used with freshly imaged machines, meaning I'll be dealing with them before the Ansible bootstrap completes anyway, so I can't push the known_hosts file down, and it also avoids constantly updating that file. I may come back to it later, but for now it's become too deep of a rabbit hole.
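If I do go that route, it'll probably just be an ssh_config stanza scoped to the internal zone, something like this (fully aware it trades away protection against spoofed hosts):

# ~/.ssh/config (or a drop-in for the ansible user)
Host *.internal.domain.com
    StrictHostKeyChecking no
    UserKnownHostsFile /dev/null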

Ultimately, all signs point to DNSSEC being fully in place and validated, but for some reason SSH isn't seeing it. I worked through the steps here, and the only thing seemingly not in place is EDNS, but I haven't read anywhere else that it's required: ssh - SSHFP not working - Server Fault

ansible@managementnode:~$ delv servername.internal.domain.com.
; fully validated
servername.internal.domain.com.	86400 IN A	10.2.2.202
servername.internal.domain.com.	86400 IN RRSIG	A 13 4 86400 20240917122101 20240903215102 18461 internal.domain.com. HTPid68T3hn2ccWNxmxH4XGknTr7TWEWEvjfEW+DfABG7aopGEdlgdIK q12yRkEYGJOZbQRhxy5Tw8o4H5jpkA==

ansible@managementnode:~$ ssh-keyscan -D servername.internal.domain.com.
; servername.internal.domain.com.:22 SSH-2.0-OpenSSH_8.9p1 Ubuntu-3ubuntu0.10
; servername.internal.domain.com.:22 SSH-2.0-OpenSSH_8.9p1 Ubuntu-3ubuntu0.10
servername.internal.domain.com. IN SSHFP 1 1 bb65c184326abcedfd3fab6b55355b9d40d3dc57
servername.internal.domain.com. IN SSHFP 1 2 c283dd05748a22104ea88aaca6d2c23cd2e532c70a0a25c7131c3a514b0cac93
; servername.internal.domain.com.:22 SSH-2.0-OpenSSH_8.9p1 Ubuntu-3ubuntu0.10
servername.internal.domain.com. IN SSHFP 3 1 2e59108148ff69f1f53af9ba57ab731791ea993d
servername.internal.domain.com. IN SSHFP 3 2 9a7fe5e26f36448dc0bcf33435487a206174017a0f60255d74a64a3ed8419283
; servername.internal.domain.com.:22 SSH-2.0-OpenSSH_8.9p1 Ubuntu-3ubuntu0.10
servername.internal.domain.com. IN SSHFP 4 1 6c743b1eb85008bf847edafce1ae439d4857080d
servername.internal.domain.com. IN SSHFP 4 2 dc833ee568f36326476068697500950f704a7382e6df7e22137ac5edeb5c2723
; servername.internal.domain.com.:22 SSH-2.0-OpenSSH_8.9p1 Ubuntu-3ubuntu0.10

ansible@managementnode:~$ delv servername.internal.domain.com. sshfp
; fully validated
servername.internal.domain.com.	86400 IN SSHFP	1 1 BB65C184326ABCEDFD3FAB6B55355B9D40D3DC57
servername.internal.domain.com.	86400 IN SSHFP	1 2 C283DD05748A22104EA88AACA6D2C23CD2E532C70A0A25C7131C3A51 4B0CAC93
servername.internal.domain.com.	86400 IN SSHFP	2 1 31A37B2E2D83A78769E2170DFDD3BBD556ABC5F7
servername.internal.domain.com.	86400 IN SSHFP	2 2 03E5FEE74C426F022B62E28495EC2E08BB60331CE15CC511B0B9F355 66400200
servername.internal.domain.com.	86400 IN SSHFP	3 1 2E59108148FF69F1F53AF9BA57AB731791EA993D
servername.internal.domain.com.	86400 IN SSHFP	3 2 9A7FE5E26F36448DC0BCF33435487A206174017A0F60255D74A64A3E D8419283
servername.internal.domain.com.	86400 IN SSHFP	4 1 6C743B1EB85008BF847EDAFCE1AE439D4857080D
servername.internal.domain.com.	86400 IN SSHFP	4 2 DC833EE568F36326476068697500950F704A7382E6DF7E22137AC5ED EB5C2723
servername.internal.domain.com.	86400 IN RRSIG	SSHFP 13 4 86400 20240919132142 20240905212139 18461 internal.domain.com. CQvY3hYib5Nd8kGzNEQShmyXRA/t2lI8OZ4C0BR7eMgwwE8m5pFkiW7S MNi+2/FpiCMWX0r5WmPTJYZnfL1wKQ==

ansible@managementnode:~$ ssh -oVerifyHostKeyDNS=yes servername.internal.domain.com.
The authenticity of host 'servername.internal.domain.com. (10.2.2.202)' can't be established.
ED25519 key fingerprint is SHA256:3IM+5WjzYyZHYGhpdQCVD3BKc4Lm334iE3rF7etcJyM.
Matching host key fingerprint found in DNS.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])?

You are running Bind9 in a mixed-mode configuration, both as an authoritative and as a recursive resolver.

We've established so far that if Bind9 is queried for records it is authoritative for, it will answer them, but it won't set the AD bit, just as the RFC requires, and that missing AD bit in turn prompts ssh to treat the keys as insecure.

If there's no way to configure Bind9 to set that AD bit, then you'd have to separate your authoritative server from your recursive resolver.
Note that this may introduce new difficulties if you rely on, or even enforce, internal domains being queried only from private-range source IPs.

If authoritative use isn't your main concern, and you'd be open to considering another DNS server: unbound's documentation suggests that using stub zones would allow running a mixed recursive and authoritative resolver for private zones, see unbound.conf(5) — Unbound 1.21.0 documentation.
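A minimal sketch of that stub-zone approach, assuming your authoritative Bind9 stays reachable at 10.2.2.213 for the internal zone:

server:
    # allow private addresses in answers for this domain
    private-domain: "internal.domain.com"

stub-zone:
    name: "internal.domain.com"
    stub-addr: 10.2.2.213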

Yeah, at this point, unless I can find a way to configure bind to set the ad bit (which I keep seeing hints is possible but can't seem to make work), I think I'm gonna move on and just disable host fingerprint checking. Re-doing DNS on a different platform is way out of scope for right now. We're big enough to need the automation, but not big enough for executing tasks on fake or modified hosts to be a specific issue. It's been an interesting exercise, and I'm glad that, at least in some regards, DNSSEC seems to validate.
