Caddy Fails to Retrieve Certificates with the ACME DNS Challenge (Possibly a Pi-hole DNS Configuration Issue)

The issue I am facing:
When trying to fetch certificates for some (sub)domains with Caddy, I have been getting this error message:

{"level":"error","ts":1749437463.0314894,"logger":"tls.obtain","msg":"could not get certificate from issuer","identifier":"subdomain.example.com",
"issuer":"acme-v02.api.letsencrypt.org-directory","error":"[subdomain.example.com] solving challenges: presenting for challenge: could not determine zone for domain \"_acme-challenge.subdomain.example.com\": unexpected response code 'REFUSED' for _acme-challenge.subdomain.example.com. (order=https://acme-v02.api.letsencrypt.org/acme/order/<MY_ORDER>) (ca=https://acme-v02.api.letsencrypt.org/directory)"}

Most of the domains work, but a couple refuse to. At one point it was every domain, but now it's just a couple of them for some reason. Initially I thought this was a Caddy issue and posted there, but after reading around it seems like it could be DNS/Pi-hole related, so I wanted to tackle it from that angle. It almost feels like something is being cached for those (sub)domains, but I've cleared all the caches I am aware of and still have the issue.

Details about my system:
Pi-hole + Unbound on one machine, following the setup instructions from the Pi-hole docs.
Caddy on another machine that reverse_proxies to various services on my network. The domain (an example domain is used in this post) is owned by me and managed in Cloudflare. The API credentials given to Caddy work (other domains obtained certificates with them fine); just a couple of them are having issues.

What I have changed since installing Pi-hole:
Nothing major in terms of configuration changes I made personally. The problem may have started after upgrading to Pi-hole 6, but I'm not positive. Otherwise I haven't made many manual changes to configuration files.

I also uploaded a debug log in case it helps, though I'm not sure it will: https://tricorder.pi-hole.net/qEqbadod/

I finally figured it out, and it was mostly down to configuration mistakes on my part.

First I saw that it was definitely an issue with one of my Pi-hole instances (piholeB). When I ran nslookup against piholeA, everything resolved fine, but against piholeB the queries timed out, which is most likely what caused Caddy's certificate failures. I must have just gotten lucky originally: most of the domains happened to resolve through the working piholeA, while the last two, newer ones went through piholeB and so failed.
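
For reference, this is roughly the check I ran (the IPs here are placeholders for my two Pi-hole instances, not my real addresses):

    # Query the same name against each Pi-hole directly
    nslookup subdomain.example.com 192.168.1.10   # piholeA: answers normally
    nslookup subdomain.example.com 192.168.1.11   # piholeB: times out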

For more context, piholeA is Pi-hole + unbound running in an LXC on PVE. Since the only other server on my network is a Synology machine, I run piholeB there as a container. At some point during the upgrade to Pi-hole 6, I think I upgraded piholeA and then copied its configuration over to piholeB, which included the unbound-related settings. This left piholeB pointing at the recommended 127.0.0.1 address for unbound, which is correct for piholeA where unbound actually runs, but the Pi-hole image on piholeB didn't come with unbound and I didn't have another unbound instance installed there (I don't remember anymore, but I guess in the past piholeB just didn't use unbound at all). So queries to piholeB ended up being forwarded to a non-existent unbound instance and timed out, which caused the observed issue.
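
In hindsight, a quick check like this on each Pi-hole host would have made the problem obvious (assuming the standard unbound-on-port-5335 setup from the Pi-hole docs):

    # Ask the configured upstream (the supposedly local unbound) directly
    dig @127.0.0.1 -p 5335 example.com +time=2 +tries=1
    # piholeA: returns an answer
    # piholeB: times out, because nothing is listening on that port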

Adding an unbound container (with a proper configuration) to the Synology and then updating piholeB's configuration to use it solved the issue, and Caddy has since been able to fetch certificates for all of my domains.
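
For anyone in a similar spot, here is a minimal sketch of the Synology side of the fix. The image name, ports, and addresses are examples/assumptions rather than an exact copy of my compose file:

    services:
      unbound:
        image: mvance/unbound:latest   # example unbound image; use whichever you prefer
        container_name: unbound
        restart: unless-stopped
        ports:
          - "5335:53/tcp"              # expose unbound on the host's port 5335
          - "5335:53/udp"

Then piholeB's upstream DNS server (Settings -> DNS in the web UI) just needs to point at this container's address on port 5335 instead of 127.0.0.1#5335.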

So in the end it was mostly a silly misconfiguration on my part, but I wanted to post the solution here in case anyone comes across a similar issue. With my level of networking knowledge it was difficult to diagnose from the errors I was getting, especially since I didn't understand why it sometimes succeeded.

Although for some reason I can't get piholeB to keep pointing at the unbound container's IP; it keeps reverting to 127.0.0.1. I'm guessing it's nebula-sync, which I have running to sync settings from my other Pi-hole, and that one does use 127.0.0.1 for unbound.

Confirmed: it's definitely nebula-sync syncing the upstream DNS server setting (which I don't want here, because the unbound address differs between the two Pi-holes).

After some digging I found that nebula-sync can be told to sync more granularly, which makes sense for the mixed-hardware setup I have.

To avoid syncing the upstreams (and so avoid overwriting piholeB's upstream, which points at its own unbound container), you need to disable FULL_SYNC and then re-enable each component separately as desired, while explicitly excluding the upstreams. This way everything except that one setting still gets synced between the Pi-hole instances.

Here is what I did (configured as environment variables in my docker compose file for the nebula-sync service). The key part I initially missed is that if SYNC_CONFIG_DNS_EXCLUDE is used, it seems SYNC_CONFIG_DNS also needs to be set to true for the exclusion to work as expected.

    - FULL_SYNC=false
    - RUN_GRAVITY=true
    # Granular sync settings (excluding DHCP and upstream DNS), when FULL_SYNC is false
    - SYNC_ADLISTS=true
    - SYNC_AUDIT_LOGS=true
    - SYNC_CNAME=true
    - SYNC_CONFIG_API=true
    - SYNC_CONFIG_AUDIT=true
    - SYNC_CONFIG_BLOCKING=true
    - SYNC_CONFIG_DNS=true
    # Exclude syncing upstreams (i.e. unbound), because the replica instance uses a different unbound container/setup than the primary instance
    - SYNC_CONFIG_DNS_EXCLUDE=upstreams
    - SYNC_CONFIG_INTERFACE=true
    - SYNC_CONFIG_PRIVACY=true
    - SYNC_GROUPS=true
    - SYNC_LOCALHOST_NAME=true
    - SYNC_REGEX=true
    - CRON=0 * * * *