Unbound frequent restarts

jpgpi250 · May 31, 2021, 2:54pm

If this is correct, it would mean my method (commenting out the entries and updating the file using a script) is working (still needs to be confirmed).

However, more importantly, this also implies there is a bug in resolvconf (the script that is triggering the unbound stops) that makes services (probably not just unbound) restart, regardless of whether the IP actually changed.

I have no idea how or where to report this...

MichaIng · May 31, 2021, 3:07pm

I think so. When the resolvconf command is called by e.g. dhcpcd, it is usually because it wants to change something, but it is probably also called when no related IP has actually changed. Not sure where to report bugs. GitHub is a mirror only, without a single issue. Here is the official website of the project: openresolv

To list the possible solutions from least to most intrusive (still all to be verified!):

1. Comment the unbound_conf setting (or set it with empty value: unbound_conf=) in /etc/resolvconf.conf, to make the special unbound script exit before doing anything: openresolv/unbound.in at b51e465c3d289da7bee9843ddc2eb5404e72b954 · NetworkConfiguration/openresolv · GitHub
2. Set unbound=off in /etc/resolvconf.conf, so that the special unbound script is not called in the first place: openresolv/resolvconf.in at b51e465c3d289da7bee9843ddc2eb5404e72b954 · NetworkConfiguration/openresolv · GitHub
3. Install the resolvconf package to replace openresolv. It has no special unbound script, nor any other script to configure anything else but /etc/resolv.conf.
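Solution 1 could be sketched as a small shell function. This is only a sketch: it assumes the setting sits on its own line starting with unbound_conf=, and the function name and the .bak backup suffix are made up for illustration (they are not part of openresolv).

```shell
# Hypothetical helper: comment out the unbound_conf= line in resolvconf.conf.
# Assumption: the file is /etc/resolvconf.conf unless another path is given.
comment_out_unbound_conf() {
    conf=${1:-/etc/resolvconf.conf}

    # Keep a copy of the current file so the change can be reverted.
    cp -p "$conf" "$conf.bak" || return 1

    # Prefix an active unbound_conf= line with '#'.
    # A line that is already commented does not match and stays as it is.
    sed 's/^[[:space:]]*unbound_conf=/#&/' "$conf.bak" > "$conf"
}

# Usage (as root):
#   comment_out_unbound_conf /etc/resolvconf.conf
```

Note that a second run overwrites the .bak copy with the already edited file, so keep the first backup elsewhere if you want to be able to restore the original.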

jpgpi250 · May 31, 2021, 3:28pm

Simple last question, before I suspend to monitor the next few days, this may be acceptable / feasible for most users, having this problem.

/var/cache/unbound/resolvconf_resolvers.conf, listed a few posts higher, now contains the IPv6 addresses for, in my case, the zones "localdomain" and ".".

What if (you'll probably say: test it) this file only lists IPv4 addresses, no IPv6 addresses (thus change them)?

This would make the least intrusive solution to:

comment out the unbound line in /etc/resolvconf.conf
change the IPv6 addresses in /var/cache/unbound/resolvconf_resolvers.conf into the pi-hole IPv4 address.

Since pi-hole has (default setup) a fixed IPv4 address, this would eliminate the need to update the file in case an IPv6 address change occurs.

jmccamb · May 31, 2021, 3:50pm

I will report back after some hours of using the resolvconf package. I just don't want to make a dumb mistake and announce that the issue is resolved and then have to walk it back. (like I did yesterday)

MichaIng · May 31, 2021, 5:33pm

Note that whatever content /var/cache/unbound/resolvconf_resolvers.conf has, is irrelevant for Unbound as well as the Unbound restarts. The location, as mentioned above, is wrong (a bug), as Unbound reads only from /etc/unbound/unbound.conf[.d/] and openresolv recreates that file completely, only based on /etc/resolv.conf and some own listing it does, which I didn't fully understand.

Commenting out the unbound_conf= line equals solution 1, as both lead to an empty variable. Indeed, commenting it is a little less intrusive, as the original value is kept and it can hence be reverted easily. I'll edit it above. /var/cache/unbound/resolvconf_resolvers.conf doesn't need to be touched then, or it can be removed, as it is neither used nor re-created.

jmccamb · May 31, 2021, 7:56pm

So after just over six hours I can say with relative confidence that this issue was resolved (or worked around) by installing resolvconf and thus removing openresolv and therefore the problematic file.

As @MichaIng had noted, there are some other workarounds posted here that you might want to check out. This was the one that just seemed easy to me.

Thanks again all who helped, I can't thank you enough. This bug was driving me nuts, but I am glad that we found a solution.

jpgpi250 · June 2, 2021, 11:08am

Thank you, pi-hole team, to allow me to write my final findings regarding this.

TL;DR (unbound users only): read the bottom of the post, "Why is this something to look into? Users hardly notice the restarts..."

I'm happy to report that I haven't had a single 'unbound stopped' message anymore, since I commented out

#unbound_conf=/var/cache/unbound/resolvconf_resolvers.conf

in /etc/resolvconf.conf, thus solving the problem.

The following are observations made on a raspberry pi 3B, Raspberry Pi OS Lite, Release date: May 7th 2021, running unbound 1.13.1 (compiled version). jmccamb however, uses unbound version 1.9.0, see here, and has the same problem. You may need to verify / change paths to get the information you require, this to verify if you are affected.

You need to ignore the time/date, mentioned in the example messages I'm adding, I'm reusing screenshots / messages from previous posts in this topic. All the events happen within 1 or 2 consecutive seconds.

The problem was first noticed when I started checking my unbound logs. I noticed the following message, several times a day:

May 14 06:15:26 unbound[790:0] info: service stopped (unbound 1.13.1).

This was unexpected, so I started checking other logs and found a matching (time) message in syslog:

06:15:26 raspberrypi dhcpcd[562]: eth0: part of Router Advertisement expired

Queries, regarding this on Netgate (I have a pfsense router/firewall) and the unbound user discussion list, didn't return a solution. Luckily, jmccamb asked the question here, a lot of feedback from various users eventually solved the problem.

Are you, an unbound user, affected?

The remainder of this topic contains instructions for users that are reasonably familiar with linux. Don't do this, if you're not!

There are two methods you can use to check if you are affected, and need to implement one of the suggested solutions.
Method 1 (recommended):
If you have unbound logging enabled (not the default in the guide), you need to raise the verbosity to 1 (level 1 gives operational information).
Once unbound has been running (you need to restart unbound to activate the setting) for a day or so, run the command (notice / modify the path):

cat /var/log/unbound/unbound.log |grep stopped

Remember, if you intentionally restarted unbound, this message will occur at least once; you need to look for stop messages other than user-initiated restarts. If you find unexplainable (not user-initiated) stop messages, you are affected.
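The check above can also be wrapped in a small function that prints the matching lines plus a total. This is a sketch only: the function name is made up, and the default path is simply the one used in the command above, so adjust it to whatever logfile: points to in your unbound.conf.

```shell
# Hypothetical helper: list every "service stopped" line and print a total.
# Assumption: the log lives at /var/log/unbound/unbound.log unless a path is given.
report_unbound_stops() {
    logfile=${1:-/var/log/unbound/unbound.log}

    # Show the individual stop messages, so the timestamps can be compared
    # with your own (intentional) restarts.
    grep -F 'service stopped' "$logfile"

    # grep -c prints 0 when nothing matches, which is the result you hope for.
    printf 'total stops: %s\n' "$(grep -cF 'service stopped' "$logfile")"
}

# Usage:
#   report_unbound_stops /var/log/unbound/unbound.log
```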

If you're going to keep these unbound log settings, I strongly recommend looking into / setting up log rotation for this file! If you don't want logging to be permanently enabled, or don't know how to set up log rotation, remove the log settings from your unbound.conf (and restart) as soon as you've determined whether you're affected.

Method 2 (for inexperienced users, I strongly recommend to use method 1 - the unbound log):
Bucking_Horn suggested a method to detect signals, sent to unbound:

install screen (sudo apt-get install screen)
open a new screen (screen)
start the trace:

sudo strace -tt -ffo /tmp/trace-unbound -e trace=%signal -p "`pidof unbound`"

detach from the screen (press CTRL-A, then D)
The results are logged in /tmp. You'll find a number of files 'trace-unbound', you need to check the file that matches the unbound pid.
run pidof unbound, look at /tmp/trace-unbound.xxx, where xxx matches the result of pidof unbound.
check for SIGHUP, if there is a match, you probably are affected.

To return to the detached screen, run screen -r. To stop the trace, press CTRL-C. To exit (close the screen permanently), just type exit in the active screen.
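To search the trace files for delivered SIGHUPs without opening each one, something like the following could be used. It is a sketch under assumptions: the function name is made up, and it relies on strace writing a delivered signal as a line starting with "--- SIGHUP" (which also shows si_pid, the process that sent it).

```shell
# Hypothetical helper: show SIGHUP deliveries recorded in the strace output files.
# Assumption: the trace was started with "-ffo /tmp/trace-unbound", so the
# files are named /tmp/trace-unbound.<pid>.
find_sighup_traces() {
    dir=${1:-/tmp}

    # -H prefixes each hit with the file name (and thus the traced pid).
    # Matching on '--- SIGHUP' skips lines where SIGHUP only appears as an
    # argument of a signal-related system call.
    grep -H -e '--- SIGHUP' "$dir"/trace-unbound.* 2>/dev/null
}

# Usage:
#   find_sighup_traces /tmp
```

No output (and a non-zero exit status) means no SIGHUP was recorded while the trace was running.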

You may or may not want to verify our conclusion (the culprit is /usr/sbin/resolvconf)

jmccamb found:

141-type=PROCTITLE msg=audit(1622304458.990:134): proctitle=2F62696E2F7368002F7573722F7362696E2F7265736F6C76636F6E66002D6100657468302E7261
142:type=SYSCALL msg=audit(1622304459.310:135): arch=40000028 syscall=37 per=800000 success=yes exit=0 a0=67eb a1=1 a2=7fffffff a3=0 items=0 ppid=12586 pid=12626 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="resolvconf" exe="/usr/bin/dash" key=(null)

I found:

Sun 30 May 18:48:15 CEST 2021,32610,/bin/sh /usr/sbin/resolvconf -a eth0.ra
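The proctitle value in jmccamb's audit record is the same command line, hex-encoded with NUL bytes between the arguments. Below is a minimal sketch to decode it in plain shell; the function name is made up for illustration. As far as I know, ausearch can also do this decoding itself with its -i (interpret) option.

```shell
# Hypothetical helper: decode a hex-encoded audit proctitle value.
# NUL bytes (00) separate the arguments and are printed as spaces.
decode_proctitle() {
    hex=$1
    out=""
    while [ -n "$hex" ]; do
        pair=${hex%"${hex#??}"}        # first two hex digits
        [ "${#pair}" -eq 2 ] || break  # stop on a dangling single digit
        hex=${hex#??}                  # drop them from the input
        if [ "$pair" = "00" ]; then
            out="$out "
        else
            # Convert the hex pair to octal, then print that byte.
            out="$out$(printf "\\$(printf '%03o' "0x$pair")")"
        fi
    done
    printf '%s\n' "$out"
}

decode_proctitle 2F62696E2F7368002F7573722F7362696E2F7265736F6C76636F6E66002D6100657468302E7261
# prints: /bin/sh /usr/sbin/resolvconf -a eth0.ra
```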

Again, there are two methods to get this information:
Method 1:
Bucking_Horn suggested using audit, explained here
Method 2:
DL6ER suggested using a script, see here

Both methods return a result, identifying /usr/sbin/resolvconf (which resolvconf) as the culprit. You'll have to write the script yourself, we're not allowed to publish scripts here (tip: endless loop logging the unique results of ps aux to a file)

Further analysis (a big thanks to MichaIng for his insight and effort) showed that the raspbian version of resolvconf not only sends a SIGHUP to unbound, but also modifies a file, /var/cache/unbound/resolvconf_resolvers.conf.
The entries in that file are unbound forward-zone configuration entries, but the file itself is completely useless, read here, edit5 - zero effect, because unbound doesn't read configuration files from this location. Unbound uses configuration files, stored in /etc/unbound/ and /etc/unbound/unbound.conf.d/

In conclusion:
MichaIng suggested 3 possible solutions to eliminate the undesired unbound restarts, read here. I used the least intrusive solution (comment out the unbound line in /etc/resolvconf.conf), whereas jmccamb opted to use method 3, read here. Both methods appear to be successful.

Why is this something to look into? Users hardly notice the restarts...

I noticed a lot of unbound errors in the unbound logs, immediately after the restart. After applying the fix (comment out the unbound entry in /etc/resolvconf.conf), these are all gone.
In this topic, frequent query retries are discussed, the dnsmasq fix was applied to pihole-FTL, but the retry count remained high (for me). After applying the fix, the retry count has decreased (not zero, but significantly lower than before).

May I suggest this information (the fix) is added to the unbound user guide, this to prevent future users from running into this problem...

DL6ER · June 2, 2021, 4:24pm

Yeah, we can discuss this. Yet, the unbound guide is already a lot more complex than it should be. It should be possible to install unbound without many stumbling blocks. I'll have to think about splitting it into multiple files. Overly long one-page guides don't seem helpful. There is already Disable resolvconf for unbound (optional) in the guide so there is certainly room for improvement.

For me, the IPv6 prefix changes once a day (when only computers are awake). I have seen zero retries in my Pi-hole logs. Sure, me not being affected doesn't mean the issue doesn't exist. You likely have noted this before somewhere, but how often did this happen every day?

I haven't had much time to follow this discussion, sorry. Yet, I don't think the behavior is wrong as such. It rather seems that the configuration used by us all is wrong (as in: incomplete), see here:

If unbound is responsible for everything (indicated by .), it needs to be kicked when addresses may have changed. Typically, addresses don't change - assuming IPv6 privacy extension is disabled on your DNS server which seems to be the correct choice! If they do, the cache may contain invalid stuff and needs flushing.

jpgpi250 · June 2, 2021, 5:27pm

today, and the last few days, it looks like this (the issues I'm trying to figure out)

status  count   unique  description
12      62      39      Retried query
14      1170    31      Already forwarded, not forwarding again

The 'Retried query' count before the 'fix' was always a lot higher (database contains 1323 entries - MAXDBDAYS=8). It is difficult to measure a difference, because I assume it also depends highly on what you're doing (websites visited) and how long (active browsing time) you're doing it.

My prefix hasn't changed in over a year. I assume they do this (not change the prefix) to be able to identify subscribers, as explained here. I have an XLS file, specifically to store IP info, and just compared the prefix value with the real value: no change. I only need to reconfigure when my ISP kicks the modem or their IPv6 infra, which leads to an address change, but never the prefix.

This is another reason why I'm reasonably sure the configuration file, created by resolvconf, is in the wrong place, thus irrelevant.
The unbound docs say (open the link and search for forward-zone - only 2 matches):

A forward-zone entry with name "." and a forward-addr target will forward all queries to that other server (unless it can answer from the cache).

I may be wrong (usually when replying to you), but I feel, if this config file would be used, given the description from the docs, there would be a loop, since the address in the auto generated file is the address pihole-FTL is listening on, pihole-FTL forwards to unbound.

I fully agree an address change is a very good reason to send a SIGHUP to unbound, but the problem is, this happens, despite the fact the address hasn't changed (running address check every 30 minutes, using cron, functionality confirmed).
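An address check like that, run from cron, could look roughly like the sketch below. This is not the actual script used; the function name, the state file location and the cron line are assumptions for illustration only.

```shell
# Hypothetical helper: compare the current address list with the one stored
# on the previous run and report whether it changed.
check_address_change() {
    state=$1      # file that remembers the last seen value
    current=$2    # current address(es), passed in by the caller
    previous=""

    if [ -f "$state" ]; then
        previous=$(cat "$state")
    fi

    if [ "$current" = "$previous" ]; then
        echo unchanged
    else
        # Store the new value for the next run. The very first run reports
        # "changed" too, because there is nothing to compare with yet.
        printf '%s\n' "$current" > "$state"
        echo changed
    fi
}

# Usage, e.g. from a script started by cron every 30 minutes
# (interface name and state file path are assumptions):
#   addrs=$(ip -6 -o addr show dev eth0 scope global | awk '{print $4}' | sort)
#   echo "$(date) $(check_address_change /var/tmp/ipv6-addr.state "$addrs")" >> /var/tmp/ipv6-addr.log
```

Comparing the timestamps in that log with the 'service stopped' messages shows whether a restart coincided with a real address change.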

Not really sure this will be an issue, when using IPv4 only, is that why nobody ever noticed it before? Only fools and horses appear to be using IPv6, the general recommendation you can find all over the internet, is to disable IPv6 (the solution for a lot of problems). Remember you once said: "Not using IPv6? you will be missing out on a big part of the internet"... That's why I persist.

MichaIng · June 2, 2021, 11:54pm

It is openresolv only which makes this assumption and creates this config based on /etc/resolv.conf, regardless of how Unbound is actually configured and used. I verified that it (aims to) add the system's DNS nameservers to Unbound's forward zone + sometimes other entries I couldn't fully identify. I cannot imagine a situation where this is wanted, especially not without any interactive choice or at least an info or log about this, but it is all muted.

I'll test on Debian Bullseye, where the config is created at an effective location, whether this really makes Unbound use the local system's upstream DNS servers instead of being a direct recursive DNS server, and if it creates a loop when the system itself uses Unbound as upstream DNS.

Bucking_Horn · June 3, 2021, 5:47am

Browsing through openresolv's documentation, I don't think that openresolv makes that assumption.

True, it is creating some custom configuration files (for unbound as well as for other DNS resolvers) as controlled by /etc/resolvconf.conf, but it does not intend to apply those files automatically.
It states specifically:

You then need to configure your resolver to use these files.

That seems a sane approach as far as configuration is concerned.
Still, unbound is HUPed when openresolv is configured to provide an unbound configuration file, regardless of whether unbound actually would depend on that file or not.

But this may mean that a fix for the Debian reported bug MichaIng mentioned (EDIT5 above) potentially would force such a configuration on users that never intended to make use of it or are not even aware of it.

Another side question that springs to mind: What happens if the openresolv package is updated? Does that leave changes to resolvconf.conf intact?

jpgpi250 · June 3, 2021, 6:08am

I've asked myself, you and users on another (dutch) forum this:

@deHakkelaar replied he doesn't have this problem in his IPv4 only environment.

I used a script (ps aux) to catch the culprit, ps aux also lists the full command line

Sun 30 May 18:48:15 CEST 2021,32610,/bin/sh /usr/sbin/resolvconf -a eth0.ra

Does this help in any way to identify what resolvconf is actually doing?

deHakkelaar · June 3, 2021, 6:41am

With PiOS Buster, I have that file auto populated with below where the 10.0.0.1 IP is from dhcpcd.conf DNS:

pi@ph5b:~ $ cat /var/cache/unbound/resolvconf_resolvers.conf
# Generated by resolvconf

forward-zone:
        name: "home.dehakkelaar.nl"
        forward-addr: 10.0.0.1

forward-zone:
        name: "."
        forward-addr: 10.0.0.1

But it doesn't affect the lookups:

pi@ph5b:~ $ sudo unbound-control lookup pi-hole.net
The following name servers are used for lookup of pi-hole.net.
;rrset 3589 4 0 7 3
pi-hole.net.    3589    IN      NS      ns4.pi-hole.net.
pi-hole.net.    3589    IN      NS      ns3.pi-hole.net.
pi-hole.net.    3589    IN      NS      ns2.pi-hole.net.
pi-hole.net.    3589    IN      NS      ns1.pi-hole.net.
[..]

Nor does resolvconf get triggered to HUP unbound:

pi@ph5b:~ $ uptime
 08:20:49 up 18 days,  3:54,  1 user,  load average: 0.04, 0.06, 0.02

pi@ph5b:~ $ stat /etc/resolv.conf
[..]
Modify: 2021-05-16 00:59:34.000000000 +0200

pi@ph5b:~ $ stat /var/cache/unbound/resolvconf_resolvers.conf
[..]
Modify: 2021-05-16 00:59:35.000000000 +0200

If I create below file:

pi@ph5b:~ $ sudo nano /etc/unbound/unbound.conf.d/99-forward-zone.conf
forward-zone:
        name: "home.dehakkelaar.nl"
        forward-addr: 10.0.0.1

forward-zone:
        name: "."
        forward-addr: 10.0.0.1

pi@ph5b:~ $ sudo service unbound reload
pi@ph5b:~ $

It does affect:

pi@ph5b:~ $ sudo unbound-control lookup pi-hole.net
The following name servers are used for lookup of pi-hole.net.
forwarding request:
Delegation with 0 names, of which 0 can be examined to query further addresses.
It provides 1 IP addresses.
10.0.0.1                rto 146 msec, ttl 889, ping 14 var 33 rtt 146, tA 0, tAAAA 0, tother 0, EDNS 0 probed.

EDIT: Ow that is with the unbound-resolvconf service disabled.
EDIT2: When enabled, most likely you've created a loop because above service populates resolv.conf with 127.0.0.1.

MichaIng · June 3, 2021, 6:46am

At least from the official Debian, Raspbian and Ubuntu repositories, all packages are forced to handle all files in /etc as "conffiles", so that on upgrade they are by default left untouched when the contained file did not change with that version, and otherwise the user is interactively asked whether to keep the old file, use the new one or see a diff. That is where the .dpkg-old/.dpkg-dist suffixes are coming from. This can be overridden with the DPkg force-confold/force-confnew options, but I have never seen someone using force-confnew.

Now I see the hint in the manpage to disable "subscribers":

To disable a subscriber, simply set it's name to NO. For example, to disable the libc subscriber you would set: libc=NO

Whether it's NO or off or donotdothat doesn't matter according to the code (yes/on and such keywords are effective, or not having it set at all), but good to see that this is documented and hence intended to be used.
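A minimal sketch of setting such a switch from the shell, assuming the subscriber is controlled by a plain name=value line in /etc/resolvconf.conf as the manpage example suggests. The function name and the .bak suffix are made up for illustration.

```shell
# Hypothetical helper: set "<subscriber>=NO" in resolvconf.conf.
# An existing "<subscriber>=..." line is replaced, otherwise a new line is
# appended. Settings like unbound_conf= are left alone, because the pattern
# requires '=' directly after the subscriber name.
disable_subscriber() {
    name=$1
    conf=${2:-/etc/resolvconf.conf}

    if grep -q "^[[:space:]]*$name=" "$conf"; then
        # Keep a copy, then rewrite the existing line.
        cp -p "$conf" "$conf.bak" || return 1
        sed "s/^[[:space:]]*$name=.*/$name=NO/" "$conf.bak" > "$conf"
    else
        printf '%s=NO\n' "$name" >> "$conf"
    fi
}

# Usage (as root):
#   disable_subscriber unbound /etc/resolvconf.conf
```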

Probably only because IPv4 addresses usually do not change that often, especially when a static IP is assigned, while IPv6 prefixes may change more regularly; at least RAs are sent very frequently and seem to trigger the restarts through dhcpcd > openresolv even when the IPv6 address did not change.

So it was an RA (router advertisement) on the eth0 interface, as expected.

Basically, also seeing libc and Unbound mentioned the same way in the manpage, openresolv handles Unbound, dnsmasq and such the same way as libc (/etc/resolv.conf), so it makes sense that it applies the same upstream DNS nameservers to all of them. While libc is for the local system only, dnsmasq is usually for the local network and Unbound is (while it can be a simple forwarding resolver like dnsmasq) a recursive DNS server that usually is not configured at all to use an upstream DNS other than DNS root servers. I'll suggest upstream to have the subscriber switch settings present and set to NO by default, so that it is transparent to users and nothing is done by default that exceeds what the resolvconf package does (which controls libc /etc/resolv.conf only).

jult · October 11, 2023, 1:38pm

I know this doesn't add much to the 'discussion', but I just want to emphasize that I find it awkwardly strange how an important thing like DNS has become such an utter disaster config for a widely used and employed distro like debian. It's just embarrassing, to be honest, how fragile and failure ridden something as crucial as proper name resolving has become for the average linux-admin. Truly shameful for the debian dev community, I can't put it any other way. I was already booted and banned from their forums when I pointed this out, so there's that.

MichaIng · October 11, 2023, 3:02pm

First of all, this is not "DNS" on Debian, but the openresolv package only, which caused these issues. And second, it has been resolved in the meantime.

EDIT: Ah, solved from Debian Trixie only. So Bookworm is still affected.