Unbound frequent restarts

jmccamb · May 29, 2021, 7:10pm

Just wanted to get one more instance before I end the audit. It looks like the same thing, just a different PID

295-type=PROCTITLE msg=audit(1622309841.479:256): proctitle=2F62696E2F7368002F7573722F7362696E2F7265736F6C76636F6E66002D6100657468302E7261
296:type=SYSCALL msg=audit(1622309841.739:257): arch=40000028 syscall=37 per=800000 success=yes exit=0 a0=67eb a1=1 a2=7fffffff a3=0 items=0 ppid=23427 pid=23467 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="resolvconf" exe="/usr/bin/dash" key=(null)

@Bucking_Horn

jpgpi250 · May 29, 2021, 8:02pm

any of the following files have a date/time that matches the unbound stop?

/etc/resolvconf.conf
/etc/resolv.conf
/var/cache/unbound/resolvconf_resolvers.conf

@jmccamb

jmccamb · May 29, 2021, 9:02pm

Yes, so here are the timestamps for a few of the recent stoppages:

May 29 12:37:21 raspberrypi-2 unbound[26603]: [26603:0] info: service stopped (unbound 1.9.0).
May 29 12:37:29 raspberrypi-2 unbound[26603]: [26603:0] info: start of service (unbound 1.9.0).
May 29 12:37:29 raspberrypi-2 unbound[26603]: [26603:0] info: service stopped (unbound 1.9.0).
May 29 12:37:37 raspberrypi-2 unbound[26603]: [26603:0] info: start of service (unbound 1.9.0).

And if I stat the files, two of the three were changed:

Same timestamp for resolv.conf

  File: /etc/resolv.conf
  Size: 64        	Blocks: 8          IO Block: 4096   regular file
Device: b302h/45826d	Inode: 809         Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2021-01-11 07:15:50.981160954 -0600
Modify: 2021-05-29 12:37:21.699911385 -0500
Change: 2021-05-29 12:37:21.699911385 -0500
 Birth: -

Not the same for resolvconf.conf

  File: /etc/resolvconf.conf
  Size: 490       	Blocks: 8          IO Block: 4096   regular file
Device: b302h/45826d	Inode: 812         Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2021-01-11 07:14:55.444118840 -0600
Modify: 2016-04-26 01:02:35.000000000 -0500
Change: 2021-01-11 07:14:55.444118840 -0600
 Birth: -

Same for resolveconf_resolvers.conf

  File: /var/cache/unbound/resolvconf_resolvers.conf
  Size: 312       	Blocks: 8          IO Block: 4096   regular file
Device: b302h/45826d	Inode: 257544      Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2021-01-11 07:16:08.629999993 -0600
Modify: 2021-05-29 12:37:21.739910760 -0500
Change: 2021-05-29 12:37:21.739910760 -0500
 Birth: -

jpgpi250 · May 29, 2021, 9:53pm

In my case, my latest stop is

May 29 22:57:49 unbound[713:0] info: service stopped (unbound 1.13.1).

only /var/cache/unbound/resolvconf_resolvers.conf has an exact matching timestamp

checked out the link you pointed to,

sudo systemctl status unbound-resolvconf.service
Unit unbound-resolvconf.service could not be found.

deHakkelaar · May 30, 2021, 12:29am

Thats most likely bc you compiled unbound yourself:

pi@ph5b:~ $ dpkg -S unbound-resolvconf.service
unbound: /lib/systemd/system/unbound-resolvconf.service

@jmccamb, did you try disabling above service?
Could give it a try.
And how did the power swap work out? no more under voltage?

jpgpi250 · May 30, 2021, 9:28am

Continuing to investigate...
resolvconf is a script.
which resolvconf returns /usr/sbin/resolvconf

The script has a comment regarding SIGHUP. Below that comment, there is a function that builds a RESTARTCMD, I assume the restart commands are build from the content of /etc/resolvconf.conf and eventually executed. The script also rewrites the files, referred in /etc/resolvconf.conf.

I don't know what OS your running, and if there is a different script for each OS.

can you run sha1sum /usr/sbin/resolvconf (modify the location if required - which resolvconf). For my OS ( Raspberry Pi OS Lite, Release date: May 7th 2021), this returns:

sha1sum /usr/sbin/resolvconf
4bfee7ac4e855ae48e35ab9ac37ebb8c2d37d210  /usr/sbin/resolvconf

this to confirm we are looking at the same script / possible problem.

I'm considering to comment out the lines, for dnsmasq and unbound, in /etc/resolvconf.conf, this to verify if the unexpected stop events will disappear. I've already have an IPv6check script, running with cron, this to detect IPv6 address changes on the pi, hardly no effort to add sed instructions to modify the files and send a SIGHUP signal (only when the address has really changed).

Would however like the advice of DL6ER and Bucking_Horn...

DL6ER · May 30, 2021, 2:59pm

Thanks, the pointer to resolvconf was the missing element. Quick source code checking shows this is intended behavior:

github.com

NetworkConfiguration/openresolv/blob/b51e465c3d289da7bee9843ddc2eb5404e72b954/unbound.in#L85-L97

    
      
          if [ ! -f "$unbound_conf" ] || \
          	[ "$(cat "$unbound_conf")" != "$(printf %s "$newconf")" ]
          then
          	printf %s "$newconf" >"$unbound_conf"
          	# If we can't sent a HUP then force a restart
          	if [ -s "$unbound_pid" ]; then
          		if ! kill -HUP $(cat "$unbound_pid") 2>/dev/null; then
          			restart_unbound
          		fi
          	else
          		restart_unbound
          	fi
          fi

To be run whenever resolvconf is run by the system. This can very well coincidence with your IPv6 observations.

edit: restart_unbound is defined further up in the same file (not shown here)

DL6ER · May 30, 2021, 3:08pm

A bit off-topic (because not targeting unbound) but it makes me wonder if there should be a "reread /etc/hosts but keep cache otherwise" feature in dnsmasq. Maybe I'll suggest this to the mailing list and check what they say.

jpgpi250 · May 30, 2021, 4:01pm

All my Ipv6 addresses, including temporary addresses are kept (added automatically) in files in a specific directory, processed by dnsmasq, as soon as a file changes by using 'hostsdir=/etc/pfsense'. You can clearly see the new / changed entries are picked up by dnsmasq, without restart, by looking at the logfile.

you could consider moving (v6) the pihole files in a specific directory and use the hostdir= directive to achieve the same for /etc/pihole/local.list and /etc/pihole/custom.list as opposed to using (/etc/dnsmasq.d/01-pihole.conf)

addn-hosts=/etc/pihole/local.list
addn-hosts=/etc/pihole/custom.list

On topic: the unbound 'stopped' messages do not match the time the file(s) (a new IPv6 address added to a file) is changed and dnsmasq reads it. The global dynamic mngtmpaddr, the address in /var/cache/unbound/resolvconf_resolvers.conf, see here, never changes, unless the ISP kicks the modem.

That isn't in my resolvconf, so different versions on different OS, as I already suspected...

jpgpi250:

can you run sha1sum /usr/sbin/resolvconf (modify the location if required - which resolvconf). For my OS ( Raspberry Pi OS Lite, Release date: May 7th 2021), this returns:
sha1sum /usr/sbin/resolvconf
4bfee7ac4e855ae48e35ab9ac37ebb8c2d37d210  /usr/sbin/resolvconf

jmccamb · May 30, 2021, 4:26pm

looks like we have the same version:

pi@raspberrypi:/sbin $ sha1sum resolvconf
4bfee7ac4e855ae48e35ab9ac37ebb8c2d37d210  resolvconf

If you find a workaround please let me know.

Also, would blowing up my pihole and starting over with a different OS like dietpi, would that potentially resolve the issue? Or is modifying the resolveconf.conf file something more likely to help?

Can we safely say that this is something not related to pfsense or pihole at this point? For example, if I were to use opnsense would this issue still exist?

Thanks to everyone for you continued support. The devotion and generosity with time is unmatched in this community.

jmccamb · May 30, 2021, 4:47pm

@deHakkelaar If i disable the resolvconf.service as per the docs, should I first set the static domain_name_servers= in dhcpcd.conf to 127.0.0.1#5335?

Or is it enough to just disable and stop the service?

deHakkelaar · May 30, 2021, 5:04pm

No, you cant supply the #5335 port in that dhcpcd.conf file.
Either you just supply the localhost address 127.0.0.1 (without the port) which will be pihole-FTL listening on default port 53:

pi@ph5b:~ $ sudo netstat -nltup | grep 'Proto\|:53 \|:5335 '
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:53              0.0.0.0:*               LISTEN      3233/pihole-FTL
tcp        0      0 127.0.0.1:5335          0.0.0.0:*               LISTEN      3091/unbound
tcp6       0      0 :::53                   :::*                    LISTEN      3233/pihole-FTL
udp        0      0 0.0.0.0:53              0.0.0.0:*                           3233/pihole-FTL
udp        0      0 127.0.0.1:5335          0.0.0.0:*                           3091/unbound
udp6       0      0 :::53                   :::*                                3233/pihole-FTL

Or you enter another DNS server(s).
I've configured my router IP so when tinkering and I break things again, I still have DNS.

jmccamb · May 30, 2021, 5:10pm

Oh okay, I got you. I ran the first command:

sudo systemctl disable unbound-resolvconf.service

Another restart happened, so I am not sure if this is because I didn't follow through with the rest of the steps or because this service wasn't the one actually causing the restarts.

deHakkelaar · May 30, 2021, 5:11pm

Probably the latter but lets wait

EDIT: ow yeah, stop is oc requerement:

sudo systemctl stop unbound-resolvconf.service

jmccamb · May 30, 2021, 5:13pm

Okay, I will give that a go and try again

jpgpi250 · May 30, 2021, 8:49pm

I had another series of unbound stop messages, can confirm it is resolvconf causing the stop.

Sun 30 May 18:48:15 CEST 2021,32610,/bin/sh /usr/sbin/resolvconf -a eth0.ra

I've gone ahead and comment out the lines for dnsmasq and unbound, it will probably take a while to notice the consequences, if any...

Bucking_Horn · May 30, 2021, 9:26pm

jmccamb:

Just wanted to get one more instance before I end the audit.

296:type=SYSCALL msg=audit(1622309841.739:257): arch=40000028 syscall=37 per=800000 success=yes exit=0 a0=67eb a1=1 a2=7fffffff a3=0 items=0 ppid=23427 pid=23467 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="resolvconf" exe="/usr/bin/dash" key=(null)

Sorry, I should probably have clarified that the logs persist, i.e. after you captured the event we were after, you could have stopped auditing your system (and also sorry for returning late).

Just for completeness (click)

You should be able to extract all lines for that event by grepping the audit log for the audit id (following the timestamp):

sudo grep -n  'audit(.*:257)' /var/log/audit/audit.log

In your case, that would produce a few lines that would confirm that unbound was the process receiving the HUP signal, so nothing we do not already know.
Still, extracting all the lines pertaining to the same audit id may be of help in other auditing endeavours.

Looks like you've tacked resolvconf as the process responsible for HUPing unbound In the meantime.

There seem to be different versions in different packages around.
In my RaspberryPi OS installation, it resides at /sbin/resolvconf (edit: and there only), but produces the same sha hash.

On my Armbian installation, it doesn't: It seems Armbian uses the resolvconf package instead of openresolv.
Maybe that explains why I wasn't able to reproduce jmccamb's observation. I do also distribute a ULA prefix only in my network, though, and I haven't looked into the differences between different resolvconf implementations.
I'm also learning about those differences for the first time.

I think DietPi are basing some of their images on Armbian, so maybe @MichaIng can shine some light on those differences?

I don't know - without knowlegde of the IPv6 trigger event and the resolv.conf content, it may be too early to decide on that.

The one thing we haven't established yet is what IPv6 event exactly is triggering resolvconf. I think DL6ER has made some suggestions earlier to get hold of that.

It may still be the case that such an event warrants an update of resolvconf and the subsequent HUP signal to unbound (though I personally think that is unlikely, as jmccamb is observing that way too often).

@jmccamb, do you recall whether /etc/resolv.conf content changed at all during this analysis?

(Warning: speculations ahead)
If I assume that an IPv6 prefix has changed, then such a change may trigger construction of a new IPv6 address for a DNS server.
If that DNS server was listed in resolv.conf by a different, now expired IPv6 address, the update / reload may be warranted (though I still struggle to understand why unbound would be interested in resolv.conf).

But I somehow suspect that whatever decides on the invocation of unbound.in could be more sophisticated in detecting whether an IPv6 prefix change would also entail a relevant change of a DNS server address (e.g. if resolv.conf never needs to change because it contains only an LLA or ULA).

If that would be true, then that issue should be addressed by resolvconf maintainers. (Again, that is just me speculating here.)

And let me close by saying that all of this is out of scope for Pi-hole.
We don't normally have the time to cover a non-Pi-hole issue in the depth we've done here, so please don't take this as the norm.

On the other hand, this has been categorised accordingly, and given the many voices that joined in (thank you!), I guess you are seeing just that: An exceptional example of Community Help.

MichaIng · May 30, 2021, 9:52pm

/sbin/resolvconf and /usr/sbin/resolvconf are one and the same file on recent Debian and Raspberry Pi OS images, as /sbin is a symlink to /usr/sbin, like /bin and /lib as well.

We always installed the actual resolvconf package, not openresolv on DietPi, so I have no experience with it either. Nowadays we don't even have resolvconf pre-installed, as for simple network including DHCP it is not required. ifupdown and isc-dhcp-client write to /etc/resolv.conf directly when no resolvconf is present.

DL6ER linked the source code of openresolv above which triggers the Unbound HUP/restart, probably we should verify that resolvconf doesn't to that and then find out about the reason, as I currently don't see a reason why a DNS server should restart when the systems upstream DNS is updated . It's too late today but I'll have a look into this tomorrow.

jpgpi250 · May 30, 2021, 9:56pm

With all due respect for your effort as moderator, greatly appreciated, I must disagree.

One of pi-hole's recommended solutions (upstream) is unbound, see here.
In the past, see here, we have been looking at a high retry count problem. The fact that unbound stops/starts multiple times a day (4 times in the last 5 hours - my environment) may well be one of the reasons for this. We didn't look into to that possibility, because nobody really noticed it at the time.

jmccamb · May 30, 2021, 11:19pm

Thank you all for your help. Shout out to @Bucking_Horn @deHakkelaar .. I would mention everyone but it only allows you to mention two people.

I disabled that resolveconf service and my issue appears to have resolved. If this is an issue with another package I will report it to them too, I am just not sure where. If anyone could be assistance with pointing me in the right direction, that would be awesome.

Edit: I spoke too soon. looks it restarted. Just a lot later than usual.

I am not sure if this is I did it on one pihole instance and not another, but I am going to disable it on both and see what happens. Will report back.