Unbound frequent restarts

Just a side note on IPv6 addresses (click for details).

Not quite. :wink:

An RFC 4941 IPV6 Privacy extension address can also be considered a special application of SLAAC, and EUI64 addresses (containing a MAC) may be used just as well to provide the interface portion of an IPv6 ULA or GUA.

There are three types or ranges of IPv6 addresses (edit: well, truth is there are more, but these are the most common and also most relevant ones):
link-local (fe80::/10), ULA (fd00::/8) and GUA (2000::/3).
There are also three methods how a client can acquire those:
SLAAC as well as Stateless and Stateful DHCPv6.

A client may use SLAAC to construct IPv6 addresses for all three ranges, or it may choose any combination of those methods to acquire any set of addresses, depending on a client's configuration and availability of router support for those methods.

Link-local addresses are the only ones that can be acquired by SLAAC exclusively, but they cannot be relied upon to be predictable: A client may be configured to forego EUI48/64 addresses in favour of an RFC 7217 address. In Raspberry Pi OS, this is the default (configured by slaac private in dhcpcd.conf).

2 Likes

Maybe.
Note that some networking tools would ignore those settings, relying on their own configuration options instead - I believe NetworkManager may fall into that category.


If we cannot find this on the receiving end, let's try and capture it at the sender, using auditd.

EDIT: I've successfully tested below steps on Raspbian Buster, but couldn't get them to work on Armbian (the audit log never populated).
It may only be worth the hassle if you would run Raspberry Pi OS as well.

Since we don't know the sender, we have to capture traffic of all processes, so this may incur a performance hit for all of your system, and the respective log file may grow large quickly. And we still need strace to find the pid.

Don't forget to disable it once finished.

We start by installing auditd:

sudo apt install auditd audispd-plugins

Enable auditing:

sudo auditd -s enable

Find your system CPU architecture, e.g.:

~ $ uname -m
armv7l

Use that arch when creating a rule to capture system calls to kill -HUP:

sudo auditctl -a exit,always -F arch=armv7l -S kill -F a1=1

EDIT: In an attempt to limit audit log growth, I tried to be as restrictive as possible here, by assuming that unbound is indeed HUPed by an unknown process. I think that's a safe assumption, as the observed behaviour is documented for HUP reception in unbound. Still, if it proves too restrictive and you can't find a match, you may drop the trailing -F a1=1.

Verify that rule got created, e.g.:

~ $ sudo auditctl -l
-a always,exit -F arch=b32 -S kill -F a1=0x1

You should see an audit log at /var/log/audit/audit.log
Check that audit logging is working:

sudo auditctl -m 'verify its working'

Verify that statement made it into your audit log:

sudo grep -n 'verify' /var/log/audit/audit.log

If everything works, have your strace running also and wait for unbound to restart.

Once that happens, extract the pid of the process that HUPed unbound from your strace log as before, and use that to determine the offending process:

sudo grep -n -B1 '=29812 ' /var/log/audit/audit.log

In the above, replace 29812 with the pid from your strace (keep the leading = and trailing space).

That should give you a few lines like:

1265-type=USER_START msg=audit(1622277865.428:1119): pid=29811 uid=0 auid=1000 ses=1522 msg='op=PAM:session_open grantors=pam_permit,pam_unix acct="root" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/1 res=success'
1266:type=SYSCALL msg=audit(1622277865.448:1120): arch=40000028 syscall=37 per=800000 success=yes exit=0 a0=1f6 a1=1 a2=42adc00 a3=42adc00 items=0 ppid=29811 pid=29812 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts1 ses=1522 comm="kill" exe="/bin/kill" key=(null)

where exe= should point to the binary (at the end of line 1266 in the example above).
Let's hope that isn't just a shell /bin/kill spawned by yet another unknown process (in the example I used, it is of course such a command that I used to trigger those log lines).

Once done, don't forget to disable auditing:

sudo auditd -s disable
sudo service auditd stop
sudo systemctl disable auditd
4 Likes

@Bucking_Horn
Thanks for clarification. With "SLAAC address" I basically meant the first step of SLAAC to create an LLA from the MAC (being imprecise), while everything else implies the router(s): https://www.networkacademy.io/ccna/ipv6/stateless-address-autoconfiguration-slaac
It remains complex, which might be the major issue with IPv6 in general and why many users with average knowledge still need to disable it to resolve this or that network issue. Our VPS is IPv4 only, because initial IPv6 packets from the server have 10-30 seconds delay, when no packet was sent within ~40 seconds, an no one is able to find out why, probably ISP routing related :frowning_face:.

@jmccamb

If you really want to disable IPv6 completely (for testing or if we cannot find any better solution):
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1

The second is usually required as well to disable it for any newly configured network interfaces, after that rules got applied (which is very early at boot). Good to know that it can be ignored. When it's successful, ip a does not show any inet6.

Last-last resort would be to disable the IPv6 kernel module, respectively via kernel cmdline when it's builtin. But that requires some software configs to be changed, including Unbound AFAIK, to assure that they do not even try to interpret IPv6 addresses.

But I like the diagnosing attempts, instead of using the hammer (and it's still now 100% clear that IPv6 is the issue(?)). To further diagnose on the network side, tcpdump might be interesting as well, e.g. to print all RAs and other NDs around in the network:

apt install tcpdump
tcpdump icmp6
1 Like

Okay I will give this a shot and report back! Thanks for taking the time to do this.

I will keep you posted.

Here is what I got. I was lucky and a restart happened pretty much right after I set up the audit:

141-type=PROCTITLE msg=audit(1622304458.990:134): proctitle=2F62696E2F7368002F7573722F7362696E2F7265736F6C76636F6E66002D6100657468302E7261
142:type=SYSCALL msg=audit(1622304459.310:135): arch=40000028 syscall=37 per=800000 success=yes exit=0 a0=67eb a1=1 a2=7fffffff a3=0 items=0 ppid=12586 pid=12626 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="resolvconf" exe="/usr/bin/dash" key=(null)

I still have the audit running just in case we need more info.

interesting, can you please check /etc/resolvconf.conf

mine contains the following:

# Configuration for resolvconf(8)
# See resolvconf.conf(5) for details

resolv_conf=/etc/resolv.conf
# If you run a local name server, you should uncomment the below line and
# configure your subscribers configuration files below.
#name_servers=127.0.0.1

# Mirror the Debian package defaults for the below resolvers
# so that resolvconf integrates seemlessly.
dnsmasq_resolv=/var/run/dnsmasq/resolv.conf
pdnsd_conf=/etc/pdnsd.conf
unbound_conf=/var/cache/unbound/resolvconf_resolvers.conf
  • never mind the dnsmasq entry, dnsmasq was installed before pihole-FTL disables it.
  • it contains an entry for unbound (is this the culprit?)

the entry points to /var/cache/unbound/resolvconf_resolvers.conf. that file contains:

# Generated by resolvconf

forward-zone:
	name: "localdomain"
	forward-addr: the IP v6 address of my pihole - masked for privacy

forward-zone:
	name: "."
	forward-addr: the IP v6 address of my pihole - masked for privacy

please check this...

DL6ER and Bucking_Horn, please advise...

Here in the info you requested:

resolv_conf=/etc/resolv.conf
# If you run a local name server, you should uncomment the below line and
# configure your subscribers configuration files below.
#name_servers=127.0.0.1

# Mirror the Debian package defaults for the below resolvers
# so that resolvconf integrates seemlessly.
dnsmasq_resolv=/var/run/dnsmasq/resolv.conf
pdnsd_conf=/etc/pdnsd.conf
unbound_conf=/var/cache/unbound/resolvconf_resolvers.conf

resolvconf_resolvers.conf

# Generated by resolvconf

forward-zone:
	name: "home.arpa"
	forward-addr: static IPv6 I set in pfsense and setupVars
	forward-addr: the IPv6 that was assigned to the device initially

forward-zone:
	name: "."
	forward-addr: 1.1.1.1
	forward-addr: 1.0.0.1
	forward-addr: static IPv6 I set in pfsense and setupVars
	forward-addr: the IPv6 that was assigned to the device initially

I thought that they were more interested in the exe though. Does this have anything to do with this mention in the docs?

Just wanted to get one more instance before I end the audit. It looks like the same thing, just a different PID

295-type=PROCTITLE msg=audit(1622309841.479:256): proctitle=2F62696E2F7368002F7573722F7362696E2F7265736F6C76636F6E66002D6100657468302E7261
296:type=SYSCALL msg=audit(1622309841.739:257): arch=40000028 syscall=37 per=800000 success=yes exit=0 a0=67eb a1=1 a2=7fffffff a3=0 items=0 ppid=23427 pid=23467 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="resolvconf" exe="/usr/bin/dash" key=(null)

@Bucking_Horn

any of the following files have a date/time that matches the unbound stop?

/etc/resolvconf.conf
/etc/resolv.conf
/var/cache/unbound/resolvconf_resolvers.conf

@jmccamb

Yes, so here are the timestamps for a few of the recent stoppages:

May 29 12:37:21 raspberrypi-2 unbound[26603]: [26603:0] info: service stopped (unbound 1.9.0).
May 29 12:37:29 raspberrypi-2 unbound[26603]: [26603:0] info: start of service (unbound 1.9.0).
May 29 12:37:29 raspberrypi-2 unbound[26603]: [26603:0] info: service stopped (unbound 1.9.0).
May 29 12:37:37 raspberrypi-2 unbound[26603]: [26603:0] info: start of service (unbound 1.9.0).

And if I stat the files, two of the three were changed:

Same timestamp for resolv.conf

  File: /etc/resolv.conf
  Size: 64        	Blocks: 8          IO Block: 4096   regular file
Device: b302h/45826d	Inode: 809         Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2021-01-11 07:15:50.981160954 -0600
Modify: 2021-05-29 12:37:21.699911385 -0500
Change: 2021-05-29 12:37:21.699911385 -0500
 Birth: -

Not the same for resolvconf.conf

  File: /etc/resolvconf.conf
  Size: 490       	Blocks: 8          IO Block: 4096   regular file
Device: b302h/45826d	Inode: 812         Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2021-01-11 07:14:55.444118840 -0600
Modify: 2016-04-26 01:02:35.000000000 -0500
Change: 2021-01-11 07:14:55.444118840 -0600
 Birth: -

Same for resolveconf_resolvers.conf

  File: /var/cache/unbound/resolvconf_resolvers.conf
  Size: 312       	Blocks: 8          IO Block: 4096   regular file
Device: b302h/45826d	Inode: 257544      Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2021-01-11 07:16:08.629999993 -0600
Modify: 2021-05-29 12:37:21.739910760 -0500
Change: 2021-05-29 12:37:21.739910760 -0500
 Birth: -

In my case, my latest stop is

May 29 22:57:49 unbound[713:0] info: service stopped (unbound 1.13.1).

only /var/cache/unbound/resolvconf_resolvers.conf has an exact matching timestamp

checked out the link you pointed to,

sudo systemctl status unbound-resolvconf.service
Unit unbound-resolvconf.service could not be found.

Thats most likely bc you compiled unbound yourself:

pi@ph5b:~ $ dpkg -S unbound-resolvconf.service
unbound: /lib/systemd/system/unbound-resolvconf.service

@jmccamb, did you try disabling above service?
Could give it a try.
And how did the power swap work out? no more under voltage?

1 Like

Continuing to investigate...
resolvconf is a script.
which resolvconf returns /usr/sbin/resolvconf

The script has a comment regarding SIGHUP. Below that comment, there is a function that builds a RESTARTCMD, I assume the restart commands are build from the content of /etc/resolvconf.conf and eventually executed. The script also rewrites the files, referred in /etc/resolvconf.conf.

I don't know what OS your running, and if there is a different script for each OS.

can you run sha1sum /usr/sbin/resolvconf (modify the location if required - which resolvconf). For my OS ( Raspberry Pi OS Lite, Release date: May 7th 2021), this returns:

sha1sum /usr/sbin/resolvconf
4bfee7ac4e855ae48e35ab9ac37ebb8c2d37d210  /usr/sbin/resolvconf

this to confirm we are looking at the same script / possible problem.

I'm considering to comment out the lines, for dnsmasq and unbound, in /etc/resolvconf.conf, this to verify if the unexpected stop events will disappear. I've already have an IPv6check script, running with cron, this to detect IPv6 address changes on the pi, hardly no effort to add sed instructions to modify the files and send a SIGHUP signal (only when the address has really changed).

Would however like the advice of DL6ER and Bucking_Horn...

1 Like

Thanks, the pointer to resolvconf was the missing element. Quick source code checking shows this is intended behavior:

To be run whenever resolvconf is run by the system. This can very well coincidence with your IPv6 observations.

edit: restart_unbound is defined further up in the same file (not shown here)

2 Likes

A bit off-topic (because not targeting unbound) but it makes me wonder if there should be a "reread /etc/hosts but keep cache otherwise" feature in dnsmasq. Maybe I'll suggest this to the mailing list and check what they say.

1 Like

All my Ipv6 addresses, including temporary addresses are kept (added automatically) in files in a specific directory, processed by dnsmasq, as soon as a file changes by using 'hostsdir=/etc/pfsense'. You can clearly see the new / changed entries are picked up by dnsmasq, without restart, by looking at the logfile.

you could consider moving (v6) the pihole files in a specific directory and use the hostdir= directive to achieve the same for /etc/pihole/local.list and /etc/pihole/custom.list as opposed to using (/etc/dnsmasq.d/01-pihole.conf)

addn-hosts=/etc/pihole/local.list
addn-hosts=/etc/pihole/custom.list

On topic: the unbound 'stopped' messages do not match the time the file(s) (a new IPv6 address added to a file) is changed and dnsmasq reads it. The global dynamic mngtmpaddr, the address in /var/cache/unbound/resolvconf_resolvers.conf, see here, never changes, unless the ISP kicks the modem.

That isn't in my resolvconf, so different versions on different OS, as I already suspected...

image

1 Like

looks like we have the same version:

pi@raspberrypi:/sbin $ sha1sum resolvconf
4bfee7ac4e855ae48e35ab9ac37ebb8c2d37d210  resolvconf

If you find a workaround please let me know.

Also, would blowing up my pihole and starting over with a different OS like dietpi, would that potentially resolve the issue? Or is modifying the resolveconf.conf file something more likely to help?

Can we safely say that this is something not related to pfsense or pihole at this point? For example, if I were to use opnsense would this issue still exist?

Thanks to everyone for you continued support. The devotion and generosity with time is unmatched in this community.

1 Like

@deHakkelaar If i disable the resolvconf.service as per the docs, should I first set the static domain_name_servers= in dhcpcd.conf to 127.0.0.1#5335?

Or is it enough to just disable and stop the service?

No, you cant supply the #5335 port in that dhcpcd.conf file.
Either you just supply the localhost address 127.0.0.1 (without the port) which will be pihole-FTL listening on default port 53:

pi@ph5b:~ $ sudo netstat -nltup | grep 'Proto\|:53 \|:5335 '
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:53              0.0.0.0:*               LISTEN      3233/pihole-FTL
tcp        0      0 127.0.0.1:5335          0.0.0.0:*               LISTEN      3091/unbound
tcp6       0      0 :::53                   :::*                    LISTEN      3233/pihole-FTL
udp        0      0 0.0.0.0:53              0.0.0.0:*                           3233/pihole-FTL
udp        0      0 127.0.0.1:5335          0.0.0.0:*                           3091/unbound
udp6       0      0 :::53                   :::*                                3233/pihole-FTL

Or you enter another DNS server(s).
I've configured my router IP so when tinkering and I break things again, I still have DNS.

1 Like

Oh okay, I got you. I ran the first command:

sudo systemctl disable unbound-resolvconf.service 

Another restart happened, so I am not sure if this is because I didn't follow through with the rest of the steps or because this service wasn't the one actually causing the restarts.