Unbound frequent restarts

deHakkelaar · May 28, 2021, 3:33pm

Maybe the journals still have that PID logged somewhere:

journalctl | less -p 13932

jmccamb · May 28, 2021, 6:16pm

Pattern not found

Interestingly enough. Haven't had any issues since reporting last. (knock on wood).

I am sure it will happen. Just not sure when.

If it does I will try this command with the relevant PID

DL6ER · May 28, 2021, 7:15pm

You may find some information in the system log files /var/log/messages , /var/log/syslog . Some processes print messages to the system log when it starts.

jmccamb · May 28, 2021, 7:34pm

Awesome, I will check that too. Thank you!

MichaIng · May 28, 2021, 9:17pm

Did you try to increase the Unbound verbosity, to get some more info upon unexpected service restart?

sed -i 's/verbosity: 0/verbosity: 2/' /etc/unbound/unbound.conf.d/pi-hole.conf
systemctl restart unbound

And about dhcpcd IPv6 expiry: You said you tried to set a static IPv6 via pfsense, but stat means you reserved an IP, right? In this case, a router advertisement (DHCP in case of IPv4) is still required. Did you think about setting a real static IP via /etc/dhcpcd.conf, like below?

interface eth0
    static ipaddress=192.168.0.2
    static ip6address=xxxx:yyyy:etc...
    static routers=192.168.0.1
    static domain_name_servers=1.1.1.1

EDIT: All logs from /var/log/messages and /var/log/syslog are btw coming from journalctl, i.e. systemd-journald sends them to rsyslog, which filters, sorts and writes them to the known /var/log/ files. So likely you won't find anything new there. Only after reboot or crash, the journal will be empty, as it is stored in a tmpfs (/run/log/journal/) by default, while the /var/log directory is usually preserved on disk.

jmccamb · May 28, 2021, 9:37pm

Increased the logging verbosity as per your advice.

In terms of the static IP. I set both a static IPV4 and IPV6 address on pfsense, but I didn't change it on the actual device. The IPV4 address is correct in /etc/dhcpcd.conf, but should I also hard code the static IPV6 address as well?

MichaIng · May 28, 2021, 9:58pm

If anything (whether it's a LAN client or a service on the Pi-hole system itself) relies on that IP and the router for whatever reason sends RAs in longer intervals than its TTL, then yes. I guess it doesn't matter and you could also disable this ULA address advertising in the router completely, so that LAN hostnames are assured to resolve to IPv4 addresses. If IPv6 connections are really wanted within the LAN, then the SLAAC addresses can be used, which are anyway always static and predictable (derived from only the MAC of related adapter).

jmccamb · May 28, 2021, 10:10pm

Sorry, I am not very familiar with this kind of stuff. A few clarifying questions:

If I add the IPV6 address in dhcpcd.conf, which one would I add? There are a few listed when running ifconfig. Would I just copy the static IP that I set for the device in pfsense and paste it in dhcpcd.conf?
In terms of disabling ULA address advertising on the router. Is that akin to changing these settings in the RA section of my router? Or do you mean disabling ipv6 entirely

Sorry for all the questions. When it comes to IPV6 I really am lost.

jpgpi250 · May 28, 2021, 10:11pm

As I have been checking / resolving problems I noticed in my logs, I also noticed the unexpected unbound stop messages in my logs, and have been looking into this.

Found and solved one of the unexpected stops, see here, however there are more..

After looking at several logs, I managed to match the remaining unbound stop times to a syslog entry:

May 14 06:15:26 raspberrypi dhcpcd[562]: eth0: part of Router Advertisement expired

which matches the stop entry in the unbound log

May 14 06:15:26 unbound[790:0] info: service stopped (unbound 1.13.1).

I opened a issue here on the netgate forum, but so far, nothing useful has come out of this.

details:

pfsense hands out IPv4 DHCP addresses
pihole (raspberry pi) doesn't use a static IPv4 address, but gets a DHCP assigned address that never changes (MAC based DHCP address assignment). This makes it easier for me to swap SD cards between my production and test pi's.
the IPv4 DHCP renewals (pfsense DHCP log) don't match the unbound stop (unbound log on the raspberry pi), so it's not the IPv4 DHCP server that causes this.
IPv6 addresses GUA are distributed in the network, using pfsense 'track interface'.

on the WAN interface:

on the LAN interface

The IPv6 addresses on the devices never change, unless the ISP kicks the modem, in which case the WAN IPv6 changes, thus all tracked addresses (if the prefix changes).
The IPv6 addresses on the devices also never change, unless I move the SD card to another raspberry pi (OR the WAN address changes - new prefix)
On the raspberry pi, I've configured temporary IPv6 addresses, changes every 2 hours, the address changes don't match the unbound stops.
There are no noticeable related events on the pfsense box @ the time unbound stops.

I'm currently assuming this is 'normal', nothing you can do about it, I must admit I haven't gone trough the strace process yet, this to try to identify the process that sends the SIGHUP.

running unbound with verbosity 1 here all the time, no additional information found yet.

	# level 0 means no verbosity, only errors.
	# level 1 gives operational information.
	# level 2 gives detailed operational  information.
	# level 3 gives query level information, output per query.
	# level 4 gives algorithm  level  information.
	# level 5 logs client identification for cache misses.

MichaIng · May 28, 2021, 10:56pm

The one set in pfsense for that device was my idea. But to be true, I'm not sure if it will then stop respecting RAs or simply have one IPv6 address more . I'm also not 100% familiar with IPv6.

Basically there are min tree types of addresses assigned: the MAC-based autoconfigured SLAAC addresses, the ULAs (unique local addresses), advertised by the router and the GUAs (global unicast addresses), also advertised by the router, required if you want to access a device within LAN from www without NAT. And in theory there is DHCPv6, which is rarely used. For a usual home network, IPv4 is totally sufficient and an IPv6 address on each client only required for them to connect to remote IPv6 hosts. For that SLAAC is sufficient and can be used as well to connect via IPv6 to LAN clients. SLAAC is always used immediately by every network adapter/interface where IPv6 is enabled, no router or anything else needs to trigger that. So in your case, choosing "Disabled" should be fine and reduce a bit overhead and a bit more possible confusion in your home network .

EDIT: What I forgot about is the IPv6 default route/gateway that relies on the RA as well, when not statically set. Not sure if "Disabled" means that the router won't send any RAs or only that is won't send ULAs and GUAs . When curl -6 google.com suddenly fails with "No route to host", we'll know.

jmccamb · May 29, 2021, 3:10am

If I disabled the pi from accepting ipv6 RA with net.ipv6.conf.all.disable_ipv6 = 1 in /etc/sysctl.conf will this prevent it from restarting unbound?

DL6ER · May 29, 2021, 6:20am

I quickly checked the dhcpcd source code. dhcpcd simply reloads its configuration on receipt of SIGHUP:

github.com/NetworkConfiguration/dhcpcd

src/dhcpcd.c

01748b315


      
          	case SIGHUP:
          		loginfox(sigmsg, "SIGHUP", "rebinding");
          		reload_config(ctx);
          		/* Preserve any options passed on the commandline
          		 * when we were started. */
          		reconf_reboot(ctx, 1, ctx->argc, ctx->argv,
          		    ctx->argc - ctx->ifc);
          		return;

It does not send any signals to any non-dhcpcd processes so it is a "victim" of the same underlying issue.

Summary:
The cause seems to be a network device update (address change). The process causing it does not exist when manually calling ps so it is either short-lived or exited already. It may be (a) a terminating network process that fires this signal or (b) an intermediate process only existing for a few moments or (c) it is the new network process but the SIGHUP is fired before the process changed its PID by forking, etc.

To address

(a): Try setting up a small job that stores the content of ps every few minutes. This will be your memory for later checking if this PID every existed (for long enough time).
(b): I don't think this can be addressed easily.
(c): Try to find a process whose PID is the PID you are looking for +1 and +2 (look for 124 and 125 when the PID of the SIGHUP-sender is 123). This process may be unrelated but if it is always the same one, we should take a closer look here.

jpgpi250 · May 29, 2021, 7:26am

quick question

My unbound always stops twice in a very short time frame, same in your environment?

cat /etc/unbound/log/unbound.log |grep stopped
May 29 02:17:56 unbound[14980:0] info: service stopped (unbound 1.13.1).
May 29 02:18:02 unbound[14980:0] info: service stopped (unbound 1.13.1).
May 29 04:22:57 unbound[14980:0] info: service stopped (unbound 1.13.1).
May 29 04:23:03 unbound[14980:0] info: service stopped (unbound 1.13.1).

@jmccamb

Bucking_Horn · May 29, 2021, 8:10am

Just a side note on IPv6 addresses (click for details).

Not quite.

An RFC 4941 IPV6 Privacy extension address can also be considered a special application of SLAAC, and EUI64 addresses (containing a MAC) may be used just as well to provide the interface portion of an IPv6 ULA or GUA.

There are three types or ranges of IPv6 addresses (edit: well, truth is there are more, but these are the most common and also most relevant ones):
link-local (fe80::/10), ULA (fd00::/8) and GUA (2000::/3).
There are also three methods how a client can acquire those:
SLAAC as well as Stateless and Stateful DHCPv6.

A client may use SLAAC to construct IPv6 addresses for all three ranges, or it may choose any combination of those methods to acquire any set of addresses, depending on a client's configuration and availability of router support for those methods.

Link-local addresses are the only ones that can be acquired by SLAAC exclusively, but they cannot be relied upon to be predictable: A client may be configured to forego EUI48/64 addresses in favour of an RFC 7217 address. In Raspberry Pi OS, this is the default (configured by slaac private in dhcpcd.conf).

Bucking_Horn · May 29, 2021, 8:14am

Maybe.
Note that some networking tools would ignore those settings, relying on their own configuration options instead - I believe NetworkManager may fall into that category.

If we cannot find this on the receiving end, let's try and capture it at the sender, using auditd.

EDIT: I've successfully tested below steps on Raspbian Buster, but couldn't get them to work on Armbian (the audit log never populated).
It may only be worth the hassle if you would run Raspberry Pi OS as well.

Since we don't know the sender, we have to capture traffic of all processes, so this may incur a performance hit for all of your system, and the respective log file may grow large quickly. And we still need strace to find the pid.

Don't forget to disable it once finished.

We start by installing auditd:

sudo apt install auditd audispd-plugins

Enable auditing:

sudo auditd -s enable

Find your system CPU architecture, e.g.:

~ $ uname -m
armv7l

Use that arch when creating a rule to capture system calls to kill -HUP:

sudo auditctl -a exit,always -F arch=armv7l -S kill -F a1=1

EDIT: In an attempt to limit audit log growth, I tried to be as restrictive as possible here, by assuming that unbound is indeed HUPed by an unknown process. I think that's a safe assumption, as the observed behaviour is documented for HUP reception in unbound. Still, if it proves too restrictive and you can't find a match, you may drop the trailing -F a1=1.

Verify that rule got created, e.g.:

~ $ sudo auditctl -l
-a always,exit -F arch=b32 -S kill -F a1=0x1

You should see an audit log at /var/log/audit/audit.log
Check that audit logging is working:

sudo auditctl -m 'verify its working'

Verify that statement made it into your audit log:

sudo grep -n 'verify' /var/log/audit/audit.log

If everything works, have your strace running also and wait for unbound to restart.

Once that happens, extract the pid of the process that HUPed unbound from your strace log as before, and use that to determine the offending process:

sudo grep -n -B1 '=29812 ' /var/log/audit/audit.log

In the above, replace 29812 with the pid from your strace (keep the leading = and trailing space).

That should give you a few lines like:

1265-type=USER_START msg=audit(1622277865.428:1119): pid=29811 uid=0 auid=1000 ses=1522 msg='op=PAM:session_open grantors=pam_permit,pam_unix acct="root" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/1 res=success'
1266:type=SYSCALL msg=audit(1622277865.448:1120): arch=40000028 syscall=37 per=800000 success=yes exit=0 a0=1f6 a1=1 a2=42adc00 a3=42adc00 items=0 ppid=29811 pid=29812 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts1 ses=1522 comm="kill" exe="/bin/kill" key=(null)

where exe= should point to the binary (at the end of line 1266 in the example above).
Let's hope that isn't just a shell /bin/kill spawned by yet another unknown process (in the example I used, it is of course such a command that I used to trigger those log lines).

Once done, don't forget to disable auditing:

sudo auditd -s disable
sudo service auditd stop
sudo systemctl disable auditd

MichaIng · May 29, 2021, 10:01am

@Bucking_Horn
Thanks for clarification. With "SLAAC address" I basically meant the first step of SLAAC to create an LLA from the MAC (being imprecise), while everything else implies the router(s): IPv6 Stateless Address Auto-configuration (SLAAC) | NetworkAcademy.io
It remains complex, which might be the major issue with IPv6 in general and why many users with average knowledge still need to disable it to resolve this or that network issue. Our VPS is IPv4 only, because initial IPv6 packets from the server have 10-30 seconds delay, when no packet was sent within ~40 seconds, an no one is able to find out why, probably ISP routing related .

@jmccamb

If you really want to disable IPv6 completely (for testing or if we cannot find any better solution):

net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1

The second is usually required as well to disable it for any newly configured network interfaces, after that rules got applied (which is very early at boot). Good to know that it can be ignored. When it's successful, ip a does not show any inet6.

Last-last resort would be to disable the IPv6 kernel module, respectively via kernel cmdline when it's builtin. But that requires some software configs to be changed, including Unbound AFAIK, to assure that they do not even try to interpret IPv6 addresses.

But I like the diagnosing attempts, instead of using the hammer (and it's still now 100% clear that IPv6 is the issue(?)). To further diagnose on the network side, tcpdump might be interesting as well, e.g. to print all RAs and other NDs around in the network:

apt install tcpdump
tcpdump icmp6

jmccamb · May 29, 2021, 3:42pm

Okay I will give this a shot and report back! Thanks for taking the time to do this.

I will keep you posted.

jmccamb · May 29, 2021, 4:11pm

Here is what I got. I was lucky and a restart happened pretty much right after I set up the audit:

141-type=PROCTITLE msg=audit(1622304458.990:134): proctitle=2F62696E2F7368002F7573722F7362696E2F7265736F6C76636F6E66002D6100657468302E7261
142:type=SYSCALL msg=audit(1622304459.310:135): arch=40000028 syscall=37 per=800000 success=yes exit=0 a0=67eb a1=1 a2=7fffffff a3=0 items=0 ppid=12586 pid=12626 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="resolvconf" exe="/usr/bin/dash" key=(null)

I still have the audit running just in case we need more info.

jpgpi250 · May 29, 2021, 5:39pm

interesting, can you please check /etc/resolvconf.conf

mine contains the following:

# Configuration for resolvconf(8)
# See resolvconf.conf(5) for details

resolv_conf=/etc/resolv.conf
# If you run a local name server, you should uncomment the below line and
# configure your subscribers configuration files below.
#name_servers=127.0.0.1

# Mirror the Debian package defaults for the below resolvers
# so that resolvconf integrates seemlessly.
dnsmasq_resolv=/var/run/dnsmasq/resolv.conf
pdnsd_conf=/etc/pdnsd.conf
unbound_conf=/var/cache/unbound/resolvconf_resolvers.conf

never mind the dnsmasq entry, dnsmasq was installed before pihole-FTL disables it.
it contains an entry for unbound (is this the culprit?)

the entry points to /var/cache/unbound/resolvconf_resolvers.conf. that file contains:

# Generated by resolvconf

forward-zone:
	name: "localdomain"
	forward-addr: the IP v6 address of my pihole - masked for privacy

forward-zone:
	name: "."
	forward-addr: the IP v6 address of my pihole - masked for privacy

please check this...

DL6ER and Bucking_Horn, please advise...

jmccamb · May 29, 2021, 5:52pm

Here in the info you requested:

resolv_conf=/etc/resolv.conf
# If you run a local name server, you should uncomment the below line and
# configure your subscribers configuration files below.
#name_servers=127.0.0.1

# Mirror the Debian package defaults for the below resolvers
# so that resolvconf integrates seemlessly.
dnsmasq_resolv=/var/run/dnsmasq/resolv.conf
pdnsd_conf=/etc/pdnsd.conf
unbound_conf=/var/cache/unbound/resolvconf_resolvers.conf

resolvconf_resolvers.conf

# Generated by resolvconf

forward-zone:
	name: "home.arpa"
	forward-addr: static IPv6 I set in pfsense and setupVars
	forward-addr: the IPv6 that was assigned to the device initially

forward-zone:
	name: "."
	forward-addr: 1.1.1.1
	forward-addr: 1.0.0.1
	forward-addr: static IPv6 I set in pfsense and setupVars
	forward-addr: the IPv6 that was assigned to the device initially

I thought that they were more interested in the exe though. Does this have anything to do with this mention in the docs?