Maybe the journals still have that PID logged somewhere:
journalctl | less -p 13932
Maybe the journals still have that PID logged somewhere:
journalctl | less -p 13932
Pattern not found
Interestingly enough. Haven't had any issues since reporting last. (knock on wood).
I am sure it will happen. Just not sure when.
If it does I will try this command with the relevant PID
You may find some information in the system log files /var/log/messages
, /var/log/syslog
. Some processes print messages to the system log when it starts.
Awesome, I will check that too. Thank you!
Did you try to increase the Unbound verbosity, to get some more info upon unexpected service restart?
sed -i 's/verbosity: 0/verbosity: 2/' /etc/unbound/unbound.conf.d/pi-hole.conf
systemctl restart unbound
And about dhcpcd IPv6 expiry: You said you tried to set a static IPv6 via pfsense, but stat means you reserved an IP, right? In this case, a router advertisement (DHCP in case of IPv4) is still required. Did you think about setting a real static IP via /etc/dhcpcd.conf
, like below?
interface eth0
static ipaddress=192.168.0.2
static ip6address=xxxx:yyyy:etc...
static routers=192.168.0.1
static domain_name_servers=1.1.1.1
EDIT: All logs from /var/log/messages
and /var/log/syslog
are btw coming from journalctl
, i.e. systemd-journald
sends them to rsyslog
, which filters, sorts and writes them to the known /var/log/
files. So likely you won't find anything new there. Only after reboot or crash, the journal will be empty, as it is stored in a tmpfs (/run/log/journal/
) by default, while the /var/log
directory is usually preserved on disk.
Increased the logging verbosity as per your advice.
In terms of the static IP. I set both a static IPV4 and IPV6 address on pfsense, but I didn't change it on the actual device. The IPV4 address is correct in /etc/dhcpcd.conf
, but should I also hard code the static IPV6 address as well?
If anything (whether it's a LAN client or a service on the Pi-hole system itself) relies on that IP and the router for whatever reason sends RAs in longer intervals than its TTL, then yes. I guess it doesn't matter and you could also disable this ULA address advertising in the router completely, so that LAN hostnames are assured to resolve to IPv4 addresses. If IPv6 connections are really wanted within the LAN, then the SLAAC addresses can be used, which are anyway always static and predictable (derived from only the MAC of related adapter).
Sorry, I am not very familiar with this kind of stuff. A few clarifying questions:
If I add the IPV6 address in dhcpcd.conf, which one would I add? There are a few listed when running ifconfig
. Would I just copy the static IP that I set for the device in pfsense and paste it in dhcpcd.conf
?
In terms of disabling ULA address advertising on the router. Is that akin to changing these settings in the RA section of my router? Or do you mean disabling ipv6 entirely
Sorry for all the questions. When it comes to IPV6 I really am lost.
As I have been checking / resolving problems I noticed in my logs, I also noticed the unexpected unbound stop messages in my logs, and have been looking into this.
Found and solved one of the unexpected stops, see here, however there are more..
After looking at several logs, I managed to match the remaining unbound stop times to a syslog entry:
May 14 06:15:26 raspberrypi dhcpcd[562]: eth0: part of Router Advertisement expired
which matches the stop entry in the unbound log
May 14 06:15:26 unbound[790:0] info: service stopped (unbound 1.13.1).
I opened a issue here on the netgate forum, but so far, nothing useful has come out of this.
details:
on the WAN interface:
on the LAN interface
I'm currently assuming this is 'normal', nothing you can do about it, I must admit I haven't gone trough the strace process yet, this to try to identify the process that sends the SIGHUP.
# level 0 means no verbosity, only errors.
# level 1 gives operational information.
# level 2 gives detailed operational information.
# level 3 gives query level information, output per query.
# level 4 gives algorithm level information.
# level 5 logs client identification for cache misses.
The one set in pfsense for that device was my idea. But to be true, I'm not sure if it will then stop respecting RAs or simply have one IPv6 address more . I'm also not 100% familiar with IPv6.
Basically there are min tree types of addresses assigned: the MAC-based autoconfigured SLAAC addresses, the ULAs (unique local addresses), advertised by the router and the GUAs (global unicast addresses), also advertised by the router, required if you want to access a device within LAN from www without NAT. And in theory there is DHCPv6, which is rarely used. For a usual home network, IPv4 is totally sufficient and an IPv6 address on each client only required for them to connect to remote IPv6 hosts. For that SLAAC is sufficient and can be used as well to connect via IPv6 to LAN clients. SLAAC is always used immediately by every network adapter/interface where IPv6 is enabled, no router or anything else needs to trigger that. So in your case, choosing "Disabled" should be fine and reduce a bit overhead and a bit more possible confusion in your home network .
EDIT: What I forgot about is the IPv6 default route/gateway that relies on the RA as well, when not statically set. Not sure if "Disabled" means that the router won't send any RAs or only that is won't send ULAs and GUAs . When curl -6 google.com
suddenly fails with "No route to host", we'll know.
If I disabled the pi from accepting ipv6 RA with net.ipv6.conf.all.disable_ipv6 = 1
in /etc/sysctl.conf
will this prevent it from restarting unbound?
I quickly checked the dhcpcd
source code. dhcpcd
simply reloads its configuration on receipt of SIGHUP
:
It does not send any signals to any non-dhcpcd
processes so it is a "victim" of the same underlying issue.
Summary:
The cause seems to be a network device update (address change). The process causing it does not exist when manually calling ps
so it is either short-lived or exited already. It may be (a) a terminating network process that fires this signal or (b) an intermediate process only existing for a few moments or (c) it is the new network process but the SIGHUP
is fired before the process changed its PID by forking, etc.
To address
ps
every few minutes. This will be your memory for later checking if this PID every existed (for long enough time).+1
and +2
(look for 124
and 125
when the PID of the SIGHUP-sender is 123
). This process may be unrelated but if it is always the same one, we should take a closer look here.quick question
My unbound always stops twice in a very short time frame, same in your environment?
cat /etc/unbound/log/unbound.log |grep stopped
May 29 02:17:56 unbound[14980:0] info: service stopped (unbound 1.13.1).
May 29 02:18:02 unbound[14980:0] info: service stopped (unbound 1.13.1).
May 29 04:22:57 unbound[14980:0] info: service stopped (unbound 1.13.1).
May 29 04:23:03 unbound[14980:0] info: service stopped (unbound 1.13.1).
Not quite.
An RFC 4941 IPV6 Privacy extension address can also be considered a special application of SLAAC, and EUI64 addresses (containing a MAC) may be used just as well to provide the interface portion of an IPv6 ULA or GUA.
There are three types or ranges of IPv6 addresses (edit: well, truth is there are more, but these are the most common and also most relevant ones):
link-local (fe80::/10
), ULA (fd00::/8
) and GUA (2000::/3
).
There are also three methods how a client can acquire those:
SLAAC as well as Stateless and Stateful DHCPv6.
A client may use SLAAC to construct IPv6 addresses for all three ranges, or it may choose any combination of those methods to acquire any set of addresses, depending on a client's configuration and availability of router support for those methods.
Link-local addresses are the only ones that can be acquired by SLAAC exclusively, but they cannot be relied upon to be predictable: A client may be configured to forego EUI48/64 addresses in favour of an RFC 7217 address. In Raspberry Pi OS, this is the default (configured by slaac private
in dhcpcd.conf
).
Maybe.
Note that some networking tools would ignore those settings, relying on their own configuration options instead - I believe NetworkManager may fall into that category.
If we cannot find this on the receiving end, let's try and capture it at the sender, using auditd
.
EDIT: I've successfully tested below steps on Raspbian Buster, but couldn't get them to work on Armbian (the audit log never populated).
It may only be worth the hassle if you would run Raspberry Pi OS as well.
Since we don't know the sender, we have to capture traffic of all processes, so this may incur a performance hit for all of your system, and the respective log file may grow large quickly. And we still need strace
to find the pid.
Don't forget to disable it once finished.
We start by installing auditd
:
sudo apt install auditd audispd-plugins
Enable auditing:
sudo auditd -s enable
Find your system CPU architecture, e.g.:
~ $ uname -m
armv7l
Use that arch when creating a rule to capture system calls to kill -HUP
:
sudo auditctl -a exit,always -F arch=armv7l -S kill -F a1=1
EDIT: In an attempt to limit audit log growth, I tried to be as restrictive as possible here, by assuming that unbound
is indeed HUPed by an unknown process. I think that's a safe assumption, as the observed behaviour is documented for HUP reception in unbound
. Still, if it proves too restrictive and you can't find a match, you may drop the trailing -F a1=1
.
Verify that rule got created, e.g.:
~ $ sudo auditctl -l
-a always,exit -F arch=b32 -S kill -F a1=0x1
You should see an audit log at /var/log/audit/audit.log
Check that audit logging is working:
sudo auditctl -m 'verify its working'
Verify that statement made it into your audit log:
sudo grep -n 'verify' /var/log/audit/audit.log
If everything works, have your strace
running also and wait for unbound
to restart.
Once that happens, extract the pid of the process that HUPed unbound
from your strace
log as before, and use that to determine the offending process:
sudo grep -n -B1 '=29812 ' /var/log/audit/audit.log
In the above, replace 29812
with the pid from your strace
(keep the leading =
and trailing space).
That should give you a few lines like:
1265-type=USER_START msg=audit(1622277865.428:1119): pid=29811 uid=0 auid=1000 ses=1522 msg='op=PAM:session_open grantors=pam_permit,pam_unix acct="root" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/1 res=success'
1266:type=SYSCALL msg=audit(1622277865.448:1120): arch=40000028 syscall=37 per=800000 success=yes exit=0 a0=1f6 a1=1 a2=42adc00 a3=42adc00 items=0 ppid=29811 pid=29812 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts1 ses=1522 comm="kill" exe="/bin/kill" key=(null)
where exe=
should point to the binary (at the end of line 1266 in the example above).
Let's hope that isn't just a shell /bin/kill
spawned by yet another unknown process (in the example I used, it is of course such a command that I used to trigger those log lines).
Once done, don't forget to disable auditing:
sudo auditd -s disable
sudo service auditd stop
sudo systemctl disable auditd
@Bucking_Horn
Thanks for clarification. With "SLAAC address" I basically meant the first step of SLAAC to create an LLA from the MAC (being imprecise), while everything else implies the router(s): https://www.networkacademy.io/ccna/ipv6/stateless-address-autoconfiguration-slaac
It remains complex, which might be the major issue with IPv6 in general and why many users with average knowledge still need to disable it to resolve this or that network issue. Our VPS is IPv4 only, because initial IPv6 packets from the server have 10-30 seconds delay, when no packet was sent within ~40 seconds, an no one is able to find out why, probably ISP routing related .
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
The second is usually required as well to disable it for any newly configured network interfaces, after that rules got applied (which is very early at boot). Good to know that it can be ignored. When it's successful, ip a
does not show any inet6
.
Last-last resort would be to disable the IPv6 kernel module, respectively via kernel cmdline when it's builtin. But that requires some software configs to be changed, including Unbound AFAIK, to assure that they do not even try to interpret IPv6 addresses.
But I like the diagnosing attempts, instead of using the hammer (and it's still now 100% clear that IPv6 is the issue(?)). To further diagnose on the network side, tcpdump
might be interesting as well, e.g. to print all RAs and other NDs around in the network:
apt install tcpdump
tcpdump icmp6
Okay I will give this a shot and report back! Thanks for taking the time to do this.
I will keep you posted.
Here is what I got. I was lucky and a restart happened pretty much right after I set up the audit:
141-type=PROCTITLE msg=audit(1622304458.990:134): proctitle=2F62696E2F7368002F7573722F7362696E2F7265736F6C76636F6E66002D6100657468302E7261
142:type=SYSCALL msg=audit(1622304459.310:135): arch=40000028 syscall=37 per=800000 success=yes exit=0 a0=67eb a1=1 a2=7fffffff a3=0 items=0 ppid=12586 pid=12626 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="resolvconf" exe="/usr/bin/dash" key=(null)
I still have the audit running just in case we need more info.
interesting, can you please check /etc/resolvconf.conf
mine contains the following:
# Configuration for resolvconf(8)
# See resolvconf.conf(5) for details
resolv_conf=/etc/resolv.conf
# If you run a local name server, you should uncomment the below line and
# configure your subscribers configuration files below.
#name_servers=127.0.0.1
# Mirror the Debian package defaults for the below resolvers
# so that resolvconf integrates seemlessly.
dnsmasq_resolv=/var/run/dnsmasq/resolv.conf
pdnsd_conf=/etc/pdnsd.conf
unbound_conf=/var/cache/unbound/resolvconf_resolvers.conf
the entry points to /var/cache/unbound/resolvconf_resolvers.conf. that file contains:
# Generated by resolvconf
forward-zone:
name: "localdomain"
forward-addr: the IP v6 address of my pihole - masked for privacy
forward-zone:
name: "."
forward-addr: the IP v6 address of my pihole - masked for privacy
please check this...
DL6ER and Bucking_Horn, please advise...
Here in the info you requested:
resolv_conf=/etc/resolv.conf
# If you run a local name server, you should uncomment the below line and
# configure your subscribers configuration files below.
#name_servers=127.0.0.1
# Mirror the Debian package defaults for the below resolvers
# so that resolvconf integrates seemlessly.
dnsmasq_resolv=/var/run/dnsmasq/resolv.conf
pdnsd_conf=/etc/pdnsd.conf
unbound_conf=/var/cache/unbound/resolvconf_resolvers.conf
resolvconf_resolvers.conf
# Generated by resolvconf
forward-zone:
name: "home.arpa"
forward-addr: static IPv6 I set in pfsense and setupVars
forward-addr: the IPv6 that was assigned to the device initially
forward-zone:
name: "."
forward-addr: 1.1.1.1
forward-addr: 1.0.0.1
forward-addr: static IPv6 I set in pfsense and setupVars
forward-addr: the IPv6 that was assigned to the device initially
I thought that they were more interested in the exe though. Does this have anything to do with this mention in the docs?