I have disabled DHCP on my router and I'm using Pi-hole's DHCP server. It works fine for less than a day, but at some point it will fail to assign IP addresses until the DNS resolver is restarted. I am running Pi-hole on an Odroid with Ubuntu 20.04.
Expected Behaviour:
Pi-hole assigns DHCP leases to connections
Actual Behaviour:
All new connections to the network fail to obtain IP addresses. Existing connections are fine until DHCP leases expires. Setting up a static IP works fine as well. Restarting the DNS resolver solves this but I don't want to keep restarting it everyday.
Your debug log shows that no DHCP server was available at the time of its creation, confirming your observation that Pi-hole isn't answering DHCP broadcasts anymore:
*** [ DIAGNOSING ]: Discovering active DHCP servers (takes 10 seconds)
Scanning all your interfaces for DHCP servers
DHCP packets received on interface wlan0: 0
DHCP packets received on interface eth0: 0
DHCP packets received on interface lo: 0
Let's see if we can extract DHCP related information from your debug logs.
What's the output of:
The netstat output shows that Pi-hole is currently listening on port 67/DHCP, suggesting DHCP is currently operational.
The following may not be related to your DHCP issue, but still I'm curious:
Why does your Pi-hole bind to 192.168.1.101:53 (DNS) explicitly?
You'd normally choose such a configuration in an attempt to allow coexistence of several processes binding port 53 on the same machine, but on different IPs/interfaces.
Are you perhaps running (or trying to run) such additional processes, e.g. additional dnsmasq instances?
That's unexpected, but maybe the log file has just been rotated before.
You could try to search older plain logs as well:
grep "dnsmasq-dhcp" /var/log/pihole.log*
Or, if that also returns nil, search the compressed log files as well:
zgrep "dnsmasq-dhcp" /var/log/pihole.log*
Yet the best time to run the grep as initially provided would probably be next time when you just start to observe DHCP outages again.
I'm not sure, it could have something to do with me installing resolvconf/dhcpcd and uninstalling it at some point. The Pi-hole wasn't resolving external domains, and I mistakenly thought that installing resolvconf would allow me to add DNS addresses to resolv.conf. Turns out NetworkManager and resolvconf just took turns overwriting resolv.conf. Anyway, that should be fixed now.
You're right, grepping it now gives me
I only see the ARP-injection error, though I'm not sure what this means. The system time when this was taken was Sat Dec 4 18:04:17 UTC 2021, so the logs stopped updating at some point.
Off-topic, what's the best practice for pasting log info? I am concerned that I'm revealing personal information in the logs, even though in this case it's just device names.
We'd then need the token to access the files on our servers, where only trusted Pi-hole members can see them, and where they are auto-deleted after 48 hours.
How did you take care of that?
Note that NetworkManager may run and manage its own dnsmasq instance, which may well interfere with Pi-hole's configuration and operation.
I set up a static IP address and DNS servers using the GUI, and uninstalled resolvconf. Before doing that, the Pi-hole will fail to resolve external domain names, presumably because the DNS servers are not set, as resolv.conf only contains 127.0.0.53. Should I disable NetworkManager somehow?
P.S. After uninstalling resolvconf and setting my static IP + DNS servers using the GUI, resolv.conf now shows
# Generated by dhcpcd from wlan0.dhcp
# /etc/resolv.conf.head can replace this line
nameserver 8.8.8.8
nameserver 8.8.4.4
# /etc/resolv.conf.tail can replace this line
Ok so I realised that I actually had dhcpcd installed, and that it was setting my resolv.conf. I uninstalled it, and resolv.conf became
# Generated by NetworkManager
nameserver 127.0.0.53
Googling 127.0.0.53 seems to tell me that NetworkManager is trying to use systemd-resolve for DNS, but that isn't enabled on my system. Even after enabling it, I still get
ping: google.com: Temporary failure in name resolution
when I try to ping google.com. So to avoid the trouble of figuring out the interaction between NetworkManager and systemd-resolve I just disabled systemd-resolve in NetworkManager following https://gist.github.com/zoilomora/f7d264cefbb589f3f1b1fc2cea2c844c. Restarted NetworkManager and everything seems to work now. I'm not sure whether this will resolve the Pi-hole issue though.
Honestly I have no idea what I'm doing and there seem to be so many services trying to do the same thing (DNS resolver). I'm not sure how they interact with each other. I don't think I installed any of them either, so they must have come together with my Ubuntu image.
Starting Nmap 7.80 ( https://nmap.org ) at 2021-12-05 18:31 PST
Nmap scan report for 192.168.1.101
Host is up (0.000087s latency).
PORT STATE SERVICE
67/udp closed dhcps
Nmap done: 1 IP address (1 host up) scanned in 0.76 seconds
Is this expected? Should the port be closed?
Edit: After restarting DNS resolver, the output changes to
Starting Nmap 7.80 ( https://nmap.org ) at 2021-12-05 19:22 PST
Nmap scan report for 192.168.1.101
Host is up.
PORT STATE SERVICE
67/udp open|filtered dhcps
|_dhcp-discover: ERROR: Script execution failed (use -d to debug)
Nmap done: 1 IP address (1 host up) scanned in 2.83 seconds
so I'm guessing the DHCP server is being closed for some reason?
resolvconf is configuring DNS services.
It wouldn't interfere or cause port conflicts with a DHCP server, so there would have been no need to uninstall it.
dhcpcd is a DHCP client.
It is also the standard DHCP client for some Linux distributions like Raspberry Pi OS.
It wouldn't interfere with a DHCP server, so there would have been no need to uninstall it.
So you could rerun the netstat command I've provided, or the shorter, more specific:
sudo netstat -tulpn | grep "Proto\|:67 "
Your debug log suggests you are running a Desktop variant of Ubuntu.
If you wouldn't depend on a graphical UI on your Pi-hole host, running a headless Ubuntu Server edition may avoid potential issues with NetworkManager, as the Server variant would rely on systemd-networkd.
However, we haven't identified yet why your pihole-FTL instance would drop DHCP support suddenly.
(Note that dnsmasq-dhcp is just a prefix within the log files that allows to identify DHCP related entries. It is still pihole-FTL, Pi-hole's optimised dnsmasq version, that would be responsible for supplying DNS as well as DHCP services.)
Hey! Trying this command sudo netstat -tulpn | grep "Proto\|:67 " gives me
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
udp 0 0 0.0.0.0:67 0.0.0.0:* 24404/pihole-FTL
These are the last logs before DHCP stops responding
Dec 8 20:21:00 dnsmasq-dhcp[24404]: DHCPDISCOVER(wlan0) a8:97:cd:8f:81:92
Dec 8 20:21:00 dnsmasq-dhcp[24404]: DHCPOFFER(wlan0) 192.168.1.210 a8:97:cd:8f:81:92
Dec 8 20:21:01 dnsmasq-dhcp[24404]: DHCPDISCOVER(wlan0) a8:97:cd:8f:81:92
Dec 8 20:21:01 dnsmasq-dhcp[24404]: DHCPOFFER(wlan0) 192.168.1.210 a8:97:cd:8f:81:92
Dec 8 20:21:01 dnsmasq[24404]: query[A] mail.google.com from 192.168.1.251
Dec 8 20:21:01 dnsmasq[24404]: cached mail.google.com is <CNAME>
Dec 8 20:21:01 dnsmasq[24404]: cached googlemail.l.google.com is 172.217.6.37
Dec 8 20:21:06 dnsmasq[24404]: query[A] daisy.ubuntu.com from 127.0.0.1
Dec 8 20:21:06 dnsmasq[24404]: forwarded daisy.ubuntu.com to 8.8.8.8
Dec 8 20:21:06 dnsmasq[24404]: reply daisy.ubuntu.com is 162.213.33.132
Dec 8 20:21:06 dnsmasq[24404]: reply daisy.ubuntu.com is 162.213.33.108
I've noticed that 90% of the dhcp logs are repeats of these two lines
Dec 8 20:21:01 dnsmasq-dhcp[24404]: DHCPDISCOVER(wlan0) a8:97:cd:8f:81:92
Dec 8 20:21:01 dnsmasq-dhcp[24404]: DHCPOFFER(wlan0) 192.168.1.210 a8:97:cd:8f:81:92
Is this normal? Is there some device on the network that is constantly failing to be assigned an IP address? I wonder if it's related to why the DHCP stops working.
pihole-FTL.log from the same timeframe
[2021-12-08 19:55:01.678 24404M] INFO: FTL is running as user pihole (UID 998)
[2021-12-08 19:55:01.679 24404M] Reloading DNS cache
[2021-12-08 19:55:01.925 24404M] Blocking status is enabled
[2021-12-08 19:55:02.696 24404/T24408] Compiled 0 whitelist and 0 blacklist regex filters for 12 clients in 0.4 msec
[2021-12-08 20:00:26.743 24404M] Resizing "FTL-dns-cache" from 4096 to (512 * 16) == 8192 (/dev/shm: 4.5MB used, 1.0GB total, FTL uses 4.5MB)
[2021-12-08 20:25:27.579 24404M] Resizing "FTL-dns-cache" from 8192 to (768 * 16) == 12288 (/dev/shm: 4.5MB used, 1.0GB total, FTL uses 4.5MB)
There is no indication why DHCP just stops working. Is there some kind of error log I could look into? Thanks for the help so far!