Pi-hole DHCP server intermittently unable to assign IP addresses until DNS resolver is restarted

I have disabled DHCP on my router and I'm using Pi-hole's DHCP server. It works fine for less than a day, but at some point it will fail to assign IP addresses until the DNS resolver is restarted. I am running Pi-hole on an Odroid with Ubuntu 20.04.

Expected Behaviour:

Pi-hole assigns DHCP leases to connections

Actual Behaviour:

All new connections to the network fail to obtain IP addresses. Existing connections are fine until DHCP leases expires. Setting up a static IP works fine as well. Restarting the DNS resolver solves this but I don't want to keep restarting it everyday.

Debug Token:

https://tricorder.pi-hole.net/0CRxPe4h/
This was taken after the IP addresses assignment starts failing.

Your debug log shows that no DHCP server was available at the time of its creation, confirming your observation that Pi-hole isn't answering DHCP broadcasts anymore:

*** [ DIAGNOSING ]: Discovering active DHCP servers (takes 10 seconds)
   Scanning all your interfaces for DHCP servers
   
   DHCP packets received on interface wlan0: 0
   DHCP packets received on interface eth0: 0
   DHCP packets received on interface lo: 0

Let's see if we can extract DHCP related information from your debug logs.
What's the output of:

grep "dnsmasq-dhcp" /var/log/pihole.log

And what does the following command return:

sudo netstat -tulpn | grep "Proto\|:53 \|:67 \|:80 \|:4711 "

Thanks for helping!

grep "dnsmasq-dhcp" /var/log/pihole.log returns nothing.

sudo netstat -tulpn | grep "Proto\|:53 \|:67 \|:80 \|:4711 " returns

Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name                       
tcp        0      0 127.0.0.1:4711          0.0.0.0:*               LISTEN      30583/pihole-FTL                       
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      1010/lighttpd                          
tcp        0      0 127.0.0.1:53            0.0.0.0:*               LISTEN      30583/pihole-FTL                       
tcp        0      0 192.168.1.101:53        0.0.0.0:*               LISTEN      30583/pihole-FTL                       
tcp6       0      0 ::1:4711                :::*                    LISTEN      30583/pihole-FTL                       
tcp6       0      0 :::80                   :::*                    LISTEN      1010/lighttpd                          
tcp6       0      0 ::1:53                  :::*                    LISTEN      30583/pihole-FTL                       
udp        0      0 127.0.0.1:53            0.0.0.0:*                           30583/pihole-FTL                       
udp        0      0 192.168.1.101:53        0.0.0.0:*                           30583/pihole-FTL                       
udp        0      0 0.0.0.0:67              0.0.0.0:*                           30583/pihole-FTL                       
udp6       0      0 ::1:53                  :::*                                30583/pihole-FTL 

The netstat output shows that Pi-hole is currently listening on port 67/DHCP, suggesting DHCP is currently operational.

The following may not be related to your DHCP issue, but still I'm curious:
Why does your Pi-hole bind to 192.168.1.101:53 (DNS) explicitly?

You'd normally choose such a configuration in an attempt to allow coexistence of several processes binding port 53 on the same machine, but on different IPs/interfaces.

Are you perhaps running (or trying to run) such additional processes, e.g. additional dnsmasq instances?

That's unexpected, but maybe the log file has just been rotated before.

You could try to search older plain logs as well:

grep "dnsmasq-dhcp" /var/log/pihole.log*

Or, if that also returns nil, search the compressed log files as well:

zgrep "dnsmasq-dhcp" /var/log/pihole.log*

Yet the best time to run the grep as initially provided would probably be next time when you just start to observe DHCP outages again.

I'm not sure, it could have something to do with me installing resolvconf/dhcpcd and uninstalling it at some point. The Pi-hole wasn't resolving external domains, and I mistakenly thought that installing resolvconf would allow me to add DNS addresses to resolv.conf. Turns out NetworkManager and resolvconf just took turns overwriting resolv.conf. Anyway, that should be fixed now.

You're right, grepping it now gives me

I only see the ARP-injection error, though I'm not sure what this means. The system time when this was taken was Sat Dec 4 18:04:17 UTC 2021, so the logs stopped updating at some point.

Off-topic, what's the best practice for pasting log info? I am concerned that I'm revealing personal information in the logs, even though in this case it's just device names.

If files get too large or if they contain sensitive information, you could share any file output by uploading it via Pi-hole's tricorder, e.g.

cat /var/log/pihole-FTL.log | pihole tricorder

or in your case

grep "dnsmasq-dhcp" /var/log/pihole.log | pihole tricorder

(See also How do I debug my Pi-hole installation? for other ways.)

We'd then need the token to access the files on our servers, where only trusted Pi-hole members can see them, and where they are auto-deleted after 48 hours.

How did you take care of that?

Note that NetworkManager may run and manage its own dnsmasq instance, which may well interfere with Pi-hole's configuration and operation.

I set up a static IP address and DNS servers using the GUI, and uninstalled resolvconf. Before doing that, the Pi-hole will fail to resolve external domain names, presumably because the DNS servers are not set, as resolv.conf only contains 127.0.0.53. Should I disable NetworkManager somehow?

P.S. After uninstalling resolvconf and setting my static IP + DNS servers using the GUI, resolv.conf now shows

# Generated by dhcpcd from wlan0.dhcp                                                                                  
# /etc/resolv.conf.head can replace this line                                                                           
nameserver 8.8.8.8                                                                                                      
nameserver 8.8.4.4                                                                                                      
# /etc/resolv.conf.tail can replace this line 

Ok so I realised that I actually had dhcpcd installed, and that it was setting my resolv.conf. I uninstalled it, and resolv.conf became

# Generated by NetworkManager
nameserver 127.0.0.53

Googling 127.0.0.53 seems to tell me that NetworkManager is trying to use systemd-resolve for DNS, but that isn't enabled on my system. Even after enabling it, I still get

ping: google.com: Temporary failure in name resolution 

when I try to ping google.com. So to avoid the trouble of figuring out the interaction between NetworkManager and systemd-resolve I just disabled systemd-resolve in NetworkManager following https://gist.github.com/zoilomora/f7d264cefbb589f3f1b1fc2cea2c844c. Restarted NetworkManager and everything seems to work now. I'm not sure whether this will resolve the Pi-hole issue though.

Honestly I have no idea what I'm doing and there seem to be so many services trying to do the same thing (DNS resolver). I'm not sure how they interact with each other. I don't think I installed any of them either, so they must have come together with my Ubuntu image.

The issue is still happening, here's a new debug token: https://tricorder.pi-hole.net/9wCrhFvx/

I assume dnsmasq-dhcp is being stopped seemingly randomly because there are no logs for it after a certain time.

How do I check whether dnsmasq-dhcp is actually up and whether it's being stopped somehow?

Running sudo nmap -sU -p 67 --script=dhcp-discover 192.168.1.101 gives

Starting Nmap 7.80 ( https://nmap.org ) at 2021-12-05 18:31 PST                                                         
Nmap scan report for 192.168.1.101                                                                                      
Host is up (0.000087s latency).                                                                                                                                                                                                                 
PORT   STATE  SERVICE                                                                                                   
67/udp closed dhcps                                                                                                                                                                                                                             
Nmap done: 1 IP address (1 host up) scanned in 0.76 seconds

Is this expected? Should the port be closed?

Edit: After restarting DNS resolver, the output changes to

Starting Nmap 7.80 ( https://nmap.org ) at 2021-12-05 19:22 PST                                                         
Nmap scan report for 192.168.1.101                                                                                      
Host is up.                                                                                                                                                                                                                                     
PORT   STATE         SERVICE                                                                                            
67/udp open|filtered dhcps                                                                                              
|_dhcp-discover: ERROR: Script execution failed (use -d to debug)                                                                                                                                                                               
Nmap done: 1 IP address (1 host up) scanned in 2.83 seconds 

so I'm guessing the DHCP server is being closed for some reason?

resolvconf is configuring DNS services.
It wouldn't interfere or cause port conflicts with a DHCP server, so there would have been no need to uninstall it.

dhcpcd is a DHCP client.
It is also the standard DHCP client for some Linux distributions like Raspberry Pi OS.
It wouldn't interfere with a DHCP server, so there would have been no need to uninstall it.

So you could rerun the netstat command I've provided, or the shorter, more specific:

sudo netstat -tulpn | grep "Proto\|:67 "

Your debug log suggests you are running a Desktop variant of Ubuntu.
If you wouldn't depend on a graphical UI on your Pi-hole host, running a headless Ubuntu Server edition may avoid potential issues with NetworkManager, as the Server variant would rely on systemd-networkd.

However, we haven't identified yet why your pihole-FTL instance would drop DHCP support suddenly.
(Note that dnsmasq-dhcp is just a prefix within the log files that allows to identify DHCP related entries. It is still pihole-FTL, Pi-hole's optimised dnsmasq version, that would be responsible for supplying DNS as well as DHCP services.)

Hey! Trying this command sudo netstat -tulpn | grep "Proto\|:67 " gives me

Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name                        
udp        0      0 0.0.0.0:67              0.0.0.0:*                           24404/pihole-FTL 

These are the last logs before DHCP stops responding

Dec  8 20:21:00 dnsmasq-dhcp[24404]: DHCPDISCOVER(wlan0) a8:97:cd:8f:81:92                                              
Dec  8 20:21:00 dnsmasq-dhcp[24404]: DHCPOFFER(wlan0) 192.168.1.210 a8:97:cd:8f:81:92                                   
Dec  8 20:21:01 dnsmasq-dhcp[24404]: DHCPDISCOVER(wlan0) a8:97:cd:8f:81:92                                              
Dec  8 20:21:01 dnsmasq-dhcp[24404]: DHCPOFFER(wlan0) 192.168.1.210 a8:97:cd:8f:81:92                                   
Dec  8 20:21:01 dnsmasq[24404]: query[A] mail.google.com from 192.168.1.251                                             
Dec  8 20:21:01 dnsmasq[24404]: cached mail.google.com is <CNAME>                                                       
Dec  8 20:21:01 dnsmasq[24404]: cached googlemail.l.google.com is 172.217.6.37                                          
Dec  8 20:21:06 dnsmasq[24404]: query[A] daisy.ubuntu.com from 127.0.0.1                                                
Dec  8 20:21:06 dnsmasq[24404]: forwarded daisy.ubuntu.com to 8.8.8.8                                                   
Dec  8 20:21:06 dnsmasq[24404]: reply daisy.ubuntu.com is 162.213.33.132                                                
Dec  8 20:21:06 dnsmasq[24404]: reply daisy.ubuntu.com is 162.213.33.108 

I've noticed that 90% of the dhcp logs are repeats of these two lines

Dec  8 20:21:01 dnsmasq-dhcp[24404]: DHCPDISCOVER(wlan0) a8:97:cd:8f:81:92                                              
Dec  8 20:21:01 dnsmasq-dhcp[24404]: DHCPOFFER(wlan0) 192.168.1.210 a8:97:cd:8f:81:92   

Is this normal? Is there some device on the network that is constantly failing to be assigned an IP address? I wonder if it's related to why the DHCP stops working.

pihole-FTL.log from the same timeframe

[2021-12-08 19:55:01.678 24404M] INFO: FTL is running as user pihole (UID 998)                                                                                                                                                        
[2021-12-08 19:55:01.679 24404M] Reloading DNS cache                                                                                                                                                                                  
[2021-12-08 19:55:01.925 24404M] Blocking status is enabled                                                                                                                                                                           
[2021-12-08 19:55:02.696 24404/T24408] Compiled 0 whitelist and 0 blacklist regex filters for 12 clients in 0.4 msec                                                                                                                  
[2021-12-08 20:00:26.743 24404M] Resizing "FTL-dns-cache" from 4096 to (512 * 16) == 8192 (/dev/shm: 4.5MB used, 1.0GB total, FTL uses 4.5MB)                                                                                         
[2021-12-08 20:25:27.579 24404M] Resizing "FTL-dns-cache" from 8192 to (768 * 16) == 12288 (/dev/shm: 4.5MB used, 1.0GB total, FTL uses 4.5MB)      

There is no indication why DHCP just stops working. Is there some kind of error log I could look into? Thanks for the help so far!

Thank you for the help! In the end I got around it by running a cron script to detect if the DHCP server if down, and restart pihole-FTL if so.

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.