I recently switched to using Pi-hole as my DHCP server. For the most part it's working fine, but occasionally some devices will have an hour or two where they just won't connect to wifi. This has happened with a MacBook Air and an iPhone SE 2020 (both on the latest OS), but other devices are fine (Windows/other). I have also set the DHCP lease to infinite, but the iOS devices are renewing multiple times per day under a 90-day lease time.
I would expect that there is no downtime when connecting, but it seems like randomly it's happening with these devices when renewing the DHCP lease.
Actual Behavior:
I'm getting connectivity issues on iOS devices when the DHCP lease goes through renewal (I believe... it's correlated, but I don't know for sure that's the main cause). During this time I cannot access anything on the internet, but I CAN access everything on my internal network aside from my Pi. For example, Pi-hole is on 192.168.1.18; during these connectivity issue periods I can access 192.168.1.1 (router), 192.168.1.2 (NAS), etc., but I CANNOT access 192.168.1.18 or the internet until the connectivity issues resolve, which sometimes takes an hour+.
One solution I had was to set the DHCP lease to infinite, but for some reason it looks like the iOS devices only allow a 90-day maximum lease. Plus, my phone seems to renew the lease much more frequently (multiple times per day), so the lease time is always 89 days, 23 hours, XX minutes. Checking the lease time after these connectivity issues occur seems to show that the DHCP renewal and the problems are at least aligned.
I doubt that's the case.
iOS implies that those devices would connect via wifi.
A client releases its DHCP lease when it disconnects from a network, which is just what a wifi device can be expected to do for various reasons (e.g. switching to mobile carrier data when reception is weak, switching bands when your 5GHz band is getting stronger/weaker than your 2.4GHz band, or going into a deep power saving mode, ...).
When it would reconnect to the same wifi network, it would have to acquire a new lease.
Your debug log shows no signs of outages.
Your observation may indicate that your Pi-hole host machine isn't reachable, which could happen e.g. if its wifi connection went down.
If it isn't reachable, your clients would not be able to resolve DNS requests, which would look like an internet outage, but access by IP would still work.
During your observations, did you verify whether DNS resolution works through a public resolver?
Also, did you verify if all clients would be affected by that outage?
Or would that perhaps just be your wifi clients?
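If it helps, the public-resolver check could be scripted roughly like this. It is only a sketch using Python's standard library: `build_query` and `resolver_answers` are names made up for this example, and the check only confirms that some DNS response comes back from the given server, not that the answer is correct.

```python
import socket
import struct


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record (hypothetical helper)."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label is prefixed by its length, ended by a zero byte.
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(p)]) + p.encode("ascii") for p in labels) + b"\x00"
    # QTYPE 1 = A record, QCLASS 1 = IN.
    return header + qname + struct.pack("!HH", 1, 1)


def resolver_answers(server: str, name: str = "example.com",
                     timeout: float = 2.0) -> bool:
    """Return True if `server` sends back any DNS response within `timeout`."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # includes timeouts and unreachable networks
            return False
    # A response echoes the query ID and has the QR (response) bit set.
    return len(reply) >= 12 and reply[:2] == query[:2] and bool(reply[2] & 0x80)
```

During an outage, `resolver_answers("192.168.1.18")` returning `False` while `resolver_answers("8.8.8.8")` returns `True` would point at Pi-hole being unreachable from that client rather than at the internet connection itself.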
On occasion, routers may have to release wifi frequencies for priority usage, e.g. for a weather radar. They'd usually drop clients and/or steer them to a different wifi channel, but depending on your router's and client's capabilities, that may not always result in a seamless reconnect.
However, such priority events in my vicinity never seem to exceed a duration of 5 minutes (certainly never an hour).
I am able to access the router itself during these outages...I can confirm the wifi is still active. I'm also able to access the internet on other devices at the same time, so it's not a pihole <> internet/router issue. I'll try to access a public DNS next time it goes down.
Router is a Netgear R7800 if that helps, running the current Netgear software.
I can confirm that NOT all clients are affected by this outage, only iOS devices, and it's per device. If I turn on the device's private MAC address (essentially, change the MAC address), it will connect (for a brief period). If I turn off the MAC privacy (going back to the device's original MAC), I will be unable to connect again.
To help with further analysis, during a quiet time on your network, could you connect an offending iOS device and another working wifi device to your network, making sure both of them connect through the same network equipment (e.g. router or wifi access point), preferably being the only clients?
You should then extract the DHCP communications during that time frame from your logs, e.g. by running:
Looks like I didn't have to wait long. It is currently occurring.
Currently my computer is connected. My phone was not connecting for about 10 minutes, so I was able to capture the log while the issue was occurring and after.
A similar issue just happened with one of my Windows machines: it said the DNS could not be found, but I was able to ping 8.8.8.8 successfully and access devices on my local network (other than my Pi).
Seems you've also included grep results outside of your observation's time frame.
I've assumed that the relevant time frame is near the end of the log files.
A complete successful DHCP negotiation would look as follows:
a.) client broadcasts a DHCPDISCOVER to all potential DHCP servers on the link
b.) Pi-hole sends a DHCPOFFER with an IP address to the client
c.) client sends a DHCPREQUEST for that IP to Pi-hole
d.) Pi-hole confirms the IP assignment by sending a DHCPACK to the client
A client trying to renew its existing lease would directly send a DHCPREQUEST to the DHCP server it acquired the lease through, so a successful renewal sequence would show up as c.) and d.) only.
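To make the two sequences concrete, here is a minimal Python sketch that labels the ordered DHCP message types seen for a single client. The function name `classify_exchange` and the labels it returns are invented for illustration; nothing like it ships with Pi-hole or dnsmasq.

```python
# Message order for a fresh lease: steps a.) to d.) above.
FULL_NEGOTIATION = ["DHCPDISCOVER", "DHCPOFFER", "DHCPREQUEST", "DHCPACK"]
# Message order for a renewal: steps c.) and d.) only.
RENEWAL = ["DHCPREQUEST", "DHCPACK"]


def classify_exchange(messages):
    """Label one client's ordered DHCP message types (hypothetical helper)."""
    if messages == FULL_NEGOTIATION:
        return "full negotiation"
    if messages == RENEWAL:
        return "renewal"
    # Anything else, e.g. a DHCPREQUEST that never gets a DHCPACK.
    return "incomplete or unexpected"
```

For example, `classify_exchange(["DHCPREQUEST", "DHCPACK"])` yields `"renewal"`, while a lone `["DHCPREQUEST"]` is reported as incomplete.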
Your logs show that your Pi-hole has received and answered your client's DHCP requests as expected:
But your client does not stop its requests, even after such a successful sequence.
That would suggest that Pi-hole's replies are either not received or ignored by your client.
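One way to spot that pattern across a longer log is to count, per MAC address, how many DHCPREQUESTs arrive after the client has already been sent a DHCPACK. The sketch below assumes log lines shaped like `... dnsmasq-dhcp[123]: DHCPREQUEST(eth0) 192.168.1.43 aa:bb:cc:dd:ee:01`; that layout is an assumption about typical dnsmasq output, so the pattern may need adjusting for your log. `requests_after_ack` is a made-up helper name.

```python
import re
from collections import Counter

# Assumed line shape: MESSAGE(interface) [ip] mac [hostname]
DHCP_LINE = re.compile(
    r"(DHCP[A-Z]+)\([^)]*\)\s+"                 # message type and interface
    r"(?:\d{1,3}(?:\.\d{1,3}){3}\s+)?"          # optional IPv4 address
    r"((?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2})"   # client MAC address
)


def requests_after_ack(lines):
    """Count DHCPREQUESTs each MAC sent after it was already ACKed."""
    acked = set()
    repeats = Counter()
    for line in lines:
        match = DHCP_LINE.search(line)
        if not match:
            continue  # not a DHCP line in the assumed format
        kind, mac = match.group(1), match.group(2).lower()
        if kind == "DHCPACK":
            acked.add(mac)
        elif kind == "DHCPREQUEST" and mac in acked:
            repeats[mac] += 1
    return dict(repeats)
```

A healthy client will also show up here every time it renews normally, so only counts that are high for the time span covered by the log are suspicious.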
How are those clients connected to your network:
Directly through your router or through some access points?
Would you perhaps employ some firewall rules on any of that network equipment that would drop traffic or otherwise interfere with DHCP replies?
The client is an iPhone SE 2020, so it would be connected via wifi directly to the router. No firewall rules that would be interfering, everything is occurring behind the firewall.
Your client being an iPhone SE 2020 wouldn't preclude it from connecting through other network equipment like access points, switches or mesh nodes.
Do you run any such network equipment for your network, and did your clients connect through that network equipment when you observed those DHCP issues?
If so, chances are that network equipment could be involved.
(That's the reason why I asked whether perhaps only wifi clients would be affected.)
Perhaps, but there's too little information to confirm or reject that.
Can you tell whether a client is connected through that AP when you observe those DHCP issues?
If not, can you exclude the AP, e.g. by disallowing your problematic devices to use it?
Or would it be possible to refrain from using it for a while, to preclude it would be involved?
I still seem to be having connection issues, but they seem to resolve faster now (maybe 5-10 min rather than hours). From what I gather, this might not be a Pi-hole issue and I'd be fine to investigate the router path, but I didn't notice this back when I had my router set up as the DHCP server. Any thoughts on what might be impacting this from the Pi-hole side, or if there's any debugging I can do there?
There are at least three parties involved here: pihole-FTL/dnsmasq providing the DHCP server, a client requesting/renewing a DHCP lease, and a router (plus any additional network equipment) transporting broadcasts and unicast packets alike between Pi-hole and the client.
Any of those could contribute to your observation.
We've seen from the Pi-hole logs that Pi-hole is responding to a client's requests as expected.
What we can't be sure of is whether Pi-hole's replies make it to the client (a router or AP may drop or lose the packets e.g. if they wouldn't be aware of an on-link client, or because there is too much traffic) or whether the client is receiving them, and if it receives them, if it would simply ignore them or somehow disagree with the reply.
From your description, it would seem that your AP contributes to your observation, but at least some clients still would be stuck in repeating the DHCP request/renewal cycle.
Interestingly enough, the majority of those 592 lines from your log can be attributed to randomised MAC addresses:

| MAC | #lines | IP offered | hostname |
|---|---|---|---|
| 26:4e:<...> | 82 | .1.43 | |
| 52:2b:<...> | 38 | .1.103 | |
| 8e:e5:<...> | 26 | .1.254 | |
| 72:d6:<...> | 144 | .1.47 | |
| 76:c2:<...> | 46 | .1.89 | |
| 92:f4:<...> | 90 | .1.46 | |
| ba:19:<...> | 44 | .1.121 | |
| total | 470 | | |
| e8:7f:<...> | 32 | .1.4 | AllisonLaptop |
| 5c:0c:<...> | 8 | .1.37 | MattDesktop |
| 3c:22:<...> | 4 | .1.209 | NintendoSwitch |
| 40:8d:<...> | 2 | .1.71 | Roku |
| 84:ea:<...> | 2 | .1.7 | AFSLaptop |
| 8c:17:<...> | 72 | .1.48 | PocketComputer |
| f4:ce:<...> | 2 | .1.3 | Batmobile |
| total | 122 | | |
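For reference, the split between randomised and regular MAC addresses can be reproduced with a small Python sketch. It relies on the locally administered bit (`0x02` in the first octet), which private/randomised addresses normally have set; treat that as a heuristic, since an address can also be locally administered for other reasons. The helper names are made up, and the prefixes and line counts are copied from the figures above.

```python
def is_locally_administered(mac: str) -> bool:
    """True if the locally administered bit of the first octet is set."""
    first_octet = int(mac.split(":")[0], 16)
    return bool(first_octet & 0x02)


def split_line_counts(line_counts):
    """Return (lines from randomised MACs, lines from all other MACs)."""
    randomised = sum(
        count for mac, count in line_counts.items()
        if is_locally_administered(mac)
    )
    return randomised, sum(line_counts.values()) - randomised


# MAC prefixes and log line counts as listed above (addresses are truncated
# in the source, which is fine because only the first octet is inspected).
LINE_COUNTS = {
    "26:4e": 82, "52:2b": 38, "8e:e5": 26, "72:d6": 144,
    "76:c2": 46, "92:f4": 90, "ba:19": 44,
    "e8:7f": 32, "5c:0c": 8, "3c:22": 4, "40:8d": 2,
    "84:ea": 2, "8c:17": 72, "f4:ce": 2,
}

print(split_line_counts(LINE_COUNTS))  # prints (470, 122)
```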
Getting to the bottom of this could be challenging, as you'd have to start inspecting things on the other end of the connection, i.e. you'd need to analyse the packets that are received by a client.
On a PC, you would install some packet capturing software (e.g. Wireshark).
However, I am unaware how to achieve that on a smartphone - and those seem to be the devices exhibiting offending behaviour in your case.
Alternatively, your router may offer tools to capture packet traffic, allowing you to download a packet log that then can be analysed with tools like Wireshark.
And since the primary offender is a phone, I think the easier approach would be to drop the pihole as DHCP and go back to my router. I haven't noticed the randomized MAC addresses as causing the problem (there are some devices with randomized MACs that don't have this issue). The AP did seem to contribute to the problem, but it wasn't the source...I'm thinking at the end of the day my router just doesn't play nice with another DHCP server on the network, so I'll try this again if I get a new router.
Yeah, these kinds of issues are not easy to analyse, unfortunately.
Only packet capturing tools could confirm or reject if your router would be involved.
If it is indeed your client, then swapping routers may have no effect.
And if it would be your client dropping or ignoring packets despite receiving them, then that would still leave us with the question why it would do so: Is the client software not recognising the packets? Or do the packets contain some incorrect detail that would require client software to discard it? And if so, have the packets been mangled by your router, or is dnsmasq sending them as received?
If having your router as DHCP server resolves the issue for you, it's probably a good idea to stick with it.
Was there a specific goal you were trying to achieve by switching DHCP duties to Pi-hole?
I was trying to maintain the per-device tracking that's not available on the router, so not much is lost there. When the router is the DHCP all requests come from 192.168.1.1.
So far with the reversal it seems to be working fine, so I'll keep it that way going forward.