Google Wifi w/pihole DHCP problems (narrowed down to pihole-FTL pegged at 99%)

Expected Behaviour:

  • Configure Google Wifi DHCP range to a single IP x.x.x.10 to x.x.x.10
  • Reserve that ip for pihole
  • Configure Google Wifi Custom DNS with ip of pihole.
  • Enable DHCP on pihole with non-overlapping range x.x.x.20 to x.x.x.250
  • Reboot Google Wifi Access Points (4 total on my network)
  • Expect the Google Wifi Access Points to obtain IP from pihole DHCP server
  • Expect wireless devices on network to obtain IP from pihole DHCP server
  • Expect wired devices on network to obtain IP from pihole DHCP server

Actual Behaviour:

  • Wired devices successfully obtain IP from pihole DHCP server
  • Wireless devices (including the 3 non-master Google Wifi access points) fail to obtain IP

Debug Token:

x4b5d0rogm

Things I have noticed in my troubleshooting so far.

  • I disabled wlan0 interface on the Pi Z as I have a usb ethernet interface active. I was seeing some DHCPDISCOVER(wlan0) messages and the "Pi-hole Ethernet Interface" still showed wlan0. Ran pihole -r to get it to eth0.
  • no iptables rules blocking any of the required ports

I'm at my wits' end. I know there are plenty of Google Wifi posts. I've searched through them here and on reddit for clues, but I can't figure out why IPs are not getting assigned.

A solution might be to:

  1. Connect both the wired and WiFi interfaces on the Pi-hole host to the network;
  2. Create a new reboot-persistent bridge interface on the Pi-hole host, with a static IP address in your subnet;
  3. Add the Pi-hole wired and WiFi interfaces (with no IP assigned/configured) to the bridge;
  4. Run pihole -r (reconfigure) so that Pi-hole listens on the bridge IP, which has a leg in both the wired and wireless network segments;
  5. Disable all DHCP services except the Pi-hole one.

EDIT: changed order

I could give the bridge idea a go if I really needed wifi on the PiZ, but I'm fine leaving it off. I wouldn't expect it being disabled to interfere with DHCP.

Sadly we can't turn off DHCP on Google wifi :confused:

Went back to square one and set up the PiZ from scratch (thank god for ansible :slight_smile: )

This time I had much better success. I am able to get 99% of my clients online and dnsmasq handing out ips as expected for wired and wireless clients.

There are a few clients that have troubles getting an IP still, specifically android phones and FireTV stick. Sometimes they get an IP very quickly, sometimes it takes a good minute to get an IP, sometimes they fail to get an IP. If I "forget" the network on those devices and re-connect they seem to get an IP most of the time.

Getting closer, but sadly I don't know what changed that fixed most of this (since it's an ansible playbook the setup is identical).

Anyone have any issues with android os specifically?

Well, that was short-lived. I figured out what was causing my problems: pihole-FTL was pegged at 99% CPU, bringing the PiZ to its knees. DNS resolutions were taking >5s, and DHCP renews would fail >90% of the time.

Rebooted the PiZ, pihole-FTL shot right back up.

Had to disable pihole's DHCP, immediately pihole-FTL dropped to 1%.

New token: b9bqo193qb!

Anyone have clues why FTL shoots up with DHCP enabled?

I can reproduce the 99% spike by just enabling Conditional Forwarding. Something fishy going on with FTL.

Most probably you've created a loop somewhere.

Any pointers on where to check?

Config as I know it:

  • Google Wifi DNS custom 1: x.x.x.10 (pihole LAN address)
  • pihole settings screen for DNS has IPv4 primary and secondary checked for Cloudflare
  • /etc/resolv.conf single entry for "nameserver 127.0.0.1"
  • /etc/dnsmasq.d/01-pihole.conf:
addn-hosts=/etc/pihole/gravity.list
addn-hosts=/etc/pihole/black.list
addn-hosts=/etc/pihole/local.list
localise-queries
no-resolv
cache-size=10000
log-queries
log-facility=/var/log/pihole.log
local-ttl=2
log-async
server=1.1.1.1
server=1.0.0.1
domain-needed
bogus-priv
dnssec
trust-anchor=<redacted>
trust-anchor=<redacted>
local-service
  • /etc/dnsmasq.d/03-custom.conf:
server=/pihole.lan/192.168.86.10

edit: added more details

Try setting the Google WiFi upstream DNS setting to default or another DNS IP (not Pi-hole).

See if it works without the server=/pihole.lan/192.168.86.10 line in /etc/dnsmasq.d/03-custom.conf.
EDIT: this is probably the loop.

I tried removing that line and setting Google Wifi upstream DNS to 1.1.1.1.

Still, as soon as I enabled the pihole DHCP server pihole-FTL spiked.

See if DNSSEC is giving you troubles.
Select non-DNSSEC upstream DNS servers for diagnosing.

I think you're on to something! Switched pihole DNS to use 9.9.9.10 (and disabled DNSSEC option) and when a host renews DHCP pihole-FTL hits 80-90% but only for a few seconds, and the host gets an IP assigned.

edit: pressed my luck and tried to switch more hosts over, after a handful of DHCP renews pihole-FTL stayed at 99%

Sweet.
But I still believe you've got to get rid of this one too:

server=/pihole.lan/192.168.86.10

If 192.168.86.10 is Pi-hole, you're creating a loop:

pi@noads:~ $ man dnsmasq

       -S, --local,
       --server=[/[<domain>]/[domain/]][<ipaddr>[#<port>][@<source-ip>|<interface>[#<port>]]
              Specify IP address of  upstream  servers  directly.
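For anyone who wants to check their own setup for this kind of self-forwarding entry, here is a rough sketch. It only does a textual match on server=/local= lines, so it will not catch a loop that goes through the router. The function name find_self_forwards is made up for this example, and the Pi-hole address is whatever you pass as the first argument.

```shell
# Hypothetical helper: list server=/local= lines whose upstream address
# is the Pi-hole's own IP (a likely forwarding loop).
# Usage: find_self_forwards <pihole-ip> <config-file>...
find_self_forwards() {
  ip=$1
  shift
  LC_ALL=C awk -v ip="$ip" '
    BEGIN { FS = "=" }
    $1 == "server" || $1 == "local" {
      # The upstream address is the last "/"-separated part of the value.
      n = split($2, parts, "/")
      target = parts[n]
      # Drop an optional #port or @source suffix before comparing.
      sub(/[#@].*/, "", target)
      if (target == ip) print FILENAME ": " $0
    }' "$@"
}
```

Something like find_self_forwards 192.168.86.10 /etc/dnsmasq.d/*.conf would then print the offending file and line, if there is one.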

yup I got rid of that already.


Did some more digging. Started with a fresh pihole -r and flipped one setting on at a time.

I have it working pretty stable with DNSSEC and 1.1.1.1/1.0.0.1 upstream DNS servers.
The dramatic jump in pihole-FTL CPU usage seems to be directly related to the number of enabled blocklists. If I only enable the default lists (~113k domains), the CPU spike for pihole-FTL only lasts a couple of seconds.

If I enable all of these: https://v.firebog.net/hosts/lists.php?type=tick (~800k domains) then it gets into trouble. As soon as a DHCP renew request comes in it spikes and fails to assign IP.

Seems like I am hitting a performance bottleneck. Are there any recommended CPU benchmarks or test cases for blocklist size? I might need to retire the idea of using a PiZ.

[2019-05-20 18:31:03.931 2140] /etc/pihole/gravity.list: parsed 818769 domains (took 14831.3 ms)

vs.

[2019-05-20 18:37:39.204 2140] /etc/pihole/gravity.list: parsed 113137 domains (took 1166.8 ms)
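To put the two log lines above side by side, a small sketch that turns each "parsed N domains (took X ms)" line into a per-domain cost. The function name parse_cost is made up for this example; it just reads log lines on stdin.

```shell
# Hypothetical helper: for each "parsed N domains (took X ms)" line on stdin,
# print the domain count, the total time, and microseconds per domain.
parse_cost() {
  LC_ALL=C awk '/parsed [0-9]+ domains/ {
    for (i = 1; i < NF; i++) {
      if ($i == "parsed") domains = $(i + 1)
      if ($i == "(took")  ms = $(i + 1)
    }
    if (domains > 0)
      printf "%d domains, %.1f ms, %.2f us/domain\n", domains, ms, ms * 1000 / domains
  }'
}
```

Fed the two lines above, this gives roughly 18.11 us/domain for the 818769-domain list versus 10.31 us/domain for the 113137-domain list. So the list is about 7.2x larger but takes about 12.7x longer to parse, which hints that the cost is not growing purely linearly on this hardware. To run it against a live system, something like grep gravity.list /var/log/pihole-FTL.log | parse_cost should work, assuming that is where the FTL log lives on your install.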

It should be able to handle that many domains. But other things take up CPU resources. What is the output of these commands?

echo ">stats" | nc localhost 4711

free

$ echo ">stats" | nc localhost 4711
domains_being_blocked 113137
dns_queries_today 51056
ads_blocked_today 7333
ads_percentage_today 14.362660
unique_domains 1989
queries_forwarded 30778
queries_cached 12798
clients_ever_seen 62
unique_clients 60
dns_queries_all_types 51056
reply_NODATA 35
reply_NXDOMAIN 218
reply_CNAME 1119
reply_IP 1036
privacy_level 0
status enabled
---EOM---
$ free
              total        used        free      shared  buff/cache   available
Mem:         443132      126228       89112       13776      227792      252948
Swap:        614396           0      614396
$ zramctl
NAME       ALGORITHM DISKSIZE  DATA COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram0 lz4           600M    4K   76B    4K       1 [SWAP]
/dev/zram1 lz4           150M 45.3M  7.7M  8.2M       1 /opt/zram/zram1

Added zramctl output to show where some of the physical memory got mapped. swap and /var/log are zram'd

Here's the output with all the blocklists enabled from above

$ echo ">stats" | nc localhost 4711
domains_being_blocked 818861
dns_queries_today 51586
ads_blocked_today 7361
ads_percentage_today 14.269376
unique_domains 1993
queries_forwarded 31100
queries_cached 12970
clients_ever_seen 63
unique_clients 61
dns_queries_all_types 51586
reply_NODATA 55
reply_NXDOMAIN 230
reply_CNAME 1359
reply_IP 1270
privacy_level 0
status enabled
---EOM---
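For comparing two snapshots like the ones above without eyeballing every counter, here is a rough sketch that boils the ">stats" output down to one line. The function name stats_summary is made up for this example; it only uses the domains_being_blocked, dns_queries_today and ads_blocked_today keys shown in the output above.

```shell
# Hypothetical helper: read ">stats" output on stdin and print the blocklist
# size plus the blocked share recomputed from the raw counters.
stats_summary() {
  LC_ALL=C awk '
    $1 == "domains_being_blocked" { domains = $2 }
    $1 == "dns_queries_today"     { queries = $2 }
    $1 == "ads_blocked_today"     { blocked = $2 }
    END {
      if (queries > 0)
        printf "%d domains on blocklist, %d of %d queries blocked (%.2f%%)\n",
               domains, blocked, queries, 100 * blocked / queries
    }'
}
```

Usage would be echo ">stats" | nc localhost 4711 | stats_summary. For the two snapshots above it shows nearly the same query load (about 14.36% vs 14.27% blocked), so the main difference between the runs is the blocklist size.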