2nd Pi-hole hits RATE_LIMIT

Not trying to hijack @dermoth's thread, unfortunately I experienced some similar??? behaviors. I do apologize in advance, haven't had coffee yet, just started receiving "no internet" complaints as the majority of devices on our network ceased responding to queries as my pihole2 device was overwhelmed (hit RATE_LIMIT cap).

DEBUG TOKEN: https://tricorder.pi-hole.net/BQH8SQgi/

I can explain what happened in a plain English sense, but have been madly scurrying to restore connectivity to maintain our SLAs (with the family). :hot_face: A Windows host (running type2 Hyper-V with Pihole1 VM) automatically restarted overnight to apply Windows Updates and never came back online, at which point only Pihole2 was responding. Still trying to isolate the issue -- could be a problem after upgrading to Raspbian GNU/Linux 11 (bullseye) armv7l this week... or something else entirely that escapes me and/or is beyond my meager technical capabilities.

EXPECTED BEHAVIOR: If pihole1 fails, I expect pihole2 device to take over and handle queries so our entire network does not drop.

Thank you for your time and attention to my little matter. Be safe.

OS: Raspbian GNU/Linux 11 (bullseye) armv7l
Host: Raspberry Pi 3 Model B Plus Rev 1.3
Kernel: 5.10.63-v7+
Uptime: 1 hour, 6 mins
Packages: 628 (dpkg)
Shell: bash 5.1.4
Terminal: /dev/pts/1
CPU: BCM2835 (4) @ 1.400GHz
Memory: 103MiB / 972MiB

That expectation is not directed at Pi-hole, but towards your client's behaviour, as a client knows two DNS server IPs (192.168.11.7 and 192.168.11.8) and can opt for using either of them.
Furthermore, it is already met, or you wouldn't be observing any DNS traffic with your second Pi-hole.

That said, those excessive DNS requests you see are often caused by a DNS loop.

Your debug log shows you have Conditional Forwarding enabled.
As CF may potentially close a partial DNS loop, check whether disabling CF solves your issue.

Also, let's check what clients and domains cause those high volumes.
Please share the output of the following commands, run from your Pi-hole machine:

echo ">stats >quit" | nc localhost 4711
echo ">top-clients >quit" | nc localhost 4711
echo ">top-domains >quit" | nc localhost 4711
echo ">top-ads >quit" | nc localhost 4711
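If it is more convenient, the four queries above can be collected in one pass. This is only a sketch, assuming `nc` is installed and FTL's telnet API answers on localhost port 4711 exactly as in the commands above; the function names `ftl_request` and `collect_ftl_stats` are made up for illustration.

```shell
#!/bin/sh
# Build the request string used in the commands above, e.g. ">stats >quit".
ftl_request() {
    printf '>%s >quit\n' "$1"
}

# Send each query in turn and label its output so the sections stay apart.
collect_ftl_stats() {
    for query in stats top-clients top-domains top-ads; do
        printf '== %s ==\n' "$query"
        ftl_request "$query" | nc localhost 4711
    done
}
```

Running `collect_ftl_stats > pihole2-stats.txt` (the file name is arbitrary) would then give a single file to paste from.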

Sorry for the poor phrasing, I completely understand you cannot work magic, nor help my router's dhcp services handle DNS failover properly if I have misconfigured my network. Genuinely appreciate your input and expertise, sir. Apologies for the delay getting further data back to you.

┌─[user@pihole2]─[~]
└──╼ $echo ">stats >quit" | nc localhost 4711
domains_being_blocked 281725
dns_queries_today 254373
ads_blocked_today 6804
ads_percentage_today 2.674812
unique_domains 1746
queries_forwarded 181068
queries_cached 1059
clients_ever_seen 29
unique_clients 28
dns_queries_all_types 253866
reply_NODATA 682
reply_NXDOMAIN 3066
reply_CNAME 1419
reply_IP 9251
privacy_level 0
status enabled
┌─[user@pihole2]─[~]
└──╼ $echo ">top-clients >quit" | nc localhost 4711
0 318355 192.168.11.8 pihole2.lab.xxx
1 13822 192.168.11.11 sg300.lab.xxx
2 9867 192.168.11.109 Ruckus.lab.xxx
3 8696 192.168.107.251 Google-Home-Mini.lab.xxx
4 8215 192.168.107.48 Google-Home-Mini.lab.xxx
5 8151 192.168.107.21 Google-Nest-Mini.lab.xxx
6 6747 192.168.107.216 Chromecast.lab.xxx
7 5139 192.168.107.75 TCL.lab.xxx
8 3701 192.168.11.121 Pixel2.lab.xxx
9 3456 127.0.0.1 localhost
┌─[user@pihole2]─[~]
└──╼ $echo ">top-domains >quit" | nc localhost 4711
0 37272 1.us.pool.ntp.org
1 34257 3.us.pool.ntp.org
2 26673 pubsub.pubnub.com
3 19641 2.us.pool.ntp.org
4 18300 0.us.pool.ntp.org
5 13347 www.google.com
6 10618 time.google.com
7 8357 connectivitycheck.gstatic.com
8 7517 www.baidu.com
9 6416 support.ruckuswireless.com
┌─[user@pihole2]─[~]
└──╼ $echo ">top-ads >quit" | nc localhost 4711
0 4767 scribe.logs.roku.com
1 844 diagnostics.meethue.com
2 184 graph.instagram.com
3 177 api.nova.fox.com
4 166 edgekey.net
5 79 ssl.google-analytics.com
6 68 analytics.plex.tv
7 60 app-measurement.com
8 49 segmentapis.com
9 45 liberty.logs.roku.com
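
To see at a glance how lopsided the client list is, the `>top-clients` output can be turned into percentages. A rough sketch, assuming each line has the form `<rank> <count> <ip> <hostname>` as shown above; `client_shares` is a hypothetical helper name, and the shares are relative to the listed clients only, not to all queries.

```shell
# Print each listed client's share of the queries in ">top-clients" output.
# Lines with fewer than three fields (blank lines, end markers) are skipped.
client_shares() {
    awk '
        NF >= 3 { n++; count[n] = $2; ip[n] = $3; total += $2 }
        END {
            for (i = 1; i <= n; i++)
                printf "%s %.1f%%\n", ip[i], 100 * count[i] / total
        }'
}
```

Usage would be `echo ">top-clients >quit" | nc localhost 4711 | client_shares`. For the ten clients listed above, 192.168.11.8 works out to roughly 82% of the listed total.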

It would seem that your pihole2.lab.xxx (.8) is desperate to get the correct time from a set of time servers at *.pool.ntp.org.

Since that is likely running on an RPi, that may suggest that RPi's clock may be (or have been) out of sync.
Then your pihole2 RPi would try to sync time with 1.us.pool.ntp.org, but since unbound is employing DNSSEC, it would not be able to resolve that domain (nor any other public domains) - for lack of precise time information.

There are several ways to address this, e.g. you could consider purchasing and installing an RTC, or forwarding requests for us.pool.ntp.org to a public DNS resolver or your router, or supplying your router as a time server for that Raspberry Pi OS machine in /etc/systemd/timesyncd.conf (provided your router can be configured to act as a time server).

We've only recently discussed those and other options in Failing lookups after power outage DNSSEC - #20 by DL6ER.
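
For the timesyncd option, the file content might look like what the sketch below prints. It assumes the router at 192.168.11.1 (the DHCP server address seen in this thread) has actually been set up to serve time, which has not been confirmed here; `timesyncd_conf` is a hypothetical helper name.

```shell
# Emit a timesyncd.conf body that points systemd-timesyncd at one time server.
# Passing an IP address rather than a hostname avoids needing DNS before the
# clock is correct. The address used below is an assumption, not a known-good value.
timesyncd_conf() {
    printf '[Time]\nNTP=%s\n' "$1"
}

timesyncd_conf 192.168.11.1
```

The printed `[Time]` section would go into /etc/systemd/timesyncd.conf, followed by `sudo systemctl restart systemd-timesyncd`; `timedatectl timesync-status` should then show which server is in use.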


Not likely to be related, but your debug log also shows that your Pi-hole is configured for eth0, which doesn't exist on your system. Its Ethernet interface is labeled enxb827eb09ecfe instead.
This may be a side effect of a recent upgrade to Bullseye, which may have switched your OS to use predictable names once again.

A default Pi-hole would then fail to receive DNS requests, but in your case, you have changed Pi-hole's listening behaviour to Listen on all..., allow all origins (presumably to cater for Wireguard connections, as wg0 would suggest).
While your pihole2 is still responsive, the same may not be true for another Pi-hole with an Interface listening behaviour tied strictly to a specific interface.

To address this, you could either run pihole -r with Reconfigure and choose the correct network interface, or disable predictable interface names via an Advanced Option in raspi-config.
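
A quick way to spot this kind of mismatch is to compare the configured interface with what the kernel currently exposes. A minimal sketch, assuming the interface is stored as `PIHOLE_INTERFACE=` in /etc/pihole/setupVars.conf and that interfaces appear under /sys/class/net; the function names are made up for illustration.

```shell
# Read PIHOLE_INTERFACE from a setupVars-style file (KEY=value lines).
configured_iface() {
    sed -n 's/^PIHOLE_INTERFACE=//p' "$1" 2>/dev/null | head -n 1
}

# Succeed only if the configured interface exists in the given sysfs directory.
# Both arguments are optional and default to the usual system locations.
iface_exists() {
    conf_file=${1:-/etc/pihole/setupVars.conf}
    net_dir=${2:-/sys/class/net}
    iface=$(configured_iface "$conf_file")
    [ -n "$iface" ] && [ -e "$net_dir/$iface" ]
}
```

Something like `iface_exists || ls /sys/class/net` would then list the names that are actually available whenever the configured one is missing.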


Thank you kindly for your incredibly detailed replies to my somewhat vague and hastily written help request. Appreciate you migrating this to a new thread, as well. DNS is a real nightmare to attempt to troubleshoot, especially when one does not fully grasp the underlying protocol.

Perhaps in the upgrade from Raspbian buster > bullseye some things may have broken, but the DS3231 RTC I installed in 2018 2019 (for DNSSEC purposes, thanks to recommendations here) appears to still be working just fine, at least compared with time.gov website.

sudo hwclock -r
2021-11-12 09:52:17.066233-05:00

I discovered a vast number of NTP requests coming from my Cisco "core" switch. Somehow that device did NOT adjust itself properly after the return from Daylight Saving Time to Standard Time, so I have now updated it manually.

My router has the following NTP servers applied: 0.us.pool.ntp.org 1.us.pool.ntp.org 2.us.pool.ntp.org 3.us.pool.ntp.org, which are then passed out to the entire network via DHCP (which I have attempted to confirm via NMAP):

sudo nmap --script broadcast-dhcp-discover
[sudo] password for user:
Starting Nmap 7.80 ( https://nmap.org ) at 2021-11-12 10:13 EST
Pre-scan script results:
| broadcast-dhcp-discover:
|   Response 1 of 1:
|     IP Offered: 192.168.11.60
|     DHCP Message Type: DHCPOFFER
|     Server Identifier: 192.168.11.1
|     IP Address Lease Time: 2m00s
|     Renewal Time Value: 1m00s
|     Rebinding Time Value: 1m45s
|     Subnet Mask: 255.255.255.0
|     Broadcast Address: 192.168.11.255
|     Domain Name Server: 192.168.11.7, 192.168.11.8
|     Domain Name: lab.example.com
|_    Router: 192.168.11.1
WARNING: No targets were specified, so 0 hosts scanned.
Nmap done: 0 IP addresses (0 hosts up) scanned in 3.22 seconds

This is a wonderful thread, apparently I had a draft reply written here, but never followed up to finish posting.

Yet again, you are spot on, sir. Unsure how I failed to notice the change in interface name post buster upgrade. Now resolved, we're back to eth0 finally.

I adjusted this once upon a time, based upon a recommendation you had made to me, many many moons ago, in order to allow connectivity for our various voice assistant / listening devices we've planted all around the home (connected to a separate SSID walled off from the LAN as an "untrusted" IoT VLAN), but almost certainly misconfigured this and will try "Listen on all interfaces" (one hop, local devices). I've been utilizing the same DNS redirect (sNAT) & captive DNS (dNAT) for years on my Edgerouter, but perhaps I need to revisit things and make sure there are no other potential configuration issues on my end, however that doesn't seem very likely at this stage in the game.

user@er12:~$ show dns forwarding statistics
----------------
Cache statistics
----------------
Cache size: 1000
Queries forwarded: 0
Queries answered locally: 104
Total DNS entries inserted into cache: 0
DNS entries removed from cache before expiry: 0

---------------------
Nameserver statistics
---------------------
Server: 192.168.11.8
Queries sent: 0
Queries retried or failed: 0

Server: 192.168.11.7
Queries sent: 0
Queries retried or failed: 0

Things do not seem to be escaping my attempts at capturing and invisibly redirecting DNS requests from the LAN. I cannot begin to thank you enough for all the time and patience with my various, disjointed (often times vague) support requests. Hopefully I've resolved the most glaring issues. Will continue testing through the weekend and see how we make out. So sorry to take much of your precious time with this. Be safe, @Bucking_Horn sir.

EDIT: Perhaps I'm still getting adjusted to all the new query types that are displayed in the latest version, but it seems as if I'm getting an awful lot of pihole2 requests across both devices. At this point I probably should just start with a fresh image of the SD card and rebuild it all properly from the ground up. I'm spending far too much time chasing my tail... and wasting your valuable time in the process.

Ok, my turn at hijacking threads... I'd like to point out my personal experience with predictable names in Bullseye.

@Bucking_Horn Just a side note, as I already went through hell on my desktop with predictable interfaces. I always had udev rules to rename the interfaces to the correct names (eth0 and eth1). The problem since Bullseye is that network devices are brought up in parallel, so eth1 may already be there when udev triggers the rename of eth0 to eth1, which then fails and ends up with the devices named in an unpredictable manner. I've tried many other ways to set the names but it's always the same issue.

I went with predictable names but then realized another issue: when adding/removing PCIe cards, or enabling/disabling onboard devices in the BIOS, the predictable names change and I need to reconfigure my network again!

I wish the kernel could use a different prefix than "eth" for network devices, but I think in the end I will just name them et0, et1, etc or something like that to avoid the clash with the default names.

NB: That issue is of course only a problem with more than one Ethernet device, and depending on the system initialization order it could be an issue only in rare cases (for me about one in every 10 boots was in the correct order, so I would probably not even have noticed it immediately if the device names were reversed!). If you want to test, I suggest swapping the device names and seeing if it still works consistently.
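
For the idea of custom names such as et0 that cannot clash with the kernel's eth defaults, one possible route is a systemd `.link` file matched on the MAC address. This is only a sketch and not tested on the setup described here; the MAC address, the name and the helper `link_unit` are all placeholders.

```shell
# Print a systemd .link unit that assigns a fixed name to the NIC with the
# given MAC address. Both values are supplied by the caller.
link_unit() {
    mac=$1
    name=$2
    printf '[Match]\nMACAddress=%s\n\n[Link]\nName=%s\n' "$mac" "$name"
}

# Placeholder MAC address; substitute the real one from `ip link`.
link_unit 00:11:22:33:44:55 et0
```

The output would be saved under /etc/systemd/network/ with a `.link` suffix (for example 10-et0.link, a name chosen here only for illustration), one file per device, and is normally picked up at the next boot.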
