High rate limit

Hey guys,

I think I might have a DNS loop occurring between Pi-hole and my Ubiquiti gear. It had been working absolutely fine until recently, but something has changed and I'm now seeing some quite substantial query rates coming in.

The high values I'm getting are from:
pihole-master.home.lan
192.168.50.1

https://tricorder.pi-hole.net/KDPb96lv/

Setup:
Multiple VLANs
UDMP
2x Pihole instances w/ GravitySync.

Pihole VMs:
Dockerised Pihole.
Pihole 1 = 192.168.50.104
Pihole 2 = 192.168.50.105

UDMP Setup:
UDMP IP: 192.168.0.200 (w/ firewall rules)
VLANS: null, 10, 20, 30, 40, 50

  # Pihole - Network wide adblocker, DNS cache and DHCP server
  pihole:
    image: pihole/pihole:latest
    container_name: Pihole
    hostname: PiHole-Master
    environment:
      # INTERFACE: $INTERFACE
      DNSMASQ_LISTENING: $DNSMASQ_LISTENING
      ServerIP: $SERVER_IP
      # DNS1: $PH_DNS1
      # DNS2: $PH_DNS2
      PIHOLE_DNS_: $PIHOLE_DNS
      VIRTUAL_HOST: $VIRTUAL_HOST
      FTLCONF_REPLY_ADDR4: $SERVER_IP
      CORS_HOSTS: $CORS_HOSTS
      TZ: $TZ
      SKIPGRAVITYONBOOT: $SKIPGRAVITYONBOOT
      DNSSEC: $DNSSEC
      DNS_BOGUS_PRIV: $DNS_BOGUS_PRIV
      DNS_FQDN_REQUIRED: $DNS_FQDN_REQUIRED
      REV_SERVER: $REV_SERVER
      DNSMASQ_USER: $DNSMASQ_USER
      REV_SERVER_TARGET: $REV_SERVER_TARGET
      REV_SERVER_CIDR: $REV_SERVER_CIDR
      REV_SERVER_DOMAIN: $REV_SERVER_DOMAIN
      CUSTOM_CACHE_SIZE: $CUSTOM_CACHE_SIZE
      WEBPASSWORD: $WEBPASSWORD
      WEBTHEME: default-dark
    labels: 
      - "autoheal=true"
    # cap_add:
      # - NET_ADMIN
    network_mode: "host"
    dns:
      - "127.0.0.1"
      - "9.9.9.9"
    depends_on:
      - dnscrypt-proxy
    volumes:
      - $USERDIR/Pihole/etc/pihole:/etc/pihole/
      - $USERDIR/Pihole/etc/dnsmasq:/etc/dnsmasq.d/
      # run `touch $USERDIR/Pihole/etc/logs/pihole.log` first unless you like errors
      - $USERDIR/Pihole/etc/logs/pihole.log:/var/log/pihole.log
      - $PIBACKUPDIR:/backup
      - $PISCRIPTS/piholebackup.sh:/scripts/piholebackup.sh:ro
    restart: always
    mem_limit: 1024m
    mem_reservation: 256m

REV_SERVER_TARGET="192.168.0.200"
REV_SERVER_CIDR="192.168.0.0/16"
Permit all origins enabled

The issue seems to be coming from 192.168.50.1 (the UDMP's VLAN gateway). I'm not quite sure how or where to start diagnosing this - can anyone assist?

The rate limit is triggered by high volumes of DNS requests. Those are not necessarily caused by a DNS loop.

If your 192.168.50.1 aggregates DNS requests from a large number of very active clients, you may have to adjust the rate limit.
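For reference, FTL's rate limit (1000 queries per 60 seconds per client by default) can be raised or disabled in pihole-FTL.conf - a sketch only, assuming the stock file location, which in a dockerised setup like this sits under the mounted /etc/pihole volume:

```ini
# /etc/pihole/pihole-FTL.conf
# Allow 2000 queries per 60 seconds per client; 0/0 disables rate limiting.
RATE_LIMIT=2000/60
```

FTL needs a restart afterwards (e.g. `pihole restartdns` inside the container) to pick the change up.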

What script are you running that is repeatedly requesting this?

 -----tail of pihole.log------
   Feb  6 04:38:37 dnsmasq[458]: config insertions.bind is <TXT>
   Feb  6 04:38:37 dnsmasq[458]: config evictions.bind is <TXT>
   Feb  6 04:38:37 dnsmasq[458]: config hits.bind is <TXT>
   Feb  6 04:38:37 dnsmasq[458]: config misses.bind is <TXT>
   Feb  6 04:38:37 dnsmasq[458]: config auth.bind is <TXT>
   Feb  6 04:38:37 dnsmasq[458]: config servers.bind is <TXT>
   Feb  6 04:38:38 dnsmasq[458]: config cachesize.bind is <TXT>
   Feb  6 04:38:38 dnsmasq[458]: config insertions.bind is <TXT>
   Feb  6 04:38:38 dnsmasq[458]: config evictions.bind is <TXT>
   Feb  6 04:38:38 dnsmasq[458]: config hits.bind is <TXT>
   Feb  6 04:38:38 dnsmasq[458]: config misses.bind is <TXT>
   Feb  6 04:38:38 dnsmasq[458]: config auth.bind is <TXT>
...

The only script I have is gravity-sync, but otherwise it's a pretty vanilla setup.

VM > Docker > Pihole + DNSCrypt.

I've blatted the VMs and started again (thank god for GitHub repos), and it seems to be behaving now.

Not quite sure why it happened on two separate instances (well, gravity-sync would obviously have had some part to play, but still).

Looks like the issues are back (well, not the high rate limit, but certainly quite a few queries to itself).

https://tricorder.pi-hole.net/7TQWhnpG/

That's the latest debug log.

No additional scripts bar gravity-sync (and the master isn't touched by that anyway).

Not quite sure where to start looking.

This is what I have set on each of my VLANs:

   Feb 13 18:01:44 dnsmasq[11912]: config servers.bind is <TXT>
   Feb 13 18:01:45 dnsmasq[11912]: query[A] PiHole-Master.home.lan from 127.0.0.1
   Feb 13 18:01:45 dnsmasq[11912]: /etc/pihole/custom.list PiHole-Master.home.lan is 127.0.0.1
   Feb 13 18:01:45 dnsmasq[11912]: query[AAAA] PiHole-Master.home.lan from 127.0.0.1
   Feb 13 18:01:45 dnsmasq[11912]: forwarded PiHole-Master.home.lan to 192.168.50.1
   Feb 13 18:01:45 dnsmasq[11912]: reply PiHole-Master.home.lan is NODATA-IPv6
   Feb 13 18:01:45 dnsmasq[11912]: config cachesize.bind is <TXT>
   Feb 13 18:01:45 dnsmasq[11912]: config insertions.bind is <TXT>
   Feb 13 18:01:45 dnsmasq[11912]: config evictions.bind is <TXT>
   Feb 13 18:01:45 dnsmasq[11912]: config hits.bind is <TXT>
   Feb 13 18:01:45 dnsmasq[11912]: config misses.bind is <TXT>
   Feb 13 18:01:45 dnsmasq[11912]: config auth.bind is <TXT>
   Feb 13 18:01:45 dnsmasq[11912]: config servers.bind is <TXT>
   Feb 13 18:01:46 dnsmasq[11912]: config cachesize.bind is <TXT>
   Feb 13 18:01:46 dnsmasq[11912]: config insertions.bind is <TXT>
   Feb 13 18:01:46 dnsmasq[11912]: config evictions.bind is <TXT>
   Feb 13 18:01:46 dnsmasq[11912]: config hits.bind is <TXT>
   Feb 13 18:01:46 dnsmasq[11912]: config misses.bind is <TXT>
   Feb 13 18:01:46 dnsmasq[11912]: config auth.bind is <TXT>
   Feb 13 18:01:46 dnsmasq[11912]: config servers.bind is <TXT>

You're still getting hit for metrics on that server. Are you using a monitoring package like Zabbix, or an app for Android/iPad/iPhone?

Heya,

I'm using "Pi-hole Connect" on Android, and Netdata for monitoring, but nothing else.

However, I have been using these tools for a while now with no ill effects until recently (including up until yesterday prior to updating).

Just as a test, I disabled the Android app, but I'm still getting the hits from 127.0.0.1.

Here's the config I have if that helps:

https://hastebin.modem7.com/ogofe.yaml

Can you disable Netdata for a bit and see if the queries stop?

Sure thing!

Both Netdata and the Android client have been stopped/disconnected, but it's still the same deal, unfortunately.

I've also tried to disable conditional forwarding just in case, but still no go.

What is monitoring that status, and how is it checking the health state?

That's purely the autoheal container.

If the Docker container becomes "unhealthy", then it'll restart it.

With the healthcheck being the default of:

HEALTHCHECK CMD dig +short +norecurse +retry=0 @127.0.0.1 pi.hole || exit 1

docker-pi-hole/Dockerfile at 1b35e7fdd425ee2ac49b3aadc33ac1489c897ef3 · pi-hole/docker-pi-hole · GitHub

The state (just checked) currently is:

  • End 2022-02-13T20:38:47.714402306Z
  • ExitCode 0
  • Output 192.168.50.104
  • Start 2022-02-13T20:38:47.565325175Z
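As an aside, that per-check detail can be pulled straight from Docker - a sketch, assuming the container_name of Pihole from the compose file above:

```shell
# Print the recent health-check log (start/end time, exit code, probe output)
# for the container named "Pihole"; .State.Health.Status gives just the summary.
docker inspect --format '{{json .State.Health.Log}}' Pihole
```

This needs a running Docker daemon with that container present, so it's a diagnostic to run on the VM itself.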

Some process is asking for dnsmasq's metrics every second. Usually that's a monitoring package or something that charts. It's not asking for identification, just the values for DNS resolution. There's no real reason for CHAOS TXT records to be queried once a second, and I can't think of any standard application that would do that.
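Those metrics queries can be reproduced by hand to see exactly what such a poller does - a sketch, assuming dig is installed and Pi-hole is answering on 127.0.0.1:

```shell
# Ask dnsmasq for its internal counters via CHAOS-class TXT records.
# Each query shows up in pihole.log as a "config <name>.bind is <TXT>" line.
dig +short chaos txt cachesize.bind @127.0.0.1
dig +short chaos txt hits.bind @127.0.0.1
dig +short chaos txt misses.bind @127.0.0.1
```

Running these against a live instance also confirms whether the log lines you're chasing match this pattern.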

It has been a while since I played with Netdata, but I think the one-second frequency is what Netdata would use by default.

Also, the sequence of CHAOS TXT requests exactly matches the one used by Netdata's dnsmasq collector, further supporting that those queries are related to Netdata.

I can't explain why those requests didn't stop once you disabled Netdata, but please check Netdata's documentation on its dnsmasq collector.
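If Netdata does turn out to be the source, the collector can be switched off without a full uninstall - a sketch, assuming a stock go.d plugin layout (the path and module name here are from memory, so check your install):

```yaml
# /etc/netdata/go.d.conf (assumed default location)
modules:
  dnsmasq: no   # stop polling dnsmasq's CHAOS TXT metrics every second
```

Netdata needs a restart afterwards for the change to take effect.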

I'll try a full uninstallation of netdata on both VMs, and monitor it for a few hours - thanks!

FYI - there's also a dedicated Pi-hole collector, which I admit I did not configure manually; it was set to auto-configure its values.

If it is indeed the case that Netdata was causing it, at least we'll know the culprit, and if I figure out a way to get it working, I'll write up the resolution.
