Config error is REFUSED every night

Expected Behaviour:

PiHole continues to resolve host names

Actual Behaviour:

Every night at around 23h00, every DNS query causes the following error in the log file, and no DNS result is returned:
May 29 23:05:27 dnsmasq[7338]: config error is REFUSED

Restarting the DNS service (pihole restartdns) resolves the issue until the same or a similar time that night.
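To confirm how regular the timing really is, the REFUSED lines can be counted per day and hour. This is only a sketch: the function name refused_per_hour is made up for illustration, and it assumes the log lines look like the one quoted above.

```shell
#!/bin/bash
# Count "config error is REFUSED" lines per day and hour from log text on stdin.
# refused_per_hour is a hypothetical helper name, not part of Pi-hole.
refused_per_hour() {
    awk '/config error is REFUSED/ {
             split($3, t, ":")              # $3 is HH:MM:SS
             n[$1 " " $2 " " t[1] "h"]++    # key looks like: May 29 23h
         }
         END { for (k in n) print k, n[k] }' | sort
}

# Usage, assuming the log lives at /var/log/pihole.log:
#   refused_per_hour < /var/log/pihole.log
```

Each output line is a day and hour followed by the number of REFUSED entries seen in that hour.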

Debug Token:

https://tricorder.pi-hole.net/7ecy0g0lc8

I tried enabling FTL debug logging and disabling the new throttling feature per some other related threads, but found no clues (other than filling up my disk with log files).

I don't know if it's a coincidence that it's always towards the end of the day, every day, but it is strangely regular.

Did you select OpenDNS as upstream for Pi-hole?

Other DNS providers might apply similar rate limiting, or have other reasons to reply with a REFUSED status.
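One way to see where a REFUSED comes from is to compare the status that dig reports when asking the upstream directly with the status when asking Pi-hole. A minimal sketch follows; dig_status is a hypothetical helper name, and 208.67.222.222 is one of the public OpenDNS resolvers.

```shell
#!/bin/bash
# Print the DNS status (NOERROR, REFUSED, ...) found in dig output on stdin.
# dig_status is a hypothetical helper name.
dig_status() {
    sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1
}

# Usage:
#   dig @208.67.222.222 pi-hole.net | dig_status   # ask OpenDNS directly
#   dig @localhost pi-hole.net | dig_status        # ask Pi-hole
```

If the direct query also reports REFUSED while the problem is occurring, that would point at the upstream rather than at Pi-hole itself.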

echo ">stats >quit" | nc localhost 4711
domains_being_blocked 82948
dns_queries_today 40988
ads_blocked_today 514
ads_percentage_today 1.254026
unique_domains 1381
queries_forwarded 6379
queries_cached 34013
clients_ever_seen 57
unique_clients 57
dns_queries_all_types 40988
reply_NODATA 8922
reply_NXDOMAIN 2813
reply_CNAME 2381
reply_IP 19045
privacy_level 0
status enabled
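For judging whether rate limiting is plausible, the share of queries that actually reach the upstream may be more telling than the total. In the numbers above, 6379 of 40988 queries (about 15.6%) were forwarded. A small sketch that computes this from the same stats output (forwarded_pct is a hypothetical helper name):

```shell
#!/bin/bash
# Print the percentage of today's queries that were forwarded upstream,
# given the ">stats" output on stdin. forwarded_pct is a hypothetical name.
forwarded_pct() {
    awk '$1 == "queries_forwarded" { f = $2 }
         $1 == "dns_queries_today" { t = $2 }
         END { if (t > 0) printf "%.2f\n", 100 * f / t }'
}

# Usage:
#   echo ">stats >quit" | nc localhost 4711 | forwarded_pct
```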

Not sure why I would be rate limited. I don't have an OpenDNS account. Are you suggesting I should try a different provider and see if the issue persists?

What else is happening on the Pi or in your network around 2300 daily? Look in /var/log/syslog.1 for yesterday's entries and see what shows around 2300. Any cron scripts running, etc?
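A rough way to pull out just that time window is sketched below. The function name log_window and the example times are assumptions; it expects the usual syslog layout of month, day and HH:MM:SS in the first three fields.

```shell
#!/bin/bash
# Print syslog-style lines from stdin for one day between two HH:MM:SS times.
# Arguments: month, day of month, start time, end time.
# log_window is a hypothetical helper name.
log_window() {
    awk -v mon="$1" -v day="$2" -v from="$3" -v to="$4" \
        '$1 == mon && $2 == day && $3 >= from && $3 <= to'
}

# Usage:
#   log_window May 29 22:55:00 23:15:00 < /var/log/syslog.1
#   sudo crontab -l    # root's scheduled jobs, to compare against the window
```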

I see the following at about the same time that the DNS stops responding. UID 999 is the pihole user.

May 29 23:08:31 pihole systemd[1]: session-c48.scope: Succeeded.
May 29 23:08:41 pihole systemd[1]: Stopping User Manager for UID 999...
May 29 23:08:41 pihole systemd[7051]: Stopped target Default.
May 29 23:08:41 pihole systemd[7051]: Stopped target Basic System.
May 29 23:08:41 pihole systemd[7051]: Stopped target Paths.
May 29 23:08:41 pihole systemd[7051]: Stopped target Timers.
May 29 23:08:41 pihole systemd[7051]: Stopped target Sockets.
May 29 23:08:41 pihole systemd[7051]: gpg-agent.socket: Succeeded.
May 29 23:08:41 pihole systemd[7051]: Closed GnuPG cryptographic agent and passphrase cache.
May 29 23:08:41 pihole systemd[7051]: gpg-agent-browser.socket: Succeeded.
May 29 23:08:41 pihole systemd[7051]: Closed GnuPG cryptographic agent and passphrase cache (access for web browsers).
May 29 23:08:41 pihole systemd[7051]: gpg-agent-ssh.socket: Succeeded.
May 29 23:08:41 pihole systemd[7051]: Closed GnuPG cryptographic agent (ssh-agent emulation).
May 29 23:08:41 pihole systemd[7051]: gpg-agent-extra.socket: Succeeded.
May 29 23:08:41 pihole systemd[7051]: Closed GnuPG cryptographic agent and passphrase cache (restricted).
May 29 23:08:41 pihole systemd[7051]: dirmngr.socket: Succeeded.
May 29 23:08:41 pihole systemd[7051]: Closed GnuPG network certificate management daemon.
May 29 23:08:41 pihole systemd[7051]: Reached target Shutdown.
May 29 23:08:41 pihole systemd[7051]: systemd-exit.service: Succeeded.
May 29 23:08:41 pihole systemd[7051]: Started Exit the Session.
May 29 23:08:41 pihole systemd[7051]: Reached target Exit the Session.
May 29 23:08:41 pihole systemd[1]: user@999.service: Succeeded.
May 29 23:08:41 pihole systemd[1]: Stopped User Manager for UID 999.
May 29 23:08:41 pihole systemd[1]: Stopping User Runtime Directory /run/user/999...
May 29 23:08:41 pihole systemd[1]: run-user-999.mount: Succeeded.
May 29 23:08:41 pihole systemd[1]: user-runtime-dir@999.service: Succeeded.
May 29 23:08:41 pihole systemd[1]: Stopped User Runtime Directory /run/user/999.
May 29 23:08:41 pihole systemd[1]: Removed slice User Slice of UID 999.

User pihole or user pi?

from /etc/passwd:
pihole:x:999:995::/home/pihole:/usr/sbin/nologin

If it's not the upstream DNS provider that's refusing (select a different one if you want to diagnose), maybe the below is related:

Your debug log is truncated at the dhcp-discover section. It looks like you're running some additional features on your server.

Can you post the output from running pihole-FTL dhcp-discover at the Pi-hole server's terminal?

Edit: And do you have any additional configuration files in /etc/dnsmasq.d?

With pleasure:

pihole-FTL dhcp-discover
Scanning all your interfaces for DHCP servers
Timeout: 10 seconds

* Received 329 bytes from eth0:192.168.1.4
  Offered IP address: 192.168.1.179
  Server IP address: 192.168.1.20
  Relay-agent IP address: N/A
  BOOTP server: (empty)
  BOOTP file: netboot.xyz.kpxe
  DHCP options:
   Message type: DHCPOFFER (2)
   server-identifier: 192.168.1.4
   lease-time: 86400 ( 1d )
   renewal-time: 43200 ( 12h )
   rebinding-time: 75600 ( 21h )
   netmask: 255.255.255.0
   broadcast: 192.168.1.255
   dns-server: 192.168.1.4
   domain-name: "highpeak.co.za"
   tftp-server: "192.168.1.20\0"
   ntp-server: 192.168.1.4
   router: 192.168.1.1
   --- end of options ---

DHCP packets received on interface eth0: 1
DHCP packets received on interface wlan0: 0
DHCP packets received on interface lo: 0

I have the following files:


-rw-r--r--  1 root root 1495 May 30 19:43 01-pihole.conf
-rw-r--r--  1 root root  670 May 30 19:43 02-pihole-dhcp.conf
-rw-r--r--  1 root root 4013 May  2 14:36 04-pihole-static-dhcp.conf
-rw-r--r--  1 root root  489 Nov 17  2020 99-pihole-custom.conf

BTW - no issues last night after switching to Cloudflare DNS instead of OpenDNS

So most likely one of your clients is misbehaving at around that time, causing the upstream OpenDNS to refuse.
Inspect the Pi-hole logs to see what is being queried around that time, and which client is querying, that might cause this (when OpenDNS is the upstream):

zgrep -E 'May 30 (22|23)' /var/log/pihole.log* | less

Adjust the date and the "22" and "23" hours above to refine your search.

EDIT: adjusted the grep a bit

I figured the same, but according to the Pi-hole dashboard my queries are spread evenly across the day (with natural dips at night due to lower activity).

However, you have pointed me in the right direction. I have another issue that I need advice to solve. I've posted it here 47388

I'm getting a lot of "not found: 3 (NXDOMAIN)" errors, which I'm guessing could result in the rate limiting?

Yeah, possibly.
What's the top 10?

zgrep -E 'May 28 (22|23)' /var/log/pihole.log* | grep ': query' | awk '{print $6 " " $8}' | sort | uniq -c | sort -nr | column -t | head -10
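The command above ranks queries per domain and client. To rank the names that come back as NXDOMAIN instead, something along these lines might work. It is a sketch that assumes result lines end in "<name> is NXDOMAIN", the same shape as the "config error is REFUSED" line earlier in the thread; top_nxdomain is a hypothetical helper name.

```shell
#!/bin/bash
# Print the ten names with the most NXDOMAIN results, as "count name".
# Fields are counted from the end of the line, so a "file:" prefix added by
# zgrep does not matter. top_nxdomain is a hypothetical helper name.
top_nxdomain() {
    awk '$(NF-1) == "is" && $NF == "NXDOMAIN" { n[$(NF-2)]++ }
         END { for (d in n) print n[d], d }' | sort -k1,1nr -k2,2 | head -n 10
}

# Usage:
#   zgrep -E 'May 28 (22|23)' /var/log/pihole.log* | top_nxdomain
```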

Can you confirm that tftp-server line? The email notification I received shows:

   tftp-server: "192.168.1.20%{email_content}"
   ntp-server: 192.168.1.4
   router: 192.168.1.1
   --- end of options ---

That would be a misconfigured tftp-server line and that % is right where the debug upload broke. I've seen other uploads break in similar ways and they all had illegal characters as the cause.

What are the contents of that file? Was it one you manually created?

The offending line was:
dhcp-option=66,"192.168.1.20"

I'm not sure why it's adding the \0 at the end when I run pihole-FTL dhcp-discover

The contents of the 99-pihole-custom.conf are:

server=/[hidden].local/172.x.x.x
server=/x.x.x.in-addr.arpa./172.x.x.x
rev-server=172.x.x.x/24,172.x.x.x
dhcp-option=option:ntp-server,192.168.1.4
dhcp-option=tag:smartdns, option:dns-server, 154.127.57.224
dhcp-match=set:efi-x86_64,option:client-arch,7
dhcp-boot=netboot.xyz.kpxe,,192.168.1.20
dhcp-boot=tag:lenovo,netboot.xyz-undionly.kpxe,,192.168.1.20
dhcp-boot=tag:efi-x86_64,netboot.xyz.efi,192.168.1.20
# dhcp-option=66,"192.168.1.20"
address=/xxx.highpeak.co.za/192.168.1.20

Are you using the built-in TFTP server that comes with dnsmasq|pihole-FTL?

Edit: And I'm not sure, but I don't think spaces work well in dhcp-option lines, like the smartdns line. I don't know if these changes will help with the REFUSED issue but they will help us rule things out.
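To find such lines quickly, a grep along these lines could be used. It is a sketch that only flags whitespace after a comma in dhcp-option lines and ignores commented-out lines; dhcp_option_spaces is a hypothetical helper name.

```shell
#!/bin/bash
# List (with line numbers) dhcp-option lines in the given file that have
# whitespace after a comma. dhcp_option_spaces is a hypothetical helper name.
dhcp_option_spaces() {
    grep -nE '^[[:space:]]*dhcp-option=.*,[[:space:]]+' "$1"
}

# Usage:
#   dhcp_option_spaces /etc/dnsmasq.d/99-pihole-custom.conf
```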

Nope, I'm using the netboot.xyz Docker container, which has its own TFTP server: https://netboot.xyz/

OK - spaces removed to be on the safe side. New debug token created:
https://tricorder.pi-hole.net/ldm8hk0t40

Changing to the Cloudflare DNS was a red herring as it's still happening. We've had rolling blackouts here in South Africa so the timing of the REFUSED entries has varied. This really seems like it's some sort of rate limiting but I can't find a pattern as to what.

I combed the logs around the time the REFUSED responses start to show up, but found nothing out of the ordinary.

We're also now getting a clean TFTP server entry in the DHCP response:

domain-name: "highpeak.local"
tftp-server: "192.168.1.20"
ntp-server: 192.168.1.4

Finally, is there a way I could schedule a cron job to restart DNS for me until I can crack this so I don't lose my DNS at random times?

Have you tried the below to pinpoint it?

You could create a watchdog script by creating below file:

sudo nano /root/pi-hole-watchdog.sh

Containing below:

#!/bin/bash

# Restart pihole-FTL when a local lookup of DOMAIN returns no answer.
DOMAIN=pi-hole.net
[ -z "$(dig +short @localhost "$DOMAIN" 2>/dev/null)" ] && /usr/sbin/service pihole-FTL restart >/dev/null 2>&1

Make it executable:

sudo chmod u+x /root/pi-hole-watchdog.sh

Edit the crontab for root:

sudo crontab -e

Add below line at the bottom to schedule every minute:

* * * * * /root/pi-hole-watchdog.sh

And save/exit.
The consequences are that a considerable number of queries will be logged in the database, and Pi-hole will be restarted every minute if, for whatever reason, the dig command can't resolve that DOMAIN.

EDIT: my initial attempt failed running inside cron.
Have tested the changes I made.
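
If restarting every minute during a longer outage is a concern, a variant with a cooldown might be gentler. The following is an untested sketch, not a drop-in replacement: the stamp file path, the 600 second cooldown, the function names and the "run" argument are all assumptions made for illustration.

```shell
#!/bin/bash
# Watchdog variant that restarts pihole-FTL at most once per COOLDOWN seconds.
DOMAIN=pi-hole.net
STAMP=/run/pi-hole-watchdog.stamp   # assumed location for the last-restart time
COOLDOWN=600                        # assumed minimum seconds between restarts

# Succeeds when the time stored in file $1 is at least $2 seconds old,
# or when the file does not exist yet.
cooldown_elapsed() {
    local now last
    now=$(date +%s)
    last=$(cat "$1" 2>/dev/null)
    [ $(( now - ${last:-0} )) -ge "$2" ]
}

# Succeeds when the given dig output is empty, i.e. no answer was returned.
no_answer() {
    [ -z "$1" ]
}

watchdog() {
    local answer
    answer=$(dig +short @localhost "$DOMAIN" 2>/dev/null)
    if no_answer "$answer" && cooldown_elapsed "$STAMP" "$COOLDOWN"; then
        date +%s > "$STAMP"
        # Leave a trace in syslog so restarts can be matched to REFUSED entries.
        logger -t pi-hole-watchdog "no answer for $DOMAIN, restarting pihole-FTL"
        /usr/sbin/service pihole-FTL restart >/dev/null 2>&1
    fi
}

# Only act when called with "run", e.g. from cron:
#   * * * * * /root/pi-hole-watchdog.sh run
if [ "${1:-}" = "run" ]; then
    watchdog
fi
```

The logger line is there so that each restart shows up in syslog next to whatever else was happening at that moment.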
