DNS not resolving. Lots of FTL errors

Rob_Hilken · June 24, 2025, 8:14am

Expected Behaviour:

Operating System - DietPi 9.14.2
PiHole - Core [v6.1.2] FTL [v6.2.3] Web interface [v6.2.1]
Hardware - RPi B (armv6l)

Actual Behaviour:

The errors started without any obvious changes on my side, and I have had to remove the Pi-hole as DNS server in order to use the internet at all.

I have removed the FTL database and restarted the service and the errors still occur. I used:

systemctl stop pihole-FTL 
sudo mv /etc/pihole/pihole-FTL.db /etc/pihole/pihole-FTL.db.bck
systemctl start pihole-FTL
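
The same stop, move, start sequence can be wrapped in a small function. This is only a sketch: the name `reset_ftl_db` and the `SYSTEMCTL` override are made up for illustration, and moving `-wal`/`-shm` side files assumes SQLite write-ahead logging may be in use for this database (they are simply skipped when absent).

```shell
# Hypothetical helper: move the FTL database aside so FTL recreates it on start.
# Assumption: SQLite side files (-wal, -shm) may sit next to the database;
# they are moved too when they exist.
# SYSTEMCTL can be overridden (e.g. SYSTEMCTL=true) to try the file handling
# without touching the service.
reset_ftl_db() {
    db="${1:-/etc/pihole/pihole-FTL.db}"
    ctl="${SYSTEMCTL:-systemctl}"
    [ -f "$db" ] || { echo "no database at $db" >&2; return 1; }
    $ctl stop pihole-FTL || return 1
    for f in "$db" "$db-wal" "$db-shm"; do
        [ -e "$f" ] && mv -- "$f" "$f.bck"
    done
    $ctl start pihole-FTL
}
```

Run as root (or via sudo) with no argument to act on the default database path used above.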

There are still lots of errors in the FTL log, including

CRIT Corrupt binary detected - this may lead to unexpected behaviour!
ERROR SQLite3: database corruption at line 96760 of [17144570b0] (11)
ERROR Cannot receive UDP DNS reply: Timeout - no response from upstream DNS server.

Debug Token:

https://tricorder.pi-hole.net/yGSpQBmo/

deHakkelaar · June 25, 2025, 5:06am

If both a binary and a database file are corrupted, I'd suspect a power issue first and, after that, a dying SD card.
Check the link below and the systemd journals for power issues:
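
A minimal sketch of the journal check, assuming the kernel reports such events with the words "Under-voltage" or "Undervoltage" (the exact wording depends on the kernel version, and a board without detection circuitry will never log them). The helper name `find_undervoltage` is made up; it only filters text on stdin.

```shell
# Hypothetical helper: print log lines that mention an under-voltage event.
# Reads log text on stdin; returns non-zero when nothing matches.
find_undervoltage() {
    grep -iE 'under-?voltage'
}

# Usage on the Pi (kernel messages of the current boot):
#   journalctl -k --no-pager | find_undervoltage || echo "no under-voltage messages"
```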

Rob_Hilken · June 25, 2025, 7:49am

There are no voltage warnings in the systemd journals. Is there a way to check the SD card?

I have removed the FTL database again and this time it recreated with no errors, but I am still getting DNS upstream errors:

2025-06-25 08:57:24.104 ERROR Cannot receive UDP DNS reply: Timeout - no response from upstream DNS server
2025-06-25 08:57:24.104 INFO Tried to resolve PTR "8.8.8.8.in-addr.arpa" on 127.0.0.1#53 (UDP)
2025-06-25 08:58:10.504 ERROR Cannot receive UDP DNS reply: Timeout - no response from upstream DNS server
2025-06-25 08:58:10.504 INFO Tried to resolve PTR "8.8.8.8.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.6.8.4.0.6.8.4.1.0.0.2.ip6.arpa" on 127.0.0.1#53 (UDP)
2025-06-25 08:58:35.704 ERROR Cannot receive UDP DNS reply: Timeout - no response from upstream DNS server
2025-06-25 08:58:35.704 INFO Tried to resolve PTR "4.4.8.8.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.6.8.4.0.6.8.4.1.0.0.2.ip6.arpa" on 127.0.0.1#53 (UDP)

new error messages:

2025-06-25 09:03:00.490 WARNING Long-term load (15min avg) larger than number of processors: 1.0 > 1
2025-06-25 09:03:01.521 ERROR add_message(type=6, message=excessive load) - SQL error step DELETE: database is locked
2025-06-25 09:03:01.521 ERROR Error while trying to close database: database is locked
2025-06-25 09:03:02.504 ERROR Cannot receive UDP DNS reply: Timeout - no response from upstream DNS server

Is this an indication that my old RPi2B with just one core isn't cutting it any more?
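
To see how often each kind of message occurs, the log lines can be tallied by level. A rough sketch, assuming lines shaped like the excerpts above (date, time, then the level as the third field); the helper name `count_ftl_levels` and the log path in the usage comment are assumptions.

```shell
# Hypothetical helper: count CRIT/ERROR/WARNING lines in FTL log text on stdin.
# Assumes the level is the third whitespace-separated field, as in the
# excerpts quoted in this thread.
count_ftl_levels() {
    awk '$3 ~ /^(CRIT|ERROR|WARNING)$/ { n[$3]++ }
         END { for (l in n) print l, n[l] }' | sort
}

# Usage (log path is an assumption):
#   count_ftl_levels < /var/log/pihole/FTL.log
```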

deHakkelaar · June 26, 2025, 3:42am

Running a filesystem check (fsck) on a live mounted filesystem is a bit tricky.
Do you have another Linux host with an SD card slot where you can perform the fsck?
You could try the command below on the Pi (read-only, no fixing) to see if it detects anything wrong:

sudo fsck -n -f /dev/mmcblk0p2

I still run Pi-hole on a Pi 1B.
The first Pi with Ethernet.
And I've seen folks run Pi-hole on even less like a Pogoplug/stick.

Your two descriptions of the hardware conflict (RPi B vs. RPi2B)!
Which one is it? Run the command below:

cat /proc/device-tree/model; echo

Because, according to that Raspberry Pi link, the Pi 1B (without the +) doesn't have the "low-voltage detection circuitry":

On all models of Raspberry Pi since the Raspberry Pi B+ (2014) except the Zero range, there is low-voltage detection circuitry that will detect if the supply voltage drops below 4.63V (±5%).
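
On the `cat /proc/device-tree/model; echo` command above: the trailing `echo` is there because the model string ends in a NUL byte rather than a newline. A small sketch that strips the NUL instead; the helper name `pi_model` is made up for illustration.

```shell
# Hypothetical helper: print the board model from a device-tree model file.
# The string in that file is NUL-terminated, so remove the NUL and add a
# newline for clean terminal output.
pi_model() {
    file="${1:-/proc/device-tree/model}"
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
    printf '%s\n' "$(tr -d '\000' < "$file")"
}
```

Called without an argument on the Pi, it reads `/proc/device-tree/model`.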

From the above two, I suspect a DNS loop or partial loop is troubling your setup.
What does the command below show for upstreams?

dig +short @localhost servers.bind chaos txt

And did you configure the router WAN/Internet DNS settings to point to the Pi-hole IP?

Rob_Hilken · June 26, 2025, 5:34pm

I can run a scan on another linux machine.

To answer your other questions though...

RPi model is: Raspberry Pi Model B Rev 1

The dig is giving:

;; communications error to ::1#53: timed out
;; communications error to ::1#53: timed out
;; communications error to ::1#53: timed out
;; communications error to 127.0.0.1#53: timed out
;; no servers could be reached

I haven't currently got the router configured to use the Pi-hole for DNS as I need internet access.

I'm going to scan the SD card now and will post the results.

deHakkelaar · June 26, 2025, 11:50pm

That's the Pi 1B without the voltage check.
Below is the output from a Pi 1B+:

$ cat /proc/device-tree/model; echo
Raspberry Pi Model B Rev 2

So it might be worth testing with another AC adapter or USB cable for power.
Pis are infamous for crashing and corrupting storage when the power isn't OK.

What is the output of the command below instead?

sudo grep server= /etc/pihole/dnsmasq.conf

Pending the outcome of the fsck: if there are many errors, I'd recommend reflashing the SD card instead of trying to repair it with fsck.

Rob_Hilken · July 1, 2025, 12:56pm

OK, I have now bought a new AC adapter, and there are no bad blocks on the SD card, so hopefully everything from here on in will be configuration issues!

Results of sudo grep server= /etc/pihole/dnsmasq.conf:

server=8.8.8.8
server=8.8.4.4
server=2001:4860:4860:0:0:0:0:8888
server=2001:4860:4860:0:0:0:0:8844
server=208.67.222.222
server=208.67.220.220
server=2620:119:35::35
server=2620:119:53::53
server=4.2.2.1
server=4.2.2.2
server=8.26.56.26
server=8.20.247.20
server=84.200.69.80
server=84.200.70.40
server=2001:1608:10:25:0:0:1c04:b12f
server=2001:1608:10:25:0:0:9249:d69b
server=9.9.9.10
server=149.112.112.10
server=2620:fe::10
server=2620:fe::fe:10
server=1.1.1.1
server=1.0.0.1
server=2606:4700:4700::1111
server=2606:4700:4700::1001
server=127.0.0.1#5335
server=/test/
server=/localhost/
server=/invalid/
server=/bind/
server=/onion/

If I want to use unbound, should I have removed all of the other upstream servers?

deHakkelaar · July 1, 2025, 6:48pm

I don't see any servers that could create a loop or a partial one.
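
For reference, a rough way to scan such a `server=` list for obvious loop candidates. This sketch only flags loopback upstreams on port 53 (no port given means port 53); the Pi-hole's own LAN address would have to be checked by hand, and the helper name `find_loop_candidates` is made up.

```shell
# Hypothetical helper: from dnsmasq "server=" lines on stdin, print upstreams
# that point back at a resolver on the local host's port 53.
# Domain-specific entries such as server=/test/ are ignored.
# Returns non-zero when no candidate is found.
find_loop_candidates() {
    sed -n 's/^server=//p' |
        grep -v '^/' |
        grep -E '^(127\.0\.0\.1|::1)(#53)?$'
}

# Usage:
#   sudo grep server= /etc/pihole/dnsmasq.conf | find_loop_candidates || echo "no loopback upstream on port 53"
```

An entry like `127.0.0.1#5335` is not flagged, since it points at a different port.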

Yes.
Check the "Improve detection algorithm ..." paragraph below for which server(s) Pi-hole will prefer:

But before you remove the others, make sure Unbound is functioning properly by running the commands below:

$ dig +noall +comments +answer @127.0.0.1 -p 5335 bogus.nlnetlabs.nl
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 5007
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232

$ dig +noall +comments +answer +ad @127.0.0.1 -p 5335 cloudflare.com
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 59669
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; ANSWER SECTION:
cloudflare.com.         300     IN      A       104.16.132.229
cloudflare.com.         300     IN      A       104.16.133.229

The first command should return a SERVFAIL status and no IP address ANSWER.
The second should return a NOERROR status plus an IP address in the ANSWER section in addition to an ad flag.
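
The two expectations above can also be checked mechanically by reading the status and the ad flag out of the dig output. A sketch only, assuming output in the `+comments` layout shown above; the helper name `dig_summary` is made up.

```shell
# Hypothetical helper: summarise dig output on stdin as "<STATUS> ad" or
# "<STATUS> no-ad". Exits non-zero when no header line with a status is found.
dig_summary() {
    awk '
        BEGIN { ad = "no-ad" }
        /->>HEADER<<-/ {
            for (i = 1; i < NF; i++)
                if ($i == "status:") { status = $(i + 1); sub(/,$/, "", status) }
        }
        /^;; flags:/ {
            flags = $0
            sub(/^;; flags: */, "", flags)   # drop the prefix
            sub(/;.*/, "", flags)            # keep only the flag list
            if ((" " flags " ") ~ / ad /) ad = "ad"
        }
        END {
            if (status == "") exit 1
            print status, ad
        }
    '
}

# Usage:
#   dig +noall +comments +answer +ad @127.0.0.1 -p 5335 cloudflare.com | dig_summary
```

For the two sample outputs above, this prints `SERVFAIL no-ad` and `NOERROR ad` respectively.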

PS: those dig commands are from the pull request below, which adjusts the ones in the official Pi-hole guide:

github.com/pi-hole/docs

Fix Bogus DNSSEC Validation Domain in unbound.md

master ← HeliusMagnum:master

opened 05:31AM - 12 Jun 25 UTC

HeliusMagnum

+3 -3

**What does this PR aim to accomplish?:**
Fix the issue with `fail01.dnssec.works` returning `NOERROR`

**How does this PR accomplish the above?:**
Changes the test domain for a bogus DNSSEC validation to bogus.nlnetlabs.nl which returns `SERVERFAIL`
Changes the test domain for a successful DNSSEC validation to `cloudflare.com` to avoid any future misconfiguration with the `dnssec.works` domains, specifying the `+ad` flag.

resolves #1251

Rob_Hilken · July 1, 2025, 8:13pm

Both digs returned correctly.

Still getting lots of errors in the FTL logs

2025-07-01 20:56:47.837 WARNING Long-term load (15min avg) larger than number of processors: 1.7 > 1
2025-07-01 20:56:48.858 ERROR add_message(type=6, message=excessive load) - SQL error step DELETE: database is locked
2025-07-01 20:56:48.859 ERROR Error while trying to close database: database is locked

deHakkelaar · July 1, 2025, 8:20pm

Is it pihole-FTL that's causing the high load?
You can check with the top or htop commands while those messages are appearing.

EDIT: Oh, did you run a Pi-hole gravity pull manually at those times?
That can cause a bit of excessive load on a Pi 1B.
But the scheduled one runs early on Sunday morning, so it shouldn't trouble you:

$ cat /etc/cron.d/pihole
[..]
# Pi-hole: Update the ad sources once a week on Sunday at a random time in the
#          early morning. Download any updates from the adlists
#          Squash output to log, then splat the log to stdout on error to allow for
#          standard crontab job error handling.
41 3   * * 7   root    PATH="$PATH:/usr/sbin:/usr/local/bin/" pihole updateGravity >/var/log/pihole/pihole_updateGravity.log || cat /var/log/pihole/pihole_updateGravity.log

Rob_Hilken · July 1, 2025, 8:54pm

Yes it is pihole-FTL, and no not running a gravity update.

deHakkelaar · July 1, 2025, 8:57pm

If you tail/follow the logs with the command below, is there an excessive number of queries?
If so, can you post a snippet of those excessive queries?

sudo pihole tail

What do the commands below output?

sudo pihole-FTL sqlite3 /etc/pihole/pihole-FTL.db "pragma integrity_check"

sudo pihole-FTL sqlite3 /etc/pihole/gravity.db "pragma integrity_check"

Do you run any external software that interacts with Pi-hole?
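
On reading the integrity checks above: SQLite's `integrity_check` pragma returns a single row containing `ok` when it finds no problems, and one row per problem otherwise. A tiny sketch for use in scripts; the helper name `integrity_ok` is made up.

```shell
# Hypothetical helper: succeed only when the integrity_check output read from
# stdin is exactly the single line "ok".
integrity_ok() {
    [ "$(cat)" = "ok" ]
}

# Usage:
#   sudo pihole-FTL sqlite3 /etc/pihole/gravity.db "pragma integrity_check" | integrity_ok \
#       && echo "gravity.db passed" || echo "gravity.db reported problems"
```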

Rob_Hilken · July 1, 2025, 9:08pm

I ran a pihole -d again. Results at https://tricorder.pi-hole.net/XLSAVobk/

sudo pihole tail

I can't see any evidence of an excessive load.

What do below ones output?

sudo pihole-FTL sqlite3 /etc/pihole/pihole-FTL.db "pragma integrity_check"

ok

sudo pihole-FTL sqlite3 /etc/pihole/gravity.db "pragma integrity_check"

ok

Do you run any external software that interacts with Pi-hole?

I run PiVPN, but I don't think it interacts with Pi-hole.

deHakkelaar · July 1, 2025, 9:15pm

OK, I have no clue.
Maybe a dev/mod can spot something wrong in the debug log.

Rob_Hilken · July 1, 2025, 9:21pm

Thanks a million for your help. I'm going to try a reinstall if I can't get to the bottom of it.

deHakkelaar · July 2, 2025, 3:37am

I too was in the process of reinstalling the whole thing on my two Pi 1Bs, so I created the below for reference:

Below is the current idle load doing nothing:

$ uptime
 05:32:34 up  1:28,  2 users,  load average: 0.08, 0.05, 0.20

I have not switched it live yet, but I don't expect the load to increase that much.
Maybe .40 or .50 max, as it was with the Pi-hole v5 release.

deHakkelaar · July 2, 2025, 1:32pm

FYI, I've installed Unbound on top and configured the Pi to provide DHCP services for my LAN.
It's live now and getting hammered by a Samsung TV (a query every second):

$ uptime
 15:26:55 up 11:22,  1 users,  load average: 0.51, 0.41, 0.47

And if I run a gravity pull (below is the max):

$ uptime
 15:29:09 up 11:24,  1 users,  load average: 1.66, 0.70, 0.55

Rob_Hilken · July 3, 2025, 3:21pm

I reinstalled unbound and pi-hole, removed PiVPN from that device and added a second pi-hole and things are running OK again now. Load is still quite high, but manageable I think, and the database seems stable.

$ uptime
 16:20:16 up 11 min,  1 user,  load average: 1.30, 1.28, 0.88

deHakkelaar · July 3, 2025, 5:16pm

That's still a bit strange, because when I disconnect that Samsung TV of mine, the load settles to the below:

$ uptime
 19:13:30 up 1 day, 15:09,  1 user,  load average: 0.25, 0.23, 0.21

With about 16 clients connected (Linux, Windows, Android, iOS).

deHakkelaar · July 4, 2025, 8:54pm

I hooked up the Samsung TV again and it hammered my Pi 1B for a good 5 minutes with Netflix-related queries, even though I don't have a subscription.
Some were blocked and some were not.
After that I monitored with the command below, and the max I saw was .55, but that was for the 1-minute interval.
The 15-minute interval max was about .20.

watch --exec awk '{print $1,$2,$3}' /proc/loadavg

This is what I mostly get with about 16 clients:

Every 2.0s: awk {print $1,$2,$3} /proc/loadavg ph6a: Fri Jul  4 22:47:38 2025

0.24 0.18 0.18

FYI:

$ man proc
[..]
       /proc/loadavg
              The first three fields in this file are load average figures
              giving the number of jobs in the  run  queue  (state  R)  or
              waiting  for  disk  I/O (state D) averaged over 1, 5, and 15
              minutes.  They are the same  as  the  load  average  numbers
              given  by  uptime(1)  and  other programs.  The fourth field
              consists of two numbers separated by a slash (/).  The first
              of these is the number of currently runnable kernel schedul‐
              ing entities (processes,  threads).   The  value  after  the
              slash  is the number of kernel scheduling entities that cur‐
              rently exist on the system.  The fifth field is the  PID  of
              the process that was most recently created on the system.
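
The "Long-term load (15min avg) larger than number of processors" warning quoted earlier in the thread compares the third of these figures with the CPU count. The same comparison as a sketch (how FTL computes it internally is not shown in this thread); the helper name `load_exceeds_cpus` is made up.

```shell
# Hypothetical helper: succeed when the 15-minute load average exceeds the
# number of processors.
# $1: a line in /proc/loadavg format, $2: number of processors.
load_exceeds_cpus() {
    printf '%s\n' "$1" | awk -v n="$2" '{ exit !($3 + 0 > n + 0) }'
}

# Usage:
#   load_exceeds_cpus "$(cat /proc/loadavg)" "$(nproc)" && echo "15-minute load is above the CPU count"
```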