pihole-FTL using all my CPU and breaking all internet connectivity


#1


Expected Behaviour:

PiHole should be resolving DNS queries.

Actual Behaviour:

The PiHole admin page shows a high load, and devices lose internet access and disconnect. Smart TVs and voice assistants disconnect and say they are having trouble connecting. Android phones show an X on the Wi-Fi signal bars and fall back to cellular data, although they can still reach other local clients.
pihole-FTL was showing 70-100% CPU usage in top.
I tried to debug using the admin interface, but it just froze, likely because it was overloaded.
pihole -d hung at “[ DIAGNOSING ]: Web version” for 3 hours.
After rebooting, killing pihole-FTL, and then repairing with “pihole -r”, the debug run went through, but the log did not show any of the problems.
None of the debug logs uploaded. Six of them hung, and the seventh finished the debug run but still failed to upload; its log showed no problems because it was generated after the repair.
I dumped “pihole -t” and saw about 3 hours of this line repeating:

Jan  1 19:32:19 dnsmasq[668]: query[SRV] _ldap._tcp.dc._msdcs.WORKGROUP.home from 192.168.1.1
Jan  1 19:32:19 dnsmasq[668]: forwarded _ldap._tcp.dc._msdcs.WORKGROUP.home to 192.168.1.1

192.168.1.1 is my router running DDWRT.
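To see how much a single looping query dominates the log, one option is to count how often each domain was forwarded to the router. This is a minimal sketch, assuming the dnsmasq-style lines shown above and the default log location /var/log/pihole.log; the function name `top_forwards` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the domains most often forwarded to one upstream.
# Assumes dnsmasq log lines of the form shown above:
#   Jan  1 19:32:19 dnsmasq[668]: forwarded <domain> to <upstream>
top_forwards() {
  local log="$1" upstream="$2"
  # Field 5 is the action, 6 the domain, 7 the word "to", 8 the upstream.
  awk -v up="$upstream" '$5 == "forwarded" && $7 == "to" && $8 == up { print $6 }' "$log" \
    | sort | uniq -c | sort -rn | head -n 5
}

# Example (reading the log may need sudo):
#   top_forwards /var/log/pihole.log 192.168.1.1
```

If one domain such as _ldap._tcp.dc._msdcs.WORKGROUP.home accounts for nearly all of the lines, that suggests the same query bouncing between the Pi-hole and the router rather than many clients genuinely asking for it.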

My PiHole is running on a Pi Zero W with Raspbian installed via NOOBS, and PiHole is the only thing installed on it.

I see this problem daily: it kills internet access, and devices disconnect because they have no connection.

Debug Token:

All 7 debug logs failed to upload. Six of them hung at step 3, DIAGNOSING, until I stopped them with Ctrl+C.
I pulled pihole_debug-sanitized.log and can upload it privately somewhere, but the run only got past DIAGNOSING after pihole -r had finished, so the log is of a perfectly working PiHole.


#2

How many domains are shown on blocklist in your dashboard?

Do you have conditional forwarding enabled?

What upstream DNS server(s) is Pi-Hole using?


#3

How many domains are shown on blocklist in your dashboard?
707,607

Do you have conditional forwarding enabled?
Yes. 192.168.1.1 and home

What upstream DNS server(s) is Pi-Hole using?
Both boxes are checked for IPv4; no IPv6, because my ISP doesn’t offer it.
Google (ECS), OpenDNS (ECS), Cloudflare

I just tried to upload another “pihole -d -a” log and it failed again with:

[i] Debug script running in automated mode
    * Using openssl for transmission.
[✗]  There was an error uploading your debug log.
   * Please try again or contact the Pi-hole team for assistance.
   * A local copy of the debug log can be found at: /var/log/pihole_debug-sanitized.log

Rebooting the Pi is the only thing that fixes it once it starts overloading like this.


#4

Let’s get some of the load off it.

Revert back to the 7 original blocklists and rebuild gravity.

Turn off conditional forwarding.

Restart FTL with sudo service pihole-FTL restart

After that see if you can upload a debug log.


#5

[✓] Your debug token is: 6g0cilo8ya

I tried removing all my blocklists, but then gravity wouldn’t rebuild. I rebooted the Pi, but gravity still wouldn’t rebuild because the Pi couldn’t resolve DNS.
I went into my router and pointed its DNS at both Google and my Pi-hole, and then the debug log uploaded.
The problem seems to be that the Pi-hole can’t get DNS, or is asking my router to resolve, which won’t work because the router is set to use only my Pi-hole for DNS.
My router’s DNS server is set to the Pi-hole, and I used the router’s dnsmasq service to hand out the Pi-hole as the DNS server with dhcp-option=6.

It looks like my setup is causing the resolution issues, but why is my Pi-hole asking my router for DNS when the router just sends back the Pi-hole’s own IP?

If I leave my router set up like this, ads and tracking get through because my DNS servers 2 and 3 are Google.
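The circular forwarding described in this post can be illustrated with a toy model in which every resolver hands a query it cannot answer to exactly one upstream. This is only a sketch of the idea, not how Pi-hole or dnsmasq work internally; the names `upstream_of` and `follow_chain` and the node labels are hypothetical, and it needs bash 4 or newer for associative arrays.

```shell
#!/usr/bin/env bash
# Toy model (hypothetical, not Pi-hole internals): each resolver forwards a
# query it cannot answer to a single upstream. Requires bash 4+.
declare -A upstream_of=()

# Follow the forwarding chain from a starting resolver. Prints the path and
# returns 1 if some resolver is reached twice, i.e. the query would loop.
follow_chain() {
  local node="$1" path=""
  local -A seen=()
  while [[ -n "$node" ]]; do
    if [[ -n "${seen[$node]:-}" ]]; then
      echo "loop: $path -> $node"
      return 1
    fi
    seen[$node]=1
    path="${path:+$path -> }$node"
    node="${upstream_of[$node]:-}"
  done
  echo "ok: $path"
}

# Example: local ".home" names sent by the Pi-hole to the router, while the
# router only knows the Pi-hole as its DNS server.
#   upstream_of[pihole]=router
#   upstream_of[router]=pihole
#   follow_chain pihole   # prints "loop: pihole -> router -> pihole"
```

With a public server as the Pi-hole’s upstream instead (for example upstream_of[pihole]=8.8.8.8), the chain starting at the router ends at 8.8.8.8 rather than returning to the Pi-hole.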


#6

OK, with token in hand we see the problem.

pihole-FTL is not running (but is trying to run). The reason it isn’t running is that in the previous 24 hours, you had the following activity on the Pi-Hole (from the tail of the pihole-FTL.log near the end of your debug log). The offender here is the 3,263,220 queries in that time (that’s a lot). These queries are likely circular traffic between the Pi-Hole and the router, but could be another problem.

   [2019-01-01 23:26:25.138]  -> Total DNS queries: 3263220
   [2019-01-01 23:26:25.138]  -> Cached DNS queries: 7639
   [2019-01-01 23:26:25.138]  -> Forwarded DNS queries: 3254018
   [2019-01-01 23:26:25.139]  -> Exactly blocked DNS queries: 1544
   [2019-01-01 23:26:25.139]  -> Unknown DNS queries: 19
   [2019-01-01 23:26:25.139]  -> Unique domains: 859
   [2019-01-01 23:26:25.139]  -> Unique clients: 7
   [2019-01-01 23:26:25.139]  -> Known forward destinations: 7
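To put those numbers in perspective: the summary covers roughly the previous 24 hours (86,400 seconds), so 3,263,220 queries averages out to about 38 queries per second, and almost all of them were forwarded rather than answered from cache or blocked. A minimal sketch that pulls those two figures out of the summary lines, assuming the “-> Total DNS queries:” and “-> Forwarded DNS queries:” format shown above; the function name `ftl_load_summary` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarise FTL's startup statistics from its log.
# Assumes the counts cover a 24-hour window (86400 s), as FTL imports the
# previous 24 hours on start. If the log holds several summaries, the last one wins.
ftl_load_summary() {
  awk -F': ' '
    /-> Total DNS queries:/     { total = $NF }
    /-> Forwarded DNS queries:/ { fwd = $NF }
    END {
      if (total > 0)
        printf "%.1f queries/s, %.1f%% forwarded\n", total / 86400, 100 * fwd / total
    }' "$1"
}

# Example: ftl_load_summary /var/log/pihole-FTL.log
```

For the figures above this prints “37.8 queries/s, 99.7% forwarded”.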

When pihole-FTL starts, the default behavior is to read the previous 24 hours of data from the long term database. The same section of the debug log shows FTL allocating more and more resources until your Pi runs out of RAM and misbehaves.

   [2019-01-01 23:26:23.151] Notice: Increasing queries struct size from 3220000 to 3230000
   [2019-01-01 23:26:23.550] Notice: Increasing queries struct size from 3230000 to 3240000
   [2019-01-01 23:26:23.979] Notice: Increasing queries struct size from 3240000 to 3250000
   [2019-01-01 23:26:24.469] Notice: Increasing queries struct size from 3250000 to 3260000
   [2019-01-01 23:26:24.970] Notice: Increasing queries struct size from 3260000 to 3270000

The solution has two parts: first, move the long-term database to another file name (so you don’t lose the data; or simply delete it if you don’t care about the previous data); then restart FTL with conditional forwarding off (your current setting) and see whether the behavior improves. We use the methods described in this part of the documentation (https://docs.pi-hole.net/ftldns/database/).

sudo service pihole-FTL stop
sudo mv /etc/pihole/pihole-FTL.db /etc/pihole/pihole-FTL.db.old
sudo service pihole-FTL start

This should get you up and running again. Watch the number of queries for the first 24 hours. If they start climbing rapidly to the millions, the root cause is not corrected and the problem will recur. If they stay in the tens of thousands (typical for a home network), then you are set.
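One way to watch that count is to query the long-term database directly. The sketch below assumes the `queries` table with a Unix-time `timestamp` column described in the FTL database documentation linked above, and that the `sqlite3` command-line tool is installed; the function names and the 100,000 / 1,000,000 cut-offs are illustrative assumptions that encode the rule of thumb above, not official limits.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count queries stored in the FTL long-term database
# during the last 24 hours. Assumes a `queries` table with a Unix-time
# `timestamp` column and the sqlite3 CLI (reading the file may need sudo).
queries_last_24h() {
  local db="${1:-/etc/pihole/pihole-FTL.db}"
  sqlite3 "$db" \
    "SELECT COUNT(*) FROM queries WHERE timestamp >= strftime('%s','now') - 86400;"
}

# Rough verdict following this thread's rule of thumb: tens of thousands a
# day is typical for a home network, millions means the loop is back.
# The cut-off values are illustrative assumptions.
classify_daily_queries() {
  local n="$1"
  if (( n >= 1000000 )); then
    echo "likely loop"
  elif (( n >= 100000 )); then
    echo "elevated, keep watching"
  else
    echo "typical"
  fi
}

# Example: classify_daily_queries "$(queries_last_24h)"
```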

And, don’t forget to put your router back pointing to Pi-Hole DNS only, and remove the entries that bypass Pi-Hole.

This thread from a few weeks ago is related:


#7

Thank you so much for looking over my log and explaining what was happening.
I am super new to pihole, so learning about these problems and how to fix them really helps. And as a result of that, I am also becoming familiar with how to use Dnsmasq.

I tried flushing the log several times, but it would just hang. I think pihole.log was about 5 GB, which, combined with a maxed-out CPU, probably didn’t help.

Last night I ran “pihole uninstall” and started over from scratch, noting every major change I made, and found a few mistakes.

I referenced this guide: How do I configure my devices to use Pi-hole as their DNS server?

First, on the DD-WRT router, I used dnsmasq to set DNS, but I also had 3 DNS entries in my static DNS fields: my Pi-hole IP plus two other, inactive IPs on the LAN. I also had “local DNS” and “gateway” filled out. This was a combination of number 1 and the first part of number 2 in that guide.
Step 2 of that setup guide should really say to set those entries to 0.0.0.0.
I am now only using dnsmasq and the other entries are all 0.0.0.0.

Second, on the Pi-hole’s DNS settings page, I didn’t have my router IP set as an upstream DNS server; I had only the public upstream DNS servers checked (Google, OpenDNS, etc.).
Another variation that didn’t work for me was having my custom upstream set to my router AND a public DNS server checked. This resulted in the Pi being unable to resolve.

I haven’t turned on conditional forwarding yet, but I am going to do that tonight.

Why can’t I check only a public upstream DNS server, with no custom upstream pointing to my router? And why can’t I combine a public upstream DNS server with a custom upstream entry for my router? (Both result in the Pi itself being unable to resolve.)


#8

You can do this, but this is the setup for Option 1.

There is no benefit to this. Either use the method of Option 1 or Option 2, not both.


closed #9

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.