Pi-hole becomes unresponsive during pihole -d and -q

Just checked, and I do have this on release/v5.0, so the issue isn’t triggered by new/CNAME_inspection_details.

To the best of my knowledge I was up-to-date last night (11h ago, 9 PM PT) when I posted this first issue and did not have any problems. -q and -d both ran smoothly without issue. I had run -up prior to posting to make sure everything was fresh.

I believe “just” slow is the appropriate diagnosis. Everything times out, and my network reports being “offline”, likely due to DNS resolution timeouts. I was able to connect in a second SSH client, so the whole system isn’t totally unresponsive. Once it finished doing whatever it was doing, everything returns to normal.

Let me know if you want me to open a second thread, as this is now officially a second issue from what was originally posted.

I moved our posts into a new thread. This is likely caused by the debugger doing some very expensive database interactions. I will look at optimizing them over the next few days. We’ve not checked this ourselves with super-large blocking lists which is why it wasn’t all that apparent (just guessing here, though!).

Thanks for moving it!

Just added a clarification to the title, it also becomes unresponsive during -q. These may be two separate issues, but they both only appear to have cropped up with the latest updates

Keep up the good work, I very much appreciate everything you put into this.

I agree that these should be two separate issues.

Concerning the pihole -d issue, this PR should resolve it:

1 Like

Testing out the debugger after using checkout core tweak/debugger_performance

pihole -q still hogs all of my Zero’s resources, making it unresponsive for 5ish minutes.

Running pihole -d took my network down for 15 minutes this morning before I had to connect in a secondary SSH to force a reboot because it was impacting my wife’s morning routine.

I will try and decouple my network from the pi-hole so that I can formally time the process and provide a debug token for you, but long story short, changes in the past week have definitely lengthened the debug process for my ultra low powered system.

Let me know if you want me to set more verbose logging or anything as I transition my network off of my pi hole temporarily.

Okay … this is unexpected since I removed the one instruction that took the longest (absolute!) time in my test debugging process (link to specific line of code):

It’d be very helpful if you could watch it for a bit (not necessarily the entire lengthy time!) and tell us what is on the screen when the first massive delay occurs.

Decoupled my pi-hole this morning so that I can play around with it! Since this is a lengthy process, I’ll be live editing this post until it’s done. Incoming data dump:

time pihole -q google.com

Outputs

/opt/pihole/query.sh: line 37: 27006 Killed                  awk 'NR==FNR{regexps[$0];next}{for (r in regexps)if($0 ~ r)print r}' <(echo "${lists}") <(echo "${domain}") 2> /dev/null
  [i] Over 100 results found for google.com
        This can be overridden using the -all option

real    11m23.794s
user    0m38.839s
sys     0m20.687s

time pihole -d
Started at 8:28

*** [ DIAGNOSING ]: Name resolution (IPv4) using a random blocked domain and a known ad-serving domain

takes almost 30 seconds - seems long for 3 queries

*** [ DIAGNOSING ]: Groups

Started at 8:30, ran until 8:39

*** [ DIAGNOSING ]: Domainlist (0/1 = exact/regex whitelist, 2/3 = exact/regex blacklist)

8:39 start

flew through the first 10-ish items, then ~1 minute for the 11th whitelist entry and now it’s been stuck for a while. It’s worth noting here that the id counter is ordered:
1, 2, 3, 4, 5, 6, 7, 9, 11, 12, 13

This really doesn’t seem like a stage where there should be any delay - is it just reading from the list?

Moved on at about 8:50 and the rest flew by.

Final timing result:

real    23m28.055s
user    0m37.767s
sys     0m22.718s

Debug token for your enjoyment:
https://tricorder.pi-hole.net/52drki7d8c

It’s a bit faster than it was the other day

Pinging @DanSchaper for having a look at the debugger delay as well until the token expires in 47 hours.

Because I’m a masochist, I ran it a second time.

This time I’m getting held up on

*** [ DIAGNOSING ]: Info table

I’m getting the feeling that there may be something running in the background during -d

A necessary thing for a true bug fighter!

I’m pretty sure the debugger is a very simple single-task application but Dan might now more.

@Matt Could you watch htop (or nmon or glances or whatever you prefer) during the debugger run? I expect the majority of CPU power to be consumed by sqlite3 following your description, but it’s not yet clear what is the real bottleneck.

Away we go!

New -d, new pause points…

*** [ DIAGNOSING ]: Dashboard and block page

Imgur

SSH has stopped refreshing at this point. Will update if/when it refreshes.

Update: htop refreshed, pihole -d connection has not changed
Imgur

Thought: It appears they may be related! It looks like the slow -q performance is what’s operating in the background slowing down -d

Since I ran it anyways (~30 min), have another debug token: https://tricorder.pi-hole.net/uv26xfowci

Looks like the awk call in /opt/pihole/query.sh -adlist localhost -bp is eating up resources. That ties in to

2020-02-15 09:47:25: (mod_fastcgi.c.421) FastCGI-stderr: PHP Warning:  file(http://127.0.0.1/admin/scripts/pi-hole/php/queryads.php?domain=localhost&amp;bp): failed to open stream: HTTP request failed! in /var/www/html/pihole/index.php on line 134
   2020-02-15 09:47:25: (mod_fastcgi.c.421) FastCGI-stderr: PHP Warning:  array_filter() expects parameter 1 to be array, string given in /var/www/html/pihole/index.php on line 135
   2020-02-15 09:47:25: (mod_fastcgi.c.421) FastCGI-stderr: PHP Warning:  array_values() expects parameter 1 to be array, null given in /var/www/html/pihole/index.php on line 135

A hostname of localhost -bp doesn’t make any sense to me.

This is the line of code that is being triggered: https://github.com/pi-hole/pi-hole/blob/e41c4b5bb691cea1f5b950d39518d8c404b5846e/gravity.sh#L271

Somehow ${domain} is being populated by localhost -bp?

1 Like

Are you running pihole -d from the command line or from the web interface?

Command line

I’m not sure then why lighttpd and php are the parent processes for the calls. It looks like https://github.com/pi-hole/AdminLTE/blob/release/v5.0/scripts/pi-hole/php/queryads.php#L64 is the calling process.

1 Like

Just as a test, can you disable the Conditional Forwarding? I think there may be something looping with having a conditionally forwarded domain be the source for a blocklist/blacklist/whitelist.

Testing now, it’s stuck, so I’m thinking it didn’t change, forgot to pull up htop in a separate session, so I’ll loop back with more details in a minute

I’m not sure why pihole -q is looking up localhost, nor why it is being called via lighttpd and php during pihole -d, I’m sure that’s it’s own problem that this may have brought to light…

Rather than do a full pihole -d (takes about 30 minutes), I ran time pihole -q google.com (only takes 11 minutes) after disabling conditional forwarding. It has not improved performance, and I’m still having the same issue. Here’s a screenshot:

Imgur

The CPU usage fluctuates during the process, but RAM stays completely throttled the entire time.

That code hasn’t changed in 6 months or so, but let’s pull it apart.

sudo bash -v -x /opt/pihole/query.sh google.com and be prepared for a whole lot of output.

I’m going in! Wish me luck!

Will DM results, as they are likely to be long and probably contain some extra data on my network setup.

Initial result - it seems to be getting hung up while or just after going through my regex list. Will try removing all regex and see if that fixes it.