Pi-hole becomes unresponsive during pihole -d and -q

Matt · February 11, 2020, 2:51pm

pihole checkout ftl new/CNAME_inspection_details has allowed me to identify the CNAME which was being blocked (gstaticadssl.l.google.com)

However, on the new branch, querying the lists is now SIGNIFICANTLY more resource intensive, going from ~20-30 seconds to 10+ minutes, and slowing my puny Pi Zero to a crawl in the process (load averages pushing 8.0), breaking DNS resolution during the time.

When the query finally finished, it threw out this bit of text alongside the expected list data:

/opt/pihole/query.sh: line 37: 20058 Killed awk 'NR==FNR{regexps[$0];next}{for (r in regexps)if($0 ~ r)print r}' <(echo "${lists}") <(echo "${domain}") 2> /dev/null

Not sure if that helps...

Matt · February 11, 2020, 3:14pm

Just for good measure, here's a debug token:
https://tricorder.pi-hole.net/qvlfs0vypi

Generating the debug token took my network down during the *** [ DIAGNOSING ]: Gravity List and Database phase

DL6ER · February 11, 2020, 3:44pm

This checkout does only exchange the FTL binary, it does not change anything at all in the core code. Please check if you see this as well with FTL on the release/v5.0 branch. When did you last update? I updated the new/CNAME_inspection_details branch only a few hours ago to incorporate all the recent speed enhancements we've added to the beta.

Do you know whether your Pi-hole became completely unresponsive or "just" slow?

Matt · February 11, 2020, 4:24pm

Just checked, and I do have this on release/v5.0, so the issue isn't triggered by new/CNAME_inspection_details.

To the best of my knowledge I was up-to-date last night (11h ago, 9 PM PT) when I posted this first issue and did not have any problems. -q and -d both ran smoothly without issue. I had run -up prior to posting to make sure everything was fresh.

I believe "just" slow is the appropriate diagnosis. Everything times out, and my network reports being "offline", likely due to DNS resolution timeouts. I was able to connect in a second SSH client, so the whole system isn't totally unresponsive. Once it finished doing whatever it was doing, everything returns to normal.

Let me know if you want me to open a second thread, as this is now officially a second issue from what was originally posted.

DL6ER · February 11, 2020, 4:30pm

I moved our posts into a new thread. This is likely caused by the debugger doing some very expensive database interactions. I will look at optimizing them over the next few days. We've not checked this ourselves with super-large blocking lists which is why it wasn't all that apparent (just guessing here, though!).

Matt · February 11, 2020, 5:23pm

Thanks for moving it!

Just added a clarification to the title, it also becomes unresponsive during -q. These may be two separate issues, but they both only appear to have cropped up with the latest updates

Keep up the good work, I very much appreciate everything you put into this.

DL6ER · February 12, 2020, 6:56pm

I agree that these should be two separate issues.

Concerning the pihole -d issue, this PR should resolve it:
https://github.com/pi-hole/pi-hole/pull/3131

Matt · February 14, 2020, 4:07am

Testing out the debugger after using checkout core tweak/debugger_performance

pihole -q still hogs all of my Zero's resources, making it unresponsive for 5ish minutes.

Running pihole -d took my network down for 15 minutes this morning before I had to connect in a secondary SSH to force a reboot because it was impacting my wife's morning routine.

I will try and decouple my network from the pi-hole so that I can formally time the process and provide a debug token for you, but long story short, changes in the past week have definitely lengthened the debug process for my ultra low powered system.

Let me know if you want me to set more verbose logging or anything as I transition my network off of my pi hole temporarily.

DL6ER · February 15, 2020, 7:03am

Okay ... this is unexpected since I removed the one instruction that took the longest (absolute!) time in my test debugging process (link to specific line of code):

https://github.com/pi-hole/pi-hole/pull/3131/files#diff-499b751268b08d94233fe28bd8001e56L1138

It'd be very helpful if you could watch it for a bit (not necessarily the entire lengthy time!) and tell us what is on the screen when the first massive delay occurs.

Matt · February 15, 2020, 4:27pm

Decoupled my pi-hole this morning so that I can play around with it! Since this is a lengthy process, I'll be live editing this post until it's done. Incoming data dump:

time pihole -q google.com

Outputs

/opt/pihole/query.sh: line 37: 27006 Killed                  awk 'NR==FNR{regexps[$0];next}{for (r in regexps)if($0 ~ r)print r}' <(echo "${lists}") <(echo "${domain}") 2> /dev/null
  [i] Over 100 results found for google.com
        This can be overridden using the -all option

real    11m23.794s
user    0m38.839s
sys     0m20.687s

time pihole -d
Started at 8:28

*** [ DIAGNOSING ]: Name resolution (IPv4) using a random blocked domain and a known ad-serving domain

takes almost 30 seconds - seems long for 3 queries

*** [ DIAGNOSING ]: Groups

Started at 8:30, ran until 8:39

*** [ DIAGNOSING ]: Domainlist (0/1 = exact/regex whitelist, 2/3 = exact/regex blacklist)

8:39 start

flew through the first 10-ish items, then ~1 minute for the 11th whitelist entry and now it's been stuck for a while. It's worth noting here that the id counter is ordered:
1, 2, 3, 4, 5, 6, 7, 9, 11, 12, 13

This really doesn't seem like a stage where there should be any delay - is it just reading from the list?

Moved on at about 8:50 and the rest flew by.

Final timing result:

real    23m28.055s
user    0m37.767s
sys     0m22.718s

Debug token for your enjoyment:
https://tricorder.pi-hole.net/52drki7d8c

It's a bit faster than it was the other day

DL6ER · February 15, 2020, 5:11pm

Pinging @DanSchaper for having a look at the debugger delay as well until the token expires in 47 hours.

Matt · February 15, 2020, 5:30pm

Because I'm a masochist, I ran it a second time.

This time I'm getting held up on

*** [ DIAGNOSING ]: Info table

I'm getting the feeling that there may be something running in the background during -d

DL6ER · February 15, 2020, 5:38pm

A necessary thing for a true bug fighter!

I'm pretty sure the debugger is a very simple single-task application but Dan might now more.

@Matt Could you watch htop (or nmon or glances or whatever you prefer) during the debugger run? I expect the majority of CPU power to be consumed by sqlite3 following your description, but it's not yet clear what is the real bottleneck.

Matt · February 15, 2020, 5:53pm

Away we go!

New -d, new pause points...

*** [ DIAGNOSING ]: Dashboard and block page

SSH has stopped refreshing at this point. Will update if/when it refreshes.

Update: htop refreshed, pihole -d connection has not changed

Thought: It appears they may be related! It looks like the slow -q performance is what's operating in the background slowing down -d

Since I ran it anyways (~30 min), have another debug token: https://tricorder.pi-hole.net/uv26xfowci

DanSchaper · February 16, 2020, 12:59am

Looks like the awk call in /opt/pihole/query.sh -adlist localhost -bp is eating up resources. That ties in to

2020-02-15 09:47:25: (mod_fastcgi.c.421) FastCGI-stderr: PHP Warning:  file(http://127.0.0.1/admin/scripts/pi-hole/php/queryads.php?domain=localhost&amp;bp): failed to open stream: HTTP request failed! in /var/www/html/pihole/index.php on line 134
   2020-02-15 09:47:25: (mod_fastcgi.c.421) FastCGI-stderr: PHP Warning:  array_filter() expects parameter 1 to be array, string given in /var/www/html/pihole/index.php on line 135
   2020-02-15 09:47:25: (mod_fastcgi.c.421) FastCGI-stderr: PHP Warning:  array_values() expects parameter 1 to be array, null given in /var/www/html/pihole/index.php on line 135

A hostname of localhost -bp doesn't make any sense to me.

This is the line of code that is being triggered: pi-hole/gravity.sh at e41c4b5bb691cea1f5b950d39518d8c404b5846e · pi-hole/pi-hole · GitHub

Somehow ${domain} is being populated by localhost -bp?

DanSchaper · February 16, 2020, 1:18am

Are you running pihole -d from the command line or from the web interface?

Matt · February 16, 2020, 1:19am

Command line

DanSchaper · February 16, 2020, 1:21am

I'm not sure then why lighttpd and php are the parent processes for the calls. It looks like https://github.com/pi-hole/AdminLTE/blob/release/v5.0/scripts/pi-hole/php/queryads.php#L64 is the calling process.

DanSchaper · February 16, 2020, 1:25am

Just as a test, can you disable the Conditional Forwarding? I think there may be something looping with having a conditionally forwarded domain be the source for a blocklist/blacklist/whitelist.

Matt · February 16, 2020, 1:38am

Testing now, it's stuck, so I'm thinking it didn't change, forgot to pull up htop in a separate session, so I'll loop back with more details in a minute