After updating from v5 to v6: "Maximum number of concurrent DNS queries reached (max: 150)" and DNS does not work at all

Hi!

I have already opened a bug report on GitHub, but I also wanted to take the opportunity to describe the problem here. I don't know whether it is directly related to the upgrade to the v6 development branch, but everything worked until noon today, before the update.


Versions

Pi-hole version is development-v6 v5.17.1-161-g2841a33 (Latest: v5.17.2)
web version is development-v6 v5.19-334-ge3a547df (Latest: v5.20.2)
FTL version is development-v6 vDev-c5067b3 (Latest: v5.23)

Platform

Raspberry Pi 3, Raspberry Pi OS 32-bit

Expected behavior

I expected Pi-hole to keep working after upgrading to development-v6 and removing PHP and lighttpd.

Actual behavior / bug

Pi-hole stopped working. DNS requests failed on all devices that use Pi-hole.
Then I got this error:
Maximum number of concurrent DNS queries reached (max: 150)

I followed this to fix my issue.
Edit: Nope, it didn't fix my issue. The issue is still active.
As in that thread, there was no 02-custom-settings.conf in /etc/dnsmasq.d.

I also see many, many requests like these in my pihole.log:

2023-10-15 13:49:51.666 dnssec-query[DNSKEY] . to 208.67.222.222
2023-10-15 13:49:51.684 reply . is DNSKEY keytag 46780, algo 8
2023-10-15 13:49:51.684 reply . is DNSKEY keytag 20326, algo 8
2023-10-15 13:49:51.684 dnssec-query[DNSKEY] . to 208.67.222.222
2023-10-15 13:49:51.700 reply . is DNSKEY keytag 46780, algo 8
2023-10-15 13:49:51.700 reply . is DNSKEY keytag 20326, algo 8
2023-10-15 13:49:51.701 dnssec-query[DNSKEY] . to 208.67.222.222
2023-10-15 13:49:51.722 reply . is DNSKEY keytag 46780, algo 8
2023-10-15 13:49:51.723 reply . is DNSKEY keytag 20326, algo 8
2023-10-15 13:49:51.723 dnssec-query[DNSKEY] . to 208.67.222.222
2023-10-15 13:49:51.740 reply . is DNSKEY keytag 46780, algo 8
2023-10-15 13:49:51.740 reply . is DNSKEY keytag 20326, algo 8
2023-10-15 13:49:51.741 dnssec-query[DNSKEY] . to 208.67.222.222
2023-10-15 13:49:51.756 reply . is DNSKEY keytag 46780, algo 8
2023-10-15 13:49:51.756 reply . is DNSKEY keytag 20326, algo 8
2023-10-15 13:49:51.757 dnssec-query[DNSKEY] . to 208.67.222.222
2023-10-15 13:49:51.775 reply . is DNSKEY keytag 46780, algo 8
2023-10-15 13:49:51.775 reply . is DNSKEY keytag 20326, algo 8
2023-10-15 13:49:51.776 dnssec-query[DNSKEY] . to 208.67.222.222
2023-10-15 13:49:51.802 reply . is DNSKEY keytag 46780, algo 8
2023-10-15 13:49:51.802 reply . is DNSKEY keytag 20326, algo 8
2023-10-15 13:49:51.803 dnssec-query[DNSKEY] . to 208.67.222.222
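To put a number on how quickly these repeat, here is a minimal Python sketch that extracts the root DNSKEY queries from the excerpt above and measures the gaps between them. It assumes the line format exactly as pasted here; the raw log file may contain an extra field that this simple pattern would not match.

```python
import re
from datetime import datetime

# The pihole.log excerpt from this post, pasted verbatim.
LOG_LINES = """\
2023-10-15 13:49:51.666 dnssec-query[DNSKEY] . to 208.67.222.222
2023-10-15 13:49:51.684 reply . is DNSKEY keytag 46780, algo 8
2023-10-15 13:49:51.684 reply . is DNSKEY keytag 20326, algo 8
2023-10-15 13:49:51.684 dnssec-query[DNSKEY] . to 208.67.222.222
2023-10-15 13:49:51.700 reply . is DNSKEY keytag 46780, algo 8
2023-10-15 13:49:51.700 reply . is DNSKEY keytag 20326, algo 8
2023-10-15 13:49:51.701 dnssec-query[DNSKEY] . to 208.67.222.222
2023-10-15 13:49:51.722 reply . is DNSKEY keytag 46780, algo 8
2023-10-15 13:49:51.723 reply . is DNSKEY keytag 20326, algo 8
2023-10-15 13:49:51.723 dnssec-query[DNSKEY] . to 208.67.222.222
2023-10-15 13:49:51.740 reply . is DNSKEY keytag 46780, algo 8
2023-10-15 13:49:51.740 reply . is DNSKEY keytag 20326, algo 8
2023-10-15 13:49:51.741 dnssec-query[DNSKEY] . to 208.67.222.222
2023-10-15 13:49:51.756 reply . is DNSKEY keytag 46780, algo 8
2023-10-15 13:49:51.756 reply . is DNSKEY keytag 20326, algo 8
2023-10-15 13:49:51.757 dnssec-query[DNSKEY] . to 208.67.222.222
2023-10-15 13:49:51.775 reply . is DNSKEY keytag 46780, algo 8
2023-10-15 13:49:51.775 reply . is DNSKEY keytag 20326, algo 8
2023-10-15 13:49:51.776 dnssec-query[DNSKEY] . to 208.67.222.222
2023-10-15 13:49:51.802 reply . is DNSKEY keytag 46780, algo 8
2023-10-15 13:49:51.802 reply . is DNSKEY keytag 20326, algo 8
2023-10-15 13:49:51.803 dnssec-query[DNSKEY] . to 208.67.222.222
""".splitlines()

# "<date> <time> dnssec-query[DNSKEY] . to <upstream>" (root zone only)
ROOT_DNSKEY = re.compile(r"^(\S+ \S+) dnssec-query\[DNSKEY\] \. to (\S+)$")


def root_dnskey_queries(lines):
    """Return (timestamp, upstream) for every DNSKEY query for the root zone."""
    hits = []
    for line in lines:
        m = ROOT_DNSKEY.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
            hits.append((ts, m.group(2)))
    return hits


hits = root_dnskey_queries(LOG_LINES)
times = [ts for ts, _ in hits]
# Milliseconds between consecutive root DNSKEY queries.
gaps_ms = [(b - a).total_seconds() * 1000 for a, b in zip(times, times[1:])]
mean_gap = sum(gaps_ms) / len(gaps_ms)
print(f"{len(hits)} root DNSKEY queries, mean gap {mean_gap:.1f} ms")
```

For this excerpt it reports 8 queries about 19.6 ms apart, i.e. roughly 50 per second, each answered with the same two keys (keytags 46780 and 20326).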

I also see this on my web interface dashboard:

[two dashboard screenshots, not included]

Steps to reproduce

Steps to reproduce the behavior:
Update from the latest stable release to the development release.

Debug Token

I can't see your tricorder, but are you using a custom DNS? Try switching to Google DNS instead, for example, and see if that changes anything. It looks to me like a lot of root lookups for some reason, and I'm wondering if unbound is creating a denial-of-service attack or something. Maybe also turn off DNSSEC.

but are you using a custom DNS? Try switching instead to Google DNS, e.g

My upstream DNS server was OpenDNS. I switched to Google and nothing changed.

OK. What is your DNSSEC setting in the DNS section, out of curiosity?

Just a guess but try turning DNSSEC off.

Turning DNSSEC off seems to work now.
But WHY?
I didn't change anything after the update.

That question is above my pay grade. Perhaps the resolver test linked in the text of the DNSSEC setting would provide an answer until a developer can weigh in?

Tried it:

Your debug log shows your router distributing its own IPv4 address as the local DNS server:

*** [ DIAGNOSING ]: Discovering active DHCP servers (takes 10 seconds)
   Scanning all your interfaces for DHCP servers
   
   * Received 548 bytes from 192.168.10.1 @ eth0
     Offered IP address: 192.168.10.33
     DHCP options:
      Message type: DHCPOFFER (2)
      router: 192.168.10.1
      dns-server: 192.168.10.1

This can be a valid configuration if your router uses Pi-hole as its sole upstream, but you'd have to be careful not to point Pi-hole back to your router.
Did you perhaps enable Pi-hole's Conditional Forwarding?
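What "pointing Pi-hole back to your router" means can be sketched in a few lines of Python: follow each resolver's upstream starting from the router and check whether the chain comes back around. This is a simplification that assumes one upstream per resolver, and "router", "pihole" and "opendns" are placeholder names.

```python
def find_forwarding_loop(upstream_of, start):
    """Follow each resolver's upstream from `start`.

    Returns the resolvers forming the loop (first node repeated at the end),
    or None if the chain ends at a resolver that answers by itself.
    """
    position = {}  # resolver -> index in path
    path = []
    node = start
    while node is not None:
        if node in position:
            return path[position[node]:] + [node]
        position[node] = len(path)
        path.append(node)
        node = upstream_of.get(node)
    return None


# Router hands out itself as DNS and uses Pi-hole as its upstream.
ok = {"router": "pihole", "pihole": "opendns"}
# Same, but Pi-hole is pointed back at the router.
looped = {"router": "pihole", "pihole": "router"}

print(find_forwarding_loop(ok, "router"))      # None
print(find_forwarding_loop(looped, "router"))  # ['router', 'pihole', 'router']
```

In the looped case every query the router forwards comes straight back to it through Pi-hole and is never answered.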

Often, "Maximum number of concurrent DNS queries reached" is triggered by a DNS loop in the configuration.

Occasionally, an unreachable or unresponsive upstream DNS server can make Pi-hole exhaust its connection pool quickly. Less often, a misbehaving client that requests resolution for a set of domains excessively and in rapid succession can also trigger it.
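The 150 in the message matches dnsmasq's default for dns-forward-max, its limit on concurrent forwarded queries. A back-of-the-envelope Python sketch using Little's law (queries in flight = arrival rate × time each query stays open) shows why an unresponsive upstream exhausts that pool so much faster than a healthy one. The 20 ms and 10 s figures are illustrative assumptions, not measurements and not dnsmasq's actual timeouts.

```python
POOL_SIZE = 150  # "max: 150" from the error message


def queries_in_flight(arrival_rate, seconds_open):
    """Little's law: average number of queries still waiting for an answer."""
    return arrival_rate * seconds_open


def rate_that_fills_pool(seconds_open, pool=POOL_SIZE):
    """Arrival rate (queries/s) at which the pool is full on average."""
    return pool / seconds_open


# Illustrative numbers only (assumptions, not measurements):
fast_upstream = 0.02   # answer arrives after ~20 ms
dead_upstream = 10.0   # query held ~10 s before being given up on

print(f"fast upstream: pool fills at ~{rate_that_fills_pool(fast_upstream):.0f} queries/s")
print(f"dead upstream: pool fills at ~{rate_that_fills_pool(dead_upstream):.0f} queries/s")
```

With these assumptions a healthy upstream would need thousands of queries per second to hit the limit, while a dead one gets there at around 15 per second, a rate a single chatty network can easily produce.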

What upstream DNS servers is Pi-hole using?

Let's have a look at the highest domain counts by client:

pihole-FTL sqlite3 /etc/pihole/pihole-FTL.db "SELECT client, domain, count(domain) FROM queries WHERE timestamp > strftime('%s','now','-1 day') GROUP BY domain ORDER BY 3 DESC LIMIT 10;"
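One caveat about this query: it groups by domain only, so when several clients ask for the same domain their counts are merged, and SQLite fills the client column from an arbitrary row of that group. A small Python sketch with an in-memory database shows the difference when grouping by client and domain instead. The three-column table is a hypothetical stand-in, not the real pihole-FTL.db schema.

```python
import sqlite3
import time

# Hypothetical stand-in for the queries table in pihole-FTL.db, reduced to
# the three columns the command above uses.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE queries (timestamp INTEGER, client TEXT, domain TEXT)")

now = int(time.time())
rows = (
    [(now, "::", ".")] * 6
    + [(now, "192.168.10.5", "example.com")] * 3
    + [(now, "192.168.10.36", "example.com")] * 2
)
conn.executemany("INSERT INTO queries VALUES (?, ?, ?)", rows)

QUERY = """
SELECT client, domain, count(domain) FROM queries
WHERE timestamp > strftime('%s','now','-1 day')
GROUP BY {group_by} ORDER BY 3 DESC LIMIT 10
"""

# As posted: one row per domain; `client` comes from an arbitrary row of the group.
by_domain = conn.execute(QUERY.format(group_by="domain")).fetchall()
# One row per (client, domain) pair, so per-client counts stay separate.
by_pair = conn.execute(QUERY.format(group_by="client, domain")).fetchall()

print(by_domain)
print(by_pair)
```

For spotting a single dominant entry such as "." the original query is good enough; the per-pair variant matters once several clients share the top domains.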

By the way, the DNSSEC resolver test site linked there is permanently down. A working one is DNSSEC Resolver Test.

Your debug log shows your router distributing its own IPv4 address as the local DNS server:

Yes, that's correct. I just want to use Pi-hole on selected devices. What do you mean by "be careful not to point Pi-hole back to your router"?

Did you perhaps enable Pi-hole's Conditional Forwarding?

No, I haven't activated it. But can I? I think I forgot to activate it.

What upstream DNS servers is Pi-hole using?

OpenDNS

Let's have a look at the highest domain counts by client:

Unfortunately, I had cleared the logs before you commented, but it was mostly this:

[screenshot not included]


Now I have changed everything back (re-enabled DNSSEC and set the upstream to OpenDNS).

pihole-FTL sqlite3 /etc/pihole/pihole-FTL.db "SELECT client, domain, count(domain) FROM queries WHERE timestamp > strftime('%s','now','-1 day') GROUP BY domain ORDER BY 3 DESC LIMIT 10;"

now shows:


::|.|2530
192.168.10.5|eu-central-courier-4.push-apple.com.akadns.net|101
192.168.10.36|firebaselogging-pa.googleapis.com|85
192.168.10.5|mcs-lb-821760791.us-east-1.elb.amazonaws.com|56
192.168.10.5|fmfmobile.fe.apple-dns.net|49
192.168.10.36|lb._dns-sd._udp.0.10.168.192.in-addr.arpa|42
192.168.10.5|get-bx.g.aaplimg.com|40
192.168.10.36|lb._dns-sd._udp.2.0.0.192.in-addr.arpa|38

This is what I really find interesting and what may help:

pihole-FTL sqlite3 /etc/pihole/pihole-FTL.db "SELECT client, domain, count(domain) FROM queries WHERE timestamp > strftime('%s','now','-1 day') GROUP BY domain ORDER BY 3 DESC LIMIT 10;"

Now it shows:


::|.|23150
::|net|242
192.168.10.5|eu-central-courier-4.push-apple.com.akadns.net|164
192.168.10.5|gsp-ssl.ls-apple.com.akadns.net|137
::|com|136
192.168.10.5|mcs-lb-821760791.us-east-1.elb.amazonaws.com|127
192.168.10.5|fmfmobile.fe.apple-dns.net|96
192.168.10.36|firebaselogging-pa.googleapis.com|85
192.168.10.5|get-bx.g.aaplimg.com|81

::|.|23150

This is heavy.

Enabling Conditional Forwarding would channel a portion of your DNS requests back to your router, closing a partial DNS loop. The loop could then be triggered, e.g., when resolving an unknown or nonexistent local hostname, and may have prompted the error message.

However, the root domain is usually not conditionally forwarded; it travels to the public upstream servers, and your logs confirm this.
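A minimal Python sketch of that partial loop, under assumed rules: Pi-hole conditionally forwards a hypothetical local domain ".lan" to the router, the router answers the hostnames it knows and forwards everything else back to Pi-hole, and the public upstream answers the rest. All names here are placeholders. Only unknown local names circle; the root domain goes straight out.

```python
# Hypothetical setup mirroring this thread. All names are placeholders.
LOCAL_SUFFIX = ".lan"        # assumed local domain for Conditional Forwarding
ROUTER_KNOWS = {"nas.lan"}   # hostnames the router can answer from its leases


def next_hop(domain, resolver):
    """Where `resolver` sends `domain`, or None if it answers by itself."""
    if resolver == "pihole":
        # Conditional Forwarding: local names to the router, the rest upstream.
        return "router" if domain.endswith(LOCAL_SUFFIX) else "opendns"
    if resolver == "router":
        # Router answers names it knows and forwards the rest to Pi-hole.
        return None if domain in ROUTER_KNOWS else "pihole"
    return None  # the public resolver answers


def trace(domain, start="pihole"):
    """Return (path, looped). Hops depend only on (domain, resolver), so
    revisiting a resolver for the same domain means it would circle forever."""
    path = [start]
    node = next_hop(domain, start)
    while node is not None:
        path.append(node)
        if node in path[:-1]:
            return path, True
        node = next_hop(domain, node)
    return path, False


for name in (".", "nas.lan", "typo.lan"):
    print(name, trace(name))
```

Here "." ends at opendns, "nas.lan" is answered by the router, and only "typo.lan" bounces between Pi-hole and the router.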

This is indeed unusual. I'd expect Pi-hole to validate the root domain (.) only a few times a day.
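Some quick arithmetic on the counts posted above puts "a few times a day" in perspective. A minimal Python sketch that spreads each count evenly over its time window:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def mean_interval(count, window_seconds=SECONDS_PER_DAY):
    """Average seconds between queries if `count` were spread evenly over the window."""
    return window_seconds / count


# Counts of "." posted in this thread:
print(f"{mean_interval(23150):.2f} s")       # 23150 within the last day (at most)
print(f"{mean_interval(1629, 3600):.2f} s")  # 1629 in about an hour, fresh install
```

Both work out to a root lookup every two to four seconds on average (3.73 s and 2.21 s), versus a handful per day. Since the logs had been cleared shortly before, the 23150 may cover less than a full day, which would make the real interval even shorter.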

I've configured my Pi-holes to use 208.67.222.222 as upstream, but have been unable to reproduce your observation, either with v5 or with a Docker beta-v6.

How do you run your beta-v6, bare metal or Docker?

I'm seeing a similar pattern: v6 on bare metal, a fresh install that has only been running for about an hour and has already logged all these...

::|.|1629

I installed it directly on my OS. No Docker or any other virtualization.

OK, is there something else I can try? If not, I'll reinstall Pi-hole v5, because I need my Pi-hole back :smiley:

Thanks!

I thought it started working when you disabled DNSSEC? I'd just do that and keep running pihole -up as the nightly fixes come in.

Yeah, with DNSSEC disabled it works. I'll do that for now.

We have not yet been able to reproduce this behavior, so we don't know yet what to fix. Could you check the file /var/log/pihole/pihole.log on your Pi-hole itself (not via the web interface) and quote a few lines from there?

I'm asking because the web interface hides an intermediate part of the log lines, the part that shows us which process is doing this. I just want to rule out heavy TCP worker activity.
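While looking at that file, a rough count of how often the root DNSKEY query repeats may also help. Below is a minimal Python sketch that tallies those queries per minute and per upstream from the raw log. It deliberately does not assume the format of the intermediate field; it only anchors on the leading timestamp (as shown in the excerpt earlier in the thread) and on the query text.

```python
import re
from collections import Counter

# Anchor only on the leading timestamp and on the query text, and let ".*"
# absorb whatever sits in between (e.g. the part the web interface hides),
# since its exact format is not assumed here.
ROOT_DNSKEY = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}\.\d+ .*dnssec-query\[DNSKEY\] \. to (\S+)"
)


def per_minute(lines):
    """Count root DNSKEY queries per (minute, upstream)."""
    counts = Counter()
    for line in lines:
        m = ROOT_DNSKEY.match(line)
        if m:
            counts[(m.group(1), m.group(2))] += 1
    return counts


def main(path="/var/log/pihole/pihole.log"):
    # Reading the log may require elevated privileges on the Pi-hole.
    with open(path, encoding="utf-8", errors="replace") as log:
        for (minute, upstream), n in sorted(per_minute(log).items()):
            print(f"{minute}  {upstream}  {n}")
```

Calling main() on the Pi-hole prints one line per minute and upstream; per the expectation above, the root domain should show up only rarely.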

Also, development-v6 contains many commits from bleeding-edge dnsmasq that are not in the latest v5 release. I'm 99% sure this is the reason why you are seeing what you are seeing here.

Once we find further clues and are able to reproduce this ourselves, a fix should follow rather quickly.
