Maximum number of concurrent DNS queries reached (max: 150) when starting a crawl with YaCy

I have a 12-year-old notebook running Ubuntu, with a YaCy search server and Pi-hole installed on the same machine.

See this YaCy forum thread: https://community.searchlab.eu/t/i-have-yacy-and-pihole-running-on-the-same-device-now-may-cure-the-problem-of-slow-crawling-after-a-while/1205

I have raised an issue with YaCy; see https://github.com/yacy/yacy_search_server/issues/505

It works OK if I run 3 Pi-holes together with dnsmasq.
A crawl depth of just 3 causes around 13,500 requests over 10 minutes.

https://twitter.com/smokingwheels/status/1576241991293489153

Thanks

I now have 5 Pi-holes working together and still hit this error at the start of a crawl with all 5.

https://twitter.com/smokingwheels/status/1577111283744538625

I have raised an issue.
It's now only a problem on my old i7 desktop computer; the old i7 notebook is OK with 5 Pi-holes.

https://github.com/yacy/yacy_search_server/issues/513

Are people using YaCy and Pi-hole together successfully? From my limited YaCy knowledge it seems like you will always have this tension between these two products.

Pi-hole is essentially, by design, a choke point for DNS queries from a given network, ensuring that all outbound queries can be filtered before being resolved. The typical use-case is a private network for a home or corporation running the kind of services you might expect to find on such networks. As a choke point it is easily wide enough to handle this profile of traffic even for large organisations. This traffic is already subject to a gateway and so Pi-hole's placement fits well in that architecture.

I'd not seen YaCy before, but your earlier post intrigued me and I was looking at it this evening. It seems like YaCy's 'horizontal' P2P crawler model will always cut across Pi-hole's intended client-server, home-or-corporation traffic model and lead to the swamping and thread problems you're seeing. It almost feels like performing a DNS denial-of-service attack against your own Pi-holes.

In your tables above you're using OpenDNS. Perhaps you can try switching to a local Unbound instance, since that is a documented configuration and quite straightforward to set up. This is the Unbound setup guide for Pi-hole. Follow the guide carefully and methodically, as it contains a lot of useful config tweaks and notes, in particular around configuring the various OS resolver services. Perhaps this is a case where raising Unbound's default of a single thread is applicable.
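As a rough illustration, here is a minimal sketch of the kind of unbound.conf tuning involved. The interface and port follow the Pi-hole guide's convention; the thread and cache figures are assumptions to adjust for your own CPU and workload:

```
server:
    # Listen only for queries from the local Pi-hole
    interface: 127.0.0.1
    port: 5335

    # Default is 1; more threads resolve lookups in parallel.
    # (Assumption: 4 suits a quad-core CPU; tune to yours.)
    num-threads: 4

    # Slab counts should be a power of 2 close to num-threads
    msg-cache-slabs: 4
    rrset-cache-slabs: 4
    infra-cache-slabs: 4
    key-cache-slabs: 4

    # Larger caches help a crawler that revisits the same domains
    msg-cache-size: 64m
    rrset-cache-size: 128m

    # One socket per thread reduces lock contention on Linux
    so-reuseport: yes
```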

On the one hand, lookups should resolve more quickly once Unbound's caching is doing some heavy lifting, which it seems would happen quickly in this scenario. On the other hand, maybe this just frees up slots so they can be instantly filled anyway.

Pi-hole's rate limiting can be raised from the default 1000 queries per 60 seconds in Settings > DNS, or by editing the RATE_LIMIT line in /etc/pihole/pihole-FTL.conf and restarting Pi-hole. Perhaps this is worth exploring too.
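For example, a sketch of that edit (the 10000 figure is purely illustrative, not a recommendation):

```
# /etc/pihole/pihole-FTL.conf
# Default is 1000 queries per 60 seconds per client
RATE_LIMIT=10000/60
# Or disable rate limiting entirely (use with care):
# RATE_LIMIT=0/0
```

Then restart the resolver with `pihole restartdns` for the change to take effect.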

Thanks for your input @chrislph

Have you tried Unbound DNS on a Raspberry Pi 3 B? I need on the order of 300 queries a second to avoid any errors.

I currently have 5 Pi-holes working together using 10 separate upstream DNS servers, and it has cut the errors down quite a lot.

My best guess is that when YaCy is indexing web pages at approx. 4000 PPM, there would need to be at least 140,000 DNS queries per 10 minutes (roughly 230 queries a second) on my 10-year-old hardware and an NBN 25 connection.

I'm sure the 3B can handle it, as it's not a hardware limitation. Perhaps changing Pi-hole's own logging to Anonymous (Settings > Privacy) will help slightly, so it's not logging all these lookups. Give Unbound a try and see how it plays. You can also change the 150 limit, but I'd suggest only doing that if you are running Unbound, since you don't want to swamp upstream servers and risk hitting their rate limiting, or worse, being blacklisted.
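For what it's worth, both of those changes are one-liners. The 150 ceiling is dnsmasq's dns-forward-max setting, which FTL inherits, so a drop-in file can raise it (the filename and the value 300 below are just examples), and the anonymous logging corresponds to privacy level 3 in pihole-FTL.conf:

```
# /etc/dnsmasq.d/99-concurrency.conf  (example filename)
# Raise the concurrent-query ceiling from the default 150
dns-forward-max=300
```

```
# /etc/pihole/pihole-FTL.conf
# 3 = anonymous mode, the same as Settings > Privacy > Anonymous
PRIVACYLEVEL=3
```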

I need to have the full logging for development.

OK, but that is not going to be easy for someone who is new to Linux.

I'm just offering suggestions to explore for your own needs.

But it's fine for someone who can install the JDK, Ant, and Git and compile YaCy. Let us know how you get on, as it will also help future Pi-hole users who are running this.

Why are you even routing this DNS traffic through Pi-hole? If you are running a web crawler, it would seem you would want unfiltered results.

YaCy is a bit different: you can build a crawl with just the things you like.
It has a few bugs, like trying to download ISO images.
With Pi-hole I have blocked all that and made crawling better, e.g. it no longer dies.

If you run Unbound you'll be your own recursive DNS resolver and won't need to go out to OpenDNS, as mentioned above. However, Unbound can also do its own blacklisting, so perhaps you can just use Unbound on its standard port for both of those purposes and take Pi-hole out of the loop, making it a tighter setup for this particular use-case. Seems like it's worth some testing. Since this moves away from Pi-hole, in the Pi-hole forum I'll leave that idea there.
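To illustrate, Unbound's blacklisting is done with local-zone entries in the server: section; a sketch with placeholder domains (not a recommended blocklist):

```
server:
    # Answer NXDOMAIN for anything the crawler should never touch
    local-zone: "example-ads.com" always_nxdomain
    local-zone: "example-iso-mirror.net" always_nxdomain
    # Or send REFUSED instead of NXDOMAIN:
    local-zone: "example-tracker.org" refuse
```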

Can it have 10 different forwarding DNS servers to share the load?

Not if you set it up as a recursive resolver. In recursive mode it communicates directly with the nameservers, avoiding the intermediary services.
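To make the distinction concrete, the two modes look like this in unbound.conf (the upstream addresses are OpenDNS's public resolvers, purely as an example):

```
# Forwarding mode: relay everything to the upstream servers you
# list; multiple forward-addr lines share the load between them.
forward-zone:
    name: "."
    forward-addr: 208.67.222.222
    forward-addr: 208.67.220.220

# Recursive mode: omit the forward-zone entirely and Unbound
# walks the DNS tree itself, starting from the root servers.
```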

You can have as many instances as you need. But I'd test it with a single instance, optimise it for that and gather some metrics to determine what scaling out you may or may not need to do, rather than choosing the number 10 up front.

Are you talking about the Raspberry Pi?

Pi-hole does both blocking and forwarding of non-blocked queries upstream for resolution. Upstream can be a local instance of Unbound running in recursive mode on the same Pi as Pi-hole. Since Unbound also has blocking capabilities, you may be able to remove Pi-hole from the loop and use Unbound in recursive mode directly. Again, I've not tested this, but it's very capable.

jfb's reply is referring to an instance of Unbound in recursive mode. There are no forwarding servers; it does its own resolution from your Pi directly, in the same sort of way an external server would do it.

My reply is referring to scaling those instances, should you need to, and treating those as your forwarding servers. You can have multiple separate instances of Unbound in recursive mode, each doing lookups. If you have one instance then it's far simpler so I'd start with that, optimise it and gather some metrics. Perhaps it will be plenty to handle what you throw at it.

If you have multiple instances then you'll need a front-end dispatcher of some sort, which I believe Unbound or BIND can also manage, and its job will be to pass the incoming queries to the multiple separate instances of Unbound.
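As a sketch of one simple option: since you're already running dnsmasq, it can play that dispatcher role itself by fanning queries out across several local Unbound instances. The ports and file name here are assumptions, one port per instance:

```
# /etc/dnsmasq.d/unbound-pool.conf  (example filename)
# Ignore /etc/resolv.conf; use only the servers listed below
no-resolv
# Five local Unbound instances, each bound to its own port
server=127.0.0.1#5335
server=127.0.0.1#5336
server=127.0.0.1#5337
server=127.0.0.1#5338
server=127.0.0.1#5339
# Ask all of them at once and take the first answer
all-servers
```

all-servers trades extra load on the instances for lower latency; leave it out and dnsmasq will instead prefer whichever single server has been answering fastest.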

Have a look for "unbound cluster", "unbound round-robin", "unbound multiple instances", that kind of thing. E.g. something like this unbound cluster front end may be useful; you'll have to grab it and investigate.
