Why do you think unbound should be relevant for this feature? The majority of Pi-hole users will likely run FTL with a distant upstream DNS server. If we implement this anywhere, it will only be inside Pi-hole itself.
CPU utilization is a difficult, maybe even misleading, measure here because it is only a momentary snapshot. Load or memory utilization would be better suited. However, the best metric for the performance of Pi-hole is the delay of replies. This is obviously difficult to measure for queries that are sent upstream, as the upstream round trip dominates. Replies answered from cache, however, go through (mostly) the same routines and can be used for the measurement.
Say your local delay is 2 msec and you have 4 active clients each making 10 queries per second. This means Pi-hole will be busy for 0.002 × 4 × 10 = 0.08 sec = 80 msec per second. This is a "busyness" of 8%.
If, however, you use the same hardware and, due to your many regex filters, the delay per query is, say, 100 msec, then the busyness will be 0.100 × 4 × 10 = 4 sec per second = 400%. This means Pi-hole would only be able to reply to a quarter of the incoming queries in time and a backlog will build up. Clients that retry queries because of the delay only make things worse.
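The busyness arithmetic above can be sketched in a few lines of Python. This is only an illustration of the back-of-the-envelope estimate, not anything taken from FTL; the function names `busyness` and `served_fraction` are made up for this example, and it assumes queries are handled strictly one after another.

```python
def busyness(delay_s: float, clients: int, queries_per_client_per_s: float) -> float:
    """Seconds of work arriving per second of wall time (1.0 = fully busy)."""
    return delay_s * clients * queries_per_client_per_s


def served_fraction(load: float) -> float:
    """Share of incoming queries that can be answered in time.

    Assumption: queries are processed one at a time, so anything
    above a load of 1.0 cannot be served and ends up in a backlog.
    """
    return 1.0 if load <= 1.0 else 1.0 / load


light = busyness(0.002, clients=4, queries_per_client_per_s=10)
heavy = busyness(0.100, clients=4, queries_per_client_per_s=10)

print(f"2 msec per query:   busyness {light:.0%}, served {served_fraction(light):.0%}")
print(f"100 msec per query: busyness {heavy:.0%}, served {served_fraction(heavy):.0%}")
```

Because the busyness scales linearly with the per-query delay, every extra millisecond of regex work is multiplied by the total query rate.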
I hope this example makes it clear why we have to keep the delay (= the work per individual query) as low as possible. I should also say that I have never seen a delay come anywhere near 100 msec, even with regex filter lists running into the hundreds.
Having said all that, I spent a few hours writing and testing a suitable framework that keeps most of our current regex performance with the newly proposed way of doing things as well. This could very well reduce the additional amount of work from N-fold (where N may be a large number in unfortunate scenarios) to two-fold, which is obviously a much better compromise. A lot of implementation work still has to be done, but we are, again, on a good track now.