FTLDNS - CPU Usage Observation

Hey Everyone,

I was just doing some comparisons between plain dnsmasq and FTLDNS with regards to resolution speed and CPU usage. It appears FTLDNS has some overhead when compared to dnsmasq (though, in terms of realistic usage, I doubt this is noticeable). Not trying to nitpick... Just curious what kind of hooks are being made in FTLDNS that would cause the difference.

dnsmasq:
Max CPU - ~60%
Cached - 1ms
Uncached - 90ms
DotCom - 70ms

FTLDNS:
Max CPU - ~90%
Cached - 16ms
Uncached - 101ms
DotCom - 74ms

#edited - fullscreen gif

Thank you very much for your analysis, it is highly appreciated!

For each incoming query, dnsmasq simply logs the query. FTLDNS does, however, much more. It thoroughly analyzes this query by first applying some filters (some users don't like to see queries from localhost queries being analyzed, some don't want parts of the analysis due to privacy settings, ...) and then it is registered in the internal database that holds all queries of the most recent 24 hours in memory. This involves quite a few lookups of clients and domains because FTL doesn't store domains and clients for each query individually but has a refined very memory efficient data structure.

Also note this is also a reason that a test with such a tool might not be very meaningful, because in reality, you wouldn't be querying random domains (and then only once) but it is much more likely that you query domains you have queried before several times. FTL's data structure is optimized for the later (more realistic case) and the overhead should be reduced in this case.

Having said all that, the hooks are actually pretty extensive as they have to take care of any possible combination of type and source of queries, their path (being forwarded, answered from cache, configuration of maybe blocking list), their response (CNAME, IP, NODATA, NXDOMAIN, ...), and some more things. Also, note that we compute all statistics (like blocked over time, top lists, etc.) on the fly so that we can present them within a millisecond on request of the API.

2 Likes

Thank you for the explanation @DL6ER. As you mentioned, with real world usage, there is no perceivable impact. However, now I have an idea of what is going on "behind the scenes."

Thanks!

3 Likes

It might be interesting to see if we can make some of the analysis asynchronous...

Just for fun, I've been reading around about different ways DNS Resolver software logs data. Any kind of extra operations in the resolving code will add overhead (logging to files, etc.). It appears that this is pretty common (and appears to be known for its performance):

http://dnstap.info/

I don't think dnsmasq supports this, but it is an interesting concept.

For statistics, you mentioned they are computed on the fly. Do you mean in the same "path" as the resolving code? Or on the fly when pulled by the admin web pages? I noticed reference to sqlite in the FTL code, are the statistics being pulled from that db? I don't really know if that would be any faster... just thinking out loud. That would also give somewhat of a clear separation between the resolver (during resolve, log records to the db), and then pulling stats via appropriate SQL queries for the admin pages (which could be done outside of the actual resolving code).

I'm not very good with C, so I'm just thinking out loud (maybe some of this is already done)... I also want to make sure it is known that none of what I am saying should be taken as nitpicking/complaining. I think this is a great product as is, and I'm sure you all have bigger fish to fry. :slight_smile:

David

FTL originally read from the dnsmasq log file, and would generate the statistics as it read the queries in. Now with the resolver in FTL, it still generates the statistics as the queries come in, but does so while the query is being processed.
So the benefit is that instead of going through thousands of queries to generate statistics when the API is hit, FTL incrementally updates the statistics on every DNS query (which doesn't take too much time). This processing could potentially be sped up by asynchronously updating the statistics so as to not block dnsmasq from responding to the query (kind of like how it was before).

This was the first idea I had when implementing FTLDNS, I tried two paths for asynchronism:

  1. Spawn a new analysis thread for each incoming query/reply/cache response/...
    Benefit: The resolver would just have to start a new thread with given parameters like the domain name, the client, etc. The new thread would then take care of the analysis while the resolver can process the next queries.
    Drawback: There is a significant overhead of a few milliseconds with starting and terminating threads (even when set to not-synchronizing at the end of their lifetime). It turned out that it was fast to just do the analysis in 99/100 cases compared to the time it takes to only create a new thread (the analysis would then still have to come). Furthermore, it is not guaranteed that threads are executed in order. So, on systems with only one core (can only process one thread at a time) but a huge rate of incoming queries, it may be that threads accumulate and pile up. Eventually, the CPU might decide to work on reply threads for which the query thread did not even succeed so far and race-collisions would kill everything.

  2. Have one analysis thread that works on a given list of events
    Benefit: The resolver would append events (in whatever form) to a long list and then continue operation.
    Drawback: This list would have to be very generic to be able to cover all possible events, i.e., it would need lots of memory. Furthermore, we would need to allocate much more memory dynamically, as currently we can rely that all string analyzed are still valid, as the resolver is still processing this very same query. However, once we decouple the resolver from the analysis routines, we'd have to make sure that all strings are temporarily copied for possible delayed analysis. Although this sounds promising to be able to give some milliseconds of speedup, it would create a massive additional workload of bookkeeping which strings we allocated and when they can again be freed, etc. This could easily make the code much less readable and easily kill the initially though of advantages due to added bookkeeping complexity.

I did some tests myself after looking again at your speed measurements and don't seem to be able to reproduce this huge difference in reply time to cached domains (partial quote follows):

I ran this test on a Raspberry Pi 3 Model B. It is also running a few other services next to Pi-hole and is constantly on a load of around ~1.5
When I repeat it several times, the cache reply hovers around 3-6 msec. However, I can confirm that on the same machine, dnsmasq is usually able to deliver the reply within 1-2 msec, so I will check for speed-up potential that doesn't make the code harder to read nor to maintain.

I just wanted to share that my data more or less agrees with iamperson347 - FTLDNS on my RPi-0 consumes more CPU and has slower cache resolutions than DNSMASQ - these are all subject to some variability, but in general

CPU went from ~5% under low loads to ~15%
Cached DNS Response went from ~1-5ms to ~7-10ms according to DNS Benchmark

90% of the time these have almost no impact, but I have noticed that occasionally with high loads DNS resolution will time out and I'll have to wait a bit to refresh a dead page. Something I didn't experience before FTLDNS.

How does this fit to an apparent additional cost of maybe 5-10 msec? Can you give some log excerpts from when this timeout happen? How many queries do you have within a typical 24hrs period?

Pi-Hole is on my home network, sees ~20k queries/day. I'll keep an eye on things and try and pull a debug for you next time the DNS drops out. It may be something unrelated to the higher CPU usage, though it was not an issue before dnsmasq (this could just be some form of confirmation bias though...)

That was sooner than I expected - had Wikipedia drop out on me (I'm guessing it wasn't on their end...)

kslh4lqrq3

Following that was a CPU usage spike, but it looks like it may be tied to FTLDNS restarting. Hopefully the debug log is helpful.

Hmm, not really. Is your /var/log/pihole-FTL.log really empty?

Also, I see there

   Apr 24 10:17:02 dnsmasq[20927]: exiting on receipt of SIGTERM
   Apr 24 10:17:06 dnsmasq[28965]: started, version pi-hole-2.79 cachesize 10000

which is an externally triggered restart rather than a crash, so there be something more somewhere.

Just checked, it is 100% empty. I'm guessing that's not normal?

I'll look into what may be triggering the restart... I suspect it's somehow tied to DietPi OS, or possibly unattended-upgrades, since the Pi-Hole is running on a 100% clean install otherwise...

Yeah, this is not normal. It should contain at least starting or exit or possibly re-allocation messages (but re-allocations are not needed that often with FTLDNS).

Oh yes, I think @PromoFaux has mentioned that he saw an issue report with/on DietPi somewhere. Pinging him for input.

@Matt DietPI; every night, dnsmasq is enabled; pihole-FTL not running anymore. · Issue #2109 · pi-hole/pi-hole · GitHub

I saw that back when I was having the same nightly reboot/death issue, and took the bandaid approach - systemctl mask dnsmasq which fixed the nightly reboot errors. I believe I also followed the instructions to use systemctl enable pihole-FTL but I cannot remember. Just in case I have run that again. I'll update if/when FTL spontaneously reboots.

I've definitely seen the cache resolution times differ. I don't think it's typically as high as my original post. I'm not sure what causes the variance.

I'm not sure if anything changed in FTLDNS, but I'm getting better numbers after testing with ubuntu 18.04. (I was on debian 9 before.)

Max CPU - ~20%
Cached - 1-2ms
Uncached - 51ms
DotCom - 36ms

1 Like

This is interesting, we haven't changed anything in FTLDNS that could lead to a possible speed up.