Slow Initial DNS lookups

Good afternoon.

I have been running Pi-hole for a couple of years now but constantly have an issue with slow initial DNS lookups before the IP is cached. I'm running Pi-hole on an RPi with a UniFi Dream Machine router.

Expected Behaviour:

The initial DNS lookup should be about the same speed as querying the upstream DNS server directly, and subsequent requests should be faster.

Actual Behaviour:

The initial DNS request is very slow, but then fast once it's in the cache. For example...

$ dig discourse.pi-hole.net @ | grep time
;; Query time: 662 msec

$ dig discourse.pi-hole.net @ | grep time
;; Query time: 65 msec

dig discourse.pi-hole.net @ 1.1.1.3 | grep time
;; Query time: 63 msec

Requests in the GUI query log are showing 300-400 ms each.

Thanks

Debug Token:

https://tricorder.pi-hole.net/5tNPhWeQ/

Those first two dig commands are missing the name server and the third dig command has a space before the IP address. I guess you manually typed them in here, but, just to be sure, what were the correct commands used in each case?

Were they entered on the Pi-hole or on a computer on the network? If the latter, was it on the 192.168.0.0 subnet or one of the other ones?

Is IPv6 appearing in the output (eg in the SERVER)? Try adding -4 to the dig commands to force IPv4 to see if that changes anything.

You can use random subdomains of example.com to ensure that each test is not initially cached, eg testxyz1.example.com then testxyz2.example.com, etc.
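
Something like this would cover both points in one go (the subdomains are just throwaway names, and substitute your Pi-hole's address for the name server):

$ for n in 1 2 3; do dig -4 "testxyz$n.example.com" @192.168.0.2 | grep -E 'SERVER|time'; done

The SERVER line shows which address actually answered, and -4 forces IPv4.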

Hi chrislph

Yes, I copied and pasted those commands, then manually deleted the server IP address. The space is there because the forum was treating the @ as an attempt to mention users. I should probably have put them in code blocks instead.

Anyway, those commands were entered on my laptop on the network. It is on the default 192.168.0.0 subnet.

Tried with the -4 switch:

$ dig -4 testxyz1.example.com @192.168.0.2 | grep time
;; Query time: 527 msec
$ dig -4 testxyz1.example.com @192.168.0.2 | grep time
;; Query time: 72 msec
$ dig -4 testxyz1.example.com @1.1.1.3 | grep time
;; Query time: 55 msec

Thanks

Out of interest, what do these commands give you? They will measure the resolve time plus the overall command time. I suspect the overall time will just be a fixed addition and the resolve time really is as shown, but it's interesting to check.

time dig testxyz2.example.com @192.168.0.2 | grep time

same again

time dig testxyz2.example.com @192.168.0.2 | grep time

then

time dig testxyz2.example.com @1.1.1.3 | grep time

E.g. for me, using my Pi-hole:

$ time dig testxyz2.example.com @192.168.1.2 | grep time
;; Query time: 13 msec
real 0m0.025s
user 0m0.003s
sys 0m0.007s

13 ms for the resolve time and 25 ms (the "real" line) overall, so roughly a 12 ms addition for local execution and networking.

Yes, as you suspected. The user and sys times are pretty constant.

$ time dig testxyz3.example.com @192.168.0.2 | grep time
;; Query time: 328 msec

real	0m0.336s
user	0m0.003s
sys	0m0.006s
$ time dig testxyz3.example.com @192.168.0.2 | grep time
;; Query time: 40 msec

real	0m0.049s
user	0m0.003s
sys	0m0.007s
$ time dig testxyz3.example.com @1.1.1.3 | grep time
;; Query time: 38 msec

real	0m0.046s
user	0m0.003s
sys	0m0.005s

It's one of those problems where some trial and error is needed, especially since you have virtual network adapters on eth0 and various subnets in use.
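
If it helps while experimenting, the addresses and routes on the Pi can be listed to see exactly which interfaces and subnets are in play, e.g.

$ ip addr show eth0
$ ip route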

Some other ideas:

Try testing on different clients. Do they all exhibit this behaviour?
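
As well as other laptops or phones, the same test can be run directly on the Pi itself, querying Pi-hole on 127.0.0.1, which takes the client and the network hop out of the picture entirely (testxyz4 being just another throwaway subdomain):

$ time dig testxyz4.example.com @127.0.0.1 | grep time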

Temporarily eliminate the virtual adapters and subnets in order to test. Try the same experiments on the 192.168.0.0 subnet with what you might call a "normal" setup. Do you still see this same latency on the first lookup of uncached domains? Return the setup to how it is now once done testing.

Try setting the client's local DNS to be those Cloudflare servers directly. Test again. Return to current setup once done testing.
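
If the client is a Mac, for instance, something like this should set the DNS servers directly and clear them again afterwards (assuming the network service is named "Wi-Fi"; networksetup -listallnetworkservices shows the exact name, and 1.0.0.3 is the matching secondary for the filtered Cloudflare service):

$ networksetup -setdnsservers Wi-Fi 1.1.1.3 1.0.0.3
$ networksetup -setdnsservers Wi-Fi Empty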

Try taking Pi-hole out of the loop and using the UDM's DNS service, with those same filtered Cloudflare servers. Test again.

Try using different upstream servers. Perhaps the performance of these filtered Cloudflare servers is partly responsible. Try the normal unfiltered server at 1.1.1.1 or another vendor such as Quad9's 9.9.9.9 and test again to compare.
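
For example, again with throwaway subdomains so nothing is cached:

$ dig testxyz5.example.com @1.1.1.1 | grep time
$ dig testxyz6.example.com @9.9.9.9 | grep time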

Try temporarily disabling Homebridge on the Pi and see if that affects timings.
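
If Homebridge was installed as the usual systemd service, something along these lines should do it (the service name is an assumption and may differ depending on how it was installed):

$ sudo systemctl stop homebridge
$ # run the dig tests again, then restart it
$ sudo systemctl start homebridge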

See if Ubiquiti support has any ideas on what might be going on, especially if the above tests indicate that the problem goes away when the external servers are used directly.

I was going to suggest using Wireshark to inspect the timings, but I don't think that would help: the delay appears to be happening somewhere else, and Wireshark would just end up telling you what dig has already told you. Someone else may have more ideas to try.

How are your clients connected to your router?
What machine did you run those digs from?
What kind of connection is your router using for internet access (LTE, satellite, DSL, fibre,...)?

Apart from the initial hundreds of msecs, even times of 60 or 40 msec seem way too high for local network traffic.

For cached records, I see around ~1 to 3 msecs from wired and ~4 to 10 msecs from wireless clients directly connected to my router.
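
As a baseline, a plain ping from the same client to the Pi-hole shows the raw network round trip separate from any DNS processing, e.g.

$ ping -c 5 192.168.0.2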

That particular laptop was a 2019 MacBook Pro on WiFi at the time. I can try later on a wired connection. The internet connection is fibre.

Same results with a wired connection. This is using Pi-hole as the DNS server handed out by DHCP.

$ time dig testxyz3.example.com | grep time
;; Query time: 488 msec

real	0m0.514s
user	0m0.004s
sys	0m0.020s
$ time dig testxyz3.example.com | grep time
;; Query time: 38 msec

real	0m0.046s
user	0m0.003s
sys	0m0.006s

Hmm... with a bit more poking, I'm starting to think this might be a router issue (or similar) rather than a Pi-hole issue. I get the same results with the DNS server set directly on the machine. Same with Google's DNS set on the machine:

$ time dig testxyz.example.com | grep time
;; Query time: 692 msec

real	0m0.701s
user	0m0.003s
sys	0m0.006s
$ time dig testxyz.example.com | grep time
;; Query time: 70 msec

real	0m0.078s
user	0m0.003s
sys	0m0.006s

I always assumed it was Pi-hole, as I have another VLAN/SSID that bypasses the Pi-hole and uses the unfiltered DNS servers instead, and that didn't have the issue. Having just tried that VLAN wired, using Google's DNS servers, it doesn't have the issue. Still getting constant 85-86 ms times though :o/

Thanks for the help. I'll try digging deeper, or head over to the Ubiquiti forums :o)

The tests above will help rule various things out. Perhaps the UDM itself is interfering with the traffic in some way. I see it has some Quality of Service and filtering features; perhaps it is inspecting all DNS traffic that passes through it and adding a delay while it analyses each 'new' domain.
