I'm experiencing an issue where, when I first try to visit a website, the browser shows an error message saying the site doesn't exist or is down. However, if I refresh the page or wait for a second or two, it eventually loads without any problems.
This issue is also affecting other applications, such as connecting to game servers. Initially, it will say "couldn't connect," but then a few seconds later, or after restarting the game, the connection works fine.
My network is set up as follows:
ATT Passthrough --> pfSense --> TPLink Managed Switch (1 VLAN)
TPLink Managed Switch --> Deco Mesh Router (192.168.86.0/24)
TPLink Managed Switch --> piHole (10.0.0.10)
I did some digging, and it seems the first query for a website that isn't cached yet is very slow. That's why I keep getting a "Not Found" error.
My current DNS setup uses the DNS Resolver in pfSense in "Forward Only" mode. It forwards requests to Pi-hole on a separate VLAN. Pi-hole uses an unbound server running on the same machine to answer queries.
It's not Pi-hole that's taking its time.
As a recursive resolver, unbound will take longer to resolve a domain if it has to walk the full recursion chain. Exact resolution times would also depend on your public upstream connectivity, of course.
That said, 199 ms seems reasonable for a combined A/AAAA lookup.
For comparison, my unbound resolves the A record for `samba.org` in about 150 ms:
```
$ dig -p 5335 @127.0.0.1 samba.org
; <<>> DiG 9.11.5-P4-5.1+deb10u11-Raspbian <<>> -p 5335 @127.0.0.1 samba.org
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 63634
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;samba.org. IN A
;; ANSWER SECTION:
samba.org. 86400 IN A 144.76.82.148
;; Query time: 148 msec
;; SERVER: 127.0.0.1#5335(127.0.0.1)
;; WHEN: Fr Nov 08 22:10:58 CET 2024
;; MSG SIZE rcvd: 54
```
Times in that range are normal with unbound, and they commonly won't trigger the messages you observe.
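If you want to compare a first (uncached) lookup with a repeated one yourself, you could pull the `Query time` value out of dig's output. This is only a rough sketch: `query_time_ms` is a helper name I made up, and the usage lines assume your unbound listens on 127.0.0.1 port 5335 like mine does.

```shell
# query_time_ms: read dig output on stdin and print the number from the
# ";; Query time: N msec" line (hypothetical helper, not part of dig or Pi-hole).
query_time_ms() {
    awk '/^;; Query time:/ { print $4 }'
}

# Usage (assumes unbound on 127.0.0.1#5335; adjust to your setup):
#   dig -p 5335 @127.0.0.1 samba.org | query_time_ms   # first lookup, full recursion
#   dig -p 5335 @127.0.0.1 samba.org | query_time_ms   # repeat, usually served from cache
```

A large gap between the two numbers only tells you that the first lookup had to recurse; it does not by itself explain a browser error.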
What is the exact message you see?
Also, it would be helpful to know if and how Pi-hole has handled the resolution requests triggered by opening a website, by inspecting Pi-hole's Query Log.
However, your debug log shows logging is disabled:
```
-rw-rw-r-- 1 pihole root 193 Nov 1 03:12 /etc/pihole/pihole-FTL.conf
PRIVACYLEVEL=0
```
For analysis, you should consider re-enabling Pi-hole's query logging.
I am guessing I would see better response times once the unbound logs prime up? Do you want me to upload another debug log with query logging turned on?
If I use an external DNS like Quad9 or Google, the issue goes away. How do I mitigate this issue? Maybe it will go away once Pi-hole's and unbound's caches are primed?
No, unbound's logging would be very unlikely to contribute to your observation.
Cached DNS records may be kept at most for as long as a domain's TTL allows, so unbound rewalking the resolution chain is expected to happen regularly over time.
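To watch TTL expiry in practice, you could print the TTL column of the answer section on repeated lookups. A cached record's TTL should count down between queries, and a fresh recursion is expected once it runs out. A rough sketch, where `answer_ttls` is a helper name I made up and the port in the usage line is the one from the dig call earlier in this thread:

```shell
# answer_ttls: read dig output on stdin and print "name TTL" for every
# record in the ANSWER SECTION (hypothetical helper for illustration).
answer_ttls() {
    awk '
        /^;; ANSWER SECTION:/ { in_answer = 1; next }   # records start on the next line
        /^;;/ || /^$/         { in_answer = 0 }         # any other section or a blank line ends it
        in_answer && NF >= 5  { print $1, $2 }          # fields: name TTL class type data
    '
}

# Usage (assumes unbound on 127.0.0.1#5335):
#   dig -p 5335 @127.0.0.1 samba.org | answer_ttls
#   sleep 10
#   dig -p 5335 @127.0.0.1 samba.org | answer_ttls     # TTL should be about 10 lower if cached
```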
And as already explained, your unbound's resolution times look normal, and they commonly won't trigger the browser messages you observe, which you haven't detailed yet.
To analyse your issue, you need to monitor Pi-hole's Query Log while you try to open a website and actually encounter those messages. That would show whether your browser's resolution requests for that site register in Pi-hole and how Pi-hole has handled them. Please also share the actual browser message as you observe it.
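If you prefer the command line over the web interface, something like the following could narrow Pi-hole's log down to one site. Treat it as a sketch under assumptions: `queries_for` is a name I made up, and the log path in the usage line is a guess that depends on your Pi-hole version, so check where your `pihole.log` actually lives.

```shell
# queries_for DOMAIN: read Pi-hole (dnsmasq-style) log lines on stdin and keep
# only those that mention DOMAIN as a complete, space-delimited name, so that
# search-domain expansions of the same name are left out.
queries_for() {
    grep -F -- " $1 "
}

# Usage (the log path is an assumption, adjust it to your installation):
#   queries_for samba.org < /var/log/pihole/pihole.log
```

Run it right after you hit the browser error, so the relevant lines are near the end of the output.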
Your above Query Log screenshot suggests that you've just run an nslookup for samba.org, which wouldn't give us any of those details.
It does, however, demonstrate that your pfsense.local.lan has a problem resolving samba.org.local.lan, as it replies with SERVFAIL where NXDOMAIN would have been expected. This is unlikely to contribute to your observation, as a browser would usually not send queries expanded by your local domains, but it may warrant separate attention.
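To check that separately, you could query pfSense directly for the expanded name and look at the status in dig's header. A minimal sketch: `dig_status` is a helper name I made up, and `PFSENSE_IP` in the usage line is a placeholder for your pfSense's address, which I don't know.

```shell
# dig_status: read dig output on stdin and print the response status
# (e.g. NOERROR, NXDOMAIN, SERVFAIL) from the ->>HEADER<<- line.
dig_status() {
    sed -n 's/^;; ->>HEADER<<-.*status: \([A-Z]*\),.*/\1/p'
}

# Usage (PFSENSE_IP is a placeholder, replace it with your pfSense address):
#   dig samba.org.local.lan @PFSENSE_IP | dig_status
# For a name that does not exist in local.lan, NXDOMAIN would be the expected status.
```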