Some websites are showing up as unknown in status while others work

Thanks for providing the PCAP via PM. I checked what was going on in your network and found that the second query was in fact a resubmission because Windows was impatient.

Windows resubmitted after waiting only 0.1 seconds! That's pretty odd and a bit low for a timeout, but okay, this is probably among the things that cannot be fixed on Windows.

Now we know what is going on and I can look into reproducing this locally so we can work on a fix.

that's great! i would point out that I do use tcp optimizer on my windows machines with these settings:
image_2020-10-07_174412
image_2020-10-07_174436

I didn't see anything that's immediately relevant, but maybe you will.

Undo the optimizer and see if things work right without it. If so, add back tweaks one at a time until you find the one that is causing it.

I'm very sure it's one of the tweaks.

that's a lot of restarts for something that won't fix this for sure. I'll read later into each tweak to see in-depth if something is more relevant to this case. however if it's a time out thing, those entries were in the 600ms range for some of the sites, it could be a normal timeout.

It's just a single restart. The one that disables the entire list of changes. That will tell you with certainty if the issue is with pihole-FTL or if it's self-inflicted.

Just a quick update: Reproducing this locally turns out to be a lot trickier than I figured initially because Linux (which is the only operating system I have at hand) is trying really hard to prevent me from doing DNS lookups with such a ridiculously low retry timeout :slight_smile:

Still work in progress...

ah Linux, allowing you to do stupid things if you want to, but you'll have to work hard for that. yeah, windows is a bit more flexible with user errors. anyway, I might have the time today to restore the settings of the TCP optimizer to defaults and check if that's the cause.

I honestly disagree. From what I know, the registry is a beast you don't want to edit manually. And you can only tweak such things in Windows using third-party software.

Anyway, even though I was able to reproduce retried queries by sending queries with the same query ID in short succession, I was not able to reproduce exactly what you saw. However, I'm currently on my somewhat limited mobile setup and will try to reproduce this at home next week.
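
For anyone who wants to try something similar, here is a rough Python sketch of the idea: send the same DNS query, with the same query ID and from the same socket, twice in short succession. This is not the exact tool used above, only an illustration. The names `build_query` and `send_retried_query`, the query ID 0x1234, and the 0.1 second delay are assumptions made for this example (the delay mirrors the resubmission gap seen in the PCAP).

```python
import socket
import struct
import time


def build_query(name: str, query_id: int, qtype: int = 1) -> bytes:
    """Build a minimal DNS query packet (no EDNS) for `name`."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, other counts 0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE (1 = A record) and QCLASS (1 = IN)
    return header + qname + struct.pack("!HH", qtype, 1)


def send_retried_query(server: str, name: str, query_id: int = 0x1234,
                       delay: float = 0.1, port: int = 53) -> int:
    """Send the same query twice, `delay` seconds apart, from one socket.

    Returns the number of replies received (waits up to 1 s for each).
    """
    packet = build_query(name, query_id)
    replies = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(1.0)
        sock.sendto(packet, (server, port))
        time.sleep(delay)  # hypothetical retry gap, adjust as needed
        sock.sendto(packet, (server, port))
        for _ in range(2):
            try:
                sock.recvfrom(4096)
                replies += 1
            except socket.timeout:
                break
    return replies
```

Calling `send_retried_query("<address of your Pi-hole>", "example.com")` and then looking at the query log should show whether the second copy is treated as a retry or as a separate query. Because both packets leave from the same socket, they share the source port as well as the ID, which is what a client-side resubmission would be expected to look like.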

So far, the proposed change is documented here:

windows is a bit more flexible

I said it's flexible about user errors. I worked as a PC tech for most of my career, and the ease with which normal users can destroy Windows with a few clicks, be it with 3rd-party software or just randomly, is astounding.

as for the change, that's great! I hope this also helps other people, maybe ones with lower-end hardware or low memory or something.

haven't gotten around to testing the TCP optimizer, maybe tomorrow. will update with the results.

yep, reverted to windows defaults and the issue disappeared. will test further to see if it's just temporary.
EDIT: was wrong, it did not change, and I saw this happen on a computer in the network I'm pretty sure I did not use the optimizer on.

Hmm, strange that we are not seeing this from other users on Windows (at least there are no reports). Anyway, a method to handle this is on its way. I hope this will work for you as well.

@Scepterus Could you try

pihole checkout ftl fix/retries_master

and see if the situation changes?

@DL6ER do I need to restart after that? because I didn't and I still see those.

Restart shouldn't be necessary. I guess it may be something else (or rather: in addition) then. I'll keep looking for it.

Can we reproduce this on a Mac?

thanks!

I think I found a possible explanation in TCP optimizer. there's a setting called "Retransmit Timeout" the description for it says it determines the time before connections are aborted.

now you can see in my screenshot the initial time is 2 seconds, and the minimum time is 300 ms. that would explain the queries that took more than 600ms to respond showing up as unknown.

however, queries should mostly not take that long to respond, I'm using Cloudflare DNS which has a very fast response time, around 60-80ms.

Hmm, yes, that's indeed interesting. Can you test the delay for some random domains you have not queried before? Like

dig ebay.com @1.1.1.1
dig ikea.com @1.1.1.1

and a few others, and check the reply time (right at the bottom)?

; <<>> DiG 9.11.5-P4-5.1+deb10u2-Raspbian <<>> ikea.com @1.1.1.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 704
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;ikea.com.                      IN      A

;; ANSWER SECTION:
ikea.com.               300     IN      A       204.74.99.103

;; Query time: 65 msec
;; SERVER: 1.1.1.1#53(1.1.1.1)
;; WHEN: Tue Oct 13 13:25:48 IDT 2020
;; MSG SIZE  rcvd: 53

Did not enter IKEA at all, so it's a new site. it's 65 ms.

this is one I ran:

dig blizzard.com @1.1.1.3

; <<>> DiG 9.11.5-P4-5.1+deb10u2-Raspbian <<>> blizzard.com @1.1.1.3
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 14091
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;blizzard.com.                  IN      A

;; ANSWER SECTION:
blizzard.com.           129     IN      A       137.221.106.104

;; Query time: 188 msec
;; SERVER: 1.1.1.3#53(1.1.1.3)
;; WHEN: Tue Oct 13 13:27:31 IDT 2020
;; MSG SIZE  rcvd: 57

I will watch my network to see if something is using the upload to the limit of my isp's bandwidth. would have been nice to have a dashboard in pihole for traffic that at least goes through the pihole.

; <<>> DiG 9.11.5-P4-5.1+deb10u2-Raspbian <<>> get.paleorecipebook.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 40500
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;get.paleorecipebook.com.       IN      A

;; ANSWER SECTION:
get.paleorecipebook.com. 300    IN      CNAME   unbouncepages.com.
unbouncepages.com.      60      IN      A       54.93.101.66
unbouncepages.com.      60      IN      A       18.196.95.178

;; Query time: 339 msec
;; SERVER: 1.1.1.3#53(1.1.1.3)
;; WHEN: Tue Oct 13 14:05:16 IDT 2020
;; MSG SIZE  rcvd: 112
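
To compare results like these against the 300 ms minimum from the TCP Optimizer screenshot, a small Python helper can pull the "Query time" line out of saved dig output. This is only a sketch: `query_time_ms`, `exceeds_timeout`, and `MIN_RTO_MS` are names made up for this example, and whether that 300 ms setting actually governs DNS retries is still just the hypothesis from this thread.

```python
import re

# Minimum retransmit timeout from the TCP Optimizer settings discussed above
# (assumed here to be the relevant threshold; that is not confirmed).
MIN_RTO_MS = 300


def query_time_ms(dig_output: str):
    """Return the 'Query time' value in milliseconds, or None if absent."""
    match = re.search(r"^;; Query time: (\d+) msec", dig_output, re.MULTILINE)
    return int(match.group(1)) if match else None


def exceeds_timeout(dig_output: str, threshold_ms: int = MIN_RTO_MS) -> bool:
    """True if the reply took longer than the given threshold."""
    elapsed = query_time_ms(dig_output)
    return elapsed is not None and elapsed > threshold_ms
```

Of the three lookups above, only get.paleorecipebook.com (339 msec) would be over that threshold; ikea.com (65 msec) and blizzard.com (188 msec) stay well below it.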