Some websites are showing up as unknown in status while others work

Hmm, yes, that's indeed interesting. Can you test the delay for some random domains you have not queried before? Like

dig ebay.com @1.1.1.1
dig ikea.com @1.1.1.1

and a few others, checking the query time (shown near the bottom of the output)?

; <<>> DiG 9.11.5-P4-5.1+deb10u2-Raspbian <<>> ikea.com @1.1.1.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 704
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;ikea.com.                      IN      A

;; ANSWER SECTION:
ikea.com.               300     IN      A       204.74.99.103

;; Query time: 65 msec
;; SERVER: 1.1.1.1#53(1.1.1.1)
;; WHEN: Tue Oct 13 13:25:48 IDT 2020
;; MSG SIZE  rcvd: 53

I have never visited IKEA's site, so it's a new domain for me. It took 65 ms.

This is one I ran:
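For comparing several lookups, a minimal Python sketch like the following can pull the query time out of saved dig output. The function name parse_query_time_ms is made up for illustration, and the sketch assumes dig prints its usual ";; Query time: N msec" line. The sample text is a shortened copy of the ikea.com output above.

```python
import re

def parse_query_time_ms(dig_output):
    """Return the N from dig's ';; Query time: N msec' line, or None if absent."""
    match = re.search(r"^;; Query time: (\d+) msec", dig_output, re.MULTILINE)
    return int(match.group(1)) if match else None

# Shortened copy of the ikea.com output shown above.
sample = """\
;; ANSWER SECTION:
ikea.com.               300     IN      A       204.74.99.103

;; Query time: 65 msec
;; SERVER: 1.1.1.1#53(1.1.1.1)
"""

print(parse_query_time_ms(sample))  # 65
```

Running it over the output of several dig calls gives a quick list of numbers to compare against the ping times.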

dig blizzard.com @1.1.1.3

; <<>> DiG 9.11.5-P4-5.1+deb10u2-Raspbian <<>> blizzard.com @1.1.1.3
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 14091
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;blizzard.com.                  IN      A

;; ANSWER SECTION:
blizzard.com.           129     IN      A       137.221.106.104

;; Query time: 188 msec
;; SERVER: 1.1.1.3#53(1.1.1.3)
;; WHEN: Tue Oct 13 13:27:31 IDT 2020
;; MSG SIZE  rcvd: 57

I will watch my network to see if something is saturating my ISP's upload bandwidth. It would have been nice to have a dashboard in Pi-hole for the traffic that at least goes through the Pi-hole.

; <<>> DiG 9.11.5-P4-5.1+deb10u2-Raspbian <<>> get.paleorecipebook.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 40500
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;get.paleorecipebook.com.       IN      A

;; ANSWER SECTION:
get.paleorecipebook.com. 300    IN      CNAME   unbouncepages.com.
unbouncepages.com.      60      IN      A       54.93.101.66
unbouncepages.com.      60      IN      A       18.196.95.178

;; Query time: 339 msec
;; SERVER: 1.1.1.3#53(1.1.1.3)
;; WHEN: Tue Oct 13 14:05:16 IDT 2020
;; MSG SIZE  rcvd: 112

That traffic should be about zero (only a few kilobytes), so I don't think it'd really be worth it. Also, keep in mind Pi-hole should also run on very low-end devices like Raspberry Pi Zero. Bandwidth monitoring isn't all that trivial and there is a ton of external software dedicated to monitoring. We would always be chasing what they already offer, so I suggest installing something like munin or RPi-monitor (note: I haven't checked if they can actually do network traffic diagnostics, but I assume so).

These are pretty slow. Is a ping to their servers similarly slow?

Reply from 137.221.106.104: bytes=32 time=217ms TTL=49

That's the ping to Blizzard.

I made sure to have at least 100 KB of upload free at all times for the Pi-hole, and it still happens.

Also, keep in mind Pi-hole should also run on very low-end devices like Raspberry Pi Zero

I understand that; I meant it as an optional component. But you are correct, only DHCP and DNS traffic goes through it.

I just noticed something strange: there's another type of Unknown entry, not 0 but 12. Here's a screenshot:

/var/log/pihole.log:16747:Oct 14 06:19:39 dnsmasq[653]: 50855 192.168.55.100/57402 query[A] www.fosshub.com from 192.168.55.100
/var/log/pihole.log:16748:Oct 14 06:19:39 dnsmasq[653]: 50855 192.168.55.100/57402 forwarded www.fosshub.com to 208.67.222.123
/var/log/pihole.log:16749:Oct 14 06:19:39 dnsmasq[653]: 50856 192.168.55.100/57402 query[A] www.fosshub.com from 192.168.55.100
/var/log/pihole.log:16750:Oct 14 06:19:39 dnsmasq[653]: 50856 192.168.55.100/57402 forwarded www.fosshub.com to 1.1.1.3
/var/log/pihole.log:16751:Oct 14 06:19:39 dnsmasq[653]: 50856 192.168.55.100/57402 forwarded www.fosshub.com to 208.67.222.123
/var/log/pihole.log:16753:Oct 14 06:19:39 dnsmasq[653]: 50856 192.168.55.100/57402 reply www.fosshub.com is 104.20.136.9
/var/log/pihole.log:16754:Oct 14 06:19:39 dnsmasq[653]: 50856 192.168.55.100/57402 reply www.fosshub.com is 172.67.32.78
/var/log/pihole.log:16755:Oct 14 06:19:39 dnsmasq[653]: 50856 192.168.55.100/57402 reply www.fosshub.com is 104.20.137.9

You'll notice it's less than 300 ms, so at least this kind of Unknown is not related to the delayed response.

From this I cannot see that it is less than 300 ms. This new 12 is actually what I added. I'll need to update the web interface so it shows forwarded (retried later) here.
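To see which query IDs in such a log excerpt were forwarded but never got a reply line, a rough Python sketch could group the entries by ID. The regular expression and the name group_by_query_id are assumptions for illustration: they match the line shape shown above, not necessarily every dnsmasq log format. Whether a missing reply line corresponds to an Unknown entry in the Query Log is also only a guess here.

```python
import re
from collections import defaultdict

# Matches e.g. "dnsmasq[653]: 50855 192.168.55.100/57402 forwarded www.fosshub.com to 208.67.222.123"
LINE_RE = re.compile(
    r"dnsmasq\[\d+\]: (?P<id>\d+) \S+ (?P<action>\S+) (?P<domain>\S+) "
    r"(?:from|to|is) (?P<target>\S+)"
)

def group_by_query_id(lines):
    """Collect the upstream servers and reply addresses logged for each query ID."""
    queries = defaultdict(lambda: {"forwarded": [], "replies": []})
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # skip lines that do not have the expected shape
        entry = queries[m["id"]]
        if m["action"] == "forwarded":
            entry["forwarded"].append(m["target"])
        elif m["action"] == "reply":
            entry["replies"].append(m["target"])
    return dict(queries)

# A subset of the log lines above (grep prefix removed).
log = [
    "Oct 14 06:19:39 dnsmasq[653]: 50855 192.168.55.100/57402 query[A] www.fosshub.com from 192.168.55.100",
    "Oct 14 06:19:39 dnsmasq[653]: 50855 192.168.55.100/57402 forwarded www.fosshub.com to 208.67.222.123",
    "Oct 14 06:19:39 dnsmasq[653]: 50856 192.168.55.100/57402 query[A] www.fosshub.com from 192.168.55.100",
    "Oct 14 06:19:39 dnsmasq[653]: 50856 192.168.55.100/57402 forwarded www.fosshub.com to 1.1.1.3",
    "Oct 14 06:19:39 dnsmasq[653]: 50856 192.168.55.100/57402 reply www.fosshub.com is 104.20.136.9",
]

grouped = group_by_query_id(log)
unanswered = [qid for qid, q in grouped.items() if q["forwarded"] and not q["replies"]]
print(unanswered)  # ['50855']
```

In the excerpt, ID 50855 is forwarded once and never answered, while 50856 (the same client and port) is forwarded and gets replies.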

So it is new; I just almost never scroll down. Anyway, in the screenshot the green query shows 244.8 ms.

Same thing, less than 200 ms, so I'm guessing there's another cause for this.

Okay, can you run the tcpdump and generate a pcap once again? May just have been a coincidence that last time we picked up the retried scenario which we can now distinguish from this one.

Sent it to you in a PM.

I have now done the required low-level DNS traffic inspection of the data you sent via PM. Thank you very much for the traffic snippet. Apparently, it is a specific timing issue we're facing here. One that was, admittedly, quite hard to extract from the wire capture. I will put up the exact sequence that led to what you're seeing (anonymized) below for later reference:

Delay    Sender <--> Destination   Activity
0 ms     Windows --> Pi-hole       Query A a.b.c is made *
1 ms     Pi-hole --> upstream      Query A a.b.c is sent upstream
60 ms    Pi-hole <-- upstream      Upstream reply arrives for A a.b.c
60 ms    Pi-hole --> upstream      DNSSEC query: DS c
118 ms   Pi-hole <-- upstream      Upstream reply arrives for DS c
119 ms   Pi-hole --> upstream      DNSSEC query: DS b.c
187 ms   Pi-hole <-- upstream      Upstream reply arrives for DS b.c
188 ms   Pi-hole --> upstream      DNSSEC query: DNSKEY c
219 ms   Windows --> Pi-hole       Query A a.b.c is retried **
220 ms   Pi-hole --> upstream      DNSSEC query: DNSKEY c is retried by Pi-hole ***
264 ms   Pi-hole <-- upstream      Upstream reply arrives for DNSKEY c
270 ms   Pi-hole --> Windows       Reply to the original request
282 ms   Pi-hole <-- upstream      Upstream reply arrives for retried DNSKEY c ****

(delay is relative time from first query)

Extra comments:

  • * This query is shown as green OK (forwarded) in the Query Log.
  • ** This is the query that is shown as Unknown (0). FTL is ignoring a retried query here because it already has the reply; however, it is not ready to send it to the requestor because DNSSEC verification is still ongoing.
  • *** The too-soon retry of the query leads FTL to the (somewhat wrong) assumption that the DNSSEC verification took too long. As a result, the DNSKEY c query is retried. This is without any consequences otherwise, so I won't change this minor bit.
  • **** This is the reply to the retried DNSKEY query. It is silently ignored as we already have the answer.
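The arithmetic behind the timeline can be checked with a small Python sketch. The timestamps are copied from the sequence above; the helper names round_trips and in_flight_at are made up for illustration and are not part of FTL.

```python
# (sent_ms, reply_ms) for each upstream exchange, copied from the timeline above.
exchanges = {
    "A a.b.c": (1, 60),
    "DS c": (60, 118),
    "DS b.c": (119, 187),
    "DNSKEY c": (188, 264),
}
CLIENT_RETRY_MS = 219  # Windows retries the A query
FINAL_REPLY_MS = 270   # Pi-hole answers the original request

def round_trips(exchanges):
    """Upstream round-trip time in ms for each exchange."""
    return {name: reply - sent for name, (sent, reply) in exchanges.items()}

def in_flight_at(exchanges, t_ms):
    """Upstream queries that were sent but not yet answered at time t_ms."""
    return [name for name, (sent, reply) in exchanges.items() if sent <= t_ms < reply]

print(round_trips(exchanges))
# {'A a.b.c': 59, 'DS c': 58, 'DS b.c': 68, 'DNSKEY c': 76}
print(in_flight_at(exchanges, CLIENT_RETRY_MS))  # ['DNSKEY c']
print(FINAL_REPLY_MS - CLIENT_RETRY_MS)          # 51
```

Each upstream round trip takes roughly 60 to 75 ms, and four of them in a row push the total to 270 ms. The client's retry at 219 ms arrives while only the DNSKEY c query is still outstanding, 51 ms before the answer is ready.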

Result of this analysis: Reproducing this exact chain of things is a bit tricky, however, you seem to be able to routinely get into this scenario due to your specific tuning settings.

@Scepterus Please update your Pi-hole to get the latest version and verify that there are no Unknown (0) queries any longer. Instead, there should be only Unknown (12) (retried queries) and Unknown (13) (retried during DNSSEC verification).


Hey, that's great, glad I could help. I will keep checking this today to confirm that I only get the 12 and 13. They will show up at the bottom if I sort by status, right?

Also, if this is indeed the case, how do we proceed from here?

EDIT: Ah, sadly the saga is not over, see screenshots.


My bad, I told FTL to add the 13 status to the original instead of the retried query. While this is correct for the "regular" retried queries (the new ones take over, we flag the original query as being ignored), it is wrong in this case (the new query is simply ignored, the old one survives).

So one more update and try, please.
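The rule described above (which of the two queries gets flagged) can be written down as a small Python model. This is only a sketch of the behavior as explained in this thread, not FTL's actual code; the Query class, the Status names and handle_retry are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

class Status(Enum):
    # Labels as they appear in the Query Log according to this thread.
    FORWARDED = "OK (forwarded)"
    RETRIED = "Unknown (12)"
    RETRIED_DNSSEC = "Unknown (13)"

@dataclass
class Query:
    label: str
    status: Status = Status.FORWARDED
    awaiting_dnssec: bool = False  # reply received, DNSSEC verification still running

def handle_retry(original, retry):
    """Flag the query that is ignored and return the one that survives."""
    if original.awaiting_dnssec:
        # The answer is already there, so the new query is ignored and gets 13.
        retry.status = Status.RETRIED_DNSSEC
        return original
    # Regular retry: the new query takes over, the original is flagged with 12.
    original.status = Status.RETRIED
    return retry

first = Query("A a.b.c at 0 ms", awaiting_dnssec=True)
second = Query("A a.b.c at 219 ms")
survivor = handle_retry(first, second)
print(survivor.label, "|", first.status.value, "|", second.status.value)
# A a.b.c at 0 ms | OK (forwarded) | Unknown (13)
```

The mistake described above corresponds to setting the 13 status on `original` instead of `retry` in the first branch.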

It will be merged into development and automatically become available for all through the next release of Pi-hole. You can stay on this test branch but have to remember to run pihole checkout master to get back on track before doing the next pihole -up. Otherwise, custom checked out branches are preserved across updates. This is intended behavior.

[i] Downloading and Installing FTL...pihole-FTL-arm-linux-gnueabihf: FAILED
sha1sum: WARNING: 1 computed checksum did NOT match
  [✗] Downloading and Installing FTL
  Error: Download of https://ftl.pi-hole.net/fix/retries_master/pihole-FTL-arm-linux-gnueabihf failed (checksum error)
  [✗] FTL Engine not installed

  Unable to complete update, please contact Pi-hole Support

That's what I get when I try to update.

This is strange. I triggered a rebuild on our automated system. However, due to many jobs being run at the same time, it may take up to 30 minutes until the new binaries are ready. Please try again after some time today.

OK, it updated now, and so far there are no Unknown (0) entries; they are now mostly 13 and a few 12.
What's next? Do we need to change something, or just keep watch to make sure no more 0's appear?
Are these (13 and 12) going to just stay that way? Or is this the first step in fixing them, or just renaming them?

Thanks for your patience so far with this!

No, nothing needs to be changed.

Keeping watch would be good; however, I'm fairly certain we found them all.

Yes, almost. They will stay, but they will get nicer names.

OK, great! The 0's have not come back since yesterday, so I'll return to master now.
This did, however, make me want to open another bug/feature request: fix the query list status sort. But I'll open a new topic for that, thanks!
