Log entry for DNS type "TYPE5335" with reply of "BLOB"

This is a standard install of Pi-hole and Unbound (via Pi-hole guide) on Pi OS 64-bit.

On the Pi-hole dashboard I noticed that query type OTHER was showing so clicked it to show the relevant entries from the query log. It shows a solitary entry, which in the larger log is the middle one with TYPE5335 below.

I couldn't find any prior posts relating to this. I wonder if it may be a bug that has caused the reply from Unbound to be parsed incorrectly, resulting in the port number and some binary data finding its way into Pi-hole's database. Or, conversely, a problem which has caused Pi-hole to request this record type. I'm suspicious of the resolver's local port number being the same as the query type logged and the preceding A query with a status of Unknown (0).

I had just visited amazon.co.uk and logged in, and the logs around that time contain a flurry of requests for loads of different TLDs for Amazon domains. It appears this is related to the content and various web apps which load up in their web store. These resulted in around 500 lines in pihole.log in around 4 seconds. Could this be related to the SD card's performance or integrity?

I did some further dig commands for this same domain with and without www on the same client and they were good, no problems. [ EDIT, screenshot of dig www.amazon.com.mx from same client shown below with expected behaviour of DS record request and no data returned on non-DNSSEC domain ]

[ EDIT – another one just happened, same sequence, this time no flurry of logs to implicate since it was a different domain that was quiet ]. What am I seeing here?

Expected Behaviour:

Normal record types and no BLOBs being received

Actual Behaviour:

As above

Debug Token:

https://tricorder.pi-hole.net/ty5o3DOa/

Thankyou,
Chris

Please post the matching query, forward and reply from the dnsmasq log at /var/log/pihole/pihole.log

Related log entries for the 1st image, which resulted in the TYPE5335 and BLOB, caused by logging in to Amazon UK and it querying a flurry of amazon-related TLDs, one of which was www.amazon.com.mx

Sep 15 06:58:48 dnsmasq[636]: query[A] www.amazon.com.mx from 192.168.1.121
Sep 15 06:58:48 dnsmasq[636]: forwarded www.amazon.com.mx to 127.0.0.1#5335
Sep 15 06:58:49 dnsmasq[636]: dnssec-query[DS] mx to 127.0.0.1#5335
Sep 15 06:58:49 dnsmasq[636]: reply mx is DS keytag 61885, algo 8, digest 2
Sep 15 06:58:49 dnsmasq[636]: dnssec-query[DS] com.mx to 127.0.0.1#5335
Sep 15 06:58:49 dnsmasq[636]: dnssec-query[DNSKEY] mx to 127.0.0.1#5335
Sep 15 06:58:49 dnsmasq[636]: reply mx is DNSKEY keytag 61885, algo 8
Sep 15 06:58:49 dnsmasq[636]: reply mx is DNSKEY keytag 19441, algo 8
Sep 15 06:58:49 dnsmasq[636]: reply com.mx is DS keytag 61885, algo 8, digest 2
Sep 15 06:58:49 dnsmasq[636]: dnssec-query[DS] amazon.com.mx to 127.0.0.1#5335
Sep 15 06:58:49 dnsmasq[636]: query[A] www.amazon.com.mx from 192.168.1.121
Sep 15 06:58:49 dnsmasq[636]: dnssec-retry[DS] amazon.com.mx to 127.0.0.1#5335
Sep 15 06:58:49 dnsmasq[636]: dnssec-query[DNSKEY] com.mx to 127.0.0.1#5335
Sep 15 06:58:49 dnsmasq[636]: reply com.mx is DNSKEY keytag 61885, algo 8
Sep 15 06:58:49 dnsmasq[636]: reply com.mx is DNSKEY keytag 19441, algo 8
Sep 15 06:58:49 dnsmasq[636]: reply amazon.com.mx is no DS
Sep 15 06:58:49 dnsmasq[636]: reply www.amazon.com.mx is <CNAME>
Sep 15 06:58:49 dnsmasq[636]: reply tp.4e37fb303-frontier.amazon.com.mx is <CNAME>
Sep 15 06:58:49 dnsmasq[636]: reply www.amazon.com.mx.edgekey.net is <CNAME>

Related log entries for the 2nd image, which was me doing a dig A on the the same domain, and it worked okay

Sep 15 09:01:48 dnsmasq[636]: query[A] www.amazon.com.mx from 192.168.1.121
Sep 15 09:01:48 dnsmasq[636]: forwarded www.amazon.com.mx to 127.0.0.1#5335
Sep 15 09:01:48 dnsmasq[636]: dnssec-query[DS] amazon.com.mx to 127.0.0.1#5335
Sep 15 09:01:48 dnsmasq[636]: reply amazon.com.mx is no DS
Sep 15 09:01:48 dnsmasq[636]: reply www.amazon.com.mx is <CNAME>
Sep 15 09:01:48 dnsmasq[636]: reply tp.4e37fb303-frontier.amazon.com.mx is <CNAME>

Related log entries for the 3rd image, which was loaded when I visited codepen.io, and saw another entry for TYPE5335 and the BLOB

Sep 16 00:25:23 dnsmasq[2899]: query[A] static.codepen.io from 192.168.1.121
Sep 16 00:25:23 dnsmasq[2899]: forwarded static.codepen.io to 127.0.0.1#5335
Sep 16 00:25:23 dnsmasq[2899]: dnssec-query[DS] io to 127.0.0.1#5335
Sep 16 00:25:23 dnsmasq[2899]: reply io is DS keytag 57355, algo 8, digest 2
Sep 16 00:25:23 dnsmasq[2899]: dnssec-query[DS] codepen.io to 127.0.0.1#5335
Sep 16 00:25:23 dnsmasq[2899]: query[A] codepen.io from 192.168.1.121
Sep 16 00:25:23 dnsmasq[2899]: forwarded codepen.io to 127.0.0.1#5335
Sep 16 00:25:23 dnsmasq[2899]: reply digitalocean1.mktoweb.com is blocked during CNAME inspection
Sep 16 00:25:24 dnsmasq[2899]: query[A] static.codepen.io from 192.168.1.121
Sep 16 00:25:24 dnsmasq[2899]: dnssec-retry[DS] codepen.io to 127.0.0.1#5335
Sep 16 00:25:24 dnsmasq[2899]: dnssec-query[DNSKEY] io to 127.0.0.1#5335
Sep 16 00:25:24 dnsmasq[2899]: reply io is DNSKEY keytag 49206, algo 8
Sep 16 00:25:24 dnsmasq[2899]: reply io is DNSKEY keytag 13608, algo 8
Sep 16 00:25:24 dnsmasq[2899]: reply io is DNSKEY keytag 57355, algo 8
Sep 16 00:25:24 dnsmasq[2899]: reply codepen.io is no DS
Sep 16 00:25:24 dnsmasq[2899]: reply codepen.io is 104.16.176.44
Sep 16 00:25:24 dnsmasq[2899]: reply codepen.io is 104.17.14.48
Sep 16 00:25:24 dnsmasq[2899]: reply static.codepen.io is 104.17.14.48
Sep 16 00:25:24 dnsmasq[2899]: reply static.codepen.io is 104.16.176.44

Note that in all cases I was not aware of anything being wrong, no broken sites or anything, it was only the type OTHER on the dashboard that alerted me to look into it and discover the weird TYPE5335 entry.

I've now temporarily enabled Unbound logging verbosity level 3 to a dedicated log file in case it happens a third time.

When did this start to happen?
This looks suspiciously like a bug in recent dnsmasq/FTL (v5.18) as it interprets your upstream DNS port 5335 as query type.

It started with that 06:58 date and time yesterday running Core: v5.12.1, Web: v5.15, FTL: v5.18.

The entire setup is a brand new install with Unbound which has only been up since Sep 11th, and that was using whatever was current at that time. I think this was Core v5.12, Web v5.14.2, FTL v5.17 - the domain filter checkboxes had not yet been added, for example.

I did not see the problem until yesterday, and before this install I was running a much older version of everything on Buster 32-bit with no Unbound.

I just went to amazon.co.uk again and have 3rd example happen as Amazon queried its multiple TLDs.

Dnsmasq entries from pihole.log

Sep 16 09:16:52 dnsmasq[2899]: query[A] www.amazon.com.br from 192.168.1.121
Sep 16 09:16:52 dnsmasq[2899]: forwarded www.amazon.com.br to 127.0.0.1#5335
Sep 16 09:16:53 dnsmasq[2899]: dnssec-query[DS] br to 127.0.0.1#5335
Sep 16 09:16:53 dnsmasq[2899]: reply br is DS keytag 2471, algo 13, digest 2
Sep 16 09:16:53 dnsmasq[2899]: dnssec-query[DS] com.br to 127.0.0.1#5335
Sep 16 09:16:53 dnsmasq[2899]: dnssec-query[DNSKEY] br to 127.0.0.1#5335
Sep 16 09:16:53 dnsmasq[2899]: reply br is DNSKEY keytag 2471, algo 13
Sep 16 09:16:53 dnsmasq[2899]: reply br is DNSKEY keytag 65267, algo 13
Sep 16 09:16:53 dnsmasq[2899]: reply com.br is DS keytag 14462, algo 13, digest 2
Sep 16 09:16:53 dnsmasq[2899]: reply com.br is DS keytag 23357, algo 13, digest 2
Sep 16 09:16:53 dnsmasq[2899]: dnssec-query[DS] amazon.com.br to 127.0.0.1#5335
Sep 16 09:16:53 dnsmasq[2899]: query[A] www.amazon.com.br from 192.168.1.121
Sep 16 09:16:53 dnsmasq[2899]: dnssec-retry[DS] amazon.com.br to 127.0.0.1#5335
Sep 16 09:16:54 dnsmasq[2899]: dnssec-query[DNSKEY] com.br to 127.0.0.1#5335
Sep 16 09:16:54 dnsmasq[2899]: reply com.br is DNSKEY keytag 23357, algo 13
Sep 16 09:16:54 dnsmasq[2899]: reply com.br is DNSKEY keytag 14462, algo 13
Sep 16 09:16:54 dnsmasq[2899]: reply amazon.com.br is no DS
Sep 16 09:16:54 dnsmasq[2899]: reply www.amazon.com.br is <CNAME>
Sep 16 09:16:54 dnsmasq[2899]: reply tp.2afc17682-frontier.amazon.com.br is <CNAME>

This time I have the Unbound logs. There are 300+ lines relating to the recursive resolution of www.amazon.com,br, nothing stands out. There are numerous entries which contain the word "irrelevant", all to do with this visit to amazon.co.uk and the loading of all the TLDs in this 1 second of time (see below adjusted from epoch to normal time).

No idea if these are relevant or not. Perhaps the sudden volume of these domains has some interaction with dnsmasq which results in it failing to parse the forwarding correctly and interpret the port of a later domain in the chain (mx yesterday, br now) as a lookup. Guess I'll have to wait for a 4th example on a non-amazon domain to see any commonalities.

grep 'irrelevant'

[Fri Sep 16 09:16:50 2022] unbound[3856:0] info: normalize: removing irrelevant RRset: bfbdc3ca1-frontier.amazon.co.uk. NS IN
[Fri Sep 16 09:16:52 2022] unbound[3856:0] info: normalize: removing irrelevant RRset: 47cf2c8c9-frontier.amazon.com. NS IN
[Fri Sep 16 09:16:52 2022] unbound[3856:0] info: normalize: removing irrelevant RRset: 248578a95-frontier.shopbop.com. NS IN
[Fri Sep 16 09:16:52 2022] unbound[3856:0] info: normalize: removing irrelevant RRset: 391b988c0-frontier.imdb.com. NS IN
[Fri Sep 16 09:16:52 2022] unbound[3856:0] info: normalize: removing irrelevant RRset: c95e7e602-frontier.amazon.in. NS IN
[Fri Sep 16 09:16:52 2022] unbound[3856:0] info: normalize: removing irrelevant RRset: 252df37b2-frontier.amazon.nl. NS IN
[Fri Sep 16 09:16:52 2022] unbound[3856:0] info: normalize: removing irrelevant RRset: c9a5976c7-frontier.amazon.ae. NS IN
[Fri Sep 16 09:16:52 2022] unbound[3856:0] info: normalize: removing irrelevant RRset: 2afc17682-frontier.amazon.com.br. NS IN
[Fri Sep 16 09:16:52 2022] unbound[3856:0] info: normalize: removing irrelevant RRset: 04f01a85e-frontier.amazon.com.au. NS IN
[Fri Sep 16 09:16:52 2022] unbound[3856:0] info: normalize: removing irrelevant RRset: 1fe6d5bb2-frontier.amazon.es. NS IN
[Fri Sep 16 09:16:52 2022] unbound[3856:0] info: normalize: removing irrelevant RRset: 508c1a525-frontier.audible.co.uk. NS IN
[Fri Sep 16 09:16:52 2022] unbound[3856:0] info: normalize: removing irrelevant RRset: 6ca7af544-frontier.amazon.it. NS IN
[Fri Sep 16 09:16:52 2022] unbound[3856:0] info: normalize: removing irrelevant RRset: 6624f9d37-frontier.amazon.pl. NS IN
[Fri Sep 16 09:16:52 2022] unbound[3856:0] info: normalize: removing irrelevant RRset: 4e37fb303-frontier.amazon.com.mx. NS IN
[Fri Sep 16 09:16:53 2022] unbound[3856:0] info: normalize: removing irrelevant RRset: 04788557e-frontier.amazon.sg. NS IN
[Fri Sep 16 09:16:53 2022] unbound[3856:0] info: normalize: removing irrelevant RRset: 8fb1e192f-frontier.amazon.com.tr. NS IN
[Fri Sep 16 09:16:53 2022] unbound[3856:0] info: normalize: removing irrelevant RRset: 21ca3455c-frontier.bookdepository.com. NS IN
[Fri Sep 16 09:16:53 2022] unbound[3856:0] info: normalize: removing irrelevant RRset: f0c2e5164-frontier.amazon.fr. NS IN
[Fri Sep 16 09:16:53 2022] unbound[3856:0] info: normalize: removing irrelevant RRset: abe2c2f23-frontier.amazon.de. NS IN
[Fri Sep 16 09:16:53 2022] unbound[3856:0] info: normalize: removing irrelevant RRset: 3a3674792-frontier.amazon.ca. NS IN
[Fri Sep 16 09:16:53 2022] unbound[3856:0] info: normalize: removing irrelevant RRset: 4d5ad1d2b-frontier.amazon.co.jp. NS IN

I enabled DNSSEC in Pi-hole and visited the sites you saw that query type occurring. But I did not observe it - but this does not mean this is not a bug, rather than it's a hard-to-catch one.

Same here, I'm not seeing it normally when I visit those sites or using dig, except for those occasions when it happened. I was only even alerted to it be seeing a type of OTHER in the dashboard query type pie chart, so keep an eye out for that appearing.

I don't think it's an Unbound issue – that seems to be working fine and the recursive logs are not showing me anything odd. Rather it seems to be FTL's handling of something. The A record from this iMac request preceeding each TYPE5335 entry shows a status of Unknown (0) which is not correct and should be OK (answered by localhost#5335). You can see this difference in the first two screencaps, first one wrong, second one correct. Then the reply above it should lilely be a DS record with NODATA (for a non-DNSSEC domain), but instead is showing as a BLOB.

Therefore I suspect that FTL is, for some reason, not parsing the A record request correctly in some cases. Maybe a race condition or an edge case trigger. Once it happens, it logs the port number as the query type, and the NODATA response and possibly the non-reply out of bounds data around that as a BLOB.

Amazon UK has triggered it twice so far, so it may be that the large number of TLDs it tries to resolve in one go has, by chance, exposed a bug. Perhaps a script to try and dig A records for a large number of different TLDs simultaneously might trigger it?

Looking at your logs (thank you very much!), I suspect this is rather caused by some misinterpretation of the

queries.

I tried to reproduce it myself, too, as I have the same/similar configuration of unbound on the same machine. I'm also using DNSSEC in Pi-hole but I haven't come across one abnormal query while running this for nearly an hour by now.

Since you seem to "know" how to trigger this (at least more reliably than us), I'd like to ask you to switch on debugging output for FTL.

Please add

DEBUG_QUERIES=true
DEBUG_FLAGS=true
DEBUG_DNSSEC=true

to the file /etc/pihole/pihole-FTL.conf (create if it does not exist) and run

pihole restartdns

Next time you see the odd queries, you'll find related log lines in /var/log/pihole/FTL.log with hopefully enough details to immediately show us where/why exactly the issue happens.

Thanks, done. I'll post back when it next happens!

Here we go, amazon.co.uk triggered it again with another one of their background loading TLDs. This time amazon.com.au.

There are a few hundreds lines in the log - I can extract out a few seconds either side of the queries and send to you in a message, is that okay? Same sections for the pihole and unbound logs, plus I noticed this domain is resolved without issue earlier, since the debug logging was enabled, so will make for a handy comparison?

You can print the log to tricorder:

sudo cat /var/log/pihole/FTL.log | pihole tricorder

Then post the token URL.

I've consoldiated the 3 logs – FTL, pihole and unbound – into a single text file which covers the timespan in question 5 minutes either side, with some notes at the top.

Consolidated log – https://tricorder.pi-hole.net/tqC7N1xj/
Fresh debug log from just now in case useful – https://tricorder.pi-hole.net/gS0z8yeq/

I've spotted something in the logs which may be implicated. The FTL and pihole logs contain plenty of normal entries which state:

FTL:

**** got upstream reply from 127.0.0.1#5335: example.com is no DS (ID nnnn...

pihole:

reply example.com is no DS

But there are some entries specifically for the domain co.jp which have the string "/cut" appended to the string "DS"

FTL:

**** got upstream reply from 127.0.0.1#5335: co.jp is no DS/cut (ID nnnn...

pihole:

reply co.jp is no DS/cut

These entries appear on 2 of the 3 occasions (in pihole.log since dnsmasq debug was not active earlier) where the TYPE5335 / BLOB entries were recorded - both for amazon.co.jp (as part of amazon's domain TLD blitz when logging in to amazon.co.uk). This seems relevant.

It does not appear for 1 of the occasions (codepen.io).

It also appears on other occasions, again as part of amazon.co.jp, and in those other cases it's present but no TYPE5335 / BLOB was recorded.

I looks like an errant string which, on some occasions, is confusing FTL/dnsmasq. What do you think?

Thanks

1 Like

Thanks. The string itself isn't an issue but only cosmetics. When there is a DS record with no NS (and we can prove that), dnsmasq logs no DS/cut.

However, the real issue here is shown in these lines:

[2022-09-16 20:09:12.910 16008M] **** new IPv4 dnssec-query[type=5335] query "amazon.com.au" from -/<internal>#53
[2022-09-16 20:09:12.934 16008M] Set reply to BLOB (13) in src/dnsmasq_interface.c:2117

where you can also see the type=5335 being logged as query and BLOB as the reply type. The fact that this is a reply also perfectly explains why we haven't been able to reproduce this easily (it very much depends on the latency of your Internet connection and a few other things). The pihole.log lines confirming this are:

Sep 16 20:09:12 dnsmasq[16008]: dnssec-query[DS] amazon.com.au to 127.0.0.1#5335
Sep 16 20:09:12 dnsmasq[16008]: dnssec-retry[DS] amazon.com.au to 127.0.0.1#5335
Sep 16 20:09:12 dnsmasq[16008]: reply amazon.com.au is no DS

This matches your screenshot above where you can see N/A as reply for the first DS query (third line in your screenshot). A reply for this query simply never arrived. Instead, FTL follows up by a retry which is incorrectly processed as TYPE5335 (first line in the screenshot). This also explains why it happens more often in a flurry of logs as it is much likelier to cause a retry when a domain is queried very often due to the occasional slowness of a local unbound with a not fully saturated cache.


To wrap this up: We're seeing two combined issues where the former causes the latter:

  1. A DNSSEC retry isn't correctly recognized, leading to the TYPE5335 query
  2. As FTL does not know how to interpret such a TYPE5335 query, the reply is simply recorded as BLOB instead of actually analyzing the DNS reply.

This is a regression of the recently merged release candidate for dnsmasq but also purely cosmetic. The underlying DNS processing works fine and is not affected. I (hopefully) fixed this now in FTL. Could you test if this resolves all the issues you've seen?

Please run

pihole checkout ftl fix/dnssec-retry

to get my proposed fix and check if this resolves the issues (as we cannot test this reliably).

You can go back at any time to the regular release using

pihole checkout ftl master
1 Like

I have the same Problem with newest Stable release.

Thanks for looking into the logs and finding it. I was searching for TYPE5335 and lowercase blob (because in a man page a /search in lowercase finds an uppercase variant, but evidently not in less!), so I missed these entries. I wasn't aware of cut nodes and that was the only "anomaly" that I could see. Looks like Amazon is a handy tool for generating a quick blitz of TLD lookups!

The internet connection is fast and low latency. I've checked the ISP's FireBrick graphs to look for any packet loss, all solid. The Pi is a 3B+ with a gig of memory. The only thing I can think of that could be a bottleneck is the SD card – it's an older 16GB card, not a fast write one. I've been using it for years for Pi-hole. It's not thrown up any I/O errors in anything. Have you seen any similar Pi-hole issues with slower SD cards (eg enough overall I/O from the OS, logging and everything that FTL is sometimes 'blocked' from seeing a reply that does arrive? My suspicion is that the reply did in fact simply not arrive.

I had it three more times in the early hours, again testing with Amazon, this time on a lower level part of a domain.

blob5

Out of interest what would someone be seeing if they were using, say 1.1.1.1 on port 53 as their upstream and they had this no reply/retry issue? Would it be logged a TYPE53 query with a BLOB reply?

With pleasure, thankyou Dominik for your help and the fix, and yubiuser+jfb too. I'll leave the logging in place and switch to this branch and run it for a while and report back.

Thanks @Starfoxfs, handy to have a second installation non-Amazon datapoint.

A test with good old Amazon suggests it is fixed – not seeing the issue under same circumstances anymore, despite having what look like the same kind of retry situations you described. Here are some screenshots and another tricorder copy of the relevant logs sections, if you want to take a look.

amazon.pl

com.au

amazon.com.mx

Tricorder copy of logs - https://tricorder.pi-hole.net/eS2kDCpF/

We have the exact same situation as before but now everything looks good. Thank you for confirming the fix.

I agree. I never meant I/O can be the issue here, FTL keeps everything in memory for good reasons. When you take a closer look at the debug logs in FTL.log (as they are on millisecond accuracy), you can see that the two queries were made in close succession so it isn't all that unlikely to be a "simply not arrived yet" situation.

There were previously many cases where the same situation was present with Amazon domains but the issue didn't occur. It does look fixed but I'll stay on the dev branch and leave logging on and test for a couple of days (and watch OTHER drop off the query types graph). I'll update here on Mon 19th to confirm fixed.

I noticed in the unbound logs I see a few debug: outnettcp got tcp error -1 entries dotted around within a few seconds of dnsmasq's dnssec-retry entries, but not a strong correlation and some googling suggests those crop up as normal network under-the-hood stuff. Thought I'd mention it in case relevant to this or a later case.

Thanks again

1 Like