DNSMASQ_WARN reducing DNS packet size

Just a quick reply. Thanks for the fix. I will say that I'm running 2 recursive BIND servers on my network. One is BIND 9.16 running on FreeBSD 13.0. The other is BIND 9.11 on Rocky Linux 8.5. Only BIND 9.11 is giving this error. Don't know if that makes a difference in your hypothesis. My ISP isn't messing around with the queries. My Router is pfSense 2.5.2.
The DNSMASQ error comes up immediately after restarting the Pi-hole process. It started this morning when I did my weekly updates. The config file fix did stop the warnings though.
Thanks!

Yes, this makes me believe it is a BIND issue. Either due to a misconfiguration (I guess you already ruled that out, though) or due to a bad default value for something you don't have configured at all. This would also make sense as you say that the later BIND version doesn't show this - the default value might have been adjusted once they realized this. Maybe the ISP's DNS servers are running the same "bad" BIND version, which would explain what the other users are seeing.

I updated recently and looks like I am seeing something similar. Not sure exactly what this means though.

Version:

Docker Tag 2021.12
Pi-hole [v5.7]
FTL [v5.12]
Web Interface [v5.9]

Image of the warnings I am seeing.

I run a pfsense router which is the IP I have masked out, if that info is helpful at all.

@LIGISTX as I said above, this comes either from a misconfiguration of a component in your network (most likely the DNS server, but it can also be a router) or a bad default. See DNSMASQ_WARN reducing DNS packet size - #9 by DL6ER for the solution. Note that setting a lower maximum packet size is not a workaround but a proper solution in this case.

Not sure about this statement:

setting a lower maximum packet size is not a workaround but a proper solution in this case

Others have stated that it has an impact on performance: https://www.linksysinfo.org/index.php?threads/reducing-dns-packet-size-for-nameserver-127-0-0-1-to-1280.75502/

it seems to have an impact on the webpages loading response time

As far as I understand, a bigger buffer than 1280 is needed sometimes to avoid truncation, which can lead to retry on tcp and fragmentation.

I am only a user, so please correct me if my conclusion is wrong.

I said in this case because it seems that larger packets cannot make it to their target anyway and have to be retried with smaller packets even over UDP. The performance is much worse as the second UDP attempt is only made when we never receive a reply from upstream due to a packet that never reached its target (due to size). In contrast to TCP, we don't get status information about UDP transmissions and cannot know if a packet reached its target at all.

Hence, when you see this warning, it means that you have hit at least one timeout on UDP already. If the packet has to be retransmitted over TCP, that's an altogether different question and comes even thereafter.

TL;DR: The situation is worse with packets that are too large. Reducing the packet size will make everything faster because there is no retry after a timeout.
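
To put rough numbers on this, here is a minimal Python sketch of the latency reasoning above. It is only a toy model: `lookup_latency_ms` is a hypothetical helper (not anything from FTL or dnsmasq), and the round-trip time, timeout, and path limit in the usage lines are made-up illustrative values.

```python
def lookup_latency_ms(reply_size, advertised_sizes, path_limit,
                      rtt_ms=20.0, timeout_ms=500.0):
    """Toy estimate of the time until an answer arrives for one query.

    reply_size       -- size of the full DNS reply in bytes
    advertised_sizes -- UDP payload sizes tried in order, e.g. (4096, 1280)
    path_limit       -- largest datagram the network path delivers (assumed)
    """
    latency = 0.0
    for advertised in advertised_sizes:
        if reply_size > advertised:
            # Upstream sends a small truncated reply (TC bit set), which
            # arrives after one round trip; we then switch to TCP.
            latency += rtt_ms
            break
        if reply_size <= path_limit:
            # The reply fits and is delivered: done after one round trip.
            return latency + rtt_ms
        # The reply was sent but silently lost on the path. UDP gives no
        # feedback, so we only notice after a full timeout.
        latency += timeout_ms
    # TCP retry: one round trip for the handshake, one for the query.
    return latency + 2 * rtt_ms


# A 1500-byte reply on a path that drops anything above 1400 bytes:
print(lookup_latency_ms(1500, (4096, 1280), path_limit=1400))  # large first
print(lookup_latency_ms(1500, (1280,), path_limit=1400))       # small only
```

In this model the smaller limit only changes things for the rare oversized reply; anything that fits in 1280 bytes takes one round trip either way.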

My pihole is pointed at my pfsense firewall for DNS. So this implies something is wrong within pfsense's DNS? I did update to the newest FTL yesterday and have not seen this happen since. Not sure if that had anything to do with this though. I suppose I will monitor for it happening again.

With my set up (pi-hole + unbound + DNSSEC), I get a few of these a day. I'm not too keen on enforcing the smaller packet size, potentially affecting all requests, when only a few, occasional requests trigger this warning. On the other hand, while the new diagnosis page is handy, having the admin interface alert me each time is not as desirable. I am using the recommended unbound conf from the guide for unbound in the docs. It suggests: edns-buffer-size: 1472 for unbound. Is this configuration still appropriate?

You can change this to 1232, per the unbound documentation - this is the unbound default value.

https://unbound.docs.nlnetlabs.nl/en/latest/manpages/unbound.conf.html

edns-buffer-size:
"Number of bytes size to advertise as the EDNS reassembly buffer size. This is the value put into datagrams over UDP towards peers. The actual buffer size is determined by msg-buffer-size (both for TCP and UDP). Do not set higher than that value. Default is 1232 which is the DNS Flag Day 2020 recommendation. Setting to 512 bypasses even the most stringent path MTU problems, but is seen as extreme, since the amount of TCP fallback generated is excessive (probably also for this resolver, consider tuning the outgoing tcp number)."

Not necessarily. Something along the path from your Pi-hole to the final name server serving any domain you browse to. Pi-hole cannot find out which name server is responsible for the truncation. It doesn't have to be the first one on this path.

The limit is never surpassed for the vast majority of queries. Only those that trigger the warning are affected at all.

I'm not sure about this. There is something that affects your performance and you can fix it by appropriate configuration. Sounds logical to alert you that you should look at it. Once it is resolved, the warnings will not appear any longer.

In this case you should either use this value for the FTL setting or change both as advised by @jfb. This value is even lower than the 1280 I suggested above. We should update the documentation accordingly.

edit

I was getting this error from OpenDNS' IPv6 server.
I changed the IPv6 server to Quad9 (filtered, ECS, DNSSEC), stayed with OpenDNS for IPv4 and it has not thrown the issue again.

I do not see anyone here (yet) getting the error from an IPv6 address but just my $0.02.

If it were not for this thread, I would not have properly troubleshot the issue.

edit

I CLEARLY did not read Scepterus post to the end. :zipper_mouth_face:

Second edit.

BTW Scepterus 9.9.9.9 is Quad9's IPv4 server and now I am, also, getting this error from them on IPv6.

So, this is not normal, something is wrong with the new version? I mean:
If some of the best DNS' are causing errors, then it is not the DNS'.

May I ask you?
How does the default setting, 4096, help?
I'm now scouring the threads on how to lower it to 1280...

I mean, sure, I know enough Linux to pull up a terminal or to pull the card out, navigate to that directory, and add *.conf, but there must be an easier way or, at least, a justification: why is the packet size, by default, so large to begin with?
In all earnest: you guys have been all over the place squashing this alert; why not just lower it by default?

Of course.

It is beneficial to handle large DNS replies over UDP without having to fall back to TCP. Retries can easily add unnecessary delays of up to several hundred milliseconds. The value of 4096 wasn't invented by us but is suggested by the EDNS Internet standard, to be precise, RFC 6891 Section 6.2.5:

6.2.5. Payload Size Selection

Due to transaction overhead, it is not recommended to advertise an architectural limit as a maximum UDP payload size. Even on system stacks capable of reassembling 64 KB datagrams, memory usage at low levels in the system will be a concern. A good compromise may be the use of an EDNS maximum payload size of 4096 octets as a starting point.

A requestor MAY choose to implement a fallback to smaller advertised sizes to work around firewall or other network limitations. A requestor SHOULD choose to use a fallback mechanism that begins with a large size, such as 4096. If that fails, a fallback around the range of 1280-1410 bytes SHOULD be tried, as it has a reasonable chance to fit within a single Ethernet frame. Failing that, a requestor MAY choose a 512-byte packet, which with large answers may cause a TCP retry.

Values of less than 512 bytes MUST be treated as equal to 512 bytes.
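
For anyone wondering where this "maximum UDP payload size" actually lives: in EDNS it is carried in the CLASS field of the OPT pseudo-record appended to the query. The following is a minimal Python sketch that builds such a query by hand with the standard library only. `build_query`, `advertised_size`, and `effective_size` are hypothetical helper names for illustration, and the query ID and name are arbitrary.

```python
import struct


def build_query(name, edns_size=4096, qtype=1, query_id=0x1234):
    """Build a DNS query for `name` that advertises `edns_size` via EDNS."""
    # Header: ID, flags (RD set), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=1
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 1)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN
    # OPT pseudo-RR: root name, TYPE=41, CLASS=UDP payload size,
    # TTL=0 (extended RCODE, version, flags), RDLENGTH=0
    opt = b"\x00" + struct.pack("!HHIH", 41, edns_size, 0, 0)
    return header + question + opt


def advertised_size(packet):
    """Read the size back from a packet produced by build_query()."""
    rr_type, size = struct.unpack("!HH", packet[-10:-6])
    if rr_type != 41:
        raise ValueError("last record is not an OPT record")
    return size


def effective_size(advertised):
    """Values below 512 are treated as 512, per the RFC text quoted above."""
    return max(512, advertised)


packet = build_query("example.com", edns_size=1232)
print(len(packet), advertised_size(packet))
```

Note that `advertised_size` only works on packets built by `build_query`, because it assumes the OPT record is the last 11 bytes; a real parser would have to walk the records.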

Emphasis on the second paragraph. This is exactly what Pi-hole does. We implement the optional fallback ("MAY") and first try with a packet size of 4096 ("SHOULD"). We then drop to a value of 1280 ("SHOULD") if that doesn't work. We do not try a third time with 512 bytes ("MAY") and immediately retry over TCP as a network that cannot even handle 1280 bytes properly in one packet is considered severely broken and other things are expected to fail, too.
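
As a rough illustration of that sequence, here is a minimal Python sketch of the fallback order. This is not the actual FTL/dnsmasq code: `resolve`, `send_udp`, and `send_tcp` are hypothetical names, and the transports are passed in as plain functions so the logic can be tried without touching the network.

```python
import struct

UDP_SIZES = (4096, 1280)  # first attempt, then the fallback; no 512 attempt


def is_truncated(reply):
    """True if the TC bit is set in the DNS header flags of `reply`."""
    flags = struct.unpack("!H", reply[2:4])[0]
    return bool(flags & 0x0200)


def resolve(query, send_udp, send_tcp):
    """Try UDP with decreasing advertised sizes, then fall back to TCP.

    send_udp(query, size) returns the reply bytes, or None on timeout.
    send_tcp(query) returns the reply bytes.
    Returns (reply, transport_label).
    """
    for size in UDP_SIZES:
        reply = send_udp(query, size)
        if reply is None:
            # Timeout: the reply may have been too large for the path,
            # so try again advertising the next smaller size.
            continue
        if is_truncated(reply):
            # The answer does not fit into UDP at this size: use TCP.
            break
        return reply, f"udp/{size}"
    return send_tcp(query), "tcp"


# Simulated upstream whose path drops replies unless 1280 is advertised.
ok_reply = b"\x00\x01\x81\x80" + b"\x00" * 8
reply, how = resolve(b"query", lambda q, s: ok_reply if s == 1280 else None,
                     lambda q: ok_reply)
print(how)
```

The simulated case is the one that produces the warning: the 4096 attempt times out and only the 1280 attempt gets through.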

It is exactly here:

But it also was in the thread you posted your reply in, first.

Because this is not a good solution for everyone. I recommended reducing the payload size only for those who discovered that a larger payload size does not work for them. Many can handle larger DNS buffers just fine. For instance, my local unbound handles 4096-byte packets just fine.

Because this doesn't seem to be documented anywhere properly, I probed all the DNS servers currently offered by Pi-hole to find out their maximum DNS packet size:

| Name | Address | Maximum packet size |
|---|---|---|
| Google (ECS) | 8.8.8.8 | 1400 |
| | 8.8.4.4 | 1400 |
| | 2001:4860:4860:0:0:0:0:8888 | 1400 |
| | 2001:4860:4860:0:0:0:0:8844 | 1400 |
| OpenDNS (ECS) | 208.67.222.222 | 1410 |
| | 208.67.220.220 | 1410 |
| | 2620:0:ccc::2 | 1410 |
| | 2620:0:ccd::2 | 1410 |
| Level3 | 4.2.2.1 | 8192 |
| | 4.2.2.2 | 8192 |
| Comodo | 8.26.56.26 | 4096 |
| | 8.20.247.20 | 4096 |
| DNS.WATCH | 84.200.69.80 | 4096 |
| | 84.200.70.40 | 4096 |
| | 2001:1608:10:25:0:0:1c04:b12f | 4096 |
| | 2001:1608:10:25:0:0:9249:d69b | 4096 |
| Quad9 (filtered, DNSSEC) | 9.9.9.9 | 1232 |
| | 149.112.112.112 | 1232 |
| | 2620:fe::fe | 1232 |
| | 2620:fe::9 | 1232 |
| Quad9 (unfiltered, no DNSSEC) | 9.9.9.10 | 1232 |
| | 149.112.112.10 | 1232 |
| | 2620:fe::10 | 1232 |
| | 2620:fe::fe:10 | 1232 |
| Quad9 (filtered + ECS) | 9.9.9.11 | 512 |
| | 149.112.112.11 | 512 |
| | 2620:fe::11 | 1232 |
| Cloudflare | 1.1.1.1 | 1452 |
| | 1.0.0.1 | 1452 |
| | 2606:4700:4700::1111 | 1452 |
| | 2606:4700:4700::1001 | 1452 |

You see that several do allow for a packet size of 4096. Level3 even allows for an insane 8192 bytes.
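
If you do see the warning and want to pick a value from these measurements rather than guess, a small Python sketch like the following could help. It only includes a few IPv4 rows from the table, and `safe_packet_size` and `dnsmasq_line` are hypothetical helpers. `edns-packet-max` is the dnsmasq option for this limit; where the line has to go depends on your installation. The sizes are a one-time measurement and may change on the providers' side.

```python
# Measured maximum packet sizes, copied from the table (IPv4 subset only).
MAX_PACKET_SIZE = {
    "8.8.8.8": 1400, "8.8.4.4": 1400,                # Google
    "208.67.222.222": 1410, "208.67.220.220": 1410,  # OpenDNS
    "4.2.2.1": 8192, "4.2.2.2": 8192,                # Level3
    "9.9.9.9": 1232, "149.112.112.112": 1232,        # Quad9 (filtered, DNSSEC)
    "9.9.9.11": 512, "149.112.112.11": 512,          # Quad9 (filtered + ECS)
    "1.1.1.1": 1452, "1.0.0.1": 1452,                # Cloudflare
}

DEFAULT_SIZE = 4096  # the default discussed in this thread


def safe_packet_size(upstreams, default=DEFAULT_SIZE):
    """Smallest measured limit among the given upstreams, capped at `default`.

    Upstreams missing from the table are assumed to handle `default`.
    """
    sizes = [MAX_PACKET_SIZE.get(ip, default) for ip in upstreams]
    return min(sizes + [default])


def dnsmasq_line(upstreams):
    """Render the matching dnsmasq configuration line."""
    return f"edns-packet-max={safe_packet_size(upstreams)}"


print(dnsmasq_line(["1.1.1.1", "8.8.8.8"]))
```

This only accounts for the upstream servers themselves. A router or firewall on the path can still impose a lower limit, which is the case discussed earlier in the thread.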

my bad

Wow!!!

Thank you and all for all the hard work you do.

Seriously:
Wow!

I looked it up and only 3 of you maintain Pi-hole?

I’ll, now, donate every year.
Thank you very much!!!!

I had no idea!
I’d suggest you put that in the next update: it is just 3 of you and even a dollar, from all that benefit from your work -would- should be appreciated.

Only 3…

Just…

…Wow.

Thanks for the explanation.
I also get it with cloudflared using 1.1.1.1 as DNS, set up following the official documentation.
For me it comes up about 1 or 2 times a week, so it's more rare. Since all seems to work fine for me, it is only a bit irritating.

While doing some research about the warning, I found these posts...

If you can probe DNS servers for their maximum packet size, couldn't you use those values in Pi-hole? I mean, assign the default packet size of 4096 to each DNS server that is configured in Pi-hole. On first use, query the actual packet size from the server, update the internal value, and use that value when communicating with that server. That way it would work with user-defined/custom DNS servers, we wouldn't need to limit all DNS servers because of one raising that warning, and we could get the best performance from each server. I don't know how often that value should be updated, maybe once a week or month (in case the value has been increased on the server) or whenever a DNSMASQ warning occurs with that server.

Of course, that is easier said than implemented. But maybe you could put something like that on the to-do list of this awesome project :slight_smile:

Sure, but you cannot. My study above slowly increased the size of DNS packets filled with random data sent to said servers, carefully inspecting the replies to find the upper boundary where they either set the TC (truncated) bit or stop responding altogether. This sent a lot of packets to each and every server. If I do this once and put up a table, that's one thing. If thousands of Pi-holes out there do this automatically...
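
To give an idea of the packet count, here is a minimal Python sketch of such a search. It is not the script used for the table: the study increased the size step by step, whereas this sketch swaps in a binary search to keep the number of probes low, and it assumes the limit is a clean threshold. `find_max_size` and `probe` are hypothetical names, and the probe is simulated so nothing is sent over the network.

```python
def find_max_size(probe, low=512, high=8192):
    """Largest size in [low, high] for which probe(size) succeeds.

    probe(size) should return True if a reply of that size came back
    complete (no TC bit, no timeout). Assumes that if a size fails,
    every larger size fails too. Returns None if even `low` fails.
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid       # mid works, the limit is at least mid
        else:
            high = mid - 1  # mid fails, the limit is below mid
    return low


# Simulated server with a limit of 1452 bytes, counting the probes sent.
sent = []

def fake_probe(size):
    sent.append(size)
    return size <= 1452

print(find_max_size(fake_probe), "found after", len(sent), "probes")
```

Even with a binary search this is more than a dozen packets per server and per measurement, and every failed probe also means waiting for a timeout.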

We have been contacted by Github in the past because Pi-holes were keeping their API notably busy. I don't want to repeat this story elsewhere.
