Pi-hole embeds the DNS server dnsmasq, which is currently in release-candidate state for version 2.85. Please join us in the final testing of this version of dnsmasq, to help us ensure there are no major bugs before the final release. You may be receiving a few updates on this branch.
To get the release candidate version, run pihole checkout ftl update/dnsmasq-v2.85
You can go back at any time using pihole checkout ftl master
Please also go back to master before updating Pi-hole after the next release. This can be done at any time, even after the update has happened. Support and discussions are possible on the linked Discourse thread.
dnsmasq: CHANGELOG
Fix problem with DNS retries in 2.83/2.84.
The new logic in 2.83/2.84 which merges distinct requests for the same domain causes problems with clients which do retries as distinct requests (differing IDs and/or source ports.) The retries just get piggy-backed on the first, failed, request.
The logic is now changed so that distinct requests for repeated queries still get merged into a single ID/source port, but they now always trigger a re-try upstream.
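The difference between the two behaviours can be illustrated with a toy model. This is only a sketch of the idea described in the changelog entry above, not dnsmasq's actual code; the class ToyForwarder and its fields are hypothetical names invented for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyForwarder:
    """Toy model of request merging (hypothetical, not dnsmasq internals)."""

    retry_on_duplicate: bool  # False ~ 2.83/2.84 behaviour, True ~ 2.85 behaviour
    pending: dict = field(default_factory=dict)  # domain -> waiting (id, port) pairs
    upstream_sends: int = 0

    def request(self, domain, query_id, src_port):
        waiters = self.pending.setdefault(domain, [])
        first = not waiters
        # Distinct requests for the same domain are merged in both variants.
        waiters.append((query_id, src_port))
        # Only the newer variant also re-sends upstream for a repeated request.
        if first or self.retry_on_duplicate:
            self.upstream_sends += 1


old = ToyForwarder(retry_on_duplicate=False)
new = ToyForwarder(retry_on_duplicate=True)
for fwd in (old, new):
    fwd.request("example.com", query_id=0x1A2B, src_port=40001)  # original query
    fwd.request("example.com", query_id=0x3C4D, src_port=40002)  # client retry
print(old.upstream_sends, new.upstream_sends)  # 1 2
```

In the old variant the client's retry never reaches the upstream server, so a lost first packet leaves both requests unanswered; in the new variant the retry is merged and still forwarded.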
Avoid treating a dhcp-host which has an IPv6 address as eligible for use with DHCPv4 on the grounds that it has no address, and vice-versa.
Add dynamic-host option: A and AAAA records which take their network part from the network of a local interface. Useful for routers with dynamic prefixes.
Teach bogus-nxdomain and ignore-address to take an IPv4 subnet.
Use random source ports where possible if source addresses/interfaces in use. CVE-2021-3448 applies.
It’s possible to specify the source address or interface to be used when contacting upstream name servers: server=8.8.8.8@1.2.3.4 or server=8.8.8.8@1.2.3.4#66 or server=8.8.8.8@eth0, and all of these have, until now, used a single socket, bound to a fixed port. This was originally done to allow an error (non-existent interface, or non-local address) to be detected at start-up. This means that any upstream servers specified in such a way don’t use random source ports, and are more susceptible to cache-poisoning attacks.
We now use random ports where possible, even when the source is specified, so server=8.8.8.8@1.2.3.4 or server=8.8.8.8@eth0 will use random source ports. server=8.8.8.8@1.2.3.4#66 or any use of query-port will use the explicitly configured port, and should only be done with understanding of the security implications. Note that this change changes non-existing interface, or non-local source address errors from fatal to run-time. The error will be logged and communication with the server not possible.
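The rule described above can be summarised in a few lines. This is a minimal sketch that only restates the changelog text; it is not dnsmasq's parser, and the function name source_port_mode and its query_port_set parameter are assumptions made for the illustration.

```python
def source_port_mode(server_value, query_port_set=False):
    """Classify a dnsmasq server= value as using a 'fixed' or 'random' source port.

    Hypothetical helper: it encodes the behaviour described in the changelog,
    nothing more.
    """
    if query_port_set:
        # Any use of query-port pins the source port.
        return "fixed"
    _upstream, sep, source = server_value.partition("@")
    if sep and "#" in source:
        # An explicit port on the source address, e.g. 8.8.8.8@1.2.3.4#66.
        return "fixed"
    # A source address or interface without a port now gets random ports.
    return "random"


for value in ("8.8.8.8@1.2.3.4", "8.8.8.8@eth0", "8.8.8.8@1.2.3.4#66"):
    print(value, "->", source_port_mode(value))
```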
Change the method of allocation of random source ports for DNS. Previously, without min-port or max-port configured, dnsmasq would default to the compiled in defaults for those, which are 1024 and 65535. Now, when neither are configured, it defaults instead to the kernel’s ephemeral port range, which is typically 32768 to 60999 on Linux systems. This change eliminates the possibility that dnsmasq may be using a registered port > 1024 when a long-running daemon starts up and wishes to claim it. This change does likely slightly reduce the number of random ports and therefore the protection from reply spoofing. The older behaviour can be restored using the min-port and max-port config switches should that be a concern.
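To get a feeling for how much smaller the port pool becomes, the two ranges mentioned above can be compared directly. This is a small sketch; the helper names are hypothetical, and the ephemeral range is the typical Linux value quoted in the changelog (on a real system it can be read from /proc/sys/net/ipv4/ip_local_port_range).

```python
import math


def parse_port_range(text):
    """Parse the 'low high' format used by /proc/sys/net/ipv4/ip_local_port_range."""
    low, high = (int(part) for part in text.split())
    return low, high


def pool_size(low, high):
    """Number of ports in an inclusive range."""
    return high - low + 1


old_size = pool_size(1024, 65535)                         # compiled-in defaults
new_size = pool_size(*parse_port_range("32768\t60999"))   # typical Linux range
print(old_size, new_size)  # 64512 28232

# Entropy lost by the smaller pool, in bits (a little over one bit).
lost_bits = math.log2(old_size / new_size)
print(round(lost_bits, 2))
```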
Scale the size of the DNS random-port pool based on the value of the dns-forward-max configuration.
TFTP tweak: Check sender of all received packets, as specified in RFC 1350 para 4.
Nothing too specific, just whether everything still works. This should catch issues like the broken retry mechanism in the previous / current master version early enough that they don't get released into the wild and annoy users who may not be willing to register here and provide feedback.
started new FTL (update/dnsmasq-v2.85) yesterday (2021-03-31 23:05:57 local time).
hopefully no logic errors...
Epoch timestamp: 1617228000
Date and time (your time zone): Thursday, April 1, 2021 12:00:00 AM GMT+02:00
SELECT count(*) FROM queries WHERE timestamp > 1617228000;
count: 7489
SELECT count(*) FROM queries WHERE timestamp > 1617228000 AND status = 12;
count: 1289
There is no noticeable impact on the user experience.
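For anyone who wants to reproduce the timestamp used in the queries above: it is simply local midnight expressed as Unix time. A minimal sketch, assuming a fixed UTC offset of +2 hours as shown in the converter output; the function name local_midnight_epoch is made up for this example.

```python
from datetime import datetime, timedelta, timezone


def local_midnight_epoch(year, month, day, utc_offset_hours):
    """Unix timestamp of 00:00:00 on the given date at a fixed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return int(datetime(year, month, day, tzinfo=tz).timestamp())


print(local_midnight_epoch(2021, 4, 1, 2))  # 1617228000
print(local_midnight_epoch(2021, 4, 2, 2))  # 1617314400
```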
edit
Today's results @ 09:00 local time (CET)
today: 04/02/2021 -> 1617314400
total # of queries today: 1681
status  count  description
     0      0  Unknown status
     1    399  Domain contained in gravity database
     2   1128  Forwarded
     3     25  Known, replied to from cache
     4      5  Domain matched by a regex blacklist filter
     5      0  Domain contained in exact blacklist
     6      0  By upstream server (known blocking page IP address)
     7      0  By upstream server (0.0.0.0 or ::)
     8      0  By upstream server (NXDOMAIN with RA bit unset)
     9     44  Domain contained in gravity database (CNAME)
    10      0  Domain matched by a regex blacklist filter (CNAME)
    11      0  Domain contained in exact blacklist (CNAME)
    12     80  Retried query
    13      0  Retried but ignored query (DNSSEC)
    14      0  Already forwarded, not forwarding again
Running nicely here on Pi Zero W, with other Pi-Hole modules running dev branches.
(for completeness, using latest unbound as DNS resolver)
Ran the same check as @jpgpi250; here are my results:
today: 04/02/2021 -> 1617314400
total # of queries today: 41707
status  count  description
     0      0  Unknown status
     1  21660  Domain contained in gravity database
     2  19062  Forwarded
     3    133  Known, replied to from cache
     4      8  Domain matched by a regex blacklist filter
     5      0  Domain contained in exact blacklist
     6      0  By upstream server (known blocking page IP address)
     7      0  By upstream server (0.0.0.0 or ::)
     8      0  By upstream server (NXDOMAIN with RA bit unset)
     9    437  Domain contained in gravity database (CNAME)
    10     20  Domain matched by a regex blacklist filter (CNAME)
    11      0  Domain contained in exact blacklist (CNAME)
    12    387  Retried query
    13      0  Retried but ignored query (DNSSEC)
    14      0  Already forwarded, not forwarding again
@jpgpi250
This may be a brain fart on my side, but I cannot seem to run either SELECT command on 3 different Pis (one built just to confirm whether the problem is my earlier instance).
SELECT count(*) FROM queries WHERE timestamp > 1617228000;
I keep getting the error below:
-bash: syntax error near unexpected token `('
What am I doing wrong?
Edit:
Managed to get this going with the command below:
sqlite3 /etc/pihole/pihole-FTL.db "SELECT count(*) FROM queries WHERE timestamp > 1617228000 AND status = 12;"
However, the output is a single line with a number, not in tabular form as shown.
The select statement you use only returns a single number: the total number of retried queries registered since the given timestamp. The output is thus correct. I assume DL6ER is mostly interested in the result of the query you executed, as that is one of the things dnsmasq v2.85 is supposed to fix (if I understand the description correctly...)
The output I added in my edit is the result of a script that gets the count for all possible status types from the database. Unfortunately, we're not allowed to share scripts here. You can look at the documentation to learn more about the different status types.
Thanks @jpgpi250.
I figured that this must be some type of script but didn’t extend my question that far. I appreciate you highlighting that the script is internal. I managed to run a one-liner using the same documentation and will try to make something from all available options.
sqlite3 /etc/pihole/pihole-FTL.db --header --column "SELECT status, count(*) FROM queries WHERE timestamp > strftime('%s','now','-24 hours') GROUP BY status ORDER BY status ASC;"
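For a table that also lists the status types with zero hits, something along these lines might work. This is only a sketch and not the script used earlier in this thread: the function name status_counts is made up, the descriptions are copied from the tables posted above, and the demo runs against a simplified in-memory table with invented rows rather than the real /etc/pihole/pihole-FTL.db.

```python
import sqlite3

# Status descriptions as they appear in the tables posted in this thread.
STATUS = {
    0: "Unknown status",
    1: "Domain contained in gravity database",
    2: "Forwarded",
    3: "Known, replied to from cache",
    4: "Domain matched by a regex blacklist filter",
    5: "Domain contained in exact blacklist",
    6: "By upstream server (known blocking page IP address)",
    7: "By upstream server (0.0.0.0 or ::)",
    8: "By upstream server (NXDOMAIN with RA bit unset)",
    9: "Domain contained in gravity database (CNAME)",
    10: "Domain matched by a regex blacklist filter (CNAME)",
    11: "Domain contained in exact blacklist (CNAME)",
    12: "Retried query",
    13: "Retried but ignored query (DNSSEC)",
    14: "Already forwarded, not forwarding again",
}


def status_counts(conn, since):
    """Return (status, count, description) for every known status since a timestamp."""
    rows = conn.execute(
        "SELECT status, COUNT(*) FROM queries WHERE timestamp > ? GROUP BY status",
        (since,),
    ).fetchall()
    counts = dict(rows)
    return [(status, counts.get(status, 0), desc) for status, desc in STATUS.items()]


# Demo on a simplified stand-in table (the real table has more columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE queries (timestamp INTEGER, status INTEGER, domain TEXT)")
conn.executemany(
    "INSERT INTO queries VALUES (?, ?, ?)",
    [
        (1617314401, 2, "a.example"),
        (1617314402, 12, "a.example"),
        (1617314403, 12, "b.example"),
        (1617300000, 12, "old.example"),  # before the cut-off, not counted
    ],
)
print("status  count  description")
for status, count, desc in status_counts(conn, 1617314400):
    print(f"{status:>6} {count:>6}  {desc}")
```

To run it against a real installation, the in-memory connection would be replaced by sqlite3.connect("/etc/pihole/pihole-FTL.db"), ideally on a copy of the database.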
Taking this a step further, to verify it's not some specific TLD or domain that is causing the problem.
first column (count) = SELECT count(*)
second column (unique) = SELECT count(DISTINCT domain)
today: 04/02/2021 -> 1617314400
total # of queries today: 9612
status  count  unique  description
     0      0       0  Unknown status
     1   1343      51  Domain contained in gravity database
     2   5918     358  Forwarded
     3    104      16  Known, replied to from cache
     4     18       4  Domain matched by a regex blacklist filter
     5      0       0  Domain contained in exact blacklist
     6      0       0  By upstream server (known blocking page IP address)
     7      0       0  By upstream server (0.0.0.0 or ::)
     8      0       0  By upstream server (NXDOMAIN with RA bit unset)
     9    209       3  Domain contained in gravity database (CNAME)
    10      0       0  Domain matched by a regex blacklist filter (CNAME)
    11      0       0  Domain contained in exact blacklist (CNAME)
    12   2020      95  Retried query
    13      0       0  Retried but ignored query (DNSSEC)
    14      0       0  Already forwarded, not forwarding again
Looking at the result of SELECT DISTINCT domain FROM queries WHERE timestamp > 1617314400 AND status = 12, it doesn't look like a specific TLD or domain is causing this. It's just my activity (browsing and viewing habits) that makes the unique count much lower than the total count. I was trying to find a cause, but this is another dead end...
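The count / unique comparison and the TLD check described above could be sketched roughly like this. The function names are hypothetical and the demo data is invented; it uses a simplified in-memory stand-in for the queries table, not the real database.

```python
import sqlite3
from collections import Counter


def count_and_unique(conn, since):
    """Per status: total number of queries and number of distinct domains."""
    return conn.execute(
        "SELECT status, COUNT(*), COUNT(DISTINCT domain) FROM queries "
        "WHERE timestamp > ? GROUP BY status ORDER BY status",
        (since,),
    ).fetchall()


def retried_tlds(conn, since, status=12):
    """Count distinct domains with the given status per top-level domain."""
    rows = conn.execute(
        "SELECT DISTINCT domain FROM queries WHERE timestamp > ? AND status = ?",
        (since, status),
    )
    return Counter(domain.rstrip(".").rsplit(".", 1)[-1] for (domain,) in rows)


# Demo on invented data in a simplified table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE queries (timestamp INTEGER, status INTEGER, domain TEXT)")
db.executemany(
    "INSERT INTO queries VALUES (?, ?, ?)",
    [
        (1617314401, 12, "a.example.com"),
        (1617314402, 12, "a.example.com"),
        (1617314403, 12, "b.example.net"),
        (1617314404, 2, "c.example.org"),
    ],
)
print(count_and_unique(db, 1617314400))
print(retried_tlds(db, 1617314400))
```

If one TLD or a handful of domains dominated the second result, that would point to a domain-specific cause; an even spread is consistent with the conclusion above.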
edit
I took the result of the above query (95 domains) and used it as the source for DPT (DNS Performance Test). I checked the stats again after completion: no significant change in the count (status 12). This confirms it's not a TLD or domain problem.
/edit
Normally, I use unbound as a recursive resolver, so pihole-FTL reports stats based on the data provided by unbound.
As I have a rather specific, self-compiled unbound setup (TCP Fast Open, redis and unbound optimizations), I wanted to make sure this is not causing the problem. So I installed knot-resolver with its out-of-the-box config and changed the pihole-FTL resolver settings to use knot-resolver, taking unbound out of the picture.
It didn't take long to conclude that my unbound config isn't the cause of the retries: knot-resolver generated similar stats (lots of retries) within a few hours.
While I was testing whether knot-resolver gave me the same features (DNSSEC, IPv6 support, ...), I noticed during one of the tests that the retry count increased significantly when a lot of messages like "dnsmasq[1135]: reply ipv6.test-ipv6.fratec.net is CNAME" appeared in the pihole log. To reproduce this, just browse to this site and you'll see a lot of them. No idea if this points DL6ER in the right direction, but I hope it helps.
edit
I just ran a test on the unique domains with status 12 from today. Out of 88 unique domains listed, 55 are actually CNAMEs. I don't know if this means anything...
/edit
I also noticed the following.
I installed a fresh Pi-hole on a new system.
As soon as Pi-hole was installed, I executed pihole checkout ftl update/dnsmasq-v2.85
I tried to add entries to the aliasclient table of /etc/pihole/pihole-FTL.db. This failed because the table didn't exist. The aliasclient_id field in the network table was also missing.
I switched back to master and the aliasclient table was immediately created, as was the aliasclient_id field.
I switched back to update/dnsmasq-v2.85 again: problem solved...
I hope the alias feature is here to stay (not removed in the next version of FTL), as it works perfectly...
Thanks, this hint was very helpful (even if it doesn't have anything to do with CNAMEs). I can reproduce this with your suggested test.
edit: Content largely replaced/updated
There seems to be an issue with the "fixed" retry algorithm of dnsmasq. I reported this upstream. The issue is that certain queries are refused by the upstream (SERVFAIL). This is by design of the IPv6 test. dnsmasq retries those SERVFAILs while it shouldn't. Further investigation is going on.
I pushed a small fix that should detect and handle this case. Please update on this branch and try again.
This. A warning like Ignoring self-retry would only be shown for DEBUG_QUERIES=true.
This special branch may not have been in sync with development; however, the aliasclient feature has been there for a while now. The table and field are added when the database is updated to version 9 (which is the most recent version). Does it work as expected now?
Yes, this is another, closely related issue which I haven't seen during my tests. I'll push a fix for this later today (edit: done), even though it is completely harmless (a no-op for identical IDs). We should still suppress the debug message in this case.
It's strange that you don't see the self-retry message, but this may just mean you're not even affected by the issue I've fixed.
Can you identify the pihole.log lines related to this?
This is simplified by setting log-queries=extra in /etc/dnsmasq.d/01-pihole.conf, as this ensures the query IDs also show up in pihole.log.
Did the retry count in your database change notably since the most recent update?
Just to ensure we run (about) the same test: What is your result? I'm getting a 10/10 in the end.
The self-retry messages appear close to the end or even shortly afterwards for me.