Help us test FTL v5.8 / dnsmasq v2.85

We actually try to always stay ahead of the release candidates and apply all patches that are approved by Simon. That way we are always up to date, and new changes can be tested before the next release candidate is numbered.

Coro's statement is not correct. The query ID is compared here:

      for (src = &forward->frec_src; src; src = src->next)
        if (src->orig_id == ntohs(header->id) && 
            sockaddr_isequal(&src->source, udpaddr))
          break;

This code says:

  1. When the query comes from the same source (IP address) and has the same ID: retried query
  2. Otherwise: repeated query
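
To make the two cases concrete, here is a minimal Python sketch of that comparison. It is not dnsmasq code: `Source`, `classify` and the sample addresses and IDs are made up for illustration, and the source is modelled as a bare address string, following the description above. What exactly `sockaddr_isequal` compares is not shown in the snippet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """Hypothetical stand-in for one entry of the frec_src list."""
    address: str   # client address, standing in for src->source
    query_id: int  # DNS query ID, standing in for src->orig_id


def classify(known_sources, address, query_id):
    """Return "retried" if a known source has the same address and ID, else "repeated"."""
    for src in known_sources:
        if src.query_id == query_id and src.address == address:
            return "retried"
    return "repeated"


# One client is already waiting for an answer to this forwarded query.
waiting = [Source("192.168.2.228", 0x1A2B)]

print(classify(waiting, "192.168.2.228", 0x1A2B))  # same client, same ID
print(classify(waiting, "192.168.2.228", 0x1A2C))  # same client, different ID
print(classify(waiting, "2001:db8::1", 0x1A2B))    # different client, same ID
```

Only the first call matches on both fields, so only that one counts as a retry in this model.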

Simon's patch will reduce the number of packets sent upstream only if you have multiple forward destinations. It will not change anything if you have only one (like a local unbound sitting at 127.0.0.1#5353). It seems we cannot have anything better at this time.

I just pushed the last patch from Simon along with the necessary FTL changes. Please also watch whether strange things happen to the over-time graphs and/or the query type pie chart (like any strange negative counts, etc.). I had to change something in the underlying data structure as well and, as you know, every change may have unintended side effects.

Pi-hole version is v5.2.4 (Latest: v5.2.4)
AdminLTE version is v5.4 (Latest: v5.4)
FTL version is update/dnsmasq-v2.85 vDev-c2ef607 (Latest: v5.7)

the database says:


e.g. first an A query, then an AAAA query

the query log says (Reply N/A 0.0ms) for the A query:


missing a reply for the A query or just cosmetic?

Lots of N/A entries in the query log, using the test site ...

The retries in the query log, which could also be found in the database (status 12), are now shown as N/A in the query log, but the database doesn't show anything abnormal that would allow you to find these entries using a SELECT. So is this just cosmetic?

edit
dig A ipv6.test-ipv6.cl doesn't return an IPv4 address
dig AAAA ipv6.test-ipv6.cl does return IPv6 2001:1398:1:0:200:1:122:49

does N/A always mean no result? Is there a status that would reflect "No result" other than N/A, with no matching value in the database?
/edit

Simon eventually agreed that my patch is useful. He just stripped the customizability of the timeout value and hard-coded it to 2 seconds. Update pushed.

It means no result was received at all from upstream (so we don't know whether it was NOERROR, NODATA, NXDOMAIN, etc.). It is perfectly normal after restarting, as neither the reply type nor the response time is stored in the database, to avoid blowing the database file up (further).

I do not see it with the test page, however, I made a few adjustments to how the replies of cached answers are stored. My last change could have caused confusion in the code when queries are blocked externally.

edit We're now at 2.85rc3, and merging my patch leads to a resurrection of the former query status IN_PROGRESS. It is used for queries that are received within said two-second interval and do not trigger a re-forwarding.

How can I get the update to 2.85rc?

Original instructions from DL6ER (NOT yet using the test branch):

If you're already on 'update/dnsmasq-v2.85', just run 'pihole -up'

this should give you

pihole -v

  Pi-hole version is v5.2.4 (Latest: v5.2.4)
  AdminLTE version is v5.4 (Latest: v5.4)
  FTL version is update/dnsmasq-v2.85 vDev-b1deb8c (Latest: v5.7)

pihole-FTL -vv

****************************** FTL **********************************
Version:         vDev-b1deb8c

****************************** dnsmasq ******************************
Version:         pi-hole-2.85
Compile options: IPv6 GNU-getopt no-DBus no-UBus no-i18n IDN DHCP DHCPv6 Lua TFTP no-conntrack ipset auth cryptohash DNSSEC loop-detect inotify dumpfile

****************************** SQLite3 ******************************
Version:         3.35.2
Compile options: COMPILER=gcc-6.3.0 20170516 DEFAULT_FOREIGN_KEYS OMIT_DEPRECATED OMIT_LOAD_EXTENSION OMIT_PROGRESS_CALLBACK THREADSAFE=1
******************************** LUA ********************************
Lua 5.4.1  Copyright (C) 1994-2020 Lua.org, PUC-Rio

YES, also read this in the discussion group (what changed his mind???)

Anyway, the retry count is back to an acceptable level, and the count for status 14 (Already forwarded, not forwarding again) came alive (in previous tests on this branch it was always zero).

Most important result, there appears to be no impact on the user experience, in the end, that is what counts...

today: 04/07/2021 -> 1617746400
total # of queries today: 3049
status  count   unique  description
0       0       0       Unknown status
1       276     28      Domain contained in gravity database
2       2295    422     Forwarded
3       56      17      Known, replied to from cache
4       326     22      Domain matched by a regex blacklist filter
5       0       0       Domain contained in exact blacklist
6       0       0       By upstream server (known blocking page IP address)
7       0       0       By upstream server (0.0.0.0 or ::)
8       0       0       By upstream server (NXDOMAIN with RA bit unset)
9       40      2       Domain contained in gravity database (CNAME)
10      0       0       Domain matched by a regex blacklist filter (CNAME)
11      0       0       Domain contained in exact blacklist (CNAME)
12      15      10      Retried query
13      0       0       Retried but ignored query (DNSSEC)
14      41      22      Already forwarded, not forwarding again

Thank you

Works amazingly well here as well. I noticed a small reduction in memory usage in /dev/shm as well :+1:

It was

small-hours musing

It is very funny how similar his conclusion is to the reasoning DL6ER gave in his very first post to the list.

The code DL6ER proposed here:

       /* If we don't want to retry just now, drop this query right after
          we added it to the list above */
       if (difftime(now, forward->time) < daemon->retry_timeout)
         return 0;

The code added by Simon:

       /* closely spaced identical queries cannot be a try and a retry, so
          it's safe to wait for the reply from the first without
          forwarding the second. */
       if (difftime(now, forward->time) < 2)
         return 0;

Can you tell a difference (other than the now hard-coded time)? :slight_smile:
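
For anyone who wants to play with the timing, here is a small Python sketch of that check. It only models the comparison itself; `forward_again`, `HARD_CODED_WINDOW` and the sample timestamps are names and values invented for this illustration, not anything from dnsmasq.

```python
HARD_CODED_WINDOW = 2  # seconds, the value in the code Simon added


def forward_again(now, first_forwarded, retry_timeout=HARD_CODED_WINDOW):
    """Return True if an identical query arriving at `now` should be forwarded again.

    Models `difftime(now, forward->time) < timeout` followed by `return 0`:
    inside the window the query is not sent upstream a second time.
    """
    return (now - first_forwarded) >= retry_timeout


first = 1000.0  # arbitrary timestamp of the first forward, in seconds
for delay in (0.0, 0.5, 1.9, 2.0, 5.0):
    print(delay, forward_again(first + delay, first))
```

With `retry_timeout` left at its default, the configurable variant and the hard-coded one behave identically in this model; passing a different value reproduces the customizable timeout that was stripped.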

What concerns me a bit is that he used @DL6ER's code 1:1 (the practical realization of it) and didn't credit his work. Not a very nice move.

We are very friendly with Simon and there are things and conversations that happen that you may not be privy to. Please keep in mind that you don't know who is reading these forums, there are far more anonymous users here.

pi@pihole:~ $ pihole checkout ftl update/dnsmasq-v2.85
Please note that changing branches severely alters your Pi-hole subsystems
Features that work on the master branch, may not on a development branch
This feature is NOT supported unless a Pi-hole developer explicitly asks!
Have you read and understood this? [y/N] y
[✓] Branch update/dnsmasq-v2.85 exists
[✓] Downloading and Installing FTL
[✓] Restarting pihole-FTL service...
[✓] Enabling pihole-FTL service to start on reboot...
pi@pihole:~ $ pihole -up
[i] Checking for updates...
[i] Pi-hole Core: up to date
[i] Web Interface: up to date
[i] FTL: up to date
[i] Warning: You are using FTL from a custom branch (update/dnsmasq-v2.85) and might be missing future releases.
[✓] Everything is up to date!
pi@pihole:~ $ dnsmasq
-bash: dnsmasq: command not found

I tried to install this dev branch, but how can I check which version of dnsmasq is used?

Run
pihole-FTL -vv


just to let you know, no abnormal things detected (user experience, pihole dashboard, query log). The status 14 count is high, the usual suspects, such as alt1-mtalk.google.com (android mail notification check) but there are no noticeable consequences.

today: 04/07/2021 -> 1617746400
total # of queries today: 9315
status  count   unique  description
0       0       0       Unknown status
1       769     39      Domain contained in gravity database
2       6107    497     Forwarded
3       102     21      Known, replied to from cache
4       384     23      Domain matched by a regex blacklist filter
5       0       0       Domain contained in exact blacklist
6       0       0       By upstream server (known blocking page IP address)
7       0       0       By upstream server (0.0.0.0 or ::)
8       0       0       By upstream server (NXDOMAIN with RA bit unset)
9       95      2       Domain contained in gravity database (CNAME)
10      0       0       Domain matched by a regex blacklist filter (CNAME)
11      0       0       Domain contained in exact blacklist (CNAME)
12      32      20      Retried query
13      0       0       Retried but ignored query (DNSSEC)
14      1826    37      Already forwarded, not forwarding again

I updated to this branch
and worked with it all day, no issues.
Everything works fine and Pi-hole is just as reliable as always.
If I encounter any issues before the next release, I will report them in this thread.

Thank you for the great tool!

This is quite a high number. It seems to be something like 20 % of all queries!

Compare this to my Pi-hole

$ sqlite3 /etc/pihole/pihole-FTL.db --header --column "SELECT status, count(*) AS absolute, printf('%.2f%%',(100.0*count(*)/(SELECT count(*) FROM queries WHERE timestamp > strftime('%s','now','-24 hours')))) AS relative FROM queries WHERE timestamp > strftime('%s','now','-24 hours') GROUP BY status ORDER BY status ASC;"
status      absolute    relative  
----------  ----------  ----------
1           1527        11.12%    
2           5616        40.89%    
3           6487        47.23%    
9           7           0.05%     
12          20          0.15%     
14          77          0.56%

where the in-progress queries are only some 0.6%.
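
The "something like 20 %" estimate can be checked with the same kind of query. The Python sketch below fills an in-memory SQLite database with the non-zero status counts from the 9315-query table posted above and computes the relative shares. It is only an illustration under simplifying assumptions: the table here has a single `status` column and the 24-hour timestamp filter is left out, so it is not the real pihole-FTL.db layout.

```python
import sqlite3

# Non-zero status counts copied from the 9315-query table posted above.
COUNTS = {1: 769, 2: 6107, 3: 102, 4: 384, 9: 95, 12: 32, 14: 1826}

con = sqlite3.connect(":memory:")
# Simplified stand-in for the real queries table: only the status column.
con.execute("CREATE TABLE queries (status INTEGER)")
for status, count in COUNTS.items():
    con.executemany("INSERT INTO queries VALUES (?)", [(status,)] * count)

# Same aggregation as the command above, minus the timestamp filter.
SQL = """
SELECT status,
       count(*) AS absolute,
       printf('%.2f%%', 100.0 * count(*) / (SELECT count(*) FROM queries)) AS relative
FROM queries
GROUP BY status
ORDER BY status ASC
"""
rows = con.execute(SQL).fetchall()
for status, absolute, relative in rows:
    print(f"{status:<8}{absolute:<10}{relative}")
```

For status 14 this gives 1826 of 9315 queries, i.e. 19.60%.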

If you have the time, it'd be interesting if you could check one or two of them, compare what pihole.log says* and if that matches the database. It should, but checking can never hurt.


*) I know it doesn't say much, but you could check whether the query came in multiple times but was forwarded only once. Also, pihole-FTL.log may contain something useful if you still have query debugging enabled.

Some results are in a PM, but I think (hope) I found a reason why...

Remember (from previous discussions) that I'm on Windows 10 20H2, with IPv4 (fixed address) and IPv6 (temporary IPv6 address, the default out of the MS box). The IPv6 address changes every day, or after a reboot/restart. The fact that the IPv6 address changes isn't really a problem; the fact that the Windows 10 system has both IPv4 and IPv6 (GUA) appears to be a problem (I think).

Look at this very short list of queries, registered in the pihole.log today.

cat /var/log/pihole/pihole.log | grep www.whotracks.me
Apr  8 17:58:22 dnsmasq[18854]: 14786 192.168.2.228/60196 query[A] www.whotracks.me from 192.168.2.228
Apr  8 17:58:22 dnsmasq[18854]: 14786 192.168.2.228/60196 forwarded www.whotracks.me to fdaa:bbcc:ddee:2::5552
Apr  8 17:58:22 dnsmasq[18854]: 14787 192.168.2.228/59656 query[AAAA] www.whotracks.me from 192.168.2.228
Apr  8 17:58:22 dnsmasq[18854]: 14787 192.168.2.228/59656 forwarded www.whotracks.me to fdaa:bbcc:ddee:2::5552
Apr  8 17:58:22 dnsmasq[18854]: 14788 2a02:1810:4d02:6903:7dff:b06a:aed4:9194/59656 query[AAAA] www.whotracks.me from 2a02:1810:4d02:6903:7dff:b06a:aed4:9194
Apr  8 17:58:22 dnsmasq[18854]: 14789 2a02:1810:4d02:6903:7dff:b06a:aed4:9194/60196 query[A] www.whotracks.me from 2a02:1810:4d02:6903:7dff:b06a:aed4:9194
Apr  8 17:58:22 dnsmasq[18854]: 14786 192.168.2.228/60196 reply www.whotracks.me is 13.226.159.6
Apr  8 17:58:22 dnsmasq[18854]: 14786 192.168.2.228/60196 reply www.whotracks.me is 13.226.159.11
Apr  8 17:58:22 dnsmasq[18854]: 14786 192.168.2.228/60196 reply www.whotracks.me is 13.226.159.31
Apr  8 17:58:22 dnsmasq[18854]: 14786 192.168.2.228/60196 reply www.whotracks.me is 13.226.159.75
Apr  8 17:58:22 dnsmasq[18854]: 14787 192.168.2.228/59656 reply www.whotracks.me is 2600:9000:21d7:200:10:8b76:f140:93a1
Apr  8 17:58:22 dnsmasq[18854]: 14787 192.168.2.228/59656 reply www.whotracks.me is 2600:9000:21d7:9800:10:8b76:f140:93a1
Apr  8 17:58:22 dnsmasq[18854]: 14787 192.168.2.228/59656 reply www.whotracks.me is 2600:9000:21d7:2c00:10:8b76:f140:93a1
Apr  8 17:58:22 dnsmasq[18854]: 14787 192.168.2.228/59656 reply www.whotracks.me is 2600:9000:21d7:6000:10:8b76:f140:93a1
Apr  8 17:58:22 dnsmasq[18854]: 14787 192.168.2.228/59656 reply www.whotracks.me is 2600:9000:21d7:3600:10:8b76:f140:93a1
Apr  8 17:58:22 dnsmasq[18854]: 14787 192.168.2.228/59656 reply www.whotracks.me is 2600:9000:21d7:8c00:10:8b76:f140:93a1
Apr  8 17:58:22 dnsmasq[18854]: 14787 192.168.2.228/59656 reply www.whotracks.me is 2600:9000:21d7:a600:10:8b76:f140:93a1
Apr  8 17:58:22 dnsmasq[18854]: 14787 192.168.2.228/59656 reply www.whotracks.me is 2600:9000:21d7:b800:10:8b76:f140:93a1

The system does an A and an AAAA query (first four (4) lines), using the IPv4 address; these are forwarded.
Then, in the next two (2) lines, there is an A and an AAAA query using the IPv6 address, apparently NOT forwarded, but using the same source ports as the IPv4 queries did.
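
For anyone wanting to run the same check on their own log, here is a rough Python sketch that groups such lines by query number and lists the queries that never got a "forwarded" line. The sample lines are copied from the log above (replies shortened to one per query); the regular expression and the names `LINE`, `queries` and `not_forwarded` are my own assumptions about this particular log format, not an official parser.

```python
import re

# Sample lines copied from the pihole.log excerpt above.
LOG = """\
Apr  8 17:58:22 dnsmasq[18854]: 14786 192.168.2.228/60196 query[A] www.whotracks.me from 192.168.2.228
Apr  8 17:58:22 dnsmasq[18854]: 14786 192.168.2.228/60196 forwarded www.whotracks.me to fdaa:bbcc:ddee:2::5552
Apr  8 17:58:22 dnsmasq[18854]: 14787 192.168.2.228/59656 query[AAAA] www.whotracks.me from 192.168.2.228
Apr  8 17:58:22 dnsmasq[18854]: 14787 192.168.2.228/59656 forwarded www.whotracks.me to fdaa:bbcc:ddee:2::5552
Apr  8 17:58:22 dnsmasq[18854]: 14788 2a02:1810:4d02:6903:7dff:b06a:aed4:9194/59656 query[AAAA] www.whotracks.me from 2a02:1810:4d02:6903:7dff:b06a:aed4:9194
Apr  8 17:58:22 dnsmasq[18854]: 14789 2a02:1810:4d02:6903:7dff:b06a:aed4:9194/60196 query[A] www.whotracks.me from 2a02:1810:4d02:6903:7dff:b06a:aed4:9194
Apr  8 17:58:22 dnsmasq[18854]: 14786 192.168.2.228/60196 reply www.whotracks.me is 13.226.159.6
Apr  8 17:58:22 dnsmasq[18854]: 14787 192.168.2.228/59656 reply www.whotracks.me is 2600:9000:21d7:200:10:8b76:f140:93a1
"""

# query number, client address, client port, action (assumed line layout)
LINE = re.compile(r"dnsmasq\[\d+\]: (\d+) (\S+)/(\d+) (query\[\w+\]|forwarded|reply) ")

queries = {}
for line in LOG.splitlines():
    m = LINE.search(line)
    if not m:
        continue
    qid, client, port, action = m.groups()
    entry = queries.setdefault(
        qid, {"client": client, "port": port, "type": None, "forwarded": False}
    )
    if action.startswith("query["):
        entry["type"] = action[6:-1]  # text between the brackets, e.g. "A"
    elif action == "forwarded":
        entry["forwarded"] = True

not_forwarded = [qid for qid, e in queries.items() if not e["forwarded"]]
for qid in not_forwarded:
    e = queries[qid]
    print(qid, e["type"], e["client"], "port", e["port"], "not forwarded")
```

On the sample above this lists queries 14788 and 14789, the two that arrived from the IPv6 address.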

The database entries:

If I read everything correctly in the dnsmasq discussion group, the source port match causes dnsmasq to mark them as retries.

If my analysis is correct (NOT sure), this might be a Windows problem, and I have NO idea how to fix this...

I hope another Windows 10 user reads this, performs the same tests, and hopefully confirms my diagnosis...

No, because

The port doesn't matter at all, and we don't know whether it had the same ID, but we know for sure that the client's IP address is not the same (IPv4 vs. IPv6). So it is a repeated query, not a retry. In that case the behavior of dnsmasq/pihole-FTL is correct, and my patch sent to Simon is actually saving you a whole lot of traffic. Much more than I assumed.

And your analysis is spot on: it is a Windows issue here. It should not send the same queries once via IPv4 and once via IPv6. This is needlessly doubled traffic. If you had only this one client (and it always behaved this way), 50% of all queries would be unnecessary, because IPv4 queries are in no way better than IPv6 queries (or vice versa).

A lot of internet docs say Windows (and most other operating systems) prefers IPv6 over IPv4. Yet, when looking at the pihole log, most of the DNS queries from this machine are IPv4 queries; only some, like the above example, do an IPv4 query immediately followed by an IPv6 query.

The IPv4 DNS server is learned from the IPv4 DHCP server (e.g. the pihole IPv4 address). The IPv6 DNS server address is learned using discovery. This IPv6 address is configured statically on the pfsense box (it ends up in resolv.conf of the pfsense box); as soon as the client gets its IPv6 address (NOT from a DHCP server, but via the track interface option of pfsense), it also picks up the IPv6 DNS server address, using discovery.

There is nothing I can do about this: I can't remove the IPv6 DNS server from the pfsense configuration (the pfsense web interface becomes really sluggish if I do), and I can't change the address to something else, because the client would pick up that address and thus bypass pihole.

Anyway, since the additional, unnecessary IPv6 queries are all local LAN only, and dnsmasq now prevents these queries from being forwarded, the user experience isn't really affected, the cause is now known, all I need to figure out is a solution...

Awaiting pihole-FTL v5.8, with final release of dnsmasq v2.85 (released today).

Again, thanks for your time, patience, effort to make pihole a better tool.

This seems like a bug. Is there something obvious differentiating the two behaviors? Like IPv4-only always from Chrome, IPv4+follow-up-IPv6 from the system for other apps? May help to narrow down who needs fixing.

When you say this, do you mean they take it from Router Advertisement (RA) packets sent in your network? Or is there something else going on?

This very much looks like a bug in pfsense. Can you do us the favor and report it to them so they can fix it?

What if this is an invalid address? What happens? And what if you set this to ::1 ?

Maybe wait a few days. The bugs in 2.83 which led to 2.84 (and now 2.85) were only discovered after the official dnsmasq release was out.

I concur. :+1:

  • using Edge Chromium, which hasn't got any configurable DNS settings as far as I know, apart from clearing the browser DNS cache. Changing network adapter priorities doesn't solve anything, and I got stuck (for the moment) on network provider order (error: failed to get network providers).
  • RA
  • NOT a bug, found the requirement a long time ago on the netgate forum (system update status and packet manager take a very long time to load).
  • invalid and / or ::1 results in a timeout.
  • wait a few days? I have been testing v2.85 for the last week. Wait, what would be the point, sit back until other users get into trouble with the final release? Trying to help the developers and community here (mutual benefit)...

So one particular configuration got tested (extensively). I did another test. This does not sound convincing.

Exactly. Wait for problems to get reported to the main dnsmasq mailing list. If you monitor this (I know you do), you see that two bugs were reported (and fixed) since the release. The code is always in motion.