Force update of IP / hostname / clientname relationship

when my script detects a new client I use the following to send the signal:

isFTLready() {
# Ensure pihole-FTL has picked up the new entry
# ref https://discourse.pi-hole.net/t/pihole-g-list-download-disappointing-performance/27266/91
secs="30"
until timeout 1 getent hosts "$1" &> /dev/null; do
	[[ "${secs:-}" -eq 0 ]] && break
	: $((secs--))
	sleep 5
done
}

…

if ! grep -q "${IPv6ndpAddr}" "${hostsdirFile}"; then
	currentTime=$(date)
	# add client to hosts file (/etc/pfsense/neigbour)
	echo "${IPv6ndpAddr} ${IPv6cfgName} ${IPv6cfgName}.localdomain # ${currentTime}" | sudo tee -a "${hostsdirFile}"
	isFTLready "${IPv6ndpAddr}"
	pkill -RTMIN+4 pihole-FTL
fi

…

Unfortunately, pihole-FTL now keeps re-resolving all the host names. It doesn't stop any more; it keeps making PTR requests. I expected this to happen only once.

Once again, pihole -up destroys my custom pihole modifications, even without core or web updates. Unfortunate...

I have no idea how to force the router (using pfSense) to do that. Searching (DuckDuckGo) for pfSense and EDNS doesn't return any results. Will ask the question on the Netgate forum...
edit
As far as I can see in the pfSense web interface, this would imply I have to set up a DNS forwarder on pfSense (= dnsmasq) and set pfSense as the DNS server for the clients, thus: client -> dnsmasq on pfSense -> pihole-FTL -> unbound -> world. Not really what I want (the additional dnsmasq on pfSense)…
/edit

Ah yes, the signal was not reset. Fixed now.

It would still solve all of your problems...

Perfect, works as expected, thank you!!!

don't forget to look at

[2020-08-16 15:02:00.307 15805/T15811] Trying to resolve 192.168.2.232
[2020-08-16 15:02:00.308 15805/T15811] Setting nameservers to:
[2020-08-16 15:02:00.308 15805/T15811]  0: 127.0.0.1:53
[2020-08-16 15:02:00.308 15805/T15811]  1: 0.0.0.0:0
[2020-08-16 15:02:00.308 15805/T15811]  2: 0.0.0.0:0
[2020-08-16 15:02:00.311 15805/T15811]  ---> "7730geth0.localdomain" (found internally)
[2020-08-16 15:02:00.311 15805/T15811] Setting nameservers back to default:
[2020-08-16 15:02:00.311 15805/T15811]  0: 127.0.0.1:53
[2020-08-16 15:02:00.311 15805/T15811]  1: 0.0.0.0:0
[2020-08-16 15:02:00.311 15805/T15811]  2: 0.0.0.0:0
[2020-08-16 15:02:00.311 15805/T15811] 2 / 7 client host names resolved
[2020-08-16 15:02:00.311 15805/T15811] 0 / 2 upstream server host names resolved
[2020-08-16 15:02:01.736 16178/F15805] TCP worker forked for client 192.168.2.232 on interface eth0 (192.168.2.57)
[2020-08-16 15:02:01.745 16178/F15805] TCP worker terminating (client disconnected)

This actually doesn't seem to be an FTL bug; rather, res_state does not contain any IPv6 addresses (it cannot). Which raises the question of whether this is a bug in the GNU C library.

This is the FTL routine printing the server list you quoted above:

You can see that it gets the server from _res.nsaddr_list[i].sin_addr. Looking into the responsible file,

this variable is of type struct sockaddr_in, which has only one address member, sin_addr, which is only 32 bits wide. So it could not even contain an IPv6 address.

I will check with an attached debugger if the IPv6 addresses are stored elsewhere where they are still accessible, however, so far, I cannot tell where the resolv.conf provided IPv6 servers are. It is rather strange that they are not stored in the _res_state object where all the other resolv.conf related stuff is stored.

Will be fixed by:

It is a purely cosmetic issue, the IPv6 servers are indeed there (and even used by FTL); however, they are outside of the "normal" place where name servers are stored, so I just forgot them in the debug printing.

Thank you for the fix.

I'm a little worried about the missing cleanup. Today, there was a question about the hourly PTR requests (on a Dutch forum), visible in the query log. The user indicated he was already seeing about 90 PTR requests on the hour; I assume this is due to the Windows 10 privacy extensions and the ever-changing IPv6 temporary address. I have no problem writing a cleanup routine myself, but the average user will NOT (be able to) do this. Furthermore, I use the MAXDBDAYS=8 setting, which makes cleanup relatively easy (the database remains relatively small); as you already indicated, most users will NOT use this setting and will have the default 365 days of history.

Apart from this, letting you know that everything works as expected, no more IPv6 addresses without a name. Thank you for this.

edit
even works perfectly for a Raspbian client with privacy extensions enabled, read here
/edit

What kind of clean-up? Addresses not being used any more?

How would you distinguish temporary IP addresses (which will be abandoned a little later) from devices which left the house but come back later (permanent IPv6 addresses)?

Keeping only the last IPv6 address seen for a client will distort the picture as a client can very well have multiple IPv6 addresses at the same time. I don't think there is much that could be done here, but surprise us with a nice idea and everyone will benefit :slight_smile:

This could be relaxed by not trying to resolve host names which have not been seen within the last hour.
But again the question: is this what we want? It may lead to missing a host name which only became available later.

As I explained to DL6ER (and those users who followed these topics) in earlier conversations.

I'm using my router to get neighbour information (pfsense ndp -na) and populate a hosts file, read by pihole-FTL. This hosts file now already contains over 80 entries (six days up and running), cleaning this up is my problem, as this file only exists on my system.
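For reference, the extraction step can be sketched roughly as below. This is a minimal sketch, not the author's actual script: the column layout is an assumption based on typical FreeBSD/pfSense `ndp -na` output (neighbour address first, link-local entries carrying a `%interface` scope suffix), and the sample input is made up; in reality the input would come from the router, e.g. via ssh.

```shell
#!/usr/bin/env bash
# Minimal sketch: extract global IPv6 neighbour addresses from "ndp -na"
# output, skipping the header line and link-local (fe80::) entries.
# Sample data stands in for: ssh router 'ndp -na' (format is an assumption).
sample_ndp_output='Neighbor                       Linklayer Address  Netif Expire    S Flags
fe80::1%igb0                   00:11:22:33:44:55  igb0  permanent R
2001:db8::232                  66:77:88:99:aa:bb  igb0  23h59m58s S'

result=$(echo "$sample_ndp_output" | awk 'NR > 1 && $1 !~ /^fe80/ {
    addr = $1
    sub(/%.*/, "", addr)   # strip any %interface scope suffix
    print addr
}')
echo "$result"
```

Each extracted address would then be assembled into a hosts line as shown in the earlier snippet (address, name, name.localdomain) and appended to the extra hosts file.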

pihole-FTL now uses these entries (normal host entries) to complete the database pihole-FTL.db, table network, field name. This table (network) and the table network_addresses contain all addresses pihole-FTL has ever seen, even if only once.
On other (not mine) systems, pihole-FTL uses EDNS(0) information, and possibly other information, to find and update the name field. It always happens (hourly), regardless of what I'm doing on my system.

Personally, I'm using MAXDBDAYS=8, so my database contains only information about the last 8 days; the default setting is MAXDBDAYS=365. The network and network_addresses tables are never cleaned up. This implies (an assumption, I've never seen this, as I regularly reinstall pihole fresh) that a user who has been running pihole -up since the first appearance of these tables in the database has every address used since the installation of pihole in the database.
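For reference, this retention window is a pihole-FTL.conf setting (a config fragment; the restart command may differ depending on your init system):

```
# /etc/pihole/pihole-FTL.conf
MAXDBDAYS=8
```

followed by `sudo service pihole-FTL restart` to apply.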

Personally, I'm already cleaning up the additional hosts file by checking in the pihole-FTL.db queries table whether the addresses are still in the database (older temporary IPv6 addresses will no longer exist in the queries table after 8 days with MAXDBDAYS=8).
Since there is no cleanup routine, I'm considering doing a similar cleanup in the network and network_addresses tables, because I'm only interested in the entries that still have a match in the queries table.
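The cleanup idea described above could be sketched as follows. This is a minimal sketch, not a supported routine: the column names (network_addresses.ip, queries.client, network.id, network_addresses.network_id) are assumptions to verify against the actual pihole-FTL.db schema, and the database should only be modified with FTL stopped and a backup at hand.

```shell
#!/usr/bin/env bash
# Sketch: remove addresses from network_addresses that no longer appear as a
# client in the queries table, then remove network rows left without any
# address. Column names are assumptions; verify against the live schema.
cleanup_network_tables() {
  sqlite3 "$1" <<'SQL'
DELETE FROM network_addresses
 WHERE ip NOT IN (SELECT DISTINCT client FROM queries);
DELETE FROM network
 WHERE id NOT IN (SELECT network_id FROM network_addresses);
SQL
}

# Usage sketch (stop FTL first, and keep a backup):
#   sudo service pihole-FTL stop
#   cleanup_network_tables /etc/pihole/pihole-FTL.db
#   sudo service pihole-FTL start
```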

Maybe DL6ER, the developer, will explain this is not necessary (sqlite3 performance not affected in the long run), that's why I'm asking him the question, he knows what I'm talking about, since we discussed it previously.

This thread is in the Community Help section so expect the community to comment.

Edit: An IPv6 address is 16 bytes. Which addresses should be selected for removal, keeping in mind that an address that makes a query should be kept for historical reasons? The reason you keep the database is so you can go back and search for items like "Who made this query?"


I concur with

The question of when to auto-clean which address is still open. It makes sense to do it like @jpgpi250 in the way of

because it doesn't matter if addresses are forgotten once there are no more queries for them. However, this is not a universal answer, as @Coro mentioned:

I do not really see this as an issue, as removing addresses from the network_addresses table has no technical consequences (with the exception that the who-is-who relation mentioned by @DanSchaper will be lost).

Yes, performance-wise this is totally irrelevant. I would only start to worry if the number of addresses goes into the 10,000s. I know it will go in this direction at some point for many users, with Windows doing this by default now, so I'm not opposed to doing an auto-clean. If it is done, however, it should be done properly.

So ideas:

  1. The way @jpgpi250 mentioned: Scan if the address is still mentioned in queries and - if not - remove the address. I'm afraid this is not doable because scanning the queries table is a full table scan.
    On a typical table with 365 days of data, scanning through several million (maybe dozens of millions of) queries is not feasible. Unfortunately, there is no index on client, and retrofitting one is not a great option either, as the database update could take up to hours on a Zero, and interrupting it (because the user thinks something is broken and hard-resets the device) would make recovery difficult. Not doable

  2. Wipe out addresses not seen for some time. We know for each address when the last query was made so this is possible.
    However, what should be the default? One day, one week, one month, one year? This could be user-configurable, however, every extra config option adds extra validation requirements, etc. We could also just reuse the MAXDBDAYS setting here - Doable
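The second idea could be sketched as below. This is a minimal sketch under assumptions: the lastSeen column (a unix timestamp in network_addresses) must be verified against the actual schema, and reusing MAXDBDAYS as the cutoff is just the suggestion from the list above, not an implemented feature.

```shell
#!/usr/bin/env bash
# Sketch of idea 2: delete addresses whose last sighting is older than a
# cutoff of N days (e.g. reusing the MAXDBDAYS value). The lastSeen column
# name and its unix-timestamp semantics are assumptions to verify.
prune_stale_addresses() {
  local db="$1" days="$2"
  sqlite3 "$db" \
    "DELETE FROM network_addresses
      WHERE lastSeen < strftime('%s','now') - $days * 86400;"
}

# Usage sketch: prune_stale_addresses /etc/pihole/pihole-FTL.db 8
```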

So much for the first issue: Auto-cleaning of addresses


There is another topic brought up here:

We could also limit the re-resolving to those clients which were recently active (like within the last hour). Any objections from anyone?

Maybe (I don't know if this is possible) you should take a survey of the entire community, to check if anybody is really interested in having a 365-day history. I set MAXDBDAYS=8 because this allows me to look at queries on the same day one week ago, e.g. to check if the system behaved identically on the same day one week earlier.

time frame one hour + devices that don't have a name yet?

Do we really need to default to 365 days, though? If historical data is needed for that kind of time period, then a better option would be something like ElkHole. I'm having a hard time justifying more than 1 month (30 days) of data storage with the granularity to know IP-to-query relationships. Possibly a tiered approach? Granular for 30 days (just an arbitrary number) and then something more "fat", like aggregate numbers.

(I think jpgpi has brought up the year long storage issue in the past and I think it's an issue that should be addressed in some fashion.)

I will think about making a poll. I'm not yet convinced this is necessary because it is just a default and users can set whatever they like. Personally, I don't use the long-term data very often. However, when I do, I'm often looking back several months (like to before a certain device was added to my WiFi). Usages differ, and my way of doing this may just be another extreme for which I will have to change the default setting.

I'm a bit worried about users updating without having set this value and immediately losing 11 months of history (assuming we set the default to 1 month).

This would prevent existing host names from being updated. I'm thinking about DHCP servers handing out the same address to other devices if the first device disappeared (non-deterministic or sequential DHCP). You may be stuck with 192.168.0.2 -> evas-phone even when this information is several hours old and 192.168.0.2 -> adams-tablet is now correct.

So I disagree on the second but agree on the first bit. Maybe two hours. This ensures that (up to) three PTRs are made and, if nothing is found, we do not try any further. Unless the client is still active, in which case we continue to resolve on the full hour.

If you're going for a month, I would suggest 32 days. Reasoning: the option to see what happened on the same day of the month, exactly one month ago; the longest month is 31 days.

Just a suggestion DanSchaper, no offence intended...

edit
you could do this for new installs only, which eliminates the danger that may exist on existing piholes
/edit

No offense taken, like I said, just an arbitrary number so 32 is no more or less functional than 30.

Cleaning addresses not seen for 7 days:

Only resolve clients and upstream servers being active within the last two hours:

Both PRs are in draft mode as they depend on #762 being merged first. The changes are much less than suggested by the PR preview.


With a major version update (like v6) we're allowed to make breaking changes. I think there are a few ways to handle it gracefully, though: back up the existing database (which we should do anyway), only modify if the database is less than 30/32 days old, or some other genius idea you always come up with. :wink:

I know huge flashing neon warnings are always ignored but hopefully the recent v5 update poked enough people to realize that auto updating is horrifically bad.

Do these changes have an impact on the changes currently available in pihole checkout ftl tweak/always_update_database_hostnames? More specifically, is the signal pkill -RTMIN+4 pihole-FTL still available when all of this is merged?

Yes, they are independent.