Discussion cleanup network and network_addresses table in /etc/pihole-FTL.db

Let's make one thing very clear: I highly value all comments and ideas that DL6ER submits. This is NOT in any way an attempt to disrespect, offend or insult DL6ER. I want to thank him for his time, effort and great work he contributes to making pihole a better product.

As requested by DL6ER, I'm starting a discussion regarding this automated cleanup. This was discussed earlier in this topic.

As a result of the brief discussion, a change was entered by DL6ER:

the change implies all client entries, not seen for seven days are simply removed.

I don't have access to a lot of router/firewall devices, but seem to remember from my old days that a lot of DHCP servers have a default lease time of 14 days. I'm aware this is NOT the case for dnsmasq (=pihole-FTL), the default dnsmasq lease times are one hour for IPv4 and one day for IPv6 (/dnsmasq-2.82/src/config.h).

Personally, I'm not very happy with this (the change suggested by DL6ER) approach, it implies that query log entries, older than seven days, would no longer show a hostname, but the IP address.

The argument DL6ER uses to justify this:
quote
Imagine DHCP servers handing out the IP address 192.168.1.2 today to your phone, tomorrow your laptop, and then your whatever. The network table only stores the most recent address<->device information so abusing it for long-term information can be problematic.
/quote

Although this is true, I feel this is only a cosmetic issue for those users who fail to make static DHCP entries, based on the MAC address of their devices.
Some of you will argue that this is NOT done by the majority of the users. Personally, I think users will eventually be forced to do (learn) this, for the simple reason setting up group management, using clients that get a different IP address from the DHCP server, as soon as the lease expires, would be near impossible. The group assignments would simply apply to a different device, as soon as the address changes. The same applies to piholes Local DNS Records, fairly pointless to enter data in this hostsfile, if the addresses change all the time.

I suggested the following:

  • The MAXDBDAYS setting (default if not in /etc/pihole-FTL.conf = 365). implies there is a database cleanup routine, that removes entries from the queries table, if the number of days specified, is exceeded. It's fairly easy, without severe performance impact on sqlite3, to get the first entry in the queries table:
firstQuery=$(sqlite3 ${piholeFTLdb} ".timeout = 20000" \
	"SELECT timestamp FROM "queries" ORDER BY ROWID ASC LIMIT 1;")

As soon as the MAXDBDAYS routine has completed (again, I assume this routine is executed once a day), the time of the first entry in the queries table will have changed (if the system has been up and running long enough)

  • Now we can get the id and lastQuery fields from the network table for all entries in the network table:
mapfile -t networkArray < <(sqlite3 ${piholeFTLdb} ".timeout = 20000" \
	"SELECT id, lastQuery FROM network \
		ORDER BY id;")
  • Finally, loop over the entries in the network table, excluding the entries that don't have a valid lastQuery entry (usually the router MAC address) and remove the entries that have a lastQuery entry, older than the firstQuery entry in the queries table.:
for (( i=0; i<${#networkArray[@]}; i++ )); do
	IFS='|' read -r id lastQuery <<< ${networkArray[i]}
	if [ \( "${firstQuery}" -gt "${lastQuery}" \) -a \( "${lastQuery}" -gt "0" \) ]; then
		sudo sqlite3 ${piholeFTLdb} ".timeout = 20000" \
			"DELETE from network 
				WHERE id = \"${id}\";"
		sudo sqlite3 ${piholeFTLdb} ".timeout = 20000" \
			"DELETE from network_addresses
				WHERE id = \"${id}\";"
	fi
done

This is, of course the bash solution, unfortunately, I'm not smart enough to write the required FTL code.

This solution is, in my view, the perfect solution for those who have assigned a static DHCP address for their devices. I agree that it might be confusing (cosmetic) for users that don't do that, but also assume they are not using the full potential of pihole (group management), they will sooner or later be forced to use static DHCP assignments, if they ever want group management to be reliable.
The solution also respects the MAXDBDAYS setting, users my have configured (or the 365 days default), and will allow them to identify queries, using the hostname, for queries older than seven days.

DL6ER: You have repeatedly indicated you don't use Windows 10, thus not having experienced the problems (cosmetic) with ever changing temporary IPv6 addresses. Might I suggest you apply these settings, although complete useless (the conclusion of the topic) to one of your clients, just to provide you with a first hand experience of the results of this (default) Windows 10 behavior. Again, just as the majority of the users don't assign static DHCP addresses, I assume the same majority will never disable the Windows 10 privacy extensions.

:+1:

Is Pi-hole using this information at all for host names displaying or is this a hypothetical argument?

Static DHCP entries are a mistake by themselves. Use a deterministic DHCP server (like Pi-hole) to achieve the same thing. Only crap ISP routers hand out sequential IP addresses. Unfortunately, there are many of them.

Agreed.

Why not simply delete all addresses not seen longer than MAXDBDAYS ago? The extra scan of the queries table seems entirely superfluous.


I just read the discussion you linked to and also a few other related ones and I am very surprised about this. Because of what you said above (the Windows issue).

If you use the MAXDBDAYS setting, this will not have any immediate effect on the extreme number of "private" IPv6 addresses. His method would at least partially clean them up. So if you get a few a days, they will disappear after a couple of days. At least, they will not grow towards infinity.

You are arguing that you are looking for a solution for the majority of users. The majority of users will have standard settings, so MAXDBDAYS to one year. Even if they decide to change this to one month, it would still accumulate a ton of addresses.

My question: What do you want (this is not clear to me, no offending in this question intended)?

  1. Do you want to clean up the addresses soon enough so that there is not a ton of addresses after a few days? Or
  2. Do you want to have IP address-to-hostname resolvability for the entire lifespan of the database?

I don't think you can have both.
edit Maybe unless you can reliably identify Windows (or any other) privacy addresses so they can be cleaned up separately (and sooner). However, how about their long-term host names resolving?

I've done extensive testing on this. If the network table doesn't have a hostname in the name field, the query log, long term query log and dashboard show IP addresses, as opposed to the names. If a name is injected into that field, either by pihole-FTL or using sqlite3 commands, the hostname appears in the web interface.

address-to-hostname resolvability for the entire lifespan of the database

yes I can

solved that problem by retrieving and storing the neighbor information in a separate hostsfile.

I use MAXDBDAYS=8, this allows me to see what happened (today is Monday) the entries of the previous Monday, this to identify if something is happening weekly. I hate the fact that the entries from one week ago (last Monday) would require me to resolve IP addresses, even though all th IP/hostsname information is available to me (but no longer in the database, due to the seven day cleanup routine. The very least I would be satisfied with, is a configurable setting, something like CLEANUPNETWORKDAYS=. Using the MAXDBDAYS setting would satisfy me, since I already limited the DB size, but could be a problem for users having a different (or none) MAXDBDAYS setting.

Even faster if you use ftl table with id=2

1 Like

Just to summarize this for myself: Your request is to make the clean-up interval identical to the MAXDBDAYS value, effectively making it rather useless for the current default of 365 days?

Apparently, you seem to enjoy turning this discussion into something ridiculous, using every word or value I mention as a reference to make me look stupid.

I'm using MAXDBDAYS=8, so the default 365 days setting doesn't interest me. For all I care, the cleanup routine doesn't need to be executed at all, if there isn't a MAXDBDAYS setting in /etc/pihole/pihole-FTL.conf, as this would indicate the user isn't interested in optimizing the system.

If the reference for future changes to FTL is whether or not the average user is (will be) using the new or existing features/settings, than DL6ER may just as well throw away 90% of all his work over the last few months, examples, not limited to, here, here, here, ...

Listen, I realize you are ticked off for whatever reasons. But, leave my feature request (“Support for add-subnet option from dnsmasq (ECS/EDNS0 Client Subnet)”) out of it, please.

I think you are missing the value in what I proposed in that request, even for an “average” user. That feature should simplify a lot of “average” configurations and at the same time it fits for more advanced use-cases like the one I specifically called out in the request.

Friendly reminder everyone:

If you preface comments with phrases like "I don't mean to offend" that doesn't mean you can then go on to offend without impunity.

Keep it on topic unless you want to be ignored by the developers.

2 Likes

I see this has went on quite a bit. I agreed with @jpgpi250 already yesterday (elsewhere) that the cleaning is up to discussion. Hence, I switched the corresponding feature pull request into draft mode so we have all the time we need to find out what we want.

It's not a problem of mine not being able to realize that cleaning is needed. Otherwise, I would have not proposed an automatic cleaning feature (PR above). This is also not part of the discussion. The discussion is about:

Is 7 days a good choice or it is too harsh?

Reading through this, I can see mainly two sides:

  1. We want to keep the IPs for MAXDBDAYS to be able to get host names from IP addresses
  2. We want to get rid of addresses popped up once as fast as possible so the network_table doesn't grow like crazy all the time

I thought the previous discussion we had (again elsewhere) was targeting point 2. To clean up with Windows machines filling the network_addresses table to the extreme. It seems I was mistaken when I do now read that this seems to have been a misinterpretation on my side. Such things happen.

The proposal is now to set the expiration of network_addresses to MAXDBDAYS. This will lead to a much less effective cleaning* but otherwise more consistent implementation.

Any objections?


*) In the other discussion, @jpgpi250 pointed out that Windows adds a new temporary IPv6 PE every two hours. This makes 12 a day, 84 a week, roughly 4,400 a year. This is per-device but assuming 24/7 operation so likely worse than you'd see in reality.

New plan to get something everyone can hopefully get happy with: The maximum age of addresses can now be controlled though the new MAXNETAGE config option. It defaults to the value of MAXDBDAYS. The new option MAXNETAGE can be set to zero to avoid any auto-cleaning.

3 Likes

I never said that on a windows 10 machine the temporary IPv6 address changes every two hours. The 2 hour change is something I experimented with on a Linux machine, see this topic.

On a windows machine, the temporary IPv6 address changes every time you reboot OR once a day (if you leave the machine on), unless you change the values in the registry (not tested). You can confirm this by running the command netsh interface ipv6 show privacy, which will show you, among other things Maximum Preferred Lifetime: 1d

great solution, thank you for this.

My native language is not English, this may lead to some confusion from time to time. The opposite of what you understood is true, I encourage DL6ER to make these kind of changes, even if they will only be used by a few users. I have been running another test branch my self, the one that allows regexes such as .*;querytype=!A, and even though I assume very few users will ever use this, I'm an absolute fan of these kind of features / enhancements, even the ones I will not use myself.

I understood it the same way, thanks for clearing this up. The confusion comes from "the system"

which is Linux at this point, but this wasn't clear to me. I guess the same confusion hit @DL6ER.

I agree :+1: