Pi-Hole for a lot of IPv6 clients

Thanks for your reply. On top of everything else, disabling "database.DBexport" resolves the issue. I also noticed that pihole-FTL.db gets quite large over time (2.1 GB the last time I checked).

I have another problem related to this topic.

Because the DNS server has a lot of clients (sometimes more than 2000 concurrent IPv6 clients), the "Client activity over last 24 hours" part of the admin panel takes a few minutes to load and makes the whole admin panel unresponsive. Is there a solution for that? Maybe disabling that part?

I vaguely remember we have discussed this somewhere in the past, but it seems there was no conclusion. I will think out loud in the following. No guarantee for a linear chain of thoughts!

There are two ways of implementing this:

  1. Reduce the amount of data that is sent via the API.
    This is actually trickier than it seems. FTL does not rank clients by overall activity but instead sends the data for all clients. We could rather easily say: Don't send more than the data concerning 20[1] clients. But this comes with no guarantee that these are also the most active ones (hint: they will often not be).
  2. Send the full amount of data and have the frontend (JavaScript) deal with reducing it.
    This has the advantage that the user's PC running the browser is most often much, much beefier than the tiny little thing running Pi-hole, but it has several drawbacks as well. The most obvious is the possibly enormous amount of API data that is returned and then needs to be handled inside the browser (well, we are talking KB to MB of JSON, not even close to GB, even with thousands of clients).

No. #2 seems to be worse than no. #1, so #1 is likely the way we go.

But I also don't like the kind of uncertainty that lies in this. We could first make FTL determine the 20[1] most active clients and then use these to populate the array for the graph. This would do away with said uncertainty but is more work. Given you have many clients, it may actually be a lot of work. However, we should be able to assume that a Pi-hole serving thousands of clients concurrently runs on something better than a PogoPlug[2], so this price may be okay to pay.
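To make the "determine the most active clients first" idea concrete, here is a minimal sketch in Python rather than FTL's C. The `over_time` mapping (client to per-interval query counts) and the name `top_clients` are assumptions for illustration only, not FTL's actual data structures.

```python
import heapq


def top_clients(over_time, n=20):
    """Keep only the n clients with the highest total query count.

    over_time is a hypothetical mapping: client -> list of per-interval counts.
    """
    # Total activity per client across the whole time window.
    totals = {client: sum(counts) for client, counts in over_time.items()}
    # nlargest avoids sorting every client when only a handful is needed.
    keep = heapq.nlargest(n, totals, key=totals.get)
    # Return the per-interval data for the selected clients, most active first.
    return {client: over_time[client] for client in keep}
```

With thousands of clients, this means one pass to sum the counts plus a partial selection, and only the selected rows would need to go out via the API.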

Opinions?


  1. The number 20 is kind of an arbitrary choice here, it can surely be discussed

  2. One of the most shitty single-board-computers


I totally agree. Let the machine lift the heavy things instead of the poor browser.
I also think that in cases like mine, where we're dealing with thousands of clients, super accurate reporting isn't as necessary as other things (like the snappiness of the UI), which should be our #1 priority.

Or at least we can keep the accuracy but generate the graphs on demand, not automatically on the index page of the Pi-hole admin.

It is probably the same issue:

I have an update on this. Previously, my VPN servers used the IPv6 address of the Pi-hole server as their DNS resolver, and Pi-hole had the problem of stopping (without any errors or resource exhaustion). As soon as I changed the DNS resolver to the IPv4 address of the Pi-hole server, the problem stopped recurring.

Note that the number of queries and concurrent requests is the same, but Pi-hole is no longer having problems even in the busiest hours of the day. So what's the catch?

To be crystal clear:

  • The only difference in your test is that the VPN server used IPv4 or IPv6 addresses of your Pi-hole?

  • Did you restart pihole-FTL at any time in your tests? Or

  • is it really just ceasing to reply on IPv6 while IPv4 continues to work as if nothing has happened (without restarting pihole-FTL at all)?


Yes. The only difference is using the IPv4 address instead of the IPv6 address of the Pi-hole server.

When it starts being unresponsive, "service pihole-FTL restart" or "pihole restartdns" brings it back to life, although just temporarily.

And once it becomes unresponsive again on IPv6, it doesn't work at all, not even with the IPv4 address as the resolver.

Do you see anything related in any of the files /var/log/pihole/*.log indicating a problem?

Am I right to assume that your VPN servers have a fixed IPv6 address so Pi-hole only ever sees like ... 6 (or so) clients?

This error pops up the moment it stops working:

2023-12-22 11:26:34.720 [2485M] INFO: All threads joined

2023-12-22 11:26:34.721 [2485M] ERR: SQLite3 message: cannot open file at line 44110 of [ebead0e723] (14)

2023-12-22 11:26:34.721 [2485M] ERR: SQLite3 message: os_unix.c:44110: (0) open(/etc/pihole/pihole-FTL.db) - (14)

2023-12-22 11:26:34.721 [2485M] ERR: Error while trying to open database: unable to open database file

2023-12-22 11:26:34.721 [2485M] WARNING: Failed to open database in backup_db_sessions()

And a "service pihole-FTL restart" solves it.

And no. When Pi-hole's IPv6 address is used, each individual client talks to it directly. The VPN servers mostly have a /112 IPv6 range routed to them and each client gets a /128, so the number of clients shown in Pi-hole goes up to the total number of VPN clients (which is 2500+).
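As a quick sanity check on those prefix sizes, the sketch below uses Python's `ipaddress` module. The `2001:db8::/112` range is a documentation prefix chosen purely for illustration, not one of the real ranges, and `describe_range` is a made-up helper name.

```python
import ipaddress


def describe_range(prefix, client):
    """Return (number of addresses in prefix, whether client lies inside it)."""
    network = ipaddress.ip_network(prefix)
    address = ipaddress.ip_address(client)
    return network.num_addresses, address in network


# A /112 leaves 128 - 112 = 16 host bits, so 2**16 distinct /128 addresses.
count, inside = describe_range("2001:db8::/112", "2001:db8::2a")
print(count, inside)
```

So a single VPN server with a /112 can hand out far more than 2500 distinct client addresses, and each one shows up as its own client.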

And now that I'm using IPv4, the same thing happens when that db file gets large enough; it just takes longer. When it happens, either the DNS resolution becomes extremely slow or it doesn't respond at all. So basically I guess whatever the problem is, it's about the database file.

Well, interesting...

There is nothing before this? The reason I'm asking is that this is a message that is only shown when FTL is shutting down. Quite obviously, it stops reacting once it has shut down :wink:

So now we need to find out why it was shut down and if this was triggered internally or externally.

I have cleared the logs. Once the problem happens again, I'll post them here. Fingers crossed.

It stopped working again. Note that the logs were cleared with "rm -f /var/log/pihole/*" and then "pihole restartdns" was run. I used it for a few hours, and these are the log files in the 'unresponsive' state.

FTL.log.1:

2023-12-22 23:51:10.014 [35747M] INFO: Shutting down... // exit code 0 // jmpret 0
2023-12-22 23:51:10.329 [35747M] INFO: Finished final database update
2023-12-22 23:51:10.329 [35747M] INFO: Waiting for threads to join
2023-12-22 23:51:10.329 [35747M] INFO: Thread database (0) is idle, terminating it.
2023-12-22 23:51:10.329 [35747M] INFO: Thread housekeeper (1) is idle, terminating it.
2023-12-22 23:51:10.329 [35747M] INFO: Thread DNS client (2) is idle, terminating it.
2023-12-22 23:51:10.329 [35747M] INFO: All threads joined
2023-12-22 23:51:10.333 [35747M] INFO: Stored 1 API session in the database
2023-12-22 23:51:12.320 [35747M] INFO: ########## FTL terminated after 12m 55s  (code 0)! ##########
2023-12-22 23:51:12.397 [36017M] INFO: ########## FTL started on Stark-Smart-DNS! ##########
2023-12-22 23:51:12.397 [36017M] INFO: FTL branch: development-v6
2023-12-22 23:51:12.397 [36017M] INFO: FTL version: vDev-7b59c65
2023-12-22 23:51:12.397 [36017M] INFO: FTL commit: 7b59c651
2023-12-22 23:51:12.397 [36017M] INFO: FTL date: 2023-12-17 06:52:04 +0100
2023-12-22 23:51:12.398 [36017M] INFO: FTL user: pihole
2023-12-22 23:51:12.398 [36017M] INFO: Compiled for linux/arm64/v8 (compiled on CI) using cc (Alpine 12.2.1_git20220924-r10) 12.2.1 20220924
2023-12-22 23:51:12.400 [36017M] INFO: Parsed config file /etc/pihole/pihole.toml successfully
2023-12-22 23:51:12.400 [36017M] INFO: PID of FTL process: 36017
2023-12-22 23:51:12.401 [36017M] INFO: Database version is 16
2023-12-22 23:51:12.401 [36017M] INFO: Database successfully initialized
2023-12-22 23:51:12.406 [36017M] INFO:  -> Total DNS queries: 0
2023-12-22 23:51:12.406 [36017M] INFO:  -> Cached DNS queries: 0
2023-12-22 23:51:12.407 [36017M] INFO:  -> Forwarded DNS queries: 0
2023-12-22 23:51:12.407 [36017M] INFO:  -> Blocked DNS queries: 0
2023-12-22 23:51:12.407 [36017M] INFO:  -> Unknown DNS queries: 0
2023-12-22 23:51:12.407 [36017M] INFO:  -> Unique domains: 0
2023-12-22 23:51:12.407 [36017M] INFO:  -> Unique clients: 0
2023-12-22 23:51:12.407 [36017M] INFO:  -> Known forward destinations: 0
2023-12-22 23:51:12.407 [36017M] INFO: listening on 0.0.0.0 port 53
2023-12-22 23:51:12.407 [36017M] INFO: listening on :: port 53
2023-12-22 23:51:12.408 [36017M] INFO: PID of FTL process: 36017
2023-12-22 23:51:12.408 [36017M] INFO: FTL is running as user pihole (UID 998)
2023-12-22 23:51:12.408 [36017M] INFO: Reading certificate from /etc/pihole/tls.pem ...
2023-12-22 23:51:12.408 [36017M] INFO: Using SSL/TLS certificate file /etc/pihole/tls.pem
2023-12-22 23:51:12.410 [36017M] INFO: Restored 1 API session from the database
2023-12-22 23:51:12.414 [36017M] INFO: Blocking status is enabled
2023-12-22 23:51:12.517 [36017/T36018] INFO: Compiled 0 allow and 17 deny regex for 3 clients in 6.9 msec


pihole.log.1:

Dec 22 23:51:12 dnsmasq[36017]: started, version pi-hole-v2.89-e1de9c2 cache disabled
Dec 22 23:51:12 dnsmasq[36017]: compile time options: IPv6 GNU-getopt no-DBus no-UBus no-i18n IDN2 DHCP DHCPv6 Lua TFTP no-conntrack ipset no-nftset auth cryptohash DNSSEC loop-detect inotify dumpfile
Dec 22 23:51:12 dnsmasq[36017]: using nameserver 8.8.8.8#53
Dec 22 23:51:12 dnsmasq[36017]: using nameserver 8.8.4.4#53
Dec 22 23:51:12 dnsmasq[36017]: using only locally-known addresses for onion
Dec 22 23:51:12 dnsmasq[36017]: using only locally-known addresses for bind
Dec 22 23:51:12 dnsmasq[36017]: using only locally-known addresses for invalid
Dec 22 23:51:12 dnsmasq[36017]: using only locally-known addresses for localhost
Dec 22 23:51:12 dnsmasq[36017]: using only locally-known addresses for test
Dec 22 23:51:12 dnsmasq[36017]: using only locally-known addresses for lan
Dec 22 23:51:12 dnsmasq[36017]: read /etc/hosts - 16 names
Dec 22 23:51:12 dnsmasq[36017]: read /etc/pihole/local.list - 0 names
Dec 22 23:51:12 dnsmasq[36017]: read /etc/pihole/hosts/custom.list - 0 names
root@Stark-Smart-DNS:/etc/pihole# ls -ls
total 402260
     4 -rw-rw---- 1 pihole pihole        65 Dec 20 12:48 adlists.list
     4 drwxr-xr-x 2 pihole pihole      4096 Dec 21 00:01 config_backups
     0 -rw-rw---- 1 pihole pihole         0 Dec 19 12:42 dhcp.leases
     8 -rw-rw---- 1 pihole pihole      5037 Dec 20 14:56 dnsmasq.conf
     4 -rw-rw---- 1 pihole pihole       651 Dec 20 14:43 dns-servers.conf
     4 -rw-rw---- 1 pihole pihole        15 Dec 20 12:47 ftlbranch
   104 -rw-rw---- 1 pihole pihole    106496 Dec 22 07:49 gravity.db
   104 -rw-rw---- 1 pihole pihole    106496 Dec 20 12:48 gravity_old.db
     4 drwxr-xr-x 2 pihole pihole      4096 Dec 20 12:48 hosts
     4 -rw-rw---- 1 pihole pihole       408 Dec 20 14:43 install.log
     4 -rw-rw---- 1 pihole pihole        65 Dec 20 14:44 local.list
     4 -rw-r--r-- 1 root   root         245 Dec 19 12:42 logrotate
  3044 -rw-rw---- 1 pihole pihole   3117056 Dec 20 12:48 macvendor.db
     4 drwxr-xr-x 2 pihole pihole      4096 Dec 19 12:42 migration_backup
398904 -rw-rw-r-- 1 pihole pihole 408473600 Dec 23 14:04 pihole-FTL.db
    44 -rw-rw---- 1 pihole pihole     44764 Dec 21 00:01 pihole.toml
     4 -rw-rw---- 1 pihole pihole       441 Dec 20 08:05 setupVars.conf
     4 -rw-rw---- 1 pihole pihole       668 Dec 20 12:48 tls.crt
     4 -rw-rw---- 1 pihole pihole       956 Dec 20 12:48 tls.pem
     4 -rw-rw---- 1 pihole pihole       380 Dec 22 23:32 versions
root@Stark-Smart-DNS:/etc/pihole# du -msh
394M    .

So this confirms it: Something on your system is shutting down FTL intentionally. But we don't know who is doing that.

My suggestion would be to run

sudo pihole checkout ftl new/who_murders_me

in about half an hour from now (when the binaries are all built). When FTL from this branch is asked to terminate, it tries to determine who is the murderer and logs a line like

2023-12-23 16:00:47.926 [2260911M] INFO: Asked to terminate by "/lib/systemd/systemd --system --deserialize 56" (PID 1, user root UID 0)

This should help to find out what is going on and seems useful to have.
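For readers wondering how a process can learn who asked it to terminate: on Linux, the signal information delivered with SIGTERM carries the sender's PID and UID. The sketch below shows the general idea in Python. It is not the code from the FTL branch, and the helper names `describe_sender` and `wait_for_sigterm` are made up for illustration.

```python
import os
import signal


def describe_sender(info):
    """Build a log-style line naming the process that sent the signal."""
    try:
        # /proc/<pid>/cmdline holds the NUL-separated command line (Linux only).
        with open(f"/proc/{info.si_pid}/cmdline", "rb") as handle:
            raw = handle.read()
        cmdline = raw.replace(b"\x00", b" ").decode(errors="replace").strip()
    except OSError:
        cmdline = "<unknown>"
    return f'Asked to terminate by "{cmdline}" (PID {info.si_pid}, UID {info.si_uid})'


def wait_for_sigterm():
    """Block SIGTERM, then wait synchronously until one arrives."""
    signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGTERM})
    # sigwaitinfo returns a struct with si_pid and si_uid of the sender.
    return signal.sigwaitinfo({signal.SIGTERM})
```

When the service manager stops or restarts a unit, the sender is typically PID 1, which matches the example line above.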

2023-12-24 13:44:49.313 [63410M] INFO: Asked to terminate by "/sbin/init" (PID 1, user root UID 0)
2023-12-24 13:44:49.410 [63410M] INFO: Shutting down... // exit code 0 // jmpret 0
2023-12-24 13:44:49.660 [63410M] INFO: Waiting for threads to join
2023-12-24 13:44:49.660 [63410M] INFO: Thread database (0) is idle, terminating it.
2023-12-24 13:44:49.660 [63410M] INFO: Thread housekeeper (1) is idle, terminating it.
2023-12-24 13:44:49.660 [63410M] INFO: Thread DNS client (2) is idle, terminating it.
2023-12-24 13:44:49.660 [63410M] INFO: All threads joined
2023-12-24 13:44:49.674 [63410M] INFO: Stored 1 API session in the database
2023-12-24 13:44:50.609 [63410M] INFO: ########## FTL terminated after 2h 39m 57s  (code 0)! ##########
2023-12-24 13:44:50.691 [65481M] INFO: ########## FTL started on Stark-Smart-DNS! ##########

Okay, so this was init, which means something on your system ran sudo service pihole-FTL restart. And, in fact, it immediately restarted:

Okay, but even if it is being restarted by some unknown process, why does it stop working? A restart should not make it unresponsive.

I fully agree. Yes, I am slightly confused now. Did it always restart and was not responsive afterwards? It seems like you needed to restart it manually before, also we have seen no traces of a restart in previous discussions of the problem, e.g.,

So the issue now looks like: Something on your system is restarting FTL, it restarts as asked and the restarted process does not respond at all. But another (manual) restart fixes it?

Could you:

  1. Wait until it is in unresponsive mode
  2. Run sudo pihole-FTL config debug.queries true
  3. Try to send a few queries - does nothing show up at all in FTL.log?

Does it only stop responding over the VPN or does it even stop responding locally, i.e., 127.0.0.1 and ::1 ? I wonder if the VPN may have some issues with a process shutting down and immediately rebinding the same port (53).
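One way to run the local test is to send the same query to 127.0.0.1 and ::1 with a short timeout (a tool such as dig does the same job). The sketch below is only an illustration in Python; `build_query` and `probe` are made-up helper names, and the queried name pi.hole is just an example.

```python
import socket
import struct


def build_query(name, query_id=0x1234):
    """Build a minimal DNS query packet asking for the A record of name."""
    # Header: ID, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = name.strip(".").split(".")
    qname = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels)
    # Terminating root label, then QTYPE=A (1) and QCLASS=IN (1).
    return header + qname + b"\x00" + struct.pack("!HH", 1, 1)


def probe(server, name="pi.hole", timeout=2.0):
    """Return True if server answers a UDP query on port 53 within timeout."""
    family = socket.AF_INET6 if ":" in server else socket.AF_INET
    try:
        with socket.socket(family, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            sock.sendto(build_query(name), (server, 53))
            reply, _ = sock.recvfrom(512)
    except OSError:
        # Covers timeouts as well as unreachable or unsupported address families.
        return False
    # A genuine reply echoes the query ID in its first two bytes.
    return reply[:2] == struct.pack("!H", 0x1234)
```

Calling `probe("127.0.0.1")` and `probe("::1")` on the Pi-hole host while it is in the unresponsive state would show whether the problem is limited to the VPN path.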

New errors popped up in FTL.log

2023-12-27 12:34:02.113 [67900/F58120] ERR: SQLite3 message: database is locked in "SELECT hwaddr FROM network WHERE id = (SELECT network_id FROM network_addresses WHERE ip = ? GROUP BY ip HAVING max(lastSeen));" (5)
2023-12-27 12:34:02.114 [67900/F58120] ERR: getMACfromIP("49.12.189.79") - SQL error prepare: database is locked
2023-12-27 12:34:03.117 [67900/F58120] ERR: SQLite3 message: database is locked in "SELECT name FROM network_addresses WHERE name IS NOT NULL AND ip = ?;" (5)
2023-12-27 12:34:03.117 [67900/F58120] ERR: getNameFromIP("49.12.189.79") - SQL error prepare: database is locked
2023-12-27 12:34:04.120 [67900/F58120] ERR: SQLite3 message: database is locked in "SELECT interface FROM network JOIN network_addresses ON network_addresses.network_id = network.id WHERE network_addresses.ip = ? AND interface != 'N/A' AND interface IS NOT NULL;" (5)
2023-12-27 12:34:04.120 [67900/F58120] ERR: getIfaceFromIP("49.12.189.79") - SQL error prepare: database is locked
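For context, the (5) at the end of these lines is SQLite's SQLITE_BUSY result code: another connection holds a lock on the database file and the caller gave up waiting. The sketch below reproduces the same message on a throwaway file with Python's `sqlite3` module. The table name is borrowed from the log, but the locking pattern is an assumption for illustration and says nothing about what FTL is actually doing at that moment.

```python
import os
import sqlite3
import tempfile


def demo_locked(path):
    """Return the error a second connection sees while the file is locked."""
    # isolation_level=None lets us issue BEGIN/ROLLBACK ourselves.
    writer = sqlite3.connect(path, isolation_level=None)
    writer.execute(
        "CREATE TABLE IF NOT EXISTS network (id INTEGER PRIMARY KEY, hwaddr TEXT)"
    )
    # Hold an exclusive lock, as a long-running write transaction would.
    writer.execute("BEGIN EXCLUSIVE")
    # timeout=0 makes the second connection fail at once instead of waiting.
    reader = sqlite3.connect(path, timeout=0)
    try:
        reader.execute("SELECT hwaddr FROM network").fetchall()
        return None
    except sqlite3.OperationalError as exc:
        return str(exc)
    finally:
        reader.close()
        writer.execute("ROLLBACK")
        writer.close()


db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
print(demo_locked(db_path))
```

The longer a writer holds the lock (for example on a large database file), the more likely it is that other queries run into this error.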