No valid NTP replies received, check server and network connectivity

sawsanders · December 10, 2024, 4:51pm

I don't think so. Here's the view from btop showing Pi-hole at 3.9 GB:

If I compare memory stats before and after restarting FTL, I see this:

pi@pi5:~$ free
               total        used        free      shared  buff/cache   available
Mem:         8128200     5624908      216192       15332     2498632     2503292
Swap:        2097148         512     2096636
pi@pi5:~$ sudo service pihole-FTL stop
pi@pi5:~$ free
               total        used        free      shared  buff/cache   available
Mem:         8128200     2335220     3510496        3300     2481988     5792980
Swap:        2097148         512     2096636
pi@pi5:~$ sudo service pihole-FTL start
pi@pi5:~$ free
               total        used        free      shared  buff/cache   available
Mem:         8128200     2444984     3381772       12648     2510500     5683216
Swap:        2097148         512     2096636
pi@pi5:~$

So FTL is using over 3 GB. The thing is, this keeps growing over the course of several days. I haven't let it run long enough to see what happens eventually, so I don't know whether it would bring the system down. This behavior doesn't happen on the development branch.
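The "over 3 GB" figure can be reproduced from the first two free snapshots above. Here is a minimal Python sketch. It assumes free's default unit (KiB). It also assumes nothing else changed between the two snapshots, so the whole drop in "used" is attributed to FTL, which is only an approximation. The helper name parse_free is hypothetical.

```python
def parse_free(output: str) -> dict:
    """Map the column names of `free` output to the values on its 'Mem:' row (KiB)."""
    lines = output.strip().splitlines()
    header = lines[0].split()  # total, used, free, shared, buff/cache, available
    for line in lines[1:]:
        fields = line.split()
        if fields and fields[0] == "Mem:":
            return dict(zip(header, map(int, fields[1:])))
    raise ValueError("no 'Mem:' row in free output")


# The first two snapshots from the session above: FTL running, then FTL stopped.
RUNNING = """
               total        used        free      shared  buff/cache   available
Mem:         8128200     5624908      216192       15332     2498632     2503292
"""
STOPPED = """
               total        used        free      shared  buff/cache   available
Mem:         8128200     2335220     3510496        3300     2481988     5792980
"""

# Approximation: attribute the whole drop in "used" to stopping pihole-FTL.
delta_kib = parse_free(RUNNING)["used"] - parse_free(STOPPED)["used"]
print(f"{delta_kib} KiB = {delta_kib / 1024**2:.2f} GiB")  # 3289688 KiB = 3.14 GiB
```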

DL6ER · December 10, 2024, 5:26pm

The extra processes are probably dedicated TCP workers. Please run something like

grep 2076641 /var/log/pihole/pihole.log

to see whether there is anything related in the log file. They should terminate themselves after a short timeout; I will try to reproduce why they don't. The memory htop claims they are using isn't actually used. Linux uses a technique called copy-on-write (COW), which ensures that additional processes that are copies of another (forks) do not need to duplicate its memory when they are split off. A page is only copied once one of the processes writes to it.
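To see how much memory a forked process really costs, Linux (kernel 4.14 and later) provides /proc/<pid>/smaps_rollup. This file lists both Rss and Pss. Pss (proportional set size) divides every shared page among the processes sharing it. The Python sketch below reads that file. The helper names are hypothetical, and reading another user's process such as pihole-FTL typically requires root.

```python
from pathlib import Path


def parse_smaps_rollup(text: str) -> dict:
    """Collect the 'Name:  <value> kB' counters from smaps_rollup text."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        # The first line is an address-range header; counter lines have exactly 3 fields.
        if len(parts) == 3 and parts[0].endswith(":") and parts[2] == "kB":
            counters[parts[0][:-1]] = int(parts[1])
    return counters


def rss_and_pss_kb(pid: int) -> tuple:
    """Return (Rss, Pss) in kB for a live process (Linux 4.14+ only)."""
    text = Path(f"/proc/{pid}/smaps_rollup").read_text()
    counters = parse_smaps_rollup(text)
    return counters["Rss"], counters["Pss"]
```

Running rss_and_pss_kb as root on the main pihole-FTL process and on each fork should show Pss well below Rss for as long as most pages are still shared.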

Your htop and btop screenshots conflict with each other. Were they taken at the same time?

sawsanders · December 10, 2024, 5:37pm

I ran that search over the last 2 days of pihole.log and the output was empty.

They were about a day (18 hours) apart.

DL6ER · December 10, 2024, 6:40pm

I checked the related code changes again and found nothing that would explain any difference compared with the development branch. Could you take two new screenshots from htop and btop at (roughly) the same time so I can compare them?

sawsanders · December 10, 2024, 6:53pm

I restarted Pi-hole a short time ago, so the memory issue has not had time to develop yet. Would you like screenshots now, or should I wait until the memory consumption grows?

I'm happy to do either or both.

sawsanders · December 10, 2024, 6:57pm

Never mind... I'll do both.

FTL uptime: 1 hour

DL6ER · December 10, 2024, 8:16pm

Thanks. Noteworthy right now: we do not see any extra pihole-FTL -f processes in htop (and we don't see any threads in btop).

sawsanders · December 10, 2024, 10:55pm

After 5 hours:

DL6ER · December 11, 2024, 7:46am

When we look at htop now, we see 1.7% for the entire FTL process, which is about 1.7% * 7.75 GB ≈ 132 MB. This agrees well with the memory btop claims each of the pihole-FTL processes is using. This suggests btop is handling the COW principle I talked about above incorrectly. It seems we have found a btop bug.

Also, have a look at the sum of used memory shown for the systemd process (2.3G) and then at the total used memory, which is more than 10% less. That is another indication that something isn't right here. Yes, I know, you may say btop's MemB is RSS and, as such, inaccurate. However, "used" memory also includes memory used by the kernel, modules and, e.g., shared memory. Hence the processes under systemd actually account for less than the shown 1.99 GB, and the real gap between their true usage and the 2.3G sum is even larger.
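A simple model shows why summing RSS overstates real usage when forks share copy-on-write pages. The sketch below assumes N processes that all share the same S KiB of COW pages and each own P KiB privately. All numbers in the usage line are hypothetical and are not taken from the screenshots.

```python
def summed_rss_kib(n_procs: int, shared_kib: int, private_kib: int) -> int:
    # RSS counts every resident page in full for each process, so the
    # shared COW pages are added once per process.
    return n_procs * (shared_kib + private_kib)


def physical_kib(n_procs: int, shared_kib: int, private_kib: int) -> int:
    # Pages still shared copy-on-write exist once in RAM; only private
    # (already copied) pages cost memory per process. Summing PSS gives this too.
    return shared_kib + n_procs * private_kib


# Hypothetical: one FTL process plus four forks, 130,000 KiB shared, 2,000 KiB private each.
print(summed_rss_kib(5, 130_000, 2_000), physical_kib(5, 130_000, 2_000))  # 660000 140000
```

With a single process the two numbers agree. The gap grows with every fork that still shares its pages.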

Just out of curiosity: which version of btop are you running? My local version is v1.2.13, and at first glance I do not seem to be affected by this (compare 1 and 2, especially the total sum for systemd at the top of 2):

It's still unclear where these extra processes come from. Could you try setting

sudo pihole-FTL --config ntp.ipv4.active false
sudo pihole-FTL --config ntp.ipv6.active false

and check whether this really prevents them from appearing over time?

sawsanders · December 11, 2024, 1:39pm

Installed from snap on Ubuntu

pi@pi5:~$ btop -v
btop version: 1.4.0+e17bc6b

Total used: 3.38GB and systemd: 3.9GB

The processes are starting to increase, along with system memory consumption.

Regarding disabling the NTP server (ntp.ipv4.active and ntp.ipv6.active set to false): yes, I will do that.

I have a regex that redirects any NTP-related DNS requests to the Pi-holes, because many IoT devices seem to ignore some DHCP settings such as the DNS and NTP servers. I will turn that off as well so clients can still get the time.
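The actual regex used here was not posted. As a purely hypothetical illustration, a pattern that catches lookups of common public NTP hostnames could look like the one below. It is tested with Python's re module and uses only anchors, groups, alternation and escaped dots, which POSIX extended regular expressions also support.

```python
import re

# HYPOTHETICAL pattern: the regex actually used in this setup was not posted.
# Matches a few well-known public NTP hostnames and any subdomain of ntp.org.
NTP_HOSTS = re.compile(
    r"(^|\.)(ntp\.org|ntp\.ubuntu\.com|time\.(google|apple|windows)\.com)$"
)


def is_ntp_lookup(qname: str) -> bool:
    """True if a queried name looks like a public NTP server."""
    return NTP_HOSTS.search(qname.lower().rstrip(".")) is not None


for name in ["0.pool.ntp.org", "time.windows.com", "notntp.org", "example.com"]:
    print(name, is_ntp_lookup(name))
```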

sawsanders · December 11, 2024, 1:52pm

NTP server disabled and FTL restarted. Total system memory consumption is down from 3.1 to 1.7 GB according to htop. I'll watch and see whether those processes show up again.

EDIT: A before-and-after comparison from free:
Before FTL restart:

pi@pi5:~$ free -h
               total        used        free      shared  buff/cache   available
Mem:           7.8Gi       3.3Gi       2.9Gi        15Mi       1.8Gi       4.4Gi
Swap:          2.0Gi          0B       2.0Gi

After:

pi@pi5:~$ free -h
               total        used        free      shared  buff/cache   available
Mem:           7.8Gi       1.9Gi       4.3Gi        13Mi       1.8Gi       5.8Gi
Swap:          2.0Gi          0B       2.0Gi

sawsanders · December 12, 2024, 1:11am

11 hours after disabling FTL's NTP server, no 'stray' processes have appeared. Memory consumption has been normal as well.

pi@pi5:~$ free -h
               total        used        free      shared  buff/cache   available
Mem:           7.8Gi       2.0Gi       3.9Gi        14Mi       2.0Gi       5.7Gi
Swap:          2.0Gi          0B       2.0Gi

Edit: Now after almost 24 hours:


pi@pi5:~$ free -h
               total        used        free      shared  buff/cache   available
Mem:           7.8Gi       2.3Gi       3.4Gi        16Mi       2.3Gi       5.5Gi
Swap:          2.0Gi          0B       2.0Gi

DL6ER · December 13, 2024, 8:09pm

Okay, thanks for the feedback. I still can't really explain what is happening here, but it seems there is some strange issue with the forks not terminating as they should. When they end up in some zombie state and the main FTL process moves on, even COW eventually causes memory to be wasted. Every page the main process modifies after the fork gets copied, while the stale fork keeps holding the original.
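To check for stray forks without relying on htop or btop, you can read each process's state and parent PID from /proc/<pid>/stat. A state of Z means zombie. Below is a minimal Python sketch. It assumes the forked children keep the command name pihole-FTL, and the helper names are hypothetical.

```python
from pathlib import Path


def parse_stat(line: str) -> tuple:
    """Split a /proc/<pid>/stat line into (pid, comm, state, ppid)."""
    # comm is wrapped in parentheses and may contain spaces or ')', so split
    # at the LAST ')' rather than on whitespace.
    left, _, right = line.rpartition(")")
    pid_str, _, comm = left.partition(" (")
    state, ppid_str = right.split()[:2]
    return int(pid_str), comm, state, int(ppid_str)


def ftl_forks(stat_lines, name: str = "pihole-FTL") -> list:
    """(pid, state) of every `name` process whose parent is also `name`."""
    procs = {}
    for line in stat_lines:
        pid, comm, state, ppid = parse_stat(line)
        procs[pid] = (comm, state, ppid)
    return sorted(
        (pid, state)
        for pid, (comm, state, ppid) in procs.items()
        if comm == name and procs.get(ppid, ("",))[0] == name
    )


def read_all_stats() -> list:
    """Read /proc/<pid>/stat for every running process (Linux only)."""
    lines = []
    for stat_file in Path("/proc").glob("[0-9]*/stat"):
        try:
            lines.append(stat_file.read_text())
        except OSError:  # the process exited while we were scanning
            pass
    return lines
```

On the Pi, ftl_forks(read_all_stats()) would be expected to return an empty list when only the main FTL process is running. Threads are not listed as top-level /proc entries, so they do not show up here.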

Please update to get my latest changes on the custom branch. I simply removed the forking altogether because it shouldn't really be needed. After the update, the expected output of pihole-FTL --hash on this branch is a76e7918.

Please re-enable the server so we can see whether the constant memory growth is now gone, or whether this has caused additional trouble of any kind (I don't expect any, but be prepared for the unexpected):

sudo pihole-FTL --config ntp.ipv4.active true
sudo pihole-FTL --config ntp.ipv6.active true

sawsanders · December 13, 2024, 9:14pm

Thanks, I will update and test.

Some background and observations (prior to updating)...

I have two RPis on my network running Pi-hole in a failover setup.
My router DHCP setup includes NTP addresses for both Pi-holes.
I use a regex to redirect NTP-related DNS requests to the active Pi-hole, forcing IoT devices to use the local NTP service provided by Pi-hole.
I have one IoT client that "spams" the NTP server at least once every 20 seconds.
The backup Pi-hole was running the tweak/ntp_errors branch with little traffic going to it, and only one zombie process was present after ~10-12 hours.
Next, I changed the regex on the primary to send NTP requests to the backup and switched the backup to the development branch, so it was now receiving all NTP traffic on the network. No zombie processes were noted in 12 hours.
I switched the backup back to the tweak/ntp_errors branch and observed two zombie processes in 10-12 hours.

It looks like the problem is related to load, i.e. many NTP requests.
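A minimal SNTP client is enough to reproduce this kind of load against a test Pi-hole. The Python sketch below is a hypothetical test helper, not part of Pi-hole, and follows the NTP packet layout from RFC 4330/5905. It sends one 48-byte client request to UDP port 123 and decodes the server's transmit timestamp.

```python
import socket
import struct

NTP_PORT = 123
NTP_TO_UNIX = 2_208_988_800  # seconds between 1900-01-01 (NTP epoch) and 1970-01-01


def build_request() -> bytes:
    """48-byte SNTP client request: LI=0, VN=4, Mode=3 packed into byte 0."""
    return bytes([(0 << 6) | (4 << 3) | 3]) + bytes(47)


def transmit_time(reply: bytes) -> float:
    """Decode the server's Transmit Timestamp (bytes 40..47) as Unix time."""
    if len(reply) < 48:
        raise ValueError("short NTP reply")
    if reply[0] & 0x07 != 4:  # Mode 4 = server
        raise ValueError("not an NTP server reply")
    seconds, fraction = struct.unpack("!II", reply[40:48])
    return seconds - NTP_TO_UNIX + fraction / 2**32


def query(server: str, timeout: float = 2.0) -> float:
    """Send one request and return the server's transmit time (Unix seconds)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_request(), (server, NTP_PORT))
        reply, _ = sock.recvfrom(512)
    return transmit_time(reply)
```

Calling query() in a loop with a 20-second sleep, against the address of a test instance, would roughly mimic the chatty IoT client described above. If no reply arrives within the timeout, query() raises an exception.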

Backup debug token: https://tricorder.pi-hole.net/kFVfVNf8/
Primary debug token: https://tricorder.pi-hole.net/IXCq8Jim/

sawsanders · December 16, 2024, 12:52am

Two days later, no memory increase or zombie processes. I'm calling this solved!

@DL6ER Thank you for your efforts!