Expected Behaviour:
We're running a Pi-hole instance on Azure (Ubuntu). There's nothing special about this installation; I followed the official Pi-hole installation guide.
We use Pi-hole to block cryptominers, ads, and the like.
Actual Behaviour:
After a while (usually 2-4 weeks), the Pi-hole web interface reports "DNS service not running". I can still connect to the machine via HTTP or SSH, but DNS lookups from the console fail.
If I reboot the VM, Pi-hole comes back up and works as expected. Restarting "pihole-FTL" alone also restores local and remote DNS resolution.
After the problem had occurred 2-3 times, I thought it would be a good idea to add a second Pi-hole in another Azure datacenter, so I created a second instance. I'm using GravitySync to keep the Pi-hole database in sync between the two VMs, but I don't think it is the cause of my problem, because the problem already occurred before I installed GravitySync and the second VM.
The problem occurs on BOTH VMs. The Pi-holes are configured as DNS servers #1 and #2 in the firewalls of 4 offsite departments. Access to the Pi-holes is restricted by source IP address.
The VMs each have 1 vCPU and 512 MB RAM. That seems to be enough (load < 1, RAM 262/403 used). I run the same setup in my homelab, also with 1 CPU and 512 MB, without problems. I've also tried more resources (2 CPUs, 1 GB RAM), but the problem still occurred, so I went back.
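Because the RAM figure above is a single snapshot, one way to check whether memory stays sufficient over the 2-4 weeks before a failure would be to log it periodically. Below is a minimal Python sketch, assuming a Linux `/proc/meminfo`; the function names and the idea of running it from cron are illustrative assumptions and not part of Pi-hole.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: numeric value}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def memory_summary(text):
    """Return (total_mb, available_mb) from /proc/meminfo-style text."""
    info = parse_meminfo(text)
    total_kb = info["MemTotal"]
    # MemAvailable may be missing on old kernels; fall back to MemFree.
    available_kb = info.get("MemAvailable", info["MemFree"])
    return total_kb // 1024, available_kb // 1024


def log_line(path="/proc/meminfo"):
    """Read the meminfo file and format one timestamped log line."""
    with open(path) as fh:
        total_mb, available_mb = memory_summary(fh.read())
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return f"{stamp} total={total_mb}MB available={available_mb}MB"
```

Calling `print(log_line())` every few minutes and appending the output to a file would give a history to compare against the time of the next outage.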
I'm not sure what the exact problem is. I can't find any hints of stopped services or anything similar in the logs. Also, pihole status seems to be OK:
/var/log/pihole# pihole status
[✓] FTL is listening on port
[✓] UDP (IPv4)
[✓] TCP (IPv4)
[✓] UDP (IPv6)
[✓] TCP (IPv6)
[✓] Pi-hole blocking is enabled
There's also enough disk space:
/var/log/pihole# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        29G  7.4G   22G  26% /
devtmpfs        198M     0  198M   0% /dev
tmpfs           202M   32M  171M  16% /dev/shm
tmpfs            41M  988K   40M   3% /run
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           202M     0  202M   0% /sys/fs/cgroup
/dev/loop1       64M   64M     0 100% /snap/core20/1778
/dev/loop0       64M   64M     0 100% /snap/core20/1738
/dev/loop2       92M   92M     0 100% /snap/lxd/23991
/dev/loop3       92M   92M     0 100% /snap/lxd/24061
/dev/loop4       50M   50M     0 100% /snap/snapd/17883
/dev/sda15      105M  5.2M  100M   5% /boot/efi
/dev/sdb1       3.9G   28K  3.7G   1% /mnt
/dev/loop5       50M   50M     0 100% /snap/snapd/17950
tmpfs            41M     0   41M   0% /run/user/1000
Do you have any ideas on how I can troubleshoot the problem the next time it occurs?
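One possible way to pin down the exact moment resolution stops would be a small probe that queries the local resolver directly instead of relying on the web interface. The following is a minimal Python sketch using only the standard library; the server address `127.0.0.1`, the test name `example.com`, the query ID and all function names are assumptions chosen for illustration, not Pi-hole tooling.

```python
import socket
import struct


def build_query(name, query_id=0x1234):
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    # Question: encoded name, terminating zero byte, QTYPE=A (1), QCLASS=IN (1).
    return header + labels + b"\x00" + struct.pack("!HH", 1, 1)


def reply_rcode(response, query_id=0x1234):
    """Return the RCODE (0 = NOERROR) of a matching reply, else None."""
    if len(response) < 12:
        return None  # shorter than a DNS header
    resp_id, flags = struct.unpack("!HH", response[:4])
    if resp_id != query_id or not flags & 0x8000:
        return None  # wrong ID, or QR bit not set (not a response)
    return flags & 0x000F


def resolver_ok(server="127.0.0.1", name="example.com", timeout=3.0):
    """Send one UDP query to server:53; True only for a NOERROR reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name), (server, 53))
            response, _ = sock.recvfrom(512)
        except OSError:  # includes timeouts
            return False
    return reply_rcode(response) == 0
```

Running `resolver_ok()` from cron every minute and writing a timestamp to a file whenever it returns `False` would show when the outage begins, which narrows down the part of the logs worth reading.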
Debug Token:
I can't upload the debug token because DNS resolution is not functional, so I'm pasting it here:
(Moderator edit: explicit debug log removed)