Pihole intermittently stops resolving until hardware reboot

Static IP was set for the PiHole during the PiHole installation process:

wget -O basic-install.sh https://install.pi-hole.net
sudo bash basic-install.sh

As mentioned, I've also bound the PiHole's MAC in the gateway and other networking equipment to its static IP.


Returns:

PRETTY_NAME="Raspbian GNU/Linux 10 (buster)"
NAME="Raspbian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=raspbian
ID_LIKE=debian
HOME_URL="http://www.raspbian.org/"
SUPPORT_URL="http://www.raspbian.org/RaspbianForums"
BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs"

:woman_shrugging:

Check /etc/dhcpcd.conf and make sure it's set as you expect.

But the bigger question is, why does a dig against 8.8.8.8 fail? That's a networking level issue. You can see from the same debug output that Pi-hole is operational and responding correctly.

# A sample configuration for dhcpcd.
# See dhcpcd.conf(5) for details.

# Allow users of this group to interact with dhcpcd via the control socket.
#controlgroup wheel

# Inform the DHCP server of our hostname for DDNS.
hostname

# Use the hardware address of the interface for the Client ID.
clientid
# or
# Use the same DUID + IAID as set in DHCPv6 for DHCPv4 ClientID as per RFC4361.
# Some non-RFC compliant DHCP servers do not reply with this set.
# In this case, comment out duid and enable clientid above.
#duid

# Persist interface configuration when dhcpcd exits.
persistent

# Rapid commit support.
# Safe to enable by default because it requires the equivalent option set
# on the server to actually work.
option rapid_commit

# A list of options to request from the DHCP server.
option domain_name_servers, domain_name, domain_search, host_name
option classless_static_routes
# Respect the network MTU. This is applied to DHCP routes.
option interface_mtu

# Most distributions have NTP support.
#option ntp_servers

# A ServerID is required by RFC2131.
require dhcp_server_identifier

# Generate SLAAC address using the Hardware Address of the interface
#slaac hwaddr
# OR generate Stable Private IPv6 Addresses based from the DUID
slaac private

# Example static IP configuration:
#interface eth0
#static ip_address=192.168.0.10/24
#static ip6_address=fd51:42f8:caae:d92e::ff/64
#static routers=192.168.0.1
#static domain_name_servers=192.168.0.1 8.8.8.8 fd51:42f8:caae:d92e::1

# It is possible to fall back to a static IP if DHCP fails:
# define static profile
#profile static_eth0
#static ip_address=192.168.1.23/24
#static routers=192.168.1.1
#static domain_name_servers=192.168.1.1

# fallback to static profile on eth0
#interface eth0
#fallback static_eth0
interface eth0
        static ip_address=192.168.0.2
        static routers=192.168.0.1
        static domain_name_servers=8.8.8.8 8.8.4.4

Looks fine to me?


That's what I'm struggling with understanding. There's absolutely no reason I can identify that it would fail. I have tried using different upstream DNS (cloudflare) but the issue persists with that changed too.

At least you've helped me narrow it down to probably not being the pihole at fault. Weird that it resolves if I reboot it though.

I would check things like ping 8.8.8.8 and ip route get 8.8.8.8 to make sure you even have basic connectivity to that IP address first. If that works, then I'd look for things like DNS rebind protection or any logs on the TP-Link router. Is it stock firmware on the router?

I'll run them now and keep the results in a text file, and run them again next time it fails and update this thread with both results.
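For anyone wanting to do the same before/after comparison, here is a minimal sketch of how those checks could be appended to a text file. The function name `capture_net_state`, the 2-second timeouts, and the `pi-hole.net` test domain are assumptions made for illustration, not something taken from this thread.

```shell
# capture_net_state OUTFILE [TARGET]
# Appends a timestamped snapshot of basic connectivity checks to OUTFILE.
# TARGET defaults to 8.8.8.8, the upstream discussed in this thread.
capture_net_state() {
    out="$1"
    target="${2:-8.8.8.8}"
    {
        echo "=== $(date) ==="
        echo "--- ping $target ---"
        # Two probes, 2-second reply timeout; keep going if the command fails.
        ping -c 2 -W 2 "$target" 2>&1 || echo "ping failed"
        echo "--- ip route get $target ---"
        ip route get "$target" 2>&1 || echo "ip route get failed"
        echo "--- dig @$target ---"
        # pi-hole.net is only an example name to look up.
        dig +time=2 +tries=1 @"$target" pi-hole.net 2>&1 || echo "dig failed"
    } >> "$out"
}
```

Run it once while everything works and again during an outage, for example `capture_net_state ~/net-state.txt`, then compare the two snapshots in the file.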

I'll look into that now.

EDIT: Nothing out of place in the logs. The router does have DNS rebind protection but from what I've read online, disabling its DHCP functionality and letting PiHole serve DHCP and DNS circumvents it.

If rebind protection were in effect, I'd not get any DNS resolution through the PiHole, ever. And if it were randomly kicking in on the router, bear in mind the router's VDSL connection also has 8.8.8.8/4.4 specified (because for some reason if you put a local subnet IP in that box the router stops responding entirely, until you factory reset, but I digress).

Point being if rebind protection was kicking in and causing the PiHole to stop getting through to clients, clients would be served unfiltered DNS from 8.8.8.8/4.4 as a fallback. I've tried running ipconfig /flushdns and a variety of netsh winsock reset commands on a Windows client on the network when this PiHole DNS error kicks in, and that does nothing. After running those commands and reconnecting, ipconfig /all still returns the PiHole as DHCP and DNS. So I don't think that's got anything to do with it. Even when in this state, DHCP is still working on the PiHole, evidenced by being assigned correct information after ipconfig /release /renew.

Stock, latest version, manually updated.

Thank you very much for your help so far! :fist:

This won't solve the problem with external DNS upstreams failing but you could do an unbound install as the Pi-hole upstream.

https://docs.pi-hole.net/guides/unbound/

Didn't realise unbound was quite so simple to set up. Done. We shall see if the bug returns now unbound is set up. All working fine on unbound as of right now.

If nothing else it should give you a lot more options for log debugging. The default configuration we suggest doesn't have logging enabled but you can set unbound up to be very verbose with logs to pick out all kinds of transit errors.
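As a rough sketch of what turning logging up could look like: the helper below writes a small extra `server:` clause that raises `verbosity` and enables `log-queries`, both standard unbound.conf options. The helper name `write_unbound_logging`, the drop-in file name in the usage note, and the chosen verbosity level are assumptions for illustration, not part of the Pi-hole guide.

```shell
# write_unbound_logging FILE
# Writes an unbound config fragment that enables more verbose logging.
write_unbound_logging() {
    cat > "$1" <<'EOF'
server:
    # 0 = errors only; higher values log progressively more detail.
    verbosity: 2
    # Log one line per query received.
    log-queries: yes
EOF
}
```

Assuming the Debian-style layout where files in /etc/unbound/unbound.conf.d/ are included, this could be run as root with something like `write_unbound_logging /etc/unbound/unbound.conf.d/logging.conf`, followed by `unbound-checkconf` and a restart of the unbound service. Remember to turn it back down afterwards, as query logging grows quickly.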

Great, thanks! I'll leave it be for now but if I get further issues in the next few weeks (ambitious?) then I'll look into verbose logging and post what I find here.

Annoyingly with intermittent issues like this, they never show up when you're waiting for them... Murphy's law and all that.

On a Raspi, having intermittent issues, always suspect power issues first:

Zero returned for that, I'll try it again if/when it plays up again.

I should note, I did swap the power supply with another of the same type when I switched the RasPi running PiHole in an effort to resolve the bug. Same problem, different hardware, different power adaptors, different physical locations (and therefore electrical rings). I doubt this is the issue but I'll check nonetheless if the problem persists. Both adaptors are 5V3A.

Another is a corrupted filesystem.
But for that, you'd have to remove the SD card and insert into another Linux client that has the EXT4 filesystem stack available.
You would do a:

dehakkelaar@laptop:~$ lsblk
NAME                         MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                            8:0    0 111.8G  0 disk
├─sda1                         8:1    0   190M  0 part /boot
├─sda2                         8:2    0  37.3G  0 part /
├─sda3                         8:3    0  69.9G  0 part
│ ├─vg0-lamp.dehakkelaar.nl--swap 254:0    0   1.5G  0 lvm  [SWAP]
│ └─vg0-lamp.dehakkelaar.nl--disk 254:1    0    10G  0 lvm
└─sda4                         8:4    0   4.5G  0 part [SWAP]
sdb                            8:16   1   7.2G  0 disk
├─sdb1                         8:17   1   256M  0 part
└─sdb2                         8:18   1     7G  0 part
sr0                           11:0    1  1024M  0 rom

And run filesystem-checker on the SD card partitions:

sudo fsck /dev/sdb1

sudo fsck /dev/sdb2

Linux-wise I've only got another RasPi available right now - is this something I can check with that (utilising a USB card reader)?

Yes.
I have a couple of them.

Excellent! I'll check it tomorrow after work and post an update here. Thank you!

You can add the -y argument to fsck to answer yes to all the questions it asks about fixing.
But first try running it without -y (you can break out with CTRL-C).

EDIT: oh, and make sure the SD card partitions from lsblk aren't mounted when running fsck!
After inserting, check mounts with:

mount

Unmount with:

sudo umount /PATH/TO/MOUNT
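
The "check it isn't mounted, then fsck" routine above could be wrapped up roughly like this. It is only a sketch: the function names `is_mounted` and `check_partition` are made up for illustration, and it assumes a Linux system where /proc/mounts lists mounted devices in its first column.

```shell
# is_mounted DEVICE [MOUNTS_FILE]
# Succeeds if DEVICE appears in the first column of the mounts table.
is_mounted() {
    dev="$1"
    mounts="${2:-/proc/mounts}"
    awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$mounts"
}

# check_partition DEVICE [MOUNTS_FILE]
# Refuses to run fsck on a partition that is still mounted.
check_partition() {
    dev="$1"
    if is_mounted "$dev" "${2:-/proc/mounts}"; then
        echo "$dev is mounted; unmount it first with: sudo umount $dev" >&2
        return 1
    fi
    # Interactive run without -y, as suggested above.
    sudo fsck "$dev"
}
```

With the card from the lsblk example, that would be `check_partition /dev/sdb1` and then `check_partition /dev/sdb2`.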

Guys it's just occurred to me what's going on. Feel free to laugh at me.

I had OpenVPN installed on the same RasPi as PiHole, from when I first set it up a good while back. My logic then was to encrypt all DNS traffic to prevent my ISP from reading/intercepting it (See 'snooper's charter' in the UK - creepy stuff).

Anyway I'd had it configured to use PIA's Swiss server. Then I'd totally forgotten it was installed and running on the RasPi that the PiHole is on, 'cause it'd been problem free for so long.

Long story short I've been disconnected a few times from PIA's Switzerland server on other devices recently. I'm willing to bet my bottom dollar that if the end of the PiHole's VPN tunnel is going down, even briefly, that's enough of a spanner-in-the-works to be what's causing my problems, explaining why PiHole hasn't been able to find 8.8.8.8 at times. One would have hoped I'd connected the dots before wasting your time but here we are.

Rather than faff about with trying to get OpenVPN to reconnect or hop to a different server etc I've just uninstalled it for now. I'll continue to monitor the status, but the issues with DNS resolution on this network and the VPN disconnections I've experienced elsewhere recently are in more-or-less the exact same time frame.

I'll look into other options for encrypting/obfuscating outbound DNS queries, but I'll leave this post 'unsolved' for now. If, in a few days' time, I've had no outages on the network now that OVPN is gone, I'll come back and mark this post as a solution, forever marking my own stupidity.

Thank you both for taking the time to help. Much appreciated. I'll be donating again to PiHole as a thanks/apology for wasting your time on what is likely entirely my own error.

tl;dr Don't use a VPN on your PiHole, especially if it's unreliable.

EDIT: 36+ hours later and no issues. Looks like it was indeed the VPN dropping out that was causing my issue. May this thread/post remain as a monument to my own shortsightedness :flushed: Hopefully this thread can help someone else one day!

:smiley:

Look into whether your router is able to dial into the VPN.
Maybe it's able to deal better with a temporary loss of connection.
Most current firmwares can dial into a VPN... and if not, there are alternative firmwares available for popular router models that can dial into most popular VPN providers.
One such example:

https://dd-wrt.com/

So a small update to detail how I've fixed/reimplemented OpenVPN on my PiHole RasPi. Hopefully this will be useful to any future users having issues.

Please bear in mind this is not specifically a PiHole issue and while I'm on the fence about posting it here because of that, I'll do so because it fits in the context of the thread and it may one day help someone.



This 'fix' is a simple one, probably obvious, but should be applicable to anyone having similar connection issues, regardless of their VPN provider.

Simply put, the process flow of what was going wrong can be described as follows:

**On booting RasPi:**

* PiHole loads as normal, connected (briefly) directly to your chosen DNS provider.

* OpenVPN asks for DNS resolution of <your_chosen_server.your_provider.com>

* OpenVPN receives IP and connects to your VPN server, tunnel is established.

* PiHole works properly and retrieves DNS results from your chosen DNS provider through the VPN tunnel.

**Things go wrong:**

* VPN server disconnects for some reason.

* PiHole is suddenly unable to retrieve DNS results because the tunnel is down.

* OpenVPN tries to reconnect, asks for DNS resolution of <your_chosen_server.your_provider.com>

* PiHole, unable to contact upstream DNS provider, cannot resolve <your_chosen_server.your_provider.com>

* OpenVPN cannot reconnect to <your_chosen_server.your_provider.com>

* PiHole cannot resolve DNS as it cannot access upstream DNS servers.

This explains why the issue wouldn't resolve until rebooting the RasPi.
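
A quick way to see whether a given OpenVPN config is exposed to this loop is to list any `remote` lines that point at a hostname rather than an IP address. The sketch below does that; the function name `list_hostname_remotes` is made up for illustration, and it only recognises dotted IPv4 addresses, so an IPv6 literal would be reported as if it were a hostname.

```shell
# list_hostname_remotes CONF
# Prints the host part of every "remote" line that is not a dotted IPv4
# address, i.e. every remote that needs DNS before the tunnel can come up.
list_hostname_remotes() {
    awk '$1 == "remote" && $2 !~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ { print $2 }' "$1"
}
```

If `list_hostname_remotes /etc/openvpn/Netherlands.conf` prints anything, OpenVPN depends on working DNS to reconnect.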



So, the fix for this is simple and (with hindsight) obvious, however it may have drawbacks depending on your VPN provider. I'll go over that later in this post.

You'll need to edit the .conf or .ovpn file that OpenVPN is using to connect to the server and remove the need for DNS resolution by changing <your_chosen_server.your_provider.com> to the IP address of that server.

Here's how I did that - but bear in mind this is specific to how I set OpenVPN up. You will need to edit the files you're using in your configuration with the IP address of your chosen VPN server.

Assuming a starting point of your OpenVPN connection working (great guide HERE btw), open up a terminal/ssh session, login if necessary and:

cd /etc/openvpn
sudo nano Netherlands.conf # make sure you specify YOUR chosen .conf file
# in MY case, change "remote nl.privateinternetaccess.com 1198" to "remote 89.187.174.198 1198"
# in YOUR case, change "remote your_chosen_server.your_provider.com ##PORT##" to "remote xxx.xxx.xxx.xxx ##PORT##"
# ctrl-x, y, return to save your changes

That's it. Reboot for good measure. It should automatically connect as before, but without having to resolve <nl.privateinternetaccess.com> first.

You can find the IP you need by running either nslookup or ping against the domain name of your chosen VPN server.
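
If you'd rather script the edit than do it by hand in nano, something along these lines should work. Treat it as a sketch: the function name `pin_remote` is made up, it only handles `remote` lines that have something (such as a port) after the hostname, and it keeps a `.bak` copy so the change can be reverted.

```shell
# pin_remote CONF HOSTNAME IP
# Rewrites "remote HOSTNAME ..." to "remote IP ..." in CONF,
# leaving the original file as CONF.bak.
pin_remote() {
    conf="$1"
    host="$2"
    ip="$3"
    # Escape dots so the hostname is matched literally by sed.
    host_re=$(printf '%s' "$host" | sed 's/\./\\./g')
    cp "$conf" "$conf.bak"
    sed "s/^remote[[:space:]][[:space:]]*${host_re}[[:space:]]/remote ${ip} /" "$conf.bak" > "$conf"
}
```

In my case that would be `pin_remote /etc/openvpn/Netherlands.conf nl.privateinternetaccess.com 89.187.174.198`, run as root because the file lives under /etc/openvpn.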

If it doesn't work after making these changes, double check the IP and port are entered correctly. You can always revert the changes made.


Now, about that potential drawback I mentioned of editing your OpenVPN config this way:

Your VPN provider may from time to time change the IP addresses for your chosen server. If this happens, your OpenVPN config will then fail to connect to the (now dead) IP address. It may be better to leave it be. Your call. My system is more or less attended whenever I need it and I have secure remote access set up so I can change things when I'm away if needs be. Now I'm aware of this issue it'll be the first thing I check if PiHole goes down again.



Hope that helps someone!

Thank you again, both of you, for your help. Much appreciated, even if I was waaay wide of the mark with what was actually going on in the first place. Oh well.