DNS resolution failed after update to v4.1

ExIT · December 10, 2018, 3:52pm

Amazing work again!

I'm so happy with this update. And just before the holidays too, the perfect time for some heavy adminning, this beauty included.

Just a heads up: when updating from 4.0 to 4.1 in our test environment (we were that eager to play with it), we ran into this problem:

Info to go with that: we run Pi-hole on a dedicated machine. Pi-hole forwards queries to 127.0.0.1#5353, where Unbound runs and contacts the root servers. Despite the error, everything is up to date and running. We haven't rebooted yet, though.

Mcat12 · December 10, 2018, 3:55pm

Check the FTL log (/var/log/pihole-FTL.log), dnsmasq log (/var/log/pihole.log), and journalctl -u pihole-FTL for any errors.
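A minimal sketch of scanning those log files for error-like lines. The helper name `scan_log` and the search pattern are assumptions for illustration, not part of Pi-hole; adjust the pattern to whatever your logs actually contain.

```shell
# Hypothetical helper: print the last error-like lines from one log file.
scan_log() {
    file="$1"
    if [ ! -r "$file" ]; then
        echo "cannot read $file"
        return 0
    fi
    # -i: case-insensitive, -E: extended regex; tail keeps the output short.
    # The pattern below is an assumption, not an official list of keywords.
    grep -iE 'error|fatal|crash|failed' "$file" | tail -n 20
}
```

It could be called as `scan_log /var/log/pihole-FTL.log` and `scan_log /var/log/pihole.log` (root may be needed to read them); the journal still has to be read separately with `journalctl -u pihole-FTL`.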

ExIT · December 10, 2018, 4:09pm

No errors in any of the three logs. Just the usual log messages, nothing more than notices and information really. Oh, and some "insecure" messages (DNSSEC), but those have never been a problem. So no evidence as to why. The logs even show that it knows about 127.0.0.1#5353 being the nameserver/forward server.

Mcat12 · December 10, 2018, 4:11pm

Run pihole -d for a debug token.

ExIT · December 10, 2018, 4:16pm

Here you go: iiboyb09gf

Mcat12 · December 10, 2018, 4:21pm

FTL may have just taken too long to start up (you have a lot of block lists), since you don't have any errors in your debug log. Try pihole -g now. If it works, try pihole -up again.

ExIT · December 10, 2018, 4:23pm

pihole -g

resulted in:

[screenshot: Capture]

Mcat12 · December 10, 2018, 4:25pm

Does this command complete successfully? dig raw.githubusercontent.com

ExIT · December 10, 2018, 4:26pm

Yes, that resolves perfectly (NOERROR)

Mcat12 · December 10, 2018, 4:27pm

Does this command run successfully (this is what the gravity script runs)?

timeout 1 getent hosts raw.githubusercontent.com

ExIT · December 10, 2018, 4:29pm

I suppose. No output, just returns me to the command line 1 second later.

ExIT · December 10, 2018, 4:35pm

Just to be clear: everything is running normally. The above is going on, but meanwhile we can browse and stuff gets blocked as usual. No actual resolving issues, apart from Pi-hole saying so.

Mcat12 · December 10, 2018, 4:41pm

Try the same command, but change the timeout 1 part to time.

ExIT · December 10, 2018, 4:44pm

Mcat12 · December 10, 2018, 4:46pm

Did the dig command also take that long? What server(s) are in /etc/resolv.conf?
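One way to answer both questions at once is to list the servers in /etc/resolv.conf and time a query against each of them. This is only a sketch: the function names `list_nameservers` and `time_nameservers` are made up for illustration, and the second one needs dig installed.

```shell
# Print the nameserver addresses from a resolv.conf-style file, in file order.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Run one dig query per listed server and show dig's reported query time.
time_nameservers() {
    domain="$1"
    for ns in $(list_nameservers "${2:-/etc/resolv.conf}"); do
        echo "== $ns"
        # +time=2 +tries=1 keep an unresponsive server from stalling the loop.
        dig +time=2 +tries=1 "@$ns" "$domain" | grep 'Query time'
    done
}
```

For example, `time_nameservers raw.githubusercontent.com` would show whether one of the listed servers answers noticeably slower than the other.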

ExIT · December 10, 2018, 4:50pm

No, the dig command takes about 1 second (it reports 1 msec at the bottom).

resolv.conf has 2 servers in this order:

Gateway (router)
Local host

[screenshot: Capture]

ExIT · December 10, 2018, 4:55pm

To elaborate on that:

This server runs in a test environment that sees enough use to be comparable to a real-life one. It has a gateway (router) that forwards DNS externally. It also has a server, on which Pi-hole and other services run.

The server itself uses the gateway to resolve DNS. The devices in the test network use the server to resolve DNS, so they are protected by Pi-hole. Optionally, they can get direct access by setting their DNS server to the gateway.

The server with Pi-hole, as said, runs Unbound. This is used to query root servers from the server. Pi-hole is asked to forward to this local Unbound to resolve.

Edit: TL;DR: the commands were run from the server CLI and bypass the Pi-hole/Unbound setup.

ExIT · December 10, 2018, 5:13pm

From a device in the network (Windows), which goes through Pi-hole -> Unbound -> Root servers:

Mcat12 · December 10, 2018, 5:21pm

So the issue is that the getent hosts raw.githubusercontent.com command takes too long, whereas it should take about the same time as the dig command.

If the getent command fails, after the retry time it will make another check using:

timeout 1 dig +short raw.githubusercontent.com

Does this dig command run in less than one second? (it should, since the other dig one did, but just checking).
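The two-stage check described above can be sketched roughly like this. It is not the actual gravity code: the function name `check_resolution` is hypothetical, the retry wait between the two stages is left out, and treating non-empty dig output as success is an assumption.

```shell
# Hypothetical sketch of the check: getent first, then dig as a fallback.
check_resolution() {
    domain="$1"
    # Stage 1: lookup through the system resolver, limited to one second.
    if timeout 1 getent hosts "$domain" >/dev/null 2>&1; then
        echo "getent: OK"
        return 0
    fi
    # Stage 2: direct DNS query; +short prints only the answer records.
    # Assumption: any non-empty answer counts as success.
    if [ -n "$(timeout 1 dig +short "$domain" 2>/dev/null)" ]; then
        echo "dig: OK"
        return 0
    fi
    echo "resolution failed"
    return 1
}
```

Running `check_resolution raw.githubusercontent.com` on the Pi-hole machine shows which stage, if any, succeeds within its one-second limit.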

ExIT · December 10, 2018, 5:23pm

timeout 1 dig +short raw.githubusercontent.com

takes about a second, yes.