DNS failing across VLAN

Pi-hole version is v5.3.1 (Latest: v5.3.1)
AdminLTE version is v5.5 (Latest: v5.5)
FTL version is v5.8.1 (Latest: v5.8.1)

  1. Pihole1 = Ubuntu Server 20.04.2 VM (Hyper-V host - type 2)
  2. Pihole2 = Raspbian GNU/Linux 10 (buster) Lite (Raspberry Pi 3 Model B Plus Rev 1.3)
  3. Pihole4 = Ubuntu Server 20.04.2 VM (VMware ESXi host - type 1)

All three appliances run Pi-hole with Unbound, set up following the official guide; two of them were built or rebuilt this week. Only Pihole1 functions as desired.

Long story short, my 16GB SanDisk Ultra micro SD card finally gave up the ghost after 3+ years of daily writes (thanks for the help resolving that). While rebuilding, I noticed that requests from my "IoT" VLAN had stopped reaching Pihole2 & Pihole4. I have repaired/reconfigured the failing devices, to no avail. The brand new VM (Pihole1) works, but repeating the same steps on my RPi (and backup VM) does not seem to resolve my issues. Going on day 3 of banging my head against the wall, perhaps someone can help pick out the obvious misconfiguration.

Expected Behaviour:

I expect all the Pi-holes to work across all VLANs with Conditional Forwarding set to 192.168.0.0/16 when listening behavior is set to all (local) interfaces.

Actual Behaviour:

  • No DNS across IoT VLAN (192.168.107.0/24)
  • Running tcpdump -npi ath1 port 5353 shows all the multicast traffic, but with (Cache flush??) issues:
19:19:24.494075 IP 192.168.107.73.5353 > 224.0.0.251.5353: 0 PTR (QM)? _ewelink._tcp.local. (37)
19:19:24.494248 IP 192.168.107.73.5353 > 224.0.0.251.5353: 0 [2q] [3n] ANY (QM)? Google-Home-Mini-4431075acaf23668f0fc6ce376e9d68a._googlecast._tcp.local. ANY (QM)? 4431075a-caf2-3668-f0fc-6ce376e9d68a.local. (358)
19:19:24.659733 IP 192.168.107.73.5353 > 224.0.0.251.5353: 0 [2q] [3n] ANY (QM)? Google-Home-Mini-4431075acaf23668f0fc6ce376e9d68a._googlecast._tcp.local. ANY (QM)? 4431075a-caf2-3668-f0fc-6ce376e9d68a.local. (358)
19:19:24.929889 IP 192.168.107.73.5353 > 224.0.0.251.5353: 0 [2q] [3n] ANY (QM)? Google-Home-Mini-4431075acaf23668f0fc6ce376e9d68a._googlecast._tcp.local. ANY (QM)? 4431075a-caf2-3668-f0fc-6ce376e9d68a.local. (358)
19:19:26.219722 IP 192.168.107.73.5353 > 224.0.0.251.5353: 0*- [0q] 4/0/0 PTR Google-Home-Mini-4431075acaf23668f0fc6ce376e9d68a._googlecast._tcp.local., (Cache flush) TXT "id=4431075acaf23668f0fc6ce376e9d68a" "cd=95CCC34A92DC7C7CBEF403AA997787C0" "rm=" "ve=05" "md=Google Home Mini" "ic=/setup/icon.png" "fn=Google_Coral" "ca=199172" "st=0" "bs=FA8FCA3F45EF" "nf=1" "rs=", (Cache flush) SRV 4431075a-caf2-3668-f0fc-6ce376e9d68a.local.:8009 0 0, (Cache flush) A 192.168.107.73 (360)
19:19:27.241152 IP 192.168.107.73.5353 > 224.0.0.251.5353: 0*- [0q] 4/0/0 PTR Google-Home-Mini-4431075acaf23668f0fc6ce376e9d68a._googlecast._tcp.local., (Cache flush) TXT "id=4431075acaf23668f0fc6ce376e9d68a" "cd=95CCC34A92DC7C7CBEF403AA997787C0" "rm=" "ve=05" "md=Google Home Mini" "ic=/setup/icon.png" "fn=Google_Coral" "ca=199172" "st=0" "bs=FA8FCA3F45EF" "nf=1" "rs=", (Cache flush) SRV 4431075a-caf2-3668-f0fc-6ce376e9d68a.local.:8009 0 0, (Cache flush) A 192.168.107.73 (360)

  • If I bring the test VM (Pihole1) up, all my devices almost immediately resume responding and I can see the DNAT rules counting up (see above image), so I know my router is properly configured, yet I am STILL unable to mirror the settings properly to get everything back up and running. The only thing I'm certain of at this point is PEBKAC. 🥵 Thanks, yer nice.

Debug Token:

https://tricorder.pi-hole.net/ocfm3dqq8t

EDIT: If there's anything further I've neglected to provide, please just say the word and I'll follow-up this evening.

What is your DNS Listening Behavior? Settings > DNS > Listening Behavior.

Just an observation, but you're confusing traffic from avahi-daemon (multicast mDNS/Bonjour) on port 5353 with Unbound (unicast) on port 5335 (from the guide):

pi@ph5b:~ $ sudo netstat -nltup | grep ':5353 \|:5335 '
tcp        0      0 127.0.0.1:5335          0.0.0.0:*               LISTEN      496/unbound
udp        0      0 127.0.0.1:5335          0.0.0.0:*                           496/unbound
udp        0      0 0.0.0.0:5353            0.0.0.0:*                           26282/avahi-daemon:
udp6       0      0 :::5353                 :::*                                26282/avahi-daemon:
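To tell the two kinds of traffic apart in a capture like the one at the top of the thread, it helps to look at each packet's destination: mDNS goes to the multicast group 224.0.0.251 on port 5353, while ordinary client DNS queries go unicast to port 53. Below is a minimal Python sketch, assuming tcpdump's default one-line IPv4 output; the function name `classify_tcpdump_line` is hypothetical.

```python
import ipaddress
import re

# tcpdump's default IPv4 summary: "<time> IP <src>.<sport> > <dst>.<dport>: ..."
LINE_RE = re.compile(
    r"IP (?P<src>\d+\.\d+\.\d+\.\d+)\.(?P<sport>\d+) > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.(?P<dport>\d+):"
)

def classify_tcpdump_line(line: str) -> str:
    """Label one tcpdump line by destination: mDNS multicast, unicast DNS, Unbound, or other."""
    m = LINE_RE.search(line)
    if m is None:
        return "unparsed"  # IPv6 ("IP6 ...") and non-IP lines are not handled
    dst = ipaddress.ip_address(m["dst"])
    dport = int(m["dport"])
    if dport == 5353 and dst.is_multicast:
        return "mdns-multicast"  # avahi/Bonjour chatter, e.g. to 224.0.0.251
    if dport == 53:
        return "dns-unicast"     # a client asking a DNS server such as Pi-hole
    if dport == 5335:
        return "unbound"         # Pi-hole -> Unbound; per the guide this stays on lo
    return "other"

line = "19:19:24.494075 IP 192.168.107.73.5353 > 224.0.0.251.5353: 0 PTR (QM)? _ewelink._tcp.local. (37)"
print(classify_tcpdump_line(line))  # mdns-multicast
```

Every line in the earlier capture classifies as mdns-multicast, which is why it says nothing about whether DNS is working.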

Listen on all interfaces (at most one hop / local).

Thank you, that was rather confusing wording on my part. Apparently I also missed a bullet point, as I was attempting to suggest what I thought did NOT work (DNS) versus what traffic was able to cross the network...

  • CAN ping devices from LAN to VLAN107
  • and I also see lots of mDNS (5353, they're mostly smart speakers and streaming devices) activity as well -- tcpdump on my wireless access point on port 5335 shows nothing

Something is behaving strangely... hence DNS??

And it shouldn't, as Pi-hole is the only client talking to upstream unbound on the same machine. Also, if you followed our guide, unbound would only listen on the loopback interface anyway.
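You can read the loopback point straight off netstat's Local Address column. A socket bound to 127.0.0.1 (or ::1) only accepts traffic from the same machine, so no VLAN client can ever reach Unbound on 127.0.0.1:5335. Here is a small Python sketch of that check. `reachable_from_network` is a hypothetical helper name, and it only understands numeric addresses as printed by `netstat -n`.

```python
import ipaddress

def reachable_from_network(local_address: str) -> bool:
    """Given netstat's 'Local Address' (e.g. '127.0.0.1:5335' or ':::5353'),
    return True if other hosts could send packets to that socket."""
    host, _, _port = local_address.rpartition(":")  # split off the port after the last ':'
    ip = ipaddress.ip_address(host)
    # Loopback (127.0.0.0/8, ::1) never leaves the machine, so no VLAN can reach it.
    return not ip.is_loopback

for addr in ("127.0.0.1:5335", "0.0.0.0:5353", ":::5353"):
    print(addr, reachable_from_network(addr))
```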

That expectation isn't justified.
VLANs are a separate configuration set that is completely unrelated to Pi-hole.
If you haven't configured your router and/or switches correctly, there is nothing that you could do on the Pi-hole side.

In addition, Listen on all interfaces will only work if Pi-hole's host networking has been configured with IP addresses in all of your VLANs.

I suspect this isn't the case for your installation, as your debug log suggests your Pi-hole machine is configured for a single /24 network on eth0, and so is your router where you seem to intend a /16:

*** [ DIAGNOSING ]: Networking
[✓] IPv4 address(es) bound to the eth0 interface:
   192.168.11.8/24 matches the IP found in /etc/pihole/setupVars.conf
*** [ DIAGNOSING ]: Discovering active DHCP servers (takes 10 seconds)
   Scanning all your interfaces for DHCP servers
   
   * Received 313 bytes from eth0:192.168.11.1
     Offered IP address: 192.168.11.8
     Server IP address: 192.168.11.1
     DHCP options:
      Message type: DHCPOFFER (2)
      server-identifier: 192.168.11.1
      netmask: 255.255.255.0

Switching your interface listening behaviour to Listen on all interfaces, permit all origins should fix your issue, provided VLAN routing is setup correctly.

EDIT: Alternatively, you could add all relevant IP addresses to your Pi-hole machine's network interface, but that would obviously be more laborious than switching Pi-hole's listening behaviour. Or you could try statically defining a /16 netmask on your Pi-hole machine.
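The "at most one hop / local" listening mode corresponds to dnsmasq's local-service option, which appears as a local-service line in the 01-pihole.conf posted further down. With that option, dnsmasq only answers clients whose address is on a subnet configured on one of the server's interfaces. The Python sketch below approximates that rule to show why a /24 on eth0 turns away VLAN107 clients, while a /16 on the same address would not. It simplifies dnsmasq's actual behaviour, and `local_service_allows` is a hypothetical name.

```python
import ipaddress
from typing import Iterable

def local_service_allows(client_ip: str, interface_cidrs: Iterable[str]) -> bool:
    """Approximate dnsmasq's local-service check: answer only if the client
    sits in a subnet that is configured on one of the server's interfaces."""
    client = ipaddress.ip_address(client_ip)
    return any(client in ipaddress.ip_interface(cidr).network for cidr in interface_cidrs)

# Pihole2 as seen in the debug log (eth0 = 192.168.11.8/24) versus a
# hypothetical /16 netmask on the same address.
for eth0 in ("192.168.11.8/24", "192.168.11.8/16"):
    for client in ("192.168.11.23", "192.168.107.73"):
        print(f"eth0={eth0} client={client} answered={local_service_allows(client, [eth0])}")
```

With the /24, the IoT client 192.168.107.73 is rejected. With the /16, the interface network becomes 192.168.0.0/16, which covers it.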

Thank you for taking the time to write up a response; unfortunately, switching to Listen on all interfaces, permit all origins has not worked. I obviously did a poor job writing up my post by including a bunch of irrelevant information and misleading terms. Please allow me to try again.

Pihole1 (192.168.11.7) on an Ubuntu Server 20.04 VM works properly: my VLAN107 traffic reaches DNS and all my speakers and casting devices work. The moment I try to use Pihole2 (192.168.11.8), the devices report that they cannot connect to the internet. I believe my router to be configured properly, because I've been running the EXACT Pi-hole on this EXACT RPi3B+ since 2018 -- these problems only began after you helped me resolve the dead/dying micro SD card issue, post-rebuild on a new 64GB micro SD card. I realize a LOT of aspects of this post are outside the scope of Pi-hole and involve NAT masquerade, DNAT & firewall rules, while I also capture and invisibly redirect devices with hard-coded DNS (such as these smart speakers)...

I guess the fact that you can't find anything obviously wrong with my deployment should be a good sign, but it still doesn't seem to reliably work. No DNS traffic will cross to Pihole2, but the Pihole1 device worked flawlessly all evening.

I'm obviously missing something glaring.

I can't think of anything besides Interface listening behaviour in a standard Pi-hole that would contribute to your observation (you didn't apply custom dnsmasq configuration files involving interface options, did you?).

Most likely, this is a router / network configuration issue.

From a client in your VLAN, can you ping 192.168.11.8 and nslookup 192.168.11.8 and provide the results, both when using the working and the failing DNS server?

The most obvious room for blunder would be a changed IP address for your new Pi-hole installation, so routing and/or port redirection would target an invalid IP.
However, your DNAT screenshot target IP range does cover your Pi-hole's IP from the debug log.

Or perhaps the ethernet port on your router or switch changed with your new Pi-hole?
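When a firewall or DNAT address group is written as a first-last pair, it is easy to misjudge by eye whether a Pi-hole's address falls inside it. Here is a minimal Python sketch for checking that. The helper name and the example range 192.168.11.6 to 192.168.11.10 are assumptions for illustration, not values read from the router.

```python
import ipaddress

def in_address_range(ip: str, first: str, last: str) -> bool:
    """True if `ip` lies in the inclusive range first..last (all IPv4 or all IPv6)."""
    addr = ipaddress.ip_address(ip)
    return ipaddress.ip_address(first) <= addr <= ipaddress.ip_address(last)

# Hypothetical address group; substitute the bounds configured on the router.
for pihole in ("192.168.11.7", "192.168.11.8"):
    print(pihole, in_address_range(pihole, "192.168.11.6", "192.168.11.10"))
```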

Interface listening behaviour

Pihole2 will NOT pass DNS traffic to VLAN107

versus Pihole1 which DOES properly pass along DNS requests to VLAN107

You're not wrong to suspect the network, as I've brought a number of issues into the fray that are NOT specific to Pi-hole. But why would Pihole1 (192.168.11.7) properly pass DNS traffic across subnets while Pihole2 (192.168.11.8: same IP as the previous build, same physical RPi3B+ hardware, just a new micro SD card) cannot? To me this suggests a misconfiguration on the RPi (Pihole2) device, NOT in the network configuration, not least because I have not changed my network config in years.


All great suggestions to check, but I assure you none of this has changed. This exact same configuration has worked (mostly) flawlessly for the past 3+ years (since originally building the RPi in 2018). My router configuration hasn't changed. The only thing that changed was a dead micro SD card, which sent me into panic mode, quickly spinning up a new VM (Pihole1) to take the place of the dead Raspberry Pi until I could dig out a new SD card. The brand new VM works; my bare-metal RPi rebuild does not. I have an "Address Group" created to allow a range (outside of the DHCP scope) of 5 specific IP addresses, 192.168.1.6 - 192.168.1.10, that have been assigned to various Pi-holes over the years via DHCP reservation, and I can definitely see the traffic counting up. I wouldn't just take my word for it, either, but I'm fairly confident that the router is configured properly.

VLAN107 has a firewall policy that drops all traffic to RFC1918, it's supposed to have internet access ONLY -- no local, with a few minor exceptions (see above). Still running Pihole2 off the same physical (eth6) port on my router.
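Every Pi-hole address in this thread (192.168.11.x) is itself an RFC 1918 address. As far as the rule itself is concerned, a "drop all traffic to RFC1918" policy on VLAN107 therefore also matches DNS traffic to the Pi-hole unless one of the exceptions applies first. The small Python sketch below tests a destination against the three RFC 1918 blocks. The networks are listed explicitly because Python's `is_private` also covers loopback, link-local and other special-purpose ranges. `matches_rfc1918_drop` is a hypothetical name.

```python
import ipaddress

# The three private address blocks defined in RFC 1918.
RFC1918 = tuple(
    ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
)

def matches_rfc1918_drop(dst_ip: str) -> bool:
    """True if a 'drop traffic to RFC 1918' rule would match this destination."""
    addr = ipaddress.ip_address(dst_ip)
    return any(addr in net for net in RFC1918)

print(matches_rfc1918_drop("192.168.11.8"))  # Pihole2: matched by the drop rule
print(matches_rfc1918_drop("172.217.11.4"))  # a public address: not matched
```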

I'm terribly sorry to have taken so much of your valuable time. I obviously need to start from scratch and carefully re-evaluate my router configuration. None of this is making any sense, so it's obviously something I have misconfigured in my failed attempts to resolve (what I thought were) DNS issues.

Here's Pihole1 (the working VM) showing VLAN traffic with current timestamps and number of queries.

Thanks again for being SO very patient. I guess for today we'll just allow Pihole1 VM to keep on chugging along and hopefully we survive the next 24-48 hours. After so many failed attempts and research leading me in circles, I was hoping a fresh set of eyeballs could call out some obvious mistake I'd made. Will try again this weekend, perhaps even starting with a fresh installation on the micro SD card. Unfortunately, even if I do get it working again, I probably won't ever understand what went wrong. Learning from my mistake was the ultimate goal. /sigh

Stay safe out there!

You missed the question below:

Check with the command below whether any interface= directives in the config files restrict pihole-FTL from answering on other interfaces:

sudo grep interface= -R /etc/dnsmasq.*
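The grep above only catches lines containing interface= (which includes except-interface=). dnsmasq has a few more options that influence which interfaces or clients get answers, such as listen-address, bind-interfaces, bind-dynamic and local-service. Below is a minimal Python sketch that scans config text for those option names while skipping comment lines. `restricting_directives` is a hypothetical helper, and the option list is a starting point, not an exhaustive one. You could feed it the contents of /etc/dnsmasq.conf and each file in /etc/dnsmasq.d.

```python
import re
from typing import List, Tuple

# dnsmasq options that influence which interfaces, addresses or clients get DNS answers.
RESTRICTING = {"interface", "except-interface", "listen-address",
               "bind-interfaces", "bind-dynamic", "local-service"}

# Option name at the start of an active (non-comment) line, with or without "=value".
OPTION_RE = re.compile(r"^\s*([a-z-]+)\s*(?:=|$)")

def restricting_directives(config_text: str) -> List[Tuple[int, str]]:
    """Return (line number, line) for every active line that sets a restricting option."""
    hits = []
    for lineno, line in enumerate(config_text.splitlines(), start=1):
        m = OPTION_RE.match(line)
        if m and m.group(1) in RESTRICTING:
            hits.append((lineno, line.strip()))
    return hits

sample = """\
server=127.0.0.1#5335
local-service
# interface=eth0
"""
print(restricting_directives(sample))  # [(2, 'local-service')]
```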


Thanks for (re)focusing my attention on this, @deHakkelaar! My 01-pihole.conf is as follows:

# Pi-hole: A black hole for Internet advertisements
# (c) 2017 Pi-hole, LLC (https://pi-hole.net)
# Network-wide ad blocking via your own hardware.
#
# Dnsmasq config for Pi-hole's FTLDNS
#
# This file is copyright under the latest version of the EUPL.
# Please see LICENSE file for your rights under this license.

###############################################################################
#      FILE AUTOMATICALLY POPULATED BY PI-HOLE INSTALL/UPDATE PROCEDURE.      #
# ANY CHANGES MADE TO THIS FILE AFTER INSTALL WILL BE LOST ON THE NEXT UPDATE #
#                                                                             #
#        IF YOU WISH TO CHANGE THE UPSTREAM SERVERS, CHANGE THEM IN:          #
#                      /etc/pihole/setupVars.conf                             #
#                                                                             #
#        ANY OTHER CHANGES SHOULD BE MADE IN A SEPARATE CONFIG FILE           #
#                    WITHIN /etc/dnsmasq.d/yourname.conf                      #
###############################################################################

addn-hosts=/etc/pihole/local.list
addn-hosts=/etc/pihole/custom.list


localise-queries


no-resolv



cache-size=0

log-queries
log-facility=/var/log/pihole.log

local-ttl=2

log-async

server=127.0.0.1#5335
local-service
rev-server=192.168.11.0/24,192.168.11.1
server=/lab.example.com/192.168.11.1
server=/use-application-dns.net/

And the only other configuration in this folder is for my Wireguard vpn, which simply lists the client/peers. Genuinely appreciate the assistance, thank you both for trying so hard to assist me. If I am able to figure it out, I'll obviously report back. It's definitely broken, heh. /sigh


Does that mean the grep produced no output?

EDIT: What else is in that Wireguard file you didn't mention before?

Correct, sir. No output.

The 02-pivpn.conf file is:

addn-hosts=/etc/pivpn/hosts.wireguard

And the above hosts file literally just has entries with wireguard IP addresses (10.6.9.0/24) with the associated clientname.wireguardvpn

My apologies for neglecting to include such pertinent information. For now, I've simply purged Unbound and started using my ISP's upstream DNS servers. Sadly, I didn't have time to reflash the SD card at the moment, so this is the path of least resistance. I would REALLY love to resolve these issues, so within the next ~48 hours I hope to be able to pull out a few more chunks of hair while banging my head against this problem. Thank you kindly for your expertise, @deHakkelaar, it is most genuinely appreciated. It truly pains me that I could not big brain the solution on my own. I felt it had to be something borderline trivial that I've overlooked.

For the time being, I've reverted my settings as noted above, set Interface listening behavior back down, and expanded the Conditional Forwarding range to 192.168.0.0/16 to include both subnets. Both devices seem to be happily reporting host names across the VLAN, and everything immediately resumed talking to the internet; I didn't even have to reboot or reconnect any IoT devices. They're always DYING to get out and chatter with something, hence my reluctance to allow them local access.
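Widening Conditional Forwarding to 192.168.0.0/16 brings back host names for VLAN107 clients for a simple reason. The generated rev-server line (rev-server=192.168.11.0/24,192.168.11.1 in the 01-pihole.conf above) only forwards reverse (PTR) lookups to the router for addresses inside that network. The Python sketch below approximates this with a network-membership check, whereas dnsmasq itself matches on the in-addr.arpa domain. `ptr_forwarded` is a hypothetical name.

```python
import ipaddress
from typing import Tuple

def ptr_forwarded(client_ip: str, rev_server_net: str) -> Tuple[str, bool]:
    """Return the PTR name for client_ip and whether a rev-server=<net>,<router>
    line covering rev_server_net would forward that reverse lookup."""
    addr = ipaddress.ip_address(client_ip)
    return addr.reverse_pointer, addr in ipaddress.ip_network(rev_server_net)

for net in ("192.168.11.0/24", "192.168.0.0/16"):
    print(net, ptr_forwarded("192.168.107.73", net))
```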

Thanks a million for trying to hold my hand through this, folks. I don't know how I've made such a mess. So sorry to have wasted your time. SMH


No apologies needed.
There are so many moving parts to consider and sometimes you forget 😉
The best approach to diagnosing these kinds of problems is how you started out, with tcpdump.
And run the nslookup tool on a client (Windows/macOS/Linux) in one of the VLANs.
But run tcpdump without the -p (--no-promiscuous-mode) argument so it sniffs all traffic, e.g.:

sudo tcpdump -ni ath1 port 53

Weird interface name, ath1, by the way.
In another session, tail the log live with:

pihole -t

And on a client, test with the command below (so the test doesn't depend on the client's configured DNS servers):

nslookup -type=a pi.hole. <PIHOLE_IP_ADDRESS>

And test for a domain that gets forwarded upstream by Pi-hole:

nslookup -type=a pi-hole.net. <PIHOLE_IP_ADDRESS>
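Where nslookup isn't available, or to script the same check, you can send a DNS query straight to one server over UDP port 53. Like the commands above, this bypasses whatever resolvers the client is configured with. Below is a minimal Python sketch using only the standard library. `build_query`, `summarize` and `ask` are hypothetical names. The sketch does not validate the reply ID or parse the answer records. The server address in the usage comment is Pihole2's from this thread.

```python
import random
import socket
import struct
from typing import Optional, Tuple

def build_query(name: str, qtype: int = 1, query_id: Optional[int] = None) -> bytes:
    """Build a minimal DNS query (RFC 1035): 12-byte header plus one question.
    qtype 1 = A, 28 = AAAA; the RD (recursion desired) flag is set."""
    if query_id is None:
        query_id = random.randint(0, 0xFFFF)
    # ID, flags=0x0100 (RD), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN

def summarize(reply: bytes) -> Tuple[int, int]:
    """Return (RCODE, ANCOUNT) from a reply header; RCODE 0 = NOERROR, 3 = NXDOMAIN."""
    _id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", reply[:12])
    return flags & 0x000F, ancount

def ask(server: str, name: str, qtype: int = 1, timeout: float = 2.0) -> Tuple[int, int]:
    """Send one query directly to server:53 over UDP and summarize the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)  # a silently dropped query raises socket.timeout
        sock.sendto(build_query(name, qtype), (server, 53))
        reply, _ = sock.recvfrom(4096)
    return summarize(reply)

# Usage from a client in VLAN107:
#   print(ask("192.168.11.8", "pi.hole."))
#   print(ask("192.168.11.8", "pi-hole.net."))
```

A timeout points at something dropping packets on the way (firewall, DNAT, listening mode). An NXDOMAIN or NOERROR reply shows that the path to Pi-hole works.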

But have you solved your issue now?
Do you suspect unbound upstream was causing you issues?
That doesn't sound logical, as Unbound does not reside in your VLANs but on the localhost loopback adapter named lo (127.0.0.1 etc.).
This means the communication between pihole-FTL and Unbound never goes over your ath1 interface, so VLANs do not apply.


It's just a wireless access point (UniFi UAP-AC-LR with an Atheros chipset, I believe) and the firmware has always been a little shaky, so anytime there's an issue it wouldn't surprise me if it WERE networking related. Overall, though, it's a large step up from the usual Linksys/Netgear consumer-grade, off-the-shelf stuff we've all had a lifetime of terrible experiences with. That being said, I also started testing an entry-level Ruckus R310 out back and it's so much more stable and has significantly better beamforming, so the range that 2.4GHz clients get is incredible. Sorry, now I'm rambling.

I wouldn't necessarily say the issue is "solved" per se, so much as that I've found a temporary workaround. I'd prefer the added privacy and love the <1ms response times from cached queries that Unbound provides, but for whatever reason I just cannot get it to respond reliably to queries from the IoT VLAN107. It has been running for about 20 hours now, and all speakers and streaming devices are performing as expected.

And queries from VLAN107 are showing up exactly where they should:

10:16:10: query[PTR] 8.11.168.192.in-addr.arpa from 192.168.11.23
10:16:10: /etc/pihole/custom.list 192.168.11.8 is pihole2.lab.example.com
10:16:10: query[A] pi.hole from 192.168.11.23
10:16:10: /etc/pihole/local.list pi.hole is 192.168.11.8
10:16:21: query[A] www.google.com from 192.168.107.21
10:16:21: cached www.google.com is 172.217.11.4
10:16:21: query[AAAA] www.google.com from 192.168.107.21
10:16:21: cached www.google.com is 2607:f8b0:4006:814::2004

10:17:44: query[PTR] 8.11.168.192.in-addr.arpa from 192.168.11.23
10:17:44: /etc/pihole/custom.list 192.168.11.8 is pihole2.lab.example.com
10:17:44: query[A] pi.hole.net from 192.168.11.23
10:17:44: forwarded pi.hole.net to 72.18.48.120
10:17:44: validation result is SECURE
10:17:44: reply pi.hole.net is NXDOMAIN
10:17:55: query[A] www.google.com from 192.168.107.216
10:17:55: cached www.google.com is 172.217.11.4
10:17:55: query[AAAA] www.google.com from 192.168.107.216
10:17:55: cached www.google.com is 2607:f8b0:4006:814::2004
10:17:55: query[A] scribe.logs.roku.com from 192.168.107.75
10:17:55: regex blacklisted scribe.logs.roku.com is 0.0.0.0
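When comparing two Pi-holes, it can help to tally which client subnets actually appear in the query log instead of eyeballing it. The Python sketch below counts query[...] lines per client /24. It assumes the line shape shown above and IPv4 clients only, and `queries_per_subnet` is a hypothetical helper. On a live system, the lines could come from /var/log/pihole.log, the log-facility set in 01-pihole.conf.

```python
import ipaddress
import re
from collections import Counter
from typing import Iterable

# Matches e.g. "10:16:21: query[A] www.google.com from 192.168.107.21"
QUERY_RE = re.compile(
    r"query\[(?P<qtype>[A-Z]+)\] (?P<name>\S+) from (?P<client>\d+\.\d+\.\d+\.\d+)$"
)

def queries_per_subnet(log_lines: Iterable[str], prefix: int = 24) -> Counter:
    """Count query lines per client subnet (IPv4 clients, /24 by default)."""
    counts = Counter()
    for line in log_lines:
        m = QUERY_RE.search(line.strip())
        if m:
            subnet = ipaddress.ip_network(f"{m['client']}/{prefix}", strict=False)
            counts[str(subnet)] += 1
    return counts

excerpt = [
    "10:16:10: query[PTR] 8.11.168.192.in-addr.arpa from 192.168.11.23",
    "10:16:21: query[A] www.google.com from 192.168.107.21",
    "10:16:21: cached www.google.com is 172.217.11.4",
    "10:17:55: query[A] scribe.logs.roku.com from 192.168.107.75",
]
print(queries_per_subnet(excerpt))
```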

No other settings have been changed on the router (switch0.107) or access point. There was no reason to suspect anything was wrong: unbound-checkconf returned no errors on either machine. I thought maybe I had messed something up between IPv4/IPv6, so I ran a repair (on both devices) and made sure that ONLY IPv4 was checked to respond to queries, but no firewall policies have changed. I have been using the same Unbound configuration since ~2018 and have built numerous (a dozen or more) Pi-hole appliances... not to suggest by any stretch of the imagination that I am an expert, merely attempting to convey my level of enthusiasm for the project. All of this confounds me.

It's definitely DNS, uggh, but makes no sense. nslookup performed as expected across both devices. I remain lost. Oof.

The top PADD screen capture is from Pihole2 on the RPi3B+ running Raspbian Lite, while the bottom image is the little Hyper-V (type 2) VM running Ubuntu Server 20.04.2 (Pihole1). With Unbound now purged, we're simply passing requests upstream to our ISP's surprisingly quick resolver.

DNSperftest results (via this great script):

                  test1   test2   test3   test4   test5   test6   test7   test8   test9   test10  Average (ms)
127.0.0.1          1 ms    12 ms   10 ms   7 ms    9 ms    9 ms    9 ms    15 ms   1 ms    24 ms     9.70
ISP_DNS1           3 ms    3 ms    3 ms    2 ms    3 ms    3 ms    3 ms    3 ms    3 ms    3 ms      2.90
ISP_DNS2           5 ms    5 ms    5 ms    5 ms    5 ms    5 ms    5 ms    5 ms    5 ms    40 ms     8.50
cloudflare        13 ms   13 ms   13 ms   14 ms   14 ms   13 ms   13 ms   14 ms   13 ms   14 ms     13.40
level3            19 ms   19 ms   19 ms   18 ms   19 ms   19 ms   19 ms   19 ms   19 ms   19 ms     18.90
google            19 ms   19 ms   22 ms   20 ms   13 ms   56 ms   14 ms   21 ms   13 ms   20 ms     21.70
quad9             48 ms   49 ms   48 ms   49 ms   48 ms   47 ms   49 ms   50 ms   48 ms   48 ms     48.40
freenom           89 ms   82 ms   88 ms   84 ms   92 ms   83 ms   83 ms   98 ms   87 ms   90 ms     87.60
opendns           34 ms   19 ms   20 ms   34 ms   20 ms   59 ms   20 ms   27 ms   34 ms   20 ms     28.70
norton            19 ms   19 ms   19 ms   20 ms   19 ms   19 ms   19 ms   19 ms   20 ms   26 ms     19.90
cleanbrowsing     14 ms   13 ms   13 ms   13 ms   14 ms   13 ms   13 ms   13 ms   14 ms   16 ms     13.60
yandex            120 ms  120 ms  118 ms  118 ms  113 ms  113 ms  120 ms  113 ms  119 ms  113 ms    116.70
adguard           18 ms   20 ms   22 ms   18 ms   19 ms   21 ms   21 ms   18 ms   18 ms   22 ms     19.70
neustar           19 ms   20 ms   20 ms   20 ms   20 ms   20 ms   20 ms   19 ms   19 ms   19 ms     19.60
comodo            30 ms   30 ms   30 ms   31 ms   30 ms   30 ms   31 ms   31 ms   30 ms   74 ms     34.70
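The Average column is an arithmetic mean, so one slow sample moves it a lot. ISP_DNS2 answered in 5 ms nine times out of ten, but its single 40 ms run lifts the average to 8.5 ms. Reporting the median alongside the mean makes such outliers easier to spot. Here is a small Python sketch using the standard `statistics` module, with timings copied from three rows of the table; `mean_and_median` is a hypothetical helper name.

```python
from statistics import mean, median
from typing import List, Tuple

def mean_and_median(samples_ms: List[float]) -> Tuple[float, float]:
    """Return (mean, median) of a resolver's response times in milliseconds."""
    return mean(samples_ms), median(samples_ms)

timings = {
    "127.0.0.1": [1, 12, 10, 7, 9, 9, 9, 15, 1, 24],
    "ISP_DNS1": [3, 3, 3, 2, 3, 3, 3, 3, 3, 3],
    "ISP_DNS2": [5, 5, 5, 5, 5, 5, 5, 5, 5, 40],
}
for name, samples in timings.items():
    avg, med = mean_and_median(samples)
    print(f"{name:10} mean {avg:6.2f} ms  median {med:5.1f} ms")
```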

3-9 ms isn't too shabby for our ISP-provided resolvers. While I do miss Unbound and its great privacy enhancements, I'm obviously not skilled enough to continue running it at this time. Will revisit things in the very near future. Thanks again for investing so much time giving me great ideas to toss around and check. Stay safe, and happy day to all the fathers out there.

Brought up Pihole4 (another Ubuntu Server 20.04 VM running on VMware ESXi host) and it's throwing brand new errors:

DNSMASQ_CONFIG FTL failed to start due to failed to create listening socket for port 53: Address already in use

netstat -nltup shows the following:

Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      1140/lighttpd
tcp        0      0 127.0.0.53:53           0.0.0.0:*               LISTEN      786/systemd-resolve
tcp6       0      0 :::80                   :::*                    LISTEN      1140/lighttpd
udp        0      0 127.0.0.53:53           0.0.0.0:*                           786/systemd-resolve
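You can script the same check by pulling the PID/Program column for any socket whose local port is exactly 53. Matching the port exactly avoids false hits such as 5353, which a loose grep for ':53' would catch. Below is a Python sketch over the text of `netstat -nltup`. `port_owners` is a hypothetical name, and the parsing assumes netstat's default column layout.

```python
from typing import Set

def port_owners(netstat_output: str, port: int) -> Set[str]:
    """Return the 'PID/Program name' entries listening on `port` in `netstat -nltup` text."""
    owners = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows start with tcp/tcp6/udp/udp6; this skips headers and blank lines.
        if len(fields) < 6 or not fields[0].startswith(("tcp", "udp")):
            continue
        local_port = fields[3].rpartition(":")[2]
        if local_port == str(port):
            # UDP rows have no State column; drop "LISTEN" if present on TCP rows.
            owners.add(" ".join(f for f in fields[5:] if f != "LISTEN"))
    return owners

sample = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      1140/lighttpd
tcp        0      0 127.0.0.53:53           0.0.0.0:*               LISTEN      786/systemd-resolve
udp        0      0 127.0.0.53:53           0.0.0.0:*                           786/systemd-resolve
"""
print(port_owners(sample, 53))  # {'786/systemd-resolve'}
```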

Never saw this while running Unbound previously. Ahh well, guess I should rebuild a new VM from scratch anyway. Thanks again for all your time!

Yeah, systemd-resolved is a caching DNS forwarder, somewhat similar to the dnsmasq that's embedded in the pihole-FTL binary.
Both try to claim/bind port 53, so that's going to conflict.
I believe the Pi-hole installer should have taken care of systemd-resolved:

If it didn't, try editing the file below:

sudo nano /etc/systemd/resolved.conf

Find a line that says:

DNSStubListener=yes

Change it into:

DNSStubListener=no

If that line is not present in the resolved.conf file, add it.
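The same edit can be scripted, which is handy when rebuilding. The Python sketch below rewrites an existing DNSStubListener line, including the commented-out #DNSStubListener=yes default that Ubuntu typically ships. If no such line exists, it adds the setting under the [Resolve] section. `disable_stub_listener` is a hypothetical name. It only works on the file's text, so writing the result back (as root) and restarting systemd-resolved remain separate steps.

```python
import re

# An existing DNSStubListener line, active or commented out with '#'.
STUB_RE = re.compile(r"^[ \t]*#?[ \t]*DNSStubListener[ \t]*=.*$", re.MULTILINE)

def disable_stub_listener(conf_text: str) -> str:
    """Return resolved.conf text with DNSStubListener=no set in [Resolve]."""
    if STUB_RE.search(conf_text):
        # Replace the existing (possibly commented-out) line in place.
        return STUB_RE.sub("DNSStubListener=no", conf_text)
    if "[Resolve]" in conf_text:
        # No such line yet: add it right below the section header.
        return conf_text.replace("[Resolve]", "[Resolve]\nDNSStubListener=no", 1)
    # No [Resolve] section at all: append one.
    return conf_text.rstrip("\n") + "\n[Resolve]\nDNSStubListener=no\n"

print(disable_stub_listener("[Resolve]\n#DNS=\n#DNSStubListener=yes\n"))
```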

Save/exit and restart systemd-resolved:

sudo service systemd-resolved restart

Check with netstat that it's gone from port 53 now.
Then try restarting Pi-hole:

pihole restartdns
