Weird top domains

csfuser · December 2, 2019, 1:16am

Expected Behaviour:

Top domains not displaying cache names?

Actual Behaviour:

Debug Token:

https://tricorder.pi-hole.net/bgx6wvz644

Been running pihole since Jan.. After the latest update, the DNS server started shutting down after about 4-6 hrs. I gave up on a solution and re-installed using defaults.
Now, my top domains are showing
_aaplcacheN._tcp.localdomain (where N = nothing thru 4)
Done a ton of research but can't find any reference to it other than stuff about apple...
Top Clients also show my gateway/router as top client.. which I never noticed before.

Any thoughts/help would be welcome!
-Steve

deHakkelaar · December 2, 2019, 6:02pm

You can ask on the Apple support forums why your Apple devices query those particular records.

You probably configured the upstream WAN DNS server(s) setting on your router to be that of the Pi-hole IP address.
Try defaulting that setting again on the router, and use DHCP to instruct client devices to use the Pi-hole IP address for DNS resolution.

csfuser · December 3, 2019, 2:41am

Thanks for the reply..
I'm running a Ubiquiti network.. with USG, 2 * 8-port POE60w switches, and 2 * AP-AC-PRO WAPs.
The WAN DNS "was" pointing to pi-hole's IP.. been like that since I originally configured pi-hole back in Jan. DNS on all the network hardware is STILL pointing to pi-hole.
However, the gateway (USG) shows as top client with a HUGE gap between it and the next closest client. This is also skewing/screwing the charts and totals in the dashboard.

These are the top 2 clients.. see the gap?

The gateway (USG) shows FAR more hits than any device... and I've not changed anything in the network config. This all started after a recent OS upgrade on the Pi..

Bucking_Horn · December 3, 2019, 2:40pm

Pi-hole cannot force clients to issue DNS queries (though there are circumstances where a blocked domain could prompt repetitive DNS queries - but your problem is with permitted domains).

If your Pi-hole has indeed always been set up as upstream DNS for your USG, something else must have triggered that change.

Did you perhaps recently enable content caching on some of your Macs, possibly when upgrading them to Catalina?

The DNS requests in question are SRV lookups for Apple's content caches. It seems that you have set up multiple instances of such content caches that are trying to contact their respective peers for syncing, where normally one dedicated cache would be quite enough for a standard local home network.

As quoted from Apple's What is content caching on Mac?:

The saved content is stored in a content cache on a Mac, and is available for other devices to retrieve without going out over the Internet.

You'd probably have to find out which Mac devices on your network are configured as content cache, decide whether you really need the lot of them, and consult with Apple's documentation on how to best integrate them into your network.

jfb · December 3, 2019, 4:53pm

In addition to the excellent information from @Bucking_Horn, the domains starting with lb._dns-sd are typically associated with the Apple Bonjour service.

The sheer volume of requests being processed by Pi-Hole is likely the cause of this problem. What is the type of request for these cache domains (A, PTR, SRV, etc.)?

grep aaplcache /var/log/pihole.log | tail -n15
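
To answer the query-type question at a glance, the matching lines could also be tallied by record type and client. This is only a sketch: it assumes the usual dnsmasq line shape (`... query[TYPE] name from client`), and `summarize_aaplcache` is a made-up helper name, not a Pi-hole command.

```shell
# Tally aaplcache queries by record type and requesting client.
# Assumes dnsmasq-style lines: "... query[TXT] <name> from <client-ip>".
summarize_aaplcache() {
    log="${1:-/var/log/pihole.log}"
    grep 'aaplcache' "$log" |
        awk '{
            for (i = 1; i < NF; i++) {
                # A query line has the shape: query[TYPE] <name> from <client>
                if ($i ~ /^query\[/ && $(i + 2) == "from") {
                    count[$i " " $(i + 3)]++
                }
            }
        }
        END { for (k in count) print count[k], k }' |
        sort -rn
}

# Usage: summarize_aaplcache            (reads /var/log/pihole.log)
#        summarize_aaplcache /path/to/other.log
```

Each output line is a count followed by the query type and the client that sent it, busiest combination first.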

csfuser · December 3, 2019, 7:48pm

Thanks for the responses...
Thing is.. I have NEVER set up apple or bonjour caching on my apple devices.. I'm guessing this might be related to their latest OS upgrades and possibly caused by some of those devices using "shared" access/notices across devices and icloud? idk.. but these just started showing on the list.
Prior to a clean re-install of pi-hole, which I did 3 days ago, they did NOT show in the list.
I did a clean install because FTL kept shutting down after recent upgrades (others have had similar issues) and I could not find a solution.
Since the clean install I've had these issues, but prior to that had pi-hole running since Jan without a problem.
Now, FTL is offline again.. and ... grrr
Here's what the pi-hole log shows when tailing it: the requests are of type [TXT]... I'm at a loss as to how to clean this up... 192.168.1.1 is the gateway/router.

I'm going to guess - as you said - that the sheer volume is shutting down FTL.
It looks to be the case as tmp (/dev/shm) is at 100%!

Do i flush the logs and restart?

jfb · December 3, 2019, 8:01pm

You have the option of having Pi-Hole process only A or AAAA queries, which will reduce the amount of traffic shown in your logs and in the long term database. Note this will not change the underlying network traffic, you just won't see it in the Pi-Hole. You may have had this option set in your previous install of Pi-Hole, or the previous version may have been old enough that this option was not available.

ANALYZE_ONLY_A_AND_AAAA=true
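
For reference, a sketch of how that setting might be applied. It assumes the FTL config file is /etc/pihole/pihole-FTL.conf (the path mentioned later in this thread) and that FTL has to be restarted to pick up the change. `set_ftl_option` is a hypothetical helper, not part of Pi-hole.

```shell
# Set KEY=VALUE in a config file, replacing any existing KEY line.
set_ftl_option() {
    key="$1"
    value="$2"
    conf="${3:-/etc/pihole/pihole-FTL.conf}"
    scratch="$(mktemp)"
    touch "$conf"
    # Keep every line except an earlier definition of the same key.
    grep -v "^${key}=" "$conf" > "$scratch" || true
    printf '%s=%s\n' "$key" "$value" >> "$scratch"
    # Write back in place so the file keeps its owner and permissions.
    cat "$scratch" > "$conf"
    rm -f "$scratch"
}

# Usage (as root), then restart FTL so it re-reads the file:
#   set_ftl_option ANALYZE_ONLY_A_AND_AAAA true
#   service pihole-FTL restart
```

Running the helper twice leaves a single line for the key, so it is safe to repeat.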

I don't think this is the case. I have a number of Apple devices sharing data across iCloud, and have none of this traffic. Devices include Macs, IOS devices, Apple TVs, Apple watches, etc.

csfuser · December 3, 2019, 8:04pm

Thanks.. I'll try that.
I had not configured anything special on pi-hole to restrict processing types.
I usually update pi-hole when I get notices to do so via admin console.. and upgrade the OS also.
It was a recent OS upgrade that started all this.
I'll post back if that works!
THX

UPDATE:
Made the change.. seems to have stopped the logging.. but a quick listing of /dev/shm shows that the queries log is the largest file..
Is there a way to reset/clean it up.. will pihole -f do that?

Bucking_Horn · December 3, 2019, 9:02pm

You don't have to (and you certainly don't have to justify your use of Apple) - all Apple devices come with these features (i.e. mDNS/Bonjour auto-config and content cache detection) alive and working. It's part of Apple's no-hassle config-free philosophy.

Note that setting up a content cache still requires additional manual configuration by a user, at least with computers and smartphones. I am not sure how Apple TV or AirPort Time Capsule would behave in this regard, however.

Yes, content caching is especially effective for content supplied by Apple (as updates) or stored in your iCloud.

The idea is that instead of e.g. having each of your Smartphones download an iOS upgrade, you download it once, store it in a local content cache and have all further smartphones use that copy from the local cache.
That is not only faster for you, but potentially less stressful for Apple's servers as well - provided you have indeed multiple clients requesting identical information. It's not much different from setting up a proxy server (e.g. Squid) in your network.

So making a Mac machine in your network a content cache can be a sound decision.

The sheer amount of those requests makes it unlikely it is just content cache clients, and the different addresses hint at multiple cache instances being involved. Normally, you'd see only a few of those requests during connection, and fewer still after that - unless you have set up a content cache.

@jfb may well be right about the ANALYZE_ONLY_A_AND_AAAA option, but I still think you should try and identify where this flood of requests originates from.

I don't have Apple devices here at my place (my parents do use Apple almost exclusively at theirs), so I can't double check for the exact option:
Did you perhaps elevate some of your Apple clients to act as peers when configuring shared access notifications? Peers in Apple's terminology are indeed part of a distributed content cache.

csfuser · December 3, 2019, 9:21pm

@Bucking_Horn Nope.. NOTHING like any of that.. The Apple devices on my network are all stock and I've done nothing regarding content caching. It's all set to whatever Apple defaults are, and have not knowingly elevated any apple clients to act as peers.

UPDATE: I just checked the iMac and NO sharing of any kind is active

Plus, my LAN traffic stats don't show any apple devices with heavy traffic.. quite the opposite. And I have the ability to do DPI (deep packet inspection) so I can see what each device is doing..
When I installed pi-hole back in Jan.. It was the basic default config.. nothing special.
It was only after the v4.3.2 upgrade and subsequent pi OS upgrades that I started seeing FTL die every 6+ hrs. Not finding an answer, I uninstalled and did a clean install (Pi-hole Version v4.3.2, Web Interface Version v4.3.2, FTL Version v4.3.1), which is when all this started happening.

I'm [seriously] tempted to start from scratch with the latest Raspbian release install then install pi-hole again. Can't even query any of the long term data..
Something is definitely askew in this config..

And here's more weird stuff.. look at the query volumes from 11pm thru 03:00... something is querying hard.. but I see nothing in my network logs that matches..

csfuser · December 3, 2019, 10:29pm

I'm about to remove pi-hole and start over.. see if that makes any difference..

UPDATE:
OK. Removed pi-hole, then did clean install. Accepted default/recommended settings during install.
DNSSEC on, and selected Cloudflare (only) as upstream DNS.
Switched router DNS back to pi-hole IP and "so far" things - queries, traffic counts, etc. - appear "normal"... or at least close to what I was used to seeing.
I'll keep an eye on it overnight, see if I get the same caching anomalies and report back.

Bucking_Horn · December 3, 2019, 10:45pm

I fear any amount of clean install is not going to solve this, unless you want to make this a habit.
Pi-hole is not at the heart of this anomaly, it's just suffering from it - and quite visually so.
Your Pi-hole dies because its logs spill over.

Even considering @jfb's remark and restricting Pi-hole to log and display only A/AAAA-type queries would still mean you have well over 4 million requests of unknown origin per day.

Your above screenshot (post 3) shows that these requests amass during a certain time frame only - roughly between 11:30 and 15:00 hours.

Are you aware of any client device that is active exclusively during that time?

And as Pi-hole seems to see these requests as to be arriving from your USG:
Are you aware of any devices allowed on your network that do not use Pi-hole as DNS server directly, so they would be forwarded to your USG's upstream DNS (that happens to be Pi-hole)?
Or alternatively, could your USG somehow try to act as an Apple compliant content cache?

And finally, to conclude with something more constructive:
As a counter measure, you could try and add those requests to Pi-hole's blacklist as a regex:
_aaplcache\d{0,2}._tcp.localdomain

On first sight, this approach is clearly inferior to @jfb's proposal of logging only A/AAAA-requests, as blocked requests will still be logged, and thus still promote log file growth. However, ANALYZE_ONLY_A_AND_AAAA wouldn't make those stray DNS requests disappear - they just aren't logged anymore.

My hope is that blocking them altogether will shy the clients - whatever they may be - away from repeating their requests. If this works as I hope, it would mean you won't be able to take advantage of a content cache any more, but also that you can file a missing request report for 4 million DNS queries (only if you wish, though).
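
Before adding the pattern, it can be sanity-checked against a few sample names. The sketch below uses `grep -E` as a stand-in, which forces two changes: extended regex syntax in grep has no `\d`, so `[0-9]` is substituted, and the dots are escaped so they match only literal dots (a slightly stricter variant of the pattern above). Pi-hole's own regex engine may differ in details, so treat this purely as a check of the pattern's intent. `matches_aaplcache` is a made-up helper name.

```shell
# Succeed if the given name matches the content-cache discovery pattern.
matches_aaplcache() {
    printf '%s\n' "$1" |
        grep -Eq '_aaplcache[0-9]{0,2}\._tcp\.localdomain'
}

# Try the pattern on two discovery names and one unrelated domain.
for name in _aaplcache._tcp.localdomain _aaplcache4._tcp.localdomain example.com; do
    if matches_aaplcache "$name"; then
        echo "would block: $name"
    else
        echo "untouched:   $name"
    fi
done
```

The first two names should be reported as blocked and example.com as untouched.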

csfuser · December 3, 2019, 10:50pm

I fear you are correct..
I'll do some more checking of my USG logs and see what I can find.
I'll watch overnight and see if I get a repeat of the anomaly..
After the clean install (see above post) I did NOT modify /etc/pihole/pihole-FTL.conf on this install per @jfb's suggestion.. I want to see if I get a repeat between 23:00 - 03:00.

Thanks for everyone's help.. I'm learning a lot about how pi-hole works!

jfb · December 3, 2019, 11:00pm

The Pi-Hole is working normally and responding to the received DNS queries. A clean install of Pi-Hole will not change the network behavior. The volume of requests from the network is enormous and overwhelming your Pi-Hole logs and database. Let's see exactly how large the Pi-Hole log files and database have become:

ls -lh /var/log/pihole.log*

ls -lh /etc/pihole/pihole-FTL.db

The daily logs rotate out in 5 days and only the most recent two days are uncompressed. If the long term database is enormous, we can move it to a new location, restart FTL and a new database will be created. Without the huge volume of queries, this should resolve your problem.
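
The move-and-recreate step could look roughly like this. It is a sketch only, assuming FTL is managed through `service pihole-FTL` and that the database sits at the path listed above. `move_aside` is a hypothetical helper, and the `.bak-<timestamp>` suffix is an arbitrary choice.

```shell
# Rename a file to <name>.bak-YYYYMMDD-HHMMSS and print the new name.
move_aside() {
    src="$1"
    dest="${src}.bak-$(date +%Y%m%d-%H%M%S)"
    mv -- "$src" "$dest" && printf '%s\n' "$dest"
}

# Intended use (as root); FTL is stopped first so the file is not in use:
#   service pihole-FTL stop
#   move_aside /etc/pihole/pihole-FTL.db
#   service pihole-FTL start
```

Renaming in place keeps the old data on the same disk. If space is the concern, the old file would need to go to other storage or be deleted once it is no longer needed.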

csfuser · December 3, 2019, 11:13pm

@jfb
Totally agree and understand..
I looked at the queries.db size before I uninstalled and it was over 1GB!!!
The current values - it's been up less than an hour - are below..
I'll watch overnight and see if I get a repeat of the anomaly... so far the fresh install has shown what I am used to seeing.
Appreciate everyone's input.

jfb · December 3, 2019, 11:20pm

A tip for posting on this forum - you can copy and paste output without posting images. Paste the output into a reply, select that block of text and format it with the "</>" icon and it will become preformatted text. This helps us as well, since we can copy commands from your output and compare them to our results.

Example:

ls -lh /var/log/pihole.log*
-rw-r--r-- 1 pihole pihole 4.0M Dec 3 17:20 /var/log/pihole.log
-rw-r--r-- 1 pihole pihole 4.7M Dec 3 00:00 /var/log/pihole.log.1
-rw-r--r-- 1 pihole pihole 330K Dec 2 00:00 /var/log/pihole.log.2.gz
-rw-r--r-- 1 pihole pihole 354K Dec 1 00:00 /var/log/pihole.log.3.gz
-rw-r--r-- 1 pihole pihole 329K Nov 30 00:00 /var/log/pihole.log.4.gz
-rw-r--r-- 1 pihole pihole 266K Nov 29 00:00 /var/log/pihole.log.5.gz

csfuser · December 3, 2019, 11:27pm

Thanks.. will do that..

pi@raspberrypi:~ $ ls -lh /var/log/pihole.log*
-rw-r--r-- 1 pihole pihole 56M Dec  3 17:42 /var/log/pihole.log

so... after a little more research.. I have a number of iPhones on my network. Some (4) are set to auto-update and are pending an iOS 13.2.3 upgrade/install. One is going to try and install "later tonight".
I wonder if the content caching URLs in my OP were from those devices looking for locally cached content via the _aaplcacheN._tcp.localdomain lookups? idk.
@Bucking_Horn - If I get a repeat, I'll definitely look at using the regex blocking. Thanks

Bucking_Horn · December 3, 2019, 11:42pm

A further bit of thinking - consider this:

1. An Apple device sends a content cache discovery query involving _aaplcache1._tcp.localdomain to your Pi-hole.
2. Pi-hole correctly recognizes the request as targeting a local host name and forwards it to your USG.
3. As you never set up a content cache, your USG doesn't know an associated IP - but instead of answering 'no such domain', it chooses to query its upstream DNS, which happens to be Pi-hole. Continue with step 2.

Obviously, this is a loop that is only avoidable if

you do not use Pi-hole as your USG's upstream DNS (but you didn't touch the USG's upstream DNS since ever)
your USG is correctly answering the request with NXDOMAIN (if I am correct, this seems to have happened in the past)
Pi-hole is answering the request with NXDOMAIN (which kind of can be achieved by blacklisting)

So, if your USG is acting as DHCP and you have Conditional Forwarding enabled, I think your USG could be the problem - as you didn't touch the USG's upstream DNS, I suspect that your USG has received an update that changed its answering behaviour.
This also perfectly explains why you see your USG as Pi-hole's top client.

If my assumptions are correct, blacklisting the discovery queries as suggested above (blocking _aaplcache\d{0,2}._tcp.localdomain) will immediately alleviate this problem.
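
One way to test this hypothesis would be to query the USG directly and look at the response status. The sketch assumes `dig` is installed on the Pi and that 192.168.1.1 is the gateway, as stated earlier in the thread. `dns_status` is a hypothetical helper that pulls the status field out of dig's header line.

```shell
# Extract the response status (NOERROR, NXDOMAIN, SERVFAIL, ...)
# from dig output supplied on stdin.
dns_status() {
    sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n1
}

# Ask the gateway for the SRV record directly:
#   dig @192.168.1.1 _aaplcache1._tcp.localdomain SRV | dns_status
```

An immediate NXDOMAIN from the USG would argue against the loop. A slow or failed answer, with the same name piling up in Pi-hole's log at the same moment, would support it.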

csfuser · December 3, 2019, 11:49pm

"So, if your USG is acting as DHCP and you have Conditional Forwarding enabled,"
I do have that setup.. and your explanation of the cause makes sense.
Yes, the USG's default DNS is pi-hole, with a secondary pointing to opendns (in case pi-hole fails).
I'll add the regex..

Bucking_Horn · December 4, 2019, 12:17am

That's an assumption - it would depend on the client's behaviour.
I wouldn't know how an Apple device's content cache discovery software behaves when it receives Pi-hole's standard null answer for an SRV request, but I draw my analogy on how type A requests behave (they don't get repeated by the client device after TTL expires).

Still, I agree that injecting this information into a DNS record would be the better choice, if a bit trickier to configure. However, as we seem to be dealing with SRV records here, wouldn't srv-host entries be more appropriate?

I'll join you back tomorrow - have to call it a day now.