Over the last 5 days I’ve been building and tuning a DNS test environment around the Pi-hole, focusing on high-QPS performance and packet-path limits.
Initial goal was simple benchmarking, but it evolved into a full stack test:
• 8 Pi-hole instances (Debian 12 / Proxmox VMs)
• Google Public DNS as upstream for recursive resolution
• dnsdist as front-end load balancer
• dnspyre used for sustained high-concurrency testing
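For reference, a sustained high-concurrency dnspyre run of this shape might look like the following. The server address and domain file are placeholders, and the flag names are from memory, so verify them against `dnspyre --help` for your version:

```shell
# Sustained UDP load test against the dnsdist frontend.
# 10.0.0.10 and domains.txt are placeholders for your own setup.
dnspyre --server 10.0.0.10 --duration 60s -c 200 -t A @domains.txt
```

Here `-c` sets the number of concurrent workers and `@domains.txt` reads one hostname per line, which is how the 1000- and 5000-hostname working sets below were varied.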
Key findings and tuning steps:
- Switch bottleneck: A cheap unmanaged switch was dropping packets under load. Replacing it significantly reduced UDP errors and improved consistency.
- UDP vs TCP tuning: Initially focused on TCP parameters, with minimal impact. Real gains came from tuning the UDP path and kernel networking.
- Kernel tuning (major impact): The most effective change was setting
  net.core.netdev_max_backlog = 32768
  This reduced packet drops while avoiding excessive queue buildup. Higher values increased latency and reduced throughput.
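A minimal sketch of applying and persisting that setting (the drop-in filename is an arbitrary choice):

```shell
# Apply at runtime (takes effect immediately, lost on reboot)
sysctl -w net.core.netdev_max_backlog=32768

# Persist across reboots via a sysctl drop-in file
echo 'net.core.netdev_max_backlog = 32768' > /etc/sysctl.d/90-dns-tuning.conf
sysctl --system   # reload all sysctl configuration files
```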
- VM environment: Testing was done on Proxmox using VirtIO networking with default settings. No advanced NIC queue tuning or CPU pinning was applied, so results reflect kernel/network tuning rather than hypervisor-level optimisation.
- Cache vs real workload: Testing revealed two distinct performance profiles:
  • Warm cache (1,000 hostnames): ~80k QPS, ~0.2–0.3% errors
  • Larger working set (5,000 hostnames): ~6.5k QPS, ~5–6% errors
  This clearly shows cache amplification versus real recursive/upstream limits.
- WAN limitation: With ~10 Mbps upload, upstream capacity aligns closely with ~6k QPS when cache effectiveness drops, confirming bandwidth as the limiting factor in “cold” scenarios.
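A back-of-envelope check of that alignment; the per-query wire size is a rough assumption, since real DNS queries vary with hostname length and EDNS options:

```python
# Rough estimate: how many upstream DNS queries fit in ~10 Mbps of upload.
# Assumes ~200 bytes on the wire per query (DNS payload plus UDP/IP/Ethernet
# overhead) -- an assumption, not a measured value.
upload_bps = 10_000_000               # ~10 Mbps upload
bytes_per_query = 200                 # assumed wire size per upstream query
max_qps = upload_bps / (bytes_per_query * 8)
print(f"~{max_qps:.0f} QPS")          # same order as the ~6k QPS observed
```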
Summary:
The system is now capable of:
• very high throughput for cached responses (~80k QPS)
• stable performance under sustained load
• exposing clear boundaries between cache, network, and upstream limits
Biggest takeaway:
Performance at this level is no longer about Pi-hole itself — it’s about packet handling, buffering, and network path efficiency.
Still experimenting, but current setup is stable and repeatable.
So which one should people avoid?
I used a TP-Link 8-port switch (about $30), then switched to a TP-Link 16-port switch (about $100).
Pointing at a specific model here would seem a bit unfair to me.
A typical home network probably sees a few thousand to ten thousand DNS requests a day. It doesn't have to cope with 6,500 queries per second, which amounts to 23 million per hour, or well over half a billion requests per day, so that question doesn't seem applicable for home usage scenarios.
For home users, there's likely no reason to retire a switch just because it didn't manage well in smokingwheels's stress tests (unless they are also into seriously stress-testing their equipment).
It’s always good to know this kind of stuff in case someone starts looking for issues on the Internet and finds this thread mentioning the very basic TP-Link 108 switch. But then we also need the exact revision, because there have been four or five of them across the whole 105/108 line:
- 105
- 105E
- 105PE
- 108
- 108E
- 108PE
For example, games “spam” a lot of UDP traffic, and IMHO this might cause packet loss there too… You never know…
@nero355
Yeah revision matters, but let’s be real about expectations here.
The TL-SG105/108 series (any of them) are cheap unmanaged switches with limited buffers and basic ASICs. They’re designed for throughput (Gbps), not high PPS workloads.
What bites people is this:
- Games, DNS, VoIP, etc. = small UDP packets → very high packets per second
- These switches have tiny buffers + limited packet-processing capacity
- Result: microbursts → buffer overflow → packet loss
So even if you’re only pushing 50–100 Mbps, you can still drop packets if the PPS is high enough.
Key point:
Bandwidth ≠ performance
PPS (packets/sec) is the real limiter
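To put numbers on that: at the same bandwidth, small packets mean an order of magnitude more packets per second. The 100-byte DNS frame size below is an assumption; 1518 bytes is the standard maximum Ethernet frame:

```python
# Packets-per-second at a fixed bandwidth depends heavily on packet size.
link_bps = 100_000_000                        # 100 Mbps of traffic
for label, frame_bytes in [("small DNS/UDP frame (~100 B)", 100),
                           ("full-size frame (1518 B)", 1518)]:
    pps = link_bps / (frame_bytes * 8)        # frames per second on the wire
    print(f"{label}: ~{pps:,.0f} pps")
```

Roughly 125k pps for small frames versus ~8k pps for full-size ones: the same bandwidth, but fifteen times the per-packet processing load on the switch.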
For $30 it is good value. But:
- Not designed for stress tools (dnsblast, dnspyre, dnsblast-go, etc.)
- Not designed for sustained high PPS
- Not consistent under microburst load
So yes: value for money is good, but expecting it to behave like enterprise gear under load is unrealistic.
If someone is chasing packet loss:
- test with a better switch (even a cheap managed one with bigger buffers)
- or reduce burst/concurrency
- or accept the hardware limit
Great $30 switch for what it is.
Not a high-PPS device.
Packet loss under UDP bursts is expected, not a fault.
I will switch back to my $30 8-port TP-Link switch to test once all the tuning is done.
DNS Performance Testing Summary (DNSdist + Pi-hole cluster)
I’ve been benchmarking a local DNS stack using DNSdist (frontend) with multiple Pi-hole backends under high-QPS UDP load (dnspyre/dnsblast style testing).
Hardware
- HP DL360 (older server)
- Proxmox VM environment
- Up to 24 cores available
Test Setup
- dnsdist as frontend load balancer
- Pi-hole instances as backends (scaled from a few up to 12)
- High-PPS / UDP-heavy workload (small packets, burst traffic)
- Both cached and uncached scenarios tested
NOTE: The results below primarily reflect cached DNS performance, not full recursive (uncached) resolution.
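For context, a dnsdist frontend of this shape might be configured along these lines. The listen address and backend IPs are placeholders, and the exact syntax should be checked against the dnsdist reference for your version:

```lua
-- dnsdist.conf (sketch; addresses are placeholders, not the tested setup)
setLocal("0.0.0.0:53")                 -- listen address for client queries

-- Pi-hole backends: one newServer() per instance, scaled up to 12 here
newServer({address = "10.0.0.11:53"})
newServer({address = "10.0.0.12:53"})
newServer({address = "10.0.0.13:53"})

-- Spread load across backends, favouring the least-busy one
setServerPolicy(leastOutstanding)
```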
Key Findings
1. Peak vs Sustained Performance
System handles very high bursts but settles to a stable throughput ceiling.
2. Frontend Core Scaling
- Increasing dnsdist cores alone did not always improve performance
- With limited backend capacity: more cores = more queueing + higher drops
- With a larger backend (12 Pi-holes): higher core counts improved throughput
Frontend scaling only helps if the backend can absorb it.
3. Backend Scaling (most important factor)
Backend fanout had more impact than CPU tuning
4. RAM Disk (tmpfs) Testing
Moving logs/DB to RAM was tried. But: removing I/O bottlenecks can increase overload effects.
5. Network Tuning (sysctl)
Higher buffers = more queueing, not more capacity
6. Real Bottleneck
The limiting factors are NOT:
The bottleneck is:
- dnsdist processing + scheduling
- Pi-hole/FTL backend capacity
- Queue buildup under burst load
Typical Stable Envelope
Across multiple runs:
- Throughput: ~65k–70k QPS
- Error rate: ~1.5%–3%
- Latency (under burst): p50 ~200–250 ms, p95 ~280–300 ms
Key Takeaways
- More cores ≠ more performance
- Backend scaling > frontend scaling
- Burst capacity ≠ sustainable throughput
- Reducing bottlenecks can expose deeper limits
- Queueing (not CPU) drives latency at high load
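The last point is classic queueing behaviour. A toy M/M/1 model (illustrative only, not fitted to these tests; the 70k QPS capacity figure is just borrowed from the envelope above) shows how time in the system explodes as offered load approaches capacity:

```python
# M/M/1 mean time in system: W = 1 / (mu - lambda), valid for lambda < mu.
# Numbers are illustrative, not measurements from the tests above.
mu = 70_000.0                      # service capacity, queries/sec
for load in (0.5, 0.9, 0.99):      # utilisation = lambda / mu
    lam = load * mu                # offered load, queries/sec
    w_ms = 1_000.0 / (mu - lam)    # mean time in system, milliseconds
    print(f"utilisation {load:.0%}: ~{w_ms:.3f} ms")
```

Latency stays flat until utilisation gets high, then grows steeply, which matches CPU looking idle while burst latency climbs.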
Best Performing Setup
So far: best balance of throughput and stability.
Final Thoughts
For high-QPS DNS workloads:
- Focus on backend scaling and distribution
- Avoid over-driving the system with unrealistic burst loads
- Tune for sustained throughput, not just peak numbers
Pi-hole (Tuned) Performance Snapshot (cached data)
After tuning Pi-hole (logging adjustments, general cleanup, and backend balancing), I ran a focused test to look at individual node behaviour.
Results (single-node-style load, 1,000-domain list)
Results (single-node-style load, 20-domain list)
Interpretation
- This reflects per-node performance under sustained load, not aggregate cluster throughput
- Results are consistent with earlier findings of ~12k cached peak per node, dropping under sustained pressure
- Latency remains significantly lower than in full-cluster burst tests due to:
Key Observations
- Pi-hole handles moderate sustained load well, but:
- Error rate (~1%) is acceptable for this test shape and aligns with earlier cluster behaviour
Takeaway
Individual Pi-hole instances perform reliably in the low-thousands QPS range under sustained load.
Scaling to higher throughput requires horizontal backend expansion (more nodes) rather than pushing individual instances harder.
If anyone else is pushing Pi-hole/dnsdist at high PPS, I'd be interested to compare results.