DNS Performance Testing Summary (dnsdist + Pi-hole cluster)
I’ve been benchmarking a local DNS stack using dnsdist (frontend) with multiple Pi-hole backends under high-QPS UDP load (dnspyre/dnsblast-style testing).
Hardware
- HP DL360 (older server)
- Proxmox VM environment
- Up to 24 cores available
Test Setup
- dnsdist as frontend load balancer
- Pi-hole instances as backends (scaled from a few up to 12)
- High PPS / UDP-heavy workload (small packets, burst traffic)
- Both cached and uncached scenarios tested

NOTE: The results below primarily reflect cached DNS performance, not full recursive (uncached) resolution.
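For orientation, a minimal dnsdist configuration of this shape might look like the sketch below. It is a hypothetical fragment: the addresses, backend count, and policy choice are placeholders, not the exact lab config.

```lua
-- Hypothetical dnsdist.conf sketch; all addresses are placeholders.
setLocal('0.0.0.0:53')                -- frontend listener (UDP + TCP)
newServer({address='10.0.0.11:53'})   -- Pi-hole backend 1
newServer({address='10.0.0.12:53'})   -- Pi-hole backend 2
-- ...one newServer() line per additional Pi-hole
setServerPolicy(leastOutstanding)     -- route to the least-loaded backend
```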
Key Findings
1. Peak vs Sustained Performance
- Peak (short burst): ~140k QPS
- Sustained (realistic): ~60k–70k QPS

The system handles very high bursts but settles to a stable throughput ceiling.
2. Frontend Core Scaling
- Increasing dnsdist cores alone did not always improve performance
- With limited backend capacity: more cores = more queueing + higher drops
- With a larger backend (12 Pi-holes): higher core counts improved throughput

Frontend scaling only helps if the backend can absorb it.
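For context, one common way to give dnsdist more UDP worker threads is to declare the listener multiple times with `reusePort` — a config sketch, not necessarily how the runs above were configured:

```lua
-- Each addLocal() with reusePort=true adds another UDP listener thread.
addLocal('0.0.0.0:53', {reusePort=true})
addLocal('0.0.0.0:53', {reusePort=true})
addLocal('0.0.0.0:53', {reusePort=true})
-- ...repeat up to the desired thread/core count
```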
3. Backend Scaling (most important factor)
- Adding more Pi-hole instances gave the biggest improvement
- Example:
  - Smaller backend → ~60–68k QPS
  - 12 Pi-holes → ~71k QPS sustained
- Backend fanout had more impact than CPU tuning
4. RAM Disk (tmpfs) Testing
- Moving logs/DB to RAM:
  - Reduced disk I/O
  - Improved median latency (p50)
- But:
  - Increased queue depth
  - Higher error rates under burst load

Removing I/O bottlenecks can increase overload effects.
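For anyone reproducing the tmpfs test, it can be done with a single fstab entry. The path below is a common Pi-hole log location and varies by version — treat it as a hypothetical example, and note that tmpfs contents are lost on reboot.

```
# /etc/fstab — hypothetical tmpfs mount for Pi-hole logs (path varies by version)
tmpfs  /var/log/pihole  tmpfs  defaults,size=256m  0  0
```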
5. Network Tuning (sysctl)
- Increased buffers/backlog improved burst handling
- Allowed ingestion of very high initial QPS (~400k+)
- But:
  - Did not increase sustained throughput
  - Increased latency under overload

Higher buffers = more queueing, not more capacity.
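The buffer/backlog changes were of this general shape — the keys are standard Linux networking sysctls, but the values below are illustrative, not the exact ones used in these runs:

```
# /etc/sysctl.d/99-dns-bench.conf — illustrative values only
net.core.rmem_max = 26214400          # max UDP receive buffer
net.core.rmem_default = 26214400      # default receive buffer
net.core.wmem_max = 26214400          # max send buffer
net.core.netdev_max_backlog = 250000  # ingress backlog queue length
```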
6. Real Bottleneck
The limiting factors are NOT:
- CPU (low load observed)
- Disk (after tuning)
- Network stack (after sysctl tuning)

The bottleneck is:
- dnsdist processing + scheduling
- Pi-hole/FTL backend capacity
- Queue buildup under burst load
Typical Stable Envelope
Across multiple runs:
- Throughput: ~65k–70k QPS
- Error rate: ~1.5%–3%
- Latency (under burst):
  - p50: ~200–250 ms
  - p95: ~280–300 ms
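As a rough cross-check, Little's Law (in-flight ≈ arrival rate × wait time) ties these numbers together: ~70k QPS at a ~250 ms p50 implies on the order of 17,500 queries in flight at once, which points at queue depth rather than per-query work as the latency driver.

```shell
# Little's Law sketch: queries in flight ≈ rate (QPS) × latency (seconds).
# Values are the rounded figures from the runs above.
qps=70000
p50_ms=250
echo "$(( qps * p50_ms / 1000 )) queries in flight"
```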
Key Takeaways
- More cores ≠ more performance
- Backend scaling > frontend scaling
- Burst capacity ≠ sustainable throughput
- Reducing bottlenecks can expose deeper limits
- Queueing (not CPU) drives latency at high load
Best Performing Setup
So far:
- dnsdist: ~20 cores
- Backend: 12 Pi-hole instances
- No RAM disk tricks
- Tuned network stack

Best balance of throughput and stability.
Final Thoughts
For high-QPS DNS workloads:
- Focus on backend scaling and distribution
- Avoid over-driving the system with unrealistic burst loads
- Tune for sustained throughput, not just peak numbers
Pi-hole (Tuned) Performance Snapshot (cached data)
After tuning Pi-hole (logging adjustments, general cleanup, and backend balancing), I ran a focused test to look at individual node behaviour.
Results (single-node style load, 1000-domain list)
- Achieved send QPS: ~4,588
- Total queries: 67,164
- Successful: 66,484
- Failed: 680
- Error rate: ~1.01%
- Latency:
  - p50: ~74 ms
  - p95: ~128 ms

Results (single-node style load, 20-domain list)
- Achieved send QPS: ~12k
Interpretation
- This reflects per-node performance under sustained load, not aggregate cluster throughput
- Results are consistent with earlier findings of ~12k cached peak per node, dropping under sustained pressure
- Latency remains significantly lower than in full-cluster burst tests due to:
  - reduced queue depth
  - a more controlled ingestion rate
Key Observations
- Pi-hole handles moderate sustained load well, but:
  - throughput per node is limited under continuous pressure
  - queueing still appears as load increases
- Error rate (~1%) is acceptable for this test shape and aligns with earlier cluster behaviour
Takeaway
Individual Pi-hole instances perform reliably in the low-thousands QPS range under sustained load.
Scaling to higher throughput requires horizontal backend expansion (more nodes) rather than pushing individual instances harder.

If anyone else is pushing Pi-hole/dnsdist at high PPS, I'd be interested to compare results.