DNS Performance Testing Summary (dnsdist + Pi-hole cluster)
I’ve been benchmarking a local DNS stack using dnsdist (frontend) with multiple Pi-hole backends under high-QPS UDP load (dnspyre/dnsblast-style testing).
Hardware
- HP DL360 (older server)
- Proxmox VM environment
- Up to 24 cores available
Test Setup
- dnsdist as frontend load balancer
- Pi-hole instances as backends (scaled from a few up to 12)
- High PPS / UDP-heavy workload (small packets, burst traffic)
- Both cached and uncached scenarios tested

NOTE: The results below primarily reflect cached DNS performance, not full recursive (uncached) resolution.
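For orientation, a minimal dnsdist configuration of this shape might look like the sketch below. It is a hypothetical fragment: the addresses, backend count, and policy choice are placeholders, not the exact lab config.

```lua
-- Hypothetical dnsdist.conf sketch; all addresses are placeholders.
setLocal('0.0.0.0:53')                -- frontend listener (UDP + TCP)
newServer({address='10.0.0.11:53'})   -- Pi-hole backend 1
newServer({address='10.0.0.12:53'})   -- Pi-hole backend 2
-- ...one newServer() line per additional Pi-hole
setServerPolicy(leastOutstanding)     -- route to the least-loaded backend
```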
Key Findings
1. Peak vs Sustained Performance
- Peak (short burst): ~140k QPS
- Sustained (realistic): ~60k–70k QPS

The system handles very high bursts but settles to a stable throughput ceiling.
2. Frontend Core Scaling
- Increasing dnsdist cores alone did not always improve performance
- With limited backend capacity: more cores = more queueing + higher drops
- With a larger backend (12 Pi-holes): higher core counts improved throughput

Frontend scaling only helps if the backend can absorb it.
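For context, one common way to give dnsdist more UDP worker threads is to declare the listener multiple times with `reusePort` — a config sketch, not necessarily how the runs above were configured:

```lua
-- Each addLocal() with reusePort=true adds another UDP listener thread.
addLocal('0.0.0.0:53', {reusePort=true})
addLocal('0.0.0.0:53', {reusePort=true})
addLocal('0.0.0.0:53', {reusePort=true})
-- ...repeat up to the desired thread/core count
```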
3. Backend Scaling (most important factor)
- Adding more Pi-hole instances gave the biggest improvement
- Example:
  - Smaller backend → ~60–68k QPS
  - 12 Pi-holes → ~71k QPS sustained
- Backend fanout had more impact than CPU tuning
4. RAM Disk (tmpfs) Testing
- Moving logs/DB to RAM:
  - Reduced disk I/O
  - Improved median latency (p50)
- But:
  - Increased queue depth
  - Higher error rates under burst load

Removing I/O bottlenecks can increase overload effects.
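For anyone reproducing the tmpfs test, it can be done with a single fstab entry. The path below is a common Pi-hole log location and varies by version — treat it as a hypothetical example, and note that tmpfs contents are lost on reboot.

```
# /etc/fstab — hypothetical tmpfs mount for Pi-hole logs (path varies by version)
tmpfs  /var/log/pihole  tmpfs  defaults,size=256m  0  0
```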
5. Network Tuning (sysctl)
- Increased buffers/backlog improved burst handling
- Allowed ingestion of very high initial QPS (~400k+)
- But:
  - Did not increase sustained throughput
  - Increased latency under overload

Higher buffers = more queueing, not more capacity.
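The buffer/backlog changes were of this general shape — the keys are standard Linux networking sysctls, but the values below are illustrative, not the exact ones used in these runs:

```
# /etc/sysctl.d/99-dns-bench.conf — illustrative values only
net.core.rmem_max = 26214400          # max UDP receive buffer
net.core.rmem_default = 26214400      # default receive buffer
net.core.wmem_max = 26214400          # max send buffer
net.core.netdev_max_backlog = 250000  # ingress backlog queue length
```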
6. Real Bottleneck
The limiting factors are NOT:
- CPU (low load observed)
- Disk (after tuning)
- Network stack (after sysctl tuning)

The bottleneck is:
- dnsdist processing + scheduling
- Pi-hole/FTL backend capacity
- Queue buildup under burst load
Typical Stable Envelope
Across multiple runs:
- Throughput: ~65k–70k QPS
- Error rate: ~1.5%–3%
- Latency (under burst):
  - p50: ~200–250 ms
  - p95: ~280–300 ms
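As a rough cross-check, Little's Law (in-flight ≈ arrival rate × wait time) ties these numbers together: ~70k QPS at a ~250 ms p50 implies on the order of 17,500 queries in flight at once, which points at queue depth rather than per-query work as the latency driver.

```shell
# Little's Law sketch: queries in flight ≈ rate (QPS) × latency (seconds).
# Values are the rounded figures from the runs above.
qps=70000
p50_ms=250
echo "$(( qps * p50_ms / 1000 )) queries in flight"
```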
Key Takeaways
- More cores ≠ more performance
- Backend scaling > frontend scaling
- Burst capacity ≠ sustainable throughput
- Reducing bottlenecks can expose deeper limits
- Queueing (not CPU) drives latency at high load
Best Performing Setup
So far:
- dnsdist: ~20 cores
- Backend: 12 Pi-hole instances
- No RAM disk tricks
- Tuned network stack

Best balance of throughput and stability.
Final Thoughts
For high-QPS DNS workloads:
- Focus on backend scaling and distribution
- Avoid over-driving the system with unrealistic burst loads
- Tune for sustained throughput, not just peak numbers
Pi-hole (Tuned) Performance Snapshot (cached data)
After tuning Pi-hole (logging adjustments, general cleanup, and backend balancing), I ran a focused test to look at individual node behaviour.
Results (single-node style load, 1000-domain list)
- Achieved send QPS: ~4,588
- Total queries: 67,164
- Successful: 66,484
- Failed: 680
- Error rate: ~1.01%
- Latency:
  - p50: ~74 ms
  - p95: ~128 ms

Results (single-node style load, 20-domain list)
- Achieved send QPS: ~12k
Interpretation
- This reflects per-node performance under sustained load, not aggregate cluster throughput
- Results are consistent with earlier findings of ~12k cached peak per node, dropping under sustained pressure
- Latency remains significantly lower than in full-cluster burst tests due to:
  - reduced queue depth
  - a more controlled ingestion rate
Key Observations
- Pi-hole handles moderate sustained load well, but:
  - throughput per node is limited under continuous pressure
  - queueing still appears as load increases
- Error rate (~1%) is acceptable for this test shape and aligns with earlier cluster behaviour
Takeaway
Individual Pi-hole instances perform reliably in the low-thousands QPS range under sustained load.
Scaling to higher throughput requires horizontal backend expansion (more nodes) rather than pushing individual instances harder.

If anyone else is pushing Pi-hole/dnsdist at high PPS, I'd be interested to compare results.