My V8 Pi-hole instance (Debian 12 / Proxmox VMs)

:test_tube: DNS Performance Testing Summary (dnsdist + Pi-hole cluster)

I've been benchmarking a local DNS stack using dnsdist (frontend) with multiple Pi-hole backends under high-QPS UDP load (dnspyre/dnsblast-style testing).

:desktop_computer: Hardware

  • HP DL360 (older server)

  • Proxmox VM environment

  • Up to 24 cores available


:wrench: Test Setup

  • dnsdist as frontend load balancer (config sketch below)

  • Pi-hole instances as backends (scaled from a few → up to 12)

  • High PPS / UDP-heavy workload (small packets, burst traffic)

  • Both cached and uncached scenarios tested
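
A rough sketch of what the dnsdist side looks like, for anyone wanting to reproduce the setup (backend IPs, names and the ACL are placeholders, and leastOutstanding is just one sensible policy choice, not necessarily what you'd pick):

```
# Sketch only -- addresses, names and ACL are placeholders, not my exact config
cat >/etc/dnsdist/dnsdist.conf <<'EOF'
setLocal("0.0.0.0:53", { reusePort = true })     -- frontend listener
newServer({ address = "10.0.10.11:53", name = "pihole-01" })
newServer({ address = "10.0.10.12:53", name = "pihole-02" })
-- ...one newServer() line per Pi-hole backend (12 in the largest run)
setServerPolicy(leastOutstanding)                -- backend with fewest in-flight queries wins
setACL({ "10.0.0.0/8", "192.168.0.0/16" })       -- restrict who may query the frontend
EOF
systemctl restart dnsdist
```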

:backhand_index_pointing_right: NOTE: The results below primarily reflect cached DNS performance, not full recursive (uncached) resolution.


:bar_chart: Key Findings

1. Peak vs Sustained Performance

  • Peak (short burst): ~140k QPS

  • Sustained (realistic): ~60k–70k QPS

:right_arrow: System handles very high bursts but settles to a stable throughput ceiling.


2. Frontend Core Scaling

  • Increasing dnsdist cores alone did not always improve performance

  • With limited backend capacity:

    • More cores = more queueing + higher drops

  • With a larger backend (12 Pi-holes):

    • Higher core counts improved throughput

:right_arrow: Frontend scaling only helps if backend can absorb it
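
For context on what "more dnsdist cores" means here: dnsdist adds UDP receiver threads by binding the same address again with reusePort, so scaling the frontend is essentially repeating addLocal() (check the dnsdist docs for your version), roughly:

```
# Each extra bind on the same address adds another UDP receiver thread;
# only worth doing once the backend pool can actually absorb the extra load
cat >>/etc/dnsdist/dnsdist.conf <<'EOF'
addLocal("0.0.0.0:53", { reusePort = true })
addLocal("0.0.0.0:53", { reusePort = true })
EOF
systemctl restart dnsdist
```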


3. Backend Scaling (most important factor)

  • Adding more Pi-hole instances gave the biggest improvement

  • Example:

    • Smaller backend → ~60–68k QPS

    • 12 Pi-holes → ~71k QPS sustained

:right_arrow: Backend fanout had more impact than CPU tuning
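
On Proxmox, adding another backend is mostly mechanical; a minimal sketch with placeholder VM IDs, names and IPs (the clone needs its own hostname/IP inside the guest before you register it with dnsdist):

```
# Placeholder IDs/names/IPs -- clone an existing Pi-hole VM, then register it
qm clone 110 121 --name pihole-12 --full
qm start 121    # then give the new guest its own hostname/IP
echo 'newServer({ address = "10.0.10.22:53", name = "pihole-12" })' >>/etc/dnsdist/dnsdist.conf
systemctl restart dnsdist
```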


4. RAM Disk (tmpfs) Testing

  • Moving logs/DB to RAM:

    • Reduced disk I/O

    • Improved median latency (p50)

  • But:

    • Increased queue depth

    • Higher error rates under burst load

:right_arrow: Removing I/O bottlenecks can increase overload effects
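
The RAM-disk test was nothing fancy, just tmpfs over the Pi-hole log/database locations; a sketch below (size and paths are illustrative and assume a stock layout with logs under /var/log/pihole), and given the results above I wouldn't keep it:

```
# Sketch only -- size/paths illustrative; this traded disk I/O for deeper queues
systemctl stop pihole-FTL
mount -t tmpfs -o size=256m tmpfs /var/log/pihole    # Pi-hole / FTL logs in RAM
# the long-term query DB (/etc/pihole/pihole-FTL.db) can be relocated the same way
systemctl start pihole-FTL
```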


5. Network Tuning (sysctl)

  • Increased buffers/backlog improved burst handling

  • Allowed ingestion of very high initial QPS (~400k+)

  • But:

    • Did not increase sustained throughput

    • Increased latency under overload

:right_arrow: Higher buffers = more queueing, not more capacity
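
The sysctl changes were the usual UDP buffer/backlog knobs; values in this ballpark (illustrative, not a tuning recommendation):

```
# Illustrative values -- bigger buffers mostly mean deeper queues, not more sustained QPS
sysctl -w net.core.rmem_max=26214400           # max socket receive buffer
sysctl -w net.core.rmem_default=26214400       # default receive buffer
sysctl -w net.core.netdev_max_backlog=5000     # ingress backlog before the stack drops packets
```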


6. Real Bottleneck

The limiting factors are NOT:

  • CPU (low load observed)

  • Disk (after tuning)

  • Network stack (after sysctl tuning)

The bottlenecks are:

  • dnsdist processing + scheduling

  • Pi-hole/FTL backend capacity

  • Queue buildup under burst load


:chart_decreasing: Typical Stable Envelope

Across multiple runs:

  • Throughput: ~65k–70k QPS

  • Error rate: ~1.5%–3%

  • Latency (under burst):

    • p50: ~200–250 ms

    • p95: ~280–300 ms


:brain: Key Takeaways

  • More cores ≠ more performance

  • Backend scaling > frontend scaling

  • Burst capacity ≠ sustainable throughput

  • Reducing bottlenecks can expose deeper limits

  • Queueing (not CPU) drives latency at high load


:chequered_flag: Best Performing Setup

So far:

  • dnsdist: ~20 cores

  • Backend: 12 Pi-hole instances

  • No RAM disk tricks

  • Tuned network stack

:right_arrow: Best balance of throughput and stability


:speech_balloon: Final Thoughts

For high-QPS DNS workloads:

  • Focus on backend scaling and distribution

  • Avoid over-driving the system with unrealistic burst loads

  • Tune for sustained throughput, not just peak numbers


:test_tube: Pi-hole (Tuned) Performance Snapshot: Cached Data

After tuning Pi-hole (logging adjustments, general cleanup, and backend balancing), I ran a focused test to look at individual node behaviour.
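
The load came from dnspyre pointed at a single node with a domain list, roughly like this (repeat/worker counts and the list file are placeholders; check `dnspyre --help` for your version's flags):

```
# Placeholder counts/filename -- not the exact invocation
dnspyre -n 5 -c 50 --server 10.0.10.11 @domains-1000.txt
```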

:bar_chart: Results (single-node-style load, 1000-domain list)

  • Achieved send QPS: ~4,588

  • Total queries: 67,164

  • Successful: 66,484

  • Failed: 680

  • Error rate: ~1.01%

  • Latency:

    • p50: ~74 ms

    • p95: ~128 ms


:bar_chart: Results (single-node-style load, 20-domain list)

  • Achieved send QPS: ~12k

:brain: Interpretation

  • This reflects per-node performance under sustained load, not aggregate cluster throughput

  • Results are consistent with earlier findings of ~12k cached peak per node, dropping under sustained pressure

  • Latency remains significantly lower than in the full-cluster burst tests due to:

    • reduced queue depth

    • more controlled ingestion rate


:magnifying_glass_tilted_left: Key Observations

  • Pi-hole handles moderate sustained load well, but:

    • throughput per node is limited under continuous pressure

    • queueing still appears as load increases

  • Error rate (~1%) is acceptable for this test shape and aligns with earlier cluster behaviour


:chequered_flag: Takeaway

:backhand_index_pointing_right: Individual Pi-hole instances perform reliably in the low-thousands QPS range under sustained load
:backhand_index_pointing_right: Scaling to higher throughput requires horizontal backend expansion (more nodes) rather than pushing individual instances harder

If anyone else is pushing Pi-hole/dnsdist at high PPS, I'd be interested to compare results :+1: