It's currently impossible to run more than one synchronized Pi-hole instance in a network.
Running a single Pi-hole instance in a network with hundreds of clients is not fault-tolerant: unexpected failures or even planned updates cause disruption. Running multiple instances requires additional logic for synchronization, and even with a synchronized configuration, statistics and access logs would be incomplete.
Running multiple instances against the same SQLite database results in malformed database files - tested with the kubernetes-pihole Helm chart by Mojo2000. With 2 replicas and a persistent volume set to ReadWriteMany (provided by Longhorn), the database breaks before Pi-hole even finishes starting up for the first time.
Adding the ability to use a user-provided database, for example PostgreSQL or Redis/Valkey, would solve this issue. Ideally the feature would support different database backends through adapters, making future extensions easier.
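To make the adapter idea concrete, here is a minimal sketch of what such an abstraction could look like. None of these names (`QueryStore`, `SQLiteStore`, `insert_query`) exist in pihole-FTL - they are purely illustrative, and FTL itself is written in C, not Python:

```python
# Hypothetical storage-adapter sketch; nothing here is real pihole-FTL API.
import sqlite3
from abc import ABC, abstractmethod


class QueryStore(ABC):
    """Backend-agnostic interface for persisting DNS query records."""

    @abstractmethod
    def insert_query(self, client: str, domain: str, status: str) -> None: ...

    @abstractmethod
    def recent_queries(self, limit: int) -> list:
        """Return up to `limit` (client, domain, status) tuples, newest first."""


class SQLiteStore(QueryStore):
    """Default backend, mirroring the current single-instance setup.

    A PostgresStore or ValkeyStore would implement the same interface.
    """

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS queries "
            "(id INTEGER PRIMARY KEY AUTOINCREMENT, "
            " client TEXT, domain TEXT, status TEXT)"
        )

    def insert_query(self, client: str, domain: str, status: str) -> None:
        self.db.execute(
            "INSERT INTO queries (client, domain, status) VALUES (?, ?, ?)",
            (client, domain, status),
        )
        self.db.commit()

    def recent_queries(self, limit: int) -> list:
        return self.db.execute(
            "SELECT client, domain, status FROM queries ORDER BY id DESC LIMIT ?",
            (limit,),
        ).fetchall()
```

The point of the interface is that the rest of the code never touches SQLite directly, so swapping in another backend only means writing one new adapter class.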
Currently, pihole-FTL's in-memory database and SQLite's persistent database are tightly coupled.
As far as I'm aware, that involves some custom C code for SQLite.
I'm not sure how other database engines would allow that kind of integration (if at all), or whether our development team would be able and willing to dedicate resources to functionality that benefits only an edge case, potentially at the expense of other features.
Even if we allow the use of an external database (which would not be a simple change), you would still need different databases for different Pi-holes.
Currently there is no way to aggregate data from multiple Pi-holes without completely mixing the records (there is no way to differentiate records between Pi-hole A and Pi-hole B), short of a complete rewrite of all database-related code.
There are a few users who run Pi-hole on Kubernetes very successfully. I think you'd get a better, and quicker, response by contacting the Helm chart maintainer with your issues.
Don't use one SQLite database for multiple Pi-hole instances.
Besides the current locking issues - why not? All resolvers should have the same data, no matter which one I'm talking to. It also shouldn't be necessary to call the API on every single instance just to insert a local DNS record, for example.
The setups I've seen so far tend to be overly complex, with multiple deployments to separate write/management instances from read instances. This works well for databases managed by operators with automated failover, but is discouraged for frontends or services.
I don't see how the chart maintainers could build a chart following best practices, hence this feature request. They can't do magic when Pi-hole doesn't support it.
I'm just trying to bring a bit of focus to high availability and scalability (and modularity in that context), since I consider these topics important - independent of project size and target audience.
Without that, there will be no way to show a valid query log. The clients in Pi-hole A will have IDs, but there is no way to guarantee the same clients will have the same IDs on Pi-hole B.
The same is true for Lists, Domains, and any other information stored on the database.
Usually two DNS resolvers are advertised via DHCP in a network (segment). Clients are free to ask whichever resolver they want, and both resolvers should give the same answer to a given request.
I assume query logs, as well as lists, domains and such, have a primary key, usually auto-increment or a random UUID. The insert is either atomic or wrapped in a transaction. As I wrote above, it doesn't matter whether a request got answered by Pi-hole A or B, since both should hold the same data.
Please enlighten me - what am I missing?
Edit: Are you talking about two Pi-holes in their current form, with separate databases? In that case you're right. And that's why I'm asking for an external or common database for multiple instances.
Currently (without a massive FTL rewrite) there will be ID=1 from Pi-hole A and ID=1 from Pi-hole B.
To increase performance, most of the data related to DNS queries is initially stored in an "in-memory" database. The local "in-memory" database will never be able to communicate with the other Pi-hole's "in-memory" database.
If 2 new clients make DNS queries, these clients will receive new IDs, but only locally ("in-memory"). If the same clients make DNS queries to the other Pi-hole, they will receive new IDs there as well (there is no way to guarantee they will receive the same IDs or in the same order).
The same ID logic will apply to DNS queries. Each Pi-hole will create new IDs for each query.
When each Pi-hole finally stores the data in the on-disk database, these records collide.
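The collision is easy to reproduce in miniature. The snippet below stands two independent SQLite databases in for the two FTL in-memory stores (the table layout and IP addresses are made up for illustration, not FTL's real schema):

```python
import sqlite3

# Two independent Pi-holes, each with its own local database.
pihole_a = sqlite3.connect(":memory:")
pihole_b = sqlite3.connect(":memory:")
schema = "CREATE TABLE client (id INTEGER PRIMARY KEY AUTOINCREMENT, ip TEXT)"
for db in (pihole_a, pihole_b):
    db.execute(schema)

# Different clients happen to hit different resolvers first...
pihole_a.execute("INSERT INTO client (ip) VALUES ('192.168.0.10')")
pihole_b.execute("INSERT INTO client (ip) VALUES ('192.168.0.20')")

# ...and both get local ID 1 for two *different* clients.
id_a = pihole_a.execute("SELECT id FROM client WHERE ip='192.168.0.10'").fetchone()[0]
id_b = pihole_b.execute("SELECT id FROM client WHERE ip='192.168.0.20'").fetchone()[0]
print(id_a, id_b)  # 1 1 -- naively merging the two tables would collide
```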
I'd consider not writing to disk immediately a design issue, but okay. In that case I'd keep the new data in a sequential WAL-style log without IDs and only assign IDs when writing into the database.
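A rough sketch of that idea, assuming a shared database that hands out IDs only at flush time (the `flush` helper and all names are hypothetical, not anything Pi-hole implements):

```python
import sqlite3

# Shared database; IDs are assigned only here, never on the instances.
shared = sqlite3.connect(":memory:")
shared.execute(
    "CREATE TABLE queries (id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "instance TEXT, client TEXT, domain TEXT)"
)

# Each instance only appends ID-free records to its local sequential log.
wal_a = [("192.168.0.10", "example.com"), ("192.168.0.10", "pi-hole.net")]
wal_b = [("192.168.0.20", "example.com")]


def flush(db, instance, wal):
    """Drain a local log into the shared DB; AUTOINCREMENT hands out the IDs."""
    db.executemany(
        "INSERT INTO queries (instance, client, domain) VALUES (?, ?, ?)",
        [(instance, c, d) for c, d in wal],
    )
    db.commit()
    wal.clear()


flush(shared, "pihole-a", wal_a)
flush(shared, "pihole-b", wal_b)
ids = [row[0] for row in shared.execute("SELECT id FROM queries ORDER BY id")]
print(ids)  # [1, 2, 3] -- unique across both instances, no collision
```

Tagging each record with its originating instance also keeps the records distinguishable, which addresses the aggregation concern raised earlier in the thread.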
And how much would you like to pay to sponsor a complete rewrite of Pi-hole to suit your particular wants?
I'm sorry, but the number of people who come in demanding that we change everything because they want it is astounding. You have all of the source code; if you want changes, write them yourself.
I gave you feedback about your project. I didn't blame anyone, especially not for a design decision that was probably made years ago. I offered my point of view, as an experienced systems engineer, on the potential for optimization.