Issue with Cloudsync compatibility

That appears to be the root of the difference then.

The above code uses a looping routine with INSERT INTO statements, whereas pihole-cloudsync uses the .import command of sqlite.

I have no idea if that is the best way of doing a sync between master/slave databases, but it seems to be the shortest code: one line to drop the table, one line to import it.
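For context, the looping INSERT approach boils down to something like this. This is only a sketch, written with Python's bundled sqlite3 module rather than the shell so the SQL quoting is explicit; the columns and rows are illustrative, not the real gravity.db schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Double quotes make even a reserved word like "group" a valid table name in SQL.
con.execute('CREATE TABLE "group" (id INTEGER PRIMARY KEY, name TEXT)')

rows = [(1, "Default"), (2, "Kids")]  # e.g. parsed from an exported CSV
con.execute('DELETE FROM "group"')    # flush, then repopulate row by row
con.executemany('INSERT INTO "group" (id, name) VALUES (?, ?)', rows)
con.commit()

print(con.execute('SELECT COUNT(*) FROM "group"').fetchone()[0])  # 2
```

The point being that plain SQL has no trouble with the quoted name; it is only the .import convenience command that chokes on it.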

The .import command seems very limited in its syntax.

The code used by pihole-cloudsync could be made more complicated to deal with that but it seems a shame that it needs to.

With the table name and the limited syntax of .import, it's a case of two "that's inconvenient"s combining to make a "that's a PITA".

Here's where I run into some confusion:

Unless you are the only person using cloudsync, why isn't anyone else seeing this?


The current version of cloudsync does not support groups. The groundwork has been done by the author, but it's not there yet.

I run 3 Pi-holes, 2 on a common network and 1 in a remote location, and I use groups, so I decided to look at the code and implement it for myself. I didn't do this via pull request as I was just experimenting. I also liked how the use of GitHub to sync could be kept private and yet be reachable by any Pi-hole with a net connection.

I found that implementing the group and client sync was easy to code (thanks to how cloudsync had already been written, and the fact that group/client-related info is just contained in extra tables), but I also ran into factors that complicated my code.

The table name is one; another is the client info. For example, it should not be assumed that Pi-holes being synced for their adlists and group design should also have their client info synced, as not all Pi-holes will be in the same location and therefore see the same clients. (Not an issue, just a design consideration.)

As I said, I'm not asking for support on this. It's not even official cloudsync code that I am running, but the author has indicated that group sync is coming, so I assume he will run into the same issues I have.

If I could code it properly then I would do a pull request, but at the moment, with the group table and the syntax / unreliable nature of .import, it's a bodge, so I don't want to.

I'm just trying to raise awareness of the issue, if you call it that, to save people some dev time further down the line.

Well, let's see the code and maybe we can help. Though I wouldn't call needing a loop instead of a one-liner complicated. It truly is just a case of quoting the table names. We don't know what you are doing, so we can't look at replicating it or finding out whether the issue lies elsewhere.

So cloudsync has two basic functions, controlled from the command line.

The nominated master Pi-hole exports the relevant tables to CSV files and uses standard GitHub processes to detect changes and push them to a repository:

$SUDO sqlite3 $gravity_db -header -csv "SELECT * FROM adlist" >$ad_list

This is just repeated for all relevant tables and can include anything (adlists, groups, clients); it doesn't matter, as it's just an export. This works flawlessly for all tables but requires a minor syntax change for the group table due to its name.

    $SUDO sqlite3 $gravity_db -header -csv "SELECT * FROM 'group'" >$group_list

The nominated slave(s) use the exact same GitHub processes to detect changes and update the local cache. If changes are detected, the FTL process is stopped, various tables are dropped and then recreated using sqlite's built-in .import command, and then FTL is restarted.

$SUDO sqlite3 $gravity_db "DROP TABLE adlist;"
$SUDO sqlite3 $gravity_db -header -csv ".import adlist.csv adlist"

This is again repeated for all required tables, and this is where you get to choose what to drop/import, such as client data. Tbh this is a great way of doing the sync, as the file copy is measured in KB, whereas copying the entire gravity.db can easily be tens of MB (mine is 79 MB).

This works for all tables except group, as the .import command within sqlite refuses to work with that name, as far as I have found anyway. The plain name and the quoted name just don't work.
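For reference, the same import can be expressed as plain SQL, where quoting the name works on any sqlite version. A sketch using Python's bundled sqlite3 and csv modules; the columns and CSV content are illustrative stand-ins for the exported group.csv:

```python
import csv
import io
import sqlite3

con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE "group" (id INTEGER, enabled INTEGER, name TEXT)')

# Stand-in for group.csv as exported with `sqlite3 -header -csv`.
group_csv = io.StringIO("id,enabled,name\n1,1,Default\n2,0,Kids\n")

reader = csv.reader(group_csv)
header = next(reader)                       # skip the header row
placeholders = ",".join("?" * len(header))  # one "?" per column
con.executemany(f'INSERT INTO "group" VALUES ({placeholders})', reader)
con.commit()

print(con.execute('SELECT COUNT(*) FROM "group"').fetchone()[0])  # 2
```

This trades the one-liner for a small loop, but it sidesteps the .import name restriction entirely.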

My workaround for this was to make it a three-step process for the group table:

    $SUDO sqlite3 $gravity_db "DROP TABLE 'group';"
    $SUDO sqlite3 $gravity_db -header -csv ".import group.csv group_tmp"
    $SUDO sqlite3 $gravity_db "ALTER TABLE group_tmp RENAME TO 'group';"

However this is unreliable: (apparently) depending on the version of sqlite installed, this either works or errors, due to the views within gravity.db looking for main.group after the table has been dropped, which prevents the ALTER TABLE statement from completing.

Just to be clear, the above is the code as used by cloudsync, but I have been testing directly from the sqlite command line to remove any possible issues introduced by the above syntax, "" quotes, etc.

The only table that ever has any problems is the group table. It's not conclusive (but the syntax errors strongly indicate) that this is due to the table's name. All of the operations listed above work perfectly on any table I have created / imported, as long as it uses a different name [that isn't also a keyword].

sqlite> ALTER TABLE group_tmp RENAME TO 'group';
Error: error in view vw_whitelist: no such table: main.group

This only happens on some slaves, and the only clear difference between them is the sqlite version.
I tried dropping the view (just to test) and the error just moved to the next view, only stopping when all views were dropped. I'm not sure what the views do, but dropping them just to get the group import working definitely felt like the wrong direction to go.

What is the output you are seeing when you try this? A syntax error or Usage?

The reason why we do a copy instead of an in-place update is performance. Copy, edit and then flip the new database in has very minimal downtime. Editing in place takes the database out of use, due to locking, for the time it takes to do the full process of manipulating the database.
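The copy-edit-flip pattern can be sketched like this; the paths and schema are illustrative, and Python is used only to keep the sketch self-contained:

```python
import os
import shutil
import sqlite3
import tempfile

workdir = tempfile.mkdtemp()
live_db = os.path.join(workdir, "gravity.db")

# A stand-in for the live database.
con = sqlite3.connect(live_db)
con.execute("CREATE TABLE adlist (id INTEGER, address TEXT)")
con.execute("INSERT INTO adlist VALUES (1, 'https://example.com/hosts')")
con.commit()
con.close()

# 1. Copy: readers keep using the live file while we work on the copy.
edit_db = live_db + ".edit"
shutil.copyfile(live_db, edit_db)

# 2. Edit: all of the slow manipulation happens on the copy.
con = sqlite3.connect(edit_db)
con.execute("INSERT INTO adlist VALUES (2, 'https://example.org/hosts')")
con.commit()
con.close()

# 3. Flip: os.replace() is an atomic rename, so downtime is just this one call.
os.replace(edit_db, live_db)

con = sqlite3.connect(live_db)
print(con.execute("SELECT COUNT(*) FROM adlist").fetchone()[0])  # 2
```

The live file is only ever swapped whole, so readers never see a half-edited database.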

'Tbh this is a great way of doing the sync, as the file copy is measured in KB, whereas copying the entire gravity.db can easily be tens of MB (mine is 79 MB)'

I meant this purely for how cloudsync works. It keeps the file size down, allows easy use of GitHub, and reduces traffic to remote hosts for the sync. They still refresh the adlists when changes are detected, but that's still a lot less traffic.

A copy of the database during pihole -up makes perfect sense.

This is what I get when I try to use the .import command within sqlite. As you can see, it fails with group but happily works with group2. The code within sqlite only appears to allow very limited syntax for .import, which does not include quotes.

SQLite version 3.27.2 2019-02-25 16:06:06
Enter ".help" for usage hints.
sqlite> .tables
adlist           domainlist_by_group  vw_gravity
adlist_by_group  gravity              vw_regex_blacklist
client           group                vw_regex_whitelist
client_by_group  info                 vw_whitelist
domain_audit     vw_adlist
domainlist       vw_blacklist

sqlite> drop table 'group';
sqlite> .mode csv

sqlite> .import /usr/local/bin/my-pihole-lists/group.csv group
Error: near "group": syntax error

sqlite> .import /usr/local/bin/my-pihole-lists/group.csv 'group'
Error: near "group": syntax error

sqlite> .import /usr/local/bin/my-pihole-lists/group.csv "group"
Error: near "group": syntax error

sqlite> .tables
adlist           domainlist           vw_blacklist
adlist_by_group  domainlist_by_group  vw_gravity
client           gravity              vw_regex_blacklist
client_by_group  info                 vw_regex_whitelist
domain_audit     vw_adlist            vw_whitelist
sqlite>

sqlite> .import /usr/local/bin/my-pihole-lists/group.csv group2
sqlite> .tables
adlist           domainlist_by_group  vw_gravity
adlist_by_group  gravity              vw_regex_blacklist
client           group2               vw_regex_whitelist
client_by_group  info                 vw_whitelist
domain_audit     vw_adlist
domainlist       vw_blacklist
sqlite>

And then the ALTER TABLE fails (for this version of sqlite):

sqlite> ALTER TABLE group2 RENAME TO 'group';
Error: error in view vw_whitelist: no such table: main.group
sqlite>

The code line I posted uses the .import function of sqlite:

I'm not quite sure what the problem is here, check the following (test.db didn't exist before):

wget https://cdn.sqlitetutorial.net/wp-content/uploads/2016/05/city.csv
sqlite3 test.db ".import city.csv group"
sqlite3 test.db "SELECT COUNT(*) FROM \"group\""

The last command returned 274 telling me that everything worked as expected?


I see you are using an older version of SQLite3, so I'd first suggest using a more recent version. Try the SQLite3 engine embedded into FTL (pihole-FTL sqlite3); it is much more recent than what you have on your system, e.g.


wget https://cdn.sqlitetutorial.net/wp-content/uploads/2016/05/city.csv
rm test.db
pihole-FTL sqlite3 test.db ".import city.csv group"
pihole-FTL sqlite3 test.db "SELECT COUNT(*) FROM \"group\""

Are you still seeing problems?

That works and many thanks.

I was not aware that you could access sqlite within FTL like that. I had assumed that the Pi-hole installer just added sqlite from the distro if not present, but using your own built-in makes much more sense.

I have modified my version of the cloudsync code to use the FTL sqlite and included the use of " instead of ' to quote the group table (for future-proofing), and all appears to work fine.

One thing I noticed is that with the FTL sqlite, DROP TABLE started to produce a foreign key error, so I had to amend the drop commands as follows (this is just FYI). Gotta love the variations between different sqlite versions.

old
$SUDO pihole-FTL sqlite3 $gravity_db "DROP TABLE adlist;"

new
$SUDO pihole-FTL sqlite3 $gravity_db "PRAGMA foreign_keys=0;DROP TABLE adlist;"

This doesn't appear to create issues, as I am effectively replacing all the tables on the slaves so they shouldn't get out of sync, but I will test further.

That's fairly new: Update SQLite3 to 3.34.0 by DL6ER · Pull Request #1006 · pi-hole/FTL · GitHub

Because what you're doing here is not correct: You are deleting the table adlist even though there are still rows in gravity referencing entries in this table.

You should not delete the parent rows before deleting the child rows. Disabling foreign key constraint enforcement only silences the issue; several database functions may not work reliably, or at all, afterwards.

Then I am at a bit of a loss as to how multiple Pi-holes can be synced without copying the entire gravity database. Many of the tables and views are interrelated.

Even if I could come up with the correct order of dropping tables, I would need to drop tables not directly affected by the adlist / group / client changes being synced, and then recreate them using blank tables that are repopulated by pihole -g, or using full exports, which end up being large files to sync.

I also think that the potential problems being highlighted by the foreign key errors are not new; they were always present due to the way that cloudsync works. They just weren't being reported because cloudsync was using the distro sqlite, not the FTL version.

Another unfortunate aspect of this is that cloudsync syncing groups has now become dependent on the version of FTL installed (or the distro sqlite). Previously the script would work with any version of 5.x, but for groups it requires a version that supports the FTL sqlite shell, which appears to date from Dec 2020, so version 5.4? (Or it requires that the distro version of sqlite be reasonably new.)

For example, the systems I am running Pi-hole on have sqlite 3.27.2; these are new builds of Debian Buster, and that's the version of sqlite in the distro.

I notice that someone also edited the title of this thread. Whilst this is a cloudsync issue and relates to an external function that most Pi-hole users don't use, it all does come back to the use of group as a table name. None of the other tables generate the issues that that one table does, due to its name.

I share your generic concern for not using SQL keywords like GROUP as a name, and I think the Pi-hole development team does so as well - but I also think that point has been made quite early on in this discussion.

However, I do not share your take on the custom solution you are using.

Pi-hole itself does not claim to support syncing multiple instances, nor does it make any assertions about supporting them.

At the same time, it does not forbid third parties to develop a syncing solution.
Pi-hole's source code is publicly available; you can build on that.

stevejenkins/pihole-cloudsync dates about two years back.
Back then, Pi-hole 4 was using pihole-FTL.db only, and gravity was a term for a collection of blocklist files. Naturally, that solution would have relied on file-based operations.

gravity.db was introduced along with client-based filtering and group management some 8 months ago, with Pi-hole 5 in May last year, changing file-based operation into database interactions.

That allowed for a major step forward in terms of fine-grained control of DNS blocking if you care to make use of it, while it will just function as usual for those users that don't.

That major shift didn't happen in the dark:
There has been an extensive public beta-phase starting in January 2020, four months before Pi-hole 5 was rolled out.
That would have been the right time for 3rd party tool maintainers to familiarise themselves with the changes and raise their concerns, and let me assure you that the Pi-hole developers are always more than keen to listen (though it may take strong, valid arguments to convince them of your ways).


It is still possible to sync two Pi-hole instances with Pi-hole 5.x.

If you are not depending on pihole-cloudsync, have a look at other community solutions, e.g. GitHub - vmstan/gravity-sync: An easy way to synchronize the blocklist and local DNS configurations of multiple Pi-hole 5.x instances.

I edited it. The problem is with your Cloud sync script and the new title more accurately reflects the topic of the thread.

Yes.

Yes but where exactly is the problem?
Example: You want to sync the adlist table.

  1. The existing and new lists are identical:
    Just don't do anything.
  2. The existing and new lists are the same but some properties like the comment changed:
    Use UPDATE instead of hard flushing the table. No foreign key issues.
  3. The lists have changed:
Selectively delete the adlist that should be removed by first issuing something like DELETE FROM gravity WHERE adlist_id = <id of your list> and only then removing the adlist itself: DELETE FROM adlist WHERE id = <id of your list>.
    You should delete the orphan entries from gravity because they are pointless (they point to no parent row as @DL6ER said). Then you add new adlists to the table.
Whenever adlist changes, you will have to run pihole -g anyway to get a meaningful database. What I want to say is that you'd lose the corresponding gravity entries either way on the next pihole -g run, when the corresponding lists are gone.
    Again, when doing it like this, no foreign key issues.
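The child-rows-first deletion order in case 3 can be demonstrated on a toy pair of tables (the names mirror the real ones but the schema is heavily simplified); sketched with Python's bundled sqlite3 module:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Foreign key enforcement is off by default in SQLite and is per-connection.
con.execute("PRAGMA foreign_keys = ON")

con.execute("CREATE TABLE adlist (id INTEGER PRIMARY KEY, address TEXT)")
con.execute("""CREATE TABLE gravity (
    domain    TEXT,
    adlist_id INTEGER REFERENCES adlist (id)
)""")
con.execute("INSERT INTO adlist VALUES (1, 'https://example.com/hosts')")
con.execute("INSERT INTO gravity VALUES ('ads.example.net', 1)")
con.commit()

# Child rows first, then the parent row: no foreign key violation is raised.
con.execute("DELETE FROM gravity WHERE adlist_id = 1")
con.execute("DELETE FROM adlist WHERE id = 1")
con.commit()

print(con.execute("SELECT COUNT(*) FROM adlist").fetchone()[0])  # 0
```

Deleting the adlist row first, with the gravity row still referencing it, would instead raise a foreign key error on this connection.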

Reverted topic change since it's not really working.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.