Interesting study: Leaky Forms

A very intersting study can be read here. It shows how data from forms that haven't been submitted (yet, or user changes his mind) is tracked.

The article mentions whotracks.me, among others. On that site, I found a link to a database (sql script), containing a lot of categories (16) and the trackers ( 4327). Using sqlite3 queries, it is possible to generate a pihole compatible blocklist (local lists can be added, using the syntax file://)

For some reason (haven't looked into the cause), the database cannot be created using pihole-FTL sqlite3. You need the actual sqlite3 binary (sudo apt-get install -y sqlite3) to create the database.

The article also mentions in the Q/A section:
quote
... and encouraged us to reach out to the blocklist maintainers to add missing tracker domains.
/quote
A question (mail) to one of the article authors learned, the trackers, identified during the study, have been shared with some key players (blocklist maintainers):
quote
We reached out EasyList and EasyPrivacy, also DuckDuckGo will check their rules that they used in their extension after our results.
/quote.

A quick test (pihole -q) learned me most of the trackers, mentioned in the first table of the article (taboola.com, bizible.com, glassboxdigital.io, rlcdn.com, fullstory.com, ...) are already included in one or more blocklists I've been using.

The article also mentions (don't click the links, unless you want to find out you're protected) the meta pixel (known to most users by now) and the tiktok pixel use a feauture called "Automatic Advanced Matching", they collect hashed personal identifiers from the web forms in an automated manner.

I'm already using a regex to block facebook (meta) altogether:

^(.+\.)?(facebook|fb(cdn|sbx)?|tfbnw)\.[^.]+$

It looks like I'm going to need something similar for tiktok...

1 Like

Pleases investigate if you have time. Most likely it's because Pi-hole uses the most recent sqlite version which is unlikely installed from your OS repositories.
We had an other issue report lately which also does not happen with older sqlite versions.

when using the command pihole-FTL sqlite3 ./whotracks.db < ./trackerdb.sql

two errors pop up (multiple times):

Parse error near line 2315: no such table: main.trackers
Runtime error near line 7633: FOREIGN KEY constraint failed (19)

The database simply isn't created.

first error (line 2315):

INSERT INTO "tracker_domains" VALUES('1000mercis','mmtro.com',NULL);

the line before that line is the line where the table is (should be) created:

CREATE TABLE tracker_domains (tracker TEXT NOT NULL, domain TEXT UNIQUE NOT NULL, notes TEXT, FOREIGN KEY (tracker) REFERENCES trackers (id));

all inserts fail because the table doesn't exist.

edit
for completeness, version info of sqlite3 binary, that does succesfully creates the database:

apt-cache policy sqlite3
sqlite3:
  Installed: 3.34.1-3
  Candidate: 3.34.1-3
  Version table:
 *** 3.34.1-3 500
        500 http://raspbian.raspberrypi.org/raspbian bullseye/main armhf Packages
        100 /var/lib/dpkg/status

/edit

edit2
for FTL, there is a test scenario, using powerDNS, that checks if all queries return the expected result, this to ensure no broken binary is released (CNAME problem a while ago).
maybe, @DL6ER, you should add a test scenario for the embedded sqlite3, this to ensure all commands really work...
/edit2

I looked at the code you linked above and extracted the necessary information for one example. Note: the order is exactly as in the script (if I didn't confused it).

CREATE TABLE categories(id INTEGER PRIMARY KEY,name TEXT UNIQUE);
INSERT INTO "categories" VALUES(6,'site_analytics');

CREATE TABLE companies (id TEXT NOT NULL UNIQUE, name TEXT NOT NULL, description TEXT, privacy_url TEXT, website_url TEXT, ghostery_id TEXT, country VARCHAR (2), privacy_contact TEXT, notes TEXT);
INSERT INTO "companies" VALUES('1000mercis','1000mercis','\"1000mercis Group, a pioneer in interactive advertising and marketing, provides innovative solutions for companies willing to optimise their customer acquisition and retention through interactive media (Internet, mobile phones and tablets).\"','http://ads.1000mercis.com/fr.html','http://www.1000mercis.com/','2262',NULL,'contactUSA@1000mercis.com','11/14/12 CT: Description Source: Website\r\n09/20/16 JW:  Disabled in lab.  Appnexus wont white list our lab and cant properly test.\r\n9/18/17 CH: Updated Privacy Policy URL: from none to http://mmtro.com/privacy/fr/\r\nUpdated Privacy Contact Form URL: from none to http://www.1000mercis.com/#!/contact');

CREATE TABLE tracker_domains (tracker TEXT NOT NULL, domain TEXT UNIQUE NOT NULL, notes TEXT, FOREIGN KEY (tracker) REFERENCES trackers (id));
INSERT INTO "tracker_domains" VALUES('1000mercis','mmtro.com',NULL);


CREATE TABLE trackers (id TEXT NOT NULL UNIQUE, name TEXT NOT NULL, category_id INTEGER, website_url TEXT, company_id TEXT, ghostery_id TEXT, notes TEXT, alias TEXT REFERENCES trackers (id), FOREIGN KEY (category_id) REFERENCES categories (id), FOREIGN KEY (company_id) REFERENCES companies (id));
INSERT INTO "trackers" VALUES('1000mercis','1000mercis',6,NULL,'1000mercis','2662',NULL,NULL);

The issue happens because table tracker_domains referenced table trackers which does not exist so far. But before tackers is created, data should already be inserted into tracker_domains

The way it works is

CREATE TABLE categories(id INTEGER PRIMARY KEY,name TEXT UNIQUE);
INSERT INTO "categories" VALUES(6,'site_analytics');

CREATE TABLE companies (id TEXT NOT NULL UNIQUE, name TEXT NOT NULL, description TEXT, privacy_url TEXT, website_url TEXT, ghostery_id TEXT, country VARCHAR (2), privacy_contact TEXT, notes TEXT);
INSERT INTO "companies" VALUES('1000mercis','1000mercis','\"1000mercis Group, a pioneer in interactive advertising and marketing, provides innovative solutions for companies willing to optimise their customer acquisition and retention through interactive media (Internet, mobile phones and tablets).\"','http://ads.1000mercis.com/fr.html','http://www.1000mercis.com/','2262',NULL,'contactUSA@1000mercis.com','11/14/12 CT: Description Source: Website\r\n09/20/16 JW:  Disabled in lab.  Appnexus wont white list our lab and cant properly test.\r\n9/18/17 CH: Updated Privacy Policy URL: from none to http://mmtro.com/privacy/fr/\r\nUpdated Privacy Contact Form URL: from none to http://www.1000mercis.com/#!/contact');

CREATE TABLE tracker_domains (tracker TEXT NOT NULL, domain TEXT UNIQUE NOT NULL, notes TEXT, FOREIGN KEY (tracker) REFERENCES trackers (id));
CREATE TABLE trackers (id TEXT NOT NULL UNIQUE, name TEXT NOT NULL, category_id INTEGER, website_url TEXT, company_id TEXT, ghostery_id TEXT, notes TEXT, alias TEXT REFERENCES trackers (id), FOREIGN KEY (category_id) REFERENCES categories (id), FOREIGN KEY (company_id) REFERENCES companies (id));

INSERT INTO "trackers" VALUES('1000mercis','1000mercis',6,NULL,'1000mercis','2662',NULL,NULL);
INSERT INTO "tracker_domains" VALUES('1000mercis','mmtro.com',NULL);

Tested with latest FTL version.
I'm not sure why "old" sqlite versions did ignore the missing tables but I would not consider that a feature but a bug which is now fixed.

I created an issue here, time will tell if the sql script is updated, following your findings / recommendations.

did an additional test.

  • created the db with sqlite3
  • created sql script, using pihole-FTL sqlite3 /home/pi/whotracks/whotracks.db ".dump" > /home/pi/whotracks/test.sql
  • deleted the db
  • recreate the db, using pihole-FTL sqlite3 /home/pi/whotracks/whotracks.db < /home/pi/whotracks/test.sql

database is created, which means the original script, has been created with an older sqlite3 version, that dumps the database in a format (order) the latest sqlite3 version (embedded in pihole-FTL) can't handle correctly.

What's the difference between the original script and the dumped one? How does the order differ?

test.sql = .dump, using pihole-FTL sqlite3
trackerdb.sql = original

sql.zip (1.1 MB)

The order is the same as in the trackerdb.sql. Differences are

  1. the table names in the INSERT INTO statements are quoted
  2. most importantly: the sql starts with PRAGMA foreign_keys=OFF;

somebody looked at the issue and replied, aparently there is no problem when using sqlite3 v3.38.5

I compiled, from source, this version (v3.38.5) on another system (identical OS and patchlevel)

and ran the commands, as specified in the reply

wget https://raw.githubusercontent.com/whotracksme/whotracks.me/master/whotracksme/data/assets/trackerdb.sql
sqlite3 ./whotracks.db < ./trackerdb.sql

the database is created without any problem. test:

sqlite3 ./whotracks.db "select * from tracker_domains where tracker like '24%';"

result (OK):

24-ads.com|24-ads.com|
24_7|247-inc.net|
24_7|d1af033869koo7.cloudfront.net|
24log|counter.24log.ru|
24smi|24smi.net|
24smi|24smi.org|

so, either the embedded version in pihole-FTL has a problem OR pihole-FTL doesn't handle '< ./trackerdb.sql' as it should...

Maybe there is a special method to allow pihole-FTL to process redirected input, that I don't know about.

@DL6ER Can you please lookin to this? Thanks for your time and effort.

Try pihole-FTl sqlite3 -cmd ".read trackerdb.sql"

The issue is quite clearly with the order of the statements, as yubiuser has pointed out.

The script issues statements that must fail because executing them would violate a foreign key constraint. In any database that honors foreign key constraints, those constraints must be kept, or your database may become corrupted.

The reason why this works with different versions of SQLite3 is that the internal, SQLite3-specific PRAGMA foreign_keys that controls whether those constraints are honored may have a different default value in different versions, and that default value may even differ for the same SQLite3 version, depending on the actual compile time options used.

You could decide to add PRAGMA foreign_keys = OFF; to your personal script, but it wouldn't be a good idea to have Pi-hole simply gloss over this to allow arbitrary statements to complete when they would be expected to fail.

As SQL as a language would not target a specific database implementation, the whotracks.me maintainers should adopt their script to the order that would be required for a successful completion within a database system with active foreign constraints.

EDIT: I've added a respective comment to the whotracks.me issue.

1 Like

same problem

I would expect the embedded pihole-FTL sqlite3 to be using the same compiler options as the distro sqlite3 version, this to ensure the command 'pihole-FTL sqlite3' behaves the same as 'sqlite3' (the distro (debian/raspbian ???) binary).
The fact that just compiling sqlite3 from source results in a binary that produces a good result (database created) indicates the embedded pihole-FTL is NOT using the same compiler options. Why is this?

I would expect a SQL script to be written in a way that can be run on any database, regardless if it's forcing foreign key constraints or not.
If the script is targeting a specific database, it should include the database specific commands that would allow it to complete, or clearly document the requirements for successful execution.

Because of the following from the documentation:

Foreign key constraints are disabled by default (for backwards compatibility), so must be enabled separately for each database connection. (Note, however, that future releases of SQLite might change so that foreign key constraints enabled by default. Careful developers will not make any assumptions about whether or not foreign keys are enabled by default but will instead enable or disable them as necessary.) The application can also use a PRAGMA foreign_keys statement to determine if foreign keys are currently enabled. The following command-line session demonstrates this:

Assuming the library is compiled with foreign key constraints enabled, it must still be enabled by the application at runtime, using the PRAGMA foreign_keys command. For example:

sqlite> PRAGMA foreign_keys = ON;

The compile options we use includes SQLITE_DEFAULT_FOREIGN_KEYS=1

This macro determines whether enforcement of foreign key constraints is enabled or disabled by default for new database connections. Each database connection can always turn enforcement of foreign key constraints on and off and run-time using the foreign_keys pragma. Enforcement of foreign key constraints is normally off by default, but if this compile-time parameter is set to 1, enforcement of foreign key constraints will be on by default.

Here is the current list of options and the comments explaining each one.

just trying to understand ... Why is this?

I checked

  • all files in /etc/.pihole/advanced/Scripts/database_migration/gravity/
  • /etc/.pihole/advanced/Templates/gravity.db.sql
  • /etc/.pihole/advanced/Templates/gravity_copy.sql

/etc/.pihole/advanced/Templates/gravity.db.sql contains 'PRAGMA foreign_keys=OFF;'. There are no other foreign_keys entries in the scripts.

Of course, it might be used in the code to create pihole-FTL.db, there is no script for this, the db is created by the pihole-FTL binary.

My problem is solved by just installing the sqlite3 binary, and only use pihole-FTL sqlite3 on the pihole databases, but it doesn't make life any easier for any user that uses sqlite3 databases, other than the pihole databases. In my case that would be:

image

It's easy to fix your issue, include the PRAGMA change as the first line of your sql script.

We support a number of different distributions besides Raspbian/Debian flavors.

Because this is the official recommendation by the SQLite3 developers. The distro-provided binaries intentionally violate this recommendation because they are aware of broken scripts. As the SQLite3 developers say that this will eventually become the default in a future version, the distro-provided binaries will break, too, in the future.

I'll leave a reply over there, too. FTL is really not doing anything wrong here. It is stricter than the distro-provided binaries as it adheres to all SQLite3 recommendations. Note that compiling SQLite3 from source will not use some set of recommended settings but the worst-case fallback options for maximum backwards compatibility. This is far from an optimal solution.

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.