Store additional info in database

jpgpi250 · October 25, 2021, 2:28pm

I have, and so do a lot of users, a firewall rule, that redirects (NAT) requests from clients, to other DNS servers than pihole, to pihole. This ensures no device is able to bypass pihole, examples: chromecast, nvidea shield, ...

The solution works perfectlly, however, it is impossible, using pihole, to determine which clients have tried this (use alternate DNS server), because the incoming ip address isn't in the database (as oposed to the ip address where pi-hole-FTL forwarded to).

To achieve this, I've already modified my /etc/dnsmasq.d/01-pihole.conf, content:

#interface=eth0
listen-address=192.168.2.57
listen-address=192.168.2.48
listen-address=127.0.0.1

dhcp configures the clients to use 192.168.2.57, the firewall redirects offending requests to 192.168.2.48

If the database (query table) could provide the incoming IP adddress, it would be very easy to use the query log / long term query data, to determine who the offenders are, and what they are trying to resolve (something like filter incoming_address = 192.168.2.48), or simply execute some sqlite3 statement to get the required data.

yubiuser · October 25, 2021, 5:27pm

Have you tried to block DNS request to DNS servers other than Pi-hole. I had good results with this approach, e.g. Chromecast started using DHCP provided DNS servers after 8.8.8.8 was blocked. If this works for your clients there would be no need to add an extra column.

jpgpi250 · October 25, 2021, 6:33pm

I don't block them, I redirect (NAT) to pihole, and want to be able to see which clients have actually tried bypassing pihole in the query log, hence the additional column (dnsmasq address the request was received on).

It's not only chromecast doing this, lots of devices try this approach (iot), some of them don't work properly, when simply blocking the requests.

jfb · October 25, 2021, 6:47pm

Did you miss this prominent warning at the beginning of that file?

###############################################################################
#      FILE AUTOMATICALLY POPULATED BY PI-HOLE INSTALL/UPDATE PROCEDURE.      #
# ANY CHANGES MADE TO THIS FILE AFTER INSTALL WILL BE LOST ON THE NEXT UPDATE #
#                                                                             #
#        IF YOU WISH TO CHANGE THE UPSTREAM SERVERS, CHANGE THEM IN:          #
#                      /etc/pihole/setupVars.conf                             #
#                                                                             #
#        ANY OTHER CHANGES SHOULD BE MADE IN A SEPARATE CONFIG FILE           #
#                    WITHIN /etc/dnsmasq.d/yourname.conf                      #
###############################################################################

Are you seeing the IP's in the query log on the dashboard (and not in the long term database)?

The IP of the requesting client is available for today plus the past 5 days in the pihole logs in directory /var/log. A grep or zgrep command can quickly parse the information you seek.

ls -lha /var/log/pihole.log*
-rw-r--r-- 1 pihole pihole 1.3M Oct 25 13:45 /var/log/pihole.log
-rw-r--r-- 1 pihole pihole 2.0M Oct 25 00:00 /var/log/pihole.log.1
-rw-r--r-- 1 pihole pihole 221K Oct 24 00:00 /var/log/pihole.log.2.gz
-rw-r--r-- 1 pihole pihole 360K Oct 23 00:00 /var/log/pihole.log.3.gz
-rw-r--r-- 1 pihole pihole 154K Oct 22 00:00 /var/log/pihole.log.4.gz
-rw-r--r-- 1 pihole pihole  71K Oct 21 00:00 /var/log/pihole.log.5.gz

zgrep "from 192.168.0.135" /var/log/pihole.log* | tail -n5
/var/log/pihole.log.5.gz:Oct 20 23:59:14 dnsmasq[819]: query[A] Mac-Mini-2018 from 192.168.0.135
/var/log/pihole.log.5.gz:Oct 20 23:59:19 dnsmasq[819]: query[A] ssl.gstatic.com from 192.168.0.135
/var/log/pihole.log.5.gz:Oct 20 23:59:43 dnsmasq[819]: query[A] 0.client-channel.google.com from 192.168.0.135
/var/log/pihole.log.5.gz:Oct 20 23:59:46 dnsmasq[819]: query[A] api.mixpanel.com from 192.168.0.135
/var/log/pihole.log.5.gz:Oct 21 00:00:01 dnsmasq[819]: query[A] sessions.bugsnag.com from 192.168.0.135

zgrep "from 192.168.0.135" /var/log/pihole.log* | grep "Oct 23" | tail -n5
/var/log/pihole.log.2.gz:Oct 23 23:59:04 dnsmasq[11290]: query[AAAA] Mac-Mini-2018 from 192.168.0.135
/var/log/pihole.log.2.gz:Oct 23 23:59:25 dnsmasq[11290]: query[A] firebaselogging-pa.googleapis.com from 192.168.0.135
/var/log/pihole.log.2.gz:Oct 23 23:59:29 dnsmasq[11290]: query[A] fcmconnection.googleapis.com from 192.168.0.135
/var/log/pihole.log.2.gz:Oct 23 23:59:51 dnsmasq[11290]: query[A] client.dropbox.com from 192.168.0.135
/var/log/pihole.log.2.gz:Oct 23 23:59:55 dnsmasq[11290]: query[A] firebaselogging-pa.googleapis.com from 192.168.0.135

jpgpi250 · October 25, 2021, 6:59pm

I know all off that, and more, however, it would be much simpler if the records, matching the criteria could be available in the long term data and /or retrieved with a simple sqlite3 query.

The logs rotate every day, the database can hold information up to 365 days (MAXDBDAYS=)

If I want to do this the hard way, I'll simply enable logging for the relevant rules on the firewall, what I'm looking for is a single point of administration for DNS, hence the database, possibly the web interface (query log / long term query data)

jfb · October 25, 2021, 7:11pm

In my opinion, this is best done with your firewall. Pi-hole is not intended to be (nor should it become) a network DNS administration tool.

Coro · October 26, 2021, 2:22pm

I have seen the argument in other places: Every addition to the database should be well justified because it will cause the database to grow significantly. Say an IP address has 15 bytes like 192.168.0.123 (yes, it will be a lot worse for IPv6!). And you have usual 50,000 queries per day, this means 365 * 50,000 * 15 bytes = 273 MB of extra needed space. This does not include any bookkeeping bytes which are surely required in the database. If your system would be using ULA IPv6 addresses, this will already add 1.5 GB on top for a small network. And when you see 200,000 queries per day (active home network), this can already be 6 GB of extra disk space needed to store this.

These are quite extreme numbers for something I'd agree on to be the edge-case of the edge-case. I recall having seen you set the database lifetime to one day elsewhere so the database growth may be minimal for you but not for the default settings user. You could do this daily by examining the pihole.log file, instead. Especially since you seem only concerned about the clients but no other query details.

jpgpi250 · October 26, 2021, 7:35pm

You are assuming things are stored in the query table, additional field, the full IPv4 or IPv6 address, I assume a lot of database designers would agree this isn't a pretty design.
You could very easely create a new database table, holding all IP addresses of the machine it is running on (less than 10 on an average macihine), than in the query table, use the ID of the IP in the new table, as a reference.

I'm don't know where you get the information on the number of queries an average pihole receives per day, however, the number of queries my pihole processes per day is significattly lower.

going with your figures, excluding the space for the smal extra table for the IP's and their ID's
365 * 50000 * 1 bytes=18 MB (not 273MB)
365 * 200000 * 1 bytes=70 MB (not 6GB)

I use 32GB SD cards, 3,7 GB used (14%). The additional, worst case 70 MB will not cause any problem.

edit
pihole, with MAXDBDAYS=365 (default) with that amount of queries per day, expierence troubles when searching the long term query data. The 365 days default is not a very good choice for a busy pihole, example here (not the only example). Question: anybody checking for a DNS query, dated 27/10/2020, given today is 26/10/2021?
/edit

Coro · October 27, 2021, 6:15am

I read somewhere that the developers are looking at reducing this default value to three months because the proposed v6.0 branch will store more data in the table to allow full restoring after a restart. Currently, we are loosing reply type and time, DNSSEC status and others.

Also, I recall they want to use a linked table similar to what you suggested but, instead of adding a new table, they want to reuse the network_addresses table which should know all IP addresses.

What does "significantly" want to say compared to the 50,000 queries I have seen on many posts here on the forum? More like 5,000 queries a day?

jpgpi250 · October 27, 2021, 6:35am

long term data yesterday

long term data last 7 days

4 devices always on (24/7)
2 devices on for about 12 hours / day

MAXDBDAYS= 8 (see what happend last week at the same time)

jfb · October 27, 2021, 1:14pm

The discussion is drifting toward database size, and that is not the issue. The basic issue is whether these changes are within the scope of what Pi-hole should do.

In my opinion, given that other existing tools can provide the information, there is little value in changing the information stored by Pi-hole. That's more code to write and support, and this is an edge case at minimum.

DL6ER · October 27, 2021, 7:41pm

I fully agree on

recording

is very likely always the same for all users (well, except listen-address is used). Query processing in dnsmasq isn't even aware of on which address a query arrived. A lot of code would have to be rewritten to propagate this information further down into the space where FTL could see it. I don't see a justification for doing this and risking merge conflicts down the line in dnsmasq.

jpgpi250 · October 28, 2021, 8:47pm

OK, it's not going to happen, I submit to it.

Found a solution to suit my needs, by running an additional dnsmasq instance.

system · April 26, 2022, 8:48pm

This topic was automatically closed 180 days after the last reply. New replies are no longer allowed.