RegEx engine improvements

The regex improvements added by DL6ER (read this topic, examples here) are working very well. I've been using

  • .*;querytype=!A (regex blacklist)
  • .*;querytype=ANY (regex blacklist) (read here)
  • .*;querytype=PTR (regex whitelist) (required, !A is too restrictive)

on devices such as a Chromecast, a PS4, an IPv4-only desktop, ... (group allowAqueriesOnly), to instruct pihole-FTL NOT to forward unnecessary requests (the apps and devices shouldn't be making these requests in the first place, since they have no IPv6).

The reply on a regex match, however, is NOT configurable (it depends on the blocking mode):

  • an AAAA query on these devices (.*;querytype=!A) always returns ::
  • an ANY query on all devices (.*;querytype=ANY - group Default) always returns 0.0.0.0

Request: Make the reply configurable, something like .*;querytype=ANY;nodata or .*;querytype=!A;nxdomain

It should likely be a property=value relation, so rather like

.*;querytype=ANY;reply=NODATA

or

.*;querytype=!A;reply=NXDOMAIN

I'll have to check how (and if) this can be implemented without making too many changes. The code responsible for detecting the block (database + regex lookups) and the code actually performing it (creating a suitable DNS reply packet) are pretty far from each other.
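To make the property=value idea above concrete, here is a minimal sketch in Python of how such an entry could be split into the regex itself and its options. This is only an illustration, not FTL's actual parser; the function name split_regex_options is hypothetical, and it assumes the pattern itself never contains a literal ";".

```python
from typing import Dict, Tuple


def split_regex_options(entry: str) -> Tuple[str, Dict[str, str]]:
    """Split 'pattern;key=value;...' into the pattern and an option dict.

    Hypothetical helper for illustration only. It naively splits on ';',
    so a pattern containing a literal ';' would be cut in the wrong place.
    """
    pattern, *extras = entry.split(";")
    options: Dict[str, str] = {}
    for extra in extras:
        key, sep, value = extra.partition("=")
        if not sep or not key or not value:
            # Something like ';nodata' is not a property=value pair
            raise ValueError(f"not a property=value pair: {extra!r}")
        options[key] = value
    return pattern, options


pattern, options = split_regex_options(".*;querytype=ANY;reply=NODATA")
print(pattern)  # .*
print(options)  # {'querytype': 'ANY', 'reply': 'NODATA'}
```

With this shape, the bare form from the original request (.*;querytype=ANY;nodata) is rejected, while the property=value form is accepted.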

Please try

pihole checkout ftl new/regex_replytype

with

;reply=NODATA

or

;reply=NXDOMAIN

added to your regex.

Pi-hole version is v5.3.1 (Latest: v5.3.1)
AdminLTE version is v5.5 (Latest: v5.5)
FTL version is new/regex_replytype vDev-380ee38 (Latest: v5.8)

HOWEVER

pihole-FTL log:

[2021-04-18 13:21:41.145 24988/T24992] REGEX WARNING: Invalid regex blacklist filter ".*;querytype=!A;reply=NODATA": Unknown replytype
[2021-04-18 13:21:41.340 24988/T24992] REGEX WARNING: Invalid regex blacklist filter ".*;querytype=ANY;reply=NODATA": Unknown replytype
[2021-04-18 13:21:41.569 24988/T24992] Compiled 3 whitelist and 51 blacklist regex filters for 14 clients in 435.5 msec

AND

dig AAAA tweakers.net on a machine in the correct group

; <<>> DiG 9.16.4 <<>> AAAA tweakers.net
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 45507
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;tweakers.net.                  IN      AAAA

;; Query time: 0 msec
;; SERVER: 192.168.2.57#53(192.168.2.57)
;; WHEN: Sun Apr 18 13:22:51 Romance Daylight Time 2021
;; MSG SIZE  rcvd: 41

in the query log (reply column says IP):

click on "Blocked (regex blacklist)"

Okay, so NODATA works for you but you see an incorrect warning. I cannot reproduce this and, looking at the code I also don't know where this may come from.

I updated the branch with a version that is a bit more explicit when complaining about unknown reply types and should also fix the incorrectly displayed reply type in the Query Log.

Edit: The CI is lagging behind. The correct version will be vDev-d8c8df39 for your next tests.

yes

still an error message (Unknown reply "NODATA"); attached the pihole-FTL log since 'pihole -up'

pihole-FTL.zip (2.3 KB)

the reported count is correct (Compiled 3 whitelist and 51 blacklist regex filters for 15 clients in 518.0 msec); the database contains 54 entries (sum of type 2 and 3).
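For anyone wanting to repeat that count, a rough sketch in Python follows. It runs against a small in-memory stand-in, not the real gravity database. The table name domainlist and the reduced column set are assumptions, and the rows with id 1 and 2 are made up for illustration; types 2 and 3 are taken to be the regex whitelist and regex blacklist.

```python
import sqlite3

# In-memory stand-in; table name and columns are assumptions, not the full schema
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE domainlist ("
    "id INTEGER PRIMARY KEY, type INTEGER, domain TEXT, enabled INTEGER)"
)
conn.executemany(
    "INSERT INTO domainlist VALUES (?, ?, ?, ?)",
    [
        (1, 2, r"^((alt)[0-9](-))?mtalk\.google\.com$", 1),  # hypothetical id
        (2, 0, "example.com", 1),                             # hypothetical exact entry
        (158, 3, ".*;querytype=!A;reply=NODATA", 1),
        (159, 3, ".*;querytype=ANY;reply=NODATA", 1),
    ],
)


def count_regex_entries(db: sqlite3.Connection) -> dict:
    """Return {type: count} for regex whitelist (2) and regex blacklist (3)."""
    cur = db.execute(
        "SELECT type, COUNT(*) FROM domainlist WHERE type IN (2, 3) GROUP BY type"
    )
    return dict(cur.fetchall())


counts = count_regex_entries(conn)
print(counts)                # {2: 1, 3: 2}
print(sum(counts.values()))  # 3
```

The sum of both counts is the number to compare against the "Compiled ... whitelist and ... blacklist regex filters" line in the FTL log.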

edit
Pi-hole version is v5.3.1 (Latest: v5.3.1)
AdminLTE version is v5.5 (Latest: v5.5)
FTL version is new/regex_replytype vDev-d8c8df3 (Latest: v5.8)
/edit

Apparently it doesn't always work (OR I'm doing something wrong)

with dig, same group membership as the one below,

query log looks OK

my config (other device that gets :: as reply):

Apr 18 14:56:40 dnsmasq[15073]: 656 192.168.2.240/13147 query[AAAA] mtalk.google.com from 192.168.2.240
Apr 18 14:56:40 dnsmasq[15073]: 656 192.168.2.240/13147 regex blacklisted mtalk.google.com is ::

I must be doing something wrong........

edit
additional info:

pihole -q mtalk.google.com
 Match found in regex whitelist
   ^((alt)[0-9](-))?mtalk\.google\.com$

BUT only for the android group, shouldn't affect nvshieldeth0.localdomain (192.168.2.240)

/edit

Ah, I see now where the warning came from. A simple logic error without consequences for the actual function. I pushed another update.

You can add DEBUG_REGEX=true in /etc/pihole/pihole-FTL.conf to get even more details about the compilation and execution of regex. Based on this data it'll be easier to see why one device is getting an unexpected reply.

the regex errors are gone!

in the pihole-FTL log, with DEBUG_REGEX=true:

[2021-04-18 16:06:31.457 4370/T4374] Regex blacklist: Enabling regex with DB ID 158 for client 192.168.2.240

in the database

158	3	.*;querytype=!A;reply=NODATA	1	1618749334	1618749361
159	3	.*;querytype=ANY;reply=NODATA	1	1618749350	1618749350

in the pihole log I find this:

Apr 18 16:07:47 dnsmasq[4370]: 29 192.168.2.240/37655 query[AAAA] mtalk.google.com from 192.168.2.240
Apr 18 16:07:47 dnsmasq[4370]: 29 192.168.2.240/37655 cached mtalk.google.com is regex blacklisted

in the pihole-FTL log, I find:

[2021-04-18 16:07:47.805 4370M] Regex blacklist (49, DB ID 158) >> MATCH: "mtalk.google.com" vs. ".*;querytype=!A;reply=NODATA"

in order to allow you to verify, pihole-FTL log attached.

pihole-FTL.zip (3.4 KB)

this time the reply is N/A (should this not be NODATA?)

Yes. Please also add DEBUG_QUERIES=true so we get more details in pihole-FTL.log and try again. The log will grow substantially. We only need the relevant lines from the query we're looking at here.

DEBUG_REGEX off (#), DEBUG_QUERIES=true

pihole log:

Apr 18 17:22:48 dnsmasq[18613]: 20 192.168.2.240/64738 query[AAAA] mtalk.google.com from 192.168.2.240
Apr 18 17:22:48 dnsmasq[18613]: 20 192.168.2.240/64738 cached mtalk.google.com is regex blacklisted

pihole-FTL log:

[2021-04-18 17:22:48.393 18613M] **** new UDP query[AAAA] query "mtalk.google.com" from eth0:192.168.2.240 (ID 20, FTL 10330, /root/project/src/dnsmasq/forward.c:1592)
[2021-04-18 17:22:48.393 18613M] mtalk.google.com is not known
[2021-04-18 17:22:48.394 18613M] Reply is 1
[2021-04-18 17:22:48.395 18613M] Set reply to NODATA (1)
[2021-04-18 17:22:48.395 18613M] Blocking mtalk.google.com as mtalk.google.com is regex blacklisted
[2021-04-18 17:22:48.395 18613M] Forcing next reply to 1
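Since the log grows quickly with DEBUG_QUERIES=true, a small filter helps to pull out only the reply type that was set. The following is a sketch in Python, assuming the "Set reply to NAME (code)" wording seen in the lines above stays the same; the function name reply_types is hypothetical.

```python
import re

# Sample taken from the pihole-FTL log lines quoted above
LOG = """\
[2021-04-18 17:22:48.394 18613M] Reply is 1
[2021-04-18 17:22:48.395 18613M] Set reply to NODATA (1)
[2021-04-18 17:22:48.395 18613M] Blocking mtalk.google.com as mtalk.google.com is regex blacklisted
[2021-04-18 17:22:48.395 18613M] Forcing next reply to 1
"""

# Matches e.g. "Set reply to NODATA (1)" and captures the name and numeric code
SET_REPLY = re.compile(r"Set reply to (?P<name>\w+) \((?P<code>\d+)\)")


def reply_types(log_text: str) -> list:
    """Return (name, code) for every 'Set reply to ...' line in the log text."""
    return [(m.group("name"), int(m.group("code"))) for m in SET_REPLY.finditer(log_text)]


print(reply_types(LOG))  # [('NODATA', 1)]
```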

It seems to work for me just fine, I'll have to try more to reproduce this, apparently.


dig AAAA aaaaa

Do you see similar things for NXDOMAIN ?

regex deleted, new regex .*;querytype=!A;reply=NXDOMAIN

pihole log:

Apr 18 17:53:36 dnsmasq[25796]: 22 192.168.2.240/56410 query[AAAA] mtalk.google.com from 192.168.2.240
Apr 18 17:53:36 dnsmasq[25796]: 22 192.168.2.240/56410 cached mtalk.google.com is regex blacklisted

pihole-FTL log:

[2021-04-18 17:53:36.885 25796M] **** new UDP query[AAAA] query "mtalk.google.com" from eth0:192.168.2.240 (ID 22, FTL 11312, /root/project/src/dnsmasq/forward.c:1592)
[2021-04-18 17:53:36.885 25796M] mtalk.google.com is not known
[2021-04-18 17:53:36.886 25796M] Reply is 2
[2021-04-18 17:53:36.887 25796M] Set reply to NXDOMAIN (2)
[2021-04-18 17:53:36.887 25796M] Blocking mtalk.google.com as mtalk.google.com is regex blacklisted
[2021-04-18 17:53:36.888 25796M] Forcing next reply to 2

changed back to NODATA.

On the Windows desktop, the dig replies do actually register as NODATA. It is the device on 192.168.2.240 for which the query log always displays N/A.

I tested this now using dig and DEBUG_QUERIES=true. Both the dig command on the desktop and the device (192.168.2.240; no access possible, it's a black box, so queries are initiated automatically) use UDP. The output in the log is identical. Is the device asking something more, or something else, than a simple dig command?

attached packet capture, filter port 53 and ip 192.168.2.240. You'll notice some DNS requests to 8.8.8.8, but the firewall redirects these to pihole.

packetcapture.zip (6.4 KB)

JUST NOTICED, one reply IP, the other NODATA

just asking, don't know the code, does this have something to do with Forcing next reply to 2, and the wrong reply is "forced"?
/edit

edit2
removed obsolete screenshots, caused by my mistake (wrong group assignment)
/edit2

Who knows. The capture is a good idea, could you generate the same from the Windows machine that behaves as expected? Comparing the two would make it a lot easier for me. Both the question and the reply look perfectly fine (and indeed NODATA).


So I tried to reproduce your setup here, correct me if my assumptions are wrong anywhere:

  1. Created a new group
  2. Assigned one client exclusively to this group
  3. Added your regex only for this new group

When I run dig AAAA google.com from localhost, it is not blocked (expected, as it is not a member of the group created for the regex above). If I query the same domain from 127.0.0.2, it is indeed blocked:

Does the link in your

Really go to the regex only meant for the other group?

I'm so sorry, I recreated the regex to use NODATA (after the NXDOMAIN test), but forgot to assign it to the correct group. I fixed that, and AAAA queries are back to normal on that device (with IPv6). Again, SORRY....

On the desktop with IPv4 only, client assigned to the regex (via group), with an OK result, started capture, executed dig, stopped capture.

packetcapture (2).zip (358 Bytes)

edit
despite my stupid mistake, did you notice that in the last screenshot of that post, one query is answered with IP, the other with NODATA?
/edit

edit2
even on the desktop, which behaves as it should, there are some inconsistencies, so it actually behaves a little bit randomly


/edit2

edit3
started capture for the desktop, repeated dig until there was a different reply (3 digs required)

packetcapture (3).zip (516 Bytes)

ignore the queries for ctldl.windowsupdate.com, different machine...


/edit3

Any progress on this OR dead end?

FYI
Just noticed that a typo in this kind of regex completely kills DNS resolution. For example, .*;querytype=!!A (a typo) is accepted by the web interface (success); FTL notices the problem and generates a WARNING message in the FTL log:

[2021-05-02 21:37:50.291 30445/T30449] REGEX WARNING: Invalid regex blacklist filter ".*;querytype=!!A": Unknown querytype

While the incorrect regex is enabled, all queries (dig) are answered with 0.0.0.0.
Disabling or deleting the incorrect regex restores DNS resolution.
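A pre-check before saving the entry would catch this kind of typo. Below is a minimal sketch in Python of what such a check could look like. It is purely hypothetical and reflects neither the web interface nor FTL; the function name check_entry is made up, and the sets of accepted values are an illustrative subset limited to the values used in this topic.

```python
# Illustrative subsets only (values seen in this topic), not FTL's full lists
KNOWN_QUERYTYPES = {"A", "AAAA", "ANY", "PTR"}
KNOWN_REPLIES = {"NODATA", "NXDOMAIN"}


def check_entry(entry: str) -> list:
    """Return a list of problems found in the ';key=value' options of an entry."""
    problems = []
    for extra in entry.split(";")[1:]:
        key, _, value = extra.partition("=")
        if key == "querytype":
            # A single leading '!' inverts the match, e.g. '!A'
            name = value[1:] if value.startswith("!") else value
            if name not in KNOWN_QUERYTYPES:
                problems.append(f"Unknown querytype: {value}")
        elif key == "reply":
            if value not in KNOWN_REPLIES:
                problems.append(f"Unknown reply: {value}")
        else:
            problems.append(f"Unknown option: {extra}")
    return problems


print(check_entry(".*;querytype=!!A"))              # ['Unknown querytype: !!A']
print(check_entry(".*;querytype=!A;reply=NODATA"))  # []
```

An entry that returns a non-empty list could then be refused up front instead of being stored and only warned about in the log.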

My vote is dead end. It seems to be just too much. I can see additional selectors like querytype being very useful, but specifying the result in the regex itself seems just very wrong. Asking the regex experts @PromoFaux @DanSchaper about their opinion as well.

We should be cautious not to add more and more features if we want to keep the Pi-hole ecosystem fast and easy to maintain. Only this will ensure the project can live on in the future.

hahahaha, I don't know where anyone got the impression I was a Regex expert. I really shouldn't touch it...


I wasn't aware that it is that bad. I know you are the badass SQL master, so I presumptively extended this to other strange languages (like regex) as well.