Prevent users entering invalid regexps

Problem with Beta 5.0:
Web interface currently allows users to enter invalid regexps. E.g. (^|\.testthisinvalidregexp\.com

Whilst this may not be an issue for FTL to handle, it could lead to users having regex filters that aren't doing what they think they're doing.

I don't know what is and isn't possible with PHP as it's been a long time since I have worked with it, but is it not possible to run the query against a test string and check the return code (similar to what you can do with grep), before adding?

We implemented POSIX ERE (extended Regex) for FTL as it is known to cover almost everything while being extremely efficient and fast during run time as it has a (comparably) much low complexity of the compiled regex bytecode compared to other regex flavors. I have not seen an ERE extension for PHP we could use here.

Oh, you would! Adding

will cause FTL to skip this query.

However, FTL will also complain and try to explain why it didn't like your regex:

[2020-02-18 17:54:55.700 17284] Warning: Invalid regex blacklist filter "(^|\.testthisinvalidregexp\.com": Unmatched ( or \( (error code 8)

Well, if I'd have pasted the complete snippet, you'd have also seen the ID. The more verbose output is, however, only shown when in DEBUG_REGEX mode, but I guess that's acceptable:

[2020-02-18 21:30:14.866 17284] Compiling blacklist regex 0 (database ID 53): facebook.com
[2020-02-18 21:30:14.866 17284] Compiling blacklist regex 1 (database ID 54): (^|\.testthisinvalidregexp\.com
[2020-02-18 21:30:14.866 17284] Warning: Invalid regex blacklist filter "(^|\.testthisinvalidregexp\.com": Unmatched ( or \( (error code 8)

The chronology of things is problematic here.

  1. You add a new domain
  2. FTL is told to reload-lists
  3. The page reloads, showing the additional regex
  4. With a tiny bit of delay, FTL finds out that the regex is a bad one

FTL is asynchronous in such a way to ensure running DNS requests are finished before reloading the lists. This means that there may be a notable delay between 2 and 4 (there typically is none), however FTL will very likely only find that the regex is broken after the page already refreshed. Implementing a fixed delay of one second doesn't seem to be a valid solution.

Neither do I think a modification of the regex comment is a successful idea, it is a mere hack. Stripping the comment may be problematic, just as you mentioned as well. I would prefer to solve this with an additional table (maybe called domain_message) containing something like

domain_id message
54 `Invalid regex blacklist filter "(^

This table could be emptied before compiling any regex and only populated with error messages. While this would not solve the temporal sequence issue, it would, at least, provide a meaningful way to transport the debugging information from withing FTL to the user.

I'm traveling a lot until around mid-March and I don't think this is something that needs to go into v5.0. It can be a thing for v5.1. Especially, because the functionality is already available, it is just not as prominent as it could be.

Checking back with @PromoFaux about what he thinks of this feature becoming a v5.1 one.

Version 5 is a major update, if things upgrade easily then consider that to be a bonus. Major changes mean breaking changes and different ways of doing things.

Semantic Versioning 2.0.0

Summary

Given a version number MAJOR.MINOR.PATCH, increment the:

  1. MAJOR version when you make incompatible API changes,
  2. MINOR version when you add functionality in a backwards compatible manner, and
  3. PATCH version when you make backwards compatible bug fixes.

Additional labels for pre-release and build metadata are available as extensions to the MAJOR.MINOR.PATCH format.