Subscriptions expressions: More powerful pattern matching on string-typed properties #220

Open
1 task done
mlongob opened this issue Mar 18, 2024 · 2 comments
Labels
A-Broker Area: C++ Broker enhancement New feature or request

Comments

@mlongob
Contributor

mlongob commented Mar 18, 2024

Is there an existing proposal for this?

  • I have searched the existing proposals

Is your feature request related to a problem?

Currently, the only useful operators for string properties are equality and inequality (== and !=). For use cases that need more powerful pattern matching, the following operations would be useful:

  • Checking that a string contains another string
  • Checking that a string starts with another string or a character
  • Doing pattern matching on a sub-section of a URI (example: /var/log/foobar/*/*.log)
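
To make the first two requested operations concrete, here is a minimal C++17 sketch of their intended semantics. The function names `contains` and `startsWith` are hypothetical and are not part of BlazingMQ's expression language; they only illustrate what such operators would compute.

```cpp
#include <string_view>

// Hypothetical helper: true if 'needle' occurs anywhere inside 'value'.
bool contains(std::string_view value, std::string_view needle)
{
    return value.find(needle) != std::string_view::npos;
}

// Hypothetical helper: true if 'value' begins with 'prefix'.
// substr() clamps the length, so a prefix longer than 'value' simply fails.
bool startsWith(std::string_view value, std::string_view prefix)
{
    return value.substr(0, prefix.size()) == prefix;
}
```

Both checks run in time bounded by the lengths of their inputs, so they would not raise the evaluation-cost concerns that a general pattern language can.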

Describe the solution you'd like

A regex operator would solve many of those problems. We can use the following library to compile regexes without introducing additional dependencies to the broker:
https://github.com/bloomberg/bde/blob/d6674e1df52b62d66942ca318882f225d26104bf/groups/bdl/bdlpcre/bdlpcre_regex.h#L669

Alternatives you considered

No response

@mlongob mlongob added enhancement New feature or request A-Broker Area: C++ Broker labels Mar 18, 2024
@mlongob mlongob changed the title Subscriptions expressions: more powerful pattern matching on string-typed properties Subscriptions expressions: More powerful pattern matching on string-typed properties Mar 18, 2024
@678098
Collaborator

678098 commented Mar 19, 2024

When we worked on subscriptions, we tried to limit expression complexity and ensure that it is impossible to build an expression that takes too long to evaluate.

This is why we have constraints like this in the expression evaluator:

enum {
    /// The maximum number of operators allowed in a single expression.
    k_MAX_OPERATORS = 10,

    /// The maximum number of properties allowed in a single expression.
    k_MAX_PROPERTIES = 10
};
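
As a rough illustration of how limits like these can be enforced while an expression is being parsed, consider the sketch below. The `ComplexityBudget` class and its methods are hypothetical and are not taken from the BlazingMQ sources; only the two constants come from the snippet above.

```cpp
#include <cstddef>

enum {
    k_MAX_OPERATORS  = 10,
    k_MAX_PROPERTIES = 10
};

// Hypothetical helper: a parser would call addOperator() / addProperty()
// for each token it sees and reject the expression as soon as either
// call returns false.
class ComplexityBudget {
    std::size_t d_numOperators  = 0;
    std::size_t d_numProperties = 0;

  public:
    // Record one more operator; false once the limit is exceeded.
    bool addOperator()
    {
        return ++d_numOperators <= static_cast<std::size_t>(k_MAX_OPERATORS);
    }

    // Record one more property; false once the limit is exceeded.
    bool addProperty()
    {
        return ++d_numProperties <=
               static_cast<std::size_t>(k_MAX_PROPERTIES);
    }
};
```

A budget of this kind bounds evaluation cost only when every operator is cheap, which is why a regex operator with unbounded cost does not fit it directly.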

A RegEx engine looks like a ready-made tool for the more powerful pattern matching we might need. However, its usage goes against our initial design and expected usage scenario, because it makes it possible to degrade BlazingMQ performance drastically.

In this article, you can find an example of a short RegEx which takes seconds to evaluate on a short string, and it can be even worse:

https://blog.codinghorror.com/regex-performance/

So, here are some questions to think about:

  1. Do we want to lift restrictions on expression complexity?
  2. Do we really need all the features a full RegEx engine provides?
  3. Can we assume that clients will never (intentionally or unintentionally) construct an inefficient RegEx?
  4. Is it possible to use a limited RegEx engine, e.g. one with the features that make it possible to overload the engine disabled?
  5. Can we accurately predict that the provided RegEx is inefficient when we first see this RegEx during configuration?
  6. Should we measure expression evaluation time and raise an ALARM if it's too long?
  7. What is the performance of the RegEx engine in "normal" cases in comparison with the operators already implemented in ExpressionEvaluator?
  8. Do we really need something other than * star or . dot support for pattern matching? Is it difficult to implement fast matching with only these two?
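
On question 8, a matcher limited to two wildcards can be sketched in a few lines. The sketch below makes an assumption about semantics that the thread does not settle: `*` matches any run of characters (glob-style, including `/`) and `.` matches exactly one character. The name `wildcardMatch` is hypothetical.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical matcher: '*' matches any (possibly empty) run of characters,
// '.' matches exactly one character, everything else matches itself.
// It remembers only the most recent '*', so the worst case is
// O(pattern.size() * text.size()), never exponential.
bool wildcardMatch(std::string_view pattern, std::string_view text)
{
    const std::size_t npos = std::string_view::npos;
    std::size_t p = 0, t = 0;
    std::size_t starPattern = npos;  // index of the last '*' seen
    std::size_t starText    = 0;     // text index where that '*' resumes

    while (t < text.size()) {
        if (p < pattern.size() && pattern[p] == '*') {
            starPattern = p++;       // try matching '*' against nothing first
            starText    = t;
        }
        else if (p < pattern.size() &&
                 (pattern[p] == '.' || pattern[p] == text[t])) {
            ++p;
            ++t;
        }
        else if (starPattern != npos) {
            p = starPattern + 1;     // let the last '*' absorb one more char
            t = ++starText;
        }
        else {
            return false;
        }
    }
    while (p < pattern.size() && pattern[p] == '*') {
        ++p;                         // trailing stars match the empty string
    }
    return p == pattern.size();
}
```

Under these assumptions, `wildcardMatch("/var/log/foobar/*/*.log", "/var/log/foobar/app/today.log")` returns true. Note that the `.` in that pattern acts as a single-character wildcard here, so a real design would also need an escaping rule for literal dots.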

@pniedzielski
Collaborator

> In this article, you can find an example of a short RegEx which takes seconds to evaluate on a short string, and it can be even worse:
>
> https://blog.codinghorror.com/regex-performance/

Note that this is not a property of regular expressions, but rather a property of backtracking regular expression engines. Strings can be checked against regular expressions (without lookahead assertions) in linear time. PCRE is a backtracking regex matcher, but there are others that don't suffer from this; see for example https://github.com/google/re2/wiki/WhyRE2, designed for similar use cases to ours.


4 participants