
Implementing RegEx function in Database Exprs. #14

Open
aidin36 opened this issue Feb 7, 2014 · 21 comments

@aidin36
Owner

aidin36 commented Feb 7, 2014

Implement a RegEx class, derived from FunctionExpr, in database/expr/.

When compiled, it returns Jx9 code that calls a C function. The C function matches a string against a RegEx pattern.

@DThiebaud
Contributor

What does it do if the compile fails due to an invalid regex?

@aidin36
Owner Author

aidin36 commented Oct 22, 2014

It should throw an exception which tells the client what was wrong with the regex.

@DThiebaud
Contributor

From experimenting with the POSIX regex library, I found that it returns an error message for an invalid regex, so this should be feasible.
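A minimal sketch of how the exception could carry the library's own explanation, assuming the POSIX regcomp/regerror API. The helper name compile_or_throw is hypothetical, and std::invalid_argument stands in for whatever exception type libtocc would actually use:

```cpp
#include <regex.h>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: compile a POSIX extended regex, or throw an exception
// whose message is the library's description of what is wrong with the pattern.
void compile_or_throw(regex_t* compiled, const std::string& pattern)
{
  int rc = regcomp(compiled, pattern.c_str(), REG_EXTENDED | REG_NOSUB);
  if (rc != 0)
  {
    // With a zero-sized buffer, regerror only reports how much space it needs
    // (including the terminating NUL).
    size_t needed = regerror(rc, compiled, nullptr, 0);
    std::vector<char> message(needed);
    regerror(rc, compiled, message.data(), message.size());
    // regcomp failed, so there is nothing for regfree to release here.
    throw std::invalid_argument("invalid regex \"" + pattern + "\": " + message.data());
  }
}
```

On success the caller owns the compiled regex and must call regfree on it once it is no longer needed.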

@aidin36
Owner Author

aidin36 commented Nov 25, 2014

Thanks, Dick (: I didn't know that POSIX has a regex library. That would be great!

Should I assign the issue to you? Are you going to work on this?

@DThiebaud
Contributor

I will work on this, but I'll be pretty busy until mid-December.

Where does an expression get compiled?

Would the generated Jx9 code compare one string to the regex and return true or false?

@aidin36
Owner Author

aidin36 commented Nov 27, 2014

We should add Regex as a function to Jx9. It should be a function that takes a string and a regex pattern and returns true or false. Then we can use it in our queries.
Calling an external function from Jx9 is explained in the UnQLite documentation.
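A minimal sketch of the C-callable matcher such a Jx9 function could forward to, assuming POSIX extended regex syntax. The name regex_matches and the 1/0/-1 return convention are hypothetical; the UnQLite side (installing a foreign function with unqlite_create_function and converting its arguments and result) is not shown:

```cpp
#include <regex.h>

// Hypothetical matcher: returns 1 if str matches pattern, 0 if it does not,
// and -1 if the pattern itself is invalid.
extern "C" int regex_matches(const char* str, const char* pattern)
{
  regex_t compiled;
  // REG_NOSUB: we only need a yes/no answer, not the match positions.
  if (regcomp(&compiled, pattern, REG_EXTENDED | REG_NOSUB) != 0)
  {
    return -1;  // invalid pattern; the caller decides how to report it
  }
  int rc = regexec(&compiled, str, 0, nullptr, 0);  // 0 means "matched"
  regfree(&compiled);
  return rc == 0 ? 1 : 0;
}
```

This version compiles the pattern on every call, which is simple but repeats the work for each record.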

@DThiebaud
Copy link
Contributor

Aidin, please assign this task to me.

@aidin36
Copy link
Owner Author

aidin36 commented Dec 25, 2014

Yours now (:

@DThiebaud
Copy link
Contributor

Aidin, it looks like class RegularExpr should be a subclass of Expr and should be allowed in a ConnectiveExpr. Do you agree?

Am I correct that the scope of this task is to define a regular expression type in libtocc and not to define an interface for it in cli?

@aidin36
Copy link
Owner Author

aidin36 commented Dec 26, 2014

Your second point is correct, but the first one is not.

Note that, before anything else, we need a C function that we can call from inside a Jx9 script.
After that is done, we simply derive a class from FunctionExpr (not from Expr itself) that calls that function.

Take a look at WildCardExpr. We need something like that.
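To illustrate the shape of that design, here is a sketch using a hypothetical stand-in for FunctionExpr; the real libtocc class, its members, and its compile signature may differ. It assumes Jx9 follows PHP-style single-quoted strings, where only backslash and single quote need escaping, and the field expression passed to compile is also an assumption:

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for FunctionExpr: stores a Jx9 function name and one
// argument, and renders them as a Jx9 call on a given field.
class FunctionExprSketch
{
public:
  FunctionExprSketch(std::string jx9_function, std::string arg)
    : jx9_function(std::move(jx9_function)), arg(std::move(arg)) {}
  virtual ~FunctionExprSketch() = default;

  std::string compile(const std::string& field) const
  {
    return jx9_function + "(" + field + ", '" + escape(arg) + "')";
  }

private:
  // Escape backslashes and single quotes for a single-quoted literal.
  static std::string escape(const std::string& s)
  {
    std::string out;
    for (char c : s)
    {
      if (c == '\\' || c == '\'')
        out += '\\';
      out += c;
    }
    return out;
  }

  std::string jx9_function;
  std::string arg;
};

// Hypothetical RegexExpr: like WildCardExpr, it only picks which Jx9 function
// to call; the matching itself happens in the C function.
class RegexExprSketch : public FunctionExprSketch
{
public:
  explicit RegexExprSketch(const std::string& pattern)
    : FunctionExprSketch("match_regex", pattern) {}
};
```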

@DThiebaud
Contributor

Aidin, it appears that the constructor of RegexExpr will need, as a parameter, a pointer to the Unqlite VM. Is there any problem with this?

@DThiebaud
Contributor

Aidin, I see two possible ways of doing this.

  1. Compile the regex in the constructor for the RegexExpr object and keep the compiled regex in the RegexExpr object. Register a pointer to the compiled regex as a resource variable in the Unqlite VM. Pass a pointer to the compiled regex to the C++ function called from Jx9.

    This way, if the regex match function is called for 20 records from Jx9, the regex is only compiled once. However, this requires that a pointer to the Unqlite VM be passed to the RegexExpr::RegexExpr constructor from CLI or whatever calls libtocc.

  2. Compile the regex in the function called from Jx9. This way, no pointer to the Unqlite VM needs to be passed to the RegexExpr::RegexExpr constructor. However, if the regex match function is called for 20 records from Jx9, the regex is compiled 20 times, once for each record.

    Thoughts? Do you prefer one way or the other?
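The core of option 1 (compile once, match many times) can be sketched as a small RAII wrapper over POSIX regex. The class name CompiledRegex and its methods are hypothetical, and registering the object with the Unqlite VM is not shown:

```cpp
#include <regex.h>
#include <stdexcept>
#include <string>

// Sketch of option 1: the pattern is compiled once, in the constructor, and
// the compiled form is reused for every record.
class CompiledRegex
{
public:
  explicit CompiledRegex(const std::string& pattern)
  {
    if (regcomp(&compiled, pattern.c_str(), REG_EXTENDED | REG_NOSUB) != 0)
      throw std::invalid_argument("invalid regex: " + pattern);
  }

  // Released exactly once, when the object goes away.
  ~CompiledRegex() { regfree(&compiled); }

  // A regex_t must not be shallow-copied, so copying is disabled.
  CompiledRegex(const CompiledRegex&) = delete;
  CompiledRegex& operator=(const CompiledRegex&) = delete;

  bool matches(const char* str) const
  {
    return regexec(&compiled, str, 0, nullptr, 0) == 0;
  }

private:
  regex_t compiled;
};
```

With this shape, matching 20 records costs one regcomp and 20 regexec calls, whereas option 2 costs 20 of each.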

@aidin36
Owner Author

aidin36 commented Dec 30, 2014

Let me see...

There's a third way!
Make two functions available to Jx9:

CompiledRegex* compile_regex(const char* regex);
bool match_regex(CompiledRegex* regex, const char* str_to_match);

Jx9 first calls the first function and gets a pointer to a compiled regex. Then, for each of the 20 records, it passes the compiled regex to the second function.

What do you think?

@DThiebaud
Contributor

If Jx9 calls the first function, where will the regex that compile_regex compiles reside? If it is on the stack of compile_regex, the pointer will be invalid by the time match_regex is called. It could be in a static variable in compile_regex, but only if no more than one regex is ever active at a time. I think what we need to do is have compile_regex create the regex on the heap with malloc. We will also need a Jx9 function free_regex(CompiledRegex* regex), which will call regfree and then free the malloc'ed memory. This will work.
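A minimal sketch of that heap-based lifecycle, assuming POSIX regex and using regex_t* directly where the thread writes CompiledRegex*. The three function names follow the thread; returning nullptr on an invalid pattern is an assumption:

```cpp
#include <cstdlib>
#include <regex.h>

// compile_regex mallocs the regex_t so it outlives the call.
// Returns nullptr if allocation fails or the pattern is invalid.
extern "C" regex_t* compile_regex(const char* pattern)
{
  regex_t* regex = static_cast<regex_t*>(std::malloc(sizeof(regex_t)));
  if (regex == nullptr)
    return nullptr;
  if (regcomp(regex, pattern, REG_EXTENDED | REG_NOSUB) != 0)
  {
    std::free(regex);  // regcomp failed, so only the allocation needs freeing
    return nullptr;
  }
  return regex;
}

// match_regex reuses the same compiled regex for every record.
extern "C" bool match_regex(const regex_t* regex, const char* str_to_match)
{
  return regexec(regex, str_to_match, 0, nullptr, 0) == 0;
}

// free_regex releases the regex internals first, then the malloc'ed block.
extern "C" void free_regex(regex_t* regex)
{
  if (regex == nullptr)
    return;
  regfree(regex);
  std::free(regex);
}
```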

I've also thought of another possible way. Have a Jx9 function:

bool match_regex(const char* compiled_regex_address, const char* str_to_match)

where compiled_regex_address is a string representing the address of the compiled regex. When we compile the regex in RegexExpr::RegexExpr, we will convert its address to a string and put it in this->protected_data->arg. FunctionExpr::Compile will then put this string into the first argument of the Jx9 call to match_regex, the same way FunctionExpr::Compile works for WildCardExpr. The C++ code called by match_regex will receive the string, convert it back to the address of the regex, and do the match.

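#include <cstdio>   // sscanf, snprintf
#include <regex.h>  // regex_t

// Convert a string produced by regex_pointer_to_string back into a pointer.
static regex_t *string_to_regex_pointer(const char *string)
{
  void *pointer = nullptr;  // %p reads into a void*, so cast afterwards
  sscanf(string, "%p", &pointer);
  return static_cast<regex_t *>(pointer);
}

// Format a regex pointer as text so it can be embedded in the Jx9 call.
// 20 bytes fits "0x" plus 16 hex digits plus the terminating NUL.
// InvalidArgumentError comes from the surrounding project (not declared here).
static void regex_pointer_to_string(regex_t *regex_pointer, char *string, size_t string_length)
{
  if (string_length < 20)
  {
    throw InvalidArgumentError("string less than 20 characters long passed to RegexExpr::regex_pointer_to_string");
  }
  snprintf(string, string_length, "%p", static_cast<void *>(regex_pointer));
}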
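The receiving end of this idea, where the C++ code behind match_regex parses the address string back and runs the match, could look roughly like this. The function names are hypothetical; the approach relies on the C guarantee that a pointer printed with %p and read back with %p in the same program execution compares equal to the original:

```cpp
#include <cstdio>
#include <regex.h>

// Parse a "%p"-formatted address back into a pointer (nullptr on failure).
static const regex_t* address_string_to_regex(const char* address)
{
  void* pointer = nullptr;
  if (std::sscanf(address, "%p", &pointer) != 1)
    return nullptr;
  return static_cast<const regex_t*>(pointer);
}

// What the C++ side of match_regex would do with its two string arguments.
static bool match_regex_by_address(const char* compiled_regex_address,
                                   const char* str_to_match)
{
  const regex_t* regex = address_string_to_regex(compiled_regex_address);
  if (regex == nullptr)
    return false;
  return regexec(regex, str_to_match, 0, nullptr, 0) == 0;
}
```

The compiled regex must stay alive (and not be moved) for as long as any Jx9 script might still pass its address.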

What are your thoughts about all this?

@aidin36
Owner Author

aidin36 commented Jan 2, 2015

Creative Idea!
The code will become a little dirty, though I couldn't come up with a cleaner idea. So, give it a try!
I'm waiting for your Pull Request (:

@DThiebaud
Contributor

(Email reply to Aidin Gharibnavaz's comment of 01/02/2015 09:30 AM:)

I should have the code fairly soon, but creating the test cases will
take longer.

@DThiebaud
Contributor

How do I add a new file, regex_tests.hpp, to libtocc/tests?

@aidin36
Owner Author

aidin36 commented Jan 3, 2015

Create a .cpp file for the test and add it to Makefile.am in the libtocc/tests/src/ directory.
That should be enough.

@DThiebaud
Contributor

I'm having a problem running libtocc/tests/configure. I get the following error:

configure: error: Could not find libtocc library. Please make sure you have this library in your libs path. Refer to documentations for more info.

I built libtocc and ran "sudo make install" on it successfully so the library should not be missing.

@DThiebaud
Contributor

We can use one of two libraries: POSIX regex or PCRE.

PCRE (Perl Compatible Regular Expressions) uses a regular expression syntax compatible with Perl, Python, PHP, Java, and other packages. It seems to be the most commonly used regular expression library. It requires an external library to be linked in, the same way we link in Unqlite. It is available on MS Windows if we ever port TOCC to Windows.

Regex is the POSIX regular expression library. Its syntax is compatible with egrep, not with PCRE. On Unix-like OSes, no additional library needs to be linked in. Regex is not available on Windows.

Which should we use?

@aidin36
Owner Author

aidin36 commented Jan 7, 2015

  1. configure looks for libtocc.pc, which should be in /usr/local/lib/pkgconfig/libtocc.pc. If you can't fix the problem, please ask on the mailing list. (Others may have the answer, and it keeps this issue free of unrelated discussion (: )

  2. Personally, I'm more comfortable with PCRE, and I think most people are. Although we will depend on another library, I think we should prefer PCRE.
    When I first told you to use POSIX regex, I didn't know it wasn't compatible with Perl regex.
