Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem with string literals in router. #166

Open
Lord-Kamina opened this issue Sep 20, 2022 · 16 comments
Open

Problem with string literals in router. #166

Lord-Kamina opened this issue Sep 20, 2022 · 16 comments

Comments

@Lord-Kamina
Copy link

I was previously using the express router, configured something like this:

	router->http_get(u8R"--(/:path(.*))--", [this](auto request, auto params){
		do stuff
	});

Since updating to 0.6.17 (using libfmt9), I'm getting the following error: "non-escaped bracket ')' at pos 9: may be unmatched group finish"

If I disregard the fact that this is a raw string and thus nothing should need to be escaped, the error goes away but my rules don't match any request and they all end unhandled.

@eao197
Copy link
Member

eao197 commented Sep 21, 2022

Hi!

I can't reproduce that problem on my Ubuntu 20.04 with gcc-9.4 (C++14), clang-13 (C++14, C++20) and gcc-11 (C++20).

Could you provide a minimal example that reproduces this error and tell more about your OS, compiler and C++ standard?

@Lord-Kamina
Copy link
Author

Lord-Kamina commented Sep 21, 2022

@eao197 I'm not sure why this is happening; I'm putting it through the debugger, and saw that for some reason it processes the / properly, adding \\, but after that it goes to path:(.*, as seen on this picture. From there, the suffix is ), which is picked up as the unescaped bracket. I'm not really sure why it's happening though...

Screen Shot 2022-09-21 at 11 08 46

Screen Shot 2022-09-21 at 11 12 09

P.S. I was trying a slightly modified pattern when I took the pictures, to see if it made any difference.

@Lord-Kamina
Copy link
Author

Lord-Kamina commented Sep 21, 2022

Could you provide a minimal example that reproduces this error and tell more about your OS, compiler and C++ standard?

I'll try to get one later. The weird thing is, I was rebasing an old PR and saw this, but it wasn't occurring before. I'm on macOS 12, using Clang. The parent project where RESTinio is being included is c++17.

@eao197
Copy link
Member

eao197 commented Sep 21, 2022

but after that it goes to path:(.*, as seen on this picture

It looks like a memory corruption. Maybe you use a dangling pointer somewhere?

@Lord-Kamina
Copy link
Author

but after that it goes to path:(.*, as seen on this picture

It looks like a memory corruption. Maybe you use a dangling pointer somewhere?

I made a smaller test-case where I can't reproduce the issue, even though I'm using VERY similar code. I'm really at my wits' end.

@Lord-Kamina
Copy link
Author

@eao197 I think this is a bug concerning std::regex itself. Not sure whether there might be a bug in the path2regexcode also (incidentally, I looked at the original project and the code seems to have been almost completely rewritten in the last couple of years)

From what I'm gathering... the problem presents itself in my app after I initialize our localization engine. I tried to get more specific and noticed the following:

TranslationEngine::TranslationEngine() {
	std::string_view route_sv(R"-(/:path(.+))-");
	const std::regex main_path_regex{ R"((\\.)|(?:\:(\w+)(?:\(((?:\\.|[^\\()])+)\))?|\(((?:\\.|[^\\()])+)\))([+*?])?)" };

	std::cregex_iterator token_it{
			route_sv.data(),
			route_sv.data() + route_sv.size(),
			main_path_regex
	};

	initializeAllLanguages();
	/* set all languages in configuration.
	 * They are kept untranslated internally which prevents having issues
	 * when changing languages (and need to update internal stuff with
	 * translations) */
	populateLanguages(GetAllLanguages());

	std::clog << "locale/debug: Checking if language is set in config." << std::endl;

	auto prefValue = config["game/language"].getEnumName();
	auto lang = getLanguageByHumanReadableName(prefValue);

	setLanguage(lang); ///< this calls std::locale::global();
	const std::regex main_path_regex2{ R"((\\.)|(?:\:(\w+)(?:\(((?:\\.|[^\\()])+)\))?|\(((?:\\.|[^\\()])+)\))([+*?])?)" };

	std::cregex_iterator token_it4 {
			route_sv.data(),
			route_sv.data() + route_sv.size(),
			main_path_regex
	};
	std::cregex_iterator token_it5 {
			route_sv.data(),
			route_sv.data() + route_sv.size(),
			main_path_regex2
	};
}

Interestingly, looking at this in lldb:

Using the first regex, instantiated before calling std::locale

(lldb) print (*token_it)[0].str()
(std::sub_match<const char *>::string_type) $3 = ":path(.+)"

(lldb) print (*token_it)[2].str()
(std::sub_match<const char *>::string_type) $4 = "path"

Using the same regex instance, after calling std::locale

(lldb) print (*token_it4)[0].str()
(std::sub_match<const char *>::string_type) $5 = ":path(.+)"
(lldb) print (*token_it4)[2].str()
(std::sub_match<const char *>::string_type) $6 = "path"

And, using a new regex instance for the same pattern, created after changing locale:

(lldb) print (*token_it5)[0].str()
(std::sub_match<const char *>::string_type) $7 = ":path(.+"
(lldb) print (*token_it5)[2].str()
(std::sub_match<const char *>::string_type) $8 = "path(.+"

This was referenced Oct 7, 2022
@eao197
Copy link
Member

eao197 commented Oct 12, 2022

I'm in doubt there is an error in std::regex implementation. I've tried the following code:

#include <regex>
#include <string_view>
#include <locale>
#include <iostream>

int main()
{
	std::string_view route_sv(R"-(/:path(.+))-");
	const std::regex main_path_regex{ R"((\\.)|(?:\:(\w+)(?:\(((?:\\.|[^\\()])+)\))?|\(((?:\\.|[^\\()])+)\))([+*?])?)" };

	std::cregex_iterator token_it{
			route_sv.data(),
			route_sv.data() + route_sv.size(),
			main_path_regex
	};

	std::cout << "token_it[0]str: '" << (*token_it)[0].str() << "'" << std::endl;
	std::cout << "token_it[2]str: '" << (*token_it)[2].str() << "'" << std::endl;

	for( const auto ln : { "be_BY.utf8", "ru_RU.utf8", "en_NZ.utf8", "ru_UA.utf8", "POSIX" } )
	{
		std::cout << "*** " << ln << " ***" << std::endl;

		std::locale::global( std::locale(ln) );

		const std::regex main_path_regex2{ R"((\\.)|(?:\:(\w+)(?:\(((?:\\.|[^\\()])+)\))?|\(((?:\\.|[^\\()])+)\))([+*?])?)" };

		std::cregex_iterator token_it4 {
				route_sv.data(),
				route_sv.data() + route_sv.size(),
				main_path_regex
		};

		std::cout << "token_it4[0]str: '" << (*token_it4)[0].str() << "'" << std::endl;
		std::cout << "token_it4[2]str: '" << (*token_it4)[2].str() << "'" << std::endl;

		std::cregex_iterator token_it5 {
				route_sv.data(),
				route_sv.data() + route_sv.size(),
				main_path_regex2
		};

		std::cout << "token_it5[0]str: '" << (*token_it5)[0].str() << "'" << std::endl;
		std::cout << "token_it5[2]str: '" << (*token_it5)[2].str() << "'" << std::endl;
	}
}

And it produces the same results on my Linux machine (and, if I remove "POSIX", on Windows machine with VC++ from VS2022).

So may be there is no need to modify RESTinio code?

@Lord-Kamina
Copy link
Author

You're correct with that example however, there's a missing piece of the puzzle I forgot to mention: Boost.locale.

If I just add Boost.locale so instead of

 std::locale::global( std::locale(ln) );

it becomes

boost::locale::generator m_gen{};
std::locale::global( m_gen(ln) );

to your example, this is the output:

token_it[0]str: ':path(.+)'
token_it[2]str: 'path'
*** be_BY.UTF8 ***
token_it4[0]str: ':path(.+)'
token_it4[2]str: 'path'
token_it5[0]str: ':path(.+'
token_it5[2]str: 'path(.+'
*** ru_RU.UTF8 ***
token_it4[0]str: ':path(.+)'
token_it4[2]str: 'path'
token_it5[0]str: ':path(.+'
token_it5[2]str: 'path(.+'
*** en_NZ.UTF8 ***
token_it4[0]str: ':path(.+)'
token_it4[2]str: 'path'
token_it5[0]str: ':path(.+'
token_it5[2]str: 'path(.+'
*** ru_UA.UTF8 ***
token_it4[0]str: ':path(.+)'
token_it4[2]str: 'path'
token_it5[0]str: ':path(.+'
token_it5[2]str: 'path(.+'
*** en_US.UTF8 ***
token_it4[0]str: ':path(.+)'
token_it4[2]str: 'path'
token_it5[0]str: ':path(.+'
token_it5[2]str: 'path(.+'
*** POSIX ***
token_it4[0]str: ':path(.+)'
token_it4[2]str: 'path'
token_it5[0]str: ':path(.+'
token_it5[2]str: 'path(.+'
Program ended with exit code: 0

@eao197
Copy link
Member

eao197 commented Oct 14, 2022

Have you tried boost_regex_engine_t for router? If Boost.Locale breaks std::regex then may be it doesn't breaks Boost.Regex?

@Lord-Kamina
Copy link
Author

Have you tried boost_regex_engine_t for router? If Boost.Locale breaks std::regex then may be it doesn't breaks Boost.Regex?

It doesn't break Boost.Regex indeed, but it's still a no-go, because the code causing the problem is in path2regex.hpp, before restinio even checks what engine the user wants.

BTW: I'm working on fixing bugs with my PR. I'll re-submit it once it can pass all tests.

@eao197
Copy link
Member

eao197 commented Oct 15, 2022

It looks a bit weird: boost.locale breaks the normal work of std::regex, but changes are proposed for RESTinio. What if we will use std::regex somewhere inside RESTinio for other purposes? Should we take care about the incorrect behavior of boost.locale+std::regex? And what to do if there will be some other library that breaks std::regex (or something else from the standard library)? Rewrite a part of RESTinio yet again?

@Lord-Kamina
Copy link
Author

Lord-Kamina commented Oct 17, 2022

It looks a bit weird: boost.locale breaks the normal work of std::regex, but changes are proposed for RESTinio. What if we will use std::regex somewhere inside RESTinio for other purposes? Should we take care about the incorrect behavior of boost.locale+std::regex? And what to do if there will be some other library that breaks std::regex (or something else from the standard library)? Rewrite a part of RESTinio yet again?

I am proposing changes to RESTinio because, in wanting to support express-router, you opted to rely on a third party library, path-to-regexp (same as express.js does). Except, you're using an ancient version of the path-to-regexp code. I have spotted (and fixed) already several flaws in my PR.

Still, at least (but probably not at most) one of the failures are because the original tests have been changed. For example, in path-to-regexp sometime before 2.0 (which is what I believe you used)

  [
    '/:test?-bar',
    null,
    [
      {
        name: 'test',
        prefix: '/',
        delimiter: '/',
        optional: true,
        repeat: false,
        partial: true,
        asterisk: false,
        pattern: '[^\\/]+?'
      },
      '-bar'
    ],
    [
      ['/-bar', ['/-bar', undefined]],
      ['/foo-bar', ['/foo-bar', 'foo']]
    ],
    [
      [{ test: 'foo' }, '/foo-bar']
    ]

Same test in path-to-regexp 6.2.1 (current version):

  [
    "/:test?-bar",
    undefined,
    [
      {
        name: "test",
        prefix: "/",
        suffix: "",
        modifier: "?",
        pattern: "[^\\/#\\?]+?",
      },
      "-bar",
    ],
    [
      ["-bar", ["-bar", undefined]],
      ["/-bar", null],
      ["/foo-bar", ["/foo-bar", "foo"]],
    ],
    [
      [undefined, "-bar"],
      [{ test: "foo" }, "/foo-bar"],
    ],
  ],

The syntax of the data structures is a bit different, but you can probably see the difference in matching behavior.

@Lord-Kamina
Copy link
Author

The whole regex issue that motivated my working on this is essentially just a symptom. If you don't update your implementation of path2regex, the behavior between RESTinio and express.js will continue to differ.

@Lord-Kamina
Copy link
Author

I'll open a new PR now, as I've corrected some errors I found in my port. There's still going to be breaking changes for some people, but they'll likely be things path-to-regexp itself already had a while ago. Not all of these have yet made it into express.js, though., given that the 5.0 betas depend on path-to-regexp 3.2.0

From documentation of path-to-regexp, main changes are (most are copied from the path-to-regexp release notes):

  1. Allow empty string with ending: false to match both relative and absolute paths
  2. Use delimiter when processing repeated matching groups (e.g. foo/bar has no prefix, but has a delimiter)
  3. Support starting option to disable anchoring from beginning of the string
  4. Various enhancements from path-to-regexp 6.0, which are intended partly to restore some compatibility with older versions:
    • Support for nested non-capturing groups in regexp, e.g. /(abc(?=d))
    • Support for custom prefix and suffix groups using /{abc(.*)def}
    • Tokens in an unexpected position will throw an error:
      • Paths like /test(foo previously worked treating ( as a literal character, now it expects ( to be closed and is treated as a group
      • You can escape the character for the previous behavior, e.g. /test\(foo
  5. Support custom prefixes (same as delimiters in RESTinio) option (default is /. which acts like every version since 0.x again)
  6. Add support for {} to capture prefix/suffix explicitly, enables custom use-cases like /:attr1{-:attr2}?
  7. Use /#? as default (end)delimiter to avoid matching on query or fragment parameters; if you are matching non-paths (e.g. hostnames), you can set delimiterto .
  8. Fix invalid matching of :name* parameter

I've updated the test suite to reflect the changes, as well as added some new tests that had been added to the upstream path-to-regexp as well.

@eao197
Copy link
Member

eao197 commented Oct 25, 2022

Hi, @Lord-Kamina !

Thanks you very much for your work.

However, I think that breaking compatibility in 0.6 isn't a good thing, even if that break closes a gap between RESTinio and the original path-to-regex. It seems that your work can be used for work on 0.7 branch when (and if) that work will be started. I'll take a closer look at your PR this week and will merge it in a separate branch.

@Lord-Kamina
Copy link
Author

While I'd like to see this merged eventually, I'm currently using a heavily modified branch on my app and likely will for a while yet, so it's fine if you'll wait for a bigger release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants