Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace jsoncpp with nlohmann json #14877

Merged
merged 1 commit into from May 7, 2024
Merged

Conversation

aarlt
Copy link
Member

@aarlt aarlt commented Feb 20, 2024

Depends on

@aarlt aarlt marked this pull request as draft February 20, 2024 15:55
@aarlt aarlt force-pushed the replace-jsoncpp-with-nlohmann-json branch from 01f8c7b to d48404d Compare February 20, 2024 15:59
@ethereum ethereum deleted a comment from stackenbotten Feb 20, 2024
@aarlt aarlt force-pushed the replace-jsoncpp-with-nlohmann-json branch 2 times, most recently from 37c997e to 0d8499f Compare February 20, 2024 17:24
@ethereum ethereum deleted a comment from stackenbotten Feb 20, 2024
@ethereum ethereum deleted a comment from stackenbotten Feb 20, 2024
@aarlt aarlt force-pushed the replace-jsoncpp-with-nlohmann-json branch 3 times, most recently from 064a5ce to bbac03f Compare February 20, 2024 17:31
@ethereum ethereum deleted a comment from stackenbotten Feb 20, 2024
@ethereum ethereum deleted a comment from stackenbotten Feb 20, 2024
@ethereum ethereum deleted a comment from stackenbotten Feb 20, 2024
@aarlt aarlt force-pushed the replace-jsoncpp-with-nlohmann-json branch 2 times, most recently from cbadc42 to fdcf92e Compare February 21, 2024 01:47
@ethereum ethereum deleted a comment from stackenbotten Feb 22, 2024
@ethereum ethereum deleted a comment from stackenbotten Feb 22, 2024
@ethereum ethereum deleted a comment from GAMECHANGE Feb 23, 2024
@ethereum ethereum deleted a comment from stackenbotten Feb 23, 2024
@ethereum ethereum deleted a comment from stackenbotten Feb 23, 2024
@aarlt aarlt force-pushed the replace-jsoncpp-with-nlohmann-json branch 10 times, most recently from 6e43c8d to dc88deb Compare February 26, 2024 18:23
@aarlt aarlt force-pushed the replace-jsoncpp-with-nlohmann-json branch from 982f7c6 to a0eb445 Compare March 8, 2024 13:06
@aarlt aarlt force-pushed the replace-jsoncpp-with-nlohmann-json branch from a0eb445 to 84df68d Compare March 13, 2024 14:44
Copy link

This pull request is stale because it has been open for 14 days with no activity.
It will be closed in 7 days unless the stale label is removed.

@github-actions github-actions bot added the stale The issue/PR was marked as stale because it has been open for too long. label Mar 28, 2024
@aarlt aarlt removed the stale The issue/PR was marked as stale because it has been open for too long. label Apr 3, 2024
@cameel cameel added has dependencies The PR depends on other PRs that must be merged first and removed has dependencies The PR depends on other PRs that must be merged first labels Apr 8, 2024
@aarlt aarlt force-pushed the replace-jsoncpp-with-nlohmann-json branch 4 times, most recently from 7d1853c to 0ab7dbc Compare April 10, 2024 15:25
Copy link
Member

@ekpyron ekpyron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had another full pass and mainly found one issue with printing updated expectation in gas tests we should still fix.
Apart from that we should just have one more look into those invalid utf-8 sequence cases in source names. I think that's technically breaking, but it does look like json-cpp already behaved weird in such cases, s.t. we can probably book it as bugfix. But we should confirm the behaviour on this branch and maybe check what json-cpp does with other invalid utf-8 sequences in source names and such, just to be clear that we're fine.

@@ -1087,13 +1087,12 @@ BOOST_AUTO_TEST_CASE(standard_json_include_paths)
TemporaryDirectory tempDir({"base/", "include/", "lib/nested/"}, TEST_CASE_NAME);
TemporaryWorkingDirectory tempWorkDir(tempDir);

std::string const mainContractSource = withPreamble(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intentional? Seems uncalled for (not a big deal anyways)


jsonChild["3.1"] = "3.1";
jsonChild["3.2"] = 2;
json["1"] = 1;
json["2"] = "2";
json["3"] = jsonChild;
json["4"] = "ऑ ऒ ओ औ क ख";
json["5"] = "\xff";
Copy link
Member

@ekpyron ekpyron Apr 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright - but I guess this is not really an issue, right?
I mean, we do allow non-ascii and unicode inputs.
Like

{
        "language": "Solidity",
        "sources":
        {
                "A":
                {
                        "content": "pragma solidity >=0.0; contract C { function f() public pure returns (bytes memory) { return unicode'z'; } }"
                }
        }
}

I just paranoidly tested this - if I pipe that through sed -e "s/z/Z\xff/", replacing the string content with an invalid unicode sequence, I get SyntaxError: Contains invalid UTF-8 sequence at position 1..
I guess this would already hit an error sooner now with nlohmann-json and already fail at json-parsing-time?

That would be fine then, if this just moves the error, but we just turn error cases in different error cases.

Apart from that the only thing I could imagine would be things outside the sources like the contract name. If I pipe the above through sed -e "s/A/\xff/", the result unforunately compiles... it seems to interpret the source name as \ufffd" in the output...

I guess that also fails with the nlohmann branch now?

I mean, technically, strictly speaking, that'd make this breaking... but we may get around arguing that supporting non-unicode non-ascii source names is more of a bug than a feature...
Do people these days still use non-unicode-non-ascii encodings for anything?

It's actually also curious that it becomes \ufffd with json-cpp... does json-cpp try to guess and fix invalid unicode sequences there and reinterprets them as the closest it finds? If so, we could definitely call it a bugfix...

But yeah, we should spend a bit more thought about this here and about these cases, but in the end I think we can get away with it.

test/libsolidity/GasTest.cpp Outdated Show resolved Hide resolved
solc/CommandLineInterface.cpp Outdated Show resolved Hide resolved
@@ -22,19 +22,16 @@

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: still need a final look through this.

@github-actions github-actions bot added the stale The issue/PR was marked as stale because it has been open for too long. label Apr 25, 2024
@aarlt aarlt removed the stale The issue/PR was marked as stale because it has been open for too long. label Apr 29, 2024
@aarlt aarlt force-pushed the replace-jsoncpp-with-nlohmann-json branch 2 times, most recently from cb2b661 to a173c72 Compare May 6, 2024 08:51
@@ -0,0 +1 @@
{"errors":[{"type":"JSONError","component":"general","severity":"error","message":"Error parsing input JSON."}]}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks strange. I would expect also an error message here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is caused because of an exception in solidity::util::jsonPrint(Json const& _input, JsonFormat const& _format) where _input seem to be a valid json object containing the error message where it throws an exception in _input.dump(..).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added an additional handler for Json::exception in StandardCompiler::compile(..).

Copy link
Member Author

@aarlt aarlt May 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, this fix is not really correct. The main problem is that the initial error message is containing the character that created the problem, so the error message contains that invalid utf8 sequence. So during the creation of the error message, in particular the creation of the json - the json contains the invalid utf8 sequence because of the error message. Dumping that is not possible, because it contains the invalid sequence. Fixing this in a proper way will led the command line tests fail, because of that invalid utf8 sequence:

 - standard_non_utf8_filename
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 234: invalid start byte

The problem is that the error message that is now created with the Json::exception handler is not referring to the correct position of the input json, but it refers to the position of the error json that also contained the invalid utf8 sequence.

However, I would say we should just leave it like this, because this message at least contains the cause of the parsing problem - the invalid utf8 sequence. The clean way might be to replace all non-printable ascii characters with its hex-code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm... lets do it the clean way...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aarlt aarlt force-pushed the replace-jsoncpp-with-nlohmann-json branch 2 times, most recently from f10d9ce to c0710bf Compare May 6, 2024 10:58
@aarlt aarlt force-pushed the replace-jsoncpp-with-nlohmann-json branch from c0710bf to 87d86bf Compare May 6, 2024 11:02
@ethereum ethereum deleted a comment from stackenbotten May 6, 2024
@ethereum ethereum deleted a comment from stackenbotten May 6, 2024
@aarlt
Copy link
Member Author

aarlt commented May 6, 2024

Alright - but I guess this is not really an issue, right?

I think not.

I mean, we do allow non-ascii and unicode inputs.
Like

{
        "language": "Solidity",
        "sources":
        {
                "A":
                {
                        "content": "pragma solidity >=0.0; contract C { function f() public pure returns (bytes memory) { return unicode'z'; } }"
                }
        }
}

I just paranoidly tested this - if I pipe that through sed -e "s/z/Z\xff/", replacing the string content with an invalid unicode sequence, I get SyntaxError: Contains invalid UTF-8 sequence at position 1..
I guess this would already hit an error sooner now with nlohmann-json and already fail at json-parsing-time?

Yes, exactly. We just see this earlier.

That would be fine then, if this just moves the error, but we just turn error cases in different error cases.

Apart from that the only thing I could imagine would be things outside the sources like the contract name. If I pipe the above through sed -e "s/A/\xff/", the result unforunately compiles... it seems to interpret the source name as \ufffd" in the output...

I guess that also fails with the nlohmann branch now?

Yes, this is failing now.

I mean, technically, strictly speaking, that'd make this breaking... but we may get around arguing that supporting non-unicode non-ascii source names is more of a bug than a feature...
Do people these days still use non-unicode-non-ascii encodings for anything?

I don't think so, but of course it is technically possible. In general NTFS and exFAT seem to use UTF16 encoding for filenames. Where FAT12, FAT16 and FAT32 is using OEM character sets (https://learn.microsoft.com/en-gb/windows/win32/intl/character-sets-used-in-file-names?redirectedfrom=MSDN). However, I guess we can ignore that.

However, I was curious about the use of UTF16 on Windows and whether this is really working right now. It turns out we don't support unicode filenames on Windows - because they are UTF16 encoded. For UTF16 encoded filenames that are not just ASCII we are only receiving ? question mark characters (63 decimal). After investigating this a bit further, I surprisingly only receive these question mark characters using boost::filesystem::directory_iterator.

So if we have a contract named 片仮名.sol we only receive ???.sol, where ???.sol is of course not existing and failing to open the contract. (On Mac & Linux this works, because the filesystems used there normally use utf8)

←[1m←[31mError←[0m←[1m←[37m: "C:\Users\alex\CLionProjects\untitled1\cmake-build-debug\???.txt" is not found.
←[0m

This is the behaviour of our current compiler and it looks like that no windows user ever had a problem with this. So I guess we could really just ignore the use of non-utf8 encoded filenames.

It's actually also curious that it becomes \ufffd with json-cpp... does json-cpp try to guess and fix invalid unicode sequences there and reinterprets them as the closest it finds? If so, we could definitely call it a bugfix...

I don't know, but \ufffd is also an invalid unicode sequence - so json-cpp even didn't fix it. I would say we can consider this as a bugfix.

But yeah, we should spend a bit more thought about this here and about these cases, but in the end I think we can get away with it.

I added some basic tests for the decoding of standard jsons containing invalid utf8 sequences. When I was investigating the utf16 encoded filenames on Windows I found this bug that I mentioned - but because nobody ever found this, I guess it's ok.

However, I should probably open an issue for this utf16 filename encoding bug. I also think that it is not worth supporting FAT12, FAT16 and FAT32 filesystems correctly.

@ethereum ethereum deleted a comment from github-actions bot May 6, 2024
Copy link
Member

@ekpyron ekpyron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

YOLO, let's merge this!

/// \param _errs [out] Formatted error messages
/// \return \c true if the document was successfully parsed, \c false if an error occurred.
bool parse(Json::CharReaderBuilder& _builder, std::string const& _input, Json::Value& _json, std::string* _errs)
std::string formatLikeJsoncpp(std::string const& _dumped, JsonFormat const& _format)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea for this was to just merge this PR, and then remove this in a subsequent PR accepting the test changes, right :-)? So I don't need to properly review it :-D?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly! Thats the plan!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #15079

@aarlt aarlt merged commit 59a16df into develop May 7, 2024
72 checks passed
@aarlt aarlt deleted the replace-jsoncpp-with-nlohmann-json branch May 7, 2024 12:28
@aarlt aarlt mentioned this pull request May 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
has dependencies The PR depends on other PRs that must be merged first
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants