Replace jsoncpp with nlohmann json #14877

aarlt · 2024-02-20T15:54:57Z

Depends on

[evmasm] Fix handling of missing source locations during import. [evmasm] Fix handling of missing source locations during import. #14903
solc-js test: change back to master once solc-js #729 was merged.

github-actions · 2024-03-28T12:06:07Z

This pull request is stale because it has been open for 14 days with no activity.
It will be closed in 7 days unless the stale label is removed.

ekpyron

I had another full pass and mainly found one issue with printing updated expectation in gas tests we should still fix.
Apart from that we should just have one more look into those invalid utf-8 sequence cases in source names. I think that's technically breaking, but it does look like json-cpp already behaved weird in such cases, s.t. we can probably book it as bugfix. But we should confirm the behaviour on this branch and maybe check what json-cpp does with other invalid utf-8 sequences in source names and such, just to be clear that we're fine.

ekpyron · 2024-04-10T18:46:58Z

test/solc/CommandLineInterface.cpp

@@ -1087,13 +1087,12 @@ BOOST_AUTO_TEST_CASE(standard_json_include_paths)
 	TemporaryDirectory tempDir({"base/", "include/", "lib/nested/"}, TEST_CASE_NAME);
 	TemporaryWorkingDirectory tempWorkDir(tempDir);

-	std::string const mainContractSource = withPreamble(


Intentional? Seems uncalled for (not a big deal anyways)

ekpyron · 2024-04-10T20:11:20Z

test/libsolutil/JSON.cpp


 	jsonChild["3.1"] = "3.1";
 	jsonChild["3.2"] = 2;
 	json["1"] = 1;
 	json["2"] = "2";
 	json["3"] = jsonChild;
 	json["4"] = "ऑ ऒ ओ औ क ख";
-	json["5"] = "\xff";


Alright - but I guess this is not really an issue, right?
I mean, we do allow non-ascii and unicode inputs.
Like

{ "language": "Solidity", "sources": { "A": { "content": "pragma solidity >=0.0; contract C { function f() public pure returns (bytes memory) { return unicode'z'; } }" } } }

I just paranoidly tested this - if I pipe that through sed -e "s/z/Z\xff/", replacing the string content with an invalid unicode sequence, I get SyntaxError: Contains invalid UTF-8 sequence at position 1..
I guess this would already hit an error sooner now with nlohmann-json and already fail at json-parsing-time?

That would be fine then, if this just moves the error, but we just turn error cases in different error cases.

Apart from that the only thing I could imagine would be things outside the sources like the contract name. If I pipe the above through sed -e "s/A/\xff/", the result unforunately compiles... it seems to interpret the source name as \ufffd" in the output...

I guess that also fails with the nlohmann branch now?

I mean, technically, strictly speaking, that'd make this breaking... but we may get around arguing that supporting non-unicode non-ascii source names is more of a bug than a feature...
Do people these days still use non-unicode-non-ascii encodings for anything?

It's actually also curious that it becomes \ufffd with json-cpp... does json-cpp try to guess and fix invalid unicode sequences there and reinterprets them as the closest it finds? If so, we could definitely call it a bugfix...

But yeah, we should spend a bit more thought about this here and about these cases, but in the end I think we can get away with it.

test/libsolidity/GasTest.cpp

solc/CommandLineInterface.cpp

ekpyron · 2024-04-10T20:34:56Z

libsolutil/JSON.cpp

@@ -22,19 +22,16 @@



Note to self: still need a final look through this.

aarlt · 2024-05-06T09:47:22Z

test/cmdlineTests/standard_non_utf8_filename2/output.json

@@ -0,0 +1 @@
+{"errors":[{"type":"JSONError","component":"general","severity":"error","message":"Error parsing input JSON."}]}


This looks strange. I would expect also an error message here.

This is caused because of an exception in solidity::util::jsonPrint(Json const& _input, JsonFormat const& _format) where _input seem to be a valid json object containing the error message where it throws an exception in _input.dump(..).

Added an additional handler for Json::exception in StandardCompiler::compile(..).

Ok, this fix is not really correct. The main problem is that the initial error message is containing the character that created the problem, so the error message contains that invalid utf8 sequence. So during the creation of the error message, in particular the creation of the json - the json contains the invalid utf8 sequence because of the error message. Dumping that is not possible, because it contains the invalid sequence. Fixing this in a proper way will led the command line tests fail, because of that invalid utf8 sequence:

- standard_non_utf8_filename Traceback (most recent call last): File "<stdin>", line 2, in <module> File "<frozen codecs>", line 322, in decode UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 234: invalid start byte

The problem is that the error message that is now created with the Json::exception handler is not referring to the correct position of the input json, but it refers to the position of the error json that also contained the invalid utf8 sequence.

However, I would say we should just leave it like this, because this message at least contains the cause of the parsing problem - the invalid utf8 sequence. The clean way might be to replace all non-printable ascii characters with its hex-code.

hmm... lets do it the clean way...

Added this to solidity::util::jsonParseStrict(..)

aarlt · 2024-05-06T11:34:19Z

Alright - but I guess this is not really an issue, right?

I think not.

I mean, we do allow non-ascii and unicode inputs.
Like
{
        "language": "Solidity",
        "sources":
        {
                "A":
                {
                        "content": "pragma solidity >=0.0; contract C { function f() public pure returns (bytes memory) { return unicode'z'; } }"
                }
        }
}
I just paranoidly tested this - if I pipe that through sed -e "s/z/Z\xff/", replacing the string content with an invalid unicode sequence, I get SyntaxError: Contains invalid UTF-8 sequence at position 1..
I guess this would already hit an error sooner now with nlohmann-json and already fail at json-parsing-time?

Yes, exactly. We just see this earlier.

That would be fine then, if this just moves the error, but we just turn error cases in different error cases.

Apart from that the only thing I could imagine would be things outside the sources like the contract name. If I pipe the above through sed -e "s/A/\xff/", the result unforunately compiles... it seems to interpret the source name as \ufffd" in the output...

I guess that also fails with the nlohmann branch now?

Yes, this is failing now.

I mean, technically, strictly speaking, that'd make this breaking... but we may get around arguing that supporting non-unicode non-ascii source names is more of a bug than a feature...
Do people these days still use non-unicode-non-ascii encodings for anything?

I don't think so, but of course it is technically possible. In general NTFS and exFAT seem to use UTF16 encoding for filenames. Where FAT12, FAT16 and FAT32 is using OEM character sets (https://learn.microsoft.com/en-gb/windows/win32/intl/character-sets-used-in-file-names?redirectedfrom=MSDN). However, I guess we can ignore that.

However, I was curious about the use of UTF16 on Windows and whether this is really working right now. It turns out we don't support unicode filenames on Windows - because they are UTF16 encoded. For UTF16 encoded filenames that are not just ASCII we are only receiving ? question mark characters (63 decimal). After investigating this a bit further, I surprisingly only receive these question mark characters using boost::filesystem::directory_iterator.

So if we have a contract named 片仮名.sol we only receive ???.sol, where ???.sol is of course not existing and failing to open the contract. (On Mac & Linux this works, because the filesystems used there normally use utf8)

←[1m←[31mError←[0m←[1m←[37m: "C:\Users\alex\CLionProjects\untitled1\cmake-build-debug\???.txt" is not found.
←[0m

This is the behaviour of our current compiler and it looks like that no windows user ever had a problem with this. So I guess we could really just ignore the use of non-utf8 encoded filenames.

It's actually also curious that it becomes \ufffd with json-cpp... does json-cpp try to guess and fix invalid unicode sequences there and reinterprets them as the closest it finds? If so, we could definitely call it a bugfix...

I don't know, but \ufffd is also an invalid unicode sequence - so json-cpp even didn't fix it. I would say we can consider this as a bugfix.

But yeah, we should spend a bit more thought about this here and about these cases, but in the end I think we can get away with it.

I added some basic tests for the decoding of standard jsons containing invalid utf8 sequences. When I was investigating the utf16 encoded filenames on Windows I found this bug that I mentioned - but because nobody ever found this, I guess it's ok.

However, I should probably open an issue for this utf16 filename encoding bug. I also think that it is not worth supporting FAT12, FAT16 and FAT32 filesystems correctly.

ekpyron

YOLO, let's merge this!

ekpyron · 2024-05-06T20:37:05Z

libsolutil/JSON.cpp

-/// \param _errs [out] Formatted error messages
-/// \return \c true if the document was successfully parsed, \c false if an error occurred.
-bool parse(Json::CharReaderBuilder& _builder, std::string const& _input, Json::Value& _json, std::string* _errs)
+std::string formatLikeJsoncpp(std::string const& _dumped, JsonFormat const& _format)


The idea for this was to just merge this PR, and then remove this in a subsequent PR accepting the test changes, right :-)? So I don't need to properly review it :-D?

Exactly! Thats the plan!

aarlt marked this pull request as draft February 20, 2024 15:55

aarlt force-pushed the replace-jsoncpp-with-nlohmann-json branch from 01f8c7b to d48404d Compare February 20, 2024 15:59

ethereum deleted a comment from stackenbotten Feb 20, 2024

aarlt force-pushed the replace-jsoncpp-with-nlohmann-json branch 2 times, most recently from 37c997e to 0d8499f Compare February 20, 2024 17:24

ethereum deleted a comment from stackenbotten Feb 20, 2024

aarlt force-pushed the replace-jsoncpp-with-nlohmann-json branch 3 times, most recently from 064a5ce to bbac03f Compare February 20, 2024 17:31

ethereum deleted a comment from stackenbotten Feb 20, 2024

aarlt force-pushed the replace-jsoncpp-with-nlohmann-json branch 2 times, most recently from cbadc42 to fdcf92e Compare February 21, 2024 01:47

ethereum deleted a comment from stackenbotten Feb 22, 2024

ethereum deleted a comment from GAMECHANGE Feb 23, 2024

ethereum deleted a comment from stackenbotten Feb 23, 2024

aarlt force-pushed the replace-jsoncpp-with-nlohmann-json branch 10 times, most recently from 6e43c8d to dc88deb Compare February 26, 2024 18:23

aarlt force-pushed the replace-jsoncpp-with-nlohmann-json branch from 982f7c6 to a0eb445 Compare March 8, 2024 13:06

aarlt mentioned this pull request Mar 12, 2024

[yul] Add support for parsing debug data attributes. #14857

Open

2 tasks

aarlt force-pushed the replace-jsoncpp-with-nlohmann-json branch from a0eb445 to 84df68d Compare March 13, 2024 14:44

github-actions bot added the stale The issue/PR was marked as stale because it has been open for too long. label Mar 28, 2024

aarlt removed the stale The issue/PR was marked as stale because it has been open for too long. label Apr 3, 2024

cameel added has dependencies The PR depends on other PRs that must be merged first and removed has dependencies The PR depends on other PRs that must be merged first labels Apr 8, 2024

aarlt force-pushed the replace-jsoncpp-with-nlohmann-json branch 4 times, most recently from 7d1853c to 0ab7dbc Compare April 10, 2024 15:25

ekpyron requested changes Apr 10, 2024

View reviewed changes

github-actions bot added the stale The issue/PR was marked as stale because it has been open for too long. label Apr 25, 2024

aarlt removed the stale The issue/PR was marked as stale because it has been open for too long. label Apr 29, 2024

aarlt force-pushed the replace-jsoncpp-with-nlohmann-json branch 2 times, most recently from cb2b661 to a173c72 Compare May 6, 2024 08:51

aarlt commented May 6, 2024

View reviewed changes

aarlt force-pushed the replace-jsoncpp-with-nlohmann-json branch 2 times, most recently from f10d9ce to c0710bf Compare May 6, 2024 10:58

Replace internal JSON library jsoncpp with nlohmann::json.

87d86bf

aarlt force-pushed the replace-jsoncpp-with-nlohmann-json branch from c0710bf to 87d86bf Compare May 6, 2024 11:02

ethereum deleted a comment from stackenbotten May 6, 2024

ethereum deleted a comment from github-actions bot May 6, 2024

ekpyron approved these changes May 6, 2024

View reviewed changes

aarlt merged commit 59a16df into develop May 7, 2024
72 checks passed

aarlt deleted the replace-jsoncpp-with-nlohmann-json branch May 7, 2024 12:28

aarlt mentioned this pull request May 7, 2024

Correct Changelog. #15080

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Replace jsoncpp with nlohmann json #14877

Replace jsoncpp with nlohmann json #14877

aarlt commented Feb 20, 2024 •

edited

github-actions bot commented Mar 28, 2024

ekpyron left a comment •

edited

ekpyron Apr 10, 2024

ekpyron Apr 10, 2024 •

edited

ekpyron Apr 10, 2024

aarlt May 6, 2024

aarlt May 6, 2024

aarlt May 6, 2024

aarlt May 6, 2024 •

edited

aarlt May 6, 2024

aarlt May 6, 2024

aarlt commented May 6, 2024

ekpyron left a comment •

edited

ekpyron May 6, 2024

aarlt May 7, 2024

aarlt May 7, 2024

		@@ -0,0 +1 @@
		{"errors":[{"type":"JSONError","component":"general","severity":"error","message":"Error parsing input JSON."}]}

Replace jsoncpp with nlohmann json #14877

Replace jsoncpp with nlohmann json #14877

Conversation

aarlt commented Feb 20, 2024 • edited

github-actions bot commented Mar 28, 2024

ekpyron left a comment • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ekpyron Apr 10, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

aarlt May 6, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

aarlt commented May 6, 2024

ekpyron left a comment • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

aarlt commented Feb 20, 2024 •

edited

ekpyron left a comment •

edited

ekpyron Apr 10, 2024 •

edited

aarlt May 6, 2024 •

edited

ekpyron left a comment •

edited