New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace jsoncpp with nlohmann json #14877
Conversation
01f8c7b
to
d48404d
Compare
37c997e
to
0d8499f
Compare
064a5ce
to
bbac03f
Compare
cbadc42
to
fdcf92e
Compare
6e43c8d
to
dc88deb
Compare
982f7c6
to
a0eb445
Compare
a0eb445
to
84df68d
Compare
This pull request is stale because it has been open for 14 days with no activity. |
7d1853c
to
0ab7dbc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had another full pass and mainly found one issue with printing updated expectation in gas tests we should still fix.
Apart from that we should just have one more look into those invalid utf-8 sequence cases in source names. I think that's technically breaking, but it does look like json-cpp already behaved weird in such cases, s.t. we can probably book it as bugfix. But we should confirm the behaviour on this branch and maybe check what json-cpp does with other invalid utf-8 sequences in source names and such, just to be clear that we're fine.
test/solc/CommandLineInterface.cpp
Outdated
@@ -1087,13 +1087,12 @@ BOOST_AUTO_TEST_CASE(standard_json_include_paths) | |||
TemporaryDirectory tempDir({"base/", "include/", "lib/nested/"}, TEST_CASE_NAME); | |||
TemporaryWorkingDirectory tempWorkDir(tempDir); | |||
|
|||
std::string const mainContractSource = withPreamble( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Intentional? Seems uncalled for (not a big deal anyways)
|
||
jsonChild["3.1"] = "3.1"; | ||
jsonChild["3.2"] = 2; | ||
json["1"] = 1; | ||
json["2"] = "2"; | ||
json["3"] = jsonChild; | ||
json["4"] = "ऑ ऒ ओ औ क ख"; | ||
json["5"] = "\xff"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright - but I guess this is not really an issue, right?
I mean, we do allow non-ascii and unicode inputs.
Like
{
"language": "Solidity",
"sources":
{
"A":
{
"content": "pragma solidity >=0.0; contract C { function f() public pure returns (bytes memory) { return unicode'z'; } }"
}
}
}
I just paranoidly tested this - if I pipe that through sed -e "s/z/Z\xff/"
, replacing the string content with an invalid unicode sequence, I get SyntaxError: Contains invalid UTF-8 sequence at position 1.
.
I guess this would already hit an error sooner now with nlohmann-json and already fail at json-parsing-time?
That would be fine then, if this just moves the error, but we just turn error cases in different error cases.
Apart from that the only thing I could imagine would be things outside the sources like the contract name. If I pipe the above through sed -e "s/A/\xff/"
, the result unforunately compiles... it seems to interpret the source name as \ufffd"
in the output...
I guess that also fails with the nlohmann branch now?
I mean, technically, strictly speaking, that'd make this breaking... but we may get around arguing that supporting non-unicode non-ascii source names is more of a bug than a feature...
Do people these days still use non-unicode-non-ascii encodings for anything?
It's actually also curious that it becomes \ufffd
with json-cpp... does json-cpp try to guess and fix invalid unicode sequences there and reinterprets them as the closest it finds? If so, we could definitely call it a bugfix...
But yeah, we should spend a bit more thought about this here and about these cases, but in the end I think we can get away with it.
@@ -22,19 +22,16 @@ | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to self: still need a final look through this.
cb2b661
to
a173c72
Compare
@@ -0,0 +1 @@ | |||
{"errors":[{"type":"JSONError","component":"general","severity":"error","message":"Error parsing input JSON."}]} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks strange. I would expect also an error message here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is caused because of an exception in solidity::util::jsonPrint(Json const& _input, JsonFormat const& _format)
where _input
seem to be a valid json object containing the error message where it throws an exception in _input.dump(..)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added an additional handler for Json::exception
in StandardCompiler::compile(..)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, this fix is not really correct. The main problem is that the initial error message is containing the character that created the problem, so the error message contains that invalid utf8 sequence. So during the creation of the error message, in particular the creation of the json - the json contains the invalid utf8 sequence because of the error message. Dumping that is not possible, because it contains the invalid sequence. Fixing this in a proper way will led the command line tests fail, because of that invalid utf8 sequence:
- standard_non_utf8_filename
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 234: invalid start byte
The problem is that the error message that is now created with the Json::exception
handler is not referring to the correct position of the input json, but it refers to the position of the error json that also contained the invalid utf8 sequence.
However, I would say we should just leave it like this, because this message at least contains the cause of the parsing problem - the invalid utf8 sequence. The clean way might be to replace all non-printable ascii characters with its hex-code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm... lets do it the clean way...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added this to solidity::util::jsonParseStrict(..)
f10d9ce
to
c0710bf
Compare
c0710bf
to
87d86bf
Compare
I think not.
Yes, exactly. We just see this earlier.
Yes, this is failing now.
I don't think so, but of course it is technically possible. In general NTFS and exFAT seem to use UTF16 encoding for filenames. Where FAT12, FAT16 and FAT32 is using OEM character sets (https://learn.microsoft.com/en-gb/windows/win32/intl/character-sets-used-in-file-names?redirectedfrom=MSDN). However, I guess we can ignore that. However, I was curious about the use of UTF16 on Windows and whether this is really working right now. It turns out we don't support unicode filenames on Windows - because they are UTF16 encoded. For UTF16 encoded filenames that are not just ASCII we are only receiving So if we have a contract named
This is the behaviour of our current compiler and it looks like that no windows user ever had a problem with this. So I guess we could really just ignore the use of non-utf8 encoded filenames.
I don't know, but
I added some basic tests for the decoding of standard jsons containing invalid utf8 sequences. When I was investigating the utf16 encoded filenames on Windows I found this bug that I mentioned - but because nobody ever found this, I guess it's ok. However, I should probably open an issue for this utf16 filename encoding bug. I also think that it is not worth supporting FAT12, FAT16 and FAT32 filesystems correctly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
YOLO, let's merge this!
/// \param _errs [out] Formatted error messages | ||
/// \return \c true if the document was successfully parsed, \c false if an error occurred. | ||
bool parse(Json::CharReaderBuilder& _builder, std::string const& _input, Json::Value& _json, std::string* _errs) | ||
std::string formatLikeJsoncpp(std::string const& _dumped, JsonFormat const& _format) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea for this was to just merge this PR, and then remove this in a subsequent PR accepting the test changes, right :-)? So I don't need to properly review it :-D?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly! Thats the plan!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #15079
Depends on
master
oncesolc-js #729
was merged.