-
Notifications
You must be signed in to change notification settings - Fork 429
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fuzzer #617
base: master
Are you sure you want to change the base?
Fuzzer #617
Conversation
This fuzzer provides significantly higher coverage of the library, as it implements more handlers and supports parser suspend/resume/reset and external entity parsing.
expat/CMakeLists.txt
Outdated
GIT_REPOSITORY https://github.com/google/libprotobuf-mutator.git | ||
GIT_TAG master |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a master branch — a moving target — will be a problem to CI stability.
The latest release of libprotobuf-mutator is of 2020-05-20 while its master
has commits as recent as 2022-11-13. My vote for making a new release in libprotobuf-mutator so that we have something robust and non-moving for potential use in a CI. Thank you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the branch to build protobuf from source and pin the versions of both protobuf and libprotobuf-mutator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@c01db33f thanks for the pin, but there is more to it: Which release we're pinning precisely, how old the release is, and how well the project is maintained. Pinning a release of 2 years ago of a project that has stopped doing releases would not be healthy. So only a recent release in libprotobuf-mutator will really be able to show the project is healthy enough and understands need for releases, to be a dependable in 2022. What do you think?
With regard to core protobuf, I would like to understand your choice of v3.20.1 over v3.21.4 and v21.4. Could you elaborate?
This change modifies the build to pull in and compile libprotobuf as well as libprotobuf-mutator to avoid needing a preinstalled copy of protobuf/protoc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the findings from our call today now, mostly complete but two areas remain explicit TODOs for me on review side for the moment:
- the
XML_Content
model checking code - the additions to the CMake build system
More on that soon, in particular on XML_Content
FetchContent_Declare( | ||
ProtobufMutator | ||
GIT_REPOSITORY https://github.com/google/libprotobuf-mutator.git | ||
GIT_TAG v1.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note for later, related: google/libprotobuf-mutator#211
required Encoding encoding = 1; | ||
repeated Action actions = 2; | ||
repeated int32 fail_allocations = 3; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great to have a trialing newline here.
string chunk = 1; | ||
string last_chunk = 2; | ||
bool reset = 3; | ||
string external_entity = 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If protobuf syntax allows adding short inline comments here about what these actions do (so that it can be understood without reading the C++ code) that would be nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PS: Candidates for additional actions could be calls to functions:
XML_GetCurrentByteCount
XML_GetCurrentByteIndex
XML_GetCurrentColumnNumber
XML_GetCurrentLineNumber
XML_GetInputContext
Just an idea.
(const XML_Char*)action.chunk().data(), | ||
action.chunk().size(), 0)) { | ||
// Force a reset after parse error. | ||
XML_ParserReset(parser, encoding); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Due to XML_ParserReset
dropping all but one of the handlers, a consecutive call to InitializeParser
seems missing here.
} | ||
|
||
XML_ParserFree(parser); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great to have a trialing newline here.
enum XML_Status Parse(XML_Parser parser, const XML_Char* input, int input_len, | ||
int is_final) { | ||
enum XML_Status status = XML_Parse(parser, input, input_len, is_final); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use of XML_Char*
seems mistaken here, see https://libexpat.github.io/doc/api/latest/#XML_Parse .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, we should consider use of XML_GetBuffer
and XML_ParseBuffer
here, for better use of computing resources.
XML_SetDoctypeDeclHandler(parser, StartDoctypeDeclHandler, | ||
EndDoctypeDeclHandler); | ||
XML_SetEntityDeclHandler(parser, EntityDeclHandler); | ||
// NB: This is mutually exclusive with entity_decl_handler, and there isn't |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I learned from this that NB:
is "nota bene" but maybe Note:
would be more clear to non-natives with regard to English.
|
||
# This is enough for libprotobuf-mutator build to find our build of protobuf | ||
set(Protobuf_INCLUDE_DIR ${Protobuf_SOURCE_DIR}/src) | ||
set(Protobuf_LIBRARIES ${Protobuf_BINARY_DIR}/libprotobuf.a) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note for now or later: while we depend on .a
, we likely need to add -Dprotobuf_BUILD_SHARED_LIBS=OFF
so we get .a
out in all cases.
\___/_/\_\ .__/ \__,_|\__| | ||
|_| XML parser | ||
|
||
Copyright (c) 2022 Google LLC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if Google LLC has the copyright, would you mind using format…
Copyright (c) 2022<six spaces>Mark Brand <xxxxxxxxx@google.com>
…, instead? We have that format elsewhere and there is cheap automation that expects this format, applied when doing releases. A line like that will auto-appear for Jann as well.
XML_ParserFree(ext_parser); | ||
} | ||
|
||
return rc; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be beneficial to always return XML_STATUS_OK
here so that the parser does not abort parsing after this handler.
assert(content->quant == XML_CQUANT_NONE | ||
|| content->quant == XML_CQUANT_REP); | ||
assert(content->name == NULL); | ||
for (int i = 0; i < content->numchildren; ++i) { | ||
assert(content->children[i].type == XML_CTYPE_NAME); | ||
assert(content->children[i].numchildren == 0); | ||
TouchString(content->children[i].name); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have validated the part on XML_CTYPE_MIXED
to be correct now 👍 I would like to suggest adding two more checks to the body of the loop:
assert(content->children[i].quant == XML_CQUANT_NONE);
assert(content->children[i].children == NULL);
PS: My sources for validation have been:
- https://www.w3.org/TR/2006/REC-xml-20060816/#sec-logical-struct
- https://www.w3.org/TR/2006/REC-xml-20060816/#NT-Mixed
- the new example code of pull request examples: Add new example on element declarations #673
- the XML document input mentioned in the description of pull request examples: Add new example on element declarations #673
assert(content->numchildren == 0); | ||
TouchString(content->name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add:
assert(content->children == NULL);
On another note, content->quant
could be any of the four valid XML_QUANT_*
values here. If it's not too verbose, we could add e.g.:
assert((content->quant == XML_CQUANT_NONE) || (content->quant == XML_CQUANT_OPT) || (content->quant == XML_CQUANT_REP) || (content->quant == XML_CQUANT_PLUS));
|
||
case XML_CTYPE_CHOICE: | ||
case XML_CTYPE_SEQ: | ||
assert(content->name == NULL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
content->quant
could be any of the four valid XML_QUANT_*
values here. If it's not too verbose, we could add e.g.:
assert((content->quant == XML_CQUANT_NONE) || (content->quant == XML_CQUANT_OPT) || (content->quant == XML_CQUANT_REP) || (content->quant == XML_CQUANT_PLUS));
case XML_CTYPE_SEQ: | ||
assert(content->name == NULL); | ||
for (int i = 0; i < content->numchildren; ++i) { | ||
TouchChildNodes(&content->children[i], false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A (likely subjective) idea for simplification:
--- &content->children[i]
+++ content->children + i
The existing fuzzers don't cover very much of the possible state, and they don't test the validity of any of the arguments passed to the callback functions, so I wrote a more thorough fuzzer.