Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fuzzer #617

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Fuzzer #617

wants to merge 3 commits into from

Conversation

c01db33f
Copy link

The existing fuzzers don't cover very much of the possible state, and they don't test the validity of any of the arguments passed to the callback functions, so I wrote a more thorough fuzzer.

This fuzzer provides significantly higher coverage of the library, as
it implements more handlers and supports parser suspend/resume/reset
and external entity parsing.
Comment on lines 688 to 689
GIT_REPOSITORY https://github.com/google/libprotobuf-mutator.git
GIT_TAG master
Copy link
Member

@hartwork hartwork Jul 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a master branch — a moving target — will be a problem to CI stability.
The latest release of libprotobuf-mutator is of 2020-05-20 while its master has commits as recent as 2022-11-13. My vote for making a new release in libprotobuf-mutator so that we have something robust and non-moving for potential use in a CI. Thank you.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the branch to build protobuf from source and pin the versions of both protobuf and libprotobuf-mutator.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@c01db33f thanks for the pin, but there is more to it: Which release we're pinning precisely, how old the release is, and how well the project is maintained. Pinning a release of 2 years ago of a project that has stopped doing releases would not be healthy. So only a recent release in libprotobuf-mutator will really be able to show the project is healthy enough and understands need for releases, to be a dependable in 2022. What do you think?

With regard to core protobuf, I would like to understand your choice of v3.20.1 over v3.21.4 and v21.4. Could you elaborate?

This change modifies the build to pull in and compile libprotobuf as well as
libprotobuf-mutator to avoid needing a preinstalled copy of protobuf/protoc.
Copy link
Member

@hartwork hartwork left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the findings from our call today now, mostly complete but two areas remain explicit TODOs for me on review side for the moment:

  • the XML_Content model checking code
  • the additions to the CMake build system

More on that soon, in particular on XML_Content

FetchContent_Declare(
ProtobufMutator
GIT_REPOSITORY https://github.com/google/libprotobuf-mutator.git
GIT_TAG v1.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for later, related: google/libprotobuf-mutator#211

required Encoding encoding = 1;
repeated Action actions = 2;
repeated int32 fail_allocations = 3;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great to have a trialing newline here.

Comment on lines +46 to +49
string chunk = 1;
string last_chunk = 2;
bool reset = 3;
string external_entity = 4;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If protobuf syntax allows adding short inline comments here about what these actions do (so that it can be understood without reading the C++ code) that would be nice.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PS: Candidates for additional actions could be calls to functions:

  • XML_GetCurrentByteCount
  • XML_GetCurrentByteIndex
  • XML_GetCurrentColumnNumber
  • XML_GetCurrentLineNumber
  • XML_GetInputContext

Just an idea.

(const XML_Char*)action.chunk().data(),
action.chunk().size(), 0)) {
// Force a reset after parse error.
XML_ParserReset(parser, encoding);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to XML_ParserReset dropping all but one of the handlers, a consecutive call to InitializeParser seems missing here.

}

XML_ParserFree(parser);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great to have a trialing newline here.

Comment on lines +107 to +109
enum XML_Status Parse(XML_Parser parser, const XML_Char* input, int input_len,
int is_final) {
enum XML_Status status = XML_Parse(parser, input, input_len, is_final);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use of XML_Char* seems mistaken here, see https://libexpat.github.io/doc/api/latest/#XML_Parse .

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we should consider use of XML_GetBuffer and XML_ParseBuffer here, for better use of computing resources.

XML_SetDoctypeDeclHandler(parser, StartDoctypeDeclHandler,
EndDoctypeDeclHandler);
XML_SetEntityDeclHandler(parser, EntityDeclHandler);
// NB: This is mutually exclusive with entity_decl_handler, and there isn't
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I learned from this that NB: is "nota bene" but maybe Note: would be more clear to non-natives with regard to English.


# This is enough for libprotobuf-mutator build to find our build of protobuf
set(Protobuf_INCLUDE_DIR ${Protobuf_SOURCE_DIR}/src)
set(Protobuf_LIBRARIES ${Protobuf_BINARY_DIR}/libprotobuf.a)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for now or later: while we depend on .a, we likely need to add -Dprotobuf_BUILD_SHARED_LIBS=OFF so we get .a out in all cases.

\___/_/\_\ .__/ \__,_|\__|
|_| XML parser

Copyright (c) 2022 Google LLC
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if Google LLC has the copyright, would you mind using format…
Copyright (c) 2022<six spaces>Mark Brand <xxxxxxxxx@google.com>
…, instead? We have that format elsewhere and there is cheap automation that expects this format, applied when doing releases. A line like that will auto-appear for Jann as well.

XML_ParserFree(ext_parser);
}

return rc;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be beneficial to always return XML_STATUS_OK here so that the parser does not abort parsing after this handler.

Comment on lines +150 to +157
assert(content->quant == XML_CQUANT_NONE
|| content->quant == XML_CQUANT_REP);
assert(content->name == NULL);
for (int i = 0; i < content->numchildren; ++i) {
assert(content->children[i].type == XML_CTYPE_NAME);
assert(content->children[i].numchildren == 0);
TouchString(content->children[i].name);
}
Copy link
Member

@hartwork hartwork Oct 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have validated the part on XML_CTYPE_MIXED to be correct now 👍 I would like to suggest adding two more checks to the body of the loop:

assert(content->children[i].quant == XML_CQUANT_NONE);
assert(content->children[i].children == NULL);

PS: My sources for validation have been:

Comment on lines +161 to +162
assert(content->numchildren == 0);
TouchString(content->name);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add:

assert(content->children == NULL);

On another note, content->quant could be any of the four valid XML_QUANT_* values here. If it's not too verbose, we could add e.g.:

assert((content->quant == XML_CQUANT_NONE) || (content->quant == XML_CQUANT_OPT) || (content->quant == XML_CQUANT_REP) || (content->quant == XML_CQUANT_PLUS));


case XML_CTYPE_CHOICE:
case XML_CTYPE_SEQ:
assert(content->name == NULL);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

content->quant could be any of the four valid XML_QUANT_* values here. If it's not too verbose, we could add e.g.:

assert((content->quant == XML_CQUANT_NONE) || (content->quant == XML_CQUANT_OPT) || (content->quant == XML_CQUANT_REP) || (content->quant == XML_CQUANT_PLUS));

case XML_CTYPE_SEQ:
assert(content->name == NULL);
for (int i = 0; i < content->numchildren; ++i) {
TouchChildNodes(&content->children[i], false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A (likely subjective) idea for simplification:

--- &content->children[i]
+++ content->children + i

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants