Fuzzer #617

c01db33f · 2022-07-22T15:01:23Z

The existing fuzzers don't cover very much of the possible state, and they don't test the validity of any of the arguments passed to the callback functions, so I wrote a more thorough fuzzer.

This fuzzer provides significantly higher coverage of the library, as it implements more handlers and supports parser suspend/resume/reset and external entity parsing.

hartwork · 2022-07-30T22:59:33Z

expat/CMakeLists.txt

+        GIT_REPOSITORY https://github.com/google/libprotobuf-mutator.git
+        GIT_TAG master


Adding a master branch — a moving target — will be a problem to CI stability.
The latest release of libprotobuf-mutator is of 2020-05-20 while its master has commits as recent as 2022-11-13. My vote for making a new release in libprotobuf-mutator so that we have something robust and non-moving for potential use in a CI. Thank you.

I've updated the branch to build protobuf from source and pin the versions of both protobuf and libprotobuf-mutator.

@c01db33f thanks for the pin, but there is more to it: Which release we're pinning precisely, how old the release is, and how well the project is maintained. Pinning a release of 2 years ago of a project that has stopped doing releases would not be healthy. So only a recent release in libprotobuf-mutator will really be able to show the project is healthy enough and understands need for releases, to be a dependable in 2022. What do you think?

With regard to core protobuf, I would like to understand your choice of v3.20.1 over v3.21.4 and v21.4. Could you elaborate?

This change modifies the build to pull in and compile libprotobuf as well as libprotobuf-mutator to avoid needing a preinstalled copy of protobuf/protoc.

hartwork

Added the findings from our call today now, mostly complete but two areas remain explicit TODOs for me on review side for the moment:

the XML_Content model checking code
the additions to the CMake build system

More on that soon, in particular on XML_Content

hartwork · 2022-10-13T22:17:15Z

expat/CMakeLists.txt

+    FetchContent_Declare(
+        ProtobufMutator
+        GIT_REPOSITORY https://github.com/google/libprotobuf-mutator.git
+        GIT_TAG v1.0


Note for later, related: google/libprotobuf-mutator#211

hartwork · 2022-10-13T22:19:22Z

expat/fuzz/xml_lpm_fuzzer.proto

+  required Encoding encoding = 1;
+  repeated Action actions = 2;
+  repeated int32 fail_allocations = 3;
+}


It would be great to have a trialing newline here.

hartwork · 2022-10-13T22:20:57Z

expat/fuzz/xml_lpm_fuzzer.proto

+    string chunk = 1;
+    string last_chunk = 2;
+    bool reset = 3;
+    string external_entity = 4;


If protobuf syntax allows adding short inline comments here about what these actions do (so that it can be understood without reading the C++ code) that would be nice.

PS: Candidates for additional actions could be calls to functions:

XML_GetCurrentByteCount

XML_GetCurrentByteIndex

XML_GetCurrentColumnNumber

XML_GetCurrentLineNumber

XML_GetInputContext

Just an idea.

hartwork · 2022-10-13T22:21:57Z

expat/fuzz/xml_lpm_fuzzer.cpp

+                                      (const XML_Char*)action.chunk().data(),
+                                      action.chunk().size(), 0)) {
+          // Force a reset after parse error.
+          XML_ParserReset(parser, encoding);


Due to XML_ParserReset dropping all but one of the handlers, a consecutive call to InitializeParser seems missing here.

hartwork · 2022-10-13T22:22:37Z

expat/fuzz/xml_lpm_fuzzer.cpp

+  }
+
+  XML_ParserFree(parser);
+}


It would be great to have a trialing newline here.

hartwork · 2022-10-13T22:44:31Z

expat/fuzz/xml_lpm_fuzzer.cpp

+enum XML_Status Parse(XML_Parser parser, const XML_Char* input, int input_len,
+                      int is_final) {
+  enum XML_Status status = XML_Parse(parser, input, input_len, is_final);


Use of XML_Char* seems mistaken here, see https://libexpat.github.io/doc/api/latest/#XML_Parse .

Also, we should consider use of XML_GetBuffer and XML_ParseBuffer here, for better use of computing resources.

hartwork · 2022-10-13T22:46:21Z

expat/fuzz/xml_lpm_fuzzer.cpp

+  XML_SetDoctypeDeclHandler(parser, StartDoctypeDeclHandler,
+                            EndDoctypeDeclHandler);
+  XML_SetEntityDeclHandler(parser, EntityDeclHandler);
+  // NB: This is mutually exclusive with entity_decl_handler, and there isn't


I learned from this that NB: is "nota bene" but maybe Note: would be more clear to non-natives with regard to English.

hartwork · 2022-10-13T22:47:26Z

expat/CMakeLists.txt

+
+    # This is enough for libprotobuf-mutator build to find our build of protobuf
+    set(Protobuf_INCLUDE_DIR ${Protobuf_SOURCE_DIR}/src)
+    set(Protobuf_LIBRARIES ${Protobuf_BINARY_DIR}/libprotobuf.a)


Note for now or later: while we depend on .a, we likely need to add -Dprotobuf_BUILD_SHARED_LIBS=OFF so we get .a out in all cases.

hartwork · 2022-10-13T22:58:09Z

expat/fuzz/xml_lpm_fuzzer.proto

+                        \___/_/\_\ .__/ \__,_|\__|
+                                 |_| XML parser
+
+   Copyright (c) 2022      Google LLC


Even if Google LLC has the copyright, would you mind using format…
Copyright (c) 2022<six spaces>Mark Brand <xxxxxxxxx@google.com>
…, instead? We have that format elsewhere and there is cheap automation that expects this format, applied when doing releases. A line like that will auto-appear for Jann as well.

hartwork · 2022-10-13T23:01:15Z

expat/fuzz/xml_lpm_fuzzer.cpp

+    XML_ParserFree(ext_parser);
+  }
+
+  return rc;


It could be beneficial to always return XML_STATUS_OK here so that the parser does not abort parsing after this handler.

hartwork · 2022-10-30T20:10:08Z

expat/fuzz/xml_lpm_fuzzer.cpp

+      assert(content->quant == XML_CQUANT_NONE
+             || content->quant == XML_CQUANT_REP);
+      assert(content->name == NULL);
+      for (int i = 0; i < content->numchildren; ++i) {
+        assert(content->children[i].type == XML_CTYPE_NAME);
+        assert(content->children[i].numchildren == 0);
+        TouchString(content->children[i].name);
+      }


I have validated the part on XML_CTYPE_MIXED to be correct now 👍 I would like to suggest adding two more checks to the body of the loop:

assert(content->children[i].quant == XML_CQUANT_NONE); assert(content->children[i].children == NULL);

PS: My sources for validation have been:

https://www.w3.org/TR/2006/REC-xml-20060816/#sec-logical-struct

https://www.w3.org/TR/2006/REC-xml-20060816/#NT-Mixed

the new example code of pull request examples: Add new example on element declarations #673

the XML document input mentioned in the description of pull request examples: Add new example on element declarations #673

hartwork · 2022-10-30T20:17:02Z

expat/fuzz/xml_lpm_fuzzer.cpp

+      assert(content->numchildren == 0);
+      TouchString(content->name);


Let's add:

assert(content->children == NULL);

On another note, content->quant could be any of the four valid XML_QUANT_* values here. If it's not too verbose, we could add e.g.:

assert((content->quant == XML_CQUANT_NONE) || (content->quant == XML_CQUANT_OPT) || (content->quant == XML_CQUANT_REP) || (content->quant == XML_CQUANT_PLUS));

hartwork · 2022-10-30T20:21:37Z

expat/fuzz/xml_lpm_fuzzer.cpp

+
+    case XML_CTYPE_CHOICE:
+    case XML_CTYPE_SEQ:
+      assert(content->name == NULL);


content->quant could be any of the four valid XML_QUANT_* values here. If it's not too verbose, we could add e.g.:

assert((content->quant == XML_CQUANT_NONE) || (content->quant == XML_CQUANT_OPT) || (content->quant == XML_CQUANT_REP) || (content->quant == XML_CQUANT_PLUS));

hartwork · 2022-10-30T20:22:55Z

expat/fuzz/xml_lpm_fuzzer.cpp

+    case XML_CTYPE_SEQ:
+      assert(content->name == NULL);
+      for (int i = 0; i < content->numchildren; ++i) {
+        TouchChildNodes(&content->children[i], false);


A (likely subjective) idea for simplification:

--- &content->children[i] +++ content->children + i

c01db33f added 2 commits July 22, 2022 15:39

fuzzers: Add a new libprotobuf-mutator fuzzer.

1a4cbd4

This fuzzer provides significantly higher coverage of the library, as it implements more handlers and supports parser suspend/resume/reset and external entity parsing.

fuzzers: Add support for failing allocations.

02a6f9b

hartwork reviewed Jul 30, 2022

View reviewed changes

fuzzers: Pin tagged versions of protobuf and libprotobuf-mutator

228447b

This change modifies the build to pull in and compile libprotobuf as well as libprotobuf-mutator to avoid needing a preinstalled copy of protobuf/protoc.

hartwork requested changes Oct 13, 2022

View reviewed changes

hartwork reviewed Oct 30, 2022

View reviewed changes

hartwork added the enhancement label Sep 2, 2023

hartwork mentioned this pull request Jan 8, 2024

fuzz: improve coverage #797

Merged

hartwork added the fuzzing label Jan 8, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fuzzer #617

Fuzzer #617

c01db33f commented Jul 22, 2022

hartwork Jul 30, 2022 •

edited

c01db33f Aug 2, 2022

hartwork Aug 2, 2022

hartwork left a comment

hartwork Oct 13, 2022

hartwork Oct 13, 2022

hartwork Oct 13, 2022

hartwork Oct 13, 2022

hartwork Oct 13, 2022

hartwork Oct 13, 2022

hartwork Oct 13, 2022

hartwork Nov 2, 2022

hartwork Oct 13, 2022

hartwork Oct 13, 2022

hartwork Oct 13, 2022

hartwork Oct 13, 2022

hartwork Oct 30, 2022 •

edited

hartwork Oct 30, 2022

hartwork Oct 30, 2022

hartwork Oct 30, 2022

		GIT_REPOSITORY https://github.com/google/libprotobuf-mutator.git
		GIT_TAG master

		assert(content->numchildren == 0);
		TouchString(content->name);

Fuzzer #617

Are you sure you want to change the base?

Fuzzer #617

Conversation

c01db33f commented Jul 22, 2022

hartwork Jul 30, 2022 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hartwork left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hartwork Oct 30, 2022 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hartwork Jul 30, 2022 •

edited

hartwork Oct 30, 2022 •

edited