
ivochkin/embedjson


embedjson

License: MIT

SAX-style JSON parser, a component of the jcppy code generator

Disclaimer

If you are looking for a generic JSON parser/emitter, you are in the wrong place.

Embedjson is not a general-purpose JSON library. It is designed to be embedded into the C/C++ object files generated by jcppy to handle the specific case of JSON parsing where the document structure is known (completely or partially, as defined by a JSON Schema). In that case, parsing can be optimized by inlining embedjson functions and throwing away code branches that can never be reached under the restricting document schema.

Take a look at those brilliant projects if you need a generic JSON library:

Features

  • Written in pure C99
  • No dependencies - even libc is not needed
  • No memory allocations. Embedjson can be configured to use an externally managed dynamic stack
  • UTF-8 validation, including UTF-8 Shortest Form
  • Passes all tests from JSONTestSuite
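"Shortest Form" means that every code point must be encoded with the fewest possible bytes; an overlong sequence such as C0 AF for '/' is rejected. The sketch below shows what such a check can look like. It follows the table of well-formed byte sequences in the Unicode Standard (Table 3-7), which also excludes UTF-16 surrogates and code points above U+10FFFF. utf8_is_valid is a hypothetical helper, not embedjson's own validator.

```c
#include <stddef.h>

/* Hypothetical helper (not part of embedjson): returns 1 if s[0..len) is
 * well-formed UTF-8, else 0. Follows Table 3-7 of the Unicode Standard, so it
 * rejects overlong (non-shortest form) sequences, UTF-16 surrogates
 * (U+D800..U+DFFF) and code points above U+10FFFF. */
static int utf8_is_valid(const unsigned char* s, size_t len)
{
  size_t i = 0;
  while (i < len) {
    unsigned char c = s[i];
    unsigned char lo = 0x80, hi = 0xBF; /* allowed range of the 2nd byte */
    size_t n;                           /* number of continuation bytes */
    size_t k;

    if (c < 0x80) {                     /* ASCII */
      i++;
      continue;
    }
    if (c >= 0xC2 && c <= 0xDF) {
      n = 1;                            /* C0 and C1 leads would be overlong */
    } else if (c == 0xE0) {
      n = 2; lo = 0xA0;                 /* rejects overlong 3-byte forms */
    } else if (c == 0xED) {
      n = 2; hi = 0x9F;                 /* rejects surrogates */
    } else if (c >= 0xE1 && c <= 0xEF) {
      n = 2;
    } else if (c == 0xF0) {
      n = 3; lo = 0x90;                 /* rejects overlong 4-byte forms */
    } else if (c == 0xF4) {
      n = 3; hi = 0x8F;                 /* rejects code points > U+10FFFF */
    } else if (c >= 0xF1 && c <= 0xF3) {
      n = 3;
    } else {
      return 0; /* stray continuation byte, C0/C1 lead, or F5..FF */
    }
    if (len - i <= n) {
      return 0;                         /* sequence cut off by end of buffer */
    }
    if (s[i + 1] < lo || s[i + 1] > hi) {
      return 0;
    }
    for (k = 2; k <= n; k++) {
      if (s[i + k] < 0x80 || s[i + k] > 0xBF) {
        return 0;
      }
    }
    i += n + 1;
  }
  return 1;
}
```

This sketch checks a complete buffer. A parser that receives its input in several pieces would instead have to carry an incomplete sequence over to the next piece rather than reject it at the end of the current one.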

Configuring embedjson

The following #define directives can be set before including embedjson.c in your code to configure it:

| Name | Default | Description |
|---|---|---|
| EMBEDJSON_DEBUG | 0 | Set to 1 to enable the paranoid self-checking mode. Detected errors are reported as EMBEDJSON_INTERNAL_ERROR. This mode also prints debug messages to stdout. Not recommended for release builds. |
| EMBEDJSON_DYNAMIC_STACK | 0 | Set to 1 to keep the parser's state in a dynamic stack. When the dynamic stack is enabled, the user is responsible for initializing the embedjson_parser.stack and embedjson_parser.stack_size fields and must implement embedjson_stack_overflow in addition to the regular parsing event handlers. By default, a static stack of fixed size is used. |
| EMBEDJSON_STATIC_STACK_SIZE | 16 | Size of the static stack, in bytes. It determines the maximum supported nesting level of objects and arrays: each nesting level consumes 1 bit of the stack, so a 16-byte stack allows at most 16 × 8 = 128 nested objects or arrays. |
| EMBEDJSON_VALIDATE_UTF8 | 1 | Enable UTF-8 validation. |
| EMBEDJSON_BIGNUM | 0 | Enable big number support. Big numbers are integers and floating-point numbers that do not fit into EMBEDJSON_INT_T and double, respectively. When enabled, embedjson_bignum_begin, embedjson_bignum_chunk and embedjson_bignum_end must be implemented in addition to the regular parsing event handlers. Note that the big number itself has to be parsed inside these callbacks: embedjson only guarantees that the data passed to embedjson_bignum_chunk contains nothing but digits and the '.', '-', 'e' and 'E' characters. |
| EMBEDJSON_SIZE_T | guessed | The type to use wherever size_t is needed. By default, unsigned long or unsigned long long is used, depending on the target architecture. This macro keeps embedjson independent of libc. |
| EMBEDJSON_INT_T | long long | The type used to store and operate on parsed integer values. A 64-bit long long should be enough for most use cases. However, if the JSON to be parsed contains extra-long integers, EMBEDJSON_INT_T can be redefined to a 128-bit integer type supported by the compiler. |
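One bit per nesting level is enough because, to resume after a closing bracket, a parser only needs to know whether each enclosing container is an object or an array. A minimal sketch of such a bit stack, assuming 1 means object and 0 means array; struct bitstack, bitstack_push and bitstack_pop are hypothetical names for illustration, not embedjson's internals:

```c
#define BITSTACK_BYTES 16 /* mirrors the EMBEDJSON_STATIC_STACK_SIZE default */

/* Hypothetical bit stack: bit k records the container type of nesting level k
 * (1 = object, 0 = array). */
struct bitstack {
  unsigned char bits[BITSTACK_BYTES];
  unsigned depth; /* number of currently open objects/arrays */
};

/* Opens a nesting level; is_object is 1 for '{' and 0 for '['.
 * Returns 0 on success, 1 if all 8 * BITSTACK_BYTES levels are in use. */
static int bitstack_push(struct bitstack* s, int is_object)
{
  unsigned byte, mask;
  if (s->depth >= 8u * BITSTACK_BYTES) {
    return 1; /* nesting too deep for a fixed-size stack */
  }
  byte = s->depth / 8;
  mask = 1u << (s->depth % 8);
  if (is_object) {
    s->bits[byte] |= (unsigned char)mask;
  } else {
    s->bits[byte] &= (unsigned char)~mask;
  }
  s->depth++;
  return 0;
}

/* Closes the innermost level and returns 1 if it was an object, 0 if it was
 * an array, or -1 if nothing is open (e.g. an unmatched '}' or ']'). */
static int bitstack_pop(struct bitstack* s)
{
  if (s->depth == 0) {
    return -1;
  }
  s->depth--;
  return (s->bits[s->depth / 8] >> (s->depth % 8)) & 1;
}
```

With 16 bytes, the sketch accepts exactly 128 open levels and refuses the 129th instead of writing past the buffer, which matches the limit stated for EMBEDJSON_STATIC_STACK_SIZE.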

Usage guide

Run scripts/amalgamate.sh to generate embedjson.c. Include embedjson.c in your source file and implement the following functions:

  • int embedjson_error(embedjson_parser* parser, const char* position);
  • int embedjson_null(embedjson_parser* parser);
  • int embedjson_bool(embedjson_parser* parser, char value);
  • int embedjson_int(embedjson_parser* parser, embedjson_int_t value);
  • int embedjson_double(embedjson_parser* parser, double value);
  • int embedjson_string_begin(embedjson_parser* parser);
  • int embedjson_string_chunk(embedjson_parser* parser, const char* data, embedjson_size_t size);
  • int embedjson_string_end(embedjson_parser* parser);
  • int embedjson_object_begin(embedjson_parser* parser);
  • int embedjson_object_end(embedjson_parser* parser);
  • int embedjson_array_begin(embedjson_parser* parser);
  • int embedjson_array_end(embedjson_parser* parser);
  • int embedjson_bignum_begin(embedjson_parser* parser, embedjson_int_t initial_value); (Only if EMBEDJSON_BIGNUM is enabled)
  • int embedjson_bignum_chunk(embedjson_parser* parser, const char* data, embedjson_size_t size); (Only if EMBEDJSON_BIGNUM is enabled)
  • int embedjson_bignum_end(embedjson_parser* parser); (Only if EMBEDJSON_BIGNUM is enabled)
  • int embedjson_stack_overflow(embedjson_parser* parser); (Only if EMBEDJSON_DYNAMIC_STACK is enabled)

Construct an embedjson_parser instance and zero its contents with memset. Feed the JSON input to the parser with embedjson_push and call embedjson_finalize when there is no more input. Parsing results are delivered through the callback functions listed above.
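The callbacks receive only the embedjson_parser pointer, so application state has to be reachable from it. A common C idiom, shown here as a sketch and not as a mechanism documented by embedjson, is to make embedjson_parser the first member of your own struct and convert the pointer back inside the callbacks. Because one JSON string may arrive in several embedjson_string_chunk calls, the sketch also accumulates chunks into a buffer. struct app_state, its fields, app_from_parser and the 64-byte buffer are hypothetical, and the two typedefs at the top are stand-ins that only exist so the sketch compiles without embedjson.c.

```c
/* Stand-ins so this sketch compiles on its own. In a real program, remove
 * these two typedefs and include embedjson.c instead. */
typedef struct { int placeholder; } embedjson_parser;
typedef unsigned long embedjson_size_t;

/* Hypothetical application state. The parser must be the FIRST member, so a
 * pointer to it can be converted back to a pointer to the whole struct. */
struct app_state {
  embedjson_parser parser;
  int depth;             /* current object nesting depth */
  char last_string[64];  /* most recent string value, NUL-terminated */
  embedjson_size_t len;  /* bytes stored in last_string */
  int truncated;         /* set if the string did not fit */
};

static struct app_state* app_from_parser(embedjson_parser* parser)
{
  return (struct app_state*)parser; /* valid: parser is the first member */
}

static int embedjson_object_begin(embedjson_parser* parser)
{
  app_from_parser(parser)->depth++;
  return 0;
}

static int embedjson_object_end(embedjson_parser* parser)
{
  app_from_parser(parser)->depth--;
  return 0;
}

static int embedjson_string_begin(embedjson_parser* parser)
{
  struct app_state* app = app_from_parser(parser);
  app->len = 0;
  app->truncated = 0;
  app->last_string[0] = '\0';
  return 0;
}

/* Appends one chunk of the current string, keeping room for the '\0'. */
static int embedjson_string_chunk(embedjson_parser* parser,
    const char* data, embedjson_size_t size)
{
  struct app_state* app = app_from_parser(parser);
  embedjson_size_t i;
  for (i = 0; i < size; i++) {
    if (app->len + 1 >= sizeof(app->last_string)) {
      app->truncated = 1;
      break;
    }
    app->last_string[app->len++] = data[i];
  }
  app->last_string[app->len] = '\0';
  return 0;
}

static int embedjson_string_end(embedjson_parser* parser)
{
  (void)parser; /* last_string is already complete and NUL-terminated */
  return 0;
}
```

In a real program, the whole struct app_state (not just the parser member) is what gets zeroed with memset, and &app.parser is what gets passed to embedjson_push and embedjson_finalize.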

The result is a source file similar to this:

// main.c
#include <string.h> /* for memset */

/* Configure embedjson library (optional, see "Configuring embedjson" in README.md) */
#define EMBEDJSON_DYNAMIC_STACK 0
#define EMBEDJSON_BIGNUM 0
#include "embedjson.c" /* generated by scripts/amalgamate.sh */

static int embedjson_error(embedjson_parser* parser, const char* position)
{
  /* Called when parsing fails; report or handle the error here. */
  return 1;
}
static int embedjson_null(embedjson_parser* parser)
{
  // Place here the code that handles incoming "null" value.
  // The same logic applies to other embedjson_* functions.
  return 0;
}
static int embedjson_bool(embedjson_parser* parser, char value) { return 0; }
static int embedjson_int(embedjson_parser* parser, embedjson_int_t value) { return 0; }
static int embedjson_double(embedjson_parser* parser, double value) { return 0; }
static int embedjson_string_begin(embedjson_parser* parser) { return 0; }
static int embedjson_string_chunk(embedjson_parser* parser,
    const char* data, embedjson_size_t size) { return 0; }
static int embedjson_string_end(embedjson_parser* parser) { return 0; }
static int embedjson_object_begin(embedjson_parser* parser) { return 0; }
static int embedjson_object_end(embedjson_parser* parser) { return 0; }
static int embedjson_array_begin(embedjson_parser* parser) { return 0; }
static int embedjson_array_end(embedjson_parser* parser) { return 0; }

int main(void)
{
  char json[] = "{\"some\": \"json\", \"object\": true}";
  embedjson_parser parser;
  memset(&parser, 0, sizeof(parser));
  if (embedjson_push(&parser, json, sizeof(json) - 1)) {
    return 1;
  }
  if (embedjson_finalize(&parser)) {
    return 1;
  }
  return 0;
}

An example of how to integrate embedjson into a real-world application can be found in embedjson_lint.c.

Breaking changes

Embedjson releases follow semantic versioning. This section lists the breaking changes introduced by each major release.

3.x (upcoming)

  • Change embedjson_int interface:

    - int embedjson_int(embedjson_parser* parser, long long value)
    + int embedjson_int(embedjson_parser* parser, embedjson_int_t value)
  • Change embedjson_error interface:

    - int embedjson_error(embedjson_parser*, const char*)
    + int embedjson_error(embedjson_parser*, embedjson_error_code, const char*)

2.x (and prior)

API changes were not tracked for versions prior to 2.x. Version 2.0.0 should be considered the first stable release.

TODO

  • UTF-16, UTF-32 support
  • Integrate all tests from https://github.com/nst/JSONTestSuite into unit tests
  • 95+% test coverage
  • Ensure unicode escape does not need validation
  • Fuzzing
  • non-arithmetical double construction (IEEE 754)
  • recovery from several types of errors, e.g. EMBEDJSON_LEADING_PLUS, EMBEDJSON_UNESCAPED_CONTROL_CHAR, EMBEDJSON_EXPONENT_OVERFLOW