Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce new JSON parser utility #12531

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

wenduwan
Copy link
Contributor

@wenduwan wenduwan commented May 8, 2024

This patch introduces a utility in OPAL based on the 3rd-party project https://github.com/json-parser/json-parser.git

The utility provides APIs to read JSON into memory along with getters to retrieve C values.

Please see the unit test opal_json.c for example usage.

@wenduwan wenduwan force-pushed the opal_util_json branch 2 times, most recently from cec8408 to e494d37 Compare May 8, 2024 15:01
@lrbison
Copy link
Contributor

lrbison commented May 8, 2024

I understand the desire to insulate ourselves from the specific json reader implementation by abstracting out an interface for us to use. However, I am a little uncomfortable inventing a new interface for a json reader. Relevant XKCD. Why not use json-parser's json.c/json.h symbols directly?

@wenduwan
Copy link
Contributor Author

wenduwan commented May 8, 2024

@lrbison For the most part, json-parser only provides an API to parse a string. This PR extended that API to support filename.

Another reason is that I modeled the JSON object with a opal_object_t subclass. The data structure is not compatible with json-parser's json_value, so we need a layer to hide the latter.

@wenduwan wenduwan self-assigned this May 8, 2024
@juntangc
Copy link
Contributor

juntangc commented May 8, 2024

It's probably better to add some descriptions on how to use the json parser than pointing the users to some source code. Please also include the new features added and how to convert existing tuning files into the json files.

@wenduwan
Copy link
Contributor Author

wenduwan commented May 8, 2024

That's a good idea. I can add an example later.

Copy link
Member

@bosilca bosilca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but there are still some questions pending.

  1. Can you don't use the same name for the two .[ch] files. I know they are in different directories, but the identical name is confusing.
  2. What is the benefit to have the OPAL_OBJECT support ? Is this benefit worth enough to account for the extra memory and CPU needed (including the refcount) to handle the opal_object_t.

test/util/Makefile.am Show resolved Hide resolved
test/util/opal_json.c Show resolved Hide resolved
.gitignore Outdated Show resolved Hide resolved
@wenduwan
Copy link
Contributor Author

wenduwan commented May 9, 2024

@bosilca Thank you very much for the review. I have renamed the files as suggested.

What is the benefit to have the OPAL_OBJECT support ? Is this benefit worth enough to account for the extra memory and CPU needed (including the refcount) to handle the opal_object_t.

The primary motivation is memory management. In json-parser, only the root object should be freed after use - the child objects are merely pointers to the root. This can be tricky to use. With the opal object lifecycle hooks we can override the destructor to correctly free the root object only, while the application can safely OBJ_RELEASE the objects in any order. I understand that this can also be achieved with an explicit destructor function opal_json_free but I like the OPAL_OBJ pattern.

TBH I am not concerned with resource usage - the parser should only be used to read the file into memory and stored somewhere else, e.g. a schema with efficient query methods. Afterwards the parser should be torn down.

@wenduwan
Copy link
Contributor Author

wenduwan commented May 9, 2024

@juntangc I added an example program in opal_json.h. Please take a look.

.gitignore Outdated Show resolved Hide resolved
config/opal_config_files.m4 Show resolved Hide resolved
opal/util/json/opal_json.h Outdated Show resolved Hide resolved
@bosilca
Copy link
Member

bosilca commented May 10, 2024

If you don't add all the OPAL objectification you only have to free a single object, the initial json object via json_value_free. Simpler and cleaner, no need for extra reference counts, nor extra memory allocations. JSON manipulation is simple and straightforward, we should try to keep it that way.

@wenduwan
Copy link
Contributor Author

If you don't add all the OPAL objectification you only have to free a single object, the initial json object via json_value_free. Simpler and cleaner, no need for extra reference counts, nor extra memory allocations. JSON manipulation is simple and straightforward, we should try to keep it that way.

That is also my goal. But my opinion differs in that:

  1. I would like to be more defensive against double free behaviors. Instead of trusting the user to correctly remember the root object and NOT free its children by accident, we can provide a better user experience by hiding this detail and handle it internally.
  2. IMO the OPAL_OBJECT provides a consistent user interface and uniform coding style. I imagine this is also lowers learning curve for the user.

@wenduwan wenduwan force-pushed the opal_util_json branch 2 times, most recently from ab22d6e to 6363bf1 Compare May 10, 2024 17:45
@bosilca
Copy link
Member

bosilca commented May 10, 2024

To state this differently instead of trusting to user to release a single object once (aka. the main json object she obtained from the load function), you trust them to release each object resulting from opal_json_get_key, opal_json_get_index in addition to the high level object obtained via opal_json_load*. Sorry, but I'm hardly convinced.

@bosilca
Copy link
Member

bosilca commented May 10, 2024

I think we can solve our disagreement if instead of opal_json_t** the json API takes an opal_json_t*object. This way the user will always be in charge of the life expectancy of the objects, and the code will now really look as in the example (the one where the upper level object is overwritten by the lower json reader).

@wenduwan
Copy link
Contributor Author

@bosilca Thinking more about OPAL_OBJECT I have a question about inheritance:

In this PR I want to hide json_value from a 3rd-party library and only expose

struct opal_json_t {
    opal_object_t super;
    opal_json_type type;
};

Currently I take advantage of the subclass

struct opal_json_internal_t {
    struct opal_json_t parent;
    json_value *value;
};

This makes the intention clear that the user should not touch the internals beyond struct opal_json_t.

I wonder what the alternative is if I don't use OPAL_OBJECT. It is obvious to me that we can still do something similar without subclassing, e.g.

struct opal_json_t {
    opal_json_type type;
};
...
struct opal_json_internal_t {
    struct opal_json_t parent;
    json_value *value;
};

Would this be better in your opinion?

@wenduwan
Copy link
Contributor Author

@lrbison @bosilca I added a commit that show the difference if I don't use OPAL_OBJECT. Please take a look and let me know if you like it better.

@wenduwan wenduwan requested a review from bosilca May 17, 2024 16:39
Copy link
Member

@bosilca bosilca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks much better to me.

opal/util/json/opal_json.c Outdated Show resolved Hide resolved
opal/util/json/opal_json.c Outdated Show resolved Hide resolved
@wenduwan wenduwan force-pushed the opal_util_json branch 2 times, most recently from 47d5261 to 00503cb Compare May 17, 2024 22:19
*out = OPAL_JSON_OBJECT;
break;
default:
opal_output_verbose(1, 0, "Invalid JSON value type %d\n", type);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be a show_help, not a direct opal_output_verbose? (same goes for all cases of errors emitted via opal_output_verbose)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I switched to show help.

/**
* Free JSON resources
*
* @param[in] json Pointer to JSON object to free
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is json set to NULL upon return? The param is marked in, but you pass a **.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope json is not modified upon return.

The ** is due to the fact of that json being used(and likely declared) as const * everywhere - here we need a double pointer to tell the compiler that we are not modifying json, but rather a pointer to it.

* @returns OPAL_SUCCESS if successful
* OPAL_ERROR otherwise
*/
OPAL_DECLSPEC int opal_json_get_object_size(const opal_json_t *json, size_t *size);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this returning the number of elements in the json object, or the size of the object in bytes?

The name is a little misleading -- IMHO, size refers to number of bytes. Could we use opal_json_get_num_keys or somesuch?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An object is untyped, it cannot contain elements, it only contains bytes. The other function you are referring to is opal_json_get_array_len but it can only apply to arrays.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bosilca Actually this function is also counting object length, not bytes.

I wonder if I should simply keep 1 function for both object and array, i.e. opal_json_get_len.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it, object is a keymapped values storage of json_value_t* while array is just an array of json_value_t*. Then a single assessor would be nice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I decided to keep a single function opal_json_get_container_size for both array and object. They are too similar to have different functions.

This patch introduces a utility in OPAL based on the 3rd-party project
https://github.com/json-parser/json-parser.git

The utility provides APIs to read JSON into memory along with getters to
retrieve C values.

Signed-off-by: Wenduo Wang <wenduwan@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants