Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #24833: Migrate NodeApi to zio-json in rudder-rest and utils #5656

Conversation

clarktsiory
Copy link
Contributor

@clarktsiory clarktsiory commented May 7, 2024

https://issues.rudder.io/issues/24833

  • First commit : create zio-json datatypes for node details fields. This is just basically creating the same datatype but with the same fields order as in previous serialiser, and doing the mapping using chimney
  • Second commit : remove lift-json imports from utils
  • Remaining : migrating the APIs

@clarktsiory clarktsiory force-pushed the arch_24833/migrate_nodeapi_to_zio_json_in_rudder_rest_and_utils branch from 0bf2300 to e99f345 Compare May 7, 2024 15:09
@clarktsiory clarktsiory requested a review from fanf May 7, 2024 15:09
@clarktsiory clarktsiory marked this pull request as draft May 7, 2024 15:09
@clarktsiory
Copy link
Contributor Author

PR updated with a new commit

@clarktsiory clarktsiory force-pushed the arch_24833/migrate_nodeapi_to_zio_json_in_rudder_rest_and_utils branch from ef835b6 to 45a17dd Compare May 10, 2024 08:07
@clarktsiory
Copy link
Contributor Author

PR updated with a new commit

@clarktsiory clarktsiory force-pushed the arch_24833/migrate_nodeapi_to_zio_json_in_rudder_rest_and_utils branch from 0dbd8d7 to a5032b2 Compare May 10, 2024 12:37
Comment on lines -113 to -120
def serializeServerInfo(srv: Srv, status: String): JValue

def serializeNodeInfo(nodeInfo: NodeInfo, status: String): JValue
def serializeNode(node: Node): JValue

def serializeInventory(inventory: FullInventory, status: String): JValue
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉 Those are getting removed little by little

@clarktsiory clarktsiory force-pushed the arch_24833/migrate_nodeapi_to_zio_json_in_rudder_rest_and_utils branch from a5032b2 to 4677514 Compare May 10, 2024 12:54
@clarktsiory
Copy link
Contributor Author

PR rebased

@clarktsiory clarktsiory marked this pull request as ready for review May 10, 2024 12:55
@clarktsiory
Copy link
Contributor Author

PR updated with a new commit

1 similar comment
@clarktsiory
Copy link
Contributor Author

PR updated with a new commit

}
}
val customFields = fields.filter { field =>
field != "minimal" &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why you special-case minimal and default here. From the field's name point of view, they are invalid like "foo" could be, and you already took care of them aboce.

It won't have any impact on perf either, since allFields is not big, the hash resolution will be quick.
Is there any other reason for doing so I don't see ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole portion of code was copied from the previous RestExtractorService method to "parse" the node details level...

Right below we actually need to separate the input set of string into either :

  • special values ("full", "minimal", "default")
  • custom values ("bios", etc.), which we need to validate before passing them to CustomDetailLevel(special, custom)
    => Maybe what we want here is a smarter constructor for CustomDetailLevel, which is the actual structure that needs the splitting

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(and I forgot to remove the stale RestExtractorService method which is now unused)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Done in 10768db, which now only allows to construct the CustomDetailLevel safely

}

implicit def transformer(implicit
nodeInfo: NodeInfo,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to start from NodeFact here. We kept nodeinfo / inventory for compat reason with the existing API, but that's not the goal. We need to take the opportunity to remove it enterily from here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ => b8c2bf3

.buildTransformer
}

@jsonDiscriminator("type") sealed abstract class OsDetails(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get why you are redefining these structure while they already exist in NodeFact.
One of the goal of node facts is to be a "as near as possible" json <-> scala mapper.
There will be some friction at some points (espcially regarding rudder config / agent / etc) for the rest output, but in most case, we would like to use the same data structures.
For bios etc, it should be ok.

If it's done for the maybe-different lifecycle of data: right, we need to have stronger garanties for the rest API than for node fact. Is this for that ? If so, can you clearly document it ? Plus, we will likely want a process to know what to do where when nodefact or rest data need to change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mainly yes, it is to have stronger garanties for the JSON output...

But there are also some structures which need to be rearranged to get the type added to it, the way it was with lift-json (for example the case here for OsDetails and some others).
For inventory fields, I realized the output would have more fields than in the lift-json implementation if I just serialized inventory fields as is from NodeFact (e.g. for some reason Network#description was omitted). And the ordering of fields is also not the same, which may or may not be a concern.

I assumed that adding fields and reordering them would be a huge change in the JSON output of the API, so I tried to keep the result as is, also to have minimal changes in the JSON of API tests.
Do we actually want to avoid the duplication here, in favor of changing the API response (reordering and adding fields) ?

Copy link
Member

@fanf fanf May 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a general rule of thumb, the less code we have, the better it is. That's less code to maintain, less code to modify (typically if we add a relevant information in inventory for bios, we need to change it at many more place without automatic (as in compiler/tests) way to prevent error, and also - and especially - less code to wonder "why that code exists? What is meaningful about it, is it important or just something to clean-up?"

All this as to be balanced with the need to have strong garanties to public API, and both your consideration are valid.
So let's see how we do the balancing :)

  • in the case of node facts, one of the stated goal is to get a common bi-directionnal json/scala structure. So it's a place where we are ok to sacrifice a little bit of business-code-side agility in favor of global optimisation in code reuse. What means that sacrifices will happen, especially now that we are at the first iteration of node facts more general use (ie the first time it goes out of it's poc place where everything was neatly aligned for it).
  • for adding fields, that's not a problem. This doesn't break any existing code, and actually it's something we want to be able to do routinely. But for a public API, what we don't want, it's empty optionnal field, which clutters a lot the output and json size. Typically, "description":"" is generally not what is done, we remove the field. It seems that we can do that in the case of NodeFact serialisation too
  • for field ordering. Well, it should not matter, since json is not ordered in the spec, but it will, because the world is like that. Still, it seems that the ordering can be done directly in business objects. The risk it then that somewhere, we switch two Option[String] without noticing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given that, I would like that you explore the cost/risk for changing NodeFact and related data structure that are identical modulo order/additional fields, and only keep API-specific structure for when there is changes that need to be taken care of. Basically, so that if a structure is redefined, the person reading the code can know/see/read the comment to understand why it exists without having to guess

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, all objects/fields are not equals. For most of them, we don't care of the order: slot, video, etc - if people use them, it's for data mapping, so field name will be used. So, yes, look it's risky/costly to change order in our business object, but if yes, perhaps we will just accept to change their order in public API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with finding the right balance, so I looked at how to minimize additional datastructures.
The resulting change is here => 04e280c

When reusing all NodeFact structures I still found a main obstacle : the existing serializers use the original field name (ifAddresses VS ipAddresses, commandName VS name) in that case we are doomed because we would have to rename the encoded field, which breaks compatibility for existing serialization of node facts, or we would need to encode both fields, which will need another case class anyway...

So, the ones that need to be copied and changed are the ones which have renamed any field in the JSON of the node details endpoint. I found out eventually that the only ones from nodefacts whose codec is initially compatible with the endpoint JSON are :

  • NodeTimezone
  • Processor
  • Slot
  • Sound
  • Video
  • VirtualMachine
    And sadly, that's all 😞... all others have to be duplicated (I added block comments for each of them though)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not at ease with having diverging serialization because just a field name, especially since for now, nodefact serialization is only used internally for storage of inventory on acceptation/refusal. So in the worst case, we can just change the names in 8.1.n + migrate in base and forget about it.

Still, migration are annoying, so a better solution would be to write for the time being in a common format, but be able to read both (and do that in 8.1, and only have the new format 8.2).

zio-json allows that with @jsonAliases("...") so I think we should just do that.
https://zio.dev/zio-json/configuration#aliases

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually with aliases, I'm not even sure we need to do the change in 8.1, we can keep the aliases for ~ forever

@clarktsiory
Copy link
Contributor Author

PR updated with a new commit

@clarktsiory clarktsiory force-pushed the arch_24833/migrate_nodeapi_to_zio_json_in_rudder_rest_and_utils branch from 99bb20e to 10768db Compare May 13, 2024 16:44
@clarktsiory
Copy link
Contributor Author

PR updated with a new commit

1 similar comment
@clarktsiory
Copy link
Contributor Author

PR updated with a new commit

Comment on lines 412 to 419
@jsonDiscriminator("type") sealed abstract class OsDetails(
val name: String,
val version: Version,
val fullName: String,
val servicePack: Option[String],
val kernelVersion: Version
)
object OsDetails {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️ This has already been factored to a JsonOsDetails case class for the codec definition in NodeFacts

Comment on lines +419 to +422
object MachineType extends Enum[MachineType] {
case object UnknownMachineType extends MachineType("Unknown")
case object PhysicalMachineType extends MachineType("Physical")
case object VirtualMachineType extends MachineType("Virtual")
case object NoMachine extends MachineType("No machine Inventory")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I simplified the MachineInfo ADT to make it closest to the domain object of the same name i.e. instead of having the type discriminator on an sum type ADT, just map the machineType attribute from the case class to this custom MachineType one...

(I should have done that from the start 🙈)

final case class DisplayDateTime(value: DateTime) extends AnyVal

/**
* Needed because datetime is serialized differently and therefore needs the DisplayDateTime wrapper
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we want that. Having only one date format for everything is more UX friendlier

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the Bios keep the same format of datatime for the releaseDate field in the API versions before v20 b591687

}

/**
* Needed for "swap", which has no mountpoint but the field is not optional in the domain object
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's ok to use none for swap mountpoint, it's what is done in fstab, so we're not going out of standard path here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added it to the the current API b591687

@clarktsiory
Copy link
Contributor Author

clarktsiory commented May 24, 2024

PR updated with a new BIG commit : b591687

The change focuses on re-using the definition of the NodeFacts datatype (the one which is already used for serializing/deserializing in/from the database) to derive JSON codecs for the API for node details.
In fact, the definitions reused in the node details API are the ones for the inventory datatypes (Network, Port, Process, etc.), but the fields can be renamed or not be present (backward compatible), or have a different format in the API : the memory sizes are expressed in megabytes instead of bytes, and date format are slightly different (breaking changes).

So, can make two different set of changes to the JSON codecs for the NodeFacts datatype :

  1. for keeping backward compatibility, and be able to still read back from the database (with renamed fields, ...), and still use the resulting codec in the current v19 API
  2. for breaking changes which have the purpose of changing the node details API in a new v20 version, to have the exact same JSON format (and codec) as the one for the database.

In practice, to make the two generated codec as "close" as possible, the change consists in :

  • moving codecs for inventory datatypes from NodeFacts to the inventory-api package to make codec derivation and imports clearer. We need to define two namespaces to import, older_implicits (having backward compatible JSON with v19 API) and implicits (backward compatible JSON with nodefacts PG table, and the v20 for the API).
  • importing the corresponding implicits namespace to derive the codec respectively for < v20 and >= v20 in the node details API implementation

Most effort was spent adding JSON tests for backward compatibility, but I think they are totally worthy since zio-json is so easy to use that it could be error-prone e.g. with the wrong implicits imports in scope or the use of wrong macro annotations...

Comment on lines +58 to +61
<dependency>
<groupId>com.comcast</groupId>
<artifactId>ip4s-core_${scala-binary-version}</artifactId>
</dependency>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was needed to move the INetAddress codec from NodeFacts (from rudder-core)

@clarktsiory
Copy link
Contributor Author

PR updated with a new commit

Copy link
Member

@fanf fanf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉 LGTM 👏 👏 👏

@Normation-Quality-Assistant
Copy link
Contributor

This PR is not mergeable to upper versions.
Since it is "Ready for merge" you must merge it by yourself using the following command:
rudder-dev merge https://github.com/Normation/rudder/pull/5656
-- Your faithful QA
Kant merge: "Live your life as though your every act were to become a universal law."
(https://ci.normation.com/jenkins/job/merge-accepted-pr/84904/console)

@fanf fanf force-pushed the arch_24833/migrate_nodeapi_to_zio_json_in_rudder_rest_and_utils branch from a0c4c58 to 883837d Compare May 24, 2024 16:31
@fanf
Copy link
Member

fanf commented May 24, 2024

OK, merging this PR

@fanf fanf merged commit ff1ca86 into Normation:master May 24, 2024
0 of 16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants