Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for parallel relationships #1361

Open
gabesullice opened this issue Jan 2, 2019 · 14 comments · May be fixed by #1677
Open

Support for parallel relationships #1361

gabesullice opened this issue Jan 2, 2019 · 14 comments · May be fixed by #1677

Comments

@gabesullice
Copy link
Contributor

gabesullice commented Jan 2, 2019

The spec is silent about parallel relationships (that is, more than one resource identifier object with the same type and id combo) in all but one case:

If a client makes a POST request to a URL from a relationship link, the server MUST add the specified members to the relationship unless they are already present. If a given type and id is already in the relationship, the server MUST NOT add it again.

I think it would be useful for this spec to provide explicit support for parallel relationships. This would mean relaxing the above statement and then clarifying their treatment and serialization.

Let's think about a real-world example:

  • I have a resource object with the type: organization
  • I have a relationship field named members.
  • I have a resource object with the type: user
  • I have a user who is both a "founder" of the organization and the "treasurer" & these roles have start and end dates.

A request to /organizations/1/relationships/members should return:

HTTP/1.1 200 Okay

{
  "data": [{
    "type": "user",
    "id": 42,
    "meta": {
      "role": "founder",
      "since": "2012-09-01T15:53:00+06:00",
      "until": "heat death of the universe",
    }
  }, {
    "type": "user",
    "id": 42,
    "meta": {
      "role": "treasurer",
      "since": "2017-01-01T00:00:00+06:00",
      "until": "2021-01-01T00:00:00+06:00"
    }
  }]
}

Obviously, one could model these memberships as standalone resource objects, however this can pose a significant technical challenge on systems which do not natively model relationships this way (my scenario). Moreover, in a traditional relational database, modeling relationships in that way can lead to application performance issues due to an increase in the average number of JOINs required for its typical queries.

We currently work around this shortcoming by adding a new member to the meta object of resource objects that we call arity:

{
    "type": "user",
    "id": 42,
    "meta": {
      "arity": 0
    }
  }, {
    "type": "user",
    "id": 42,
    "meta": {
      "arity": 1
    }
  }

We use this member to help a client disambiguate parallel relationships, to identify the relationship to delete on relationship DELETE requests and to provide a stable mechanism to add a parallel relationship via POST (this is the part that's in conflict with the spec).

I found this issue while attempting to author a profile for this mechanism, but realized it was in conflict with the spec. This made me think that it might actually be more valuable to add support for parallel relationships in the base spec and not in a profile since it's so closely related to core functionality.

So, what's my proposal?

Add an arity member to a resource identifier object with the following language:

A “resource identifier object” MUST contain an arity member when it is in an array that includes another resource identifier object with the same type and id values. The arity member MUST NOT be present if the resource identifier object is unique within the array.

  • The value of the arity member MUST be a non-negative integer.
  • The value of the first occurrence of an arity member MUST be zero for each set of resource identifier objects in the array with shared type and id values. For each subsequent resource identifier object within that set, the value MUST be incremented by one.
  • The arity member MUST NOT be used to indicate an order for resource identifier objects in an array.
  • When an array of resource identifier objects is reordered, the value of each arity member MUST be reindexed.

For example, it is possible for two resource identifier objects to have the same arity as long as they do not share the same type and id value; unrepeated resource identifier objects will not have an arity member; and arity does not correspond to array order:

{
  "data": [
    {"type": "color", "id": "red", "arity": 0},
    {"type": "color", "id": "blue", "arity": 0},
    {"type": "color", "id": "green"},
    {"type": "color", "id": "red", "arity": 1},
    {"type": "color", "id": "red", "arity": 2},
    {"type": "color", "id": "blue", "arity": 1}
  ]
}

This is proposal is exactly what we have been doing successfully in our own implementation (except we put arity under the meta member). In doing so, we've worked all the gritty details out for complete CRUD support too. More information here: https://www.drupal.org/node/2992674

@wimleers
Copy link
Contributor

wimleers commented Jan 3, 2019

Great write-up, thanks @gabesullice!

I think the third and fourth bullet (both talking about ordering) could be clarified: the third says arity does not indicate ordering, the fourth then does talk about ordering.

One thing that hasn't been mentioned yet: @FlorentTorregrosa opened #1156 about this ~1.5 year ago. We implemented the recommendation that @beauby made at #1156 (comment).

@gabesullice
Copy link
Contributor Author

gabesullice commented Jan 3, 2019

I think the third and fourth bullet (both talking about ordering) could be clarified: the third says arity does not indicate ordering, the fourth then does talk about ordering.

I'm struggling to think of a way to clarify if, maybe you can help?

In the third bullet, what I'm trying to say is that arity is not to be confused with the concept of "weight". IOW, that you can't use arity to reorder items in an array.

In the fourth bullet, I'm trying to indicate that arity isn't a mutable or storable value, it's deterministic and computed according to the order of the resource identifiers in the array. IOW, arity is just supposed to be used to disambiguate repeated resource identifiers, I thought "reindexing" would force clients and servers to implement it that way.

@gabesullice
Copy link
Contributor Author

I updated the example, perhaps that's enough?

@wimleers
Copy link
Contributor

Just stumbled upon #1332, which is definitely related. See
#1332 (comment).

@wimleers
Copy link
Contributor

I think the fourth bullet conflicts with
https://jsonapi.org/format/#document-resource-object-linkage, which says:

Note: The spec does not impart meaning to order of resource identifier objects in linkage arrays of to-many relationships, although implementations may do that. Arrays of resource identifier objects may represent ordered or unordered relationships, and both types can be mixed in one response object.

That explicitly says that the order may be meaningful to the server, but it's up to the server implementation.

Third bullet: clarify the order aspect in relation to elsewhere in the spec

*The arity member MUST NOT be used indicate an order for resource identifier objects in an array, since the spec does not impart meaning to order in linkage arrays, although specific implementations may do that. arity is independent of order.

Fourth bullet: do we really need it?

Why does the arity, if present, need to be monotonically increasing with parsing order of resource identifier objects? Does this truly make a meaningful difference to the client? Why?

In other words: Why even have the fourth bullet?

@gabesullice
Copy link
Contributor Author

@wimleers, I think you're misinterpreting bullets 3 and 4. Did you see my comment about them?(#1361 (comment)). It's also possible I don't understand what you're trying to point out.

@gabesullice
Copy link
Contributor Author

Just stumbled upon #1332, which is definitely related. See
#1332 (comment).

I do not see the connection.

@ethanresnick
Copy link
Member

Something similar has been considered a few times.

The tricky part with proposals like this is always: how do you update the attributes associated with each resource identifier object (i.e., the data that you put in meta in the OP)?

This question inevitably comes up because the whole point of allowing the same resource identifier object to appear multiple times is to associate different metadata with each occurrence; if you only cared about representing the number of times that a given resource is included in a relationship, you would actually just include the target resource's type/id pair once, with meta.arity holding the total count for the number of times that that resource is in the relationship. (This was actually @beauby's original proposal, iiuc.)

So, again, any proposal like this needs some way to update the attributes associated with each resource identifier object. And, to do that, clients need to have some way to identify the identifier object they want to target for an update.

In your proposal, iiuc, the client does that by sending the arity value with a PATCH/DELETE request. And that works because type/id/arity together form a unique identifier for the object. The problem with this is that arity is not a stable value; if Client A deletes an occurrence of the resource from the relationship, then many of the arities will shift, and Client B may end up updating the wrong identifier object's attributes by mistake.

Sure, you could use some form of optimistic concurrency control to prevent this, but that probably shouldn't be a requirement; instead, a good design would probably allow the server to assign a stable id to each identifier object.

And then another question comes up: what if you want one of the pieces of data associated with your identifier object to actually be a pointer to another JSON:API resource? E.g., in your team members example, what if there are role resources (with, e.g., descriptions about the role, eligibility criteria, etc.), and you want role to be a pointer to one of those?

At this point, I personally start to think "Gee, these identifier objects are becoming a lot like standard resource objects".

I'm gonna write a longer comment about this soon, but a key thing I've been thinking about is "What should have identity in JSON:API?"

Key point from that link:

By [an] identity I mean a stable logical entity associated with a series of different values over time... Note that by identities I don’t mean names (I call my mother Mom, but you wouldn’t).

So, for this discussion, an identity is an entity that has a state, which is its value at a point in time. And a value is something that doesn’t change. 42 doesn’t change. June 29th 2008 doesn’t change... Even aggregates are values. The set of my favorite foods doesn’t change, i.e. if I prefer different foods in the future, that will be a different set. [At that point, though, an identity representing my favorite foods would take on that new set as its value.]

Once we start talking about having mutable attributes on resource identifier objects, at a fundamental level, we're talking about turning resource identifier objects from values into identities.

Personally, I think JSON:API already has way to many distinct concepts -- look at the mess around links vs. relationships -- so I'm loathe to add another one when a similar one already seems to exist. In this case, I think that's resource objects, which already are JSON:API's key identities (they're named by their type/id, with changing fields over time).

~~

You say that modeling these as standard resource objects is challenging technically or raises performance issues, but I don't really understand that part. Can you explain it to me?

If you already have a facility to store and update the metadata, it would seem that whether these show up as identifier objects or as distinct resources is more a question of serialization logic than data storage or anything else. I.e., I'm imagining you have a table like:

team_members
------------------------------------------------------------
| id  | team_id | member_id | role | start_date | end_date |
| ...   ....      .......      ...    .........   .......  |
------------------------------------------------------------

Can't that just as easily become:

{
   "type": "team-members"
   "id": team_members.id,
   "attributes": {
     "role": team_members.role,
     "from": team_members.start_date,
     "until": team_members.end_date
   },
   "relationships": {
     "team": { 
        data: { "type": "user", "id": team_members.team_id }
      }, 
      "member": {
        "data": { "type": "user", "id": team_members.member_id } 
      }
   }
}

Maybe the tricky part here is now for the client, which needs to add an extra ?include to see the user id? But honestly, that doesn't seem to bad to me.

What I do think would be very useful is a profile that standardizes this pattern of "lifting up" identifier objects into full resource objects to attach attributes to them. So, for example, the profile might say that these resource objects should always use the same type value (or have a specific key in meta) and should always have at least two relationships — e.g. owner and target — that point to the two sides of the relationship, or something like that.

@gabesullice
Copy link
Contributor Author

gabesullice commented Jan 24, 2019

Thanks for the detailed reply! I'm only able to respond to it in part ATM.

In your proposal, iiuc, the client does that by sending the arity value with a PATCH/DELETE request.

Partially correct. For DELETE, yes. For PATCH there's no reason to care particularly about arity except to disambiguate any parallel relationships in the request document. There's no real need to identify relationship values on the server since all values in the relationship are replaced in a relationship PATCH request.

The problem with this is that arity is not a stable value; if Client A deletes an occurrence of the resource from the relationship, then many of the arities will shift, and Client B may end up updating the wrong identifier object's attributes by mistake.

This is true, but partially mitigated by the fact that in this proposal each set of parallel relationships have their own series of arities. IOW, if Client A deletes a resource identifier that is not parallel to the relationship that Client B is updating, there would be no cross-effect. What you say is true if the relationships are parallel though and I admit this falls down without either a stable identifier or Etag + If-Match.


It seems concurrency control is a recurring impediment (I've seen it mentioned in a couple issues now). The obvious answer seems to be to recommend the use of the Etag + If-Match headers, but I guess there must be some reason this has not been considered suitable in the past?

UPDATE: Looks like there was a lot of discussion about Etag and If-Match in #600 and #824, I'll read those.

@jelhan
Copy link
Contributor

jelhan commented Oct 22, 2022

The request to support attributes for relationships has come up multiple times. Often meta has been used to workaround the missing support. One example for doing so is given in this issue.

Parallel relationships seems to be a special case of this. I haven't seen an use case yet for parallel relationships without having additional attributes for each relationship.

Going forward, I assume that parallel relationships are only needed if having attributes on the relationship. Please speak up if you have an use case for parallel relationships, whoch does not require having attributes on the relationship.

@ethanresnick well summarized the challenges if supporting attributes pee relationship. And described the alternative approach to use an intermediate resource instead. I agree with that.

I would tend towards explicitly forbidding parallel relationships to be honest. One could argue that doing so would be a clarification. And not a breaking change. Processing rules for POST requests to relationship links indicate that not allowing parallel relationships was always the intend of the specification.

@dgeb Would be great to hear your opinions.

@dgeb
Copy link
Member

dgeb commented Oct 24, 2022

@dgeb Would be great to hear your opinions.

I think we need to tread very carefully here. The spec's constraints regarding resource identity (type + id) are pretty core to its simplicity and drive every implementation of the spec. The introduction of lid in v1.1 was quite narrow in that it's only allowed in a document for resources that do not already have an id assigned. Introducing an additional relationship-specific field, such as arity, to distinguish otherwise identical members of a relationship risks muddying the simplicity at the core of resource identity.

In general, I agree with the impression that @ethanresnick expresses above:

At this point, I personally start to think "Gee, these identifier objects are becoming a lot like standard resource objects".

I also have concerns about the stability of the arity field, which @gabesullice mentions:

It seems concurrency control is a recurring impediment (I've seen it mentioned in a couple issues now).

So I am definitely not in favor of supporting a new field such as arity without a lot more study, preferably in an extension. Even then, I think it requires so much of implementations that it should probably remain as an extension.

I would tend towards explicitly forbidding parallel relationships to be honest. One could argue that doing so would be a clarification. And not a breaking change. Processing rules for POST requests to relationship links indicate that not allowing parallel relationships was always the intend of the specification.

@jelhan I agree in spirit with this because, in my experience, parallel relationships were not explicitly considered to be valid at the time of writing the original spec. However, at this point we may only have the option to restrict their usage (SHOULD NOT) rather than explicitly prevent it (MUST NOT). This feels more fair to those who have found edge cases that use these relationships and for those who want to explore an extension.

I am also curious to hear from others in the community, especially since this topic was raised a few years back.

@gabesullice
Copy link
Contributor Author

gabesullice commented Oct 25, 2022

Hey! Received a few notifications for this. +1 to not adding my original proposal to the spec and +1 to the @dgeb's hesitance to forbidding parallel relationships at this point with "must not".

I still struggle with the idea of recommending that they be modeled as separate resources. I'll provide a bit more context from the Drupal-perspective:

Drupal has a concept of "fieldable entities" (models with attributes that can be added/removed at runtime). It also has a concept of file entities. File entities store metadata about files Drupal knows about such as: the owner, a description, label, etc. Finally, Drupal has an image field (that can be added to the aforementioned fieldable entities). Image fields store a file ID and alt text for the referenced image. They can also have more than one value (e.g. for multiple images on a product).

It is possible to reference the same image file entity twice, with different alt text. And that's where the snag lies: Should Drupal's JSON:API implementation have entirely separate resources for the relationship between a product and its photos?

Honestly, in retrospect, maybe it would have been worth implementing them as resources. In practice, I've never encountered a bug report about concurrent update problems with the arity solution that we did implement.

I think a recommendation (using RECOMMENDED) with a note explaining the pitfalls may be the best outcome here. It may have pushed us in a better direction.

Finally, I agree with @dgeb that an extension is the right approach for mutable, parallel relationships.

@dgeb
Copy link
Member

dgeb commented Oct 25, 2022

Thanks so much for weighing in here, @gabesullice! This use case is quite interesting.

It is possible to reference the same image file entity twice, with different alt text. And that's where the snag lies: Should Drupal's JSON:API implementation have entirely separate resources for the relationship between a product and its photos?

Honestly, in retrospect, maybe it would have been worth implementing them as resources. In practice, I've never encountered a bug report about concurrent update problems with the arity solution that we did implement.

Yeah, I probably would have chosen to model these as separate resources, especially if alt text were mutable. This would also provide room for other product-photo-specific fields to be accessed in a standard way.

It occurs to me that another way to think about arity is that it could effectively be joined with id to form a new unique id that makes a related resource addressable as its own unique resource. For instance, a photo with an id of 123 and an arity of 2 could be referenced as a product-photo with an id of 123-2. This kind of synthetic resource does not necessarily need to be represented anywhere other than the API layer.

@jelhan
Copy link
Contributor

jelhan commented Oct 25, 2022

Thanks a lot @gabesullice and @dgeb for sharing your thoughts. I think we have an agreement. I will try creating a PR implementing it the next days.

@jelhan jelhan linked a pull request Oct 28, 2022 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants