Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Undocumented behavior in the case of unions of typeDSL-ed optional types #626

Open
tom-tan opened this issue Nov 10, 2022 · 1 comment
Open

Comments

@tom-tan
Copy link
Member

tom-tan commented Nov 10, 2022

Here is a type definition of SaladRecordField#jsonldPredicate.

- name: jsonldPredicate
type:
- string?
- JsonldPredicate?

According to the spec of SALAD, string? and JsonldPredicate? are expanded with the typeDSL rule.
It says:

If the type ends with a question mark ?, the question mark is stripped off and the type is expanded to a union with null

The definition of union types are in the spec of Apache Avro.
It says:

Unions, as mentioned above, are represented using JSON arrays.

By naively interpreting the descriptions, the above type definition is processed to the following:

 - name: jsonldPredicate 
   type: 
     - ["null", "string"]
     - ["null", "JsonldPredicate"]

However, it violates the definition of SaladRecordField#type.

schema-salad-tool processes the above example to the following but this behavior is not documented in the spec of SALAD and Avro.

 - name: jsonldPredicate 
   type: 
     - "null"
     - string
     - JsonldPredicate

If it is correct, it would be nice if there is a description of this behavior.


IMO, the current behavior causes extra complexity to the parsers (especially the case of combining with $import that is replaced in the array node) while it only provides a tiny syntax sugar :-(.

@tetron
Copy link
Member

tetron commented Dec 5, 2022

The current behavior, as implemented, is to flatten nested arrays and remove duplicates.

if isinstance(datum2, CommentedSeq):
datum3 = CommentedSeq()
seen = [] # type: List[str]
for i, item in enumerate(datum2):
if isinstance(item, CommentedSeq):
for j, v in enumerate(item):
if v not in seen:
datum3.lc.add_kv_line_col(
len(datum3), item.lc.data[j]
)
datum3.append(v)
seen.append(v)
else:
if item not in seen:
if datum2.lc and datum2.lc.data:
datum3.lc.add_kv_line_col(
len(datum3), datum2.lc.data[i]
)
datum3.append(item)
seen.append(item)
document[d] = datum3

This was implemented as a "do what I mean" behavior, optional types are turned into unions, if you nest a union within a union, the most convenient/intended behavior is to flatten it to a single list. Are you proposing that it should throw an error instead?

I can provide a specification of the flattening/deduplication behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants