Variadic Function Arguments #1

Open
wants to merge 7 commits into
base: master

@rhdunn rhdunn commented Oct 15, 2018

This describes the variable function arguments syntax extension discussed on the expath group.

Member

@michaelhkay michaelhkay left a comment

"A variadic function parameter has at least one item" - I don't understand what this is trying to say. Are we talking about function declarations or function calls? Static or dynamic?

"that function defines a set of functions" - we seem to be using the word "function" here in two different senses. Needs greater precision.

If a function is declared with one argument, and that argument is variadic, and the type is a type that allows arrays (for example item()), and we pass an empty array [], I can't see how we distinguish whether we are passing [] as the value of the first argument, or whether we are passing no arguments.

The automatic coercion of sequence to array makes this especially problematic.

@rhdunn
Author

rhdunn commented Oct 16, 2018

> "A variadic function parameter has at least one item" - I don't understand what this is trying to say. Are we talking about function declarations or function calls? Static or dynamic?

This meant that fn:concat (which takes at least 2 arguments) would be declared as:

declare function fn:concat(
    $arg1 as xs:anyAtomicType?,
    $args as variadic array(xs:anyAtomicType?)
) as xs:string external;

Here, the proposal as currently worded requires at least 1 argument to bind to $args. Thus, fn:concat#1 and fn:concat(1) would not be valid when using the above function declaration. Thinking about this more, I think it makes sense to allow no items to bind to the variadic argument list, such that fn:concat#1 and fn:concat(1) would be valid, and fn:concat would need to be defined as:

declare function fn:concat(
    $arg1 as xs:anyAtomicType?,
    $arg2 as xs:anyAtomicType?,
    $other-args as variadic array(xs:anyAtomicType?)
) as xs:string external;

I will update the proposal to use the latter behaviour (allowing no arguments to be passed to a variadic parameter).

> "that function defines a set of functions" - we seem to be using the word "function" here in two different senses. Needs greater precision.

The former is the function declaration; the latter is the resulting (function, arity) pairs that are effectively added to the statically known functions. I say "effectively" because it is impractical to add an unbounded number of functions to the static context at the point the function is declared. XQuery processors already handle this for fn:concat (and any other variadic functions they support) when it is referred to by a named function reference.

I will update the proposal to clarify the wording and behaviour.

> If a function is declared with one argument, and that argument is variadic, and the type is a type that allows arrays (for example item()), and we pass an empty array [], I can't see how we distinguish whether we are passing [] as the value of the first argument, or whether we are passing no arguments.
>
> The automatic coercion of sequence to array makes this especially problematic.

That is a good point. The intention behind this was to make it easier to write recursive functions that call the function with 1 fewer argument at each step of the recursion.

I think it would make sense for this proposal to remove that logic and just stick to the mapping from the function declaration to the statically known functions. This would mean that given:

declare function fn:concat(
    $arg1 as xs:anyAtomicType?,
    $arg2 as xs:anyAtomicType?,
    $other-args as variadic array(xs:anyAtomicType?)
) as xs:string external;

then fn:concat#2, fn:concat#3, and fn:concat#4 would return named function references to functions with 2, 3, and 4 parameters respectively, where each parameter has the type xs:anyAtomicType?. Calling any of these named function references will call the variadic fn:concat function declared above.
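
As a rough illustration, this is how the built-in fn:concat already behaves in XQuery 3.1 when referenced at different arities. The variable names are only illustrative, and a user-declared variadic function would need the proposed syntax, which is not shown here:

```xquery
xquery version "3.1";

(: Each named function reference selects one fixed-arity member
   of the fn:concat family. :)
let $concat2 := fn:concat#2
let $concat3 := fn:concat#3
return (
  fn:function-arity($concat2),   (: 2 :)
  fn:function-arity($concat3),   (: 3 :)
  $concat3("a", "b", "c")        (: "abc" :)
)
```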

I'll update this proposal accordingly. I will also consider writing a separate proposal to support the ability to "unpack" sequences or arrays into function arguments, as can be done in other languages.

Thanks for the review.

@rhdunn
Author

rhdunn commented Oct 17, 2018

I have updated the proposal to address your feedback, and to improve the wording.

@michaelhkay
Member

One way to allow sum([1,2,3]) as well as sum(1,2,3) would be to have a function expath:aggregator(f) (find a better name if you can) which takes the variadic function as input and produces the array-accepting function as its result.

@michaelhkay
Member

michaelhkay commented Oct 17, 2018

And here's another idea. Declare the function as declare %variadic(sum) function sum-array(array(xs:numeric)) as xs:numeric {....}, with an implementation that assumes an array is supplied; the annotation %variadic(sum) causes functions sum#1, sum#2, etc to be added to the static context with signatures sum(xs:numeric), sum(xs:numeric, xs:numeric) etc. (No new syntax!)
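
A minimal sketch of what the annotation might expand to, written out by hand in XQuery 3.1. The local: names and the $values parameter name are assumptions made for illustration, and only the first two arities are shown; the %variadic annotation itself does not exist, so the wrappers below stand in for what a processor would generate:

```xquery
xquery version "3.1";

(: The implementation the author writes: it assumes an array is supplied. :)
declare function local:sum-array($values as array(xs:numeric)) as xs:numeric {
  fn:sum($values?*)
};

(: What %variadic(sum) would conceptually add to the static context:
   one fixed-arity function per argument count. :)
declare function local:sum($a as xs:numeric) as xs:numeric {
  local:sum-array([$a])
};

declare function local:sum($a as xs:numeric, $b as xs:numeric) as xs:numeric {
  local:sum-array([$a, $b])
};

(
  local:sum(1, 2),             (: 3 :)
  local:sum-array([1, 2, 3])   (: 6 :)
)
```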

@rhdunn
Author

rhdunn commented Oct 17, 2018

I'm not immediately opposed to using an annotation to specify variadic arguments. There are some issues that would need to be resolved for me to support it fully:

  1. Annotations are only supported in XQuery, so will not be usable in XPath inline function expressions.
  2. Annotations only allow Literal values, so do not support EQNames. This means it would need to be specified as a string and parsed as an EQName like string-based PITests. This makes it harder to spot errors when writing a variadic function and could cause confusion with people trying to write it without quotes.
  3. How do you support making the variadic version public but the non-variadic version private (e.g. when defining fn:concat or other vendor-specific variadic functions)?
  4. How are NCNames expanded?
  5. Would QNames and URIQualifiedNames be supported?
  6. How does this handle variadic functions that have other non-variadic parameters (e.g. the first one/two arguments of fn:concat, or the various vendor-specific functions that have variadic parameters)?

One possibility would be something like:

declare %variadic function fn:concat(
    $arg1 as xs:anyAtomicType?,
    $args as xs:anyAtomicType?
) as xs:string external;

Here, the last parameter would be variadic and add fn:concat#2, fn:concat#3, fn:concat#4, etc. to the static context with signatures fn:concat(xs:anyAtomicType?, xs:anyAtomicType?), fn:concat(xs:anyAtomicType?, xs:anyAtomicType?, xs:anyAtomicType?), etc.

This would still mean that the solution would be XQuery-specific, and does not provide a mechanism to access the variadic arguments. Access to the arguments could be done via expath:variadic-arguments($args) or similar, which would return an array of the variadic arguments passed to the %variadic function -- this is similar to how variable arguments are accessed in C. To follow C semantics, this would be the arguments after $args, i.e. not including $args.

The parameters-to-array behaviour is how languages like Java and C# implement variadic arguments (and C++ for variadic template types).

@adamretter
Member

adamretter commented Oct 21, 2018

My 2 Cents

Hmmm... I want to think a bit about the initial impetus for this feature and its real use-cases.

As I understand it, @rhdunn took inspiration from fn:concat.

I want to try and make the argument that in my opinion fn:concat is badly designed and its approach should not be followed. My intention is not to cause offence to anyone who was involved in its design. Rather I want to explain how I think a simpler design would have served equally well, and then raise the question of whether we really need variadic arguments.

fn:concat

fn:concat($arg1 as xs:anyAtomicType?,
          $arg2 as xs:anyAtomicType?,
          ...) as xs:string

> This function accepts two or more xs:anyAtomicType arguments and casts each one to xs:string. The function returns the xs:string that is the concatenation of the values of its arguments after conversion. If any argument is the empty sequence, that argument is treated as the zero-length string.
>
> The fn:concat function is specified to allow two or more arguments, which are concatenated together. This is the only function specified in this document that allows a variable number of arguments. This capability is retained for compatibility with XML Path Language (XPath) Version 1.0.

My problem with fn:concat

Firstly, I might be educating only myself here, but the part "capability is retained for compatibility with XML Path Language (XPath) Version 1.0" seems to be key. XPath 1.0 had no notion of sequences, and so presumably the authors of fn:concat needed a way to allow N parameters to concat, but didn't have the luxury of using a sequence. So they got creative and used "...". The "..." syntax and behaviour are very similar to Java's Varargs.

The part that confuses me in the current design of fn:concat though is they state:

> This function accepts two or more xs:anyAtomicType arguments

However, they allow those arguments to be empty sequences! Why? Convenience I guess, so you don't have to worry about having a value or an empty sequence.

Interestingly, the evaluation of fn:concat((), ()) is "" (i.e. an empty string).

It seems to me that without changing either the call-sites or the evaluation behaviour, they could have had the simpler definition of:

fn:concat($args as xs:anyAtomicType?...) as xs:string

Without Backwards Compatibility

If we pretend that backwards compatibility had not been an issue, we might imagine that the authors would not have needed Varargs, and would instead have designed fn:concat using a sequence parameter:

fn:concat($args as xs:anyAtomicType*) as xs:string

So that is nicer as we have not had to do some hand-waving, i.e.: "the only function specified in this document that allows a variable number of arguments", without actually defining a variable arguments concept in the spec.

But we can't do that if we want backwards compatibility, and we do want that. We don't need to do anything about fn:concat, it is already in the spec and is not going away.
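
For reference, the sequence-based signature above can be sketched in XQuery 3.1 today. The name local:concat is hypothetical and chosen only for illustration; note that empty sequences inside the argument simply disappear when the sequence is flattened, which gives the same result as treating them as zero-length strings:

```xquery
xquery version "3.1";

(: Hypothetical sequence-based variant; local:concat is not part of any spec. :)
declare function local:concat($args as xs:anyAtomicType*) as xs:string {
  (: Convert each item to a string, then join with no separator. :)
  fn:string-join($args ! fn:string(.), "")
};

local:concat(("a", 1, (), "c"))   (: "a1c" :)
```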

Do we need varargs at all?

I don't see the use-case for varargs as being framed by the heritage of fn:concat. However, for argument's sake, let's say that fn:concat had been designed in the non-backwards-compatible way using a sequence. Would we be happy with it?

To call the sequence-based fn:concat we would need to use an explicit sequence constructor unless we were passing by reference, e.g.

fn:concat( ("a", "b", "c") )

All those parentheses don't look very nice now, do they! Surely we could define something in the spec to help us?
Ah! Perhaps this is what variadic function arguments are needed for?

Two approaches jump to mind:

  1. An implicit approach. Where we define some spec that says something like - "For functions that have a parameter in the last position of their parameter list which is a sequence, that sequence may be specified at the calling site as either a sequence or as multiple arguments".

  2. An explicit approach. We define the syntax for ... to represent Varargs in the same way as Java and C. Varargs can only be the last parameter in the parameter list.

There is little difference between (1) and (2), the outcomes are the same. Both will lead to situations during static analysis where looking up a function-call to find the defined function-signature leads to ambiguity.

The question is, how do we handle that ambiguity? Likely we take the "road well travelled" and raise a static error in exactly the same way that Java or C does. Other options are available ;-)

So far I have assumed that Varargs is just a sequence. Sure there is the argument that with Varargs we might need sequences of sequences, etc, but I see that as a later discussion.

Conclusion

Is there a strong use-case for Varargs? I am not personally convinced.

Is calling something like fn:concat( ("a", "b", "c") ) really that bad? Personally, I don't think so. For beginners, maybe, but then it would force them to get to grips with sequences sooner, which can only be a good thing.

@michaelhkay
Member

michaelhkay commented Oct 21, 2018

Treating an empty sequence as a zero-length string is something that all the XPath 1.0 string-handling functions do, and it makes eminent sense in the case of concat(), though I'm less comfortable with it elsewhere (e.g. substring()). You also seem to be commenting on the fact that concat() was defined to take two-or-more strings rather than zero-or-more. I remember we had a debate about that during the 2.0 development and it was a classic "orthogonality is good" versus "is there a use case?" debate.

@rhdunn
Author

rhdunn commented Oct 21, 2018

The Motivation

The motivation for this was based on fn:concat and vendor-specific functions that I mentioned in the last update. Specifically:

  1. fn:concat -- W3C, variadic over xs:anyAtomicType?
  2. out:format -- BaseX, variadic over item()
  3. xdmp:apply -- MarkLogic, variadic over item()*
  4. sem:coalesce -- MarkLogic, variadic over item()*

Thinking in terms of annotations, there are also annotations that take variadic arguments:

  1. rest:consumes / rest:produces -- EXQuery, variadic over xs:string
  2. rest:cookie-param, rest:form-param, etc. -- EXQuery, variadic over xs:string

The motivation then is to be able to specify these using valid XPath/XQuery syntax.

Requirements for a Variadic Argument Proposal

  1. MUST be able to specify the existing built-in variadic functions in use.
  2. MAY be able to specify existing annotations in use.
  3. MUST support type checking of each variadic argument passed to the variadic function.
  4. MUST support named function references of any arity.
  5. MUST allow sequences (empty, or of any length) to be passed as individual arguments -- this is for MarkLogic's xdmp:apply and sem:coalesce.

The Design

For the design, I took inspiration from Java, which maps the variable arguments into an object array. The variadic arguments needed to support nested sequences at a given parameter, and to supply an optional type for the variadic parameters. This is how I ended up with the syntax I proposed.

@michaelhkay
Member

As for the general question, is a general varargs mechanism useful, I think the answer is probably yes, so long as it can be done very cleanly. concat() is a kludge; eliminating the kludge would be good; extending the kludge so that it affects more of the language would not be good.

@adamretter
Member

adamretter commented Oct 21, 2018

> You also seem to be commenting on the fact that concat() was defined to take two-or-more strings rather than zero-or-more. I remember we had a debate about that during the 2.0 development and it was a classic "orthogonality is good" versus "is there a use case?" debate.

@michaelhkay Unless I missed something, fn:concat doesn't take two or more strings. Each of the first two arguments has a cardinality of zero-or-one.

As each of those arguments can be an empty sequence, it seems to me that fn:concat as it is currently defined actually takes zero or more strings. For example, fn:concat( (), () ) is perfectly valid with the current spec and involves no string arguments. Maybe I missed something about how implicit conversions are done?

@adamretter
Member

adamretter commented Oct 21, 2018

@rhdunn I appreciate you restating your motivation and your requirements, I think that is helpful :-)

I am a little uneasy about pointing to annotations as an example for variadic parameters though. Annotations don't have parameters, rather they have explicit literal values given in a list.

@michaelhkay
Member

@adamretter, the fn:concat() function requires two or more arguments each of which is converted to a string by applying the string() function. So after this conversion, it takes two or more strings.

There's no logical reason why it should not allow 0 arguments or 1 argument, but the WG decided not to allow that for lack of a use case. Which might have been a reasonable argument at the time, but feels flawed now that we have constructs like fn:function-lookup() and fn:apply() that would be simpler and cleaner if fn:concat#0 and fn:concat#1 existed.

@rhdunn
Author

rhdunn commented Oct 21, 2018

@adamretter I'm happy to defer the specification of variadics on annotations -- I was thinking about it in terms of my annotation declaration proposal, but we could talk about them in that proposal.

@michaelhkay
Member

I think I would go for the following design:

  • the syntax of the function declaration adds an optional * after the parameter name to indicate that the parameter may occur zero or more times: for example declare function f:product($val* as xs:numeric) as xs:numeric. The * can appear only on the last parameter; there can be preceding parameters. (Note: this is partly with a view to also allowing $val? to make the last parameter optional).

  • the equivalent XSLT syntax is <xsl:param name="val" as="xs:numeric" repeatable="yes"/>

  • the effect of the declaration is to add an infinite set of functions to the static context with arity F, F+1, F+2, ... (up to the limit defined by the implementation) where F is the number of parameters preceding the variadic one.

  • the caller can invoke any of these functions by supplying the requisite number of arguments.

  • if a generic call supplying an array (or sequence) is required, then (a) the designer of the function library can provide a different function that expects an array (or sequence), or (b) if they fail to do so, the user can construct such a function by combining the capabilities of fn:function-lookup() and fn:apply(). Alternatively we could provide a function expath:variadic(function()) as function() which takes any one of the infinite set of declared functions as input and returns the generic (array-accepting) function as its result.

  • The implementor of the function writes the function body as if the argument $x* as xs:numeric were declared as $x as array(xs:numeric).
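
Option (b) above, combining fn:function-lookup() and fn:apply(), can be sketched in XQuery 3.1 against the existing fn:concat. The helper name local:concat-array is an assumption made for illustration; because fn:concat#0 and fn:concat#1 do not exist today, the sketch handles those sizes by hand:

```xquery
xquery version "3.1";

declare namespace array = "http://www.w3.org/2005/xpath-functions/array";

(: Hypothetical helper: local:concat-array is not part of any spec. :)
declare function local:concat-array(
    $args as array(xs:anyAtomicType?)
) as xs:string {
  let $n := array:size($args)
  return
    if ($n ge 2) then
      (: Look up the fixed-arity member fn:concat#n, then spread the
         array members over its parameters. :)
      fn:apply(fn:function-lookup(xs:QName("fn:concat"), $n), $args)
    else
      (: No fn:concat#0 or fn:concat#1 exists, so join the members directly. :)
      fn:string-join($args?* ! fn:string(.), "")
};

local:concat-array(["a", "b", "c"])   (: "abc" :)
```

With a declared variadic function whose family starts at arity 0, the else branch would presumably not be needed.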

@fgeorges
Member

I like the idea of allowing $val? as well. I often have something like the following:

declare function f() { f(()) };
declare function f($un) { f($un, ()) };
declare function f($un, $deux) { f($un, $deux, ()) };
declare function f($un, $deux, $trois) { concat($un, $deux, $trois) };

which could be replaced with (at least if the default values are always () as above):

declare function f($un?, $deux?, $trois?) { concat($un, $deux, $trois) };

@adamretter
Member

> the syntax of the function declaration adds an optional * after the parameter name to indicate that the parameter may occur zero or more times: for example declare function f:product($val* as xs:numeric) as xs:numeric

So I like the idea, but I would request different syntax. I think things like f:product($val* as xs:numeric*) will lead to confusion. Think of the poor users who put their * in the wrong position ;-)

@fgeorges
Member

@adamretter What about something inspired by C++ then, like the following?

declare function f($un, $deux, $trois, ...) { 'result' };

$trois is a variadic argument, because of the , ... (it is not possible to use the notation $trois..., as dots are legal characters in QNames). And then for the optional arguments, it would be those with an explicit default value:

declare function f($un := 1, $deux := '2', $trois, ...) { 'result' };

@michaelhkay
Member

michaelhkay commented Oct 24, 2018

My only reservation about this is that it feels as if $trois must occur 1-to-many (or even 2-to-many) times, rather than 0-to-many. Otherwise it seems OK. And I agree, the asterisk is heavily overloaded already.

How about declare function f($un, $deux, $trois ...) { 'result' }; with mandatory whitespace?

@ChristianGruen
Member

> How about declare function f($un, $deux, $trois ...) { 'result' }; with mandatory whitespace?

+1. I liked the (potential) idea of making the last parameter optional via $trois?; but I also believe it could be too much of a burden for inexperienced users. And the three dots are well-established.

@fgeorges
Member

@michaelhkay Sounds good. The space would then not be mandatory per se, but only as a consequence of needing 2 different tokens out of the lexer (which would not be the case for $arg... with no space, but would be OK with $arg as item()...).

@rhdunn
Author

rhdunn commented Nov 10, 2018

I have updated the proposal to reflect the proposed ... syntax. The text is available at https://github.com/expath/xpath-ng/blob/0dded843cf1e7e21d357c9360bf5faf5b9e1e129/variadic-function-arguments.md.

…the function arity; add an ArrowFunctionSpecifier example.

6 participants