[RFC] Object method message passing syntax #1160

Open
mhermier opened this issue Mar 22, 2023 · 28 comments

Comments

@mhermier
Contributor

mhermier commented Mar 22, 2023

Hi,
One of the last things holding me back from updating #1006 is that I lack a way to implement method invocation. Ideally it would look like:

some_method_mirror.call(receiver, args)

but there are technical issues that prevent doing that in a module (basically, I need at least a primitive or language support). I have been struggling with this for a while.
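For readers who want a concrete picture of the call shape above, here is a rough Python analogue. Python is used only because it already ships the missing primitive (`getattr`, a by-name method lookup at run time). `MethodMirror` and `Greeter` are hypothetical names for illustration, not the API of #1006.

```python
class MethodMirror:
    """Hypothetical stand-in for a method mirror: it stores a method name
    and can invoke that method on any receiver."""

    def __init__(self, name):
        self.name = name

    def call(self, receiver, args):
        # getattr looks the method up by name at run time. This lookup is
        # the piece that needs a primitive or language support in Wren.
        method = getattr(receiver, self.name)
        return method(*args)


class Greeter:
    def greet(self, greeting, name):
        return f"{greeting}, {name}!"


mirror = MethodMirror("greet")
result = mirror.call(Greeter(), ["Hello", "Wren"])
print(result)
```

The mirror itself is trivial once the lookup primitive exists, which is why the discussion centers on how to expose that primitive.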

One idea that came to mind (after watching a video about OOP today) would be to add language support for that feature.

I have two ideas for the syntax:

var symbol = "call(_,_)"

// `@` is a placeholder token, but I like it (or maybe `@@`)
receiver@("call()")
receiver@(symbol, arg0, arg1)
// or
receiver."call()"(arg0, arg1)
receiver."%(symbol)"(arg0, arg1)

Implementation-wise, it would make more sense to have the symbol string as the last parameter, so the opcode would not need to shift all the arguments on the stack after resolving the symbol index. But these syntaxes feel quite natural.
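The stack-shifting cost can be pictured with a toy model. The sketch below is not Wren's VM; it assumes a plain list as the value stack and that a regular call expects the receiver and arguments to sit contiguously on top. Both function names are made up for illustration.

```python
def prepare_symbol_first(stack, num_args):
    """Layout: [..., receiver, symbol, arg0, ..., argN-1]."""
    symbol_index = len(stack) - num_args - 1
    symbol = stack[symbol_index]
    moves = 0
    # Every argument has to slide down one slot to close the gap
    # left by the symbol, so the cost grows with the argument count.
    for i in range(symbol_index, len(stack) - 1):
        stack[i] = stack[i + 1]
        moves += 1
    stack.pop()
    return symbol, moves


def prepare_symbol_last(stack, num_args):
    """Layout: [..., receiver, arg0, ..., argN-1, symbol]."""
    # The symbol is on top, so a single pop leaves a normal call frame.
    symbol = stack.pop()
    return symbol, 0


first = ["recv", "call(_,_)", "a", "b"]
last = ["recv", "a", "b", "call(_,_)"]
first_result = prepare_symbol_first(first, 2)
last_result = prepare_symbol_last(last, 2)
print(first, first_result)
print(last, last_result)
```

Both layouts end with the same frame; the only difference in this model is the number of slots moved before the call.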

With the first syntax, since the symbol is evaluated as a regular value, it must be inside the parentheses to avoid parsing problems with the mixed getter/setter syntax, as shown in the following example:

receiver@foo.getMirror().toString(arg0, arg1) // That cannot be parsed
receiver@(foo.getMirror().toString, arg0, arg1) // Parse nicely

The second syntax has the advantage of not consuming a token for that feature, but costs the allocation of a string in what I consider the more general usage. edit: Thinking about it again, the syntax is viable if the string interpolation implementation and String.+(_) are optimized (they are really under-optimized right now...).

edit: It should also help with an otherwise unsolvable problem of the constructor syntax. Because of the way constructors are implemented, you can't call a constructor with a different name. With super, we end up with chains of construct new, which defeats the point of having named constructors. It is a secondary usage that I would not recommend (use super where possible), but it at least gives a solution to that problem.

@PureFox48
Contributor

If we need language support for this then, to my mind, the nicer syntax of the two would be:

receiver@(symbol, arg0, arg1)

If #1154 lands, then @ would already be an available token in the language. I see no particular reason to double it even if the receiver were a class rather than an instance thereof.

Although I note what you say about having the symbol string as the last parameter, placing it first seems more natural to me.

@mhermier
Contributor Author

If #1161 lands, the second syntax will become viable. It would only cost a ".toString" call, which should not allocate in the general case. So it is an overhead I'm willing to pay.

@mhermier
Contributor Author

Added a comment about how it can help with the differently named constructor issue.

@PureFox48
Contributor

PureFox48 commented Mar 23, 2023

Yes, that's a point worth making as the current situation of not being able to directly call a parent constructor with a different name using super is unsatisfactory.

Another thing I noticed recently is that one can have both a constructor and a static method with the same signature - apparently the last one to be declared always wins!

@mhermier
Contributor Author

mhermier commented Mar 23, 2023

This is by design, and is one of the things I really don't like. It defeats the point of named constructors in some way...

class Foo {
  construct foo() {} // This creates `static foo()` and `init foo()`; the latter is not accessible by normal means
}

So what you are probably trying to do is:

class Foo {
  construct foo(...) { foo(...) }
  foo(...)
}

@PureFox48
Contributor

There's actually a fix for it, namely #820, which is how I came to notice it in the first place.

Not sure why this wasn't merged into Wren 0.4.0 unless there's some problem with it.

@mhermier
Contributor Author

For me that behavior is broken. It would make more sense to have them as regular functions. They would be callable from a constructor by normal means, and would allow creating constructors and modifiers in one go.

@mhermier
Contributor Author

mhermier commented Mar 24, 2023

I implemented the obj."method"(args) syntax, so far it works great. I still have to implement super."method"(args).

Though there is something that I don't really like.

class Foo {
  static foo() { "bar()"() }
  static bar() { }
}

This should technically be valid, but it is too complicated for the compiler, because of string interpolation.

Using the @ syntax solves this, but I'm not pleased to have @ as both a binary operator and a unary operator with implicit this.

class Foo {
  static foo() {
    @("bar()") // This feels more awkward to me than
    ."bar()"() // that
  }
  static bar() { }
}

I probably won't bother for Wren, and will only allow them with an explicit receiver.
But this is one more argument for me to have a unary . operator as implicit this in veery...

@PureFox48
Contributor

Why don't you just specify that if the "bar()" syntax is used (or the @ syntax for that matter), then there must be an explicit receiver, even if it's just this?

If nothing else, that will prevent people from mixing it up with ordinary strings.

@mhermier
Contributor Author

For Wren, I'll keep it simple and require an explicit receiver.

The thing is that allowing the obj."method"(arg) syntax blurs the difference between a method name and a string. So it is reasonable to assume we can use that syntax in other contexts, like an implicit this method call. Strictly speaking, unless I missed something, nothing prevents adding this rule to the grammar (only the compiler does). But I find its interaction with name resolution disturbing (to the human eye), and it makes me reconsider the old idea of a unary prefix `.`. Visually, we can see it is an implicit this call. In the normal case, .call(), it lets the compiler avoid round trips in name resolution, since that syntax makes the identifier an explicit method name.

@mhermier
Contributor Author

I just thought about another syntax that is a mix of both:

receiver.@"method"(args)
@"method"(args) // implicit this

More interesting?

@PureFox48
Contributor

I hadn't really thought it through before, but if "method" could be a variable (m, say) rather than a string literal, then you're going to need syntax such as receiver.@"method"(args) for disambiguation purposes.

This is because if people could write receiver.m(args) then it might mean they were calling a method m in receiver's class which would therefore be ambiguous. Having to write receiver.@m(args) would remove the ambiguity.

@"method"(args) or @m(args) would also then be fine as the implicit receiver could be nothing other than this.
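The ambiguity is easy to reproduce in any language with dynamic lookup. In this Python sketch (an analogy only; `Robot` is a made-up class), `robot.m()` plays the role of `receiver.m(args)` and `getattr(robot, m)()` plays the role of the proposed `receiver.@m(args)`.

```python
class Robot:
    def m(self):
        return "method literally named m"

    def walk(self):
        return "walking"


m = "walk"          # a variable holding a method name
robot = Robot()

# Plain dot syntax always means "the method called m",
# regardless of what the variable m contains.
direct = robot.m()

# A distinct lookup form is needed to say "the method whose name
# is stored in m"; this is the job the @ marker would do.
dynamic = getattr(robot, m)()

print(direct)
print(dynamic)
```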

@mhermier
Contributor Author

True, and interesting. Though it would mean that support for full-fledged expressions would need to be added. So
receiver.@(expr.to.method)(args) would make sense, and whether the extra '()' is needed is a separate debate.
Still, I would rely on string interpolation to provide some safety and simplicity.

@PureFox48
Contributor

Yeah, logically the method signature would have to be an arbitrary expression which evaluated to a string.

It starts to get a bit complicated though as you could have function calls and all sorts of stuff in that expression. Having said that, even if you restricted it to string literals you could still have arbitrary interpolated expressions within that!

As long as it's manageable, perhaps this sort of complication isn't a bad thing as it gives folks more freedom to define the method signature. However from an aesthetic viewpoint it's not so nice.

One thing you could do would be to restrict the expression following @ to being a simple variable. However, if you did that, then folks would need to assign the method signature expression to the variable before they could call the method so it would always be a two-stage process.

@mhermier
Contributor Author

mhermier commented Mar 26, 2023 via email

@PureFox48
Contributor

I'm not sure the average user of an embedded scripting language would be sufficiently versed in OO design to consider stuff such as the Visitor pattern but I agree it could make things simpler for those who are.

@mhermier
Contributor Author

I'm debating whether to add the super version of it. It would be a very niche usage on top of a very niche one. I have difficulty finding a justification for it, but for the sake of completeness...

@PureFox48
Contributor

If it's not too difficult, I think I'd add it for completeness' sake.

Incidentally, I'm pleased you mentioned the Visitor pattern as it gave me an idea for a new Rosetta Code task. Whilst I've used this pattern in the past in C#, I haven't used it in Wren so it was quite interesting to code some examples :)

@mhermier
Contributor Author

With the implementation of the veery compiler, I'm completely drowning in the deepest waters of the visitor pattern because of the AST. From syntax checking and optimization passes to serialization of the results, everything is the visitor pattern. To make things worse, to be more optimal you can't just traverse the tree once per optimization pass. You have to traverse it at most once for all enabled checks and transformations, and ideally once per serialization, if it can't be blended with them XD

@PureFox48
Contributor

Hmm, one thing I've found with several of the Gang of Four patterns is that they seem great in theory using simple examples. But when they collide with the reality of a complex application it can be a different story making one wonder whether it's better just to slog through repetitive (but simple) code and forget about the pattern.

@mhermier
Contributor Author

@PureFox48 why did you not consider doing accept(visitor) { visitor.visitFoo(this) }?

They are conceptual tools that work great. But they are usually not used as is, and are usually blended with many other considerations.

@PureFox48
Contributor

If you're talking about the RC Java example, I did consider it but, as the actions were very simple, I decided instead to have a single method which checked the run time type of the element and then performed the associated action.

In a more realistic example, the actions would be more complicated and I'd then have separate methods for each element type.
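The two approaches being compared can be put side by side. This is a Python sketch, not the Rosetta Code entry itself; `Literal` and `Addition` are borrowed from the C# example mentioned in this thread, while `Evaluator` and `evaluate_by_type` are names invented here.

```python
class Literal:
    def __init__(self, value):
        self.value = value

    def accept(self, visitor):
        # Double dispatch: the node picks the visitor method for its type.
        return visitor.visit_literal(self)


class Addition:
    def __init__(self, left, right):
        self.left = left
        self.right = right

    def accept(self, visitor):
        return visitor.visit_addition(self)


class Evaluator:
    """Classic visitor: one method per element type."""

    def visit_literal(self, node):
        return node.value

    def visit_addition(self, node):
        return node.left.accept(self) + node.right.accept(self)


def evaluate_by_type(node):
    """Alternative: a single function that checks the run time type."""
    if isinstance(node, Literal):
        return node.value
    if isinstance(node, Addition):
        return evaluate_by_type(node.left) + evaluate_by_type(node.right)
    raise TypeError(f"unknown node type: {type(node).__name__}")


expr = Addition(Literal(1), Addition(Literal(2), Literal(3)))
visited = expr.accept(Evaluator())
checked = evaluate_by_type(expr)
print(visited, checked)
```

With two node types and one operation the type-checking version is shorter; the `accept` version tends to pay off as the number of operations over the same tree grows.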

@mhermier
Contributor Author

Well, since it is somewhat teaching material, I think you should reconsider. But I'm not forcing you.

@PureFox48
Contributor

PureFox48 commented Mar 27, 2023

The C# example uses different 'visit' methods for the Literal and Addition classes so I'm happy to leave the Java example as an alternative way of tackling the problem for a dynamic language.

There are, in any case, likely to be a wide range of task solutions as many of the languages used on RC are not OO at all. It will be interesting to see how they approach it.

@mhermier
Contributor Author

About handling super, it is not very hard. But the final patch will introduce 34 new opcodes...

@PureFox48
Contributor

Wow, that's a lot of extra opcodes!

Is the reason why you need so many because you're trying to get good performance?

In C# for example, the last time I looked, it was taking about 5 times longer to call a method via reflection as it was to call it directly. The devs (and by and large the users) just accept this penalty as it's not something that needs to be done very often.
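Anyone curious about the size of this penalty in their own environment can measure it with a micro-benchmark along these lines. The sketch uses Python's `getattr` as a stand-in for reflection, so the numbers say nothing about C# or Wren; `Counter` is a made-up class and no particular ratio is implied.

```python
import timeit


class Counter:
    def __init__(self):
        self.total = 0

    def add(self, n):
        self.total += n
        return self.total


counter = Counter()
name = "add"
runs = 100_000

# Direct call: the method name is fixed in the source.
direct_time = timeit.timeit(lambda: counter.add(1), number=runs)

# Dynamic call: the method is looked up by a string on every call.
dynamic_time = timeit.timeit(lambda: getattr(counter, name)(1), number=runs)

print(f"direct:  {direct_time:.4f}s")
print(f"dynamic: {dynamic_time:.4f}s")
```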

@mhermier
Contributor Author

No, it is to mirror how the CALL opcodes are handled, so I followed the existing design. In theory we could have only one opcode for all the call variations, but consuming configuration/mode bytes from the assembly seems very slow. Though in the process we could support more arguments. I'll implement it the normal way, but I'll give the configuration bytes approach another shot, since methods usually have fewer than 3 parameters, so having those hardcoded should be enough for most cases.
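The trade-off between the two encodings can be sketched with a toy decoder. Wren's existing CALL_0 through CALL_16 opcodes bake the argument count into the opcode; if the new call and its super variant each mirror that family, that would account for 17 + 17 = 34 opcodes (my reading, not confirmed in the thread). Everything else below is assumed for illustration: the opcode numbers are invented, and the one-byte symbol operand is a simplification.

```python
# Hypothetical opcode numbers; a real VM assigns its own values.
CALL_0 = 10        # CALL_0 .. CALL_16 occupy 10 .. 26
MAX_ARGS = 16
CALL_N = 40        # single opcode; arity comes from an operand byte


def decode_per_arity(code, ip):
    """One opcode per argument count: the arity is part of the opcode."""
    num_args = code[ip] - CALL_0
    if not 0 <= num_args <= MAX_ARGS:
        raise ValueError("not a CALL_n opcode")
    symbol = code[ip + 1]
    return num_args, symbol, ip + 2


def decode_with_operand(code, ip):
    """One opcode for all arities: the arity is read from the stream."""
    if code[ip] != CALL_N:
        raise ValueError("not a CALL_N opcode")
    num_args = code[ip + 1]     # extra read on every call
    symbol = code[ip + 2]
    return num_args, symbol, ip + 3


per_arity = decode_per_arity(bytes([CALL_0 + 2, 7]), 0)
with_operand = decode_with_operand(bytes([CALL_N, 2, 7]), 0)
many_args = decode_with_operand(bytes([CALL_N, 200, 7]), 0)
print(per_arity, with_operand, many_args)
```

The operand form costs one more byte and one more read per call, but it is not capped at 16 arguments, which matches the "more arguments" remark above.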

@mhermier
Contributor Author

I have the implementation done. It is integrated inside #1006 (because it obviously uses it for method mirroring invocation). I can split both on demand.
