Feature: Added Methods #320

DawDavis · 2020-11-03T09:25:42Z

OVERVIEW

This PR aims to bring receiver-like functionality to Tengo, allowing for getters, setters, mutators, and more. Overall, the change makes it much easier to apply some OOP principles in Tengo, much as the 'this' keyword does in JS.

A method declaration is syntactically defined as follows:

//derived from func (r *receiver) method (argument interface{}) {...}
method := func(receiver)(argument) { ... }

This is done to alleviate two major concerns I had with this feature:

JS's unusual 'this' semantics would not be idiomatic in Tengo.
Go methods require a type declaration, but Tengo specifically avoids declarations.

Specifics

These proposed Tengo receivers carry no guarantee about their underlying type: the receiver could be any indexable Tengo type, or undefined if the method is called outside a parent context. For example:

f1 := func(r)() { return r }

a := f1() // a = undefined

m := { f2: f1 }

b := m.f2() // b = m
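A rough Go model of this call-site rule may help. In this sketch every name (Object, Map, Method, CallFrom) is hypothetical and not part of Tengo's internals; nil stands in for undefined, and a receiver is supplied only when the function is reached through its container:

```go
package main

// Object stands in for any Tengo value; nil plays the role of undefined.
// Every name in this sketch is hypothetical, not part of Tengo's API.
type Object interface{}

// Map is a stand-in for a Tengo map. It is used through a pointer so that
// receivers can be compared by identity.
type Map struct {
	Fields map[string]Object
}

// Method models a function declared as func(r)(args...) { ... }.
type Method struct {
	Receiver Object // nil ("undefined") unless supplied by the call site
	Body     func(r Object, args ...Object) Object
}

// Call invokes the body with whatever receiver this method value holds.
func (m Method) Call(args ...Object) Object {
	return m.Body(m.Receiver, args...)
}

// CallFrom simulates m.f2(): the container the method was selected from
// becomes the receiver for this one call. The parameter m is a copy, so
// the Method stored in the container is left unbound.
func CallFrom(container Object, m Method, args ...Object) Object {
	m.Receiver = container
	return m.Call(args...)
}
```

Calling f1 directly mirrors `a := f1()` (receiver nil), while CallFrom mirrors `m.f2()` (receiver m).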

You may notice that unlike JS, where 'this' contains some scope information no matter the context, Tengo's receivers very often carry undefined as their value.

Another important question is how this differs from a self-capturing closure, e.g.:

m := { get: func() { return m }}

a := m.get() // a = m

At first glance, this serves a similar purpose; however, the relationship breaks down if you attempt to copy m.

m := { get: func() { return m }}
o := copy(m)
a := o.get() // a = m, even though we copied it.

This is because closures capture at definition time, whereas receivers capture at call-time. The same operation, when done with a receiver, acts as a method call:

m := { get: func(r)() { return r }}
o := copy(m)
a := o.get() // a = o
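The difference between definition-time and call-time capture can be modeled in plain Go. This is an illustration only: Map, Copy, and the two constructors are hypothetical stand-ins, and Copy here is a shallow field copy rather than Tengo's actual copy builtin.

```go
package main

// Map is a hypothetical stand-in for a Tengo map.
type Map struct {
	Fields map[string]interface{}
}

// Copy returns a new Map with the same field values (a shallow copy).
func (m *Map) Copy() *Map {
	fields := make(map[string]interface{}, len(m.Fields))
	for k, v := range m.Fields {
		fields[k] = v
	}
	return &Map{Fields: fields}
}

// newClosureMap mirrors m := { get: func() { return m } }: the function
// captures this particular m when it is defined.
func newClosureMap() *Map {
	m := &Map{Fields: map[string]interface{}{}}
	m.Fields["get"] = func() *Map { return m }
	return m
}

// newReceiverMap mirrors m := { get: func(r)() { return r } }: the map is
// supplied as an argument at call time instead of being captured.
func newReceiverMap() *Map {
	return &Map{Fields: map[string]interface{}{
		"get": func(r *Map) *Map { return r },
	}}
}

// callClosure simulates o.get() for the closure style.
func callClosure(o *Map) *Map {
	return o.Fields["get"].(func() *Map)()
}

// callMethod simulates o.get() for the receiver style: o itself is passed in.
func callMethod(o *Map) *Map {
	return o.Fields["get"].(func(*Map) *Map)(o)
}
```

With the closure, calling get on a copy still yields the original map; with the receiver, it yields the copy.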

Implementation

Receivers are implemented by altering the way free variables are stored when a function is indexed out of a map or array. During parsing, a look-ahead is performed on every selector or index expression to see whether it feeds directly into a call operation. If it does, the expression marks itself as a method call.
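A minimal sketch of that look-ahead, assuming a toy AST where the next token is passed in as a string; SelectorExpr, IndexExpr and the IsMethodCall field are illustrative names, not necessarily what the branch uses:

```go
package main

// Expr is any expression node in this hypothetical mini-AST.
type Expr interface{}

// Ident is a bare identifier such as m.
type Ident struct{ Name string }

// SelectorExpr is x.sel; IsMethodCall records the look-ahead result.
type SelectorExpr struct {
	X            Expr
	Sel          string
	IsMethodCall bool
}

// IndexExpr is x[index]; IsMethodCall records the look-ahead result.
type IndexExpr struct {
	X, Index     Expr
	IsMethodCall bool
}

// parseSelector builds x.sel after peeking at the next token: a "(" means the
// selector feeds directly into a call, so it is flagged as a method call.
func parseSelector(x Expr, sel, next string) *SelectorExpr {
	return &SelectorExpr{X: x, Sel: sel, IsMethodCall: next == "("}
}

// parseIndex performs the same look-ahead for x[index].
func parseIndex(x, index Expr, next string) *IndexExpr {
	return &IndexExpr{X: x, Index: index, IsMethodCall: next == "("}
}
```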

At compile time, a selector or index expression is then compiled to the OpIndex opcode followed by a new flag operand, akin to the spread flag, that indicates whether the indexing operation should attach its context (the container being indexed) to the indexed object.
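Sketched as bytecode emission, assuming a one-byte operand after OpIndex (the opcode value and the emitIndex/readIndex helpers are invented for this example):

```go
package main

// Opcode values here are invented; Tengo's real numbering differs.
type Opcode byte

const OpIndex Opcode = 1

// emitIndex appends OpIndex plus a one-byte flag operand: 1 asks the VM to
// attach the container as the receiver, 0 is an ordinary index.
func emitIndex(code []byte, isMethodCall bool) []byte {
	var flag byte
	if isMethodCall {
		flag = 1
	}
	return append(code, byte(OpIndex), flag)
}

// readIndex decodes the OpIndex instruction at ip and returns its flag and
// the position of the next instruction.
func readIndex(code []byte, ip int) (isMethodCall bool, next int) {
	return code[ip+1] == 1, ip + 2
}
```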

Finally, at run time, OpIndex with the method flag set behaves as usual, except that it attempts to cast the retrieved object to a compiled function. If that succeeds, it copies the function, sets freevar[0] of the copy to the indexed map/array, and then continues normally.

This copy prevents the given map context from escaping this particular function call.
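The run-time step might look roughly like this in Go. Map, Array, CompiledFunction and opIndex are simplified, hypothetical stand-ins for the VM's objects and dispatch; what the sketch tries to show is the copy-then-set-slot-0 sequence, which leaves the stored function untouched:

```go
package main

// Object, Map, Array and CompiledFunction are hypothetical stand-ins.
type Object interface{}

type Map struct{ Fields map[string]Object }

type Array struct{ Elems []Object }

// CompiledFunction carries free variables; Free[0] is the receiver slot.
type CompiledFunction struct {
	Name string
	Free []Object
}

// Copy duplicates the function and its free-variable slice so that writing
// Free[0] on the copy cannot affect the original.
func (f *CompiledFunction) Copy() *CompiledFunction {
	free := make([]Object, len(f.Free))
	copy(free, f.Free)
	return &CompiledFunction{Name: f.Name, Free: free}
}

// opIndex performs an ordinary index; when isMethod is set and the result is
// a compiled function, it returns a copy whose receiver slot holds the
// container. It assumes Free always has slot 0 reserved.
func opIndex(container Object, key Object, isMethod bool) Object {
	var val Object
	switch c := container.(type) {
	case *Map:
		if k, ok := key.(string); ok {
			val = c.Fields[k]
		}
	case *Array:
		if i, ok := key.(int); ok && i >= 0 && i < len(c.Elems) {
			val = c.Elems[i]
		}
	}
	if !isMethod {
		return val
	}
	fn, ok := val.(*CompiledFunction)
	if !ok {
		return val // not a compiled function: behave as usual
	}
	bound := fn.Copy()
	bound.Free[0] = container
	return bound
}
```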

Note also that compiled functions now always carry one extra free variable, as index 0 is reserved for the receiver. If no receiver exists, a placeholder "" free variable is created to reserve the slot. This keeps function calls streamlined at run time.
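The reserved slot can be pictured as an offset applied when closures are assembled. A sketch under the assumption that captured variables are simply shifted by one (buildFree and capturedAt are hypothetical helpers):

```go
package main

// Object is a hypothetical stand-in for a Tengo value.
type Object interface{}

// receiverPlaceholder reserves index 0 when no receiver is bound.
const receiverPlaceholder = ""

// buildFree assembles a closure's free variables with slot 0 reserved for
// the receiver, so captured variables always start at index 1.
func buildFree(captured []Object) []Object {
	free := make([]Object, 0, len(captured)+1)
	free = append(free, receiverPlaceholder)
	return append(free, captured...)
}

// capturedAt maps a captured variable's original index to its shifted slot.
func capturedAt(free []Object, i int) Object {
	return free[i+1]
}
```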

Caveats

This feature does not extend to non-compiled (Go-side) functions, mainly because that would require a huge API change that I do not feel comfortable suggesting. Additionally, I question the usefulness of being able to move context out into Go space.

Accessing methods in any way that is not an "index" or "select" operation results in an undefined receiver. To my mind, this is fine, mainly because those are the places where parent-access is most frequently required.

Continued extensions of this feature

I considered adding a bind keyword to the specification, which would allow a receiver to be permanently bound to a specific map/array.

m := { f: func(r)() { return r }}
fn := bind(m.f)
a := fn() // a = m, even though it was called without context

While I believe this may be a useful feature, I think its use is extremely niche.
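For comparison, Go itself already has something close to the proposed bind: a method value such as o.Get evaluates and stores its receiver at the point where it is created. The snippet below is ordinary Go; the Obj type and bindGet helper are invented for the example:

```go
package main

// Obj is an invented type used only to illustrate Go method values.
type Obj struct{ X int }

// Get has a pointer receiver, mirroring func (r *receiver) method(...).
func (o *Obj) Get() *Obj { return o }

// bindGet returns the method value o.Get. The receiver o is evaluated and
// stored now, so later calls need no context, much like bind(m.f).
func bindGet(o *Obj) func() *Obj {
	return o.Get
}
```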

Benchmarks:

Master:

-------------------------------------
fibonacci(35)
-------------------------------------
Result:  9227465
Go:      60.857359ms
Parser:  46.585µs
Compile: 131.165µs
VM:      3.346390935s
-------------------------------------
fibonacci(35) (tail-call #1)
-------------------------------------
Result:  9227465
Go:      82.520452ms
Parser:  21.779µs
Compile: 87.366µs
VM:      3.444464444s
-------------------------------------
fibonacci(35) (tail-call #2)
-------------------------------------
Result:  9227465
Go:      335ns
Parser:  20.929µs
Compile: 77.084µs
VM:      16.204µs

This Branch:

-------------------------------------
fibonacci(35)
-------------------------------------
Result:  9227465
Go:      63.743294ms
Parser:  44.97µs
Compile: 137.976µs
VM:      3.265088942s
-------------------------------------
fibonacci(35) (tail-call #1)
-------------------------------------
Result:  9227465
Go:      82.495663ms
Parser:  20.835µs
Compile: 128.924µs
VM:      3.373017762s
-------------------------------------
fibonacci(35) (tail-call #2)
-------------------------------------
Result:  9227465
Go:      305ns
Parser:  19.629µs
Compile: 79.299µs
VM:      15.721µs
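To compare the two runs, the relative change per line can be computed directly from the printed durations; Go's time.ParseDuration accepts these strings, including the µs suffix. relChange is a hypothetical helper:

```go
package main

import "time"

// relChange returns (branch - master) / master for two Go duration strings,
// e.g. the "VM:" lines above. A negative result means the branch was faster.
func relChange(master, branch string) (float64, error) {
	m, err := time.ParseDuration(master)
	if err != nil {
		return 0, err
	}
	b, err := time.ParseDuration(branch)
	if err != nil {
		return 0, err
	}
	return float64(b-m) / float64(m), nil
}
```

For the VM lines this works out to roughly -2.4% for fibonacci(35) and -2.1% for tail-call #1, while the tail-call #1 Compile line moves the other way (about +48%). These are single runs, so differences of this size may well be noise.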

geseq · 2020-11-03T14:53:31Z

Thanks for the PR. I'm gonna give this some thought.

At first glance I am comfortable with a feature like this, but I'm not entirely sure about the syntax for declaring the methods.

@d5 thoughts?

geseq · 2020-11-03T14:56:14Z

vm_test.go

+	m := { f1: func(r)() { return r == undefined}}
+	out = m.f1()`,
+	nil, false)
+}


Add tests for array as well

Thanks for pointing that out - it completely slipped my mind while writing the other cases.

I added a few that I think test the broadest categories of array operations. If there are any other unit tests needed, please let me know.

d5 · 2020-11-03T15:21:42Z

Thanks for the PR. I really like the documentation on this change. I thought a lot about adding "methods" in the past, and, I've explored Python-like syntax as well. I will definitely think more about this suggested syntax.

I'm curious, what would be the problems if we introduce a new language construct such as "this" or even "context" (maybe we can find an interesting approach that is similar to Go's context)? Just a quick thought.

EDIT: I totally misunderstood the suggested syntax. Lol. Still I will think more about it. 😄

DawDavis · 2020-11-03T15:30:04Z

Thanks for getting to this so quickly, both of you.

I hemmed and hawed over whether to use a keyword (obviously it would be "this", I think, just by convention). Several of my prototype versions used a keyword like that, but it never felt Go-like. The context idea is neat; it could perhaps even be an array of contexts, each layered on top of the previous one?

(That would be more difficult to implement, but I did consider it. There was always the question in my mind about how useful that would end up being.)

A ctx object would be interesting though, especially if the root-most object contained the context from Go.

I'm not certain either way. I landed on this syntax as a nod to how Go tends to do things.

EDIT: Oh and thanks for the compliment on the documentation changes - I spent a lot of time trying to make sure it was of the same calibre as the rest of Tengo's docs.

geseq

LGTM overall. Just two comments. An array example in the documentation would be nice. That said, should we have receivers on arrays at all? I suppose it simplifies OpIndex a little, but it seems counterintuitive.

DawDavis · 2020-11-03T20:51:27Z

@geseq Yeah, I considered omitting them from the specification, but because Tengo has no classes or type definitions, it made more sense to me not to artificially limit which collection types allow receiver passing.

DawDavis · 2020-11-03T21:44:48Z

Syntax

I've been thinking about the method declaration syntax, and these are a few thoughts I've had on the subject. This is all just speculative brainstorming. I based my ideas on this Python snippet:

class Obj:
   def __init__(self, y):
      self.x = y

   def getX(self):
      return self.x
# ...
o = Obj(23)
n = o.getX()  # n = 23

Translated to Tengo-ish syntax:

newObj := func() {
   return {
      x: 23,
      getX: func(self) {
         return self.x
      },
   }
}

n := newObj().getX() //n = 23

Pros:

The "self" keyword makes the receiver-nature of the method call obvious.

Cons:

Adding "self" to the function definition looks messy, and is a bit obtuse.

If we went with a standard keyword, self should be it, IMO. It's more intuitive than the C++/Java/JS/C#-flavored this. However, we should not include it in the definition (too cluttered). That said, if we did both of those things, it would become a compile-time challenge to figure out that a function is in fact a method. The obvious check, I suppose, would be to verify whether the self variable is ever accessed inside a function body, and then modify the FuncLitExpr to reflect that.
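The "is self ever accessed" check could be a simple recursive walk. This sketch assumes a toy AST (Ident, Selector, Block, Return and FuncLit are invented names) and assumes that a nested function literal gets its own self, so references inside it do not count for the outer function:

```go
package main

// Node is any node in this hypothetical mini-AST.
type Node interface{}

type Ident struct{ Name string }

// Selector is X.Sel; only X can be a reference to self.
type Selector struct {
	X   Node
	Sel string
}

type Block struct{ Stmts []Node }

type Return struct{ Result Node }

// FuncLit is a function literal; IsMethod would be set by the compiler.
type FuncLit struct {
	Body     *Block
	IsMethod bool
}

// usesSelf reports whether n references the identifier self outside of
// nested function literals.
func usesSelf(n Node) bool {
	switch n := n.(type) {
	case *Ident:
		return n.Name == "self"
	case *Selector:
		return usesSelf(n.X) // a field merely named self does not count
	case *Block:
		for _, s := range n.Stmts {
			if usesSelf(s) {
				return true
			}
		}
	case *Return:
		return usesSelf(n.Result)
	case *FuncLit:
		return false // nested function: its self is its own
	}
	return false
}

// markMethod sets IsMethod on f when its body refers to self.
func markMethod(f *FuncLit) {
	f.IsMethod = usesSelf(f.Body)
}
```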

To avoid that, we could set up a new definition keyword to append to func or to replace func. This version may end up looking something like:

newObj := func() {
   return {
      x: 23,
      getX: func method () {
         return self.x
      },
   }
}

n := newObj().getX() //n = 23

Here I am using method as a new keyword. I kinda hate this formatting: it's weird to actually see the word method in code... Maybe that's just a personal thing. And now that I think about it, it doesn't quite make sense to refer to this feature as a 'true' method, because the defined function is not actually bound to the parent map.

Perhaps changing func method to func& or func* would be slightly less jarring?

This format would likely open itself to having a self.parent hierarchy access syntax, which may be a nice-to-have.

However, this idea does make it less intuitive to capture a receiver via a closure.
e.g:

Current Branch

m := {
   grab: func(outer)() {
      return func(inner)() {
         return outer
      }
   }
}

f := m.grab()
f() // returns m

Possible Change with 'Self'

m := {
   grab: func*() {
      outer := self
      return func*() {
         return outer
      }
   }
}

f := m.grab()
f() // returns m

Personally, I think this is relatively niche, but I thought I'd mention it, given how many hours have been lost figuring out what 'this' refers to in JavaScript. If Tengo can prevent that sort of head-scratching at the syntax level, I think it might be worth it.

Overall, though, if we don't go with the syntax proposed in the PR description, I think the above snippet is the second-best candidate.

d5 · 2020-11-08T20:31:45Z

I've spent some time trying this, but I'm still not quite convinced we should make this change. This is a significant change that naturally increases the maintenance costs by a lot (more bugs and edge cases) and, in my opinion, makes the syntax a bit more alienating to Go coders. But, to be honest, my main concern is maintainability.

I'm actually more interested in a Go context-like approach: it is immutable, can be safely passed around, and (as @DawDavis mentioned earlier) could be useful if we use it to pass along an actual Go context value.

DawDavis · 2020-11-08T23:39:42Z

@d5, I have had half a mind to create a module that includes ctx support for Tengo, because of the project I upstreamed this PR from. Is that a direction you're willing to explore with Tengo moving forward?

d5 · 2020-11-08T23:48:45Z

I'm sure there will be more details we will have to figure out, but, I think Go-like "context" construct is better because it's more easily understood and very commonly used by most Go coders. One obvious question is should that be explicitly declared? Or should we make it some kind of intrinsic value (or keyword)?

Bai-Yingjie · 2021-03-19T08:59:39Z

How about this approach?

fmt := import("fmt")

newCgroup := func(name) {
    cg := {name:name, pids:[]}

    cg.addPid = func(pid) {
        fmt.println("add ", pid, " to cgroup ", cg.name)
        cg.pids = append(cg.pids, pid)
        fmt.println("all pids:", cg.pids)
    }

    return cg
}

cg1 := newCgroup("cg1")
cg2 := newCgroup("cg2")

cg1.addPid(123)
cg1.addPid(456)

cg2.addPid(111)
cg2.addPid(222)
cg2.addPid(333)

cg1.addPid(789)

//output:
//add 123 to cgroup cg1
//all pids:[123]
//add 456 to cgroup cg1
//all pids:[123, 456]
//add 111 to cgroup cg2
//all pids:[111]
//add 222 to cgroup cg2
//all pids:[111, 222]
//add 333 to cgroup cg2
//all pids:[111, 222, 333]
//add 789 to cgroup cg1
//all pids:[123, 456, 789]

DawDavis added 11 commits October 30, 2020 13:21

added changes to support the v1 of 'this'

cb237b6

changed receiver to work via Free Vars.

ceca269

removed some erroneous changes

b38b350

fixed more erroneous changes

b63a989

added builin is_method

b7cc17c

documentation added for methods and builtins

4154aee

more documentation changes.

a79e373

altered the way that indexing is handled for methods

dec4d58

added intellij .idea/ to ignore; added parser tests

ac644ef

fixed ignore

af8da10

changed how free vars are assembled into closures

66a0227

DawDavis changed the title from "Feature/add parent support" to "Feature: Added Methods" Nov 3, 2020

fix a minor difference in the parser test

0187971

geseq requested changes Nov 3, 2020


added VM tests for methods in arrays

d8e8aed

DawDavis requested a review from geseq November 3, 2020 20:10

geseq reviewed Nov 3, 2020


added arrays to documentation, as per geseq request

37ff48c

DawDavis requested a review from geseq November 7, 2020 20:01

Update go.mod

f0f5431
