Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explain Syntax #21

Open
Person-93 opened this issue Feb 25, 2022 · 15 comments
Open

Explain Syntax #21

Person-93 opened this issue Feb 25, 2022 · 15 comments

Comments

@Person-93
Copy link
Contributor

Hi,

I think this language looks pretty cool and it'll be informative to see chumsky being used as intended by you, the original author.

However, I'm having trouble learning the language's syntax from the examples. If you have the time, would you be willing to write a brief explanation of the syntax?

I understand that this is a hobby project so I'm not looking for anything fancy, it's just that I'm interested in your thoughts on language design and I'd like to understand it a bit better. If I end writing any programs in tao that actually work, that would be entirely incidental.

If it'd be helpful/interesting to you, I can write an itemized list of things I find confusing with links to the relevant lines of code.

@zesterer
Copy link
Owner

zesterer commented Feb 25, 2022

The syntax is, broadly, a strange combination of Haskell, ML, and Rust. It has first-class functions, pattern-matching, and a variety of other features typical for functional languages.

Expression syntax is broadly similar to C-likes (you call a function f like f(a, b, c)) but differs a little in that it lacks block delimiters. This is because Tao is a pure language: no expression has side-effects, so there's never a need to execute statements, only return values. For this reason, almost everything in Tao is an expression. Here are a few examples:

Function calls

f(a, b)

alternatively, the following syntax is equivalent to the above (but makes chaining function calls easier in many cases)

b:f(a)

Conditionals

(ternary operators, in most C-like languages)

if condition then a else b

Variable declaration

let five = 5 in
let six = 6 in
five + six

Pattern matching:

(the final branch uses \ instead of | to avoid ambiguity)

match x in
    | 0 => "zero"
    | 1 => "one"
    \ n => "a large number"

Pattern-matching in particular is interesting and arises very often. If you're not used to functional languages, it might feel a little unintuitive but, but it's basically like C's switch on steroids: instead of just checking a single value, it can be used to check the structure of many different properties of a value, extracting sub-values from it. It works for many different types too: lists, data types, union types, integers, reals, and others can all be matched against.

Functions also support pattern-matching on their arguments directly. Consider a factorial function written in C:

int factorial(int n) {
    if (n == 0) {
        return 1;
    } else {
        return n * factorial(n - 1);
    }
}

In Tao, we can very efficiently express this with pattern-matching in a way that allows the compiler to check the exhaustivity of our logic (i.e: ensure that we've covered all cases, and that no runtime errors are possible):

fn factorial =
    | 0 => 1
    \ m ~ n + 1 => m * factorial(n)

This m ~ n + 1 syntax might look a little confusing, but it might make more sense when I say that ~ allows us to bind a pattern to a variable (Rust uses @ for this). n + 1 is itself a pattern that matches any natural number greater than or equal to 1, and binds the value that's one less than it.

The end result of this is that the compiler can automatically determine that no underflow or unhandled cases exist in the resulting problem, allowing writing programs with significantly more confidence that they work correctly.

Another example is the len function which calculates the length of a list

fn len =
    | [] => 0
    \ [_ .. tail] => 1 + len(tail)

Here, we have two patterns. The first binds on the empty list, [], the second on a list with 'at least one' item, [_ .. tail]. The first case is the 'base case': an empty list always has a length of 0. In the 'at least one element' case, we simply call the function len recursively with the tail of the list, adding 1 to the result.

As before, the compiler verifies that we're handling all possible cases and will throw an error if this is not the case. Definitely give it a try!

There is a lot more to explain about the syntax: I've not touched on sum types, tuples, classes (called 'typeclasses' or 'traits' in other languages: they aren't the same as C++-style classes!), etc. but hopefully this gives a taste of what Tao can do!

One thing I would mention is that Tao is still very experimental. I change the syntax relatively often and I'm still doing some pretty fundamental work on the compiler internals. I don't expect it to be usable for general-purpose programming for quite a while yet, if ever. If that's fine by you, then feel free to play around with it to see what you can get it to do (or, alternatively, figure out how it can be broken!).

@zesterer
Copy link
Owner

If it'd be helpful/interesting to you, I can write an itemized list of things I find confusing with links to the relevant lines of code.

For sure, I'd be happy to explain any particular areas of the syntax you find particularly confusing!

@Person-93
Copy link
Contributor Author

Thanks for taking the time to write this. This has certainly clarified some things for me.

The syntax is, broadly, a strange combination of Haskell, ML, and Rust. It has first-class functions, pattern-matching, and a variety of other features typical for functional languages.

To give you a sense of my background: I have a passing familiarity with Haskell, and a pretty good handle on Rust.

Variable declaration

let five = 5 in
let six = 6 in
five + six

Does this mean that all variables are explicitly scoped to a given expression?

fn factorial =
    | 0 => 1
    \ m ~ n + 1 => m * factorial(n)

This pattern matching is amazing! Can the expressions be arbitrarily complex? Can they involve more than one variable? Does it work on user defined types? If it doesn't now, is something like that even possible or desirable?

One thing I would mention is that Tao is still very experimental. I change the syntax relatively often and I'm still doing some pretty fundamental work on the compiler internals. I don't expect it to be usable for general-purpose programming for quite a while yet, if ever. If that's fine by you, then feel free to play around with it to see what you can get it to do (or, alternatively, figure out how it can be broken!).

This is perfectly fine. I'm primarily interested in Tao as a case study in language design. I've started dabbling in languages and I find them fascinating.

If it'd be helpful/interesting to you, I can write an itemized list of things I find confusing with links to the relevant lines of code.

For sure, I'd be happy to explain any particular areas of the syntax you find particularly confusing!

What is the most convenient way for me to ask you questions? Should I just keep opening new issues? I know that some maintainers have strong feelings about issues being used as a general forum.

@zesterer
Copy link
Owner

zesterer commented Feb 25, 2022

Does this mean that all variables are explicitly scoped to a given expression?

Yes. The syntax is let <pat> = <expr> in <expr>. Any values bound to variables in <pat> are accessible in the final expression. Obviously, this also composes so it's possible to nest many let expressions, effectively allowing many lets within a single expression:

let x = ... in
let y = x in
let z = x + y in
x + y + z # All of x, y, and z are valid in the final trailing expression

This pattern matching is amazing!

Thanks. This grew out of the desire to statically avoid runtime errors when doing x - 1 if x is unsigned. Obviously, shifting this sort of thing to the type system is a big advantage!

Can the expressions be arbitrarily complex?

Unfortunately not, and this is (in general, at least) intractable. Currently, only arithmetic add works for Nats (natural numbers), but any value can be on the right-hand side.

I would like to generalise this more in the future to allow for patterns like x * 2 that bind only even numbers or x * 2 + 1 that binds only odd numbers. Obviously, with this sort of expressive power, it makes sense to be able to compose patterns in the same way that it's possible to do for expressions, allowing for something like is_even x. I've not really thought to much about this yet so I'm sure there are many more opportunities!

Does it work on user defined types?

Uh... sort of? But not deliberately.

Consider a definition of Nat that looks like:

data Nat =
    | Succ Nat
    \ Zero

Here, Nat is defined inductively as successors on top of Nat. The number 3 would just be Succ Succ Succ Zero, the successor of the successor of, the successor of zero. You could then use this definition in a factorial function:

fn factorial =
    | Zero => Zero
    \ m ~ Succ n => multiply(m, factorial(n))

(multiply is assumed to be defined elsewhere)

Obviously, the type system would be able to reason about this but I'm sure this isn't quite what you were expecting 😀

What is the most convenient way for me to ask you questions? Should I just keep opening new issues? I know that some maintainers have strong feelings about issues being used as a general forum.

Totally fine to throw them in here, I'm not bothered. For me, Tao is just a fun exercise to see how much I can learn and how much I can push language design so I'm very happy to discuss it casually here!

@venil7
Copy link

venil7 commented Jul 6, 2022

may I suggest that this is extracted to SYNTAX.md file on the repository itself

@zesterer
Copy link
Owner

zesterer commented Jul 6, 2022

may I suggest that this is extracted to SYNTAX.md file on the repository itself

A previous revision of the language had a book explaining the language, and I'm hopeful that I can find the time to resurrect that once the syntax has stabilised a little. Much of the behaviour remains the same, so the old book might still be of some use!

@matthiasbeyer
Copy link
Contributor

Function calls

f(a, b)

alternatively, the following syntax is equivalent to the above (but makes chaining function calls easier in many cases)

b:f(a)

Just to be sure: Am I missing something? Shouldn't this be a:f(b) instead of b:f(a)? 🤔

@zesterer
Copy link
Owner

zesterer commented Mar 10, 2023

Nope, it's b->f(a) (: was changed to -> recently). That might sound counterintuitive at first, but it follows from the fact that b->f(a) is not itself a pattern recognised by the parser, but is instead a product of two function calls which you should read as b -> (f(a))

i.e: f(a) is evaluated, and then b -> ... is evaluated.

That might sound a little topsy-turvy, but it enables really nice method-chaining-like syntax like this:

let y = range(0, 10)
    -> map(fn x => x + 1)
    -> sum
in

@matthiasbeyer
Copy link
Contributor

Ah, cool. Well that syntax is really confusing, but I see the value (as in the last example you provided).

@zesterer
Copy link
Owner

Yes, I'm not overly happy with it myself. It also comes with the difficulty that functions like map have global scope (as they do in Haskell, say). This becomes a bit annoying when two different types (say, impl Iterator and Result) should implement the function differently.

I am considering moving to a setup where it's possible to implement functions "on" a type, Rust-style, and the type system would choose which implementation to use based on the first parameter. I've not done too much thinking about this yet though, so I'm open to ideas!

@matthiasbeyer
Copy link
Contributor

matthiasbeyer commented Mar 10, 2023

Well, I really like function chaining syntax in Rust, but I am not so sure whether it actually makes sense in a functional language.

FWIW, I find the composing in Haskell really nice:

combined = sum $ map \x -> x + 1 $ range 0 10

-- or
combined =
  let
     mapper = map \x -> x + 1
  in sum $ mapper $ range 0 10

(I am not a Haskeller, so that might be slightly or completely wrong).

Which, AFAICS would translate to something like

combined = sum (map (fn x => x + 1) (range 1 10))

# or
combined = let
    mapper = map (fn x => x + 1)
    in sum (mapper (range 1 10))

in tao?

I can clearly see from the first glimpse at that piece of code, what it does.
I also must say that I am a fan of rather having less syntax that is easy to grasp than fancy syntax that one must first learn before they can start hacking.
(Or rather: I am a fan of syntax that uses few special characters and mainly stays inside A-Za-z, if you know what I mean 😆

@zesterer
Copy link
Owner

zesterer commented Mar 10, 2023

Well, I really like function chaining syntax in Rust, but I am not so sure whether it actually makes sense in a functional language.

Interestingly, Tao's -> operator is pretty much identical to F#'s pipeline operator (I didn't know this at the time) so it seems like infix function calls have a pretty strong precedence among functional languages.

Which, AFAICS would translate to something like ... in tao?

Close enough, yes. Tao uses C-style function calling syntax though (i.e: it requires parentheses) so it would be more like this:

let combined = sum(map(fn x => x + 1, range(1, 10))

# or
let combined = let
        mapper = map(fn x => x + 1)
    in sum(mapper(range(1, 10)))

I also must say that I am a fan of rather having less syntax that is easy to grasp than fancy syntax that one must first learn before they can start hacking.

I definitely agree. I've tried to keep Tao's actual syntax relatively minimal (at least, without hurting ease of use), tending to favour explicit keywords instead of opaque symbols.

There are almost no operators in Tao that you wouldn't find in your typical C-like language, with perhaps two exceptions:

  • postfix ! (used like print("Hello!")!) which is the 'effect propagation' operator. Think of it like Rust's ?, .await, and yield all rolled into one.

  • ++, which is the list joining operator. I'm planning to get rid of it in favour of the regular + operator though since it's a legacy from the days before Tao had operator typeclasses and could generalise + to things like lists.

@matthiasbeyer
Copy link
Contributor

Interestingly, Tao's -> operator is pretty much identical to F#'s pipeline operator [...] so it seems like infix function calls have a pretty strong precedence among functional languages.

Hm, yes, you're right. Maybe the thing that is confusing to me is the missing spaces around the operator b->f instead of b -> f. The former looks like the C "deref pointer b to field f" to me 😆

postfix ! (used like print("Hello!")!) which is the 'effect propagation' operator. Think of it like Rust's ?, .await, and yield all rolled into one.

I guess I have to wrap my head around that a bit better. Does it mean that we can short-circuit function implementations in case of error with the ! operator? If yes, how does this work in terms of a purely functional language? I am a bit confused 🤔

Further syntax questions

I have a list of questions about syntax here.

Please don't get overwhelmed with me asking a lot of questions here and don't hesitate to skip some or all of them if you think they're not worthy of an answer 😆

Maybe a bit context to all my questions (also for other readers of this comment):

I started playing around with my own idea of a functional programming language [here](https://github.com/vunk-lang/vunk-lang/). There's nothing to see yet except some syntax examples, because I am a lazy person and only implemented a tokenizer so far. That's also why I contribute here: Because I hope that joining forces results in me getting my ass of the couch and do something cool.

My idea of a functional syntax is inspired by three points:

  • I want a language like Rust, with all the awesome things (like borrow checker, strong type system, compiling to native code with LLVM to be extra fast, good interop with C code, ...) but purely functional
  • Haskell is great, but so many operators are confusing (really what is all this <*, <*>, <$>, $>,...)
  • A functional language that one can read as easy as Ruby code would be nice

(from most important to "least" important).

There's an example of the syntax here(, or the same code with some comments here).

Pattern matching

You mentioned above that the last branch in a pattern uses \ instead of |:

(the final branch uses \ instead of | to avoid ambiguity)

Is this necessary? I believe that it makes maintenance of code rather complex, as adding a new variant would mean that the last variant needs to be touched as well, right?

Variable declaration

I see that the let ... in syntax is sane in a way that it is easy to implement (I guess) because it is kind of recursive...

But it is rather hard to maintain, isn't it?

# Why
let a = 5 in 
let b = 6 in
a + b

# instead of
let
   a = 5
   b = 6
in a + b

?

Function call syntax

Why is the function call syntax made with parentheses? Especially with currying and possibly partial function application in mind, it becomes even more of a question to me.

I see that foo(bar(12), 13) is more obvious to users than foo bar 12 13, but still... I find the latter syntax way simpler to parse (in my brain), yet it is functionally (no pun intended) equivalent.

Another example: Would this be the right way:

mapFunc = map(func)

for defining a function mapFunc which maps func over a list of elements?

Implementation of typeclasses

As far as I can see, implementing a typeclass is done with member Foo of Bar. I am not sure whether "member" is some mathematical term here... but I find it rather confusing.

How about impl Foo for Bar?

Trait bounds

Or is it "class bounds"? 🤔

From the standard library:

for A < Neg member [A] of Neg =
  # ...

This is really confusing to me. I guess it means "For all A that implement Neg, implement Neg for [A]". But the syntax is very confusing.

How about this:

impl Neg for [A]
    where A: Neg
    = # code

or, if we want to be a bit more consise:

impl Neg for [A] where A: Neg =
    # code

or

with A: Neg
impl Neg for [A] =

Additional bounds could then be added with +: with A: Neg + Debug + Something (or the equivalent with the other syntaxes).

~

It came up above as binding a pattern to a variable,... but let me quote this piece of code (from the standard library):

fn fix A, B, e : ((A -> e ~ B) -> A -> e ~ B) -> A -> e ~ B =
    \ f, n => f(f->fix, n)!

I stared at this for several minutes and it is still not clear to me what it does except for defining a function "fix".

  • What does e ~ B do?
  • Are A and B variables or type parameters?

My guess:

This returns a function that takes two arguments (f and n) that calls f on fix(f) and n and wraps(?) the whole thing in an effect?
Might be completely wrong though (I bet it is).

Member functions

As far as I can see, tao does not have the concept of "member functions", syntax wise, does it?
Would you consider adding it?

For example (again, example taken from the standard lib):

fn map_err A, E, F : (E -> F) -> Result A E -> Result A F =
    # some impl

# vs
# (using my idea of generic impl syntax from above)

with A, E
impl Result A E:
  with F
  fn map_err: Self -> (E - F) -> Result A F =
    # some impl

Or something like this? Would only be syntax sugar of course, but maybe worth a thought...


I hope that you don't mind me asking so many questions 😆

@matthiasbeyer
Copy link
Contributor

Ah and one bit that also just came up: What's the difference between def and fn? Especially because they seem to be used both to define functions?

@zesterer
Copy link
Owner

I guess I have to wrap my head around that a bit better. Does it mean that we can short-circuit function implementations in case of error with the ! operator? If yes, how does this work in terms of a purely functional language? I am a bit confused thinking

Yes, effect propagation can short-circuit, just like .await and ? (although currently the bytecode VM backend only supports effects that always return... but that's an implementation detail I've yet to flesh out).

Is this necessary? I believe that it makes maintenance of code rather complex, as adding a new variant would mean that the last variant needs to be touched as well, right?

I originally added this is avoid needing either delimiters (like the braces that surround match expressions in Rust) or indentation sensitivity. While I think it's still a fairly neat syntax that looks quite visually clean, I'm coming round to the idea of just going for indentation sensitivity instead.

But it is rather hard to maintain, isn't it?

Tao actually supports your suggested syntax! You can do

let
    a = 5,
    b = a + 4,
in

Why is the function call syntax made with parentheses? Especially with currying and possibly partial function application in mind, it becomes even more of a question to me.

Mostly personal preference at this stage. I've always used C-like languages, and I found it easier to use. I'm not opposed to switching though!

Although Tao currently supports currying, I'm actually tempted to remove it (or, at least, require explicit currying, perhaps like map(fn x => x + 1, _)). I've found that the compiler assuming that every function can be curried makes for some pretty gnarly error messages sometimes when you forget function parameters.

The argument for this becomes even greater if the b -> f(a) ends up getting replaced with something more akin to Rust-style methods where currying f isn't actually required, IMO.

Another example: Would this be the right way ... for defining a function mapFunc which maps func over a list of elements?

Yes.

I am not sure whether "member" is some mathematical term here... but I find it rather confusing.

I see typeclasses as sets. member A of B simply declares A to be a member of the set B. For example, member Int of Add Int means "Int is a member of the set of things that can be added to Ints". I am not opposed to changing this syntax, however.

This is really confusing to me. I guess it means "For all A that implement Neg, implement Neg for [A]". But the syntax is very confusing.

Tao supports where clauses, so

member [A] of Neg where
    A < Neg
= ...

Is just as valid. I use < to mean 'is a member of' because it's supposed to look like , meaning "is a member of". I am also not opposed to changing this syntax.

I stared at this for several minutes and it is still not clear to me what it does except for defining a function "fix".

It's true that function syntax is a bit opaque right now, and this is something I've been trying to come up with a rugged solution for. I believe the reason for this is that the type signature is separated from the function expression itself. If we fit it into a more Rusty syntax, you end up with:

fn fix<A, B, e>(
    f: fn(A) -> e ~ B,
    n: A,
) -> e ~ B {
    f(fix(f), n)!
}

Having the types be layered on top of the function definition help quite a bit. I want to find a similar syntax for Tao that doesn't involve throwing braces and arrows everywhere as Rust does.

That said, the definition of fix is just inherently complicated: it's got two type parameters, an effect parameter, it takes a higher-order function as a parameter, and both it and the higher order function return effect objects. It's also a function that's uber-generic, so there's no good point of reference for what it actually does unless you know what fix does. The fact that Tao is (at least, for now) non-lazy and that this function allows for the inner function to have side effects means that the signature needs to be even more complex than it would otherwise be.

As far as I can see, tao does not have the concept of "member functions", syntax wise, does it?

It doesn't. As mentioned earlier, I think it would be a useful feature to have and would improve both error messages and also allow differentiating between difference functions with the same name, such as map.

Ah and one bit that also just came up: What's the difference between def and fn?

fn sum =
    | [] => 0
    \ [x .. xs] => xs -> sum + x

is just syntax sugar for

def sum = fn
    | [] => 0
    \ [x .. xs] => xs -> sum + x

It's part of my effort, as mentioned above, to come up with a more terse and easily understandable syntax for function definitions, but I consider it incomplete.

In Tao, functions are just values (as in most functional languages). There's no difference between a value item and a function item. In fact, value items can have type parameters! (unlike, say, Rust's statics or consts)

# Defines the empty list
def empty A : [A] = []

You might initially be a little concerned about this: what if we created a value item that had a side-effect such as printing to the console? Surely that would need to happen in a const (i.e: compile-time) context?

Well, effects solve this problem!

def print_hello = print("hello")

Because effectful functions return effect objects, this does nothing: but gets assigned the type io ~ (). Sort of like calling an async function in Rust, but never .await-ing it. However, we can then go on to use it in a context where the effect can be propagated (such as main):

def main = print_hello!

And you'll end up with hello in your console when you run the program.

In effect, this means that all Tao code is const by default. It's almost like everything has an implicit const annotation (in Rust parlance), and to perform observable side-effects you need to opt out of this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants