Add support for struct/record types #418

simmsb · 2024-04-27T14:54:30Z

Hey, I've been playing with numbat recently and I've found the type system to work perfectly for my use case (calculating inductor, resistor, etc parameters for a boost converter flashlight driver), however the lack of record types is causing me to repeat myself a bit too much.

Here I've added an implementation of structural record types that are used like so:

>>> ${foo: 1A, bar: 3V}

  ${foo: 1 ampere, bar: 3 volt}

    = ${foo: 1 A, bar: 3 V}    [${foo: Current, bar: Length² × Mass / (Current × Time³)}]

>>> ${foo: 1A, bar: 3V}.foo

  ${foo: 1 ampere, bar: 3 volt}.foo

    = 1 A    [Current]

# A value level struct
${x: Length, y: Length}

# A function with a struct as a parameter
fn euclidian_distance(a: ${x: Length, y: Length}, b: ${x: Length, y: Length}) =
  sqrt(sqr(a.x - b.x) + sqr(a.y - b.y))
  
assert_eq(
  euclidian_distance(${x: 0m, y: 0m}, ${x: 6m, y: 8m}),
  10m)
  
let r = ${foo: 1}

# Struct fields can be accessed using `.field` notation
let x = r.foo

The choice of the ${ syntax for records was because it looked fairly easy to add unambiguously.

Records are implemented in the interpreter with a new value type, and two new opcodes:

Value::Struct(Arc<[(String, usize)]>, Vec<Value>)
Contains the field metadata of the struct (the list of field names and mapped value locations), and the list of values of the struct. The field metadata is only used when printing the fields of the struct.
Op::BuildStruct
Takes two parameters: a 'struct metadata index' and the number of fields, uses the metadata index to retrieve the struct field metadata from the VM, then pops N values from the stack and places them and the metadata in a struct value
Op::DestructureStruct
Takes one parameter: the index into the struct to retrieve. Pops a struct from the stack and pushes back on the value at the given index.

Please feel free to decline this PR, or request significant changes if you intended to go with a different direction for record types :)

sharkdp · 2024-04-28T18:18:03Z

Hi. I have not yet done a detailed review, but let start by saying that I am extremely excited about this PR! This is definitely something I was planning to add to the language.

Hey, I've been playing with numbat recently and I've found the type system to work perfectly for my use case (calculating inductor, resistor, etc parameters for a boost converter flashlight driver), however the lack of record types is causing me to repeat myself a bit too much.

👍

Here I've added an implementation of structural record types

I feel a bit bad to go back to the design level here because you already added this complete implementation including documentation, error handling, etc.

But before we go ahead with this, I would really like to understand if structural record types are the best choice for Numbat, or if we should rather go with a nominative system.

I am familiar with structural record types from languages like PureScript. If you don't have an additional (strong) newtype mechanism on top, there's the obvious problem of record types colliding with other (structurally same) record types of completely different semantic meaning. For example, you might want to have ${x: Length, y: Length} for two different coordinate systems which you don't want to confuse.

I think my first impulse would have been to add more "usual" struct/record types to Numbat that would be based on the name of the type.. with a Rust-like syntax:

struct WorldCoordinate {
  x: Length,
  y: Length,
}

struct LocalCoordinate {
  x: Length,
  y: Length,
}

fn world_to_local(pos: WorldCoordinate) -> LocalCoordinate = …

I'm curious why you decided to go the other way? Would a nominative record type be harder to implement?

simmsb · 2024-05-01T11:30:14Z

I'm curious why you decided to go the other way? Would a nominative record type be harder to implement?

This was pretty much the reason, with named structs the parser would either need to become context sensitive, or we add an extra step after parsing or during typechecking to resolve whether an identifier is a dimension or a struct name.

I think the implementation of structs in the evaluator can stay the same, I'm happy to work on changing this to a nominal system.

Do you have any thoughts on the best way to go about parsing and disambiguating struct and dimension names?

sharkdp · 2024-05-01T18:35:26Z

I think the implementation of structs in the evaluator can stay the same, I'm happy to work on changing this to a nominal system.

It would be great if we could explore this. If it turns out to be more complex than we think now, please feel free to discuss it first before spending a lot of effort.

Do you have any thoughts on the best way to go about parsing and disambiguating struct and dimension names?

So before we talk about implementation, maybe we should consider the language design first. Is it confusing for users to have both struct types (and later maybe enums, type aliases, …) in the same namespace as the dimension types? I mean, we have the same problem with the other builtin types (String, Bool, DateTime, …), but it might be nice to know from reading the code whether something less familiar like Area is a dimension type or a struct (struct Area { width: Length, height: Length }). But that would require us to either work with a prefix (like $Area for struct types) or some kind of case convention (Upper camel case for dimension types, lower snake case for other types). That would solve the problem immediately (in the parser), but somehow I don't like those syntax/name restrictions. So unless we find a(nother) good reason to separate them by prefix/naming-convention, let's assume we don't put any restrictions on the struct names.

Which leaves us with the problem you mentioned. In the parser, we don't know what we are parsing when we see let x: Area …. I would probably leave the parser context free and solve that in a later stage. We do something similar for prefixed units like km where the parser doesn't know if that is a normal identifier, or a unit m which is prefixed by k for kilo. We resolve that later in the "prefix transformer" stage, which comes right after the parser, before the type checker.

We could generalize this to a "name resolution" stage that would solve both the prefix and the dimension-vs-normal-type ambiguity. It would rely on seeing the dimension or struct definition first.

Another option which might be more reasonable is to do it in the type checker. We already add all dimension types to TypeChecker::registry. And all function definitions that we see to TypeChecker::function_signatures. Similarly, we would probably have a place to store all user-defined non-dimension types, together with the information we need for type checking (field names with corresponding field types). So whenever we see a type name like Area, we could check which class it belongs to. Why do you think it requires an additional stage/pass? I guess that would certainly be required if types could be declared after their usage, but I would postpone that problem for now (?).

In both cases, we would have to modify the parser.. at least naming-wise. What is now called DimensionExpression would probably be renamed to TypeExpression. And DimensionExpression::Dimension would be renamed to TypeExpression::TypeIdentifier. And then we would simply allow the parsing of type annotations like Area^2, even if Area would be a struct type. We would only throw an error later in the type checker when we see a non-dimension type being used in a DimensionExpression::Multiply/Divide/Power type-level operation.

Do you think that could work, or do you see additional problems?

sharkdp · 2024-05-01T18:50:02Z

Just a side note regarding your current implementation: the record types do not work with generics.

>>> fn norm<T>(p: ${x: T, y: T}) -> T = sqrt(p.x^2 + p.y^2)

  fn norm<T>(p: ${x: T, y: T}) -> T = sqrt(p.x² + p.y²)

>>> norm(${x: 3, y: 4})
error: while type checking
  ┌─ <input:15>:1:12
  │
1 │ fn norm<T>(p: ${x: T, y: T}) -> T = sqrt(p.x^2 + p.y^2)
  │            - ${x: T, y: T}
  │
  ┌─ <input:16>:1:6
  │
1 │ norm(${x: 3, y: 4})
  │      ^^^^^^^^^^^^^ ${x: Scalar, y: Scalar}
  │
  = Incompatible types in function call: expected '${x: T, y: T}', got '${x: Scalar, y: Scalar}' instead

This is not a problem of your change, but more a problem with my buggy generics implementation… which does not properly distinguish between type parameters and concrete types in some cases (see #166 and a non-ready attempt to resolve it: #253).

Just wanted to mention it. I would probably skip generics for structs in the first iteration. Later, something like

struct Coordinate<T> {
  x: T,
  y: T,
}

might be nice to have. But there are a lot of other things that are more important 😄.

simmsb · 2024-05-04T15:40:04Z

We could generalize this to a "name resolution" stage that would solve both the prefix and the dimension-vs-normal-type ambiguity. It would rely on seeing the dimension or struct definition first.

This makes sense and I think it's the best route to take, I'll have a play around with implementing this.

We would only throw an error later in the type checker when we see a non-dimension type being used in a DimensionExpression::Multiply/Divide/Power type-level operation.

Yeah, I suppose I'll keep structs and dimensions separate in the typed AST, and do the renaming of DimensionExpression you suggest in the untyped AST.

record types do not work with generics.

Ah, I actually made this work for return types, are parameter generics significantly different?

simmsb · 2024-05-06T19:32:57Z

I've made structs nominally typed using the syntax you've proposed, I also refactored a small amount how name collisions are handled so that checking for name collisions between items in the 'type' namespace (and also 'value' namespace) are a bit simpler (however I didn't refactor the collision checking that occurs within the prefix transformer)

As for disambiguating dimensions and struct names, I simply check if a type expression is only a type identifier with a name matching a struct, and take it to be a struct type here: https://github.com/simmsb/numbat/blob/simmsb/struct-types/numbat/src/typechecker.rs#L1835-L1842. I think this is sound as a dimension name cannot clash with a struct.

sharkdp · 2024-05-08T19:26:31Z

Thank you very much for the update — I tried a few things and it looks great!

Some remarks before I dive deeper into the review:

Some tests are currently failing. It looks like newlines are currently not supported in struct definitions (but of course it would be great if we could allow them)
It should not be allowed to define a struct Meter (or similar) with a name that is already taken by a dimension type. And vice versa.
I think it would be great to have some basic tests (1) in the parser, (2) for name resolving but most importantly (3) for the type checker.
The new struct keyword should be added to numbat/src/keywords.rs (see comment in tokenizer.rs to keep the lists in sync)

simmsb · 2024-05-09T11:47:13Z

I've fixed all the tests and CI, there's now tests for parsing around structs and some more tests for type checking of structs, and name collisions.

Struct names do collide properly with dimension names, but not with unit, variable, and function names.

>>> struct Current {}
error: identifier clash in definition
   ┌─ Module 'core::dimensions', File <builtin>/modules/core/dimensions.nbt:32:11
   │
32 │ dimension Current
   │           ------- Previously defined dimension here
   │
   ┌─ <input:3>:1:8
   │
 1 │ struct Current {}
   │        ^^^^^^^ identifier is already in use

numbat/src/parser.rs

numbat/src/ast.rs

numbat/tests/interpreter.rs

book/src/structs.md

numbat/tests/interpreter.rs

book/src/structs.md

examples/numbat_syntax.nbt

numbat/src/bytecode_interpreter.rs

numbat/src/typechecker.rs

sharkdp · 2024-05-13T18:09:53Z

numbat/src/typechecker.rs

+                Type::Struct(StructInfo {
+                    definition_span,
+                    name,
+                    fields,
+                }) => Type::Struct(StructInfo {
+                    definition_span: *definition_span,
+                    name: name.clone(),
+                    fields: fields
+                        .into_iter()
+                        .map(|(n, (s, t))| {
+                            (
+                                n.to_owned(),
+                                (s.clone(), apply_substitutions(t, substitute, substitutions)),
+                            )
+                        })
+                        .collect(),
+                }),
+                type_ => type_.clone(),


Can this code even trigger? We can't have structs with generic parameters, can we? That would require something like (copying Rust syntax, again):

struct Vec<Coordinate> { x: Coordinate, y: Coordinate, }

numbat/src/typechecker.rs

numbat/src/vm.rs

sharkdp

Ok, I finished my review. Sorry for taking a bit longer. This looks really great! Most comments are just minor naming/clarification things. Let me know if you need any help with finishing this. I'm really looking forward to integrating this.

sharkdp · 2024-05-13T20:45:39Z

Another small thing that I found:

This leads to an error:

dimension A
fn foo<A>(a: A) -> Scalar = 1

but this doesn't:

struct A {}
fn foo<A>(a: A) -> Scalar = 1

Edit: I fixed this.

sharkdp

I made some changes myself. Hope that's okay. There are two remaining open comments/questions, which I do not see as blocking issues that would prevent me from merging this. Maybe we can come back to them at some point.

simmsb · 2024-05-18T12:05:45Z

Nice, thanks for taking the time to make those changes :)

sharkdp · 2024-05-18T12:39:45Z

FYI: we're already making use of structs in a new feature: #431

simmsb added 6 commits May 4, 2024 17:49

Add support for struct types

bfc6177

Add tests for structs

cbd9ae6

Add docs for structs

c1ab865

Store field metadata in an Arc instead

38238c7

Cargo fmt

ebe2c80

Make structs nominally typed

5d3ff85

simmsb force-pushed the simmsb/struct-types branch from 2e11a7f to 5d3ff85 Compare May 6, 2024 19:23

Doc updates

a1d5d7c

simmsb added 5 commits May 9, 2024 11:28

Allow newlines inside structs

a5e0ff0

Add more tests for struct parsing, type checking, and name resolution

afaf73e

Add struct to keywords

b01c0db

Cargo fmt

fd8590c

Update book

031e86a