Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for struct/record types #418

Merged
merged 31 commits into from May 15, 2024
Merged

Conversation

simmsb
Copy link
Contributor

@simmsb simmsb commented Apr 27, 2024

Hey, I've been playing with numbat recently and I've found the type system to work perfectly for my use case (calculating inductor, resistor, etc parameters for a boost converter flashlight driver), however the lack of record types is causing me to repeat myself a bit too much.

Here I've added an implementation of structural record types that are used like so:

>>> ${foo: 1A, bar: 3V}

  ${foo: 1 ampere, bar: 3 volt}

    = ${foo: 1 A, bar: 3 V}    [${foo: Current, bar: Length² × Mass / (Current × Time³)}]

>>> ${foo: 1A, bar: 3V}.foo

  ${foo: 1 ampere, bar: 3 volt}.foo

    = 1 A    [Current]
# A value level struct
${x: Length, y: Length}

# A function with a struct as a parameter
fn euclidian_distance(a: ${x: Length, y: Length}, b: ${x: Length, y: Length}) =
  sqrt(sqr(a.x - b.x) + sqr(a.y - b.y))
  
assert_eq(
  euclidian_distance(${x: 0m, y: 0m}, ${x: 6m, y: 8m}),
  10m)
  
let r = ${foo: 1}

# Struct fields can be accessed using `.field` notation
let x = r.foo

The choice of the ${ syntax for records was because it looked fairly easy to add unambiguously.

Records are implemented in the interpreter with a new value type, and two new opcodes:

  • Value::Struct(Arc<[(String, usize)]>, Vec<Value>)
    Contains the field metadata of the struct (the list of field names and mapped value locations), and the list of values of the struct. The field metadata is only used when printing the fields of the struct.
  • Op::BuildStruct
    Takes two parameters: a 'struct metadata index' and the number of fields, uses the metadata index to retrieve the struct field metadata from the VM, then pops N values from the stack and places them and the metadata in a struct value
  • Op::DestructureStruct
    Takes one parameter: the index into the struct to retrieve. Pops a struct from the stack and pushes back on the value at the given index.

Please feel free to decline this PR, or request significant changes if you intended to go with a different direction for record types :)

@sharkdp
Copy link
Owner

sharkdp commented Apr 28, 2024

Hi. I have not yet done a detailed review, but let start by saying that I am extremely excited about this PR! This is definitely something I was planning to add to the language.

Hey, I've been playing with numbat recently and I've found the type system to work perfectly for my use case (calculating inductor, resistor, etc parameters for a boost converter flashlight driver), however the lack of record types is causing me to repeat myself a bit too much.

👍

Here I've added an implementation of structural record types

I feel a bit bad to go back to the design level here because you already added this complete implementation including documentation, error handling, etc.

But before we go ahead with this, I would really like to understand if structural record types are the best choice for Numbat, or if we should rather go with a nominative system.

I am familiar with structural record types from languages like PureScript. If you don't have an additional (strong) newtype mechanism on top, there's the obvious problem of record types colliding with other (structurally same) record types of completely different semantic meaning. For example, you might want to have ${x: Length, y: Length} for two different coordinate systems which you don't want to confuse.

I think my first impulse would have been to add more "usual" struct/record types to Numbat that would be based on the name of the type.. with a Rust-like syntax:

struct WorldCoordinate {
  x: Length,
  y: Length,
}

struct LocalCoordinate {
  x: Length,
  y: Length,
}

fn world_to_local(pos: WorldCoordinate) -> LocalCoordinate = …

I'm curious why you decided to go the other way? Would a nominative record type be harder to implement?

@simmsb
Copy link
Contributor Author

simmsb commented May 1, 2024

I'm curious why you decided to go the other way? Would a nominative record type be harder to implement?

This was pretty much the reason, with named structs the parser would either need to become context sensitive, or we add an extra step after parsing or during typechecking to resolve whether an identifier is a dimension or a struct name.

I think the implementation of structs in the evaluator can stay the same, I'm happy to work on changing this to a nominal system.

Do you have any thoughts on the best way to go about parsing and disambiguating struct and dimension names?

@sharkdp
Copy link
Owner

sharkdp commented May 1, 2024

I think the implementation of structs in the evaluator can stay the same, I'm happy to work on changing this to a nominal system.

It would be great if we could explore this. If it turns out to be more complex than we think now, please feel free to discuss it first before spending a lot of effort.

Do you have any thoughts on the best way to go about parsing and disambiguating struct and dimension names?

So before we talk about implementation, maybe we should consider the language design first. Is it confusing for users to have both struct types (and later maybe enums, type aliases, …) in the same namespace as the dimension types? I mean, we have the same problem with the other builtin types (String, Bool, DateTime, …), but it might be nice to know from reading the code whether something less familiar like Area is a dimension type or a struct (struct Area { width: Length, height: Length }). But that would require us to either work with a prefix (like $Area for struct types) or some kind of case convention (Upper camel case for dimension types, lower snake case for other types). That would solve the problem immediately (in the parser), but somehow I don't like those syntax/name restrictions. So unless we find a(nother) good reason to separate them by prefix/naming-convention, let's assume we don't put any restrictions on the struct names.

Which leaves us with the problem you mentioned. In the parser, we don't know what we are parsing when we see let x: Area …. I would probably leave the parser context free and solve that in a later stage. We do something similar for prefixed units like km where the parser doesn't know if that is a normal identifier, or a unit m which is prefixed by k for kilo. We resolve that later in the "prefix transformer" stage, which comes right after the parser, before the type checker.

We could generalize this to a "name resolution" stage that would solve both the prefix and the dimension-vs-normal-type ambiguity. It would rely on seeing the dimension or struct definition first.

Another option which might be more reasonable is to do it in the type checker. We already add all dimension types to TypeChecker::registry. And all function definitions that we see to TypeChecker::function_signatures. Similarly, we would probably have a place to store all user-defined non-dimension types, together with the information we need for type checking (field names with corresponding field types). So whenever we see a type name like Area, we could check which class it belongs to. Why do you think it requires an additional stage/pass? I guess that would certainly be required if types could be declared after their usage, but I would postpone that problem for now (?).

In both cases, we would have to modify the parser.. at least naming-wise. What is now called DimensionExpression would probably be renamed to TypeExpression. And DimensionExpression::Dimension would be renamed to TypeExpression::TypeIdentifier. And then we would simply allow the parsing of type annotations like Area^2, even if Area would be a struct type. We would only throw an error later in the type checker when we see a non-dimension type being used in a DimensionExpression::Multiply/Divide/Power type-level operation.

Do you think that could work, or do you see additional problems?

@sharkdp
Copy link
Owner

sharkdp commented May 1, 2024

Just a side note regarding your current implementation: the record types do not work with generics.

>>> fn norm<T>(p: ${x: T, y: T}) -> T = sqrt(p.x^2 + p.y^2)

  fn norm<T>(p: ${x: T, y: T}) -> T = sqrt(p.x² + p.y²)

>>> norm(${x: 3, y: 4})
error: while type checking
  ┌─ <input:15>:1:12
  │
1 │ fn norm<T>(p: ${x: T, y: T}) -> T = sqrt(p.x^2 + p.y^2)
  │            - ${x: T, y: T}
  │
  ┌─ <input:16>:1:6
  │
1 │ norm(${x: 3, y: 4})
  │      ^^^^^^^^^^^^^ ${x: Scalar, y: Scalar}
  │
  = Incompatible types in function call: expected '${x: T, y: T}', got '${x: Scalar, y: Scalar}' instead

This is not a problem of your change, but more a problem with my buggy generics implementation… which does not properly distinguish between type parameters and concrete types in some cases (see #166 and a non-ready attempt to resolve it: #253).

Just wanted to mention it. I would probably skip generics for structs in the first iteration. Later, something like

struct Coordinate<T> {
  x: T,
  y: T,
}

might be nice to have. But there are a lot of other things that are more important 😄.

@simmsb
Copy link
Contributor Author

simmsb commented May 4, 2024

We could generalize this to a "name resolution" stage that would solve both the prefix and the dimension-vs-normal-type ambiguity. It would rely on seeing the dimension or struct definition first.

This makes sense and I think it's the best route to take, I'll have a play around with implementing this.

We would only throw an error later in the type checker when we see a non-dimension type being used in a DimensionExpression::Multiply/Divide/Power type-level operation.

Yeah, I suppose I'll keep structs and dimensions separate in the typed AST, and do the renaming of DimensionExpression you suggest in the untyped AST.

record types do not work with generics.

Ah, I actually made this work for return types, are parameter generics significantly different?

@simmsb
Copy link
Contributor Author

simmsb commented May 6, 2024

I've made structs nominally typed using the syntax you've proposed, I also refactored a small amount how name collisions are handled so that checking for name collisions between items in the 'type' namespace (and also 'value' namespace) are a bit simpler (however I didn't refactor the collision checking that occurs within the prefix transformer)

As for disambiguating dimensions and struct names, I simply check if a type expression is only a type identifier with a name matching a struct, and take it to be a struct type here: https://github.com/simmsb/numbat/blob/simmsb/struct-types/numbat/src/typechecker.rs#L1835-L1842. I think this is sound as a dimension name cannot clash with a struct.

@sharkdp
Copy link
Owner

sharkdp commented May 8, 2024

Thank you very much for the update — I tried a few things and it looks great!

Some remarks before I dive deeper into the review:

  • Some tests are currently failing. It looks like newlines are currently not supported in struct definitions (but of course it would be great if we could allow them)
  • It should not be allowed to define a struct Meter (or similar) with a name that is already taken by a dimension type. And vice versa.
  • I think it would be great to have some basic tests (1) in the parser, (2) for name resolving but most importantly (3) for the type checker.
  • The new struct keyword should be added to numbat/src/keywords.rs (see comment in tokenizer.rs to keep the lists in sync)

@simmsb
Copy link
Contributor Author

simmsb commented May 9, 2024

I've fixed all the tests and CI, there's now tests for parsing around structs and some more tests for type checking of structs, and name collisions.

Struct names do collide properly with dimension names, but not with unit, variable, and function names.

>>> struct Current {}
error: identifier clash in definition
   ┌─ Module 'core::dimensions', File <builtin>/modules/core/dimensions.nbt:32:11
   │
32 │ dimension Current
   │           ------- Previously defined dimension here
   │
   ┌─ <input:3>:1:8
   │
 1 │ struct Current {}
   │        ^^^^^^^ identifier is already in use

numbat/src/parser.rs Outdated Show resolved Hide resolved
numbat/src/ast.rs Outdated Show resolved Hide resolved
book/src/structs.md Outdated Show resolved Hide resolved
book/src/structs.md Outdated Show resolved Hide resolved
examples/numbat_syntax.nbt Outdated Show resolved Hide resolved
numbat/src/typechecker.rs Outdated Show resolved Hide resolved
Comment on lines +707 to +724
Type::Struct(StructInfo {
definition_span,
name,
fields,
}) => Type::Struct(StructInfo {
definition_span: *definition_span,
name: name.clone(),
fields: fields
.into_iter()
.map(|(n, (s, t))| {
(
n.to_owned(),
(s.clone(), apply_substitutions(t, substitute, substitutions)),
)
})
.collect(),
}),
type_ => type_.clone(),
Copy link
Owner

@sharkdp sharkdp May 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this code even trigger? We can't have structs with generic parameters, can we? That would require something like (copying Rust syntax, again):

struct Vec<Coordinate> {
  x: Coordinate,
  y: Coordinate,
}

numbat/src/vm.rs Outdated Show resolved Hide resolved
Copy link
Owner

@sharkdp sharkdp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I finished my review. Sorry for taking a bit longer. This looks really great! Most comments are just minor naming/clarification things. Let me know if you need any help with finishing this. I'm really looking forward to integrating this.

@sharkdp
Copy link
Owner

sharkdp commented May 13, 2024

Another small thing that I found:

This leads to an error:

dimension A
fn foo<A>(a: A) -> Scalar = 1

but this doesn't:

struct A {}
fn foo<A>(a: A) -> Scalar = 1

Edit: I fixed this.

Copy link
Owner

@sharkdp sharkdp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made some changes myself. Hope that's okay. There are two remaining open comments/questions, which I do not see as blocking issues that would prevent me from merging this. Maybe we can come back to them at some point.

@sharkdp sharkdp merged commit 26cad48 into sharkdp:master May 15, 2024
15 checks passed
@simmsb
Copy link
Contributor Author

simmsb commented May 18, 2024

Nice, thanks for taking the time to make those changes :)

@sharkdp
Copy link
Owner

sharkdp commented May 18, 2024

FYI: we're already making use of structs in a new feature: #431

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants