Value of `null`? #111

k-sareen · 2022-12-02T00:03:26Z

Having been programming in Rust for the last couple of years, I will say the one feature I'd like to import from Rust into Virgil is that references aren't allowed to be null, i.e. they have to be the Option (or Result) type. I quite like this as the type system forces you to check a potentially nullable reference at compile time and you avoid the nasty !NullPointerException errors (and yes -- I know many languages have had this for ages, but Rust was my first serious exposure to this). Virgil already has support for ADTs etc., so a native Option type is already possible except for the fact that null still exists (as you can just set an Option variable to null, which defeats the purpose of Option).

Hence, my question is, do you think there is still value in keeping null? I think it'd be nicer to not have null, but I understand it's a very significant change to the language and will require major refactoring.

titzer · 2022-12-04T22:55:26Z

I'd like to incrementally move away from nullable types, perhaps introducing ? as the option type constructor, and null as the value for not-present. My concern is how much real code will end up with ? everywhere for nullable types and how often programs end up using null-tolerant constructs such as .?.

In Virgil, functions can be null. I've recently started to feel like having an optional-invoke construct would be useful, e.g. for invoking debugging/tracing code, like:

var onEvent: E -> ();

def myfunc(e: E) {
  onEvent?(e);
  realWork(e);
}
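
For comparison, here is a rough Rust analogue of the optional-invoke idea, assuming the hook is stored as an `Option` of a function pointer. The names `Tracer`, `on_event`, `my_func` and `trace` are made up for illustration and are not Virgil APIs.

```rust
// Hypothetical sketch: an optional hook that is only called when present.
struct Tracer {
    // None plays the role of a null function value.
    on_event: Option<fn(&str) -> String>,
}

impl Tracer {
    // Rough equivalent of `onEvent?(e); realWork(e);`.
    // Returns the log lines produced, so the behavior is easy to inspect.
    fn my_func(&self, e: &str) -> Vec<String> {
        let mut log = Vec::new();
        // The hook runs only if one has been installed.
        if let Some(hook) = self.on_event {
            log.push(hook(e));
        }
        // The "real work" always runs.
        log.push(format!("work:{}", e));
        log
    }
}

// Example hook used for tracing.
fn trace(e: &str) -> String {
    format!("trace:{}", e)
}
```

With `on_event: None` only the work line is produced; with `on_event: Some(trace)` the trace line comes first.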

k-sareen · 2022-12-05T02:19:20Z

> My concern is how much real code will end up with ? everywhere for nullable types

Yeah. In my subjective experience with Rust, it can litter the code with ? everywhere if one function deep in the call stack returns an Option (or Result) type, as you have to pass the values up through the call stack until a function has enough context to deal with the None (or Err) value. Though often it is just the function one level higher (i.e. the caller) than the function that is returning the optional value.
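
A small sketch of that propagation pattern in Rust, assuming a made-up three-level call chain (`parse_port`, `port_from_config` and `describe` are hypothetical names). Only the outermost function decides what to do with the missing value.

```rust
// Innermost function: may fail to produce a value.
fn parse_port(s: &str) -> Option<u16> {
    s.parse().ok()
}

// Middle function: has no useful way to handle None, so it forwards it.
fn port_from_config(line: &str) -> Option<u16> {
    // `?` returns None early if there is no '=' in the line.
    let (_, value) = line.split_once('=')?;
    // `?` returns None early if the value is not a valid port.
    let port = parse_port(value.trim())?;
    Some(port)
}

// Outermost function: finally has enough context to deal with None.
fn describe(line: &str) -> String {
    match port_from_config(line) {
        Some(p) => format!("port {}", p),
        None => String::from("no usable port"),
    }
}
```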

> I've recently started to feel like having an optional-invoke construct would be useful, e.g. for invoking debugging/tracing code

Feels like ? is starting to get very overloaded then if it will be used for checking if a variable is None or Err (as Rust uses it), as the "optional-invoke construct", and also as the type query operator (.?). The optional-invoke construct definitely sounds quite interesting for benchmarking as well.

k-sareen · 2022-12-05T02:28:57Z

Though Virgil already has a lot of:

var val = something();
if (val == null) {
    fail("val is null");
}

which would be cleaned up by the ? operator.

The major way I've seen null being used (and correct me if I'm wrong) is for initialization, which I think would benefit from the ? operator.

titzer · 2022-12-05T22:56:32Z

One possible migration path would be to introduce the option type constructor first, while keeping the nullability of class and array types as it is now. The option type constructor applied to type T creates a type that represents the union of the null value with the set of values represented by the type T.

For syntax, I like postfix ? for types, so they look like T?, such as int?, Foo?, and so on.
This type constructor would have a couple of special cases, like T?? == T?, because of the union semantics above. Similarly, for a class or array type T, initially T? == T.
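
To make the contrast concrete: Rust's `Option` is a tagged type rather than a union, so nesting does not collapse by itself and `Some(None)` is distinct from `None`. The sketch below (the helper name `collapse` is made up) uses the standard `Option::flatten` to perform explicitly the identification that union semantics (T?? == T?) would make implicitly.

```rust
// Maps both None and Some(None) to None, and Some(Some(x)) to Some(x).
fn collapse(nested: Option<Option<i32>>) -> Option<i32> {
    nested.flatten()
}
```

Under the proposed union semantics there would be nothing to flatten, because the nested type would already be the same type.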

Overall, this kind of change requires the compiler to support the union of the old language behavior and the new language behavior, selectable with a command-line flag (similar to how I did the -legacy-cast migration, but bigger). It takes at least two bootstraps and stable releases to get the behavior on-by-default, because the compiler must first support the new semantics under a flag, then the compiler (and all tests) need to be migrated to comply with the new rules, then the flag can be flipped, then the flag removed.

To support the union of behaviors I would first introduce non-nullable types in the compiler's representation of Virgil types. They wouldn't necessarily have a source syntax, but it would allow working on the verifier rules. Thus the verifier could be incrementally migrated to generate errors, enabled by the command-line flag.
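
A toy model of that idea, written in Rust purely for illustration. This is not the Virgil compiler's actual representation: `Nullability`, `Diagnostic`, `check_null_assign` and the `strict` parameter (standing in for the command-line flag) are all assumptions.

```rust
// Hypothetical internal marker attached to a type; no source syntax needed.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Nullability {
    Nullable,
    NonNull,
}

// Hypothetical verifier result.
#[derive(Debug, PartialEq)]
enum Diagnostic {
    Ok,
    Error(&'static str),
}

// Checks an assignment of `null` to a variable with the given nullability.
// With the flag off, the legacy behavior (everything accepted) is kept.
fn check_null_assign(target: Nullability, strict: bool) -> Diagnostic {
    match (target, strict) {
        (Nullability::NonNull, true) => {
            Diagnostic::Error("null assigned to non-nullable type")
        }
        _ => Diagnostic::Ok,
    }
}
```

Each new verifier rule can be added behind the same switch, so existing code keeps compiling until the flag is flipped.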

k-sareen · 2022-12-06T07:10:03Z

Yes that makes sense to me.

What do you imagine the semantics of a nullable type are? For the sake of argument, let's take a user defined type Foo that has a field var length: int and function def bar() -> f64. Now, is the following program valid?

var f: Foo? = null;
System.puts(f.length);
System.puts(f.bar());

In Rust, you can't access the internal type T without unpacking the Option<T> (either with a match/if statement or the ? operator). Kotlin, for example, lets you write such an access, but rejects it with a compilation error. Kotlin has the "safe call operator", similar to but not the same as Rust's ? operator, wherein any nullable type's fields or methods can be accessed by doing f?.length. I personally like the Option type in Rust more due to its explicitness (note that it's not verbose due to the existence of the ? operator), but it may just be Stockholm syndrome, haha.

Also, semantics-wise, it'd be good to have predefined functions on every nullable type such as def is_some() -> bool and def is_none() -> bool as well.

Rust optimizes Option<T> to be the same size as T for pointer types such as Box, &T, &mut T etc. (effectively null pointer optimization). This would be an important optimization imo.
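
This can be checked directly with `std::mem::size_of`. The helper `sizes` below is a made-up convenience that returns the size of `T` next to the size of `Option<T>`.

```rust
use std::mem::size_of;

// Returns (size of T, size of Option<T>) so the two can be compared.
fn sizes<T>() -> (usize, usize) {
    (size_of::<T>(), size_of::<Option<T>>())
}
```

For `Box<i32>` and `&u8` the two numbers are equal, because the null pointer encodes `None`. For a plain `i32` the option is larger, since every bit pattern is already a valid value and a separate tag is needed.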

While we're here, how does Virgil want to support errors/exception handling? The Rust (or sort of functional) style with the Result type (so explicit error propagation), or the Java style exception handling? A Result type is almost an Option type, so it could be beneficial to discuss this. The ? operator in Rust can be used for both checking and returning an Option and Result value immediately. That is,

fn some_function() -> Result<f64, Err> {
    let f = some_other_function()?;
    // use f
}

is equivalent to

fn some_function() -> Result<f64, Err> {
    let f = match some_other_function() {
        Ok(f) => f,
        Err(e) => return Err(e),
    };
    // use f
}

If the goal is to have explicit error propagation like with a Result type, then I think an operator for quickly returning errors like the ? operator in Rust is a good idea in order to reduce repetitive, boring, boilerplate code (I'm looking at you, Go).
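
A self-contained version of the same comparison, for anyone who wants to compile it. The `ParseError` type and the body of `some_other_function` are filled in as assumptions so that the two forms can be compared side by side.

```rust
// Hypothetical error type carrying a message.
#[derive(Debug, PartialEq)]
struct ParseError(String);

// Stand-in for a fallible operation: parse a float from text.
fn some_other_function(input: &str) -> Result<f64, ParseError> {
    input.parse::<f64>().map_err(|e| ParseError(e.to_string()))
}

// Short form: `?` returns the error to the caller immediately.
fn with_question_mark(input: &str) -> Result<f64, ParseError> {
    let f = some_other_function(input)?;
    Ok(f * 2.0)
}

// Long form: the same control flow written out with `match`.
fn with_match(input: &str) -> Result<f64, ParseError> {
    let f = match some_other_function(input) {
        Ok(f) => f,
        Err(e) => return Err(e),
    };
    Ok(f * 2.0)
}
```

Strictly speaking, `?` also passes the error through `From::from`, which is a no-op here because both functions use the same error type.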

titzer · 2022-12-06T07:56:27Z

> Yes that makes sense to me.
>
> What do you imagine the semantics of a nullable type are? For the sake of argument, let's take a user defined type Foo that has a field var length: int and function def bar() -> f64. Now, is the following program valid?
>
> var f: Foo? = null;
> System.puts(f.length);
> System.puts(f.bar());

I think an error for accessing fields or calling methods on a nullable type would be in order. (Otherwise, what's the point, ja?)

> In Rust, you can't access the internal type T without unpacking the Option<T> (either with a match/if statement or the ? operator). Kotlin, for example, lets you write such an access, but rejects it with a compilation error. Kotlin has the "safe call operator", similar to but not the same as Rust's ? operator, wherein any nullable type's fields or methods can be accessed by doing f?.length. I personally like the Option type in Rust more due to its explicitness (note that it's not verbose due to the existence of the ? operator), but it may just be Stockholm syndrome, haha.

I think I like some shorthands for either forcing a nullcheck (with !NullPointerException) or for accessing fields/methods, similar to Kotlin.

The expression e.?f, representing a null-tolerant field load, would be sugar for {var tmp = e; if(tmp != null, tmp.f) } and similarly for method calls, e.?m(exprs) is sugar for {var tmp = e; if(tmp != null, tmp.m(exprs)) } (note, exprs not evaluated if e is null).
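
A rough Rust rendering of the null-tolerant access described above, assuming a made-up `Foo` with a `length` field and a `scaled` method. The argument is passed as a closure so that it is only evaluated when the receiver is present. Note that this sketch yields an `Option`, which may differ from what the Virgil sugar would produce.

```rust
struct Foo {
    length: i32,
}

impl Foo {
    fn scaled(&self, by: i32) -> i32 {
        self.length * by
    }
}

// Null-tolerant field load: read the field only if the receiver is present.
fn safe_length(e: Option<&Foo>) -> Option<i32> {
    e.map(|tmp| tmp.length)
}

// Null-tolerant method call: `arg` is invoked only when the receiver is
// present, so its side effects are skipped for a missing receiver.
fn safe_scaled(e: Option<&Foo>, arg: impl FnOnce() -> i32) -> Option<i32> {
    e.map(|tmp| tmp.scaled(arg()))
}
```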

> Also, semantics-wise, it'd be good to have predefined functions on every nullable type such as def is_some() -> bool and def is_none() -> bool as well.

Sure, you could just use T.?(e) and e == null for that, because the type T would represent the non-null type.

> Rust optimizes Option<T> to be the same size as T for pointer types such as Box, &T, &mut T etc. (effectively null pointer optimization). This would be an important optimization imo.

Yeah, I want to generally upgrade the middle of the compiler to represent ADTs more efficiently, so it's effectively like using a tag bit for non-reference types and just using a null pointer for reference types.

> While we're here, how does Virgil want to support errors/exception handling? The Rust (or sort of functional) style with the Result type (so explicit error propagation), or the Java style exception handling? A Result type is almost an Option type, so it could be beneficial to discuss this. The ? operator in Rust can be used for both checking and returning an Option and Result value immediately. That is,

No on using Java-style exceptions, so more in the style of encoding errors into return types (sometimes as ADTs). It turns out I sometimes end up encoding errors as configurable behavior on an object, like how DataReader has a mutable onError function member, or passing an additional argument which is an error generator/collector and returning a default value. (E.g. both the Virgil compiler and the Wizard verifier do this a lot).
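
A sketch of the "configurable error behavior" style, written in Rust for illustration. `ByteReader`, `on_error` and `read_u8` are hypothetical names loosely modeled on the description of DataReader's onError member; they are not the real Virgil API.

```rust
// Hypothetical reader with a replaceable error hook.
struct ByteReader<'a> {
    data: &'a [u8],
    pos: usize,
    // Called with the position and a message whenever a read fails.
    on_error: Box<dyn FnMut(usize, &str) + 'a>,
}

impl<'a> ByteReader<'a> {
    // Reads one byte. On failure it reports through the hook and returns
    // a default value instead of an error type.
    fn read_u8(&mut self) -> u8 {
        match self.data.get(self.pos) {
            Some(&b) => {
                self.pos += 1;
                b
            }
            None => {
                (self.on_error)(self.pos, "unexpected end of data");
                0
            }
        }
    }
}
```

The hook can collect messages into a list, print them, or abort, without changing the signature of `read_u8`. The cost is that callers cannot tell a real 0 from the default unless they consult the hook's state.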

> fn some_function() -> Result<f64, Err> {
>     let f = some_other_function()?;
>     // use f
> }
>
> is equivalent to
>
> fn some_function() -> Result<f64, Err> {
>     let f = match some_other_function() {
>         Ok(f) => f,
>         Err(e) => return Err(e),
>     };
>     // use f
> }
>
> If the goal is to have explicit error propagation like with a Result type, then I think an operator for quickly returning errors like the ? operator in Rust is a good idea in order to reduce repetitive, boring, boilerplate code (I'm looking at you, Go).

I can see the value of having an explicit kind of error-propagating type, so if there was a way to integrate that in a more Virgilistic way (I avoid building too many named types, particularly capitalized named types), that'd be neat.

k-sareen · 2022-12-06T14:24:11Z

> I think an error for accessing fields or calling methods on a nullable type would be in order. (Otherwise, what's the point, ja?)

Haha yes of course. I meant more about the syntax and UX, I guess.

> The expression e.?f, representing a null-tolerant field load, would be sugar for {var tmp = e; if(tmp != null, tmp.f) } and similarly for method calls, e.?m(exprs) is sugar for {var tmp = e; if(tmp != null, tmp.m(exprs)) } (note, exprs not evaluated if e is null).

I think ideally it's the same syntax for accessing a field and a method like how Kotlin does it.

> Sure, you could just use T.?(e) and e == null for that, because the type T would represent the non-null type.

Exactly yeah. Just better to have semantic/descriptive function names in my personal subjective opinion.

> I can see the value of having an explicit kind of error-propagating type, so if there was a way to integrate that in a more Virgilistic way (I avoid building too many named types, particularly capitalized named types), that'd be neat.

Ah right -- I just realized that yeah Virgil does not have built-in types that start with a capital letter other than Array. Is there any particular reason for that? Don't want to potentially have to deal with shadowing user-defined types? Or something else?

Could do something like an err type? Or result equivalently (this is probably more semantically appropriate).

titzer · 2022-12-07T04:01:34Z

> Ah right -- I just realized that yeah Virgil does not have built-in types that start with a capital letter other than Array. Is there any particular reason for that? Don't want to potentially have to deal with shadowing user-defined types? Or something else?

Yes, I am trying to keep a very strict separation between what is a library and what is in the language, and language types looking like library types can cause some confusion. Array<T> was originally a placeholder for making a decision about a better syntax. It stuck because it's easy to read.

> Could do something like an err type? Or result equivalently (this is probably more semantically appropriate).

Perhaps. One thing about errors that I have dealt with a lot recently is that the error cases basically have arguments, such as which file or line number they occurred at, etc. There is some application-level data that might be programmatically attached and useful.
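
As an illustration of error cases that carry arguments, here is a sketch in Rust. The `LoadError` type and its cases are invented for the example; the point is only that each case can hold its own application-level data, such as a file name or line number.

```rust
use std::fmt;

// Hypothetical error type whose cases carry different payloads.
#[derive(Debug, PartialEq)]
enum LoadError {
    FileNotFound { file: String },
    Syntax { file: String, line: u32, message: String },
}

// Formats each case using the data attached to it.
impl fmt::Display for LoadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LoadError::FileNotFound { file } => {
                write!(f, "{}: file not found", file)
            }
            LoadError::Syntax { file, line, message } => {
                write!(f, "{}:{}: {}", file, line, message)
            }
        }
    }
}
```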

k-sareen · 2022-12-09T01:37:38Z

> Perhaps. One thing about errors that I have dealt with a lot recently is that the error cases basically have arguments, such as which file or line number they occurred at, etc. There is some application-level data that might be programmatically attached and useful.

Right. But these fields can be transparently inserted by the compiler at compilation time without the user having to specify them (well I don't know if it's specifically easy to do in Virgil, but it is theoretically possible at least). Unless you imagine exposing these fields (i.e. file name, line number) to the programmer? I don't immediately see how it's useful or relevant, since generally something like an enum with a field (describing the error) is good enough for the programmer to report errors.

k-sareen · 2022-12-09T01:41:41Z

We could do something like:

def fn_that_returns_error(fail: bool) -> !UserType {
    if (fail) {
        return err("some error");
    } else {
        return UserType.new();
    }
}

or some other similar syntax for returning errors. Here !UserType is a union of err<T>(val: T) and UserType.
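
For reference, the closest existing Rust shape for this would be a `Result` whose error payload is a string. The sketch below mirrors the function above under that assumption; `caller` is an extra hypothetical function showing how the error would be forwarded.

```rust
#[derive(Debug, PartialEq)]
struct UserType;

// Rough counterpart of `-> !UserType`, assuming a string error payload.
fn fn_that_returns_error(fail: bool) -> Result<UserType, String> {
    if fail {
        Err(String::from("some error"))
    } else {
        Ok(UserType)
    }
}

// A caller can forward the error with `?` without inspecting it.
fn caller(fail: bool) -> Result<&'static str, String> {
    let _value = fn_that_returns_error(fail)?;
    Ok("got a UserType")
}
```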

Though maybe this discussion should be moved to a separate GitHub issue.
