Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Range types for integers (or refinement types?) #671

Open
steveklabnik opened this issue Jan 21, 2015 · 40 comments
Open

Range types for integers (or refinement types?) #671

steveklabnik opened this issue Jan 21, 2015 · 40 comments
Labels
T-lang Relevant to the language team, which will review and decide on the RFC.

Comments

@steveklabnik
Copy link
Member

Issue by BryanQuigley
Saturday Dec 13, 2014 at 03:46 GMT

For earlier discussion, see rust-lang/rust#19801

This issue was labelled with: A-an-interesting-project, E-hard in the Rust repository


It seems like a natural extension of how variables (immutable by default, mutable if specified) are defined to allow the programmer to dictate a specific range of allowed values for an integer. If I know a value is only valid between 0-1000 the sooner I declare that the better it is for catching bugs, off by one errors, and more...

I'm not sure what exact syntax would work, maybe:

let mut(0,1000) x = 0i;

x is only valid from 0-1000 inclusive.

(Apologies if this is already possible, I've been parsing the docs trying to learn Rust.)

@kmcallister
Copy link
Contributor

Ada has this. It would be great in Rust, as we're going for the same niche of low-level + high-assurance.

I wonder though if we should go for a more general system, such as refinement types or pre/post-conditions. There have been a number of successes bolting these onto languages like Haskell, F#, and C#. AIUI, they manage this without heavy changes to the core language. The condition checker is a mostly-separate tool that gathers proof obligations from the source code and passes them to a SMT solver.

Basically I think this is an area where we should give researchers and other interested parties some time to experiment before we standardize something into Rust itself.

@vks
Copy link

vks commented Feb 22, 2015

Does Ada check the ranges at runtime or at compile time?

I don't think we need a special syntax for this.

Here is a toy implementation of an integer that checks the range at runtime:

use std::num::Int;

#[derive(Copy,Debug)]
struct RangedInt<T: Int> {
    value: T,
    min: Option<T>,
    max: Option<T>,
}

impl<T: Int> RangedInt<T> {
    fn checked_add(self, other: Self) -> Option<Self> {
        let sum = match self.value.checked_add(other.value) {
            None => return None,
            Some(i) => i,
        };
        match self.min {
            Some(i) if sum < i => return None,
            _ => (),
        };
        match self.max {
            Some(i) if sum > i => return None,
            _ => (),
        };
        Some(RangedInt { value: sum, min: self.min, max: self.max })
    }
}

fn main() {
    let a = RangedInt::<i64> { value: 5, min: Some(0), max: Some(10) };
    let b = RangedInt::<i64> { value: 9, min: Some(0), max: Some(10) };
    let ok = a.checked_add(a);
    let fail = a.checked_add(b);
    println!("ok:   {:?}", ok);
    println!("fail: {:?}", fail);
}    

yielding

ok:   Some(RangedInt { value: 10, min: Some(0), max: Some(10) })
fail: None

@kmcallister
Copy link
Contributor

@vks: I believe Ada's semantics are those of run-time checking, but the compiler can elide checks when it proves they can't fail.

I think this is a pragmatic approach to range checking. You can use the feature today, without understanding a complicated system for getting through the compiler. The code may be slow, but it will get faster over time as the compiler gets smarter, with zero effort on your part.

It sounds nice anyway.

Another approach is to insert dynamic range checks in debug builds only. This fits nicely with the new integer overflow semantics. Release builds would just prepare a report of which ranges were not verified statically. This lets you target unchecked ranges as a code metric to reduce over time.

At any rate, I agree that we don't need special syntax for range types. All we need is to allow macro invocations in type expressions. Then

fn foo(x: range!(0..10)) {

will work whether range! is a macro_rules! macro, a compiler plugin, or a built-in feature with a feature gate.

@freebroccolo has already done some of this work.

@vks
Copy link

vks commented Feb 22, 2015

@kmcallister I would much rather have generics over values. Something like RangedInt<i64, 0, 10>.

@oli-obk
Copy link
Contributor

oli-obk commented Mar 6, 2015

The issue with generic types or macros is that rust cannot automatically modify the range when it learns something new about a variable.

let mut(0,1000) x = 0i;
if (x < 100) { return; }
// range of x is now (100..1000)
let (0,1000) x = 0i;
let (0, 1000) y = ...;
let z = x / y;
// range of x is now (1..1000) as otherwise the division would have panicked.

@vks
Copy link

vks commented Mar 6, 2015

You can if you define the operators in terms of &mut self.

On Fri, Mar 6, 2015, 14:40 Oliver Schneider notifications@github.com
wrote:

The issue with generic types or macros is that rust cannot automatically
modify the range when it learns something new about a variable.

let mut(0,1000) x = 0i;if (x < 100) { return; }// range of x is now (100..1000)

let (0,1000) x = 0i;let (0, 1000) y = ...;let z = x / y;// range of x is now (1..1000) as otherwise the division would have panicked.


Reply to this email directly or view it on GitHub
#671 (comment).

@oli-obk
Copy link
Contributor

oli-obk commented Mar 6, 2015

Not if you implement it with [not yet available] generic values. Sidenote: This is actually similar to typestate.

@vks
Copy link

vks commented Mar 6, 2015

I think you can implement it as of now, with checks at runtime though.

On Fri, Mar 6, 2015, 16:22 Oliver Schneider notifications@github.com
wrote:

Not if you implement it with [not yet available] generic values. Sidenote:
This is actually similar to typestate.


Reply to this email directly or view it on GitHub
#671 (comment).

@kmcallister
Copy link
Contributor

Definitely check out Sage and the associated papers. Their system uses a hybrid of traditional type checking/inference, a theorem-proving black box, and dynamic checking as a fall-back. In 1,200 lines of (ML-syntax) test code, they have 5,087 subtyping judgements, of which 99.7% are discharged at compile time.

One of the Sage examples is a binary search tree whose type includes the range of valid keys, which is used to guarantee that the tree stays in search order. In Rust-like syntax:

fn Range(lo: isize, hi: isize) -> type {
    those x: isize where lo <= x && x < hi
}

enum BiTree(lo: isize, hi: isize) {
    Empty,
    Node {
        mid: Range(lo, hi),
        left: BiTree(lo, mid),
        right: BiTree(mid, hi),
    },
}

impl BiTree(lo: isize, hi: isize) {
    fn search(&self, v: Range(lo, hi)) -> bool {
        match *self {
            BiTree::Empty => false,
            BiTree::Node { ref mid, ref left, ref right } => {
                match v.cmp(mid) {
                    Ordering::Less => left.search(v),
                    Ordering::Equal => true,
                    Ordering::Greater => right.search(v),
                }
            }
        }
    }

    fn insert(&self, v: Range(lo, hi)) -> BiTree(lo, hi) {
        match *self {
            BiTree::Empty => BiTree::Node {
                mid: v,
                left: BiTree::Empty,
                right: BiTree::Empty,
            }
            BiTree::Node { ref mid, ref left, ref right } if v < *mid => {
                BiTree::Node {
                    mid: mid,
                    left: left.insert(v),
                    right: right,
                }
            }
            BiTree::Node { ref mid, ref left, ref right } => {
                BiTree::Node {
                    mid: mid,
                    left: left,
                    right: right.insert(v),
                }
            }
        }
    }
}

Note that in the dependently-typed setting, there is no need to separate type parameters and value parameters. I have taken various liberties with the syntax, particularly the treatment of implicit function arguments. The syntax

those x: isize where lo <= x && x < hi

represents a refinement type, equivalent to the more traditional

{x: isize | lo <= x && x < hi}

Refinements on fn arguments could use the existing where syntax and don't need the those keyword.

Making Rust dependently typed would be a huge step. But it's a huge step that provides integer generics, variadic generics, refinement types, compile-time function evaluation, and lots of other goodies, in a single unified framework. Potentially it could subsume a lot of the existing compile-time machinery, as well.

cc @bstrie

@nielsle
Copy link

nielsle commented May 25, 2018

The Const generics"-RFC will allow types that are generic over an integer parameter. This will probably be useful for defining ranged types in a library.

@leonardo-m
Copy link

Instead of this syntax:

let mut(0,1000) x = 0i;

I like:

let mut x: u32[.. 1_000] = 0;

Or even:

let mut x = 0_u32[.. 1_000];

@RustyYato
Copy link

RustyYato commented Jul 17, 2019

Once const-generics is completely implemented, const parameters can refer to type parameters and we are able to make bounds based off of const parameters, we will ideally be able to do this as a library like so,

struct Ranged<T: Ord, const LOW: T, const HIGH: T>(T) where LOW <= HIGH;

Right now on nightly we can do

struct RangedU32<const LOW: u32, const HIGH: u32>(u32);

But this will not allow the compiler to take advantage of the unused-space like NonZero* does. If it gets integrated into std then these Ranged* versions can be lang items while NonZero* becomes based off of these new Ranged* types.

@leonardo-m
Copy link

leonardo-m commented Jul 18, 2019

But this will not allow the compiler to take advantage of the unused-space like NonZero* does.

There's a list of various important features that library-defined intervals can't support. So I think they need to be built-ins.

@jhpratt
Copy link
Member

jhpratt commented Oct 21, 2019

Is there an RFC for something along the lines of what @KrishnaSannasi said? This could absolutely allow the compiler to optimize space, and could halve the size of certain structs. If not, I'll look into what's actually necessary for a proper RFC.

@RustyYato
Copy link

@jhpratt I am not aware of any RFC into this, but I think it would probably be best to hold off on it until const generics is complete on nightly.

@leonardo-m
Copy link

@KrishnaSannasi why do you think pipelining (working in parallel) on such things is a bad idea?

@RustyYato
Copy link

@leonardo-m I think that bounded range types should start off as a crate so we can try out different apis, and we can't really do that right now (no expression handling in const generics).

@jhpratt
Copy link
Member

jhpratt commented Oct 21, 2019

@KrishnaSannasi One of the key advantages of range bounded integers is the ability for smaller representation, which isn't possible without rustc integration.

@RustyYato
Copy link

RustyYato commented Oct 21, 2019

@jhpratt but the api we use to work with the bounded integer types don't depend on the compiler, so they can be worked on as a seperate crate. Also, with complete expressions in const generics, you could implement the space optimizations without compiler integration just by using Rust's powerful type system. Finally, bounded integers are useful even without that optimisation, so it would be fine to just have a crate.

@jhpratt
Copy link
Member

jhpratt commented Oct 21, 2019

Yes, most of the API would be possible to implement as a crate, though treating it as a primitive would not be. Ideally, I'd be able to do let x: int<1, 12> = 7; and it would just work without having to call .into() (or even perform runtime checks, as the compiler could check this simple case).

How would space optimization be enabled? int<1, 12> only needs 4 bits to store the information, but the backing type would be at the minimum eight. You could get around Option<int<1, 12>> by using specialization, but that gets complicated very quickly. The orphan rule might also have something to say about that, but I'm not 100% sure on that.

It's certainly useful, and something I may look into implementing to an extent myself at some point. It's just that it would be far more useful to have it built-in.

@RustyYato
Copy link

It's certainly useful, and something I may look into implementing to an extent myself at some point. It's just that it would be far more useful to have it built-in.

Yes, this is a valid reason to merge an external crate into std. But this is not a sufficient reason to start it in std.

I'd be able to do let x: int<1, 12> = 7; and it would just work without having to call .into() (or even perform runtime checks, as the compiler could check this simple case)

If there were runtime checks for let x: int<1, 12> = 7; in the generated assembly, I would be very concerned about all of Rust's zero-cost abstractions. This seems trivial to optimize out. Having the ability to not call into is some sugar is really nice, and even more reason to put them in std once we have a viable crate.

How would space optimization be enabled?

We could round up to the nearest byte (which would already be a large space saving) and use traits like those in typenum to pick which backing integer type to use.

You could get around Option<int<1, 12>> by using specialization, but that gets complicated very quickly.

You can't specialize layouts like this without compiler integration.


One thing about specializing layouts that I didn't see brought up. It would make multiplication and division much more costly. Assuming that you are only storing the offset from the lower bound, every multiplication and division would have an additional addition folded in to offset back to the correct value. Is that worth the space savings? Is there some clever way to not do this? Shifting back may not be possible, because the multiplication may overflow u128, in that case you could use other methods to multiply or divide, but it gets messy.

I bring this up because it doesn't look like anyone has discussed this here. Also, how do other languages implement this? Do any other languages implement this as a library?

@scottmcm
Copy link
Member

scottmcm commented Oct 24, 2019

Ideally, I'd be able to do let x: int<1, 12> = 7; and it would just work without having to call .into()

Note that this is also true for other things like NonZeroU32 and BigInteger, so to me that argues for a "custom literals" feature, not for "range integers need to be in core".

@sighoya
Copy link

sighoya commented Nov 3, 2019

If rust would adopt refinement types for example over where constraints, would it guarantee for a decidable fragment?

@khoek
Copy link

khoek commented Apr 25, 2020

Is anything like this in the language yet?

@HadrienG2
Copy link

It's probably best to experiment at the library level first, and that in turn requires const generics if you're not in the mood for fighting typenum.

@leonardo-m
Copy link

so to me that argues for a "custom literals" feature, not for "range integers need to be in core".

We should make a list of all the things a library defined ranged value can't do compared to a well designed built-in feature. I think the number of sub-features that should to be build-in is limited, but they are important enough to make them worth using often.

@burdges
Copy link

burdges commented May 21, 2020

We'll eventually want compiler plugins (forks?) for various "slow" static analysis task, including refinement types, but presumably only after chalk.

It'd also help prevent them from interacting with soundness. Refinements must not touch dispatch obviously. If refinements tried influencing dispatch, then they'd wind up inside some much longer purgatory than the mere 5 years required by specialization. ;)

An optional compile phase that rejects some programs when called, but cannot modify behavior sounds cleanest. And developers would prefer such phases being invoked only by CI.

@Lokathor
Copy link
Contributor

only by CI?

@jhpratt
Copy link
Member

jhpratt commented Nov 25, 2020

Just for anyone that may be thinking of doing the same thing — I intend to begin work on a basic implementation of ranged integers on nightly. With min_const_generics looking likely to stabilize in February 2021, it's probably best to get out ahead of it.

@scottmcm
Copy link
Member

@jhpratt Great to hear it! I look forward to seeing how it goes.

The piece I keep looking forward to, though -- Integer<A,B> + Integer<C,D> => Integer<{A+C},{B+D}> -- isn't in min_, sadly.

@jhpratt
Copy link
Member

jhpratt commented Nov 25, 2020

Yeah, there will be some explicit .try_into() calls needed for sure. For all I know this could go absolutely nowhere. I'm just interested in the feasibility of it for the time crate, honestly.

@jhpratt
Copy link
Member

jhpratt commented Dec 10, 2020

For the curious, I recently started working on that. Right now all there exists is definitions of various ranged types and their respective From/TryFrom implementations — though not yet with each other (so no TryFrom<RangedU16<…>> for RangedU8<…> yet).

Out of curiosity, I tried using #[rustc_layout_scalar_valid_range_start] and #[rustc_layout_scalar_valid_range_end], but they currently require literals. That's a compiler internal, though, so not a big deal.

If you'd like to follow along and/or contribute, I'll be periodically pushing code up to jhpratt/deranged. I haven't yet published anything on crates.io, though I have reserved the name.

@iFreilicht
Copy link

I'd love if we could find some solution for this that allows literals to be checked against the range of arbitrary types.
There's already num_traits::Bounded and fixed, which would benefit greatly from this.

@chrysn
Copy link

chrysn commented Sep 26, 2022

The upcoming version of ux uses a method to achieve something like ranged integers on stable Rust: It exposes numeric types (eg. the 28 bit unsigned integer u28) that are technically a struct, repr'd and aligned in such a way that the most significant byte is a 16-variant enum (inhabiting the values 0x0X) and the tail is bytes, that is always tuned (by inspecting the machine architecture) so that a number is represented exactly in its relevant-size'd representation. This way, the compiler knows about available niches, and it has many of the effects that a good solution to this should have. (Not all possibilities are realized by the compiler, but it has all the data and progressively implements more).

This has a few downsides:

  • It gets complex really fast. Building a NonZeroU28 would be rather complex.
  • It does take some time to build.
  • It's still a hack. It's a genius hack, and one I'd have no reservations about using, but it's still throwing a lot of code at something that could be a simple annotation to the compiler.

Back when this was started, we didn't have many of the tools we have now. Maybe it's time to revisit. To give a straw man proposal, the core library could expose (for each numeric type, using u16 as stand-in):

#[repr(transparent)]
pub struct RangedU16<const MIN: u16, const MAX: u16>(
    #[rustc_layout_scalar_valid_range_start(MIN)]
    #[rustc_ayout_scalar_valid_range_end(MAX)]
    u16
)

with the appropriate TryFrom, Into and and a pub unsafe fn new_unchecked(value: u16) -> Self. This could be introduced unstably and stabilized without the need to stabilize rustc_layout_scalar_valid_range_start, and the underlying implementation would stay free to change as long as the type can be transmuted into an u16 (which I assume is a practical thing, and NonZeroU16 does it too). It might even be interesting to look at whether NonZeroU16 can be typedefed at introduction to be RangedU16<1, 65535>.

@afetisov
Copy link

If we get variable-width numbers, they should be compiler built-ins. Otherwise we're inviting the C++ debug performance and compile time calamity.

@chrysn
Copy link

chrysn commented Sep 26, 2022

I agree; the ux is not a suggestion how to do this, but mentioned to illustrate that there is a need and that workarounds are being used.

But they don't have to look like compiler built-ins, they can behave like any other type (just with the difference that the compiler itself may have better tools to do that). The original post's proposal looked very built-in -- but back then we had no const generics to do any better.

@iFreilicht
Copy link

iFreilicht commented Sep 29, 2022

It seems #76560 could enable this. As @scottmcm said:

The piece I keep looking forward to, though -- Integer<A,B> + Integer<C,D> => Integer<{A+C},{B+D}> -- isn't in min_, sadly.

And we really only need basic arithmetic in const generics, which is exactly what the goal of that initiative is.
Maybe I'll dig out @jhpratt's effort from two years ago and see if that can be improved with the experimental complex const generics.

On the topic of ergonomics:

If we go for maximum correctness, I would assume that any constant and literal of value x has a type of RangedI32<x, x> by default. And they should of course be assignable to compatible ranged types as well:

const a = 6; 
// Equivalent to
const a: RangedI32<6, 6> = 6;
const b: RangedU64<6, 6> = 6;
const c: RangedU8<6, 6> = 6;

Now, if you use constants or literals in any sort of arithmetic, the same thing would apply, we talked about that before:

const a = 6; 
const b = 12;
// All the type annotations could be omitted as well
const sum: RangedI32<18, 18> = a + b;
const product: RangedI32<72, 72> a * b;
const div: RangedI32<2, 2> = b / a;

And so on. Extending to actual ranged values:

fn do_safe_arithmetic(a: RangedU8<30, 100>, b: RangedU8<4, 20>, factor: RangedU8<0, 2>) -> RangedU8<0, 192> {
    let z  = a - b;  // Type would be `RangedU8<10, 96>`
    z * factor
}

The compiler can now prove that some operations are always valid and can never overflow or underflow. Additionally, it could also give a warning or an error if something overflows:

fn do_dangerous_arithmetic(a: RangedU8<30, 100>, b: RangedU8<4, 20>, divisor: RangedU8<10, 120>) -> RangedU8<1, 200>  {
    let z = a * b;  // Type would be `RangedI8<120, 2000>`, which is of course not possible
//          ^^^^^       
//          |
//          Maximum result of calculation would be 2000, which is more than 255, the highest representable value
//          help: Try casting to RangedU16: [...]
    z / divisor
}

But here you can start to see the problem. If I were to follow the logical suggestion and cast the operands of the multiplication, the resulting function is awfully verbose:

fn do_arithmetic_proper(a: RangedU8<30, 100>, b: RangedU8<4, 20>, divisor: RangedU8<10, 120>) -> RangedU8<1, 200>  {
    let z = (a as  RangedU16<30, 100>) * (b as RangedU16<4, 20>);  // Type would now be `RangedU16<120, 2000>`, which is fine
    (z / divisor) as RangedU8<1, 200> 
}

(as seems to make more sense here than .into() as all these casts are known to be trivial as the range is representable in the new type)

Maybe this is just the price we have to pay for panic-safe arithmetic, but it feels icky. I'd rather have u8 just be an alias for u8<_, _>, so tracking the range at type level happens implicitly without the user having to explicitly specify it in every arithmetic operation.

fn do_arithmetic_proper(a: u8<30, 100>, b: u8<4, 20>, divisor: u8<10, 120>) -> u8<1, 200>  {
    let z = (a as  u16) * (b as u16);  // Implicit types would be `z: u16<120, 2000>`, `a: u16<30,100>`, `b: u16<10,120>`
    (z / divisor) as u8 
}

(when using u8 without range parameters and they can't be inferred, like in function signatures, the default should be the biggest representable range, of course)

I concede that this is kind of magical, but I feel like this is so much more readable, and would make the whole feature so incredibly useful and nice to adopt, it would be worth the squeeze.

Basically you'd get on-demand, zero-cost (except compile time) overflow safety.

@chrysn
Copy link

chrysn commented Sep 29, 2022 via email

@jhpratt
Copy link
Member

jhpratt commented Sep 29, 2022

I've played around with deranged before and have been able to successfully add implementations like that. They get quite verbose, but it's doable.

@safinaskar
Copy link

The work on range/pattern types is happening here: rust-lang/rust#107606 (gigantic PR +1,210 −237 is written) 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

No branches or pull requests