Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move from int to long int? #3277

Open
SteveBronder opened this issue Apr 1, 2024 · 3 comments
Open

Move from int to long int? #3277

SteveBronder opened this issue Apr 1, 2024 · 3 comments

Comments

@SteveBronder
Copy link
Collaborator

Summary:

I'm not sure where to post this as it's a Stan wide thing. Currently int values are just basic C int types with values from +/-2,147,483,647. Recently @bob-carpenter had a model with 1M parameters and it had me thinking that it's more realistic that we start having users try larger data sets in the future. For reference, a 10 column matrix of doubles with rows of length max int would be about 172 GB (8 * 2,147,483,647 * 10). Single machines are capable now of having 1TB of memory. While I think we are a few years away from having to worry about this it's something we should think about a little.

I think there's a few options for us

  1. Have a stan::index alias that we use throughout all the repos. We can set this to long int, unsigned int, or whatever we want. (Eigen::Index is long int by default)
  2. Add a long int to the language and have all the code that accepts integers be templated or use auto to deduce the int type. This is more work but lets users keep using int types without worry any loss in speed or memory size from larger integral types.

I'm personally fine with both. The only thing I feel strongly about is that we should keep using a signed type as the default for any index type

Current Version:

v2.34.1

@bob-carpenter
Copy link
Contributor

It was a rookie mistake on my part just using int in the beginning.

Rather than (1) or (2), how about:

  1. Replace all of the int types with the size-specific C++11 int64_t type.

I'd also be OK with a typedef set to that value if you think it may change again in the future. I'm not worried about mildly breaking backward compatibility in the case of a Stan program that relies on integer overflow behavior.

@WardBrian
Copy link
Member

WardBrian commented Apr 1, 2024

This is blocked partially by our current IO mechanisms, since the model classes' write_array serializes everything as a double.

Every int32 is representable in a double more-or-less exactly, but not every int64 is. For a classic example, 2^53 +1 is a totally valid integer, but it's nearest double representation is indistinguishable from 2^53

Now, @mitzimorris's ongoing look at re-thinking the IO has essentially already determined that this current way of doing things in write_array needs to go if we want other nice features, so hopefully we won't need to worry about this when the time comes.

@SteveBronder
Copy link
Collaborator Author

Replace all of the int types with the size-specific C++11 int64_t type.

Yes I like that

This is blocked partially by our current IO mechanisms, since the model classes' write_array serializes everything as a double.

Ooof, did not know that.

Yeah this is not a huge rush thing and idt it will be a problem for the foreseeable future. We can certainly handle the problems around this until we are ready for this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants