lapack_lite should use 64-bit integer indices #5906

Closed
njsmith opened this issue May 21, 2015 · 37 comments · Fixed by #15218

Comments

@njsmith
Member

njsmith commented May 21, 2015

Currently lapack-lite uses int for indices, so large arrays overflow them and crash (e.g. #5898). There's a separate need to handle this when using a real BLAS/LAPACK (which may or may not handle arrays with more than 2**31 elements, depending on the vendor), but when falling back on our built-in code there's no reason we shouldn't get this right.
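
To make the failure mode concrete, here is a minimal sketch; it is not lapack_lite code, but the f2c-translated sources compute flat offsets with a 32-bit integer typedef in essentially this way:

#include <stdio.h>

typedef int integer;  /* f2c-style 32-bit index type, as in lapack_lite */

int main(void)
{
    /* An array with 100000 columns of length 30000 has 3 * 10**9
     * elements, so a flat offset no longer fits in a 32-bit int. */
    integer lda = 100000, j = 30000;

    long long correct = (long long)j * lda;  /* 3000000000 */
    integer wrapped = j * lda;               /* signed overflow: undefined
                                                behavior; in practice it
                                                wraps to a negative value */

    printf("correct offset: %lld, as 32-bit integer: %d\n", correct, wrapped);
    return 0;
}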

@argriffing
Contributor

I remember a mailing list thread (http://mail.scipy.org/pipermail/numpy-discussion/2014-April/069868.html) suggesting the possibility of blis (https://code.google.com/p/blis/), a BSD-licensed BLAS-like framework. I just checked its config notes, and it explicitly allows separate internal vs. interface integer sizes, as documented in the 'BLAS compatibility layer' section in
https://code.google.com/p/blis/wiki/ConfigurationHowTo
In particular, you could have

#define BLIS_BLAS2BLIS_INT_TYPE_SIZE 32
#define BLIS_INT_TYPE_SIZE 64

If I understand correctly, the current numpy code doesn't have this flexibility. On the other hand I'm not sure if 'blis' and its related projects include enough of the BLAS/LAPACK functionality yet, or if its Windows support is mature enough, or if enough of its plumbing is otherwise compatible with numpy.

@njsmith
Member Author

njsmith commented May 22, 2015

Blis is only a replacement for blas, not for lapack. But fortunately, the same people also have a pure-C lapack replacement called flame, which is at least feature complete. So yeah, this is one possible direction things could go. They're at UT Austin, so we should talk to them at SciPy. (And I'm in contact with them in the next email thread over...) That's a somewhat more ambitious and complicated strategy than just doing a search-replace on our existing fallback code though :-).

@argriffing
Contributor

just doing a search-replace on our existing fallback code though :-)

Which file should be search-and-replaced? I think the C file is somehow autogenerated. Also, both the interface and the internal variables use the type integer, so it would not be possible to just re-typedef integer or to search-and-replace integer in the C file.

@njsmith
Member Author

njsmith commented May 22, 2015

I think (say, 70% confidence, so don't trust me blindly :-)) that: (a) those files were autogenerated once long ago, via means that may or may not be reproducible now, and may or may not have been hand-tweaked since, so hand-tweaking them now is not necessarily an issue; (b) the files are used entirely internally inside numpy, so we can change the interface however we like.

@argriffing
Contributor

we can change the interface however we like.

I think it would make sense for the interface to remain 32-bit, for compatibility with the real BLASes.

@njsmith
Member Author

njsmith commented May 22, 2015

What compatibility issues do you foresee? It's purely an internal interface, and having a 64-bit version gives us something to fall back on when real blas/lapack don't work. Bug-for-bug compatibility isn't really a goal here.

@argriffing
Contributor

@njsmith The 32-bitness of the interface is not the main cause of the bugs. For example, https://github.com/numpy/numpy/blob/master/numpy/linalg/umath_linalg.c.src uses int in a bunch of places, including in the interface. I'm suggesting to not change these types. On the other hand, https://raw.githubusercontent.com/numpy/numpy/master/numpy/linalg/lapack_lite/dlapack_lite.c has a bunch of integer declarations, some of which are for the interface and some of which are for internal indexing. I think the interface types should remain 32-bit and the integer types used for indexing in that file should be bumped to size_t.
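
As a rough sketch of what that would mean (dfoo_ is a made-up routine, not something from dlapack_lite.c, and size_t is just one candidate for the widened type):

#include <stddef.h>

typedef int integer;  /* interface type: stays 32-bit */

/* Hypothetical routine in the style of the f2c output. */
int dfoo_(integer *n, double *a, integer *lda)
{
    for (integer j = 0; j < *n; ++j) {
        /* before: a[j * *lda] = 0.0;  -- the product is computed in int
         * and can overflow for large *lda */
        size_t offset = (size_t)j * (size_t)(*lda);  /* widened indexing */
        a[offset] = 0.0;
    }
    return 0;
}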

@njsmith
Member Author

njsmith commented May 22, 2015

What happens when someone has an array with more than 2**31 elements along
one axis?

@argriffing
Contributor

What happens when someone has an array with more than 2**31 elements along one axis?

Then everything is ruined. I think this is the case even for the default settings of the 'good' real BLASes, but I'm not completely sure.

@njsmith
Member Author

njsmith commented May 22, 2015

It is the case for real blas (modulo non-portable extensions), but that
doesn't mean it has to be the case for numpy.

@argriffing
Contributor

I think it would make sense to fix these problems one at a time. For example, the first improvement would get the numpy blas to the point where it does not segfault when the real blas does not segfault; a later, second improvement could let the numpy blas work with larger arrays than the real blas can handle.

@pv
Member

pv commented May 22, 2015 via email

@pv
Member

pv commented May 22, 2015 via email

@argriffing
Contributor

Lots of work if done manually, it seems.

I'm vaguely aware of research projects that try to automate this kind of thing, but I don't know any specifics. Also, some time around 2009-2010 the netlib lapack switched from the f2c CLAPACK (which I think we are using in numpy) to including C implementations in their main lapack distribution, if I understand their webpage correctly.

@njsmith
Member Author

njsmith commented May 22, 2015

According to the README in lapack_lite/, we are using an independent
translation sorta similar to but different from CLAPACK. There's also code
in there to do the translation and automatically postprocess the code in
various ways.

@argriffing
Contributor

So I looked at the most recent netlib lapack C code, and it might implement the first of the two levels of improvement. On one hand it doesn't seem to use different types for the interface vs. the internal indexing variables (in contrast to blis and openblas), but on the other hand it appears to use size_t casts where necessary. I'll include a couple of examples.


    for( i = 0; i < MIN( y, ldin ); i++ ) {
        for( j = 0; j < MIN( x, ldout ); j++ ) {
            out[ (size_t)i*ldout + j ] = in[ (size_t)j*ldin + i ];
        }
    }

This works because it explicitly casts to size_t even though the indexing variables are small.


a_t = (double*)LAPACKE_malloc( sizeof(double) * lda_t * MAX(1,n) );

This works because sizeof has type size_t, so even though lda_t and n are both small integer types, they are converted to size_t before the multiplications and do not overflow. This is as opposed to malloc(8 * lda_t * MAX(1, n)), where the product would be computed in int and could overflow.
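
A quick self-contained illustration of the difference (made-up values, assuming a 64-bit size_t):

#include <stdio.h>

int main(void)
{
    int lda_t = 100000, n = 30000;

    /* sizeof(double) has type size_t, so lda_t and n are converted to
     * size_t before the multiplications: no overflow. */
    size_t good = sizeof(double) * lda_t * n;

    /* Here all operands are int, so 8 * 100000 * 30000 overflows int
     * (undefined behavior) before the result is ever widened. */
    size_t bad = (size_t)(8 * lda_t * n);

    printf("good = %zu\nbad  = %zu\n", good, bad);
    return 0;
}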

Edit: Now I see that the C code included with recent netlib lapack comprises only a couple of layers of C wrapping around the original Fortran functions.

@njsmith
Member Author

njsmith commented May 22, 2015

@pv: What if we just did 'typedef npy_intp integer'?


@pv
Member

pv commented May 22, 2015 via email

@njsmith: some of the routines take int arrays --- can't just change it

@njsmith
Member Author

njsmith commented May 22, 2015

Having looked over the code a bit, it does seem doable to me -- the main issue is that the way the build system works, we assume that lapack_lite routines will have the same signatures as real blas/lapack routines, so the callers just have hard-coded signatures and rely on the build system to link them to one library or the other. So if lapack_lite ended up with different signatures, then we'd need some build-time #define's or something in the callers to choose the right signatures. Which is doable.

For the int arrays, you're thinking of things like dgeqrf's lwork argument? These are simple arrays allocated by the caller, right? So if we have a LAPACK_INT_T #define then the caller can allocate an array of the appropriate type and we're all good, right? AFAICT there is no way for any code outside of the numpy source tree to call these things.

Not saying the patch is trivial necessarily, just that it seems doable if someone wants to take it on.


@argriffing
Contributor

some of the routines take int arrays --- can't just change it

Although the lapack-lite function-definition C code does not make the scalar vs. array distinction, the information is available in the original Fortran file. Maybe richer typedef-ing in f2c or in the numpy code-generation scripts could preserve the difference? (A sketch of that idea follows the excerpts below.)

from https://raw.githubusercontent.com/numpy/numpy/master/numpy/linalg/lapack_lite/dlapack_lite.c:

/* Subroutine */
int dgetrf_(integer *m, integer *n, doublereal *a, integer *lda, integer *ipiv, integer *info) {

from dgetrf.f:

*  =====================================================================
      SUBROUTINE DGETRF( M, N, A, LDA, IPIV, INFO )
*
*  -- LAPACK computational routine (version 3.4.0) --
*  -- LAPACK is a software package provided by Univ. of Tennessee,    --
*  -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
*     November 2011
*
*     .. Scalar Arguments ..
      INTEGER            INFO, LDA, M, N
*     ..
*     .. Array Arguments ..
      INTEGER            IPIV( * )
      DOUBLE PRECISION   A( LDA, * )
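
For example, a hypothetical pair of typedefs that a richer code-generation pass could emit (the names are made up; this is a sketch of the idea, not existing numpy code):

#include <stdint.h>

typedef int64_t integer;        /* scalar INTEGER arguments and indices:
                                   safe to widen internally */
typedef int32_t integer_array;  /* INTEGER(*) array arguments like IPIV:
                                   keep the caller-visible element size */

/* dgetrf_'s generated signature could then distinguish the two: */
int dgetrf_(integer *m, integer *n, double *a, integer *lda,
            integer_array *ipiv, integer *info);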

@argriffing
Contributor

But fortunately, the same people also have a pure-C lapack replacement called flame, which is feature complete at least.

Wait, are you sure this is feature complete? I had the idea that the BLAS part is feature-complete, but that some of the LAPACK parts depend on a real underlying LAPACK. For example, I searched the docs for gesv, and the only hit seems to defer to an external implementation.

Imp. Notes: FLA_Svd_external() performs its computation by calling an external implementation
of the LAPACK routines ?gesvd()/?gesvd(). The algorithmic variants employed by
these routines, as well as any blocksizes used by subroutines of ?gesvd()/?gesvd(), are
implementation-dependent.

@njsmith
Member Author

njsmith commented May 23, 2015

Wait, are you sure this is feature complete? I had the idea that the BLAS part is feature-complete, but that some of the LAPACK parts depend on a real underlying LAPACK.

I am not sure! I just know that Robert van de Geijn told me: "we have an entire LAPACK implementation that does not require a Fortran compiler, passes the LAPACK test suite, and in cases where our libflame library has better algorithms links to libflame." So I'm not sure what that means exactly (apparently this is something different from libflame itself?), and it's always possible I misinterpreted somehow. If you want to investigate this more, then I can put you in touch with them :-). (Will you be at SciPy by chance?)

@argriffing
Contributor

Maybe it would be possible to use a slow portable OpenBLAS pure-C reference implementation if this exists? I know this is not the point of OpenBLAS, and I don't know OpenBLAS well enough to know whether this is a thing. This could especially make sense if OpenBLAS becomes the recommended 'real' BLAS/LAPACK as proposed in http://mail.scipy.org/pipermail/numpy-discussion/2015-May/072979.html.

(Will you be at SciPy by chance?)

I may try to fit it between the turpentine, yak, and alligator conferences!

@njsmith
Member Author

njsmith commented May 27, 2015

Pff, SciPy will be a way better waste of time than those. Also, @xianyi and the BLIS developers are all at UT Austin these days, just like SciPy, so it's a good opportunity to try and sort some of these things out.

@xianyi

xianyi commented Jun 2, 2015

OpenBLAS supports 64-bit integers via make INTERFACE64=1.

@argriffing
Contributor

I guess OpenBLAS also has the advantage (vs. numpy's 'vendored' fallback BLAS/LAPACK) of supporting 64-bit indices internally even when the interface is 32-bit (BLASLONG vs. blasint). Currently numpy's BLAS/LAPACK doesn't make this distinction between interface integer size and internal index size, and the resulting unnecessary overflows cause segfaults. It would be nice either to 'vendor' a better vanilla BLAS/LAPACK or to depend on an external BLAS/LAPACK, but I'm not sure which option is more realistic.

@tkelman
Contributor

tkelman commented Jun 3, 2015

As Julia learned the hard way (and eventually fixed), using 64-bit integers in BLAS (aka ILP64) is quite dangerous if you are calling into BLAS as a shared library, because it represents an incompatible ABI relative to the long-established standard configuration in which libblas is most commonly compiled. If any other library or C extension you load into the same application happens to call a function from libblas expecting the conventional 32-bit-int API, you get segfaults and other nasty behavior. We now avoid this in Julia by modifying the function names (appending a 64_ suffix to all ILP64 blas and lapack symbols) through which we call our own internal ILP64 copy of libblas (most commonly libopenblas). See JuliaLang/julia#4923 and JuliaLang/julia#8734 and OpenMathLib/OpenBLAS#459 and #5479 (comment).

Supporting this as a non-default build configuration for anyone who wants to compile from source is perfectly fine to do, but it should come with a big warning unless you also modify the function names.

lapack-lite might not be subject to this problem, depending on whether it's built as a shared library, whether or not it uses the conventional blas and lapack function names directly, whether or not it's visible for the outside world to ever use it directly, or other caveats that I'm not aware of.
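
For anyone who hasn't seen the failure mode, here is a single-file simulation of the mismatch; the real crashes happen across shared-library boundaries, but the function-pointer cast below triggers the same kind of undefined behavior:

#include <stdint.h>
#include <stdio.h>

/* Stand-in for an ILP64 routine: it reads n as an 8-byte integer. */
static double sum_first_n(const int64_t *n, const double *x)
{
    double s = 0.0;
    for (int64_t i = 0; i < *n; ++i)
        s += x[i];
    return s;
}

int main(void)
{
    double x[3] = {1.0, 2.0, 3.0};
    int n32 = 3;  /* the caller only stored 4 bytes */

    /* The caller believes the symbol takes int *, the implementation
     * reads int64_t: undefined behavior, just like linking an
     * LP64-compiled extension against an ILP64 libblas. */
    double (*mismatched)(const int *, const double *) =
        (double (*)(const int *, const double *))sum_first_n;

    /* The loop bound picks up 4 bytes of adjacent stack garbage, so
     * this prints nonsense or segfaults. */
    printf("%f\n", mismatched(&n32, x));
    return 0;
}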

@argriffing
Contributor

As Julia learned the hard way (and eventually fixed), using 64-bit integers in BLAS (aka ILP64) is quite dangerous if you are calling into BLAS as a shared library

Yes that sounds right.

Just to repeat some stuff, there is a distinction between the integer types used in the interface (which is trickier to change, if you even want to) vs. the non-interface integer types. The current lapack-lite works analogously to this, I think:

typedef int integer;
integer some_lapack_thing(integer *pwidth, integer *pheight, integer *foo_array) {
  integer size = (*pwidth) * (*pheight); /* <- could unnecessarily overflow */
  for (integer i=0; i<size; ++i) {
    something_with(foo_array[i]); /* <- maybe hangs or segfaults */
  }
  return 0;
}

Changing the integer sizes everywhere, for example by using ILP64 (where int is 64-bit) or by re-typedef-ing integer, would cause the problems you mentioned, because the interface would change. On the other hand, I have in mind something less drastic to improve numpy, which is how I think OpenBLAS and probably most if not all other real BLAS/LAPACK libraries already work, even in 32-bit interface mode:

typedef int32_t blasint;
typedef int64_t blaslong;
blasint some_lapack_thing(blasint *pwidth, blasint *pheight, blasint *foo_array) {
  /* This interface still can't accept larger-than-32-bit width or height,
   * but that's OK for now and is difficult to change for the reasons
   * mentioned by @tkelman */
  blaslong width = *pwidth;
  blaslong height = *pheight;
  blaslong size = width * height; /* <- no more overflow! */
  for (blaslong i=0; i<size; ++i) {
    something_with(foo_array[i]); /* <- doesn't hang or segfault! */
  }
  return 0;
}

The idea would be to somehow effect this change without manually editing the code. Maybe OpenBLAS or BLIS/FLAME could be 'vendored' somehow to replace lapack-lite, or maybe if a BLAS/LAPACK becomes standard enough then it could be added as a hard dependency and lapack-lite could be ditched.

@tkelman
Contributor

tkelman commented Jun 3, 2015

Yes, that sounds safer. OpenBLAS does heavily use BLASLONG throughout its internals (set to long long on win64, if you were worried about LLP64), whether or not you build with INTERFACE64=1. BLIS supports a similar build configuration where the "classic API" blas compatibility layer can use 32-bit ints while all of the internal implementations and the newer, more sophisticated C APIs use int64_t. I guess netlib reference blas doesn't have any more targeted option for this than -fdefault-integer-8, and neither do the various f2c translations?

@argriffing
Contributor

I guess netlib reference blas doesn't have any more targeted option for this than -fdefault-integer-8, and neither do the various f2c translations?

I don't know. If it exists, this would seem to be the easiest route because lapack-lite is based on an f2c translation of the netlib reference.

@juliantaylor
Contributor

Does openblas have 64-bit namespaced variants, or will INTERFACE64 change everything to int64, including the API?
The problem is very simple to solve if the internals are already 64-bit capable: just wrap all functions and suffix the 64-bit variants, but keep the 32-bit variants unchanged and present. Optionally, add a macro that switches the 32-bit variants to the 64-bit ones for consumers that are completely (including their dependencies) 64-bit aware.

That is the way POSIX solved the 64-bit file-size issue more than a decade ago. The base libraries simply ship 64-suffixed variants alongside the old ones, e.g. open and open64, lseek and lseek64, etc. Newly compiled code switches either locally, by using the suffixed names, or globally, via the _FILE_OFFSET_BITS/_LARGEFILE_SOURCE macros; old code continues to use the 32-bit wrappers.
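
A sketch of that pattern applied to one BLAS symbol, following the 64_ suffix convention mentioned earlier in the thread (the code is illustrative, not an existing library's API):

#include <stdint.h>

/* 64-bit-index variant, named with a suffix like open64. The body is
 * a simplified reference implementation (it ignores the negative-stride
 * convention of real BLAS). */
double ddot_64_(const int64_t *n, const double *x, const int64_t *incx,
                const double *y, const int64_t *incy)
{
    double s = 0.0;
    for (int64_t i = 0; i < *n; ++i)
        s += x[i * *incx] * y[i * *incy];
    return s;
}

/* Unsuffixed symbol: keeps the old 32-bit ABI, widens its arguments,
 * and forwards, so existing binaries keep working -- like open. */
double ddot_(const int *n, const double *x, const int *incx,
             const double *y, const int *incy)
{
    int64_t n64 = *n, incx64 = *incx, incy64 = *incy;
    return ddot_64_(&n64, x, &incx64, y, &incy64);
}

/* Fully 64-bit-aware consumers could opt in globally with a macro in
 * the spirit of _FILE_OFFSET_BITS, e.g. (hypothetical):
 *
 *   #if BLAS_INDEX_BITS == 64
 *   #  define ddot_ ddot_64_
 *   #endif
 */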

@tkelman
Contributor

tkelman commented Jun 3, 2015

I added a build-time option (not set by default) to openblas in OpenMathLib/OpenBLAS#459 where you can set SYMBOLPREFIX or SYMBOLSUFFIX, and it uses either objcopy or objconv (http://www.agner.org/optimize/#objconv, only needed on Mac since objcopy doesn't exist there) to rename all the API symbols.

I never worked out a great way of putting the 32-bit variants into the same library as the suffixed 64-bit variants; it looked like that would require running openblas' build system twice, and we already build with DYNAMIC_ARCH enabled, so it takes a while. Nobody has asked for a conventional 32-bit blas in Julia since we made this change; packages or external libraries that want one can use their own 32-bit-int libblas from whatever other source they want.

@juliantaylor
Contributor

openblas having one or the other isn't an option for distributions; it has to be both. Being good for binary distributors while still being fast is the main advantage openblas has over all the other free variants.

One does not need to build openblas twice: one could also create a small wrapper for each 32-bit function that converts its arguments to 64-bit and then calls the 64-bit variant. The wrappers provide the regular 32-bit ABI that so many programs rely on.
Very easy to do, but tedious.

@tkelman
Contributor

tkelman commented Jun 3, 2015

Distributions should probably stick with LP64 for the foreseeable future: there's too much code in the world that would break, and not that many people doing multiple gigabytes' worth of linear algebra within a single memory space. The people who need to do this probably know how to compile their own custom copy of blas from source, and will usually be aware of these compatibility issues.

And does it really need to be both within the same single-file library? As long as they don't conflict, I don't see why you couldn't have the two interfaces in separate files. Distribution buildbots aren't that time-constrained; they can run make twice. We just decided not to do that for Julia since it wasn't worth the hassle.

one could also create small wrapper for all 32 bit functions that converts the argument to 64 bit and then calls the 64 bit variant.

Not for anything in lapack that sets a return array of integers for pivots; that would be a bit messier to wrap, needing copy-casting loops.
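
For instance, a wrapper for dgetrf_ would have to allocate and copy-cast the ipiv output; dgetrf_64_ below is the hypothetical suffixed ILP64 symbol from the scheme discussed above:

#include <stdint.h>
#include <stdlib.h>

int dgetrf_64_(const int64_t *m, const int64_t *n, double *a,
               const int64_t *lda, int64_t *ipiv, int64_t *info);

/* 32-bit wrapper: the scalars are easy to widen, but the IPIV output
 * array must be allocated at the wide element size and copy-cast back. */
int dgetrf_(const int *m, const int *n, double *a,
            const int *lda, int *ipiv, int *info)
{
    int64_t m64 = *m, n64 = *n, lda64 = *lda, info64 = 0;
    int64_t npiv = m64 < n64 ? m64 : n64;

    int64_t *ipiv64 = malloc((size_t)(npiv > 0 ? npiv : 1) * sizeof *ipiv64);
    if (ipiv64 == NULL) {
        *info = -1;  /* illustrative out-of-memory convention */
        return 0;
    }

    dgetrf_64_(&m64, &n64, a, &lda64, ipiv64, &info64);

    /* Pivot values are bounded by min(m, n), which came from 32-bit
     * inputs, so the narrowing cast cannot lose information. */
    for (int64_t i = 0; i < npiv; ++i)
        ipiv[i] = (int)ipiv64[i];
    *info = (int)info64;

    free(ipiv64);
    return 0;
}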

@insertinterestingnamehere
Contributor

Adding an additional suffix for the 64-bit versions sounds like a very good idea. Unless this is resolved properly, some symbols may leak into other extensions, as was observed in scipy/scipy#4736 (comment).
That example applies only to the custom xerbla that numpy uses, but similar problems may arise for other functions as well. It's still not clear to me what caused that symbol to leak into the SciPy code, but the prospect of having symbols from a modified version of BLAS and LAPACK shadow the existing ones is concerning.

@nalimilan

FWIW, Fedora and EPEL now ship an ILP64 OpenBLAS variant using the 64_ suffixed API. So distro builds can switch to ILP64 whenever they want.

@eric-wieser changed the title from "lapack-lite should use 64-bit integer indices" to "lapack_lite should use 64-bit integer indices" on Feb 24, 2017
@eric-wieser
Member

eric-wieser commented Feb 28, 2017

@njsmith:

those files were autogenerated once long ago via means that may or may not be reproducible now

Note that as of #8381, this is reproducible again
