lapack_lite should use 64-bit integer indices #5906
I remember a mailing list thread suggesting the possibility of blis, a BSD-licensed BLAS-like framework. I just checked its config notes, and it explicitly allows separate internal vs. interface integer sizes, as documented in the 'BLAS compatibility layer' section:

```c
#define BLIS_BLAS2BLIS_INT_TYPE_SIZE 32
#define BLIS_INT_TYPE_SIZE 64
```

If I understand correctly, the current numpy code doesn't have this flexibility. On the other hand, I'm not sure if 'blis' and its related projects include enough of the BLAS/LAPACK functionality yet, or if its Windows support is mature enough, or if enough of its plumbing is otherwise compatible with numpy.

---
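For illustration, a minimal sketch of the pattern those config lines permit — a 32-bit interface integer with 64-bit internal indexing. The type and function names here are invented for the sketch, not BLIS's actual identifiers:

```c
#include <stdint.h>

/* Illustrative stand-ins for what the config above calls the
 * BLAS-interface integer (32-bit) and the internal integer (64-bit). */
typedef int32_t iface_int;    /* cf. BLIS_BLAS2BLIS_INT_TYPE_SIZE 32 */
typedef int64_t internal_int; /* cf. BLIS_INT_TYPE_SIZE 64           */

/* A BLAS-style routine: callers pass 32-bit dimensions by pointer,
 * but all arithmetic inside is widened to 64 bits first, so m*n
 * cannot overflow even when it exceeds 2**31. */
int64_t total_elements(const iface_int *pm, const iface_int *pn)
{
    internal_int m = *pm;  /* widen immediately at the boundary */
    internal_int n = *pn;
    return m * n;          /* 64-bit multiply, no overflow */
}
```

The key design point is that widening happens at the interface boundary, before any index arithmetic.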
Blis is only a replacement for blas, not for lapack. But fortunately, the
---
Which file should be searched and replaced? I think the C file is somehow autogenerated. Also, both the interface and internal variables use type

---
I think (say, 70% confidence, so don't trust me blindly :-)), that: (a)
---
I think it would make sense for the interface to remain 32-bit, for compatibility with the real BLASes.

---
What compatibility issues do you foresee? It's purely an internal
---
@njsmith The 32-bitness of the interface is not the main cause of the bugs. For example, https://github.com/numpy/numpy/blob/master/numpy/linalg/umath_linalg.c.src uses

---
What happens when someone has an array with more than 2**31 elements along
---
Then everything is ruined. I think this is the case even for the default settings of the 'good' real BLASes, but I'm not completely sure.

---
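A sketch (not numpy's actual code) of how things get ruined: the element count of a 50000x50000 matrix does not fit in a 32-bit int, so widening must happen *before* the multiply, not after. Since overflowing a signed int directly is undefined behavior in C, the 32-bit result is simulated here by narrowing a 64-bit product, which on common two's-complement platforms gives the same wrapped, negative value:

```c
#include <stdint.h>

/* Correct: widen both operands before multiplying. */
int64_t count_widened(int32_t m, int32_t n)
{
    return (int64_t)m * (int64_t)n;
}

/* What a 32-bit `int size = m * n;` effectively yields: the product
 * is truncated back to 32 bits, wrapping to a nonsense value. */
int32_t count_truncated(int32_t m, int32_t n)
{
    return (int32_t)((int64_t)m * (int64_t)n);
}
```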
It is the case for real blas (modulo non-portable extensions), but that
---
I think it would make sense to fix these problems one at a time. For example, the first improvement would get the numpy blas to the point where it does not segfault when the real blas does not segfault; later, a second improvement could let the numpy blas work with larger arrays than the real blas can handle.

---
The fix is anyway doable in a mechanical way: just replace "integer n" -> "interface_int interface_n" in the argument lists, add "integer n = interface_n" declarations to the body, and finally change the default integer type to ssize_t.

---
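A toy version of that recipe (the routine name and body are invented; only the transformation pattern comes from the comment above):

```c
#include <sys/types.h>  /* ssize_t (POSIX) */

typedef int interface_int;  /* argument-list type: stays 32-bit for ABI compat */
typedef ssize_t integer;    /* default integer type, now pointer-sized */

/* Before the rewrite this hypothetical routine would have been
 *     int toy_sumsq_(integer *n, double *x, double *result);
 * with `integer` typedef'd to int. After it: the argument list uses
 * interface_int, and an `integer n = *interface_n` declaration is
 * added to the body, so all internal indexing is wide. */
int toy_sumsq_(interface_int *interface_n, double *x, double *result)
{
    integer n = *interface_n;  /* the added declaration */
    double s = 0.0;
    for (integer i = 0; i < n; ++i)
        s += x[i] * x[i];
    *result = s;
    return 0;
}
```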
... except it isn't because it's full of integer pointers :(
Lots of work if done manually, it seems.
---
I'm vaguely aware of research projects that try to automate this kind of thing, but I don't know any specifics. Also, some time around 2009-2010 the netlib lapack switched from the f2c CLAPACK (which I think we are using in numpy) to including C implementations in their main lapack distribution, if I understand their webpage correctly.

---
According to the README in lapack_lite/, we are using an independent
---
So I looked at the most recent netlib lapack C code, and it might implement the first of the two levels of improvement. On one hand it doesn't seem to use different types for the interface vs. the internal indexing variables (in contrast to blis and openblas), but on the other hand it appears to use

```c
for( i = 0; i < MIN( y, ldin ); i++ ) {
    for( j = 0; j < MIN( x, ldout ); j++ ) {
        out[ (size_t)i*ldout + j ] = in[ (size_t)j*ldin + i ];
    }
}
```

This works because it explicitly casts to `size_t`. It also has

```c
a_t = (double*)LAPACKE_malloc( sizeof(double) * lda_t * MAX(1,n) );
```

This works because I think

Edit: Now I see that the C code included with recent netlib lapack comprises only a couple of layers of C wrapping around the original Fortran functions.

---
@pv: What if we just did `typedef npy_intp integer`?

On Fri, May 22, 2015 at 1:32 PM, argriffing notifications@github.com

Nathaniel J. Smith -- http://vorpus.org

---
@njsmith: some of the routines take int arrays --- can't just change it
---
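The int-array objection can be sketched concretely. Routines such as dgetrf_ receive caller-allocated *arrays* of integers (e.g. the ipiv pivot array), not just scalars; if a blanket typedef silently widens `integer` to 64 bits, a caller that still allocates 32-bit ints hands the library a buffer half the size the library now expects. The helper names below are made up for illustration:

```c
#include <stdint.h>
#include <stddef.h>

typedef int64_t integer;  /* after the hypothetical blanket typedef */

/* Bytes the caller actually allocated for an n-element pivot array,
 * still assuming the old 32-bit `integer`. */
size_t caller_bytes(size_t n)  { return n * sizeof(int32_t); }

/* Bytes the library now reads/writes for the same array after the
 * typedef change -- twice as many, running off the end of the buffer. */
size_t library_bytes(size_t n) { return n * sizeof(integer); }
```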
Having looked over the code a bit, it does seem doable to me -- the main

On Fri, May 22, 2015 at 3:15 PM, Pauli Virtanen notifications@github.com

---
Although the lapack-lite function definition C code does not make the scalar vs. array distinction, the information is available in the original Fortran file. Maybe richer typedef-ing in f2c or in the numpy code generation scripts could preserve the difference?

From https://raw.githubusercontent.com/numpy/numpy/master/numpy/linalg/lapack_lite/dlapack_lite.c:

```c
/* Subroutine */
int dgetrf_(integer *m, integer *n, doublereal *a, integer *lda, integer *ipiv, integer *info) {
```

From dgetrf.f:

```fortran
*  =====================================================================
      SUBROUTINE DGETRF( M, N, A, LDA, IPIV, INFO )
*
*  -- LAPACK computational routine (version 3.4.0) --
*  -- LAPACK is a software package provided by Univ. of Tennessee,    --
*  -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
*     November 2011
*
*     .. Scalar Arguments ..
      INTEGER            INFO, LDA, M, N
*     ..
*     .. Array Arguments ..
      INTEGER            IPIV( * )
      DOUBLE PRECISION   A( LDA, * )
```

---
Wait, are you sure this is feature complete? I had the idea that the BLAS part is feature-complete, but that some of the LAPACK parts depend on a real underlying LAPACK. For example I searched the docs for
---
I am not sure! I just know that Robert van de Geijn told me: "we have an entire LAPACK implementation that does not require a Fortran compiler, passes the LAPACK test suite, and in cases where our libflame library has better algorithms links to libflame." So I'm not sure what that means exactly (apparently this is something different than libflame itself?), and it's always possible I misinterpreted somehow. If you want to investigate this more, then I can put you in touch with them :-). (Will you be at SciPy by chance?)

---
Maybe it would be possible to use a slow portable OpenBLAS pure-C reference implementation if this exists? I know this is not the point of OpenBLAS, and I don't know OpenBLAS well enough to know whether this is a thing. This could especially make sense if OpenBLAS becomes the recommended 'real' BLAS/LAPACK as proposed in http://mail.scipy.org/pipermail/numpy-discussion/2015-May/072979.html.
I may try to fit it between the turpentine, yak, and alligator conferences!

---
Pff, SciPy will be a way better waste of time than those. Also, @xianyi and the BLIS developers are all at UT Austin these days, just like SciPy, so it's a good opportunity to try and sort some of these things out. |
OpenBLAS supports 64-bit integer by

---
I guess OpenBLAS also has the advantage (vs. numpy's 'vendored' fallback BLAS/LAPACK) of supporting 64-bit indices internally even when the interface is 32-bit.

---
As Julia learned the hard way (and eventually fixed), using 64-bit integers in BLAS (aka ILP64) is quite dangerous if you are calling into BLAS as a shared library, because it represents an ABI incompatible with the long-established standard configuration in which libblas is most commonly compiled. If any other library or C extension you load into the same application happens to call a function from libblas expecting the conventional 32-bit-int API, you get segfaults and other nasty behavior. We avoid this now in Julia by modifying the API, the function names (appending a suffix).

Supporting this as a non-default build configuration for anyone who wants to compile from source is perfectly fine to do, but it should come with a big warning unless you also modify the function names. lapack-lite might not be subject to this problem, depending on whether it's built as a shared library, whether or not it uses the conventional blas and lapack function names directly, whether or not it's visible for the outside world to ever use it directly, or other caveats that I'm not aware of.

---
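The suffixing idea can be sketched at the preprocessor level. The suffix, the macro name, and the naive ddot_ body below are all illustrative, not Julia's or OpenBLAS's actual scheme:

```c
#include <stdint.h>

/* Sketch of ABI-safe ILP64 renaming: the 64-bit build exports every
 * routine under a distinct suffixed name, so it can coexist in one
 * process with a conventional 32-bit libblas. */
#ifdef ILP64_SUFFIXED
  typedef int64_t blasint;
  #define ddot_ ddot_64_      /* rename the exported symbol */
#else
  typedef int32_t blasint;    /* conventional LP64 build */
#endif

/* Naive reference dot product with the classic Fortran-style
 * pointer-argument BLAS signature. */
double ddot_(const blasint *n, const double *x, const blasint *incx,
             const double *y, const blasint *incy)
{
    double s = 0.0;
    for (blasint i = 0; i < *n; ++i)
        s += x[i * *incx] * y[i * *incy];
    return s;
}
```

With `ILP64_SUFFIXED` defined, callers link against `ddot_64_` and pass 64-bit dimensions; without it, the symbol and integer width stay conventional, so the two builds never collide.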
Yes, that sounds right. Just to repeat some stuff: there is a distinction between the integer types used in the interface (which are trickier to change, if you even want to) vs. the non-interface integer types. The current lapack-lite works analogously to this, I think:

```c
typedef int integer;

integer some_lapack_thing(integer *pwidth, integer *pheight, integer *foo_array) {
    integer size = (*pwidth) * (*pheight); /* <- could unnecessarily overflow */
    for (integer i=0; i<size; ++i) {
        something_with(foo_array[i]); /* <- maybe hangs or segfaults */
    }
    return 0;
}
```

Changing the integer sizes everywhere, for example by using ILP64 where

```c
typedef int32_t blasint;
typedef int64_t blaslong;

blasint some_lapack_thing(blasint *pwidth, blasint *pheight, blasint *foo_array) {
    /* This interface still can't accept larger-than-32-bit width or height,
     * but that's OK for now and is difficult to change for the reasons
     * mentioned by @tkelman */
    blaslong width = *pwidth;
    blaslong height = *pheight;
    blaslong size = width * height; /* <- no more overflow! */
    for (blaslong i=0; i<size; ++i) {
        something_with(foo_array[i]); /* <- doesn't hang or segfault! */
    }
    return 0;
}
```

The idea would be to somehow effect this change without manually editing the code. Maybe OpenBLAS or BLIS/FLAME could be 'vendored' somehow to replace lapack-lite, or maybe if a BLAS/LAPACK becomes standard enough then it could be added as a hard dependency and lapack-lite could be ditched.

---
Yes, that sounds safer; openblas does heavily use

---
I don't know. If it exists, this would seem to be the easiest route, because lapack-lite is based on an f2c translation of the netlib reference.

---
does openblas have 64 bit namespaced variants? or will INTERFACE64 change everything to int64, including the api? That is the way POSIX has solved the 64-bit filesize issue more than a decade ago: the base libraries simply have 64-suffixed variants with the old ones still included, e.g. open and open64, lseek and lseek64, etc. Newly compiled stuff either switches by locally using the suffixed ones or globally via the _FILE_OFFSET_BITS/_LARGEFILE_SOURCE macros; old stuff continues to use the 32-bit wrappers.

---
I added a build-time option (not set by default) to openblas in OpenMathLib/OpenBLAS#459 where you can set

I never worked out a great way of putting the 32 bit variants into the same library as the suffixed 64 bit variants; it looked like that would require running openblas' build system twice, and we already build with

---
openblas having one or the other isn't an option for distributions; it has to be both. Being good for binary distributors while still being fast is the main advantage openblas has over all other free variants. One does not need to build openblas twice: one could also create small wrappers for all 32 bit functions that convert the arguments to 64 bit and then call the 64 bit variants. The wrappers represent the regular 32 bit ABI that so many programs rely on.

---
Distributions should probably stick with LP64 for the foreseeable future; there's too much code in the world that would break, and not that many people are doing multiple gigabytes' worth of linear algebra within a single memory space. The people who need to do this probably know how to compile their own custom copy of blas from source, and will usually be aware of these compatibility issues. And does it really need to be both within the same single-file library? As long as they don't conflict, I don't see why you couldn't have the two interfaces in separate files. Distribution buildbots aren't that time-constrained, they can run

Not for anything in lapack that sets a return array of integers for pivots -- that would be a bit messier to wrap, needing copy-casting loops.

---
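A sketch of the copy-casting wrapper this would require; all names and the trivial "pivoting" body are hypothetical, only the wrapping pattern matters:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical ILP64 core routine, as a suffixed 64-bit build might
 * export it: fills `ipiv` with 1-based pivot indices. */
static int getrf_core64_(int64_t n, int64_t *ipiv)
{
    for (int64_t i = 0; i < n; ++i)
        ipiv[i] = i + 1;  /* stand-in for real pivoting */
    return 0;
}

/* 32-bit-ABI wrapper: widens the scalar argument on the way in and
 * copy-casts the integer output array on the way out -- the "messier"
 * case, since it needs a temporary buffer and a range check. */
int getrf_32_(int32_t n, int32_t *ipiv)
{
    int64_t *tmp = malloc((size_t)n * sizeof *tmp);
    if (!tmp)
        return -1;
    int info = getrf_core64_((int64_t)n, tmp);
    for (int32_t i = 0; i < n; ++i) {
        if (tmp[i] > INT32_MAX) {  /* pivot doesn't fit in 32 bits */
            free(tmp);
            return -2;
        }
        ipiv[i] = (int32_t)tmp[i];
    }
    free(tmp);
    return info;
}
```

For scalar-only routines the wrapper is just a widening call; it's the integer output arrays that force the allocation and the loop.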
Adding an additional suffix for the 64 bit versions sounds like a very good idea. Unless this is resolved properly, some symbols may leak into other extensions, as was observed in scipy/scipy#4736 (comment).

---
FWIW, Fedora and EPEL now ship an ILP64 OpenBLAS variant using the

---
Currently lapack-lite uses `int` for indices, so large arrays cause overflow and things crash (e.g. #5898). There's a separate need to handle this issue when using real BLAS/LAPACK (which may or may not handle arrays with >2**31 elements, depending on the vendor), but when falling back on our built-in code there's no reason we shouldn't get this right.