Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implementation of RandomAccess class #1007

Merged
merged 3 commits into from
Jan 21, 2024
Merged

implementation of RandomAccess class #1007

merged 3 commits into from
Jan 21, 2024

Conversation

stylewarning
Copy link
Member

@stylewarning stylewarning commented Oct 6, 2023

EDIT: See discussion below on the renaming from Array to RandomAccess.

EDIT: This PR now just implements the class and a couple obvious instances.

This adds a new class to the standard library (Array :F :T) allows the storage of elements of type :T inside of a storage type :F. The design requires :F be associated with only one type :T, so there's no possibility of doing any type shenanigans, such as accessing a U8 from a U64 array, even if that's valid in principle. As explained in a comment, this trades conceptual flexibility for much higher ergonomics.

The class implements a few instances (which could be expanded to all reasonable efficient types), and uses the convention of adding an 's' to the base type (e.g., a storage of Double-Float is DoubleFloats). We don't keep "legacy" '-' in existing names.

[Side note: I decided against having a general (Array :t) type, which I think would require a bunch of built-in compiler support. While it may still be a good idea worth discussing, it seems like it's not needed, and leads to a less flexible design anyway. (What if we want an array allocated in C-space? Or a GPU buffer?) Moreover, we have other data structures for expressing generic sequence data... having parametric polymorphism at the type level means, at some point, a fabled (Array :t) would have to just be a list of pointers anyway, as it is in Common Lisp.]

All operations become efficient Common Lisp code, up to inlining. There is some example code which includes using the monomorphizer.

This is a proposed base design, and I welcome feedback on the design, and any additional requests for implementation.

I suggest to myself the following before merging:

  • removing the example code obviously

CC @digikar99.

@digikar99
Copy link

digikar99 commented Oct 6, 2023

EDIT: Looks like my concerns might be addressed using the coalton:specialize facility.

The following can augment the example code u/stylewarning wrote to resolve the issue I raised below.

(coalton-toplevel
  (declare double!-sf (SingleFloats -> Unit))
  (define (double!-sf a)
    (let ((len (size a)))
      (lisp Unit (a len)
        (cffi:with-pointer-to-vector-data (ptr a)
          (cl:print a)
          (cffi:foreign-funcall "cblas_sscal" :int len
                                              :float 2.0
                                              :pointer ptr
                                              :int 1))
        Unit)))
  (specialize double! double!-sf (SingleFloats -> Unit)))

Pre-EDIT:

Let me know if I'm missing something. With this design, you either have a function defined for a particular specialized array type, or the entire (Array :f :t) class, and nowhere in between.

However, say, with a CUDA array, the example implementation of %sum-array and map! would be terribly inefficient, because they are computing using the CPU and using the GPU for nothing more than storage. This issue would also arise for other types of devices. So, I think it is important to make a distinction between arrays on the basis of their storage devices, and we should have an intermediate level of abstraction depending on whether an array is a LispArray, ForeignArray, CUDAArray, or OpenCLArray, that is neither as broad as (Array :f :t) nor as specialized as a SingleFloats.

A related issue is about being able to specialize the implementation of, say, double!. May be this isn't an issue with (Array :f :t) itself, but with the example implementations themselves. With BLAS, I could specialize double! to use sscal dscal cscal zscal for each of the specialized lisp arrays. But for other numeric types, I could fall back on the default implementation given in the example. As I understand, this would require each of the array functions sum-array, double!, etc to be defined as a typeclass in themselves. Would that be sufficient, or would that pose other issues? And are there better ways? For example -

(coalton-toplevel
  (define-class (ArrayDoubler :f :t (:f -> :t))
    (double! (:f -> Unit))))

(coalton-toplevel
  (define-instance (ArrayDoubler SingleFloats Single-Float)
    (define (double! a)
      (let ((len (size a)))
        (lisp Unit (a len)
            (cffi:with-pointer-to-vector-data (ptr a)
              (cl:print a)
              (cffi:foreign-funcall "cblas_sscal" :int len
                                                  :float 2.0
                                                  :pointer ptr
                                                  :int 1))
          Unit)))))

(cl:setf (cl:aref a n) x)
Unit)))))

(%define-unboxed-array cl:single-float Single-Float SingleFloats)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a really useful type, and I could see myself reworking a lot of stuff with this in mind.

I wonder if there might be utility in having a macro/function like define-type-with-array that would automatically add an array type when you define the new type. I know that's probably a later stage polish.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't unboxed-arrays be limited by the underlying lisp implementation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Izaakwltn Only some kinds of base types might benefit from an array type in Lisp. Other types might need more finesse, like, for example, an unboxed array of RGB values, which might require working with foreign memory.

Moreover, it's possible that we define (Vector :t) as a kind of storage of :t, so that there's at least one generically useful parametric storage type for all types :t, even if it's not as optimized as DoubleFloats and friends.

@stylewarning
Copy link
Member Author

@digikar99 Another possibility is to make a new type for C data, here's a sketch:

(repr :native cffi:pointer)
(define-type ForeignDoubleFloats)

(define-instance (Array ForeignDoubleFloats Double-Float)
  <logic to make and index foreign arrays>)

(declare call-with-foreign-array (DoubleFloats -> (ForeignDoubleFloats -> Unit) -> Unit))
(define (call-with-foreign-array dfs work)
  (lisp Unit (dfs work)
    (cffi:with-pointer-to-vector-data (ptr dfs)
      (call-coalton-function work dfs))))

;; examples

(define double! ...) ; (Array :f :t) => :f -> Unit

(define floats (the DoubleFloats (make 10 1.0d0))
(double! floats)
(call-with-foreign-array floats double!)

This doesn't contain an implementation that lets double! be specialized with BLAS, but it does let C arrays and Lisp arrays to (1) coexist, and (2) be used as one-another (or at least Lisp as C arrays). As you pointed out, a way to accomplish that is via specialization (double! -> double-blas! when called with a specific monomorphic argument) or type classes (works with any argument at runtime).

Copy link
Collaborator

@macrologist macrologist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

Should add instances of (at least) the following:

  • IntoIterator
  • Semigroup
  • Monoid
  • Eq
  • Foldable

This might prove uncomfortable.

Because the instance defintion

(define-instance (Array :f :t => IntoIterator :f :t) ...)

does not refer to any actual types in the array package, you need to define instance in the coalton-library/iterator package itself.

The same would be true for Eqthe other "fundamental" typeclasses. This may indicatie that Arrray should be thought of as a fundamental class and be redefined in coalton-library/classes.

I don't think this is unreasonable, Array is indeed defining something fairly abstract and fundamental - a strict relationship between two types, one of them representing a reified mutable map from bounded naturals to the other type; a fairly fundamental concept for computing. Not quite as abstract as Functor, for example, but at least as abstract as Num and fromInt in my opinion.

@digikar99
Copy link

Does anyone have a preference for names?

To me, Array-Single-Float Array-Double-Float Array-I8 etc look better than SingleFloats DoubleFloats I8s etc.

@stylewarning
Copy link
Member Author

Does anyone have a preference for names?

To me, Array-Single-Float Array-Double-Float Array-I8 etc look better than SingleFloats DoubleFloats I8s etc.

I'm not particularly wedded to any particular convention. I didn't like that (in my initial experiment) things like Array-Complex-Single-Float was so long.

As a general comment, I myself am a big fan of idioms and colloquialisms if there's good reason to introduce them. Sort of like how we chose the name Seq instead of ImmutableSequence or RRBTreeImmutableFiniteSequence.Why? No other reason except it's quick to type and we'd prefer to emphasize/incentivize immutable data structures.

(the SingleFloats (make 100 5.0f0)) reads nicer to me than (the Array-Single-Floats (make 100 5.0f0)), and de-emphasizes potentially confusing misunderstandings about what is an array, a storage, a vector, 1D/N-D, what properties the aggregate supports, etc.

But as I said, I'm not so opinionated that I'd veto a majority opinion.

@digikar99
Copy link

de-emphasizes potentially confusing misunderstandings about what is an array, a storage, a vector, 1D/N-D, what properties the aggregate supports, etc.

That's a good point.

But, can some hyphens be thrown in, eg Single-Floats instead of SingleFloats?

@eliaslfox
Copy link
Collaborator

[Side note: I decided against having a general (Array :t) type, which I think would require a bunch of built-in compiler support. While it may still be a good idea worth discussing, it seems like it's not needed, and leads to a less flexible design anyway. (What if we want an array allocated in C-space? Or a GPU buffer?) Moreover, we have other data structures for expressing generic sequence data... having parametric polymorphism at the type level means, at some point, a fabled (Array :t) would have to just be a list of pointers anyway, as it is in Common Lisp.]

I think having a parametric array type is the correct approach. The compiler support required would be minimal, just extending lisp-type.

This design requires users to define new types (by calling an internal macro?) for each type they want to store in an array. A single parametric type would avoid this entirely.

@stylewarning
Copy link
Member Author

A parametric type may (eventually) have specialized representations, but I argue an array class is higher priority, if only to allow multiple array-like objects to be accessed uniformly. We do this in other Lisp code, where we work with Lisp and foreign arrays.

(Haskell has MArray for their class.)

I'm actually slightly against a (PackedVector :t) type, because if we had it, we would have to select one of two choices:

  1. Limit the allowed :t to types that actually have a packed/unboxed representation. This is what Haskell does. Haskell's STUArray _ _ t type actually only allows t to be: Word{,8,16,32,64}, Int{,8,16,32,64}, Float, Double, Char, Bool. That is, they have a parametric type for representing efficient unboxed arrays, but it's limited to only 15 element types.

  2. Allow any :t and internally use CL:MAKE-ARRAY, noting that for most :t, CL:MAKE-ARRAY will just return back an array of pointers (i.e., boxed objects).

The second option is what Vector did, but the compiler didn't know how to emit specialized Lisp type declarations in monomorphic scenarios.

I find both of these choices suboptimal: Arrays (in the context of this PR) are supposed to be a ticket to efficient code with well known space-time properties. Option #1 violates expectations of parametric polymorphism (i.e., :t ought to mean for all :t), and Option #2 violates expectations of space-time efficiency.

There's a third issue. Should general array processing code seek parametric or ad hoc polymorphism? I argue ad hoc for essentially these reasons:

  • Advertising a parametric type means library writers are incentivized to write code targeting that type, and not its natural interface. This is a critical flaw of the QVM, where code has been deeply specialized on a specific number type, a specific array type, etc. A flourishing ecosystem of (PackedArray :t) code means that none of it can use foreign arrays, GPU arrays, etc. My PR disincentivizes seeking polymorphism over the array type (because it doesn't actually exist) by only having concrete storage types with predictable performance.

  • Packed arrays aren't actually parametrically polymorphic, which is why they're inexpressible as basic Coalton code and we would seek compiler support for them. That is, a PackedArray type isn't actually uniformly treated in all use sites. Deep down, any code manipulating arrays has to effectively embed a runtime dispatch on the array type to execute any fundamental array operation (i.e., a global and non-extensible type class dictionary, without the type class constraint).

While a (PackedArray :t) sounds elegant at face value, I think it's fraught with issues, and Coalton could take a bold step to do efficient arrays in a new way.

@stylewarning
Copy link
Member Author

(As a secondary comment, I am interested to hear more about the minimal changes needed to support such a feature, since we'd need to do it for complex numbers, probably.)

@stylewarning
Copy link
Member Author

One downside of my approach of having a handful of separately named types is that it's inherently non-portable. Some Lisp implementations, for instance, will specialize (complex double-float), and others won't. It's easy to detect (look at upgraded-array-element-type), but it does pose a conundrum: in the case that we have such a handful of concrete types, do we only export those that have specialized representations, making it a syntax error to use a type that's not supported? Maybe that's a feature, not a downside...

@stylewarning stylewarning changed the title implementation of Array class implementation of RandomAccess class Oct 11, 2023
@stylewarning
Copy link
Member Author

After some discussion with @macrologist I've renamed Array to RandomAccess. The reasons are as follows:

  • Array can be confusing, especially if we take the literal computer science definition of Array which allows the index type to vary.
  • Array may or may not be better served as an actual data type, not a class of types.
  • RandomAccess really captures what the class is all about: contiguous collections which can be efficiently accessed by an integer index in constant time. This puts Vector in the purview of RandomAccess.

Some alternate names that were considered but eliminated as possibilities: Array, ArrayLike, Array1D, LinearMemory, LinearBuffer, Packed, FlatPack, RAM, TypedBuffer, FlatArray

In addition to the renaming of the class to RandomAccess, I also made set! and aref safe by default. Moreover, the types may not be readable (queried by readable?) or writable (queried by writable?), which means this class can be used for e.g. read-only mmap'd files, or write-only configuration memory.

In general, though, I fully expect "efficient code" (TM) to use unsafe-* functions extensively.

I guess the question of whether there should be some concrete collection types in this PR blessed by the standard library is still open.

@stylewarning
Copy link
Member Author

stylewarning commented Oct 11, 2023

Further side note: This PR continues to add to the confusion of aref, set! argument orders and the position of unsafe in the function name (*-unsafe or unsafe-*).

@stylewarning
Copy link
Member Author

I think what I will do to move on this PR is remove the specific type definitions and keep the class + Vector, and we can move in a separate PR what to do about type, esp. since I started working on the type specialization stuff in a separate PR.

@eliaslfox
Copy link
Collaborator

I'm in favor of separating out the prs for generic random access and specialized arrays.

Also is there a reason not to track readonly/writable at the type level?

@stylewarning
Copy link
Member Author

Also is there a reason not to track readonly/writable at the type level?

I didn't think of a good way to approach it. In my mind I thought about examples like mmaping a file, which itself might possibly be R or W only, depending on the UNIX permissions. With this, I imagined a hypothetical mmap-open :: Filename -> ByteArray could be written unconstrained instead of a ReadOnlyByteArray or (ReadOnly ByteArray) or something like that. I also wanted to avoid a whole zoo of classes/wrapper types/etc.

With that said, I didn't think about it too deeply beyond that. Do you have any design thoughts on/sketched of this?

@stylewarning
Copy link
Member Author

@eliaslfox I stripped out everything but the class (and a couple instances).

This adds a new class to the standard library (RandomAccess :F :T)
allows the storage of elements of type :T inside of a storage type :F
with efficient O(1) read/write access.

The class implements a few instances (which could be expanded to all
reasonable efficient types), and uses the convention of adding an 's'
to the base type (e.g., a storage of Double-Float is DoubleFloats). We
don't keep "legacy" '-' in existing names.

All operations become efficient Common Lisp code, up to inlining.
@stylewarning stylewarning merged commit 27574d1 into main Jan 21, 2024
13 of 17 checks passed
@stylewarning stylewarning deleted the arrays branch January 21, 2024 01:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants