Tensor contraction? #9

rreusser · 2015-08-16T22:45:54Z

Following up here on my ill-advised rreusser/ndarray-awise-prototype#1 since this is starting to seem more like a extension or fork of cwise than a use of cwise.

Was trying to figure out how to express reduce operations in a general manner but fell way short. Your response "there are probably better ways to do this but hey, more than one way to skin a cat" is way too kind. 😝

I'm currently thinking about what it'd mean to reorder dimensions within cwise. Expressing this seems the most immediately challenging part while the rest is an exercise for the implementer…

What about actually allowing index notation? For example:

contraction over one index: In other words, matrix multiplication. The k loop goes on the inside with its pre/post just outside it. The scope of this would be only over a k loop, but you could use c directly. Would have to think about a syntax that would actually allow gemm-style block-wise multiplications. Represented as:

{
  args: [ 'array(i,j,k)', 'array(k,l,m)', 'array(i,j,l,m)'],
  contractIndices: ['k'],
  beforeContract: function(a, b, c) { c = 0 },
  body: function(a, b, c) { c += a * b },
  afterContract: function(a, b, c) { }
}

contraction over two indices: Similar. beforeContract and afterContract go just outside the innermost contraction loops:

{
  args: [ 'array(i,j,k)', 'array(j,k,l)', 'array(i,l)' ],
  contractIndices: ['j','k'],
  beforeContract: function(a, b, c) { c = 0 },
  body: function(a, b, c) { c += a * b },
  afterContract: function(a, b, c) { }
}

Average over two of three dimensions: Here the notation gets unwieldy and it starts to feel like it's all falling apart.

{
  args: [ 'array(j)', 'array(i,j,k)', 'size' ],
  contractIndices: ['i', 'k'],
  pre: function( size ) { this.factor = 1 / size.i / size.k }
  beforeContract: function(a, b) { a = 0 },
  body: function(a, b) { a += b },
  afterContract: function(a, b) { b *= this.factor }
}

I think this is logically consistent and can be optimized within reason (or at least reduces to cwise if unused). It seems possible but probably too invasive to work into cwise which seems more oriented toward image processing. Features like blockIndices and offset are great, but it might add factorial complexity to get all combinations of cases to play well together.

It sounded like you've thought about this before. I think what I've described is a thing that could exist; I just don't know if it's a thing that should exist…

Then again, I might be aiming for something more complicated than the benefit derived from it…

The text was updated successfully, but these errors were encountered:

jaspervdg · 2015-08-17T08:53:43Z

My current thinking is that this kind of thing would in fact best be done as part of a new cwise. In particular, the way cwise currently generates code is a little ad-hoc, and difficult to generalize/maintain, but I do think reductions and the like should probably be an explicit part of cwise (if only to make it easier to in the future support different backends). I'm thinking of something that would generate a "plan" of some sort, apply some transformations if desired, and then "compiles" the plan into code.

I think your proposal is interesting, but instead of having explicit indices, it might be better to use a syntax more reminiscent of blockIndices for example, as this will make it much easier to write general interfaces. I also agree that it would definitely be worth thinking about the simplest possible interface we can get away with (while still allowing for sufficient generality). For example, I'm not entirely sure beforeContract and afterContract are really worth it, given that this can also be done by pre-/post-processing.

Note that apart from explicitly supporting reductions and broadcasts, we may also want to consider "scans".

rreusser · 2015-08-17T10:42:07Z

Good points. The reason for the indices is that I was trying to understand how to add additional controls so that an intermediate layer could generate operators that can't just be written once (e.g. general-purpose mean along some axes) with enough control to avoid pathological cases.

More generally, a couple things I've run into using cwise:

It's slow for lots of small vectors (i.e. linear algebra). I think this is because of the type signature memoization, but I'm now working through the code to better understand why. Seems like it'd be nice to allow access to the bare-bones loop code for fast repetitive applications on small/moderate-sized vectors.
axis-wise reductions are tricky. Thoughts on an argmax operator, for example:
- a temporary variable for the value requires this to be done on one slice at a time. Maybe the solution should be to write a wrapper that uses a cwise operator inside a loop, but I've had bad luck using cwise inside loops (slow!), so was trying to avoid the pathological case where you have 1000 applications of cwise on two elements. (that is, for example, reducing a 1000x2 array to a 1000x1 array of means)
- the blockIndices strategy requires a lot of input-transposing on behalf of the user. Maybe this is the way it should be, but feels like a workaround. Also it's just not technically necessary since cwise could represent this.

So the common thread is that I had some subtle issues trying to use cwise as a general-purpose high-performance tool since a lot of what I ended up doing with it was one pathological case or another.

Maybe what I'm talking about is, as you've suggested, a planning layer that represents the gory, low-level controls so that powerful operations can be generated without mucking up the outward-facing syntax (which is currently really nice!). What if a cwise code generator received an 'plan' representation of the operator. cwise could stay unchanged and generate this plan. The features I've described would just require a different plan generator.

rreusser · 2015-08-17T13:11:01Z

Clarification: not so much advocating different forks, just a separation of declaration of the operation from internal representation of the operation, as you've suggested.

skosch · 2015-10-06T18:51:20Z

Sorry for barging in, but reduction over arbitrary dimensions is exactly what I need, and what I am having trouble with.

Given I've transposed my array such that the to-be-aggregated-dimensions are last, and there are n of them, how would I go about writing a cwise body that sums them up? I understand that I would use blockIndices: -(numAggregatedAxes) and then, in the body, loop over all of the items that I'm aggregating. But since I don't know in advance how many dimensions I'm aggregating (i.e. looping) over, I'm stuck. Is there a way to do this using a single cwise function?

(Thanks a million for an amazingly useful library, by the way!)

rreusser · 2015-10-06T19:09:49Z

@skosch I don't think current cwise allows arbitrary notation for reductions, but here's a workaround:

It turns out that scalar arguments are basically passed through, whether they're actually Numbers or not. That means you can pass a result array. Together with an index argument, you can accomplish maybe what you need. Maybe that's better explained through an example.

Okay, so for a 3x4x5 array, say we want to sum over the second dimension. You could do this:

var pool = require('ndarray-scratch')
var A = pool.zeros([3,4,5])
var sum = pool.zeros([3,5])

var operator = cwise({
  args: ['array', 'scalar', 'index'],
  body: function(A, sum, index) {
    var newValue = A + sum.get( index[0], index[2] )
    sum.set( index[0], index[2], newValue )
  }
})

operator(A)

This will iterate over every element of A (in no particular order), adding it to the correct entry of sum. It's not optimal since sum could be looped over more efficiently, but it should do the trick.

Since that doesn't handle, as you mentioned, arbitrary reductions though, I wrote a quick module that handles arbitrary reductions and uses precisely this trick: ndarray-awise-prototype. It's not the most efficient thing ever (though I haven't profiled it so it might not be entirely inefficient either), but it applies a map-reduce option and takes care of dimensions for you. You can see an example here. I'm reasonably confident in the correct functioning of the module; I only didn't publish it more confidently because I had some other ideas. Here's a more out-there and non-functioning attempt at a macro language that would encompass these operations.

rreusser · 2015-10-06T19:23:31Z

(Strictly speaking though, it's definitely not a solved problem, in case you have a better idea!)

skosch · 2015-10-06T19:24:42Z

Wow, that was quick. Thanks!

Your example is interesting. I don't know if it helps much in my case. I was hoping for some undocumented way of accessing the elements I'm reducing over, without having to know how many []'s I'll need. So, you know how

sum += arr[0][i];

is rewritten into

_inline_4_sum += a1[p1+0*t1b0+_inline_4_i*t1b1];

... right? Well, if I could bypass that somehow, then I could just do that index-by-stride multiplication myself, with an arbitrary number of them. Would that do the trick?

(I mean, it's not exactly a beautiful solution, much less a convenient one, but if it worked I'd take it)

edit: it appears to me that this is kind of the direction your own experiment was going in, is that correct?

rreusser · 2015-10-06T19:33:06Z

Correct. My compromise is that I was willing to use external get/set which compute the index all over again. For me, I mainly wanted to express the operations consisely so the overhead was entirely neglible.

Can you clarify what operation you're trying to compute?

skosch · 2015-10-06T19:41:29Z

Nothing in particular right now. I build mathematical optimization apps on top of an existing platform, and I'm trying to write a few utility functions to make it easier to quickly aggregate and manipulate large, up-to-6-dimensional tables for display, because I tend to be inundated in tons of nested loops and/or recursions.

I'm hoping to implement at least a few common suspects (sum, mean, product, min, max, count, etc.), but the whole thing really is only useful if I can throw arbitrary dimensions at it.

rreusser · 2015-10-06T20:01:53Z

Makes sense. Here are my thoughts then:

ndarrays support hash-tables and proxies among maybe other storage formats. I completely failed to appreciate that before writing some of my modules, so one of these days I need to go fix some of them. Maybe relevant for you, maybe not. But good to be aware of before the fact 😄
I don't think you can achieve this with vanilla cwise. It does some work to hide the interpreted code from you (edit: or rather, it does no work to expose the interpreted code). And blockIndices sounds right, but it's not (it's intended for rgb arrays or similar).
Those operations are implementable in a couple lines using the awise thing. If your application isn't performance-limited, then it might save time since it handles the array size/shape gymnastics correctly. If it is performance limited, then it'll probably be necessary to create your own module. I'm reasonably confident in the robustness, so could be used as a placeholder until you can write something better. (Of course the advantage of cwise in the first place is that it iterates in an intelligent order so that high-dimensional operations are efficient, so rewriting is nontrivial.)

In short, this is exactly what I wanted to address with ndarray-tensor-op-experiment—a more general notation for generating these loops—but it turned out to be a lot of work!

But then I know @jaspervdg and @mikolalysenko would love for this to exist and have their thoughts (and a lot more, better experience with this under their belts 😄) too!

skosch · 2015-10-06T20:26:28Z

Alright, that's helpful. Thanks for taking the time to explain all of this to me. :)

I suppose I'll go with your awise for now, and update my utility functions when a proper cwise extension for this comes along. I don't think these reductions are going to become the bottleneck, honestly ... I've been using nested standard Arrays up until now (I know) and fared just fine.

rreusser · 2015-10-06T20:45:40Z

You're spot on that it would be great for such a thing to exist though! Unfortunately it's a lot of work to rewrite cwise, so I wouldn't hold your breath waiting for the tensor stuff to get written. In the mean time at least, my module isn't great, but it does work… I published ndarray-awise-prototype to npm.

domoritz mentioned this issue Jun 22, 2018

Cumulative sum with cwise? #23

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tensor contraction? #9

Tensor contraction? #9

rreusser commented Aug 16, 2015

jaspervdg commented Aug 17, 2015

rreusser commented Aug 17, 2015

rreusser commented Aug 17, 2015

skosch commented Oct 6, 2015

rreusser commented Oct 6, 2015

rreusser commented Oct 6, 2015

skosch commented Oct 6, 2015

rreusser commented Oct 6, 2015

skosch commented Oct 6, 2015

rreusser commented Oct 6, 2015

skosch commented Oct 6, 2015

rreusser commented Oct 6, 2015

Tensor contraction? #9

Tensor contraction? #9

Comments

rreusser commented Aug 16, 2015

jaspervdg commented Aug 17, 2015

rreusser commented Aug 17, 2015

rreusser commented Aug 17, 2015

skosch commented Oct 6, 2015

rreusser commented Oct 6, 2015

rreusser commented Oct 6, 2015

skosch commented Oct 6, 2015

rreusser commented Oct 6, 2015

skosch commented Oct 6, 2015

rreusser commented Oct 6, 2015

skosch commented Oct 6, 2015

rreusser commented Oct 6, 2015