
tensor: allow compute to bypass caching mechanism #408

Open · wants to merge 1 commit into base: master

Conversation

@rohany (Contributor) commented Feb 13, 2021

This commit adds a boolean to the TensorBase::compute method allowing
for bypass of the compute caching mechanism. This is useful when
benchmarking the generated kernels.
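
For illustration, a rough sketch of how a benchmark might use the proposed flag; the parameter name (force) and the surrounding setup are assumptions for this sketch, not the exact signature in the commit:

// Hypothetical sketch: assumes TensorBase::compute gains a boolean that, when
// true, re-runs the generated kernel instead of skipping it via the cache check.
#include "taco.h"
#include <chrono>
using namespace taco;

void benchmarkSpMV(Tensor<double>& A, Tensor<double>& x, int trials) {
  IndexVar i, j;
  Tensor<double> y({A.getDimension(0)}, Format({Dense}));
  y(i) = A(i,j) * x(j);
  y.compile();
  y.assemble();
  for (int t = 0; t < trials; t++) {
    auto begin = std::chrono::high_resolution_clock::now();
    y.compute(/*force=*/true);  // bypass the compute caching check on every trial
    auto end = std::chrono::high_resolution_clock::now();
    auto ms = std::chrono::duration<double,std::milli>(end - begin).count();
    (void)ms;  // record ms however the benchmark harness expects
  }
}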

@rohany (Contributor, Author) commented Feb 25, 2021

cc @stephenchouca: this PR makes it easier to write benchmarks, so I'd like to merge it (and also into the array_algebra branch) for the OOPSLA benchmarks.

@stephenchouca (Contributor) commented:

Is this change really necessary? As an example, if one wants to benchmark SpMV multiple times, they can already do something like this with the current API:

// A, x, the index variables i and j, the format dv, and trials are assumed to
// be defined earlier in the benchmark harness.
for (int t = 0; t < trials; t++) {
  Tensor<double> y({A.getDimension(0)}, dv);
  y(i) = A(i,j) * x(j);
  y.compile();
  auto begin = std::chrono::high_resolution_clock::now();
  y.assemble();
  y.compute();
  auto end = std::chrono::high_resolution_clock::now();
  auto diff = std::chrono::duration<double,std::milli>(end - begin).count();
}

The compute kernel would be run in every iteration since y is redefined, but compilation would actually still only happen once due to the code caching mechanism.

More fundamentally though, I wonder if perhaps we should be discouraging people from benchmarking TACO using the lazy evaluation API. As things are now, it can be difficult for users to figure out what's actually being timed, since there's so much happening in the background. For instance, if A and x were not already packed in the example above, it wouldn't be at all obvious that the measurement also includes the time it takes to pack the operands. Perhaps there needs to be a way for users to simply disable the lazy evaluation API so that it's more intuitive what exactly each method call is doing at runtime.
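
As a concrete illustration of the packing issue, here is a sketch (reusing the names from the example above, plus TACO's Tensor::pack()) that packs the operands up front so that packing does not contaminate the timed region:

// Sketch: pack A and x before the loop so the timed region covers only assemble
// and compute, not the (otherwise lazily triggered) packing of the operands.
A.pack();
x.pack();
for (int t = 0; t < trials; t++) {
  Tensor<double> y({A.getDimension(0)}, dv);
  y(i) = A(i,j) * x(j);
  y.compile();
  auto begin = std::chrono::high_resolution_clock::now();
  y.assemble();
  y.compute();
  auto end = std::chrono::high_resolution_clock::now();
  auto diff = std::chrono::duration<double,std::milli>(end - begin).count();
}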

@rohany (Contributor, Author) commented Feb 25, 2021

> Is this change really necessary?

Probably not. I understand that you can benchmark it this way, but it becomes somewhat cumbersome and slower to run than just doing the computation directly in the benchmark loop.

> More fundamentally though, I wonder if perhaps we should be discouraging people from benchmarking TACO using the lazy evaluation API. As things are now, it can be difficult for users to figure out what's actually being timed, since there's so much happening in the background. For instance, if A and x were not already packed in the example above, it wouldn't be at all obvious that the measurement also includes the time it takes to pack the operands. Perhaps there needs to be a way for users to simply disable the lazy evaluation API so that it's more intuitive what exactly each method call is doing at runtime.

I agree. I don't have a concrete suggestion yet, but it probably shouldn't be how the CLI does it (manually lowering and then calling the packed functions).
