
[WIP] Laser forEach iterators #479

Draft · wants to merge 16 commits into master
Conversation

mratsim (Owner) commented on Dec 12, 2020

This replaces the use of mapX_inline and applyX_inline with the forEach / forEachContiguous / forEachParallel / forEachSerial laser iterators.

This is particularly valuable for recurrent neural networks like GRU: we can implement the equations in a straightforward manner with meaningful variable names instead of magic x, y, z, and we would have needed an apply11_inline anyway (with the correct var/non-var parameters).
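As an illustration, here is a minimal sketch of the forEach style, assuming laser's foreach module is reachable under its usual `laser/strided_iteration/foreach` path once vendored; the `gruResetGate` proc and its tensor names are hypothetical and only meant to show named bindings replacing apply3_inline's magic x, y, z:

```nim
# Sketch only: the import path and the proc below are illustrative
# assumptions, not code from this PR.
import math
import arraymancer
import arraymancer/laser/strided_iteration/foreach  # assumed vendored path

proc gruResetGate(Wr_x, Ur_h, br: Tensor[float32]): Tensor[float32] =
  ## Elementwise part of r = sigmoid(Wr*x + Ur*h + br), written with
  ## meaningful variable names instead of apply3_inline's x, y, z.
  result = zeros_like(Wr_x)
  forEach r in result, wx in Wr_x, uh in Ur_h, b in br:
    r = 1'f32 / (1'f32 + exp(-(wx + uh + b)))
```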

TODO:

  • There is a significant parallel performance regression on GRU when running test_nnp_gru. While before this PR the parallel version was "only" 2x slower than serial, it is now 13x slower than serial, which probably signals a false-sharing issue.
    Even so, RNNs are a type of iterative stencil computation that requires special care and is often used for polyhedral benchmarking, so another approach based on tiling is probably needed to properly speed them up. (see https://github.com/numforge/laser/blob/master/research/automatic_loop_nest_scheduling.md#polyhedral-approaches)

Not in scope:

  • using forEach for backward propagation in GRU: this is a headache-inducing refactoring

mratsim (Owner, Author) commented on Jan 3, 2021

Some changes in the BLAS L1 operators require changing autograd to +.=, because the += based on apply2_inline was somehow doing an implicit broadcast. But then we have cascading issues that require changing +.= to use broadcast2, plus broadcast2 fixes, and then the Stack autograd layer fails its tests. See 2929cb5 in the laser-iterators-stashed branch.
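For context, a hedged sketch of the += versus +.= distinction driving the autograd change; the shapes and variable names are made up for the example, and only the broadcasting in-place operator `+.=` mentioned above is assumed:

```nim
import arraymancer

var grad = zeros[float32](3, 4)       # accumulated gradient
let bias_grad = ones[float32](1, 4)   # per-column gradient to accumulate

# `grad += bias_grad` relied on apply2_inline's implicit broadcast before
# this PR; with the forEach-based += the shapes would have to match.

# Explicit broadcasting in-place add instead:
grad +.= bias_grad                    # broadcasts the (1, 4) row over all 3 rows
```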
