
[WIP] Laser forEach iterators #479

Draft · wants to merge 16 commits into master
Conversation

mratsim (Owner) commented on Dec 12, 2020

This replaces the use of mapX_inline and applyX_inline with the forEach / forEachContiguous / forEachParallel / forEachSerial laser iterators.

This is particularly valuable for recurrent neural networks like GRU: we can implement the equations in a straightforward manner with meaningful variable names instead of magic x, y, z, and we would have needed an apply11_inline anyway (with the correct var/non-var parameters).
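As an illustration, here is a minimal sketch of the forEach style, assuming laser's foreach module is reachable under its usual `laser/strided_iteration/foreach` path once vendored; the `gruResetGate` proc and its tensor names are hypothetical and only meant to show named bindings replacing apply3_inline's magic x, y, z:

```nim
# Sketch only: the import path and the proc below are illustrative
# assumptions, not code from this PR.
import math
import arraymancer
import arraymancer/laser/strided_iteration/foreach  # assumed vendored path

proc gruResetGate(Wr_x, Ur_h, br: Tensor[float32]): Tensor[float32] =
  ## Elementwise part of r = sigmoid(Wr*x + Ur*h + br), written with
  ## meaningful variable names instead of apply3_inline's x, y, z.
  result = zeros_like(Wr_x)
  forEach r in result, wx in Wr_x, uh in Ur_h, b in br:
    r = 1'f32 / (1'f32 + exp(-(wx + uh + b)))
```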

TODO:

  • There is a significant parallel performance regression on GRU when running test_nnp_gru. While before this PR the parallel version was "only" 2x slower than serial, it is now 13x slower than serial, which probably signals a false-sharing issue.
    Even so, RNNs are a type of iterative stencil computation that requires special care and is often used for polyhedral benchmarking, so another approach based on tiling is probably needed to properly speed them up. (see https://github.com/numforge/laser/blob/master/research/automatic_loop_nest_scheduling.md#polyhedral-approaches)

Not in scope:

  • using forEach for backward propagation in GRU: this is a headache-inducing refactoring

mratsim (Owner, Author) commented on Jan 3, 2021

Some changes in the BLAS L1 operators require changing autograd to +.=, because the += based on apply2_inline was somehow doing an implicit broadcast. But then we have cascading issues that require changing +.= to use broadcast2, plus broadcast2 fixes, and then the Stack autograd layer fails its tests. See 2929cb5 in the laser-iterators-stashed branch.
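For context, a hedged sketch of the += versus +.= distinction driving the autograd change; the shapes and variable names are made up for the example, and only the broadcasting in-place operator `+.=` mentioned above is assumed:

```nim
import arraymancer

var grad = zeros[float32](3, 4)       # accumulated gradient
let bias_grad = ones[float32](1, 4)   # per-column gradient to accumulate

# `grad += bias_grad` relied on apply2_inline's implicit broadcast before
# this PR; with the forEach-based += the shapes would have to match.

# Explicit broadcasting in-place add instead:
grad +.= bias_grad                    # broadcasts the (1, 4) row over all 3 rows
```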
