Parallelize Everything #11

Open
alejandroschuler opened this issue May 31, 2015 · 1 comment

@alejandroschuler
Contributor

I think it would be useful to parallelize all of the code, since parallelized code can still run on a single processor. We could then drop glrm altogether, or keep it for pedagogical purposes. On massive datasets, functions like observations() currently take far too long (see the timings below), and we could convert df2array into df2sharedarray in one shot.

julia> @time data = readtable("data/sample_100000");
elapsed time: 29.124417016 seconds (5721166992 bytes allocated, 7.27% gc time)

julia> @time obs = observations(data);
elapsed time: 547.430868981 seconds (12869023916 bytes allocated, 93.97% gc time)

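To make the shared-array idea concrete, here is a rough sketch, assuming a current Julia 1.x with the Distributed and SharedArrays standard libraries. The names matrix_to_shared and count_observed are hypothetical and are not functions from this package. The sketch also assumes the data is already a numeric matrix with missing entries encoded as NaN, which may not match how df2array or observations() actually represent them.

```julia
using Distributed    # standard library: @distributed, workers
using SharedArrays   # standard library: SharedArray

# Hypothetical stand-in for df2sharedarray: copy a numeric matrix into
# shared memory so local worker processes can read it without extra copies.
# Assumption: missing entries are encoded as NaN.
function matrix_to_shared(A::AbstractMatrix{<:Real})
    S = SharedArray{Float64}(size(A))
    copyto!(S, A)
    return S
end

# Count observed (non-NaN) entries, splitting the columns across workers
# and summing the per-column counts. With no extra workers added, the loop
# simply runs on the current process.
function count_observed(S::SharedArray{Float64,2})
    return @distributed (+) for j in 1:size(S, 2)
        count(!isnan, view(S, :, j))
    end
end

A = [1.0 NaN 3.0; 4.0 5.0 NaN]
S = matrix_to_shared(A)
n_observed = count_observed(S)
```

The same code runs unchanged on a single process. If workers are added with addprocs, it is probably safest to run `@everywhere using SharedArrays` first so that each worker can map the shared segment.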
@madeleineudell
Owner

Specializing on the new shareGLRM type, which is a subtype of the AbstractGLRM type, should make it easier and cleaner to parallelize larger swaths of the codebase.
