ParallelSGD does not work as a drop-in replacement for SGD and it is not compatible with the rest of the code (does not compile) #2731
This looks like an issue that should be opened in the ensmallen repository. Nonetheless, the documentation for ParallelSGD points out the API required of a function being optimized with it, and notes that the Hogwild! algorithm is most performant when the gradients of the objective function are sparse. That may not be the case for a neural network in general, but it can work well for, e.g., a high-dimensional sparse logistic regression problem. Check out the paper on the algorithm for more details. Please include a minimal reproducible example with bug reports (so that we can compile it directly), as well as the exact errors that are being encountered. Although you provided some directions about how to reproduce the issue, it would take a while to write the code and debug it, and there would be no guarantee that we would even see the same issue you are reporting.
Thank you for your answer, Ryan! Would you recommend ParallelSGD for recurrent neural networks? Please find an example project attached. It is a slightly modified version of the RNN electricity consumption example, with ParallelSGD dropped in instead of SGD. Things I have removed are marked with '//@-psgd' and things I have added are marked with '//@+psgd'; only a very few things are changed. It would be interesting to see whether this works, if it can be made to compile.
I wouldn't---I don't expect an RNN (unless very specifically constructed) to have sparse gradients. There may be other parallel SGD variants that could work for dense data and gradients, but honestly I think that the best level of single-node parallelism for neural networks is not at the optimizer level but at the linear algebra level. So if you are using OpenBLAS already, then the large linear algebra operations should already be using multiple cores. Let me try out the example code you sent...
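As a sketch of what "parallelism at the linear algebra level" means in practice (assuming the program is linked against OpenBLAS; the binary name `rnn_example` is hypothetical), no code changes are needed at all:

```shell
# OpenBLAS reads its thread count from the environment at startup.
export OPENBLAS_NUM_THREADS=4   # or OMP_NUM_THREADS for OpenMP-based builds
./rnn_example                   # large matrix products inside the RNN
                                # forward/backward passes now use 4 cores
```

The plain single-threaded SGD optimizer still drives the training loop, but the dominant cost (the dense matrix multiplications) is parallelized underneath it.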
Ok, I see what's going on here. The issue isn't actually anything wrong with how you set up the optimizer; ParallelSGD's Hogwild! implementation computes its gradients into a sparse matrix type, and the RNN class only supports dense gradient matrices. In order to fix this issue, the RNN class would need to support sparse matrix types for its gradients. At the same time, it's worth pointing out that #290 is an issue that has been open for a long time with the intention of fully templatizing mlpack's algorithms to work with any matrix type. If/when that is done, then RNNs will work with Hogwild!, but like I mentioned, I don't think it would yield a noticeable speedup even if it did work.
OK, I understand. I will close this issue.
Issue description
ParallelSGD does not work as a drop-in replacement for SGD and is not compatible with the rest of the code (it does not compile). SGD uses only one thread and is therefore slow; it would be interesting to have faster optimization if ParallelSGD worked. It seems to me that ParallelSGD is not well worked out. What are the reasons for this? Is it not a good algorithm? If the ParallelSGD algorithm is not good, would it be possible to speed up the SGD optimizer with parallel processing in some other way?
Steps to reproduce
Take, for example, a simple neural network example that uses SGD, and replace the SGD optimizer with ParallelSGD (this needs a modified function...).
Expected behavior
The code should compile and the optimization should be much faster.
Actual behavior
The code does not compile: there are several errors mentioning the wrong number of parameters in several places (e.g., in the Evaluate function).