
ParallelSGD does not work as a drop-in replacement for SGD and it is not compatible with the rest of the code (does not compile) #2731

Closed
zsogitbe opened this issue Nov 23, 2020 · 5 comments

Comments

@zsogitbe

Issue description

ParallelSGD does not work as a drop-in replacement for SGD and it is not compatible with the rest of the code (it does not compile). SGD uses only 1 thread and is therefore slow. It would be interesting to have faster optimization if ParallelSGD worked. It seems to me that ParallelSGD is not fully worked out. What are the reasons for this? Is it not a good algorithm? If the ParallelSGD algorithm is not good, would it be possible to speed up the SGD optimizer with parallel processing in some way?

Steps to reproduce

Take, for example, a simple neural network example that uses SGD and replace the SGD optimizer with ParallelSGD (this needs a modified function...).

Expected behavior

The code should compile well and the optimization should be much faster.

Actual behavior

The code does not compile: there are several errors about the wrong number of parameters in several places (e.g. in the Evaluate function).

@rcurtin
Member

rcurtin commented Dec 1, 2020

This looks like an issue that should be opened in the ensmallen repository. Nonetheless, the documentation for ParallelSGD points out that the API required of a function being optimized with ParallelSGD is slightly different from that for regular separable differentiable functions, and it suggests changes you can make to your function's implementation that should allow the use of ParallelSGD.
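For reference, here is a rough sketch of the general shape of that API, as I understand the sparse separable differentiable function requirements; this is my own illustration rather than a copy of the docs, so the exact signatures there may differ slightly.

```cpp
// Rough sketch of the function API shape that ParallelSGD expects, based on
// my reading of the ensmallen documentation for sparse separable
// differentiable functions.  Double-check the docs for exact signatures.
#include <armadillo>

class ExampleSparseSeparableFunction
{
 public:
  // Number of separable objective terms f_i(x).
  size_t NumFunctions();

  // Evaluate f_begin(x) + ... + f_{begin + batchSize - 1}(x).
  double Evaluate(const arma::mat& coordinates,
                  const size_t begin,
                  const size_t batchSize);

  // Store the gradient of that same batch into `gradient`.  ParallelSGD
  // benefits when this gradient is sparse (e.g. arma::sp_mat).
  template<typename GradType>
  void Gradient(const arma::mat& coordinates,
                const size_t begin,
                GradType& gradient,
                const size_t batchSize);
};
```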

The Hogwild! algorithm is most performant when the gradients of the objective function are sparse. That may not be the case for a neural network in general, but it can work well for, e.g., a high-dimensional sparse logistic regression problem. Check out the paper on the algorithm for more details.

Please include a minimal reproducible example with bug reports (so that we can compile it directly), as well as the exact errors that are being encountered. Although you provided some directions on how to reproduce the issue, it would take a while to write the code and debug it, and there would be no guarantee that we would even see the same issue you are reporting. 👍

@zsogitbe
Author

zsogitbe commented Dec 1, 2020

Thank you for your answer Ryan!

Would you recommend ParallelSGD for recurrent neural networks?

Please find an example project attached. This is a slightly modified former version of the RNN electricity consumption example. I have dropped in ParallelSGD instead of SGD. Things I have removed are marked with '//@-psgd' and things I have added are marked with '//@+psgd'. Only a very few things have changed. It would be interesting to see whether it compiles and whether it works.
LSTMTimeSeriesUnivariatePSGD.zip

@rcurtin
Member

rcurtin commented Dec 1, 2020

I wouldn't; I don't expect an RNN (unless very specifically constructed) to have sparse gradients. There may be other parallel SGD variants that could work for dense data and gradients, but honestly I think the best level of single-node parallelism for neural networks is not at the optimizer level but at the linear algebra level. So if you are already using OpenBLAS, the large linear algebra operations should already be using multiple cores.
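(For what it's worth, a quick way to confirm that BLAS-level parallelism is active is to time a large dense multiply while varying the thread count; the snippet below is just an illustration of mine, assuming Armadillo is linked against a multi-threaded OpenBLAS, whose thread count is controlled by the OPENBLAS_NUM_THREADS or OMP_NUM_THREADS environment variables.)

```cpp
// Illustrative check that multi-threaded BLAS is in use: run once with
// OPENBLAS_NUM_THREADS=1 and once with OPENBLAS_NUM_THREADS=4 and compare.
#include <armadillo>
#include <iostream>

int main()
{
  const size_t n = 3000;
  arma::mat a(n, n, arma::fill::randu);
  arma::mat b(n, n, arma::fill::randu);

  arma::wall_clock timer;
  timer.tic();
  arma::mat c = a * b;  // This GEMM is where the OpenBLAS threads get used.
  std::cout << "Multiply took " << timer.toc() << "s." << std::endl;

  return 0;
}
```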

Let me try out the example code you sent...

@rcurtin
Member

rcurtin commented Dec 1, 2020

Ok, I see what's going on here. The issue isn't actually anything with ParallelSGD; it's that the RNN class does not implement the sparse separable differentiable function API required by ParallelSGD. As I mentioned earlier, it's not likely that an RNN will be able to make effective use of the Hogwild algorithm because the gradients will, in general, be fully dense.

In order to fix this issue, the RNN::Evaluate() and RNN::Gradient() methods would need to be adapted such that they could take an arbitrary matrix type as input. However, that's quite an undertaking and given the reasons above I don't think it's worthwhile to do that right now.
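Just to give a rough idea of the kind of change meant here (this is purely a sketch, not the actual mlpack RNN signatures):

```cpp
// Purely illustrative: the general shape of templatizing the evaluation and
// gradient methods on the matrix and gradient types, so that e.g. a sparse
// gradient type could be used.  Not the real RNN::Evaluate()/Gradient().
template<typename MatType>
double Evaluate(const MatType& parameters,
                const size_t begin,
                const size_t batchSize);

template<typename MatType, typename GradType>
void Gradient(const MatType& parameters,
              const size_t begin,
              GradType& gradient,  // e.g. arma::mat or arma::sp_mat.
              const size_t batchSize);
```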

At the same time, it's worth pointing out that #290 is an issue that has been open for a long time with the intention of fully templatizing mlpack's algorithms to work with any matrix type. If/when that is done, RNNs will work with Hogwild, but like I mentioned, I don't think it would yield a noticeable speedup even if it did work. 👍

@zsogitbe
Author

zsogitbe commented Dec 1, 2020

OK! I understand. I will close this issue.

@zsogitbe zsogitbe closed this as completed Dec 1, 2020