Gram-based CD/BCD/FISTA solvers for (group)Lasso when `n_samples >> n_features` #4

PABannier · 2022-04-20T22:11:32Z

The goal of this PR is to write a CD (BCD) solver when n_samples >> n_features.
Such configurations are solved much faster by pre-computing a Gram matrix XtX and updating the gradient (rather than the residuals) at every CD cycle.
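For readers following along, here is a minimal sketch of the idea for the plain Lasso. It is not the code in this PR. It assumes the objective `1/(2 * n_samples) * ||y - Xw||^2 + alpha * ||w||_1`, and the function names and the stopping rule (largest coordinate update) are illustrative choices only.

```python
import numpy as np


def soft_threshold(x, tau):
    """Proximal operator of tau * |.|."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def gram_cd_lasso(X, y, alpha, max_iter=1000, tol=1e-10):
    """Cyclic CD on 1/(2n) ||y - Xw||^2 + alpha ||w||_1 using the Gram matrix.

    Hypothetical helper for illustration, not the PR's implementation.
    """
    n_samples, n_features = X.shape
    G = X.T @ X / n_samples      # (n_features, n_features), computed once
    Xty = X.T @ y / n_samples
    w = np.zeros(n_features)
    grad = -Xty.copy()           # datafit gradient G @ w - Xty at w = 0
    for _ in range(max_iter):
        max_update = 0.0
        for j in range(n_features):
            if G[j, j] == 0.0:   # all-zero column: nothing to update
                continue
            old = w[j]
            w[j] = soft_threshold(old - grad[j] / G[j, j], alpha / G[j, j])
            delta = w[j] - old
            if delta != 0.0:
                # Gradient update costs O(n_features), whatever n_samples is.
                grad += delta * G[:, j]
                max_update = max(max_update, abs(delta))
        if max_update <= tol:    # simplistic stopping rule (an assumption)
            break
    return w
```

After the one-off computation of `G` and `Xty`, a full CD cycle costs O(n_features^2) operations, whereas a residual-based cycle costs O(n_samples * n_features).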

A quick experiment with 1e6 samples and 300 features:

########
Lasso
########

Celer: 5.43s
Gram: 1.89s

###########
Group Lasso
###########

Celer: 43.41s
Gram: 4.23s

mathurinm · 2022-04-21T07:21:44Z

@PABannier thanks a lot ! I tried it and it really shines on the data by @sehoff

Caveat: the stopping criteria are not the same. Monitoring the primal decrease is a much looser criterion than checking the duality gap, so I adjusted the tolerance manually to obtain similar results.
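To make the difference concrete, here is a small sketch of a Lasso duality gap, assuming the objective `1/(2 * n_samples) * ||y - Xw||^2 + alpha * ||w||_1` and a dual point obtained by rescaling the residual. The function names are hypothetical, and the criterion actually used in the PR may be computed differently.

```python
import numpy as np


def lasso_primal(X, y, w, alpha):
    """Primal objective 1/(2n) ||y - Xw||^2 + alpha ||w||_1."""
    n = len(y)
    return np.sum((y - X @ w) ** 2) / (2 * n) + alpha * np.sum(np.abs(w))


def lasso_duality_gap(X, y, w, alpha):
    """Primal minus dual objective, using a rescaled residual as dual point."""
    n = len(y)
    r = y - X @ w
    # Rescale so that ||X.T @ theta||_inf <= n * alpha (dual feasibility).
    scale = max(1.0, np.max(np.abs(X.T @ r)) / (n * alpha))
    theta = r / scale
    dual = (np.sum(y ** 2) - np.sum((y - theta) ** 2)) / (2 * n)
    return lasso_primal(X, y, w, alpha) - dual
```

The gap is an upper bound on the suboptimality of `w`, so a small gap certifies closeness to the optimum. A small decrease of the primal between two iterations only says that progress has slowed down, which is why the same `tol` value means different things under the two criteria.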

sehoff · 2022-04-22T12:21:16Z

Thanks a lot! I tried it out (on slightly different data than I provided), and can confirm sizable speedups. I use warm-starts for celer, so I guess the figures are even biased towards celer.

Note: with the current settings of the Gram solver, `res = gram_group_lasso(X, y, a, groups=grps, tol=1e-10, max_iter=10_000, check_freq=10)`, and very small alphas (alpha_max/1000), the solver does not converge within max_iter.

Do you have a suggestion for a reasonable, looser tolerance? With tol=1e-9 it converges after ~3500 iterations.


mathurinm · 2022-04-22T15:19:07Z

@sehoff hard to tell without the data, but consider increasing max_iter; the solver is probably not very far from convergence. You can run it in verbose mode to see whether it stops far from convergence or not.

It's easy to add warm start to the gram solvers, it should be done shortly. Beware in your comparison that the stopping criteria are not the same.

PABannier · 2022-04-24T22:52:36Z

Quick update:

skglm/solvers/gram.py

… gram_solver

mathurinm · 2022-04-25T08:49:52Z

@sehoff support for weights (infinite ones too) is here. Let us know how it works!
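How the PR handles infinite weights is not shown in this thread. One common way to get that behaviour for the weighted Lasso is through the proximal step itself, as in the sketch below (the function name is hypothetical): an infinite weight gives an infinite threshold, which maps the coefficient to zero.

```python
import numpy as np


def weighted_soft_threshold(x, alpha, weights):
    """Prox of sum_j alpha * weights[j] * |x_j|, applied coordinate-wise."""
    thresh = alpha * weights          # inf wherever weights is inf (alpha > 0)
    return np.sign(x) * np.maximum(np.abs(x) - thresh, 0.0)


x = np.array([2.0, -3.0, 0.5])
weights = np.array([1.0, np.inf, 2.0])
out = weighted_soft_threshold(x, alpha=0.5, weights=weights)
```

This relies on `alpha > 0`: with `alpha == 0`, the product `0 * inf` is `nan` and would need special handling.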

sehoff · 2022-04-26T18:32:38Z

> @sehoff hard to tell without the data but consider increasing max_iter, probably the solver is not very far from convergence. You can run it in verbose mode to see if you stop far from convergence or not.
>
> It's easy to add warm start to the gram solvers, it should be done shortly. Beware in your comparison that the stopping criteria are not the same.

Increasing max_iter works for me, thank you !

PABannier · 2022-04-26T18:39:42Z

@sehoff For very small alphas, I'd recommend using FISTA instead. Have a look at the gram_fista_group_lasso function!
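As an illustration only, here is a simplified Gram-based FISTA for the plain Lasso, assuming the objective `1/(2 * n_samples) * ||y - Xw||^2 + alpha * ||w||_1`. This is not the PR's `gram_fista_group_lasso`; the name `gram_fista_lasso` and its signature are made up for this sketch, and it runs a fixed number of iterations instead of checking a stopping criterion.

```python
import numpy as np


def gram_fista_lasso(X, y, alpha, max_iter=500):
    """FISTA on 1/(2n) ||y - Xw||^2 + alpha ||w||_1 using the Gram matrix."""
    n_samples, n_features = X.shape
    G = X.T @ X / n_samples
    Xty = X.T @ y / n_samples
    L = np.linalg.norm(G, ord=2)   # Lipschitz constant of the datafit gradient
    w = np.zeros(n_features)
    z = w.copy()                   # extrapolated point
    t = 1.0
    for _ in range(max_iter):
        grad = G @ z - Xty         # full gradient, O(n_features^2)
        step = z - grad / L
        # Proximal step: soft-thresholding at level alpha / L.
        w_new = np.sign(step) * np.maximum(np.abs(step) - alpha / L, 0.0)
        # Nesterov momentum update.
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t ** 2)) / 2.0
        z = w_new + (t - 1.0) / t_new * (w_new - w)
        w, t = w_new, t_new
    return w
```

Unlike CD, every iteration updates all coefficients at once from a single matrix-vector product with the Gram matrix. For the group Lasso, the soft-thresholding line would be replaced by a block soft-thresholding over each group.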

sehoff · 2022-04-26T18:52:11Z

> @sehoff support for weights (infinite ones too) is here. Let us know how it works!
>
> @sehoff For very small alphas, I'd recommend using FISTA instead. Have a look at the gram_fista_group_lasso function!

Here are the results of my comparisons, using the data I provided in the Dropbox and tol=1e-8 in all solvers. The results shown refer to Case 3.1, i.e., with one weight set to infinity. As a side remark, this shows that all three solvers can handle infinite weights!
Bottom line: especially for (very) small alphas, the Gram-based solvers significantly outperform the one in celer. So depending on the grid one searches, each solver has its advantages.
Comparison 1:

Comparison 2: Note: I leave out celer here because it takes quite long to converge for alpha_max/1000.

PABannier · 2022-04-26T19:16:51Z

@sehoff indeed, Celer is particularly efficient in settings where n_features >> n_samples. It implements a working set strategy, which is particularly useful in the high-regularization regime (where there are few active features). In your first figure, at low levels of regularization, and since your design matrix has far more samples than features, Celer's working set strategy is less useful.

Besides, for your data, the Gram solver has a cheaper update since X.T @ X is of size (n_features, n_features) (X.T @ X being a useful quantity at every gradient update).
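A quick numerical check of that point, on synthetic data (the sizes below are arbitrary): once `X.T @ X` and `X.T @ y` are precomputed, the gradient of the quadratic datafit can be evaluated without touching the `n_samples` dimension again.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 10_000, 30
X = rng.standard_normal((n_samples, n_features))
y = rng.standard_normal(n_samples)
w = rng.standard_normal(n_features)

# One-off precomputation, O(n_samples * n_features**2).
G = X.T @ X        # shape (n_features, n_features)
Xty = X.T @ y      # shape (n_features,)

grad_residual = X.T @ (X @ w - y)   # O(n_samples * n_features) per evaluation
grad_gram = G @ w - Xty             # O(n_features**2) per evaluation

print(G.shape, np.allclose(grad_residual, grad_gram))
```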

PABannier · 2022-04-26T19:18:51Z

@sehoff Thanks for the plots. They are very insightful for us. By the way, if you ever find yourself in need of an automated way of benchmarking optimization routines, look at https://github.com/benchopt/benchopt

PABannier · 2022-10-22T15:15:28Z

@mathurinm do we want to keep this PR open? FISTA and Gram-solver have been merged

mathurinm · 2022-10-22T15:17:33Z

This one supports groups while the merged PR doesn't, so it does not hurt to leave it open IMO

mathurinm · 2024-04-11T08:59:34Z

Note: this PR was opened from a branch of the skglm repo itself; it should instead be opened from a branch of your fork, @PABannier.

add gram_solver

999b89e

mathurinm marked this pull request as draft April 21, 2022 06:20

mathurinm added 2 commits April 21, 2022 08:57

test with large data

af666a5

isolate gram solver in solvers submodule

787c8c2

mathurinm changed the title ~~Faster CD solver when n_samples >> n_features~~ Implement Gram-based CD/BCD solver when n_samples >> n_features Apr 21, 2022

mathurinm reviewed Apr 22, 2022: skglm/solvers/gram.py (outdated)

PABannier added 9 commits April 22, 2022 17:53

added weights and warm_start

51b4cfe

WIP FISTA

a7db68b

added FISTA gram

fbee02d

larger examples

5ffd02b

added weights

d88df2a

ENH dual gap criterion

c6342e9

CLN Gram FISTA solver

dc7a0eb

format

adbab98

WIP Gram FISTA BCD

b81d348

PABannier and others added 5 commits April 25, 2022 08:50

duality gap for BCD

f142317

working BCD FISTA gram with weights

552485e

CLN

1a6e6aa

CLN

a2cf8a7

Merge branch 'main' of github.com:mathurinm/skglm into gram_solver

9d9a761

Badr-MOUFAD reviewed Apr 25, 2022: skglm/solvers/gram.py

Badr-MOUFAD reviewed Apr 25, 2022: skglm/solvers/gram.py

mathurinm reviewed Apr 25, 2022: skglm/solvers/gram.py (outdated)

mathurinm reviewed Apr 25, 2022: skglm/solvers/gram.py

mathurinm reviewed Apr 25, 2022: skglm/solvers/gram.py (outdated)

fix primal comp

bfae088

PABannier added 3 commits April 25, 2022 10:39

FIX prox_L21 for variable size groups

f7bda72

Merge branch 'gram_solver' of https://github.com/mathurinm/skglm into…

4ef225a

… gram_solver

better precision and working group lasso celer

f8e4cf0

mathurinm changed the title ~~Implement Gram-based CD/BCD solver when n_samples >> n_features~~ Gram-based CD/BCD/FISTA solvers for (group)Lasso when n_samples >> n_features Apr 25, 2022

PABannier added 2 commits April 25, 2022 10:54

example with weights

5c0a33a

ENH fista to example

ae92334

This was referenced Apr 26, 2022

ENH add Gram matrix based solver for when n_features << n_samples mathurinm/celer#233

Closed

slow runtimes for small alphas in GroupLasso & inf weights in weighted GroupLasso mathurinm/celer#229

Closed

mathurinm mentioned this pull request May 5, 2022

ENH flexible gram solver with penalty and using datafit #16

Closed

This was referenced Oct 12, 2022

FEAT add FISTA solver #89

Closed

ENH Add FISTA solver #91

Merged
