Weighted model #113

Open
wants to merge 12 commits into master

Conversation

michalk8

Hi @davidsebfischer,
I've started working on the weighted model based on your notes (thanks a lot), and I think I've gotten most of it right in numpy, though I haven't tested it yet.
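
For reference, this is roughly how I think of the weights entering the NB log-likelihood in numpy (a minimal sketch - the function name and shapes are illustrative, not the actual batchglm internals):

    import numpy as np
    from scipy.special import gammaln

    def weighted_nb_ll(x, mu, r, w):
        # x:  (observations x features) counts
        # mu: (observations x features) fitted means
        # r:  (observations x features) dispersion ("scale") parameters
        # w:  (observations,) per-observation weights
        ll = (
            gammaln(x + r) - gammaln(r) - gammaln(x + 1.0)
            + r * np.log(r / (r + mu))
            + x * np.log(mu / (r + mu))
        )
        # each observation's contribution is scaled by its weight,
        # then summed over observations, per feature
        return np.einsum("o,of->f", w, ll)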

Questions:

  • how shall closedform_glm_mean and closedform_glm_scale be modified to include weights?

Todos:

  • check if the propagation of weights is correct
  • write tests (not only for NB)

Todos to discuss:

  • remove dead code
  • typing
  • implement the rest of the models so that they match the TF backend
  • numba - are there any bottlenecks/how does the performance compare to e.g. tf/dask or using sparse?
  • software engineering stuff: pre-commit, travis, black - what's the min. python version you support?

My recommendation for now would be to keep this PR to just the weighted NB model in numpy + tests + SWE stuff; the rest could go into 2 separate PRs (1 for TF, 1 for the rest of the models and numba, if desired/needed).

Related CellRank issue: theislab/cellrank#377

P.S. I like the repo's structure (i.e. the api/external/pkg_constants split - it nicely avoids cyclic imports). The only ugly thing is that you have to import .api.

michalk8 (Author) commented on this diff hunk:

    np.einsum('ob,of->fob', xh, w),
    xh
    )
    w = self.jac_weight_b_j(j=j)  # (observations x features)

This seemed like a bug, so I've changed it (I think this is one of the unused functions, based on what PyCharm told me).

michalk8 commented Sep 23, 2020

Hi @davidsebfischer,
I was wondering about the following:

  • is there a reason why the estimator always needs dask (the design_loc is always a dask array)?
  • when the data matrix is sparse and as_dask=False, it gets broken (see https://github.com/theislab/batchglm/pull/113/files#diff-16f723c756df0788db6b339a29976b1fR196)

I think both of these are very related and could be solved essentially by doing this: https://github.com/theislab/batchglm/pull/113/files#diff-2a60c6b6af3d9ce6c634d470674d6a55R49 (just remove the True and replace the compute calls with "if dask, then compute", roughly like the sketch below).
Or, for sparse matrices, only allow dask (I already check for that).
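
Roughly what I mean by "if dask, then compute" (a minimal sketch; the helper name is made up):

    import dask.array as da

    def maybe_compute(x):
        # materialise dask arrays; pass numpy / scipy.sparse arrays through unchanged
        return x.compute() if isinstance(x, da.Array) else x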

@davidsebfischer (Contributor) commented:

> how shall closedform_glm_mean and closedform_glm_scale be modified to include weights?

Sorry, I did not see this question before. closedform_glm_mean should still be possible with weighting, if the weights are applied to the observations in this linear system? closedform_glm_scale is method of moments, so this should also work. In the worst case you can always disable these at first when weights are used and default to easier initialisations.
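
Roughly what I have in mind, sketched for a grouped (one-hot) location design (just an illustration of where the weights would enter, not the actual closedform_glm_mean / closedform_glm_scale code):

    import numpy as np

    def weighted_closedform_nb_init(x, design, w, eps=1e-8):
        # x:      (observations x features) counts
        # design: (observations x groups) one-hot location design matrix
        # w:      (observations,) observation weights
        dw = design * w[:, None]                  # weighted design
        denom = dw.sum(axis=0)[:, None]           # total weight per group
        mu = (dw.T @ x) / denom                   # weighted per-group means
        ex2 = (dw.T @ x ** 2) / denom             # weighted second moments
        var = ex2 - mu ** 2                       # weighted variances
        # method of moments for NB: var = mu + mu^2 / r  =>  r = mu^2 / (var - mu)
        r = mu ** 2 / np.maximum(var - mu, eps)
        return mu, r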

> P.S. I like the repo's structure (i.e. the api/external/pkg_constants split - it nicely avoids cyclic imports). The only ugly thing is that you have to import .api.

definitely also open to suggestions here!

> is there a reason why the estimator always needs dask (the design_loc is always a dask array)?

we can change that!

> when the data matrix is sparse and as_dask=False, it gets broken (see https://github.com/theislab/batchglm/pull/113/files#diff-16f723c756df0788db6b339a29976b1fR196)
> I think both of these are very related and could be solved essentially by doing this: https://github.com/theislab/batchglm/pull/113/files#diff-2a60c6b6af3d9ce6c634d470674d6a55R49 (just remove the True and replace the compute calls with "if dask, then compute").
> Or, for sparse matrices, only allow dask (I already check for that).

cool, we can definitely do that!

@michalk8 (Author) commented:

> we can change that!
> cool, we can definitely do that!

Ok, I've started with typing + defining some utility functions. There are 4-5 more things I'd like to do:

  • I'd also like to remove the sys.stdout writes (in favour of print/log; see the sketch after this list)
  • f-strings (there's a backport for Python3.6 https://github.com/asottile/future-fstrings)
  • enable pre-commit (including black formatting, if you don't mind it)
  • later down the line: docstring polishing (some arguments are missing, style is inconsistent [e.g. use numpy-style])
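
For the sys.stdout point, what I have in mind is roughly this (the logger name and function are just an example, not existing batchglm code):

    import logging

    logger = logging.getLogger("batchglm")

    def log_progress(epoch: int, loss: float) -> None:
        # instead of sys.stdout.write(f"epoch {epoch}: loss {loss}\n")
        logger.info("epoch %d: loss %.4f", epoch, loss)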

Do you have any tests locally so that I might check if things are breaking? If not, I will start with some simple unit tests.

@davidsebfischer (Contributor) commented:

> * I'd also like to remove the sys.stdout writes (in favour of print/log)

ok for me, but then we have to do this consistently across the package on this PR.

> * f-strings (there's a backport for Python 3.6: https://github.com/asottile/future-fstrings)

yes, I have also thought about this before - great!

> * enable pre-commit (including black formatting, if you don't mind it)

nice, let's try!

> * later down the line: docstring polishing (some arguments are missing, style is inconsistent [e.g. use numpy-style])

yes, we can build this up as we go; this is also a work in progress in internal functions!

> Do you have any tests locally so that I might check if things are breaking? If not, I will start with some simple unit tests.

I haven't automated running these, but I have tests for most functionalities; we could set up continuous integration if you want, happy to spend some time doing that! Important tests, for example, concern accuracy of the parameter estimates: https://github.com/theislab/batchglm/blob/master/batchglm/unit_test/test_acc_glm_all_numpy.py
