
Better documentation of missing value imputation #484

Open
Kiord opened this issue Jan 23, 2023 · 9 comments

@Kiord

Kiord commented Jan 23, 2023

Describe the bug

I am trying to impute missing values in a matrix with svd_interface, but the mask doesn't seem to affect the result.
I am using n_eigenvecs=matrix.shape[0] (full rank, so no compression / no data loss)

Steps or Code to Reproduce

import numpy as np
from tensorly.tenalg.svd import svd_interface

m, n = 30, 50
matrix = np.random.rand(m, n)
mask = np.ones_like(matrix)
mask[m // 2:, ::2] = 0  # mask == 0 marks the entries to impute
print(f'{np.count_nonzero(mask == 0) / mask.size * 100:.1f}% imputed')

U, S, V = svd_interface(matrix, n_eigenvecs=m, mask=mask)
recon = U @ np.diag(S) @ V
print(np.allclose(matrix, recon))  # True, i.e. the mask had no effect

Expected behavior

The values where mask == 0 should differ between matrix and recon

Actual result

The imputation code in svd_interface is as follows:

    U, S, V = svd_fun(matrix, n_eigenvecs=n_eigenvecs, **kwargs)

    if mask is not None:
        for _ in range(n_iter_mask_imputation):
            matrix = matrix * mask + (U @ tl.diag(S) @ V) * (1 - mask)
            U, S, V = svd_fun(matrix, n_eigenvecs=n_eigenvecs, **kwargs)

I think the result U, S, V does not depend on mask:

  • iteration 0 (first line): matrix is factored into SVD form (no data loss, since the rank is full)
  • iteration 1:
    • since matrix equals U @ tl.diag(S) @ V exactly, the update is equivalent to matrix = matrix
    • matrix is again factored into U, S, V (no data loss)
  • iteration 2:
    • same as iteration 1
  • ...

In the end, mask has no effect on the result.
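A minimal numpy-only check of this reasoning (a sketch using np.linalg.svd directly rather than tensorly's svd_fun, which dispatches to a backend SVD) shows why the full-rank update is a no-op:

```python
import numpy as np

# Full-rank SVD of a 30x50 matrix: all min(m, n) = 30 singular values
# are kept, so the factorization is exact up to floating point.
rng = np.random.default_rng(0)
m, n = 30, 50
matrix = rng.random((m, n))

U, S, Vt = np.linalg.svd(matrix, full_matrices=False)
recon = U @ np.diag(S) @ Vt

# The update `matrix * mask + recon * (1 - mask)` is then a no-op,
# because recon equals matrix everywhere, masked or not.
assert np.allclose(matrix, recon)
```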

Versions

Windows-10-10.0.22000-SP0
Python 3.10.4 | packaged by conda-forge | (main, Mar 30 2022, 08:38:02) [MSC v.1916 64 bit (AMD64)]
NumPy 1.21.5
SciPy 1.8.0

@aarmey aarmey assigned aarmey and unassigned aarmey Jan 24, 2023
@aarmey
Contributor

aarmey commented Jan 24, 2023

Hi @Kiord—When the rank is full, SVD is able to fully represent the matrix. As a consequence of this, it can represent whatever values are filled in as missing values, and there is no way for SVD to impute those values. Imputation occurs as a consequence of SVD being an approximation of the matrix, which can only happen when it is an incomplete representation. You should find that the imputed values change if the rank is less than full, though.
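To illustrate this, here is a numpy-only sketch that mimics svd_interface's masking loop with a hypothetical truncated_svd helper (not tensorly's actual implementation): once the rank is strictly below min(m, n), the masked entries really are replaced.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, rank = 30, 50, 5          # rank strictly below min(m, n) = 30
original = rng.random((m, n))
mask = np.ones((m, n))
mask[m // 2:, ::2] = 0          # mask == 0 marks "missing" entries

def truncated_svd(a, rank):
    # Hypothetical stand-in for a rank-truncated backend SVD.
    U, S, Vt = np.linalg.svd(a, full_matrices=False)
    return U[:, :rank], S[:rank], Vt[:rank]

# Same update rule as the masking loop quoted above, but with a
# truncated (approximate) SVD instead of a full-rank one.
matrix = original.copy()
U, S, Vt = truncated_svd(matrix, rank)
for _ in range(5):
    matrix = matrix * mask + (U @ np.diag(S) @ Vt) * (1 - mask)
    U, S, Vt = truncated_svd(matrix, rank)

# Observed entries are untouched; masked entries were replaced
# by the low-rank estimate.
assert np.allclose(matrix[mask == 1], original[mask == 1])
assert not np.allclose(matrix[mask == 0], original[mask == 0])
```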

@Kiord
Author

Kiord commented Jan 24, 2023

Hi, thank you. I was unsure whether this behavior was intended.

@Kiord Kiord closed this as completed Jan 24, 2023
@aarmey
Contributor

aarmey commented Jan 24, 2023

No worries. Happy to help!

@JeanKossaifi
Member

Thanks @aarmey - do you think it would make sense to add a Notes section on imputation in the docstring?

@aarmey
Contributor

aarmey commented Jan 30, 2023

I am not sure if this should be user-facing. If a user wants to impute by SVD, there are better options in fancyimpute. However, I do agree that there could be better documentation of handling missing values and, potentially, using the tensor methods to impute. Do you think that would go in the notes section of the methods, or in a section of the user guide?

@JeanKossaifi
Member

I agree -- maybe just a short note section in the docstring about what we mean by mask etc. We can also add separately a short section in the user guide about missing data with our tensor methods in general -- I guess there is no such thing as too exhaustive a documentation :)

@aarmey aarmey reopened this Jan 31, 2023
@aarmey aarmey changed the title svd_interface not imputing missing values (when rank is full) Better documentation of missing value imputation Jan 31, 2023
@JeanKossaifi
Member

@Kiord would you be able to make a small PR with the changes you have in mind and ping me and @aarmey to review it?

@Kiord
Author

Kiord commented Jun 8, 2023

Hello @JeanKossaifi, I can try to do that. My understanding is that svd_interface is not designed to impute values the way sklearn.impute.SimpleImputer is, for instance, but an optional feature (masking) lets the user impute some data, though only when the rank is reduced. Correct?

@aarmey
Contributor

aarmey commented Jun 9, 2023

Hi @Kiord, that is right. There is nothing wrong with using svd_interface for imputation; we just have not added a lot of the basic functionality one might expect. For example, it only lets you set a constant number of iterations, rather than running until some convergence condition is met. We also do not check the inputs very carefully to ensure they are reasonable (such as the requested rank being lower than that of the data).


3 participants