lastfm tutorial performs transpose on data, but generation code no longer requires it #644

Open
alastair opened this issue Jan 26, 2023 · 1 comment

@alastair

Hi,

In the lastfm tutorial, there is a specific step that transposes the data:

# get the transpose since the most of the functions in implicit expect (user, item) sparse matrices instead of (item, user)
user_plays = artist_user_plays.T.tocsr()

However, it looks like this may no longer be necessary in some cases.
In 32c06aa#diff-b8a4c78fbfcc629a3d35255010d1a4ae21d5909664b8d3c1283da18359ae5a0aL77-R77, some changes were made that also swapped the order of users and artists when building the sparse matrix. Therefore, if we generate a new copy of the hdf5 file from the source data file, the artist_user_plays matrix is already in the correct orientation.

However, the binary hdf5 file that the tool downloads seems to have been generated with the older version of this code, so it still uses the (artist, user) format.

To reduce confusion, it seems like a good idea to either re-generate the binary hdf5 file and remove the transpose from the tutorial, or revert the dimension change in the matrix generation step.
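
Until the two sources agree, one way to guard against the mismatch is to check the matrix shape before deciding whether to transpose. The following is only a rough sketch on toy data: `to_user_item` is a hypothetical helper (not part of implicit), and it assumes the number of users and artists is known from the accompanying id arrays.

```python
import numpy as np
from scipy.sparse import csr_matrix


def to_user_item(plays, n_users, n_artists):
    """Return `plays` as a (user, artist) CSR matrix.

    Hypothetical helper for illustration only; not part of implicit.
    """
    if n_users == n_artists:
        # A square matrix cannot be disambiguated by shape alone.
        raise ValueError("cannot infer orientation of a square matrix")
    if plays.shape == (n_users, n_artists):
        # Already (user, artist), e.g. a freshly generated file.
        return plays.tocsr()
    if plays.shape == (n_artists, n_users):
        # Older (artist, user) layout: transpose as the tutorial does.
        return plays.T.tocsr()
    raise ValueError(f"unexpected shape {plays.shape}")


# Toy data: 3 users, 2 artists, 3 play counts.
artist_ids = np.array([0, 0, 1])
user_ids = np.array([0, 2, 1])
counts = np.array([5.0, 1.0, 3.0])

# The same data in the older (artist, user) and newer (user, artist) layouts.
old_layout = csr_matrix((counts, (artist_ids, user_ids)), shape=(2, 3))
new_layout = csr_matrix((counts, (user_ids, artist_ids)), shape=(3, 2))

user_plays_old = to_user_item(old_layout, n_users=3, n_artists=2)
user_plays_new = to_user_item(new_layout, n_users=3, n_artists=2)
```

Both calls end up with the same (user, artist) matrix, so code written this way would not depend on which version of the file was loaded. The shape check is only a heuristic and fails by design when the two dimensions are equal.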

@benfred
Owner

benfred commented Jun 6, 2023

Yeah - that's a great callout. The datasets were generated before the API refactor in #481 - and we really should generate new ones with transposed data.
