
Evaluation of Implicit sequential model throws ValueError #161

Open
impaktor opened this issue May 10, 2019 · 4 comments

@impaktor

Hi!

I'm trying to train an implicit sequential model on click-stream data, but as soon as I try to evaluate the trained model (e.g. with MRR or precision/recall), it throws an error:

mrr = spotlight.evaluation.mrr_score(implicit_sequence_model, test, train)

ValueError                                Traceback (most recent call last)
<ipython-input-78-349343a26e9b> in <module>
----> 1 mrr = spotlight.evaluation.mrr_score(implicit_sequence_model, test, train)

~/.local/lib/python3.7/site-packages/spotlight/evaluation.py in mrr_score(model, test, train)
     45             continue
     46
---> 47         predictions = -model.predict(user_id)
     48
     49         if train is not None:

~/.local/lib/python3.7/site-packages/spotlight/sequence/implicit.py in predict(self, sequences, item_ids)
    316
    317         self._check_input(item_ids)
--> 318         self._check_input(sequences)
    319
    320         sequences = torch.from_numpy(sequences.astype(np.int64).reshape(1, -1))

~/.local/lib/python3.7/site-packages/spotlight/sequence/implicit.py in _check_input(self, item_ids)
    188
    189         if item_id_max >= self._num_items:
--> 190             raise ValueError('Maximum item id greater '
    191                              'than number of items in model.')
    192

ValueError: Maximum item id greater than number of items in model.

Perhaps the error is obvious, but I can't pinpoint what I'm doing wrong, so below I'll describe, as concisely as possible, what I'm doing.

Comparison of experimental with synthetic data

I tried generating synthetic data and using it instead of my experimental data, and then it works. This led me to compare the structure of the synthetic data with that of my experimental data:

Table 1: Synthetic data with N=100 unique users, M=1k unique items, and Q=10k interactions
user_id item_id timestamp
0 958 1
0 657 2
0 172 3
1 129 4
1 . 5
1 . 6
. . .
. . .
. . .
. . .
N . Q-2
N . Q-1
N 459 Q
Table 2: Experimental data, N=2.5M users, M=20k items, Q=14.8M interactions
user_id item_id timestamp
725397 3992 0
2108444 10093 1
2108444 10093 2
1840496 15616 3
1792861 16551 4
1960701 16537 5
1140742 6791 6
2074022 4263 .
2368959 19258 .
2368959 17218 .
. . .
. . Q-1
. . Q
  1. Both data sets have users indexed from [0..N-1], but my experimental data is not sorted on user_id, whereas the synthetic data is.

  2. Both data sets have item_ids indexed from [1..M], yet the "ValueError: Maximum item id greater than number of items in model." is only thrown for my experimental data (a quick range check is sketched right after this list).

  3. I've re-shaped my timestamps to be just the data frame index after sorting on time, so this is also as in the synthetic data set. (Previously my timestamps were the event times in seconds since 1970, and some events were simultaneous, i.e. their order was arbitrary/degenerate.)
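
For completeness, this is the kind of quick check I used to compare the index ranges of the two data sets (plain NumPy; user_ids and item_ids are the arrays built in the processing code further down):

import numpy as np

def describe_ids(user_ids, item_ids):
    # print unique counts and index ranges for a pair of (user_id, item_id) arrays
    print('users: %d unique, min=%d, max=%d'
          % (len(np.unique(user_ids)), user_ids.min(), user_ids.max()))
    print('items: %d unique, min=%d, max=%d'
          % (len(np.unique(item_ids)), item_ids.min(), item_ids.max()))

# expectation from the tables above: users in [0..N-1], items in [1..M]
describe_ids(user_ids, item_ids)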

Code for processing the experimental data:

# pandas dataframe with unique string identifier for users ('session_id'), 
# and 'Article number' for item_id, and 'timestamp' for event
df = df.sort_values(by=['timestamp']).reset_index(drop=True)


# encode string identifiers for users and items to integer values:
from sklearn import preprocessing
le_usr = preprocessing.LabelEncoder() # user encoder
le_itm = preprocessing.LabelEncoder() # item encoder

# shift item_ids with +1 (but not user_ids):
item_ids = (le_itm.fit_transform(df['Article number']) + 1).astype('int32')
user_ids = (le_usr.fit_transform(df['session_id'])     + 0).astype('int32')


from spotlight.interactions import Interactions
implicit_interactions = Interactions(user_ids, item_ids, timestamps=df.index.values)

from spotlight.cross_validation import user_based_train_test_split, random_train_test_split
train, test = random_train_test_split(implicit_interactions, 0.2)

Code for training the model:

from spotlight.sequence.implicit import ImplicitSequenceModel
sequential_interaction = train.to_sequence()
implicit_sequence_model = ImplicitSequenceModel(use_cuda=True, n_iter=10, loss='pointwise', representation='pooling')
implicit_sequence_model.fit(sequential_interaction, verbose=True)

import spotlight.evaluation
mrr = spotlight.evaluation.mrr_score(implicit_sequence_model, test, train)

Questions on input format:

Here are some questions that I thought might pinpoint the error, i.e. where my data might differ from the synthetic data set:

  1. Is there any purpose, or even harm, in including users with only a single interaction?

  2. Does the model allow a user to have multiple events with the same timestamp value?

  3. As long as the (user_id, item_id, timestamp) triplets pair up, does row ordering matter?

@maciejkula
Owner

maciejkula commented May 10, 2019 via email

@impaktor
Author

Thanks for the fast reply!

Before you start the evaluation routine on your real data, can you compare the number of items in your train and test data? They should be the same.

They're the same as far as I can tell; this is the output after I've run random_train_test_split:

In [6]: test
Out[6]: <Interactions dataset (2517443 users x 20861 items x 2968924 interactions)>

In [7]: train
Out[7]: <Interactions dataset (2517443 users x 20861 items x 11875692 interactions)>

I've also tried both user_based_train_test_split() and random_train_test_split(), but the result always ends with the ValueError being thrown. I've also tried the 'pointwise' and 'adaptive_hinge' losses, just to see if that would change anything, but unsurprisingly it made no difference; model training seems to work fine either way.

But indeed the actual number of unique items is one less (20860, see below) than the interaction dataset thinks (20861, see above), for some reason:

In [8]: print(len(np.unique(item_ids)), min(item_ids), max(item_ids))
20860 1 20860

In [15]: len(item_ids) - (2968924 + 11875692)
Out[15]: 0

Is this somehow related to my adding +1 to all item_ids in the code of my original post? (repeated below)

# shift item_ids with +1 (but not user_ids):
item_ids = (le_itm.fit_transform(df['Article number']) + 1).astype('int32')

If I don't do this, I will have a zero-indexed item vector, and that will trigger an assert/error check, if I remember correctly.
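
If it matters: my understanding is that Spotlight reserves item id 0 for padding, so the +1 shift itself should be fine, and Interactions appears to infer num_items as item_ids.max() + 1 (padding slot included), which would explain the 20861 vs 20860 difference above. A tiny check of that assumption:

import numpy as np
from spotlight.interactions import Interactions

# toy data: 3 real items, encoded as 1..3 after the +1 shift
toy = Interactions(np.array([0, 0, 1, 1], dtype=np.int32),
                   np.array([1, 2, 3, 1], dtype=np.int32),
                   timestamps=np.arange(4))

# if num_items is inferred as max(item_ids) + 1, this prints 4 even though
# there are only 3 distinct real items (id 0 being the padding slot)
print(toy.num_items)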

@maciejkula
Owner

One explanation for why this would happen is if I didn't propagate the total number of items correctly across train/test splits and sequential interaction conversion (the total number of items in the model must be the higher of the maximum item id in train/test). However, I don't see anything wrong with the code.

The invariant that needs to be upheld is train.num_items == test.num_items == model._num_items (and item_ids.max() < model._num_items).
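
For example, a check along these lines right before calling mrr_score (just a sketch, using the variable names from your post; _num_items is an internal attribute, so this is only for debugging) would tell you whether the invariant holds:

# sketch: run this right before mrr_score
assert train.num_items == test.num_items == implicit_sequence_model._num_items, \
    'train/test/model disagree on the number of items'
assert test.item_ids.max() < implicit_sequence_model._num_items, \
    'an item id in the test set is out of range for the model'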

I think that unless you can provide a snippet I can run that reproduces the problem, I won't be able to help further.

(By the way, random train/test split doesn't make any sense for sequential models: use the user-based split.)

@impaktor
Author

impaktor commented Nov 8, 2019

Hi @maciejkula

After 6 months, I've now revisited this, and I believe I know exactly how to trigger this bug.

(Quick recap of the above: evaluating my ImplicitSequenceModel worked with synthetic data, but not with my "real" data, where I got ValueError: Maximum item id greater than number of items in model, even though I checked both train and test and all indices look correct.)

The following code adapts the synthetic data to my use case and reliably triggers the bug:

from spotlight.cross_validation import user_based_train_test_split
from spotlight.datasets.synthetic import generate_sequential
from spotlight.evaluation import sequence_mrr_score
from spotlight.evaluation import mrr_score
from spotlight.sequence.implicit import ImplicitSequenceModel

trigger_crash = True
if trigger_crash:
    n_items = 100
else:
    n_items = 1000

dataset = generate_sequential(num_users=1000,
                              num_items=n_items,
                              num_interactions=10000,
                              concentration_parameter=0.01,
                              order=3)

train, test = user_based_train_test_split(dataset)

train_seq = train.to_sequence()

model = ImplicitSequenceModel(n_iter=3,
                              representation='cnn',
                              loss='bpr')
model.fit(train_seq, verbose=True)

# this always works
test_seq = test.to_sequence()
mrr_seq = sequence_mrr_score(model, test_seq)
print(mrr_seq)

# using mrr_score (or precision_recall) with num_items < num_users
# triggers crash:
mrr = mrr_score(model, test)
print(mrr)

I.e. if num_items < num_users, neither mrr_score nor precision_recall_score works; however, sequence_mrr_score and sequence_precision_recall_score work fine.
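
Looking back at the traceback in my first post, my guess at the mechanism is that mrr_score loops over user ids and calls model.predict(user_id); for an ImplicitSequenceModel that first argument is interpreted as an item sequence, so _check_input() raises as soon as a user id reaches num_items, i.e. whenever num_users > num_items. Roughly (a sketch of what I think the evaluation loop boils down to, not the actual Spotlight source):

# sketch, not the actual Spotlight code
for user_id in range(test.num_users):
    # for a sequence model, predict() interprets its first argument as an
    # item sequence, so _check_input() raises once user_id >= model._num_items,
    # i.e. whenever num_users > num_items
    predictions = -model.predict(user_id)
    # ... ranking of the predictions omitted ...

If that reading is right, it would also explain why sequence_mrr_score, which builds proper item sequences via to_sequence(), is unaffected.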

Question is:

  1. Am I wrong in trying to use the non-sequence_* versions of these evaluation metrics for an implicit sequence model?

  2. If so, is it just luck that they work when num_items > num_users?
