Key error during selecting sentence for training set #34

Open
hansd410 opened this issue Oct 20, 2021 · 0 comments
hansd410 commented Oct 20, 2021

I got a KeyError while selecting sentences for the training set (error message below):

    [INFO] 2021-10-20 04:32:44,511 - pipeline - Finished selecting sentences for dev set.
    INFO:pipeline:Finished selecting sentences for dev set.
    [INFO] 2021-10-20 04:32:44,512 - pipeline - Starting selecting sentences for training set...
    INFO:pipeline:Starting selecting sentences for training set...
    100%|███████████████████████████████████████████████████████████████████████████████████████████| 145449/145449 [03:46<00:00, 642.38it/s]
    Traceback (most recent call last):
      File "src/scripts/athene/pipeline.py", line 196, in <module>
        sentence_retrieval_ensemble(logger, args.mode)
      File "src/scripts/athene/pipeline.py", line 138, in sentence_retrieval_ensemble
        sentence_retrieval_ensemble_entrance(_args)
      File "/mnt/userName/FEVER/FEVER1.0/fever-2018-team-athene/src/athene/retrieval/sentences/ensemble.py", line 265, in entrance
        random_seed=args.random_seed, reserve_embed=args.reserve_embed)
      File "/mnt/userName/FEVER/FEVER1.0/fever-2018-team-athene/src/athene/retrieval/sentences/data_processing/data.py", line 33, in __init__
        self.data_pipeline()
      File "/mnt/userName/FEVER/FEVER1.0/fever-2018-team-athene/src/athene/retrieval/sentences/data_processing/data.py", line 69, in data_pipeline
        self.test_indexes = self.predict_indexes_loader(test_indexes_path, tests)
      File "/mnt/userName/FEVER/FEVER1.0/fever-2018-team-athene/src/athene/retrieval/sentences/data_processing/data.py", line 439, in predict_indexes_loader
        predicts_indexes = self.predict_data_indexes(predict_data, self.iword_dict)
      File "/mnt/userName/FEVER/FEVER1.0/fever-2018-team-athene/src/athene/retrieval/sentences/data_processing/data.py", line 400, in predict_data_indexes
        sent_index = self.sent_2_index(sent, word_dict, self.s_max_length)
      File "/mnt/userName/FEVER/FEVER1.0/fever-2018-team-athene/src/athene/retrieval/sentences/data_processing/data.py", line 376, in sent_2_index
        word_indexes.append(word_dict[word.lower()])
    KeyError: 'wedgwood'

I think the problem comes from the word dictionary, which is generated from train_sample.p.
Since train_sample.p is generated from the negatively sampled training dataset, the vocabulary does not cover every word in the training data.
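To illustrate the suspected failure mode, here is a minimal sketch, not the repository's code. `build_word_dict` and this simplified `sent_2_index` are hypothetical stand-ins, and the sentences are made up. It only shows that a vocabulary built from a subset of the data raises a KeyError as soon as a sentence outside that subset is indexed:

```python
def build_word_dict(sentences):
    """Map each lowercased token to an integer index (hypothetical helper)."""
    word_dict = {}
    for sent in sentences:
        for word in sent.split():
            # Index 0 is left free here, e.g. for padding (an assumption).
            word_dict.setdefault(word.lower(), len(word_dict) + 1)
    return word_dict


def sent_2_index(sent, word_dict):
    # Same strict lookup pattern as the failing line in the traceback.
    return [word_dict[word.lower()] for word in sent.split()]


# Toy data: the "sampled" set is a strict subset of the "full" set.
sampled = ["The cat sat"]
full = ["The cat sat", "Wedgwood made pottery"]

# Vocabulary built only from the sampled subset.
word_dict = build_word_dict(sampled)

try:
    sent_2_index(full[1], word_dict)
    missing = None
except KeyError as err:
    # The first word that is absent from the vocabulary.
    missing = err.args[0]
```

Building the dictionary from `full` instead of `sampled` makes the same lookup succeed, which is the effect I was after with the change below.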

I solved this problem by changing data.py from

    words_dict_path = os.path.join(self.embedding_path, "words_dict.p")
    if os.path.exists(words_dict_path):
        with open(words_dict_path, "rb") as f:
            self.word_dict = pickle.load(f)
    else:
        self.word_dict = self.get_complete_words(words_dict_path, X_train, devs, tests)

to

    words_dict_path = os.path.join(self.embedding_path, "words_dict.p")
    self.word_dict = self.get_complete_words(words_dict_path, X_train, devs, tests)

to update the dictionary every time.
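A possible middle ground, sketched under assumptions, would be to keep the cache but rebuild it only when it is missing or stale. `load_or_rebuild_word_dict`, `required_words`, and `build_fn` are hypothetical names, not part of the repository. I have also not checked whether `get_complete_words` writes the pickle itself, so this sketch does the writing explicitly:

```python
import os
import pickle


def load_or_rebuild_word_dict(path, required_words, build_fn):
    """Load a cached dictionary; rebuild it if the file is missing or
    if it lacks any of the required words (hypothetical helper)."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            word_dict = pickle.load(f)
        # Reuse the cache only if it covers every word we will look up.
        if all(word in word_dict for word in required_words):
            return word_dict
    # Cache missing or stale: rebuild and overwrite it.
    word_dict = build_fn()
    with open(path, "wb") as f:
        pickle.dump(word_dict, f)
    return word_dict
```

Here `build_fn` would wrap the call to `get_complete_words`, and `required_words` would be the lowercased tokens of the train, dev, and test sets. Collecting those tokens costs about as much as rebuilding the dictionary, so always rebuilding may be just as reasonable.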

Does my solution look fine?
