@MichaelSolotky @MelLain
Hello. I'm getting this error when trying to access the TopicKernelScore stats. I think the problem is related to the size of the document corpus. I have 13k documents in my collection, and the bug occurs when the corpus contains more than 1500 documents (i.e., lines in vw.txt).
project with 1500 documents:
Project: bugged_model.zip
All other metrics work fine with any corpus size.
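To narrow down the size at which the error starts, the corpus can be cut to its first N lines and the model refit on the smaller file. Below is a minimal sketch using only the standard library. The helper name `truncate_corpus` and the output file name are made up for illustration and are not part of BigARTM. It assumes one document per line, as in the Vowpal Wabbit format used here.

```python
from itertools import islice


def truncate_corpus(src_path, dst_path, max_docs):
    """Copy at most the first `max_docs` lines of src_path to dst_path.

    Assumes one document per line (Vowpal Wabbit format).
    Returns the number of lines actually written.
    """
    written = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        # islice stops after max_docs lines without reading the rest of the file
        for line in islice(src, max_docs):
            dst.write(line)
            written += 1
    return written
```

For example, `truncate_corpus('vw.txt', 'vw_1500.txt', 1500)` would produce a 1500-document file that can be passed as `data_path` to the script below. The batches folder should be emptied between runs so that old batches are not mixed in.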
code:
import artm

batches_folder = 'batches/'
data_path = 'vw.txt'
batch_vectorizer = artm.BatchVectorizer(data_path=data_path,
                                        data_format='vowpal_wabbit',
                                        target_folder=batches_folder)

dictionary = artm.Dictionary()
dictionary.gather(data_path=batches_folder)

topic_names = ["Topic_" + str(i) for i in range(30)]
model = artm.ARTM(topic_names=topic_names,
                  num_topics=30,
                  dictionary=dictionary)

model.scores.add(artm.PerplexityScore(name='PerplexityScore', dictionary=dictionary))
model.scores.add(artm.TopicKernelScore(name='TopicKernelScore',
                                       probability_mass_threshold=0.07))

model.fit_offline(batch_vectorizer=batch_vectorizer,
                  num_collection_passes=40)

print(model.score_tracker['PerplexityScore'].value)
print(model.score_tracker['TopicKernelScore'].average_contrast)
print(model.score_tracker['TopicKernelScore'].average_purity)
stack trace:
---------------------------------------------------------------------------
DecodeError Traceback (most recent call last)
<ipython-input-2-2cc99a1e0396> in <module>()
21 num_collection_passes=40)
22 print(model.score_tracker['PerplexityScore'].value)
---> 23 print(model.score_tracker['TopicKernelScore'].average_contrast)
24 print(model.score_tracker['TopicKernelScore'].average_purity)
~/anaconda3/lib/python3.6/site-packages/artm/score_tracker.py in <lambda>(self, p)
86 setattr(class_ref,
87 name,
---> 88 property(lambda self, p=_p: _get_score(self._name, self._master, p)))
89 setattr(class_ref,
90 'last_{}'.format(name),
~/anaconda3/lib/python3.6/site-packages/artm/score_tracker.py in _get_score(score_name, master, field_attrs, last)
41 return result_dict
42
---> 43 data_array = master.get_score_array(score_name)
44
45 if field_attrs[1] == 'optional' and field_attrs[2] == 'scalar':
~/anaconda3/lib/python3.6/site-packages/artm/master_component.py in get_score_array(self, score_name)
715 """
716 args = messages.GetScoreArrayArgs(score_name=score_name)
--> 717 score_array = self._lib.ArtmRequestScoreArray(self.master_id, args)
718
719 scores = []
~/anaconda3/lib/python3.6/site-packages/artm/wrapper/api.py in artm_api_call(*args)
163 # return result value
164 if spec.request_type is not None:
--> 165 return self._get_requested_message(length=result, func=spec.request_type)
166 if spec.result_type is not None:
167 return result
~/anaconda3/lib/python3.6/site-packages/artm/wrapper/api.py in _get_requested_message(self, length, func)
104 self._check_error(error_code)
105 message = func()
--> 106 message.ParseFromString(message_blob.raw)
107 return message
108
DecodeError: Error parsing message
versions: