Error running Pipeline with BasicReferenceRecognizer #60

xesaad · 2021-08-11T15:23:11Z

Hi there! I am a new and frequent user of this great package, which also comes with a few inevitable GitHub issues 😅

When I initialize the pipeline as follows:

import aspect_based_sentiment_analysis as absa
from transformers import BertTokenizer

name = "absa/classifier-rest-0.2"
model = absa.BertABSClassifier.from_pretrained(name)
tokenizer = BertTokenizer.from_pretrained(name)
reference_recognizer = absa.aux_models.BasicReferenceRecognizer()
professor = absa.Professor(reference_recognizer)
nlp = absa.Pipeline(model=model, tokenizer=tokenizer, professor=professor)

I receive the following error:

TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_514/72277120.py in <module>
      2 model = absa.BertABSClassifier.from_pretrained(name)
      3 tokenizer = BertTokenizer.from_pretrained(name)
----> 4 reference_recognizer = absa.aux_models.BasicReferenceRecognizer()
      5 professor = absa.Professor(reference_recognizer)
      6 nlp = absa.Pipeline(model=model, tokenizer=tokenizer, professor=professor)

TypeError: __init__() missing 1 required positional argument: 'weights'

I realise this is because the BasicReferenceRecognizer needs to be trained in order to select weights. This leads me to two questions/issues:

1. The BasicReferenceRecognizer class has no train method. Is there another way to train it, or any way to load a pretrained model from the package? In the unit tests for the BasicReferenceRecognizer I found two pre-trained models, 'absa/basic_reference_recognizer-rest-0.1' and 'absa/basic_reference_recognizer-lapt-0.1', but when I tried to initialize with these I received an ImportError.
2. I also tried initializing the BasicReferenceRecognizer directly with weights=(-0.025, 44), as is done in this line. However, when making predictions I get an error in the Pipeline at the postprocess step:

TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_514/3162923628.py in <module>
      3 for row in df.itertuples():
      4     print(row)
----> 5     prediction = predict(row.Review, row.Aspect)
      6     sentiment = get_sentiment(prediction)
      7     certainty_score = get_certainty_score(prediction)

/tmp/ipykernel_514/1002360698.py in predict(text, aspect)
     16         output_batch = nlp.predict(input_batch)
     17         predictions = nlp.review(tokenized_examples, output_batch)
---> 18         completed_task = nlp.postprocess(task, predictions)
     19         completed_subtask = completed_task.subtasks[aspect]
     20         return completed_subtask

/pyenv/versions/3.8.5/envs/seo-advice-page/lib/python3.8/site-packages/aspect_based_sentiment_analysis/pipelines.py in postprocess(task, batch_examples)
    301             aspect, = {e.aspect for e in examples}
    302             scores = np.max([e.scores for e in examples], axis=0)
--> 303             scores /= np.linalg.norm(scores, ord=1)
    304             sentiment_id = np.argmax(scores).astype(int)
    305             aspect_document = CompletedSubTask(

TypeError: ufunc 'true_divide' output (typecode 'd') could not be coerced to provided output parameter (typecode 'l') according to the casting rule ''same_kind''

I believe this error comes from a type mismatch between int and float. If I instead initialize with weights=(1, 1), for example, I receive no error.

I wanted to flag these issues for your awareness. Thank you very much for any advice you can provide 😄


xesaad · 2021-08-31T13:58:22Z

Update: I believe this issue arises as follows: if the BasicReferenceRecognizer does not detect an aspect, the professor component sets scores = [0,0,0], which is a list of integers. When scores is then normalised in place by dividing by its norm, the error is raised because the float result cannot be written back into an integer array. (There may also be a division by zero lurking here, since the L1 norm of [0, 0, 0] is 0.)
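The failure described above can be reproduced outside the package with NumPy alone. This is only a minimal sketch: it assumes that every example for an aspect carries the integer fallback [0, 0, 0], and `normalize_in_place` is a hypothetical stand-in for the line shown in the traceback, not the package's own function.

```python
import numpy as np

# Assumption: all examples carry the integer fallback scores [0, 0, 0].
scores = np.max([[0, 0, 0]], axis=0)  # integer dtype, because the inputs are ints


def normalize_in_place(values):
    # Hypothetical stand-in for the in-place division from the traceback.
    values /= np.linalg.norm(values, ord=1)
    return values


try:
    normalize_in_place(scores)
    error_message = None
except TypeError as exc:
    # NumPy refuses to store the float result in an integer array.
    error_message = str(exc)

print(scores.dtype, error_message)
```

The casting check fails before any division is performed, so the zero norm never gets a chance to matter in this code path.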

Suggestion:

In this line, redefine scores = [0.0, 0.0, 0.0].
For extra caution, in this line define scores = np.max([e.scores for e in examples], axis=0).astype(float).
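A rough sketch of how the second suggestion could look in isolation, with an added guard for the all-zero case. `normalize_scores` is a hypothetical helper written for illustration, not a function from the package, and returning the zeros unchanged is only one possible choice.

```python
import numpy as np


def normalize_scores(example_scores):
    """Hypothetical helper: element-wise max across examples, then L1-normalise."""
    # Casting to float means the division below never has to write into an int array.
    scores = np.max(example_scores, axis=0).astype(float)
    norm = np.linalg.norm(scores, ord=1)
    if norm == 0:
        # Choice made for this sketch only: keep the zeros instead of producing NaN.
        return scores
    return scores / norm


print(normalize_scores([[0, 0, 0]]))
print(normalize_scores([[0.2, 0.5, 0.1], [0.1, 0.1, 0.6]]))
```

How the pipeline should interpret an all-zero score vector afterwards (for example, which sentiment argmax would pick) is a separate question for the maintainers.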

I tried to open a PR implementing these suggestions myself, but unfortunately I don't have permission to push to this repository. I hope these suggestions help with resolving this issue!

abaveja313 · 2023-01-01T02:46:50Z

Thank you for the suggestion! Fixed my problem @xesaad

xesaad mentioned this issue Oct 22, 2021

Asaad fix reference recognizer #64

Open

xesaad mentioned this issue Feb 9, 2022

Type mismatch error during operation on scores array #65

Open
