Type mismatch error during operation on scores array #65

Open
brieucdandin opened this issue Jan 3, 2022 · 1 comment

@brieucdandin

TL;DR

When defining a pipeline's components by hand and testing it on a handful of sentences, the test results are inconsistent. Some tests go smoothly, while others raise the following type error:
TypeError: ufunc 'true_divide' output (typecode 'd') could not be coerced to provided output parameter (typecode 'l') according to the casting rule ''same_kind''
Some tests even alternate between passing and failing across runs.

It appears to be due to a type mismatch in the operation that produces the scores array: one operand is populated with ints and the other with floats.

Am I missing something? Is it just due to a package version issue?
The full stack of the error is at the bottom of the post.
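
For what it's worth, the error can be reproduced outside the package with a minimal NumPy sketch (this is my reading of the traceback at the bottom, not the package's actual code path):

import numpy as np

# In-place division of an integer array by a float norm fails: the float64
# ('d') result cannot be written back into the int64 ('l') array under
# NumPy's 'same_kind' casting rule.
scores = np.array([1, 0, 2])             # dtype int64
scores /= np.linalg.norm(scores, ord=1)  # raises the same TypeError

# Casting to float first (or dividing out of place) avoids the error.
scores = np.array([1, 0, 2], dtype=float)
scores /= np.linalg.norm(scores, ord=1)  # works: 0.33..., 0.0, 0.66...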

Environment

To make the package work, I first had to go through a few ad-hoc tweaks that might explain the issue:

pip install aspect-based-sentiment-analysis
pip install --upgrade protobuf
python3 -m spacy download en
from transformers import BertTokenizer

Note that Python 3.7 is used:

print(sys.version)

Output: 3.7.10 (default, Jun  4 2021, 14:48:32) 
[GCC 7.5.0]
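
Since a version mismatch is one of my suspicions, here is the kind of snippet I would use to pin down the numerical stack (numpy, spacy and transformers all expose __version__; I am not sure the absa package does, so it is left out):

import numpy, spacy, transformers

print("numpy       ", numpy.__version__)
print("spacy       ", spacy.__version__)
print("transformers", transformers.__version__)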

Tests

I initially conducted the following tests:

tests_list = [{'text' : "We are great fans of Slack, but we wish the subscriptions were more accessible to small startups.",
               'aspects' : ['slack', 'price']
              },
              {'text' : "I love it overall, but the screen itself is terrible.",
               'aspects' : ['overall', 'screen']
              },
              {'text' : "I love it.",
               'aspects' : ['it']
              },
              {'text' : "I love it!",
               'aspects' : ['it']
              },
              {'text' : "I love it. :)",
               'aspects' : ['it']
              },
              {'text' : "I hate it.",
               'aspects' : ['it']
              },
              {'text' : "I love it: the sound is warm, the ambiance nice and the staff is great! The image is just OK, but who cares?",
               'aspects' : ['it', 'sound', 'ambiance', 'staff', 'image']
              },
             ]

Functional default pipeline

The preset pipeline works just fine.

nlp_load = absa.load()
for text_to_test in tests_list:
  text = text_to_test['text']
  aspects = text_to_test['aspects']
  features_sentiments_list = nlp_load(text, aspects=aspects)
  print(text)
  for feature_index in range(len(features_sentiments_list.subtasks)):
    aspect = aspects[feature_index]
    if aspect != '_':
      print(aspect, ':\t', features_sentiments_list.subtasks[aspect].sentiment)
  print()

returns:

We are great fans of Slack, but we wish the subscriptions were more accessible to small startups.
slack :	 Sentiment.positive
price :	 Sentiment.negative

I love it overall, but the screen itself is terrible.
overall :	 Sentiment.positive
screen :	 Sentiment.negative

I love it.
it :	 Sentiment.positive

I love it!
it :	 Sentiment.positive

I love it. :)
it :	 Sentiment.positive

I hate it.
it :	 Sentiment.negative

I love it: the sound is warm, the ambiance nice and the staff is great! The image is just OK, but who cares?
it :	 Sentiment.positive
sound :	 Sentiment.positive
ambiance :	 Sentiment.positive
staff :	 Sentiment.positive
image :	 Sentiment.neutral

Inconsistently dysfunctional pipeline defined by hand

Yet, with a pipeline whose components are defined by hand (the one from the README), the type mismatch error occurs on some tests and not others.

Pipeline definition:

model_name = 'absa/classifier-rest-0.2'
model = absa.BertABSClassifier.from_pretrained(model_name)
tokenizer = BertTokenizer.from_pretrained(model_name)

prof_name = 'absa/basic_reference_recognizer-rest-0.1'
recognizer = absa.aux_models.BasicReferenceRecognizer.from_pretrained(prof_name)
professor = absa.Professor(reference_recognizer=recognizer)

text_splitter = absa.sentencizer() # The English CNN model from SpaCy.
nlp_pipeline = absa.Pipeline(model, tokenizer, professor, text_splitter)

Functional test

text = "I like it overall, but the text is long. This is just a bunch of letters and words to fill in the gap and make sure the error is not due to the length of the text."
aspects = ['overall', 'text']
print(text, aspects, '\n')
nlp_pipeline(text, aspects=aspects)

returns:

I like it overall, but the text is long. This is just a bunch of letters and words to fill in the gap and make sure the error is not due to the length of the text. ['overall', 'text'] 

Out[17]: CompletedTask( ... )

Most of the tests in tests_list work like a charm too (most of the time):

for text_to_test in tests_list[1:]: # TODO: Why does tests_list[0] return an error?
  text = text_to_test['text']
  aspects = text_to_test['aspects']
  print(text, aspects)
  features_sentiments_list = nlp_pipeline(text, aspects=aspects)
  for feature_index in range(len(features_sentiments_list.subtasks)):
    aspect = aspects[feature_index]
    if aspect != '_':
      print(aspect, ':\t', features_sentiments_list.subtasks[aspect].sentiment)
  print()

returns:

I love it overall, but the screen itself is terrible. ['overall', 'screen']
overall :	 Sentiment.positive
screen :	 Sentiment.negative

I love it. ['it']
it :	 Sentiment.positive

I love it! ['it']
it :	 Sentiment.positive

I love it. :) ['it']
it :	 Sentiment.positive

I hate it. ['it']
it :	 Sentiment.negative

I love it: the sound is warm, the ambiance nice and the staff is great! The image is just OK, but who cares? ['it', 'sound', 'ambiance', 'staff', 'image']
it :	 Sentiment.positive
sound :	 Sentiment.positive
ambiance :	 Sentiment.positive
staff :	 Sentiment.positive
image :	 Sentiment.neutral

Dysfunctional tests

text = "We are great fans of Slack, but we wish the subscriptions were more accessible to small startups."
aspects = ['slack', 'price']
print(text, aspects, '\n')
nlp_pipeline(text, aspects=aspects)

returns the aforementioned TypeError (see below for the full stack).
The same goes with text set to tests_list[0]['text'], tests_list[1]['text'], 'yo' and "yo".

In short, some tests (like tests_list[1]) alternately fail and pass on successive runs, with no apparent pattern.

Full error stack

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<command-1097563962837147> in <module>
      2 aspects = tests_list[0]['aspects']
      3 print(text, aspects, '\n')
----> 4 nlp_pipeline(text, aspects=aspects)

/local_disk0/.ephemeral_nfs/envs/pythonEnv-7b4c08c6-365d-4481-b4a5-c7c62c416c5b/lib/python3.7/site-packages/aspect_based_sentiment_analysis/pipelines.py in __call__(self, text, aspects)
    206         task = self.preprocess(text, aspects)
    207         predictions = self.transform(task.examples)
--> 208         completed_task = self.postprocess(task, predictions)
    209         return completed_task
    210 

/local_disk0/.ephemeral_nfs/envs/pythonEnv-7b4c08c6-365d-4481-b4a5-c7c62c416c5b/lib/python3.7/site-packages/aspect_based_sentiment_analysis/pipelines.py in postprocess(task, batch_examples)
    301             aspect, = {e.aspect for e in examples}
    302             scores = np.max([e.scores for e in examples], axis=0)
--> 303             scores /= np.linalg.norm(scores, ord=1)
    304             sentiment_id = np.argmax(scores).astype(int)
    305             aspect_document = CompletedSubTask(

TypeError: ufunc 'true_divide' output (typecode 'd') could not be coerced to provided output parameter (typecode 'l') according to the casting rule ''same_kind''
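
For reference, a possible workaround at the failing line (pipelines.py, line 303 in the traceback above) would be to cast the scores to float before normalizing, or to divide out of place. The snippet below only illustrates the idea on made-up scores; the real values come from the classifier:

import numpy as np

# Hypothetical stand-in for [e.scores for e in examples] in postprocess().
example_scores = [[1, 0, 2], [0, 1, 1]]

scores = np.max(example_scores, axis=0).astype(float)  # explicit float cast
scores /= np.linalg.norm(scores, ord=1)                # no longer raises
print(scores)  # values 0.25, 0.25, 0.5
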
@xesaad

xesaad commented Feb 9, 2022

Hi @brieucdandin! Unfortunately not an answer, but just wanted to share that I have also encountered the same issue and opened a PR in an attempt to fix it. It seems that the package author has been unresponsive for some time now though :(
