
combining 'sentiment' and 'emotion' models causes crash #106

Open
jonnyzwi opened this issue Mar 26, 2022 · 1 comment
jonnyzwi commented Mar 26, 2022

I'm working in a Google Colab notebook and set it up via:

!wget http://setup.johnsnowlabs.com/nlu/colab.sh -O - | bash

import nlu

A quick version check with nlu.version() confirms 3.4.2.

Several of the official tutorial notebooks (e.g. XLNet) create a multi-model pipeline that includes both 'sentiment' and 'emotion'.

Direct copy of content from the notebook:

import pandas as pd

# Download the dataset
!wget -N https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/resources/en/sarcasm/train-balanced-sarcasm.csv -P /tmp

# Load dataset to Pandas
df = pd.read_csv('/tmp/train-balanced-sarcasm.csv')

pipe = nlu.load('sentiment pos xlnet emotion')
df['text'] = df['comment']
max_rows = 200

predictions = pipe.predict(df.iloc[0:100][['comment','label']], output_level='token')
predictions

However, running a prediction on this pipe results in the following error:


sentimentdl_glove_imdb download started this may take some time.
Approximate size to download 8.7 MB
[OK!]
pos_anc download started this may take some time.
Approximate size to download 3.9 MB
[OK!]
xlnet_base_cased download started this may take some time.
Approximate size to download 417.5 MB
[OK!]
classifierdl_use_emotion download started this may take some time.
Approximate size to download 21.3 MB
[OK!]
glove_100d download started this may take some time.
Approximate size to download 145.3 MB
[OK!]
tfhub_use download started this may take some time.
Approximate size to download 923.7 MB
[OK!]
sentence_detector_dl download started this may take some time.
Approximate size to download 354.6 KB
[OK!]
---------------------------------------------------------------------------
IllegalArgumentException                  Traceback (most recent call last)
<ipython-input-1-9b2e4a06bf65> in <module>()
     34 
     35 # NLU to gives us one row per embedded word by specifying the output level
---> 36 predictions = pipe.predict( df.iloc[0:5][['text','label']], output_level='token' )
     37 
     38 display(predictions)

9 frames
/usr/local/lib/python3.7/dist-packages/pyspark/sql/utils.py in raise_from(e)

IllegalArgumentException: requirement failed: Wrong or missing inputCols annotators in SentimentDLModel_6c1a68f3f2c7.

Current inputCols: sentence_embeddings@glove_100d. Dataset's columns:
(column_name=text,is_nlp_annotator=false)
(column_name=document,is_nlp_annotator=true,type=document)
(column_name=sentence,is_nlp_annotator=true,type=document)
(column_name=sentence_embeddings@tfhub_use,is_nlp_annotator=true,type=sentence_embeddings).
Make sure such annotators exist in your pipeline, with the right output names and that they have following annotator types: sentence_embeddings

After experimenting with various combinations of models, I've found that the problem occurs whenever the 'sentiment' and 'emotion' models are specified in the same pipeline, regardless of pipeline order or which other models are listed. (Judging from the traceback, the SentimentDLModel expects sentence_embeddings@glove_100d as input, but the generated pipeline only produces sentence_embeddings@tfhub_use.)

Running pipe = nlu.load('emotion ANY OTHER MODELS') or pipe = nlu.load('sentiment ANY OTHER MODELS') succeeds, so the failure really does appear to come only from combining 'sentiment' and 'emotion'.
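
For the record, a minimal way to reproduce what I'm seeing (my own boiled-down sketch; the toy dataframe and model combinations below are just illustrative, not taken from the tutorial):

import nlu
import pandas as pd

# A tiny toy dataframe just to trigger a prediction; any text column works.
df = pd.DataFrame({'text': ['I love this', 'This is terrible']})

# Each of these runs fine on its own:
nlu.load('sentiment pos xlnet').predict(df)   # OK
nlu.load('emotion pos xlnet').predict(df)     # OK

# But putting both classifiers in one pipeline raises the
# IllegalArgumentException shown above:
nlu.load('sentiment emotion').predict(df)     # crashes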

Is this a known bug? Does anyone have any suggestions for fixing it?

My temporary solution has been to run emoPipe = nlu.load('emotion').predict() in isolation, then inner join the resulting dataframe to the output of pipe = nlu.load('sentiment pos xlnet').predict(), roughly as sketched below.
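
Roughly (a sketch of my workaround, assuming both prediction dataframes keep the original row index so a plain pandas join lines them up; I haven't cleaned this up):

import nlu
import pandas as pd

df = pd.read_csv('/tmp/train-balanced-sarcasm.csv')
df['text'] = df['comment']
sample = df.iloc[0:100][['text', 'label']]

# Run the emotion model on its own...
emo_preds = nlu.load('emotion').predict(sample)

# ...and the remaining models as a separate pipeline.
main_preds = nlu.load('sentiment pos xlnet').predict(sample)

# Inner join on the shared index, keeping only the new emotion columns.
emotion_cols = [c for c in emo_preds.columns if c not in main_preds.columns]
combined = main_preds.join(emo_preds[emotion_cols], how='inner')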

However, I would like to better understand what is failing, and to know whether there is a way to include all of the models in a single pipeline.

Thanks

@C-K-Loan
Member

Thank you @jonnyzwi for this issue,

this is indeed a bug in the way NLU generates NLP pipelines, and we are looking into fixing it.

C-K-Loan self-assigned this Apr 10, 2022