MultiClassifierDLApproach not transforming every row of my dataset #14218

AntoineF3006 · 2024-03-27T13:19:22Z

Is there an existing issue for this?

I have searched the existing issues and did not find a match.

Who can help?

No response

What are you working on?

I am working on a multi-output classification task: classifying customer comments into several categories. I am using MultiClassifierDLApproach for this task, trained on already labeled data.
I followed this tutorial: https://www.johnsnowlabs.com/mastering-text-classification-with-spark-nlp.

Current Behavior

After fitting my pipeline (described below) on my train set, I transform both my train and test sets with it. The results are pretty good, but for some rows the category column is empty and no probabilities are returned for any category.

Expected Behavior

I expected every row to get probabilities for every category: perhaps no selected categories, since I set a threshold of 0.5, but at least the score for each category.
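To make this expectation concrete, here is a minimal plain-Python sketch of how I understand thresholding in a multi-label setup. It is not Spark NLP's actual implementation: the helper `select_labels`, the label names, and the scores are all made up for illustration.

```python
from typing import Dict, List, Tuple


def select_labels(
    scores: Dict[str, float], threshold: float = 0.5
) -> Tuple[List[str], Dict[str, float]]:
    """Return (selected labels, all scores).

    Hypothetical helper for illustration only, not a Spark NLP API.
    """
    # Keep only the labels whose score reaches the threshold.
    selected = [label for label, p in scores.items() if p >= threshold]
    # The full score dictionary is returned unchanged alongside the selection.
    return selected, scores


# Made-up scores where no category reaches the 0.5 threshold.
selected, all_scores = select_labels(
    {"delivery": 0.31, "pricing": 0.12, "support": 0.44}
)
print(selected)    # no label selected
print(all_scores)  # scores are still present for every category
```

In other words, I would expect an empty selection to still come with one score per category, which is not what I observe on the affected rows.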

Steps To Reproduce

https://drive.google.com/file/d/1tmJYwZKBVZoHtLcuyWtWhsu6nbonKG-S/view?usp=sharing

In this zip you will find a .ipynb notebook that recreates the steps I used to build my pipeline, some sample data with their results, and the already fitted pipeline.
The input column is texte_sw, the label is niveau_2_MC, the output is category.
The issue seems to occur uniformly across my data: the time and date, the text length, and the number of words do not seem to be the cause.
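For reference, the kind of check behind this statement can be sketched in plain Python as below. This is only an illustration under assumptions: it supposes the transformed rows were collected into a list of dicts where `texte_sw` is a string and `category` is a plain list of predicted labels, and the helper `word_count_by_outcome` and the sample rows are hypothetical, not taken from my data.

```python
from statistics import mean
from typing import Dict, List, Optional


def word_count_by_outcome(rows: List[dict]) -> Dict[str, Optional[float]]:
    """Average word count of texte_sw, split by whether category is empty.

    Hypothetical helper; assumes each row looks like
    {"texte_sw": "<text>", "category": ["label", ...]}.
    """
    empty = [len(r["texte_sw"].split()) for r in rows if not r["category"]]
    filled = [len(r["texte_sw"].split()) for r in rows if r["category"]]
    return {
        # None signals that a group has no rows, so no mean can be computed.
        "empty": mean(empty) if empty else None,
        "non_empty": mean(filled) if filled else None,
    }


# Made-up rows, only to show the expected input shape.
sample_rows = [
    {"texte_sw": "colis jamais livré", "category": []},
    {"texte_sw": "prix trop élevé pour ce service", "category": ["pricing"]},
    {"texte_sw": "merci", "category": []},
]
stats = word_count_by_outcome(sample_rows)
print(stats)
```

Similar averages for the two groups would suggest that text length is not what separates the empty rows from the others.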

Spark NLP version and Apache Spark

sparknlp.version() : 5.2.3
spark.version : 3.2.0.3.2.7170.1008-2

Type of Spark Application

Python Application

Java Version

No response

Java Home Directory

No response

Setup and installation

No response

Operating System and Version

No response

Link to your project (if available)

No response

Additional Information

No response

AntoineF3006 · 2024-04-03T09:43:57Z

Hello @maziyarpanahi, is my issue complete enough, or do I need to add more context or data in order to discuss it?
Kind regards,

AntoineF3006 added the question label Mar 27, 2024

AntoineF3006 assigned maziyarpanahi Mar 27, 2024
