Converted PLSC to hierarchical #704

x-tabdeveloping · 2024-05-14T13:07:25Z

Checklist for adding MMTEB dataset

Reason for dataset addition:
Converted both PLSC tasks (S2S, P2P) to hierarchical clustering. #702

x-tabdeveloping · 2024-05-14T13:08:16Z

The later levels seem very hard. Maybe we should limit the levels to two?

x-tabdeveloping · 2024-05-14T13:30:58Z

I'm not sure whether the way I formulated the task makes sense.
@rafalposwiata You added the dataset initially, therefore you might know: Is the "disciplines" column hierarchically ordered or just multilabel? Or could "scientific_fields" be used as the first level and "disciplines" as the second? What's your take on this?

KennethEnevoldsen

Looks good

KennethEnevoldsen · 2024-05-14T18:54:10Z

results/intfloat__multilingual-e5-small/PLSCHierarchicalClusteringP2P.json

+        0.09242363263566819,
+        0.08387202889701235
+      ],
+      "Level 3": [


I would cut from here and down

rafalposwiata · 2024-05-14T19:32:12Z

> Is the "disciplines" column hierarchically ordered or just multilabel?

Disciplines are multilabel but for the added clustering tasks I chose only those cases where there is one discipline.

> Or could "scientific_fields" be used as the first level and "disciplines" as the second?

Yes, "scientific_fields" could be used as the first level and "disciplines" as the second.

The entire dataset is available at https://huggingface.co/datasets/rafalposwiata/plsc
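
A minimal sketch of the two-level formulation discussed above, with "scientific_fields" as the first level and "disciplines" as the second. It runs on toy records only: the list-valued columns, the `text` key, the `labels` key, and the helper name `two_level_labels` are assumptions made for illustration, not the actual dataset schema or the mteb task code.

```python
# Toy records standing in for the dataset. The real column types may differ;
# list-valued "scientific_fields" and "disciplines" are an assumption here.
records = [
    {"text": "abstract a", "scientific_fields": ["natural sciences"], "disciplines": ["physics"]},
    {"text": "abstract b", "scientific_fields": ["humanities"], "disciplines": ["history"]},
    {"text": "abstract c", "scientific_fields": ["humanities"], "disciplines": ["history", "linguistics"]},
]


def two_level_labels(rows):
    """Keep rows with exactly one field and one discipline.

    Returns dicts whose "labels" entry is a [field, discipline] path,
    i.e. level 0 is the scientific field and level 1 is the discipline.
    """
    kept = []
    for row in rows:
        fields = row["scientific_fields"]
        disciplines = row["disciplines"]
        # Multilabel rows are dropped, mirroring the single-discipline filter
        # described for the original clustering tasks.
        if len(fields) == 1 and len(disciplines) == 1:
            kept.append({"text": row["text"], "labels": [fields[0], disciplines[0]]})
    return kept


examples = two_level_labels(records)
```

With this shape, every example carries one label per level, so clustering can be scored separately at the field level and at the discipline level.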

KennethEnevoldsen · 2024-05-21T09:59:27Z

@x-tabdeveloping will you add points for this? Then I believe it is ready to merge.

x-tabdeveloping · 2024-05-21T12:41:16Z

I'm not sure though. The task formulation might be wrong. I think doing "scientific_fields" as first level and "disciplines" as the second might be the way to go.
From what I've gathered, it seems that this is just multilabel, not hierarchical the way I formulated it, right, @rafalposwiata?

KennethEnevoldsen · 2024-05-21T12:42:58Z

@x-tabdeveloping but the current approach is fine with that, right? As I understand it, it just does the clustering at each level?

x-tabdeveloping · 2024-05-21T13:11:31Z

Yes, unless the order is not fixed, and I don't know if it is (we have to check)
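
One way such a check could be sketched: treat each label list as a path (position 0 as the first level, position 1 as the second, and so on) and look for labels that show up at more than one depth or under more than one parent. The helper name `parent_conflicts` and the toy label lists are hypothetical; this is not the check that was actually run on the dataset.

```python
from collections import defaultdict


def parent_conflicts(label_paths):
    """Return labels seen at more than one (depth, parent) position.

    An empty result means the label lists are consistent with a fixed,
    tree-like ordering; any entry means the order cannot be read as levels.
    """
    positions = defaultdict(set)
    for path in label_paths:
        for depth, label in enumerate(path):
            parent = path[depth - 1] if depth > 0 else None
            positions[label].add((depth, parent))
    return {label: seen for label, seen in positions.items() if len(seen) > 1}


# Hypothetical label lists, for illustration only.
ordered = [
    ["humanities", "history"],
    ["humanities", "linguistics"],
    ["natural sciences", "physics"],
]
unordered = [
    ["history", "linguistics"],
    ["linguistics", "history"],
]

ordered_conflicts = parent_conflicts(ordered)
unordered_conflicts = parent_conflicts(unordered)
```

If the conflict set is non-empty for the real "disciplines" column, the positions in the list do not encode hierarchy levels and the column is better read as plain multilabel.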

KennethEnevoldsen · 2024-05-21T14:06:21Z

Right. Once checked we can either close or merge

x-tabdeveloping · 2024-05-21T14:50:36Z

Nope, it's not hierarchical at all. We can maybe rephrase it as multilabel classification if we really want to, otherwise fine to leave it as flat clustering.

KennethEnevoldsen · 2024-05-27T14:54:58Z

Let us leave it as flat clustering

x-tabdeveloping added 2 commits May 14, 2024 14:52

Added PLSC hierarchical clustering

dae8858

Added results for PLSC hierarchical clustering

07367c9

KennethEnevoldsen approved these changes May 14, 2024


imenelydiaker assigned KennethEnevoldsen May 15, 2024

x-tabdeveloping closed this May 30, 2024
