Pipelines with dedicated score names don't work #388

hechth · 2023-02-23T14:47:07Z

Describe the bug
When executing the following code, the score names are apparently not handled or compared correctly, causing an error.

To Reproduce

from matchms.similarity import MetadataMatch, FingerprintSimilarity
from matchms.importing import load_from_msp, scores_from_json
from matchms import Scores, set_matchms_logger_level
from matchms.filtering import add_fingerprint
set_matchms_logger_level("WARNING")

# use case with multiple steps in the pipeline causing a bug
var_references = list(load_from_msp("file.msp"))
var_queries = list(load_from_msp("file2.msp"))

# fingerprint similarity
var_queries = list(map(add_fingerprint, var_queries))
var_references = list(map(add_fingerprint, var_references))
scores = Scores(references=var_references, queries=var_queries, is_symmetric=False)

similarity = FingerprintSimilarity(similarity_measure="jaccard")
new_scores = scores.calculate(similarity, name="test", array_type="numpy")

var_field = "retention_time"
var_tolerance = 0.05
similarity = MetadataMatch(field = var_field, matching_type="difference", tolerance=var_tolerance)
name = "MetadataMatch_" + var_field + str(var_tolerance)

new_scores_v2 = new_scores.calculate(similarity, name=name, array_type="sparse", join_type="inner")

Expected behavior
The correct score names should be used in the join operation.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[10], line 18
     15 similarity = MetadataMatch(field = var_field, matching_type="difference", tolerance=var_tolerance)
     16 name = "MetadataMatch_" + var_field + str(var_tolerance)
---> 18 new_scores_v2 = new_scores.calculate(similarity, name=name, array_type="sparse", join_type="inner")

File c:\Users\473355\Miniconda3\envs\matchms-pipeline\lib\site-packages\matchms\Scores.py:183, in Scores.calculate(self, similarity_function, name, array_type, join_type)
    181 elif len(new_scores.score_names) == 1:
    182     new_scores.data.dtype.names = [name]
--> 183     self._scores.add_sparse_data(new_scores.row,
    184                                  new_scores.col,
    185                                  new_scores.data, "", join_type=join_type)
    186 else:
    187     self._scores.add_sparse_data(new_scores.row,
    188                                  new_scores.col,
    189                                  new_scores.data, name, join_type=join_type)

File c:\Users\473355\Miniconda3\envs\matchms-pipeline\lib\site-packages\sparsestack\StackedSparseArray.py:332, in StackedSparseArray.add_sparse_data(self, row, col, data, name, join_type)
    330     assert np.max(row) <= self.shape[0], "row values have dimension larger than sparse stack"
    331     assert np.max(col) <= self.shape[1], "column values have dimension larger than sparse stack"
--> 332 self.row, self.col, self.data = join_arrays(self.row, self.col, self.data,
    333                                             row, col,
    334                                             data,
    335                                             name,
...
     98     for dname in data2.dtype.names:
---> 99         data_join[f"{name}_{dname}"][idx_right_new] = data2[dname][idx_right]
    100 return data_join

ValueError: no field of name _MetadataMatch_retention_time0.05
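
The traceback suggests a naming mismatch: `Scores.calculate` renames the single field of the new scores to `name` and then passes an empty string as the name to `add_sparse_data`, while the join builds its lookup key as `f"{name}_{dname}"`. The following is a minimal NumPy-only sketch of that mismatch, not the actual matchms or sparsestack code. The layout of `data_join` and the helper `lookup_key` are assumptions made purely for illustration.

```python
import numpy as np

score_name = "MetadataMatch_retention_time0.05"

# New scores as a structured array whose single field carries the score name,
# as after `new_scores.data.dtype.names = [name]` in the traceback.
data2 = np.array([(1.0,), (0.0,)], dtype=[(score_name, "f8")])

# ASSUMPTION: hypothetical stand-in for the joined array, with the new field
# stored under the plain score name (no leading underscore).
data_join = np.zeros(2, dtype=[("test", "f8"), (score_name, "f8")])


def lookup_key(name, dname):
    """Hypothetical helper mirroring the f-string shown in the traceback."""
    return f"{name}_{dname}"


# An empty `name` yields a key with a leading underscore.
key = lookup_key("", score_name)

try:
    data_join[key]  # field lookup with the prefixed key
    error_message = None
except ValueError as err:
    error_message = str(err)

print(key)
print(error_message)
```

Under these assumptions, the lookup fails because the key starts with an underscore that no field name in the joined array contains.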


hechth · 2023-04-03T09:44:26Z

To reproduce this error, you can use the galaxy wrappers in this PR: RECETOX/galaxytools#321

hechth added the "bug" label Feb 23, 2023

hechth mentioned this issue Feb 23, 2023

New matchms galaxy tool design RECETOX/galaxytools#322

Closed

hechth self-assigned this Feb 27, 2023

zargham-ahmad linked a pull request Apr 21, 2023 that will close this issue

Fix pipelines with dedicated score names #406

Draft
