Pipelines with dedicated score names don't work #388

Open
hechth opened this issue Feb 23, 2023 · 1 comment · May be fixed by #406
@hechth
Collaborator

hechth commented Feb 23, 2023

Describe the bug
Executing the following code raises a ValueError. The score names are apparently not handled or compared correctly when the second set of scores is joined onto the first.

To Reproduce

from matchms.similarity import MetadataMatch, FingerprintSimilarity
from matchms.importing import load_from_msp, scores_from_json
from matchms import Scores, set_matchms_logger_level
from matchms.filtering import add_fingerprint
set_matchms_logger_level("WARNING")

# use case with multiple steps in the pipeline causing a bug
var_references = list(load_from_msp("file.msp"))
var_queries = list(load_from_msp("file2.msp"))

# fingerprint similarity
var_queries = list(map(add_fingerprint, var_queries))
var_references = list(map(add_fingerprint, var_references))
scores = Scores(references=var_references, queries=var_queries, is_symmetric=False)

similarity = FingerprintSimilarity(similarity_measure="jaccard")
new_scores = scores.calculate(similarity, name="test", array_type="numpy")

var_field = "retention_time"
var_tolerance = 0.05
similarity = MetadataMatch(field = var_field, matching_type="difference", tolerance=var_tolerance)
name = "MetadataMatch_" + var_field + str(var_tolerance)

new_scores_v2 = new_scores.calculate(similarity, name=name, array_type="sparse", join_type="inner")

Expected behavior
The correct score names should be used in the join operation.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[10], line 18
     15 similarity = MetadataMatch(field = var_field, matching_type="difference", tolerance=var_tolerance)
     16 name = "MetadataMatch_" + var_field + str(var_tolerance)
---> 18 new_scores_v2 = new_scores.calculate(similarity, name=name, array_type="sparse", join_type="inner")

File c:\Users\473355\Miniconda3\envs\matchms-pipeline\lib\site-packages\matchms\Scores.py:183, in Scores.calculate(self, similarity_function, name, array_type, join_type)
    181 elif len(new_scores.score_names) == 1:
    182     new_scores.data.dtype.names = [name]
--> 183     self._scores.add_sparse_data(new_scores.row,
    184                                  new_scores.col,
    185                                  new_scores.data, "", join_type=join_type)
    186 else:
    187     self._scores.add_sparse_data(new_scores.row,
    188                                  new_scores.col,
    189                                  new_scores.data, name, join_type=join_type)

File c:\Users\473355\Miniconda3\envs\matchms-pipeline\lib\site-packages\sparsestack\StackedSparseArray.py:332, in StackedSparseArray.add_sparse_data(self, row, col, data, name, join_type)
    330     assert np.max(row) <= self.shape[0], "row values have dimension larger than sparse stack"
    331     assert np.max(col) <= self.shape[1], "column values have dimension larger than sparse stack"
--> 332 self.row, self.col, self.data = join_arrays(self.row, self.col, self.data,
    333                                             row, col,
    334                                             data,
    335                                             name,
...
     98     for dname in data2.dtype.names:
---> 99         data_join[f"{name}_{dname}"][idx_right_new] = data2[dname][idx_right]
    100 return data_join

ValueError: no field of name _MetadataMatch_retention_time0.05
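
Reading the traceback, the failing lookup seems to come from the f"{name}_{dname}" pattern being evaluated with an empty name, which puts a leading underscore in front of the score name. The snippet below is a minimal sketch of that mechanism using only NumPy structured arrays. The layout of data_join is a hypothetical stand-in, not taken from the sparsestack source, so it illustrates the symptom rather than confirming the root cause.

```python
import numpy as np

score_name = "MetadataMatch_retention_time0.05"

# Incoming scores: a structured array whose single field carries the score name,
# mirroring `new_scores.data.dtype.names = [name]` in the traceback.
data2 = np.array([(1.0,), (0.0,)], dtype=[(score_name, "f8")])

# Hypothetical stand-in for the joined array: it holds the field under its
# plain name (assumption, not checked against sparsestack).
data_join = np.zeros(2, dtype=[("test", "f8"), (score_name, "f8")])

name = ""  # empty name, as passed to add_sparse_data in the traceback
message = None
for dname in data2.dtype.names:
    target = f"{name}_{dname}"  # empty prefix yields a leading underscore
    try:
        data_join[target][:] = data2[dname]
    except ValueError as err:
        message = str(err)

print(target)
print(message)
```

If this reading is right, the field name that is looked up and the field name that exists differ only by the leading underscore, which matches the error message above.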
@hechth hechth added the bug Something isn't working label Feb 23, 2023
@hechth hechth self-assigned this Feb 27, 2023
@hechth
Collaborator Author

hechth commented Apr 3, 2023

To reproduce this error, you can use the galaxy wrappers in this PR: RECETOX/galaxytools#321

@zargham-ahmad zargham-ahmad linked a pull request Apr 21, 2023 that will close this issue