Setting ignore_attribute with edit_dataset only uses last attribute #1289

amueller · 2023-11-03T01:27:49Z

So I tried to create a new version of cylinder-bands because of openml/openml-data#59

import openml
ds = openml.datasets.get_dataset("cylinder-bands", version=2)
new_did = openml.datasets.fork_dataset(data_id=ds.id)
openml.datasets.edit_dataset(
    new_did,
    ignore_attribute=["timestamp", "cylinder_number", "job_number"],
)

However, that seems to have replaced the ignore_attribute just with "job_number" as you can see here:
https://www.openml.org/api/v1/json/data/45686

Opening this here since I used the Python interface, but the Python code looks pretty easy, so maybe it's an issue in the backend?


amueller · 2023-11-03T01:44:15Z

The XML sent by the client code is

'<?xml version="1.0" encoding="utf-8"?>\n<oml:data_edit_parameters xmlns:oml="http://openml.org/openml"><oml:ignore_attribute>timestamp</oml:ignore_attribute><oml:ignore_attribute>cylinder_number</oml:ignore_attribute><oml:ignore_attribute>job_number</oml:ignore_attribute></oml:data_edit_parameters>'

Not sure if that is correct? @joaquinvanschoren ?

PGijsbers · 2023-11-03T10:24:10Z

Strange: when I query the production database for dataset 45686, only timestamp and cylinder_number are saved with ignore_attribute set; job_number is not. I am very confused about what could be happening here. Hoping that someone more familiar with the back-end can chime in.

SELECT * FROM `data_feature` WHERE `did`=45686

joaquinvanschoren · 2023-11-03T11:07:57Z

Looks like a bug in the API:
https://github.com/openml/OpenML/blob/858b9d471554bfd70b30bd16f53226c8ab916fa9/openml_OS/models/api/v1/Api_data.php#L564

This seems to overwrite the previous values in the same request. It also looks like every call replaces the columns to be ignored rather than adding to them. As a workaround, passing all values at once (as a comma-separated string) should work, though I haven't tested it.

@PGijsbers what do you think? Is it worth fixing this in API v1, or should we do a workaround now and fix it in API v2?
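To illustrate the kind of overwrite suspected above, here is a minimal Python sketch. It is not the actual PHP code in Api_data.php; `parse_last_wins` and `parse_collecting` are hypothetical names, and the payload is the XML quoted earlier in this thread (without the XML declaration). It only shows how keeping one slot per field name would leave just the last repeated `ignore_attribute` element.

```python
import xml.etree.ElementTree as ET

NS = "{http://openml.org/openml}"

# Same elements as the XML sent by the client (declaration omitted).
payload = (
    '<oml:data_edit_parameters xmlns:oml="http://openml.org/openml">'
    "<oml:ignore_attribute>timestamp</oml:ignore_attribute>"
    "<oml:ignore_attribute>cylinder_number</oml:ignore_attribute>"
    "<oml:ignore_attribute>job_number</oml:ignore_attribute>"
    "</oml:data_edit_parameters>"
)


def parse_last_wins(xml_text):
    """Hypothetical parser: one slot per field name, so repeats overwrite."""
    fields = {}
    for child in ET.fromstring(xml_text):
        fields[child.tag.replace(NS, "")] = child.text
    return fields


def parse_collecting(xml_text):
    """Hypothetical parser: repeated fields are collected into a list."""
    fields = {}
    for child in ET.fromstring(xml_text):
        fields.setdefault(child.tag.replace(NS, ""), []).append(child.text)
    return fields


last_wins = parse_last_wins(payload)
collected = parse_collecting(payload)
print(last_wins)   # {'ignore_attribute': 'job_number'}
print(collected)   # {'ignore_attribute': ['timestamp', 'cylinder_number', 'job_number']}
```

The first parser reproduces the reported symptom (only "job_number" survives); whether the server behaves this way for this reason is an assumption.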

joaquinvanschoren · 2023-11-03T11:11:28Z

@PGijsbers

Eh, could it be that the /data/edit endpoint only changes the dataset table, not the data_feature table?

Looks like it:
https://github.com/openml/OpenML/blob/858b9d471554bfd70b30bd16f53226c8ab916fa9/openml_OS/models/api/v1/Api_data.php#L623C31-L623C38

PGijsbers · 2023-11-03T12:40:43Z

The dataset table does only list "job_number" as ignore_attribute, explaining the response.

@amueller Does Joaquin's suggested workaround work?

@joaquinvanschoren If the workaround works, we could hotfix openml-python. To the best of my knowledge, openml-python is the only way this particular endpoint is exposed (it's not even listed in the documentation). My preference would be to fix this server-side if it's not too much trouble, though, since with v2 we would want the expected format to be an explicit list instead of a comma-separated string. If we apply our hotfix to openml-python now, we would need to adjust openml-python again once v2 standards are adopted.

amueller · 2023-11-06T21:29:14Z

Do you mean passing all values at once as a string? I tried that before opening the issue, but the server-side validation didn't seem to like it the way I did it. There might be another way, though?

amueller · 2023-11-15T20:53:02Z

It would be great to have a work-around for this, I'd really like to use this dataset.

PGijsbers · 2023-11-16T12:52:12Z

For me the workaround seems to work? openml.org/d/45705

import openml
ds = openml.datasets.get_dataset("cylinder-bands", version=2)
new_id = openml.datasets.fork_dataset(data_id=ds.id)
# Pass all attributes as one comma-separated string, with no spaces after the commas.
openml.datasets.edit_dataset(new_id, ignore_attribute='timestamp,cylinder_number,job_number')

After removing the cache:

>>> import openml
>>> d=openml.datasets.get_dataset(45705)
...WARNINGS...
>>> d.ignore_attribute
['timestamp', 'cylinder_number', 'job_number']
>>>

Please try again with the provided script; perhaps there were other formatting errors when you tried the workaround. If that still doesn't work, please provide the error message and the dataset id of the dataset that you tried to modify (i.e., your "fork" (new_id), not the original).

Running on a dev version of openml-python, but I don't think there have been any changes that would affect this for many releases.

amueller · 2023-11-16T17:57:44Z

I think I had spaces after the comma, that might have been the issue. Thank you! Version two is my fork IIRC :)
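If spaces after the commas were indeed the problem, a small helper could build the string from a list so the formatting cannot drift. This is only a sketch: `to_ignore_attribute_string` is a hypothetical name, not part of openml-python, and the exact rules of the server-side validation are assumed, not confirmed.

```python
def to_ignore_attribute_string(names):
    """Join attribute names with commas and no surrounding whitespace.

    Accepts either a list of names or an already comma-separated string.
    """
    if isinstance(names, str):
        names = names.split(",")
    # Strip stray whitespace and drop empty entries (e.g. from a trailing comma).
    cleaned = [name.strip() for name in names if name.strip()]
    return ",".join(cleaned)


print(to_ignore_attribute_string(["timestamp", " cylinder_number", "job_number "]))
print(to_ignore_attribute_string("timestamp, cylinder_number, job_number"))
```

Both calls print `timestamp,cylinder_number,job_number`, which is the form passed as ignore_attribute to openml.datasets.edit_dataset in the script above.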

amueller · 2023-11-16T20:54:37Z

FYI it seems that if you fork a dataset, it keeps the owner by default. I'm not sure if that's intentional?

PGijsbers · 2023-11-17T11:03:32Z

I am not sure what you mean by that. I see multiple uploaders:

amueller · 2023-11-28T22:26:41Z

Hm ok so this is the last person that edited it? Because 45705 was the one I created and it's now "uploaded" by you.

PGijsbers · 2023-11-29T09:23:27Z

I am a little confused. Are you saying that the "uploader" for a specific dataset id changed? E.g., 45705 was first marked as "uploaded by you" and now "uploaded by me"? Because I don't think that's supposed to happen.

PGijsbers mentioned this issue Nov 3, 2023

POST /data/edit openml/server-api#86

Open

PGijsbers mentioned this issue Nov 3, 2023

Avoid storing duplicate information in the database openml/server-api#87

Open

PGijsbers added the bug and serverside labels Nov 16, 2023 (serverside: "These issues are present in the REST API and not fixable by the Python package.")
