Backwards compatibility #119

Open
josepablocam opened this issue Sep 11, 2020 · 5 comments

Comments

@josepablocam

Thank you all for the great package! It is really lovely to be able to include pmlb in a project and fetch datasets only as needed (while avoiding having to personally host them and provide access to others directly). Recently I ran into a situation where older versions of pmlb now return 404 errors, and after upgrading pmlb some datasets no longer exist or have been renamed. In general this is not a huge deal for ad-hoc projects, but pmlb also provides this great functionality for some longer-term artifacts.

With that in mind, is there any interest in receiving a patch for (some amount of) backwards compatibility? It would be great if datasets that were removed because they are duplicates redirected to their canonical names, and similarly if failed requests made a best-effort attempt to suggest an alternative request that succeeds.

I'd be happy to take a look at this, but wanted to gauge interest first (in the meantime I've found myself having to include old datasets directly in repos).

@lacava
Collaborator

lacava commented Sep 16, 2020

Hi @josepablocam, we'd be happy to receive such a patch and would really appreciate it! I think it's a bit tricky to handle, so I'm not sure how to do it, but let us know what you're thinking. Maybe @weixuanfu or @trang1618 or @JDRomano2 can weigh in as well.

@josepablocam
Author

@lacava great, let me take a look and plan something out and I'll share here and see what everyone thinks. Thanks again for the library, really appreciate all the work you and the other maintainers/developers have put into this!

@trangdata
Collaborator

Thank you for the suggestion and for offering to help @josepablocam! I'm wondering if a mapping from old dataset names to current dataset names would help?

I do want to emphasize that using the current benchmark collection is the recommended approach, since it avoids errors present in past data.

@JDRomano2
Collaborator

The easiest way to do this is, as @trang1618 said, to create a file that lives someplace in the source tree, mapping obsolete database names to their current names, and have fetch_data() check against the contents of this file every time you run it.
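
For illustration, here is a minimal sketch of what such a lookup could look like. Everything in it is an assumption rather than existing pmlb code: `DATASET_ALIASES` and `resolve_dataset_name` are hypothetical names, the table entries are placeholders rather than real renames, and in practice the table would presumably be loaded from the mapping file instead of being hard-coded.

```python
import warnings

# Hypothetical alias table: obsolete dataset name -> current name.
# These entries are placeholders, not actual PMLB renames.
DATASET_ALIASES = {
    "old_dataset_a": "new_dataset_a",
    "duplicate_of_b": "dataset_b",
}


def resolve_dataset_name(name, aliases=DATASET_ALIASES):
    """Return the current name for `name`, following chained renames."""
    seen = set()
    current = name
    while current in aliases:
        if current in seen:
            # Guard against a malformed table such as a -> b -> a.
            raise ValueError(f"Alias cycle detected for dataset {name!r}")
        seen.add(current)
        current = aliases[current]
    if current != name:
        warnings.warn(
            f"Dataset {name!r} has been renamed to {current!r}.",
            DeprecationWarning,
            stacklevel=2,
        )
    return current
```

A fetch function could call `resolve_dataset_name(name)` first and then request the returned name, so that old names keep working while users are nudged toward the current ones.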

Obviously, the tough part will be retroactively identifying all of these changes up until the current version. Has there been any convention (formal or informal) for mentioning when a database name changes, like in commit messages?

As part of this, there is probably a more graceful way PMLB can fail when it can't find the database other than returning a 404. Even just error text with a link to currently valid databases and a link to a GitHub Issue template for reporting it when legacy database access has 'broken', so we could add it to this hypothetical mapping file.
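
As a rough sketch of that kind of failure mode (again, not existing pmlb code): `DatasetNotFoundError` and `check_dataset_exists` are hypothetical names, both URLs are placeholders, and the list of valid names is assumed to be passed in by the caller. The close-match suggestion uses `difflib` from the standard library and is an extra on top of what is described above.

```python
import difflib

# Placeholder URLs (assumptions); the project's real pages would go here.
DATASET_LIST_URL = "https://example.com/pmlb/datasets"
ISSUE_TEMPLATE_URL = "https://example.com/pmlb/issues/new"


class DatasetNotFoundError(ValueError):
    """Hypothetical error raised when a requested dataset name is unknown."""


def check_dataset_exists(name, valid_names):
    """Return `name` if it is valid, otherwise raise a descriptive error."""
    if name in valid_names:
        return name
    # Suggest up to three similarly spelled names, if any are close enough.
    suggestions = difflib.get_close_matches(name, valid_names, n=3)
    hint = f" Did you mean: {', '.join(suggestions)}?" if suggestions else ""
    raise DatasetNotFoundError(
        f"Dataset {name!r} was not found.{hint} "
        f"Currently valid datasets: {DATASET_LIST_URL}. "
        f"If this name used to work, please report it: {ISSUE_TEMPLATE_URL}"
    )
```

Running this check before any download means the user sees an actionable message instead of a bare HTTP 404.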

@lacava
Collaborator

lacava commented Sep 16, 2020

> create a file that lives someplace in the source tree, mapping obsolete database names to their current names, and have fetch_data() check against the contents of this file every time you run it.

That seems like a good solution for handling revisions to dataset names within the current version. I realize now that I was thinking of "forwards" compatibility when I mentioned trickiness, i.e. coming up with some kind of link redirect strategy or using symbolic links so that older released versions of fetch_data() keep working (e.g. in https://pypi.org/project/pmlb/0.3/).

> Obviously, the tough part will be retroactively identifying all of these changes up until the current version. Has there been any convention (formal or informal) for mentioning when a database name changes, like in commit messages?

I don't think so.

> As part of this, there is probably a more graceful way PMLB can fail when it can't find the database other than returning a 404. Even just error text with a link to currently valid databases and a link to a GitHub Issue template for reporting it when legacy database access has 'broken', so we could add it to this hypothetical mapping file.

I like this idea as well.
