Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update copy analysis result and experiment data behavior #1432

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

coruscating
Copy link
Collaborator

Summary

Fixes #1396. When an ExperimentData object was copied, it didn't update the analysis result IDs and their associated experiment IDs, resulting in the copied experiment entry not having analysis results in the cloud service.

Details and comments

  • The code to to assign new result IDs when copying an AnalysisResultTable isn't very efficient because it has to loop through the rows and call _create_unique_hash() for each ID. Maybe this logic can be refactored.

@coruscating coruscating added the backport stable potential The issue or PR might be minimal and/or import enough to backport to stable label Mar 27, 2024
@coruscating coruscating marked this pull request as ready for review March 27, 2024 21:15
@coruscating coruscating added this to the Release 0.7 milestone Apr 10, 2024
Copy link
Contributor

@wshanks wshanks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a few questions but this looks like the right fix.

@@ -204,16 +204,26 @@ def clear(self):
with self._lock:
self._data = pd.DataFrame(columns=self.DEFAULT_COLUMNS)

def copy(self):
def copy(self, new_ids: bool = True):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing uses new_ids=False, so should we just not include that option for now? It's a question of why a user is calling copy. If making a copy, it seems likely the user would want to change something and then in that case likely that new IDs should be used?

new_ids: Whether to generate new IDs for copied entries. Defaults to True.

Returns:
A new shallow copied DataFrame object.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like the use of DataFrame here because the DataFrame is an internal container of the data not directly exposed to the user. Also, maybe the point about being shallow is better in a .. note:: block like it was and here it could just say A copy of the analysis result table? I think the original note was not clear enough. The default columns of the data frame are all strings and numbers which are all immutable so the only difference between being shallow and deep is saving some memory. However, I think it's possible that the user could add a column with mutable objects? The concern there is that then mutating the objects in the copied table would also mutate the ones in the original. Maybe this is a pretty minor concern -- would a user add mutable objects to an analysis result table?

exp_data.add_data(self._get_job_result(1))
copied = exp_data.copy(copy_results=False)
self.assertEqual(exp_data.data(), copied.data())
self.assertFalse(copied.analysis_results())
self.assertEqual(exp_data.provider, copied.provider)

def test_copy_analysis_results(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good. Should there be a test of copy in test_analysis_results_table.py as well?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport stable potential The issue or PR might be minimal and/or import enough to backport to stable
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Copied analysis result not saving to cloud service
2 participants