Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactoring tests to log io behavior #31

Open
wants to merge 11 commits into
base: main
Choose a base branch
from
Open

refactoring tests to log io behavior #31

wants to merge 11 commits into from

Conversation

betolink
Copy link
Member

This PR is to integrate code that logs fsspec network operations and tweak the I/O libraries (hypy, fsspec).
The fsspec-logs notebook should be a stand-alone test when ready (90% there)

@betolink
Copy link
Member Author

betolink commented Feb 6, 2024

I think this PR is ready to be reviewed, still work in progress. h5coro needs upstream changes to work with s3 links that don't require a login. We probably need to create a pip installable package called h5benchmark and then all the bootstrapping code and relative paths should go away.

The main notebooks are the ones with portable in their name, I think once we have subsetting ready we could move on to complete the benchmarking numbers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@betolink I received an error in the 3rd codeblock:

results = xarray_original.run(io_params)

ZeroDivisionError: division by zero

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran it on a fresh clone and didn't get that error, perhaps we do some screen sharing to see what's going on.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh I see, there is a bug, fixing it now....

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@betolink I received an error in the last plotting block:

for name, group in df.groupby(['tool', 'dataset', 'format']):
    tool, dataset, formated = name
    x = f'{tool}, {dataset}, {formated}'
    y = group['time'].mean()
    ax.bar(f'{tool}, {dataset}, {formated}', group['time'].mean(), label=f'{tool}, {dataset}, {formated}', align='center')
    ax.text(x, y + 0.05, f'{group["time"].mean():.2f}', ha='center', va='bottom', color='black', fontsize=8)

KeyError: 'tool'

Copy link
Member Author

@betolink betolink Feb 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this notebook was just prototyping some stuff, the ones that we want to plot the results are the ones named "portable-"

Copy link
Member

@asteiker asteiker Feb 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@betolink We should add a warning to this notebook to explain that it's not functioning due to the anonymous access issue, and/or comment all the code blocks that produce an error.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, or in the meantime we can use the same granules just from the CryoCloud bucket.

Copy link
Member

@asteiker asteiker Feb 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@betolink

  • One small suggestion is to add a short title + description to the top of this notebook to explain its usage.
  • I also got a divide by zero error here in the 3rd code block:

results = h5py_original.run(io_params)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like the h5py test notebook:

  • One small suggestion is to add a short title + description to the top of this notebook to explain its usage.
  • I also got a divide by zero error here in the 3rd code block:
    results = h5py_original.run(io_params)

Copy link
Member

@asteiker asteiker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't able to run many of the notebooks end to end (tested in CryoCloud). Once those errors are resolved plus suggested updates to include brief title and descriptions for the new notebooks, then it looks good on my end.

Also for our collaborators, do we need any additional documentation for them to utilize the notebooks (i.e. any breaking changes for the other CO formats that others would need to be aware of)?

@asteiker asteiker linked an issue Feb 8, 2024 that may be closed by this pull request
@betolink betolink removed the request for review from weiji14 February 12, 2024 16:20
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these file names/paths still being used, or could we remove this file since the links were updated in links.json so it's effectively in the git history?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a second storage location (different hub) for the same set of files? Or are the files different?

Andy Barrett added 5 commits February 28, 2024 17:51
Saves results of experiments to results
Copied plotting from portable-full-comparison
Add plotting for end to end running
so that canonical plotting is in plot_benchmark_results.ipynb
@asteiker asteiker linked an issue Feb 29, 2024 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorporate fsspec notebook into testing code Re-plot performance testing based on individual files
3 participants