
Writing a Paper Alongside 1.0.0 Release #890

Open
CSSFrancis opened this issue Jan 11, 2023 · 13 comments

@CSSFrancis
Member

Is your feature request related to a problem? Please describe.

Maybe this is putting the cart before the horse, but it might be worth thinking about how a pyxem paper would be set up, written, and edited. The goal of a publication like this would be to better present our capabilities and advertise the package to a wider audience. I also think it is a good place to organize our goals and narrow our focus. Additionally, there are several underlying design decisions about processing, specifically how data is loaded and how the map function upstream in hyperspy performs, which can be better explained in a longer-form article.

Describe the solution you'd like
I have a couple of thoughts on this and would love any additional feedback.

  1. I would like the writing of the paper and the development of the figures to be done in public.
    • Preferably this would be done on GitHub using something like an .rst or .tex document.
    • All of the contributors would be coauthors on the paper, which allows each coauthor to propose edits or help with development.
    • Ideally, the figures would be taken directly from the demo notebooks. This makes it important to update those notebooks, and it also gives readers the ability to directly reproduce figures from the paper.
  2. I would like to include some information about changes to hyperspy/rosettasciio.
    • I think a valuable thing to include is information about how we save/load data as well as how the map function is designed.
    • Performance metrics depend largely on these two things, and I feel that we should have an entire section devoted to the performance of pyxem.
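To illustrate the map-function design mentioned above, here is a minimal, hypothetical sketch of the pattern in plain NumPy: a per-pattern function is applied over the navigation (scan) axes of a 4D-STEM dataset and the results are re-stacked. This is an assumption-laden toy, not hyperspy's actual `map` implementation (which handles lazy/dask signals, chunking, and ragged outputs).

```python
import numpy as np

def map_over_navigation(data, func):
    """Apply ``func`` to each diffraction pattern (the last two axes)
    of a 4D-STEM dataset, stacking the results over the scan axes.

    Toy stand-in for a hyperspy-style map function; real implementations
    must also handle lazy data, chunking, and non-uniform output shapes.
    """
    nav_shape = data.shape[:-2]
    out = [func(data[idx]) for idx in np.ndindex(nav_shape)]
    return np.array(out).reshape(nav_shape + np.shape(out[0]))

# Toy 4D-STEM dataset: a 4x4 scan of 8x8 diffraction patterns.
data = np.arange(4 * 4 * 8 * 8, dtype=float).reshape(4, 4, 8, 8)

# Per-pattern reduction: total intensity of each diffraction pattern.
total = map_over_navigation(data, np.sum)
print(total.shape)  # (4, 4)
```

Because each scan position is processed independently, this loop is embarrassingly parallel, which is why chunking and scheduling choices dominate its real-world performance.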

I'm very open to suggestions in this process. I can spearhead this effort, but any support that people are willing to give is greatly appreciated. I can create an outline, or @pc494, I imagine that you might have some writing associated with your thesis that we can repurpose.

@pc494 pc494 self-assigned this Jan 11, 2023
@pc494
Member

pc494 commented Jan 11, 2023

I think we should write up our 1.0.0 and I think doing it (broadly) in the open is an excellent idea. I've had some success integrating Overleaf with Github (and private repositories now allow collaborators) so perhaps that's our best development option? Once that's agreed upon we can scope out length/journal/content etc?

@CSSFrancis
Member Author

I think we should write up our 1.0.0 and I think doing it (broadly) in the open is an excellent idea. I've had some success integrating Overleaf with Github (and private repositories now allow collaborators) so perhaps that's our best development option? Once that's agreed upon we can scope out length/journal/content etc?

I think that in general that is a good idea, and writing a paper by committee sounds fun! I'm good with a LaTeX document, but I'm not terribly familiar with integrating GitHub and Overleaf or another .tex editor. Is the general idea the same, where people can create their own branches and then merge them? On one hand this sounds like a good idea, but I don't know how annoying the merge conflicts are going to be. Then again, it depends on how much concurrent writing is actually happening.

@uellue or @sk1p, I know that you have had some experience with similar open paper writing/development in LiberTEM, and I wonder if you have any suggestions for journal/collaboration etc.? We would love any suggestions that you might have. I can't seem to find it, but I recall you had a code review with a journal publication at some point in the past? It might be worthwhile to do something equivalent to that.

@pc494
Member

pc494 commented Jan 11, 2023

You can write commits from Overleaf directly, so I tend to just edit in Overleaf but with slightly better version tracking. What you could then do is use GitHub issues to keep track of any more significant problems.

@pc494 pc494 changed the title Writing a Paper Alongside 1.00.0 Release Writing a Paper Alongside 1.0.0 Release Jan 11, 2023
@uellue

uellue commented Jan 12, 2023

@CSSFrancis Yes, we published at JOSS: https://joss.theoj.org/papers/10.21105/joss.02006

The paper draft is an MD file in the repo (ours is still here: https://github.com/LiberTEM/LiberTEM/tree/master/docs/publications/joss)

The review process is a GitHub discussion, much like a PR review.

Generally, I can really recommend it for scientific software! It really helps to improve the documentation and the software when a "random reviewer" tries to install and use it, too.

@CSSFrancis
Member Author

@uellue Thank you for your input! I think the concept of an external code review is an excellent idea. For a largely software-based submission, it does seem appropriate that the software itself should be reviewed, rather than just focusing on a document that can make dubious claims which might not be fully realized in the software due to bugs etc.

I think that ideally we would prepare a co-publication. https://joss.readthedocs.io/en/latest/submitting.html#co-publication-of-science-methods-and-software alongside the JOSS publication. I would suggest that we prepare a submission to Microscopy and Microanalysis as well as a submission to JOSS. I think that publishing in M&M allows us to reach a larger audience even if it isn't the best way to publish. That allows us to separate the methods, metrics, and capabilities into a separate publication.

@pc494
Copy link
Member

pc494 commented Jan 13, 2023

Carter and I spoke today via Zoom, with the consensus being the dual-publication route described above, with a potential target point of M&M 2023. However, a fair amount of coding needs to happen before then, so this particular issue might be quiet for some time.

@pc494 pc494 pinned this issue Jan 13, 2023
@hakonanes
Member

A pyxem reference paper in M&M would be great. I hope to help whenever I find the time (away from kikuchipy).

The Figures would be ideally directly taken from the Demo notebooks

Agree. Ideally, the (raw) data should be downloadable from an open repository as well (Zenodo etc.).

Performance metrics depend largely on these two things and I feel that we should have an entire section dependent on performance of pyxem.

Providing tips on how to save and load data for optimal performance (memory, speed) would be great. Do you mean showing something like how file formats or chunking can affect various operations? I believe comparing the performance of pyxem to other software is subjective (parameter/case dependent) and not that interesting. I suggest focusing instead on the flexibility from which file-format-agnostic, reproducible, and shareable workflows can be built.

@CSSFrancis
Member Author

A pyxem reference paper in M&M would be great. I hope to help whenever I find the time (away from kikuchipy).

Any help would be appreciated!

Agree. Ideally, the (raw) data should be downloadable from an open repository as well (Zenodo etc.).

Providing tips on how to save and load data for optimal performance (memory, speed) would be great. Do you mean showing something like how file formats or chunking can affect various operations?

I think this could probably fill an entire paper, but we can do our best here. I was intending to at the very least compare loading binary data, poorly/well chunked HDF5 data, and poorly/well chunked zarr files. It would probably be good to show how each of these does (or does not) scale with the number of cores, and to explain whether they work with dask-distributed. This can be a lot, even before a discussion about hardware, which further complicates things. Most of our operations are I/O bound, so most of the potential gains for any operation come in the form of better storage hardware or better schemes/streaming for compression. (This is why zarr is so much faster than HDF5 in many instances.)
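The cost of "poorly" vs "well" chunked data can be made concrete with a small back-of-the-envelope helper. This is a hypothetical illustration (the function name, shapes, and chunk layouts are made up for this sketch): it counts how many chunks a chunked store such as zarr or HDF5 would have to read to service a given slice. Reading one diffraction pattern from navigation-friendly chunks touches a single chunk, while the same read from signal-split chunks touches dozens.

```python
import math

def chunks_touched(chunk_shape, start, stop):
    """Number of chunks that must be read to service a hyper-rectangular
    slice [start, stop) of a chunked N-D array.

    Per axis, the slice overlaps ceil(stop/chunk) - floor(start/chunk)
    chunk indices; the total is the product over axes.
    """
    n = 1
    for c, a, b in zip(chunk_shape, start, stop):
        n *= math.ceil(b / c) - a // c
    return n

# A 64x64 scan of 128x128 diffraction patterns, two chunking schemes:
nav_chunked = (8, 8, 128, 128)   # each chunk holds whole patterns
sig_chunked = (64, 64, 16, 16)   # each pattern is split across chunks

# Reading a single diffraction pattern at scan position (0, 0):
one_pattern = ((0, 0, 0, 0), (1, 1, 128, 128))
print(chunks_touched(nav_chunked, *one_pattern))  # 1
print(chunks_touched(sig_chunked, *one_pattern))  # 64
```

The same arithmetic run in reverse (reading one virtual dark-field pixel position across the whole scan) favors the opposite layout, which is why the paper section would need to pin down the access pattern before quoting any throughput numbers.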

I believe comparing performance of pyxem to other software is subjective (parameter/case dependent) and not that interesting. I suggest to focus more on the flexibility from which file format agnostic, reproducible and shareable workflows can be built instead.

I agree with this for the most part. Performance comparisons are pretty subjective, especially when everything is so I/O- or chunking-dependent. I also know how to milk every bit of speed out of hyperspy/pyxem, something I definitely cannot do in other packages. That being said, the landscape of 4D-STEM projects can be confusing to a newcomer, and while we are the oldest of the projects, we probably need to define our niche and what differentiates us.

Part of that is hyperspy. We depend on a highly used, collaborative project where performance improvements in one place drive performance in many different applications. We have also successfully integrated a couple of different projects in pixstem, pyxem, and empyer, which I think shows a really good track record. How our package is managed, developed, and tested is a credit to the quality of the code that goes in, as well as the quality of the code upstream in hyperspy.

I really do think that our package is easy to use, but this is subjective as well. Still, familiarity with hyperspy translates very well, which is quite powerful. This means that someone who is mostly doing EELS and wants to do 4D-STEM, and vice versa, can easily switch between the packages. You can also do both EELS and 4D-STEM of the same sample, location, etc. without changing how the data is analyzed.

The last part of our draw is a consistent and large library of accelerated tools. We do need a good way to say that while we might not be the fastest at some things we are often comparable and have the largest library of accelerated tools.

@hakonanes
Member

Showing scaling of some (key) operations with number of threads/cores is a good demonstration of the thought that has gone into developing an algorithm, I think.
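The kind of thread-scaling demonstration suggested above can be prototyped without pyxem at all. The sketch below is a toy under stated assumptions (the `center_of_mass` helper and the array shapes are invented for illustration; pyxem's real pipeline runs through dask): a per-pattern operation is mapped over scan positions sequentially and then across worker threads, verifying that the results agree. Because NumPy releases the GIL inside its kernels, the threaded version can actually overlap work.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def center_of_mass(pattern):
    """Intensity-weighted centroid of a single diffraction pattern."""
    total = pattern.sum()
    ys, xs = np.indices(pattern.shape)
    return np.array([(ys * pattern).sum(), (xs * pattern).sum()]) / total

rng = np.random.default_rng(0)
patterns = rng.random((64, 32, 32))  # 64 flattened scan positions

# Sequential baseline.
serial = np.array([center_of_mass(p) for p in patterns])

# The same embarrassingly parallel loop spread across worker threads.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = np.array(list(pool.map(center_of_mass, patterns)))

print(np.allclose(serial, parallel))  # True
```

Timing this loop at 1, 2, 4, and 8 workers (and against a chunked on-disk source) would give exactly the per-operation scaling curves proposed for the performance section.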

I really do think that our package is easy to use but this is subjective as well.

Yeah, this is subjective. It depends on what you want to do as well. But that some things are relatively easy to do with pyxem will hopefully come across if a reader inspects some notebooks in the potential supplementary material.

@CSSFrancis
Member Author

CSSFrancis commented Dec 6, 2023

@pc494 @magnunor @dnjohnstone @JoonatanL

I have a reasonably fleshed-out draft for the pyxem 1.0.0 paper and was wondering if any of you would be willing to review it? There are some figures which still need to be created as the software is completed, but those can be added later.

I can upload it to GitHub if anyone else is interested and would like to make edits. For the author list I have just taken everyone who has contributed and added them and then added my advisor as well as Paul Midgely. Are there any other advisors who I should add? I would think that anyone who gave major suggestions on the architecture of the package should be listed.

@magnunor
Collaborator

magnunor commented Dec 7, 2023

@CSSFrancis, I can have a look at it in two weeks. I currently have a very big application deadline coming up next week.

@pc494
Member

pc494 commented Dec 7, 2023

@CSSFrancis I'm happy to review. I think any of the email addresses you have for me should work.

@CSSFrancis
Member Author

@magnunor @pc494 Sounds good! I'm going to give it another couple of read-throughs and I'll send it your way.

@pc494 pc494 removed the 1.0.0 label Jan 26, 2024
@CSSFrancis CSSFrancis added this to the v1.0.0 milestone Jan 26, 2024