Example notebook database #1003

Closed
dnjohnstone opened this issue May 20, 2016 · 19 comments

@dnjohnstone
Contributor

Example notebooks are often what people actually learn to use HyperSpy from - they also offer a chance to show people aspects of analysis that are handled via workflow rather than actual new HyperSpy code.

Creating a database of example notebooks as is started in the "demos" section of the website could therefore be very valuable for people. Really all this needs is for current users to have in mind that they could publish a notebook when they have a useful one locally.

@thomasaarholt
Contributor

I really like this idea. Almost literally everything I've learnt has been through examples.

@thomasaarholt
Contributor

It's very easy to view Jupyter notebooks on github with this (type in "hyperspy" and see):

https://nbviewer.jupyter.org/

You can link directly to examples like this.

@dnjohnstone
Contributor Author

Great that you're keen @thomasaarholt

My thinking was that a notebook like that could be submitted as a pull request to the hyperspy-demos GitHub repository, which is here: https://github.com/hyperspy/hyperspy-demos Anything merged this way can then be added to hyperspy.org pretty trivially.

The biggest issue with this idea is maintenance - i.e. who takes responsibility for keeping notebooks up-to-date when new HyperSpy versions break a lot of stuff. For example, right now much of the EDS tutorial is "broken" because of the Python 3 switch and some other changes (I've fixed this and it will be released before 7th June).

Perhaps your suggestion of linking is good as it shifts the responsibility to whoever made the thing in the first place - but this also has some issues I think...

@k8macarthur
Contributor

I'm keen on this idea and happy to contribute one on quantitative EDX with cross sections.

I think we definitely need to think about who is responsible for updating them. But also how to keep track of them.

Would it be too unreasonable to ask someone updating a function to check the workbooks? For example, when changing the order in which arguments are passed to get_lines_spectrum(), it would seem obvious to check the EDX workbooks.

@k8macarthur
Contributor

Alternatively I guess users might flag up when they break.

A big update like the jump to Python 3 should include examining the workbooks, just as the rest of the code was evaluated, I think.

@thomasaarholt
Contributor

Might be worth just trying it and seeing how it develops!

@k8macarthur
Contributor

I think it would depend on how many workbooks there are, and so how much extra work it would be for the person submitting the pull request.

@dnjohnstone
Contributor Author

I think we should have a "core" set of notebooks focussed specifically on getting new users going - keeping these up to date should be a step in the checklist for each new version release. You will see that there are now some new ones as of this week and that these are updated for v0.8.5.
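One way such a release-checklist step could be automated is sketched below. It is not something the thread settled on: the helper name `nbconvert_commands` and the folder name in the usage note are hypothetical, and it assumes Jupyter's `nbconvert` command-line tool is installed. The function only builds the commands that would re-execute every notebook under a folder; running them is left to the caller.

```python
from pathlib import Path


def nbconvert_commands(root):
    """Build one 'jupyter nbconvert --execute' command per notebook under root.

    Hypothetical helper: it only constructs the command lists and does not
    run anything. Autosave copies in .ipynb_checkpoints folders are skipped.
    """
    commands = []
    for notebook in sorted(Path(root).rglob("*.ipynb")):
        if ".ipynb_checkpoints" in notebook.parts:
            continue
        commands.append(
            ["jupyter", "nbconvert", "--to", "notebook", "--execute", str(notebook)]
        )
    return commands
```

Each command could then be passed to `subprocess.run(command, check=True)`, for example over `nbconvert_commands("hyperspy-demos")`, so that a notebook which no longer runs under the new version shows up as a failed command.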

For more specialist ones probably we need people specifically interested in those to try to pull together - probably we'll have to leave it up to them and just insist that people specify which version it was developed under.

@thomasaarholt
Contributor

I think having a core and an experimental (or other name) works well so that we can have one clear and maintained example set and one where people can show off / fiddle.

Those are some nice examples you've put up! (though the final image in this example gave me epilepsy [I love it])

Is there a particular reason that George Harrison is used in the examples?

@francisco-dlp
Member

Maybe we could simply maintain the core ones as we already do and, in parallel, keep a list of links to "external notebooks" contributed by anybody who wants to share their data analysis notebooks. In the latter case, the authors would be responsible for maintaining them if they want to. If we go down this route, it'll be very helpful to have a notebook template as proposed in hyperspy/hyperspy-demos#11 .

@thomasaarholt, if you run through the SVD and BSS tutorial you'll see that the full Beatles (not only Harrison) are in the dataset. The reason is that I needed a dataset for blind source separation in 2010. I found out that 2010 was the 50th anniversary of The Beatles, so it seemed appropriate to separate them again with modern means as a tribute. I've used this example, and a variation of it, ever since for my introduction to BSS talks.

@francisco-dlp added this to the Discussion milestone Jun 7, 2016

@dnjohnstone
Contributor Author

If anyone hasn't noticed - we added a large number of new tutorials for a workshop a couple of weeks ago, which can be found here: http://nbviewer.jupyter.org/github/hyperspy/hyperspy-demos/tree/master/

These all work with v0.8.5 and use Signal1D/Signal2D

@k8macarthur
Contributor

Question: From a stylistic point of view, would it be better to have the workbook run linearly from start to finish, or can sections be repeated? For example: if I were to write one workbook on EDX quantification, I would include how to extract the EDX cross section for each element, all the way through to quantification of a map. This is something that could just be repeated for each element, without writing out the whole thing each time. Thoughts?

@thomasaarholt
Contributor

Could you write it as a for loop to loop through a list of the EDX signals instead?

@dnjohnstone
Contributor Author

@k8macarthur If you're repeating the exact same methods over and over again it sounds like a moment when you want to write a function that does that whole thing.
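A minimal sketch of the two suggestions above (wrap the repeated steps in a function, then loop over the elements). It is not taken from any of the notebooks discussed: the function names, the dictionaries and the numbers are hypothetical, and the division is only a placeholder standing in for the real per-element quantification steps.

```python
def quantify_element(counts, cross_section):
    """Placeholder for the per-element workflow (hypothetical arithmetic).

    In a real notebook this body would hold the steps that were otherwise
    repeated by hand for each element.
    """
    return counts / cross_section


def quantify_all(intensities, cross_sections):
    """Run the same per-element steps for every element in one loop."""
    results = {}
    for element, counts in intensities.items():
        results[element] = quantify_element(counts, cross_sections[element])
    return results


# Made-up example values, for illustration only.
intensities = {"Pt": 1200.0, "Co": 300.0}
cross_sections = {"Pt": 4.0, "Co": 2.0}
amounts = quantify_all(intensities, cross_sections)
```

Written this way, the workbook still reads linearly from start to finish, and adding another element only means adding an entry to the two dictionaries.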

@k8macarthur
Contributor

I've uploaded my first draft workbooks here: https://github.com/k8macarthur/hyperspy_workbooks I would appreciate any suggestions or thoughts people have to improve them. I know the particle analysis one is not quite finished yet, but I'm hoping to have something working in time for EMC. Obviously plotting inline would make them prettier, and the functionality is almost there.

@thomasaarholt
Contributor

We didn't settle on this. Since GitHub now (recently?) lets one view uploaded .ipynb files in a read-only notebook view, should we just link to Examples and upload some notebook examples there?

@k8macarthur
Contributor

Currently, I have just made my own personal repository and direct people towards it as necessary (on posters last week and in talks). I think a decision needs to be made in terms of maintenance and usability. As @francisco-dlp said, links to external notebooks are probably the way most will be discovered, I guess.

@dnjohnstone
Contributor Author

My opinion at this stage is that people creating personal notebook repositories as @k8macarthur has done is a good model - it simplifies maintenance and doesn't put any additional burden on the developers. It's also very clear that whoever makes a little repository like that is responsible for its content and whether or not it's up to date.

Perhaps we can make some space on the website to link to these?

I would suggest having a good top matter as with the template we followed for e.g. https://github.com/hyperspy/hyperspy-demos/blob/master/electron_microscopy/EDS/TEM_EDS_nanoparticles.ipynb

Importantly it says which version of HyperSpy it was updated for so people can only expect it to work with that. You could also include a static copy of HyperSpy in that same repository if you want to never have to touch it again.
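As a rough illustration of stating the version in the top matter, a notebook's first cell could also compare the installed version against the one it was written for. This is only a sketch: `TESTED_VERSION`, `parse_version` and `check_version` are hypothetical names, and the comparison is a simple exact match on the numeric parts of the version string.

```python
import warnings

# Version this (hypothetical) notebook was last updated for.
TESTED_VERSION = "0.8.5"


def parse_version(text):
    """Turn a string such as '0.8.5' into a tuple of integers, (0, 8, 5).

    Parsing stops at the first piece that does not start with a digit,
    so a suffix such as 'dev' is ignored.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for character in piece:
            if not character.isdigit():
                break
            digits += character
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_version(installed, tested=TESTED_VERSION):
    """Return True if the versions match; otherwise warn and return False."""
    if parse_version(installed) != parse_version(tested):
        warnings.warn(
            f"This notebook was last updated for HyperSpy {tested}; "
            f"you are running {installed}."
        )
        return False
    return True
```

In a notebook this would be called with the installed version string, for example `check_version(hyperspy.__version__)` after `import hyperspy`, so readers see a warning instead of an unexplained failure further down.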

@dnjohnstone
Contributor Author

Closing because we now have much more extensive examples and the preference seems to be for external example databases.
