Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Examples of pandas and plotly in jupyter notebook #56

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

gwenzel
Copy link

@gwenzel gwenzel commented Apr 13, 2020

Description:

A new jupyter notebook was added, showing how to import data from pandas, and how to plot data using the plotly packages. The data and examples are the same as in the initial notebook written by @felipehuerta17 .

List of changes

  • Added pandas and plotly to pipfile.
  • Added datasets folder with data from Ealing.
  • Added notebook with examples on how to use pandas and plotly.

@gwenzel gwenzel added the documentation Improvements or additions to documentation label Apr 13, 2020
@gwenzel gwenzel self-assigned this Apr 13, 2020
@jia200x
Copy link
Contributor

jia200x commented Apr 13, 2020

Thanks for the contribution!

There are some comments that IMO should be addressed before it gets to a "ready to merge" state:

  1. The commits are not following the Angular JS Commit.
  2. Some commits are unrelated to this PR (e.g d3fbba9)
  3. Commits should be atomic. A.k.a there should be one commit per semantic change. E.g one "chore(dataset): add csv with ealing data", one with "docs(jupyter): add Plotly example", etc

On the other hand, IMO this should be independent of pipenv. I would expect people to use this Jupyter right after doing "pip install opensir". So, neither "plotly" nor "pandas" should be in the Pipfile (although #35 added the dependency because it's intended to be shown as a Sphinx page).

Also, I would try to limit the scope of this file to one of these:

  • If this is intended to be used as a tutorial, I suggest to follow the approach of feat(model): extend fit to SIR-X  #35 and add more verbose information. Then we can include it in the Sphinx documentation
  • On the other side if this is intended to be used as an example application, I suggest to move this to a "samples" directory, and then we can explicitly link the documentation to the list of available samples.

What do you think?

PS: Regarding comments 1-3, if you have any questions don't hesitate to ask @felipehuerta17 , @leandrolanzieri , @sasalatart or me.

@felipehuerta17
Copy link
Contributor

Hi all,

Thanks for the very well and polished contribution @gwenzel ! Besides from the TI aspects, this looks a great example on how:

  1. Create a datasets/ealing.csv file. This is a much more reproducible approach and it differentiates the notebook from the SIR.ipynb. Additionally, it looks familiar for Data Scientists in terms of structure and data exploration.

  2. The key message from the notebook is using Plotly to develop interactive visualizations. This enhances the user-friendliness of the notebooks and increases the impact of open-sir. I agree with @jia200x that markdown cells will be very beneficial, but I think that the narrative should be focused on the decision that you make @gwenzel to create these amazing plots!

Observations
As the SIR model is already explained on SIR.ipynb, extend the narrative through:

  • Please describe the usage of plotly.graph_objects and make_subplots.
  • Provide references to plotly documentation if necessary. Just
  • Take the reader "from the hand" and assume that he knows nothing. After the first plot, invite the reader or user to hover their mouse over the figure and explain what he should see.
  • Describe why this plots are better than in SIR.ipynb. For example, including a date time index in the x axis make the results much more communicable and compelling.
  • Comment after Long Term model: with Plotly it's possible to know exactly the day of the peak and the number of infected rather than proving an estimation of applying a function like "max". This facilitates greatly the process of obtaining results.

They look excellent and they are within the scope of open-sir, not only "plotly", because you are merging your modelling thinking with a data visualization approach.

2.1) Figure 2.1 looks great and it's much better displayed than using matplotlib.
Note that now model.solve() provides the solution in terms of number of infected, so you don't need to multiply by the population. I will make a pull request with the changes.
In the Sensitivity Analysis, is there a way to show the legends below the figures, so the figures can scale better in displays with smaller aspect ratio?

I think this should be a separate tutorial, and have crosslinks with SIR.ipynb. As the hello-world is SIR.ipynb, after the sensitivity analysis we can provide a link on SIR.ipynb to this notebook to show how to create more compelling visualizations. In a similar way, before importing data provide a brief description of the SIR model (no diff eqs), and provide a link to SIR.ipynb and the documentation so the reader can get more context.

Copy link
Contributor

@felipehuerta17 felipehuerta17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great contribution and first commit!

To be ready to merge:

  1. Take onboard the specific comments on the different parts of the code.
  2. Provide a narrative tutorial on "how to use plotly to increase the impact of the results that can be generated using open-sir"
  3. With the new repository, you may need to do from opensir.models import SIR,SIRX
  4. Let me know if you need help on doing the TI stuff with the commits. We can do this together in 5-10 minutes after you deal with the other changes. I tried to solve similar comments by my own in the past and took a lot of time for the first time and built frustration.

Pipfile Outdated
@@ -14,6 +14,8 @@ matplotlib = "*"
scipy = "*"
jupyter = "*"
sklearn = "*"
pandas = "*"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be updated when you update your master repository and rebase this branch :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"pandas" was already added, so this commit can be dropped

Pipfile Outdated
@@ -14,6 +14,8 @@ matplotlib = "*"
scipy = "*"
jupyter = "*"
sklearn = "*"
pandas = "*"
plotly = "*"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jia200x @gwenzel here we need to make a decision whether we make plotly a dependency of opensir, or we write in the notebook that plotly is required and the user has to install it using pip install plotly in order to run the notebook.

At the moment, I wouldn't be against of making plotly a dependency as the plotly figures created by @gwenzel may constitute in the future a function of model.plot().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jia200x @gwenzel here we need to make a decision whether we make plotly a dependency of opensir, or we write in the notebook that plotly is required and the user has to install it using pip install plotly in order to run the notebook.

As described above, the user of this Jupyter is the End User, not the developer user.
There are a lot of problems of using Pipenv for end users (e.g if the user doesn't have Python 3.7, it won't be able to run opensir).

At the moment, I wouldn't be against of making plotly a dependency as the plotly figures created by @gwenzel may constitute in the future a function of model.plot().

If the model.plot uses plotly, then that PR should add the dependency. Adding spare dependencies is not recommended because it adds more "moving parts" if not used

3/26/2020,136
3/27/2020,165
3/28/2020,209
3/29/2020,241
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love this and the idea of having a datasets folder

"# Convert into seconds\n",
"tf_long = long_term_days-1\n",
"sol_long = my_SIR_fitted.solve(tf_long, long_term_days).fetch()\n",
"N_S_long = sol_long[:,1]*P\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Substitute

 N_S_long = sol_long[:,1]*P

for

N_S_long = sol_long[:,1]

As now the output of sol is in number of people.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same applies for line 145 and 146

@gwenzel
Copy link
Author

gwenzel commented Apr 16, 2020

Previous comments were taken into account and a new commit was made: e2ac3ea.
The new commit was also rebased to the new master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants