Error Bars for plotting measures #124

Open
wingedRuslan opened this issue Jun 24, 2019 · 10 comments
Comments

@wingedRuslan
Collaborator

Before adding error bars to the plot created by the plot_network_measures function, I want to make sure that I am creating the error bars correctly (so that I will not need to change the code again).

Here is an example of plotting an error bar for just one measure:
image

So I take the mean of the measure's values across the random graphs and then calculate the standard deviation of those values.

When plotting, I pass the mean as the bar height and the standard deviation as the error bar.

Is that the right way?
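
For reference, the approach described above can be sketched roughly as follows. This is only an illustration, not the code from the screenshot: the function names are made up, and the choice of the sample standard deviation (n - 1 denominator) is an assumption, since the issue does not say which estimator was used.

```python
import statistics


def summarise_random_values(random_values):
    """Return (mean, standard deviation) of one measure across random graphs."""
    mean = statistics.mean(random_values)
    # Sample standard deviation (n - 1 denominator). This is an assumption:
    # the issue does not state which estimator was used.
    sd = statistics.stdev(random_values)
    return mean, sd


def plot_measure_with_error_bar(label, random_values):
    """Draw one bar: height = mean, error bar = +/- one standard deviation."""
    # Imported here so the summary helper works without matplotlib installed.
    import matplotlib.pyplot as plt

    mean, sd = summarise_random_values(random_values)
    fig, ax = plt.subplots()
    ax.bar([label], [mean], yerr=[sd], capsize=5, color="grey")
    ax.set_ylabel("value")
    return ax
```

Calling `plot_measure_with_error_bar("assortativity", values)` with a list of per-random-graph values would then draw a single grey bar with a symmetric error bar.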

@KirstieJane
Member

Hi @wingedRuslan! This looks great.

Having just had a google around, I actually think showing the variance using the 95% confidence interval makes most sense (especially as for this plot the variance is so small!)

I think I'd code this up using the seaborn barplot function: https://seaborn.pydata.org/generated/seaborn.barplot.html. By default it will add the 95% CI error bars.

The work therefore is in building a data frame with the x and y values that you need. I don't have time to open up jupyter etc to figure out the code, but if you structure the data frame so that you can run hue="random" and then plot the real and random measures next to each other that should look great. (See for inspiration the plot from the documentation below).

>>> ax = sns.barplot(x="day", y="total_bill", hue="sex", data=tips)

image

The more I stare at this the more I wonder if the dataframe will have to be weirdly long to actually get the seaborn function working....so it might be better to go with something similar to what you've got above.
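
A rough sketch of what that long-form data frame could look like, in case it helps. Everything here is hypothetical (the helper names, the column names "measure", "value" and "type", and the shape of the input dictionaries); the hue column is called "type" here rather than "random". The frame needs one row per measure for the real graph plus one row per measure per random graph, so it is long but not large.

```python
def build_long_records(real_measures, random_measures):
    """Return one row per (measure, graph) pair for a long-form data frame.

    real_measures:   {measure_name: value}
    random_measures: {measure_name: [value_for_each_random_graph, ...]}
    """
    rows = []
    for name, value in real_measures.items():
        rows.append({"measure": name, "value": value, "type": "real"})
    for name, values in random_measures.items():
        for value in values:
            rows.append({"measure": name, "value": value, "type": "random"})
    return rows


def plot_real_vs_random(real_measures, random_measures):
    """Grouped bar plot: real vs random, with seaborn's default error bars."""
    # Imported here so the data-shaping helper works without these libraries.
    import pandas as pd
    import seaborn as sns

    df = pd.DataFrame(build_long_records(real_measures, random_measures))
    # The "real" group has a single value per measure, so only the "random"
    # bars get a visible error bar.
    return sns.barplot(x="measure", y="value", hue="type", data=df,
                       palette={"real": "steelblue", "random": "lightgrey"})
```

With 5 measures and 100 random graphs this gives 5 × (1 + 100) = 505 rows.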

I can take a look tomorrow afternoon though if you have a bash 😄

@KirstieJane
Member

Here's a formula for the 95% confidence interval if you do stay with the code you've added at the top of this issue: 0.96SD to 1.05SD (https://www.graphpad.com/guides/prism/7/statistics/stat_confidence_interval_of_a_stand.htm?toc=0&printWindow)
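
One caveat worth checking: as far as I can tell, the range quoted above from the linked page is the confidence interval of the standard deviation itself (for a large sample), not of the mean. For error bars around a mean, a common normal approximation is mean ± 1.96 × SD / √n. A minimal sketch, assuming the sample standard deviation and a sample large enough for the 1.96 factor to be reasonable (the function name is made up):

```python
import math
import statistics


def mean_ci95(values):
    """Approximate 95% CI of the mean: mean +/- 1.96 * SD / sqrt(n)."""
    n = len(values)
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    half_width = 1.96 * sem
    return mean - half_width, mean + half_width
```

For a handful of random graphs a t-distribution critical value would give a wider, more appropriate interval than 1.96.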

@wingedRuslan
Collaborator Author

Hi @KirstieJane

Thank you for the tips!

This is the plot I have when there are only 5 random graphs:

image

By default seaborn.barplot adds the 95% CI error bars.

What do you think about the figure?

@KirstieJane
Member

This looks great!! I don’t like the colours (I like the grey for the random graphs) but the plot looks fantastic!

@KirstieJane
Member

Oh! And one other thing. I think the ratio can be a bit “squarer”. Change the figsize option to make the bigger number (the x dimension) a little smaller.

@wingedRuslan
Collaborator Author

@KirstieJane I did not test with random_graphs = 1000 (or even 100), as my system froze 😕
So tomorrow I want to run the code with a larger number of random graphs, and I expect to see confidence intervals for the other measures as well. Right now they are displayed only for the assortativity and shortest_path measures.

Sure, I will change the colors and make the figure squarer :)

@wingedRuslan
Collaborator Author

I've changed the plotting function to use seaborn.
But building the input dataframe for seaborn.barplot takes a lot of time.

image

100 random graphs - 5 mins

@KirstieJane What is your opinion on that?

@KirstieJane
Member

@wingedRuslan can you split the timings into the "making random graphs", "network measures" and "plotting" steps? I suspect that the first two take a while, but the plotting should be very quick.
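
One way to get that breakdown is to time each step separately. The sketch below uses only the standard library; the three workload functions are placeholders standing in for the real graph generation, measure calculation and plotting code, and all names are hypothetical.

```python
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def timed(step):
    """Record the wall-clock duration of the enclosed block under `step`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[step] = time.perf_counter() - start


# Placeholder workloads; swap in the real calls when profiling.
def make_random_graphs(n):
    return [list(range(1000)) for _ in range(n)]


def compute_measures(graph):
    return {"mean": sum(graph) / len(graph)}


def plot(measures):
    return len(measures)


with timed("making random graphs"):
    graphs = make_random_graphs(100)
with timed("network measures"):
    measures = [compute_measures(g) for g in graphs]
with timed("plotting"):
    plot(measures)

for step, seconds in timings.items():
    print(f"{step}: {seconds:.4f} s")
```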

@KirstieJane
Member

(plot looks great btw!)

@wingedRuslan
Collaborator Author

@KirstieJane ohhhhh, yesterday was a bad day for me in terms of attention, so many small mistakes

You are right about the timings for the different operations!
Yesterday I plotted the measures without having first calculated them for the 100 random graphs, so the plotting function was calculating those values itself; that's why it took 5 mins.

If we pass a GraphBundle with already calculated measures, plotting takes less than 1 sec for 100 random graphs.
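
The general pattern is "compute once, plot many times". The sketch below is a generic illustration only: `MeasureCache` is a made-up class and does not reflect how GraphBundle stores its measures.

```python
class MeasureCache:
    """Compute each graph's measures at most once and reuse the result."""

    def __init__(self, compute):
        self._compute = compute   # function: graph -> measures
        self._results = {}        # graph_id -> cached measures
        self.calls = 0            # how many times compute actually ran

    def get(self, graph_id, graph):
        if graph_id not in self._results:
            self.calls += 1
            self._results[graph_id] = self._compute(graph)
        return self._results[graph_id]
```

A plotting function that reads from such a cache only pays the calculation cost the first time each graph is seen; later calls are just dictionary lookups.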

image

image

Yeah, I have finally addressed all the comments and am ready to submit my commits to PR #121
