Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changes in functions for generation and an example of vertical plot with additional charts #182

Open
wants to merge 24 commits into
base: master
Choose a base branch
from

Conversation

ennanco
Copy link

@ennanco ennanco commented Feb 24, 2022

I have slightly modified the functions generate_samples and generate_counts to get the data to develop a vertical Upsetplot with additional charts attached to it.

Copy link
Owner

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any chance you feel like extending this to close #52 at the same time? (I.e. add some tests for these functions)

@jnothman
Copy link
Owner

And thank you for contributing!

@ennanco
Copy link
Author

ennanco commented Feb 28, 2022 via email

@ennanco
Copy link
Author

ennanco commented Feb 28, 2022

Already added the unitary tests for those two functions. You should review them but I think that you can close issue #52

@ennanco
Copy link
Author

ennanco commented Mar 2, 2022

I have made some additional changes, but some examples failed due to their usage of the property name which, by the way, can cause like in this case problems when we change from a single column view to a multi-column result

@jnothman
Copy link
Owner

jnothman commented Mar 2, 2022

With luck, I'll have time to look at this next week, @ennanco

@jnothman
Copy link
Owner

The failures currently in CI pertain to our continued support for very old version of Python (time to drop them, I think), and PEP8 violations in your contribution. Here are those PEP8 issues:

./upsetplot/data.py:40:31: E228 missing whitespace around modulo operator
./upsetplot/data.py:40:40: E225 missing whitespace around operator
./upsetplot/data.py:40:80: E501 line too long (82 > 79 characters)
./upsetplot/data.py:45:19: E228 missing whitespace around modulo operator
./upsetplot/data.py:45:28: E231 missing whitespace after ','
./upsetplot/data.py:49:26: E228 missing whitespace around modulo operator
./upsetplot/tests/test_data.py:210:1: E302 expected 2 blank lines, found 1
./upsetplot/tests/test_data.py:228:8: E111 indentation is not a multiple of 4
./upsetplot/tests/test_data.py:232:8: E111 indentation is not a multiple of 4
./upsetplot/tests/test_data.py:233:8: E111 indentation is not a multiple of 4
./upsetplot/tests/test_data.py:233:34: E231 missing whitespace after ','
./upsetplot/tests/test_data.py:234:8: E111 indentation is not a multiple of 4
./upsetplot/tests/test_data.py:237:48: E231 missing whitespace after ','
./upsetplot/tests/test_data.py:238:47: E231 missing whitespace after ','
./upsetplot/tests/test_data.py:239:53: E231 missing whitespace after ','
./upsetplot/tests/test_data.py:239:80: E501 line too long (80 > 79 characters)
./upsetplot/tests/test_data.py:243:80: E501 line too long (84 > 79 characters)
./upsetplot/tests/test_data.py:255:55: E226 missing whitespace around arithmetic operator
./upsetplot/tests/test_data.py:258:47: E231 missing whitespace after ','
./upsetplot/tests/test_data.py:259:29: E211 whitespace before '('
./upsetplot/tests/test_data.py:261:80: E501 line too long (86 > 79 characters)
./upsetplot/tests/test_data.py:267:1: W391 blank line at end of file
./examples/plot_vertical.py:31:80: E501 line too long (82 > 79 characters)
./examples/plot_vertical.py:33:80: E501 line too long (106 > 79 characters)
./examples/plot_vertical.py:39:1: W391 blank line at end of file

I could just adopt black to make resolving these simpler...

@jnothman
Copy link
Owner

Thanks for your efforts here! Can we make a point of keeping the existing behaviour of generate_counts stable and backwards compatible? That is, unless extra columns are requested, the user should not have to go .value. Should we call the new generation parameter extra_columns?

@ennanco
Copy link
Author

ennanco commented Mar 19, 2022 via email

@ennanco
Copy link
Author

ennanco commented Mar 19, 2022

I have finally make the proposed changes with the extra_columns parameter instead of len_sample in generate_counts

Copy link
Owner

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do the same in generate_samples? Could you please update the docstring? Thanks

…consistency to use extra_columns in order to keep retrocompatibility
@ennanco
Copy link
Author

ennanco commented Mar 22, 2022

Yes, I thought about it, however, I don't want to change it without your indication. Sorry about the docstring, I have already updated it.

Copy link
Owner

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally reviewing this! It's looking pretty good, but I'm wondering about:

  • the use of the name value{i}, or whether there's a distinctly better column name
  • whether we should consider generating values in a way that varies with the categories, such that the visualisations show covariance

upsetplot/data.py Outdated Show resolved Hide resolved
upsetplot/data.py Outdated Show resolved Hide resolved
upsetplot/data.py Outdated Show resolved Hide resolved
upsetplot/data.py Outdated Show resolved Hide resolved
examples/plot_vertical.py Outdated Show resolved Hide resolved
ennanco and others added 5 commits January 2, 2023 10:01
Co-authored-by: Joel Nothman <joeln@canva.com>
Co-authored-by: Joel Nothman <joeln@canva.com>
Co-authored-by: Joel Nothman <joeln@canva.com>
Co-authored-by: Joel Nothman <joeln@canva.com>
Co-authored-by: Joel Nothman <joeln@canva.com>
df = pd.DataFrame({'value': np.zeros(n_samples)})
len_samples = 1 + extra_columns
df = pd.DataFrame(np.zeros((n_samples, len_samples)))
valuename_lst = [f'value{i}' if i > 0 else 'value' for i in
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just call this variable columns or column_names?

extra_columns=extra_columns)
df.drop('index', axis=1, inplace=True)
df = df if extra_columns > 0 else df.value
return df.groupby(level=list(range(n_categories))).count()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think counting is meaningful for the extra columns. Maybe we should use a different aggregate?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe we shouldn't offer this functionality in generate_counts, making things somewhat simpler.

Comment on lines +50 to +51
r = rng.rand(n_samples, len_samples)
df[f'cat{i}'] = r[:, 0] > rng.rand()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This puzzles me. We're only using the first column of a random matrix of values, and extra_columns is unused.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't worry about making the values correlate with the categories. Just put in the docstring that the extra column values may change in a future version so we have licence to do it later.

#########################################################################
# An UpSetplot with additional plots on vertical
# and tuning some visual parameters
example = generate_counts(extra_columns=2)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using generate_samples here makes more sense? But maybe 10k samples is a lot for three swam plots.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants