get_column_plot produces misleading graphs (for uniform-like distributions) #302

Open
fealho opened this issue Feb 5, 2023 · 0 comments
Labels
feature request Request for a new feature

Comments

@fealho
Member

fealho commented Feb 5, 2023

get_column_plot produces histograms that represent the data loosely, especially at the edges.

The "Real" data in the get_column_plot graph and the matplotlib plot show the same data (ignore the synthetic data). In the get_column_plot graph, the edges always start at 0.5, which can be quite misleading.

[Image: Screenshot 2023-02-05 at 7.40.54 AM]

SDV code to generate the above:

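    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    from sdv.evaluation.single_table import get_column_plot
    from sdv.metadata import SingleTableMetadata
    from sdv.single_table import GaussianCopulaSynthesizer

    # Real data: 1000 samples drawn uniformly from [0, 1)
    data = pd.DataFrame({'col1': np.random.random(1000)})
    metadata = SingleTableMetadata()
    metadata.detect_from_dataframe(data)
    synthesizer = GaussianCopulaSynthesizer(metadata)

    # Fit the synthesizer and sample synthetic data
    synthesizer.fit(data)
    samples = synthesizer.sample(1000)
    print(samples)
    get_column_plot(data, samples, metadata, 'col1').show()

    # Plot the same real data with matplotlib for comparison
    plt.hist(data['col1'], bins=50)
    plt.ylabel('some numbers')
    plt.show()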
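
To check the edge behavior independently of any plotting library, one option is to bin the real data directly with numpy. The sketch below is only an illustration, not part of SDV: the seed, the variable names and the choice of 50 bins over the observed range are assumptions made here. For uniform-like data, the first and last bins should hold roughly as many points as the interior bins.

```python
import numpy as np

# Assumption: a fixed seed, used only to make this illustration repeatable.
rng = np.random.default_rng(0)
values = rng.random(1000)  # uniform samples in [0, 1)

# Bin over the observed range so the outer bins end exactly at min and max.
counts, edges = np.histogram(values, bins=50, range=(values.min(), values.max()))

# Compare the two edge bins against the average interior bin.
interior_mean = counts[1:-1].mean()
print('first bin:', counts[0])
print('last bin:', counts[-1])
print('mean interior bin:', interior_mean)
```

If the edge bins are close to the interior mean here but the plotted curve drops sharply at the edges, the drop comes from how the plot is drawn rather than from the data.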
fealho added the "new" label Feb 5, 2023
npatki added the "feature request" label and removed the "new" label Mar 21, 2023
npatki changed the title from "get_column_plot produces misleading graphs" to "get_column_plot produces misleading graphs (for uniform-like distributions)" Jun 29, 2023