
Support for columns contains only numbers. #737

Open: wants to merge 1 commit into main


gszecsenyi commented Nov 7, 2023

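When the column names consist only of numbers, the code fails. This typically happens after scaling the data, for example when a scaler's NumPy output is wrapped in a pandas DataFrame without passing column names, which produces integer labels 0, 1, 2, ... In that case the first column name is a numeric value rather than a string, and a string cannot be appended to it later.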
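A minimal, stdlib-only sketch of the failure, assuming integer column labels like the ones pandas assigns by default. `columns`, `head_values` and `build_hash_input` are hypothetical stand-ins for `self.columns`, `data.head(5)` and the seed-string construction in RDT's `_set_seed`; they are not part of the RDT API:

```python
# Hypothetical stand-ins for RDT's self.columns and data.head(5);
# integer labels mimic a DataFrame built from a scaled NumPy array.
columns = [0, 1, 2]
head_values = [-0.52, 1.13, 0.08, -1.71, 0.96]


def build_hash_input(columns, values):
    """Hypothetical mirror of the unpatched seed-string construction."""
    hash_value = columns[0]  # an int when the column labels are numeric
    for value in values:
        hash_value += str(value)  # int += str raises TypeError
    return hash_value


try:
    build_hash_input(columns, head_values)
except TypeError as exc:
    print(f"TypeError: {exc}")
```

With string column names the same loop simply concatenates, e.g. `build_hash_input(['a'], [1, 2])` returns `'a12'`.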

The executed commands are:

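from sdv.lite import SingleTablePreset

# `metadata` describes test_df, whose column names are numbers only
synthesizer = SingleTablePreset(
    metadata,
    name='FAST_ML'
)

synthesizer.fit(
    data=test_df
)

synthetic_data = synthesizer.sample(
    num_rows=500
)

synthetic_data.head()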

The output (traceback excerpt):

File ~/GitHub/ml_network_analysis_experiments/.venv/lib/python3.9/site-packages/rdt/transformers/base.py:367, in BaseTransformer._set_seed(self, data)
    365 hash_value = self.columns[0]
    366 for value in data.head(5):
--> 367     hash_value += str(value)
    369 hash_value = int(hashlib.sha256(hash_value.encode('utf-8')).hexdigest(), 16)
    370 self.random_seed = hash_value % ((2 ** 32) - 1)  # maximum value for a seed

This is why the following two modifications are needed. Both convert the column names to strings before they are used in string operations:

self.column_prefix = '#'.join(map(str, self.columns))

and

hash_value = str(self.columns[0])
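A standalone sketch of the two patched lines, assuming the seed logic shown in the traceback above. `make_column_prefix` and `compute_seed` are hypothetical helpers written for illustration; in RDT the code lives inside `BaseTransformer` and operates on `self.columns`:

```python
import hashlib


def make_column_prefix(columns):
    """Join column names with '#', coercing each one to str first."""
    return '#'.join(map(str, columns))


def compute_seed(columns, head_values):
    """Seed derivation following base.py lines 365-370, with the str() fix."""
    hash_value = str(columns[0])  # coerce so string concatenation works
    for value in head_values:
        hash_value += str(value)
    digest = hashlib.sha256(hash_value.encode('utf-8')).hexdigest()
    # Reduce the 256-bit hash modulo 2**32 - 1, as the original code does
    return int(digest, 16) % ((2 ** 32) - 1)


print(make_column_prefix([0, 1, 2]))  # 0#1#2
print(compute_seed([0], [-0.52, 1.13, 0.08]))
```

Without `map(str, ...)`, `'#'.join([0, 1, 2])` also raises a `TypeError`, because `str.join` only accepts strings. One side effect of the coercion is that a column named `0` and a column named `'0'` produce the same prefix and the same seed.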

gszecsenyi requested a review from a team as a code owner November 7, 2023 21:54
gszecsenyi requested review from lajohn4747 and removed request for a team November 7, 2023 21:54
gszecsenyi changed the title Support for columns containing only numbers. Support for columns contains only numbers. Nov 7, 2023
sdv-dev deleted a comment from CLAassistant Nov 8, 2023
npatki (Contributor) commented Nov 8, 2023

Hello! Thanks for your interest in contributing to the SDV software. Before we are able to review or approve your code changes, we require that you read and sign our new Contributor License Agreement (CLA).

To request a CLA, please fill out the required information in this form: https://bit.ly/sdv-cla-form

Once we receive your submission, we'll get back to you with more details. Thanks, and let us know if you have any questions.

gszecsenyi (Author) commented:

Thank you, I submitted my contact data.

gszecsenyi (Author) commented:

Is there any update? :)

Labels: None yet
Projects: None yet
Development: no linked issues
2 participants