PARSynthesizer is synthesizing integers for the sequence_key column when source data is text #1880

Open
srinify opened this issue Mar 29, 2024 · 0 comments
Labels
bug Something isn't working data:sequential Related to timeseries datasets

srinify commented Mar 29, 2024

Originally discovered here: #1875

Environment Details

SDV 1.10 (and 1.11 too)

Problem Statement

When attempting to synthesize sequential data, if you update a text column (e.g. stock tickers like AAPL, MSFT, etc.) to the `id` sdtype and set it as the `sequence_key`, the synthesized values for that column aren't text values but numerical ones:

image

Correct Behavior

We should match what we do in SingleTable, where synthesized text IDs are clearly text values:

id
__
synth-001
synth-002
synth-003

Workaround

For now, we recommend manually setting a regular expression for this ID column to let the SDV libraries know that we expect text here:

# Declare the expected text format for the sequence key column
metadata.update_column(
    column_name='Name',
    sdtype='id',
    regex_format='[A-Z]{4}'
)
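Before relying on the workaround, it may help to confirm that the chosen regex actually covers the real keys, and to check whether synthesized keys come back as text. Below is a minimal, standard-library-only sketch of such a check. The helper `find_mismatched_keys` and the sample values are hypothetical and are not part of SDV; the integer values only stand in for the kind of output described in this issue.

```python
import re


def find_mismatched_keys(values, regex_format):
    """Return the values whose string form does not fully match regex_format."""
    pattern = re.compile(regex_format)
    return [value for value in values if not pattern.fullmatch(str(value))]


# Hypothetical sample values, for illustration only.
real_keys = ["AAPL", "MSFT", "AMZN", "IBM"]
synthetic_keys = [0, 1, 2]  # stand-in for integer sequence keys

# A three-letter ticker is not covered by a four-letter pattern.
unmatched_real = find_mismatched_keys(real_keys, r"[A-Z]{4}")
# Integer keys never match a letters-only pattern.
unmatched_synth = find_mismatched_keys(synthetic_keys, r"[A-Z]{4}")

print(unmatched_real)   # ['IBM']
print(unmatched_synth)  # [0, 1, 2]
```

If the real column contains keys of different lengths, a wider pattern such as `[A-Z]{1,5}` may be a better fit than `[A-Z]{4}`; that choice depends on your data.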