annotate_metadata_with_index raises ValueError when dtypes don't match #948

Open
joverlee521 opened this issue May 13, 2022 · 0 comments · May be fixed by #952
Labels: bug (Something isn't working), easy problem (Requires less work than most issues)

Comments

@joverlee521
Contributor

Current Behavior
Within annotate_metadata_with_index, the index TSV is read without explicitly set dtypes. When strain names are purely numeric IDs, pandas infers the index's strain column as a number type (e.g. int64) instead of a string. Merging that column with the metadata's string (object) strain column then raises a ValueError.
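A minimal sketch that reproduces the failure with pandas alone. It assumes the metadata strain column is read as strings (object dtype) while the index is read with pandas' default type inference; the TSV contents are hypothetical and this is not augur's actual code path:

```python
import io

import pandas as pd

# Hypothetical inputs: every strain name is a purely numeric ID.
METADATA_TSV = "strain\tcountry\n1001\tUSA\n1002\tMexico\n"
INDEX_TSV = "strain\tlength\n1001\t10\n1002\t12\n"

# Assumption: metadata strain names are kept as strings (object dtype).
metadata = pd.read_csv(io.StringIO(METADATA_TSV), sep="\t", dtype={"strain": object})

# The index is read without explicit dtypes, so pandas infers an integer dtype.
index = pd.read_csv(io.StringIO(INDEX_TSV), sep="\t")
print(index["strain"].dtype)

merge_error = None
try:
    metadata.merge(index, on="strain", how="left")
except ValueError as error:
    # pandas refuses to merge string-like keys with integer keys.
    merge_error = error

print(merge_error)
```

The printed error should resemble the "merge on object and int64 columns" message in the linked report.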

Additional context
See error reported by user: https://discussion.nextstrain.org/t/value-error-trying-to-merge-on-object-and-int64-columns/1106

joverlee521 added the bug label May 13, 2022
huddlej self-assigned this May 25, 2022
victorlin added the easy problem label May 25, 2022
huddlej added a commit that referenced this issue May 25, 2022
Adds a functional test to cover a use case described in #948.
huddlej added a commit that referenced this issue May 25, 2022
Sets the dtype of the strain column in the sequence index to "string"
prior to annotating metadata with that index. This change prevents
pandas from inferring the dtype as numeric when strain names are all
numeric.

Fixes #948
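A minimal sketch of the approach described in the commit message, using only pandas. The inputs are hypothetical, and reading the index with an explicit "string" dtype at read time (rather than casting after reading) is an assumption, not necessarily how the actual patch does it:

```python
import io

import pandas as pd

# Hypothetical inputs: numeric-looking strain names, one with a leading zero.
METADATA_TSV = "strain\tcountry\n1001\tUSA\n007\tMexico\n"
INDEX_TSV = "strain\tlength\n1001\t10\n007\t12\n"

# Assumption: metadata strain names are kept as strings (object dtype).
metadata = pd.read_csv(io.StringIO(METADATA_TSV), sep="\t", dtype={"strain": object})

# Read the index with an explicit string dtype for "strain" so pandas
# does not infer a numeric type for all-numeric strain names.
index = pd.read_csv(io.StringIO(INDEX_TSV), sep="\t", dtype={"strain": "string"})

# Both key columns now hold strings, so the merge succeeds.
annotated = metadata.merge(index, on="strain", how="left")
print(annotated)
```

Setting the dtype at read time matters for names with leading zeros: if pandas first infers "007" as the integer 7, a later .astype("string") yields "7", and that row no longer matches its metadata.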
huddlej linked a pull request May 25, 2022 that will close this issue
huddlej added a commit that referenced this issue Aug 5, 2022
Adds a functional test to cover a use case described in #948.
huddlej added a commit that referenced this issue Aug 5, 2022
Sets the dtype of the strain column in the sequence index to "string"
prior to annotating metadata with that index. This change prevents
pandas from inferring the dtype as numeric when strain names are all
numeric.

Fixes #948
Status: In Review
3 participants