Support for pandas datatypes #179

Open
devmcp opened this issue Jul 26, 2022 · 0 comments

devmcp commented Jul 26, 2022

Pandas extension datatypes, such as the nullable integer type pd.Int64Dtype, do not seem to be supported:

import pandas as pd
import recordlinkage
from recordlinkage.datasets import load_febrl4

dfA, dfB = load_febrl4()

# Convert column types to pandas nullable integer (Int64):
dfA.postcode = pd.to_numeric(dfA.postcode).convert_dtypes()
dfB.postcode = pd.to_numeric(dfB.postcode).convert_dtypes()

# Indexation step
indexer = recordlinkage.Index()
indexer.block("given_name")
candidate_links = indexer.index(dfA, dfB)

# Comparison step
compare_cl = recordlinkage.Compare()
compare_cl.numeric("postcode", "postcode", label="postcode")

features = compare_cl.compute(candidate_links, dfA, dfB)

gives the error:

TypeError: Cannot interpret 'Int64Dtype()' as a data type
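
The message resembles the one NumPy raises when it is handed a pandas extension dtype, so the failure can probably be narrowed down without recordlinkage. The sketch below rests on that assumption (it has not been checked against the recordlinkage source). It uses a small made-up Series in place of the postcode column and shows one possible workaround: casting to float64 before comparing.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the converted postcode column (values are made up).
postcode = pd.Series([2000, None, 3056], dtype="Int64")

# NumPy cannot interpret the pandas extension dtype as a NumPy dtype.
try:
    np.dtype(postcode.dtype)
    numpy_accepts_dtype = True
except TypeError:
    numpy_accepts_dtype = False

# Possible workaround: cast to float64 so that <NA> becomes NaN.
postcode_float = postcode.astype("float64")
```

Casting dfA.postcode and dfB.postcode the same way before calling compare_cl.compute may avoid the error, at the cost of losing the nullable integer type.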
devmcp changed the title from "Support pandas datatypes" to "Support for pandas datatypes" on Jul 26, 2022