Getting Error while running csv example for my file #136

Open
purnima1612 opened this issue Apr 15, 2024 · 0 comments

Hello all,
I am trying to run the csv example on my own file, which has 850 records. I am also trying to find duplicates using a custom comparison function based on Levenshtein distance. The goal is to group all names that match each other by more than 80% under one entity_num.
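
For readers following along, a Levenshtein-based name comparison of the kind described above could look roughly like the sketch below. This is only an illustration and not the function from the original script: the names `levenshtein`, `name_similarity`, and `is_match`, and the choice to normalise by the longer string's length, are all assumptions. How such a function is registered with dedupe is not shown here.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means identical (hypothetical normalisation)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def is_match(a: str, b: str, threshold: float = 0.8) -> bool:
    """True when two names are more than `threshold` similar (assumed 80% rule)."""
    return name_similarity(a, b) > threshold
```

With this definition, "jon smith" and "john smith" differ by one edit over ten characters, so they would count as a match under the 80% rule.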

While preparing the data I changed the sample size to 50:
deduper.prepare_training(data_d, sample_size=50)

After I finish labeling I get the following error:


Traceback (most recent call last):
  File "C:\Python_Projects\Python_extra_code\csv_example.py", line 132, in <module>
    deduper.train()
  File "C:\Dev\Python3.11\Lib\site-packages\dedupe\api.py", line 1215, in train
    self.predicates = self.active_learner.learn_predicates(recall, index_predicates)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Dev\Python3.11\Lib\site-packages\dedupe\labeler.py", line 397, in learn_predicates
    return self.blocker.learn_predicates(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Dev\Python3.11\Lib\site-packages\dedupe\labeler.py", line 136, in learn_predicates
    return self.block_learner.learn(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Dev\Python3.11\Lib\site-packages\dedupe\training.py", line 72, in learn
    candidate_cover = self.random_forest_candidates(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Dev\Python3.11\Lib\site-packages\dedupe\training.py", line 112, in random_forest_candidates
    sample_predicates = random.sample(predicates, pred_sample_size)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Dev\Python3.11\Lib\random.py", line 453, in sample
    raise ValueError("Sample larger than population or is negative")
ValueError: Sample larger than population or is negative

Process finished with exit code 1
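
The last frame of the traceback is the standard library's `random.sample`, which raises this `ValueError` whenever the requested sample size is larger than the population (or negative). In the frame above it, that means `pred_sample_size` was larger than the number of items in `predicates`. The sketch below reproduces the behaviour outside dedupe; `sample_at_most` is a hypothetical helper that only illustrates clamping the sample size and is not part of dedupe. Whether the small `sample_size=50` is what leaves too few predicates here is an assumption, not something the traceback confirms.

```python
import random


def sample_at_most(population, k):
    """Return a random sample of up to k items, never more than are available."""
    items = list(population)
    return random.sample(items, min(k, len(items)))


# Hypothetical stand-ins for a small set of candidate predicates.
predicates = ["predicate_a", "predicate_b", "predicate_c"]

# Asking for more items than exist raises the same ValueError as in the traceback.
raised = False
try:
    random.sample(predicates, 5)
except ValueError:
    raised = True

# Clamping the requested size avoids the error and returns everything available.
safe_sample = sample_at_most(predicates, 5)
```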