csvlink appears to hang during training #103

Open
dougscc opened this issue Jun 21, 2023 · 0 comments


dougscc commented Jun 21, 2023

csvlink hangs a few seconds into training and then sits at 0.0% CPU.

  • Python version: 3.7.3
  • Environment: CentOS

CSV Files to Match

$ wc -l train-*
   494 train-left.csv
   481 train-right.csv

Config file

Attempting to match on 8 fields.

{
 "field_names": [
  "state",
  "email",
  "address_2",
  "address_1",
  "county",
  "postal_code",
  "city",
  "name"
 ],
 "field_definition": [
  {
   "field": "state",
   "type": "String",
   "Has Missing": true
  },
  {
   "field": "email",
   "type": "String",
   "Has Missing": true
  },
  {
   "field": "address_2",
   "type": "String",
   "Has Missing": true
  },
  {
   "field": "address_1",
   "type": "String",
   "Has Missing": true
  },
  {
   "field": "county",
   "type": "String",
   "Has Missing": true
  },
  {
   "field": "postal_code",
   "type": "String",
   "Has Missing": true
  },
  {
   "field": "city",
   "type": "String",
   "Has Missing": true
  },
  {
   "field": "name",
   "type": "String",
   "Has Missing": true
  }
 ],
 "output_file": "deduped.csv",
 "skip_training": false,
 "training_file": false,
 "sample_size": 150000,
 "recall_weight": 2
}
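The `field_names` list above has 8 entries. To count the fields and cross-check that `field_names` and `field_definition` agree, a small script like the one below can help. It is a minimal sketch: `check_config` is a hypothetical helper, not part of csvlink, and it only checks the config's internal consistency, not whether csvlink accepts every key.

```python
from collections import Counter

# Field names copied from the config in this issue. To check the real file,
# use instead: import json; config = json.load(open("config.json"))
FIELDS = ["state", "email", "address_2", "address_1",
          "county", "postal_code", "city", "name"]
config = {
    "field_names": FIELDS,
    "field_definition": [
        {"field": f, "type": "String", "Has Missing": True} for f in FIELDS
    ],
}


def check_config(config):
    """Return consistency problems between field_names and field_definition."""
    problems = []
    names = list(config.get("field_names", []))
    defined = [d["field"] for d in config.get("field_definition", []) if "field" in d]

    # A name listed twice in either section is most likely a copy/paste slip.
    for section, values in (("field_names", names), ("field_definition", defined)):
        for name, count in Counter(values).items():
            if count > 1:
                problems.append("%s lists %r %d times" % (section, name, count))

    # Each definition should refer to a listed column, and vice versa.
    for name in sorted(set(defined) - set(names)):
        problems.append("%r is defined but not in field_names" % name)
    for name in sorted(set(names) - set(defined)):
        problems.append("%r is in field_names but has no definition" % name)
    return problems


print(len(config["field_names"]), "fields listed")  # prints: 8 fields listed
print(check_config(config) or "no problems found")  # prints: no problems found
```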

Command

Running csvlink with the following:

csvlink train-left.csv train-right.csv --config_file=config.json --inner_join
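Before rerunning, one cheap sanity check is that both input files actually have a header column for every configured field. This is not a confirmed cause of the hang, only a sketch under the assumption that the first row of each CSV is its header; `read_header`, `missing_columns`, and the script name `check_headers.py` are hypothetical.

```python
import csv
import sys

# Column names taken from "field_names" in the config above.
FIELD_NAMES = ["state", "email", "address_2", "address_1",
               "county", "postal_code", "city", "name"]


def read_header(path):
    """Return the first row of a CSV file (utf-8-sig drops a leading BOM if present)."""
    with open(path, newline="", encoding="utf-8-sig") as f:
        return next(csv.reader(f), [])


def missing_columns(header, field_names):
    """Return the configured field names that do not appear in a header row."""
    present = {col.strip() for col in header}
    return [name for name in field_names if name not in present]


if __name__ == "__main__":
    # Usage: python check_headers.py train-left.csv train-right.csv
    for path in sys.argv[1:]:
        missing = missing_columns(read_header(path), FIELD_NAMES)
        print(path, ("missing: " + ", ".join(missing)) if missing else "all fields present")
```

Stripping whitespace from header cells is a deliberate choice so that `"email "` still matches `email`; drop the `.strip()` if you want an exact-match check.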

After an initial burst of CPU activity, the process goes idle and stays there:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
14191 somebody+  20   0  558660 143092  10144 S   0.0  0.9   0:52.45 csvlink
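The `S` in top's state column means interruptible sleep: the process is blocked waiting for something (for example terminal input, a pipe, or a lock) rather than doing work. A minimal, Linux-only sketch for reading the same state plus the kernel wait channel from `/proc` is below. `parse_stat_state` and `describe` are hypothetical helpers, and the PID 14191 is simply the one from the `top` output above.

```python
import sys

# Meaning of the one-letter states Linux reports in /proc/<pid>/stat.
STATES = {
    "R": "running",
    "S": "sleeping (interruptible wait, e.g. for input or a lock)",
    "D": "uninterruptible sleep (usually disk I/O)",
    "T": "stopped",
    "Z": "zombie",
}


def parse_stat_state(stat_line):
    """Return the state letter from the contents of /proc/<pid>/stat."""
    # Field 2 is the command name in parentheses, which may itself contain
    # spaces or ')', so split on the last ')' before reading field 3.
    return stat_line.rsplit(")", 1)[1].split()[0]


def describe(pid):
    """Return (state letter, description, kernel wait channel) for a PID."""
    with open("/proc/%d/stat" % pid) as f:
        state = parse_stat_state(f.read())
    try:
        # wchan names the kernel function the task is blocked in; it may read
        # "0" when the task is running or the kernel hides the symbol.
        with open("/proc/%d/wchan" % pid) as f:
            wchan = f.read().strip() or "?"
    except OSError:
        wchan = "?"
    return state, STATES.get(state, "other"), wchan


if __name__ == "__main__":
    # Usage: python proc_state.py 14191
    for arg in sys.argv[1:]:
        print(arg, *describe(int(arg)))
```

A wait channel mentioning `tty` would suggest the process is waiting on terminal input rather than stuck in a loop. If py-spy is installed, `py-spy dump --pid 14191` shows the Python stack directly, which pinpoints the line csvlink is blocked on.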

Am I doing something wrong?
