Cardinality on column values #1682

gitsathish · 2020-06-15T15:31:40Z

Feature request

Is there a way to impose a cardinality on a column's values?
For example, generate 10000 rows with an Integer column.
The Integer column has a min of 1 and a max of 25000.
But there should be only 100 unique values of the integer across the 10000 rows.

Similar functionality for String would be useful as well.

tjohnson-scottlogic · 2020-06-24T10:20:17Z

Hi, thanks for reaching out. This can be achieved with the inSet constraint (see the User Guide or the example for further info), like this:

[tim@sn1 bin]$ cat profiles/cardinality.json
{
  "fields": [
    {
      "name": "an_integer",
      "type": "integer",
      "nullable": false
    }
  ],
  "constraints": [
    {
      "field": "an_integer",
      "inSet": "integer_set.csv"
    }
  ]
}

[tim@sn1 bin]$ cat profiles/integer_set.csv
1
25000
10
1000

[tim@sn1 bin]$ ./datahelix --profile-file=profiles/cardinality.json --max-rows 3 --quiet
an_integer
25000
1000
1

Would this approach work for you? This would work for any data type.
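
For the original request (100 unique values between 1 and 25000), the set file would need 100 entries rather than the four shown above. One way to produce it is a small script; this is only a sketch and not part of DataHelix. The function name `write_value_set` and its parameters are hypothetical, and the file name `integer_set.csv` is taken from the profile above.

```python
import csv
import random


def write_value_set(path, count=100, low=1, high=25000, seed=None):
    """Write `count` distinct integers from [low, high] to `path`, one per line.

    Hypothetical helper: the output is meant to be referenced by an
    inSet constraint, as in the cardinality.json profile above.
    """
    rng = random.Random(seed)
    # random.sample draws without replacement, so all values are distinct.
    values = rng.sample(range(low, high + 1), count)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for value in values:
            writer.writerow([value])
    return values


if __name__ == "__main__":
    # Assumed location: run from the directory that holds the profile.
    write_value_set("integer_set.csv", count=100, low=1, high=25000)
```

Because every generated value is drawn from the set, the column can contain at most 100 distinct values. With 10000 rows most or all of them are likely to appear, but the set alone does not guarantee that each one is used.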

gitsathish added the enhancement label Jun 15, 2020
