This repository has been archived by the owner on Apr 14, 2023. It is now read-only.

Cardinality on column values #1682

Open
gitsathish opened this issue Jun 15, 2020 · 1 comment
Labels
enhancement Issues related to improving the codebase, the documentation or process within the project

Comments

@gitsathish

Feature request

Is there a way to impose a cardinality on a column, i.e. to limit the number of distinct values it contains?
For example, generate 10000 rows with an Integer column.
The Integer column has a min of 1 and a max of 25000.
But there should be only 100 unique values of the integer across the 10000 rows.

Similar functionality for String columns would be useful as well.

@gitsathish gitsathish added the enhancement Issues related to improving the codebase, the documentation or process within the project label Jun 15, 2020
@tjohnson-scottlogic
Contributor

tjohnson-scottlogic commented Jun 24, 2020

Hi, thanks for reaching out. This can be achieved with the inSet constraint (see the User Guide or the example for further information), like this:

[tim@sn1 bin]$ cat profiles/cardinality.json
{
  "fields": [
    {
      "name": "an_integer",
      "type": "integer",
      "nullable": false
    }
  ],
  "constraints": [
    {
      "field": "an_integer",
      "inSet": "integer_set.csv"
    }
  ]
}
[tim@sn1 bin]$ cat profiles/integer_set.csv
1
25000
10
1000
[tim@sn1 bin]$ ./datahelix --profile-file=profiles/cardinality.json --max-rows 3 --quiet
an_integer
25000
1000
1

Would this approach work for you? It works for any data type, so the same technique applies to String columns.
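
For the numbers in the original request (100 unique values between 1 and 25000), the set file would need 100 lines rather than 4. A minimal sketch of one way to produce it, assuming the one-value-per-line CSV layout shown above; the function name write_value_set and its parameters are hypothetical and not part of DataHelix:

```python
import csv
import random


def write_value_set(path, low, high, unique_count, seed=None):
    """Write unique_count distinct integers from [low, high] to a one-column CSV.

    Hypothetical helper, not part of DataHelix. It only prepares the file
    that an inSet constraint would point at.
    """
    rng = random.Random(seed)
    # random.sample draws without replacement, so every value is distinct.
    # It raises ValueError if unique_count exceeds the size of the range.
    values = rng.sample(range(low, high + 1), unique_count)
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        for value in values:
            writer.writerow([value])
    return values
```

Calling write_value_set("profiles/integer_set.csv", 1, 25000, 100) and then running the profile with --max-rows 10000 should give at most 100 distinct values. Whether every one of the 100 actually appears depends on how the generator samples the set, so this caps the cardinality rather than guaranteeing it exactly.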
