Generate value for a nullable column with a percentage #1704

semisft · 2020-09-10T06:20:46Z

Some column values must be filled for a given percentage of rows; for example, one field must be filled in 10% of rows and another in 30%, within the same profile.
For 10%, I tried adding a field backed by a weighted inSet file and using it in an if statement, but the results seem to give 50%.
How can I configure this?

percent10.csv

1,10
0,90

profile.json

{
	"fields": [
		{
			"name": "percent10",
			"type": "integer"
		},
		{
			"name": "name",
			"type": "firstname",
			"nullable": true
		}
	],
	"constraints": [
		{
			"field": "percent10",
			"inSet": "percent10.csv"
		},
		{
			"if": {
				"field": "percent10",
				"equalTo": 1
			},
			"then": {
				"field": "name",
				"isNull": false
			},
			"else": {
				"field": "name",
				"isNull": true
			}
		}
	]
}

Tom-hayden · 2020-09-11T11:17:02Z

Hi @semisft, this appears to be a bug in DataHelix. I have raised an issue for it here: #1705

sl-slaing · 2021-01-14T09:56:24Z

I've tried this with the above profile against the latest version of the code (to verify whether the issue still exists). An example of the output (30 rows) is below:

percent10,name
1,Rory
1,Lily
1,Finn
0,
0,
0,
0,
0,
1,Amelia
1,Thea
1,Zara
1,Christina
1,Jake
0,
1,Maya
1,Liam
0,
1,Zac
1,Hamish
0,
0,
0,
0,
1,Lila
0,
0,
0,
1,Frank
0,
1,Phoebe

This shows a 50% spread of each of the values for percent10 (15 rows each), where there should be 10% (3 rows) with 1 and 90% (27 rows) with 0, as weighted in percent10.csv. The issue is confirmed to still be valid; I will investigate further.
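As a rough check of the numbers, the sketch below tallies the percent10 column from the 30 sample rows and compares it with the counts implied by the weights in percent10.csv (value 1 with weight 10, value 0 with weight 90). The helper name `expected_counts` is made up for this illustration and is not part of DataHelix.

```python
from collections import Counter

# percent10 column copied from the 30 sample rows above, in order
observed_column = [
    1, 1, 1, 0, 0, 0, 0, 0, 1, 1,
    1, 1, 1, 0, 1, 1, 0, 1, 1, 0,
    0, 0, 0, 1, 0, 0, 0, 1, 0, 1,
]

# Weights from percent10.csv: value -> weight
weights = {1: 10, 0: 90}


def expected_counts(weights, rows):
    """Hypothetical helper: rows each value should get if weights were honoured."""
    total = sum(weights.values())
    return {value: rows * weight / total for value, weight in weights.items()}


observed = Counter(observed_column)
expected = expected_counts(weights, len(observed_column))

for value in weights:
    print(f"percent10={value}: observed {observed[value]}, expected {expected[value]:g}")
```

With these inputs the observed split is 15/15, against an expected 3 rows for 1 and 27 rows for 0.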

sl-slaing · 2021-01-14T12:11:24Z

Investigation:
In RandomRowSpecDecisionTreeWalker, a list of rowSpecs is generated representing the rows that can be produced. For this profile they are:

name=not null & in (names) and percent10=not null & in (1)
name=null and percent10=not null & in (0)

The generator then randomly selects between the two items above to generate rows. However, the items carry no weighting (which could have been inherited from the values for percent10), so the generator produces a random but even spread of rows from the two specs.
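The effect described above can be reproduced outside the generator. The following is a minimal sketch, assuming the walker does nothing more than pick one of the two row specs uniformly at random; `pick_uniform` and the seed are illustrative and are not taken from the DataHelix code.

```python
import random

# The two row specs listed above, as plain strings for illustration
row_specs = [
    "name=not null & in (names) and percent10=not null & in (1)",
    "name=null and percent10=not null & in (0)",
]


def pick_uniform(rng, specs):
    """Assumed behaviour: every spec is equally likely, weights are ignored."""
    return rng.choice(specs)


rng = random.Random(1704)  # fixed seed so the run is repeatable
rows = 10_000
hits = sum(pick_uniform(rng, row_specs) == row_specs[0] for _ in range(rows))
uniform_share = hits / rows

print(f"share of rows with percent10=1: {uniform_share:.3f}")
```

Under this assumption the share of rows with percent10=1 settles near 0.5, whatever weights the inSet file declares.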

Either of the below (or something more elegant) would be required:

The items above need to indicate their weighting, i.e. item1 = 10% and item2 = 90%, and this weighting is used in the getRandomRowSpec() method.
The items above are duplicated as many times as appropriate to create a representative spread, i.e. 9 copies of item2 for every 1 of item1. This gives a pool of row specs that can be randomly selected from.
something else
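The first two options can be sketched as follows. This is only an illustration of the two ideas, not the change that was made to DataHelix: `pick_weighted` and `expand_by_weight` are hypothetical names, the row specs are shortened to strings, and the duplication approach assumes integer weights.

```python
import random
from functools import reduce
from math import gcd

# (row spec, weight) pairs; weights taken from percent10.csv
weighted_specs = [
    ("name=not null, percent10=1", 10),  # item1
    ("name=null, percent10=0", 90),      # item2
]


def pick_weighted(rng, weighted):
    """Option 1: select a spec with probability proportional to its weight."""
    specs = [spec for spec, _ in weighted]
    weights = [weight for _, weight in weighted]
    return rng.choices(specs, weights=weights, k=1)[0]


def expand_by_weight(weighted):
    """Option 2: duplicate each spec so a uniform pick honours the weights.

    Assumes integer weights; they are reduced by their gcd to keep the pool small.
    """
    divisor = reduce(gcd, (weight for _, weight in weighted))
    return [spec for spec, weight in weighted for _ in range(weight // divisor)]


rng = random.Random(1705)  # fixed seed so the run is repeatable
rows = 10_000
item1 = weighted_specs[0][0]

share_weighted = sum(pick_weighted(rng, weighted_specs) == item1 for _ in range(rows)) / rows

pool = expand_by_weight(weighted_specs)  # 1 copy of item1, 9 copies of item2
share_pool = sum(rng.choice(pool) == item1 for _ in range(rows)) / rows

print(f"weighted pick: {share_weighted:.3f}, duplicated pool: {share_pool:.3f}")
```

Both approaches give item1 a share close to 10%. The weighted pick avoids growing the list, while the duplicated pool keeps the selection step unchanged at the cost of a larger list when weights share no common factor.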

Tom-hayden added the bug label Sep 11, 2020

Tom-hayden mentioned this issue Sep 11, 2020

Using an inSetConstraint with weights and an if constraint causes weights to be ignored #1705

Closed

sl-slaing added a commit that referenced this issue Jan 14, 2021

chore(#1704): Update string representation in notEqual constraint

2fc7da4

sl-slaing added a commit that referenced this issue Jan 14, 2021

fix(#1704): Add weighted decision analysis to generation

b3b6ec6

sl-slaing linked a pull request Jan 14, 2021 that will close this issue

#1704/#1705 weighted decision selection #1718

Draft

sl-slaing added a commit that referenced this issue Jan 18, 2021

chore(#1704): Add javaDoc to method

9cf6fe5

sl-slaing self-assigned this Jan 18, 2021

sl-slaing added a commit that referenced this issue Jan 18, 2021

chore(#1704): Minor refactor from code review comments

0d0e4cf

sl-slaing added a commit that referenced this issue Jan 18, 2021

chore(#1704): Refactor tests to use helper methods to clarify intent

0fef0a4

sl-slaing added a commit that referenced this issue Jan 18, 2021

chore(#1704): Refactor logic into specific functions

e94a048

sl-slaing added a commit that referenced this issue Jan 18, 2021

chore(#1704): Add comments to explain the purpose of the method statements

73bc213
