-
I'm trying to complete a dataset with all combinations of two columns ("date" and "country"). In this example, I have 3 distinct dates (stored as Javascript Date objects):
And 3 distinct country names:
What I need as a result is all the possible combinations of these, without filling in missing dates (i.e. I don't need Feb 1-4). Can I use the So far, I'm getting some unexpected results, so wanted to confirm before I keep digging Example DataStarting data:
Hoping for this result:
|
Beta Was this translation helpful? Give feedback.
Replies: 2 comments
-
Normal complete on "date" and "country" should work in this case if your dates are strings as shown above.
Produces [
{"date": "2020-01-01", "country": "Canada", "value": 100},
{"date": "2020-01-01", "country": "US", "value": 150},
{"date": "2020-01-01", "country": "UK", "value": 0},
{"date": "2020-02-05", "country": "Canada", "value": 200},
{"date": "2020-02-05", "country": "US", "value": 250},
{"date": "2020-02-05", "country": "UK", "value": 220},
{"date": "2020-03-14", "country": "Canada", "value": 300},
{"date": "2020-03-14", "country": "US", "value": 0},
{"date": "2020-03-14", "country": "UK", "value": 320}
] If you are working with actual tidy(input,
mutate({ dateValue: d => d.date.valueOf() }),
complete(
["dateValue", "country"],
{ value: 0 }
),
mutate({ date: d => d.date ?? new Date(d.dateValue) }),
select("-dateValue")
); This produces the following (although "date" are actual Date objects not strings): [
{"country": "Canada", "date": "2020-01-01T00:00:00.000Z", "value": 100},
{"country": "US", "date": "2020-01-01T00:00:00.000Z", "value": 150},
{"country": "UK", "date": "2020-01-01T00:00:00.000Z", "value": 0},
{"country": "Canada", "date": "2020-02-05T00:00:00.000Z", "value": 200},
{"country": "US", "date": "2020-02-05T00:00:00.000Z", "value": 250},
{"country": "UK", "date": "2020-02-05T00:00:00.000Z", "value": 220},
{"country": "Canada", "date": "2020-03-14T00:00:00.000Z", "value": 300},
{"country": "US", "date": "2020-03-14T00:00:00.000Z", "value": 0},
{"country": "UK", "date": "2020-03-14T00:00:00.000Z", "value": 320}
] You didn't ask for this, but thought I'd mention it anyway. For filling in blanks between days, you can provide a full sequence of dates. There's some examples in the docs under moving averages that may work for you: https://pbeshai.github.io/tidy/docs/examples/moving_average_example#moving-average-by-date-with-zerofill |
Beta Was this translation helpful? Give feedback.
-
@pbeshai thanks! The second solution you included is exactly what we need. The date sequence example is great - I think we'll need that at some point soon. For now, we don't know the date granularity our users need (since it will vary depending on the dataset and we don't have a way for them to choose yet). I've thought about doing something like calculating the min difference between any two date values in a column, then using that as the sequence granularity, which is what we do for numerical data |
Beta Was this translation helpful? Give feedback.
Normal complete on "date" and "country" should work in this case if your dates are strings as shown above.
Produces