You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
a) Make TimeSeries, TimeSeriesInstance, and TimeSeriesInstances all mutable
- This complicates the code somewhat as we have to assume data has could been mutated, e.g. metadata has to be recomputed
- We then have to attach listeners to sub structures to keep an eye on them, e.g. attach listeners to TimeSeries to get notified when a missing value is set so we can update the hasMissing in the containing TimeSeriesInstance.
- This introduces complexity all over the place having to manage listeners, etc.
b) Make TimeSeries, TimeSeriesInstance and TimeSeriesInstance immutable
- Once the object is created, the data can't be altered. If you need to, you'd have to create an entirely new object. This means this can't be a drop in replacement for weka's Instances.
- As TimeSeriesInstances is currently mutable (at least can add TSInstance) we would have to refactor those usages to use the above idea of creating a new instance rather than doing modification in place.
- Metadata becomes much more simple as we are assured the underlying data will not change.
- The code becomes much simpler and less error prone (big plus imo)
c) Make TimeSeries and TimeSeriesInstance immutable but keep TimeSeriesInstances mutable
- This covers that majority of use cases (I think!). I.e. take some instances / dimensions, do some modification to them (e.g. perform a transform), and then add it into a new set of TimeSeriesInstances.
- This is also the easiest drop in solution to replace weka's Instances.
I think (b) is best because the code is super simple and least error prone. Given the majority of the time we pull the data out into a double[] or List, do some operations, then put back into an Instance object (b) matches this use case perfectly. Any use case which mutates data in place can be done in the same method with some minor tweaking, which I think is worth it for the immutability on the new data structs.
I've had a quick go at implementing (a) and (b) and can confirm (a) becomes rather complex whereas (b) remains very similar to what we've currently got :)
Lastly, we need to consider mutability of class labels. Currently there's a setter for class labels in TimeSeriesInstances, but I think the class labels should be set during construction and never changed to avoid situations where TimeSeriesInstances is build from some data with N classes and somewhere less than N classes are set or worse, the class labels are rearranged. Setting once during construction is simple and far less error prone. Irrelevant of the choices made above, I believe class labels should be immutable.
The text was updated successfully, but these errors were encountered:
I think I'm beginning to more favour C. Although i agree complete immutability simplifies everything.
The case for updating stats etc with the top level Instances structure being mutable isn't overly problematic. Only problems that might arise is with passing references, and transformers/classifiers changing/editing the dataset in place.
Totally agree on changing class labels to be immutable. i think this is a leftover from when i was writing the file parse, and had issues with passing data into the constructor because of weird generic rules.
Fork in the road on the TSInsts data structures:
a) Make TimeSeries, TimeSeriesInstance, and TimeSeriesInstances all mutable
- This complicates the code somewhat as we have to assume data has could been mutated, e.g. metadata has to be recomputed
- We then have to attach listeners to sub structures to keep an eye on them, e.g. attach listeners to TimeSeries to get notified when a missing value is set so we can update the hasMissing in the containing TimeSeriesInstance.
- This introduces complexity all over the place having to manage listeners, etc.
b) Make TimeSeries, TimeSeriesInstance and TimeSeriesInstance immutable
- Once the object is created, the data can't be altered. If you need to, you'd have to create an entirely new object. This means this can't be a drop in replacement for weka's Instances.
- As TimeSeriesInstances is currently mutable (at least can add TSInstance) we would have to refactor those usages to use the above idea of creating a new instance rather than doing modification in place.
- Metadata becomes much more simple as we are assured the underlying data will not change.
- The code becomes much simpler and less error prone (big plus imo)
c) Make TimeSeries and TimeSeriesInstance immutable but keep TimeSeriesInstances mutable
- This covers that majority of use cases (I think!). I.e. take some instances / dimensions, do some modification to them (e.g. perform a transform), and then add it into a new set of TimeSeriesInstances.
- This is also the easiest drop in solution to replace weka's Instances.
I think (b) is best because the code is super simple and least error prone. Given the majority of the time we pull the data out into a double[] or List, do some operations, then put back into an Instance object (b) matches this use case perfectly. Any use case which mutates data in place can be done in the same method with some minor tweaking, which I think is worth it for the immutability on the new data structs.
I've had a quick go at implementing (a) and (b) and can confirm (a) becomes rather complex whereas (b) remains very similar to what we've currently got :)
Lastly, we need to consider mutability of class labels. Currently there's a setter for class labels in TimeSeriesInstances, but I think the class labels should be set during construction and never changed to avoid situations where TimeSeriesInstances is build from some data with N classes and somewhere less than N classes are set or worse, the class labels are rearranged. Setting once during construction is simple and far less error prone. Irrelevant of the choices made above, I believe class labels should be immutable.
The text was updated successfully, but these errors were encountered: