Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mutability in Time Series Instances / The new data structures #444

Open
goastler opened this issue Oct 23, 2020 · 1 comment
Open

Mutability in Time Series Instances / The new data structures #444

goastler opened this issue Oct 23, 2020 · 1 comment
Assignees

Comments

@goastler
Copy link
Member

Fork in the road on the TSInsts data structures:

a) Make TimeSeries, TimeSeriesInstance, and TimeSeriesInstances all mutable
- This complicates the code somewhat as we have to assume data has could been mutated, e.g. metadata has to be recomputed
- We then have to attach listeners to sub structures to keep an eye on them, e.g. attach listeners to TimeSeries to get notified when a missing value is set so we can update the hasMissing in the containing TimeSeriesInstance.
- This introduces complexity all over the place having to manage listeners, etc.

b) Make TimeSeries, TimeSeriesInstance and TimeSeriesInstance immutable
- Once the object is created, the data can't be altered. If you need to, you'd have to create an entirely new object. This means this can't be a drop in replacement for weka's Instances.
- As TimeSeriesInstances is currently mutable (at least can add TSInstance) we would have to refactor those usages to use the above idea of creating a new instance rather than doing modification in place.
- Metadata becomes much more simple as we are assured the underlying data will not change.
- The code becomes much simpler and less error prone (big plus imo)

c) Make TimeSeries and TimeSeriesInstance immutable but keep TimeSeriesInstances mutable
- This covers that majority of use cases (I think!). I.e. take some instances / dimensions, do some modification to them (e.g. perform a transform), and then add it into a new set of TimeSeriesInstances.
- This is also the easiest drop in solution to replace weka's Instances.

I think (b) is best because the code is super simple and least error prone. Given the majority of the time we pull the data out into a double[] or List, do some operations, then put back into an Instance object (b) matches this use case perfectly. Any use case which mutates data in place can be done in the same method with some minor tweaking, which I think is worth it for the immutability on the new data structs.

I've had a quick go at implementing (a) and (b) and can confirm (a) becomes rather complex whereas (b) remains very similar to what we've currently got :)

Lastly, we need to consider mutability of class labels. Currently there's a setter for class labels in TimeSeriesInstances, but I think the class labels should be set during construction and never changed to avoid situations where TimeSeriesInstances is build from some data with N classes and somewhere less than N classes are set or worse, the class labels are rearranged. Setting once during construction is simple and far less error prone. Irrelevant of the choices made above, I believe class labels should be immutable.

@ABostrom
Copy link
Member

ABostrom commented Oct 26, 2020

I think I'm beginning to more favour C. Although i agree complete immutability simplifies everything.

The case for updating stats etc with the top level Instances structure being mutable isn't overly problematic. Only problems that might arise is with passing references, and transformers/classifiers changing/editing the dataset in place.

Totally agree on changing class labels to be immutable. i think this is a leftover from when i was writing the file parse, and had issues with passing data into the constructor because of weird generic rules.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants