Port's set method does not handle length robustly #32

jonathanjfshaw · 2019-12-29T18:58:40Z

In io.py we have:

class Port:
...
    def set(self, rows, timestamps=None, names=None, meta={}):
        if timestamps is None:
            rate = 1 if Registry.rate == 0 else Registry.rate
            stop = int(Registry.cycle_start * 1e6)
            start = stop - int(1e6 / rate)
            timestamps = np.linspace(start, stop, len(rows), False, dtype='datetime64[us]')
        self.data = pd.DataFrame(rows, index=timestamps, columns=names)

This set method seems obviously designed to create a timestamp index for each row of the provided data.

It works if the provided rows argument is a numpy array, but not if it's a dictionary. Obviously this is because the length of a dict is determined by the number of keys, not by the length of items within the dict.

Probably this should also use the time_range helper method from clock.py

The text was updated successfully, but these errors were encountered:

jonathanjfshaw · 2019-12-29T19:18:51Z

Perhaps the solution is to create the dataframe, measure its length, then generate and add the index. This should work robustly with any valid source of data for a dataframe.

mesca · 2019-12-31T00:32:09Z

Hi Jonathan,

As you noticed, the set method expects an array-like object, not a dictionary. We should document this, and raise an exception if the type is invalid.

Your solution would work, but we need to refactor this method anyway: the Registry will soon be deprecated, and I don't recommend generating the timestamp index with set, at it can lead to artificial jitter if the pipeline congests or if your device sends chunks at an irregular rate.

For now, if you really want to use a dictionary, you can generate the index yourself, and simply do something like: self.o.data = pd.DataFrame(my_dict, index=my_index).

There are several strategies you can use to generate the index:

If you trust your device clock, you can count the samples and set the timestamps according to the rate. Examples are available in the AMTI driver and the Bitalino driver.
Instead of counting the total number of samples since the start, you can also do it by packet, as in the PL4 driver.
Another way would be to get the samples from your device one by one, and set each timestamp explicitely. You would need to run this in a thread, as in the Gazepoint driver.
If you really want to reproduce the behavior of the set method, you can use the time_range helper.

It really depends of what you need in terms of precision for your project. If you are OK with some potential artificial jitter, you can of course continue to use the set method (with arrays or lists instead of dicts for now).

jonathanjfshaw · 2020-01-01T11:49:59Z

Hi Pierre, thanks for your detailed answer. I had found the problem easy to work round as you suggested. I think the 'set' name led me into into thinking this was a recommended/preferred way of setting the data on the port.

mesca · 2020-01-01T11:53:34Z

Great! Let me know if I can help further. You can also join the Slack channel for general support.

mesca added the enhancement New feature or request label Dec 31, 2019

jonathanjfshaw closed this as completed Jan 1, 2020

jonathanjfshaw reopened this Jan 1, 2020

mesca mentioned this issue Jan 22, 2020

Set do not ensure index is montonic #35

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Port's set method does not handle length robustly #32

Port's set method does not handle length robustly #32

jonathanjfshaw commented Dec 29, 2019

jonathanjfshaw commented Dec 29, 2019

mesca commented Dec 31, 2019

jonathanjfshaw commented Jan 1, 2020

mesca commented Jan 1, 2020

Port's set method does not handle length robustly #32

Port's set method does not handle length robustly #32

Comments

jonathanjfshaw commented Dec 29, 2019

jonathanjfshaw commented Dec 29, 2019

mesca commented Dec 31, 2019

jonathanjfshaw commented Jan 1, 2020

mesca commented Jan 1, 2020