Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port's set method does not handle length robustly #32

Open
jonathanjfshaw opened this issue Dec 29, 2019 · 4 comments
Open

Port's set method does not handle length robustly #32

jonathanjfshaw opened this issue Dec 29, 2019 · 4 comments
Labels
enhancement New feature or request

Comments

@jonathanjfshaw
Copy link
Contributor

In io.py we have:

class Port:
...
    def set(self, rows, timestamps=None, names=None, meta={}):
        if timestamps is None:
            rate = 1 if Registry.rate == 0 else Registry.rate
            stop = int(Registry.cycle_start * 1e6)
            start = stop - int(1e6 / rate)
            timestamps = np.linspace(start, stop, len(rows), False, dtype='datetime64[us]')
        self.data = pd.DataFrame(rows, index=timestamps, columns=names)

This set method seems obviously designed to create a timestamp index for each row of the provided data.

It works if the provided rows argument is a numpy array, but not if it's a dictionary. Obviously this is because the length of a dict is determined by the number of keys, not by the length of items within the dict.

Probably this should also use the time_range helper method from clock.py

@jonathanjfshaw
Copy link
Contributor Author

Perhaps the solution is to create the dataframe, measure its length, then generate and add the index. This should work robustly with any valid source of data for a dataframe.

@mesca
Copy link
Member

mesca commented Dec 31, 2019

Hi Jonathan,

As you noticed, the set method expects an array-like object, not a dictionary. We should document this, and raise an exception if the type is invalid.

Your solution would work, but we need to refactor this method anyway: the Registry will soon be deprecated, and I don't recommend generating the timestamp index with set, at it can lead to artificial jitter if the pipeline congests or if your device sends chunks at an irregular rate.

For now, if you really want to use a dictionary, you can generate the index yourself, and simply do something like: self.o.data = pd.DataFrame(my_dict, index=my_index).

There are several strategies you can use to generate the index:

  • If you trust your device clock, you can count the samples and set the timestamps according to the rate. Examples are available in the AMTI driver and the Bitalino driver.
  • Instead of counting the total number of samples since the start, you can also do it by packet, as in the PL4 driver.
  • Another way would be to get the samples from your device one by one, and set each timestamp explicitely. You would need to run this in a thread, as in the Gazepoint driver.
  • If you really want to reproduce the behavior of the set method, you can use the time_range helper.

It really depends of what you need in terms of precision for your project. If you are OK with some potential artificial jitter, you can of course continue to use the set method (with arrays or lists instead of dicts for now).

@mesca mesca added the enhancement New feature or request label Dec 31, 2019
@jonathanjfshaw
Copy link
Contributor Author

Hi Pierre, thanks for your detailed answer. I had found the problem easy to work round as you suggested. I think the 'set' name led me into into thinking this was a recommended/preferred way of setting the data on the port.

@mesca
Copy link
Member

mesca commented Jan 1, 2020

Great! Let me know if I can help further. You can also join the Slack channel for general support.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants