Removing a vector for approximate entropy #5

Open
Tam-Pham opened this issue May 10, 2020 · 3 comments

@Tam-Pham

Hi @raphaelvallat, I have found your packages and your guides extremely helpful!

I'm currently working on NeuroKit with @DominiqueMakowski and we are looking at implementing functions for different entropy measures. I have a small question regarding your implementation of ApEn below:

def _app_samp_entropy(x, order, metric='chebyshev', approximate=True):
    ...
    # _emb_data1 is the time-delay embedded signal computed earlier in the function
    if approximate:
        emb_data1 = _emb_data1
    else:
        emb_data1 = _emb_data1[:-1]

It seems that the last vector in the embedded time series is removed when approximate is False (i.e. for sample entropy). However, I couldn't find the rationale for this particular removal. I would really appreciate it if you could point me in the right direction.
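
For context, here is the toy embedding I used to count vectors while reading the code; embed below is my own sketch and may not match the package's internal _embed exactly:

import numpy as np

def embed(x, order, delay=1):
    # Toy time-delay embedding, just to count vectors;
    # may not match the package's internal _embed exactly.
    n = len(x) - (order - 1) * delay
    return np.array([x[i:i + n] for i in range(0, order * delay, delay)]).T

x = np.arange(10)        # N = 10 samples
emb = embed(x, order=2)  # full matrix: N - (order - 1) = 9 vectors
print(emb.shape)         # (9, 2)
print(emb[:-1].shape)    # (8, 2) -> what the approximate=False branch keeps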

Many thanks!
Tam

@raphaelvallat self-assigned this May 10, 2020
@raphaelvallat

Hi @Tam-Pham!

Thanks for the feedback and for opening this issue!

The implementations of approximate and sample entropy are simply 1D adaptations of code from the MNE-features package. Since I did not write the original code, I'm not sure I understand why the last embedded vector is removed here (also, I worked on these functions almost two years ago and I have a very bad memory...😬).

I have compared the output of several implementations of sample entropy in the Jupyter notebook attached. As you can see, the two methods implemented in entropy give results similar to the nolds package, but a different output from the example code on Wikipedia as well as from a Matlab implementation. Even though the differences between implementations are quite small, it is still troublesome and it would be great to understand what causes them. I'll definitely look into that in the coming weeks, but please let me know if you make any progress on your side.
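
In case it's useful outside the notebook, the comparison essentially boils down to something like this (a quick sketch, assuming entropy and nolds are installed and that I'm remembering their argument names correctly):

import numpy as np
import nolds
from entropy import sample_entropy

np.random.seed(42)
x = np.random.normal(size=1000)

# This package: Chebyshev distance, r = 0.2 * std(x) internally
print(sample_entropy(x, order=2, metric='chebyshev'))

# nolds, with the same embedding dimension
print(nolds.sampen(x, emb_dim=2))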

One other minor issue that can lead to very small differences is whether we use the population or the sample standard deviation to define the tolerance, i.e. r = 0.2 * np.std(x, ddof=0) or r = 0.2 * np.std(x, ddof=1), respectively. I have found both implementations, but I'm not sure which one is more valid.
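
For what it's worth, the difference between the two conventions is tiny but measurable, e.g.:

import numpy as np

np.random.seed(0)
x = np.random.normal(size=100)

r_pop = 0.2 * np.std(x, ddof=0)   # population standard deviation
r_samp = 0.2 * np.std(x, ddof=1)  # sample standard deviation
print(r_pop, r_samp)              # differ by a factor of sqrt(N / (N - 1))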

Take care,
Raphael

sampen.zip

Tam-Pham commented May 10, 2020

Thanks @raphaelvallat for the very detailed answer and the comparison script.

Recently, I have been looking at this paper: Shi (2017). It seems to suggest that, for a signal with N samples, embedding dimension m and time delay τ, the number of vectors formed is limited to N − mτ.

And since the full embedding matrix that we obtain has shape (N − (m−1)τ) × m, it would make more sense to remove the last τ embedded vectors when approximate entropy is calculated 🤔
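
Just to make the counting explicit (plain arithmetic from the shapes above, with made-up example values):

N, m, tau = 100, 2, 3            # made-up example values
full_rows = N - (m - 1) * tau    # rows of the full embedding matrix: 97
shi_rows = N - m * tau           # vectors suggested by Shi (2017): 94
print(full_rows - shi_rows)      # tau = 3, so dropping a single vector is not enough when tau > 1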

I'm still investigating if my above interpretation is correct. Do let me know if you have a different interpretation of this paper 😄

@Tam-Pham

By the way, we might look into implementing a function to optimize r for each signal, based on this paper: Lu (2008).
As you say, since r can have such a significant effect on the results, I think it deserves some "complex optimization" of its own 😄
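
As a very rough sketch of what I have in mind (a brute-force search for the r that maximizes ApEn, which is one common reading of that line of work; I'm not claiming this is the exact procedure from Lu (2008), and apen below is a hypothetical ApEn function that accepts an explicit tolerance, unlike the current app_entropy):

import numpy as np

def optimize_r(x, apen, order=2, n_candidates=20):
    # Grid of candidate tolerances expressed as fractions of the signal SD.
    r_grid = np.linspace(0.05, 1.0, n_candidates) * np.std(x, ddof=0)
    # apen(x, order, r) is assumed to return ApEn for an explicit tolerance r.
    values = [apen(x, order, r) for r in r_grid]
    return r_grid[int(np.argmax(values))]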
