Interpolate (upsample) non-equispaced timeseries into equispaced 18.0rc1 #12552

Open
marcelnem opened this issue Mar 7, 2016 · 3 comments
Labels: Enhancement, Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)

Comments

@marcelnem

I want to interpolate (upsample) a non-equispaced time series to obtain an equispaced time series.

Currently I do it as follows:

  1. take the original time series.
  2. create a new time series with NaN values at 30-second intervals (using resample('30S').asfreq())
  3. concat the original time series and the new time series
  4. sort the result to restore time order (this I do not like: sorting has complexity O(n log n))
  5. interpolate
  6. remove the original points from the time series

Is there a simpler way? In MATLAB, you keep the original time series and pass the new times as a parameter to the interpolation function to receive values at the desired times. Ideally I would like to have a function such as

origTimeSeries.interpolate(newIndex=newTimeIndex, method='spline')

Note that the times of the original time series might not be a subset of the times of the desired time series.
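For illustration, a helper with the shape requested above could be sketched as follows. The name `interpolate_at` and its parameters are hypothetical (nothing in this thread says pandas provides such a function); the sketch assumes a unique, sorted DatetimeIndex and simply wraps the union, interpolate, and select steps.

```python
import pandas as pd


def interpolate_at(ts, new_index, method="time"):
    """Hypothetical helper: values of `ts` interpolated at `new_index`."""
    # Union of the original and the requested timestamps (sorted, no duplicates).
    combined = ts.index.union(new_index)
    # Reindexing puts NaN at the requested timestamps; interpolate() fills them.
    filled = ts.reindex(combined).interpolate(method=method)
    # Keep only the requested timestamps.
    return filled.reindex(new_index)


values = [271238, 329285, 50, 260260, 263711]
timestamps = pd.to_datetime(["2015-01-04 08:29:04",
                             "2015-01-04 08:37:05",
                             "2015-01-04 08:41:07",
                             "2015-01-04 08:43:05",
                             "2015-01-04 08:49:05"])
ts = pd.Series(values, index=timestamps, dtype=float)

# Target times need not be a subset of the original times.
new_index = pd.date_range("2015-01-04 08:30:00", "2015-01-04 08:49:00", freq="60s")
equispaced = interpolate_at(ts, new_index)
print(equispaced.head())
```

Requested times that fall before the first original sample would come back as NaN, because `interpolate` fills forward by default.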

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

values = [271238, 329285, 50, 260260, 263711]
timestamps = pd.to_datetime(['2015-01-04 08:29:04',
                             '2015-01-04 08:37:05',
                             '2015-01-04 08:41:07',
                             '2015-01-04 08:43:05',
                             '2015-01-04 08:49:05'])

ts = pd.Series(values, index=timestamps)
ts[ts == -1] = np.nan  # treat -1 as a missing-value marker

# NaN-valued series on a regular 60-second grid
newFreq = ts.resample('60S').asfreq()

# Combine original and grid points, restore time order, then interpolate
new = pd.concat([ts, newFreq]).sort_index()
new = new.interpolate(method='time')

ts.plot(marker='o')
new.plot(marker='+', markersize=15)

# Keep only the grid points (drops the original timestamps)
new.loc[newFreq.index].plot(marker='.')

lines, labels = plt.gca().get_legend_handles_labels()
labels = ['original values (nonequispaced)',
          'original + interpolated at new frequency (nonequispaced)',
          'interpolated values without original values (equispaced!)']
plt.legend(lines, labels, loc='best')
plt.show()


[image: plot produced by the code above]

@jreback
Contributor

jreback commented Mar 7, 2016

use ordered_merge rather than concat and sort
http://pandas.pydata.org/pandas-docs/stable/merging.html#merging-ordered-data
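A minimal sketch of that suggestion, under some assumptions: `ordered_merge` was renamed `pd.merge_ordered` in pandas 0.19, so the sketch uses the newer name, and because the function operates on DataFrames the timestamps are moved into a column first. The column names `time` and `value` are arbitrary choices for this example.

```python
import pandas as pd

values = [271238, 329285, 50, 260260, 263711]
timestamps = pd.to_datetime(["2015-01-04 08:29:04",
                             "2015-01-04 08:37:05",
                             "2015-01-04 08:41:07",
                             "2015-01-04 08:43:05",
                             "2015-01-04 08:49:05"])
ts = pd.Series(values, index=timestamps, dtype=float)

# Regular 60-second grid covering the data.
grid = pd.date_range(ts.index[0].floor("60s"), ts.index[-1].floor("60s"), freq="60s")

# merge_ordered works on DataFrames, so turn the index into a key column.
left = ts.rename("value").rename_axis("time").reset_index()
right = pd.DataFrame({"time": grid})

# Outer join (the default) whose rows come back ordered by the key.
merged = pd.merge_ordered(left, right, on="time")

# Back to a time-indexed Series, interpolate, then keep only the grid points.
combined = merged.set_index("time")["value"]
equispaced = combined.interpolate(method="time").reindex(grid)
print(equispaced.head())
```

The first grid point (08:29:00) lies before the first sample (08:29:04), so it stays NaN here; whether to drop it or fill it backward is a separate choice.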

@marcelnem
Author

It would be nice to do it without a merge altogether, since I do not really need the merged time series; I only need the resulting equispaced time series. Is the approach I described (improved with ordered_merge) the most efficient way to do this? Maybe using scipy directly would be better:

http://docs.scipy.org/doc/scipy-0.14.0/reference/tutorial/interpolate.html#d-interpolation-interp1d
scipy lets you do it MATLAB-style: keep the original time series and pass the new index to obtain the new time series.
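The "pass the new index" style can be sketched like this. To keep the example dependency-light it uses `np.interp`, NumPy's linear-only interpolator, rather than SciPy itself; `scipy.interpolate.interp1d(x_old, y_old, kind=...)` takes the same kind of numeric arrays and would slot into the same place. Converting timestamps to elapsed seconds is an assumption of this sketch, not something the thread prescribes.

```python
import numpy as np
import pandas as pd

values = [271238, 329285, 50, 260260, 263711]
timestamps = pd.to_datetime(["2015-01-04 08:29:04",
                             "2015-01-04 08:37:05",
                             "2015-01-04 08:41:07",
                             "2015-01-04 08:43:05",
                             "2015-01-04 08:49:05"])
ts = pd.Series(values, index=timestamps, dtype=float)

new_index = pd.date_range("2015-01-04 08:30:00", "2015-01-04 08:49:00", freq="60s")

# Numeric interpolators need numbers, so express times as seconds since the first sample.
t0 = ts.index[0]
x_old = np.asarray((ts.index - t0).total_seconds())
x_new = np.asarray((new_index - t0).total_seconds())

# Evaluate the piecewise-linear interpolant directly at the new times:
# no merge, no sort, and the original series is left untouched.
y_new = np.interp(x_new, x_old, ts.to_numpy())
equispaced = pd.Series(y_new, index=new_index)
print(equispaced.head())
```

Note that `np.interp` clamps to the end values outside the sampled range instead of returning NaN, so the target times here are chosen to lie inside it.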

Also, I will be working with online data, so the original time series will grow, and I will need to interpolate the new data and append the results to the interpolated (equispaced) time series.
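One way the online case might be handled is to remember only the last original sample and emit the grid points that each new sample makes computable. The class below is a hypothetical sketch (the name `StreamingResampler` and its `push` method are invented for this example); it assumes linear interpolation and strictly increasing timestamps.

```python
import pandas as pd


class StreamingResampler:
    """Hypothetical helper: emits linearly interpolated grid points as samples arrive."""

    def __init__(self, freq="60s"):
        self.freq = freq
        self.step = pd.Timedelta(freq)
        self.prev = None        # last (timestamp, value) pair seen
        self.next_tick = None   # next grid time still to be emitted

    def push(self, t, v):
        """Add one sample; return the list of (grid_time, value) pairs it completes."""
        t = pd.Timestamp(t)
        v = float(v)
        out = []
        if self.prev is None:
            # The first sample only fixes where the grid starts.
            self.next_tick = t.ceil(self.freq)
        else:
            t_prev, v_prev = self.prev
            span = (t - t_prev).total_seconds()  # > 0 for strictly increasing times
            while self.next_tick <= t:
                frac = (self.next_tick - t_prev).total_seconds() / span
                out.append((self.next_tick, v_prev + frac * (v - v_prev)))
                self.next_tick += self.step
        self.prev = (t, v)
        return out


values = [271238, 329285, 50, 260260, 263711]
timestamps = pd.to_datetime(["2015-01-04 08:29:04",
                             "2015-01-04 08:37:05",
                             "2015-01-04 08:41:07",
                             "2015-01-04 08:43:05",
                             "2015-01-04 08:49:05"])

stream = StreamingResampler("60s")
rows = []
for t, v in zip(timestamps, values):
    rows.extend(stream.push(t, v))  # append only the newly available grid points

equispaced = pd.Series([v for _, v in rows],
                       index=pd.DatetimeIndex([t for t, _ in rows]))
print(equispaced.head())
```

Because each call touches only the interval between the previous and the current sample, earlier output never has to be recomputed. A spline or other non-local method would need a longer window of past samples than this sketch keeps.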

@jreback
Contributor

jreback commented Mar 7, 2016

this gets you pretty close

In [42]: ts.reindex(ts.resample('60s').asfreq().index, method='nearest', tolerance=pd.Timedelta('60s')).interpolate('time')
Out[42]: 
2015-01-04 08:29:00    271238.000000
2015-01-04 08:30:00    271238.000000
2015-01-04 08:31:00    279530.428571
2015-01-04 08:32:00    287822.857143
2015-01-04 08:33:00    296115.285714
2015-01-04 08:34:00    304407.714286
2015-01-04 08:35:00    312700.142857
2015-01-04 08:36:00    320992.571429
2015-01-04 08:37:00    329285.000000
2015-01-04 08:38:00    329285.000000
2015-01-04 08:39:00    219540.000000
2015-01-04 08:40:00    109795.000000
2015-01-04 08:41:00        50.000000
2015-01-04 08:42:00        50.000000
2015-01-04 08:43:00    260260.000000
2015-01-04 08:44:00    260260.000000
2015-01-04 08:45:00    260950.200000
2015-01-04 08:46:00    261640.400000
2015-01-04 08:47:00    262330.600000
2015-01-04 08:48:00    263020.800000
2015-01-04 08:49:00    263711.000000
Freq: 60S, dtype: float64
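A likely reason this is only "pretty close": `method='nearest'` snaps each original sample onto the closest grid point(s) within the tolerance, which shifts the sample in time before the remaining gaps are interpolated. The sketch below compares that result with interpolation at the true timestamps; the variable names `approx` and `exact` are introduced here for illustration only.

```python
import pandas as pd

values = [271238, 329285, 50, 260260, 263711]
timestamps = pd.to_datetime(["2015-01-04 08:29:04",
                             "2015-01-04 08:37:05",
                             "2015-01-04 08:41:07",
                             "2015-01-04 08:43:05",
                             "2015-01-04 08:49:05"])
ts = pd.Series(values, index=timestamps, dtype=float)

grid = ts.resample("60s").asfreq().index

# Approach from the comment above: snap nearby samples to the grid, then fill gaps.
approx = ts.reindex(grid, method="nearest",
                    tolerance=pd.Timedelta("60s")).interpolate("time")

# Reference: interpolate using the true sample times, then keep the grid points.
exact = ts.reindex(ts.index.union(grid)).interpolate("time").reindex(grid)

# Absolute difference per grid point (NaN where `exact` has no value yet).
diff = (approx - exact).abs()
print(diff.idxmax(), diff.max())
```

The gap is largest where the signal changes quickly between samples, so for slowly varying data the one-liner may be accurate enough, while for data like this example it can be far off at individual grid points.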

mroeschke added the Enhancement, Timeseries, and Algos labels on Jan 13, 2019
mroeschke added the Resample label and removed the Algos label on Mar 31, 2020
mroeschke added the Missing-data label and removed the Resample and Timeseries labels on Apr 23, 2021