Process Pandas Dataframe instead of list? #4

Franky1 · 2022-02-22T15:17:33Z

Pandas DataFrames are the usual data structure in Python financial frameworks and workflows.

Is it possible to rewrite the Rust functions so that they can directly process a Pandas dataframe?
Currently everything has to be converted: DataFrame > List > Panther > List > DataFrame

In addition, the speed_tests are not fully meaningful in this context, because the conversions to and from lists have to be counted.

Addendum: I tried this out briefly. Once the result has to go back into a pandas DataFrame, all speed advantages are gone.
Can you show a real workflow example with a pandas dataframe?


Franky1 · 2022-02-22T17:01:27Z

I was wrong: the Rust functions can already process pandas DataFrames directly. The examples under speed_tests/ confused me because they use tolist().

However, this leads to another problem: the functions return fewer values than the input has rows.
The functions seem to return nothing at all (not even NaN) until the averaging window is completely filled.

This modified example doesn't work, unfortunately:

print("Timing Panther:")
start = timer()
# data['EMA'] = ema(data['Close'], 4)
data['SMA'] = sma(data['Close'], 5)
end = timer()
print(timedelta(seconds=end-start))

ValueError: Length of values (1255) does not match length of index (1259)

# This does work, but is a dirty hack to fill the missing values with NaN:
print("Timing Panther:")
start = timer()
data['EMA'] = [np.nan] * 3 + ema(data['Close'], 4)
data['SMA'] = [np.nan] * 3 + sma(data['Close'], 4)
end = timer()
print(timedelta(seconds=end-start))
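
The padding above can be wrapped in a small helper so that the number of NaN values does not have to be hard-coded. This is only a minimal sketch, assuming the indicator returns a plain list that is shorter than its input by `period - 1`. The names `pad_front` and `sma_trimmed` are hypothetical, and `sma_trimmed` is just a pure-Python stand-in for a function that drops the warm-up values, not Panther's implementation:

```python
import math


def sma_trimmed(prices, period):
    """Hypothetical stand-in for an indicator that omits the warm-up values."""
    return [
        sum(prices[i:i + period]) / period
        for i in range(len(prices) - period + 1)
    ]


def pad_front(values, target_len):
    """Left-pad `values` with NaN so the result has `target_len` entries."""
    missing = target_len - len(values)
    if missing < 0:
        raise ValueError("values is longer than the target length")
    return [math.nan] * missing + list(values)


prices = [float(p) for p in range(1, 11)]  # 10 made-up prices
padded = pad_front(sma_trimmed(prices, 4), len(prices))
```

With a DataFrame this would presumably become `data['SMA'] = pad_front(sma(data['Close'], 4), len(data))`, which keeps the new column the same length as the index.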

gregyjames · 2022-02-22T20:10:00Z

This occurs because the generated EMA list has length (length - period + 1). The first element is calculated by summing the values (1:period); each subsequent value is then computed from the previous one using result[i-1]+(price[i+period-1]-result[i-1])*(2/(period+1)). So essentially, the first period - 1 values are dropped. I need to change this. I have another EMA algorithm I want to try that should fix both this issue and the weird initial values you experienced. I will release it sometime this week. Thanks for pointing out these issues.
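
The recurrence described above can be sketched in pure Python to show where the shorter output comes from. This is an illustration, not Panther's Rust code. It assumes the seed is the mean of the first `period` prices (the comment only says "summing", so the division is an assumption), and the function names `ema_trimmed` and `ema_aligned` are hypothetical:

```python
import math


def ema_trimmed(prices, period):
    """EMA without warm-up values: returns len(prices) - period + 1 items."""
    if period < 1 or len(prices) < period:
        return []
    alpha = 2 / (period + 1)
    # Assumption: the seed is the mean of the first `period` prices.
    result = [sum(prices[:period]) / period]
    for i in range(1, len(prices) - period + 1):
        prev = result[i - 1]
        # Same update as in the comment above:
        # result[i-1] + (price[i+period-1] - result[i-1]) * (2 / (period + 1))
        result.append(prev + (prices[i + period - 1] - prev) * alpha)
    return result


def ema_aligned(prices, period):
    """Same values, front-padded with NaN to match the input length."""
    trimmed = ema_trimmed(prices, period)
    return [math.nan] * (len(prices) - len(trimmed)) + trimmed


sample = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
short = ema_trimmed(sample, 4)   # 6 - 4 + 1 = 3 values
full = ema_aligned(sample, 4)    # 6 values, the first 3 are NaN
```

The aligned variant has one entry per input row, so it could be assigned to a DataFrame column without the length mismatch.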

Franky1 · 2022-02-22T22:17:24Z

I think all the Panther functions need to be rewritten so that they fit seamlessly into a common pandas DataFrame workflow; otherwise I don't think the library makes much sense.
As long as the averaging window is not yet filled, the functions must still return a value such as NaN; otherwise the output no longer lines up with the DataFrame index.

Then you should do an honest(!) benchmark test against the other Python TA libraries that already exist:

TA-Lib
pandas-ta
ta
finta
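
A comparison of this kind would need to time the whole round trip (conversion in, indicator call, conversion back), not only the indicator call. The following is a rough harness sketch using only the standard library. `best_time`, `moving_average`, `compute_only`, and `round_trip` are hypothetical names, `moving_average` is a placeholder workload rather than any of the libraries listed, and the data is made up:

```python
import timeit


def best_time(func, repeat=5, number=10):
    """Best per-call wall time in seconds over `repeat` runs of `number` calls."""
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number


def moving_average(values, period):
    # Placeholder workload; swap in the library call being measured.
    return [sum(values[i:i + period]) / period
            for i in range(len(values) - period + 1)]


column = tuple(float(i) for i in range(1259))  # stand-in for data['Close']
prepared = list(column)                        # converted once, outside the timer


def compute_only():
    # Times the indicator alone, with the conversion cost hidden.
    return moving_average(prepared, 5)


def round_trip():
    as_list = list(column)               # conversion in
    out = moving_average(as_list, 5)     # indicator
    return [float("nan")] * 4 + out      # conversion out, aligned to the index


timings = {
    "compute only": best_time(compute_only),
    "round trip": best_time(round_trip),
}
```

Reporting both numbers side by side makes it visible how much of the total time is spent on conversions rather than on the indicator itself.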

Franky1 · 2022-02-22T23:43:16Z

Another suggestion:
Why keep reinventing the wheel?
There are already so many different TA libraries.
Why not take an existing Rust TA lib and write a Python wrapper for it with PyO3?

gregyjames self-assigned this Feb 22, 2022

gregyjames added the bug label Feb 22, 2022

gregyjames added the enhancement label and removed the bug label Mar 10, 2022
