Process Pandas Dataframe instead of list? #4

Open
Franky1 opened this issue Feb 22, 2022 · 4 comments

@Franky1

Franky1 commented Feb 22, 2022

Pandas DataFrames are used in almost all Python financial frameworks and workflows.

Is it possible to rewrite the Rust functions so that they can directly process a Pandas dataframe?
Currently everything has to be converted: DataFrame > List > Panther > List > DataFrame

In addition, the speed_tests are not fully meaningful in this context, because the conversions to and from lists have to be counted as well.

Addendum: I tried this out briefly. Once you convert the result back into a pandas DataFrame, all speed advantages are gone.
Can you show a real workflow example with a pandas dataframe?

@Franky1
Author

Franky1 commented Feb 22, 2022

I was wrong: the Rust functions can already process pandas DataFrames directly. The examples under speed_tests/ confused me, because they use tolist().

However, you then run into another problem: the functions return fewer values than the input has rows.
They seem to drop the leading values instead of returning NaN until the averaging window is fully filled.

This modified example doesn't work, unfortunately:

print("Timing Panther:")
start = timer()
# data['EMA'] = ema(data['Close'], 4)
data['SMA'] = sma(data['Close'], 5)
end = timer()
print(timedelta(seconds=end-start))
ValueError: Length of values (1255) does not match length of index (1259)

# This does work, but is a dirty hack to fill the missing values with NaN:
print("Timing Panther:")
start = timer()
data['EMA'] = [np.nan] * 3 + ema(data['Close'], 4)
data['SMA'] = [np.nan] * 3 + sma(data['Close'], 4)
end = timer()
print(timedelta(seconds=end-start))
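
A slightly less ad hoc version of the padding above could work out the number of missing values from the lengths instead of hard-coding it. This is only a sketch: `pad_front` is a hypothetical helper, not part of Panther, and it assumes the indicator returns a plain list that is missing only its leading values.

```python
import math


def pad_front(values, target_len, fill=math.nan):
    """Left-pad `values` with `fill` so the result has `target_len` items.

    Hypothetical helper, not part of Panther. Assumes the indicator only
    drops leading values (the warm-up of the averaging window).
    """
    values = list(values)
    missing = target_len - len(values)
    if missing < 0:
        raise ValueError("values is longer than target_len")
    return [fill] * missing + values


# Stand-in for a trimmed indicator result: 5 prices, period 4 -> 2 values.
closes = [10.0, 11.0, 12.0, 13.0, 14.0]
trimmed = [11.5, 12.5]
padded = pad_front(trimmed, len(closes))
```

With a DataFrame this would read something like `data['SMA'] = pad_front(sma(data['Close'], 4), len(data))`, again assuming `sma` returns a list.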

@gregyjames
Owner

This occurs because the generated EMA list has length (length - period + 1). The first element is calculated by summing the elements (1:period), and each subsequent value is then computed as result[i-1] + (price[i+period-1] - result[i-1]) * (2/(period+1)). So essentially, the first (period - 1) values are dropped. I need to change this; I have another EMA algorithm I want to try that should fix both this issue and the weird initial values you experienced. I will release it some time this week. Thanks for pointing out these issues.
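
For readers following along, the recurrence described above can be sketched in plain Python. This is not the Rust implementation: `ema_trimmed` is a hypothetical name, and the seed is assumed to be the mean of the first `period` prices, which the comment above does not spell out.

```python
def ema_trimmed(prices, period):
    """EMA following the recurrence described above.

    Returns len(prices) - period + 1 values. Assumption: the seed is the
    mean of the first `period` prices; the real Rust code may differ.
    """
    if period < 1 or period > len(prices):
        raise ValueError("period must be between 1 and len(prices)")
    alpha = 2 / (period + 1)
    result = [sum(prices[:period]) / period]  # assumed seed value
    for i in range(1, len(prices) - period + 1):
        prev = result[i - 1]
        result.append(prev + (prices[i + period - 1] - prev) * alpha)
    return result


prices = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
values = ema_trimmed(prices, 4)  # 6 prices, period 4 -> 3 values
```

The same length rule explains the traceback earlier in the thread: 1259 rows with a period of 5 give 1259 - 5 + 1 = 1255 values.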

@gregyjames gregyjames self-assigned this Feb 22, 2022
@gregyjames gregyjames added the bug Something isn't working label Feb 22, 2022
@Franky1
Author

Franky1 commented Feb 22, 2022

I think the Panther functions all need to be rewritten to fit seamlessly into a common workflow with pandas DataFrames; otherwise I don't think the library makes much sense.
As long as the averaging window is not yet filled, the functions must still return a placeholder such as NaN. Otherwise the result no longer lines up with the index of the DataFrame.

Then you should do an honest(!) benchmark test against the other Python TA libraries that already exist:

  • TA-Lib
  • pandas-ta
  • ta
  • finta
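
One way to keep such a benchmark honest is to time the whole round trip, conversions included, next to the bare kernel. The sketch below uses only the standard library: `array.array` stands in for a DataFrame column and `sma_trimmed` for the Rust function, so both names and the data are hypothetical stand-ins and the numbers say nothing about Panther or the libraries listed above.

```python
import math
import timeit
from array import array


def sma_trimmed(prices, period):
    """Stand-in kernel: plain-Python SMA that drops the warm-up values."""
    window_sum = sum(prices[:period])
    out = [window_sum / period]
    for i in range(period, len(prices)):
        window_sum += prices[i] - prices[i - period]
        out.append(window_sum / period)
    return out


def round_trip(column, period):
    """Column -> list -> kernel -> NaN-padded column, conversions included."""
    values = column.tolist()
    result = sma_trimmed(values, period)
    padding = [math.nan] * (len(values) - len(result))
    return array("d", padding + result)


# Synthetic data, purely for illustration.
column = array("d", (float(i % 50) for i in range(10_000)))
as_list = column.tolist()

# Take the best of three repeats to reduce noise from other processes.
kernel_only = min(timeit.repeat(lambda: sma_trimmed(as_list, 5), number=5, repeat=3))
end_to_end = min(timeit.repeat(lambda: round_trip(column, 5), number=5, repeat=3))
print(f"kernel only: {kernel_only:.4f}s, end to end: {end_to_end:.4f}s")
```

Reporting both figures shows how much of the total time is spent in conversion rather than in the indicator itself.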

@Franky1
Author

Franky1 commented Feb 22, 2022

Another suggestion:
Why keep reinventing the wheel?
There are already so many different TA libraries.
Why not take an existing Rust TA lib and write a Python wrapper for it with PyO3?

@gregyjames gregyjames added enhancement New feature or request and removed bug Something isn't working labels Mar 10, 2022