
Add get forex data #258

Open · wants to merge 9 commits into base: master

Conversation

Boreas813

Add `get_forex_data`. This method adds some parameters not in the stock model; details are in the method's comments and the README.

@enzoampil enzoampil self-requested a review October 6, 2020 08:26
@enzoampil enzoampil linked an issue Oct 6, 2020 that may be closed by this pull request
Owner

@enzoampil enzoampil left a comment


Thanks for this PR! 😁

Starting with some initial comments on whether there's a way to do the data pull with a date filter right away so that the processing time can be shortened; at the moment, it seems to take very long. I also added a comment about including get_forex_data in the `__init__.py` of the data module.

@@ -0,0 +1,127 @@
import unittest
Owner


Can we move this file to fastquant/python/tests/? This is where we store the tests and run them via pytest. I'm not sure if this will automatically run as a class via pytest, but if not, we can address that in a later PR 😄

Author


Oh, it's a unittest file; I've never used pytest. I can rewrite it to use pytest later.

Collaborator


Currently, unittest works fine, but we have to convert this to pytest for consistency, and make sure that this test also runs in Travis-CI. @Boreas514, the pytest structure is simpler in that you don't have to define classes, only functions and assert statements (see tests).
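For reference, the conversion from a unittest class to pytest-style functions looks roughly like this. `make_ohlc_frame` below is a stand-in for the real `get_forex_data` call (which needs network access); only the test structure is the point:

```python
# Sketch: pytest-style tests are plain functions with bare asserts,
# no TestCase class needed. make_ohlc_frame() is a hypothetical stub
# standing in for get_forex_data, which hits the network.
import pandas as pd


def make_ohlc_frame():
    return pd.DataFrame(
        {"open": [1.1906], "high": [1.1918], "low": [1.1901], "close": [1.1916]},
        index=pd.to_datetime(["2020-08-30 22:00:00"]),
    )


def test_returns_dataframe():
    df = make_ohlc_frame()
    assert isinstance(df, pd.DataFrame)
    assert not df.empty


def test_has_ohlc_columns():
    df = make_ohlc_frame()
    assert {"open", "high", "low", "close"}.issubset(df.columns)
```

Running `pytest` in the tests directory discovers any `test_*` function automatically, so these would be picked up by the existing Travis-CI setup without registration.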


logger.info(f'start downloading forex {symbol} data zip file...')
file_size = 0
res = requests.get(f'http://www.forextester.com/templates/data/files/{symbol}.zip', stream=True)
Owner


@Boreas514 So it seems that we download the full historical data on each data pull? Is there a way to query only the data from a specific time period? This makes the get-data process very slow for users 😄

Author


The first call downloads the data and builds a local cache; after that, set get_forex_data's parameter read_from_local=True so that it reads the data from the local cache.

Collaborator


I agree with enzoampil that we should avoid caching all forex data. We should cache only what is requested. Is there an easy implementation you can try?
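One minimal shape for "cache only what is requested" could look like this; the cache directory, file naming, and CSV format are illustrative, not the PR's actual implementation:

```python
# Sketch: persist only the slice the user asked for, keyed by symbol,
# time frame, and date range, instead of the full 2000-2018 history.
# CACHE_DIR and the file naming scheme are hypothetical.
import os

import pandas as pd

CACHE_DIR = "forex_cache"


def cache_path(symbol, time_frame, start_date, end_date):
    os.makedirs(CACHE_DIR, exist_ok=True)
    name = f"{symbol}_{time_frame}_{start_date}_{end_date}.csv"
    return os.path.join(CACHE_DIR, name)


def cache_slice(df, symbol, time_frame, start_date, end_date):
    # .loc with date strings on a DatetimeIndex is inclusive on both ends
    sliced = df.loc[start_date:end_date]
    sliced.to_csv(cache_path(symbol, time_frame, start_date, end_date))
    return sliced
```

A later call with the same key could then hit the small per-request file instead of re-reading the multi-gigabyte full history.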

Author


Making a cache for one symbol and one period in a single process needs 15 minutes and 2GB of RAM; is that cost acceptable?
A minimum of 2GB of memory is required to load the 1-minute chart of forex data from 2000 to 2018.

Collaborator


Can you explain why so much memory is needed?
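The 2GB floor likely comes from materializing ~18 years of 1-minute bars as float64 in one DataFrame. One way to keep memory flat, sketched below with assumed column names and file layout, is to stream the source CSV in chunks and downcast as you go:

```python
# Sketch: stream the source CSV in chunks and keep only the requested
# window, downcasting floats, so peak memory stays near one chunk.
# The "dt" column name and OHLC columns are assumptions.
import pandas as pd


def load_in_chunks(path, start_date, end_date, chunksize=500_000):
    kept = []
    for chunk in pd.read_csv(path, parse_dates=["dt"], chunksize=chunksize):
        chunk = chunk[(chunk["dt"] >= start_date) & (chunk["dt"] <= end_date)]
        if not chunk.empty:
            for col in ("open", "high", "low", "close"):
                # float32 halves memory versus the default float64
                chunk[col] = chunk[col].astype("float32")
            kept.append(chunk)
    return pd.concat(kept, ignore_index=True) if kept else pd.DataFrame()
```

With a date filter applied per chunk, rows outside the requested range are dropped before they ever accumulate, so the full history never sits in RAM at once.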

from fastquant.data.forex.forextester import get_forextester_data, get_local_data


def get_forex_data(symbol, start_date=None, end_date=None, source="forextester", time_frame='D1', read_from_local=False):
Owner


Let's also import get_forex_data in the __init__.py of the data module (here).

README.md Outdated
@@ -110,6 +112,39 @@ get_crypto_data("BTCUSDT", "2019-01-01", "2019-03-01")

*R does NOT have support for backtesting yet*

## Get forex data

All symbols from [forextester.com](https://forextester.com/) are accessible via `get_forex_data`.This mehtod fetch one minute data from source and shape it to other time frame, M1, M15, H1, D1, W1 are supported time frame now.
Collaborator


Suggested change
All symbols from [forextester.com](https://forextester.com/) are accessible via `get_forex_data`.This mehtod fetch one minute data from source and shape it to other time frame, M1, M15, H1, D1, W1 are supported time frame now.
All symbols from [forextester.com](https://forextester.com/) are accessible via `get_forex_data`. This method fetches one-minute data from the source and reshapes it to other time frames; M1, M15, H1, D1, and W1 are currently supported.
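The reshaping the README describes (M1 bars aggregated up to M15/H1/D1/W1) can be sketched with pandas resampling; the column names and the mapping from the README's frame labels to pandas rules are assumptions, not the PR's actual code:

```python
# Sketch: aggregate 1-minute OHLC bars to a coarser time frame.
# RULES maps the README's frame names to pandas offset aliases (assumed).
import pandas as pd

RULES = {"M1": "1min", "M15": "15min", "H1": "1h", "D1": "1D", "W1": "1W"}


def resample_ohlc(df_m1, time_frame="D1"):
    agg = {"open": "first", "high": "max", "low": "min", "close": "last"}
    return df_m1.resample(RULES[time_frame]).agg(agg).dropna()
```

Because only M1 data exists at the source, every coarser frame is derived this way, which is why the method always starts from the one-minute download.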

README.md Outdated
2020-08-30 22:00:00 1.1906 1.1918 1.1901 1.1916
[6127 rows x 5 columns]
```
This process downloads the online data, transforms it, and saves cache files to disk; the first call will need 25 minutes or more and 7GB of free RAM to generate the cache files.
Collaborator


Many of our users may not have 7GB of free RAM, so we cannot implement this. Let's try to cache only the data the user actually queries.


README.md Outdated
```
This process downloads the online data, transforms it, and saves cache files to disk; the first call will need 25 minutes or more and 7GB of free RAM to generate the cache files.

After that, set `read_from_local` to True and you will get a response within seconds if you don't need to update the local data to the newest.
Collaborator


We can read the cache automatically when it's available and download otherwise, so there's probably no need for the read_from_local flag.
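The cache-or-download behaviour suggested here could look roughly like the sketch below; `download_and_cache` is a placeholder for the PR's actual download routine, and the path scheme is illustrative:

```python
# Sketch: read the local cache when it exists, otherwise download and
# write it, so callers never need a read_from_local flag.
# download_and_cache is a hypothetical stand-in for the real downloader.
import os

import pandas as pd


def get_forex_data_auto(symbol, time_frame, cache_dir="forex_cache",
                        download_and_cache=None):
    path = os.path.join(cache_dir, f"{symbol}_{time_frame}.csv")
    if os.path.exists(path):
        return pd.read_csv(path, index_col=0, parse_dates=True)
    df = download_and_cache(symbol, time_frame)  # hits the network once
    os.makedirs(cache_dir, exist_ok=True)
    df.to_csv(path)
    return df
```

The first call pays the download cost and every later call with the same symbol and time frame is a local read, which matches the behaviour the flag was meant to select manually.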

Collaborator

@jpdeleon jpdeleon left a comment


Only the partial caching needs to change, and then we can merge.

@Boreas813
Author

I modified the function to reduce the hardware requirement to 2.3GB of RAM and a 20-minute conversion process; give it a try.


After then, set `read_from_local` to True and you get a second response if don't need update local data to newest.
After then, set `read_from_local=Ture` and you will get a second response if don't need update local data to newest.
Collaborator


read_from_local=True

Collaborator

@jpdeleon jpdeleon left a comment


No errors when I ran the branch in colab here: https://colab.research.google.com/drive/128gBYBjc4DcoS_QN3Kf4wdV1524vAjOA?hl=en#scrollTo=7sulhEFAWPl0

I like the logging functionality with the ETA info.
But it is still very time-consuming, so we may need to implement faster solutions.

@enzoampil
Owner

Thanks again for this @Boreas514! I agree with @jpdeleon, it's still quite long. Is there a way we can have a get function that follows the pattern of get_stock_data, taking around < 10 seconds to get a specific date range?

Keen to get your thoughts on this 😄
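For the < 10 second target, one option once a full-history cache exists is to serve date-range queries from a fast on-disk format and slice by index. The sketch below uses pickle purely for illustration; a columnar format like Parquet or Feather would be the faster choice in practice, and the path is hypothetical:

```python
# Sketch: serve a date-range query from an existing on-disk cache by
# slicing a DatetimeIndex, avoiding any re-download or re-conversion.
# Pickle is used here only so the example stays dependency-free.
import pandas as pd


def get_range(cache_path, start_date, end_date):
    df = pd.read_pickle(cache_path)
    # string-date slicing on a DatetimeIndex is inclusive on both ends
    return df.loc[start_date:end_date]
```

Reading a pre-built cache and slicing is I/O-bound rather than CPU-bound, which is what makes a get_stock_data-like response time plausible after the one-off conversion.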

Development

Successfully merging this pull request may close these issues.

[FEATURE] Pull data for forex pairs
3 participants