Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ignored "raw" flag for StockDataStream #247

Open
nv-pipo opened this issue Feb 8, 2023 · 11 comments
Open

Ignored "raw" flag for StockDataStream #247

nv-pipo opened this issue Feb 8, 2023 · 11 comments

Comments

@nv-pipo
Copy link

nv-pipo commented Feb 8, 2023

Issue

Initializing StockDataStream with parameter raw_data=True creates a websocket that returns timestamp as 'Timestamp', instead of the RAW string with the datetime.

Replicate

from alpaca.data.live.stock import StockDataStream

wss_client = StockDataStream('api-key', 'secret-key', raw_data=True)

# async handler
async def quote_data_handler(data: Any):
    # quote data will arrive here
    print(data)
    print("===")

wss_client.subscribe_quotes(quote_data_handler, "AAPL")

wss_client.run()

output:
Screenshot 2023-02-08 at 1 30 14 PM

Expected output (from https://alpaca.markets/docs/api-references/market-data-api/stock-pricing-data/realtime/):
Screenshot 2023-02-08 at 1 29 17 PM

@impredicative
Copy link

impredicative commented Apr 26, 2023

By the way, I am seeing two timestamps in the raw quote, both as msgpack.ext.Timestamp objects, with keys t and r. Only t is documented whereas r isn't.

Examples:

<class 'dict'> {'T': 'q', 'S': 'QQQ', 'bx': 'P', 'bp': 313.72, 'bs': 2, 'ax': 'Q', 'ap': 313.74, 'as': 9, 'c': ['R'], 'z': 'C', 't': Timestamp(seconds=1682466638, nanoseconds=506885515), 'r': Timestamp(seconds=1682466638, nanoseconds=507788703)}
<class 'dict'> {'T': 'q', 'S': 'QQQ', 'bx': 'P', 'bp': 313.72, 'bs': 2, 'ax': 'U', 'ap': 313.74, 'as': 18, 'c': ['R'], 'z': 'C', 't': Timestamp(seconds=1682466638, nanoseconds=507231768), 'r': Timestamp(seconds=1682466638, nanoseconds=508096712)}
<class 'dict'> {'T': 'q', 'S': 'QQQ', 'bx': 'P', 'bp': 313.72, 'bs': 2, 'ax': 'U', 'ap': 313.73, 'as': 9, 'c': ['R'], 'z': 'C', 't': Timestamp(seconds=1682466638, nanoseconds=507273406), 'r': Timestamp(seconds=1682466638, nanoseconds=508120891)}

I see that t < r.

@nv-pipo Just so I understand, what's currently the harm in leaving it as a msgpack.ext.Timestamp object? Timestamps are of course essential for trading. Do you have a much faster way of parsing the timestamp?

Here are the available methods on the timestamp object:

dir(msgpack.ext.Timestamp)
['__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__slots__', '__str__', '__subclasshook__', 'from_bytes', 'from_datetime', 'from_unix', 'from_unix_nano', 'nanoseconds', 'seconds', 'to_bytes', 'to_datetime', 'to_unix', 'to_unix_nano']

@nv-pipo
Copy link
Author

nv-pipo commented May 1, 2023

@impredicative,
There is no harm, but using the "raw" flag I would expect a "RFC-3339 formatted timestamp with nanosecond precision".

@impredicative
Copy link

impredicative commented Jun 19, 2023

@impredicative, There is no harm, but using the "raw" flag I would expect a "RFC-3339 formatted timestamp with nanosecond precision".

I believe it's natively coming in through the socket with msgpack encoding. As such, msgpack seems to be its raw format. As such, the expectation of a string is probably invalid.

@nv-pipo
Copy link
Author

nv-pipo commented Jun 28, 2023

@impredicative
Well, it seems that msgpack supports strings: https://en.wikipedia.org/wiki/MessagePack. I'm no expert thought.

Note that the RAW flag for the timestamp field is also ignored in the historical API calls (let me know if I should open a separate issue):

>>> from alpaca.data.timeframe import TimeFrame
>>> from alpaca.data.requests import StockBarsRequest
>>> from alpaca.data.historical import StockHistoricalDataClient
>>>
>>> # keys required
>>> stock_client = StockHistoricalDataClient(key, secret)
>>> request_params = StockBarsRequest(
...                         symbol_or_symbols=["AAPL"],
...                         timeframe=TimeFrame.Day,
...                         start="2022-10-04 00:00:00",
...                         end="2022-10-05 00:00:00",
...                         raw_format=True,
...                  )
>>>
>>> bars = stock_client.get_stock_bars(request_params)
>>> print(bars)
data={'AAPL': [{   'close': 146.1,
    'high': 146.22,
    'low': 144.26,
    'open': 145.03,
    'symbol': 'AAPL',
    'timestamp': datetime.datetime(2022, 10, 4, 4, 0, tzinfo=datetime.timezone.utc),
    'trade_count': 687505.0,
    'volume': 87880164.0,
    'vwap': 145.490979}]}
>>>

@impredicative
Copy link

impredicative commented Jun 28, 2023

msgpack is ostensibly specified as the desired encoding by the client, and so in my understanding it makes sense for the response to use a msgpack object where it helps. If you don't want it, try using your own custom client without requesting the msgpack encoding, although the results may then be inefficiently delivered. I am not convinced that this issue should stay open.

@nv-pipo
Copy link
Author

nv-pipo commented Jun 29, 2023

Yeah, I had come to the same conclusion some time ago: I've been using my own client for months.

However, I believe it is important to fulfill the flag expectations! and this alpaca-py's stream and historical endpoints do not truly respect the RAW flag. This should be fixed if you want people to use the SDK, instead of driving them away.

@impredicative
Copy link

impredicative commented Jul 2, 2023

Well, it is important for the user to have both the integer seconds value and the integer nanoseconds value. If a msgpack object was not used, the two values would then have to be represented in a list, but they would then be unclear to a reader as to what the list means.

An RFC-3339 timestamp, regardless of its precision, is less useful here because it is a slow and complicated object to work with when speed matters. That's even more true when it's a string that requires parsing. Integers are simplest and fastest to work with.

@nv-pipo
Copy link
Author

nv-pipo commented Jul 3, 2023

@impredicative
I am not arguing that a string representation of a date is better than a "date data type"!

I am simply saying that functions should behave as expected, given the parameters passed: if the parameter RAW is passed, then the RAW response has to be returned. This is not the case today, for neither Websockets nor historical API calls, and thus needs to be fixed. (RAW is described/defined by the RESTful/Websocket endpoints.)

Alternatively, if these functions cannot return the expected value, the RAW parameter should be dropped from functions calls...or rename it to match the return value, eg: "RAW_EXCEPT_FOR_TIMESTAMP".

@impredicative
Copy link

I see your point. Perhaps the arg name raw_data can be changed to native.

@nv-pipo
Copy link
Author

nv-pipo commented Jul 4, 2023

That's fair.

Should I open a separate ticket for the "Historical Stock Pricing Data" endpoint?

@impredicative
Copy link

impredicative commented Jul 4, 2023

@nv-pipo I would just add it to the original comment of this issue itself, also including code to replicate it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants