New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Read index API #1568
base: master
Are you sure you want to change the base?
Read index API #1568
Conversation
Mostly looks good, few comments/ideas:
|
class TestReadSingleIndex: | ||
@pytest.mark.parametrize("index", [ | ||
pd.RangeIndex(start=0, stop=10), | ||
pd.RangeIndex(start=0, stop=10, step=2), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would test with a non-zero start value here.
]) | ||
def test_rowrange(self, lmdb_version_store_static_and_dynamic, index): | ||
lmdb_version_store_static_and_dynamic.write("sym", pd.DataFrame({"col": range(0, len(index))}, index=index)) | ||
result = lmdb_version_store_static_and_dynamic.read_index("sym") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lmdb_version_store_static_and_dynamic
is of type NativeVersionStore
, which has a read_index
method with different functionality.
def test_as_of(self, lmdb_version_store_static_and_dynamic, indexes): | ||
data = [list(range(0, len(index))) for index in indexes] | ||
|
||
lmdb_version_store_static_and_dynamic.write("sym", pd.DataFrame({"col": data[0]}, index=indexes[0])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good to use a fixture with 2 columns per segment, and >2 columns to check that everything works with column slicing.
* Different versions of as_of * date_rage * row_range * Different column slice size
Reference Issues/PRs
relates to: #399
What does this implement or fix?
Add unit tests to showcase the proposed API for reading the index of a symbol.
Proposal
Add new
arcticdb.Library
APIsymbol
andas_of
have the same meaning as in arcticdb.Library.readVersionedItem.data
will contain the index retrieved from the symbol. Its type will be one ofpandas.RangeIndex
,pandas.DatetimeIndex
,pandas.MultiIndex
(it can be expanded to all supported index types) .TBD
date_range
/row_range
argument needed? (I would add it only if someone requests it)VersionedItem
? Pros: It's consistent witharcticdb.Library.read
. Cons: IfVersionedItem
starts to grow it can start carrying additional data that's not usedAlternatives
Extend
arcticdb.Library.read
with additional parameterindex_only
.Pros:
Cons:
VersionedItem.data
will be of different type depending on the parameter which might lead to confusion and bugsarcticdb.Library.read
might become huge and hard to understand and useAny other comments?
Checklist
Checklist for code changes...