MultiIndex with RangeIndex as first column is not consistent with RangeIndex #1569

Open
vasil-pashov opened this issue May 10, 2024 · 0 comments
Labels
bug Something isn't working

Describe the bug

Arctic symbols with a range index are subject to some constraints:

  • When appending, the start of the new index must be the same as the end of the current data
  • The step of the appended index must match the step of the current data

When a RangeIndex is the first level of a MultiIndex, these constraints are not applied. As a result, one can create a MultiIndex-ed DataFrame whose main index is out of order.
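The two constraints can be sketched in plain pandas as follows. This is only an illustration: `range_index_append_is_valid` is a hypothetical helper, not an ArcticDB function, and it assumes that "the end of the current data" means the exclusive `stop` of the existing `pd.RangeIndex`.

```python
import pandas as pd


def range_index_append_is_valid(existing: pd.RangeIndex, new: pd.RangeIndex) -> bool:
    """Hypothetical helper (not part of ArcticDB).

    Assumption: the appended index must start at the exclusive `stop`
    of the existing index and must use the same step.
    """
    return new.start == existing.stop and new.step == existing.step


existing = pd.RangeIndex(start=0, stop=10)

# Continues exactly where the existing data ends, same step
print(range_index_append_is_valid(existing, pd.RangeIndex(start=10, stop=16)))

# Leaves a gap (starts at 15 instead of 10)
print(range_index_append_is_valid(existing, pd.RangeIndex(start=15, stop=21)))

# Correct start but a different step
print(range_index_append_is_valid(existing, pd.RangeIndex(start=10, stop=20, step=2)))
```

Under that assumption only the first candidate passes; the second mirrors the `rowrange2` used in the reproduction.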

Steps/Code to Reproduce

import pandas as pd
import numpy as np
import arcticdb as adb
dates1 = pd.date_range("01/01/2024", "01/10/2024")
dates2 = pd.date_range("01/15/2024", "01/20/2024")
rowrange1 = pd.RangeIndex(start=0, stop=10)
rowrange2 = pd.RangeIndex(start=15, stop=21)
midx1 = pd.MultiIndex.from_arrays([rowrange1, dates1], names=["datetime", "level"])
midx2 = pd.MultiIndex.from_arrays([rowrange2, dates2], names=["datetime", "level"])

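ac = adb.Arctic("lmdb://test")
lib = ac.get_library("test", create_if_missing=True)
# Initial write: the first index level runs 0..9
lib.write("test", pd.DataFrame({"col": range(0, len(midx1))}, index=midx1))
# First level starts at 15, leaving a gap after 9; the append is accepted
lib.append("test", pd.DataFrame({"col": range(0, len(midx2))}, index=midx2))
# First level restarts at 0, so the stored index is out of order; also accepted
lib.append("test", pd.DataFrame({"col": range(0, len(midx1))}, index=midx1))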

print(lib.read("test").data)

Output

                     col
datetime level
0        2024-01-01    0
1        2024-01-02    1
2        2024-01-03    2
3        2024-01-04    3
4        2024-01-05    4
5        2024-01-06    5
6        2024-01-07    6
7        2024-01-08    7
8        2024-01-09    8
9        2024-01-10    9
15       2024-01-15    0
16       2024-01-16    1
17       2024-01-17    2
18       2024-01-18    3
19       2024-01-19    4
20       2024-01-20    5
0        2024-01-01    0
1        2024-01-02    1
2        2024-01-03    2
3        2024-01-04    3
4        2024-01-05    4
5        2024-01-06    5
6        2024-01-07    6
7        2024-01-08    7
8        2024-01-09    8
9        2024-01-10    9

Expected Results

Apply the same constraints to pd.RangeIndex when it is part of a MultiIndex. Throw an exception in the above case.
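Until such a check exists, a caller could guard appends on the client side. The sketch below is one possible approach, not ArcticDB's implementation: `first_level_continues` is a hypothetical helper, it infers the step from the first two stored values (falling back to 1 for a single row), and it does not verify that the existing data is itself evenly spaced. It works on level values rather than on a `pd.RangeIndex` object, so it does not depend on how pandas stores the level internally.

```python
import numpy as np
import pandas as pd


def first_level_continues(existing: pd.MultiIndex, new: pd.MultiIndex) -> bool:
    """Hypothetical client-side check (not an ArcticDB API).

    Returns True if the first level of `new` continues the first level
    of `existing` with the same step and no gap.
    """
    old_vals = np.asarray(existing.get_level_values(0))
    new_vals = np.asarray(new.get_level_values(0))
    if len(old_vals) == 0 or len(new_vals) == 0:
        return True
    # Assumption: infer the step from the stored data; use 1 for a single row
    step = old_vals[1] - old_vals[0] if len(old_vals) > 1 else 1
    # Values the appended first level would need to have
    expected = old_vals[-1] + step + step * np.arange(len(new_vals))
    return bool(np.array_equal(new_vals, expected))


stored = pd.MultiIndex.from_arrays(
    [pd.RangeIndex(start=0, stop=10), pd.date_range("2024-01-01", periods=10)]
)
gap = pd.MultiIndex.from_arrays(
    [pd.RangeIndex(start=15, stop=21), pd.date_range("2024-01-15", periods=6)]
)
contiguous = pd.MultiIndex.from_arrays(
    [pd.RangeIndex(start=10, stop=16), pd.date_range("2024-01-11", periods=6)]
)

print(first_level_continues(stored, gap))
print(first_level_continues(stored, contiguous))
```

A caller could run this against the index of the last read before calling `lib.append` and raise a `ValueError` when it returns False.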

OS, Python Version and ArcticDB Version

Python: 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
OS: Windows-10-10.0.22631-SP0
ArcticDB: dev

Backend storage used

No response

Additional Context

No response

@vasil-pashov vasil-pashov added the bug Something isn't working label May 10, 2024