It appears the time to write data scales linearly with the number of versions. This is not great. On my local computer it starts off at about 10 ms and climbs to about 30 ms after a few thousand versions. For a higher-latency store, I bet this is more dramatic: one user reported latency of 1.5 sec after 8k versions.
My best guess is that this happens because loading the latest version requires listing all files in the versions directory, so resolving the current manifest is O(num versions). We might have to implement the first part of #1362 to fix this.
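One way to check the hypothesis is to see whether merely opening the dataset (which must resolve the latest version) slows down as manifests accumulate. A minimal sketch, assuming the manifests live under the dataset's _versions/ directory (the exact on-disk layout is an implementation detail and may differ across Lance releases):

import os
import time

import lance

# Time just the open; if the hypothesis is right, this should grow
# with the number of manifest files, independent of data size.
start = time.monotonic()
ds = lance.dataset('test_data')
open_ms = (time.monotonic() - start) * 1000

# Count manifests directly. '_versions' is an assumption here.
num_manifests = len(os.listdir(os.path.join('test_data', '_versions')))
print(f'{num_manifests} manifests, open took {open_ms:.1f} ms')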
Reproduce this
from datetime import timedelta
import time

import pyarrow as pa
import lance

data = pa.table({'a': pa.array([1])})

# Uncomment this part to reset and see once we delete versions, the
# latency goes back down.
# ds = lance.dataset("test_data")
# ds.cleanup_old_versions(older_than=timedelta(seconds=1), delete_unverified=True)

for i in range(10000):
    start = time.monotonic()
    # Use overwrite to eliminate the possibility that it is O(num files)
    lance.write_dataset(data, 'test_data', mode='overwrite')
    print(time.monotonic() - start)
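For reference, the fix direction hinted at above amounts to replacing the LIST with a single GET: keep a small pointer to the latest version so readers never enumerate every manifest. A hypothetical sketch of that idea (this is not Lance's actual API; the _latest_version file name and both helpers are made up for illustration):

import os

# Hypothetical pointer-file scheme: after each commit, record the
# latest version number in a fixed-name side file. Readers then do one
# GET on this file instead of one LIST over all manifests, making
# latest-version resolution O(1) in the number of versions.

POINTER = os.path.join('test_data', '_latest_version')  # made-up name

def record_latest(version: int) -> None:
    # Write-then-rename so readers never observe a partial write.
    tmp = POINTER + '.tmp'
    with open(tmp, 'w') as f:
        f.write(str(version))
    os.replace(tmp, POINTER)

def read_latest() -> int:
    with open(POINTER) as f:
        return int(f.read())

On an object store, where atomic rename is unavailable, the pointer would instead be a small object PUT after each commit.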