[ManagedDB] Rest API based Thin client for ManagedService #2666

Open · wants to merge 101 commits into main

Commits (101):
ccde21a
Create think client
ProgerDav Oct 20, 2023
00c7117
Add most apis to client
ProgerDav Oct 26, 2023
7735e98
More tuninings
ProgerDav Oct 30, 2023
c9a5cb7
Attempt adding tests
ProgerDav Oct 30, 2023
1dcff5c
Pass remining add params
ProgerDav Nov 2, 2023
a5aad51
Merge remote-tracking branch 'origin' into managed-thin-client
ProgerDav Nov 3, 2023
655784e
Slight cleanup
ProgerDav Nov 3, 2023
503a707
refactoring deepmemory to make it a decorator
adolkhan Nov 8, 2023
a6787c7
fixing tests
adolkhan Nov 8, 2023
3034f78
added decorator to search
Nov 8, 2023
9d68bb9
Adjust parameter name
ProgerDav Nov 8, 2023
04ba5ee
Merge branch 'main' into managed-thin-client
Nov 9, 2023
fae5205
Merge branch 'deepmemory_refac' into managed-thin-client
Nov 9, 2023
de04c28
refactoring
Nov 9, 2023
6ad0030
Removing uneccessary imports
Nov 10, 2023
7aaa75c
Merge branch 'managed-thin-client' of github.com:activeloopai/deeplak…
ProgerDav Nov 10, 2023
79a1aeb
Adjust param in managed dh
ProgerDav Nov 10, 2023
a41fdb8
test fixes + enabling passing datasets as a parameter to the vectorstore
Nov 10, 2023
f67c029
deep_memory
Nov 10, 2023
753a770
ManagedDH fixes
ProgerDav Nov 11, 2023
885b1a1
test fixes
Nov 13, 2023
55ef72b
mypy fixes
Nov 13, 2023
8865157
typo fix
Nov 13, 2023
0ff85e9
Change dtype to string
ProgerDav Nov 14, 2023
f37dc86
Get ids for append correctly
ProgerDav Nov 15, 2023
682680f
Pass through redundant kwargs
ProgerDav Nov 15, 2023
b90a89b
Merge branch 'deepmemory_refac' into managed-thin-client
adolkhan Nov 16, 2023
f3a0e87
merge main
adolkhan Nov 17, 2023
f4879fd
LightWeight initialization
adolkhan Nov 17, 2023
abf9855
fixing error with light_initialization
AdkSarsen Nov 17, 2023
d72c4a0
adding support for path to accept path to serilized vectorstore object
AdkSarsen Nov 21, 2023
1743105
fixing ManagedDH attribute error
AdkSarsen Nov 23, 2023
d1df40e
Move after init
dgaloop Nov 23, 2023
6cadc94
Merge branch 'managed-thin-client' of github.com:activeloopai/deeplak…
dgaloop Nov 23, 2023
b1a4746
Refactoring deepmemory
AdkSarsen Nov 23, 2023
7c3d746
adding tool to easilly train deepmemory
AdkSarsen Nov 23, 2023
b51df03
Commenting get_dataset_handler logic as TODO
AdkSarsen Nov 24, 2023
8937682
creating a separate module for def helping tools
AdkSarsen Nov 24, 2023
85bbad1
fixing update embedding bug
AdkSarsen Nov 24, 2023
89d29c2
adding vectorstore commit test
AdkSarsen Nov 24, 2023
58d15bf
fixing commit test error
AdkSarsen Nov 24, 2023
a8777af
Added basic version control to VectorStore
AdkSarsen Nov 24, 2023
d396323
adding tests for the case when either both dataset and path are speci…
AdkSarsen Nov 24, 2023
e292925
adding test for the case when env token is empty
AdkSarsen Nov 24, 2023
ce62ceb
darglint fix
AdkSarsen Nov 24, 2023
b48144d
added __init__.py to dev_helpers
AdkSarsen Nov 24, 2023
a5e3853
fixing update_embedding test
AdkSarsen Nov 24, 2023
64dd6cd
removing ManagedSideDH
AdkSarsen Nov 27, 2023
91c1327
Increasing code coverage and removing managed db staff
AdkSarsen Nov 27, 2023
76d78e0
removing vectorstore dev_helpers
AdkSarsen Nov 27, 2023
45e74b5
Add index params to init API call
dgaloop Nov 27, 2023
51592a8
Merge branch 'deepmemory_refac' into managed-thin-client
AdkSarsen Nov 27, 2023
6730397
Fix index maintenance params
dgaloop Nov 27, 2023
2cd17d1
slight args fixes
dgaloop Nov 27, 2023
10f2d73
first changes
Nov 28, 2023
459b9fe
adding support for embedding_dict in update
AdkSarsen Nov 28, 2023
94a53e3
- Merge remote-tracking branch 'origin/main' into managed_thin
sounakr Dec 4, 2023
ae96871
- Update
sounakr Dec 5, 2023
d22ce91
Merge remote-tracking branch 'origin' into managed-thin-client
dgaloop Dec 6, 2023
be728cd
adding checks for unsupported parameter verification in managed_db
AdkSarsen Dec 7, 2023
9e848fe
fixing all test errors for arguments verification in ManagedDH
AdkSarsen Dec 7, 2023
9711958
removing old client_side_dataset_handler.py
AdkSarsen Dec 7, 2023
a5e7dc7
minor tests fixes
AdkSarsen Dec 7, 2023
52c15ef
Beginning to switch managed_client to polling of /vectorstore/job sta…
nvoxland Dec 8, 2023
f7f0923
Switch managed_client to polling of /vectorstore/job status API
nvoxland Dec 11, 2023
d15d96a
removing non default parameters
AdkSarsen Dec 11, 2023
5494a1b
commneting tests related to non default params
AdkSarsen Dec 11, 2023
8ca8b29
fixing search method
Dec 14, 2023
8bc28e8
Merge branch 'managed-thin-client' into managed-thin-client-jobcheck
dgaloop Dec 16, 2023
d5e8df4
Improve polling logic
dgaloop Dec 17, 2023
4cc7fdf
Add string return of dataset summary
dgaloop Dec 21, 2023
f5d3a66
thin client fixes
AdkSarsen Dec 22, 2023
7bc8b00
Fixes to init
dgaloop Dec 25, 2023
bf89544
Add exceptions
dgaloop Dec 25, 2023
7372550
Merge branch 'main' into managed-thin-client
AdkSarsen Dec 27, 2023
c80a065
failing tests fixes
AdkSarsen Dec 27, 2023
8b7a0ea
deepmemory typo fix
AdkSarsen Dec 27, 2023
1604447
Attempt update_embeddings fix
dgaloop Dec 27, 2023
d20e4d6
Fixing the deepmemory error
Dec 28, 2023
0763abf
Add temp. return_tql to managed_side search
dgaloop Dec 28, 2023
c0a76dd
deepmemory fixes
Dec 28, 2023
b0a3efc
fixing failing tests
Dec 29, 2023
424fe99
Merge branch 'managed-thin-client' of github.com:activeloopai/deeplak…
dgaloop Dec 29, 2023
345b23f
removing commented test
Dec 29, 2023
562ea7e
linting fixes
AdkSarsen Dec 30, 2023
5543e5e
Adding token to test
AdkSarsen Dec 30, 2023
b464e38
fixing failing tests
AdkSarsen Jan 2, 2024
9138890
Merge branch 'managed-thin-client' of github.com:activeloopai/deeplak…
dgaloop Jan 3, 2024
f5d64b3
Merge branch 'main' into managed-thin-client
dgaloop Jan 4, 2024
bebae98
memory backup impl
FayazRahman Jan 10, 2024
254fd6c
commit based rollback for update and pop
FayazRahman Jan 15, 2024
d06ca2b
Merge remote-tracking branch 'origin' into managed-thin-client
dgaloop Jan 22, 2024
43e03c7
from_meta method
FayazRahman Jan 23, 2024
6665ace
fix
FayazRahman Jan 23, 2024
22d8e44
fixes
FayazRahman Jan 24, 2024
8cf3c5a
check for link creds
FayazRahman Jan 24, 2024
786ddf4
Fixed edge cases in indra dataset adaptor.
khustup2 Feb 4, 2024
9dc5cd7
Merge remote-tracking branch 'origin' into managed-thin-client
dgaloop Feb 4, 2024
dd7aa24
Temporary skip deepmemory check on init
dgaloop Feb 6, 2024
6daf75d
Ensure no access by default and skip org fetching
dgaloop Feb 6, 2024
ddd1065
Merge branch 'main' of github.com:activeloopai/deeplake into managed-…
dgaloop Feb 10, 2024
2 changes: 2 additions & 0 deletions deeplake/__init__.py
@@ -111,3 +111,5 @@ def send_event():


threading.Thread(target=send_event, daemon=True).start()

shutdown_event = threading.Event()
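The hunk adds a module-level `shutdown_event`, but the diff does not show where it is read. One common pattern, sketched here as an assumption rather than a description of how deeplake uses it, is to wait on the event instead of calling `time.sleep`, so a long-running poll loop can be woken up and stopped early. `poll_until_shutdown` is a hypothetical helper.

```python
import threading
from typing import Callable, Optional


def poll_until_shutdown(
    check: Callable[[], Optional[dict]],
    interval: float,
    shutdown_event: threading.Event,
) -> Optional[dict]:
    """Call `check` every `interval` seconds until it returns a result or
    `shutdown_event` is set. Hypothetical helper, not part of deeplake."""
    while not shutdown_event.is_set():
        result = check()
        if result is not None:
            return result
        # Event.wait returns True as soon as the event is set, so a shutdown
        # request interrupts the pause instead of waiting out the full interval.
        if shutdown_event.wait(timeout=interval):
            break
    return None


# Usage: a timer thread sets the event while the loop is waiting.
event = threading.Event()
attempts = []


def never_ready() -> Optional[dict]:
    attempts.append(1)
    return None


threading.Timer(0.05, event.set).start()
result = poll_until_shutdown(never_ready, interval=10.0, shutdown_event=event)
print(result)  # None, returned after ~0.05 s rather than 10 s
```

Waiting on an `Event` costs nothing extra compared with `sleep`, and it gives a daemon-style background job a clean exit path.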
2 changes: 2 additions & 0 deletions deeplake/client/config.py
@@ -40,3 +40,5 @@
VECTORSTORE_SEARCH_SUFFIX = "/api/dlserver/vectorstore/search"
VECTORSTORE_ADD_SUFFIX = "/api/dlserver/vectorstore/add"
VECTORSTORE_REMOVE_ROWS_SUFFIX = "/api/dlserver/vectorstore/remove"
VECTORSTORE_UPDATE_ROWS_SUFFIX = "/api/dlserver/vectorstore/update"
JOB_POLLING_INTERVAL = 5
Empty file.
244 changes: 244 additions & 0 deletions deeplake/client/managed/managed_client.py
@@ -0,0 +1,244 @@
from time import sleep
from requests import Response  # type: ignore

import numpy as np
from typing import Callable, Dict, List, Any, Optional, Union

from deeplake.client.client import DeepLakeBackendClient
from deeplake.client.utils import (
    check_response_status,
)
from deeplake.client.config import (
    GET_VECTORSTORE_SUMMARY_SUFFIX,
    INIT_VECTORSTORE_SUFFIX,
    DELETE_VECTORSTORE_SUFFIX,
    VECTORSTORE_ADD_SUFFIX,
    VECTORSTORE_REMOVE_ROWS_SUFFIX,
    VECTORSTORE_UPDATE_ROWS_SUFFIX,
    VECTORSTORE_SEARCH_SUFFIX,
    JOB_POLLING_INTERVAL,
)

from deeplake.client.managed.models import (
    VectorStoreSummaryResponse,
    VectorStoreInitResponse,
    VectorStoreSearchResponse,
    VectorStoreAddResponse,
    VectorStoreDeleteResponse,
    VectorStoreUpdateResponse,
)


class ManagedServiceClient(DeepLakeBackendClient):
    def _preprocess_embedding(self, embedding: Union[List[float], np.ndarray, None]):
        # NumPy arrays are not JSON-serializable, so send them as plain lists.
        if embedding is not None and isinstance(embedding, np.ndarray):
            return embedding.tolist()
        return embedding
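`_preprocess_embedding` is needed because JSON encoding of the `json=` request payload rejects NumPy arrays. The standalone sketch below copies the conversion into a free function (`preprocess_embedding`, a name used only for illustration) to show the failure it avoids.

```python
import json
from typing import List, Optional, Union

import numpy as np


def preprocess_embedding(
    embedding: Union[List[float], np.ndarray, None],
) -> Optional[List[float]]:
    # Same logic as ManagedServiceClient._preprocess_embedding:
    # arrays become plain lists; lists and None pass through unchanged.
    if embedding is not None and isinstance(embedding, np.ndarray):
        return embedding.tolist()
    return embedding


vec = np.array([0.5, 0.25], dtype=np.float32)

raw_rejected = False
try:
    json.dumps({"embedding": vec})
except TypeError as exc:
    raw_rejected = True
    print("raw array rejected:", exc)

payload = json.dumps({"embedding": preprocess_embedding(vec)})
print(payload)  # {"embedding": [0.5, 0.25]}
```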

    def _get_result_or_poll(self, response: Response):
        data = response.json()
        if response.status_code == 202:
            # 202 Accepted: the job is still running; poll its job-status URL.
            url = data["url"]
            data = self.poll_status(url)

        return data
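`_get_result_or_poll` treats HTTP 202 Accepted as "job still running": the response body carries a job-status `url`, which is handed to `poll_status`. The sketch below isolates that dispatch so it runs without a server; `StubResponse` and the `poll` callable are hypothetical stand-ins for `requests.Response` and `poll_status`, and the `/jobs/42` URL is made up.

```python
from typing import Any, Callable, Dict


class StubResponse:
    """Minimal stand-in for requests.Response (only what the dispatch needs)."""

    def __init__(self, status_code: int, body: Dict[str, Any]):
        self.status_code = status_code
        self._body = body

    def json(self) -> Dict[str, Any]:
        return self._body


def get_result_or_poll(
    response: StubResponse, poll: Callable[[str], Dict[str, Any]]
) -> Dict[str, Any]:
    data = response.json()
    if response.status_code == 202:
        # The work was queued; the body points at a job-status endpoint.
        data = poll(data["url"])
    return data


polled_urls = []


def fake_poll(url: str) -> Dict[str, Any]:
    polled_urls.append(url)
    return {"length": 1}


done = get_result_or_poll(StubResponse(200, {"length": 3}), fake_poll)
queued = get_result_or_poll(StubResponse(202, {"url": "/jobs/42"}), fake_poll)
print(done, queued, polled_urls)  # {'length': 3} {'length': 1} ['/jobs/42']
```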

    def poll_status(self, url: str):
        # Re-query the job endpoint every JOB_POLLING_INTERVAL seconds until it
        # stops answering 202. The loop has no timeout.
        while True:
            response = self.request(
                method="GET",
                relative_url=url,
            )
            if response.status_code == 202:
                sleep(JOB_POLLING_INTERVAL)
                continue

            data = response.json()

            return data
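`poll_status` loops until the job endpoint stops answering 202, with no upper bound on how long it waits. A variant with a deadline is sketched below. The `timeout` parameter, the injectable `fetch`/`sleep`/`clock` callables, and raising `TimeoutError` on expiry are assumptions for illustration, not part of this PR.

```python
import time
from typing import Any, Callable, Dict, Tuple

JOB_POLLING_INTERVAL = 5  # seconds, mirrors deeplake/client/config.py


def poll_with_deadline(
    fetch: Callable[[], Tuple[int, Dict[str, Any]]],
    interval: float = JOB_POLLING_INTERVAL,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Call `fetch` until it returns a non-202 status or `timeout` elapses."""
    deadline = clock() + timeout
    while True:
        status, body = fetch()
        if status != 202:
            return body
        if clock() >= deadline:
            raise TimeoutError(f"job still pending after {timeout} s")
        sleep(interval)


# Usage with a fake job that finishes on the third request.
replies = iter([(202, {}), (202, {}), (200, {"result": {"ids": ["a"]}})])
result = poll_with_deadline(lambda: next(replies), interval=0.0)
print(result)  # {'result': {'ids': ['a']}}
```

Using `time.monotonic` for the deadline keeps it unaffected by wall-clock adjustments, and injecting `sleep` and `clock` keeps the loop testable without real waiting.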

    def init_vectorstore(
        self,
        path: str,
        overwrite: bool,
        tensor_params: List[Dict[str, Any]],
        index_params: Dict,
        branch: str,
        verbose: bool,
    ):
        response = self.request(
            method="POST",
            relative_url=INIT_VECTORSTORE_SUFFIX,
            json={
                "dataset": path,
                "overwrite": overwrite,
                "tensor_params": tensor_params,
                "index_params": index_params,
                "branch": branch,
                "verbose": verbose,
            },
        )
        data = self._get_result_or_poll(response)
        error = data.get("error", None)

        if error is not None:
            raise ValueError(error)

        return VectorStoreInitResponse(
            status_code=response.status_code,
            path=data["path"],
            summary=data["summary"],
            length=data["length"],
            tensors=data["tensors"],
            exists=data.get("exists", False),
        )

    def delete_vectorstore(self, path: str, force: bool = False):
        response = self.request(
            method="DELETE",
            relative_url=DELETE_VECTORSTORE_SUFFIX,
            json={"dataset": path, "force": force},
        )
        check_response_status(response)

    def get_vectorstore_summary(self, path: str):
        # Assumes a "hub://<org_id>/<dataset_id>" path: drop the 6-character
        # "hub://" prefix and split the remainder into the two ids.
        org_id, dataset_id = path[6:].split("/")
        response = self.request(
            method="GET",
            relative_url=GET_VECTORSTORE_SUMMARY_SUFFIX.format(org_id, dataset_id),
        )
        check_response_status(response)
        data = response.json()

        return VectorStoreSummaryResponse(
            status_code=response.status_code,
            summary=data["summary"],
            length=data["length"],
            tensors=data["tensors"],
        )
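`get_vectorstore_summary` slices off the first six characters of `path`, which assumes a `hub://org/dataset` path (`len("hub://") == 6`). A path with more or fewer than two segments fails with an unhelpful tuple-unpacking `ValueError`, and a non-hub path is silently mis-split. A stricter parser is sketched below; `parse_hub_path` is a hypothetical helper name.

```python
from typing import Tuple

HUB_PREFIX = "hub://"


def parse_hub_path(path: str) -> Tuple[str, str]:
    """Split 'hub://<org_id>/<dataset_id>' into its two ids."""
    if not path.startswith(HUB_PREFIX):
        raise ValueError(f"expected a path starting with {HUB_PREFIX!r}, got {path!r}")
    parts = path[len(HUB_PREFIX):].split("/")
    # Exactly two non-empty segments are required.
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected 'hub://<org_id>/<dataset_id>', got {path!r}")
    org_id, dataset_id = parts
    return org_id, dataset_id


print(parse_hub_path("hub://my_org/my_vectorstore"))  # ('my_org', 'my_vectorstore')
```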

    def vectorstore_search(
        self,
        path: str,
        embedding: Optional[Union[List[float], np.ndarray]] = None,
        k: int = 4,
        distance_metric: Optional[str] = None,
        query: Optional[str] = None,
        filter: Optional[Dict[str, str]] = None,
        embedding_tensor: str = "embedding",
        return_tensors: Optional[List[str]] = None,
        deep_memory: bool = False,
    ):
        url = VECTORSTORE_SEARCH_SUFFIX
        body = {
            "dataset": path,
            "embedding": self._preprocess_embedding(embedding),
            "k": k,
            "distance_metric": distance_metric,
            "query": query,
            "filter": filter,
            "embedding_tensor": embedding_tensor,
            "return_tensors": return_tensors,
            "deep_memory": deep_memory,
        }
        response = self.request(method="POST", relative_url=url, json=body)
        data = self._get_result_or_poll(response)
        error = data.get("error", None)

        if error is not None:
            raise ValueError(error)

        return VectorStoreSearchResponse(
            status_code=response.status_code,
            length=data["length"],
            data=data["data"],
        )
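`VectorStoreSearchResponse.data` is typed `Dict[str, List[Any]]`, which suggests a column-oriented layout: one list per returned tensor, aligned by position. Assuming that layout (the diff does not document the payload), the sketch below turns it into one dict per hit. `columns_to_rows` and the tensor names `text` and `score` are made up for illustration.

```python
from typing import Any, Dict, List


def columns_to_rows(data: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Convert {'tensor': [v0, v1, ...], ...} into [{'tensor': v0, ...}, ...]."""
    lengths = {len(values) for values in data.values()}
    if len(lengths) > 1:
        raise ValueError(f"columns have different lengths: {sorted(lengths)}")
    n_rows = lengths.pop() if lengths else 0
    return [{name: values[i] for name, values in data.items()} for i in range(n_rows)]


hits = columns_to_rows({"text": ["first doc", "second doc"], "score": [0.91, 0.78]})
print(hits)
# [{'text': 'first doc', 'score': 0.91}, {'text': 'second doc', 'score': 0.78}]
```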

    def vectorstore_add(
        self,
        path: str,
        processed_tensors: Dict[str, List[Any]],
        rate_limiter: Optional[Dict[str, Any]] = None,
        return_ids: bool = False,
    ):
        response = self.request(
            method="POST",
            relative_url=VECTORSTORE_ADD_SUFFIX,
            json={
                "dataset": path,
                "data": processed_tensors,
                "rate_limiter": rate_limiter,
                "return_ids": return_ids,
            },
        )
        data = self._get_result_or_poll(response)
        # This endpoint's ids and error are read from the nested "result" key.
        data = data.get("result", {})
        ids = data.get("ids", None)
        error = data.get("error", None)

        if error is not None:
            raise ValueError(error)

        return VectorStoreAddResponse(
            status_code=response.status_code, ids=ids, error=error
        )
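`vectorstore_add` sends `processed_tensors` as `Dict[str, List[Any]]`, one list per tensor. If the caller starts from row-shaped records, a pivot like the one below builds that mapping and converts NumPy embeddings to lists first, mirroring `_preprocess_embedding`. The helper name `rows_to_tensors` and the `text`/`embedding` tensor names are hypothetical, and the diff does not say whether the server requires every tensor in every row; the sketch simply enforces it.

```python
from typing import Any, Dict, List

import numpy as np


def rows_to_tensors(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot row dicts into per-tensor lists; every row must have the same keys."""
    if not rows:
        return {}
    keys = list(rows[0])
    tensors: Dict[str, List[Any]] = {key: [] for key in keys}
    for index, row in enumerate(rows):
        if set(row) != set(keys):
            raise ValueError(f"row {index} has keys {sorted(row)}, expected {sorted(keys)}")
        for key in keys:
            value = row[key]
            # NumPy arrays are not JSON-serializable, so convert them up front.
            tensors[key].append(value.tolist() if isinstance(value, np.ndarray) else value)
    return tensors


processed = rows_to_tensors([
    {"text": "hello", "embedding": np.array([1.0, 0.0])},
    {"text": "world", "embedding": np.array([0.0, 1.0])},
])
print(processed)
# {'text': ['hello', 'world'], 'embedding': [[1.0, 0.0], [0.0, 1.0]]}
```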

    def vectorstore_remove_rows(
        self,
        path: str,
        row_ids: Optional[List[int]] = None,
        ids: Optional[List[str]] = None,
        filter: Optional[Dict[str, str]] = None,
        query: Optional[str] = None,
        delete_all: bool = False,
    ):
        response = self.request(
            method="POST",
            relative_url=VECTORSTORE_REMOVE_ROWS_SUFFIX,
            json={
                "dataset": path,
                "row_ids": row_ids,
                "ids": ids,
                "filter": filter,
                "query": query,
                "delete_all": delete_all,
            },
        )
        data = self._get_result_or_poll(response)
        data = data.get("result", {})
        error = data.get("error", None)

        if error is not None:
            raise ValueError(error)

        return VectorStoreDeleteResponse(status_code=response.status_code, error=error)

    def vectorstore_update_embeddings(
        self,
        path: str,
        row_ids: List[str],
        ids: List[str],
        filter: Union[Dict, Callable],
        query: str,
        embedding_dict: Optional[Dict[str, Union[List[float], List[float]]]],
    ):
        response = self.request(
            method="POST",
            relative_url=VECTORSTORE_UPDATE_ROWS_SUFFIX,
            json={
                "dataset": path,
                "row_ids": row_ids,
                "ids": ids,
                "filter": filter,
                "query": query,
                "embedding_dict": embedding_dict,
            },
        )
        data = self._get_result_or_poll(response)
        data = data.get("result", {})
        error = data.get("error", None)

        if error is not None:
            raise ValueError(error)

        return VectorStoreUpdateResponse(status_code=response.status_code, error=error)
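Unlike `vectorstore_search`, `vectorstore_update_embeddings` forwards `embedding_dict` without passing its values through `_preprocess_embedding`, so NumPy arrays placed in it would fail JSON encoding. A conversion in the same spirit is sketched below; `preprocess_embedding_dict` is a hypothetical helper and the tensor name `embedding` is only an example.

```python
from typing import Any, Dict, List, Optional, Union

import numpy as np

EmbeddingValue = Union[List[float], List[List[float]], np.ndarray]


def preprocess_embedding_dict(
    embedding_dict: Optional[Dict[str, EmbeddingValue]],
) -> Optional[Dict[str, Any]]:
    """Return a JSON-safe copy: arrays become nested lists, other values pass through."""
    if embedding_dict is None:
        return None
    return {
        name: value.tolist() if isinstance(value, np.ndarray) else value
        for name, value in embedding_dict.items()
    }


update = preprocess_embedding_dict({"embedding": np.zeros((2, 3), dtype=np.float32)})
print(update)  # {'embedding': [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]}
```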
42 changes: 42 additions & 0 deletions deeplake/client/managed/models.py
@@ -0,0 +1,42 @@
from typing import NamedTuple, Dict, List, Optional, Any


class VectorStoreSummaryResponse(NamedTuple):
    status_code: int
    summary: str
    length: int
    tensors: List[Dict[str, Any]]  # Same format as `tensor_params` in `init_vectorstore`


class VectorStoreInitResponse(NamedTuple):
    status_code: int
    path: str
    summary: str
    length: int
    tensors: List[Dict[str, Any]]
    exists: bool


class VectorStoreSearchResponse(NamedTuple):
    status_code: int
    length: int
    data: Dict[str, List[Any]]
    error: Optional[str] = None


class VectorStoreAddResponse(NamedTuple):
    status_code: int
    ids: Optional[List[str]] = None
    error: Optional[str] = None


class VectorStoreDeleteResponse(NamedTuple):
    status_code: int
    error: Optional[str] = None


class VectorStoreUpdateResponse(NamedTuple):
    status_code: int
    error: Optional[str] = None
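The response models are `typing.NamedTuple` classes, so they are immutable, and fields declared with a default (`ids`, `error`) can be omitted at construction. The check below uses a local copy of `VectorStoreAddResponse` so it runs without deeplake installed; the id strings and error text are made up.

```python
from typing import List, NamedTuple, Optional


class VectorStoreAddResponse(NamedTuple):
    status_code: int
    ids: Optional[List[str]] = None
    error: Optional[str] = None


ok = VectorStoreAddResponse(status_code=200, ids=["id-1", "id-2"])
print(ok.error is None)  # True: the default applies

is_immutable = False
try:
    ok.status_code = 500  # NamedTuple fields are read-only
except AttributeError:
    is_immutable = True

# _replace builds a new tuple instead of mutating the original.
failed = ok._replace(status_code=413, ids=None, error="payload too large")
print(failed)
# VectorStoreAddResponse(status_code=413, ids=None, error='payload too large')
```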
3 changes: 3 additions & 0 deletions deeplake/client/utils.py
@@ -24,6 +24,7 @@
    UnexpectedStatusCodeException,
    EmptyTokenException,
    UnprocessableEntityException,
    RequestPayloadTooLargeException,
)


@@ -91,6 +92,8 @@
        if message != " ":
            raise ResourceNotFoundException(message)
        raise ResourceNotFoundException
    elif response.status_code == 413:
        raise RequestPayloadTooLargeException
    elif response.status_code == 422:
        raise UnprocessableEntityException(message)
    elif response.status_code == 423:
2 changes: 1 addition & 1 deletion deeplake/constants.py
@@ -237,7 +237,7 @@
     {
         "name": "embedding",
         "htype": "embedding",
-        "dtype": np.float32,
+        "dtype": "float32",
         "create_id_tensor": False,
         "create_sample_info_tensor": False,
         "create_shape_tensor": True,
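The `constants.py` hunk (commit "Change dtype to string") replaces `np.float32` with the string `"float32"` in the default embedding tensor parameters. A plausible reason, given that `init_vectorstore` now sends `tensor_params` as JSON, is that `np.float32` is a Python class and cannot be JSON-encoded, while the string round-trips and still resolves to the same NumPy dtype:

```python
import json

import numpy as np

class_rejected = False
try:
    json.dumps({"dtype": np.float32})
except TypeError as exc:
    class_rejected = True
    print("class rejected:", exc)

encoded = json.dumps({"dtype": "float32"})
decoded = json.loads(encoded)
# The string form resolves back to the same dtype on either side of the wire.
print(np.dtype(decoded["dtype"]) == np.float32)  # True
```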