[BUG] paulgraham_essays cannot store to personal account #2740

Open
davidgao7 opened this issue Jan 12, 2024 · 0 comments
Labels
bug Something isn't working

davidgao7 commented Jan 12, 2024

Severity

P1 - Urgent, but non-breaking

Current Behavior

[Environment]:

  • Python 3.10.0
  • llama-index==0.9.14.post3
  • openai==1.3.8
  • cohere==4.37
  • deeplake==3.8.14
  • libdeeplake==0.0.95
  • python-dotenv==1.0.0

[OS]

  • macOS Sonoma 14.2.1
  • chip Intel

I was practicing LlamaIndex while taking the 'Retrieval Augmented Generation for Production with LangChain & LlamaIndex' course on Activeloop (thank you for making this awesome course possible!). While going through the 'Mastering Advanced RAG Techniques with LlamaIndex' section of the tutorial, I tried to store paul_graham_essay.txt in the vector store database, following the code provided:

from llama_index.vector_stores import DeepLakeVectorStore

my_activeloop_org_id = "genai360"  # I changed this to my own organization id
my_activeloop_dataset_name = "LlamaIndex_paulgraham_essays"
dataset_path = f"hub://{my_activeloop_org_id}/{my_activeloop_dataset_name}"

# Create an index over the documents
vector_store = DeepLakeVectorStore(dataset_path=dataset_path, overwrite=False)

where I replaced the organization ID with my own freshly created org; the rest of the code is the same as the example. When I ran the indexing cell, I got a RateLimitError.
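
For reference, the indexing cell that raises the error (reconstructed from the traceback below; `nodes` comes from the tutorial's earlier parsing step) looks roughly like this:

from llama_index import StorageContext, VectorStoreIndex

# Attach the Deep Lake vector store to a storage context and register the parsed nodes
storage_context = StorageContext.from_defaults(vector_store=vector_store)
storage_context.docstore.add_documents(nodes)

# Building the index computes OpenAI embeddings for every node; this is the call that fails
vector_index = VectorStoreIndex(nodes, storage_context=storage_context)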

Error details

RateLimitError                            Traceback (most recent call last)
Cell In[32], line 12
     10 storage_context = StorageContext.from_defaults(vector_store=vector_store)
     11 storage_context.docstore.add_documents(nodes)
---> 12 vector_index = VectorStoreIndex(nodes, storage_context=storage_context)

File ~/anaconda3/envs/llmstreamlit/lib/python3.10/site-packages/llama_index/indices/vector_store/base.py:49, in VectorStoreIndex.__init__(self, nodes, index_struct, service_context, storage_context, use_async, store_nodes_override, show_progress, **kwargs)
     47 self._use_async = use_async
     48 self._store_nodes_override = store_nodes_override
---> 49 super().__init__(
     50     nodes=nodes,
     51     index_struct=index_struct,
     52     service_context=service_context,
     53     storage_context=storage_context,
     54     show_progress=show_progress,
     55     **kwargs,
     56 )

File ~/anaconda3/envs/llmstreamlit/lib/python3.10/site-packages/llama_index/indices/base.py:71, in BaseIndex.__init__(self, nodes, index_struct, storage_context, service_context, show_progress, **kwargs)
     69 if index_struct is None:
     70     assert nodes is not None
---> 71     index_struct = self.build_index_from_nodes(nodes)
     72 self._index_struct = index_struct
     73 self._storage_context.index_store.add_index_struct(self._index_struct)

File ~/anaconda3/envs/llmstreamlit/lib/python3.10/site-packages/llama_index/indices/vector_store/base.py:255, in VectorStoreIndex.build_index_from_nodes(self, nodes, **insert_kwargs)
    244 def build_index_from_nodes(
    245     self,
    246     nodes: Sequence[BaseNode],
    247     **insert_kwargs: Any,
    248 ) -> IndexDict:
    249     """Build the index from nodes.
    250 
    251     NOTE: Overrides BaseIndex.build_index_from_nodes.
    252         VectorStoreIndex only stores nodes in document store
    253         if vector store does not store text
    254     """
--> 255     return self._build_index_from_nodes(nodes, **insert_kwargs)

File ~/anaconda3/envs/llmstreamlit/lib/python3.10/site-packages/llama_index/indices/vector_store/base.py:236, in VectorStoreIndex._build_index_from_nodes(self, nodes, **insert_kwargs)
    234     run_async_tasks(tasks)
    235 else:
--> 236     self._add_nodes_to_index(
    237         index_struct,
    238         nodes,
    239         show_progress=self._show_progress,
    240         **insert_kwargs,
    241     )
    242 return index_struct

File ~/anaconda3/envs/llmstreamlit/lib/python3.10/site-packages/llama_index/indices/vector_store/base.py:189, in VectorStoreIndex._add_nodes_to_index(self, index_struct, nodes, show_progress, **insert_kwargs)
    186 if not nodes:
    187     return
--> 189 nodes = self._get_node_with_embedding(nodes, show_progress)
    190 new_ids = self._vector_store.add(nodes, **insert_kwargs)
    192 if not self._vector_store.stores_text or self._store_nodes_override:
    193     # NOTE: if the vector store doesn't store text,
    194     # we need to add the nodes to the index struct and document store

File ~/anaconda3/envs/llmstreamlit/lib/python3.10/site-packages/llama_index/indices/vector_store/base.py:101, in VectorStoreIndex._get_node_with_embedding(self, nodes, show_progress)
     90 def _get_node_with_embedding(
     91     self,
     92     nodes: Sequence[BaseNode],
     93     show_progress: bool = False,
     94 ) -> List[BaseNode]:
     95     """Get tuples of id, node, and embedding.
     96 
     97     Allows us to store these nodes in a vector store.
     98     Embeddings are called in batches.
     99 
    100     """
--> 101     id_to_embed_map = embed_nodes(
    102         nodes, self._service_context.embed_model, show_progress=show_progress
    103     )
    105     results = []
    106     for node in nodes:

File ~/anaconda3/envs/llmstreamlit/lib/python3.10/site-packages/llama_index/indices/utils.py:137, in embed_nodes(nodes, embed_model, show_progress)
    134     else:
    135         id_to_embed_map[node.node_id] = node.embedding
--> 137 new_embeddings = embed_model.get_text_embedding_batch(
    138     texts_to_embed, show_progress=show_progress
    139 )
    141 for new_id, text_embedding in zip(ids_to_embed, new_embeddings):
    142     id_to_embed_map[new_id] = text_embedding

File ~/anaconda3/envs/llmstreamlit/lib/python3.10/site-packages/llama_index/embeddings/base.py:256, in BaseEmbedding.get_text_embedding_batch(self, texts, show_progress, **kwargs)
    250 if idx == len(texts) - 1 or len(cur_batch) == self.embed_batch_size:
    251     # flush
    252     with self.callback_manager.event(
    253         CBEventType.EMBEDDING,
    254         payload={EventPayload.SERIALIZED: self.to_dict()},
    255     ) as event:
--> 256         embeddings = self._get_text_embeddings(cur_batch)
    257         result_embeddings.extend(embeddings)
    258         event.on_end(
    259             payload={
    260                 EventPayload.CHUNKS: cur_batch,
    261                 EventPayload.EMBEDDINGS: embeddings,
    262             },
    263         )

File ~/anaconda3/envs/llmstreamlit/lib/python3.10/site-packages/llama_index/embeddings/openai.py:385, in OpenAIEmbedding._get_text_embeddings(self, texts)
    378 """Get text embeddings.
    379 
    380 By default, this is a wrapper around _get_text_embedding.
    381 Can be overridden for batch queries.
    382 
    383 """
    384 client = self._get_client()
--> 385 return get_embeddings(
    386     client,
    387     texts,
    388     engine=self._text_engine,
    389     **self.additional_kwargs,
    390 )

File ~/anaconda3/envs/llmstreamlit/lib/python3.10/site-packages/tenacity/__init__.py:289, in BaseRetrying.wraps.<locals>.wrapped_f(*args, **kw)
    287 @functools.wraps(f)
    288 def wrapped_f(*args: t.Any, **kw: t.Any) -> t.Any:
--> 289     return self(f, *args, **kw)

File ~/anaconda3/envs/llmstreamlit/lib/python3.10/site-packages/tenacity/__init__.py:379, in Retrying.__call__(self, fn, *args, **kwargs)
    377 retry_state = RetryCallState(retry_object=self, fn=fn, args=args, kwargs=kwargs)
    378 while True:
--> 379     do = self.iter(retry_state=retry_state)
    380     if isinstance(do, DoAttempt):
    381         try:

File ~/anaconda3/envs/llmstreamlit/lib/python3.10/site-packages/tenacity/__init__.py:325, in BaseRetrying.iter(self, retry_state)
    323     retry_exc = self.retry_error_cls(fut)
    324     if self.reraise:
--> 325         raise retry_exc.reraise()
    326     raise retry_exc from fut.exception()
    328 if self.wait:

File ~/anaconda3/envs/llmstreamlit/lib/python3.10/site-packages/tenacity/__init__.py:158, in RetryError.reraise(self)
    156 def reraise(self) -> t.NoReturn:
    157     if self.last_attempt.failed:
--> 158         raise self.last_attempt.result()
    159     raise self

File ~/anaconda3/envs/llmstreamlit/lib/python3.10/concurrent/futures/_base.py:438, in Future.result(self, timeout)
    436     raise CancelledError()
    437 elif self._state == FINISHED:
--> 438     return self.__get_result()
    440 self._condition.wait(timeout)
    442 if self._state in [CANCELLED, CANCELLED_AND_NOTIFIED]:

File ~/anaconda3/envs/llmstreamlit/lib/python3.10/concurrent/futures/_base.py:390, in Future.__get_result(self)
    388 if self._exception:
    389     try:
--> 390         raise self._exception
    391     finally:
    392         # Break a reference cycle with the exception in self._exception
    393         self = None

File ~/anaconda3/envs/llmstreamlit/lib/python3.10/site-packages/tenacity/__init__.py:382, in Retrying.__call__(self, fn, *args, **kwargs)
    380 if isinstance(do, DoAttempt):
    381     try:
--> 382         result = fn(*args, **kwargs)
    383     except BaseException:  # noqa: B902
    384         retry_state.set_exception(sys.exc_info())  # type: ignore[arg-type]

File ~/anaconda3/envs/llmstreamlit/lib/python3.10/site-packages/llama_index/embeddings/openai.py:162, in get_embeddings(client, list_of_text, engine, **kwargs)
    158 assert len(list_of_text) <= 2048, "The batch size should not be larger than 2048."
    160 list_of_text = [text.replace("\n", " ") for text in list_of_text]
--> 162 data = client.embeddings.create(input=list_of_text, model=engine, **kwargs).data
    163 return [d.embedding for d in data]

File ~/anaconda3/envs/llmstreamlit/lib/python3.10/site-packages/openai/resources/embeddings.py:105, in Embeddings.create(self, input, model, encoding_format, user, extra_headers, extra_query, extra_body, timeout)
     99         embedding.embedding = np.frombuffer(  # type: ignore[no-untyped-call]
    100             base64.b64decode(data), dtype="float32"
    101         ).tolist()
    103     return obj
--> 105 return self._post(
    106     "/embeddings",
    107     body=maybe_transform(params, embedding_create_params.EmbeddingCreateParams),
    108     options=make_request_options(
    109         extra_headers=extra_headers,
    110         extra_query=extra_query,
    111         extra_body=extra_body,
    112         timeout=timeout,
    113         post_parser=parser,
    114     ),
    115     cast_to=CreateEmbeddingResponse,
    116 )

File ~/anaconda3/envs/llmstreamlit/lib/python3.10/site-packages/openai/_base_client.py:1086, in SyncAPIClient.post(self, path, cast_to, body, options, files, stream, stream_cls)
   1072 def post(
   1073     self,
   1074     path: str,
   (...)
   1081     stream_cls: type[_StreamT] | None = None,
   1082 ) -> ResponseT | _StreamT:
   1083     opts = FinalRequestOptions.construct(
   1084         method="post", url=path, json_data=body, files=to_httpx_files(files), **options
   1085     )
-> 1086     return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))

File ~/anaconda3/envs/llmstreamlit/lib/python3.10/site-packages/openai/_base_client.py:846, in SyncAPIClient.request(self, cast_to, options, remaining_retries, stream, stream_cls)
    837 def request(
    838     self,
    839     cast_to: Type[ResponseT],
   (...)
    844     stream_cls: type[_StreamT] | None = None,
    845 ) -> ResponseT | _StreamT:
--> 846     return self._request(
    847         cast_to=cast_to,
    848         options=options,
    849         stream=stream,
    850         stream_cls=stream_cls,
    851         remaining_retries=remaining_retries,
    852     )

File ~/anaconda3/envs/llmstreamlit/lib/python3.10/site-packages/openai/_base_client.py:884, in SyncAPIClient._request(self, cast_to, options, remaining_retries, stream, stream_cls)
    882 if retries > 0 and self._should_retry(err.response):
    883     err.response.close()
--> 884     return self._retry_request(
    885         options,
    886         cast_to,
    887         retries,
    888         err.response.headers,
    889         stream=stream,
    890         stream_cls=stream_cls,
    891     )
    893 # If the response is streamed then we need to explicitly read the response
    894 # to completion before attempting to access the response text.
    895 if not err.response.is_closed:

File ~/anaconda3/envs/llmstreamlit/lib/python3.10/site-packages/openai/_base_client.py:956, in SyncAPIClient._retry_request(self, options, cast_to, remaining_retries, response_headers, stream, stream_cls)
    952 # In a synchronous context we are blocking the entire thread. Up to the library user to run the client in a
    953 # different thread if necessary.
    954 time.sleep(timeout)
--> 956 return self._request(
    957     options=options,
    958     cast_to=cast_to,
    959     remaining_retries=remaining,
    960     stream=stream,
    961     stream_cls=stream_cls,
    962 )

File ~/anaconda3/envs/llmstreamlit/lib/python3.10/site-packages/openai/_base_client.py:884, in SyncAPIClient._request(self, cast_to, options, remaining_retries, stream, stream_cls)
    882 if retries > 0 and self._should_retry(err.response):
    883     err.response.close()
--> 884     return self._retry_request(
    885         options,
    886         cast_to,
    887         retries,
    888         err.response.headers,
    889         stream=stream,
    890         stream_cls=stream_cls,
    891     )
    893 # If the response is streamed then we need to explicitly read the response
    894 # to completion before attempting to access the response text.
    895 if not err.response.is_closed:

File ~/anaconda3/envs/llmstreamlit/lib/python3.10/site-packages/openai/_base_client.py:956, in SyncAPIClient._retry_request(self, options, cast_to, remaining_retries, response_headers, stream, stream_cls)
    952 # In a synchronous context we are blocking the entire thread. Up to the library user to run the client in a
    953 # different thread if necessary.
    954 time.sleep(timeout)
--> 956 return self._request(
    957     options=options,
    958     cast_to=cast_to,
    959     remaining_retries=remaining,
    960     stream=stream,
    961     stream_cls=stream_cls,
    962 )

    [... skipping similar frames: SyncAPIClient._request at line 884 (7 times), SyncAPIClient._retry_request at line 956 (7 times)]

File ~/anaconda3/envs/llmstreamlit/lib/python3.10/site-packages/openai/_base_client.py:884, in SyncAPIClient._request(self, cast_to, options, remaining_retries, stream, stream_cls)
    882 if retries > 0 and self._should_retry(err.response):
    883     err.response.close()
--> 884     return self._retry_request(
    885         options,
    886         cast_to,
    887         retries,
    888         err.response.headers,
    889         stream=stream,
    890         stream_cls=stream_cls,
    891     )
    893 # If the response is streamed then we need to explicitly read the response
    894 # to completion before attempting to access the response text.
    895 if not err.response.is_closed:

File ~/anaconda3/envs/llmstreamlit/lib/python3.10/site-packages/openai/_base_client.py:956, in SyncAPIClient._retry_request(self, options, cast_to, remaining_retries, response_headers, stream, stream_cls)
    952 # In a synchronous context we are blocking the entire thread. Up to the library user to run the client in a
    953 # different thread if necessary.
    954 time.sleep(timeout)
--> 956 return self._request(
    957     options=options,
    958     cast_to=cast_to,
    959     remaining_retries=remaining,
    960     stream=stream,
    961     stream_cls=stream_cls,
    962 )

File ~/anaconda3/envs/llmstreamlit/lib/python3.10/site-packages/openai/_base_client.py:898, in SyncAPIClient._request(self, cast_to, options, remaining_retries, stream, stream_cls)
    895     if not err.response.is_closed:
    896         err.response.read()
--> 898     raise self._make_status_error_from_response(err.response) from None
    899 except httpx.TimeoutException as err:
    900     if response is not None:

RateLimitError: Error code: 429 - {'error': {'message': 'You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.', 'type': 'insufficient_quota', 'param': None, 'code': 'insufficient_quota'}}

I also checked my OpenAI API usage and it is not exceeding the quota. I couldn't find where to check quota limits in app.activeloop.ai; this is the first time I've used Activeloop/Deep Lake.
Could anyone help me, please?
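
To isolate the problem, I think the 429 can be checked independently of LlamaIndex and Deep Lake, since the traceback ends inside the OpenAI client. A minimal sketch, assuming OPENAI_API_KEY is set in the environment:

from openai import OpenAI

# Standalone embedding request; if this also fails with 429 insufficient_quota,
# the problem is on the OpenAI account side, not Activeloop
client = OpenAI()  # reads OPENAI_API_KEY from the environment
resp = client.embeddings.create(input=["hello world"], model="text-embedding-ada-002")
print(len(resp.data[0].embedding))  # 1536 dimensions for text-embedding-ada-002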

Thank you!
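
If the OpenAI quota does turn out to be the blocker, a possible workaround (just an idea on my part, not part of the course code) would be to switch to a local embedding model so no OpenAI calls are made at all:

from llama_index import ServiceContext, VectorStoreIndex

# "local" selects a HuggingFace sentence-transformers embedding model;
# requires `pip install sentence-transformers`
service_context = ServiceContext.from_defaults(embed_model="local")
vector_index = VectorStoreIndex(
    nodes, storage_context=storage_context, service_context=service_context
)

The embeddings would then be computed locally and written to the same Deep Lake dataset.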

Steps to Reproduce

Here's the current state of my code and the errors I get while following the tutorial (see above).

Feel free to point out any other mistakes I've made :) I'm new to LlamaIndex and Activeloop Deep Lake.

Expected/Desired Behavior

I'm expecting the vector index to be created and uploaded to Activeloop successfully, just like the tutorial shows:

(Screenshot, 2024-01-12: the expected dataset view in Activeloop after a successful upload.)

Python Version

3.10.0

OS

macOS Sonoma 14.2.1

IDE

jupyter lab

Packages

aioboto3==12.1.0 aiobotocore==2.8.0 aiohttp==3.9.1 aioitertools==0.11.0 aiosignal==1.3.1 aiostream==0.5.2 altair==5.2.0 annotated-types==0.6.0 anyio==3.7.1 appnope==0.1.3 argon2-cffi==23.1.0 argon2-cffi-bindings==21.2.0 arrow==1.3.0 asttokens==2.4.1 async-lru==2.0.4 async-timeout==4.0.3 atlassian-python-api==3.41.4 attrs==23.1.0 Babel==2.14.0 backoff==2.2.1 beautifulsoup4==4.12.2 black==22.12.0 bleach==6.1.0 blinker==1.7.0 boto3==1.33.1 botocore==1.33.1 cachetools==5.3.2 certifi==2023.11.17 cffi==1.16.0 charset-normalizer==3.3.2 click==8.1.7 clip @ git+https://github.com/openai/CLIP.git@a1d071733d7111c9c014f024669f959182114e33 cohere==4.37 comm==0.2.1 dataclasses-json==0.6.3 debugpy==1.8.0 decorator==5.1.1 deeplake==3.8.14 defusedxml==0.7.1 Deprecated==1.2.14 dill==0.3.7 distro==1.8.0 docopt==0.6.2 exceptiongroup==1.2.0 executing==2.0.1 fastavro==1.9.2 fastjsonschema==2.19.1 fqdn==1.5.1 frozenlist==1.4.0 fsspec==2023.12.1 gitdb==4.0.11 GitPython==3.1.40 google-api-core==2.15.0 google-auth==2.25.2 googleapis-common-protos==1.62.0 greenlet==3.0.1 h11==0.14.0 html2text==2020.1.16 httpcore==1.0.2 httpx==0.25.2 humbug==0.3.2 idna==3.6 importlib-metadata==6.11.0 iniconfig==2.0.0 ipykernel==6.28.0 ipython==8.19.0 isoduration==20.11.0 isort==5.11.4 jedi==0.19.1 Jinja2==3.1.2 jmespath==1.0.1 joblib==1.3.2 json5==0.9.14 jsonpatch==1.33 jsonpointer==2.4 jsonschema==4.20.0 jsonschema-specifications==2023.7.1 jupyter-contrib-core==0.4.2 jupyter-events==0.9.0 jupyter-lsp==2.2.1 jupyter-nbextensions-configurator==0.6.3 jupyter_client==8.6.0 jupyter_core==5.6.1 jupyter_server==2.12.4 jupyter_server_terminals==0.5.1 jupyterlab==4.0.10 jupyterlab_pygments==0.3.0 jupyterlab_server==2.25.2 langchain==0.0.346 langchain-community==0.0.1 langchain-core==0.0.13 langsmith==0.0.69 libdeeplake==0.0.95 llama-hub==0.0.44 llama-index==0.9.14.post3 lz4==4.3.3 markdown-it-py==3.0.0 MarkupSafe==2.1.3 marshmallow==3.20.1 matplotlib-inline==0.1.6 mdurl==0.1.2 mistune==3.0.2 multidict==6.0.4 multiprocess==0.70.15 mypy==0.991 mypy-extensions==1.0.0 nbclient==0.9.0 nbconvert==7.14.1 nbformat==5.9.2 nest-asyncio==1.5.8 nltk==3.8.1 notebook==7.0.6 notebook_shim==0.2.3 numexpr==2.8.6 numpy==1.24.4 oauthlib==3.2.2 openai==1.3.8 overrides==7.4.0 packaging==23.2 pandas==2.0.3 pandocfilters==1.5.0 parso==0.8.3 pathos==0.3.1 pathspec==0.11.2 pexpect==4.9.0 Pillow==10.1.0 pipreqs==0.4.13 platformdirs==4.1.0 pluggy==1.3.0 pox==0.3.3 ppft==1.7.6.7 prometheus-client==0.19.0 prompt-toolkit==3.0.43 protobuf==4.25.1 psutil==5.9.6 ptyprocess==0.7.0 pure-eval==0.2.2 pyarrow==14.0.1 pyasn1==0.5.1 pyasn1-modules==0.3.0 pycparser==2.21 pydantic==2.5.2 pydantic_core==2.14.5 pydeck==0.8.0 Pygments==2.17.2 PyJWT==2.8.0 pypdf==3.17.1 pytest==7.4.3 pytest-asyncio==0.21.1 python-dateutil==2.8.2 python-dotenv==1.0.0 python-json-logger==2.0.7 pytz==2023.3.post1 PyYAML==6.0.1 pyzmq==25.1.2 referencing==0.30.2 regex==2023.10.3 requests==2.31.0 requests-oauthlib==1.3.1 retrying==1.3.4 rfc3339-validator==0.1.4 rfc3986-validator==0.1.1 rich==13.7.0 rpds-py==0.13.2 rsa==4.9 ruff==0.0.285 s3transfer==0.8.0 Send2Trash==1.8.2 six==1.16.0 smmap==5.0.1 sniffio==1.3.0 soupsieve==2.5 SQLAlchemy==2.0.23 stack-data==0.6.3 streamlit==1.28.0 streamlit-pills==0.3.0 tenacity==8.2.3 terminado==0.18.0 tiktoken==0.5.2 tinycss2==1.2.1 toml==0.10.2 tomli==2.0.1 toolz==0.12.0 tornado==6.4 tqdm==4.66.1 traitlets==5.14.1 types-python-dateutil==2.8.19.20240106 types-requests==2.28.11.8 types-urllib3==1.26.25.14 typing-inspect==0.8.0 typing_extensions==4.8.0 tzdata==2023.3 
tzlocal==5.2 uri-template==1.3.0 urllib3==2.0.7 validators==0.22.0 wcwidth==0.2.12 webcolors==1.13 webencodings==0.5.1 websocket-client==1.7.0 wikipedia==1.4.0 wrapt==1.16.0 yarg==0.1.9 yarl==1.9.3 zipp==3.17.0

Additional Context

No response

Possible Solution

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR (Thank you!)