Make RAG embedding model dimension configurable #1241

usamimeri · 2024-05-01T17:34:28Z

User description

For #1213 and #1239. When using text-embedding-3-large(the dimension is 3072) from OpenAI, or some other embedding models deployed by fastchat(e.g. bce-embedding-base_v1 the dimension is 512), it will cause assertion error due to mismatch between query and index.

Consequently,I think it's reasonable to make dimension configurable.

# fastchat API
embedding:
  api_type: "openai"
  base_url: "http://127.0.0.1:8000/v1/embeddings"
  model: "bce-embedding-base_v1"
  dimensions: 512

codiumai-pr-agent-pro · 2024-05-01T17:35:41Z

PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here.

PR Description updated to latest commit (5d2bcbd)

Copy walkthrough table to "Files Changed" Tab

codiumai-pr-agent-pro · 2024-05-01T17:35:59Z

PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here.

PR Review

⏱️ Estimated effort to review [1-5]	4, because the PR involves multiple complex changes across various files including configuration updates, API integrations, and provider implementations. The addition of a new model provider (Bedrock) and the handling of different model parameters increase the complexity. The PR also touches on sensitive areas like token counting and cost management which require careful review to ensure accuracy and efficiency.
🧪 Relevant tests	Yes
🔍 Possible issues	Possible Bug: The use of hardcoded "YOUR_MODEL_DIMENSIONS" in the example configurations in 'embedding_config.py' might not be clear to users on what values should be actually provided.
	Configuration Error: The 'dimensions' field is added as a string in the examples within 'embedding_config.py', whereas it is treated as an integer in other parts of the application. This inconsistency could lead to runtime errors or misconfigurations.
	Error Handling: The method '_get_completion_from_dict' in various provider classes does not handle cases where the expected keys might be missing in the response dictionary, which could lead to KeyError exceptions.
🔒 Security concerns	No

✨ Review tool usage guide:

Overview:
The review tool scans the PR code changes, and generates a PR review which includes several types of feedbacks, such as possible PR issues, security threats and relevant test in the PR. More feedbacks can be added by configuring the tool.

The tool can be triggered automatically every time a new PR is opened, or can be invoked manually by commenting on any PR.

When commenting, to edit configurations related to the review tool (pr_reviewer section), use the following template:

/review --pr_reviewer.some_config1=... --pr_reviewer.some_config2=...

With a configuration file, use the following template:

[pr_reviewer]
some_config1=...
some_config2=...

See the review usage page for a comprehensive guide on using this tool.

codiumai-pr-agent-pro · 2024-05-01T17:36:58Z

PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here.

PR Code Suggestions

Category	Suggestions
Maintainability	Use a common dictionary to store repeated token cost values. To avoid duplication and potential errors in the future, consider using a dictionary to store the common token costs and reference this dictionary when defining the costs for each model. metagpt/utils/token_counter.py [205-207] -"amazon.titan-tg1-large": {"prompt": 0.0008, "completion": 0.0008}, -"amazon.titan-text-express-v1": {"prompt": 0.0008, "completion": 0.0008}, -"amazon.titan-text-express-v1:0:8k": {"prompt": 0.0008, "completion": 0.0008}, +common_cost = {"prompt": 0.0008, "completion": 0.0008} +"amazon.titan-tg1-large": common_cost, +"amazon.titan-text-express-v1": common_cost, +"amazon.titan-text-express-v1:0:8k": common_cost,
Bug	Correct a typo in a dictionary key. Correct the typo in the dictionary key from 'NOT_SUUPORT_STREAM_MODELS' to 'NOT_SUPPORT_STREAM_MODELS' to improve code readability and prevent potential bugs. metagpt/provider/bedrock/utils.py [4] -NOT_SUUPORT_STREAM_MODELS = { +NOT_SUPPORT_STREAM_MODELS = {
	Add error handling for splitting the model_id. To avoid potential runtime errors from incorrect model names, add error handling for the case where the model name does not include a version or is improperly formatted. metagpt/provider/bedrock/bedrock_provider.py [115] -provider, model_name = model_id.split(".")[0:2] # meta、mistral…… +parts = model_id.split(".") +if len(parts) < 2: + raise ValueError(f"Invalid model_id format: {model_id}") +provider, model_name = parts[0:2]
	Correct a typo in a variable name. Correct the typo in the variable name `NOT_SUUPORT_STREAM_MODELS` to `NOT_SUPPORT_STREAM_MODELS` for clarity and to avoid potential bugs due to misspelling. tests/metagpt/provider/test_bedrock_api.py [6] -NOT_SUUPORT_STREAM_MODELS +NOT_SUPPORT_STREAM_MODELS
	Add a check to prevent potential KeyError when accessing a dictionary. Add a check to ensure `config.embedding.api_type` is in `_embedding_type_to_dimensions` before accessing it to prevent potential `KeyError`. metagpt/rag/schema.py [53] -self.dimensions = self._embedding_type_to_dimensions.get(config.embedding.api_type, 1536) +if config.embedding.api_type in self._embedding_type_to_dimensions: + self.dimensions = self._embedding_type_to_dimensions[config.embedding.api_type] +else: + logger.info(f"API type {config.embedding.api_type} is not recognized. Defaulting to 1536 dimensions.") + self.dimensions = 1536
Best practice	Use a constant for default max tokens value. Instead of using a hardcoded value for unsupported models' max tokens, consider defining a constant at the module level to improve maintainability and readability. metagpt/provider/bedrock/utils.py [111] -max_tokens = 2048 +DEFAULT_MAX_TOKENS = 2048 +max_tokens = DEFAULT_MAX_TOKENS
	Add a warning log when configured max tokens exceed the model's limit. To ensure that the 'max_tokens' value does not exceed the model's limit, add a check to log a warning if the configured 'max_token' is higher than the model's maximum. This helps in debugging and maintaining correct flow control. metagpt/provider/bedrock_api.py [83-84] if self.config.max_token > model_max_tokens: + logger.warning(f"Configured max_token {self.config.max_token} exceeds model's max {model_max_tokens}. Using model's max.") max_tokens = model_max_tokens
	Rename a function to enhance code readability. Use a more descriptive function name than `is_subset` to clarify its purpose, such as `is_request_body_subset`. tests/metagpt/provider/test_bedrock_api.py [59] -def is_subset(subset, superset) -> bool: +def is_request_body_subset(subset, superset) -> bool:
Enhancement	Use a dictionary to manage conditional logic for provider responses. Replace the hardcoded provider names in `mock_invoke_model_stream` with a more scalable approach using a dictionary lookup to avoid repetitive code and improve maintainability. tests/metagpt/provider/test_bedrock_api.py [38-47] -if provider == "amazon": - response_body_bytes = dict2bytes({"outputText": "Hello World"}) -elif provider == "anthropic": - response_body_bytes = dict2bytes( - {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hello World"}} - ) -elif provider == "cohere": - response_body_bytes = dict2bytes({"is_finished": False, "text": "Hello World"}) -else: - response_body_bytes = dict2bytes(BEDROCK_PROVIDER_RESPONSE_BODY[provider]) +provider_responses = { + "amazon": {"outputText": "Hello World"}, + "anthropic": {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hello World"}}, + "cohere": {"is_finished": False, "text": "Hello World"} +} +response_body_bytes = dict2bytes(provider_responses.get(provider, BEDROCK_PROVIDER_RESPONSE_BODY[provider]))
Enhancement	Simplify list comprehension with a more direct generator expression. Use a dictionary comprehension to simplify the `messages_to_prompt` method in the `BaseBedrockProvider` class. tests/metagpt/provider/test_bedrock_api.py [27] -return "\n".join([f"{i['role']}: {i['content']}" for i in messages]) +return "\n".join(f"{msg['role']}: {msg['content']}" for msg in messages)

✨ Improve tool usage guide:

Overview:
The improve tool scans the PR code changes, and automatically generates suggestions for improving the PR code. The tool can be triggered automatically every time a new PR is opened, or can be invoked manually by commenting on a PR.

When commenting, to edit configurations related to the improve tool (pr_code_suggestions section), use the following template:

/improve --pr_code_suggestions.some_config1=... --pr_code_suggestions.some_config2=...

With a configuration file, use the following template:

[pr_code_suggestions]
some_config1=...
some_config2=...

See the improve usage page for a comprehensive guide on using this tool.

codiumai-pr-agent-pro · 2024-05-01T17:48:52Z

PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here.

PR Description updated to latest commit (5d2bcbd)

Copy walkthrough table to "Files Changed" Tab

codiumai-pr-agent-pro · 2024-05-01T17:49:08Z

PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here.

PR Review

⏱️ Estimated effort to review [1-5]	4, because the PR introduces a significant amount of new functionality across multiple files, including new configurations, provider implementations, and integration with external services like AWS Bedrock. The complexity and breadth of the changes require a thorough review to ensure correctness, security, and maintainability.
🧪 Relevant tests	Yes
🔍 Possible issues	Possible Bug: The handling of dimensions in `FAISSRetrieverConfig` relies on the external configuration being correctly set. If not, it defaults to 1536, which might not be suitable for all models.
🔍 Possible issues	Configuration Dependency: The system's operation now heavily depends on the correct configuration of dimensions in `embedding_config.py`. Any misconfiguration could lead to runtime errors or incorrect behavior.
🔒 Security concerns	No

✨ Review tool usage guide:

Overview:
The review tool scans the PR code changes, and generates a PR review which includes several types of feedbacks, such as possible PR issues, security threats and relevant test in the PR. More feedbacks can be added by configuring the tool.

The tool can be triggered automatically every time a new PR is opened, or can be invoked manually by commenting on any PR.

When commenting, to edit configurations related to the review tool (pr_reviewer section), use the following template:

/review --pr_reviewer.some_config1=... --pr_reviewer.some_config2=...

With a configuration file, use the following template:

[pr_reviewer]
some_config1=...
some_config2=...

See the review usage page for a comprehensive guide on using this tool.

codiumai-pr-agent-pro · 2024-05-01T17:50:11Z

PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here.

PR Code Suggestions

Category	Suggestions
Maintainability	Improve naming consistency and clarity in dictionary keys. Consider using a more descriptive and consistent naming convention for the dictionary keys to avoid confusion and potential errors in future code maintenance or expansion. metagpt/utils/token_counter.py [204-246] BEDROCK_TOKEN_COSTS = { - "amazon.titan-tg1-large": {"prompt": 0.0008, "completion": 0.0008}, + "amazon_titan_tg1_large": {"prompt": 0.0008, "completion": 0.0008}, ... - "ai21.j2-ultra-v1": {"prompt": 0.0188, "completion": 0.0188}, + "ai21_j2_ultra_v1": {"prompt": 0.0188, "completion": 0.0188}, }
	Use a constant for default token values to enhance maintainability. Replace the hardcoded token values with a configurable setting or constant to facilitate easier adjustments and maintainability. metagpt/provider/bedrock/utils.py [4-10] -NOT_SUUPORT_STREAM_MODELS = { - "ai21.j2-grande-instruct": 8000, +DEFAULT_MAX_TOKENS = 8000 +NOT_SUPPORT_STREAM_MODELS = { + "ai21.j2-grande-instruct": DEFAULT_MAX_TOKENS, ... - "ai21.j2-ultra-v1": 8000, + "ai21.j2-ultra-v1": DEFAULT_MAX_TOKENS, }
	Refactor hardcoded conditions into a scalable function. Replace the hardcoded provider names in `mock_invoke_model_stream` with a more scalable approach using a dictionary or a function to handle different providers, enhancing maintainability and scalability. tests/metagpt/provider/test_bedrock_api.py [38-47] -if provider == "amazon": - response_body_bytes = dict2bytes({"outputText": "Hello World"}) -elif provider == "anthropic": - response_body_bytes = dict2bytes( - {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hello World"}} - ) -elif provider == "cohere": - response_body_bytes = dict2bytes({"is_finished": False, "text": "Hello World"}) -else: - response_body_bytes = dict2bytes(BEDROCK_PROVIDER_RESPONSE_BODY[provider]) +response_body_bytes = dict2bytes(get_provider_response(provider))
Enhancement	Correct spelling and improve readability in dictionary name. Refactor the `NOT_SUUPORT_STREAM_MODELS` dictionary name to correct the spelling error and improve readability. metagpt/provider/bedrock/utils.py [4-10] -NOT_SUUPORT_STREAM_MODELS = { +NOT_SUPPORT_STREAM_MODELS = { "ai21.j2-grande-instruct": 8000, ... "ai21.j2-ultra-v1": 8000, }
Enhancement	Add error handling to the model invocation method to improve robustness. Implement error handling for the `invoke_model` method to manage exceptions that may occur during the API call. metagpt/provider/bedrock_api.py [67-72] def invoke_model(self, request_body: str) -> dict: - response = self.__client.invoke_model(modelId=self.config.model, body=request_body) - usage = self._get_usage(response) - self._update_costs(usage, self.config.model) - response_body = self._get_response_body(response) - return response_body + try: + response = self.__client.invoke_model(modelId=self.config.model, body=request_body) + usage = self._get_usage(response) + self._update_costs(usage, self.config.model) + response_body = self._get_response_body(response) + return response_body + except Exception as e: + logger.error(f"Failed to invoke model: {e}") + return {}
Security	Enhance security by avoiding hardcoded AWS credentials. Use a more secure method to handle AWS credentials instead of hardcoding them in the client initialization method. metagpt/provider/bedrock_api.py [29-39] def __init_client(self, service_name: Literal["bedrock-runtime", "bedrock"]): - self.__credentital_kwargs = { - "aws_secret_access_key": self.config.secret_key, - "aws_access_key_id": self.config.access_key, - "region_name": self.config.region_name, - } - session = boto3.Session(**self.__credentital_kwargs) - client = session.client(service_name) + session = boto3.Session() + client = session.client(service_name, region_name=self.config.region_name) return client
Bug	Correct a typo in a variable name. Correct the typo in the variable name `NOT_SUUPORT_STREAM_MODELS` to `NOT_SUPPORT_STREAM_MODELS` to maintain consistency and avoid potential bugs due to misspellings. tests/metagpt/provider/test_bedrock_api.py [6] -NOT_SUUPORT_STREAM_MODELS +NOT_SUPPORT_STREAM_MODELS
Bug	Add checks for dictionary keys before method calls to prevent runtime errors. Ensure that the `usage` dictionary keys used in `_update_costs` method calls are defined as parameters in the method's signature to avoid runtime errors if the keys are not present. tests/metagpt/provider/test_bedrock_api.py [27-50] -self._update_costs(usage, self.config.model) +if 'prompt_tokens' in usage and 'completion_tokens' in usage: + self._update_costs(usage, self.config.model) +else: + logger.error("Required keys are missing in the 'usage' dictionary") + raise KeyError("Required keys are missing in the 'usage' dictionary")
Error handling	Add specific exception handling for JSON operations. Use a more specific exception handling for the `json.dumps` and `json.loads` operations in `mock_invoke_model_stream` and `get_choice_text_from_stream` to handle potential JSON serialization and parsing errors. tests/metagpt/provider/test_bedrock_api.py [34-22] -return json.dumps(x).encode("utf-8") -rsp_dict = json.loads(event["chunk"]["bytes"]) +try: + return json.dumps(x).encode("utf-8") +except json.JSONEncodeError as e: + logger.error(f"Failed to encode JSON: {e}") + raise +try: + rsp_dict = json.loads(event["chunk"]["bytes"]) +except json.JSONDecodeError as e: + logger.error(f"Failed to decode JSON: {e}") + raise +
Performance	Optimize the `is_subset` function for better performance. Replace the `is_subset` function with a more efficient implementation using set operations to improve performance. tests/metagpt/provider/test_bedrock_api.py [69-77] -for key, value in subset.items(): - if key not in superset: +subset_keys = set(subset.keys()) +superset_keys = set(superset.keys()) +if not subset_keys <= superset_keys: + return False +for key in subset_keys: + if isinstance(subset[key], dict) and isinstance(superset[key], dict): + if not is_subset(subset[key], superset[key]): + return False + elif subset[key] != superset[key]: return False - if isinstance(value, dict): - if not isinstance(superset[key], dict): - return False - if not is_subset(value, superset[key]): - return False return True

✨ Improve tool usage guide:

Overview:
The improve tool scans the PR code changes, and automatically generates suggestions for improving the PR code. The tool can be triggered automatically every time a new PR is opened, or can be invoked manually by commenting on a PR.

When commenting, to edit configurations related to the improve tool (pr_code_suggestions section), use the following template:

/improve --pr_code_suggestions.some_config1=... --pr_code_suggestions.some_config2=...

With a configuration file, use the following template:

[pr_code_suggestions]
some_config1=...
some_config2=...

See the improve usage page for a comprehensive guide on using this tool.

codecov-commenter · 2024-05-01T18:11:38Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.20%. Comparing base (f201b2f) to head (553702f).

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1241      +/-   ##
==========================================
- Coverage   70.22%   70.20%   -0.03%     
==========================================
  Files         316      316              
  Lines       18860    18864       +4     
==========================================
- Hits        13245    13244       -1     
- Misses       5615     5620       +5

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

config/config2.example.yaml

metagpt/rag/schema.py

seehi · 2024-05-07T08:04:32Z

metagpt/rag/schema.py

+            self.dimensions = config.embedding.dimensions or self._embedding_type_to_dimensions.get(
+                config.embedding.api_type, 1536
+            )
+            if config.embedding.api_type not in self._embedding_type_to_dimensions:


if not config.embedding.dimensions and ...

seehi · 2024-05-15T09:33:53Z

LGTM

usamimeri requested a deployment to unittest May 1, 2024 17:34 — with GitHub Actions Waiting

usamimeri closed this May 1, 2024

codiumai-pr-agent-pro bot added documentation Improvements or additions to documentation enhancement New feature or request labels May 1, 2024

codiumai-pr-agent-pro bot added the Review effort [1-5]: 4 label May 1, 2024

usamimeri reopened this May 1, 2024

usamimeri requested a deployment to unittest May 1, 2024 17:48 — with GitHub Actions Waiting

make rag configurable

39ed967

usamimeri force-pushed the debug branch from 5d2bcbd to 39ed967 Compare May 1, 2024 17:48

usamimeri requested a deployment to unittest May 1, 2024 17:48 — with GitHub Actions Waiting

codiumai-pr-agent-pro bot added bug_fix and removed documentation Improvements or additions to documentation labels May 1, 2024

geekan reviewed May 3, 2024

View reviewed changes

config/config2.example.yaml Outdated Show resolved Hide resolved

metagpt/rag/schema.py Outdated Show resolved Hide resolved

add comment

fa8b35c

usamimeri requested a deployment to unittest May 3, 2024 08:46 — with GitHub Actions Waiting

seehi reviewed May 6, 2024

View reviewed changes

metagpt/rag/schema.py Outdated Show resolved Hide resolved

make dimensions default

553702f

usamimeri requested a deployment to unittest May 7, 2024 04:39 — with GitHub Actions Waiting

seehi reviewed May 7, 2024

View reviewed changes

fix

d31d750

usamimeri requested a deployment to unittest May 7, 2024 08:27 — with GitHub Actions Waiting

geekan approved these changes May 15, 2024

View reviewed changes

geekan merged commit 77c74fb into geekan:main May 15, 2024
0 of 2 checks passed

usamimeri deleted the debug branch May 17, 2024 09:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make RAG embedding model dimension configurable #1241

Make RAG embedding model dimension configurable #1241

usamimeri commented May 1, 2024 •

edited

codiumai-pr-agent-pro bot commented May 1, 2024

codiumai-pr-agent-pro bot commented May 1, 2024

codiumai-pr-agent-pro bot commented May 1, 2024

codiumai-pr-agent-pro bot commented May 1, 2024

codiumai-pr-agent-pro bot commented May 1, 2024

codiumai-pr-agent-pro bot commented May 1, 2024

codecov-commenter commented May 1, 2024 •

edited

seehi May 7, 2024

seehi commented May 15, 2024

Make RAG embedding model dimension configurable #1241

Make RAG embedding model dimension configurable #1241

Conversation

usamimeri commented May 1, 2024 • edited

User description

codiumai-pr-agent-pro bot commented May 1, 2024

codiumai-pr-agent-pro bot commented May 1, 2024

PR Review

codiumai-pr-agent-pro bot commented May 1, 2024

PR Code Suggestions

codiumai-pr-agent-pro bot commented May 1, 2024

codiumai-pr-agent-pro bot commented May 1, 2024

PR Review

codiumai-pr-agent-pro bot commented May 1, 2024

PR Code Suggestions

codecov-commenter commented May 1, 2024 • edited

Codecov Report

seehi May 7, 2024

Choose a reason for hiding this comment

seehi commented May 15, 2024

usamimeri commented May 1, 2024 •

edited

codecov-commenter commented May 1, 2024 •

edited