Sweep: the lexical search's add_document method should support multiprocessing (✓ Sandbox Passed) #3152

Status: Open (1 commit, base: main)


sweep-nightly bot (Contributor) commented Feb 23, 2024

Description

This pull request updates the lexical search's add_document method to support multiprocessing, so that documents can be indexed in parallel rather than one at a time.

Summary

  • Modified sweepai/core/lexical_search.py
  • Added support for multiprocessing in the add_document method
  • Utilized the multiprocessing Manager to manage the all_tokens list
  • Implemented a new add_document_worker function to handle document indexing in parallel
  • Updated progress tracking for indexing
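
The diff itself is not shown on this page, so the following is only a minimal sketch of the pattern the summary describes: a Manager-backed all_tokens list that worker processes append to. The add_document_worker name comes from the summary, but its signature here is an assumption, and tokenize and index_documents are hypothetical helpers introduced purely for illustration.

```python
import multiprocessing
import re


def tokenize(text: str) -> list[str]:
    """Hypothetical stand-in tokenizer: lowercase runs of letters, digits, underscores."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def add_document_worker(title: str, content: str, all_tokens: list) -> None:
    """Tokenize one document and append (title, tokens) to the shared list.

    In the parallel path all_tokens is a Manager list proxy, not a plain list;
    the signature is an assumption, not the one used in the PR.
    """
    all_tokens.append((title, tokenize(content)))


def index_documents(docs: dict[str, str], processes: int = 2) -> list[tuple[str, list[str]]]:
    """Hypothetical driver: fan documents out to a pool and collect the tokens."""
    with multiprocessing.Manager() as manager:
        all_tokens = manager.list()  # proxy that can be passed to worker processes
        jobs = [(title, content, all_tokens) for title, content in docs.items()]
        with multiprocessing.Pool(processes=processes) as pool:
            pool.starmap(add_document_worker, jobs)
        # Copy into a plain list before the manager shuts down. Append order
        # depends on scheduling, so sort by title for a stable result.
        return sorted(list(all_tokens), key=lambda item: item[0])
```

Because worker processes may re-import the main module on some platforms, a call such as index_documents({"a.py": "def foo(): pass"}) should sit under an `if __name__ == "__main__":` guard. Each append on a Manager list is a round trip to the manager process, so this pattern pays off only when tokenizing a document costs much more than that round trip.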

Fixes #3150.

💡 To get Sweep to edit this pull request, you can:

  • Comment below, and Sweep can edit the entire PR
  • Comment on a file, and Sweep will modify only that file
  • Edit the original issue, and Sweep will recreate the PR from scratch

sweep-nightly bot (Contributor, Author) commented Feb 23, 2024

Rollback Files For Sweep

  • Rollback changes to sweepai/core/lexical_search.py

This is an automated message generated by Sweep AI.

sweep-nightly bot (Contributor, Author) commented Feb 23, 2024

Apply Sweep Rules to your PR?

  • Apply: We should use loguru for error logging. If the log is inside an exception, use logger.exception to add tracebacks, where logger is imported from loguru. Use f-strings for string formatting in logger calls (e.g. logger.info(f'Hello {name}') instead of logger.info('Hello {name}', name=name)).
  • Apply: There should be no debug log or print statements in production code.
  • Apply: All functions should have parameters and output annotated with type hints. Use list, tuple and dict instead of typing.List, typing.Tuple and typing.Dict.
  • Apply: Leftover TODOs in the code should be handled.
  • Apply: All new business logic should have corresponding unit tests in the same directory. For example, sweepai/api_test.py tests sweepai/api.py. Use unittest and unittest.mock as required.
  • Apply: Any clearly inefficient or repeated code should be optimized or refactored.
  • Apply: Remove any comments before code that are obvious. For example # this prints hello world; print('hello world').
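
As a rough illustration of what the type-hint and unit-test rules above ask for, here is a small sketch using only the standard library. The summarize_tokens function and its test class are hypothetical and do not come from the Sweep codebase.

```python
import unittest


def summarize_tokens(tokens_by_file: dict[str, list[str]]) -> tuple[int, list[str]]:
    """Return the total token count and the sorted file names."""
    total = sum(len(tokens) for tokens in tokens_by_file.values())
    return total, sorted(tokens_by_file)


class SummarizeTokensTest(unittest.TestCase):
    def test_counts_tokens_across_files(self) -> None:
        total, names = summarize_tokens({"b.py": ["x"], "a.py": ["y", "z"]})
        self.assertEqual(total, 3)
        self.assertEqual(names, ["a.py", "b.py"])
```

The annotations use the built-in dict, list and tuple generics rather than their typing equivalents, and the test would live next to the module it covers, following the api.py and api_test.py naming shown in the rule.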

This is an automated message generated by Sweep AI.

sweep-nightly bot added the sweep label (Assigns Sweep to an issue or pull request.) Feb 23, 2024

vercel bot commented Feb 23, 2024

The latest updates on your projects. Learn more about Vercel for Git.

| Name | Status | Preview | Comments | Updated (UTC) |
| sweep-docs | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Feb 23, 2024 9:21pm |

Labels: sweep (Assigns Sweep to an issue or pull request.)

Successfully merging this pull request may close these issues:

  • Sweep: the lexical search's add_document method should support multiprocessing

1 participant