feat (index): Index selection based on engine type. #408

Open
wants to merge 1 commit into
base: main

kpfadnis
Collaborator

Pull Request

What does this PR do?

  • Update Get Indexes behavior to return only the indexes that match the requested engine type.
  • Add metadata fields to the index information for UI/UX purposes.
  • Format and lightly refactor the BM25 codebase.
  • Add code formatter (Black) configuration to pyproject.toml (should have no impact on users who do not use Black).
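
The first item can be pictured with a minimal sketch. It is not the code in this PR: `IndexInfo`, its fields, and `get_indexes` are hypothetical names chosen for illustration, and it assumes each index record carries the engine type it was built for.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class IndexInfo:
    """Hypothetical index record; the field names are illustrative only."""

    index_id: str
    engine_type: str
    # Extra descriptive fields a UI might display (assumed, not from the PR).
    metadata: Dict[str, str] = field(default_factory=dict)


def get_indexes(indexes: List[IndexInfo], engine_type: str) -> List[IndexInfo]:
    """Return only the indexes built for the requested engine type."""
    return [index for index in indexes if index.engine_type == engine_type]
```

With this shape, a request for one engine type never sees indexes built by another engine, and an engine with no indexes yields an empty list rather than an error.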

Notes:

  • Replace (issue) above ↑↑↑ with the issue this PR closes to automatically link the two.
    This must be done when the PR is created.
  • Add multiple Closes #(issue) as needed.
  • If this PR is work towards but does not close an issue, simply tag the issue without mentioning Closes.

Description

Describe the changes proposed by this PR below to give the reviewer context ↓↓↓

  1. Updated protos/data_models for indexing-related services.
  2. Ran the code formatter (Black) and linter (pylint) on the BM25 codebase and fixed some minor code style issues.

Request Review

Be sure to request a review from one or more reviewers (unless the PR is to an unprotected branch).

Versioning

When opening a PR to make changes to PrimeQA (i.e. primeqa/) master, be sure to increment the version following
semantic versioning. The VERSION is stored here
and is incremented using bump2version {patch,minor,major} as described in the [development guide documentation](development.html).

  • Have you updated the VERSION?
  • Or does this PR not change the primeqa package, or not target master?

After pulling in changes from master to an existing PR, ensure the VERSION is updated appropriately.
This may require bumping the version again if it has been previously bumped.

If you're not quite ready yet to post a PR for review, feel free to open a draft PR.

Releases

After Merging

If merging into master and VERSION was updated, after this PR is merged:

Checklist

Review the following and mark as completed:

  • Tag an issue or issues this PR addresses.
  • Added description of changes proposed.
  • Review requested as appropriate.
  • Version bumped as appropriate.
  • New classes, methods, and functions documented.
  • Documentation for modified code is updated.
  • Built documentation to confirm it renders as expected (see here).
  • Code cleaned up and commented out code removed.
  • Tests added to ensure all functionalities tested at >= 60% unit test coverage (see here).
  • Release created as needed after merging.

Collaborator

@franzmpub franzmpub left a comment

Started to look at it.
Are the formatting/style changes such as

  • "rank": i+1, -> "rank": i + 1,
  • single -> double quotes
  • one argument per line in dataclasses in config.py

done automatically by an IDE?
It's nice to have unified style, but there is some cost in the initial review (hard to separate the functional and style changes).

@kpfadnis
Collaborator Author

I personally use Black, which is an opinionated formatter. It can be configured in VSCode, PyCharm, or any other editor, and it can also be run from the command line.

Black has some default settings, and the only thing I tweaked is the maximum line length, which I set to 79 characters (adhering closely to PEP 8). Those settings can be found in the pyproject.toml file.

I would highly recommend, and insist, that we adopt Black as our default formatter. I would also recommend that we start using a linter (Pylint) to improve overall code quality.

@franzmpub I have also skipped the primeqa/ir/dense code from auto-formatting for now. We can discuss whether to restrict the exclusion further, to the ColBERT-specific portion only.

@jdpsen @bhavani105 @avisil I think we should discuss and adopt this in the first few weeks of 2023.
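
For readers without the diff open, a `[tool.black]` section along these lines would express the two settings mentioned here. This is only a sketch and may not match the pyproject.toml in this PR: `line-length` and `extend-exclude` are standard Black options, but the exact exclusion pattern below is an assumption.

```toml
[tool.black]
# Maximum line length, matching the PEP 8 recommendation.
line-length = 79
# Regular expression for paths Black should skip in addition to its
# defaults (assumed pattern covering primeqa/ir/dense).
extend-exclude = '''
/primeqa/ir/dense/
'''
```

Narrowing the exclusion to the ColBERT-specific portion would mean tightening that regular expression to the relevant subdirectory.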

Collaborator

@franzmpub franzmpub left a comment

LGTM
