
Basic doc changes #486

Open: wants to merge 12 commits into base: mainline


Jeadie (Contributor) commented May 23, 2023

  • What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)

  • What is the current behavior? (You can also link to an open issue here)

  • What is the new behavior (if this is a feature change)?

  • Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)

  • Have unit tests been run against this PR? (Has there also been any additional testing?)

  • Related Python client changes (link commit/PR here)

  • Related documentation changes (link commit/PR here)

  • Other information:

  • Please check if the PR fulfills these requirements:
      • The commit message follows our guidelines
      • Tests for the changes have been added (for bug fixes/features)
      • Docs have been added / updated (for bug fixes / features)

Jeadie temporarily deployed to marqo-test-suite on May 30, 2023 05:18 with GitHub Actions (inactive)
index_name: str
auto_refresh: bool

docs: List[Document] = Field(default_factory=list)
Collaborator

My preference is to separate the documents from the parameters as part of a separate-data-from-config principle. Does this change your design?

Jeadie (Contributor Author), Jun 1, 2023

I'd prefer to parse the HTTP request's dict/JSON into a predefined structure immediately.

Collaborator

It doesn't get parsed immediately, because requests can come in with the batch_size and processes parameters.
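A minimal sketch of the separate-data-from-config idea, under stated assumptions: plain dataclasses stand in for the pydantic models so it runs without dependencies; index_name and auto_refresh come from the diff above, batch_size and processes are the request parameters mentioned in this thread, and add_documents and split_into_batches are hypothetical names. The config can be parsed as soon as the request arrives, while the documents stay as raw dicts so they can be split into batches first.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class AddDocsParams:
    """Request-level configuration only; documents are passed separately."""
    index_name: str
    auto_refresh: bool
    batch_size: Optional[int] = None  # hypothetical: server-side batching knob
    processes: Optional[int] = None   # hypothetical: parallelism knob, unused in this sketch


def split_into_batches(
    docs: List[Dict[str, Any]], batch_size: Optional[int]
) -> List[List[Dict[str, Any]]]:
    """Hypothetical helper: chunk raw documents; None or 0 means a single batch."""
    if not batch_size:
        return [docs]
    return [docs[i:i + batch_size] for i in range(0, len(docs), batch_size)]


def add_documents(params: AddDocsParams, docs: List[Dict[str, Any]]) -> List[int]:
    """Hypothetical entry point: config and data arrive as separate arguments.

    Returns the size of each batch, standing in for the real indexing work.
    """
    return [len(batch) for batch in split_into_batches(docs, params.batch_size)]
```

For example, AddDocsParams(index_name="my-index", auto_refresh=True, batch_size=2) with three documents yields batch sizes [2, 1]. Keeping docs out of the params object means validating the config never depends on the shape of the payload.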

)
return weights

class Document(BaseModel):
Collaborator

Consider logically separating this from the config-oriented objects, for example by putting it at the top of the file and then having a section dedicated to config (# ------ ADD DOCUMENTS CONFIG OBJECTS: ------).

Contributor Author

We need Document to be defined above AddDocsParam for typing.

Collaborator

Yep, that requirement shouldn't conflict with my comment
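One way to satisfy both constraints, sketched with plain dataclasses in place of pydantic: Document sits at the top in its own data section and the config section follows. Without postponed annotation evaluation, Python evaluates List[Document] when the class body runs, so Document must already be defined. The config header is the one suggested above; the data-section header, the AddDocsParams spelling, and Document's doc_id/fields attributes are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

# ------ ADD DOCUMENTS DATA OBJECTS: ------  (hypothetical header)


@dataclass
class Document:
    """A single document: an optional ID plus arbitrary field content."""
    doc_id: Optional[str] = None
    fields: Dict[str, Any] = field(default_factory=dict)


# ------ ADD DOCUMENTS CONFIG OBJECTS: ------


@dataclass
class AddDocsParams:
    """Config object; Document is defined above, so this annotation resolves."""
    index_name: str
    auto_refresh: bool
    docs: List[Document] = field(default_factory=list)
```

The ordering requirement and the data/config split are independent: the split is about grouping, and data objects naturally come first anyway.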

src/marqo/tensor_search/models/add_docs_objects.py (outdated, resolved)
src/marqo/tensor_search/tensor_search.py (resolved)
src/marqo/tensor_search/tensor_search.py (resolved)
[(None, 'error'), ("1511", 'error'), ("cool", 'result'), ("144451", "error")]),
([{123: "bad", "_id": "12345"}, {"_id": "cool"}], [("12345", 'error'), ("cool", 'result')]),
([{None: "bad", "_id": "12345"}, {"_id": "cool"}], [("12345", 'error'), ("cool", 'result')]),
# handle bad content
Collaborator

Why were these cases removed?

Contributor Author

Sorry, it seems I didn't add some updates to the PR description. Because we now validate immediately after the HTTP request comes in, some validation failures (generally just type validation) can no longer be caught and returned as an error for the individual document. These tests cover those cases. This may be a problem with this change.

Collaborator

Is there any way around this, for example by running pydantic validation on each doc individually and then adding the relevant "doc failed to be indexed" message to the response?
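Per-document validation, as suggested above, could look roughly like this. It is a stdlib sketch rather than pydantic: DocValidationError, validate_doc, and validate_each are hypothetical names, and the rules checked (the document is a dict, field names are strings, _id is a string) are assumptions chosen to cover the removed test cases. Each document is checked on its own, so a malformed one produces an (id, 'error') entry instead of failing the whole request.

```python
from typing import Any, Dict, List, Optional, Tuple


class DocValidationError(ValueError):
    """Hypothetical error raised for a single malformed document."""


def validate_doc(doc: Any) -> Dict[str, Any]:
    """Check one raw document; raise DocValidationError if it is malformed.

    Assumed rules: the document is a dict, every field name is a str,
    and _id, when present, is a str.
    """
    if not isinstance(doc, dict):
        raise DocValidationError("document must be a JSON object")
    for key in doc:
        if not isinstance(key, str):
            raise DocValidationError(f"field name {key!r} is not a string")
    if "_id" in doc and not isinstance(doc["_id"], str):
        raise DocValidationError("_id must be a string")
    return doc


def validate_each(docs: List[Any]) -> List[Tuple[Optional[str], str]]:
    """Validate documents one at a time so one bad doc doesn't reject the batch.

    Returns (id, 'result') for valid docs and (id, 'error') for invalid ones,
    mirroring the (id, outcome) pairs in the test cases above. The id is
    None when the document has no usable string _id.
    """
    outcomes: List[Tuple[Optional[str], str]] = []
    for doc in docs:
        raw_id = doc.get("_id") if isinstance(doc, dict) else None
        doc_id = raw_id if isinstance(raw_id, str) else None
        try:
            validate_doc(doc)
            outcomes.append((doc_id, "result"))
        except DocValidationError:
            outcomes.append((doc_id, "error"))
    return outcomes
```

With this shape, [{123: "bad", "_id": "12345"}, {"_id": "cool"}] maps to [("12345", 'error'), ("cool", 'result')], matching the expectation in the removed test case.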

tests/tensor_search/test_add_documents.py (resolved)
tests/tensor_search/test_validation.py (resolved)
Jeadie temporarily deployed to marqo-test-suite on June 1, 2023 22:38 with GitHub Actions (inactive)

Jeadie (Contributor Author) commented Jun 1, 2023

@@ -25,6 +28,10 @@ def check_keys(cls, values):


class MappingObject(BaseModel):
"""Field level control over a field. Currently only supports multi-modal combinations

See: https://docs.marqo.ai/*/API-Reference/mappings/
Collaborator

That link doesn't work. Use https://docs.marqo.ai/latest/API-Reference/mappings/ for this purpose. For better future-proofing, link to a specific version: https://docs.marqo.ai/0.0.21/API-Reference/mappings/

Contributor Author

I didn't realize that, but pinning a version also means the docs are instantly out of date. At least this way it's a regex of the link (we should also get latest to work).

Collaborator

Test it out. It works with latest but not with *. I think it's best if users can copy-paste the link directly into their browser.

Latest vs. a fixed version is a tradeoff between these two scenarios:

  1. We use latest. If we refactor our docs, this link will be dead.
  2. We use a specific version. This may seem out of date, but the link will always work. There will also be a "you are looking at an outdated version of the docs" message.

latest should be fine, as it's just for developers' reference.

Contributor Author

hmm, latest did not work originally, but does now.
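To catch wildcard links like the one flagged above before review, a small check could scan docstrings for URLs containing '*', since such links can't be pasted straight into a browser. This is a hypothetical helper (non_copyable_links is not part of the codebase), and its URL regex is a simplification that stops at whitespace, quotes, angle brackets, and closing parentheses.

```python
import re
from typing import List

# Simplified URL pattern: scheme plus everything up to whitespace,
# a quote, an angle bracket, or a closing parenthesis.
_URL_RE = re.compile(r"https?://[^\s\"'<>)]+")


def non_copyable_links(docstring: str) -> List[str]:
    """Return URLs in a docstring that contain a '*' wildcard."""
    return [url for url in _URL_RE.findall(docstring) if "*" in url]
```

Run over the MappingObject docstring from the diff, it would flag https://docs.marqo.ai/*/API-Reference/mappings/ and accept the latest or version-pinned forms.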

pandu-k temporarily deployed to marqo-test-suite on June 2, 2023 04:04 with GitHub Actions (inactive)
Labels: None yet
Projects: None yet
Development: no linked issues
2 participants