Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement /v3/chat/completions endpoint - forward to MP graph #2436

Merged
merged 6 commits into from May 17, 2024

Conversation

dkalinowski
Copy link
Collaborator

@dkalinowski dkalinowski commented May 8, 2024

馃洜 Summary

CVS-138032

Implementation of /v3/chat/completions endpoint and forwarding the HTTP message to MediaPipe graph.
The data is std::string now, to be adjusted in following tasks (CVS-139240/CVS-140684).

馃И Checklist

  • Unit tests added. Minimal positive unit test added. Remaining tests will be added in next PR.
  • The documentation updated. No documentation changes required for now.
  • Change follows security best practices.

+ if (!result) {
+ NET_LOG(ERROR, "Failed to EventLoopSchedule PartialReplyWithStatus()");
+ Abort();
+ // TODO(wenboz): should have a forced abort that doesn't write back anything
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does this comes from?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is mimicking existing parts of the code in net_http module. There are non-partial reply methods with similar comments: https://github.com/tensorflow/serving/blob/83c949cd1c997ada425ac2e30510884f16f62b62/tensorflow_serving/util/net_http/server/internal/evhttp_request.cc#L362

I don't think we should care to remove it, let me know if you think otherwise @atobiszei


namespace ovms {

Status deserializeInputSidePacketsFromFirstRequestImpl(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's intentional to leave it empty or it will be implemented later on?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now there is no known use case for side packets deserialization from requests, therefore it is enough implementation (returning ok and doing nothing)


const std::string& getRequestId(
const std::string& request) {
static const std::string _default{""};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this code is going to be executed multithreaded then this is not safe to use static variable.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is safe due to the fact we return const ref, which will never change. Is it correct? @atobiszei

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad you're right

const std::string& endpointName,
const std::string& endpointVersion,
stream_types_mapping_t& inputTypes) {
return StatusCode::OK;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this complete implementation? Don't we need to do anything here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is complete implementation, the validation happens before reaching MP graph (in http parser) and will exist in MP calculator (for chat completion parameters)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, having this call is required because gRPC KServe path implements it...

@@ -92,6 +103,9 @@ const std::string HttpRestApiHandler::kfs_serverliveRegexExp =
const std::string HttpRestApiHandler::kfs_servermetadataRegexExp =
R"(/v2)";

const std::string HttpRestApiHandler::oai_chatCompletionRegexExp =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const std::string HttpRestApiHandler::oai_chatCompletionRegexExp =
const std::string HttpRestApiHandler::openai_chatCompletionRegexExp =

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And what do you think about enum class member being named OAI_ChatCompletion? It is aligned with KFS_ TFS_

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd go with OpenAI name all the way. It's explicit and doesn't close the path for other APIs starting with O 馃槃

src/http_rest_api_handler.cpp Outdated Show resolved Hide resolved
src/http_rest_api_handler.cpp Outdated Show resolved Hide resolved
@dkalinowski dkalinowski merged commit 37d69bf into main May 17, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants