[Serve] Add Cpp Deployment in Ray Serve #34

MingYueQ · 2023-06-30T15:20:40Z

Summary

General Motivation

In scenarios such as search, inference, and others, system performance is of utmost importance. Taking the model inference and search engines of Ant Group as an example, these systems require high throughput, low latency, and high concurrency as they handle a massive amount of business. In order to meet these requirements, they have all chosen C++ for designing and developing their systems, ensuring high efficiency and stability. Currently, these systems plan to run on Ray Serve to enhance their distributed capabilities. Therefore, Ray Serve needs to provide C++ deployment so that users can easily deploy their services.

Should this change be within `ray` or outside?

Within the main `ray` project. The changes are made at the Ray Serve level.

Stewardship

Required Reviewers

@sihanwang41
@edoakes
@akshay-anyscale

Shepherd of the Proposal (should be a senior committer)

@sihanwang41
@edoakes
@akshay-anyscale

Design and Architecture

Example Model

Take a recommendation system as an example: when a user enters a search word, the system returns recommended words that are similar to it.

The RecommendService receives user requests and calls the FeatureService, SimilarityService, and RankService to calculate similar words, then returns the results to the user:

#pragma once

#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

#include "FeatureService.h"
#include "SimilarityService.h"
#include "RankService.h"

namespace ray {
namespace serve {

class RecommendService {
 public:
  RecommendService() {
    feature_service_ = std::make_shared<FeatureService>();
    similarity_service_ = std::make_shared<SimilarityService>();
    rank_service_ = std::make_shared<RankService>();
  }

  std::vector<std::string> Recommend(const std::string &request, const int num) {
    // 1. Calculate vector
    std::vector<float> features = feature_service_->GetVector(request);

    // 2. Calculate similarity
    std::unordered_map<std::string, float> similarities = similarity_service_->GetSimilarity(features);

    // 3. Get similar words
    std::vector<std::string> result = rank_service_->Rank(similarities, num);
    return result;
  }

 private:
  std::shared_ptr<FeatureService> feature_service_;
  std::shared_ptr<SimilarityService> similarity_service_;
  std::shared_ptr<RankService> rank_service_;
};

}  // namespace serve
}  // namespace ray

The FeatureService converts a request into a vector:

#pragma once

#include <vector>
#include <string>
#include <word2vec.h>

namespace ray {
namespace serve {

class FeatureService {
 public:
  FeatureService() {
    model_.load("model/word2vec.bin");
  }

  // model_ is held by value, so it is destroyed automatically; no destructor is needed.

  std::vector<float> GetVector(const std::string &request) {
    return model_.getVector(request);
  }

 private:
  word2vec::Word2Vec model_;
};

}  // namespace serve
}  // namespace ray

The SimilarityService is used to calculate similarity:

#pragma once

#include <vector>
#include <string>
#include <cmath>
#include <unordered_map>

namespace ray {
namespace serve {

class SimilarityService {
 public:
  std::unordered_map<std::string, float> GetSimilarity(const std::vector<float> &request_vec) {
    std::unordered_map<std::string, float> result;
    for (auto it = recommend_cache_.begin(); it != recommend_cache_.end(); it++) {
      result.insert({it->first, ComputeCosineSimilarity(request_vec, it->second)});
    }
    return result;
  }

 private:
  float ComputeCosineSimilarity(const std::vector<float> &v1, const std::vector<float> &v2) {
    // Assumes v1 and v2 have the same length.
    const std::size_t len = v1.size();
    float dotProduct = 0;
    float magnitude1 = 0;
    float magnitude2 = 0;
    for (std::size_t i = 0; i < len; i++) {
      dotProduct += v1[i] * v2[i];
      magnitude1 += v1[i] * v1[i];
      magnitude2 += v2[i] * v2[i];
    }
    magnitude1 = std::sqrt(magnitude1);
    magnitude2 = std::sqrt(magnitude2);
    return dotProduct / (magnitude1 * magnitude2);
  }

  std::unordered_map<std::string, std::vector<float>> recommend_cache_ = {
    {"mac", {1.5, 2.3, 3.5, 5.5}},
    {"car", {1.5, 3.2, 3.9, 7.5}},
    {"phone", {1.5, 2.0, 4.5, 8.1}},
  };
};

}  // namespace serve
}  // namespace ray
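
As a standalone illustration of the cosine-similarity step, the following is a minimal sketch rather than the REP's code. The free function `CosineSimilarity` and its choice to return 0 for a zero-magnitude vector are assumptions made for this example; the class above divides without that guard.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Cosine similarity of two equal-length vectors: dot(a, b) / (|a| * |b|).
// Returns 0 when either vector has zero magnitude (an assumption of this sketch).
inline float CosineSimilarity(const std::vector<float> &a, const std::vector<float> &b) {
  assert(a.size() == b.size());
  float dot = 0.0f;
  float norm_a = 0.0f;
  float norm_b = 0.0f;
  for (std::size_t i = 0; i < a.size(); ++i) {
    dot += a[i] * b[i];
    norm_a += a[i] * a[i];
    norm_b += b[i] * b[i];
  }
  const float denom = std::sqrt(norm_a) * std::sqrt(norm_b);
  return denom == 0.0f ? 0.0f : dot / denom;
}
```

Parallel vectors score 1, orthogonal vectors score 0, and the guard avoids a division by zero when a word has an all-zero embedding.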

The RankService is used to sort based on similarity:

#pragma once

#include <algorithm>
#include <functional>
#include <queue>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

namespace ray {
namespace serve {

class RankService {
 public:
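  std::vector<std::string> Rank(const std::unordered_map<std::string, float> &recommends, int num) {
    // Min-heap keyed on similarity (score first, word second), so the weakest
    // candidate sits on top and is evicted once more than `num` entries are held.
    using Entry = std::pair<float, std::string>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> pq;
    for (const auto &pair : recommends) {
      pq.emplace(pair.second, pair.first);
      if (pq.size() > static_cast<std::size_t>(num)) {
        pq.pop();
      }
    }

    // The heap pops in ascending order of similarity; reverse to return the best match first.
    std::vector<std::string> result;
    while (!pq.empty()) {
      result.push_back(pq.top().second);
      pq.pop();
    }
    std::reverse(result.begin(), result.end());
    return result;
  }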
};

}  // namespace serve
}  // namespace ray
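
The ranking step can also be expressed without a heap. The sketch below is an alternative formulation, not the REP's implementation: the function name `TopK` and the tie-break on the word are assumptions for illustration. It uses `std::partial_sort` to order only the first `k` entries by descending similarity.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Returns the k words with the highest similarity, best match first.
// Ties are broken alphabetically so the output is deterministic.
inline std::vector<std::string> TopK(const std::unordered_map<std::string, float> &scores,
                                     std::size_t k) {
  std::vector<std::pair<std::string, float>> items(scores.begin(), scores.end());
  k = std::min(k, items.size());
  std::partial_sort(items.begin(), items.begin() + static_cast<std::ptrdiff_t>(k), items.end(),
                    [](const auto &lhs, const auto &rhs) {
                      if (lhs.second != rhs.second) return lhs.second > rhs.second;
                      return lhs.first < rhs.first;
                    });
  std::vector<std::string> result;
  result.reserve(k);
  for (std::size_t i = 0; i < k; ++i) {
    result.push_back(items[i].first);
  }
  return result;
}
```

Both approaches must compare on the similarity value rather than on the word; comparing `std::pair<std::string, float>` with the default ordering would rank alphabetically.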

This is the code that uses the RecommendService class:

#include <cstdio>
#include <string>
#include <vector>

#include "RecommendService.h"

int main(int argc, char *argv[]) {
  ray::serve::RecommendService recommend_service;
  std::vector<std::string> recommends = recommend_service.Recommend("computer", 1);
  for (const std::string &recommend_word : recommends) {
    std::printf("Recommend word is %s\n", recommend_word.c_str());
  }
  return 0;
}

With this approach, all services must be deployed together in one process, which increases the load on each instance and makes it hard to scale the services independently.

Converting to a Ray Serve Deployment

With Ray Serve, the core computing logic can be deployed as a scalable distributed service.
First, convert these services to run on Ray Serve.
FeatureService:

#pragma once

#include <vector>
#include <string>
#include <word2vec.h>

namespace ray {
namespace serve {

class FeatureService {
 public:
  FeatureService() {
    model_.load("model/word2vec.bin");
  }

  // model_ is held by value, so it is destroyed automatically; no destructor is needed.

  std::vector<float> GetVector(const std::string &request) {
    return model_.getVector(request);
  }

  static FeatureService *FactoryCreate() {
    return new FeatureService();
  }
 private:
  word2vec::Word2Vec model_;
};

// Register function
SERVE_DEPLOYMENT(FeatureService::FactoryCreate);

}  // namespace serve
}  // namespace ray

SimilarityService:

#pragma once

#include <vector>
#include <string>
#include <cmath>
#include <unordered_map>

namespace ray {
namespace serve {

class SimilarityService {
 public:
  std::unordered_map<std::string, float> GetSimilarity(const std::vector<float> &request_vec) {
    std::unordered_map<std::string, float> result;
    for (auto it = recommend_cache_.begin(); it != recommend_cache_.end(); it++) {
      result.insert({it->first, ComputeCosineSimilarity(request_vec, it->second)});
    }
    return result;
  }

  static SimilarityService *FactoryCreate() {
    return new SimilarityService();
  }

 private:
  float ComputeCosineSimilarity(const std::vector<float> &v1, const std::vector<float> &v2) {
    // Assumes v1 and v2 have the same length.
    const std::size_t len = v1.size();
    float dotProduct = 0;
    float magnitude1 = 0;
    float magnitude2 = 0;
    for (std::size_t i = 0; i < len; i++) {
      dotProduct += v1[i] * v2[i];
      magnitude1 += v1[i] * v1[i];
      magnitude2 += v2[i] * v2[i];
    }
    magnitude1 = std::sqrt(magnitude1);
    magnitude2 = std::sqrt(magnitude2);
    return dotProduct / (magnitude1 * magnitude2);
  }

  std::unordered_map<std::string, std::vector<float>> recommend_cache_ = {
    {"mac", {1.5, 2.3, 3.5, 5.5}},
    {"car", {1.5, 3.2, 3.9, 7.5}},
    {"phone", {1.5, 2.0, 4.5, 8.1}},
  };
};

// Register function
SERVE_DEPLOYMENT(SimilarityService::FactoryCreate);

}  // namespace serve
}  // namespace ray

RankService:

#pragma once

#include <algorithm>
#include <functional>
#include <queue>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

namespace ray {
namespace serve {

class RankService {
 public:
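  std::vector<std::string> Rank(const std::unordered_map<std::string, float> &recommends, int num) {
    // Min-heap keyed on similarity (score first, word second), so the weakest
    // candidate sits on top and is evicted once more than `num` entries are held.
    using Entry = std::pair<float, std::string>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> pq;
    for (const auto &pair : recommends) {
      pq.emplace(pair.second, pair.first);
      if (pq.size() > static_cast<std::size_t>(num)) {
        pq.pop();
      }
    }

    // The heap pops in ascending order of similarity; reverse to return the best match first.
    std::vector<std::string> result;
    while (!pq.empty()) {
      result.push_back(pq.top().second);
      pq.pop();
    }
    std::reverse(result.begin(), result.end());
    return result;
  }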

  static RankService *FactoryCreate() {
    return new RankService();
  }
};

// Register function
SERVE_DEPLOYMENT(RankService::FactoryCreate);

}  // namespace serve
}  // namespace ray

RecommendService only invokes the other services in sequence and has no complex processing logic of its own, so we can connect these services directly with the DAG API. This removes the need for RecommendService and simplifies user logic.
Next, we start the Ray Serve runtime and use the Python Serve API to deploy these services as Deployments:

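import ray
from ray import serve
from ray.serve.deployment_graph import InputNode
from ray.serve.drivers import DAGDriver
from ray.serve.http_adapters import json_request

# `language='CPP'` is the option proposed by this REP.
feature_service = serve.deployment(_func_or_class='FeatureService::FactoryCreate', name='feature_service', language='CPP')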
similarity_service = serve.deployment(_func_or_class='SimilarityService::FactoryCreate', name='similarity_service', language='CPP')
rank_service = serve.deployment(_func_or_class='RankService::FactoryCreate', name='rank_service', language='CPP')

with InputNode() as input:
    features = feature_service.GetVector.bind(input[0])
    similarities = similarity_service.GetSimilarity.bind(features)
    rank_result = rank_service.Rank.bind(similarities, input[1])

graph = DAGDriver.bind(rank_result, http_adapter=json_request)
handle = serve.run(graph)
ref = handle.GetVector.remote()
result = ray.get(ref)
print(result)

Calling Ray Serve Deployment with HTTP

curl 'http://127.0.0.1:8000?request=computer&num=1'

Overall Design

Ray Serve maintains a Controller Actor and an Ingress Actor. These two roles are independent of the language the user chooses, and they can manage cross-language deployments and route requests.

C++ Case Deduction

A business application can send control commands to the Ray Serve Controller Actor, such as creating an ingress or creating a deployment. When a C++ online service is published, the DeploymentState component needs to create C++ Deployment Actors. Users can call their business logic from a Python/Java driver, a Ray task, or a Ray actor, and the requests are dispatched to the C++ Deployment Actors.

Package

C++ programs are typically compiled and packaged as one of three artifacts: a binary, a static library, or a shared library.

binary: C++ has no standard API for loading a binary, and binaries produced by different compilers may differ. Loading a binary directly therefore introduces many uncontrollable factors.
static library: A static library bundles all of its dependencies, whereas common utility libraries such as glog and boost may be loaded as dynamic dependencies in the Ray Serve Deployment. A conflict becomes more likely when the same library is present as both a dynamic and a static dependency.
shared library: The operating system provides a standard API for loading shared libraries (for example, dlopen on POSIX systems), and sharing them can reduce memory usage and avoid conflicts.

In conclusion, the business system must be packaged as a shared library to run on Ray Serve.

Register function

Ray Serve will add the SERVE_FUNC and SERVE_DEPLOYMENT macros for publishing a user service as a Deployment.
SERVE_FUNC: resolves the registration of overloaded functions.
SERVE_DEPLOYMENT: publishes a user service as a Deployment.
Example:

static RecommendService *CreateRecommendService(std::string request) {
  return new RecommendService(request);
}

static RecommendService *CreateRecommendService(std::string request, int num) {
  return new RecommendService(request, num);
}

static FeatureService *CreateFeatureService() {
  return new FeatureService();
}

// Register function
SERVE_DEPLOYMENT(SERVE_FUNC(CreateRecommendService, std::string, int),
                 CreateFeatureService);
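
To illustrate why overloaded factories need an explicit selection step before they can be registered, here is a minimal sketch. It is not the Ray Serve macro implementation: `SelectOverload`, `FactoryRegistry`, and `Describe` are hypothetical names invented for this example, and the registry is simplified to a single signature.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Two overloads sharing one name, like CreateRecommendService above.
inline std::string Describe(std::string request) { return "1-arg:" + request; }
inline std::string Describe(std::string request, int num) {
  return "2-arg:" + request + ":" + std::to_string(num);
}

// Hypothetical helper playing the role the REP assigns to SERVE_FUNC:
// naming the parameter types picks exactly one overload.
template <typename... Args>
struct SelectOverload {
  template <typename R>
  constexpr auto operator()(R (*fn)(Args...)) const {
    return fn;
  }
};

// Hypothetical name -> callable table, standing in for what a registration
// macro could populate.
class FactoryRegistry {
 public:
  using Factory = std::function<std::string(std::string, int)>;

  void Register(const std::string &name, Factory factory) {
    table_[name] = std::move(factory);
  }

  std::string Call(const std::string &name, const std::string &request, int num) const {
    return table_.at(name)(request, num);
  }

 private:
  std::map<std::string, Factory> table_;
};
```

A bare `Describe` is ambiguous on its own, whereas `SelectOverload<std::string, int>{}(&Describe)` yields a single function pointer that can be stored under a name.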

Compatibility, Deprecation, and Migration Plan

This feature adds C++ deployments to Ray Serve without modifying or deprecating existing functionality. The changes to the Ray Serve API are purely additive.

Test Plan and Acceptance Criteria

Unit and integration tests for core components
Benchmarks on C++ Deployment

(Optional) Follow-on Work

Initialize the Ray Serve C++ project structure
Create C++ Deployment through Python/Java API
ServeHandle for C++
Accessing C++ Deployment using Python/Java/C++ ServeHandle or HTTP

edoakes · 2023-07-12T23:05:36Z

@MingYueQ sorry for the delay on responding to the RFC. We have been putting out fires for Ray 2.6 release & have a company retreat this week. I will have a detailed look next week and get back to you.

The main concern is going to be around maintenance burden — the existing Java integrations are written in a way that causes some headaches because we have a lot of branches in the code that are special-cased for Java. If we decide to go ahead with the CPP extension, we’ll need to make sure to improve this across all three languages.

MingYueQ · 2023-07-16T15:55:13Z

@edoakes Thank you for your reply. We have the following suggestions regarding your concerns:

In scenarios with extremely high performance requirements, C++ Deployment is more suitable than Python Deployment and Java Deployment, which can expand the scope of Ray Serve services;
Adding C++ Deployment can give Ray Serve users more options and increase the flexibility of Ray Serve;
We will continue to invest in the development of C++ Deployment, and minimize the burden on the community.

liuyang-my · 2023-07-18T02:28:22Z

@edoakes Thank you for your reply. At the time when Java Deployment was supported, there was not a good abstraction for multiple languages. Because there was no expectation of support for other languages, Java was hardcoded in some places. In the development work related to this proposal, I will work with @MingYueQ to improve the abstraction part for multiple languages.

edoakes · 2023-07-25T19:52:43Z

As discussed offline let's re-open discussion after Ray summit when the Serve team has some more bandwidth.

MingYueQ · 2023-08-03T02:16:29Z

OK. During the period of the Ray Summit, we will also prepare more detailed information, including user guide, architecture design, benchmark, and so on.

MingYueQ · 2023-10-25T12:37:59Z

@edoakes Please review this topic again.

edoakes

Some comments inline. High level questions:

How does this relate to the existing Ray C++ actor support? Will this require any extensions of it?
Will we support the DeploymentHandle interface in C++? Or will applications be required to have a Python ingress deployment which may call into the C++ deployment? Please add an example of a Python deployment calling into a C++ one (including calling multiple different methods on it).

edoakes · 2023-10-27T21:41:55Z

reps/2023-06-30-serve-cpp-deployment.md

+similarity_service = serve.deployment(_func_or_class='SimilarityService::FactoryCreate', name='similarity_service', language='CPP')
+rank_service = serve.deployment(_func_or_class='RankService::FactoryCreate', name='rank_service', language='CPP')
+
+with InputNode() as input:


As discussed on the Java improvement REP, let's please update this to use bind() with DeploymentHandle calls instead of the call graph API (there should be no InputNode).

edoakes · 2023-10-27T21:43:12Z

reps/2023-06-30-serve-cpp-deployment.md

+RecommendService is a sequential invocation of other services without complex processing logic, so we can directly use the DAG ability to connect these services, eliminating the need for RecommendService and simplifying user logic. 
+Next, we start the Ray Serve runtime and use Python Serve API deploy these Service as Deployment: 
+```python
+feature_service = serve.deployment(_func_or_class='FeatureService::FactoryCreate', name='feature_service', language='CPP')


This looks pretty ugly: it's using a private attribute (_func_or_class). Maybe we can introduce a simple wrapper class like serve.CPPDeployment(factory_function).


liuyang-my · 2023-10-28T12:03:07Z

Some comments inline. High level questions:

How does this relate to the existing Ray C++ actor support? Will this require any extensions of it?

Will we support the DeploymentHandle interface in C++? Or will applications be required to have a Python ingress deployment which may call into the C++ deployment? Please add an example of a Python deployment calling into a C++ one (including calling multiple different methods on it).

Based on Ray C++ Actor, no extensions are needed at the Ray core level.
C++ DeploymentHandle will be supported, which also relates to whether the common parts of DeploymentHandle across different languages will be moved to C++. This can be implemented in stages.

edoakes · 2023-10-30T20:31:52Z

C++ DeploymentHandle will be supported, which also relates to whether the common parts of DeploymentHandle across different languages will be moved to C++. This can be implemented in stages.

Could you add some detail about this part to the REP please? What will be the deployment handle interface in C++, and roughly how would a shared core implementation work across the three languages? (This part doesn't need to go into too much detail; that can be left to implementation.)


edoakes

@MingYueQ sorry about the delay, just got back from a long vacation! The proposal is looking good. I just have one more comment for you, see inline.

I will send this out to the Ray committers mailing list for broader feedback on Monday and let's try to get it approved by end of next week.

edoakes · 2023-11-25T01:57:50Z

reps/2023-06-30-serve-cpp-deployment.md

+- Init Ray Serve C++ project structure
+- Create C++ Deployment through Python/Java API
+- ServeHandler for C++
+- Accessing C++ Deployment using Python/Java/C++ ServeHandle or HTTP


From my perspective I think supporting HTTP is a P0 requirement in order to really say that we support C++ as a language in Ray Serve.

Could you at least sketch a possible API that we could support here? It does not have to be part of the initial scope of implementation. But without having this I think the proposal is incomplete.

MingYueQ · 2023-12-01

Yes.
Expose two primary APIs to users: SERVE_FUNC and SERVE_DEPLOYMENT, with a detailed explanation provided in the section titled "Register Function".
Furthermore, users can deploy services through these APIs, as exemplified in the "Converting to a Ray Serve Deployment" section.

edoakes · 2023-12-01

I am a bit confused -- how would a C++ deployment be directly exposed via HTTP? Or are you proposing that in order to expose it via HTTP it must be placed behind a Python deployment that calls into it via DeploymentHandle?

edoakes · 2023-12-01

In the example above, I see the RecommendService but this takes as arguments a std::string and an int. How are these parsed from an HTTP request? And how can an HTTP response be output?

liuyang-my · 2023-12-04

The way of accessing via HTTP needs to be adjusted for this demo.

curl http://127.0.0.1:8000?request=computer&num=1

This is not sufficient because the type information for num=1 is lost in the http query. For cases with multiple input parameters, it is possible to transmit them as a JSON string in the HTTP body, for example:

curl -d '{"request": "computer", "num": 1}' http://127.0.0.1:8000

This way, the request flow is: -> HTTPProxy (access with DeploymentHandle) -> Cpp Deployment.
And currently, the HTTPProxy should be able to meet the requirements without any additional modifications.
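
To make the point about lost type information concrete, the sketch below parses a query string the way a naive handler might. `ParseQuery` is a hypothetical helper written for this illustration, not part of Ray Serve, and it skips percent-decoding. Every value comes back as a string, so the receiver has to know out of band that `num` is meant to be an integer.

```cpp
#include <map>
#include <sstream>
#include <string>

// Splits "a=1&b=two" into {"a": "1", "b": "two"}.
// All values are plain strings; nothing records that "1" was meant as an int.
inline std::map<std::string, std::string> ParseQuery(const std::string &query) {
  std::map<std::string, std::string> params;
  std::istringstream stream(query);
  std::string pair;
  while (std::getline(stream, pair, '&')) {
    const std::string::size_type eq = pair.find('=');
    if (eq == std::string::npos) {
      continue;  // Skip fragments without a key=value shape.
    }
    params[pair.substr(0, eq)] = pair.substr(eq + 1);
  }
  return params;
}
```

With `request=computer&num=1`, the caller must convert explicitly, for example `std::stoi(params.at("num"))`. A JSON body such as `{"request": "computer", "num": 1}` carries the distinction between the string and the number in the payload itself.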

@MingYueQ can adjust according to this approach.

edoakes · 2023-12-04

Are there any standard HTTP -> C++ struct parsing libraries that we could lean on here to help define the input/output formatting? For example in Python we use FastAPI to handle all of this. Otherwise this will become a rabbit hole (HTTP input parsing/output formatting could be a whole library in itself).

MingYueQ · 2024-01-30

From our investigation so far, if a C++ Deployment is to map a path to a user-registered function, it has to implement this capability itself. A more native approach is to implement it with macros. Specifically:

The SERVE_DEPLOYMENT macro described in the REP above originally generates a <method_name, std::function> mapping. To keep that capability, a separate cache would be needed to store <path, method_name>, so the call chain would become: path -> method_name -> std::function. This lengthens the call chain. It would therefore be better to generate a <path, std::function> mapping directly in SERVE_DEPLOYMENT. Taking RankService as an example:

#pragma once

#include "api/serve_deployment.h"

namespace ray {
namespace serve {

class RankService {
 public:
  static RankService *FactoryCreate() {
    return new RankService();
  }
};

// Register function
SERVE_DEPLOYMENT("/rankService/factoryCreate", RankService::FactoryCreate);

}  // namespace serve
}  // namespace ray

In SERVE_DEPLOYMENT, the first parameter is the path and the second parameter is the user function.
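
The direct <path, std::function> mapping described above could look roughly like the following. This is a sketch under stated assumptions, not the proposed macro: `PathRouter`, its `Handler` signature (request body in, response string out), and the example paths are hypothetical.

```cpp
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical table mapping a request path straight to a callable,
// so dispatch is a single lookup: path -> std::function.
class PathRouter {
 public:
  using Handler = std::function<std::string(const std::string &body)>;

  // Returns false if the path is already registered; the first handler is kept.
  bool Register(const std::string &path, Handler handler) {
    return routes_.emplace(path, std::move(handler)).second;
  }

  // Writes the handler's result to *out and returns true, or returns false
  // when no handler is registered for the path.
  bool Dispatch(const std::string &path, const std::string &body, std::string *out) const {
    const auto it = routes_.find(path);
    if (it == routes_.end()) {
      return false;
    }
    *out = it->second(body);
    return true;
  }

 private:
  std::unordered_map<std::string, Handler> routes_;
};
```

Compared with keeping <path, method_name> and <method_name, std::function> as two tables, this does one hash lookup per request. The cost is that name-based calls would need their own table or a naming convention for paths.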


MingYueQ force-pushed the feature-serve-cpp-deployment branch from a8c21e7 to fcc5d72 Compare June 30, 2023 15:49

sihanwang41 assigned edoakes, akshay-anyscale and sihanwang41 Jul 3, 2023

liuyang-my closed this Jul 18, 2023

liuyang-my reopened this Jul 18, 2023

edoakes reviewed Oct 27, 2023


wangyingjie added 3 commits November 8, 2023 10:41

Add Cpp Deployment in Ray Serve

260e4b1

Signed-off-by: wangyingjie <wangyingjie.wyj@antgroup.com>

update to use bind

784e9d9

Signed-off-by: wangyingjie <wangyingjie.wyj@antgroup.com>

Introduction to deleting DAG

5b9417e

Signed-off-by: wangyingjie <wangyingjie.wyj@antgroup.com>

MingYueQ force-pushed the feature-serve-cpp-deployment branch from 5f2fb61 to 5b9417e Compare November 8, 2023 02:42

Change python api

fed4b3c

Signed-off-by: wangyingjie <wangyingjie.wyj@antgroup.com>

edoakes reviewed Nov 25, 2023


wangyingjie added 2 commits December 1, 2023 19:29

Change follow-on work

2873240

Signed-off-by: wangyingjie <wangyingjie.wyj@antgroup.com>

Change http request

8a1aa66

Signed-off-by: wangyingjie <wangyingjie.wyj@antgroup.com>
