
[QST] How to serve merlin-tensorflow model in Triton Inference Server and convert it to ONNX? #1070

Open
tuanavu opened this issue Sep 21, 2023 · 1 comment
Labels: question (Further information is requested)

Comments


tuanavu commented Sep 21, 2023

❓ Questions & Help

Details

Hi, I have been experimenting with an existing TF2 model using the merlin-tensorflow image, which lets me use the SOK toolkit for the SparseEmbedding layer. After training the new TF2 model with SOK, I need to export the sok_model and the TF2 model separately. The resulting outputs are as follows:

  • sok_model: The export produces a collection of files named EmbeddingVariable_*_keys.file and EmbeddingVariable_*_values.file.
  • tf2 model: The export produces saved_model.pb and the variables files.

To run a local test prediction, I have to load both models independently and then call inference_step as follows:

import tensorflow as tf

# Load the sparse model: restores the exported EmbeddingVariable_*_keys/_values files
# (sok_model is the SOK-backed embedding model built earlier in the training script)
sok_model.load_pretrained_embedding_table()

# Load the dense model from the exported SavedModel directory
tf_model = tf.saved_model.load(save_dir)

# Inference step: sparse embedding lookup followed by the dense model
# (reduce_retracing replaces the deprecated experimental_relax_shapes flag)
@tf.function(reduce_retracing=True)
def inference_step(inputs):
    return tf_model(sok_model(inputs, training=False), training=False)

# Call inference on a batch of test inputs
res = inference_step(inputs)

Questions

  • Serving the Model: I'm interested in how to serve this model on AWS EKS using the Triton Inference Server. What would the required model repository structure be (see the placeholder layout sketched after this list)? Should I treat it as an ensemble model that includes both the SOK and TensorFlow 2 parts? Which backend would be the most suitable: HugeCTR, TensorFlow 2, or something else? Do you have any guides or resources that could help me with this?
  • Converting the Model to ONNX: According to the Hierarchical Parameter Server Demo, HugeCTR can load both the sparse and dense models and convert them into a single ONNX model. I'm wondering how I can perform a similar conversion for this merlin-tensorflow model, which uses the SOK toolkit and exports separate sparse and dense models.
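For reference, this is the kind of Triton model repository layout I currently have in mind for the ensemble approach. The model names and backend assignments are placeholders on my part, not something I have verified:

models/
├── sok_sparse_lookup/        # placeholder name: the SOK sparse embedding part (backend TBD)
│   ├── config.pbtxt
│   └── 1/
│       └── EmbeddingVariable_*_keys.file and EmbeddingVariable_*_values.file
├── tf_dense/                 # placeholder name: the dense TF2 part on the TensorFlow backend
│   ├── config.pbtxt
│   └── 1/
│       └── model.savedmodel/
│           ├── saved_model.pb
│           └── variables/
└── sok_tf_ensemble/          # placeholder name: platform "ensemble" routing sparse outputs into dense inputs
    ├── config.pbtxt
    └── 1/                    # empty version directory

For the ONNX question, my assumption is that the dense SavedModel alone could be converted with something like python -m tf2onnx.convert --saved-model <save_dir> --output dense_model.onnx, but that would leave out the SOK embedding tables, which is exactly the part the HugeCTR converter handles in the HPS demo, so I am unsure how to produce a single combined ONNX model.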

Environment details

  • Merlin version: nvcr.io/nvidia/merlin/merlin-tensorflow:23.02
tuanavu added the question label on Sep 21, 2023
tuanavu changed the title from "[QST] How do you serve merlin-tensorflow model in Triton" to "[QST] How to serve merlin-tensorflow model in Triton Inference Server and convert it to ONNX?" on Sep 22, 2023
rnyak (Contributor) commented Oct 3, 2023

@FDecaYed fyi.
