Deployment requirements based on libtorch or ONNX #358

Closed
wangrui9720 opened this issue Apr 9, 2024 · 4 comments
Labels
resolution:WAI The software is working as intended

Comments

@wangrui9720

After training CTGAN, we want to call the model from C++ so that it can run in real time. We tried, but CTGAN cannot be exported to TorchScript or similar formats, because its inputs and outputs are pandas DataFrames, while libtorch requires tensor inputs and outputs. We really need a C++-based deployment method, which would make our software run more efficiently. We look forward to your proposal!

wangrui9720 added the "new" label on Apr 9, 2024
@sdv-team
Contributor

Hi @wangrui9720! It’s great to see your interest in the SDV ecosystem. This comment is a reminder to consult your legal team before adopting the SDV in your project, as SDV (and most of the related libraries, such as CTGAN) is released under a source-available BSL license.

For more information, you can read through our license FAQs (not legal advice) or our blog. For any other questions, please refer to our Support Page. You can also inquire about a commercial license to allow additional use.

@srinify

srinify commented May 9, 2024

Hi there @wangrui9720, do you mind sharing a bit more about your use case? A few suggestions to consider:

  • GaussianCopulaSynthesizer, from SDV, is an alternative model that is significantly faster than our GAN-based models like CTGAN. SDV is our batteries-included framework that sits one level above CTGAN and offers a better user experience.
  • To speed up CTGAN model training, you can often get very good synthetic data quality with fewer rows than you think. You can read more about our thinking and advice here.

srinify added the "under discussion" label and removed the "new" label on May 9, 2024
srinify changed the title from "Deployment requirements based on libtorch or onnx!" to "Deployment requirements based on libtorch or ONNX" on May 9, 2024
@wangrui9720
Author

This is the code I use to call the trained CTGAN model.

    from ctgan import CTGAN
    import pandas as pd


    def load_ctgan_model():
        # Load the pickled CTGAN model from disk.
        model_path = 'Z:/project/pkl/ctgan-test.pkl'
        ctgan = CTGAN.load(model_path)
        return ctgan


    def get_welding_parameters(ctgan, NG_piece, desired_rows=500, batch_size=100):
        conditioned_data_list = []

        # Keep sampling batches until enough rows match NG_piece.
        while len(conditioned_data_list) < desired_rows:
            generated_data = ctgan.sample(batch_size)
            # Assumes 'slice' is the name of the column that identifies the piece.
            new_data = generated_data[generated_data['slice'] == NG_piece]
            conditioned_data_list.extend(new_data.values)

        conditioned_data = pd.DataFrame(conditioned_data_list, columns=generated_data.columns)

        # Trim surplus rows left over from the last batch.
        if len(conditioned_data) > desired_rows:
            conditioned_data = conditioned_data.iloc[:desired_rows]

        average_welding_time = conditioned_data['time(ms)'].mean()
        average_welding_temp = conditioned_data['temp(℃)'].mean()

        return average_welding_time, average_welding_temp

To deploy the trained CTGAN model for real-time output, my only option is to call this Python code from C++. The GaussianCopulaSynthesizer you mentioned is also Python code, so I would still need to call it from C++ to train it, right? Looking forward to your reply!
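
The sampling loop in the code above is a form of rejection sampling, and it never terminates if the requested piece is rare or never generated, which matters for a real-time caller. A minimal sketch of the same idea with a cap on the number of batches follows. It uses only the standard library, and `fake_sample`, `sample_conditioned`, `max_batches`, and the value ranges are hypothetical stand-ins for illustration, not part of CTGAN.

```python
import random
from statistics import mean


def fake_sample(batch_size, rng):
    """Hypothetical stand-in for ctgan.sample(batch_size).

    Returns a list of row dicts; the column names mirror those in the post,
    the values are made up.
    """
    return [
        {
            "slice": rng.choice(["A", "B", "C"]),
            "time(ms)": rng.uniform(100.0, 200.0),
            "temp(℃)": rng.uniform(300.0, 400.0),
        }
        for _ in range(batch_size)
    ]


def sample_conditioned(sample_fn, piece, desired_rows=500, batch_size=100,
                       max_batches=1000):
    """Rejection sampling: keep only rows whose 'slice' equals `piece`.

    Raises RuntimeError instead of looping forever when too few rows match.
    """
    kept = []
    for _ in range(max_batches):
        batch = sample_fn(batch_size)
        kept.extend(row for row in batch if row["slice"] == piece)
        if len(kept) >= desired_rows:
            return kept[:desired_rows]
    raise RuntimeError(
        f"only {len(kept)} matching rows after {max_batches} batches"
    )


rng = random.Random(0)
rows = sample_conditioned(lambda n: fake_sample(n, rng), "B",
                          desired_rows=50, batch_size=20)
avg_time = mean(row["time(ms)"] for row in rows)
avg_temp = mean(row["temp(℃)"] for row in rows)
print(len(rows), round(avg_time, 1), round(avg_temp, 1))
```

Passing the sampler in as a function keeps the loop independent of how rows are produced, so the same guard could wrap a real `ctgan.sample` call after converting its DataFrame to rows.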

@srinify

srinify commented May 21, 2024

Ah, now I understand, @wangrui9720. You're correct that CTGAN and SDV don't currently support exporting just the machine learning model. The pkl file also contains a lot of Python library context, because all of that context is usually needed to run the Synthesizer capabilities that generate synthetic data.
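
As a rough illustration of why a pickle file is tied to Python, the sketch below pickles a small object and checks that the bytes name the Python class needed to rebuild it, then contrasts that with writing only the parameters to JSON. `statistics.NormalDist` is used purely as a stand-in for a fitted model; this is not how CTGAN or SDV store their models, and the JSON layout is an assumption made for the example.

```python
import json
import pickle
from statistics import NormalDist

# NormalDist is only a stand-in for a fitted model with two parameters.
model = NormalDist(mu=250.0, sigma=12.5)

# A pickle records which Python class to rebuild, so loading it requires a
# Python process that can import that class.
blob = pickle.dumps(model)
has_class_ref = b"statistics" in blob and b"NormalDist" in blob
print("pickle references a Python class:", has_class_ref)

# Writing only the parameters gives a language-neutral file that another
# runtime (C++ included) could parse, provided it reimplements the sampling.
params_json = json.dumps({"mu": model.mean, "sigma": model.stdev})
print(params_json)

# Round trip: the parameters alone are enough to rebuild this simple model.
restored = NormalDist(**json.loads(params_json))
```

For a model as small as this one the parameters are the whole story. A synthesizer also carries data transformers and metadata, which is the extra context mentioned above.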

We have a feature request issue in SDV to enable the exporting of just the model weights: sdv-dev/SDV#1970

I'll close this issue off and will add your use case over there so we can collect more examples for the team to prioritize! Thanks!

srinify closed this as completed on May 21, 2024
srinify added the "resolution:WAI" label (the software is working as intended) and removed the "under discussion" label on May 21, 2024