
Training Failed choosing local CPU in Object detection scenario, 'TorchSharp.torch' threw an exception #2891

Open
NickAtMixxus opened this issue Mar 28, 2024 · 4 comments
Labels: GS Tutorial (Bug from customer using Getting Started Tutorial)

Comments

@NickAtMixxus

System Information:

  • Model Builder Version: 17.18.2.2415501
  • Visual Studio Version: Visual Studio Community 2022, 17.9.5 (the Visual Studio Installer reports "All installations are up to date")

Describe the bug
Training fails when Local (CPU) is chosen in the Object detection scenario.

To Reproduce
  • In the .mbconfig UI, under Scenario, choose Object detection.
  • Under Environment, choose Local (CPU), then Next step.
  • Under Data, add the .json file created by VoTT after labeling the images. (The images are then visible in Data Preview.)
  • Next step, then Train.
  • Training starts but is soon interrupted with a Model Builder error:
The type initializer for 'TorchSharp.torch' threw an exception.

Expected behavior
Training should complete, or Model Builder should report what is not installed or missing.

Additional context
Something may be wrong in my setup; I was able to train successfully with an older version of Model Builder. I have tried new projects and a clean install of Visual Studio, removed and reinstalled Model Builder (both automatic installation and manual installation from the Visual Studio Marketplace), and tried two different sets of images tagged in VoTT.

I also noticed that under Advanced training options, the default values of the Score threshold and IOU threshold fields (0.35 and 0.5) have red text next to them: "Not a valid input. The input must be a valid float and m"... The rest of the message is cut off, and the window cannot be expanded. Changing those values has made no difference so far.

Some more details from the log file:
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.ML.ModelBuilder.AutoMLEngine.d__21.MoveNext() in /_/src/Microsoft.ML.ModelBuilder.AutoMLService/AutoMLEngineService/AutoMLEngine.cs:line 212

NickAtMixxus added the GS Tutorial label (Bug from customer using Getting Started Tutorial) on Mar 28, 2024
@vulvquang

I had the same error. As a workaround, you can try Object Detection on Azure Machine Learning.

@NickAtMixxus
Author

Thanks, good to know! I'll have to wait for an update, then.
About Azure: I wanted to test on CPU/GPU first so I wouldn't spend anything before knowing whether my images were useful. I have started trying the Azure alternative but haven't finished yet, even after watching a number of quick-start videos. Although I was able to create a free account, it is hard to understand which path to take to create a compute: the Azure portal or Azure Machine Learning Studio? The profile looks a bit different in the portal than in the studio. (There are many docs and tutorials with different dates, plus docs for AI Studio and AutoML Computer vision.) I get the impression that the Azure and machine learning area is updated and changed frequently, which makes it confusing to keep up right now.

@LittleLittleCloud
Contributor

@NickAtMixxus Can you share the complete Model Builder log? You can find its URL in the Visual Studio Output window. Given the error you mention (The type initializer for 'TorchSharp.torch' threw an exception.), I wonder if it is caused by a failure to load torch.dll.

@NickAtMixxus
Author

@LittleLittleCloud

OK. Just a quick update: I have also tried training on GPU, and that training is also quickly aborted with an error message. (I installed CUDA 10.1 because it was mentioned in an Image Classification tutorial; I previously tried a recent CUDA version, but that did not seem to matter.)

Attached are the Output window messages and the complete log files from both the CPU and GPU training runs:
OutputWinObjDetTrainCPU.txt
OutputWinObjDetTrainGPU.txt
LOGcopyCPUtrain.txt
LOGcopyGPUtrain.txt

And here are the error messages for each training run:

Error Message CPU Training

The type initializer for 'TorchSharp.torch' threw an exception.

at Microsoft.ML.TorchSharp.Utils.TorchUtils.InitializeDevice(IHostEnvironment env)
at Microsoft.ML.TorchSharp.AutoFormerV2.ObjectDetectionTrainer.Trainer..ctor(ObjectDetectionTrainer parent, IChannel ch, IDataView input)
at Microsoft.ML.TorchSharp.AutoFormerV2.ObjectDetectionTrainer.Fit(IDataView input)
at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input)
at Microsoft.ML.AutoML.SweepablePipelineRunner.Run(TrialSettings settings)
at Microsoft.ML.AutoML.SweepablePipelineRunner.RunAsync(TrialSettings settings, CancellationToken ct)
at Microsoft.ML.AutoML.AutoMLExperiment.d__24.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.ML.ModelBuilder.AutoMLService.LocalObjectDetectionExperiment.d__13.MoveNext() in /_/src/Microsoft.ML.ModelBuilder.AutoMLService/Experiments/LocalObjectDetectionExperiment.cs:line 133
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.ML.ModelBuilder.AutoMLEngine.d__21.MoveNext() in /_/src/Microsoft.ML.ModelBuilder.AutoMLService/AutoMLEngineService/AutoMLEngine.cs:line 212

Error Message GPU Training

The type initializer for 'TorchSharp.torch' threw an exception.
at TorchSharp.torch.TryInitializeDeviceType(DeviceType deviceType)
at Microsoft.ML.ModelBuilder.AutoMLService.LocalObjectDetectionExperiment.d__13.MoveNext() in /_/src/Microsoft.ML.ModelBuilder.AutoMLService/Experiments/LocalObjectDetectionExperiment.cs:line 53
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.ML.ModelBuilder.AutoMLEngine.d__21.MoveNext() in /_/src/Microsoft.ML.ModelBuilder.AutoMLService/AutoMLEngineService/AutoMLEngine.cs:line 212

Projects
None yet
Development

No branches or pull requests

3 participants