SdcaMaximumEntropy trainer goes into an infinite loop if it takes an already transformed data view as input

### System information

- **OS version**: Windows 10 Pro x64
- **.NET Version**: .NET Core 3.0
- **ML.NET**: 1.5.0-preview

### Issue

**What I did**
- Create a data-preparation pipeline.
- Create an SdcaMaximumEntropy trainer.
- Execute the pipeline on its own, e.g. to debug the transformed data view.
- Add the trainer to the pipeline and execute the pipeline again, with the trainer included.
 
**What happened**

If I execute the pipeline once, i.e. load the data from enumerables into a data view and then execute the entire transformation chain, including both the transformations and the trainer, everything works fine.

If I execute the pipeline twice, first separately and then as part of the entire transformation chain, it consumes 3 GB of the 16 GB of RAM available, then **training hangs indefinitely** and never ends.
I worked around this temporarily by setting the `MaximumNumberOfIterations` option, but I'm not sure that's a good idea.

**What I expect**

I expect training to stop eventually, no matter how many times I execute the pipeline.
**Check the comment on the last line in the code below.**

### Source code 

The source code is taken from this issue: https://github.com/dotnet/machinelearning/issues/4903

```C#

public IEstimator<ITransformer> GetPipeline(IEnumerable<string> columns)
{
  var pipeline = Context
    .Transforms
    .Conversion
    .MapValueToKey(new[] { new InputOutputColumnPair("Label", "Strategy") })
    .Append(Context.Transforms.Concatenate("Combination", columns.ToArray())) // merge "dynamic" columns into a single property
    .Append(Context.Transforms.NormalizeMinMax(new[] { new InputOutputColumnPair("Features", "Combination") })) // normalize merged columns into Features
    .Append(Context.Transforms.SelectColumns(new string[] { "Label", "Features" })); // remove everything from data view, except transformed columns

  return pipeline;
}

public IEstimator<ITransformer> GetEstimator()
{
  var options = new SdcaMaximumEntropyMulticlassTrainer.Options
  {
    // MaximumNumberOfIterations = 100  // uncomment this to fix the issue
  };

  var estimator = Context
    .MulticlassClassification
    .Trainers
    .SdcaMaximumEntropy(options)
    .Append(Context.Transforms.Conversion.MapKeyToValue(new[]
    {
      new InputOutputColumnPair("Prediction", "PredictedLabel") // set trainer to use Prediction property as output
    }));

  return estimator;
}

public void TrainModel(IEnumerable<string> columns, IEnumerable<InputModel> items)
{
  var estimator = GetEstimator();
  var pipeline = GetPipeline(columns);
  var inputs = Context.Data.LoadFromEnumerable(items);  // create view 

  // If I stop execution here, everything is OK

  var model = pipeline.Append(estimator).Fit(inputs);  // works fine for the data view loaded from enumerables

  // The data-preparation pipeline is already part of the transformation chain, so I don't need the next 2 lines, but I don't understand why they cause the issue

  var pipelineModel = pipeline.Fit(inputs);
  var pipelineView = pipelineModel.Transform(inputs); // execute the pipeline before training
  var modelFromView = pipeline.Append(estimator).Fit(pipelineView); // use the transformed pipelineView instead of the initial inputs and ... go into an infinite loop ... why?
}
```
