Concatenating a range of columns in the data class into the "Features" column throws an exception #493

@hellothere33

Hello!

I often have CSV files with more than 50 float columns, so it's not feasible to specify each of them individually. I haven't managed to load them in one shot using a range/sweep specifier. To test this on a smaller scale, I used the Iris example, because its data ends with 4 float columns.

Here's the data class; I only added the two lines at the end:

    public class IrisData
    {
        [Column("0")]
        public float Label;

        [Column("1")]
        public float SepalLength;

        [Column("2")]
        public float SepalWidth;

        [Column("3")]
        public float PetalLength;

        [Column("4")]
        public float PetalWidth;

        [Column("1-*", name: "Features")] // New
        public float[] Features; // New
    }

Here's the simplified pipeline; the only change is that I commented out the usual ColumnConcatenator step:

    var pipeline = new LearningPipeline();
    pipeline.Add(new TextLoader(DataPath).CreateFrom<IrisData>(useHeader: true, separator: '\t'));
    //pipeline.Add(new ColumnConcatenator("Features",
    //                                    "SepalLength",
    //                                    "SepalWidth",
    //                                    "PetalLength",
    //                                    "PetalWidth"));
    pipeline.Add(new KMeansPlusPlusClusterer() { K = 3 });
    var model = pipeline.Train<IrisData, ClusterPrediction>();

It works when I load each column individually and then concatenate them in the pipeline, as the sample code does. But it always throws an exception when I use the code above:

System.Reflection.TargetInvocationException: 'Exception has been thrown by the target of an invocation.'
Inner Exception:
InvalidOperationException: Column 'Features' is a vector of variable size, which is not supported for normalizers

Please help! Thank you!

=============================================================

System information

  • OS version/distro: Windows 10
  • .NET Version (e.g., dotnet --info): .NET Framework 4.7.1

Issue

  • What did you do?: I tried to load multiple float columns from a CSV by specifying a range in the data class declaration, for example: "1-4"
  • What happened?: I got an exception about the size of the Features column.
  • What did you expect?: That concatenating columns by specifying a range would work the same as adding a ColumnConcatenator to the pipeline.
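
To make the expectation above concrete, here is a minimal sketch in plain C# (no ML.NET involved) of what a column range amounts to. `RangeSketch` and `SliceColumns` are hypothetical names introduced only for illustration, the sample row is made up, and the real TextLoader behavior is not shown. The sketch only illustrates that a closed range such as "1-4" implies a length known in advance, while an open-ended range such as "1-*" depends on each row, which may be what the "vector of variable size" message refers to.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class RangeSketch
{
    // Hypothetical helper (not part of ML.NET): copies columns first..last
    // (inclusive, zero-based) of a tab-separated row into a float array.
    // Passing null for last mimics an open-ended range such as "1-*".
    public static float[] SliceColumns(string row, int first, int? last)
    {
        string[] cells = row.Split('\t');

        // With an open end, the last index is only known after reading the row.
        int end = last ?? cells.Length - 1;
        if (first < 0 || end < first || end >= cells.Length)
        {
            throw new ArgumentException("Row does not contain the requested columns.");
        }

        return cells
            .Skip(first)
            .Take(end - first + 1)
            .Select(c => float.Parse(c, CultureInfo.InvariantCulture))
            .ToArray();
    }

    public static void Main()
    {
        // Made-up sample row: label in column 0, four features in columns 1-4.
        string row = "0\t5.1\t3.5\t1.4\t0.2";

        float[] closedRange = SliceColumns(row, 1, 4);    // length fixed by the range
        float[] openRange = SliceColumns(row, 1, null);   // length decided by the row

        Console.WriteLine(closedRange.Length);
        Console.WriteLine(openRange.Length);
    }
}
```

For this particular row both calls return four values, but only the closed range guarantees that length for every row; a shorter or longer row would change the result of the open-ended call.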
