AutoGenerateDuplicateColumnNames not being Respected in CsvReader #254

Open
xtens-digital opened this issue Nov 21, 2022 · 5 comments

xtens-digital commented Nov 21, 2022

I think there is a bug somewhere: AutoIncrementDuplicateColumnNames is not being respected.

using (var r = ChoCSVReader.LoadText(CSVText)
    .WithDelimiter("$$")
    .WithFirstLineHeader()
    .MayHaveQuotedFields()
    .AutoIncrementDuplicateColumnNames(0, true)
    .IgnoreCase(true)
    .WithEOLDelimiter("$EOL$")
    )
{
    using (var w = new ChoParquetWriter(entryStream))
    {
        w.Write(r);
    }
}
This appears to allow the data to be read without throwing a duplicate-column error. However, anything that then consumes the ChoCSVReader throws an exception, presumably raised while a new dictionary is being built.

at System.ThrowHelper.ThrowAddingDuplicateWithKeyArgumentException[T](T key)
   at System.Collections.Generic.Dictionary`2.TryInsert(TKey key, TValue value, InsertionBehavior behavior)
   at System.Linq.Enumerable.ToDictionary[TSource,TKey](IEnumerable`1 source, Func`2 keySelector, IEqualityComparer`1 comparer)
   at ChoETL.ChoCSVRecordConfiguration.Validate(Object state)
   at ChoETL.ChoCSVRecordReader.<>c__DisplayClass24_0.<AsEnumerable>b__0(Tuple`2 pairElement)
   at ChoETL.ChoPeekEnumerator`1.MoveToNext()
   at ChoETL.ChoPeekEnumerator`1.TryFetchPeek()
   at ChoETL.ChoPeekEnumerator`1.get_Peek()
   at ChoETL.ChoCSVRecordReader.<AsEnumerable>d__24.MoveNext()
   at ChoETL.ChoCSVReader`1.<>c__DisplayClass59_0.<GetEnumerator>b__0()
   at ChoETL.ChoEnumeratorWrapper.<BuildEnumerable>d__0`1.MoveNext()
   at System.Linq.Enumerable.<OfTypeIterator>d__62`1.MoveNext()
   at ChoETL.ChoParquetRecordWriter.GetFirstNotNullRecord(IEnumerator`1 recEnum)
   at ChoETL.ChoParquetRecordWriter.<WriteTo>d__37.MoveNext()
   at ChoETL.ChoUtility.Loop(IEnumerable e, Action preActionCallback, Action`1 postActionCallback)
   at ChoETL.ChoParquetWriter`1.Write(IEnumerable`1 records)

Any thoughts on this, or ways around it? Yes, duplicate columns are not ideal, but my understanding is that AutoIncrementDuplicateColumnNames should let us work around them until we can address the root cause.
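
For reference, the top frames of the trace (Dictionary`2.TryInsert under Enumerable.ToDictionary) match what plain .NET does whenever ToDictionary sees the same key twice. The sketch below reproduces only that general behavior and shows one possible way to rename duplicates up front. It does not reflect ChoETL internals: DuplicateHeaderDemo, MakeUnique and ToDictionaryThrows are hypothetical names, and the "_2" suffix scheme is an assumption, not necessarily what AutoIncrementDuplicateColumnNames generates.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DuplicateHeaderDemo
{
    // Hypothetical helper (not part of ChoETL): appends _2, _3, ... to any
    // name that was already seen, comparing names case-insensitively.
    public static List<string> MakeUnique(IEnumerable<string> names)
    {
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var result = new List<string>();
        foreach (var name in names)
        {
            var candidate = name;
            var suffix = 2;
            // HashSet.Add returns false when the name is already present.
            while (!seen.Add(candidate))
            {
                candidate = $"{name}_{suffix++}";
            }
            result.Add(candidate);
        }
        return result;
    }

    // Returns true when building a case-insensitive dictionary keyed by
    // column name throws because of a duplicate key.
    public static bool ToDictionaryThrows(IEnumerable<string> names)
    {
        try
        {
            names.ToDictionary(n => n, StringComparer.OrdinalIgnoreCase);
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }
}
```

With headers such as "Id", "Name", "id", ToDictionaryThrows reports a failure on the raw list and none on the output of MakeUnique, so renaming the header row before it reaches the reader might serve as a stopgap.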

Owner

Cinchoo commented Nov 24, 2022

Thanks for reporting it; the issue is fixed. Please take v1.2.1.51-beta4 and give it a try.

Author

xtens-digital commented Dec 16, 2022

I took package 1.0.1.25-beta1 and am still seeing the issue.

using (var r = ChoCSVReader.LoadText(Info, new ChoCSVRecordConfiguration { MaxLineSize = _options.ChoReaderMaxLineSize })
    .WithDelimiter(columnDelimeter)
    .WithFirstLineHeader()
    .MayHaveQuotedFields()
    .AutoIncrementDuplicateColumnNames(0, true)
    .IgnoreCase(true)
    .WithEOLDelimiter(endOfLineDelimeter)
    )
{
    using (var w = new ChoParquetWriter(patientStream.Stream))
    {
        w.Write(r);
    }
}

at System.ThrowHelper.ThrowAddingDuplicateWithKeyArgumentException[T](T key)
   at System.Collections.Generic.Dictionary`2.TryInsert(TKey key, TValue value, InsertionBehavior behavior)
   at System.Collections.Generic.Dictionary`2.Add(TKey key, TValue value)
   at System.Linq.Enumerable.ToDictionary[TSource,TKey](IEnumerable`1 source, Func`2 keySelector, IEqualityComparer`1 comparer)
   at ChoETL.ChoCSVRecordConfiguration.Validate(Object state)
   at ChoETL.ChoCSVRecordReader.<>c__DisplayClass24_0.<AsEnumerable>b__0(Tuple`2 pairElement)
   at ChoETL.ChoPeekEnumerator`1.MoveToNext()
   at ChoETL.ChoCSVRecordReader.<AsEnumerable>d__24.MoveNext()
   at ChoETL.ChoEnumeratorWrapper.<BuildEnumerable>d__0`1.MoveNext()
   at System.Linq.Enumerable.<OfTypeIterator>d__61`1.MoveNext()
   at ChoETL.ChoParquetRecordWriter.GetFirstNotNullRecord(IEnumerator`1 recEnum)
   at ChoETL.ChoParquetRecordWriter.<WriteTo>d__37.MoveNext()
   at ChoETL.ChoParquetWriter`1.Write(IEnumerable`1 records)

Owner

Cinchoo commented Dec 18, 2022

You must take v1.2.1.51-beta4 for this fix to work.

@xtens-digital
Author

I did not see this version published! 1.0.1.25-beta1 was the latest. Additionally, in this version all Parquet file cell values became null.

Owner

Cinchoo commented Dec 20, 2022

Well, you need to update the base library to 1.2.1.51-beta4 at https://www.nuget.org/packages/ChoETL.NETStandard/1.2.1.51-beta4 (the CSV parser is in this library).

You can pick up the latest Parquet library at https://www.nuget.org/packages/ChoETL.Parquet
