AutoGenerateDuplicateColumnNames not being Respected in CsvReader #254
Comments
Thanks for reporting it; the issue is fixed. Please take v1.2.1.51-beta4 and give it a try.
Took package 1.0.1.25-beta1; still seeing the issue.
You must take v1.2.1.51-beta4 for this fix to work.
I did not see this version published! 1.0.1.25-beta1 was the latest. Additionally, in this version all Parquet file cell values became null.
Well, you need to update the base library to 1.2.1.51-beta4 at https://www.nuget.org/packages/ChoETL.NETStandard/1.2.1.51-beta4 (the CSV parser is in this library). You can pick up the latest Parquet library at https://www.nuget.org/packages/ChoETL.Parquet
I think there is a bug somewhere with AutoGenerateDuplicateColumnNames not being respected:
using (var r = ChoCSVReader.LoadText(CSVText)
    .WithDelimiter("$$")
    .WithFirstLineHeader()
    .MayHaveQuotedFields()
    .AutoIncrementDuplicateColumnNames(0, true)
    .IgnoreCase(true)
    .WithEOLDelimiter("$EOL$")
    )
{
    using (var w = new ChoParquetWriter(entryStream))
    {
        w.Write(r);
    }
}
This appears to allow the data to be read without throwing a duplicate column error. However, as soon as anything consumes the ChoCSVReader, it throws an exception, presumably while building a new dictionary keyed on the column names:
   at System.ThrowHelper.ThrowAddingDuplicateWithKeyArgumentException[T](T key)
   at System.Collections.Generic.Dictionary`2.TryInsert(TKey key, TValue value, InsertionBehavior behavior)
   at System.Linq.Enumerable.ToDictionary[TSource,TKey](IEnumerable`1 source, Func`2 keySelector, IEqualityComparer`1 comparer)
   at ChoETL.ChoCSVRecordConfiguration.Validate(Object state)
   at ChoETL.ChoCSVRecordReader.<>c__DisplayClass24_0.<AsEnumerable>b__0(Tuple`2 pairElement)
   at ChoETL.ChoPeekEnumerator`1.MoveToNext()
   at ChoETL.ChoPeekEnumerator`1.TryFetchPeek()
   at ChoETL.ChoPeekEnumerator`1.get_Peek()
   at ChoETL.ChoCSVRecordReader.<AsEnumerable>d__24.MoveNext()
   at ChoETL.ChoCSVReader`1.<>c__DisplayClass59_0.<GetEnumerator>b__0()
   at ChoETL.ChoEnumeratorWrapper.<BuildEnumerable>d__0`1.MoveNext()
   at System.Linq.Enumerable.<OfTypeIterator>d__62`1.MoveNext()
   at ChoETL.ChoParquetRecordWriter.GetFirstNotNullRecord(IEnumerator`1 recEnum)
   at ChoETL.ChoParquetRecordWriter.<WriteTo>d__37.MoveNext()
   at ChoETL.ChoUtility.Loop(IEnumerable e, Action preActionCallback, Action`1 postActionCallback)
   at ChoETL.ChoParquetWriter`1.Write(IEnumerable`1 records)
Any thoughts or ways around this? Yes, duplicate columns are not ideal, but my understanding is that AutoIncrementDuplicateColumnNames should let us work around the problem until we can address the root cause.
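For anyone following along, the failure mode in the stack trace can be reproduced without ChoETL: Enumerable.ToDictionary throws an ArgumentException when two elements map to the same key. The sketch below shows that, plus one possible way to make header names unique before they reach the reader. HeaderDedup, Deduplicate and ThrowsOnToDictionary are hypothetical names made up for illustration, not part of ChoETL, and the _1, _2 suffix scheme is an assumption that may not match what AutoIncrementDuplicateColumnNames produces.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of ChoETL.
public static class HeaderDedup
{
    // Returns the header names in order, appending _1, _2, ... to any name
    // that clashes (case-insensitively) with a name already emitted.
    public static List<string> Deduplicate(IEnumerable<string> names)
    {
        var used = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var result = new List<string>();
        foreach (var name in names)
        {
            var candidate = name;
            var suffix = 0;
            // HashSet.Add returns false when the candidate is already taken.
            while (!used.Add(candidate))
            {
                suffix++;
                candidate = name + "_" + suffix;
            }
            result.Add(candidate);
        }
        return result;
    }

    // True when ToDictionary rejects the names as keys, which is the same
    // kind of exception that appears at the top of the stack trace.
    public static bool ThrowsOnToDictionary(IEnumerable<string> names)
    {
        try
        {
            names.ToDictionary(n => n, n => n, StringComparer.OrdinalIgnoreCase);
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }
}
```

If the header line is split on the "$$" delimiter, passed through Deduplicate, and joined back before the text is loaded, the reader should no longer see duplicate names. This is only a stopgap sketch until a package containing the fix can be installed.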