Parquet: Handling of datetimes not working as expected #256
Comments
Please take https://www.nuget.org/packages/ChoETL.Parquet/1.0.1.25-beta2 and give it a try. Let me know.
Apologies for the delay as I've been on other tasks. When using
var reader = new ChoParquetReader(path, new ChoParquetRecordConfiguration
{
    ParquetOptions = { TreatBigIntegersAsDates = true }
});
var dt = reader.AsDataTable();
(path is just a string path to a parquet file saved using Python 3.9.12) I get
System.MissingMethodException: Method not found: 'System.Collections.Generic.IDictionary`2<System.String,System.Object> ChoETL.ChoRecordReader.MigrateToNewSchema(System.Collections.Generic.IDictionary`2<System.String,System.Object>, System.Collections.Generic.IDictionary`2<System.String,System.Type>)'.
at ChoETL.ChoParquetRecordReader.<AsEnumerable>d__25.MoveNext()
at ChoETL.ChoParquetRecordReader.<AsEnumerable>d__20.MoveNext()
at ChoETL.ChoParquetReader`1.<>c__DisplayClass40_0.<GetEnumerator>b__0()
at ChoETL.ChoEnumeratorWrapper.ChoEnumeratorWrapperInternal`1.MoveNext()
at ChoETL.ChoEnumeratorWrapper.<BuildEnumerable>d__0`1.MoveNext()
at System.Linq.Enumerable.WhereSelectEnumerableIterator`2.MoveNext()
at System.Linq.Enumerable.<OfTypeIterator>d__95`1.MoveNext()
at ChoETL.ChoPeekEnumerator`1.MoveToNext()
at ChoETL.ChoPeekEnumerator`1.MoveNext()
at ChoETL.ChoEnumerableDataReader..ctor(IEnumerable collection, IChoDeferedObjectMemberDiscoverer dom)
at ChoETL.ChoEnumerableEx.AsDataReader(IEnumerable collection, Action`1 membersDiscovered, String[] selectedFields, String[] excludeFields)
at ChoETL.ChoParquetReader`1.AsDataReader(Action`1 membersDiscovered)
at ChoETL.ChoParquetReader`1.AsDataTable(String tableName)
at Risk.ChoETL.ChoETLArrow.ReadParquet(String path) in D:\Code\Prototype\Parquet\Risk.ChoETL\ChoETLArrow.cs:line 24
Packages installed (Framework 4.8) were
<?xml version="1.0" encoding="utf-8"?>
<packages>
<package id="ChoETL.NETStandard" version="1.2.1.61" targetFramework="net48" />
<package id="ChoETL.Parquet" version="1.0.1.25-beta2" targetFramework="net48" />
<package id="IronSnappy" version="1.2.2" targetFramework="net48" />
<package id="Microsoft.CSharp" version="4.4.1" targetFramework="net48" />
<package id="Newtonsoft.Json" version="13.0.1" targetFramework="net48" />
<package id="Parquet.Net" version="3.7.4" targetFramework="net48" />
<package id="System.Buffers" version="4.5.1" targetFramework="net48" />
<package id="System.CodeDom" version="4.4.0" targetFramework="net48" />
<package id="System.ComponentModel.Annotations" version="4.4.1" targetFramework="net48" />
<package id="System.Configuration.ConfigurationManager" version="4.4.1" targetFramework="net48" />
<package id="System.Data.SqlClient" version="4.8.5" targetFramework="net48" />
<package id="System.Memory" version="4.5.4" targetFramework="net48" />
<package id="System.Numerics.Vectors" version="4.5.0" targetFramework="net48" />
<package id="System.Reflection.Emit" version="4.3.0" targetFramework="net48" />
<package id="System.Reflection.Emit.Lightweight" version="4.7.0" targetFramework="net48" />
<package id="System.Runtime.CompilerServices.Unsafe" version="4.5.3" targetFramework="net48" />
</packages>
I couldn't find my original prototype code for testing so I started from scratch.
When running ChoETL.Parquet 1.0.1.24 with ChoETL.NETStandard 1.2.1.50, I am having an issue retrieving datetime values.
My aim is to use Parquet as an information exchange format between Python and .NET. It has the potential to do so, but inconsistent handling of dates between the two languages is proving a sticking point when working with large time series. I would like to be able to store and read .NET DateTime values, which I would expect to convert to and from datetime64[ns].
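For reference, the conversion this expectation implies can be sketched as below. This is only an illustration in plain Python, not what ChoETL or Parquet.Net does internally. It assumes that .NET DateTime.Ticks counts 100-nanosecond intervals since 0001-01-01T00:00:00 and that datetime64[ns] counts nanoseconds since 1970-01-01T00:00:00. The helper names are hypothetical.

```python
# Hypothetical helpers: map between .NET DateTime.Ticks and datetime64[ns] integers.
# Assumption: 1 tick = 100 ns, ticks start at 0001-01-01, ns start at 1970-01-01.

UNIX_EPOCH_TICKS = 621_355_968_000_000_000  # ticks elapsed at 1970-01-01T00:00:00
NS_PER_TICK = 100


def ticks_to_epoch_ns(ticks: int) -> int:
    """Convert .NET ticks to nanoseconds since the Unix epoch."""
    return (ticks - UNIX_EPOCH_TICKS) * NS_PER_TICK


def epoch_ns_to_ticks(ns: int) -> int:
    """Convert nanoseconds since the Unix epoch to .NET ticks.

    Floor division discards anything finer than one tick (100 ns).
    """
    return ns // NS_PER_TICK + UNIX_EPOCH_TICKS
```

Because a tick is coarser than a nanosecond, a value going from datetime64[ns] to DateTime and back can lose up to 99 ns, while a value starting as a DateTime survives the round trip unchanged.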
Reading
If I create a simple parquet file from Python
and I read this back in using ChoParquetReader with ParquetOptions = { TreatBigIntegersAsDates = true } in the ChoParquetRecordConfiguration into a DataTable using reader.AsDataTable(), the column is just a big integer. Am I misunderstanding this? I would expect the option setting to have caused the integer to be converted back to a DateTime.
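To show what such a big integer presumably encodes, here is a minimal Python sketch that decodes it by hand. It assumes the value is an offset from the Unix epoch in UTC; the unit is also an assumption, since a writer may store nanoseconds, microseconds, or milliseconds depending on its settings. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_int_to_datetime(value: int, unit: str = "ns") -> datetime:
    """Interpret a raw int64 as an offset from the Unix epoch (hypothetical helper)."""
    if unit == "ns":
        # Python's datetime only resolves microseconds, so sub-microsecond digits are dropped.
        micros = value // 1000
    elif unit == "us":
        micros = value
    elif unit == "ms":
        micros = value * 1000
    else:
        raise ValueError(f"unsupported unit: {unit!r}")
    return EPOCH + timedelta(microseconds=micros)


# A value as it might appear in the DataTable column, read as nanoseconds.
print(epoch_int_to_datetime(1_650_000_000_000_000_000))  # 2022-04-15 05:20:00+00:00
```

If the decoded dates look absurd (for example, decades away from the expected range), the unit assumption is the first thing to check.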
Writing
When writing data, both DateTime and DateTimeOffset appear to be written as DateTimeOffset. This can be shown by:
This is even more of a problem because the type coming back does not match the type written out. When using Pandas, datetime values written out match the type read back in; there is no conversion from a timezone-naive to a timezone-aware format.
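The round-trip property described here can be stated as a small check. The sketch below uses only Python's standard datetime type to distinguish timezone-naive from timezone-aware values. The helper names are hypothetical, and the same idea would apply to comparing DateTime against DateTimeOffset on the .NET side.

```python
from datetime import datetime, timezone


def is_tz_aware(value: datetime) -> bool:
    """True if the datetime carries a usable UTC offset."""
    return value.tzinfo is not None and value.utcoffset() is not None


def round_trip_preserves_awareness(written: datetime, read_back: datetime) -> bool:
    """True if a value came back with the same naive/aware kind it was written with."""
    return is_tz_aware(written) == is_tz_aware(read_back)


naive = datetime(2022, 4, 15, 5, 20)          # no timezone, like a plain DateTime
aware = naive.replace(tzinfo=timezone.utc)    # offset attached, like a DateTimeOffset

print(round_trip_preserves_awareness(naive, naive))  # True
print(round_trip_preserves_awareness(naive, aware))  # False
```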