Use of HDF5 1.12 #85

Open
tomas-kucera opened this issue Jan 13, 2022 · 3 comments

Comments

@tomas-kucera

tomas-kucera commented Jan 13, 2022

What are you trying to do?

I am trying to read an HDF5 database (version 1.12.1).

The database was populated using Python's h5py library. The data is a pandas DataFrame, but I guess that should be no problem, as h5ls and the HDFView app read the data without any issues.

What did you do?

I used the table-reading example from this repo. I also tried using a DataSet instead. This is the code excerpt:

package main

import (
  "fmt"
  "gonum.org/v1/hdf5"
)

type ohlcv struct {
  Index      int64   `hdf5:"index"`
  Exchange   string  `hdf5:"exchange"`
  Pair       string  `hdf5:"pair"`
  Timestamp  int64   `hdf5:"timestamp"`
  PriceOpen  float64 `hdf5:"price_open"`
  PriceHigh  float64 `hdf5:"price_high"`
  PriceLow   float64 `hdf5:"price_low"`
  PriceClose float64 `hdf5:"price_close"`
  Volume     float64 `hdf5:"volume"`
}

func main() {
  version, _ := hdf5.LibVersion()
  fmt.Printf("HDF5 version: %s\n", version)

  file, _ := hdf5.OpenFile("tickers.h5", hdf5.F_ACC_RDONLY)
  month, _ := file.OpenGroup("M11")
  day, _ := month.OpenGroup("D07")
  table, _ := day.OpenTable("table")

  recs, _ := table.NumPackets()

  for i := 0; i != recs; i++ {
    p := make([]ohlcv, 1)
    if err := table.Next(&p); err != nil {
      panic(fmt.Errorf("next failed: %s", err))
    }
    fmt.Printf("data[%d]: O:%.2f H:%.2f L:%.2f C:%.2f V:%.2f \n", i, p[0].PriceOpen, p[0].PriceHigh, p[0].PriceLow, p[0].PriceClose, p[0].Volume)
  }

  file.Close()
}

What did you expect to happen?

I expected something like this:

HDF5 version: 1.12.1
data[0]: O:62829.33 H:62858.35 L:62829.32 C:62853.66 V:10.72221
data[1]: O:62853.66 H:62920.04 L:62851.32 C:62896.75 V:10.19546
...
data[1439]: O:63276.08 H:63286.35 L:63250.01 C:63273.59 V:43.11052

What actually happened?

What I get is:

HDF5 version: 1.12.1
data[0]: O:24533265083020748587221761909950877822199906846513430683666835688641707196344354649178734577047675756970784403964996179506865859538714624.00 H:153999479823021862704498665709509248968354775291789269717488570675195022731875416084608859555430794393831940365058635304153349319996889497485119259215244127082639950809210292371944342687481593856.00 L:0.00 C:0.00 V:0.00 
data[1]: O:-0.00 H:11485591669347015527702671166617436553216.00 L:0.00 C:0.00 V:0.00 
data[2]: O:16786184717166469080015018654342952761822206471285346540228583460481189475663073161248768.00 H:116860917747596761471525066204868691258239771993742452785440191
...
data[1433]: O:-9500707167603260.00 H:59636916704940832875429063464307788500085761805873313238334329889516158976.00 L:0.00 C:0.00 V:0.00 
data[1434]: O:5019141222517546172965875509332335194542251462150973230580526953620246797924906675853593091481301449281850864438229980645461301058991845935258992265931878573396108562393519846894098059723698237228341624032758439241216420110000323300523402260850195612038020143058501041685903495909081088.00 H:-14749955137625195020933306096366472509413755436838140703864665704181336175168284990050108620591141868165148740370361311351055003224895967047494566278255701287614003245498688726479296978595350966497450734598780051966490510138076888153184354240036864.00 L:0.00 C:0.00 V:0.00 
...
data[1438]: O:-0.00 H:40804893379413961024208896.00 L:0.00 C:0.00 V:0.00 
data[1439]: O:11485478191699172345758915201790495424512.00 H:-0.00 L:0.00 C:0.00 V:0.00 

Also, when trying to access the Exchange or Pair fields, I get the following error:

fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0xb01dfacedebac1e pc=0x405fbc9]

What version of Go, Gonum, Gonum/netlib and libhdf5 are you using?

go version go1.17 darwin/amd64
gonum.org/v1/hdf5
h5cc -showconfig (excerpt)
General Information:
-------------------
HDF5 Version: 1.12.1
Configured on: Mon Jul 12 08:05:03 BST 2021
Configured by: brew@BigSur
Host system: x86_64-apple-darwin20.4.0
Uname information: Darwin BigSur 20.4.0 Darwin Kernel Version 20.4.0: Thu Apr 22 21:46:47 PDT 2021; root:xnu-7195.101.2~1/RELEASE_X86_64 x86_64
Byte sex: little-endian
Installation point: /usr/local/Cellar/hdf5/1.12.1

Does this issue reproduce with the current master?

Yes, it does!

@sbinet
Member

sbinet commented Jan 13, 2022

I surmise this:

p := make([]ohlcv, 1)

needs to read instead:

var p ohlcv

and thus:

fmt.Printf("data[%d]: O:%.2f H:%.2f L:%.2f C:%.2f V:%.2f \n", i, p.PriceOpen, p.PriceHigh, p.PriceLow, p.PriceClose, p.Volume)

@tomas-kucera
Author

tomas-kucera commented Jan 13, 2022

Thanks for the quick response.

I took that code from the example, but I also tried what you are suggesting. Unfortunately, that leads to this error:
panic: unsupported kind (struct), need slice or array

which makes sense, as the function definition is:

func (*hdf5.Table).Next(data interface{}) error
(hdf5.Table).Next on pkg.go.dev

Next reads packets from a packet table starting at the current index into the value pointed at by data. i.e. data is a pointer to an array or a slice.
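
The panic message is the kind of check a function taking `interface{}` typically performs with `reflect`. The sketch below is not the library's code; `checkTarget` is a hypothetical helper that shows why a pointer to a single struct would be rejected while a pointer to a slice or array is accepted.

```go
package main

import (
	"fmt"
	"reflect"
)

// checkTarget accepts a pointer to a slice or array and rejects anything else.
// It imitates the documented contract of Next; it is not taken from the package.
func checkTarget(data interface{}) error {
	v := reflect.ValueOf(data)
	if v.Kind() != reflect.Ptr {
		return fmt.Errorf("unsupported kind (%s), need pointer", v.Kind())
	}
	switch k := v.Elem().Kind(); k {
	case reflect.Slice, reflect.Array:
		return nil
	default:
		return fmt.Errorf("unsupported kind (%s), need slice or array", k)
	}
}

func main() {
	type rec struct{ X int64 }
	var one rec
	many := make([]rec, 1)
	fmt.Println(checkTarget(&one))  // pointer to a struct: rejected
	fmt.Println(checkTarget(&many)) // pointer to a slice: accepted
}
```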

Can it be that you are using some newer version?

EDIT: Just checked the implementations of the Next function in h5pt_table.go and they are identical.
EDIT 2: BTW, I also tried ReadPackets instead of Next with the same results.

@tomas-kucera
Author

Did some more research!

If I replace the printing line with

fmt.Printf("data[%d]: %v\n", i, p)

Then there are two possible results, depending on the definition of the struct:

  1. The full definition that includes the strings fails with this error coming from fmt.Printf():
    panic: runtime error: growslice: cap out of range

  2. If the strings are commented out, then the result is this:

data[1437]: [{1625183880000000000 7090182514096892258 1.814982667395619e-306 -3.79181233146521e-284 -4.643804396672689e-134 -9.500616071346912e+15 3.5854690526542615e+184}]
data[1438]: [{1625183940000000000 7090182514096892258 1.814982667395619e-306 5.9896317349078915e+183 3.434212986107372e+237 3.58550892285317e+184 -9.919075148868785e-38}]
data[1439]: [{1625184000000000000 7090182514096892258 1.814982667395619e-306 -3.3900115496356115e+111 -2.0293221659741413e+112 -3.177424435398634e-182 1.1485478191699172e+40}]

where the first column (Index) is perfectly correct but the rest is just messed up.
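
One way to probe the garbage values is to look at the raw bytes behind them. The sketch below takes the second column of the output above (7090182514096892258, the same in every row) and reinterprets it as eight little-endian bytes, little-endian being what the h5cc excerpt reports for this machine. `int64Bytes` is a hypothetical helper name.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// int64Bytes returns the eight little-endian bytes of v as a string,
// so that text hiding inside an integer column becomes visible.
func int64Bytes(v int64) string {
	var buf [8]byte
	binary.LittleEndian.PutUint64(buf[:], uint64(v))
	return string(buf[:])
}

func main() {
	// Value copied from the second column of the output above.
	fmt.Printf("%q\n", int64Bytes(7090182514096892258)) // prints "binanceb"
}
```

The bytes spell `binanceb`. That is consistent with a 7-byte fixed-width exchange string followed by the first byte of the next field being read into the Timestamp slot, although nothing in this thread confirms the file's actual layout.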

This leads me to think that the reading ignores the `hdf5:"column_name"` tags and reads the values in sequence, which messes up the data completely.

This hypothesis is somewhat undermined by the fact that even if I keep the full struct definition (including the strings), Next succeeds; as long as I do not attempt to print the string values (Exchange / Pair), the values are displayed, but they are wrong. That is the original output.
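
The sequential-read hypothesis can be illustrated without HDF5 at all. The sketch below assumes a made-up packed layout (8-byte index, two 7-byte fixed-width strings, 8-byte timestamp, 8-byte price) and made-up sample values; none of this is confirmed for the real file. It decodes the same bytes twice: once field by field at explicit offsets, and once as consecutive 8-byte values, which is roughly what a struct without the string fields would see.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// Hypothetical packed layout, assumed for illustration only.
const (
	offIndex     = 0  // int64
	offExchange  = 8  // 7 bytes, fixed width
	offPair      = 15 // 7 bytes, fixed width
	offTimestamp = 22 // int64
	offOpen      = 30 // float64
	recordSize   = 38
)

type record struct {
	Index     int64
	Exchange  string
	Pair      string
	Timestamp int64
	PriceOpen float64
}

// encode packs r into the assumed layout with no padding between fields.
func encode(r record) []byte {
	buf := make([]byte, recordSize)
	binary.LittleEndian.PutUint64(buf[offIndex:], uint64(r.Index))
	copy(buf[offExchange:offPair], r.Exchange)
	copy(buf[offPair:offTimestamp], r.Pair)
	binary.LittleEndian.PutUint64(buf[offTimestamp:], uint64(r.Timestamp))
	binary.LittleEndian.PutUint64(buf[offOpen:], math.Float64bits(r.PriceOpen))
	return buf
}

// decodeByOffset reads every field from its own offset.
func decodeByOffset(buf []byte) record {
	return record{
		Index:     int64(binary.LittleEndian.Uint64(buf[offIndex:])),
		Exchange:  string(buf[offExchange:offPair]),
		Pair:      string(buf[offPair:offTimestamp]),
		Timestamp: int64(binary.LittleEndian.Uint64(buf[offTimestamp:])),
		PriceOpen: math.Float64frombits(binary.LittleEndian.Uint64(buf[offOpen:])),
	}
}

// decodeSequential treats the record as consecutive 8-byte values,
// ignoring the string fields that sit between index and timestamp.
func decodeSequential(buf []byte) (index, timestamp int64, open float64) {
	index = int64(binary.LittleEndian.Uint64(buf[0:]))
	timestamp = int64(binary.LittleEndian.Uint64(buf[8:]))
	open = math.Float64frombits(binary.LittleEndian.Uint64(buf[16:]))
	return index, timestamp, open
}

func main() {
	// Sample values are invented; "btcusdt" in particular is a guess.
	r := record{Index: 1, Exchange: "binance", Pair: "btcusdt", Timestamp: 1625183880, PriceOpen: 62829.33}
	buf := encode(r)
	fmt.Printf("by offset:  %+v\n", decodeByOffset(buf))
	idx, ts, open := decodeSequential(buf)
	fmt.Printf("sequential: index=%d timestamp=%d open=%g\n", idx, ts, open)
}
```

In this sketch only the first field survives the sequential read, because it is the only one whose offset is the same in both interpretations. That matches the symptom of a correct Index followed by nonsense.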

I am totally lost.

But I have a simple question: how does string handling work for structs read from HDF5?

I have noticed that the master/cmd/test-go-table-01-readback/main.go file contains this struct definition:

type particle struct {
	// name        string  `hdf5:"Name"`      // FIXME(sbinet)
	Lati        int32   `hdf5:"Latitude"`
	Longi       int64   `hdf5:"Longitude"`
	Pressure    float32 `hdf5:"Pressure"`
	Temperature float64 `hdf5:"Temperature"`
	// isthep      []int                     // FIXME(sbinet)
	// jmohep [2][2]int64                    // FIXME(sbinet)
}

That suggests that strings may be a known issue.
