v0.6.0

  • PARTIALLY BREAKING: Support for Nim 1.6 is a bit wonky. It is still supported, but when running with the refc GC there seem to be cases where destructors are called too early, leading to subtle bugs. And on ORC one test case (tWithDset) does not compile. So please upgrade.
  • refactor attributes read / write logic
  • add support for more types in attributes (compound types etc.)
  • add support to read all types in withAttr, including compound and seq[dkObject] types. Note that in those two cases the data is made available in attr as a JsonNode (see the sketch after this list)!
  • make copy_attributes work for all types
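
    A hedged illustration of the withAttr note above (file, group and attribute names as well as the exact call shape are assumptions for this sketch, not confirmed API):

    import nimhdf5, json
    var h5f = H5open("data.h5", "r")
    let grp = h5f["/config".grp_str]
    grp.attrs.withAttr("myCompoundAttr"):
      # for compound and seq[dkObject] attributes, `attr` is a JsonNode
      echo attr.pretty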

v0.5.12

  • better handle empty inputs in add / write_hyperslab: do nothing in that case

v0.5.11

  • fix copy implementation for certain use cases and fix a regression in it

v0.5.10

  • fix JSON dataset reading not taking the shape of a dataset into account when the dataset contains only simple data (a single datatype)

v0.5.9

  • fix naming of Matlab field structs, due to a misunderstanding on my part (I thought the _ref suffix was part of the HDF5 designation in my test file)
  • allow references to yield the input dataset if it is not a reference dataset

v0.5.8

  • BREAKING: fix issue with dataset and group iterators, where we treated the root group specially and were off by 1 in the counting of the depth. Now, no matter which group we start at, the meaning of depth is explicitly such that depth == 1 corresponds to any direct child of the given group.
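
    A hedged sketch of the new semantics (the iterator call shape and the group name are assumptions for illustration):

    for grp in items(h5f["/runs".grp_str], depth = 1):
      # depth == 1: only the direct children of "/runs", no matter which
      # group we start from
      echo grp.name
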
  • export all fromFlat helper procs
  • allow read from a raw pointer (intended for manual use with Buffer). Note: you may need to call H5Dvlen_reclaim if your read data contains VLEN data!
  • add basic support for HDF5 references
  • add basic support for reading HDF5 datasets and groups into JsonNode, for easier handling of runtime based decisions. See nimhdf5/hdf5_json.nim
  • add experimental Matlab file loader (v7.3 Matlab files), see nimhdf5/hdf5_matlab. See examples/read_matlab.nim for an example.

v0.5.7

  • improves pretty printing of HDF5 objects (PR #65 by @AngelEzquerra)

v0.5.6

  • fix segfault due to stack corruption on Windows

    Wow, this was hard to understand!

    The final explanation is: in getNumAttrs we declare a variable h5info on the stack, of type H5O_info_t. Due to time_t being interpreted as 4 bytes, the full H5O_info_t type had a size of 144 bytes for our Nim object. But the HDF5 library expects the object to be 160 bytes. We hand a pointer to the stack object to the HDF5 library, which writes to it. As our object is too small, the HDF5 library caused a stack corruption, leading to very bizarre bugs.

    My favorite:

    let isItNil = h5attrs.isNil
    doAssert not h5attrs.isNil, "But was it nil?" & $isItNil

    which failed in the assertion, but printed isItNil = false…

    The confusing part was that the API changed between 1.10 (which is what I run locally on Linux) and 1.12 (I run 1.14 on Windows), causing me to think the problem was in the difference between the H5O_info_t types. But that is not the actual problem, because the type we actually map to is H5O_info1_t, which is backwards compatible, because we call the H5Oget_info2 (and friends) procedure, which takes that type as an argument. The new H5O_info2_t type is only used in the H5Oget_info3 function.
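
    For illustration, the failing pattern boiled down to roughly this (a reconstruction from the description above, not the literal code; loc_id stands for any valid object identifier):

    var h5info: H5O_info_t                     # only 144 bytes on the Nim side
    discard H5Oget_info2(loc_id, addr h5info, H5O_INFO_ALL)
    # the C library writes 160 bytes through the pointer, so 16 bytes of the
    # surrounding stack frame are clobbered -> "impossible" bugs like the
    # assertion above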

v0.5.5

  • fix Windows support
    • use a custom / proc to handle \ separators that are intermittently inserted by os./ and parentDir
    • fix normalizePath to always use / as separator
    • fix the int type to always map to an 8 byte HDF5 type if the machine's int is also 8 bytes, by using H5T_NATIVE_LLONG in those cases
    • replace hardcoded paths in some tests by using getTempDir

v0.5.4

  • fixes a potential source of segfaults in copyflat, where we could call allocShared0 with a 0 argument
  • support writing array types (at least as part of a compound datatype)
  • improve warning message when importing blosc filters without the Nim blosc library installed

v0.5.3

  • Drops support for Nim 1.4
  • add basic serialization submodule to auto serialize most objects to a H5 file. Scalar types are written as attributes and non-scalar types as datasets. Can be extended for complicated custom types by using the toH5 hook. See the tSerialize.nim test and the serialize.nim file. Note: currently no deserialization is supported; you need to parse the data back from the file yourself if needed. An equivalent inverse can be added, but has no priority at the moment.
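
    A hedged sketch of what the serialization enables (the import path and call shape are assumptions; see serialize.nim and the tSerialize.nim test for the real API):

    import nimhdf5, nimhdf5 / serialize
    type RunInfo = object
      runNumber: int      # scalar -> stored as an attribute
      temps: seq[float]   # non-scalar -> stored as a dataset
    var h5f = H5open("run.h5", "rw")
    h5f.toH5(RunInfo(runNumber: 42, temps: @[300.1, 300.4]))
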
  • allow usage of tilde ~ in paths to H5 files
  • replace distinct `hid_t` types by traced ‘fat’ objects

    The basic idea here is the following: the `hid_t` identifiers all refer to objects that live in the H5 library (and possibly in a file). In our previous approach we kept track of the different types by using `distinct hid_t` types. That is great, because we cannot mix and match the wrong kind of identifier in a given context. However, there are real resources underlying each identifier, and most identifiers require the user to call a `close` / `free` type of routine. While we can associate a `=destroy` hook with a `distinct hid_t` (with `hid_t` just being an integer type), the issue is when that destructor is called. In the old approach the identifier is a pure value type: if an identifier is copied and the copy goes out of scope early, we release the resource despite still needing it! Therefore, we now have a 'fat' object that knows its internal id (just a real `hid_t`) and which closing function to call. Our actual IDs are then `ref objects` of these fat objects. That way we release resources sanely and at the correct moment, i.e. when the last reference to an identifier goes out of scope. This is the correct thing to do in 99% of the cases.
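
    In code, the pattern looks roughly like this (type and proc names are illustrative stand-ins, not the library's actual definitions):

    type
      hid_t = int64                   # stand-in for the wrapper's raw id type
      DatasetIdObj = object           # 'fat' object: id + how to release it
        id: hid_t
      DatasetId = ref DatasetIdObj    # copies share one underlying resource

    proc closeImpl(id: hid_t) = discard  # stand-in for e.g. H5Dclose(id)

    proc `=destroy`(x: var DatasetIdObj) =
      # runs exactly once, when the last DatasetId reference goes out of
      # scope, so a copied identifier can no longer release the resource
      # too early
      if x.id > 0:
        closeImpl(x.id)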

  • add FileID field to the parent file for datasets, similar to what is already present for groups. Convenient in practice.
  • refactor read and write related procs. The meat of the code is now handled in one procedure each (which also takes care of reclaiming VLEN memory for example).
  • greatly improve automatic writing and reading of complex datatypes, including Nim objects that contain string fields or other VLEN data. This is done by copying the data to a suitable datatype that matches the H5 definition of the equivalent data in Nim. The type_utils and copyflat submodules are added to that end. There is some trickiness involved here, which makes the implementation more complex than one might expect: namely the necessity to get the correct alignment between naive `offsetOf` expectations and the reality of how structs are packed.
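
    A small illustration of the alignment point (Event is a made-up type):

    type Event = object
      id: uint8        # 1 byte, but padded so the next field is aligned
      energy: float64
    # a naively packed layout would put energy at offset 1; Nim (like C)
    # aligns it to 8 bytes, and copyflat has to account for exactly that
    echo offsetOf(Event, energy)   # prints 8, not 1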

v0.5.2

  • remove support for reading into a cstring, as this is not well defined. A local cstring that needs to be created cannot be returned (without dealing manually with allocations)
  • add overloads of add, write_hyperslab and read working with ptr T for direct access via a manually managed memory region (useful when working with things like Tensors)
  • reorder dataset.nim code a little bit
  • support openArray in more places

v0.5.1

  • (finally!) add support for string datasets
    • fixed length string datasets, written by constructing a create_dataset("foo", <size>, array[N, char]) dataset (writing is done by simply giving a seq[string])
    • variable length string datasets, written by constructing a create_dataset("foo", <size>, string) dataset (writing is done by simply giving a seq[string])
    • support for strings as variable length arrays of type char, constructed via create_dataset("foo", <size>, special_type(char)) (writing is done by simply giving a seq[string])
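
    All three variants in one hedged sketch (file and dataset names are made up):

    import nimhdf5
    var h5f = H5open("strings.h5", "rw")
    let data = @["foo", "bar"]
    var fixedDset = h5f.create_dataset("fixed", 2, array[3, char])
    var vlenStr = h5f.create_dataset("vlenStr", 2, string)
    var vlenChar = h5f.create_dataset("vlenChar", 2, special_type(char))
    fixedDset.write(data)
    vlenStr.write(data)
    vlenChar.write(data)
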
  • add missing overload for write for the most general case, which was previously only possible via []=, so:
    let dset = ...
    dset.write(data)
        

    is now valid.

  • implement slicing read and write procedures for 1D datasets:
    let data = @[1, 2, 3]
    var dset = h5f.create_dataset("foo", 3, int)
    dset.write(data)
    doAssert dset[0 .. 1] == data[0 .. 1]
    doAssert dset.read(0 .. 1) == data[0 .. 1]
    dset.write(1 .. 2, @[4, 5])
    doAssert dset[1 ..< 3] == @[4, 5]
    dset[0 .. 1] = @[10, 11]
    doAssert dset[int] == @[10, 11, 5]
        

    is now also valid. These are implemented using hyperslab reading / writing.

  • fix bug in write_norm about coordinate selection, such that writing specific indices now actually works correctly
  • fix bug in write when writing specific coordinates of a 1D dataset

v0.5.0

  • fix behavior of delete to make sure we also keep our internal TableRef in line with the file
  • BREAKING: fully support writing datasets as (N, ) instead of turning them into (N, 1) (especially for VLEN data). This has big implications when reading 1D data using hyperslabs: adding an extra dimension, as in
    let data = dset.read_hyperslab(dtype, start = @[1000, 0], count = @[1000, 1])

    instead of

    let data = dset.read_hyperslab(dtype, start = @[1000], count = @[1000])

    makes reading orders of magnitude slower! Essentially, when handing an integer to create_dataset it is now kept as such (and turned into a 1 element tuple). For non-VLEN data, creating and writing such datasets worked correctly before, if I'm not mistaken.

  • add more exception types for dealing with filters & in particular blosc:
    • HDF5FilterError
    • HDF5DecompressionError
    • HDF5BloscFilterError
    • HDF5BloscDecompressionError

v0.4.7

  • add overwrite option to write_dataset convenience proc

v0.4.6

  • avoid copy of input data when writing VLEN data
  • emit a compile-time error if composite data with string fields is being read, as this is currently not supported (strings are VLEN data & VLEN inside composite types isn't implemented)
  • fix regression in copy due to distinct hid_t variants
  • extend withDset to work properly with VLEN data (returning a dset variable of type seq[seq[T]]) and add a withDset overload working with a H5 file and the string name of a dataset (a sketch follows below)
  • add test case for withDset
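
    A hedged sketch of the new withDset overload (the exact call shape is an assumption based on the entry above):

    h5f.withDset("/path/to/vlenData"):
      # `dset` is injected; for VLEN data it has type seq[seq[T]]
      echo dset.len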

v0.4.5

  • treat akTruncate flag as write access to the file (create_dataset was not working with it)
  • fix blosc filter, regressed due to recent distinct introductions

v0.4.4

  • further fixes to the =destroy hooks introduced in v0.4.2. Under some circumstances the defined hooks caused segmentation faults when deallocating objects (these hooks are finicky!)
  • fix opening files with akTruncate (i.e. overwrite a file instead of appending)
  • SEMI-BREAKING: raise an exception if opening a file fails. That we did not raise so far was more of an oversight than a feature. This is not really breaking in a strict sense, because in the past we simply failed in the getNumAttrs call that happened when trying to open the attributes of the root group in the file.

v0.4.3

  • fixed the =destroy hooks introduced in v0.4.2
  • added support for SWMR (see README)
  • introduce better checks on whether an object is open by using H5I interface
  • turn file access constants into an enum to better handle multiple constants at the same time as a set
  • lots of cleanup of old code, replace includes by imports, …

v0.4.2

  • adds a getOrCreateGroup helper to always get a group, either returning the existing one or creating it. Before version v0.4.0 this was the default behavior for [] as well as create_group. As of now, [] raises a KeyError if the group does not exist (this is a breaking change that is retroactively added to the changelog of v0.4.0). However, create_group does not throw if the group already exists. This may change in the future though.
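
    In short (a sketch; "/calib" is a made-up group name):

    let g1 = h5f.getOrCreateGroup("/calib")   # returns existing group or creates it
    let g2 = h5f["/calib".grp_str]            # raises KeyError if it does not exist
    let g3 = h5f.create_group("/calib")       # currently does not throw if it exists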

v0.4.1

  • adds missing import of os.`/` in datasets.nim, which got removed in the refactor
  • fixes a regression in open for datasets in the case of a non-existing dataset

v0.4.0

  • NOTE: At the time of release of v0.4.0 the following breaking change was not listed as such:
    • [] for groups does not create a group anymore, if it does not exist. Use getOrCreateGroup added in v0.4.2 for that! This was an unintended side effect that was overlooked, as the implementation was based on create_group.
  • major change: introduce multiple different distinct types for the different usages of hid_t in the HDF5 library. This gives us more readability, type safety etc. We can write proper type aware close procedures etc.
  • also adds =destroy hooks for all relevant types, so manual closing is not required anymore (unless one wishes to close early)
  • breaking: iterators taking a depth argument now treat it differently. A depth of 0 now means only the same level where previously it meant all levels. The previous behavior is available via depth = -1. The default behavior has not changed though.
  • breaking: renames the shape_raw and dset_raw arguments of create_dataset to simply shape and dset. The purpose of the _raw suffix is completely unimportant for a user of the library.
  • improve output of pretty printing of datasets, groups and files
  • add tests for iterators and contains procedure

v0.3.16

  • refactor out pretty printing, iterators, some attribute related code into their own files
  • move constructors into datatypes.nim, as they don’t depend on other things and are often useful in other modules (better separation, less recursive imports)
  • move a lot of features into h5util that may be used commonly between modules
  • fixes an issue with the iterator for groups, which could cause it to not find any datasets in a group despite them existing

v0.3.15

  • fix segmentation fault in visit_file for C++ backend

v0.3.14

  • fix H5Attributes return values for [] template returning AnyKind
  • change [], []= templates for H5Attributes into procs
  • fix the high level example to at least make it compile

v0.3.13

  • visit_file does not open all groups and datasets anymore; it only records which groups / datasets actually exist
  • adds close for dataset / groups. Both are now aware if they are open or not
  • add a string conversion for H5Attr
  • fix accessing a dataset from a group. Now uses the path of the group as the base
  • fix error message in read_hyperslab_vlen
  • turn some templates into procs
  • make blosc an optional import

v0.3.12

  • H5File as a proc is deprecated and replaced by H5open!
  • reading of string attributes now takes care to check if they are variable length or fixed length strings
  • import of blosc plugin is not automatic anymore, but needs to be done manually by compiling with -d:blosc
  • remove a lot of old comments and imports from days past…

v0.3.11

  • change usage of csize to csize_t in the full wrapper / library. For most use cases this did not have any effect (csize was an int instead of unsigned). But for H5T_VARIABLE = csize.high this caused problems, because the value was not the expected one (csize_t.high)
  • add support for compound datatypes. Creating a dataset / writing and reading data currently works for any object `T` whose fields can be stored in HDF5 files. Objects and tuples are treated the same!
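
    For example (a sketch; Particle is a made-up type):

    type Particle = object
      id: int
      energy: float
    var dset = h5f.create_dataset("particles", 2, Particle)
    dset.write(@[Particle(id: 1, energy: 13.7),
                 Particle(id: 2, energy: 42.0)])
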
  • add support for seq[string] attributes
  • reorder datasets.nim and clean up [] logic
  • add [] accessor from a H5Group
  • add isVlen helper to check if dataset is variable length
  • make special_type usage optional when reading datasets
  • fix branching in nimToH5type to be fully compile time
  • add H5File to replace H5FileObj (latter is kept as deprecated typedef)
  • variable length data is created automatically if the user gives a seq[T] type in create_dataset
  • read can automatically read variable length data if a seq[T] datatype is given
  • add tests for compound data and seq[string] attributes

v0.3.10

  • change dtypeAnyKind definition when creating dataset
  • improve iteration over subgroups / datasets

v0.3.9

  • fix mapping of H5 types to Nim types, see PR #36.

v0.3.8

  • remove dependency of typetraits and typeinfo modules by introducing custom DtypeKind enum