PyTables: hierarchical datasets in Python

URL: http://www.pytables.org/

PyTables is a package for managing hierarchical datasets, designed to efficiently cope with extremely large amounts of data.

It is built on top of the HDF5 library and the NumPy package. It features an object-oriented interface that, combined with C extensions for the performance-critical parts of the code (generated using Cython), makes it a fast, yet extremely easy to use tool for interactively saving and retrieving very large amounts of data. One important feature of PyTables is that it optimizes memory and disk resources so that they take much less space (between 3 to 5 times and more if the data is compressible) than other solutions, like for example, relational or object-oriented databases.

State-of-the-art compression

PyTables supports the Blosc compressor out of the box. This allows for extremely high compression speed, while keeping decent compression ratios. By doing so, I/O can be accelerated by a large extent, and you may end up achieving higher performance than the bandwidth provided by your I/O subsystem. See the Tuning The Chunksize section of the Optimization Tips chapter of the user documentation for some benchmarks.

Not a RDBMS replacement

PyTables is not designed to work as a relational database replacement, but rather as a teammate. If you want to work with large datasets of multidimensional data (for example, for multidimensional analysis), or just provide a categorized structure for some portions of your cluttered RDBS, then give PyTables a try. It works well for storing data from data acquisition systems, simulation software, network data monitoring systems (for example, traffic measurements of IP packets on routers), or as a centralized repository for system logs, to name only a few possible use cases.

Tables

A table is defined as a collection of records whose values are stored in fixed-length fields. All records have the same structure, and all values in each field have the same data type. The terms "fixed-length" and strict "data types" seem to be a strange requirement for an interpreted language like Python, but they serve a useful function if the goal is to save very large quantities of data (such as generated by many scientific applications, for example) in an efficient manner that reduces demand on CPU time and I/O.

Arrays

There are other useful objects like arrays, enlargeable arrays, or variable-length arrays that can cope with different use cases on your project.

Easy to use

One of the principal objectives of PyTables is to be user-friendly. In addition, many different iterators have been implemented to make interactive work as productive as possible.

Platforms

We use Linux on top of Intel32 and Intel64 boxes as the main development platforms, but PyTables should be easy to compile/install on other UNIX (including macOS) or Windows machines.

Compiling

To compile PyTables, you will need a recent version of the HDF5 (C flavor) library, the Zlib compression library, and the NumPy and Numexpr packages. Besides, PyTables comes with support for the Blosc, LZO, and bzip2 compressor libraries. Blosc is mandatory, but PyTables comes with Blosc sources so, although it is recommended to have Blosc installed in your system, you don't absolutely need to install it separately. LZO and bzip2 compression libraries are, however, optional.

Make sure you have HDF5 version 1.10.5 or above. On Debian-based Linux distributions, you can install it with:

$ sudo apt install libhdf5-serial-dev

Installation

Install with `pip <https://pip.pypa.io/en/stable/>`:
```
$ python3 -m pip install tables
```
To run the test suite:
```
$ python3 -m tables.tests.test_all
```
If there is some test that does not pass, please send us the complete output using the [GitHub Issue Tracker](https://github.com/PyTables/PyTables/issues/new).

Enjoy data! -- The PyTables Team

Name		Name	Last commit message	Last commit date
Latest commit History 4,977 Commits
.github		.github
LICENSES		LICENSES
bench		bench
c-blosc @ d306135		c-blosc @ d306135
ci		ci
contrib		contrib
doc		doc
examples		examples
hdf5-blosc		hdf5-blosc
hdf5-blosc2/src		hdf5-blosc2/src
src		src
tables		tables
utils		utils
.gitignore		.gitignore
.gitmodules		.gitmodules
.readthedocs.yml		.readthedocs.yml
ANNOUNCE.txt.in		ANNOUNCE.txt.in
CITATION.bib		CITATION.bib
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
FUNDING.yml		FUNDING.yml
LICENSE.txt		LICENSE.txt
MANIFEST.in		MANIFEST.in
Makefile		Makefile
README.rst		README.rst
RELEASE_NOTES.rst		RELEASE_NOTES.rst
SECURITY.md		SECURITY.md
THANKS		THANKS
pyproject.toml		pyproject.toml
requirements.in		requirements.in
requirements.txt		requirements.txt
setup.py		setup.py

License

PyTables/PyTables

Folders and files

Latest commit

History

Repository files navigation

PyTables: hierarchical datasets in Python

State-of-the-art compression

Not a RDBMS replacement

Tables

Arrays

Easy to use

Platforms

Compiling

Installation

About

Resources

License

Code of conduct

Security policy

Stars

Watchers

Forks

Sponsor this project

Languages