
Multi-threaded compression? #5341

Open
khdlr opened this issue Mar 28, 2024 · 1 comment

khdlr commented Mar 28, 2024

What I need help with / What I was wondering
I need to build a large dataset of imagery with more than 3 channels (multi-spectral satellite imagery), so I'm relying on the tfds.features.Tensor feature connector. Since writing the data uncompressed is highly inefficient, I'm using tfds.features.Encoding.ZLIB for compression.
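
For reference, the feature spec looks roughly like this (the shape, dtype, and feature name are placeholders, not my exact configuration):

```python
import numpy as np
import tensorflow_datasets as tfds

# Illustrative only: a multi-spectral image stored as a generic Tensor
# feature with ZLIB compression; shape, dtype, and key are placeholders.
features = tfds.features.FeaturesDict({
    "image": tfds.features.Tensor(
        shape=(256, 256, 12),  # e.g. 12 spectral bands
        dtype=np.uint16,
        encoding=tfds.features.Encoding.ZLIB,
    ),
})
```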

However, this compression step has become the bottleneck in my dataset-building process because it is single-threaded, causing the build to take longer than a month.

What I've tried so far
I've read through the docs and also checked the tf.io namespace for possible workarounds.

It would be nice if...

  • Is there any way of speeding up the encoding/compression of the examples by using multiple cores?
  • Are there plans to support a faster compression method than ZLIB for generic Tensor features?
khdlr added the help label on Mar 28, 2024

noahzhy commented Apr 2, 2024

Same problem here when preparing TFRecord files before training.
