
tf.summary.histogram allocates a lot of memory when called with large data #6714

Open

hrsht opened this issue Jan 3, 2024 · 5 comments

hrsht commented Jan 3, 2024

Environment information (required)

Irrelevant

Issue description

I am calling tf.summary.histogram with large data (in my case, a 1-D tensor of ~16M int32 elements). I noticed that the call allocates around 3.6GB of memory, producing the following log line from TensorFlow:

2024-01-02 19:27:52.663452: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 3903564240 exceeds 10% of free system memory.
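
For reference, a minimal snippet along these lines reproduces the warning (the random data and log directory are assumptions; any sufficiently large 1-D tensor should do):

    import numpy as np
    import tensorflow as tf

    # ~16M int32 elements, matching the case described above.
    data = tf.constant(np.random.randint(0, 100, size=16_000_000), dtype=tf.int32)

    writer = tf.summary.create_file_writer("/tmp/hist_logs")
    with writer.as_default(step=0):
        # Triggers the large allocation inside the bucketing logic.
        tf.summary.histogram("large_data", data)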

Digging deeper, I noticed that this call to tf.one_hot in the method that computes the buckets is the culprit. The resulting tensor has bucket_count times the original number of elements, and since it is tf.float64, that explains the 3.6GB allocation in my case with the default bucket count of 30 (16M × 30 × 8 bytes ≈ 3.6GB).

It looks like this tensor is only used to compute the bucket_counts tensor, i.e. the number of elements falling in each bucket. Can this be done in a more memory-efficient way? Something like:

bucket_counts = tf.map_fn(
    lambda i: tf.reduce_sum(tf.cast(clamped_indices == i, tf.int32)),
    tf.range(bucket_count))
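
For illustration, here is a self-contained sketch (the toy clamped_indices values are made up) comparing the existing one-hot approach with the proposed one; both yield the same per-bucket counts:

    import tensorflow as tf

    bucket_count = 30
    clamped_indices = tf.constant([0, 0, 1, 29, 29])  # toy example

    # Existing approach: materializes an [N, bucket_count] float64 tensor.
    one_hot = tf.one_hot(clamped_indices, depth=bucket_count, dtype=tf.float64)
    counts_one_hot = tf.reduce_sum(one_hot, axis=0)

    # Proposed approach: one reduction per bucket, never more than O(N) live.
    counts_map_fn = tf.map_fn(
        lambda i: tf.reduce_sum(tf.cast(clamped_indices == i, tf.int32)),
        tf.range(bucket_count))

    tf.debugging.assert_equal(tf.cast(counts_one_hot, tf.int32), counts_map_fn)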
@hrsht hrsht changed the title tf.summary.histogram allocates a lot of memory with called with large data tf.summary.histogram allocates a lot of memory when called with large data Jan 3, 2024
arcra (Member) commented Jan 4, 2024

I suppose this seems reasonable. I'll ask the team if they're aware of a particular reason why it was done this way.

I don't know if/when we'll prioritize this, but contributions are welcome. I expect that as long as tests pass, the change should be fine.

One additional thing to validate is that the proposed solution doesn't perform considerably worse than the existing one, so it would be nice to run some benchmarks, though I expect it would be fine.

hrsht (Author) commented Jan 4, 2024

Thanks for your reply!

Can you let me know if there is a particular reason to do it this way? If there is no particular reason, I can work on this issue.

yatbear (Member) commented Jan 4, 2024

Hi @hrsht,

We intentionally cast the values to float64 to avoid accumulating floating-point errors with large counts: #5337 (see tensorflow/tensorflow#36128 for context).
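
For anyone curious, the precision issue is easy to demonstrate (a minimal sketch, independent of the histogram code): float32 has a 24-bit significand, so integer counts above 2**24 can no longer be represented exactly:

    import numpy as np

    x = np.float32(2**24)
    print(x + np.float32(1.0) == x)          # True: the increment is lost
    print(np.float64(2**24) + 1.0 == 2**24)  # False: float64 is still exact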

hrsht (Author) commented Jan 5, 2024

@yatbear That's not the issue here. The bucket counts can still be computed as float64 (or, IMO, int64 for all practical purposes).

The usage of tf.one_hot is the issue, as it ends up using on the order of bucket_count times more memory than the original data. With the default bucket count of 30, an input of 7e7 elements allocates a tensor of ~16GB (7e7 × 30 × 8 bytes), which can be troublesome, especially on a GPU (or, in my case, a machine with 16GB of RAM).

arcra (Member) commented Jan 5, 2024

Indeed, apologies to you both. I was seeking help from @yatbear about this issue, but I probably did not communicate my question clearly.

It's been a long time since anybody touched this code, so we're trying to reconstruct the reasoning. Talking to @yatbear and @nfelt, Nick suggested that these operations need to be "TPU-compatible" (i.e., able to run on the TPU rather than on the host machine, and perhaps on a GPU as well). That might be why the one-hot solution was implemented, since TPUs are very good at matrix operations, but we're not quite sure yet.
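
For intuition (just a sketch, not the actual implementation), the one-hot counting can be phrased as a single dense matrix product, which is the kind of operation TPUs excel at:

    import tensorflow as tf

    clamped_indices = tf.constant([0, 0, 1, 29])  # toy values, N = 4
    one_hot = tf.one_hot(clamped_indices, depth=30, dtype=tf.float64)  # [N, 30]
    # Summing the rows is a [1, N] x [N, 30] matmul in disguise.
    counts = tf.matmul(tf.ones([1, 4], tf.float64), one_hot)  # -> [1, 30]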

Additionally, as an FYI regarding the decision to use floats instead of integers, @nfelt mentioned that it was likely because "the output format of the histogram tensor is defined to be float64 anyway [...] and the 'wire' format is float64 because we allow the bucket boundaries to be floats and those are all serialized into a single tensor, so float64 is the common type".

So that explains why floats were used, and what @yatbear mentioned explains why they are 64-bit floats specifically; but, as you stated, there's still the question of why the one-hot tensor approach is used.

I'll let you know if we find out more about whether that operation (or a different, more memory-efficient one) can be run on a TPU.
