How to initialize embeddings layer within Estimator API? #16058

slevental · 2018-01-12T00:22:23Z

I'm trying to use existing embeddings in a TensorFlow model. The embedding matrix is larger than 2 GB, which made my first attempt fail:

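import numpy as np
import tensorflow as tf

# GLOVE_MATRIX is the pretrained embedding matrix, loaded elsewhere.
embedding_var = tf.get_variable(
        "embeddings",
        shape=GLOVE_MATRIX.shape,
        initializer=tf.constant_initializer(np.array(GLOVE_MATRIX))
)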

This gave me the following error:

Cannot create a tensor proto whose content is larger than 2GB.

I'm using AWS SageMaker, which is based on the Estimator API, so the graph is actually run in a session behind the scenes, and I'm not sure how to feed a placeholder for the embeddings in that setup. It would be helpful if someone could share how to do this kind of initialization with the Estimator API.


cy89 · 2018-01-12T17:46:23Z

I think this would normally be a "send to StackOverflow" (standard response appended below) kind of issue, but the 2GB limit seems like it's within range of a bug or a feature request.

@martinwicke @ispirmustafa any suggestions?

This question is better asked on StackOverflow since it is not a bug or feature request. There is also a larger community that reads questions there. Thanks!

ispirmustafa · 2018-01-12T21:24:07Z

I think it's related to the graph size limit. Using constant_initializer embeds GLOVE_MATRIX into the graph, which increases the graph size.
Could you please try a non-constant initializer?
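
A rough size check illustrates why embedding the matrix as a constant runs into the limit. This is only a sketch: the issue does not say which GloVe file was used, so the row and column counts below are assumptions, and the "2GB" in the error message is assumed here to mean 2**31 bytes.

```python
def embedding_bytes(rows, cols, bytes_per_value=4):
    """Size in bytes of a dense rows x cols matrix (float32 by default)."""
    return rows * cols * bytes_per_value


# Assumption: the "2GB" in the error message means 2**31 bytes.
PROTO_LIMIT_BYTES = 2**31

# Assumed dimensions (about 2.2 million vectors of size 300); not from the issue.
size = embedding_bytes(rows=2_200_000, cols=300)
exceeds_limit = size > PROTO_LIMIT_BYTES

print(size, exceeds_limit)
```

With these assumed dimensions, the float32 values alone exceed the limit, so any approach that stores them inside the graph definition would be expected to fail the same way.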

slevental · 2018-01-17T21:04:37Z

It looks like the right way to initialize variables with embeddings is to use tf.train.Scaffold. More information on this is available on Stack Overflow.
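
One way the tf.train.Scaffold approach might look is sketched below. It is not taken from the linked Stack Overflow answer, and it assumes the TF 1.x-style API reached through tensorflow.compat.v1. The helper name build_embeddings and the toy matrix are hypothetical. The idea is to initialize the variable from a placeholder, so that only the shape is stored in the graph, and to let the Scaffold feed the real values through init_feed_dict when the init op runs.

```python
import numpy as np
import tensorflow.compat.v1 as tf  # TF 1.x-style graph API


def build_embeddings(pretrained):
    """Return (embedding variable, Scaffold) for a pretrained matrix."""
    # The placeholder stores only a shape in the graph, not the values,
    # so the serialized graph stays small.
    init_ph = tf.placeholder(tf.float32, shape=pretrained.shape,
                             name="embeddings_init")
    # Shape and dtype are inferred from the placeholder.
    embedding_var = tf.get_variable("embeddings", initializer=init_ph)
    # The Scaffold feeds the real values when the init op runs.
    scaffold = tf.train.Scaffold(init_feed_dict={init_ph: pretrained})
    return embedding_var, scaffold


# Hypothetical stand-in for GLOVE_MATRIX; a real matrix would be far larger.
toy_matrix = np.arange(12, dtype=np.float32).reshape(4, 3)

with tf.Graph().as_default():
    embedding_var, scaffold = build_embeddings(toy_matrix)
    creator = tf.train.ChiefSessionCreator(scaffold=scaffold)
    with tf.train.MonitoredSession(session_creator=creator) as sess:
        loaded = sess.run(embedding_var)
```

Inside an Estimator model_fn, the same scaffold would presumably be returned through the scaffold argument of tf.estimator.EstimatorSpec, so that the Estimator's own session performs the feed. Whether this works unchanged on SageMaker is not confirmed in this thread.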

cy89 added the stat:awaiting tensorflower and type:bug labels on Jan 12, 2018

martinwicke assigned ispirmustafa on Jan 12, 2018

ispirmustafa added the stat:awaiting response label and removed the stat:awaiting tensorflower label on Jan 16, 2018

slevental closed this as completed on Jan 17, 2018
