TFX.components.transform id #6278

Open
raminmohammadi opened this issue Sep 13, 2023 · 4 comments

Comments

@raminmohammadi

If the bug is related to a specific library below, please raise an issue in the
respective repo directly:

TensorFlow Data Validation Repo

TensorFlow Model Analysis Repo

TensorFlow Transform Repo

TensorFlow Serving Repo

System information

  • Have I specified the code to reproduce the issue (Yes, No): yes
  • Environment in which the code is executed (e.g., Local(Linux/MacOS/Windows),
    Interactive Notebook, Google Cloud, etc): Linux, Notebook, Colab
  • TensorFlow version: 2.13.0
  • TFX Version: 1.14.0
  • Python version: 3.8
  • Python dependencies (from pip freeze output):
    requirements.txt

Describe the current behavior:

This problem only happens when I use the transform as part of TFX. I'm encountering an issue with the "transform" function, which processes individual input data items. Each input consists of two keys: 'entities' and 'text'.

My specific task is to transform the "text" feature of the input tensor by breaking it down into individual characters. For example, given the input "This is a test", I intend to follow these steps:

1. Split the text into words, then split each word into a character array: [['T', 'h', 'i', 's'], ['i', 's'], ['a'], ['t', 'e', 's', 't']]

Code 1: tf.strings.unicode_split(tf.strings.split('This is a test'), input_encoding='UTF-8')

2. Look up each character's index in a dictionary and pad each word to a width of 12 characters.

Code 2: tf.map_fn(get_index, text, fn_output_signature=tf.TensorSpec(shape=(1, Wlength), dtype=tf.int64, name=None))
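
For reference, the two steps above can be sketched in plain Python, without TensorFlow, to pin down what the transformation is expected to produce. This is only an illustration under stated assumptions: the issue does not show the actual dictionary, so `VOCAB` below is hypothetical, with an offset of 13 inferred from the expected output ('a' maps to 13, 'T' maps to 58). The names `encode_text`, `WORD_WIDTH` and `PAD_INDEX` are also made up for this sketch, and mapping unknown characters to 0 is an assumption.

```python
import string

WORD_WIDTH = 12  # pad width stated in the issue
PAD_INDEX = 0    # padding value seen in the expected output

# Hypothetical vocabulary: the real dictionary is not shown in the issue.
# The offset of 13 is inferred from the expected output ('a' -> 13, 'T' -> 58).
ALPHABET = string.ascii_lowercase + string.ascii_uppercase
VOCAB = {ch: i + 13 for i, ch in enumerate(ALPHABET)}


def encode_text(text, width=WORD_WIDTH):
    """Return one padded row of character indices per whitespace-separated word."""
    rows = []
    for word in text.split():
        # Step 1: split the word into characters; step 2: look up each index.
        # Unknown characters fall back to PAD_INDEX (an assumption).
        indices = [VOCAB.get(ch, PAD_INDEX) for ch in word][:width]
        # Pad (or truncate, above) every word to the same width.
        indices += [PAD_INDEX] * (width - len(indices))
        # Wrap in an extra list so the result has shape (words, 1, width).
        rows.append([indices])
    return rows


encoded = encode_text("This is a test")
for row in encoded:
    print(row)
```

A pure-Python reference like this can be compared against the output of the Transform component to see at which step the two diverge.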

Currently, Transform returns only a single vector that starts with 1 and is 0 everywhere else:
example = [[1, 0, 0, 0, 0, 0, 0, 0, 0]]

Describe the expected behavior

expected output should be:

<tf.Tensor: shape=(4, 1, 12), dtype=int64, numpy=
array([[[58, 20, 21, 31,  0,  0,  0,  0,  0,  0,  0,  0]],

       [[21, 31,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0]],

       [[13,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0]],

       [[32, 17, 31, 32,  0,  0,  0,  0,  0,  0,  0,  0]]])>

Standalone code to reproduce the issue

Providing a bare minimum test case or step(s) to reproduce the problem will
greatly help us to debug the issue. If possible, please share a link to
Colab/Jupyter/any notebook.

https://colab.research.google.com/drive/1ap8Gycu7s--mz0VAxp4W2DphAd1HW1yi?usp=sharing

Name of your Organization (Optional)

Other info / logs

Include any logs or source code that would be helpful to diagnose the problem.
If including tracebacks, please include the full traceback. Large logs and files
should be attached.

@singhniraj08
Contributor

@raminmohammadi,

I am unable to run the shared notebook. My environment crashes when using tf.data.experimental.TFRecordWriter to write the TFRecord file. Looking at the Transform component, it should produce similar results inside or outside a TFX pipeline.

Can you please make sure the example notebook works so that we can replicate the issue on our end? Thank you!

@raminmohammadi
Author

I'm not sure how to get this running! I am able to run the Jupyter notebook on a local machine, but on Colab it fails at the moment. I would appreciate any feedback on this, or it would help if you could run it locally.

@singhniraj08
Contributor

@raminmohammadi, I tried but was unable to create a local setup to test your notebook because of some permission issues.

@zoyahav, can you please give some feedback on why the transform output in a TFX pipeline differs from the expected output when running the transformation outside a TFX pipeline? Thanks.

@raminmohammadi
Author

Any updates on this issue? Thanks.


3 participants