keras.layers.TextVectorization does not convert Cyrillic characters to lowercase when standardize='lower_and_strip_punctuation' is used.
The deprecated keras.preprocessing.text.Tokenizer does lowercase them.
The lowercasing is simply done via the TensorFlow operation tf.strings.lower, and since it needs to be a TF op, we are not at liberty to change it. You could open the same issue on the TensorFlow repo instead. A workaround you could use is to express the lowercasing as regex replacements with tf.strings.regex_replace, inside your own standardize function passed to TextVectorization.
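A minimal sketch of that regex workaround, under a few assumptions: the function name `lower_cyrillic_standardize` and the `UPPER`/`LOWER` constants are hypothetical, only the 33 letters of the Russian alphabet are covered, and `[[:punct:]]` (ASCII punctuation) only approximates the layer's built-in punctuation stripping.

```python
import tensorflow as tf

# Assumption: only the Russian alphabet is needed; extend UPPER for other
# Cyrillic alphabets. The lowercase forms are derived with Python's str.lower().
UPPER = "АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ"
LOWER = UPPER.lower()


def lower_cyrillic_standardize(text):
    """Hypothetical standardize callable: lowercase ASCII and Russian letters,
    then strip ASCII punctuation."""
    # Default tf.strings.lower call, as used for the ASCII letters.
    text = tf.strings.lower(text)
    # One regex replacement per uppercase Cyrillic letter.
    for upper_char, lower_char in zip(UPPER, LOWER):
        text = tf.strings.regex_replace(text, upper_char, lower_char)
    # Remove ASCII punctuation characters.
    return tf.strings.regex_replace(text, r"[[:punct:]]", "")


result = lower_cyrillic_standardize(tf.constant(["Привет, МИР!"]))
print(result.numpy()[0].decode("utf-8"))
```

The callable can then be passed as the standardize argument of TextVectorization in place of 'lower_and_strip_punctuation'.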
It's not because of tf.strings.lower()!
tf.strings.lower() works properly with encoding='utf-8'.
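To illustrate the point, here is a small comparison of tf.strings.lower with its default encoding argument and with encoding='utf-8', followed by a sketch of a custom standardize callable built on the latter. The function name `utf8_lower_and_strip` is hypothetical, and the `[[:punct:]]` pattern is an assumption that only approximates the layer's built-in punctuation stripping.

```python
import tensorflow as tf

text = tf.constant(["Привет МИР Hello"])

# Default encoding: only ASCII letters are lowercased.
default_lower = tf.strings.lower(text)
# encoding="utf-8": non-ASCII letters such as Cyrillic are lowercased too.
utf8_lower = tf.strings.lower(text, encoding="utf-8")

print(default_lower.numpy()[0].decode("utf-8"))
print(utf8_lower.numpy()[0].decode("utf-8"))


def utf8_lower_and_strip(inputs):
    """Hypothetical standardize callable: Unicode-aware lowercasing,
    then removal of ASCII punctuation."""
    lowered = tf.strings.lower(inputs, encoding="utf-8")
    return tf.strings.regex_replace(lowered, r"[[:punct:]]", "")


# Pass the callable instead of 'lower_and_strip_punctuation'.
vectorizer = tf.keras.layers.TextVectorization(standardize=utf8_lower_and_strip)
```

Calling `vectorizer.adapt(...)` on a Cyrillic corpus should then build the vocabulary from the lowercased tokens.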