This repository has been archived by the owner on Dec 29, 2022. It is now read-only.

Added Tensor parameters to prevent redundant nodes. Also enabled support for beam sizes larger than the vocabulary size #323

Open
wants to merge 1 commit into
base: master

Conversation

ralphabb

My changes to the original code are as follows:

  1. This implementation accepts the state parameters (lengths, finished, and log_probs) and time_ as tensors. time_ is a float64 scalar (which can be a placeholder); float64 was used to allow high-precision computation. All other state values (lengths, log_probs, finished) can likewise be replaced by placeholders (i.e. define a state with placeholder parameters and use it with this code), as I have done in my own use of this implementation.
    The reason for this change is that, while using this code, I discovered that repeated calls to the beam_search_step function added redundant nodes to the computational graph: each additional step introduced more nodes, so the total work grew at least quadratically in the number of steps. I therefore "tensorized" the parameters so that the nodes need only be defined once. To use this code, you
    a) Define your state using placeholders of dynamic shape for every state parameter (an example can be provided on request, as I have used this in my own project).
    b) Define time as a float64 scalar tensor.
    c) Compute the next_state values using sess.run() and save them, along with time + 1. Then feed these as the new feed_dict to the same function.
    NOTE: The function beam_search_step and its output are set up outside any loop over time/steps. Within the loop computing the beam programs, only feed_dicts are constructed for the next step, using the outputs of sess.run (an example is available on request).
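The driver-loop pattern in steps a)–c) can be mocked in plain NumPy (the state fields and the name beam_search_step come from this PR; the arithmetic inside the step is a dummy stand-in, not the real beam-search update, and the real code would run the step via sess.run with a feed_dict instead of a direct call):

```python
import numpy as np

def beam_search_step(time_, lengths, finished, log_probs):
    # Dummy stand-in update: unfinished beams grow by one token and
    # accumulate a fixed log-probability; beams finish at length 3.
    next_lengths = lengths + np.where(finished, 0, 1)
    next_log_probs = log_probs + np.where(finished, 0.0, -0.5)
    next_finished = finished | (next_lengths >= 3)
    return next_lengths, next_finished, next_log_probs

beam = 4
lengths = np.zeros(beam, dtype=np.int64)
finished = np.zeros(beam, dtype=bool)
log_probs = np.zeros(beam, dtype=np.float64)
time_ = np.float64(0.0)  # float64 scalar, as in the PR

# The step is defined ONCE, outside the loop; the loop only feeds the
# previous outputs back in (mirroring feed_dict reuse, so no new graph
# nodes are created per step).
for _ in range(5):
    lengths, finished, log_probs = beam_search_step(
        time_, lengths, finished, log_probs)
    time_ += 1.0
```

The key point is that each iteration reuses the same step definition and only swaps in new input values, which is exactly what keeps the TF graph from growing.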

  2. The previous implementation did not support a beam size larger than the vocabulary size and would crash in that case. Using conditionals, namely tf.cond (possible because I "tensorized" the parameters of this code in change 1), I added functionality that detects, using logical operations, when the beam size is larger than the number of candidates, and sets the sizes of the relevant arrays accordingly.
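The guard in change 2 can be sketched in NumPy (the function name safe_top_k and the example scores are hypothetical; the PR's actual code expresses the branch with tf.cond over tensor shapes):

```python
import numpy as np

def safe_top_k(scores, beam_size):
    # When the beam size exceeds the number of candidates, clamp k so
    # top-k selection cannot request more items than exist. This min()
    # plays the role of the tf.cond branch described in the PR.
    num_candidates = scores.shape[-1]
    k = min(beam_size, num_candidates)
    idx = np.argsort(scores)[::-1][:k]  # indices of the k best scores
    return scores[idx], idx

# Vocabulary of 3 candidates, but a requested beam size of 8.
scores = np.log(np.array([0.6, 0.3, 0.1]))
values, indices = safe_top_k(scores, beam_size=8)
```

Without the clamp, asking for the top 8 of 3 candidates is exactly the crash the previous implementation hit.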

Feel free to contact me with any questions or for further explanation.
Best,
Ralph Abboud
