Questions understanding bdnn_transform function. #23

Open
cqjjjzr opened this issue Apr 13, 2019 · 1 comment

cqjjjzr commented Apr 13, 2019

Hi.
I'm trying to rewrite this project in C++ for better interoperability, user-friendliness, and performance.

I have now successfully implemented MRCG extraction and got a huge quality boost as well as low memory usage. However, I have some trouble understanding the scripts that do the prediction. This script involves a lot of array allocation, and I want to know the purpose of every single line in order to write a better implementation.

So, could you please explain the bdnn_transform function?

def bdnn_transform(inputs, w, u):

    # """
    # :param inputs. shape = (batch_size, feature_size)
    # :param w : decide neighbors
    # :param u : decide neighbors
    # :return: trans_inputs. shape = (batch_size, feature_size*len(neighbors))
    # """

    neighbors_1 = np.arange(-w, -u, u)
    neighbors_2 = np.array([-1, 0, 1])
    neighbors_3 = np.arange(1+u, w+1, u)

    neighbors = np.concatenate((neighbors_1, neighbors_2, neighbors_3), axis=0)

    pad_size = 2*w + inputs.shape[0]
    pad_inputs = np.zeros((pad_size, inputs.shape[1]))
    pad_inputs[0:inputs.shape[0], :] = inputs

    trans_inputs = [
        np.roll(pad_inputs, -1*neighbors[i], axis=0)[0:inputs.shape[0], :]
                    for i in range(neighbors.shape[0])]

    trans_inputs = np.asarray(trans_inputs)
    trans_inputs = np.transpose(trans_inputs, [1, 0, 2])
    trans_inputs = np.reshape(trans_inputs, (trans_inputs.shape[0], -1))

    return trans_inputs

Thanks in advance.

jtkim-kaist (Owner) commented Apr 14, 2019

Excellent! Thank you for your interest and contributions!

Because it has been a long time since I implemented it, I can't remember it exactly in detail. However, its purpose is to implement equation (7) in [1]. It will also be helpful to refer to Fig. 2 in [1].

If I have some spare time, I can analyze the code in detail; however, these days I'm too busy. Thank you!

[1] X. Zhang and D. Wang, "Boosting Contextual Information for Deep Neural Network Based Voice Activity Detection," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 2, pp. 252-264, Feb. 2016.
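
For anyone porting this, the behavior of bdnn_transform can be read directly off the posted code. Below is a minimal sketch, not the author's implementation: the names `neighbor_offsets` and `bdnn_transform_loop` are hypothetical, and the values w=3, u=2 are chosen only to keep the example small. It is a reading of the code as posted, not a statement about equation (7) in [1]. Each output row t is the concatenation of the input rows at t + k for every offset k in the neighbor list, with zeros wherever t + k falls outside the input. The original gets those zeros by padding 2*w zero rows at the end, so that the wrap-around of np.roll only ever brings zeros into the first rows.

```python
import numpy as np


def neighbor_offsets(w, u):
    """Frame offsets gathered around each frame (same construction as the post)."""
    return np.concatenate((
        np.arange(-w, -u, u),        # far past context, spaced u frames apart
        np.array([-1, 0, 1]),        # previous frame, current frame, next frame
        np.arange(1 + u, w + 1, u),  # far future context, spaced u frames apart
    ))


def bdnn_transform_loop(inputs, w, u):
    """Hypothetical loop-based equivalent of bdnn_transform.

    Row t of the result is inputs[t + k] for each offset k, laid side by
    side; frames outside [0, n_frames) contribute zeros. Only the output
    array is allocated, which may map more directly to C++.
    """
    offsets = neighbor_offsets(w, u)
    n_frames, n_feat = inputs.shape
    out = np.zeros((n_frames, len(offsets) * n_feat))
    for t in range(n_frames):
        for j, k in enumerate(offsets):
            src = t + k
            if 0 <= src < n_frames:
                out[t, j * n_feat:(j + 1) * n_feat] = inputs[src]
    return out


# Tiny example: 5 frames with a single feature holding the values 1..5.
x = np.arange(1, 6, dtype=float).reshape(5, 1)
offsets = neighbor_offsets(3, 2).tolist()   # [-3, -1, 0, 1, 3]
y = bdnn_transform_loop(x, 3, 2)
first_row = y[0].tolist()                   # [0.0, 0.0, 1.0, 2.0, 4.0]
```

In the first row, the offsets -3 and -1 point before the start of the input and yield zeros, while offsets 0, 1, and 3 pick frames 1, 2, and 4. The output shape is (5, 5), that is (batch_size, feature_size*len(neighbors)) as stated in the original comment.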
