
[DRAFT] Support add training #12417

Draft · Aeren1564 wants to merge 1 commit into master from draft_add
Conversation

Aeren1564 (Contributor)

Draft PR for supporting Add training

@Aeren1564 mentioned this pull request on Jan 8, 2024
@Aeren1564 force-pushed the draft_add branch 6 times, most recently from 12ccdba to 14612d8 on January 11, 2024 13:22
@Aeren1564 force-pushed the draft_add branch 7 times, most recently from 978e12a to 3dc789c on January 17, 2024 05:17
@Aeren1564 force-pushed the draft_add branch 2 times, most recently from a8a00ea to 1a16092 on January 24, 2024 01:12
Comment on lines 50 to 81
case ArithmeticType::kSub:
case ArithmeticType::kMul:
case ArithmeticType::kDiv:
Contributor

Are other operations always required to broadcast?

Contributor Author

I was planning to work with broadcast data (which, to my understanding, is simple copying) for the other OPs.
Is there a better alternative?

Contributor

Well, if there were a way to calculate both the broadcast and the gradient at the same time, that would be great. But it's not related to this PR, so let's think about it later. :)
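
For context, the standard backward pass for a broadcast is a reduce-sum of the incoming gradient over the broadcast axes; a minimal NumPy sketch of that relationship (illustration only, not this PR's kernels):

    import numpy as np

    def unbroadcast(grad, shape):
        # Broadcasting copies values in the forward pass, so the backward
        # pass sums the gradient over every axis that was added or expanded.
        while grad.ndim > len(shape):
            grad = grad.sum(axis=0)  # axes prepended by broadcasting
        for axis, dim in enumerate(shape):
            if dim == 1 and grad.shape[axis] != 1:
                grad = grad.sum(axis=axis, keepdims=True)  # size-1 axes that expanded
        return grad

    # Add: out = lhs + rhs, so each operand's gradient is the incoming
    # gradient reduced back to that operand's shape.
    lhs = np.random.rand(20, 10).astype(np.float32)
    rhs = np.random.rand(10).astype(np.float32)   # broadcast over the batch axis
    grad_out = np.ones((20, 10), dtype=np.float32)
    lhs_grad = unbroadcast(grad_out, lhs.shape)   # shape (20, 10)
    rhs_grad = unbroadcast(grad_out, rhs.shape)   # shape (10,), summed over batch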

@Aeren1564 force-pushed the draft_add branch 4 times, most recently from 8ecfd6b to ca069c0 on January 26, 2024 03:40
@Aeren1564 (Contributor Author) commented Jan 26, 2024

Model with Subtract

    import tensorflow as tf

    # Two Dense branches combined by a Subtract layer.
    input_lhs = tf.keras.layers.Input(shape=(10,))
    input_rhs = tf.keras.layers.Input(shape=(10,))
    lhs = tf.keras.layers.Dense(10)(input_lhs)
    rhs = tf.keras.layers.Dense(10)(input_rhs)
    res_sub = tf.keras.layers.Subtract()([lhs, rhs])
    output = tf.keras.layers.Dense(10)(res_sub)
    model = tf.keras.models.Model(inputs=[input_lhs, input_rhs], outputs=output, name="subtract_training")

Data

    import numpy as np

    # Random inputs and a linear target built from fixed coefficient matrices.
    np.random.seed(123)
    data_lhs = np.random.rand(3000, 10).astype(np.float32) * 100
    data_rhs = np.random.rand(3000, 10).astype(np.float32) * 100
    coef_lhs, coef_rhs = np.random.rand(10, 10).astype(np.float32), np.random.rand(10, 10).astype(np.float32)
    data_res = np.array([(np.matmul(coef_lhs, x[0]) + np.matmul(coef_rhs, x[1])) for x in zip(data_lhs, data_rhs)], dtype=np.float32)
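
The compile/fit calls aren't shown here; a sketch of settings consistent with the logged parameters below (Adam, learning rate 0.001, MSE loss, batch size 20, 5 epochs, so 3000 / 20 = 150 steps per epoch):

    # Assumed from the logs below, not shown in the original comment.
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
        loss="mse",
        metrics=["mae"],
    )
    model.fit([data_lhs, data_rhs], data_res, batch_size=20, epochs=5)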

Tensorflow

Epoch 1/5
150/150 [==============================] - 0s 578us/step - loss: 647.6946 - mae: 20.2884
Epoch 2/5
150/150 [==============================] - 0s 585us/step - loss: 484.7763 - mae: 17.5722
Epoch 3/5
150/150 [==============================] - 0s 581us/step - loss: 382.2013 - mae: 15.5862
Epoch 4/5
150/150 [==============================] - 0s 592us/step - loss: 305.6406 - mae: 13.9046
Epoch 5/5
150/150 [==============================] - 0s 629us/step - loss: 241.3343 - mae: 12.2882

ONERT-Train

/home/aeren/Repos/ONE/Product/x86_64-linux.debug/out/bin/onert_train --modelfile /home/aeren/Repos/Scripts/_Product/circle+/result/model20240126_1244/model.circle --load_input:raw /home/aeren/Repos/Scripts/_Product/circle+/data/input.bin --load_expected:raw /home/aeren/Repos/Scripts/_Product/circle+/data/res.bin --epoch 5 --batch_size 20 --learning_rate 0.001 --loss 1 --loss_reduction_type 1 --optimizer 2 
Model Expected Filename /home/aeren/Repos/Scripts/_Product/circle+/data/res.bin
Model Input Filename /home/aeren/Repos/Scripts/_Product/circle+/data/input.bin
Model Filename /home/aeren/Repos/Scripts/_Product/circle+/result/model20240126_1244/model.circle
== training parameter ==
- learning_rate   = 0.001
- batch_size      = 20
- loss_info       = {loss = mean squared error, reduction = sum over batch size}
- optimizer       = adam
========================
Epoch 1/5 - time: 0.327ms/step - loss: [0] 647.6941
Epoch 2/5 - time: 0.318ms/step - loss: [0] 484.7753
Epoch 3/5 - time: 0.302ms/step - loss: [0] 382.2000
Epoch 4/5 - time: 0.317ms/step - loss: [0] 305.6393
Epoch 5/5 - time: 0.303ms/step - loss: [0] 241.3329

@Aeren1564 (Contributor Author)

Model with Multiply

    import tensorflow as tf

    # Two Dense branches combined by a Multiply layer.
    input_lhs = tf.keras.layers.Input(shape=(10,))
    input_rhs = tf.keras.layers.Input(shape=(10,))
    lhs = tf.keras.layers.Dense(10)(input_lhs)
    rhs = tf.keras.layers.Dense(10)(input_rhs)
    res_mul = tf.keras.layers.Multiply()([lhs, rhs])
    output = tf.keras.layers.Dense(10)(res_mul)
    model = tf.keras.models.Model(inputs=[input_lhs, input_rhs], outputs=output, name="multiply_training")
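
For reference, the backward pass that elementwise Multiply needs: each operand's gradient is the incoming gradient scaled by the other operand. A minimal NumPy sketch (illustration only, not the PR's kernel):

    import numpy as np

    # out = lhs * rhs  =>  d(out)/d(lhs) = rhs, d(out)/d(rhs) = lhs,
    # each multiplied elementwise by the incoming gradient.
    lhs = np.random.rand(20, 10).astype(np.float32)
    rhs = np.random.rand(20, 10).astype(np.float32)
    grad_out = np.random.rand(20, 10).astype(np.float32)

    lhs_grad = grad_out * rhs
    rhs_grad = grad_out * lhs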

Data

    import numpy as np

    # Same data recipe as the Subtract experiment above.
    np.random.seed(123)
    data_lhs = np.random.rand(3000, 10).astype(np.float32) * 100
    data_rhs = np.random.rand(3000, 10).astype(np.float32) * 100
    coef_lhs, coef_rhs = np.random.rand(10, 10).astype(np.float32), np.random.rand(10, 10).astype(np.float32)
    data_res = np.array([(np.matmul(coef_lhs, x[0]) + np.matmul(coef_rhs, x[1])) for x in zip(data_lhs, data_rhs)], dtype=np.float32)

Tensorflow

Epoch 1/5
150/150 [==============================] - 0s 688us/step - loss: 6594.2104 - mae: 64.8041
Epoch 2/5
150/150 [==============================] - 0s 676us/step - loss: 5605.8013 - mae: 59.9134
Epoch 3/5
150/150 [==============================] - 0s 573us/step - loss: 5306.5811 - mae: 58.3457
Epoch 4/5
150/150 [==============================] - 0s 548us/step - loss: 5146.8296 - mae: 57.5273
Epoch 5/5
150/150 [==============================] - 0s 547us/step - loss: 5031.8623 - mae: 56.9231

ONERT-Train

/home/aeren/Repos/ONE/Product/x86_64-linux.debug/out/bin/onert_train --modelfile /home/aeren/Repos/Scripts/_Product/circle+/result/model20240126_1329/model.circle --load_input:raw /home/aeren/Repos/Scripts/_Product/circle+/data/input.bin --load_expected:raw /home/aeren/Repos/Scripts/_Product/circle+/data/res.bin --epoch 5 --batch_size 20 --learning_rate 0.001 --loss 1 --loss_reduction_type 1 --optimizer 2 
Model Expected Filename /home/aeren/Repos/Scripts/_Product/circle+/data/res.bin
Model Input Filename /home/aeren/Repos/Scripts/_Product/circle+/data/input.bin
Model Filename /home/aeren/Repos/Scripts/_Product/circle+/result/model20240126_1329/model.circle
== training parameter ==
- learning_rate   = 0.001
- batch_size      = 20
- loss_info       = {loss = mean squared error, reduction = sum over batch size}
- optimizer       = adam
========================
Epoch 1/5 - time: 0.300ms/step - loss: [0] 6594.2061
Epoch 2/5 - time: 0.288ms/step - loss: [0] 5605.7979
Epoch 3/5 - time: 0.289ms/step - loss: [0] 5306.5791
Epoch 4/5 - time: 0.291ms/step - loss: [0] 5146.8281
Epoch 5/5 - time: 0.291ms/step - loss: [0] 5031.8633

@Aeren1564 force-pushed the draft_add branch 4 times, most recently from 3225baf to 4606d2c on January 26, 2024 04:46
Comment on lines 85 to 86
lhs_grad_map = in_map.array() / rhs_map.array();
rhs_grad_map = in_map.array() * -lhs_map.array() / rhs_map.array() / rhs_map.array();
@Aeren1564 (Contributor Author) commented Jan 26, 2024

I'm seeing weird outputs :/
Is there something wrong with the following?

$L$: LHS
$R$: RHS
$O$: Output (of elementwise division)
$X$: Output (of entire model)

$L / R = O$

$\frac{\partial X}{\partial L} = \frac{\partial O}{\partial L} \cdot \frac{\partial X}{\partial O} = \frac{1}{R} \cdot \frac{\partial X}{\partial O}$
$\frac{\partial X}{\partial R} = \frac{\partial O}{\partial R} \cdot \frac{\partial X}{\partial O} = -\frac{L}{R^2} \cdot \frac{\partial X}{\partial O}$
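
The formulas themselves can be sanity-checked against central finite differences; a minimal NumPy sketch (illustration only):

    import numpy as np

    np.random.seed(0)
    L = np.random.rand(4, 5) + 1.0   # keep values away from zero
    R = np.random.rand(4, 5) + 1.0
    G = np.random.rand(4, 5)         # incoming dX/dO

    # Analytic gradients from the formulas above.
    lhs_grad = G / R
    rhs_grad = -G * L / (R * R)

    # Treat X = sum(G * (L / R)) and differentiate numerically.
    def X(L_, R_):
        return np.sum(G * (L_ / R_))

    eps = 1e-6
    num_lhs = np.zeros_like(L)
    num_rhs = np.zeros_like(R)
    for idx in np.ndindex(L.shape):
        dL = np.zeros_like(L); dL[idx] = eps
        dR = np.zeros_like(R); dR[idx] = eps
        num_lhs[idx] = (X(L + dL, R) - X(L - dL, R)) / (2 * eps)
        num_rhs[idx] = (X(L, R + dR) - X(L, R - dR)) / (2 * eps)

    print(np.max(np.abs(num_lhs - lhs_grad)))  # expect ~1e-9 or smaller
    print(np.max(np.abs(num_rhs - rhs_grad)))  # expect ~1e-9 or smaller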

@Aeren1564 (Contributor Author)

@nnfw-bot test tizen-gbs

@Aeren1564 (Contributor Author)

@nnfw-bot test onert-cross-debug

@Aeren1564 (Contributor Author)

@nnfw-bot test onert-cross-release

@Aeren1564 (Contributor Author)

TODO: try other optimizers for division


Signed-off-by: YongHyun An <yonghyunz.an@samsung.com>
@Aeren1564 (Contributor Author)

I've tried the following optimizers for division, but all of them showed loss values differing from TensorFlow's :/

SGD, RMSProp, Adam, Adadelta
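
One way to rule the optimizer out is to compare the analytic Div gradients against tf.GradientTape on a single division, independent of any training loop; a minimal sketch:

    import numpy as np
    import tensorflow as tf

    np.random.seed(0)
    L = tf.constant(np.random.rand(4, 5).astype(np.float32) + 1.0)
    R = tf.constant(np.random.rand(4, 5).astype(np.float32) + 1.0)
    G = tf.constant(np.random.rand(4, 5).astype(np.float32))

    with tf.GradientTape() as tape:
        tape.watch([L, R])
        O = L / R
        X = tf.reduce_sum(G * O)  # stand-in for the rest of the model

    tf_lhs_grad, tf_rhs_grad = tape.gradient(X, [L, R])
    print(np.max(np.abs(tf_lhs_grad - G / R)))            # expect ~0
    print(np.max(np.abs(tf_rhs_grad + G * L / (R * R))))  # expect ~0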

@Aeren1564 (Contributor Author)

@nnfw-bot test onert-cross-debug

@Aeren1564 (Contributor Author)

@nnfw-bot test onert-cross-release
