
questions about Loss Functions and Representation #196

Closed
XueYing126 opened this issue Mar 26, 2024 · 5 comments

Comments

@XueYing126

XueYing126 commented Mar 26, 2024

Thank you for the great work!!!

I have a question regarding the choice of loss functions and data representation. Specifically, I noticed the use of a 263-D representation with L2 loss during training for the text-to-motion task with the HumanML3D dataset.

I'm curious about the role of the foot contact loss and the velocity loss, which seem to be optimized independently of the joint positions. Can you clarify how they contribute to the final motion prediction?
(As far as I understand, the final output motion uses only part of the 263-D representation: the root rotation/position and the local joint positions, which `recover_from_ric()` combines to compute the global joint positions.) Did I miss something here?
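To make the question concrete, here is a minimal sketch of how a 263-D HumanML3D frame splits into its parts. The layout (root angular velocity, root linear velocity, root height, 21 root-relative joint positions, 21 6D rotations, 22 joint velocities, 4 foot-contact labels) follows the HumanML3D convention as I understand it; the slice names and helper functions are hypothetical, not identifiers from the repository.

```python
import numpy as np

# Assumed HumanML3D feature layout for 22 joints (root + 21 others).
# Slice names are illustrative only, not names used in the repository.
NUM_JOINTS = 22
LAYOUT = [
    ("root_rot_vel", 1),                # angular velocity around the up (y) axis
    ("root_lin_vel", 2),                # root linear velocity on the xz plane
    ("root_y", 1),                      # root height
    ("ric", (NUM_JOINTS - 1) * 3),      # root-relative joint positions (63)
    ("rot", (NUM_JOINTS - 1) * 6),      # 6D local joint rotations (126)
    ("local_vel", NUM_JOINTS * 3),      # local joint velocities (66)
    ("foot_contact", 4),                # binary contact labels, 2 joints per foot
]

def split_features(x):
    """Split a (..., 263) array into named slices following LAYOUT."""
    parts, start = {}, 0
    for name, size in LAYOUT:
        parts[name] = x[..., start:start + size]
        start += size
    assert start == x.shape[-1], f"layout covers {start} dims, input has {x.shape[-1]}"
    return parts

def per_part_mse(pred, target):
    """Mean squared error per slice; a plain L2 loss averages over all 263 dims."""
    p, t = split_features(pred), split_features(target)
    return {name: float(np.mean((p[name] - t[name]) ** 2)) for name in p}

rng = np.random.default_rng(0)
pred, target = rng.normal(size=(60, 263)), rng.normal(size=(60, 263))
print(sum(size for _, size in LAYOUT))  # 263
print(per_part_mse(pred, target)["foot_contact"])
```

Under this layout, the first 4 + 63 = 67 dims are the ones needed to rebuild joint positions, while the remaining 196 dims still take part in an L2 loss computed over the full vector.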

Additionally, considering the availability and effectiveness of models like SMPL, could you explain why it wasn't utilized for this task? Do you think SMPL would also be a suitable representation for this task?

Thank you for your time and insights.

@GuyTevet
Owner

GuyTevet commented May 7, 2024

Thanks @XueYing126! Please check out #19 and let me know if you have more questions. Thanks :)

@XueYing126
Author

XueYing126 commented May 7, 2024

Thank you for checking the issue. However, I still have a question.

Are we primarily concerned with the final 22 human joints?

From what I understand, these 22 joints are derived by accumulating the root velocity and adding the local joint positions, meaning they depend only on the first 4 + 21 * 3 = 67 of the 263 dimensions.

The local joint velocities, rotations, and foot contacts do not affect these final 22 joints...
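The point above can be checked with a simplified sketch of the recovery step. This is not the repository's `recover_from_ric()`: it uses a plain yaw rotation matrix instead of quaternions, and its angle and sign conventions are assumptions. It does, however, read only dims 0:67 (root rotation velocity, root linear velocity, root height, root-relative joints), so changing anything after dim 67 leaves the recovered joints unchanged.

```python
import numpy as np

NUM_JOINTS = 22

def yaw_matrix(theta):
    """Rotation matrices (T, 3, 3) about the y (up) axis for angles of shape (T,)."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.zeros(theta.shape + (3, 3))
    R[..., 0, 0], R[..., 0, 2] = c, s
    R[..., 1, 1] = 1.0
    R[..., 2, 0], R[..., 2, 2] = -s, c
    return R

def recover_joints(feats):
    """Simplified sketch: global joints (T, 22, 3) from (T, 263) features.

    Only dims 0:67 are read. Conventions are assumed and may differ
    from the repository's recover_from_ric().
    """
    T = feats.shape[0]
    # Integrate the root angular velocity to get a yaw angle per frame.
    yaw = np.zeros(T)
    yaw[1:] = np.cumsum(feats[:-1, 0])
    R = yaw_matrix(yaw)
    # Root-frame xz velocity rotated to the world frame, then integrated.
    vel = np.zeros((T, 3))
    vel[1:, [0, 2]] = feats[:-1, 1:3]
    root = np.cumsum(np.einsum("tij,tj->ti", R, vel), axis=0)
    root[:, 1] = feats[:, 3]  # root height is stored directly
    # Root-relative joints: rotate, then translate on xz (y is kept as a height).
    local = feats[:, 4:4 + (NUM_JOINTS - 1) * 3].reshape(T, NUM_JOINTS - 1, 3)
    joints = np.einsum("tij,tkj->tki", R, local)
    joints[..., 0] += root[:, None, 0]
    joints[..., 2] += root[:, None, 2]
    return np.concatenate([root[:, None, :], joints], axis=1)

rng = np.random.default_rng(0)
feats = rng.normal(size=(30, 263))
edited = feats.copy()
edited[:, 67:] = rng.normal(size=(30, 196))  # change rotations, velocities, contacts
print(np.allclose(recover_joints(feats), recover_joints(edited)))  # True
```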

This leaves me confused about how the 263-dimensional representation is evaluated. Shouldn't the evaluation be based on the predicted 22 joints rather than on the entire 263 dimensions?
For instance, shouldn't the foot contact loss be computed by thresholding the predicted joints (the way the ground-truth contacts are computed), rather than relying solely on the binary foot-contact features at the end of the 263-dimensional prediction?
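As a sketch of the alternative suggested above, contact labels can be re-derived from predicted joints by thresholding per-frame foot displacement and compared with the predicted contact features. The foot joint indices (7, 10 left; 8, 11 right) and the 0.002 threshold reflect my recollection of HumanML3D's preprocessing and are assumptions to verify; the function names are hypothetical.

```python
import numpy as np

# Assumed foot joint indices (ankle, toe per side) and squared-velocity
# threshold; check both against the actual HumanML3D preprocessing script.
LEFT_FOOT = [7, 10]
RIGHT_FOOT = [8, 11]
THRESHOLD = 0.002

def contact_from_joints(joints, thres=THRESHOLD):
    """Binary foot-contact labels (T-1, 4) from global joints (T, 22, 3).

    A foot joint counts as in contact when its squared displacement
    between consecutive frames is below thres.
    """
    disp = np.sum((joints[1:] - joints[:-1]) ** 2, axis=-1)  # (T-1, 22)
    feet = disp[:, LEFT_FOOT + RIGHT_FOOT]
    return (feet < thres).astype(np.float32)

def contact_agreement(pred_contact, joints, thres=THRESHOLD):
    """Fraction of entries where predicted contact features (T, 4) agree
    with labels re-derived from the predicted joints."""
    derived = contact_from_joints(joints, thres)
    return float(np.mean((pred_contact[:-1] > 0.5) == (derived > 0.5)))

T = 10
still = np.zeros((T, 22, 3))
moving = np.zeros((T, 22, 3))
moving[:, :, 0] = np.arange(T)[:, None]  # every joint moves 1 unit per frame in x
print(contact_from_joints(still).min())   # 1.0: feet never move
print(contact_from_joints(moving).max())  # 0.0: feet always slide
```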

Thank you again!

@GuyTevet
Owner

Indeed, the visualization is based on the 22 joint locations, yet the evaluation is performed using all 263 entries.

@Jocker9527

Jocker9527 commented May 24, 2024

Can anyone tell me where the loss functions are implemented?

@GuyTevet
Owner

def training_losses(self, model, x_start, t, model_kwargs=None, noise=None, dataset=None):
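The reply points to the `training_losses` method, whose body is not reproduced in this thread. Purely as an illustration, and not a copy of that method, geometric losses of the kind discussed earlier (joint position, joint velocity, foot contact) could be written on global joint positions roughly as follows. The foot indices and the use of ground-truth contact labels as a mask against foot skating are assumptions of this sketch.

```python
import numpy as np

FOOT_JOINTS = [7, 10, 8, 11]  # assumed indices of left/right ankle and toe joints

def geometric_losses(pred_joints, gt_joints, gt_contact):
    """Illustrative position, velocity and foot-contact losses.

    pred_joints, gt_joints: (T, 22, 3) global joints.
    gt_contact: (T-1, 4) ground-truth contact labels in {0, 1}.
    """
    pos = np.mean(np.sum((pred_joints - gt_joints) ** 2, axis=-1))
    pred_vel = pred_joints[1:] - pred_joints[:-1]
    gt_vel = gt_joints[1:] - gt_joints[:-1]
    vel = np.mean(np.sum((pred_vel - gt_vel) ** 2, axis=-1))
    # Penalize predicted foot motion only where the ground truth says the
    # foot is planted, which discourages foot skating.
    masked = pred_vel[:, FOOT_JOINTS] * gt_contact[..., None]  # (T-1, 4, 3)
    foot = np.mean(np.sum(masked ** 2, axis=-1))
    return {"pos": float(pos), "vel": float(vel), "foot": float(foot)}

T = 8
gt = np.zeros((T, 22, 3))
pred = np.zeros((T, 22, 3))
pred[:, :, 0] = np.arange(T)[:, None]  # predicted body slides 1 unit per frame
print(geometric_losses(pred, gt, np.ones((T - 1, 4))))
```

Because these terms are computed on joint positions, they act only through the joint-related part of the prediction, which is the coupling the original question asks about.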
