Questions about Loss Functions and Representation #196
Comments
Thanks @XueYing126! Please check out #19 and let me know if you have more questions. Thanks :)
Thank you for checking the issue. However, I still have a question. Are we primarily concerned with the final 22 human joints? From what I understand, these 22 joints are obtained by accumulating the root velocity and adding the local joint positions, so they depend only on the first 4 + 21 * 3 = 67 of the 263 dimensions. The local joint velocities, rotations, and foot contacts do not affect these final 22 joints. This leaves me confused about how the 263-dimensional representation is evaluated. Shouldn't the evaluation be based on the predicted 22 joints rather than on all 263 dimensions? Thank you again!
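To make the 67-out-of-263 arithmetic concrete, here is a small Python sketch of the channel layout, based on my understanding of how the HumanML3D processing script builds the 263-D feature for 22 joints. The component names and the `slices` helper are hypothetical labels for illustration, not identifiers from the repository.

```python
# Assumed channel layout of the 263-D HumanML3D feature vector for 22 joints.
J = 22
LAYOUT = [
    ("root_rot_velocity", 1),        # angular velocity about the vertical (Y) axis
    ("root_linear_velocity", 2),     # root velocity on the ground (XZ) plane
    ("root_y", 1),                   # root height
    ("ric_positions", (J - 1) * 3),  # joint positions relative to the root in XZ, 21 x 3
    ("rot_6d", (J - 1) * 6),         # joint rotations in 6-D form, 21 x 6
    ("local_velocity", J * 3),       # per-joint velocities, 22 x 3
    ("foot_contact", 4),             # binary contact labels
]


def slices(layout):
    """Map each component name to its [start, stop) channel range."""
    out, start = {}, 0
    for name, size in layout:
        out[name] = (start, start + size)
        start += size
    return out


S = slices(LAYOUT)
print(S["ric_positions"])  # (4, 67)
print(S["foot_contact"])   # (259, 263)
```

Under this layout, the channels that feed the recovered joints end at index 67, and everything from 67 to 263 (rotations, velocities, foot contacts) is extra.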
Indeed, the visualization is based on the 22 joint locations, whereas the evaluation uses all 263 entries.
Can anyone tell me which files contain the loss functions?
Thank you for the great work!!!
I have a question regarding the choice of loss functions and data representation. Specifically, I noticed that a 263-D representation is trained with an L2 loss for the text-to-motion task on the HumanML3D dataset.
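A minimal sketch of what a plain, unweighted L2 loss over the 263-D vector implies, assuming the first 67 channels are the ones recover_from_ric() reads and the last 4 are foot contacts. The helper names `l2_loss` and `loss_split` are hypothetical, and the actual training code may weight or mask terms differently.

```python
import numpy as np

N_FEATS = 263
N_POS = 4 + 21 * 3  # 67 channels assumed to be read by recover_from_ric()


def l2_loss(pred, target):
    """Unweighted mean squared error over all 263 channels."""
    return float(np.mean((pred - target) ** 2))


def loss_split(pred, target):
    """Summed squared error on the first 67 channels vs. the remaining 196."""
    sq = (pred - target) ** 2
    return float(sq[..., :N_POS].sum()), float(sq[..., N_POS:].sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.standard_normal((8, N_FEATS))
    pred = target.copy()
    pred[:, -4:] += 1.0  # perturb only the (assumed) foot-contact channels
    print(loss_split(pred, target)[0])  # 0.0
    print(l2_loss(pred, target) > 0)    # True
```

If a single network predicts all 263 channels, errors in the rotation, velocity, and foot-contact channels still produce gradients for its shared weights, even though recovery of the joint positions ignores those channels.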
I'm curious about the role of the foot-contact and velocity terms in the loss, which seem to be optimized independently of the joint positions. Could you clarify how they contribute to the final motion prediction?
(As far as I understand, the final output motion uses only part of the 263-D representation: the root rotation/position and the local joint positions are used to compute the global joint positions in recover_from_ric(). Did I miss something here?)
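The recovery step described above can be sketched in NumPy as follows. This is my own simplified re-implementation modeled on recover_from_ric(), not the repository's code; the helper names (`qinv`, `qrot`, `recover_root`, `recover_joints`), the (w, x, y, z) quaternion order, and the yaw convention (cos a, 0, sin a, 0) are assumptions.

```python
import numpy as np


def qinv(q):
    """Conjugate (the inverse for unit quaternions) of q in (w, x, y, z) order."""
    out = np.array(q, dtype=float, copy=True)
    out[..., 1:] *= -1.0
    return out


def qrot(q, v):
    """Rotate 3-D vectors v (..., 3) by unit quaternions q (..., 4)."""
    w, u = q[..., :1], q[..., 1:]
    uv = np.cross(u, v)
    uuv = np.cross(u, uv)
    return v + 2.0 * (w * uv + uuv)


def recover_root(data):
    """Integrate root channels 0-3 of data (T, 263) into a yaw quaternion and root position."""
    T = data.shape[0]
    ang = np.zeros(T)
    ang[1:] = data[:-1, 0]            # yaw velocity, shifted by one frame
    ang = np.cumsum(ang)
    quat = np.zeros((T, 4))
    quat[:, 0] = np.cos(ang)          # assumed convention: (cos a, 0, sin a, 0)
    quat[:, 2] = np.sin(ang)
    vel = np.zeros((T, 3))
    vel[1:, [0, 2]] = data[:-1, 1:3]  # XZ velocity in the root-facing frame
    pos = np.cumsum(qrot(qinv(quat), vel), axis=0)
    pos[:, 1] = data[:, 3]            # root height is stored directly
    return quat, pos


def recover_joints(data, joints_num=22):
    """Global joint positions (T, joints_num, 3) from channels 0 .. 4 + (joints_num - 1) * 3."""
    quat, root = recover_root(data)
    local = data[:, 4:4 + (joints_num - 1) * 3].reshape(-1, joints_num - 1, 3)
    q = np.broadcast_to(qinv(quat)[:, None, :], local.shape[:-1] + (4,))
    world = qrot(q, local)             # undo the root yaw
    world[..., 0] += root[:, None, 0]  # translate on the ground plane
    world[..., 2] += root[:, None, 2]
    return np.concatenate([root[:, None, :], world], axis=1)


if __name__ == "__main__":
    data = np.zeros((5, 263))
    data[:, 3] = 1.0  # root height
    print(recover_joints(data).shape)  # (5, 22, 3)
```

In this sketch, changing channels 67 onward leaves the output unchanged, since recover_joints() never reads them.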
Additionally, given the availability and effectiveness of body models like SMPL, could you explain why SMPL wasn't used for this task? Do you think it would also be a suitable representation?
Thank you for your time and insights.