Evaluation on My Dataset: How to Get the 251-dimensional Motion Vectors? #19

Open
JeremyCJM opened this issue Nov 2, 2022 · 4 comments

@JeremyCJM

Hi Mingyuan,

Do you know how to get the 251-dimensional motion vectors provided in the KIT-ML dataset?

I am computing the FID on my dataset, but our data has only two channels (x, y) instead of 251. Therefore, I am wondering how to map the low-dimensional motion sequences to 251-dimensional motion vectors.

Thanks,
Jeremy

@mingyuan-zhang (Owner)

Hi, you can find the definition of each dimension here.

However, I think it's hard to evaluate 2D data directly with the evaluator models pre-trained on KIT-ML, since the joint positions differ greatly between 2D and 3D data. You may need to re-train the evaluators.
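For reference, here is a rough sketch of how those 251 dimensions are typically laid out for the 21-joint KIT-ML skeleton under the HumanML3D-style representation; the slice boundaries below are an assumption based on that convention and should be checked against the definition linked above.

```python
# Hedged sketch of the 251-dim KIT-ML motion representation layout
# (HumanML3D-style convention, 21 joints); verify against the linked definition.
JOINTS = 21  # KIT-ML skeleton

layout = {
    "root_rot_velocity":    slice(0, 1),      # 1   root angular velocity about the Y axis
    "root_linear_velocity": slice(1, 3),      # 2   root velocity on the XZ (ground) plane
    "root_height":          slice(3, 4),      # 1   root joint height
    "ric_data":             slice(4, 64),     # 60  root-relative joint positions, (21 - 1) * 3
    "rot_data":             slice(64, 184),   # 120 local joint rotations in 6D, (21 - 1) * 6
    "local_velocity":       slice(184, 247),  # 63  per-joint velocities, 21 * 3
    "foot_contact":         slice(247, 251),  # 4   binary foot-contact labels
}
# Totals: 1 + 2 + 1 + 60 + 120 + 63 + 4 = 251
```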

@JeremyCJM (Author)

Thanks for the reply! If I have 3D joint data, how do I map it to 251 dimensions? Do you have code that does this?

Also, if I want to retrain the evaluation network, which dataset and what task should I choose?

@mingyuan-zhang (Owner)

We follow the same data preparation as HumanML3D. You can find the data processing in raw_pose_processing.ipynb and motion_representation.ipynb.
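As a rough illustration of the kind of features those notebooks compute, below is a minimal sketch of two of the simpler components (root velocity/height and root-relative joint velocities) from raw 3D joint positions. It is not the full pipeline; the rotation and foot-contact terms still require the notebooks above.

```python
# Minimal, hedged sketch of a few simple motion features from global 3D joints.
# Only illustrative; the full representation is built in motion_representation.ipynb.
import numpy as np

def simple_motion_features(joints):
    """joints: (T, J, 3) array of global joint positions, root joint at index 0."""
    root = joints[:, 0]                                     # (T, 3) root trajectory
    root_lin_vel = root[1:, [0, 2]] - root[:-1, [0, 2]]     # (T-1, 2) XZ-plane root velocity
    root_height = root[:-1, 1:2]                            # (T-1, 1) root joint height
    local = joints - root[:, None, :]                       # root-relative joint positions
    local_vel = (local[1:] - local[:-1]).reshape(len(joints) - 1, -1)  # (T-1, J*3)
    return np.concatenate([root_lin_vel, root_height, local_vel], axis=-1)

# Example: a 21-joint KIT-ML-style sequence of 60 frames
feats = simple_motion_features(np.random.randn(60, 21, 3))
```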

To retrain the evaluation network, the most appropriate way is to train on the same motion dataset as your generative model. You can split the whole motion dataset into a training split and a validation split, then train a contrastive model (containing a motion encoder and a text encoder) for evaluation. Specifically, given several pairs ($\mathrm{text}_i$, $\mathrm{motion}_i$), you can set up an InfoNCE loss to increase the similarity between the extracted features of $\mathrm{text}_i$ and $\mathrm{motion}_i$, and decrease the similarity between the extracted features of $\mathrm{text}_i$ and $\mathrm{motion}_j$ ($i \neq j$).
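A minimal PyTorch sketch of that symmetric InfoNCE objective is below; `text_feat` and `motion_feat` stand for the outputs of whatever text and motion encoders you define (the encoder architectures themselves are not shown and are up to you).

```python
# Minimal sketch of a symmetric InfoNCE loss over paired (text_i, motion_i)
# features; the encoders producing these features are assumed, not shown.
import torch
import torch.nn.functional as F

def info_nce_loss(text_feat, motion_feat, temperature=0.07):
    """text_feat, motion_feat: (B, D) feature batches; row i of each is a matched pair."""
    text_feat = F.normalize(text_feat, dim=-1)
    motion_feat = F.normalize(motion_feat, dim=-1)
    logits = text_feat @ motion_feat.t() / temperature       # (B, B) similarity matrix
    targets = torch.arange(text_feat.size(0), device=text_feat.device)
    loss_t2m = F.cross_entropy(logits, targets)               # text -> matched motion
    loss_m2t = F.cross_entropy(logits.t(), targets)           # motion -> matched text
    return 0.5 * (loss_t2m + loss_m2t)

# Example with random features for a batch of 8 pairs
loss = info_nce_loss(torch.randn(8, 512), torch.randn(8, 512))
```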

@JeremyCJM (Author)

Thanks! It sounds like CLIP on text and motion.
