mrbid/FaceTo3D

This repository holds the most up-to-date versions of the scripts.

This is based on work in my Hugging Face repository "HeadsNet".

The Hugging Face repository has the full project, including the dataset (I also go into a little more detail there); this GitHub repository just holds the bare code.

It all started with a rough idea I had after spending a lot of time looking into Neural Radiance Fields (NeRF) for generative 3D. That idea can be viewed here: "PT-NePC".

The dataset this was trained on is synthetic: I collected 2D face images generated by StyleGAN2 via ThisPersonDoesNotExist.com, then fed those images into TripoSR to turn them into 3D heads. The dataset is on Hugging Face here: "FaceTo3D".

The first attempt was my PT-NePC approach in "headsnet". HeadsNet was the highest quality attempt: it takes a simple two-vector input and produces a random full-color 3D point cloud of a head. The headsnet code includes the scraper, a viewer for the scraped models, the dataset generator, and the training and prediction code.

The second attempt was to simplify the problem down to producing a 32^3 grayscale voxel volume of a head from a 32x32 grayscale input image.

  • facenet1 has the dataset generation code and the first attempt at FaceToVoxel. It attempts to train one large FNN/MLP on the problem.
  • facenet2 requires the dataset generated by facenet1 and trains 32^3 individual networks, each with a single output (the grayscale value of a single voxel). This parallelises better, both across multiple machines on a network and across multiple CPU cores in a single computer system.
  • facenet3 is the successor model, a simplified version of facenet1.
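To make the sizes in the list above concrete, here is a minimal sketch of the input and output dimensions and of one way the per-voxel networks of facenet2 could be divided among workers. Everything named here (`voxel_index`, `voxel_coords`, `partition`, the x-fastest flattening order, and the count of 32 workers) is an assumption for illustration and is not taken from the scripts in this repository.

```python
# Illustrative sketch only: the names, the flattening order and the
# partitioning scheme are assumptions, not code from facenet1/facenet2.

SIDE = 32
INPUT_SIZE = SIDE * SIDE   # 32x32 grayscale image -> 1024 input values
NUM_VOXELS = SIDE ** 3     # 32^3 grayscale voxel volume -> 32768 output values


def voxel_index(x, y, z, side=SIDE):
    """Flatten (x, y, z) voxel coordinates into one index (x varies fastest)."""
    return x + side * (y + side * z)


def voxel_coords(index, side=SIDE):
    """Inverse of voxel_index: recover (x, y, z) from a flat index."""
    x = index % side
    y = (index // side) % side
    z = index // (side * side)
    return x, y, z


def partition(num_items, num_workers):
    """Split range(num_items) into num_workers contiguous, near-equal ranges."""
    base, extra = divmod(num_items, num_workers)
    ranges = []
    start = 0
    for worker in range(num_workers):
        size = base + (1 if worker < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    # facenet1-style: one network mapping 1024 inputs to 32768 outputs.
    print("single network:", INPUT_SIZE, "->", NUM_VOXELS)
    # facenet2-style: 32768 networks, each mapping 1024 inputs to 1 output,
    # shared here between 32 hypothetical workers (machines or CPU cores).
    shards = partition(NUM_VOXELS, 32)
    print("networks per worker:", len(shards[0]))
```

Because each single-output network depends only on the shared input image and its own voxel's target value, the shards can be trained without any communication between workers, which is what makes this layout easy to spread over cores or machines.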

This project deliberately focuses on MLPs while ignoring VAEs, which would be a more traditional choice for this use case.

Training was done on a single HPE ProLiant DL580 Gen9 with Intel® Xeon® E7-8880 v4, although I could have done with a few of these for facenet2, to be honest! 32 of them would reduce the training process from a week or multiple weeks to just a few hours or days. Being able to run tests faster lets one home in on a working, quality model much sooner. It's hard to say whether more processing power would have produced a better quality model; I would assume not, but who knows until it is actually attempted.
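As a rough check on that estimate, the sketch below computes the wall-clock time for 32 machines under the assumption of perfect linear scaling, which is plausible for facenet2 only because its per-voxel networks are independent. The function name `ideal_parallel_hours` and the example durations are hypothetical, and real runs would also pay for copying the dataset to each machine and collecting the results.

```python
# Back-of-the-envelope estimate, assuming ideal linear scaling.
# The function name and the example durations are illustrative only.

HOURS_PER_WEEK = 7 * 24


def ideal_parallel_hours(single_machine_weeks, machines):
    """Hours needed if the work divides evenly across identical machines."""
    return single_machine_weeks * HOURS_PER_WEEK / machines


if __name__ == "__main__":
    for weeks in (1, 2, 4):
        hours = ideal_parallel_hours(weeks, 32)
        print(f"{weeks} week(s) on one machine -> {hours:.2f} hours on 32")
```

Under these assumptions a one-week run drops to a little over five hours, and a four-week run to under a day.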


An example of the ground truth outputs that facenet is trained on can be found in facenet_ground_truth.7z.