PointFlow Implementation #16

Open
avik-pal opened this issue Jul 23, 2020 · 1 comment
@avik-pal (Member)

PointFlow is a really interesting application of both point clouds and CNFs, and would serve as a great demo from a "marketing" perspective.

The current blocker is on the DiffEqFlux.jl side: SciML/DiffEqFlux.jl#342. Reposting what @nirmal-suthar pointed out on the Julia Slack:

For the CNF layer in the PointFlow model, I am using FFJORD from DiffEqFlux. Currently, I am having some trouble matching the generated data with the original distribution after training a dummy CNF layer. I have discussed this with the author of the layer and will get back to this issue in some time. Additionally, the layer lacks a batched format, which is also a serious problem: training a single point cloud of 1000 points for 10 epochs takes ~10 hours. I tried fixing this for the forward pass, but Zygote gave a strange error.
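
For context (not from the thread): one way to check whether a trained dummy CNF matches the original distribution, without relying on a sampling API, is to compare the model's log-density against the analytic log-density on held-out points. A minimal sketch, assuming the same FFJORD call signature used in the snippet further down; the network shape and the isotropic-Gaussian comparison are illustrative assumptions:

using DiffEqFlux, Distributions, Flux, OrdinaryDiffEq, Statistics

# Hypothetical sanity check: compare the CNF's log-density with the
# analytic log-density of the data distribution.
nn = Chain(Dense(3, 32, tanh), Dense(32, 3))
ffjord = FFJORD(nn, (0.0f0, 1.0f0), Tsit5())

x = randn(Float32, 3, 1000)                # dummy data from a standard Gaussian
e = randn(Float32, size(x))                # Hutchinson trace-estimator noise
logp_model, = ffjord(x, ffjord.p, e)       # model log-density (first return value)
logp_true = logpdf(MvNormal(3, 1.0f0), x)  # analytic log-density, per column

# after successful training these should be close on average
mean(abs.(vec(logp_model) .- logp_true))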

@avik-pal (Member, Author) commented Sep 14, 2020

With SciML/DiffEqFlux.jl#415, the training time should improve significantly. It might be a good idea to revisit this.

Here is a demo snippet:

using BenchmarkTools, CUDA, DiffEqFlux, Distributions, Flux, OrdinaryDiffEq  # BenchmarkTools provides @btime

nn = Chain(Dense(3, 32, relu), Dense(32, 3)) |> gpu  # dynamics network for the flow
tspan = (0.0f0, 1.0f0)

ffjord = FFJORD(nn, tspan, Tsit5())

pc = randn(Float32, 3, 1000) |> gpu;  # a single point cloud sampled from a Gaussian
e = randn(Float32, size(pc)) |> gpu;  # noise for the Hutchinson trace estimator

@btime CUDA.@sync $ffjord($pc, $ffjord.p, $e)  # forward pass
# 184.201 ms (538473 allocations: 14.75 MiB)

@btime CUDA.@sync gradient(p -> sum($ffjord($pc, p, $e)[1]), $ffjord.p)  # gradient w.r.t. parameters
# 6.324 s (10207728 allocations: 441.39 MiB)

The timings are on a 1650 Ti. If you want to train on a batched point cloud, reshaping the 3 x P x N array into a 3 x (P*N) matrix should do it (it takes only ~3 GB of GPU memory to run a batch size of 1000); see the sketch below. The training time after this would be around ~70 s for 10 epochs (rather than 10 hours).
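
A hypothetical sketch of that reshape plus a bare-bones training step (the batch size, optimiser, learning rate, and loss are assumptions; the loss takes the first return value of the FFJORD call to be the log-density, as in the snippet above):

P, N = 1000, 1000
batch = randn(Float32, 3, P, N) |> gpu       # 3 x P x N: N point clouds of P points each
flat = reshape(batch, 3, P * N)              # flatten to 3 x (P*N) for the FFJORD call

opt = ADAM(0.01)                             # assumed optimiser and learning rate
p = copy(ffjord.p)
for epoch in 1:10
    e_b = randn(Float32, size(flat)) |> gpu  # fresh trace-estimator noise each epoch
    # maximise likelihood by minimising the negative log-density
    gs = gradient(q -> -sum(ffjord(flat, q, e_b)[1]), p)[1]
    Flux.Optimise.update!(opt, p, gs)
end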
