More Results

Side-by-side Comparison with Consistent4D

Results are taken from the official Consistent4D webpage, which provides only two selected novel views. Note that our approach requires significantly less optimization time (15 minutes vs. 2 hours).

Six video comparisons, each with panels: Input Video; Consistent4D View 1; Consistent4D View 2; Ours.

Side-by-side comparison with 4DGen

4DGen results are from their official video.

Panels: Input Videos; Ours; 4DGen.

Driving videos

Top: Driving videos. Bottom: generated 4D.

Texture Refinement: image-to-image vs. video-to-video

Image-to-image refinement results in clear flickering on the back of the tiger.

Panels: Image-to-image; Video-to-video.

Comparison Between Different Motion Representations

Motion representation is critical to 4D generation.

Panels: Framewise 3DGS; Framewise 3DGS (w/ init); MLP Deformation; HexPlane Deformation.

Effect of HexPlane Resolutions

We show results of different HexPlane resolutions.

Panels: S/4; Sx4; T/4; Tx4; Ours (32x32x32).

Bird Flapping Wings

Example of bird flapping wings, compared with Animate124 and Dream-in-4D.

Panels: Input Image; Dream-in-4D; Animate124; Ours.

Ice Cream Melting

We show an example of an ice cream melting.

Panels: Input Image; Driving Video; Generated 4D.

Failure Cases

Failure mode 1: low-quality video generated by Stable Video Diffusion. The generated horse motion is temporally inconsistent.

Panels: Input Image; Generated Video.

Failure mode 2: low-quality 3D generated by DreamGaussianHD. The back of the minion is wrongly textured.

Panels: Input Image; Generated 3D.

Failure mode 3: unnatural deformation. The top of the elephant's nose is wrongly moved to its right hand.

Panels: Generated 3D; Input Video; Generated 4D.

Refinement Ablation

Final results

Panels: Generated 4D GS; Extracted mesh; Refined mesh.

Refined results using different T in the video-to-video pipeline (without reference view reconstruction loss by default).

T=[0.7,0.95] denotes linearly decaying T from 0.7 to 0.95.

Panels: T=0.6; T=0.7; T=0.8; T=[0.7,0.95]; T=0.7 + Recon. Loss.
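
The T=[0.7,0.95] setting above can be illustrated with a minimal sketch of a linear schedule. This is not the authors' implementation: the function name `linear_t_schedule`, the per-step indexing, and the step count are hypothetical, and only the two endpoints (0.7 and 0.95) come from the text.

```python
def linear_t_schedule(step, total_steps, t_start=0.7, t_end=0.95):
    """Linearly interpolate T between t_start and t_end.

    Assumption: T is updated once per refinement step; the page only
    states the two endpoints, not how steps are counted.
    """
    if total_steps <= 1:
        return t_start
    # Clamp the step index so out-of-range values stay at the endpoints.
    clamped = min(max(step, 0), total_steps - 1)
    frac = clamped / (total_steps - 1)
    return t_start + frac * (t_end - t_start)


# Hypothetical example: six refinement steps.
schedule = [round(linear_t_schedule(i, 6), 2) for i in range(6)]
print(schedule)
```

A fixed setting such as T=0.7 corresponds to passing the same value for `t_start` and `t_end`.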

Refined videos by SVD at different T.

Panels: Input video; T=0.5; T=0.6; T=0.7; T=0.8.

Training Iterations

Longer training schedules do not bring visible corrections to the foot motion.

Panels: Driving video; #Iteration=200 (Ours); #Iteration=500; #Iteration=1000; #Iteration=2000.

Diverse motions

We show 10 more 3D motions as a supplement to Figure 10 in the main paper.

Dynamic Cameras

Our approach does not require the camera to be static. We show three examples in which the camera rotates, shifts, and moves in for a close-up.

Panels: Input Video (rotate); Input Video (shift); Input Video (close up); Generated 4D (rotate); Generated 4D (shift); Generated 4D (close up).

Temporal loss

We try different weights for the temporal loss but observe limited or no improvement. The setting weight=10 is the one we report in the submission.

Panels: weight=0; weight=10; weight=100; weight=1000; weight=10000.