MoonShot: Towards Controllable Video Generation and Editing with Motion-Aware Multimodal Conditions

David Junhao Zhang, Dongxu Li, Hung Le, Mike Zheng Shou, Caiming Xiong, Doyen Sahoo

Salesforce Research; Show Lab, NUS



Text to Video Generation

An astronaut is walking on the moon
A middle aged woman walking on the street, whole body, simple background
A panda standing on a surfboard in the ocean in sunset
A corgi running on the grass
A wizard is summoning thunder and lightning next to a waterfall
Spider Man is looking around in the time square
A painting of a french bulldog dog portrait in the style of vincent van gogh the starry night, thick paint, big brushstrokes
a big ocean wave at daybreak
Ethnic lady on the beach at sunset, walking slowly into the waters, gentle light
Leica portrait of a gremlin skateboarding
There is a huge lobster on the bathtub, cartoon style
an astronaut sitting on the roof of Dodge Charger RT 1969, whole scene view, heavy fog around the car
Machinarium is award-winning independent adventure game developed by the makers of Samorost and Bot
A space probe zooms by, carrying scientific instruments, exploring uncharted interstellar regions
A disoriented astronaut, lost in a galaxy of swirling colors, floating in zero gravity
Robotic eagle, 8k unreal engine render, wires and gears
a humming bird is flying, hovering around flowers in a garden, a super close-up
A cute and tiny frog commander inside the Space Shuttle's control cockpit
a gypsy woman playing with a pile of tarot cards, in the style of Hayao Miyazaki's animation
A self rotating hamburger
a cute monkey doing figure skating on the ice, with a dora backpack on its back

Zero-Shot Subject Customized Video Generation

Directly Using Image ControlNet

Image Animation Comparisons



Video Editing Comparisons


Ablation for effect of the motion-aware module and spatiotemporal layers

"A robot is walking", Image to Video (Same Seed)
baseline
+ motion-aware
++ spatiotemporal attention

"A corgi running on the grass", Text to Video (Same Seed)
baseline
+ motion-aware
++ spatiotemporal attention

Ablation for Multimodal Dual Cross-Attn for Video Generation

Ablation for Multimodal Dual Cross-Attn for Image Animation