Video-As-Prompt: Unified Semantic Control for Video Generation

Yuxuan Bian ^1,2,* Xin Chen ^2,†,‡ Zenan Li ^2 Tiancheng Zhi ^2 Shen Sang ^2 Linjie Luo ^2,‡ Qiang Xu ^1,‡

^1 The Chinese University of Hong Kong ^2 ByteDance

^* Work done at ByteDance Intelligent Creation Lab ^† Project Lead ^‡ Corresponding Author

Treating Reference Videos As In-Context Prompts

We treat a reference video that carries the desired semantics as a video prompt and achieve plug-and-play in-context generation through a mixture-of-transformers architecture, producing videos that are semantically consistent with the reference video.
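As a rough illustration of in-context conditioning with a mixture-of-transformers layout, the sketch below keeps separate projection weights for the reference-video tokens and the target-video tokens, then runs a single attention pass over the concatenated sequence so that target tokens can attend to reference tokens. This is a toy, single-head, pure-Python sketch: the function names, dimensions, and random weights are illustrative assumptions, not the released Video-As-Prompt implementation.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def make_expert(dim, rng):
    """One set of query/key/value projections (a hypothetical 'expert')."""
    return {name: random_matrix(dim, dim, rng) for name in ("q", "k", "v")}

def joint_attention(ref_tokens, tgt_tokens, ref_expert, tgt_expert):
    """Each stream uses its own projections; attention spans both streams."""
    dim = len(ref_tokens[0])
    # Project each stream with its own weights, then concatenate along the sequence.
    q = matmul(ref_tokens, ref_expert["q"]) + matmul(tgt_tokens, tgt_expert["q"])
    k = matmul(ref_tokens, ref_expert["k"]) + matmul(tgt_tokens, tgt_expert["k"])
    v = matmul(ref_tokens, ref_expert["v"]) + matmul(tgt_tokens, tgt_expert["v"])
    scores = matmul(q, [list(col) for col in zip(*k)])  # q @ k^T
    weights = [softmax([s / math.sqrt(dim) for s in row]) for row in scores]
    out = matmul(weights, v)
    n_ref = len(ref_tokens)
    return out[:n_ref], out[n_ref:], weights

rng = random.Random(0)
dim = 4
ref_expert, tgt_expert = make_expert(dim, rng), make_expert(dim, rng)
ref = random_matrix(3, dim, rng)  # 3 reference-video tokens (toy values)
tgt = random_matrix(2, dim, rng)  # 2 target-video tokens (toy values)
ref_out, tgt_out, weights = joint_attention(ref, tgt, ref_expert, tgt_expert)
```

Because the attention weights cover both streams, changing the reference tokens changes the target outputs, which is the in-context effect the sketch is meant to show.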

Applications

Our Video-as-Prompt model supports various downstream applications:
(1) Different reference videos (different semantics) + the same reference image → generate videos aligned with the semantics of each reference video;
(2) Different reference videos (same semantics) + the same reference image → consistently generate videos aligned with the shared semantics;
(3) The same reference video + different reference images → transfer the same semantics (concept/style/motion/camera) to each reference image;
(4) The same reference video and image + a user-modified prompt → preserve semantics and identity while using the prompt to adjust fine-grained attributes.
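The four usage patterns above can be viewed as different ways of pairing inputs. The sketch below enumerates them with a hypothetical `GenerationRequest` record; the class, its field names, and the file names are illustrative assumptions and do not correspond to the released code or to any real API.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Sequence

@dataclass(frozen=True)
class GenerationRequest:
    """Hypothetical bundle of inputs for one generated video."""
    reference_video: str  # path to the video that carries the semantics
    reference_image: str  # path to the image that supplies identity/content
    prompt: str           # text prompt for fine-grained attributes

def build_requests(videos: Sequence[str], images: Sequence[str],
                   prompts: Sequence[str]) -> List[GenerationRequest]:
    """Pair every reference video with every reference image and prompt."""
    return [GenerationRequest(v, i, p) for v, i, p in product(videos, images, prompts)]

# (1)/(2): several reference videos, one image, one prompt.
multi_video = build_requests(["ghibli.mp4", "squish.mp4"], ["cat.png"], ["a cat"])
# (3): one reference video, several images.
multi_image = build_requests(["ghibli.mp4"], ["cat.png", "dog.png"], ["an animal"])
# (4): one video and one image, several user-edited prompts.
multi_prompt = build_requests(["toy.mp4"], ["man.png"],
                              ["... with black fur...", "... with golden fur..."])
```

Each list holds one request per generated video, so cases (1) to (3) vary the visual inputs while case (4) varies only the text.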

Different Reference Videos (different semantics) and Same Reference Image

Given reference videos with different semantics and a single reference image, our model generates videos aligned with the semantics of each reference video.

Different Reference Videos (same semantics) and Same Reference Image

Given different reference videos that share the same semantics and a single reference image, our model consistently generates videos aligned with those shared semantics.

Same Reference Video and Different Reference Images

Given a reference video, our model generates new videos from different reference images, each semantically consistent with the reference video.

[Video panels: Reference Video, Generated Video 1, Generated Video 2, Generated Video 3]

Same Reference Video & Image and User-Modified Prompt

Given a reference video and a reference image, our model preserves semantics and identity while using the text prompt to adjust fine-grained attributes.

... a Ladudu toy character with black fur...

... a Ladudu toy character with golden fur...

... a Ladudu toy character with green fur...

... a Ladudu toy character with purple fur...

... a Ladudu toy character with red fur...

... a Ladudu toy character with white fur...

Zero-Shot Semantic-Guided Generation

Given reference videos with unseen semantics, our model can generate videos that are semantically consistent with the reference videos in a zero-shot manner.

Gallery: Concept-Guided Video Generation

Generates videos that share a high-level concept semantic with the reference, such as entity transformation (e.g., the target becomes a ladudu doll or a Minecraft character) or entity interaction (e.g., an AI lover approaches the target, or the target is covered in liquid metal).

Our Video-As-Prompt model takes a reference video as the prompt (left) and generates a video that is semantically consistent with it (right).

Turn into a ladudu

A young woman with curly hair... a brilliant shower of golden, sparkling particles erupts... she is revealed to have seamlessly transformed into a ladudu toy character...

A stone statue of Abraham Lincoln... a flash of golden, sparkling light erupts... the head has smoothly transformed into a ladudu toy...

An elderly man with a warm, wrinkled smile... a brilliant golden light flashes from his face... the head smoothly transforms into the head of a ladudu toy monster...

A man with long brown hair... a brilliant golden light with sparkling particles envelops his upper body... the head is now a large, tan-colored vinyl bunny character head with long ears, large black eyes, and a wide, toothy grin...

A man with a beard... a burst of golden, sparkling light erupts... the head has transformed into that of a furry, light-brown ladudu toy creature with large, expressive eyes and a wide, toothy grin...

A man with silver-streaked dark hair and a striped sweater... a brilliant burst of warm, golden light erupts... the head is now a large, stylized vinyl toy character...

A young man wearing a blue puffer jacket... A brilliant burst of golden, sparkling light suddenly emanates... his head smoothly transforms into that of a ladudu toy character...

A smiling woman with long, wavy hair... a bright, golden, sparkling light effect with motion blur envelops her... revealing her head has smoothly transformed into a stylized toy character wearing a pink, fluffy bunny-ear hat. The character has large...

A smiling man with a beard... As he faces forward, a brief, warm golden light flares around him, and he seamlessly transforms into a ladudu...

Turn into a Minecraft character

A young man with short dark hair sits... A blinding white light erupts from... As the flash recedes, the entire reality transforms into a voxelated, block-style world...

A young woman with long dark hair and a dark off-the-shoulder top stands... A brilliant yellow light flares from her hand, quickly engulfing the frame in a bright white flash... As the flash subsides, the entire scene re-renders into a voxelated, Minecraft-style world...

A young boy in a blue jacket and camouflage pants sits... A brilliant, sparkling golden light suddenly emanates from his body, intensifying rapidly until it consumes the entire frame in a bright white flash... The light subsides to reveal the scene completely transformed into a blocky, voxelated world...

A young woman with brown bobbed hair, wearing a yellow tank top and dark skirt, stands... The image is hazy, and a soft glow emanates from her... The scene resolves from the light, now completely transformed into a voxelated, block-style world...

A woman in a colorful bikini poses... A brilliant white flash, instantly transforming the entire world into a Minecraft-inspired voxel art style...

A macro shot captures an iridescent green beetle resting on a leaf. As the camera subtly shifts, a brilliant white flash engulfs the frame. The image resolves into a fully voxelated, Minecraft-style world...

A live-action red fox stands in a lush garden... A brilliant white flash with sparkling light particles suddenly erupts from the center, completely obscuring the scene. As the light recedes, the entire image is revealed to be transformed into a voxelated, Minecraft-style world...

A small, white, fluffy dog wearing a red coat stands still in a snowy landscape, looking directly at the camera. A sudden, brilliant white flash filled with sparkling particles erupts, completely engulfing the dog and the scene. As the light dissipates, the entire view is transformed into a voxelated, Minecraft-style world...

A young orange tabby cat with green eyes lies on a gravel path... A brilliant white light with radiating sparkles suddenly engulfs the feline, initiating a transformation. As the light recedes, the cat's entire form shifts into a blocky, voxel-based style reminiscent of Minecraft...

Turn into a Squid Game character

A smiling baby in a white onesie... As the baby raises its hands to its head, a red hooded jumpsuit and a black mask with a white circle symbol instantly materialize over its body... Now fully transformed into a Squid Game guard, the baby kneels upright on the bed. It pulls out a black toy gun and points it directly at the camera...

A woman in a dark blue dress stands... As she raises her hands, a red hooded jacket and a black mask with a white square symbol instantly materialize over her head. The red fabric seamlessly expands downwards, transforming her dress into a full red jumpsuit, cinched with a black belt and paired with black gloves. As she completes her transformation...

In a field of white daisies, a young woman ... In a seamless transition, her original clothes morph into a full-body red jumpsuit, complete with a black belt and gloves, transforming her into a Squid Game guard...

Turn into a Fuzzy Toy

A young boy with dark hair sits... He performs a backflip, his form blurring and kicking up a cloud of leaves from the ground. As he lands, he seamlessly transforms into a smiling, soft plush doll...

A bouquet of red roses in a colorful glass vase rests... In a sudden puff of white smoke, the vase and flowers are instantly replaced by a cute, round, white plush toy...

Four small, black-and-tan Yorkshire Terrier puppies stand on a clean white studio floor. A sudden flash initiates the effect, launching the puppies upward in a quick, motion-blurred ascent. As they rise, their realistic fur and features seamlessly morph into soft, plush forms...

Turn into a Muscling Man

A young man with dark hair and a red polo shirt sits outdoors... In a swift, continuous motion, he pulls the shirt up over his head. As the fabric rises, his torso undergoes a transformation; a previously softer physique is replaced by a chiseled, muscular build, revealing defined pectoral muscles and prominent abdominal muscles...

A young man with styled light brown hair... As the blur settles, his shirt vanishes, and his torso is revealed to be significantly more muscular and sculpted. His chest, shoulders, and arms now display prominent definition and mass, while his core identity is perfectly preserved...

A man with a shaved head, beard... As the clothes clear, he is revealed to be shirtless, his physique instantly transformed into a much more muscular and well-defined version of himself...

Turn into a Figure

A woman wearing a lime green headwrap... As she turns, she seamlessly transforms into a stylized, chibi-like toy figurine with a smooth, polished texture. Her headwrap simplifies into a solid green beanie... The resulting figurine, standing on a black circular base, continues its uninterrupted clockwise spin...

A young woman initiates a quick clockwise spin, her form blurring and shrinking as she turns. In a smooth, continuous motion, she transforms into a small, stylized plastic figurine of herself. The resulting toy figure stands on a light pink circular base...

A young woman... She initiates a swift counter-clockwise spin on the spot. During the rotation, she seamlessly morphs into a small, cute plastic figurine. The transformation concludes with the figurine standing perfectly still on a circular pedestal in the same position...

AI Lover Drop

A young woman with long, wavy blonde hair... From the top right of the frame, a man in a white collared shirt smoothly descends, his motion continuous rather than a sudden appearance. As he settles beside her, she turns her head to meet him. He embraces her, and they share an intimate kiss...

A person in a full white hazmat suit ... From the top of the frame, a woman with long dark hair, dressed in a matching white jumpsuit with a red belt, smoothly drops into the scene. Her descent is continuous and fluid as she lands softly on her knees beside the man. Without hesitation, she leans forward, places her hands on his shoulders, and kisses the side of his respirator...

A young man with wavy brown hair, dressed in a black shirt... From the top of the frame, a woman wearing a white dress and matching heels smoothly descends into view behind him. The woman reveals her dark hair styled with a tiara and elegant earrings. She completes her gentle descent and stands behind the man, leaning in to place her hand on his neck in a tender embrace...

...a fluffy, long-haired cat with blue eyes and cream-point fur rests on a white shelf. A second cat, a blue-eyed lynx-point tabby, drops smoothly into the frame from the top. It lands gently beside the first cat without any abrupt cuts. The tabby settles in, turning to face its companion...

A hen with dark, iridescent feathers and a golden-hackled neck stands in... another lighter brown hen smoothly descends into the frame from the top left, its legs dangling as if in mid-flight. It lands gently beside the first hen, and they turn to face each other...

... a sleeping black-and-white French Bulldog puppy... From the top of the frame, a second tricolor puppy with pointed ears executes a smooth, continuous drop into the scene. This descending puppy lands gently atop its sleeping companion without any cuts or sudden appearances. It immediately begins to nuzzle and lick the face of the first puppy...

Cover Liquid Metal

A static, front-facing figurine of Grogu... A transformation begins as a viscous, reflective gold liquid appears on the figurine's forehead. This molten gold cascades downwards, flowing over its face, large ears, and dripping over the collar. The liquid metal continues to spread, smoothly enveloping the head, robe, and raised arm in a thick, shiny coating...

A brightly colored red-eyed tree frog with green skin... a thick, reflective liquid of molten gold pours onto the frog's head and back. This viscous metallic substance flows downwards, cascading in thick, syrupy drips along its body and sides, eventually pooling on the leaf surface beneath it...

...a stylish man in a green suit and fedora leans back on a couch, exhaling a plume of smoke. As the smoke drifts, a shimmering silver liquid metal begins to pour over his head and face. The metallic fluid flows and drips rapidly downwards, completely encasing his hat, head, and the collar of his suit...

...A thick, molten liquid begins to pour over the top of its head, cascading down its face and neck. The viscous substance, initially a mix of brown and silver, flows like thick honey, obscuring the animal's fur and features. As it spreads, the liquid transforms into a lustrous, reflective bronze metal...

...A shimmering, viscous golden liquid begins to pour from the top of the frames, dripping downwards like melting metal. The molten substance flows over the images, first covering the two larger pictures and then the smaller ones below...

A young woman with long brown hair and a denim jacket stands... A shiny, viscous bronze liquid metal begins to pour over the crown of her head, cascading down her face and hair. The metallic fluid flows rapidly, enveloping her skin and clothing in thick streams...

Paper Man

...A hand in a suit sleeve enters the frame from the left, grasps the man's shoulder, and pulls him swiftly sideways. The man slides out of the frame to the left as a single, flat, rigid entity, like a paper cutout being removed from a scene...

...A giant, photorealistic human hand descends from the top left corner of the frame. The hand approaches the man and appears to grasp his shoulders. It then swiftly lifts him up and to the left, pulling him out of the scene...

...From the left, another person's arm enters the frame and their hand grips the man's shoulder. In a quick, fluid motion, the hand pulls the man horizontally to the left. The man and his guitar are dragged out of the scene as a single, rigid, two-dimensional cutout, maintaining his seated posture perfectly, as if he were made of paper...

...A disembodied arm in a light blue sleeve enters from the left, its hand moving toward the man's shoulder. The hand appears to grasp him, and then in a swift, horizontal motion, pulls the man to the left and completely out of the frame. The man, his chair, the cup he holds, and the camera on the table all move together as a single, flat, rigid element, sliding away like a paper cutout...

...A hand in a dark suit sleeve enters from the left and grasps the side of the man's head and neck. In a swift, rigid motion, the hand pulls the man's entire figure horizontally to the left, yanking him out of the frame as if he were a flat paper cutout...

...From the left, a man's hand enters the frame, places itself on her shoulder, and then swiftly pulls her horizontally out of the scene. As she is removed, her body slides rigidly and without depth, like a flat, two-dimensional paper cutout being dragged away, showing no natural body rotation or weight shift...

Zongzi Wrap

...Large, vibrant green leaves suddenly sprout from the fish's sides, resembling wings. These leaves quickly expand and wrap around the fish's body, folding over it seamlessly. As the leaves encase the subject, light-colored twine materializes and binds the new form, crisscrossing over the midsection and securing the tail. The transformation concludes with the clownfish becoming a fish-shaped zongzi...

...Two large, vibrant green leaves emerge and wrap around her torso like a vest, introducing selective color. The transformation progresses as more leaves seamlessly grow and envelop her entire body, including her arms and head. Natural twine appears and binds the leaves, securing them around her form. The effect concludes with the woman completely encased in a green leaf costume resembling a zongzi...

...From the base of the cake, large, vibrant green leaves begin to emerge. These leaves grow upward and curve inward, their broad surfaces progressively enveloping the confection. The motion is smooth and continuous as the leaves meet at the top, completely concealing the fruit and frosting. Finally, a network of thin, tan-colored twine appears, wrapping around the leafy dome and cinching it tightly in the center, securing the package like a traditional zongzi...

Gallery: Style-Guided Video Generation

Generates videos in a reference style (e.g., Ghibli or Simpsons).

Ghibli Style

...The camera holds a static medium shot as the entire scene abruptly transforms into a Ghibli-style animation. The intricate textures of the fox's coat, the rocky ground, and the fallen log behind it all soften into hand-drawn surfaces with clean lines and a simplified color palette. The fox's realistic features morph into stylized...

...As he waves, the entire live-action scene instantly transforms into a Ghibli-style animation. The photorealistic details of his knitted beanie, plaid shirt, and the foliage-filled background are replaced with simplified, hand-drawn aesthetics. His features are rendered with clean lines, and his clothing adopts a soft, painterly texture...

...The realistic scene, filled with colorful carnival rides, stalls, and tiny figures, begins a gradual transformation into a Ghibli-style animation. The entire frame softens as photorealistic textures are replaced by painterly brushstrokes and simplified forms. The initial muted, overcast palette warms considerably, with vibrant primary colors on the rides and a lush, uniform green for the grass, all under a newly bright blue sky...

...The entire live-action scene instantly transitions into a Ghibli-style animation. Photographic details and complex textures are replaced with soft, hand-painted visuals, characterized by simplified lines and a warm, unified color palette...

...The entire scene then instantly transforms into a Studio Ghibli-style anime illustration. The photographic image becomes a hand-drawn cartoon with soft lines and a painterly aesthetic. The sequins on her top simplify into a pattern of soft dots, and her hair is rendered in solid color blocks with gentle highlights...

...The scene, which begins as a photograph, smoothly transitions into a Ghibli-style anime illustration. Photographic details like fabric texture and metallic sheen dissolve, replaced by soft brushstrokes and a simplified, warmer color palette. The men's faces and clothing become stylized but remain recognizable...

...A smooth transformation washes over the entire scene, converting the realistic image into a Ghibli-style animation. The sharp, complex textures of the rock and foliage soften into broad, painterly brushstrokes and simplified forms...

...The video abruptly transitions from a live-action shot to a Ghibli-style animation. The transformation replaces photorealistic detail with soft, hand-drawn lines, simplified textures, and a warm, cohesive color palette...

...The entire scene smoothly transitions into a Ghibli-style animation. The lion's realistic fur texture softens into broad, painterly strokes of tan and brown, and its features simplify with clean outlines. The detailed bark of the log and the complex foliage in the background are similarly transformed into hand-drawn elements with a warmer...

Simpsons Style

...The entire realistic scene smoothly cross-dissolves into a Simpsons-style animation. The man's skin becomes the iconic yellow, and his features, clothing, and the surrounding environment are redrawn with bold black outlines and flat, cel-shaded colors...

...The entire realistic photograph smoothly transforms into a cartoon illustration in the iconic style of "The Simpsons." During the transition, the image's photorealistic textures and muted colors dissolve into a world of flat, vibrant colors and bold black outlines...

...The entire photorealistic scene smoothly transitions into a cartoon animation in the style of The Simpsons. Her skin becomes yellow, her features simplify into large, stylized eyes and an overbite, and the textures of her hair and clothing flatten into cel-shaded forms, though their original brown and blue colors are preserved...

Blooming Style

...a massive bouquet of pink, orange, and white flowers magically blooms from the central bungalows, growing rapidly as petals begin to fall. The entire scene then transforms into a stylized, painterly world. The realistic bungalows are replaced by whimsical versions, their roofs now completely covered in a dense carpet of small, multicolored flowers...

...A dense bouquet of pink and white cosmos flowers with green stems suddenly blossoms from the statue's chest. This floral burst rapidly expands outward in a radial pattern, as individual flowers, petals, and leaves detach and float gracefully into the air...

...A dense carpet of tiny, colorful flowers and green foliage blooms across the dog's fur, starting from its head and spreading over its body, completely replacing its black coat with a vibrant floral texture. As the floral bloom completes...

Gallery: Motion-Guided Video Generation

Generates videos that follow a reference motion, including non-human motion (e.g., floating like a balloon) and human motion (e.g., a shaking-style dance).

Expansion

...The horse begins to inflate, its defined, muscular body swelling and rounding into a smooth, balloon-like form while retaining its rich, brown hide color. Without changing its orientation, the now-buoyant horse lifts silently from the ground...

...Her torso suddenly begins to inflate like a balloon, expanding rapidly into a large, round form. The black fabric of her top stretches to become the balloon's skin, and a subtle purple-to-red gradient appears on its lower half...

...As they expand, the detailed texture of the fruit's pulp and segments smooths over, transforming the halves into a seamless, unified, bright-orange sphere that resembles a balloon. Once the transformation is complete, the newly formed balloon-like object lifts off the surface and accelerates vertically...

...its body begins to rapidly inflate, swelling outward into a perfect sphere. Its head, legs, and tail remain attached to the ballooning form, which inherits the beige and white coloring of the dog's underbelly. Once fully transformed into a buoyant orb, the beagle lifts from the branch...

...The bird begins to inflate rapidly, its body swelling and rounding out as its feathery texture smooths into a taut, balloon-like surface. The sparrow's grey and brown plumage maps onto the expanding sphere, becoming its primary colors. The stalk it perches on seamlessly becomes the balloon's string...

...As he expands, his facial expression briefly shifts from wide-eyed to a content smile, then to an angry frown. The inflation transforms his midsection into a large, spherical balloon, the gray fabric of his coat stretching over the top while a new pink color emerges from the bottom...

Squish

...As the hands make contact and squeeze, the man and his camera instantly transform into a single, soft, white, clay-like object. The original colors of his hair, skin, shirt, and the black camera are preserved as surface textures on this new malleable form. The hands continue to knead and press into the squishy mass, deforming its shape with each successive squeeze...

...As the hands squeeze inward, her entire form instantly transforms into a soft, dough-like material. The hands knead and compress the figure, folding it into a compact ball. Her features and clothing distort completely, but the yellow color of the hoodie and the dark tones of her hair are mapped onto the new malleable texture...

...As the hands make contact, the entire group of parrots instantly transforms into a single, cohesive piece of soft, malleable clay. The vibrant blue, yellow, and green colors of the parrots' feathers are preserved on the surface of the new clay-like object. The hands then proceed to squish and knead the mass...

Shake it Dance

...he raises his arms and clasps his hands behind his head. He then begins a rhythmic, side-to-side swaying motion with his upper body. His torso and head wiggle from left to right in a steady, fluid dance...

...The girl initiates a "shake it dance" by raising both hands to the sides of her head. She then begins to sway her upper body and head from side to side in a steady, rhythmic motion. This dance is continuous and has a stable, playful cadence...

...he raises both hands to the back of his head, framing his face. He then initiates a simple dance, rhythmically swaying his upper body from side to side with a steady cadence. His joyful expression remains as he continues the swaying motion...

Gallery: Camera-Guided Video Generation

Generates videos that follow a reference camera motion, from basic movements (up, down, left, right, zoom in, zoom out) to the more complex Hitchcock dolly zoom.

Hitchcock Camera Movement

...A smooth Hitchcock zoom commences, where the camera dollies backward away from the boy while the lens simultaneously zooms in. This technique keeps the boy's size relatively constant within the frame but causes the background of trees and green foliage to dramatically compress and warp...

...The camera executes a smooth dolly zoom, pulling backward while simultaneously zooming in on her. This technique maintains her size as a constant within the frame, making her appear isolated and static. In stark contrast, the soft-focus background of green foliage expands dramatically...

... Throughout the shot, the camera performs a slow Hitchcock zoom, pushing in towards the man while simultaneously zooming out. This effect keeps the man's size in the frame relatively constant but causes the background forest to visibly stretch and expand, creating a disorienting warp in perspective...

...A dolly zoom effect is executed, where the camera physically pulls away from her while the lens zooms in. This technique keeps the woman's size consistent within the frame, but the background undergoes a dramatic transformation...

...The camera performs a steady Hitchcock zoom, physically moving forward towards the cup while the lens simultaneously zooms out. This action holds the cup's size and position constant in the frame. The background, however, undergoes a significant transformation; an out-of-focus wooden lantern and a bright white curtain appear to expand and rush forward...

...The camera executes a slow, continuous Hitchcock zoom, pulling away from the hydrant while simultaneously zooming in. This creates a disorienting effect where the hydrant stays a constant size, but the out-of-focus green hedge in the background appears to expand and compress, dramatically altering the sense of perspective and depth...
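The geometry behind the dolly zoom can be sketched with a pinhole camera: to keep the frame covering a fixed width w at the subject while the camera sits at distance d, the horizontal field of view must be θ = 2·atan(w / (2d)). The snippet below computes that field of view at several distances and the width of background that remains visible. The function names and all numbers are illustrative assumptions, not values measured from the videos above.

```python
import math

def dolly_zoom_fov(frame_width_at_subject: float, distance: float) -> float:
    """Horizontal field of view (degrees) that keeps the framed width at the subject constant."""
    if distance <= 0:
        raise ValueError("distance must be positive")
    return math.degrees(2.0 * math.atan(frame_width_at_subject / (2.0 * distance)))

def background_width(fov_deg: float, background_distance: float) -> float:
    """Width of the scene visible at the background depth for a given field of view."""
    return 2.0 * background_distance * math.tan(math.radians(fov_deg) / 2.0)

# The camera pulls back from 2 m to 6 m while the frame stays 1.5 m wide at the subject;
# the background is assumed to sit 10 m behind the subject.
for d in (2.0, 4.0, 6.0):
    fov = dolly_zoom_fov(1.5, d)
    print(f"distance={d:.1f} m  fov={fov:.1f} deg  "
          f"visible background width={background_width(fov, d + 10.0):.2f} m")
```

As the camera pulls back and the field of view narrows, less of the background fits in the frame, so it appears to grow relative to the subject; pushing in while zooming out does the reverse.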

Earth Zoom Out

...The camera immediately begins a continuous and rapid pull-out, ascending straight up and away from the subject. As the camera retreats at a smooth, uninterrupted speed, the person and the dunes shrink, transforming the scene into a high-altitude, satellite-like view of a coastline. The pull-out continues, revealing the deep blue of the ocean, swirling cloud formations, and the distinct curvature of the Earth...

...The camera begins a rapid and continuous pull-out, ascending straight up into the atmosphere. As it retreats, the individual and the swing dissolve into the landscape, which seamlessly transitions into a satellite view of a verdant landmass seen through wispy clouds...

...A continuous and rapid pull-out begins, with the camera moving straight back at an accelerating pace. As the camera retreats, the distinct image of the cat and its surroundings seamlessly dissolves and resolves into a high-altitude, top-down view of the Earth...

Orbit

...The camera executes a smooth, steady, partial orbit around the money, moving from left to right. The banknotes and the white background remain perfectly static, with only the camera's changing viewpoint creating a sense of three-dimensional space and depth...

...The camera performs a slow, smooth clockwise orbit around the cup, which remains stationary. This movement creates a subtle parallax effect, shifting the visual relationship between the cup, a wooden lantern in the shallow-focused background, and the brightly lit window...

...The camera executes a slow, subtle clockwise orbit around the food arrangement. This gentle arcing motion creates a soft parallax effect, where the foreground bowl shifts in perspective relative to the bowls in the mid-ground and background...

Move Left

...As the camera glides leftward, the entire scene drifts smoothly and uniformly to the right, maintaining its composition. The foreground structures move at a slightly faster rate than those higher up the green hill, creating a subtle sense of depth...

...The camera executes a smooth, steady pan to the left, causing the entire scene—the cat, the orchid, and the candlesticks—to drift uniformly across the frame from left to right. The background, a simple grey wall, and the white shelf remain consistent throughout the movement, maintaining perfect spatial relationships between all elements as they exit the frame...

...The camera begins a slow, steady pan to the left. The man remains static, causing him to drift out of the frame to the right as the camera's perspective shifts. The movement is smooth and continuous, revealing more of the industrial-style interior, including a long wooden bar with branded logos and several stools...
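The depth cue in leftward camera moves follows from pinhole projection: when the camera translates sideways by t, a static point at depth Z shifts in the image by f·t/Z in the opposite direction, so nearer points move faster than distant ones. A minimal sketch, with an assumed focal length and made-up depths:

```python
def image_shift(focal_length_px: float, camera_shift: float, depth: float) -> float:
    """Horizontal image-space shift (pixels) of a static point when the camera moves left.

    Positive values mean the point drifts to the right in the frame.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    return focal_length_px * camera_shift / depth

# The camera moves 0.5 m to the left; the 800 px focal length is an assumption.
for label, depth in (("foreground", 5.0), ("hillside", 50.0)):
    print(label, image_shift(800.0, 0.5, depth))
```

A pure rotation (pan) about the camera center would not produce this depth-dependent difference; only a translation does.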

BibTeX

@article{bian2025videoasprompt,
  title   = {Video-As-Prompt: Unified Semantic Control for Video Generation},
  author  = {Yuxuan Bian and Xin Chen and Zenan Li and Tiancheng Zhi and Shen Sang and Linjie Luo and Qiang Xu},
  journal = {arXiv preprint arXiv:2510.20888},
  year    = {2025},
  url     = {https://arxiv.org/abs/2510.20888}
}