Midjourney, known for its trailblazing AI image-generation tools, has unveiled its first video-generation model, V1, marking a significant pivot toward multimedia creation. This initial release enables users to animate still images into short video clips, reflecting the company’s ambition to expand beyond static visuals into dynamic content generation. The announcement comes amid a broader industry push into AI-driven video synthesis, with competitors like OpenAI, Google, and Meta also rolling out or experimenting with similar capabilities.
At its core, V1 is an image-to-video model: after creating or uploading an image on Midjourney’s platform, users see an “animate” button. Pressing it generates a 5-second clip; by default the system supplies its own motion prompt, described as “just making things move,” which users can override in a “manual” mode to specify motion characteristics. Users may also designate an uploaded image as the “starting frame.” Two motion settings are available: “low motion,” where typically only the subject moves, and “high motion,” where both camera and subject may shift, offering a simple way to control the dynamism of the output. After the initial 5 seconds, users can extend the clip by 4 seconds at a time, up to four times, for a maximum of 21 seconds.
V1 is accessible only through Midjourney’s web interface and Discord server, preserving the company’s familiar workflow rather than launching a standalone application. Access requires a subscription: the entry-level plan costs $10/month and provides around 3.3 hours of “fast” GPU time (roughly 200 image generations). Video jobs, however, cost about eight times more than image jobs, which works out to roughly “one image worth of cost” per second of video. Higher-tier plans offer more GPU time and access to a “Relax” mode for queued, slower processing. Midjourney says it will review and potentially adjust video pricing based on early feedback, a common approach in nascent AI services as usage patterns emerge.
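The pricing above lends itself to a quick back-of-envelope estimate. The figures below are the rough numbers reported here ($10/month, ~200 image generations, ~one image worth of cost per video second), not official Midjourney rates:

```python
# Back-of-envelope cost estimate for Midjourney V1 video, using the
# approximate figures reported above (not official pricing).
PLAN_PRICE_USD = 10.00     # entry-level plan, per month (approximate)
IMAGES_PER_PLAN = 200      # rough number of image generations per plan

cost_per_image = PLAN_PRICE_USD / IMAGES_PER_PLAN   # ~$0.05 per image
cost_per_video_second = cost_per_image              # ~1 image worth per second

base_clip_s = 5                                     # initial clip length
max_clip_s = base_clip_s + 4 * 4                    # four 4-second extensions -> 21 s

print(f"Cost per image:      ${cost_per_image:.2f}")
print(f"5-second clip:       ${base_clip_s * cost_per_video_second:.2f}")
print(f"Max 21-second clip:  ${max_clip_s * cost_per_video_second:.2f}")
```

By this estimate, a maximum-length clip costs on the order of a dollar of “fast” GPU allowance, which illustrates why video consumes a monthly quota so much faster than still images.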
Midjourney’s move comes amid an intensifying AI video-generation race. OpenAI recently debuted Sora, Google has rolled out Veo 3, Adobe’s Firefly includes video features, and startups like Runway offer models such as Gen-4. Each platform balances controllability, realism, speed, and cost differently. Midjourney distinguishes itself by targeting its existing user base of artists and creative explorers, retaining a focus on aesthetic exploration over purely commercial B-roll generation. While many competitors emphasize enterprise integration (e.g., advertising, film pre-production), Midjourney’s approach remains centered on experimental creativity, though broader commercial usage may follow once quality and controls improve.
According to founder David Holz, V1 is “a stepping stone” toward models capable of real-time open-world simulations, 3D rendering, and beyond. Transitioning from 2D clips to fully interactive environments poses substantial challenges: generating coherent 3D structures, ensuring temporal consistency over longer durations, and managing computational demands for real-time performance. Midjourney’s roadmap likely involves iterative improvements: enhancing resolution, extending allowed durations, refining motion realism, integrating multi-modal inputs (e.g., text-to-video directly), and eventually supporting 3D outputs. GPU infrastructure and cost-efficiency will be critical; the current subscription model and usage-based pricing must scale to more intensive tasks. Additionally, user feedback will inform model tuning—balancing artistic flexibility with guardrails against problematic content or infringing outputs.
For creative communities on Discord and beyond, V1 adds a new dimension: users can animate favorite artworks or memes, share short clips, and collaborate on animated storytelling. This fosters engagement, community learning, and experimentation with motion design principles. Tutorials and showcase channels will likely emerge rapidly, as happened with image-generation prompts. Educators and content creators may incorporate V1 demos into workshops on AI creativity, highlighting both potential and pitfalls. Meanwhile, limitations in realism and duration may spur hybrid workflows: combining AI-generated clips with traditional editing, compositing, or manual animation tweaks. As AI video tools proliferate, skillsets around prompt engineering, post-processing, and ethical sourcing will become increasingly valuable.
Midjourney’s V1 underscores a broader shift: AI is encroaching on domains once reserved for specialized skills, democratizing motion content creation for non-experts. This can empower individuals and small teams to prototype ideas rapidly, but it also stirs debates about originality, authorship, and value. If anyone can generate short video clips with minimal technical know-how, how will professional animators adapt? Historically, technological leaps (e.g., digital cameras, video editing software) have lowered barriers while creating new opportunities; AI video likely follows a similar trajectory, with initial novelty giving way to standard toolsets integrated into creative pipelines. Yet the novelty phase is exciting: seeing static art come alive in unexpected ways, exploring surreal motions that might be arduous to animate by hand, and envisioning new narrative forms.
Discover more from GadgetBond
