Midjourney Launches Text to Video Generation Model V1

Published Date: 19 Jun 2025

Midjourney has unveiled V1, its first video generation model, which converts images into five-second AI-generated video clips. The model starts from a still image rather than from text alone, and a text prompt can steer how the image moves.

The V1 model, announced on June 18, lets users upload their own images or use images generated in Midjourney, then animate them with a single click. The resulting clips can be extended up to 20 seconds, although it is unclear whether they will include sound. Midjourney CEO David Holz envisions a future in which this technology generates real-time open-world simulations: immersive, dynamic environments that respond to user input.

V1 animations can be created in either Automatic or Manual mode. In Automatic mode, the tool suggests a motion prompt for the user. In Manual mode, users write their own prompt describing how the image should move and how the scene should develop. Users can also choose between low-motion and high-motion settings to control how much movement appears in the clip.

Midjourney has made V1 available to all users, regardless of subscription tier, although creating a video requires about eight times as much Graphics Processing Unit (GPU) time as generating a still image. Users can run V1 in either 'fast mode' or 'relax mode'; relax mode for video is currently being tested for Pro subscribers and above. Fast mode draws on a fixed allotment of GPU time, while relax mode places no cap on GPU time but comes with longer processing times.
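
To make the eightfold cost concrete, here is a rough sketch of how many video jobs would fit in a fast-mode GPU budget. Only the 8x ratio comes from the announcement. The function name, the 200-minute budget, and the one-minute image job are hypothetical figures chosen for illustration, not Midjourney's actual plan limits.

```python
# Illustrative arithmetic only. The 8x ratio is the figure reported for V1;
# every other number below is an assumption, not a Midjourney plan limit.
VIDEO_COST_MULTIPLIER = 8


def video_jobs_in_budget(budget_gpu_minutes: float, image_job_gpu_minutes: float) -> int:
    """Return how many whole video jobs fit in a fast-mode GPU-time budget."""
    if budget_gpu_minutes < 0 or image_job_gpu_minutes <= 0:
        raise ValueError("budget must be >= 0 and image job time must be > 0")
    # One video job is assumed to cost eight image jobs' worth of GPU time.
    video_job_minutes = image_job_gpu_minutes * VIDEO_COST_MULTIPLIER
    return int(budget_gpu_minutes // video_job_minutes)


if __name__ == "__main__":
    # Hypothetical: 200 GPU minutes of fast time, 1 GPU minute per image job.
    print(video_jobs_in_budget(200, 1))
```

Under these assumed numbers, a budget that covers 200 still images covers 25 videos, which is why relax mode matters for heavy video use.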

The launch of V1 marks a milestone for Midjourney in AI-generated content, with potential applications in entertainment, education, marketing, and advertising. As Midjourney refines V1, further uses for the technology are likely to emerge.

In conclusion, V1 gives Midjourney users a new tool for creating short, dynamic video content from still images, and its continued development could affect visual content creation across a wide range of industries.