Multimodal AI Dark Horse Unveils New Tool: One Product for Image, Video, and Podcast Generation with Hundreds of Effects by Expert Team

Frontier Models · Published: Jun 25, 2025 · James Hayes · ~22 min read

Author Info

Cloud & MLOps Staff Writer

AWS Solutions Architect Professional; ex-platform engineer at a Series C AI startup

James documents how teams ship models to production: inference stacks, observability, cost controls, and incident response. He reproduces deployment patterns in sandbox environments when feasible and labels what was not independently verified. Readers rely on his work for practical checklists and version-specific caveats.

#MLOps #Inference Infrastructure #Cost Optimization #Reliability Engineering


AI Heavyweight Mei Tao Takes the Helm: A New Multimodal AI Arrives!

Its capabilities are nothing short of all-encompassing.


It supports image and video generation, handles fantasy scenes and diverse camera angles, and now offers a lip-sync feature that lets even introverts create podcasts easily.

Video Link: https://mp.weixin.qq.com/s/bYNU6Mei2pq7KuFR8Ik2dQ

Key Highlights:

The official platform also provides hundreds of ready-to-use fun effect templates, enabling users to achieve “effortless creation.”

For cool transformations like the one below, the operation is as simple as uploading a single image:

Templates for transforming people, animals, and buildings are all available:

Additionally, the Image Agent in the image generation section is a flagship feature. Users can generate and edit images using plain language; not knowing how to write prompts is no longer an issue, as the system will automatically optimize and refine them for you.

To cut to the chase, this latest creative tool is vivago 2.0 (Zhi Xiaoxiang AI).

The team behind it, HiDream.ai, was founded by Mei Tao, a well-known figure in the industry and a Fellow of the Canadian Academy of Engineering. Many of the R&D team’s core members come from the University of Science and Technology of China (USTC).

Recently, the team’s open-source model HiDream-I1 made a splash in the text-to-image arena. Within 24 hours of its release it topped the leaderboards, becoming one of the first Chinese open-source large models to reach the top tier.

At the time, even Recraft (the team behind the mysterious viral “red_panda” model) integrated it overnight, and creators worldwide rushed to add it to their workflows.

Interestingly, vivago 2.0 actually leverages the capabilities of HiDream-I1.

vivago 2.0 is now available globally on the web and as an app. We could not pass up a new toy like this, so we tried it right away and also took a closer look at the models behind it.


A Beginner’s Guide to the New Multimodal Tool

Vivago 2.0 focuses on six core features: Image Generation, Image-to-Video Conversion, AI Podcasts, Special Effects Templates, Creative Community, and Trending Topics.

Let’s explore them one by one.

Image, Video, and Podcast Generation in One Place

First, let’s look at image generation, which supports both text-to-image and text-plus-reference-image modes.

For pure text-to-image generation, vivago 2.0 addresses a common problem: users not knowing how to write effective prompts.

You will notice a “Prompt Robot” button in the bottom-right corner of the prompt input field:


Click it, enter a few keywords, and it will automatically expand them into a complete, creative prompt. You can click “Use Prompt” to import the result into the input field, or choose “Cite” to modify it further.

Additionally, you can set parameters such as the number of generated images, image dimensions, and negative prompts:

Without further ado, let’s look at the results.

Generating a glass of lemon sparkling water reveals almost no AI artifacts, with impressive detail:

First-person perspective image generation is also supported, as shown below:

For text-plus-image generation (uploading a reference image), there are three settings: Full Reference, Portrait, and Redraw.

“Full Reference” automatically uses the entire image as a guide for generation; “Portrait” extracts facial features to generate images of the same person in different styles; “Redraw” re-renders the original image into various artistic styles.
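The settings described above (number of images, dimensions, negative prompt, and the three reference modes) can be pictured as one request object. The following is a minimal sketch only: the article does not document any public API for vivago 2.0, so the class names, field names, default sizes, and validation rules here are all hypothetical and exist purely to illustrate how the options relate to each other.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ReferenceMode(Enum):
    """The three reference-image settings named in the article."""
    FULL_REFERENCE = "full_reference"  # the whole image guides generation
    PORTRAIT = "portrait"              # keep the face, change the style
    REDRAW = "redraw"                  # re-render the image in a new style


@dataclass
class ImageRequest:
    """Hypothetical container for the settings the UI exposes."""
    prompt: str
    num_images: int = 1
    width: int = 1024                  # assumed default, not from the article
    height: int = 1024                 # assumed default, not from the article
    negative_prompt: str = ""
    reference_image: Optional[str] = None          # path to an uploaded image
    reference_mode: Optional[ReferenceMode] = None

    def validate(self) -> None:
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if self.num_images < 1:
            raise ValueError("num_images must be at least 1")
        if self.width <= 0 or self.height <= 0:
            raise ValueError("width and height must be positive")
        # A reference mode only makes sense together with a reference image.
        if (self.reference_image is None) != (self.reference_mode is None):
            raise ValueError("reference_image and reference_mode go together")


# Text-to-image: no reference image.
text_only = ImageRequest(prompt="a glass of lemon sparkling water", num_images=4)
text_only.validate()

# Text-plus-image: restyle an uploaded photo.
restyle = ImageRequest(
    prompt="cyberpunk style",
    reference_image="photo.png",
    reference_mode=ReferenceMode.REDRAW,
)
restyle.validate()
```

Modeling the reference mode as an enum keeps the three options explicit, and the paired check reflects the article's split between pure text-to-image and text-plus-reference-image generation.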

Text-plus-image generation handles a range of styles with ease, including photorealistic, illustration, Pixar-style, and 3D:


△ Left: Reference image; Right: Cyberpunk style conversion

The most significant feature in image generation is Image Agent, which introduces a completely new interactive format for creating images.

Within a single chat interface, users can freely express their creative ideas. Whether the request involves editing existing images or generating new ones, the Agent accurately interprets user intent based on contextual information.

Both image generation and editing can be performed in batches.

For example, if you generate an image of a puppy chasing a frisbee on grass, you can then ask to modify it into a pixel art style. Vivago 2.0 can process four images simultaneously while maintaining consistency with other elements in the original photos.


Image Agent also offers “rewrite” and “help me write” prompt functions. Users only need to express their ideas in plain language to create content.


Next, in the realm of video generation, there are two main modes: image-to-video and text-to-video.

Image-to-video can be generated from a single image or by setting start and end frames using two images.


By setting two keyframes for the start and end, users can generate smooth “transformation” style videos with a single click.


Various scenes can be transitioned seamlessly:


Vivago 2.0 also features a more convenient and efficient design.

On the image generation interface, users can directly click buttons on the generated images to initiate video creation and other operations.


Thus, the bicycle-riding image we generated earlier comes to life with a single click:


Whether it’s a realistic scene or an imaginative fantasy, Vivago 2.0 can transform any image into dynamic video with just one sentence.

For example, a dog surfing on the ocean:

Or an altered version of a static meme (“I’m crying, but the tears are from menthol ointment fumes”). Vivago 2.0 also automatically enhances the image quality.

After seeing the images and videos, let’s look at the AI Podcast feature.

The AI podcast feature is built around lip-syncing. You can either record your own voice or provide text for the AI to turn into speech.

A podcast can also be generated directly from existing images or videos.

When the text “Life is like a box of chocolates. You never know what you’re gonna get” is input, the character in the image naturally syncs their lip movements to the audio.

At the same time, the character’s body language changes in sync with the speech.

We specifically selected an image showing a profile view of a person, and the lip-syncing remains smooth and natural.


Video Link: https://mp.weixin.qq.com/s/bYNU6Mei2pq7KuFR8Ik2dQ

Beyond generation, Vivago 2.0 also offers more social and open-ended features.

More Features, Hundreds of Effects to Choose From

First and foremost are the effect templates. The official platform provides over 300 stylish templates that users can apply with a single click, allowing beginners to instantly become effect masters.