Here it is, here it comes!
The next-generation flagship model, Qwen3-Max, has officially arrived—with a perfect score.
For the first time, a Chinese-developed large language model has achieved a perfect 100% on both the AIME25 and HMMT mathematics benchmarks!

Consistent with the recently released Qwen3-Max-Preview, the parameter count remains at the trillion scale.
However, this official release introduces a version split:
- Instruct Version
- Thinking Version
Qwen3-Max also demonstrates performance improvements across both emotional and cognitive intelligence metrics.
The perfect mathematics score mentioned earlier was achieved by the Thinking Version.
Meanwhile, the Instruct Version scored 69.6 on SWE-Bench (which tests a model's ability to resolve real-world software engineering issues through code), placing it in the global top tier.
It also scored 74.8 on the Tau2 Bench test (evaluating Agent tool-calling capabilities), surpassing Claude Opus 4 and DeepSeek V3.1.

It is indeed quite powerful.
To be fair, though, Qwen3-Max was only the headliner: at the recent Apsara Conference, the Tongyi team unveiled plenty of other "stars" as well.
Visual: Major Open-Source Release of Qwen3-VL
The first of these "stars" to follow Qwen3-Max is the visual understanding model, Qwen3-VL.
It was actually open-sourced early this morning, making it fresh off the press, but it has certainly been one of the most anticipated releases.

Specifically, this model is named Qwen3-VL-235B-A22B, and it also comes in both Instruct and Reasoning versions.
The Instruct version matches or exceeds Gemini 2.5 Pro across multiple mainstream visual perception benchmarks, while the Reasoning version delivers SOTA (state-of-the-art) results on a range of multimodal reasoning benchmarks.

Additionally, the Instruct version of Qwen3-VL-235B-A22B supports visual reasoning (reasoning with images), showing improved scores across four benchmark tests.

Upon seeing these results, netizens exclaimed:
Qwen3-VL is truly a monster (too powerful).

Real-world test results are now available.
For example, when fed a hand-drawn webpage sketch, Qwen3-VL quickly generates the corresponding HTML and CSS:

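To give a sense of how such a request might look in practice, here is a minimal sketch that sends a mockup image plus an instruction to a Qwen VL model through an OpenAI-compatible chat endpoint; the base URL and model identifier are assumptions for illustration, not details confirmed by this article.

```python
# A minimal sketch of sending a hand-drawn mockup to a Qwen VL model through an
# OpenAI-compatible chat endpoint. The base_url and model name below are
# assumptions for illustration; substitute the values from your own provider.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",                                        # assumption: your own key
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",  # assumption: provider endpoint
)

response = client.chat.completions.create(
    model="qwen3-vl-235b-a22b-instruct",  # hypothetical identifier, not confirmed by the article
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/hand_drawn_sketch.png"}},
                {"type": "text",
                 "text": "Turn this hand-drawn sketch into a single HTML file with inline CSS."},
            ],
        }
    ],
)

print(response.choices[0].message.content)  # the generated HTML/CSS
```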
Or consider this image:

With the following task for Qwen3-VL:
Identify all instances belonging to the following categories: “head, hand, male, female, glasses.” Report bounding box coordinates in JSON format.
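The model is expected to answer with structured JSON along the lines of the sketch below; the field names and the [x1, y1, x2, y2] pixel convention are illustrative assumptions rather than a documented schema.

```python
# Illustrative shape of the detection output requested above. The field names
# ("label", "bbox_2d") and the [x1, y1, x2, y2] pixel convention are assumptions
# made for the sake of the example, not a documented schema.
import json

example_output = json.loads("""
[
  {"label": "head",    "bbox_2d": [412, 88, 530, 214]},
  {"label": "hand",    "bbox_2d": [655, 402, 710, 468]},
  {"label": "male",    "bbox_2d": [380, 70, 690, 820]},
  {"label": "glasses", "bbox_2d": [433, 120, 512, 150]}
]
""")

for obj in example_output:
    x1, y1, x2, y2 = obj["bbox_2d"]
    print(f'{obj["label"]}: box of {x2 - x1}x{y2 - y1} px at ({x1}, {y1})')
```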

Qwen3-VL also handles more complex video understanding tasks with ease:

You can learn more through the video below:

Video link: https://mp.weixin.qq.com/s/nkNXwpDxxvFVleQ3yB-g5w
From a technical perspective, Qwen3-VL retains its native dynamic-resolution design while updating several architectural components.

First, it adopts MRoPE-Interleave. The original MRoPE divided dimensions in the order of time (t), height (h), and width (w), concentrating temporal information in high-frequency dimensions. Qwen3-VL interleaves t, h, and w to achieve full-frequency coverage, enhancing long-video understanding while maintaining image comprehension capabilities.
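As a rough intuition for the difference, the toy sketch below contrasts the original blocked layout with the interleaved one; the number of rotary dimension pairs is arbitrary and chosen only to keep the printout short.

```python
# A toy illustration of the difference between the original "blocked" MRoPE
# layout and the interleaved layout described above. Each rotary frequency
# index is assigned to one of the three axes (t, h, w).
def blocked_layout(num_pairs: int) -> list[str]:
    # Original MRoPE: contiguous chunks, so temporal info sits in one frequency band.
    third = num_pairs // 3
    return ["t"] * third + ["h"] * third + ["w"] * (num_pairs - 2 * third)

def interleaved_layout(num_pairs: int) -> list[str]:
    # MRoPE-Interleave: cycle t, h, w so every axis spans low and high frequencies.
    return ["thw"[i % 3] for i in range(num_pairs)]

if __name__ == "__main__":
    n = 24
    print("blocked:    ", "".join(blocked_layout(n)))
    print("interleaved:", "".join(interleaved_layout(n)))
```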
Second, it introduces DeepStack, which fuses multi-layer features from ViT to enhance visual detail capture and text-image alignment. The team expanded single-layer injection of visual tokens into the LLM to multi-layer injection, optimizing feature tokenization by tokenizing outputs from different ViT layers separately before inputting them into the model. This preserves multi-level visual information from low to high levels. Experiments show this design significantly improves performance across various visual understanding tasks.
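The schematic sketch below illustrates the multi-layer injection idea; the tensor shapes, ViT layer choices, and injection points are made-up placeholders, not the actual Qwen3-VL configuration.

```python
# A schematic numpy sketch of multi-layer visual-token injection ("DeepStack"):
# features captured from several ViT depths are each projected and added to the
# LLM's hidden states at different LLM layers, instead of being injected once.
# Shapes and layer indices are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(0)
num_vis_tokens, vit_dim, llm_dim, llm_layers = 16, 64, 128, 8

# Features captured from three different ViT depths (stand-in random tensors).
vit_features = {8: rng.normal(size=(num_vis_tokens, vit_dim)),
                16: rng.normal(size=(num_vis_tokens, vit_dim)),
                24: rng.normal(size=(num_vis_tokens, vit_dim))}
# Which LLM layer each captured ViT level is injected into.
inject_at = {8: 1, 16: 3, 24: 5}
# One projection per captured ViT level (ViT dim -> LLM hidden dim).
proj = {k: rng.normal(size=(vit_dim, llm_dim)) * 0.02 for k in vit_features}

hidden = rng.normal(size=(num_vis_tokens, llm_dim))  # visual-token hidden states
for layer in range(llm_layers):
    # ... the real transformer block for this layer would run here ...
    for vit_layer, target_layer in inject_at.items():
        if layer == target_layer:
            # Fuse this ViT level's detail into the stream at its designated layer.
            hidden = hidden + vit_features[vit_layer] @ proj[vit_layer]

print(hidden.shape)  # (16, 128)
```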
Third, video temporal modeling has been upgraded from T-RoPE to a text-timestamp alignment mechanism. By interleaving “timestamps” with “video frames,” it achieves fine-grained alignment between frame-level time and visual content, natively supporting both “seconds” and “HMS” output formats. This improves the model’s semantic perception and temporal accuracy in complex sequential tasks such as event localization, action boundary detection, and cross-modal temporal question answering.
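A toy sketch of that interleaving is shown below; the token strings and formats are illustrative stand-ins, not Qwen3-VL's real vocabulary.

```python
# A toy sketch of text-timestamp alignment: timestamp tokens are interleaved
# with frame tokens so each frame is explicitly tied to a point in time, and
# the timestamp can be rendered in seconds or HMS form. The token strings
# below are illustrative stand-ins.
def to_hms(seconds: float) -> str:
    s = int(round(seconds))
    return f"{s // 3600:02d}:{(s % 3600) // 60:02d}:{s % 60:02d}"

def interleave_timestamps(frame_times: list[float], fmt: str = "seconds") -> list[str]:
    tokens = []
    for t in frame_times:
        stamp = f"{t:.1f}s" if fmt == "seconds" else to_hms(t)
        tokens.append(f"<{stamp}>")   # timestamp token precedes its frame
        tokens.append("<frame>")      # placeholder for the frame's visual tokens
    return tokens

print(interleave_timestamps([0.0, 2.5, 5.0]))
print(interleave_timestamps([0.0, 75.0, 3605.0], fmt="hms"))
```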
Full-Modal: Open-Source Release of Qwen3-Omni
Although Qwen3-Omni was open-sourced early yesterday morning, it also made its debut at the Apsara Conference, highlighting its full-modal capabilities.
It is the first native end-to-end full-modal AI model, unifying text, images, audio, and video within a single architecture, achieving SOTA levels across 22 audio-video benchmarks.
The currently open-sourced versions include:
- Qwen3-Omni-30B-A3B-Instruct
- Qwen3-Omni-30B-A3B-Thinking
- Qwen3-Omni-30B-A3B-Captioner
Furthermore, several specialized large models derived from Qwen3-Omni have been developed.
For instance, the newly released Qwen3-LiveTranslate is one such model—a visual, auditory, and vocal full-modal simultaneous interpretation system.
It currently supports offline and real-time audio-video translation across 18 languages.
Public test results show that Qwen3-LiveTranslate-Flash has surpassed models like Gemini-2.5-Flash and GPT-4o-Audio-Preview in accuracy:

Even in noisy environments, Qwen3-LiveTranslate-Flash maintains robust performance:

To see it in action, check out the demonstration below:

Video link: https://mp.weixin.qq.com/s/nkNXwpDxxvFVleQ3yB-g5w
Original English: What is mask? This is mask. This is mask. This is mask. This is Musk.
Before Visual Enhancement: What is a face mask? This is a face mask, this is a face mask, this is a face mask, this is a face mask.
After Visual Enhancement: What is a mask? This is a facial mask, this is a face mask, this is a theatrical mask, this is Musk.
Netizens were visibly stunned:
It’s getting a bit creepy now.

Beyond translation, the new version of Qwen3-Image-Edit—nicknamed “Qwen Banana”—is also a fascinating model.
It supports multi-image fusion, offering various combinations such as “person + person,” “person + product,” and “person + scene.” It has also enhanced single-image consistency for people, products, and text.
Moreover, it natively supports ControlNet, allowing users to change character poses via keypoint maps and easily fulfill outfit-changing requirements.

Coding: Upgrade of Qwen3-Coder
The newly upgraded Qwen3-Coder-Plus takes a combined approach: it is jointly trained with the Qwen Code and Claude Code systems.
This approach has significantly boosted its performance; compared to previous versions, scores have increased across all benchmark tests:

At the same time, its associated coding product, Qwen Code, has also been upgraded to support multimodal models and sub-agents.
In other words, you can now input images when using Qwen Code:

Netizens have already begun real-world testing. The 3D pagoda generated by Qwen3-Coder-Plus looks like this:

The Endgame for Qwen Is Not Just Open Source
Finally, to sum up the highlights of this Apsara Conference:
First, over the past two days, Alibaba's Tongyi Qianwen has released and open-sourced nearly ten models of varying sizes in quick succession, leaving industry insiders at home and abroad marveling at Alibaba Cloud's open-source pace.

That said, after listening to the speech by Wu Yongming, Chairman and CEO of Alibaba Cloud Intelligence Group, we realized that Tongyi Qianwen’s ambitions extend far beyond this.
Wu stated that achieving Artificial General Intelligence (AGI) is already a certainty, but this is merely the starting point; the ultimate goal is to develop Artificial Superintelligence (ASI) capable of self-iteration and comprehensive superiority over humans.
To achieve ASI, Wu outlined a path beginning with the internet, progressing through four stages:
The first stage is the emergence of intelligence (learning from humans), followed by autonomous action (assisting humans), then self-iteration (surpassing humans), and finally, Artificial Superintelligence (ASI).

Furthermore, Wu offered a forward-looking perspective:
Large models will become the next-generation operating system, natural language will be the source code of the future, and AI Cloud will be the next-generation computer.
In the future, there may only be 5–6 super cloud computing platforms worldwide.

However, one point must be clear: The stronger the AI, the stronger humanity becomes.
One More Thing
Oh, by the way, Tongyi Qianwen’s new generation of foundational model architecture—Qwen3-Next—was officially released today!
Its total parameter count is approximately 80B. However, with only 3B parameters activated, its performance rivals that of Qwen3-235B.
In terms of computational efficiency, it is simply on another level.
Compared to the dense Qwen3-32B, its training cost drops by over 90%, while long-context inference throughput increases more than tenfold.
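As a rough sanity check on those figures, here is a back-of-envelope calculation under the simplifying assumption that per-token compute scales with the number of activated parameters (real training cost depends on many other factors).

```python
# Back-of-envelope arithmetic for the sparsity figures quoted above, under the
# common assumption that per-token compute scales roughly with the number of
# activated parameters. This is a rough illustration, not an official analysis.
total_params = 80e9     # ~80B total parameters
active_params = 3e9     # ~3B activated per token
dense_baseline = 32e9   # dense Qwen3-32B used for comparison

print(f"activated fraction: {active_params / total_params:.1%}")                   # ~3.8%
print(f"per-token compute vs dense 32B: ~{active_params / dense_baseline:.2f}x")   # ~0.09x
```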
It must be said that the future of large model training and inference efficiency is about to become much more interesting.