SenseTime's SenseNova V6.5 Upgrade Marks AI's Leap from Tool to Human

Author Info

Lin Mei Huang

Multimodal & Media AI Editor

M.F.A. Digital Media (RISD); former VFX pipeline technical director

Lin reports on image, video, and audio models with an eye toward rights, provenance, and creative workflows. She explains technical limits of generative media and highlights platform policy changes that affect commercial use. She collaborates with legal review on copyright-sensitive topics.

#Generative Media #Copyright & Licensing #Creative Workflows #Platform Policy

The ability to perceive and process multimodal information is a core requirement for Artificial General Intelligence (AGI) and an essential step on the path from language models toward it.

From multimodal perception and reasoning to interaction, the evolution of multimodal intelligence will drive AI’s next phase of development.

On July 27, 2025, SenseTime unveiled its new SenseNova V6.5 large model system at the WAIC 2025 Large Model Forum, titled “Boundless Love · Shaping the Future.” The forum was hosted by the Artificial Intelligence Committee of the All-China Federation of Industry and Commerce (ACFIC) and organized by SenseTime. The release is a major upgrade to the company’s multimodal foundation models, one intended to move AI from being a “productivity tool” to being “productivity” in its own right. SenseTime’s core product, Raccoon (Xiao Huan Xiong), has also completed an agent-based upgrade.

SenseTime's "SenseNova 6.5" Upgrade: Enabling AI's Leap from "Tool" to "Human"

In 1950, Alan Turing’s “Imitation Game” framed machine intelligence in terms of “human-like capabilities.” In practice, however, AI has long been confined to the role of a “tool” and has gone through periods of stagnation. In the era of large models, breakthroughs in multimodal fusion are bringing AI closer to the boundary of AGI and to a genuinely “human-like” standard.

Xu Li, Chairman and CEO of SenseTime and the first rotating chairman of the Presidium of the ACFIC Artificial Intelligence Committee, stated: “SenseTime has always sought to understand the essence of artificial intelligence. By leveraging technological innovation to unlock maximum intelligence, we are driving AI’s transition from a ‘tool’ to a ‘human,’ becoming true productivity.”

SenseNova V6.5 Refreshed: Breakthrough Upgrades Touching the “Depth of Understanding”

SenseTime’s SenseNova V6.5 multimodal foundation model introduces three major breakthrough upgrades:

  • Strong Reasoning: Image-text interleaved multimodal chain-of-thought reasoning, with performance comparable to Gemini 2.5 Pro and Claude Sonnet 4.
  • High Efficiency: An optimized multimodal architecture that improves cost-effectiveness more than threefold.
  • Agent Capabilities: A clear lead in data analysis, with support for end-to-end deployment in real scenarios that closes the loop on value creation.

By advancing from standard multimodal chain-of-thought data to synthesized image-text interleaved chain-of-thought data, SenseTime’s SenseNova V6.5 has achieved substantial improvements in multimodal reasoning and interaction performance.

SenseTime’s SenseNova V6.5 pioneers image-text interleaved chain-of-thought reasoning, bringing visual thinking into large models. It is the first commercial-grade large model in China to implement this capability.

In human cognition, visual and logical thinking are equally important; their organic integration forms comprehensive thinking abilities. As the saying goes, “a picture is worth a thousand words.” An image often triggers more effective thought than lengthy text. While current mainstream multimodal models have achieved the fusion of multiple modalities at the input stage, their reasoning processes still rely primarily on linguistic inference, leaving gaps in graphical and spatial reasoning.

The key to constructing multimodal chains of thought lies in the graphical representation of information. This is more challenging than pure text-based chains, as it requires not only presenting textual thinking processes but also generating images that serve as nodes in the reasoning chain—a task difficult to achieve at scale through manual annotation alone. SenseTime’s R&D team first constructed seed data based on an understanding of the thinking process. After supervised fine-tuning (SFT), the model initially acquired the ability to think with interleaved text and images. Subsequent rounds of reinforcement learning significantly enhanced its multimodal reasoning capabilities.
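
To make the idea of images serving as nodes in a reasoning chain concrete, here is a minimal sketch of what one interleaved training sample could look like as a data structure. The class names, the tag strings, and the example file name are hypothetical and chosen for illustration; the article does not describe SenseTime's actual data format.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class TextStep:
    """A reasoning step expressed in words."""
    text: str


@dataclass
class ImageStep:
    """A reasoning step expressed as a generated image, e.g. an annotated diagram."""
    image_ref: str  # hypothetical: path or ID of the rendered image
    caption: str    # what the image contributes to the reasoning


Step = Union[TextStep, ImageStep]


@dataclass
class InterleavedTrace:
    """One sample: a question, a mixed text/image chain, and a final answer."""
    question: str
    steps: List[Step] = field(default_factory=list)
    answer: str = ""

    def is_interleaved(self) -> bool:
        """True only if the chain contains both text and image steps."""
        kinds = {type(s) for s in self.steps}
        return TextStep in kinds and ImageStep in kinds

    def to_sequence(self) -> List[str]:
        """Flatten the chain into tagged segments (tags are illustrative)."""
        out = [f"<question>{self.question}"]
        for s in self.steps:
            if isinstance(s, TextStep):
                out.append(f"<think_text>{s.text}")
            else:
                out.append(f"<think_image>{s.image_ref}|{s.caption}")
        out.append(f"<answer>{self.answer}")
        return out


trace = InterleavedTrace(
    question="Which of the two shaded regions is larger?",
    steps=[
        TextStep("Overlay a grid so the areas can be counted."),
        ImageStep("step1_grid.png", "Figure with a unit grid drawn over it"),
        TextStep("Region A covers 12 cells, region B covers 9."),
    ],
    answer="Region A",
)
print(trace.is_interleaved())
print(len(trace.to_sequence()))
```

A structure like this also shows why manual annotation scales poorly: every `ImageStep` needs an actual rendered image, not just a description of one.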

Simultaneously, SenseTime has improved the fusion architecture of its multimodal models to promote early cross-modal integration. The new architecture employs a significantly lighter visual encoder and a deep, narrow backbone model. This design allows visual representations to align and merge with language during the early stages of feedforward computation, resulting in more efficient perception and deeper modal fusion.

Thanks to these architectural improvements, SenseTime’s SenseNova V6.5 has increased pre-training throughput by over 20%, reinforcement learning efficiency by 40%, and inference throughput by more than 35%, achieving an optimal balance between performance and cost. Compared to SenseNova V6.0, the cost-effectiveness of SenseNova V6.5 has tripled.
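
As a rough way to read the throughput figures above, the sketch below converts a fractional throughput gain into a relative cost per unit of work. It assumes hardware cost per hour is unchanged, so that cost per unit scales as the inverse of throughput, and it treats the reinforcement learning "efficiency" figure the same way; both are simplifying assumptions, and the function name is illustrative. The article does not say how the threefold cost-effectiveness figure is derived, so the sketch does not try to reproduce it.

```python
def relative_unit_cost(throughput_gain: float) -> float:
    """Cost per unit of work relative to the baseline.

    Assumes fixed hardware cost per hour, so cost per unit = 1 / (1 + gain).
    """
    if throughput_gain <= -1.0:
        raise ValueError("throughput gain must be greater than -100%")
    return 1.0 / (1.0 + throughput_gain)


# Gains as reported in the article; two of them are stated as lower bounds
# ("over 20%", "more than 35%"), so the real savings may be larger.
reported = {
    "pre-training": 0.20,
    "reinforcement learning": 0.40,
    "inference": 0.35,
}

for stage, gain in reported.items():
    cost = relative_unit_cost(gain)
    print(f"{stage}: {cost:.2f}x baseline cost per unit ({1 - cost:.0%} lower)")
```

Under these assumptions, a 35% inference throughput gain corresponds to roughly a quarter less cost per unit of work.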

AI as a Productivity Driver: SenseTime’s Raccoon Agent Takes Center Stage in Office Work

Large language models have become auxiliary tools for many professionals today. However, relying solely on large language models is insufficient to elevate AI from a mere “tool” to an autonomous “agent.”

Human daily tasks inherently involve processing multimodal information, including text, images, video, and web pages. The key transition from a productivity tool to actual productivity lies in the ability to input, process, and output this multimodal data.

Drawing on the multimodal data analysis capabilities of the SenseNova V6.5 model, SenseTime’s Raccoon agent has been comprehensively upgraded. It can now handle complex multimodal inputs, perform deep fused analysis across modalities, and deliver professional visual outputs. This brings “AI productivity” to office scenarios, moving AI from a “productivity tool” to actual “productivity.”

SenseTime Raccoon also maintains world-leading capabilities in complex data analysis. In comprehensive tests on customer scenarios, Raccoon performed on par with Claude Opus 4, an international benchmark for data analysis and AI agents, and significantly outperformed models such as OpenAI’s o3. Its accuracy approaches 100% on tasks involving time-series calculations, data matching, mathematical computations, and anomaly detection.

In real-world office environments, data input formats are highly complex. In data analysis scenarios, documents come in various forms such as screenshots, Word files, and PDFs, with structured information and tables accounting for only about 70% of the content. Even seemingly basic Excel spreadsheets often contain complex elements like merged cells, missing values, nested sub-tables, and embedded charts, significantly increasing processing difficulty.

SenseTime Raccoon takes a multimodal approach to achieve holistic analysis. Through chain-of-thought reasoning, it works through multiple steps of thinking and reflection before outputting structured results.

A table may look simple while the logical and causal relationships beneath it are intricate. SenseTime Raccoon untangles these complex tables for users.

When a user uploads a complex Excel file containing merged cells, missing values, sub-tables, embedded charts, and external images, Raccoon accurately parses the content, establishes logical connections between sub-tables, and generates a complete analysis report.
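
To illustrate why merged cells and missing values make parsing harder, here is a minimal sketch of one generic cleanup step. Spreadsheet readers commonly return a merged cell's value only in its first row and leave the rows below it empty, so an empty label cell and a genuinely missing number look identical at first. The function names and sample data are hypothetical, and this is a standard forward-fill technique, not a description of how Raccoon works internally.

```python
from typing import List, Optional, Tuple

Row = List[Optional[str]]


def fill_merged_labels(rows: List[Row], label_cols: List[int]) -> List[Row]:
    """Propagate a merged cell's value down the rows it spans.

    Only columns listed in label_cols are filled, so blanks in other
    columns stay visible as genuinely missing values.
    """
    filled: List[Row] = []
    last_seen = {c: None for c in label_cols}
    for row in rows:
        new_row = list(row)  # copy, so the input is not mutated
        for c in label_cols:
            if new_row[c] is None:
                new_row[c] = last_seen[c]
            else:
                last_seen[c] = new_row[c]
        filled.append(new_row)
    return filled


def missing_cells(rows: List[Row], value_cols: List[int]) -> List[Tuple[int, int]]:
    """Return (row, column) positions of empty cells in the value columns."""
    return [(r, c) for r, row in enumerate(rows) for c in value_cols if row[c] is None]


raw = [
    ["East", "Q1", "120"],
    [None, "Q2", None],    # region cell merged with the row above; sales value missing
    ["West", "Q1", "95"],
    [None, "Q2", "101"],   # region cell merged with the row above
]

clean = fill_merged_labels(raw, label_cols=[0])
gaps = missing_cells(clean, value_cols=[2])
print(clean)
print(gaps)
```

Keeping the two cases separate matters: filling a missing sales figure from the row above would silently invent data, while leaving a merged region label empty would orphan the row.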

In another complex input scenario, a small business owner might come across useful tabular data while browsing a video platform such as Douyin (the Chinese counterpart of TikTok). After the user takes a screenshot and uploads it, Raccoon decomposes the task using the image information, filters out noise, and extracts the table data, which the user can then export as an editable Excel file with one click. From input through analysis to output, multimodal capabilities keep the process running smoothly.
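
The filter-and-export part of this flow can be sketched in a few lines. The example assumes a text-recognition step has already turned the screenshot into rows of cell strings (that step is not shown), uses a simple column-count rule as the noise filter, and writes CSV as a stand-in for an Excel export. The function name, the sample rows, and the filtering rule are all illustrative assumptions.

```python
import csv
import io
from typing import List


def rows_to_csv(ocr_rows: List[List[str]], n_cols: int) -> str:
    """Keep only rows with the expected number of cells and serialize them as CSV."""
    table = [
        [cell.strip() for cell in row]
        for row in ocr_rows
        if len(row) == n_cols  # rows of any other width are treated as noise
    ]
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(table)
    return buf.getvalue()


# Hypothetical recognized rows from a video screenshot.
ocr_rows = [
    ["Follow for more tips!"],          # overlay text from the video frame
    ["Product", "Units", "Price"],
    ["Tea set ", "40", "129"],
    ["12:30", "5G"],                    # status-bar debris
    ["Mug", "85", "35"],
]

csv_text = rows_to_csv(ocr_rows, n_cols=3)
print(csv_text)
```

A real screenshot would need a more robust filter, since overlay text can happen to split into the same number of cells as a table row.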

Traditional AI tools mostly play an auxiliary role, with the core work still driven by the user. SenseTime Raccoon changes this interaction paradigm: the AI proactively takes on the core tasks and confirms key information through precise questions, so that the interaction resembles professional collaboration between colleagues.

The newly launched task planning feature offers an intuitive interaction mode. Take the recent surge in popularity of “Su Chao” (the Jiangsu city football league) as an example:

When a user uploads an image or table and asks for an analysis of the league’s top players, Raccoon automatically gathers online information and draws on expert knowledge to generate a task list (such as defining the criteria for “Top 5” players or analyzing youth academy performance). It then carries out a systematic analysis and produces a high-quality document that can be exported to editable formats such as Excel, PPT, or HTML.

Breaking the process down into steps: on receiving a task, Raccoon proactively organizes the details and asks clear questions at key checkpoints (e.g., “Should we proceed according to points 1, 2, and 3?”) to make sure the direction is right. The result is an efficient model in which “AI handles the work while users make decisions.”

Next, it generates a task list based on expert knowledge (such as defining the “Top 5” criteria or analyzing youth academy performance) and works through it systematically, so that the next steps and possible ways of collaborating are clear at a glance.

Professional data integration and tool invocation then ensure high-quality content generation.

Finally, it generates a high-quality analysis document that can be exported to editable formats such as Excel, PPT, or HTML.
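
The interaction pattern in this walkthrough, where the agent does the work and pauses for a decision at key checkpoints, can be sketched as a simple loop. The `Task` class, the `run_plan` function, and the callback signatures are hypothetical names used for illustration; they are not Raccoon's API, and a real agent would generate the plan rather than take it as a fixed list.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    description: str
    needs_confirmation: bool = False  # True marks a checkpoint for the user


def run_plan(
    tasks: List[Task],
    confirm: Callable[[str], bool],
    execute: Callable[[str], str],
) -> List[str]:
    """Run tasks in order, pausing for user approval at marked checkpoints.

    Stops at the first declined checkpoint and returns the results so far.
    """
    results: List[str] = []
    for task in tasks:
        if task.needs_confirmation and not confirm(task.description):
            break
        results.append(execute(task.description))
    return results


plan = [
    Task("Define the criteria for a 'Top 5' player", needs_confirmation=True),
    Task("Gather player statistics"),
    Task("Analyze youth academy performance", needs_confirmation=True),
    Task("Export the report"),
]

# A user who approves every checkpoint.
approved = run_plan(plan, confirm=lambda q: True, execute=lambda d: f"done: {d}")

# A user who declines the youth academy analysis.
declined = run_plan(plan, confirm=lambda q: "youth" not in q, execute=lambda d: f"done: {d}")

print(len(approved), len(declined))
```

Marking only some tasks as checkpoints is the design choice that keeps the user in a decision-making role without asking for approval at every step.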

With its strong capability for complex tasks, SenseTime Raccoon is expanding into more industries. This update introduces specialized editions for two sectors: education and finance.

The SenseTime Raccoon Education Edition analyzes student performance, course effectiveness, and learning behavior patterns. It currently serves over 500 institutions across more than 10 educational scenarios, reaching over 250,000 teachers and students. It helps improve student learning efficiency by 15–30%, helps teaching research teams reduce academic anxiety rates by 40% in multiple schools, increases classroom participation by 2.1 times, reduces resource mismatch rates by 30%, and improves the timeliness of mental health interventions by 50%.

The SenseTime Raccoon Finance Edition provides financial institutions with knowledge assistants, intelligent data querying tools, and multimodal smart claims solutions, establishing a new paradigm of “human-machine collaborative” intelligent decision-making in the financial sector.

To date, the “SenseTime Raccoon Family” product matrix serves enterprises across multiple industries, and its user base exceeds 10 million.

By activating AI productivity through multimodal technology, the SenseTime SenseNova large model will continue to evolve, working with industry partners on the next stage of AI advancement and accelerating the journey toward the era of Artificial General Intelligence (AGI).