Baichuan's New Model Tops Chinese Benchmark; First AI Assistant 'Baixiaoying' Launched as 'Most Search-Savvy'

Models & Benchmarks · Published: May 23, 2024 · David Kowalski · ~12 min read



The seeds planted by Wang Xiaochuan during the search era have blossomed again in the age of large language models.

His startup, Baichuan Intelligence, has just released its first AI application for consumers: Baixiaoying.

At first glance, it appears to be another mainstream AI assistant. However, the company emphasizes that this assistant is unique because it understands search and employs guided questioning techniques.

Indeed, combining “search” with “Wang Xiaochuan” naturally sparks curiosity.

Baixiaoying can answer user questions at any time, rapidly read documents, organize materials, and assist in content creation. It also possesses capabilities such as multi-turn search and targeted search, enabling it to more accurately understand and meet user needs.

Baichuan explained that equipping the model with professional search skills is intended to “provide users with professional, rich knowledge and resources.”

Furthermore, it supports voice interaction.

[Figure 1]

We have tested the app. Users can now download Baixiaoying from the iOS App Store, Android app stores, or Baichuan Intelligence’s official website. It is also free to use on the web at Ying.ai.

Behind Baixiaoying lies Baichuan 4, Baichuan Intelligence’s newly unveiled next-generation foundational large model. The ability to interact via voice hints that this new model possesses multimodal capabilities.

Immediately upon release, Baichuan 4 was entered into SuperCLUE (a comprehensive Chinese evaluation benchmark for general-purpose large models). It set a new record among Chinese-developed models with a total score of 80.64 and narrowly beat GPT-4-Turbo-0125 by 1.51 points in the comprehensive Chinese ability test.

[Figure 2]

By unveiling Baichuan 4 alongside Baixiaoying—an AI assistant that understands search and asks questions—Baichuan Intelligence has taken another step toward its planned super model and super application. This year, the company broke from its previous monthly update rhythm, holding back a major release…

How to Use Baixiaoying?

Baixiaoying is Baichuan Intelligence’s first AI application launched since its establishment over a year ago.

Like most AI assistants, it is a generalist, possessing foundational capabilities in long-text reading and multimodal understanding.

However, let us first examine the feature that distinguishes Baixiaoying from all other AI assistant products: it understands search and knows how to ask questions.

What does “understanding search” entail? By combining Baichuan 4’s general capabilities with Baichuan’s search technology, Baixiaoying shows this proficiency in three ways, according to the company.

First, it performs targeted searches.

When given a question, Baixiaoying precisely identifies the domain of the problem and directly extracts key information from authoritative sources to enrich its output.

The focus is on speed and accuracy.

Second, it supports multi-turn searches.

Baixiaoying breaks each question down step by step to determine what the user is really looking for before giving a focused answer.

Compared to single-turn search, multi-turn search effectively gathers more professional and in-depth information in complex scenarios such as market research and industry analysis.
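To make the contrast with single-turn search concrete, here is a minimal sketch of a multi-turn loop. Everything in it is an assumption made for illustration: `decompose`, `toy_search`, and the tiny in-memory corpus are hypothetical stand-ins, not Baichuan's implementation or API.

```python
from typing import Callable

# Hypothetical in-memory corpus standing in for a real search backend.
CORPUS = {
    "market size": "Doc A: estimated market size figures.",
    "competitors": "Doc B: list of major competitors.",
    "pricing": "Doc C: typical pricing tiers.",
}

# Aspects a broad research question is split into (assumed, fixed for the demo).
ASPECTS = ("market size", "competitors", "pricing")


def toy_search(query: str) -> list[str]:
    """Return corpus entries whose key phrase appears in the query."""
    lowered = query.lower()
    return [text for key, text in CORPUS.items() if key in lowered]


def decompose(question: str) -> list[str]:
    """Hypothetical decomposition step: one sub-query per aspect.

    A real assistant would more likely ask the model to propose sub-queries.
    """
    return [f"{question} {aspect}" for aspect in ASPECTS]


def single_turn_search(question: str, search: Callable[[str], list[str]]) -> list[str]:
    """Issue the user's question once, unchanged."""
    return search(question)


def multi_turn_search(question: str, search: Callable[[str], list[str]]) -> list[str]:
    """Run one search per sub-query and collect de-duplicated evidence in order."""
    evidence: list[str] = []
    for sub_query in decompose(question):
        for hit in search(sub_query):
            if hit not in evidence:
                evidence.append(hit)
    return evidence


results = multi_turn_search("EV charging industry", toy_search)
for line in results:
    print(line)
```

In this toy setup the single-turn call finds nothing, because the bare question matches no key phrase, while the decomposed queries each surface one document. That is the intuition behind using several rounds for market research or industry analysis.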

[Figure 3]

Third, it provides embedded search results.

Simply put, unlike other AIs that merely “summarize webpage information” after a single search call, Baixiaoying directly applies search results as viewpoints and arguments within its Q&A responses.

“Something like Perplexity is called summarizing search results. We believe that direction should be the work of Search 2.0, which search engine companies can handle themselves; it is not what we aim to do.”

[Figure 4]

Additionally, when outputting information, Baixiaoying emphasizes structured output.

It presents key information in a format of description plus tables, making answers clearer and highlighting main points at a glance.

[Figure 5]

Together, multi-turn, targeted, and embedded search produce an assistant that understands search.

The team explicitly acknowledges that integrating models with search technology can indeed improve answer accuracy and reduce hallucinations. However, technology alone is insufficient; product design must provide supporting solutions.

From initial hands-on experience, the benefits of “understanding search” include improving the accuracy and timeliness of AI outputs while reducing hallucinations. It also enriches answers with viewpoints, case studies, and data, making model responses more comprehensive and vivid.

[Figure 6]

Having discussed Baixiaoying’s search capabilities, let us look at how it “asks questions.”

Users who are not professional prompt engineers typically describe their needs vaguely and imprecisely when talking to an AI.

Baixiaoying’s questioning feature is designed for this scenario. Based on the user’s initial query, it guides users step-by-step through questioning, helping them clearly articulate their requirements.
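One simple way to picture guided questioning is slot filling: the assistant knows which details it needs and asks only for the ones the user has not yet supplied. The sketch below assumes a hypothetical trip-planning task with three made-up slots and scripted user replies. It illustrates the idea only and is not how Baixiaoying is built.

```python
# Hypothetical details the assistant wants before answering a trip-planning request.
REQUIRED_SLOTS = {
    "destination": "Where would you like to go?",
    "duration": "How many days is the trip?",
    "budget": "What is your approximate budget?",
}


def guided_dialogue(
    initial: dict[str, str],
    scripted_answers: dict[str, str],
) -> tuple[dict[str, str], list[str]]:
    """Ask one clarifying question per missing slot, in a fixed order.

    `scripted_answers` simulates the user's replies so the example runs
    without interactive input.
    """
    known = dict(initial)
    asked: list[str] = []
    for slot, question in REQUIRED_SLOTS.items():
        if known.get(slot):
            continue  # the user already provided this detail
        asked.append(question)
        known[slot] = scripted_answers.get(slot, "unspecified")
    return known, asked


# The vague opening request only names a destination.
requirements, questions = guided_dialogue(
    {"destination": "Chengdu"},
    {"duration": "3 days", "budget": "5000 RMB"},
)
for q in questions:
    print("Assistant asks:", q)
print("Final requirements:", requirements)
```

Here the assistant skips the destination question, since that was already given, and asks only about duration and budget before answering.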

[Figure 7]

These design philosophies and technical implementations for Baixiaoying aim to lower the barrier to entry for ordinary people using AI assistants, making it very user-friendly.

Ultimately, however, the goal is to provide answers that users truly find useful.

As mentioned earlier, Baixiaoying handles long-text reading and multimodal understanding with ease. Below are a few showcases; you can try them out yourself (finding bugs and testing limits is perhaps the most anticipated activity in the era of large models).

The multimodal test results were quite good; it accurately identified a half-face sculptural bust in a museum:

[Figure 8]

Its long-text capability allowed it to pass the test of reading financial reports smoothly:

[Figure 9]

After extended hands-on use, Baixiaoying appears suited to many scenarios, including daily conversation, office work, search and learning, and multimodal recognition.

However, Wang Xiaochuan, Baichuan Intelligence’s founder and CEO, stated plainly that this is not the “super application” he had previously said the company would launch.

Currently, there are neither super models nor super applications in the market.

In his words, Baixiaoying is currently an AI assistant, serving as an intermediate stage where user applications transform from “tools” into “partners” in the age of large models.

The entire process is one of gradual development, meeting user needs step by step.

[Figure 10]

Behind the Scenes: Baichuan 4 Takes the Top Spot in Its Debut

As mentioned at the beginning, behind Baixiaoying is Baichuan 4, the latest version of Baichuan Intelligence’s Baichuan series models.

It is also the company’s first multimodal model since entering the large model arena.

Compared to its predecessor, Baichuan 3 (released in late January this year), Baichuan 4 shows significant improvements across various capabilities.

Specifically, instruction following improved by 20%, information understanding by 9%, knowledge Q&A by 15%, creation by 16%, and logical reasoning by 15%. In specialized abilities, mathematics improved by 14% and coding by 9%.

In its debut match on the SuperCLUE comprehensive benchmark, which has long been dominated by OpenAI, Baichuan 4 took first place:

With a total score of 80.64, it surpassed the previous top-ranked model, GPT-4-Turbo-0125, by 1.51 points.

It was indeed a narrow victory…

However, although the margin is small, in the era of large models, even a difference of 0.01 points is considered precious.

[Figure 11]

How was this achieved?

During training, Baichuan 4 introduced several technical optimizations, including data filtering and refinement that combine model-based and human-based methods. It also applied a principled scaling law for positional encoding in long-text modeling, improving how effectively the model uses its data.

In the alignment phase, the team focused on optimizing Baichuan 4’s Reasoning, Planning, and Instruction Following capabilities. This was achieved through loss-driven data selection and training, multi-stage progressive improvement, and multi-model parameter fusion.

Furthermore, the team proposed a Sequential Preference Optimization (SPO) method during this stage. By sequentially fine-tuning LLMs to align with multiple dimensions of human preference, key metrics and model stability were significantly improved.
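The article does not describe SPO's internals, so the following is only a toy illustration of the general idea of aligning to preference dimensions one after another. It swaps in the standard DPO pairwise loss as the per-stage objective, treats each response's log-probability as a free parameter rather than the output of a network, and uses made-up dimension names and response labels. It should not be read as Baichuan's method, and it omits whatever SPO does to keep earlier dimensions from regressing.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.5) -> float:
    """Standard DPO pairwise loss on log-probabilities: -log(sigmoid(margin))."""
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    return math.log1p(math.exp(-margin))


def train_stage(policy: dict[str, float], reference: dict[str, float],
                pairs: list[tuple[str, str]], beta: float = 0.5,
                lr: float = 1.0, steps: int = 50) -> None:
    """Gradient descent on the DPO loss for one preference dimension (in place)."""
    for _ in range(steps):
        for chosen, rejected in pairs:
            margin = beta * ((policy[chosen] - reference[chosen])
                             - (policy[rejected] - reference[rejected]))
            grad_scale = beta * sigmoid(-margin)  # magnitude of dLoss/dlogp
            policy[chosen] += lr * grad_scale     # raise the preferred response
            policy[rejected] -= lr * grad_scale   # lower the dispreferred one


# Toy log-probabilities; the frozen reference starts identical to the policy.
reference = {"helpful_reply": -1.0, "unhelpful_reply": -1.0,
             "safe_reply": -1.0, "unsafe_reply": -1.0}
policy = dict(reference)

# Hypothetical preference dimensions, handled strictly in sequence.
stages = [
    ("helpfulness", [("helpful_reply", "unhelpful_reply")]),
    ("harmlessness", [("safe_reply", "unsafe_reply")]),
]
for name, pairs in stages:
    train_stage(policy, reference, pairs)
    chosen, rejected = pairs[0]
    loss = dpo_loss(policy[chosen], policy[rejected],
                    reference[chosen], reference[rejected])
    print(f"after {name} stage: loss = {loss:.4f}")
```

Each stage starts from the policy left by the previous one, which is the sequential part. In a real model the stages would share parameters, so later stages could disturb earlier gains. Handling that interaction is presumably where a method like SPO does its real work.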

They also achieved a breakthrough in integrating RLHF and RLAIF through RLxF, a reinforcement learning alignment technique, which greatly improved the model’s instruction following.

Additionally, Baichuan 4 possesses industry-leading multimodal capabilities, performing excellently on evaluation benchmarks such as MMMU, MMBench-EN, CMMMU, MMBench-CN, and MathVista, outperforming multimodal models like Gemini Pro and Claude 3 Sonnet.

[Figure 12]

However, Baichuan 4 did not take the open-source route this time.

Wang Xiaochuan addressed the decision openly: “Last year, we took the lead in open-sourcing our models as a pledge of commitment when the Baichuan team entered the large model space. At that time, the domestic open-source environment was very immature. Our open-source initiative made an important contribution to the domestic open-source industry. Now, there are many players competing in the open-source field.”

“Friends, you have to trust the market’s self-regulating mechanisms,” Wang Xiaochuan said.

Although Baichuan 4 is closed-source, it remains available through an API.

Alongside the public launch of the new foundation model, Baichuan simultaneously released four APIs: Baichuan 4, Baichuan3-Turbo, Baichuan3-Turbo-128k, and the Assistant API.

They are also divided into Flagship and Professional tiers. The Flagship tier fully opens all capabilities of Baichuan 4; the Professional tier offers Baichuan3-Turbo, which is more affordable than the Flagship version, performs better than Baichuan 2, and has been specifically optimized for high-frequency enterprise application scenarios.

Interestingly, although the Assistant API is also open for free trial by enterprise users, Baichuan’s stance on the recent intense large model price war was clear:

What? Price wars? We decline.

[Figure 13]

First, Wang Xiaochuan clarified that the company’s primary focus is the To-C (consumer-facing) market, so the price wrestling among cloud providers has little impact on Baichuan.

Second, he takes a firm position: intense competition is inevitable in the future, but current tactics are too aggressive. “In the Chinese market, providing API services is not a viable path for startup companies.”

Wang Xiaochuan further elaborated on his viewpoint:

“Looking at it purely from a business perspective, in China’s current commercial environment the To-B market is roughly ten times smaller than the To-C market. No such disparity exists in the United States.

“Second, when you analyze the numbers, you find that revenue is collected in RMB while computing power costs are incurred in USD. This is another significant difference between the Chinese and American API service markets.”

For Baichuan Intelligence, there is internal consensus on one point: it is essential to pursue differentiated strategies.

“Simply competing on price might give leading startups an advantage through low-cost models, but relying solely on low prices as a competitive edge is insufficient for market success.”

Why does the first To-C product look like this?

Last year, Wang Xiaochuan made several public statements suggesting that 2024 could see the emergence of several “super apps” built on large language models.

Predictably, this seems to be the prevailing trend; many LLM startups in the industry have already launched their To-C offerings.

However, Baichuan Intelligence, which remains calm amidst price wars, adopts a measured and gradual approach to launching applications.

Wang Xiaochuan smiled and said, “I don’t think Baixiaoying was released too late; on the contrary, I believe it was released too early. I think model applications require more time for refinement.”

He noted that an app with millions of Daily Active Users (DAU) is still far from earning the title of “super application.” Previously, companies released apps primarily to showcase their models, but now users often remain unclear about what these apps actually do.

The entire industry has not yet reached a mature state.

“Having previously developed input methods, search engines, and browsers, we deeply understand that there is an optimal moment for an application to grow into a widely used product.”

Therefore, whether Baixiaoying’s debut was early or late is beside the point; Baichuan Intelligence chose what it saw as the right moment to introduce it to the industry, giving the team a concrete product to operate.

[Figure 15]

Before aiming for super applications, Baichuan Intelligence deployed Baixiaoying as the first warrior to face scrutiny on the battlefield, a decision rooted in logic.

As mentioned earlier, Baichuan believes that unlike products in the information age defined by their tool-like attributes, large models create new species.

Transforming AI from a tool into a partner means building an AI assistant based on large models is akin to “creating a human being.”

Just as humans can use tools, think, listen, read, see, and write, AI assistant products should possess corresponding capabilities as model performance continuously improves.

Search serves as the most critical tool for current large models. It not only enables real-time access to the latest information but also effectively mitigates hallucination issues, making it a key technology for LLMs and a primary exploration direction for Baichuan Intelligence. When releasing Baichuan-53B last year, the team already introduced the concept of search enhancement, with RAG (Retrieval-Augmented Generation) technology remaining at the forefront.
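As a rough illustration of search enhancement, here is a minimal retrieval-augmented sketch. The word-overlap scorer, the three-sentence corpus (drawn from facts stated in this article), and the prompt wording are all assumptions chosen for brevity. Production systems typically rely on a real search index or embedding-based retrieval, and nothing here reflects Baichuan's actual pipeline.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by shared-word count and keep the top k that overlap at all."""
    query_tokens = tokenize(query)
    ranked = sorted(documents,
                    key=lambda doc: len(query_tokens & tokenize(doc)),
                    reverse=True)
    return [doc for doc in ranked[:k] if query_tokens & tokenize(doc)]


def build_prompt(query: str, passages: list[str]) -> str:
    """Place retrieved passages ahead of the question so the model can ground its answer."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return f"Answer using only the sources below.\n{context}\n\nQuestion: {query}"


# Toy corpus built from statements made in this article.
DOCUMENTS = [
    "Baichuan 4 scored 80.64 on SuperCLUE.",
    "Baixiaoying supports voice interaction.",
    "Baichuan 3 was released in late January.",
]

question = "What did Baichuan 4 score on SuperCLUE?"
passages = retrieve(question, DOCUMENTS)
prompt = build_prompt(question, passages)
print(prompt)
```

Because the answer is copied into the prompt from a retrieved source, the model has less room to invent a score. That is the basic reason retrieval is credited with reducing hallucinations and keeping answers current.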

[Figure 16]

Based on this philosophy, Baichuan chose an AI assistant proficient in search to fire the first shot in its To-C scenario.

Thus, the distinct “Baichuan flavor” embedded in Baixiaoying is immediately apparent:

An intermediate state of AI partnership + Baichuan model advantages + Sogou’s deep search expertise + accumulated past product experience.

[Figure 17]

Regarding the future after the debut of its first application, Wang Xiaochuan hinted at a few details.

The vision, naturally, is to build super models and super applications, which must be reliable and integrated with search.

The breakthrough point should be enabling AI to act like professionals in various industries, incorporating the data density and cognitive depth specific to those fields to ensure usability.

As for the direction of future iterations, Wang Xiaochuan kept it a secret; no matter how he was pressed, he refused to say more.

However, during the post-launch communication session, he inadvertently let some clues slip!

He mentioned that one reason AI assistants need to ask questions is to accumulate capabilities for future super applications. He gave an example: “If you go to a doctor and say you have a fever, and the AI directly gives you a result, that would certainly be unfeasible.”

He also referenced a recent interview with Geoffrey Hinton, Turing Award winner and mentor of Ilya Sutskever, in which the veteran stated that healthcare is the most promising application area for AI (he suggested everyone watch this interview).

Healthcare. Indeed, this line of thinking is very much in Wang Xiaochuan’s style.

For now, let us simply try using Baixiaoying and wait a little longer…
