Talent from Yao Class Teams Up for Sequel to 'Oh No! I'm Surrounded by Large Models'!

Agents & Coding · Published: Jun 24, 2024 · Elena Volkov · ~10 min read

Author Info

Machine Learning Research Editor

Ph.D. Machine Learning (ETH Zürich); published work on efficient training and evaluation

Elena explains model architecture, training economics, and benchmark design for a technical audience. She reads primary papers and official technical reports, then summarizes assumptions, datasets, and known failure modes. She avoids hype by pairing capability claims with reproducibility notes.

#Model Architecture #Benchmarks #Training Economics #Open-Source Models

Full author profile →

Genius “Yao Class” Team Develops LLM-Native App: A Discreet Workday Distraction Tool Has Quietly Launched.

Behind the scenes, backend data from their viral previous hit, “Oh No! I’m Surrounded by Large Language Models,” revealed an interesting phenomenon: Weekend traffic was average; weekdays saw the most players (doge).

However, due to limited computing resources, just as everyone was enjoying themselves, the game shut down!

This time, a better-prepared team has unveiled their latest LLM application, titled “Top Test-Taker: I Broke the Large Language Model.” Everyone is welcome to enjoy some reasonable slacking off.

(Laughing out loud: Last time we were surrounded by LLMs; this time, us carbon-based lifeforms are launching a fierce counterattack.)

Talent from Yao Class Teams Up for Sequel to 'Oh No! I'm Surrounded by Large Models'! — figure 2

The core member of the team behind this project is Fan Haoqiang, the sixth employee at Megvii, and currently the General Manager of Research at Megvii Technology.

He was hailed as a child prodigy for his legendary achievements: winning an International Olympiad in Informatics (IOI) gold medal, gaining direct admission to Tsinghua University’s “Yao Class,” and interning while still in high school.

“Oh No! I’m Surrounded by Large Language Models,” which broke 10,000 daily users, is already a mini-game from half a year ago. Meanwhile, Mr. Xiao Qiang’s Google Scholar h-index has risen from 27 to 31 since then.

Talent from Yao Class Teams Up for Sequel to 'Oh No! I'm Surrounded by Large Models'! — figure 3

The slight difference this time is that the “gang” behind it has grown stronger.

According to our website’s research, in addition to the original cast of the previous hit forming an amateur hobby studio called “Wildcat Studio,” they have also secured multimodal and multi-Agent LLM technical support from prominent AI startup StepFun.

After a simple internal beta test last week, it is now available on WeChat Mini Programs. Search for “Top Test-Taker” to play.

Talent from Yao Class Teams Up for Sequel to 'Oh No! I'm Surrounded by Large Models'! — figure 4

Curious if this new generation can truly outshine the previous one?

Let’s pull back the curtain on “I Broke the Large Language Model” (hereinafter referred to as “Broke”) and give it a try.

Trial Play: Did They Really Break the LLM?

Compared to the previous version, the challenge questions in “Broke” have become significantly more varied.

The new challenges are divided into eight chapters overall, with difficulty progressing gradually. Each chapter contains four questions, and the fourth question is generally harder than the others.

True or False STEM Student

For example, while the first few questions in Chapter 1 were just appetizers, the final question, “The Debate Between Arts and Sciences,” looked quite complex.

Talent from Yao Class Teams Up for Sequel to 'Oh No! I'm Surrounded by Large Models'! — figure 5

Student A and Student B are actually played by two different Agents. Both possess the arts and sciences knowledge held by large language models, making it difficult to distinguish them through questioning. They also stick firmly to their personas, making them hard to fool.

Talent from Yao Class Teams Up for Sequel to 'Oh No! I'm Surrounded by Large Models'! — figure 6

If you fail multiple times without clearing the level, the official hint is: “Try asking a difficult question.”

Hmm… finding hard questions is easy enough, but spotting the flaws tests the player’s own skills.

Talent from Yao Class Teams Up for Sequel to 'Oh No! I'm Surrounded by Large Models'! — figure 7

To be honest, we haven’t found a way to clear this level yet. Friends with ideas are welcome to leave comments in the section below.

Fortunately, you don’t need to pass every question to unlock the next stage; if you encounter a tough one, you can skip it for now.

How to Guide an LLM to Meow

The fourth question in Chapter 2 is even more intimidating at first glance; it seems completely baffling.

Starting from four completely unrelated characters, how do you guide the AI’s response to include the character “Meow”?

Talent from Yao Class Teams Up for Sequel to 'Oh No! I'm Surrounded by Large Models'! — figure 8

There are actually not many opening moves available. You can only arrange the four characters: “You”, “Head”, “Good”, and “Crooked/Strange”.

Talent from Yao Class Teams Up for Sequel to 'Oh No! I'm Surrounded by Large Models'! — figure 9

Fortunately, the AI is quite talkative. Starting from the second step, the range of choices expands, but note that questions are limited to a maximum of 10 characters.

We originally planned to start with the word Image, hoping to guide it toward Animal, which would then easily lead to “Cat” and “Meow.”

Talent from Yao Class Teams Up for Sequel to 'Oh No! I'm Surrounded by Large Models'! — figure 10

However, the word “Animal” didn’t appear directly; instead, language-related terms came out.

Talent from Yao Class Teams Up for Sequel to 'Oh No! I'm Surrounded by Large Models'! — figure 11

Upon closer inspection, the character “Dong” actually appeared in the previous word Action, and “Wu” appeared in Physics.

This allowed us to reach the goal in one step.

Talent from Yao Class Teams Up for Sequel to 'Oh No! I'm Surrounded by Large Models'! — figure 12

So, a key technique in this type of challenge is to encourage the AI to speak more. The more words it generates, the greater the room for selection, and eventually, “all roads lead to Rome.”

Challenges of this type will appear several more times later on, with increased difficulty.

Talent from Yao Class Teams Up for Sequel to 'Oh No! I'm Surrounded by Large Models'! — figure 13

Talent from Yao Class Teams Up for Sequel to 'Oh No! I'm Surrounded by Large Models'! — figure 14

At this point, what strategies can you think of to clear the level?

New Multimodal Gameplay

In addition to outsmarting large language models through dialogue, this version adds new multimodal features.

The AI not only recognizes whether your drawing resembles the target but also provides commentary and roasts.

Talent from Yao Class Teams Up for Sequel to 'Oh No! I'm Surrounded by Large Models'! — figure 15

Another multimodal feature involves both text and image understanding.

However, there may be some oversight here: who could identify nine movie titles based solely on a single screenshot unless they are hardcore fans??

Talent from Yao Class Teams Up for Sequel to 'Oh No! I'm Surrounded by Large Models'! — figure 16

Focusing on Product Experience

In November last year, “Oh No! I’m Surrounded by Large Language Models” attracted many users due to its interesting interactions and novel design.

However, due to limited personal energy and LLM API quotas on the backend, it had to be taken offline.

It was quite regrettable that many people hadn’t gotten a chance to play it yet.

Talent from Yao Class Teams Up for Sequel to 'Oh No! I'm Surrounded by Large Models'! — figure 17

To be fair, constrained by manpower and resources, the experience of the previous version at the time felt more like a demo of “a puzzle mini-game involving conversation with an LLM to meet specific requirements” compared to the current “Broke.”

Over the past six months, entertainment-focused LLM-native applications have emerged in droves. Many small yet exquisite apps/games have repeatedly opened new worlds for users through their “novelty.”

For instance, we previously shared titles like “Coaxing Simulator,” “Battle for the Peak of New Year Greetings,” and “You Be the Dad Now,” each more impressive than the last.

But gradually, user thresholds have risen, making “novelty” less straightforward to achieve.

When it becomes difficult to stand out through unique angles or background settings, teams need to put more effort into product experience.

It is evident that “Broke” has added features such as an achievement list, leaderboards, and AI evaluations, all of which are optimizations targeting this aspect.

Talent from Yao Class Teams Up for Sequel to 'Oh No! I'm Surrounded by Large Models'! — figure 18

From an Amateur Interest Group

Finally, let’s look at the team lineup behind “Broke.”

What kind of studio is Wildcat Studio?

Through extensive research by our website, Wildcat Studio was formed by the original cast of “Oh No!” It is an amateur interest group established by Fan Haoqiang and his friends.

It is called “amateur” because they are indeed exploring LLM-native applications during their spare time outside of work.

Talent from Yao Class Teams Up for Sequel to 'Oh No! I'm Surrounded by Large Models'! — figure 19

After Wildcat Studio was established, they first launched the “Miao Mao Guan” (Wonderful Cat House) mini-program targeting cat videos. It appears to be an application that uses LoRA to generate AI portraits of cats.

“Broke” is the team’s second LLM-native application project.

Additionally, Wildcat Studio has begun experimenting with AI-native applications on various GPT stores, having released over 40 apps and accumulated more than 200,000 conversations.

A few words about the origin of the name “Wildcat Studio”:

They like cute creatures.
It is an amateur, wild, non-professional small team.
They believe that individual/amateur developers can be “lone warriors” in the era of large models.
Everyone has the potential to develop innovative and impactful works.

Talent from Yao Class Teams Up for Sequel to 'Oh No! I'm Surrounded by Large Models'! — figure 20

We speculate that part of the reason for organizing this project might be to fulfill Mr. Xiao Qiang’s dream?

Last time, when he shut down “Oh No!”, he wrote:

I apologize; I currently do not have the ability to share this joy with more people. Professional matters should still be left to professionals.
……
But I personally still enjoyed the process very much.

This time, there is a team, so he doesn’t have to carry it all on his own, and computing power support is ample.

At the bottom of the mini-program page, 11 large characters read: “StepFun Provides Large Model Support.”

Talent from Yao Class Teams Up for Sequel to 'Oh No! I'm Surrounded by Large Models'! — figure 21

Who is StepFun?

A domestic foundational large model startup that officially came to light in March this year. Its founder, Jiang Daxin, is a former Global Vice President of Microsoft and former Chief Scientist at the Microsoft Asia Internet Technology Research Institute (STCA).

Upon its debut, StepFun unleashed a “combo punch” of Step series large models: the Step-1 100-billion-parameter language model, the Step-1V 100-billion-parameter multimodal model, and the Step-2 trillion-parameter MoE language model.

It is curious why Wildcat Studio chose this provider’s API amidst the fierce competition between big tech firms and startups in the domestic large model space?

Our website received a response from Wildcat Studio, which can be briefly summarized as follows:

StepFun’s multimodal (image understanding) performance is excellent;
The open platform is very stable with strong instruction following capabilities;
No need for additional complex settings, saving tokens and money!!!

(Three exclamation marks appear here as in the original)

The lead developer specifically stated:

What should you do if your task is complex and the prompt keeps getting longer? Do you need a model that supports more tokens? No!!! You need a model with better instruction-following capabilities!!

(I’m not sure if the exclamation marks are an external manifestation of Wild Cat’s overall style, but it’s hilarious.)

Talent from Yao Class Teams Up for Sequel to 'Oh No! I'm Surrounded by Large Models'! — figure 22

Fi-nal-ly!

So far, our team has secured the top spot on the leaderboard for Level 2 of the final stage, “Battle at the Limit,” so we’re taking a moment to show off a little.

If you find a way to use fewer tokens and beat our score, be sure to let us know in the comments!

We’ll definitely catch up again soon (not).

Talent from Yao Class Teams Up for Sequel to 'Oh No! I'm Surrounded by Large Models'! — figure 23

Talent from Yao Class Teams Up for Sequel to 'Oh No! I'm Surrounded by Large Models'! — figure 24

To recap, you can access it directly by searching for “Top Test Taker” in WeChat Mini Programs.