Google Assembles Elite Team to Build AI Scientists, Accelerating Scientific Discovery with Test-Time Compute

Frontier Models · Published: Feb 20, 2025 · Lin Mei Huang · ~7 min read

Author Info

Multimodal & Media AI Editor

M.F.A. Digital Media (RISD); former VFX pipeline technical director

Lin reports on image, video, and audio models with an eye toward rights, provenance, and creative workflows. She explains technical limits of generative media and highlights platform policy changes that affect commercial use. She collaborates with legal review on copyright-sensitive topics.

#Generative Media #Copyright & Licensing #Creative Workflows #Platform Policy


CEO Sundar Pichai has personally endorsed Google’s latest research, which is drawing significant attention: an AI co-scientist built for the scientific community. This multi-agent AI system uses advanced reasoning to synthesize large bodies of literature, generate novel hypotheses, and propose detailed research plans.

The key to the system is scaling up test-time compute, the same broad approach behind OpenAI’s o1 and DeepSeek-R1, to accelerate scientific discovery.


According to Pichai, the system has already achieved preliminary results in critical research areas such as liver fibrosis treatment, antimicrobial resistance, and drug repurposing.

Google is also launching a “Trusted Tester Program” for scientists worldwide, which is now open for applications.


The acknowledgments section of the accompanying paper also reveals a formidable internal lineup: teams from Google Research, Google DeepMind, and Google Cloud AI, alongside scientists from top universities who took part in testing.


As some online commenters have noted, it is not far-fetched to imagine AI agents winning Nobel Prizes in the future.


Literature Review/Hypothesis Generation/Reporting: An End-to-End Solution

Let’s look at how the AI co-scientist works.

Its objective is clear: once a scientist describes a research goal in natural language, the system uses multiple AI agents to generate new research hypotheses, a detailed research overview, and experimental protocols.

The agents involved include but are not limited to:

Generation: Proposing new hypotheses or ideas
Reflection: Evaluating and analyzing generated hypotheses
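Ranking: Prioritizing hypotheses through Elo-based tournaments of pairwise comparisons
Evolution: Iteratively improving hypotheses
Proximity: Measuring similarity between hypotheses to cluster related ideas and remove duplicates
Meta-review: Synthesizing insights from reviews and tournaments to improve the other agents and compile the final research overview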

These agents iterate through automated feedback, generating, evaluating, and refining hypotheses to form a self-optimizing loop, ultimately outputting high-quality research proposals.
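The generate, evaluate, and refine cycle described above can be illustrated with a toy Python sketch. Everything in it is hypothetical: hypotheses are plain strings, and the `generate`, `reflect`, `score`, and `evolve` callables are placeholders for what would be LLM-backed agents in the real system. It is not Google's implementation, only an illustration of the loop's shape.

```python
from typing import Callable, List


def co_scientist_loop(
    goal: str,
    generate: Callable[[str, int], List[str]],
    reflect: Callable[[str], bool],
    score: Callable[[str], float],
    evolve: Callable[[str], str],
    rounds: int = 3,
    pool_size: int = 4,
    keep: int = 2,
) -> List[str]:
    """Toy generate -> reflect -> rank -> evolve loop (hypothetical agents)."""
    # Generation agent stand-in: propose an initial pool of hypotheses.
    pool = generate(goal, pool_size)
    for _ in range(rounds):
        # Reflection agent stand-in: drop hypotheses that fail review.
        pool = [h for h in pool if reflect(h)]
        # Ranking agent stand-in: keep only the top-scoring survivors.
        pool = sorted(pool, key=score, reverse=True)[:keep]
        # Evolution agent stand-in: add a refined variant of each survivor.
        pool = pool + [evolve(h) for h in pool]
    # Final review and ranking of the last pool.
    survivors = [h for h in pool if reflect(h)]
    return sorted(survivors, key=score, reverse=True)[:keep]


if __name__ == "__main__":
    # Toy usage: "quality" is just string length, and evolving appends "+".
    best = co_scientist_loop(
        goal="g",
        generate=lambda goal, n: [f"{goal}-{i}" for i in range(n)],
        reflect=lambda h: "bad" not in h,
        score=len,
        evolve=lambda h: h + "+",
    )
    print(best)  # prints ['g-0+++', 'g-1+++']
```

Raising `rounds` spends more computation on each query and lets the surviving hypotheses keep improving, which is a miniature version of the trade between test-time compute and output quality.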


Scientists can take part in the collaboration in several ways: for instance, by supplying rough ideas or research directions at the outset for the system to refine, or by giving natural-language feedback on its outputs to guide further adjustments.

Scientists can also bring in other tools, such as web search or domain-specific AI models, to further improve research quality.


A closer look at the collaboration process shows that after a scientist sets a research goal, a Supervisor agent is responsible for distributing tasks. Google's system diagram distinguishes four elements:

Specialized agents (red boxes), each with its own role and logic
Scientist input and feedback (blue boxes)
System information flow (dark gray arrows)
Feedback between agents (red arrows within the agents)
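The supervisor's job of handing tasks to specialized agents can be pictured as a simple work queue. This Python sketch is hypothetical: the agent names echo the roles listed earlier, but the `Supervisor` class, the task format, and the first-in, first-out schedule are assumptions chosen for illustration, not details taken from Google's paper.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

# Hypothetical agent signature: takes a task payload, returns a text result.
Agent = Callable[[str], str]


class Supervisor:
    """Toy supervisor that queues tasks and dispatches them to named agents."""

    def __init__(self, agents: Dict[str, Agent]) -> None:
        self.agents = agents
        self.queue: Deque[Tuple[str, str]] = deque()
        self.results: List[Tuple[str, str]] = []

    def plan(self, goal: str, stages: List[str]) -> None:
        # Turn the research goal into one queued task per requested agent.
        for stage in stages:
            if stage not in self.agents:
                raise KeyError(f"unknown agent: {stage}")
            self.queue.append((stage, goal))

    def run(self) -> List[Tuple[str, str]]:
        # Dispatch queued tasks in first-in, first-out order and record outputs.
        while self.queue:
            name, payload = self.queue.popleft()
            self.results.append((name, self.agents[name](payload)))
        return self.results


if __name__ == "__main__":
    sup = Supervisor({
        "generation": lambda goal: f"hypotheses for {goal}",
        "reflection": lambda goal: f"review of ideas for {goal}",
    })
    sup.plan("AML drug repurposing", ["generation", "reflection"])
    for agent, output in sup.run():
        print(agent, "->", output)
```

The source describes only that the supervisor distributes tasks. A FIFO queue is just the smallest pattern that shows the idea; a larger system would plausibly run agents concurrently and decide which ones deserve more resources.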


Overall, the architecture of the AI co-scientist is relatively straightforward.

Using Test-Time Compute for Scientific Discovery

The central idea of this research is scaling up test-time compute to strengthen reasoning, through key steps such as self-play-based scientific debate, hypothesis ranking tournaments, and the evolution process described earlier.

In the study, the AI co-scientist uses an automated Elo rating system to measure, and then improve, its ability to solve scientific problems.

The Elo rating, originally developed for chess, is a dynamic metric commonly used to assess the relative skill of competitors in games. The researchers compared the average accuracy of the AI co-scientist and Gemini 2.0 on the GPQA (Graduate-Level Google-Proof Q&A) benchmark.

The results showed that higher Elo ratings were associated with more accurate answers on these challenging problems (a positive correlation).

This suggests that the Elo rating is a useful automated proxy for the quality of the AI co-scientist's answers, one the system can both track and work to improve.
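The standard Elo update is short enough to show directly. Below is a minimal Python sketch of an Elo-style round-robin tournament among hypotheses, assuming the usual logistic expected-score formula on a 400-point scale, a K-factor of 32, and a starting rating of 1200. These constants and the `judge` function (a stand-in for the LLM-driven pairwise comparison) are assumptions for illustration, not values confirmed by the source.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_update(r_a: float, r_b: float, a_won: bool, k: float = 32.0) -> Tuple[float, float]:
    """Return updated (r_a, r_b) after one match; points lost equal points won."""
    delta = k * ((1.0 if a_won else 0.0) - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta


def tournament(
    hypotheses: List[str],
    judge: Callable[[str, str], bool],
    start: float = 1200.0,
) -> Dict[str, float]:
    """Round-robin over unique hypotheses; judge(a, b) is True if a is preferred."""
    ratings = {h: start for h in hypotheses}
    for a, b in combinations(hypotheses, 2):
        ratings[a], ratings[b] = elo_update(ratings[a], ratings[b], judge(a, b))
    return ratings


if __name__ == "__main__":
    # Toy judge: the longer hypothesis "wins" (stand-in for an LLM debate).
    ratings = tournament(["a", "bb", "ccc"], judge=lambda a, b: len(a) > len(b))
    for h, r in sorted(ratings.items(), key=lambda kv: -kv[1]):
        print(f"{h}: {r:.1f}")
```

Because each match moves the same number of points from loser to winner, the pool's total rating stays constant, and an upset win against a higher-rated opponent earns more points than a win against a weaker one.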


Next, seven domain experts curated 15 open research goals and wrote their best-guess solutions to each.

The researchers then used the automated Elo metric to compare the AI co-scientist with other state-of-the-art agents and reasoning models (including Gemini 2.0 Pro and its reasoning variant) and with the human experts' solutions to these complex problems.

The results showed that the AI co-scientist scored markedly higher than the others on this metric.


Furthermore, as the system spent more compute on reasoning and refinement, the quality of its outputs, as measured by its own Elo auto-evaluation, improved significantly, surpassing the other models and, in some cases, even the solutions of unassisted human experts.


Importantly, in expert evaluations on a subset of 11 research goals, the AI co-scientist's outputs were rated higher than those of the other baseline models for novelty and impact.


To further assess the practical potential of the AI co-scientist for scientific discovery, the research team conducted a series of end-to-end laboratory experiments.

These experiments focused on three key biomedical areas: drug repurposing, proposing new therapeutic targets, and understanding the mechanisms behind antimicrobial resistance.


During the experiments, the AI co-scientist received guidance and feedback from experts and covered a range of research scenarios from simple to complex.

Without further ado, let’s look at its preliminary achievements in detail.

First, drug repurposing is the strategy of using drugs already approved for one disease to treat a different condition. This approach can reduce the time and cost of drug development and bring effective treatments to patients more quickly.

In this instance, researchers focused on acute myeloid leukemia (AML). The AI co-scientist analyzed extensive biomedical data, including chemical structures, pharmacodynamics, and genetic information of drugs, to predict which existing medications might be effective against AML.

Ultimately, the AI co-scientist proposed three candidate drugs. In follow-up validation, these drugs inhibited tumor viability in multiple AML cell lines at clinically relevant concentrations, providing early laboratory evidence of their efficacy.


More complex than drug repurposing is the identification of novel therapeutic targets, where researchers focused on liver fibrosis.

Treatment for liver fibrosis is a key focus in current liver disease research, and finding effective therapeutic targets is critical.

In this process, the AI co-scientist analyzed large volumes of biomedical data, including gene expression profiles, protein interaction networks, and known drug databases, to identify key molecules and biological pathways associated with liver fibrosis.

When the AI co-scientist's proposed treatments were compared against fibrogenic inducers (negative controls) and known inhibitors (positive controls), all of the suggested drugs showed promising activity (p < 0.01).

This makes these drugs promising candidates for further study as liver fibrosis treatments.


Finally, the third experiment showed that, by analyzing extensive genetic, protein structure, and drug activity data, the AI co-scientist can also identify and predict potential mechanisms of bacterial resistance.

Specifically, regarding capsid-forming phage-induced chromosomal islands (cf-PICIs), researchers utilized the AI co-scientist’s capabilities to explore the presence and functional mechanisms of cf-PICIs across various bacterial species.

By analyzing and integrating vast amounts of biological literature and data, the AI co-scientist independently proposed a new hypothesis:

cf-PICIs may expand their host range by interacting with tails of multiple phages.

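This hypothesis matched a mechanism that the researchers had already validated experimentally in their own unpublished work, carried out before they used the AI co-scientist.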


However, it is important to note that while the AI co-scientist has achieved a series of preliminary results in scientific discovery, Google also highlighted its limitations:

Literature review capabilities, fact-checking, cross-checking with external tools, automated evaluation techniques, and larger-scale assessments all require further improvement.


Researchers or teams interested in participating can now apply.
