Models & Benchmarks

Releases, papers, SOTA benchmarks

Models & Benchmarks Sep 18, 2025 · Marcus Reeves · ~6 min read

China's AI Model Lands on Nature Cover! DeepSeek Reveals R1 Training Cost Just $2 Million

DeepSeek R1's cover feature in Nature sparks debate over training costs, scientific evaluation, and discourse power.

Models & Benchmarks Sep 12, 2025 · Amara Okonkwo · ~4 min read

2025 Bund Summit: Ant Group and Renmin University Release First Native MoE Diffusion Language Model, Open Source Coming Soon

Ant Group and Renmin University unveil a diffusion language model roadmap, exploring the potential paradigm of combining Mixture-of-Experts with diffusion for language modeling.

Models & Benchmarks Aug 14, 2025 · Lin Mei Huang · ~4 min read

European OpenAI Faces Backlash Over Alleged DeepSeek Distillation and Fabrication

A European version of OpenAI is under fire for allegedly distilling data from DeepSeek and fabricating results, leading to a significant credibility crisis.

Models & Benchmarks Aug 09, 2025 · Yuki Tanaka · ~7 min read

"Give Us Back GPT-4o!" OpenAI's Push for GPT-5 Sparks Outrage, Prompting Emergency PR

GPT-5 pricing and price wars: API rates seen as pressure on competitors, sparking debate over 'commoditization' of models.

Models & Benchmarks Aug 08, 2025 · Priya Sharma · ~9 min read

GPT-5 Is Here! Free for All, Powered by the Simplest Interface

OpenAI launches GPT-5 with unified routing and fast-response modes to capture the default chat entry point. Day-one routing glitches spark controversy over the user experience.

Models & Benchmarks Aug 02, 2025 · Marcus Reeves · ~10 min read

Winning an IMO Gold Medal with Prompt Engineering: Tsinghua Alumni Uncover New Findings, Proving Academia Can Rival Tech Giants Without Heavy Spending

Google's IMO-level math reasoning model is now open for trial. Its performance at the Math Olympiad level has become a focal point of summer discussions.

Models & Benchmarks Jul 31, 2025 · Amara Okonkwo · ~6 min read

4B Small Model Surpasses Claude 4 in Math Reasoning; 700-Step RL Training Approaches 235B Performance | HKU, ByteDance Seed & Fudan University

A 4B-parameter model's math reasoning via reinforcement learning rivals large models.