4B Small Model Surpasses Claude 4 in Math Reasoning; 700-Step RL Training Approaches 235B Performance | HKU, ByteDance Seed & Fudan University
A 4B-parameter model trained with reinforcement learning rivals much larger models in math reasoning. This entry is an extension topic for the 2025–2026 AI industry timeline (batch 2, archived) and complements the main timeline.