Gemini's New Version Tops Arena Leaderboard, But Is Jailbroken Shortly After Release
Multimodal failures and model personality issues: Interaction styles during code debugging and security jailbreak cases spark reflections on product reliability.
JustGhostIt — Frontier Models archive, page 3.
Multimodal failures and model personality issues: Interaction styles during code debugging and security jailbreak cases spark reflections on product reliability.
At Google I/O, the entire Gemini suite was upgraded. Gemini 2.5 lines, video (Veo), image models, and developer APIs received high-density updates on the same day.
Gemini 2.5 family GA and Flash-Lite introduce segmented SKUs along the latency-cost curve to target enterprise and edge scenarios.
Anthropic releases Claude 4 (Opus/Sonnet), focusing on long-horizon autonomous tasks and coding scenarios, while simultaneously raising API pricing tiers.
Search experience revolution: 'AI Mode' expands in the US, coexisting with traditional link lists, sparking debate over traffic distribution and ad landscapes.
OpenAI releases reasoning models like o3 and o4-mini, emphasizing tool use and scientific/programming logic. Third-party reproductions and hallucination rates spark debate.
BAAI releases a robot brain-cerebellum collaboration framework and open-source ecosystem, advancing real-world data integration and swarm intelligence narratives.
Google enhances Gemini 2.x with advanced reasoning and search integration, iterating on its 'thinking' multimodal model while expanding generative capabilities across search, developer tools, and cloud services.
Lightelligence reveals new photonic-electronic hybrid computing hardware, aiming to overcome energy efficiency bottlenecks in matrix operations for AI.
Open-source video generation breakthrough: Domestic teams and the global open-source community advance billion-parameter models and low-cost training, narrowing the gap with closed commercial solutions.