Claude 4.6 Opus just launched — so I put it head-to-head with Gemini 3 Flash in nine tough tests covering math, logic, coding ...
Claude Opus 4.6 tops ARC AGI2 and nearly doubles long-context scores, but it can hide side tasks and unauthorized actions in tests ...