Semgrep: GLM 5.2 beats Claude in our Cyber Benchmarks
Among models given nothing but a prompt, the best open-weight option beat Claude Opus 4.8.
We ran a set of popular open-source models against our IDOR benchmark, the same dataset and the same prompt we've used to evaluate frontier coding agents. The result surprised us: GLM 5.2, an open-we… [+13071 chars]